TL;DR: I don’t like the way vectors are presented in calculus-based and algebra-based introductory physics. I think a more formal approach is warranted. This post addresses the problem of taking the magnitude of both sides of simple vector equations. If you want the details, read on.
This is the first post in a new series in which I will present a more formal approach to vectors in introductory physics. It will not have the same flavor as my recently begun series on angular quantities; that series serves a rather different purpose. However, there may be some slight overlap between the two series if it is appropriate.
I am also using this series to commit to paper (screen, really) some thoughts and ideas I have had for some time with the hope of turning them into papers for submission to The Physics Teacher. I’d appreciate any feedback on how useful this may be to the community.
To begin with, I want to address issues in the algebraic manipulation of vectors with an emphasis on coordinate-free methods. I feel that in current introductory physics courses, vectors are not exploited to their full potential. Instead of learning coordinate-free methods, students almost always learn to manipulate coordinate representations of vectors in orthonormal, Cartesian coordinate systems and I think that is unfortunate because it doesn’t always convey the physics in a powerful way. Physics is independent of one’s choice of coordinate system, and I think students should learn to manipulate vectors in a similar way.
Let’s begin by looking at a presumably simple vector equation:
The object is to solve for given . Don’t be fooled; it’s more difficult than it looks. In my experience, students invariably try to divide both sides by but of course this won’t work because vector division isn’t defined in Gibbsian vector analysis. Don’t let students get away with this if they try it! The reasons for not defining vector division will be the topic of a future post.
(UPDATE: Mathematics colleague Drew Lewis asked about solving this equation by combining like terms and factoring, leading to and then to . This is a perfectly valid way of solving the equation and it completely avoids the “division by a vector” issue. I want to call attention to that issue though, because when I show students this problem, they always (at least in my experience) try to solve it by dividing. Also, in future posts I will demonstrate how to solve other kinds of vector equations that must be solved by manipulating both dot products and cross products, each of which carries different geometric information, and I want to get students used to seeing and manipulating dot products. Thanks for asking Drew!)
One could simply say to “take the absolute value of both sides” like this:
but this is problematic for two reasons. First, it destroys the sign on the righthand side. Second, a vector doesn’t have an absolute value because it’s not a number. Vectors have magnitude, not absolute value, which is an entirely different concept from that of absolute value and warrants separate consideration and a separate symbol.
We need to do something to each side to turn it into a scalar because we can divide by a scalar. Let’s try taking the dot product of both sides with the same vector, , and proceed as follows:
This is a better way to proceed. It’s formal, and indeed even pedantic, but I dare say it’s the best way to go if one wants to include dot products. Of course in this simple example, one can see the solution by inspection, but my goals here are to get students to stop thinking about the concept of dividing by a vector and to manipulate vectors algebraically without referring to a coordinate system.
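As a sketch of the dot-product procedure just described, assume an equation of the form $a\mathbf{A} = c\,\mathbf{A}$, where $a$ is the unknown scalar, $\mathbf{A}$ is a given nonzero vector, and $c$ is a given signed number. The symbols $a$, $c$, and $\mathbf{A}$ are placeholders chosen here for illustration. Dotting both sides with $\mathbf{A}$ turns each side into a scalar while keeping the sign:

```latex
\begin{align*}
a\,\mathbf{A} &= c\,\mathbf{A} \\
a\,\mathbf{A}\cdot\mathbf{A} &= c\,\mathbf{A}\cdot\mathbf{A} \\
a\,\lVert\mathbf{A}\rVert^{2} &= c\,\lVert\mathbf{A}\rVert^{2} \\
a &= c \qquad (\lVert\mathbf{A}\rVert \neq 0)
\end{align*}
```

The only division is by the scalar $\lVert\mathbf{A}\rVert^{2}$, which is where the requirement that $\mathbf{A}$ be nonzero enters.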
Let’s now look at another example with a different vector on each side of the equation.
Once again the object is to solve for given and . Note that solving for either or is obviously trivial so I won’t address it; it’s simply a matter of scalar division. Solving for is more challenging because we must again suppress the urge to divide by a vector. I will show two possible solutions. Make sure you understand what’s being done in each step.
We get two solutions, and they are geometrically opposite each other; that’s the physical implication of the signs. (I suppose we could argue over whether or not to just take the principal square root, but I don’t think we should do that here because it would throw away potentially useful geometric information.) We can find a cleaner solution that accounts for this. Consider the following solution which exploits the concepts of “factoring” a vector into a magnitude and a direction and the properties of the dot product.
See the geometry? It’s in the factor . If and are parallel, this factor is and if they are antiparallel it is . Convince yourself that those are the only two options in this case. (HINT: Show that each vector’s direction is a scalar multiple of the other vector’s direction.) This solution won’t work if the two vectors aren’t collinear. If we’re solving for then both vectors are assumed given and we know their relative geometry.
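The magnitude-and-direction solution can be sketched numerically. This assumes the equation has the form a A = b B with a unknown, and the helper names (`dot`, `magnitude`, `unit`, `solve_scalar`) are hypothetical, chosen only for this illustration. The function refuses to return an answer when the two vectors are not collinear, which mirrors the restriction noted above.

```python
import math

def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def magnitude(u):
    """Magnitude of a vector (never negative)."""
    return math.sqrt(dot(u, u))

def unit(u):
    """Direction of a nonzero vector."""
    m = magnitude(u)
    if m == 0.0:
        raise ValueError("the zero vector has no direction")
    return tuple(ui / m for ui in u)

def solve_scalar(A, b, B, tol=1e-9):
    """Solve a*A = b*B for the scalar a without dividing by a vector.

    Assumes A and B are nonzero. The result is
    a = b * (|B| / |A|) * (A_hat . B_hat).
    """
    if b == 0:
        return 0.0  # a*A = 0 with A nonzero forces a = 0
    cos_angle = dot(unit(A), unit(B))  # +1 if parallel, -1 if antiparallel
    if abs(abs(cos_angle) - 1.0) > tol:
        raise ValueError("A and B are not collinear, so no scalar a works")
    return b * (magnitude(B) / magnitude(A)) * cos_angle

# Antiparallel example: B points opposite to A and is twice as long.
A = (1.0, 2.0, 2.0)
B = (-2.0, -4.0, -4.0)
a = solve_scalar(A, 3.0, B)
print(a)
```

The sign of the answer comes entirely from the dot product of the two directions, so no choice of square-root branch is needed.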
Let’s look at another example from first semester mechanics, Newton’s law of gravitation,
where and should be read as “the position of 1 relative to 2.” Let’s “take the magnitude of both sides” by first writing in terms of its magnitude and direction, dotting each side with a vector, and dividing both sides by the resulting common factor.
Okay, this isn’t an Earth-shattering result because we knew in advance it has to be the answer, but my point is how we formally went about getting this answer. More specifically, the point is how we went about it without dividing by a vector.
Let’s now consider a final example from introductory electromagnetic theory, and this was the example that got me thinking about this entire process of “taking the magnitude of both sides” about a year ago. It’s the expression for the electric force experienced by a charged particle in the presence of an electric field (obviously not its own electric field).
That one vector is a scalar multiple of another means the two must be collinear, so they must either be parallel or antiparallel. An issue here is that is a signed quantity. Again, we have a choice about which vector with which to dot both sides; we could use or we could use . If we use the former, we will eventually need to take the square root of the square of a signed quantity, which may lead us astray. Therefore, I suggest using the latter.
This may look overly complicated, but it’s quite logical, and it reflects geometry. If is negative, then the dot product will also be negative and the entire quantity will be positive. If is positive, then the dot product will also be positive and again the entire quantity will be positive. Geometry rescues us again, as it should in physics. We can also rearrange this expression to solve for either or with the sign of properly accounted for by the dot product. By the way, and can’t be orthogonal because then their dot product would vanish and the above expression would blow up. Geometry and symmetry, particularly the latter, preclude this from happening.
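A sketch of one route consistent with this reasoning, assuming the force law is written $\mathbf{F} = q\mathbf{E}$ and each vector is factored into a magnitude and a unit vector. The hat notation and the choice to dot with $\hat{\mathbf{E}}$ are assumptions made here for illustration.

```latex
\begin{align*}
\mathbf{F} &= q\,\mathbf{E} \\
\lVert\mathbf{F}\rVert\,\hat{\mathbf{F}} &= q\,\lVert\mathbf{E}\rVert\,\hat{\mathbf{E}} \\
\lVert\mathbf{F}\rVert\,\hat{\mathbf{F}}\cdot\hat{\mathbf{E}} &= q\,\lVert\mathbf{E}\rVert\,\hat{\mathbf{E}}\cdot\hat{\mathbf{E}} = q\,\lVert\mathbf{E}\rVert \\
\lVert\mathbf{F}\rVert &= \frac{q\,\lVert\mathbf{E}\rVert}{\hat{\mathbf{F}}\cdot\hat{\mathbf{E}}}
\end{align*}
```

Because the two vectors are collinear, $\hat{\mathbf{F}}\cdot\hat{\mathbf{E}}$ is $+1$ when $q > 0$ and $-1$ when $q < 0$, so the quotient is positive in both cases.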
In summary, “taking the magnitude of both sides” of a simple vector equation presents some challenges that are mitigated by exploiting geometry, something that is neglected in introductory calculus-based and algebra-based physics courses. I suggest we try to overcome this by showing students how to formally manipulate such equations. One advantage of doing this is students will see how vector algebra works in more detail than usual. Another advantage is that students will learn to exploit geometry in the absence of coordinate systems, which is one of the original purposes of using vectors after all.
Do you think this would make a good paper for The Physics Teacher? Feedback welcome!
I’m writing this a whole week late due, in part, to having been away at an AAPT meeting and having to plan and execute a large regional meeting of amateur astronomers.
This week was all about the concept of electric potential and how it relates to electric field. I love telling students that this topic is “potentially confusing” because the word “potential” comes up in two different contexts. The first is in the context of potential energy. Potential energy, which I try very hard to call interaction energy, is a property of a system, not of an individual entity. There must be at least two interacting entities to correctly speak of interaction energy. Following Hecht [reference needed], I like to think of energy, and thus interaction energy, as a way of describing change in a system using scalars rather than vectors. Conservative forces, like gravitational and electric forces, can be described with scalar energies and fortunately, these forces play a central role in introductory physics. The second context is that of electric potential, a new quantity that is the quotient of a change in electric potential energy and the amount of charge that gets moved around as a result of an interaction. The distinction between the two contexts is subtle but very important.
Oh and speaking of potential or interaction energy, Matter & Interactions is the only textbook I know of that correctly shows the origins of The World’s Most Annoying Negative Sign (TWMANS) and how it relates to potential energy. When you write the total change in your system’s energy, you can attribute it to work done by internal forces and work done by external forces. When you rearrange this expression to put all the internal terms on the lefthand side and all the external terms on the righthand side, you pick up a negative sign that goes on to become TWMANS. This term with the negative sign, which is nothing more than the opposite of the work done by forces internal to the system, is DEFINED to be the change in potential energy for the system. It’s just that simple, but this little negative sign caused me so much grief in both undergrad and graduate courses. Some authors explicitly included it, others didn’t, and instead flipped the integration limits on integrals to account for it. Chabay and Sherwood include it explicitly and consistently and there should be no trouble in knowing when and where it’s needed.
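The bookkeeping described above can be sketched as follows, assuming a system whose only other energy is kinetic energy $K$, with $W_{\text{int}}$ and $W_{\text{ext}}$ denoting the work done by internal and external forces. These symbols are chosen here for illustration and may differ from the textbook’s notation.

```latex
\begin{align*}
\Delta K &= W_{\text{int}} + W_{\text{ext}} \\
\Delta K - W_{\text{int}} &= W_{\text{ext}} \\
\Delta U &\equiv -W_{\text{int}} \\
\Delta K + \Delta U &= W_{\text{ext}}
\end{align*}
```

The negative sign appears in the second line purely from moving the internal work to the lefthand side, and the third line gives that term its name.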
There is also some interesting mathematics in this chapter. Line integrals and gradients are everywhere and we see they are intimately related. In fact, they are inverses of each other. I want to talk about one mathematical issue in particular, though, and that is within the context of the following problem statement:
Given a region of space where there is a uniform electric field and a potential difference between two points separated by displacement , calculate the magnitude of the electric field .
This problem amounts to “unwrapping” a dot product (in this case ), something the textbooks, to my knowledge, never demonstrate how to do. My experience is that students inevitably treat the dot product as scalar multiplication and attempt to divide by and of course dividing by a vector isn’t defined in Gibbsian vector analysis. I think the only permanent cure for this problem is to take a more formal approach to introducing vectors and dot products earlier in the course but I tend to think I’m in the minority on that, and I don’t really care. The problem needs to be addressed one way or the other. Solving either a dot product or a cross product for an unknown vector requires knowledge of two quantities (the unknown’s dot product with a known vector and the unknown’s cross product with a known vector OR the unknown’s divergence and curl) as constraints on the solution. Fortunately, at this point in the course we’re dealing with static electric fields, which have no curl () or equivalently (I think) is collinear with (differing in sign because gradient points in the direction of increasing potential (I don’t like saying that for some reason…) and electric field points in the direction of decreasing potential) so we can find something about from just a dot product alone. So, students need to solve for . Here’s the beginning of the solution. The first trick is to express the righthand side in terms of scalars.
We have a slight problem, and that is the lefthand side is a vector magnitude and thus is always positive. We must ensure that the righthand side is always positive. I see two ways to do this. If and are parallel () then must represent a negative number and TWMANS will ensure that we get a positive value for the righthand side, and thus also for the lefthand side. If and are antiparallel () then must represent a positive number and TWMANS, along with the trig function, will ensure that we get a positive value for the righthand side, and again also for the lefthand side. I want to instill this kind of deep, geometric reasoning in my students but I’m finding that it’s rather difficult. Their approach is to simply take the absolute value of the righthand side.
It works numerically of course, but bypasses the physics in my opinion. There’s one more thing I want students to see here, and that is the connection to the concept of gradient. Somehow, they need to see
and I think this can be done if we think about the role of the trig function here, which tells us how much of is parallel to , and remembering that the component label is really just an arbitrary label for a particular direction. We could just as well use , , or any other label. We must be careful about signs here too, because the sign of must be consistent with the geometry relative to the displacement.
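The sign reasoning for the uniform-field problem can be sketched numerically. This assumes the relation ΔV = -E·Δr = -|E||Δr|cos θ for a uniform field, with θ the angle between the field and the displacement. The function name `field_magnitude` and its parameters are hypothetical, chosen for this illustration. Instead of taking an absolute value, the function lets the geometry fix the sign and complains if the inputs contradict each other.

```python
import math

def field_magnitude(delta_V, displacement, theta):
    """Magnitude of a uniform electric field from a potential difference.

    Assumes delta_V = -|E| * |dr| * cos(theta), where
      delta_V      : potential difference in volts (signed)
      displacement : magnitude of the displacement in meters (positive)
      theta        : angle between E and the displacement, in radians
    """
    cos_theta = math.cos(theta)
    if abs(cos_theta) < 1e-12:
        raise ValueError("displacement is perpendicular to the field; "
                         "the dot product carries no information about |E|")
    E = -delta_V / (displacement * cos_theta)
    if E < 0:
        # A magnitude cannot be negative, so the sign of delta_V
        # must be inconsistent with the stated geometry.
        raise ValueError("sign of delta_V is inconsistent with the geometry")
    return E

# Moving along the field (theta = 0): the potential must drop.
print(field_magnitude(-12.0, 0.03, 0.0))
# Moving against the field (theta = pi): the potential must rise.
print(field_magnitude(12.0, 0.03, math.pi))
```

Both calls describe the same field viewed along opposite displacements, and both give the same positive magnitude without any absolute value.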
As an aside, it kinda irks me that position vectors seem to be the only vectors for which we label the components with coordinates. I don’t know why that bothers me so much, but it does. Seems to me we should use rather than just and there’s probably a deep reason for this, but I’ve yet to stumble onto it. Perhaps it’s just as simple as noticing that a position’s components are coordinates. Is it that simple?
As always, feedback is welcome.
As usual, I’m posting this the Monday after the week named in the title.
This week was all about chapter 6: energy and the energy principle. This is where Matter & Interactions really shines among introductory textbooks. I remember as a student being so confused by sign conventions that I honestly never knew when to include them or why they were even needed. The systems approach of M&I eliminates this whole bag of problems and why many educated faculty can’t (or won’t) see this huge advantage I’ll never understand. But, that’s not my problem.
The biggest revelation in chapter 6 is the origin of the concept of potential energy. It’s astonishingly simple, despite the fake complexity of traditional approaches. You define a system (this is the most important step). You identify interactions both internal to, and external to, that system. The work done by the internal interactions is defined (that’s the key word) as the opposite of a new quantity called the potential energy of the system. I’ve never liked that term, though, because it’s quite vague. Stored energy? Energy that could potentially do work? Capacity to do work? Ack! All of these are bad in my opinion. I’ve seen it called interaction energy (my personal favorite that I try to promote) and configuration energy. I think either of these would be far better, but again, it’s up to individual instructors (keeping in mind that no one needs the community’s permission to introduce clearer terminology, as I’ve been told on more than one occasion…usually by grad students with no real teaching experience).
There’s no new physics in this name game, but it offers an extremely useful organizational structure: everything on the left hand side of the energy principle is internal to the system and everything on the right hand side is external. Changes crossing the system boundary from outside to inside carry a positive sign and changes crossing the system boundary from inside to outside carry a negative sign. I just don’t see how it could be simpler. Sure, these are all just conventions, but conventions should be used to make things simpler, not harder.
I tried to emphasize these points in class, but it’s so hard to tell how much sunk in. One student admitted to me that he’d not even begun his assessment portfolio yet despite having had weeks to work on it. Sigh. I just don’t know how I’m supposed to help students who flat out refuse to engage in their education and I don’t think I can be held “accountable” (note the quotes) in any professional way for the outcomes these students inevitably face.
To illustrate the simplicity of the energy principle approach, we did the typical (but interesting) case of an asteroid falling into Earth from “at rest very far away” and estimating its speed as it hits the top of our atmosphere. Accounting for other planets in the way is just a matter of adding an appropriate interaction energy term. This, along with other simplifying assumptions, makes the problem more interesting I think.
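A minimal sketch of the asteroid estimate, under stated assumptions: the system is Earth plus asteroid with no external work, Earth is treated as fixed, the Sun and other bodies are ignored, and the “top of the atmosphere” is taken to be 100 km above the surface (an assumed altitude, not a value from the class). Starting from rest very far away makes both the initial kinetic energy and the initial interaction energy zero, so (1/2)mv² - GMm/r = 0.

```python
import math

G = 6.674e-11        # gravitational constant, N m^2 / kg^2
M_EARTH = 5.972e24   # mass of Earth, kg
R_EARTH = 6.371e6    # mean radius of Earth, m
ATMOSPHERE = 1.0e5   # assumed altitude of the top of the atmosphere, m

def infall_speed(central_mass, distance):
    """Speed of a small body that fell from rest at infinity.

    Energy principle with no external work: delta K + delta U = 0,
    so (1/2) m v^2 = G M m / r and the small mass m cancels.
    """
    return math.sqrt(2.0 * G * central_mass / distance)

v = infall_speed(M_EARTH, R_EARTH + ATMOSPHERE)
print(f"{v / 1000:.1f} km/s")
```

Accounting for another body, such as the Sun, would amount to adding one more interaction energy term of the same form to the energy balance.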
Feedback is welcome.
In section 27-3 of The Feynman Lectures on Physics, Feynman describes a notation for manipulating vector expressions in a way that endows nabla with the property of following a rule similar to the product rule with which our introductory calculus students are familiar. It allows a vector expression with more than one variable to be expanded as though nabla operates on one variable while the other is held constant. The vector being differentiated is indicated with a subscript on nabla. Feynman’s equation 27.10 shows how this is written, and it is rather like treating the subscripted nablas as partial derivative operators. Feynman’s equation 27.11 shows the resulting vector identity for the divergence of a cross product. In between these two equations, Feynman explains that the subscripted nabla can be manipulated as though it were a vector (it is not) according to the rules of dot products (commutative), cross products (anticommutative), triple scalar products (cyclic permutation, swapping dots and crosses, etc.), and triple vector products (BAC-CAB, Jacobi identity, etc.). The strategy is to end up with only one vector (the one corresponding to a subscript) immediately to the right of each correspondingly subscripted nabla. Then you drop the subscripts, and you should have a valid vector identity. In the audio version of this lecture, Feynman comments that he doesn’t understand why this technique isn’t taught. It was never shown to me as either an undergraduate or graduate student. I suspect it’s treated as “one of those things” students are simply assumed to pick up at one point or another without it ever being explicitly addressed (much like critical thinking is treated).
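As a sketch of the procedure for the divergence of a cross product, with $\mathbf{A}$ and $\mathbf{B}$ as generic vector fields (the layout here is illustrative, not a transcription of the Lectures): expand with the product rule, use the cyclic and anticommutative properties to place each field immediately to the right of its own nabla, then drop the subscripts.

```latex
\begin{align*}
\nabla\cdot(\mathbf{A}\times\mathbf{B})
  &= \nabla_{A}\cdot(\mathbf{A}\times\mathbf{B}) + \nabla_{B}\cdot(\mathbf{A}\times\mathbf{B}) \\
  &= \mathbf{B}\cdot(\nabla_{A}\times\mathbf{A}) - \mathbf{A}\cdot(\nabla_{B}\times\mathbf{B}) \\
  &= \mathbf{B}\cdot(\nabla\times\mathbf{A}) - \mathbf{A}\cdot(\nabla\times\mathbf{B})
\end{align*}
```

The minus sign in the second term comes from writing $\mathbf{A}\times\mathbf{B} = -\mathbf{B}\times\mathbf{A}$ before permuting cyclically.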
The issue here, for me, is whether or not Feynman invented this way of manipulating vector expressions. After all, the notation carries his name so it might be reasonable to assume he invented the underlying method. My research shows that a very similar methodology is documented in the very first (as far as I know) textbook on vector analysis, Wilson’s Vector Analysis: A Text-Book for the use of Students of Mathematics and Physics. This is the famous work based on Gibbs’ lecture notes and is the definitive work on contemporary vector analysis. I continue to be surprised at how few people have consulted it (based on my asking whether or not they have). I offer the PDF version to my physics students in the hopes they will use it in their studies. Chapter 3 is on the differential calculus of vectors and section 74 on page 159 begins a presentation of using nabla as a “partial” operator in an expression, operating on only one vector while holding another constant. Wilson introduces a subscript notation that, unlike Feynman’s, indicates which vector is held constant for a differentiation.
This brings to my mind the question of whether or not Feynman was aware of Wilson’s textbook and this method documented therein and decided to change the nature of the subscript to show what is differentiated rather than what is not. I don’t see how there is any way to know for sure, but it’s an interesting question in my mind because I suspect many students are not aware of Wilson’s textbook.
Wilson shows many worked examples on subsequent pages. Section 75 on page 161 shows more examples and consequences of this technique leading to a statement on page 162 that blows my mind! In the paragraph immediately surrounding equation (47) we see the following:
If u be a unit vector, say a, the formula (referring to equation 47) expresses the fact that the directional derivative (expression omitted) of a vector function v in the direction a is equal to the derivative of the projection of the vector v in that direction plus the vector product of the curl of v into the direction a.
Wow! This means that applying nabla as a partial operator leads to something of geometrical significance, which to me constitutes a new identity itself. The lefthand side of Wilson’s equation (47) can be interpreted as the dot product of vector a and the gradient of vector v (a second rank tensor). My last post asks how the righthand side follows geometrically from that, something I’ve never seen in the literature.
Tai’s recent book on vector and dyadic analysis presents what he calls the “method of symbolic vector” which seems to formalize both Wilson’s and Feynman’s methods. The idea is that nabla is temporarily treated as a vector (with a new symbol) and any expression in which it appears can be symbolically manipulated according to all the rules of vector analysis to end up with a valid identity when nabla is once again treated as a differential operator (and restored to its rightful symbol). Tai definitely knew about Wilson’s text as he references it frequently and devotes a considerable number of pages to commentary on Gibbs’ choices of notation (e.g. Gibbs’ use of a dot product as a symbol for divergence despite divergence not being defined as a dot product at all, and similarly his use of a cross product as a symbol for curl despite curl not being defined as a cross product), etc. Tai refers to Feynman only once, at the bottom of page 147 and continuing onto page 148, but the reference is vague.
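In symbols, the quoted statement can be sketched as follows. The subscript here marks the vector being differentiated, which is Feynman’s convention rather than Wilson’s, and is an assumption about how best to render the formula; $\mathbf{a}$ is the constant unit vector and $\mathbf{v}$ the vector function.

```latex
\[
(\mathbf{a}\cdot\nabla)\,\mathbf{v}
  = \nabla_{v}(\mathbf{a}\cdot\mathbf{v}) + (\nabla\times\mathbf{v})\times\mathbf{a}
\]
```

The lefthand side is the directional derivative of $\mathbf{v}$ along $\mathbf{a}$, the first term on the right is the derivative of the projection $\mathbf{a}\cdot\mathbf{v}$ with $\mathbf{a}$ held fixed, and the last term is the vector product of the curl of $\mathbf{v}$ into $\mathbf{a}$.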
Regardless of who initially invented the use of nabla as a partial operator, I feel we need to expose students to this as early as possible as part of a stronger foundation in classical vector analysis than they currently get in the introductory courses.
Over the past three years or so, I have been researching the history and implementation of Gibbsian vector analysis with the intent of finding ways to incorporate it more thoroughly and more meaningfully into introductory calculus-based physics (possibly algebra/trig-based physics too). Understanding the usual list of vector identities has been part of this research. One vector identity that has frustrated me involves probably the most innocent looking quantity, the gradient of the dot product of two vectors. I have seen no fewer than five different expressions for the expansion of this seemingly harmless quantity. Here they are.
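Written out in one common notation, with $\mathbf{A}$ and $\mathbf{B}$ as the two vector fields, $\nabla_{A}$ acting only on $\mathbf{A}$, and $\mathsf{J}_{A}$ the Jacobian matrix with entries $\partial A_i/\partial x_j$ (all of these symbol choices are assumptions made for this sketch), the five expansions can be stated as:

```latex
\begin{align}
\nabla(\mathbf{A}\cdot\mathbf{B})
  &= \nabla_{A}(\mathbf{A}\cdot\mathbf{B}) + \nabla_{B}(\mathbf{A}\cdot\mathbf{B}) \\
  &= (\nabla\mathbf{A})\cdot\mathbf{B} + (\nabla\mathbf{B})\cdot\mathbf{A} \\
  &= \mathsf{J}_{A}^{\mathsf{T}}\,\mathbf{B} + \mathsf{J}_{B}^{\mathsf{T}}\,\mathbf{A} \\
  &= (\mathbf{B}\cdot\nabla)\mathbf{A} + \mathbf{B}\times(\nabla\times\mathbf{A})
   + (\mathbf{A}\cdot\nabla)\mathbf{B} + \mathbf{A}\times(\nabla\times\mathbf{B}) \\
  &= (\mathbf{B}\times\nabla)\times\mathbf{A} + \mathbf{B}\,(\nabla\cdot\mathbf{A})
   + (\mathbf{A}\times\nabla)\times\mathbf{B} + \mathbf{A}\,(\nabla\cdot\mathbf{B})
\end{align}
```

In the last two lines the terms are ordered so that the first pair involves derivatives of $\mathbf{A}$ only and the second pair involves derivatives of $\mathbf{B}$ only.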
Now, equation (1) uses Feynman notation, which endows the nabla operator with the property of obeying the Leibniz rule, or the product rule, for derivatives. The subscript refers to the vector on which the nabla operator is operating while the other vector is treated as a constant. Note that in chapter 3 of Wilson’s text based on Gibbs’ lecture notes, the subscript denotes which vector is to be held constant, precisely the opposite of the way Feynman presents it. Equation (1) is merely an alternative way of writing the lefthand side and offers nothing new algebraically.
Equation (2) shows nabla operating on each vector in the dot product, which is something many students never see. As I was told years ago, they are told that one can only take the gradient of a scalar and not a vector, which is patently false. The twist is that, unlike the gradient of a scalar, the gradient of a vector is not a vector; it is a second rank tensor which can be represented by a matrix. This tensor, and its matrix representation, is also called the Jacobian. The dot product of this tensor with a vector gives a vector, so equation (2) is consistent with the fact that the lefthand side must be a vector. I can derive this expression using index notation.
Equation (3) is equation (2) written in (a very slight variation of) matrix notation (the vectors are written as vectors and not as column matrices). I don’t think there is anything more to it.
Equation (4) is the traditional expansion of the lefthand side. It is derived from the BAC-CAB rule, with suitable rearrangements to make sure nabla operates on one vector in each term. Two such applications give equation (4). The “reverse divergence” operators are actually directional derivatives operating on the vectors immediately to the right of each operator. I can derive this expression using index notation.
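The traditional expansion can be checked numerically with central differences. This is a sketch only: it assumes the expansion reads grad(A·B) = (B·grad)A + B×(curl A) + (A·grad)B + A×(curl B), and the two polynomial fields and all helper names are made up for the test.

```python
def A(p):
    """Hypothetical test field A(x, y, z)."""
    x, y, z = p
    return (y * z, x * x, x + z * z)

def B(p):
    """Hypothetical test field B(x, y, z)."""
    x, y, z = p
    return (x * y, z, y * y - x)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def add(*vectors):
    return tuple(sum(c) for c in zip(*vectors))

def partial(f, p, j, h=1e-4):
    """Central-difference derivative of f (scalar or tuple) along axis j."""
    hi, lo = list(p), list(p)
    hi[j] += h
    lo[j] -= h
    fh, fl = f(hi), f(lo)
    if isinstance(fh, tuple):
        return tuple((a - b) / (2 * h) for a, b in zip(fh, fl))
    return (fh - fl) / (2 * h)

def jacobian(F, p):
    """J[i][j] = dF_i / dx_j."""
    cols = [partial(F, p, j) for j in range(3)]
    return [[cols[j][i] for j in range(3)] for i in range(3)]

def curl(F, p):
    J = jacobian(F, p)
    return (J[2][1] - J[1][2], J[0][2] - J[2][0], J[1][0] - J[0][1])

def directional(u, F, p):
    """(u . grad) F, computed as the Jacobian of F applied to u."""
    J = jacobian(F, p)
    return tuple(dot(J[i], u) for i in range(3))

def grad_of_dot(p):
    """Gradient of the scalar field A . B."""
    f = lambda q: dot(A(q), B(q))
    return tuple(partial(f, p, j) for j in range(3))

def expansion(p):
    """Righthand side of the assumed four-term expansion."""
    a, b = A(p), B(p)
    return add(directional(b, A, p), cross(b, curl(A, p)),
               directional(a, B, p), cross(a, curl(B, p)))

point = (0.7, -1.3, 2.1)
lhs = grad_of_dot(point)
rhs = expansion(point)
max_error = max(abs(l - r) for l, r in zip(lhs, rhs))
print(max_error)
```

Agreement at one point for one pair of fields is a sanity check, not a proof; the index-notation derivation is what establishes the identity.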
Equation (5) is shown in problem 1.8.12 on page 48 of Arfken (6th edition). It has the advantage of using the divergences of the two vectors, which I think are easier to understand than the “reverse divergence” operators in equation (4). However, the “reverse curl” operators are completely new to me and I have never seen them in the literature anywhere other than in this problem in Arfken. I think this equation can be derived from equation (4) by appropriately manipulating the various dot and cross products. I have not yet attempted to derive this expression with index notation.
Now, many questions come to mind. I have arranged the first and second terms on the righthand sides of equations (4) and (5) to correspond to the first term on the righthand sides of equations (1), (2), and (3). Similarly, the third and fourth terms on the righthand sides of (4) and (5) correspond to the second term on the righthand sides of equations (1), (2), and (3). By comparison, this must mean that somehow from the gradient (Jacobian) of a vector come both a dot product and a triple cross product. How can this be?
How can the gradient (Jacobian) of a vector be decomposed into a dot product and a triple cross product?
I think I can partly see where the dot product comes from, and it’s basically the notion of a directional derivative. The triple cross products are a complete mystery to me. Is there a geometrical reason for their presence? Would expressing all this in the language of differential forms help? Equations (4) and (5) also seem to imply that the triple cross products are associative, which they generally are not. I think I can justify the steps to get from (4) to (5), so if anyone can help me understand geometrically how the Jacobian can be decomposed into a dot product (directional derivative) and the cross product of a vector and the curl of the other vector, I’d be very grateful.
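One possible route to the decomposition, offered as a sketch in index notation with summation over repeated indices (the fields $\mathbf{A}$ and $\mathbf{B}$ and the convention that $\nabla_{A}$ acts only on $\mathbf{A}$ are assumptions made here): add and subtract the directional derivative, and the leftover antisymmetric combination of derivatives is exactly what the cross product with the curl produces.

```latex
\begin{align*}
[\nabla_{A}(\mathbf{A}\cdot\mathbf{B})]_i
  &= B_j\,\partial_i A_j \\
  &= B_j\,\partial_j A_i + B_j\,(\partial_i A_j - \partial_j A_i) \\
  &= [(\mathbf{B}\cdot\nabla)\mathbf{A}]_i + [\mathbf{B}\times(\nabla\times\mathbf{A})]_i
\end{align*}
```

In this reading, the directional derivative comes from the Jacobian itself and the cross product term comes from the difference between the Jacobian’s transpose and the Jacobian. That difference is antisymmetric, so it has only three independent components, and those are the components of the curl.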