Vector Formalism in Introductory Physics I: Taking the Magnitude of Both Sides

TL;DR: I don’t like the way vectors are presented in calculus-based and algebra-based introductory physics. I think a more formal approach is warranted. This post addresses the problem of taking the magnitude of both sides of simple vector equations. If you want the details, read on.

This is the first post in a new series in which I will present a more formal approach to vectors in introductory physics. It will not have the same flavor as my recently begun series on angular quantities; that series serves a rather different purpose. However, there may be some slight overlap between the two series if it is appropriate.

I am also using this series to commit to paper (screen, really) some thoughts and ideas I have had for some time with the hope of turning them into papers for submission to The Physics Teacher. I’d appreciate any feedback on how useful this may be to the community.

To begin with, I want to address issues in the algebraic manipulation of vectors with an emphasis on coordinate-free methods. I feel that in current introductory physics courses, vectors are not exploited to their full potential. Instead of learning coordinate-free methods, students almost always learn to manipulate coordinate representations of vectors in orthonormal, Cartesian coordinate systems. I think that is unfortunate because it doesn’t always convey the physics in a powerful way. Physics is independent of one’s choice of coordinate system, and I think students should learn to manipulate vectors in a similar way.

Let’s begin by looking at a presumably simple vector equation:

a\mathbf{A} = -5\mathbf{A}

The object is to solve for a, given a nonzero \mathbf{A}. Don’t be fooled; it’s more difficult than it looks. In my experience, students invariably try to divide both sides by \mathbf{A}, but of course this won’t work because vector division isn’t defined in Gibbsian vector analysis. Don’t let students get away with this if they try it! The reasons for not defining vector division will be the topic of a future post.

(UPDATE: Mathematics colleague Drew Lewis asked about solving this equation by combining like terms and factoring, leading to (a+5)\mathbf{A} = \mathbf{0} and then, for nonzero \mathbf{A}, to a = -5. This is a perfectly valid way of solving the equation and it completely avoids the “division by a vector” issue. I want to call attention to that issue though, because when I show students this problem, they always (at least in my experience) try to solve it by dividing. Also, in future posts I will demonstrate how to solve other kinds of vector equations that must be solved by manipulating both dot products and cross products, each of which carries different geometric information, and I want to get students used to seeing and manipulating dot products. Thanks for asking, Drew!)

One could simply say to “take the absolute value of both sides” like this:

\left| a\mathbf{A} \right| = \left| -5\mathbf{A}\right|

but this is problematic for two reasons. First, it destroys the sign on the right-hand side. Second, a vector doesn’t have an absolute value because it’s not a number. Vectors have magnitude, which is an entirely different concept from absolute value and warrants separate consideration and a separate symbol.

We need to do something to each side to turn it into a scalar because we can divide by a scalar. Let’s try taking the dot product of both sides with the same vector, \mathbf{A}, and proceed as follows:

\begin{aligned} a\mathbf{A}\bullet\mathbf{A} &= -5\mathbf{A}\bullet\mathbf{A} && \text{dot both sides with the same vector} \\ a\lVert\mathbf{A}\rVert^2 &= -5\lVert\mathbf{A}\rVert^2 && \text{dot products become scalars} \\ \therefore a &= -5 && \text{solve} \end{aligned}

This is a better way to proceed. It’s formal, and indeed even pedantic, but I dare say it’s the best way to go if one wants to include dot products. Of course in this simple example, one can see the solution by inspection, but my goals here are to get students to stop thinking about the concept of dividing by a vector and to manipulate vectors algebraically without referring to a coordinate system.
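The same steps can be checked numerically. What follows is a minimal sketch, not a prescribed method: the function names `dot` and `solve_for_scalar` and the sample vector are hypothetical choices made for illustration. A computer needs components to store a vector, but they appear only inside `dot`; the solving step itself uses nothing but dot products and division by a scalar.

```python
def dot(u, v):
    """Dot product of two vectors stored as equal-length tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))


def solve_for_scalar(A, rhs):
    """Solve a*A = rhs for the scalar a by dotting both sides with A.

    Assumes A is nonzero and that rhs really is a scalar multiple of A.
    """
    A_dot_A = dot(A, A)  # this is ||A||^2, a scalar
    if A_dot_A == 0:
        raise ValueError("A must be nonzero")
    return dot(rhs, A) / A_dot_A  # dividing by a scalar is allowed


A = (3.0, -4.0, 12.0)             # hypothetical sample vector, ||A|| = 13
rhs = tuple(-5.0 * c for c in A)  # the right-hand side, -5A
a = solve_for_scalar(A, rhs)
print(a)  # prints -5.0
```

Nowhere does the code divide by `A` itself; the only division is by the scalar `dot(A, A)`.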

Let’s now look at another example with a different vector on each side of the equation.

a\mathbf{A} = -5\mathbf{B}

Once again the object is to solve for a, given \mathbf{A} and \mathbf{B}. Note that solving for either \mathbf{A} or \mathbf{B} is trivial so I won’t address it; it’s simply a matter of scalar division. Solving for a is more challenging because we must again suppress the urge to divide by a vector. I will show two possible solutions. Make sure you understand what’s being done in each step.

\begin{aligned} a\mathbf{A} &= -5\mathbf{B} && \text{given equation} \\ a\mathbf{A}\bullet\mathbf{A} &= -5\mathbf{B}\bullet\mathbf{A} && \text{dot both sides with the same vector} \\ a\mathbf{A}\bullet\mathbf{A} &= -5\mathbf{B}\bullet\left(\dfrac{-5}{\hphantom{-}a}\mathbf{B}\right) && \text{substitute from original equality} \\ a\lVert\mathbf{A}\rVert^2 &= \dfrac{25}{a}\lVert\mathbf{B}\rVert^2 && \text{dot products become scalars} \\ a^2 &= 25\dfrac{\lVert\mathbf{B}\rVert^2}{\lVert\mathbf{A}\rVert^2} && \text{rearrange} \\ \therefore a &= \pm 5\dfrac{\lVert\mathbf{B}\rVert}{\lVert\mathbf{A}\rVert} && \text{solve} \end{aligned}

We get two solutions, and they are geometrically opposite each other; that’s the physical implication of the signs. (I suppose we could argue over whether or not to just take the principal square root, but I don’t think we should do that here because it would throw away potentially useful geometric information.) We can find a cleaner solution that accounts for this. Consider the following solution which exploits the concepts of “factoring” a vector into a magnitude and a direction and the properties of the dot product.
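A quick numerical check makes the role of the two signs concrete. This is a minimal sketch under stated assumptions: the vectors `A` and `B` are hypothetical samples chosen to be antiparallel, and the helper names are mine. The squaring step produces both candidates, and substituting each back into the original equation shows which one the given geometry selects.

```python
import math


def dot(u, v):
    """Dot product of two vectors stored as equal-length tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(u):
    """Magnitude of a vector, from its dot product with itself."""
    return math.sqrt(dot(u, u))


def scale(s, u):
    """Scalar multiple s*u."""
    return tuple(s * c for c in u)


A = (1.0, 2.0, 2.0)     # hypothetical sample, ||A|| = 3
B = (-2.0, -4.0, -4.0)  # hypothetical sample, ||B|| = 6, antiparallel to A

# The two candidates a = +5||B||/||A|| and a = -5||B||/||A||.
candidates = [5.0 * norm(B) / norm(A), -5.0 * norm(B) / norm(A)]


def satisfies_original(a):
    """True if a*A equals -5*B component by component."""
    lhs, rhs = scale(a, A), scale(-5.0, B)
    return all(math.isclose(l, r, abs_tol=1e-12) for l, r in zip(lhs, rhs))


valid = [a for a in candidates if satisfies_original(a)]
print(valid)  # prints [10.0]
```

For this particular pair only the positive root survives. Replacing `B` with a vector parallel to `A` would select the negative root instead.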

\begin{aligned} a\mathbf{A} &= -5\mathbf{B} && \text{given equation} \\ a\mathbf{A}\bullet\mathbf{A} &= -5\mathbf{B}\bullet\mathbf{A} && \text{dot both sides with \textbf{A}} \\ a\lVert\mathbf{A}\rVert^2 &= -5\lVert\mathbf{B}\rVert\widehat{\mathbf{B}}\bullet\lVert\mathbf{A}\rVert\widehat{\mathbf{A}} && \text{factor each vector into magnitude and direction}  \\ a\lVert\mathbf{A}\rVert^2 &= -5\lVert\mathbf{B}\rVert\lVert\mathbf{A}\rVert\,\widehat{\mathbf{B}}\bullet\widehat{\mathbf{A}} && \text{push magnitude through the dot product} \\ \therefore a &= -5\dfrac{\lVert\mathbf{B}\rVert}{\lVert\mathbf{A}\rVert}\,\widehat{\mathbf{B}}\bullet\widehat{\mathbf{A}} && \text{solve} \end{aligned}

See the geometry? It’s in the factor \widehat{\mathbf B}\bullet\widehat{\mathbf A}. If \mathbf{A} and \mathbf{B} are parallel, this factor is +1 and if they are antiparallel it is -1. Convince yourself that those are the only two options in this case. (HINT: Show that each vector’s direction is a scalar multiple of the other vector’s direction.) This solution won’t work if the two vectors aren’t collinear. If we’re solving for a then both vectors are assumed given and we know their relative geometry.
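Here is a minimal sketch of this second solution in code, assuming both vectors are nonzero. The function name `solve_a`, the tolerance, and the sample vectors are hypothetical choices for illustration. The function refuses to return a value when the direction factor is not +1 or -1, since in that case the original equation has no solution.

```python
import math


def dot(u, v):
    """Dot product of two vectors stored as equal-length tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(u):
    """Magnitude of a vector."""
    return math.sqrt(dot(u, u))


def unit(u):
    """Direction of a nonzero vector (the vector divided by its magnitude)."""
    n = norm(u)
    return tuple(c / n for c in u)


def solve_a(A, B, tol=1e-9):
    """Solve a*A = -5*B for a using a = -5 (||B||/||A||) B_hat . A_hat."""
    direction_factor = dot(unit(B), unit(A))
    if not math.isclose(abs(direction_factor), 1.0, abs_tol=tol):
        raise ValueError("A and B are not collinear; no scalar a exists")
    return -5.0 * norm(B) / norm(A) * direction_factor


A = (1.0, 2.0, 2.0)
a_parallel = solve_a(A, (2.0, 4.0, 4.0))         # B parallel to A
a_antiparallel = solve_a(A, (-2.0, -4.0, -4.0))  # B antiparallel to A
print(a_parallel, a_antiparallel)  # approximately -10 and +10
```

The sign of the answer comes entirely from the direction factor, so no square root and no choice of sign is needed.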

Let’s look at another example from first semester mechanics, Newton’s law of gravitation,

\mathbf{F} = G\dfrac{M_1 M_2}{\lVert\mathbf{r}_{12}\rVert^2}\left( -\widehat{\mathbf r}_{12}\right)

where \mathbf{r}_{12} = \mathbf{r}_1 - \mathbf{r}_2 and should be read as “the position of 1 relative to 2.” Let’s “take the magnitude of both sides” by first writing \mathbf{F} in terms of its magnitude and direction, dotting each side with a vector, and dividing both sides by the resulting common factor.

\begin{aligned} \lVert\mathbf{F}\rVert\left(-\widehat{\mathbf{r}}_{12}\right) &= G\dfrac{M_1 M_2}{\lVert\mathbf{r}_{12}\rVert^2}\left( -\widehat{\mathbf{r}}_{12}\right) && \text{given equation} \\ \lVert\mathbf{F}\rVert\left(-\widehat{\mathbf{r}}_{12}\bullet\widehat{\mathbf{r}}_{12}\right) &= G\dfrac{M_1 M_2}{\lVert\mathbf{r}_{12}\rVert^2}\left(-\widehat{\mathbf{r}}_{12}\bullet\widehat{\mathbf{r}}_{12}\right) && \text{dot both sides with the same vector} \\ \lVert\mathbf{F}\rVert\left(-\lVert\widehat{\mathbf{r}}_{12}\rVert^2\right) &= G\dfrac{M_1 M_2}{\lVert\mathbf{r}_{12}\rVert^2}\left(-\lVert\widehat{\mathbf{r}}_{12}\rVert^2 \right) && \text{dot products become scalars} \\ \therefore \lVert\mathbf{F}\rVert &= G\dfrac{M_1 M_2}{\lVert\mathbf{r}_{12}\rVert^2} && \text{divide both sides by the same scalar} \end{aligned}

Okay, this isn’t an Earth-shattering result because we knew in advance it has to be the answer, but my point is how we formally went about getting this answer. More specifically, the point is how we went about it without dividing by a vector.
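The gravitational example can be checked the same way. This is a minimal sketch with made-up masses and positions, assuming \mathbf{F} is the force on particle 1 due to particle 2; the function and variable names are hypothetical. The magnitude is recovered by dotting the force with the unit vector -\widehat{\mathbf r}_{12} and dividing by a scalar.

```python
import math

G = 6.674e-11  # gravitational constant in N m^2 / kg^2


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def scale(s, u):
    return tuple(s * c for c in u)


def sub(u, v):
    return tuple(ui - vi for ui, vi in zip(u, v))


def gravitational_force(m1, m2, r1, r2):
    """Return (F, r12_hat), with F directed along -r12_hat."""
    r12 = sub(r1, r2)  # position of 1 relative to 2
    distance = norm(r12)
    r12_hat = scale(1.0 / distance, r12)
    strength = G * m1 * m2 / distance**2
    return scale(-strength, r12_hat), r12_hat


# Hypothetical values: masses in kg, positions in m (separation is 5 m).
m1, m2 = 5.0, 10.0
r1, r2 = (3.0, 0.0, 4.0), (0.0, 0.0, 0.0)

F, r12_hat = gravitational_force(m1, m2, r1, r2)
minus_r_hat = scale(-1.0, r12_hat)

# Dot both sides with -r12_hat, then divide by the scalar ||r12_hat||^2.
magnitude = dot(F, minus_r_hat) / dot(minus_r_hat, minus_r_hat)
expected = G * m1 * m2 / norm(sub(r1, r2)) ** 2
print(magnitude, expected)
```

The two printed numbers should agree to within floating-point rounding.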

Let’s now consider a final example from introductory electromagnetic theory, and this was the example that got me thinking about this entire process of “taking the magnitude of both sides” about a year ago. It’s the expression for the electric force experienced by a charged particle in the presence of an electric field (obviously not its own electric field).

\mathbf{F} = q\mathbf{E}

That one vector is a scalar multiple of another means the two must be collinear, so they must either be parallel or antiparallel. An issue here is that q is a signed quantity. Again, we have a choice of which vector to dot both sides with; we could use \mathbf{F} or we could use \mathbf{E}. If we use the former, we will eventually need to take the square root of the square of a signed quantity, which may lead us astray. Therefore, I suggest using the latter.

\begin{aligned} \mathbf{F} &= q\mathbf{E} && \text{given equation} \\ \mathbf{F}\bullet\mathbf{E} &= q\mathbf{E}\bullet\mathbf{E} && \text{dot both sides with the same vector} \\ \lVert\mathbf{F}\rVert\widehat{\mathbf{F}}\bullet\lVert\mathbf{E}\rVert\widehat{\mathbf{E}} &= q\lVert\mathbf{E}\rVert^2 && \text{factor LHS, simplify RHS} \\ \lVert\mathbf{F}\rVert\lVert\mathbf{E}\rVert\,\widehat{\mathbf{F}}\bullet\widehat{\mathbf{E}} &= q\lVert\mathbf{E}\rVert^2 && \text{push the magnitude through the dot product} \\ \therefore \lVert\mathbf{F}\rVert &= \dfrac{q}{\widehat{\mathbf{F}}\bullet\widehat{\mathbf{E}}}\lVert\mathbf{E}\rVert && \text{solve} \end{aligned}

This may look overly complicated, but it’s quite logical, and it reflects geometry. If q is negative, then the dot product will also be negative and the entire quantity will be positive. If q is positive, then the dot product will also be positive and again the entire quantity will be positive. Geometry rescues us again, as it should in physics. We can also rearrange this expression to solve for either q or \lVert\mathbf{E}\rVert with the sign of q properly accounted for by the dot product. By the way, \widehat{\mathbf{F}} and \widehat{\mathbf{E}} can’t be orthogonal because then their dot product would vanish and the above expression would blow up. Geometry and symmetry, particularly the latter, preclude this from happening.
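A short numerical check shows the sign bookkeeping at work. This is a minimal sketch with a hypothetical field and two hypothetical charges of opposite sign; the function name `force_magnitude` is mine. In both cases the expression returns a positive magnitude that matches \lVert\mathbf{F}\rVert computed directly.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def unit(u):
    n = norm(u)
    return tuple(c / n for c in u)


def scale(s, u):
    return tuple(s * c for c in u)


def force_magnitude(q, F, E):
    """||F|| = q ||E|| / (F_hat . E_hat), valid when F = qE with q nonzero."""
    direction_factor = dot(unit(F), unit(E))  # +1 or -1 for collinear vectors
    return q / direction_factor * norm(E)


E = (0.0, 3.0, 4.0)  # hypothetical field, ||E|| = 5
magnitudes = {}
for q in (2.0, -2.0):  # hypothetical charges
    F = scale(q, E)
    magnitudes[q] = force_magnitude(q, F, E)
    print(q, magnitudes[q], norm(F))
```

Both rows should report a magnitude of about 10, whereas the naive product `q * norm(E)` would come out negative for the negative charge.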

In summary, “taking the magnitude of both sides” of a simple vector equation presents some challenges that are mitigated by exploiting geometry, something that is neglected in introductory calculus-based and algebra-based physics courses. I suggest we try to overcome this by showing students how to formally manipulate such equations. One advantage of doing this is students will see how vector algebra works in more detail than usual. Another advantage is that students will learn to exploit geometry in the absence of coordinate systems, which is one of the original purposes of using vectors after all.

Do you think this would make a good paper for The Physics Teacher? Feedback welcome!


Conceptual Understanding in Introductory Physics XXVIII

You may not agree that the topic(s) of this question belong in an introductory calculus-based physics course, but I’m going to pretend they do for the duration of this post. Gradient, divergence, and curl are broached in Matter & Interactions within the context of electromagnetic fields. Actually, gradient appears in the mechanics portion of the course.

One problem with these three concepts, especially divergence and curl, is the distinction between their actual definitions and how they are calculated. The former are rarely, if ever, seen at the introductory level and usually first appear in upper level courses. However, some authors [cite examples here] replace the physical definitions with the mathematical symbols invented by Heaviside and Gibbs to represent the calculation of these quantities. In other words, the divergence of \mathbf{A} is frequently defined as \nabla\cdot\mathbf{A} and the curl of \mathbf{A} is frequently defined as \nabla\times\mathbf{A}. These should be treated as nothing more than symbols representing their respective physical quantities and should not be taken as equations for calculation. If one insists on keeping this notation, then the dot and cross should at least be kept with the nabla symbol so that \nabla\cdot represents divergence and \nabla\times represents curl. Either way, these are operators that operate on vectors and their symbols should reflect that concept and should be interpreted as such and not as a recipe for calculation. This book by Tai was extremely helpful in getting this point across to me.

Gradient has its own unique problem in that some sources claim that one can only take the gradient of a scalar, which is patently false. One can indeed take the gradient of, for example, a gradient, but the object one gets back is not a vector. If we adopt a unified approach to vector algebra and vector calculus we find that there are patterns associating the operand and the result when using these vector operators. For example, operating on a vector with \nabla doesn’t produce a vector; it produces a second rank tensor. This is one reason I would love to find a way to bring this approach into the introductory course. So many things would be unified.

But now, on to the questions I want to ask here:

(a) Write a conceptual definition of gradient in words.

(b) Write a mathematical definition of gradient that does not depend on any particular coordinate system. You must not use the nabla symbol.

(c) Write a conceptual definition of divergence in words.

(d) Write a mathematical definition of divergence that does not depend on any particular coordinate system. You must not use the nabla symbol.

(e) Write a conceptual definition of curl in words.

(f) Write a mathematical definition of curl that does not depend on any particular coordinate system. You must not use the nabla symbol.

Go!

(Note: I need to revisit this post in the future to make sure the notion of applying gradient to a vector quantity can be handled in the coordinate free way I have in mind. My intuition is that it can be, but I need to work out some details.)
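For readers who want to test a candidate answer to (d) numerically, here is a minimal sketch. It assumes one common reading of the definition, namely net outward flux through a small closed surface divided by the enclosed volume, and it is offered as an illustration rather than as the intended answer. The field, the point, the cube size, and the function names are all hypothetical. The result is compared with the familiar partial-derivative recipe, which for this field gives 2x + z + 3.

```python
def field(x, y, z):
    """Hypothetical vector field A = (x^2, y z, 3 z)."""
    return (x * x, y * z, 3.0 * z)


def flux_per_volume(vector_field, center, h, n=4):
    """Net outward flux through a cube of side h, divided by its volume.

    Each face is split into n*n patches and the field is sampled at the
    middle of every patch (a midpoint-rule surface integral).
    """
    half = h / 2.0
    patch_area = (h / n) ** 2
    offsets = [-half + (k + 0.5) * h / n for k in range(n)]
    total = 0.0
    for axis in range(3):          # faces perpendicular to x, y, z
        for sign in (1.0, -1.0):   # outward normal is +axis or -axis
            for u in offsets:
                for v in offsets:
                    point = list(center)
                    point[axis] += sign * half
                    point[(axis + 1) % 3] += u
                    point[(axis + 2) % 3] += v
                    total += sign * vector_field(*point)[axis] * patch_area
    return total / h**3


point = (1.0, 2.0, 3.0)
from_definition = flux_per_volume(field, point, h=1e-2)
from_recipe = 2.0 * point[0] + point[2] + 3.0  # partial-derivative formula
print(from_definition, from_recipe)
```

The two numbers should agree closely. The first comes from the geometric idea of flux leaving a small region, and the second from a calculation recipe tied to Cartesian coordinates.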


Angular Quantities I

This is the first in a series of posts in which I want to share some hopefully interesting things about mathematical descriptions of rotational motion. This series was inspired by a talk given at the 2015 winter AAPT meeting in San Diego. The author claimed to have found a way to represent angular displacement as a vector (true, such an expression exists and is not widely used) and that angular displacements commute (false, in general they do not, except when infinitesimal). The same author presented an updated poster on this topic at the recent winter meeting in Atlanta. In researching the arguments presented in these two talks, following up on the references therein, and in searching the undergraduate and graduate physics and mathematics teaching literature on descriptions of angular quantities, I stumbled onto some of the most interesting topics I’ve ever encountered. As you may have already guessed, I want to find ways of bringing these gems of understanding into the introductory courses so students won’t be so mystified when they encounter them in upper level courses. By the way, the papers from these talks aren’t available online; I only have paper copies and I do not have the author’s permission to distribute them.
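The claim about commutation is easy to check numerically. This is a minimal sketch, assuming rotations are represented by 3×3 rotation matrices about the x and y axes; that representation and all the function names are my choices for illustration and say nothing about the method used in the talks. The code compares the two orders of composition for a quarter turn and for a very small angle.

```python
import math


def rot_x(t):
    """Rotation matrix for angle t (radians) about the x axis."""
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(t):
    """Rotation matrix for angle t (radians) about the y axis."""
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(P, Q):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def order_mismatch(angle):
    """Largest entry of Rx Ry - Ry Rx for the same angle about each axis."""
    Rx, Ry = rot_x(angle), rot_y(angle)
    XY, YX = matmul(Rx, Ry), matmul(Ry, Rx)
    return max(abs(XY[i][j] - YX[i][j]) for i in range(3) for j in range(3))


finite = order_mismatch(math.pi / 2)  # quarter turns
tiny = order_mismatch(1e-4)           # nearly infinitesimal rotations
print(finite, tiny)
```

For quarter turns the mismatch is of order one, so the order of the rotations clearly matters. For the small angle the mismatch is of order the angle squared, which is why it disappears in the infinitesimal limit.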

I am sure most of this will be trivial for many readers, so apologies in advance. Even though I too studied out of Goldstein in grad school, it was not the case that all my existing conceptual mysteries were solved. As always, I tend to frame things from the point of view of that introductory physics student for whom we want to provide an unparalleled physics experience. I don’t want that student to ever say, “Well that was never pointed out to me in intro physics.” I want that student’s conceptual foundation to be better than mine was when I was that student.

In this initial post, I will list as many of the questions I can think of that arose as I researched this topic. I will not answer any of them in this post, but will attempt to do that in subsequent posts. I will put the questions into some preliminary order, but I can’t guarantee that order won’t change later. Some questions may change to more accurately reflect what I’m trying to explain.

  1. What does it mean to be a vector?
  2. What do vector dot products and vector cross products mean geometrically?
  3. What is the physical significance of the double cross product (aka triple cross product)?
  4. Is there a coordinate free expression for the total time derivative of a vector?
  5. Is there a coordinate free expression for the time derivative of a unit vector (a direction)?
  6. Can angular velocity be described as a vector?
  7. Can angular displacement be described as a vector?
  8. If work is calculated as the dot product of two vectors, then when calculating rotational work how can angular displacement not be a vector?
  9. If angular velocity is a vector, shouldn’t its integral also be a vector and not a scalar?
  10. Why does translational displacement commute?
  11. How, if at all, are translation and rotation (revolution?) related?
  12. Why do infinitesimal angular displacements commute?
  13. Why do finite angular displacements not commute?
  14. What is the distinction between rotation and orientation?
  15. Is angular velocity the derivative of a rotation?
  16. So then what is angular velocity the derivative of anyway?
  17. Can angular velocity be integrated to get angular displacement?
  18. Can these ideas be brought into the introductory calculus-based or algebra-based physics courses?

I think that’s all, at least for now. I don’t claim this list to be comprehensive. The number of questions isn’t significant either. Let’s see where this goes.