Vector Formalism in Introductory Physics I: Taking the Magnitude of Both Sides

TL;DR: I don’t like the way vectors are presented in calculus-based and algebra-based introductory physics. I think a more formal approach is warranted. This post addresses the problem of taking the magnitude of both sides of simple vector equations. If you want the details, read on.

This is the first post in a new series in which I will present a more formal approach to vectors in introductory physics. It will not have the same flavor as my recently begun series on angular quantities; that series serves a rather different purpose. However, there may be some slight overlap between the two series if it is appropriate.

I am also using this series to commit to paper (screen, really) some thoughts and ideas I have had for some time with the hope of turning them into papers for submission to The Physics Teacher. I’d appreciate any feedback on how useful this may be to the community.

To begin with, I want to address issues in the algebraic manipulation of vectors with an emphasis on coordinate-free methods. I feel that in current introductory physics courses, vectors are not exploited to their full potential. Instead of learning coordinate-free methods, students almost always learn to manipulate coordinate representations of vectors in orthonormal, Cartesian coordinate systems and I think that is unfortunate because it doesn’t always convey the physics in a powerful way. Physics is independent of one’s choice of coordinate system, and I think students should learn to manipulate vectors in a similar way. 

Let’s begin by looking at a presumably simple vector equation:

a\mathbf{A} = -5\mathbf{A}

The object is to solve for a given \mathbf{A}. Don’t be fooled; it’s more difficult than it looks. In my experience, students invariably try to divide both sides by \mathbf{A} but of course this won’t work because vector division isn’t defined in Gibbsian vector analysis. Don’t let students get away with this if they try it! The reasons for not defining vector division will be the topic of a future post.

(UPDATE: Mathematics colleague Drew Lewis asked about solving this equation by combining like terms and factoring, leading to (a+5)=0 and then to a = -5. This is a perfectly valid way of solving the equation and it completely avoids the “division by a vector” issue. I want to call attention to that issue though, because when I show students this problem, they always (at least in my experience) try to solve it by dividing. Also, in future posts I will demonstrate how to solve other kinds of vector equations that must be solved by manipulating both dot products and cross products, each of which carries different geometric information, and I want to get students used to seeing and manipulating dot products. Thanks for asking Drew!)

One could simply say to “take the absolute value of both sides” like this:

\left| a\mathbf{A} \right| = \left| -5\mathbf{A}\right|

but this is problematic for two reasons. First, it destroys the sign on the right-hand side. Second, a vector doesn’t have an absolute value because it’s not a number. Vectors have magnitude, not absolute value, which is an entirely different concept from that of absolute value and warrants separate consideration and a separate symbol.

We need to do something to each side to turn it into a scalar because we can divide by a scalar. Let’s try taking the dot product of both sides with the same vector, \mathbf{A}, and proceed as follows:

\begin{aligned} a\mathbf{A}\bullet\mathbf{A} &= -5\mathbf{A}\bullet\mathbf{A} && \text{dot both sides with the same vector} \\ a\lVert\mathbf{A}\rVert^2 &= -5\lVert\mathbf{A}\rVert^2 && \text{dot products become scalars} \\ \therefore a &= -5 && \text{solve} \end{aligned}

This is a better way to proceed. It’s formal, and indeed even pedantic, but I dare say it’s the best way to go if one wants to include dot products. Of course in this simple example, one can see the solution by inspection, but my goals here are to get students to stop thinking about the concept of dividing by a vector and to manipulate vectors algebraically without referring to a coordinate system.

Let’s now look at another example with a different vector on each side of the equation.

 a\mathbf{A} = -5\mathbf{B}

Once again the object is to solve for a given \mathbf{A} and \mathbf{B}. Note that solving for either \mathbf{A} or \mathbf{B} is obviously trivial so I won’t address it; it’s simply a matter of scalar division. Solving for a is more challenging because we must again suppress the urge to divide by a vector. I will show two possible solutions. Make sure you understand what’s being done in each step.

\begin{aligned} a\mathbf{A} &= -5\mathbf{B} && \text{given equation} \\ a\mathbf{A}\bullet\mathbf{A} &= -5\mathbf{B}\bullet\mathbf{A} && \text{dot both sides with the same vector} \\ a\mathbf{A}\bullet\mathbf{A} &= -5\mathbf{B}\bullet\left(\dfrac{-5}{\hphantom{-}a}\mathbf{B}\right) && \text{substitute from original equality} \\ a\lVert\mathbf{A}\rVert^2 &= \dfrac{25}{a}\lVert\mathbf{B}\rVert^2 && \text{dot products become scalars} \\ a^2 &= 25\dfrac{\lVert\mathbf{B}\rVert^2}{\lVert\mathbf{A}\rVert^2} && \text{rearrange} \\ \therefore a &= \pm 5\dfrac{\lVert\mathbf{B}\rVert}{\lVert\mathbf{A}\rVert} && \text{solve} \end{aligned}

We get two solutions, and they are geometrically opposite each other; that’s the physical implication of the signs. (I suppose we could argue over whether or not to just take the principal square root, but I don’t think we should do that here because it would throw away potentially useful geometric information.) We can find a cleaner solution that accounts for this. Consider the following solution which exploits the concepts of “factoring” a vector into a magnitude and a direction and the properties of the dot product.

\begin{aligned} a\mathbf{A} &= -5\mathbf{B} && \text{given equation} \\ a\mathbf{A}\bullet\mathbf{A} &= -5\mathbf{B}\bullet\mathbf{A} && \text{dot both sides with \textbf{A}} \\ a\lVert\mathbf{A}\rVert^2 &= -5\lVert\mathbf{B}\rVert\widehat{\mathbf{B}}\bullet\lVert\mathbf{A}\rVert\widehat{\mathbf{A}} && \text{factor each vector into magnitude and direction}  \\ a\lVert\mathbf{A}\rVert^2 &= -5\lVert\mathbf{B}\rVert\lVert\mathbf{A}\rVert\,\widehat{\mathbf{B}}\bullet\widehat{\mathbf{A}} && \text{push magnitude through the dot product} \\ \therefore a &= -5\dfrac{\lVert\mathbf{B}\rVert}{\lVert\mathbf{A}\rVert}\,\widehat{\mathbf{B}}\bullet\widehat{\mathbf{A}} && \text{solve} \end{aligned}

See the geometry? It’s in the factor \widehat{\mathbf B}\bullet\widehat{\mathbf A}. If \mathbf{A} and \mathbf{B} are parallel, this factor is +1 and if they are antiparallel it is -1. Convince yourself that those are the only two options in this case. (HINT: Show that each vector’s direction is a scalar multiple of the other vector’s direction.) This solution won’t work if the two vectors aren’t collinear. If we’re solving for a then both vectors are assumed given and we know their relative geometry.
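The result above can be checked numerically. What follows is only a minimal sketch in plain Python, not anything from the original derivation: the helper names `dot`, `norm`, and `solve_scalar`, the tolerance, and the sample vectors are all assumptions made for illustration. It computes a from the magnitudes and the factor \widehat{\mathbf B}\bullet\widehat{\mathbf A}, and refuses to answer when the two vectors are not collinear.

```python
import math

def dot(u, v):
    """Dot product of two equal-length tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    """Magnitude of a vector."""
    return math.sqrt(dot(u, u))

def solve_scalar(A, B, c=-5.0, tol=1e-9):
    """Solve a*A = c*B for the scalar a by dotting both sides with A.

    Hypothetical helper: raises ValueError if A is zero or if A and B
    are not collinear, since then no scalar a satisfies the equation.
    """
    nA, nB = norm(A), norm(B)
    if nA == 0.0:
        raise ValueError("A must be nonzero")
    if nB == 0.0:
        return 0.0  # a*A = 0 with A nonzero forces a = 0
    cos_angle = dot(A, B) / (nA * nB)  # this is B-hat dot A-hat
    if abs(abs(cos_angle) - 1.0) > tol:
        raise ValueError("A and B are not collinear; no scalar a exists")
    return c * (nB / nA) * cos_angle

A = (1.0, 2.0, 2.0)                                      # magnitude 3
a_parallel = solve_scalar(A, (2.0, 4.0, 4.0))            # B parallel to A
a_antiparallel = solve_scalar(A, (-2.0, -4.0, -4.0))     # B antiparallel to A
```

With these sample vectors the magnitude ratio is 2, so the parallel case gives a = -10 and the antiparallel case gives a = +10; the sign comes entirely from the direction factor.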

Let’s look at another example from first semester mechanics, Newton’s law of gravitation,

\mathbf{F} = G\dfrac{M_1 M_2}{\lVert\mathbf{r}_{12}\rVert^2}\left( -\widehat{\mathbf r}_{12}\right)

where \mathbf{r}_{12} = \mathbf{r}_1 - \mathbf{r}_2 and should be read as “the position of 1 relative to 2.” Let’s “take the magnitude of both sides” by first writing \mathbf{F} in terms of its magnitude and direction, dotting each side with a vector, and dividing both sides by the resulting common factor.

\begin{aligned} \lVert\mathbf{F}\rVert\left(-\widehat{\mathbf{r}}_{12}\right) &= G\dfrac{M_1 M_2}{\lVert\mathbf{r}_{12}\rVert^2}\left( -\widehat{\mathbf{r}}_{12}\right) && \text{given equation} \\ \lVert\mathbf{F}\rVert\left(-\widehat{\mathbf{r}}_{12}\bullet\widehat{\mathbf{r}}_{12}\right) &= G\dfrac{M_1 M_2}{\lVert\mathbf{r}_{12}\rVert^2}\left(-\widehat{\mathbf{r}}_{12}\bullet\widehat{\mathbf{r}}_{12}\right) && \text{dot both sides with the same vector} \\ \lVert\mathbf{F}\rVert\left(-\lVert\widehat{\mathbf{r}}_{12}\rVert^2\right) &= G\dfrac{M_1 M_2}{\lVert\mathbf{r}_{12}\rVert^2}\left(-\lVert\widehat{\mathbf{r}}_{12}\rVert^2 \right) && \text{dot products become scalars} \\ \therefore \lVert\mathbf{F}\rVert &= G\dfrac{M_1 M_2}{\lVert\mathbf{r}_{12}\rVert^2} && \text{divide both sides by the same scalar} \end{aligned}

Okay, this isn’t an Earth-shattering result because we knew in advance it has to be the answer, but my point is how we formally went about getting this answer. More specifically, the point is how we went about it without dividing by a vector.

Let’s now consider a final example from introductory electromagnetic theory, and this was the example that got me thinking about this entire process of “taking the magnitude of both sides” about a year ago. It’s the expression for the electric force experienced by a charged particle in the presence of an electric field (obviously not its own electric field).

\mathbf{F} = q\mathbf{E}

That one vector is a scalar multiple of another means the two must be collinear, so they must either be parallel or antiparallel. An issue here is that q is a signed quantity. Again, we have a choice of which vector to dot both sides with; we could use \mathbf{F} or we could use \mathbf{E}. If we use the former, we will eventually need to take the square root of the square of a signed quantity, which may lead us astray. Therefore, I suggest using the latter.

\begin{aligned} \mathbf{F} &= q\mathbf{E} && \text{given equation} \\ \mathbf{F}\bullet\mathbf{E} &= q\mathbf{E}\bullet\mathbf{E} && \text{dot both sides with the same vector} \\ \lVert\mathbf{F}\rVert\widehat{\mathbf{F}}\bullet\lVert\mathbf{E}\rVert\widehat{\mathbf{E}} &= q\lVert\mathbf{E}\rVert^2 && \text{factor LHS, simplify RHS} \\ \lVert\mathbf{F}\rVert\lVert\mathbf{E}\rVert\,\widehat{\mathbf{F}}\bullet\widehat{\mathbf{E}} &= q\lVert\mathbf{E}\rVert^2 && \text{push the magnitude through the dot product} \\ \therefore \lVert\mathbf{F}\rVert &= \dfrac{q}{\widehat{\mathbf{F}}\bullet\widehat{\mathbf{E}}}\lVert\mathbf{E}\rVert && \text{solve} \end{aligned}

This may look overly complicated, but it’s quite logical, and it reflects geometry. If q is negative, then the dot product will also be negative and the entire quantity will be positive. If q is positive, then the dot product will also be positive and again the entire quantity will be positive. Geometry rescues us again, as it should in physics. We can also rearrange this expression to solve for either q or \lVert\mathbf{E}\rVert with the sign of q properly accounted for by the dot product. By the way, \widehat{\mathbf{F}} and \widehat{\mathbf{E}} can’t be orthogonal because then their dot product would vanish and the above expression would blow up. Geometry and symmetry, particularly the latter, preclude this from happening.
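Here is a small numerical check of that sign argument, offered as a sketch under stated assumptions rather than a definitive implementation. The function name `force_magnitude`, the helpers, and the sample field are hypothetical, and the code assumes q and \mathbf{E} are both nonzero so that the directions exist.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def unit(u):
    """Direction of a nonzero vector."""
    n = norm(u)
    return tuple(ui / n for ui in u)

def force_magnitude(q, E):
    """Evaluate ||F|| = q ||E|| / (F-hat dot E-hat) for F = q E.

    Assumes q != 0 and E != 0 (illustrative helper, not from the post).
    """
    F = tuple(q * Ei for Ei in E)
    cos_FE = dot(unit(F), unit(E))  # +1 if q > 0, -1 if q < 0
    return q * norm(E) / cos_FE

E = (3.0, 0.0, 4.0)                    # magnitude 5, arbitrary sample field
mag_positive_q = force_magnitude(2.0, E)
mag_negative_q = force_magnitude(-2.0, E)
```

Both charges give the same positive magnitude, 10 in these units, because the sign of q and the sign of the direction factor always cancel.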

In summary, “taking the magnitude of both sides” of a simple vector equation presents some challenges that are mitigated by exploiting geometry, something that is neglected in introductory calculus-based and algebra-based physics courses. I suggest we try to overcome this by showing students how to formally manipulate such equations. One advantage of doing this is students will see how vector algebra works in more detail than usual. Another advantage is that students will learn to exploit geometry in the absence of coordinate systems, which is one of the original purposes of using vectors after all.

Do you think this would make a good paper for The Physics Teacher? Feedback welcome!

Angular Quantities I

This is the first in a series of posts in which I want to share some hopefully interesting things about mathematical descriptions of rotational motion. This series was inspired by a talk given at the 2015 winter AAPT meeting in San Diego. The author claimed to have found a way to represent angular displacement as a vector (true, such an expression exists and is not widely used) and that angular displacements commute (false, in general they do not except when infinitesimal). The same author presented an updated poster on this topic at the recent winter meeting in Atlanta. In researching the arguments presented in these two talks, following up on the references therein, and in searching the undergraduate and graduate physics and mathematics teaching literature on descriptions of angular quantities, I stumbled onto some of the most interesting topics I’ve ever encountered. As you may have already guessed, I want to find ways of bringing these gems of understanding into the introductory courses so students won’t be so mystified when they encounter them in upper-level courses. By the way, the papers from these talks aren’t available online; I only have paper copies and I do not have the author’s permission to distribute them.

I am sure most of this will be trivial for many readers, so apologies in advance. Even though I too studied out of Goldstein in grad school, it was not the case that all my existing conceptual mysteries were solved. As always, I tend to frame things from the point of view of that introductory physics student for whom we want to provide an unparalleled physics experience. I don’t want that student to ever say, “Well that was never pointed out to me in intro physics.” I want that student’s conceptual foundation to be better than mine was when I was that student.

In this initial post, I will list as many of the questions I can think of that arose as I researched this topic. I will not answer any of them in this post, but will attempt to do that in subsequent posts. I will put the questions into some preliminary order, but I can’t guarantee that order won’t change later. Some questions may change to more accurately reflect what I’m trying to explain.

  1. What does it mean to be a vector?
  2. What do vector dot products and vector cross products mean geometrically?
  3. What is the physical significance of the double cross product (aka triple cross product)?
  4. Is there a coordinate free expression for the total time derivative of a vector?
  5. Is there a coordinate free expression for the time derivative of a unit vector (a direction)?
  6. Can angular velocity be described as a vector?
  7. Can angular displacement be described as a vector?
  8. If work is calculated as the dot product of two vectors, then when calculating rotational work how can angular displacement not be a vector?
  9. If angular velocity is a vector, shouldn’t its integral also be a vector and not a scalar?
  10. Why does translational displacement commute?
  11. How, if at all, are translation and rotation (revolution?) related?
  12. Why do infinitesimal angular displacements commute?
  13. Why do finite angular displacements not commute?
  14. What is the distinction between rotation and orientation?
  15. Is angular velocity the derivative of a rotation?
  16. So then what is angular velocity the derivative of anyway?
  17. Can angular velocity be integrated to get angular displacement?
  18. Can these ideas be brought into the introductory calculus-based or algebra-based physics courses?

I think that’s all, at least for now. I don’t claim this list to be comprehensive. The number of questions isn’t significant either. Let’s see where this goes.


Matter & Interactions II, Week 5

This week was all about calculating electric fields for continuous charge distributions. This is usually students’ first exposure to what they think of as “calculus-based” physics because they are explicitly setting up and doing integrals. There’s lots going on behind the scenes though.

In calculus class, students are used to manipulating functions by taking their derivatives, indefinite integrals, and definite integrals. In physics, however, these ready-made functions don’t exist. When we write dQ, there is no function Q() for which we calculate a differential. The symbol dQ represents a small quantity of charge, a “chunk” as I usually call it. That’s it. There’s nothing more. Similarly, dm represents a small “chunk” of mass rather than the differential of a function m(). The progression usually begins with uniform linear charge distributions and progresses to angular (i.e. linear charge distributions bent into arcs of varying extents), then area, then volume charge distributions (Are “area” and “volume” adjectives?). One cool thing is how each type of distribution can be constructed from a previous one. You can make a cylinder of charge out of lines of charge. You can make a loop of charge out of a line of charge. You can make a plane of charge out of lines of charge. You can make a sphere of charge out of loops of charge. Beautiful! Lots of ways to approach setting up the integral that sweeps through the charge distribution to get the net field.

It’s interesting to ponder the effect of changing the coordinate origin. Consider a charged rod. If the rod’s left end is at the origin, the limits of integration are 0 and L (the rod’s length). If the rod’s center is at the origin, the limits of integration are -L/2 and +L/2. The integrand looks slightly different, but the resulting definite integral is the same in both cases! Trivial? No! It’s yet another indication that Nature doesn’t care about coordinate systems; they’re a human invention and subject to our desire for mathematical convenience. This is also a good time to recall even (f(-x) = f(x)) and odd (f(-x) = -f(x)) functions because then one can look at an integral and its limits and predict whether or not the integral must vanish, and this connects with symmetry arguments from geometry. This, to me, is one of the very definitions of mathematical beauty. A given charge distribution’s electric field is independent of the coordinate system used to derive it. The forthcoming chapter on Gauss’s law and Ampère’s law relies on symmetries to predict electric and magnetic field structures for calculating flux and circulation and that’s foreshadowed in this chapter.
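The origin independence can be seen numerically by literally summing over “chunks” of charge. The sketch below is an illustration under assumptions of my choosing: a uniform rod lying on the x-axis, an observation point a perpendicular distance d from the rod’s center, a midpoint-rule sum, and made-up values for L, d, and the linear charge density. The function name `rod_field` is hypothetical.

```python
import math

K = 8.9875517873681764e9  # Coulomb constant in N m^2 / C^2

def rod_field(x_start, x_end, x_obs, d, lam, n=20000):
    """Sum the field of n chunks dQ = lam*dx of a rod on the x-axis.

    Returns (Ex, Ey) at the observation point (x_obs, d), using the
    midpoint of each chunk as its location.
    """
    dx = (x_end - x_start) / n
    Ex = Ey = 0.0
    for i in range(n):
        x = x_start + (i + 0.5) * dx      # location of this chunk
        rx, ry = x_obs - x, d             # vector from chunk to observation point
        r3 = (rx * rx + ry * ry) ** 1.5
        dQ = lam * dx
        Ex += K * dQ * rx / r3
        Ey += K * dQ * ry / r3
    return Ex, Ey

L, d, lam = 2.0, 0.5, 1e-9  # sample values: meters, meters, C/m

# Origin at the rod's left end: limits 0 to L, observation point at x = L/2.
field_left_origin = rod_field(0.0, L, L / 2, d, lam)

# Origin at the rod's center: limits -L/2 to +L/2, observation point at x = 0.
field_center_origin = rod_field(-L / 2, L / 2, 0.0, d, lam)

# Closed-form perpendicular component for this geometry, for comparison.
Ey_exact = K * lam * L / (d * math.sqrt(d * d + (L / 2) ** 2))
```

The two calls use different limits and different-looking integrands but describe the same physical situation, so they return the same field. The x-component sums to zero in both cases, which is the odd-integrand argument about symmetric limits in action.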

This is a lot to convey to students and from their point of view it’s a lot to understand. I hope I can do better at getting it all across to them than was done for me.

Feedback welcome as always.

Matter & Interactions I, Week 11

As (almost) usual, I’m writing this on the Monday after the week in question.

This week we hit chapter 5, which is packed full of interesting physics and mathematics! We encounter the infamous time derivative of a unit vector (aka a direction), which I have found quite mysterious because of the rather hand waving ways it’s treated in standard texts. M&I used to introduce a unit vector’s derivative as an angular velocity operating on the unit vector with a cross product, but that approach is now absent. It’s been replaced with a derivation relying on similar triangles, but the good news is that it’s relativistically valid so you get that with the package.

Let’s talk about a vector’s derivative. The traditional textbooks present a coordinate-based expression for a vector’s time derivative, namely the sum of the time derivatives of the vector’s components in some basis. However, this is essentially useless for geometric purposes, and I’m trying very hard to incorporate geometric reasoning into the course.

Let’s write a momentum vector “factored” into a magnitude and direction:

\vec{p} = \left\lVert\vec{p}\right\rVert \hat{p}

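Now apply the usual product rule from first semester calculus:

\dfrac{\mathrm{d}\vec{p}}{\mathrm{d}t} = \dfrac{\mathrm{d}\left\lVert\vec{p}\right\rVert}{\mathrm{d}t}\hat{p} + \left\lVert\vec{p}\right\rVert\dfrac{\mathrm{d}\hat{p}}{\mathrm{d}t}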

The first term represents the change in \vec{p} due to a change in its magnitude, and note that it is in the same direction as the original vector. It’s parallel to \vec{p}. Recall from the momentum principle that another name for  \dfrac{\mathrm{d}\vec{p}}{\mathrm{d}t} is \vec{F}_{\mathrm{net}} so the first term is the component of \vec{F}_{\mathrm{net}} that is parallel to \vec{p}. Geometrically, a force applied parallel to a momentum changes the momentum’s magnitude. That’s the first term.

The second term represents the change in \vec{p} due to a change in its direction, and I argue that it must be perpendicular to the original vector. (A simple proof by contradiction assuming uniform circular motion can be used to justify this. If there were a component of change parallel to \vec{p} then the momentum’s magnitude would change, which contradicts the assumption of uniform circular motion. Therefore, there can be no component of change parallel to the original vector, and the only other option, for uniform circular motion, is that any change must be perpendicular to the original vector.) Next, I argue that a vector’s changing direction can be thought of as a rotation around an axis. In class, we stood up, held an arm outstretched, and slowly turned in place to our left. Our arms swept out a plane around our bodies, and the bodies played the role of the axis around which our arms rotated. Turning to the left defines the “positive” direction of rotation, merely a convention. The rate at which the vector rotates defines the magnitude of a new quantity, angular velocity. We define the direction of the angular velocity to be given by the thumb of the right hand when its fingers wrap around the axis in the newly defined “positive” direction. This is weird! We’re defining a new vector quantity, angular velocity, in terms of its magnitude and direction separately. Now, how do the direction of the angular velocity and the direction of the original momentum vector relate to the direction of \dfrac{\mathrm{d}\hat{p}}{\mathrm{d}t}? We define it to be the direction of the right hand’s thumb when its fingers point in the direction of the angular velocity and the palm points in the direction of the original vector. This is all encoded in this expression for the second term:

\left\lVert\vec{p}\right\rVert\dfrac{\mathrm{d}\hat{p}}{\mathrm{d}t}=  \left\lVert\vec{p}\right\rVert\ \vec{\omega}\times\hat{p}

Factor \vec{\omega} into its magnitude and direction.

\left\lVert\vec{p}\right\rVert\dfrac{\mathrm{d}\hat{p}}{\mathrm{d}t}=  \left\lVert\vec{p}\right\rVert \left\lVert\vec{\omega}\right\rVert\hat{\omega}\times\hat{p}

where we’re taking the symbol \hat{\omega}\times\hat{p} to represent the hand machinations.

Now, rewrite \left\lVert\vec{\omega}\right\rVert as \dfrac{\left\lVert\vec{v}\right\rVert}{R}, an easily reasoned relation, and rewrite \left\lVert\vec{p}\right\rVert as Newton would have written it, m\left\lVert\vec{v}\right\rVert. The final result becomes

\left\lVert\vec{p}\right\rVert\dfrac{\mathrm{d}\hat{p}}{\mathrm{d}t}=m \dfrac{\left\lVert\vec{v}\right\rVert^2}{R}\hat{\omega}\times\hat{p}

which is one of the most important results in introductory physics. This is a simplified version of a more complicated derivation I have that relies on a brute force way of calculating the time derivative of a unit vector and employing a vector identity (BAC-CAB) to write the result as a triple cross product (really a double cross product because there are two operations, not three), and recognizing one of those cross products as the definition of angular velocity; it just quite literally falls right out from the basics and it’s so beautiful that I want to write it up for The Physics Teacher and intend to do so. Geometrically, a force applied perpendicularly to a momentum changes the momentum’s direction. That’s the second term.

About three years ago, it occurred to me that we can think of \vec{\omega}\times as an operator with one slot that, when filled with a unit vector, returns its time derivative as the product of an angular velocity’s magnitude and a direction given by a cross product.

\dfrac{\mathrm{d}(\_)}{\mathrm{d}t} = \vec{\omega}\times (\_) = \left\lVert\vec{\omega}\right\rVert\hat{\omega}\times (\_)

\dfrac{\mathrm{d}\hat{a}}{\mathrm{d}t} =\left\lVert\vec{\omega}\right\rVert\hat{\omega}\times\hat{a}
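The operator statement can be tested numerically for the uniform circular motion case discussed above. This is a minimal sketch under assumptions I am choosing for illustration: counterclockwise motion in the xy-plane, so the angular velocity points along +z, a central finite difference for the derivative, and hypothetical helper names (`p_hat`, `numerical_derivative`).

```python
import math

def cross(u, v):
    """Cross product of two 3-component tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def p_hat(t, omega_mag):
    """Momentum direction for counterclockwise uniform circular motion
    in the xy-plane, with position along (cos(wt), sin(wt), 0)."""
    return (-math.sin(omega_mag * t), math.cos(omega_mag * t), 0.0)

def numerical_derivative(f, t, h=1e-6):
    """Central-difference estimate of the time derivative of a vector function."""
    forward, backward = f(t + h), f(t - h)
    return tuple((a - b) / (2 * h) for a, b in zip(forward, backward))

omega_mag = 2.0                      # rad/s, arbitrary sample value
omega = (0.0, 0.0, omega_mag)        # right-hand rule: along +z for counterclockwise motion
t = 0.3

lhs = numerical_derivative(lambda s: p_hat(s, omega_mag), t)  # d(p_hat)/dt
rhs = cross(omega, p_hat(t, omega_mag))                       # omega x p_hat
```

The two tuples agree to within the finite-difference error, and the derivative is perpendicular to the direction itself, as the proof by contradiction above requires.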

(In preparation for things to come, I’m changing the way I think about vectors and treating them as “slotted machines” a la Misner, Thorne, and Wheeler. I have much more to say about this in future posts.)

Also in this chapter, students encounter the dot product for the first “official” time. However, I have already exposed them to it early in the course in a discussion of the concept of projecting vectors onto bases. The choice of basis is arbitrary, and in this chapter students see that one uses a momentum vector as a basis. In other words, students use the dot product to find components of force parallel and perpendicular to a given momentum. The dot product itself picks out the parallel component of one vector relative to another (scaled by a magnitude, etc.) and one way of getting the perpendicular component is by merely subtracting the parallel component from the original vector. However, there is a way to get the perpendicular component using, again, a triple cross product. It’s a straightforward derivation based on a common vector identity (BAC-CAB used in reverse) but I’ve not yet had the guts to present it. Maybe next year.
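Both routes to the perpendicular component can be compared numerically. The sketch below assumes one common form of the identity, \vec{F}_\perp = \hat{p}\times(\vec{F}\times\hat{p}), which BAC-CAB expands to \vec{F}(\hat{p}\bullet\hat{p}) - \hat{p}(\hat{p}\bullet\vec{F}); whether this is exactly the form intended above is my assumption, and the function names and sample vectors are hypothetical.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit(u):
    n = math.sqrt(dot(u, u))
    return tuple(ui / n for ui in u)

def parallel_component(F, p):
    """Part of F parallel to p: (F dot p_hat) p_hat."""
    p_unit = unit(p)
    scale = dot(F, p_unit)
    return tuple(scale * c for c in p_unit)

def perp_by_subtraction(F, p):
    """Perpendicular part of F by subtracting the parallel part."""
    F_par = parallel_component(F, p)
    return tuple(f - g for f, g in zip(F, F_par))

def perp_by_cross(F, p):
    """Perpendicular part of F as the double cross product p_hat x (F x p_hat)."""
    p_unit = unit(p)
    return cross(p_unit, cross(F, p_unit))

F = (3.0, 4.0, 0.0)   # sample force
p = (2.0, 0.0, 0.0)   # sample momentum along +x

F_par = parallel_component(F, p)
F_perp_a = perp_by_subtraction(F, p)
F_perp_b = perp_by_cross(F, p)
```

For this sample the parallel part is (3, 0, 0) and both methods return (0, 4, 0) for the perpendicular part. No coordinate system is singled out by either method; the momentum itself supplies the reference direction.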

Comments and feedback welcomed!


Matter & Interactions I, Week 5

This week, we transitioned to chapter 1 of the Matter & Interactions textbook (fourth edition). I have WebAssign problem sets for each chapter available for formative assessment and practice while students work their way through the reading. I encouraged them to use the book the way it was intended to be used, specifically by stopping and doing the checkpoints in situ and NOT going forward until they can get the correct answers by working them out (checkpoint answers are provided at the end of every chapter). My expectation is for them to have worked their way through chapter 1 by Monday of next week.

In class, I spent two days introducing vectors. I have never been happy with the introductory course’s treatment of vectors, mainly because of inconsistencies in terminology and notation (sometimes within a given textbook). More importantly, vectors are almost always presented in terms of their coordinate representations and not as the inherent geometric entities they are, and this completely undermines the main reason vectors are used in the first place: to describe physics without the need for coordinates. If we want students to understand that physics is independent of both reference frame and coordinate system, then why not present it that way from the beginning? I think it might pave the way to tensors, which are nothing but an extension of the vector concept to slightly more complicated quantities.

So, I began by introducing a vector as a quantity which can be represented with an arrow. The arrow’s length encodes the quantity’s magnitude (i.e. how much? or how many?) while the arrow’s direction encodes the quantity’s direction. I don’t like saying that a vector is “a quantity with magnitude and direction” because there are quantities that have magnitude and direction but are not vectors (e.g. finite angular displacements). Equally troublesome is saying that a vector is “a list of three numbers that transform a certain way from one frame to another” because, well, HUH? Defining a vector as an “element of a vector space” is rare in physics contexts (I don’t think I’ve seen it done that way, at least not in the intro course) and it’s really no less circular than speaking of mysterious transformation properties. Saying a vector is “something which can be represented with an arrow” seemed a relatively decent compromise. I don’t know…maybe I could do better.

As I introduced these ideas, I “invented” symbols on the board for them and gave the appropriate LaTeX commands (defined in my mandi package) to get these symbols. I want students to start associating LaTeX with material from the textbook so we can be consistent with notation from the beginning. They know to use \vect{} to indicate the symbol for a vector quantity, \magvect{} (think “magnitude of vector”) to get the symbol for a vector’s magnitude (always with the appropriate unit…mandi knows every quantity’s SI unit in as many as three different formats: base, derived, and one I made up called traditional…see the mandi documentation for details), and \dirvect{} (think “direction of vector”) to get the symbol for a vector’s direction.

The arrow, and the vector it represents, has inherent properties that don’t change from one coordinate system to another, and some properties that do indeed change from one coordinate system to another. I want to do two things: 1) make a solid connection between what I’m saying and what students see in the textbook and 2) get them to think deeply about geometry. Where do the numbers associated with vectors in the textbook come from? They come from projecting a vector onto a coordinate basis. What does THIS mean? Operationally, it means the following:

  1. Establish an otherwise arbitrary orthogonal coordinate system, which I drew on the board apparently arbitrarily oriented around the arrow representing our velocity vector. I intentionally kept the arrow in the first quadrant though. The arrow’s tail need not be at the origin.
  2. Place an imaginary light source “far away” above the x-axis and let it shine down on the arrow with light paths that are parallel to the y-axis so that the arrow casts a shadow onto the x-axis. The length of this shadow, along with this shadow’s orientation, tells us “how much of the arrow lies in the x direction.”
  3. Place an imaginary light source “far away” above the y-axis and let it shine down on the arrow with light paths that are parallel to the x-axis so that the arrow casts a shadow onto the y-axis. The length of this shadow, along with this shadow’s orientation, tells us “how much of the arrow lies in the y direction.”
  4. Do a similar procedure to figure out “how much of the arrow lies in the z direction” but understand that it will be zero in this particular case because our arrow lies in the xy-plane.
  5. These three “how muches” are the numbers that go into the slots in the notation M&I uses to denote the coordinate representation of a vector.

Now, when I then said that each of these “how muches” has a technical name, and that this name is “the projection of the arrow onto an axis” there were audible “Oooos” and at least one “AHA!” because they said they’d heard this term before but didn’t truly understand what it meant until just now. I assume they were telling the truth, and I was happy. At this point, I did not distinguish between “component” and “projection” as the math textbooks do, and I will return to that later.

Next, we did a calculation to get algebraic expressions for the three “how muches” and noted a pattern: the “how much” along a particular coordinate axis always works out to be the arrow’s length scaled by the cosine of the angle between the arrow and the axis in question. I introduced the notation v_x = ||\vec{v}|| \cos\theta_x with obvious extension to the other two coordinate directions. Note the use of double bars for magnitude, consistent with the students’ calculus textbook. One question that was raised was why cosine, rather than sine, is used. We then discussed how this relates to the geometry of the problem, which is how much of the arrow lies along a chosen direction, so we need the side of the relevant right triangle that is adjacent to the angle we know, and thus we need cosine. Yes, you could use sine but then you’d have to refer to the complement of the obviously relevant angle and we want to be conceptually consistent throughout. So, we will always use the angle between the arrow and the chosen coordinate direction and we will always want the part of the arrow parallel to this direction and thus we will always use cosine. Students accepted this. They took the bait, because consistent use of cosine here leads to the next important idea: dot products.

Next, I explained that this process of getting “how much of a vector is parallel to a chosen direction” can be framed as a coordinate-free geometry issue. Given any two arbitrary vectors \vec{A} and \vec{B} (represented by arrows of course, but now I’m referring to the actual vectors rather than their representational arrows), and without introducing a coordinate system, we can answer the question of “how much of one is parallel to the other” in an elegant way. I introduced the new symbol \vec{A}\bullet\vec{B} as the symbol for “how much of \vec{A} is parallel to \vec{B} with no regard for orientation.” Note that’s one symbol! Specifically, it’s just a symbol for a real number, an element of \mathbb{R}. The actual number is merely the multiplicative product of the two vector magnitudes scaled by the cosine of the angle between them or just ||\vec{A}|| ||\vec{B}||\cos\theta, which requires visualizing them as arrows for simplicity. Think of the symbol as having two slots, each of which takes a vector as input and the complete symbol represents the resulting real number. Okay here’s where I’m setting the stage for something huge later.

This leads to three obvious extensions: 1) filling both slots with the same vector, 2) filling one slot with a unit vector (which we call simply a direction or, now, a basis vector), and 3) reversing the order of the two slots. Well, the third is interesting because it doesn’t change anything! Yes, reversing the two slots changes the way the overall symbol looks, but we get the same real number out. BOOM! We have discovered an elementary example of a symmetric tensor! The first is interesting because it gives us a geometric way of calculating the magnitude of a vector…or does it? It’s not incredibly useful at this point, but we’ll make note of it and file it away nonetheless. The second is the important one for the moment, because by dropping a basis vector into one of the slots (doesn’t matter which one, remember) we get the same “how much” number we got earlier from other considerations! Now we have a solid way to quantify what we mean by projecting a vector onto a coordinate basis. It’s just a dot product!
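
The three extensions can be tried out numerically. The following is a minimal sketch that uses only the geometric definition ||\vec{A}|| ||\vec{B}||\cos\theta. It assumes, purely for illustration, that each arrow drawn on a flat page is described by a magnitude and a heading in degrees; the specific vectors and the function name `dot` are hypothetical.

```python
import math

def dot(A, B):
    """Geometric dot product: |A| |B| cos(angle between A and B).

    Each vector is a (magnitude, heading_in_degrees) pair. The heading is
    only an illustrative way to describe an arrow drawn on a flat page.
    """
    mag_a, heading_a = A
    mag_b, heading_b = B
    angle_between = math.radians(heading_a - heading_b)
    return mag_a * mag_b * math.cos(angle_between)

A = (5.0, 20.0)          # hypothetical vector: 5 units, heading 20 degrees
B = (3.0, 80.0)          # hypothetical vector: 3 units, heading 80 degrees
e_hat = (1.0, 80.0)      # unit vector (a direction) along B

same_slot = dot(A, A)        # 1) both slots filled with A: |A| squared
how_much = dot(A, e_hat)     # 2) one slot filled with a unit vector
swapped = dot(B, A)          # 3) slots reversed

print(math.sqrt(same_slot))  # magnitude of A recovered from A dot A
print(how_much)              # how much of A lies along e_hat
print(dot(A, B), swapped)    # same real number either way
```

Only the angle between the two arrows enters the calculation, so the result does not depend on where the headings are measured from.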

To follow up this discussion, students did a whiteboard exercise. I asked them to draw, on their boards, an arrow representing a velocity vector with an arbitrary magnitude of five units and a direction to the right. They quickly picked up on the fact that the actual length of their arrow didn’t matter as long as they labeled it as having five units. Next, I came around to each group’s whiteboard and drew a new orthogonal coordinate basis on it and asked each group to “project the velocity vector onto the coordinate basis that I drew.” They got it! I mean they totally got it! The only difficulty was, as I have come to expect, someone’s calculator being in radian mode and not in degree mode. Using VPython for computations will permanently fix this problem by eliminating it entirely. Anyway, the amazing moment came when I asked them to calculate the vector’s magnitude in terms of the components they just derived and they all got five units back! Some were amazed. Some expected this. I was happy either way. The takeaway? Projecting a vector onto a coordinate basis changes the numbers we use to represent the vector the way the textbook does, but the actual vector itself doesn’t change. That’s geometry! That’s physics! That’s cool!
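
The whiteboard exercise can be mimicked in a few lines of Python. This is a sketch only: the basis orientations are made-up values standing in for whatever was drawn on each board, and the function name is hypothetical. Note that `math.radians` makes the degree-to-radian conversion explicit.

```python
import math

speed = 5.0             # magnitude of the velocity arrow, in arbitrary units
velocity_heading = 0.0  # "to the right", in degrees

def components_in_rotated_basis(basis_angle_deg):
    """Project the velocity onto an orthonormal basis rotated by basis_angle_deg."""
    # Angle between the arrow and each of the two basis directions.
    angle_to_e1 = math.radians(basis_angle_deg - velocity_heading)
    angle_to_e2 = math.radians(basis_angle_deg + 90.0 - velocity_heading)
    # "How much" along each direction: magnitude times cosine.
    return speed * math.cos(angle_to_e1), speed * math.cos(angle_to_e2)

for basis_angle in (0.0, 25.0, 37.0, 110.0):   # made-up basis orientations
    v1, v2 = components_in_rotated_basis(basis_angle)
    recovered = math.hypot(v1, v2)             # magnitude from the components
    print(basis_angle, v1, v2, recovered)
```

The components change with every basis, while the recovered magnitude stays at five units to within floating-point rounding.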

That huge thing I mentioned above? I didn’t explicitly go into this in class this week, but here it is. We can also think of a vector as a function (a linear function, to be precise) that takes as an input another vector and outputs a real number by doing a dot product. Misner, Thorne, and Wheeler operationally describe it as a machine with an input slot that takes a vector and an output slot that spits out a real number. Yep…a vector framed as a function that takes another vector and outputs a real number, the dot product. This is conceptually very simple, but not at all anything close to what is seen in introductory physics texts. I’m trying to change that. Why? Because by stringing together such vectors-treated-as-functions in a certain prescribed way, we can build objects called tensors. Tensors appear in introductory physics, but we rarely point them out and exploit them. I’m trying to change that. At some point, I will revisit our class discussion from this week and present a vector as one of these functions that takes another vector and spits back a real number.
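
To make the “vector as a machine” picture concrete, here is a minimal Python sketch. It assumes vectors are stored as tuples of components in some orthonormal basis, which is a convenience for computing and not part of the idea itself; the names `as_machine` and `combine` and the sample vectors are hypothetical.

```python
def as_machine(A):
    """Return the function B -> A dot B for a fixed vector A.

    Vectors are tuples of components in some orthonormal basis; that
    representation is an assumption made only so the sketch can compute.
    """
    def machine(B):
        return sum(a * b for a, b in zip(A, B))
    return machine

def combine(s, B, t, C):
    """Linear combination s*B + t*C, component by component."""
    return tuple(s * b + t * c for b, c in zip(B, C))

A = (1.0, 2.0, 2.0)      # hypothetical vectors
B = (3.0, 0.0, 4.0)
C = (-1.0, 5.0, 2.0)

A_machine = as_machine(A)   # one input slot, one real-number output

# Linearity: feeding in 2B + 3C gives 2*(A dot B) + 3*(A dot C).
lhs = A_machine(combine(2.0, B, 3.0, C))
rhs = 2.0 * A_machine(B) + 3.0 * A_machine(C)
print(lhs, rhs)
```

The closure returned by `as_machine` has exactly one slot, and the linearity check is what justifies calling it a linear function.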

I welcome questions, feedback, and constructive criticism.

Conceptual Understanding in Introductory Physics XXV

This question emphasizes geometry and should be done without use of a coordinate system. It should also be done using only symbolic manipulation of vectors. Here it is.

Consider a particle moving with a constant, non-relativistic velocity. Starting with a general expression for kinetic energy in terms of either velocity or momentum, prove that the particle need not be under the influence of a non-zero net force. Do not refer to a coordinate system. Your argument must be stated in words as well as mathematically.

I’m not entirely happy with the way I articulated the question. I’m open to suggestions for improvement, but I want the question to have an air of vagueness about it.

Help Me! Question About Tensors and Projections

I need some help. I am working hard to find ways to bring more geometry into introductory calculus-based physics (and conceptual physics as well). By geometry, I mean specifically the geometry associated with vectors and tensors, and the information encoded therein. Yes, I said tensors.

I have been heavily influenced by these notes by Kip Thorne and Roger Blandford that form the basis of their forthcoming textbook. In particular, chapter 1 has deeply affected my view of what introductory physics could, and perhaps should, be like when an emphasis on geometry is provided. The discussion of tensors in these notes presents them as machines that take vectors as inputs and give a real number as output. This is the same approach taken in the famous general relativity textbook coauthored by Thorne (aka MTW) and in a fantastic new book by Jeevanjee. I won’t go into details here because I don’t think it’s necessary for my questions, but I look forward to exploring the approach in more depth in future posts.

Geometric entities, like vectors, have an existence independent of any coordinate system. For example, I can state that an object or particle has 35 units of momentum in a particular direction represented by an arrow and I can do so without choosing a coordinate system. I can project this momentum onto any arbitrary basis and resolve it into components if I want to. Fine. No problem.

But what about a quantity represented by a second rank tensor? What about moment of inertia (MoI)? MoI is a geometric quantity that has an existence independent of any coordinate system, but it isn’t a vector; it’s a second rank tensor that can be represented by a symmetric matrix.

So here is my question. How can I specify a MoI in a coordinate-independent way analogous to doing so for a vector? For a vector, I can specify a magnitude and draw a direction. What must I specify for MoI? I think I know the answer. I think I must specify the eigenvalues of the matrix representation of the MoI tensor and the principal axes to which these eigenvalues apply. These can then be projected onto a coordinate basis. Is that correct? If not, could you tell me if what I’m asking is even possible? I don’t see how it can’t be.
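
One way to explore this guess is to build the tensor as a two-slot machine directly from its principal moments and principal axes, then drop basis vectors into the slots to get a matrix. The sketch below is only an illustration of that idea, not a settled answer to the question. It relies on the standard spectral form of a symmetric tensor, I(u, v) = Σ λ_k (e_k · u)(e_k · v); the principal moments, axes, and function names are made up, and component tuples are used only so the code can compute.

```python
import math

def dot(u, v):
    """Dot product of two vectors given as component tuples."""
    return sum(a * b for a, b in zip(u, v))

def make_tensor(principal_moments, principal_axes):
    """Build I(u, v) = sum_k lambda_k (e_k . u)(e_k . v).

    principal_axes must be mutually orthogonal unit vectors.
    """
    def I(u, v):
        return sum(lam * dot(e, u) * dot(e, v)
                   for lam, e in zip(principal_moments, principal_axes))
    return I

s = 1.0 / math.sqrt(2.0)
moments = (2.0, 3.0, 4.0)                            # hypothetical principal moments
axes = ((s, s, 0.0), (-s, s, 0.0), (0.0, 0.0, 1.0))  # hypothetical principal axes

I = make_tensor(moments, axes)

# Projecting onto a chosen basis: drop basis vectors into the two slots.
basis = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
matrix = [[I(b_i, b_j) for b_j in basis] for b_i in basis]
for row in matrix:
    print(row)
```

Filling both slots with a principal axis returns the corresponding principal moment, and filling them with two different principal axes returns zero, which is the tensor analogue of specifying a magnitude and a direction for a vector.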