Matter & Interactions II, Weeks 13 and 14

I’m combining two weeks in this post.

The first week, we dealt with magnetic forces. One thing that I have never thought much about is the fact that the quantity \mathbf{v}\times\mathbf{B} is effectively an electric field, but one that depends on velocity. When velocity is involved, reference frames are involved, and that of course means Einstein is talking to us again. M&I addresses the fact that what we detect as an electric field and/or a magnetic field depends on our reference frame. This is fundamental material that I feel should be included in every introductory electromagnetic theory course. There’s really no good reason to omit it given that special relativity is a foundation of all contemporary physics. It’s sad to think that beginning next fall, our students won’t be exposed to this material any more.

The second week gets us into chapter 21, which presents Gauss’s law and Ampère’s law. There are many fine points and details to present here. I’ll try to list as many as I can think of.

  • I use the words pierciness, flowiness, spreadingoutness, and swirliness to introduce the concepts of flux, circulation, divergence, and curl respectively.
  • We have the term flux for the quantity given by surface integrals, but we rarely if ever see the term circulation for line integrals around closed paths. I recommend introducing the term, primarily because it forms the basis for the definition of curl.
  • The distinction between an open surface and a closed surface is very important.
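  • I, like M&I, prefer to write vector area as \hat{n}\,\mathrm{d}A rather than \mathrm{d}\mathbf{A} because it allows for introducing a “sneaky one” into the calculation of flux that lets a dot product become a product of scalars when the field is parallel to the surface’s unit normal:

    \int\mathbf{E}\bullet\hat{n}\,\mathrm{d}A = \int\lVert\mathbf{E}\rVert\,\hat{n}\bullet\hat{n}\,\mathrm{d}A = \int\lVert\mathbf{E}\rVert\,\mathrm{d}A
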
  • Similarly, I like to write an element of vector length, at least for electromagnetic theory, as \hat{t}\,\mathrm{d}\ell rather than \mathrm{d}\boldsymbol{\ell}. I don’t think I have ever seen this notation in an introductory course before, but I like it because students have seen unit tangents in calculus and this notation closely parallels that for vector area as described above. It also allows for a “sneaky one” in the calculation of circulation when the field is parallel to the path’s unit tangent:

    \oint\mathbf{B}\bullet\hat{t}\,\mathrm{d}\ell = \oint\lVert\mathbf{B}\rVert\,\hat{t}\bullet\hat{t}\,\mathrm{d}\ell = \oint\lVert\mathbf{B}\rVert\,\mathrm{d}\ell

  • After this chapter, we can finally write Maxwell’s equations for the first time. I show them as both integral equations and as differential equations. One of my usual final exam questions is to write each of the four equations as both an integral equation and a differential equation and to provide a one sentence interpretation of each form of each equation.
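A numerical sketch of the two “sneaky one” shortcuts, built on assumptions of my own: a point charge at the origin for flux, a long straight wire along the z axis for circulation, SI constants, and simple midpoint sums. All function names are hypothetical. Summing \mathbf{E}\bullet\hat{n}\,\mathrm{d}A over a sphere should agree with \lVert\mathbf{E}\rVert times the area (Gauss’s law gives q/\varepsilon_0), and summing \mathbf{B}\bullet\hat{t}\,\mathrm{d}\ell around a circle should give \mu_0 I.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
MU0 = 4e-7 * math.pi      # vacuum permeability, T m/A (classic defined value)

def dot(a, b):
    """Dot product of two 3-tuples."""
    return sum(x * y for x, y in zip(a, b))

def point_charge_E(q, r_vec):
    """Electric field of a point charge q at the origin, evaluated at r_vec."""
    r = math.sqrt(dot(r_vec, r_vec))
    k = q / (4 * math.pi * EPS0 * r**3)
    return tuple(k * c for c in r_vec)

def flux_through_sphere(q, R, n_theta=100, n_phi=100):
    """Midpoint-rule sum of E . n_hat dA over a sphere of radius R centered on q."""
    d_theta, d_phi = math.pi / n_theta, 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            n_hat = (math.sin(theta) * math.cos(phi),
                     math.sin(theta) * math.sin(phi),
                     math.cos(theta))
            r_vec = tuple(R * c for c in n_hat)
            dA = R**2 * math.sin(theta) * d_theta * d_phi
            total += dot(point_charge_E(q, r_vec), n_hat) * dA
    return total

def flux_by_sneaky_one(q, R):
    """Shortcut for q > 0: E is parallel to n_hat, so E . n_hat = ||E||, constant on the sphere."""
    E_mag = q / (4 * math.pi * EPS0 * R**2)
    return E_mag * 4 * math.pi * R**2

def wire_B(I, r_vec):
    """Field of a long straight wire on the z axis carrying current I in the +z direction."""
    x, y, _ = r_vec
    k = MU0 * I / (2 * math.pi * (x * x + y * y))
    return (-k * y, k * x, 0.0)   # magnitude mu0 I / (2 pi s), circling the wire

def circulation_around_circle(I, R, n=1000):
    """Midpoint-rule sum of B . t_hat dl around a circle of radius R centered on the wire."""
    d_phi = 2 * math.pi / n
    total = 0.0
    for j in range(n):
        phi = (j + 0.5) * d_phi
        t_hat = (-math.sin(phi), math.cos(phi), 0.0)   # counterclockwise seen from +z
        r_vec = (R * math.cos(phi), R * math.sin(phi), 0.0)
        total += dot(wire_B(I, r_vec), t_hat) * R * d_phi
    return total

q, I = 1e-9, 2.0
print(flux_through_sphere(q, 0.5), flux_by_sneaky_one(q, 0.5), q / EPS0)
print(circulation_around_circle(I, 0.1), MU0 * I)
```

The brute-force sums use the full dot product at every patch; the shortcut versions only work because the field is everywhere parallel to \hat{n} or \hat{t}, which is exactly the situation the “sneaky one” exploits.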


That’s about it for these two chapters. I thought there was something else I wanted to talk about, but it seems to have escaped me and I’ll update this post if and when I remember it.

Feedback welcome as always.

Vector Formalism in Introductory Physics I: Taking the Magnitude of Both Sides

TL;DR: I don’t like the way vectors are presented in calculus-based and algebra-based introductory physics. I think a more formal approach is warranted. This post addresses the problem of taking the magnitude of both sides of simple vector equations. If you want the details, read on.

This is the first post in a new series in which I will present a more formal approach to vectors in introductory physics. It will not have the same flavor as my recently begun series on angular quantities; that series serves a rather different purpose. However, there may be some slight overlap between the two series if it is appropriate.

I am also using this series to commit to paper (screen, really) some thoughts and ideas I have had for some time with the hope of turning them into papers for submission to The Physics Teacher. I’d appreciate any feedback on how useful this may be to the community.

To begin with, I want to address issues in the algebraic manipulation of vectors with an emphasis on coordinate-free methods. I feel that in current introductory physics courses, vectors are not exploited to their full potential. Instead of learning coordinate-free methods, students almost always learn to manipulate coordinate representations of vectors in orthonormal, Cartesian coordinate systems. I think that is unfortunate because it doesn’t always convey the physics in a powerful way. Physics is independent of one’s choice of coordinate system, and I think students should learn to manipulate vectors in a way that is similarly independent of coordinates.

Let’s begin by looking at a presumably simple vector equation:

a\mathbf{A} = -5\mathbf{A}

The object is to solve for a, given \mathbf{A}. Don’t be fooled; it’s more difficult than it looks. In my experience, students invariably try to divide both sides by \mathbf{A}, but of course this won’t work because vector division isn’t defined in Gibbsian vector analysis. Don’t let students get away with this if they try it! The reasons for not defining vector division will be the topic of a future post.

(UPDATE: Mathematics colleague Drew Lewis asked about solving this equation by combining like terms and factoring, leading to (a+5)\mathbf{A}=\mathbf{0} and then, for nonzero \mathbf{A}, to a = -5. This is a perfectly valid way of solving the equation, and it completely avoids the “division by a vector” issue. I want to call attention to that issue, though, because when I show students this problem, they always (at least in my experience) try to solve it by dividing. Also, in future posts I will demonstrate how to solve other kinds of vector equations that must be solved by manipulating both dot products and cross products, each of which carries different geometric information, and I want to get students used to seeing and manipulating dot products. Thanks for asking, Drew!)

One could simply say to “take the absolute value of both sides” like this:

\left| a\mathbf{A} \right| = \left| -5\mathbf{A}\right|

but this is problematic for two reasons. First, it destroys the sign on the right-hand side. Second, a vector doesn’t have an absolute value because it’s not a number. Vectors have magnitude, which is an entirely different concept from absolute value and warrants separate consideration and a separate symbol.

We need to do something to each side to turn it into a scalar because we can divide by a scalar. Let’s try taking the dot product of both sides with the same vector, \mathbf{A}, and proceed as follows:

\begin{aligned} a\mathbf{A}\bullet\mathbf{A} &= -5\mathbf{A}\bullet\mathbf{A} && \text{dot both sides with the same vector} \\ a\lVert\mathbf{A}\rVert^2 &= -5\lVert\mathbf{A}\rVert^2 && \text{dot products become scalars} \\ \therefore a &= -5 && \text{solve} \end{aligned}
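The same dot-both-sides recipe can be sketched numerically. This is a minimal illustration, assuming vectors are stored as plain tuples of components; the function name `solve_for_a` is hypothetical. The components are only used to evaluate dot products; nowhere is anything divided by a vector.

```python
import math

def dot(u, v):
    """Dot product of two equal-length tuples."""
    return sum(x * y for x, y in zip(u, v))

def solve_for_a(A, W):
    """Solve a*A = W for the scalar a without 'dividing by a vector'.

    Dot both sides with A to get a (A . A) = W . A, then divide by the
    scalar A . A = ||A||^2.
    """
    AA = dot(A, A)  # ||A||^2, a scalar
    if AA == 0:
        raise ValueError("A must be nonzero")
    a = dot(W, A) / AA
    # Dotting keeps only the part of W along A, so confirm that the
    # original vector equation actually holds (W must be collinear with A).
    if not all(math.isclose(a * x, w, abs_tol=1e-12) for x, w in zip(A, W)):
        raise ValueError("no scalar a satisfies a*A = W; W is not collinear with A")
    return a

A = (3.0, -1.0, 2.0)
print(solve_for_a(A, tuple(-5 * x for x in A)))   # a*A = -5A  ->  -5.0
```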

This is a better way to proceed. It’s formal, and indeed even pedantic, but I dare say it’s the best way to go if one wants to include dot products. Of course in this simple example, one can see the solution by inspection, but my goals here are to get students to stop thinking about the concept of dividing by a vector and to manipulate vectors algebraically without referring to a coordinate system.

Let’s now look at another example with a different vector on each side of the equation.

 a\mathbf{A} = -5\mathbf{B}

Once again the object is to solve for a, given \mathbf{A} and \mathbf{B}. Note that solving for either \mathbf{A} or \mathbf{B} is trivial, so I won’t address it; it’s simply a matter of scalar division. Solving for a is more challenging because we must again suppress the urge to divide by a vector. I will show two possible solutions. Make sure you understand what’s being done in each step.

\begin{aligned} a\mathbf{A} &= -5\mathbf{B} && \text{given equation} \\ a\mathbf{A}\bullet\mathbf{A} &= -5\mathbf{B}\bullet\mathbf{A} && \text{dot both sides with the same vector} \\ a\mathbf{A}\bullet\mathbf{A} &= -5\mathbf{B}\bullet\left(\dfrac{-5}{\hphantom{-}a}\mathbf{B}\right) && \text{substitute from original equality} \\ a\lVert\mathbf{A}\rVert^2 &= \dfrac{25}{a}\lVert\mathbf{B}\rVert^2 && \text{dot products become scalars} \\ a^2 &= 25\dfrac{\lVert\mathbf{B}\rVert^2}{\lVert\mathbf{A}\rVert^2} && \text{rearrange} \\ \therefore a &= \pm 5\dfrac{\lVert\mathbf{B}\rVert}{\lVert\mathbf{A}\rVert} && \text{solve} \end{aligned}

We get two solutions, and they are geometrically opposite each other; that’s the physical implication of the signs. (I suppose we could argue over whether or not to just take the principal square root, but I don’t think we should do that here because it would throw away potentially useful geometric information.) We can find a cleaner solution that accounts for this. Consider the following solution which exploits the concepts of “factoring” a vector into a magnitude and a direction and the properties of the dot product.

\begin{aligned} a\mathbf{A} &= -5\mathbf{B} && \text{given equation} \\ a\mathbf{A}\bullet\mathbf{A} &= -5\mathbf{B}\bullet\mathbf{A} && \text{dot both sides with \textbf{A}} \\ a\lVert\mathbf{A}\rVert^2 &= -5\lVert\mathbf{B}\rVert\widehat{\mathbf{B}}\bullet\lVert\mathbf{A}\rVert\widehat{\mathbf{A}} && \text{factor each vector into magnitude and direction}  \\ a\lVert\mathbf{A}\rVert^2 &= -5\lVert\mathbf{B}\rVert\lVert\mathbf{A}\rVert\,\widehat{\mathbf{B}}\bullet\widehat{\mathbf{A}} && \text{push magnitude through the dot product} \\ \therefore a &= -5\dfrac{\lVert\mathbf{B}\rVert}{\lVert\mathbf{A}\rVert}\,\widehat{\mathbf{B}}\bullet\widehat{\mathbf{A}} && \text{solve} \end{aligned}

See the geometry? It’s in the factor \widehat{\mathbf B}\bullet\widehat{\mathbf A}. If \mathbf{A} and \mathbf{B} are parallel, this factor is +1, and if they are antiparallel, it is -1. Convince yourself that those are the only two options in this case. (HINT: Show that each vector’s direction is a scalar multiple of the other vector’s direction.) If the two vectors aren’t collinear, no scalar a satisfies the original equation, so this solution doesn’t apply. If we’re solving for a, then both vectors are assumed given and we know their relative geometry.
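A quick numerical check of the second solution, a = -5\,(\lVert\mathbf{B}\rVert/\lVert\mathbf{A}\rVert)\,\widehat{\mathbf{B}}\bullet\widehat{\mathbf{A}}. The vectors below are made-up examples chosen so that \mathbf{B} is either parallel or antiparallel to \mathbf{A}; the helper names are hypothetical. The factor \widehat{\mathbf{B}}\bullet\widehat{\mathbf{A}} comes out as +1 or -1 and supplies the sign that the \pm of the first method left undetermined.

```python
import math

def dot(u, v):
    """Dot product of two equal-length tuples."""
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    """Magnitude ||u|| = sqrt(u . u)."""
    return math.sqrt(dot(u, u))

def unit(u):
    """Direction u_hat = u / ||u|| (u must be nonzero)."""
    n = norm(u)
    return tuple(x / n for x in u)

def solve_a(A, B):
    """a from a*A = -5*B using a = -5 (||B|| / ||A||) (B_hat . A_hat)."""
    return -5 * norm(B) / norm(A) * dot(unit(B), unit(A))

A = (2.0, 1.0, -2.0)        # ||A|| = 3
B_par = (4.0, 2.0, -4.0)    # parallel to A, ||B|| = 6
B_anti = (-4.0, -2.0, 4.0)  # antiparallel to A, ||B|| = 6

for B in (B_par, B_anti):
    # B_hat . A_hat is +1 or -1 (up to rounding), so it carries the sign.
    print(round(dot(unit(B), unit(A)), 12), solve_a(A, B))
```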

Let’s look at another example from first semester mechanics, Newton’s law of gravitation,

\mathbf{F} = G\dfrac{M_1 M_2}{\lVert\mathbf{r}_{12}\rVert^2}\left( -\widehat{\mathbf r}_{12}\right)

where \mathbf{r}_{12} = \mathbf{r}_1 - \mathbf{r}_2 and should be read as “the position of 1 relative to 2.” Let’s “take the magnitude of both sides” by first writing \mathbf{F} in terms of its magnitude and direction, dotting each side with a vector, and dividing both sides by the resulting common factor.

\begin{aligned} \lVert\mathbf{F}\rVert\left(-\widehat{\mathbf{r}}_{12}\right) &= G\dfrac{M_1 M_2}{\lVert\mathbf{r}_{12}\rVert^2}\left( -\widehat{\mathbf{r}}_{12}\right) && \text{given equation} \\ \lVert\mathbf{F}\rVert\left(-\widehat{\mathbf{r}}_{12}\bullet\widehat{\mathbf{r}}_{12}\right) &= G\dfrac{M_1 M_2}{\lVert\mathbf{r}_{12}\rVert^2}\left(-\widehat{\mathbf{r}}_{12}\bullet\widehat{\mathbf{r}}_{12}\right) && \text{dot both sides with the same vector} \\ \lVert\mathbf{F}\rVert\left(-\lVert\widehat{\mathbf{r}}_{12}\rVert^2\right) &= G\dfrac{M_1 M_2}{\lVert\mathbf{r}_{12}\rVert^2}\left(-\lVert\widehat{\mathbf{r}}_{12}\rVert^2 \right) && \text{dot products become scalars} \\ \therefore \lVert\mathbf{F}\rVert &= G\dfrac{M_1 M_2}{\lVert\mathbf{r}_{12}\rVert^2} && \text{divide both sides by the same scalar} \end{aligned}

Okay, this isn’t an Earth-shattering result because we knew in advance it had to be the answer, but my point is how we formally went about getting this answer. More specifically, the point is how we went about it without dividing by a vector.
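The same idea in code: rather than taking a square root of \mathbf{F}\bullet\mathbf{F}, dot \mathbf{F} with the known unit vector -\widehat{\mathbf{r}}_{12}, whose dot product with itself is 1. The masses and positions are hypothetical values chosen only for illustration, and `gravitational_force_on_1` is a made-up helper name.

```python
import math

G = 6.674e-11  # gravitational constant, N m^2 / kg^2

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gravitational_force_on_1(M1, M2, r1, r2):
    """F = G M1 M2 / ||r12||^2 (-r12_hat), with r12 = r1 - r2 (position of 1 relative to 2)."""
    r12 = sub(r1, r2)
    d = math.sqrt(dot(r12, r12))
    r12_hat = tuple(c / d for c in r12)
    coeff = G * M1 * M2 / d**2
    return tuple(-coeff * c for c in r12_hat), r12_hat

# Hypothetical masses (kg) and positions (m)
M1, M2 = 5.0, 7.0e3
r1, r2 = (3.0, 4.0, 0.0), (0.0, 0.0, 0.0)   # ||r12|| = 5
F, r12_hat = gravitational_force_on_1(M1, M2, r1, r2)

# "Take the magnitude of both sides" by dotting with -r12_hat
minus_r12_hat = tuple(-c for c in r12_hat)
F_mag = dot(F, minus_r12_hat)
print(F_mag, G * M1 * M2 / 5.0**2)
```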

Let’s now consider a final example from introductory electromagnetic theory, and this was the example that got me thinking about this entire process of “taking the magnitude of both sides” about a year ago. It’s the expression for the electric force experienced by a charged particle in the presence of an electric field (obviously not its own electric field).

\mathbf{F} = q\mathbf{E}

That one vector is a scalar multiple of another means the two must be collinear, so they must be either parallel or antiparallel. An issue here is that q is a signed quantity. Again, we have a choice of which vector to dot with both sides; we could use \mathbf{F} or we could use \mathbf{E}. If we use the former, we will eventually need to take the square root of the square of a signed quantity, which may lead us astray. Therefore, I suggest using the latter.

\begin{aligned} \mathbf{F} &= q\mathbf{E} && \text{given equation} \\ \mathbf{F}\bullet\mathbf{E} &= q\mathbf{E}\bullet\mathbf{E} && \text{dot both sides with the same vector} \\ \lVert\mathbf{F}\rVert\widehat{\mathbf{F}}\bullet\lVert\mathbf{E}\rVert\widehat{\mathbf{E}} &= q\lVert\mathbf{E}\rVert^2 && \text{factor LHS, simplify RHS} \\ \lVert\mathbf{F}\rVert\lVert\mathbf{E}\rVert\,\widehat{\mathbf{F}}\bullet\widehat{\mathbf{E}} &= q\lVert\mathbf{E}\rVert^2 && \text{push the magnitude through the dot product} \\ \therefore \lVert\mathbf{F}\rVert &= \dfrac{q}{\widehat{\mathbf{F}}\bullet\widehat{\mathbf{E}}}\lVert\mathbf{E}\rVert && \text{solve} \end{aligned}

This may look overly complicated, but it’s quite logical, and it reflects geometry. If q is negative, then the dot product will also be negative and the entire quantity will be positive. If q is positive, then the dot product will also be positive and again the entire quantity will be positive. Geometry rescues us again, as it should in physics. We can also rearrange this expression to solve for either q or \lVert\mathbf{E}\rVert with the sign of q properly accounted for by the dot product. By the way, \widehat{\mathbf{F}} and \widehat{\mathbf{E}} can’t be orthogonal because then their dot product would vanish and the above expression would blow up. Geometry and symmetry, particularly the latter, preclude this from happening.
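A short numerical sketch of \lVert\mathbf{F}\rVert = q\lVert\mathbf{E}\rVert/(\widehat{\mathbf{F}}\bullet\widehat{\mathbf{E}}), assuming q \neq 0 and \mathbf{E} \neq \mathbf{0}. The field and charges are hypothetical values; `force_magnitude` is an illustrative name. Whichever sign q has, the dot product carries the same sign and the magnitude comes out positive.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def unit(a):
    n = norm(a)
    return tuple(x / n for x in a)

def force_magnitude(q, E):
    """||F|| = q ||E|| / (F_hat . E_hat) for F = qE (requires q != 0 and E != 0)."""
    F = tuple(q * c for c in E)
    cos_FE = dot(unit(F), unit(E))   # +1 when q > 0, -1 when q < 0
    return q * norm(E) / cos_FE

E = (300.0, 0.0, 400.0)   # N/C, ||E|| = 500
for q in (2e-6, -2e-6):   # C
    print(q, force_magnitude(q, E))   # positive either way
```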

In summary, “taking the magnitude of both sides” of a simple vector equation presents some challenges that are mitigated by exploiting geometry, something that is neglected in introductory calculus-based and algebra-based physics courses. I suggest we try to overcome this by showing students how to formally manipulate such equations. One advantage of doing this is that students will see how vector algebra works in more detail than usual. Another advantage is that students will learn to exploit geometry in the absence of coordinate systems, which is one of the original purposes of using vectors after all.

Do you think this would make a good paper for The Physics Teacher? Feedback welcome!

Matter & Interactions II, Week 12

We’re hanging out in chapter 19 looking at the properties of capacitors in circuits.

In response to my (chemist) department chair’s accusation that I’m not rigorous enough in my teaching of “the scientific method” as it’s practiced in chemistry, I just had “the talk” with the class about “THE” scientific method and about how it doesn’t exist. I will never forget Dave McComas (IBEX) telling the audience at an invited session I organized at AAPT in Ontario (CA) that we MUST stop presenting “the scientific method” as it is too frequently presented in the textbooks because it simply does not reflect how science works. No one hypothesizes a scientific discovery. Once a prediction is made and experimentally (or, in the case of astronomy, observationally) verified, that’s not a discovery, because the outcome was expected. Even if the prediction isn’t verified, one of the known possible outcomes is that the prediction is wrong. There’s nothing surprising here. True discoveries happen when we find something we had no reason to expect to be there in the first place. The Higgs boson? Not a discovery, because it was predicted forty or so years ago and we only recently had the technology to test for its presence. I don’t think anyone honestly expected it not to be found, but I think many theoretical particle physicists (not so) secretly hoped it wouldn’t be found, because then we would actually have learned something new (namely, that the standard model has problems).

The “scientific method” simply doesn’t exist as a finite numbered sequence of steps whose ordering is the same from discipline to discipline. Textbooks need to stop presenting it that way. Scientific methodology is more akin to a carousel onto which astronomers, chemists, physicists, geologists, or biologists (and all the others I didn’t specify) jump at different places. Observational astronomers simply don’t begin by “forming an hypothesis” as too many overly simplistic sources may indicate. Practitioners in different disciplines begin the scientific process at different places by the very nature of their disciplines, and I don’t think there’s a way to overcome that.

Rather than a rote sequence of steps, scientific methodology should focus on validity through testability and falsifiability. I know there are some people who think that falsifiability has problems, and I acknowledge them. However, within the context of introductory science courses, testability and falsifiability together form a more accurate framework for how science actually works. This is the approach I have been taking for over a decade in my introductory astronomy course. It is not within my purview to decide what is and is not appropriate for other disciplines, like chemistry. My chemist colleagues can present scientific methodology as they see fit. I ask for the same respect in doing so within my disciplines (physics and astronomy).

I now consider “the scientific method” to have been adequately “covered” in my calculus-based physics course.

Feedback welcome as always.


Matter & Interactions II, Week 11

More with circuits, and this time capacitors, and the brilliantly simple description M&I provides for their behavior. In chapter 19, we see that traditional textbooks have misled students in a very serious way regarding the behavior of capacitors. Those “other” textbooks neglect fringe fields. Ultimately, and unfortunately, this means that capacitors should not work at all! The reason becomes obvious in chapter 19 of M&I. We see that in a circuit consisting of a charged capacitor and a resistor, it’s the capacitor’s fringe field that initiates the redistribution of surface charge that, in turn, establishes the electric field inside the wire that drives the current. The fringe field plays the same role that a battery’s field plays in a circuit with a flashlight bulb and battery: it initiates the transient interval during which the surface charge redistributes. As you may have already guessed, the capacitor’s fringe field is also what stops the charging process for an (initially) uncharged capacitor in series with a battery. As the capacitor charges, its fringe field increases and counters the electric field of the redistributed surface charges, thus decreasing the net field with time. If we want functional circuits, we simply cannot neglect fringe fields.

Ultimately, the M&I model for circuits amounts to the reality that a circuit’s behavior is entirely due to surface charge redistributing itself along the circuit’s surface in such a way as to create a steady state or a quasisteady state. It’s just that simple. You don’t need potential difference. You don’t need resistance. You don’t need Ohm’s law. You only need charged particles and electric fields.

One thing keeps bothering me, though. Consider one flashlight bulb in series with a battery. The circuit draws a certain current i_1, for example. Now consider adding nothing but a second, identical flashlight bulb in parallel with the first one. Each bulb’s brightness should be very nearly the same as that of the original bulb. The parallel circuit draws twice the current of the original lone bulb (i_2 = 2i_1), but that doubled current is divided equally between the two parallel flashlight bulbs. That’s all perfectly logical, and I can correctly derive this result algebraically. I end up with a factor of 2 multiplying the product of either bulb’s filament’s electron number density, cross sectional area, and electron mobility.

i_2 \propto 2nAu
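One way the algebra can go, sketched in M&I’s microscopic notation under two assumptions of my own: the connecting wires are thick enough that the electric field inside them is negligible, and each filament has length L. The loop rule through either bulb fixes the filament field E_f, and the node rule at the junction adds the two branch currents:

\begin{aligned} \text{emf} - E_f L &= 0 && \text{loop rule through either bulb} \\ E_f &= \dfrac{\text{emf}}{L} && \text{same filament field as the lone bulb} \\ i_1 &= nAuE_f && \text{lone bulb} \\ i_2 &= nAuE_f + nAuE_f = 2nAu\,E_f = 2i_1 && \text{node rule at the junction} \end{aligned}

Written this way, the factor of 2 enters through the node rule as the sum of two identical branch currents.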

My uneasiness is over the quantity to which we should assign the factor of 2. A desktop experiment in chapter 18 establishes that we get a greater current in a wire when the wire’s cross sectional area increases. Good. However, in putting two bulbs in parallel, is it really obvious that the effective cross sectional area of the entire circuit has doubled? It’s not so obvious to me, because the cross sectional area could only double by virtue of adding an identical flashlight bulb in parallel with the first one. Unlike in the experiment I mentioned, nothing about the wires in the circuit changes. Adding a second bulb surely doesn’t change the wire’s mobile electron number density; that’s silly. Adding a second bulb also surely doesn’t change the wire’s electron mobility; that’s equally silly. Well, that leaves the cross sectional area to which we could assign the factor of 2, but it’s not obvious to me that this is so obvious. One student pointed out that the factor of 2 probably shouldn’t be thought of as “assigned to” any particular variable but rather to the quantity nAu as a whole. This immediately reminded me of the relativistic expression for a particle’s momentum, \vec{p} = \gamma m \vec{v}, where, despite stubborn authors who refuse to actually read Einstein’s work, the \gamma applies to the quantity as a whole and not merely to the mass.

So, my question boils down to whether or not there is an obvious way to “assign” the factor of 2 to the cross sectional area. I welcome comments, discussion, and feedback.


Matter & Interactions II, Week 10

Chapter 18. Circuits. You don’t need resistance. You don’t need Ohm’s law. All you need is the fact that charged particles respond to electric fields created by other charged particles. It’s just that simple.

When I took my first electromagnetism course, I felt stupid because I never could just look at a circuit and tell what was in series and what was in parallel. And the cube of resistors…well, I still have bad dreams about that. One thing I know now that I didn’t know then is that according to traditional textbooks, circuits simply should not work. Ideal wires don’t exist, and neither do ideal batteries nor ideal light bulbs. Fringe fields, however, do indeed exist, and capacitors just wouldn’t work without them. So basically, I now know that the traditional textbook treatment of circuits is not just flawed, but deeply flawed to the point of being unrealistic.

Enter Matter & Interactions. M&I’s approach to circuits invokes the concept of a surface charge gradient to establish a uniform electric field inside the circuit, which drives the current. This was tough to wrap my brain around at first, but now I really think it should be the new standard mainstream explanation for circuits in physics textbooks. The concept of resistance isn’t necessary. It’s there, but not in its usual macroscopic form. M&I treats circuits from a purely microscopic point of view, with fundamental parameters like mobile electron number density, electron mobility, and conductivity, and with geometry in the form of wire length and cross sectional area. Combine these with charge conservation (in the form of the “node rule”) and energy conservation per charge (in the form of the “loop rule”) and that’s all you need. That’s ALL you need. No more “total resistance” and “total current” nonsense either. In its place is a tight, coherent, and internally consistent framework where the sought after quantities are the steady state electric field in each part of the circuit and the resulting current in each part. No more remembering that series resistors simply add and parallel resistors add reciprocally. Far more intuitive is the essentially directly observable fact that putting resistors in series is effectively the same as increasing the filament length and putting resistors in parallel is effectively the same as increasing the circuit’s cross sectional area. It’s so simple, like physics is supposed to be.
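A minimal sketch of those last two claims using only the electron current i = nAuE, the node rule, and the loop rule. It assumes thick connecting wires with negligible internal field and identical filaments, and every parameter value below is made up for illustration, not measured. Two filaments in series carry the same current as one filament of twice the length, and two in parallel draw the same total current as one filament of twice the cross sectional area.

```python
import math

# Illustrative, made-up filament parameters (not measured values)
N = 1e29      # mobile electron number density, m^-3
A = 1e-9      # filament cross sectional area, m^2
U = 5e-4      # electron mobility, (m/s)/(V/m)
L = 0.01      # filament length, m
EMF = 1.5     # battery emf, V

def electron_current(n, area, u, E):
    """Electron current i = n A u E (electrons per second)."""
    return n * area * u * E

def one_bulb(n=N, area=A, u=U, length=L, emf=EMF):
    # Loop rule, wire fields neglected: emf - E*length = 0
    E = emf / length
    return electron_current(n, area, u, E)

def two_in_series():
    # Node rule: same current through identical filaments, so same E in each.
    # Loop rule: emf - E*L - E*L = 0
    E = EMF / (2 * L)
    return electron_current(N, A, U, E)

def two_in_parallel():
    # Each loop contains one filament: emf - E*L = 0
    E = EMF / L
    # Node rule: the battery current is the sum of the two branch currents
    return electron_current(N, A, U, E) + electron_current(N, A, U, E)

print(two_in_series() / one_bulb())                         # series halves the current
print(math.isclose(two_in_series(), one_bulb(length=2 * L)))  # same as a longer filament
print(math.isclose(two_in_parallel(), one_bulb(area=2 * A)))  # same as a thicker filament
```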

Of course, in the next chapter (chapter 19) the traditional “Ohm’s law” model of circuits is seen to be emergent from chapter 18’s microscopic description, but honestly, I see no reason to dwell on this. Most of my students are going to become engineers anyway, and they’ll have their own yearlong circuit courses in which they’ll learn all the necessary details from the engineering perspective. For now, they’re much better off understanding how circuits REALLY work. If they do, they’ll be far ahead of where I was in their shoes as an introductory student, and they will have a deeper understanding than anyone else in their classes after transferring. That’s my main goal after all.

Feedback welcome.

Conceptual Understanding in Introductory Physics XXVIII

You may not agree that the topic(s) of this question belong in an introductory calculus-based physics course, but I’m going to pretend they do for the duration of this post. Gradient, divergence, and curl are broached in Matter & Interactions within the context of electromagnetic fields. Actually, gradient appears in the mechanics portion of the course.

One problem with these three concepts, especially divergence and curl, is the distinction between their actual definitions and how they are calculated. The former are rarely, if ever, seen at the introductory level and usually first appear in upper level courses. However, some authors [cite examples here] replace the physical definitions with the mathematical symbols invented by Heaviside and Gibbs to represent the calculation of these quantities. In other words, the divergence of \mathbf{A} is frequently defined as \nabla\cdot\mathbf{A} and the curl of \mathbf{A} is frequently defined as \nabla\times\mathbf{A}. These should be treated as nothing more than symbols representing their respective physical quantities and should not be taken as equations for calculation. If one insists on keeping this notation, then the dot and cross should at least be kept with the nabla symbol so that \nabla\cdot represents divergence and \nabla\times represents curl. Either way, these are operators that operate on vectors, and their symbols should reflect that concept and should be interpreted as such and not as a recipe for calculation. This book by Tai was extremely helpful in getting this point across to me.
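To see the difference between a definition and a recipe, here is a rough numerical sketch of the limit definitions: divergence as outward flux per unit volume through a small closed surface, and each component of curl as circulation per unit area around a small loop perpendicular to that direction. No nabla appears anywhere. The test field \mathbf{F} = (xy, yz, zx) and all function names are my own hypothetical choices, and Cartesian axes are used only to place the little cube and squares, so this illustrates the definitions rather than being fully coordinate free. By hand, this field has divergence x + y + z and curl (-y, -z, -x).

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def F(x, y, z):
    """Hypothetical test field F = (xy, yz, zx)."""
    return (x * y, y * z, z * x)

def flux_over_cube(field, p, h, m=4):
    """Midpoint-rule flux of field out of a cube of side h centered at p."""
    total = 0.0
    dA = (h / m) ** 2
    for axis in range(3):
        others = [k for k in range(3) if k != axis]
        for sign in (1, -1):
            n_hat = [0.0, 0.0, 0.0]
            n_hat[axis] = float(sign)          # outward unit normal of this face
            for i in range(m):
                for j in range(m):
                    point = list(p)
                    point[axis] += sign * h / 2
                    point[others[0]] += -h / 2 + (i + 0.5) * h / m
                    point[others[1]] += -h / 2 + (j + 0.5) * h / m
                    total += dot(field(*point), n_hat) * dA
    return total

def circulation_around_square(field, p, axis, h, m=4):
    """Midpoint-rule circulation around a square of side h centered at p,
    perpendicular to coordinate axis `axis`, traversed by the right-hand rule."""
    a, b = (axis + 1) % 3, (axis + 2) % 3      # e_a x e_b = e_axis
    # (axis held fixed, its offset, axis moved along, direction of travel)
    edges = [(b, -h / 2, a, 1), (a, h / 2, b, 1), (b, h / 2, a, -1), (a, -h / 2, b, -1)]
    dl = h / m
    total = 0.0
    for fixed, offset, moving, sign in edges:
        t_hat = [0.0, 0.0, 0.0]
        t_hat[moving] = float(sign)            # unit tangent along this edge
        for i in range(m):
            point = list(p)
            point[fixed] += offset
            point[moving] += -h / 2 + (i + 0.5) * dl
            total += dot(field(*point), t_hat) * dl
    return total

def divergence(field, p, h=1e-2):
    """Flux per unit volume through a small closed surface around p."""
    return flux_over_cube(field, p, h) / h**3

def curl(field, p, h=1e-2):
    """Each component: circulation per unit area around a small loop normal to that axis."""
    return tuple(circulation_around_square(field, p, k, h) / h**2 for k in range(3))

p = (1.0, 2.0, 3.0)
print(divergence(F, p))   # close to 1 + 2 + 3 = 6
print(curl(F, p))         # close to (-2, -3, -1)
```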

Gradient has its own unique problem in that some sources claim that one can only take the gradient of a scalar, which is patently false. One can indeed take the gradient of, for example, a gradient, but the object one gets back is not a vector. If we adopt a unified approach to vector algebra and vector calculus, we find that there are patterns associating the operand and the result when using these vector operators. For example, operating on a vector with \nabla doesn’t produce a vector; it produces a second rank tensor. This is one reason I would love to find a way to bring this approach into the introductory course. So many things would be unified.
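A numerical sketch of that last claim, assuming the Jacobian convention in which row i, column j holds the rate of change of component i along direction j (some authors use the transpose for the gradient of a vector). The hypothetical helper `vector_gradient` builds a 3×3 array from directional derivatives of a vector field. Feeding it a direction returns a vector, and a linear machine that turns vectors into vectors is a second rank tensor. This uses Cartesian components, so it illustrates the object rather than the coordinate-free treatment the note below asks for.

```python
def F(x, y, z):
    """Hypothetical test field F = (xy, yz, zx)."""
    return (x * y, y * z, z * x)

def directional_derivative(field, p, d, h=1e-5):
    """Central-difference rate of change of a vector field at p along direction d."""
    plus = field(*(pi + h * di for pi, di in zip(p, d)))
    minus = field(*(pi - h * di for pi, di in zip(p, d)))
    return tuple((u - v) / (2 * h) for u, v in zip(plus, minus))

def vector_gradient(field, p):
    """3x3 array T with T[i][j] = rate of change of component i along axis j."""
    basis = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    columns = [directional_derivative(field, p, e) for e in basis]
    return [[columns[j][i] for j in range(3)] for i in range(3)]

def apply(T, d):
    """Feed a direction into the tensor and get a vector back."""
    return tuple(sum(T[i][j] * d[j] for j in range(3)) for i in range(3))

p = (1.0, 2.0, 3.0)
T = vector_gradient(F, p)     # close to [[2, 1, 0], [0, 3, 2], [3, 0, 1]]
d = (0.6, 0.0, 0.8)           # an arbitrary unit direction
print(apply(T, d), directional_derivative(F, p, d))   # the two should agree
```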

But now, on to the questions I want to ask here:

(a) Write a conceptual definition of gradient in words.

(b) Write a mathematical definition of gradient that does not depend on any particular coordinate system. You must not use the nabla symbol.

(c) Write a conceptual definition of divergence in words.

(d) Write a mathematical definition of divergence that does not depend on any particular coordinate system. You must not use the nabla symbol.

(e) Write a conceptual definition of curl in words.

(f) Write a mathematical definition of curl that does not depend on any particular coordinate system. You must not use the nabla symbol.


(Note: I need to revisit this post in the future to make sure the notion of applying gradient to a vector quantity can be handled in the coordinate free way I have in mind. My intuition is that it can be, but I need to work out some details.)

Angular Quantities II

In this post, I will address the first question on the list in the previous post. What exactly does it mean for something to be a vector?

In almost every introductory physics course, vectors are introduced as “quantities having magnitude and direction” and are eventually equated to graphical arrows. A vector is neither of these, but is something far more sophisticated. Remember that I’m coming at this as a physicist, not a pure mathematician. I will probably get more than a few things incorrect. Let me know if/when that happens. Let me see if I can present this at a level suitable for an introductory calculus-based physics course. Imagine you walk into class on the first day and start talking. Here goes.

We live in a Universe which has measurable properties and which contains physical entities that also have measurable properties. A lot of physics consists of attempting to measure, and thus quantify, these properties (experiment). More important to some physicists is describing these properties mathematically and making predictions about them (theory) rather than attempting to measure them. We can invent mathematical objects to represent these measurable properties. The word represent is important here, because the mathematical object representing an entity is not the same thing as the entity itself. These mathematical objects themselves have properties, and these properties allow us to manipulate these objects so as to use them to make predictions about Nature.

The properties possessed by the mathematical objects we use to describe Nature collectively form something with a very strange name: a vector space. That sounds very technical and complicated. It is indeed a very technical term because it means something profound. However, as I will try to convince you now, it is not necessarily complicated at all. Let me attempt to show you.

I will use bold symbols (e.g. \mathbf{u}, \mathbf{v}, \mathbf{w} etc.) to represent mathematical objects with the properties that collectively form a vector space. These mathematical objects have a generic name: vectors. Yes, that’s their name. Note that there is nothing at all here to do with arrows or anything else really. Vectors are nothing more than mathematical objects with properties that let us model and make predictions about the properties of the Universe we observe and try to understand in Nature. Be careful to understand that there are two sets of properties here, those of the Universe and its inhabitant entities, and those of the mathematical objects we use to represent those things. I’m not saying this is the best way to describe this, but it’s a start.

I will use italic symbols (e.g. a, b, c etc.) to represent ordinary numbers you are already familiar with. Technically, they are real numbers, and every math course you have ever taken has used them whether or not you knew they had a name.

In a vector space, there are two and only two mathematical operations defined: addition and scalar multiplication. That’s all there is. You’ve known how to add and multiply for a long time, and there is nothing new to see here. Consider addition. In any vector space, you can take any two vectors, add them, and get a third vector. It’s just that simple. However, I must warn you that there is indeed something deeper going on here, but there’s no need to bring it up yet because it’s a geometry issue. We will get to it soon enough. So for now, addition is the same addition you’ve already become familiar with. Oh, here’s a new technical gem for you. The simple fact that adding two vectors gives you a third vector is a property that we use to say that the vector space is closed under addition. All that means is that when you add two vectors, you get a vector. All three inhabit the vector space. That’s simple to understand. You can’t add two vectors and get, say, a real number. You must always get a vector. That’s very simple. Let’s say it more mathematically.

  • In a vector space, addition is a closed operation.

    If \mathbf{u} and \mathbf{v} are vectors then \mathbf{u}+\mathbf{v} is also a vector.

Now consider scalar multiplication. You’ve known how to multiply real numbers for a long time, and again, there isn’t much new to see here. Multiplying a scalar and a vector gives another vector. We will explore the geometric implications of this later. Like vector addition, scalar multiplication is a closed operation.

  • In a vector space, scalar multiplication is a closed operation.

    If \mathbf{w} is a vector and c is a scalar, then c\mathbf{w} is also a vector.

Here is a list of remaining properties that define a vector space.

  • In a vector space, addition is commutative, meaning that the order of the vectors being added doesn’t matter.

    \mathbf{u}+\mathbf{v} = \mathbf{v}+\mathbf{u}

  • In a vector space, addition is associative, meaning vectors can be grouped in any way as long as the order isn’t changed.

    (\mathbf{u}+\mathbf{v})+\mathbf{w} = \mathbf{u}+(\mathbf{v}+\mathbf{w})

  • In a vector space, scalar multiplication is associative. The things you’re multiplying can be grouped differently as long as their order isn’t changed. You get the same vector either way. Cool!

    a(b\mathbf{c}) = (ab)\mathbf{c}

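  • In a vector space, scalar multiplication is distributive over scalar addition. When you have the sum of two scalars multiplying a vector, the thing you get back is the sum of each scalar multiplying that vector.

    (a+b)\mathbf{u} = a\mathbf{u}+b\mathbf{u}
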
  • In a vector space, scalar multiplication is distributive over vector addition. Some authors equivalently say that vector addition is linear. Both of these mean the same thing, but I think the second way of saying it is more important, and I will try to show why later. When you have a scalar multiplying the sum of two vectors, the vector you get back is the sum of that scalar multiplying each vector separately.

    c(\mathbf{u}+\mathbf{v}) = c\mathbf{u}+c\mathbf{v}

  • In a vector space, there is a multiplicative identity: a scalar such that multiplying any vector by it gives you the same vector back. This effectively defines a unity element, commonly called 1 (one). This is important because sometimes we can exploit what I like to call a “sneaky 1” to help manipulate a mathematical expression. More on that when we need it.

    1\mathbf{u} = \mathbf{u}

  • In a vector space, there is an additive identity element such that adding it to any vector gives that same vector back as the sum. This is effectively a definition of a zero vector. Seeing zero written this way (as a vector) may seem strange, but you will get used to it.

    \mathbf{b} + \mathbf{0} = \mathbf{b}

  • In a vector space, every vector has an inverse element, also a member of the vector space, such that adding the two gives the additive identity (the zero vector). For any vector \mathbf{v} there is a vector -\mathbf{v} such that the two sum to zero. Do not think of the - sign as subtraction. Think of it as merely a symbol that turns the vector into its additive inverse.

    \mathbf{v}+(-\mathbf{v}) = \mathbf{0}

We’re done. That’s it. These properties collectively and operationally define a vector space that is inhabited by mathematical objects called vectors. These properties also define the things we can do to manipulate vectors. Note there is no mention of subtraction, and there is no mention of division. There is vector addition and scalar multiplication. That’s all there is. This is really simple! Also note there is no mention of magnitude, direction, arrows, components, dot products, or cross products. If you don’t know what some of those terms mean, don’t worry. We will define them later.
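The following worked example manipulates vectors using nothing but the properties listed above. It is a sketch of why the additive inverse can be written with scalar multiplication. First, multiplying any vector by the scalar 0 gives the zero vector. This holds because 0 + 0 = 0 for real numbers and because scalar multiplication distributes over scalar addition:

    0\mathbf{v} = (0+0)\mathbf{v} = 0\mathbf{v} + 0\mathbf{v}

Add -(0\mathbf{v}) to both sides. Then apply associativity, the inverse property, and the additive identity, and you are left with \mathbf{0} = 0\mathbf{v}. Next, a “sneaky 1” and the same distributive property give

    \mathbf{v} + (-1)\mathbf{v} = 1\mathbf{v} + (-1)\mathbf{v} = (1+(-1))\mathbf{v} = 0\mathbf{v} = \mathbf{0}

So (-1)\mathbf{v} is the additive inverse -\mathbf{v}. (Additive inverses are unique, which also follows from the properties.) That is why \mathbf{u}-\mathbf{v} can be read as shorthand for \mathbf{u}+(-1)\mathbf{v}, which uses only vector addition and scalar multiplication.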

Let me now convince you that you have dealt with vector spaces and vectors for many years and didn’t realize it. Consider the real numbers (that’s all positive numbers, negative numbers, and zero regardless of whether they’re rational or not, and regardless of whether they’re integers or not). Do they meet each and every one of the properties above, with real numbers serving as both the vectors and the scalars? To convince yourself that they do, go through them one by one. Does adding two real numbers give a real number? Yes (3.2 + 5.9 = 9.1). Does adding 0 to 5 give 5? Yes. Does adding 6 to -6 give 0? Yes. You can do the rest. Therefore, I claim that without knowing it, you have been using vector spaces and vectors all along!

Now, let me ask you a new question. Consider only the natural numbers. Recall that these numbers are the ones you use for counting, and you’ve probably been using them longer than you’ve been using real numbers! Do the natural numbers (counting numbers) form a vector space with each number being a vector? I will tell you that the answer is no, they do not, but I don’t want you to take my word for it. Go through each of the above properties one by one using counting numbers and see if you can convince yourself that these numbers do not inhabit a vector space.
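The one-by-one checks suggested in the last two paragraphs can be automated as a spot check. Below is a minimal Python sketch, not a proof. The function `check_vector_space` and the sample lists are hypothetical names made up for this illustration. Only the listed samples are tested, so passing every check is evidence rather than proof. The scalars are real numbers, as above. Two assumptions are worth stating loudly: exact `Fraction` values stand in for real numbers, and the counting numbers are taken to start at 1.

```python
from fractions import Fraction
from itertools import product


def check_vector_space(is_member, vectors, scalars):
    """Spot-check the vector space properties on sample vectors and scalars.

    is_member -- predicate telling whether a value belongs to the candidate set.
    Returns the set of property names that FAIL for at least one sample.
    An empty set is evidence, not proof: only the samples are tested.
    """
    zero, one = Fraction(0), Fraction(1)
    failures = set()
    # Properties involving only vector addition.
    for u, v, w in product(vectors, repeat=3):
        if not is_member(u + v):
            failures.add("closure under addition")
        if u + v != v + u:
            failures.add("commutativity of addition")
        if (u + v) + w != u + (v + w):
            failures.add("associativity of addition")
    # Properties involving scalar multiplication.
    for a, b in product(scalars, repeat=2):
        for u, v in product(vectors, repeat=2):
            if not is_member(a * u):
                failures.add("closure under scalar multiplication")
            if a * (b * u) != (a * b) * u:
                failures.add("associativity of scalar multiplication")
            if (a + b) * u != a * u + b * u:
                failures.add("distributivity over scalar addition")
            if a * (u + v) != a * u + a * v:
                failures.add("distributivity over vector addition")
    # Identity and inverse properties.
    for u in vectors:
        if one * u != u:
            failures.add("multiplicative identity")
        if not is_member(zero) or u + zero != u:
            failures.add("additive identity")
        if not is_member(-u) or u + (-u) != zero:
            failures.add("additive inverse")
    return failures


def is_real(x):
    # Every exact Fraction is a (rational) real number.
    return isinstance(x, Fraction)


def is_natural(x):
    # Counting numbers 1, 2, 3, ... (use x >= 0 instead if you count 0 as natural).
    return x.denominator == 1 and x >= 1


scalars = [Fraction(-2), Fraction(0), Fraction(1, 2), Fraction(3)]
real_vectors = [Fraction(-6), Fraction(0), Fraction(16, 5), Fraction(59, 10), Fraction(6)]
natural_vectors = [Fraction(1), Fraction(2), Fraction(5)]

print(sorted(check_vector_space(is_real, real_vectors, scalars)))
print(sorted(check_vector_space(is_natural, natural_vectors, scalars)))
```

Running it prints `[]` for the real-number samples. For the counting numbers it prints `['additive identity', 'additive inverse', 'closure under scalar multiplication']`. If you count 0 as natural, the additive identity check passes, but the other two failures remain. Exact fractions were chosen on purpose. With binary floats, `(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)` evaluates to `False` in Python. That is a rounding artifact of the computer, not a failure of the real numbers.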

This is a physics class, so let’s get more physicsy. In physics, as in all science, we use a system of units called the SI System. All scientists know about this system of units, but some subdisciplines (e.g. astrophysics) don’t use it yet. I hope this changes because it will make many things simpler, but I digress. The SI System consists of seven independent fundamental units that represent seven fundamental quantities: mass, length (I prefer spatial displacement), time (I prefer temporal displacement), thermodynamic temperature, amount of substance, luminous intensity, and electric current. All physically measurable properties in our Universe can be expressed as various combinations of these seven fundamental quantities and their units. Your question is: Do these seven fundamental quantities form a vector space? What a weird question! Still, it’s one you can address by, again, working your way through the defining properties of a vector space given above. See what you can come up with.
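The claim that every measurable quantity is a combination of the seven base quantities can be made concrete with a common bookkeeping scheme from dimensional analysis. In that scheme you record the power to which each base unit appears. The Python sketch below assumes this scheme. The names `BASE_UNITS`, `unit_string`, and `combine` are hypothetical, chosen for this illustration. The sketch only shows the bookkeeping. Whether these lists of exponents, or the seven quantities themselves, satisfy the vector space properties is still the question to work through.

```python
# Exponent order follows the seven SI base quantities listed above:
# mass, length, time, temperature, amount of substance, luminous intensity, current.
BASE_UNITS = ("kg", "m", "s", "K", "mol", "cd", "A")


def unit_string(exponents):
    """Render a tuple of seven base-unit exponents as a readable unit string."""
    parts = []
    for symbol, power in zip(BASE_UNITS, exponents):
        if power == 1:
            parts.append(symbol)
        elif power != 0:
            parts.append(f"{symbol}^{power}")
    return " ".join(parts) if parts else "1"


def combine(first, second):
    """Multiplying two quantities adds their base-unit exponents."""
    return tuple(p + q for p, q in zip(first, second))


length = (0, 1, 0, 0, 0, 0, 0)     # metre
velocity = (0, 1, -1, 0, 0, 0, 0)  # metre per second
force = (1, 1, -2, 0, 0, 0, 0)     # newton
energy = combine(force, length)    # newton times metre

print(unit_string(velocity))  # m s^-1
print(unit_string(force))     # kg m s^-2
print(unit_string(energy))    # kg m^2 s^-2
```

The newton-metre comes out as kg m^2 s^-2, which is the joule.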

This may seem a very strange way to begin introductory physics, and it is! It’s strange, but I hope it will help get you to a place where your understanding is deeper than it would be had we begun in a traditional way. Accept the strangeness and discomfort you feel right now, and then let it go. There’s much learning to be done, and it starts here.