I’m writing this for my string theory class. We are basing our lectures on Zwiebach – A First Course in String Theory, and starting off with special relativity. Not everybody in the class has a physics background (some are pure and applied mathematics students), so questions are likely to come up that show where I need to fill in some background. We had a question about the invariant measure in special relativity (SR) and why the time term has a different sign from the space terms. I’ll do my best to explain here. Note that I am not explaining it in the precise chronological order of discoveries.

We start the picture off with relativity before SR – that is, Galilean Relativity. This simply states that the laws of motion are the same in all inertial (non-accelerating) frames. That may sound straightaway like SR, but there’s a crucial ingredient missing which we will see in a bit.

What this means is that if you are watching some physical process in your own frame of reference (let’s say a particle whose position at time t is given by x(t)), then someone who is moving with respect to you (with speed v) would see everything happening with some additional velocity, i.e. they would see x'(t)=x(t)-v t (where the prime isn’t a derivative, it is just the value of x from the point of view of the second observer), and such motion would satisfy the equations of motion for whatever the dynamics of that object should be.

This could be, for instance, the equation for a pendulum. If x(t) is a solution to the equations for the pendulum then so is x'(t)=x(t)-v t. (In all of this we are talking about a single direction, but it holds true for a vector in a three dimensional space.)

There’s something really important here. For both parties, there is a universal ticking clock, that marks off t. And if we had a third observer, they would use the same clock with the same ticks. It seems very reasonable: time goes the same for all observers. That will soon fall…

The crucial thing that we see is that although the positions (x(t) and x'(t)) would be different in each frame, and so would the velocities \dot{x}(t) and \dot{x}'(t), the accelerations \ddot{x}(t)=\ddot{x}'(t) are the same as can be seen by taking two derivatives of x'(t) above. Why is this so important?
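To spell out the two derivatives, here is a short worked version, assuming (as throughout) that the relative speed v is constant in time:

```latex
\begin{aligned}
x'(t) &= x(t) - v t \\
\dot{x}'(t) &= \dot{x}(t) - v \\
\ddot{x}'(t) &= \ddot{x}(t)
\end{aligned}
```

The velocities differ by the constant v, and that constant disappears on the second derivative.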

Well, in Newtonian mechanics, the equation of motion is F=ma. So long as the forces are equivalent between non-accelerating frames, and the masses don’t change, then if the accelerations are the same, physics does what physics should do independent of the frame that you are observing it from. Physics valid in one frame is valid in a relatively moving frame – that is Galilean relativity.

We can talk about the symmetries of a space where all of the above is true. The symmetries essentially tell us that giving a boost in some direction leaves physics invariant x(t)\rightarrow x'(t)=x(t)-v t. We also know that translation will leave physics invariant – ie. letting x(t)\rightarrow x'(t)=x(t)+b. And indeed rotations in space are a symmetry of physics \vec{x}\rightarrow \vec{x}'=Rot(\theta,\phi)\vec{x}, where Rot is a matrix which rotates vectors in three dimensions.

OK, so how does special relativity differ from this? Well, let’s see how far we can push the question of the frame independence of physics. We go now to electromagnetism and Maxwell’s equations in particular. These are the equations which are the equivalent of Newton’s equations for electromagnetic fields. They tell you how they interact with one another – ie. when you alter a magnetic field, how does the electric field change, etc. If Galilean relativity is to be believed, then the equations should hold true in all frames, moving or not…but there’s an issue that arises. Maxwell’s equations lead to the wave equations for electromagnetism. For the electric field, this is:

\frac{1}{c^2}\frac{\partial^2 E}{\partial t^2}-\nabla^2 E=0

Fine, that’s just an equation of motion which says how quickly the electric field changes depending on its spatial gradient, but importantly it has a parameter in there, c, the speed of light. This says that depending on the value of c, the relationship between the acceleration of the field and the spatial gradient will be different. That doesn’t sound too bad until you go back to your Galilean relativity and see that if light is going at c in one frame, then it will be going at c+v in a relative frame, and so the constant in the equation of motion will be different. This says then that the dynamics of light in a static frame (one where c is the value we all know and love) will be different if it is in a moving frame (where light will be moving at a different speed). This means that you would be able to tell which frame is truly the static one by looking at the way the electromagnetic field behaves and reading off the value of c in that frame and then working backwards from there.

So, we have to drop one of two things – either physics is not frame independent and there is some absolute rest frame or alternatively, the speed of light is the same in all frames. It would seem like the former is the easiest one to consider, but that was put to rest (or at least was sent in the general direction of bed) by the experiments of Michelson and Morley.

This meant that indeed the speed of light really was a constant, in any reference frame, but this doesn’t tie in with Galilean relativity, in which velocities must add in the way we would imagine from our daily lives: something moving at velocity v_1 with respect to something moving at velocity v_2 will, to the second observer, be moving at v_1-v_2, and the same should hold if v_1=c. Now we know this doesn’t hold so we have to see what the consequences are. OK, so how can we understand the relative motion in different reference frames if the speed of light really is a constant? Let’s see.

Up to now we had that x'=x-vt. Let’s try and generalise this, but importantly allow time to tick differently in the two frames (call them R and R’). We are going to assume that space is homogeneous, and so we can perform a translation in space or time, and nothing should change, so any relationship between x’ and t’ and x and t should be linear. The most general one would be:

x'=\gamma x+bt

t'=Ax+Bt

(using the same notation as here). What constraints can we put on the four parameters (\gamma, b, A,B)? Well, let’s imagine that you are sat at position x’=0 in your moving reference frame, and I see you whizzing by. I can ask what position you will be in my frame at time t. If we say that at time t=0 in my frame, you line up with my x=0, then we must have that when x'=0 we must have x=vt – ie. the x position of your origin is just x=vt. Plugging this into the first equation gives us:

0=\gamma vt+bt\implies \gamma v=-b, so we now have:

x'=\gamma (x-vt)

t'=Ax+Bt

The first equation looks the same as the Galilean relation with a factor out the front. We kind of expect that when everything is happening slowly with respect to the speed of light, we should still have the Galilean transformations between coordinate systems, so \gamma should be a function of velocity, and should go to 1 for small velocities, relative to c.

We could of course have started by thinking of R’ as being at rest and R moving with velocity -v with respect to it. This would have led to:

x=\gamma (x'+vt')

Now let’s think of a beam of light moving in R (starting at x=0, t=0). It must be going at velocity c, and so we must have that x=ct. But for this same beam of light in R’, it must also be moving at c, so we must also have x'=ct'. So we know something about the relationship between the (x,t) and (x’,t’) coordinates for light.  Whatever the transformation between the different coordinates, we must have that when x=ct, we also have x'=ct', or alternatively t=\frac{x}{c} when t'=\frac{x'}{c}, leading to:

x=\gamma (x'+v\frac{x'}{c})

and

x'=\gamma (x-v\frac{x}{c})

Plugging the second equation into the first leads to:

x=\gamma (1+\frac{v}{c})\gamma(1-\frac{v}{c})x

which gives:

\gamma=\pm \frac{1}{\sqrt{1-\frac{v^2}{c^2}}}.

We can always make a choice for the sign of the direction in which x is increasing, which corresponds to choosing the positive sign in the above, so we have fixed \gamma.
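As a quick numerical sanity check on this result, here is a minimal Python sketch of the factor \gamma. The function name gamma and the constant C are my own choices for illustration. It shows \gamma staying very close to 1 at everyday speeds and growing as v approaches c, which is what we asked of it earlier.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def gamma(v, c=C):
    """Lorentz factor 1/sqrt(1 - v^2/c^2) for a relative speed |v| < c."""
    if abs(v) >= c:
        raise ValueError("gamma is only defined for |v| < c")
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)


# Print gamma for a few speeds, given as fractions of c.
for fraction in (1e-6, 0.1, 0.6, 0.99):
    print(f"v = {fraction} c  ->  gamma = {gamma(fraction * C):.6f}")
```

Because \gamma depends only on v^2, it is the same whichever way the relative motion points.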

Now if we take the same equations and again look at light signals, but this time eliminate x and x’ using x=ct and x'=ct' we get:

ct'=\gamma (ct-vt)=\gamma(ct-v\frac{x}{c})

This can now be replaced in

t'=Ax+Bt

to give:

\gamma(t-v\frac{x}{c^2})=Ax+Bt

Giving:

0=x(A+\frac{\gamma v}{c^2})+t(B-\gamma)

Plugging back in x=ct gives:

A c+B=\gamma(1-\frac{v}{c})

Now we have to do the same for the other frame, reversing the sign of v as we do so. This leads to:

-A c+B=\gamma(1+\frac{v}{c})
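One way to see where this second condition can come from, as a sketch rather than the only route: repeat the light-signal argument for a beam travelling in the negative x direction, so that x=-ct in R and x'=-ct' in R’. Using x'=\gamma(x-vt) and t'=Ax+Bt as before:

```latex
\begin{aligned}
-ct' &= \gamma(-ct - vt) \;\Longrightarrow\; t' = \gamma\left(1+\frac{v}{c}\right)t \\
t' &= A(-ct) + Bt = (-Ac + B)\,t \\
-Ac + B &= \gamma\left(1+\frac{v}{c}\right)
\end{aligned}
```

Adding this to the condition for the right-moving beam eliminates A, and subtracting eliminates B.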

Solving these two equations simultaneously gives:

A=-v\frac{\gamma}{c^2} and B=\gamma. Now, finally we have our full Lorentz transformations. To transform from one frame into another in a way which leaves the speed of light the same in both frames, and such that we have translational invariance, we have:

x'=\gamma(x-vt)

t'=\gamma(t-\frac{vx}{c^2})
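These two equations are easy to try out numerically. The following is a minimal Python sketch. The function name lorentz_boost, the default of units where c=1, and the sample values are assumptions of mine for illustration. It checks two things from the derivation: an event on the light ray x=ct in R also lies on x'=ct' in R’, and boosting back with -v recovers the original coordinates.

```python
import math


def lorentz_boost(x, t, v, c=1.0):
    """Return (x', t') in a frame moving at speed v along x (default units: c = 1)."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    x_prime = g * (x - v * t)
    t_prime = g * (t - v * x / c**2)
    return x_prime, t_prime


# An event on a light ray x = c t in frame R ...
c, v, t = 1.0, 0.6, 2.0
x = c * t
x_p, t_p = lorentz_boost(x, t, v, c)

# ... should also satisfy x' = c t' in frame R'.
print("x' =", x_p, " c t' =", c * t_p)

# Boosting back with -v should return the original event.
x_back, t_back = lorentz_boost(x_p, t_p, -v, c)
print("recovered (x, t) =", (x_back, t_back))
```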

OK, so we’ve just derived the Lorentz transformations from the principles that:

  1. There is no special inertial reference frame
  2. The speed of light is the same in all reference frames

Now we have that the speed of light is an invariant. Is there anything more general which is invariant between inertial frames? In Galilean relativity we would have that the distance between two points is invariant because:

x'_2-x'_1=(x_2-vt)-(x_1-vt)=x_2-x_1

But this isn’t the case now. Now we have that:

x'_2-x'_1=\gamma(x_2-vt)-\gamma(x_1-vt)=\gamma(x_2-x_1)

(and the same is true if we go into higher dimensions and look at \sqrt{(x_2-x_1)^2+(y_2-y_1)^2+(z_2-z_1)^2})

so lengths get shortened between relatively moving frames. Is there anything which remains the same? Well, we might think that we can take Pythagoras into higher dimensions. Let’s look at:

\sqrt{(ct'_2-ct'_1)^2+(x'_2-x'_1)^2} which would seem to be like a length in space time, and we have the factor of c to get things in the same units:

\sqrt{(ct'_2-ct'_1)^2+(x'_2-x'_1)^2}=\sqrt{(\gamma(t_2c-\frac{vx_2}{c})-\gamma(t_1c-\frac{vx_1}{c}))^2+(\gamma(x_2-vt_2)-\gamma(x_1-vt_1))^2}

With some algebra, you will see that this is all quite a mess…and we are led to try something different. Instead we try:

\sqrt{-(ct'_2-ct'_1)^2+(x'_2-x'_1)^2}=\sqrt{-(\gamma(t_2c-\frac{vx_2}{c})-\gamma(t_1c-\frac{vx_1}{c}))^2+(\gamma(x_2-vt_2)-\gamma(x_1-vt_1))^2}

and with some beautiful cancellations we see that

\sqrt{-(ct'_2-ct'_1)^2+(x'_2-x'_1)^2}=\sqrt{-(ct_2-ct_1)^2+(x_2-x_1)^2}
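For anyone who wants to see the cancellations, here is a worked sketch. The shorthand \Delta t=t_2-t_1 and \Delta x=x_2-x_1 is mine and is not used elsewhere in this post. With a plus sign in front of the time term the cross terms add up and the result depends on v:

```latex
\begin{aligned}
(c\Delta t')^2 + (\Delta x')^2
&= \gamma^2\left[\left(c\Delta t - \frac{v\Delta x}{c}\right)^2 + \left(\Delta x - v\Delta t\right)^2\right] \\
&= \gamma^2\left[\left(1+\frac{v^2}{c^2}\right)\left(c^2\Delta t^2 + \Delta x^2\right) - 4v\,\Delta t\,\Delta x\right]
\end{aligned}
```

With a minus sign the cross terms cancel and the remaining factor is exactly 1/\gamma^2:

```latex
\begin{aligned}
-(c\Delta t')^2 + (\Delta x')^2
&= \gamma^2\left[-\left(c\Delta t - \frac{v\Delta x}{c}\right)^2 + \left(\Delta x - v\Delta t\right)^2\right] \\
&= \gamma^2\left[-c^2\Delta t^2 + 2v\,\Delta t\,\Delta x - \frac{v^2\Delta x^2}{c^2} + \Delta x^2 - 2v\,\Delta x\,\Delta t + v^2\Delta t^2\right] \\
&= \gamma^2\left(1-\frac{v^2}{c^2}\right)\left(-c^2\Delta t^2 + \Delta x^2\right) \\
&= -c^2\Delta t^2 + \Delta x^2
\end{aligned}
```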

That is, if we measure the ‘distance’ between two points in space time but using, instead of Pythagoras, this ‘sign-altered’ Pythagoras, we find that this object is frame independent.
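The same statement can be checked numerically. Below is a minimal Python sketch. The helper names boost and interval_sq, the choice c=1, and the two sample events are all assumptions of mine for illustration. It compares the squared ‘distance’ between two events before and after a boost, once with the minus sign and once with the ordinary Pythagorean plus sign. I work with the squared quantity so that a negative value under the square root is never an issue.

```python
import math


def boost(x, t, v, c=1.0):
    """Lorentz transformation of an event (x, t) to a frame moving at speed v."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return g * (x - v * t), g * (t - v * x / c**2)


def interval_sq(e1, e2, c=1.0, sign=-1.0):
    """Return sign * (c dt)^2 + dx^2 between two events given as (x, t) pairs."""
    dx = e2[0] - e1[0]
    dt = e2[1] - e1[1]
    return sign * (c * dt) ** 2 + dx**2


# Two arbitrary sample events (x, t) in frame R, and a relative speed v.
event_1, event_2 = (0.5, 1.0), (3.0, 2.5)
v = 0.8
boosted_1, boosted_2 = boost(*event_1, v), boost(*event_2, v)

# Sign-altered Pythagoras: same value in both frames.
minkowski_before = interval_sq(event_1, event_2)
minkowski_after = interval_sq(boosted_1, boosted_2)

# Ordinary Pythagoras: value changes under the boost.
euclid_before = interval_sq(event_1, event_2, sign=+1.0)
euclid_after = interval_sq(boosted_1, boosted_2, sign=+1.0)

print("minus sign:", minkowski_before, minkowski_after)
print("plus sign: ", euclid_before, euclid_after)
```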

We can take the limit that the two events get closer and closer and we define (in 3+1 dimensional spacetime):

ds^2=-c^2dt^2+dx^2+dy^2+dz^2

This measure is the same in all inertial reference frames. That is where the minus sign comes from.

Actually, we could have gotten this answer much more easily, though perhaps not as satisfyingly, simply by asking that the equation ct=x is frame independent, i.e. that light travels at the same speed in all frames. This is equivalent to saying that -ct+x=-ct'+x', or in higher dimensions (and from one point to another) that -c^2(t_1-t_2)^2+(x_1-x_2)^2+(y_1-y_2)^2+(z_1-z_2)^2 is invariant. The group of transformations which leave this invariant is the Poincaré group. The Lorentz group is the subgroup of the Poincaré group consisting of rotations in spacetime (that is, boosts and spatial rotations).
