Classical Mechanics Lagrange

Short Option 7

Classical Mechanics

Prof. J. J. Binney

Michaelmas Term 2005

Revised Syllabus

The calculus of variations & Euler–Lagrange equations; principle of least action; equations of motion in strange coordinates; rigid-body dynamics; motion in an electromagnetic field; applications to normal modes; symmetries and Noether’s theorem; constrained systems (including Lagrange multipliers). Hamiltonian dynamics: Legendre transformations; Hamilton’s equations; applications to harmonic oscillator, rotating coordinates, motion in an e.m. field; Liouville’s theorem; Hamilton’s principle and connection with quantum mechanics; Poisson brackets; canonical coordinates; generators of canonical maps, symmetries and conserved quantities; canonical transformations; point transformations; action-angle coordinates; Hamilton–Jacobi equation; derivation from quantum mechanics; phase-space volumes and canonical coordinates.

Books

T.W.B. Kibble, Classical Mechanics, Longman Scientific (about £18): overall the most suitable book.

L.D. Landau & E. Lifshitz, Classical Mechanics, IoP Publishing: one of the best books in the classic series on theoretical physics.

Tai L. Chow, Classical Mechanics, Wiley (about £23): a useful source of additional information but marred by too many typos.

Oliver Johns, Analytical Mechanics for Relativity and Quantum Mechanics, OUP. As well as elementary stuff this book contains much that goes beyond the course but should be found stimulating. For your college library?

V.I. Arnold, Mathematical Methods of Mechanics, Springer: a uniquely insightful book but too sophisticated for most undergraduates.

1 Lagrangian Mechanics

Mechanics as formulated by Newton suffers from two important limitations: (i) it deals with particles; (ii) it describes their motion in special Cartesian coordinate systems: if the numbers $x_i$ are the coordinates of a particle of mass $m$ in an inertial Cartesian coordinate system, then the position of the particle when subjected to a force with components $f_i(t)$ may be determined by solving the differential equations $m\ddot{x}_i = f_i(t)$. Since an extended body can be decomposed into its constituent particles, and the chain rule can be used to transform the equations of motion from Cartesian coordinates to any reference frame, Newton’s machinery enables us to determine the motion of any body in any reference frame notwithstanding these limitations. But in practice it is better to determine the dynamics of complex dynamical systems from a more powerful principle than Newton’s laws of motion. Lagrangian dynamics provides just such a principle.

Let $q_i$, $i = 1, \dots, N$, be generalized coordinates for some system. That is, these $N$ numbers enable us to specify precisely the system’s configuration. For example, six numbers suffice to specify a configuration of a rigid body such as a hard-boiled egg: we can take $(q_1, q_2, q_3)$ to be the coordinates in some system, such as spherical polar coordinates, of the body’s centre of mass, and $(q_4, q_5, q_6)$ to be the three angles that are required to define its orientation. (Box 1 defines Euler angles, the standard angles for specifying the orientation of a rigid body.) The number of generalized coordinates $N$ required by a system is called the system’s number of degrees of freedom.

At each instant our system is at some point in configuration space – an imaginary $N$-dimensional space for which the $q_i$ constitute Cartesian coordinates. As the system moves, its representative point in configuration space sweeps out a path $\mathbf{q}(t)$. Since Newton’s laws of motion are 2nd order in time, we expect this path to be uniquely determined by specifying at some time $t_1$ both $\mathbf{q}(t_1)$ and $\dot{\mathbf{q}}(t_1)$. In Lagrangian mechanics we take rather a different point of view: we do not specify $\dot{\mathbf{q}}(t_1)$ but instead specify $\mathbf{q}$ at a second time $t_2$. That is, we ask what path does our system follow if its configuration at time $t_1$ is $\mathbf{q}(t_1)$ and at time $t_2$ is $\mathbf{q}(t_2)$? For reasons that give deep insight into the connection between classical and quantum mechanics, it turns out that the sought-after path $\mathbf{q}(t)$ is the path that extremizes a certain quantity $S$. Our next task is to introduce the mathematical machinery required to define $S$ and to show that it is extremized on the Newtonian path. At the end of the course we shall investigate the connection between the extremization of $S$ and quantum mechanics.

1.1 Paths, functionals & the calculus of variations

Before a ’plane takes off from New York for London, its computer chooses an optimal path $\mathbf{x}(t)$; i.e., it finds that sequence of longitudes, latitudes and altitudes at each moment $t$ of the flight which, given prevailing winds, will get it to London at the prescribed time with least expenditure of fuel. The quantity of fuel required to get to London in a given time is a single number $F$ that depends on the whole path $\mathbf{x}(t)$; one says that $F$ is a functional $F[\mathbf{x}]$ of the path $\mathbf{x}(t)$.

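The simplest functionals are integrals along the path of functions of $\mathbf{x}(t)$ and its derivatives with respect to $t$:

$$F_1[\mathbf{x}] \equiv \int_{t_1}^{t_2} |\mathbf{x}(t)|^2\,dt, \qquad F_2[\mathbf{x}] \equiv \int_{t_1}^{t_2} |\dot{\mathbf{x}}(t)|^2\,dt, \qquad F_3[\mathbf{x}] \equiv \int_{t_1}^{t_2} \mathbf{x}\cdot\dot{\mathbf{x}}(t)\,dt, \quad \cdots$$

How do we find the path that minimizes a functional

$$F[\mathbf{x}] \equiv \int_{t_1}^{t_2} f(\mathbf{x}, \dot{\mathbf{x}}, t)\,dt\;? \tag{1.1}$$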


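Let $\mathbf{x}(t)$ be the minimizing path and let $\boldsymbol{\eta}(t)$ be a small variation, so that $\tilde{\mathbf{x}}(t) \equiv \mathbf{x}(t)+\boldsymbol{\eta}(t) \approx \mathbf{x}(t)$. We insist on $\boldsymbol{\eta}$ vanishing at $t = t_1, t_2$ so that $\mathbf{x}(t)$ and the modified path $\tilde{\mathbf{x}}(t)$ both start and finish at the same places at the same times. Then (see footnote 1)

$$\begin{aligned}
F[\mathbf{x}] \le F[\tilde{\mathbf{x}}] &= \int_{t_1}^{t_2} f(\mathbf{x}+\boldsymbol{\eta}, \dot{\mathbf{x}}+\dot{\boldsymbol{\eta}}, t)\,dt \\
&= \int_{t_1}^{t_2} \Big( f(\mathbf{x}, \dot{\mathbf{x}}, t) + \frac{\partial f}{\partial \mathbf{x}}\cdot\boldsymbol{\eta} + \frac{\partial f}{\partial \dot{\mathbf{x}}}\cdot\dot{\boldsymbol{\eta}} + \cdots \Big)\,dt \\
&= F[\mathbf{x}] + \int_{t_1}^{t_2} \Big( \frac{\partial f}{\partial \mathbf{x}}\cdot\boldsymbol{\eta} + \frac{\partial f}{\partial \dot{\mathbf{x}}}\cdot\dot{\boldsymbol{\eta}} + \cdots \Big)\,dt.
\end{aligned} \tag{1.2}$$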

We now integrate by parts the second term in the integral of the last line:

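$$\int_{t_1}^{t_2} \frac{\partial f}{\partial \dot{\mathbf{x}}}\cdot\dot{\boldsymbol{\eta}}\,dt = \Big[ \frac{\partial f}{\partial \dot{\mathbf{x}}}\cdot\boldsymbol{\eta} \Big]_{t_1}^{t_2} - \int_{t_1}^{t_2} \frac{d}{dt}\Big( \frac{\partial f}{\partial \dot{\mathbf{x}}} \Big)\cdot\boldsymbol{\eta}\,dt. \tag{1.3}$$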

Since η(t1) = η(t2) = 0, the [.] vanishes. Putting this into (1.2) we have

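$$0 \le F[\tilde{\mathbf{x}}] - F[\mathbf{x}] = \int_{t_1}^{t_2} \Big[ \Big( \frac{\partial f}{\partial \mathbf{x}} - \frac{d}{dt}\frac{\partial f}{\partial \dot{\mathbf{x}}} \Big)\cdot\boldsymbol{\eta} + \cdots \Big]\,dt. \tag{1.4}$$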

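This relation must hold for any $\boldsymbol{\eta}$, no matter how small. So the higher terms indicated by $+\cdots$ can be neglected. The remaining integrand is proportional to $\boldsymbol{\eta}$, so if it were non-zero for some particular function $\boldsymbol{\eta}(t)$, it would have the opposite sign for $\boldsymbol{\eta}' \equiv -\boldsymbol{\eta}$. The inequality on the extreme left would then be violated for one of $\boldsymbol{\eta}$ or $\boldsymbol{\eta}'$. Hence the integral must vanish for all $\boldsymbol{\eta}$. This is possible only if the coefficient of $\boldsymbol{\eta}$ vanishes for all $t_1 < t < t_2$: if it did not vanish for some $t$, say $t'$, the integral would fail to vanish for the particular choice $\boldsymbol{\eta} = \delta(t - t')$. So a necessary condition for $\mathbf{x}(t)$ to minimize $F$ is that

$$\frac{d}{dt}\frac{\partial f}{\partial \dot{\mathbf{x}}} - \frac{\partial f}{\partial \mathbf{x}} = 0 \tag{1.5}$$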

all along the path x(t).
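As a hedged numerical illustration of (1.5), the sketch below discretizes the functional $F_2[x] = \int \dot{x}^2\,dt$ for a one-dimensional path. For this integrand the EL equation reduces to $\ddot{x} = 0$, so the candidate minimizer is the straight line between the end points. The function names, the grid size and the sinusoidal variation $\eta$ are illustrative choices, not part of the notes.

```python
import math

def functional_F2(path, t1, t2):
    """Approximate F2[x] = integral of xdot^2 dt for a path sampled at equal steps."""
    n = len(path) - 1
    dt = (t2 - t1) / n
    return sum(((path[k + 1] - path[k]) / dt) ** 2 * dt for k in range(n))

def straight_path(x1, x2, n):
    """Solution of the EL equation xddot = 0 joining x1 to x2."""
    return [x1 + (x2 - x1) * k / n for k in range(n + 1)]

def perturbed_path(x1, x2, n, eps):
    """Straight path plus a variation eta that vanishes at both end points."""
    base = straight_path(x1, x2, n)
    return [base[k] + eps * math.sin(math.pi * k / n) for k in range(n + 1)]

t1, t2, x1, x2, n = 0.0, 1.0, 0.0, 2.0, 200
F_straight = functional_F2(straight_path(x1, x2, n), t1, t2)
F_perturbed = [functional_F2(perturbed_path(x1, x2, n, eps), t1, t2)
               for eps in (-0.1, 0.1)]
print(F_straight, F_perturbed)
```

Both signs of the variation raise the value of the functional, as the argument leading to (1.4) requires.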

Eq (1.5) is called the Euler-Lagrange equation (‘EL eqn’), and the theory that underlies it is called the calculus of variations. It is one of the few results we have in the theory of functionals: one everywhere in physics encounters problems that cry out for a fully fledged calculus of functionals that shows how to integrate, Taylor expand, exponentiate etc. functionals the way we do functions.

Legend has it that the calculus of variations was invented by Newton after dinner one evening to solve this challenge problem (set in 1696 by Johann Bernoulli):

Example 1.1

A bead slides on a smooth wire that passes through two rings, one at the origin, the other at $(x', y', z') = (x_0, 0, -z_0)$ with $z_0 > 0$. To what curve (the ‘brachistochrone’) must the wire be bent in order to minimize the time required for the bead to slide from rest at the upper ring to the lower ring?

Solution: The optimal curve obviously lies in the plane $y' = 0$. It is convenient to work in coordinates $(x, y, z)$ such that $z$ increases downwards. Then the time of flight is

$$\tau = \int_0^{z_0} \frac{dz}{\dot{z}}.$$

Footnote 1: We use the convention that $\mathbf{y}\cdot\dfrac{\partial}{\partial \mathbf{x}} \equiv \sum_i y_i\,\dfrac{\partial}{\partial x_i}$.


But $\frac{1}{2}(\dot{x}^2 + \dot{z}^2) = gz$, so $\dot{z} = \sqrt{2gz/[(dx/dz)^2 + 1]}$ and

$$\tau = \int_0^{z_0} \frac{dz}{\sqrt{2gz}} \sqrt{\Big(\frac{dx}{dz}\Big)^2 + 1}. \tag{1.6}$$

We need to minimize $\tau[x(z)]$ from (1.6) with respect to the path $x(z)$. We may use the EL-eqn (1.5) provided we make the substitutions

$$t \to z, \qquad f\Big(x, \frac{dx}{dz}, z\Big) = \frac{1}{\sqrt{2gz}} \sqrt{\Big(\frac{dx}{dz}\Big)^2 + 1}. \tag{1.7}$$

Since $f$ does not depend on $x$, the optimal path satisfies

$$0 = \frac{d}{dz}\Bigg( \frac{dx/dz}{\sqrt{z}\,\sqrt{(dx/dz)^2 + 1}} \Bigg),$$

which implies

$$x(z) = \int_0^z \sqrt{\frac{Az}{1 - Az}}\,dz,$$

where $A$ is a constant of integration. In terms of the variable $\theta$ defined by $\sin^2\theta \equiv Az$ the answer is

$$x = \frac{1}{A}\big(\theta - \tfrac{1}{2}\sin 2\theta\big). \tag{1.8}$$

If we write $\phi \equiv 2\theta$ this may be written $z = (1 - \cos\phi)/2A$, $x = (\phi - \sin\phi)/2A$, which is a cycloid with the origin at its cusp. $A$ may be determined by first solving $x_0/z_0 = (\phi_0 - \sin\phi_0)/(1 - \cos\phi_0)$ for $\phi_0$ and then using this value in $A = \frac{1}{2}(1 - \cos\phi_0)/z_0$.
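The recipe for fixing the cycloid can be sketched numerically. The code below solves $x_0/z_0 = (\phi_0 - \sin\phi_0)/(1 - \cos\phi_0)$ by bisection and then evaluates $A$. The descent time $\tau = \phi_0/\sqrt{2gA}$ is not quoted in the notes; it follows from inserting the parametrization $z = (1-\cos\phi)/2A$, $x = (\phi-\sin\phi)/2A$ into (1.6), and is compared here with the time along a straight wire. All function names and the value of $g$ are illustrative assumptions.

```python
import math

def cycloid_ratio(phi):
    """x/z along the cycloid as a function of the parameter phi."""
    return (phi - math.sin(phi)) / (1.0 - math.cos(phi))

def solve_phi0(x0, z0, tol=1e-12):
    """Bisection for phi0 in (0, 2*pi); cycloid_ratio increases monotonically there."""
    target = x0 / z0
    lo, hi = 1e-6, 2.0 * math.pi - 1e-6
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cycloid_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cycloid_constant(x0, z0):
    """Return (phi0, A) for the cycloid through the origin and (x0, z0)."""
    phi0 = solve_phi0(x0, z0)
    A = 0.5 * (1.0 - math.cos(phi0)) / z0
    return phi0, A

def cycloid_time(x0, z0, g=9.81):
    """Descent time along the cycloid, tau = phi0 / sqrt(2 g A)."""
    phi0, A = cycloid_constant(x0, z0)
    return phi0 / math.sqrt(2.0 * g * A)

def straight_wire_time(x0, z0, g=9.81):
    """Descent time from rest along a straight wire to the same end point."""
    length = math.hypot(x0, z0)
    return length * math.sqrt(2.0 / (g * z0))

print(cycloid_constant(math.pi / 2.0, 1.0))
print(cycloid_time(1.0, 1.0), straight_wire_time(1.0, 1.0))
```

For the end point $(x_0, z_0) = (\pi/2, 1)$ the bead arrives at the bottom of the cycloid, where $\phi_0 = \pi$ and $A = 1$, which gives a convenient check of the root finder.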

1.2 The Principle of Least Action

As was stated above, the path $\mathbf{q}(t)$ taken through configuration space by a dynamical system can be found by identifying the path that extremizes a quantity $S[\mathbf{q}(t)]$ between specified locations $\mathbf{q}(t_1)$ and $\mathbf{q}(t_2)$ of the system at given times $t_1$, $t_2$. $S$ is called the action and is usually (but not invariably) minimized by the dynamical path. Hence the idea that the dynamical path can be determined by extremizing $S$ is called the principle of least action.

$S$ takes the form of an integral over $t$ of a function $L$ of $\mathbf{q}$ and $\dot{\mathbf{q}}$:

$$S = \int_{t_1}^{t_2} L(\mathbf{q}, \dot{\mathbf{q}})\,dt. \tag{1.9}$$

Here $L$ is just a function (rather than a functional) of its arguments. It is called the Lagrangian of the system. Since the dynamical evolution of the system is entirely determined by $L$, writing down $L$ amounts to specifying the physical content of the system.

There is no entirely general rule for writing down $L$ – one would hardly expect one rule to be valid for every possible dynamical system – but there is a rule that works for most simple systems: $L$ is the difference between the system’s kinetic energy $T$ and its potential energy $V$;

L = T − V. (1.10)


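Let’s see how this works out in a simple case: a particle of mass $m$ moving in a gravitational potential $\Phi(\mathbf{x})$. Now $T = \frac{1}{2}m\dot{\mathbf{x}}^2$, $V = m\Phi$. So $L(\mathbf{x}, \dot{\mathbf{x}}) = \frac{1}{2}m\dot{\mathbf{x}}^2 - m\Phi(\mathbf{x})$. Setting $f = L$ in the EL equations (1.5) we obtain the equations of motion as

$$\frac{d}{dt}\,m\dot{\mathbf{x}} + m\frac{\partial \Phi}{\partial \mathbf{x}} = 0 \tag{1.11}$$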

as required.
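A minimal numerical sketch of the principle for this Lagrangian, assuming a uniform field $\Phi = gz$ with $z$ measured upwards and a particle thrown vertically so that it returns to $z = 0$ after a time $T$. The Newtonian path is then $z = \frac{1}{2}gt(T - t)$; the code compares its discretized action with that of paths pushed higher or lower by a variation that vanishes at both ends. The names, grid and amplitudes are illustrative assumptions.

```python
import math

def action(path, total_time, m=1.0, g=9.81):
    """Discretized S = integral of (m zdot^2 / 2 - m g z) dt on a uniform time grid."""
    n = len(path) - 1
    dt = total_time / n
    S = 0.0
    for k in range(n):
        velocity = (path[k + 1] - path[k]) / dt
        height = 0.5 * (path[k] + path[k + 1])
        S += (0.5 * m * velocity ** 2 - m * g * height) * dt
    return S

def newtonian_path(total_time, n, g=9.81):
    """Solution of (1.11) for Phi = g z with z = 0 at both end times."""
    times = [total_time * k / n for k in range(n + 1)]
    return [0.5 * g * t * (total_time - t) for t in times]

def varied_path(total_time, n, eps, g=9.81):
    """Newtonian path plus eps * sin(pi t / T), which vanishes at the end points."""
    base = newtonian_path(total_time, n, g)
    return [base[k] + eps * math.sin(math.pi * k / n) for k in range(n + 1)]

T_FLIGHT, N_STEPS = 2.0, 400
S_true = action(newtonian_path(T_FLIGHT, N_STEPS), T_FLIGHT)
S_higher = action(varied_path(T_FLIGHT, N_STEPS, +0.5), T_FLIGHT)
S_lower = action(varied_path(T_FLIGHT, N_STEPS, -0.5), T_FLIGHT)
print(S_true, S_higher, S_lower)
```

Because this Lagrangian is quadratic in the path, the excess action is just the kinetic term of the variation, so both the higher and the lower path give a larger $S$.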

Exercise (1): Consider a shell that is fired at $t_1$ and hits its target at $t_2$. Explain in general terms why its action would be larger if it flew on either a higher or a lower trajectory than it actually does.

1.3 Equations of motion from Lagrangians

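The Lagrangian provides a neat way of calculating the eqns of motion of a particle when referred to an odd coordinate system because it is easier to transform a single function to new-fangled coordinates than a set of eqns of motion. Consider, for example, motion in a rotating frame.

Suppose both primed and unprimed coordinates share the same origin, but the primed coordinates rotate with constant angular velocity $\boldsymbol{\omega}$ with respect to the unprimed coordinates, which are inertial. Then

$$\mathbf{v}_{\rm inertial} = \dot{\mathbf{r}}' + \boldsymbol{\omega}\times\mathbf{r}'.$$

So written in terms of the primed coordinates the k.e. is

$$T = \tfrac{1}{2}mv^2 = \tfrac{1}{2}m\big|\dot{\mathbf{r}}' + \boldsymbol{\omega}\times\mathbf{r}'\big|^2 = \tfrac{1}{2}m|\dot{\mathbf{r}}'|^2 + m\dot{\mathbf{r}}'\cdot(\boldsymbol{\omega}\times\mathbf{r}') + \tfrac{1}{2}m|\boldsymbol{\omega}\times\mathbf{r}'|^2. \tag{1.12}$$

The p.e. is just $V(\mathbf{r}', t)$ so

$$L = \tfrac{1}{2}m|\dot{\mathbf{r}}'|^2 + m\dot{\mathbf{r}}'\cdot(\boldsymbol{\omega}\times\mathbf{r}') + \tfrac{1}{2}m|\boldsymbol{\omega}\times\mathbf{r}'|^2 - V. \tag{1.13}$$

In writing down the EL-eqns we recall that $\dot{\mathbf{r}}'\cdot(\boldsymbol{\omega}\times\mathbf{r}') = \mathbf{r}'\cdot(\dot{\mathbf{r}}'\times\boldsymbol{\omega})$. We then find

$$\begin{aligned}
0 &= \frac{d}{dt}\frac{\partial L}{\partial \dot{\mathbf{r}}'} - \frac{\partial L}{\partial \mathbf{r}'} \\
&= \frac{d}{dt}\big(m\dot{\mathbf{r}}' + m\boldsymbol{\omega}\times\mathbf{r}'\big) - \Big[ m\dot{\mathbf{r}}'\times\boldsymbol{\omega} + \frac{\partial}{\partial \mathbf{r}'}\Big( \tfrac{1}{2}m|\boldsymbol{\omega}\times\mathbf{r}'|^2 - V \Big) \Big].
\end{aligned} \tag{1.14}$$

Collecting everything together we have finally

$$m\ddot{\mathbf{r}}' = 2m\dot{\mathbf{r}}'\times\boldsymbol{\omega} - \frac{\partial V_{\rm eff}}{\partial \mathbf{r}'} \quad\text{where}\quad V_{\rm eff} \equiv V - \tfrac{1}{2}m|\boldsymbol{\omega}\times\mathbf{r}'|^2. \tag{1.15}$$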

In a rotating frame there is a contribution to the “acceleration” $m\ddot{\mathbf{r}}'$ from the Coriolis force $2m\dot{\mathbf{r}}'\times\boldsymbol{\omega}$, and the potential needs to be augmented by a term that gives rise to the centrifugal force $m[\omega^2\mathbf{r}' - (\boldsymbol{\omega}\cdot\mathbf{r}')\boldsymbol{\omega}]$. Forces such as these, which appear because one’s frame is non-inertial, are called pseudo-forces.
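The rotating-frame equation of motion can be checked numerically with a short sketch. It assumes a free particle ($V = 0$) moving in the plane perpendicular to $\boldsymbol{\omega} = \omega\mathbf{k}$, for which the Coriolis and centrifugal terms give $\ddot{x} = 2\omega\dot{y} + \omega^2 x$, $\ddot{y} = -2\omega\dot{x} + \omega^2 y$. Integrating these with a fourth-order Runge-Kutta step and rotating the result back to the inertial frame should reproduce uniform straight-line motion. The integrator, names and initial conditions are illustrative assumptions.

```python
import math

def rotating_frame_rhs(state, omega):
    """Time derivative of (x, y, vx, vy) for a free particle in the rotating frame."""
    x, y, vx, vy = state
    ax = 2.0 * omega * vy + omega ** 2 * x    # Coriolis + centrifugal, x component
    ay = -2.0 * omega * vx + omega ** 2 * y   # Coriolis + centrifugal, y component
    return (vx, vy, ax, ay)

def rk4_step(state, dt, omega):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = rotating_frame_rhs(state, omega)
    k2 = rotating_frame_rhs(shifted(k1, 0.5 * dt), omega)
    k3 = rotating_frame_rhs(shifted(k2, 0.5 * dt), omega)
    k4 = rotating_frame_rhs(shifted(k3, dt), omega)
    return tuple(s + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def to_inertial(x, y, omega, t):
    """Rotate rotating-frame components back to inertial axes (aligned at t = 0)."""
    c, s = math.cos(omega * t), math.sin(omega * t)
    return (c * x - s * y, s * x + c * y)

def free_particle_error(omega=0.7, t_end=3.0, n_steps=3000):
    """Distance between the integrated path and the inertial straight line at t_end."""
    r0 = (1.0, 0.5)
    v_inertial = (0.3, -0.2)
    # v_inertial = rdot' + omega x r'  =>  rdot' = v_inertial - omega x r'
    state = (r0[0], r0[1],
             v_inertial[0] + omega * r0[1],
             v_inertial[1] - omega * r0[0])
    dt = t_end / n_steps
    for _ in range(n_steps):
        state = rk4_step(state, dt, omega)
    x_in, y_in = to_inertial(state[0], state[1], omega, t_end)
    expected = (r0[0] + v_inertial[0] * t_end, r0[1] + v_inertial[1] * t_end)
    return math.hypot(x_in - expected[0], y_in - expected[1])

print(free_particle_error())
```

The printed distance should be at the level of the integrator's truncation error, which indicates that the pseudo-forces account for the whole of the apparent acceleration.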

A second example illustrates that Lagrangians work even for coordinates that depend explicitly on time. In cosmology it is handy to use ‘comoving’ coordinates such that the spatial coordinates of particles that move apart as the Universe expands are constant. Let the primed system be inertial and the unprimed system comoving. Then $\mathbf{r}' = a(t)\mathbf{r}$, where $a(t)$ is the cosmic scale factor. So

$$T = \tfrac{1}{2}m\dot{\mathbf{r}}'^2 = \tfrac{1}{2}m(\dot{a}\mathbf{r} + a\dot{\mathbf{r}})^2. \tag{1.16}$$


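Writing the potential energy as $V = m\Phi$ the EL eqns are

$$0 = \frac{d}{dt}\big[ m(\dot{a}\mathbf{r} + a\dot{\mathbf{r}})a \big] - m(\dot{a}\mathbf{r} + a\dot{\mathbf{r}})\dot{a} + m\frac{\partial \Phi}{\partial \mathbf{r}}.$$

Cleaning up we get

$$\ddot{\mathbf{r}} + 2\frac{\dot{a}}{a}\dot{\mathbf{r}} + \frac{\ddot{a}}{a}\mathbf{r} = -\frac{1}{a^2}\frac{\partial \Phi}{\partial \mathbf{r}}. \tag{1.17}$$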

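A final example illustrates how to get $T$ in a weird curvilinear coordinate system. Oblate spheroidal coordinates $(u, v, \phi)$ are related to regular cylindrical polars $(R, z, \phi)$ by

$$R = \Delta\cosh u\cos v\;;\qquad z = \Delta\sinh u\sin v. \tag{1.18}$$

Slightly changing $u$, $v$ and $\phi$ in turn while leaving the other coordinates alone, generates small displacements

$$\begin{aligned}
\boldsymbol{\delta}_u &= \Delta\,\delta u\,(\sinh u\cos v\,\hat{\mathbf{R}} + \cosh u\sin v\,\hat{\mathbf{z}}) \\
\boldsymbol{\delta}_v &= \Delta\,\delta v\,(-\cosh u\sin v\,\hat{\mathbf{R}} + \sinh u\cos v\,\hat{\mathbf{z}}) \\
\boldsymbol{\delta}_\phi &= R\,\delta\phi\,\hat{\boldsymbol{\phi}}.
\end{aligned}$$

It is easy to check that these three displacement vectors are mutually perpendicular. So the distance one goes on changing all of $(u, v, \phi)$ simultaneously is

$$\begin{aligned}
ds^2 &= |\boldsymbol{\delta}_u + \boldsymbol{\delta}_v + \boldsymbol{\delta}_\phi|^2 = |\boldsymbol{\delta}_u|^2 + |\boldsymbol{\delta}_v|^2 + |\boldsymbol{\delta}_\phi|^2 \\
&= \Delta^2\big[ (\delta u)^2(\sinh^2 u\cos^2 v + \cosh^2 u\sin^2 v) \\
&\qquad + (\delta v)^2(\cosh^2 u\sin^2 v + \sinh^2 u\cos^2 v) + (\delta\phi)^2\cosh^2 u\cos^2 v \big] \\
&= \Delta^2\big\{ (\cosh^2 u - \cos^2 v)[(\delta u)^2 + (\delta v)^2] + \cosh^2 u\cos^2 v\,(\delta\phi)^2 \big\}.
\end{aligned} \tag{1.19}$$

Dividing through by $dt^2$ we get the kinetic energy in terms of $(u, v, \phi)$:

$$T = \tfrac{1}{2}m\Big(\frac{ds}{dt}\Big)^2 = \tfrac{1}{2}m\Delta^2\big\{ (\cosh^2 u - \cos^2 v)[\dot{u}^2 + \dot{v}^2] + \cosh^2 u\cos^2 v\,\dot{\phi}^2 \big\}. \tag{1.20}$$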

The eqns of motion are therefore

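$$\begin{aligned}
m\Delta^2\Big\{ \frac{d}{dt}\big[ (\cosh^2 u - \cos^2 v)\dot{u} \big] - \tfrac{1}{2}\sinh 2u\,\big( \dot{u}^2 + \dot{v}^2 + \cos^2 v\,\dot{\phi}^2 \big) \Big\} + \frac{\partial V}{\partial u} &= 0 \\
m\Delta^2\Big\{ \frac{d}{dt}\big[ (\cosh^2 u - \cos^2 v)\dot{v} \big] - \tfrac{1}{2}\sin 2v\,\big( \dot{u}^2 + \dot{v}^2 - \cosh^2 u\,\dot{\phi}^2 \big) \Big\} + \frac{\partial V}{\partial v} &= 0 \\
m\Delta^2\Big[ \frac{d}{dt}\big( \cosh^2 u\cos^2 v\,\dot{\phi} \big) \Big] + \frac{\partial V}{\partial \phi} &= 0.
\end{aligned}$$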
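The coefficient $\Delta^2(\cosh^2 u - \cos^2 v)$ that multiplies both $\dot{u}^2$ and $\dot{v}^2$ in the kinetic energy can be checked by finite differences. The sketch below differentiates the map $(u, v) \mapsto (R, z)$ numerically and compares the squared lengths of the resulting displacements with the closed form. The function names, sample point and step size are illustrative assumptions.

```python
import math

def cylindrical_from_spheroidal(u, v, delta):
    """Map oblate spheroidal (u, v) to cylindrical (R, z)."""
    R = delta * math.cosh(u) * math.cos(v)
    z = delta * math.sinh(u) * math.sin(v)
    return R, z

def squared_scale_factors(u, v, delta, h=1e-6):
    """Central-difference estimates of |dr/du|^2 and |dr/dv|^2."""
    def displacement_sq(du, dv):
        R_plus, z_plus = cylindrical_from_spheroidal(u + du, v + dv, delta)
        R_minus, z_minus = cylindrical_from_spheroidal(u - du, v - dv, delta)
        return ((R_plus - R_minus) ** 2 + (z_plus - z_minus) ** 2) / (2.0 * h) ** 2
    return displacement_sq(h, 0.0), displacement_sq(0.0, h)

def predicted_scale_factor_sq(u, v, delta):
    """Closed form Delta^2 (cosh^2 u - cos^2 v) shared by the u and v directions."""
    return delta ** 2 * (math.cosh(u) ** 2 - math.cos(v) ** 2)

u_s, v_s, delta_s = 0.8, 0.4, 2.0
hu_sq, hv_sq = squared_scale_factors(u_s, v_s, delta_s)
h_pred = predicted_scale_factor_sq(u_s, v_s, delta_s)
print(hu_sq, hv_sq, h_pred)
```

The three printed numbers should agree to within the finite-difference error.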

1.4 Lagrangian for a rigid body

Lagrangian dynamics really comes into its own for the dynamics of a rigid body – that is, an object such as a spanner that contains a vast number $N$ of particles that are so strongly coupled to each other that we may consider the distances between them to be fixed. In this approximation, the coordinates of every particle are known as soon as we have determined the six generalized coordinates that are required to specify the position and orientation of the body. Mathematically, if $\mathbf{r}_i$ is the position vector of the $i$th particle, $\mathbf{r}_i = \mathbf{r}_i(q_1, \dots, q_6)$. Newton’s law of motion states that for $i = 1, \dots, N$

$$m_i\ddot{\mathbf{r}}_i - \mathbf{F}_i = 0 \tag{1.21}$$

where $\mathbf{F}_i$ is the force on the $i$th particle. There are two contributions to $\mathbf{F}_i$: any external force $\mathbf{F}^{(e)}_i$ and the internal stress $\mathbf{f}_i$ that keeps this particle in its allotted position relative to the other particles in the body. Now we imagine instantaneously displacing the body such that $\mathbf{r}_i \to \mathbf{r}_i + \delta\mathbf{r}_i$. In view of (1.21) we have

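$$0 = \sum_i^N (m_i\ddot{\mathbf{r}}_i - \mathbf{F}_i)\cdot\delta\mathbf{r}_i = \sum_i^N (m_i\ddot{\mathbf{r}}_i - \mathbf{F}^{(e)}_i - \mathbf{f}_i)\cdot\delta\mathbf{r}_i. \tag{1.22}$$

The contribution $\sum_i \mathbf{f}_i\cdot\delta\mathbf{r}_i = 0$ because the internal stresses do no work (the body is rigid). So

$$0 = \sum_i^N (m_i\ddot{\mathbf{r}}_i - \mathbf{F}^{(e)}_i)\cdot\delta\mathbf{r}_i.$$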

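Now the $\delta\mathbf{r}_i$ are not all independent – they arise from a displacement of the entire body so they are functions of six independent coordinates $\delta q_1, \dots, \delta q_6$. Hence we may write

$$0 = \sum_{i=1}^N \sum_{j=1}^6 m_i\ddot{\mathbf{r}}_i\cdot\frac{\partial \mathbf{r}_i}{\partial q_j}\,\delta q_j - \sum_{j=1}^6 Q_j\,\delta q_j, \tag{1.23a}$$

where the generalized force $Q$ is defined by

$$Q_j \equiv \sum_{i=1}^N \mathbf{F}^{(e)}_i\cdot\frac{\partial \mathbf{r}_i}{\partial q_j}. \tag{1.23b}$$

Since the $\delta q_j$ are all independent, (1.23a) implies that the coefficient of each $\delta q_j$ individually vanishes. That is

$$0 = \sum_{i=1}^N m_i\ddot{\mathbf{r}}_i\cdot\frac{\partial \mathbf{r}_i}{\partial q_j} - Q_j. \tag{1.24}$$

In Appendix I some rather intricate algebra is used to recast this equation into the form

$$0 = \frac{d}{dt}\Big( \frac{\partial T}{\partial \dot{q}_j} \Big) - \frac{\partial T}{\partial q_j} - Q_j, \tag{1.25}$$

where $T = \frac{1}{2}\sum_i m_i|\dot{\mathbf{r}}_i|^2$ is the body’s kinetic energy. When we specialize to the case in which $Q_j$ is generated by a potential $V$, so $Q_j = -(\partial V/\partial q_j)$, equation (1.25) is easily seen to be the EL equation for $L = T - V$.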

This analysis shows that we can obtain the equations of motion of any rigid body from the EL equations as soon as we have expressions for the body’s kinetic and potential energies in terms of any set of independent coordinates. The analysis is easily extended to the case of a body that is made up of several rigid bodies that swivel or slide smoothly on one another.

Note:

Notice that the dimensions of the generalized force $Q_i$ are energy divided by those of $q_i$. The latter is frequently dimensionless (because it is an angle, for example), so generalized forces don’t necessarily have dimensions of force!

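Let $\rho(\mathbf{x})$ be the density of a rigid body that is rotating with angular velocity $\boldsymbol{\omega}$ about the coordinate origin. Then the body’s angular momentum about the origin is

$$\mathbf{J} = \int d^3\mathbf{x}\,\rho\,\mathbf{x}\times(\boldsymbol{\omega}\times\mathbf{x}) = \int d^3\mathbf{x}\,\rho\,[x^2\boldsymbol{\omega} - (\boldsymbol{\omega}\cdot\mathbf{x})\mathbf{x}]. \tag{1.26}$$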


Box 1: Euler Angles

To specify the orientation of a rigid body, we imagine starting with the body axes $\mathbf{b}_i$ aligned with the coordinate axes and then moving to an arbitrary orientation by compounding three rotations. We label the body axes $\mathbf{b}_1$, $\mathbf{b}_2$ and $\mathbf{b}_3$ according to whether they start parallel to $\mathbf{i}$, $\mathbf{j}$ or $\mathbf{k}$. Now we rotate by $\phi$ about $\mathbf{k}$, then we rotate by $\theta$ about the new position of $\mathbf{b}_1$ and finally we rotate by $\psi$ about the new position of $\mathbf{b}_3$.

We rewrite formula (1.26) in tensor notation as

$$J_i = \sum_j I_{ij}\omega_j \quad\text{where}\quad I_{ij} \equiv \int d^3\mathbf{x}\,\rho\,(x^2\delta_{ij} - x_i x_j). \tag{1.27}$$

Here $\delta_{ij}$ is the $ij$ element of the identity matrix: it is zero if $i \ne j$, and unity if $i = j$. The matrix $\mathbf{I}$ defined by (1.27) is the body’s moment of inertia tensor. Since it is a real symmetric matrix it has real eigenvalues $I_i$ and eigenvectors $\mathbf{b}_i$. The $\mathbf{b}_i$ are called body axes and the $I_i$ are called principal moments of inertia. When the body is rotated, the body axes rotate with it so they should be thought of as fixed within the body. According to (1.27), when the body spins such that its angular velocity lies along a body axis, its angular momentum is parallel to its angular velocity, and the proportionality constant between these two vectors is the appropriate principal moment of inertia.

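The kinetic energy of our spinning body is

$$T = \tfrac{1}{2}\int d^3\mathbf{x}\,\rho\,|\boldsymbol{\omega}\times\mathbf{x}|^2 = \tfrac{1}{2}\int d^3\mathbf{x}\,\rho\,\boldsymbol{\omega}\cdot[\mathbf{x}\times(\boldsymbol{\omega}\times\mathbf{x})] = \tfrac{1}{2}\boldsymbol{\omega}\cdot\mathbf{I}\cdot\boldsymbol{\omega}. \tag{1.28}$$

This expression is especially simple in the body-axis frame:

$$T = \tfrac{1}{2}\sum_{i=1}^3 I_i\omega_i^2 \quad\text{(body-axis frame)}. \tag{1.29}$$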

If all three moments of inertia are different, evaluating $T$ from (1.29) in terms of the derivatives of Euler angles (Box 1) is tedious. So consider the case $I_1 = I_2$ of an axisymmetric body, such as a saucer. Since the Euler angle $\psi$ is a rotation about the final position of $\mathbf{b}_3$, it is clear that $\dot{\psi}$ contributes $\dot{\psi}\mathbf{b}_3$ to $\boldsymbol{\omega}$. Since $I_1 = I_2$ we can adopt any two mutually orthogonal vectors in the body’s equatorial plane as $\mathbf{b}_1$ and $\mathbf{b}_2$. So let’s choose $\mathbf{b}_1$ to be the axis about which we rotated through Euler angle $\theta$. Then $\dot{\theta}$ contributes $\dot{\theta}\mathbf{b}_1$ to $\boldsymbol{\omega}$. An increment in $\phi$ rotates the system about $\mathbf{k}$. This lies in the plane of $\mathbf{b}_2$ and $\mathbf{b}_3$ and is inclined at angle $\theta$ to $\mathbf{b}_3$. Hence $\dot{\phi}$ contributes $\dot{\phi}(\cos\theta\,\mathbf{b}_3 - \sin\theta\,\mathbf{b}_2)$ to $\boldsymbol{\omega}$. Adding all three contributions together to form $\boldsymbol{\omega}$ and substituting the result into (1.29) we find that the kinetic energy of an axisymmetric body is

$$T = \tfrac{1}{2}I_1(\dot{\phi}^2\sin^2\theta + \dot{\theta}^2) + \tfrac{1}{2}I_3(\dot{\phi}\cos\theta + \dot{\psi})^2. \tag{1.30}$$

The potential energy of an axisymmetric body can depend only on $\theta$ and is usually easy to write down for any particular physical situation. Hence with (1.30) in hand the Lagrangian follows easily – see the problems.


1.5 Lagrangian for motion in an e.m. field

The simple rule $L = T - V$ does not work for a charged particle that moves in a magnetic field $\mathbf{B}$. To see this, recall that $\mathbf{B}$ does no work on the particle, so it contributes to neither $T$ nor $V$. Hence it cannot appear in equations of motion that are derived from only $T$ and $V$. We now show that the correct equations of motion follow from

$$L = \tfrac{1}{2}m\dot{\mathbf{x}}^2 + Q(\dot{\mathbf{x}}\cdot\mathbf{A} - \phi), \tag{1.31}$$

where $Q$ is the particle’s charge, $\mathbf{A}(\mathbf{x}, t)$ is the magnetic vector potential and $\phi(\mathbf{x}, t)$ is the electrostatic potential. Equation (1.31) gives the action as

$$S = \int \big[ \tfrac{1}{2}m\dot{\mathbf{x}}^2 + Q(\dot{\mathbf{x}}\cdot\mathbf{A} - \phi) \big]\,dt, \tag{1.32}$$

so the EL eqn is

$$\frac{d}{dt}\big( m\dot{\mathbf{x}} + Q\mathbf{A} \big) + Q\nabla(\phi - \dot{\mathbf{x}}\cdot\mathbf{A}) = 0. \tag{1.33}$$

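Here the derivative w.r.t. $t$ is along the path, so

$$\frac{d\mathbf{A}}{dt} = \frac{\partial \mathbf{A}}{\partial t} + (\dot{\mathbf{x}}\cdot\nabla)\mathbf{A}. \tag{1.34}$$

The partial derivative here can be combined with the $\nabla\phi$ term in (1.33) to produce the electric field $\mathbf{E} = -\nabla\phi - \partial\mathbf{A}/\partial t$. Putting all these things back into the EL eqn (1.33) yields

$$m\ddot{\mathbf{x}} = Q\big[ \mathbf{E} + \nabla(\dot{\mathbf{x}}\cdot\mathbf{A}) - (\dot{\mathbf{x}}\cdot\nabla)\mathbf{A} \big]. \tag{1.35}$$

It’s now straightforward to show that the last two terms on the right of (1.35) equal $\dot{\mathbf{x}}\times\mathbf{B}$ as one would hope: bearing in mind that $\nabla\dot{\mathbf{x}} = 0$ we have

$$\dot{\mathbf{x}}\times\mathbf{B} = \dot{\mathbf{x}}\times(\nabla\times\mathbf{A}) = \nabla(\dot{\mathbf{x}}\cdot\mathbf{A}) - (\dot{\mathbf{x}}\cdot\nabla)\mathbf{A}.$$

Thus the EL eqn applied to the action (1.32) gives

$$m\ddot{\mathbf{x}} = Q(\mathbf{E} + \dot{\mathbf{x}}\times\mathbf{B}) \tag{1.36}$$

as required.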

Note:

The action (1.32) looks rather arbitrary at this stage but is revealed to be beautifully natural when one looks at the problem in a relativistically covariant way, as one should.

1.6 Normal modes from Lagrangians

Obviously, when a system is in equilibrium all its time derivatives vanish. From the EL eqns we infer that equilibrium configurations correspond to $\partial V/\partial q_i = 0$, where $q_i$ is any coordinate. When disturbed from equilibrium, the system will zoom off if the equilibrium is unstable, or oscillate if the equilibrium is stable. Small amplitude oscillations can be represented as a superposition of normal modes. Lagrangians provide a relatively painless route to the frequencies and forms of these normal modes. The trick is to expand $L(\mathbf{q}, \dot{\mathbf{q}})$ in a Taylor series around the equilibrium configuration $\mathbf{q} = \mathbf{q}_s$, $\dot{\mathbf{q}} = 0$, discarding terms of higher than second order in $\delta\mathbf{q} \equiv \mathbf{q} - \mathbf{q}_s$ and its derivatives. Thus we write

$$L \simeq \tfrac{1}{2}\sum_{ij}\big( M_{ij}\,\delta\dot{q}_i\,\delta\dot{q}_j + C_{ij}\,\delta q_i\,\delta\dot{q}_j + F_{ij}\,\delta q_i\,\delta q_j \big) + \sum_i A_i\,\delta\dot{q}_i + L_0, \tag{1.37}$$


where $\mathbf{M}$, $\mathbf{C}$, $\mathbf{F}$ and $\mathbf{A}$ are constant matrices or vectors. Since $F_{ij} = \partial^2 L/\partial q_i\partial q_j$, $\mathbf{F}$ is a symmetric matrix, and the same applies to $\mathbf{M}$.

Since the EL eqns involve only derivatives of $L$, we can discard the constant $L_0$. It is also easy to check that the term involving $\mathbf{A}$ makes no net contribution to the equations of motion. Bearing in mind the symmetry of $\mathbf{M}$ and $\mathbf{F}$, and writing $q_i$ for the displacement $\delta q_i$, the EL equation of motion for $q_k$ is easily found to be

$$\begin{aligned}
0 &= \sum_i \frac{d}{dt}\big( M_{ik}\dot{q}_i + \tfrac{1}{2}C_{ik}q_i \big) - \sum_j \big( \tfrac{1}{2}C_{kj}\dot{q}_j + F_{kj}q_j \big) \\
&= \sum_i \big[ M_{ki}\ddot{q}_i + \tfrac{1}{2}(C_{ik} - C_{ki})\dot{q}_i - F_{ki}q_i \big].
\end{aligned} \tag{1.38}$$

These equations are easily solved by writing $\mathbf{q}(t) = \mathbf{Q}e^{i\omega t}$, whence the eigenfrequencies $\omega$ are the roots of

$$\det(\mathbf{F} + \omega^2\mathbf{M} + i\omega\tilde{\mathbf{C}}) = 0, \tag{1.39}$$

where $\tilde{C}_{ij} \equiv \frac{1}{2}(C_{ij} - C_{ji})$ is the antisymmetric part of $\mathbf{C}$. When the dynamics is time-reversible, as is usually the case when we are neither using a rotating frame nor working with a magnetic field, $\tilde{\mathbf{C}} = 0$. The equilibrium is stable iff all allowed values of $\omega^2$ are positive, i.e., all eigenfrequencies are real.

For simplicity we now consider the case in which $\tilde{\mathbf{C}} = 0$.

By expanding $V(\mathbf{q})$ around the stationary point $\mathbf{q}_s$ corresponding to an equilibrium configuration and plugging the expansion into the EL eqns, one sees that the equilibrium is stable if $\mathbf{q}_s$ is a local minimum of $V$, and unstable otherwise.

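1.6.1 Normal coordinates

Let $\mathbf{Q}_\alpha$ be a vector that satisfies the eigenvalue equation

$$(\mathbf{F} + \omega_\alpha^2\mathbf{M})\mathbf{Q}_\alpha = 0.$$

When we dot this equation through by another eigenvector, $\mathbf{Q}_\beta$, we find

$$\mathbf{Q}_\beta\cdot\mathbf{F}\cdot\mathbf{Q}_\alpha = -\omega_\alpha^2\,\mathbf{Q}_\beta\cdot\mathbf{M}\cdot\mathbf{Q}_\alpha. \tag{1.40}$$

The equation holds if the labels $\alpha$ and $\beta$ are interchanged. Moreover, by the symmetry of $\mathbf{F}$ and $\mathbf{M}$, $\mathbf{Q}_\beta\cdot\mathbf{F}\cdot\mathbf{Q}_\alpha = \mathbf{Q}_\alpha\cdot\mathbf{F}\cdot\mathbf{Q}_\beta$ and similarly for $\mathbf{M}$. So when we subtract from (1.40) the equation with $\alpha$ and $\beta$ interchanged we obtain

$$0 = (\omega_\beta^2 - \omega_\alpha^2)\,\mathbf{Q}_\alpha\cdot\mathbf{M}\cdot\mathbf{Q}_\beta. \tag{1.41}$$

It now follows that $\mathbf{Q}_\alpha\cdot\mathbf{M}\cdot\mathbf{Q}_\beta = 0$ for $\omega_\beta^2 \ne \omega_\alpha^2$, so if the eigenvectors are appropriately normalized

$$\mathbf{Q}_\alpha\cdot\mathbf{M}\cdot\mathbf{Q}_\beta = \delta_{\alpha\beta}. \tag{1.42}$$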

The general solution of the EL eqns (1.38) with $\tilde{\mathbf{C}} = 0$ can now be written

$$\mathbf{q}(t) = \sum_{\alpha=1}^N a_\alpha\mathbf{Q}_\alpha\cos(\omega_\alpha t + \phi_\alpha), \tag{1.43}$$

where the $a_\alpha$ and $\phi_\alpha$ are $2N$ arbitrary constants. Premultiplying by $\mathbf{Q}_\beta\cdot\mathbf{M}$ we find with (1.42) that

$$\mathbf{Q}_\beta\cdot\mathbf{M}\cdot\mathbf{q}(t) = a_\beta\cos(\omega_\beta t + \phi_\beta). \tag{1.44}$$

For each possible value of $\beta$ the left side of this equation is a particular linear combination of the original coordinates, and the right side shows that this combination oscillates sinusoidally at angular frequency $\omega_\beta$ regardless of how the system is set into motion. A combination of the coordinates that inevitably oscillates sinusoidally is called a normal coordinate.


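Example 1.2

The governor of a steam engine contains two balls of mass $m$ that are mounted on light rods, and these are in turn attached to a vertical axis. The plane of the rods rotates at constant angular velocity $\Omega$ about the vertical axis. A spring connects the two rods in such a way that the potential energy stored in the spring is $\frac{1}{2}k$ times the square of the distance between the centres of the balls. Find a point of equilibrium and determine the frequencies of the normal modes.

Solution: Let $a$ be the length of each rod and let $\theta$ and $\phi$ be the angles that the rods make with the downward vertical. Application of the cosine law to the triangle formed by the balls and their point of suspension shows that the potential energy is

$$V = -mga(\cos\phi + \cos\theta) + \tfrac{1}{2}ka^2\big[ 2 - 2\cos(\phi + \theta) \big].$$

Subtracting this from the kinetic energy, we find that

$$L = \tfrac{1}{2}ma^2(\dot{\phi}^2 + \dot{\theta}^2) + \tfrac{1}{2}ma^2\Omega^2(\sin^2\phi + \sin^2\theta) + mga(\cos\phi + \cos\theta) - ka^2\big[ 1 - \cos(\phi + \theta) \big].$$

By the system’s symmetry, there is a point of equilibrium with $\phi = \theta = \theta_0$. Setting to zero $\partial L/\partial\theta$ evaluated at this point, we find the equilibrium point to satisfy

$$0 = ma^2\Omega^2\sin\theta_0\cos\theta_0 - mga\sin\theta_0 - ka^2\sin 2\theta_0 \quad\Rightarrow\quad \sin\theta_0 = 0 \quad\text{or}\quad \cos\theta_0 = \frac{\omega_g^2}{\Omega^2 - 2\omega_s^2},$$

where $\omega_g^2 \equiv g/a$, $\omega_s^2 \equiv k/m$. At $(\theta_0, \theta_0)$ the second derivatives of $L$ are

$$\begin{aligned}
\frac{\partial^2 L}{\partial\theta^2} &= (m\Omega^2 - k)a^2\cos 2\theta_0 - mga\cos\theta_0 \\
\frac{\partial^2 L}{\partial\phi^2} &= (m\Omega^2 - k)a^2\cos 2\theta_0 - mga\cos\theta_0 \\
\frac{\partial^2 L}{\partial\theta\,\partial\phi} &= -ka^2\cos 2\theta_0.
\end{aligned}$$

Hence the equations $\dfrac{d}{dt}\Big( \dfrac{\partial L}{\partial\dot{\theta}} \Big) = \dfrac{\partial^2 L}{\partial\theta^2}\,\delta\theta + \dfrac{\partial^2 L}{\partial\theta\,\partial\phi}\,\delta\phi$ etc. that govern the normal modes are

$$\begin{pmatrix} \delta\ddot{\theta} \\ \delta\ddot{\phi} \end{pmatrix} = \begin{pmatrix} x & y \\ y & x \end{pmatrix} \begin{pmatrix} \delta\theta \\ \delta\phi \end{pmatrix} \quad\text{where}\quad \begin{aligned} x &= (\Omega^2 - \omega_s^2)\cos 2\theta_0 - \omega_g^2\cos\theta_0 \\ y &= -\omega_s^2\cos 2\theta_0. \end{aligned} \tag{1.45}$$

The normal frequencies $\omega$ are given by the eigenvalues of the matrix: $\omega^2 = -x \pm y$. The lowest squared frequency, $\omega_g^2\cos\theta_0 - \Omega^2\cos 2\theta_0$, is negative for $\Omega^2 > \omega_g^2\cos\theta_0/\cos 2\theta_0$, which indicates that the system is unstable for large $\Omega$.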

Example 1.3

A cylinder of mass $m$ and radius $a$ rolls on a rough horizontal table. A second cylinder, mass $m$ and radius $\frac{1}{2}a$, rolls inside the first. Find the normal frequencies for small disturbances from equilibrium.


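Solution: Let $\theta$ be the angle through which the first cylinder has turned from equilibrium, and $\phi$ be the angle through which the second cylinder has rolled relative to the first (see figure). Then the line between the two centres makes an angle

$$\psi = \theta - \tfrac{1}{2}\phi \tag{1.46}$$

with the vertical. The kinetic energy of the first cylinder (translational plus rotational) is

$$T_1 = \tfrac{1}{2}m(a\dot{\theta})^2 + \tfrac{1}{2}ma^2\dot{\theta}^2 = m(a\dot{\theta})^2. \tag{1.47}$$

The motion of the centre of the second cylinder is a compound of the leftward motion $a\dot{\theta}$ of the centre of the first cylinder, plus $\frac{1}{2}a\dot{\psi}$ perpendicular to the line joining the centres. The second cylinder rotates with respect to inertial space at angular velocity $\dot{\phi} + \dot{\psi}$. The total kinetic energy is therefore

$$T = m(a\dot{\theta})^2 + \tfrac{1}{2}m\big[ (\tfrac{1}{2}a\dot{\psi}\cos\psi - a\dot{\theta})^2 + (\tfrac{1}{2}a\dot{\psi}\sin\psi)^2 \big] + \tfrac{1}{2}m(a/2)^2(\dot{\phi} + \dot{\psi})^2. \tag{1.48}$$

The potential energy is simply

$$V = -mg\,\tfrac{1}{2}a\cos\psi. \tag{1.49}$$

In $T$, which is quadratic in the velocities, we set $\psi = 0$. We expand $V$ to second order in $\psi$, to find

$$\begin{aligned}
T &= \tfrac{1}{2}ma^2\big( \tfrac{5}{2}\dot{\theta}^2 + \tfrac{1}{2}\dot{\theta}\dot{\phi} + \tfrac{1}{8}\dot{\phi}^2 \big), \\
V &= \text{constant} + \tfrac{1}{4}mga(\theta - \tfrac{1}{2}\phi)^2.
\end{aligned} \tag{1.50}$$

Defining $\omega_0 \equiv \sqrt{g/a}$ the equations of motion become

$$\begin{aligned}
5\ddot{\theta} + \tfrac{1}{2}\ddot{\phi} + \omega_0^2(\theta - \tfrac{1}{2}\phi) &= 0, \\
\tfrac{1}{2}\ddot{\theta} + \tfrac{1}{4}\ddot{\phi} - \tfrac{1}{2}\omega_0^2(\theta - \tfrac{1}{2}\phi) &= 0.
\end{aligned} \tag{1.51}$$

The eigenfrequencies are now straightforwardly found to be $\omega = 0$ and $\omega = \sqrt{2}\,\omega_0$.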
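The last step can be reproduced with a few lines of code. Writing the two equations of motion of the rolling cylinders as $\mathbf{M}\ddot{\mathbf{q}} + \mathbf{K}\mathbf{q} = 0$ with $\mathbf{q} = (\theta, \phi)$ and time measured in units of $1/\omega_0$, the squared frequencies are the roots of $\det(\mathbf{K} - \omega^2\mathbf{M}) = 0$, which is a quadratic in $\omega^2$ for a two-coordinate system. The helper below is a sketch for symmetric $2\times 2$ matrices only; its name and the matrix layout are illustrative assumptions.

```python
import math

def squared_frequencies(M, K):
    """Roots omega^2 of det(K - omega^2 M) = 0 for symmetric 2x2 matrices M and K."""
    (m11, m12), (_, m22) = M
    (k11, k12), (_, k22) = K
    a = m11 * m22 - m12 ** 2                         # coefficient of omega^4
    b = -(k11 * m22 + k22 * m11 - 2.0 * k12 * m12)   # coefficient of omega^2
    c = k11 * k22 - k12 ** 2                         # constant term
    disc = math.sqrt(max(b * b - 4.0 * a * c, 0.0))  # guard against tiny negative rounding
    return sorted(((-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)))

# Coefficients read off from the linearized equations, in units where omega_0 = 1.
M_cyl = ((5.0, 0.5), (0.5, 0.25))
K_cyl = ((1.0, -0.5), (-0.5, 0.25))
omega_sq = squared_frequencies(M_cyl, K_cyl)
print(omega_sq)
```

The zero root corresponds to the two cylinders rolling along the table together with $\theta = \frac{1}{2}\phi$, which costs no potential energy.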

1.7 Noether’s theorem

A constant of motion is any function $C(\mathbf{q}, \dot{\mathbf{q}})$ that satisfies $dC/dt = 0$, where $\mathbf{q}(t)$ is a solution of the eqns of motion. For example, in a ‘conservative’ system, energy is conserved, so $E(\mathbf{q}, \dot{\mathbf{q}})$ is a constant of motion. Finding a constant of motion is a big step towards obtaining a general solution of the equations of motion.

In general, a system with $N$ degrees of freedom $q_1, \dots, q_N$ admits $2N - 1$ independent constants of motion. We show this by arguing that given $(\mathbf{q}, \dot{\mathbf{q}})$ at any time $t$, the equations of motion allow us to give the position and velocity $(\mathbf{q}^{(0)}, \dot{\mathbf{q}}^{(0)})$ at any reference time $t_0$. Thus $q^{(0)}_i = f_i(\mathbf{q}, \dot{\mathbf{q}}, t)$, where $f_i$ is some function. Similarly, $\dot{q}^{(0)}_i = g_i(\mathbf{q}, \dot{\mathbf{q}}, t)$, where $g_i$ is another function. On eliminating $t$ between these $2N$ functions, we have $2N - 1$ constants of motion.

It seldom happens that we can find $2N - 1$ constants of motion – a rare exception is the case of motion in a Kepler potential $V \propto 1/r$. In fact it turns out that essentially complete information about solutions of the equations of motion can be extracted from $N$ constants of motion. A system for which $N$ constants of motion can be found is said to be integrable.

A theorem proved by Emmy Noether (1882–1935) provides a powerful way of extracting constants of motion from Lagrangians. Noether’s theorem involves identifying a flow in configuration space that leaves $L$ invariant. A ‘flow’ is an infinitesimal transformation

$$\mathbf{q} \to \mathbf{q}' = \mathbf{q} + \frac{d\mathbf{q}(\mathbf{q})}{d\lambda}\,\delta\lambda. \tag{1.52}$$


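For example, the transformation $\mathbf{x} \to \mathbf{x} + \mathbf{i}\,\delta\lambda$, where $\mathbf{i}$ is the unit vector along the $x$-axis, is a flow.

A flow changes the path $\mathbf{q}(t)$ into the path $\mathbf{q}'(t)$ and thus changes the value of the Lagrangian at time $t$ by

$$\delta L = \frac{\partial L}{\partial \mathbf{q}}\cdot\delta\mathbf{q} + \frac{\partial L}{\partial \dot{\mathbf{q}}}\cdot\delta\dot{\mathbf{q}}. \tag{1.53}$$

Notice that $\delta\dot{\mathbf{q}}$ is well defined: $\delta\dot{\mathbf{q}} = \dfrac{\partial\,\delta\mathbf{q}}{\partial \mathbf{q}}\cdot\dot{\mathbf{q}}$.

Invariance of $L$ just means that $L$ takes the same value at all points that are joined by the flow. Noether’s theorem states that if $\delta L$ vanishes along the dynamically determined path, then

$$\frac{d\mathbf{q}}{d\lambda}\cdot\frac{\partial L}{\partial \dot{\mathbf{q}}} \tag{1.54}$$

is a constant of motion. Thus from the invariance of $L$ under translation $\mathbf{x} \to \mathbf{x} + \mathbf{i}\,\delta\lambda$ along the $x$-axis, Noether’s theorem deduces the constancy of

$$\mathbf{i}\cdot\frac{\partial L}{\partial \dot{\mathbf{q}}} = \frac{\partial L}{\partial \dot{x}}. \tag{1.55}$$

For a particle moving in a velocity-independent potential this is just the $x$-momentum $m\dot{x}$.

The proof of Noether’s theorem is simple. Equating to zero equation (1.53) for $\delta L$ we have

$$0 = \delta L = \frac{\partial L}{\partial \mathbf{q}}\cdot\delta\mathbf{q} + \frac{\partial L}{\partial \dot{\mathbf{q}}}\cdot\delta\dot{\mathbf{q}}. \tag{1.56}$$

Using the EL eqns to eliminate $\partial L/\partial\mathbf{q}$ this becomes

$$0 = \frac{d}{dt}\Big( \frac{\partial L}{\partial \dot{\mathbf{q}}} \Big)\cdot\delta\mathbf{q} + \frac{\partial L}{\partial \dot{\mathbf{q}}}\cdot\delta\dot{\mathbf{q}} = \frac{d}{dt}\Big( \frac{\partial L}{\partial \dot{\mathbf{q}}}\cdot\delta\mathbf{q} \Big), \tag{1.57}$$

and the result follows on writing $\delta\mathbf{q} = (d\mathbf{q}/d\lambda)\,\delta\lambda$.

Consider the proof of conservation of angular momentum by Noether’s theorem. A rotation by $\delta\theta$ about the unit vector $\mathbf{n}$ changes $\mathbf{x}$ by $\delta\theta\,\mathbf{n}\times\mathbf{x}$. So if $L$ is invariant under this rotation, the following is a constant of motion:

$$J \equiv \mathbf{n}\times\mathbf{x}\cdot\frac{\partial L}{\partial \dot{\mathbf{x}}} = \mathbf{n}\cdot\mathbf{x}\times\frac{\partial L}{\partial \dot{\mathbf{x}}}. \tag{1.58}$$

For a particle moving in a velocity-independent potential this is just the component of $m\mathbf{x}\times\dot{\mathbf{x}}$ parallel to $\mathbf{n}$.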

Example 1.4

A certain system with coordinates $x$, $y$, and $z$ has Lagrangian

$$L = \tfrac{1}{2}(m_1\dot{x}^2 + m_1\dot{y}^2 + m_2\dot{z}^2) + A(t)\dot{z} - \tfrac{1}{2}k\big[ (x - y)^2 + (y - z)^2 + (z - x)^2 \big],$$

where $m_1$, $m_2$ and $k$ are constants and $A(t)$ is a given function of time. Obtain an expression for $A(t) - A(0)$ in terms of the values of $\dot{x}$, $\dot{y}$ and $\dot{z}$ at time $t$ and at time zero.

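Solution: $L$ depends only on the difference between coordinates, so it is invariant under $(x, y, z) \to (x + \epsilon, y + \epsilon, z + \epsilon)$. The associated invariant is

$$\frac{\partial L}{\partial \dot{x}} + \frac{\partial L}{\partial \dot{y}} + \frac{\partial L}{\partial \dot{z}} = m_1(\dot{x} + \dot{y}) + m_2\dot{z} + A(t) \tag{1.59}$$

so

$$A(t) - A(0) = -m_1(\dot{x} - \dot{x}_0 + \dot{y} - \dot{y}_0) - m_2(\dot{z} - \dot{z}_0). \tag{1.60}$$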

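Here’s an application to motion in a uniform magnetic field $\mathbf{B} = B\mathbf{k}$. Let’s choose $\mathbf{A} = (-By, 0, 0)$. Then by (1.31) $L = \frac{1}{2}m|\dot{\mathbf{x}}|^2 - QBy\dot{x}$ is invariant under two flows: (i) $\mathbf{x} \to \mathbf{x} + \mathbf{i}\,\delta\lambda$ and (ii) $\mathbf{x} \to \mathbf{x} + \mathbf{k}\,\delta\lambda$.

Hence we have two invariants

$$p_x \equiv \frac{\partial L}{\partial \dot{x}} = mv_x - QBy\;;\qquad p_z \equiv \frac{\partial L}{\partial \dot{z}} = mv_z. \tag{1.61a}$$

Choosing $\mathbf{A} = (0, Bx, 0)$ we find a third invariant for the same physical problem:

$$p_y \equiv \frac{\partial L}{\partial \dot{y}} = mv_y + QBx. \tag{1.61b}$$

The physical meaning of $p_z$ is obvious, but what do $p_x$ and $p_y$ mean physically? Add them up:

$$P \equiv p_x + ip_y = m(v_x + iv_y) + QB(ix - y) = m\dot{\xi} + iQB\xi \quad\text{where}\quad \xi \equiv x + iy. \tag{1.62}$$

Solving this first-order d.e. for $\xi$ we find

$$\xi(t) = \xi_0e^{-i\omega t} - \frac{iP}{m\omega}, \quad\text{where}\quad \omega \equiv \frac{QB}{m} \tag{1.63}$$

is the Larmor frequency and $\xi_0$ is a complex constant of integration. It is now easy to see that the real and imaginary parts of $P$ encode the $y$ and $x$ coordinates of the guiding centre around which the particle gyrates.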
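A short numerical sketch of these invariants, assuming motion in the plane perpendicular to $\mathbf{B} = B\mathbf{k}$ with no electric field, so that the Lorentz force gives $\dot{v}_x = \omega v_y$, $\dot{v}_y = -\omega v_x$ with $\omega = QB/m$. The code integrates the orbit, checks that $p_x = mv_x - QBy$ and $p_y = mv_y + QBx$ stay constant, and checks that the particle keeps a fixed distance from the guiding centre $(p_y, -p_x)/m\omega$ implied by $-iP/m\omega$. The integrator, names and initial conditions are illustrative assumptions.

```python
import math

def derivatives(state, omega):
    """Time derivative of (x, y, vx, vy) under the Lorentz force for B along z."""
    x, y, vx, vy = state
    return (vx, vy, omega * vy, -omega * vx)

def rk4_step(state, dt, omega):
    """One classical fourth-order Runge-Kutta step."""
    def nudge(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = derivatives(state, omega)
    k2 = derivatives(nudge(k1, 0.5 * dt), omega)
    k3 = derivatives(nudge(k2, 0.5 * dt), omega)
    k4 = derivatives(nudge(k3, dt), omega)
    return tuple(s + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def canonical_momenta(state, m, QB):
    """The two Noether invariants p_x and p_y."""
    x, y, vx, vy = state
    return (m * vx - QB * y, m * vy + QB * x)

def gyration_check(m=1.0, QB=2.0, t_end=5.0, n_steps=5000):
    """Return invariants and distance from the guiding centre at start and end."""
    omega = QB / m
    state = (0.3, -0.1, 1.0, 0.5)
    p_start = canonical_momenta(state, m, QB)
    centre = (p_start[1] / (m * omega), -p_start[0] / (m * omega))
    radius_start = math.hypot(state[0] - centre[0], state[1] - centre[1])
    dt = t_end / n_steps
    for _ in range(n_steps):
        state = rk4_step(state, dt, omega)
    p_end = canonical_momenta(state, m, QB)
    radius_end = math.hypot(state[0] - centre[0], state[1] - centre[1])
    return p_start, p_end, radius_start, radius_end

print(gyration_check())
```

The conserved radius is $|\mathbf{v}|/\omega$, the usual gyroradius, which is one way to see why $P$ locates the guiding centre.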

1.8 Constraints

Sometimes it is convenient to work with more coordinates than a system has degrees of freedom. Suppose, for example, that the system consists of a dumbbell of length $s$ that is free to slide on a smooth table. This system has three degrees of freedom, namely the position of the centre of mass and the orientation of the dumbbell. But we might prefer to describe the system in terms of the $x$ and $y$ coords of the dumbbell’s particles. These are not independent, but satisfy the constraint

$$(x_1 - x_2)^2 + (y_1 - y_2)^2 = s^2. \tag{1.64}$$

The dynamics of the system are obtained by extremizing the action subject to this constraint equation. Lagrange multipliers (Box 2) enable us to do this simply. We write the constraint equation as $C(\mathbf{q}) = 0$ and evaluate

$$0 = \delta S - \int dt\,\lambda\,\delta C = \int_{t_1}^{t_2} dt \sum_i \delta q_i \Big[ \frac{\partial L}{\partial q_i} - \frac{d}{dt}\Big( \frac{\partial L}{\partial \dot{q}_i} \Big) - \lambda\frac{\partial C}{\partial q_i} \Big]. \tag{1.65}$$

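Here $\lambda(\mathbf{q}, t)$ is an arbitrary function. As in Lagrange’s standard argument, we choose $\lambda$ to ensure that the coefficient of one of the $\delta q_i$ vanishes, and then conclude from the independence of the remaining $\delta q_i$ that their coefficients must vanish too. Hence we have for every $i$ that

$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot q_i}\right) = \frac{\partial L}{\partial q_i} - \lambda\frac{\partial C}{\partial q_i}. \tag{1.66}$$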

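Specifically for our dumbbell example, $L = \tfrac12 m(v_1^2 + v_2^2)$, so the equations of motion are

$$\begin{aligned} m\ddot x_1 &= -2\lambda(x_1 - x_2), & m\ddot y_1 &= -2\lambda(y_1 - y_2),\\ m\ddot x_2 &= 2\lambda(x_1 - x_2), & m\ddot y_2 &= 2\lambda(y_1 - y_2). \end{aligned} \tag{1.67}$$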


Box 2: Lagrange Multipliers

Suppose we are given the profit $G(x, y, z)$ when we sell some food with amounts $x$, $y$ and $z$ of additives that are constrained by the health and safety regulations such that we must have $F(x, y, z) = 0$, where $F$ is a specified function. The regulations oblige us to manufacture a product whose representative point in $(x, y, z)$ space lies on the two-dimensional surface $F = 0$, and we maximize our profit by finding the point on this surface at which $G$ is biggest.

If we make small changes in the inputs, our profit changes by

$$dG = \frac{\partial G}{\partial x}dx + \frac{\partial G}{\partial y}dy + \frac{\partial G}{\partial z}dz. \tag{B2.1}$$

Unfortunately, we are obliged to remain on the surface $F = 0$, so our changes have to satisfy

$$0 = dF = \frac{\partial F}{\partial x}dx + \frac{\partial F}{\partial y}dy + \frac{\partial F}{\partial z}dz. \tag{B2.2}$$

We multiply this equation by an arbitrary function $\lambda(x, y, z)$ and subtract the result from equation (B2.1). At the maximum $dG = 0$ for every allowed change, so we then have

$$0 = \left(\frac{\partial G}{\partial x} - \lambda\frac{\partial F}{\partial x}\right)dx + \left(\frac{\partial G}{\partial y} - \lambda\frac{\partial F}{\partial y}\right)dy + \left(\frac{\partial G}{\partial z} - \lambda\frac{\partial F}{\partial z}\right)dz. \tag{B2.3}$$

We now choose the function $\lambda$ to make the coefficient of $dz$ vanish, that is, we set $\lambda = (\partial G/\partial z)/(\partial F/\partial z)$. So we now have

$$0 = \left(\frac{\partial G}{\partial x} - \lambda\frac{\partial F}{\partial x}\right)dx + \left(\frac{\partial G}{\partial y} - \lambda\frac{\partial F}{\partial y}\right)dy. \tag{B2.4}$$

The changes $dx$ and $dy$ can be chosen independently because whatever values we adopt for these variables, the constraint (B2.2) will be satisfied for an appropriate value of $dz$. One allowed choice is $dx \neq 0$ with $dy = 0$, and for this choice equation (B2.4) holds only if the coefficient of $dx$ vanishes. Similarly, choosing to set $dx = 0$ with $dy \neq 0$ we infer that the coefficient of $dy$ also vanishes. We now have four equations that must hold at the point that maximizes $G$, namely

$$F = 0\,; \quad 0 = \frac{\partial G}{\partial x} - \lambda\frac{\partial F}{\partial x}\,; \quad 0 = \frac{\partial G}{\partial y} - \lambda\frac{\partial F}{\partial y}\,; \quad 0 = \frac{\partial G}{\partial z} - \lambda\frac{\partial F}{\partial z}. \tag{B2.5}$$

In principle we can solve these four equations for the four unknowns: the values of $x$, $y$ and $z$ at the stationary point, and the numerical value of the function $\lambda$ at that point. This procedure was invented by Lagrange, so $\lambda$ is called a Lagrange multiplier.

Adding the lower to the upper equations of (1.67) we obtain the equations of motion of the centre of mass: $\ddot{\mathbf{R}} = 0$, where $\mathbf{R} = \tfrac12(\mathbf{r}_1 + \mathbf{r}_2)$. Dividing the top left equation by the bottom right equation and the bottom left equation by the top right equation and then subtracting the resulting equations, we obtain $x\ddot y - y\ddot x = 0$, where $x \equiv x_1 - x_2$ etc., which expresses conservation of the system’s angular momentum: $\frac{d}{dt}(x\dot y - y\dot x) = 0$.

We shall see below that $p_i \equiv \partial L/\partial \dot q_i$ is the momentum ‘conjugate’ to $q_i$. Equation (1.66) expresses the rate of change of $p_i$ as a sum of two generalized forces. The term $\partial L/\partial q_i$ is simply minus the gradient of the potential that would be associated with the coordinates in the absence of the constraint. This vanishes in our dumbbell example. The term $-\lambda(\partial C/\partial q_i)$ describes the force associated with maintenance of the constraint. In the case of the dumbbell, for example, we have that the tension $T$ in its bar is given by

$$-T\frac{x}{s} = F_x = m\ddot x_1 = -2\lambda x \quad\Rightarrow\quad T = 2\lambda s. \tag{1.68}$$
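The procedure of Box 2 can be tried on a toy problem. The sketch below assumes a hypothetical profit function $G = xyz$ and a hypothetical constraint $F = x + y + z - 3 = 0$ (neither appears in the notes), takes the candidate point $(1, 1, 1)$, fixes $\lambda$ from the $z$ equation as in the box, and then checks the remaining equations of (B2.5) with finite-difference gradients.

```python
def G(x, y, z):
    """Hypothetical profit function (an assumption for illustration)."""
    return x * y * z

def F(x, y, z):
    """Hypothetical constraint; the allowed surface is F = 0."""
    return x + y + z - 3.0

def grad(f, p, h=1e-6):
    """Central-difference gradient of f at the point p = (x, y, z)."""
    g = []
    for i in range(3):
        up, dn = list(p), list(p)
        up[i] += h
        dn[i] -= h
        g.append((f(*up) - f(*dn)) / (2 * h))
    return g

point = (1.0, 1.0, 1.0)          # candidate stationary point on F = 0
gG, gF = grad(G, point), grad(F, point)

# Choose lambda so that the coefficient of dz vanishes, as in Box 2.
lam = gG[2] / gF[2]

# The x and y coefficients must then vanish too if the point is stationary.
residuals = [gG[i] - lam * gF[i] for i in range(3)]
print("lambda =", lam, "residuals =", residuals)
```

Nearby points that stay on the surface, such as $(1.1, 0.9, 1)$, give a smaller $G$, as expected at a constrained maximum.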


Example 1.5

A moped engine contains a vertically mounted piston of mass $m$ that is coupled to a fly-wheel of moment of inertia $I$ by a light connecting rod of length $l$. The system has only one degree of freedom but two natural coordinates, $\phi$ and $x$. The constraint equation is

$$l^2 = x^2 + r^2 - 2rx\cos\phi. \tag{1.69}$$

The Lagrangian is

$$L = \tfrac12 I\dot\phi^2 + \tfrac12 m\dot x^2 - mgx. \tag{1.70}$$

From (1.66) the equations of motion are

$$\frac{d}{dt}(m\dot x) = -mg - \lambda(2x - 2r\cos\phi), \qquad \frac{d}{dt}(I\dot\phi) = -\lambda\,2rx\sin\phi. \tag{1.71}$$

Eliminating $\lambda$ we find that $x$ and $\phi$ satisfy the d.e.

$$m\ddot x + \left(\frac{\cot\phi}{x} - \frac{\operatorname{cosec}\phi}{r}\right)I\ddot\phi + mg = 0. \tag{1.72}$$

This should be solved in conjunction with the constraint (1.69).

Sometimes it is in principle possible to write the Lagrangian in terms of as many coordinates as the system has degrees of freedom. In such a case the constraint is called holonomic. Clearly, the constraint (1.64) of the dumbbell is of this class, although in practice holonomic constraints will be more complex than (1.64) and correspondingly algebraically hard to eliminate.

Sometimes a constraint cannot be eliminated, even in principle. Such unavoidable constraints are called non-holonomic. The classic example of a non-holonomic constraint occurs in the problem of a rough ball moving on a rough plane. Five natural coordinates for the problem comprise the $(x, y)$ coordinates of the ball’s centre together with three Euler angles to specify the ball’s orientation. Two constraints couple the velocities of these coordinates since if the ball is moving parallel to either axis, it must be rolling and therefore the Euler angles must be incrementing in a definite way. On the other hand, it is not possible to eliminate any of these coordinates because it turns out that by rolling the ball to a chosen position, spinning it there about its point of contact with the plane and then rolling it back, one can arrange for any given values of the Euler angles to be associated with given values of $(x, y)$. We can obtain equations of motion for the ball’s five coordinates by a straightforward generalization of the formalism described above: we express the ball’s Lagrangian (its kinetic energy) as a function of $\mathbf{q} = (x, y, \phi, \theta, \psi)$ and their derivatives and then extremize the action subject to the two constraints $C_\alpha(\mathbf{q}, \dot{\mathbf{q}})$ ($\alpha = 1, 2$) on the positions and velocities.

2 Hamiltonian Dynamics

The Lagrangian of a dynamical system depends on $2N$ variables, the system’s $N$ coordinates and $N$ velocities. The $2N$-dimensional space of initial conditions $(\mathbf{q}, \dot{\mathbf{q}})$ is called phase space. The eqns of motion allow one to determine uniquely the system’s future and past from its present position in phase space. Geometrically, through every point of phase space there runs a curve along which the system evolves. These curves never intersect one another.

It turns out that $(\mathbf{q}, \dot{\mathbf{q}})$ are not the ideal coordinates for phase space. The natural coordinates are $(\mathbf{p}, \mathbf{q})$, where

$$\mathbf{p} \equiv \frac{\partial L}{\partial \dot{\mathbf{q}}} \tag{2.1}$$


Box 3: Legendre transforms

Let $g(x)$ be a convex function, that is, a function such that $g''(x) > 0$. Then the Legendre transform $\bar g(p)$ of $g$ is defined by

$$\bar g(p) \equiv xp - g(x), \quad\text{where } x(p) \text{ is implicitly defined as the root for given } p \text{ of}\quad p = \frac{\partial g}{\partial x}. \tag{B3.1}$$

The convexity of $g$ guarantees that the equation defining $x(p)$ can be solved for any $p$ that lies between the maximum and minimum gradients of $g$. Thus $\bar g(p)$ is well defined. It is straightforward to show that Legendre transforms are invertible. In fact a Legendre transform is its own inverse: $\bar{\bar g}(x) = g(x)$.

It is often helpful to consider the function $G(x, p) \equiv xp - g(x)$ of two independent variables $(x, p)$. Graphically, $G(x, p)$ is the vertical displacement at abscissa $x$ between the straight line $y = px$ and the upward curving graph of $g(x)$.

The Legendre transform $\bar g(p)$ is the value of $G$ at the point $x(p)$ at which the curve runs parallel to the line. Since

$$\frac{\partial G}{\partial x} = p - \frac{\partial g}{\partial x}, \tag{B3.2}$$

$x(p)$ is the value of $x$ which extremizes $G$ for given $p$, as is already evident from the figure.

is the momentum ‘conjugate to $\mathbf{q}$’. Changing coordinates from $\dot{\mathbf{q}}$ to $\mathbf{p}$ is analogous in thermodynamics to replacing the volume $V$ by the pressure $P$ since $P = -(\partial U/\partial V)_S$ just as $\mathbf{p} = (\partial L/\partial \dot{\mathbf{q}})_{\mathbf{q}}$. We are replacing a variable by the gradient of some function of that variable. Transformations of this type are called Legendre transforms; see Box 3. When in thermodynamics we eliminate $V$ in favour of $P$, it is expedient to introduce a new function $H(S, P) \equiv U + PV$. So here we introduce the Hamiltonian

$$H(\mathbf{p}, \mathbf{q}) \equiv \mathbf{p}\cdot\dot{\mathbf{q}} - L, \tag{2.2}$$

where it is understood that $\dot{\mathbf{q}}$ is to be eliminated in favour of $\mathbf{q}$, $\mathbf{p}$, and $t$ using equation (2.1).
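The Legendre transform of Box 3, and the claim that it is its own inverse, can be illustrated numerically. This is a minimal sketch, assuming a hypothetical convex function $g(x) = x^4/4 + x^2/2$ and a simple bisection root-finder; the helper names are not from the notes.

```python
def solve_increasing(f, target, lo=-1e3, hi=1e3, iters=200):
    """Bisection for f(u) = target, assuming f is increasing on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def legendre(g, dg):
    """Return (gbar, x_of_p): the transform of g and the root x(p) of p = g'(x).

    x_of_p is also the derivative of gbar, so it can be fed back in
    to transform a second time.
    """
    def x_of_p(p):
        return solve_increasing(dg, p)

    def gbar(p):
        x = x_of_p(p)
        return x * p - g(x)

    return gbar, x_of_p

# Hypothetical convex function and its derivative (assumptions).
g = lambda x: 0.25 * x**4 + 0.5 * x**2
dg = lambda x: x**3 + x

gbar, dgbar = legendre(g, dg)          # first transform
gbarbar, _ = legendre(gbar, dgbar)     # transform of the transform

print(gbar(2.0))                       # x(p=2) = 1, so this is 2 - g(1)
print(gbarbar(0.7), g(0.7))            # the two values should agree
```

The same construction with $g \to L$, $x \to \dot q$ gives $H$ in (2.2), provided $L$ is convex in the velocities.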

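Example 2.1

When the single degree of freedom of the moped of Example 1.5 is taken to be $\phi$ (that is, $x$ is considered to be a function of $\phi$), the momentum conjugate to $\phi$ is

$$p_\phi = \left(\frac{\partial L}{\partial \dot\phi}\right)_\phi = I\dot\phi + m\dot x\frac{\partial \dot x}{\partial \dot\phi}. \tag{2.3}$$

Differentiating the constraint eq first w.r.t. $t$ and then w.r.t. $\dot\phi$ we have

$$\begin{aligned} 0 &= 2\dot x(x - r\cos\phi) + 2rx\sin\phi\,\dot\phi,\\ 0 &= \frac{\partial \dot x}{\partial \dot\phi}(x - r\cos\phi) + rx\sin\phi. \end{aligned} \tag{2.4}$$

Hence

$$p_\phi = \left[I + m\left(\frac{rx\sin\phi}{x - r\cos\phi}\right)^2\right]\dot\phi. \tag{2.5}$$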


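The total derivative of the Hamiltonian is

$$\begin{aligned} dH &= \mathbf{p}\cdot d\dot{\mathbf{q}} + \dot{\mathbf{q}}\cdot d\mathbf{p} - \left(\frac{\partial L}{\partial \mathbf{q}}\right)_{\dot{\mathbf{q}},t}\cdot d\mathbf{q} - \left(\frac{\partial L}{\partial \dot{\mathbf{q}}}\right)_{\mathbf{q},t}\cdot d\dot{\mathbf{q}} - \left(\frac{\partial L}{\partial t}\right)_{\mathbf{q},\dot{\mathbf{q}}} dt\\ &= \dot{\mathbf{q}}\cdot d\mathbf{p} - \left(\frac{\partial L}{\partial \mathbf{q}}\right)_{\dot{\mathbf{q}},t}\cdot d\mathbf{q} - \left(\frac{\partial L}{\partial t}\right)_{\mathbf{q},\dot{\mathbf{q}}} dt, \end{aligned} \tag{2.6}$$

where the first and fourth terms cancel by (2.1). But we may also write

$$dH = \left(\frac{\partial H}{\partial \mathbf{p}}\right)_{\mathbf{q},t}\cdot d\mathbf{p} + \left(\frac{\partial H}{\partial \mathbf{q}}\right)_{\mathbf{p},t}\cdot d\mathbf{q} + \left(\frac{\partial H}{\partial t}\right)_{\mathbf{q},\mathbf{p}} dt. \tag{2.7}$$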

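Since equations (2.6) and (2.7) must be the same, we have

$$\dot{\mathbf{q}} = \left(\frac{\partial H}{\partial \mathbf{p}}\right)_{\mathbf{q},t}; \qquad \left(\frac{\partial H}{\partial \mathbf{q}}\right)_{\mathbf{p},t} = -\left(\frac{\partial L}{\partial \mathbf{q}}\right)_{\dot{\mathbf{q}},t}; \qquad \left(\frac{\partial H}{\partial t}\right)_{\mathbf{q},\mathbf{p}} = -\left(\frac{\partial L}{\partial t}\right)_{\mathbf{q},\dot{\mathbf{q}}}. \tag{2.8}$$

Using the EL eqns and simplifying the notation, the first two of these equations lead us to Hamilton’s equations

$$\dot{\mathbf{q}} = \frac{\partial H}{\partial \mathbf{p}}\,; \qquad \dot{\mathbf{p}} = -\frac{\partial H}{\partial \mathbf{q}}. \tag{2.9}$$

Along a trajectory $\big(\mathbf{q}(t), \mathbf{p}(t)\big)$, the Hamiltonian $H\big(\mathbf{q}(t), \mathbf{p}(t), t\big)$ changes at a rate

$$\frac{dH}{dt} = \frac{\partial H}{\partial \mathbf{q}}\cdot\dot{\mathbf{q}} + \frac{\partial H}{\partial \mathbf{p}}\cdot\dot{\mathbf{p}} + \frac{\partial H}{\partial t} = \frac{\partial H}{\partial t}. \tag{2.10}$$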

Hence, if $\partial L/\partial t = 0$, it follows from equation (2.8) that the Hamiltonian is conserved along all dynamical trajectories. We can think of this as an extension of Noether’s theorem: the integral $H$ arises from the time-translation invariance of $L$.

For example, consider motion in the time-independent potential $V(\mathbf{x})$. If we work in Cartesian coordinates, the Lagrangian $L = \tfrac12 m\dot{\mathbf{x}}^2 - V(\mathbf{x})$ depends only on $\mathbf{x}$ and $\dot{\mathbf{x}}$, so $\partial L/\partial t = 0$. Hence the Hamiltonian $H$ is conserved. The physical quantity to which $H$ corresponds is easily found. We have $\mathbf{p} = \partial L/\partial \dot{\mathbf{x}} = m\dot{\mathbf{x}}$ and

$$H(\mathbf{x}, \mathbf{p}) = \mathbf{p}\cdot\dot{\mathbf{x}} - L = \frac{p^2}{2m} + V(\mathbf{x}), \tag{2.11}$$

which is simply the total energy $E = \text{k.e.} + \text{p.e.}$ Thus for motion in a fixed potential the Hamiltonian is equal to the total energy.

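Consider a harmonic oscillator: a particle of mass $m$ that oscillates at frequency $\omega$. The energy of this system is $\tfrac12 m\dot x^2 + \tfrac12 m\omega^2x^2$, so

$$H = \frac{p^2}{2m} + \tfrac12 m\omega^2x^2 \tag{2.12}$$

and Hamilton’s equations are

$$\dot p = -\frac{\partial H}{\partial x} = -m\omega^2 x\,; \qquad \dot x = \frac{\partial H}{\partial p} = \frac{p}{m}. \tag{2.13}$$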

We could solve these equations by differentiating the second equation w.r.t. $t$ and using the first equation to eliminate $p$, but let’s have a little quantum-mechanics inspired fun. Consider the variable $A = p + im\omega x$, where the $m\omega$ factor ensures that both terms have the same dimensions (notice that $AA^* = 2mH$). $A$’s equation of motion is

$$\dot A = \dot p + im\omega\dot x = -m\omega^2x + i\omega p = i\omega A. \tag{2.14}$$

Solving this trivial equation of motion yields

$$\begin{aligned} A_t &= p_t + im\omega x_t = e^{i\omega t}(p_0 + im\omega x_0),\\ A^*_t &= p_t - im\omega x_t = e^{-i\omega t}(p_0 - im\omega x_0). \end{aligned}$$

Adding and subtracting these equations, we obtain the complete solution:

$$p_t = p_0\cos(\omega t) - \omega m x_0\sin(\omega t)\,; \qquad x_t = \frac{p_0}{\omega m}\sin(\omega t) + x_0\cos(\omega t). \tag{2.15}$$
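As a quick check of (2.14) and (2.15), the sketch below propagates $A$ by multiplying by $e^{i\omega t}$, reads off $x_t$ and $p_t$ from its imaginary and real parts, and compares them with the closed form. The values of $m$, $\omega$, $x_0$ and $p_0$ are illustrative assumptions.

```python
import cmath
import math

m, omega = 2.0, 1.5          # illustrative values (assumptions)
x0, p0 = 0.4, -1.0           # illustrative initial conditions

def evolve_via_A(t):
    """Propagate A = p + i*m*omega*x using A(t) = exp(i*omega*t) * A(0)."""
    A = cmath.exp(1j * omega * t) * complex(p0, m * omega * x0)
    return A.imag / (m * omega), A.real          # (x_t, p_t)

def closed_form(t):
    """Equation (2.15)."""
    c, s = math.cos(omega * t), math.sin(omega * t)
    x_t = p0 / (omega * m) * s + x0 * c
    p_t = p0 * c - omega * m * x0 * s
    return x_t, p_t

def H(x, p):
    """Equation (2.12)."""
    return p * p / (2 * m) + 0.5 * m * omega**2 * x * x

for t in (0.0, 0.3, 2.0):
    xa, pa = evolve_via_A(t)
    xc, pc = closed_form(t)
    print(t, xa - xc, pa - pc, H(xa, pa))   # differences ~0, H unchanged
```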

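What are $\mathbf{p}$ and $H$ in a rotating frame? From (2.1) and (1.13) we have

$$\mathbf{p} = m(\dot{\mathbf{r}} + \boldsymbol{\omega}\times\mathbf{r}), \tag{2.16}$$

which shows that $\mathbf{p}$ isn’t always the same as $m\dot{\mathbf{q}}$. In fact, here $\mathbf{p}$ is identical with mass times velocity in the underlying inertial frame.

Using (2.16) to eliminate $\dot{\mathbf{r}}$ from (2.2) and (1.13) we find that the Hamiltonian for a rotating frame is

$$\begin{aligned} H &= \mathbf{p}\cdot\left(\frac{\mathbf{p}}{m} - \boldsymbol{\omega}\times\mathbf{r}\right) - \frac{p^2}{2m} + V\\ &= \frac{p^2}{2m} + V - \boldsymbol{\omega}\cdot(\mathbf{r}\times\mathbf{p}). \end{aligned} \tag{2.17}$$

The first two terms sum to the energy in an underlying inertial frame, and the last term is $-\boldsymbol{\omega}\cdot\mathbf{J}$, where $\mathbf{J} = \mathbf{r}\times\mathbf{p}$ is the angular momentum. Unless $V$ is axisymmetric [$V = V(|\boldsymbol{\omega}\times\mathbf{r}|)$], the energy in an inertial frame changes as $V$ does work on the particle, but $H$ is nonetheless constant.

Exercise (2): Show that in a rotating frame we may write $H = \tfrac12 m|\dot{\mathbf{r}}|^2 - \tfrac12 m|\boldsymbol{\omega}\times\mathbf{r}|^2 + V$. What is the physical interpretation of the second term on the r.h.s.?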

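From the Lagrangian (1.31) for non-relativistic motion in an e.m. field we find

$$\mathbf{p} = m\dot{\mathbf{x}} + Q\mathbf{A}. \tag{2.18}$$

Thus in an e.m. field $\mathbf{p}$ is not just $m\dot{\mathbf{x}}$. In Problem 6 of Set 2 you can explain this result by demonstrating that the e.m. field contributes $Q\mathbf{A}$ to $\mathbf{p}$. In quantum mechanics the distinction between $\mathbf{p}$ and $m\dot{\mathbf{x}}$ is of the utmost importance because it turns out that when one quantizes, it is $\mathbf{p}$ rather than $m\dot{\mathbf{x}}$ that should be replaced by $-i\hbar\nabla$.

Using (2.18) in (2.2) we find $H$ for motion in an e.m. field is

$$\begin{aligned} H &= (m\dot{\mathbf{x}} + Q\mathbf{A})\cdot\dot{\mathbf{x}} - \left[\tfrac12 m|\dot{\mathbf{x}}|^2 + Q(\dot{\mathbf{x}}\cdot\mathbf{A} - \phi)\right]\\ &= \tfrac12 m|\dot{\mathbf{x}}|^2 + Q\phi\\ &= \frac{1}{2m}|\mathbf{p} - Q\mathbf{A}|^2 + Q\phi. \end{aligned} \tag{2.19}$$

Although $H$ is just what one would naively think of as the energy, when expressed in terms of $\mathbf{p}$ it looks odd.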

2.1 Liouville’s theorem

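If we imagine releasing a bunch of dynamically identical systems from neighbouring initial conditions, then the ‘phase points’ describing these systems flow through phase space like a fluid. This flow is governed by Hamilton’s equations (2.9). It is an incompressible flow: the ‘velocity’ of the fluid is $(\dot{\mathbf{p}}, \dot{\mathbf{q}})$ and the divergence of this velocity is

$$\operatorname{div}(\dot{\mathbf{p}}, \dot{\mathbf{q}}) = \left(\frac{\partial}{\partial \mathbf{p}}\cdot\dot{\mathbf{p}} + \frac{\partial}{\partial \mathbf{q}}\cdot\dot{\mathbf{q}}\right) = \left(-\frac{\partial^2 H}{\partial \mathbf{p}\,\partial \mathbf{q}} + \frac{\partial^2 H}{\partial \mathbf{q}\,\partial \mathbf{p}}\right) = 0.$$

The divergence-freeness of the phase flow is known as Liouville’s theorem.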

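Let $f$ be the probability density of systems in phase space. Then conservation of probability requires that $f$ obey the continuity equation

$$\begin{aligned} 0 &= \frac{\partial f}{\partial t} + \operatorname{div}\big((\dot{\mathbf{p}}, \dot{\mathbf{q}})f\big)\\ &= \frac{\partial f}{\partial t} + \frac{\partial f}{\partial \mathbf{p}}\cdot\dot{\mathbf{p}} + \frac{\partial f}{\partial \mathbf{q}}\cdot\dot{\mathbf{q}}\\ &= \frac{\partial f}{\partial t} - \frac{\partial f}{\partial \mathbf{p}}\cdot\frac{\partial H}{\partial \mathbf{q}} + \frac{\partial f}{\partial \mathbf{q}}\cdot\frac{\partial H}{\partial \mathbf{p}}, \end{aligned} \tag{2.20}$$

where Liouville’s theorem has been used. The continuity equation of $f$ in either of the last two forms is known as Liouville’s equation.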
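Incompressibility of the phase flow means that the map from initial to final phase-space points has unit Jacobian determinant. The sketch below tests this for a hypothetical unit-mass pendulum, $H = p^2/2 - \cos q$, which is an assumption chosen for illustration. The flow is approximated with a fourth-order Runge-Kutta integrator and the Jacobian with central differences, so the result is only approximately one.

```python
import math

def rhs(q, p):
    """Hamilton's equations for H = p**2/2 - cos(q) (hypothetical pendulum)."""
    return p, -math.sin(q)

def flow(q, p, T=2.0, n=1000):
    """Advance (q, p) through time T with classical fourth-order Runge-Kutta."""
    dt = T / n
    for _ in range(n):
        k1q, k1p = rhs(q, p)
        k2q, k2p = rhs(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
        k3q, k3p = rhs(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
        k4q, k4p = rhs(q + dt * k3q, p + dt * k3p)
        q += dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
        p += dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return q, p

def jacobian_det(q, p, h=1e-5):
    """Determinant of d(q_T, p_T)/d(q_0, p_0), estimated by central differences."""
    qa, pa = flow(q + h, p)
    qb, pb = flow(q - h, p)
    qc, pc = flow(q, p + h)
    qd, pd = flow(q, p - h)
    dq_dq, dp_dq = (qa - qb) / (2 * h), (pa - pb) / (2 * h)
    dq_dp, dp_dp = (qc - qd) / (2 * h), (pc - pd) / (2 * h)
    return dq_dq * dp_dp - dq_dp * dp_dq

det = jacobian_det(1.0, 0.5)
print(det)      # should be very close to 1
```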

2.2 The Hamiltonian principle of least action

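The principle of least action

$$0 = \delta S = \delta\int_{t_1}^{t_2} dt\, L(\mathbf{q}, \dot{\mathbf{q}}) \tag{2.21}$$

is concerned with paths $\mathbf{q}(t)$ through coordinate space. We can derive classical mechanics from another, closely related, variational principle which involves paths $\big(\mathbf{p}(t), \mathbf{q}(t)\big)$ through phase space rather than coordinate space. This principle is that the path actually followed between $(t_i, \mathbf{q}_i)$ and $(t_f, \mathbf{q}_f)$ is that for which

$$\delta S = 0 \quad\text{where}\quad S \equiv \int \big(\mathbf{p}\cdot d\mathbf{q} - H(\mathbf{p}, \mathbf{q})\,dt\big). \tag{2.22}$$

Here the path of integration runs between $(t_i, \mathbf{q}_i)$ and $(t_f, \mathbf{q}_f)$; neither $\mathbf{p}(t_i)$ nor $\mathbf{p}(t_f)$ is constrained. Showing that this principle yields Hamilton’s equations (2.9) is easy:

$$\begin{aligned} \delta S &= \int\left(\delta\mathbf{p}\cdot\dot{\mathbf{q}} + \mathbf{p}\cdot\delta\dot{\mathbf{q}} - \frac{\partial H}{\partial \mathbf{p}}\cdot\delta\mathbf{p} - \frac{\partial H}{\partial \mathbf{q}}\cdot\delta\mathbf{q}\right)dt\\ &= \int\left[\left(\dot{\mathbf{q}} - \frac{\partial H}{\partial \mathbf{p}}\right)\cdot\delta\mathbf{p} - \left(\dot{\mathbf{p}} + \frac{\partial H}{\partial \mathbf{q}}\right)\cdot\delta\mathbf{q}\right]dt + \big[\mathbf{p}\cdot\delta\mathbf{q}\big]_{t_i}^{t_f}. \end{aligned} \tag{2.23}$$

Since $\delta\mathbf{q}$ vanishes at $t_i$ and $t_f$ by hypothesis, the final term in (2.23) vanishes. Then, with $\delta\mathbf{p}$ and $\delta\mathbf{q}$ subject to arbitrary variation, it is clear that $\delta S = 0$ only if the contents of the pairs of large round brackets in (2.23) vanish. But the vanishing of these brackets is precisely the content of Hamilton’s equations.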

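Notice that a very remarkable thing is being done with the variational principle (2.22): we are treating $\mathbf{p}$ as quite independent of the value of $\dot{\mathbf{q}}$ along the path. This makes perfectly good sense from the point of view of phase-space geometry, but it makes a mockery of our original definition (2.1) of $\mathbf{p}$. This definition is recovered for the true path as a consequence of the variational principle (2.22):

$$\dot{\mathbf{q}} = \frac{\partial H}{\partial \mathbf{p}} = \frac{\partial}{\partial \mathbf{p}}(\mathbf{p}\cdot\dot{\mathbf{q}} - L) = \dot{\mathbf{q}} + \left(\mathbf{p} - \frac{\partial L}{\partial \dot{\mathbf{q}}}\right)\cdot\frac{\partial \dot{\mathbf{q}}}{\partial \mathbf{p}}. \tag{2.24}$$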


Recall that we introduced $H$ as $\mathbf{p}\cdot\dot{\mathbf{q}} - L$, with $\dot{\mathbf{q}}$ eliminated in favour of $\mathbf{p}$. Now that we are treating $\mathbf{p}$ as independent of $\dot{\mathbf{q}}$, $\mathbf{p}\cdot\dot{\mathbf{q}} - H$ becomes a quantity different from $L$; indeed, $L$ depends only on the projection of a phase-space path $\big(\mathbf{p}(t), \mathbf{q}(t)\big)$ onto configuration space, while $\mathbf{p}\cdot\dot{\mathbf{q}} - H$ depends on $\mathbf{p}(t)$ as well as $\mathbf{q}(t)$. Thus the action principle (2.22) is entirely different from (2.21), although the extremal values of the two integrals are the same because along the extremal path $\mathbf{p} = \partial L/\partial \dot{\mathbf{q}}$.

In Appendix III (2.22) is derived from the Schrödinger equation. The basic idea is simple: from the Schrödinger equation we calculate the quantum amplitude to get from $(t_i, \mathbf{q}_i)$ to $(t_f, \mathbf{q}_f)$ and show that it can be expressed as a sum over all possible paths between these events of amplitudes proportional to $e^{iS/\hbar}$, where $S$ is defined by (2.22). Then we argue that the only paths which make a net contribution to the overall amplitude are those whose values of $S$ lie within $\sim\hbar$ of a stationary value, since the contributions of other paths are cancelled by oppositely signed contributions from neighbouring paths. Thus the overall amplitude is dominated by contributions from paths whose actions lie within $\sim\hbar$ of that of the classical, extremizing, path, and from a macroscopic point of view these paths are identical with the classical path.

2.3 Poisson brackets and canonical coordinates

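Let $A(\mathbf{q}, \mathbf{p})$ and $B(\mathbf{q}, \mathbf{p})$ be any two functions of the phase-space coordinates. Then the Poisson bracket $[A, B]$ is defined by

$$[A, B] \equiv \frac{\partial A}{\partial \mathbf{q}}\cdot\frac{\partial B}{\partial \mathbf{p}} - \frac{\partial A}{\partial \mathbf{p}}\cdot\frac{\partial B}{\partial \mathbf{q}}. \tag{2.25}$$

It is straightforward to verify the following properties of Poisson brackets:

(i) $[A, B] = -[B, A]$ and $[A + B, C] = [A, C] + [B, C]$,

(ii) $[[A, B], C] + [[B, C], A] + [[C, A], B] = 0$ (Jacobi identity),

(iii) The coordinates $(\mathbf{q}, \mathbf{p})$ satisfy the canonical commutation relations

$$[p_i, p_j] = [q_i, q_j] = 0 \quad\text{and}\quad [q_i, p_j] = \delta_{ij}. \tag{2.26}$$

(iv) Hamilton’s equations may be written

$$\dot q_i = [q_i, H]\,; \qquad \dot p_i = [p_i, H]. \tag{2.27}$$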

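If we write ($w_i \equiv q_i$, $w_{N+i} \equiv p_i$, $i = 1, \ldots, N$), and define the symplectic matrix $c$ by

$$c_{\alpha\beta} \equiv [w_\alpha, w_\beta] = \begin{cases} \pm 1 & \text{for } \beta = \alpha \pm N,\ 1 \le \alpha, \beta \le 2N;\\ 0 & \text{otherwise,} \end{cases} \tag{2.28a}$$

we have

$$[A, B] = \sum_{\alpha,\beta=1}^{2N} c_{\alpha\beta}\frac{\partial A}{\partial w_\alpha}\frac{\partial B}{\partial w_\beta}. \tag{2.28b}$$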

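Any set of $2N$ phase-space coordinates $W_\alpha$ ($\alpha = 1, \ldots, 2N$) is called a set of canonical coordinates if $[W_\alpha, W_\beta] = c_{\alpha\beta}$. Let $W_\alpha$ be such a set; then with equation (2.28b) and the chain rule we have

$$\begin{aligned} [A, B] &= \sum_{\alpha,\beta=1}^{2N} c_{\alpha\beta}\frac{\partial A}{\partial w_\alpha}\frac{\partial B}{\partial w_\beta} = \sum_{\kappa\lambda}\left(\sum_{\alpha\beta} c_{\alpha\beta}\frac{\partial W_\kappa}{\partial w_\alpha}\frac{\partial W_\lambda}{\partial w_\beta}\right)\frac{\partial A}{\partial W_\kappa}\frac{\partial B}{\partial W_\lambda}\\ &= \sum_{\kappa\lambda}[W_\kappa, W_\lambda]\frac{\partial A}{\partial W_\kappa}\frac{\partial B}{\partial W_\lambda} = \sum_{\kappa\lambda} c_{\kappa\lambda}\frac{\partial A}{\partial W_\kappa}\frac{\partial B}{\partial W_\lambda}. \end{aligned} \tag{2.29}$$

Thus the derivatives involved in the definition (2.25) of the Poisson bracket can be taken with respect to any set of canonical coordinates, just as the vector formula $\nabla\cdot\mathbf{a} = \sum_i(\partial a_i/\partial x_i)$ is valid in any Cartesian coordinate system.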
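The definition (2.25) and the test for canonical coordinates are easy to explore numerically for one degree of freedom. The sketch below assumes a harmonic-oscillator Hamiltonian with illustrative $m$ and $\omega$, and tries the candidate pair $Q = \arctan(m\omega q/p)$, $P = H/\omega$ (a choice made here for illustration, not taken from the notes). Partial derivatives are estimated by central differences.

```python
import math

def poisson(A, B, q, p, h=1e-5):
    """[A, B] = dA/dq dB/dp - dA/dp dB/dq for one degree of freedom,
    with the partial derivatives estimated by central differences."""
    dAq = (A(q + h, p) - A(q - h, p)) / (2 * h)
    dAp = (A(q, p + h) - A(q, p - h)) / (2 * h)
    dBq = (B(q + h, p) - B(q - h, p)) / (2 * h)
    dBp = (B(q, p + h) - B(q, p - h)) / (2 * h)
    return dAq * dBp - dAp * dBq

m, omega = 1.3, 0.8                          # illustrative values (assumptions)

def H(q, p):
    return p * p / (2 * m) + 0.5 * m * omega**2 * q * q

def Q(q, p):
    """Candidate new coordinate (an angle in the phase plane)."""
    return math.atan2(m * omega * q, p)

def P(q, p):
    """Candidate new momentum."""
    return H(q, p) / omega

coord = lambda q, p: q
mom = lambda q, p: p

q0, p0 = 0.7, 0.4
print(poisson(coord, mom, q0, p0))   # [q, p], expected 1
print(poisson(Q, P, q0, p0))         # close to 1 if (Q, P) is canonical
print(poisson(coord, H, q0, p0))     # equals p/m, the first of (2.27)
print(poisson(P, H, q0, p0))         # P is a function of H, so this vanishes
```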


Box 4: Lorentz invariance & Symplectic structure

inertial coordinates ↔ canonical coordinates

Lorentz transformations ↔ canonical transformations

$\eta_{\mu\nu}$ ↔ $c_{\alpha\beta}$

Lorentz invariant $|x|^2$ ↔ $\iint d\mathbf{p}\cdot d\mathbf{q}$ (Poincaré invariant)

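The rate of change of an arbitrary canonical coordinate $W_\alpha$ along an orbit is

$$\dot W_\alpha = \sum_{\beta=1}^{2N}\frac{\partial W_\alpha}{\partial w_\beta}\dot w_\beta, \tag{2.30}$$

where, as usual, $\mathbf{w} \equiv (\mathbf{q}, \mathbf{p})$. With Hamilton’s equations (2.27) and equation (2.29) this becomes

$$\dot W_\alpha = \sum_{\beta=1}^{2N}\frac{\partial W_\alpha}{\partial w_\beta}[w_\beta, H] = \sum_{\beta\gamma\delta}\frac{\partial W_\alpha}{\partial w_\beta}c_{\gamma\delta}\frac{\partial w_\beta}{\partial w_\gamma}\frac{\partial H}{\partial w_\delta} = \sum_{\gamma\delta}c_{\gamma\delta}\frac{\partial W_\alpha}{\partial w_\gamma}\frac{\partial H}{\partial w_\delta} = [W_\alpha, H]. \tag{2.31}$$

Choosing to use the $W_i$ as independent coordinates when evaluating the Poisson bracket, we find that $\dot Q_i = \partial H/\partial P_i$, $\dot P_i = -\partial H/\partial Q_i$, so Hamilton’s equations (2.9) are valid in any canonical coordinate system.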

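Poisson brackets allow us to associate a one-parameter family of maps $\mathcal{B}_b$ of phase space onto itself with any function $B(\mathbf{q}, \mathbf{p})$ on phase space: from each point $(\mathbf{q}_0, \mathbf{p}_0)$ of some $(2N-1)$-dimensional surface in phase space we integrate the coupled ordinary differential equations

$$\frac{d\mathbf{q}}{db} = [\mathbf{q}, B] = \frac{\partial B}{\partial \mathbf{p}}, \qquad \frac{d\mathbf{p}}{db} = [\mathbf{p}, B] = -\frac{\partial B}{\partial \mathbf{q}} \tag{2.32}$$

from the initial conditions $\mathbf{q}(0) = \mathbf{q}_0$, $\mathbf{p}(0) = \mathbf{p}_0$. If the initial $(2N-1)$-surface is large enough, the integral curves $\mathbf{q}(b), \mathbf{p}(b)$ of $B$ reach every point of phase space. Then the map $\mathcal{B}_b$ is defined by

$$\mathcal{B}_b\big(\mathbf{q}(b'), \mathbf{p}(b')\big) = \big(\mathbf{q}(b + b'), \mathbf{p}(b + b')\big). \tag{2.33}$$

The generator of the transformation, $B(\mathbf{q}, \mathbf{p})$, is indistinguishable from a Hamiltonian, since it satisfies Hamilton’s equations (2.32), with $b$ playing the role of the time $t$.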

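In Lagrangian mechanics, invariance of the Lagrangian under a flow in configuration space gives rise to a conserved quantity (Noether’s thm). In Hamiltonian mechanics the analogue of a flow that doesn’t change the Lagrangian is a map $\mathcal{B}_b$ that doesn’t change the value of $H$. For $\mathcal{B}_b$ to have this property, we must have

$$0 = \frac{dH}{db} = \frac{\partial H}{\partial \mathbf{q}}\cdot\frac{d\mathbf{q}}{db} + \frac{\partial H}{\partial \mathbf{p}}\cdot\frac{d\mathbf{p}}{db} = \frac{\partial H}{\partial \mathbf{q}}\cdot\frac{\partial B}{\partial \mathbf{p}} - \frac{\partial H}{\partial \mathbf{p}}\cdot\frac{\partial B}{\partial \mathbf{q}} = [H, B]. \tag{2.34}$$

That is, the generator of a phase-space flow that leaves $H$ invariant has vanishing Poisson bracket with $H$ (“commutes with $H$”).

The rate of change of $B$ along our system’s trajectory is

$$\frac{dB}{dt} = \frac{\partial B}{\partial \mathbf{q}}\cdot\dot{\mathbf{q}} + \frac{\partial B}{\partial \mathbf{p}}\cdot\dot{\mathbf{p}} = \frac{\partial B}{\partial \mathbf{q}}\cdot\frac{\partial H}{\partial \mathbf{p}} - \frac{\partial B}{\partial \mathbf{p}}\cdot\frac{\partial H}{\partial \mathbf{q}} = [B, H]. \tag{2.35}$$

Thus $\dot B = 0$ if and only if $H$ is invariant under the flow that $B$ generates. This Hamiltonian formulation of the connection between constants of motion and invariance under flows goes further than Noether’s theorem because it shows that every constant of motion is associated with a flow that leaves $H$ invariant.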
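A familiar instance of a generator and its flow is the angular momentum $B = xp_y - yp_x$, whose flow (2.32) rotates both position and momentum through the angle $b$. The sketch below integrates (2.32) for this $B$ with a Runge-Kutta scheme and checks that a hypothetical central-force Hamiltonian (unit mass, $V = -1/r$, chosen here purely for illustration) is unchanged along the flow.

```python
import math

def rhs(s):
    """Right-hand side of (2.32) for B = x*py - y*px."""
    x, y, px, py = s
    # dq/db = dB/dp = (-y, x);  dp/db = -dB/dq = (-py, px)
    return (-y, x, -py, px)

def flow_Lz(state, b, n=2000):
    """Integrate the flow generated by B through parameter distance b (RK4)."""
    db = b / n
    s = tuple(state)
    for _ in range(n):
        k1 = rhs(s)
        k2 = rhs(tuple(u + 0.5 * db * v for u, v in zip(s, k1)))
        k3 = rhs(tuple(u + 0.5 * db * v for u, v in zip(s, k2)))
        k4 = rhs(tuple(u + db * v for u, v in zip(s, k3)))
        s = tuple(u + db * (v1 + 2 * v2 + 2 * v3 + v4) / 6
                  for u, v1, v2, v3, v4 in zip(s, k1, k2, k3, k4))
    return s

def H(s):
    """Hypothetical central-force Hamiltonian (unit mass, V = -1/r)."""
    x, y, px, py = s
    return 0.5 * (px * px + py * py) - 1.0 / math.hypot(x, y)

start = (1.0, 0.2, -0.3, 0.9)      # illustrative phase-space point
angle = 0.75
end = flow_Lz(start, angle)

print(end[:2])                     # position rotated through `angle`
print(H(start), H(end))            # H is unchanged by the flow
```

Since $H$ is invariant under this flow, (2.35) says that $B$ is conserved along orbits of $H$, which is the usual conservation of angular momentum in a central potential.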


2.4 Canonical transformations

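Suppose you have a function $S(\mathbf{P}, \mathbf{q})$ of some new variables $P_i$, $i = 1, \ldots, N$ and the regular coordinates $q_i$ such that the equation

$$\mathbf{p} = \frac{\partial S}{\partial \mathbf{q}} \tag{2.36a}$$

can be interpreted as defining $\mathbf{P}(\mathbf{p}, \mathbf{q})$. Then it turns out that the coordinates $(\mathbf{P}, \mathbf{Q})$ are canonical, where

$$\mathbf{Q} \equiv \frac{\partial S}{\partial \mathbf{P}}. \tag{2.36b}$$

That is, one may show (see Appendix II) that with these definitions, $[Q_i, Q_j] = 0$, $[Q_i, P_j] = \delta_{ij}$, $[P_i, P_j] = 0$. The transformation $(\mathbf{p}, \mathbf{q}) \to (\mathbf{P}, \mathbf{Q})$ is called a canonical transformation and $S$ the generating function of the transformation.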

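The function that generates a canonical transformation need not be of the form $S(\mathbf{P}, \mathbf{q})$; other forms are $S(\mathbf{P}, \mathbf{p})$, $S(\mathbf{Q}, \mathbf{q})$ and $S(\mathbf{Q}, \mathbf{p})$. The generating function is always a function of one old coordinate and one new one, and the signs in the defining relations depend on which pair is used. An entertaining transformation is generated by $S = \mathbf{Q}\cdot\mathbf{q}$, which exchanges the roles of coordinates and momenta:

$$\mathbf{p} = \frac{\partial S}{\partial \mathbf{q}} = \mathbf{Q}\,; \qquad \mathbf{P} = -\frac{\partial S}{\partial \mathbf{Q}} = -\mathbf{q}. \tag{2.37}$$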

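Canonical transformations are closely connected to the one-parameter maps introduced above. To see this consider functions $S$ of the form

$$S = \mathbf{P}\cdot\mathbf{q} + s(\mathbf{P}, \mathbf{q})\,\delta u, \tag{2.38}$$

where $\delta u \ll 1$. For $S$ of this form we have

$$\mathbf{Q} = \mathbf{q} + \frac{\partial s}{\partial \mathbf{P}}\delta u\,; \qquad \mathbf{p} = \mathbf{P} + \frac{\partial s}{\partial \mathbf{q}}\delta u \;\Rightarrow\; \mathbf{P} = \mathbf{p} - \frac{\partial s}{\partial \mathbf{q}}\delta u. \tag{2.39}$$

Thus $S = \mathbf{P}\cdot\mathbf{q}$ generates the identity transformation $\mathbf{P} = \mathbf{p}$, $\mathbf{Q} = \mathbf{q}$. Moreover,

$$\frac{\mathbf{Q} - \mathbf{q}}{\delta u} = \frac{\partial s}{\partial \mathbf{P}}\,; \qquad \frac{\mathbf{P} - \mathbf{p}}{\delta u} = -\frac{\partial s}{\partial \mathbf{q}}. \tag{2.40}$$

In the limit $\delta u \to 0$ we can identify $\mathbf{P}$ with $\mathbf{p}$ on the right, and these equations become

$$\frac{d\mathbf{q}}{du} = [\mathbf{q}, s]\,; \qquad \frac{d\mathbf{p}}{du} = [\mathbf{p}, s], \tag{2.41}$$

which is identical with (2.32). Thus canonical transformations generated by functions of the form (2.38) may be thought of as infinitesimal canonical maps.

There is no fundamental difference between a map and a coordinate transformation: every map generates a coordinate transformation and every transformation a map since one can treat changed coordinates as new numbers describing an old point (a coordinate change), or as old numbers describing a new point (a mapping).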


2.5 Point transformations

If ($Q_i(\mathbf{q})$, $i = 1, \ldots, N$) are any $N$ independent functions of the generalized coordinates $\mathbf{q}$, then by equation (2.1) we obtain the new momenta $P_i = (\partial L/\partial \dot Q_i)$ by expressing the Lagrangian as a function $L(\mathbf{Q}, \dot{\mathbf{Q}})$ of the $Q_i$ and their time derivatives. The coordinate change $(\mathbf{q}, \mathbf{p}) \to (\mathbf{Q}, \mathbf{P})$ is called a point transformation, because the new coordinates are functions only of the old. It is straightforward to show that the new coordinates are canonical, by evaluating their Poisson brackets.

The importance of these results is that it is often convenient to work in curvilinear coordinates $\mathbf{Q}$ and derive the corresponding momenta $\mathbf{P} = (\partial L/\partial \dot{\mathbf{Q}})$. Since the coordinates $(\mathbf{Q}, \mathbf{P})$ are canonical, the Poisson bracket (2.25) can be equally well evaluated by taking derivatives with respect to $\mathbf{Q}$ and $\mathbf{P}$ as with respect to $\mathbf{q}$ and $\mathbf{p}$. Hence all curvilinear coordinates have equal status in Hamiltonian mechanics.

Example 2.2

A particle of mass $m$ and charge $Q_1$ moves in a bound orbit around a fixed charge $Q_2$ in the plane perpendicular to a constant magnetic field $B$. Determine the system’s Hamiltonian in polar coordinates $(r, \theta)$ on the orbital plane. Hence show that $mr^2\dot\theta + \tfrac12 Q_1r^2B$ is constant on the orbit.

Solution: The vector potential can be written $\mathbf{A} = \tfrac12 rB\,\mathbf{e}_\theta$. From (1.31) the Lagrangian is

$$L = \tfrac12 m(\dot r^2 + r^2\dot\theta^2) + Q_1\left(r\dot\theta\,\tfrac12 rB - \frac{Q_2}{4\pi\varepsilon_0 r}\right), \tag{2.42}$$

so the momenta are

$$p_r = m\dot r\,; \qquad p_\theta = mr^2\dot\theta + \tfrac12 Q_1r^2B. \tag{2.43}$$

Finally, the Hamiltonian is

$$H(p_r, p_\theta, r, \theta) = \frac{p_r^2}{2m} + \frac{(p_\theta - \tfrac12 Q_1Br^2)^2}{2mr^2} + \frac{Q_1Q_2}{4\pi\varepsilon_0 r}.$$

The constancy of $p_\theta$ follows because $H$ is independent of $\theta$. Notice that (2.43) is not simply the translation into polar coordinates of equation (2.19), which gives $H$ in Cartesian coordinates: when translating $H$ from one coordinate system to another one must pass through the Lagrangian.
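The conservation law of Example 2.2 can be tested by brute force. The sketch below integrates the Cartesian equations of motion (Lorentz force from a field along $z$ plus the Coulomb force) with a Runge-Kutta scheme and monitors $p_\theta$ written in Cartesian variables. All numerical values are illustrative assumptions, and the constant `k` stands for $Q_1Q_2/4\pi\varepsilon_0$, taken negative so that the orbit is bound.

```python
import math

m, Q1, B = 1.0, 1.0, 0.5   # illustrative values (assumptions)
k = -1.0                   # stands for Q1*Q2/(4*pi*eps0); negative means attraction

def deriv(s):
    """Time derivative of (x, y, vx, vy) under the Lorentz and Coulomb forces."""
    x, y, vx, vy = s
    r3 = math.hypot(x, y) ** 3
    ax = (Q1 * B * vy + k * x / r3) / m
    ay = (-Q1 * B * vx + k * y / r3) / m
    return (vx, vy, ax, ay)

def rk4_step(s, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = deriv(s)
    k2 = deriv(tuple(a + 0.5 * dt * b for a, b in zip(s, k1)))
    k3 = deriv(tuple(a + 0.5 * dt * b for a, b in zip(s, k2)))
    k4 = deriv(tuple(a + dt * b for a, b in zip(s, k3)))
    return tuple(a + dt * (b + 2 * c + 2 * d + e) / 6
                 for a, b, c, d, e in zip(s, k1, k2, k3, k4))

def p_theta(s):
    """m r^2 theta_dot + Q1 B r^2 / 2, written in Cartesian variables."""
    x, y, vx, vy = s
    return m * (x * vy - y * vx) + 0.5 * Q1 * B * (x * x + y * y)

state = (1.0, 0.0, 0.0, 1.1)       # start on the x axis, moving in +y
p_start = p_theta(state)
for _ in range(5000):
    state = rk4_step(state, 1e-3)
p_end = p_theta(state)

print(p_start, p_end)              # the two values should agree closely
```

Note that the mechanical angular momentum $m(x\dot y - y\dot x)$ alone is not conserved here; only the combination in (2.43) is.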

2.6 Hamilton-Jacobi Equation

Suppose we could find $N$ constants of motion $I_1, \ldots, I_N$. And suppose it were possible to find a system of canonical coordinates $(\mathbf{P}, \mathbf{Q})$ such that $P_i = I_i$ etc. Then the equations of motion for the $P$’s would be trivial,

$$0 = \dot P_i = [P_i, H] = -\frac{\partial H}{\partial Q_i}, \tag{2.44}$$

and would demonstrate that $H(\mathbf{P})$ would be independent of the $Q$’s. This last observation would allow us to solve the equations of motion for the $Q$’s: we would have

$$\dot Q_i = \frac{\partial H}{\partial P_i} \equiv \omega_i, \text{ a constant} \quad\Rightarrow\quad Q_i(t) = Q_i(0) + \omega_i t. \tag{2.45}$$

So everything would lie at our feet if we could find $N$ constants of the motion and could embed these as the ‘momenta’ of a system of canonical coordinates.² The magic coordinates $\mathbf{P} \equiv \mathbf{I}$ and $\mathbf{Q}$ are called action-angle coordinates, the $I$’s being the actions and the $Q$’s the angles.

² Notice that to be able to embed the $I$’s as a set of momenta, we require $[I_i, I_j] = 0$; functions satisfying this condition are said to be ‘in involution’.


Let $S(\mathbf{I}, \mathbf{q})$ be the generating function of the transformation between regular coordinates $(\mathbf{p}, \mathbf{q})$ and action-angle coordinates. Then we can use this to eliminate $\mathbf{p} = \partial S/\partial \mathbf{q}$ from $H$, expressing $H$ as a function of $(\mathbf{I}, \mathbf{q})$:

$$H(\mathbf{I}, \mathbf{q}) \equiv H\left(\frac{\partial S}{\partial \mathbf{q}}, \mathbf{q}\right). \tag{2.46}$$

By moving on an orbit we can vary the $q_i$ pretty much at will while holding constant the $I_i$. As we vary the $q_i$ in this way $H$ must remain constant at the energy $E$ of the orbit in question. This suggests that we investigate the non-linear partial differential equation

$$H\left(\frac{\partial S}{\partial \mathbf{q}}, \mathbf{q}\right) = E \quad\text{(Hamilton-Jacobi equation)}. \tag{2.47}$$

If we can solve this equation, we identify the arbitrary constants on which the solution $S(\mathbf{q})$ depends with functions of the constants of motion $I_i$. For example, the H-J eqn for a free particle moving in two dimensions is

$$\frac{|\nabla S|^2}{2m} = E. \tag{2.48}$$

We write $S(\mathbf{x}) = S_x(x) + S_y(y)$ and solve (2.48) by separation of variables:

$$\text{constant} \equiv I_x = \left(\frac{\partial S}{\partial x}\right)^2 = 2mE - \left(\frac{\partial S}{\partial y}\right)^2 \equiv I_y. \tag{2.49}$$

This example is very tame, but the technique works also for more complicated Hamiltonians that cannot be solved by other means.
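As a sketch of how the separated solution can be completed (assuming the positive square roots, a choice made here only for illustration), integrating the two ordinary differential equations implied by (2.49) gives

$$S(x, y) = \sqrt{I_x}\,x + \sqrt{2mE - I_x}\,y, \qquad p_x = \frac{\partial S}{\partial x} = \sqrt{I_x}, \qquad p_y = \frac{\partial S}{\partial y} = \sqrt{2mE - I_x},$$

so that $|\nabla S|^2/2m = (p_x^2 + p_y^2)/2m = E$ identically, and the two components of momentum are constants, as they must be for a free particle.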

The similarity between the H-J eqn and the time-independent Schrödinger eqn is obvious. We can derive the H-J eqn from QM as follows. For simplicity we consider the special case of a particle that moves in a potential $V(\mathbf{x})$. If the particle has well-defined energy, its wavefunction $\psi(\mathbf{x})$ must satisfy the time-independent Schrödinger eqn $E\psi = H\psi = (p^2/2m + V)\psi$. Without loss of generality, we can write $\psi = e^{iS/\hbar}$, where $S(\mathbf{x})$ is a possibly complex function of $\mathbf{x}$. Then

$$p^2\psi = -\hbar^2\nabla\cdot\left(e^{iS/\hbar}\frac{i\nabla S}{\hbar}\right) = e^{iS/\hbar}\left(|\nabla S|^2 - i\hbar\nabla^2S\right). \tag{2.50}$$

Since we are dealing with classical mechanics, we are interested in the limit $\hbar \to 0$. Then the second term in the bracket vanishes and the TISE becomes

$$0 = \frac{p^2\psi}{2m} + V\psi - E\psi = e^{iS/\hbar}\left(\frac{|\nabla S|^2}{2m} + V - E\right), \tag{2.51}$$

which is just $e^{iS/\hbar}$ times the H-J eqn. This derivation reveals that the generating function of the transformation from ordinary to action-angle coordinates is $\hbar$ times the phase of the particle’s wavefunction. When one passes from wave optics to geometrical optics, one neglects a term equivalent to that dropped from (2.50). Dropping this term is called making the eikonal approximation. The approximation is good when many wavelengths are contained within the smallest length within which $\nabla S$ changes appreciably.

2.7 Phase-space volumes

Often, for example when doing statistical mechanics, one needs a credible definition of ‘phase-space volume’. If one is using Cartesian coordinates to describe a system of $n$ particles of mass $m_i$, it is natural to take the volume element to be $d\tau = \prod_i^n (m_i^3\,d^3x_i\,d^3v_i)$. But it isn’t immediately obvious what to use for $d\tau$ in a more complex case. In particular, if one decided to describe the system of particles by some curvilinear coordinates $\mathbf{q}(\mathbf{x})$ and their conjugate momenta $\mathbf{p}$, one would expect $d\tau$ to be of the form

$$d\tau = \prod_{i=1}^{n}\left(\frac{\partial(m_i\mathbf{v}_i, \mathbf{x}_i)}{\partial(\mathbf{p}_i, \mathbf{q}_i)}\,d^3p_i\,d^3q_i\right). \tag{2.52}$$

One of the most beautiful and useful results in the subject is that the Jacobian here is just one. In fact, the Jacobian between any pair of canonical coordinates is always one. That is, the volume of an arbitrary region is

$$V = \iint_V d^Np\,d^Nq = \iint_V d^NP\,d^NQ, \tag{2.53}$$

where $(\mathbf{p}, \mathbf{q})$ and $(\mathbf{P}, \mathbf{Q})$ are any canonical coordinates.

Appendix I Derivation of equation (1.25)

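Since the particle coordinates $\mathbf{r}_i$ are functions of the six generalized coordinates $q_k$, we have that

$$\dot{\mathbf{r}}_i = \sum_{k=1}^{6}\frac{\partial \mathbf{r}_i}{\partial q_k}\dot q_k \quad\Rightarrow\quad \ddot{\mathbf{r}}_i = \sum_{k,l=1}^{6}\frac{\partial^2\mathbf{r}_i}{\partial q_l\partial q_k}\dot q_l\dot q_k + \sum_{k=1}^{6}\frac{\partial \mathbf{r}_i}{\partial q_k}\ddot q_k, \tag{I.1}$$

so (1.24) can be written

$$0 = \sum_{i=1}^{N} m_i\left(\sum_{k,l=1}^{6}\frac{\partial^2\mathbf{r}_i}{\partial q_l\partial q_k}\dot q_l\dot q_k + \sum_{k=1}^{6}\frac{\partial \mathbf{r}_i}{\partial q_k}\ddot q_k\right)\cdot\frac{\partial \mathbf{r}_i}{\partial q_j} - Q_j. \tag{I.2}$$

By the chain rule the body’s k.e. is

$$T = \tfrac12\sum_{i=1}^{N} m_i\left|\sum_{k=1}^{6}\frac{\partial \mathbf{r}_i}{\partial q_k}\dot q_k\right|^2, \tag{I.3}$$

so

$$\frac{\partial T}{\partial \dot q_j} = \sum_i m_i\left(\sum_{k=1}^{6}\frac{\partial \mathbf{r}_i}{\partial q_k}\dot q_k\right)\cdot\frac{\partial \mathbf{r}_i}{\partial q_j} \tag{I.4}$$

and

$$\frac{d}{dt}\left(\frac{\partial T}{\partial \dot q_j}\right) = \sum_{i=1}^{N} m_i\left[\left(\sum_{kl}\frac{\partial^2\mathbf{r}_i}{\partial q_l\partial q_k}\dot q_l\dot q_k + \sum_k\frac{\partial \mathbf{r}_i}{\partial q_k}\ddot q_k\right)\cdot\frac{\partial \mathbf{r}_i}{\partial q_j} + \left(\sum_k\frac{\partial \mathbf{r}_i}{\partial q_k}\dot q_k\right)\cdot\left(\sum_l\frac{\partial^2\mathbf{r}_i}{\partial q_l\partial q_j}\dot q_l\right)\right]. \tag{I.5}$$

This expression for $(d/dt)(\partial T/\partial \dot q_j)$ contains two of the terms that appear in equation (I.2). Its last term is unwanted. We can obtain an alternative expression for this unwanted term by calculating

$$\frac{\partial T}{\partial q_j} = \sum_{i=1}^{N} m_i\left(\sum_k\frac{\partial \mathbf{r}_i}{\partial q_k}\dot q_k\right)\cdot\left(\sum_l\frac{\partial^2\mathbf{r}_i}{\partial q_j\partial q_l}\dot q_l\right). \tag{I.6}$$

Substituting (I.6) into (I.5) and then using the result to simplify (I.2) we obtain (1.25).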

Appendix II Proof that generating functions generate canonical transformations

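We prove that given $S(\mathbf{q}, \mathbf{P})$, $\mathbf{P}$ and $\mathbf{Q} \equiv \partial S/\partial \mathbf{P}$ satisfy the canonical commutation relations. From the chain rule we have that

$$\left(\frac{\partial}{\partial \mathbf{p}}\right)_{\mathbf{q}} = \left(\frac{\partial \mathbf{P}}{\partial \mathbf{p}}\right)_{\mathbf{q}}\cdot\left(\frac{\partial}{\partial \mathbf{P}}\right)_{\mathbf{q}}, \qquad \left(\frac{\partial}{\partial \mathbf{q}}\right)_{\mathbf{p}} = \left(\frac{\partial}{\partial \mathbf{q}}\right)_{\mathbf{P}} + \left(\frac{\partial \mathbf{P}}{\partial \mathbf{q}}\right)_{\mathbf{p}}\cdot\left(\frac{\partial}{\partial \mathbf{P}}\right)_{\mathbf{q}}. \tag{II.1}$$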


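Applying these formulae to $p_i$ and using $\partial p_i/\partial \mathbf{P} = \partial^2S/\partial q_i\partial \mathbf{P} = \partial \mathbf{Q}/\partial q_i$ yields

$$\delta_{ij} = \left(\frac{\partial \mathbf{P}}{\partial p_j}\right)_{\mathbf{q}}\cdot\left(\frac{\partial \mathbf{Q}}{\partial q_i}\right)_{\mathbf{P}}, \qquad -\left(\frac{\partial p_i}{\partial q_j}\right)_{\mathbf{P}} = \left(\frac{\partial \mathbf{P}}{\partial q_j}\right)_{\mathbf{p}}\cdot\left(\frac{\partial \mathbf{Q}}{\partial q_i}\right)_{\mathbf{P}}. \tag{II.2}$$

Multiplying these equations together and summing over $j$ we find

$$\sum_{kl}\left(\frac{\partial Q_k}{\partial q_i}\right)_{\mathbf{P}}\left(\frac{\partial Q_l}{\partial q_{i'}}\right)_{\mathbf{P}}[P_k, P_l] = -\left(\frac{\partial p_i}{\partial q_{i'}}\right)_{\mathbf{P}} + \left(\frac{\partial p_{i'}}{\partial q_i}\right)_{\mathbf{P}} = -\frac{\partial^2S}{\partial q_{i'}\partial q_i} + \frac{\partial^2S}{\partial q_i\partial q_{i'}} = 0. \tag{II.3}$$

Since the matrix $\partial Q_k/\partial q_i$ has an inverse by (II.2), this shows that $[P_k, P_l] = 0$.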

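Working again from equations (II.1) we have

$$\begin{aligned} [Q_i, P_j] &= \left(\frac{\partial Q_i}{\partial \mathbf{q}}\right)_{\mathbf{p}}\cdot\left(\frac{\partial P_j}{\partial \mathbf{p}}\right)_{\mathbf{q}} - \left(\frac{\partial Q_i}{\partial \mathbf{p}}\right)_{\mathbf{q}}\cdot\left(\frac{\partial P_j}{\partial \mathbf{q}}\right)_{\mathbf{p}}\\ &= \left[\left(\frac{\partial Q_i}{\partial \mathbf{q}}\right)_{\mathbf{P}} + \left(\frac{\partial Q_i}{\partial \mathbf{P}}\right)_{\mathbf{q}}\cdot\left(\frac{\partial \mathbf{P}}{\partial \mathbf{q}}\right)_{\mathbf{p}}\right]\cdot\left(\frac{\partial P_j}{\partial \mathbf{p}}\right)_{\mathbf{q}} - \left(\frac{\partial Q_i}{\partial \mathbf{P}}\right)_{\mathbf{q}}\cdot\left(\frac{\partial \mathbf{P}}{\partial \mathbf{p}}\right)_{\mathbf{q}}\cdot\left(\frac{\partial P_j}{\partial \mathbf{q}}\right)_{\mathbf{p}}\\ &= \left(\frac{\partial Q_i}{\partial \mathbf{q}}\right)_{\mathbf{P}}\cdot\left(\frac{\partial P_j}{\partial \mathbf{p}}\right)_{\mathbf{q}} + \left(\frac{\partial Q_i}{\partial \mathbf{P}}\right)_{\mathbf{q}}\cdot[\mathbf{P}, P_j]\\ &= \frac{\partial^2S}{\partial P_i\partial \mathbf{q}}\cdot\left(\frac{\partial P_j}{\partial \mathbf{p}}\right)_{\mathbf{q}} = \left(\frac{\partial \mathbf{p}}{\partial P_i}\right)_{\mathbf{q}}\cdot\left(\frac{\partial P_j}{\partial \mathbf{p}}\right)_{\mathbf{q}} = \delta_{ij}. \end{aligned} \tag{II.4}$$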

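Similarly,

$$\begin{aligned} [Q_i, Q_j] &= \left[\left(\frac{\partial Q_i}{\partial \mathbf{q}}\right)_{\mathbf{P}} + \left(\frac{\partial Q_i}{\partial \mathbf{P}}\right)_{\mathbf{q}}\cdot\left(\frac{\partial \mathbf{P}}{\partial \mathbf{q}}\right)_{\mathbf{p}}\right]\cdot\left(\frac{\partial Q_j}{\partial \mathbf{p}}\right)_{\mathbf{q}} - \left(\frac{\partial Q_i}{\partial \mathbf{P}}\right)_{\mathbf{q}}\cdot\left(\frac{\partial \mathbf{P}}{\partial \mathbf{p}}\right)_{\mathbf{q}}\cdot\left(\frac{\partial Q_j}{\partial \mathbf{q}}\right)_{\mathbf{p}}\\ &= \left(\frac{\partial Q_i}{\partial \mathbf{q}}\right)_{\mathbf{P}}\cdot\left(\frac{\partial Q_j}{\partial \mathbf{p}}\right)_{\mathbf{q}} + \left(\frac{\partial Q_i}{\partial \mathbf{P}}\right)_{\mathbf{q}}\cdot[\mathbf{P}, Q_j]\\ &= \left(\frac{\partial Q_i}{\partial \mathbf{q}}\right)_{\mathbf{P}}\cdot\left(\frac{\partial Q_j}{\partial \mathbf{p}}\right)_{\mathbf{q}} - \left(\frac{\partial Q_i}{\partial P_j}\right)_{\mathbf{q}}\\ &= \sum_k\frac{\partial^2S}{\partial P_i\partial q_k}\left(\frac{\partial Q_j}{\partial \mathbf{P}}\right)_{\mathbf{q}}\cdot\left(\frac{\partial \mathbf{P}}{\partial p_k}\right)_{\mathbf{q}} - \left(\frac{\partial Q_i}{\partial P_j}\right)_{\mathbf{q}}. \end{aligned}$$

But $\left(\dfrac{\partial p_k}{\partial \mathbf{P}}\right)_{\mathbf{q}} = \dfrac{\partial^2S}{\partial q_k\partial \mathbf{P}}$, so

$$\begin{aligned} [Q_i, Q_j] &= \sum_{k,l}\left(\frac{\partial Q_j}{\partial P_l}\right)_{\mathbf{q}}\left(\frac{\partial P_l}{\partial p_k}\right)_{\mathbf{q}}\left(\frac{\partial p_k}{\partial P_i}\right)_{\mathbf{q}} - \left(\frac{\partial Q_i}{\partial P_j}\right)_{\mathbf{q}}\\ &= \frac{\partial Q_j}{\partial P_i} - \frac{\partial Q_i}{\partial P_j} = \frac{\partial^2S}{\partial P_i\partial P_j} - \frac{\partial^2S}{\partial P_j\partial P_i} = 0. \end{aligned} \tag{II.5}$$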


Appendix III Derivation of (2.22) from the Schrodinger equation

We start by finding the amplitude $A_{12}$ to get from $(t_1, \mathbf{q}_1)$ to $(t_2, \mathbf{q}_2)$, where the interval $t_2 - t_1$ is small. In Dirac’s notation, this amplitude is

$$A_{12} = \langle \mathbf{q}_2|\psi, t_2\rangle, \tag{III.1}$$

where $|\psi, t_2\rangle$ is the ket into which $|\mathbf{q}_1\rangle$ has evolved at $t_2$. In other words, $|\psi, t_2\rangle$ is the solution of the time-dependent Schrödinger equation (tdse) for initial condition $|\psi, t_1\rangle = |\mathbf{q}_1\rangle$. This is

$$|\psi, t_2\rangle = e^{-iH(t_2 - t_1)/\hbar}|\mathbf{q}_1\rangle. \tag{III.2}$$

Here the exponential is the operator with the same eigen-kets $|E_n\rangle$ as the Hamiltonian $H$, and eigenvalues equal to $e^{-iE_n(t_2 - t_1)/\hbar}$, where the $E_n$ are the eigen-values of $H$. That is,

$$e^{-iH(t_2 - t_1)/\hbar} \equiv \sum_n |E_n\rangle e^{-iE_n(t_2 - t_1)/\hbar}\langle E_n|. \tag{III.3}$$

(To prove that (III.2) satisfies the tdse, just substitute (III.3) into (III.2) and differentiate w.r.t. $t_2$.) Our amplitude can now be written

$$A_{12} = \langle \mathbf{q}_2|e^{-iH(t_2 - t_1)/\hbar}|\mathbf{q}_1\rangle = \int d^3p\,\langle \mathbf{q}_2|\mathbf{p}\rangle\langle \mathbf{p}|e^{-iH(t_2 - t_1)/\hbar}|\mathbf{q}_1\rangle, \tag{III.4}$$

where use has been made of the fact that $\int d^3p\,|\mathbf{p}\rangle\langle \mathbf{p}|$ is just the identity operator since the states $|\mathbf{p}\rangle$ of well-defined momentum form a complete set.

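$H$, and thus the function of it appearing in (III.4), is a function of the operators $\hat{\mathbf{p}}$ and $\hat{\mathbf{q}}$. Let's assume that every $\hat{\mathbf{p}}$ has been positioned to the left of every $\hat{\mathbf{q}}$. Then every $\hat{\mathbf{p}}$ can be considered to act to the left and be replaced by its eigen-value $\mathbf{p}$, while every $\hat{\mathbf{q}}$ acts similarly to the right. So the complex number $\langle \mathbf{p}|e^{-iH(t_2-t_1)/\hbar}|\mathbf{q}_1\rangle$ becomes simply

\[
e^{-iH(t_2-t_1)/\hbar}\,\langle \mathbf{p}|\mathbf{q}_1\rangle = e^{-iH(t_2-t_1)/\hbar}\, \frac{e^{-i\mathbf{p}\cdot\mathbf{q}_1/\hbar}}{(2\pi\hbar)^{3/2}}, \tag{III.5}
\]

where $H$ is now the classical Hamiltonian evaluated at the classical phase-space point $(\mathbf{p},\mathbf{q}_1)$ and we have used the fact that $\langle \mathbf{p}|\mathbf{q}_1\rangle$ is just the complex conjugate of the wave-function of a particle of well-defined momentum $\mathbf{p}$. When we insert (III.5) into (III.4) and similarly replace $\langle \mathbf{q}_2|\mathbf{p}\rangle$ by a plane wave, we find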

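\[
A_{12} = \frac{1}{h^3} \int d^3\mathbf{p}\, \exp\left[\frac{i}{\hbar}\Big(\mathbf{p}\cdot(\mathbf{q}_2 - \mathbf{q}_1) - H(t_2 - t_1)\Big)\right]. \tag{III.6}
\]

Here $h = 2\pi\hbar$, so the prefactor is the product of the two plane-wave normalizations $(2\pi\hbar)^{-3/2}$.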

Equation (III.6) for the amplitude to get from one event to another is only valid for infinitesimal $t_2 - t_1$. There are two issues: (i) $H$ may be time-dependent; (ii) for finite $\tau$ the operator $e^{-iH\tau/\hbar} = 1 - iH\tau/\hbar + \frac{1}{2!}(-iH\tau/\hbar)^2 + \cdots$ involves high powers of $H$, and so many reversals of the order of the operators $\hat{\mathbf{p}}$ and $\hat{\mathbf{q}}$ will be required to ensure that the $\hat{\mathbf{p}}$'s are to the left of all $\hat{\mathbf{q}}$'s. In view of these objections we use (III.6) only for small $t_2 - t_1$. Given two widely separated events $(t_i,\mathbf{q}_i)$ and $(t_f,\mathbf{q}_f)$, we express the amplitude to pass between them by a particular path $\mathbf{q}_i \to \mathbf{q}_1 \to \dots \to \mathbf{q}_f$ as the product

\[
A_{i1} A_{12} \times \cdots \times A_{m,f} \tag{III.7}
\]

of $m$ amplitudes of the form (III.6) over small intervals $(t_{j-1}, t_j)$. We then obtain the amplitude to pass between $(t_i,\mathbf{q}_i)$ and $(t_f,\mathbf{q}_f)$ by any path by summing (III.7) over all values of the intermediate positions $\mathbf{q}_j$. The final amplitude is

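\[
\begin{aligned}
A_{if} &= \lim_{m\to\infty} \frac{1}{h^{3m}} \int \prod_{j}^{m} \left(d^3\mathbf{p}_j\, d^3\mathbf{q}_j\right) \exp\left[\frac{i}{\hbar} \sum_{k}^{m} \Big(\mathbf{p}_k\cdot(\mathbf{q}_{k+1} - \mathbf{q}_k) - H(t_{k+1} - t_k)\Big)\right] \\
&= \text{constant} \times \int \mathcal{D}\mathbf{p}\,\mathcal{D}\mathbf{q}\, \exp\left[\frac{i}{\hbar} \int \big(\mathbf{p}\cdot d\mathbf{q} - H\,dt\big)\right].
\end{aligned}
\tag{III.8}
\]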


Here the symbol $\mathcal{D}\mathbf{p}\,\mathcal{D}\mathbf{q}$ means one is to sum the integrand over all paths $\big(\mathbf{p}(t),\mathbf{q}(t)\big)$ which pass through $(t_i,\mathbf{q}_i)$ and $(t_f,\mathbf{q}_f)$.

Thus, as claimed in §2.2, the amplitude to get from $(t_i,\mathbf{q}_i)$ to $(t_f,\mathbf{q}_f)$ is a sum over all paths of $e^{iS/\hbar}$, where $S$ is the classical action for that path. When $|S| \gg \hbar$ the contributions from paths that do not extremize $S$ will cancel each other out to high precision, and the amplitude for the transition is dominated by the extremizing "classical" path.
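The cancellation of non-extremal paths can be illustrated with a deliberately crude numerical sketch in Python/NumPy. It is not the full path integral: it assumes a free particle of mass $m$ travelling from $(0,0)$ to $(T,X)$ and restricts the sum to the hypothetical one-parameter family of paths $q(t) = Xt/T + a\sin(\pi t/T)$, for which $S(a) = mX^2/2T + m\pi^2a^2/4T$ and $a = 0$ is the classical path. The numerical values, the family of paths and the function names are all assumptions made for the example.

```python
import numpy as np

m, T, X, hbar = 1.0, 1.0, 1.0, 0.01  # illustrative values with S >> hbar

def action(a):
    """Action of the free-particle path q(t) = X t/T + a sin(pi t/T)."""
    return m * X**2 / (2 * T) + m * np.pi**2 * a**2 / (4 * T)

def path_sum(a_lo, a_hi, n=400001):
    """Trapezoidal estimate of the integral of exp(i S(a)/hbar) over [a_lo, a_hi]."""
    a = np.linspace(a_lo, a_hi, n)
    f = np.exp(1j * action(a) / hbar)
    da = a[1] - a[0]
    return da * (f.sum() - 0.5 * (f[0] + f[-1]))

total = path_sum(-3.0, 3.0)
near = path_sum(-1.0, 1.0)  # paths close to the classical one (a = 0)
far = total - near          # paths with 1 <= |a| <= 3

# Stationary-phase (Fresnel) estimate built around the classical path alone.
c = m * np.pi**2 / (4 * T * hbar)
stationary_phase = (np.exp(1j * action(0.0) / hbar)
                    * np.sqrt(np.pi / c) * np.exp(1j * np.pi / 4))

print(abs(total), abs(near), abs(far), abs(total - stationary_phase))
```

The paths far from $a = 0$ cover twice the range of $a$ yet contribute much less to the sum than those near it, and the total agrees closely with the estimate obtained from the neighbourhood of the classical path. Reducing `hbar` sharpens both statements.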

Exercise (3): In (III.8) replace $H$ with $\tfrac{1}{2}p^2/m + V(\mathbf{q})$ and $d\mathbf{q}$ by $\dot{\mathbf{q}}\,dt$. Then do the integration over every $\mathbf{p}_j$ by completing the square and using $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$. Explain the relation of the resulting expression for $A_{if}$ to the Lagrangian principle of least action.
