Consciousness as a State of Matter
Max Tegmark
Dept. of Physics & MIT Kavli Institute,
Massachusetts Institute of Technology, Cambridge, MA 02139
(Dated: January 8, 2014)
We examine the hypothesis that consciousness can be understood as a state of matter, "perceptronium", with distinctive information processing abilities. We explore five basic principles that may distinguish conscious matter from other physical systems such as solids, liquids and gases: the information, integration, independence, dynamics and utility principles. If such principles can identify conscious entities, then they can help solve the quantum factorization problem: why do conscious observers like us perceive the particular Hilbert space factorization corresponding to classical space (rather than Fourier space, say), and more generally, why do we perceive the world around us as a dynamic hierarchy of objects that are strongly integrated and relatively independent? Tensor factorization of matrices is found to play a central role, and our technical results include a theorem about Hamiltonian separability (defined using Hilbert-Schmidt superoperators) being maximized in the energy eigenbasis. Our approach generalizes Giulio Tononi's integrated information framework for neural-network-based consciousness to arbitrary quantum systems, and we find interesting links to error-correcting codes, condensed matter criticality, and the Quantum Darwinism program, as well as an interesting connection between the emergence of consciousness and the emergence of time.
I. INTRODUCTION
What is the relation between the internal reality of your mind and the external reality described by the equations of physics? The fact that no consensus answer to this question has emerged in the physics community lies at the heart of many of the most hotly debated issues in physics today. For example, how does quantum field theory with weak-field gravity explain the appearance of an approximately classical spacetime where experiments appear to have definite outcomes? Out of all of the possible factorizations of Hilbert space, why is the particular factorization corresponding to classical space so special? Does the quantum wavefunction undergo a non-unitary collapse when an observation is made, or are there Everettian parallel universes? Does the non-observability of spacetime regions beyond horizons imply that they in some sense do not exist independently of the regions that we can observe? If we understood consciousness as a physical phenomenon, we could in principle answer all of these questions by studying the equations of physics: we could identify all conscious entities in any physical system, and calculate what they would perceive. However, this approach is typically not pursued by physicists, with the argument that we do not understand consciousness well enough.
In this paper, I argue that recent progress in neuroscience has fundamentally changed this situation, and that we physicists can no longer blame neuroscientists for our own lack of progress. I have long contended that consciousness is the way information feels when being processed in certain complex ways [1, 2], i.e., that it corresponds to certain complex patterns in spacetime that obey the same laws of physics as other complex systems, with no secret sauce required. In the seminal paper "Consciousness as Integrated Information: a Provisional Manifesto" [3], Giulio Tononi made this idea more specific and useful, making a compelling argument that for an information processing system to be conscious, it needs to have two separate traits:
1. Information: It has to have a large repertoire of accessible states, i.e., the ability to store a large amount of information.

2. Integration: This information must be integrated into a unified whole, i.e., it must be impossible to decompose the system into nearly independent parts.
Tononi's work has generated a flurry of activity in the neuroscience community, spanning the spectrum from theory to experiment (see [4, 5] for recent reviews), making it timely to investigate its implications for physics as well. This is the goal of the present paper, a goal whose pursuit may ultimately provide additional tools for the neuroscience community as well.
A. Consciousness as a state of matter
Generations of physicists and chemists have studied what happens when you group together vast numbers of atoms, finding that their collective behavior depends on the pattern in which they are arranged: the key difference between a solid, a liquid and a gas lies not in the types of atoms, but in their arrangement. In this paper, I conjecture that consciousness can be understood as yet another state of matter. Just as there are many types of liquids, there are many types of consciousness. However, this should not preclude us from identifying, quantifying, modeling and ultimately understanding the characteristic properties that all liquid forms of matter (or all conscious forms of matter) share.
To classify the traditionally studied states of matter, we need to measure only a small number of physical parameters: viscosity, compressibility, electrical conductivity and (optionally) diffusivity. We call a substance a solid if its viscosity is effectively infinite (producing structural stiffness), and call it a fluid otherwise. We call a fluid a liquid if its compressibility and diffusivity are small and otherwise call it either a gas or a plasma, depending on its electrical conductivity.

State of matter   Many long-lived states?   Information integrated?   Easily writable?   Complex dynamics?
Gas               N                         N                         N                  Y
Liquid            N                         N                         N                  Y
Solid             Y                         N                         N                  N
Memory            Y                         N                         Y                  N
Computer          Y                         ?                         Y                  Y
Consciousness     Y                         Y                         Y                  Y

TABLE I: Substances that store or process information can be viewed as novel states of matter and investigated with traditional physics tools.

arXiv:1401.1219v1 [quant-ph] 6 Jan 2014
What are the corresponding physical parameters that can help us identify conscious matter, and what are the key physical features that characterize it? If such parameters can be identified, understood and measured, this will help us identify (or at least rule out) consciousness from the outside, without access to subjective introspection. This could be important for reaching consensus on many currently controversial topics, ranging from the future of artificial intelligence to determining when an animal, fetus or unresponsive patient can feel pain. It would also be important for fundamental theoretical physics, by allowing us to identify conscious observers in our universe by using the equations of physics and thereby answer thorny observation-related questions such as those mentioned in the introductory paragraph.
B. Memory
As a first warmup step toward consciousness, let us first consider a state of matter that we would characterize as memory. What physical features does it have? For a substance to be useful for storing information, it clearly needs to have a large repertoire of possible long-lived states or attractors (see Table I). Physically, this means that its potential energy function has a large number of well-separated minima. The information storage capacity (in bits) is simply the base-2 logarithm of the number of minima. This equals the entropy (in bits) of the degenerate ground state if all minima are equally deep. For example, solids have many long-lived states, whereas liquids and gases do not: if you engrave someone's name on a gold ring, the information will still be there years later, but if you engrave it in the surface of a pond, it will be lost within a second as the water surface changes its shape. Another desirable trait of a memory substance, distinguishing it from generic solids, is that it is not only easy to read from (as a gold ring), but also easy to write to: altering the state of your hard drive or your synapses requires less energy than engraving gold.
C. Computronium
As a second warmup step, what properties should we ascribe to what Margolus and Toffoli have termed "computronium" [6], the most general substance that can process information as a computer? Rather than just remain immobile as a gold ring, it must exhibit complex dynamics so that its future state depends in some complicated (and hopefully controllable/programmable) way on the present state. Its atom arrangement must be less ordered than a rigid solid where nothing interesting changes, but more ordered than a liquid or gas. At the microscopic level, computronium need not be particularly complicated, because computer scientists have long known that as long as a device can perform certain elementary logic operations, it is universal: it can be programmed to perform the same computation as any other computer with enough time and memory. Computer vendors often parametrize computing power in FLOPS, floating-point operations per second for 64-bit numbers; more generically, we can parametrize computronium capable of universal computation by FLIPS: the number of elementary logical operations such as bit flips that it can perform per second. It has been shown by Lloyd [7] that a system with average energy E can perform a maximum of 4E/h elementary logical operations per second, where h is Planck's constant. The performance of today's best computers is about 38 orders of magnitude lower than this, because they use huge numbers of particles to store each bit and because most of their energy is tied up in a computationally passive form, as rest mass.
D. Perceptronium
What about "perceptronium", the most general substance that feels subjectively self-aware? If Tononi is right, then it should not merely be able to store and process information like computronium does, but it should also satisfy the principle that its information is integrated, forming a unified and indivisible whole.

Let us also conjecture another principle that conscious systems must satisfy: that of autonomy, i.e., that information can be processed with relative freedom from external influence. Autonomy is thus the combination of two separate properties: dynamics and independence. Here dynamics means time dependence (hence information processing capacity) and independence means that the dynamics is dominated by forces from within rather than outside the system. Just like integration, autonomy is postulated to be a necessary but not sufficient condition for a system to be conscious: for example, clocks
Principle                 Definition
Information principle     A conscious system has substantial information storage capacity.
Dynamics principle        A conscious system has substantial information processing capacity.
Independence principle    A conscious system has substantial independence from the rest of the world.
Integration principle     A conscious system cannot consist of nearly independent parts.
Utility principle         A conscious system records mainly information that is useful for it.
Autonomy principle        A conscious system has substantial dynamics and independence.

TABLE II: Conjectured necessary conditions for consciousness that we explore in this paper. The last one simply combines the second and third.
and diesel generators tend to exhibit high autonomy, but lack substantial information storage capacity.
E. Consciousness and the quantum factorization problem
Table II summarizes the candidate principles that we will explore as necessary conditions for consciousness. Our goal with isolating and studying these principles is not merely to strengthen our understanding of consciousness as a physical process, but also to identify simple traits of conscious matter that can help us tackle other open problems in physics. For example, the only property of consciousness that Hugh Everett needed to assume for his work on quantum measurement was that of the information principle: by applying the Schrödinger equation to systems that could record and store information, he inferred that they would perceive subjective randomness in accordance with the Born rule. In this spirit, we might hope that adding further simple requirements such as the integration principle, the independence principle and the dynamics principle might suffice to solve currently open problems related to observation.
In this paper, we will pay particular attention to what I will refer to as the quantum factorization problem: why do conscious observers like us perceive the particular Hilbert space factorization corresponding to classical space (rather than Fourier space, say), and more generally, why do we perceive the world around us as a dynamic hierarchy of objects that are strongly integrated and relatively independent? This fundamental problem has received almost no attention in the literature [9]. We will see that this problem is very closely related to the one Tononi confronted for the brain, merely on a larger scale. Solving it would also help solve the physics-from-scratch problem [2]: If the Hamiltonian H and the total density matrix ρ fully specify our physical world, how do we extract 3D space and the rest of our semiclassical world from nothing more than two Hermitean matrices,
which come without any a priori physical interpretation or additional structure such as a physical space, quantum observables, quantum field definitions, an outside system, etc.? Can some of this information be extracted even from H alone, which is fully specified by nothing more than its eigenvalue spectrum? We will see that a generic Hamiltonian cannot be decomposed using tensor products, which would correspond to a decomposition of the cosmos into non-interacting parts; instead, there is an optimal factorization of our universe into integrated and relatively independent parts. Based on Tononi's work, we might expect that this factorization, or some generalization thereof, is what conscious observers perceive, because an integrated and relatively autonomous information complex is fundamentally what a conscious observer is!
The rest of this paper is organized as follows. In Section II, we explore the integration principle by quantifying integrated information in physical systems, finding encouraging results for classical systems and interesting challenges introduced by quantum mechanics. In Section III, we explore the independence principle, finding that at least one additional principle is required to account for the observed factorization of our physical world into an object hierarchy in three-dimensional space. In Section IV, we explore the dynamics principle and other possibilities for reconciling quantum-mechanical theory with our observation of a semiclassical world. We discuss our conclusions in Section V, including applications of the utility principle, and cover various mathematical details in the three appendices. Throughout the paper, we mainly consider finite Hilbert spaces that can be viewed as collections of qubits; as explained in Appendix C, this appears to cover standard quantum field theory with its infinite Hilbert space as well.
II. INTEGRATION
A. Our physical world as an object hierarchy
One of the most striking features of our physical world is that we perceive it as an object hierarchy, as illustrated in Figure 1. If you are enjoying a cold drink, you perceive ice cubes in your glass as separate objects because they are both fairly integrated and fairly independent, e.g., their parts are more strongly connected to one another than to the outside. The same can be said about each of their constituents, ranging from water molecules all the way down to electrons and quarks. Let us quantify this by defining the robustness of an object as the ratio of the integration temperature (the energy per part needed to separate them) to the independence temperature (the energy per part needed to separate the parent object in the hierarchy). Figure 1 illustrates that all of the ten types of objects shown have robustness of ten or more. A highly robust object preserves its identity
[FIG. 1 panel data: Ice cube (robustness 10^5; independence T 3 mK; integration T 300 K; mgh/k_B ~ 3 mK per molecule); Water molecule (40; 300 K; 1 eV); Hydrogen atom (10; 1 eV; 10 eV); Oxygen atom (10; 1 eV; 10 eV); Oxygen nucleus (10^5; 10 eV; 1 MeV); Proton (200; 1 MeV; 200 MeV); Neutron (200; 1 MeV; 200 MeV); Electron (10^22?; 10 eV; 10^16 GeV?); Up quark (10^17?; 200 MeV; 10^16 GeV?); Down quark (10^17?; 200 MeV; 10^16 GeV?)]

FIG. 1: We perceive the external world as a hierarchy of objects, whose parts are more strongly connected to one another than to the outside. The robustness of an object is defined as the ratio of the integration temperature (the energy per part needed to separate them) to the independence temperature (the energy per part needed to separate the parent object in the hierarchy).
(its integration and independence) over a wide range of temperatures/energies/situations. The more robust an object is, the more useful it is for us humans to perceive it as an object and coin a name for it, as per the above-mentioned utility principle.
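The robustness values quoted in Figure 1 can be checked directly from the definition once all temperatures and energies are put in common units. The sketch below does this for three of the figure's entries (the only outside ingredient is the Boltzmann constant used to convert kelvin to eV):

```python
# Robustness = integration temperature / independence temperature,
# with both expressed in the same units (here eV).
kB = 8.617e-5  # Boltzmann constant in eV per kelvin

def robustness(integration_eV, independence_eV):
    return integration_eV / independence_eV

# Figure 1 examples:
water = robustness(1.0, 300 * kB)        # 1 eV vs 300 K   -> about 40
ice = robustness(300 * kB, 3e-3 * kB)    # 300 K vs 3 mK   -> 10^5
oxygen_nucleus = robustness(1e6, 10.0)   # 1 MeV vs 10 eV  -> 10^5
print(round(water), ice, oxygen_nucleus)
```

The water-molecule ratio comes out near 40 and the other two at 10^5, matching the robustness values listed in the figure.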
Returning to the physics-from-scratch problem, how can we identify this object hierarchy if all we have to start with are two Hermitean matrices, the density matrix ρ encoding the state of our world and the Hamiltonian H determining its time-evolution? Imagine that we know only these mathematical objects ρ and H and have no information whatsoever about how to interpret the various degrees of freedom or anything else about them. A good beginning is to study integration. Consider, for
H(rp,pp, rn,pn, re,pe) = (1)
= H1(rp,pp, rn,pn) + H2(pe) + H3(rp,pp, rn,pn, re,pe),
where r and p are position and momentum vectors, andthe
subscripts p, n and e refer to the proton, the neutronand the
electron. On the second line, we have decom-posed H into three
terms: the internal energy of theproton-neutron nucleus, the
internal (kinetic) energy ofthe electron, and the electromagnetic
electron-nucleus in-teraction. This interaction is tiny, on average
involving
much less energy than those within the nucleus:

tr H_3 / tr H_1 ~ 10^{-5},   (2)

which we recognize as the inverse robustness for a typical nucleus in Figure 1. We can therefore fruitfully approximate the nucleus and the electron as separate objects that are almost independent, interacting only weakly with one another. The key point here is that we could have performed this object-finding exercise of dividing the variables into two groups to find the greatest independence (analogous to what Tononi calls "the cruelest cut") based on the functional form of H alone, without even having heard of electrons or nuclei, thereby identifying their degrees of freedom through a purely mathematical exercise.
B. Integration and mutual information
If the interaction energy H_3 were so small that we could neglect it altogether, then H would be decomposable into two parts H_1 and H_2, each one acting on only one of the two subsystems (in our case the nucleus and the electron). This means that any thermal state would be factorizable:

e^{-H/kT} = e^{-H_1/kT} ⊗ e^{-H_2/kT} = ρ_1 ⊗ ρ_2,   (3)

so the total state can be factored into a product of the subsystem states ρ_1 and ρ_2. In this case, the mutual information

I ≡ S(ρ_1) + S(ρ_2) − S(ρ)   (4)

vanishes, where

S(ρ) ≡ −tr ρ log_2 ρ   (5)

is the Shannon entropy (in bits). Even for non-thermal states, the time-evolution operator U becomes separable:

U ≡ e^{iHt/ℏ} = e^{iH_1 t/ℏ} ⊗ e^{iH_2 t/ℏ} = U_1 ⊗ U_2,   (6)

which (as we will discuss in detail in Section III) implies that the mutual information stays constant over time and no information is ever exchanged between the objects. In summary, if a Hamiltonian can be decomposed without an interaction term (with H_3 = 0), then it describes two perfectly independent systems.
Let us now consider the opposite case, when a system cannot be decomposed into independent parts. Let us define the integrated information Φ as the mutual information I for the "cruelest cut" (the cut minimizing I) in some class of cuts that subdivide the system into two (we will discuss many different classes of cuts below). Although our Φ-definition is slightly different from Tononi's [3]¹, it is similar in spirit, and we are reusing his Φ-symbol for its elegant symbolism (unifying the shapes of I for information and O for integration).
C. Maximizing integration
We just saw that if two systems are dynamically independent (H_3 = 0), then Φ = 0 at all times, both for thermal states and for states that were independent (Φ = 0) at some point in time. Let us now consider the opposite extreme. How large can the integrated information Φ get? As a warmup example, let us consider the familiar 2D Ising model in Figure 2, where n = 2500 magnetic dipoles (or spins) that can point up or down are placed on a square lattice, and H is such that they prefer aligning with their nearest neighbors. When T → ∞, e^{−H/kT} ∝ I, so all 2^n states are equally likely, all n bits are statistically independent, and Φ = 0. When T → 0, all states freeze out except the two degenerate ground states (all spin up or all spin down), so all spins are perfectly correlated and Φ = 1 bit. For intermediate temperatures, long-range correlations are seen to exist such that typical states have contiguous spin-up or spin-down patches. On average, we get about one bit of mutual information for each such patch crossing our cut (since a spin on one side knows about a spin on the other side), so for bipartitions that cut the system into two equally large halves, the mutual information will be proportional to the length of the cutting curve. The cruelest cut is therefore a vertical or horizontal straight line of length n^{1/2}, giving Φ ∼ n^{1/2} at the temperature where typical patches are only a few pixels wide. We would similarly get a maximum integration Φ ∼ n^{1/3} for a 3D Ising system and Φ ∼ 1 bit for a 1D Ising system.
Since it is the spatial correlations that provide the integration, it is interesting to speculate about whether the conscious subsystem of our brain is a system near its critical temperature, close to a phase transition. Indeed, Damasio has argued that to be in homeostasis, a number of physical parameters of our brain need to be kept within a narrow range of values [10]; this is precisely what is required of any condensed matter system to be near-critical, exhibiting correlations that are long-range (providing integration) but not so strong that the whole system becomes correlated like in the rightmost panel or in a brain experiencing an epileptic seizure.
¹ Tononi's definition of Φ [3] applies only for classical systems, whereas we wish to study the quantum case as well. Our Φ is measured in bits and can grow with system size like an extrinsic variable, whereas his is an intrinsic variable representing a sort of average integration per bit.
FIG. 2: The panels show simulations of the 2D Ising model on a 50 × 50 lattice, with the temperature progressively decreasing from left to right (from too little correlation through the optimum to too much, the random pattern giving way to a uniform one). The integrated information Φ drops to zero bits as T → ∞ (leftmost panel) and to one bit as T → 0 (rightmost panel), taking a maximum at an intermediate temperature near the phase transition temperature.
D. Integration, coding theory and error correction
FIG. 3: For various 8-bit systems, the integrated information Φ is plotted as a function of the number of bits cut off into a sub-system with the cruelest cut. The Hamming (8,4)-code (16 8-bit strings) is seen to give classically optimal integration except for a bipartition into 4 + 4 bits: an arbitrary subset containing no more than three bits is completely determined by the remaining bits. The code consisting of the half of all 8-bit strings whose bit sum is even (i.e., each of the 128 7-bit strings followed by a parity checksum bit) has Hamming distance d = 2 and gives Φ = 1 however many bits are cut off. A random set of 16 8-bit strings is seen to outperform the Hamming (8,4)-code for 4+4-bipartitions, but not when fewer bits are cut off.
Even when we tuned the temperature to the most favorable value in our 2D Ising model example, the integrated information never exceeded ∼ n^{1/2} bits, which is merely a fraction n^{−1/2} of the n bits of information that n spins can potentially store. So can we do better? Fortunately, a closely related question has been carefully studied in the branch of mathematics known as coding theory, with the aim of optimizing error correcting codes. Consider, for example, the following set of m = 16 bit strings, each written as a column vector of length n = 8:

M =
0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 1 1 0 1 0 0 1 0 1 1 0 1 0 0 1
0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0
0 1 0 1 1 0 1 0 0 1 0 1 1 0 1 0
0 0 1 1 0 0 1 1 1 1 0 0 1 1 0 0
0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0
0 1 1 0 0 1 1 0 1 0 0 1 1 0 0 1
This is known as the Hamming(8,4)-code, and has Hamming distance d = 4, which means that at least 4 bit flips are required to change one string into another [11]. It is easy to see that for a code with Hamming distance d, any (d − 1) bits can always be reconstructed from the others: you can always reconstruct b bits as long as erasing them does not make two bit strings identical, which would cause ambiguity about which the correct bit string is. This implies that reconstruction works when the Hamming distance d > b.
To translate such codes of m bit strings of length n into physical systems, we simply create a state space with n bits (interpretable as n spins or other two-state systems) and construct a Hamiltonian which has an m-fold degenerate ground state, with one minimum corresponding to each of the m bit strings in the code. In the low-temperature limit, all bit strings will receive the same probability weight 1/m, giving an entropy S = log_2 m. The corresponding integrated information Φ of the ground state is plotted in Figure 3 for a few examples, as a function of cut size k (the number of bits assigned to the first subsystem). To calculate Φ for a cut size k in practice, we simply minimize the mutual information I over all \binom{n}{k} ways of partitioning the n bits into k and (n − k) bits.

We see that, as advertised, the Hamming(8,4)-code gives Φ = 3 when 3 bits are cut off. However, it gives only Φ = 2 for bipartitions; the Φ-value for bipartitions is not simply related to the Hamming distance, and is not a quantity that most popular bit string codes are optimized for. Indeed, Figure 3 shows that for bipartitions, it underperforms a code consisting of 16 random unique bit strings of the same length. A rich and diverse set of codes has been published in the literature, and the state of the art in terms of maximal Hamming distance for a given n is continually updated [12]. Although codes with arbitrarily large Hamming distance d exist, there is (just as for our Hamming(8,4) example above) no guarantee that Φ will be as large as d − 1 when the smaller of the two subsystems contains more than d bits. Moreover, although Reed-Solomon codes are sometimes billed as classically optimal erasure codes (maximizing d for a given n), their fundamental units are generally not bits but groups of bits (generally numbers modulo some prime number), and the optimality is violated if we make cuts that do not respect the boundaries of these bit groups.

FIG. 4: Same as for the previous figure, but for random codes with progressively longer bit strings (from 4 4-bit words up to 256 random 16-bit words), consisting of a random subset containing 2^{n/2} of the 2^n possible bit strings. For better legibility, the vertical axis has been re-centered for the shorter codes.
Although further research on codes maximizing Φ would be of interest, it is worth noting that simple random codes appear to give Φ-values within a couple of bits of the theoretical maximum in the limit of large n, as illustrated in Figure 4. When cutting off k out of n bits, the mutual information in classical physics clearly cannot exceed the number of bits in either subsystem, i.e., k and n − k, so the Φ-curve for a code must lie within the shaded triangle in the figure. (The quantum-mechanical case is more complicated, and we will see in the next section that it in a sense integrates both better and worse.) The codes for which the integrated information is plotted simply consist of a random subset containing 2^{n/2} of the 2^n possible bit strings, so roughly speaking, half the bits encode fresh information and the other half provide the redundancy giving near-perfect integration.
Just as we saw for the Ising model example, these random codes show a tradeoff between entropy and redundancy, as illustrated in Figure 5. When there are n bits, how many of the 2^n possible bit strings should we use
FIG. 5: The integrated information Φ is shown for random codes using progressively larger random subsets of the 2^14 possible strings of 14 bits. The optimal choice is seen to be using about 2^7 bit strings, i.e., using about half the bits to encode information and the other half to integrate it.
to maximize the integrated information Φ? If we use m of them, we clearly have Φ ≤ log_2 m, since in classical physics, Φ cannot exceed the entropy of the system (the mutual information is I = S_1 + S_2 − S, where S_1 ≤ S and S_2 ≤ S, so I ≤ S). Using very few bit strings is therefore a bad idea. On the other hand, if we use all 2^n of them, we lose all redundancy, the bits become independent, and Φ = 0, so being greedy and using too many bit strings in an attempt to store more information is also a bad idea. Figure 5 shows that the optimal tradeoff is to use 2^{n/2} of the codewords, i.e., to use half the bits to encode information and the other half to integrate it. Taken together, the last two figures therefore suggest that n physical bits can be used to provide about n/2 bits of integrated information in the large-n limit.
E. Integration in physical systems
Let us explore the consequences of these results for physical systems described by a Hamiltonian H and a state ρ. As emphasized by Hopfield [13], any physical system with multiple attractors can be viewed as an information storage device, since its state permanently encodes information about which attractor it belongs to. Figure 6 shows two examples of H interpretable as potential energy functions for a single particle in two dimensions. They can both be used as information storage devices, by placing the particle in a potential well and keeping the system cool enough that the particle stays in the same well indefinitely. The egg-crate potential V(x, y) = sin²(πx) sin²(πy) (top) has 256 minima and hence a ground state entropy (information storage capacity) S = 8 bits, whereas the lower potential has only 16 minima and S = 4 bits.

The basins of attraction in the top panel are seen to be the squares shown in the bottom panel. If we write the x and y coordinates as binary numbers with b bits each, then the first 4 bits of x and y encode which
FIG. 6: A particle in the egg-crate potential energy landscape (top panel) stably encodes 8 bits of information that are completely independent of one another and therefore not integrated. In contrast, a particle in a Hamming(8,4) potential (bottom panel) encodes only 4 bits of information, but with excellent integration. Qualitatively, a hard drive is more like the top panel, while a neural network is more like the bottom panel.
square (x, y) is in. The information in the remaining bits encodes the location within this square; these bits are not useful for information storage because they can vary over time, as the particle oscillates around a minimum. If the system is actively cooled, these oscillations are gradually damped out and the particle settles toward the attractor solution at the minimum, at the center of its basin. This example illustrates that cooling is a physical example of error correction: if thermal noise adds small perturbations to the particle position, altering the least significant bits, then cooling will remove these perturbations and push the particle back towards the minimum it came from. As long as cooling keeps the perturbations small enough that the particle never rolls out of its basin of attraction, all the 8 bits of information encoding its basin number are perfectly preserved. Instead of interpreting our n = 8 data bits as positions in two dimensions, we can interpret them as positions in n dimensions, where each possible state corresponds to a corner of the n-dimensional hypercube. This captures the essence of many computer memory devices, where each bit is stored in a system with two degenerate minima; the least significant and redundant bits that can be error-corrected via cooling now get equally distributed among all the dimensions.
How integrated is the information S? For the top panel of Figure 6, not at all: H can be factored as a tensor product of 8 two-state systems, so Φ = 0, just as for typical computer memory. In other words, if the particle is in a particular egg-crate basin, knowing any one of the bits specifying the basin position tells us nothing about the other bits. The potential in the lower panel, on the other hand, gives good integration. This potential retains only 16 of the 256 minima, corresponding to the 16 bit strings of the Hamming(8,4) code, which as we saw gives Φ = 3 bits for any 3 bits cut off and Φ = 2 bits for symmetric bipartitions. Since the Hamming distance d = 4 for this code, at least 4 bits must be flipped to reach another minimum, which among other things implies that no two basins can share a row or column.
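The code's parameters are easy to verify by enumeration. A small sketch (the generator matrix below is one standard choice for the extended Hamming(8,4) code; the text fixes only the code's parameters, so this particular matrix is our assumption):

```python
from itertools import combinations, product

# One standard generator matrix for the extended Hamming(8,4) code (assumed).
G = [(1, 0, 0, 0, 0, 1, 1, 1),
     (0, 1, 0, 0, 1, 0, 1, 1),
     (0, 0, 1, 0, 1, 1, 0, 1),
     (0, 0, 0, 1, 1, 1, 1, 0)]

def encode(msg):
    # Multiply the 4-bit message by G over GF(2).
    return tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G))

codewords = [encode(m) for m in product((0, 1), repeat=4)]
dmin = min(sum(a != b for a, b in zip(u, v))
           for u, v in combinations(codewords, 2))
print(len(codewords), dmin)   # 16 codewords, minimum Hamming distance 4
```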
F. The pros and cons of integration
Natural selection suggests that self-reproducing information-processing systems will evolve integration if it is useful to them, regardless of whether they are conscious or not. Error correction can obviously be useful, both to correct errors caused by thermal noise and to provide redundancy that improves robustness toward failure of individual physical components such as neurons. Indeed, such utility explains the preponderance of error correction built into human-developed devices, from RAID storage to bar codes to forward error correction in telecommunications. If Tononi is correct and consciousness requires integration, then this raises an interesting possibility: our human consciousness may have evolved as an accidental by-product of error correction. There is also empirical evidence that integration is useful for problem-solving: artificial-life simulations of vehicles that have to traverse mazes and whose brains evolve by natural selection show that the more adapted they are to their environment, the higher the integrated information of the main complex in their brain [14].
However, integration comes at a cost, and as we will now see, near-maximal integration appears to be prohibitively expensive. Let us distinguish between the maximum amount of information that can be stored in a state defined by ρ and the maximum amount of information that can be stored in a physical system defined by H. The former is simply S(ρ) for the perfectly mixed (T = ∞) state, i.e., log2 of the number of possible states (the number of bits characterizing the system). The latter can be much larger, corresponding to log2 of the number of Hamiltonians that you could distinguish between given your time and energy available for experimentation. Let us consider potential energy functions whose k different minima can be encoded as bit strings (as in Figure 6), and let us limit our experimentation to finding all the minima. Then H encodes not a single string of n bits, but a subset consisting of k out of all 2^n such strings, one for each minimum. There are (2^n choose k) such subsets, so the information contained in H is
S(H) = log2 (2^n choose k) = log2 [2^n! / (k! (2^n - k)!)] ≈ log2 [(2^n)^k / k^k] = k (n - log2 k)   (7)
for k ≪ 2^n, where we used Stirling's approximation k! ≈ k^k. So crudely speaking, H encodes not n bits but kn bits. For the near-maximal integration given by the random codes from the previous section, we had k = 2^(n/2), which gives S(H) ≈ 2^(n/2) × n/2 bits. For example, if the n ≈ 10^11 neurons in your brain were maximally integrated in this way, then your neural network would require a dizzying 10^(10^10) bits to describe, vastly more information than can be encoded by all the 10^89 particles in our universe combined.

The neuronal mechanisms of human memory are still
human memory are still
unclear despite intensive experimental and theoretical
ex-plorations [15], but there is significant evidence that thebrain
uses attractor dynamics in its integration and mem-ory functions,
where discrete attractors may be used torepresent discrete items
[16]. The classic implementa-tion of such dynamics as a simple
symmetric and asyn-chronous Hopfield neural network [13] can be
conve-niently interpreted in terms of potential energy func-tions:
the equations of the continuous Hopfield networkare identical to a
set of mean-field equations that mini-mize a potential energy
function, so this network alwaysconverges to a basin of attraction
[17]. Such a Hopfieldnetwork gives a dramatically lower information
contentS(H) of only about 0.25 bits for per synapse[17], andwe have
only about 1014 synapses, suggesting that ourbrains can store only
on the order of a few Terabytes ofinformation.
The integrated information of a Hopfield network is even lower. For a Hopfield network of n neurons, the total number of attractors is bounded by 0.14n [17], so the maximum information capacity is merely S ≈ log2(0.14n) ≈ log2 n ≈ 37 bits for n = 10^11 neurons. Even in the most favorable case where these bits are maximally integrated, our 10^11 neurons thus provide a measly ~37 bits of integrated information, as opposed to about 5 × 10^10 bits for a random coding.
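The estimates above are easy to check numerically. A quick sketch (illustrative values of n and k, chosen small enough that the exact binomial is computable; all variable names are ours):

```python
import math

# Check Eq. (7): S(H) = log2 C(2^n, k) ~ k (n - log2 k). The crude Stirling
# form k! ~ k^k undershoots the exact value by roughly k*log2(e) bits.
n, k = 20, 2 ** 10
exact = math.log2(math.comb(2 ** n, k))
approx = k * (n - math.log2(k))
print(exact, approx, exact / approx)          # same order of magnitude

# Hopfield capacity vs. random coding for n = 10^11 neurons.
n_neurons = 1e11
hopfield_bits = math.log2(0.14 * n_neurons)   # ~34 bits (the text rounds log2 n ~ 37)
random_code_bits = n_neurons / 2              # ~5e10 bits
print(hopfield_bits, random_code_bits)
```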
G. The integration paradox
This leaves us with an integration paradox: why does the information content of our conscious experience appear to be vastly larger than 37 bits? If Tononi's information and integration principles from Section I are correct, the integration paradox forces us to draw at least one of the following three conclusions:
1. Our brains use some more clever scheme for encoding our conscious bits of information, which allows dramatically larger Φ than Hopfield networks.
2. These conscious bits are much fewer than we might naively have thought from introspection, implying that we are only able to pay attention to a very modest amount of information at any instant.
3. To be relevant for consciousness, the definition of integrated information that we have used must be modified or supplemented by at least one additional principle.
We will see that the quantum results in the next section bolster the case for conclusion 3.
The fundamental reason why a Hopfield network is specified by much less information than a near-maximally integrated network is that it involves only pairwise couplings between neurons, thus requiring only ~n^2 coupling parameters to be specified, as opposed to 2^n parameters giving the energy for each of the 2^n possible states. It is striking how H is similarly simple for the standard model of particle physics, with the energy involving only sums of pairwise interactions between particles supplemented with occasional 3-way and 4-way couplings. H for the brain and H for fundamental physics thus both appear to belong to an extremely simple subclass of all Hamiltonians, requiring an unusually small amount of information to describe. Just as a system implementing near-maximal integration via random coding is too complicated to fit inside the brain, it is also too complicated to work in fundamental physics: since the information storage capacity S of a physical system is approximately bounded by its number of particles [7] or by its area in Planck units by the holographic principle [8], it cannot be integrated by physical dynamics that itself requires storage of the exponentially larger information quantity S(H) ≈ 2^(S/2) × S/2, unless the Standard Model Hamiltonian is replaced by something dramatically more complicated.
An interesting theoretical direction for further research (pursuing resolution 1 to the integration paradox) is therefore to investigate what maximum amount of integrated information can be feasibly stored in a physical system using codes that are algorithmic (such as RS codes) rather than random. An interesting experimental direction would be to search for concrete implementations of error-correction algorithms in the brain.
In summary, we have explored the integration principle by quantifying integrated information in physical systems. We have found that although excellent integration is possible in principle, it is more difficult in practice. In theory, random codes provide nearly maximal integration, with about half of all n bits coding for data and the other half providing Φ ≈ n/2 bits of integration, but in practice, the dynamics required for implementing them is too complex for our brain or our universe. Most of our exploration has focused on classical physics, where cuts into subsystems have corresponded to partitions of classical bits. As we will see in the next section, finding systems encoding large amounts of integrated information is even more challenging when we turn to the
FIG. 7: Mutual information versus entropy for various 2-bit systems. The different dots, squares and stars correspond to different states, which in the classical cases are defined by the probabilities for the four basis states 00, 01, 10 and 11. Classical states can lie only in the pyramid below the upper black star with (S, I) = (1, 1), whereas entanglement allows quantum states to extend all the way up to the upper black square at (0, 2). However, the integrated information Φ for a quantum state cannot lie above the shaded green/grey region, into which any other quantum state can be brought by a unitary transformation. Along the upper boundary of this region, either three of the four probabilities are equal, or two of them are equal while one vanishes.
quantum-mechanical case.
III. INDEPENDENCE
A. Classical versus quantum independence
How cruel is what Tononi calls the cruelest cut, dividing a system into two parts that are maximally independent? The situation is quite different in classical physics and quantum physics, as Figure 7 illustrates for a simple 2-bit system. In classical physics, the state is specified by a 2 × 2 matrix giving the probabilities for the four states 00, 01, 10 and 11, which define an entropy S and mutual information I. Since there is only one possible cut, the integrated information Φ = I. The point defined by the pair (S, Φ) can lie anywhere in the pyramid in the figure, whose top at (S, Φ) = (1, 1) (black star) gives maximum integration, and corresponds to perfect correlation between the two bits: 50% probability for 00 and 11. Perfect anti-correlation gives the same point. The other two vertices of the classically allowed region are seen to be (S, Φ) = (0, 0) (100% probability for a single outcome) and (S, Φ) = (2, 0) (equal probability for all four outcomes).
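The three classical vertices quoted above follow directly from the definitions of S and I; a minimal sketch (helper names are ours):

```python
import numpy as np

def entropy(p):                        # Shannon entropy in bits
    p = np.asarray(p, float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_info(p4):                   # p4 = probabilities of 00, 01, 10, 11
    q = np.asarray(p4, float).reshape(2, 2)   # rows: first bit, cols: second bit
    return entropy(q.sum(1)) + entropy(q.sum(0)) - entropy(q.ravel())

for p in ([1, 0, 0, 0],                # single outcome:       (S, I) = (0, 0)
          [.5, 0, 0, .5],              # perfectly correlated: (S, I) = (1, 1)
          [.25, .25, .25, .25]):       # uniform:              (S, I) = (2, 0)
    print(entropy(p), mutual_info(p))
```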
In quantum mechanics, where the 2-qubit state is defined by a 4 × 4 density matrix ρ, the available area in the (S, I)-plane doubles to include the entire shaded triangle, with the classically unattainable region opened up because of entanglement. The extreme case is a Bell-pair state such as

|ψ⟩ = (|↑↑⟩ + |↓↓⟩)/√2,   (8)
which gives (S, I) = (0, 2). However, whereas there was only one possible cut for 2 classical bits, there are now infinitely many possible cuts, because in quantum mechanics all Hilbert space bases are equally valid, and we can choose to perform the factorization in any of them. Since Φ is defined as I after the cruelest cut, it is the I-value minimized over all possible factorizations. For simplicity, we use the notation where ⊗ denotes factorization in the coordinate basis, so the integrated information is
Φ = min_U I(UρU†),   (9)
i.e., the mutual information minimized over all possible unitary transformations U. Since the Bell pair of equation (8) is a pure state ρ = |ψ⟩⟨ψ|, we can unitarily transform it into a basis where the first basis vector is |ψ⟩, making it factorizable:
U ( 1/2 0 0 1/2 ; 0 0 0 0 ; 0 0 0 0 ; 1/2 0 0 1/2 ) U† = ( 1 0 0 0 ; 0 0 0 0 ; 0 0 0 0 ; 0 0 0 0 ) = ( 1 0 ; 0 0 ) ⊗ ( 1 0 ; 0 0 ).   (10)

This means that Φ = 0, so in quantum mechanics, the cruelest cut can be very cruel indeed: the most entangled states possible in quantum mechanics have no integrated information at all!
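This calculation is easy to reproduce numerically; a minimal sketch (writing the Bell state as (|00⟩ + |11⟩)/√2 and using a QR-based change of basis; all helper names are ours):

```python
import numpy as np

def vn_entropy(rho):                   # von Neumann entropy in bits
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-(lam * np.log2(lam)).sum())

def mutual_info(rho):                  # I = S(rho1) + S(rho2) - S(rho), two qubits
    r = rho.reshape(2, 2, 2, 2)
    rho1 = np.einsum('ikjk->ij', r)    # trace out qubit 2
    rho2 = np.einsum('kikj->ij', r)    # trace out qubit 1
    return vn_entropy(rho1) + vn_entropy(rho2) - vn_entropy(rho)

psi = np.array([1, 0, 0, 1]) / np.sqrt(2)      # Bell state (|00> + |11>)/sqrt(2)
rho = np.outer(psi, psi)
I_before = mutual_info(rho)                    # 2 bits in the computational basis

# Rotate into a basis whose first vector is |psi>: rho becomes diag(1,0,0,0),
# which factorizes, so the cut in the new basis gives zero mutual information.
U, _ = np.linalg.qr(np.column_stack([psi, np.eye(4)[:, :3]]))
rho_rot = U.conj().T @ rho @ U
I_after = mutual_info(rho_rot)                 # 0 bits
print(I_before, I_after)
```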
The same cruel fate awaits the most integrated 2-bit state from classical physics: the perfectly correlated mixed state ρ = (1/2)|↑↑⟩⟨↑↑| + (1/2)|↓↓⟩⟨↓↓|. It gave Φ = 1 bit classically above (upper black star in the figure), but a unitary transformation permuting its diagonal elements makes it factorizable:

U ( 1/2 0 0 0 ; 0 0 0 0 ; 0 0 0 0 ; 0 0 0 1/2 ) U† = ( 1/2 0 0 0 ; 0 1/2 0 0 ; 0 0 0 0 ; 0 0 0 0 ) = ( 1 0 ; 0 0 ) ⊗ ( 1/2 0 ; 0 1/2 ),   (11)

so Φ = 0 quantum-mechanically (lower black star in the figure).
B. Canonical transformations, independence and relativity
The fundamental reason that these states are more separable quantum-mechanically is clearly that more cuts are available, making the cruelest one crueler. Interestingly, the same thing can happen also in classical physics. Consider, for example, our example of the deuterium atom from equation (1). When we restricted our cuts to simply separating different degrees of freedom, we found that the group (r_p, p_p, r_n, p_n) was quite (but not completely) independent of the group (r_e, p_e), and that there was no cut splitting things into perfectly independent pieces. In other words, the nucleus was fairly independent of the electron, but none of the three particles was completely independent of the other two. However, if we allow our degrees of freedom to be transformed before the cut, then things can be split into two perfectly independent parts! The classical equivalent of a unitary transformation is of course a canonical transformation (one that preserves phase-space volume). If we perform the canonical transformation where the new coordinates are the center-of-mass position r_M and the relative displacements r_p' ≡ r_p - r_M and r_e' ≡ r_e - r_M, and correspondingly define p_M as the total momentum of the whole system, etc., then we find that (r_M, p_M) is completely independent of the rest. In other words, the average motion of the entire deuterium atom is completely decoupled from the internal motions around its center of mass.
Thanks to relativity theory, this well-known decomposition into average and relative motions is of course possible for any isolated system. If two systems are completely independent, then they can gain no knowledge of each other, so a conscious observer in one will be unaware of the other. Conversely, we can view relativity as a special case of this idea: an observer in an isolated system has no way of knowing whether she is at rest or in uniform motion, because these are simply two different allowed states for the center-of-mass system, which is completely independent from the internal-motions system of which her consciousness is a part.
C. How integrated can quantum states be?
We saw in Figure 7 that some seemingly integrated states, such as a Bell pair or a pair of classically perfectly correlated bits, are in fact not integrated at all. But the figure also shows that some states are truly integrated even quantum-mechanically, with I > 0 even for the cruelest cut. How integrated can a quantum state be? I have a conjecture which, if true, enables the answer to be straightforwardly calculated.
Φ-Diagonality Conjecture (ΦDC): The mutual information always takes its minimum in a basis where ρ is diagonal.
FIG. 8: When performing random unitary transformations of a density matrix ρ, the mutual information appears to always be minimized when ρ is rotated into its eigenbasis, so that ρ becomes diagonal.

Although I do not have a proof of this conjecture, a rigorous proof of a closely related conjecture will be provided below for the special case of n = 2.² Moreover, there is
substantial numerical support for the conjecture. For example, Figure 8 plots the mutual information I for 15,000 random unitary transformations of a single random 4 × 4 density matrix, as a function of the off-diagonality, defined as the sum of the squared moduli of the off-diagonal components. It is seen that the lower envelope forms a monotonically increasing curve taking its minimum when the matrix is diagonal, i.e., rotated into its eigenbasis. Similar numerical experiments with a variety of n × n density matrices up to n = 2^5 showed the same qualitative behavior. In the rest of this section, we will assume that the ΦDC is in fact true and explore the consequences; if it is false, then Φ(ρ) will generally be even smaller than for the diagonal cases we explore.
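A simplified version of this numerical experiment, comparing only the minima rather than reproducing the full Figure 8 scatter (all helper names and parameter values are ours):

```python
import numpy as np
from itertools import permutations

# For a random two-qubit density matrix, the minimum of I over sampled random
# unitaries should not beat the minimum over diagonal (eigenbasis) orderings.
rng = np.random.default_rng(0)

def entropy(p):
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())

def vn_entropy(rho):
    return entropy(np.linalg.eigvalsh(rho))

def mutual_info(rho):
    r = rho.reshape(2, 2, 2, 2)
    return (vn_entropy(np.einsum('ikjk->ij', r))
            + vn_entropy(np.einsum('kikj->ij', r)) - vn_entropy(rho))

def haar_unitary(n):
    Q, R = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return Q * (np.diag(R) / np.abs(np.diag(R)))

A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = A @ A.conj().T                  # random positive matrix ...
rho /= np.trace(rho).real             # ... normalized to a density matrix

lam = np.linalg.eigvalsh(rho)
I_diag = min(mutual_info(np.diag(np.asarray(p))) for p in permutations(lam))
I_rand = min(mutual_info((U := haar_unitary(4)).conj().T @ rho @ U)
             for _ in range(2000))
print(I_diag, I_rand)                 # I_diag <= I_rand, supporting the conjecture
```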
Assuming that the ΦDC is true, the first step in computing the integrated information Φ(ρ) is thus to diagonalize the n × n density matrix ρ. If all n eigenvalues are different, then there are n! possible ways of doing this, corresponding to the n! ways of permuting the eigenvalues, so the ΦDC simplifies the continuous minimization problem of equation (9) to a discrete minimization problem over these n! permutations. Suppose that n = lm, and that we wish to factor the n-dimensional Hilbert space into factor spaces of dimensionality l and m, so that Φ = 0. It is easy to see that this is possible if the n eigenvalues of ρ can be arranged into an l × m matrix that is multiplicatively separable (rank 1), i.e., the product of a column vector and a row vector. Extracting the eigenvalues for our example from equation (11) where
² The converse of the ΦDC is straightforward to prove: if Φ = 0 (which is equivalent to the state being factorizable, ρ = ρ1 ⊗ ρ2), then it is factorizable also in its eigenbasis, where both ρ1 and ρ2 are diagonal.
l = m = 2 and n = 4, we see that ( 1/2 1/2 ; 0 0 ) is separable, but ( 1/2 0 ; 0 1/2 ) is not,
and the only difference is that the order of the four numbers has been permuted. More generally, we see that to find the cruelest cut that defines the integrated information Φ, we want to find the permutation that makes the matrix of eigenvalues as separable as possible. It is easy to see that when seeking the permutation giving maximum separability, we can without loss of generality place the largest eigenvalue first (in the upper left corner) and the smallest one last (in the lower right corner). If there are only 4 eigenvalues (as in the above example), the ordering of the remaining two has no effect on I.
D. The quantum integration paradox
We now have the tools in hand to answer the key question from the last section: which state maximizes the integrated information Φ? Numerical search suggests that the most integrated state is a rescaled projection matrix, satisfying ρ^2 ∝ ρ. This means that some number k of the n eigenvalues equal 1/k and the remaining ones vanish.³ For the n = 4 example from Figure 7, k = 3 is seen to give the best integration, with eigenvalues (probabilities) 1/3, 1/3, 1/3 and 0, giving Φ = log(27/16)/log 8 ≈ 0.2516.
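Assuming the ΦDC, this maximum can be reproduced by brute force over eigenvalue orderings; a minimal sketch (helper names are ours):

```python
import numpy as np
from itertools import permutations

def entropy(p):
    p = np.asarray(p, float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def phi_diag(lam):
    # Integrated information of a diagonal two-qubit state: minimize I over
    # the 4! eigenvalue orderings (this assumes the Phi-diagonality conjecture).
    best = float('inf')
    for p in permutations(lam):
        q = np.reshape(p, (2, 2))
        best = min(best, entropy(q.sum(1)) + entropy(q.sum(0)) - entropy(q.ravel()))
    return best

# States with k equal nonzero eigenvalues 1/k: integration peaks at k = 3.
for k in (1, 2, 3, 4):
    lam = [1 / k] * k + [0] * (4 - k)
    print(k, phi_diag(lam))
print(np.log(27 / 16) / np.log(8))    # ~0.2516 bits, matching k = 3
```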
For classical physics, we saw that the maximal attainable Φ grows roughly linearly with n. Quantum-mechanically, however, it decreases as n increases!⁴

In summary, no matter how large a quantum system we create, its state can never contain more than about a quarter of a bit of integrated information! This exacerbates the integration paradox from Section II G, eliminating both of the first two resolutions: you are clearly aware of more than 0.25 bits of information right now, and this
³ A heuristic way of understanding why having many equal eigenvalues is advantageous is that it helps eliminate the effect of the eigenvalue permutations that we are minimizing over. If the optimal state has two distinct eigenvalues, then if swapping them changes I, it must by definition increase I by some finite amount. This suggests that we can increase the integration by bringing the eigenvalues infinitesimally closer or further apart, and repeating this procedure lets us further increase Φ until all eigenvalues are either zero or equal to some positive value.
⁴ One finds that Φ is maximized when the k nonzero eigenvalues are arranged in a Young tableau, which corresponds to a partition of k as a sum of positive integers k1 + k2 + ..., giving Φ = S(p) + S(p*) - log2 k, where the probability vectors p and p* are defined by p_i = k_i/k and p*_i = k*_i/k. Here k*_i denotes the conjugate partition. For example, if we cut an even number of qubits into two parts with n/2 qubits each, then n = 2, 4, 6, ..., 20 gives Φ ≈ 0.252, 0.171, 0.128, 0.085, 0.085, 0.073, 0.056, 0.056, 0.051 and 0.042 bits, respectively. This is if the diagonality conjecture is true; if it is not, then Φ is even smaller.
quarter-bit maximum applies not merely to states of Hopfield networks, but to any quantum states of any system. Let us therefore begin exploring the third resolution: that our definition of integrated information must be modified or supplemented by at least one additional principle.
E. How integrated is the Hamiltonian?
An obvious way to begin this exploration is to consider the state ρ not merely at a single fixed time t, but as a function of time. After all, it is widely assumed that consciousness is related to information processing, not mere information storage. Indeed, Tononi's original Φ-definition [3] (which applies to classical neural networks rather than general quantum systems) involves time, depending on the extent to which current events affect future ones.
Because the time-evolution of the state ρ is determined by the Hamiltonian H via the Schrödinger equation

ρ̇ = i[H, ρ],   (12)

whose solution is

ρ(t) = e^{iHt} ρ e^{-iHt},   (13)

we need to investigate the extent to which the cruelest cut can decompose not merely ρ but the pair (ρ, H) into independent parts. (Here and throughout, we often use units where ℏ = 1 for simplicity.)
F. Evolution with separable Hamiltonian
As we saw above, the key question for ρ is whether it is factorizable (expressible as a product ρ = ρ1 ⊗ ρ2 of matrices acting on the two subsystems), whereas the key question for H is whether it is what we will call additively separable, being a sum of matrices acting on the two subsystems, i.e., expressible in the form

H = H1 ⊗ I + I ⊗ H2   (14)
for some matrices H1 and H2. For brevity, we will often write simply separable instead of additively separable. As mentioned in Section II B, a separable Hamiltonian H implies that both the thermal state ρ ∝ e^{-H/kT} and the time-evolution operator U ≡ e^{iHt/ℏ} are factorizable. An important property of density matrices, which was pointed out already by von Neumann when he invented them [18], is that if H is separable, then

ρ̇1 = i[H1, ρ1],   (15)

i.e., the time-evolution of the state of the first subsystem, ρ1 ≡ tr2 ρ, is independent of the other subsystem and of any entanglement with it that may exist. This is easy to
prove: using the identities (A16) and (A18) shows that

tr2 [H1 ⊗ I, ρ] = tr2 [(H1 ⊗ I)ρ] - tr2 [ρ(H1 ⊗ I)] = H1 tr2 ρ - (tr2 ρ) H1 = H1 ρ1 - ρ1 H1 = [H1, ρ1].   (16)

Using the identity (A10) shows that

tr2 [I ⊗ H2, ρ] = 0.   (17)

Summing equations (16) and (17) completes the proof.
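Equation (15) is easy to verify numerically; a minimal sketch with a random two-qubit example, using the sign convention of equations (12)-(13) (all helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def rand_herm(n):
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (A + A.conj().T) / 2

def U_t(H, t):                         # e^{iHt} via the eigendecomposition, hbar = 1
    w, V = np.linalg.eigh(H)
    return (V * np.exp(1j * w * t)) @ V.conj().T

H1, H2 = rand_herm(2), rand_herm(2)
H = np.kron(H1, np.eye(2)) + np.kron(np.eye(2), H2)   # additively separable, Eq. (14)

psi = np.array([1, 0, 0, 1]) / np.sqrt(2)             # entangled initial state
rho = np.outer(psi, psi.conj())
tr2 = lambda r: np.einsum('ikjk->ij', r.reshape(2, 2, 2, 2))

t = 0.7
U = U_t(H, t)
rho1_joint = tr2(U @ rho @ U.conj().T)                # evolve jointly, then reduce
U1 = U_t(H1, t)
rho1_alone = U1 @ tr2(rho) @ U1.conj().T              # reduce, then evolve with H1 alone
print(np.allclose(rho1_joint, rho1_alone))            # True
```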
G. The cruelest cut as the maximization of separability
Since a general Hamiltonian H cannot be written in the separable form of equation (14), it will also include a third term H3 that is non-separable. The independence principle from Section I therefore suggests an interesting mathematical approach to the physics-from-scratch problem of analyzing the total Hamiltonian H for our physical world:

1. Find the Hilbert space factorization giving the cruelest cut, decomposing H into parts with the smallest interaction Hamiltonian H3 possible.

2. Keep repeating this subdivision procedure for each part until only relatively integrated parts remain that cannot be further decomposed with a small interaction Hamiltonian.
The hope would be that applying this procedure to the Hamiltonian of our standard model would reproduce the full observed object hierarchy from Figure 1, with the factorization corresponding to the objects, and the various non-separable terms H3 describing the interactions between these objects. Any decomposition with H3 = 0 would correspond to two parallel universes unable to communicate with one another.
We will now formulate this as a rigorous mathematics problem, solve it, and derive the observational consequences. We will find that this approach fails catastrophically when confronted with observation, giving interesting hints regarding further physical principles needed for understanding why we perceive our world as an object hierarchy.
H. The Hilbert-Schmidt vector space
To enable a rigorous formulation of our problem, let us first briefly review the Hilbert-Schmidt vector space, a convenient inner-product space where the vectors are not wave functions |ψ⟩ but matrices such as H and ρ. For any two matrices A and B, the Hilbert-Schmidt inner product is defined by

(A, B) ≡ tr A†B.   (18)

For example, the trace operator can be written as an inner product with the identity matrix:

tr A = (I, A).   (19)
This inner product defines the Hilbert-Schmidt norm (also known as the Frobenius norm)

||A|| ≡ (A, A)^{1/2} = (tr A†A)^{1/2} = [Σ_ij |A_ij|^2]^{1/2}.   (20)

If A is Hermitean (A† = A), then ||A||^2 is simply the sum of the squares of its eigenvalues.
Real symmetric and antisymmetric matrices form orthogonal subspaces under the Hilbert-Schmidt inner product, since (S, A) = 0 for any symmetric matrix S (satisfying S^t = S) and any antisymmetric matrix A (satisfying A^t = -A). Because a Hermitean matrix (satisfying H† = H) can be written in terms of real symmetric and antisymmetric matrices as H = S + iA, we have

(H1, H2) = (S1, S2) + (A1, A2),

which means that the inner product of two Hermitean matrices is purely real.
I. Separating H with orthogonal projectors
By viewing H as a vector in the Hilbert-Schmidt vector space, we can rigorously define a decomposition of it into orthogonal components, two of which are the separable terms from equation (14). Given a factorization of the Hilbert space where the matrix H operates, we define four linear superoperators⁵ Πi as follows:

Π0 H ≡ (tr H / n) I,   (21)
Π1 H ≡ (tr2 H / n2) ⊗ I2 - Π0 H,   (22)
Π2 H ≡ I1 ⊗ (tr1 H / n1) - Π0 H,   (23)
Π3 H ≡ (I - Π0 - Π1 - Π2) H.   (24)

It is straightforward to show that these four linear operators Πi form a complete set of orthogonal projectors, i.e., that

Σ_{i=0}^{3} Πi = I,   (25)
Πi Πj = Πi δij,   (26)
(Πi H, Πj H) = ||Πi H||^2 δij.   (27)
⁵ Operators on the Hilbert-Schmidt space are usually called superoperators in the literature, to avoid confusion with operators on the underlying Hilbert space, which are mere vectors in the Hilbert-Schmidt space.
This means that any Hermitean matrix H can be decomposed as a sum of four orthogonal components Hi ≡ Πi H, so that its squared Hilbert-Schmidt norm can be decomposed as a sum of contributions from the four components:

H = H0 + H1 + H2 + H3,   (28)
Hi ≡ Πi H,   (29)
(Hi, Hj) = ||Hi||^2 δij,   (30)
||H||^2 = ||H0||^2 + ||H1||^2 + ||H2||^2 + ||H3||^2.   (31)

We see that H0 ∝ I picks out the trace of H, whereas the other three matrices are trace-free. This trace term is of course physically uninteresting, since it can be eliminated by simply adding an unobservable constant zero-point energy to the Hamiltonian. H1 and H2 correspond to the two separable terms in equation (14) (without the trace term, which could have been arbitrarily assigned to either), and H3 corresponds to the non-separable residual. A Hermitean matrix H is therefore separable if and only if Π3 H = 0. Just as it is customary to write the norm of a vector r as r ≡ |r| (without boldface), we will denote the Hilbert-Schmidt norm of a matrix H by H ≡ ||H||. For example, with this notation we can rewrite equation (31) as simply H^2 = H0^2 + H1^2 + H2^2 + H3^2.
Geometrically, we can think of n × n Hermitean matrices H as points in the N-dimensional vector space R^N, where N = n^2 (Hermitean matrices have n real numbers on the diagonal and n(n - 1)/2 complex numbers off the diagonal, constituting a total of n + 2 × n(n - 1)/2 = n^2 real parameters). Diagonal matrices form a hyperplane of dimension n in this space. The projection operators Π0, Π1, Π2 and Π3 project onto hyperplanes of dimension 1, (n - 1), (n - 1) and (n - 1)^2, respectively (for a symmetric factorization with n1 = n2 = √n), so separable matrices form a hyperplane in this space of dimension 2n - 1. For example, a general 4 × 4 Hermitean matrix can be parametrized by 10 numbers (4 real for the diagonal part and 6 complex for the off-diagonal part), and its decomposition from equation (28) can be written as follows:
H = ( t+a+b+v , d+w , c+x , y ;
      d*+w* , t+a-b-v , z , c-x ;
      c*+x* , z* , t-a+b-v , d-w ;
      y* , c*-x* , d*-w* , t-a-b+v )

  = t ( 1 0 0 0 ; 0 1 0 0 ; 0 0 1 0 ; 0 0 0 1 )
  + ( a 0 c 0 ; 0 a 0 c ; c* 0 -a 0 ; 0 c* 0 -a )
  + ( b d 0 0 ; d* -b 0 0 ; 0 0 b d ; 0 0 d* -b )
  + ( v w x y ; w* -v z -x ; x* z* -v -w ; y* -x* -w* v ),   (32)

where * denotes complex conjugation. We see that t contributes to the trace (and H0) while the other three components Hi are traceless. We also see that tr1 H1 = tr2 H2 = 0, and that both partial traces vanish for H3.
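The projectors of equations (21)-(24) and the orthogonal decomposition (28)-(31) can be verified numerically; a minimal sketch for n1 = n2 = 2 (all helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n1 = n2 = 2
n = n1 * n2
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (A + A.conj().T) / 2              # random Hermitean H

r = H.reshape(n1, n2, n1, n2)
tr1 = np.einsum('kikj->ij', r)        # partial trace over factor 1
tr2 = np.einsum('ikjk->ij', r)        # partial trace over factor 2

H0 = (np.trace(H) / n) * np.eye(n)                # Pi_0 H, Eq. (21)
H1 = np.kron(tr2 / n2, np.eye(n2)) - H0           # Pi_1 H, Eq. (22)
H2 = np.kron(np.eye(n1), tr1 / n1) - H0           # Pi_2 H, Eq. (23)
H3 = H - H0 - H1 - H2                             # Pi_3 H, Eq. (24)

hs = lambda X, Y: np.trace(X.conj().T @ Y)        # Hilbert-Schmidt inner product
parts = [H0, H1, H2, H3]
# The four components are mutually orthogonal, and the norms add in quadrature.
ortho = all(abs(hs(parts[i], parts[j])) < 1e-10
            for i in range(4) for j in range(4) if i != j)
pythagoras = np.isclose(sum(np.linalg.norm(P) ** 2 for P in parts),
                        np.linalg.norm(H) ** 2)
print(ortho, pythagoras)                          # True True
```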
J. Maximizing separability
We now have all the tools we need to rigorously maximize separability and test the physics-from-scratch approach described in Section III G. Given a Hamiltonian H, we simply wish to minimize the norm of its non-separable component H3 over all possible Hilbert space factorizations, i.e., over all possible unitary transformations. In other words, we wish to compute

E ≡ min_U ||Π3 H||,   (33)

where we have defined the integration energy E by analogy with the integrated information Φ. If E = 0, then there is a basis where our system separates into two parallel universes; otherwise E quantifies the coupling between the two parts of the system under the cruelest cut.
FIG. 9: Geometrically, we can view the integration energy E as the shortest distance (in Hilbert-Schmidt norm) between the hyperplane of separable Hamiltonians and a subsphere of Hamiltonians that can be unitarily transformed into one another. The most separable Hamiltonian H* on the subsphere is such that its non-separable component Π3 H* is orthogonal to all subsphere tangent vectors [A, H*] generated by anti-Hermitean matrices A.
The Hilbert-Schmidt space allows us to interpret the minimization problem of equation (33) geometrically, as illustrated in Figure 9. Let H denote the Hamiltonian in some given basis, and consider its orbit H' = UHU† under all unitary transformations U. This is a curved hypersurface whose dimensionality is generically n(n - 1), i.e., n lower than that of the full space of Hermitean matrices, since unitary transformations leave all n eigenvalues invariant.⁶ We will refer to this curved hypersurface as a subsphere, because it is a subset of the full n^2-dimensional sphere: the radius H (the Hilbert-Schmidt norm ||H||) is invariant under unitary transformations, but the subsphere may have a more complicated topology than a hypersphere; for example, the 3-sphere is known to be, topologically, the double cover of SO(3), the matrix group of 3 × 3 orthonormal transformations.
We are interested in finding the most separable point H* on this subsphere, i.e., the point on the subsphere that is closest to the (2n - 1)-dimensional separable hyperplane. In our notation, this means that we want to find the point H* on the subsphere that minimizes ||Π3 H*||, the Hilbert-Schmidt norm of the non-separable component. If we perform infinitesimal displacements along the subsphere, ||Π3 H*|| thus remains constant to first order (the gradient vanishes at the minimum), so all tangent vectors of the subsphere are orthogonal to Π3 H*, the vector from the separable hyperplane to the subsphere.
Unitary transformations are generated by anti-Hermitean matrices, so the most general tangent vector δH is of the form

δH = [A,H] ≡ AH − HA (34)

for some anti-Hermitean n×n matrix A (any matrix satisfying A† = −A). We thus obtain the following simple condition for maximal separability:
(H3, [A,H]) = 0 (35)

for any anti-Hermitean matrix A. Because the most general anti-Hermitean matrix can be written as A = iB for a Hermitean matrix B, equation (35) is equivalent to the condition (H3, [B,H]) = 0 for all Hermitean matrices B. Since there are n² linearly independent anti-Hermitean matrices, equation (35) is a system of n² coupled quadratic equations that the components of H must obey.
K. The Hamiltonian diagonality conjecture
By analogy with our ρ-diagonality conjecture above, I once again conjecture that maximal separability is attained in the eigenbasis.

H-Diagonality Conjecture (HDC): The Hamiltonian is always maximally separable (minimizing ||H3||) in the energy eigenbasis, where it is diagonal.
Numerical tests of this conjecture produce encouraging support that is visually quite analogous to Figure 8, but
6 n×n unitary matrices U are known to form an n²-dimensional manifold: they can always be written as U = e^{iH} for some Hermitean matrix H, so they are parametrized by the same number of real parameters (n²) as Hermitean matrices.
with ||H3|| rather than I on the vertical axis. For the HDC, however, the mathematical formalism above also permits rigorous analytic tests. Let us consider the simplest case, when n = 4, so that H can be parametrized as in equation (32). Since ||H3|| is invariant under unitary transformations acting only on either of the two factor spaces, we can without loss of generality select a basis where H1 and H2 are diagonal, so that the parameters c = d = 0. The optimality condition of equation (35) gives a separate constraint equation for each of n² linearly independent anti-Hermitean matrices A. If we choose these matrices to be of the simplest possible form, each with all except one or two elements vanishing, we find that the resulting n² real-valued equations produce merely 8 linearly independent ones, which can be combined into the following four complex equations:
bw = 0, (36)
ax = 0, (37)
(a + b)y = 0, (38)
(a − b)z = 0. (39)

The HDC states that ||H3|| takes its minimum value when H is diagonal, i.e., when w = x = y = z = 0. To complete the proof of the HDC for n = 4, we thus need to find all solutions to these equations and verify that for each of them, ||H3|| is greater than or equal to its value in the energy eigenbasis. There are only 6 cases to consider for a and b:
1. a = b = 0, solving the equations and giving H1 = H2 = 0. This clearly maximizes rather than minimizes ||H3|| (since ||H3||² = ||H||² − ||H0||² − ||H1||² − ||H2||², and both ||H|| and ||H0|| are unitarily invariant), so ||H3|| cannot be smaller than in the eigenbasis.

2. a = 0, b ≠ 0, so w = y = z = 0, giving ||H3|| = 32^{1/2}(|v|² + |x|²)^{1/2}. This is not the ||H3||-minimum, because H is separable (||H3|| = 0) in its eigenbasis.

3. b = 0, a ≠ 0, so x = y = z = 0, analogously giving ||H3|| = 32^{1/2}(|v|² + |w|²)^{1/2} ≥ 32^{1/2}|v|, the value in the eigenbasis.

4. a = b ≠ 0, so w = x = y = 0, giving ||H3|| = (32|v|² + 16|z|²)^{1/2} ≥ 32^{1/2}|v|, the value in the eigenbasis.

5. a = −b ≠ 0, so w = x = z = 0, giving ||H3|| = (32|v|² + 16|y|²)^{1/2} ≥ 32^{1/2}|v|, the value in the eigenbasis.

6. a ≠ 0, b ≠ 0, |a| ≠ |b|, so w = x = y = z = 0, giving a diagonal H.
In summary, when applying all unitary transformations, the non-separable part ||H3|| takes its minimum when H is diagonal, which completes the proof. It also takes its maximum when H1 = H2 = 0, and has saddle points when H1 or H2 vanishes or when H1 = ±H2. Any extremum where H1 and H2 are generic (non-zero and non-equal) has H diagonal.
Although it appears likely that the HDC can be analogously proven for any given n by using equation (35) and examining all solutions to the resulting system of coupled quadratic equations, a general proof for arbitrary n would obviously be more interesting. Any diagonal n×n Hamiltonian H is a solution to equation (35), so what remains to be established is that this is the global minimum.7
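A quick numerical test of the HDC can be run by comparing ||H3|| in the eigenbasis against many random unitary conjugations. The sketch below is ours, not the paper's: it computes the non-separable component by orthogonally projecting onto the separable hyperplane using partial traces, and the helper names are invented for illustration.

```python
import numpy as np

def nonseparable_norm(H, l, m):
    """||H3||: Hilbert-Schmidt distance from H to the separable hyperplane.

    The separable part A (x) I + I (x) B is built from partial traces of H;
    this is the orthogonal projection in the Hilbert-Schmidt inner product
    (our reconstruction of the Pi_3 projection, not verbatim from the text)."""
    n = l * m
    Hr = H.reshape(l, m, l, m)
    A = np.einsum('ikjk->ij', Hr) / m          # partial trace over factor 2, / m
    B = np.einsum('kikj->ij', Hr) / l          # partial trace over factor 1, / l
    Hsep = (np.kron(A, np.eye(m)) + np.kron(np.eye(l), B)
            - (np.trace(H) / n) * np.eye(n))   # remove doubly counted trace part
    return np.linalg.norm(H - Hsep)

def random_unitary(n, rng):
    """Haar-random unitary via QR decomposition of a complex Gaussian matrix."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    return Q * (np.diagonal(R) / np.abs(np.diagonal(R)))

rng = np.random.default_rng(0)
l = m = 2
H = np.diag(rng.standard_normal(l * m))        # H written in its energy eigenbasis
eigenbasis_value = nonseparable_norm(H, l, m)
sampled = min(nonseparable_norm(U @ H @ U.conj().T, l, m)
              for U in (random_unitary(l * m, rng) for _ in range(2000)))
# Per the HDC (proven above for n = 4), no conjugation beats the eigenbasis:
assert sampled >= eigenbasis_value - 1e-9
```

Such sampling cannot prove the conjecture, but a violation would immediately disprove it.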
If the HDC is correct, then separability is always maximized in the energy eigenbasis, where the n×n matrix H is diagonal and the projection operators Πi defined by equations (21)-(24) greatly simplify. If we arrange the n = lm diagonal elements of H into an l×m matrix H, then the action of the linear operators Πi is given by simple matrix operations:

H0 → Ql H Qm, (40)
H1 → Pl H Qm, (41)
H2 → Ql H Pm, (42)
H3 → Pl H Pm, (43)

where

Pm ≡ I − Qm, (44)
(Qm)ij ≡ 1/m (45)

are m×m projection matrices satisfying Pm² = Pm, Qm² = Qm, PmQm = QmPm = 0, and Pm + Qm = I. (To avoid confusion, we are using boldface for n×n matrices and plain font for smaller matrices involving only the eigenvalues.)
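In code, equations (40)-(45) amount to a few small matrix products. A minimal sketch (ours, not the paper's) that checks the stated projector identities and that the four components add back up to the eigenvalue matrix:

```python
import numpy as np

# Arrange the n = l*m eigenvalues of a diagonal H into an l x m matrix Hm,
# then form the components of equations (40)-(43) using the projectors of
# equations (44)-(45).
l, m = 2, 3
rng = np.random.default_rng(0)
Hm = rng.standard_normal((l, m))         # l x m matrix of energy eigenvalues

Ql = np.full((l, l), 1.0 / l)            # (Q_l)_ij = 1/l
Qm = np.full((m, m), 1.0 / m)            # (Q_m)_ij = 1/m, eq (45)
Pl, Pm = np.eye(l) - Ql, np.eye(m) - Qm  # eq (44)

H0 = Ql @ Hm @ Qm                        # eq (40)
H1 = Pl @ Hm @ Qm                        # eq (41)
H2 = Ql @ Hm @ Pm                        # eq (42)
H3 = Pl @ Hm @ Pm                        # eq (43)

# P^2 = P, Q^2 = Q, PQ = QP = 0, P + Q = I  =>  the components sum to Hm.
assert np.allclose(Qm @ Qm, Qm) and np.allclose(Pm @ Pm, Pm)
assert np.allclose(Pm @ Qm, 0) and np.allclose(Pm + Qm, np.eye(m))
assert np.allclose(H0 + H1 + H2 + H3, Hm)
```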
L. Ultimate independence and the Quantum Zeno paradox
In Section III G, we began exploring the idea that if we divide the world into maximally independent parts (with minimal interaction Hamiltonians), then the observed object hierarchy from Figure 1 would emerge. Assuming that the HDC is correct, we have now found that this decomposition (factorization) into maximally independent parts can be performed in the energy eigenbasis of the total Hamiltonian. This means that all subsystem Hamiltonians and all interaction Hamiltonians commute with one another, corresponding to an essentially classical world where none of the quantum effects
7 In the large-n limit, it is rather intuitive that the HDC should be at least approximately true, because almost all the n(n − 1) ≈ n² off-diagonal degrees of freedom for H belong to H3, which is being minimized. Since we can restrict ourselves to bases where H1 and H2 are diagonal as above, there are only of order n off-diagonal degrees of freedom not contributing to H3 (contributing instead to H1 and H2). If we begin in a basis where H is diagonal and apply a unitary transformation adding off-diagonal elements, H3 thus generically increases.
FIG. 10: If the Hamiltonian of a system commutes with the interaction Hamiltonian ([H1,H3] = 0), then decoherence drives the system toward a time-independent state where nothing ever changes. The figure illustrates this for the Bloch sphere of a single qubit starting in a pure state and ending up in a fully mixed state ρ = I/2. More general initial states end up somewhere along the z-axis. Here H1 ∝ σz, generating a simple precession around the z-axis.
associated with non-commutativity manifest themselves! In contrast, many systems that we customarily refer to as objects in our classical world do not commute with their interaction Hamiltonians: for example, the Hamiltonian governing the dynamics of a baseball involves its momentum, which does not commute with the position-dependent potential energy due to external forces.
As emphasized by Zurek [19], states commuting with the interaction Hamiltonian form a pointer basis of classically observable states, playing an important role in understanding the emergence of a classical world. The fact that the independence principle automatically leads to commutativity with interaction Hamiltonians might therefore be taken as an encouraging indication that we are on the right track. However, whereas the pointer states in Zurek's examples evolve over time due to the system's own Hamiltonian H1, those in our independence-maximizing decomposition do not, because they commute also with H1. Indeed, the situation is even worse, as illustrated in Figure 10: any time-dependent system will evolve into a time-independent one, as environment-induced decoherence [20-23, 25, 27] drives it towards an eigenstate of the interaction Hamiltonian, i.e., an energy eigenstate.8
8 For a system with a finite environment, the entropy will eventually decrease again, causing the resumption of time-dependence, but this Poincaré recurrence time grows exponentially with environment size and is normally large enough that decoherence can be approximated as permanent.
The famous Quantum Zeno effect, whereby a system can cease to evolve in the limit where it is arbitrarily strongly coupled to its environment [28], thus has a stronger and more pernicious cousin, which we will term the Quantum Zeno Paradox or the Independence Paradox.

Quantum Zeno Paradox: If we decompose our universe into maximally independent objects, then all change grinds to a halt.
In summary, we have tried to understand the emergence of our observed semiclassical world, with its hierarchy of moving objects, by decomposing the world into maximally independent parts, but our attempts have failed dismally, producing merely a timeless world reminiscent of heat death. In Section II G, we saw that using the integration principle alone led to a similarly embarrassing failure, with no more than a quarter of a bit of integrated information possible. At least one more principle is therefore needed.
IV. DYNAMICS AND AUTONOMY
Let us now explore the implications of the dynamics principle from Table II, according to which a conscious system has the capacity to not only store information, but also to process it. As we just saw above, there is an interesting tension between this principle and the independence principle, whose Quantum Zeno Paradox gives the exact opposite: no dynamics and no information processing at all.

We will term the synthesis of these two competing principles the autonomy principle: a conscious system has substantial dynamics and independence. When exploring autonomous systems below, we can no longer study the state ρ and the Hamiltonian H separately, since their interplay is crucial. Indeed, we will see that there are interesting classes of states that provide substantial dynamics and near-perfect independence even when the interaction Hamiltonian H3 is not small. In other words, for certain preferred classes of states, the independence principle no longer pushes us to simply minimize H3 and face the Quantum Zeno Paradox.
A. Probability velocity and energy coherence
To obtain a quantitative measure of dynamics, let us first define the probability velocity v ≡ ṗ, where the probability vector p is given by pi ≡ ρii. In other words,

vk = ρ̇kk = −i[H, ρ]kk. (46)
Since v is basis-dependent, we are interested in finding the basis where

v² ≡ Σk vk² = Σk (ρ̇kk)² (47)

is maximized, i.e., the basis where the sum of squares of the diagonal elements of ρ̇ is maximal. It is easy to see that this basis is the eigenbasis of ρ̇:

v² = Σk (ρ̇kk)² = Σjk |ρ̇jk|² − Σj≠k |ρ̇jk|² = ||ρ̇||² − Σj≠k |ρ̇jk|² (48)

is clearly maximized in the eigenbasis, where all off-diagonal elements in the last term vanish, since the Hilbert-Schmidt norm ||ρ̇|| is the same in every basis; ||ρ̇||² = tr ρ̇², which is simply the sum of the squares of the eigenvalues of ρ̇.
Let us define the energy coherence

δH ≡ (1/√2) ||ρ̇|| = (1/√2) ||[H, ρ]|| = √(−tr{[H, ρ]²}/2) = √(tr[H²ρ² − HρHρ]). (49)
For a pure state ρ = |ψ⟩⟨ψ|, this definition implies that δH = ΔH, where ΔH is the energy uncertainty

ΔH = [⟨ψ|H²|ψ⟩ − ⟨ψ|H|ψ⟩²]^{1/2}, (50)

so we can think of δH as the coherent part of the energy uncertainty, i.e., as the part that is due to quantum rather than classical uncertainty.
Since ||ρ̇|| = ||[H, ρ]|| = √2 δH, we see that the maximum possible probability velocity v is simply

vmax = √2 δH, (51)

so we can equivalently use either of vmax or δH as convenient measures of quantum dynamics.9 Whimsically speaking, the dynamics principle thus implies that energy eigenstates are as unconscious as things come, and that if you know your own energy exactly, you're dead.
9 The fidelity between the state ψ(t) and the initial state ψ0 is defined as

F(t) ≡ ⟨ψ0|ψ(t)⟩, (52)

and it is easy to show that Ḟ(0) = 0 and F̈(0) = −(ΔH)², so the energy uncertainty is a good measure of dynamics in that it also determines the fidelity evolution to lowest order, for pure states. For a detailed review of related measures of dynamics/information processing capacity, see [7].
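The chain of identities (49)-(51) is easy to confirm numerically. The sketch below is ours, using the convention ρ̇ = −i[H, ρ]; it checks that δH equals the energy uncertainty ΔH for a pure state and that the squared velocities in the eigenbasis of ρ̇ sum to 2δH², i.e. vmax = √2 δH.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (M + M.conj().T) / 2                 # random Hermitean Hamiltonian

psi = rng.standard_normal(n) + 1j * rng.standard_normal(n)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())          # pure state rho = |psi><psi|

comm = H @ rho - rho @ H
dH = np.linalg.norm(comm) / np.sqrt(2)   # energy coherence, eq (49)

EH = np.vdot(psi, H @ psi).real          # <H>
EH2 = np.vdot(psi, H @ (H @ psi)).real   # <H^2>
assert np.isclose(dH, np.sqrt(EH2 - EH**2))   # delta H = Delta H for pure states

rhodot = -1j * comm                      # rho-dot is Hermitean
w = np.linalg.eigvalsh(rhodot)           # diagonal of rho-dot in its eigenbasis
assert np.isclose(np.sum(w**2), 2 * dH**2)    # v_max = sqrt(2) * delta H, eq (51)
```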
FIG. 11: Time-evolution of the Bloch vector tr(ρ1σ) for a single-qubit subsystem. We saw how minimizing H3 leads to a static state with no dynamics, such as the left example. Maximizing δH, on the other hand, produces extremely simple dynamics, such as the right example. Reducing δH by a modest factor of order unity can allow complex and chaotic dynamics (center); shown here is a 2-qubit system where the second qubit is traced out.
Although it is not obvious from their definitions, the quantities vmax and δH are independent of time (even though ρ generally evolves). This is easily seen in the energy eigenbasis, where

iρ̇mn = [H, ρ]mn = ρmn(Em − En), (53)

where the energies En are the eigenvalues of H. In this basis, ρ(t) = e^{−iHt} ρ(0) e^{iHt} simplifies to

ρ(t)mn = ρ(0)mn e^{−i(Em−En)t}. (54)
This means that in the energy eigenbasis, the probabilities pn ≡ ρnn are invariant over time. These quantities constitute the energy spectral density for the state:

pn = ⟨En|ρ|En⟩. (55)

In the energy eigenbasis, equation (50) reduces to

δH² = ΔH² = Σn pn En² − (Σn pn En)², (56)
which is time-invariant because the spectral density pn is. For general states, equation (49) simplifies to

δH² = Σmn |ρmn|² En(En − Em). (57)

This is time-independent because equation (54) shows that ρmn changes merely by a phase factor, leaving |ρmn| invariant. In other words, when a quantum state evolves unitarily in the Hilbert-Schmidt vector space, both the position vector ρ and the velocity vector ρ̇ retain their lengths: both ||ρ|| and ||ρ̇|| remain invariant over time.
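This invariance of ||ρ|| and ||ρ̇|| can also be checked directly; the sketch below (ours) evolves a random mixed state with e^{−iHt} built from the eigendecomposition of H:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (M + M.conj().T) / 2                         # random Hermitean H

A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
rho0 = A @ A.conj().T
rho0 /= np.trace(rho0).real                      # random density matrix

E, V = np.linalg.eigh(H)

def evolve(rho, t):
    """rho(t) = exp(-iHt) rho(0) exp(+iHt), built in the eigenbasis of H."""
    U = V @ np.diag(np.exp(-1j * E * t)) @ V.conj().T
    return U @ rho @ U.conj().T

rhodot0 = -1j * (H @ rho0 - rho0 @ H)
for t in (0.7, 3.1, 10.0):
    rho = evolve(rho0, t)
    rhodot = -1j * (H @ rho - rho @ H)
    # Both the position vector rho and the velocity vector rho-dot keep
    # their Hilbert-Schmidt lengths under unitary evolution:
    assert np.isclose(np.linalg.norm(rho), np.linalg.norm(rho0))
    assert np.isclose(np.linalg.norm(rhodot), np.linalg.norm(rhodot0))
```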
B. Dynamics versus complexity
Our results above show that if all we are interested in is maximizing the maximal probability velocity, then we should find the two most widely separated eigenvalues of H, Emin and Emax, and choose a pure state that involves a coherent superposition of the two:

|ψ⟩ = c1|Emin⟩ + c2|Emax⟩, (58)

where |c1| = |c2| = 1/√2. This gives δH = (Emax − Emin)/2, the largest possible value, but produces an extremely simple and boring solution ρ(t). Since the spectral density pn = 0 except for these two energies, the dynamics is effectively that of a 2-state system (a single qubit) no matter how large the dimensionality of H is, corresponding to a simple periodic solution with frequency ω = Emax − Emin (a circular trajectory in the Bloch sphere as in the right panel of Figure 11). This violates the dynamics principle as defined in Table II, since no substantial information processing capacity exists: the system is simply performing the trivial computation that flips a single bit repeatedly.
To perform interesting computations, the system clearly needs to exploit a significant part of its energy spectrum. As can be seen from equation (54), if the eigenvalue differences are irrational multiples of one another, then the time evolution will never repeat, and ρ will eventually evolve through all parts of Hilbert space allowed by the invariants |ρmn| = |⟨Em|ρ|En⟩|. The reduction of δH required to transition from simple periodic motion
FIG. 12: Schematic representation of the time-evolution of the density matrix ρij for a highly autonomous subsystem. ρij ≈ 0 except for a single region around the diagonal (red/grey dot), and this region slides along the diagonal under the influence of the subsystem Hamiltonian H1. Any ρij-elements far from the diagonal rapidly approach zero because of environment-decoherence caused by the interaction Hamiltonian H3.
to such complex aperiodic motion is quite modest. For example, if the eigenvalues are roughly equispaced, then changing the spectral density pn from having all weight at the two endpoints to having approximately equal weight for all eigenvalues will only reduce the energy coherence δH by about a factor of √3, since the standard deviation of a uniform distribution is √3 times smaller than its half-width.
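This factor is easy to reproduce from equation (56). A quick check (ours), comparing a spectral density concentrated on the two endpoints with a uniform one on an equispaced spectrum:

```python
import numpy as np

n = 100
E = np.arange(n) - (n - 1) / 2            # equispaced spectrum, h-bar*omega = 1

def delta_H(p):
    """Energy coherence computed from a spectral density p_n, as in eq (56)."""
    return np.sqrt(np.sum(p * E**2) - np.sum(p * E)**2)

p_end = np.zeros(n)
p_end[[0, -1]] = 0.5                      # all weight on the two endpoints
p_uni = np.full(n, 1.0 / n)               # equal weight on every eigenvalue

ratio = delta_H(p_end) / delta_H(p_uni)   # approaches sqrt(3) ~ 1.73 as n grows
assert abs(ratio - np.sqrt(3)) < 0.05
```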
C. Highly autonomous systems: sliding along the diagonal
What combinations of H, ρ and factorization produce highly autonomous systems? A broad and interesting class corresponds to the macroscopic objects around us that move classically to an excellent approximation.

The states that are most robust toward environment-induced decoherence are those that approximately commute with the interaction Hamiltonian [22]. As a simple but important example, let us consider an interaction Hamiltonian of the factorizable form

H3 = A ⊗ B, (59)

and work in a basis where the interaction term A is diagonal. If ρ1 is approximately diagonal in this basis,
then H3 has little effect on the dynamics, which becomes dominated by the internal subsystem Hamiltonian H1. The Quantum Zeno Paradox we encountered in Section III L involved a situation where H1 was also diagonal in this same basis, so that we ended up with no dynamics. As we will illustrate with examples below, classically moving objects in a sense constitute the opposite limit: the commutator ρ̇1 = −i[H1, ρ1] is essentially as large as possible instead of as small as possible, and the state continually evades decoherence by concentrating around a single point that slides along the diagonal, as illustrated in Figure 12. Decoherence rapidly suppresses off-diagonal elements far from this diagonal, but leaves the diagonal elements completely unaffected, so there exists a low-decoherence band around the diagonal. Suppose, for instance, that our subsystem is the center-of-mass position x of a macroscopic object experiencing a position-dependent potential V(x) caused by coupling to the environment, so that Figure 12 represents the density matrix ρ1(x, x′) in the position basis. If the potential V(x) has a flat (V = 0) bottom of width L, then ρ1(x, x′) will be completely unaffected by decoherence for the band |x − x′| < L. For a generic smooth potential V, the decoherence suppression of off-diagonal elements grows only quadratically with the distance |x − x′| from the diagonal [21, 26], again making decoherence much slower than the internal dynamics in a narrow diagonal band.
As a specific example of this highly autonomous type, let us consider a subsystem with a uniformly spaced energy spectrum. Specifically, consider an n-dimensional Hilbert space and a Hamiltonian with spectrum

Ek = [k − (n − 1)/2] ħω = kħω + E0, (60)

k = 0, 1, ..., n − 1. We will often set ħω = 1 for simplicity. For example, n = 2 gives the spectrum {−1/2, 1/2}, like the Pauli matrices divided by two, n = 5 gives {−2, −1, 0, 1, 2}, and n → ∞ gives the simple harmonic oscillator (since the zero-point energy is physically irrelevant, we have chosen it so that tr H = Σ Ek = 0, whereas the customary choice for the harmonic oscillator is such that the ground state energy is E0 = ħω/2).
If we want to, we can define the familiar position and momentum operators x and p, and interpret this system as a harmonic oscillator. However, the probability velocity v is not maximized in either the position or the momentum basis, except twice per oscillation: when the oscillator has only kinetic energy, v is maximized in the x-basis, and when it has only potential energy, v is maximized in the p-basis. If we consider the Wigner function W(x, p), which simply rotates uniformly with frequency ω, it becomes clear that the observable which is always changing with the maximal probability velocity is instead the phase, the Fourier dual of the energy. Let us therefore define the phase operator

φ ≡ FHF†, (61)
where F is the unitary Fourier matrix.
Please remember that none of the systems H that we consider have any a priori physical interpretation; rather, the ultimate goal of the physics-from-scratch program is to derive any interpretation from the mathematics alone. Generally, any thus emergent interpretation of a subsystem will depend on its interactions with other systems. Since we have not yet introduced any interactions for our subsystem, we are free to interpret it in whichever way is convenient. In this spirit, an equivalent and sometimes more convenient way to interpret our Hamiltonian from equation (60) is as a massless one-dimensional scalar particle, for which the momentum equals the energy, so the momentum operator is p = H. If we interpret the particle as existing in a discrete space with n points and a toroidal topology (which we can think of as n equispaced points on a ring), then the position operator is related to the momentum operator by a discrete Fourier transform:

x = FpF†, Fjk ≡ (1/√n) e^{2πijk/n}. (62)

Comparing equations (61) and (62), we see that x = φ. Since F is unitary, the operators H, p, x and φ all have the same spectrum: the evenly spaced grid of equation (60).
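Equations (61)-(62) can be made concrete with a small numerical check (ours): build the discrete Fourier matrix, verify it is unitary, and confirm that x = FpF† has the same evenly spaced spectrum as H.

```python
import numpy as np

n = 8
k = np.arange(n)
E = k - (n - 1) / 2                        # spectrum of eq (60), h-bar*omega = 1
p = np.diag(E.astype(complex))             # momentum operator p = H, diagonal

jj, kk = np.meshgrid(k, k, indexing='ij')
F = np.exp(2j * np.pi * jj * kk / n) / np.sqrt(n)   # Fourier matrix, eq (62)
assert np.allclose(F @ F.conj().T, np.eye(n))       # F is unitary

x = F @ p @ F.conj().T     # position operator; equals phi = F H F-dagger, p = H
assert np.allclose(np.linalg.eigvalsh(x), E)        # same evenly spaced spectrum
```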
As illustrated in Figure 13, the time-evolution generated by H has a very simple geometric interpretation in the space spanned by the position eigenstates |xk⟩, k = 1, ..., n: the space is simply rotating with frequency ω around a vector that is the sum of all the position eigenvectors, so after a time t = 2π/nω, a state |ψ(0)⟩ = |xk⟩ has been rotated such that it equals the next eigenvector: |ψ(t)⟩ = |xk+1⟩, where the addition is modulo n. This means that the system has period T ≡ 2π/ω, and that |ψ⟩ rotates through each of the n basis vectors during each period.
Let us now quantify the autonomy of this system, starting with the dynamics. Since a position eigenstate is a Dirac delta function in position space, it is a plane wave in momentum space, and hence in energy space, since H = p. This means that the spectral density is pn = 1/n for a position eigenstate. Substituting equation (60) into equation (56) gives an energy coherence

δH = ħω √[(n² − 1)/12]. (63)

For comparison,

||H|| = (Σ_{k=0}^{n−1} Ek²)^{1/2} = ħω √[n(n² − 1)/12] = √n δH. (64)
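Both (63) and (64) follow from the flat spectral density pn = 1/n of a position eigenstate; a quick numerical confirmation (ours):

```python
import numpy as np

n = 16
E = np.arange(n) - (n - 1) / 2             # eq (60) with h-bar*omega = 1
p = np.full(n, 1.0 / n)                    # flat spectral density, p_n = 1/n

dH = np.sqrt(np.sum(p * E**2) - np.sum(p * E)**2)   # eq (56)
assert np.isclose(dH, np.sqrt((n**2 - 1) / 12))     # eq (63)

normH = np.sqrt(np.sum(E**2))                       # ||H||, eq (64)
assert np.isclose(normH, np.sqrt(n) * dH)           # ||H|| = sqrt(n) * delta H
```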
Let us now turn to quantifying independence and decoherence. The inner product between the unit vector |ψ(0)⟩ and the vector |ψ(t)⟩ ≡ e^{−iHt}|ψ(0)⟩ into which it evolves after a time t is

fn(θ) ≡ ⟨ψ|e^{−iHt}|ψ⟩ = (1/n) Σ_{k=0}^{n−1} e^{−iEkt} = (1/n) e^{iθ(n−1)/2} Σ_{k=0}^{n−1} e^{−ikθ} = (1/n) e^{iθ(n−1)/2} (1 − e^{−inθ})/(1 − e^{−iθ}) = sin(nθ/2)/[n sin(θ/2)], (65)
where θ ≡ ωt. This inner product fn is plotted in Figure 14, and is seen to be a sharply peaked even function satisfying fn(0) = 1 and fn(2πk/n) = 0 for k = 1, ..., n − 1, and exhibiting one small oscillation between each pair of these zeros. The angle cos⁻¹ fn(θ) between an initial vector and its time evolution thus grows rapidly from 0° to 90°, then oscillates close to 90° until returning to 0° after a full period T. An initial state |ψ(0)⟩ = |xk⟩ therefore evolves as
j(t) = fn(t 2pi[j k]/n)in t
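The properties of fn quoted above — fn(0) = 1, zeros at θ = 2πk/n, and the closed form of equation (65) — can be verified numerically (a sketch, ours):

```python
import numpy as np

n = 7
E = np.arange(n) - (n - 1) / 2             # evenly spaced spectrum, eq (60)

def f_sum(theta):
    """Overlap <psi(0)|psi(t)> from the spectral sum in eq (65), theta = omega*t."""
    return np.sum(np.exp(-1j * E * theta)) / n

def f_closed(theta):
    """Closed form sin(n*theta/2) / (n*sin(theta/2))."""
    return np.sin(n * theta / 2) / (n * np.sin(theta / 2))

assert np.isclose(f_sum(1e-9).real, 1.0)            # f_n(0) = 1
for k in range(1, n):
    assert abs(f_sum(2 * np.pi * k / n)) < 1e-12    # zeros at theta = 2*pi*k/n
for theta in (0.3, 1.1, 2.5):
    assert np.isclose(f_sum(theta).real, f_closed(theta))
```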