arXiv:1305.5278v1 [quant-ph] 22 May 2013


The second laws of quantum thermodynamics

Fernando G.S.L. Brandao,1 Michał Horodecki,2 Nelly Huei Ying Ng,3 Jonathan Oppenheim,4, 3 and Stephanie Wehner3, 5

1University College London, Department of Computer Science

2IFTIA, University of Gdansk, 80-952 Gdansk, Poland

3Centre for Quantum Technologies, National University of Singapore, 3 Science Drive 2, 117543 Singapore

4University College of London, Department of Physics & Astronomy, London, WC1E 6BT and London Interdisciplinary Network for Quantum Science

5School of Computing, National University of Singapore, 13 Computing Drive, 117417 Singapore

The second law of thermodynamics tells us which state transformations are so statistically unlikely that they are effectively forbidden. Its original formulation, due to Clausius, states that “Heat can never pass from a colder to a warmer body without some other change, connected therewith, occurring at the same time.” [1] The second law applies to systems composed of many particles; however, we are seeing that one can make sense of thermodynamics in the regime where we only have a small number of particles interacting with a heat bath, or when we have highly correlated systems and wish to make non-statistical statements about them [2–9]. Is there a second law of thermodynamics in this regime? Here, we find that for processes which are cyclic or very close to cyclic, the second law for microscopic or highly correlated systems takes on a very different form than it does at the macroscopic scale, imposing not just one constraint on what state transformations are possible, but an entire family of constraints. In particular, we find that the Renyi relative entropy distances to the equilibrium state can never increase. We further find that there are three regimes which determine which family of second laws governs state transitions, depending on how cyclic the process is. In one regime one can cause an apparent violation of the usual second law, through a process of embezzling work from a large system which remains arbitrarily close to its original state.

In attempting to apply the Clausius statement of the second law to the microscopic or quantum scale, we immediately run into a problem, because it talks about cyclic processes in which there is no other change occurring at the same time, and at this scale it is impossible to design a process in which there is no change, however slight, in our devices and heat engines. Interpreted strictly, the Clausius statement of the second law applies to situations which never occur in nature. The same holds true for other versions of the second law, such as the Kelvin-Planck statement, where one also talks about cyclic processes in which all other objects besides the system of interest are returned to their original state. At the macroscopic scale, the fact that a process is only approximately cyclic has generally been assumed to be enough to guarantee the second law. Here, we show that this is not the case in the microscopic regime, and we therefore need to talk about “how cyclic” a process is when stating the second law.

For thermodynamics at the macroscopic scale, a system in state ρ can be transformed into state ρ′ provided that the free energy goes down, where the free energy for a state ρ is

$$F(\rho) = \langle E(\rho)\rangle - T S(\rho) \tag{1}$$

with T the temperature of the ambient heat bath that surrounds the system, S(ρ) the entropy of the system, and ⟨E⟩ its average energy. This can be understood as a version of the second law, where we also use the fact that the total energy of the system and heat bath must be conserved. This criterion governing state transitions is valid if the system is composed of many particles and there are no long-range correlations. In the case of microscopic, quantum or highly correlated systems, a criterion for state transitions of a total system was proven to be thermo-majorisation [5] (see Figure 1). This criterion has been conjectured [10] and claimed [7] to be a second law. However, here we will see that if elevated to such high status, it can be violated. Namely, we will give examples where ρ → ρ′ would violate the thermo-majorisation criterion, but nonetheless the transition is possible via a cyclic process in which a working body σ (an ancilla) is returned to its original state.

This phenomenon is related to entanglement catalysis [11], where it can be shown that some forbidden transitions become possible if we can use an additional system σ as a catalyst, i.e. we may have ρ ↛ ρ′ and yet ρ ⊗ σ → ρ′ ⊗ σ. In the case of thermodynamics, the catalyst σ may be thought of as a working body or heat engine which undergoes a cyclic process and is returned to its original state. In deciding whether one can transform ρ into ρ′, one therefore needs to ask whether there exists a working body or other ancilla σ for which ρ ⊗ σ → ρ′ ⊗ σ. Thus, thermo-majorisation should only be applied to the total resources, including catalysts and working bodies, and not to the system of interest alone. In the case of entanglement theory, and when the catalyst is returned in exactly the same state, the criteria for when one pure state may be transformed into another have been found [12, 13]; they are called trumping conditions.


FIG. 1: The thermo-majorisation criterion is as follows. Consider probabilities p(E, g) of the initial system ρ to be in the g-th state of energy E. Now let us put $p(E, g)e^{\beta E}$ in decreasing order, $p(E_1, g_1)e^{\beta E_1} \ge p(E_2, g_2)e^{\beta E_2} \ge p(E_3, g_3)e^{\beta E_3} \ge \dots$; we say that the eigenvalues are β-ordered. We can do the same for system σ, i.e. $e^{\beta E_1}q(E_1, g_1) \ge e^{\beta E_2}q(E_2, g_2) \ge e^{\beta E_3}q(E_3, g_3) \ge \dots$. Then the condition which determines whether we can transform ρ into ρ′ is depicted in the above figure. Namely, for any state, we construct a curve with points k given by $\left\{\sum_{i=1}^{k} e^{-\beta E_i}/Z,\ \sum_{i=1}^{k} p_i\right\}$. Then a thermodynamical transition from ρ to ρ′ is possible if and only if the curve of ρ lies above the curve of ρ′. One can make a previously impossible transition possible by adding work in the form of the pure state $\psi_W$, which will scale each point by an amount $e^{-\beta W}$ horizontally [5]. We can define G(ρ) to be the new probability distribution defined by probabilities $p(E_1, g_1)e^{\beta E_1}$ with multiplicity proportional to $e^{-\beta E_1}$.
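As a rough illustration of the construction described in the caption of Fig. 1, the following Python sketch builds the thermo-majorisation curve of a state that is diagonal in the energy basis and compares two such curves. It assumes NumPy, lists every energy eigenstate as its own entry (degenerate levels simply repeat an energy), and uses the hypothetical helper names `thermo_curve` and `thermo_majorises`, which are ours and not taken from [5].

```python
import numpy as np

def thermo_curve(p, E, beta):
    """Return the (x, y) vertices of the thermo-majorisation curve of p."""
    p = np.asarray(p, dtype=float)
    E = np.asarray(E, dtype=float)
    gibbs_weights = np.exp(-beta * E)
    Z = gibbs_weights.sum()
    # beta-ordering: sort levels so that p_i * exp(beta * E_i) is non-increasing
    order = np.argsort(-(p * np.exp(beta * E)), kind="stable")
    x = np.concatenate(([0.0], np.cumsum(gibbs_weights[order]) / Z))
    y = np.concatenate(([0.0], np.cumsum(p[order])))
    return x, y

def thermo_majorises(p, p_prime, E, beta, tol=1e-12):
    """True if the curve of p lies on or above the curve of p_prime."""
    x1, y1 = thermo_curve(p, E, beta)
    x2, y2 = thermo_curve(p_prime, E, beta)
    # Both curves are piecewise linear, so comparing at all vertices suffices.
    grid = np.union1d(x1, x2)
    return bool(np.all(np.interp(grid, x1, y1) >= np.interp(grid, x2, y2) - tol))

# Two-level example (illustrative numbers): excited state versus Gibbs state.
E = [0.0, 1.0]
beta = 1.0
gibbs = np.exp(-beta * np.array(E))
gibbs /= gibbs.sum()
excited = [0.0, 1.0]
print(thermo_majorises(excited, gibbs, E, beta))
print(thermo_majorises(gibbs, excited, E, beta))
```

In this example the curve of the Gibbs state is the straight diagonal, so every state lies on or above it, while the reverse comparison fails for the excited state.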

1. A family of second laws

Here we consider all possible cyclic thermodynamical processes, and show that allowing them affects the transition laws. Rather than a single free energy which determines which transitions are possible, we find necessary and sufficient conditions for thermodynamic transitions which form not just one, but a family of second laws. Namely, for states block diagonal in the energy basis:

In the presence of a single heat bath, the Renyi relative entropies Dα(ρ‖ρβ) between the state ρ of the system and the equilibrium state ρβ do not increase for any α ≥ 0. That is, ∀α ≥ 0, Dα(ρ‖ρβ) ≥ Dα(ρ′‖ρβ), where ρ′ is the resulting state.

The Renyi divergences Dα(ρ‖ρβ) are defined as

$$D_\alpha(\rho\|\rho_\beta) = \frac{1}{\alpha-1}\log\sum_i p_i^{\alpha} q_i^{1-\alpha} \tag{2}$$

where pi, qi are the eigenvalues of ρ and ρβ, respectively. We say that the Dα(ρ‖ρβ) are monotones. This set of limitations is less stringent than thermo-majorisation. Whenever the relative entropies of one state are all greater than those of another state, one can transform the former into the latter. We prove this in Sections D and C of the Appendix. Note that (2) establishes a continuous family of conditions, one for each value of α, and that we have been able to derive conditions without the usual error terms associated with such single-shot quantities [14]. One is often interested in larger systems where such error terms are insignificant. In such a case, we find that for any distribution p there exist two so-called smoothed distributions that are very close to p, and in terms of these smoothed distributions the infinite set of conditions collapses to just two conditions in terms of the two free energies D0 and D∞ found in [5] (see Section A 3).
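The family of conditions can be checked numerically for states that are diagonal in the energy basis. The sketch below is only an illustration under stated assumptions: it uses NumPy, natural logarithms, a finite grid of α values as a stand-in for the continuum α ≥ 0, and hypothetical function names (`renyi_divergence`, `gibbs_state`, `second_laws_hold`) that do not come from the paper.

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """D_alpha(p||q) of Eq. (2) in nats, for alpha >= 0 and q of full support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    s = p > 0  # only the support of p contributes
    if alpha == 0:
        return float(-np.log(q[s].sum()))
    if np.isclose(alpha, 1.0):
        return float(np.sum(p[s] * (np.log(p[s]) - np.log(q[s]))))
    if np.isinf(alpha):
        return float(np.log(np.max(p[s] / q[s])))
    return float(np.log(np.sum(p[s] ** alpha * q[s] ** (1 - alpha))) / (alpha - 1))

def gibbs_state(E, beta):
    w = np.exp(-beta * np.asarray(E, dtype=float))
    return w / w.sum()

def second_laws_hold(p, p_prime, E, beta, alphas, tol=1e-12):
    """Check D_alpha(p||gibbs) >= D_alpha(p_prime||gibbs) on a grid of alphas."""
    g = gibbs_state(E, beta)
    return all(
        renyi_divergence(p, g, a) >= renyi_divergence(p_prime, g, a) - tol
        for a in alphas
    )

# Illustrative two-level system; the grid of alphas is an assumption.
E = [0.0, 1.0]
beta = 1.0
alphas = np.concatenate(([0.0, 1.0, np.inf], np.logspace(-2, 2, 81)))
g = gibbs_state(E, beta)
print(second_laws_hold([0.0, 1.0], g, E, beta, alphas))  # excited -> Gibbs
print(second_laws_hold(g, [0.0, 1.0], E, beta, alphas))  # Gibbs -> excited
```

A finite grid can only falsify a transition, never certify it for all α, so a `True` result here should be read as "no violation found on the grid".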

For α → 1, Dα(ρ‖ρβ) is equal to the difference between the ordinary free energy F(ρ) and the free energy of the Gibbs state F(ρβ). Hence our conditions include the ordinary second law (combined with energy conservation), and we thus see that it is merely one of many constraints on thermodynamical state transitions. In the macroscopic regime, and for systems which are not highly correlated, Dα(ρ‖ρβ) ≈ D1(ρ‖ρβ) for all α, which explains why the single constraint given by the usual second law is more or less adequate in this limit. For α = 0, D0(ρ‖ρβ) = Dmin(ρ‖ρβ) is known as the min-relative entropy, which we previously found to quantify the maximal amount of distillable work extractable from a system in contact with a reservoir under all thermal operations [5] or in a model of alternating adiabatic and isothermal operations [6]. In the case of trivial Hamiltonians, this quantity is equivalent to that found previously in [3]. Thus, we find that although generic state transitions are affected by catalysts, the above result on distillable work by going from ρ to ρβ is not. Likewise, the reverse process, the so-called work of formation [5] in creating ρ, corresponds to α → ∞. We thus see that the two free energies proposed in [5] are special cases of our family of conditions and they hold even in the presence of catalysts.
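The α → 1 statement can be made concrete with a small numerical check. Assuming units with Boltzmann's constant k = 1, natural logarithms, and a diagonal state, one expects kT·D1(ρ‖ρβ) = F(ρ) − F(ρβ); the factor kT is the unit conversion implicit in the text. The energies, temperature and state below are arbitrary illustrative numbers, and the function names are ours.

```python
import numpy as np

def free_energy(p, E, T):
    """F = <E> - T*S with S the Shannon entropy in nats (k = 1)."""
    p = np.asarray(p, dtype=float)
    E = np.asarray(E, dtype=float)
    s = p > 0
    entropy = -np.sum(p[s] * np.log(p[s]))
    return float(np.dot(p, E) - T * entropy)

def relative_entropy(p, q):
    """D_1(p||q) in nats; q is assumed to have full support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    s = p > 0
    return float(np.sum(p[s] * (np.log(p[s]) - np.log(q[s]))))

E = np.array([0.0, 0.7, 1.5])   # illustrative energy levels
T = 0.8                         # illustrative temperature
gibbs = np.exp(-E / T)
gibbs /= gibbs.sum()
p = np.array([0.2, 0.5, 0.3])   # illustrative diagonal state

lhs = T * relative_entropy(p, gibbs)
rhs = free_energy(p, E, T) - free_energy(gibbs, E, T)
print(lhs, rhs)  # the two numbers should agree up to rounding
```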

However, in other cases we find that one can distil more work than would be obtainable without considering an ancillary system which is used as a catalyst. For example, we find that by using an ancillary system, one can erase or reset a memory register at a lower work cost than previously known. In particular, this is the case when resetting a memory register to a pure state while retaining correlations with a reference system [4, 9]. Classically, resetting a memory requires work, but the authors of [4] found that due to entanglement, there were cases where a memory could be reset at a cost of a negative amount of work (i.e. work could actually be extracted while the memory was reset). Here, we find that even more work can be extracted during a memory reset. In general, we find that more work can be extracted in a cyclic process, and we derive the optimal amount of extractable work in such a case, providing an operational interpretation of a difference of Renyi entropies, and an interpretation for when this quantity is negative.

2. Work distance

Given the monotonicity of Dα(ρ‖ρβ) we may easily compute the maximum amount of work which can be extracted when going from a system in state ρ to one in state ρ′. Namely, in [5] we introduced the notion of a work bit, or wit, which starts off in state |0⟩ and gets raised or lowered to a state |W⟩ with energy W. This corresponds to extracting an amount of work W if W is positive, or performing work if W is negative. From our second laws, we know that a transition is possible if and only if

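$$D_\alpha(\rho\otimes|0\rangle\langle 0|\,\|\,\rho_\beta) \ge D_\alpha(\rho'\otimes|W\rangle\langle W|\,\|\,\rho_\beta) \quad \forall\alpha\ge 0 \tag{3}$$

which implies (see Section E of the Appendix) that W ≤ D(ρ ≻ ρ′) is achievable, where

$$D(\rho\succ\rho') := kT\,\inf_{\alpha}\left[D_\alpha(\rho\|\rho_\beta) - D_\alpha(\rho'\|\rho_\beta)\right]. \tag{4}$$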

We thus see that the Dα(ρ‖ρβ) are very much like free energies, not only in the sense that they are monotones, but also in the sense that the amount of work is given by the difference of the function between the initial and final state (albeit for the minimal one). The quantity on the right hand side of Equation (4) can also be thought of as a distance measure between states, as was done with the thermo-majorisation criterion in [7], and we will henceforth refer to it as the work distance from ρ to ρ′.
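The work distance of Equation (4) can be estimated numerically for diagonal states by replacing the infimum over α with a minimum over a finite grid. This is a sketch under assumptions: k = 1, natural logarithms, a hand-picked grid of α values, and hypothetical function names. The grid minimum is an upper estimate of the true infimum.

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """D_alpha(p||q) in nats for alpha >= 0; q is assumed to have full support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    s = p > 0
    if alpha == 0:
        return float(-np.log(q[s].sum()))
    if np.isclose(alpha, 1.0):
        return float(np.sum(p[s] * (np.log(p[s]) - np.log(q[s]))))
    if np.isinf(alpha):
        return float(np.log(np.max(p[s] / q[s])))
    return float(np.log(np.sum(p[s] ** alpha * q[s] ** (1 - alpha))) / (alpha - 1))

def work_distance(p, p_prime, E, T, alphas):
    """Grid estimate of kT * inf_alpha [D_alpha(p||g) - D_alpha(p'||g)], k = 1."""
    g = np.exp(-np.asarray(E, dtype=float) / T)
    g /= g.sum()
    return T * min(
        renyi_divergence(p, g, a) - renyi_divergence(p_prime, g, a) for a in alphas
    )

E = [0.0, 1.0]
T = 1.0
alphas = np.concatenate(([0.0, 1.0, np.inf], np.logspace(-2, 2, 81)))
gibbs = np.exp(-np.array(E) / T)
gibbs /= gibbs.sum()

# Relaxing the excited level to the Gibbs state: every D_alpha of the excited
# state equals -log(gibbs[1]), so the estimate should be E_1 + T*log(Z).
W = work_distance([0.0, 1.0], gibbs, E, T, alphas)
print(W, 1.0 + T * np.log(1.0 + np.exp(-1.0)))
```

A negative value of the estimate would indicate that work has to be invested for the transition rather than extracted.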

These results are derived by considering thermodynamics as a resource theory [2, 5, 15, 16], as is done in the context of entanglement [17–20] and elsewhere: we are allowed to implement a class of operations, and we then quantify as resources whatever cannot be created under that class of operations. For thermodynamics, various classes of operations have been considered in the micro-regime [5, 6, 15, 21–28]. In particular, we consider thermal operations [5, 16], where we allow the system of interest ρ to be coupled to a thermal reservoir ρR in the Gibbs state at temperature T, and to the work bit, and we allow arbitrary unitaries between the system, working body and reservoir which conserve energy. Energy conservation is important, because we need to account for all sources of energy which might get added to our system. This paradigm is equivalent to ones in which we allow interaction Hamiltonians rather than unitaries, or where we allow for a Hamiltonian which changes with time, provided that all ancillas are carefully accounted for [16].

3. Approximately cyclic processes

In this article we consider one natural additional ingredient: we allow a working body (or ancilla system) C in state $\rho_C^{\mathrm{in}}$ which must be returned in its initial form. That is, we demand that the thermodynamic process be cyclic, in the sense that the working body is returned in state $\rho_C^{\mathrm{out}}$, which is approximately equal to its original state $\rho_C^{\mathrm{in}}$. In essence, how cyclic a process is can be understood in terms of how good an approximation $\rho_C^{\mathrm{in}}$ is to $\rho_C^{\mathrm{out}}$. Depending on the desired approximation, i.e., depending on how cyclic we demand the process to be, we find several different regimes of second laws.

The simplest case is where $\rho_C^{\mathrm{in}} = \rho_C^{\mathrm{out}}$, that is, the process is perfectly cyclic and the catalyst is restored to its original form. In this setting, we have the second law as stated above. But no real process is perfectly cyclic, and so it is important to consider the case where $\rho_C^{\mathrm{in}} \approx \rho_C^{\mathrm{out}}$. This requires us to derive approximate transformation conditions, which we expect to also find application in entanglement theory, and which are contained in Section E of the Appendix. We find that the form the second law takes when the process is not perfectly cyclic depends highly on our desired approximation guarantee: there are three separate regimes, which are quantified by how cyclic a process is, in terms of how close $\rho_C^{\mathrm{in}}$ is to $\rho_C^{\mathrm{out}}$.


In the first regime, we demand that the change in the working body through a cycle is small, in the sense that $D(\rho_C^{\mathrm{out}} \succ \rho_C^{\mathrm{in}}) \le \epsilon$. In other words, any change in the working body could be corrected by applying a small amount of work. In this case, we recover the second laws as stated above.

The second regime is when the change in the working body is extensive, in the sense that the change in the system per particle is constant, i.e. $\|\rho_C^{\mathrm{in}} - \rho_C^{\mathrm{out}}\|_1 \le \epsilon/\log N$, where N is the dimension of the catalyst. In this case, we retrieve the standard second law. The ordinary free energy continues to govern whether a thermodynamical transition is possible, while the Renyi divergences do not. We thus see that the ordinary second law can arise either in the macroscopic limit, or if we allow processes which deviate from being cyclic in a manner which is constant per number of particles in the working body. In order to achieve the transitions governed by the ordinary second law, we require a catalyst of a particular form, and this is detailed in Section E 3 of the Appendix.

Finally, we consider the regime where we simply demand that the process is close to cyclic regardless of the size of the ancillary system, i.e. $\|\rho_C^{\mathrm{in}} - \rho_C^{\mathrm{out}}\|_1 \le \epsilon$. Since ε can be arbitrarily small, one would imagine that for such an approximately cyclic process one recovers a second law of some sort. Nonetheless, we find that for any ε, no matter how small, one can construct a working body and a cyclic process such that one can pump heat from a cold reservoir to a hot reservoir, in violation of the Clausius statement of the second law. In fact, we can make arbitrary state transformations by taking the size of the working body to be so large that work can be extracted from a single heat bath, while barely modifying the state of the working body. This is related to a phenomenon in entanglement theory known as embezzling [29]. In particular, for any desired approximation ε there exists a dimension d such that the catalyst $\rho_C^{\mathrm{in}} = \sum_{j=1}^{d} \frac{1}{j}|j\rangle\langle j|$ allows us to transform any initial state ρ to any final state ρ′ such that $\rho_C^{\mathrm{in}} \approx_\epsilon \rho_C^{\mathrm{out}}$.
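One simple way to get a feel for embezzling, which is not necessarily the construction used in the Appendix, is the classical trivial-Hamiltonian case, where permutations of the joint distribution are free. The sketch below makes several assumptions of ours: the catalyst weights 1/j are normalised to a probability distribution, the system is a single bit, and the joint state catalyst ⊗ (1/2, 1/2) is permuted so that its entries line up, largest to smallest, with those of catalyst ⊗ (1, 0). The remaining statistical distance measures how far the joint output is from "bit reset, catalyst unchanged".

```python
import numpy as np

def embezzling_error(d, target=(0.5, 0.5)):
    """Statistical distance between sorted catalyst (x) target and sorted
    catalyst (x) pure, for a catalyst with weights proportional to 1/j."""
    catalyst = 1.0 / np.arange(1, d + 1)
    catalyst /= catalyst.sum()  # normalisation is our assumption
    target = np.asarray(target, dtype=float)
    pure = np.zeros(len(target))
    pure[0] = 1.0
    start = np.sort(np.kron(catalyst, target))[::-1]
    goal = np.sort(np.kron(catalyst, pure))[::-1]
    return 0.5 * float(np.abs(start - goal).sum())

errors = [embezzling_error(d) for d in (10, 100, 10_000)]
print(errors)  # the error shrinks as the catalyst dimension grows
```

The decay with d is slow (roughly inversely proportional to log d in this example), which reflects why an arbitrarily large working body is needed for an arbitrarily small ε.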

4. Discussion

The second law is often seen as arising from an experimenter's lack of control over the system of interest, and as some statistical statement. Here we see that this is not the case: we obtain our fundamental limitations even in the case where the experimenter can access the microscopic degrees of freedom of the heat bath and couple it in an arbitrary way with the system. The reason that such fine control does not lead to a violation of the second law is related to the fact that a Maxwell's demon with microscopic control over a system cannot violate the second law. A demon which knows the positions and momenta of the particles of a system must record this information in a memory, which then needs to be reset at the end of a cyclic process [30, 31], and resetting it requires work. For the same reason, an ability to access the degrees of freedom of the heat bath would also require such a memory resetting step. Remarkably, although the limitations are derived assuming that one can perform all possible operations, they are achievable using an incredibly limited set of operations, namely changing the energy levels of the system and putting the system in thermal contact with the reservoir.

The second law is often regarded as being statistical in nature, so that it can be violated in particular instances but not on average. But here, using tools from single-shot information theory, we have seen that it can be applied to single systems. We have derived a family of fundamental limitations on thermodynamical state transformations. The criteria are governed by the probabilities of the system's energy levels, and are necessary and sufficient for transitions between states which are block-diagonal in the energy basis, or for maximal work extraction or work of formation. However, as noted in [5], there are additional restrictions which place constraints on coherences between those energy levels. These restrictions can be considered as additional second laws. We have derived them for two-level systems in [32], and as necessary conditions for higher dimension, but the full set of criteria is an interesting open question.

We have presented here the second law, combined with the first law, i.e. energy conservation. If we wish, we can write the first law as giving how much energy is transferred to the heat bath via

$$dQ = dE + dW \tag{5}$$

with W defined as the work distance of Equation (4). It is an open question how best to characterise the other laws of thermodynamics at the micro and quantum scale. The zeroth and third laws seem less fundamental at this scale, and one might want to consider alternatives. For example, the fact that the Gibbs state is preserved by thermal operations [5] could be considered as a zeroth law.


Appendix A: Preliminaria: Renyi relative entropies and their properties

1. Renyi divergence

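Consider probability distributions $p = (p_1, p_2, \dots, p_n)$ and $q = (q_1, q_2, \dots, q_n)$. The Renyi divergences are defined for α ∈ [−∞, ∞] as follows

$$D_\alpha(p\|q) = \frac{\mathrm{sgn}(\alpha)}{\alpha-1}\log\sum_{i=1}^{n} p_i^{\alpha} q_i^{1-\alpha}, \tag{A1}$$

where

$$\mathrm{sgn}(\alpha) = \begin{cases} 1 & (\alpha \ge 0);\\ -1 & (\alpha < 0). \end{cases} \tag{A2}$$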

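We use the conventions $\frac{0}{0} = 0$ and $\frac{a}{0} = \infty$ for a > 0. The cases α = 0, 1, ∞, −∞ are defined via the suitable limit, namely

$$D_0(p\|q) = \lim_{\alpha\to 0^+} D_\alpha(p\|q) = -\log\sum_{i:\,p_i\neq 0} q_i, \qquad D_1(p\|q) = \lim_{\alpha\to 1} D_\alpha(p\|q) = \sum_{i=1}^{n} p_i(\log p_i - \log q_i),$$

$$D_\infty(p\|q) = \lim_{\alpha\to\infty} D_\alpha(p\|q) = \log\max_i \frac{p_i}{q_i}, \qquad D_{-\infty}(p\|q) = \lim_{\alpha\to-\infty} D_\alpha(p\|q) = D_\infty(q\|p).$$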

Note that for α ∉ {0, 1} we have

$$\alpha\,\mathrm{sgn}(1-\alpha)\,D_{1-\alpha}(p\|q) = (1-\alpha)\,\mathrm{sgn}(\alpha)\,D_\alpha(q\|p). \tag{A3}$$

For some properties of the Renyi divergence, the reader can refer to [33, 34]. Note that other treatments in the literature define the Renyi divergence only for non-negative α. However, the relative entropy Dα we define satisfies the data processing inequality for all α ∈ [−∞, ∞]:

Dα(Λ(p)‖Λ(q)) ≤ Dα(p‖q), (A4)

where Λ is a stochastic map.
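The definition (A1), the symmetry relation (A3) and the data processing inequality (A4) can be spot-checked numerically. The sketch below is illustrative only: it assumes NumPy, strictly positive random distributions (so that negative α causes no infinities), a randomly drawn column-stochastic matrix for Λ, and a handful of sample α values. The name `renyi_div` is ours.

```python
import numpy as np

def sgn(alpha):
    return 1.0 if alpha >= 0 else -1.0

def renyi_div(p, q, alpha):
    """Eq. (A1) for finite alpha not in {0, 1}; p and q strictly positive."""
    return sgn(alpha) / (alpha - 1) * np.log(np.sum(p ** alpha * q ** (1 - alpha)))

rng = np.random.default_rng(0)
n = 5
p = rng.dirichlet(np.ones(n))
q = rng.dirichlet(np.ones(n))
# Column-stochastic matrix: each column is a probability distribution,
# so Lam @ p is again a probability distribution.
Lam = rng.dirichlet(np.ones(n), size=n).T

for alpha in (-3.0, -0.5, 0.5, 2.0, 5.0):
    before = renyi_div(p, q, alpha)
    after = renyi_div(Lam @ p, Lam @ q, alpha)
    print(alpha, before, after)  # 'after' should not exceed 'before'

# Symmetry relation (A3) for a few sample values of alpha.
for alpha in (-2.0, 0.5, 3.0):
    lhs = alpha * sgn(1 - alpha) * renyi_div(p, q, 1 - alpha)
    rhs = (1 - alpha) * sgn(alpha) * renyi_div(q, p, alpha)
    print(alpha, lhs, rhs)
```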

2. Renyi entropy

The Renyi entropies are defined for α ∈ ℝ \ {0, 1} as

$$H_\alpha(p) = \frac{\mathrm{sgn}(\alpha)}{1-\alpha}\log\sum_{i=1}^{n} p_i^{\alpha}, \tag{A5}$$

where sgn(α) has been defined in (A2). Again, for α ∈ {−∞, 0, 1, ∞} we define Hα by taking limits. Explicitly we have

$$H_0(p) = \log \mathrm{rank}(p), \quad H_1(p) = -\sum_{i=1}^{n} p_i\log p_i, \quad H_\infty(p) = -\log p_{\max}, \quad H_{-\infty}(p) = \log p_{\min}, \tag{A6}$$

where rank(p) is the number of nonzero elements of p, and $p_{\max}$, $p_{\min}$ are the maximal and minimal elements of p, respectively. The Renyi entropies can be recovered from the relative Renyi entropies as follows

$$H_\alpha(p) = \mathrm{sgn}(\alpha)\log n - D_\alpha(p\|\eta), \tag{A7}$$

where $\eta = \left(\frac{1}{n}, \frac{1}{n}, \dots, \frac{1}{n}\right)$ is the uniform probability distribution.
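Relation (A7) follows directly from inserting η into (A1), and it is easy to confirm numerically. The following sketch assumes NumPy, natural logarithms and a strictly positive example distribution chosen by us; the function names are hypothetical.

```python
import numpy as np

def sgn(alpha):
    return 1.0 if alpha >= 0 else -1.0

def renyi_entropy(p, alpha):
    """Eq. (A5) for finite alpha not in {0, 1}; p strictly positive."""
    return sgn(alpha) / (1 - alpha) * np.log(np.sum(p ** alpha))

def renyi_div(p, q, alpha):
    """Eq. (A1) for finite alpha not in {0, 1}; p and q strictly positive."""
    return sgn(alpha) / (alpha - 1) * np.log(np.sum(p ** alpha * q ** (1 - alpha)))

p = np.array([0.5, 0.2, 0.2, 0.1])   # illustrative distribution
n = len(p)
eta = np.full(n, 1.0 / n)            # uniform distribution

for alpha in (-2.0, 0.5, 3.0):
    lhs = renyi_entropy(p, alpha)
    rhs = sgn(alpha) * np.log(n) - renyi_div(p, eta, alpha)
    print(alpha, lhs, rhs)  # both columns should coincide
```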

It is worth noting that the Renyi divergences and entropies have generally been defined only for positive α. However, for completeness, we have generalized the definitions to negative α so that conditions for state transformations can be described fully by these quantities.


3. Smoothing relations

It turns out that our infinite set of conditions can be verified by checking just two conditions in an approximate sense. This applies to the case where only the conditions for α ≥ 0 are relevant. However, as we will argue later, this is generally sufficient.

Let us first explain how this works by considering only the Renyi entropies. These are relevant for the case of the trivial Hamiltonian. Note that if α ≥ β then for all distributions p we have Hα(p) ≤ Hβ(p). The key to approximately reducing the number of conditions is to note that there exists a distribution quite close to p such that, up to some error terms, the entropies can also be related in the opposite direction. Closeness is measured in terms of the statistical distance, and we use $B^{\epsilon}(p) = \{p' : \frac{1}{2}\sum_i |p_i - p'_i| \le \epsilon\}$ to denote the ε-ball of (sub-normalized) distributions p′ around p. We will also call such a p′ a smoothed distribution [35, 36].

Specifically, we will show (see Lemma 18) that for any 0 < α < 1, any distribution p and any ε > 0, there exists a smoothed distribution $p' \in B^{\epsilon}(p)$ such that

$$H_0(p) \ge H_\alpha(p) \ge H_0(p') - \frac{\log\frac{1}{\epsilon}}{1-\alpha}. \tag{A8}$$

Similarly, whenever α > 1 there exists another distribution $p'' \in B^{\epsilon}(p)$ such that

$$H_\infty(p'') + \frac{\log\frac{1}{\epsilon}}{\alpha-1} \ge H_\alpha(p) \ge H_\infty(p). \tag{A9}$$

This means that whenever we demand that Hα(ρ) ≤ Hα(ρ′) for all values of α ≥ 0, we can reduce the set of conditions in an approximate sense by relating Hα to H0 or H∞. More precisely, given probability distributions p and q and ε > 0, we can construct smoothed distributions $p' \in B^{\epsilon}(p)$ and $q'' \in B^{\epsilon}(q)$ according to explicit smoothing strategies as in Lemma 18. If the following conditions are satisfied

• For 0 < α < 1, $H_0(p') - \frac{\log\frac{1}{\epsilon}}{1-\alpha} \ge H_0(q)$

• For α > 1, $H_\infty(p) \ge H_\infty(q'') + \frac{\log\frac{1}{\epsilon}}{\alpha-1}$,

then ∀α > 0, it holds that Hα(p) ≥ Hα(q). As we will see, these conditions can also be expressed in terms of smoothed entropies [35]; however, we would like to emphasize that there are in fact only two smoothing strategies, one for α ≥ 1 and one for α < 1. This means that one could apply these smoothing strategies and only verify the two conditions stated above. It should be emphasized, however, that this allows a verification in one direction only. Namely, if we find that the conditions above are satisfied, then we can conclude that the original conditions are also satisfied for all α. However, due to the approximations above, the converse does not hold, in the sense that the original conditions may be satisfied and yet the fudge terms in the conditions above no longer allow for a verification.

A similar statement can be made for the Renyi divergences, which are relevant for the case of a non-trivial Hamiltonian. Here we want to check whether, given initial and final states ρ and ρ′, we have that for all α ≥ 0, Dα(ρ‖ρβ) ≥ Dα(ρ′‖ρβ). Again, we have a bound in one direction as the Renyi divergences are monotonically increasing in α, hence Dα(ρ′‖ρβ) ≥ Dβ(ρ′‖ρβ) whenever α ≥ β [37]. As we show in the appendix, one can again obtain a bound in the other direction by considering smoothed distributions. We show (Lemma 20) that for any distribution p there exists a particular smoothed distribution $p' \in B^{\epsilon}(p)$ such that D∞(p′‖q) − c ≤ Dα(p‖q) for all α > 1, where c is a logarithmic fudge factor that depends on the smoothing parameter. A similar statement holds for α < 1, relating Dα to D0 (see Lemma 21). Again, the smoothed distributions only depend on ε and on whether α > 1 or α < 1.

These relations now again allow us to simplify our conditions in an approximate sense. Consider probability distributions p, q and r, where q has full rank, rank(q) = n. For ε > 0, apply the ε-smoothing strategy in the proof of Lemma 20 for p to obtain $p' \in B^{\epsilon}(p)$. Then if

$$D_\infty(p'\|r) \gtrsim D_\infty(q\|r), \tag{A10}$$

we have Dα(p‖r) ≥ Dα(q‖r) for α > 1. Similarly, apply the ε-smoothing strategy in Lemma 21 for q to obtain $q' \in B^{\epsilon}(q)$. Then if the following condition is also satisfied

$$D_0(p\|r) \gtrsim D_0(q'\|r),$$

then ∀α > 0, it holds that Dα(p‖r) ≥ Dα(q‖r). We refer to Section F for details.


4. Majorization and Schur convexity

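There is a partial order between probability distributions called majorization, which is defined for arbitrary vectors $x, y \in \mathbb{R}_+^k$. We say that $x = (x_1, \dots, x_k)$ majorizes $y = (y_1, \dots, y_k)$ if for all $l = 1, \dots, k$

$$\sum_{i=1}^{l} x_i^{\downarrow} \ge \sum_{i=1}^{l} y_i^{\downarrow}, \quad\text{and}\quad \sum_{i=1}^{k} x_i = \sum_{i=1}^{k} y_i, \tag{A11}$$

where $x^{\downarrow}$ is the vector obtained by arranging the components of x in decreasing order: $x^{\downarrow} = (x_1^{\downarrow}, \dots, x_k^{\downarrow})$ where $x_1^{\downarrow} \ge \dots \ge x_k^{\downarrow}$. We write

$$x \succ y \tag{A12}$$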

to indicate that x majorizes y.

A function f is called Schur convex if it is monotonic under majorization, i.e. if x ≻ y implies f(x) ≥ f(y). A function is called strictly Schur convex if x ≻ y implies f(x) > f(y) except when $x^{\downarrow} = y^{\downarrow}$. A useful criterion for strict Schur convexity is the following.

Lemma 1. A function $f : \mathbb{R}_+^k \to \mathbb{R}$ of the form $f(x) = \sum_i g(x_i)$ is (strictly) Schur convex/concave iff g is (strictly) convex/concave.

The lemma follows from the general criterion of Schur convexity [38]. However, it is easy to prove it directly, using the Birkhoff-von Neumann theorem, which states that when p ≻ q then q is a convex combination of permutations of p (cf. Theorem 5). Using it, and the strict monotonicity of the logarithm, the following lemma holds:

Lemma 2. For α > 0 the Renyi entropies Hα are strictly Schur concave. For α < 0 the Renyi entropies Hα are strictly Schur convex. For α = 0, ∞ the Renyi entropies are Schur concave (but not strictly Schur concave). For α = −∞ the Renyi entropy is Schur convex (but not strictly Schur convex).
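The majorization condition (A11) and the Schur concavity of the Renyi entropies for α > 0 can be illustrated on a small example. The sketch below assumes NumPy and natural logarithms; the vectors x and y are arbitrary illustrative choices with x ≻ y, and the helper names are ours. Only positive α are checked here.

```python
import numpy as np

def majorizes(x, y, tol=1e-12):
    """True if x majorizes y in the sense of Eq. (A11)."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]
    ys = np.sort(np.asarray(y, dtype=float))[::-1]
    same_total = abs(xs.sum() - ys.sum()) <= tol
    return bool(same_total and np.all(np.cumsum(xs) >= np.cumsum(ys) - tol))

def renyi_entropy(p, alpha):
    """H_alpha(p) of Eq. (A5) for alpha > 0, alpha != 1, p strictly positive."""
    p = np.asarray(p, dtype=float)
    return float(np.log(np.sum(p ** alpha)) / (1 - alpha))

x = [0.7, 0.2, 0.1]
y = [0.5, 0.3, 0.2]
print(majorizes(x, y), majorizes(y, x))

# Schur concavity for alpha > 0: the majorizing vector has the smaller entropy.
for alpha in (0.5, 2.0, 10.0):
    print(alpha, renyi_entropy(x, alpha), renyi_entropy(y, alpha))
```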

Appendix B: Exact catalysis with trivial Hamiltonian

We are interested in the interplay of energy and entropy, which is the essence of thermodynamics. However, before we approach this problem, it is instructive to first consider the case where the Hamiltonian is trivial, and thermodynamics is reduced to bare information theory. This toy model for thermodynamics was described in [2] and it has its roots in the problem of exorcising Maxwell's demon [30, 31].

It is constructed within the framework of so-called resource theories of thermodynamics [2, 5, 15, 16], which has its roots in research on entanglement manipulations [17–20], where one is interested in transformations between states by means of an allowed class of operations. Some states can be brought in for free, and they constitute a free resource. The others cannot be created, but only manipulated, i.e. we may transform one resource state into some other one. In the mentioned toy thermodynamics, which is perhaps the simplest known resource theory, the free resources are just maximally mixed states, and all unitary transformations as well as the partial trace are allowed operations. The emerging class of operations was called noisy operations. It was shown that in the case of systems of the same size, the class of noisy operations is equivalent to mixtures of unitaries. Therefore the condition that ρ can be transformed into ρ′ is equivalent to majorization: ρ can be transformed into ρ′ iff the spectrum of ρ majorizes the spectrum of ρ′ [39]. The noisy operations are equivalent to thermal operations applied to a system with a trivial Hamiltonian.

As we mentioned, in this toy thermodynamics, the law that governs state-to-state transitions is majorization. However, this is only so when we do not allow ancillary systems that are then returned in the same state. It is interesting to analyse how the laws change if we allow catalytic transitions between states. Since the transitions without catalysis are governed by majorization, the catalytic transitions will be governed by trumping. We say that x can be trumped into y if there exists some z such that

x⊗ z ≻ y ⊗ z. (B1)

In [11] it was shown for the first time that, using catalysis, one can perform transitions that are otherwise impossible. In particular, the following explicit example of states that do not majorize one another, but allow a catalytic transition, was given:

$$p = \left(\tfrac{4}{10}, \tfrac{4}{10}, \tfrac{1}{10}, \tfrac{1}{10}\right), \qquad q = \left(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4}, 0\right). \tag{B2}$$


One checks that $p_1 < q_1$ but $p_1 + p_2 > q_1 + q_2$. Now, if one takes the catalyst $r = \left(\tfrac{6}{10}, \tfrac{4}{10}\right)$, then q ⊗ r ≻ p ⊗ r, so that q can be trumped into p.

Recently Klimesh and Turgut [12, 13] independently provided necessary and sufficient conditions for x to be trumped into y. These were in terms of Renyi entropies Hα(x) or closely related functions. In order to make contact with what is to follow, we present here a set of conditions, equivalent to the Klimesh-Turgut ones, written in terms of Renyi divergences. We first argue that the original conditions can be equivalently stated in terms of non-strict inequalities, which in particular allows one to remove any discrete subset of conditions by exploiting continuity with respect to the parameter α.
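The catalytic example (B2) with the catalyst r = (6/10, 4/10) can be verified exactly with rational arithmetic. The sketch below uses only the Python standard library; the helper names `majorizes` and `tensor` are ours.

```python
from fractions import Fraction as Fr
from itertools import accumulate

def majorizes(x, y):
    """Exact check of Eq. (A11) for equal-length vectors of Fractions."""
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    partial_sums_ok = all(a >= b for a, b in zip(accumulate(xs), accumulate(ys)))
    return sum(xs) == sum(ys) and partial_sums_ok

def tensor(x, z):
    """Product distribution x (x) z as a flat list."""
    return [a * b for a in x for b in z]

p = [Fr(4, 10), Fr(4, 10), Fr(1, 10), Fr(1, 10)]
q = [Fr(1, 2), Fr(1, 4), Fr(1, 4), Fr(0)]
r = [Fr(6, 10), Fr(4, 10)]

print(majorizes(q, p), majorizes(p, q))          # neither majorizes the other
print(majorizes(tensor(q, r), tensor(p, r)))     # with the catalyst, q (x) r majorizes p (x) r
```

Using `Fraction` avoids floating-point ties in the partial sums, two of which coincide exactly in this example.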

Proposition 3. Let $x \in \mathbb{R}_+^k$ and $y \in \mathbb{R}_+^k$ be probability vectors which do not both contain components equal to zero. Then x can be trumped into y if, and only if,

Dα(x||η) ≥ Dα(y||η), (B3)

for all α ∈ (−∞,∞), with η = (1/k, . . . , 1/k) the uniform distribution.

Remark 4. Note that if any of the components of x is 0, then fα(x) = ∞ for α ≤ 0, and only α > 0 are relevant.

Proof. (Proposition 3) Klimesh [12] proved that x can be trumped into y if, and only if, for all α ∈ (−∞,∞):

fα(x) > fα(y), (B4)

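where

$$f_\alpha(x) = \begin{cases} \log\sum_{i=1}^{k} x_i^{\alpha} & (\alpha > 1);\\ \sum_{i=1}^{k} x_i\log x_i & (\alpha = 1);\\ -\log\sum_{i=1}^{k} x_i^{\alpha} & (0 < \alpha < 1);\\ -\sum_{i=1}^{k} \log x_i & (\alpha = 0);\\ \log\sum_{i=1}^{k} x_i^{\alpha} & (\alpha < 0). \end{cases} \tag{B5}$$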

All the functions above are strictly Schur convex. Therefore Lemma 2 gives that for $x^{\downarrow} \neq y^{\downarrow}$, Eq. (B4) is equivalent to fα(x) ≥ fα(y) for all α ∈ (−∞, ∞). The conditions of the Proposition, namely Dα(x||η) ≥ Dα(y||η) for all α ∈ (−∞, ∞), are equivalent to fα(x) ≥ fα(y) for all α, except for α = 0. However, the α = 0 case is redundant and can be eliminated, as

$$\sum_{i=1}^{k}\log x_i = \lim_{\alpha\to 0^+}\frac{\alpha-1}{\alpha}D_\alpha(x\|\eta). \tag{B6}$$

We note that earlier, Aubrun and Nechita [40] gave conditions for trumping in which only Hα with α > 1 were needed. This is because they considered a special kind of closure, where one is allowed to add an arbitrary number of zeros to the initial vector x while returning an (arbitrarily good) approximation of the needed output y. This is a kind of embezzling (see Section E 1 b for the definition of embezzling): one adds an ancilla and returns it with arbitrarily small error, but the size of the ancilla needs to grow in order to make the error smaller. In thermodynamics this is not allowed, as according to the second law we should consider processes which do not change the environment. Proposition 3 already suggests what the conditions for thermodynamics should look like: the maximally mixed state should be replaced by the Gibbs state. In Section C we will prove that this is indeed the case.
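Proposition 3 can be illustrated on the example (B2), where q can be trumped into p. Since q contains a zero, only α > 0 is relevant (Remark 4), so the sketch below checks Dα(q‖η) ≥ Dα(p‖η) on a finite grid of positive α. It assumes NumPy and natural logarithms; the grid and the function name are our choices, and a grid check is of course not a proof for all α.

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """D_alpha(p||q) in nats for alpha > 0; q is assumed to have full support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    s = p > 0
    if np.isclose(alpha, 1.0):
        return float(np.sum(p[s] * (np.log(p[s]) - np.log(q[s]))))
    if np.isinf(alpha):
        return float(np.log(np.max(p[s] / q[s])))
    return float(np.log(np.sum(p[s] ** alpha * q[s] ** (1 - alpha))) / (alpha - 1))

p = np.array([0.4, 0.4, 0.1, 0.1])
q = np.array([0.5, 0.25, 0.25, 0.0])
eta = np.full(4, 0.25)

alphas = np.concatenate((np.logspace(-2, 2, 101), [np.inf]))
gaps = [renyi_divergence(q, eta, a) - renyi_divergence(p, eta, a) for a in alphas]
print(min(gaps))  # non-negative on the whole grid
```

Although neither of p and q majorizes the other, every divergence in the grid is ordered the same way, which is the sense in which the Renyi conditions are weaker than majorization.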

1. Restricting to α ≥ 0 by investing a small amount of extra work

The Klimesh-Turgut conditions have a peculiar feature. Namely, if there are zeros in both x and y, we have to truncate the zeros and compute the conditions on smaller vectors. This means that we do not simply compare two functions; rather, which function we choose depends on some relation between the vectors. More importantly, the condition of having zeros in both vectors is very unstable. If we slightly perturb y so that we remove its zeros, then we do not need to truncate anymore. Note that the problem here is only with negative α, as for positive α the functions Dα(p||η) do not depend on additional zeros, while the functions with negative α are infinite when at least one component vanishes.

Let us argue that if we are allowed to invest an arbitrarily small amount of work, only the conditions with positive α are relevant. We have three cases:


(i) p has more zeros than q.

(ii) p has fewer zeros than q.

(iii) p and q have the same number of zeros.

In case (i), after truncation p will still have zeros, and so relative entropies for p with negative α will be infinite. Therefore the conditions with negative α are always satisfied. In case (ii) the transition cannot be realised, but this is reported by comparing ranks, which can be obtained using H0 = lim_{α→0+} Hα. So again the conditions with α > 0 already report the impossibility of the transition, and negative α’s are not needed. Finally, in case (iii), we consider the transition p → q with a small amount of work invested in addition:

$$p \otimes |0\rangle_{d+1}\langle 0| \otimes \eta_d \;\to\; q \otimes |0\rangle_{d}\langle 0| \otimes \eta_{d+1}, \qquad (B7)$$

where $|0\rangle_k\langle 0|$ stands for the distribution $(1, 0, \ldots, 0)$ with k entries (equivalently, a pure state on $\mathbb{C}^k$) and $\eta_k$ stands for the uniform distribution with k elements. The invested amount of work is $\log\frac{d+1}{d}$, hence it is arbitrarily small when d → ∞. Now, on the left hand side we have more zeros than on the other side (as initially we had the same number of zeros). Therefore, we are back to case (i).
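The zero counting behind case (iii) can be checked numerically. The following is a minimal sketch for classical distributions only; the helper names (`padded_pair`, `invested_work`, and so on) are hypothetical and chosen for illustration, not taken from the paper.

```python
import math

def tensor(*dists):
    """Tensor (Kronecker) product of classical probability vectors."""
    out = [1.0]
    for dist in dists:
        out = [a * b for a in out for b in dist]
    return out

def pure(k):
    """The distribution (1, 0, ..., 0) with k entries."""
    return [1.0] + [0.0] * (k - 1)

def uniform(k):
    """The uniform distribution on k elements."""
    return [1.0 / k] * k

def count_zeros(dist):
    return sum(1 for x in dist if x == 0.0)

def padded_pair(p, q, d):
    """Both sides of the work-assisted transition (B7)."""
    lhs = tensor(p, pure(d + 1), uniform(d))
    rhs = tensor(q, pure(d), uniform(d + 1))
    return lhs, rhs

def invested_work(d):
    """Work invested in (B7), in units where kT = 1."""
    return math.log((d + 1) / d)
```

If p and q have the same number of zeros, the left-hand side always ends up with strictly more zeros than the right-hand side, while `invested_work(d)` shrinks towards zero as d grows.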

Note however that we can rule out negative α in a different way if, instead of insisting on preparing the exact output state, we allow for preparing an ε-approximation with arbitrary accuracy. In case (iii) considered above we can add to both sides an ancilla in a pure state of the same dimension. Then both input and output will have the same number of zeros. But as the desired output we can take an approximation of it which is full rank. Then the transition is governed solely by the conditions with positive α. The returned state of the ancilla is now only approximately pure, but the accuracy is arbitrarily good. Note that we can choose the approximation in such a way that it affects only the ancilla; the original output state is not changed and will be produced exactly.

The above two methods of ruling out negative α do not differ very much: in the first one we input some pure state and return a pure state of dimension smaller by 1, while in the second we input a pure state and return an arbitrarily good approximation of it.

Appendix C: Catalysis with general Hamiltonians

We now turn to the case of the full theory of thermodynamics, where we have an interplay between energy and entropy. In this case, the Hamiltonians of the system and reservoir are arbitrary. To derive the conditions for state transformations, we will need a generalisation of the majorisation condition, known as d-majorisation, which we describe in C 1. We then derive the catalytic version of this in C 2. Next we show that we can restrict the form of the catalyst that is required for transitions where the initial state is block diagonal in the energy eigenbasis. We can then apply our results to the case of thermodynamics to derive second laws; these are stated separately in Section D. We will later, in Section E, discuss the case where the thermodynamic processes do not return the catalyst in exactly the same state, but only approximately.

1. d-majorization

The result connecting majorization and transitions between states can also be stated as follows (this is known as the Birkhoff-von Neumann theorem):

Theorem 5. For two probability distributions p and p′ the following two conditions are equivalent:

(i) p majorizes p′.

(ii) there exists a channel Λ such that

Λ(p) = p′, Λ(η) = η, (C1)

where η is the uniform distribution.
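As an illustration of Theorem 5, the sketch below tests majorization through partial sums of the sorted distributions and applies a bistochastic matrix as the channel Λ. The function names and the column-stochastic matrix convention are assumptions made for this example.

```python
def majorizes(p, p_prime, tol=1e-12):
    """Return True if p majorizes p_prime (partial sums of sorted entries)."""
    a = sorted(p, reverse=True)
    b = sorted(p_prime, reverse=True)
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))  # pad with zeros to a common length
    b = b + [0.0] * (n - len(b))
    sum_a = sum_b = 0.0
    for x, y in zip(a, b):
        sum_a += x
        sum_b += y
        if sum_a < sum_b - tol:
            return False
    return True

def apply_channel(matrix, p):
    """Apply a stochastic matrix; matrix[j][i] is the probability of i -> j."""
    return [sum(row[i] * p[i] for i in range(len(p))) for row in matrix]

# A bistochastic map: rows and columns both sum to one, so it preserves
# the uniform distribution, as required by condition (ii).
MIXING = [[0.7, 0.3],
          [0.3, 0.7]]
```

Applying `MIXING` to any p yields a distribution that p majorizes, which is the direction (ii) → (i) of the theorem in a two-level example.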


                standard majorization                            d-majorization

no catalysis    d(p|η) ≻ d(p′|η)                                 d(p|q) ≻ d(p′|q′)
                ∃Λ : Λ(p) = p′, Λ(η) = η                         ∃Λ : Λ(p) = p′, Λ(q) = q′
                ∀l : ∑_{i=1}^{l} p_i ≥ ∑_{i=1}^{l} p′_i          comparing diagrams (Fig. 1)

catalysis       Dα(p|η) ≥ Dα(p′|η)                               Dα(p|q) ≥ Dα(p′|q′)
                ∃Λ, r : Λ(p ⊗ r) = p′ ⊗ r, Λ(η ⊗ η) = η ⊗ η      ∃Λ, r, s : Λ(p ⊗ r) = p′ε ⊗ r, Λ(q ⊗ s) = q′ ⊗ s

TABLE I: Partial orderings as criteria for state transformations. p′ε is an approximation of p′.

Given probability distributions p, q, p′, q′ the d-majorization order is defined as follows:

d(p||q) ≻ d(p′||q′) (C2)

if, and only if, for any convex function g,

$$\sum_i q_i\, g\!\left(\frac{p_i}{q_i}\right) \ge \sum_i q'_i\, g\!\left(\frac{p'_i}{q'_i}\right). \qquad (C3)$$

In [41] the following generalization of the Birkhoff-von-Neumann theorem to d-majorization was provided:

Proposition 6 (Theorem 2 of [41]). For probability distributions p, p′, q, q′ the following two conditions are equivalent:

• p is smaller or equal to p′ with respect to d-majorization,

d(p||q) ≻ d(p′||q′). (C4)

• There exists a channel Λ such that

Λ(p) = p′, Λ(q) = q′. (C5)
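One direction of Proposition 6 is easy to probe numerically: if a single channel maps p to p′ and q to q′, then the sums in Eq. (C3) cannot increase for any convex g. The sketch below evaluates (C3) for a handful of convex test functions. The chosen functions, the example channel and all names are illustrative assumptions, and passing a finite set of tests is only a necessary condition for d-majorization, not a proof of it.

```python
def g_sum(p, q, g):
    """Left- or right-hand side of (C3): sum_i q_i g(p_i / q_i), q full rank."""
    return sum(qi * g(pi / qi) for pi, qi in zip(p, q))

def apply_channel(matrix, p):
    """Apply a stochastic matrix; matrix[j][i] is the probability of i -> j."""
    return [sum(row[i] * p[i] for i in range(len(p))) for row in matrix]

# A few functions that are convex on x >= 0 (an arbitrary finite selection).
CONVEX_TESTS = [
    lambda x: x * x,
    lambda x: abs(x - 1.0),
    lambda x: max(x - 0.5, 0.0),
    lambda x: x ** 3,
]

def passes_convex_tests(p, q, p2, q2, tol=1e-12):
    """Check (C3) for every test function in CONVEX_TESTS."""
    return all(g_sum(p, q, g) >= g_sum(p2, q2, g) - tol for g in CONVEX_TESTS)

# Example channel from three levels to two (each column sums to one).
CHANNEL = [[0.8, 0.5, 0.1],
           [0.2, 0.5, 0.9]]
```

For instance, with p = (0.6, 0.3, 0.1) and q = (0.2, 0.3, 0.5), the images under `CHANNEL` satisfy all tests, while the reversed pair already fails for g(x) = x².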

In fact a limited set of convex functions is sufficient, which leads to a discrete set of conditions. In the case when q = q′, the conditions can be expressed by the so-called thermo-majorization diagrams identified in [5]. The thermo-majorization diagrams can be easily extended also to the case q ≠ q′, although this is not relevant for our application to thermodynamics, in which q = q′ is the Gibbs state.

One can see that Proposition 6 implies Theorem 5, by taking q = q′ = η and noting that

p ≻ p′ ↔ d(p|η) ≥ d(p′|η). (C6)

A natural question is whether there is a trumping analogue for d-majorization. We show that indeed there is an analogue in section C 2, where we prove a version of Prop. 6 allowing catalysis. We will recover relations analogous to the Klimesh-Turgut ones. We illustrate the situation in Table I.

2. Catalytic d-majorization

The main result of this section is the following theorem:

Theorem 7. For probability distributions p, p′, q, q′, with q, q′ full rank, the following conditions are equivalent:

(i)

Dα(p||q) ≥ Dα(p′||q′), (C7)

for all α ∈ (−∞,∞).


(ii) For all ε > 0 there exist probability distributions r, p′ε and a stochastic map Λ such that

‖p′ − p′ε‖1 ≤ ε, (C8)

and

Λ(p⊗ r) = p′ε ⊗ r, Λ(q ⊗ η) = q′ ⊗ η, (C9)

with η the uniform distribution.

Remark 8. Theorem 7 can be viewed as a generalization of trumping relations [12, 13], on the one hand, and as a generalization of the d-majorization result of [41], on the other hand. It gives an operational interpretation to the Renyi divergences, answering the question posed in [33].

Remark 9. As in thermodynamics with a trivial Hamiltonian, described in section B 1, by investing an arbitrarily small amount of work one can remove the conditions with negative α. Indeed we can consider the transition p → q with a small amount of extra work, p ⊗ |0〉d+1〈0| ⊗ ηd → q ⊗ |0〉d〈0| ⊗ ηd+1, where we are assuming the Hamiltonian of the extra system is trivial. Then p ⊗ |0〉d+1〈0| ⊗ ηd will have a zero eigenvalue and the corresponding Dα will diverge for α < 0.

Alternatively, the conditions with negative α’s can be removed when, exactly as in the case of the trivial Hamiltonian, one borrows one qubit in a pure state (with arbitrary Hamiltonian, which can be the trivial one) and returns it with arbitrarily good fidelity.

To prove Theorem 7 we need a bit of notation. Consider the simplex of probability distributions $\{p_i\}_{i=1}^{k}$ and a family of natural numbers $\{d_i\}$, $i = 1, \ldots, k$. We define the embedding

$$\Gamma(p) = \oplus_i\, p_i \eta_i, \qquad (C10)$$

where $\eta_i = (1/d_i, \ldots, 1/d_i)$ is the uniform distribution on $d_i$ elements. We also consider the inverse map $\Gamma^*$ that acts on the space of N-dimensional probability distributions $\tilde p$, where $N = \sum_{i=1}^{k} d_i$. Writing

$$\tilde p = \oplus_{i=1}^{k}\, \tilde p^{(i)}, \qquad (C11)$$

where $\tilde p^{(i)}$ is an (unnormalized) $d_i$-dimensional probability distribution, $\Gamma^*$ can be written as

$$\Gamma^*(\tilde p) = r, \qquad (C12)$$

with $r = \{r_i\}_{i=1}^{k}$ the probability distribution given by $r_i = \sum_{j=1}^{d_i} \tilde p^{(i)}_j$.

The maps Γ and Γ∗ are channels, and for all probability distributions p we have Γ∗(Γ(p)) = p, i.e. Γ∗ is the inverse of Γ on the image of the latter. Moreover, Γ(q) = I/N for q = (d_1/N, . . . , d_k/N).
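The embedding Γ and its inverse Γ∗ are simple to realise for classical distributions. The following sketch uses exact rational arithmetic so that the identities Γ∗(Γ(p)) = p and Γ(q) = I/N can be checked without rounding; the function names `gamma` and `gamma_star` are chosen here for illustration.

```python
from fractions import Fraction

def gamma(p, dims):
    """Embedding (C10): spread each p_i uniformly over d_i levels."""
    out = []
    for pi, di in zip(p, dims):
        out.extend([Fraction(pi) / di] * di)
    return out

def gamma_star(p_big, dims):
    """Coarse-graining (C12): sum the entries inside each block of size d_i."""
    out, pos = [], 0
    for di in dims:
        out.append(sum(p_big[pos:pos + di], Fraction(0)))
        pos += di
    return out
```

With dims = (1, 2, 3) and N = 6, for example, `gamma` maps q = (1/6, 2/6, 3/6) to the uniform distribution on six levels.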

We will make use of the following two simple lemmas:

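Lemma 10. Let $p = \{p_i\}_{i=1}^{k}$ be a probability distribution, and $d_i$ and N be natural numbers satisfying $\sum_{i=1}^{k} d_i = N$. Define the following fine-grained probability distribution

$$\tilde p = \Big(\underbrace{\tfrac{p_1}{d_1}, \ldots, \tfrac{p_1}{d_1}}_{d_1}, \;\ldots,\; \underbrace{\tfrac{p_k}{d_k}, \ldots, \tfrac{p_k}{d_k}}_{d_k}\Big), \qquad (C13)$$

and let γ be defined as

$$\gamma = \Big(\frac{d_1}{N}, \ldots, \frac{d_k}{N}\Big). \qquad (C14)$$

Then for α ∈ [−∞,∞] we have

$$D_\alpha(p\|\gamma) = D_\alpha(\tilde p\|\eta_N), \qquad (C15)$$

with $\eta_N = (1/N, \ldots, 1/N)$ the uniform distribution.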
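Lemma 10 can be sanity-checked numerically. The sketch below assumes the convention D_α(p||q) = sgn(α)/(α − 1) · log Σ_i p_i^α q_i^{1−α} for α ∉ {0, 1}; the prefactor is an assumption of this example, and the identity (C15) does not depend on it because both sides share the same sum. All function names are illustrative.

```python
import math

def renyi_divergence(p, q, alpha):
    """Assumed convention; requires alpha not in {0, 1} and all entries > 0."""
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    sign = 1.0 if alpha > 0 else -1.0
    return sign * math.log(s) / (alpha - 1.0)

def fine_grain(p, dims):
    """Fine-grained distribution (C13)."""
    out = []
    for pi, di in zip(p, dims):
        out.extend([pi / di] * di)
    return out

def lemma10_gap(p, dims, alpha):
    """Absolute difference between the two sides of (C15)."""
    n = sum(dims)
    gamma = [di / n for di in dims]
    eta = [1.0 / n] * n
    lhs = renyi_divergence(p, gamma, alpha)
    rhs = renyi_divergence(fine_grain(p, dims), eta, alpha)
    return abs(lhs - rhs)
```

For a full-rank p the gap is at the level of floating-point rounding for both positive and negative α.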


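Proof. Let us first assume that α ∉ {−∞, 0, 1, ∞}. Then

$$D_\alpha(p\|\gamma) = \frac{1}{\alpha - 1}\log\sum_i p_i^{\alpha}\Big(\frac{d_i}{N}\Big)^{1-\alpha} = \log N - \mathrm{sgn}(\alpha) H_\alpha(\tilde p) = D_\alpha(\tilde p\|\eta). \qquad (C16)$$

For α ∈ {−∞, 0, 1, ∞} one can obtain the relation by considering limits. However, let us show this explicitly. For α = 0 we have

$$D_0(p\|\gamma) = -\log\sum_{i: p_i \neq 0}\gamma_i = -\log\sum_{i: p_i \neq 0}\frac{d_i}{N} = \log N - \log\sum_{i: p_i \neq 0} d_i = D_0(\tilde p\|\eta). \qquad (C17)$$

For α = ∞ we have

$$D_\infty(p\|\gamma) = \log\min\{\lambda : p_i \le \lambda\gamma_i \;\forall i\} = \log\min\Big\{\lambda : p_i \le \lambda\frac{d_i}{N} \;\forall i\Big\} = \log\Big(N\max_i\frac{p_i}{d_i}\Big) = \log N - H_\infty(\tilde p) = D_\infty(\tilde p\|\eta), \qquad (C18)$$

and similarly for α = −∞. Finally, for α = 1

$$D_1(p\|\gamma) = \sum_{i=1}^{k} p_i\log\frac{p_i}{\gamma_i} = \sum_{i=1}^{k} d_i\,\frac{p_i}{d_i}\log\frac{p_i N}{d_i} = \log N - H(\tilde p) = D(\tilde p\|\eta). \qquad (C19)$$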

Lemma 11. Let p ≠ q be distributions with q full rank. Then for all α ∈ (−∞,∞) and all 1 > δ > 0,

Dα((1 − δ)p+ δq||q) < Dα(p||q) (C20)

Proof. Direct by inspection

We are now ready to prove Theorem 7.

Proof of Theorem 7. “(i) → (ii)”. Suppose that Eq. (C7) holds. Define p′′ = (1 − δ)p′ + δq′, for δ > 0. Using Lemma 11 we find that

Dα(p||q) > Dα(p′′||q′). (C21)

Let us first prove the result assuming both q and q′ are rational, i.e. $q = (d_1/N, \ldots, d_k/N)$ and $q' = (d'_1/N, \ldots, d'_k/N)$ for integers $d_1, \ldots, d_k, d'_1, \ldots, d'_k$ and N. Later we will show how to extend the argument to the general case.

From Theorem 3, Lemma 10 and Eq. (C21) we find that the distribution $\tilde p = \oplus_i \eta_i p_i$ can be trumped into $\tilde p'' = \oplus_i \eta'_i (p'')_i$, where $\eta_i, \eta'_i$ are maximally mixed states of dimensions $d_i$ and $d'_i$, respectively. I.e. there exists a probability distribution r (the catalyst) and a bistochastic map Φ such that

$$\Phi(\tilde p \otimes r) = \tilde p'' \otimes r. \qquad (C22)$$

We now consider the following mapping

$$\Lambda = (\Gamma'^{*} \otimes I) \circ \Phi \circ (\Gamma \otimes I). \qquad (C23)$$

The maps Γ and Γ′ are defined according to Eq. (C10); Γ is associated with $\{d_i\}_i$ and Γ′ with $\{d'_i\}_i$. The map Λ transforms p ⊗ r into p′′ ⊗ r. We would then like it also to transform q ⊗ s into q′ ⊗ s for some s. As s we shall take the uniform distribution η. Indeed, we note that Γ(q) = I/N, so that (Γ ⊗ I)(q ⊗ η) is the maximally mixed distribution. Since Φ is bistochastic, it preserves this distribution. Finally, by definition of Γ′ we have Γ′∗(I/N) = q′.

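We now turn to the case in which either q or q′ is not rational. We start by constructing probability distributions $\bar q$ and $\bar q'$ which are rational and approximate the distributions q and q′:

$$\bar q := \Big(\frac{d_1}{N}, \ldots, \frac{d_k}{N}\Big), \qquad \bar q' := \Big(\frac{d'_1}{N}, \ldots, \frac{d'_k}{N}\Big), \qquad (C24)$$

where $\sum_{i=1}^{k} d_i = N$ and $\sum_{i=1}^{k} d'_i = N$. Note that

$$|q_i - \bar q_i| \le \frac{1}{N}, \qquad |q'_i - \bar q'_i| \le \frac{1}{N} \qquad (C25)$$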


and so

$$\|q - \bar q\|_1 \le \frac{k}{N}, \qquad \|q' - \bar q'\|_1 \le \frac{k}{N}. \qquad (C26)$$
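The text does not specify how the integers d_i are chosen. One possible choice, sketched here as an assumption, is largest-remainder rounding: take the floor of N·q_i and hand out the remaining units to the entries with the largest fractional parts. This keeps Σ d_i = N and every |q_i − d_i/N| below 1/N, so the bounds (C25) and (C26) hold. The function name is hypothetical.

```python
import math

def rational_approximation(q, n):
    """Integers d_i with sum n and |q_i - d_i / n| <= 1 / n (largest remainders)."""
    scaled = [qi * n for qi in q]
    dims = [math.floor(x) for x in scaled]
    leftover = n - sum(dims)  # number of units still to distribute
    # Indices ordered by decreasing fractional part.
    order = sorted(range(len(q)), key=lambda i: scaled[i] - dims[i], reverse=True)
    for i in order[:leftover]:
        dims[i] += 1
    return dims
```

For a full-rank q and sufficiently large N every d_i is at least one, so the rounded distribution is full rank as well.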

Thus by continuity also $D_\alpha(p\|\bar q) > D_\alpha(p''\|\bar q')$, for N large enough.

We now introduce a correction operation E that maps q into $\bar q$ while not perturbing p too much. Indeed one can define a stochastic map E such that

$$E(q) = \bar q, \qquad \|E(p) - p\|_1 \le O(k/N). \qquad (C27)$$

One construction of E is presented at the end of this section. Thus Eq. (C21) implies that for N sufficiently large,

$$D_\alpha(E(p)\|\bar q) > D_\alpha(p''\|\bar q'). \qquad (C28)$$

Following the first part of the proof (which established the result for rational q and q′) we find that there is a catalyst r and a stochastic operation Λ such that

$$\Lambda(E(p) \otimes r) = p'' \otimes r, \qquad \Lambda(E(q) \otimes \eta) = \bar q' \otimes \eta, \qquad (C29)$$

with η the maximally mixed distribution.

Let us define a second correction map E′ that maps $\bar q'$ into q′ while not perturbing p′′ too much. It is defined such that

$$E'(\bar q') = q', \qquad \|E'(p'') - p''\|_1 \le O(k/N). \qquad (C30)$$

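We construct this channel similarly to the channel E. Our final stochastic map is E′ ∘ Λ ∘ E. We have

$$(E' \otimes I) \circ \Lambda \circ (E \otimes I)(q \otimes \eta) = q' \otimes \eta, \qquad (C31)$$

and

$$(E' \otimes I) \circ \Lambda \circ (E \otimes I)(p \otimes r) = p''' \otimes r, \qquad (C32)$$

with p′′′ = E′(p′′) such that

$$\|p''' - p'\| \le \delta + O\Big(\frac{k}{N}\Big). \qquad (C33)$$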

Since N and δ can be chosen arbitrarily, the first part of the theorem follows.

“(ii) → (i)”. Suppose that for all ε > 0 there exist probability distributions r, p′ε and a stochastic map Λ such that

‖p′ − p′ε‖1 ≤ ε, (C34)

and

Λ(p⊗ r) = p′ε ⊗ r, Λ(q ⊗ η) = q′ ⊗ η, (C35)

with η the uniform distribution. Then by monotonicity of the Renyi divergences,

Dα(p′ε ⊗ r||q′ ⊗ η) ≤ Dα(p⊗ r||q ⊗ η), (C36)

which equals

Dα(p′ε||q′) +Dα(r||η) ≤ Dα(p||q) +Dα(r||η), (C37)

by additivity. Finally by continuity and taking the limit ε→ 0 we find

Dα(p′||q′) ≤ Dα(p||q). (C38)


Construction of channel E satisfying (C27). Let $I_+$ be the subset of those i for which $q_i \le \bar q_i$, and for these i we set $\Delta^+_i = \bar q_i - q_i$. Moreover, let $I_-$ be the subset of those i for which $q_i > \bar q_i$, and for these i we define $\Delta^-_i = q_i - \bar q_i$. We first set the transition probabilities p(i → j) of E for (i, j) being both from $I_+$ or both from $I_-$:

$$p(i \to j) = \begin{cases} 1 & \text{for } i = j,\; i, j \in I_+ \\ 0 & \text{for } i \neq j,\; i, j \in I_+ \\ 1 - \dfrac{\Delta^-_i}{q_i} & \text{for } i = j,\; i, j \in I_- \\ 0 & \text{for } i \neq j,\; i, j \in I_- \end{cases} \qquad (C39)$$

Now consider cross terms. To this end we note that $\sum_{i \in I_+}\Delta^+_i = \sum_{i \in I_-}\Delta^-_i \equiv \Delta$. With slight abuse of notation, let us consider the $\Delta^+_i$’s as subsets of an interval of total length ∆ (put in arbitrary order), and the same for the $\Delta^-_i$’s. We then define $\Delta^-_{i \to j} = \Delta^+_i \cap \Delta^-_j$. Now we are prepared to set the other transition probabilities:

$$p(i \to j) = \begin{cases} 0 & \text{for } i \in I_+,\; j \in I_- \\ \Delta^-_{i \to j} & \text{for } i \in I_-,\; j \in I_+ \end{cases} \qquad (C40)$$

One can verify that the above transition probabilities form a valid channel. We will now use the following estimate, valid for an arbitrary distribution r:

$$\|r - E(r)\| \le 2\max_i\,(1 - p(i \to i)). \qquad (C41)$$

Since $\|q - \bar q\| = 2\Delta$, in our particular case we have $p(i \to i) \ge 1 - \|q - \bar q\|/q_i$, so that

$$\|p - E(p)\| \le \frac{4\|q - \bar q\|}{\min_i q_i} = O\Big(\frac{k}{N}\Big), \qquad (C42)$$

where we treat $1/\min_i q_i$ as a constant, since q is a fixed distribution that does not depend on N and, moreover, q is full rank.

3. For diagonal input state of the system diagonal catalyst is enough

Here we will show that if the initial state of the system is block diagonal in the energy eigenbasis, then the diagonal of the output state does not depend on coherences of the catalyst but only on its diagonal. This means that if we are interested only in the diagonal of the output state of the system, we can replace the catalyst with its dephased version (i.e. use a diagonal catalyst that has the same diagonal as the original one). The conditional probabilities form the channel which maps the initial diagonal to the final diagonal of the state of the system. By block diagonal, we mean that the state can be written as

$$\rho = \sum_{i,j,k}\sigma_{ijk}\,|E_i, j\rangle\langle E_i, k| \qquad (C43)$$

where $|E_i, j\rangle$ has energy $E_i$ and j labels the degenerate levels.

To see that only block diagonal catalysts are needed, we write the initial state as

$$\rho^{in}_{RSC} = \rho^{in}_R \otimes \rho^{in}_S \otimes \rho^{in}_C, \qquad (C44)$$

where $\rho^{in}_R$ is the state of the heat bath, which is of course diagonal, $\rho^{in}_C$ is the state of the catalyst, which is arbitrary, and $\rho^{in}_S$ is the state of the system, which we assume to be diagonal. We then act with an energy-preserving unitary U and get the output state

$$\rho^{out}_{RSC} = U\rho^{in}_{RSC}U^\dagger. \qquad (C45)$$

We now compute the diagonal elements of $\rho^{out}_S$, and will see that they depend only on the diagonal elements of $\rho^{in}_C$. We have

$$\langle E_S|\rho^{out}_S|E_S\rangle = \sum_{E_R, E_C}\langle E_R, E_S, E_C|\rho^{out}_{RSC}|E_R, E_S, E_C\rangle \qquad (C46)$$


(here and in the following we sum over energies as well as degeneracies). This can be written as

$$\sum_{E_R, E_C}\;\sum_{E'_R, E'_C, E'_S, E''_C}\langle E_R, E_S, E_C|U|E'_R, E'_S, E'_C\rangle\,\langle E'_R, E'_S, E'_C|\rho^{in}_{RSC}|E'_R, E'_S, E''_C\rangle\,\langle E'_R, E'_S, E''_C|U^\dagger|E_R, E_S, E_C\rangle$$

where we used that $\rho^{in}_R$ and $\rho^{in}_S$ are diagonal. Since U preserves energy, we have $E_R + E_S + E_C = E'_R + E'_S + E'_C$ as well as $E_R + E_S + E_C = E'_R + E'_S + E''_C$. This implies that $E'_C = E''_C$. We thus obtain

$$\sum_{E'_S} p(E_S|E'_S)\,\langle E'_S|\rho^{in}_S|E'_S\rangle, \qquad (C47)$$

where the conditional probabilities are given by

$$p(E_S|E'_S) = \sum_{E_R, E_C}\;\sum_{E'_R, E'_C}|\langle E_R, E_S, E_C|U|E'_R, E'_S, E'_C\rangle|^2\,\langle E'_R|\rho^{in}_R|E'_R\rangle\,\langle E'_C|\rho^{in}_C|E'_C\rangle. \qquad (C48)$$

One easily finds that indeed this is a valid conditional probability.

Appendix D: The Second Laws

We can now formulate the proposed version of the Second Laws.

Theorem 12. If we have access to a single heat bath ρβ, with inverse temperature β, and ancillas that should be returned in their initial state, and are allowed to use initially an arbitrarily small amount of work, then a state ρ block diagonal in the energy eigenbasis can be transformed with arbitrary accuracy into another block diagonal state ρ′ under thermal operations if and only if, for all α > 0,

Dα(ρ||ρβ) ≥ Dα(ρ′||ρβ) (D1)

We will prove it using Theorem 7. Consider initial and final states of the system ρS and ρ′S with eigenvalues p and p′. We take q = q′ = ρβS. Suppose first that for α > 0

Dα(ρS ||ρβS) ≥ Dα(ρ′S ||ρβS) (D2)

According to Remark 9, if we invest an arbitrarily small amount of work, then the conditions with negative α are all satisfied for arbitrary states ρS and ρ′S. Then, using Theorem 7, we get that there exists a channel Λ and uniform distribution η such that (i) Λ preserves the state q ⊗ η, (ii) Λ sends p ⊗ r into p′ε ⊗ r, where p′ε approximates p′. Condition (i) means that we can take the catalyst system with trivial Hamiltonian. Then

Λ(ρβS ⊗ ρβC) = ρβS ⊗ ρβC (D3)

where ρβC is the maximally mixed state on the catalyst system C, i.e. Λ preserves the Gibbs state of the system SC. However, thermal operations are precisely the operations that preserve the Gibbs state [5, 42].

Conversely, let us assume that for ρS and ρ′S described above there exists a quantum channel Λ, and a system C with the Hamiltonian HC and state ρC, such that

Λ(ρβS ⊗ ρβC) = ρβS ⊗ ρβC , Λ(ρS ⊗ ρC) = ρ′S ⊗ ρC . (D4)

In Sec. C 3 we have shown that since ρS and ρ′S are diagonal, we can take the state ρC to be diagonal too, and the above condition reads

Λcl(q ⊗ s) = q ⊗ s, Λcl(p⊗ r) = p′ ⊗ r, (D5)

where Λcl is now a classical channel. Then from Theorem 7 we obtain that

Dα(p||q) ≥ Dα(p′ǫ||q′). (D6)

By continuity this means that for all α > 0,

Dα(ρS ||ρβS) ≥ Dα(ρ′S ||ρβS). (D7)
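For states diagonal in the energy eigenbasis, the family of conditions (D1) can be probed on a grid of α values. The sketch below does this for classical distributions, assuming the convention D_α(p||q) = 1/(α − 1) · log Σ_i p_i^α q_i^{1−α} for α > 0, α ≠ 1. A finite grid can only falsify a transition or give evidence for it, and all names and the grid itself are assumptions of this example.

```python
import math

def gibbs(energies, beta):
    """Gibbs distribution for the given energy levels."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def renyi_divergence(p, q, alpha):
    """Assumed convention for alpha > 0, alpha != 1, with q full rank."""
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return math.log(s) / (alpha - 1.0)

ALPHAS = [0.05, 0.25, 0.5, 0.75, 1.5, 2.0, 5.0, 20.0]

def satisfies_second_laws(p, p_prime, thermal, alphas=ALPHAS, tol=1e-12):
    """Check D_alpha(p||thermal) >= D_alpha(p_prime||thermal) on the grid."""
    return all(
        renyi_divergence(p, thermal, a) >= renyi_divergence(p_prime, thermal, a) - tol
        for a in alphas
    )
```

For a qubit with energies (0, 1) at β = 1, relaxing the excited state to the Gibbs state passes every test on the grid, while the reverse transition fails, as expected.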


1. Application: Landauer erasure with a quantum Maxwell demon

As an application of the above second laws, we consider a special case of a Maxwell demon with a memory Q in state ρQ, who wants to reset a system S in state ρS to some pure state. One imagines that the demon’s memory is correlated in some way with the system so that the total state of demon plus system is ρSQ. The demon wishes to reset the state of the system by transforming it to some pure state, but the state of the demon’s memory should not change. This can be seen as a fully quantum version of the standard Maxwell demon/Landauer erasure scenario, and has been considered in the case of a trivial Hamiltonian [4, 9] and when the Maxwell demon does not have access to ancillary systems. The result of [4] gave a thermodynamic interpretation to the notion of negative information [43]. We will see here that the amount of work which is required for such an operation is quantitatively different when all thermodynamical operations are considered. This suggests that in the single-shot scenario, the notion of partial information takes on a different form, which we shall now derive.

In fact, the result follows immediately from the work distance of Equation (4). Namely, the cost Wreset of resetting the system to the pure state ψE with energy E is given by

$$W_{reset} \le \inf_\alpha\,[D_\alpha(\rho_{QS}\|\rho^\beta_{QS}) - D_\alpha(\rho_Q \otimes \psi_E\|\rho^\beta_{SQ})] = \inf_\alpha\,[D_\alpha(\rho_{QS}\|\rho^\beta_{QS}) - D_\alpha(\rho_Q\|\rho^\beta_{SQ})] + E \qquad (D8)$$

In the case when the Hamiltonian is trivial, this reduces to

$$W_{reset} \le \inf_\alpha\,[H_\alpha(\rho_{QS}) - H_\alpha(\rho_Q)] \qquad (D9)$$

When this quantity is negative, the demon can reset the system to a pure state, and not only does this not cost work, but the corresponding amount of work is actually gained. Not only can thermodynamical work be negative, as shown in [4], but it can be even more negative, in the sense that catalytic operations can be used to gain more work than would otherwise be possible. Note that this gives the difference of Renyi entropies of Equation (D9) an operational interpretation, very similar to that enjoyed by the conditional von Neumann entropy in the case of identically and independently distributed states.

Appendix E: Approximate catalysis

So far we considered exact catalysis, i.e. the process is perfectly cyclic and the catalyst should be in the same state as it initially was. However, this is usually an unreasonable demand because in physical processes there are some unavoidable inaccuracies. Therefore, we ask what the conditions for transformations are when the catalyst ρinC can be returned in a state ρoutC that is merely “close” to the initial state.

1. How to quantify ”approximate”

As already discussed in the introduction, it turns out that what is meant by “close” matters greatly.

a. Meaning of the L1 distance

At first glance, one might be tempted to demand that close should mean that ρinC is close to ρoutC in terms of the fidelity, or essentially equivalently, the trace norm distance. The trace norm distance between two states ρ and σ can be written as

$$\|\rho - \sigma\|_1 = \max_{0 \le M \le I}\mathrm{tr}\,[M(\rho - \sigma)]. \qquad (E1)$$

We say that ρ and σ are ε-close if ‖ρ − σ‖1 ≤ ε. It enjoys an appealing operational interpretation: being ε-close in trace norm distance means that if we were given states ρ and σ with probability 1/2 each, then our probability of correctly distinguishing them by any physically allowed measurement is bounded by 1/2 + ε/2. In other words, being close in the trace distance means that the two states cannot be distinguished well by any physical process. In terms of the catalyst, one might hence ask that ‖ρinC − ρoutC‖1 ≤ ε.


Finally, we note that when considering how cyclic a process is, we might also ask about preserving correlations between the catalyst ρinC and its purifying systems. Clearly, there exist simple operations that preserve the catalyst even exactly, and yet do not preserve correlations with separated reference systems holding the purification. Consider for example a catalyst of the form ρinC = ρ1 ⊗ ρ2 where ρ1 = ρ2 and the purifications of ρ1 and ρ2 are held separately by systems 1 and 2 respectively. Returning the catalyst as ρ2 ⊗ ρ1 preserves the catalyst, but it requires interaction between the two purifying systems of ρ1 and ρ2 in order to restore the correlations with the catalyst.

b. The embezzling state dilemma: when there is no second law

As it turns out, however, such a measure which is independent of the number of particles in the system is not useful. As mentioned already in the introduction, there is the phenomenon of embezzling [29] in entanglement theory, where at the expense of increasing the size of the catalyst, one can perform an arbitrary transformation with arbitrarily good fidelity, while returning the catalyst in a state arbitrarily close to the initial state. More specifically, we can use the results of [29] to show that for any ε, there exists a dimension d such that using the catalyst

$$\rho^{in}_C = \sum_{j=1}^{d}\frac{1}{j}\,|j\rangle\langle j|, \qquad (E2)$$

allows us to transform any state ρ to within ε of any state ρ′, with ‖ρoutC − ρinC‖1 ≤ ε. Demanding that the catalyst be returned close in the trace distance is thus too weak a condition. Intuitively, the reason why it is too weak is that we may still need to consume much work to obtain the original catalyst from its returned version.

If we therefore conceive of an approximately cyclic process as one in which the working body is returned in a state which is ε-close in fidelity to its original form, then there is no second law. All state transformations are possible.

2. The work distance

Here, we thus take a very operational perspective on the problem of inexact catalysis. More precisely, we propose to take exact catalysis as a reference point, and require that the state that is returned should require only a small amount of work in order to be restored to its original state by using exact catalysis. This is natural, in the sense that if someone loans you a catalyst, then they would want it returned in such a way that it would not require a large amount of work to return it to its original state.

We thus consider the inexact transition

ρinS ⊗ ρinC → ρoutSC (E3)

and require that to obtain from ρoutSC the original state of the catalyst and the required output state of the system, we need not input more than a small amount of work.

This prompts our definition of the work distance below. To make this precise, let us take a closer look at how one can derive upper bounds on the amount of work needed for a state transformation, namely to restore the catalyst to its original form. To input this amount of work, we can append a battery to either provide or extract work, to facilitate this transformation. We then apply our conditions for state transformation to the state and the battery together. The battery we use is a two level system called a wit, introduced previously in [5]. The wit initially starts out in the energy eigenstate ωi = |0〉〈0| which has energy Ei = 0. At the end of the process, the battery is in another energy eigenstate ωf = |W〉〈W|, having energy W. W can be either negative or positive, depending on whether work is used from or stored in the battery system. This means that the following transition is possible

ρoutSC ⊗ |0〉〈0|W → ρoutS ⊗ ρinC ⊗ |W 〉〈W |W , (E4)

while the Gibbs state of the battery system is given by

$$\tau_W = \frac{e^{-\beta W}|W\rangle\langle W| + |0\rangle\langle 0|}{1 + e^{-\beta W}}, \qquad (E5)$$

where $\beta = \frac{1}{kT}$, k being the Boltzmann constant and T being the temperature of the bath in which the system and battery are submerged. For this transformation to be possible, Theorem 7 requires that the following conditions hold for all α ≥ 0,

Dα(ρinS ⊗ ρinC ⊗ ωi‖ρβS ⊗ τW ) ≥ Dα(ρoutS ⊗ ρinC ⊗ ωf‖ρβS ⊗ τW ). (E6)


Since the initial, final, and Gibbs states of the battery are known and depend only on the parameter W, we can then derive an upper bound for W from (E6) and (E5) as

$$D_\alpha(\rho^{in}_S\|\rho^\beta_S) + \frac{1}{\alpha - 1}\log\left[\frac{1}{1 + e^{-\beta W}}\right]^{1-\alpha} \ge D_\alpha(\rho^{out}_S\|\rho^\beta_S) + \frac{1}{\alpha - 1}\log\left[\frac{e^{-\beta W}}{1 + e^{-\beta W}}\right]^{1-\alpha} \qquad (E7)$$

Rearranging yields

$$D_\alpha(\rho^{in}_S\|\rho^\beta_S) \ge D_\alpha(\rho^{out}_S\|\rho^\beta_S) + \beta W, \qquad (E8)$$

which yields the following upper bound on W:

$$W \le kT \cdot \left[D_\alpha(\rho^{in}_S\|\rho^\beta_S) - D_\alpha(\rho^{out}_S\|\rho^\beta_S)\right] \qquad (E9)$$

In essence, the possible transitions are governed by Renyi divergences, up to a tolerance W which we choose. Since W has to be smaller than the above bounds for all positive α, the maximal amount of work extractable for such a process, going from ρ to ρ′, will be given as

$$D(\rho \succ \rho') = kT \cdot \inf_{\alpha > 0}\,[D_\alpha(\rho\|\rho_\beta) - D_\alpha(\rho'\|\rho_\beta)], \qquad (E10)$$

where we refer to D(ρ ≻ ρ′) as the work distance from ρ to ρ′.

It is interesting to see how the conditions presented in [5] arise as special cases of our conditions, and are hence independent of a catalyst. In [5], the extractable work from a state (by thermalizing it) and the work cost for its formation (starting with a thermal state) via thermal operations have been given by

$$W_{ext}(\rho) = -kT\log\mathrm{tr}(\Pi_\rho\rho_\beta) = kT\,D_0(\rho\|\rho_\beta), \qquad (E11)$$

$$W_{cost}(\rho) = kT\log\min\{\lambda : \rho \le \lambda\rho_\beta\} = kT\,D_\infty(\rho\|\rho_\beta). \qquad (E12)$$

The subsequent corollary shows that the work distance reduces to these quantities.

Corollary 13. Consider initial and final states ρ and ρ′:

• If ρ′ is the Gibbs state, then the maximum extractable work D(ρ ≻ ρ′) = Wext(ρ).

• If ρ is the Gibbs state, then the minimum work cost −D(ρ ≻ ρ′) = Wcost(ρ′).

Proof. If ρ′ = ρβ, then ∀α, Dα(ρ′‖ρβ) = 0, hence

$$D(\rho \succ \rho') = kT \cdot \inf_{\alpha > 0} D_\alpha(\rho\|\rho_\beta) = W_{ext}(\rho),$$

where the last equality holds due to the fact that the Renyi divergences are non-decreasing in positive α. If ρ = ρβ, then similarly

$$-D(\rho \succ \rho') = -kT \cdot \inf_{\alpha > 0}\,[-D_\alpha(\rho'\|\rho_\beta)] = kT \cdot \sup_{\alpha > 0}\,[D_\alpha(\rho'\|\rho_\beta)] = W_{cost}(\rho').$$
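The work distance (E10) can be approximated for diagonal states by minimising over a finite grid of α. The sketch below assumes the convention D_α(p||q) = 1/(α − 1) · log Σ_i p_i^α q_i^{1−α} for α > 0, α ≠ 1; the grid, the units and all names are illustrative assumptions, and a grid minimum only bounds the true infimum from above.

```python
import math

def gibbs(energies, beta):
    """Gibbs distribution for the given energy levels."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def renyi_divergence(p, q, alpha):
    """Assumed convention for alpha > 0, alpha != 1, with q full rank."""
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return math.log(s) / (alpha - 1.0)

ALPHAS = [0.01, 0.1, 0.5, 2.0, 5.0, 20.0]

def work_distance(p, p_prime, thermal, kT=1.0, alphas=ALPHAS):
    """Grid approximation of (E10): kT * min over alpha of the divergence gap."""
    return kT * min(
        renyi_divergence(p, thermal, a) - renyi_divergence(p_prime, thermal, a)
        for a in alphas
    )
```

For a pure ground state of a qubit every D_α equals −log of the ground-state Gibbs weight, so the grid value matches both W_ext and W_cost of Corollary 13 exactly in that special case.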

3. Small error per particle – recovering the free energy

We now consider the extensive regime, that is, where we allow a catalyst to be returned with relatively low accuracy ‖ρinC − ρoutC‖1 ≤ ε/log(N), where N is the dimension of the catalyst. We will see that in such a case we recover the ordinary second law.


a. Trivial Hamiltonians – recovering the Shannon entropy

Let us again first consider the case of a trivial Hamiltonian. In particular, we will see that in the extensive regime only the Shannon entropy matters, and the Renyi entropies are no longer relevant. This shows that if we relax the conditions on how cyclic the process is, by allowing relatively large inaccuracies in the returned catalyst, then we recover the usual second law.

Theorem 14. Let ε ≥ 0 and let p = spec(ρ) and q = spec(ρ′) be the spectra of the input and output state respectively, which are diagonal in the same basis and have dimension d.

If there exists a catalyst with spectrum $r = r_N$ of dimension N such that

$$p \otimes r \prec s, \qquad \|s - q \otimes r\|_1 \le \frac{\varepsilon}{\log(N)}, \qquad (E13)$$

then

$$H(p) \le H(q) - \varepsilon - \frac{\varepsilon\log(d)}{\log(N)} - h\!\left(\frac{\varepsilon}{\log(N)}\right), \qquad (E14)$$

with h(x) := −x log(x) − (1 − x) log(1 − x) the binary entropy.

Conversely, if

$$H(p) < H(q), \qquad (E15)$$

then for all N sufficiently large there exists a catalyst with spectrum $r = r_N$ of dimension N such that

$$p \otimes r \prec s, \qquad \|s - q \otimes r\|_1 = \exp\!\big(-\Omega(\sqrt{\log(N)})\big). \qquad (E16)$$

Proof. Suppose there is a catalyst with spectrum r such that Eq. (E13) holds true. Then by Fannes inequality,

$$|H(s) - H(q \otimes r)| \le \varepsilon + \frac{\varepsilon\log(d)}{\log(N)} + h\!\left(\frac{\varepsilon}{\log(N)}\right) \qquad (E17)$$

and Eq. (E14) follows from monotonicity of entropy under stochastic maps.

Conversely, suppose Eq. (E15) holds. Then we know that for all n sufficiently large we have

$$p^{\otimes n} \prec q_n \qquad (E18)$$

with

$$\|q_n - q^{\otimes n}\|_1 \le \exp\!\big(-\Omega(\sqrt{n})\big). \qquad (E19)$$

Let us consider the following catalyst introduced in [44]:

ω = p⊗(n−1) ⊕ p⊗(n−2) ⊗ q ⊕ . . .⊕ p⊗ q⊗(n−2) ⊕ q⊗(n−1)/n. (E20)

We have

p⊗ ω = p⊗n ⊕ p⊗(n−1) ⊗ q ⊕ . . .⊕ p⊗2 ⊗ q⊗(n−2) ⊕ p⊗ q⊗(n−1)/n. (E21)

Then by Eq. (E18),

p⊗ ω ≺ s (E22)

with

s = qn ⊕ p⊗(n−1) ⊗ q ⊕ . . .⊕ p⊗2 ⊗ q⊗(n−2) ⊕ p⊗ q⊗(n−1)/n. (E23)

The result follows from the bound

$$\|s - q \otimes \omega\|_1 = \|q_n - q^{\otimes n}\|_1/n \le \exp\!\big(-\Omega(\sqrt{n})\big), \qquad (E24)$$

and the fact that the dimension of ω is $N = n2^{n-1}$.
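The structure of the catalyst (E20) can be verified for small classical examples: p ⊗ ω and q ⊗ ω share all blocks except one, namely p^{⊗n}/n on one side and q^{⊗n}/n on the other. The sketch below checks this at the level of multisets of probabilities using exact fractions. The function names are hypothetical, and the dimension is computed as n·d^{n−1}, which reduces to n·2^{n−1} for d = 2.

```python
from collections import Counter
from fractions import Fraction

def tensor(a, b):
    """Tensor product of two classical probability vectors."""
    return [x * y for x in a for y in b]

def tensor_power(p, m):
    """m-fold tensor power; the empty power is the trivial distribution (1)."""
    out = [Fraction(1)]
    for _ in range(m):
        out = tensor(out, p)
    return out

def catalyst(p, q, n):
    """Direct sum of p^(n-1-j) tensor q^(j), j = 0..n-1, with weight 1/n."""
    omega = []
    for j in range(n):
        block = tensor(tensor_power(p, n - 1 - j), tensor_power(q, j))
        omega.extend(x / n for x in block)
    return omega

def common_part_matches(p, q, n):
    """True if p (x) omega and q (x) omega agree apart from one block each."""
    omega = catalyst(p, q, n)
    left = Counter(tensor(p, omega)) - Counter(x / n for x in tensor_power(p, n))
    right = Counter(tensor(q, omega)) - Counter(x / n for x in tensor_power(q, n))
    return left == right
```

This is the property used in (E21) to (E24): only the single block p^{⊗n}/n needs to be transformed, which is why the error is suppressed by the factor 1/n.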

Note that in the above we do not have a condition that the states are diagonal in the energy eigenbasis, because all states are diagonal in the energy eigenbasis of the trivial Hamiltonian.


b. General Hamiltonians

Here we prove a result analogous to the one of Sec. E 3 a, now for systems with a non-trivial Hamiltonian, in terms of the free energy. As above, we find that if we relax the condition on how cyclic the process must be, then we recover the usual second law. Below, by $p \xrightarrow{TO} q$ we mean that one can go from p to q by thermal operations.

Theorem 15. Let ε ≥ 0 and let $H_S$ be the Hamiltonian. Let p = spec(ρ), q = spec(ρ′) be the spectra of the input and output state, respectively, which are diagonal in the energy eigenbasis. If there exists a catalyst with spectrum $r = r_N$ of dimension N (with some Hamiltonian $H_C$) such that

$$p \otimes r \xrightarrow{TO} s, \qquad \|s - q \otimes r\|_1 \le \frac{\varepsilon}{\max\{\log(N), E_{max}\}}, \qquad (E25)$$

where $E_{max}$ is the maximal energy of the Hamiltonian $H_S$, then

$$F(\rho) \le F(\rho') - 2\varepsilon - \frac{\varepsilon\log(d)}{\log(N)} - h\!\left(\frac{\varepsilon}{\log(N)}\right), \qquad (E26)$$

where F is the standard free energy F = E + TS.

Conversely, if

$$F(p) < F(q), \qquad (E27)$$

then for all N sufficiently large there exists a catalyst with spectrum $r = r_N$ of dimension N such that

$$p \otimes r \xrightarrow{TO} s, \qquad \|s - q \otimes r\|_1 = \exp\!\big(-\Omega(\sqrt{\log(N)})\big). \qquad (E28)$$

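Proof. Suppose there is a catalyst r such that Eq. (E25) holds true. Then by Fannes inequality,

$$|H(s) - H(q \otimes r)| \le \varepsilon + \frac{\varepsilon\log(d)}{\log(N)} + h\!\left(\frac{\varepsilon}{\log(N)}\right). \qquad (E29)$$

Also,

$$|E(p) - E(q)| \le \varepsilon. \qquad (E30)$$

Therefore $|F(p) - F(q)| \le 2\varepsilon + \frac{\varepsilon\log(d)}{\log(N)} + h\!\left(\frac{\varepsilon}{\log(N)}\right)$, which gives (E26).

Conversely, suppose that (E27) holds. Then from the main result of Ref. [16] we know that for n sufficiently large

$$p^{\otimes n} \xrightarrow{TO} q_n, \qquad (E31)$$

with

$$\|q_n - q^{\otimes n}\|_1 \le \exp\!\big(-\Omega(\sqrt{\log(N)})\big). \qquad (E32)$$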

We then consider a catalyst with the following Hamiltonian:

$$H_C = \oplus_{k=1}^{N}\sum_{i=1}^{N} H_S^{(i)} \qquad (E33)$$

where $H_S^{(i)} = I \otimes \ldots \otimes H_S \otimes I \otimes \ldots \otimes I$, with $H_S$ acting on the i-th site, and the state of the catalyst is of the form (E20). We have

$$p \otimes \omega = \frac{1}{N}p^{\otimes N} \oplus \omega, \qquad \omega \otimes q = \omega \oplus \frac{1}{N}q^{\otimes N} \qquad (E34)$$

Now, let us note that

$$\frac{1}{N}p^{\otimes N} \oplus \omega \xrightarrow{TO} \omega \oplus \frac{1}{N}q_n. \qquad (E35)$$

Indeed we first apply the operation that switches the energy levels from the first term of the direct sum with the levels of its last term. Then we apply to the levels of the last term the operation that transforms $p^{\otimes n}$ into $q_n$, which was shown to exist in Ref. [16], as mentioned above. Then we reverse the above switching operation. Now, the result follows by

$$\|q \otimes \omega - \omega \oplus q_n\| \le \frac{1}{n}\|q^{\otimes n} - q_n\|, \qquad (E36)$$

(E32), and the fact that the dimension of the catalyst is $N = n2^{n-1}$.


Appendix F: Proofs of properties of Renyi entropies and divergences

In addition to collecting and proving useful properties of the Renyi divergences, we show in this section that by allowing error terms which are independent of the dimension of our system, the Renyi divergences needed for our second laws collapse to just two quantities.

1. Entropies in the negative α regime

In this section, we investigate the Renyi entropies in the α < 0 regime, and show that they are monotonically increasing in α.

Theorem 16 (Jensen's inequality). For any convex function $f$ and any positive weights $a_i$, the following inequality holds:

$$f\!\left(\frac{\sum_i a_i y_i}{\sum_i a_i}\right) \le \frac{\sum_i a_i f(y_i)}{\sum_i a_i}. \tag{F1}$$

The inequality is reversed for concave functions.

Using Theorem 16, we will prove the following lemma about the monotonicity of Hα(p) in the negative α regime.

Lemma 17. For all $\alpha' \le \alpha < 0$, $H_{\alpha'}(p) \le H_{\alpha}(p)$ for any probability distribution $p$.

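Proof. For $\alpha < 0$ and a probability distribution $p$, $H_\alpha(p) = \frac{1}{\alpha - 1} \log \sum_i p_i^{\alpha}$. We rewrite this expression in a slightly different form, by setting $\beta = -\alpha$ and defining

$$H_\beta(p) = \frac{-1}{1+\beta} \log \sum_i x_i^{\beta}, \tag{F2}$$

where $x_i = \frac{1}{p_i}$ for all $i$. We then prove that $\frac{\partial H_\beta(p)}{\partial \beta} \le 0$:

$$\begin{aligned}
\frac{\partial H_\beta(p)}{\partial \beta}
&= \frac{1}{(1+\beta)^2} \log \sum_i x_i^{\beta} - \frac{1}{1+\beta} \, \frac{1}{\sum_i x_i^{\beta}} \sum_i x_i^{\beta} \log x_i \\
&= \frac{1}{(1+\beta)^2} \, \frac{1}{\sum_i x_i^{\beta}} \left[ \sum_i x_i^{\beta} \log \sum_j x_j^{\beta} - (1+\beta) \sum_i x_i^{\beta} \log x_i \right] \\
&= \frac{1}{(1+\beta)^2} \, \frac{1}{\sum_i x_i^{\beta}} \left[ \sum_i x_i^{\beta} \log \sum_j x_j^{\beta} - \sum_i x_i^{\beta} \log x_i^{(1+\beta)} \right].
\end{aligned} \tag{F3}$$

By defining the function $f(x) = x \log x$, which is convex on $\mathbb{R}^+$, we can apply Theorem 16 by setting $a_i = x_i^{-1} = p_i$ and $y_i = x_i^{1+\beta}$. This implies that

$$\sum_i x_i^{\beta} \log \sum_j x_j^{\beta} - \sum_i x_i^{\beta} \log x_i^{(1+\beta)} \le 0, \tag{F4}$$

and hence (F3) is upper bounded by 0 due to the positivity of the first two factors.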
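The monotonicity claimed in Lemma 17 can be checked numerically on a small example. The sketch below uses the expression $H_\alpha(p) = \frac{1}{\alpha-1}\log\sum_i p_i^\alpha$ from the proof; the base-2 logarithm, the restriction to strictly positive $p_i$, and the function name are assumptions made here for illustration.

```python
import math


def renyi_entropy_negative(p, alpha):
    """H_alpha(p) = log2(sum_i p_i^alpha) / (alpha - 1), for alpha < 0.

    Assumes every p_i is strictly positive, since p_i^alpha diverges at 0.
    """
    if alpha >= 0:
        raise ValueError("this sketch only covers alpha < 0")
    if any(x <= 0 for x in p):
        raise ValueError("all probabilities must be strictly positive")
    return math.log2(sum(x ** alpha for x in p)) / (alpha - 1.0)


p = [0.5, 0.3, 0.15, 0.05]
alphas = [-8.0, -4.0, -2.0, -1.0, -0.5, -0.1]  # increasing order
values = [renyi_entropy_negative(p, a) for a in alphas]

# Lemma 17: alpha' <= alpha < 0 implies H_alpha'(p) <= H_alpha(p).
is_monotone = all(lo <= hi + 1e-12 for lo, hi in zip(values, values[1:]))
print(values, is_monotone)
```

Note that with this sign convention the values are negative for $\alpha < 0$; only their ordering in $\alpha$ matters for the lemma.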

2. Smoothing of Renyi entropies and divergences

Note that the entropies and divergences are monotonic in α (decreasing for Renyi entropies, and increasing for Renyi divergences). The inequality, however, can be reversed if smoothing to a nearby state is allowed. It should be noted that we have explicit methods for smoothing in the regions α < 1 and α > 1, which are otherwise independent of the value of α. Nevertheless, our results give general relations in terms of so-called smoothed entropies, and we will hence include them here for completeness.


a. Definitions of smoothing

Besides the exact Renyi entropies, their smoothed versions have also been considered in [35, 36], such that continuity with regard to small changes in probability distribution is preserved, and these quantities have more physical interpretations in terms of operational tasks. Their definitions are as follows:

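$$H_\alpha^{\epsilon}(p) = \begin{cases}
\max_{\bar p} H_\alpha(\bar p) & (\alpha < 0); \\
\min_{\bar p} H_\alpha(\bar p) & (0 \le \alpha \le 1); \\
\max_{\bar p} H_\alpha(\bar p) & (\alpha > 1),
\end{cases} \tag{F5}$$

where the optimization is over sub-normalized states $\bar p$ that are $\epsilon$-close to $p$ in terms of trace distance. Mathematically, $\bar p \in B^{\epsilon}(p)$, for $B^{\epsilon}(p) = \{\bar p : \frac{1}{2} \sum_i |\bar p_i - p_i| \le \epsilon\}$.

The smoothed Renyi divergences are similarly defined, by smoothing over the $\epsilon$-ball of sub-normalized states for the first argument, where maximization/minimization is taken depending on $\alpha$. Formally,

$$D_\alpha^{\epsilon}(p\|q) = \begin{cases}
\min_{\bar p} D_\alpha(\bar p\|q) & (\alpha < 0); \\
\max_{\bar p} D_\alpha(\bar p\|q) & (0 \le \alpha \le 1); \\
\min_{\bar p} D_\alpha(\bar p\|q) & (\alpha > 1),
\end{cases} \tag{F6}$$

where the optimization is over sub-normalized states $\bar p$ that are $\epsilon$-close to $p$.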

b. Technical lemmas

These quantities will be useful in considering approximate state transformations, where getting into any state close to the target state is sufficient. Also, smoothing allows us to reformulate an infinite family of conditions on the α-Renyi entropies as only two conditions, if states in the ε-ball of the original state are allowed. We express this in terms of the following lemma.

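Lemma 18. Given any distribution $p$, for $0 < \alpha < 1$ and $\epsilon > 0$, there exists a (sub-normalized) distribution $p' \in B^{\epsilon}(p)$, such that

$$H_0(p) \ge H_\alpha(p) \ge H_0(p') - \frac{\log \frac{1}{\epsilon}}{1 - \alpha}.$$

For $\alpha > 1$, there exists another smoothed distribution $p'' \in B^{\epsilon}(p)$, such that

$$H_\infty(p'') + \frac{\log \frac{1}{\epsilon}}{\alpha - 1} \ge H_\alpha(p) \ge H_\infty(p).$$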

Proof. To prove this statement, we construct such a smoothed state $p'$ which is within the $\epsilon$-ball of $p$, and show that the inequality above holds. A construction of $p''$ can be found in [45] for $\alpha > 1$.

Note that the elements $p_i = 0$ do not contribute to our calculations; furthermore, the quantities we calculate are invariant under permutations of $i$. Hence, without loss of generality we can arrange them such that $p_1 \le p_2 \le \cdots \le p_n$. For any $\epsilon$, denote by $j$ the maximum number such that

1. $\sum_{i=1}^{j} p_i \le \epsilon$,

2. $\sum_{i=1}^{j+1} p_i > \epsilon$.

The smoothed probability distribution $p'$ is then obtained by cutting all probabilities $p_i$ where $i \le j$. Also note that for $i \le j$, $p_i \le p_{j+1}$. Now, we evaluate a lower bound for the following quantity:

$$\begin{aligned}
\sum_{i=1}^{n} p_i^{\alpha} &\ge \sum_{i=1}^{j+1} p_i^{\alpha} \\
&\ge \sum_{i=1}^{j+1} p_i^{\alpha-1} p_i \\
&\ge p_{j+1}^{\alpha-1} \sum_{i=1}^{j+1} p_i \\
&> p_{j+1}^{\alpha-1} \cdot \epsilon \\
&\ge [\mathrm{rank}(p')]^{1-\alpha} \cdot \epsilon,
\end{aligned}$$

where the third inequality holds because $\alpha - 1 < 0$, and the last inequality holds because $p'_{j+1}$ is the minimum value in the smoothed distribution $p'$, and therefore must be smaller than $\frac{1}{\mathrm{rank}(p')}$. Taking the logarithm and dividing by $1 - \alpha > 0$,

$$\begin{aligned}
H_\alpha(p) = \frac{1}{1-\alpha} \log \sum_{i=1}^{n} p_i^{\alpha}
&\ge \log \mathrm{rank}(p') - \frac{\log \frac{1}{\epsilon}}{1-\alpha} \\
&\ge H_0(p') - \frac{\log \frac{1}{\epsilon}}{1-\alpha}.
\end{aligned}$$

It is worth noting that in extreme cases, where smoothing cannot be performed, the bound becomes trivial. To see this, note that the minimum value satisfies $p_1 \le \frac{1}{n}$, where $n$ is the rank of $p$. If one cannot smooth at all, then $\epsilon < \frac{1}{n}$. The bound then translates to

$$H_0(p') - \frac{\log \frac{1}{\epsilon}}{1-\alpha} \le \log n - \frac{1}{1-\alpha} \log n \le 0. \tag{F7}$$

This means that for values of $\epsilon < \frac{1}{n}$, the bound simply becomes the trivial statement $H_\alpha(p) \ge 0$.
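The construction used in this proof (sort, then cut the smallest probabilities while their total mass stays at most ε) is simple enough to try numerically. The following is a minimal sketch, assuming base-2 logarithms and $0 < \epsilon < 1$; the helper names are hypothetical and not taken from the paper.

```python
import math


def renyi_entropy(p, alpha):
    """H_alpha(p) = log2(sum_i p_i^alpha) / (1 - alpha), for 0 < alpha < 1."""
    return math.log2(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)


def cut_smallest(p, eps):
    """Sort p in increasing order and drop the first j entries, where j is the
    largest index whose cumulative mass is still <= eps. Returns the
    sub-normalized distribution p'."""
    ordered = sorted(x for x in p if x > 0)
    total, j = 0.0, 0
    while j < len(ordered) and total + ordered[j] <= eps:
        total += ordered[j]
        j += 1
    return ordered[j:]


def lemma18_lower_bound(p, alpha, eps):
    """H_0(p') - log2(1/eps) / (1 - alpha), using H_0(p') = log2 rank(p')."""
    p_prime = cut_smallest(p, eps)
    return math.log2(len(p_prime)) - math.log2(1.0 / eps) / (1.0 - alpha)


p = [0.4, 0.3, 0.2, 0.06, 0.03, 0.01]
alpha, eps = 0.5, 0.05
h_alpha = renyi_entropy(p, alpha)
h_zero = math.log2(len(p))
bound = lemma18_lower_bound(p, alpha, eps)
print(h_zero, h_alpha, bound)  # expected ordering: h_zero >= h_alpha >= bound
```

In this example the two smallest entries (total mass 0.04) are cut, so $p'$ has rank 4. For such a small distribution the lower bound is far from tight; it becomes informative mainly when the rank is large compared to $\log\frac{1}{\epsilon}/(1-\alpha)$.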

We now combine this lemma with the fact that the smoothed entropy $H_0^{\epsilon}(p)$ is obtained by minimizing over all sub-normalized states in the $\epsilon$-ball of $p$. More precisely, $H_0^{\epsilon}(p) = \min_{\bar p \in B^{\epsilon}(p)} H_0(\bar p) \le H_0(p')$. Hence, it is easy to obtain a corollary that corresponds to Lemmas 4.2 and 4.3 as stated in [45].

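Corollary 19. Given any distribution $p$, for $0 < \alpha < 1$ and $\epsilon > 0$,

$$H_\alpha(p) \ge H_0^{\epsilon}(p) - \frac{\log \frac{1}{\epsilon}}{1 - \alpha}.$$

For $\alpha > 1$,

$$H_\alpha(p) \le H_\infty^{\epsilon}(p) + \frac{\log \frac{1}{\epsilon}}{\alpha - 1}.$$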

From Lemma 18, we know that the Renyi entropies collapse to two smoothed quantities, $H_0(p)$ and $H_\infty(p)$, for $0 < \alpha < 1$ and $\alpha > 1$ respectively. In the next two lemmas, we show similar results for the smoothing of Renyi divergences. Our proofs are constructed in a similar way to the proof above: we specify smoothed states within some ε-ball such that the desired inequality holds, and then link this to the smooth min- or max-divergence.

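Lemma 20. Let $p, q$ be probability distributions such that $\mathrm{supp}(p) \subseteq \mathrm{supp}(q)$. Then for $\alpha > 1$ and $\epsilon > 0$, there exists a smoothed distribution $p'$ such that

$$D_\alpha(p\|q) \ge D_\infty(p'\|q) - \frac{\log \frac{1}{\epsilon}}{\alpha - 1}.$$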


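Proof. Note that we require $\mathrm{supp}(p) \subseteq \mathrm{supp}(q)$ so that $D_\infty(p\|q)$ does not diverge to infinity. Now, let us consider the set $Z_\delta = \{i : \frac{p_i}{q_i} \ge \delta\}$. Then the smoothed probability distribution $p'$ is defined by having $p'_i = \delta \cdot q_i$ for all $i \in Z_\delta$, and $p'_i = p_i$ otherwise. The statistical distance between $p$ and $p'$ is

$$d = \sum_{i \in Z_\delta} [p_i - p'_i] \le \sum_{i \in Z_\delta} p_i. \tag{F8}$$

Note that $d$ can be made equal to $\epsilon$ by tuning $\delta$ in a continuous manner. Then, we have

$$D_\infty(p'\|q) = \log \max_i \frac{p'_i}{q_i} = \log \delta. \tag{F9}$$

We now evaluate a lower bound on

$$\begin{aligned}
\sum_i p_i^{\alpha} q_i^{1-\alpha} &\ge \sum_{i \in Z_\delta} p_i^{\alpha} q_i^{1-\alpha} \\
&\ge \sum_{i \in Z_\delta} \left[\frac{p_i}{q_i}\right]^{\alpha-1} \cdot p_i \\
&\ge \delta^{\alpha-1} \cdot \sum_{i \in Z_\delta} p_i \\
&\ge \delta^{\alpha-1} \cdot \epsilon,
\end{aligned}$$

where the first inequality holds because of the positivity of $p_i$ and $q_i$ for all $i$, the third inequality holds since $\alpha - 1 > 0$, and the last inequality holds due to (F8). Then, taking the logarithm and dividing the whole expression by $\alpha - 1$, we have

$$D_\alpha(p\|q) = \frac{1}{\alpha-1} \log \sum_i p_i^{\alpha} q_i^{1-\alpha} \ge \log \delta + \frac{\log \epsilon}{\alpha - 1} = D_\infty(p'\|q) + \frac{\log \epsilon}{\alpha - 1}. \tag{F10}$$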
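To make the role of δ concrete, the sketch below finds the value of δ for which the clipped mass $d$ of (F8) equals ε and then checks the resulting bound (F10) on a small example. The bisection search is one possible way to "tune δ in a continuous manner" and is an assumption of this sketch, as are the base-2 logarithm, the requirement that all $q_i > 0$, and the function names.

```python
import math


def renyi_divergence(p, q, alpha):
    """D_alpha(p||q) = log2(sum_i p_i^alpha q_i^(1-alpha)) / (alpha - 1), alpha > 1."""
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log2(s) / (alpha - 1.0)


def clipped_mass(p, q, delta):
    """d(delta) = sum over Z_delta = {i : p_i/q_i >= delta} of (p_i - delta*q_i)."""
    return sum(pi - delta * qi for pi, qi in zip(p, q) if pi >= delta * qi)


def find_delta(p, q, eps, iters=200):
    """Bisection for d(delta) = eps; d is continuous and non-increasing in delta."""
    lo, hi = 0.0, max(pi / qi for pi, qi in zip(p, q))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if clipped_mass(p, q, mid) > eps:
            lo = mid
        else:
            hi = mid
    return hi


p = [0.7, 0.2, 0.1]
q = [0.25, 0.25, 0.5]
alpha, eps = 2.0, 0.1
delta = find_delta(p, q, eps)
d_inf_smoothed = math.log2(delta)  # D_inf(p'||q) = log2(delta), as in (F9)
lower_bound = d_inf_smoothed - math.log2(1.0 / eps) / (alpha - 1.0)
print(delta, renyi_divergence(p, q, alpha), lower_bound)
```

Here only the first entry has ratio above δ, so $d = 0.7 - 0.25\,\delta$ and the search converges to $\delta = 2.4$.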

Lemma 21. Let $p$ and $q$ be probability distributions. Then for $\epsilon > 0$, there exists a smoothed distribution $p'$ such that

$$D_\alpha(p\|q) \le D_0(p'\|q) + \frac{\log \frac{1}{\epsilon}}{1 - \alpha}. \tag{F11}$$

Proof. As in the proof of Lemma 20, we define a particular smoothing of $p$, called $p'$. Note also that terms with $p_i = 0$ or $q_i = 0$ are automatically discarded because they do not affect our calculations in any way. Firstly, we order the values such that $\frac{p_1}{q_1} \le \frac{p_2}{q_2} \le \cdots \le \frac{p_n}{q_n}$. Subsequently, we find an integer $j$ such that the two conditions below are satisfied:

1. $\sum_{i=1}^{j} p_i \le \epsilon$,

2. $\sum_{i=1}^{j+1} p_i > \epsilon$.

Taking $p'$ to be $p$ with all entries $p_i$, $i \le j$, cut, note that $D_0(p'\|q) = -\log \sum_{i=j+1}^{n} q_i$. Also, for any $i \ge j+1$, $\frac{p_i}{q_i} \ge \frac{p_{j+1}}{q_{j+1}}$.

With this, we can evaluate the following bound:

$$\sum_{i=j+1}^{n} q_i \le \frac{q_{j+1}}{p_{j+1}} \cdot \sum_{i=j+1}^{n} p_i \le \frac{q_{j+1}}{p_{j+1}}.$$


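Subsequently, we can evaluate a lower bound on the following summation:

$$\begin{aligned}
\sum_{i=1}^{n} p_i^{\alpha} q_i^{1-\alpha} &\ge \sum_{i=1}^{j+1} p_i^{\alpha} q_i^{1-\alpha} \\
&\ge \sum_{i=1}^{j+1} p_i \cdot \left[\frac{p_i}{q_i}\right]^{\alpha-1} \\
&\ge \left[\frac{p_{j+1}}{q_{j+1}}\right]^{\alpha-1} \cdot \sum_{i=1}^{j+1} p_i \\
&> \left[\frac{p_{j+1}}{q_{j+1}}\right]^{\alpha-1} \cdot \epsilon \\
&\ge \left[\frac{1}{\sum_{i=j+1}^{n} q_i}\right]^{\alpha-1} \cdot \epsilon,
\end{aligned}$$

where the third and fifth inequalities hold since $\alpha - 1 < 0$, and the others follow by straightforward manipulation.

It is useful to see that even in the case $j = 0$, where no smoothing can be done, the proof still holds, since now we know that $\epsilon < p_1$, where $p_1$ is the value that corresponds to the minimum of $\frac{p_i}{q_i}$. Note that in this case,

$$\sum_{i=1}^{n} p_i^{\alpha} q_i^{1-\alpha} \ge \left[\frac{p_1}{q_1}\right]^{\alpha-1} p_1 \ge \epsilon. \tag{F12}$$

This comes from two facts: $p_1 > \epsilon$, and $\frac{p_1}{q_1} \le 1$. To see why the latter holds, we argue by contradiction: assume that $\frac{p_1}{q_1} > 1$, and hence all other $\frac{p_i}{q_i} > 1$ also. Then for each $i$, $p_i > q_i$, and $\sum_{i=1}^{n} p_i > \sum_{i=1}^{n} q_i = 1$, which is impossible.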
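The smoothing used for Lemma 21 can be sketched in the same spirit: order the indices by the ratio $p_i/q_i$, cut the lowest-ratio entries while their $p$-mass stays at most ε, and compare $D_\alpha(p\|q)$ with the bound (F11). The code below is an illustrative sketch only; it assumes base-2 logarithms and $0 < \alpha < 1$, and its function names are hypothetical.

```python
import math


def renyi_divergence(p, q, alpha):
    """D_alpha(p||q) = log2(sum_i p_i^alpha q_i^(1-alpha)) / (alpha - 1), 0 < alpha < 1.

    Terms with p_i = 0 or q_i = 0 are discarded, as in the proof.
    """
    s = sum(pi ** alpha * qi ** (1.0 - alpha)
            for pi, qi in zip(p, q) if pi > 0 and qi > 0)
    return math.log2(s) / (alpha - 1.0)


def smoothed_d0(p, q, eps):
    """D_0(p'||q) = -log2(sum_{i > j} q_i), where the pairs are sorted by p_i/q_i
    and j is the largest index with cumulative p-mass <= eps."""
    pairs = sorted(((pi, qi) for pi, qi in zip(p, q) if pi > 0 and qi > 0),
                   key=lambda t: t[0] / t[1])
    total, j = 0.0, 0
    while j < len(pairs) and total + pairs[j][0] <= eps:
        total += pairs[j][0]
        j += 1
    return -math.log2(sum(qi for _, qi in pairs[j:]))


p = [0.05, 0.15, 0.8]
q = [0.5, 0.3, 0.2]
alpha, eps = 0.5, 0.1
d_alpha = renyi_divergence(p, q, alpha)
upper_bound = smoothed_d0(p, q, eps) + math.log2(1.0 / eps) / (1.0 - alpha)
print(d_alpha, smoothed_d0(p, q, eps), upper_bound)
```

In this example only the first entry is cut, so the remaining $q$-mass is 0.5 and $D_0(p'\|q) = 1$ bit.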

With these two lemmas, two simple corollaries with regard to smooth divergences can be obtained:

Corollary 22. Let $p$ and $q$ be distributions. Then for $\alpha > 1$ and $\epsilon > 0$,

$$D_\alpha(p\|q) \ge D_\infty^{\epsilon}(p\|q) - \frac{\log \frac{1}{\epsilon}}{\alpha - 1}. \tag{F13}$$

Corollary 23. Let $p$ and $q$ be distributions. Then for $\alpha < 1$ and $\epsilon > 0$,

$$D_\alpha(p\|q) \le D_0^{\epsilon}(p\|q) + \frac{\log \frac{1}{\epsilon}}{1 - \alpha}. \tag{F14}$$

Appendix G: Comparison to other models

Having used a very specific form of work system to invest/extract work, one might wonder if the derived work distance is general, i.e. whether by considering other proposed forms of battery systems, we arrive at the same quantity. We show that this can be the case, by using the ancilla system used in [9] as a battery instead of our work qubit. We then again apply our second laws to the initial and final state, including the battery, to derive an upper bound on W.

We start by providing the description of the battery used in [9]: denote by $A$ an ancilla system consisting of $n$ qubits, where the Hamiltonian $H_A = I$ is trivial and $n$ can be arbitrarily large. Its initial state is described as $A_i = |0\rangle\langle 0| \otimes 2^{-\lambda_1} I$. Physically this just implies that the state consists of $\lambda_1$ maximally mixed qubits, and the remaining $n - \lambda_1$ qubits are pure. Its final state is similarly defined as $A_f = |0\rangle\langle 0| \otimes 2^{-\lambda_2} I$. The Gibbs state of system $A$ is $\tau_A = 2^{-n} I$, which is maximally mixed.

What is interesting about this comparison is that while in our model work is stored in the form of energy, in [9] the work is quantified by means of purity, i.e. how many pure qubits we invest/create during the process of state transformation. The significance of information to work extraction has been discussed in various works [3, 46]. In particular, Landauer's principle [31, 47] states that any physical process that erases one bit of information (i.e., creates purity) in an environment of temperature T has a fundamental average work cost of kT ln(2). Similarly, by utilizing one bit of information stored (i.e., consuming purity) in a physical system, and allowing it to interact with a thermal bath at temperature T, one can draw an average work of kT ln(2).

We will now use this battery system $A$. Consider the conditions we derived for state transformation on the joint system $SA$, from $\rho$ to $\rho'$ on system $S$: for all $\alpha \ge 0$ (since the probability distributions have zeros, conditions on negative α become redundant),

$$D_\alpha(\rho \otimes A_i \| \rho^{\beta}_{SA}) \ge D_\alpha(\rho' \otimes A_f \| \tau_{SA}). \tag{G1}$$

From this we can derive an upper bound for the quantity $\lambda = \lambda_1 - \lambda_2$ as follows:

$$D_\alpha(\rho \| \rho^{\beta}_{S}) + n - \lambda_1 \ge D_\alpha(\rho' \| \rho^{\beta}_{S}) + n - \lambda_2, \tag{G2}$$

$$D_\alpha(\rho \| \rho^{\beta}_{S}) - D_\alpha(\rho' \| \rho^{\beta}_{S}) \ge \lambda_1 - \lambda_2, \tag{G3}$$

where we see the bound is similar to the bounds in (E7), except for a factor of kT ln(2). The quantity $\lambda = \lambda_1 - \lambda_2$ denotes the net gain of pure qubits in the process, as explained in [9]. By Landauer's principle, $\lambda$ gives a bound on the maximum amount of work extractable, as no more than kT ln(2) of work can be extracted from one pure bit of information. We thus see that our condition yields the same upper bound on W:

$$W \le kT \ln(2) \inf_{\alpha > 0} \left[ D_\alpha(\rho \| \rho^{\beta}_{S}) - D_\alpha(\rho' \| \rho^{\beta}_{S}) \right].$$
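The step from (G1) to (G2) uses the fact that the Renyi divergence of the battery state from the maximally mixed state equals the number of pure qubits, independently of α. The following sketch checks this on the spectra of $A_i$ and $A_f$ for a small battery; the base-2 logarithm and the helper names are assumptions made for illustration.

```python
import math


def battery_spectrum(n, lam):
    """Spectrum of |0><0|^{(n - lam)} tensor 2^{-lam} I on n qubits:
    2^lam eigenvalues equal to 2^{-lam}, the remaining ones zero."""
    support = 2 ** lam
    return [1.0 / support] * support + [0.0] * (2 ** n - support)


def renyi_divergence(p, q, alpha):
    """D_alpha(p||q) = log2(sum_i p_i^alpha q_i^(1-alpha)) / (alpha - 1), alpha != 1."""
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log2(s) / (alpha - 1.0)


n, lam1, lam2 = 5, 3, 1
tau = [2.0 ** -n] * (2 ** n)  # maximally mixed Gibbs state of the battery

for alpha in (0.5, 2.0, 7.0):
    d_initial = renyi_divergence(battery_spectrum(n, lam1), tau, alpha)
    d_final = renyi_divergence(battery_spectrum(n, lam2), tau, alpha)
    # Expect n - lam1 = 2 and n - lam2 = 4 for every alpha.
    print(alpha, d_initial, d_final)
```

Because both values are independent of α, the battery contributes the same constant $\lambda_1 - \lambda_2$ to every one of the conditions in (G1).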

There are, however, a few more subtle differences between these two models for work systems. For instance, note that this battery consists of qubits, and hence $\lambda_1$ and $\lambda_2$ take integer values. Ref. [9] has shown that they can be further generalized to take rational values. In our wit model, however, W can take any value, including irrational values.

Acknowledgements

We thank Robert Alicki, Piotr Cwiklinski, Milan Mosonyi, Sandu Popescu, Joe Renes, Marco Tomamichel and Andreas Winter for useful discussions. JO is supported by the Royal Society. MH is supported by the Foundation for Polish Science TEAM project cofinanced by the EU European Regional Development Fund. NN and SW are supported by the National Research Foundation and Ministry of Education (MOE), Singapore as well as MOE Tier 3 Grant "Random numbers from quantum processes" (MOE2012-T3-1-009).

[1] Clausius, R. Ueber die bewegende Kraft der Wärme und die Gesetze, welche sich daraus für die Wärmelehre selbst ableiten lassen. Annalen der Physik 155, 368–397 (1850).

[2] Horodecki, M., Horodecki, P. & Oppenheim, J. Reversible transformations from pure to mixed states and the unique measure of information. Phys. Rev. A 67, 062104 (2003). quant-ph/0212019.

[3] Dahlsten, O., Renner, R., Rieper, E. & Vedral, V. Inadequacy of von Neumann entropy for characterizing extractable work. New Journal of Physics 13, 053015 (2011).

[4] Del Rio, L., Aberg, J., Renner, R., Dahlsten, O. & Vedral, V. The thermodynamic meaning of negative entropy. Nature 474, 61–63 (2011).

[5] Horodecki, M. & Oppenheim, J. Fundamental limitations for quantum and nano thermodynamics (2011). arXiv:1111.3834.

[6] Aberg, J. Truly work-like work extraction (2011). arXiv:1110.6121.

[7] Egloff, D., Dahlsten, O. C., Renner, R. & Vedral, V. Laws of thermodynamics beyond the von Neumann regime. arXiv preprint arXiv:1207.0434 (2012).

[8] Skrzypczyk, P., Short, A. J. & Popescu, S. Extracting work from quantum systems. arXiv preprint arXiv:1302.2811 (2013).

[9] Faist, P., Dupuis, F., Oppenheim, J. & Renner, R. A quantitative Landauer's principle. arXiv preprint arXiv:1211.1037 (2012).

[10] Ruch, E. The diagram lattice as structural principle a. new aspects for representations and group algebra of the symmetric group b. definition of classification character, mixing character, statistical order, statistical disorder; a general principle for the time evolution of irreversible processes. Theoretical Chemistry Accounts: Theory, Computation, and Modeling (Theoretica Chimica Acta) 38, 167–183 (1975).

[11] Jonathan, D. & Plenio, M. B. Entanglement-assisted local manipulation of pure quantum states. Phys. Rev. Lett. 83, 3566–3569 (1999). quant-ph/9905071.


[12] Klimesh, M. Inequalities that Collectively Completely Characterize the Catalytic Majorization Relation. ArXiv e-prints (2007). 0709.3680.

[13] Turgut, S. Catalytic transformations for bipartite pure states. J. Phys. A, Math. Gen. 40, 12185–12212 (2007). 0707.0444.

[14] Renner, R. Security of Quantum Key Distribution. Ph.D. thesis (2005). quant-ph/0512258.

[15] Janzing, D., Wocjan, P., Zeier, R., Geiss, R. & Beth, T. Thermodynamic cost of reliability and low temperatures: Tightening Landauer's principle and the second law. Int. J. Theor. Phys. 39, 2717–2753 (2000). quant-ph/0002048.

[16] Brandao, F. G. S. L., Horodecki, M., Oppenheim, J., Renes, J. & Spekkens, R. W. The Resource Theory of Quantum States Out of Thermal Equilibrium. arXiv:1111.3812.

[17] Bennett, C. H., Bernstein, H. J., Popescu, S. & Schumacher, B. Concentrating partial entanglement by local operations. Phys. Rev. A 53, 2046–2052 (1996). quant-ph/9511030.

[18] Bennett, C. H. et al. Purification of noisy entanglement and faithful teleportation via noisy channels. Phys. Rev. Lett. 76, 722–725 (1996). quant-ph/9511027.

[19] Horodecki, M., Oppenheim, J. & Horodecki, R. Are the laws of entanglement theory thermodynamical? Phys. Rev. Lett. 89, 240403 (2002). quant-ph/0207177.

[20] Devetak, I., Harrow, A. W. & Winter, A. A resource framework for quantum Shannon theory (2005). quant-ph/0512015.

[21] Alicki, R. The quantum open system as a model of the heat engine. J. Phys. A: Math. Gen. 12, L103–L107 (1979).

[22] Allahverdyan, A. E. & Nieuwenhuizen, T. M. Extraction of work from a single thermal bath in the quantum regime. Phys. Rev. Lett. 85, 1799–1802 (2000). URL http://link.aps.org/doi/10.1103/PhysRevLett.85.1799.

[23] Feldmann, T. & Kosloff, R. Quantum lubrication: Suppression of friction in a first-principles four-stroke heat engine. Phys. Rev. E 73, 025107 (2006). URL http://link.aps.org/doi/10.1103/PhysRevE.73.025107.

[24] Linden, N., Popescu, S. & Skrzypczyk, P. How small can thermal machines be? The smallest possible refrigerator. Physical Review Letters 105, 130401 (2010).

[25] Gemmer, J., Michel, M. & Mahler, G. Quantum thermodynamics: Emergence of thermodynamic behavior within composite quantum systems (Springer Verlag, 2009).

[26] Hovhannisyan, K. V., Perarnau-Llobet, M., Huber, M. & Acín, A. The role of entanglement in work extraction. arXiv preprint arXiv:1303.4686 (2013).

[27] Alicki, R. & Fannes, M. Entanglement boost for extractable work from ensembles of quantum batteries. Phys. Rev. E 87, 042123 (2013). 1211.1209.

[28] Gelbwaser-Klimovsky, D., Alicki, R. & Kurizki, G. How much work can a quantum device extract from a heat engine? ArXiv e-prints (2013). 1302.3468.

[29] van Dam, W. & Hayden, P. Universal entanglement transformations without communication. Phys. Rev. A 67, 060302 (2003). quant-ph/0201041.

[30] Bennett, C. H. The thermodynamics of computation-a review. Int. J. Theor. Phys. 21, 905–940 (1982).

[31] Landauer, R. IBM J. Res. Develop. 5, 183 (1961).

[32] Cwiklinski et al., in preparation.

[33] van Erven, T. & Harremoes, P. Renyi divergence and majorization. arXiv:1001.4448 [quant-ph] (2010).

[34] van Erven, T. & Harremoes, P. Renyi divergence and Kullback-Leibler divergence. arXiv:1206.2459 [quant-ph] (2012).

[35] Renner, R. & Wolf, S. Simple and tight bounds for information reconciliation and privacy amplification. In Advances in Cryptology - ASIACRYPT 2005, Lecture Notes in Computer Science, 199–216 (Springer-Verlag, 2005).

[36] Datta, N. & Renner, R. Smooth entropies and the quantum information spectrum. Information Theory, IEEE Transactions on 55, 2807–2815 (2009).

[37] Tomamichel, M. A framework for non-asymptotic quantum information theory (2012). arXiv:1203.2142.

[38] Marshall, A. W., Olkin, I. & Arnold, B. C. Inequalities: Theory of Majorization and Its Applications. Springer Series in Statistics (Springer, 2011).

[39] Uhlmann, A. Wiss. Z. Karl-Marx-Univ. Leipzig 20, 633 (1971).

[40] Aubrun, G. & Nechita, I. Catalytic Majorization and p-Norms. Communications in Mathematical Physics 278, 133–144 (2008). arXiv:quant-ph/0702153.

[41] Ruch, E., Schranner, R. & Seligman, T. H. The mixing distance. J. Chem. Phys. (1978).

[42] Janzing, D., Wocjan, P., Zeier, R., Geiss, R. & Beth, T. Thermodynamic cost of reliability and low temperatures: Tightening Landauer's principle and the second law. Int. J. Theor. Phys. 39, 2717–2753 (2000).

[43] Horodecki, M., Oppenheim, J. & Winter, A. Partial quantum information. Nature 436, 673–676 (2005). quant-ph/0505062.

[44] Duan, R., Feng, Y., Li, X. & Ying, M. Multiple-copy entanglement transformation and entanglement catalysis. Phys. Rev. A 71, 042319 (2005). arXiv:quant-ph/0404148.

[45] Renner, R. & Wolf, S. Smooth Renyi entropy and applications. In Information Theory, 2004. ISIT 2004. Proceedings. International Symposium on, 233– (2004).

[46] Feynman Lectures on Computation (2000).

[47] Keyes, R. W. & Landauer, R. Minimal energy dissipation in logic. IBM J. Res. Dev. 14, 152–157 (1970).
