Page 1: arXiv:2012.01894v2 [quant-ph] 10 May 2021

Quantum stochastic processes and quantum non-Markovian phenomena

Simon Milz¹,* and Kavan Modi²,†

¹Institute for Quantum Optics and Quantum Information, Austrian Academy of Sciences, Boltzmanngasse 3, 1090 Vienna, Austria

²School of Physics and Astronomy, Monash University, Clayton, Victoria 3800, Australia
(Dated: May 11, 2021)

The field of classical stochastic processes forms a major branch of mathematics. They are, of course, also very well studied in biology, chemistry, ecology, geology, finance, physics, and many more fields of natural and social sciences. When it comes to quantum stochastic processes, however, the topic is plagued with pathological issues that have led to fierce debates amongst researchers. Recent developments have begun to untangle these issues and paved the way for generalizing the theory of classical stochastic processes to the quantum domain without ambiguities. This tutorial details the structure of quantum stochastic processes, in terms of the modern language of quantum combs, and is aimed at students in quantum physics and quantum information theory. We begin with the basics of classical stochastic processes and generalize the same ideas to the quantum domain. Along the way, we discuss the subtle structure of quantum physics that has led to troubles in forming an overarching theory for quantum stochastic processes. We close the tutorial by laying out many exciting problems that lie ahead in this branch of science.

CONTENTS

I. Introduction 2

II. Classical Stochastic Processes: Some Examples 3
   A. Statistical state 3
   B. Memoryless process 3
   C. Markov process 4
   D. Non-Markovian processes 4
   E. Stochastic matrix 5
      1. Transforming the statistical state 5
      2. Random process 6
      3. Markov process 6
      4. Non-Markovian process 7
   F. Hidden Markov model 8
   G. (Some) mathematical rigor 9

III. Classical Stochastic Processes: Formal Approach 10
   A. What then is a stochastic process? 10
   B. Kolmogorov extension theorem 11
   C. Practical features of stochastic processes 12
      1. Master equations 12
      2. Divisible processes 13
      3. Data processing inequality 14
      4. Conditional mutual information 15
   D. (Some more) mathematical rigor 16

IV. Early Progress on Quantum Stochastic Processes 17
   A. Quantum statistical state 17
      1. Decomposing quantum states 19
      2. Measuring quantum states: POVMs and dual sets 20
   B. Quantum stochastic matrix 21
      1. Linearity and tomography 21
      2. Complete positivity and trace preservation 22
      3. Representations 23
      4. Purification and Dilation 25
   C. Quantum Master Equations 26
   D. Witnessing non-Markovianity 28
      1. Initial correlations 28
      2. Completely positive and divisible processes 29
      3. Snapshot 30
      4. Quantum data processing inequalities 31
   E. Troubles with quantum stochastic processes 31
      1. Break down of KET in quantum mechanics 32
      2. Input / output processes 32
      3. KET and spatial quantum states 33

V. Quantum Stochastic Processes 34
   A. Subtleties of the quantum state and quantum measurement 34
   B. Quantum measurement and instrument 35
      1. POVMs, Instruments, and probability spaces 36
   C. Initial correlations and complete positivity 37
   D. Multi-time statistics in quantum processes 40
      1. Linearity and tomography 41
      2. Spatiotemporal Born rule and the link product 42
      3. Many-body Choi state 43
      4. Complete positivity and trace preservation 44
      5. ‘Reduced’ process tensors 45
      6. Testers: Temporally correlated ‘instruments’ 46
      7. Causality and dilation 47
   E. Some mathematical rigor: Generalized extension theorem (GET) 48

VI. Properties of Quantum Stochastic Processes 49
   A. Quantum Markov conditions and Causal break 49
      1. Quantum Markov processes 51
      2. Examples of divisible non-Markovian processes 53
   B. Measures of non-Markovianity for Multi-time processes 55
      1. Memory bond 55
      2. Schatten measures 57
      3. Relative entropy 57
   C. Quantum Markov order 58
      1. Non-trivial example of quantum Markov order 60

VII. Conclusions 62

Acknowledgments 63

References 63

∗ [email protected]
† [email protected]

I. INTRODUCTION

Many systems of interest, in both natural and social sciences, are not isolated from their environment. However, the environment itself is often far too large and far too complex to model efficiently and thus must be treated statistically. This is the core philosophy of open systems; it is a way to render the description of systems immersed in complex environments manageable, even though the respective environments are inaccessible and their full description out of reach. Quantum systems are no exception to this philosophy. If anything, they are more prone to be affected by their complex environments, be they stray electromagnetic fields, impurities, or a many-body system. It is for this reason that the study of quantum stochastic processes goes back a full century. The field of classical stochastic processes is a bit older, though not by much. Still, there are stark contrasts in the development of these two fields; while the latter rests on solid mathematical and conceptual grounds, the quantum branch is fraught with mathematical and foundational difficulties.

The 1960s and 1970s saw great advancements in laser technology, which enabled isolating and manipulating single quantum systems. This did not, of course, mean that unwanted environmental degrees of freedom were eliminated, which highlighted the need for a better and more formal understanding of quantum stochastic processes. It is in this era that great theoretical advancements were made in this field. Half a century on from these early developments, yet another quantum revolution is on the horizon: the one aimed at processing quantum information. As quantum engineering has advanced, many of the early results in the field of quantum stochastic processes have regained importance, and new problems have arisen that require a fresh look at how we characterize and model open quantum systems.

Central among these problems is the need to understand the nature of the memory that quantum environments carry. At its core, memory is nothing more than information about the past of the system we aim to model and understand. However, the presence of this seemingly harmless feature leads to highly complex dynamics for the system, which require tools for their description that differ from the ones used in the absence of memory. This is of particular importance for engineering fault-tolerant quantum devices, which are by design complex; the impact of memory effects will rise with increased miniaturization and read-out frequencies. Consequently, here, one aims to characterize the underlying processes with the hope of mitigating or outmaneuvering complex noise, making the operation of engineered devices robust to external noise. On the other hand, there are natural systems immersed in complex environments that have functional or fundamental importance in, e.g., biological systems. These systems, too, undergo open quantum processes with memory as they interact with their complex environments. Here, in order to exploit them for technological development or to understand the underlying physics, one aims to better understand the mechanisms that are at the heart of the complex quantum processes observed in nature.

For the reasons stated above, many books have been dedicated to this field of research over the years, e.g. [1–5]. In addition, the progress in both experimental and theoretical physics has been fast, leading to many review papers focusing on different facets of open quantum systems [6–12] and on the complex, multilayered structure of memory effects in quantum processes [10]. This tutorial adds to this growing literature and has its own distinct focus. Namely, we aim to answer two questions: how can we overcome the conceptual problems encountered in the description of quantum stochastic processes, and how can we comprehensively characterize multi-time correlations and memory effects in the quantum regime when the system of interest is immersed in a complex environment?

A key aim of this tutorial is to render the connection between quantum and classical stochastic processes transparent. That is, while there is a well-established formal theory of classical stochastic processes, does the same hold true for open quantum processes? And if so, how are the two theories connected? We thus begin with a pedagogical treatment of classical stochastic processes centered around several examples in Sec. II. Next, in Sec. III we formalize the elements of the classical theory and present several facets of the theory that are important in practice. In Sec. IV we discuss the well-known early results on the quantum side. Here, we also focus on the fundamental problems in generalizing the theory of quantum stochastic processes such that it is on an equal footing with its classical counterpart. Sec. V begins by identifying the features of quantum theory that impose a fundamentally different structure on quantum stochastic processes than that encountered in the description of classical processes. We then go on to detail the framework that allows one to generalize the classical theory of stochastic processes to the quantum domain. Finally, in Sec. VI we present various features of quantum stochastic processes, such as the distinction between Markovian and non-Markovian processes. Throughout the whole manuscript, we give examples that build intuition for how one ought to address multi-time correlations in an open quantum system. We close with several applications.

Naturally, we cannot possibly hope to do the vast field of open quantum system dynamics full justice here. The theory of classical stochastic processes is incredibly large, and its quantum counterpart is at least as large and complex. Here, we focus on several aspects of the field and introduce them by concrete example rather than aiming for absolute rigor. It goes without saying that there are countless facets of the field that will remain unexplored, and of what is known and well-established, we only scratch the surface in our presentation in this tutorial. We do, however, endeavor to present the intuition at the core of this vast field. While we aim to provide as many references as possible for further reading, we do so without a claim to comprehensiveness; many of the results that have been found in the field will be left unsaid, and far too much will not even be addressed.

II. CLASSICAL STOCHASTIC PROCESSES: SOME EXAMPLES

A typical textbook on stochastic processes would begin with a formal mathematical treatment, introducing the triplet (Ω, S, ω) of a sample space, a σ-algebra, and a probability measure. Here, we are not going to proceed in this formal way. Instead, we will begin with intuitive features of classical stochastic processes and then motivate the formal mathematical language retrospectively. We will then introduce and justify the axioms underpinning the theory of stochastic processes and present several key results in the theory of classical stochastic processes in the next section. The principal reason for introducing the details of the classical theory is that, later in the tutorial, we will see that many of these key results cannot be imported straightforwardly into the theory of quantum stochastic processes. We will then pivot to provide resolutions of how to generalize the features and key ingredients of classical stochastic processes to the quantum realm.

A. Statistical state

Intuitively, a stochastic process consists of sequences of measurement outcomes, and a rule that allocates probabilities to each of these possible sequences. Let us start with a motivating example of a simple process – that of tossing a die – to clarify these concepts. After a single toss, a die will roll to yield one of the following outcomes

R₁ = {⚀, ⚁, ⚂, ⚃, ⚄, ⚅}.  (1)

Here, R (for roll of the die) is called the event space, capturing all possible outcomes. If we toss the die twice in a row then the event space is

R₂ = {⚀⚀, ⚀⚁, . . . , ⚅⚄, ⚅⚅}.  (2)

While this looks the same as a single toss of two dice

R₂ = {⚀⚀, ⚀⚁, . . . , ⚅⚄, ⚅⚅},  (3)

the two experiments – tossing two dice in parallel, and tossing a single die twice in a row – can, depending on how the die is tossed, indeed be different. However, in both cases the event spaces are the same and grow exponentially with the number of tosses. For example, for three tosses the event space R₃ has 6³ = 216 entries.

While the event spaces for different experiments can coincide, the probabilities for the occurrence of different events generally differ. Any possible event rK ∈ RK has a probability

P(RK = rK), (4)

Figure 1. Classical die processes. Panel (a) denotes a fair toss; panel (b) denotes a perturbed toss; and in panel (c) the perturbation strength depends on the history.

where the boldface subscript K denotes the number of times the die is tossed, or the number of dice that are tossed, and RK is the random variable corresponding to K tosses. Throughout, we will denote the random variable at toss k by Rk and the specific outcome by rk, and we will use boldface notation for sequences. Importantly, two experiments with the same potential outcomes and the same corresponding probabilities cannot be statistically distinguished. For example, tossing two dice in parallel and hard tossing (see below) one die twice in a row yield the same probabilities and could not be distinguished, even though the underlying mechanisms are different. Consequently, we call the allocation of probabilities to possible events the statistical state of the die, as it contains all inferable information about the experiment at hand. In anticipation of our later treatment of quantum stochastic processes, we emphasize that this definition of state chimes well with the definition of quantum states, which, too, contain all statistical information that is inferable from a quantum system. Importantly, the respective probabilities not only depend on how the die is made, i.e., its bias, but also on how it is tossed. Since we are interested in the stochastic process and, as such, sequential measurements in time, we will focus on the latter aspect below.

B. Memoryless process

Let us now, to see how the probabilities PK emerge, look at a concrete ‘experiment’: the case where the die is tossed hard. For a single toss of a fair die, we expect the outcomes to be equally distributed,

P(R₁ = ⚀) = . . . = P(R₁ = ⚅) = 1/6.  (5)

Now, imagine this fair die is tossed ‘hard’ successively. By hard, we mean that it is shaken in between tosses – in contrast to merely being perturbed (see below). Then, importantly, the probability of future events does not depend on the past events; observing, say, ⚅ at some toss has no bearing on the probabilities of later tosses. In other words, a hard toss of a fair die is a fully random process that has no memory of the past. Consequently, this successive tossing of a single die k times is not statistically distinguishable from the parallel tossing of k unbiased dice.
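The statistical signature of memorylessness is easy to probe numerically. The following sketch (our own illustration, not part of the tutorial itself) simulates hard tosses of a fair die and checks that the empirical frequency of rolling a ⚅, conditioned on the previous outcome having been a ⚅, matches the unconditional frequency of 1/6:

```python
import random
from collections import Counter

def hard_toss_sequence(n_tosses, rng):
    """Simulate n hard tosses of a fair die: each outcome is drawn
    independently, so the process has no memory (Markov order 0)."""
    return [rng.randint(1, 6) for _ in range(n_tosses)]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
seq = hard_toss_sequence(200_000, rng)

# Marginal frequency of each face: close to 1/6 for a fair die.
marginal = Counter(seq)
for face in range(1, 7):
    assert abs(marginal[face] / len(seq) - 1 / 6) < 0.01

# Memorylessness: the frequency of a 6 immediately after a 6 is
# (up to sampling noise) the unconditional frequency of a 6.
after_six = [b for a, b in zip(seq, seq[1:]) if a == 6]
p_six_given_six = after_six.count(6) / len(after_six)
assert abs(p_six_given_six - 1 / 6) < 0.02
```

Conditioning on any other previous face would give the same result, which is precisely the Markov order 0 property discussed above.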

The memorylessness of the process is not affected if a biased die is tossed, e.g., a die with distribution

P(R = ⚀, ⚁, ⚂, ⚃, ⚄) = 4/25 and P(R = ⚅) = 1/5.  (6)

Here, while the bias of the die influences the respective probabilities, the dependence of these probabilities on prior outcomes solely stems from the way the die is tossed. Alternatively, suppose we toss two identical dice with the event space given in Eq. (3). Now, if we consider the aggregate outcomes (the sum of the outcomes of the two dice) 2, 3, . . . , 12, they do not occur with uniform probability. Nevertheless, the process itself remains random, as the future outcomes do not depend on the past outcomes. Processes without any dependence on past outcomes are often referred to as Markov order 0 processes. We now slightly alter the tossing of a die to encounter processes with higher Markov order.
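The aggregate outcomes of two fair dice can be worked out exactly. The short sketch below (our own illustration) confirms that the sums 2, . . . , 12 are not uniformly distributed, even though each toss remains independent of the past:

```python
from fractions import Fraction
from itertools import product

# Exact distribution of the sum of two fair dice: non-uniform,
# even though the process carries no memory of past tosses.
faces = range(1, 7)
sums = {}
for a, b in product(faces, faces):
    sums[a + b] = sums.get(a + b, 0) + Fraction(1, 36)

assert sums[2] == Fraction(1, 36)   # only the combination (1, 1)
assert sums[7] == Fraction(6, 36)   # six combinations: the most likely sum
assert sum(sums.values()) == 1      # the distribution is normalized
```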

C. Markov process

To introduce a dependence on prior outcomes, let us now ease the tossing: imagine placing the die on a book and then gently shaking the book horizontally for three seconds, see the depiction in Figure 1(b). We refer to this process as the perturbed die. The term ‘perturbed’ here highlights that the toss is only a small perturbation of the current configuration. In this process, the probability to tip to any one side is q, rolling to the opposite side is highly unlikely [13] (with probability s), while it is highly likely (with probability p) that the die stays on the same side. Concretely, suppose we start the die with ⚁; then the probability for the outcomes of the next roll will be

P(Rk ∣ Rk−1 = ⚁) = [q p q q s q]ᵀ,  (7)

where T denotes transpose, i.e., the probability distribution is a column vector. The perturbative nature of the toss means that p > q ≫ s, and normalization gives us p + 4q + s = 1. Above, Rk and Rk−1 are the random variables describing the die at the k-th and (k − 1)-th toss, respectively. The conditional probabilities in Eq. (7) denote the probabilities for the outcomes ⚀, ⚁, ⚂, ⚃, ⚄, ⚅ at the k-th toss, given that the (k − 1)-th toss yielded ⚁. For example, the probability for the die to yield the outcome Rk = ⚀ (i.e., to roll onto a side face) at the k-th toss, given that it yielded rk−1 = ⚁ in the previous toss, is P(Rk = ⚀ ∣ Rk−1 = ⚁) = q.

A word of caution is needed. In the literature, conditional probabilities often carry an additional subscript to denote how many previous outcomes the probability of the current outcome is conditioned on. For example, P1∣k would denote the probability of one (the current) outcome conditioned on the k previous outcomes, while Pk would represent a joint probability of k outcomes. Here, in slight abuse of notation, we use the same symbol for conditional probabilities as we used for one-time probabilities, e.g., in Eq. (5), and we omit additional subscripts. However, since the number of arguments always clarifies what type of probability is referred to, there is no risk of confusion, and we will maintain this naming convention also for the case of conditional probabilities that depend on multiple past outcomes.

In this example, even though the die may be unbiased, the toss itself is not, and the distribution for the future outcomes of the die depends on its current configuration. As such, the process remembers the current state. However, for the probabilities at the k-th toss, it is only the outcome at the (k − 1)-th toss that is of relevance, but none of the earlier ones. In other words, only the current configuration matters for future statistics; the earlier history does not. Such processes are referred to as Markov processes, or, as they ‘remember’ only the most current outcome, processes of Markov order 1. Importantly, as soon as any kind of memory effects are present, the successive tossing of a die can be distinguished from the independent, parallel tossing of several identical dice, as in this latter case, the statistics of the k-th die cannot depend on the (k − 1)-th die (or any other die).

Again, we emphasize that this process will remain Markovian even if the die is replaced by two dice or by a biased die. Similarly, the above considerations would not change if the perturbation depended on the number of the toss k, i.e., if the parameters of Eq. (7) were functions q(k), p(k), s(k). We will now discuss the case where this assumption is not satisfied, i.e., where the perturbation at the k-th toss can depend on past outcomes, and memory over longer periods of time starts to play a non-negligible role.
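To make Eq. (7) concrete, here is a small sketch (with illustrative parameter values of our own choosing, satisfying p + 4q + s = 1) that builds the conditional distribution of the perturbed die for any current face, using the fact that opposite faces of a standard die sum to 7:

```python
import numpy as np

# Illustrative parameters (not from the tutorial) with p > q >> s
# and the normalization p + 4q + s = 1 required by Eq. (7).
p, q, s = 0.60, 0.095, 0.02
assert abs(p + 4 * q + s - 1.0) < 1e-12

def opposite(face):
    """Opposite faces of a standard die sum to 7."""
    return 7 - face

def column(current):
    """Conditional distribution P(R_k | R_{k-1} = current):
    stay with probability p, flip to the opposite face with s,
    tip to each of the four side faces with q."""
    col = np.full(6, q)
    col[current - 1] = p
    col[opposite(current) - 1] = s
    return col

c = column(2)  # conditioned on the face showing two pips
assert c[1] == p and c[4] == s          # matches [q p q q s q]^T of Eq. (7)
assert abs(c.sum() - 1.0) < 1e-12       # a valid probability distribution
```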

D. Non-Markovian processes

Let us now modify the process of the last example a bit by changing the perturbation intensity as we go. Above, we considered the process where the die was placed on a book, and the book was shaken for three seconds. Suppose that after the first shake the die rolls onto a new side, say ⚁ ↦ ⚂. The process is such that, after the number of pips changes, the next perturbation has unit intensity. If this intensity is low enough then we are likely to see ⚂ ↦ ⚂, and if that happens – i.e., the number of pips is unchanged – then the intensity is doubled for the next shake; and we keep doubling the intensity until either the die rolls to a new value or the intensity reaches the value of eight units (the fourth intensity level), which we assume to be equal to shaking the die so strongly that its initial value does not influence future outcomes. After this, the shaking intensity is reset to the unit level. We have depicted this process in Figure 1(c).

In this example, to predict the future probabilities we not only need to know the current number of pips the die shows, but also its past values. That is, the probability of observing an event, say ⚂, after observing two consecutive ⚂ outcomes is different than if one had previously observed ⚂ and ⚁, i.e.,

P(⚂ ∣ ⚂, ⚂) ≠ P(⚂ ∣ ⚂, ⚁).  (8)

The necessity of remembering the past beyond the most recent outcome makes this process non-Markovian. On the other hand, here, we only have to remember the past four outcomes of the die, due to the resetting protocol of the perturbation strength. Concretely, the future probabilities are independent of the past beyond four steps. For example, we have

P(⚂ ∣ ⚂, ⚂, ⚂, ⚂, ⚀) = P(⚂ ∣ ⚂, ⚂, ⚂, ⚂, ⚅).  (9)


To be more precise, predicting the next outcome with the correct probabilities requires knowing the die’s configuration for the past four steps. That is, the future distribution is fully determined by the conditional probabilities

P(Rk ∣ Rk−1, . . . , R0) = P(Rk ∣ Rk−1, . . . , Rk−4),  (10)

where we only need to know a part (here, the last four outcomes) of the history.

As mentioned, the size of the memory is often referred to as the Markov order or memory length of the process. A fully random process – like the hard tossing of a die – has Markov order 0, and a Markov process has order 1. A non-Markovian process has order 2 or larger. This, in turn, implies that the study of non-Markovian processes contains Markovian processes as well as fully random processes as special cases. Indeed, most processes in nature will carry memory, and Markovian processes are the – well studied – exception rather than the norm [14].

In general, the complexity of a non-Markovian process is higher than that of the Markov process in the last subsection; this is because there is more to remember. Put less prosaically, the process has to keep a ledger of the past outcomes to carry out the correct type of perturbation at each point. And, in general, the size of this ledger, or the complexity, grows exponentially with the Markov order m: for a process with d different outcomes at each time (6 for a die), it is given by dᵐ. However, sometimes it is possible to compress the memory. For instance, in the above example, we only need to know the current configuration and the number of time steps it has remained unchanged; thus the size of the memory is linear in the Markov order for this example. Moreover, looking at histories longer than the Markov order will not reveal anything new and thus does not add to the complexity of the process.
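This memory compression can be made explicit in code. The sketch below is our own toy implementation, and the mapping from shaking intensity to the probability of keeping the face is entirely hypothetical; the point is only that the non-Markovian die can be propagated using the compressed memory (current face, number of steps unchanged) rather than a ledger of the last four outcomes:

```python
import random

INTENSITIES = [1, 2, 4, 8]  # doubled after each unchanged step, then reset

def stay_probability(intensity):
    """Hypothetical map from shaking intensity to the probability that
    the die keeps its face; intensity 8 is a full re-randomization."""
    return {1: 0.90, 2: 0.75, 4: 0.50, 8: 1 / 6}[intensity]

def step(state, rng):
    """Advance the compressed memory state (face, unchanged_steps)."""
    face, unchanged = state
    intensity = INTENSITIES[unchanged]
    if rng.random() < stay_probability(intensity):
        new_face, kept = face, True
    else:
        new_face = rng.choice([f for f in range(1, 7) if f != face])
        kept = False
    if not kept or intensity == 8:
        return (new_face, 0)           # face changed, or full shake: reset
    return (new_face, unchanged + 1)   # face kept: intensity doubles next

rng = random.Random(7)
state = (3, 0)
for _ in range(1000):
    state = step(state, rng)
assert state[0] in range(1, 7) and 0 <= state[1] <= 3
```

The state space of the memory here has only 6 × 4 entries, linear in the Markov order, instead of the 6⁴ histories a naive ledger would track.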

E. Stochastic matrix

Having discussed stochastic processes and memory at a general level, it is now time to look in more detail at the mathematical machinery used to describe them. A convenient way to model a stochastic process is the stochastic matrix, which transforms the current state of the system into the future state. It also lends itself to a clear graphical depiction of the process in terms of a circuit; see, e.g., Figure 2 for circuits corresponding to the three examples above. In what follows, we will write down the stochastic matrices corresponding to the three processes above. The future states can then be computed by following the circuit and performing the appropriate matrix multiplications.

1. Transforming the statistical state

Before describing the process, let us write down the state of the system at time k − 1. At any given time, the die has a probability of being in one of six states, not necessarily uniformly distributed. We can think of this distribution as the statistical state of the system:

P(Rk−1) = [P(⚀) P(⚁) P(⚂) P(⚃) P(⚄) P(⚅)]ᵀ.  (11)

Here again, T denotes transposition, i.e., the statistical state is a column vector.

Figure 2. Random, Markovian, and non-Markovian processes. The top panel shows the circuits for the random and perturbed die (Markov order 0 and 1): statistical states P(Rk−2), . . . , P(Rk+2) connected by maps Γ. In these cases, there are no extra lines of communication between the tosses (represented by boxes); only the system carries the information forward for a Markov process. The bottom panel shows the non-Markovian die. Here, information is sent between tosses in addition to what the system carries, namely the memory of the past states of the system (die), denoted by the thick line. The memory has to carry the information about the state of the die in the past four tosses to determine the intensity of the next perturbation.

Suppose the die in the (k − 1)-th toss rolled to rk−1. If we knew the conditional (or transition) probabilities P(rk ∣ rk−1), the probability to find the die rolled to rk in the k-th toss could be straightforwardly computed via

P(rk) = ∑_{rk−1} P(rk ∣ rk−1) P(rk−1).  (12)

This can be phrased more succinctly as

P(Rk) = Γ(k∶k−1) P(Rk−1),  (13)

where the stochastic matrix Γ(k∶k−1) is the mechanism by which the statistical state changes from time step k − 1 to k. For brevity, we will generally omit the subscript on Γ (the time at which it acts will be clear from the respective arguments it acts on) unless it is required for clarity. The elements of the stochastic matrix are called transition probabilities, as they indicate how two events at k and k − 1 are correlated.

Before examining the explicit stochastic matrices for the above examples of processes, let us first discuss their general properties. First, all entries of Γ are non-negative, as they correspond to transition probabilities. Second, to ensure that the l.h.s. of Eq. (13) is a probability distribution, the columns of the stochastic matrix sum to one, which is a direct consequence of the identity ∑_{rk} P(rk ∣ rk−1) = 1, which holds for all rk−1. On the other hand, the rows of Γ do not have to add to unity, as generally we have ∑_{rk−1} P(rk ∣ rk−1) ≠ 1 (this is also clear in Eq. (14) for a biased die below). In the case where the rows do add to 1, the matrix is called bistochastic, and it has some nice properties and applications [15], which we will not cover in detail in this tutorial; for example, any bistochastic matrix can be represented as a convex combination of permutation matrices, a fact known as Birkhoff’s theorem.
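These general properties are easy to verify numerically. A minimal sketch (with an illustrative 3-state stochastic matrix of our own choosing, not from the tutorial) checks non-negativity, column normalization, and the preservation of normalization under Eq. (13):

```python
import numpy as np

# A column-stochastic matrix: non-negative entries, columns summing to
# one, here for a toy 3-state system (values are purely illustrative).
Gamma = np.array([[0.7, 0.2, 0.1],
                  [0.2, 0.6, 0.3],
                  [0.1, 0.2, 0.6]])

assert (Gamma >= 0).all()
assert np.allclose(Gamma.sum(axis=0), 1.0)      # columns sum to 1 ...
assert not np.allclose(Gamma.sum(axis=1), 1.0)  # ... rows need not

# Eq. (13): the statistical state stays a probability distribution.
P = np.array([0.5, 0.3, 0.2])
P_next = Gamma @ P
assert abs(P_next.sum() - 1.0) < 1e-12
assert (P_next >= 0).all()
```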


2. Random process

Now, making the concept of stochastic matrices more concrete, we begin by constructing the stochastic matrix for the fully random process of tossing a die without memory. In this case, it does not matter what the current state of the die is, and the future state will be the one given in Eq. (11). This is achieved by the following matrix:

Γ(0) =
⎛ P(⚀) P(⚀) P(⚀) P(⚀) P(⚀) P(⚀) ⎞
⎜ P(⚁) P(⚁) P(⚁) P(⚁) P(⚁) P(⚁) ⎟
⎜ P(⚂) P(⚂) P(⚂) P(⚂) P(⚂) P(⚂) ⎟
⎜ P(⚃) P(⚃) P(⚃) P(⚃) P(⚃) P(⚃) ⎟
⎜ P(⚄) P(⚄) P(⚄) P(⚄) P(⚄) P(⚄) ⎟
⎝ P(⚅) P(⚅) P(⚅) P(⚅) P(⚅) P(⚅) ⎠.  (14)

As stated above, a fully random process has Markov order 0, which we denote by the extra superscript (0). Additionally, all the columns of the above Γ(0) add up to one, independent of whether or not the die is biased, while in general, i.e., when the die is biased, the rows do not add up to unity.

It is easy to check that the above stochastic matrix indeed leads to the correct transitions. Suppose the current state of the die is ⚅, i.e., P(Rk−1) = [0 0 0 0 0 1]ᵀ. The statistical state after the roll will be the one given in Eq. (11), i.e.,

P(Rk) = Γ(0) P(Rk−1) = [P(⚀) P(⚁) P(⚂) P(⚃) P(⚄) P(⚅)]ᵀ.  (15)

Evidently, this process does not care about the current state – the 'new' probabilities at the k-th toss do not depend on the previous ones – but merely independently samples from the underlying distribution corresponding to the bias of the die. As already mentioned, we could readily incorporate a temporal change of said bias by making it dependent on the number of tosses. However, as long as this dependence is only on the number of tosses, and not on the previous outcomes, we would still consider this process memoryless (strictly speaking, the die along with a clock represents a memoryless process). To avoid unnecessary notational clutter, we will always assume that the bias and/or the transition probabilities are independent of the absolute toss number but may depend on previous outcomes, as shown below.

For an unbiased die the above stochastic matrix will simply be

Γ(0) = (1/6)
⎛ 1 1 1 1 1 1 ⎞
⎜ 1 1 1 1 1 1 ⎟
⎜ 1 1 1 1 1 1 ⎟
⎜ 1 1 1 1 1 1 ⎟
⎜ 1 1 1 1 1 1 ⎟
⎝ 1 1 1 1 1 1 ⎠ , (16)

which is not only a stochastic, but a bistochastic map. Again, it is easy to check that the output is the uniform distribution

Γ(0) P(R_{k−1}) = P(R_k) = (1/6) [1 1 1 1 1 1]ᵀ , (17)

for any P(R_{k−1}).
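As a concrete numerical sketch of Eqs. (14) and (15), the snippet below builds a memoryless stochastic matrix from an illustrative bias of our own choosing (not from the text) and verifies that the output is the bias distribution regardless of the input state:

```python
import numpy as np

# Hypothetical pip probabilities of a biased six-sided die (our choice).
bias = np.array([0.25, 0.15, 0.15, 0.15, 0.15, 0.15])

# Markov-order-0 stochastic matrix, Eq. (14): every column equals the bias,
# so the next toss is independent of the current state.
gamma0 = np.tile(bias[:, None], (1, 6))
assert np.allclose(gamma0.sum(axis=0), 1.0)   # columns sum to one

# Current state: the die certainly shows six pips, as in Eq. (15).
p_prev = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0])
p_next = gamma0 @ p_prev
assert np.allclose(p_next, bias)              # output is the bias, whatever p_prev is
```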

3. Markov process

Let us now move to the perturbed die process, which we argued is a Markovian process. In this case the stochastic matrix has the form

Γ(1) =
⎛ P(⚀∣⚀) P(⚀∣⚁) ⋯ P(⚀∣⚅) ⎞
⎜ P(⚁∣⚀) P(⚁∣⚁) ⋯ P(⚁∣⚅) ⎟
⎜   ⋮       ⋮     ⋱    ⋮   ⎟
⎝ P(⚅∣⚀) P(⚅∣⚁) ⋯ P(⚅∣⚅) ⎠ , (18)

where, again, we have used the superscript (1) to signify that the underlying process is of Markov order 1.

The hallmark of this matrix is that it gives us different future probabilities, depending on the current configuration; the probability P(⚀∣⚅) to find the die showing ⚀ at the k-th toss, given that it showed ⚅ at the (k−1)-th toss, generally differs from the probability P(⚀∣⚁) to show ⚀ given that it previously showed ⚁. In contrast, for the fully random process above, both of these transition probabilities would be given by P(⚀).

Concretely, for the perturbed die process given in Eq. (7), the stochastic matrix will have the form

Γ(1) =
⎛ p q q q q s ⎞
⎜ q p q q s q ⎟
⎜ q q p s q q ⎟
⎜ q q s p q q ⎟
⎜ q s q q p q ⎟
⎝ s q q q q p ⎠ . (19)

Again, here the conditions p > q ≫ s and p + 4q + s = 1 are assumed, and we have P(⚀∣⚅) = s ≠ q = P(⚀∣⚁). It is easy to see that the normalization of the conditional probabilities implies that the columns of the stochastic matrix add to one. Additionally, here, the rows of Γ(1) add up to one, too, making it a bistochastic matrix.
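These normalization properties are quick to verify numerically; the values of p, q, and s below are illustrative choices of ours that satisfy the stated constraints:

```python
import numpy as np

# Illustrative values with p > q >> s and p + 4q + s = 1 (our choice).
p, q, s = 0.90, 0.024, 0.004

gamma1 = np.array([
    [p, q, q, q, q, s],
    [q, p, q, q, s, q],
    [q, q, p, s, q, q],
    [q, q, s, p, q, q],
    [q, s, q, q, p, q],
    [s, q, q, q, q, p],
])

# Normalization of the conditionals makes the columns sum to one; by the
# symmetry of this matrix the rows do too, so Γ^(1) is bistochastic.
assert np.allclose(gamma1.sum(axis=0), 1.0)
assert np.allclose(gamma1.sum(axis=1), 1.0)
```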

For a Markov process, the state P(R_k) is related to an earlier state P(R_j), with j < k, by repeated applications of the stochastic matrix

P(R_k) = Γ(1)(k:k−1) ⋯ Γ(1)(j+2:j+1) Γ(1)(j+1:j) P(R_j). (20)

Alternatively, we may describe the process from j to k with the stochastic matrix

Γ(1)(k:j) := Γ(1)(k:k−1) ⋯ Γ(1)(j+2:j+1) Γ(1)(j+1:j). (21)

This is clearly desirable, as the above stochastic matrix is simply obtained by matrix multiplications, which are easy to do on a computer. Another way to compute the probability for two sequential events, say r_k given that we saw event r_j at the respective times, is by employing Eq. (12):

P(r_k∣r_j) = ∑_{r_{j+1}, …, r_{k−1}} ∏_{i=j}^{k−1} P(r_{i+1}∣r_i). (22)


Figure 3. Memory in non-Markovian processes (panels: Markov order 2, 3, and 4; boxes carry the states P(R_{k−2}), …, P(R_{k+2}), connected by maps Ξ and memory lines). For processes with memory, besides the state of the system at a time/toss k, we need additional information – depicted by the additional memory lines – about the past to correctly predict future statistics. If only the probability of the next outcome is of interest, then a map Γ(m) of the form of Eq. (24) is sufficient; if all future probabilities are to be computed via the concatenation of a single map, then Ξ, given in Eq. (26), is required. Together, the system and memory undergo Markovian dynamics.

This is known as the Chapman-Kolmogorov equation. Here, we have summed over all trajectories between event r_j and event r_k, i.e., all possible sequences that begin with outcome r_j at t_j and end with outcome r_k at t_k.
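The equivalence between composing one-step matrices, Eq. (21), and summing over all intermediate trajectories, Eq. (22), can be checked numerically; the matrix below is a randomly generated stand-in for a one-step Γ(1):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
d = 6
gamma = rng.random((d, d))
gamma /= gamma.sum(axis=0)        # a generic one-step stochastic matrix

# Eq. (21): Γ(k:j) for three steps via matrix multiplication ...
gamma3 = np.linalg.matrix_power(gamma, 3)

# ... agrees with the Chapman-Kolmogorov sum over trajectories, Eq. (22).
rk, rj = 4, 1
path_sum = sum(
    gamma[rk, b] * gamma[b, a] * gamma[a, rj]
    for a, b in itertools.product(range(d), repeat=2)
)
assert np.isclose(path_sum, gamma3[rk, rj])
```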

4. Non-Markovian process

Above, we have required that the stochastic matrix Γ maps the statistical state P(R_j) at a single time to another single-time statistical state P(R_k). This was the correct way of computing future statistics, as they only depended on the current state of the system, but not on any additional memory. Now, turning our attention to non-Markovian dynamics, we will expand our view to consider processes that map multi-time statistical states, e.g., P(R_{j−1}, R_{j−2}, …, R_{j−m}), to either a single-time state, e.g., P(R_k), or a multi-time state, e.g., P(R_{k−1}, R_{k−2}, …, R_{k−m}), depending on what we aim to describe. This can be done in several ways, either by considering collections of stochastic maps, or a single stochastic map that acts on a larger space. We briefly discuss both of these options.

First, let us consider the stochastic matrix for the non-Markovian process described in Sec. II D, where the perturbation intensity depended on the sequence of previously observed numbers of pips the die showed. As mentioned before, for this example we need to know the current state and the number of times it has not changed – which we will denote by µ – to correctly predict future statistics. As the perturbation strength is reset after the die has shown the same number of pips three consecutive times, we have µ ∈ {0, 1, 2, 3}. For each µ, we can then write the stochastic matrix as

Γ(µ) =
⎛ Pµ(⚀∣⚀) Pµ(⚀∣⚁) ⋯ Pµ(⚀∣⚅) ⎞
⎜ Pµ(⚁∣⚀) Pµ(⚁∣⚁) ⋯ Pµ(⚁∣⚅) ⎟
⎜    ⋮        ⋮      ⋱    ⋮    ⎟
⎝ Pµ(⚅∣⚀) Pµ(⚅∣⚁) ⋯ Pµ(⚅∣⚅) ⎠ , (23)

where the superscript on the transition probabilities and the stochastic matrices denotes that they depend on the number of times the outcome has not changed. For µ = 3, the perturbation strength is such that the process becomes the random process given in Eq. (14), and µ = 4 is the same as µ = 0. Evidently, Eq. (23) defines four distinct stochastic matrices, one for each µ, that lead to distinct future statistics. For any given µ, Γ(µ) allows us to correctly predict the probability of the next toss of the die.

It is always possible to write down a family of stochastic matrices for any non-Markovian process. Given the current state and history, we make use of the appropriate stochastic matrix to get the correct future state of the system. In general, for Markov order m, there are at most d^m distinct histories, i.e., µ ∈ {0, …, d^m − 1}; each such history (prior to the current outcome) then requires a distinct stochastic matrix to correctly predict future probabilities. This exponentially growing storage requirement for distinct pasts highlights the complexity of a non-Markovian process.
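The pair (current face, µ) suffices to simulate such a process. The sketch below uses a toy family of matrices of our own design (uniform off-diagonal leakage rather than the p, q, s structure of Eq. (19)) purely to illustrate the bookkeeping of switching matrices based on the history:

```python
import numpy as np

rng = np.random.default_rng(4)

def perturbed_matrix(mu):
    # Toy Γ^(µ): the probability to repeat the current face weakens with µ,
    # and µ = 3 reproduces the fully random (uniform) process.
    stay = [0.90, 0.70, 0.40, 1.0 / 6.0][mu]
    g = np.full((6, 6), (1.0 - stay) / 5.0)
    np.fill_diagonal(g, stay)
    return g

# Simulating the process: (face, µ) is all the memory we need to carry.
face, mu = 0, 0
for _ in range(20):
    p_next = perturbed_matrix(mu)[:, face]   # column = distribution of the next toss
    new_face = int(rng.choice(6, p=p_next))
    mu = (mu + 1) % 4 if new_face == face else 0
    face = new_face
```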

On the other hand, such a collection of stochastic matrices for a process of Markov order m could equivalently be combined into one d × d^m matrix of the form

Γ(m) =
⎛ P(⚀∣⚀⋯⚀) P(⚀∣⚁⋯⚀) ⋯ P(⚀∣⚅⋯⚅) ⎞
⎜ P(⚁∣⚀⋯⚀) P(⚁∣⚁⋯⚀) ⋯ P(⚁∣⚅⋯⚅) ⎟
⎜     ⋮          ⋮       ⋱     ⋮     ⎟
⎝ P(⚅∣⚀⋯⚀) P(⚅∣⚁⋯⚀) ⋯ P(⚅∣⚅⋯⚅) ⎠ , (24)

that acts on d^m-dimensional probability vectors

P(R_K) = [P(⚀⋯⚀) P(⚁⋯⚀) ⋯ P(⚅⋯⚅)]ᵀ, (25)

to yield the correct future statistics, i.e., P(R_k) = Γ(m) P(R_K). Here, K denotes the last m tosses, and thus by R_K we denote the random variable corresponding to sequences of the last m outcomes starting at the (k−1)-th toss. As before, Γ(m) is a stochastic matrix, as all of its entries are positive and its columns sum to one. However, in contrast to the Markovian and the fully random case, it ceases to be a square matrix. We thus have to widen our understanding of a statistical 'state' from probability vectors of outcomes at one time/toss to probability vectors of outcomes for sequences of times/tosses. In quantum mechanics, this shift of perspective allows one to resolve many of the apparent paradoxes that appear to plague the description of quantum stochastic processes. In the following section, we will see a concrete example of this way of describing non-Markovian processes.

We have graphically depicted non-Markovian processes, with Markov orders 2, 3, and 4, in Figure 3. Here, the lines


above the boxes denote the memory that is passed to the future and required to correctly predict future statistics. Each box simply has to pass the information about the current state – which generally is a multi-time object – to future boxes, which, again, can make use of this information. Considering Figure 3, we can already see that the description of stochastic processes with memory provided above is somewhat incomplete. While Γ(m) allows us to compute the probabilities of the next outcome, given the last m outcomes, it only yields a one-time state, not an m-time state. While this is sufficient if we are only interested in the statistics of the next outcome, it is not enough to compute statistics further in the future. Concretely, we cannot let Γ(m) act successively to obtain all future statistics. Expressed more graphically, a map that allows us to fully compute statistics for a process of Markov order m needs m input and m output lines (see Figure 3). Naturally, such a map, which we will denote by Ξ, can always be constructed from Γ(m), as we discuss in more detail in the next section. Importantly, its action looks just like that of a square stochastic matrix:

Ξ(1) P(R_{k−1}, …, R_{k−m}) = P(R_k, …, R_{k−m+1}), (26)

which allows us to simply compute statistics via the concatenation of Ξ, just like in the Markovian case, and hence the superscript 1. In other words, we can think of any non-Markovian process as a Markovian process on a larger system. Graphically, this can easily be seen in Figure 3, where the system of interest (the die) plus the required memory lines form a Markovian process.

Returning to our discussion of the complexity of non-Markovian processes: usually, not all distinct pasts – even within the Markov order – lead to distinct futures, and memory can be compressed. This effect can already be seen for the perturbed die above, where, instead of 6³ = 216 stochastic matrices, we can compute the correct future by means of merely 4 stochastic matrices. We will not discuss the issue of memory compression in this tutorial, but details can be found in the vast literature on so-called ε-machines; see, for example, Refs. [16–18]. Finally, we emphasize that, while here we have been focusing on the underlying mechanisms through which the respective probabilities emerge, a stochastic process is also fully described once all joint probabilities for events are known. For example, considering a three-fold toss of a die, once the probabilities P(R2, R1, R0) are known, all probabilities for smaller sequences of tosses (say, for example, P(R2, R0)) as well as all conditional probabilities for those three tosses can be computed. Knowing the full joint distribution is thus equivalent to knowing the underlying mechanism.
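This recoverability of marginals and conditionals from the full joint distribution is easy to illustrate numerically; the joint distribution below is randomly generated, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
# A generic joint distribution P(R2, R1, R0) for three tosses of a die.
joint = rng.random((d, d, d))
joint /= joint.sum()

# Marginals arise by summing over the tosses we ignore ...
p_20 = joint.sum(axis=1)            # P(R2, R0)
p_0 = joint.sum(axis=(0, 1))        # P(R0)

# ... and conditionals follow from the same object.
p_2_given_0 = p_20 / p_0            # column r0 holds P(R2 | R0 = r0)
assert np.allclose(p_2_given_0.sum(axis=0), 1.0)
```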

F. Hidden Markov model

An important concept in many disciplines, and one that is crucial for deducing probabilities from sequences of measurement outcomes, is that of stationarity. For stochastic processes, stationarity means time translation symmetry.

Figure 4. Markov chains (panels a–c: Markov order 0, 1, and 2; nodes are the remembered outcomes and arrows carry transition probabilities such as P(h), P(h∣t), and P(h∣ht)). Given a time series of coin flips, we can deduce any of the above hidden Markov models. At memory length 2 we have a deterministic process, and therefore longer memory will not yield any more information. In other words, the Markov order of the process in panel (c) is 2.

That is, it does not matter when we flip a coin; we only need to consider its states up to the Markov order. This is useful because often we are interested in characterizing a process whose inner workings are hidden from us. In such a case, we can try to infer the inner workings by noting the statistics of the state of the system in a time series. For example, given a long sequence of coin-flip outcomes F_k ∈ {h, t}, we can determine the statistics for seeing 'heads' h and 'tails' t, or any other sequence, say hhttht. Of course, this requires that the total data size is much larger than any sequence whose probability we wish to estimate. From this data we construct a hidden Markov model for the system that will reproduce the statistics up to any desired Markov order [16–18].

An illuminating graphical representation of a stationary stochastic process is the so-called Markov chain, which is associated with the stochastic matrix. For simplicity of the diagram, let us consider a process with dichotomic outcomes, e.g., a coin flip with the random variables F_k (for flips). Again, this can be a fully random process, a Markov process, or a non-Markovian process, depending on how the coin is flipped. Suppose now that the coin is flipped a million times in succession and we are given the sequence of results. For simplicity, we assume stationarity, i.e., probabilities and conditional probabilities do not depend on the cardinal number of the coin toss. Under this assumption, from the observed results, we can compute how frequently one sees h or t, which is quantified by P(F_k). We might compute how often h flips to t or remains h and so on; this is quantified by P(F_k∣F_{k−1}) or the joint distributions


P(F_k, F_{k−1}) for all k. Analogously, we may also compute the probability of seeing longer sequences, like hhh, hht, etc. With all of this, we can obtain conditional probabilities of the form P(F_k∣F_{k−1}, F_{k−2}) and P(F_k∣F_{k−1}, F_{k−2}, F_{k−3}). Let us assume that both of these conditional probabilities coincide, which leads us to conclude that the Markov order of the process is 2 (technically, we should check that P(F_k∣F_{k−1}, F_{k−2}) = P(F_k∣F_{k−1}, …, F_{k−n}) for all n ≥ 2, but as it is unlikely that only longer memory exists, we consider this test for Markov order 2 sufficient).

In this case, following the ideas laid out below Eq. (24), the probabilities of future outcomes can be described by a single stochastic matrix of the form

Γ(2) =
⎛ P(h∣hh) P(h∣ht) P(h∣th) P(h∣tt) ⎞
⎝ P(t∣hh) P(t∣ht) P(t∣th) P(t∣tt) ⎠ . (27)

This map will act on a statistical state that has the form

P(Fk, Fk−1) = [P(hh) P(ht) P(th) P(tt)]T. (28)

The action of the stochastic matrix on the statistical state gives us the probability for the next flip:

P(F_{k+1}) = Γ(2) P(F_k, F_{k−1}) = [∑_{xy} P(h∣xy) P(xy)   ∑_{xy} P(t∣xy) P(xy)]ᵀ, (29)

where the sums run over xy ∈ {hh, ht, th, tt}.

Combining the probabilities for two successive outcomes into a single probability vector thus allows us to compute the probabilities for the next outcome in a Markovian fashion, i.e., by applying a single stochastic matrix to said probability vector. However, as already alluded to above, there is a slight mismatch in Eq. (29); while the random variables we look at on the r.h.s. are sequences of two successive outcomes, the random variable on the l.h.s. is a single outcome at the (k+1)-th toss. To obtain a fully Markovian model, one would rather desire a stochastic matrix that provides the transition probabilities from one sequence of two outcomes to another, i.e., a stochastic matrix Ξ that yields

P(Fk+1, Fk) = Ξ(1)P(Fk, Fk−1), (30)

where, for better bookkeeping, we formally distinguish between the random variables on the l.h.s. and the r.h.s. Additionally, we give Ξ an extra superscript to underline that it describes a process of Markov order one. To do so, it has to act on a larger space of random variables, namely, the combined previous two outcomes.

Now, in our case, it is easy to see that the action of Ξ(1) can be simply computed from Γ(2) as

P(F_{k+1}, F_k) = Ξ(1) P(F_k, F_{k−1}) = δ_{F_k F_k} Γ(2) P(F_k, F_{k−1}), (31)

where δ is the Kronecker delta, identifying the copy of F_k on the l.h.s. with that on the r.h.s. This, in turn, implies that Ξ(1) and Γ(2) contain the same information, and the distinction between them is more of a formal than of a fundamental nature. Importantly though, Ξ(1) can be applied in succession; e.g., we have P(F_{k+n}, F_{k+n−1}) = (Ξ(1))ⁿ P(F_k, F_{k−1}), while the same is not possible for Γ(2) due to the mismatch of input and output spaces.
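A minimal sketch of this embedding for the coin example, with a randomly generated Γ(2) standing in for the inferred one:

```python
import numpy as np

rng = np.random.default_rng(2)
# Two-flip basis ordered hh, ht, th, tt, more recent flip first (h=0, t=1).
gamma2 = rng.random((2, 4))
gamma2 /= gamma2.sum(axis=0)              # a generic Γ^(2): columns sum to one

# Embed Γ^(2) into the square Ξ^(1) of Eq. (31):
# Ξ[(f_new, f_k), (f_k, f_prev)] = Γ^(2)[f_new, (f_k, f_prev)], zero otherwise.
xi = np.zeros((4, 4))
for f_new in range(2):
    for f_k in range(2):
        for f_prev in range(2):
            xi[2 * f_new + f_k, 2 * f_k + f_prev] = gamma2[f_new, 2 * f_k + f_prev]

assert np.allclose(xi.sum(axis=0), 1.0)   # Ξ^(1) is square and stochastic

# Unlike Γ^(2), Ξ^(1) can be applied in succession.
p_pair = np.full(4, 0.25)                 # P(F_k, F_{k-1})
p_later = np.linalg.matrix_power(xi, 5) @ p_pair
assert np.isclose(p_later.sum(), 1.0)
```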

Eq. (30) then describes a Markovian model for the composite random variable F², which takes values in {hh, ht, th, tt}. As knowledge of all relevant (i.e., within the Markov order) transition probabilities allows the computation of all joint probabilities, such an embedding into a higher-dimensional Markovian process via a redefinition of the considered random variables is always possible. The corresponding Markovian model is often called a hidden Markov model. As a brief aside, we note that the amount of memory that needs to be considered in an experiment depends both on the intrinsic Markov order of the process at hand, as well as on the amount of information an experimenter can or wants to store. If, for example, one is only interested in correctly recreating transition probabilities P(R_k∣R_{k−1}) between adjacent times, but not necessarily higher-order transition probabilities, like, e.g., P(R_k∣R_{k−1}, R_{k−2}), then a Markovian model without any memory is fully sufficient (but will not properly reproduce higher-order transition probabilities).

Returning to our process, we have depicted the corresponding Markov chains for each case in Figure 4. For a fully random process, the Markov chain only has one state; after each flip, the state returns to itself, and the future probabilities do not change based on the past. For a Markov process, the chain has two states, and four transitions are possible. Finally, the non-Markovian process is chosen to be deterministic: hh always goes to th, and so on. Note that, here, as mentioned above, if we only care about transition probabilities P(F_k∣F_{k−1}), i.e., we only consider the last outcome and not the last two outcomes (i.e., we identify hh and ht, and th and tt), then the process of panel (c) in Figure 4 reduces to the simpler one in panel (b), but information is lost.

All of the panels of Figure 4 describe Markovian processes, however for different random variables. This is a general feature: any non-Markovian process can be represented by a hidden Markov model or a Markov chain by properly combining the past into a 'large enough' random variable [14] (for example, the random variable with values hh, th, ht, tt in panel (c)). This intuition will come in handy when we move to the case of quantum stochastic processes. But first, we need to formalize the theory of classical stochastic processes and show where the pitfalls lie when generalizing this theory to the quantum domain.

G. (Some) mathematical rigor

As mentioned, in our presentation of stochastic processes, we opt for intuitive examples rather than full mathematical rigor. However, laying out the fundamental concepts of probability theory in detail provides a more comprehensive picture of stochastic processes, and renders the generalizations needed to treat quantum processes mathematically straightforward.

The basic ingredient for the discussion of stochastic processes is the triplet (Ω, S, ω) of a sample space Ω, a σ-algebra S, and a probability measure ω. Intuitively, Ω is the set of all


events that can occur in a given experiment (for example, Ω could represent the myriad of microstates a die can assume, or the possible numbers of pips it can show), S corresponds to all the outcomes that can be resolved by the measurement device (for the case of the die, S could, for example, correspond to the number of pips the die can show, or to the less fine-grained information 'odd' or 'even'), and ω allocates a probability to each of these observable outcomes.

More rigorously, we have the following definition [19]:

Definition (σ-algebra). Let Ω be a set. A σ-algebra on Ω is a collection S of subsets of Ω, such that

• Ω ∈ S and ∅ ∈ S.

• If s ∈ S, then Ω \ s ∈ S.

• S is closed under (countable) unions and intersections, i.e., if s_1, s_2, ⋯ ∈ S, then ⋃_{j=1}^{∞} s_j ∈ S and ⋂_{j=1}^{∞} s_j ∈ S.

For example, if the sample space is given by Ω = {⚀, …, ⚅} and we only resolve whether the outcome of the toss of a die is odd or even, the corresponding σ-algebra is given by {{⚀, ⚂, ⚄}, {⚁, ⚃, ⚅}, ∅, Ω}, while in the case where we resolve the individual numbers of pips, S is simply the power set of Ω.
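For such a finite example, the defining properties of a σ-algebra can be checked exhaustively; the sketch below encodes the parity-resolving σ-algebra for a die:

```python
# The coarse-grained σ-algebra for a die when only parity is resolved.
omega = frozenset(range(1, 7))
odd, even = frozenset({1, 3, 5}), frozenset({2, 4, 6})
sigma = {frozenset(), omega, odd, even}

# Check the defining properties (for a finite collection, pairwise
# unions and intersections suffice).
assert omega in sigma and frozenset() in sigma
assert all(omega - s in sigma for s in sigma)                        # complements
assert all(a | b in sigma and a & b in sigma for a in sigma for b in sigma)
```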

A pair (Ω, S) is called a measurable space, as now we can introduce a probability measure for observable outcomes in a well-defined way:

Definition (Probability measure). Let (Ω, S) be a measurable space. A probability measure ω : S → R is a real-valued function that satisfies

• ω(Ω) = 1.

• ω(s) ≥ 0 for all s ∈ S.

• ω is additive for (countable) unions of disjoint events, i.e., ω(⋃_{j=1}^{∞} s_j) = ∑_{j=1}^{∞} ω(s_j) for s_j ∈ S and s_j ∩ s_{j′} = ∅ when j ≠ j′.

The corresponding triplet (Ω, S, ω) is then called a probability space [19]. As the name suggests, ω maps each event s_j to its corresponding probability, and, using the convention of the previous sections, we could have denoted it by P, and will do so in what follows. Evidently, in our previous discussions, we already made use of sample spaces, σ-algebras, and probability measures, without caring too much about their mathematical underpinnings.

The mathematical machinery of probability spaces provides a versatile framework for the description of stochastic processes, both on finitely and infinitely many times (see Sec. III D for an extension of the above concepts to the multi-time case).

So far, we have talked about processes that are discrete both in time and space. It does not make much sense to talk about the state of a die when it is in mid-air; nor does it make sense to attribute a state of 4.4 to a die. On the other hand, of course, there are processes that are continuous in both time and space. A classic example is Brownian motion [20], which requires that time be treated continuously. If not, the results lead to pathological situations where the kinetic energy of the Brownian particle blows up. Moreover, in such instances, the event space is the position of the Brownian particle and can take uncountably many different real values. Nevertheless, the central object in the theory of stochastic processes does not change; it remains the joint probability distribution for all events, which, in the case of infinitely many times, is a probability distribution on a rather complicated and not easy to handle σ-algebra. Below, we will discuss how, due to a fundamental result by Kolmogorov, it is sufficient to deal with finite distributions instead of distributions on σ-algebras on infinite Cartesian products of sample spaces. Finally, this machinery straightforwardly generalizes to positive operator valued measures (POVMs) as well as instruments, fundamental ingredients for the discussion of quantum stochastic processes.

III. CLASSICAL STOCHASTIC PROCESSES – FORMAL APPROACH

Up to this point, both in the examples we provided, as well as in the more rigorous formulation, we have somewhat left open what exactly we mean by a stochastic process, and what quantity encapsulates it. We will do so now, and provide a fundamental theorem for the theory of stochastic processes, the Kolmogorov extension theorem (KET), which allows one to properly define stochastic processes on infinitely many times, based on finite-time information.

A. What then is a stochastic process?

Intuitively, a stochastic process on a set of times T_k := {t_0, t_1, …, t_k}, with t_i ≤ t_j for i ≤ j, is the joint probability distribution over observable events. Namely, the central quantity that captures everything that can be learned about an underlying process is

P_{T_{k+1}} := P(R_k, t_k; R_{k−1}, t_{k−1}; …; R_0, t_0), (32)

corresponding to all joint probabilities

{P(R_k = r_k, R_{k−1} = r_{k−1}, …, R_0 = r_0)}_{r_k,…,r_0} (33)

to observe all possible realizations R_k = r_k at time t_k, R_{k−1} = r_{k−1} at time t_{k−1}, and so on. Evidently, the time label – which we omit above and for most of this tutorial – could also correspond to a label of the number of tosses, etc. We also adopt the compact notation P_{T_{k+1}}, as defined above, to denote a probability distribution on a set of k + 1 times.

More concretely, suppose the process we have in mind is tossing a die five times in a row. This stochastic process is fully characterized by the probability of observing all possible sequences of events

P(⚀, ⚀, ⚀, ⚀, ⚀), …, P(⚅, ⚀, ⚀, ⚀, ⚀), …
⋮                               ⋮
P(⚀, ⚅, ⚅, ⚅, ⚅), …, P(⚅, ⚅, ⚅, ⚅, ⚅), (34)


where, as before, we omit the respective time/tossing numberlabels.

From the joint distribution for five tosses, one can obtain any desired marginal distribution for fewer tosses, e.g., P(R3), or any conditional distribution (for five tosses), such as, for example, the conditional probability P(R2 = ⚅ ∣ R1 = ⚅, R0 = ⚅) to obtain outcome ⚅ at the third toss, having observed two ⚅ in a row previously; the conditional distributions in turn allow computing the stochastic matrices, which in turn allow casting processes as Markov chains. Having the total distribution is enough to determine whether a process is fully random, Markovian, or non-Markovian. This statement, however, is contingent on the respective set of times. Naturally, without any further assumptions on memory length and/or stationarity, knowing the joint probabilities of outcomes – and thus everything that can be learned – on a set of times T_k does not provide knowledge about the corresponding process on a different set of times T_{k′}. Consequently, we identify a stochastic process with the joint probabilities it displays with respect to a fixed set of times.

While joint probabilities contain all inferable information about a stochastic process, working with them is not always desirable, because their number of entries grows exponentially with the number of times. Nevertheless, they are the central quantity in the theory of classical stochastic processes. Our first aim when extending the notion of stochastic processes to the quantum domain will thus be to construct the analogue of joint distributions for time-ordered events. Doing so has been troubling for the same foundational reasons that make quantum mechanics so interesting. Most notably, quantum processes, in general, do not straightforwardly allow for a Kolmogorov extension theorem, which we discuss below. However, upon closer inspection, such obstacles can be overcome by properly generalizing the concept of joint probabilities to the quantum domain. Before doing so, we will first return to our more rigorous mathematical treatment and define stochastic processes in terms of probability spaces.

B. Kolmogorov extension theorem

While, for the example of the tossing of a die, a description of the process at hand in terms of joint probabilities on finitely or countably many times/tosses is satisfactory, this is not always the case. For example, even though it can in practice only be probed at finitely many points in time, when considering Brownian motion, one implicitly posits the existence of an 'underlying' stochastic process, from which the observed joint probabilities stem. Intuitively, for the case of Brownian motion, this underlying process should be fully described by a probability distribution that ascribes a probability to all possible trajectories the particle can take. Connecting the operationally well-defined finite joint probabilities a physicist can observe and/or model with the concept of an underlying process is the aim of the Kolmogorov extension theorem (KET).

Besides not being experimentally accessible, working with probability distributions on infinitely many times has the additional drawback that the respective mathematical objects are rather cumbersome to use, and would make the modeling of stochastic processes a fairly tedious business. Luckily, the KET allows one to deduce the existence of an underlying process on infinitely many times from properties of only finite objects. With this, modeling a proper stochastic process on infinitely many times amounts to constructing finite-time joint probabilities that 'fit together' properly.

Figure 5. Continuous process. A stochastic process is a joint probability distribution over all times. From a physical perspective, we can think of it as the probability of observing a trajectory s_k. This is highly desirable when talking about the motion of a Brownian particle. However, this interpretation requires some caution, as there are cases where trajectories may not be smooth or even continuous.

To see what we mean by this last statement, let P_{T_ℓ} be the joint distribution obtained for an experiment for some fixed ℓ times. For now, we will stick with the case of Brownian motion, and P_{T_ℓ} could correspond to the probability to find a particle at positions x_0, …, x_{ℓ−1} when measuring it at times T_ℓ = {t_0, …, t_{ℓ−1}}. As mentioned before, P_{T_ℓ} contains all statistical information for fewer times as marginals, i.e., for any subset T_k ⊆ T_ℓ we have

P_{T_k} = ∑_{T_ℓ∖T_k} P_{T_ℓ} =: P_{T_ℓ}^{∣T_k}, (35)

where we denote the sum over the times in the complement of the intersection of T_k and T_ℓ by T_ℓ∖T_k and use an additional superscript to signify that the respective joint probability distribution is restricted to a subset of times via marginalization. For simplicity of notation, here and in what follows, we always denote the marginalization by a summation, even though, in the uncountably infinite case, it would correspond to an integration.
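The consistency condition of Eq. (35) is straightforward to check numerically; the sketch below uses a randomly generated three-time distribution, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
# A joint distribution on three times, P_{T3}.
p_t3 = rng.random((d, d, d))
p_t3 /= p_t3.sum()

# Kolmogorov consistency, Eq. (35): marginalizing in stages agrees with
# marginalizing directly, so the finite distributions 'fit together'.
p_t2 = p_t3.sum(axis=0)                 # sum out the latest time
p_t1_staged = p_t2.sum(axis=0)
p_t1_direct = p_t3.sum(axis=(0, 1))
assert np.allclose(p_t1_staged, p_t1_direct)
```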

For classical stochastic processes, all probabilities on a set of times can be obtained from those on a superset of times by marginalization. We will call this consistency condition between the joint probability distributions of a process on different sets of times the Kolmogorov consistency condition. Naturally, consistency conditions hold in particular if the finite joint probability distributions stem from an underlying process on infinitely many times T ⊇ T_ℓ ⊇ T_k, where we leave the nature of the corresponding probability distribution P_T somewhat vague for now (see Sec. III D for a more thorough definition).

Importantly, the KET shows that satisfaction of the consistency condition on all finite sets T_k ⊆ T_ℓ ⊆ T is already sufficient to guarantee the existence of an underlying process on T. Specifically, the Kolmogorov extension theorem [3, 19, 21, 22] defines the minimal properties finite probability distributions have to satisfy in order for an underlying process to exist:

Theorem (KET). Let T be a set of times. For each finite T_k ⊆ T, let P_{T_k} be a (sufficiently regular) k-step joint probability distribution. There exists an underlying stochastic process P_T that satisfies P_{T_k} = P_T^{∣T_k} for all finite T_k ⊆ T iff P_{T_k} = P_{T_ℓ}^{∣T_k} for all T_k ⊆ T_ℓ ⊆ T.

Put more intuitively, the KET shows that, for a given family of finite joint probability distributions that satisfy consistency conditions [23], the existence of an underlying process that contains all of the finite ones as marginals is ensured. Importantly, this underlying process does not need to be known explicitly in order to properly model a stochastic process.

We emphasize that, in the (physically relevant) case where T is an infinite set, the probability distribution P_T is generally not experimentally accessible. For example, in the case of Brownian motion, the set T could contain all times in the interval [0, t], and each realization would represent a possible continuous trajectory of a particle over this time interval; see Figure 5. While we assume the existence of these underlying trajectories (and hence the existence of P_T) in experiments concerning Brownian motion, we often only access their finite-time manifestations, i.e., P_{T_k} for some T_k. The KET thus bridges the gap between the finite experimental reality and the underlying infinite stochastic process, in turn defining, in terms of accessible quantities, what one means by a stochastic process on infinitely many times. For this reason, many books on stochastic processes begin with the statement of the KET.

In addition, the KET also enables the modeling of stochastic processes: any mechanism that leads to finite joint probability distributions that satisfy a consistency condition is ensured to have an underlying process. For example, the proof of the existence of Brownian motion relies on the KET as a fundamental ingredient [24–27].

Loosely speaking, the KET holds for classical stochastic processes because there is no difference between 'doing nothing' and conducting a measurement but 'not looking at the outcomes' (i.e., summing over the outcomes at a time); otherwise, as we shall see in the discussion of quantum stochastic processes, the Kolmogorov consistency conditions are not satisfied. Put differently, the validity of the KET rests on the fundamental assumption that the interrogation of a system does not, on average, influence its state. This assumption generally fails to hold in quantum mechanics, which makes the definition of quantum stochastic processes somewhat more involved, and their structure much richer than that of their classical counterparts.

[Figure 6 graphic: nested hierarchy ℙ_T ⊇ ℙ_{T_k} ⊇ … ⊇ ℙ_{T_3} ⊇ ℙ_{T_2}, ranging from generic non-Markovian correlations, ℙ(X_k∣X_{k−1}, …, X_0) ≠ ℙ(X_k∣X′_{k−1}, …, X′_0), through indivisible processes, Γ(t:r) ≠ Γ(t:s)Γ(s:r), down to Markovian processes, master equations, and the data processing inequality; the Kolmogorov extension theorem proves the existence of the overarching ℙ_T.]

Figure 6. Hierarchy of multi-time processes. A stochastic process is the joint probability distribution over all times. Of course, in practice one only looks at finite-time statistics. However, the set of all k-time probability distributions P_{T_k} contains, as marginals, all j-time probability distributions P_{T_j} for j < k. Moreover, the sets of two- and three-time distributions play a significant role in the theory of stochastic processes.

C. Practical features of stochastic processes

Now that we have a formal definition of a stochastic process, let us ask what it is useful for. It is worth saying that working with a probability distribution of a large number of random variables is not desirable, as the complexity grows exponentially. However, for a given problem, what we care about is the structure of the stochastic process and what we may anticipate when we sample from this distribution. We depict the hierarchy of stochastic processes in Figure 6, and in this section we focus on the short end of the hierarchy, i.e., Markovian processes or non-Markovian processes with low Markov order.

Naturally, the examples in Sec. II and the formal theory in the last subsection only begin to scratch at the massive literature on stochastic processes. We, of course, cannot cover all facets of this field here. However, in practice, there are a few important topics that must be mentioned. Below we will discuss several common tools that one encounters in the field of stochastic processes; here, we do so to provide a quick overview rather than a thorough introduction to the field. First among the tools used in the field are master equations, which are employed ubiquitously in the sciences, finance, and beyond. Next, we will briefly cover methods to differentiate between Markovian and non-Markovian processes, as well as to quantify memory using tools of information theory. While many of these examples only deal with two-time correlations, we emphasize that there are problems that naturally require multi-time correlations.

1. Master equations

A master equation is a differential equation that relates the rate of change of probabilities to the current and past states of the system. Put simply, master equations are equations of motion for stochastic processes and thus provide the underlying mechanism by which the transition probabilities we discussed in the previous section come about. There are, of course, many famous master equations in physics: Pauli, Fokker-Planck, and Langevin, to name a few on the classical side. We will not delve into the details of this very rich topic here, and once again just begin to scratch the surface. We refer the reader to other texts for more in-depth coverage of master equations [20, 28, 29].

It will suffice for our purpose that a master equation, in general, has the following form [30]:

d/dt P(X_t) = ∫_s^t G(t, τ) P(X_τ) dτ,  (36)

where G(t, τ) is a matrix operator. The time derivative of the state at t depends on the previous states up to a time s, which is the memory length. If the memory length is infinite, then s → −∞. As mentioned before, such a master equation allows one, in principle, to compute the change of probabilities, given some information about the past of the system.

Since the master equation expresses the probabilities continuously in time, it may be tempting to think that a master equation is equivalent to a stochastic process as defined above by means of the KET. However, this is not the case, because a master equation needs at most joint probabilities of two times or fewer. Namely, the set of joint probability distributions

P_{T_2} := {P(X_b, X_a)}_{b>a}  ∀ b > a > 0  (37)

is sufficient to derive Eq. (36). The LHS can be computed by setting b = t and a = t − dt, while the RHS can be expressed as a linear combination of products of stochastic matrices Γ_{c:b}Γ_{b:a}, with c = t, b = τ ≥ s, and a = r < τ. In fact, the RHS is concerned with functions such as Γ_{c:a} − Γ_{c:b}Γ_{b:a}, which measure the temporal correlations between a and c, given an observation at b. In any case, these stochastic matrices only depend on joint distributions of two times, as seen in Eqs. (12) and (13), and are not concerned with multi-time statistics. Thus, the family of distributions in Eq. (37) suffices for the RHS. Formally, showing that the RHS can be expressed as a product of two stochastic matrices can be done by means of the Laplace transform [31, 32] or the ansatz known as the transfer tensor [33–35]. These technical details aside, master equations play an important practical role in the description of scenarios where only two-time probabilities and/or the change of single-time probabilities are required. By construction, they do not, however, allow for the computation of multi-time joint probabilities. In turn, this implies that they do not provide a full description of stochastic processes in the sense of the KET. Nonetheless, they constitute an important tool for the description of aspects of stochastic processes.

2. Divisible processes

To shed more light on the concept of master equations, let us consider a special case (which we will also encounter in the quantum setting). Specifically, let us consider a family of stochastic matrices that satisfy

Γ(t∶r) = Γ(t∶s)Γ(s∶r) ∀ t > s > r. (38)

Processes described by such a family are called divisible. Once the functional dependence of Γ(t:r) on t and r is known, one can build up the set of distributions contained in Eq. (37). It is easy to see that the family of processes satisfying Eq. (38) is a superset of the Markovian processes; that is, any Markov process will satisfy the above equation. However, there are non-Markovian processes that also satisfy the divisibility property [36]. Nevertheless, checking for divisibility is often far simpler than checking for the satisfaction of the Markov conditions, since the latter requires the collection of multi-time statistics, while the former can be decided based on two-time statistics only. Moreover, as we will see shortly, divisibility implies several highly desirable properties for the process.

A nice property of divisible processes is the corresponding master equation. Applying Eq. (38) to the LHS of Eq. (36), we get

[P(X_t) − P(X_{t−dt})]/dt = [Γ(t:t−dt) − 𝟙]/dt P(X_{t−dt}),  (39)

where 𝟙 is the identity matrix. Taking the limit dt → 0 yields the generator G_t := lim_{dt→0} [Γ(t:t−dt) − 𝟙]/dt. This is a time-local master equation in the sense that the derivative of P, in contrast to the more general case of Eq. (36), only depends on the current time t, but not on previous times. In turn, the generator is related to the stochastic matrix as Γ(t:t−dt) = exp(G_t dt), which is obtained by integration. When the process is stationary, i.e., symmetric under time translation, both Γ and G will be time independent.
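For a time-independent generator, the divisibility structure of Eq. (38) can be checked directly: exponentiating G over the interval t − r must agree with composing the exponentials over t − s and s − r. Below is a minimal sketch with a hypothetical two-state generator (columns summing to zero) and a hand-rolled Taylor-series matrix exponential:

```python
import numpy as np

gamma = 0.5
# Hypothetical time-independent generator: columns sum to zero,
# off-diagonal entries non-negative, so exp(G t) is stochastic
G = np.array([[-gamma, gamma],
              [gamma, -gamma]])

def expm(A, terms=40):
    """Matrix exponential via truncated Taylor series (small matrices)."""
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

t, s, r = 1.0, 0.6, 0.0
# Divisibility, Eq. (38): Gamma(t:r) = Gamma(t:s) Gamma(s:r)
lhs = expm(G * (t - r))
rhs = expm(G * (t - s)) @ expm(G * (s - r))
assert np.allclose(lhs, rhs)

# Each map is a valid stochastic matrix: columns sum to one
assert np.allclose(lhs.sum(axis=0), 1.0)
```

For a time-dependent generator the same composition law holds with time-ordered exponentials in place of exp(G t).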

A divisible Markovian process. To make the above more concrete, let us consider a two-level system that undergoes the following infinitesimal process:

Γ(t:t−dt) = (1 − γdt) (1 0; 0 1) + γdt (g_0 g_0; g_1 g_1).  (40)

The first part of the process is just the identity process, and the second part is a random process. Together, however, they form a Markov process. Using Eq. (39) we can derive the generator for the master equation. This process is very similar to the perturbed die in the last section, with the difference that here, we consider a process that is continuous in time; it takes any state P(X_{t−dt}) at t − dt to

P(Xt) = (1 − γdt)P(Xt−dt) + γdt G, (41)

where G = [g_0, g_1]^T. After some time τ = ndt, i.e., after n applications of the stochastic matrix, we have

P(X_τ) = (1 − γdt)^n P(X_t) + γ n dt G.  (42)

That is, the process relaxes any state of the system to the fixed point G exponentially fast, with rate γ. Many processes, such as thermalization, have such a form. In fact, one often associates Markov processes with exponential decay. However, as already mentioned above, such an identification is not exact, since there are non-Markovian processes that satisfy a divisible master equation, as we shall see now by means of two explicit examples (we will also encounter an explicit example of this phenomenon in the quantum case in Sec. VI A 2).
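The exponential relaxation can be illustrated by iterating Eq. (41) numerically; the fixed point g and rate γ below are hypothetical values chosen for illustration:

```python
import numpy as np

gamma, dt, n = 1.0, 0.001, 5000
g = np.array([0.3, 0.7])            # hypothetical fixed point, g0 + g1 = 1
P = np.array([1.0, 0.0])            # initial state

# Repeated application of Eq. (41): P -> (1 - gamma*dt) P + gamma*dt*g
for _ in range(n):
    P = (1 - gamma * dt) * P + gamma * dt * g

# After time tau = n*dt, the deviation from g has shrunk by the factor
# (1 - gamma*dt)^n, which approaches exp(-gamma*tau) as dt -> 0
tau = n * dt
expected = np.exp(-gamma * tau) * (np.array([1.0, 0.0]) - g) + g
assert np.allclose(P, expected, atol=1e-3)
```

The iteration makes the exponential decay towards the fixed point explicit without ever needing more than the one-time distribution.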


Figure 7. Stroboscopic divisible non-Markovian process. At each time t_j, each of the possible outcomes 0 and 1 occurs with probability 1/2 (for example, they could be drawn from urns with uniform distributions). At the final time t_4, the observed outcome is equal to the sum (modulo 2) of the previous three outcomes. While the stochastic map between any two points in time is completely random, and thus the process is divisible, the overall joint probability distribution shows multi-time memory effects (as laid out in the text).

A stroboscopic divisible non-Markovian process. As mentioned, divisibility and Markovianity do not coincide. To see this, we provide the following example, which comes from Ref. [37] and yields a stroboscopic (in the sense that we only consider it at fixed points in time) non-Markovian process that is divisible. Let us consider a single-bit process with x_j = 0, 1, each with probability 1/2, for j = 1, 2, 3. That is, the process yields random bits at the first three times. At the next time, we let x_4 = x_1 + x_2 + x_3, where the addition is modulo 2 (see Figure 7). It is easy to see that the stochastic matrix between any two times corresponds to a completely random process, making the process divisible. However, P(X_4, X_3, X_2, X_1) is not uniform; when x_1 + x_2 + x_3 = x_4 the probability is 1/8, and 0 otherwise. Consequently, there are genuine four-time correlations, but no two- or three-time correlations.

A process with long memory. Let us now consider a process where the probability of observing an outcome x_t is correlated with what was observed some time s ago:

P(X_t = x_t ∣ X_{t−s} = x_{t−s}) = p δ_{x_t, x_{t−s}} + (1 − p)/d.  (43)

Here d is the size of the system. This process only has two-time correlations, but it is non-Markovian, as the memory is long-ranged. A master equation of the type of Eq. (36) for this process can be derived by differentiating. For the sake of brevity, we forego this exercise.

As mentioned, master equations are a ubiquitously used tool for the description of stochastic processes, in the classical as well as the quantum (see below) domain. They allow one to model the evolution of the one-time probability distribution P(X_t). However, they are not well suited for the description of multi-time joint probabilities. This will be particularly true in the quantum case, where intermediate measurements, required to collect multi-time statistics, unavoidably influence the state of the system. For many real-world applications, though, knowledge of P(X_t) is sufficient, making master equations an indispensable tool for the modeling of stochastic processes. On the other hand, in order to analyze memory length and strength in detail, one must, particularly in the quantum case, go beyond the description of stochastic processes in terms of master equations (see Sec. V D). This widening of the horizon beyond master equations then also enables one to carry over the intuition developed for stochastic processes in the classical case to the quantum realm, as well as a rigorous definition of quantum stochastic processes in the spirit of the KET.

At this stage, it is worth pointing out why Markov processes are of interest in many cases, and how they fit into the picture. Suppose we are following the trajectory of a particle at position x at time t, which then moves to x′ at t′. If the difference in time is arbitrarily small, say δt, then for a physical process x′ cannot be too different from x, due to continuity. Thus, it is natural to write down a master equation to describe such a process. Since the future state will always depend on the current position, the process will be at least Markovian. Still, the process may have higher-order correlations, but they are often neglected for simplicity. Importantly, if the process is indeed memoryless, then master equations actually allow for the computation of all joint probability distributions and provide a complete picture of the process at hand. Due to their practical importance, we now provide some tools that are frequently used when dealing with memoryless processes, and to gauge deviations from Markovian statistics in an operationally accessible way. As before, this short overview is by no means intended to be comprehensive, but merely aims to provide a quick glimpse of possible ways to quantify memory.

3. Data processing inequality

Somewhat abstractly, a stochastic process can be understood as a state being processed in time. Memory, then, means that some information about the past of the state of the system at hand is stored and used at a later time to influence the future statistics of the system. Unsurprisingly, the mathematical means we use to make this intuition manifest and to quantify the presence of memory are borrowed from information theory. Here, we introduce them, starting from the special case of divisible processes.

One of the most useful properties of Markov (and, more generally, divisible) processes is the satisfaction of the data processing inequality (DPI). Suppose we are able to prepare the system in two possible initial states P(X_0) and R(X_0), and then subject each to a process Γ(t:0) to yield P(X_t) = Γ(t:0)P(X_0) and R(X_t) = Γ(t:0)R(X_0), respectively. The intuition behind DPIs is that the process has no mechanism to increase the distinguishability between two initial states unless it has some additional information.

For instance, a natural measure for distinguishing probability distributions is the so-called Kolmogorov distance or trace distance,

‖P(X) − R(X)‖_1 := (1/2) Σ_x |P(x) − R(x)|.  (44)

When two states are fully distinguishable, the trace distance will be 1, which is the maximal value it can assume. On the other hand, if the two distributions are the same, then the trace distance will be 0. The DPI guarantees that the distance between distributions is non-increasing under the action of stochastic maps, i.e.,

‖P(X_0) − R(X_0)‖_1 ≥ ‖P(X_t) − R(X_t)‖_1  (45)

for all times t > 0, and equality (for all pairs of initial distributions) holds if and only if the process is reversible. Additionally, for Markov processes the DPI will hold between any two times t ≥ s, i.e.,

∥P(Xs) − R(Xs)∥1 ≥ ∥P(Xt) − R(Xt)∥1. (46)

Conversely, if the distinguishability between two distributions increases at any point of their evolution, then the underlying dynamics cannot be Markovian, and stochastic maps Γ(t:s) between two points in time do not provide a full picture of the process at hand.
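The contraction of the trace distance is easy to observe numerically; the sketch below draws a random column-stochastic matrix and a random pair of initial distributions (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def trace_distance(p, r):
    """Kolmogorov (trace) distance, Eq. (44)."""
    return 0.5 * np.abs(p - r).sum()

d = 4
# A random column-stochastic matrix as a hypothetical one-step process
Gamma = rng.random((d, d))
Gamma /= Gamma.sum(axis=0)

# A random pair of initial distributions
P0 = rng.random(d); P0 /= P0.sum()
R0 = rng.random(d); R0 /= R0.sum()

# DPI, Eq. (45): distinguishability cannot increase under Gamma
assert trace_distance(Gamma @ P0, Gamma @ R0) <= trace_distance(P0, R0) + 1e-12
```

The inequality holds for every stochastic matrix, which is precisely why an observed increase in trace distance heralds non-Markovianity.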

There are many metrics and pseudo-metrics that satisfy the DPI, but not all do. For instance, the Euclidean norm, ‖P(X)‖_2 := √(Σ_x P(x)²), does not satisfy the DPI. As an example, consider a two-bit process with initial states P(X_0) ⊗ P_u and R(X_0) ⊗ P_u, where the second bit's state is the uniform distribution P_u. If the process simply discards the second bit, then the final Euclidean distance is simply ‖P(X_0) − R(X_0)‖_2. However, the initial Euclidean distance is exactly (1/√2) ‖P(X_0) − R(X_0)‖_2, i.e., discarding the second bit increases the Euclidean distance. Thus the class of functions that are contractive under the action of a stochastic matrix are typically good candidates to formulate DPIs.
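This counterexample can be checked directly; in the sketch below, marginalizing out the uniform second bit increases the Euclidean distance by a factor of √2 (the initial distributions are hypothetical):

```python
import numpy as np

euclid = lambda p, r: np.sqrt(((p - r) ** 2).sum())

# Two hypothetical single-bit distributions, each paired with a
# uniform second bit
P0, R0 = np.array([0.9, 0.1]), np.array([0.2, 0.8])
Pu = np.array([0.5, 0.5])
P_joint = np.kron(P0, Pu)
R_joint = np.kron(R0, Pu)

# Discarding the second bit is a stochastic map (marginalization)...
P_out = P_joint.reshape(2, 2).sum(axis=1)
R_out = R_joint.reshape(2, 2).sum(axis=1)

# ...yet the Euclidean distance *grows* by a factor of sqrt(2),
# so the 2-norm does not satisfy a DPI
d_in, d_out = euclid(P_joint, R_joint), euclid(P_out, R_out)
assert np.isclose(d_out, np.sqrt(2) * d_in)
assert d_out > d_in
```

The trace distance, by contrast, is unchanged under this marginalization, consistent with Eq. (45).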

The DPI plays an important role in information theory because it holds for two important quantities, the mutual information and the Kullback-Leibler divergence (also known as relative entropy). For a random variable, the Shannon entropy is defined as

H(X) := −Σ_x P(x) log[P(x)].  (47)

The mutual information between two random variables X and Y that possess a joint distribution P(X, Y) is defined as

H(X : Y) := H(X) + H(Y) − H(XY).  (48)

Here, H(X) is computed from the marginal distribution P(X) = Σ_y P(X, Y = y), and H(Y) is computed from the marginal distribution P(Y) = Σ_x P(X = x, Y). The corresponding DPI then is

H(X0 ∶ Y0) ≥ H(Xt ∶ Yt), (49)

under the action of a stochastic matrix. For Markov processes we have a stronger inequality,

H(Xs ∶ Ys) ≥ H(Xt ∶ Yt), (50)

for all times t ≥ s.

The relative entropy between two distributions P(X) and R(X) is defined as

H[P(X)‖R(X)] := −Σ_x P(x) log[R(x)/P(x)].  (51)

Note that this quantity is not symmetric in its arguments. The relative entropy is endowed with an operational meaning as the probability of confusion [38]; that is, if one is promised R(X) but given P(X) instead, then after n samples the probability of confusing P(X) for R(X) is quantitatively given by

Pr_conf = exp(−n H[P(X)‖R(X)]).  (52)

We will see later that a similar expression can be employed in the quantification of memory effects in quantum stochastic processes (see Sec. VI B 3). The corresponding DPI here has the form

H[P(X0)∥R(X0)] ≥ H[P(Xt)∥R(Xt)], (53)

under a stochastic transformation. For Markov processes, we get the stronger version

H[P(Xs)∥R(Xs)] ≥ H[P(Xt)∥R(Xt)], (54)

that holds for all t ≥ s.

The behavior of the relative entropy and related pseudo-metrics in quantum and classical dynamics is an ongoing research effort [39–41]. The meaning of all of these DPIs for Markov processes is that the system progressively loses information as time marches forward. This clearly has implications for our understanding of the second law of thermodynamics and the arrow of time. There are still other inequalities being discovered; see, e.g., Ref. [37] for the so-called monogamy inequality. For detailed coverage of the DPI see [42, 43]. Moreover, recently, researchers have employed the so-called entropy cone [44, 45] to infer causality in processes, which is closely related to many of our interests in this tutorial. However, for brevity, we do not go into these details here. We merely aimed to emphasize that metrics that satisfy the DPI can be used as a herald for non-Markovian behaviour based on two-time distributions only.
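As with the trace distance, the contraction of the relative entropy under a stochastic map can be verified numerically (the random map and distributions below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

def relative_entropy(p, r):
    """Kullback-Leibler divergence H[P||R], Eq. (51), in natural log."""
    return float(np.sum(p * np.log(p / r)))

d = 3
Gamma = rng.random((d, d))
Gamma /= Gamma.sum(axis=0)          # column-stochastic map

P = rng.random(d); P /= P.sum()
R = rng.random(d); R /= R.sum()

# DPI, Eq. (53): relative entropy is contractive under stochastic maps
assert relative_entropy(Gamma @ P, Gamma @ R) <= relative_entropy(P, R) + 1e-12
```

The same monotonicity holds for the mutual information of Eq. (49) when a stochastic map acts locally on one of the variables.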

4. Conditional mutual information

Naturally, we can go further in investigating the connection between memory and correlation measures from information theory. While Markov processes, i.e., processes with Markov order 1, satisfy the DPI, a general process with finite Markov order (introduced in Sec. II D) has vanishing conditional mutual information (CMI), mirroring the fact that such a process is conditionally independent of past outcomes that lie further back than a certain memory length (Markov order) ℓ.

For ease of notation, we will group the times t_n, …, t_0 on which the process at hand is defined into three segments: the history H = {t_0, …, t_{k−ℓ−1}}, the memory M = {t_{k−ℓ}, …, t_{k−1}}, and the future F = {t_k, …, t_n}. With this, the CMI of a joint probability distribution on history, memory, and future is defined as

H(F : H∣M) = H(F∣M) + H(H∣M) − H(F, H∣M),  (55)

where the conditional entropy is given by

H(X∣Y) = H(XY) − H(Y).  (56)


This latter quantity is the entropy of the conditional distribution P(X∣Y) and has a clear interpretation in information theory as the number of bits X must send to Y so that the latter party can reconstruct the full distribution.

Consequently, H(F : H∣M) is a measure of the correlations that persist between F and H once the outcomes on M are known. Intuitively then, for a process of Markov order ℓ, H(F : H∣M) should vanish as soon as M contains at least ℓ times. This can be shown by direct insertion. Recall that by means of (the general form of) Eq. (10), we can write P(F∣M, H) = P(F∣M) for a process of Markov order ℓ ≤ |M|, implying

P(F, H∣M) = P(F∣M) P(H∣M).  (57)

This means that H(F, H∣M) = H(F∣M) + H(H∣M) and, consequently, the CMI in Eq. (55) vanishes. Importantly, the CMI only vanishes for processes with finite Markov order (and |M| ≥ ℓ), but not in general. If the CMI vanishes, then the future is decoupled from the entire history given knowledge of the memory. Vanishing CMI can thus be used as an alternative, equivalent definition of Markov order.
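The vanishing of the CMI for a Markov process can be confirmed numerically for a three-time process, with one time each in H, M, and F (the stochastic matrix below is a hypothetical example):

```python
import numpy as np

def H(p):
    """Shannon entropy, Eq. (47), of a distribution of any shape."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Three-time Markov chain: history -> memory -> future
Gamma = np.array([[0.8, 0.4],
                  [0.2, 0.6]])
P0 = np.array([0.5, 0.5])
# P(f, m, h) = Gamma[f, m] Gamma[m, h] P0[h]
P = np.einsum('fm,mh,h->fmh', Gamma, Gamma, P0)

# CMI, Eq. (55), rewritten via Eq. (56):
# H(F:H|M) = H(F,M) + H(M,H) - H(M) - H(F,M,H)
P_fm = P.sum(axis=2)
P_mh = P.sum(axis=0)
P_m = P.sum(axis=(0, 2))
cmi = H(P_fm) + H(P_mh) - H(P_m) - H(P)

assert abs(cmi) < 1e-9              # Markov order 1: CMI vanishes
```

Running the same computation on the parity process of Figure 7 (with M a single time) yields a strictly positive CMI, as that process has no finite Markov order of 1.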

Following this interpretation, the Markov order encodes the complexity of the process at hand, as it is directly related to the number of past outcomes that need to be remembered to correctly predict future statistics; if there are d different possible outcomes at each time, then no more than d^ℓ different sequences need to be remembered. While, in principle, ℓ may be large for many processes, such processes can often be approximated by processes with short Markov order. This is, in fact, the assumption that is made when real-life processes are modeled by means of Markovian master equations.

Additionally, complementing the conditional independence between history and future, processes with vanishing CMI admit a so-called 'recovery map' R_{M→FM} that allows one to deduce P(F, M, H) from P(M, H) by means of a map that only acts on M (but not on H). Indeed, we have

P(F, M, H) = P(F∣M) P(M, H) =: R_{M→FM}[P(M, H)],  (58)

where we have added additional subscripts to clarify on what variables the respective joint probability distributions act. In spirit, the recovery map is analogous to the map Ξ^(1) we discussed in Sec. II F in the context of hidden Markov models, with the important difference that, here, the input and output spaces of R_{M→FM} differ.

While seemingly trivial, the above equation states that the future statistics of a process with Markov order ℓ can be recovered by only looking at the memory block. Whenever the memory block one looks at is shorter than the Markov order, any recovery map only approximately yields the correct future statistics. Importantly, though, the approximation error is bounded by the CMI between F and H [46, 47], providing an operational interpretation of the CMI, as well as quantifiable reasoning for the memory truncation of non-Markovian processes.
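A minimal numerical sketch of Eq. (58), for a hypothetical Markov-order-1 process with a single time each in H, M, and F: applying P(F|M) to P(M, H) recovers the full joint distribution exactly:

```python
import numpy as np

# Hypothetical Markov-order-1 process on three times (h -> m -> f)
Gamma = np.array([[0.7, 0.2],
                  [0.3, 0.8]])
P0 = np.array([0.6, 0.4])
P_fmh = np.einsum('fm,mh,h->fmh', Gamma, Gamma, P0)

# The recovery map of Eq. (58): act only on M, mapping
# P(M,H) -> P(F|M) P(M,H), without ever touching H
P_mh = P_fmh.sum(axis=0)
P_m = P_mh.sum(axis=1)
P_f_given_m = P_fmh.sum(axis=2) / P_m       # column m holds P(f|m)

recovered = np.einsum('fm,mh->fmh', P_f_given_m, P_mh)
assert np.allclose(recovered, P_fmh)
```

For a process whose Markov order exceeds |M|, the same construction only approximates P(F, M, H), with an error controlled by the CMI.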

While the treatment we provide here of concepts used to detect and quantify memory is necessarily cursory, there are two simple overall points that will carry over to the quantum case. On the one hand, in a process without memory, the distinguishability between distributions cannot increase, a fact mirrored by the satisfaction of the DPI. Put more intuitively, in a process without memory, information is leaked into the environment but never the other way round, leading to a 'wash-out' of distributions and a decrease in their distinguishability. On the other hand, memory is generally a question of conditional independence between outcomes in the future and the past. One way to make this concept manifest is by means of the CMI.

As we will see in Sec. VI C, many of these properties will also apply in some form to quantum processes of finite Markov order, with the caveat that the question of memory length possesses a much more layered answer in the quantum case than it does in the classical one.

D. (Some more) mathematical rigor

In this section, we discussed master equations as a means to model aspects of stochastic processes in a fashion that is continuous in time. This point of view is somewhat at odds with the discrete examples and definitions we discussed in the previous sections. As promised above, we shall now define what we mean by a stochastic process in more rigorous terms, and thus give a concrete meaning to the probability distribution P_T when |T| is infinite.

Before advancing, a brief remark is necessary to avoid potential confusion. In the literature, stochastic processes are generally defined in terms of random variables [3, 22], and above, we have already phrased some of our examples in terms of them. However, both in the previous examples, as well as in those that follow, explicit reference to random variables is not a necessity, and all of the results we present can be phrased in terms of joint probabilities alone. Thus, foregoing the need for a rigorous introduction of random variables and trajectories thereof, we shall phrase our formal definition of stochastic processes in terms of probability distributions only. For all intents and purposes, though, there is no difference between our approach and the one generally found in the literature.

To obtain a definition of stochastic processes on infinite sets of times, we will define stochastic processes, first for finitely many times, then for infinitely many, in terms of probability spaces, which we introduced in Sec. II G. This can be done by merely extending their definition to sequences of measurement outcomes at (finitely many) multiple times, like, for example, the sequential tossing of a die (with or without memory) we discussed above.

Definition (Classical stochastic process). A stochastic process on times α ∈ T_k with |T_k| = k < ∞ is a triplet (Ω_{T_k}, S_{T_k}, P_{T_k}) of a sample space

Ω_{T_k} = ⨉_{α∈T_k} Ω_α,  (59)

a σ-algebra S_{T_k} on Ω_{T_k}, and a probability measure P_{T_k} on S_{T_k} with P_{T_k}(Ω_{T_k}) = 1.

The symbol ⨉ denotes the Cartesian product for sets. Naturally, as already mentioned, the set T_k on which the stochastic process is defined does not have to contain times, but could, as in the case of the die tossing, contain general labels of the observed outcomes. Each Ω_α corresponds to a sample space at t_α, and the probability measure P_{T_k} : S_{T_k} → [0, 1] maps any sequence of outcomes at times {t_α}_{α∈T_k} to its corresponding probability of being measured. A priori, this definition of stochastic processes is not concerned with the respective mechanism that leads to the probability measure P_{T_k}; above, we have already seen several examples of how it emerges from the stochastic matrices we considered. However, as mentioned, once the full statistics P_{T_k} are known, all relevant stochastic matrices can be computed. Put differently, once P_{T_k} is known, there is no more information that can be learned about a classical process on T_k.

We now formally define a stochastic process on a set of times T, where |T| can be infinite. Using the mathematical machinery we introduced, this is surprisingly simple:

Definition. A stochastic process on times α ∈ T is a triplet (Ω_T, S_T, P_T) of a sample space

Ω_T = ⨉_{α∈T} Ω_α,  (60)

a σ-algebra S_T on Ω_T, and a probability measure P_T on S_T with P_T(Ω_T) = 1.

While almost identical to the analogous definition for finitely many times, conceptually there is a crucial difference between the two. Notably, P_T is not an experimentally reconstructable quantity unless |T| is finite. Additionally, here, we simply posit the σ-algebra S_T. However, generally, the explicit construction of this σ-algebra from scratch is not straightforward, and starting the description of a given stochastic process on times T from the construction of S_T is a daunting task, which is why, for example, the modeling of Brownian motion processes does not follow this route. Nonetheless, we often implicitly assume the existence of an 'underlying' process, given by (Ω_T, S_T, P_T), when discussing, for example, Brownian motion on finite sets of times. Connecting finite joint probability distributions to the concept of an underlying process is the main achievement of the Kolmogorov extension theorem, as we laid out in detail above.

IV. EARLY PROGRESS ON QUANTUM STOCHASTIC PROCESSES

Our goal in the present section, as well as the next, will be to follow the narrative presented in the last two chapters to obtain a consistent description of quantum stochastic processes. However, the subtle structure of quantum mechanics will generate technical and foundational problems that will challenge our attempts to generalize the theory of classical stochastic processes to the quantum domain. Nevertheless, it is instructive to understand the kernel of these problems before we present the natural generalization in the next section. Thus we begin with the elements of quantum stochastic processes that are widely accepted. It should be noted that we assume a certain level of mastery of quantum mechanics from the reader; namely, statistical quantum states, generalized quantum measurements, composite systems, and unitary dynamics. We refer readers unfamiliar with these standard elements of quantum theory to textbooks on quantum information theory, e.g., [48–50]. However, for completeness, we briefly introduce some of these elements in this section.

The intersection of quantum mechanics and stochastic processes dates back to the inception of quantum theory. After all, a quantum measurement itself is a stochastic process. However, the term quantum stochastic process means much more than that a quantum measurement has to be interpreted probabilistically. Perhaps the von Neumann equation (also due to Landau) is the first instance where elements of stochastic processes come together with those of quantum mechanics. Here, the evolution of a (mixed) quantum state is written as a master equation, though this equation is fully deterministic. Nevertheless, a few years after the von Neumann equation, genuine phenomenological master equations appeared to explain atomic relaxations and particle decays [51]. Later, further developments were made as necessitated; e.g., Jaynes introduced what is now known as a random unitary channel [52].

Serious and formal studies of quantum stochastic processes began in the late 1950s and early 1960s. Two early discoveries were the exact non-Markovian master equation due to Nakajima and Zwanzig [53, 54], as well as the phenomenological study of the maser and laser [55–58]. It took another decade for the derivation of the general form of Markovian master equations [59, 60]. In the early 1960s, Sudarshan et al. [61, 62] generalized the notion of the stochastic matrix to the quantum domain, which was rediscovered in the early 1970s by Kraus [63].

Here, in a sense, we follow the historic route by not directly and fully generalizing classical stochastic processes to the quantum domain, but rather doing so piecewise, with an emphasis on the problems encountered along the way. We begin by introducing the basic elements of quantum theory and move on to quantum stochastic matrices (also called quantum channels, quantum maps, or dynamical maps), and discuss their properties and representations. This then lays the groundwork for a consistent description of quantum stochastic processes that allows one to incorporate genuine multi-time probabilities.

A. Quantum statistical state

As with the classical case, we begin by defining the notion of a quantum statistical state. A (pure) quantum state ∣ψ⟩ is a ray in a d-dimensional Hilbert space H_S (where we employ the subscript S for system). Just like in the classical case, d corresponds to the number of perfectly distinguishable outcomes. Any such pure state can be written in terms of a basis:

∣ψ⟩ = ∑_{s=1}^d c_s ∣s⟩ , (61)

where {∣s⟩} is an orthonormal basis, the c_s are complex numbers, and we assume d < ∞ throughout this article. Thus the quantum state is a complex vector, which is required to satisfy the property ⟨ψ∣ψ⟩ = 1, implying ∑_s ∣c_s∣^2 = 1. It may


be tempting to think of ∣ψ⟩ as the quantum generalization of the classical statistical state P. However, as mentioned, a state that is represented in the above form is pure, i.e., there is no uncertainty about what state the system is in. To account for potential ignorance, one introduces density matrices, which are better suited to fill the role of quantum statistical states.

Density matrices are written in the form

ρ = ∑_{j=1}^n p_j ∣ψ_j⟩⟨ψ_j∣ , (62)

which can be interpreted as an ensemble of pure quantum states {∣ψ_j⟩}_{j=1}^n that are prepared with probabilities p_j such that ∑_{j=1}^n p_j = 1. Such a decomposition is also called a convex mixture. Naturally, pure states are special cases of density matrices, where p_j = 1 for some j. In other words, density matrices represent our ignorance about which element of the ensemble, i.e., which exact pure quantum state, we possess. It is important, though, to add a qualifier to this statement: seemingly, Eq. (62) provides the ‘rule’ by which the statistical quantum state at hand was prepared. However, this decomposition in terms of pure states is neither unique, nor do the states ∣ψ_j⟩ that appear in it have to be orthogonal. For any non-pure density matrix, there are infinitely many ways of decomposing it as a convex mixture of pure states [64–66]. This is in stark contrast to the classical case, where any probability vector can be uniquely decomposed as a convex mixture of perfectly distinguishable ‘pure’ states, i.e., events that happen with unit probability.

For a d-dimensional system, the density matrix is a d × d square matrix (i.e., an element of the space B(H) of bounded operators on the Hilbert space H):

ρ = ∑_{r,s=1}^d ρ_{rs} ∣r⟩⟨s∣ and ρ ∈ B(H). (63)

Due to physical considerations, like the necessity for probabilities to be real, positive, and normalised, the density matrix must be

• Hermitian: ρ_{rs} = ρ*_{sr},

• positive semidefinite: ⟨x∣ρ∣x⟩ ≥ 0 for all ∣x⟩, and

• unit-trace: ∑_r ρ_{rr} = 1.

Throughout, we will denote positive semidefiniteness by ρ ≥ 0. As noted above, the density matrix is really the generalization of the classical probability distribution P. In fact, a density matrix that is diagonal in the computational basis is just a classical probability distribution. In turn, the off-diagonal elements of a density matrix, known as coherences, make quantum mechanics non-commutative and are responsible for the interference effects that give quantum mechanics its wave-like nature. However, it is important to realize that a density matrix is like the single-time probability distribution P, in the sense that it provides the probabilities for any conceivable measurement outcome at a single given time. It will turn out that the key to the theory of quantum stochastic processes lies in clearly defining a multi-time density matrix.

There are many interesting properties and distinct origins for the density matrix. While we have simply heuristically introduced it as an object that accounts for the uncertainty about the current state of the system at hand, there are more rigorous ways to motivate it. One way to do so is, e.g., Gleason’s theorem, which is grounded in the language of measure theory [67, 68] and, basically, derives density matrices as the most general statistical object that provides ‘answers’ to all questions an experimenter can ask.

Concerning its properties, a density matrix is pure if and only if ρ^2 = ρ. Any (non-pure) mixed quantum state ρ_S of the system S can be thought of as the marginal ρ_S = tr_{S′}[∣ψ⟩⟨ψ∣_{SS′}] of a bipartite pure quantum state ∣ψ⟩_{SS′}, which must be entangled. This fact is known as quantum purification and it is an exceedingly important property that we will discuss in Sec. IV B 4. Of course, the same state ρ_S can also be thought of as a proper mixture of an ensemble of quantum states on the space S alone. However, quantum mechanics does not differentiate between proper mixtures and improper mixtures, i.e., mixedness due to entanglement (see [69] for a discussion of these different concepts of mixtures). As mentioned, mixtures are non-unique. The same holds true for purifications; for a given density matrix ρ_S, there are infinitely many pure states that have it as a marginal.

Finally, let us say a few words about the mathematical structure of density matrices. Density matrices are elements of the vector space of d × d Hermitian matrices, which is d^2-dimensional. Consequently, akin to the decomposition of a pure state in Eq. (61) in terms of an orthonormal basis, a density matrix can also be cast in terms of a fixed set of d^2 orthonormal basis operators:

ρ = ∑_{k=1}^{d^2} r_k σ_k , (64)

where we can choose different sets {σ_k} of basis matrices.[70]

They can, for example, be Hermitian observables (e.g., Pauli matrices plus the identity matrix), in which case the r_k are real numbers. Alternatively, the σ_k can be non-Hermitian elementary matrices, in which case the r_k are complex numbers. In both cases, we may have the matrix orthonormality condition tr[σ_j σ_k^†] = N δ_{jk}, with N being a normalization constant. However, in neither case do the matrices σ_k correspond to physical states, as there is no set of d^2 orthogonal d × d quantum states, since a d-dimensional system can only have d perfectly distinguishable states.
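As a minimal numerical sketch of these statements (our own Python/NumPy illustration, not code from the tutorial; the variable names are arbitrary), the following checks the orthonormality condition tr[σ_j σ_k^†] = N δ_{jk} for the Pauli basis, where N = d = 2, and expands a qubit density matrix as in Eq. (64):

```python
import numpy as np

# Pauli basis for 2x2 Hermitian matrices: sigma_0 = identity, sigma_1..3 = Pauli matrices
sigma = [np.eye(2),
         np.array([[0, 1], [1, 0]]),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]])]

# Orthonormality up to normalization N = 2: tr(sigma_j sigma_k^dag) = 2 delta_jk
gram = np.array([[np.trace(sj @ sk.conj().T) for sk in sigma] for sj in sigma])

# Expansion of Eq. (64): rho = sum_k r_k sigma_k with real r_k = tr(rho sigma_k)/2
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
r = np.array([np.trace(rho @ s).real / 2 for s in sigma])
rho_rebuilt = sum(rk * s for rk, s in zip(r, sigma))
```

Note that r_0 = tr(ρ)/2 = 1/2 for any state, reflecting the unit-trace condition.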

We can, however, drop the demand for orthonormality, and write any density matrix as a linear sum of a fixed set of d^2 linearly independent density matrices ϱ_k:

ρ = ∑_k q_k ϱ_k . (65)

Here, the q_k will be real but generally not positive, see Figure 8. This appears to be in contrast to Eq. (62), where the density matrix is written as a convex mixture of physical states. The reason for this distinction is that in the last equation we have fixed the basis operators ϱ_k, which span


Figure 8. Non-convex decomposition. All states in the x–y plane of the Bloch sphere, including the pure states, can be described by the basis states ϱ_1, ϱ_2, and ϱ_4 in Eq. (69). However, only the states in the shaded region will be convex mixtures of these basis states. Of course, no pure state can be expressed as a convex mixture.

the whole space of Hermitian matrices, and demand that any quantum state can be written as a linear combination of them, while in Eq. (62) the states ∣ψ_j⟩ can be any quantum states, i.e., they would have to vary to represent different density matrices as convex mixtures. Understanding these distinctions will be crucial in order to grasp the pitfalls that lie before us, as well as to overcome them.

1. Decomposing quantum states

Let us illustrate the concept of quantum states with a concrete example for d = 2, i.e., the qubit case. A generic state of one qubit can, for example, be written as

α = (1/2)(σ_0 + a_1 σ_1 + a_2 σ_2 + a_3 σ_3) (66)

in terms of the Pauli operators σ_1, σ_2, σ_3 and the identity matrix σ_0. The set {σ_0, σ_1, σ_2, σ_3} forms an orthogonal basis of the space of 2 × 2 Hermitian matrices, which implies a_j ∈ ℝ, while positivity of α enforces ∑_j a_j^2 ≤ 1. We can write the

same state in terms of elementary matrices,

α = ∑_{ij} e_{ij} ε_{ij} where ε_{ij} = ∣i⟩⟨j∣ , (67)

with the complex coefficients e_00, e_01, e_10, e_11 being

(1 + a_3)/2 , (a_1 − i a_2)/2 , (a_1 + i a_2)/2 , (1 − a_3)/2 . (68)

The elementary matrices are non-Hermitian but orthonormal, i.e., tr[ε_{ij} ε_{kl}^†] = δ_{ik} δ_{jl}.

These are, of course, two standard ways to represent a qubit state in terms of well-known orthonormal bases. On the other hand, we can expand the same state in terms of the following basis states

ϱ_1 = ∣+x⟩⟨+x∣ , ϱ_2 = ∣+y⟩⟨+y∣ , ϱ_3 = ∣+z⟩⟨+z∣ , ϱ_4 = ∣−x⟩⟨−x∣ , (69)

where ∣±x⟩, ∣±y⟩, and ∣±z⟩ are the eigenvectors of σ_1, σ_2, and σ_3, respectively. With this, for any Hermitian matrix α we have α = ∑_k q_k ϱ_k. It is easy to see that the density matrices ϱ_k are Hermitian and linearly independent, but not orthonormal. The real coefficients q_1, q_2, q_3, q_4 are obtained by means of the inner product

q_k = tr(α D_k^†) = ∑_j q_j tr(ϱ_j D_k^†) , (70)

where the set {D_k} is dual to the set of matrices in Eq. (69), satisfying the condition tr(ϱ_i D_j^†) = δ_{ij}. We will see below that such dual matrices are a helpful tool for the experimental reconstruction of density matrices. See the Appendix in Refs. [71, 72] for a method for constructing the dual basis.

For example, for the set of density matrices in Eq. (69), the dual set is

D_1 = (σ_0 + σ_1 − σ_2 − σ_3)/2 , D_2 = σ_2 ,
D_3 = σ_3 , D_4 = (σ_0 − σ_1 − σ_2 − σ_3)/2 . (71)

Note that, even though the states ϱ_k are positive, this dual set {D_k} does not consist of positive matrices (all the duals of a set of Hermitian matrices are Hermitian, though [72]). Nonetheless, it gives us the coefficients in Eq. (70) as

(1 + a_1 − a_2 − a_3)/2 , a_2 , a_3 , (1 − a_1 − a_2 − a_3)/2 . (72)

Interestingly, the dual set of a basis itself also forms a linear basis, and we can write any state α as

α = ∑_k p_k D_k^† , (73)

where p_k = tr(α ϱ_k). Note that, if all basis elements ϱ_k are positive semidefinite, then p_k ≥ 0, and we have ∑_k p_k = 1 if ∑_k ϱ_k = 11. This decomposition in particular lends itself nicely to the experimental reconstruction of the state α. Specifically, given many copies of α, the value tr(α ϱ_k) is obtained by projecting α along the directions x, y, z, i.e., measuring the observables σ_1, σ_2, and σ_3. The inner product tr(α ϱ_k) is then nothing more than a projective measurement along direction k, and p_k is the probability of observing the respective outcome. Importantly, as the duals D_k can be computed from the basis ϱ_k, these probabilities then allow us to estimate the state via Eq. (73).

Intuitively, this procedure is not unlike the way in which one determines the classical state of a system; for example, in order to determine the bias of a coin, one flips it many times and records the respective outcome probabilities for heads and tails. The crucial difference in quantum mechanics is that one must measure in different directions to fully reconstruct the state of interest. Algebraically, this fact is reflected by the dimension of the space of d × d Hermitian matrices, which is d^2-dimensional; thus, in order to fully determine a density matrix, one needs to know its overlap with d^2 linearly independent Hermitian matrices. If, however, one knows in which basis the state one aims to represent is diagonal – as is the case in classical physics – then the overlap with the d projectors that make up its eigenbasis is sufficient.
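The dual-set reconstruction of Eq. (73) can be carried out numerically in a few lines. The sketch below (our own Python/NumPy illustration, not code from the tutorial) builds the basis of Eq. (69), obtains the duals by inverting a Gram-type matrix rather than from the closed form of Eq. (71), and rebuilds a state from the probabilities p_k = tr(α ϱ_k):

```python
import numpy as np

def proj(v):
    """Projector onto the (normalized) vector v."""
    v = np.array(v, dtype=complex)
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())

# Basis states of Eq. (69): the +x, +y, +z, -x eigenprojectors
basis = [proj([1, 1]), proj([1, 1j]), proj([1, 0]), proj([1, -1])]

# Duals D_j with tr(rho_i D_j^dag) = delta_ij, obtained by matrix inversion:
# rows of B are the flattened basis matrices
B = np.array([b.flatten() for b in basis])
Dmat = np.linalg.inv(B).conj().T            # rows are the flattened duals
duals = [Dmat[j].reshape(2, 2) for j in range(4)]

# Reconstruction via Eq. (73): alpha = sum_k p_k D_k^dag, p_k = tr(alpha rho_k)
alpha = np.array([[0.6, 0.1 + 0.2j], [0.1 - 0.2j, 0.4]])
p = [np.trace(alpha @ b).real for b in basis]
alpha_rebuilt = sum(pk * d.conj().T for pk, d in zip(p, duals))
```

Since the dual set of a full basis is unique, the numerically computed duals agree with Eq. (71); for instance, the second dual comes out as σ_2.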


The procedure to estimate a quantum state by measuring it is called quantum state tomography [73–75]. There are many sophisticated methods for this nowadays, which we will only briefly touch on in this tutorial.

2. Measuring quantum states: POVMs and dual sets

As we have seen, a quantum state can be reconstructed experimentally by measuring enough observables (above, the observables σ_1, σ_2, and σ_3 were used) and collecting the corresponding outcome probabilities. Performing pure projective measurements is not the only way in quantum mechanics to gather information about a state. More generally, a measurement is described by a positive operator valued measure (POVM), a collection J = {E_k}_{k=1}^n of positive operators (here, matrices) that add up to the identity, i.e., ∑_k E_k = 11 (we will comment on the physical realizability of POVMs below; for the moment, they can just be thought of as a natural generalization of projective measurements). Each E_k corresponds to a possible measurement outcome, and the probability to observe said outcome is given by the Born rule:

p_k = tr(ρ E_k). (74)

Projective measurements are then a special case of POVMs, where E_k = ∣k⟩⟨k∣ and the ∣k⟩ are the eigenstates of the measured observable. For example, when measuring the observable σ_3, the corresponding POVM is given by J = {∣+z⟩⟨+z∣ , ∣−z⟩⟨−z∣}, and the respective probabilities are computed via p_± = ⟨±z∣ρ∣±z⟩ = tr(ρ ∣±z⟩⟨±z∣).

A less trivial example on a qubit is the symmetric informationally complete (SIC) POVM [73] J = {E_k = (1/2)∣φ_k⟩⟨φ_k∣}_{k=1}^4, where

∣φ_1⟩ = ∣0⟩ ,
∣φ_k⟩ = √(1/3) ∣0⟩ + √(2/3) e^{i 2(k−2)π/3} ∣1⟩ for k = 2, 3, 4. (75)

While still pure (up to normalization), these POVM elements are not orthogonal. However, as they are linearly independent, they span the d^2 = 4-dimensional space of Hermitian qubit matrices, and every density matrix is fully characterized once the probabilities p_k = tr(ρ E_k) are known. As this holds true in any dimension for POVMs consisting of d^2 linearly independent elements, such POVMs are called ‘informationally complete’ (IC).[76] Importantly, using the ideas outlined above, an informationally complete POVM allows one to fully reconstruct density matrices.

In short, to do so, one measures the system with an IC-POVM, whose operators {E_k} linearly span the matrix space of the system at hand. The POVM yields probabilities p_k, and the measurement operators {E_k} have a dual set {∆_k}. The density matrix is then of the form (see also Eq. (73))

ρ = ∑_k p_k ∆_k , (76)

which can be seen by direct insertion; the above state yields the correct probability with respect to each of the POVM elements E_k. Concretely, we have

tr(ρ E_k) = ∑_ℓ p_ℓ tr(∆_ℓ E_k) = p_k , (77)

where we have used tr(∆_ℓ E_k) = δ_{ℓk}. As the POVM is informationally complete, this implies that the state defined in Eq. (76) yields the correct probabilities with respect to every POVM.
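To make this concrete, the following sketch (our own Python/NumPy illustration, not code from the tutorial) constructs the qubit SIC-POVM of Eq. (75), checks that its elements sum to the identity, and reconstructs a state from the outcome probabilities via numerically computed duals, as in Eqs. (76) and (77):

```python
import numpy as np

# SIC-POVM of Eq. (75): E_k = |phi_k><phi_k| / 2
phis = [np.array([1, 0], dtype=complex)]
for k in (2, 3, 4):
    phis.append(np.array([np.sqrt(1 / 3),
                          np.sqrt(2 / 3) * np.exp(2j * np.pi * (k - 2) / 3)]))
E = [np.outer(ph, ph.conj()) / 2 for ph in phis]

# Duals Delta_l with tr(Delta_l E_k) = delta_lk; the E_k are Hermitian and
# linearly independent, so the duals exist, are unique, and are Hermitian
Emat = np.array([e.flatten() for e in E])
deltas = [np.linalg.inv(Emat).conj().T[k].reshape(2, 2) for k in range(4)]

# Reconstruction via Eq. (76): rho = sum_k p_k Delta_k with p_k = tr(rho E_k)
rho = np.array([[0.8, 0.3j], [-0.3j, 0.2]])
p = [np.trace(rho @ e).real for e in E]
rho_rebuilt = sum(pk * d for pk, d in zip(p, deltas))
```

Because ∑_k E_k = 11, the four probabilities automatically sum to one.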

It remains to comment on the existence of IC-POVMs, and the physical realizability of POVMs in general, which, at first sight, appear to be a mere mathematical construction. Concerning the former, it is easy to see that there always exists a basis of d^2 Hermitian d × d matrices that only consists of positive elements. Choosing such a set {F_k}_{k=1}^{d^2} of positive elements, one can set F := ∑_{k=1}^{d^2} F_k. By construction, F is positive semi-definite. Without much loss of generality, let us assume that F is invertible (if it is not, then we could work with the pseudo-inverse in what follows). Then,

J = {E_k = F^{−1/2} F_k F^{−1/2}}_{k=1}^{d^2} (78)

constitutes a set of positive matrices that add up to 11. To see that the matrices {E_k}_{k=1}^{d^2} are linearly independent, let us assume the opposite and that, for example, E_1 can be written in terms of the remaining E_k, i.e., E_1 = ∑_{k=2}^{d^2} a_k E_k. Multiplying this expression from the left and the right by F^{1/2} then yields F_1 = ∑_{k=2}^{d^2} a_k F_k, which contradicts the original assumption that the matrices {F_k}_{k=1}^{d^2} are linearly independent. Consequently, the set {E_k}_{k=1}^{d^2} is an IC-POVM. More pragmatically, one could make one’s life easier and sample d^2 − 1 positive matrices {E_k}_{k=1}^{d^2−1} according to one’s measure of choice. In general, these sampled matrices are linearly independent. Then, one chooses an α > 0 such that E_{d^2} := 11 − α ∑_{k=1}^{d^2−1} E_k ≥ 0. With this (rescaling the first d^2 − 1 elements by α), the set {E_k}_{k=1}^{d^2} is a POVM by construction, and in general also informationally complete.
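Both constructions are easy to implement. The sketch below (our own Python/NumPy code; the random seed and the qubit dimension are arbitrary choices) follows the first route, building an IC-POVM from d^2 positive matrices via Eq. (78):

```python
import numpy as np

rng = np.random.default_rng(7)
d = 2

# d^2 random rank-one positive matrices; generically linearly independent
F = []
for _ in range(d ** 2):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    F.append(np.outer(v, v.conj()))

# F = sum_k F_k is positive and (generically) invertible
F_tot = sum(F)
w, V = np.linalg.eigh(F_tot)
F_inv_sqrt = V @ np.diag(1 / np.sqrt(w)) @ V.conj().T

# Eq. (78): E_k = F^{-1/2} F_k F^{-1/2} forms an informationally complete POVM
E = [F_inv_sqrt @ Fk @ F_inv_sqrt for Fk in F]
rank = np.linalg.matrix_rank(np.array([e.flatten() for e in E]))
```

The resulting elements are positive, sum to the identity, and span the full d^2-dimensional matrix space.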

With respect to the latter, i.e., the physical realizability of POVMs: due to Neumark’s theorem [77–79], any POVM can be realized as a pure projective measurement in a higher-dimensional space, thus putting them on the same foundational footing as ‘normal’ measurements in quantum mechanics. Without going any deeper into the theory of POVMs, let us emphasize the take-home message of the above sections: quantum states can be experimentally reconstructed in a very similar way as classical states, by simply collecting sufficient statistics; however, the number of necessary measurements is larger and their structure is richer. While this latter point seems innocuous, it actually lies at the heart of the problems one encounters when generalizing classical stochastic processes to the quantum realm, like the breakdown of the KET; if all necessary measurements could be chosen to be diagonal in the same basis, then there would be no fundamental difference between classical and quantum processes.


B. Quantum stochastic matrix

Our overarching aim is to generalize the notion of stochastic processes to quantum theory. Here, after having discussed quantum states and their experimental reconstruction in the previous section, we generalize the notion of classical stochastic matrices. In the classical case a stochastic matrix, in Eq. (12), is a mapping of a statistical state from time t_j to time t_k, i.e., Γ_{(k:j)} : P(X_j) ↦ P(X_k). As such, in clear analogy, we are looking for a mapping of the form E_{(k:j)} : ρ(t_j) ↦ ρ(t_k). While there are different representations of E_{(k:j)} (see, for example, Ref. [72]), we start with the one that most closely resembles the classical case, where a probability vector gets mapped to another probability vector by means of a matrix Γ_{(k:j)}. We have already argued that the density matrix is the quantum generalization of the classical probability distribution. Then, consider the following transformation that turns a density matrix into a vector:

ρ = ∑_{rs} ρ_{rs} ∣r⟩⟨s∣ ⟷ ∣ρ⟩⟩ := ∑_{rs} ρ_{rs} ∣rs⟩⟩ , (79)

where we use the ∣·⟩⟩ notation to emphasize that the vector originally stems from a matrix. This procedure is often called vectorization of matrices; for details see Refs. [49, 80–82].

Next, in clear analogy to Eq. (14), we can define a matrix Ĕ that maps a density matrix ρ (say, at time t_j) to another density matrix ρ′ (say, at time t_k); we have added the symbol ˘ to distinguish the matrix representation Ĕ from the map E. Using the above notation, this matrix can be expressed as

Ĕ := ∑_{r′s′,rs} E_{r′s′,rs} ∣r′s′⟩⟩⟨⟨rs∣ (80)

and the action of Ĕ can be written as

∣E[ρ]⟩⟩ = Ĕ ∣ρ⟩⟩ = ∑_{r′s′,rs} E_{r′s′,rs} ρ_{rs} ∣r′s′⟩⟩ = ∣ρ′⟩⟩ . (81)

Here, Ĕ is simply a matrix representing the map E : B(H_i) → B(H_o),[83] very much like the stochastic matrix, that maps the initial state to the final state. For better book-keeping, we explicitly distinguish between the input (i) Hilbert space H_i and the output (o) Hilbert space H_o, and denote the space of matrices on said spaces by B(H_x). While for the remainder of this tutorial the dimensions of these two spaces generally agree, in general the two are allowed to differ, and even in the case where they do not, it proves advantageous to keep track of the different spaces.

It was with the above intuition that Sudarshan et al. called Ĕ the quantum stochastic matrix [61]. In today’s literature, it is often referred to as a quantum channel, quantum dynamical map, etc. Along with many names, it also has many representations. We will not go much into these details here (see Ref. [72] for further information). We will, however, briefly discuss some of its important properties. Note that we stick to discrete-level systems and do not touch the topic of Gaussian quantum information [84, 85].

Amplitude damping channel. Before that, let us quickly provide an explicit example of a quantum stochastic matrix. Consider a relaxation process that takes any input quantum state to the ground state. Such a process is, for example, described by the so-called amplitude damping channel

Ĕ^{AD}_{(t:0)} =
⎛ 1   0       0       1 − p(t) ⎞
⎜ 0   √p(t)   0       0        ⎟
⎜ 0   0       √p(t)   0        ⎟
⎝ 0   0       0       p(t)     ⎠ . (82)

This matrix acts on a vectorized density matrix of a qubit, i.e., ∣ρ(0)⟩⟩ = [ρ_00, ρ_01, ρ_10, ρ_11]^T, to yield ∣ρ(t)⟩⟩ = [ρ_00 + (1 − p(t))ρ_11, √p(t) ρ_01, √p(t) ρ_10, p(t) ρ_11]^T. When p(t) = e^{−γt}, we get relaxation that is exponentially fast in time, and for t → ∞, any input state will be mapped to [1, 0, 0, 0]^T. This example is very close in spirit to the classical example in Eq. (40).

Here, already, it is easy to see that the matrices Ĕ, unlike their classical counterparts, do not possess nice properties like Hermiticity or stochasticity (note that, for example, neither the rows nor the columns of Ĕ^{AD}_{(t:0)} sum to unity). However, these shortcomings can be remedied in different representations of E. Also note that, here, we actually have a family of quantum stochastic matrices parameterized by time (for each time t we have, in general, a different map). When we speak of a family of maps we will label them with the subscript (t : 0). However, often the stochastic matrix only represents a mapping from the initial time to a final time. In such cases, we will omit the subscript and refer to the initial and final states as ρ and ρ′, respectively.
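The action described above is easy to verify numerically. The sketch below (our own Python/NumPy illustration, not code from the tutorial) applies the matrix of Eq. (82) to a vectorized qubit state and confirms the relaxation to the ground state:

```python
import numpy as np

def E_AD(p):
    """Amplitude damping matrix of Eq. (82), acting on the vectorized state
    |rho>> = (rho_00, rho_01, rho_10, rho_11)^T."""
    sp = np.sqrt(p)
    return np.array([[1, 0, 0, 1 - p],
                     [0, sp, 0, 0],
                     [0, 0, sp, 0],
                     [0, 0, 0, p]])

rho0 = np.array([[0.3, 0.4], [0.4, 0.7]])      # a valid qubit state
vec0 = rho0.flatten()                          # vectorization, Eq. (79)

gamma = 1.0                                    # decay rate, p(t) = exp(-gamma t)
rho_mid = (E_AD(np.exp(-gamma * 1.0)) @ vec0).reshape(2, 2)    # t = 1
rho_late = (E_AD(np.exp(-gamma * 50.0)) @ vec0).reshape(2, 2)  # t = 50
```

Note that the trace and Hermiticity of the state are preserved at all times, even though the matrix Ĕ^AD itself is neither Hermitian nor stochastic.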

1. Linearity and tomography

Having formally introduced quantum maps, the generalization of stochastic matrices in the classical case, it is now time to discuss the properties they should display. We begin with one of the most important features of quantum dynamics. The quantum stochastic map, like its classical counterpart, is a linear map:

E[αA + βB] = α E[A] + β E[B]. (83)

This is straightforwardly clear for the specific case of the quantum stochastic matrix Ĕ because the vectorization of a density matrix is itself also a linear map, i.e., ∣A + B⟩⟩ = ∣A⟩⟩ + ∣B⟩⟩. Once this is done, the rest is just matrix transformations, which are linear.

The importance of linearity cannot be overstated; we will exploit this property over and over. In particular, the linearity of quantum dynamics plays a crucial role in defining an unambiguous set of Markov conditions in quantum mechanics and, as we have seen, it is the fundamental ingredient in the experimental reconstruction of quantum objects. Due to linearity, a quantum channel is fully defined once its action on a set of linearly independent states is known. From a practical point of view, this is important for experimentally characterizing quantum dynamics by means of a procedure


known as quantum process tomography [73, 86] (see, e.g.,Refs. [87, 88] for a more in-depth discussion).

To see how this works out in practice, let us prepare a set of linearly independent input states, say {ϱ_k}, and determine their corresponding outputs E[ϱ_k] by means of quantum state tomography (which we discussed above). The corresponding input–output relation then fully determines the action of the stochastic map on any density matrix:

E[ρ] = ∑_k q_k E[ϱ_k] , (84)

where we have used Eq. (65), i.e., ρ = ∑_k q_k ϱ_k. The above equation highlights that, once the output states for a basis of input states are known, the action of the entire map is determined.

Using ideas akin to the aforementioned tomography of quantum states, we can also directly use linearity and dual sets to reconstruct the matrix Ĕ. Above, we saw that a quantum state is fully determined once the probabilities for an informationally complete POVM are known. In the same vein, a quantum map is fully determined once the output states for a basis {ϱ_j}_{j=1}^{d^2} of input states are known. Concretely, setting ρ′_j = E[ϱ_j], and denoting the dual set of {ϱ_j}_{j=1}^{d^2} by {D_k}_{k=1}^{d^2}, we have

Ĕ = ∑_{j=1}^{d^2} ∣ρ′_j⟩⟩⟨⟨D_j∣ . (85)

Indeed, it is easy to see that ⟨⟨A∣B⟩⟩ = tr(A^†B), implying that, with the above definition, we have Ĕ ∣ϱ_j⟩⟩ = ∣ρ′_j⟩⟩ for all basis elements. Due to linearity, this implies that Ĕ yields the correct output state for any input state. Measuring the output states for a basis of input states is thus sufficient to perform process tomography. Specifically, if the output states ρ′_j are measured by means of an informationally complete POVM {E_k} with corresponding dual set {∆_k}_{k=1}^{d^2}, then ρ′_j = ∑_k p_k^{(j)} ∆_k^†, with p_k^{(j)} = tr(ρ′_j E_k), and Eq. (85) reads

Ĕ = ∑_{j,k=1}^{d^2} p_k^{(j)} ∣∆_k⟩⟩⟨⟨D_j∣ . (86)

As the experimenter controls the states they prepare in each run (and, as such, the dual D_j corresponding to each run), as well as the POVM they use, determining the probabilities p_k^{(j)} thus enables the reconstruction of Ĕ (see Figure 9 for a graphical representation). While we have not discussed the reconstruction of classical stochastic maps in detail, it is clear that it works in exactly the same vein. The above arguments only hinge on linearity as their main ingredient, implying that, analogous to the quantum case, a classical stochastic matrix Γ_{(t:0)} is determined once the resulting output distributions Γ_{(t:0)}[P_j] for a basis of input distributions {P_j} are known.
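The reconstruction of Eq. (85) can be simulated end to end. The sketch below (our own Python/NumPy code; the ‘unknown’ channel is, by our choice, the amplitude damping matrix of Eq. (82) with p = 1/2) prepares the input basis of Eq. (69), records the output states, and rebuilds the matrix from the duals:

```python
import numpy as np

def proj(v):
    v = np.array(v, dtype=complex)
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())

# Input basis of Eq. (69) and its dual set (via matrix inversion)
states = [proj([1, 1]), proj([1, 1j]), proj([1, 0]), proj([1, -1])]
B = np.array([s.flatten() for s in states])
duals = [np.linalg.inv(B).conj().T[j].reshape(2, 2) for j in range(4)]

# "Unknown" channel to reconstruct: amplitude damping with p = 1/2, Eq. (82)
sp = np.sqrt(0.5)
E_true = np.array([[1, 0, 0, 0.5],
                   [0, sp, 0, 0],
                   [0, 0, sp, 0],
                   [0, 0, 0, 0.5]])

# Process tomography via Eq. (85): E = sum_j |rho'_j>><<D_j|
outputs = [E_true @ s.flatten() for s in states]
E_rec = sum(np.outer(out, d.flatten().conj()) for out, d in zip(outputs, duals))
```

The reconstructed matrix agrees with the original channel, as guaranteed by linearity.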

Figure 9. Quantum process tomography. Any quantum channel E can be reconstructed by preparing a basis of input states and measuring the corresponding output states ρ′_j = E[ϱ_j] with an informationally complete POVM. The corresponding duals {D_j}, {∆_k} and the outcome probabilities then allow for the reconstruction of E according to Eq. (86).

2. Complete positivity and trace preservation

While linearity is a crucial property of quantum channels, it is naturally not the only pertinent one. A classical stochastic matrix maps probability vectors to probability vectors. As such, it is positive, in the sense that it maps any vector with positive semi-definite entries to another positive semi-definite vector. In the same vein, quantum channels need to be positive, as they have to map all density matrices to proper density matrices, i.e., positive semi-definite matrices to positive semi-definite matrices.

One crucial difference between classical stochastic maps and their quantum generalization is the requirement of complete positivity. A positive stochastic matrix is guaranteed to map probabilities into probabilities even if it acts non-trivially only on a subpart (implying that only some but not all degrees of freedom undergo the stochastic process at hand), i.e.,

Γ_A P_A = R_A ≥ 0 ⇔ (Γ_A ⊗ 11_B) P_AB = R_AB ≥ 0 (87)

for all P_A and P_AB, where A and B are two different spaces and 11_B is the identity process on B. Here, P ≥ 0 means that all the entries of the vector are positive semi-definite, and we have given all objects additional subscripts to denote the spaces they act/live on.

The same is not true in quantum mechanics. Namely, there are maps that take all density matrices to density matrices on a subspace, but whose action on a larger space fails to map density matrices to density matrices, i.e.,

E_A[ρ_A] ≥ 0 but E_A ⊗ I_B[ρ_AB] ≱ 0, (88)

where I_B is the identity map on the system B, i.e., I_B[ρ_B] = ρ_B for all ρ_B, and ρ ≥ 0 means that all eigenvalues of ρ are non-negative. These maps are called positive maps, and they play an important role in the theory of entanglement witnesses [89, 90]. The most prominent example of a positive, but not completely positive map is the transposition map ρ → ρ^T. It is easy to show that positivity breaks down only when the map E acts on part of an entangled bipartite state (which is why positivity and complete positivity coincide in the classical case). Of course, giving up positivity of probabilities is not physical, and


as such, a positive map that fails to remain positive when acting on a part of a state is not physical.
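The transposition map makes this failure easy to exhibit numerically. The following sketch (our own Python/NumPy illustration, not code from the tutorial) applies the partial transpose T_A ⊗ I_B to a maximally entangled two-qubit state and finds a negative eigenvalue:

```python
import numpy as np

# Transposition preserves eigenvalues, so it is positive on a single qubit ...
rho_plus = np.array([[0.5, 0.5], [0.5, 0.5]])               # |+><+|
single_eigs = np.linalg.eigvalsh(rho_plus.T)

# ... but applied to one half of a Bell state it produces a negative eigenvalue
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)
rho_AB = np.outer(bell, bell.conj())

# Partial transpose on A: swap the two "A" indices of the 4-index tensor
pt = rho_AB.reshape(2, 2, 2, 2).transpose(2, 1, 0, 3).reshape(4, 4)
pt_eigs = np.linalg.eigvalsh(pt)
```

The partially transposed Bell state has eigenvalues (1/2, 1/2, 1/2, −1/2), so T_A ⊗ I_B fails to be positive on entangled inputs, exactly as described above.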

One thus demands that physical maps must take all density matrices to density matrices, even when only acting non-trivially on a part of them, i.e.,

EA ⊗ IB[ρAB] ≥ 0 ∀ ρAB ≥ 0. (89)

Maps for which this is true for arbitrary size of the system B are called completely positive (CP) maps [1] and are the only maps we will consider throughout this tutorial (for a discussion of non-completely positive maps and their potential physical relevance – or lack thereof – see, for example, Refs. [72, 91–95]).

In addition to preserving positivity, i.e., the positivity of probabilities, quantum maps must also preserve the trace of the state ρ, which amounts to preserving the normalization of probabilities. This is the natural generalization of the requirement on stochastic matrices that their columns sum up to 1. Consequently, for a quantum channel, we demand that it satisfies

tr(E[ρ]) = tr(ρ) ∀ρ. (90)

If a map E_A is trace-preserving, then so is E_A ⊗ I_B. We will refer to completely positive maps that are also trace-preserving as CPTP maps, or quantum channels. Importantly, while the physicality of non-completely positive maps is questionable, we will frequently encounter maps that are CP but only trace non-increasing instead of trace-preserving. Such maps are the natural generalizations of POVM elements and will play a crucial role when modeling quantum stochastic processes. Together, linearity, complete positivity, and trace preservation fully determine the set of admissible quantum channels.

3. Representations

In the classical case, stochastic matrices map vectors to vectors, and are thus naturally expressed in terms of matrices. In contrast, here, quantum channels map density matrices to density matrices, raising the question of how to represent their action for concrete calculations. We will not discuss the details of different representations here in much depth and only provide a rather basic introduction; we refer the reader to Refs. [49, 72, 96] for further reading.

Above, we have already seen the matrix representation Ĕ of the quantum stochastic map in Eq. (80). This representation is rather useful for numerical purposes, as it allows one to write the action of the map E as a matrix multiplication. However, it is not straightforward to see how complete positivity and trace preservation enter into the properties of Ĕ. When we add these two properties to the mix, there are two other important and useful representations that prove more insightful. First is the so-called Kraus representation of completely positive maps:

E[ρ] = ∑_j K_j ρ K_j^† , (91)

Figure 10. Choi–Jamiołkowski isomorphism. A map E : B(H_i) → B(H_o) can be mapped to a matrix Υ_E ∈ B(H_o ⊗ H_i) by letting it act on one half of a maximally entangled state. Note that, for ease of notation, here we let E act on B(H_i′), such that Υ_E ∈ B(H_o ⊗ H_i). As H_i′ ≅ H_i, this is merely a relabeling and not of conceptual relevance.

where the linear operators K_j are called Kraus operators [63, 97] (though this form was first discovered in [61]). For the case of input and output spaces of the same size, Kraus operators are simply d × d matrices, hence just operators on the Hilbert space. For this reason, E is often called a superoperator, i.e., an operator on operators. Denoting – as above – the space of matrices on a Hilbert space H by B(H), it can be shown that a map E : B(H_i) → B(H_o) is CP iff it can be written in the Kraus form (91) for some d × d matrices {K_j}. For the ‘if’ part, note that for any ρ_AB ≥ 0, we have

∑_j (K_j ⊗ 11_B) ρ_AB (K_j^† ⊗ 11_B) = ∑_j M_j M_j^† , (92)

where M_j := (K_j ⊗ 11_B) √ρ_AB. Since any matrix that can be written as M_j M_j^† is positive, we see that a map E that allows for a Kraus form maps positive matrices of any dimension onto positive matrices, making E completely positive. The ‘only if’ part is discussed below, after introducing a second important representation of quantum channels.

Finally, the CP map E is trace-preserving iff the Kraus operators satisfy ∑_j K_j^† K_j = 11, which can be seen directly from tr(∑_j K_j ρ K_j^†) = tr(∑_j K_j^† K_j ρ).

Depolarizing channel. Let us consider a concrete example of the Kraus representation. A common quantum map that one encounters in this representation is the depolarizing channel on qubits:

E^{DP}[ρ] = ∑_{j=0}^3 p_j σ_j ρ σ_j with p_j ≥ 0, ∑_j p_j = 1, (93)

where the σ_j are the Pauli operators. This map is an example of a random unitary channel [98], i.e., it is a probabilistic mixture of unitary maps. When the p_j are uniform, the image of this map is the maximally mixed state for all input states. It is straightforward to see that the above map is indeed CPTP, as it can be written in terms of the Kraus operators {K_j = √p_j σ_j}, and we have ∑_j K_j^† K_j = ∑_j p_j σ_j σ_j = 11.
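The CPTP conditions for this channel can be checked directly. The sketch below (our own Python/NumPy illustration, not code from the tutorial) verifies the Kraus completeness relation and the fact that uniform p_j map every input to the maximally mixed state:

```python
import numpy as np

sigma = [np.eye(2),
         np.array([[0, 1], [1, 0]]),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]])]

def depolarize(rho, p):
    """Random unitary channel of Eq. (93) with Kraus operators K_j = sqrt(p_j) sigma_j."""
    K = [np.sqrt(pj) * s for pj, s in zip(p, sigma)]
    out = sum(Kj @ rho @ Kj.conj().T for Kj in K)
    return out, K

rho = np.array([[0.9, 0.1], [0.1, 0.1]])
out, K = depolarize(rho, [0.25] * 4)           # uniform mixing probabilities

# Trace preservation: sum_j K_j^dag K_j = identity
completeness = sum(Kj.conj().T @ Kj for Kj in K)
```

With p_j = 1/4, the output is 11/2 regardless of the input, since (1/4)∑_j σ_j ρ σ_j = tr(ρ) 11/2.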

Another important representation that nicely incorporates the properties of complete positivity and trace preservation is that in terms of so-called Choi matrices. For this representation of E, consider its action on one part of an (unnormalized)


maximally entangled state ∣Φ^+⟩ := ∑_{k=1}^{d_i} ∣kk⟩:

Υ_E := E ⊗ I [∣Φ^+⟩⟨Φ^+∣] = ∑_{k,l=1}^{d_i} E[∣k⟩⟨l∣] ⊗ ∣k⟩⟨l∣ , (94)

where {∣k⟩} is an orthonormal basis of H_i. See Figure 10 for a graphical depiction. The resultant matrix Υ_E ∈ B(H_o ⊗ H_i) is isomorphic to the quantum map E. This can easily be seen by noting that in the last equation E is acting on a complete linear basis of matrices, i.e., the elementary matrices ε_{kl} := ∣k⟩⟨l∣. Consequently, Υ_E contains all information about the action of E. In principle, instead of Φ^+, any bipartite vector with full Schmidt rank could be used for this isomorphism [99], but the definition we use here is the one encountered most frequently in the literature. In the form of (94) it is known as the Choi–Jamiołkowski isomorphism (CJI) [100–102]. It allows one to map linear maps E : B(H_i) → B(H_o) to matrices Υ_E ∈ B(H_o) ⊗ B(H_i).

Usually, ΥE is called the Choi matrix or Choi state of the map E . We will mostly refer to it by the latter. Given ΥE , the action of E can be written as

E[ρ] = tri [(11o ⊗ ρT)ΥE] , (95)

where tri is the trace over the input space Hi and 11o denotes the identity matrix on Ho. The validity of (95) can be seen by direct insertion of (94):

tri [(11o ⊗ ρT)ΥE] = ∑_{k,ℓ=1}^{di} ⟨ℓ∣ρT∣k⟩ E[∣k⟩⟨ℓ∣] = ∑_{k,ℓ=1}^{di} E[⟨k∣ρ∣ℓ⟩ ∣k⟩⟨ℓ∣] = E[ρ], (96)

where we have used the linearity of E . A related decomposition of the Choi state is

ΥE = ∑j ρ′j ⊗ D∗j , (97)

where ρ′j = E[ρj] are the output states corresponding to a basis of input states. The above equation is simply a reshuffling of Eq. (85) from vectors to matrices. As was the case for Eq. (85), its validity can be checked by realizing that the Choi state ΥE yields the correct output state for a full basis of input states, which can be seen by direct insertion of Eq. (97) into (95).
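As a concrete illustration of Eqs. (94) and (95), the following sketch of our own (in numpy; the bit-flip mixture is merely an assumed example channel) builds the Choi matrix of a channel and recovers the channel's action from it:

```python
import numpy as np

d = 2
# An assumed example CPTP map: flip the qubit with probability 0.3
K = [np.sqrt(0.7) * np.eye(2), np.sqrt(0.3) * np.array([[0., 1.], [1., 0.]])]
E = lambda rho: sum(k @ rho @ k.conj().T for k in K)

# Choi matrix, Eq. (94): Upsilon = sum_{k,l} E[|k><l|] (x) |k><l|
basis = np.eye(d)
Upsilon = sum(np.kron(E(np.outer(basis[k], basis[l])), np.outer(basis[k], basis[l]))
              for k in range(d) for l in range(d))

def apply_choi(Ups, rho):
    """Action via Eq. (95): E[rho] = tr_i[(11_o (x) rho^T) Ups]."""
    M = np.kron(np.eye(d), rho.T) @ Ups
    return M.reshape(d, d, d, d).trace(axis1=1, axis2=3)  # trace out the input factor

rho = np.array([[0.8, 0.2j], [-0.2j, 0.2]])
assert np.allclose(apply_choi(Upsilon, rho), E(rho))
```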

For quantum maps, ΥE has particularly nice properties. Complete positivity of E is equivalent to ΥE ≥ 0, and it is straightforward to deduce from Eq. (95) that E is trace-preserving iff tro(ΥE) = 11i (see below for a quick justification of these statements). These properties are much more transparent, and easier to work with than, for example, the properties that make E a matrix corresponding to a CPTP map. Additionally, Eq. (95) allows one to directly relate the representation of E in terms of Kraus operators to the Choi state ΥE , and, in particular, the minimal number of required Kraus operators to the rank of ΥE . Specifically, in terms of its eigenbasis, ΥE can be written as ΥE = ∑_{j=1}^{r} λj ∣Φj⟩⟨Φj∣, where r = rank(ΥE) and λj ≥ 0. Inserting this into Eq. (95), we obtain

E[ρ] = ∑_{j=1}^{r} (√λj ∑_{α=1}^{di} ⟨α∣Φj⟩⟨α∣) ρ (√λj ∑_{β=1}^{di} ∣β⟩⟨Φj∣β⟩) =∶ ∑_{j=1}^{r} KjρK†j , (98)

where ∣α⟩ and ∣β⟩ denote a basis of Hi. The above equation provides a Kraus representation of E with the minimal number of Kraus operators (for more details on this connection between Choi matrices and Kraus operators, see, for example, Ref. [103]). Eq. (98) directly allows us to provide the missing properties of the Kraus and Choi representations that we alluded to above. Firstly, if E is CP, then naturally, ΥE is positive, which can be seen from its definition in Eq. (94). On the other hand, if ΥE is positive, then Eq. (98) tells us that it leads to a Kraus form, implying that the corresponding map E is completely positive. In addition, this means that any completely positive map admits a Kraus decomposition, a claim we made several paragraphs above. Finally, from Eq. (95) we see directly that tro(ΥE) = 11i holds for all trace-preserving maps E . Naturally, all representations of quantum maps can be transformed into one another; details of how this is done can be found, for example, in Refs. [72, 104]; however, for our purposes, it will prove very advantageous to predominantly use Choi states.
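The passage from ΥE back to a minimal Kraus representation via Eq. (98) can be sketched numerically (our own illustration; the depolarizing probabilities below are assumed example values):

```python
import numpy as np

d = 2
p0, p1, p2, p3 = 0.7, 0.1, 0.1, 0.1  # assumed depolarizing probabilities
# Choi state of the depolarizing channel, Eq. (99)
Upsilon = np.array([[p0+p3, 0, 0, p0-p3],
                    [0, p1+p2, p1-p2, 0],
                    [0, p1-p2, p1+p2, 0],
                    [p0-p3, 0, 0, p0+p3]], dtype=complex)

# Eigendecomposition Upsilon = sum_j lambda_j |Phi_j><Phi_j|; Eq. (98) turns each
# eigenvector with nonzero eigenvalue into a Kraus operator by reshaping it
# into a d x d matrix from the input to the output space
lam, Phi = np.linalg.eigh(Upsilon)
Ks = [np.sqrt(l) * Phi[:, j].reshape(d, d) for j, l in enumerate(lam) if l > 1e-12]

assert len(Ks) == 4  # rank of Upsilon: here all eigenvalues 2 p_j are nonzero
assert np.allclose(sum(k.conj().T @ k for k in Ks), np.eye(d))  # trace preservation
```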

Depolarizing Channel. For concreteness, let us consider the above case of the depolarizing channel EDP and provide its Choi state. Inserting Eq. (93) into Eq. (94), we obtain

ΥDPE = ( p0 + p3    0          0          p0 − p3
         0          p1 + p2    p1 − p2    0
         0          p1 − p2    p1 + p2    0
         p0 − p3    0          0          p0 + p3 ) . (99)

The resulting matrix ΥDPE is positive semidefinite (with corresponding eigenvalues 2p0, 2p1, 2p2, 2p3), satisfies tro(ΥDPE) = 11i, and tr(ΥDPE) = 2 = di.

Besides its appealing mathematical properties, the CJI is

also of experimental importance. Given that a (normalized) maximally entangled state can be created in practice, the CJI enables another way of reconstructing a representation of the map E ; letting it act on one half of a maximally entangled state and reconstructing the resulting state via state tomography directly yields ΥE . While this so-called ancilla-assisted process tomography [105, 106] requires the same number of measurements as the input-output procedure, it can be – depending on the experimental situation – easier to implement in the laboratory.

Additionally, since they simply correspond to quantum states with an additional trace condition, Choi states straightforwardly allow for the analysis of correlations in time –


Figure 11. Stinespring Dilation. Any CPTP map on the system S can be represented in terms of a unitary on a larger space SA and final discarding of the additional degrees of freedom. The corresponding unitary can be computed by means of Eq. (105). Here, for simplicity, we drop the explicit distinction between input and output spaces we normally employ.

in the same way as quantum states do in space. Consequently, below, when analyzing pertinent properties of quantum stochastic processes, like, for example, quantum Markov order, many of the results will be most easily phrased in terms of Choi states and we will make ample use of them.

4. Purification and Dilation

At this point, after having pinned down the properties of statistical objects in quantum mechanics, it is worth taking a short detour to comment on the origin of stochasticity in the quantum case, and how it differs from the classical one. Importantly, in quantum mechanics any mixed state ρS can be thought of as the marginal of a pure state ∣Ψ⟩SS′ in a higher-dimensional space. That is, for any ρS , there exists a pure state ∣Ψ⟩SS′ such that trS′(∣Ψ⟩SS′⟨Ψ∣) = ρS . The state ∣Ψ⟩SS′ is then called a purification of ρS . The mixedness of a quantum state can thus always be considered as stemming from ignorance about additional degrees of freedom. This is in contrast to classical physics, which is not endowed with a purification principle.

To show that such a purification always exists, recall that any mixed state ρS is diagonal in its eigenbasis ∣r⟩S, i.e.,

ρS = ∑r λr ∣r⟩S⟨r∣ , with λr ≥ 0 and ∑r λr = 1. (100)

This state can, for example, be purified by

∣Ψ⟩SS′ = ∑r √λr ∣r⟩S ∣r⟩S′ . (101)

More generally, as a consequence of the Schmidt decomposition, any state ∣Ψ⟩SS″ that purifies ρS is of the form ∣Ψ⟩SS″ = ∑r √λr ∣r⟩S W(∣r⟩S′), where W is an isometry from space S′ to S″. Importantly, ∣Ψ⟩SS′ is entangled between S and S′ as soon as ρS is mixed, i.e., as soon as λr < 1 for all r.

As entangled states lie outside of what can be captured by classical theories, classical mixed states do not admit a purification in the above sense – at least not one that lies within the framework of classical physics. Randomness in classical physics can thus not be considered as stemming from ignorance of parts of a pure state in a higher-dimensional space, but it has to be inserted manually into the theory. On the other hand, any quantum state can be purified within quantum mechanics, and thus randomness can always be understood as ignorance about extra degrees of freedom.
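Numerically, the purification of Eqs. (100) and (101) amounts to an eigendecomposition. A minimal sketch of our own (in numpy; the state ρS is an assumed example):

```python
import numpy as np

rho_S = np.array([[0.7, 0.2], [0.2, 0.3]])  # an assumed mixed qubit state
lam, v = np.linalg.eigh(rho_S)              # rho_S = sum_r lambda_r |r><r|, Eq. (100)

# |Psi>_SS' = sum_r sqrt(lambda_r) |r>_S |r>_S', Eq. (101)
Psi = sum(np.sqrt(l) * np.kron(v[:, r], v[:, r]) for r, l in enumerate(lam))

# Tracing out S' recovers rho_S
rho_rec = np.outer(Psi, Psi.conj()).reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)
assert np.allclose(rho_rec, rho_S)
```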

Purification example. To provide an explicit example, consider the purification of a maximally mixed state on a d-dimensional system ρS = (1/d)∑_{r=1}^{d} ∣r⟩S⟨r∣. Following the above reasoning, this state is, for example, purified by ∣Φ+⟩SS′ ∶= (1/√d)∑_{r=1}^{d} ∣r⟩S ∣r⟩S′ , which is the maximally entangled state.

Remarkably, the purification principle also holds for dynamics, i.e., any quantum channel can be understood as stemming from a unitary interaction with an ancillary system, while this does not hold true for classical dynamics. This former statement can be most easily seen by direct construction. As we have seen, quantum channels can be represented in terms of their Kraus operators as

E[ρS] = ∑j KjρSK†j , where ∑j K†jKj = 11S . (102)

The above can easily be rewritten as an isometry in terms of the operators Kj ∈ B(HS) and vectors ∣j⟩E ∈ HE :

VS→SE ∶= ∑j Kj ⊗ ∣j⟩E =∶ V, (103)

satisfying V†V = 11S . Consequently, the number of Kraus operators determines the dimension dE of the environment that is used for the construction [107]. With this, we have

E[ρS] = trE(V ρSV†) = trE(U ρS ⊗ ∣0⟩⟨0∣E U†). (104)

The second equality comes from the fact that any isometry V can be completed to a unitary USE→SE =∶ U (see, for example, Ref. [108] for different possible constructions). For completeness, here we provide a simple way to obtain U from V : let {∣ℓ⟩S}_{ℓ=0}^{dS−1} ({∣α⟩E}_{α=0}^{dE−1}) be an orthonormal basis of the system (environment) Hilbert space. By construction (see Eq. (104)), we have U∣ℓ0⟩SE = V ∣ℓ⟩S . Consequently, U can be written as

U = ∑ℓ V ∣ℓ⟩S⟨ℓ0∣SE + ∑_{ℓ, α≥1} ∣ϑℓ,α⟩SE⟨ℓα∣SE , (105)

where SE⟨ϑℓ′,α∣V ∣ℓ⟩S = 0 for all ℓ, ℓ′ and α ≥ 1, and ⟨ϑℓ′,α′∣ϑℓ,α⟩ = δℓℓ′δαα′ . Such a set {∣ϑℓ,α⟩} of orthonormal vectors can readily be found via a Gram-Schmidt orthogonalization procedure. It is easy to verify that the above matrix U is indeed unitary.
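The construction of Eqs. (103)–(105) can be carried out numerically. The sketch below (our own, in numpy, with an amplitude damping channel as an assumed example) builds V from the Kraus operators, completes it to a unitary via Gram-Schmidt (here implemented through a QR decomposition), and verifies the dilation of Eq. (104):

```python
import numpy as np

# Assumed example channel: amplitude damping with decay probability p
p = 0.3
K = [np.array([[1, 0], [0, np.sqrt(1 - p)]]),
     np.array([[0, np.sqrt(p)], [0, 0]])]
dS, dE = 2, len(K)

# Isometry of Eq. (103): V = sum_j K_j (x) |j>_E, a (dS*dE) x dS matrix
V = sum(np.kron(k, np.eye(dE)[:, [j]]) for j, k in enumerate(K))
assert np.allclose(V.conj().T @ V, np.eye(dS))  # V^dag V = 11_S

# Complete V to a unitary U with U|l,0>_SE = V|l>_S, Eq. (105): the remaining
# columns are an orthonormal complement of range(V), found by QR (Gram-Schmidt)
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(np.hstack([V, rng.standard_normal((dS * dE, dS * dE - dS))]))
U = np.zeros((dS * dE, dS * dE), dtype=complex)
U[:, [l * dE for l in range(dS)]] = V  # columns corresponding to |l,0>_SE
U[:, [l * dE + a for l in range(dS) for a in range(1, dE)]] = Q[:, dS:]
assert np.allclose(U.conj().T @ U, np.eye(dS * dE))  # U is unitary

# Stinespring dilation, Eq. (104): tr_E[U (rho (x) |0><0|_E) U^dag] = E[rho]
rho = np.array([[0.6, 0.2], [0.2, 0.4]], dtype=complex)
out = U @ np.kron(rho, np.diag([1., 0.])) @ U.conj().T
out = out.reshape(dS, dE, dS, dE).trace(axis1=1, axis2=3)
assert np.allclose(out, sum(k @ rho @ k.conj().T for k in K))
```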

The fact that any quantum channel can be understood as part of a unitary process is often referred to as Stinespring dilation [109]. Together with the possibility to purify every quantum state, it implies that all dynamics in quantum mechanics can be seen as reversible processes, and randomness only arises due to lack of knowledge. In particular, we have

E[ρS] = trES ′[U(ΨSS ′ ⊗ ∣0⟩⟨0∣E)U †], (106)

where we have omitted the respective identity matrices.


On the other hand, in the classical case, the initial state of the environment is definite, but unknown, in each run of the experiment. Supposing that the system, too, is initialized in a definite state, then any pure interaction, i.e., a permutation, will always yield the system in a definite state. In other words, if any randomness exists in the final state, the randomness must have been present a priori, either in the initial state of the environment or in the interaction. A classical dynamics that transforms pure states to mixed states, i.e., one that is described by stochastic maps, can thus not have come from a permutation and pure states only, and stochasticity in classical physics does not stem from ignorance about additional degrees of freedom alone.

Since both of these statements – the purifiability of quantum states and that of quantum channels – ensure the fundamental reversibility of quantum mechanics, purification postulates have been employed as one of the axioms in reconstructing quantum mechanics from first principles [110, 111].

Purification of dephasing dynamics. Before advancing, it is insightful to provide the dilation of an explicit quantum channel. Here, we choose the so-called dephasing map on a single qubit, EDD(t∶0)[ρ(0)] = ρ(t):

ρ(0) = ( ρ00  ρ01 )  ↦  ρ(t) = ( ρ00       e−γt ρ01 )
       ( ρ10  ρ11 )             ( e−γt ρ10  ρ11     ) . (107)

In what follows, whenever a dynamics is such that the off-diagonal elements vanish exponentially in a given basis, we will call it pure dephasing. The above channel can be represented with two Kraus matrices

K0(t) = √[(1 + e−γt)/2] σ0 , K1(t) = √[(1 − e−γt)/2] σ3 . (108)

Following Eq. (103), the corresponding isometry is given by V(t) = K0 ⊗ ∣0⟩E + K1 ⊗ ∣1⟩E , which implies

V(t)∣0⟩S = √[(1 + e−γt)/2] ∣00⟩SE + √[(1 − e−γt)/2] ∣01⟩SE ,
V(t)∣1⟩S = √[(1 + e−γt)/2] ∣10⟩SE − √[(1 − e−γt)/2] ∣11⟩SE . (109)

From this, we can construct the two remaining vectors ∣ϑ01(t)⟩SE and ∣ϑ11(t)⟩SE to complete V(t) to a unitary U(t) by means of Eq. (105). For example, we can make the choice

∣ϑ01(t)⟩SE = √[(1 − e−γt)/2] ∣10⟩SE + √[(1 + e−γt)/2] ∣11⟩SE ,
∣ϑ11(t)⟩SE = √[(1 − e−γt)/2] ∣00⟩SE − √[(1 + e−γt)/2] ∣01⟩SE . (110)

It is easy to check that these vectors indeed satisfy SE⟨ϑℓ′,α∣V ∣ℓ⟩S = 0 for all ℓ, ℓ′ and α ≥ 1, as well as ⟨ϑℓ′,α′∣ϑℓ,α⟩ = δℓℓ′δαα′ . This, then, provides a unitary matrix U(t) that leads to the above dephasing dynamics:

U(t) = V(t)∣0⟩S⟨00∣SE + V(t)∣1⟩S⟨10∣SE + ∣ϑ01(t)⟩⟨01∣SE + ∣ϑ11(t)⟩⟨11∣SE . (111)

Insertion into Eq. (104) then shows that the thusly defined unitary evolution indeed leads to dephasing dynamics on the system.
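The whole dilation of the dephasing channel can be checked numerically. In the sketch below (our own; γ and t are arbitrarily chosen example values), the columns of U(t) are exactly the four vectors of Eqs. (109)–(111):

```python
import numpy as np

g, t = 1.0, 0.7                  # gamma and t: arbitrary example values
cp = np.sqrt((1 + np.exp(-g * t)) / 2)
cm = np.sqrt((1 - np.exp(-g * t)) / 2)

# Vectors in the basis |00>, |01>, |10>, |11> of H_S (x) H_E
V0 = np.array([cp, cm, 0, 0])    # V(t)|0>_S, Eq. (109)
V1 = np.array([0, 0, cp, -cm])   # V(t)|1>_S
th01 = np.array([0, 0, cm, cp])  # |theta_01(t)>, Eq. (110)
th11 = np.array([cm, -cp, 0, 0]) # |theta_11(t)>

# U(t) of Eq. (111) sends |00>, |01>, |10>, |11> to V0, th01, V1, th11
U = np.column_stack([V0, th01, V1, th11])
assert np.allclose(U.T @ U, np.eye(4))  # unitary (real-valued here)

# tr_E[U (rho (x) |0><0|_E) U^dag] damps the off-diagonals by e^{-gamma t}, Eq. (107)
rho = np.array([[0.6, 0.3], [0.3, 0.4]])
out = (U @ np.kron(rho, np.diag([1., 0.])) @ U.T).reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)
assert np.allclose(out, [[0.6, 0.3 * np.exp(-g * t)],
                         [0.3 * np.exp(-g * t), 0.4]])
```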

Finally, while we chose to introduce quantum channels in terms of the natural properties one would demand from them, we could have chosen the converse route, starting from the assumption that every dynamics can be understood as a unitary one in a bigger space, thus positing an equation along the lines of Eq. (106) as the starting point. Unsurprisingly, this, too, would have yielded CPTP maps, since we have

E[ρS] = trES′[U(ΨSS′ ⊗ ∣0⟩⟨0∣E)U†] = ∑α KαρSK†α , (112)

where Kα ∶= ⟨α∣U∣0⟩, {∣α⟩} is a basis of HE and ρS = trS′(ΨSS′). Since this is a Kraus decomposition that satisfies ∑α K†αKα = 11S , the dynamics of a system state that is initially uncorrelated with the environment (here in the state ∣0⟩⟨0∣E) and evolves unitarily on system plus environment is always given by a CPTP map on the level of the system alone. We will use this fact – amongst others – later on when we lay out how to detect non-Markovianity based on quantum master equation approaches.

C. Quantum Master Equations

While we have yet to formalize the theory of quantum stochastic processes (in the sense that we have yet to explore how to obtain multi-time statistics in quantum mechanics), the quantum stochastic matrix formalism is enough to keep us occupied for a long time. In fact, much of the active research in the field of open quantum system dynamics is concerned with the properties of families of quantum channels. It should already be clear that the quantum stochastic matrix, like its classical counterpart, only deals with two-time correlations, see Figures 6 and 24, and can thus not provide a complete description of quantum stochastic processes. This analogy goes further; as is the case on the classical side, an important family of stochastic matrices corresponds to quantum master equations.[112] Before fully generalizing the concept of stochastic processes to the quantum realm, let us thus have a quick – and very superficial – look at quantum master equations and witnesses of non-Markovianity that are based on them.

Quantum master equations have a long and rich history dating back to the 1920s. Right at the inception of modern quantum theory, Landau derived a master equation for light interacting with charged matter [113]. This should not be surprising because master equations play a key role in understanding the real phenomena observed in the lab. For the same reason, they are widely used tools for theoretical


physicists and beyond, including quantum chemistry, condensed matter physics, high-energy physics, material science, and so on. However, the formal derivation of overarching master equations took another thirty years. Nakajima and Zwanzig independently derived exact memory kernel master equations using the so-called projection operator method. Since then, there have been an enormous number of studies of non-Markovian master and stochastic equations [114–137], spanning from exploring their mathematical structure, to studying the transition between the Markovian and non-Markovian regime [138, 139], to applying them to chemistry or condensed matter systems. Here, we will not concern ourselves with these details and limit our discussion to the overarching structure of the master equation, and in particular how to tell Markovian ones apart from non-Markovian ones. We refer the reader to standard textbooks for more details [2, 3, 5] on these aspects as well as proper derivations, which we will not provide in this section.

The most general quantum master equation has a form already familiar to us. We simply replace the probability distribution in Eq. (36) with a density matrix to obtain the Nakajima-Zwanzig master equation[140]

dρ(t)/dt = ∫_s^t K(t, τ)[ρ(τ)] dτ. (113)

Above, K(t, τ) is a super-operator[141] that is called the memory kernel. Often, this equation is written in two parts,

dρ(t)/dt = −i[H, ρ(t)] + D[ρ(t)] + ∫_s^t K(t, τ)[ρ(τ)] dτ, (114)

where D is called the dissipator with the form

D[ρ(t)] = ∑j γj (Lj ρ(t) L†j − (1/2){L†jLj , ρ(t)}) . (115)

Above, the first term on the RHS corresponds to a unitary dynamics, the second term is the dissipative part of the process, and the third term carries the memory (which can also be dissipative). We note in passing that we have yet to define what Markovian and non-Markovian actually mean in the quantum realm, and how the ensuing definitions relate to their classical counterparts. We will provide a rigorous definition of these concepts in Sec. VI, while here, for the moment, we shall content ourselves with the vague ‘definition’ that non-Markovian processes are those where memory effects play a non-negligible role; Markovian processes are those where memory effects are absent.

While the Nakajima-Zwanzig equation is the most general quantum master equation, the rage in the 1960s and 1970s was to derive the most general Markovian master equation. It took well over a decade to get there, after many attempts; see Ref. [142] for more on this history and Ref. [143] for a pedagogical treatment. Those who failed in this endeavor were missing a key ingredient, complete positivity. In 1976, this feat was finally achieved by Gorini-Kossakowski-Sudarshan [59] and Lindblad [60] independently.[144] A quantum Markov process can be described by this master equation, now known as the GKSL master equation. Eq. (114) already contains the GKSL master equation in the sense that the final term vanishes for a Markov process:

dρt/dt = L[ρt] with L[ · ] = −i[H, · ] + D[ · ], (116)

where L stands for the Liouvillian, often also called the Lindbladian. Intuitively, the above master equation is considered memoryless, since it does not contain the integral over past states that is present in Eq. (114) (we will see in Sec. VI A 2 that this intuition is somewhat too simplistic, since there are processes that carry memory effects but can nonetheless be modeled by a GKSL equation). If L is time independent, then the above master equation has the formal solution

ρt = Et∶r[ρr] = e^{L(t−r)}[ρr] , (117)

where ρr is the system state at time r. From this, we see that the respective dynamics between two arbitrary times r and t only depends on t − r, but not on the absolute times r and t (or any earlier times). Using e^{L(t−r)} = e^{L(t−s)} e^{L(s−r)}, this implies the often used semigroup property of Markovian dynamics

Et∶r = Et∶s ◦ Es∶r (118)

for t ≥ s ≥ r.

Some remarks are in order. The decomposition in Eq. (114) is not always unique. Often, a term dubbed the inhomogeneous term is present, which is due to the initial system-environment correlations. As we will outline below, describing dynamics with initial correlations in terms of quantum channels (and thus, master equations) is operationally dubious, and the interpretation of an inhomogeneous term as stemming from initial correlations is thus somewhat misdirected.
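The semigroup property of Eq. (118) is easy to check for a concrete GKSL generator. In the sketch below (our own, in numpy; the rate γ and the times are assumed values), we take the pure dephasing Liouvillian L[ρ] = γ(σ3 ρ σ3 − ρ), a valid GKSL form with the single operator L1 = σ3, and exploit the fact that its superoperator matrix happens to be diagonal:

```python
import numpy as np

gamma = 0.5  # assumed decay rate
Z = np.diag([1., -1.])
# Superoperator of L[rho] = gamma (Z rho Z - rho) acting on the row-major
# vectorization of rho, using vec(A rho B) = (A (x) B^T) vec(rho)
L = gamma * (np.kron(Z, Z.T) - np.eye(4))

def E(tau):
    """E_{t:r} = exp(L (t - r)); L is diagonal here, so exponentiate elementwise."""
    return np.diag(np.exp(np.diag(L) * tau))

t, s, r = 1.3, 0.8, 0.2
# Semigroup property, Eq. (118): E_{t:r} = E_{t:s} o E_{s:r}
assert np.allclose(E(t - r), E(t - s) @ E(s - r))

# Coherences decay as e^{-2 gamma (t-r)}: pure dephasing as in Eq. (107)
rho = np.array([[0.5, 0.5], [0.5, 0.5]])
rho_t = (E(t - r) @ rho.flatten()).reshape(2, 2)
assert np.isclose(rho_t[0, 1], 0.5 * np.exp(-2 * gamma * (t - r)))
```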

In the Markovian case, the super-operators in Eq. (116) should be time-independent. In fact, it is possible to derive master equations for non-Markovian processes that look just like Eq. (116), but then the super-operators will be time-dependent and the rates γj may be negative [31, 128, 145–148] (while they are always positive in the Markovian case). For a Markovian master equation, the operators Lj are directly related to the Kraus operators of the resulting quantum channels [149]. Since Eq. (114) is the most general form of a quantum master equation, it contains the equations due to Redfield, Landau, Pauli, and others. To reach these equations one usually expands and approximates the memory kernel. This is a field of its own and we cannot do justice to these methods or the reasoning behind the approximations here (for a comparison of the validity of the different employed approximations, see, for example, Refs. [150, 151]).

As with the classical case, the above master equations express the statistical quantum state continuously in time.[152] They can either be derived from first principles by making the right approximations, or posited as ad hoc phenomenological dynamical equations that model the pertinent properties of a process at hand (see, for example, Ref. [3] for a wide array of different derivations of quantum master equations).

As before, it may be tempting to think that the master equation is equivalent to a stochastic process as defined above.


However, just as in the classical case, the quantum master equation only accounts for two-point correlations. This can be seen intuitively by realizing that the solution of a master equation is a family of quantum channels, each corresponding to two-time correlations, or, more directly, by employing the transfer tensor method [33–35], which shows that the RHS of Eq. (114) can be expressed as a linear combination of products of quantum maps E(c∶b) ◦ E(b∶a), with c being either t or t − dt, b = s, and a being the initial time. A quantum map E(b∶a) is a mapping of a preparation at time a to a density matrix at time b. Thus, it only contains correlations between the two times a and b. The LHS can be computed by setting b = t and a = t − dt. Another formal method for showing that the RHS can be expressed as a product of two stochastic matrices is by means of the Laplace transform [31, 32].

This also puts into question our somewhat lax use of the term ‘Markovian’ in the above discussion. As we discussed in the classical case, Markovianity is a statement about conditional independence for multi-time probability distributions. How, then, can a master equation that is concerned with two-point correlations only be Markovian? Indeed, as we shall see below, it is possible to have physical non-Markovian processes (i.e., processes that do not display the correct conditional independence) that can be described by what we dubbed a Markovian master equation in Eq. (116). That is, the implication only goes one way; a Markov process always leads to a master equation of the form of Eq. (114) with the final term vanishing. The converse does not hold. We detail an example below, but to fully appreciate it we must have a better understanding of multi-time quantum correlations. Nonetheless, while it is not possible to unambiguously deduce the Markovianity of a process from limited statistical data only, one can, just like in the classical case, already use such data to detect the presence of memory effects.

D. Witnessing non-Markovianity

As mentioned above, Markovian processes lead to master equations of the form of Eq. (116) and, in turn, can be fully described by the resulting family of CPTP maps. Thus, having access to the stochastic matrix and master equation is already sufficient to witness departures from Markovianity. That is, there are certain features and properties that must belong to any Markovian quantum process, which then allows for discriminating between Markov and non-Markov processes.

1. Initial correlations

Consider the dynamics of a system from an initial time to some final time. When the system interacts with an environment, the process on the system can be described by a map E(t∶0). As we showed in Eq. (104), such a map can be thought to come from unitary system-environment dynamics, with the caveat that the initial system-environment state has no correlations (in Eq. (104), it was of the form ρS ⊗ ∣0⟩⟨0∣). Already in the 1980s and 1990s, researchers began to wonder what happens if the initial system-environment state has correlations [91, 92, 153]. Though this may – at first glance – seem unrelated to the issue of non-Markovianity, the detectable presence of initial correlations is already a non-Markovian effect. This is because initial correlations indicate past interactions, and if the initial correlations affect the future dynamics, then the future dynamics are a function of the state of the system at t = 0, as well as further back in the past. As this is in line with an intuitive definition of non-Markovianity (a thorough one will be provided later), the observable presence of initial correlations constitutes an experimentally accessible witness for memory [154–159].

We emphasize that the presence of initial correlations does not make the resulting process non-Markovian per se; suppose there are initial correlations whose presence cannot be detected on the level of the system, then these initial system-environment correlations do not lead to non-Markovianity. If, however, it is possible to detect an influence of such correlations on the behavior of the system (for example, by observing a breakdown of complete positivity [91, 94, 95] or by means of a local dephasing map [155, 157]), then the corresponding process is non-Markovian. With this in mind, in what follows, by ‘presence’ of correlations, we will always mean ‘detectable presence’.

A pioneering result on initial correlations and open system dynamics was due to Pechukas in 1995 [91]. He argued that either there are no initial correlations or we must give up either the complete positivity or the linearity of the dynamics. Around the same time, many experiments began to experimentally reconstruct quantum maps [160–162]. Surprisingly, many of these experiments revealed not completely positive maps. This began a flurry of theoretical research either arguing for not-completely-positive (NCP) dynamics or reinterpreting the experimental results [94, 163–170]. However, this does not add to the physical legitimacy of NCP processes [171]. Nevertheless, NCP dynamics remains a witness for non-Markovianity. We will show below that all dynamics, including non-Markovian ones, must be completely positive. We do this by getting around the Pechukas theorem by paying attention to what it means to have a state in quantum mechanics. For the moment though, the take-home message is that one way to detect memory is to devise experiments that can detect initial correlations between the system and the environment.

Two illustrative examples. Let us conclude this discussion of initial correlations with two examples that highlight the problems encountered in the presence of initial correlations. To this end, we consider a single qubit system interacting with a single qubit environment. We let the initial correlated state be

ρSE(0) = (1/4)(11S ⊗ 11E + a ⋅ σS ⊗ 11E + g σyS ⊗ σzE) . (119)

The system-environment interaction is chosen to be

USE = ∏_{j=x,y,z} [cos(ωt) 11S ⊗ 11E − i sin(ωt) σjS ⊗ σjE] . (120)


We are of course interested only in the reduced dynamics of the system. The initial state of the system is ρS(0) = (1/2)(11S + a ⋅ σS) and, under the above unitary, it evolves to

ρS(t) = (1/2)[11S + c²ω a ⋅ σS − g cωsω σxS] , (121)

where cω ∶= cos(2ωt) and sω ∶= sin(2ωt).

Example 1. In the first instance, in order to provide a full basis of input states, we will fix the correlation term (i.e., the third term in Eq. (119)) and vary the system state alone. Choosing a to be κ(±1, 0, 0)T, κ(0, 1, 0)T, and κ(0, 0, 1)T gives us a linearly independent set of input states given in Eq. (69). Here, κ is a number less than 1 to ensure that the total SE state is positive. The quantum stochastic matrix is straightforwardly constructed by plugging the output states along with the dual basis in Eq. (71) into Eq. (85).

This process is easily shown to be not completely positive. To do so, we compute the Choi state using Eq. (97) to get

ΥE = (1/2) ( 1 + c²ω     0           −g cωsω     2c²ω
             0           1 − c²ω     0           −g cωsω
             −g cωsω     0           1 − c²ω     0
             2c²ω        −g cωsω     0           1 + c²ω ) . (122)

Two of the four eigenvalues of this Choi matrix turn out to be (1/2)(1 − cos²(2ωt) ± g cos(2ωt) sin(2ωt)), which are not always positive, i.e., the process is not completely positive. Seemingly then, initial correlations lead to dynamics that are not CP.
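The negativity claimed for Eq. (122) can be confirmed numerically (our own sketch in numpy; the values g = 0.9 and ωt = 0.05 are assumed, chosen so that tan(2ωt) < g):

```python
import numpy as np

def choi_ex1(g, wt):
    """Choi matrix of Eq. (122); wt denotes the product omega*t."""
    c, s = np.cos(2 * wt), np.sin(2 * wt)
    c2, gcs = c ** 2, g * c * s
    return 0.5 * np.array([[1 + c2, 0, -gcs, 2 * c2],
                           [0, 1 - c2, 0, -gcs],
                           [-gcs, 0, 1 - c2, 0],
                           [2 * c2, -gcs, 0, 1 + c2]])

# The eigenvalue (1 - cos^2(2wt) - g cos(2wt) sin(2wt))/2 is negative
# whenever tan(2wt) < g, so the inferred map is not completely positive
eigs = np.linalg.eigvalsh(choi_ex1(g=0.9, wt=0.05))
assert eigs.min() < -1e-6
```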

Example 2. One argument against the above procedure is that there is no operational mechanism for varying the state of the system while keeping the correlation term fixed. This is indeed true and a major flaw of the programme of NCP dynamics. However, we could envisage the case where the initial state of the system is prepared, starting from the correlated state of Eq. (119), by means of projective operations along the directions of the vectors a ∈ {κ(±1, 0, 0)T, κ(0, 1, 0)T, κ(0, 0, 1)T}. The Choi state of the resulting map, in this case, turns out to be

ΥE = (1/2) ( 1 + c²ω     0             0             f
             0           s²ω           −(ig/2) s²ω   0
             0           (ig/2) s²ω    s²ω           0
             f∗          0             0             1 + c²ω ) , (123)

where f ∶= −1 − c²ω + (ig/2) s²ω. This map is perfectly operational (in the sense that there is a clear procedure for how to obtain it experimentally) and still we find that the dynamics are NCP, since the above matrix is not positive semidefinite.

One of the key assumptions in the construction of CP maps (and linear maps in general) is that the state of the environment is the same for all possible inputs of the system, and thus does not depend on how the system of interest was prepared. This is clearly not the case here, as the correlation term vanishes for three of these projections, but not for the projection along the eigenvectors of σyS . This is the source of the NCP dynamics, and this is the reason why NCP dynamics are an indicator for non-Markovianity, since system-environment correlations constitute a memory of past interactions.

However, there is a bigger question looming over us: how do we decide whether the first or the second map is the valid one? Worse yet, we can also construct a third map; let us assume that the initial state has the Bloch vector a = κ(1, 0, 0), and we prepare the basis of input states by applying local unitary operations to this initial state. In this case, the correlation term will be different for each preparation and we will get another map that will also be NCP. It turns out that there are infinite ways of preparing the input states [172] and each will lead to a different dynamical map; see Ref. [169] for several examples related to the two presented here. This is not tenable, as we do not have a unique description for the process (since the maps we obtain depend on how we created the basis of input states) and it violates the CP condition. Moreover, all of these maps will further violate the linearity condition. That is, a map should have value in predicting the future state of the system, given an arbitrary input state. The above map in example 2 has little to no predictive power; when preparing an initial system state that is not one of the original basis states, the action of the map of Eq. (123) will not yield the correct output state. The map in example 1 does have predictive power, but it is not an operationally feasible object because, as mentioned before, there is no way to manipulate a without changing the correlation term. In Sec. V C we will show how these problems are avoided in a systematic and operationally well-defined manner.

2. Completely positive and divisible processes

Going beyond this rather static marker of non-Markovianity in terms of initial system-environment correlations, we can extend the concept of divisibility that we first discussed in Sec. III C 2 for the classical case to the quantum case. A quantum process is called divisible if

E(t∶r) = E(t∶s) ◦ E(s∶r), ∀ r, s, t. (124)

Here, ◦ stands for the composition of two quantum maps. Since they are not necessarily matrices, the composition may – depending on the chosen representation – not be a simple matrix product. Moreover, in the quantum case we now further require that each map here is completely positive, and thus such a class of processes is referred to as CP divisible processes.

Understanding the divisibility of quantum maps and giving it an operational interpretation is a highly active area of research [173–182], and we will only scratch the surface. Importantly, as we have seen above, processes that satisfy a GKSL equation are divisible (see Eq. (118)), making the breakdown of divisibility a witness of non-Markovianity.

Now, if r = 0 then we can certainly run a set of experiments to determine the quantum maps E(t∶0) for all t by means


of quantum process tomography outlined above. These maps will be CP as long as there are no initial correlations. But how do we determine the intermediate dynamics E(t∶s) for s > 0? One possible way is to infer an intermediate process from the family of maps E(t∶0) by inversion,

ζ(t∶s) ∶= E(t∶0) ∘ E⁻¹(s∶0), (125)

provided the maps E(t∶0) are invertible. We deliberately label this map with a different letter, ζ, as it may not actually represent a physical process [183]. Now, if the process is Markovian then ζ(t∶s) = E(t∶s), i.e., it indeed corresponds to the physical evolution between s and t, and it will be completely positive. Conversely, if we find that ζ(t∶s) is not CP then we know that the process is non-Markovian.

Example of an indivisible process. To provide some more concrete insight, let us give an example of a process that is not divisible. To this end, we consider the initial state in Eq. (119), along with the interaction in Eq. (120). Here, we take the limit g → 0, thus dropping the correlation term and rendering the initial system-environment state uncorrelated. The Choi matrix for the dynamics of the system is then given by

ΥE = (1/2) [ 1 + c2ω     0        0      −2c2ω
                0     1 − c2ω     0         0
                0        0     1 − c2ω      0
             −2c2ω       0        0      1 + c2ω ] , (126)

where c2ω ∶= cos(2ωt).

While this process is CP when considered from the initial time (since system and environment are initially uncorrelated), it is not divisible, simply because it is not possible to readily ‘divide’ cos(2ωt) into a product of two such functions. More concretely, due to the oscillatory nature of the process, many of the possible inferred maps ζ(t∶s) of Eq. (125) would be NCP. On the other hand, for a process where c2ω is replaced by something like exp(−ωt), the process would become divisible, see Eq. (130).
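The NCP character of the inferred intermediate maps can be checked numerically. The following is a minimal sketch, using a simplified stand-in for the channel above (a qubit map that merely scales coherences by cos(2ωt), rather than the full Choi matrix of Eq. (126)); building the Choi matrix of ζ(t∶s) then exposes a negative eigenvalue:

```python
import numpy as np

def dephase(c):
    """Qubit map that leaves populations alone and scales coherences by c.
    For c = cos(2*w*t) this mimics the oscillatory dephasing above."""
    def ch(rho):
        out = np.array(rho, dtype=complex)
        out[0, 1] *= c
        out[1, 0] *= c
        return out
    return ch

def choi(ch):
    """Unnormalized Choi matrix C = sum_ij |i><j| (x) ch(|i><j|).
    The map is completely positive iff C is positive semidefinite."""
    C = np.zeros((4, 4), dtype=complex)
    for i in range(2):
        for j in range(2):
            Eij = np.zeros((2, 2))
            Eij[i, j] = 1.0
            C += np.kron(Eij, ch(Eij))
    return C

w, s, t = 1.0, 1.0, 1.5
c = lambda time: np.cos(2 * w * time)

# The maps from the initial time are CP: smallest Choi eigenvalue >= 0 ...
assert min(np.linalg.eigvalsh(choi(dephase(c(t))))) >= -1e-12

# ... but the inferred intermediate map zeta(t:s), whose coherence factor
# is c(t)/c(s), has a negative Choi eigenvalue here: the process is not
# CP divisible.
zeta = dephase(c(t) / c(s))
assert min(np.linalg.eigvalsh(choi(zeta))) < -1e-12
```

Any choice of s and t with ∣cos(2ωt)∣ > ∣cos(2ωs)∣ produces such a negative eigenvalue, while every map taken from the initial time remains CP.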

Working with divisible processes has several advantages. Two that we have already discussed in the classical case are the straightforward connection to the master equation and the data processing inequality (which also holds in the quantum case). We can use these to construct further witnesses for non-Markovianity, such as those based on the trace distance measure [184]. The amplitude damping channel in Eq. (82) and the dephasing channel in Eq. (107) are both divisible as long as they relax exponentially. Otherwise, they are indivisible processes, which is easily checked numerically. As in the classical case, the logic surrounding CP divisibility and its relationship to Markovianity is as follows: if a process is Markovian, it is also CP divisible (the converse does not necessarily hold). A breakdown of CP divisibility thus signals non-Markovian effects, without the need to investigate multi-time statistics. Pursuing this line of reasoning further, there are properties that hold for CP divisible processes, like, for example, quantum data processing inequalities (see below). Instead of checking for the breakdown of CP divisibility, one can thus check the breakdown of other properties as a proxy. This, however, will lead to successively weaker (but potentially more easily accessible) witnesses of non-Markovianity.

We mention briefly that CP divisibility is not the only type of divisibility of open quantum system dynamics. There exists a vast body of research on different types of divisibility for quantum processes, their stratification and interconnectedness [185–188], as well as the closely related question of simulatability of quantum and classical channels and dynamical maps [189–192]. Here, we will not dive into these fields in detail.

3. Snapshot

As in the classical case, when a process is divisible it will be governed by a Markovian master equation of GKSL type, Eq. (116). Following the classical case, Eq. (39), the Liouvillian for the quantum process can be obtained via

dρ(t)/dt = lim_{∆t→0} (E(t+∆t∶t) − I)/∆t [ρ(t)] = L[ρ(t)]. (127)

This, in turn, means that

ρ(t) = E(t∶0)[ρ(0)] with E(t∶0) = e^{Lt}. (128)

We can now reverse the implication to test whether a process is Markovian by considering the map E(t∶0) for some t. We can take the logarithm of this map, which has to be done carefully, to obtain L. If the process is Markovian then exp(Ls) will be CP for all values of s. If this fails then the process must be non-Markovian, provided it is also symmetric under time translation; that is, a Markovian process that slowly varies in time may fail this test. This witness was one of the first proposed for quantum processes [185, 189]. Once again, note that here only two-time correlations are accounted for and, again, we use the term ‘Markovian master equation’ in a rather lax sense (most importantly, besides reasoning by intuition, we have not yet defined what a Markovian quantum process actually is). Unsurprisingly then, this witness will miss processes that are non-Markovian at higher orders.
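A minimal numerical sketch of this witness, assuming the time-homogeneous dephasing map of Eq. (130) so that time-translation symmetry holds: take the matrix logarithm of a single snapshot E(t∶0), and check that exp(Ls) is CP for a range of s via the Choi matrix.

```python
import numpy as np
from scipy.linalg import expm, logm

# Superoperator of the dephasing map of Eq. (130) at gamma = 1, t = 2,
# acting on row-stacked density matrices: vec(rho) = rho.reshape(4).
gamma, t = 1.0, 2.0
E = np.diag([1.0, np.exp(-gamma * t), np.exp(-gamma * t), 1.0])

# Candidate Liouvillian extracted from the single snapshot E(t:0)
L = logm(E) / t

def is_cp(S, tol=1e-10):
    """Reshuffle a qubit superoperator into its Choi matrix,
    C[2k+i, 2l+j] = S[2i+j, 2k+l], and test positive semidefiniteness."""
    C = S.reshape(2, 2, 2, 2).transpose(2, 0, 3, 1).reshape(4, 4)
    return np.linalg.eigvalsh(C).min() >= -tol

# For a (time-homogeneous) Markovian process, exp(L*s) is CP for all s
assert all(is_cp(expm(L * s)) for s in np.linspace(0.1, 3.0, 10))
```

As a sanity check, a map that amplifies coherences, e.g. the superoperator diag(1, 1.2, 1.2, 1), fails the same CP test, since its Choi matrix acquires a negative eigenvalue.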

Dephasing dynamics. Let us clarify these concepts by means of a concrete example. The dephasing process introduced in the previous subsection is divisible and can thus be described by a Markovian master equation. To obtain it, we simply differentiate the state at time t to get

dρ(t)/dt = (γe^{−γt}/2) (σ3 ρ σ3 − ρ). (129)

The quantum stochastic matrix for this process is

E(t∶0) = [ 1     0        0     0
           0  e^{−γt}     0     0
           0     0     e^{−γt}  0
           0     0        0     1 ] . (130)

Since the matrix is diagonal, it is trivially seen to be divisible from the fact that exp(−γt) = exp(−γ(t − s)) exp(−γs). Consequently, the underlying process could – in principle – be Markovian. In Sec. VI A 2 we will revisit this example and show that there are non-Markovian processes where the two-point correlations have this exact form.
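This factorization can be verified directly at the level of the superoperator of Eq. (130); a small sanity check, with illustrative values γ = 1, s = 0.7, t = 2:

```python
import numpy as np

def E(dt, gamma=1.0):
    """Superoperator of the dephasing map over a time interval dt,
    Eq. (130): coherences decay by exp(-gamma*dt), populations are fixed."""
    f = np.exp(-gamma * dt)
    return np.diag([1.0, f, f, 1.0])

t, s = 2.0, 0.7
# CP divisibility of the semigroup: E(t:0) = E(t:s) E(s:0)
assert np.allclose(E(t), E(t - s) @ E(s))
```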

4. Quantum data processing inequalities

As mentioned, just like in the classical case, data processing inequalities hold in quantum mechanics, with the distinction that here they do not apply to stochastic matrices but to quantum channels, i.e., CPTP maps. Specifically, there are several distance measures that are proven to be contractive under CPTP dynamics [193, 194]:

f[ρ, σ] ≥ f[E(ρ), E(σ)]. (131)

Three prominent examples are the quantum trace distance

∥ρ − σ∥1 ∶= tr∣ρ − σ∣, (132)

the quantum mutual information,

S(A ∶ B) = S(ρA) + S(ρB) − S(ρAB), (133)

and the quantum relative entropy

S(ρ∥σ) = −tr[ρ(log(σ) − log(ρ))]. (134)

All of these are defined as in the classical case, with the sole difference that for the latter two we replace the Shannon entropy with the von Neumann entropy, S(ρ) ∶= −tr[ρ log(ρ)]. Since divisible processes can be composed of independent CPTP maps, they have to satisfy data processing inequalities between any two points in time. Violation of the DPI thus implies a breakdown of CP divisibility, heralding the presence of memory effects.
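The contraction of the trace distance, Eq. (132), under a CPTP map is easy to verify numerically. The sketch below uses the exponentially relaxing dephasing map as an example channel and checks monotonic decay in time:

```python
import numpy as np

def trace_distance(rho, sigma):
    """||rho - sigma||_1 = tr|rho - sigma|, i.e. the sum of the absolute
    eigenvalues of the (Hermitian) difference."""
    return np.abs(np.linalg.eigvalsh(rho - sigma)).sum()

def dephase(rho, f):
    """CPTP dephasing map: scale the coherences by f in [0, 1]."""
    out = np.array(rho, dtype=complex)
    out[0, 1] *= f
    out[1, 0] *= f
    return out

plus = np.array([[0.5, 0.5], [0.5, 0.5]])      # |x+><x+|
minus = np.array([[0.5, -0.5], [-0.5, 0.5]])   # |x-><x-|

# Trace distance between the dephased states is non-increasing in t,
# as demanded by the DPI, Eq. (131).
ds = [trace_distance(dephase(plus, np.exp(-t)), dephase(minus, np.exp(-t)))
      for t in (0.0, 0.5, 1.0, 2.0)]
assert all(a >= b - 1e-12 for a, b in zip(ds, ds[1:]))
```

A non-monotonic trace distance in such a plot would, by the reasoning above, witness a breakdown of CP divisibility.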

Two of the most popular witnesses of non-Markovianity [184, 195, 196], derived using the first two data processing inequalities, were introduced about a decade ago. In particular, Ref. [196] proposed to prepare a maximally entangled state of a system and an ancilla. The system is then subjected to a quantum process. Under this process, if the quantum mutual information (or any other correlation measure) between the system and ancilla behaves non-monotonically then the process must be non-Markovian. A similar argument was proposed by Ref. [184] using the trace distance measure. It can be shown that the former is a stronger witness than the latter [197]. Nonetheless, even this witness of non-Markovianity is generally not equivalent to the breakdown of CP divisibility, as there are processes that behave monotonically under the above distance measures but are not CP divisible [178, 198–200]. We will not delve into the details of these measures here since there are excellent reviews on these topics readily available [7, 8] (for an in-depth quantitative study of the sensitivity to memory effects of correlation-based measures, see, for example, Ref. [201]).

With this, we come to the end of our very cursory discussion of quantum stochastic processes in terms of master equations and two-point correlations. We emphasize that the above is meant less as a pedagogical introduction to the field than as a brief (incomplete) overview of the machinery that exists to model processes and detect memory by means of master equations and their properties. While very powerful and widely applicable in experimental settings, the reader should also have noted the natural shortcomings of this approach. On the one hand, it cannot account for multi-time statistics, and thus does not provide a complete framework for the definition and treatment of quantum stochastic processes. On the other hand, as a direct consequence of these shortcomings, we somehow had to awkwardly beat around the bush when it came to a proper definition of Markovianity in the quantum case. The remainder of this tutorial will be focused on working out the explicit origin of the difficulties with defining quantum stochastic processes, and how to overcome them.

E. Troubles with quantum stochastic processes

Do we need more (sophisticated) machinery than families of quantum stochastic maps and quantum master equations [202] to describe stochastic quantum phenomena? Perhaps for a long time, the machinery introduced above was sufficient. However, as quantum technologies gain sophistication and as we uncover unexpected natural phenomena with quantum underpinnings, the above tools no longer suffice [203–205]. Take, for example, the pioneering experiments that have argued for the persistence of quantum effects on time scales relevant for photosynthetic processes [206–211], and, in particular, that these processes might exploit complex quantum memory effects arising from the interplay of the electronic degrees of freedom – constituting the system of interest – and the vibrational degrees of freedom – playing the role of the environment. In these experiments, three ultra-short laser pulses are fired at the sample and then a signal from the sample is measured. The time between each pulse, as well as the final measurement, is varied. The system itself is mesoscopic and therefore certainly an open system. The conclusion from these experiments is based on the wave-like feature in the signal; see the video in the supplementary materials of Ref. [207]. This experiment fundamentally makes use of four-time correlations and thus requires more sophistication for its description than the above machinery affords us.

Another important example is the mitigation of non-Markovian noise in quantum computers and other quantum technologies [212–216], which can display non-trivial multi-time statistics. Finally, as we already mentioned in our discussion of classical master equations, in order to make assertions about multi-time statistics, it is inevitable to account for intermediate measurements, which cannot be done within approaches to quantum stochastic processes based on master equations. It seems reasonable, then, to aim for a direct generalization of the description of classical stochastic processes in terms of multi-time statistics to the quantum realm. However, as we unveil next, there are fundamental problems that we must overcome before we can describe multi-time quantum correlations as a stochastic process.


Figure 12. Simple quantum process that violates the assumptions of the KET. Successive measurements of the spin of a spin-1/2 particle do not allow one to predict the statistics if the intermediate measurement is not conducted. Here, measuring in the x-basis is invasive, and thus summing over the respective outcomes is not the same as not having done the measurement at all.

1. Break down of KET in quantum mechanics

As we have mentioned in Sec. III B, one of the fundamental theorems for the theory of classical stochastic processes, and the starting point of most books on them, is the Kolmogorov extension theorem (KET). It hinges on the fact that joint probability distributions of a random variable S pertaining to a classical stochastic process satisfy consistency conditions amongst each other, like, for example, ∑_{s2} P(S3, S2 = s2, S1) = P(S3, S1); a joint distribution on a set of times can always be obtained by marginalization from one on a larger set of times. Fundamentally, this is a requirement of non-invasiveness, as it implies that not performing a measurement at a time is the same as performing a measurement but forgetting the outcomes.

While seemingly innocuous, this requirement is not fulfilled in quantum mechanics, leading to a breakdown of the KET [217]. To see this, consider the following concatenated Stern-Gerlach experiment [218] (depicted in Figure 12): Let a qubit initially be in the state ∣x+⟩ = (∣z+⟩ + ∣z−⟩)/√2, where ∣z+⟩, ∣z−⟩ are the pure states corresponding to projective measurements in the z-basis yielding outcomes z+, z−. Now, the state is measured sequentially (with no intermediate dynamics happening) in the z-, x- and z-direction at times t1, t2 and t3 (see Figure 12). These measurements have the possible outcomes {z+, z−} and {x+, x−} for the measurements in the z- and x-direction, respectively. It is easy to see that the probability for any possible sequence of outcomes is equal to 1/8. For example, we have

P(z+, x+, z+) = 1/8. (135)

Figure 13. Perturbed coin with interventions. Between measurements, the coin – which initially shows heads – is perturbed and stays on its side with probability p and flips with probability 1 − p, leading to a stochastic matrix Γ between measurements. Using their instrument, upon measuring an outcome, the experimenter flips the coin. Here, this is shown for the outcome hh. For most values of the probability p, this process – despite being fully classical – does not satisfy the requirement of the KET.

Now, summing over the outcomes at time t2, we obtain the marginal probability ∑_{s2=x±} P(z+, s2, z+) = 1/4. However, by considering the case where the measurement is not made at t2, it is easy to see that P(S3 = z+, S1 = z+) = 1/2. The intermediate measurement changes the state of the system, and the corresponding probability distributions for different sets of times are no longer compatible [8, 219]. Here, for example, when summing over the outcomes at t2, the transformation of the state of the system corresponds to

ρ(t2−) ↦ ρ(t2+) = ⟨x+∣ρ(t2−)∣x+⟩ ∣x+⟩⟨x+∣ + ⟨x−∣ρ(t2−)∣x−⟩ ∣x−⟩⟨x−∣, (136)

which, in general, does not coincide with the state ρ(t2−) right before t2. Does this then mean that there is no singular object that can describe the joint probability for a sequence of quantum events? Alternatively, what object would describe a quantum stochastic process if it cannot be a joint probability distribution?
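The probabilities above are easily reproduced numerically by composing projective state updates; a minimal sketch:

```python
import numpy as np

# Projectors for z- and x-basis measurements on a qubit
zp = np.array([[1.0, 0.0], [0.0, 0.0]])        # |z+><z+|
zm = np.eye(2) - zp                            # |z-><z-|
xp = 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]])  # |x+><x+|
xm = np.eye(2) - xp                            # |x-><x-|

rho = xp  # initial state |x+><x+|

def prob(rho, *projs):
    """Joint probability of a sequence of projective outcomes:
    p = tr(P_n ... P_1 rho P_1 ... P_n)."""
    state = rho
    for P in projs:
        state = P @ state @ P
    return np.trace(state).real

# Eq. (135): every sequence of outcomes has probability 1/8
assert np.isclose(prob(rho, zp, xp, zp), 1 / 8)

# Summing over the intermediate x-outcome gives 1/4 ...
assert np.isclose(prob(rho, zp, xp, zp) + prob(rho, zp, xm, zp), 1 / 4)

# ... but with no measurement at t2 at all, P(z+, z+) = 1/2
assert np.isclose(prob(rho, zp, zp), 1 / 2)
```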

Seemingly, the breakdown of consistency conditions prevents one from properly reconciling the idea of an underlying process with its manifestation on finite sets of times, as we did in classical theory by means of the KET. However, somewhat unsurprisingly, this obstacle is one of formalism, and not a fundamental one, in the sense that marginalization is more subtle for quantum processes than it is for classical ones; introducing a proper framework for the description of quantum stochastic processes – as we shall do below in Sec. V – brings with it a natural way of marginalization in quantum mechanics that contains the classical version as a special case and alleviates the aforementioned problems.

2. Input / output processes

As outlined above, the breakdown of the KET stems from the fact that, in general, quantum measurements are invasive. Analogously, our understanding of classical stochastic processes, and with it the consistency between different observed joint probability distributions, is built upon the idea that classical measurements are non-invasive. However, depending on the ‘instrument’ J an experimenter uses to probe a system, this assumption of non-invasiveness might not be fulfilled, even in classical physics.

To see this, consider the example of a perturbed coin that flips with probability p and stays on the same side with probability 1 − p (see Figure 13). Instead of merely observing outcomes, an experimenter could actively interfere with the process. As there are many different ways in which the experimenter could interfere at each point in time, we have to specify the way in which they probe, or, in anticipation of later


matters, what instrument they use, which we will denote by J.

For example, upon observing heads or tails, they could always flip the coin to tails and continue perturbing it. Or, upon observing an outcome, they could flip the coin, i.e., h ↦ t and t ↦ h. Finally, they could just leave it on the side they found it in and let the perturbation process continue. Let us refer to the latter two instruments as JF and JI, respectively.

Now, let us assume that, before the first perturbation, the coin shows heads. Then, if at t1 we choose the instrument J1 = JF that, upon observing an outcome, flips the coin, we obtain, e.g.,

P(F2 = h, F1 = t∣J1 = JF) = p(1 − p),
P(F2 = h, F1 = h∣J1 = JF) = p(1 − p). (137)

This means that P(F2 = h) = 2p(1 − p) when J1 = JF. On the other hand, if the experimenter does not actively change the state of the coin at the first time, i.e., J1 = JI, upon perturbation the coin will show h with probability 1 − p and t with probability p at time t1. Then, the probability to observe h at time t2 is given by

P(F2 = h) = (1 − p)² + p², (138)

which does not coincide with 2p(1 − p) unless p = 1/2. Thus the two cases generally do not coincide and the requirements of the KET are not fulfilled.
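A quick numerical check of Eqs. (137) and (138), assuming the convention of the main text (flip with probability p; the illustrative value p = 0.3 is our choice):

```python
import numpy as np

p = 0.3  # perturbation: coin flips with probability p, stays with 1 - p
G = np.array([[1 - p, p],      # stochastic matrix Gamma; index 0 = heads
              [p, 1 - p]])     # index 1 = tails
F = np.array([[0.0, 1.0], [1.0, 0.0]])  # instrument J_F: flip the coin
I = np.eye(2)                           # instrument J_I: leave it alone

start = np.array([1.0, 0.0])            # coin initially shows heads

# With J_1 = J_F: perturb, flip, perturb; P(F2 = h) = 2 p (1 - p)
p_flip = (G @ F @ G @ start)[0]
assert np.isclose(p_flip, 2 * p * (1 - p))

# With J_1 = J_I: perturb, do nothing, perturb; P(F2 = h) = (1-p)^2 + p^2
p_idle = (G @ I @ G @ start)[0]
assert np.isclose(p_idle, (1 - p) ** 2 + p ** 2)

# The two disagree unless p = 1/2: the KET requirement fails
assert not np.isclose(p_flip, p_idle)
```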

As, here, the experimenter can observe the output of the process and freely choose what they input into the process, such processes are often called input-output processes and are the subject of investigation in the field of computational mechanics [220]. A priori, it might seem arbitrary to allow for active interventions in classical physics. However, such operations naturally occur in the field of causal modeling [221], where they can be used to deduce causal relations between events; indeed, the only way to see whether two events are causally connected is to change a parameter at one of them and see if this change affects the outcome statistics at the other one.

On the other hand, while in classical physics it is a choice (up to experimental accuracy, that is) to actively interfere with the process at hand, in quantum mechanics such an active intervention due to measurements – even projective ones – can generally not be avoided. Considering classical processes with interventions thus points us in the right direction as to how quantum stochastic processes should be modeled.

Concretely, when active interventions are employed, the outcome statistics are conditional on the choices of instruments the experimenter made to probe the process at hand. Consequently, such a setup would not be described by a single joint probability distribution P(Fn, . . . , F1), but rather by conditional probabilities of the form P(Fn, . . . , F1∣Jn, . . . , J1). It is exactly this dependence of observed probability distributions on the employed instruments that we will encounter again when describing quantum stochastic processes.

Given that the breakdown of the KET can even occur in classical physics, one might again pause and wonder if there actually exists such a thing as a classical stochastic process

Figure 14. Spatial Measurements. Alice, Bob, Charlie, and David perform measurements on a seven-partite quantum state ρ. Both Bob and Charlie have access to two parts of said state; but while Bob can perform correlated measurements on his two systems, Charlie can only access them independently. The probabilities corresponding to the respective outcomes are computed via the Born rule (see Eq. (139)).

with interventions. Put differently, is there an underlying statistical object that is independent of the interventions made and can thus be considered the underlying process? While we will discuss in detail that this is indeed the case, recalling the above example of interventions that are used to unveil causal relations between events already tells us why the answer will be affirmative. Indeed, causal relations between events, and the strength with which different events can potentially influence each other, are independent of what experimental interventions are employed to probe them.

Interestingly, the breakdown of the requirements of the KET is closely related to the violation of Leggett-Garg inequalities in quantum mechanics [222, 223], which, in brief, were derived to distinguish between the statistics of classical and non-classical processes. These inequalities rest on the assumptions of realism per se and non-invasive measurability. While realism per se implies that joint probability distributions for a set of times can be expressed as marginals of a respective joint probability distribution for more times, non-invasiveness means that all finite distributions are marginals of the same distribution. Naturally then, as soon as one of these conditions does not hold, the KET can fail and Leggett-Garg inequalities can be violated. More precisely, if one allows for active interventions in the classical setting, without any additional restrictions, then classical processes can exhibit exactly the same joint probability distributions as quantum mechanics [224] (this equivalence changes once one imposes, for example, dimensional restrictions).

3. KET and spatial quantum states

Before finally advancing to quantum stochastic processes, it is instructive – as a preparation – to reconsider the concept of states in quantum mechanics in the context of measurements. To this end, consider the situation depicted in Figure 14, where four parties (Alice, Bob, Charlie, and David) measure separate parts of a multipartite quantum state. In the general case, their measurements are given by positive operator-valued measures (POVMs) denoted by JX, where each outcome j corresponds to a positive matrix Xj, and we


have ∑j Xj = 11. Then, according to the Born rule, probabilities for the measurements depicted in Figure 14 are computed via

P(a, b, c, d∣JA, JB, JC, JD) = tr[ρ (Aa1 ⊗ 112 ⊗ Bb34 ⊗ Cc5 ⊗ Cc6 ⊗ Dd7)], (139)

where ρ ∶= ρ1234567 is the probed multipartite state, and Xam is the POVM operator for party X with outcome a when measuring system m; we use the double subscript notation to label the operator index and the system at once. The above probability depends crucially on the respective POVMs the parties use to probe their part of the state ρ. This dependence is denoted by making the probability contingent on the instruments JX. As soon as ρ is known, all joint probabilities for all possible choices of instruments can be computed via the above Born rule. In this sense, a quantum state represents the maximal statistical information that can be inferred about spatially separated measurements.

While, pictographically, Figure 14 appears to be a direct quantum version of the classical stochastic processes we encountered previously, there is a fundamental difference between spatially and temporally separated measurements: in the spatial setting, none of the parties can signal to the others. For example, we have

∑c P(a, b, c, d∣JA, JB, JC, JD) = ∑c′ P(a, b, c′, d∣JA, JB, J′C, JD) (140)

for all instruments. Put differently, the quantum state a subset of parties sees is independent of the choice of instruments of the remaining parties. This is also mirrored by the fact that we model the respective measurement outcomes by POVM elements, which make no assertion about how the state at hand transforms upon measurement.
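For a two-qubit toy version of this setup (a stand-in for the seven-partite state of Figure 14), the Born rule of Eq. (139) and the no-signalling condition of Eq. (140) can be checked directly:

```python
import numpy as np

# Maximally entangled two-qubit state; Alice measures qubit 1 in the
# z-basis, Charlie chooses between a z- and an x-basis POVM on qubit 2.
phi = np.zeros(4)
phi[0] = phi[3] = 1 / np.sqrt(2)
rho = np.outer(phi, phi)

zp = np.diag([1.0, 0.0]); zm = np.diag([0.0, 1.0])           # z-basis POVM
xp = 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]]); xm = np.eye(2) - xp  # x-basis

def P(A, C):
    """Born rule, Eq. (139): P(a, c | J_A, J_C) = tr[rho (A (x) C)]."""
    return np.trace(rho @ np.kron(A, C)).real

# Marginalizing over Charlie's outcome, Alice's statistics do not depend
# on which instrument Charlie used (no signalling, Eq. (140)):
assert np.isclose(P(zp, zp) + P(zp, zm), P(zp, xp) + P(zp, xm))
```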

On the other hand, the possible breakdown of the KET in quantum mechanics and classical processes with interventions show that, in temporal processes, an instrument choice at an earlier time can influence the statistics at later times. To accommodate this kind of signaling between different measurements, we will have to employ a more general description of measurements that accounts for the transformations a quantum state undergoes upon measurement. However, the general idea of how to describe temporal processes can be directly lifted from the spatial case: as soon as we know how to compute the statistics for all sequences of measurements and all choices of (generalized) instruments, there is nothing more that can be learned about the process at hand. Unsurprisingly then, we will recover a temporal version of the Born rule [225, 226], where the POVM elements are replaced by more general completely positive (CP) maps, and the spatial quantum state is replaced by a more general quantum comb that contains all detectable spatial and temporal correlations.

V. QUANTUM STOCHASTIC PROCESSES

In the last section, we saw various methods to analyze two-time quantum correlations. While indispensable tools for the description of many experimental scenarios, these methods are not well-suited to describe multi-time statistics, and as such do not allow one to extend the notion of Markovianity – or the absence thereof – to the quantum case in a way that boils down to the classical one in the correct limit. We now introduce tools that will allow us to consistently describe multi-time quantum correlations, independently of the choice of measurement. Before doing this, it is worth elaborating on the source of the troubles in the way of a theory of quantum stochastic processes.

A. Subtleties of the quantum state and quantum measurement

Let us use the initial correlation problem in quantum mechanics as an example. This problem has been fraught with controversies for decades now [163], as some researchers have argued that, in the presence of initial correlations, a dynamical map is not well defined [93], while others have argued to give up complete positivity or linearity [92, 163]. What is the underlying reason for these disagreements? And does the same problem exist in classical mechanics?

The answer to this latter question is no. The crucial difference is that it is possible to observe classical states without disturbing the system, while the same cannot be said for quantum states. Consider a classical experiment that starts with an initial system-environment state that is correlated between the system of interest and some environment. The overall process is a map Λ(t∶0) ∶ P(S0E0) ↦ P(StEt). Of course, we can simply observe the system (without disturbing it) and measure the frequencies for S0 = sj ↦ St = sk. This is already enough to construct the joint distribution P(St, S0), and from it we can construct a stochastic matrix Γ(t∶0) that takes the initial system state to the final state. In other words, the initial correlations pose no obstacle at all here. This should not be surprising; after all, a multi-time classical process will have system-environment correlations at some point, and we have already argued that it is always possible to construct a stochastic matrix between any two points.
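This classical reconstruction is essentially a one-liner; a sketch with an arbitrary (hypothetical) observed joint distribution P(St, S0):

```python
import numpy as np

# Joint distribution P(S_t, S_0) observed non-invasively for a two-state
# system (rows: s_t, columns: s_0); the numbers are illustrative only.
joint = np.array([[0.30, 0.10],
                  [0.20, 0.40]])

p0 = joint.sum(axis=0)      # marginal P(S_0)
Gamma = joint / p0          # conditional P(S_t | S_0), column-wise

# Gamma is a stochastic matrix taking the initial state to the final one
assert np.allclose(Gamma.sum(axis=0), 1.0)
assert np.allclose(Gamma @ p0, joint.sum(axis=1))
```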

If we try to repeat the same reconstruction process for the quantum case, we quickly run into trouble. Again, without any controversy, we can imagine that an initial system-environment quantum state is transformed into a final one, ρSE(0) ↦ ρSE(t). It may then be tempting to say that we also have a transformation on the reduced state of the system, ρS(0) ↦ ρS(t). However, we run into trouble as soon as we try to determine the process ρS(0) ↦ ρS(t). In order to do this, we need to relate different initial states and the corresponding final states. Do we then mean that there is an initial set of states {ρSE(0)} and for each element of the set we have a different initial state for the system, i.e., ρS(0) = trE[ρSE(0)]? This is possible, of course, but it requires knowledge about the environment, which goes against the spirit of the theory of open systems, where we assume that


the experimenter does not have control over the environmental degrees of freedom.

Our problem is still more profound when we focus solely on S. Suppose that the above setup holds and the initial system states {ρS(0)} are linearly independent, constituting an input basis. But then, in a given run of the experiment, how do we know which element of this set we have at our disposal? Quantum mechanics fundamentally forbids us to unambiguously discriminate a set of non-orthogonal states; if the set {ρS(0)} contains d² linearly independent basis elements, then at most d of them can be orthogonal. Therefore, quantum mechanics fundamentally forbids us from experimentally deducing the dynamical map when there are initial correlations!

This contextual nature of quantum states is the key subtlety that forces a fundamentally different structure for quantum stochastic processes than for classical ones. Perhaps a theorist may be tempted to say: never mind the experiments, let us construct the map by a theoretical calculation, i.e., first properly define the SE dynamics and then infer the process on S alone. This is in fact what was done by many theorists in the past two decades. They asked what happens if we fix the correlations in the initial state ρSE(0) and consider the family of ρS(0) states that are compatible with the former; see Example 1 in Sec. IV D 1. Can we construct a map? These types of constructions are precisely what led to not completely positive maps. However, do such calculations have a correspondence with reality [169]? The real source of the problem (in the technical sense) is that – as we have seen when we discussed the experimental reconstruction of quantum channels – we need an informationally complete set of initial states and corresponding final states to have a well-defined map. For an experimenter, there is an ‘easy’ solution: simply go ahead and prepare the initial state as desired, which can even be noisy [172]. Then let this initial state evolve and measure the corresponding final state. In fact, this is the only way, in quantum mechanics, to ensure that we have a linearly independent set of input states whose output states are also accessible. Without preparation at the initial time, we only have a single point in the state space, i.e., the reduced state ρS = trE(ρSE), and a map is only defined on a dense domain. However, in general, when there are system-environment correlations, the preparation of input states will affect the state of the environment, which, in turn, will influence the subsequent dynamics, seemingly making the dynamics non-linear in the sense that it depends on the input state. See Example 2 in Sec. IV D 1 and the subsequent discussion.

This, in turn, raises the question whether a finite set of such experiments contains enough information to construct a well-defined dynamical map. And will this mapping be linear (and, in a sense to be defined below, CP and trace preserving)? Somewhat surprisingly, despite all the apparent roadblocks we sketched above, the answer is yes! However, to achieve this goal, it is necessary to switch our understanding of what a dynamical map actually is when there are initial correlations. Giving away the punchline of the following sections: here, it is not meaningful to define a mapping from initial to final states, but rather from initial preparations to

final states. It is easy to show that there is only a finite number of preparations that are linearly independent (for finite-dimensional systems). Therefore, any other preparation can be expressed as a linear combination of a fixed set of preparations. Since for each initial preparation it is possible to determine the final output state, very much in the same way as we already saw for quantum channel tomography, one can unambiguously reconstruct a map that correctly maps all inputs (here: the initial preparations) to the final output states. We will flesh out and extend these ideas in this and the following sections.

First, we lay out the mathematical foundations for the notion of preparation, which is historically known as an instrument and which generalizes POVMs. With these tools, we will show that the solution to the initial correlation problem is well-defined, completely positive, and linear all at once [227]. Moreover, this is then a pathway to laying down the foundations for quantum stochastic processes, since it will be directly generalizable to multi-time scenarios.

B. Quantum measurement and instrument

As mentioned in Sec. IV E, unlike in the case of spatially separate measurements, in the temporal case it is important to keep track of how the state of the system of interest changes upon being measured, as this change will influence the statistics of subsequent measurements. In order to take this into account, we work with the concept of generalized instruments introduced by Davies and Lewis [228]. This will both allow us to overcome the problems with initial correlations outlined above, as well as provide a fully fledged theory of quantum stochastic processes that can account for intermediate measurements, and, as such, multi-time correlations.

To this end, first recall the definition of a POVM provided in Sec. IV A 2. A POVM is a collection of positive matrices J = {E_j}_{j=1}^n with the property ∑_j E_j = 11. Each element of J corresponds to a possible outcome of the measurement. Intuitively, a POVM allocates an operator to each outcome of the measurement device, which allows one to compute outcome statistics for arbitrary quantum states that are being probed. However, it does not enable one to deduce how the state changes upon observation of one of the outcomes.

To account for state changes, we have to modify the concept of a POVM; this generalization is known as a generalized instrument [229, 230]. As POVMs turn out to be a special case of (generalized) instruments, we will denote them by J, too. An instrument corresponding to a measurement with outcomes j = 1, . . . , n is a collection of CP maps J = {A_j} that add up to a CPTP map, i.e., A = ∑_{j=1}^n A_j. Each of the CP maps corresponds to one of the possible outcomes, while their sum corresponds to the overall transformation of the state at hand due to the application of the respective instrument (it is exactly the invasiveness of said map that leads to a breakdown of the KET in quantum mechanics). For example, returning to the case of a measurement of a qubit in the computational basis, the corresponding instrument is given by

A_0[•] := |0⟩⟨0| • |0⟩⟨0|,   A_1[•] := |1⟩⟨1| • |1⟩⟨1|,   (141)


assuming that, after being projected onto the computational basis, the state is sent forward unchanged.

Importantly, an instrument allows one to compute both the probability to obtain different outcomes and the state change upon measurement. The latter is given by

ρ′j = Aj[ρ] (142)

when the system in state ρ is interrogated by the instrument J, yielding outcome j. The state after said interrogation, given the outcome, is obtained via the action of the corresponding element of the instrument. Importantly, this state is in general not normalized. Its trace provides the probability to observe a given outcome. Concretely, we have

P(j|J) = tr(A_j[ρ]) = tr(∑_{α_j} K_{α_j} ρ K†_{α_j}),   (143)

where the sum runs over all Kraus operators that pertain to the CP map A_j, and we have ∑_{α_j} K†_{α_j} K_{α_j} ≤ 11 if A_j is not trace preserving. The requirement that all CP maps of an instrument add up to a CPTP map ensures – just like in the analogous case for POVMs – the normalization of probabilities:

∑_{j=1}^n P(j|J) = tr(∑_{j=1}^n ∑_{α_j} K_{α_j} ρ K†_{α_j}) = tr(A[ρ]),   (144)

which is 1 for all ρ. Naturally, the concept of generalized instruments contains POVMs as a special case, namely as those generalized instruments where the output space of the respective CP maps is trivial. Put differently, if one simply wants to compute the probabilities of measurements on a quantum state, generalized instruments are not necessary. Concretely, we have

P(j|J) = tr(ρ E_j)   with   E_j = ∑_{α_j} K†_{α_j} K_{α_j}.   (145)

This is because for a single measurement, the state transformation is not of interest. However, as we will see in the next section, this situation changes drastically as soon as sequential measurements are considered. There, POVMs are not sufficient anymore to correctly compute statistics.
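As a concrete illustration of Eqs. (141)–(145), here is a minimal numerical sketch (the test state ρ and all helper names are our own, not from the text) of the computational-basis instrument, computing the outcome probabilities both via the trace of the unnormalized output and via the associated POVM elements:

```python
import numpy as np

# Computational-basis instrument of Eq. (141): each CP map has a single
# Kraus operator K_j = |j><j|, so A_j[rho] = |j><j| rho |j><j|.
K = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]    # K_0 = |0><0|, K_1 = |1><1|

def cp_map(Ks, rho):
    """Apply a CP map with Kraus operators Ks: sum_a K_a rho K_a^dag."""
    return sum(Km @ rho @ Km.conj().T for Km in Ks)

rho = np.array([[0.7, 0.2], [0.2, 0.3]])          # an illustrative qubit state

# Outcome probabilities via the trace of the unnormalized output, Eq. (143) ...
p_instr = [float(np.trace(cp_map([Kj], rho)).real) for Kj in K]

# ... and via the POVM elements E_j = K_j^dag K_j, Eq. (145); both agree.
p_povm = [float(np.trace(rho @ Kj.conj().T @ Kj).real) for Kj in K]
assert np.allclose(p_instr, p_povm)
assert np.isclose(sum(p_instr), 1.0)              # normalization, Eq. (144)

# Post-measurement state for outcome 0: renormalize A_0[rho], Eq. (142).
rho_0 = cp_map([K[0]], rho) / p_instr[0]
print(p_instr)  # [0.7, 0.3]
```

Note that the probabilities agree by construction (E_j = K_j†K_j), while only the instrument gives access to the conditional post-measurement state.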

Before advancing, it is insightful to make the connection between the CP maps of an instrument and the elements of its corresponding POVM explicit. This is most easily done via the Choi states we introduced in Sec. IV B 3. There, we discussed that the action of a map A_j on a state ρ can be expressed as

ρ′_j = A_j[ρ] = tr_i[A_j^T (ρ ⊗ 11_o)],   (146)

where A_j ∈ B(H_i ⊗ H_o) is the Choi state of the map A_j and ρ ∈ B(H_i), and we have moved the transposition onto A_j instead of ρ. Using this expression to compute probabilities, we obtain

P(j|J) = tr_io[A_j^T (ρ ⊗ 11_o)] = tr(E_j ρ).   (147)

Comparing this last expression with the Born rule, we see that the POVM element E_j corresponding to A_j is given by E_j = tr_o(A_j^T), where the additional transpose stems from our definition of the CJI. This definition indeed yields a POVM, as the partial trace of a positive matrix is also positive, and we have ∑_{j=1}^n E_j = ∑_{j=1}^n tr_o(A_j^T) = 11, where we have used that the Choi state A of A satisfies tr_o(A) = 11. Discarding the outputs of an instrument thus yields a POVM. This implies that different instruments can have the same corresponding POVM. For example, the instrument that measures in the computational basis and feeds forward the resulting state has the same corresponding POVM as the instrument that measures in the computational basis but feeds forward a maximally mixed state, indiscriminate of the outcome. While both of these instruments lead to the same POVM, their influence on future statistics is very different.
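The relation E_j = tr_o(A_j^T), and the fact that distinct instruments share one POVM, can be checked numerically. The following sketch (our own construction, using the two instruments just mentioned: measure-and-feed-forward versus measure-and-reprepare the maximally mixed state) builds the Choi matrices of the instrument elements and extracts their POVM elements:

```python
import numpy as np

d = 2

def cp_apply(Ks, rho):
    return sum(K @ rho @ K.conj().T for K in Ks)

def choi(Ks):
    """Choi matrix A = sum_mn |m><n| (x) A[|m><n|]  (input (x) output)."""
    C = np.zeros((d * d, d * d), dtype=complex)
    for m in range(d):
        for n in range(d):
            Emn = np.zeros((d, d), dtype=complex)
            Emn[m, n] = 1.0
            C += np.kron(Emn, cp_apply(Ks, Emn))
    return C

def povm_from_choi(C):
    """POVM element E = tr_o(A^T) of a CP map with Choi matrix A, Eq. (147)."""
    return C.T.reshape(d, d, d, d).trace(axis1=1, axis2=3)

# Two instruments with identical statistics: (i) measure in the computational
# basis and feed the projected state forward; (ii) measure, then feed forward
# the maximally mixed state regardless of the outcome.
proj = [[np.outer(np.eye(d)[j], np.eye(d)[j])] for j in range(d)]
replace = [[np.outer(np.eye(d)[a], np.eye(d)[j]) / np.sqrt(d)
            for a in range(d)] for j in range(d)]

for j in range(d):
    assert np.allclose(povm_from_choi(choi(proj[j])),
                       povm_from_choi(choi(replace[j])))   # same POVM ...

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
# ... but very different post-measurement states for outcome 0:
print(cp_apply(proj[0], rho))     # 0.7 |0><0|
print(cp_apply(replace[0], rho))  # 0.35 * identity
```

The identical POVMs confirm that discarding the output side of the Choi state erases exactly the information about how the state is transformed.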

1. POVMs, Instruments, and probability spaces

Before advancing to the description of multi-time quantum processes, let us quickly connect POVMs and instruments to the more formal discussion of stochastic processes we conducted earlier. The benefit of making this connection transparent is two-fold; on the one hand, it recalls the original ideas, stemming from the theory of probability measures, that led to their introduction in quantum mechanics. On the other hand, it renders the following discussions of quantum stochastic processes a natural extension of both the concept of instruments as well as the theory of classical stochastic processes.

In the classical case, we described a probability space by a σ-algebra (where each observable outcome corresponds to an element of the σ-algebra) and a probability measure ω that allocates a probability to each element of said σ-algebra. Without dwelling on the technical details (see, e.g., Refs. [230, 231] for more thorough discussions), this definition can be straightforwardly extended to POVMs and instruments. However, instead of directly mapping observable outcomes to probabilities, in quantum mechanics we have to specify how we probe the system at hand. Mathematically, this means that instead of mapping the elements of our σ-algebra to probabilities, we map them to positive operators via a function ξ that satisfies the properties of a probability measure (hence the name positive operator-valued measure). For example, the POVM element corresponding to the union of two disjoint elements of the σ-algebra is the sum of the two individual POVM elements, and so on. Together with the Born rule, each POVM then leads to a distinct probability measure on the respective σ-algebra. Concretely, denoting the Born rule corresponding to a state ρ by χ_ρ[E] = tr(ρE), then ω_ρ = χ_ρ ∘ ξ is a probability measure on the considered σ-algebra.

For instruments, the above construction is analogous, but with POVM elements replaced by CP maps. It is then a natural step to posit a generalized Born rule [225, 226] that maps CP maps to the corresponding probabilities. More generally yet, sequences of measurement outcomes correspond to sequences of CP maps,


and a full description of the process at hand would be given by a mapping of such sequences to probabilities. In the next section, we will see that this reasoning indeed leads to a consistent description of quantum stochastic processes that – additionally – resolves the aforementioned problems, like, e.g., the breakdown of the Kolmogorov extension theorem.

C. Initial correlations and complete positivity

With the introduction of the instrument, we are now in a position to operationally resolve the initial correlation problem alluded to above. Importantly, the resolution of this special case will directly point us in the right direction of how to generalize stochastic processes to the quantum realm, which is why we consider it first.

We begin with an initial system-environment quantum state that is correlated. Now, in a meaningful experiment that aims to characterize the dynamics of the system from the initial time to the final time, one will apply an instrument J = {A_j} on the system alone at the initial time to prepare it in a known (desired) state. To be concrete, this instrument could, for example, be a measurement in the computational basis, such that each A_j is a trace non-increasing CP map with action A_j[ρ] = ⟨j|ρ|j⟩ |j⟩⟨j|. Importantly, though, we impose no limitation on the set of admissible instruments an experimenter could use. Next, the total SE state propagates in time via a map

U_(t:0)[•] := U_(t:0) (•) U†_(t:0).   (148)

Note that, due to the dilation theorem in Sec. V D 7, we can always take the system-environment propagator to be unitary. Taking the propagator to be a CPTP map would make no difference at the system level. The full process can be written down as

ρ_j(t) = tr_E{U_(t:0) ∘ (A_j ⊗ I)[ρ_SE(0)]}.   (149)

Above, I is the identity map on E, as the instrument acts only on S. While perfectly correct, whenever ρ_SE(0) is not of product form, the above does not allow one to obtain a (physically meaningful) mapping that takes input states of the system and maps them to the corresponding output states at a later time.

Now, let us recall that a map (in quantum, classical, and beyond physics) is nothing more than a relationship between experimentally controllable inputs and measurable outputs. Here, the input is the choice of the instrument J = {A_j} – which can be freely chosen by the experimenter – and the corresponding outcome is ρ_S(t) – which can be determined by means of quantum state tomography. Then, right away, by combining everything that is unknown to the experimenter in Eq. (149) into one object, we have the map

ρ_j(t) = T_(t:0)[A_j].   (150)

The map T_(t:0) was introduced in Ref. [227] and was referred to as the superchannel in Ref. [232], where it was first realized experimentally. By comparing Eqs. (149) and (150), we see that the action of the superchannel is given by

T_(t:0)[•] = tr_E{U_(t:0) ∘ (• ⊗ I)[ρ_SE(0)]},   (151)

which is a linear map on the operations A_j it is defined on. While, in contrast to the case of quantum channels, T_(t:0) does not act on states but on operations, we emphasize the operational similarities between quantum channels and superchannels. On the one hand, they are both ‘made up’ of all the parts of the evolution that are not directly accessible to the experimenter: the initial state of the environment and the unitary system-environment evolution in the case of quantum channels, and the initial system-environment state and the unitary system-environment evolution in the case of superchannels. Additionally, they both constitute a mapping from what can be freely chosen by the experimenter to a later state of the system at time t. Unsurprisingly then, as we shall see below, quantum channels can be considered to be just a special case of superchannels.

Ref. [227] proved that – besides being linear – the map T_(t:0) is completely positive and trace-preserving (in a well-defined sense); and clearly, it is well-defined for any initial preparation A_j. The trace-preservation property means that if A is CPTP then the output will be of unit trace. See Ref. [233] for further discussion and theoretical development with respect to open system dynamics of initially correlated systems.

The meaning of complete positivity for this map is operationally clear and very analogous to the case of quantum channels: suppose the instrument J acts not only on the system S, but also on an ancilla. Then the superchannel’s complete positivity guarantees that the result of its action on any CP map – which could be acting on the system and an additional ancilla – is again a CP map (see Figure 15 for a graphical depiction). We will not provide a direct proof of this statement here. However, as we will discuss below, it is easy to see that T_(t:0) has a positive Choi state, which implies complete positivity in the above sense. Since the explicit computation of T_(t:0) from U_(t:0) and ρ_SE(0) requires – as for the case of quantum channels – the choice of an explicit representation, we will also relegate it to Sec. V D 3, where we discuss Choi states of higher order quantum maps in more detail.

The ‘TP’ property of superchannels means that T_(t:0) maps any trace preserving map A to a unit trace object. Indeed, with the definition (151) of superchannels in mind, we see that, since tr_E and U_(t:0) are trace preserving and ρ_SE(0) is a unit trace state, T_(t:0)[A] amounts to a concatenation of trace preserving maps acting on a unit trace state, thus yielding a unit trace object. This, then, also implies that, whenever A_j is trace non-increasing, ρ_j(t) = T_(t:0)[A_j] is subnormalized and its trace amounts to the probability of the map A_j occurring. This is a simple consequence of the fact that T_(t:0) is made up of trace preserving elements only, and, as we have discussed around Eq. (143), the trace of the output of a CP map yields its implementation probability.
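These trace properties are easy to verify numerically. Below is a minimal sketch of Eq. (151) for a qubit system coupled to a qubit environment; the initial state, the coupling unitary, and all parameter values are illustrative choices of ours, not taken from the text:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)

# Correlated (here: classically correlated) initial SE state of two qubits.
rho_SE = np.zeros((4, 4), dtype=complex)
rho_SE[0, 0] = rho_SE[3, 3] = 0.5          # (|00><00| + |11><11|) / 2

# SE unitary U = exp(-i theta sx (x) sx); since (sx (x) sx)^2 = 11, this is
# cos(theta) 11 - i sin(theta) sx (x) sx in closed form.
theta = 0.3
U = np.cos(theta) * np.eye(4) - 1j * np.sin(theta) * np.kron(sx, sx)

def tr_E(rho4):
    return rho4.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)

def superchannel(Ks):
    """T[A] = tr_E{ U (A (x) I)[rho_SE] U^dag }, Eq. (151)."""
    sig = sum(np.kron(K, np.eye(2)) @ rho_SE @ np.kron(K, np.eye(2)).conj().T
              for K in Ks)
    return tr_E(U @ sig @ U.conj().T)

# A CPTP preparation (here: the identity map) yields a unit-trace state ...
assert np.isclose(np.trace(superchannel([np.eye(2)])).real, 1.0)

# ... while the trace non-increasing elements of the computational-basis
# instrument yield subnormalized states whose traces are the outcome
# probabilities and add up to one.
probs = [float(np.trace(superchannel([np.diag(np.eye(2)[j])])).real)
         for j in range(2)]
assert np.isclose(sum(probs), 1.0)
```

The same construction works for any trace non-increasing CP preparation, since T_(t:0) itself is built only from trace preserving elements.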

Importantly, as mentioned, the superchannel is a higher-order map, as its domain is the set of CP maps and its image the density operators. Clearly, this is different from the quantum stochastic matrix. In fact, the superchannel is the first step beyond two-point quantum correlations. This is most easily


Figure 15. Complete Positivity and Trace Preservation for Superchannels. A superchannel is said to be CP if it maps CP maps to CP maps (even when acting on only a part of them), and we call it CPTP if it maps CPTP maps to CPTP maps. Here, T_(t:0) is CP (CPTP) if for all CP (CPTP) maps A and all possible ancilla sizes, the resulting map A′ is also CP (CPTP). Note that for the TP part, it is already sufficient that T_(t:0) maps all CPTP maps on the system to a unit trace object.

seen from its Choi state, which is a bounded operator on three Hilbert spaces: Υ_(t:0) ∈ B(H_0^i ⊗ H_0^o ⊗ H_1^i) (details for constructing the Choi state of higher order maps can be found below and in Sec. V D 3). Moreover, the superchannel contains ‘normal’ quantum channels as a limiting case: when there are no initial correlations, i.e., ρ_SE(0) = ρ_S(0) ⊗ ρ_E(0), the superchannel reduces to the usual CPTP map:

T_(t:0)[A_j] = (E_(t:0) ∘ A_j)[ρ_S(0)],   where   E_(t:0)[•] = tr_E{U_(t:0)[• ⊗ ρ_E]},   (152)

which can be seen by direct insertion of the product state assumption into Eq. (151).
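For a product initial state, the reduction in Eq. (152) can be checked directly; the following sketch (states, unitary, and preparation are our own illustrative choices) confirms that the superchannel action then factorizes as (E_(t:0) ∘ A)[ρ_S(0)]:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
rho_S = np.array([[0.8, 0.1], [0.1, 0.2]], dtype=complex)
rho_E = np.diag([0.6, 0.4]).astype(complex)
U = np.cos(0.5) * np.eye(4) - 1j * np.sin(0.5) * np.kron(sx, sx)

def tr_E(r):
    return r.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)

def superchannel(K, rho_SE):
    """T[A] for a single-Kraus preparation A, as in Eq. (151)."""
    KI = np.kron(K, np.eye(2))
    return tr_E(U @ (KI @ rho_SE @ KI.conj().T) @ U.conj().T)

def channel(rho):
    """E_(t:0)[rho] = tr_E{ U [rho (x) rho_E] U^dag }, Eq. (152)."""
    return tr_E(U @ np.kron(rho, rho_E) @ U.conj().T)

K = np.array([[1, 0], [0, 0]], dtype=complex)     # A[rho] = |0><0| rho |0><0|
lhs = superchannel(K, np.kron(rho_S, rho_E))      # T[A]
rhs = channel(K @ rho_S @ K.conj().T)             # (E o A)[rho_S]
assert np.allclose(lhs, rhs)
```

When ρ_SE(0) carries correlations, this factorization fails, which is precisely why the superchannel is needed in the first place.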

The superchannel is a primitive for constructing the descriptor of quantum stochastic processes. As such, it should be operationally accessible via a set of experiments, in the same vein as quantum channels are experimentally reconstructable. Somewhat unsurprisingly, the reconstruction procedure for superchannels works in a similar way as that for quantum channels; the inputs of the superchannel, CP maps, span a vector space that has a basis consisting of CP maps. This means that the superchannel is fully determined by its action on the CP maps {A_j} that form a linear basis. Concretely, let the output states corresponding to this basis of input operations be

T(t∶0)[Aj] = ρj(t). (153)

Now, this informationally complete input-output relation can be used to represent the superchannel T_(t:0). As was already the case for channels, this can be done in terms of duals, but this time not in terms of duals for a set of input states, but for a set of input operations {A_j}. While there is no conceptual problem with duals of maps, let us avoid this additional level of abstraction. Rather, here we opt to directly choose the Choi state representation of the superchannel, in the same spirit as Eq. (97). To this end, let {A_j} be the Choi states of the maps {A_j}, and let {D_j} be the corresponding set of dual matrices, i.e., tr(D_j† A_k) = δ_jk. Then, the Choi state of T_(t:0) can be written as

Υ_T(t:0) = ∑_j ρ_j(t) ⊗ D_j*,   (154)

and its action on an arbitrary map A is given by

T_(t:0)[A] = tr_i[(11_o ⊗ A^T) Υ_T(t:0)],   (155)

where A is the Choi state of the map A. While somewhat more complex than in the case of quantum channels, this form should not come as a surprise; indeed, it simply expresses a linear input-output relation. Eq. (154) ‘attaches’ the correct output state to each dual of a basis map, and Eq. (155) guarantees that the action of T_(t:0) is properly defined on all basis elements, and thus on all maps A. The fact that we went via the Choi representation of the maps is then rather a mathematical convenience than a conceptual leap. Below, we will discuss Choi states of higher-order maps in more detail, and also argue why the above object Υ_T(t:0) can rightfully be called a Choi state of T_(t:0). For the moment, let us emphasize once again that Υ_T(t:0), together with the action given by Eq. (155), yields the correct output state for any input operation A; any CP map can be cast as a linear sum of the basis maps as A = ∑_j α_j A_j. The action of T_(t:0) defined above yields the correct output state for a basis {A_j}, since

T_(t:0)[A_j] = ∑_{k=1}^{d²} tr_i[(11_o ⊗ A_j^T)(ρ_k(t) ⊗ D_k*)] = ∑_{k=1}^{d²} ρ_k(t) tr(D_k† A_j) = ρ_j(t),   (156)

where we have used the duality of D_k† and A_j. Consequently, the superchannel defined in this way indeed provides the correct mapping on all conceivable CP maps A.
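The tomographic construction of Eqs. (153)–(156) can be carried out numerically end to end. The sketch below (the SE model, basis states, and all names are our illustrative choices) reconstructs Υ from the outputs on a basis of measure-and-prepare maps and checks that the reconstructed action reproduces the true superchannel on a map outside the basis:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
rho_SE = np.zeros((4, 4), dtype=complex)
rho_SE[0, 0] = rho_SE[3, 3] = 0.5                  # correlated initial state
U = np.cos(0.3) * np.eye(4) - 1j * np.sin(0.3) * np.kron(sx, sx)

def tr_second(rho, dA, dB):                        # trace out the second factor
    return rho.reshape(dA, dB, dA, dB).trace(axis1=1, axis2=3)

def T(Ks):                                         # the 'true' superchannel
    sig = sum(np.kron(K, np.eye(2)) @ rho_SE @ np.kron(K, np.eye(2)).conj().T
              for K in Ks)
    return tr_second(U @ sig @ U.conj().T, 2, 2)

def choi(Ks):                                      # input (x) output Choi matrix
    C = np.zeros((4, 4), dtype=complex)
    for m in range(2):
        for n in range(2):
            E = np.zeros((2, 2), dtype=complex); E[m, n] = 1
            C += np.kron(E, sum(K @ E @ K.conj().T for K in Ks))
    return C

# Basis of 16 CP maps A_ij[rho] = |pi_i><pi_j| rho |pi_j><pi_i| built from
# the states |x+>, |y+>, |z+>, |x->.
pis = [np.array(v, dtype=complex) / np.linalg.norm(v)
       for v in ([1, 1], [1, 1j], [1, 0], [1, -1])]
basis = [[np.outer(pis[i], pis[j].conj())] for i in range(4) for j in range(4)]

# Duals via vec(D_k)^dag vec(C_j) = delta_kj: invert the matrix of Choi vectors.
V = np.column_stack([choi(Ks).reshape(-1) for Ks in basis])
Vinv = np.linalg.inv(V)
duals = [Vinv[k].conj().reshape(4, 4) for k in range(16)]

# Choi state of the superchannel, Eq. (154), from the 16 measured outputs.
Upsilon = sum(np.kron(T(Ks), duals[k].conj()) for k, Ks in enumerate(basis))

def T_reconstructed(Ks):                           # action via Eq. (155)
    A = choi(Ks)
    return tr_second(np.kron(np.eye(2), A.T) @ Upsilon, 2, 4)

# The reconstruction reproduces the true map on a non-basis operation,
# e.g. the Hadamard unitary.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
assert np.allclose(T_reconstructed([H]), T([H]))
```

Note the dimensions: for a qubit, Υ is an 8 × 8 matrix that contracts with a 4 × 4 Choi matrix of a preparation to yield a 2 × 2 output state, in line with the d³ × d³ counting discussed below.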

Example. Armed with this new operational understanding of open system dynamics, we now revisit the example from Sec. IV D 1, where we discussed open system dynamics in the presence of initial correlations. Previously, we saw that the usual CPTP map fails to describe this process. Now, as promised, we will see that the superchannel can describe this process adequately. To do so, let us first write down a linear basis of CP maps on a qubit as

A_ij[•] := |π_i⟩⟨π_j| (•) |π_j⟩⟨π_i|,   (157)

where |π_i⟩, |π_j⟩ ∈ {|x+⟩, |y+⟩, |z+⟩, |x−⟩}. Intuitively, each of the maps A_ij performs a projective measurement and, depending on the outcome, feeds forward a different state. Since the corresponding pure states form a linear basis of the matrix space B(H), their cross-combination forms a basis of the instrument space (we will discuss the necessary number of basis elements in more detail below). The corresponding Choi states are given by A_ij = |π_i⟩⟨π_i| ⊗ |π_j⟩⟨π_j|, with corresponding duals D_kℓ = D_k ⊗ D_ℓ (given in Eq. (71)). Using these instruments, we can tomographically construct the superchannel according to Eq. (154) by computing the corresponding output states ρ_ij(t) for this scenario. Doing this, we obtain:


Υ_T(t:0) = [an 8 × 8 positive semidefinite matrix in the computational basis, with entries built from the quantities defined below],   (158)

where c_ω = cos(2ωt), s_ω = sin(2ωt), C_±² = 1 ± c_ω², a_3^± = 1 ± a_3, a^+ = a_1 + i a_2, a^− = a_1 − i a_2, and g is the correlation coefficient. Importantly, the above matrix is positive semidefinite, making – as we shall see when we discuss higher order quantum maps in more generality – the corresponding superchannel a completely positive map. Additionally, the above procedure is fully operational; the resulting T_(t:0) is independent of the respective maps A_ij, and, once reconstructed, can be applied to any preparation A to yield the correct output state. This also implies that the superchannel is constructed with a finite number of experiments. Consequently, the examples in Sec. IV D 1 are all contained here as limiting cases. Finally, while a CPTP map acting on a d-dimensional system would be represented by a d² × d² matrix, the superchannel is a d³ × d³ matrix that contracts with a CP map to yield a d × d output matrix.

In fact, the superchannel has been observed in the laboratory [232] and proven to be effective at dealing with initial correlations without giving up either linearity or complete positivity. One might then wonder how this gets around Pechukas’ theorem. To retain both linearity and complete positivity, we have given up the notion of the initial state. In fact, as we argued in Sec. V A, in the presence of correlations, quantum mechanics does not allow for a well-defined local state beyond a singular point in the Hilbert space. Therefore, a map on this singular point alone is not very meaningful, and hence there is no big loss in giving up the notion of the initial state as a relevant concept for the dynamics [234]. Finally, it should be said that this line of reasoning is very close to that of Pearl [221] in classical causal modeling, which goes beyond the framework of classical stochastic processes and allows for interventions.

At this point, it is insightful to quickly take stock of what we have achieved in this section, what we have implicitly assumed, and how to generalize these ideas to finally obtain a fully-fledged description of quantum stochastic processes.

Firstly, in order to deal with initial correlations, we have switched perspective and described the dynamics in terms of a mapping from initial operations (instead of initial states) to final states at time t. While seemingly odd from a mathematical perspective, from the operational perspective it is only reasonable: experimentally meaningful maps should map from initial objects that can independently be controlled/prepared by the experimenter to objects that can be measured. In the presence of initial system-environment correlations, the experimenter does not have control over the initial state – at least not without potentially influencing the remaining parameters of the dynamics, i.e., the correlations with the environment. However, they do have control over what operation they implement, and the dynamics from those operations to the final states can be defined and reconstructed, and is fully independent of what the experimenter does (in the sense that T_(t:0) is independent of the experimenter’s action). Consequently, switching perspective as we have done is simply natural from an operational point of view.

Since the superchannel already deals with ‘intermediate’ CP maps performed by the experimenter, it also directly points out how to go beyond experimental scenarios where the experimenter only acts at two times; in principle, nothing keeps us from also performing CP maps at intermediate times, and then reconstructing the final state for sequences of CP maps, instead of only one CP map, as we have done here. It should not come as a surprise that this is exactly what we are going to do in the next section.

It remains to quickly comment on the mathematical details that we deliberately brushed over in this section. Naturally, to make things simpler, we have chosen the most insightful representation of the superchannel, in terms of a d³ × d³ matrix. Unsurprisingly, there is also a vectorized version of the superchannel [72], or we could have kept things entirely abstract and phrased everything in terms of maps acting on maps. Again, we emphasize that the representation has no bearing on physical properties, but employing the representation we chose will


Figure 16. General quantum stochastic process. The system of interest is coupled to an unknown environment and probed at times T_k = {t_0, t_1, . . . , t_k} with corresponding CP maps A_x_{T_k} = {A_x_0, A_x_1, . . . , A_x_k}. In between measurements, the system and the environment together undergo closed, i.e., unitary, dynamics. The corresponding multi-time joint probabilities can be computed by means of the process tensor corresponding to the process at hand (depicted by the grey dotted outline).

prove very advantageous; for example, it allows us to easily derive the dimension of the spaces we work with, as well as express the properties of higher-order quantum maps in a concise way. Additionally, as was the case for quantum channels, we will see that this representation indeed has an interpretation in terms of a (multipartite) quantum state, which is why we already called it the Choi representation throughout this section.

D. Multi-time statistics in quantum processes

Following the above resolution of the initial correlation problem in quantum mechanics, we are now in a position to provide a fully-fledged framework for the description of multi-time quantum processes. Here, we predominantly focus on the case of finitely many times at which the process of interest is interrogated (for an in-depth discussion of continuous measurements, see, for example, Refs. [235, 236]). Note that here, we can but scratch the surface of the different approaches that exist to the theory of multi-time quantum processes. For a much more in-depth investigation of the relation between different concepts of memory in quantum physics, see Ref. [10].

In principle, there are two ways to motivate this framework. On the one hand, by generalizing joint probabilities, the descriptor of classical stochastic processes, to the quantum realm, and taking into consideration that, in quantum mechanics, we have to specify the instruments that were used to interrogate the system. This approach would then yield a temporal Born rule [225, 226], and provide a natural descriptor of quantum stochastic processes in terms of a ‘quantum state over time’. We will circle back to this approach below. Here, we shall take the second possible route to the description of multi-time open quantum processes, which – just like in the case of initial correlations – is motivated by considering the underlying dynamics of a quantum stochastic process. As we shall see, though, both approaches are equivalent and lead to the same descriptor of quantum stochastic processes.

As we have seen, the initial correlation problem was solved by taking the preparation procedure into account, and by constructing a consistent mapping from the preparation operations to final states. To obtain a consistent description of a multi-time process, consider – as before – a system of interest S coupled to an environment E. Initially, the joint system-environment (SE) is in a state ρ_SE(0), which might be correlated. Together, we consider SE to be closed, such that, between any two times, the system-environment state evolves unitarily – described by the unitary map

ρ_SE(t_{j+1}) = U_(j+1:j)[ρ_SE(t_j)] =: U_j[ρ_SE(t_j)].   (159)

For brevity we have contracted the subscript on U. Next, in order to minimize notational clutter, we define several sets:

T_k := {t_0, t_1, . . . , t_{k−1}, t_k}   (160)
J_{T_k} := {J_0, J_1, . . . , J_{k−1}, J_k}   (161)
x_{T_k} := {x_0, x_1, . . . , x_{k−1}, x_k}   (162)
A_{x_{T_k}} := {A_{x_0}, A_{x_1}, . . . , A_{x_{k−1}}, A_{x_k}}.   (163)

The first set, T_k, is the set of times on which the process is defined. At these times, the system S is interrogated with a set of instruments J_{T_k}, yielding a set of outcomes x_{T_k}. The set of outcomes corresponds to a set of CP maps A_{x_{T_k}}. Note that, while we have let the instruments at each time be independent of each other, we can also allow for correlated instruments, also known as testers; see Sec. V D 6.

Now, in clear analogy to both the classical case, as well as the quantum case with initial correlations, we envision an experimenter that probes the system of interest at times T_k by means of instruments J_{T_k}, and we are interested in a consistent descriptor of this experimental situation. For example, they could perform measurements in the computational basis, such that each outcome x_j at a time t_j would correspond to the (trace non-increasing) transformation ρ ↦ ⟨x_j|ρ|x_j⟩ |x_j⟩⟨x_j|. However, importantly, we do not limit the set of allowed operations in any way, shape, or form (besides them being trace non-increasing CP maps). The overall system-environment dynamics is thus a sequence of unitary maps on the system and the environment, interspersed by CP maps that act on the system alone, each of them corresponding to a measurement outcome; this is shown in Figure 16. This continues until a final intervention at t_k, after which the environmental degrees of freedom are discarded. We emphasize that, as we do not limit or specify the size of the environment E, this setup is fully general; as we outlined above, due to the Stinespring dilation, any quantum evolution between two points in time can be understood as a unitary evolution on a larger space. As such, our envisioned setup is the most general description of the evolution of an open quantum system that is probed at times T_k. We will see below that this statement even holds in more generality: there are no conceivable quantum stochastic processes that cannot be represented in the above way, as sequences of unitaries on a system-environment space, interspersed by CP maps that act on the system alone.

The probability to observe a sequence of quantum events, i.e., the outcomes x_{T_k} corresponding to the CP maps A_{x_{T_k}}, can then be straightforwardly computed via

P(x_{T_k}|J_{T_k}) = tr{A_{x_k} ∘ U_{k−1} ∘ A_{x_{k−1}} ∘ ⋯ ∘ U_0 ∘ A_{x_0}[ρ_SE(0)]}.   (164)

Above, ∘ denotes the composition of maps; the maps A act on S alone, while the maps U act on SE, but we have omitted


I on E for brevity. This last equation is just quantum mechanics, as well as simply a multi-time version of Eq. (149), which defines the superchannel. Of course, the challenge is to now turn this equation into a clear descriptor for a multi-time quantum process.
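To make Eq. (164) concrete, here is a two-time sketch (k = 1) with computational-basis instruments at t_0 and t_1; the correlated initial state and the SE unitary are illustrative choices of ours, not taken from the text:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
rho_SE = np.zeros((4, 4), dtype=complex)
rho_SE[0, 0] = rho_SE[3, 3] = 0.5            # correlated two-qubit SE state
U = np.cos(0.4) * np.eye(4) - 1j * np.sin(0.4) * np.kron(sx, sx)

def A(x, rho):
    """Instrument element A_x (x) I acting on SE: project S onto |x>."""
    P = np.kron(np.diag(np.eye(2)[x]), np.eye(2)).astype(complex)
    return P @ rho @ P

def prob(x0, x1):
    """P(x0, x1 | J) = tr{ A_x1 o U_0 o A_x0 [rho_SE(0)] }, Eq. (164)."""
    sig = A(x0, rho_SE)                       # intervention at t0 (outcome x0)
    sig = U @ sig @ U.conj().T                # closed SE evolution in between
    return np.trace(A(x1, sig)).real          # intervention at t1, then trace

P = {(x0, x1): prob(x0, x1) for x0 in range(2) for x1 in range(2)}
assert np.isclose(sum(P.values()), 1.0)       # normalization over all sequences
print(P)
```

Replacing the projective instruments by any other trace non-increasing CP maps leaves the structure of the computation unchanged, which is what makes the separation into a process tensor below possible.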

This can be done by noting that the above expression is a multi-linear map with respect to the maps A_{x_j} [237]. This is similar to the superchannel case we discussed in the previous section, which was linear with respect to the preparation maps A_j. It is then possible to write Eq. (164) as a multi-linear functional T_{T_k}, which we call the process tensor:

P(x_{T_k}|J_{T_k}) =: T_{T_k}[A_{x_{T_k}}].   (165)

While seemingly a mere mathematical rearrangement, the above description of open system dynamics in terms of the process tensor [237–239] T_{T_k} is of conceptual relevance; it allows one to separate the parts of the dynamics that are controlled by the experimenter, i.e., the maps A_{x_{T_k}}, from the unknown and inaccessible parts of the dynamics, i.e., the initial system-environment state and the system-environment interactions. This clean separation means that when we speak of a quantum stochastic process, we only need to refer to T_{T_k}, and then for any choice of instruments, we can compute the probability for the sequence of outcomes by means of Eq. (165). As already mentioned in the discussion of superchannels, this is akin to the well-known case of quantum channels, where we separate the part of the process that cannot be controlled – that is, the quantum channel E – from the parts of the process that are controlled by the experimenter – that is, the initial system. Here, while the respective objects are somewhat more involved, the underlying idea is exactly the same. Consequently, T_{T_k} is the clear generalization of the superchannel T_(t:0), which, in turn, is a generalization of CPTP maps, as discussed in Sec. V C.

Moreover, this separation will later help us to resolve the aforementioned issues with the KET in quantum mechanics, where, apparently, the possible invasiveness of measurements prevented a consistent description of quantum stochastic processes. This will be possible because T_{T_k} does not depend on the maps A_{x_{T_k}}, and as such provides a description of open quantum system dynamics that is independent of the way in which the process at hand is probed.

We now discuss several key properties of the process tensor. To remain close to the classical case in spirit, we will focus on probabilities, i.e., understand T_{T_k} as a mapping that allocates the correct probability to any sequence of measurement outcomes for a given choice of instruments. At first glance, this is not in line with the superchannel, which constituted a mapping from CP maps to final states. However, we could also understand the process tensor as a mapping from operations to a final state at time t_k; as it can act on all sequences of CP maps, one can choose to not apply an instrument at the last time t_k. Consequently, T_{T_k} allows for the construction of a related map

T_{T_k}[A_{x_{T_{k-1}}}] = ρ(t_k|x_{T_{k-1}}, J_{T_{k-1}})   (166)

whose output is a quantum state at t_k conditioned on the sequence of CP maps A_{x_{T_{k-1}}} at times T_{k-1}. Often, in what follows, we will not explicitly distinguish between process tensors that return probabilities and those that return states; the respective case will either be clear from context, or irrelevant for the point we aim to make. With this somewhat technical point out of the way, let us now become more concrete and discuss both the experimental reconstruction as well as the representation of process tensors. Unsurprisingly, we can directly generalize the ideas we developed above to the multi-time case.

1. Linearity and tomography

As mentioned above, T_{T_k} is a multi-linear functional on sequences of CP maps A_{x_{T_k}}. Consequently, once all the probabilities for the occurrence of a basis of such sequences are known, the full process tensor is determined. This is analogous to the classical case, where the full process at hand was completely characterized once all joint probabilities for all possible combinations of different outcomes were known. Here, the only difference is that different measurements can be applied at each point in time, making the reconstruction a little bit more cumbersome. As the space of sequences of CP maps is finite-dimensional (for d < ∞), T_{T_k} can be reconstructed in a finite number of experiments, in a similar vein to the reconstruction of quantum channels and superchannels discussed above. The instrument J_{t_j} at any time t_j is a set of CP maps

A_{x_j}: B(H_j^i) → B(H_j^o).   (167)

The space spanned by such CP maps, i.e., the space that contains all maps of the form ∑_{x_j} c_{x_j} A_{x_j}, is (d_j^i d_j^o)^2-dimensional, since it is – as we have seen in our discussion of the CJI in Sec. IV B 3 – isomorphic to the matrix space B(H_j^o ⊗ H_j^i) (in what follows, we will assume d_j^i = d_j^o = d). Since we can choose an instrument at each time independently of other times, we can form a multi-time basis consisting of basis elements at each time, which forms a linear basis on all times T_k (in the same way as the basis of a multipartite quantum system can be constructed from combinations of local basis elements):

J_{T_k} = {A_{x_{T_k}} := (A_{x_k}, …, A_{x_0})}_{x_j=1}^{d^4}.   (168)

Importantly, any other sequence of operations (and also temporally correlated ones, see below) can be written as a linear combination of such a complete set of basis operations.

The action of the process tensor on the multi-time basis gives us the probability to observe the sequence x_{T_k} as

P(x_{T_k}|J_{T_k}) := T_{T_k}[A_{x_{T_k}}].   (169)

From our discussion of the reconstruction of the superchannel, reconstructing the multi-time object T_{T_k} is now a straightforward endeavour, which, again, we will carry out using Choi states. To this end, we note that, since all operations performed at different times are uncorrelated, their


overall Choi state A_{x_{T_k}} is simply a tensor product of the Choi states of the individual operations, i.e.,

A_{x_{T_k}} = A_{x_k} ⊗ A_{x_{k-1}} ⊗ ⋯ ⊗ A_{x_1} ⊗ A_{x_0}.   (170)

The Choi state of the process tensor can then be written – in the same spirit as Eq. (97) – as

Υ_{T_k} = ∑_{x_{T_k}} P(x_{T_k}|J_{T_k}) D*_{x_k} ⊗ ⋯ ⊗ D*_{x_0},   (171)

with the action of the process tensor given by

T_{T_k}[A_{x_{T_k}}] = tr[A^T_{x_{T_k}} Υ_{T_k}].   (172)

Here, again, {D_{x_k}} forms the dual basis to the Choi states of the basis operations {A_{x_k}} at each time t_k, i.e., tr[D†_{x_i} A_{x_j}] = δ_{ij}. Again, by construction, the process tensor above yields the correct probabilities for any of the basis sequences in J_{T_k} (which can be seen by direct insertion of Eq. (171) into Eq. (172)):

T_{T_k}[A_{x_{T_k}}] = ∑_{x'_{T_k}} tr[(A^T_{x_k} ⊗ ⋯ ⊗ A^T_{x_0}) (P(x'_{T_k}|J_{T_k}) D*_{x'_k} ⊗ ⋯ ⊗ D*_{x'_0})]
                     = ∑_{x'_{T_k}} P(x'_{T_k}|J_{T_k}) tr(D†_{x'_k} A_{x_k}) ⋯ tr(D†_{x'_0} A_{x_0})
                     = P(x_{T_k}|J_{T_k}),   (173)

thus yielding the correct probability for any basis sequence of measurements, implying that it yields the correct probability for any conceivable operation on the set of times T_k. In order to reconstruct a process tensor on times T_k, an experimenter would hence have to probe the process using informationally complete instruments J_j – in the sense that their elements span the whole space of CP maps. More concretely, the duals D_{x_j} can be computed, the joint probabilities P(x_{T_k}|J_{T_k}) can be measured, and Eq. (171) tells us how to combine them to yield the correct matrix Υ_{T_k} (see below and Refs. [237, 240] for details on the reconstruction of Υ_{T_k}). This reconstruction also applies in case the experimenter does not have access to informationally complete instruments, yielding a 'restricted' process tensor [215, 240] that only meaningfully applies to operations that lie in the span of those that can be implemented.

While the number of necessary sequences for the reconstruction of a process tensor scales exponentially with the number of times (if there are N times, then there are d^{4N} different sequences for which the probabilities would have to be determined), the number is still finite, and the reconstruction is thus, in principle, feasible. We note that classical processes are plagued by a similar exponential scaling problem: if there are d different outcomes at each time, then the joint probabilities for d^N different sequences of N outcomes have to be determined. Let us now discuss some concrete properties and interpretations of the above considerations.
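The dual-basis reconstruction of Eq. (171) is easy to prototype numerically. The following Python sketch is a toy, single-time version (all choices hypothetical: a qubit, and four state projectors standing in for the Choi states of basis operations): it computes the duals from the Gram matrix and verifies that an 'unknown' matrix is recovered from the probabilities tr[A_x^T Υ] alone.

```python
import numpy as np

# Single-time toy version of Eq. (171): reconstruct an unknown matrix
# from the 'probabilities' tr[A_x^T Y] of an informationally complete
# basis {A_x}, using the dual basis {D_x} with tr[D_x^dag A_y] = delta_xy.
kets = [np.array([1, 0]), np.array([0, 1]),
        np.array([1, 1]) / np.sqrt(2), np.array([1, 1j]) / np.sqrt(2)]
A = [np.outer(k, k.conj()) for k in kets]   # projectors onto |0>,|1>,|+>,|+i>

# Duals via the (real, symmetric) Gram matrix G_xy = tr[A_x A_y]
G = np.array([[np.trace(a @ b) for b in A] for a in A]).real
Ginv = np.linalg.inv(G)
D = [sum(Ginv[y, x] * A[y] for y in range(4)) for x in range(4)]

# 'Unknown' object standing in for the process tensor's Choi matrix
rng = np.random.default_rng(0)
M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
Y = M @ M.conj().T
Y /= np.trace(Y)

# p_x = tr[A_x^T Y]; the analogue of Eq. (171) then gives Y = sum_x p_x D_x^*
p = [np.trace(A[x].T @ Y) for x in range(4)]
Y_rec = sum(p[x] * np.conj(D[x]) for x in range(4))
assert np.allclose(Y_rec, Y)
```

The same recipe lifts to k times by replacing the single-time basis with tensor products of basis Chois, at the cost of the d^{4N} scaling mentioned above.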

2. Spatiotemporal Born rule and the link product

As before, let

A_{x_{T_k}} = A_{x_k} ⊗ A_{x_{k-1}} ⊗ ⋯ ⊗ A_{x_1} ⊗ A_{x_0}   (174)

be a set of Choi states corresponding to a sequence of independent CP maps. Then, as we have seen, the probability to obtain this sequence is given by

P(x_{T_k}|J_{T_k}) = tr[Υ_{T_k} A^T_{x_{T_k}}],   (175)

where Υ_{T_k} is the Choi state of T_{T_k} (see below for a discussion as to why it actually constitutes a Choi state). The advantage of representing the process tensor by its action in this way is two-fold. On the one hand, all objects are now rather concrete (and not abstract maps), and we can easily talk about their properties (see below). On the other hand, the fact that Υ_{T_k} is a matrix and not an abstract map will allow us to freely talk about temporal correlations in quantum mechanics in the same way that we do in the spatial setting.

Additionally, the above Eq. (175) constitutes a multi-time generalization of the Born rule [225, 226], where Υ_{T_k} plays the role of a quantum state over time, and the Choi states A_{x_{T_k}} play a role that is analogous to that of POVM elements in the spatial setting. In principle, Υ_{T_k} can be computed from the underlying dynamics by means of the link product ⋆ defined in Ref. [241] as

Υ_{T_k} = tr_E[U_k ⋆ ⋯ ⋆ U_0 ⋆ ρ_{SE}(0)].   (176)

Here, U_j is the Choi state of the map U_j, and the link product acts like a matrix product on the space E and a tensor product on the space S. Basically, the link product translates the concatenation of maps to their corresponding Choi matrices, i.e., if A and C are the Choi states of the maps A and C, respectively, then D = C ⋆ A is the Choi state of D = C ∘ A. We will not employ the link product frequently in this Tutorial, but will quickly provide its definition and motivation here (see, for example, Ref. [241] for more details). Concretely, as an exemplary case, let A: B(H_1) → B(H_3 ⊗ H_4) and C: B(H_4 ⊗ H_5) → B(H_6); then D = C ∘ A: B(H_1 ⊗ H_5) → B(H_3 ⊗ H_6). Correspondingly, for the respective Choi states we have C ∈ B(H_4 ⊗ H_5 ⊗ H_6), A ∈ B(H_1 ⊗ H_3 ⊗ H_4), and D ∈ B(H_1 ⊗ H_5 ⊗ H_3 ⊗ H_6). Using Eq. (95), one can rewrite the action of the resulting map D on an arbitrary matrix ρ ∈ B(H_1 ⊗ H_5) in terms of its Choi state, which yields

D[ρ] = tr_{15}[D(ρ^T ⊗ 11_{36})] =: tr_{15}[(C ⋆ A)(ρ^T ⊗ 11_{36})],   (177)

where 11_{36} is the identity matrix on H_3 ⊗ H_6. Now, using the above equation, one can directly read off the form of C ⋆ A as

C ⋆ A = tr_4[(C ⊗ 11_{13})(A^{T_4} ⊗ 11_{56})].   (178)

The derivation of the above relation from Eq. (177) is straightforward but somewhat lengthy and left as an exercise to the reader. Intuitively, the above tells us that the link


product between two matrices consists of i) tensoring both matrices with identity matrices so that they live on the same space, ii) partially transposing one of the matrices with respect to the spaces both of the matrices share, and iii) taking the trace of the product of the obtained objects with respect to the spaces both of the matrices share. This recipe holds for all conceivable situations where the Choi matrix of the concatenation of maps is to be computed. As mentioned, we will not make much use of the link product here (with the exception of Sec. VI B 1), but it can be very convenient when working out the Choi states of higher order quantum maps like the process tensor. Let us mention in passing that the link product has many appealing properties, like, for example, commutativity (for all intents and purposes [241]) and associativity, which allows us to write the Choi state Υ_{T_k} in Eq. (176) as a multi-link product without having to care about the order in which we carry out the 'multiplication' ⋆.

As it will turn out, Υ_{T_k} is a many-body density matrix (up to normalization), therefore constituting a very natural generalization of a classical stochastic process, which is a joint probability distribution over many random variables. Since it allows for a compact phrasing of many of the subsequent results, we will often opt for a representation of T_{T_k} in terms of its Choi matrix Υ_{T_k} in what follows (there, we will also see why it is justified to dub it a Choi matrix), and we will, for simplicity, often call both of them the process tensor. Nonetheless, for better accessibility, we will also express our results in terms of maps whenever appropriate.

Before advancing, let us recapitulate what has been achieved by introducing the process tensor for the description of general quantum processes. First, the effects on the system due to interaction with the environment have been isolated in the process tensor Υ_{T_k}. All of the details of the instruments and their outcomes are encapsulated in A_{x_{T_k}}, while all inaccessible effects and influences are contained in the process tensor. In this way, Υ_{T_k} is a complete representation of the stochastic quantum process, containing all accessible multi-time correlations [242–246]. The process tensor can be formally shown to be the quantum generalization of a classical stochastic process [218], and it reduces to a classical stochastic process in the correct limit [218, 247, 248] (we will get back to this point below).

We emphasize that open quantum system dynamics is not the only field of physics where an object like the process tensor (or variants thereof) crops up naturally. See, for example, Refs. [239, 241, 249–261] for an incomplete collection of works where similar mathematical objects have been used for the study of higher-order quantum maps, causal automata/non-anticipatory channels, quantum networks with modular elements, quantum information in general relativistic space-time, quantum causal modeling, and quantum games (see also Table I). In open quantum system dynamics, they have been used in the disguise of so-called correlation kernels already in early works on multi-time quantum processes [230, 231, 262].

Field | Name | Application
----- | ---- | -----------
Quantum Information | Quantum comb / Causal box | Quantum circuit architecture
Open Quantum system dynamics | Correlation kernel / Process tensor | Study of temporal correlations
Quantum Games | Strategy | Computation of winning probabilities
Quantum Causality | Process matrix | Processes without definitive causal order
Quantum causal modelling | Process matrix | Causal relations in quantum processes
Quantum Shannon Theory | Causal automaton / non-anticipatory channel | Quantum channels with memory

Table I. 'Process tensors' in different fields of quantum mechanics. Mathematical objects that are similar in spirit to the process tensor crop up frequently in quantum mechanics. The above table is an incomplete list of the respective fields and commonly used names. Note that, even within these fields, the respective names and concrete applications differ. Additionally, some of the objects that occur on the above list might have slightly different properties than the process tensor (for example, process matrices do not have to display a global causal order), and might look very different from the process tensor (for example, it is not obvious that the correlation kernels used in open quantum system dynamics are indeed variants of process tensors in disguise). These disparities notwithstanding, the objects in the above table are close both in spirit, as well as in the related mathematical framework.

3. Many-body Choi state

While we now know how to experimentally reconstruct it, it remains to provide a physical interpretation for Υ_{T_k}, and discuss its properties (and justify why we called it the Choi matrix of T_{T_k} above). We start with the former. For the case of quantum channels, the interpretation of the Choi state Υ_E is clear; it is the state that results from letting E act on half of an unnormalized maximally entangled state. Υ_E then contains exactly the same information as the original map E. Somewhat unsurprisingly, in the multi-time case, the CJI is similar to the two-time scenario of quantum channels. Here, however, instead of feeding one half of an (unnormalized) maximally entangled state into the process once, we have to do so at each time in T_k (see Figure 17 for a graphical representation). From Eq. (175), we see that Υ_{T_k} must be an element of B(H_k^i ⊗ H_{k-1}^o ⊗ ⋯ ⊗ H_0^o ⊗ H_0^i). Labeling the maximally entangled states in Figure 17 diligently, and distinguishing between input and output spaces, we see that the resulting state Υ_{T_k} lives on exactly the right space.

Figure 17. Choi state of a process tensor. At each time, half of an unnormalized maximally entangled state is fed into the process. For better book-keeping, all spaces are labeled by their respective time. The resulting many-body state Υ_{T_k} contains all spatio-temporal correlations of the corresponding process as spatial correlations.

That the matrix Υ_{T_k} constructed in this way indeed yields the correct process tensor can be seen by direct insertion. Indeed, by using the Choi state of Figure 17 and the definition of the Choi state A_{x_j}, one sees that Eq. (175) holds. While straightforward, this derivation is somewhat arduous and left as an exercise to the reader.

We thus see that Υ_{T_k} is proportional to a many-body quantum state, and the spatio-temporal correlations of the underlying process are mapped onto spatial correlations of Υ_{T_k} via the CJI, with each time corresponding to two Hilbert spaces (one for the input space, and one for the output space). Specifically, statements like 'correlations between different times' now translate to statements about correlations between the different Hilbert spaces the state Υ_{T_k} is defined on. These properties lend themselves to convenient methods for treating a multi-time process as a many-body state, with applications for efficient simulation and learning of the process [263–267].

Additionally, the CJI for quantum channels, as well as that for superchannels, is simply a special case of the more general CJI presented here. We emphasize that, with this, expressing the action of a process tensor in terms of matrices has become more than just a convenient trick. Knowing that Υ_{T_k} is (proportional to) a quantum state tells us straight away that it is positive, and all spatio-temporal correlations present in the process can now conveniently be understood as spatial correlations in the state Υ_{T_k}. This convenience is the main reason why most of our results will be phrased in terms of Choi matrices in what follows.

4. Complete positivity and trace preservation

Figure 18. Trace conditions on process tensors. Displayed is the pertinent part of Figure 17. As tr ∘ U = tr for all CPTP maps U, tracing out the final degree of freedom of Υ_{T_k}, denoted by k^i, amounts to a partial trace of Φ^+_{(k-1)^o}. This, in turn, yields a tensor product between 11_{(k-1)^o} and a process tensor on one step less. As in Figure 17, the swap operation is represented by a vertical line with crosses at its ends.

Just like for the case of quantum channels, the properties of a multi-time process can be most easily read off its Choi state. First, as we have seen above, Υ_{T_k} is positive. Like in the case of channels and superchannels, this property implies complete positivity of the process at hand. As was the case for superchannels, complete positivity here has a particular meaning: let the process act on any sequence of CP maps

B_{T_k} = {B_{x_0}, B_{x_1}, ..., B_{x_{k-1}}, B_{x_k}},
B_{x_{T_k}} := B_{x_k} ⊗ B_{x_{k-1}} ⊗ ⋯ ⊗ B_{x_1} ⊗ B_{x_0},   (179)

where B_x is the Choi state of B_x. These superoperators act both on the system S of interest, as well as on some external ancillas, which we collectively denote by B, and which do not interact with the environment E that is part of the process tensor. We can see the complete positivity of the process tensor directly in terms of the positivity of the process' Choi state:

tr[(Υ_{T_k} ⊗ 11_B) B^T_{x_{T_k}}] ≥ 0.   (180)

Above, Υ_{T_k} acts on S at times T_k and 11_B is the identity matrix on the ancillary degrees of freedom B. As the positivity of the Choi state implies complete positivity of the underlying map, any sequence B_{x_{T_k}} of CP maps is mapped to a CP map by T_{T_k}. Analogously, we could have expressed the above equation in terms of maps, yielding

(T_{T_k} ⊗ I_B)[B_{x_0}, B_{x_1}, ..., B_{x_{k-1}}, B_{x_k}] is CP.   (181)

However, as mentioned, the properties of process tensors are much more easily represented in terms of their Choi matrices.

In clear analogy to the case of quantum channels, process tensors should also satisfy a property akin to trace preservation. At its core, trace preservation is a statement about the normalization of probabilities. As CPTP maps can be implemented with unit probability, at first glance, the natural generalization of trace preservation thus appears to be

tr[Υ_{T_k} (A_k ⊗ ⋯ ⊗ A_0)^T] = 1   (182)

for all CPTP maps A_0, ..., A_k. However, this requirement on its own is too weak, as it does not encapsulate the temporal ordering of the process at hand [257]. If only the above requirement were fulfilled, then actions at a time t_j could in principle influence the statistics at an earlier time t_{j'} < t_j. This should be forbidden by causality, though. Fortunately, Υ_{T_k} already encapsulates the causal ordering of the underlying process by construction. Specifically, tracing over the degrees of freedom of Υ_{T_k} that correspond to the last time (i.e.,


the degrees of freedom labeled by k^i in Figure 17) yields

tr_{k^i} Υ_{T_k} = 11_{(k-1)^o} ⊗ Υ_{T_{k-1}},   (183)

where Υ_{T_{k-1}} is the process tensor on times T_{k-1} with a final degree of freedom denoted by (k-1)^i. The above property trickles down, in the sense that

tr_{(k-1)^i} Υ_{T_{k-1}} = 11_{(k-2)^o} ⊗ Υ_{T_{k-2}},
tr_{(k-2)^i} Υ_{T_{k-2}} = 11_{(k-3)^o} ⊗ Υ_{T_{k-3}},
⋮
tr_{1^i} Υ_{T_1} = 11_{0^o} ⊗ Υ_{T_0},
tr_{0^i} Υ_{T_0} = 1.   (184)

Before elucidating why these properties indeed ensure causal ordering, let us quickly lay out why they hold. To this end, it is actually sufficient to only prove the first condition (183), as the others follow in the same vein. A rigorous version of this proof can, for example, be found in [237, 241]. Here, we will prove it by means of Figure 17. Consider tracing out the degrees of freedom denoted by k^i in said figure. This, then, amounts to tracing out all output degrees of freedom of the map U_k. As U_k is CPTP, tracing out all outputs after applying U_k is the same as simply tracing out the outputs without having applied U_k, i.e., tr ∘ U_k = tr. This, then, implies a partial trace of the unnormalized maximally entangled state Φ^+_{(k-1)^o}, yielding 11_{(k-1)^o}, as well as a trace over the environmental output degrees of freedom of U_{k-1} (see Figure 18 for a detailed graphical representation). The remaining part, i.e., the part besides 11_{(k-1)^o}, is then a process tensor on the times T_{k-1} = {t_0, ..., t_{k-1}}. Iterating these arguments then leads to the hierarchy of trace conditions in Eq. (184). While a little bit tedious algebraically, these relations can very easily be read off from the graphical representation provided in Figure 18.

Showing that the above trace conditions indeed imply correct causal ordering of the process tensor now amounts to showing that a CPTP map at a later time does not have an influence on the remaining process tensor at earlier times. We start with a CPTP map at t_k. This map does not have an output space. The only CPTP map with trivial output space is the trace operation, which has Choi state 11_{k^i}. Thus, letting Υ_{T_k} act on it amounts to a partial trace tr_{k^i} Υ_{T_k}, which is equal to 11_{(k-1)^o} ⊗ Υ_{T_{k-1}}. Letting this remaining process tensor act on a CPTP map A_{k-1} at time t_{k-1} yields

tr_{k-1}[(11_{(k-1)^o} ⊗ Υ_{T_{k-1}}) A^T_{k-1}] = 11_{(k-2)^o} ⊗ Υ_{T_{k-2}},   (185)

where tr_{k-1} denotes the trace over (k-1)^i and (k-1)^o, and we have used the property of CPTP maps that tr_{(k-1)^o}(A_{k-1}) = 11_{(k-1)^i}. As the LHS of the above equation does not depend on the specific choice of A_{k-1}, no statistics before t_{k-1} will depend on the choice of A_{k-1} either. Again, iterating this argument then shows that the above hierarchy of trace conditions implies proper causal ordering.

5. ‘Reduced’ process tensors

Importantly, this independence of earlier statistics from later CPTP maps implies that we can uniquely define certain 'reduced' process tensors. Say, we have a process tensor Υ_{T_k} that is defined on times t_0 < ⋯ < t_k and we want to obtain the correct process tensor only on the first few times t_0, ..., t_j with t_j < t_k. To this end, at first glance, it seems like we would have to specify what instruments J_i we aim to apply at times t_i > t_j. However, since Υ_{T_k} satisfies the above causality constraints, earlier statistics, and with them the corresponding process tensors, are independent of later CPTP maps. This is in contrast to later statistics, which can, due to causal influences, absolutely depend on earlier CPTP maps. Long story short, while 'tracing out' later times is a unique operation on combs, 'tracing out' earlier ones is not, and the corresponding resulting process tensor would depend on the CPTP maps that were used for the tracing-out operations. This, unsurprisingly, is in contrast to the spatial case, where local CPTP maps never influence the statistics of other parties, for the simple reason that in the spatial case there is no signalling between different parties happening. To be more concrete, in order to obtain a process tensor Υ_{T_j} from Υ_{T_k}, where t_j < t_k, we can 'contract' Υ_{T_k} with any sequence of CPTP maps A_{j+1}, ..., A_k:

Υ_{T_j} = tr_{k:j+1}[Υ_{T_k} (A^T_{j+1} ⊗ ⋯ ⊗ A^T_k)].   (186)

Since Υ_{T_k} satisfies the causality constraints of Eqs. (183) and (184), the above Υ_{T_j} is independent of the choice of CPTP maps A_{j+1}, ..., A_k and correctly reproduces all statistics on T_j. Since the choice of CPTP maps is arbitrary, we can take the simple choice A_i = (1/d_{i^o}) 11_{i^i} ⊗ 11_{i^o}, which yields

Υ_{T_j} = (1 / ∏_{i=j+1}^{k} d_{i^o}) tr_{k:j+1}(Υ_{T_k}).   (187)

Again, we emphasize that the causality constraints on Υ_{T_k} only apply in a fixed order – that is, the order that is given by the causal ordering of the times in T_k – such that a 'reduced' process tensor on later times is a meaningful concept, but would in general depend on the CPTP maps that were applied at earlier times. For example, we would generally have

Υ^{(A_0)}_{t_k,...,t_1} := tr_0(Υ_{T_k} A_0^T) ≠ tr_0(Υ_{T_k} A_0'^T) =: Υ^{(A_0')}_{t_k,...,t_1}.   (188)

As before, the above results can, equivalently, be stated in terms of maps. However, the corresponding equations would not be very enlightening. To summarize, process tensors, just like channels and superchannels, satisfy complete positivity and trace preservation, albeit with slightly different interpretations than was the case for channels.

At this point, it is insightful to return to the two different ways of motivating the discussion of quantum stochastic processes we alluded to at the beginning of Sec. V D. Naturally, based on a reasoning by analogy, we could have introduced the process tensor as a positive linear functional that maps sequences of CP maps to probabilities and respects the causal order of the process. After all, coming from classical stochastic processes and knowing about how measurements are described in quantum mechanics, this would have been a very natural route to take. This, then, might in principle have led to a larger set of process tensors than the ones we obtained from underlying circuits. However, this is not the case; as we shall see in the next section, any object that is positive and satisfies the trace hierarchy above actually corresponds to a quantum circuit with only pure states and unitary intermediate maps. Consequently, and somewhat reassuringly, both the axiomatic perspective, as well as the operational one we took here, lead to the same resulting descriptors of quantum stochastic processes.

Figure 19. Tester element. In the most general case, an experimenter can correlate the system of interest with an ancilla (here, initially in state |Ψ⟩), use said ancilla again at the next time, etc., and make a final measurement with outcome x in the end. As the unitaries V_j can also act trivially on parts of the ancilla, this scenario includes all conceivable measurements an experimenter can perform. Summing over the outcomes x_{T_k} amounts to tracing out the ancillas, thus yielding a proper comb (compare with Figure 16). Note that the inputs (outputs) of the resulting tester elements correspond to the outputs (inputs) of the process tensor, and the system of interest corresponds to the top line, not the bottom line.

Finally, one might wonder why we never discussed the question of causality in the case of classical stochastic processes. There, however, causality does not play a role per se if only non-invasive measurements are considered. It is only through the invasiveness of measurements/interrogations that influences between different events, and, as such, causal relations can be discerned. A joint probability distribution obtained from making non-invasive measurements does thus not contain information about causal relations. This, naturally, changes drastically as soon as active interventions are taken into consideration, as is done actively in the field of classical causal modeling [221], and as cannot be avoided in quantum mechanics [218].

6. Testers: Temporally correlated ‘instruments’

So far, we have only considered the application of independent instruments, which have the form given in Eq. (174). However, these are not the only operations a process tensor can meaningfully act on. In principle, an experimenter could, for example, condition their choice of instrument at time t_{j'} on all outcomes they recorded at times t_j < t_{j'}. This would lead to a (classically) temporally correlated 'instrument', which is commonly practiced in quantum optics experiments [268]. More generally, at times T_k, the experimenter could correlate the system of interest with external ancillas, which are re-used, and measure said ancillas at time t_k (see Figure 19). This, then, would result in a generalized instrument that has temporal quantum correlations.

We can always express such correlated operations using a local linear basis as

A_{x_{T_k}} = ∑_{x'_{T_k}} α_{x'_{T_k}} A_{x'_k} ⊗ A_{x'_{k-1}} ⊗ ⋯ ⊗ A_{x'_1} ⊗ A_{x'_0}.   (189)

The LHS of this equation is labeled in the same way as Eq. (174), because the above equation contains Eq. (174) as a special case. Here, the A_{x'_j} form a linear basis for operations at time t_j, and the α_{x'_{T_k}} are generic coefficients that can be non-positive. In other words, the above correlated operation can carry 'entanglement in time', since not only convex combinations of product operations are possible. Temporally correlated operations can be performed as part of a temporally correlated instrument. Such generalizations of instruments have been called 'testers' in the literature [225, 241, 269].

In the case of 'normal' instruments, the respective elements, which are CP maps, have to add up to a CPTP map. Here, in clear analogy, the elements of a tester have to add up to a proper process tensor. In terms of Choi states, this means that the elements A_{x_{T_k}} of a tester have to be positive, and add up to a matrix A = ∑_{x_{T_k}} A_{x_{T_k}} that satisfies the hierarchy of trace conditions of Eqs. (183) and (184). We emphasize that the possible outcomes x_{T_k} that label the tester elements do not have to correspond to sequences x_0, ..., x_k of individual outcomes at times t_0, ..., t_k. As outlined above, for correlated tester elements, all measurements could happen at the last time only, or at any subset of times. Consequently, in what follows, unless explicitly stated otherwise, x_{T_k} will label 'collective' measurement outcomes and not necessarily sequences of individual outcomes. Interestingly, since tester elements add up to a proper process tensor, this discussion of testers already points us to the interpretation of correlations between different times in Υ_{T_k}; each type of correlation corresponds to a different type of information that is transmitted between different points in time by the environment – just like in the tester case, classical correlations correspond to classical information that is fed forward, while entanglement between different times relates to quantum information being processed. We will make these points clearer below, but already want to emphasize the dual role that testers and process tensors play.

However, for a tester, the roles of input and output are reversed with respect to the process tensors that act on them; an output of the process tensor is an input for the tester and vice versa. Consequently, keeping the labeling of spaces consistent with the above, and assuming that testers end on the last output space k^o, the trace hierarchy for testers starts with tr_{k^o}(A_{T_k}) = 11_{k^i} ⊗ A_{T_{k-1}}, tr_{(k-1)^o}(A_{T_{k-1}}) = 11_{(k-1)^i} ⊗ A_{T_{k-2}}, etc., implying that, with respect to Eqs. (183) and (184), the roles of i and o in the trace hierarchy are simply exchanged. Naturally, testers generalize both POVMs and instruments to the multi-time case.

Importantly, for any element A_{x_{T_k}} of a tester that is ordered in the same way as the underlying process tensor, we have

0 ≤ tr(Υ_{T_k} A^T_{x_{T_k}}) ≤ 1, and tr(Υ_{T_k} A^T_{T_k}) = 1,   (190)

which can be seen by employing the hierarchies of trace conditions that hold for process tensors and testers. Similarly to the case of POVMs and instruments, letting a process tensor act on a tester element yields the probability to observe the outcome x_{T_k} that corresponds to A_{x_{T_k}} (see Figure 20 for a graphical representation). Below, we will encounter temporally correlated tester elements when discussing Markovianity and Markov order in the quantum setting.

Figure 20. Action of a process tensor on a tester element. 'Contracting' a process tensor (depicted in blue) with a temporally correlated measurement, i.e., a tester element (depicted in green), yields the probability for the occurrence of said tester element.

As before, one might wonder why temporally correlated measurements – unlike temporally correlated joint probability distributions – made no appearance in our discussion of classical stochastic processes. As before, the answer is rather simple; in our discussion of classical stochastic processes, there was no notion of different instruments that could be used at different times, so that there was also no means of temporally correlating them. Had we allowed for different classical instruments, i.e., had we allowed for different kinds of active interventions, then classically correlated instruments would have played a role as well. However, these instruments would only have displayed classical correlations between different points in time, since no quantum information can be sent between classical instruments.

7. Causality and dilation

In Sec. V E, we will see that, besides being a handy mathematical tool, process tensors allow for the derivation of a generalized extension theorem, thus appearing to be the natural extension of stochastic processes to the quantum realm on a fundamental level. Here, we will, in a first step, connect process tensors to underlying dynamics. In classical physics, it is clear that every conceivable joint probability distribution can be realized by some – potentially highly exotic – classical dynamics. On the other hand, so far, it is unclear if the same holds for process tensors. By this we mean that we have not shown the claim made above, that every process tensor, i.e., every positive matrix that satisfies the trace hierarchy of Eqs. (183) and (184), can actually be realized in quantum mechanics. We will provide a short 'proof' by example here; more rigorous treatments can, for instance, be found in Refs. [237, 241, 251, 270].

Concretely, showing that any process tensor can be realized in quantum mechanics amounts to showing that they admit a quantum circuit that is only composed of pure states and unitary dynamics. This is akin to the Stinespring dilation we discussed in Sec. IV B 4, which allowed us to represent any quantum channel in terms of pure objects only. In this sense, the following dilation theorem will be even more general than the analogous statement in the classical case, where randomness has to be inserted 'by hand'.

We use the property that all quantum states are purifiable to obtain a representation for general process tensors.

Figure 21. Dilation of the Choi state of a process tensor. Up to normalization, Eqs. (192) and (193) together yield a quantum circuit for the implementation of the Choi state of a two-step process tensor that only consists of pure states and isometries (which could be further dilated to unitaries).

Figure 22. Process tensor corresponding to Figure 21. Rearranging the wires of the circuit of Figure 21 and the maximally entangled states (i.e., undoing the CJI) yields the representation of a process tensor we have already encountered in the previous section. As above, note that H_{0_o} ≅ H_{0_o′} and H_{1_o} ≅ H_{1_o′}, such that this is indeed the correct dilation of Υ_{0_i 0_o 1_i 1_o 2_i}.

For concreteness, let us consider a two-step process tensor Υ_{0_i 0_o 1_i 1_o 2_i}, defined on three times t_0, t_1, t_2. Now, due to the causality constraints of Eqs. (183) and (184), we have tr_{2_i}(Υ_{0_i 0_o 1_i 1_o 2_i}) = 𝟙_{1_o} ⊗ Υ′_{0_i 0_o 1_i}, where the prime is added for clearer notation in what follows. Since each of its components is proportional to a quantum state, this latter term can be dilated in at least two different ways:

𝟙_{1_o} ⊗ Υ′_{0_i 0_o 1_i} = tr_{2_i A}(|Υ⟩⟨Υ|_{0_i 0_o 1_i 1_o 2_i A})
= d_{1_o} tr_{1_o′ B}(Φ⁺_{1_o 1_o′} ⊗ |Υ′⟩⟨Υ′|_{0_i 0_o 1_i B}), (191)

where |Υ⟩_{0_i 0_o 1_i 1_o 2_i A} and |Υ′⟩_{0_i 0_o 1_i B} are purifications of Υ_{0_i 0_o 1_i 1_o 2_i} and Υ′_{0_i 0_o 1_i}, respectively (with corresponding ancillary purification spaces A and 1_o′, B), and the additional pre-factor d_{1_o} = tr(𝟙_{1_o}) is required for proper normalization. These two different dilations of the same object are related by an isometry V_{1_o′ B → 2_i A} =: V that only acts on the dilation spaces, i.e.,

|Υ⟩⟨Υ|_{0_i 0_o 1_i 1_o 2_i A} = d_{1_o} V (Φ⁺_{1_o 1_o′} ⊗ |Υ′⟩⟨Υ′|_{0_i 0_o 1_i B}) V†. (192)

Figure 23. Consistency condition for Quantum Stochastic Processes. Letting a process tensor act on an identity (here at time t_2) yields the correct process tensor on the remaining times.

In the same vein, due to the causality constraints of Υ′_{0_i 0_o 1_i}, we can show that there exists an isometry W_{0_o′ 0_i′ → 1_i B} =: W, such that

|Υ′⟩⟨Υ′|_{0_i 0_o 1_i B} = d_{0_o} W (Φ⁺_{0_o 0_o′} ⊗ |Υ′′⟩⟨Υ′′|_{0_i 0_i′}) W†, (193)

where |Υ′′⟩_{0_i 0_i′} is a pure quantum state. Inserting this into Eq. (192) and using that Υ_{0_i 0_o 1_i 1_o 2_i} = tr_A(|Υ⟩⟨Υ|_{0_i 0_o 1_i 1_o 2_i A}) yields – up to normalization – a representation of Υ_{0_i 0_o 1_i 1_o 2_i} in terms of a pure initial state |Υ′′⟩⟨Υ′′|_{0_i 0_i′} ⊗ Φ⁺_{0_o 0_o′} ⊗ Φ⁺_{1_o 1_o′} and subsequent isometries W_{0_o′ 0_i′ → 1_i B} and V_{1_o′ B → 2_i A} (see Figure 21). As any isometry can be completed to a unitary, this implies that Υ_{0_i 0_o 1_i 1_o 2_i} can indeed be understood as stemming from a quantum circuit consisting only of pure states and unitaries. This circuit simply provides the CJI of the corresponding process tensor, as can easily be seen by 'removing' the maximally entangled states and rearranging the wires in a more insightful way (see Figure 22). Naturally, these arguments can be extended to any number of times. Here, we sacrificed some of the mathematical rigor for brevity and clarity of the exposition; as mentioned, for a more rigorous derivation, see Refs. [237, 241, 251, 270]. Importantly, with this dilation property at hand, we can be sure that every process tensor actually has a physical representation, i.e., it describes a conceivable physical situation. This is akin to the case of channels, where the Stinespring dilation guaranteed that every CPTP map could be implemented in the real world. With these loose ends wrapped up, it is now time to discuss process tensors and quantum stochastic processes on a more axiomatic level.
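The purifiability used at the start of this argument is straightforward to realize numerically. A minimal sketch (the helper functions below are our own, not notation from the tutorial): a purification of a mixed state ρ is built from its eigendecomposition, |ψ⟩ = Σ_k √λ_k |v_k⟩ ⊗ |k⟩, and tracing out the ancilla recovers ρ.

```python
import numpy as np

def purify(rho):
    """Return |psi> on H_S ⊗ H_A with tr_A |psi><psi| = rho (eigendecomposition purification)."""
    vals, vecs = np.linalg.eigh(rho)
    d = rho.shape[0]
    psi = np.zeros(d * d, dtype=complex)
    for k in range(d):
        if vals[k] > 0:
            psi += np.sqrt(vals[k]) * np.kron(vecs[:, k], np.eye(d)[k])
    return psi

def ptrace_ancilla(psi, d):
    """Partial trace over the second (ancilla) factor of |psi><psi|."""
    M = psi.reshape(d, d)            # psi = sum_{s,a} M[s,a] |s>|a>
    return M @ M.conj().T            # rho_S[s,s'] = sum_a M[s,a] M*[s',a]

rho = np.array([[0.7, 0.2j], [-0.2j, 0.3]])   # some mixed qubit state
psi = purify(rho)
assert np.allclose(ptrace_ancilla(psi, 2), rho)
```

Any two such purifications of the same state are related by an isometry on the purifying space alone, which is the fact exploited in Eqs. (191) and (192).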

E. Some mathematical rigor – Generalized extension theorem (GET)

Above, we provided a consistent way to describe quantum stochastic processes. Importantly, this description given by process tensors can deal with the inherent invasiveness of quantum measurements, as it separates the measurements made by the experimenter from the underlying process they probe. Unsurprisingly then, employing this approach to quantum stochastic processes, the previously mentioned breakdown of the KET in quantum mechanics can be resolved in a satisfactory manner [218, 262].

Recall that one of the ingredients of the Kolmogorov extension theorem – which does not hold in quantum mechanics – was the fact that a multi-time joint probability distribution contains all joint probability distributions for fewer times. In quantum mechanics, on the other hand, a joint probability distribution, say, at times t_1, t_2, t_3 for instruments J_1, J_2, J_3 does not contain the information of what statistics one would have recorded had one not measured at t_2, but only at times t_1, t_3. More generally, P(x_3, x_2, x_1|J_3, J_2, J_1) does not allow one to predict probabilities for different instruments J′_1, J′_2, J′_3. On the other hand, the process tensor allows one to – on the set of times it is defined on – compute all joint probabilities for all employed instruments, in particular for the case where one or more of the instruments is the 'do-nothing' instrument. Consequently, it is easy to see that for a given process on, say, times t_1, t_2, t_3, the corresponding process tensor T_{t_1,t_2,t_3} – where, for concreteness, here we use T_{t_1,t_2,t_3} instead of T_{T_3} – contains the correct process tensors for any subset of t_1, t_2, t_3. For example, we have

T_{t_1,t_3}[•, •] = T_{t_1,t_2,t_3}[•, I_2, •], (194)

where I_2 is the identity map I[ρ] = ρ at time t_2 (see Figure 23 for a graphical representation).

Figure 24. Hierarchy of multi-time quantum processes. A quantum stochastic process is the process tensor over all times. Of course, in practice one only looks at finite time statistics. However, the generalized extension theorem tells us that the set of all k-time process tensors Υ_{T_k} contains, as marginals, all j-time process tensors Υ_{T_j} for j < k (i.e., Υ_T ⊇ Υ_{T_k} ⊇ … ⊇ Υ_{T_3} ⊇ Υ_{T_2}). Moreover, the sets of two- and three-time processes play a significant role in the theory of quantum stochastic processes. Here, we only display a small part of the multi-faceted structure of non-Markovian quantum processes. For a much more comprehensive stratification, see Refs. [7, 10].

This, in turn, implies that process tensors satisfy a generalized consistency condition. Importantly, as I is a unitary operation, letting T act on an identity does generally not coincide with the summation over measurement outcomes. Concretely, for any instrument with more than one outcome, we have Σ_x A_x ≠ I, and thus summation over outcomes is not the correct way to 'marginalize' process tensors. We will discuss below why it works nonetheless for classical processes. To make this concept of compatibility for process tensors more manifest, let us revisit the concatenated Stern-Gerlach experiment we presented in Sec. IV E 1, when we discussed the breakdown of the Kolmogorov extension theorem in quantum mechanics. There, the system of interest underwent trivial dynamics (given by the identity channel I), interspersed by measurements in the z-, x-, and z-direction (see Figure 12). Choosing the initial state of the system to be fixed and equal to |+⟩ (as we did in Sec. IV E 1) then yields a corresponding process tensor that acts on CP maps A_{x_1}, A_{z_2}, A_{x_3} at times t_1, t_2, t_3 as

T_{t_1,t_2,t_3}[A_{x_1}, A_{z_2}, A_{x_3}] = tr[(A_{x_3} ∘ I_{2→3} ∘ A_{z_2} ∘ I_{1→2} ∘ A_{x_1})[|+⟩⟨+|]]. (195)

Now, replacing A_{z_2} in the above by I_2, since I_{2→3} ∘ I_2 ∘ I_{1→2} = I_{1→3}, we see that we exactly obtain the process tensor for trivial dynamics between t_1 and t_3, i.e.,

T_{t_1,t_2,t_3}[A_{x_1}, I_2, A_{x_3}] = tr[(A_{x_3} ∘ I_{1→3} ∘ A_{x_1})[|+⟩⟨+|]] = T_{t_1,t_3}[A_{x_1}, A_{x_3}]. (196)

On the other hand, summing over the outcomes at t_2 (as one would do in the classical case), we would not obtain the correct process tensor in the absence of a measurement at t_2. Specifically, setting A_z = Σ_{z_2} A_{z_2}, we obtain

Σ_{z_2} T_{t_1,t_2,t_3}[A_{x_1}, A_{z_2}, A_{x_3}] = tr[(A_{x_3} ∘ A_z ∘ A_{x_1})[|+⟩⟨+|]] ≠ T_{t_1,t_3}[A_{x_1}, A_{x_3}]. (197)

The quantum process we discussed in Sec. IV E 1, and more generally all quantum processes, thus satisfy consistency properties, however not in exactly the same sense as classical processes do.
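The two inequivalent 'marginalizations' are easy to check numerically. Below is a minimal sketch of the Stern-Gerlach example with projective x- and z-measurements (our own toy construction, not code from the tutorial): inserting the identity at t_2 versus summing over the outcomes of an intermediate z-measurement gives different joint probabilities at t_1, t_3.

```python
import numpy as np

ketp = np.array([1.0, 1.0]) / np.sqrt(2)            # |+>
rho = np.outer(ketp, ketp)                          # initial state |+><+|
Px = [np.outer(v, v) for v in (ketp, np.array([1.0, -1.0]) / np.sqrt(2))]
Pz = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

def meas(state, P):
    """Projective update; returns (probability, unnormalized post-state)."""
    post = P @ state @ P
    return np.trace(post).real, post

# P(x1 = +, x3 = +) with the identity map at t2 (trivial dynamics in between)
p1, sigma = meas(rho, Px[0])
p_no_t2 = np.trace(Px[0] @ sigma).real              # equals 1 here

# summing over the outcomes of a z-measurement at t2 instead
p_summed = 0.0
for P in Pz:
    _, tau = meas(sigma, P)
    p_summed += np.trace(Px[0] @ tau).real          # equals 1/2 here
```

With no intervention at t_2, the x-outcome at t_3 is certain (probability 1); the dephasing induced by the summed z-measurement reduces it to 1/2, exactly the discrepancy of Eq. (197).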

With this generalized consistency condition at hand, a generalized extension theorem (GET) in the spirit of the KET can be proven for quantum processes [218, 262]; any underlying quantum process on a set of times T leads to a family of process tensors {T_{T_k}}_{T_k ⊂ T} that are compatible with each other, while any family of compatible process tensors implies the existence of a process tensor that has all of them as marginals in the above sense. More precisely, setting

T|^{T_k}_{T_ℓ}[•] := T_{T_ℓ}[⨂_{α ∈ T_ℓ∖T_k} I_α, •], (198)

where we employ the shorthand notation ⨂_{α ∈ T_ℓ∖T_k} I_α to denote that the identity map is 'implemented' at each time t_α ∈ T_ℓ∖T_k, we have the following theorem [218, 262]:

Theorem (GET). Let T be a set of times. For each finite T_k ⊂ T, let T_{T_k} be a process tensor. There exists a process tensor T_T that has all finite ones as 'marginals', i.e., T_{T_k} = T|^{T_k}_T, iff all finite process tensors satisfy the consistency condition, i.e., T_{T_k} = T|^{T_k}_{T_ℓ} for all finite T_k ⊂ T_ℓ ⊂ T.

As the proof of the GET is somewhat technical, we will not provide it here and refer the interested reader to Refs. [218, 262]. We emphasize, though, that since the basic idea of the GET is – just like the KET – based on compatibility of descriptors on different sets of times, it can be proven in a way that is rather similar to the proof of the KET [218].

Importantly, this theorem contains the KET as a special case, namely the one where all involved process tensors and operations are classical. Consequently, introducing process tensors for the description of quantum stochastic processes closes the apparent conceptual gaps we discussed earlier, and provides a direct connection to their classical counterpart; while quantum stochastic processes can still be considered as mappings from sequences of outcomes to joint probabilities, in quantum mechanics, a full description requires that these probabilities are known for all instruments an experimenter could employ (see Figure 25). Additionally, the GET provides satisfactory mathematical underpinnings for physical situations where active interventions are purposefully employed, for example, to discern different causal relations and mechanisms. This is, for instance, the case in classical and quantum causal modeling [221, 239, 259, 271] (see Figure 26 for a graphical representation).

In light of the fact that, mathematically, summing over outcomes of measurements does not amount to an identity map – even in the classical case – it is worth reiterating from a mathematical point of view why the KET holds in classical physics. For a classical stochastic process, we always implicitly assume that measurements are made in a fixed basis (the computational basis), and no active interventions are implemented. Mathematically, this implies that the considered CP maps are of the form A_{x_j}[ρ] = ⟨x_j|ρ|x_j⟩ |x_j⟩⟨x_j|. Summing over these CP maps yields the completely dephasing CPTP map Δ_j[ρ] := Σ_{x_j} ⟨x_j|ρ|x_j⟩ |x_j⟩⟨x_j|, which does not coincide with the identity map. However, on the set of states that are diagonal in the computational basis, the action of both maps coincides, i.e., Δ_j[ρ] = I_j[ρ] for all ρ = Σ_{x_j} λ_{x_j} |x_j⟩⟨x_j|. More generally, their action coincides on the set of all combs that describe classical processes [248]. In a sense then, mathematically speaking, the KET works because, in classical physics, only particular operations, as well as particular process tensors, are considered. Going beyond either of these sets requires – already in classical physics – a more general way of 'marginalization', leading to an extension theorem that naturally contains the classical one as a special case.
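The difference between Δ_j and I_j – and their agreement on classical (diagonal) states – takes only a few lines to verify:

```python
import numpy as np

def dephase(rho):
    """Completely dephasing map Δ: keeps only the computational-basis diagonal."""
    return np.diag(np.diag(rho))

rho_coh = np.array([[0.5, 0.5], [0.5, 0.5]])   # |+><+|, carries coherences
rho_diag = np.diag([0.3, 0.7])                 # classical (diagonal) state

# Δ ≠ I on states with coherences ...
assert not np.allclose(dephase(rho_coh), rho_coh)
# ... but Δ = I on states diagonal in the computational basis
assert np.allclose(dephase(rho_diag), rho_diag)
```

This is the precise sense in which summing over classical measurement outcomes 'acts like' doing nothing, making the KET-style marginalization work in classical physics.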

With the GET, which we have only looked at in a very cursory manner here, we have answered the final foundational question about process tensors and have established them as the natural generalization of the descriptors of classical stochastic processes to the quantum realm, both from an operational as well as an axiomatic perspective. While rather obvious in hindsight, it required the introduction of some machinery to be able to properly describe measurements in quantum mechanics. For the remainder of this Tutorial, we will employ the developed machinery to discuss properties of quantum stochastic processes, in particular Markovianity and Markov order.

VI. PROPERTIES OF QUANTUM STOCHASTIC PROCESSES

A. Quantum Markov conditions and Causal break

Now, armed with a clear description of quantum stochastic processes, i.e., the process tensor, we are in the position to ask when a quantum process is Markovian. We will formulate a quantum Markov condition [238, 239] by employing the notion of causal breaks. As we have seen in the classical case, Markovianity is a statement about conditional independence of the past and the future. Intuitively speaking, information of the past can be transmitted to the future in two different ways: via the system itself and via the inaccessible environment. In a Markovian process, the environment does not transmit any past system information to the future process on the system. This condition is encapsulated in the classical Markov condition

P(x_k|x_{k−1}, …, x_0) = P(x_k|x_{k−1}) ∀k. (199)

Conditioning on a given outcome blocks the information flow from the past to the future through the system (since it is set to a fixed outcome), and conditional independence from the past then tells us that there is no information that travels through the environment.

Figure 25. 'Trajectories' of a quantum stochastic process. An open quantum process is fully described once all joint probabilities for sequences of outcomes are known for all possible instruments an experimenter can employ to probe the process. Like in the classical case, each sequence of outcomes can be considered a trajectory, but unlike in the classical case, there is no ontology attached to such trajectories. Additionally, each sequence of outcomes in the quantum case corresponds to a sequence of measurement operators, not just labels. If both the process and the allowed (non-invasive) measurements are diagonal in the same fixed basis, then the above figure coincides with Figure 5, where trajectories of classical stochastic processes were considered. Importantly, while in classical physics only probabilistic mixtures of different trajectories are possible, quantum mechanics allows for the coherent superposition of 'trajectories' [272].

A causal break allows one to extend this classical intuition to the quantum case. It is designed to block the information transmitted by the system itself and, at the same time, look for the dependence of the future dynamics of the system conditioned on the past control operations on the system. If the future depends on the past controls, then we must conclude that the past information is transmitted to the future by the environment, which is exactly the non-Markovian memory.

Let us begin by explicitly denoting the process Υ_{T_ℓ} on a set of times T_ℓ = {t_0, …, t_k, …, t_ℓ}. We break this set into two subsets at an intermediate time step k < ℓ as T_− = {t_0, …, t_{k_−}} and T_+ = {t_{k_+}, …, t_ℓ}, where t_{k_−} and t_{k_+}, respectively, are the times corresponding to the spaces in the Choi state of the process denoted by k_i and k_o. In the first segment, we implement a tester element A_{x_−} belonging to instrument J_− with outcomes x_−. In the next time segment, as the system evolves to time step ℓ, we implement a tester element A_{x_+} belonging to instrument J_+ with outcomes x_+ (see Figure 27). Together, we have applied two independent tester elements A_{x_+} ⊗ A_{x_−}, where the simple tensor product between the two testers implies their independence. In detail, the two testers split the timestep k: the first instrument ends with a measurement on the output of the process at time t_k (labeled as t_{k_−}). The second instrument begins with preparing a fresh state at the same time (labeled as t_{k_+}). Importantly, this implies a causal break that prevents any information transmission between the past T_− and the future T_+ via the system, similar to our reasoning in the classical case. Thus, detecting an influence of past operations on future statistics when implementing a causal break implies the presence of memory effects mediated by the environment. Additionally, it is easy to see that causal breaks can span a basis of the space of all testers on T, a property we will make use of below.

Figure 26. (Quantum) Causal network. Performing different interventions allows for the causal relations between different events (denoted by X_j) to be probed. For example, in the figure the event B1 directly influences the events C3 and A2, while A3 influences only B4. As not all pertinent degrees of freedom are necessarily in the control of the experimenter, such scenarios can equivalently be understood as an open system dynamics. Any such scenario can be described by a process tensor [241], and the GET applies, even though active interventions must be performed to discern causal relations. For example, the events D3, D4, B5 could be successive (e.g., at times t_3, t_4 and t_5) spin measurements in z-, x- and z-direction, respectively. Summing over the results of the spin measurement in x-direction at t_4 would not yield the correct probability distribution for two measurements in z-direction at t_3 and t_5 only, but consistency still holds on the level of process tensors (see also Sec. IV E 1).

For future convenience, let us define the process tensor Υ_{T_−} := (1/d_{O_+}) tr_+(Υ_{T_ℓ}) that is defined on T_− only. As we have seen in our discussion of the causal ordering of process tensors, Υ_{T_−} is well defined and reproduces the statistics on T_− correctly. We now focus on the conditional outcome statistics of the future process, which are given by Eq. (175):

P(x_+|J_+, x_−) = tr[Υ^{(A_{x_−})}_{T_+} A^T_{x_+}]. (200)

Note that we have added a second condition x_− on the LHS, as well as an additional superscript on the RHS, because the future process, in general, may depend on the outcomes for the past instrument J_−. Importantly, above we have set

Υ^{(A_{x_−})}_{T_+} := tr_−(Υ_{T_ℓ} A^T_{x_−}) / tr(Υ_{T_−} A^T_{x_−}), (201)

making P(x_+|J_+, x_−) given in Eq. (200) a proper conditional probability. This operationally well-defined conditional probability is fully consistent with the conditional classical probability distributions in Eq. (7).

The causal break at timestep k guarantees that the system itself cannot carry any information of the past into the future beyond step k. The only way the future process Υ^{(A_{x_−})}_{T_+} could depend on the past is if the information of the past is carried across the causal break via the environment. We have depicted this in Figure 27, where the only possible way the past information can go to the future is through the process tensor itself. This immediately results in the following operational criterion for a Markov process:

Quantum Markov Condition. A quantum process is Markovian when the future process statistics, after a causal break at time step k (with ℓ > k), are independent of the past instrument outcomes x_−:

P(x_+|J_+, x_−) = P(x_+|J_+), (202)

∀ J_+, J_− and ∀ k ∈ T.

Alternatively, the above Markov condition says that a quantum process is non-Markovian iff there exist two past tester outcomes, x_− and x′_−, such that after a causal break at time step k, the conditional future process statistics are different for some future instrument J_+:

P(x_+|J_+, x_−) ≠ P(x_+|J_+, x′_−). (203)

Conversely, if the statistics remain unchanged for all possible past controls, then the process is Markovian. Naturally, the Markov condition of Eq. (202) should ring a bell and remind the reader of the exactly analogous Markov condition in the classical case.

The above quantum Markov condition is fully operational (since it is phrased in terms of conditional probabilities, which can, in principle, be determined experimentally) and thus it is testable with a finite number of experiments [273]. Suppose the conditional independence in Eq. (202) holds for a complete basis of past and future testers A_{x_−} ⊗ A_{x_+}; then, by linearity, it holds for any instruments, and the future is always conditionally independent of the past. It is worth noting that this definition is the quantum generalization of the causal Markov condition for classical stochastic evolutions where interventions are allowed [274].

Additionally, in spirit, the above definition is similar to the satisfaction of the quantum regression formula (QRF) [2, 3, 275]. Indeed, its equivalence to a generalized QRF has been shown in [10], while satisfaction of the generalized QRF has been used in [230, 262] as a basis for the definition of quantum Markovian processes. On the other hand, the relation between the QRF and the witnesses of non-Markovianity we discussed in Sec. IV D has been investigated [203, 276]. Here, we opt for the understanding of Markovianity in terms of conditional future-past independence, an approach fully equivalent to the one taken in the aforementioned works [10, 230, 262].

Figure 27. Determining whether a quantum process is Markovian. Generalized testers (multi-time instruments) A_{x_−} and A_{x_+} are applied to the system during a quantum process, where the subscripts represent the outcomes. The testers are chosen to implement a causal break at a timestep t_k, which ensures that the future outcomes can only depend on the past if the process is non-Markovian. Thus, by checking whether the future depends on the past for a basis of instruments, we can certify whether or not the process is Markovian.

1. Quantum Markov processes

Intuitively, the quantum Markov condition implies that any segment of the process is uncorrelated with the remainder of the process. Put differently, at every time in T_ℓ, a Markovian process is conditionally independent of its past. This right away means that a Markov process must have a rather remarkably simple structure. Translating the idea of conditional independence to the correlations that can persist in Υ_{T_ℓ}, one would expect that Υ_{T_ℓ} cannot contain any correlations between distant times if the underlying process is Markovian. And indeed, the Choi state of the process tensor for a Markov process can be shown to be simply a product state

Υ^{(M)}_{T_ℓ} = ρ_0 ⊗ ⨂_{j=0}^{ℓ−1} Υ^E_{(j+1_−:j_+)}, (204)

where each Υ^E_{(j+1_−:j_+)} is the Choi matrix of a CPTP map from t_{j_+} to t_{j+1_−} and ρ_0 is the initial state of the system. Before commenting on the origin of Eq. (204), let us first comment on the meaning of its structure. The above equation simply says that there are no temporal correlations in the process, other than those between neighboring time steps facilitated by the channel on the system itself, i.e., there is no memory transmitted via the environment. An obvious example of such a process is a closed process, i.e., a unitary process. Here, each Υ^E_{(j+1_−:j_+)} will be maximally entangled (since quantum information is transmitted perfectly by unitary maps) and corresponds to a unitary evolution. However, there are no other memory effects between distant times present, since in a closed process there is no environment that could transport such memory.

By inserting Eq. (204) into Eq. (175), where the action of the process tensor in terms of its Choi state was defined, we see that for a sequence of CP maps A_{x_0} ⊗ ⋯ ⊗ A_{x_ℓ} performed by the experimenter, after rearrangement, we have

P(x_{T_ℓ}|J_{T_ℓ}) = tr[A^T_{x_ℓ} Υ^E_{(ℓ_−:ℓ−1_+)} A^T_{x_{ℓ−1}} ⋯ Υ^E_{(1_−:0_+)} A^T_{x_0} ρ_0], (205)

which simply looks like a concatenation of mutually independent maps that act on the system alone, as one would expect from a Markovian process. This becomes even clearer when we represent the above equation in terms of quantum maps. Then, the action of the corresponding process tensor T^{(M)}_{T_k} on CP maps A_{x_i} can be expressed equivalently to Eq. (205) as

T^{(M)}_{T_k}[A_{x_1}, …, A_{x_k}] = tr[(A_{x_k} ∘ E_{(k_−:k−1_+)} ∘ ⋯ ∘ E_{(2_−:1_+)} ∘ A_{x_1} ∘ E_{(1_−:0_+)})[ρ_0]], (206)

where all E_{(j+1_−:j_+)} (corresponding to Υ^E_{(j+1_−:j_+)}) are mutually independent CPTP maps that act on the system alone, and ρ_0 is the initial system state. While this property of independent CPTP maps – at first sight – seems equivalent to CP divisibility, we emphasize that it is strictly stronger, as the mutual independence of the respective maps has to hold for arbitrary interventions at all times in T_k [183, 237, 238] and is thus – unlike CP divisibility – a genuine multi-time statement.

As an aside, the above form for Markov processes does not mean that we need to do experiments with causal breaks in order to decide Markovianity of a process. We simply need to determine whether the process tensor has any correlations in time, which can also be done using noisy or temporally correlated instruments that do not correspond to causal breaks. We can infer the correlations in a process once we have reconstructed the process tensor. This can be done – as outlined above – by tomography, which only requires applying a linear basis of instruments, causal breaks or not. Causal breaks, however, have the conceptual upside that they make the relation to the classical Markov condition transparent. Additionally, deviations from Markovianity can already be witnessed – and assertions about the size of the memory can be made – even if a full basis of operations is not available to the experimenter [215, 240, 264, 277], but we will not delve into these details here.

Finally, let us comment on how we actually arrived at the product structure of Markovian process tensors, starting from the requirement of conditional independence of Eq. (202). Slightly rewritten in terms of process tensors, conditional independence of the future and the past implies that

tr_−(Υ_{T_ℓ} A^T_{x_−}) ∝ tr_−(Υ_{T_ℓ} A′^T_{x_−}) ∀ A_{x_−}, A′_{x_−}, (207)

where the proportionality sign ∝ is used instead of an equality, since A_{x_−} and A′_{x_−} generally occur with different probabilities (this has no bearing on the conditional future probabilities, though, since they are renormalized by the respective past probabilities, see Eqs. (200) and (201)). Since the above equation has to hold for all conceivable tester elements A_{x_−} and A′_{x_−} (and thus, by linearity, for all matrices), it is easy to see that the corresponding Υ_{T_ℓ} has to be of product form, i.e., Υ_{T_ℓ} = Υ_+ ⊗ Υ_−. Demanding that conditional independence holds for all times in T_ℓ then implies that Υ_{T_ℓ} is indeed of the product form postulated in Eq. (204). More detailed proofs of this statement can, for example, be found in [237–239].
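How the product form forces conditional independence can be illustrated numerically. The sketch below is our own toy construction with arbitrary conventions: `Lam` merely stands in for the (normalized) Choi state of some channel, the past lives on the first tensor factor, and the conditional future object is computed as in Eq. (201).

```python
import numpy as np

rng = np.random.default_rng(7)

def rand_state(d=2):
    """Random density matrix (Hermitian, positive, unit trace)."""
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = A @ A.conj().T
    return rho / np.trace(rho).real

def cond_future(Ups, E, d=2):
    """tr_-(Ups (E^T ⊗ 1)) / normalization: conditional future object after past outcome E."""
    M = (Ups @ np.kron(E.T, np.eye(d))).reshape(d, d, d, d)
    F = np.einsum('afag->fg', M)          # partial trace over the past (first) factor
    return F / np.trace(F)

# Product ('Markovian') two-time Choi state, Eq. (204) for a single step:
rho0, Lam = rand_state(), rand_state()
Ups_markov = np.kron(rho0, Lam)

ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
E0, E1 = np.outer(ket0, ket0), np.outer(ketp, ketp)   # two different past outcomes

# product form: the conditional future is the same for both past outcomes (no memory)
assert np.allclose(cond_future(Ups_markov, E0), cond_future(Ups_markov, E1))

# a correlated (non-product) Choi state generically violates this
Ups_corr = 0.5 * np.kron(E0, E0) + 0.5 * np.kron(E1, E1)
assert not np.allclose(cond_future(Ups_corr, E0), cond_future(Ups_corr, E1))
```

For the product state, tr_−(Υ (A^T_{x_−} ⊗ 𝟙)) = tr(Υ_− A^T_{x_−}) Υ_+, so all dependence on the past outcome cancels upon normalization, exactly as in Eq. (207).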

It is important to emphasize that we did not start out by postulating this product structure of Markovian processes. While tempting – and eventually correct – it would not have been an a priori operationally motivated postulate, but rather one guided by purely mathematical considerations. Here, we rather started from a fully operational quantum definition of Markovianity, phrased entirely in terms of experimentally accessible quantities, and in line with its classical counterpart.

Besides following the same logic as the classical definition of Markovianity, that is, conditional independence of the future and the past, the above notion of Markovianity also explicitly boils down to the classical one in the correct limit: Choosing fixed instruments at each time in T_k yields a probability distribution P(x_k, …, x_1) for the possible combinations of outcomes. Now, if each of the instruments only consists of causal breaks – which is the case in the study of classical processes – then a (quantum) Markovian process yields a joint probability distribution for those instruments that satisfies the classical Markov condition of Eq. (199). The quantum notion of Markovianity thus contains the classical one as a special case. One might go further in the restriction to the classical case, by demanding that the resulting statistics also satisfy the Kolmogorov consistency conditions we discussed earlier. However, on the one hand, there are quantum processes that do not satisfy Kolmogorov consistency conditions, independent of the choice of probing instruments [248]. On the other hand, Markovianity is also a meaningful concept for classical processes with interventions [221], where Kolmogorov conditions are generally not satisfied. Independent of how one aims to restrict to the classical case, the notion of Markovianity we introduced here for the quantum case would simplify to the respective notion of Markovianity in the classical case.

Below, we will use the structure of Markovian processes to construct operationally meaningful measures for non-Markovianity that go beyond the witnesses of non-Markovianity based on two-time correlations we presented in Sec. IV D. We then discuss the concept of quantum Markov order to close this section.

Before doing this, we will briefly discuss how the quantum Markov condition we introduced above relates to the aforementioned witnesses of non-Markovianity we have already encountered. In contrast to the non-Markovianity witnesses discussed in Sec. IV D, the above condition is necessary and sufficient for memorylessness. That is, if it holds, then there is no physical experiment that will see conditional dependence between the past and future processes. If it does not hold, then there exists some experiment that will be able to measure some conditional dependence between the past and the future processes. In fact, a long list of non-Markovianity witnesses, defined in Refs. [154, 155, 159, 167, 184, 189, 196, 199, 227, 278–290], herald the breaking of the quantum Markov condition. However, there are always non-Markovian processes that will not be identified by most of these witnesses. This happens because these witnesses usually only account for (at most) three-time correlations. Many of them are based on the divisibility of the process. A Markov process, which has the form of Eq. (204), will always be CP-divisible, while the converse does not hold true in general [183, 262]. To see this latter point, it suffices to show an example of a CP-divisible process that is non-Markovian.


Figure 28. A CP-divisible but non-Markovian process. A qubit system is prepared in states |x±⟩ and evolves along with an environment in state |ψE⟩ under the unitary e^{−i(g/2)σ3⊗x t}. The uninterrupted dynamics of the system is pure dephasing, which will be certified as Markovian by two-point witnesses. However, when an instrument X = σ1 is applied at time t, the system dynamics reverse and the system returns to its original state, which is only possible in the presence of non-Markovian memory.

2. Examples of divisible non-Markovian processes

A completely positive and divisible process on a single qubit system can be acquired by following the prescription in Refs. [262, 291, 292], where the so-called shallow pocket model was discussed. We begin with the system in an arbitrary state ρ(0) that interacts with an environment whose initial state is a Lorentzian wavefunction

⟨x|ψE⟩ = ψE(x) = √(G/π) 1/(x + iG). (208)

We assume the initial state to be uncorrelated, i.e., of the form ρ(0) ⊗ |ψE⟩⟨ψE|. The two evolve together according to the Hamiltonian H_SE = (g/2) σ3 ⊗ x, where x is the environmental position degree of freedom. The total SE dynamics are then due to the unitary operator

U_t = e^{−i H_SE t}. (209)

It is easy to show, by partial tracing over the environment E, that the reduced dynamics of the system S is pure dephasing in the z-basis (see Eq. (129) in Sec. IV D 3), and can be written exactly in GKSL form, i.e., if the system is not interfered with, the evolution between any two points is a CPTP map of the following form:

ρ(t_j) = E_(t_j − t_i)[ρ(t_i)] with E_(t_j − t_i) = exp[L(t_j − t_i)]. (210)

As we argued above, such a process is completely positive, fully divisible [7, 8, 290], and also has a ‘Markovian’ generator as required by the snapshot method [189]. However, as we will see now, this process is not Markovian, since there are instruments that can detect memory in the process.

Now suppose we start the system in the initial states ρ±(0) := |x±⟩⟨x±|. After some time t, these states will have the form

ρ±(t) := (1/2) ( 1, ±e^{−γt} ; ±e^{−γt}, 1 ) with γ = gG. (211)

It is then easy to see that the trace distance between the two states will monotonically decrease:

D[ρ+(t), ρ−(t)] := (1/2) ‖ρ+(t) − ρ−(t)‖ = e^{−γt}. (212)

This means that the non-Markovianity witness based on non-monotonicity of the trace distance, given in Ref. [184], would call this a Markovian process. This is not surprising, as the process is divisible, which is a stronger witness for non-Markovianity than the trace distance [7, 197]. This process will also be labeled as Markovian by the snapshot approach, as the generator of the dynamics of the system alone will always lead to CP maps. In fact, we have already shown in Sec. IV D that divisibility-based witnesses will not see any non-Markovianity in a pure dephasing process. However, as we have discussed, Markovianity is a multi-time phenomenon that should be decided on conditional independence of events at different points in time.
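As a quick sanity check, the decay of Eqs. (211)–(212) can be reproduced numerically. The following sketch (using numpy, with the dimensionless product γt as its only parameter) is purely illustrative:

```python
import numpy as np

def rho_pm(sign, gt):
    """Dephased |x+-> states of Eq. (211); gt is the dimensionless gamma*t."""
    c = sign * np.exp(-gt)
    return 0.5 * np.array([[1.0, c], [c, 1.0]])

def trace_distance(a, b):
    """D(a, b) = (1/2)||a - b||_1, computed from singular values."""
    return 0.5 * np.sum(np.linalg.svd(a - b, compute_uv=False))

# Eq. (212): the distance decays monotonically as e^{-gamma t}
for gt in [0.0, 0.5, 1.0, 2.0]:
    assert np.isclose(trace_distance(rho_pm(+1, gt), rho_pm(-1, gt)), np.exp(-gt))
```

Any witness built solely on this two-state distance sees a monotonic decay and hence, misleadingly, a memoryless process.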

To take this argument further, let us split the process from 0 → 2t into two segments: 0 → t and t → 2t. If the process is indeed Markovian, then we can treat it identically in each segment, i.e., the dynamical map for both segments will be the same. This fact should be independent of whether or not an instrument was performed at time t; observing a change of the dynamics from t to 2t would thus constitute a memory effect. Now, using this intuition, we show that this process, while divisible, is indeed non-Markovian. However, the usual witnesses fail to detect temporal correlations, as the process only reveals non-Markovianity when an instrument at an intermediate time is applied, see Figure 28.

Suppose we apply a single-element (unitary) instrument J_t = {X[·] := σ1 (·) σ1} at time t. Doing so should not break the Markovianity of the process. Moreover, the process should not change at all, because the states in Eq. (211) commute with σ1. Thus, continuing the process to time 2t should continue to decrease the trace distance monotonically,

D[ρ+(2t), ρ−(2t)] → e^{−2γt}. (213)

Indeed, this is what happens if the instrument X is not applied. However, when the instrument X is applied, the dynamics in the second segment reverses the dephasing. This is most easily seen from the fact that the total system-environment unitary is mapped to its adjoint by X, i.e., U_t† = σ1 U_t σ1. Concretely, we have

ρ(2t) = tr_E[U_t σ1 U_t ρ(0) ⊗ ρ_E U_t† σ1 U_t†]
      = tr_E[σ1 U_t† U_t ρ(0) ⊗ ρ_E U_t† U_t σ1]
      = σ1 ρ(0) σ1. (214)

Above, ρ_E = |ψE⟩⟨ψE|. This calculation shows that the state at time 2t is unitarily equivalent to the initial state of the system, which is in contrast to Eq. (213).
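The algebraic identity underlying Eq. (214), σ1 U_t σ1 = U_t† (a consequence of σ1 σ3 σ1 = −σ3), can also be verified numerically. The small discretized environment below is a toy stand-in for the continuous position degree of freedom, not part of the original model; since H_SE is diagonal in this representation, the matrix exponential is elementwise:

```python
import numpy as np

g, t = 1.3, 0.8
xs = np.linspace(-3.0, 3.0, 8)           # toy discretization of the position x
sz = np.diag([1.0, -1.0])                # sigma_3
sx = np.array([[0.0, 1.0], [1.0, 0.0]])  # sigma_1

H = 0.5 * g * np.kron(sz, np.diag(xs))   # H_SE = (g/2) sigma_3 (tensor) x, diagonal
U = np.diag(np.exp(-1j * t * np.diag(H)))
S = np.kron(sx, np.eye(len(xs)))         # sigma_1 acting on the system alone

# sigma_1 H sigma_1 = -H, hence sigma_1 U_t sigma_1 = U_t^dagger ...
assert np.allclose(S @ U @ S, U.conj().T)
# ... so U_t sigma_1 U_t = sigma_1: the second segment undoes the first,
# and the state at 2t is sigma_1 rho(0) sigma_1, as in Eq. (214)
assert np.allclose(U @ S @ U, S)
```

The second assertion is exactly the statement that interposing σ1 makes the second segment rewind the first.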

There are a few take-away messages here. First, the initial states ρ±(0), which were monotonically moving closer to each other during the first time segment, will begin to move apart monotonically if the CPTP map X is applied on the system at time t. In other words, during the second segment, they are becoming more and more distinguishable. This means that the trace distance grows monotonically for times greater than t (until 2t, that is). Therefore, with the addition of an intermediate instrument, the process is no longer seen to be Markovian. Indeed, if the process were Markovian, then the addition of an intermediate instrument would not break the monotonicity of the trace distance. In other words, this is


breaking a data processing inequality, and therefore the process was non-Markovian from the beginning.

Second, the dynamics in the second segment restore the initial state of the system, which means that the dynamical map in the second segment depends on the initial condition. If the process were Markovian, then the total dynamics would have to have the following form:

E_(2t:0) = E_(2t:t) ∘ X_t ∘ E_(t:0). (215)

However, as we saw above, this is not the same as the total dynamics, which is simply a unitary transformation. Therefore, the process is no longer divisible when an intermediate instrument is applied. Again, if the process were Markovian, adding an intermediate instrument would not break the divisibility of the process, and therefore the process was non-Markovian from the beginning.

Third, the snapshot witness [189] would not be able to attribute CP dynamics to the second segment if the map X is applied at t, and thus it too would conclude that the process is non-Markovian. In fact, it is possible to construct dynamics that look Markovian for arbitrary times and then reveal themselves to be non-Markovian [293].

To be clear, unlike the snapshot method, the process tensor for the whole process will always be completely positive. Let us then write down the process tensor Υ_{2t,t,0} for this process for the three times 2t, t, 0 [248]. To do so, we first notice that the action of the system-environment unitary has the following simple form:

U_t |0 ψE⟩ = |0⟩ e^{−i gt x/2} |ψE⟩ = |0⟩ u_t |ψE⟩,
U_t |1 ψE⟩ = |1⟩ e^{+i gt x/2} |ψE⟩ = |1⟩ u_t† |ψE⟩, (216)

where u_t := exp(−i gt x/2) is a unitary operator on E alone.

Next, to construct the Choi state for this process, we will ‘feed’ half of two maximally entangled states into the process. That is, we prepare two maximally entangled states for the system: |Φ+⟩_0 ∈ H_{S0} ⊗ H_{S′0} and |Φ+⟩_1 ∈ H_{S1} ⊗ H_{S′1}, and let the part S′0 interact with the environment in time segment one and then S′1 in time segment two. Namely, let U_t^(0) ∈ B(H_{S′0} ⊗ H_E) and U_t^(1) ∈ B(H_{S′1} ⊗ H_E) be the interaction unitary matrices for the two segments. We first write down the process tensor for the whole SE, i.e., without the final trace on the environment (see Figure 29):

|Υ^{SE}_{2t,t,0}⟩ = U_t^(1) U_t^(0) |Φ+⟩_1 ⊗ |Φ+⟩_0 ⊗ |ψE⟩ (217)
= |0000⟩ u_{2t}|ψE⟩ + |0011⟩ u_t u_t†|ψE⟩ + |1100⟩ u_t† u_t|ψE⟩ + |1111⟩ (u_t†)²|ψE⟩
= |00 u_{2t}ψE⟩ + |01 ψE⟩ + |10 ψE⟩ + |11 u_{2t}† ψE⟩,

where, for brevity, we have defined |0⟩ := |00⟩ ∈ H_{Si} ⊗ H_{S′i} and |1⟩ := |11⟩ ∈ H_{Si} ⊗ H_{S′i}, with i ∈ {0, 1}.

Combining Eq. (217) with Eq. (216) and tracing over the environment (i.e., Υ_{2t,t,0} = tr_E(|Υ^{SE}_{2t,t,0}⟩⟨Υ^{SE}_{2t,t,0}|)), we get the Choi state of the process in the compressed basis |0⟩ := |00⟩ and |1⟩ := |11⟩:

Υ_{2t,t,0} =
( 1         e^{−γt}   e^{−γt}   e^{−2γt}
  e^{−γt}   1         1         e^{−γt}
  e^{−γt}   1         1         e^{−γt}
  e^{−2γt}  e^{−γt}   e^{−γt}   1       ). (218)

We have used the fact that

tr[u_{2t} |ψE⟩⟨ψE|] = tr[u_{2t}† |ψE⟩⟨ψE|] = e^{−γt}, (219)

where again γ = gG, and we have employed the explicit form of |ψE⟩ provided in Eq. (208). Note that the process tensor is really a 16 × 16 matrix, but we have expressed it in the compressed basis. In other words, all elements of the process that are not of the form |jjll⟩⟨mmnn| vanish.

Looking at the Choi state, it is clear that there are correlations between time steps 0 and 2. This is most easily seen by computing the mutual information. We can think of the process tensor in Eq. (218) as a two-qubit state, where the first qubit represents the spaces S0 S′0 and the second qubit represents S1 S′1 (see Figure 29). Moreover, the S′0 and S′1 spaces are the outputs of the process at times t and 2t, respectively. Computing the mutual information between these spaces thus gives us an idea of whether the process correlates the initial and the final time. If it does, it cannot be of product form, and thus it is not Markovian. For our chosen example, the mutual information between the respective spaces of interest is about 0.35 for large values of γt. Therefore, it does not have the form of Eq. (204) and the process is non-Markovian. This non-Markovianity will also be detectable if causal breaks are applied at t. However, it is not detectable by witnesses of non-Markovianity that are based on CP divisibility only.

For completeness, let us detail how to obtain quantum channels from the more general object Υ_{2t,t,0}. The quantum stochastic matrix from 0 → 2t can be obtained by contracting the process tensor with the instrument at time t:

Υ^{E(A_t)}_{(2t:0)} := tr_{t^i t^o}[Υ_{2t,t,0} A_t^T]. (220)

Note that this cannot be done in the compressed basis, as the instruments live on the S′0 S′1 spaces, i.e., one would have to fully write out the process tensor of Eq. (218), which we leave as an exercise to the reader. Naturally, the channel resulting from Eq. (220) depends on the operation A_t (even in the Markovian case). Applying the identity, i.e., contracting with Φ+_t, which is the Choi state of the identity channel, will give

us – as expected – exactly the Choi state of the dephasing channel:

Υ^{E(I_t)}_{(2t:0)} =
( 1          0  0  e^{−2γt}
  0          0  0  0
  0          0  0  0
  e^{−2γt}   0  0  1       ). (221)

On the other hand, applying the X_t instrument will give us a unitary channel, as we already know from Eq. (214): E^{(X_t)}_{(2t:0)}[ρ] = σ1 ρ σ1.


Figure 29. Choi state for the shallow pocket model. Each wire of the Choi state of the shallow pocket model (defined in Eq. (217)) corresponds to a different time (where t₁⁻ and t₁⁺ are the input and output wires at time t₁, respectively). The intermediate system-environment unitaries U_t are given by Eq. (216). Note that, in contrast to previously depicted Choi states, here the environmental degree of freedom E is not yet traced out.

This example shows that there are non-Markovian effects that can only be detected by interventions. This is not a purely quantum phenomenon; the same can be done in the classical setting, and this is the key distinction between stochastic processes and causal modeling, that is, between theories with and without interventions. Naturally, a similar comparison between other traditional witnesses for non-Markovianity in the quantum case and the results obtained by means of the process tensor can be conducted, too; we point the interested reader to Ref. [294] for a detailed analysis.

B. Measures of non-Markovianity for Multi-time processes

Having encountered the shortcomings of traditional witnesses and measures of non-Markovianity in quantum mechanics, it is natural to construct new, more sensitive ones, based on the process tensor approach. Importantly, we already know what Markovian processes ‘look like’, making the quantification of the deviation of a process from the set of Markovian ones a relatively straightforward endeavour. Concretely, the Choi state Υ of a quantum process translates the correlations between timesteps into spatial correlations. A multi-time process is then described by a many-body density operator. This general description then affords the freedom to use any method for quantifying many-body correlations to quantify non-Markovianity. However, there are some natural candidates, which we discuss below. We do warn the reader that there are infinitely many ways of quantifying non-Markovianity, just as there are infinitely many ways of quantifying entanglement and other correlations. However, there are metrics that are natural for certain operational tasks. We emphasize that, here, we will only provide general memory measures, and will not make a distinction between classical and quantum memory, the latter corresponding to entanglement in the respective splittings of the corresponding Choi matrix [245, 295]. Also, we will only provide a cursory overview of possible ways to quantify non-Markovian effects, and have to refer the reader to the references mentioned in this section for further information. Finally, in what follows, we omit the subscripts on the process tensors, as they will all be understood to be many-time processes.

Figure 30. Three-step process. Graphic provided as reference for the considerations of Sec. VI B 1.

1. Memory bond

We begin by discussing the natural structure of quantum processes. One important feature of the process tensor is that it naturally has the structure of a matrix product operator (MPO) [296, 297], i.e., it can be written as a product of matrices that are ‘contracted’ on certain spaces and contain open indices. While this vague notion of an MPO already reminds us of the action of the link product that we introduced in Sec. V D 2, let us be more concrete and provide such a matrix product operator for a simple three-step process with an initial system-environment state ρ and two system-environment unitary maps U and V. Before we do so, we emphasize that we are not attempting to provide a general introduction to MPOs, but rather to motivate why their usage in the context of process tensors can be very fruitful. Let us denote the involved system spaces from 0^i to 2^i and the involved environment spaces by a, b, c (see Figure 30 for reference). As we have seen in our discussion of the link product, we can write the resulting process tensor Υ in terms of the Choi matrices U and V as

Υ = ρ_{0^i a} ⋆ U_{0^o 1^i ab} ⋆ tr_c V_{1^o 2^i bc}, (222)

where we have added as subscripts the respective spaces each of the matrices is defined on. Using the definition of the link product provided below Eq. (178), and recalling that it amounts to a partial transpose and trace over shared spaces, the above can be written as

Υ = tr_b{ tr_a[ρ_{0^i a} (U_{0^o 1^i ab})^{T_a}] tr_c(V_{1^o 2^i bc})^{T_b} }, (223)

where we have omitted the respective identity matrices. Note that, in the above, the spaces with labels a, b, c are ‘contracted’, while the remaining spaces are untouched, such that Υ ∈ B(H_0^i ⊗ H_0^o ⊗ H_1^i ⊗ H_1^o ⊗ H_2^i). This can be made more concrete by rewriting Eq. (223) as a product of three matrices


(without any trace operations). To this end, let us set

ρ_{0^i a} = Σ_{i_a} ⟨i_a i_a| ρ_{0^i a},
U_{0^o 1^i ab} = Σ_{i_b, i_a} ⟨i_b i_b| (U_{0^o 1^i ab})^{T_a} |i_a i_a⟩,
V_{1^o 2^i bc} = Σ_{i_b} tr_c(V_{1^o 2^i bc})^{T_b} |i_b i_b⟩, (224)

where {|i_x⟩} is an orthogonal basis of H_x. Note that each of the objects above now corresponds to a matrix with different input and output spaces, i.e., we have

ρ_{0^i a}: H_0^i → H_0^i ⊗ H_a^{⊗2},
U_{0^o 1^i ab}: H_0^o ⊗ H_1^i ⊗ H_a^{⊗2} → H_0^o ⊗ H_1^i ⊗ H_b^{⊗2},
V_{1^o 2^i bc}: H_1^o ⊗ H_2^i → H_1^o ⊗ H_2^i ⊗ H_b^{⊗2}. (225)

Basically, the reshapings in Eq. (224) are required so that the trace operations that occur in Eq. (223) are moved into the matrices and Eq. (223) can be expressed as a simple matrix product. Indeed, we have

Υ = ρ_{0^i a} ⋅ U_{0^o 1^i ab} ⋅ V_{1^o 2^i bc}, (226)

which can be seen by direct insertion of the expressions of Eq. (224) into the above equation:

ρ_{0^i a} ⋅ U_{0^o 1^i ab} ⋅ V_{1^o 2^i bc}
= Σ_{i_a, i_b, j_a, j_b} ⟨i_a i_a| ρ_{0^i a} ⟨i_b i_b| (U_{0^o 1^i ab})^{T_a} |j_a j_a⟩ ⋅ tr_c(V_{1^o 2^i bc})^{T_b} |j_b j_b⟩
= Σ_{i_a, i_b} ⟨i_b| [⟨i_a| ρ_{0^i a} (U_{0^o 1^i ab})^{T_a} |i_a⟩ tr_c(V_{1^o 2^i bc})^{T_b}] |i_b⟩
= tr_b{ tr_a[ρ_{0^i a} (U_{0^o 1^i ab})^{T_a}] tr_c(V_{1^o 2^i bc})^{T_b} } = Υ, (227)

where the bras and kets that are contracted with each other belong to the same shared space. At this point, one might wonder why we went through the ordeal of rewriting our process tensor in terms of a product of matrices, above all in light of the fact that any many-body operator can be written as a matrix product operator. To see why this representation is meaningful, let us take a closer look at Eq. (226); each of the matrices that occur has some ‘open indices’, i.e., spaces that only one of the matrices is defined on, while each of them also shares spaces with its respective neighbors (a is shared between ρ and U, b is shared between U and V) that are ‘contracted’ over and do not appear in the resulting Υ. The dimension of these latter degrees of freedom is the bond dimension of the MPO (in our case, it would be max(d_a², d_b²)). Comparing this to the circuit of Figure 30, we see that the bond dimension directly corresponds to the dimension of the environment. Why would we go through the hassle of working

Figure 31. MPO representation of a multi-time process Υ. Under the assumption that all the unitaries in the process are the same – which, for example, holds true when the times are equidistantly spaced and the generating Hamiltonian is time-independent – the resulting process tensor can be written as a product of an initial matrix ρ and matrices U, together with a final partial trace (bent orange line on the right). The degrees of freedom that they share (orange lines) set the bond dimension of the MPO; the remaining open indices correspond to the spaces the resulting MPO lives on. The smallest bond dimension of all possible MPO representations of a given Υ can be taken as a measure of non-Markovianity of Υ.

out an MPO representation of Υ then? The reason is twofold. Firstly, in general, we are not given the circuit representation of Υ, but only Υ itself. Then, finding a matrix product operator representation gives us a good gauge for the size of the required environment. Secondly, even if we were given some dilation of Υ, there might be other representations that require a smaller environment, thus providing a representation that only uses the effective environment required for the propagation of memory. Concretely then, the bond dimension corresponds to the smallest environment ancilla transporting memory that is required in order to reproduce the process tensor at hand. For a Markov process, naturally, the bond dimension is one. While it is not necessarily straightforward to find a representation of an MPO with minimal bond dimension, by employing methods from the field of tensor networks we can compress the bond and give the process an efficient description [298].

Additionally, and more importantly for the numerical analysis of multi-time processes, the theory of MPOs provides a large toolbox for the efficient representation of process tensors. This holds particularly true for processes that are time-translationally invariant, i.e., where each of the matrices that occur in the product is the same (see Figure 31). Since a proper introduction of these techniques would require a tutorial article in its own right (and excellent tutorials on the matter already exist, see, for example, Refs. [299, 300]), we will not delve deeper into the theory of tensor networks and MPOs. Let us emphasize, though, that their explicit use both for the conceptual and numerical description of open system dynamics is a very active field of research [301–304], and the corresponding techniques are particularly well-tailored to tackle multi-time processes within the process tensor framework [263, 265, 267, 305]. As mentioned, though, these techniques go beyond the scope of this tutorial, and here we content ourselves with mentioning that the structure of process tensors allows for a very direct representation in terms of MPOs, which i) allows for the whole machinery developed for MPOs to be used for the efficient description of open system dynamics, and ii) enables the interpretation of


the minimal necessary bond dimension as a measure of non-Markovianity.
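A crude numerical proxy for this idea is the operator-Schmidt rank of a Choi state across a time cut: reshaping the matrix so that both indices of the earlier wire label rows and both indices of the later wire label columns, the number of nonvanishing singular values lower-bounds the required bond. The sketch below is for a two-wire example only and is not a minimal-bond-dimension algorithm:

```python
import numpy as np

def operator_schmidt_rank(rho, tol=1e-10):
    """Rank of a two-qubit operator across the (wire 1)|(wire 2) cut: pair
    both indices of wire 1 as rows and both indices of wire 2 as columns,
    then count singular values above tol."""
    m = rho.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(4, 4)
    s = np.linalg.svd(m, compute_uv=False)
    return int(np.sum(s > tol))

# a product (memoryless) Choi state has operator-Schmidt rank 1 ...
prod = np.kron(np.diag([0.5, 0.5]), np.diag([0.2, 0.8]))
assert operator_schmidt_rank(prod) == 1

# ... while a temporally correlated one, e.g. Eq. (218) at finite gamma*t,
# needs a larger bond
a = np.exp(-1.0)
ups = np.array([[1, a, a, a * a],
                [a, 1, 1, a],
                [a, 1, 1, a],
                [a * a, a, a, 1]]) / 4.0
assert operator_schmidt_rank(ups) > 1
```

For longer processes, finding the genuinely minimal bond requires the tensor-network compression techniques cited above.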

2. Schatten measures

Next in our discussion of measures of non-Markovianity, we make use of the form of Markov processes given in Eq. (204), where we saw that the Choi state of a Markovian process tensor is of tensor product form. We remind the reader that this quantum Markov condition contains the classical Markov condition, and any deviations from it in the structure of Υ imply non-Markovianity. Importantly, this structural property of Markovian processes, and the fact that Υ is – up to normalization – a quantum state, allow for operationally meaningful measures of non-Markovianity. That is, by sampling from a process we can determine whether it has memory and then also quantify this memory. For instance, if we want to distinguish a given non-Markovian process from the set of Markov processes, we can measure the distance to the closest Markov process for a choice of metric, e.g., the Schatten p-norm,

N_p := min_{Υ^(M)} ‖Υ − Υ^(M)‖_p, (228)

where ‖X‖_p^p = tr(|X|^p). Here, we are minimizing the distance for a given quantum process Υ over all Markovian processes Υ^(M), which have the form of Eq. (204). Naturally, this goes to zero if and only if the given process is Markovian. On the other hand, to maximally differentiate between a given process and its closest Markovian process, the natural distance choice is the diamond norm:

N_⋄ ≡ (1/2) min_{Υ^(M)} ‖Υ − Υ^(M)‖_⋄, (229)

where ‖·‖_⋄ is the generalized diamond norm for processes [241, 244], and the somewhat arbitrary prefactor of 1/2 is just added for consistency with the literature. Eq. (229) then gives the optimal probability to discriminate a process from the closest Markovian one in a single shot, given any set of measurements together with an ancilla. The difference between the diamond norm and the Schatten norm is that in the former, we are allowed to use ancillas in the form of quantum memory. This is known to lead to better distinguishability in general.

Schatten norms play a central role in quantum information theory. Therefore, the family of non-Markovianity measures given above will naturally arise in many applications. For instance, the diamond norm is very convenient to work with when studying the statistical properties of quantum stochastic processes [306, 307]. However, while constituting a natural measure, these quantifiers of non-Markovianity have the drawback that they require a minimization over the whole set of Markovian processes, which makes them computationally hard to access. This problem can be remedied by choosing a different metric in Eq. (228).
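Although the minimization in Eq. (228) is hard in general, any particular Markovian candidate gives an upper bound on N_p. The sketch below uses the product of the marginals of the (normalized) shallow pocket Choi state of Eq. (218) as one such candidate; the choice of candidate and the normalization are our own illustrative assumptions:

```python
import numpy as np

def marginals_product(rho):
    """Tensor product of the single-wire marginals of a two-qubit Choi state."""
    r = rho.reshape(2, 2, 2, 2)
    return np.kron(np.einsum('ikjk->ij', r), np.einsum('kikj->ij', r))

def schatten_norm(X, p):
    """Schatten p-norm via singular values, ||X||_p = (sum s_i^p)^{1/p}."""
    s = np.linalg.svd(X, compute_uv=False)
    return np.sum(s ** p) ** (1.0 / p)

a = np.exp(-1.0)                           # gamma*t = 1
ups = np.array([[1, a, a, a * a],
                [a, 1, 1, a],
                [a, 1, 1, a],
                [a * a, a, a, 1]]) / 4.0   # Eq. (218), normalized

# distance to one Markovian candidate upper-bounds the minimum in Eq. (228);
# it does not by itself certify the value of N_1
bound = schatten_norm(ups - marginals_product(ups), p=1)
assert bound > 0.0
```

Evaluating the true minimum would require optimizing over all product-form Υ^(M), which is precisely the hardness alluded to above.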

3. Relative entropy

We could also use any metric or pseudo-metric D that is contractive under CP operations:

N := min_{Υ^(M)_{k:0}} D[Υ_{k:0} ‖ Υ^(M)_{k:0}]. (230)

Here, CP contractive means that D[Φ(X)‖Φ(Y)] ≤ D[X‖Y] for any CP map Φ on the space of generalized Choi states. A metric or pseudo-metric that is not CP contractive may not lead to consistent measures for non-Markovianity since, for example, it could be increased by the presence of an independent ancillary Markov process. Here, the requirement that D is a pseudo-metric means that it satisfies all the properties of a distance except that it may not be symmetric in its arguments. Different pseudo-metrics will then lead to different operational interpretations of the memory. In general, though, they will still be plagued by the minimization problem that appears in Eq. (230).

A very convenient pseudo-metric choice is the quantum relative entropy [38], which we already encountered in Sec. IV D 4 when we discussed quantum data processing inequalities. In order to be able to use the relative entropy, let us assume for the remainder of this section that all the process tensors we use are normalized, i.e., tr Υ = 1. Besides being contractive under CP maps, this pseudo-metric is very convenient because, for any given process Υ, the closest (with respect to the quantum relative entropy) Markovian process is straightforwardly found by discarding the correlations. That is, the process made of the marginals of the given process is the closest Markov process, such that

N_R = D[Υ_{k:0} ‖ Υ_{1⁻:0⁺} ⊗ ⋯ ⊗ Υ_{k⁻:(k−1)⁺}], (231)

where the CPTP maps Υ_{j⁻:(j−1)⁺} are the respective marginals (obtained via partial trace) of Υ_{k:0}. This follows from the well-known fact that, with respect to the quantum relative entropy, the closest product state to a multi-partite quantum state is simply the tensor product of its marginals [308]. Moreover, besides alleviating the minimization problem, this measure has a clear operational interpretation in terms of the probability of confusing the given process for being Markovian [237]:

P_confusion = e^{−n N_R}, (232)

where N_R is the relative entropy between the given process and the product of its marginals. Specifically, this measure quantifies the following: suppose a process in an experiment is non-Markovian, while the employed model for the experiment is Markovian. The above measure is related to the probability of confusing the model with the experiment after n samplings. If N_R is large, then an experimenter will very quickly realize that the hypothesis is false and the model needs updating, i.e., the experimenter is quick to learn that the Markovian model poorly approximates the (highly) non-Markovian process.
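Since the closest Markov process is the product of the marginals, for a two-time example N_R coincides with the mutual information of Υ and is easy to evaluate. The sketch below computes N_R for the normalized shallow pocket Choi state of Eq. (218) at γt = 1 and the resulting confusion probability of Eq. (232); the sample size n = 100 is an arbitrary illustration:

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in nats."""
    lam = np.linalg.eigvalsh(rho)
    return float(-sum(l * np.log(l) for l in lam if l > 1e-12))

def rel_entropy(rho, sigma):
    """D(rho||sigma) = tr[rho log rho] - tr[rho log sigma] in nats;
    sigma is assumed to be full rank."""
    mu, W = np.linalg.eigh(sigma)
    log_sigma = W @ np.diag(np.log(mu)) @ W.conj().T
    return float(np.real(-entropy(rho) - np.trace(rho @ log_sigma)))

a = np.exp(-1.0)                           # gamma*t = 1
ups = np.array([[1, a, a, a * a],
                [a, 1, 1, a],
                [a, 1, 1, a],
                [a * a, a, a, 1]]) / 4.0   # Eq. (218), normalized
r = ups.reshape(2, 2, 2, 2)
marg = np.kron(np.einsum('ikjk->ij', r), np.einsum('kikj->ij', r))

NR = rel_entropy(ups, marg)    # closest Markov process: product of marginals
n = 100
P_confusion = np.exp(-n * NR)  # Eq. (232)
assert NR > 0.0 and P_confusion < 1e-3
```

Already after 100 samples, the probability of mistaking this process for its Markovian marginal model is negligible.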

With this, we conclude our short presentation of measures for non-Markovianity in the multi-time setup. The attentive


reader will have noticed that we have not touched on witnesses of non-Markovianity here, unlike in Sec. IV D, where we discussed (some of the) witnesses for non-Markovianity in the two-time scenario. In principle, such witnesses can be straightforwardly constructed. We have already done so in this tutorial, when we discussed the shallow pocket model and its non-Markovian features. Also, experimentally implementing some causal breaks and checking for conditional independence would provide a witness for non-Markovianity. However, to date, no experimentally used witness for non-Markovianity in the multi-time setting has crystallized, and a systematic construction of memory witnesses that are attuned to experimental requirements is the subject of ongoing research.

More generally, though, ‘simply’ deciding whether a process is Markovian or not seems somewhat blunt for a multi-time process. After all, it is not just of interest whether there are memory effects, but what kinds of memory effects there are. At its core, this latter question is a question of Markov order for quantum processes. We will thus spend the remainder of this tutorial providing a proper definition of Markov order in the quantum case, as well as a non-trivial example to illustrate these considerations.

C. Quantum Markov order

The process tensor allows one to properly define Markovianity for quantum processes. As we have seen in our discussion of the classical case, though, Markovian processes are not the only possibility. Rather, they constitute the set of processes of Markov order 1 (and 0). It is then natural to ask if Markov order is a concept that transfers neatly to the quantum case as well. As we shall see, Markov order is indeed a meaningful concept for quantum processes but turns out to be a more layered phenomenon than in the classical realm. Here, we will only focus on a couple of basic aspects of quantum Markov order. For a more in-depth discussion, see, for example, Refs. [243, 309]. Additionally, while it is possible to phrase results on quantum Markov order in terms of maps, it proves rather cumbersome, which is why the following results will be presented exclusively in terms of Choi states.

Before turning to the quantum case, let us quickly recall (see Sec. III C 4) that for classical processes of Markov order |M| = ℓ, we had

P(F ∣M,H) = P(F ∣M), (233)

which implied that the conditional mutual information H(F:H|M) between the future F and the history H given the memory M vanished. Additionally, for classical processes of finite Markov order, there exists a recovery map R_{M→FM} that acts only on the memory block and allows one to recover P(F, M, H) from P(M, H).

In the quantum case, an equation like Eq. (233) is ill-defined on its own, as the respective probabilities depend on the instruments J_F, J_M, J_H that were used at the respective times to probe the process. With this in mind, we obtain an instrument-dependent definition of finite Markov order in the quantum case [242, 243]:

Quantum Markov order. A process is said to be of quantum Markov order |M| = ℓ with respect to an instrument J_M if, for all possible instruments J_F, J_H, the relation

P(x_F|J_F; x_M, J_M; x_H, J_H) = P(x_F|J_F; x_M, J_M) (234)

is satisfied at all times in T.

Intuitively, this definition of Markov order is the same as the classical one; once the outcomes on the memory block M are known, the future F and the history H are independent of each other. However, here we have to specify what instrument J_M is used to interrogate the process on M. Importantly, demanding that a process has finite Markov order at all times with respect to all instruments J_M is much too strong a requirement, as it can only be satisfied by processes of quantum Markov order 0, i.e., processes where future statistics do not even depend on the previous outcome [37, 242, 243].

While seemingly a quantum trait, this instrument dependence of memory length is already implicitly present in the classical case; there, we generally only consider joint probability distributions that stem from sharp, non-invasive measurements. However, as mentioned above, even in classical physics, active interventions, and, as such, different probing instruments, are possible. This, in turn, makes the standard definition of Markov order for classical processes inherently instrument-dependent, albeit without this being mentioned explicitly. Indeed, there are classical processes that change their Markov order when the employed instruments are changed (see, e.g., Sec. VI of Ref. [243] for a more detailed discussion).

In the quantum case, there is no ‘standard’ instrument, and the corresponding instrument dependence of memory effects is dragged into the limelight. Even the definition of Markovianity, i.e., Markov order 1, that we provided in Sec. VI A is an inherently instrument-dependent one; quantum processes are Markovian if and only if they do not display memory effects with respect to causal breaks. However, this does not exclude memory effects from appearing as soon as other instruments are employed (as these memory effects would be introduced by the instruments and not by the process itself, the instrument-dependent definition of Markovianity still captures all memory that is contained in the process at hand). Just like for the definition of Markovianity, once all process tensors are classical and all instruments consist of classical measurements only, the above definition of Markov order coincides with the classical one [242].

For generality, in what follows, the instruments on M can be temporally correlated, i.e., they can be testers (however, for conciseness, we will call J_F, J_M, and J_H instruments in what follows). While in our above definition of quantum Markov order we fix the instrument J_M on the memory block, we do not fix the instruments on the future and the history, but require Eq. (234) to hold for all J_F and J_H. This then ensures that, if there are any conditional memory effects between future and history for the given instrument on the memory, they will be picked up.

As all possible temporal correlations are contained in the process tensor Υ_FMH that describes the process at hand, vanishing instrument-dependent quantum Markov order has

Page 59: arXiv:2012.01894v2 [quant-ph] 10 May 2021


Figure 32. Quantum Markov order. If a process Υ_FMH has finite Markov order with respect to an instrument/tester J_M = {A_{x_M}} on the memory block, then the application of each of the elements of J_M leaves the process in a tensor product between future F and history H.

structural consequences for Υ_FMH. In particular, let J_M = {A_{x_M}} be the instrument for which Eq. (234) is satisfied, and let J_F = {A_{x_F}} and J_H = {A_{x_H}} be two arbitrary instruments on the future and history. With this, Eq. (234) implies

  tr[Υ_FMH^T (A_{x_F} ⊗ A_{x_M} ⊗ A_{x_H})] / tr[Υ_MH^T (A_{x_M} ⊗ A_{x_H})]
    = Σ_{x_H} tr[Υ_FMH^T (A_{x_F} ⊗ A_{x_M} ⊗ A_{x_H})] / Σ_{x_H} tr[Υ_MH^T (A_{x_M} ⊗ A_{x_H})],   (235)

where the process tensor on MH is Υ_MH = (1/d_{F^o}) tr_F(Υ_FMH) (which, due to the causality constraints, is independent of J_F) and d_{F^o} is the dimension of all spaces labeled by o on the future block F (we already encountered this definition of reduced processes in Sec. V D 5). As relation (235) has to hold for all conceivable instruments J_H and J_F, and all elements of the fixed instrument J_M, it implies that each element A_{x_M} ∈ J_M ‘splits’ the process tensor into two independent parts, i.e.,

  tr_M[Υ_FMH^{T_M} A_{x_M}] = Υ_{F|x_M} ⊗ Υ_{H|x_M}.   (236)

See Figure 32 for a graphical representation. While straightforward, proving the above relation is somewhat tedious, and the reader is referred to Refs. [242, 309], where a detailed derivation can be found. Here, we rather focus on its intuitive content and structure. Most importantly, Eq. (236) implies that, for any element of the fixed instrument J_M, the remaining ‘process tensor’ on future and history does not contain any correlations; put differently, if one knows the outcome on M, the future statistics are fully independent of the past. Conversely, by insertion, it can be seen that any process tensor Υ_FMH that satisfies Eq. (236) for some instrument J_M also satisfies Eq. (234). As an aside, we have already seen this ‘splitting’ of the process tensor due to conditional independence when we discussed Markovian processes. Indeed, the resulting structure of Markovian processes is a particular case of the results for Markov order presented below.
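To make the splitting of Eq. (236) concrete, the following NumPy sketch (our own toy construction; the random states, the single-qubit spaces, and the projective instrument are illustrative assumptions, not taken from the tutorial) builds a process tensor with a classical memory, Υ_FMH = Σ_x Υ_{F|x} ⊗ |x⟩⟨x|_M ⊗ Υ_{H|x}, and checks that contracting the memory with each element of the projective instrument J_M = {|x⟩⟨x|} leaves a product of a future and a history part:

```python
import numpy as np

rng = np.random.default_rng(7)

def rand_state(d=2):
    """A random density matrix (positive, unit trace)."""
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = A @ A.conj().T
    return rho / np.trace(rho)

# Toy 'process tensor' on F ⊗ M ⊗ H with a classical (orthogonal) memory:
# Υ_FMH = Σ_x Υ_{F|x} ⊗ |x⟩⟨x|_M ⊗ Υ_{H|x}.  For the projective instrument
# J_M = {|x⟩⟨x|}, the projectors coincide with their own duals.
UF = [rand_state(), rand_state()]
UH = [rand_state(), rand_state()]
proj = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
U = sum(np.kron(np.kron(UF[x], proj[x]), UH[x]) for x in range(2))

def contract_memory(U, A):
    """tr_M[(1_F ⊗ A ⊗ 1_H) Υ] for qubit F, M, H; A is real-symmetric here,
    so the partial transpose on M in Eq. (236) acts trivially."""
    T = U.reshape(2, 2, 2, 2, 2, 2)      # row (f, m, h) ⊗ column (F, M, H)
    # sum A[M, m] against the memory row index m and memory column index M
    return np.einsum('fmhFMH,Mm->fhFH', T, A).reshape(4, 4)

for x in range(2):
    lhs = contract_memory(U, proj[x])
    rhs = np.kron(UF[x], UH[x])          # Υ_{F|x} ⊗ Υ_{H|x}
    assert np.allclose(lhs, rhs)
```

For sharp, orthogonal instrument elements the partial transpose on M is immaterial, which is why the contraction above omits it; for general complex tester elements the transpose in Eq. (236) matters.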

On the structural side, it can be seen directly that the terms Υ_{F|x_M} in Eq. (236) are proper process tensors, i.e., they are positive and satisfy the causality constraints of Eqs. (183) and (184). Specifically, contracting Υ_FMH with a positive element on the memory block M yields positive elements, and does not alter the satisfaction of the hierarchy of trace conditions on the block F. This fails to hold true in general on the block H. While still positive, the terms Υ_{H|x_M} do not necessarily have to satisfy causality constraints. However, the set {Υ_{H|x_M}} forms a tester, i.e., Σ_{x_M} Υ_{H|x_M} = Υ_H is a process tensor.

Employing Eq. (236), we can derive the most general form of a process tensor Υ_FMH that has finite Markov order with respect to the instrument J_M = {A_{x_M}}_{x_M=1}^{n}. To this end, without loss of generality, let us assume that all n elements of J_M are linearly independent [310]. Then, this set can be completed to a full basis of the space of matrices on the memory block M by means of other tester elements {A_{α_M}}_{α_M=n+1}^{d_M}, where d_M is the dimension of the space spanned by tester elements on the memory block. As these two sets together form a linear basis, there exists a corresponding dual basis, which we denote as

  {Δ_{x_M}}_{x_M=1}^{n} ∪ {Δ_{α_M}}_{α_M=n+1}^{d_M}.   (237)

From this, we obtain the general form of a process tensor Υ_FMH with finite Markov order with respect to the instrument J_M [243]:

  Υ_FMH = Σ_{x_M=1}^{n} Υ_{F|x_M} ⊗ Δ*_{x_M} ⊗ Υ_{H|x_M} + Σ_{α_M=n+1}^{d_M} Υ_{FH|α_M} ⊗ Δ*_{α_M}.   (238)

It can be seen directly (by insertion into Eq. (236)) that the above Υ_FMH indeed yields the correct term Υ_{F|x_M} ⊗ Υ_{H|x_M} for every A_{x_M} ∈ J_M. Using other tester elements, like, for example, A_{α_M}, will however not yield uncorrelated elements on FH (as the terms Υ_{FH|α_M} do not necessarily have to be uncorrelated). This, basically, is just a different way of saying that an informationally incomplete instrument is not sufficient to fully determine the process at hand [240]. Additionally, most elements of the span of J_M will not yield uncorrelated elements either, but rather a linear combination of uncorrelated elements, which is generally correlated.
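The dual set of Eq. (237) can be computed mechanically: stacking the vectorized basis elements as columns of a matrix V, the duals are the columns of V(V†V)⁻¹, which enforces tr(Δ†_i A_j) = δ_ij with respect to the Hilbert–Schmidt inner product (the conjugates Δ*_{x_M} then enter Eq. (238)). A small NumPy sketch; the non-orthogonal qubit basis below is a hypothetical example, not one used in the tutorial:

```python
import numpy as np

# A non-orthogonal but linearly independent operator basis on a qubit
# (hypothetical example elements):
I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])
basis = [I2, I2 + X, I2 + Y, I2 + Z]

# Stack vectorized basis elements as columns of V; the duals are the columns
# of V (V†V)^(-1), since then D†V = (V†V)^(-1) V†V = 1.
V = np.column_stack([A.reshape(-1) for A in basis])
D = V @ np.linalg.inv(V.conj().T @ V)
duals = [D[:, i].reshape(2, 2) for i in range(4)]

# Verify the duality relation tr(Δ_i† A_j) = δ_ij
G = np.array([[np.trace(d.conj().T @ A) for A in basis] for d in duals])
assert np.allclose(G, np.eye(4))
```

For n < d_M linearly independent instrument elements, the same pseudoinverse construction yields duals within their span, matching the completion-to-a-full-basis step described above.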

While remaining a meaningful concept in the quantum domain, quantum Markov order is highly dependent on the choice of instrument J_M, and there exists a whole zoo of processes that show peculiar memory properties for different kinds of instruments – like, for example, processes that only have finite Markov order for unitary instruments, or processes that have finite Markov order with respect to an informationally complete instrument while their conditional mutual information does not vanish [242, 309].

Before providing a detailed example of a process with finite quantum Markov order, let us discuss the aforementioned connection between quantum Markov order and the quantum version of the conditional mutual information. In analogy to the classical case, one can define a quantum CMI (QCMI) for quantum states ρ_FMH shared between parties


F, M, and H as

  S(F∶H|M) = S(F|M) + S(H|M) − S(FH|M),   (239)

where S(A|B) := S(AB) − S(B) and S(A) := −tr[A log(A)] is the von Neumann entropy (see Sec. IV D 4). Quantum states with vanishing QCMI have many appealing properties, like, for example, the fact that they admit a block decomposition [311], as well as a CPTP recovery map W_{M→FM}[ρ_MH] = ρ_FMH that only acts on the block M [312, 313]. Unlike in the classical case, the proof of this latter property is far from trivial and a highly celebrated result. States with vanishing QCMI or, equivalently, states that can be recovered by means of a CPTP map W_{M→FM} are called quantum Markov chains [46, 47, 311, 313–317]. Importantly, for states with approximately vanishing QCMI, the recovery error one makes when employing a map W_{M→FM} can be bounded by a function of the QCMI [46, 47, 316, 317].

As process tensors Υ_FMH are, up to normalization, quantum states, all of the aforementioned results can be used for the study of quantum processes with finite Markov order. However, the relation of quantum processes with finite Markov order and the QCMI of the corresponding process tensor is – unsurprisingly – more layered than in the classical case. We will present some of the peculiar features here without proof to provide a flavor of the fascinating jungle that is memory effects in quantum mechanics (see, for example, Refs. [242, 243, 309] for in-depth discussions).
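Eq. (239) is straightforward to evaluate numerically. The sketch below (our own illustration; the three-qubit examples are assumptions, not taken from the tutorial) computes the QCMI via the equivalent form S(F∶H|M) = S(FM) + S(MH) − S(M) − S(FMH), yielding one bit for a GHZ state and zero for a quantum Markov chain of the form Σ_x p_x ρ_F^x ⊗ |x⟩⟨x|_M ⊗ ρ_H^x:

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy S(ρ) = -tr[ρ log2 ρ] in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]                   # drop numerical zeros
    return float(-np.sum(ev * np.log2(ev)))

def ptrace(rho, keep, dims):
    """Partial trace keeping only the subsystems listed in `keep`."""
    n = len(dims)
    rho = rho.reshape(dims + dims)        # row and column index per subsystem
    for i in reversed([j for j in range(n) if j not in keep]):
        rho = np.trace(rho, axis1=i, axis2=i + rho.ndim // 2)
    d = int(np.prod([dims[i] for i in keep]))
    return rho.reshape(d, d)

def qcmi(rho, dims, F, M, H):
    """S(F:H|M) = S(FM) + S(MH) - S(M) - S(FMH), equivalent to Eq. (239)."""
    S = lambda keep: entropy(ptrace(rho, sorted(keep), dims))
    return S(F + M) + S(M + H) - S(M) - entropy(rho)

dims = (2, 2, 2)                          # qubits F = 0, M = 1, H = 2

# GHZ state: genuinely tripartite correlations, QCMI = 1 bit
ghz = np.zeros(8); ghz[0] = ghz[7] = 1 / np.sqrt(2)
qcmi_ghz = qcmi(np.outer(ghz, ghz), dims, [0], [1], [2])

# Quantum Markov chain ρ = Σ_x p_x ρ_F^x ⊗ |x⟩⟨x|_M ⊗ ρ_H^x: QCMI = 0
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
plus = np.full((2, 2), 0.5)
chain = 0.3 * np.kron(np.kron(P0, P0), np.eye(2) / 2) \
      + 0.7 * np.kron(np.kron(plus, P1), P1)
qcmi_chain = qcmi(chain, dims, [0], [1], [2])
```

Since a process tensor is, up to normalization, a many-body state, the same routine can be applied directly to (normalized) Υ_FMH.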

Let us begin with a positive result. Using the representation of quantum states with vanishing QCMI provided in Ref. [311], for any process tensor Υ_FMH that satisfies S(F∶H|M)_{Υ_FMH} = 0, one can construct an instrument on the memory block M that blocks the memory between H and F. Put differently, vanishing QCMI implies (instrument-dependent) finite quantum Markov order.

However, the converse does not hold. This can already be seen from Eq. (238), where the general form of a process tensor with finite Markov order is provided. The occurrence of the second set of terms Υ_{FH|α_M} ⊗ Δ*_{α_M} implies the existence of a wide range of correlations between H and F that can still persist (but not be picked up by the fixed instrument chosen on M), making it unlikely that the QCMI of such a process tensor actually vanishes. On the other hand, if the instrument J_M is informationally complete, then there is a representation of Υ_FMH that only contains terms of the form Υ_{F|x_M} ⊗ Δ*_{x_M} ⊗ Υ_{H|x_M}, which looks more promising in terms of vanishing QCMI (in principle, such a decomposition can also exist when the respective tester elements are not informationally complete, which is the case for classical stochastic processes). However, when the tester elements A_{x_M} corresponding to the duals Δ_{x_M} do not commute (which, in general, they do not), then, again, the QCMI of Υ_FMH does not vanish [242, 243, 309]. Nonetheless, for any process tensor of the form

  Υ_FMH = Σ_{x_M} Υ_{F|x_M} ⊗ Δ*_{x_M} ⊗ Υ_{H|x_M},   (240)

knowing the outcomes on the memory block (for the instrument J_M = {A_{x_M}}) allows one to reconstruct the full process tensor. Concretely, using

  Υ_MH = tr_F(Υ_FMH)/d_{F^o} := (1/d_{F^o}) Σ_{x_M} c_{x_M} Δ*_{x_M} ⊗ Υ_{H|x_M},   (241)

where we set c_{x_M} = tr(Υ_{F|x_M}) and d_{F^o} is the dimension of all output spaces on the future block, we have

  Υ_FMH = d_{F^o} Σ_{x_M} c_{x_M}^{-1} Υ_{F|x_M} ⊗ Δ*_{x_M} ⊗ tr_M(Υ_MH A_{x_M}^T) =: W_{M→FM}[Υ_MH].   (242)

Here, the map W_{M→FM} appears to play the role of a recovery map. However, as the duals Δ_{x_M} are not necessarily positive [318], W_{M→FM} in the above equation is generally not CPTP. Nonetheless, with this procedure one can construct an ansatz for a quantum process with approximate Markov order, the crucial point being that the difference between the ansatz process and the actual process can be quantified by the relative entropy between the two [244]. Such a construction has applications in taming quantum non-Markovian memory; as we stated earlier, the complexity of a process increases exponentially with the size of the memory. Thus, contracting the memory without loss of precision is highly desirable. With this, we conclude our discussion of the properties of quantum processes with finite Markov order. We now provide an explicit example of such a process.

1. Non-trivial example of quantum Markov order

Let us now consider a process, introduced in Ref. [244] and depicted here in Figure 33, which requires leaving parts of M attached to H and F (as we will see shortly). We label the input/output spaces associated with each time of the process as {H, M, F} = {H^i, H^o, L^i, L^o, R^i, R^o, F^i}, where we have subdivided M into left (L) and right (R) spaces. At each time, the system of interest comprises three qubits, and so each Hilbert space is of the form H_X = H_{X_a} ⊗ H_{X_b} ⊗ H_{X_c}, where X takes values for the times and a, b, c are labels for the three qubits; whenever we refer to an individual qubit, we will label the system appropriately, e.g., L^i_a refers to the a qubit of the system L^i; whenever no such label is specified, we are referring to all three qubits.

The environment first prepares the five-qubit common cause states

  |e_+⟩ = (1/√2)(α |ψ_0, 00⟩ + β |ψ_1, 11⟩) and
  |e_−⟩ = (1/√2)(γ |φ_0, 01⟩ + δ |φ_1, 10⟩).   (243)

Here, we have separated the first register, which is a three-qubit state, from the second, which consists of two qubits, with a comma. The first parts of the states |e_+⟩ and |e_−⟩ are respectively sent to H^i and F^i. The second parts are sent either to L^i or R^i, according to some probability distribution (see Figure 33).

Figure 33. (Quantum) Markov order network. A process with finite quantum Markov order with parts of M kept by H and F. The top panel shows the first process, in which part of the common cause state |e_+⟩ is sent to L^i and part of |e_−⟩ is sent to R^i. The process in the bottom panel has the recipients flipped. The process tensor is depicted in gray, and entanglement between parties is color-coded in green and maroon. The overall process is a probabilistic mixture of both scenarios. Still, the process has finite Markov order because it is possible to differentiate between the scenarios by making a parity measurement on M.

Let the state input at H^o be the first halves of three maximally entangled states ⊗_{x∈{a,b,c}} |Φ^+⟩_{H^o_x H′^o_x}, with |Φ^+⟩ := (1/√2)(|00⟩ + |11⟩); here, the prime denotes systems that are fed into the process, whereas the spaces without a prime refer to systems kept outside of it (these maximally entangled states are fed into the process to construct the resulting Choi matrix). The inputs at L^o and R^o are labeled similarly. In between times H^o and L^i, the process makes use of the second part of the state |e_+⟩ to apply a controlled quantum channel X, which acts on all three qubits a, b, c. Following this, qubits a and b are discarded. The ab qubits input at L^o, as well as all three qubits input at R^o, are sent forward into the process, which applies a joint channel Y on all of these systems, as well as the first part of the state |e_−⟩. Three of the output qubits are sent out to F^i, and the rest are discarded. The c qubit input at L^o is sent to R^i, after being subjected to a channel Z, which interacts with the first part of the common cause state |e_−⟩, i.e., the φ_0, φ_1 register.

Consider the process where |e_+⟩ is sent to H^i and L^i and |e_−⟩ to R^i and F^i. The process tensor for this case is

  Υ^± = Ψ^±_{H^iH^oL^i} ⊗ χ^±_{L^oR^iR^oF^i} with
  Ψ^±_{H^iH^oL^i} = tr_{e_1}[|G^±⟩⟨G^±|] and
  χ^±_{L^oR^iR^oF^i} = tr_{e_2}[|K^±⟩⟨K^±|],   (244)

where

  |G^±⟩ = (1/√2)(α |ψ_0⟩_{H^i} |µ_0⟩_{H^o e_1 L^i_c} |00⟩_{L^i_{ab}} + β |ψ_1⟩_{H^i} |µ_1⟩_{H^o e_1 L^i_c} |11⟩_{L^i_{ab}}),   (245)

with |µ_k⟩_{H^o e_1 L^i_c} = X^k_{H′^o → e_1 L^i_c} |Φ^+⟩^{⊗3}_{H^o H′^o}, and

  |K^±⟩ = (1/√2) Y_{L′^o_{ab} R′^o e → F^i e_2} Z^k_{L′^o_c e → R^i_c e} (γ |01⟩_{R^i_{ab}} |φ_0⟩_e + δ |10⟩_{R^i_{ab}} |φ_1⟩_e) |Φ^+⟩^{⊗3}_{L^o L′^o} |Φ^+⟩^{⊗3}_{R^o R′^o}.   (246)

Next, consider the process where |e_+⟩ is sent to H^i and R^i and |e_−⟩ to L^i and F^i. The process tensor for this scenario is

  Υ^∓ = Ψ^∓_{H^iH^oL^i_cR^i_{ab}} ⊗ χ^∓_{L^i_{ab}L^oR^i_cR^oF^i} with
  Ψ^∓_{H^iH^oL^i_cR^i_{ab}} = tr_{e_1}[|G^∓⟩⟨G^∓|] and
  χ^∓_{L^i_{ab}L^oR^i_cR^oF^i} = tr_{e_2}[|K^∓⟩⟨K^∓|],   (247)

where

  |G^∓⟩ = (1/√2)(α |ψ_0⟩_{H^i} |µ_0⟩_{H^o e_1 L^i_c} |00⟩_{R^i_{ab}} + β |ψ_1⟩_{H^i} |µ_1⟩_{H^o e_1 L^i_c} |11⟩_{R^i_{ab}}),   (248)

with |µ_k⟩_{H^o e_1 L^i_c} = X^k_{H′^o → e_1 L^i_c} |Φ^+⟩^{⊗3}_{H^o H′^o}, and

  |K^∓⟩ = (1/√2) Y_{L′^o_{ab} R′^o e → F^i e_2} Z^k_{L′^o_c e → R^i_c e} (γ |01⟩_{L^i_{ab}} |φ_0⟩_e + δ |10⟩_{L^i_{ab}} |φ_1⟩_e) |Φ^+⟩^{⊗3}_{L^o L′^o} |Φ^+⟩^{⊗3}_{R^o R′^o}.   (249)

In the first case, there is entanglement between H^{io} and L^i, as well as between L^o R^{io} and F^i. In the second case, there is entanglement between H^{io} and L^i_c R^i_{ab}, as well as between L^i_{ab} L^o R^i_c R^o and F^i. The overall process is the average of these two, which will still have entanglement across the same cuts for generic probability distributions with which the common cause states are sent out.

This process has finite Markov order because we can make a parity measurement on the ab parts of L^i and R^i. The parity measurement applies two controlled phases to an ancilla initially prepared in the state |+⟩, with the control registers being qubits a and b. If the two control qubits are in the states |00⟩ or |11⟩, then |+⟩ ↦ |+⟩. However, if the control qubits are in the states |01⟩ or |10⟩, then |+⟩ ↦ |−⟩. By measuring the final ancilla – which ends up in one of two orthogonal states and can thus be perfectly distinguished – we can know which process we have in a given run; in either case, there are no FH correlations. Lastly, note that this process also has vanishing QCMI; this agrees with the analysis in Ref. [243], as the instrument that erases the history comprises only orthogonal projectors.
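For control qubits in a computational-basis state, each controlled phase simply applies Z to the ancilla when its control is 1, so the ancilla ends up as Z^{a+b}|+⟩. A minimal NumPy sketch of this parity readout (our own illustration of the measurement described above, tracking only the ancilla for basis-state controls):

```python
import numpy as np

plus = np.array([1.0, 1.0]) / np.sqrt(2)
minus = np.array([1.0, -1.0]) / np.sqrt(2)
Z = np.diag([1.0, -1.0])

def parity_ancilla(a, b):
    """Ancilla after two controlled phases with control values a, b in {0, 1}:
    it picks up Z^(a+b), i.e., a sign flip only for odd parity."""
    anc = plus
    for control in (a, b):
        if control == 1:
            anc = Z @ anc
    return anc

# Even parity (|00⟩, |11⟩) leaves |+⟩ invariant; odd parity maps it to |−⟩,
# and the two outcomes are orthogonal, hence perfectly distinguishable.
assert np.allclose(parity_ancilla(0, 0), plus)
assert np.allclose(parity_ancilla(1, 1), plus)
assert np.allclose(parity_ancilla(0, 1), minus)
assert np.allclose(parity_ancilla(1, 0), minus)
assert abs(np.dot(plus, minus)) < 1e-12
```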


VII. CONCLUSIONS

We began this tutorial with the basics of classical stochastic processes by means of concrete examples. We then built up to the formal definition of classical stochastic processes. Subsequently, we moved to quantum stochastic processes, covering the early works from half a century ago to modern methods used to differentiate between Markovian and non-Markovian processes in the quantum domain. Our main message throughout has been to show how a formal theory of quantum stochastic processes can be constructed based on ideas akin to those used in the classical domain. The resulting theory is general enough that it contains the theory of classical stochastic processes as a limiting case. On the structural side, we have shown that a quantum stochastic process is described by a many-body density operator (up to a normalization factor). This is a natural generalization of classical processes, which are described by joint probability distributions over random variables in time. Along the way, we have attempted to build intuition for the reader by giving several examples.

In particular, the examples in the last section show that, in general, quantum stochastic processes are as complex as many-body quantum states. However, there is beauty in the simplicity of the framework that encapsulates complex quantum phenomena in an overarching structure. We restricted our discussion to Markov processes and Markov order in the final section, but needless to say, there is much more to explore. Complex processes, in the quantum or classical realm, will have many attributes that are of interest for foundational or technological reasons. We cannot do justice to many (most) of these facets of the theory in this short manuscript. On the plus side, there are a lot of interesting problems left unexplored for current and future researchers. Our tutorial has barely touched the topic of quantum probability and associated techniques such as quantum trajectories, quantum stochastic calculus, and the SLH framework [319]. This is an extremely active area of research [6, 217, 268, 320–322] with many overlaps with the ideas presented here; however, a detailed cross-comparison would form a whole tutorial on its own.

We now bring this article to a close by discussing some important open problems and some important applications of the theory of open quantum systems.

The vastness of the theory of classical stochastic processes suggests that there are many open problems in the quantum realm. In this sense, it is a daunting endeavor to even attempt to make a list of interesting problems, and we make no claims of comprehensiveness. On the foundational front, understanding the quantum-to-classical transition for stochastic processes [248] should be a far more manageable problem than the elusive connection between pure-state unitary quantum mechanics and phase-space classical mechanics. Objectivity of quantum measurements and quantum Darwinism [323] are also enticing topics to reconsider from the process tensor perspective, i.e., as emergent phenomena in time rather than in space. It may also be possible to better understand quantum chaos by analyzing multi-time correlations in quantum processes (e.g., out-of-time-ordered correlators are already used to discuss quantum chaos in a similar vein). The list of complex dynamical phenomena includes dynamical phase transitions [324], dynamical many-body scars [325], measurement-induced phase transitions [326], and understanding memory in complex quantum processes [306]. For all of these areas, higher-order quantum maps like the process tensor provide an ideal framework to foster future developments. For practical implementations, quantifying and witnessing entanglement in time (i.e., genuinely non-classical temporal correlations) is of utmost importance for complex experimental setups that aim to exploit quantum phenomena in time. On the mathematical side, there are many interesting problems, such as embedding coherent dynamics in classical processes, simulating classical processes on quantum devices [191], or identifying processes that cannot be classical. Finally, there is still much work to be done approximating quantum processes with ansatz-type considerations. For instance, what are the best ways (in the sense of minimal error) to truncate quantum memories, and how does one quantify contextual errors due to the finite pulse width of control operations, i.e., how does one deal with experimental operations that cannot be considered to be implemented instantaneously, as we did throughout this tutorial? The process tensor also opens up – as we already mentioned – the whole toolkit of tensor networks for characterizing, simulating, and manipulating complex quantum processes and multi-time statistics in quantum mechanics. In particular, little attention has been devoted so far to understanding critical quantum processes from a multi-time perspective, i.e., to efficiently describing processes where the correlations decay slowly (as a power law).

On the application side, the foremost application of the theory of quantum stochastic processes is quantum control, e.g., dynamical decoupling [327–330] (and understanding processes that cannot be decoupled [331]), decoherence-free subspaces [332, 333], quantum error correction [334], and the quantum Zeno effect [335–337]. All of these are dynamical phenomena, and it remains to be seen how they fit into the theory described in this tutorial. Moreover, small quantum computers are now readily available, but they suffer from complex noise, i.e., they undergo complex non-Markovian stochastic processes. This forms a fertile ground for the process tensor framework [215] to provide new conceptual insights. Additionally, there are natural systems that would also be excellent candidates for an application of the process tensor framework, for instance, the control of biological systems [338]. They are interesting because it is possible that these systems harness complex noise to achieve efficient quantum information processing tasks [339–343]. Already, and even more so in the future, these tools (will) enable quantum technologies in the presence of non-Markovian noise [344–346]. As we attempt to engineer more and more sophisticated quantum devices, we will need more sophistication in accounting for the noise due to the environment. These applications will be within reach once we can characterize the noise [347–354] and understand how quantum processes and memory effects can serve as resources [355–360].

There are also foundational applications of the frameworks discussed above. For instance, better understanding how the theory of thermodynamics fits with the theory of quantum mechanics requires better handling of interventions and


memory, and there is already progress on this front [361–364]. This framework also allows for a method to build a classical-quantum correspondence, i.e., determining quantum stochastic processes that look classical [247, 248]. Furthermore, it enables one to understand the statistical nature of quantum processes, i.e., when is the memory too complex [306, 307, 365, 366], or when does a system look as if it has equilibrated [367, 368]? These latter questions are closely related to ones aiming to derive statistical mechanics from quantum mechanics [369–372]. In general, non-Markovian effects in many-body systems [373–375] and complex single-body systems [376, 377] will be of keen interest, as they will contain rich physics.

Finally, the tools introduced in this article are closely related to those used to examine the role of causal order – or the absence thereof – in quantum mechanics. As they are tailored to account for active interventions, they are used in the field of quantum causal modeling [239, 259, 271, 378, 379] to discern causal relations in quantum processes. Beyond such causally ordered situations, the quantum comb and process matrix frameworks have been employed to explore quantum mechanics in the absence of global causal order [257, 380], and it has been shown that such processes would provide advantages in information processing tasks over causally ordered ones [380–385]. The existence of such exotic processes is still under debate, and the search for additional principles to limit the set of ‘allowed’ causally disordered processes is an active field of research [386]. Nonetheless, the tools to describe them are – both mathematically and in spirit – akin to the process tensors we introduced for the description of open quantum processes, demonstrating the versatility and wide applicability of the ideas and concepts employed in this tutorial.

ACKNOWLEDGMENTS

We thank Felix A. Pollock and Philip Taranto for valuable discussions, and Heinz-Peter Breuer, Daniel Burgarth, Fabio Costa, Michael Hall, Susana Huelga, Jyrki Piilo, Martin Plenio, Angel Rivas, Andrea Smirne, Phillipp Strasberg, Bassano Vacchini, Howard Wiseman, and Karol Życzkowski for helpful comments on the original manuscript. SM acknowledges funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 801110, and the Austrian Federal Ministry of Education, Science and Research (BMBWF). KM is supported through Australian Research Council Future Fellowship FT160100073.

[1] R. Alicki and K. Lendi, Quantum Dynamical Semi-Groups and Applications (Springer, Berlin, 1987).
[2] C. Gardiner and P. Zoller, Quantum Noise (Springer, 1991).
[3] H.-P. Breuer and F. Petruccione, The Theory of Open Quantum Systems (Oxford Univ. Press, 2002).
[4] L. Accardi, Y. G. Lu, and I. Volovich, Quantum Theory and Its Stochastic Limit (Springer-Verlag, Berlin Heidelberg, 2002).

[5] H. M. Wiseman and G. J. Milburn, Quantum Measurement and Control (Cambridge Univ. Press, 2010).
[6] L. Bouten, R. van Handel, and M. James, SIAM J. Control Optim. 46, 2199 (2007).
[7] A. Rivas, S. F. Huelga, and M. B. Plenio, Rep. Prog. Phys. 77, 094001 (2014).
[8] H.-P. Breuer, E.-M. Laine, J. Piilo, and B. Vacchini, Rev. Mod. Phys. 88, 021002 (2016).
[9] I. de Vega and D. Alonso, Rev. Mod. Phys. 89, 015001 (2017).
[10] L. Li, M. J. Hall, and H. M. Wiseman, Phys. Rep. 759, 1 (2018).
[11] C. Li, G. Guo, and J. Piilo, EPL 128, 30001 (2020).
[12] C. Li, G. Guo, and J. Piilo, EPL 127, 50001 (2019).
[13] The sum of the opposite ends of a die always equals 7.
[14] N. G. van Kampen, Braz. J. Phys. 28, 90 (1998).
[15] A. W. Marshall, I. Olkin, and B. C. Arnold, Inequalities: Theory of Majorization and Its Applications, 2nd ed., Springer Series in Statistics (Springer-Verlag, New York, 2011).

[16] J. P. Crutchfield and K. Young, Phys. Rev. Lett. 63, 105 (1989).
[17] C. R. Shalizi and J. P. Crutchfield, J. Stat. Phys. 104, 817 (2001).
[18] J. P. Crutchfield, Nat. Phys. 8, 17 (2012).
[19] T. Tao, An Introduction to Measure Theory (American Mathematical Society, 2011).

[20] D. S. Lemons, An Introduction to Stochastic Processes in Physics (Johns Hopkins University Press, Baltimore, 2002).
[21] A. N. Kolmogorov, Grundbegriffe der Wahrscheinlichkeitsrechnung (Springer, Berlin, 1933) [Foundations of the Theory of Probability (Chelsea, New York, 1956)].
[22] W. Feller, An Introduction to Probability Theory and Its Applications (Wiley, New York, 1971).
[23] Besides consistency, the individual probability distributions also have to be inner regular. We will not concern ourselves with this technicality; see Ref. [19] for more details.
[24] N. Wiener, A. Siegel, B. Rankin, and W. T. Martin, eds., Differential Space, Quantum Systems, and Prediction (The MIT Press, Cambridge (MA), 1966).
[25] M. P. Lévy, Am. J. Math. 62, 487 (1940).
[26] Z. Ciesielski, Lectures on Brownian Motion, Heat Conduction and Potential Theory (Aarhus Universitet, Mathematisk Institutt, 1966).
[27] R. Bhattacharya and E. C. Waymire, A Basic Course in Probability Theory (Springer, New York, NY, 2017).
[28] N. Van Kampen, Stochastic Processes in Physics and Chemistry (Elsevier, New York, 2011).

[29] P. Hänggi and H. Thomas, Phys. Rep. 88, 207 (1982).
[30] Some authors will not call this a master equation due to its temporal non-locality [28].
[31] A. Smirne and B. Vacchini, Phys. Rev. A 82, 022110 (2010).
[32] B. Vacchini, A. Smirne, E.-M. Laine, J. Piilo, and H.-P. Breuer, New J. Phys. 13, 093004 (2011).
[33] J. Cerrillo and J. Cao, Phys. Rev. Lett. 112, 110401 (2014).
[34] R. Rosenbach, J. Cerrillo, S. F. Huelga, J. Cao, and M. B. Plenio, New J. Phys. 18, 023035 (2016).
[35] F. A. Pollock and K. Modi, Quantum 2, 76 (2018).
[36] Due to this inequivalence of divisibility and Markovianity, the maps Γ_{t∶s} in Eq. (38) cannot always be considered as matrices containing conditional probabilities P(R_t|R_s) – as these conditional probabilities might depend on prior measurement outcomes – but rather as mappings from a probability distribution at time s to a probability distribution at time t [29]. This breakdown of interpretation also occurs in quantum mechanics [183]. In the Markovian case, Γ_{t∶s} indeed contains conditional probabilities.

[37] M. Capela, L. C. Céleri, K. Modi, and R. Chaves, Phys. Rev. Research 2, 013350 (2020).
[38] V. Vedral, Rev. Mod. Phys. 74, 197 (2002).
[39] P. Strasberg and M. Esposito, Phys. Rev. E 99, 012120 (2019).
[40] M. Polettini and M. Esposito, Phys. Rev. E 88, 012112 (2013).
[41] A. Rivas, Phys. Rev. Lett. 124, 160601 (2020).
[42] R. W. Yeung, A First Course in Information Theory (Information Technology: Transmission, Processing and Storage) (Springer, 2002).
[43] R. W. Yeung, Information Theory and Network Coding (Information Technology: Transmission, Processing and Storage) (Springer, 2008).
[44] D. Janzing, D. Balduzzi, M. Grosse-Wentrup, and B. Schölkopf, Ann. Stat. 41, 2324 (2013).
[45] R. Chaves, L. Luft, and D. Gross, New J. Phys. 16, 043001 (2014).
[46] O. Fawzi and R. Renner, Commun. Math. Phys. 340, 575 (2015).
[47] D. Sutter, O. Fawzi, and R. Renner, Proc. Royal Soc. A 472, 2186 (2016).

[48] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, 2000).
[49] I. Bengtsson and K. Życzkowski, Geometry of Quantum States: An Introduction to Quantum Entanglement (Cambridge University Press, 2007).
[50] M. M. Wilde, Quantum Information Theory (Cambridge University Press, 2013).
[51] W. Pauli, in Festschrift zum 60. Geburtstage A. Sommerfeld (Hirzel, Leipzig, 1928) p. 30.
[52] E. T. Jaynes, Phys. Rev. 106, 620 (1957).
[53] S. Nakajima, Progr. Theo. Phys. 20, 948 (1958).
[54] R. Zwanzig, J. Chem. Phys. 33, 1338 (1960).
[55] W. E. Lamb, Phys. Rev. 134, A1429 (1964).
[56] W. Weidlich and F. Haake, Z. Physik 185, 30 (1965).
[57] W. Weidlich, H. Risken, and H. Haken, Z. Physik 201, 396 (1967).
[58] W. Weidlich, H. Risken, and H. Haken, Z. Physik 204, 223 (1967).
[59] V. Gorini, A. Kossakowski, and E. C. G. Sudarshan, J. Math. Phys. 17, 821 (1976).
[60] G. Lindblad, Commun. Math. Phys. 48, 119 (1976).
[61] E. Sudarshan, P. Mathews, and J. Rau, Phys. Rev. 121, 920 (1961).
[62] T. F. Jordan and E. C. G. Sudarshan, J. Math. Phys. 2, 772 (1961).
[63] K. Kraus, Ann. Phys. 64, 311 (1971).
[64] E. Schrödinger, Math. Proc. Cambridge Philos. Soc. 32, 446 (1936).
[65] N. Gisin, Phys. Rev. Lett. 52, 1657 (1984).
[66] L. P. Hughston, R. Jozsa, and W. K. Wootters, Phys. Lett. A 183, 14 (1993).
[67] A. Gleason, J. Math. Mech. 6, 885 (1957).
[68] P. Busch, Phys. Rev. Lett. 91, 120403 (2003).
[69] B. d’Espagnat, in Preludes in Theoretical Physics (North Holland Wiley, 1966), Chap. “An elementary note about ‘mixtures’”.

[70] We denote operator (and later superoperator) basis elements with hats.
[71] K. Modi, C. A. Rodríguez-Rosario, and A. Aspuru-Guzik, Phys. Rev. A 86, 064102 (2012).
[72] S. Milz, F. A. Pollock, and K. Modi, Open Syst. Inf. Dyn. 24, 1740016 (2017).
[73] I. L. Chuang and M. A. Nielsen, J. Mod. Opt. 44, 2455 (1997).
[74] C. Ferrie, Phys. Rev. Lett. 113, 190404 (2014).
[75] A. Kalev, R. L. Kosut, and I. H. Deutsch, npj Quantum Inf. 1, 1 (2015).
[76] It is easy to construct informationally complete POVMs; adding the symmetric part is hard. For our purposes, IC will be sufficient.
[77] M. Neumark, Izv. Akad. Nauk SSSR Ser. Mat. 4, 53 (1940).
[78] A. Peres, Quantum Theory (Kluwer Academic, 1995).
[79] V. Paulsen, Completely Bounded Maps and Operator Algebras (Cambridge University Press, 2003).
[80] G. D’Ariano, L. Maccone, and M. Paris, Phys. Lett. A 276, 25 (2000).
[81] T. F. Havel, J. Math. Phys. 44, 534 (2003).
[82] A. Gilchrist, D. R. Terno, and C. J. Wood, arXiv:0911.2539 (2009).
[83] One should think of the map as an abstract object, to be distinguished from its representation; hence we write E versus E.
[84] J. Eisert and M. M. Wolf, in Quantum Information with Continuous Variables of Atoms and Light (Imperial College Press, London, 2007) pp. 23–42.

[85] C. Weedbrook, S. Pirandola, R. García-Patrón, N. J. Cerf, T. C. Ralph, J. H. Shapiro, and S. Lloyd, Rev. Mod. Phys. 84, 621 (2012).
[86] J. F. Poyatos, J. I. Cirac, and P. Zoller, Phys. Rev. Lett. 78, 390 (1997).
[87] M. Ringbauer, in Exploring Quantum Foundations with Single Photons, Springer Theses (Springer, Cham, 2017).
[88] C. J. Wood, Initialization and Characterization of Open Quantum Systems, Ph.D. thesis, UWSpace (2015).
[89] O. Gühne and G. Tóth, Phys. Rep. 474, 1 (2009).
[90] N. Friis, G. Vitagliano, M. Malik, and M. Huber, Nat. Rev. Phys. 1, 72 (2019).
[91] P. Pechukas, Phys. Rev. Lett. 73, 1060 (1994).
[92] R. Alicki, Phys. Rev. Lett. 75, 3020 (1995).
[93] P. Pechukas, Phys. Rev. Lett. 75, 3021 (1995).
[94] T. F. Jordan, A. Shaji, and E. C. G. Sudarshan, Phys. Rev. A 70, 052110 (2004).
[95] P. Štelmachovič and V. Bužek, Phys. Rev. A 64, 062106 (2001).
[96] K. Życzkowski and I. Bengtsson, Open Syst. Inf. Dyn. 11, 3 (2004).
[97] K. Kraus, States, Effects, and Operations: Fundamental Notions of Quantum Theory (Springer, Berlin, Heidelberg, 1983).
[98] C. B. Mendl and M. M. Wolf, Commun. Math. Phys. 289, 1057 (2009).
[99] G. M. D’Ariano and P. Lo Presti, Phys. Rev. Lett. 91, 047902 (2003).
[100] J. d. Pillis, Pacific J. Math. 23, 129 (1967).
[101] A. Jamiołkowski, Rep. Math. Phys. 3, 275 (1972).
[102] M. D. Choi, Linear Algebra Appl. 10, 285 (1975).
[103] F. Verstraete and H. Verschelde, arXiv:0202124 (2002).
[104] C. J. Wood, J. D. Biamonte, and D. G. Cory, Quant. Inf. Comp. 15, 759 (2015).
[105] J. B. Altepeter, D. Branning, E. Jeffrey, T. C. Wei, P. G. Kwiat, R. T. Thew, J. L. O’Brien, M. A. Nielsen, and A. G. White, Phys. Rev. Lett. 90, 193601 (2003).

[106] G. M. D’Ariano and P. Lo Presti, Phys. Rev. Lett. 86, 4195

Page 65: arXiv:2012.01894v2 [quant-ph] 10 May 2021

65

(2001).[107] Note that there always exists a minimal number of Kraus op-

erators, and, as such, a minimal environment dimension thatallows one to dilate the map E . Any map E can be dilated in aspace of dimension dE ≤ d

2S .

[108] F. Buscemi, G. M. D’Ariano, and M. F. Sacchi, Phys. Rev. A68, 042113 (2003).

[109] W. F. Stinespring, Proc. Amer. Math. Soc. 6, 211 (1955).[110] L. Hardy, arXiv:0101012 (2001).[111] G. M. D’Ariano, G. Chiribella, and P. Perinotti, Quantum

Theory from First Principles: An Informational Approach, 1sted. (Cambridge University Press, Cambridge, United King-dom ; New York, NY, 2017).

[112] This is a pedagogical statement, not a historical one.
[113] L. D. Landau, Z. Physik 45, 430 (1927).
[114] L. Diósi and W. T. Strunz, Phys. Lett. A 235, 569 (1997).
[115] L. Diósi, N. Gisin, and W. T. Strunz, Phys. Rev. A 58, 1699 (1998).
[116] W. T. Strunz, L. Diósi, and N. Gisin, Phys. Rev. Lett. 82, 1801 (1999).
[117] T. Yu, L. Diósi, N. Gisin, and W. T. Strunz, Phys. Lett. A 265, 331 (2000).
[118] H. M. Wiseman and L. Diósi, Chem. Phys. 268, 91 (2001).
[119] H.-P. Breuer and J. Piilo, EPL 85, 50004 (2009).
[120] H.-P. Breuer and B. Vacchini, Phys. Rev. E 79, 041147 (2009).
[121] H.-P. Breuer, D. Burgarth, and F. Petruccione, Phys. Rev. B 70, 045323 (2004).
[122] J. Gambetta and H. M. Wiseman, Phys. Rev. A 68, 062104 (2003).
[123] H. M. Wiseman and J. M. Gambetta, Phys. Rev. Lett. 101, 140401 (2008).
[124] J. Gambetta and H. M. Wiseman, Phys. Rev. A 66, 012108 (2002).
[125] H. Wichterich, M. J. Henrich, H.-P. Breuer, J. Gemmer, and M. Michel, Phys. Rev. E 76, 031115 (2007).
[126] J. Cerrillo, M. Buser, and T. Brandes, Phys. Rev. B 94, 214308 (2016).
[127] R. Dann, A. Levy, and R. Kosloff, Phys. Rev. A 98, 052129 (2018).
[128] D. Chruściński and A. Kossakowski, Phys. Rev. Lett. 104, 070406 (2010).
[129] S. N. Filippov and D. Chruściński, Phys. Rev. A 98, 022123 (2018).
[130] G. Cohen and E. Rabani, Phys. Rev. B 84, 075150 (2011).
[131] C. Sutherland, T. A. Brun, and D. A. Lidar, Phys. Rev. A 98, 042119 (2018).
[132] A. Shabani and D. A. Lidar, Phys. Rev. A 71, 020101 (2005).
[133] J. Piilo, S. Maniscalco, K. Härkönen, and K.-A. Suominen, Phys. Rev. Lett. 100, 180402 (2008).
[134] B. Vacchini, Sci. Rep. 10, 5592 (2020).
[135] A. Smirne, M. Caiaffa, and J. Piilo, Phys. Rev. Lett. 124, 190402 (2020).
[136] N. Megier, A. Smirne, and B. Vacchini, New J. Phys. 22, 083011 (2020).
[137] N. Megier, A. Smirne, and B. Vacchini, Entropy 22, 796 (2020).
[138] B.-H. Liu, L. Li, Y.-F. Huang, C.-F. Li, G.-C. Guo, E.-M. Laine, H.-P. Breuer, and J. Piilo, Nat. Phys. 7, 931 (2011).
[139] N. Garrido, T. Gorin, and C. Pineda, Phys. Rev. A 93, 012113 (2016).
[140] Some authors would not call this a master equation and refer to it as a memory-kernel equation. We are being a bit liberal here.
[141] In the classical case it was an operator acting on a vector. Here, it is called a superoperator because it acts on ρ, which is an operator on the Hilbert space.
[142] D. Chruściński and S. Pascazio, Open Sys. & Inf. Dyn. 24, 1740001 (2017).
[143] D. Manzano, AIP Advances 10, 025106 (2020).
[144] Around the same time, Franke proposed a very similar equation, though he seems to have been unaware of complete positivity at that time [?].
[145] H.-P. Breuer, B. Kappler, and F. Petruccione, Phys. Rev. A 59, 1633 (1999).
[146] E. Andersson, J. D. Cresser, and M. J. W. Hall, J. Mod. Opt. 54, 1695 (2007).
[147] E.-M. Laine, K. Luoma, and J. Piilo, J. Phys. B 45, 154004 (2012).
[148] M. J. W. Hall, J. D. Cresser, L. Li, and E. Andersson, Phys. Rev. A 89, 042120 (2014).
[149] C. A. Rodríguez, The theory of non-Markovian open quantum systems, Ph.D. thesis, The University of Texas, Austin (2008).
[150] A. Rivas, A. D. K. Plato, S. F. Huelga, and M. B. Plenio, New J. Phys. 12, 113032 (2010).
[151] R. Hartmann and W. T. Strunz, Phys. Rev. A 101, 012103 (2020).
[152] This fact notwithstanding, many master equations can be derived as limits of discrete-time collision models [252, ?], both in the Markovian [?] and the non-Markovian [182, ?] case.
[153] J. Park and W. Band, Found. Phys. 22, 657 (1992).
[154] E.-M. Laine, J. Piilo, and H.-P. Breuer, EPL 92, 60010 (2010).
[155] M. Gessner and H.-P. Breuer, Phys. Rev. Lett. 107, 180402 (2011).
[156] A. Smirne, D. Brivio, S. Cialdi, B. Vacchini, and M. G. A. Paris, Phys. Rev. A 84, 032112 (2011).
[157] M. Gessner and H.-P. Breuer, Phys. Rev. A 87, 042107 (2013).
[158] A. Smirne, S. Cialdi, G. Anelli, M. G. A. Paris, and B. Vacchini, Phys. Rev. A 88, 012108 (2013).
[159] M. Gessner, M. Ramm, T. Pruttivarasin, A. Buchleitner, H.-P. Breuer, and H. Häffner, Nat. Phys. 10, 105 (2014).
[160] Y. S. Weinstein, T. F. Havel, J. Emerson, N. Boulant, M. Saraceno, S. Lloyd, and D. G. Cory, J. Chem. Phys. 121, 6117 (2004).
[161] S. H. Myrskog, J. K. Fox, M. W. Mitchell, and A. M. Steinberg, Phys. Rev. A 72, 013615 (2005).
[162] J. L. O’Brien, G. J. Pryde, A. Gilchrist, D. F. V. James, N. K. Langford, T. C. Ralph, and A. G. White, Phys. Rev. Lett. 93, 080502 (2004).
[163] A. Shaji and E. C. G. Sudarshan, Phys. Lett. A 341, 48 (2005).
[164] H. A. Carteret, D. R. Terno, and K. Życzkowski, Phys. Rev. A 77, 042113 (2008).
[165] M. Ziman, M. Plesch, V. Bužek, and P. Štelmachovič, Phys. Rev. A 72, 022106 (2005).
[166] M. Ziman, arXiv:0603166 (2006).
[167] C. A. Rodríguez-Rosario, K. Modi, A.-m. Kuah, A. Shaji, and E. Sudarshan, J. Phys. A 41, 205301 (2008).
[168] A. Shabani and D. A. Lidar, Phys. Rev. Lett. 102, 100402 (2009).
[169] K. Modi and E. C. G. Sudarshan, Phys. Rev. A 81, 052119 (2010).
[170] A. Brodutch, A. Datta, K. Modi, A. Rivas, and C. A. Rodríguez-Rosario, Phys. Rev. A 87, 042301 (2013).
[171] B. Vacchini and G. Amato, Sci. Rep. 6, 37328 (2016).
[172] K. Modi, Open Sys. & Info. Dyn. 18, 253 (2011).
[173] J. Bausch and T. Cubitt, Linear Algebra Appl., 64 (2016).
[174] F. Buscemi, Phys. Rev. Lett. 113, 140502 (2014).
[175] F. Buscemi and N. Datta, Phys. Rev. A 93, 012101 (2016).
[176] S. N. Filippov, J. Piilo, S. Maniscalco, and M. Ziman, Phys. Rev. A 96, 032111 (2017).
[177] J. Bae and D. Chruściński, Phys. Rev. Lett. 117, 050403 (2016).
[178] D. Chruściński, A. Rivas, and E. Størmer, Phys. Rev. Lett. 121, 080407 (2018).
[179] F. Benatti, D. Chruściński, and S. Filippov, Phys. Rev. A 95, 012112 (2017).
[180] S. Chakraborty and D. Chruściński, Phys. Rev. A 99, 042105 (2019).
[181] F. A. Wudarski and D. Chruściński, Phys. Rev. A 93, 042120 (2016).
[182] T. Rybár, S. Filippov, M. Ziman, and V. Bužek, J. Phys. B 45, 154006 (2012).
[183] S. Milz, M. S. Kim, F. A. Pollock, and K. Modi, Phys. Rev. Lett. 123, 040401 (2019).
[184] H.-P. Breuer, E.-M. Laine, and J. Piilo, Phys. Rev. Lett. 103, 210401 (2009).
[185] M. M. Wolf and J. I. Cirac, Commun. Math. Phys. 279, 147 (2008).
[186] D. Chruściński and K. Siudzińska, Phys. Rev. A 94, 022118 (2016).
[187] D. Davalos, M. Ziman, and C. Pineda, Quantum 3, 144 (2019).
[188] D. Chruściński and F. Mukhamedov, Phys. Rev. A 100, 052120 (2019).
[189] M. M. Wolf, J. Eisert, T. S. Cubitt, and J. I. Cirac, Phys. Rev. Lett. 101, 150402 (2008).
[190] Z. Puchała, Ł. Rudnicki, and K. Życzkowski, Phys. Lett. A 383, 2376 (2019).
[191] K. Korzekwa and M. Lostaglio, arXiv:2005.02403 (2020).
[192] F. Shahbeigi, D. Amaro-Alcalá, Z. Puchała, and K. Życzkowski, arXiv:2003.12184 (2020).
[193] G. Lindblad, Commun. Math. Phys. 39, 111 (1974).
[194] G. Lindblad, Commun. Math. Phys. 40, 147 (1975).
[195] E.-M. Laine, J. Piilo, and H.-P. Breuer, Phys. Rev. A 81, 062115 (2010).
[196] A. Rivas, S. F. Huelga, and M. B. Plenio, Phys. Rev. Lett. 105, 050403 (2010).
[197] D. Chruściński, A. Kossakowski, and A. Rivas, Phys. Rev. A 83, 052128 (2011).
[198] C. Addis, B. Bylicka, D. Chruściński, and S. Maniscalco, Phys. Rev. A 90, 052103 (2014).
[199] B. Bylicka, M. Johansson, and A. Acín, Phys. Rev. Lett. 118, 120501 (2017).
[200] N. Megier, D. Chruściński, J. Piilo, and W. T. Strunz, Sci. Rep. 7, 6379 (2017).
[201] D. De Santis, M. Johansson, B. Bylicka, N. K. Bernardes, and A. Acín, Phys. Rev. A 102, 012214 (2020).
[202] We have already argued that the two are equivalent, so only one will suffice.
[203] D. Alonso and I. de Vega, Phys. Rev. Lett. 94, 200403 (2005).
[204] I. de Vega and D. Alonso, Phys. Rev. A 73, 022102 (2006).
[205] A. Smirne, D. Egloff, M. G. Díaz, M. B. Plenio, and S. F. Huelga, Quantum Sci. Technol. 4, 01LT01 (2018).
[206] N. Lambert, Y.-N. Chen, Y.-C. Cheng, C.-M. Li, G.-Y. Chen, and F. Nori, Nat. Phys. 9, 10 (2013).
[207] G. S. Engel, T. R. Calhoun, E. L. Read, T.-K. Ahn, T. Mančal, Y.-C. Cheng, R. E. Blankenship, and G. R. Fleming, Nature 446, 782 (2007).

[208] M. B. Plenio and S. F. Huelga, New J. Phys. 10, 113019 (2008).
[209] A. W. Chin, A. Datta, F. Caruso, S. F. Huelga, and M. B. Plenio, New J. Phys. 12, 065002 (2010).
[210] M. Mohseni, P. Rebentrost, S. Lloyd, and A. Aspuru-Guzik, J. Chem. Phys. 129, 174106 (2008).
[211] F. Caruso, A. W. Chin, A. Datta, S. F. Huelga, and M. B. Plenio, Phys. Rev. A 81, 062346 (2010).
[212] N. H. Nickerson and B. J. Brown, Quantum 3, 131 (2019).
[213] R. Blume-Kohout, J. K. Gamble, E. Nielsen, K. Rudinger, J. Mizrahi, K. Fortier, and P. Maunz, Nat. Commun. 8, 1 (2017).
[214] R. Harper, S. T. Flammia, and J. J. Wallman, Nat. Phys., 1 (2020).
[215] G. A. L. White, C. D. Hill, F. A. Pollock, L. C. L. Hollenberg, and K. Modi, Nat. Commun., 6301 (2020).
[216] D. C. McKay, A. W. Cross, C. J. Wood, and J. M. Gambetta, arXiv:2003.02354 (2020).
[217] L. Accardi, Phys. Rep. 77, 169 (1981).
[218] S. Milz, F. Sakuldee, F. A. Pollock, and K. Modi, Quantum 4, 255 (2020).
[219] S. Shrapnel and F. Costa, Quantum 2, 63 (2018).
[220] N. Barnett and J. P. Crutchfield, J. Stat. Phys. 161, 404 (2015).
[221] J. Pearl, Causality: Models, Reasoning and Inference (Cambridge University Press, Cambridge, U.K.; New York, 2009).
[222] A. J. Leggett and A. Garg, Phys. Rev. Lett. 54, 857 (1985).
[223] C. Emary, N. Lambert, and F. Nori, Rep. Prog. Phys. 77, 016001 (2014).
[224] C. Budroni, G. Fagundes, and M. Kleinmann, New J. Phys. 21, 093018 (2019).
[225] G. Chiribella, G. M. D’Ariano, and P. Perinotti, Phys. Rev. Lett. 101, 180501 (2008).
[226] S. Shrapnel, F. Costa, and G. Milburn, New J. Phys. 20, 053010 (2018).
[227] K. Modi, Sci. Rep. 2, 581 (2012).
[228] E. Davies and J. Lewis, Commun. Math. Phys. 17, 239 (1970).
[229] E. B. Davies, Quantum Theory of Open Systems (Academic Press Inc, London; New York, 1976).
[230] G. Lindblad, Commun. Math. Phys. 65, 281 (1979).
[231] E. B. Davies, Commun. Math. Phys. 15, 277 (1969).
[232] M. Ringbauer, C. J. Wood, K. Modi, A. Gilchrist, A. G. White, and A. Fedrizzi, Phys. Rev. Lett. 114, 090402 (2015).
[233] G. A. Paz-Silva, M. J. W. Hall, and H. M. Wiseman, Phys. Rev. A 100, 042120 (2019).
[234] The single point state is contained in T(t:0) and can be obtained by tracing over the spaces H_0^o ⊗ H_1^i.
[235] A. S. Holevo, Statistical Structure of Quantum Theory, Lecture Notes in Physics Monographs (Springer-Verlag, Berlin Heidelberg, 2001).
[236] A. Barchielli and M. Gregoratti, Quantum Trajectories and Measurements in Continuous Time: The Diffusive Case, Lecture Notes in Physics (Springer-Verlag, Berlin Heidelberg, 2009).
[237] F. A. Pollock, C. Rodríguez-Rosario, T. Frauenheim, M. Paternostro, and K. Modi, Phys. Rev. A 97, 012127 (2018).
[238] F. A. Pollock, C. Rodríguez-Rosario, T. Frauenheim, M. Paternostro, and K. Modi, Phys. Rev. Lett. 120, 040405 (2018).
[239] F. Costa and S. Shrapnel, New J. Phys. 18, 063032 (2016).
[240] S. Milz, F. A. Pollock, and K. Modi, Phys. Rev. A 98, 012108 (2018).
[241] G. Chiribella, G. M. D’Ariano, and P. Perinotti, Phys. Rev. A 80, 022339 (2009).
[242] P. Taranto, F. A. Pollock, S. Milz, M. Tomamichel, and K. Modi, Phys. Rev. Lett. 122, 140401 (2019).
[243] P. Taranto, S. Milz, F. A. Pollock, and K. Modi, Phys. Rev. A 99, 042108 (2019).
[244] P. Taranto, F. A. Pollock, and K. Modi, arXiv:1907.12583 (2019).
[245] C. Giarmatzi and F. Costa, arXiv:1811.03722 (2020).
[246] This tensor, in general, is also a quantum comb, where the bond represents information fed forward through an ancillary system.
[247] P. Strasberg and M. G. Díaz, Phys. Rev. A 100, 022120 (2019).
[248] S. Milz, D. Egloff, P. Taranto, T. Theurer, M. B. Plenio, A. Smirne, and S. F. Huelga, Phys. Rev. X 10, 041049 (2020).
[249] G. Chiribella, G. M. D’Ariano, and P. Perinotti, EPL 83, 30004 (2008).
[250] G. Chiribella, G. M. D’Ariano, and P. Perinotti, Phys. Rev. Lett. 101, 060401 (2008).
[251] D. Kretschmann and R. F. Werner, Phys. Rev. A 72, 062323 (2005).
[252] F. Caruso, V. Giovannetti, C. Lupo, and S. Mancini, Rev. Mod. Phys. 86, 1203 (2014).
[253] C. Portmann, C. Matt, U. Maurer, R. Renner, and B. Tackmann, IEEE Trans. Inf. Theory 63, 3277 (2017).
[254] L. Hardy, arXiv:1608.06940 (2016).
[255] L. Hardy, Phil. Trans. R. Soc. A 370, 3385 (2012).
[256] J. Cotler, C.-M. Jian, X.-L. Qi, and F. Wilczek, J. High Energy Phys. 2018, 93 (2018).
[257] O. Oreshkov, F. Costa, and Č. Brukner, Nat. Commun. 3, 1092 (2012).
[258] O. Oreshkov and C. Giarmatzi, New J. Phys. 18, 093020 (2016).
[259] J.-M. A. Allen, J. Barrett, D. C. Horsman, C. M. Lee, and R. W. Spekkens, Phys. Rev. X 7, 031021 (2017).
[260] G. Gutoski and J. Watrous, in Proceedings of the thirty-ninth annual ACM symposium on Theory of computing (ACM, 2007) pp. 565–574.
[261] G. Gutoski, A. Rosmanis, and J. Sikora, Quantum 2, 89 (2018).
[262] L. Accardi, A. Frigerio, and J. T. Lewis, Pub. Res. Inst. Math. Sci. 18, 97 (1982).
[263] I. A. Luchnikov, S. V. Vintskevich, and S. N. Filippov, arXiv:1801.07418 (2018).
[264] S. Shrapnel, F. Costa, and G. Milburn, Int. J. Quantum Inf. 16, 1840010 (2018).
[265] I. A. Luchnikov, S. V. Vintskevich, H. Ouerdane, and S. N. Filippov, Phys. Rev. Lett. 122, 160401 (2019).
[266] I. A. Luchnikov, S. V. Vintskevich, D. A. Grigoriev, and S. N. Filippov, Phys. Rev. Lett. 124, 140502 (2020).
[267] C. Guo, K. Modi, and D. Poletti, Phys. Rev. A 102, 062414 (2020).
[268] J. Combes, J. Kerckhoff, and M. Sarovar, Adv. Phys. X 2, 784 (2017).
[269] G. Chiribella and D. Ebler, New J. Phys. 18, 093053 (2016).
[270] T. Eggeling, D. Schlingemann, and R. F. Werner, EPL 57, 782 (2002).
[271] J. Barrett, R. Lorenz, and O. Oreshkov, arXiv:1906.10726 (2020).
[272] F. Sakuldee, S. Milz, F. A. Pollock, and K. Modi, J. Phys. A 51, 414014 (2018).
[273] It can even be tested for if the ordering of events is not given a priori [378].
[274] J. Pearl, Causality (Oxford University Press, 2000).
[275] H. Carmichael, An Open Systems Approach to Quantum Optics, Lecture Notes in Physics Monographs (Springer-Verlag, Berlin, 1993).

[276] G. Guarnieri, A. Smirne, and B. Vacchini, Phys. Rev. A 90, 022110 (2014).
[277] J. Morris, F. A. Pollock, and K. Modi, arXiv:1902.07980 (2019).
[278] S. C. Hou, X. X. Yi, S. X. Yu, and C. H. Oh, Phys. Rev. A 83, 062115 (2011).
[279] X.-M. Lu, X. Wang, and C. P. Sun, Phys. Rev. A 82, 042103 (2010).
[280] L. Mazzola, C. A. Rodríguez-Rosario, K. Modi, and M. Paternostro, Phys. Rev. A 86, 010102 (2012).
[281] A. K. Rajagopal, A. R. Usha Devi, and R. W. Rendell, Phys. Rev. A 82, 042107 (2010).
[282] C. A. Rodríguez-Rosario, K. Modi, L. Mazzola, and A. Aspuru-Guzik, EPL 99, 20010 (2012).
[283] S. Luo, S. Fu, and H. Song, Phys. Rev. A 86, 044101 (2012).
[284] Z. He, H.-S. Zeng, Y. Li, Q. Wang, and C. Yao, Phys. Rev. A 96, 022106 (2017).
[285] F. F. Fanchini, G. Karpat, B. Çakmak, L. K. Castelano, G. H. Aguilar, O. J. Farías, S. P. Walborn, P. H. S. Ribeiro, and M. C. de Oliveira, Phys. Rev. Lett. 112, 210402 (2014).
[286] B. Bylicka, D. Chruściński, and S. Maniscalco, Sci. Rep. 4, 5720 (2014).
[287] C. Pineda, T. Gorin, D. Davalos, D. A. Wisniacki, and I. García-Mata, Phys. Rev. A 93, 022117 (2016).
[288] A. R. Usha Devi, A. K. Rajagopal, and Sudha, Phys. Rev. A 83, 022109 (2011).
[289] A. R. Usha Devi, A. K. Rajagopal, S. Shenoy, and R. W. Rendell, J. Quant. Info. Sci. 2, 47 (2012).
[290] D. Chruściński and S. Maniscalco, Phys. Rev. Lett. 112, 120404 (2014).
[291] G. Lindblad, (unpublished), Stockholm (1980).
[292] C. Arenz, R. Hillier, M. Fraas, and D. Burgarth, Phys. Rev. A 92, 022102 (2015).
[293] D. Burgarth, P. Facchi, M. Ligabò, and D. Lonigro, Phys. Rev. A 103, 012203 (2021).
[294] Y.-Y. Hsieh, Z.-Y. Su, and H.-S. Goan, Phys. Rev. A 100, 012120 (2019).
[295] S. Milz, C. Spee, Z.-P. Xu, F. A. Pollock, K. Modi, and O. Gühne, arXiv:2011.09340 (2020).
[296] F. Verstraete, J. J. García-Ripoll, and J. I. Cirac, Phys. Rev. Lett. 93, 207204 (2004).
[297] M. Zwolak and G. Vidal, Phys. Rev. Lett. 93, 207205 (2004).
[298] C. Yang, F. C. Binder, V. Narasimhachar, and M. Gu, Phys. Rev. Lett. 121, 260602 (2018).
[299] J. C. Bridgeman and C. T. Chubb, J. Phys. A 50, 223001 (2017).
[300] R. Orús, Ann. Phys. 349, 117 (2014).
[301] J. Prior, A. W. Chin, S. F. Huelga, and M. B. Plenio, Phys. Rev. Lett. 105, 050404 (2010).
[302] F. A. Y. N. Schröder and A. W. Chin, Phys. Rev. B 93, 075105 (2016).
[303] M. L. Wall, A. Safavi-Naini, and A. M. Rey, Phys. Rev. A 94, 053637 (2016).
[304] A. Strathearn, P. Kirton, D. Kilda, J. Keeling, and B. W. Lovett, Nat. Commun. 9, 3322 (2018).
[305] M. R. Jørgensen and F. A. Pollock, Phys. Rev. Lett. 123, 240602 (2019).
[306] P. Figueroa-Romero, K. Modi, and F. A. Pollock, Quantum 3, 136 (2019).
[307] P. Figueroa-Romero, F. A. Pollock, and K. Modi, Commun. Phys. (2021).
[308] K. Modi, T. Paterek, W. Son, V. Vedral, and M. Williamson, Phys. Rev. Lett. 104, 080501 (2010).
[309] P. Taranto, Int. J. Quantum Inf. 18, 1941002 (2020).
[310] If an element A_{x_j} is linearly dependent on A_{x_k} and A_{x_ℓ}, then either the conditional future and past processes are the same for both outcomes k and ℓ, or outcome j does yield conditional independence.
[311] P. Hayden, R. Jozsa, D. Petz, and A. Winter, Commun. Math. Phys. 246, 359 (2004).
[312] D. Petz, Commun. Math. Phys. 105, 123 (1986).
[313] D. Petz, Rev. Math. Phys. 15, 79 (2003).
[314] M. B. Ruskai, J. Math. Phys. 43, 4358 (2002).


[315] B. Ibinson, N. Linden, and A. Winter, Commun. Math. Phys. 277, 289 (2008).
[316] M. M. Wilde, Proc. Royal Soc. A 471, 2182 (2015).
[317] D. Sutter, M. Berta, and M. Tomamichel, Commun. Math. Phys. 352, 37 (2017).
[318] The duals are, for example, all positive if the blocking instrument consists of elements whose Choi states are orthogonal projectors. In this case, W_{M→FM} is a proper recovery map and the QCMI vanishes.
[319] S stands for a scattering matrix, L for jump operators, and H for Hamiltonians.
[320] K. Parthasarathy, An Introduction to Quantum Stochastic Calculus, Monographs in Mathematics 85 (Birkhäuser Verlag, Basel, 1992).
[321] A. J. Daley, Adv. Phys. 63, 77 (2014).
[322] H. I. Nurdin, “Quantum stochastic processes and the modelling of quantum noise,” in Encyclopedia of Systems and Control, edited by J. Baillieul and T. Samad (Springer London, London, 2019) pp. 1–8.
[323] W. H. Zurek, Nat. Phys. 5, 181 (2009).
[324] M. Heyl, A. Polkovnikov, and S. Kehrein, Phys. Rev. Lett. 110, 135704 (2013).
[325] C. J. Turner, A. A. Michailidis, D. A. Abanin, M. Serbyn, and Z. Papić, Nat. Phys. 14, 745 (2018).
[326] B. Skinner, J. Ruhman, and A. Nahum, Phys. Rev. X 9, 031009 (2019).
[327] L. Viola, E. Knill, and S. Lloyd, Phys. Rev. Lett. 82, 2417 (1999).
[328] L. Viola, S. Lloyd, and E. Knill, Phys. Rev. Lett. 83, 4888 (1999).
[329] C. Arenz, D. Burgarth, P. Facchi, and R. Hillier, J. Math. Phys. 59, 032203 (2018).
[330] C. Addis, F. Ciccarello, M. Cascio, G. M. Palma, and S. Maniscalco, New J. Phys. 17, 123004 (2015).
[331] D. Burgarth, P. Facchi, M. Fraas, and R. Hillier, arXiv:1904.03627 (2020).
[332] P. Zanardi, Phys. Lett. A 258, 77 (1999).
[333] D. A. Lidar, I. L. Chuang, and K. B. Whaley, Phys. Rev. Lett. 81, 2594 (1998).
[334] D. A. Lidar and T. A. Brun, Quantum Error Correction (Cambridge University Press, 2013).
[335] G. A. Paz-Silva, A. T. Rezakhani, J. M. Dominy, and D. A. Lidar, Phys. Rev. Lett. 108, 080501 (2012).
[336] J. F. Haase, P. J. Vetter, T. Unden, A. Smirne, J. Rosskopf, B. Naydenov, A. Stacey, F. Jelezko, M. B. Plenio, and S. F. Huelga, Phys. Rev. Lett. 121, 060401 (2018).
[337] D. Burgarth, P. Facchi, H. Nakazato, S. Pascazio, and K. Yuasa, Quantum 4, 289 (2020).
[338] F. Caruso, S. Montangero, T. Calarco, S. F. Huelga, and M. B. Plenio, Phys. Rev. A 85, 042331 (2012).
[339] M. B. Plenio, S. F. Huelga, A. Beige, and P. L. Knight, Phys. Rev. A 59, 2468 (1999).
[340] F. Verstraete, M. M. Wolf, and J. I. Cirac, Nat. Phys. 5, 633 (2009).
[341] F. Caruso, A. W. Chin, A. Datta, S. F. Huelga, and M. B. Plenio, J. Chem. Phys. 131, 105106 (2009).
[342] F. Caruso, S. F. Huelga, and M. B. Plenio, Phys. Rev. Lett. 105, 190501 (2010).
[343] M. J. Kastoryano, M. M. Wolf, and J. Eisert, Phys. Rev. Lett. 110, 110501 (2013).
[344] A. W. Chin, S. F. Huelga, and M. B. Plenio, Phys. Rev. Lett. 109, 233601 (2012).
[345] M. Am-Shallem and R. Kosloff, J. Chem. Phys. 141, 044121 (2014).
[346] Z.-D. Liu, Y.-N. Sun, B.-H. Liu, C.-F. Li, G.-C. Guo, S. Hamedani Raja, H. Lyyra, and J. Piilo, Phys. Rev. A 102, 062208 (2020).
[347] S. F. Huelga, A. Rivas, and M. B. Plenio, Phys. Rev. Lett. 108, 160402 (2012).
[348] D. K. Burgarth, P. Facchi, V. Giovannetti, H. Nakazato, S. Pascazio, and K. Yuasa, Nat. Commun. 5, 5173 (2014).
[349] L. M. Norris, G. A. Paz-Silva, and L. Viola, Phys. Rev. Lett. 116, 150503 (2016).
[350] G. A. Paz-Silva, L. M. Norris, and L. Viola, Phys. Rev. A 95, 022121 (2017).
[351] P. Strasberg and M. Esposito, Phys. Rev. Lett. 121, 040601 (2018).
[352] D. Chruściński, C. Macchiavello, and S. Maniscalco, Phys. Rev. Lett. 118, 080404 (2017).
[353] C. Benedetti, M. G. A. Paris, and S. Maniscalco, Phys. Rev. A 89, 012114 (2014).
[354] P. Schindler, M. Müller, D. Nigg, J. T. Barreiro, E. A. Martinez, M. Hennrich, T. Monz, S. Diehl, P. Zoller, and R. Blatt, Nat. Phys. 9, 361 (2013).
[355] B. Bylicka, D. Chruściński, and S. Maniscalco, arXiv:1301.2585 (2013).
[356] D. Rosset, F. Buscemi, and Y.-C. Liang, Phys. Rev. X 8, 021033 (2018).
[357] B. Bylicka, M. Tukiainen, D. Chruściński, J. Piilo, and S. Maniscalco, Sci. Rep. 6, 27989 (2016).
[358] G. D. Berk, A. J. P. Garner, B. Yadin, K. Modi, and F. A. Pollock, Quantum (2021).
[359] S. Singha Roy and J. Bae, Phys. Rev. A 100, 032303 (2019).
[360] P. Abiuso and V. Giovannetti, Phys. Rev. A 99, 052106 (2019).
[361] P. Strasberg, Phys. Rev. E 100, 022127 (2019).
[362] P. Strasberg and A. Winter, Phys. Rev. E 100, 022135 (2019).
[363] P. Strasberg, Phys. Rev. Lett. 123, 180604 (2019).
[364] P. Strasberg, Quantum 4, 240 (2020).
[365] D. Chruściński, A. Kossakowski, and S. Pascazio, Phys. Rev. A 81, 032101 (2010).
[366] D. Chruściński and F. A. Wudarski, Phys. Rev. A 91, 012104 (2015).
[367] L. Banchi, D. Burgarth, and M. J. Kastoryano, Phys. Rev. X 7, 041015 (2017).
[368] P. Figueroa-Romero, K. Modi, and F. A. Pollock, Phys. Rev. E 102, 032144 (2020).
[369] S. Popescu, A. J. Short, and A. Winter, Nat. Phys. 2, 754 (2006).
[370] N. Linden, S. Popescu, A. J. Short, and A. Winter, Phys. Rev. E 79, 061103 (2009).
[371] L. Masanes, A. J. Roncaglia, and A. Acín, Phys. Rev. E 87, 032137 (2013).
[372] L. D’Alessio, Y. Kafri, A. Polkovnikov, and M. Rigol, Adv. Phys. 65, 239 (2016).
[373] M. B. Plenio and S. Virmani, Phys. Rev. Lett. 99, 120504 (2007).
[374] M. B. Plenio and S. Virmani, New J. Phys. 10, 043032 (2008).
[375] P. Haikka, S. McEndoo, and S. Maniscalco, Phys. Rev. A 87, 012127 (2013).
[376] P. Haikka, J. Goold, S. McEndoo, F. Plastina, and S. Maniscalco, Phys. Rev. A 85, 060101 (2012).
[377] F. Cosco and S. Maniscalco, Phys. Rev. A 98, 053608 (2018).
[378] C. Giarmatzi and F. Costa, npj Quantum Inf. 4, 1 (2018).
[379] G. Chiribella and D. Ebler, Nat. Commun. 10, 1472 (2019).
[380] G. Chiribella, G. M. D’Ariano, P. Perinotti, and B. Valiron, Phys. Rev. A 88, 022318 (2013).
[381] L. M. Procopio, A. Moqanaki, M. Araújo, F. Costa, I. Alonso Calafell, E. G. Dowd, D. R. Hamel, L. A. Rozema, Č. Brukner, and P. Walther, Nat. Commun. 6, 7913 (2015).
[382] G. Rubino, L. A. Rozema, A. Feix, M. Araújo, J. M. Zeuner, L. M. Procopio, Č. Brukner, and P. Walther, Sci. Adv. 3, e1602589 (2017).
[383] D. Ebler, S. Salek, and G. Chiribella, Phys. Rev. Lett. 120, 120502 (2018).
[384] K. Goswami, C. Giarmatzi, M. Kewming, F. Costa, C. Branciard, J. Romero, and A. G. White, Phys. Rev. Lett. 121, 090503 (2018).
[385] K. Goswami, Y. Cao, G. A. Paz-Silva, J. Romero, and A. G. White, Phys. Rev. Research 2, 033292 (2020).
[386] M. Araújo, A. Feix, M. Navascués, and Č. Brukner, Quantum 1, 10 (2017).