
Opportunities and challenges for quantum-assisted machine learning in near-term quantum computers

Alejandro Perdomo-Ortiz,1,2,3,4,5,∗ Marcello Benedetti,1,2,4,5 John Realpe-Gomez,1,6,7 and Rupak Biswas1,8

1 Quantum Artificial Intelligence Lab., NASA Ames Research Center, Moffett Field, CA 94035, USA
2 USRA Research Institute for Advanced Computer Science, Mountain View, CA 94043, USA
3 Qubitera, LLC., Mountain View, CA 94041, USA
4 Department of Computer Science, University College London, WC1E 6BT London, UK
5 Cambridge Quantum Computing Limited, CB2 1UB Cambridge, UK
6 SGT Inc., Greenbelt, MD 20770, USA
7 Instituto de Matematicas Aplicadas, Universidad de Cartagena, Bolívar 130001, Colombia
8 Exploration Technology Directorate, NASA Ames Research Center, Moffett Field, CA 94035, USA

(Dated: March 20, 2018)

With quantum computing technologies nearing the era of commercialization and quantum supremacy, machine learning (ML) appears as one of the promising “killer” applications. Despite significant effort, there has been a disconnect between most quantum ML proposals, the needs of ML practitioners, and the capabilities of near-term quantum devices to demonstrate quantum enhancement in the near future. In this contribution to the focus collection on “What would you do with 1000 qubits?”, we provide concrete examples of intractable ML tasks that could be enhanced with near-term devices. We argue that to reach this target, the focus should be on areas where ML researchers are struggling, such as generative models in unsupervised and semi-supervised learning, instead of the popular and more tractable supervised learning techniques. We also highlight the case of classical datasets with potential quantum-like statistical correlations where quantum models could be more suitable. We focus on hybrid quantum-classical approaches and illustrate some of the key challenges we foresee for near-term implementations. Finally, we introduce the quantum-assisted Helmholtz machine (QAHM), an attempt to use near-term quantum devices to tackle high-dimensional datasets of continuous variables. Instead of using quantum computers to assist deep learning, as previous approaches do, the QAHM uses deep learning to extract a low-dimensional binary representation of data, suitable for relatively small quantum processors which can assist the training of an unsupervised generative model. Although we illustrate this concept on a quantum annealer, other quantum platforms could benefit as well from this hybrid quantum-classical framework.

I. INTRODUCTION

With quantum computing technologies nearing the era of commercialization and of quantum supremacy [1], it is important to think of potential applications that might benefit from these devices. Machine learning (ML) stands out as a powerful statistical framework to attack problems where exact algorithms are hard to develop. Examples of such problems include image and speech recognition [2, 3], autonomous systems [4], medical applications [5], biology [6], artificial intelligence [7], and many others. The development of quantum algorithms that can assist or entirely replace the classical ML routine is an ongoing effort that has attracted a lot of interest in the scientific quantum information community [8–40]. We restrict the scope of our perspective to this specific angle and refer to it hereafter as quantum-assisted machine learning (QAML).

Research in this field has been focusing on tasks such as classification [14], regression [11, 15, 18], Gaussian models [16], vector quantization [13], principal component analysis [17] and other strategies that are routinely

∗ Correspondence: alejandro.perdomoortiz@nasa.gov

used by ML practitioners nowadays. We do not think these approaches would be of practical use in near-term quantum computers. The same reasons that make these techniques so popular, e.g., their scalability and algorithmic efficiency in tackling huge datasets, make them less appealing to become top candidates as killer applications in QAML with devices in the range of 100–1000 qubits. In other words, regardless of the claims about polynomial and even exponential algorithmic speedup, reaching interesting industrial-scale applications would require millions or even billions of qubits. Such an advantage is then moot when dealing with real-world datasets and with the quantum devices to become available in the next years in the few-thousands-of-qubits regime. As we elaborate in this paper, only a game changer such as the new developments in hybrid classical-quantum algorithms might be able to make a dent in speeding up ML tasks.

In our perspective here, we propose and emphasize three approaches to maximize the possibility of finding killer applications on near-term quantum computers:

(i) Focus on problems that are currently hard and intractable for the ML community, for example, generative models in unsupervised and semi-supervised learning as described in Sec. II.

(ii) Focus on datasets with potentially intrinsic quantum-like correlations, making quantum computers indispensable; these will provide the most compact and efficient model representation, with the potential of a significant quantum advantage even at the level of 50–100 qubit devices. In Sec. II B we suggest the case of the cognitive sciences, as a research domain potentially yielding such datasets.

(iii) Focus on hybrid algorithms where a quantum routine is executed in the intractable step of the classical ML algorithmic pipeline, as described in Sec. III and Sec. IV.

Each one of these tasks has its own challenges, and significant work needs to be done towards having experimental implementations on available quantum hardware (see for example Refs. [20, 21]). Based on our past experience in implementing QAML algorithms on existing quantum hardware, we provide here some insights into the main challenges and opportunities in this steadily growing research field. Along with illustrations from our earlier work and demonstrations on quantum annealers, we attempt to provide general insights and challenges applicable to other quantum computational paradigms such as the gate model.

In Sec. II we present examples of domains in ML that we believe offer viable opportunities for near-term quantum computers. In Sec. III we present and illustrate the challenges ahead of such implementations, whenever possible with demonstrations on real hardware. In Sec. IV we introduce a new and flexible approach, the quantum-assisted Helmholtz machine (QAHM) [38], which has the potential to solve many of the challenges towards a near-term implementation of QAML for real-world industrial-scale datasets. In Sec. V we summarize our work.

II. OPPORTUNITIES IN QAML

A. Quantum devices for sampling applications

The majority of data being collected daily is unlabeled. Examples of unlabeled data are photos and videos uploaded to the Internet, medical imaging, tweets, audio recordings, financial time series, and sensor data in general. Labeling is the process of data augmentation with task-specific informative tags. But this task is often expensive as it requires human experts. It is therefore important to design models and algorithms capable of extracting information and structures from unlabeled data; this is the focus of unsupervised ML. But why is this important at all? The discovery of patterns is one of the central aspects of science; scientists do not always know a priori which patterns they should look for, and they need unsupervised tools to extract salient spatio-temporal features from unlabeled data. In general, unsupervised techniques can learn useful representations of high-dimensional data that have desirable properties

such as simplicity and sparsity [41]. When used in conjunction with supervised techniques such as regressors and classifiers, unsupervised tools can substantially reduce the amount of labeled data required to achieve a desired generalization performance [42]. Connections to reinforcement learning and more specific applications are pointed out in Ref. [43] and references therein.

An unsupervised approach that learns the joint probability of all the variables involved in a problem is often called a generative model. The name comes from the possibility of inferring any marginal and conditional distribution, which in turn provides a way to generate new data resembling the training set. Following the taxonomy introduced in Ref. [43], we distinguish generative models as either explicit or implicit density estimators. Explicit density estimators are a large family of models which includes Boltzmann machines, belief networks, and autoencoders. Depending on the characteristics of the model, they can be learned by variational approximations [44–48], Monte Carlo sampling [49–52], or a combination of both [53]. Implicit density estimators achieve the generative task indirectly, for example, by casting the problem into a classification task [54] or finding the Nash equilibrium of a game [55].

Interestingly, generative models with many layers of unobserved stochastic variables (also called hidden variables) have the ability to learn multi-modal distributions over high-dimensional data [56]. Each additional layer provides an increasingly abstract representation of the data and improves the generalization capability of the model [50]. These models are usually represented as graphs of stochastic nodes where edges may be directed, undirected, or both. Unfortunately, exact inference is intractable in all but the most trivial topologies [57]. In practice, learning the parameters of the model is even more demanding, as it requires carrying out the intractable inference step for each observed data point and for each learning iteration. The computational bottleneck comes from computing expectation values of several quantities under the complex distribution implemented by the model. Classically, this is approached by Markov chain Monte Carlo (MCMC) techniques that, unfortunately, are known to suffer from the slow-mixing problem [58]. It is indeed difficult for the Markov chain to jump from one mode of the distribution to another when these are separated by low-density regions of relevant size.
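To make the slow-mixing problem concrete, the following sketch (standard library only; the bimodal target, step size, and chain length are our illustrative choices, not taken from the text) runs a simple Metropolis sampler on a mixture of two well-separated Gaussians and counts how often the chain crosses between modes. With local proposals, the count stays near zero even over many thousands of steps, although both modes carry half the probability mass.

```python
import random
import math

def target_logp(x):
    # Bimodal target: two narrow Gaussians separated by a low-density region
    return math.log(0.5 * math.exp(-0.5 * ((x + 4) / 0.5) ** 2)
                    + 0.5 * math.exp(-0.5 * ((x - 4) / 0.5) ** 2))

def metropolis(n_steps, step_size, seed=0):
    rng = random.Random(seed)
    x, samples = -4.0, []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        # Metropolis acceptance rule on log-densities
        if math.log(rng.random() + 1e-300) < target_logp(proposal) - target_logp(x):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis(20_000, step_size=0.5)
# A mode switch shows up as a sign change between consecutive samples
switches = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
print(f"mode switches in 20k steps: {switches}")
```

The chain explores the mode it started in very well, but essentially never visits the other one; this is exactly the failure that a sampler drawing independent configurations from the full distribution would avoid.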

The capability of quantum computers to efficiently prepare and sample certain probability distributions has attracted the interest of the ML community [10, 59]. In this work, we focus on the use of quantum devices to sample classical [60] or quantum [19, 61] Gibbs distributions as an alternative to MCMC methods. This approach to sampling holds promise for more efficient inference and training in classical and quantum generative models, e.g., classical and quantum Boltzmann machines [25]. Once again, though, the small number of qubits and the limitations of currently available hardware may impair the

[Figure 1: diagram linking probabilistic programming, Bayesian inference, deep learning, and other ML frameworks to a common hypothesis, namely intractable sampling problems enhanced by quantum sampling, with potential impact across the social and natural sciences, engineering, and more.]

FIG. 1. Sampling applications in ML as an opportunity for quantum computers: Quantum devices have the potential to sample efficiently from complex probability distributions. This task is a computationally intractable step in many ML frameworks, and could have a significant impact on science and engineering.

sampling process, making it useless for real ML applications. In this perspective, we argue that even noisy distributions could be used for generative modeling of real-life datasets. This requires working in settings where the operations implemented in hardware are only partially known. We call this scenario a gray-box. We also argue that hybrid classical-quantum architectures are suitable for near-term applications, where the classical part is used to bypass some of the limitations of the quantum hardware. We call this approach quantum-assisted.

As highlighted by ML experts [62], it is expected that unsupervised learning will become far more important than purely supervised learning in the longer term. More specifically, most earlier work on generative models in deep learning relied on the computationally costly step of MCMC, making it hard to scale to large datasets. We believe this represents an opportunity for quantum computers, and it has been the motivation of our previous work [20, 21] and of the new developments presented in Sec. IV.

We discussed how inference and learning in graphical models can benefit from more efficient sampling techniques. As illustrated in Figure 1, sampling is at the core of other leading-edge domains as well, such as probabilistic programming (see Ref. [63] for an example of application). If quantum computers can be shown to significantly outperform classical sampling techniques, we expect to see a strong impact across science and engineering.

B. Datasets with native quantum-like correlations

Recently, Google demonstrated that quantum computers with as few as 50 qubits can attain quantum supremacy, although in a task with no obvious applications [64]. A highly relevant question then is: Which type of real-life applications could benefit from quantum supremacy with near-term small devices? One of the main motivations underlying the research efforts described here is that quantum computers could speed up ML algorithms. However, they could improve other aspects of ML and artificial intelligence (AI) as well. Recent work shows that quantum mechanics can provide more parsimonious models of stochastic processes than classical models [65–67], as quantified by an entropic measure of complexity. This suggests that quantum models hold the potential to substantially reduce the amount of other types of computational resources, e.g., memory [66, 67], required to model a given dataset.

This invites us to pose the following challenge: Identify real-life datasets, unrelated to quantum physics, where quantum models are substantially simpler than classical models, as quantified by standard model comparison techniques such as the Akaike information criterion [68]. In a sense, this is a form of quantum supremacy. While datasets generated in experiments with quantum systems would be an obvious fit, the challenge is to find such datasets elsewhere. Cognitive sciences may offer some potential candidates, as we will describe below.

To avoid potential misunderstandings, let us first consider the example of statistical mechanics, which was developed in the 19th century to describe physical systems with many particles. Although statistical mechanics was long thought to be specific to physics, we know today that certain aspects of it can be derived from very general information-theoretic principles. For instance, the structure of the Boltzmann distribution can be derived from the maximum entropy principle of information theory [69]. The tools of statistical mechanics have become valuable for studying phenomena as complex as human behavior [70, 71] and for developing record-performance algorithms [72, 73], among many other interdisciplinary applications. Indeed, statistical mechanics has played, and continues to play, a relevant role in the development of ML, the Boltzmann machine being an important example.

In a similar vein, decades of research in quantum foundations and quantum information have allowed the identification of certain features, like interference, entanglement, and contextuality, among others, that are encoded naturally by quantum systems [74, 75]. Moreover, there is increasing evidence suggesting that quantum models could be a valuable mathematical tool to study certain puzzling phenomena in the cognitive sciences (e.g., see Refs. [76–78] and references therein).

Consider, for instance, the following experiment [79]: A participant is asked to play a game where she can either win $200 or lose $100 with equal probability, and afterwards she is given the choice of whether or not to play the same gamble again. The experimentalist decides whether or not to tell the participant the result of the first gamble, i.e., whether she won or lost. Results showed that although participants typically preferred to play the second gamble when they knew the outcome of the first gamble, regardless of whether they won or lost, they typically preferred not to play if they did not have such information. More specifically, the participants chose to play the second gamble (G) with probabilities P(G|W) = 0.69 or P(G|L) = 0.59, respectively, if they knew that they won (W) or lost (L) the first gamble, and with probability P(G) = 0.39 if they did not have such information. These results violate the law of total probability, P(G) = P(G|W)P(W) + P(G|L)P(L), regardless of the values of the marginal probabilities P(W) and P(L). By interpreting this as an interference phenomenon, the authors managed to fit the experimental results using a quantum model substantially more parsimonious than alternative classical models, as quantified by standard Bayesian model comparison techniques.
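The violation can be verified directly: the right-hand side of the law of total probability is a convex combination of the two conditionals, so it must lie between 0.59 and 0.69 for every possible P(W), and the observed P(G) = 0.39 falls outside that range. A minimal check using the reported numbers:

```python
# Reported values from the two-stage gambling experiment (Ref. [79])
p_g_given_w = 0.69  # play again after a known win
p_g_given_l = 0.59  # play again after a known loss
p_g_unknown = 0.39  # play again when the outcome is withheld

# Law of total probability: P(G) = P(G|W)P(W) + P(G|L)P(L) with P(L) = 1 - P(W).
# The right-hand side is a convex combination of the two conditionals, so it
# must lie between their min and max for every admissible P(W).
lo = min(p_g_given_w, p_g_given_l)
hi = max(p_g_given_w, p_g_given_l)
classically_possible = lo <= p_g_unknown <= hi
print(f"classical range: [{lo}, {hi}], observed: {p_g_unknown}, "
      f"consistent: {classically_possible}")  # prints consistent: False
```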

Another general and unexpected prediction concerns the effects of the order of questions [80]. Asking a yes-no question to a human subject can create a context that affects the answer to a second yes-no question, so the probability of, say, answering yes to both questions depends on the order in which the questions are asked. For instance, in a 1997 poll in the USA, 501 respondents were first asked the question: ‘Do you generally think Bill Clinton is honest and trustworthy?’ Afterwards, they were asked the same question about Al Gore. Another 501 respondents were asked the same two questions, but in reverse order. The number of respondents answering ‘yes’ to both questions significantly increased when Al Gore was judged first. A quantum model for this phenomenon is based on the assumption that the respondent’s initial belief regarding the idea of ‘honest and trustworthy’ can be encoded using a quantum state ρ. The two possible answers, yes (Y) or no (N), to the question about Clinton or Gore are represented by the basis {|Y⟩C, |N⟩C} or {|Y⟩G, |N⟩G}, respectively, with respect to which measurements are performed. If the projectors associated with Clinton and Gore do not commute, we have order effects. A general parameter-free equality can be derived from these quantum models, for which experimental support has been found in about 70 surveys of about 1000 people each, and in two experiments with 100 people [80]. Quantum computers may allow for a complementary experimental exploration of these ideas at larger scales, now that experiments with hundreds of people are becoming more common [80–82].
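A minimal numerical illustration of such order effects follows; the projector angles and the initial state are hypothetical choices for illustration, not values fitted to the poll data. It shows that sequential measurements with non-commuting rank-1 projectors yield order-dependent joint probabilities:

```python
import numpy as np

def yes_projector(theta):
    """Rank-1 projector onto the 'yes' direction at angle theta in a real 2-D space."""
    v = np.array([np.cos(theta), np.sin(theta)])
    return np.outer(v, v)

# Hypothetical, illustrative bases for the two questions (not fitted to the poll)
P_yes_clinton = yes_projector(0.0)
P_yes_gore = yes_projector(np.pi / 5)

# Initial belief state rho (a pure state here for simplicity)
psi = np.array([np.cos(1.0), np.sin(1.0)])
rho = np.outer(psi, psi)

def p_yes_yes(first, second, rho):
    # Sequential projective measurement: P(yes, then yes) = Tr(second first rho first)
    return np.trace(second @ first @ rho @ first).real

p_cg = p_yes_yes(P_yes_clinton, P_yes_gore, rho)  # Clinton asked first
p_gc = p_yes_yes(P_yes_gore, P_yes_clinton, rho)  # Gore asked first
print(f"P(yes,yes) Clinton first: {p_cg:.3f}, Gore first: {p_gc:.3f}")
```

Because the two projectors do not commute, the two orderings give different joint probabilities, which is the qualitative signature observed in the poll.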

A report [83] from the White House in 2016 reads: “it is unlikely that machines will exhibit broadly-applicable intelligence comparable to or exceeding that of humans in the next 20 years”. Some of the most relevant reasons are technical. Recent work [75] suggests why quantum models may be useful in the cognitive sciences and, if so, it may offer new approaches to tackle some of the technical hurdles in AI. The 20-year span predicted above may be in sync with the advent of more powerful and more portable quantum computing technologies. The potential returns may be high.

Although we emphasized the case of the cognitive sciences, it would be interesting to explore which other relevant commercial datasets exhibit quantum-like correlations, and where quantum computers can have an advantage even at the level of 50–100 qubits. In general, the identification of characteristics that are intrinsically quantum, and therefore hard to simulate classically, could be a game changer in the landscape of applications for near-term quantum technologies. Rather than trying to catch up with mature classical technologies in a competition for supremacy, this may offer the opportunity for quantum technologies to create their own unique niche of commercial applications, thereby becoming indispensable.

III. CHALLENGES IN QAML

Near-term implementations of QAML algorithms that can compete with state-of-the-art classical ML algorithms will most likely not come from quantum versions of popular ML algorithms (see, e.g., [11, 13–18]). As mentioned in Ref. [22], it would be difficult for these algorithms to preserve the claimed speedup since they inherit the limitations of the Harrow-Hassidim-Lloyd (HHL) algorithm [84]. Here, we raise the bar even higher, as our attention goes to the implementation of algorithms with potential quantum advantage on near-term quantum devices.

As pointed out in Sec. II, problem selection is key. For example, consider the recent work on quantum recommendation systems [27]. The authors developed a custom data structure along with a quantum algorithm for matrix sampling that has polylogarithmic complexity in the matrix dimensions. The result is a quantum recommendation system and a proposal that circumvents most of the relevant limitations of the HHL algorithm. Because the input size is extremely large (e.g., the number of users times the number of products), the algorithm promises to significantly speed up the task compared to currently employed ML approaches that require polynomial time. For the very same reasons, however, millions of qubits would be needed to handle datasets where state-of-the-art classical ML starts to struggle. We do not expect such devices to appear in the next decade. Instead, our attention goes to hybrid quantum-classical algorithms, where conventional computers are used for the tractable subroutines of the algorithms and quantum computers assist only in the intractable steps. Fig. 2 illustrates an example of this concept for the case of ML tasks.

There are several challenges which will generally impact any QAML algorithm, such as the limited qubit connectivity, the finite dynamic range of the parameters dictated by the intrinsic energy scale of the interactions in the device, and intrinsic noise in the device leading to decoherence in the qubits and uncertainty in the programmable parameters. We now emphasize some of these practical challenges, with particular attention to the execution and implementation of hybrid QAML algorithms on near-term devices.

[Figure 2: schematic of the hybrid pipeline. DATA s = {s1, …, sD} drives LEARNING via stochastic gradient descent, Θt+1 = Θt + G[P(s|Θt)]; the hard-to-compute term G and the PREDICTIONS F[P(s|Θt)] are estimated with the assistance of samples from a quantum computer.]

FIG. 2. General scheme for hybrid quantum-classical algorithms as one of the most promising research directions to demonstrate quantum enhancement in ML tasks. A dataset drives the fine-tuning of the model’s parameters. In the case of generative models, one can use stochastic gradient descent to update the parameters Θ from time t to t + 1. The updates often require estimation of an intractable function G, which could be approximated by samples from a probability distribution P(s|Θt). This computationally hard sampling step could be assisted by a quantum computer. In some cases, making predictions out of the trained model is also an intractable task. The predictions F could be approximated by samples with the assistance of a quantum computer as well. Examples of such hybrid approaches are illustrated further in Figs. 3 and 4.
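The update loop of Fig. 2 can be sketched generically as follows. The sampler argument is a stand-in for a quantum device, and the toy single-spin instantiation below (one parameter θ, exact sampling) is our illustrative choice, not a model taken from the text:

```python
import numpy as np

def hybrid_sgd(theta0, grad_estimator, sample_from_device, n_steps, lr=0.05):
    """
    Generic scheme of Fig. 2: at each step, samples approximating P(s|theta_t)
    are drawn (in a hybrid pipeline, from a quantum device) and fed to a
    classical estimator G of the intractable gradient term.
    """
    theta = theta0
    for _ in range(n_steps):
        samples = sample_from_device(theta)                  # hard sampling step
        theta = theta + lr * grad_estimator(theta, samples)  # classical update
    return theta

rng = np.random.default_rng(0)

# Toy instantiation: a single +/-1 spin with P(s|theta) proportional to exp(theta*s)
data_mean = 0.6  # empirical mean of the (hypothetical) training data

def sample_from_device(theta, n=500):
    p_up = 1.0 / (1.0 + np.exp(-2.0 * theta))
    return np.where(rng.random(n) < p_up, 1.0, -1.0)

def grad_estimator(theta, samples):
    # Log-likelihood gradient: E_data[s] - E_model[s]
    return data_mean - samples.mean()

theta = hybrid_sgd(0.0, grad_estimator, sample_from_device, n_steps=300)
print(f"learned theta = {theta:.2f}, model mean = {np.tanh(theta):.2f}")
```

The only place the quantum device enters is `sample_from_device`; everything else runs on a conventional computer, which is the division of labor the figure describes.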

A. Issue of classical and quantum model compatibility

Essential to a hybrid approach is the need to have a flow of information between the classical preprocessing and the quantum experiments. The possibility of sharing information back and forth between the different architectures might pose a significant challenge, arising from the need to match samples from the classical and the quantum model. That is the case, for example, when assisting the training of restricted Boltzmann machines (RBMs) or deep belief networks (DBNs) with quantum devices [20, 23]. There, updates of the parameters are performed by a stochastic gradient descent algorithm that requires two key components. Due to the bipartite structure of RBMs, the first component, also known as the “positive phase”, can be estimated very efficiently with conventional sampling techniques, while the second component, also known as the “negative phase”, is in general intractable and can be assisted with quantum sampling. Since the two terms are subtracted in the same equation and originate from the same model, it is important to have control over, and to match, all the parameters defining both the classical and quantum probability distributions.
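The two phases can be made concrete with a sketch of the RBM weight gradient (biases omitted for brevity; the placeholder negative-phase samples below stand in for what Contrastive Divergence or a quantum sampler would provide, and all sizes are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_gradient(W, visible_data, negative_samples):
    """
    Gradient of the RBM log-likelihood w.r.t. the weights W (n_vis x n_hid).
    The positive phase is tractable because the hidden units are conditionally
    independent given the visibles; the negative phase needs samples (v, h)
    from the model distribution, the step a quantum sampler could assist.
    """
    # Positive phase: E_data[v h^T], with h marginalized analytically
    h_prob = sigmoid(visible_data @ W)            # P(h_j = 1 | v)
    positive = visible_data.T @ h_prob / len(visible_data)

    # Negative phase: E_model[v h^T], estimated from (classical or quantum) samples
    v_neg, h_neg = negative_samples
    negative = v_neg.T @ h_neg / len(v_neg)
    return positive - negative

rng = np.random.default_rng(1)
W = 0.01 * rng.standard_normal((6, 3))
v_data = rng.integers(0, 2, size=(20, 6)).astype(float)
# Placeholder negative-phase samples (would come from CD or a quantum device)
neg = (rng.integers(0, 2, size=(20, 6)).astype(float),
       rng.integers(0, 2, size=(20, 3)).astype(float))
grad = rbm_gradient(W, v_data, neg)
print("gradient shape:", grad.shape)
```

Because the two phases are subtracted, any mismatch between the distribution the device actually samples and the model the classical code assumes biases the whole gradient, which is exactly the compatibility issue discussed above.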

A challenge here is related to the temperature of the Gibbs distribution of the model we are sampling from. For simulations on conventional computers, there is no need to explicitly specify the temperature: since it is a multiplicative factor, we can set it to 1 and ignore it altogether. In the case of an experimental physical device, such as a quantum annealer, we cannot neglect the temperature because (i) it is not under our control, and (ii) it is determined by many factors. These include not only the operational temperature of the device, but also the details of the quantum dynamics. The lack of knowledge of this parameter implies that the communication between the classical and quantum components of our hybrid algorithm is broken. In previous work [20], we showed the significant difference in performance that can arise from tackling this challenge. As shown in Fig. 3(a), a proper estimation of the temperature also allows us to use restart techniques. For instance, the learning process can initially be carried out on a classical computer because the model parameters may be below the noise level and precision of the quantum device. Then, when desired, the quantum computer can be called to continue the process.

The challenge can also be addressed at the hardware level by designing devices that can prepare a class of quantum Gibbs states at will. However, this strategy might open up other difficulties; some of these are detailed below.

B. Robustness to noise in programmable parameters of the quantum device

Preparing quantum Gibbs states with a quantum annealer or with a universal gate-model quantum computer is not a straightforward task, given the intrinsic noise in the programmable parameters. In the case of quantum annealers, freezing of the quantum distributions or other dynamical effects can lead to non-equilibrium distributions away from Gibbs [60, 88]. This is one of the challenges towards scaling the approaches in Refs. [20, 23], where the training of RBMs and DBNs requires reliable samples from a classical Boltzmann distribution. Furthermore, any quantum computing architecture with intrinsic noise in the programmable parameters can always lead to deviations from the desired target state. A proposed solution [60, 88] was to use samples obtained from a quantum device to seed classical Gibbs samplers (e.g., MCMC). While this is a promising approach, one of its drawbacks is that it forces the model to have a specific form, even if the quantum device has a genuinely different structure. If the quantum distribution prepared by the device is far from the one assumed, most of the burden would lie on the classical post-processing. It is not clear how much of the method’s efficiency is given by the quality of the samples from the quantum devices and how much is achieved by the post-processing steps.
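The seeding strategy of Refs. [60, 88] can be sketched as follows: device readouts initialize classical single-site Gibbs sweeps over an Ising model, rather than starting the chains at random. All model parameters and the placeholder "device" readouts below are illustrative assumptions:

```python
import numpy as np

def gibbs_sweeps(h, J, states, n_sweeps, rng):
    """
    Classical single-site Gibbs sweeps over +/-1 spins for an Ising model
    E(s) = -sum_i h_i s_i - sum_{i<j} J_ij s_i s_j. The chains start from
    `states`, e.g. configurations read out from a quantum annealer, instead
    of random initialization.
    """
    s = states.copy()
    n_chains, n_spins = s.shape
    for _ in range(n_sweeps):
        for i in range(n_spins):
            # Local field on spin i (J symmetric with zero diagonal)
            field = h[i] + s @ J[i]
            p_up = 1.0 / (1.0 + np.exp(-2.0 * field))
            s[:, i] = np.where(rng.random(n_chains) < p_up, 1.0, -1.0)
    return s

rng = np.random.default_rng(3)
n = 8
h = 0.1 * rng.standard_normal(n)
J = 0.2 * rng.standard_normal((n, n))
J = (J + J.T) / 2
np.fill_diagonal(J, 0)
device_samples = rng.choice([-1.0, 1.0], size=(50, n))  # placeholder readouts
refined = gibbs_sweeps(h, J, device_samples, n_sweeps=10, rng=rng)
```

If the device samples are already close to the target Gibbs distribution, a few sweeps suffice to polish them; if they are far from it, the classical sweeps end up doing most of the work, which is the ambiguity pointed out above.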

Interestingly, gray-box models such as the fully-visible Boltzmann machine (FVBM) studied in Refs. [21, 25, 60] and illustrated in Fig. 3(b) may be a near-term solution

[Figure 3: panel (a) shows an RBM-based hybrid model with visible and hidden variables, together with test-set, corrupted, and restored images; panel (b) shows an FVBM-based hybrid model over visible variables only.]

FIG. 3. Examples of QAML implementations of different probabilistic graphical models, illustrating some of the challenges in near-term devices. (a) Restricted Boltzmann machines (RBMs) are key components in deep learning approaches such as deep belief networks. Training of these architectures could be significantly improved if one were able to sample from the joint probability distribution of visible and hidden variables, a computationally intractable step usually performed with approximate Markov chain Monte Carlo approaches such as Contrastive Divergence (CD) [85]. Although a quantum annealer can be used to generate such samples, the results need to be combined with those obtained from the classical ML pipeline component within the hybrid approach. This classical-quantum model compatibility challenge (Sec. III A) is illustrated in the experimental results on the right. Lav is the average log-likelihood and represents the performance metric; the higher its value, the better the model is expected to represent the training dataset. As demonstrated in Ref. [20], estimating the instance-dependent effective temperature, Teff, is key in this QAML approach. In all three lines, the first 250 iterations are done with the cheapest and most widely used version of CD (denoted CD-1). The line with open circles represents the case where all 500 iterations in the training are performed with CD-1, and is used as a baseline comparison for our QAML approach. Using our quantum-assisted learning (QuALe) algorithm, where Teff is estimated at each learning iteration (crosses), we can restart from the point where the classical CD-1 left off and improve the performance of the model with respect to the baseline. Assuming instead that Teff is the physical temperature of the device, TDW2X = 0.033 (triangles), such a restart technique fails. (b) Visible-only generative models are often used either because of their tractability [43] or because of the convexity of the associated optimization problem, as in fully-visible Boltzmann machines (FVBMs). In the latter, convexity does not mean tractability; exact learning would still require computation of the partition function, which is intractable for nontrivial topologies. Even though there exist fast [86] and consistent [87] approximations to the required gradients, we here consider quantum annealing as an alternative tool to sample from nontrivial topologies. In Ref. [21], we implemented and trained a hybrid QAML model by introducing a gray-box model (see Sec. III B) expected to be robust to noise in the programmable parameters and to deviations from the desired Boltzmann distribution, for example, due to non-equilibrium effects in the quantum dynamics. On the right, we show the capabilities of the generative model; two test datasets (leftmost column) are corrupted with different types of noise (red pixels, central column) and then restored on the quantum annealer (rightmost column).

to these issues. In a gray-box model, although we assume that samples come from a Gibbs-like distribution, we work directly at the level of the first- and second-moment statistics, without complete knowledge of the actual parameters that were implemented. Therefore, the emphasis is on the quality of the samples obtained from the quantum device and their closeness to the data. This has the potential to increase the resilience against perturbations in the programmable parameters. In fact, as long as the estimated gradients for stochastic gradient descent have a positive projection on the direction of the true gradients, the model moves towards the optimum. In cases where this is not so, it might still be possible to design algorithms that mix model-based and model-free information in a suitable way. For instance, a proxy could check whether the estimated gradients actually project on the correct direction; if not, the system can move in the opposite direction.
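A gray-box FVBM update can be sketched directly in terms of moment matching: only the first and second moments of the data and of the device samples enter the update, so the parameters actually realized in hardware never need to be known. The placeholder device samples and all sizes below are illustrative assumptions:

```python
import numpy as np

def fvbm_graybox_step(h, J, data, device_samples, lr=0.01):
    """
    Gray-box update for a fully-visible Boltzmann machine: the fields h move
    along the mismatch in first moments, the couplings J along the mismatch
    in second moments. `device_samples` stands in for configurations read out
    from the quantum annealer, whatever distribution it actually realizes.
    """
    m_data, m_dev = data.mean(axis=0), device_samples.mean(axis=0)
    c_data = data.T @ data / len(data)
    c_dev = device_samples.T @ device_samples / len(device_samples)
    return h + lr * (m_data - m_dev), J + lr * (c_data - c_dev)

rng = np.random.default_rng(2)
n = 5
data = rng.choice([-1.0, 1.0], size=(100, n))
device_samples = rng.choice([-1.0, 1.0], size=(100, n))  # placeholder readouts
h, J = np.zeros(n), np.zeros((n, n))
h, J = fvbm_graybox_step(h, J, data, device_samples)
```

The update direction depends only on observed statistics; as noted above, it still points towards the optimum whenever it keeps a positive projection on the true gradient, even under parameter misspecification in the device.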

Gray-box models with hidden variables could exploit all the available resources of the quantum device, while coping with its intrinsic noise and parameter misspecification. As an example, we used a gray-box approach to implement a quantum-assisted Helmholtz machine where the hidden variables are sampled from a D-Wave device; the framework is discussed in Sec. IV and described in detail in Ref. [38]. A caveat of the gray-box approach is that the final model is inevitably tailored to the quantum device used during the training. That is, any time we want to perform ML tasks of interest, such as reconstruction or generation of new images as shown in Fig. 3, we will need to use the same quantum device.

C. The curse of limited connectivity

The basic principle behind this challenge is that physical interactions are local in nature. Although engineering advances can push the degree of qubit connectivity in quantum annealers or gate-model quantum computers, required qubit-to-qubit interactions not available in the device will cost an overhead in computational resources. In the case of quantum annealers, a standard solution is to produce an embedding of the logical problem of interest into the physical layout, thereby increasing the number of required qubits. In the case of a gate-model quantum computer, the overhead comes in the number of SWAPs required to make distant qubits talk to each other [89]. In any architecture, this compilation requirement needs to be considered in the algorithmic design.
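A back-of-the-envelope picture of the gate-model overhead, assuming the simplest possible layout of a one-dimensional chain with nearest-neighbor coupling (the counting rule, not any specific compiler, is the point here):

```python
def swap_overhead(i, j):
    """SWAP gates needed to bring qubits i and j next to each other on a
    1-D chain with nearest-neighbor coupling only (simplest assumption)."""
    return max(abs(i - j) - 1, 0)

def total_overhead(pairs):
    """Total SWAPs to realize a list of two-qubit interactions, one by one."""
    return sum(swap_overhead(i, j) for i, j in pairs)

print(total_overhead([(0, 1), (0, 3), (2, 6)]))  # 0 + 2 + 3 = 5
```

Real compilers reuse qubit positions across gates, so this simple count is an upper bound per interaction rather than a tight estimate.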

In the case of quantum annealers, achieving the topological connectivity of the desired model is only half of the challenge. Another significant challenge is the problem of parameter setting associated with the additional interactions present in the embedded model. In other words, how does one set the new parameters such that the embedded model accurately represents the intended model? There are no known optimal solutions, although heuristic strategies have been proposed [90–92]. However, in the type of ML applications we are considering, there is a way out. The main goal in the training phase of a ML algorithm is to find the optimal parameters that minimize a certain performance metric, suggesting that ML itself is a parameter-setting procedure. This is precisely the demonstration in Ref. [21], where we show how to train models with arbitrary pairwise connectivity. In this case, the difficult task is not the embedding, which can be readily obtained by known heuristics, but rather the training of the whole device, implicitly solving the parameter-setting problem.
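A minimal sketch of one common parameter-setting heuristic for chain embeddings: spread the logical field over the chain and tie the chain with a ferromagnetic coupling. The even split and the fixed chain strength are simplifying assumptions of ours, not the refined strategies of Refs. [90–92].

```python
def embed_chain(h_logical, chain, chain_strength=2.0):
    """Spread the logical field h evenly over the physical qubits of a
    chain and tie them together with ferromagnetic couplings, so that
    the chain behaves as a single logical spin."""
    h_phys = {q: h_logical / len(chain) for q in chain}
    J_chain = {(chain[k], chain[k + 1]): -chain_strength
               for k in range(len(chain) - 1)}
    return h_phys, J_chain

# A logical spin represented by the three physical qubits 5, 9, 13:
h_phys, J_chain = embed_chain(0.6, chain=[5, 9, 13])
```

Choosing the chain strength is itself nontrivial: too weak and the chain breaks, too strong and it washes out the problem couplings, which is exactly the parameter-setting problem discussed above.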

To summarize, emphasis has been given to the embedding problem and to the mapping of the logical model of interest into physical hardware. An equally or even more important challenge is to determine how to set the parameters, including those associated with the embedding, such that the device samples from the desired distribution. This combined problem is what we call the curse of limited connectivity. In the case of gate-model quantum computers, Ref. [93] presents an illustrative experimental study highlighting this challenge. It presents a comparison and analysis of the trade-off between connectivity and quality of computation due to the aforementioned overhead in computational resources. A similar trade-off would need to be taken into consideration in implementations of QAML algorithms in near-term gate-model quantum computers.

D. Representation of complex ML datasets into near-term devices

Quantum information does not have to be encoded into binary observables (qubits); it could also be encoded into continuous observables [94]. There has been work in quantum ML that follows the latter direction [95, 96]. However, most available quantum computers do work with qubits, nicely resembling the world of classical computation. Datasets commonly found in industrial applications have a large number of variables that are not binary. For instance, images may have millions of pixels, where each pixel is a 3-dimensional vector and each entry of the vector is a number specifying the intensity of color. We refer to this kind of dataset as a complex ML dataset. A naive binarization of the data will quickly consume the qubits of any device with 100–1000 qubits. Many QAML algorithms [11, 14, 18] rely on amplitude encoding instead, a technique where continuous data are stored in the amplitudes of a quantum state. This provides an exponentially efficient representation upon which one could perform linear-algebra operations. Unfortunately, it is not clear how one could prepare arbitrary states of this kind on near-term quantum devices. Even reading out all the amplitudes of an output vector might kill or significantly hamper any speedup [22].
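The qubit cost of such a naive binarization is easy to estimate; the numbers below, for a megapixel RGB image with 8-bit color channels, are our own illustrative choice:

```python
# Naive binarization: one qubit per bit of every pixel channel.
pixels = 1000 * 1000      # a one-megapixel image
channels = 3              # RGB
bits_per_channel = 8      # intensities in {0, ..., 255}

qubits_needed = pixels * channels * bits_per_channel
print(qubits_needed)  # 24000000 -- orders of magnitude beyond a 100-1000 qubit device
```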

In this perspective, we argue that near-term QAML algorithms should rather aim at encoding continuous variables stochastically into abstract binary representations, a strategy we refer to as semantic binarization. In this approach, we use quantum states that can be prepared by near-term devices for sampling from unique quantum probability distributions that may capture correlations hard to model with conventional classical ML models. In the context of quantum annealers, such a design may allow sampling from non-trivial graph topologies that are usually avoided in favor of restricted ones; for example, bipartite graphs are favored in classical neural networks for convenience.

One way to obtain such an abstract representation is to use hybrid approaches where visible variables v are logically implemented by classical hardware, and hidden variables u are physically implemented by quantum hardware. However, this idea comes with further challenges. First, the issue of model compatibility described above would apply. Second, sampling hidden variables u from the posterior distribution P(u|v) may be highly problematic because the preparation of arbitrary quantum states is an open challenge. Finally, we might have to sample a binarization for each data point, and that would be impractical for near-term quantum computers. For instance, the standard ML dataset of handwritten digits, the Modified National Institute of Standards and Technology (MNIST) dataset, is composed of 60 000 training points; hence we would need to program the quantum device at least 60 000 times.

As we will see in the next section, deep learning may provide solutions to these challenges. We propose a new hybrid quantum-classical paradigm where the objective is to tackle most of the issues discussed here and, at the same time, to deal with complex ML datasets.

IV. THE QUANTUM-ASSISTED HELMHOLTZ MACHINE

The quantum-assisted Helmholtz machine (QAHM) is a framework for hybrid quantum-classical ML with the potential of coping with real-world datasets. We already pointed out some of the challenges in developing near-term QAML capable of competing with conventional ML pipelines. Most important are the encoding of continuous variables, the limited number of variables, and the need to prepare and measure quantum states for each data point. Here we show how some early ideas from the deep learning community can help avoid some of these difficulties. Details about the formalism and a preliminary implementation of the QAHM can be found in Ref. [38].

Consider a dataset and the task of modeling its empirical distribution with a generative model P(v) = Σ_u P(v|u) P_QC(u) (see Sec. II A for a brief introduction). Here v are the visible variables that represent the data and u are unobservable or hidden variables that serve to capture non-trivial correlations. It is common to use binary-valued stochastic hidden variables, as they can express the presence or absence of features in the data. The set of hidden variables can be partitioned into a sequence of layers that encode increasingly abstract features. In other words, P(v) is a deep neural network, called the generator network.

Here we suggest using a quantum device to model the most abstract representation of the data, that is, the deepest layers of the generator network. The samples obtained from a quantum device are described by the diagonal elements of a parametrized density matrix, P_QC(u) = 〈u|ρ|u〉. As an example, the density matrix could be parametrized by a quantum Gibbs distribution ρ = e^{−βH}/Z, where H is the Hamiltonian implemented in quantum hardware and Z is the corresponding partition function. The conditional distribution P(v|u) is then a classical neural network that transforms samples from the quantum device into samples with the same structure as those in the dataset. Hence, visible variables v could be continuous variables, discrete variables, or other objects, effectively tackling the challenge of representing complex data (see Sec. III D). Because the quantum device works on a lower-dimensional binary representation of the data, this model is also able to handle datasets whose dimensionality is much larger than the number of qubits available in hardware.
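This generative pipeline can be mimicked classically at toy scale. Below, an exactly enumerated Gibbs distribution stands in for P_QC(u) (where the QAHM would instead query a quantum device), and a single linear-Gaussian layer stands in for the classical network P(v|u); all sizes and parameters are arbitrary choices of ours.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
n_hidden, n_visible = 4, 6  # tiny, so the partition function is exact

# Stand-in prior P_QC(u) = exp(-E(u))/Z over +/-1 configurations.
J = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
J = (J + J.T) / 2
states = np.array(list(product([-1.0, 1.0], repeat=n_hidden)))
energies = -0.5 * np.einsum('si,ij,sj->s', states, J, states)
p = np.exp(-energies)
p /= p.sum()  # explicit normalization: Z is tractable at this size

# Generator P(v|u): one linear-Gaussian layer mapping bits to continuous v.
W = rng.normal(size=(n_visible, n_hidden))
u = states[rng.choice(len(states), size=100, p=p)]
v = u @ W.T + rng.normal(scale=0.1, size=(100, n_visible))
```

The point of the construction is visible in the last two lines: the discrete layer u lives in a small space (here 2^4 states), while the visible output v is continuous and can have whatever structure the data requires.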

Typically, learning algorithms for generative models attempt to maximize the average likelihood of the data. As pointed out in Sec. II A, this is not feasible in models with multiple layers of discrete hidden variables, as we would need to sample from the intractable posterior distribution P(u|v). A Helmholtz machine [97, 98] consists of a generator network along with a recognition network Q(u|v) that learns to approximate P(u|v). This is a key approach behind many variational learning and importance sampling algorithms employed nowadays [44–48, 50–53]. In principle, the recognition network can also be implemented as a deep neural network whose hidden layers are modeled by a quantum device. However, this design requires quantum state preparation and measurement of the hidden variables for each data point in the training set and for each learning iteration, a daunting process in near-term implementations. To avoid doing this, we implement the recognition network as a classical deep neural network. Restricting the recognition network to be classical is not a feature of the QAHM framework; rather, it is an option to speed up learning of large datasets assisted by serial quantum devices (e.g., quantum annealers). To force the recognition network Q(u|v) to be close to the true posterior P(u|v), a notion of distance between them is minimized at each learning iteration. Using the Kullback-Leibler divergence, for instance, leads to the so-called wake-sleep algorithm [51, 97, 99].
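For concreteness, the wake-sleep updates for a single-hidden-layer sigmoid belief network look as follows. This is a fully classical toy: a fixed factorized prior replaces the deepest layer that the QAHM would sample from quantum hardware, and all sizes and learning rates are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

nv, nh, lr = 8, 3, 0.1
Wg = rng.normal(scale=0.1, size=(nv, nh))  # generator  P(v|u)
Wr = rng.normal(scale=0.1, size=(nh, nv))  # recognition Q(u|v)
prior = np.full(nh, 0.5)                   # stand-in for the deepest layer
                                           # (a quantum device in the QAHM)

data = rng.integers(0, 2, size=(200, nv)).astype(float)

for _ in range(50):
    # Wake phase: recognize hidden causes of real data, fit the generator.
    u = (sigmoid(data @ Wr.T) > rng.random((len(data), nh))).astype(float)
    Wg += lr * (data - sigmoid(u @ Wg.T)).T @ u / len(data)
    # Sleep phase: dream (u, v) from the generator, fit the recognition net.
    ud = (prior > rng.random((len(data), nh))).astype(float)
    vd = (sigmoid(ud @ Wg.T) > rng.random((len(data), nv))).astype(float)
    Wr += lr * (ud - sigmoid(vd @ Wr.T)).T @ vd / len(data)
```

Each phase uses the simple delta rule: the wake step nudges the generator so that recognized causes reproduce the data, while the sleep step nudges the recognition network so that dreamed data recover their own causes.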

The general architecture of a type of QAHM is illustrated in Fig. 4. The recognition network (left) infers hidden variables via a bottom-up pass starting from the raw data. The most abstract representation is obtained either from a classical layer (near term) or from a quantum device (future implementations). The generator network generates samples of the visible variables via a top-down pass starting from samples obtained from a quantum device. The final model is an implicit density model when a gray-box approach is used to characterize the quantum hardware (see Sec. III B), but we can turn it into an explicit density model if further processing is employed (e.g., quantum annealing to seed Gibbs samplers). Tasks such as reconstruction, generation, and classification can also be implemented in the QAHM framework.

In Ref. [38], we tested these ideas using a D-Wave 2000Q quantum annealer for the generation of artificial images. For this task we used a sub-sampled 16 × 16 pixel version of the standard handwritten digit dataset MNIST. Each gray-scale pixel is characterized by an integer value in {0, . . . , 255}; we rescale this value to the range [−1, +1] and interpret it as a continuous variable. There are also 10 binary variables indicating membership in one of the classes. The 266 visible variables needed to encode this data could, in principle, be embedded directly on the D-Wave 2000Q using a FVBM and a much poorer representation of the data via a naive binarization. Yet we would have to choose a relatively sparse model topology, as we cannot embed an all-to-all connectivity for the 266 variables in the D-Wave 2000Q. The sparse connectivity and the absence of hidden variables can severely limit the ability to model the dataset. Our approach tackles both challenges and also enables the handling of larger datasets than would be possible on state-of-the-art quantum annealers.
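The pixel preprocessing amounts to an affine map from {0, …, 255} onto [−1, +1]:

```python
import numpy as np

def rescale(gray):
    """Map 8-bit intensities {0, ..., 255} to continuous values in [-1, +1]."""
    return np.asarray(gray) / 127.5 - 1.0

out = rescale([0, 128, 255])
print(out[0], out[-1])  # -1.0 1.0
```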

We used a classical recognition network and a quantum-assisted generator network, both with 266 visible variables and two hidden layers of 120 and 60 variables, respectively. The deepest layer of 60 variables was mapped to 1644 qubits of the D-Wave 2000Q using the approach in Ref. [21]. We ran the wake-sleep algorithm for 1000 iterations and generated samples with the quantum-assisted generator network. The generated images are shown in Fig. 4(a). Although these preliminary results cannot compete with state-of-the-art ML, the artificial data often resemble digits written by humans.

[Fig. 4 schematic: raw input data → hidden layers → compressed data; recognition (bottom-up) and generation (top-down) passes; classical pre- and post-processing around quantum processing (quantum dynamics, measurement); legend: visible variables, hidden variables, qubits; insets: corrupted and reconstructed images; panels (a) and (b).]

FIG. 4. Generation of handwritten digits with a type of quantum-assisted Helmholtz machine (QAHM). (Left) The QAHM is the framework we propose to model complex ML datasets in near-term devices. By complex, we refer to datasets where the number of variables is much larger than the number of qubits available in the quantum device, and where data may be continuous rather than discrete. The framework employs a quantum computer to model the deepest hidden layers, containing the most abstract representation of the data. This low-dimensional compact representation is where we believe the quantum device can capture non-trivial correlations and where quantum distributions might have a significant effect. The number of hidden variables in the deepest layers is much smaller than the number of visible variables, making it ideal for near-term implementation on early quantum technologies, either on quantum annealers or on universal gate-model quantum computers. θ indicates the parameters of the quantum computer to be learned, which control the samples obtained from it. Although we illustrate in panels (a) and (b) the realization on a quantum annealer from Ref. [38], extensions to gate-model quantum computers are in progress. (a) Artificial data generated by a QAHM implemented on the D-Wave 2000Q, trained on a sub-sampled version of the MNIST dataset with 16 × 16 continuous-valued pixels and 10 binary variables indicating the class in {0, . . . , 9}. Both recognition and generator networks have 266 visible variables and two layers of 120 and 60 hidden variables, respectively. The samples are generated from the final model by first sampling the deepest layer with the D-Wave 2000Q, and then transforming those samples through the classical part of the generator network. These experiments use 1644 qubits of the D-Wave 2000Q quantum annealer. Some of the samples resemble blurry variations of digits written by humans; this problem affects other approaches as well. (b) Samples from the MNIST dataset that are closest in Euclidean distance to those generated by our model. The model does not simply memorize the training set, but rather reproduces its statistics. In future work, the QAHM will be fine-tuned to provide sharper results.

Figure 4(b) shows the images in the training set that are closest in Euclidean distance to the generated samples. We can see that the artificial images generated by the network are not merely copies of the training set; instead, they present variations and, in some cases, novelty, reflecting the generalization capability of the model. While the artificial data may also look blurry, this problem affects other approaches as well. Only the recent development of GANs [55] led to much sharper artificial images.

V. SUMMARY

Machine learning (ML) has been presented as one of the applications with commercial value for near-term technologies. However, there seems to be a disconnect between the quantum algorithms proposed in much of the literature and the needs of the ML community. While most of the quantum algorithms for ML show that quantum computers have the potential of being very efficient at doing linear algebra (e.g., [11, 13–18, 84]), as discussed in Ref. [22], these proposals do not address the issues related to any near-term implementation. More importantly, to date there are no concrete benchmarks indicating that such work is close to outperforming its conventional classical ML counterparts. In this perspective we stress this disconnect and provide our views on key aspects to consider towards building a robust readiness roadmap for QAML in near-term devices.

If a demonstration of quantum advantage on industrial applications is a first milestone to be pursued, we emphasize the need to move away from the popular and tractable ML implementations. We should rather look for applications that are highly desirable, but not so popular because of their intractability. It is in this domain where we believe quantum computers can have a significant impact on ML. In that sense, quantum speedup in itself is not enough; if the chosen ML applications are tractable to great accuracy with classical ML methods (as is the case in Ref. [27]), then the number of qubits required to tackle industrial-scale applications may be far larger than those available in near-term devices.

As an example of intractable applications, in Sec. II we presented the case of sampling from complex probability distributions with quantum devices. There is potential here for boosting the training of generative models in unsupervised or semi-supervised learning. Another approach we suggested is to explore applications where quantum distributions naturally fit the model describing the data correlations. That seems to be the case for some datasets from the field of cognitive sciences (see, e.g., Refs. [75, 79, 80, 100]; also [76–78] and references therein). Other hard ML problems have been mentioned elsewhere [37]. We think working on any of these currently intractable applications will yield a higher payoff towards demonstrating that quantum models implemented in near-term devices might surpass models trained with classical resources.

Here we focused on some of the most pressing challenges we foresee in near-term implementations. For instance, the limited qubit connectivity will result in an overhead of qubits in the adiabatic model, and an overhead of gate operations in the gate model. It is also important to take into consideration the model complexity each physical hardware might present, which could have significant consequences for the ML task. For instance, applications to cognitive sciences might require a universal quantum computer capable of preparing and fine-tuning tailored quantum distributions, while applications to generative modeling might only need sampling from a quantum Gibbs distribution. For the latter, there already exist proposals with both quantum annealer and gate-model quantum computer architectures.

Coping with the challenges presented in Sec. III is certainly an ongoing research activity. One key strategy proposed here towards the near-term demonstration of quantum advantage is the development of hybrid quantum-classical algorithms capable of exploiting the best of both worlds. In this perspective, we also put forward a new framework for such hybrid QAML algorithms, referred to here as the quantum-assisted Helmholtz machine [38].

This new approach aims to solve some of the most pressing issues towards handling industrial-scale datasets with a large number of continuous variables. It is motivated by the idea that a quantum computer should be used only to tackle the most abstract representation of the data, after trimming the information that can be handled classically. Here we use a deep neural network to transform the large continuous dataset into a new abstract discrete dataset with reduced dimensionality. It is to this abstract representation that we apply the quantum computer. This approach can be used to solve practical ML tasks, including reconstruction, classification, and generation of images.

Certainly, more work is needed to address the question of identifying the first killer ML application that can be implemented in near-term quantum computers with on the order of a few thousand qubits. Finding intractable problems and developing new hybrid QAML algorithms that tackle the challenges of working with real-world devices is what we find to be the ideal scenario. We hope the community makes a leap in this direction now that more powerful and larger quantum annealers and gate-model quantum computers are becoming available to the scientific community.

ACKNOWLEDGEMENTS

The work of A.P.-O., J.R.-G., and M.B. was supported in part by the AFRL Information Directorate under grant F4HBKC4162G001, the Office of the Director of National Intelligence (ODNI), the Intelligence Advanced Research Projects Activity (IARPA), via IAA 145483, and the U.S. Army TARDEC under the "Quantum-assisted Machine Learning for Mobility Studies" project. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, AFRL, U.S. Army TARDEC, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. M.B. was partially supported by the UK Engineering and Physical Sciences Research Council (EPSRC) and by Cambridge Quantum Computing Limited (CQCL). The authors would like to thank Leonard Wossnig, Jonathan Romero, and Max Wilson for useful feedback on an early version of this manuscript.

[1] Masoud Mohseni, Peter Read, Hartmut Neven, Sergio Boixo, Vasil Denchev, Ryan Babbush, Austin Fowler, Vadim Smelyanskiy, and John Martinis, "Commercialize quantum technologies in five years," Nature 543, 171–174 (2017).

[2] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems (2012) pp. 1097–1105.

[3] Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine 29, 82–97 (2012).

[4] Jesse Levinson, Jake Askeland, Jan Becker, Jennifer Dolson, David Held, Soeren Kammel, J Zico Kolter, Dirk Langer, Oliver Pink, Vaughan Pratt, et al., "Towards fully autonomous driving: Systems and algorithms," in Intelligent Vehicles Symposium (IV), 2011 IEEE (IEEE, 2011) pp. 163–168.

[5] Andre Esteva, Brett Kuprel, Roberto A Novoa, Justin Ko, Susan M Swetter, Helen M Blau, and Sebastian Thrun, "Dermatologist-level classification of skin cancer with deep neural networks," Nature 542, 115–118 (2017).

[6] Travers Ching, Daniel S. Himmelstein, Brett K. Beaulieu-Jones, Alexandr A. Kalinin, Brian T. Do, Gregory P. Way, Enrico Ferrero, Paul-Michael Agapow, Wei Xie, Gail L. Rosen, Benjamin J. Lengerich, Johnny Israeli, Jack Lanchantin, Stephen Woloszynek, Anne E. Carpenter, Avanti Shrikumar, Jinbo Xu, Evan M. Cofer, David J. Harris, Dave DeCaprio, Yanjun Qi, Anshul Kundaje, Yifan Peng, Laura K. Wiley, Marwin H. S. Segler, Anthony Gitter, and Casey S. Greene, "Opportunities and obstacles for deep learning in biology and medicine," bioRxiv (2017), 10.1101/142760.

[7] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al., "Human-level control through deep reinforcement learning," Nature 518, 529–533 (2015).

[8] Hartmut Neven, Vasil S Denchev, Marshall Drew-Brook, Jiayong Zhang, William G Macready, and Geordie Rose, "Binary classification using hardware implementation of quantum annealing," in Demonstrations at NIPS-09, 24th Annual Conference on Neural Information Processing Systems (2009) pp. 1–17.

[9] Zhengbing Bian, Fabian Chudak, William G Macready, and Geordie Rose, The Ising Model: Teaching an Old Problem New Tricks, Tech. Rep. (D-Wave Systems, 2010).

[10] Misha Denil and Nando De Freitas, "Toward the implementation of a quantum RBM," NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011).

[11] Nathan Wiebe, Daniel Braun, and Seth Lloyd, "Quantum algorithm for data fitting," Physical Review Letters 109, 050505 (2012).

[12] Kristen L. Pudenz and Daniel A. Lidar, "Quantum adiabatic machine learning," Quantum Information Processing 12, 2027–2070 (2013).

[13] Seth Lloyd, Masoud Mohseni, and Patrick Rebentrost, "Quantum algorithms for supervised and unsupervised machine learning," arXiv:1307.0411 (2013).

[14] Patrick Rebentrost, Masoud Mohseni, and Seth Lloyd, "Quantum support vector machine for big data classification," Phys. Rev. Lett. 113, 130503 (2014).

[15] Guoming Wang, "Quantum algorithm for linear regression," Physical Review A 96, 012335 (2017).

[16] Z. Zhao, J. K. Fitzsimons, and J. F. Fitzsimons, "Quantum assisted Gaussian process regression," arXiv:1512.03929 [quant-ph] (2015).

[17] Seth Lloyd, Masoud Mohseni, and Patrick Rebentrost, "Quantum principal component analysis," Nature Physics 10, 631–633 (2014).

[18] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione, "Prediction by linear regression on a quantum computer," Physical Review A 94, 022342 (2016).

[19] Nathan Wiebe, Ashish Kapoor, and Krysta M. Svore, "Quantum deep learning," arXiv:1412.3489 (2015).

[20] Marcello Benedetti, John Realpe-Gomez, Rupak Biswas, and Alejandro Perdomo-Ortiz, "Estimation of effective temperatures in quantum annealers for sampling applications: A case study with possible applications in deep learning," Phys. Rev. A 94, 022308 (2016).

[21] Marcello Benedetti, John Realpe-Gomez, Rupak Biswas, and Alejandro Perdomo-Ortiz, "Quantum-assisted learning of hardware-embedded probabilistic graphical models," Phys. Rev. X 7, 041052 (2017).

[22] Scott Aaronson, "Read the fine print," Nature Physics 11, 291–293 (2015), commentary.

[23] Steven H. Adachi and Maxwell P. Henderson, "Application of quantum annealing to training of deep neural networks," arXiv:1510.06356 (2015).

[24] Nicholas Chancellor, Szilard Szoke, Walter Vinci, Gabriel Aeppli, and Paul A Warburton, "Maximum-entropy inference with a programmable annealer," Scientific Reports 6 (2016).

[25] Mohammad H. Amin, Evgeny Andriyash, Jason Rolfe, Bohdan Kulchytskyy, and Roger Melko, "Quantum Boltzmann machine," arXiv:1601.02036 (2016).

[26] Maria Kieferova and Nathan Wiebe, "Tomography and generative training with quantum Boltzmann machines," Phys. Rev. A 96, 062327 (2017).

[27] Iordanis Kerenidis and Anupam Prakash, "Quantum recommendation systems," arXiv:1603.08675 (2016).

[28] Peter Wittek and Christian Gogolin, "Quantum enhanced inference in Markov logic networks," Scientific Reports 7 (2017).

[29] Thomas E. Potok, Catherine Schuman, Steven R. Young, Robert M. Patton, Federico Spedalieri, Jeremy Liu, Ke-Thia Yao, Garrett Rose, and Gangotree Chakma, "A study of complex deep learning networks on high performance, neuromorphic, and quantum computers," arXiv:1703.05364 (2017).

[30] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione, "An introduction to quantum machine learning," Contemporary Physics 56, 172–185 (2015).

[31] Jonathan Romero, Jonathan P Olson, and Alan Aspuru-Guzik, "Quantum autoencoders for efficient compression of quantum data," Quantum Sci. Technol. 2, 045001 (2017).

[32] Jeremy Adcock, Euan Allen, Matthew Day, Stefan Frick, Janna Hinchliff, Mack Johnson, Sam Morley-Short, Sam Pallister, Alasdair Price, and Stasja Stanisic, "Advances in quantum machine learning," arXiv:1512.02900 (2015).

[33] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd, "Quantum machine learning," arXiv:1611.09347 (2016).

[34] Unai Alvarez-Rodriguez, Lucas Lamata, Pablo Escandell-Montero, Jose D Martin-Guerrero, and Enrique Solano, "Quantum machine learning without measurements," arXiv:1612.05535 (2016).

[35] Lucas Lamata, "Basic protocols in quantum reinforcement learning with superconducting circuits," Scientific Reports 7 (2017).


[36] Maria Schuld, Mark Fingerhuth, and Francesco Petruccione, "Quantum machine learning with small-scale devices: Implementing a distance-based classifier with a quantum interference circuit," arXiv:1703.10793 (2017).

[37] C. Ciliberto, M. Herbster, A. Davide Ialongo, M. Pontil, A. Rocchetto, S. Severini, and L. Wossnig, "Quantum machine learning: a classical perspective," arXiv:1707.08561 [quant-ph] (2017).

[38] Marcello Benedetti, John Realpe-Gomez, and Alejandro Perdomo-Ortiz, "Quantum-assisted Helmholtz machines: A quantum-classical deep learning framework for industrial datasets in near-term devices," arXiv:1708.09784 (2017).

[39] Marcello Benedetti, Delfina Garcia-Pintos, Yunseong Nam, and Alejandro Perdomo-Ortiz, "A generative modeling approach for benchmarking and training shallow quantum circuits," arXiv:1801.07686 (2018).

[40] Edward Farhi and Hartmut Neven, "Classification with quantum neural networks on near term processors," arXiv:1802.06002 (2018).

[41] Yoshua Bengio, Aaron Courville, and Pascal Vincent, "Representation learning: A review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 1798–1828 (2013).

[42] Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio, "Why does unsupervised pre-training help deep learning?" Journal of Machine Learning Research 11, 625–660 (2010).

[43] Ian Goodfellow, "NIPS 2016 tutorial: Generative adversarial networks," arXiv:1701.00160 (2016).

[44] Diederik P Kingma and Max Welling, "Auto-encoding variational Bayes," arXiv:1312.6114 (2013).

[45] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra, "Stochastic backpropagation and approximate inference in deep generative models," arXiv:1401.4082 (2014).

[46] Andriy Mnih and Karol Gregor, "Neural variational inference and learning in belief networks," arXiv:1402.0030 (2014).

[47] Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, and Ole Winther, "Ladder variational autoencoders," in Advances in Neural Information Processing Systems (2016) pp. 3738–3746.

[48] Jason Tyler Rolfe, "Discrete variational autoencoders," arXiv:1609.02200 (2016).

[49] David H Ackley, Geoffrey E Hinton, and Terrence J Sejnowski, "A learning algorithm for Boltzmann machines," Cognitive Science 9, 147–169 (1985).

[50] Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh, "A fast learning algorithm for deep belief nets," Neural Computation 18, 1527–1554 (2006).

[51] Jorg Bornschein and Yoshua Bengio, "Reweighted wake-sleep," arXiv:1406.2751 (2014).

[52] Jorg Bornschein, Samira Shabanian, Asja Fischer, and Yoshua Bengio, "Bidirectional Helmholtz machines," in International Conference on Machine Learning (2016) pp. 2511–2519.

[53] Ruslan Salakhutdinov and Geoffrey Hinton, "Deep Boltzmann machines," in Artificial Intelligence and Statistics (2009) pp. 448–455.

[54] Yoshua Bengio, Eric Laufer, Guillaume Alain, andJason Yosinski, “Deep generative stochastic networkstrainable by backprop,” in International Conference onMachine Learning (2014) pp. 226–234.

[55] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza,Bing Xu, David Warde-Farley, Sherjil Ozair, AaronCourville, and Yoshua Bengio, “Generative adversar-ial nets,” in Advances in neural information processingsystems (2014) pp. 2672–2680.

[56] Yoshua Bengio et al., “Learning deep architectures forai,” Foundations and trend in Machine Learning 2, 1–127 (2009).

[57] Dan Roth, “On the hardness of approximate reasoning,”Artificial Intelligence 82, 273–302 (1996).

[58] Yoshua Bengio, Gregoire Mesnil, Yann Dauphin, andSalah Rifai, “Better mixing via deep representations,”in Proceedings of the 30th International Conference onMachine Learning (ICML-13) (2013) pp. 552–560.

[59] V. Dumoulin, I. J. Goodfellow, A. C. Courville, and Y. Bengio, “On the challenges of physical implementations of RBMs,” in Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, July 27–31, 2014, Quebec City, Quebec, Canada (2014) pp. 1199–1205.

[60] Dmytro Korenkevych, Yanbo Xue, Zhengbing Bian, Fabian Chudak, William G Macready, Jason Rolfe, and Evgeny Andriyash, “Benchmarking quantum hardware for training of fully visible Boltzmann machines,” arXiv preprint arXiv:1611.04528 (2016).

[61] Anirban Narayan Chowdhury and Rolando D Somma, “Quantum algorithms for Gibbs sampling and hitting-time estimation,” arXiv preprint arXiv:1603.02940 (2016).

[62] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, “Deep learning,” Nature 521, 436–444 (2015).

[63] Brenden M Lake, Ruslan Salakhutdinov, and Joshua B Tenenbaum, “Human-level concept learning through probabilistic program induction,” Science 350, 1332–1338 (2015).

[64] Sergio Boixo, Sergei V Isakov, Vadim N Smelyanskiy, Ryan Babbush, Nan Ding, Zhang Jiang, John M Martinis, and Hartmut Neven, “Characterizing quantum supremacy in near-term devices,” arXiv preprint arXiv:1608.00263 (2016).

[65] Mile Gu, Karoline Wiesner, Elisabeth Rieper, and Vlatko Vedral, “Quantum mechanics can reduce the complexity of classical models,” Nature Communications 3, 762 (2012).

[66] Matthew S Palsson, Mile Gu, Joseph Ho, Howard M Wiseman, and Geoff J Pryde, “Experimentally modeling stochastic processes with less memory by the use of a quantum processor,” Science Advances 3, e1601302 (2017).

[67] Thomas J Elliott and Mile Gu, “Occam’s vorpal quantum razor: Memory reduction when simulating continuous-time stochastic processes with quantum devices,” arXiv preprint arXiv:1704.04231 (2017).

[68] Kenneth P Burnham and David R Anderson, Model selection and multimodel inference: a practical information-theoretic approach (Springer Science & Business Media, 2003).

[69] E. T. Jaynes, “Information theory and statistical mechanics,” Phys. Rev. 106, 620–630 (1957).


[70] Matjaž Perc, Jillian J Jordan, David G Rand, Zhen Wang, Stefano Boccaletti, and Attila Szolnoki, “Statistical physics of human cooperation,” Physics Reports (2017).

[71] John Realpe-Gomez, Giulia Andrighetto, Gustavo Nardin, and Javier Antonio Montoya, “Balancing selfishness and norm conformity can explain human behavior in large-scale prisoner’s dilemma games and can poise human groups near criticality,” Physical Review E, to appear (2018), arXiv preprint arXiv:1608.01291.

[72] Marc Mézard, Giorgio Parisi, and Riccardo Zecchina, “Analytic and algorithmic solution of random satisfiability problems,” Science 297, 812–815 (2002).

[73] Marc Mézard and Andrea Montanari, Information, Physics, and Computation (Oxford University Press, Inc., New York, NY, USA, 2009).

[74] Giacomo Mauro D’Ariano, Giulio Chiribella, and Paolo Perinotti, Quantum Theory from First Principles: An Informational Approach (Cambridge University Press, 2017).

[75] John Realpe-Gomez, “Quantum as self-reference,” arXiv preprint arXiv:1705.04307 (2017).

[76] Peter D Bruza, Zheng Wang, and Jerome R Busemeyer, “Quantum cognition: a new theoretical approach to psychology,” Trends in Cognitive Sciences 19, 383–393 (2015).

[77] Jerome R Busemeyer and Peter D Bruza, Quantum models of cognition and decision (Cambridge University Press, 2012).

[78] Diederik Aerts, Jan Broekaert, Liane Gabora, and Sandro Sozzo, “Quantum structures in cognitive and social science,” Frontiers in Psychology 7 (2016).

[79] Jerome R Busemeyer, Zheng Wang, and Richard M Shiffrin, “Bayesian model comparison favors quantum over standard decision theory account of dynamic inconsistency,” Decision 2, 1 (2015).

[80] Zheng Wang, Tyler Solloway, Richard M Shiffrin, and Jerome R Busemeyer, “Context effects produced by question orders reveal quantum nature of human judgments,” Proceedings of the National Academy of Sciences 111, 9431–9436 (2014).

[81] Carlos Gracia-Lázaro, Alfredo Ferrer, Gonzalo Ruiz, Alfonso Tarancón, José A Cuesta, Ángel Sánchez, and Yamir Moreno, “Heterogeneous networks do not promote cooperation when humans play a prisoner’s dilemma,” Proceedings of the National Academy of Sciences 109, 12922–12926 (2012).

[82] Mario Gutiérrez-Roig, Carlos Gracia-Lázaro, Josep Perelló, Yamir Moreno, and Ángel Sánchez, “Transition from reciprocal cooperation to persistent behaviour in social dilemmas at the end of adolescence,” Nature Communications 5 (2014).

[83] White House, “Artificial intelligence, automation, and the economy,” Executive Office of the President, https://obamawhitehouse.archives.gov/sites/whitehouse.gov/files/documents/Artificial-Intelligence-Automation-Economy.PDF (2016).

[84] Aram W Harrow, Avinatan Hassidim, and Seth Lloyd, “Quantum algorithm for linear systems of equations,” Physical Review Letters 103, 150502 (2009).

[85] Geoffrey E Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Computation 14, 1771–1800 (2002).

[86] Miguel A Carreira-Perpiñán and Geoffrey E Hinton, “On contrastive divergence learning,” in AISTATS, Vol. 10 (2005) pp. 33–40.

[87] Aapo Hyvärinen, “Consistency of pseudolikelihood estimation of fully visible Boltzmann machines,” Neural Computation 18, 2283–2292 (2006).

[88] Jack Raymond, Sheir Yarkoni, and Evgeny Andriyash, “Global warming: Temperature estimation in annealers,” arXiv:1606.00919 (2016).

[89] Robert Beals, Stephen Brierley, Oliver Gray, Aram W Harrow, Samuel Kutin, Noah Linden, Dan Shepherd, and Mark Stather, “Efficient distributed quantum computing,” in Proc. R. Soc. A, Vol. 469 (The Royal Society, 2013) p. 20120686.

[90] Vicky Choi, “Minor-embedding in adiabatic quantum computation: II. Minor-universal graph design,” Quantum Information Processing 10, 343–353 (2011).

[91] Alejandro Perdomo-Ortiz, Joseph Fluegemann, Rupak Biswas, and Vadim N Smelyanskiy, “A performance estimator for quantum annealers: gauge selection and parameter setting,” arXiv:1503.01083 (2015).

[92] Kristen L. Pudenz, “Parameter setting for quantum annealers,” arXiv:1611.07552 (2016).

[93] Norbert M Linke, Dmitri Maslov, Martin Roetteler, Shantanu Debnath, Caroline Figgatt, Kevin A Landsman, Kenneth Wright, and Christopher Monroe, “Experimental comparison of two quantum computing architectures,” Proceedings of the National Academy of Sciences, 201618020 (2017).

[94] Seth Lloyd and Samuel L Braunstein, “Quantum computation over continuous variables,” Physical Review Letters 82, 1784 (1999).

[95] Hoi-Kwan Lau, Raphael Pooser, George Siopsis, and Christian Weedbrook, “Quantum machine learning over infinite dimensions,” Physical Review Letters 118, 080501 (2017).

[96] S. Das, G. Siopsis, and C. Weedbrook, “Continuous-variable quantum Gaussian process regression and quantum singular value decomposition of non-sparse low rank matrices,” ArXiv e-prints (2017), arXiv:1707.00360 [quant-ph].

[97] Geoffrey E. Hinton, Peter Dayan, Brendan J. Frey, and Radford M. Neal, “The wake-sleep algorithm for unsupervised neural networks,” Science 268, 1158 (1995).

[98] Peter Dayan, Geoffrey E Hinton, Radford M Neal, and Richard S Zemel, “The Helmholtz machine,” Neural Computation 7, 889–904 (1995).

[99] Peter Dayan and Geoffrey E Hinton, “Varieties of Helmholtz machine,” Neural Networks 9, 1385–1403 (1996).

[100] Peter D Kvam, Timothy J Pleskac, Shuli Yu, and Jerome R Busemeyer, “Interference effects of choice on confidence: Quantum characteristics of evidence accumulation,” Proceedings of the National Academy of Sciences 112, 10645–10650 (2015).
