This thesis investigates applications of classical machine learning to quantum problems and the possibilities of combining machine learning and quantum computing to improve algorithms for solving quantum problems. In quantum physics and quantum chemistry, the high dimensionality of quantum problems poses a significant challenge. Due to the increased complexity of such problems, traditional algorithms may not solve them effectively. However, new insights and better computational methods have become possible with the development of machine learning methods. This thesis aims to develop new methods based on classical and quantum machine learning methods applied to quantum problems. The first part of the thesis shows how Bayesian machine learning can be applied to quantum research when the number of calculations is limited. To be more specific, I construct accurate global potential energy surfaces for polyatomic systems by using a small number of energy points and demonstrate methods to improve the accuracy of quantum dynamics approximations with few exact results. The second part of the thesis looks into combining machine learning and quantum computing to improve machine learning algorithms. I demonstrate the first practical application of quantum regression models and use the resulting models to produce accurate global potential energy surfaces for polyatomic molecules. Furthermore, I illustrate the effect of qubit entanglement for the resulting models. In addition, I propose a quantum-enhanced feature mapping algorithm that is proven to have a quantum advantage for specific classically unsolvable classification problems and is more computationally efficient than previous methods. Finally, I highlight the potential for combining machine learning and quantum computing to improve algorithms for solving quantum problems.

Quantum Machine Learning in Drug Discovery: Applications in Academia and Pharmaceutical Industries

The nexus of quantum computing and machine learning, known as quantum machine learning, offers the potential for significant advancements in chemistry. This review specifically explores the potential of quantum neural networks on gate-based quantum computers within the context of drug discovery. We discuss the theoretical foundations of quantum machine learning, including data encoding, variational quantum circuits, and hybrid quantum-classical approaches. Applications to drug discovery are highlighted, including molecular property prediction and molecular generation. We provide a balanced perspective, emphasizing both the potential benefits and the challenges that must be addressed.

These authors contribute equally to this article. Additional affiliations: Department of Chemistry, Yale University, New Haven, CT 06520, USA; Yale Quantum Institute, Yale University, New Haven, CT 06511, USA. Abbreviations: QML.

1 Introduction

1.1 Quantum Computing

In this introduction, we discuss the general methodology of quantum computing based on unitary transformations (gates) of quantum registers, which underpin the potential advancements in computational power over classical systems. We introduce the unique properties of quantum bits, or qubits; quantum calculations implemented by algorithms that evolve qubit states through unitary transformations, followed by measurements that collapse the superposition states to produce specific outcomes; and, lastly, the challenges faced in practical quantum computing limited by noise, together with hybrid approaches that integrate quantum and classical computing to address current limitations. This introductory discussion sets the stage for a deeper exploration into quantum computing for machine learning applications in subsequent sections.

Calculations with quantum computers generally require evolving the state of a quantum register by applying a sequence of pulses that implement unitary transformations according to a designed algorithm. A measurement of the resulting quantum state then collapses the coherent state, yielding a specific outcome of the calculation. To obtain reliable results, the process is typically repeated thousands of times, with averages taken over all of the measurements to account for quantum randomness and ensure statistical accuracy. This repetition is essential to achieve convergence, as each individual measurement only provides probabilistic information about the quantum state.

Quantum registers are commonly based on qubits. Like classical bits, qubits can be observed in either of two possible states (0 or 1). However, unlike classical bits, they can be prepared in superposition states, representing both $0$ and $1$ simultaneously with certain probabilities. In fact, the state of a single qubit can be described using ket notation, as follows:

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \tag{1}$$

where $\alpha$ and $\beta$ are complex amplitudes satisfying the normalization condition $|\alpha|^{2}+|\beta|^{2}=1$. Such a state represents the states $|0\rangle$ and $|1\rangle$ simultaneously with probabilities $|\alpha|^{2}$ and $|\beta|^{2}$, respectively.

Quantum registers with $n$ qubits represent states that are linear combinations of tensor products of qubit states. Therefore, a register with $n$ qubits represents $2^{n}$ states simultaneously, offering a representation with exponential advantage over classical registers. For instance, the state of a register with two qubits represents four states simultaneously, as follows:

$$|\psi\rangle = \alpha_{00}|0\rangle\otimes|0\rangle + \alpha_{01}|0\rangle\otimes|1\rangle + \alpha_{10}|1\rangle\otimes|0\rangle + \alpha_{11}|1\rangle\otimes|1\rangle, \tag{2}$$

with amplitudes satisfying the normalization condition $|\alpha_{00}|^{2}+|\alpha_{01}|^{2}+|\alpha_{10}|^{2}+|\alpha_{11}|^{2}=1$, and defining the probabilities $P_{jk}=|\alpha_{jk}|^{2}$ of observing the state collapsed onto state $|j\rangle\otimes|k\rangle$ when measuring the two qubits.
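
The two-qubit register described above can be simulated classically for small systems. The following is a minimal NumPy sketch, assuming a product state built with a Kronecker product and amplitudes ordered as 00, 01, 10, 11; the helper `qubit` and the chosen amplitudes are illustrative and not part of the original text.

```python
import numpy as np

ket0 = np.array([1.0, 0.0], dtype=complex)
ket1 = np.array([0.0, 1.0], dtype=complex)


def qubit(alpha, beta):
    """Return the normalized single-qubit state alpha|0> + beta|1>."""
    state = alpha * ket0 + beta * ket1
    return state / np.linalg.norm(state)


# Two illustrative single-qubit superpositions (hypothetical amplitudes).
q_a = qubit(1.0, 1.0)    # equal superposition
q_b = qubit(1.0, 1.0j)   # equal superposition with a relative phase

# Tensor product: the register holds 2**2 = 4 amplitudes (00, 01, 10, 11).
register = np.kron(q_a, q_b)

# Probabilities P_jk = |alpha_jk|^2 of observing |j>|k> upon measurement.
probabilities = np.abs(register) ** 2
for label, p in zip(["00", "01", "10", "11"], probabilities):
    print(f"P_{label} = {p:.3f}")
```

The number of stored amplitudes doubles with every added qubit, which is why this kind of direct simulation is only practical for small registers.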

Quantum gates, analogous to classical logic gates, are used to represent the effect of the pulses that manipulate the states according to unitary transformations. Commonly used gates for transformation of a single qubit are the gates represented by the Pauli matrices:

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{3}$$

For example, the $X$ (or NOT) gate flips the state of a qubit from $|0\rangle$ to $|1\rangle$, and vice versa. Another important class of transformations of a single qubit are the rotation gates $R_{x}(\theta)$, $R_{y}(\theta)$, and $R_{z}(\theta)$. The rotation around the $Y$-axis, for instance, is expressed as:

$$R_{y}(\theta) = e^{-i\theta Y/2} = \cos\left(\frac{\theta}{2}\right) I - i\sin\left(\frac{\theta}{2}\right) Y, \tag{4}$$

where $I$ is the identity matrix.
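
As a quick numerical check of the single-qubit gates discussed above, the sketch below builds the Pauli matrices and the standard rotation $R_{y}(\theta) = \cos(\theta/2) I - i\sin(\theta/2) Y$ in NumPy. The angle `theta = 0.7` is an arbitrary illustrative value.

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)


def ry(theta):
    """Rotation about the Y axis: cos(theta/2) I - i sin(theta/2) Y."""
    return np.cos(theta / 2) * I - 1j * np.sin(theta / 2) * Y


# The X (NOT) gate flips |0> into |1>.
flipped = X @ ket0

# A rotation gate is unitary and maps |0> to cos(theta/2)|0> + sin(theta/2)|1>.
theta = 0.7
U = ry(theta)
rotated = U @ ket0

print("X|0> =", flipped)
print("Ry(theta)|0> =", rotated)
print("unitary:", np.allclose(U.conj().T @ U, I))
```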

For multi-qubit systems, universal computing can be achieved with single-qubit gates (such as Pauli or rotation gates) and the two-qubit CNOT (Controlled-NOT) gate, defined as follows:

$$\mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}. \tag{5}$$

Measurements of individual qubits project the superposition states onto one of the basis states of the operator used for the measurement. Averages over many measurements (i.e., many shots) are required to achieve statistical convergence of the calculation. For example, for a single qubit prepared in state $|\psi\rangle$, given by Eq. 1, measurements with the $Z$ operator yield either $1$ when the state is collapsed by the measurement onto state $|0\rangle$ (with probability $|\alpha|^{2}$), or $-1$ when the state is collapsed onto state $|1\rangle$ (with probability $|\beta|^{2}$).
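
The need for many shots can be illustrated with a small classical simulation. The sketch below assumes a qubit with $|\alpha|^{2}=0.8$ and $|\beta|^{2}=0.2$ (hypothetical values), samples outcomes $+1$ and $-1$ with those probabilities, and compares the sample mean with the exact expectation value $\langle Z\rangle = |\alpha|^{2}-|\beta|^{2}$.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical amplitudes of the single-qubit state alpha|0> + beta|1>.
alpha, beta = np.sqrt(0.8), 1j * np.sqrt(0.2)
p0, p1 = abs(alpha) ** 2, abs(beta) ** 2

# Exact expectation value of Z: (+1) * p0 + (-1) * p1.
exact = p0 - p1


def estimate_z(shots):
    """Average of simulated Z measurements (+1 for |0>, -1 for |1>)."""
    outcomes = rng.choice([1, -1], size=shots, p=[p0, p1])
    return outcomes.mean()


for shots in (10, 1_000, 100_000):
    print(f"shots={shots:>6}  estimate={estimate_z(shots):+.4f}  exact={exact:+.4f}")
```

The statistical error of the estimate shrinks roughly as the inverse square root of the number of shots, which is why thousands of repetitions are typical.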

Qubits can also be prepared in entangled states, such as the two-qubit state $|\psi\rangle=\alpha_{00}|0\rangle\otimes|0\rangle+\alpha_{11}|1\rangle\otimes|1\rangle$, where measurement of one of the two qubits collapses the state of both qubits in a highly correlated way, so that both qubits end up in the same collapsed state (both $|0\rangle$, or both $|1\rangle$, but never one $|0\rangle$ and the other $|1\rangle$). These quantum properties offer great potential for computational advantage over classical computers and thus could lead to significant advancements in many areas of chemistry, and beyond.
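
A correlated two-qubit state of this kind can be prepared with the gates introduced earlier. The following sketch is one possible construction, assuming an $R_{y}(\pi/2)$ rotation on the first qubit followed by a CNOT with the first qubit as control; the sampling step only mimics measurement statistics and is not a hardware simulation.

```python
import numpy as np


def ry(theta):
    """Real matrix form of the rotation about the Y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


# CNOT with the first qubit as control (basis order 00, 01, 10, 11).
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
I2 = np.eye(2)
ket00 = np.array([1.0, 0.0, 0.0, 0.0])

# Put the first qubit in an equal superposition, then entangle with CNOT.
state = CNOT @ np.kron(ry(np.pi / 2), I2) @ ket00
probs = state ** 2

# Simulated joint measurements: only "00" and "11" should ever appear.
rng = np.random.default_rng(seed=1)
samples = rng.choice(["00", "01", "10", "11"], size=1000, p=probs)
print(dict(zip(*np.unique(samples, return_counts=True))))
```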

Quantum algorithms can achieve significant speed-ups compared to their classical counterparts. For example, the Quantum Fourier Transform (QFT) 1 can enable exponential speed-up when compared to the best-known classical Fourier transform algorithms. Algorithms like Quantum Phase Estimation 2 and Shor’s algorithm 3 , 4 can also enable factorization of large numbers with exponential quantum advantage. Amplitude amplification techniques, such as those used in Grover’s algorithm, provide a quadratic speed-up for unstructured search problems, while the Harrow-Hassidim-Lloyd (HHL) algorithm 5 solves linear systems within bounded error with a runtime that scales logarithmically with the system size under suitable conditions (e.g., sparse, well-conditioned matrices), highlighting the potential of quantum computing to outperform classical methods in a wide range of applications. Achieving quantum advantage over classical algorithms with these methods, however, would require fault-tolerant quantum computers that are not currently available.

Due to the current limitations of quantum hardware, including noise and limited qubit counts, significant efforts have been focused on near-term calculations based on hybrid quantum-classical approaches where only part of the calculation is performed on the quantum computer while the rest of the computation is delegated to conventional high-performance computers. For example, variational algorithms, such as the Variational Quantum Eigensolver (VQE) 6 and Quantum Imaginary Time Evolution (QITE),  7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 implement hybrid quantum-classical approaches. These algorithms generate quantum states and employ classical computations to combine the results of the measurements performed on the quantum states. This synergy leverages the strengths of both quantum and classical resources, making it feasible to solve problems with current noisy intermediate-scale quantum (NISQ) devices.

Despite significant advances in the field, an outstanding challenge is to achieve advantage over classical high performance computing. One promising direction is the use of quantum computers to implement machine learning algorithms. Harnessing the speed-up of quantum algorithms could address complex problems in data analysis and pattern recognition. Given that quantum computing has many potential applications in chemistry and biological science, there is a great deal of hope that quantum machine learning (QML) can be extended to these areas of research.

1.2 Machine Learning

Machine learning (ML) algorithms are able to learn from data, where learning in this context can be defined as follows: “a program is said to learn from experience E with respect to some class of tasks T and performance measure P , if its performance at tasks in T , as measured by P , improves with experience E ” 17 . In practice, machine learning can be used to approximate a function of the input data to predict some variable (e.g., predict chemical toxicity from molecular features)  18 , 19 , or can be used to learn the distribution of the input data to generate synthetic data akin to the training distribution (e.g., generating virtual compounds with specific drug-like properties) 20 .

There are now many machine learning methods that have demonstrated exceptional, unprecedented abilities in many areas of research pertaining to drug development, with AlphaFold 2 and its later iterations being particularly recognizable 21 , 22 . AlphaFold is able to predict protein structures from their input sequences with high accuracy, although it is less capable in cases where the input sequence corresponds to a structure that is not well represented in the training distribution. Nonetheless, there is a lot of excitement and anticipation that AlphaFold will enable a lot of innovation within the domains of studying protein dynamics and hit identification in drug discovery 23 .

Machine learning has become pervasive in cheminformatics, and many tools have been developed to predict molecular properties, generate compounds with prespecified properties, and ultimately reduce an incredibly vast chemical search space to something tractable given the specific task at hand 24 . Specifically, many efforts leverage machine learning to reveal molecular mechanisms 25 , analyze complex biochemical data 26 , process and optimize chemical data 27 , predict protein structure 21 , 22 , perform virtual screening and drug design 28 , 29 , and carry out protein-ligand docking 30 , as well as many other tasks 31 .

1.3 Quantum Neural Networks

Machine learning has had a transformative effect on all facets of modern life and has led to increasing computational demands, thus motivating the development of QML methods. The promising capabilities of quantum computing have already motivated the development of quantum analogs for a wide range of classical machine learning methods. Bayesian inference 32 , least-squares fitting 33 , principal component analysis 34 , and support vector machines 35 are some of the algorithms for which quantum counterparts have already been developed. While quantum analogs for these traditional ML methods have a demonstrable quantum speed-up,  36 some of the most awe-inspiring advances are due to artificial neural networks (ANNs).

Perhaps the earliest discussions of quantum neural networks (QNNs) were motivated by studies of neural function through the lens of quantum mechanics 37 . Since then, the field has evolved to exploiting the computational parallelization enabled by superposition states and entanglement 38 . In the early stages of research for QNNs, much effort was dedicated towards developing quantum systems that preserved the mechanisms of classical ANNs 39 , 40 , 41 . However, those efforts have largely failed to reconcile the linear dynamics of a quantum state evolving through a circuit and the non-linear behavior of classical neural networks 42 . Increasingly, the field has consolidated around the use of variational quantum circuits to learn data representations 43 rather than directly creating a quantum analog of a neural network. Accordingly, quantum versions of the most popular classical neural network architectures, such as convolutional neural networks (QCNNs), graph neural networks (QGNNs), variational autoencoders (QVAEs), and generative adversarial networks (QGANs) have been realized and centered around variational quantum circuits.


QNNs require data encoding, variational quantum gates with learnable parameters $\theta$, and measurements, as depicted in Figure 1. Data encoding converts classical data into a quantum state. The choice of strategy for data encoding can be of paramount importance in QNNs, as it can significantly affect performance and impact the underlying computational complexity. While other data encoding strategies exist 44 , 45 , three of the most popular methods are discussed in the following subsections.

1.3.1 Basis Encoding

Basis encoding is a straightforward and inexpensive method to encode binary data into a quantum system. Explicitly, let $\mathcal{D}$ be a classical binary dataset such that each element $x^{m}\in\mathcal{D}$ is an $N$-bit binary string of the form $x^{m}=(b^{m}_{1},b^{m}_{2},\cdots,b^{m}_{N})$, with $b^{m}_{j}=0$ or $1$. Then the classical dataset can be represented by the quantum state $|\mathcal{D}\rangle$ of $N$ qubits, where $M$ is the total number of basis states used for the encoding:

$$|\mathcal{D}\rangle = \frac{1}{\sqrt{M}}\sum_{m=1}^{M}|x^{m}\rangle. \tag{6}$$
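
To make basis encoding concrete, the sketch below builds the state vector for a small binary dataset as an equal superposition of the basis states labeled by each bit string. It is a classical illustration only, assuming distinct bit strings; the function name `basis_encode` and the example dataset are hypothetical.

```python
import numpy as np


def basis_encode(dataset):
    """Equal superposition of the basis states |x^m> for distinct N-bit strings."""
    n_qubits = len(dataset[0])
    state = np.zeros(2 ** n_qubits)
    for bits in dataset:
        # Interpret the bit string as the index of a computational basis state.
        index = int("".join(str(b) for b in bits), 2)
        state[index] = 1.0
    # Normalization gives the 1/sqrt(M) prefactor for M distinct strings.
    return state / np.linalg.norm(state)


dataset = [(0, 1, 1), (1, 0, 0)]   # two 3-bit samples -> 3 qubits
state = basis_encode(dataset)
print(np.nonzero(state)[0], state[np.nonzero(state)])
```

Each sample occupies exactly one basis state, so $N$ qubits are needed for $N$-bit strings regardless of how many samples are stored.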



1.3.2 Angle Encoding

Unlike basis encoding, where the data is restricted to binary values, angle encoding allows data to take the form of real, floating-point numbers. This encoding method entails rotating the state of a qubit around an axis of the Bloch sphere by an angle corresponding to the classical data. Explicitly, for an element $x^{m}$ of a classical dataset $\mathcal{D}$ with $x^{m}\in[0,2\pi]$, the value of $x^{m}$ may be encoded into a single qubit by a rotation operator:

$$|x^{m}\rangle = R_{k}(x^{m})|0\rangle = e^{-i x^{m}\sigma_{k}/2}|0\rangle, \tag{7}$$

where $k$ indicates the rotation axis (e.g., $k=y$) and $\sigma_{k}\in\{X, Y, Z\}$ is the corresponding Pauli matrix.


After scaling, the angles can be encoded with the R x subscript 𝑅 𝑥 R_{x} italic_R start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT or R y subscript 𝑅 𝑦 R_{y} italic_R start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT gates, as shown in Figure 2(b) .
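
Angle encoding with $R_{y}$ gates can be sketched as follows, using one qubit per feature and assuming the features have already been scaled to valid angles. The function `angle_encode` and the feature values are illustrative assumptions; the relation $R_{y}(x)|0\rangle = \cos(x/2)|0\rangle + \sin(x/2)|1\rangle$ is the standard one.

```python
import numpy as np


def ry(theta):
    """Real matrix form of the rotation about the Y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def angle_encode(features):
    """Encode each feature as a rotation angle on its own qubit."""
    state = np.array([1.0])
    for x in features:
        qubit = ry(x) @ np.array([1.0, 0.0])   # cos(x/2)|0> + sin(x/2)|1>
        state = np.kron(state, qubit)
    return state


# Three already-scaled features (hypothetical values) -> three qubits.
features = np.array([0.0, np.pi / 2, np.pi])
state = angle_encode(features)
print(np.round(state, 3))
```

The qubit count grows linearly with the number of features, in contrast to the logarithmic scaling of amplitude encoding.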

1.3.3 Amplitude Encoding

Amplitude encoding allows one to encode complex-valued floats into the amplitudes of a quantum state. Thus, for a given classical dataset $\mathcal{D}$, an L2-normalized complex vector $x\in\mathcal{D}$ of length $N$ can be encoded into $\log_{2}(N)$ qubits. Namely,

$$U_{x}|0\rangle^{\otimes \log_{2}(N)} = |\psi_{x}\rangle = \sum_{i=1}^{N} x_{i}|i\rangle, \tag{9}$$

where $x_{i}$ is the $i$-th element of $x$ and $|i\rangle$ is the $i$-th computational basis state.

Many quantum neural networks rely on this encoding strategy, as it enables an exponential reduction in the number of required bits to represent data, and thus has the potential to allow for a speed-up that is not possible on classical computers. Despite this, the unitary operator $U_{x}$ shown in Equation 9 may demand a significant number of gates, a challenge discussed further in Section 6.2.
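
The qubit savings of amplitude encoding can be seen in a short classical sketch. It assumes zero-padding to the next power of two and L2 normalization; the function `amplitude_encode` is hypothetical and only computes the target amplitudes, not the gate sequence that would prepare them.

```python
import numpy as np


def amplitude_encode(x):
    """Return the normalized amplitude vector and the number of qubits needed."""
    x = np.asarray(x, dtype=complex)
    # Smallest number of qubits n with 2**n >= len(x) (at least one qubit).
    n_qubits = max(1, (len(x) - 1).bit_length())
    padded = np.zeros(2 ** n_qubits, dtype=complex)
    padded[: len(x)] = x
    norm = np.linalg.norm(padded)
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    return padded / norm, n_qubits


# Five features fit into three qubits (8 amplitudes, three of them padding).
amplitudes, n_qubits = amplitude_encode([3.0, 4.0, 0.0, 0.0, 12.0])
print(n_qubits, np.round(amplitudes.real, 4))
```

The sketch says nothing about the cost of state preparation, which is the practical bottleneck noted in the text.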

1.3.4 Variational Quantum Circuits and Readout

Variational quantum circuits (VQCs), also commonly known as parameterized quantum circuits (PQCs), are typically used to introduce learnable parameters θ 𝜃 \theta italic_θ of unitary gates (Fig.  3 ).


After the VQC, measurements are performed. Measurements typically undergo classical post-processing to obtain averages. The set of parameters $\theta$ is iteratively adjusted by a classical computer to minimize a cost function $C(\theta)$ defined by the average expectation values $\langle\phi|U^{\dagger}(\theta)\hat{O}U(\theta)|\phi\rangle$, as follows:

$$C(\theta) = f\left(\langle\phi|U^{\dagger}(\theta)\hat{O}U(\theta)|\phi\rangle\right), \tag{10}$$

where $|\phi\rangle$ are the encoded states, $U(\theta)$ is the ansatz of choice with learnable parameters, $\hat{O}$ is the measured observable, and the function $f$ is any classical post-processing function. The overall hybrid quantum-classical machine learning scheme is depicted in Figure 4.
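
The hybrid training loop can be illustrated with a one-qubit toy model. The sketch below is not the scheme of any cited work: it assumes angle encoding of a single input `x`, an ansatz consisting of one $R_{y}(\theta)$ gate, the observable $Z$, a squared-error post-processing function, and exact expectation values instead of shot averages. Gradients are obtained with the parameter-shift rule, which is exact for rotation gates of this form.

```python
import numpy as np

Z = np.array([[1.0, 0.0], [0.0, -1.0]])


def ry(theta):
    """Real matrix form of the rotation about the Y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def expectation(theta, x):
    """<phi| U(theta)^T Z U(theta) |phi> with |phi> = Ry(x)|0>, U = Ry(theta)."""
    phi = ry(x) @ np.array([1.0, 0.0])
    psi = ry(theta) @ phi
    return psi @ Z @ psi


def cost(theta, x, target):
    """Classical post-processing f: squared error of the expectation value."""
    return (expectation(theta, x) - target) ** 2


def cost_gradient(theta, x, target):
    """Chain rule combined with the parameter-shift rule for d<Z>/d(theta)."""
    shift = np.pi / 2
    d_exp = 0.5 * (expectation(theta + shift, x) - expectation(theta - shift, x))
    return 2.0 * (expectation(theta, x) - target) * d_exp


# Classical optimizer loop (plain gradient descent, hypothetical settings).
x, target = 0.3, 0.0
theta = 0.1
for step in range(200):
    theta -= 0.2 * cost_gradient(theta, x, target)

print(f"theta = {theta:.4f}, cost = {cost(theta, x, target):.2e}")
```

On hardware, each call to `expectation` would be replaced by an average over many shots, so the optimizer would see noisy cost values.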


2 Predictive Quantum Machine Learning

2.1 Quantum Graph Neural Networks

Graph neural networks (GNNs) are popular models in applications of machine learning methods to chemistry because molecules can be intuitively represented as graphs where nodes are atoms and edges are bonds (Figure 5). In a typical GNN, messages (i.e., features used to describe each node) are passed between neighboring nodes, ultimately resulting in an aggregated graph-level encoding which can subsequently be processed to predict some value (e.g., protein-ligand binding affinity, hERG activity, etc.). 46
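
A single round of classical message passing can be written in a few lines. The sketch below is a generic illustration, not a specific published architecture: it assumes a three-atom C-C-O heavy-atom graph, one-hot atom-type features, sum aggregation over neighbors plus a self-loop, a random (untrained) weight matrix, and a sum readout.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Heavy-atom graph C-C-O: adjacency matrix (1 = bond between atoms).
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]], dtype=float)

# Node features: one-hot atom types in the order [C, O].
features = np.array([[1, 0],
                     [1, 0],
                     [0, 1]], dtype=float)

# Untrained, randomly initialized weights mapping 2 features to 4 hidden units.
weights = rng.normal(size=(2, 4))


def message_passing(adjacency, features, weights):
    """Sum each node's own and neighboring features, then apply a ReLU layer."""
    messages = (adjacency + np.eye(len(adjacency))) @ features
    return np.maximum(messages @ weights, 0.0)


hidden = message_passing(adjacency, features, weights)

# Graph-level encoding: aggregate node embeddings (sum readout).
graph_embedding = hidden.sum(axis=0)
print(graph_embedding)
```

In a quantum variant, the learned transformation applied to the aggregated messages is the natural place to substitute a parameterized quantum circuit.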


QGNNs were first introduced with the Networked Quantum System. 47 In this system, a graph $\mathcal{G}=\{\mathcal{V},\mathcal{E}\}$ with the set of nodes $\mathcal{V}$ and edges $\mathcal{E}$ is defined as tensor products of Hilbert subspaces representing nodes and edges. The Hilbert space representing nodes, $\mathcal{H}_{\mathcal{V}}=\bigotimes_{v\in\mathcal{V}}\mathcal{H}_{v}$, and the space representing edges, $\mathcal{H}_{\mathcal{E}}=\bigotimes_{e\in\mathcal{E}}\mathcal{H}_{e}$, are joined to create the full networked Hilbert space $H_{\mathcal{G}}=\mathcal{H}_{\mathcal{V}}\otimes\mathcal{H}_{\mathcal{E}}$ that comprises the space for the complete graph. Since then, various quantum theoretical formulations of QGNNs have been introduced. 48 , 49 , 50 , 51 Alongside quantum graph convolutional networks, quantum learning on equivariant graphs has also been demonstrated 52 , 53 , which has been of increasing interest in classical ML for drug discovery 54 , 55 , 56 .

Equivariant QGNNs and hybrid quantum-classical QGNNs have been used to predict the HOMO-LUMO gap in the QM9 dataset, which can provide insights on molecular stability. 57 An interesting observation from this work is that comparisons of the QGNN models to their corresponding classical models with the same number of parameters show that the quantum models typically outperform the classical counterparts. Additionally, training of the quantum model is generally more efficient. These are exciting results that suggest favorable scalability and generalization of QGNNs, as previously suggested. 58 Another study 59 has implemented a hybrid QGNN to predict the formation energy of perovskite materials. While this method underperforms compared to the fully classical GNN, it has been pointed out that, because the method relies on amplitude encoding, advantages are expected to emerge once state preparation techniques improve.

Quantum isomorphic graph networks and quantum graph convolutional networks have been used to predict protein ligand binding affinities, showing that hybrid models already perform on par with state-of-the-art models.  60 In this work, features are amplitude encoded into a quantum state and a PQC replaces the classical multi-layer perceptron (MLP) to perform convolutions. The models provide a good balance between number of parameters and generalization.

QGNNs are truly promising methods. For example, Liao et al. 51 have analyzed quantum implementations of the Simple Graph Convolutional network 61 and the linear graph convolutional network 62 that exhibit quantum advantage in terms of both space and time complexity. As the utility of graph networks is extended to both small molecules and large protein structures alike, solutions with complexity advantages are expected to be the dominant driver of the success of QGNNs.

2.2 Quantum Convolutional Neural Networks

Convolutional neural networks (CNNs) gained initial popularity for their success in image detection and classification 63 . They have been applied in chemistry to predict molecular properties, interaction strengths, and other chemically significant tasks 64 . The most fundamental architectural components of CNNs are the kernels of convolutional layers 65 . Each kernel creates a linear combination of the values in the spatial neighborhood of a given voxel (i.e., a pixel in the 2D case or a point in a 3D grid) of the input data and then propagates the resulting scalar to a corresponding spatial index in the output array. The coefficients for this linear combination are learned throughout training and constitute the weights of the kernel, which are applied uniformly across the input voxels.
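
The kernel operation described above can be written explicitly. The following sketch implements a 2D "valid" convolution in the deep-learning convention (technically a cross-correlation, with no kernel flip); the input image and the fixed kernel are illustrative, whereas in a trained CNN the kernel entries would be learned.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output is a weighted sum of a patch."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            # Same kernel weights are applied at every spatial position.
            output[i, j] = np.sum(patch * kernel)
    return output


image = np.arange(16, dtype=float).reshape(4, 4)   # pixel (i, j) has value 4*i + j
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                   # difference along the diagonal
feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

Each output entry depends only on a small patch, which is the property that lets quantum variants load just a kernel-sized amount of data at a time.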

QCNNs were first introduced for quantum phase recognition 66 , outperforming existing approaches with a significantly reduced number of variational parameters, scaling as $O(\log(N))$ with $N$ the number of qubits. This initial success sparked significant interest, leading to the development of many QCNN variants 67 , 68 , 69 , 70 , 71 , tutorials 72 , 73 , 74 , 75 , and applications to a large range of complex tasks in many fields of science and technology. For example, in high energy physics, QCNNs have been used to classify particles with a level of accuracy and speed of convergence that surpasses classical CNNs with the same number of learnable parameters 76 . In the field of biochemistry, they have shown the ability to generate protein distance matrices 77 and predict protein-ligand binding affinities 78 , 60 , demonstrating their potential to contribute to our understanding of complex biological systems.

The appeal of QCNNs over many quantum counterparts of classical neural networks is multi-faceted. In a QCNN, the classical convolutional filters are replaced by quantum circuits (Figure  6 ).


In CNNs, the computation involves the discrete convolution between a relatively small kernel and the input data. This is attractive, as it allows the quantum approach to load only a small amount of information at a time onto quantum devices, as determined by the kernel size, which is of paramount importance during the NISQ era. This feature of QCNNs can be particularly useful in a biological context, as full-size feature maps would be too demanding.

Broadly speaking, there are two classes of QCNNs that could offer quantum advantage. The first class is akin to the general structure shown in Figure 6. 66 QCNNs with that structure incorporate pooling layers that halve the number of active qubits with each successive layer. This architectural choice involves only $O(\log(N))$ parameters and effectively circumvents the issue of barren plateaus, a significant challenge discussed further in Section 6.2. The second class can be termed Hybrid-QCNNs (HQCNNs). HQCNN models replace the forward pass of a convolutional filter with a quantum circuit, but perform pooling layers classically after a measurement. HQCNNs are popular choices since they allow for more classical control over the network, with the mixing of quantum and classical components potentially offering performance gains at the expense of the trainability and complexity benefits brought by the original QCNN architecture.
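
The logarithmic parameter scaling follows from simple counting: if every pooling layer halves the number of active qubits, only about $\log_{2}(N)$ layers fit before a single qubit remains. The sketch below assumes that each convolution-plus-pooling layer shares a fixed number of parameters across all qubit pairs; `params_per_layer` is a hypothetical value chosen for illustration.

```python
def qcnn_layer_sizes(n_qubits):
    """Number of active qubits after each pooling layer (halving, rounded up)."""
    sizes = [n_qubits]
    while sizes[-1] > 1:
        sizes.append((sizes[-1] + 1) // 2)
    return sizes


def parameter_count(n_qubits, params_per_layer):
    """Total parameters when each layer shares one fixed-size parameter set."""
    n_layers = len(qcnn_layer_sizes(n_qubits)) - 1
    return params_per_layer * n_layers


print(qcnn_layer_sizes(16))
for n in (16, 1024):
    print(n, "qubits ->", parameter_count(n, params_per_layer=6), "parameters")
```

Going from 16 to 1024 qubits multiplies the register size by 64 but only increases the layer count from 4 to 10 under these assumptions.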

QCNNs and HQCNNs offer distinct advantages that are attractive for chemical and pharmaceutical applications. While QCNNs require only $O(\log(N))$ parameters and avoid barren plateaus, this by itself does not make them advantageous over classical CNNs. A rigorous analysis of QCNNs, later extended to all QML models, has investigated the generalization bounds of these models 58 . The reported analysis offers a guide to determine whether a QML model can exhibit better performance on unseen (test) data when compared to its classical counterpart. It is shown that when a QML model achieves a small training error on a given task, while the classical model with the same training error is significantly more complex, then the QML model will most likely outperform the classical model on unseen data.

This simple guide is particularly useful for drug discovery applications where datasets can often be limited but good generalization is paramount for discovery of life-saving compounds. Given that in the NISQ era QML models can only include a limited number of parameters, it is commonplace and intuitive when designing QML models to compare their performance to a classical network with an equal number of parameters. Therefore, it is important to temper claims of advantage when comparing quantum and classical models, wherein the classical model might be heavily restricted for the sole purpose of a fair comparison with an equal number of parameters. Instead, it is more significant to identify tasks that satisfy the criteria that guarantee good generalization bounds 58 . Shifting focus to this task identification, we anticipate that applications that are more likely to benefit from demonstrable quantum advantage are those for which the training data is scarce.

HQCNNs operate differently and more flexibly than QCNNs, so the ways in which quantum advantage might be demonstrated are likely different from those for QCNNs. While the above criteria to identify potential generalization quantum advantage would still apply to HQCNNs, this becomes less straightforward as HQCNNs do not necessarily operate with $O(\log(N))$ parameters like their fully quantum counterparts. HQCNNs have been proposed to enable quantum speed-up in the CNN architecture (and neural network architectures in general) by directly calculating the inner product of the filter and input data (Fig. 7) 18 , 79 , 80 .


These approaches are attractive when searching for quantum advantage, as they are task-agnostic and the potential for realization on quantum hardware is dictated almost exclusively by data representation, a much more straightforward litmus test of advantage compared to that required for a generalizability advantage.

The success of classical CNNs in drug discovery has prompted the exploration of QCNNs, for example in biophysics, where the relatively large input data can be broken up into tractable quantum circuits using the HQCNN methodology. An early biophysical application of HQCNNs involved a model capable of predicting protein structure 77, where the sequence lengths of the protein chains range from 50 to 500 residues in the training set and from 50 to 266 residues in the testing set. The reported results indicate performance commensurate with predictions by the popular classical model DeepCov 81 for protein contact maps, while offering faster training convergence. Both Domingo et al. 78 and Dong et al. 60 trained HQCNNs to predict protein-ligand binding affinities. Domingo et al. demonstrated that their HQCNN architecture can reduce the number of parameters by 20% while maintaining performance. They noted that, depending on the hardware, this translates to a 20% to 40% reduction in training times. Similarly, Dong et al. demonstrated results competitive with force field-based MM/GBSA and MM/PBSA calculations while reducing the overall number of parameters relative to their classical counterparts.

In the work by Smaldone and Batista 18, an HQCNN has been trained to predict drug toxicity (Figure 8).


This work has demonstrated a method where the weights of a convolutional layer are learned via quantum circuits while performing the underlying matrix multiplication of discrete dot products with quadratic quantum speed-up. This strategy performs at the level of classical models with equal number of parameters and can be transferred to a classical CNN mid-training to allow for noiseless training convergence.

2.3 Looking Ahead: Quantum Machine Learning for Large Molecules

mRNA and antibody-based biotherapeutics are critical for the development of next-generation therapies, yet both pose complex challenges, such as determining mRNA structures and understanding antibody-antigen interactions. Quantum computing has already shown promise by predicting mRNA secondary structures (see Figure 9 ) 82 , and quantum neural networks are now being applied to tackle antibody-antigen interactions. Notably, Paquet et al. 83 introduced QuantumBound, a hybrid quantum neural network designed to predict the physicochemical properties of ligands within receptor-ligand complexes. Furthermore, Jin et al. 84 developed a QNN model to predict potential COVID-19 variant strains using available SARS-CoV-2 RNA sequences. These early successes highlight the potential of quantum neural networks to address key challenges in biotherapeutics.


3 Generative Quantum Machine Learning

3.1 Quantum Autoencoders

The primary motivation behind the development of autoencoders is to compress data into a latent space, reducing dimensionality while preserving essential information of the training data. Similarly, the original motivation for development of quantum autoencoders (QAEs) is to compress quantum data (Figure  10 ).


Variational autoencoders (VAEs), a specific type of autoencoder, have gained popularity for molecular generation due to their ability to learn compact representations of molecular structures and generate new molecules with similar properties. Quantum variational autoencoders (QVAEs) can compress quantum states and could therefore enable new avenues for molecular generation, though the exact benefits of QVAEs in this domain require further investigation.

There are two primary types of QAE, both utilizing hybrid quantum-classical schemes where classical computers are used for parameter optimization. The first type employs a quantum circuit as the model architecture 85, 86, 87, 88, 89, 90, 91, aiming to leverage quantum gates and operations to encode and decode quantum states (see Fig. 10(a)). The second type, known as the Hybrid Quantum Autoencoder (HQA) 92, employs measurement outcomes as the latent representations. This approach combines classical networks with QNNs in a hybrid model architecture, where classical vectors derived from quantum measurements are accessible for further analysis and processing (see Fig. 10(b)). Note that the compression is effective (with no loss of information) only if the set of states to be compressed has support on a lower-dimensional subspace of its Hilbert space 93. For example, the success of the Hubbard model example from Romero et al. 85 is due to the fact that these physical states exhibit certain symmetries.

A proposal for QVAE  94 involves the model architecture of the first type of QAE shown in Fig. 10(a) and a latent representation regularized as in classical VAE. The regularized latent space can enhance classification performance compared to QAE. However, the regularization process requires mid-circuit quantum state tomography, which may represent a practical challenge for fully characterizing the state and scaling up.

Despite the promising aspects of QAE, several challenges remain. First, training relies on classical optimization algorithms, which can obscure statements about the overall computational complexity. Second, these models assume that input states can be efficiently prepared, a relatively straightforward task for quantum data but challenging for classical data (see Fig.  10(c) ). The encoding of classical data into quantum states might negate the computational benefits offered by quantum computers. Consequently, no immediate advantage can be claimed for QAE on classical data over classical methods at present. However, advancements in quantum computing hardware and more efficient optimization schemes could lead to significant improvements, making QAE a more viable and efficient tool in the future, particularly as the training optimization and data encoding complexity becomes comparable to the quantum components of the models.

3.2 Quantum Generative Adversarial Networks

Generative Adversarial Networks (GANs) are machine learning models designed to generate new data samples that mimic samples from a given distribution. GANs consist of three primary components: the prior distribution/noise sampling, the generator, and the discriminator. The generator creates data samples from random noise, while the discriminator evaluates the authenticity of the generated samples by comparison against real data. This adversarial training process helps the generator improve over time, creating increasingly realistic samples. GANs have found applications in molecular generation, much like VAEs, and have been shown to generate novel molecular structures that adhere to desired properties 95, 96, 97, 98.

A QGAN was proposed by Dallaire-Demers and Killoran  99 . This work introduced the concept of using quantum circuits within the GAN framework, specifically leveraging quantum circuits to measure gradients. Romero and Aspuru-Guzik  100 extended the concept of QGANs by modeling continuous classical probability distributions using a hybrid quantum–classical approach. While their results were promising, they noted that further theoretical investigations were necessary to determine whether their methodology offers practical advantages over classical approaches.

QGANs have been applied to the generation of small molecules 101 in a study that used the QM9 dataset 102. That study reported better learning behavior, attributed to the claimed superior expressive power and smaller parameter count of the quantum models. However, these QGANs struggled to generate valid molecules, and subsequent tests by other researchers indicated that they also struggled to generate molecules resembling the training set 103.

Kao et al. 103 explored the advantages of QGANs in generative chemistry by replacing different components of the GAN framework with quantum counterparts. They demonstrated that using a quantum noise generator (prior distribution sampling) could yield compounds with better drug properties. However, they found that quantum generators struggled to generate molecules that resembled those in the training set and encountered computational restrictions during further training. Additionally, they showed that a quantum discriminator with just 50 parameters could achieve a better KL score than a classical discriminator with 22000 parameters, indicating that quantum components can enhance expressive power even with far fewer parameters. Nevertheless, these advancements often compromised the validity and uniqueness of the generated molecules, potentially undermining the efficiency of the sampling and generation processes.

Anoshin et al. 104 introduced a hybrid quantum cycle generative adversarial network for small molecule generation, utilizing the cycle-consistent framework from prior research 105, 106. Their approach featured a hybrid generator in which a quantum circuit processed the noise vector (prior distribution) and was connected to an MLP to generate molecular graphs. This method demonstrated comparable, or even improved, performance across various metrics, including uniqueness, validity, diversity, drug-likeness, synthesizability, and solubility, highlighting the potential of hybrid quantum-classical architectures in enhancing generative models. However, the study did not provide a detailed comparison of the total number of parameters used, limiting claims about its expressive power.

While QGANs show some promising results in molecular generation, particularly in areas like enhanced drug properties and the potential for better expressive power in discriminators, significant challenges persist. The expressive power derived from full quantum discriminators may come at the cost of compromising other crucial metrics in molecular generation. Additionally, when hybrid networks achieve improvements in drug properties and other metrics, the exact contribution of expressive power offered by the quantum component becomes less clear. Thus, an outstanding challenge is to achieve enhanced expressive power without sacrificing performance across other critical metrics.

3.3 Looking Ahead: Quantum Transformers

Much of the AI revolution is due to the transformer architecture introduced in the “Attention is All You Need” paper from Google 107. This architecture was originally developed for language translation and consisted of encoder and decoder components connected via a cross-attention mechanism. The encoder alone is useful for learning a context-rich representation of a given input sequence by masking some of the sequence and learning to predict the masked parts. The decoder is useful for generating new sequences by learning to predict the next parts of a sequence given a context. Within the realm of biochemistry and drug discovery, transformer encoders have been developed to extract feature vectors from SMILES strings for downstream predictive tasks, and transformer decoders have been used to generate SMILES strings with prespecified characteristics 108, 109, 110, 111, 112. The fundamental capabilities of the transformer architecture are due to the self-attention mechanism, in which query, key, and value vectors are computed for each input token (e.g., a sub-word in text or a character in a SMILES string), attention scores are derived via a scaled dot product of query and key vectors, and a softmax normalizes these scores to obtain weights that modulate the aggregation of the value vectors, effectively capturing the magnitude with which each token attends to every other token in the sequence. The self-attention mechanism is executed multiple times in parallel through what is referred to as multi-head attention.
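
The self-attention computation described above can be sketched in a few lines. The following is a minimal single-head illustration using only the Python standard library; the helper names (`matmul`, `softmax`, `self_attention`) and the identity projection matrices in the toy input are assumptions chosen for readability, not part of any cited model.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x holds n token embeddings of dimension d; w_q, w_k, w_v are the
    (hypothetical) learned projection matrices.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # n-by-n pairwise query-key dot products
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights  # weighted aggregation of value vectors

# Toy input: three tokens with two-dimensional embeddings (illustrative values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` sums to one and records how strongly one token attends to every other token; the n-by-n size of this matrix is the origin of the quadratic cost in sequence length.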

The overwhelming success of the classical transformer in ML has naturally piqued the interest of QML researchers. Most implementations of quantum transformers have been adapted as Vision Transformers (ViTs) rather than for Natural Language Processing (NLP) 113 , 114 , 115 , 116 . While classical ViT models have been utilized in predictive tasks in chemistry and biophysics 117 , 118 , 119 , the primary role of transformers in the context of drug discovery has remained with transformer-based generative models.


Quantum-based attention mechanisms for generative pre-trained transformers are still in their infancy, and while many of the results presented thus far have been largely theoretical, the field is rapidly advancing. In 2022, DiSipio et al. 121 discussed the beginnings of quantum NLP and highlighted that the underlying mathematical operations of the transformer’s self-attention mechanism all have implementable quantum formulations. In 2023, both Gao et al. 122 and Li et al. 123 showed implementations of a quantum self-attention mechanism. Most recently, Guo et al. 120 and Liao et al. 124 independently presented full end-to-end GPT quantum algorithms. Notably, the work of Guo et al. presents a rigorous complexity analysis and demonstrates a theoretical quantum advantage for numerous normalization operations throughout the architecture. The structure of a classical transformer decoder layer and the corresponding quantum implementation by Guo et al. are shown in Figure 11.

The motivation for creating a quantum transformer is to reduce the complexity of the self-attention mechanism, which is the bottleneck of the architecture. The traditional classical self-attention mechanism scales as $\mathcal{O}(n^{2}d)$ for sequence length $n$ and embedding dimension $d$. This arises from multiplying the query and key matrices, $QK^{\top}$, as well as applying the resulting pairwise attention matrix to the value matrix $V$. Unfortunately, the current quantum implementations that potentially achieve a complexity advantage rely on assumptions that are seldom true in ML, such as matrix sparsity. Some classical techniques try to avoid the $\mathcal{O}(n^{2}d)$ complexity of scaled dot-product attention through alternative methods 125, 126, 127. Similarly, instead of scaled dot-product attention, Quantinuum released an open-source model, Quixer 128, that proposes a quantum analog of the k-skip-n-gram NLP technique for learning relationships between tokens. Quixer mixes embedded tokens by using a linear combination of unitaries (LCU) 129 and further computes skip-bigrams between words using quantum singular value transformations (QSVT) 130. Quixer scales as $O(\log(nd))$ in the number of qubits and $O(n\log(d))$ in the number of gates. In contemporary transformer applications, the sequence length $n$ is often much larger than the embedding dimension $d$, which makes the logarithmic scaling in the number of qubits with respect to $n$ a promising look into the future of transformers.
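
To give a feel for how these scalings diverge, the sketch below evaluates the leading-order terms for one illustrative pair of values. It assumes all hidden constants equal one and uses base-2 logarithms, neither of which is implied by big-O notation, so the numbers are not resource estimates; the function names and the chosen $n$ and $d$ are hypothetical.

```python
import math

def classical_attention_ops(n, d):
    """Leading-order term n^2 * d of scaled dot-product attention."""
    return n * n * d

def quixer_leading_terms(n, d):
    """Leading-order Quixer terms quoted in the text, constants set to 1 (assumption)."""
    qubits = math.log2(n * d)   # O(log(nd)) qubits
    gates = n * math.log2(d)    # O(n log(d)) gates
    return qubits, gates

# Illustrative values only: a sequence of 1024 tokens with 64-dimensional embeddings.
n, d = 1024, 64
classical_ops = classical_attention_ops(n, d)
qubits, gates = quixer_leading_terms(n, d)
print(f"classical ~{classical_ops} ops; Quixer ~{qubits:.0f} qubits, ~{gates:.0f} gates")
```

Doubling $n$ quadruples the classical term but adds only one to the qubit term, which is the behavior the paragraph above points to.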

While the present models largely do not claim an explicit quantum complexity advantage, this should not dissuade future researchers from applying the available methods to pharmacological problems. The nascency of the field presents an opportunity for researchers in academia and the pharmaceutical industry alike to hunt for advantages elsewhere. Since no works in the literature currently apply quantum transformers to chemical, biological, or pharmaceutical tasks, researchers have an open opportunity to investigate whether these quantum transformers can learn hidden features inaccessible to classical learning styles, as indicated by Li et al. 123. In that event, combining features extracted from both a quantum transformer component and a classical transformer component could yield a model with a richer understanding of chemical and biological function, leading to exciting downstream effects in drug design.

4 Potential of Bosonic Quantum Processors for Quantum Machine Learning

4.1 Basics of Bosonic Quantum Computing

Hybrid qubit-qumode devices 131 , 132 , 133 have the potential to augment the power of qubit architectures by allowing for data encoding in a much larger Hilbert space with hardware efficiency. For example, qumodes could amplify the impact of VQCs in applications to QML beyond the implementations discussed in Section  1.3.4 .

An arbitrary qumode state $\ket{\psi}$, corresponding to the state of a quantum harmonic oscillator, can be expanded in its Fock basis representation as a superposition of a countably infinite set of orthonormal photon-number states $\{\ket{n}\}$. In practice, however, the expansion is truncated with a Fock cutoff $d$, with complex expansion coefficients $c_{n}$, as follows:

$$\ket{\psi} = \sum_{n=0}^{d-1} c_{n} \ket{n}. \qquad (11)$$

According to Eq. (11), a qumode generalizes the two-level qubit into a $d$-level state (also known as a qudit 134, 135, 136), thus offering an expanded basis set. Beyond the expanded basis, bosonic-mode hardware is relatively weakly affected by amplitude damping errors 133, which leads to extended lifetimes and the possibility of implementing efficient error correction codes 137, 138, 139.

Recent advances in bosonic quantum hardware have improved the implementation of qumodes across various architectures 131. However, achieving universal quantum computing remains challenging when relying solely on native qumode operations. This is where hybrid qubit-qumode hardware has made notable strides. For example, in the circuit quantum electrodynamics (cQED) framework, a microwave cavity coupled to a transmon qubit has demonstrated considerable potential (Figure 12) 140. The interplay between qubit and qumode dynamics enables the development of hybrid qubit-oscillator gate sets, which are efficient in achieving universality 141, 142, 133.


Additionally, photonic processors offer programmability that facilitates the simulation of bosonic systems 143, 144. In contrast, qubit-based hardware is inherently suited for simulating fermions through the Jordan-Wigner transformation 145, 146, 147. A hybrid qubit-qumode architecture is therefore particularly attractive, especially since qubit-only or bosonic-only native gates might require deeper circuits for specific applications, although methods have been developed to represent bosons using qubits and vice versa 148, 149, 150.

Incorporating efficient bosonic representation could enable practical simulations beyond the capabilities of conventional qubit-based quantum computers, as already shown for example in calculations of vibrational spectra of small polyatomic molecules.  151 This can be achieved with photonic quantum processors 152 , 153 , 154 , cQED devices 151 , and even hybrid qudit-boson simulators 155 .

Another unique feature of qumodes is that they can also be represented in continuous-variable (CV) bases corresponding to the position and momentum operators of a quantum harmonic oscillator 156, with no counterpart for qubits. For example, in the position representation, an arbitrary qumode state $\ket{\psi}$ can be expressed as follows:

$$\ket{\psi} = \int dx \, \psi(x) \ket{x},$$

where $\psi(x)=\braket{x}{\psi}$ is the complex-valued amplitude of the oscillator at $x$. As state and process tomography are necessary to calibrate and model hardware noise, hybrid processors offer simple protocols to determine the Wigner function of qumode states 157, 158, 159, 160, 161, allowing further development of abstract machine models 133.

4.2 Potential Advantages of Qubit-Qumode Circuits in QML

Hybrid qubit-qumode circuits, such as the one shown in Figure 12, can be parameterized with universal ansatzes to approximate any unitary transformation of the qubit-qumode system. An attractive choice of a universal ansatz 142 applies repeating modules of a qubit rotation gate,

$$R(\theta, \varphi) = \exp\left[-i\frac{\theta}{2}\left(\sigma_{x}\cos\varphi + \sigma_{y}\sin\varphi\right)\right],$$

where $\sigma_{x}$ and $\sigma_{y}$ are Pauli X and Y matrices, followed by an echoed conditional displacement (ECD) gate,

$$ECD(\beta) = \ket{1}\bra{0} \otimes D(\beta/2) + \ket{0}\bra{1} \otimes D(-\beta/2), \qquad D(\beta) = \exp\left(\beta \hat{a}^{\dagger} - \beta^{*} \hat{a}\right),$$

where $\hat{a}^{\dagger}$ and $\hat{a}$ are bosonic creation and annihilation operators, respectively.

The qumode Hilbert space may offer advantages over qubit-based registers since it allows for more efficient representations for predictive and generative tasks 134, 135, 136. For example, a system with 8 qubits involves a Hilbert space with $2^{8}=256$ basis states, which could be represented by two qumodes with control over $d=16$ Fock states (each mode offering a Hilbert space equivalent to the space spanned by 4 qubits) 151. Therefore, encoding of complex molecular information that typically requires many qubits would potentially benefit from hybrid qubit-qumode circuits, as these systems offer significant hardware efficiency compared to qubit circuits with a similarly sized Hilbert space. Additionally, the circuits of qumode states can be based on efficient ansatzes or shallow circuits that bypass the need for deep circuits based on elementary logic gates 134, 151.
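
The dimension counting in this example is easy to check directly. The sketch below is a minimal illustration; the function names are hypothetical, and it counts only Hilbert-space dimensions, saying nothing about how hard the corresponding states are to control on hardware.

```python
def hilbert_dim_qubits(n_qubits):
    """Dimension of an n-qubit register."""
    return 2 ** n_qubits

def hilbert_dim_qumodes(n_modes, cutoff):
    """Dimension of n_modes qumodes, each truncated at Fock cutoff d = cutoff."""
    return cutoff ** n_modes

def qumodes_needed(n_qubits, cutoff):
    """Smallest number of truncated qumodes whose joint space holds n_qubits qubits."""
    target = hilbert_dim_qubits(n_qubits)
    modes, dim = 0, 1
    while dim < target:
        dim *= cutoff
        modes += 1
    return modes

# The example from the text: 8 qubits versus two qumodes with d = 16.
print(hilbert_dim_qubits(8), hilbert_dim_qumodes(2, 16), qumodes_needed(8, 16))
```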

4.3 Encoding Classical Information in Qubit-Qumode Circuits

We introduce two possible methods for encoding classical (or quantum) data in the form of quantum states of a qumode coupled to a qubit. Similar to amplitude encoding for qubit systems, we can adapt the method discussed in Section 1.3.3 for qumodes. We simply modify Eq. (9) to encode a vector of length $d$ into the amplitudes $\alpha_{k}$ of a $d$-level qudit, as follows:

$$\ket{x} = U_{x}\ket{0}_{d} = \sum_{k=0}^{d-1} \alpha_{k} \ket{k},$$

where $U_{x}$ is the unitary transformation that encodes the data provided by the amplitudes in the form of the qumode state $|x\rangle$. Here, $\ket{0}_{d}$ is the initial vacuum state of the qumode, corresponding to an empty cavity without photons. Preparing $U_{x}$ requires parameterization of an ansatz with universal qumode control, such as the one with blocks of a qubit rotation gate followed by an ECD gate (R-ECD ansatz) outlined in Figure 12 and Section 4.2 142. Other ansatzes are also available that can be parameterized to encode an arbitrary data set by amplitude encoding in the form of a qumode state 141, 133.
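
As a minimal sketch of the classical preprocessing behind amplitude encoding, the snippet below normalizes a data vector into target amplitudes $\alpha_{k}$ for a qumode with Fock cutoff $d$. The function names and the zero-padding convention are assumptions made for illustration; the snippet does not construct $U_{x}$, which would require optimizing the parameters of a universal ansatz.

```python
import math

def amplitude_encode(vector, cutoff):
    """Return normalized target amplitudes alpha_k for a d-level qumode (d = cutoff).

    Vectors shorter than the cutoff are zero-padded (an illustrative convention);
    longer vectors are rejected.
    """
    if len(vector) > cutoff:
        raise ValueError("vector does not fit in the truncated Fock space")
    padded = list(vector) + [0.0] * (cutoff - len(vector))
    norm = math.sqrt(sum(abs(a) ** 2 for a in padded))
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    return [a / norm for a in padded]

def overlap(a, b):
    """Inner product <a|b> of two amplitude vectors."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))

# Illustrative data vector encoded into a qumode with cutoff d = 4.
alphas = amplitude_encode([3.0, 4.0], cutoff=4)
print(alphas, abs(overlap(alphas, alphas)))
```

The normalization step is what makes the amplitudes a valid quantum state; any overall scale of the input vector is lost and would have to be stored separately if it matters for the task.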


One technical challenge of these generalized phase encoding methods is that the encoded states for different tokens could partially overlap with each other unless an orthogonalization procedure is enforced. The partial overlap could make the encoding ambiguous. To address this challenge, the parameters assigned to each token can be made learnable, so that the encodings are optimized to be as distinct as possible.

5 Efficient Circuit Simulation for Near-Term Research and Computing Unit Integration

Despite recent progress, current Quantum Processing Units (QPUs) remain limited in size and computational capabilities due to noise and scaling challenges, which impedes progress in algorithmic research. To address this challenge, circuit simulation techniques fill a critical need by allowing algorithms to be studied beyond what current hardware supports. An open-source platform for seamlessly integrating and programming QPUs, GPUs, and CPUs within a single system is provided by NVIDIA’s CUDA-Q 162 (see Figure 14). Various quantum computing frameworks, including Cirq, Qiskit, TorchQuantum, and PennyLane 163, 164, 165, 166, utilize GPU-accelerated simulation through the cuQuantum libraries 167 featured in the CUDA-Q simulation backend. By employing the CUDA-Q compiler alongside cuQuantum APIs as simulation backends, users can achieve near-optimal GPU acceleration and exceptional performance at scale.

In this section, we demonstrate how CUDA-Q can be utilized to accelerate and scale up quantum circuit simulations. This is applicable to various fields, including quantum machine learning for chemistry. We use CUDA-Q v0.8 for simulations and show how the compute resources scale with the size of the simulation. Examples used to reproduce the results presented in this section are available on GitHub 168.


5.1 Circuit Simulator with State Vector and GPU Acceleration

Desktop CPUs can handle the simulation of small numbers of qubits; for instance, on a laptop with at least 8 GB of memory, noiseless simulations can reach up to 24 qubits, while noisy simulations are feasible with up to 18 qubits 169 . However, as the memory required to store the full state vector grows exponentially with the number of qubits, GPUs are needed for larger simulations. For example, an NVIDIA DGX A100 can simulate 20 qubits with exceptional speed, while a CPU would be very slow at performing the state vector simulation of similar size, as shown in Figure 15 .

Figure 15 compares the logarithmic ($\log_{10}$) runtime for computing the expectation value of a quantum circuit similar to the one shown in Figure 16, using a state vector simulator on one CPU (AMD EPYC 7742 64-Core Processor) and on one NVIDIA A100 GPU. The quantum circuit in Figure 16 is a standard parameterized quantum circuit employed in QNNs for applications such as QGANs for drug discovery and molecular generation 103, 170. Specifically, Figure 15 compares the single-CPU and single-GPU runtimes for ten thousand data points (i.e., ten thousand expectation values) as a function of the number of qubits. The runtime on the CPU increases significantly with the number of qubits, while it increases only modestly on the NVIDIA A100 GPU. For example, for the 18-qubit circuit, there is a $\approx 150\times$ speed-up on the single GPU. When the number of qubits increases to 20, the speed-up is $\approx 530\times$. These results emphasize the need for GPU supercomputing to accelerate simulations of quantum algorithms and applications to research and development. Such simulations would enable studies that move beyond small-scale proof-of-concept calculations toward real-world scenarios.


Another example demonstrating the capabilities of CUDA-Q is the implementation of the VQE-Quantum Approximate Optimization Algorithm (VQE-QAOA) for simulations of molecular docking 172 and protein folding 173. For example, the VQE-QAOA algorithm has been applied to find the optimal configuration of a ligand bound to a protein, implementing the molecular docking simulation as a weighted maximum clique problem 172. Simulations were performed with up to 12 qubits. The CUDA-Q tutorial 174 reproduces the results using DC-QAOA and compares the CPU and GPU runtimes (Table 1). For 12 qubits, a $16.6\times$ speed-up is observed on a single GPU when compared to a single CPU.

| Qubits | CPU time (s) | GPU time (s) |
|--------|--------------|--------------|
| 6      | 0.322        | 0.160        |
| 8      | 1.398        | 0.390        |
| 12     | 6.863        | 0.412        |
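
The speed-up at each circuit size follows directly from the tabulated runtimes. The short sketch below recomputes the CPU-to-GPU ratios from the values in Table 1; the variable names are illustrative.

```python
# (qubits, CPU seconds, GPU seconds), copied from Table 1.
table_1 = [(6, 0.322, 0.160), (8, 1.398, 0.390), (12, 6.863, 0.412)]

# Speed-up is the ratio of CPU runtime to GPU runtime for the same circuit size.
speedups = {qubits: cpu / gpu for qubits, cpu, gpu in table_1}

for qubits, ratio in speedups.items():
    print(f"{qubits:>2} qubits: {ratio:.2f}x")
```

The ratio grows with the number of qubits because the GPU runtime stays nearly flat over this range while the CPU runtime increases steeply.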

CUDA-Q also allows for gate fusion to enhance state vector simulations with deep circuits, thereby improving performance 175, 176. Gate fusion is an optimization technique that combines consecutive quantum gates into a single gate (see Figure 17), which reduces the overall computational cost and increases the circuit efficiency 177, 178. By grouping small gate matrices into a single multi-qubit gate matrix, the fused gate can be applied in one operation, eliminating the need for multiple applications of small gate matrices. This optimization reduces memory bandwidth usage, as applying a gate matrix $G$ to a state, $|\Psi\rangle = G|\phi\rangle$, involves reading and writing the state vector. The memory bandwidth (in bytes, including reads and writes) can be calculated as follows:

$$\text{memory bandwidth} = \frac{2 \times \text{svSizeBytes}}{2^{\text{ncontrols}}},$$

where ‘svSizeBytes’ represents the state vector size in bytes and ‘ncontrols’ is the number of control qubits (e.g., a CNOT gate has one control). Applying two gates, $G_{2}G_{1}|\phi\rangle$, requires two reads and two writes, whereas applying the combined gate $(G_{2}G_{1})|\phi\rangle$ needs only one read and one write.
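
The idea behind gate fusion can be illustrated on a single qubit. The sketch below is a toy example in plain Python, not CUDA-Q code: the choice of a Hadamard gate as $G_{1}$ and a phase gate as $G_{2}$, and the helper names, are assumptions for illustration. It checks that applying the pre-multiplied matrix once gives the same state as applying the two gates in sequence.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def apply(gate, state):
    """Apply a gate matrix to a state vector (one read and one write of the state)."""
    return [sum(g * s for g, s in zip(row, state)) for row in gate]

s = 1 / math.sqrt(2)
G1 = [[s, s], [s, -s]]    # Hadamard gate, applied first
G2 = [[1, 0], [0, 1j]]    # phase (S) gate, applied second
phi = [1, 0]              # initial state |0>

# Unfused: two passes over the state vector.
sequential = apply(G2, apply(G1, phi))

# Fused: multiply the small matrices once, then make a single pass over the state.
fused_gate = matmul(G2, G1)   # operator order is G2 G1, since G1 acts first
one_pass = apply(fused_gate, phi)

# Multiplying in the opposite order gives a different circuit.
reversed_order = apply(matmul(G1, G2), phi)
```

For a single qubit the saving is negligible, but the state vector of an $n$-qubit simulation has $2^{n}$ entries, so each avoided pass over it saves a large amount of memory traffic while the fused gate matrix itself remains small.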

Gate fusion can significantly enhance simulation performance for deep circuits, which are crucial for quantum applications in chemistry. A notable example is the unitary coupled cluster singles and doubles (UCCSD) ansatz, widely used in quantum computational chemistry calculations. For instance, when running a single observation call (i.e., computing one expectation value) for the C$_{2}$H$_{4}$ molecule using the UCCSD ansatz with 24 qubits on an NVIDIA A100, the total elapsed time is 30.02 seconds without gate fusion. In contrast, with gate fusion, the elapsed time is reduced to 12.44 seconds, demonstrating a $2.4\times$ speed-up. The code for this comparison is available on GitHub 179.


5.2 Parallelization and Scaling

NVIDIA’s CUDA-Q platform provides a clear overview of the various devices in a quantum-classical compute node, including GPUs, CPUs, and QPUs. Researchers and application developers can work with a diverse array of these devices. Although the integration of multiple QPUs into a single supercomputer is still in progress, the current availability of GPU-based circuit simulators on NVIDIA multi-GPU architectures enables the programming of multi-QPU systems today.

5.2.1 Enabling Multi-QPU Workflows

CUDA-Q enables application developers to design workflows for multi-QPU architectures that utilize multiple GPUs. This can be achieved using either the ‘NVIDIA-mQPU’ backend  180 or the ‘remote-mQPU’ backend, which we discuss further in Sec.  5.2.3 . The ‘NVIDIA-mQPU’ backend simulates a QPU for each available NVIDIA GPU on the system, allowing researchers to run quantum circuits in parallel and thus accelerating simulations. This capability is crucial for applications such as quantum machine learning algorithms. For example, in training QNNs, computing expectation values for numerous data-points is often required to train the model. By batching these data-points, they can be processed simultaneously across multiple GPUs.
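
The batching pattern described above can be sketched without any quantum software. The example below is a stand-in written with the Python standard library, not CUDA-Q code: worker threads play the role of simulated QPUs, and `fake_expectation` is a hypothetical placeholder for a circuit evaluation (it returns the analytic value $\langle Z\rangle = \cos\theta$ for an RY($\theta$) rotation applied to $|0\rangle$). All function names are illustrative assumptions.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def split_batches(items, n_devices):
    """Split items into n_devices contiguous, nearly equal chunks."""
    size, extra = divmod(len(items), n_devices)
    chunks, start = [], 0
    for i in range(n_devices):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def fake_expectation(theta):
    """Placeholder for one circuit evaluation: <Z> after RY(theta) on |0>."""
    return math.cos(theta)

def evaluate_on_device(chunk):
    """Evaluate every data point assigned to one (simulated) device."""
    return [fake_expectation(theta) for theta in chunk]

def evaluate_batched(thetas, n_devices):
    """Distribute data points over devices and gather results in input order."""
    chunks = split_batches(thetas, n_devices)
    with ThreadPoolExecutor(max_workers=n_devices) as pool:
        results = list(pool.map(evaluate_on_device, chunks))  # map preserves chunk order
    return [value for chunk in results for value in chunk]

thetas = [0.1 * k for k in range(10)]
values = evaluate_batched(thetas, n_devices=4)
```

Because every data point is independent, the only coordination needed is splitting the batch and gathering the results, which is why this workload parallelizes well across devices.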

Figure 18 compares results obtained by running a QNN workflow on a single GPU with those obtained by distributing the workflow across four GPUs (in a single CPU node with 4 GPUs). The code for this comparison is available on GitHub 181. For an application using 20 qubits, we find that the runtime with four GPUs is approximately 3.3 times faster than with a single GPU. Although parallelization requires some synchronization and communication across the GPUs, which limits the speed-up to less than $4\times$, this still demonstrates strong scaling performance (about 83% parallel efficiency) and highlights the efficient utilization of GPU resources when available.


Another example of a commonly used application primitive that benefits from parallelization using the ‘NVIDIA-mQPU’ backend is the Hadamard test. The Hadamard test is crucial for computing the overlap between different states, as needed to evaluate correlation functions and expectation values, which involves calculating $O(n^2)$ independent circuits in a wide range of applications, including prediction of drug toxicity 18 and determining the electronic ground state energy of molecules 182 , 183. By leveraging parallelism, these $O(n^2)$ circuits can be efficiently executed across as many QPUs, whether physical or simulated, as are available.
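As a rough illustration of this primitive, the sketch below evaluates the real-part Hadamard test analytically for a small state and then distributes a set of independent pairwise circuits over QPUs in round-robin fashion. The function names are hypothetical, and treating the $O(n^2)$ circuits as the $n(n-1)/2$ unordered pairs of $n$ states is an assumption made for illustration.

```python
import math

def hadamard_test_p0(u, psi):
    """Probability of measuring the ancilla in |0> in the real-part Hadamard test.
    After H, controlled-U, H on the ancilla, the ancilla-0 branch of the state
    is (|psi> + U|psi>) / 2."""
    dim = len(psi)
    u_psi = [sum(u[i][j] * psi[j] for j in range(dim)) for i in range(dim)]
    branch0 = [(a + b) / 2 for a, b in zip(psi, u_psi)]
    return sum(abs(a) ** 2 for a in branch0)

def real_overlap(u, psi):
    """Re<psi|U|psi> recovered from the ancilla statistics: 2 * P(0) - 1."""
    return 2 * hadamard_test_p0(u, psi) - 1

def assign_pairs(n_states, n_qpus):
    """Round-robin assignment of the independent pairwise circuits to QPUs."""
    pairs = [(i, j) for i in range(n_states) for j in range(i + 1, n_states)]
    schedule = {q: [] for q in range(n_qpus)}
    for idx, pair in enumerate(pairs):
        schedule[idx % n_qpus].append(pair)
    return schedule

theta = 0.7
psi = [math.cos(theta / 2), math.sin(theta / 2)]  # RY(theta)|0>
Z = [[1, 0], [0, -1]]
overlap = real_overlap(Z, psi)                    # should match cos(theta)
schedule = assign_pairs(n_states=4, n_qpus=2)
```

On hardware, $P(0)$ would be estimated from repeated measurements of the ancilla rather than computed exactly, but the scheduling of independent circuits is unchanged.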

5.2.2 Scaling Circuit Simulations with Multi-GPUs

The conventional state-vector simulation method requires storing $2^n$ complex amplitudes in memory when simulating $n$ qubits. This results in exponentially increasing memory requirements for circuits with a large number of qubits. If each complex amplitude requires 8 bytes of memory, the total memory required for an $n$-qubit quantum state is $8 \times 2^n$ bytes. For instance, with $n=30$ qubits, the memory requirement is approximately 8 GB, while for $n=40$ qubits, it jumps to about 8700 GB. CUDA-Q addresses this challenge by enabling the distribution of state-vector simulation across multiple GPUs via the ‘NVIDIA-mGPU’ backend 180. For detailed information on the algorithm, see Sec. II-C in Ref. 167. Additionally, examples of using the ‘NVIDIA-mGPU’ backend are available on GitHub 184.
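The memory estimate above is simple arithmetic and can be checked directly. The helper below is a hypothetical convenience function that assumes 8 bytes per amplitude, as in the text. It reports both binary (GiB) and decimal (GB) units, since the two differ by about 7%.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=8):
    """Memory needed to store 2**n_qubits complex amplitudes."""
    return bytes_per_amplitude * 2 ** n_qubits

def report(n_qubits):
    """Return (GiB, GB) for an n-qubit state vector."""
    b = statevector_bytes(n_qubits)
    return b / 2 ** 30, b / 1e9

for n in (30, 35, 40):
    gib, gb = report(n)
    print(f"n = {n}: {gib:.0f} GiB ({gb:.1f} GB)")
```

Each additional qubit doubles the requirement, so ten more qubits multiply it by 1024.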

The ‘NVIDIA-mGPU’ backend combines the memory of multiple GPUs within a single DGX compute node and across multiple DGX compute nodes in a cluster. DGX compute nodes, part of NVIDIA’s DGX platform, are servers designed for high-performance computing (HPC) and artificial intelligence (AI) workloads that leverage NVIDIA GPUs to accelerate intensive computations. By pooling GPU memory, this backend allows for greater scalability and removes the memory limit imposed by any individual GPU. Consequently, the capacity to simulate larger numbers of qubits is constrained only by the available GPU resources in the system.

Intra-node NVLink 185 is a powerful tool for large-scale simulations. An NVLink-based system enables greater performance optimization by providing direct access to the full NVLink feature set, bypassing the CUDA-aware MPI layer. CUDA-Q v0.8 introduces an improved algorithm for intra-node NVLink, leveraging CUDA peer-to-peer (P2P) communication 186. Table 2 compares the performance of CUDA-Q v0.7 (using CUDA-aware MPI) and CUDA-Q v0.8 (using P2P) on an NVLink-enabled DGX H100 system. In these simulations, the state vector was distributed across a single node with 8 GPUs. Four large-scale quantum algorithms were benchmarked using the MPI and P2P APIs in the CUDA Runtime. As shown in Table 2, CUDA-Q v0.8 with P2P achieves up to a 2.5x speedup for H-gates compared to CUDA-Q v0.7 with CUDA-aware MPI.

| Algorithm | Qubits | Speedup (in simulation time) |
|-----------|--------|------------------------------|
| H-Gates   | 35     | 2.47                         |
| QAOA      | 32     | 1.28                         |
| QFT       | 35     | 1.13                         |
| UCCSD     | 32     | 1.30                         |

Additionally, developers can now use CUDA-Q to fully exploit the performance of the NVIDIA GH200 AI superchip 187, further enhancing the capabilities of quantum simulation in CUDA-Q. With a combined CPU and GPU memory of 1.2 TB, the GH200 AI superchip significantly accelerates quantum simulations, reducing the number of required nodes by 75%. This reduction is particularly crucial for quantum applications research, which is often constrained by memory limitations.

Table 3 compares the performance of the GH200 superchip and the DGX H100 for running a quantum algorithm using a state-vector simulator. In this comparison, we employed 37 qubits and distributed the state vector across 8 GPUs on four nodes in the GH200 superchip and a single node in the DGX H100. Our findings show that the GH200 superchip achieves a 2.58x speedup for the quantum Fourier transform (QFT) and a 4.10x speedup for H-gates.

| Algorithm | Qubits | Speedup (in simulation time) |
|-----------|--------|------------------------------|
| H-Gate    | 37     | 4.10                         |
| QFT       | 37     | 2.58                         |

5.2.3 Combining Backends for Large-Scale Simulations

Quantum circuit simulations can be scaled up using the ‘NVIDIA-mGPU’ backend and parallelized with the ‘NVIDIA-mQPU’ backend, as described in the previous sections. CUDA-Q provides the capability to combine both approaches through the ‘remote-mQPU’ backend, enabling large-scale simulations (Figure 19). In this configuration, multiple GPUs comprise a virtual QPU. A practical example of using ‘remote-mQPU’ for QNNs is available on GitHub 188.


5.3 Quantum Circuit Simulator With Tensor Networks

The state-vector method is effective for simulating deep quantum circuits. However, it becomes impractical for circuits with large numbers of qubits because the computational resources required grow exponentially, making such simulations unmanageable even on the most powerful supercomputers available today. As an alternative, the tensor network method represents the quantum state of $N$ qubits through a series of tensor contractions (see Figure 20). This approach allows quantum circuit simulators to efficiently handle circuits with many qubits.


Tensors (see Figure 21) generalize scalars (0D), vectors (1D), and matrices (2D) to an arbitrary number of dimensions. A tensor network consists of a set of tensors connected together through tensor contractions to form an output tensor. In Einstein summation notation, a tensor contraction involves summing over pairs of repeated indices (see Figure 21). For example, a rank-four tensor $M$ can be formed by contracting two rank-three tensors $C$ and $B$, as follows: $M_{ijlm}=\sum_{k}C_{ijk}\,B_{klm}$. Here, the contraction is performed by summing over the shared index $k$. Identifying an efficient contraction sequence is essential for minimizing the computational cost of tensor networks 167 , 189. The contractions between the constituent tensors define the topology of the network 190 , 191.
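The contraction $M_{ijlm}=\sum_{k}C_{ijk}\,B_{klm}$ can be written out explicitly. The dependency-free sketch below uses nested lists and hypothetical example tensors of shapes $(2,2,3)$ and $(3,2,2)$. With NumPy, the same operation would be `numpy.einsum('ijk,klm->ijlm', C, B)`.

```python
def contract(C, B):
    """Return M with M[i][j][l][m] = sum_k C[i][j][k] * B[k][l][m]."""
    n_i, n_j, n_k = len(C), len(C[0]), len(C[0][0])
    n_l, n_m = len(B[0]), len(B[0][0])
    if len(B) != n_k:
        raise ValueError("shared index k must have the same size in C and B")
    return [[[[sum(C[i][j][k] * B[k][l][m] for k in range(n_k))
               for m in range(n_m)]
              for l in range(n_l)]
             for j in range(n_j)]
            for i in range(n_i)]

# Hypothetical example tensors with easily checked entries.
C = [[[i + 2 * j + k for k in range(3)] for j in range(2)] for i in range(2)]
B = [[[k * (l + 1) + m for m in range(2)] for l in range(2)] for k in range(3)]

M = contract(C, B)
shape = (len(M), len(M[0]), len(M[0][0]), len(M[0][0][0]))
```

The shared index $k$ disappears from the result, while the four free indices $i, j, l, m$ survive, giving a rank-four tensor. The cost of this single contraction is proportional to the product of all five index sizes, which is why the order of contractions matters in larger networks.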


CUDA-Q offers two GPU-accelerated tensor network backends: ‘tensornet’ and ‘tensornet-mps’ 192 . For a detailed explanation of tensor network algorithms and their performance, see Refs.  193 , 167 .

The ‘tensornet’ backend represents quantum states and circuits as tensor networks without any approximations. It computes measurement samples and expectation values through tensor network contractions.  194 This backend supports the distribution of tensor operations across multiple nodes and GPUs, enabling efficient evaluation and simulation of quantum circuits.

The ‘tensornet-mps’ backend utilizes the matrix product state (MPS) representation of the state vector, exploiting low-rank approximations of the tensor network through decomposition techniques such as QR and singular value decomposition. As an approximate simulator, it allows truncation of the number of singular values to keep the MPS size manageable. The ‘tensornet-mps’ backend supports only single-GPU simulations. Its approximate nature enables it to handle a large number of qubits for certain classes of quantum circuits while maintaining a relatively low memory footprint.

6 Challenges and Outlook

6.1 Hardware

When evaluating the physical implementation of quantum computers, it is essential to consider the widely recognized five criteria proposed by DiVincenzo  195 :

(1) Scalable physical systems with well-characterized qubits: The system should contain qubits that are not only distinguishable from each other but also manipulable either individually or collectively. This requirement ensures that qubits can be controlled with precision for complex quantum computations.

(2) Ability to initialize qubits to a simple, known state: Typically referred to as a “fiducial” state, this criterion emphasizes the importance of preparing qubits in a well-defined, simple initial state, such as the zero state. This initialization process is crucial for the reliability and predictability of subsequent quantum operations.

(3) Decoherence times much longer than gate operation times: Quantum systems must exhibit long coherence times relative to the time it takes to perform quantum gate operations. This ensures that quantum information is preserved long enough to complete computations before being lost to decoherence.

(4) A universal set of quantum gates: The hardware must support a set of quantum gates capable of performing any quantum computation. This typically includes a variety of single-qubit gates along with a two-qubit entangling gate, such as the CNOT gate, enabling the construction of complex quantum circuits.

(5) Qubit-specific measurement capability: The system should allow for accurate measurement of individual qubits’ states after computation. This criterion is essential for retrieving the final output of quantum computations.

Gate-based quantum computer designs generally adhere to these criteria, yet achieving optimal performance remains a significant challenge. For QML, these hardware requirements introduce additional complexities.

QNNs are often claimed to have superior expressive power compared to classical neural networks. This advantage typically necessitates high connectivity among qubits, aligning with the need for well-characterized and scalable qubit systems described in Criterion (1). Ensuring such connectivity while maintaining system scalability and qubit fidelity is a non-trivial challenge in current hardware implementations.

Moreover, QML algorithms frequently utilize amplitude encoding, a technique that effectively encodes classical data into quantum states. This approach, however, is equivalent to preparing arbitrary quantum states, which goes beyond the simpler requirement of initializing qubits to a fiducial state as outlined in Criterion (2). Consequently, specific QML applications may require either modifications to the existing hardware criteria or the development of more advanced state preparation algorithms to achieve the desired outcomes.

Finally, when the final output of a QNN necessitates precise amplitude measurements of quantum states, the hardware must extend the measurement capabilities described in Criterion (5). Specifically, accurate and scalable quantum state tomography becomes essential to extract the necessary information from the quantum system. This represents another area where current quantum hardware may need further refinement to fully support the demands of QML.

6.2 Algorithms

Loading classical data into a quantum state is often the first step in a QNN, and it largely dictates the performance of the model, the potential advantages the quantum model possesses over its classical counterpart, and the model’s quantum resource complexity. For example, angle encoding 7 is inexpensive to implement on quantum hardware, but it is difficult to extract a complexity advantage from it. Alternatively, amplitude encoding more readily enables a complexity advantage due to the exponentially larger Hilbert space in which information can be stored, but at the expense of the quantum resources needed to prepare such a quantum state. In particular, for state preparation techniques that prepare arbitrary state vectors, the number of CNOT gates required scales exponentially with the number of qubits 196 , 197. While this problem of state preparation may be daunting, promising data encoding workarounds are being developed. Data re-uploading is a strategy that allows circuits to handle more complex data by breaking the information into smaller quantum circuits 198. Shin et al. present a method for QML that utilizes a quantum Fourier-featured linear model to exponentially encode data in a hardware-efficient manner 199. The authors demonstrate that the method achieves high expressivity and exhibits better learning performance compared to data re-uploading, notably when learning the potential energy surface of ethanol. These promising directions should motivate QML researchers to identify tasks where their input data exists in, or can be transformed into, a form that is known to be efficiently prepared 200 , 201 , 202, or where the exact input vector does not need to be known a priori and is learned through training.
Furthermore, as QPUs evolve to include more qubits and better-connected topologies, state preparation algorithms that utilize ancillary qubits will help address the challenges associated with short coherence times and prolonged gate execution times, as they are capable of preparing arbitrary states with shallower circuit depths 203.
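The resource trade-off between the two encodings can be made concrete with a short sketch. It assumes one common convention for each scheme, which may differ in detail from the cited works: angle encoding applies one RY rotation per feature on its own qubit, and amplitude encoding stores a zero-padded, normalized feature vector in the amplitudes of $\lceil \log_2 d \rceil$ qubits. The function names are hypothetical.

```python
import math

def angle_encoding_qubits(n_features):
    """One qubit per feature (assumed convention: one RY rotation per qubit)."""
    return n_features

def amplitude_encoding_qubits(n_features):
    """Smallest n with 2**n >= n_features (at least one qubit)."""
    return max(1, (n_features - 1).bit_length())

def angle_encode(x):
    """Per-qubit amplitudes [cos(x_i/2), sin(x_i/2)] of RY(x_i)|0>."""
    return [[math.cos(xi / 2), math.sin(xi / 2)] for xi in x]

def amplitude_encode(x):
    """Zero-pad x to length 2**n and normalize to unit Euclidean norm."""
    dim = 2 ** amplitude_encoding_qubits(len(x))
    padded = list(x) + [0.0] * (dim - len(x))
    norm = math.sqrt(sum(v * v for v in padded))
    if norm == 0:
        raise ValueError("cannot amplitude-encode the zero vector")
    return [v / norm for v in padded]

features = [3.0, 4.0, 0.0]
amplitudes = amplitude_encode(features)   # 3 features fit in 2 qubits
rotations = angle_encode(features)        # 3 features need 3 qubits
```

Under these assumptions, 1024 features need 1024 qubits with angle encoding but only 10 with amplitude encoding. The catch, as noted above, is that preparing those amplitudes for arbitrary data is the expensive step.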

Similar to how classical ML architectures can suffer from vanishing gradients, VQCs can suffer from barren plateaus. Barren plateaus occur when the loss differences used to compute quantum weight gradients vanish exponentially with the size of the system. Larocca et al. present a comprehensive review in which the authors outline strategies to avoid and mitigate the problem of barren plateaus 204. Among these methods, the aspiring QML researcher should be aware of shallow circuits and clever weight initialization strategies. Notably, Ragone et al. 205 present a theorem to determine exactly whether any noiseless quantum circuit will exhibit barren plateaus, regardless of the circuit’s structure. The authors note that, among the implications of their work, it is possible to design variational quantum circuits that exhibit high entanglement and use non-local measurements while still avoiding barren plateaus, going against conventional wisdom. This lifts restrictions and gives researchers a much deeper insight into the trainability of their circuits.

In addition to the difficulties of determining quantum gradients, updating the quantum weights can prove difficult as well. Classical neural networks have had tremendous success using backpropagation to update the model’s weights; however, methods for updating quantum weights are still being intensely researched. QNNs most commonly employ the parameter-shift method 206 , 207 to estimate quantum gradients for each weight. This can prove expensive, as it requires running at least $2M$ quantum circuits for $M$ trainable parameters during the backward pass, giving a total time complexity of $O(M^{2})$. New methods for quantum backpropagation are emerging that make the evaluation of quantum gradients more efficient, most recently the work by Abbas et al. 208, which reduces the complexity from the quadratic cost of the parameter-shift method to $O(M\,\text{polylog}(M))$ time. The expensive nature of the quantum resources required to update weights encourages many to turn to gradient-free optimization methods. Many quantum neural networks in the literature employ the Constrained Optimization by Linear Approximations algorithm 209 for weight optimization; however, this method is only applicable to models with few trainable parameters. Work is being done to develop gradient-free optimization of VQC parameters that is more efficient than the parameter-shift method. Kulshrestha et al. devise an optimization scheme with good scalability potential that trains at the level of classical optimizers while outperforming them in computation time 210. Weidmann et al. present an optimization method that significantly improves convergence of QNNs compared to the parameter-shift method 211.
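The $2M$ circuit count of the parameter-shift method can be seen in a minimal sketch. It assumes a toy model chosen for illustration: $M$ independent qubits, each prepared with RY$(\theta_i)$ and measured in $Z \otimes \dots \otimes Z$, so the expectation value is $\prod_i \cos\theta_i$ in closed form. The class and function names are hypothetical, and the closed form stands in for running a circuit.

```python
import math

class CountingExpectation:
    """Toy expectation value <Z...Z> = prod_i cos(theta_i); counts evaluations."""
    def __init__(self):
        self.calls = 0

    def __call__(self, thetas):
        self.calls += 1  # each call stands in for one circuit execution
        return math.prod(math.cos(t) for t in thetas)

def parameter_shift_gradient(f, thetas, shift=math.pi / 2):
    """Gradient via two shifted evaluations per parameter.
    Exact for gates of the form exp(-i * theta * P / 2) with P a Pauli operator."""
    grad = []
    for i in range(len(thetas)):
        plus = list(thetas)
        minus = list(thetas)
        plus[i] += shift
        minus[i] -= shift
        grad.append((f(plus) - f(minus)) / 2)
    return grad

def analytic_gradient(thetas):
    """d/d(theta_i) of prod_j cos(theta_j), for checking the result."""
    return [-math.sin(t) * math.prod(math.cos(u) for j, u in enumerate(thetas) if j != i)
            for i, t in enumerate(thetas)]

thetas = [0.3, 1.2, -0.7, 2.1]
f = CountingExpectation()
grad = parameter_shift_gradient(f, thetas)
exact = analytic_gradient(thetas)
circuit_evaluations = f.calls  # two per parameter
```

Unlike a finite-difference estimate, the shifted evaluations reproduce the exact derivative for this class of gates. The price is two circuit executions per parameter for every gradient evaluation.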

6.3 Outlook

In this review, we have examined the use of QNNs implemented on gate-based quantum computers for applications in chemistry and pharmaceuticals. While the integration of quantum computing into these fields holds the potential for significant advancements, it also presents unique challenges that must be addressed.

As discussed in the previous subsections, the hardware and algorithmic challenges for QML are substantial. The requirements for coherence, qubit connectivity, and state preparation introduce significant hurdles that have yet to be fully overcome. QNNs often require precise qubit control and extended coherence times, which current quantum hardware struggles to provide consistently. On the algorithmic front, issues such as state preparation, barren plateaus, and efficient quantum gradient computation remain critical bottlenecks that demand innovative solutions.

Recent progress in quantum error correction, highlighted by Google Quantum AI’s breakthrough  212 , marks a significant milestone. This achievement suggests that we are nearing the development of more reliable quantum systems, which is crucial for the practical implementation of QML in real-world scenarios. However, there remains a pressing need for improved scalability of quantum hardware and the development of more robust error correction protocols.

Looking ahead, as quantum technology continues to mature, we anticipate the emergence of more sophisticated applications, such as the discovery of new drugs and materials, the optimization of chemical reactions, and the exploration of molecular structures with unprecedented accuracy. The intersection of quantum computing and machine learning offers a unique opportunity to transform how we tackle some of the most complex challenges in science and industry.

The authors acknowledge support from the National Science Foundation Engines Development Award: Advancing Quantum Technologies (CT) under Award Number 2302908. VSB also acknowledges partial support from the National Science Foundation Center for Quantum Dynamics on Modular Quantum Devices (CQD-MQD) under Award Number 2124511.

Disclosure Statement

AG and SK are employees of Moderna, Inc. and may own stock/stock options in the company. AMS, YS, GWK, VSB and MHF, EK have ongoing collaborative projects using CUDA-Q which do not alter the scientific integrity of the work presented herein. Other authors declare no conflict of interest.

