September 2022 - June 2023

Deep learning for the design of quantum sensors

Master's thesis

Abstract

Deep learning is an area of machine learning that has undergone enormous development in recent years and whose applications are revolutionizing very diverse fields, including science. Quantum metrology, in turn, studies how to apply the principles of quantum physics to make highly precise measurements. Recently, interesting applications of machine learning have been found in quantum metrology for the measurement of homogeneous fields, that is, physical parameters that do not depend on position. This work studies the possibility of reconstructing inhomogeneous fields using quantum sensors and deep learning techniques. In particular, we seek to determine the position of an electron in a plane from the measurements of three nitrogen-vacancy (NV) centers, a problem with important practical applications. To do so, the dynamics of the systems involved are first simulated and, subsequently, artificial intelligence models are trained on the simulated data to infer the position of the electron. As a result, subnanometer error scales are achieved, opening the way for future work along this line.

Introduction

In recent years, enormous progress in computational capacity and data collection has made the current deep learning revolution possible. Deep learning is an area of machine learning that uses large learning models and massive amounts of data to solve a wide variety of tasks in different fields, including science. Quantum metrology, for its part, is the discipline that studies how to estimate physical parameters with great precision using quantum resources. Given the importance of precise measurements in both science and technology, and since it already offers practical results, quantum metrology constitutes one of the most relevant and most developed applications of quantum theory. Recently, possible applications of deep learning techniques in this area have begun to be studied. Specifically, promising results have been obtained for the measurement of homogeneous fields, that is, physical parameters that do not depend on position, such as the frequency of an electromagnetic field.

As for the motivation for this work, most of the problems addressed so far in the field of quantum sensors involve measuring a homogeneous field. Measuring an inhomogeneous field is more complicated: it is carried out using techniques such as Bayesian inference and requires detailed knowledge of the physical models of the systems involved. A relevant example of inhomogeneous-field reconstruction is locating a particular electron. This problem would have very significant applications if it could be solved using sensors such as those based on nitrogen-vacancy centers, which are biocompatible and capable of operating at atomic scales and at room temperature. Specifically, it could open the door to studying conformational changes in proteins, chemical reactions in real time, the structure of complex molecules, and more. The difficulty is that there is no direct and easy way to obtain the coordinates of an electron. However, deep learning techniques may make it possible to solve inhomogeneous-field problems like this one in an automated, efficient way, without needing to know the physical models that describe the systems under study.

The specific problem studied in this work is that of inferring the position of an electron from the measurements of a quantum sensor using deep learning. To do so, the interaction between an electron and a sensor formed by three NV centers will be simulated and, subsequently, a deep learning model capable of inferring the position of the electron from the measurements of that sensor will be sought using the simulated data.

Theoretical framework

First of all, quantum sensors are devices that apply the principles of quantum mechanics to estimate physical parameters with great precision, taking advantage of the great sensitivity of quantum systems to external perturbations and of their high spatial resolution, due to their small, typically atomic, size. Essentially, they are qudits, systems of two or more energy levels whose state can be initialized and read out, and which have sufficiently long coherence times. Furthermore, the use of sensors with several qudits makes it possible to exploit quantum entanglement to sometimes overcome the precision limit of classical or non-entangled quantum sensors. Estimating a parameter with a quantum sensor usually requires knowing the Hamiltonian that describes its evolution and the effect that the signal under study has on it.

A very popular quantum sensor implementation is based on nitrogen-vacancy (NV) centers. NV centers are one of the most common types of color centers in diamond, that is, fluorescent defects in the crystal. NV centers are interesting because they are very stable, with a robust, spin-dependent luminescence that makes them sensitive to magnetic fields, and because they have good coherence times even at room temperature (on the order of \(10\;\mathrm{\mu s}\)) and can be synthesized at the atomic scale.

The optical properties of NV centers can be explained by three electronic levels, of which the ground one is a spin triplet. Of its three sublevels, the one with the lowest energy corresponds to a value \(m_s = 0\) of the \(z\) component of the spin, while the other two sublevels, \(m_s = \pm 1\), are degenerate, although they can be split by applying a magnetic field. In the presence of only a magnetic field \(\vb{B}\) and under standard approximations, the Hamiltonian that describes an NV center is the following: \[H_{\mathrm{NV}} = \hbar DS_z^2 - \hbar\gamma\vb{B}\cdot\vb{S}\] As can be seen, the magnetic field acts on the spin of the NV center with a vectorial dependence, which makes NV centers sensitive to changes in both its magnitude and its orientation.
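As a simple illustration, the Hamiltonian above can be written down numerically with the spin-1 operators. This is a minimal sketch, not the thesis code: the function name `h_nv` and the convention \(\hbar = 1\) are our own choices.

```python
import numpy as np

# Spin-1 operators in the {m_s = +1, 0, -1} basis (units of hbar)
Sx = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]]) / np.sqrt(2)
Sy = np.array([[0, -1j, 0], [1j, 0, -1j], [0, 1j, 0]]) / np.sqrt(2)
Sz = np.diag([1.0, 0.0, -1.0])

def h_nv(D, gamma, B, hbar=1.0):
    """H_NV = hbar * D * Sz^2 - hbar * gamma * B . S, with B a 3-vector."""
    zeeman = B[0] * Sx + B[1] * Sy + B[2] * Sz
    return hbar * D * (Sz @ Sz) - hbar * gamma * zeeman
```

For a field along \(z\), the eigenvalues reproduce the expected level structure: \(0\) for \(m_s = 0\), and \(\hbar(D \mp \gamma B_z)\) for \(m_s = \pm 1\), so the degeneracy is lifted.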

Machine learning, for its part, is a branch of artificial intelligence whose objective is to make computers learn, that is, improve their performance on a given task, from data. In turn, deep learning is an area of machine learning characterized by the use of large learning models, based on so-called artificial neural networks, which are trained with massive amounts of data.

Neural networks, in their simplest form, are supervised learning models consisting of \(L\) "layers", each of which applies a linear transformation, given by a matrix \(W^{(l)}\) and a vector \(\vb{b}^{(l)}\) that constitute the parameters of the model, followed by a nonlinear activation function \(\sigma^{(l)}\): \[\vb{a}^{(l)} = \sigma^{(l)}\pqty{W^{(l)}\vb{a}^{(l-1)}+\vb{b}^{(l)}}\] Their learning, or training, is carried out by taking a data set and optimizing a cost function \(J\) that measures the discrepancy between the model predictions, \(\vu{y}\), and the corresponding expected values, \(\vb{y}\), given by the labels of the training data.

A particular type of neural network is the so-called convolutional neural network (CNN), which allows data arrays to be analyzed efficiently while taking into account the position of their elements. In a convolutional network, the usual neuron layers are replaced by sets of "filters", arrays of parameters that apply a convolution to the output of the previous layer: \[a_{k,i_1,\ldots, i_n}^{(l)} = \sigma^{(l)}\bqty{\pqty{\vb{a}^{(l-1)}*K^{[k](l)}}_{i_1,\ldots, i_n} + b^{[k](l)}}\]
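A one-dimensional, single-channel instance of this filter equation can be sketched as follows; the helper name, the 'valid' padding and the ReLU activation are illustrative choices. Note that `np.convolve` performs a true convolution, flipping the kernel as the formula's \(*\) does.

```python
import numpy as np

def conv1d_layer(a_prev, filters, biases,
                 sigma=lambda x: np.maximum(0.0, x)):
    """One output row per filter K_k: a_k = sigma(a_prev * K_k + b_k)."""
    return np.stack([sigma(np.convolve(a_prev, K, mode='valid') + b)
                     for K, b in zip(filters, biases)])
```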

Another type of neural network is the recurrent neural network (RNN), which allows data sequences to be analyzed efficiently while taking into account the order of their elements. Essentially, a recurrent network is a simple network that is applied at each position of the data sequence \(\vb{x}\) and that, as an additional input at each layer, receives the activations of that same layer at the previous position in the sequence: \[\vb{a}^{\ev{t}(l)} = \sigma^{(l)}\pqty{W^{(l)}\bqty{\vb{a}^{\ev{t-1}(l)}, \vb{a}^{\ev{t}(l-1)}} + \vb{b}^{(l)}}\] \[\vu{y}^{\ev{t}} = \sigma^{(L)}\pqty{W^{(L)} \vb{a}^{\ev{t}(L-1)} + \vb{b}^{(L)}}\]
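A single-layer version of these two equations can be sketched as follows. The helper name `rnn_forward`, the \(\tanh\) activation and the zero initial hidden state are illustrative choices; the output layer here is linear.

```python
import numpy as np

def rnn_forward(xs, W, b, Wy, by, sigma=np.tanh):
    """a<t> = sigma(W [a<t-1>, x<t>] + b); y<t> = Wy a<t> + by."""
    a = np.zeros(W.shape[0])  # initial hidden state a<0> = 0
    ys = []
    for x in xs:
        a = sigma(W @ np.concatenate([a, x]) + b)
        ys.append(Wy @ a + by)
    return np.array(ys)
```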

It is also possible to build bidirectional recurrent networks, that is, networks that analyze sequences in both directions; some popular variants are the so-called GRU and LSTM networks.

Methodology

The methodology of this work is divided into two parts. First, the interaction between a set of NV centers and an electron is simulated, and measurements of the NV centers are obtained for different positions of the electron. Second, learning models are trained on these measurements to infer the position of the electron from them.

Regarding the simulation, the interactions of the NV centers with each other and with the electron are taken to be dipolar. The electron is assumed to start from a thermal state at room temperature. The NV centers, on the other hand, are initialized in their ground state or, alternatively, in a maximally entangled state, with the aim of seeing whether entanglement has any positive effect on the task of inferring the position of the electron. Once the systems are initialized, they are left to evolve freely for a certain time \(\tau\). Finally, the probabilities of measuring each state of the computational basis of the set of NV centers are obtained.
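The free-evolution and readout steps can be sketched generically: diagonalize the (Hermitian) Hamiltonian to build the propagator, evolve the initial density matrix, and read the computational-basis probabilities off its diagonal. This is a sketch, not the thesis code: the actual \(H\) would also include the dipolar couplings, and the helper name `evolve_probs` is our own.

```python
import numpy as np

def evolve_probs(H, rho0, tau, hbar=1.0):
    """rho(tau) = U rho0 U^dagger with U = exp(-i H tau / hbar);
    returns the computational-basis probabilities p_i = <i|rho(tau)|i>."""
    evals, V = np.linalg.eigh(H)  # H must be Hermitian
    U = V @ np.diag(np.exp(-1j * evals * tau / hbar)) @ V.conj().T
    rho = U @ rho0 @ U.conj().T
    return np.real(np.diag(rho))
```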

With these probabilities, it is possible to generate measurements of the sensor's NV centers and thus obtain a data set, with measurements for different electron positions \(\vb{r}_0\), with which to train the deep learning models.
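Generating a finite set of measurements from the simulated probabilities amounts to multinomial sampling; a minimal sketch (the helper name is our own):

```python
import numpy as np

def sample_measurements(probs, n_meas, rng=None):
    """Draw n_meas projective-measurement outcomes from the basis-state
    probabilities and return the observed outcome frequencies."""
    rng = np.random.default_rng(rng)
    counts = rng.multinomial(n_meas, probs)
    return counts / n_meas
```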

Regarding the deep learning models, to evaluate the performance of the three types of networks already presented (simple, convolutional and recurrent), the value of a metric, in this case the mean absolute error (MAE), will be studied on three different data sets:

  • Training set: 100,000 different simulation configurations, used to adjust the parameters of the models, that is, so that they learn.
  • Validation set: 10,000 configurations, used to manually adjust the hyperparameters of the models, those that are not learned on their own, and also to detect a possible generalization error.
  • Test set: 10,000 configurations, used to evaluate the real performance of the models previously optimized using the two previous sets.
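The split described above can be sketched as a generic shuffle-and-split helper (not the thesis code; the function name is our own):

```python
import numpy as np

def split_dataset(X, y, n_train, n_val, n_test, rng=None):
    """Shuffle the simulated configurations and split them into
    training, validation and test sets."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(X))
    tr, va, te = np.split(idx, [n_train, n_train + n_val])
    assert len(te) == n_test  # sanity check on the sizes
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])
```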

The procedure to follow to find the optimal model will be the following:

  • Minimize the error on the training set.
  • Minimize the error on the validation set.

Finally, once the optimal model has been chosen, its performance on the test set will be evaluated.

All the simulation code and the deep learning models can be consulted in the corresponding repositories on GitHub.

Results

After several tests, the values chosen for the simulation parameters are those detailed below. All of them are physically reasonable and offer an acceptable degree of sensitivity. First, \(N=3\) NV centers have been considered, one for each degree of freedom of the electron position. The NV centers have been arranged at the vertices of an equilateral triangle with side \(a = 3.5\;\mathrm{nm}\), on the \(xy\) plane and centered at the origin. To simplify the problem, the accessible region of the electron has been restricted to a square of side \(L = 5\;\mathrm{nm}\), parallel to the plane of the NV centers, centered on the \(z\) axis and at a distance \(z = 1.25\times a\). On the other hand, in order to optimize the sensitivity of the NV centers, the magnetic field has been set to \(B = D/(2\gamma)\), so that the transition between the ground level and the first excited level of the NV centers is resonant with the transition between the two possible states of the electron. As for the evolution time, \(\tau = 1\;\mathrm{\mu s}\) has been taken, which is shorter than the coherence time of the NV centers.
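The geometry just described can be written down directly; a sketch with the stated parameter values (the variable names are our own):

```python
import numpy as np

a = 3.5       # side of the equilateral triangle of NV centers (nm)
L = 5.0       # side of the electron's accessible square (nm)
z = 1.25 * a  # distance from the NV plane to the electron plane (nm)

# Vertices of an equilateral triangle of side a, centered at the origin
# in the xy plane; the circumradius is R = a / sqrt(3).
R = a / np.sqrt(3)
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
nv_positions = np.stack(
    [R * np.cos(angles), R * np.sin(angles), np.zeros(3)], axis=1)
```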

The dependence of the mean spin of each NV center on the position of the electron, when the NV centers are initialized in their ground state and for the above parameters, is shown in the following plot. At first glance, it seems possible to establish a bijective relationship between the values of the mean spins of the three NV centers and the position of the electron. It would therefore be expected that a neural network could learn this relationship.

Regarding the learning models, the architectures found to perform best, for each type of network, are the following. All of them first compute the average spin of the measurements, since feeding the raw measurements instead would worsen their performance and multiply their number of parameters. The simple network (NN) model consists of two hidden layers of 500 neurons each. The convolutional model (CNN) is made up of two convolutional layers of 300 filters each, with dimensions \(1\times 2\), plus a simple layer with 300 neurons. The recurrent model is made up of two simple bidirectional recurrent layers of 100 neurons each; LSTM and GRU layers performed slightly worse and increased the number of parameters. All models also have a final simple layer that processes the output.
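The mean-spin preprocessing mentioned above can be sketched as follows, assuming each NV center's measurements have already been reduced to outcome frequencies over its three \(m_s\) states (the array layout and helper name are assumptions of ours):

```python
import numpy as np

def mean_spins(freqs, spin_values=np.array([1.0, 0.0, -1.0])):
    """Average spin per NV center from the observed outcome frequencies.
    freqs has shape (n_nv, 3): one row of state frequencies per NV center,
    in the order m_s = +1, 0, -1."""
    return freqs @ spin_values
```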

The following graph shows the training history of the above models. Training has been carried out for 100 epochs, with 10,000 measurements taken for each simulation configuration. As can be seen, no overfitting has occurred, since there is no appreciable difference between the errors on the training set and the validation set. Nor does underfitting seem to be occurring, since the error stabilizes around \(0.3\;\mathrm{nm}\) for the majority of architectures and network sizes studied.

In the next graph, the mean absolute error on the test set is represented as a function of the number of measurements per configuration. The last point on the x-axis, denoted by \(\infty\), represents the case in which the mean spin is calculated directly from the probabilities obtained in the simulation for each state of the NV centers. This is equivalent to calculating the mean spin from an infinite number of measurements and therefore represents an ideal case. As can be seen, and as the previous graph already showed, the convolutional and recurrent models have similar performance, while the simple neural network model presents a somewhat larger error. Although in this case, due to their spatial arrangement, the relative order of the NV centers is irrelevant when analyzing their measurements, it would not be irrelevant with more NV centers or in other spatial configurations. In such cases, the performance difference could be even more significant.

On the other hand, as expected, the mean error decreases as the number of measurements increases. The reason is that the estimate of the mean spin becomes more precise, so the image that the models "see" of its distribution is less distorted, as shown in the next plot.

Below, the distribution of the absolute error (AE) on the test set is represented for the convolutional model, which had the lowest mean error (MAE). In the case of infinite measurements, most of the points have an error between a tenth and a thousandth of a nanometer. Specifically, more than 90% of the points have an absolute error of less than a tenth of a nanometer, while only 1% exceed a nanometer of error. For the case of 10,000 measurements, however, the 90th percentile rises to \(0.49\;\mathrm{nm}\).
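Summary statistics such as the MAE and the 90th percentile of the absolute error can be computed as in the following sketch (helper name and array layout are our own; positions are rows of coordinates):

```python
import numpy as np

def error_summary(y_true, y_pred):
    """Per-point absolute position error (Euclidean distance),
    its mean (MAE) and its 90th percentile."""
    ae = np.linalg.norm(y_true - y_pred, axis=1)
    return ae.mean(), np.percentile(ae, 90)
```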

In the following graph, the distribution of the absolute error (AE) on the test set is represented against the real position of the electron, for the convolutional model and infinite measurements. As can be seen, it is quite homogeneous, without large problem areas.

Finally, no advantage has been found in initializing the NV centers in a maximally entangled state. On the contrary, the performance of the models worsens drastically. The reason is that, in this case, the probabilities of the three possible spin values of each NV center are very similar and present very small variations, which are harder to resolve with a finite number of measurements.

Conclusions

As already mentioned, quantum metrology represents one of the most relevant practical applications of quantum theory, and interesting progress is currently being made in the measurement of homogeneous fields thanks to machine learning. The possibilities that these techniques offer for the reconstruction of inhomogeneous fields, on the other hand, remain a path yet to be explored.

This work sought to carry out a proof of concept, studying the viability of this approach on an apparently simple problem: inferring the position of an electron in a plane from the measurements of three NV centers. As a result, subnanometer error scales are obtained, which could likely be improved by exploring new simulation configurations.

It is undoubtedly worth continuing to pursue this approach. In the case of homogeneous fields, machine learning makes it possible to solve problems without needing to know the models that describe the systems involved, in contexts in which the sensor presents a complex response or under noisy measurement conditions, perhaps due to having a limited number of measurements. There seems to be no theoretical impediment to extending these advantages to the inhomogeneous case and, if achieved, this would allow the automated use of quantum sensors in a new range of problems. In particular, being able to determine the position of electrons would have important practical applications in areas as fundamental as chemistry and medicine.
