Federated Quantum Machine Learning

Distributed training across several quantum computers could significantly reduce the training time, and if we could share the learned model rather than the data, it could also improve data privacy, since the training would happen where the data are located. One potential scheme to achieve this is federated learning (FL), in which several clients or local nodes learn on their own data and a central node aggregates the models collected from those local nodes. However, to the best of our knowledge, no work has yet been done on quantum machine learning (QML) in a federated setting. In this work, we present federated training of hybrid quantum-classical machine learning models, although our framework could be generalized to pure quantum machine learning models. Specifically, we consider a quantum neural network (QNN) coupled with a classical pre-trained convolutional model. Our distributed federated learning scheme achieves almost the same trained-model accuracy with significantly faster distributed training. It demonstrates a promising future research direction for the scaling and privacy aspects of QML.


Introduction
Recently, advances in machine learning (ML), in particular deep learning (DL), have found significant success in a wide variety of challenging tasks such as computer vision [1][2][3], natural language processing [4], and even playing the game of Go with superhuman performance [5].
In the meantime, quantum computers have been introduced to the general public by several technology companies such as IBM [6], Google [7], IonQ [8] and D-Wave [9]. Theoretically, quantum computing can provide exponential speedup for certain classes of hard problems that are intractable on classical computers [10,11]. The most famous example is the factorization of large numbers via Shor's algorithm [12], which can provide exponential speedup, while searching an unstructured database via Grover's algorithm [13] can provide quadratic speedup. However, currently available quantum computers are not equipped with quantum error correction [14,15] and suffer from device noise. Quantum computation tasks or quantum circuits with a large number of qubits and/or a long circuit depth cannot be faithfully implemented on these so-called noisy intermediate-scale quantum (NISQ) devices [16]. Therefore, it is highly challenging to design applications with moderate quantum resource requirements that can leverage quantum advantages on these NISQ devices.
With these two rapidly growing fields, it is natural to consider combining them, especially in machine learning applications that can be implemented on NISQ devices. Indeed, the area of quantum machine learning (QML) has drawn a lot of attention recently, and there have been several promising breakthroughs. The most notable progress is the development of variational algorithms [17][18][19], which enable quantum machine learning on NISQ devices [20]. Recent efforts have demonstrated promising applications of NISQ devices in several machine learning tasks.
One of the common features of these successful ML models is that they are data-driven. Building a successful deep learning model requires a huge amount of data. Although there are several public datasets for research purposes, most advanced and personalized models depend largely on data collected from users' mobile devices and other personal data (e.g., medical records, browsing habits, etc.). For example, ML/DL approaches have also succeeded in medical imaging [47,48] and speech recognition [49][50][51], to name a few. These fields rely critically on massive datasets collected from the population, and these data should not be accessed by unauthorized third parties. The use of such sensitive and personally identifiable information raises several concerns. One concern is that the channel used to exchange data with cloud service providers can be compromised, leading to the leakage of high-value personal or commercial data. Even if the communication channel can be secured, the cloud service provider itself is a risk, as malicious adversaries can potentially invade the computing infrastructure. There are several solutions to such issues. One of them is federated learning (FL), which focuses on a decentralized computing architecture. For example, a user can train a speech recognition model on their cell phone and upload the model to the cloud in exchange for the global model, without uploading the recordings directly. Such a framework is made possible by recent advances in hardware development, which make even small devices powerful. This concept helps privacy-preserving practice not only in classical machine learning but also in the rapidly emerging field of quantum machine learning, as researchers are trying to expand machine learning capabilities by leveraging the power of quantum computers.
To harness the power of quantum computers in the NISQ era, the key challenge is how to distribute computational tasks to different quantum machines with limited quantum capabilities. Another challenge is the rising privacy concern in the use of large-scale machine learning infrastructure. We address these two challenges by providing a framework for training quantum machine learning models in a federated manner.
In this paper, we propose federated training of hybrid quantum-classical classifiers. We show that with federated training, the performance in terms of testing accuracy does not decrease. In addition, the model still converges quickly compared to non-federated training. Our efforts not only help in building secure QML infrastructure but also help distributed QML training make better use of available NISQ devices. This paper is organized as follows. In Section 2, we introduce the concept of federated machine learning. In Section 3, we describe the variational quantum circuit architecture in detail. In Section 4, we describe transfer learning in hybrid quantum-classical models. Section 5 shows the performance of the proposed federated quantum learning on the experimental data, followed by further discussion in Section 6. Finally, we conclude in Section 7.

Federated Machine Learning
Federated learning (FL) [52] has emerged recently along with rising privacy concerns about the use of large-scale datasets and cloud-based deep learning [53]. The basic components of a federated learning process are a central node and several client nodes. The central node holds the global model and receives the trained parameters from the client devices. The central node performs the aggregation process to generate the new global model and shares this new model with all of its client nodes. The client nodes train the received model locally on their own part of the data, which in general is only a small portion of the whole. In our proposed framework, the local clients are quantum computers or quantum simulators whose circuit parameters are trained in a hybrid quantum-classical manner. In each training round, a specified number of client nodes is selected to perform the local training. Once the client training is finished, the circuit parameters from all the selected client nodes are aggregated by the central node. There are various methods to aggregate the models. In this work, we choose the mean of the client models. The scheme of federated quantum machine learning is shown in Figure 1. For further discussion and advanced settings of federated learning, we refer to [54][55][56][57][58][59][60][61].
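As a minimal sketch of the aggregation step described above, the snippet below averages the circuit parameters returned by the selected clients, element by element. The function name `aggregate_mean` and the toy parameter values are illustrative assumptions and are not taken from the original implementation.

```python
from typing import List, Sequence


def aggregate_mean(client_params: Sequence[Sequence[float]]) -> List[float]:
    """Return the element-wise mean of the clients' parameter vectors."""
    if not client_params:
        raise ValueError("at least one client model is required")
    n_params = len(client_params[0])
    if any(len(p) != n_params for p in client_params):
        raise ValueError("all client models must have the same number of parameters")
    n_clients = len(client_params)
    # Average each parameter position over all participating clients.
    return [sum(p[k] for p in client_params) / n_clients for k in range(n_params)]


# Toy example (made-up values): three clients, each holding two circuit parameters.
clients = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
global_params = aggregate_mean(clients)
print(global_params)
```

An unweighted mean is a natural choice when every client holds the same number of training points, as in the experiments of this work; with unequal shard sizes, a weighted mean would be one possible alternative.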

Variational Quantum Circuits
Variational quantum circuits (VQC), or quantum neural networks (QNN), are a special kind of quantum circuit with adjustable circuit parameters that are subject to optimization procedures developed by the classical machine learning community. In Figure 2 we introduce the general setting of a VQC, in which E(x) encodes the classical data into a quantum state that the quantum gates can operate on, and W(φ) is the learnable block with parameters φ, which can be seen as the weights in a classical neural network. Quantum measurements in the final part of the VQC read out the information from the quantum circuit, and these classical numbers can be further processed by other classical or quantum components.

Figure 2. General structure of the variational quantum circuit (VQC). E(x) is the quantum routine for encoding the classical data into the quantum state and W(φ) is the variational quantum circuit block with the learnable parameters φ. After the quantum operation, the quantum state is measured to retrieve classical numbers for further processing.

Quantum Encoder
For a quantum circuit to operate on a classical dataset, the critical step is to define the encoding method, which transforms the classical vector into a quantum state. The encoding scheme is important because it affects the efficiency of the hardware implementation and the potential quantum advantages. In the NISQ era, the number of qubits as well as the circuit depth are limited. Therefore, we need to encode the classical values with a small number of qubits and without too many quantum operations. For a more in-depth introduction to the various encoding methods used in QML, refer to [71]. A general N-qubit quantum state can be represented as:

$$|\psi\rangle = \sum_{(q_1, q_2, \ldots, q_N) \in \{0,1\}^N} c_{q_1,\ldots,q_N} \, |q_1\rangle \otimes |q_2\rangle \otimes |q_3\rangle \otimes \cdots \otimes |q_N\rangle,$$

where $c_{q_1,\ldots,q_N} \in \mathbb{C}$ is the amplitude of each quantum state and $q_i \in \{0, 1\}$. The squared modulus of the amplitude $c_{q_1,\ldots,q_N}$ is the probability of a measurement with the post-measurement state $|q_1\rangle \otimes |q_2\rangle \otimes |q_3\rangle \otimes \cdots \otimes |q_N\rangle$, and the total probability should sum to 1, that is,

$$\sum_{(q_1, q_2, \ldots, q_N) \in \{0,1\}^N} |c_{q_1,\ldots,q_N}|^2 = 1.$$

In this work, we use the variational encoding scheme to encode the classical values into a quantum state. The basic idea behind this encoding scheme is to use the input values, or a transformation of them, as rotation angles for the quantum rotation gates. As shown in Figure 3, the encoding part consists of the single-qubit rotation gates $R_y$ and $R_z$ and uses $\arctan(x_i)$ and $\arctan(x_i^2)$ as the corresponding transformations.
Figure 3. Variational quantum classifier. The variational quantum classifier includes three components: encoder, variational layer and quantum measurement. The encoder consists of several single-qubit gates $R_y(\arctan(x_i))$ and $R_z(\arctan(x_i^2))$, which represent rotations about the y-axis and z-axis by the angles $\arctan(x_i)$ and $\arctan(x_i^2)$, respectively. These rotation angles are derived from the input values $x_i$ and are not subject to iterative optimization. The variational layer consists of CNOT gates between each pair of neighbouring qubits, which entangle the quantum states of the qubits, and general single-qubit unitary gates $R(\alpha, \beta, \gamma)$ with three parameters $\alpha$, $\beta$, $\gamma$. The parameters labeled $\alpha_i$, $\beta_i$ and $\gamma_i$ are the ones subject to iterative optimization. The quantum measurement component outputs the Pauli-Z expectation values of designated qubits. The number of qubits and the number of measurements can be adjusted to fit the problem of interest. In this work, we use the VQC as the final classifier layer; therefore, the number of qubits equals the latent vector size, which is 4, and we only consider the measurement on the first two qubits for binary classification. The grouped box in the VQC may repeat several times to increase the number of parameters, subject to the capacity and capability of the available quantum computers or simulation software used for the experiments. In this work, the grouped box repeats twice.
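The encoding described above can be sketched for a single qubit with plain complex arithmetic. The snippet is only an illustration under stated assumptions: the qubit is assumed to start in |0>, the rotation about y is assumed to be applied before the rotation about z, and the usual conventions R_y(θ) = exp(-iθY/2) and R_z(θ) = exp(-iθZ/2) are used. The function names are hypothetical.

```python
import cmath
import math
from typing import List


def encode_qubit(x: float) -> List[complex]:
    """Apply R_y(arctan(x)) and then R_z(arctan(x^2)) to the state |0>."""
    a = math.atan(x)      # rotation angle about the y-axis
    b = math.atan(x * x)  # rotation angle about the z-axis
    # R_y(a)|0> = [cos(a/2), sin(a/2)]
    amp0, amp1 = math.cos(a / 2), math.sin(a / 2)
    # R_z(b) = diag(exp(-i b/2), exp(+i b/2)) only adds relative phases.
    return [cmath.exp(-0.5j * b) * amp0, cmath.exp(0.5j * b) * amp1]


def pauli_z_expectation(state: List[complex]) -> float:
    """<Z> = P(0) - P(1) for a single-qubit state."""
    return abs(state[0]) ** 2 - abs(state[1]) ** 2


state = encode_qubit(1.0)
norm = sum(abs(c) ** 2 for c in state)  # total probability, should be 1
print(norm, pauli_z_expectation(state))
```

Because arctan maps any real input to the interval (-π/2, π/2), the rotation angles stay bounded regardless of the scale of the latent vector, and the squared moduli of the two amplitudes always sum to one.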

Quantum Gradients
The hybrid quantum-classical model can be trained in an end-to-end fashion, following the common backpropagation method used in training deep neural networks. For the gradient calculation on quantum functions, the parameter-shift method is employed, which yields the analytical gradient of the quantum circuit. The method is described in [72,73]. The idea behind the parameter-shift rule is that, given the ability to calculate the expectation value of a certain observable of a quantum function, the quantum gradients can be calculated without the finite difference method.
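To illustrate the parameter-shift rule, consider a gate of the form exp(-iθP/2) with P a Pauli operator, for which the gradient of an expectation value f(θ) is [f(θ + π/2) - f(θ - π/2)] / 2. The sketch below checks this on a one-qubit example, where the expectation of Pauli-Z after R_y(θ)|0> is known analytically to be cos(θ). The closed-form `expectation` function stands in for a circuit evaluation and is an assumption made only to keep the example self-contained.

```python
import math
from typing import Callable


def expectation(theta: float) -> float:
    """<Z> after R_y(theta)|0>; stands in for running the circuit."""
    return math.cos(theta)


def parameter_shift_gradient(f: Callable[[float], float], theta: float) -> float:
    """Exact gradient from two evaluations shifted by +/- pi/2."""
    shift = math.pi / 2
    return 0.5 * (f(theta + shift) - f(theta - shift))


theta = 0.3
grad = parameter_shift_gradient(expectation, theta)
print(grad, -math.sin(theta))  # the two values agree up to rounding
```

Unlike a finite-difference estimate, the two shifted evaluations give the exact derivative for this class of gates, so no small step size has to be tuned.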

Hybrid Quantum-Classical Transfer Learning
In the NISQ era, quantum computers are not error-corrected and thus cannot perform calculations in a fault-tolerant manner. The circuit depth and number of qubits are therefore limited, and it is non-trivial to design model architectures that can harness the capabilities provided by near-term quantum computers. In this work, we employ the hybrid quantum-classical transfer learning scheme inspired by the work [24]. The idea is to use a pre-trained classical deep neural network, typically a convolutional neural network (CNN), to extract features from the images and compress the information into a latent vector x, which has a much smaller dimension than the original image. The latent vector x is then processed by the quantum circuit model to output the logits of each class. The scheme is presented in Figure 4. In this work, we employ the pre-trained VGG16 model [1] as the feature extractor.

Figure 4. Hybrid quantum-classical transfer learning scheme (figure labels: Dataset, Pre-trained model).

Experiments and Results
In this study we consider the following setting:

• Central node C: Receives the uploaded circuit parameters $\theta_i$ from each local machine $N_i$, aggregates them into a global parameter $\Theta$, and distributes it to all local machines.
• Training points are equally distributed to the local machines, and the testing points are kept on the central node to evaluate the aggregated global model.
• Individual local machines $N_i$: Each has a distinct part of the training data and performs E epochs of local training with batch size B.
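The data layout in this setting can be sketched as follows: the training indices are split into equally sized, disjoint shards, one per local machine, and a subset of machines is drawn for each round. The function names, the fixed random seed and the small sizes (1000 samples, 10 clients, 3 selected per round) are illustrative assumptions chosen to keep the example short.

```python
import random
from typing import List


def partition_equally(n_samples: int, n_clients: int) -> List[List[int]]:
    """Split sample indices 0..n_samples-1 into equally sized, disjoint shards."""
    if n_samples % n_clients != 0:
        raise ValueError("n_samples must be divisible by n_clients")
    shard = n_samples // n_clients
    return [list(range(i * shard, (i + 1) * shard)) for i in range(n_clients)]


def select_clients(n_clients: int, n_selected: int, rng: random.Random) -> List[int]:
    """Pick the clients that train in one round, uniformly and without replacement."""
    return rng.sample(range(n_clients), n_selected)


rng = random.Random(0)  # fixed seed only for reproducibility of the sketch
shards = partition_equally(n_samples=1000, n_clients=10)
chosen = select_clients(n_clients=10, n_selected=3, rng=rng)
print(len(shards), len(shards[0]), chosen)
```

In a full training loop, each selected client would train on its own shard for E epochs with batch size B and return its circuit parameters to the central node for aggregation.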

Cats vs. Dogs
We perform binary classification on the classic cats vs dogs dataset [76]. The images in this dataset have slightly different dimensions; therefore, we preprocess them so that all training and testing samples have the dimension 224 × 224. In Figure 5 we show some examples from this dataset. Here, we have in total 23,000 training points and 2000 testing points. The testing data are on the central node and are used to evaluate the aggregated model (global model) after each training round. The training data are equally distributed to the 100 local machines $N_i$, where $i \in \{1, \ldots, 100\}$. Therefore, each local machine holds 230 training points. In each training round, 5 local machines are randomly selected and each performs 1, 2 or 4 epochs of training with its own training points. The batch size is B = 32. The trained models are then sent to the central node for aggregation. The aggregation method we use in this experiment is the average of the collected models. The aggregated model (global model) is then shared with all the local machines. The pre-trained VGG model is modified with the trainable layer shown in Table 1 in order to fit the input dimension of the VQC layer. The dashed box in the quantum circuit repeats twice, giving 4 × 3 × 2 = 24 quantum circuit parameters. The VQC receives 4-dimensional compressed vectors from the pre-trained VGG model to perform the classification task. The non-federated training used for comparison has the same hybrid VGG-VQC architecture as the one used in the federated training. We perform 100 training rounds and the results are presented in Figure 6. We compare the performance of federated learning with that of non-federated learning using the same hybrid quantum-classical architecture and the same dataset.

Table 1. The trainable layer in our modified VGG model. This layer is designed to convert the output from the pre-trained layer to a lower-dimensional one which is suitable for the VQC to process. The activation function used in this layer is ReLU. In addition, dropout layers with dropout rate = 0.5 are used.
Layers (in order): Linear, ReLU, Dropout (p = 0.5), Linear, ReLU, Dropout (p = 0.5), Linear

In the left three panels of Figure 6, we present the results of training the hybrid quantum model in the federated setting with different numbers of local training epochs. Since the training data are distributed across different clients, we only consider the testing accuracies of the aggregated global model. On the Cats vs Dogs dataset, we observe that both the testing accuracies and the testing loss reach levels comparable to those of non-federated training. We also observe that the training loss, which is the average over clients, fluctuates more than in non-federated training (shown in Table 2). The underlying reason might be that different clients are selected in each training round, so the training data used to evaluate the training loss differ from round to round. Yet the training loss still converges after the 100 rounds of training. In addition, the testing loss and accuracies converge to levels comparable to those of non-federated training, regardless of the number of local training epochs. Notably, we observe that a single epoch of local training is sufficient to train a well-performing model. In each round of federated training, the model updates are based on samples from 5 clients, with 1 local training epoch. The computing resources used therefore scale linearly with 230 × 5 × 1 = 1150 training samples per round, whereas for a full epoch of non-federated training they scale linearly with 23,000 samples. These results imply the potential for more efficient training of QML models with distributed schemes. This particularly benefits the training of quantum models when we use a high-performance simulation platform or an array of small NISQ devices, with moderate communication overhead.
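The resource comparison above is simple arithmetic and can be reproduced with a few lines. The sketch below uses the number of training samples processed as a rough proxy for computing cost, which is an assumption of this illustration; it ignores communication overhead and any differences between devices. The function name is hypothetical.

```python
def samples_per_federated_round(n_train: int, n_clients: int,
                                clients_per_round: int, local_epochs: int) -> int:
    """Training samples processed in one federated round."""
    per_client = n_train // n_clients  # equal split across the local machines
    return per_client * clients_per_round * local_epochs


# Cats vs Dogs setting: 23,000 training points, 100 machines, 5 selected, 1 local epoch.
fed = samples_per_federated_round(23_000, 100, 5, 1)
full_epoch = 23_000  # one full epoch of non-federated training
print(fed, full_epoch / fed)
```

With these numbers, one federated round touches 1150 samples, a factor of 20 fewer than one full non-federated epoch.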

CIFAR (Planes vs. Cars)
In this experiment, we use data from the CIFAR-10 dataset [77]. The dimension of the images in this dataset is 32 × 32. In Figure 7 we show some examples from this dataset. The hybrid quantum-classical model used in this experiment is the same as in the previous experiment. Here we have in total 10,000 training points and 2000 testing points. The testing points are on the central node and are used to evaluate the aggregated model (global model) after each training round. The training points are equally distributed to the 100 local machines $N_i$, where $i \in \{1, \ldots, 100\}$. In each training round, 5 local machines are randomly selected and each performs 1, 2 or 4 epochs of training with its own training points. The batch size is B = 32. The trained models are then sent to the central node for aggregation. In the left three panels of Figure 8, we present the results of training the hybrid quantum model in the federated setting with different numbers of local training epochs. Since the training data are distributed across different clients, we only consider the testing accuracies of the aggregated global model. On the Planes vs Cars dataset, we observe that both the testing accuracies and the testing loss reach levels comparable to those of non-federated training (shown in Table 3). Similar to the previous Cats vs Dogs dataset, we observe that the training loss, which is the average over clients, fluctuates more than in non-federated training. In addition, the testing loss and accuracies converge to levels comparable to those of non-federated training, regardless of the number of local training epochs. Notably, we observe that a single epoch of local training is sufficient to train a well-performing model. In each round of federated training, the model updates are based on samples from 5 clients, with 1 local training epoch. The computing resources used therefore scale linearly with 100 × 5 × 1 = 500 training samples per round, whereas for a full epoch of non-federated training they scale linearly with 10,000 samples. These results again imply the potential for more efficient training of QML models with distributed schemes. This particularly benefits the training of quantum models when we use a high-performance simulation platform or an array of small NISQ devices, with moderate communication overhead.

Discussion

Integration with Other Privacy-Preserving Protocols
In this study we consider the federated quantum learning framework. One of its limitations is that the process of exchanging model parameters can potentially be attacked. Moreover, we cannot exclude the possibility that malicious parties join the network and thereby obtain the aggregated global model. The leaked model parameters can be used to deduce the training data of the model [78]. For example, it has been shown that trained models can be used to recover training entries [79]. In addition, it is also possible for adversaries to find out whether a specific entry was used in the training process [80]. These possibilities raise serious concerns when QML models are used to process private and sensitive data. There are other protocols which can further boost the security. One potential solution is to train the model with differential privacy (DP) [81]. With DP, it is possible to share the trained model while still keeping the information in the training data private. Another direction is to incorporate secure multi-party computation [82], which can further increase the security of decentralized computing. For example, a recent work using universal blind quantum computation [83] provides a quantum protocol to achieve privacy-preserving multi-party quantum computation.

Entropy 2021, 23, 460

Different Aggregation Method
In this work, we use the simplest aggregation method, which is to calculate the average of the parameters from each local machine (model). In a more realistic application scenario, clients may upload a corrupted trained model, or the communication channel may be disturbed by noise, which can potentially compromise the global model if there are no countermeasures. Several recent works present advanced aggregation schemes to address this issue [84,85]. The implementation of these advanced protocols with quantum machine learning is an interesting direction for future work.

Decentralization
This research presents a proof-of-concept federated training of quantum machine learning models. The scheme includes a central node that receives the trained models from clients, aggregates them and distributes the aggregated model to the clients. This central node can be vulnerable to malicious attacks, and adversaries can compromise the whole network. Moreover, the communication bandwidth between the clients and the central node may vary, leading to undesired effects in the synchronization process. To address these issues, recent studies propose various decentralized federated learning schemes [86][87][88][89][90][91][92]. For example, the distributed ledger technologies (DLT) [93][94][95][96] which power the development of blockchain have been applied in decentralized FL [97][98][99][100][101][102][103]. The blockchain technologies are used to ensure the robustness and integrity of the shared information while removing the requirement of a central node. Blockchain-enabled FL can also be designed to encourage data owners to participate in the model training process [104]. In addition, peer-to-peer protocols are also employed in FL to remove the need for a central node [105,106]. Gossip learning [107,108] is an alternative learning framework to FL [109][110][111][112]. Under the gossip learning framework, no central node is required; nodes on the network exchange and aggregate models directly. The efficiencies and capabilities of decentralized schemes such as blockchain-based FL and gossip learning in the quantum regime are left for future work.
In classical machine learning, distributed training frameworks are designed to scale up model training to computing clusters [113], making training on large-scale datasets and complex models possible. A potential direction is to apply federated quantum learning to high-performance quantum simulation.

Other Quantum Machine Learning Models
In this work, we consider the hybrid quantum-classical transfer learning architecture, which includes a pre-trained classical model as the feature extractor. Currently available quantum computers and simulation software are rather limited and do not possess a large number of qubits. However, the proposed framework can be extended well beyond the transfer learning structure. Recently, a hybrid architecture combining a tensor network and a quantum circuit was proposed [114]. Such a hybrid architecture is more generic than the pre-trained network used in this work. It is interesting to investigate the potential of decentralizing this kind of architecture. Moreover, it is possible to study federated learning on quantum convolutional neural networks (QCNN) [65,66][115][116][117][118][119] when larger-scale quantum simulators or real quantum computers are available. The proposed model is not limited to learning classical data such as the ones presented in this work. It is possible to extend the scheme of this work to learn quantum data as well.

Potential Applications
This work can potentially be integrated with the work [46,120] to decentralize quantum-enhanced speech recognition. Another potential direction is healthcare, in which a tremendous amount of sensitive personal data needs to be processed to train a reliable model. For example, the work [67] studied the application of VQC in dementia prediction, which would benefit from federated training to preserve the users' privacy. Recently, the application of quantum computing in the financial industry has drawn a lot of attention [121]. It is expected that federated QML will play an important role in finance as well.

Conclusions
In this work, we provide a framework to train hybrid quantum-classical classifiers in a federated manner, which can help in preserving privacy and distributing computational loads to an array of NISQ computers. We also show that federated training in our setting does not sacrifice performance in terms of testing accuracy. This work should benefit research in both privacy-preserving AI and quantum computing, and pave a new direction for building secure, reliable and scalable distributed quantum machine learning architectures.

Data Availability Statement: Publicly available datasets were analyzed in this study. The two datasets used in this study can be found here: https://www.cs.toronto.edu/~kriz/cifar.html and https://www.kaggle.com/karakaggle/kaggle-cat-vs-dog-dataset (accessed on 13 April 2021).

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: