The Proposal of a Fully Quantum Neural Network and Fidelity-Driven Training Using Directional Gradients for Multi-Class Classification
Abstract
1. Introduction
- Many architectures are hybrid in nature, relying partially on classical processing or optimization routines, which can limit the full exploitation of quantum parallelism.
- Classical backpropagation or gradient-based optimization is frequently used, introducing overhead and potential incompatibility with quantum-native learning paradigms.
- Circuit depth and the number of qubits required for complex datasets may exceed current NISQ-era hardware capabilities, making practical implementation difficult.
2. Architecture Description of the FQNN
- Data encoding: Each input feature is encoded into a single qubit using a rotation about the Y-axis, $R_y(\theta_i)$, where the rotation angle $\theta_i$ is proportional to the feature value. This process embeds the classical data into the quantum-state space.
- Quantum weight layer: The network’s weights are represented by additional qubits, each subjected to an independent rotation $R_y(w_j)$, with the rotation angles $w_j$ acting as trainable network parameters. These angles play a role analogous to classical weights in conventional neural networks.
- Entanglement: For each data–weight qubit pair, controlled gates are applied to introduce entanglement between the data and weight states. This mechanism allows the network to capture nonlinear dependencies among input features.
- Label encoding and error measurement: To perform classification, the network prepares a quantum state corresponding to the target label. The similarity between the output state of the network and the reference label state is evaluated using the swap-test procedure.
2.1. Data Encoding
2.2. Feature-to-Qubit Rotation Assignment
- Qubit State Transformation
- If the scaled feature value is $x_i = 0$, the qubit remains in the state $|0\rangle$.
- If $x_i = 1$, the qubit is rotated to the state $|1\rangle$.
- For intermediate values of $x_i$, the qubit is in a superposition of $|0\rangle$ and $|1\rangle$, as illustrated in the sketch below.
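A minimal NumPy sketch of this mapping, under the assumption that the rotation angle is $\theta_i = \pi x_i$ for features scaled to $[0, 1]$ (the exact proportionality constant is not reproduced here):

```python
import numpy as np

def ry(theta):
    """Matrix of a single-qubit rotation about the Y-axis."""
    return np.array([[np.cos(theta / 2), -np.sin(theta / 2)],
                     [np.sin(theta / 2),  np.cos(theta / 2)]])

def encode_feature(x):
    """Encode a scaled feature x in [0, 1] as RY(pi * x)|0> (assumed mapping)."""
    ket0 = np.array([1.0, 0.0])
    return ry(np.pi * x) @ ket0

for x in (0.0, 0.5, 1.0):
    print(f"x = {x:.1f} -> (amp_0, amp_1) = {np.round(encode_feature(x), 3)}")
# x = 0.0 stays in |0>, x = 1.0 reaches |1>, x = 0.5 is an equal superposition.
```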
2.3. Vector Representation of the Entire Input Encoding
- The classical feature values control the amplitudes of the quantum states.
- The input data are transformed into a superposition of quantum states, which can then be further modified through quantum operations (rotations, entanglement).
2.4. Quantum Weight Layer in FQNN
- Layer Structure
- Each weight qubit is initialized in the state $|0\rangle$;
- undergoes a rotation around the Y-axis using an $R_y(w_j)$ gate;
- where the rotation angle $w_j$ is an independent network parameter optimized during training.
Weight Parameterization
- Initially, the parameters $w_j$ are randomly initialized within a fixed interval.
- During training, the values of $w_j$ are updated in the direction that increases the fidelity between the network’s output state and the target label state (see the sketch below).
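A minimal NumPy sketch of the weight-layer preparation; the uniform initialization interval $[0, \pi)$ below is an illustrative assumption, not the paper's exact choice:

```python
import numpy as np

def ry(theta):
    """Matrix of a single-qubit rotation about the Y-axis."""
    return np.array([[np.cos(theta / 2), -np.sin(theta / 2)],
                     [np.sin(theta / 2),  np.cos(theta / 2)]])

rng = np.random.default_rng(seed=42)
n_weights = 4                                    # one weight qubit per input feature

# Illustrative assumption: angles drawn uniformly from [0, pi).
weights = rng.uniform(0.0, np.pi, size=n_weights)

ket0 = np.array([1.0, 0.0])
weight_states = [ry(w) @ ket0 for w in weights]  # each weight qubit prepared as RY(w)|0>
```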
2.5. Interpretation of the Role of Weights in FQNN
- Weights control how strongly input features influence subsequent layers and the final prediction outcome.
- The weights (rotation angles $w_j$) determine the extent to which the state of a weight qubit is “tilted” from $|0\rangle$ towards $|1\rangle$.
- Subsequently, during the entangling operation (CRY gate), the state of the weight qubit governs the influence on the data qubits, modifying their amplitudes accordingly.
2.6. Vector Representation of the Weight Layer
3. Entanglement in FQNN
- In classical neural networks, nonlinearity is introduced via activation functions (e.g., ReLU, sigmoid).
- In FQNNs, nonlinearity and correlations are achieved through entangling operations between data qubits and weight qubits.
3.1. Role of Entanglement
- The input is encoded into quantum amplitudes, which are further transformed through entangled quantum operations—analogous to activation functions in classical NNs.
3.2. Implementation via Controlled Rotations
- The weight qubit serves as the control qubit;
- The data qubit is the target;
- The operation applies a rotation to the data qubit only when the control qubit is in the $|1\rangle$ state, as in the sketch below.
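A minimal sketch of one data–weight pair, assuming Qiskit is available; the feature value, weight angle, and controlled-rotation angle below are illustrative assumptions:

```python
import numpy as np
from qiskit import QuantumCircuit

x, w = 0.42, 1.1      # illustrative scaled feature and weight angle
phi = np.pi / 2       # illustrative controlled-rotation angle

qc = QuantumCircuit(2)
qc.ry(np.pi * x, 0)   # data qubit (target): feature encoding
qc.ry(w, 1)           # weight qubit (control): weight preparation
qc.cry(phi, 1, 0)     # Y-rotation on the data qubit, controlled by the weight qubit
print(qc.draw())
```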
3.3. Interpretation and Relevance
4. Label Encoding and Error Measurement in FQNN
- Label Encoding: each target class is prepared as a reference quantum state against which the network output is compared.
4.1. Label-Encoding Scheme
- The state $|00\rangle$ corresponds to class 0;
- The state $|01\rangle$ corresponds to class 1;
- The state $|10\rangle$ corresponds to class 2.
- If a component of the desired label state is “1”, an X gate is applied to the corresponding qubit.
- The first qubit (qubit 0) is set to $|1\rangle$ using an X gate;
- The second qubit (qubit 1) remains in the state $|0\rangle$ (no operation needed), as in the sketch below.
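A minimal sketch of this scheme, assuming Qiskit is available and the binary class-index encoding given above, with qubit 0 holding the least significant bit:

```python
from qiskit import QuantumCircuit

def encode_label(class_index, n_label_qubits=2):
    """Prepare the computational-basis state encoding a class index (0, 1, or 2)."""
    qc = QuantumCircuit(n_label_qubits)
    for q in range(n_label_qubits):
        if (class_index >> q) & 1:   # component of the desired label state is "1"
            qc.x(q)                  # flip the corresponding qubit with an X gate
    return qc

print(encode_label(1))  # X applied to qubit 0 only; qubit 1 stays in |0>
```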
4.2. State Comparison—Error Measurement
- Prepare the ancillary qubit in a superposition state by applying a Hadamard gate: $(|0\rangle + |1\rangle)/\sqrt{2}$.
- Apply controlled-swap (CSWAP) operations between corresponding qubits of and , controlled by the ancillary qubit. The CSWAP gate is also known as the Fredkin gate [11], and it conditionally swaps the states of two qubits depending on the state of a control qubit.
- Apply a second Hadamard gate to the ancillary qubit.
- Measure the ancillary qubit. The probability of obtaining the outcome 0 depends on the overlap between:
- the output state of the network, $|\psi_{\mathrm{out}}\rangle$;
- and the label state, $|\psi_{\mathrm{label}}\rangle$ (see the sketch below).
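A minimal single-qubit swap-test sketch, assuming Qiskit is available; the two preparation angles are illustrative, and the fidelity is recovered from the standard relation $F = 2P(0) - 1$ (see Appendix B):

```python
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

theta_out, theta_label = 0.8, 1.0    # illustrative "output" and "label" single-qubit states

qc = QuantumCircuit(3)               # qubit 0: output, qubit 1: label, qubit 2: ancilla
qc.ry(theta_out, 0)
qc.ry(theta_label, 1)
qc.h(2)                              # ancilla -> (|0> + |1>)/sqrt(2)
qc.cswap(2, 0, 1)                    # Fredkin gate controlled by the ancilla
qc.h(2)                              # second Hadamard on the ancilla

p0 = Statevector(qc).probabilities([2])[0]   # P(ancilla = 0)
print(f"P(0) = {p0:.4f}, fidelity = {2 * p0 - 1:.4f}")
```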
4.3. Error Measurement Algorithm and Its Role in Training
Definition of Average Fidelity
- Objective Function:
- minimizing the difference between the network output state and the target label state;
- maximizing the swap-test result (i.e., the fidelity), as written out below.
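Written out for clarity (the paper's exact notation may differ), the average fidelity over a set of $N$ training samples and the resulting objective are:

```latex
\bar{F}(\mathbf{w}) = \frac{1}{N} \sum_{i=1}^{N}
  \left| \langle \psi_{\mathrm{label}}^{(i)} \,|\, \psi_{\mathrm{out}}^{(i)}(\mathbf{w}) \rangle \right|^{2},
\qquad
\mathbf{w}^{*} = \arg\max_{\mathbf{w}} \; \bar{F}(\mathbf{w}).
```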
5. Training and Optimization
5.1. Training Procedure Overview
5.1.1. Gradient Estimation
5.1.2. Direction Selection
- If the estimated directional gradient of the fidelity is positive, the weight is updated in the positive direction.
- If it is negative, the update direction is negative.
5.1.3. Weight Update
5.1.4. Step-Size Adaptation
5.2. Objective Function and Quality Metric
5.3. Optimization Strategies
5.3.1. Mini-Batch Learning
5.3.2. Adaptive Step-Size Reduction (Delta Annealing)
5.3.3. Batch Randomization
5.4. Pseudocode of the Training Procedure
Algorithm 1: Training Procedure for Fully Quantum Neural Network (FQNN)
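A minimal Python sketch in the spirit of the procedure described in Sections 5.1 to 5.3, not a reproduction of the boxed Algorithm 1. The helper `fidelity(weights, x, y)`, the initial step value `delta0`, and the weight-initialization interval are assumptions; the epoch count, mini-batch size, and annealing schedule follow Section 7.4.

```python
import numpy as np

def train_fqnn(fidelity, X, Y, n_weights=4, epochs=12, delta0=0.1,
               batch_size=5, anneal_every=5, anneal_factor=0.9, seed=0):
    """Directional-gradient, fidelity-driven training loop (sketch).

    `fidelity(weights, x, y)` is an assumed helper that builds the FQNN circuit
    for one sample and returns the swap-test fidelity against the encoded label.
    """
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.0, np.pi, n_weights)        # illustrative initialization interval
    delta = delta0
    for epoch in range(epochs):
        for j in range(n_weights):
            # Mini-batch: 5 randomly selected training samples per parameter update.
            batch = rng.choice(len(X), size=batch_size, replace=False)

            def avg_fid(weights):
                return np.mean([fidelity(weights, X[i], Y[i]) for i in batch])

            w_plus = w.copy()
            w_plus[j] += delta
            # Direction selection: step towards higher average fidelity.
            direction = 1.0 if avg_fid(w_plus) > avg_fid(w) else -1.0
            w[j] += direction * delta             # weight update
        if (epoch + 1) % anneal_every == 0:
            delta *= anneal_factor                # delta annealing
    return w
```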
6. Pseudocode of Fully Quantum Neural Network (FQNN)
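A minimal sketch of how the full circuit of Section 7.3 could be assembled, assuming Qiskit is available. The feature-angle scaling, the CRY angles, and the pairing of output qubits with label qubits in the swap test are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np
from qiskit import QuantumCircuit

def build_fqnn_circuit(x, weights, class_index, cry_angle=np.pi / 2):
    """Qubits 0-3: data, 4-7: weights, 8-9: label, 10: swap-test ancilla."""
    qc = QuantumCircuit(11, 1)
    for i, xi in enumerate(x):                  # data encoding
        qc.ry(np.pi * xi, i)
    for j, wj in enumerate(weights):            # quantum weight layer
        qc.ry(wj, 4 + j)
    for i in range(4):                          # entanglement: weight qubit controls data qubit
        qc.cry(cry_angle, 4 + i, i)
    for q in range(2):                          # label encoding with X gates
        if (class_index >> q) & 1:
            qc.x(8 + q)
    qc.h(10)                                    # swap test between an assumed pair of
    qc.cswap(10, 0, 8)                          # "output" qubits and the two label qubits
    qc.cswap(10, 1, 9)
    qc.h(10)
    qc.measure(10, 0)                           # P(0) encodes the fidelity
    return qc
```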
7. Experimental Setup
7.1. Input Data
7.2. Simulation Environment
Algorithm 2: Full Training and Evaluation Procedure for Fully Quantum Neural Network (FQNN)
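A minimal sketch of the evaluation step, not a reproduction of the boxed Algorithm 2. It assumes a helper `fidelity(weights, x, class_index)` that runs the FQNN circuit against the encoded candidate label; the class-assignment rule (pick the label with the highest swap-test fidelity) is an assumption consistent with the fidelity-driven design.

```python
import numpy as np

def predict(fidelity, weights, x, n_classes=3):
    """Assign the class whose encoded label state best overlaps the network output."""
    scores = [fidelity(weights, x, c) for c in range(n_classes)]
    return int(np.argmax(scores))

def accuracy(fidelity, weights, X, Y, n_classes=3):
    """Fraction of correctly classified samples."""
    preds = [predict(fidelity, weights, x, n_classes) for x in X]
    return float(np.mean([p == y for p, y in zip(preds, Y)]))
```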
7.3. FQNN Architecture
- Four qubits representing the input data features;
- Four qubits representing the quantum weights;
- Three auxiliary qubits:
  - Two qubits for encoding the class label;
  - One qubit for performing the swap test.
7.4. Training Parameters
- Number of training epochs: 12;
- Initial weight update step $\delta$;
- Every 5 epochs, the value of $\delta$ was reduced by a factor of 0.9 (delta annealing);
- In directional gradient estimation, mini-batches of 5 randomly selected training samples were used for each parameter update.
7.5. Performance Evaluation Metrics
- Average fidelity: the swap-test fidelity between the FQNN output state and the encoded label state, averaged over selected training samples;
- Classification accuracy: the percentage of correctly classified examples on the test set, measured after each epoch.
7.6. Repeated Trials
8. Results and Discussion
8.1. Best Performing Run
8.2. Worst Performing Run
8.3. Aggregated Results Across 20 Training Runs
8.4. Summary and Insights
9. Conclusions
- Fidelity-based training enables geometrically meaningful optimization in Hilbert space.
- Directional gradient updates, combined with delta annealing, offer stable convergence.
- The fully quantum architecture avoids hybrid dependency, maintaining conceptual purity.
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
FQNN | Fully Quantum Neural Network |
NISQ | Noisy Intermediate-Scale Quantum |
CSWAP | Controlled-SWAP (Fredkin) gate |
CRY | Controlled Y-axis rotation gate |
Appendix A. Training Metrics for All 20 FQNN Runs
Appendix B. Derivation of the Fidelity Estimation in the Swap Test
Appendix B.1. Swap-Test Procedure and Measurement Probability
Appendix B.2. Extraction of Fidelity
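A concise sketch of the standard swap-test derivation referred to here, using the notation $|\psi_{\mathrm{out}}\rangle$ and $|\psi_{\mathrm{label}}\rangle$ of Section 4; $|\Phi\rangle$ denotes the joint state after the Hadamard, CSWAP, and second Hadamard:

```latex
\begin{aligned}
|\Phi\rangle &= \tfrac{1}{2}\,|0\rangle\bigl(|\psi_{\mathrm{out}}\rangle|\psi_{\mathrm{label}}\rangle
              + |\psi_{\mathrm{label}}\rangle|\psi_{\mathrm{out}}\rangle\bigr)
              + \tfrac{1}{2}\,|1\rangle\bigl(|\psi_{\mathrm{out}}\rangle|\psi_{\mathrm{label}}\rangle
              - |\psi_{\mathrm{label}}\rangle|\psi_{\mathrm{out}}\rangle\bigr), \\
P(0) &= \tfrac{1}{2}\bigl(1 + |\langle\psi_{\mathrm{out}}|\psi_{\mathrm{label}}\rangle|^{2}\bigr),
\qquad
F = |\langle\psi_{\mathrm{out}}|\psi_{\mathrm{label}}\rangle|^{2} = 2\,P(0) - 1 .
\end{aligned}
```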
References
1. Wang, Y.; Liu, J. A comprehensive review of quantum machine learning: From NISQ to fault tolerance. Rep. Prog. Phys. 2024, 87, 116402.
2. Zeguendry, A.; Jarir, Z.; Quafafou, M. Quantum Machine Learning: A Review and Case Studies. Entropy 2023, 25, 287.
3. Biamonte, J.; Wittek, P.; Pancotti, N.; Rebentrost, P.; Wiebe, N.; Lloyd, S. Quantum machine learning. Nature 2017, 549, 195–202.
4. Cerezo, M.; Verdon, G.; Huang, H.Y.; Cincio, L.; Coles, P.J. Challenges and opportunities in quantum machine learning. Nat. Comput. Sci. 2022, 2, 567–576.
5. Beer, K.; Bondarenko, D.; Farrelly, T.; Osborne, T.J.; Salzmann, R.; Scheiermann, D.; Wolf, R. Training deep quantum neural networks. Nat. Commun. 2020, 11, 808.
6. Cui, W.; Yan, S. New Directions in Quantum Neural Networks Research. Control Theory Technol. 2019, 17, 393–395.
7. Abbas, A.; Sutter, D.; Zoufal, C.; Lucchi, A.; Figalli, A.; Woerner, S. The power of quantum neural networks. Nat. Comput. Sci. 2021, 1, 403–409.
8. Schuld, M.; Bocharov, A.; Svore, K.M.; Wiebe, N. Circuit-centric quantum classifiers. Phys. Rev. A 2020, 101, 032308.
9. Benedetti, M.; Lloyd, E.; Sack, S.; Fiorentini, M. Parameterized quantum circuits as machine learning models. Quantum Sci. Technol. 2019, 4, 043001.
10. Nielsen, M.A.; Chuang, I.L. Quantum Computation and Quantum Information: 10th Anniversary Edition; Cambridge University Press: Cambridge, UK, 2010.
11. Barenco, A.; Bennett, C.H.; Cleve, R.; DiVincenzo, D.P.; Margolus, N.; Shor, P.; Sleator, T.; Smolin, J.A.; Weinfurter, H. Elementary gates for quantum computation. Phys. Rev. A 1995, 52, 3457–3467.
12. Boresta, M.; Colombo, T.; De Santis, A.; Lucidi, S. A Mixed Finite Differences Scheme for Gradient Approximation. J. Optim. Theory Appl. 2022, 194, 1–24.
Sample # | Input Features (Scaled) | Predicted | Expected |
---|---|---|---|
1 | [0.03, 0.42, 0.05, 0.04] | 0 | 0 |
2 | [0.50, 0.42, 0.66, 0.71] | 2 | 2 |
3 | [0.17, 0.17, 0.39, 0.38] | 1 | 1 |
4 | [0.19, 0.12, 0.39, 0.38] | 1 | 1 |
5 | [0.03, 0.50, 0.05, 0.04] | 0 | 0 |
6 | [0.56, 0.54, 0.63, 0.63] | 1 | 1 |
7 | [0.08, 0.67, 0.00, 0.04] | 0 | 0 |
8 | [0.31, 0.58, 0.12, 0.04] | 0 | 0 |
9 | [0.61, 0.42, 0.71, 0.79] | 2 | 2 |
10 | [0.31, 0.42, 0.59, 0.58] | 1 | 1 |
11 | [0.83, 0.38, 0.90, 0.71] | 0 | 2 |
12 | [0.72, 0.46, 0.69, 0.92] | 2 | 2 |
13 | [0.61, 0.42, 0.81, 0.88] | 2 | 2 |
14 | [0.58, 0.50, 0.59, 0.58] | 1 | 1 |
15 | [0.19, 0.58, 0.08, 0.04] | 0 | 0 |
16 | [0.19, 0.54, 0.07, 0.04] | 0 | 0 |
17 | [0.42, 0.83, 0.03, 0.04] | 0 | 0 |
18 | [0.36, 0.21, 0.49, 0.42] | 1 | 1 |
19 | [0.50, 0.38, 0.63, 0.54] | 1 | 1 |
20 | [0.47, 0.42, 0.64, 0.71] | 2 | 2 |
21 | [0.31, 0.71, 0.08, 0.04] | 0 | 0 |
22 | [0.67, 0.46, 0.78, 0.96] | 2 | 2 |
23 | [0.64, 0.38, 0.61, 0.50] | 1 | 1 |
24 | [0.50, 0.25, 0.78, 0.54] | 2 | 2 |
25 | [0.58, 0.33, 0.78, 0.88] | 2 | 2 |
26 | [0.67, 0.42, 0.68, 0.67] | 1 | 1 |
27 | [0.64, 0.42, 0.58, 0.54] | 1 | 1 |
28 | [0.39, 0.75, 0.12, 0.08] | 0 | 0 |
29 | [0.61, 0.42, 0.76, 0.71] | 2 | 2 |
30 | [0.25, 0.58, 0.07, 0.04] | 0 | 0 |
Sample # | Input Features (Scaled) | Predicted | Expected |
---|---|---|---|
1 | [0.03, 0.42, 0.05, 0.04] | 0 | 0 |
2 | [0.50, 0.42, 0.66, 0.71] | 2 | 2 |
3 | [0.17, 0.17, 0.39, 0.38] | 1 | 1 |
4 | [0.19, 0.12, 0.39, 0.38] | 1 | 1 |
5 | [0.03, 0.50, 0.05, 0.04] | 0 | 0 |
6 | [0.56, 0.54, 0.63, 0.63] | 1 | 1 |
7 | [0.08, 0.67, 0.00, 0.04] | 0 | 0 |
8 | [0.31, 0.58, 0.12, 0.04] | 0 | 0 |
9 | [0.61, 0.42, 0.71, 0.79] | 0 | 2 |
10 | [0.31, 0.42, 0.59, 0.58] | 1 | 1 |
11 | [0.83, 0.38, 0.90, 0.71] | 0 | 2 |
12 | [0.72, 0.46, 0.69, 0.92] | 0 | 2 |
13 | [0.61, 0.42, 0.81, 0.88] | 2 | 2 |
14 | [0.58, 0.50, 0.59, 0.58] | 1 | 1 |
15 | [0.19, 0.58, 0.08, 0.04] | 0 | 0 |
16 | [0.19, 0.54, 0.07, 0.04] | 0 | 0 |
17 | [0.42, 0.83, 0.03, 0.04] | 0 | 0 |
18 | [0.36, 0.21, 0.49, 0.42] | 1 | 1 |
19 | [0.50, 0.38, 0.63, 0.54] | 1 | 1 |
20 | [0.47, 0.42, 0.64, 0.71] | 2 | 2 |
21 | [0.31, 0.71, 0.08, 0.04] | 0 | 0 |
22 | [0.67, 0.46, 0.78, 0.96] | 0 | 2 |
23 | [0.64, 0.38, 0.61, 0.50] | 1 | 1 |
24 | [0.50, 0.25, 0.78, 0.54] | 2 | 2 |
25 | [0.58, 0.33, 0.78, 0.88] | 2 | 2 |
26 | [0.67, 0.42, 0.68, 0.67] | 1 | 1 |
27 | [0.64, 0.42, 0.58, 0.54] | 1 | 1 |
28 | [0.39, 0.75, 0.12, 0.08] | 0 | 0 |
29 | [0.61, 0.42, 0.76, 0.71] | 0 | 2 |
30 | [0.25, 0.58, 0.07, 0.04] | 0 | 0 |
Run ID | Training Accuracy | Test Accuracy |
---|---|---|
1 | 0.8667 | 0.9333 |
2 | 0.8583 | 0.9000 |
3 | 0.8667 | 0.9333 |
4 | 0.8833 | 0.9667 |
5 | 0.8500 | 0.9333 |
6 | 0.8917 | 0.9333 |
7 | 0.8583 | 0.9333 |
8 | 0.8583 | 0.9333 |
9 | 0.8500 | 0.9000 |
10 | 0.8750 | 0.9333 |
11 | 0.8917 | 0.9667 |
12 | 0.8917 | 0.9667 |
13 | 0.8667 | 0.9667 |
14 | 0.8583 | 0.9333 |
15 | 0.8667 | 0.9333 |
16 | 0.8667 | 0.9667 |
17 | 0.8667 | 0.9667 |
18 | 0.8000 | 0.8333 |
19 | 0.8750 | 0.9667 |
20 | 0.8417 | 0.9000 |
Mean | 0.8642 | 0.9350 |
Std Dev | 0.0201 | 0.0324 |
Variance | 0.000403 | 0.001053 |