A Proposal of Quantum-Inspired Machine Learning for Medical Purposes: An Application Case

Learning tasks are implemented via mappings of the sampled data set, in both the classical and the quantum framework. Biomedical data characterizing complex diseases such as cancer typically require algorithmic support for clinical decisions, especially for the early stage tumors that typify breast cancer patients, which are still controllable in a therapeutic and surgical way. Our case study consists of the prediction, during the pre-operative stage, of lymph node metastasis in breast cancer patients resulting in a negative diagnosis after clinical and radiological exams. The classifier adopted to establish a baseline is characterized by the invariance of its result under permutations of the input features, and it exploits stratifications in the training procedure. The quantum one mimics the support vector machine mapping into a high-dimensional feature space, yielded by encoding into qubits, while being characterized by a high computational complexity. Feature selection is exploited to study the performances associated with a low number of features, so that the procedure can be implemented in a feasible time. Wide variations in sensitivity and specificity are observed in the selected optimal classifiers during cross-validations for both classification system types, with an easier detection of negative or positive cases depending on the choice between the two training schemes. Clinical practice is still far from being reached, even if the flexible structure of quantum-inspired classifier circuits allows further developments to rule interactions among features: this preliminary study is solely intended to provide an overview of the particular tree tensor network scheme in a simplified version adopting just product states, as well as to introduce typical machine learning procedures consisting of feature selection and classifier performance evaluation.


Introduction
Machine learning procedures consist of a map composition aiming at the approximation of a certain concept, representing a truth mapping of each sampled object to a label [1,2]. Labels are known for each element in the supervised framework, while in the unsupervised one, this information is lost; fortunately, we configure our application in the first case. The previously mentioned procedures are defined as classification systems [1] or hypotheses [2], whose numerical implementation takes as inputs the experimental outcomes. These data can be already prepared in a format and quantity useful for further elaboration; otherwise, they require an intermediate step to extract the most important information, commonly known as features. Such preparation of input features of the classifier strongly depends on the scheme exploited for the discrimination among available information.
Machine learning applications have been thoroughly explored in the last few decades for many clinical purposes. The reason can be found in the exponential growth of large publicly available databases, which have fostered the design and development of novel strategies for data management and analysis. In particular, machine learning strategies have proven their effectiveness in computer-aided detection systems in several applications, such as modeling aging processes [3][4][5], predicting the onset of several pathological conditions [6][7][8][9][10][11], or exploring genetic patterns [12][13][14]. However, the cross-talk between data yielded by different sources is still a developing research field (e.g., radiomics, dosiomics) based on the existence of a wide variety of algorithmic schemes able to implement the interplay of features.
Quantum machine learning spans a wide variety of algorithms covering both supervised and unsupervised approaches. The basis offered by some examples such as k-means, k-medians, support vector machine, principal component analysis, and neural networks is exploited for the associated quantum version primarily aiming at the achievement of a speed-up in computation [15][16][17][18][19][20]. The improvement in the predictive capabilities is examined in our case study by means of classical hardware, even if our introductory analysis deals with the small subspace consisting of product states. The inclusion of the full Hilbert space is targeted by truly quantum machine learning, whose computational complexity is managed by the tensor networks representation of states, based on tensor rank decompositions [21][22][23]. Recent developments are focused on the implementation of graphical models, applied even in radiological therapy optimization [24,25], and also, medical imaging techniques are extensively studied for tomography and magnetic resonance applications [26,27].
The use of such quantum algorithmic schemes further extends the applicable engineered interactions among variables, as we apply in our preliminary study of a quantum-inspired classifier circuit involving bioinformatics data, where the interaction is ruled by Hamiltonian terms imposing an energy cost between feature pairs in a multi-layer hierarchical structure. This approach driven by data science represents a resource concerning the management of the computational complexity characterizing systems endowed with an intractable number of degrees of freedom [28].
Cancer is a complex disease involving multiple data types, which may represent an obstacle for clinical and radiological diagnosis, especially in early stage cases. Algorithms supporting such kinds of decisions represent a valuable capability to face the complexity implied by the interplay among variables. The identification of patients characterized by metastatic diffusion of cancer cells has to deal with the following transition: a tumor mass requires a high quantity of resources, thus inducing modifications of vessels with respect to a healthy organ [29]. A local resource imbalance of the affected organ may dynamically involve the whole organ network through a cancer cell migration event. A predictive model needs to characterize this abrupt change of the scale concerned with the disease effects, thus imposing a careful choice of features, e.g., biomarkers for the specific topic to which we are referring.
In the biomedical framework, features are generally called prognostic factors, typical for the studied pathology, which establishes causal relations among them that may involve multiple variables for a single effect. The data quantity and quality of retrospective cases strongly influence the performances of the applied machine learning procedures. Both attributes are limited in medicine to achieve low measurement invasiveness. On the other hand, the time and costs required by intra-operative analysis of excised lymph nodes in breast cancer, together with threatening pathologies possibly caused by such a biopsy [30][31][32], have boosted the research into efficient algorithmic methods, as well as measurement setup engineering.
In breast cancer cases, the detection of lymph node metastasis in pre-operative stages can appreciably improve care quality and efficiency, as well as patient safety [33]. Patients for whom clinical and radiological exams show no lymph node involvement are said to be clinically negative, but false negative patients are included among these cases, especially when they are characterized by early stage tumors. The decision making process for a complex disease like cancer is also made harder by the choice of features, which is a crucial step towards personalized medicine, especially in the early stage, where each prognostic factor considered on its own carries little information. Within this framework, clinical decision support systems can act in a complementary way with clinical and radiological exams, such that the patient's status is described more efficiently by relevant features.
The computational complexity characterizing the quantum-inspired classifier limits the presented performance analysis to at most three features, and we keep the same number of included prognostic factors in the classical one to provide a baseline. The algorithmic scheme adopted for the latter is based on the predictive model of CancerMath (CM) [34][35][36], a web-calculator estimating the probability of cancer cell migration towards lymph nodes. CM adopts a stratified format for some input data, as is usually done for some medical information, which suggests testing two versions of the quantum-inspired classifier, with an emerging interplay between the selection of true positive or true negative cases. In summary, we use different classification systems, so their comparison is purely qualitative.
Immunohistochemical analysis carried out by the sub-specialty department of breast disease in our institute yielded measurements of histological grade and expression of ER, PgR, Ki67 antigen associated with cell proliferation, and Her2/neu. The in situ component consisted of the presence of cancer in breast ducts, which usually remains in the tube without dangerous development in the external environment, while multiple tumors expressed the observation of more than one nodule.
Originally, ER, PgR, and Ki67 were percentage values, which were converted into binary variables for the CM and one of the two quantum-inspired classifiers by imposing a threshold equal to 1% for ER and PgR, and to 21% for Ki67. The tumor size was expressed according to a categorical variable associated with intervals, which was converted by selecting the midpoint: T1a ∈ (1, 5], T1b ∈ (5, 10], T1c ∈ (10, 20], and T2 ∈ (20, 50]. The retrospective observational study was approved by our Institute's Scientific Board.
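The conversion just described can be illustrated with a minimal Python sketch. The thresholds and size intervals are the ones quoted above; the function names, the dictionary layout, and the choice of treating a value equal to the cut-off as positive are assumptions made only for illustration.

```python
# Cut-offs (in percent) and size intervals quoted in the text.
# Names and the inclusive ">=" convention are illustrative assumptions.
THRESHOLDS = {"ER": 1.0, "PgR": 1.0, "Ki67": 21.0}
SIZE_INTERVALS = {"T1a": (1, 5), "T1b": (5, 10), "T1c": (10, 20), "T2": (20, 50)}


def binarize(marker: str, percent: float) -> int:
    """Return 1 (positive) if the percentage reaches the marker cut-off, else 0."""
    return int(percent >= THRESHOLDS[marker])


def size_midpoint(category: str) -> float:
    """Replace a tumor size category with the midpoint of its interval."""
    low, high = SIZE_INTERVALS[category]
    return (low + high) / 2


if __name__ == "__main__":
    print(binarize("Ki67", 35.0), size_midpoint("T1c"))
```

With these conventions, a T2 tumor is represented by the value 35 and a Ki67 expression of 20% is mapped to the negative stratum.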

Classification Systems: The CancerMath Example
Automated classification serves as a support in clinical decisions, taking as inputs the bioinformatics data recorded for each patient. Our data set $D = \{x_1, \ldots, x_N\}$ is a sample of the outcomes population set $\Omega$ yielded by multiple types of sensors $s : \Omega \to D$. Another map takes any sampled data into the label set $L = \{1, \ldots, n\}$, $T : \Omega \to L$, named the truth or concept mapping, whose approximation through a hypothesis lies at the heart of classification systems theory. The concept mapping $T$ defines a partition of the population into label classes. The hypothesis mapping corresponds to the classification system $A : \Omega \to L$, given by the composition of sensors $s$ with a feature map $\Psi$ and a subsequent classifier $c$. Depending on the data set format, the feature map has to support the identification of attributes for classification: in practice, it has to extract characterizing features from the raw data subject to the map $\Psi : D \to F$, where $F$ is the feature space. To conclude the aforementioned composition, we introduce a classifier $c : F \to L$, such that the classification system reads $A = c \circ \Psi \circ s$ [1]. It is important to stress that a dependence on the parameters for both the feature map and the classifier can be introduced; we consider just the latter, with a specific choice of the parameter set for the CM and the two versions of quantum-inspired classifiers. Moreover, the feature map consists of the identity for most of the variables in the CM classifier, with the exception of ER, PgR, and Ki67, each one mapped into two strata by imposing a threshold (1%, 1%, and 21%, respectively), and age, with 6 strata as shown in Table 1. Instead, the quantum case adopts a high-dimensional feature space, thus mimicking support vector machines. Finally, the histological type is used as a categorical variable in the CM classifier, while the 3 cases shown in Table 1 are converted into two binary features for both quantum-inspired classifiers.
The latter are distinguished just by the adoption of a continuous or binary version of ER, PgR, and Ki67. Input feature formats for each case are summarized in Table 2. We denote the data as $x_i^j$, $i = 1, \ldots, N$, where the index $j$ runs over patients in the data set, $j = 1, \ldots, M$, such that we have a data set per patient $D^j = \{x_1^j, \ldots, x_N^j\}$ yielded by some sensors. We choose as a reference classifier the scheme adopted by CM, an open-source cancer web-calculator [34][35][36]. The online code is not endowed with any training procedure, and it imposes pre-set parameters, while we implement it following literature instructions [34,35]. The algorithm implements a probabilistic model for the diffusion of cancer cells belonging to the primary lesion. The fundamental building block consists of a probabilistic estimation of the migration event to lymph nodes expressed by an exponential model, Equation (1), where $x_1^j$ is fixed as the diameter datum of the primary cancer mass and $g_i^j$ are the parameters associated with the remaining data, in our case with $N = 10$ regarding age, grading, histological type, ER (Pos/Neg with cut-off 1%), PgR (Pos/Neg with cut-off 1%), Ki67 (Pos/Neg with cut-off 21%), Her2/neu (0, 1+, 2+, 3+), multiple tumors (Pos/Neg), and the in situ component (Pos/Neg), while $Q_n$ is a parameter referring to the whole population. We choose the notation $g = (g_2, \ldots, g_N)$ because the index 1 refers to the diameter $x_1^j$, which is not endowed with a training parameter, with each $g_i$ containing components $g_i^h$, where $h$ labels a specific stratum for each feature [37], listed in Table 1: depending on the strata a patient belongs to, a set of $g_i^j$ is chosen among these components. The values of both $Q_n$ and the parameters $g_i$ are determined using a training procedure in the training set, yielding a measure of the prognostic factor impact as a statistically independent cause [34,35].
During this stage, the mean diffusion probability with a uniform weight, Equation (2), has to be evaluated in each stratum of the training set, composed of $M_{i,h}$ patients. In the first part of the training procedure, the $g_i$ parameters are initially assumed equal to one, and the parameter $Q_n$ is evaluated by equating the observed statistical frequency of positive lymph nodes in the whole data set with the expected mean diffusion probability, a function expressed by Equation (2) with known patient diameters and unknown parameter $Q_n$, which is determined numerically. Then, the value of $Q_n$ is kept fixed in each stratum, and the same procedure is carried out for each range of values of the prognostic factors. The value of the parameter $g_i^h$ is determined again by solving the equation between the observed statistical frequency of positive lymph nodes in the corresponding stratum, deduced from Table 1, and the mean diffusion probability in Equation (2). For missing data, the parameter $g_i^j$ is set equal to 1 to avoid any influence on the product of Equation (1).
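The two-step training described above can be sketched in Python as follows. The functional form `1 - exp(-Q * diameter * product of g)` is an assumption chosen only because the text mentions an exponential model containing the diameter, the population parameter and a product of stratum parameters; it is not claimed to be the exact expression of Equation (1). All function names and the toy numbers are hypothetical.

```python
import math


def migration_probability(diameter, q, g_product=1.0):
    """Assumed exponential form, used for illustration only."""
    return 1.0 - math.exp(-q * diameter * g_product)


def mean_probability(diameters, q, g=1.0):
    """Uniformly weighted mean of the migration probability over a patient group."""
    return sum(migration_probability(d, q, g) for d in diameters) / len(diameters)


def solve_increasing(f, target, lo=0.0, hi=1.0, iters=200):
    """Bisection for f(x) = target, with f increasing in x."""
    while f(hi) < target:  # enlarge the bracket until it contains the root
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_q(diameters, labels):
    """Step 1: all g equal to one; match the observed positive frequency."""
    freq = sum(labels) / len(labels)
    return solve_increasing(lambda q: mean_probability(diameters, q), freq)


def fit_g(stratum_diameters, stratum_labels, q):
    """Step 2: Q fixed; match the positive frequency observed in one stratum."""
    freq = sum(stratum_labels) / len(stratum_labels)
    return solve_increasing(lambda g: mean_probability(stratum_diameters, q, g), freq)


if __name__ == "__main__":
    diameters, labels = [10, 20, 30, 15], [0, 1, 0, 0]  # toy data
    q = fit_q(diameters, labels)
    print(q, fit_g([10, 20], [0, 1], q))
```

Because every stratum parameter enters only through a product, the resulting prediction does not depend on the order in which the features are listed.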
We have to underline that the presented procedure imposes unit parameters for all remaining variables during the calculation corresponding to a certain stratum. This translates into a commutative classifier with respect to data ordering: given a permutation $\pi \in S_N$, the property $c_g(x^j) = c_g(\pi(x^j))$ is verified, while this does not hold true for the classifiers presented in the following. We define this scheme as classical because of this property.

Least-Squares Problem via Quantum Measurements
Data are rescaled to the interval [0, 1] by dividing each value by $\max_j x_i^j$, the maximum of $D_i = \{x_i^1, \ldots, x_i^M\}$, the values taken by the considered datum over the whole set of patients.
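A minimal Python sketch of this rescaling is given below, under the assumption that data are stored as one row per patient and one column per feature, with non-negative values and a positive maximum in each column. The function name is hypothetical.

```python
def scale_by_column_max(rows):
    """Divide every feature value by the maximum of that feature over all patients."""
    maxima = [max(column) for column in zip(*rows)]
    return [[value / m for value, m in zip(row, maxima)] for row in rows]


if __name__ == "__main__":
    # Two toy patients with two features each.
    print(scale_by_column_max([[10, 50], [20, 25]]))
```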
For quantum-inspired classifiers, we define a reproducing kernel Hilbert space through another feature map $\psi : D \to F$, applied subsequently to the one presented in Table 2, whose elements are named qubits [19,21,22] (Equation (3)), such that the feature vector of the considered set of $N$ features is the tensor product of the corresponding $N$ qubits. The encoded quantum data are classified using a quantum circuit $U_\theta$, in our case consisting of a tree tensor network circuit endowed with $D$ parameters $\theta = (\theta_1, \ldots, \theta_D)$, which is composed of a sequence of two-qubit nearest-neighbor unitaries, halving the output lines of the qubits after each gate of the circuit. Each qubit line is a representation of the degrees of freedom introduced in Equation (3), as shown in Figure 1a, where they are contracted with a rotation endowed with two graphical legs associated with row and column indices. The rotation is implemented according to the 2 × 2 identity matrix $\mathbb{1}$ and the generator $\sigma_y$, belonging to the set of generators of the group SU(2), known as Pauli matrices:

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},$$

where the labels x, y, z are commonly used in physics.

Figure 1. An index contraction is represented in (a), which rules a generic nearest-neighbors interaction in (b) for a Hilbert space tensor product. In (c), the graphical representation of a CNOT gate, which is included in the simple gate of (d), encircled by the orange dashed line [21].
These matrices implement any elaboration of qubits, representing a description of quantum magnetic spins, once we refer to components with respect to a chosen frame aligned with the z-axis, $\psi_\uparrow = (1, 0)^T$ and $\psi_\downarrow = (0, 1)^T$, with associated projectors $P_\uparrow = \frac{\mathbb{1} + \sigma_z}{2}$ and $P_\downarrow = \frac{\mathbb{1} - \sigma_z}{2}$ that are used in the following, together with the spin flip $\sigma_x \psi_{\uparrow(\downarrow)} = \psi_{\downarrow(\uparrow)}$, representing the quantum version of the classical NOT gate on bits.
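The algebraic properties recalled above can be checked numerically. The following sketch uses plain Python lists to stay dependency-free; the helper names `matmul` and `apply` are illustrative, while the matrices and vectors are the standard Pauli matrices and the eigenvectors of $\sigma_z$.

```python
def matmul(a, b):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(m, v):
    """Action of a 2 x 2 matrix on a two-component vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


IDENTITY = [[1, 0], [0, 1]]
SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]
SIGMA_Z = [[1, 0], [0, -1]]

PSI_UP, PSI_DOWN = [1, 0], [0, 1]

# Projectors onto the spin up and spin down components.
P_UP = [[(IDENTITY[i][j] + SIGMA_Z[i][j]) / 2 for j in range(2)] for i in range(2)]
P_DOWN = [[(IDENTITY[i][j] - SIGMA_Z[i][j]) / 2 for j in range(2)] for i in range(2)]

if __name__ == "__main__":
    print(apply(SIGMA_X, PSI_UP))   # spin flip of the up state
    print(apply(P_UP, PSI_DOWN))    # the up projector annihilates the down state
```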
The spin description allows us to model any gate involving a qubit pair as an interaction implemented in the tensor product of their feature spaces, graphically represented in Figure 1b: the two qubits are in a superposition $\psi_1 = a\,\psi_{1,\uparrow} + b\,\psi_{1,\downarrow}$ and $\psi_2 = a'\,\psi_{2,\uparrow} + b'\,\psi_{2,\downarrow}$, with $a, a', b, b'$ scalars, interacting according to a Hamiltonian, which implements a two-spin interaction $H\,\psi_1 \otimes \psi_2$. A simple example consists of the Ising-type Hamiltonian $H = J\,\sigma_z \otimes \sigma_z$, where $J$ is a coupling constant that is tuned such that an antiparallel ($\psi_{\uparrow(\downarrow)} \otimes \psi_{\downarrow(\uparrow)}$) or parallel ($\psi_{\uparrow(\downarrow)} \otimes \psi_{\uparrow(\downarrow)}$) configuration is energetically favored [38]. Systems composed of more than two qubits require the adoption of successive interactions implemented between spin pairs: this circuit structure characterizes our case study, with emerging non-equivalent three-qubit circuits. This non-equivalence has a twofold mathematical origin: it depends on the particular interaction we wish to engineer and, more fundamentally, on the non-commutativity characterizing the sequential application of matrices.
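As a small worked example of the Ising-type interaction, the following Python sketch lists which two-spin configurations have the lowest energy for a given sign of the coupling. Since $\sigma_z \otimes \sigma_z$ is diagonal in the spin up/down basis, the energy of a configuration is simply $J$ times the product of the two $\sigma_z$ eigenvalues. The function names are hypothetical.

```python
# Eigenvalues of sigma_z for the two spin components.
SPIN = {"up": +1, "down": -1}


def ising_energy(coupling, first, second):
    """Energy of a two-spin basis configuration for H = J sigma_z (x) sigma_z."""
    return coupling * SPIN[first] * SPIN[second]


def favored_configurations(coupling):
    """Configurations with the lowest energy for the given coupling constant."""
    energies = {(a, b): ising_energy(coupling, a, b) for a in SPIN for b in SPIN}
    lowest = min(energies.values())
    return sorted(pair for pair, energy in energies.items() if energy == lowest)


if __name__ == "__main__":
    print(favored_configurations(1.0))   # positive J: antiparallel spins
    print(favored_configurations(-1.0))  # negative J: parallel spins
```

A positive coupling therefore favors antiparallel spins, and a negative one favors parallel spins.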
We chose a quantum circuit implementing a CNOT gate interaction in Figure 1c, representing the quantum version of the classical exclusive OR (XOR) on bits, $\mathrm{CNOT} = \sigma_x \otimes P_\uparrow + \mathbb{1} \otimes P_\downarrow$, whose structure suggests the name target qubit for the one in the first feature space in the tensor product, while the second is named the control qubit. Indeed, if the latter is a spin up state, the target qubit is subject to a spin flip (e.g., $\mathrm{CNOT}\,\psi_\uparrow \otimes \psi_\uparrow = \psi_\downarrow \otimes \psi_\uparrow$), otherwise no elaboration is implemented (e.g., $\mathrm{CNOT}\,\psi_\uparrow \otimes \psi_\downarrow = \psi_\uparrow \otimes \psi_\downarrow$). Our application takes into account a particular structure given by two arbitrary single-qubit rotations followed by a CNOT [21,39], as shown in Figure 1d, thus implementing the interaction between data, where the parameters $\theta_1$ and $\theta_2$ are tuned. The qubit line associated with the target qubit is further elaborated by another rotation, as shown in Figure 2. In a general circuit involving more than two qubits, the achievement of a single-qubit line by the halving iterative procedure is followed by the aforementioned rotation and its measurement through a projection onto the spin up component. We compute the associated expectation value to define a score, explicitly referring here to the qubit pair circuit represented in Figure 2.
Each patient belonging to our data set is endowed with a class label $y^j \in \{0, 1\}$, respectively negative or positive regarding the metastatic diffusion to lymph nodes. The knowledge of the patient's label establishes the supervised learning framework, which allows us to exploit the definition of a cost functional in a training set composed of $M$ patients, whose minimization corresponds to the achievement of the minimal mean squared error in the output of the classifier. The key idea consists of the least-squares reformulation of support vector machines [17]. The trick underlying kernel methods, such as support vector machines, consists of mapping initial data that are not linearly separable into a high-dimensional space where a hyperplane separating the classes can be used. The involved computational complexity is measured by the number of line contractions linked with the dimension of feature vectors, as well as by the number $D$ of parameters, which grows linearly with $N$ in the adopted scheme.
The last part of the training procedure concerns the variation in the order of each patient's data, because the tensor product is not commutative. Moreover, the circuit structure can be different, as depicted in Figure 3 for the three-qubit case.
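The qubit pair circuit, its score, and the dependence on the input order can be illustrated with the following Python sketch. Several ingredients are assumptions made only for illustration: the real encoding `[cos(pi x / 2), sin(pi x / 2)]` of a feature scaled to [0, 1], the half-angle convention of the rotation generated by $\sigma_y$, and the use of three parameters (two rotations before the CNOT and one on the output line). The CNOT convention follows the text: the first qubit is the target and is flipped when the control is spin up. The cost is the mean squared error between scores and labels.

```python
import math


def encode(x):
    """Assumed real qubit encoding of a feature scaled to [0, 1]."""
    return [math.cos(math.pi * x / 2), math.sin(math.pi * x / 2)]


def rotate(theta, qubit):
    """Real rotation generated by sigma_y (half-angle convention assumed)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [c * qubit[0] - s * qubit[1], s * qubit[0] + c * qubit[1]]


def score(x_target, x_control, thetas):
    """Probability of measuring spin up on the output line of the pair circuit."""
    theta_1, theta_2, theta_3 = thetas
    target = rotate(theta_1, encode(x_target))
    control = rotate(theta_2, encode(x_control))
    # Amplitudes psi[a][b]: a labels the target, b the control (0 = up, 1 = down).
    psi = [[target[a] * control[b] for b in range(2)] for a in range(2)]
    # CNOT: flip the target when the control is spin up (b = 0).
    psi[0][0], psi[1][0] = psi[1][0], psi[0][0]
    # Final rotation on the target line, then projection onto spin up.
    c, s = math.cos(theta_3 / 2), math.sin(theta_3 / 2)
    up = [c * psi[0][b] - s * psi[1][b] for b in range(2)]
    return up[0] ** 2 + up[1] ** 2


def cost(thetas, pairs, labels):
    """Mean squared error between circuit scores and class labels."""
    errors = [(score(xt, xc, thetas) - y) ** 2 for (xt, xc), y in zip(pairs, labels)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    thetas = (0.3, 1.1, 0.7)
    # Swapping the two inputs generally changes the score.
    print(score(0.2, 0.9, thetas), score(0.9, 0.2, thetas))
```

In an actual training, the parameters would be tuned by a numerical optimizer to minimize `cost`, and the procedure would be repeated for every ordering of the input features.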

Performance Evaluation
The data set is characterized by a given number of negative $n$ and positive $p$ patients, as shown in Table 1. The classifier predictions involve true positives $tp$ and true negatives $tn$, so we introduce the sensitivity $tp/p$, the specificity $tn/n$, and the accuracy $(tp + tn)/(p + n)$, whose choice is recommended in biomedicine to highlight the missed detection, by a clinical test or a predictive model, of a particular status of patients. The imbalanced data set considered in our case study requires a careful evaluation of each metric, because the abundance of negative cases can yield a sufficiently high accuracy even if positive patients are prevalently misclassified. The score threshold characterizes a single classifier, and its variation in the interval [0, 1] defines a family of classifiers [1], whose performances are summarized by receiver operating characteristic (ROC) curves, given by points in the plane (1 − specificity, sensitivity), and by the area under the ROC curve (AUC). The optimal threshold is identified by Youden's index on ROC curves [40], $J = \text{sensitivity} + \text{specificity} - 1$, whose maximization takes place with the selected imbalanced data set, thus implying that a stable performance in the validation set requires the same ratio of positive and negative cases [41]. Optimal classifiers are selected during the training procedure, and their resilience with respect to sample variation can be characterized only if the test set does not belong to the training one. A k-fold cross-validation deals with this issue by partitioning the data set into k parts, k − 1 of which are selected to represent the training set, while the remaining one is the test set. The overall method takes into account k rounds to switch the test role, thus collecting scores of the whole data set.
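The metrics and the threshold selection can be sketched in Python as follows. The convention that a score greater than or equal to the threshold is read as a positive prediction, the restriction of candidate thresholds to the observed scores, and all function names are assumptions made for illustration. The AUC is computed through its rank interpretation, namely the fraction of positive-negative pairs in which the positive patient has the higher score, with ties counted as one half.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (tp / p, tn / n), reading score >= threshold as a positive prediction."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    p = sum(labels)
    n = len(labels) - p
    return tp / p, tn / n


def youden_threshold(scores, labels):
    """Threshold maximizing J = sensitivity + specificity - 1 among observed scores."""
    def youden(threshold):
        sens, spec = sensitivity_specificity(scores, labels, threshold)
        return sens + spec - 1
    best = max(sorted(set(scores)), key=youden)
    return best, youden(best)


def auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons (ties count one half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in positives for sn in negatives)
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    toy_scores, toy_labels = [0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]
    print(auc(toy_scores, toy_labels), youden_threshold(toy_scores, toy_labels))
```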
To avoid any influence on the partition given by the ordering of patients, we implement multiple cross-validations characterized by a randomly chosen order in the patients' list: the statistics of the previously mentioned learning metrics are collected in terms of the median, first, and third quartile, as shown in Table 3.

Table 3. Performances of classifiers trained over the whole data set, evaluated on 10 ten-fold cross-validation rounds and summarized in terms of the median, first, and third quartile. At the top, the quantum-inspired classifier gets percentage values for ER, PgR, and Ki67; at the bottom, their binary version is given.
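The repeated cross-validation scheme can be sketched in Python as follows. The callables `fit_predict` (training on the training indices and returning one score per test index) and `metric` (mapping the pooled scores of the whole data set to a number, for instance an AUC) are hypothetical placeholders for the classifier and the learning metric; the fold construction and the seed handling are also illustrative choices.

```python
import random
import statistics


def k_fold_indices(n_samples, k, rng):
    """Shuffle the patient indices and split them into k nearly equal folds."""
    order = list(range(n_samples))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]


def repeated_cross_validation(n_samples, fit_predict, metric, k=10, repeats=10, seed=0):
    """Return (median, first quartile, third quartile) of the metric over the repeats."""
    rng = random.Random(seed)
    values = []
    for _ in range(repeats):
        folds = k_fold_indices(n_samples, k, rng)
        pooled = [None] * n_samples  # one out-of-fold score per patient
        for i, test in enumerate(folds):
            train = [j for m, fold in enumerate(folds) if m != i for j in fold]
            for j, s in zip(test, fit_predict(train, test)):
                pooled[j] = s
        values.append(metric(pooled))
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), q1, q3


if __name__ == "__main__":
    # Dummy classifier and metric, only to show the calling convention.
    dummy_fit_predict = lambda train, test: [j / 100 for j in test]
    print(repeated_cross_validation(50, dummy_fit_predict, statistics.mean))
```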

Results
The first stage of the forward stepwise feature selection takes into account any variable pair and the associated permutations, according to the circuit depicted in Figure 3a. The histogram of AUC indices yielded by a ten-fold cross-validation of the quantum-inspired classifier adopting a percentage value for ER, PgR, and Ki67, implemented per variable pair, is shown in Figure 4a, while in Figure 4b, a binary format of such features is introduced for the quantum circuit; shaded bars refer to cases endowed with the diameter as the first feature in the input. The highest value in Figure 4a equals 0.633, corresponding to the pair (diameter, PgR), whose classifier is denoted as $Q_2$, while in Figure 4b, the highest value, equal to 0.620, is yielded by (age, diameter), so defining the classifier $Q_2$. In the next feature selection stage, we keep these pairs and add as a third feature any one of the remaining variables, including any permutation of the triplet. These inputs for the two non-equivalent circuit schemes in Figure 3b,c are ranked as previously explained according to the associated AUC indices. The resulting histograms are represented in Figure 4c, with the best performing triplets yielding 0.654 and 0.600 for respectively (diameter, PgR, grading) and (Ki67, PgR, diameter), with classifiers $Q_3$ we consider (diameter, age, histological subtype) in the first one.
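The forward stepwise selection over ordered inputs can be sketched in Python as follows. The callable `auc_of`, which maps an ordered tuple of features to a cross-validated AUC, is a hypothetical placeholder for the training and evaluation of the circuit; the choice between the two three-qubit circuit schemes is left out of this sketch, and the toy scoring function in the usage example has no clinical meaning.

```python
from itertools import permutations


def forward_stepwise(features, auc_of, max_size=3):
    """Keep the best ordered pair, then add one feature at a time,
    ranking every permutation of the enlarged set by its AUC."""
    best = max(permutations(features, 2), key=auc_of)
    history = [(best, auc_of(best))]
    while len(best) < max_size:
        remaining = [f for f in features if f not in best]
        candidates = [order
                      for extra in remaining
                      for order in permutations(best + (extra,))]
        best = max(candidates, key=auc_of)
        history.append((best, auc_of(best)))
    return history


if __name__ == "__main__":
    # Toy stand-in for the cross-validated AUC: earlier positions weigh more.
    weights = {"a": 4, "b": 3, "c": 2, "d": 1}
    toy_auc = lambda order: sum(weights[f] / (i + 1) for i, f in enumerate(order))
    for selected, value in forward_stepwise(list(weights), toy_auc):
        print(selected, round(value, 3))
```

Note that the best value at a later stage is not guaranteed to exceed the one of the previous stage, as happens for the binary version of the biomarkers in the results above.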
Once the most important features are selected, we test the performance statistics in 10 ten-fold cross-validation rounds, listed in Table 3. The distributions of the AUC index show a similar trend between corresponding selected classifiers. CM cases are always related to higher values of sensitivity, with the exception of CM

Discussion
The translation of an automated decision procedure based on machine learning techniques into an optimization problem leads to the search for optimal therapeutic plans conceived for each patient and weighted by certain features included in the predictive model, as envisaged by personalized medicine [42][43][44]. A priority role is played by the identification of the most relevant prognostic factors in the detection of patient status, especially when we are dealing with complex diseases like cancer.
A prognostic subset of factors may behave as noise or as a signal for a specific patient class, which motivates our search for an optimal trade-off between specificity and sensitivity, as shown in Table 3. A limited separation between them is observed just for $Q_3$ and $Q_2$, while the remaining cases report the association of quantum-inspired classifiers with specificity and of CM ones with sensitivity. The highest value of sensitivity among quantum circuits is obtained through $Q_3^{(a)}$: the use of a binary version for biomarkers in the data set eases the recognition of positive patients, thus confirming the data format commonly used in medicine for some prognostic factors to emphasize this class.
The circuit structure in Figure 3b,c influences the AUC index histograms shown in Figure 4c,d. The most important qubit along the central line of Figure 3c works as the target one for both gates, a role causing an increasing width of the distribution of AUC indices with respect to the circuit in Figure 3b, where the central qubit is no longer the most important one: it is still the target in the first gate, but it becomes a control qubit in the next layer. Nevertheless, the interquartile ranges in Table 3 represent evidence of the output stability for Q
The currently adopted intra-operative procedure yields a sensitivity in the range 87.5-100% and a specificity in the range 90.5-100% [45][46][47]; therefore, algorithmic methods with this purpose have to exceed these thresholds. Nevertheless, the application of the CM classifier to a different patient data set, but targeting the same clinical issue, reports comparable performances even if exploiting a wider set of features [48]. The most balanced classifier in terms of sensitivity and specificity in this preliminary study corresponds to the quantum-inspired $Q_3^{(a)}$, which seems to highlight the role played by the included histological subtype, emerging through the quantum circuit.
Some predictive models for lymph node status in the literature make use of genetic data for the classification of a specific kind of patient, selected through ER and Her2 values, obtaining an AUC index of 0.883 [49]. The lymphovascular invasion datum is included in another study, yielding an AUC equal to 0.750 [50]; its absence in our data set is due to the associated exam occurring in post-operative stages.
Tools provided by radiomic analysis represent a crucial step forward in performance improvements. Imaging techniques, like diffusion wavelet and dynamic contrast-enhanced magnetic resonance imaging, are able to include new information, as well as multiple data elaboration methods [51][52][53][54]. Nevertheless, clinically negative patients are not generally subjected to magnetic resonance imaging, but only to first level instrumental investigations such as mammography and ultrasound, whose recordings are jointly used with histopathological data to predict the probability of lymph node metastasis, thus obtaining high performance results [55,56]. Radiomic studies about these imaging categories aimed at the prediction of the lymph node involvement are sparse, unlike those focused on the detection and characterization of breast lesions [57][58][59][60][61][62][63][64].
The feature map adopted in our case study with real components severely limits the number of included prognostic factors, thus yielding a quantum-inspired classifier still not characterized by clearly distinguished performances with respect to the chosen classical one. The presented preliminary study serves as an exploratory survey in this emerging applied biomedical framework for quantum computing.

Conclusions and Future Developments
At this stage in the feature selection, the reported performances preclude any use of the presented quantum classifiers in clinical practice as a clinical decision support system. The training procedure of the CM classifier favors sensitivity, in contrast with the specificity emphasized in the quantum-inspired cases. The performances observed for the CM classification are comparable to those recently obtained in a larger data set by the same classifier endowed with a higher number of features, while this kind of bioinformatics data was not previously targeted in the quantum framework.
To improve on the presented results, we will implement truly quantum machine learning in a twofold way: feature vectors are generally defined through complex scalars, while real coefficients are adopted in this preliminary study, and the exploration of the full Hilbert space, not limited to just product states, has to be included. Recent developments of quantum supervised learning via tensor networks show that the combination of classical elaboration schemes, such as convolutional neural networks, with matrix product state encodings yields high-level performances, whose application in biomedical data categorization is expected to represent a resource. In addition, the exploitation of quantum Bayesian networks is expected to enhance the investigation of causal relations beyond the capabilities offered by naive Bayes classifiers.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available because they are the property of Istituto Tumori 'Giovanni Paolo II', Bari, Italy.