Quantum support vector regression for disability insurance

We propose a hybrid classical-quantum approach for modeling transition probabilities in health and disability insurance. The modeling of logistic disability inception probabilities is formulated as a support vector regression problem. Using a quantum feature map, the data is mapped to quantum states belonging to a quantum feature space, where the associated kernel is determined by the inner product between the quantum states. This quantum kernel can be efficiently estimated on a quantum computer. We conduct experiments on the IBM Yorktown quantum computer, fitting the model to disability inception data from a Swedish insurance company.


Introduction
Support vector machines (SVMs) were first introduced as part of Vapnik's Statistical Learning Framework [14]. Support vector classification aims to classify data, e.g. to determine whether a picture contains a cat or a dog, whereas support vector regression (SVR) is used to model real-valued quantities, such as mortality rates or financial asset returns. All SVMs exploit the so-called kernel trick: an optimization problem with data that has been mapped into a high- or even infinite-dimensional feature space may be efficiently solved by considering its Wolfe dual [11], for which the necessary input is reduced to a so-called kernel matrix consisting of the inner products in the feature space between all pairs of data points. Whenever the kernel matrix can be readily determined, the corresponding optimization problem can be solved efficiently.
Rebentrost et al. [7] showed that an SVM can be implemented on a quantum computer. This work was recently expanded on by Schuld and Killoran [12] and Havlicek et al. [4]. In essence, two related methods have been proposed. The first method consists of encoding data in a high-dimensional quantum feature space, calculating a quantum kernel, and subsequently using a variational quantum circuit to find a separating hyperplane. The second method uses a quantum computer only to estimate the kernel, and solves the resulting SVM optimization problem on a classical computer, a so-called hybrid classical-quantum implementation. Quantum kernel methods can efficiently solve some optimization problems for which the kernel cannot be efficiently determined on a classical computer. This was recently demonstrated by Liu et al. [6].
In the insurance literature, SVMs have been used as a mortality graduation technique [5]. They could equally well be used to model other transition probabilities, such as disability inception or termination rates. These quantities are often estimated using classical techniques such as maximum likelihood or splines [1,2,3,8,9]. In this paper, we propose a hybrid classical-quantum SVR model for logistic disability inception probabilities, using a quantum kernel that can be estimated on a quantum computer. We conduct experiments on the IBM Yorktown quantum computer using disability inception data from a Swedish insurance company.
This paper is organised as follows. In Sections 2 and 3, we review kernel theory, support vector regression and quantum kernel estimation. In Section 4, we propose a support vector regression model with a quantum kernel for disability inception rates. In Section 5, we estimate the kernel matrix associated with disability inception data from a Swedish insurance company on a quantum computer. This kernel is then used in a support vector regression to estimate disability inception rates. The results are compared to those from classical support vector regression.

Kernels and support vector regression
In this section we review kernel theory and support vector regression. Closely following [12], we let $x_i \in \mathbb{R}^d$, $i = 1, \dots, n$, denote the observations in a data set, and let the mapping $\Phi : \mathbb{R}^d \to \mathcal{F}$ be a feature map that maps a data point $x$ to a feature vector $\Phi(x)$ in a (usually higher-dimensional) feature space $\mathcal{F}$, usually taken to be a Hilbert space. The mapping $\Phi$ naturally gives rise to a so-called kernel through the relation

$$K(x, z) = \langle \Phi(x), \Phi(z) \rangle, \tag{1}$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product on $\mathcal{F}$. Note that, since $K(x, z)$ is determined by the inner product of $\Phi(x)$ and $\Phi(z)$, it can be seen as a similarity measure between $x$ and $z$ in the feature space. The reproducing kernel Hilbert space (RKHS) associated with $\Phi$ is defined by

$$\mathcal{R} = \left\{ f : \mathbb{R}^d \to \mathbb{R} \;\middle|\; f(x) = \langle w, \Phi(x) \rangle \ \text{for all } x \in \mathbb{R}^d, \ w \in \mathcal{F} \right\}. \tag{2}$$

Note that the functions $\langle w, \Phi(x) \rangle$ can be interpreted as linear models in the feature space $\mathcal{F}$. Now, assume that we are given a cost function $C$ that measures the goodness of fit of a model by comparing predicted values $\{f(x_i)\}_i$ with observed values $\{y_i\}_i$, and that has a regularization term $g(\|f\|)$, where $g$ is a strictly increasing function. Then, any function $f \in \mathcal{R}$ that minimizes the cost function $C$ can be written as

$$f(x) = \sum_{i=1}^{n} \alpha_i K(x, x_i) \tag{3}$$

for some parameters $\alpha_i \in \mathbb{R}$, $i = 1, \dots, n$.
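As an illustration of the kernel matrix and of the kernel expansion of a minimizer described above, the following minimal sketch uses the classical Gaussian kernel. The function names, the bandwidth `gamma`, the data points and the coefficients `alphas` are all assumptions made for this example and are not taken from the paper.

```python
import math

def gaussian_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2); gamma is an assumed bandwidth."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def kernel_matrix(points, kernel):
    """n x n matrix of kernel evaluations between all pairs of data points."""
    return [[kernel(x, z) for z in points] for x in points]

def kernel_expansion(x, points, alphas, kernel):
    """Evaluate f(x) = sum_i alpha_i * K(x, x_i)."""
    return sum(a * kernel(x, xi) for a, xi in zip(alphas, points))

# Hypothetical two-dimensional data points and coefficients.
points = [(0.0, 0.2), (0.0, 0.5), (1.0, 0.5)]
alphas = [0.3, -0.1, 0.7]
K = kernel_matrix(points, gaussian_kernel)
f_at_first_point = kernel_expansion(points[0], points, alphas, gaussian_kernel)
```

Only kernel evaluations are needed to evaluate `f`; the feature vectors themselves never appear explicitly.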
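Perhaps the most famous application of the kernel approach is support vector regression (SVR) [14]. SVR can be formulated as a convex optimization problem of the form

$$P: \quad \min_{w, b, \xi, \xi^*} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) \quad \text{subject to} \quad \begin{cases} y_i - \langle w, \Phi(x_i) \rangle - b \le \varepsilon + \xi_i, \\ \langle w, \Phi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^*, \\ \xi_i, \xi_i^* \ge 0, \end{cases}$$

where $\varepsilon$ determines the error tolerance of the solution, $C$ is a regularization parameter, and $\xi_i \in \mathbb{R}$ and $\xi_i^* \in \mathbb{R}$, $i = 1, \dots, n$, are slack variables. It can be shown [11] that the dual formulation of $P$ is given by

$$D: \quad \max_{\lambda, \lambda^*} \ -\frac{1}{2} \sum_{i,j=1}^{n} (\lambda_i - \lambda_i^*)(\lambda_j - \lambda_j^*) K(x_i, x_j) - \varepsilon \sum_{i=1}^{n} (\lambda_i + \lambda_i^*) + \sum_{i=1}^{n} y_i (\lambda_i - \lambda_i^*) \quad \text{subject to} \quad \sum_{i=1}^{n} (\lambda_i - \lambda_i^*) = 0, \quad 0 \le \lambda_i, \lambda_i^* \le C,$$

and that the solutions of $P$ and $D$ coincide and are given by

$$f(x) = \sum_{i=1}^{n} \alpha_i K(x_i, x) + b, \tag{4}$$

where $\alpha_i = \lambda_i - \lambda_i^*$. In order to fit the model (4) to data, we must first determine the kernel matrix with entries $K(x_i, x_j)$, $i, j = 1, \dots, n$. In the classical paradigm, we would choose a tractable kernel such as the kernel corresponding to radial basis functions (the so-called Gaussian kernel), evaluate the kernel matrix, and finally fit the model to data by solving the optimization problem $D$. An alternative to classical kernels is provided by the so-called quantum kernels, which we briefly review in the following section.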
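To make the roles of the error tolerance ε, the regularization parameter C and the slack variables concrete, here is a small sketch, assuming the standard ε-insensitive formulation of SVR. For fixed predictions, the smallest feasible slack variables are the positive parts of the residuals beyond the tolerance. The function names and the numerical values are hypothetical.

```python
def slack_variables(y, f, eps):
    """Smallest feasible slacks for one observation y and prediction f.

    xi measures how far y lies above the eps-tube around f,
    xi_star how far y lies below it; both are zero inside the tube.
    """
    xi = max(0.0, y - f - eps)
    xi_star = max(0.0, f - y - eps)
    return xi, xi_star

def primal_objective(w_norm_sq, ys, fs, eps, C):
    """Value of 0.5 * ||w||^2 + C * sum_i (xi_i + xi_star_i)."""
    penalty = sum(sum(slack_variables(y, f, eps)) for y, f in zip(ys, fs))
    return 0.5 * w_norm_sq + C * penalty

def svr_predict(x, points, lambdas, lambdas_star, b, kernel):
    """Kernel form of the fitted model, with alpha_i = lambda_i - lambda_star_i."""
    return b + sum((l - ls) * kernel(xi, x)
                   for l, ls, xi in zip(lambdas, lambdas_star, points))
```

Observations whose residuals stay within ε contribute nothing to the penalty, which is why only some of the data points (the support vectors) end up with non-zero coefficients.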

Quantum kernel estimation
In quantum kernel estimation, the kernel is determined by a quantum feature map. Following [7], [12] and [4], we let $\Phi : x \mapsto \Phi(x)$ (or $|\Phi(x)\rangle$ using Dirac's notation) denote a quantum feature map that maps a data point $x$ to a quantum state $\Phi(x)$, which is an element of a Hilbert space $\mathcal{H}$. Any quantum state $\psi \in \mathcal{H}$ satisfies the famous Schrödinger equation

$$i\hbar \frac{\partial}{\partial t} \psi(t) = H \psi(t), \tag{5}$$

where $H$ is the Hamiltonian operator associated with the quantum system. If $H$ is time-independent, the solution to (5) is given by

$$\psi(t) = U(t) \psi(0), \tag{6}$$

where the operator $U$ defined by

$$U(t) = e^{-iHt/\hbar} \tag{7}$$

is the unitary time evolution operator associated with $H$. Thus, in analogy with (6), using the characterization of reproducing kernel Hilbert spaces (RKHS), it can be shown (see [10], [12], and the references therein) that for every pair $(\Phi, x)$ there is an operator $U_\Phi(x)$, known in the field of quantum computing as a feature embedding circuit, that is implicitly determined by the relation

$$\Phi(x) = U_\Phi(x) \Omega_0, \tag{8}$$

where $\Omega_0$ (also denoted $|0 \dots 0\rangle$ using Dirac's notation) denotes the ground state, i.e. the quantum state with the lowest energy level (associated with the smallest eigenvalue of the generator of the operator $U_\Phi(x)$). Further, let the kernel $K$ corresponding to $\Phi$ be given by

$$K(x, z) = |\langle \Phi(x), \Phi(z) \rangle|^2. \tag{9}$$

As mentioned above, $K(x, z)$ is essentially a similarity measure between $x$ and $z$ in the quantum feature space $\mathcal{H}$. It should be noted that the definition of the kernel (9) deviates from the form (1) that is common in the classical literature, in that it involves taking the absolute value squared of the inner product. The reason is the following. Using (8), the kernel can be written as

$$K(x, z) = |\langle \Omega_0, U_\Phi^\dagger(x) U_\Phi(z) \Omega_0 \rangle|^2, \tag{10}$$

that is, $K(x, z)$ is given by the probability of obtaining the measurement outcome $\Omega_0$ when measuring the quantum state $\Psi(x, z)$ defined by

$$\Psi(x, z) = U_\Phi^\dagger(x) U_\Phi(z) \Omega_0, \tag{11}$$

where $U^\dagger$ denotes the adjoint operator of $U$. The probability (10) can be estimated on a quantum computer by preparing the state $\Psi(x, z)$ in a quantum circuit. This circuit is then run multiple times, and (10) is estimated by the frequency of $\Omega_0$-measurements.
Hence, the form of the kernel (9) allows us to readily estimate it using a quantum computer. The advantage of the quantum approach is that there exist kernels which are hard to evaluate classically that can, in theory, be efficiently determined by a quantum computer. This was recently demonstrated in [6].
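The estimation procedure can be sketched on a deliberately simple toy case. The single-qubit feature map U(x) = R_Y(πx) used below is an assumption made purely for illustration and is not the feature map of this paper. For this map, applying U(z) followed by the adjoint of U(x) to the ground state gives R_Y(π(z − x))|0⟩, so the probability of measuring the ground state is cos²(π(z − x)/2). The function names, the seed and the sampling via a pseudo-random generator (standing in for repeated circuit runs) are likewise hypothetical.

```python
import math
import random

def prob_ground_state(x, z):
    """Exact probability of measuring |0> for the toy state R_Y(pi*(z - x))|0>."""
    return math.cos(math.pi * (z - x) / 2) ** 2

def estimate_kernel(x, z, shots=8192, seed=0):
    """Estimate the kernel by the relative frequency of ground-state outcomes.

    Each shot is simulated as a Bernoulli draw with the exact probability;
    on hardware, each shot would be one run of the circuit.
    """
    rng = random.Random(seed)
    p = prob_ground_state(x, z)
    hits = sum(1 for _ in range(shots) if rng.random() < p)
    return hits / shots

exact = prob_ground_state(0.2, 0.5)
estimate = estimate_kernel(0.2, 0.5)
```

With a finite number of shots the estimate fluctuates around the exact value; on real hardware, gate and readout noise add further error on top of this sampling error.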

Model description
We consider a population of insured individuals, divided into subgroups based on some common characteristics. Let $E_i$ be the number of healthy individuals from the population subgroup $i$, $i = 1, \dots, n$, in a given disability insurance scheme. We denote by $D_i$ the number of individuals falling ill amongst the $E_i$ insured healthy individuals. For each population subgroup $i$ there is some associated data $x_i \in \mathbb{R}^d$ which may e.g. contain information about age, gender, and other characteristics of the population subgroup at hand. We assume that the conditional distribution of $D_i$ given $E_i$ is binomial:

$$D_i \mid E_i \sim \mathrm{Bin}(E_i, p(x_i)), \tag{12}$$

where $p(x_i)$ is the probability that an individual randomly selected from the $E_i$ healthy individuals falls ill. We propose to model the logistic disability inception probabilities using support vector regression:

$$\log \frac{p(x_i)}{1 - p(x_i)} = \sum_{j=1}^{n} \alpha_j K_{ij} + \beta, \tag{13}$$

where $K \in \mathbb{R}^{n \times n}$ is a kernel matrix associated with the data $\{x_i\}_i$, and $\beta \in \mathbb{R}$ and $\alpha_i \in \mathbb{R}$, $i = 1, \dots, n$, are parameters to be estimated from historical data. We propose to fit the model using a weighted support vector regression, with the weight for each sample proportional to the population subgroup size $E_i$, placing higher importance on large subgroups where the sampling errors of the observed inception probabilities are lower. The logistic transform guarantees that the probabilities estimated from the model lie in their natural interval $(0, 1)$.
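The following sketch illustrates the logistic transform and the sample weights of the model. All numbers (subgroup sizes `E`, inception counts `D`, the kernel matrix and the parameters passed to `model_probability`) are made up for illustration and do not come from the insurance data used in the paper; the function names are hypothetical.

```python
import math

def logit(p):
    """Logistic transform log(p / (1 - p)) of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(t):
    """Inverse logistic transform, mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def model_probability(i, K, alpha, beta):
    """Inception probability of subgroup i implied by the kernel model (13)."""
    linear = beta + sum(a_j * k_ij for a_j, k_ij in zip(alpha, K[i]))
    return inv_logit(linear)

# Made-up subgroup sizes and inception counts.
E = [1200, 3400, 2100]
D = [6, 31, 40]

# Regression targets: logits of the observed inception frequencies.
observed_logits = [logit(d / e) for d, e in zip(D, E)]

# Sample weights proportional to subgroup size.
weights = [e / sum(E) for e in E]
```

Because the regression is carried out on the logit scale, any fitted value maps back to a probability strictly between 0 and 1.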
In the classical paradigm we would choose a tractable kernel such as the kernel corresponding to radial basis functions or the linear kernel. We propose instead that this kernel be calculated using a quantum feature map, to be evaluated on a quantum computer. In order to fit the model (13) on a quantum computer, we must first choose a specific quantum feature map. This, in turn, determines the layout of our quantum circuit through (8). There are many ways to choose a suitable feature map, see e.g. [12]. We suggest choosing $\Phi$ such that it captures the richness of the data $x$ while still being simple enough to be run on today's limited and noisy quantum computers. For simplicity, we will now assume that our data is two-dimensional, i.e. $x_i \in \mathbb{R}^2$. Then, it is enough to use a two-qubit unitary operator to obtain estimates of $K$. To this end, we choose the unitary operator given by

$$U_\Phi(x) = \big( R_Y(\pi x_2) \otimes I \big) \, C_{RZ}(\pi x_2) \, \big( R_Y(\pi x_1) \otimes R_Y(\pi x_2) \big), \tag{14}$$

where the first tensor factor acts on the qubit $q_0$ and the second on the qubit $q_1$, $R_Y$ and $R_Z$ denote rotations around the Y and Z axes of the Bloch sphere (a.k.a. the Riemann sphere), respectively, and $C_{RZ}$ denotes a controlled $R_Z$ operation on the second qubit, using the first qubit as a control.

[Figure: quantum circuit implementing the unitary (14).]

To facilitate interpretation, we let $x_{i,1}$ be a dummy variable taking the value 1 if the population subgroup is male, and 0 otherwise, and $x_{i,2}$ be the associated age of the population subgroup, measured in centuries. First, we apply an $R_Y(\pi x_{i,1})$ gate to $q_0$. This flips $q_0$ from $|0\rangle$ to $|1\rangle$ for male subgroups. Then, we perform an $R_Y(\pi x_{i,2})$ rotation on $q_1$. This operation rotates the state of $q_1$ from $|0\rangle$ towards $|1\rangle$, with the angle of rotation increasing as the age of the subgroup increases. The $C_{RZ}(\pi x_{i,2})$ gate performs an additional rotation around the Z-axis of the Bloch sphere, with the angle of rotation increasing as the age of the subgroup increases. Note that this rotation is only performed if $q_0$ is in the state $|1\rangle$, i.e. if the subgroup is male. Finally, we perform an $R_Y(\pi x_{i,2})$ rotation on $q_0$.
For each data pair $(x_i, x_j)$, we run this quantum circuit inserting the values of $x_i$, and then run the adjoint circuit inserting the values of $x_j$. Finally, we perform a measurement on the two qubits. This circuit is run multiple times, and $K(x_i, x_j)$ is estimated by the frequency of obtaining the measurement $\Omega_0 := |00\rangle$.

[Figure: the feature embedding circuit for $x_i$ followed by the adjoint circuit for $x_j$ and a measurement of both qubits.]

This circuit is designed to clearly separate male and female subgroups, and to gradually increase the dissimilarity between different age groups as the difference in ages increases. Note that for $i = j$, all rotations cancel out, and the circuit will measure the state $|00\rangle$ with probability 1, i.e. $K(x_i, x_i) = 1$ as expected.
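The gate sequence described above can be checked with a small state vector calculation. The sketch below is an illustration only, not the authors' code. It assumes the common gate conventions R_Y(θ) = exp(−iθY/2), R_Z(θ) = exp(−iθZ/2) and a controlled R_Z that acts as diag(1, 1, e^{−iθ/2}, e^{iθ/2}) with q0 as control, and it stores amplitudes at index 2·q0 + q1. All function names are hypothetical.

```python
import cmath
import math

def ry(theta):
    """Matrix of a rotation by theta around the Y axis."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def apply_1q(state, gate, qubit):
    """Apply a 2x2 gate to qubit 0 or 1 of a two-qubit state (index = 2*q0 + q1)."""
    stride = 2 if qubit == 0 else 1
    new = list(state)
    for base in range(4):
        if (base // stride) % 2 == 0:  # basis states where the target bit is 0
            a, b = state[base], state[base + stride]
            new[base] = gate[0][0] * a + gate[0][1] * b
            new[base + stride] = gate[1][0] * a + gate[1][1] * b
    return new

def apply_crz(state, theta):
    """Controlled R_Z on q1: only the amplitudes with q0 = 1 (indices 2, 3) change."""
    new = list(state)
    new[2] = cmath.exp(-1j * theta / 2) * state[2]
    new[3] = cmath.exp(1j * theta / 2) * state[3]
    return new

def feature_state(x):
    """State obtained by applying the four gates of the circuit to |00>."""
    x1, x2 = x
    state = [1 + 0j, 0j, 0j, 0j]
    state = apply_1q(state, ry(math.pi * x1), 0)  # R_Y(pi*x1) on q0
    state = apply_1q(state, ry(math.pi * x2), 1)  # R_Y(pi*x2) on q1
    state = apply_crz(state, math.pi * x2)        # C_RZ(pi*x2), control q0
    state = apply_1q(state, ry(math.pi * x2), 0)  # R_Y(pi*x2) on q0
    return state

def quantum_kernel(x, z):
    """Squared absolute value of the inner product of the two feature states."""
    a, b = feature_state(x), feature_state(z)
    overlap = sum(ai.conjugate() * bi for ai, bi in zip(a, b))
    return abs(overlap) ** 2
```

Under these assumptions, `quantum_kernel(x, x)` equals 1 up to rounding, two female subgroups with ages a and a′ give cos⁴(π(a − a′)/2), and a male and a female subgroup of the same age give a kernel value of 0.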

Numerical results
In this section, we estimate the kernel matrix associated with disability inception data from a Swedish insurance company. This kernel is then used in a support vector regression to estimate the logistic disability inception rates. The data consists of inception counts for 81 groups of individuals as well as the associated age and gender for each group.

Estimating the kernel matrix
We estimate the kernel matrix for the circuit from the previous section using two different techniques. First, using (10) and (14), we classically compute $K(x_i, x_j)$ for each pair $(x_i, x_j)$ by matrix multiplication. Here, classically computing the kernel is possible due to the simple structure and low dimension of the unitary operator (14). This process is hereafter referred to as a state vector simulation. Figure 1 displays the estimated kernel matrix. This matrix has an interesting structure: it is block-diagonal. This is due to the fact that the second quadrant of the matrix corresponds to the inner products of the female population groups. These share the common characteristic 'female', and each row is similar to its neighbours due to the encoding: similar ages are also similar in the quantum feature space. Analogously, the fourth quadrant of the matrix contains the male population groups. The first and third quadrants contain the inner products between male and female population groups, which are dissimilar in the quantum feature space.
Next, we run multiple experiments on the IBM Yorktown 5-qubit quantum computer (of which we use two qubits) to obtain an estimate of (10). For each data pair $(x_i, x_j)$, we run the circuit 8192 times, measure the outcomes, and estimate $K(x_i, x_j)$ by the observed frequency of the $|00\rangle$ state. Figure 2 displays the estimated kernel matrix. Naturally, this estimation process introduces sampling error. Today's noisy and rather primitive quantum computers are unfortunately quite error-prone, meaning that the total estimation error is often much larger than the sampling error. This issue can be partially mitigated using error correction techniques, see e.g. [13]. We note that this matrix deviates somewhat from the kernel matrix obtained by state vector simulation, but it retains the same characteristics: the block-diagonal structure and an increasing dissimilarity with increasing age difference.
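The size of the pure sampling error can be gauged with a short calculation, assuming independent shots so that the number of |00⟩ outcomes is binomially distributed. The function name is hypothetical; the shot count 8192 is the one used above.

```python
import math

def shot_noise_se(k, shots):
    """Standard error of a relative-frequency estimate of a probability k."""
    return math.sqrt(k * (1.0 - k) / shots)

# The standard error is largest at k = 0.5.
worst_case = shot_noise_se(0.5, 8192)  # roughly 0.0055
```

Under this assumption, the sampling error of each kernel entry is at most about 0.006, so larger deviations from the state vector simulation would have to come from hardware noise rather than from the finite number of shots.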

Fitting Swedish disability inception rates
We now fit the disability inception model (13) to data with support vector regression, using four classical kernels, i.e. the linear kernel, a polynomial kernel of degree 3, the radial basis functions kernel, and a sigmoid kernel. In addition, we fit the model to data using the quantum kernels obtained from the state vector simulation and from the IBM Yorktown 5-qubit quantum computer. The models are fitted using leave-one-out cross-validation, so that for each train-test split, a single out-of-sample logistic disability inception rate is estimated. After applying the inverse logistic function to obtain a disability inception rate, we calculate the weighted $R^2$ statistic for the out-of-sample rates, again using the population counts as weights. The results are presented in Table 1. The state vector quantum kernel performs better than three out of the four classical kernels, the exception being the polynomial kernel. The Yorktown quantum kernel performs only slightly worse than the state vector simulation.
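The evaluation procedure can be sketched as follows. The exact definition of the weighted R² statistic is not spelled out above, so the formula used here (one minus the ratio of the weighted residual sum of squares to the weighted total sum of squares around the weighted mean) is an assumption, as are the function names and the example numbers.

```python
def weighted_r2(y, y_hat, w):
    """Weighted R^2: 1 - sum w*(y - y_hat)^2 / sum w*(y - weighted mean of y)^2."""
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    ss_res = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, y_hat))
    ss_tot = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    return 1.0 - ss_res / ss_tot

def leave_one_out_splits(n):
    """Yield (training indices, test index) pairs, one per observation."""
    for test in range(n):
        train = [i for i in range(n) if i != test]
        yield train, test

# Made-up observed rates, out-of-sample predictions and population counts.
rates = [0.010, 0.020, 0.040]
predicted = [0.012, 0.019, 0.037]
counts = [100, 300, 200]
score = weighted_r2(rates, predicted, counts)
```

In a full run, each split would refit the weighted support vector regression on the training indices and predict the single held-out subgroup; the predictions collected over all splits would then be passed to `weighted_r2`.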
The out-of-sample estimates are displayed in Figure 3. Note that, due to confidentiality, the actual values of the estimates are not reported. The support vector regression approach manages to capture the difference between the genders, as well as a pattern in the age dimension. The middle-aged population groups are larger than the others, meaning that the highest weights will be placed on these ages for the purposes of calibrating the model. The observations with very high or very low ages are consid-

We note especially that the estimated inception rates from the Yorktown quantum kernel are comparable to the ones obtained from the state vector simulation, even though the estimated quantum kernels were themselves quite different. We believe that this is because the characteristics of the kernel, namely the block-diagonal structure and the increasing dissimilarity with increasing age difference, were preserved, even though the actual kernel estimates differed significantly from each other. Since the model under consideration fits the data well and produces errors that are comparable to those of today's classical methods, we conclude that estimating disability inception rates with quantum support vector regression is a viable statistical method even on today's noisy quantum computers. This bodes well for a future in which complex and high-dimensional data might be modeled accurately and in a timely fashion using quantum computers.

Figure 3: Out-of-sample disability inception rates estimated by state vector simulation and from the IBM Yorktown quantum computer. Axes: age of population subgroup, inception probability.