Homomorphic Model Selection for Data Analysis in an Encrypted Domain

Abstract: Secure computation, a methodology for computing on encrypted data, has become a key factor in machine learning. Homomorphic encryption (HE) enables computation on encrypted data without leaking any information to untrusted servers. In machine learning, the model selection method is a crucial algorithm that determines performance and reduces the fitting problem. Despite the importance of finding the optimal model, none of the previous studies have considered model selection when performing data analysis through an HE scheme. The HE-based model selection we propose finds the optimal complexity that best describes given data that is encrypted and whose distribution is unknown. Since this process requires matrix calculations, we constructed matrix multiplication and matrix inversion based on bitwise operations. On top of these, we designed two model selection methods for homomorphic machine learning: an HE cross-validation approach and an HE Bayesian approach. Our focus was on evidence approximation for linear models to find the goodness-of-fit that maximizes the evidence. We conducted an experiment on a dataset of age and Body Mass Index (BMI) from Kaggle to compare the capabilities, and our model showed that encrypted data can be regressed homomorphically without decrypting it.


Introduction
Due to the continuous increase in computational power and the rapid development of processing and storage technologies, people use cloud computing to manage and analyze massive amounts of biomedical data generated from many sensors. Using cloud computing requires one to trust the cloud server and share one's data with it. Since all the data resides with the cloud server, a critical data issue arises if the cloud provider misuses the information. One of the best ways to guarantee data privacy is to encrypt data before sending it to the cloud server. However, there remains the problem of how cloud servers can compute on the encrypted data to respond to requests from the client. The client would need to offer the secret key to the server to decrypt the data before executing the calculations, which could lead to a breach of confidentiality or an invasion of privacy. Homomorphic encryption (HE) is the appropriate solution for these issues, as it allows operations to be performed on encrypted data without decrypting it.
Machine learning provides systems with the ability to learn and improve automatically from experience without being explicitly programmed to do so. HE can be used to preserve privacy and can also be applied to machine learning to find useful information in big data. For example, HE-based machine learning prevents leakage of personal information while predicting a patient's disease in the healthcare industry or creating a model for assessing an individual's creditworthiness in the banking industry. Various machine learning methods have been studied using various HE libraries [1][2][3]. Several studies have been conducted on linear regression, logistic regression, K-Nearest Neighbor, and K-means in References [4,5] based on the proposed HE scheme.
In our previous work, we treated HE-based model selection from a validation technique perspective. In this paper, we extend the prior work by adding an HE-based evidence approach in a Bayesian framework, proposing various ways to construct the final algorithm, including the matrix inverse approach realized in different ways. To determine the best model for HE, we conducted experiments with the proposed model using the same dataset and compared the results with those from the previously proposed model. The key contributions of this study are as follows:
• The proposed model implements basic arithmetic operations with bit-by-bit operations that take accuracy and speed into account. We introduced the homomorphic logarithm functions used for Bayesian model selection.
• We also employed a matrix multiplication method based on our operations and expanded our computation to form a matrix inverse operation using the Gauss-Jordan elimination method.
• Finally, we proposed two forms of the first HE-based model selection, Homomorphic cross-validation (HCV) and Homomorphic Bayesian model selection (HBMS), to determine the complexity of models that can explain the data well when given encrypted data.

Homomorphic Encryption
Homomorphic encryption (HE) is a scheme that allows arbitrary computations on encrypted data. It is based on the property that the result of operations between ciphertexts is equal to the result of operations between plaintexts; hence, the server can operate on the client's data without decrypting it. If the following equation holds, then the encryption scheme is called homomorphic over the • operation.
Enc(m_1) • Enc(m_2) = Enc(m_1 • m_2), ∀ m_1, m_2 ∈ M, where Enc is an encryption algorithm, M is a set of plaintexts, and • denotes the operation applied to ciphertexts on the left-hand side and to plaintexts on the right-hand side.
In 1978, the concept of privacy homomorphism was originally proposed as a modification of the RSA cryptosystem, describing the idea of preserving computations between ciphertexts, as presented by Rivest et al. [6]. This concept has led to numerous attempts by researchers around the world to search for a homomorphic scheme with various sets of operations. However, the main hindrance that researchers faced for about 30 years was the limitation on the number of operations that could be evaluated. During arithmetic, the noise level mounts, and at some point it becomes so large that it is impossible to proceed with the operation without losing correctness.
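The multiplicative privacy homomorphism of unpadded (textbook) RSA can be illustrated with toy parameters; the numbers below are illustrative only and far too small to be secure:

```python
# Toy unpadded RSA: Enc(m) = m^e mod n. Multiplying two ciphertexts
# yields an encryption of the product of the plaintexts.
n, e, d = 3233, 17, 413  # n = 61*53; e*d = 1 mod lcm(60, 52) -- toy values

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

m1, m2 = 7, 11
c1, c2 = enc(m1), enc(m2)
# Enc(m1) * Enc(m2) decrypts to m1 * m2: the homomorphic property above
assert dec((c1 * c2) % n) == (m1 * m2) % n
```

Any padding scheme used in practice destroys this property, which is why real RSA deployments are not homomorphic.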
In 2009, Craig Gentry presented the idea of a Fully Homomorphic Encryption (FHE) scheme, built on a lattice-based Somewhat Homomorphic Encryption (SHE) scheme that can hold up to a limited number of operations, that is, a few multiplications and many additions. A public FHE scheme is composed of a 4-tuple of probabilistic polynomial-time protocols and algorithms, namely (KeyGen, Enc, Eval, Dec):
• KeyGen(1^λ) → (pk, evk, sk): The key generation algorithm KeyGen takes a security parameter λ and outputs a public encryption key pk, a public evaluation key evk, and a secret decryption key sk.
• Enc(pk, m) → ct: The encryption algorithm Enc takes pk and a plaintext message m ∈ {0, 1}, and outputs a ciphertext ct.
• Eval(evk, f, ct_1, ..., ct_l) → ct_f: The evaluation algorithm Eval takes evk, a function f, and ciphertexts ct_1, ..., ct_l, and outputs a ciphertext ct_f encrypting the result of applying f to the underlying plaintexts.
• Dec(sk, ct) → m* or ⊥: The decryption algorithm Dec takes sk and a ciphertext ct, and outputs a message m* ∈ {0, 1} if ct is the result of Enc(pk, m) and pk matches sk; otherwise, it outputs ⊥.
The key idea proposed in Gentry's work is bootstrapping, which is the process of refreshing a ciphertext to maintain a lower level of noise and produce a new ciphertext so that more homomorphic operations can be evaluated. However, the bootstrapping procedure required adding many ciphertexts to the public key and encrypting the individual bits (or coefficients) of the secret key. This resulted in very large public keys, and the plaintext had to be encrypted bit by bit, with the size of the ciphertext increasing at each step, making it too expensive in terms of computation.

Our Novel Approach Based on the TFHE Library
Various HE libraries (e.g., References [7][8][9][10][11]) have been suggested based on Gentry's scheme [12]. Chillotti et al. released the TFHE (Fast Fully Homomorphic Encryption over the Torus) [13] library, which was designed from FHEW (Fastest Homomorphic Encryption in the West). It is based on both the learning with errors (LWE) assumption [14] and its ring variant [15]. It significantly improves the performance of the bootstrapping operation (less than 0.1 s) and facilitates an unlimited number of operations based on a gate-by-gate bootstrapping procedure. Moreover, the bootstrapping key size was scaled down from 1 GB to 24 MB, maintaining the security level while reducing the overhead caused by noise.
The library supports the homomorphic evaluation of the binary gates (AND, OR, XOR, NOR, NAND, etc.), as well as the negation and the MUX gate, which can be used for various operations. Additionally, there is no restriction on the number of gates or their compositions, making it possible to perform any computations over encrypted data.

Bitwise Representation of Number
We performed homomorphic encryption of plaintext bits; that is, each bit within the plaintext is encrypted with the same key. Therefore, using our notation, given the plaintext a, the encryption stage yields a ciphertext ct.a, which is an array containing encrypted bits. This is different from the mainstream approach, which is lattice-based and starts from integer input values. Specifically, integer-based FHE requires an encoding process that rounds plaintexts to convert real-number inputs, and through this rounding process the outcome contains an error.
Our approach is designed to solve this real-to-integer conversion problem more accurately. In general, a floating-point number system supports a broader range of inputs; however, it also leads to more complex algorithms under an HE scheme. HE gate operations take a significant amount of time compared to plaintext gate operations, so using floating point is less efficient. On the other hand, the major advantage of a fixed-point representation is that it uses simple integer arithmetic with much less logic than floating point, which improves performance by reducing the bootstrapping procedure on every encrypted bit. Reducing this cost is therefore crucial in FHE, so we adopt only a fixed-point number system in this paper.
We assigned r/2, r/2 − 1, and 1 bit for the integer part, the fractional part, and the sign bit, respectively. The encryption of each bit is designated in the same position as in the plaintext; consequently, the values of the plaintext and its corresponding ciphertext are precisely equal. By assigning a lengthy input, we guarantee higher accuracy, with the tradeoff that the execution time increases dramatically. Typically, HE gate operations take a significant amount of time compared to plaintext gate operations, due to the heavy load of noise accumulated through HE gates. Therefore, realistically, we assigned input lengths of 8, 16, and 32 for experimentation. Figure 1 shows an example of an 8-bit fixed-point representation of the number 6.75 using our approach.
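The fixed-point layout above (1 sign bit, r/2 integer bits, r/2 − 1 fractional bits) can be sketched in a few lines; the helper name `to_fixed_bits` is ours, for illustration only:

```python
def to_fixed_bits(value, r=8):
    """Fixed-point layout from the paper: 1 sign bit, r/2 integer bits,
    r/2 - 1 fractional bits, stored MSB first (helper name is ours)."""
    sign = 1 if value < 0 else 0
    scaled = round(abs(value) * (1 << (r // 2 - 1)))  # shift fraction into integer
    if sign:
        scaled = (1 << r) - scaled                    # 2's complement for negatives
    return [(scaled >> i) & 1 for i in range(r - 1, -1, -1)]

# 6.75 = 0110.110 in binary -> sign 0, integer 0110, fraction 110
assert to_fixed_bits(6.75) == [0, 0, 1, 1, 0, 1, 1, 0]
```

This reproduces the Figure 1 example: 6.75 occupies the bit pattern 0 0110 110 in 8 bits.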

Bitwise Operation
We illustrate some of the critical concepts, such as our basic scheme and functions, to aid understanding. However, to fully grasp our approach, interested readers should refer to our previous works [4,5,16,17]. For the sake of the flow, we introduce only the very basics of our approach.
The goal of HE operations is to construct an HE function that yields an encrypted result matching the plaintext operation. For instance, the result of the plaintext addition a + b should match the result of the HE addition ct.a +_HE ct.b, where +_HE represents HE addition. These basic HE operations are built from combinations of HE gate operations. We provide a simple illustration of our method by constructing the HE absolute value operation to show the difference between plaintext and ciphertext.
Suppose we have a plaintext value a, which is then converted to a fixed-point number consisting of 0s and 1s. If the goal is to derive the absolute value of a in the plaintext setting, it is not hard to see that, using the 2's complement method, one can easily obtain the absolute value depending on the sign of a. However, in the encrypted state, since the sign bit of the ciphertext, ct.a[r − 1] where r is the length of the input, is encrypted and hence not known, we have to consider both the case where a is positive and the case where it is negative. Thus, we have to evaluate both the negative and positive cases of ct.a. Consider the case when ct.a is positive; then the sign bit is Enc[0]. Using a NOT gate on the sign bit, our encrypted value becomes Enc[1]; call it ct.s. With the value ct.s, if we perform AND gate operations on every bit of ct.a, we obtain ct.a when ct.a is positive. If ct.a is negative, then ct.s is Enc[0], which, in the same manner, yields Enc[0] after executing AND gate operations on every bit of ct.a.
Also, for the negative case, we have to perform the 2's complement operation on ct.a; denote the outcome of this operation by ct.n. Likewise, AND gate operations between every bit of ct.n and the sign bit yield the result of the negative case. Therefore, by adding the two results of the positive and negative cases, one obtains an encrypted result of the absolute value operation starting from the encrypted number a. In this way, we constructed various HE operations from HE bootstrapping gates. We now provide basic notations for the gates and operations used in our work. Table 1 lists the homomorphic operations and homomorphic functions used in this study.
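A plaintext simulation of this branch-free absolute value (function name ours) makes the two-case construction concrete:

```python
def bit_abs(bits):
    """Branch-free absolute value over a 2's-complement bit list (MSB first),
    mirroring the homomorphic construction: evaluate both the positive and
    negative branch, mask every bit of each branch with the sign bit via AND
    (as an HE circuit would, since the sign is encrypted), then add them."""
    r = len(bits)
    sign = bits[0]
    pos_mask = 1 - sign                      # NOT of the sign bit
    # negative branch: 2's complement of the input
    val = int("".join(map(str, bits)), 2)
    neg = (-val) % (1 << r)
    neg_bits = [(neg >> i) & 1 for i in range(r - 1, -1, -1)]
    masked_pos = [b & pos_mask for b in bits]      # kept only when positive
    masked_neg = [b & sign for b in neg_bits]      # kept only when negative
    return [p + q for p, q in zip(masked_pos, masked_neg)]

# |-6.75| with the paper's 8-bit layout: -6.75 -> 11001010, abs -> 00110110
assert bit_abs([1, 1, 0, 0, 1, 0, 1, 0]) == [0, 0, 1, 1, 0, 1, 1, 0]
```

In the encrypted version, every `&`, NOT, and addition becomes a bootstrapped gate evaluation; the control flow itself never depends on the (hidden) sign.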

HE Bitwise Logarithm
We introduced the absolute value function in the previous section to demonstrate that extra measures beyond the plaintext algorithm are needed to obtain the same result in the encrypted domain. We now present another key HE function, the logarithm, used in the derivation of the evidence function. The details of the HE function are well explained in Reference [16]. In this paper, we briefly explain its method and show an example to illustrate our approach.
To design the logarithm function in plaintext, we derive y = log_2 x in the first stage. Next, we obtain the general form of the logarithm with a different base. We assume x and y to be real numbers s.t. x ∈ (1, 2) and y ∈ [0, 1). Then, y can be written as y = ∑_{i=0}^{r/2−2} b_i 2^{i−(r/2−1)}, where the binary representation of y ∈ [0, 1) is [0, 0, · · · , 0, b_{r/2−2}, · · · , b_1, b_0]. Since y = log_2 x can be rewritten as x = 2^y, we obtain the nested form of the equation as x = 2^y = ∏_{i=0}^{r/2−2} (2^{2^{i−(r/2−1)}})^{b_i}.
By squaring x, the formula becomes x^2 = 2^{2y}. Since each b_i is either 0 or 1, if x^2 ≥ 2, the current bit b_i is equal to 1; otherwise it is 0. In the case b_i = 1, we divide x^2 by 2. Following this procedure recursively provides the bits b_i, that is, the fractional part of y.
So far, the above procedure obtains the decimals of y that are less than 1. If y ≥ 1, we perform the above procedure for the fractional part of y, whereas for the integer part of y we only count the index of the position of the most significant bit of x. In conclusion, the summation of the two outcomes gives the result of y = log_2 x.
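The plaintext procedure can be sketched as follows (a simulation under our naming, not the paper's HE implementation):

```python
import math

def log2_bits(x, frac_bits=7):
    """Bit-by-bit base-2 logarithm: normalize x into [1, 2) by shifting
    (counting shifts gives the integer part), then square repeatedly;
    each squaring reveals one fractional bit of log2(x)."""
    assert x > 0
    int_part = 0
    while x >= 2:          # normalization: x <- x/2, integer part += 1
        x /= 2
        int_part += 1
    while x < 1:
        x *= 2
        int_part -= 1
    frac = 0.0
    for i in range(1, frac_bits + 1):
        x = x * x                      # squaring doubles the exponent
        if x >= 2:                     # next fractional bit is 1
            frac += 2.0 ** -i
            x /= 2
    return int_part + frac

assert abs(log2_bits(6.75, frac_bits=20) - math.log2(6.75)) < 1e-5
```

Only comparisons, shifts, and squarings are needed, which is what makes the scheme attractive in the encrypted domain.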
This plaintext mechanism suits our scheme in the encrypted domain, since the process of obtaining decimals mainly involves two fast operations: comparison and shift. However, the approach to solving the problem must differ. Given the problem ct.y = log_2(ct.x), our initial task is to obtain ct.x ∈ [Enc(1), Enc(2)). This is not an easy task compared to the plaintext situation, where we can easily shift the bits of x to a normalized x s.t. x lies between 1 and 2. Instead, we make a detour using our HE functions to process the normalization of a ciphertext. This is the first step we perform that differs from the plaintext domain. Algorithm 1 is the process of normalization of ct.x.
5: ct.a_i ← HomShift(ct.x) by r/2 − i
6: end for
7: ct.p ← Add all the elements of ct.o[i]
8: ct.p ← Subtract ct.p by Enc(r/2)
9: ct.e_i ← HomEqualCompare(ct.s_i, ct.p)
10: ct.r_i ← Bitwise bootsAND(ct.e_i, ct.a_i)
11: ct.r ← Add all the elements of ct.r_i
12: return ct.r

Next, we proceed to obtain the fractional bits of ct.x from Algorithm 1. The problem in the encrypted domain is to decide whether the square of ct.x is greater than or equal to Enc(2). We use the HomLargeCompare function to compare the ciphertexts Enc(2) and ct.x^2; it returns Enc(0) if the former is larger than the latter. With this ciphertext, we can decide whether or not to shift ct.x^2 by 1. Our method of obtaining the encrypted decimals of ct.y is listed in Algorithm 2.

Time Complexity of Designed Homomorphic Operation
The experimental environment configuration is as follows. All computations ran on a computer with 32 GB RAM, an Intel Core i7-8700 CPU at 3.2 GHz (Intel, Santa Clara, CA, USA), and Ubuntu 18.04, and we used TFHE library version 1.0.1. We measured the computational speeds of 1-bit basic gate operations 1000 times each in TFHE. All bootstrapping HE gates except the NOT and MUX gates took about 10.8 ms to evaluate. The MUX gate took 20.9 ms and the NOT gate 0.000154 ms, which is significantly lower than the other gates. Therefore, we ignored the speed of the NOT gate when calculating execution times. We denote the time complexity of all binary gates as T_B, except the MUX gate, which is denoted T_X. Table 2 shows the performance time for each operation in detail.
As more bits are assigned to the input value, execution time increases dramatically because it involves many operations, including addition, subtraction and comparison that increase linearly with the length of data and their interactions.

Model Selection
Most statistical inference approaches for analyzing data aim to make "good" predictions. However, "good" predictions cannot be obtained if proper models are not chosen in the first place. Worse, there exists no perfect model that is generally suitable for any data. Therefore, a crucial step in data analysis is to consider a set of candidate models and then select the most appropriate one. This is called model selection, one of the most important and essential steps for obtaining stably accurate results in data analysis. Model selection can be divided into two main branches, because the meaning of 'model' is interpreted differently in various fields.

• A model regarded as an algorithm: selecting the most appropriate procedure among different machine learning approaches, for example, support vector machine (SVM), KNN, logistic regression, and so forth.
• A model regarded as a complexity: selecting among different hyperparameters in a set of features for the same machine learning approach, for example, determining the order among polynomial models in linear regression.
In this study, we focus on model selection with various complexities, by adapting numerical solutions for the regression analysis.
Polynomial regression is a type of linear regression in which the relationship between the independent variable x and the dependent variable y is modeled as an Mth-degree polynomial: y(x, θ) = θ_0 + θ_1 x + θ_2 x^2 + · · · + θ_M x^M. (3) In Equation (3), θ is the set of polynomial coefficients, determined by fitting the polynomial to the training data by minimizing the errors. M is the order of the polynomial, so we need to estimate the optimal order M* to obtain the best prediction results. Generally, model selection criteria are based on an estimator of generalization performance evaluated over the data. The models indexed by M can be evaluated through estimated errors that decompose into bias and variance. If the model is too simple to describe the data, there is a high probability of high bias and low variance, called under-fitting. On the contrary, over-fitting occurs when a complex model has low bias and high variance. In machine learning, an over-fitted model may fit the training data perfectly but may not be suitable for new data.
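The point that training error alone always favors higher orders can be observed numerically; the snippet below is an illustration with synthetic data, not the paper's experiment:

```python
import numpy as np

# Fit polynomials of increasing order M to noisy sinusoidal data and watch
# the training error shrink monotonically -- low training error alone does
# not mean the model generalizes, which is why a selection criterion is needed.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

def train_mse(M):
    X = np.vander(x, M + 1, increasing=True)   # design matrix [x^0 ... x^M]
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((X @ theta - y) ** 2))

errors = [train_mse(M) for M in (0, 1, 3, 9)]
assert all(a >= b for a, b in zip(errors, errors[1:]))  # never increases with M
```

Since the models are nested, the training MSE is non-increasing in M, so it cannot by itself detect over-fitting.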
To avoid under-fitting and over-fitting, one must select an appropriate model with optimal complexity. Thus, we need a way to decide among models with different complexities. A considerable number of selection procedures have been proposed in the literature, for example, the AIC (Akaike Information Criterion) method [18], the Cp method [19], the BIC (Bayesian Information Criterion) method [20], the Cross-Validation (CV) method [21], Bayesian evidence methods [22], and so forth. In this study, we developed two model selection algorithms that work in the encrypted domain: Cross-Validation (CV) and Bayesian model selection.

Cross Validation
CV is one of the most widely used methods to evaluate predictive performances of a candidate model in model selection. CV estimates the expected error, and does not require the models to be parametric. It makes full use of data without leaking information into the training phase. Regarding data splitting, some of the data is used for fitting each model to be compared, and the rest of the data is used to measure the predictive performances of the models by the validation errors. Through these processes, the model with the best overall performance is selected.
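A plaintext sketch of the K-fold procedure that is later evaluated homomorphically (helper name ours, assuming NumPy):

```python
import numpy as np

def kfold_cv_mse(x, y, M, k=5):
    """K-fold cross-validation error for a degree-M polynomial: fit on k-1
    folds by least squares, measure MSE on the held-out fold, and average."""
    idx = np.arange(len(x))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        Xtr = np.vander(x[train], M + 1, increasing=True)
        theta, *_ = np.linalg.lstsq(Xtr, y[train], rcond=None)
        Xte = np.vander(x[test], M + 1, increasing=True)
        errs.append(np.mean((Xte @ theta - y[test]) ** 2))
    return float(np.mean(errs))   # average validation MSE over the k folds
```

The model order M with the smallest average validation MSE is selected; in the encrypted version every fit and error evaluation above becomes a sequence of homomorphic matrix operations.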

Bayesian Model Selection
Bayesian model selection is also a well-known approach for choosing an appropriate model. The Bayesian paradigm offers a principled approach that addresses model choice by considering the posterior probability given a model. The Bayesian view of model comparison involves the consistent application of the sum and product rules of probability, together with the use of probabilities to represent uncertainty in model comparison [23]. More precisely, suppose that the models to compare can be enumerated and indexed by the set {M_i : i = 1, · · · , N}, each representing a probability distribution for the observed data D generated by model M_i. The posterior distribution for a set of model parameters is p(θ|D, M_i) = p(D|θ, M_i) p(θ|M_i) / p(D|M_i), where p(D|θ, M_i) is the likelihood and p(θ|M_i) represents the prior distribution of the parameters of M_i. The model evidence for model M_i, obtained from the product and sum rules as p(D|M_i) = ∫ p(D|θ, M_i) p(θ|M_i) dθ, represents the preference shown by the data for different models; it is also called the marginal likelihood because it can be viewed as a likelihood function over the space of models in which the parameters have been marginalized out.

Problems
Our approach differs from the mainstream of homomorphic encryption schemes. We address several critical underlying problems in the current research.

Model Selection in Homomorphic Encryption
First, the majority of the current literature that links HE to machine learning is confined to solving a "modeled" problem. Based on their HE operations, such works look for a suitable data distribution that can show high performance of their functions. Kim et al. [1] and Aono et al. [3] achieved high performance of HE logistic regression on encrypted data in terms of accuracy and speed, using the iDASH dataset and the Pima diabetes dataset. However, they achieved this high performance mainly because they chose data distributions from which their HE logistic regression could significantly benefit.
However, if one were to perform logistic regression on an unknown (or encrypted) data distribution, it is unlikely that the same results would be achieved. Therefore, we propose model selection as a first step to decide the appropriate model for fitting an unknown data distribution. We claim that our approach is more fundamental because it provides an adequate model for the encrypted data distribution from the very first round.
Furthermore, implementing the inverse of a matrix (and division) in the model selection process is a particularly challenging task in HE. Reference [24] discusses inverse matrix operations through the Schulz algorithm under the HE domain; however, their scheme does not have a division operation. Therefore, they must inevitably adopt an approximation method to perform the inverse matrix operation during the model selection process, and the error increases linearly through the iterations. In contrast, our approach, Gauss-Jordan elimination based on the bitwise operations mentioned in Section 2, can accurately derive the inverse matrix. The most useful property of the method is that the reduced row echelon form is unique, which means row reduction on a matrix will produce the same answer regardless of how the row operations are performed; this is essential for obtaining a definite answer in the encrypted state. In the following, we propose homomorphic matrix operations and discuss a method of obtaining an inverse matrix.
Through our previous works [5,16], we designed nonlinear functions such as the logarithm and exponential functions that can be applied more fundamentally, and we used them in the Bayesian model selection process. However, the mainstream approach does not support nonlinear functions such as the exponential and logarithm functions [4,25]. Instead, it approximates these types of functions by a Taylor series or a least-squares approximation in the form of a summation of polynomials. This approach is limited to solving the logistic regression problem and cannot be used to design other types of algorithms.

Problems of Cross-Validation, an Existing Model Selection Method
In the previous study [17], we used CV as an evaluation method for model selection. When performing CV using two sets of factors, the training and the test sets, the number of folds (k) affects the performance of our approach in terms of speed. It is known that the larger the value of k, the more accurate the results. However, in our research, the cost of HE operations' execution time is significantly high, forcing a balance between accuracy and speed. Thus, we need other methods that offset the shortcoming of computation time for our HE functions.
To overcome these shortcomings, we propose model selection from a Bayesian perspective. This method avoids the over-fitting problem by marginalizing over the parameters instead of making point estimates of the model's parameter values. In this case, the models can be directly compared based on the training set, so a validation set is not required; this avoids the multiple trainings of each model needed by the CV method, reducing computational time in the long run.

Homomorphic Matrix Operations
We begin with matrix multiplication and the inverse of a square matrix before implementing the model selection algorithm.

Homomorphic Matrix Multiplication
As complex algorithms requiring many matrix operations have grown in number, performing matrix operations under HE has become equally important. Let A = [a_ik] and B = [b_kj] be two square matrices of size n × n. Using the component-wise definition of matrix multiplication, C = [c_ij] can be expressed as c_ij = ∑_{k=1}^{n} a_ik b_kj. The matrix multiplication process requires multiplication and addition of the entries. First, the homomorphic multiplication method is introduced in Algorithm 3. When the entries of the encrypted matrices ct.A and ct.B enter as input values, the multiplication runs mainly on additions. Specifically, assuming the multiplication is between r-bit values, the multiplication operation is converted into r additions using only AND gates and right-shift operations. Therefore, when we execute multiplication of r-bit entries, the result is obtained by j right shifts and i ∈ {0, 1, ..., r − 2} summations. To obtain results regardless of the sign of the given data, we calculate the positive product using the absolute value operation and then perform the 2's complement operation on the result according to its sign. Finally, the part-by-part calculations are integrated, and the calculated value is produced as the result. The outcome is double the length of the input; therefore, we limit the result to r bits to obtain an adjusted result in which the lengths of the fractional and integer parts match our setting.
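The shift-and-add decomposition can be sketched in plaintext as follows (for non-negative operands; signs are handled separately via 2's complement, as described above):

```python
def shift_add_mul(a, b, r=8):
    """Schoolbook shift-and-add multiplication of r-bit non-negative integers,
    the plaintext analogue of building homomorphic multiplication from AND
    gates, shifts, and additions. The result is double length (2r bits)."""
    acc = 0
    for i in range(r):
        bit = (b >> i) & 1             # AND of a's bits with one bit of b
        acc += (a * bit) << i          # partial product, shifted into place
    return acc & ((1 << (2 * r)) - 1)  # keep the double-length result

assert shift_add_mul(13, 11) == 143
```

Each of the r partial products corresponds to one round of bitwise AND gates followed by a shifted homomorphic addition in the encrypted version.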
Algorithm 4 shows the addition operation using encrypted input values, which is the last part of the dot product in the matrix multiplication operation. As in the plaintext method, it is based on a full-adder scheme. Addition can be designed simply from basic gate operations with bootstrapping, so only 2 XOR, 2 AND, and 1 OR gates are used in this process.
Thus, Algorithm 5 outputs the multiplication of matrices ct.A and ct.B through homomorphic multiplication and homomorphic addition.

1: ct.C ← Enc(0)
2: for i = 1 to n do
3:   for j = 1 to n do
4:     for k = 1 to n do
5:       ct.c_ij ← HomMulti(ct.A_ik, ct.B_kj)
6:       ct.C ← HomAdd(ct.C, ct.c_ij)
7:     end for
8:   end for
9: end for
10: Return ct.C

The performance evaluation of this approach is as follows. The order of the square matrix and the length of the data are the factors of the matrix multiplication. The time complexity T_matmulti for homomorphic matrix multiplication can be represented as follows: T_matmulti = (8r^2 + r + 1) n^3 T_M + (r^2 + 2r) n^3 T_X.
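The structure of Algorithm 5 corresponds to a plain triple loop; replacing `*` and `+` with HomMulti and HomAdd gives the homomorphic version (plaintext sketch, function name ours):

```python
def mat_mul(A, B):
    """Triple-loop matrix product matching Algorithm 5's structure.
    In the homomorphic version, * becomes HomMulti and += becomes HomAdd,
    each costing the per-entry gate counts given in the complexity formula."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

assert mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]
```

The n^3 factor in T_matmulti comes directly from this triple loop: one homomorphic multiplication and one homomorphic addition per (i, j, k) triple.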

Accurate Homomorphic Matrix Inversion
If a matrix W is invertible, then every equation Wx = p has a unique solution, and the Gauss-Jordan algorithm computes W^{−1} given W. There are many ways to find the inverse matrix, for example, Cramer's rule, Gaussian elimination, and Gauss-Jordan elimination. Cramer's rule is computationally intensive compared to the other methods, and roundoff error may become significant in large problems with non-integer coefficients. Gauss-Jordan elimination is a modification of Gaussian elimination that reduces the computations involved in back substitution by performing additional row operations to transform the matrix from echelon form to reduced row echelon form (rref). In general, the Gauss-Jordan method may require a larger number of arithmetic operations as the size of the matrix increases; however, the rref of a matrix has the advantage of guaranteed uniqueness. The elimination process consists of three possible steps called elementary row operations: swap two rows, scale a row, and subtract a multiple of one row from another. Before we construct our algorithm, we first examine how it works. If I is an identity matrix, the equation can be expressed as follows: I W x = I p.
The equation still holds if we multiply both sides by a non-singular matrix V_0: (V_0 I) W x = (V_0 I) p. The trick of the Gauss-Jordan elimination consists of finding a series of matrices V_0, V_1, ..., V_{n−1} so that V_{n−1} · · · V_1 V_0 I W = I. The expression must also hold for p, so x is the solution of Wx = p and, by definition, V_{n−1} · · · V_1 V_0 I ≡ W^{−1}. Thus, given W, the Gauss-Jordan algorithm works exactly this way: it computes W^{−1}.
• Phase 1: Convert ct.W into row echelon form (ref) by eliminating all ct.W_jk below the diagonal. By this phase of elimination, all elements below the diagonal are eliminated, and ct.W becomes an upper triangular matrix.
• Phase 2: Convert the obtained ref into a diagonal matrix, called rref, by eliminating all ct.W_jk above the diagonal. By this phase of elimination, all elements above the diagonal are eliminated, and ct.W becomes a diagonal matrix, which can be turned into ct.I.
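A plaintext sketch of the two-phase Gauss-Jordan inversion (written without row swaps, since data-dependent pivoting is not possible on ciphertexts; we assume nonzero pivots):

```python
def gauss_jordan_inverse(W):
    """Gauss-Jordan inversion: augment W with I and reduce to rref; the right
    half of the augmented matrix is then W^{-1}. Assumes nonzero pivots
    (no row swaps, mirroring the data-independent homomorphic setting)."""
    n = len(W)
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(W)]
    for col in range(n):
        p = A[col][col]
        A[col] = [v / p for v in A[col]]          # scale the pivot row
        for row in range(n):                       # eliminate above and below
            if row != col and A[row][col] != 0:
                f = A[row][col]
                A[row] = [u - f * v for u, v in zip(A[row], A[col])]
    return [row[n:] for row in A]

inv = gauss_jordan_inverse([[4.0, 7.0], [2.0, 6.0]])
assert abs(inv[0][0] - 0.6) < 1e-9 and abs(inv[0][1] + 0.7) < 1e-9
```

Because the rref is unique, the encrypted version reaches the same answer regardless of the order in which eliminations are scheduled.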
The performance evaluation of this approach is as follows. The order of the square matrix and the length of the data are the factors of the matrix inversion. The time complexity T_inverse for the homomorphic matrix inverse can be represented as follows:

Homomorphic Cross Validation Approach
The method proposed uses Equation (3) to construct the model selection algorithm, i.e., polynomial regression. Given a set of N points (x_t, y_t), t = 0, ..., N − 1, the goal is to fit the data with a polynomial of degree M, and the least-squares method determines the coefficients θ_j that minimize the squared error function R(θ) = ∑_{t=0}^{N−1} (y(x_t, θ) − y_t)^2. The error attains its global minimum when the gradient of R is zero, that is, the partial derivatives of R must be zero for 0 ≤ j ≤ M. This simplifies to a linear system of (M + 1) equations in (M + 1) unknown coefficients, of the form X^T X θ = X^T y, (15) where X = [a_ti] is an N × (M + 1) matrix whose general entry is a_ti = x_t^i, θ = [θ_j] is an (M + 1) × 1 column vector, and y = [y_t] is an N × 1 column vector. The assumption that the x_t are increasing guarantees that X^T X is invertible, so the least-squares estimator of θ is given by θ = (X^T X)^{−1} X^T y, (17) and Algorithm 7 describes the details of the equation. The K-fold CV method is used to evaluate the model selection performance. The values of the coefficients θ are determined by fitting the polynomial to the data, which is obtained by minimizing the error function called the mean squared error (MSE), widely used for the sum of the squares of the errors between the prediction y(x_t, θ) for each data point x_t and the corresponding target value z_t. For K subsets, we compute the SSE of each subset, R(θ)_k, on the training set and record the total error on the validation set, then compute the average error over all folds. Finally, we choose the value of the parameter that minimizes the average error.
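In plaintext, the normal-equations solution of Equation (17) reduces to a few matrix products (an illustration with toy data, assuming NumPy):

```python
import numpy as np

# Least-squares coefficients via the normal equations:
# theta = (X^T X)^{-1} X^T y, with design matrix rows [1, x_t, ..., x_t^M].
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x                           # exactly linear toy data
M = 1
X = np.vander(x, M + 1, increasing=True)
theta = np.linalg.inv(X.T @ X) @ X.T @ y    # recovers intercept 1, slope 2
assert np.allclose(theta, [1.0, 2.0])
```

In the encrypted setting, the explicit inverse here is exactly where the homomorphic Gauss-Jordan elimination is required, since ciphertexts admit no division shortcut.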
Based on TFHE, we implement the K-fold homomorphic CV for model selection. We first initialize the elements of the training and test error matrices to zero and update them using the model selection algorithm as a polynomial regression function. During this process, the values of the elements are computed through homomorphic operations, and the matrix inverse of (ct.X^T ct.X) is multiplied by the vector ct.X^T ct.y. After the K-fold CV, the final return value is the average MSE at each polynomial degree.
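The K-fold loop itself can be sketched in plaintext as follows (an illustrative NumPy version with names of our choosing; the homomorphic implementation performs the same steps on the encrypted matrices ct.X and ct.y):

```python
import numpy as np

def kfold_mse(x, y, M, K=5):
    """Average held-out MSE of a degree-M polynomial over K folds.

    Each fold fits theta on the training part via the normal equations
    and records the MSE on the held-out part; the model order with the
    smallest average error is selected.
    """
    folds = np.array_split(np.arange(len(x)), K)
    errors = []
    for k in range(K):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        X_tr = np.vander(x[train], M + 1, increasing=True)
        theta = np.linalg.solve(X_tr.T @ X_tr, X_tr.T @ y[train])
        X_val = np.vander(x[val], M + 1, increasing=True)
        errors.append(np.mean((X_val @ theta - y[val]) ** 2))
    return float(np.mean(errors))
```

The model order M that minimizes `kfold_mse` would be chosen, mirroring the encrypted selection loop.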

Homomorphic Bayesian Model Selection Approach
Another way to overcome the over-fitting associated with maximum likelihood is to marginalize (sum or integrate) over the model parameters instead of making point estimates of their values. The model can then be compared directly on the training data, without the need for a validation set, avoiding the multiple training runs per model required by CV. To do so, we focus on the fully Bayesian predictive distribution for our regression models.
The evidence function directly links the hyperparameters α, β and the observations D by marginalizing over the parameters θ in the Bayesian framework; that is, we adopt prior distributions over the hyperparameters and make predictions by marginalizing with respect to α, β, and θ, where z is the target value, p(z|θ, β) is defined by N(z|y, β^−1), and p(θ|z, α, β) is defined by N(θ|µ_N, V_N) with µ_N = βV_N H^T z and V_N^−1 = αI + βH^T H. According to Bayes' theorem in the previous section, the posterior distribution for the hyperparameters is proportional to the evidence times the prior. The evidence function p(z|α, β) is obtained by integrating over the parameters,

p(z|α, β) = ∫ p(z|θ, β) p(θ|α) dθ,

where p(z|θ, β) is obtained by taking the logarithm of the likelihood function and making use of the standard form of the univariate Gaussian, and p(θ|α) is defined by N(θ|0, α^−1 I). M is the number of parameters in the model and N is the number of data points. R(θ) is defined as

R(θ) = (β/2)‖z − Hθ‖² + (α/2)θ^T θ,

and R(µ_N) is correspondingly

R(µ_N) = (β/2)‖z − Hµ_N‖² + (α/2)µ_N^T µ_N,

where H is the design matrix. Completing the square over the parameters shows that the right-hand side is equal, up to a constant of proportionality, to the regularized SSE function. The Hessian matrix L is the same as V_N^−1; it is the square matrix of second-order partial derivatives of the scalar-valued function R(θ), L = ∇∇R(θ) = αI + βH^T H, and µ_N is also given by µ_N = βL^−1 H^T z. Based on the equations above, the integral over the parameters can be evaluated by appealing to the standard result for the normalization coefficient of a multivariate Gaussian. The log of the marginal likelihood, created using Equation (22), is then

ln p(z|α, β) = (M/2) ln α + (N/2) ln β − R(µ_N) − (1/2) ln|L| − (N/2) ln 2π,

which is the required expression for the evidence function.
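Under the definitions L = αI + βH^T H, µ_N = βL^−1 H^T z, and the regularized SSE R(µ_N), the evidence function can be sketched in plaintext as follows (an illustrative NumPy version; the function name is ours):

```python
import numpy as np

def log_evidence(H, z, alpha, beta):
    """ln p(z | alpha, beta) for the linear model with design matrix H.

    Uses L = alpha*I + beta*H^T H, mu_N = beta * L^{-1} H^T z, and the
    regularized SSE R(mu_N), then the Gaussian normalization constant.
    """
    N, M = H.shape
    L = alpha * np.eye(M) + beta * H.T @ H
    mu_N = beta * np.linalg.solve(L, H.T @ z)
    R = 0.5 * beta * np.sum((z - H @ mu_N) ** 2) + 0.5 * alpha * mu_N @ mu_N
    _, logdet = np.linalg.slogdet(L)
    return float(0.5 * M * np.log(alpha) + 0.5 * N * np.log(beta)
                 - R - 0.5 * logdet - 0.5 * N * np.log(2.0 * np.pi))
```

Evaluating `log_evidence` for each candidate design matrix H (one per polynomial order) and keeping the maximizer is the plaintext analogue of the encrypted evidence approximation.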
The optimal model complexity is determined by the maximum of the evidence given by the equation above. Based on TFHE, the evidence approximation for the model selection is presented in Algorithm 8.

Implementation
In this section, we present the results of the model selection for encrypted data. Two types of datasets were used to implement the model selection algorithm. The input values are encrypted with a length of 16 bits and used for the calculations. In the first experiment, we confirmed that the HE Gauss-Jordan elimination algorithm works correctly with various HE functions. To expand the scope, we then tested the algorithm on a real-world dataset from Kaggle to confirm that it also works well with many features.

Toy Dataset
We use an artificially created dataset comprising 20 observations of x, together with the corresponding observations of y, to check the performance of our algorithm. Figure 2 shows a plot of a training set comprising N = 10 data points and a test set comprising N = 10 data points, respectively.
Then we evaluated the two different approaches to achieving good generalization, i.e., making accurate predictions for new data. In this process, we use the root-mean-square error (RMSE) rather than the MSE because the square root ensures that the RMSE is measured on the same scale as the variable y, making it easy to interpret. Figure 3 shows the results of implementing the two approaches of our HE model selection. The toy dataset is too small to perform the CV procedure; therefore, we focused on the training and test error at each order M. Values of M in the range 3 ≤ M ≤ 7 give small values for the test set error in Figure 3a. For the 9th-order polynomial, the regression model fits all the data points exactly; therefore, the RMSE on the training dataset converges to 0. In this case, M = 6 provides the smallest test error, making it the best predictor of new data.
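For reference, the RMSE used here is simply the square root of the MSE; a minimal sketch (illustrative NumPy code):

```python
import numpy as np

def rmse(pred, target):
    """Root-mean-square error: the square root puts the error back on
    the same scale as the target variable y."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Errors (0, 0, 2) give MSE = 4/3 and RMSE = sqrt(4/3)
r = rmse(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0]))
```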
In the evidence approach, we fixed α = 5 × 10^−3 and β = 1.5 and performed our algorithm; M = 3 gives the best evidence of all polynomials. Additional increases in the value of M make only small improvements in the fit to the data but increase the complexity penalty, which decreases the overall evidence value. Table 3 shows the evidence values for each order in the plaintext and in the ciphertext under the same experimental method. The small difference between the evidence values obtained from the two results indicates that there are few errors in the experiment in the encrypted state. The error occurs in the process of converting a real-number plaintext into a ciphertext and can be reduced by increasing the number of input r-bits.

Body Mass Index (BMI) Dataset
BMI, which can be obtained from various sensors including ultrasonic ones, is a measurement of a person's weight relative to his/her height. BMI is calculated by dividing the weight of the body in kilograms by the square of the height in meters, and the result indicates whether that person is obese, overweight, regular, or underweight. BMI classification does not depend on age; therefore, we determined the tendency between the two through regression. For experimental evaluation, we use the "Medical Cost Personal Datasets" from Kaggle [26]. The dataset consists of age, sex, BMI, number of children, region, and so on; however, in this experiment, we used only age and BMI to find the correlation between them. The total number of records for each variable is 1339, and we sampled 1000 of them to proceed with model selection.
Before starting the homomorphic CV approach, we standardized our dataset because variables measured at different scales do not contribute equally to the analysis. Our dataset contains features that vary highly in magnitude, units, and range. Standardization (or Z-score normalization) is the process of rescaling the features to have the properties of a Gaussian distribution with µ = 0 and σ = 1, where µ is the mean and σ is the standard deviation from the mean. Figure 4a shows the 9th-order polynomial regression for the standardized data, and Figure 4b shows the standardized data distribution of each dataset.
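The standardization step can be sketched as follows (illustrative NumPy code; the sample ages are made-up values, not taken from the Kaggle dataset):

```python
import numpy as np

def standardize(v):
    """Z-score normalization: rescale a feature to mean 0, standard deviation 1."""
    return (v - v.mean()) / v.std()

ages = np.array([23.0, 45.0, 31.0, 52.0, 60.0])   # made-up sample values
z = standardize(ages)
# z has mean 0 and standard deviation 1 (up to floating-point error)
```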

For the training and test sets of randomly selected age and BMI values, we assumed that each set represents the entire dataset. We performed the 5-fold CV technique to estimate the RMSE. A problem associated with implementing nth-order polynomial regression on a computer is that the normal equations tend to be ill-conditioned, especially for higher orders. Thus, as with the toy dataset, we varied the complexity by using our algorithm over model orders ranging from 0 (least complex) to 9 (most complex). Figure 5a shows the training, test, validation, and 5-fold CV results. The validation dataset is a part of the training set that is sacrificed to evaluate the performance of the model when the CV method is not used. In order to use all the data through the 5-fold method, we put the training, test, and validation sets together, divide them into five equal partitions, and sequentially use four of them as a training set and the remaining one as a test set. The performance measure reported by 5-fold CV is then the average of the values computed in the loop. We repeated the procedure 50 times to obtain the total RMSE values of each dataset. The boxplot in Figure 5b reveals the distribution tendency of the dataset. In the CV approach, the best estimator corresponds to a polynomial model of order 4.

To maximize the model evidence over the model order M, we set the hyperparameters α = 1 × 10^−6 and β = 18.3. As with the CV approach, we computed the evidence approximation up to the 9th order; the polynomial model of order 2 corresponded to the best estimator, as shown in Figure 6. We measured the time performance of each HE model selection approach, and the execution times were 1764 and 480 min, respectively.

Limitation
Although our proposed approach improves computational resource usage, it still has a few limitations. In the encrypted domain, each operation is very slow and the memory footprint is large. However, this problem can be mitigated by accelerating the computations with state-of-the-art technologies. Using an FPGA (Field-Programmable Gate Array), an ASIC (Application-Specific Integrated Circuit), or a GPU (Graphics Processing Unit) [27][28][29], the computations can be offloaded to an HE co-processor.
This improves the performance of certain HE tasks by performing them directly in hardware. CPU vector computation extensions using SIMD (Single Instruction, Multiple Data) [30] also improve efficiency. Therefore, we can combine these approaches into a hybrid system to create a fully optimized solution.

Conclusions
To date, privacy-preserving machine learning is one of the key research areas in computer security and machine learning. Homomorphic encryption is one of the most theoretically sound approaches for privacy-preserving machine learning and data mining. However, most algorithms built on homomorphic encryption do not consider the model's complexity and stability, and it is not straightforward to evaluate the stability of a model under homomorphic encryption since all data are encrypted. Homomorphic model selection methods are therefore an essential tool for data analysis in the encrypted domain, especially for big data obtained from biomedical sensors: if the model is not properly chosen, the results cannot be trusted, which can be a disaster in the biomedical domain, so homomorphic model selection algorithms must provide both privacy and stability. We introduced two different methods for constructing the homomorphic model selection: homomorphic cross-validation (HCV) and homomorphic Bayesian model selection (HBMS). Both model selection approaches require homomorphic matrix inversion, so we designed and developed a homomorphic matrix inversion using Gauss-Jordan elimination in the homomorphic encryption scheme, specifically using a homomorphic mux operation. Finally, based on the results of the two approaches, we confirmed that the HBMS approach is more efficient in encrypted domains.