Statistical Approaches Based on Deep Learning Regression for Verification of Normality of Blood Pressure Estimates

Oscillometric blood pressure (BP) monitors currently estimate a single point but do not identify variations in response to physiological characteristics. In this paper, to analyze BP’s normality based on oscillometric measurements, we use statistical approaches including kurtosis, skewness, Kolmogorov-Smirnov, and correlation tests. Then, to mitigate uncertainties, we use a deep learning method to determine the confidence limits (CLs) of BP measurements based on their normality. The proposed deep learning regression model decreases the standard deviation of error (SDE) of the mean error and the mean absolute error and reduces the uncertainties of the CLs and SDEs of the proposed technique. We validate the normality of the distribution of the BP estimation which fits the standard normal distribution very well. We use a rank test in the deep learning technique to demonstrate the independence of the artificial systolic BP and diastolic BP estimations. We perform statistical tests to verify the normality of the BP measurements for individual subjects. The proposed methodology provides accurate BP estimations and reduces the uncertainties associated with the CLs and SDEs using the deep learning algorithm.


Introduction
Blood pressure (BP) is a key consideration when making decisions about the cardiovascular activity of patients. The oscillometric method has recently become popular and is used in automatic BP measurement devices that are now readily available in the marketplace. Although manufacturers of oscillometric BP devices rarely disclose their algorithms, the literature indicates that the maximum amplitude technique is most commonly used to estimate arterial blood pressure (ABP) [1]. The mean arterial BP is readily estimated as the cuff pressure at which the oscillation amplitude of the oscillometric waveform becomes maximal, which occurs when the cuff pressure matches the mean arterial BP. At that point the compliance of the arterial wall is maximized, so the arterial volume change with respect to the change in arterial pressure, and hence the amplitude of the oscillation signal in the cuff, is also at its maximum. Although ABP estimation relies on the oscillometric technique, the technique is subject to a variety of errors because BP signals vary continuously. Specifically, the systolic BP (SBP) and diastolic BP (DBP) change significantly and continuously over time [2] in response to physiological oscillations caused by factors such as food intake, emotional status, level of exercise, and disease states, and they can fluctuate by up to 20 mmHg within several heartbeats [3]. However, the SBP and DBP are generally estimated at two random instants in time with respect to the systolic and diastolic ratios (SBPR and DBPR) in the oscillometric waveform envelope [2], which offers no information on these significant BP fluctuations. There is no established criterion for determining accurate SBP and DBP or SBPR and DBPR [4]. A journal editorial has likewise drawn attention to biological fluctuations [5]. Accurate BP estimation therefore depends on two elements: the first is the accuracy of the measurement system and the second is the physiological variability.
The first source of uncertainty is treated as BP measurement error by the ANSI/AAMI SP 10 standard [6], whereas physiological variations in BP are overlooked by the great majority of physicians. Consequently, oscillometric BP devices generally report a single value with no confidence limit (CL) that would distinguish it from fluctuations in physiological characteristics [5]. If CL estimates were provided by a BP monitor, a wide CL would indicate an unstable BP value and, conversely, a narrow CL would indicate a stable BP value. However, since repeatable conditions for reproducing measurements cannot be assured, it is practically and economically impossible to acquire many measurements from one subject using an oscillometric BP monitor [1]. In such a real situation, the CL should be estimated from a small number of BP values measured over a short time. Using the bootstrap technique, a CL estimate can be obtained from BP values with a small sample size [1]. Unfortunately, this approach assumes that the oscillometric BP measurements for individual subjects are independent and identically distributed (iid), and it supposes that the populations from which BP is measured are normally distributed.
The assumption of normality is especially critical when determining CLs for BP measurement for individual subjects. The normality assumption is a key factor in the reliability of estimations and statistical tests [7]. Thus, this assumption should be taken seriously because it is impossible to estimate BP accurately when this assumption does not hold. For this reason, the assumption of normality should be verified with respect to BP measurements for individual subjects.
In this investigation, our research objective is to use statistical approaches including kurtosis, skewness, Kolmogorov-Smirnov (KS), and rank tests to examine the normality of BP values based on oscillometric measurements. We also estimate the SBP, DBP, and CLs of these values based on the normality for individual subjects using deep learning [8]. Deep learning is a family of machine learning techniques that model high-level abstractions in data by using multi-layered architectures with complex nonlinear transformations [9]. However, our BP measurements for individual subjects form only a small sample because few measurements are available per subject, which is a disadvantage for deep learning, which works best with big data [10]. To address this problem, we first generate artificial features from the original data using a parametric bootstrap technique for the SBP and DBP estimates. This approach can efficiently represent a very complex relationship between the feature data acquired from oscillometric signals and the target BPs. Then, we use the bootstrap technique again to determine the CLs from the target BPs estimated by the deep learning technique. Next, we statistically analyze the normality of the BP measurements. We believe this to be one of the first studies to use statistical analysis for individual BP estimation, with the following contributions:
• We propose a new approach for estimating BP parameters that mitigates uncertainties (physiological variability), such as the CLs and standard deviation of errors, using the deep learning technique. To do so, we use a small sample of oscillometric BP measurements and bootstrap techniques for individual subjects.
• We perform kurtosis and skewness tests to verify the normality of the BP measurements for individual subjects.
• We then use a rank test to analyze the independence between the artificial BP estimations.
Figure 1 shows the proposed methodology, and we describe the procedure below. The upper part of the figure shows the first procedure, in which the proposed algorithm is trained, and the lower part illustrates the validation procedure. We combine these two parts to estimate SBP and DBP. After bio-signal processing of the BP signals, we obtain the input data from the oscillometric waveform (OMW) signals and envelopes. We then perform the pre-processing shown in Figure 1. Next, we generate the artificial features from the original features and evaluate the normality of the distribution of all the features. We then build the deep learning model via the two training processes, performing pre-training and fine-tuning using a back-propagation algorithm [8,10]. In the second procedure, we use the unseen data set to evaluate the proposed methodology. Using the deep learning regression model, we estimate the target BP values (SBP and DBP) for the individual subjects. Subsequently, to determine the CLs of the BPs, we generate artificial BPs from the estimated target BP values using the parametric bootstrap scheme. Finally, we verify the normality of the artificial BPs for individual subjects. In Section 2, we explain the experimental data and BP measurement procedures. In Section 3, we briefly describe the oscillometric BP estimation using the deep learning regression model. In Section 4, we describe the statistical analysis of the BP estimation and verify the normality of BP values based on oscillometric measurements. In Section 5, we present the results and conclusions.

Data Set
This research was approved by a research ethics committee, and all test volunteers gave their agreement before the BP measurements. The experimental BP data were originally recorded from 85 healthy people free from cardiovascular disease, aged from 12 to 80 (48 men and 37 women), as shown in Table 1. We obtained five sets of oscillometric BP measurements from each volunteer using a wrist-mounted blood pressure monitor following the ANSI/AAMI standard protocol [6,11]. We averaged the readings made by two independent specialists (nurses) to yield the SBP and DBP target values [1]. This procedure was repeated four more times to record five measurements, with a one-minute break between measurements. Each volunteer sat comfortably in a chair during the BP measurements, with the BP cuff wound around the left wrist, which was raised to heart level. The reference device, an auscultatory BP cuff, was wound around the upper left arm at heart level. The upper-arm cuff was inflated so as to occlude the brachial artery. When the upper-arm cuff was deflated, the blood flow produced Korotkoff (K) signals that could be heard with a stethoscope placed beside the cuff. The cuff pressure at the first K signal (K1), measured in mmHg by the manometer of the upper-arm cuff, was used to determine the SBP, while the pressure at the fifth signal (K5) was used to determine the DBP [11]. The upper-arm and wrist BP measurements could not be taken at the same time because the upper-arm sphygmomanometer occludes the brachial artery. Thus, nearly 1.5 min after each signal was acquired by the wrist device, two medical staff members concurrently measured the SBP and DBP using a sphygmomanometer. In our procedure, we sequentially separated the BP data of the volunteers into a learning set (300 measurements obtained from 60 volunteers, five per volunteer) and a validation set (125 measurements obtained from 25 volunteers, five per volunteer).
Then, we repeated this procedure to ensure that each volunteer was included only once in the validation procedure. Five BP measurements per volunteer constitute only a small number of input data for the learning procedure. Thus, we also used artificial data generated from the real data. We used the unseen data to evaluate our methodology. As mentioned above, we obtained our data from the oscillometric waveforms, which we used to create artificial data based on the parametric bootstrap technique [12].

Features Obtained from Oscillometric Signals
We extract informative features from the oscillometric waveform signals [4] and build envelopes after processing the signals of the oscillometric BP measurements for estimating the reference BP values. We then utilize these estimates as the reference BPs in the proposed deep learning algorithm. More details regarding these features can be found in [4] for the interested readers.

Artificial Data Obtained Using Bootstrap
Since we have only five BP measurements for each subject, we create artificial input data using the bootstrap technique as in [1,12]. The bootstrap improves the accuracy of estimates obtained from a limited data set in situations where traditional approaches are ineffective [1,12]. Here, we assume $X = [x_1, \ldots, x_N]$ to be random data from a distribution $T$ with unknown parameters $[\theta, \sigma]$. We then approximate $T$ by $T(\hat{\theta}, \hat{\sigma} \mid X)$, where $\hat{\theta}$ and $\hat{\sigma}$ are the average and standard deviation estimated from $X$. Approximating $T$ by a Gaussian distribution in this way is known as the parametric bootstrap technique [12]. We use this parametric approach to create the artificial features, which we derive from the real input data $X$.
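The parametric bootstrap described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `parametric_bootstrap`, the example feature values, and the choice of the sample standard deviation (n − 1 denominator) are assumptions made here for demonstration.

```python
import random
import statistics

def parametric_bootstrap(x, n_rep, seed=0):
    """Draw n_rep artificial values from a Gaussian fitted to x."""
    rng = random.Random(seed)
    theta_hat = statistics.mean(x)    # estimated average of the real data
    sigma_hat = statistics.stdev(x)   # sample standard deviation (assumed n - 1 form)
    return [rng.gauss(theta_hat, sigma_hat) for _ in range(n_rep)]

# Hypothetical values of one feature from five measurements of one subject.
x = [0.62, 0.58, 0.65, 0.60, 0.63]
artificial = parametric_bootstrap(x, n_rep=1000)
print(len(artificial), statistics.mean(artificial), statistics.stdev(artificial))
```

With enough replications, the mean and spread of the artificial values stay close to those of the five real values, which is the property the following normality checks rely on.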

KS Analysis for Data
To use the artificial feature distribution with confidence, we evaluate the normality of each feature. First, we conduct the KS test to verify the normality of each artificial data distribution. We suppose that $T^*$ is the distribution of the artificial input data $[\theta^*_1, \theta^*_2, \ldots, \theta^*_N]$, where $N$ is the number of replications. The test gives the probability that the distribution of the artificial input data equals the hypothesized distribution [11]. Figure 2 plots the cumulative distribution function (CDF) of the artificial features, which fits that of the normal distribution. As the number of replications $N$ increases, the distribution adheres more closely to a normal distribution [1]. When the KS test result for an artificial input feature is 0, as reported in Table 2, we cannot reject the null hypothesis at the α (=0.05) significance level. Moreover, all p values of the KS test are greater than the α (=0.05) significance level. The null hypothesis would be rejected only if the KS test value ks were greater than the critical value cv. Thus, we accept the null hypothesis that the distributions of the artificial features follow Gaussian distributions [13].
We also validate the consistency and convergence of the artificial features obtained by the bootstrap method [12]. Our artificial features should fit the real features well according to the bootstrap convergence theorem for the sample mean [15], in which $T^*$ represents the conditional distribution based on the bootstrap technique given the real feature $X$ and $\|\cdot\|_\infty$ denotes $\sup_{x \in \mathbb{R}} |\cdot|$.
Therefore, the convergence of the distribution can be confirmed [15]. Here, $\beta$ denotes the bias and $\theta$ the real feature. If the bias is close to zero, the estimate is considered to be almost unbiased. Thus, we calculate the bias and standard error of the artificial feature as follows:

$$\beta(\hat{\theta}^*(\cdot)) = \hat{\theta}^*(\cdot) - \hat{\theta},$$

where $\beta(\hat{\theta}^*(\cdot))$ is the prediction error (i.e., bias), and

$$\hat{S}^*_e = \sqrt{\frac{1}{N-1} \sum_{b=1}^{N} \left( \hat{\theta}^*_b - \hat{\theta}^*(\cdot) \right)^2},$$

where $\hat{S}^*_e$ represents the standard error using the parametric bootstrap and $\hat{\theta}^*(\cdot) = \frac{1}{N} \sum_{b=1}^{N} \hat{\theta}^*_b$. We find that the bias of the artificial features is very small in the exemplary sample and that the standard error $\hat{S}^*_e$ acquired from the bootstrap technique more closely approximates $\hat{\theta}$ than $\hat{S}_e = \hat{\sigma}$. Thus, the bootstrap technique can be used as a good procedure for increasing the number of samples for the artificial features, as shown in Table 3. Therefore, the artificial features are found to be very close to the real features. We also find that the CLs of the artificial features include all artificial and real features, as shown in Table 3. The remaining features give very similar results, as shown in Table 3.
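As a hedged illustration of the bias and standard error calculation, the sketch below uses the standard bootstrap estimators (bias as the mean of the replications minus the original estimate, standard error as the standard deviation of the replications). The function name, the example data, and the choice of the sample mean as the statistic are assumptions made here, not details taken from the paper.

```python
import random
import statistics

def bootstrap_bias_and_se(x, n_rep=1000, seed=0):
    """Parametric bootstrap estimates of bias and standard error of the mean."""
    rng = random.Random(seed)
    theta_hat = statistics.mean(x)
    sigma_hat = statistics.stdev(x)
    n = len(x)
    replications = []
    for _ in range(n_rep):
        # One artificial data set of the same size as the real one.
        resample = [rng.gauss(theta_hat, sigma_hat) for _ in range(n)]
        replications.append(statistics.mean(resample))
    theta_star_dot = statistics.mean(replications)   # average over replications
    bias = theta_star_dot - theta_hat
    standard_error = statistics.stdev(replications)  # uses the N - 1 denominator
    return bias, standard_error

# Hypothetical feature values from five measurements of one subject.
x = [0.62, 0.58, 0.65, 0.60, 0.63]
bias, standard_error = bootstrap_bias_and_se(x)
print(bias, standard_error)
```

A bias near zero relative to the standard error is what the text describes as an almost unbiased estimate.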

Deep Learning Based Regression
The deep learning estimator is based on distributed representation, which means that the observed data can be described by the interactions of various components at different levels [9]. We organize our deep learning estimator into two training procedures: pre-learning and fine-tuning with respect to the target BPs. In the pre-learning phase, the deep network is treated as a probabilistic generative model that consists of several hidden layers. The top two layers have undirected connections, whereas the remaining hidden layers form a top-down directed acyclic structure in which the units of the lowest layer represent the input vectors. Each pair of connected layers forms a restricted Boltzmann machine (RBM) [8], which is the basic component of a deep network with several hidden layers. Our deep learning model is given as

$$P(X^*, s^1, s^2, \ldots, s^l) = P(X^* \mid s^1) P(s^1 \mid s^2) \cdots P(s^{l-2} \mid s^{l-1}) P(s^{l-1}, s^l),$$

where $P(s^i \mid s^{i+1})$ are the conditional probabilities of the layers, $s^i$ represents the hidden units at layer $i$, and $X^*$ denotes the re-sampled input data. The joint probability of an RBM is defined in terms of the partition function $Q$, the bias $c$ of the input data, the bias $b$ of the hidden units, and the weight values $W$. Thus, we can define the conditional probability of one layer given the other by

$$P(s^i_j \mid s^{i+1}) = \mathrm{sigm}\left( -c^i_j - \sum_{k=1}^{n^{i+1}} W^i_{kj} s^{i+1}_k \right),$$

where $\mathrm{sigm}(x) = 1/(1 + \exp(-x))$, $c = [c^i_j]$ denotes the bias for unit $j$ of layer $i$, and $W = [W^i_{kj}]$ is the weight matrix for layer $i$. To connect the input layer with a hidden layer, we use the Gaussian-Bernoulli RBM (GBRBM) [8], because we assume the artificial input data to follow an asymptotically normal distribution. Then, we stack multiple Bernoulli-Bernoulli RBMs (BBRBMs) behind the first GBRBM [8].
Next, we train the second BBRBM using the hidden layer of the first GBRBM [8]. In the learning procedure, we use pre-learning to initialize the weights and biases, which serve as an efficient starting point [9] for fine-tuning by stochastic gradient descent. We use the minimum mean squared error as the cost function [16], where $\Omega$ is the loss function; $Y^{*d}_n(W, c)$ and $Y^d_n$ denote the $d$th hypothesis and target BP data at sample index $n$, respectively; $D$ and $B$ represent the size of the data and the size of the batch, respectively; and $(W^i, c^i)$ are the weight and bias values learned at the $i$th layer. We then iteratively update the estimated weights and biases using a learning rate and a momentum value $\eta$, where $H$ is the number of hidden layers and $H + 1$ denotes the output layer.
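To make the layer-wise conditional probability concrete, the following sketch evaluates the sigmoid activation of each unit in layer i given the states of layer i + 1. It follows the sign convention printed in the text; the function names, the weight layout `W[k][j]`, and the numeric values are illustrative assumptions.

```python
import math

def sigm(x):
    """Logistic sigmoid, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def conditional_activation(s_upper, W, c):
    """Activation probability of each unit j in layer i given layer i + 1.

    s_upper: states of the units in layer i + 1
    W:       weights, W[k][j] connects upper unit k to lower unit j
    c:       biases of the units in layer i
    """
    probabilities = []
    for j in range(len(c)):
        weighted_sum = sum(W[k][j] * s_upper[k] for k in range(len(s_upper)))
        # Sign convention taken from the expression in the text.
        probabilities.append(sigm(-c[j] - weighted_sum))
    return probabilities

# Hypothetical layer with three upper units and two lower units.
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
c = [0.0, 0.1]
probs = conditional_activation([1, 0, 1], W, c)
print(probs)
```

Each returned value lies strictly between 0 and 1 and can be used to sample a binary unit state during pre-training.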
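The fine-tuning step can be illustrated with a small sketch of a mean squared error cost and a gradient update with momentum. This is not the network used in the paper: the one-parameter linear model, the function names, and the hyperparameter values are assumptions chosen so that the example stays short and runnable.

```python
def mse_cost(predictions, targets):
    """Squared error summed over outputs and averaged over the batch."""
    batch_size = len(predictions)
    total = 0.0
    for pred_row, target_row in zip(predictions, targets):
        total += sum((p - t) ** 2 for p, t in zip(pred_row, target_row))
    return total / batch_size

def momentum_step(param, grad, velocity, learning_rate, momentum):
    """One gradient-descent update with a momentum term."""
    velocity = momentum * velocity - learning_rate * grad
    return param + velocity, velocity

def fit_slope(xs, ys, learning_rate=0.01, momentum=0.9, steps=200):
    """Fit y = w * x by minimizing the mean squared error (toy example)."""
    w, v = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad = sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / n
        w, v = momentum_step(w, grad, v, learning_rate, momentum)
    return w

# Hypothetical batch with one [SBP, DBP] prediction and its target.
print(mse_cost([[120.0, 80.0]], [[118.0, 81.0]]))
w = fit_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(w)
```

In a deep network the same update is applied to every weight and bias, with the gradients supplied by back-propagation.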

CL Estimation
Our goal in estimating the CLs is to use the bootstrap algorithm to quantify the uncertainty (physiological variability) and to provide the CLs of the five BP estimates obtained from the deep learning regression model for individual subjects. First, we describe the bootstrap principles of the nonparametric and parametric approaches. The basic idea is to generate many artificial BP estimates by resampling the real BP estimates $X = [x_1, \ldots, x_n]$, which are $n$ independent measurements from an unknown probability distribution $T$, in order to generate a CL for $\mu(X)$. In addition, we assume $X$ to be random data from the distribution $T$ with unknown parameters $[\mu, \sigma]$. As such, we draw the artificial sample $X^*$ from $T(\hat{\mu}, \hat{\sigma} \mid X)$ using the Monte Carlo approach, where $[\hat{\mu}, \hat{\sigma}]$ is generally the maximum likelihood estimate from $X = [x_1, \ldots, x_n]$. Therefore, when $N \to \infty$, the distribution can be approximated by a Gaussian distribution as given by $F(\hat{\mu}^*, \hat{\sigma}^* \mid X^*) \cong \mathcal{N}(\mu, \sigma)$.
In our approach, we also obtain the CLs using the bootstrap technique, applied to the BP target values estimated by the deep learning technique. First, the estimated BP values ($Y^*$) are obtained as in line 1 of Algorithm 1. We then acquire two matrices of resampled data, given by Equations (8) and (9) and obtained in lines 4 and 5 of the algorithm, where $S$ and $D$ indicate SBP and DBP, respectively, and $*$ denotes data resampled by the bootstrap technique. We then average each column, as in lines 7 and 8 of the algorithm, and sort the averages in ascending order, as in lines 10 and 11. The sorted values are given as $\Xi^{S*} = [\hat{\theta}^{S*}_1, \hat{\theta}^{S*}_2, \cdots, \hat{\theta}^{S*}_N]$, where $\hat{\theta}^{S*}_\alpha$ is the $100\alpha$th percentile of the $N$ bootstrap replications. From this bootstrap technique, we obtain the $1 - 2\alpha$ percentile interval as follows:

$$[\hat{\theta}^{S*}_{lower}, \hat{\theta}^{S*}_{upper}] = [\hat{\theta}^{S*}_{\alpha}, \hat{\theta}^{S*}_{1-\alpha}].$$

Algorithm 1: CL using bootstrap based on deep learning.
In Algorithm 1, B is the number of volunteers, and n and N denote the number of measurements and replications, respectively, for individual subjects. Due to the physiological characteristics of individual subjects and the cost of the experiment, it is difficult to obtain many BP measurements. Therefore, we estimate the CLs using the bootstrap method based on the deep learning algorithm.
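The percentile-interval step can be sketched as follows for a single subject. This is a simplified illustration under stated assumptions rather than a reproduction of Algorithm 1: the function name `bootstrap_cl`, the five example SBP estimates, the Gaussian resampling, and the index convention used to pick the percentiles are all choices made here.

```python
import random
import statistics

def bootstrap_cl(estimates, n_rep=1000, alpha=0.025, seed=0):
    """Percentile confidence limits of the mean from a parametric bootstrap."""
    rng = random.Random(seed)
    mu_hat = statistics.mean(estimates)
    sigma_hat = statistics.stdev(estimates)
    n = len(estimates)
    means = []
    for _ in range(n_rep):
        # Resample n artificial estimates and average them.
        resample = [rng.gauss(mu_hat, sigma_hat) for _ in range(n)]
        means.append(statistics.mean(resample))
    means.sort()  # ascending order, as in the sorting step of the text
    lower = means[round(alpha * n_rep)]
    upper = means[round((1 - alpha) * n_rep) - 1]
    return lower, upper

# Hypothetical SBP estimates (mmHg) for five measurements of one subject.
sbp_estimates = [118.0, 121.5, 119.0, 123.0, 120.5]
lower, upper = bootstrap_cl(sbp_estimates)
print(lower, upper)
```

With alpha = 0.025 the interval covers 1 − 2α = 95% of the bootstrap replications; a wider interval signals a less stable BP for that subject.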

Computing and Testing for Kurtosis and Skewness
To verify the normality of the artificial BP measurements for individual volunteers, we conduct kurtosis and skewness tests [17] to determine whether the distribution of the artificial BP measurements resembles a normal curve, that is, whether it is approximately Gaussian. We first compute the sample mean and variance of the artificial BP measurements. We then use Equations (11) and (12) to estimate the kurtosis and the standard error of the kurtosis, and Equations (13) and (14) to estimate the skewness and the standard error of the skewness. We evaluate the normality based on the z-scores for kurtosis and skewness, with $\alpha$ set to 0.05, to verify that we have an approximately Gaussian distribution. Based on this outcome, we consider further joint testing for kurtosis and skewness. Under normality, the null hypothesis is expressed as follows: $H_0$: $\mathrm{kurt}(\Xi^{S*}) = 3$ and $\mathrm{skew}(\Xi^{S*}) = 0$.
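A minimal sketch of such a test is given below. Because the paper's Equations (11) to (14) are not reproduced here, the sketch assumes moment-based estimators and the common large-sample standard errors sqrt(6/N) for skewness and sqrt(24/N) for kurtosis; the function name and the simulated data are also assumptions.

```python
import math
import random

def skew_kurt_z(values):
    """Moment-based skewness and kurtosis with large-sample z-scores."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2             # equals 3 for a Gaussian distribution
    se_skew = math.sqrt(6.0 / n)    # assumed large-sample approximation
    se_kurt = math.sqrt(24.0 / n)   # assumed large-sample approximation
    return skew, kurt, skew / se_skew, (kurt - 3.0) / se_kurt

# Hypothetical artificial SBP values from N = 1000 bootstrap replications.
rng = random.Random(0)
artificial_sbp = [rng.gauss(120.0, 2.0) for _ in range(1000)]
skew, kurt, z_skew, z_kurt = skew_kurt_z(artificial_sbp)
print(skew, kurt, z_skew, z_kurt)
```

At α = 0.05, z-scores with absolute value below about 1.96 are consistent with the null hypothesis of kurtosis 3 and skewness 0.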

Normality Test Using KS
The KS one-sample test is a technique for evaluating the agreement between the distribution of a set of observed values and a specified theoretical distribution [13]. The null hypothesis states that the artificial BP measurements have approximately Gaussian distributions. The alternative hypothesis is that the distributions of the artificial BP measurements are not approximately normal. We set the significance level at α (=0.05). To conduct the KS test, we begin with the empirical distribution $F(\Xi^{S*})$ based on the observed artificial measurements. The test yields a two-tailed probability p that indicates whether the two distributions are statistically similar or different, using the point at which they exhibit the greatest divergence [17]. A p value less than the significance level indicates that the distribution of the artificial BP data is not Gaussian. A p value greater than the significance level indicates that the distribution of the artificial BP data is sufficiently Gaussian.
The KS test is based on the empirical distribution function [13] and uses the cumulative distribution function (CDF) $T(x) = P(X_1 \le x)$ of the true underlying distribution of the data. Here, to simplify the following equations, we assume that $x \equiv \hat{\theta}^{S*}$, $X_1 \equiv \hat{\theta}^{S*}_1$, and $X \equiv \Xi^{S*}$. We define the empirical CDF as

$$T^*_N(x) = \frac{1}{N} \sum_{b=1}^{N} I_{[-\infty, x]}(X_b \le x),$$

where $I_{[-\infty, x]}(X_b \le x)$ is an indicator equal to 1 if $X_b \le x$ and equal to 0 otherwise, so that $T^*_N(x)$ is the proportion of sample points at or below level $x$. By the law of large numbers,

$$T^*_N(x) \to E\left[ I(X_1 \le x) \right] = P(X_1 \le x) = T(x),$$

that is, the proportion of the BP data in the set $[-\infty, x]$ is close to the probability of this set.

Theorem 2.
If $T(x)$ is continuous, the distribution of the least upper bound $\sup_{x \in \mathbb{R}} |T^*_N(x) - T(x)|$ does not depend on the unknown distribution $P$ of the BP sample. Therefore, this approximation holds uniformly across all $x \in \mathbb{R}$, where $\|\cdot\|_\infty$ is $\sup_{x \in \mathbb{R}} |\cdot|$. The result of Equation (16) demonstrates that $T^*_N$ converges to the distribution $T$, in that the sup-norm of the difference converges to zero.
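The KS statistic is the largest gap between the empirical CDF and the hypothesized Gaussian CDF, and it can be sketched with the standard library alone. The function names, the simulated data, and the asymptotic critical value 1.36/sqrt(N) at α = 0.05 are assumptions of this sketch; it also treats the Gaussian mean and standard deviation as known rather than estimated from the sample.

```python
import math
import random

def normal_cdf(x, mu, sigma):
    """CDF of a Gaussian distribution with mean mu and std sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, mu, sigma):
    """Largest distance between the empirical CDF and the Gaussian CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for b, x in enumerate(xs, start=1):
        t = normal_cdf(x, mu, sigma)
        # The empirical CDF jumps from (b - 1) / n to b / n at x.
        d = max(d, b / n - t, t - (b - 1) / n)
    return d

def ks_decision(sample, mu, sigma):
    """Return (h, ks, cv); h = 1 rejects normality at alpha = 0.05."""
    cv = 1.36 / math.sqrt(len(sample))  # assumed asymptotic critical value
    ks = ks_statistic(sample, mu, sigma)
    return (1 if ks > cv else 0), ks, cv

# Hypothetical artificial SBP values from N = 1000 bootstrap replications.
rng = random.Random(0)
artificial_sbp = [rng.gauss(120.0, 2.0) for _ in range(1000)]
h, ks, cv = ks_decision(artificial_sbp, 120.0, 2.0)
print(h, ks, cv)
```

For N = 1000 this critical value is about 0.04; a result of h = 0 means the null hypothesis of normality is not rejected.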

Independence Test Based on Rank
Next, we assume the $N$ bivariate observations $\Xi^{S*} = [\hat{\theta}^{S*}_1, \hat{\theta}^{S*}_2, \cdots, \hat{\theta}^{S*}_N]$ and $\Xi^{D*} = [\hat{\theta}^{D*}_1, \hat{\theta}^{D*}_2, \cdots, \hat{\theta}^{D*}_N]$ for each individual subject (i) to be random resamples from the BP estimation results generated by the proposed approach, as shown in Algorithm 1. That is, the members of the $[\Xi^{S*}, \Xi^{D*}]$ set are mutually independent and identically distributed in the bivariate population. We let $T_{\Xi^{S*}, \Xi^{D*}}$ denote the joint distribution function of the bivariate population of the $[\Xi^{S*}, \Xi^{D*}]$ set. The Spearman rank-order correlation method deals with the relationship between two populations [13]; that is, it addresses how one population changes with respect to the other. Using the Spearman rank-order test, we obtain the correlation coefficient (CORR) $r$ and, for a large sample, the z-value of $r$ [13]. The null hypothesis states that there is no correlation (independence) between the artificial SBP and DBP measurements.
The significance level is set at α (=0.05), so there is 95% confidence that any statistical difference observed in the BP measurements is real and not due to chance. The asymptotic approximation uses the normality of $r$. When the null hypothesis $H_0$ is true, the expected value and variance of $r$ are given by $E(r) = 0$ and $\mathrm{var}(r) = 1/(N - 1)$, respectively.
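The rank-based independence test can be sketched as follows. The z-value uses E(r) = 0 and var(r) = 1/(N − 1) from the text; the function names, the simulated data, the two-sided threshold of 1.96, and the simple rank formula (which assumes no tied values) are assumptions of this sketch.

```python
import math
import random

def ranks(values):
    """Ranks 1..N of the values (ties are assumed not to occur)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_r(x, y):
    """Spearman rank-order correlation coefficient for untied data."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def independence_test(x, y, z_critical=1.96):
    """Return (r, z, reject) using E(r) = 0 and var(r) = 1 / (N - 1)."""
    r = spearman_r(x, y)
    z = r * math.sqrt(len(x) - 1)
    return r, z, abs(z) > z_critical

# Hypothetical artificial SBP and DBP values drawn independently.
rng = random.Random(0)
sbp = [rng.gauss(120.0, 2.0) for _ in range(1000)]
dbp = [rng.gauss(80.0, 1.5) for _ in range(1000)]
r, z, reject = independence_test(sbp, dbp)
print(r, z, reject)
```

When `reject` is false, the null hypothesis of independence between the artificial SBP and DBP values is not rejected at the 0.05 level.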

Experimental Results and Comparison
We evaluated the BP measurement algorithm against the standard protocol [6], which requires the mean error (ME) to be within ±5 mmHg and the standard deviation of error (SDE) to be less than 8 mmHg. For all BP measurements, following the British Hypertension Society (BHS) protocol [18], we determined the percentage of absolute errors falling into three groups: 5 mmHg or less, 10 mmHg or less, and 15 mmHg or less. If 60% of the errors of a BP algorithm are within 5 mmHg, 85% are within 10 mmHg, and 95% are within 15 mmHg, the algorithm is classified as grade A.
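These criteria translate directly into a small evaluation routine. The sketch below is illustrative only: the function name `evaluate_bp_errors`, the returned tuple layout, and the example readings are assumptions, and only the grade A thresholds stated in the text are checked.

```python
import statistics

def evaluate_bp_errors(estimates, references):
    """Compute ME, SDE, BHS percentages, and the two pass/fail checks."""
    errors = [e - r for e, r in zip(estimates, references)]
    me = statistics.mean(errors)
    sde = statistics.stdev(errors)
    n = len(errors)
    # Percentage of absolute errors within 5, 10, and 15 mmHg.
    within = {
        limit: 100.0 * sum(abs(err) <= limit for err in errors) / n
        for limit in (5, 10, 15)
    }
    meets_me_sde = abs(me) < 5.0 and sde < 8.0
    bhs_grade_a = within[5] >= 60.0 and within[10] >= 85.0 and within[15] >= 95.0
    return me, sde, within, meets_me_sde, bhs_grade_a

# Hypothetical estimated and reference SBP values (mmHg).
estimated = [120.0, 125.0, 118.0, 130.0]
reference = [118.0, 126.0, 120.0, 127.0]
me, sde, within, meets_me_sde, bhs_grade_a = evaluate_bp_errors(estimated, reference)
print(me, sde, within, meets_me_sde, bhs_grade_a)
```

The same routine can be applied separately to the SBP and DBP estimates of the validation set.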

Statistical Analysis
Based on the configuration in Table 4, the mean errors in the SBP and DBP values obtained by the deep learning method were compared with those obtained by the previous algorithms, as shown in Table 5. The standard deviations of error obtained by the deep learning method were 6.3 mmHg and 5.45 mmHg for the SBP and DBP, respectively. These results represent superior performance compared to the previous algorithms. The results of the BHS protocol indicate that the deep learning method obtained better BP estimates than the previous methods, as presented in Table 6.

Table 4. Parameters [8,9] of the deep learning algorithm (including the number of hidden units in the three layers).

According to the overall performance evaluation results, we can conclude that the proposed technique reduces the ME's variance and increases the performance confidence level. As described in the introduction, the CLs of the SBP and DBP obtained by the deep learning method were narrower than those obtained by the previous methods, as shown in Table 7. Although the CLs obtained by the proposed method were wider than those obtained by [1], we noted differences of 2.0 mmHg and 1.6 mmHg in the SDEs of the CLs for the SBP and DBP between the deep learning algorithm and the conventional method [1]. Thus, we can argue that the uncertainties of the SDEs were reduced when using the deep learning estimator. To evaluate normality with respect to the individual BP measurements, we statistically analyzed the kurtosis, skewness, KS, and correlation test results, with their associated standard deviations (std), using the artificial BP measurements based on the results of the deep learning algorithm. Kurtosis is a measure of how flat or peaked a population is relative to a Gaussian distribution. The kurtosis of a Gaussian distribution is 3. Therefore, a kurtosis value greater than 3 indicates a heavy-tailed distribution, and a kurtosis value of less than 3 indicates a light-tailed distribution. As shown in Figure 3 and Table 8, the kurtosis results for the SBP and DBP estimations were 2.99 and 3.01, which means that the distributions of the artificial BP measurements were almost normal. The skewness of a population is a measure of its horizontal symmetry with respect to a Gaussian distribution. The skewness of a Gaussian distribution is 0, and symmetric data have a skewness of almost 0. A negative value indicates a distribution skewed to the left and a positive value a distribution skewed to the right.
We confirmed that the skewness values of the artificial BP measurements were −0.01 and 0.01 for the SBP and DBP estimations, respectively, which are very close to 0 for the number of bootstrap replications (N = 1000), as shown in Figure 3 and Table 8. We performed a KS test to verify the normality of each distribution and showed that the distributions were very similar to the normal distribution [19]. As the number of bootstrap replications N became large, the distribution of the artificial BP measurements approached a Gaussian distribution. We confirmed the distribution of the artificial BP measurements to be Gaussian based on the test results, which compare the cumulative distribution function (CDF) of the artificial BP measurements with the theoretical CDF of a Gaussian distribution, as shown in Figure 4.
Moreover, we confirmed that the hypothesis results h for the mean artificial BP measurements were 0 for the SBP and DBP estimations. These results indicate that we accepted the null hypothesis at the 0.05 significance level. In addition, we could not reject the null hypothesis because the ks values (=0.02) were less than the critical values cv (=0.04). Also, the p values of the KS test (0.78 and 0.79 for the SBP and DBP, respectively) were greater than the α (=0.05) significance level, as shown in Table 8. Therefore, we accepted the normality of the distribution of the artificial BP measurements for individual subjects. In addition, using the rank-based independence test, we obtained CORR values r s (=0.01) and r d (=0.01) based on the asymptotic normality between the artificial SBP and DBP measurements, with variances of 0.0009 and 0.0009 for r s and r d, respectively, as reported in the last column of Table 7. These results are close to the expected values and variances. Thus, we did not reject H 0 (independence) at the α = 0.05 level.

Conclusions
The main contribution of this paper is the verification of the normality of the BP data for individual subjects, starting from only five samples per subject, using various statistical methodologies. We determined the independence of the artificial SBP and DBP estimations obtained with the deep learning model using a distribution-free rank test. The proposed methodology also provides accurate BP estimations and reduces uncertainties such as the CLs and SDEs, as determined by the deep learning regression estimator. In the near future, we will pursue additional non-normality testing using new subject populations.