PPG-Based Systolic Blood Pressure Estimation Method Using PLS and Level-Crossing Feature

Abstract: This paper proposes a cuffless systolic blood pressure (SBP) estimation method using partial least-squares (PLS) regression. The method uses level-crossing features (LCFs) extracted from contour lines drawn at arbitrary intervals on the second-derivative photoplethysmography waveform. Unlike conventional height ratio features (HRFs), which are extracted on the basis of the peaks in the waveform, LCFs can be reliably extracted even if peaks are missing from the waveform. However, the features extracted from adjacent contour lines show similar trends, so the features are strongly correlated, which leads to multicollinearity when conventional multiple regression analysis (MRA) is used. Hence, we developed a multivariate estimation method based on PLS regression to address this issue and estimate the SBP from the LCFs. Two-hundred-and-sixty-five subjects (95 males and 170 females; SBP: 133.1 ± 18.4 mmHg; age: 62.8 ± 16.8 years, expressed as mean ± standard deviation) participated in the experiments. Of these subjects, 180 were used as training data and 85 as testing data. The correlation coefficients between the measured and estimated values were 0.78 for the proposed method (LCFs + PLS), 0.58 for comparison method 1 (HRFs + MRA), and 0.62 for comparison method 2 (LCFs + MRA). The proposed method therefore demonstrated the highest accuracy of the three methods compared.


Introduction
Hypertension is a known risk factor for serious diseases such as stroke, myocardial infarction, and chronic kidney disease [1]. Managing blood pressure (BP) through daily monitoring can help individuals improve their lifestyles. However, cuff-based BP measurement is inconvenient because it requires the artery to be compressed by the cuff. Hence, cuffless BP estimation has recently attracted attention [2][3][4]. Considerable effort has been devoted to developing cuffless BP estimation methods over the years, and one approach is based on photoplethysmography (PPG) and electrocardiography (ECG) signals. However, such methods require two sensors, one for the ECG and one for the PPG, which increases the manufacturing cost of the devices. As a simpler alternative, cuffless BP estimation methods based only on PPG waveforms have been studied [5].
In PPG, the skin is irradiated with light from a light-emitting diode (LED). The PPG waveform is obtained by converting the transmitted or reflected light received by the PPG sensor into a signal; because the light is absorbed by blood, the waveform reflects the blood volume in the blood vessels, particularly changes in the amount of hemoglobin [6]. PPG is considered a useful physiological signal for cuffless BP measurement because it is inexpensive and simple to measure.
Takazawa et al. reported two types of findings on changes in the second-derivative PPG waveform. The first is the change in the second-derivative PPG waveform caused by administering an antihypertensive agent and a vasopressor. The second comes from epidemiological studies relating the second-derivative PPG waveform to the presence or absence of several cardiovascular diseases, including hypertension [7]. These findings suggest that the BP can be estimated by extracting various features from the waveform [8][9][10][11].
However, it is difficult to extract features from PPG waveforms owing to baseline variations and the weak undulations of the waveform [12]. Hence, the first derivative of the photoplethysmogram (FDPPG) and the second derivative of the photoplethysmogram (SDPPG) are often used. Because the SDPPG has several characteristic peaks, it is often used instead of the PPG. Figure 1 shows examples of (a) PPG, (b) FDPPG, and (c) SDPPG waveforms. The relative heights of the peaks of the SDPPG waveform are typically used as features for estimating the BP [8]; this set of features is called the height ratio feature (HRF). However, the peaks of the SDPPG waveform are susceptible to noise, and these features are particularly difficult to extract when the peaks are low [13,14]. Figure 2a shows a typical SDPPG waveform, which consists of five waves, named successively waves a, b, c, d, and e; the heights of waves b to e relative to that of wave a are used as the HRFs. Although clear peaks appear in the SDPPG waveforms of young adults with flexible blood vessels (such as that shown in Figure 2a), this is not the case for older adults whose blood vessels have lost flexibility with aging. In the latter case, the SDPPG waveform may show no clear undulations, and/or the later peaks (waves c, d, and e) may be absent after noise-reduction filtering. Figure 2b shows an example in which waves c and d are lost. The disappearance of these peaks causes the corresponding HRFs to be lost.
Thus, in this study, we used level-crossing features (LCFs) [15,16] because, unlike the HRFs, which depend on the peaks of the SDPPG waveform, they can be reliably extracted from SDPPG waveforms. In our proposed method, contour lines are drawn at arbitrary intervals on the SDPPG waveform, and the number of crossings between each contour line and the waveform and the total length of each contour line lying within the waveform are extracted as the LCFs.
The SDPPG waveform in Figure 3 is the same as that in Figure 2b, in which waves c and d are missing; nevertheless, the LCFs can still be obtained from this waveform. In our previous study, we confirmed a correlation between the systolic blood pressure (SBP) and the LCFs [16]; however, BP estimation using LCFs has not yet been investigated. The objective of this study is to confirm the effectiveness of an SBP estimation method using LCFs.
Although the LCFs can be extracted from these contour lines, the features extracted from adjacent contour lines tend to show similar trends and are therefore strongly correlated. This issue, known as multicollinearity, hinders accurate estimation with conventional multiple regression analysis (MRA). Partial least-squares (PLS) regression [17] is one method that can address this issue in multivariate estimation models, and it has been used successfully in various disciplines. Thus, in this study, we used PLS regression to establish a BP estimation model based on the LCFs, which are prone to multicollinearity.

PPG and SBP Measurement
Two-hundred-and-sixty-five healthy subjects participated in this study. We obtained approval from the Kanai Hospital ethics committee and, after explaining the purpose of the study, obtained informed consent from the subjects. Summary statistics are expressed as mean ± standard deviation (SD). The SBP, mean blood pressure (MBP), diastolic blood pressure (DBP), and age of the subjects were 133.1 ± 18.4 mmHg, 104.5 ± 13.9 mmHg, 79.7 ± 11.5 mmHg, and 62.8 ± 16.8 years, respectively. The height, weight, and body mass index (BMI) of the subjects were 157.3 ± 7.1 cm, 57.1 ± 10.1 kg, and 36.3 ± 5.8 kg/m², respectively. Height and weight were recorded only for subjects willing to disclose them.
We measured the SBP at the left upper arm using an automatic BP monitor (UDEX-i, Canon Incorporated, Tokyo, Japan) after the subject had been seated at rest for 5 min. The PPG waveform was recorded from the right index finger immediately after the BP measurement, at a sampling interval of 5 ms (200 Hz) for 20 s. The homemade PPG sensor used in this study consists of a red LED with a wavelength of 660 nm (LS T67B, OSRAM Opto Semiconductors, Regensburg, Germany) and a photodiode (S2387-16R, Hamamatsu Photonics, Shizuoka, Japan) (Figure 3). High-frequency noise was removed using a first-order low-pass filter and a 16th-order finite impulse response (FIR) filter with a cutoff frequency of 10 Hz. The SDPPG was calculated from the recorded PPG.
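The preprocessing chain above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the 200-Hz rate follows from the 5-ms sampling interval, the 16th-order FIR filter is assumed to be a Hamming-windowed sinc design with 17 taps, the first-order low-pass filter is assumed to be a digital RC-type filter applied before the FIR stage (in the original hardware it may be analog), and the derivatives are taken with central differences via `np.gradient`. All function names are hypothetical.

```python
import numpy as np

FS = 200.0  # Hz; follows from the 5-ms sampling interval


def fir_lowpass(numtaps: int = 17, cutoff_hz: float = 10.0, fs: float = FS) -> np.ndarray:
    """Hamming-windowed sinc low-pass FIR (17 taps = 16th order); assumed design."""
    m = np.arange(numtaps) - (numtaps - 1) / 2.0
    h = np.sinc(2.0 * cutoff_hz / fs * m) * np.hamming(numtaps)
    return h / h.sum()  # unity gain at 0 Hz


def first_order_lowpass(x: np.ndarray, cutoff_hz: float = 10.0, fs: float = FS) -> np.ndarray:
    """Discrete RC-type first-order low-pass filter (assumed digital form)."""
    dt = 1.0 / fs
    rc = 1.0 / (2.0 * np.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    y = np.empty(len(x))
    y[0] = x[0]
    for i in range(1, len(x)):
        y[i] = y[i - 1] + alpha * (x[i] - y[i - 1])
    return y


def compute_sdppg(ppg, fs: float = FS):
    """Filter a raw PPG record and return (FDPPG, SDPPG)."""
    x = first_order_lowpass(np.asarray(ppg, dtype=float), fs=fs)
    # mode="same" keeps the record length; the symmetric FIR is centred, so it adds no delay
    x = np.convolve(x, fir_lowpass(fs=fs), mode="same")
    fdppg = np.gradient(x, 1.0 / fs)      # first derivative
    sdppg = np.gradient(fdppg, 1.0 / fs)  # second derivative
    return fdppg, sdppg


if __name__ == "__main__":
    t = np.arange(int(20 * FS)) / FS  # one 20-s record
    ppg = np.sin(2 * np.pi * 1.2 * t) + 0.05 * np.sin(2 * np.pi * 50 * t)
    fd, sd = compute_sdppg(ppg)
    print(len(sd))  # 4000 samples
```

Because differentiation amplifies high frequencies, low-pass filtering before computing the SDPPG is what keeps the second derivative usable; a library design routine (for example, a windowed FIR designer from a signal-processing package) could replace the hand-written `fir_lowpass`.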

Feature Extraction
To obtain features reliably from the SDPPG waveform, we propose using LCFs extracted from the SDPPG waveforms, and we apply these features in PLS regression to estimate the SBP. By normalizing the waveform so that its maximum amplitude is 100% and drawing contour lines at arbitrary intervals on it, we obtain two types of LCFs: (1) the number of crossings between each contour line and the waveform and (2) the total length of each contour line lying within the waveform. The features are extracted as follows.
Step 1. Record the PPG signal and compute the SDPPG.
Step 2. Starting from the peak of wave a of each pulse, search backward in time and take the first zero crossing as the endpoint of that pulse. Segment the SDPPG waveform into individual pulses at these endpoints, and standardize each pulse by dividing it by its positive maximum value (i.e., the height of wave a).
Step 3. Draw contour lines at arbitrary intervals on the standardized SDPPG waveform, whose maximum positive value is 1. From all of the contour lines drawn on the waveform, extract the number of crossings (LCF 1) and the total length of the contour line lying within the waveform (LCF 2). Figure 4 shows a representative example of the LCFs extracted from the standardized SDPPG waveform.
Because the LCFs are extracted from contour lines rather than from peaks, our method can extract features reliably even from an SDPPG waveform with missing peaks. However, when there are no significant changes in the upper part of the SDPPG waveform, the LCFs from adjacent contour lines show similar trends; therefore, the LCFs are strongly correlated with one another.
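Steps 2 and 3 can be sketched as follows. This is an illustrative NumPy sketch, not the authors' code, and several details are assumptions: wave a is located with a simple hypothetical detector (local maxima above half of the record maximum, at least 0.33 s apart); each pulse runs from the zero crossing found before its wave a to the zero crossing found before the next wave a; a crossing is counted whenever the standardized waveform moves between "below the level" and "at or above the level"; and the "length lying within the waveform" is taken as the time, in seconds, that the waveform spends at or above a non-negative level, or at or below a negative level.

```python
import numpy as np


def find_a_waves(sdppg, fs=200.0, min_gap_s=0.33, rel_height=0.5):
    """Hypothetical wave-a detector: local maxima above rel_height * max, >= min_gap_s apart."""
    x = np.asarray(sdppg, dtype=float)
    thr = rel_height * x.max()
    cand = np.where((x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:]) & (x[1:-1] > thr))[0] + 1
    peaks, min_gap = [], int(min_gap_s * fs)
    for i in cand:
        if peaks and i - peaks[-1] < min_gap:
            if x[i] > x[peaks[-1]]:
                peaks[-1] = i  # keep the taller of two close candidates
        else:
            peaks.append(i)
    return peaks


def onset_before(x, peak):
    """Step 2: search backward in time from wave a for the first zero crossing."""
    i = peak
    while i > 0 and x[i - 1] > 0:
        i -= 1
    return i if i > 0 else None  # None: no crossing inside the record


def segment_pulses(sdppg, fs=200.0):
    """Split the SDPPG at the onsets and standardize each pulse by its wave-a height."""
    x = np.asarray(sdppg, dtype=float)
    onsets = [o for o in (onset_before(x, p) for p in find_a_waves(x, fs)) if o is not None]
    pulses = []
    for start, stop in zip(onsets[:-1], onsets[1:]):
        if stop > start:
            seg = x[start:stop]
            pulses.append(seg / seg.max())  # positive maximum becomes 1 (100%)
    return pulses


def level_crossing_features(pulse, levels, fs=200.0):
    """Step 3: LCF 1 (number of crossings) and LCF 2 (length in s) for each contour level."""
    p = np.asarray(pulse, dtype=float)
    n_cross = np.zeros(len(levels))
    length_s = np.zeros(len(levels))
    for k, lev in enumerate(levels):
        above = p >= lev
        n_cross[k] = np.count_nonzero(above[1:] != above[:-1])
        inside = above if lev >= 0 else (p <= lev)  # assumed meaning of "within the waveform"
        length_s[k] = np.count_nonzero(inside) / fs
    return n_cross, length_s
```

No step in this sketch searches for waves c, d, or e, so a pulse in which those waves are missing still yields a complete LCF vector.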
On the basis of a previous study related to the SDPPG [16], we extracted the LCFs from contour lines drawn at 5% intervals from −100% to 100% of the amplitude, i.e., 41 contour lines, each yielding two features. We deleted features that had the same value for all subjects, leaving a total of 76 LCFs. The LCFs were extracted for each pulse in a subject's SDPPG waveform and averaged over all pulses recorded for that subject; these mean values were taken as the subject's LCFs and used to estimate the SBP. The HRFs were likewise averaged over pulses.
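The feature bookkeeping in this paragraph can be sketched as follows. The sketch assumes that each pulse yields a vector of 82 values (41 contour levels × 2 LCF types), that a subject's features are the plain mean over pulses, and that "the same value for all subjects" is tested exactly on the subject-by-feature matrix; the function and feature names are hypothetical.

```python
import numpy as np

# Contour levels at 5% intervals from -100% to 100% of the wave-a height
LEVELS = np.round(np.arange(-20, 21) * 0.05, 2)  # 41 levels


def feature_names(levels=LEVELS):
    """Hypothetical labels: LCF 1 (crossings) and LCF 2 (length) for each level."""
    return [f"cross@{lev:+.2f}" for lev in levels] + [f"len@{lev:+.2f}" for lev in levels]


def subject_lcfs(per_pulse):
    """Average the per-pulse LCF vectors (n_pulses x n_features) of one subject."""
    return np.asarray(per_pulse, dtype=float).mean(axis=0)


def drop_constant_features(X, names):
    """Remove features that take the same value for every subject (rows of X)."""
    X = np.asarray(X, dtype=float)
    keep = (X.max(axis=0) - X.min(axis=0)) > 0
    return X[:, keep], [n for n, k in zip(names, keep) if k]
```

Under this counting, the 76 LCFs reported above would correspond to 6 of the 82 candidate features being constant across subjects (for example, the number of crossings at the 100% level, which a standardized pulse touches only at its maximum).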

BP Estimation
PLS regression is one method that can be used to address the multicollinearity issue in multivariate estimation models. Thus, we used PLS regression to develop a BP estimation model based on the LCFs of the SDPPG waveform. In PLS regression, the LCFs are converted into mutually uncorrelated (orthogonal) scores, and the estimation model is developed on the basis of these scores. The scores are computed so as to reduce the regression residual. The PLS regression model for estimating Ŷ is as follows:
$$\hat{\mathbf{Y}} = \mathbf{X}\mathbf{W} \tag{1}$$
where Ŷ is the estimated objective variable vector (SBPs) of size N × 1 (N is the sample size), X is the centered explanatory variable matrix (LCFs) of size N × M (M is the number of features), and W is the coefficient vector of size M × 1.
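Equation (1) can be illustrated with a minimal PLS1 fit using the NIPALS algorithm, one standard way of computing PLS scores for a single objective variable. This is a sketch, not the authors' implementation: the function names are hypothetical, and because Eq. (1) operates on centered data, the sketch assumes the SBP is centered as well and adds its training mean back when predicting.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """NIPALS PLS1: returns (x_mean, y_mean, W) with W the M-vector of Eq. (1)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # centered LCFs and SBPs
    m = X.shape[1]
    weights = np.zeros((m, n_components))
    loadings = np.zeros((m, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = E.T @ f                          # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = E @ w                            # score a, orthogonal to earlier scores
        tt = t @ t
        loadings[:, a] = E.T @ t / tt
        q[a] = f @ t / tt                    # regression of y on the score
        E = E - np.outer(t, loadings[:, a])  # deflate X
        f = f - q[a] * t                     # deflate y
        weights[:, a] = w
    # Map the score-space coefficients back to one coefficient per feature
    W = weights @ np.linalg.solve(loadings.T @ weights, q)
    return x_mean, y_mean, W


def pls1_predict(model, X):
    """Apply Eq. (1) to newly centered features and restore the SBP mean."""
    x_mean, y_mean, W = model
    return y_mean + (np.asarray(X, dtype=float) - x_mean) @ W
```

With as many components as features, PLS1 reproduces ordinary least squares; keeping fewer components (six in this study) is what makes the model tolerant of strongly correlated LCFs.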
To assess the accuracy of our method (LCFs + PLS) in estimating the SBP, we compared it with two other methods: (1) comparison method 1 (HRFs + MRA) and (2) comparison method 2 (LCFs + MRA). In comparison method 1, we extracted the HRFs from the SDPPG waveforms of the subjects. As mentioned previously, the HRFs cannot be extracted when peaks are missing; in such cases, we averaged each feature over the pulses in the record that had no missing data. In comparison method 2, we extracted the LCFs from the SDPPG waveforms. Both comparison methods used MRA to estimate the SBP. Comparing methods 1 and 2 shows whether LCFs are superior to HRFs, and comparing method 2 with the proposed method shows whether PLS is better suited to the LCFs than MRA.
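The two comparison methods can be sketched as follows. The HRFs are computed as the heights of waves b to e relative to wave a (b/a, c/a, d/a, e/a), and a wave that could not be detected is represented by NaN. The sketch reads the per-record averaging described above as a per-feature mean over the pulses in which that feature is available; this is an assumption, and another reading would keep only pulses with all four HRFs. MRA is fitted here as ordinary least squares with an intercept. The function names are hypothetical.

```python
import numpy as np


def hrf_per_pulse(a, b, c, d, e):
    """HRFs of one pulse (b/a, c/a, d/a, e/a); pass np.nan for a missing wave."""
    return np.array([b, c, d, e], dtype=float) / a


def subject_hrfs(per_pulse):
    """Per-feature mean over the pulses in which that feature was extracted."""
    return np.nanmean(np.asarray(per_pulse, dtype=float), axis=0)


def mra_fit(X, y):
    """Ordinary least squares with an intercept: returns [b0, b1, ..., bM]."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


def mra_predict(coef, X):
    """Estimated SBP from the fitted MRA coefficients."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]
```

For comparison method 2, the same `mra_fit` would be applied to the selected LCF columns instead of the four HRFs.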
We evaluated the accuracy of these estimation models by comparing the estimated SBP values with those obtained from the BP measurements at the upper arms of the subjects. The accuracy of the models was evaluated on the basis of the correlation coefficient (R value) between the measured and estimated values and the standard error of the estimate. We used the data collected from 180 subjects as the training data for BP estimation models. The testing data used to evaluate the estimation accuracy were collected from 85 subjects, and these data were not used to develop the estimation model.
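The evaluation protocol can be sketched as below. The text does not state how the 180/85 split was drawn or give a formula for the standard error of the estimate, so the sketch assumes a random split with a fixed seed and takes the standard error of the estimate to be the standard deviation of the estimation errors (population form, ddof=0). Both choices are assumptions, and the function names are hypothetical.

```python
import numpy as np


def split_subjects(n_subjects=265, n_train=180, seed=0):
    """Random, disjoint training/testing index sets (assumed splitting scheme)."""
    idx = np.random.default_rng(seed).permutation(n_subjects)
    return idx[:n_train], idx[n_train:]


def evaluate(measured, estimated):
    """Correlation coefficient R and an assumed form of the standard error of estimate."""
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    r = np.corrcoef(measured, estimated)[0, 1]
    see = np.std(estimated - measured)  # assumption: SD of the errors, ddof=0
    return r, see
```

Keeping the testing indices out of every fitting step, including feature preselection and the choice of the number of PLS scores, is what makes the reported R a held-out estimate.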

Results
In our method, the PLS regression model was developed using six scores, the optimal number determined by applying Wold's R criterion [18] to the training data of the 180 subjects. The model consists of the mean values used to center the LCFs and the regression coefficients; it is provided in Table A1 of Appendix A. For comparison method 1, a conventional MRA model was established because there were only four features (b/a, c/a, d/a, and e/a) and the correlations among them were not high (the highest correlation coefficient was 0.631). For comparison method 2, an MRA model was established by stepwise regression using the Akaike information criterion (AIC) [19]. Because the large number of LCFs (76) makes the stepwise calculation difficult, we first preselected the 20 LCFs whose correlation coefficients with the SBP exceeded 0.3 in absolute value. Starting from these 20 LCFs as the initial explanatory variable set, we selected explanatory variables by forward-backward stepwise selection with the AIC; finally, five LCFs were selected as the explanatory variables.
Figure 5 shows scatterplots of the measured and estimated SBPs obtained from our method (LCFs + PLS), comparison method 1 (HRFs + MRA), and comparison method 2 (LCFs + MRA). The correlation coefficients between the estimated and measured values are 0.78, 0.58, and 0.62, and the standard errors of the estimate are 9.09, 11.76, and 11.20 mmHg, for our method, comparison method 1, and comparison method 2, respectively.
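The choice of six scores can be illustrated with a cross-validated form of Wold's R criterion, in which components are added while R = PRESS(a)/PRESS(a − 1) stays below a threshold. The sketch assumes 5-fold cross-validation on the training subjects and a threshold of 0.95 (variants with 0.90 or 1.0 are also used); the cross-validation scheme of the study is not stated, and the helper names are hypothetical. A compact NIPALS PLS1 is included so that the block runs on its own.

```python
import numpy as np


def _pls1_coef(X, y, n_comp):
    """Compact NIPALS PLS1; returns (x_mean, y_mean, coefficient vector)."""
    xm, ym = X.mean(axis=0), y.mean()
    E, f = X - xm, y - ym
    Ws, Ps, qs = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p, q = E.T @ t / tt, f @ t / tt
        E, f = E - np.outer(t, p), f - q * t
        Ws.append(w)
        Ps.append(p)
        qs.append(q)
    W, P = np.array(Ws).T, np.array(Ps).T
    return xm, ym, W @ np.linalg.solve(P.T @ W, np.array(qs))


def press(X, y, n_comp, n_folds=5):
    """Predicted residual sum of squares from k-fold cross-validation."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    total = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        if n_comp == 0:
            pred = np.full(len(test), y[train].mean())  # intercept-only model
        else:
            xm, ym, b = _pls1_coef(X[train], y[train], n_comp)
            pred = ym + (X[test] - xm) @ b
        total += np.sum((y[test] - pred) ** 2)
    return total


def wold_r_components(X, y, max_comp, threshold=0.95, n_folds=5):
    """Add components while PRESS(a) / PRESS(a - 1) < threshold."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    p = [press(X, y, a, n_folds) for a in range(max_comp + 1)]
    n_opt = 0
    for a in range(1, max_comp + 1):
        if p[a - 1] == 0 or p[a] / p[a - 1] >= threshold:
            break
        n_opt = a
    return n_opt, p
```

The criterion stops at the first component that no longer reduces the cross-validated error by a meaningful fraction, which guards against fitting noise in the strongly correlated LCFs.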
These results indicate that our method gives the highest estimation accuracy, followed by comparison method 2 and then comparison method 1. Figure 6 shows the results of a Bland-Altman (BA) analysis [20], which we performed to examine the systematic errors of the regression models. Table 1 summarizes the correlation coefficients of the BA plots and the mean and 1.96 × SD of the differences plotted on the vertical axis. All of the regression models show negative correlation coefficients in the BA plots, but our estimation model shows the weakest correlation. In each plot in Figure 6, the middle of the three broken lines indicates the mean of the differences, i.e., the bias between the SBP measured by the cuff and the SBP estimated from the PPG. This bias is very small for all three methods. The upper and lower broken lines represent the limits of agreement (LOA), defined as the mean difference ± 1.96 × SD of the differences. A narrower LOA means that the estimation method agrees more closely with the cuff measurement. The upper limits of the LOA were 17.64, 23.04, and 20.88 mmHg, and the lower limits were −18.19, −23.33, and −23.31 mmHg, for our method, comparison method 1, and comparison method 2, respectively.
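The Bland-Altman quantities can be computed as sketched below. The sign convention (estimated minus measured) and the use of the sample SD (ddof=1) are assumptions, as is the function name; the BA-plot correlation is taken between the differences and the pairwise means, the usual way of quantifying a trend in a BA plot.

```python
import numpy as np


def bland_altman(measured, estimated):
    """Bias, limits of agreement (bias +/- 1.96 SD) and BA-plot correlation."""
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    diff = estimated - measured          # vertical axis (assumed sign)
    mean = (estimated + measured) / 2.0  # horizontal axis
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    r = np.corrcoef(mean, diff)[0, 1]
    return {"bias": bias, "loa_upper": bias + half_width,
            "loa_lower": bias - half_width, "r": r}
```

A negative `r` means that the differences shrink (or turn negative) as the SBP level increases, which is the kind of proportional error the correlation coefficients in Table 1 summarize.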
From the 20-s recordings of all 265 subjects, SDPPG waveforms were obtained for 5919 pulses. Based on waves a, b, c, d, and e of the SDPPG waveform, one or more HRFs were missing in 312 of these pulses; that is, the HRFs could not be fully extracted in about 5% of the pulses. For four subjects, all of the HRFs could be extracted in fewer than 20% of the pulses within the record. Even in these cases, no LCFs were missing.
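As a quick arithmetic check of these counts (the per-minute rate is simply the per-record count multiplied by three and is given only for orientation):
$$\frac{312}{5919} \approx 0.053\ (5.3\%), \qquad \frac{5919}{265} \approx 22.3\ \text{pulses per 20-s record} \approx 67\ \text{pulses/min}$$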

Discussion
The results suggest that our method gives higher estimation accuracy than the two comparison methods. The explanatory ability of the SBP estimation model in comparison method 1 is weak owing to the small number of HRFs (four), which we believe leads to its low estimation accuracy. Although five LCFs were selected in comparison method 2, these features are relatively strongly correlated with one another, which we believe reduces the explanatory ability of its SBP estimation model.
In this study, we obtained 20-s PPG recordings from 265 subjects. In some cases, 20% of the HRFs could not be acquired, whereas all of the LCFs were acquired. These results suggest that the proposed LCFs are superior to the HRFs in both accuracy and robustness.
We evaluated the BP estimation accuracy of the proposed method according to the standard of the British Hypertension Society (BHS) [21]. Table 2 summarizes the BHS BP grading system. The BHS standard uses the BP value obtained by auscultation as the reference; in this study, the SBP value obtained with the automatic oscillometric BP monitor was used instead. On this basis, we graded the SBP estimation accuracy of the proposed method against the BHS standard. For the test subjects, the cumulative percentages of subjects whose absolute SBP error was within 5, 10, and 15 mmHg were 38%, 78%, and 91%, respectively. The proposed method is therefore graded D, which means that the SBP estimation method proposed in this paper is not suitable for clinical use. Nevertheless, the proposed LCFs gave better results than the conventional HRFs, and LCFs could be used instead of HRFs in future PPG-based cuffless BP estimation methods.
From this viewpoint, we compare the proposed method with previous cuffless BP estimation methods. First, the proposed method requires no calibration before SBP estimation, and no information other than the PPG (such as height, weight, age, or sex) is needed to estimate the SBP. Shin et al. [9] analyzed the correlation between the BP and a proposed index that combines PPG features with the subject's height. The correlation coefficient between their index and the SBP was 0.826, which is higher than that obtained in this study (0.78). However, Shin et al. [9] performed the correlation analysis within a single dataset, whereas in this study the accuracy (correlation) was evaluated after dividing the data into training and testing data. Lin et al. [10] proposed a cuffless BP estimation method that constructs an estimation model for each individual; however, this method requires periodic BP calibration. Choudhury et al. [11] proposed a cuffless BP estimation method based on the Windkessel model, in which the two parameters of the Windkessel model are estimated from PPG features. The number of PPG waveform features used in that method was relatively small (four). In our method, PLS regression yielded six scores (i.e., combinations of LCFs) related to the SBP. Such scores may make it possible to improve the estimation accuracy while using a small number of variables, as in Choudhury et al. [11].
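The BHS grading step described in this section can be sketched as follows. The thresholds are those of the BHS protocol (grade A: 60/85/95%, B: 50/75/90%, C: 40/65/85% of absolute errors within 5/10/15 mmHg; otherwise D), and all three percentages must reach a grade's thresholds. Counting an error of exactly 5, 10, or 15 mmHg as "within" is an assumption, and the function names are hypothetical.

```python
import numpy as np

# Minimum cumulative percentages within 5, 10 and 15 mmHg for each BHS grade
BHS_CRITERIA = {"A": (60, 85, 95), "B": (50, 75, 90), "C": (40, 65, 85)}


def cumulative_percentages(measured, estimated, thresholds=(5, 10, 15)):
    """Percentage of subjects whose absolute error is within each threshold."""
    err = np.abs(np.asarray(estimated, dtype=float) - np.asarray(measured, dtype=float))
    return tuple(100.0 * np.mean(err <= t) for t in thresholds)


def bhs_grade(p5, p10, p15):
    """Best grade whose three thresholds are all met; otherwise 'D'."""
    for grade, (c5, c10, c15) in BHS_CRITERIA.items():  # checked in order A, B, C
        if p5 >= c5 and p10 >= c10 and p15 >= c15:
            return grade
    return "D"
```

For example, `bhs_grade(38, 78, 91)` returns "D": the 10- and 15-mmHg percentages would satisfy grade C, but 38% within 5 mmHg falls short of the 40% that grade C requires.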
This study has several limitations. The PPG data were acquired at the fingertip with the subjects seated at rest; therefore, the method has not been validated for measurements during exercise. In addition, implementation on wrist-worn devices such as smartwatches has not been studied.

Conclusions
In this study, we proposed a BP estimation method based on PPG waveforms to enable stress-free BP measurement. In this method, LCFs are extracted from SDPPG waveforms, and PLS regression is used to establish a model that estimates the SBP from these features. We compared the estimation accuracy of our method with those of HRFs + MRA and LCFs + MRA, and the results confirmed that our method has superior estimation accuracy. Although the proposed SBP estimation method has not yet reached an accuracy sufficient for clinical use, it uses only the LCFs and no other information (such as sex or body characteristics). Incorporating subject information other than the LCFs could therefore improve the accuracy of the proposed method.

Conflicts of Interest:
The authors declare no conflict of interest.