Improving Cuff-Less Continuous Blood Pressure Estimation with Linear Regression Analysis

Abstract: In this work, the authors investigate the cuff-less estimation of continuous BP through pulse transit time (PTT) and heart rate (HR) using regression techniques, which is intended as a first step towards continuous BP estimation with a low error, according to AAMI guidelines. Hypertension (the 'silent killer') is one of the main risk factors for cardiovascular diseases (CVDs), which are the main cause of death worldwide. Its continuous monitoring can offer a valid tool for patient care, as blood pressure (BP) is a significant indicator of health and, using it together with other parameters, such as heart and breath rates, could strongly improve prevention of CVDs. The novelties introduced in this work are represented by the implementation of pre-processing and by the innovative method for features research and features processing to continuously monitor blood pressure in a non-invasive way. Currently, invasive methods are the only reliable methods for continuous monitoring, while non-invasive techniques measure the values every few minutes. The proposed approach can be considered the first step for the integration of these types of algorithms on wearable devices, in particular on those developed for the SINTEC project.


Introduction
Hypertension remains one of the major risk factors for the development of cardiovascular diseases (CVDs), which are the major cause of mortality in the world [1]. In spite of the well-known and heavily advertised benefits of lowering BP, a large part of the population still have high BP as a leading risk factor for disease and disability, and actually, the numbers keep growing year after year [2]. Chronic hypertension is typified by elevated baseline BP for long periods of time [3]. Consequently, BP monitoring is recommended for the diagnosis of hypertension and the accurate estimation of cardiovascular risk in all adults over 18 years of age [4]. BP is a risk factor that strongly derives from an unhealthy lifestyle, with more than 90% of the cases of arterial hypertension depending on poor nutrition, obesity, and a lack of physical activity. Rising blood pressure in the elderly is associated with structural changes in the arteries and with large artery stiffness [5,6].
Moreover, there is a close correlation between the increase of BP and cardiovascular risk, and all evidence indicates that treating the elderly hypertensive patient will reduce the risk of cardiovascular events [7].
There are two families of methodologies for BP monitoring: invasive and non-invasive, and the typically adopted solutions are as follows [8].

• For continuous monitoring, the invasive arterial catheter method is used, which has potential risks to patients, such as infection and various vascular damage (Figure 1).
• For intermittent monitoring, an occluding arm cuff (sphygmomanometer) is used, and BP is obtained either manually (by auscultation of Korotkoff sounds or palpation) or automatically (by oscillometry) [9]. The Holter blood pressure monitor (HBPM) allows for intermittent measurements (for example, once every fifteen or thirty minutes), which might last up to two days [10].

However, clinical situations exist where safe invasive arterial monitoring may be difficult or, on the contrary, it is difficult to measure BP with traditional cuff-based devices [3]. The concepts and methods of cuff-less BP measurement have been debated and studied for decades [11]. Among the most popular methods for continuous cuff-less BP measuring are those based on pulse wave velocity (PWV), pulse transit time (PTT, Figure 2), and pulse wave analysis (PWA) [12]. Overall, PTT has shown great potential and provided the best results in terms of mean absolute error (MAE) and reliability. PTT is defined as the time that the pressure wave takes to move from a proximal to a distal point of the body [13] and can be calculated by processing electrocardiogram (ECG) and photoplethysmogram (PPG) signals [14,15]. Recent improvements in the technology of ECG and PPG allow for BP estimation with reasonable accuracy [13,16].
The correlation between PTT and BP has been studied since the 2000s, and machine learning approaches have been attempted to compensate for the lack of PTT accuracy [17,18]. Subsequent studies on the bio-mechanical properties of vessels and on the self-regulating mechanism of blood flow led to the use of one of the Moens-Korteweg fluid dynamic laws to directly link PWV to SBP and DBP. In particular, it is possible to derive the pressure values directly from initial calibrations and from the characteristics of the specific subject under examination. Poon and Zhang, through mathematical approximations, estimated the pressure values [19].
To avoid short- and long-term instabilities, repeated calibrations were needed in the process [20-22].
The novelty of this work is the improvement of techniques using PTT for BP estimation to obtain a reliable algorithm with a low computational cost, in order to integrate it into the SINTEC devices. In fact, firstly, this work involved the test of the algorithm on ECG, PPG, and ABP signals extrapolated from MIMIC III online database. Once the validation on these signals was verified, the next step was the test of the algorithm on ECG and PPG signals non-invasively acquired from healthy subjects through wearable devices (SHIMMER modules). Finally, due to the promising results, the future goal involves the test of the algorithm on SINTEC devices which are even less bulky than SHIMMERs, since the sensors are integrated into the modules that are adhered to the subject's skin with a patch. Moreover, the connectors are made of a stretchable liquid metal alloy which makes it possible for the subject to carry out any type of activity during the day.

Physiological Signals: ECG, PPG, ABP
Relationships between ECG, PPG, arterial blood pressure (ABP), systolic blood pressure (SBP), and diastolic blood pressure (DBP) are examined in the following sections.

Electrocardiogram
The electrocardiogram (ECG) signal is obtainable in a non-invasive way, and visually represents electrical and chemical cardiac muscle fiber activity during the cardiac cycle. An important role is played by QRS complex, a series of intense upward and downward deflections due to ventricular depolarization generated after atrial depolarization and consisting of three waves, namely Q, R, and S waves [23].
In particular, the R-peaks (reflecting left ventricle depolarization activity) are used in this study (Figure 3). Having found the time interval, ∆t, between two consecutive R-peaks, it is possible to calculate the heart rate (HR) [24], as:

$$\mathrm{HR} = \frac{60}{\Delta t} \qquad (1)$$

where ∆t is expressed in seconds and HR in beats per minute.

Photoplethysmogram
Photoplethysmography (PPG) is an optical technique, based on a low-intensity infrared (IR) light sensor, used to detect changes of blood volume in peripheral circulation [25,26]. Considering that light is absorbed more by blood than tissues, changes in the intensity of light can be transduced in blood flow variations. Since the sensor is very sensitive, even small blood volume variations can be detected. The PPG waveform has alternating (AC) and direct (DC) components ( Figure 4) [27]:

• The AC component represents blood volume cardiac variation in each heartbeat, and it is attributed to the pulsatile behavior of the heart [28].
• The DC component is highly correlated to central and periphery venous pressure [29].
The average blood volume changes slowly over time, but rapid changes could be caused by several factors, e.g., breathing, presence of a disease, vasomotor activity, sympathetic nervous system activity, and thermoregulation [30].

Arterial Blood Pressure
The arterial blood pressure (ABP) signal is the representation of a pressure wave, moving through the arteries. This wave has different rates of diffusion and morphology, depending on the cross-section of the artery. Physiologically, a pressure wave that spreads through a viscoelastic tube is progressively attenuated with an exponential reduction in speed, but if the tube has different diameters due to the branches into which it divides, there is an amplification of the signal due to reflections. The recording of ABP in clinical settings is performed invasively in the least rigid vessel, the aorta, where reflection is negligible [31]. From this signal, it is in fact possible to extrapolate SBP and DBP, which correspond to the maximum and minimum signal values, respectively.

Methods
It is shown in [32] that PTT is highly correlated with BP. The starting point is the combined Bramwell-Hill and Moens-Korteweg equation [33]:

$$\mathrm{PWV} = \frac{L}{\mathrm{PTT}} = \sqrt{\frac{E\,h}{\rho\,d}} \qquad (2)$$

where h is the vessel wall thickness, ρ is the blood density, d is the vessel diameter, L is its length, and E is the vessel elastic modulus (or elasticity). In 1991, Leslie A. Geddes discovered that there was an exponential relationship between E and the pressure, P [8], specifically:

$$E = E_0\,e^{\gamma P} \qquad (3)$$

where E_0 is the elastic modulus at zero pressure and γ is a vessel-dependent coefficient. By replacing Equation (3) in Equation (2), PWV can be written as:

$$\mathrm{PWV} = \sqrt{\frac{E_0\,e^{\gamma P}\,h}{\rho\,d}} \qquad (4)$$

which leads to:

$$\frac{L^2}{\mathrm{PTT}^2} = \frac{E_0\,e^{\gamma P}\,h}{\rho\,d} \qquad (5)$$

and to:

$$P = \frac{1}{\gamma}\ln\!\left(\frac{\rho\,d\,L^2}{E_0\,h\,\mathrm{PTT}^2}\right) \qquad (6)$$

The relationship between BP and PTT can thus be simplified as [34]:

$$\mathrm{BP} = -\frac{2}{\gamma}\ln(\mathrm{PTT}) + \frac{1}{\gamma}\ln\!\left(\frac{\rho\,d\,L^2}{E_0\,h}\right) \qquad (7)$$

Chan et al. [19] postulated that, if the variation of d with the BP is negligible and if the change in the arterial wall tone (E_0) is slow enough, then the second term of the right-hand side of Equation (7) can be regarded as constant during the observation window, and it is possible to write:

$$\Delta\mathrm{BP} = -\frac{2}{\gamma\,\mathrm{PTT}}\,\Delta\mathrm{PTT} \qquad (8)$$

A linear approximation for Equation (8) was proposed:

$$\mathrm{BP} = a \cdot \mathrm{PTT} + b \qquad (9)$$

Since several studies highlight the improvement caused by adding the heart rate (HR) into the equation, the mathematical relationship between BP and PTT becomes [35]:

$$\mathrm{BP} = a \cdot \mathrm{PTT} + b \cdot \mathrm{HR} + c \qquad (10)$$

Coefficients a, b, and c are subject-specific parameters and must be obtained through a calibration procedure. This last equation is the linear regression model used in this study. For the evaluation of maximum and minimum BP values, we separated the estimation of ABP into DBP and SBP, as in the following Equation (11):

$$\mathrm{DBP} = a_d \cdot \mathrm{PTT} + b_d \cdot \mathrm{HR} + c_d, \qquad \mathrm{SBP} = a_s \cdot \mathrm{PTT} + b_s \cdot \mathrm{HR} + c_s \qquad (11)$$

This method was considered because the results obtained were acceptable. However, these results could be improved, in particular in the prediction of signals with greater dynamism. The continuity of the signal was used to overcome this problem. As the signal is continuous, its extractable characteristics are also continuous. The idea was to consider the values in a specific observation window. The length of this observation window was chosen according to a trial-and-error procedure.
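The calibration of the subject-specific coefficients can be illustrated with an ordinary least-squares fit. The following is a minimal sketch, assuming the model takes the form BP = a·PTT + b·HR + c and is fitted separately for SBP and DBP; the function names `fit_bp_model` and `predict_bp` and all numeric values are hypothetical and are not taken from the study.

```python
import numpy as np

def fit_bp_model(ptt, hr, bp):
    """Least-squares estimate of (a, b, c) in BP = a*PTT + b*HR + c."""
    ptt, hr, bp = (np.asarray(x, dtype=float) for x in (ptt, hr, bp))
    # Design matrix: one column per feature plus a column of ones for c.
    X = np.column_stack([ptt, hr, np.ones_like(ptt)])
    coeffs, *_ = np.linalg.lstsq(X, bp, rcond=None)
    return coeffs  # array([a, b, c])

def predict_bp(coeffs, ptt, hr):
    """Evaluate the fitted linear model on new PTT and HR values."""
    a, b, c = coeffs
    return a * np.asarray(ptt, dtype=float) + b * np.asarray(hr, dtype=float) + c

# Synthetic calibration data (illustrative values only, not from the study).
rng = np.random.default_rng(0)
ptt = rng.uniform(0.20, 0.35, size=200)   # seconds
hr = rng.uniform(55.0, 95.0, size=200)    # beats per minute
sbp = -120.0 * ptt + 0.3 * hr + 130.0     # noiseless SBP in mmHg
dbp = -60.0 * ptt + 0.2 * hr + 80.0       # noiseless DBP in mmHg

sbp_coeffs = fit_bp_model(ptt, hr, sbp)   # one coefficient set for SBP
dbp_coeffs = fit_bp_model(ptt, hr, dbp)   # and a separate one for DBP
```

Fitting two independent coefficient sets mirrors the separation of the ABP estimate into SBP and DBP; with real recordings the fit would of course not be exact.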

Data Collection
In this work, we used signals from two different databases. An algorithm was initially developed that uses the signals present on the MIMIC III online database [36]. After obtaining promising results, we built a second database containing the signals we recorded using wearable devices, overcoming some of the problems related to the MIMIC database, such as noisy signals, signals that are too short, missing signals, etc. The new database was built by recording ECG and PPG signals with SHIMMER [37] wearables and blood pressure values with the OMRON HeartGuide smartwatch [38].

MIMIC III Database
The MIMIC database, the most popular database used for non-invasive pressure estimation, was exploited for this study. Made by the MIT Lab for Computational Physiology, it collects more than 60,000 acquisitions from ICU (Intensive Care Unit) patients. Our choice is justified by the high number of available signals, including ECG, PPG, and ABP, which allowed to implement the system [39]. Signals in the dataset are sampled at 125 Hz. First, it was essential to select only the useful records contained in the database. In MIMIC III, a very large number of recordings are available, but not all of them contain all the signals of interest for this study (ECG, PPG, ABP) and a sufficient number of samples (at least 1 min of recorded signal). Among all of the signals collected (99 in total), we performed a selection based on:

• Presence of all ECG, PPG, and ABP signals.
• Presence of peaks or periodicity for more than 5 s.
• About 50 s length for each signal.
Through this selection, 61 signals were further processed in order to obtain the peaks and, subsequently, extract HR and PTT features [40]. BP should be measured continuously during diverse activities by means of innovative devices and, in order to improve the performance, we proposed the use of HR and PTT features in a specific window length. The set of ECG, PPG, and ABP signals relating to a single patient, which from now on we will call 'batch', was discarded if one or more of the following conditions were met:

• Absence of any peaks (leading to an erroneous estimate of HR and PTT features) was noticed for more than 1 s.
• HR or PTT estimated after this process were physically impossible (e.g., DBP reaches 0 mmHg).
• Possible lack of synchronization between the signals.
By doing so, a total of 5 batches were removed. Finally, a total of 90 recordings (Available online: https://github.com/DanieleRussoGH/Sintec_project, accessed on 27 February 2022) were used for the regression process.

SHIMMER Database
While recording, all the subjects were seated and relaxed. Each individual wore three ECG electrodes, as shown in Figure 7a, to obtain the LA-RA derivation, and the PPG SHIMMER clip on the left index finger, covered by a thick black tie to allow a better adhesion to the skin and to avoid light interference (Figure 7b). Moreover, during measurements, the subjects wore the OMRON HeartGuide (Figure 7c) on the right wrist [38], a control device which returns the SBP and DBP values each minute; these values are used in the regression process to calibrate the algorithm.
Signals acquired with SHIMMER modules are sampled at 504.12 Hz and exported to the computer as .mat files through the software ConsensysPRO v1.6.0-64 bit, while BP values and the time at which they were returned by the control device are written directly to the computer as .csv files.

Filtering of Signals
For signals from the MIMIC database, band-pass filtering [42] was applied to ECG and PPG signals (not necessary on ABP signals). The chosen filter was the 5th-order Butterworth filter, with upper, f H , and lower, f L , cutoff frequencies as follows: To obtain the best signals possible, we would have to use two filters, but this would have resulted in a different delay between the signals. This is different for signals from the SHIMMER database. ECG signals are subject only to the baseline removal, while the PPG signals are filtered with a 7th-order low-pass Butterworth filter with the following features: This was performed in order to remove the 50 Hz noise which corrupts the signals and makes them difficult to process. Furthermore, the baseline of PPG signals was removed.
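A band-pass Butterworth stage of this kind can be sketched with SciPy. This is only an illustration under stated assumptions: the cutoff frequencies used in the usage note (0.5 Hz and 40 Hz) are placeholders, not the values adopted in the study, and the helper name `bandpass` is hypothetical. The sketch also swaps in zero-phase filtering (`sosfiltfilt`), which is one way to keep ECG and PPG aligned in time because it introduces no group delay; the study does not state that this was the method used.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 125.0  # MIMIC III sampling rate stated in the text (Hz)

def bandpass(signal, f_low, f_high, fs=FS, order=5):
    """Zero-phase Butterworth band-pass filter (cutoffs in Hz)."""
    # Second-order sections are numerically safer than (b, a) coefficients.
    sos = butter(order, [f_low, f_high], btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering cancels the phase shift of a single pass.
    return sosfiltfilt(sos, np.asarray(signal, dtype=float))
```

For example, `bandpass(ecg, 0.5, 40.0)` and `bandpass(ppg, 0.5, 40.0)` would apply the same filter to both signals, so any residual distortion affects them equally.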

Feature Extraction
Feature selection was employed to identify all of the predictive biomarkers needed for BP estimation. Among these, the correlations between PTT in combination with HR for the assessment of BP are known in the literature.
In addition to the studies presented above, for the signals in the MIMIC III database, the features collected in a time period equal to the period of about two cardiac cycles (T = 1.5 s) were used. The proposed generic reference formula is an extension of Equation (10):

$$\mathrm{BP} = \sum_{i=1}^{N} \left( a_i \cdot \mathrm{PTT}_i + b_i \cdot \mathrm{HR}_i \right) + c \qquad (12)$$

where index i identifies the i-th sample of the signal, while N represents the total number of samples in the period T.
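One possible reading of the windowed model is that every BP sample is regressed on the last N samples of PTT and HR. The sketch below builds such a feature matrix; it assumes PTT and HR have already been resampled to the signal rate, and the helper name `windowed_features` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windowed_features(ptt, hr, fs=125.0, period_s=1.5):
    """Stack the last N samples of PTT and HR into one feature row per step."""
    n = int(round(fs * period_s))  # N: number of samples in the period T
    ptt_w = sliding_window_view(np.asarray(ptt, dtype=float), n)
    hr_w = sliding_window_view(np.asarray(hr, dtype=float), n)
    # Row k covers samples k .. k+n-1 and pairs with the BP sample at k+n-1.
    return np.hstack([ptt_w, hr_w]), n
```

Each row then has 2N columns, so a linear regressor fitted on it learns one coefficient per sample of the window plus an intercept.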
To extract PTT, it is necessary to identify the R-peaks of the ECG and the systolic peaks of the PPG (see Figure 2). Signals must be synchronized to the millisecond, but this requirement is fulfilled in both databases [43].

Extraction for MIMIC III Database
Extraction of the PPG peaks (S-peaks) is not easy, since each signal of the MIMIC III database has a different amplitude, probably due to the inter-subject variability of the PPG [44]. The software was written in Python, and the function scipy.signal.find_peaks() was used to detect peaks. The difficulty was to make the software usable with signals of any shape. To do so, signals were normalized, but the search for the S-peak (S_p) of the PPG led to the identification of multiple erroneous peaks. To ensure that only the S-peaks were detected, the kernel density estimation (KDE) of the amplitude of the peaks was calculated and plotted (Figure 8). If, as in the figure, there are two peaks in the distribution, the program identifies the minimum of the two values and only keeps the peaks above that value of amplitude. Then, SBP was extracted as the maxima of ABP, whereas DBP was obtained as ABP minima, as shown in Figure 9c.
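The KDE-based selection of S-peaks can be sketched as follows. This is an interpretation rather than the authors' code: it assumes the amplitude threshold is the density minimum lying between the two modes of the peak-amplitude distribution, and the helper name `keep_systolic_peaks` is hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import gaussian_kde

def keep_systolic_peaks(ppg):
    """Return indices of PPG peaks lying above the valley of the amplitude KDE."""
    ppg = np.asarray(ppg, dtype=float)
    ppg = (ppg - ppg.min()) / (ppg.max() - ppg.min())  # normalize to [0, 1]
    peaks, _ = find_peaks(ppg)
    amps = ppg[peaks]
    if len(amps) < 3 or np.ptp(amps) == 0:
        return peaks  # too few or identical amplitudes: nothing to separate
    # Evaluate the density on a grid wider than [0, 1] so edge modes are found.
    grid = np.linspace(-0.5, 1.5, 1024)
    density = gaussian_kde(amps)(grid)
    modes, _ = find_peaks(density)
    if len(modes) < 2:
        return peaks  # unimodal distribution: keep every detected peak
    # Take the two highest modes and locate the density minimum between them.
    lo, hi = np.sort(modes[np.argsort(density[modes])[-2:]])
    valley = lo + np.argmin(density[lo:hi + 1])
    return peaks[amps > grid[valley]]
```

On a pulse train where each beat has a large systolic peak and a smaller secondary peak, only the systolic peaks survive the threshold.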

Extraction for SHIMMER Database
For signals in the SHIMMER database, instead, the ECG and PPG thresholds passed to scipy.signal.find_peaks() were changed for each subject to obtain a better detection of R-peaks and S-peaks. This was performed in particular for PPG signals, because of their variability depending on the subject (skin conductance variability) and on the sensitivity of the recording method (SHIMMER clip). Examples are shown in Figures 10 and 11.

Regression Process
Signals extracted from the MIMIC III and SHIMMER databases were used for BP retrieval. In both cases, the regression process was structured in two phases (training and testing [45]), with some differences. In particular, while the regression process with signals from the MIMIC III database uses a sliding window (WND) (Figure 13 and Equation (12)), the regression methods with signals from the SHIMMER database were simply based on Equation (11). Equation (12) is more suitable for the processing of continuous signals (e.g., ABP), so when using it on the signals recorded by SHIMMER devices, the error values are higher. We obtained better results using Equation (11).
In both cases, HR and PTT values were extracted in the same way. HR was calculated from the difference between the indices of each pair of consecutive R-peaks, which was then divided by the sampling frequency to obtain the interval expressed in seconds (Equation (13)).
Concerning the PTT values' extraction, they were evaluated by processing ECG and PPG signals and comparing the positions of their peaks. In particular, based on the assumption that each R-peak will be followed by an S-peak, only those values maintaining this pattern are kept as valid.
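The extraction of HR and PTT from peak indices can be sketched as below. The conversion HR = 60/∆t is the standard one, while the validity rule (exactly one S-peak between two consecutive R-peaks) is an assumed reading of the "R-peak followed by S-peak" pattern; the function names are hypothetical.

```python
import numpy as np

def heart_rate_bpm(r_peaks, fs):
    """Heart rate (beats per minute) from consecutive R-peak sample indices."""
    rr_s = np.diff(np.asarray(r_peaks)) / fs  # R-R intervals in seconds
    return 60.0 / rr_s

def pulse_transit_times(r_peaks, s_peaks, fs):
    """PTT (seconds) for each R-peak followed by exactly one S-peak."""
    r_peaks = np.asarray(r_peaks)
    s_peaks = np.asarray(s_peaks)
    ptt = []
    for i, r in enumerate(r_peaks):
        next_r = r_peaks[i + 1] if i + 1 < len(r_peaks) else np.inf
        # S-peaks falling after this R-peak and before the next one.
        following = s_peaks[(s_peaks > r) & (s_peaks < next_r)]
        if len(following) == 1:  # beats breaking the R -> S pattern are skipped
            ptt.append((following[0] - r) / fs)
    return np.array(ptt)
```

With `r_peaks = [0, 125, 250, 375]`, `s_peaks = [30, 155, 405]`, and `fs = 125`, the beat starting at sample 250 has no S-peak and is dropped.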
In order to remove outliers possibly caused by malfunctioning of the hardware, in the case of the signals from the MIMIC III database, the HR and PTT signals were divided into ten different windows, and for each of them the mean and the standard deviation (SD) were evaluated. All points located outside the range of mean ± SD were replaced with missing values and then interpolated with the pandas Series.interpolate(method='polynomial', order=5) method (order = 5 for HR, order = 1 for PTT). As shown in Figure 13, starting from the beginning of the signal, five adjacent windows were considered, and between every two adjacent blocks, a window in the middle was also taken into account.
On the other hand, in the case of the signals from the SHIMMER database, all points located outside the range of mean ± SD evaluated over the whole HR and PTT signals were replaced with missing values and then interpolated with the Python numpy.interp() function.
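The whole-signal variant of this outlier step can be sketched with NumPy alone. The sketch assumes linear interpolation through the retained points via numpy.interp(); the windowed, polynomial variant is not reproduced here, and the helper name `replace_outliers` is hypothetical.

```python
import numpy as np

def replace_outliers(values):
    """Replace points outside mean ± SD by linear interpolation of the rest."""
    x = np.asarray(values, dtype=float)
    mean, sd = x.mean(), x.std()
    keep = np.abs(x - mean) <= sd  # True for points inside mean ± SD
    idx = np.arange(len(x))
    # Interpolate every index from the retained points only.
    return np.interp(idx, idx[keep], x[keep])
```

Note that a one-SD band is narrow: for roughly Gaussian data it replaces about a third of the points, so the result is a strongly smoothed version of the original series.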
The complete scheme is reported in Figure 15, in which the whole approach for this work is described focusing on how the parameter extraction procedure was performed.

Results
Four different algorithms from the scikit-learn Python library were used for the estimation of both DBP and SBP values: linear regressor, ridge regressor, support vector regressor (SVR), and random forest regressor. Each algorithm was tuned to obtain the best fit to the data in the training phase.
For the 90 patients from the MIMIC III database, the first 75% of the sample (about 40 s of signal) for each batch was used for model construction, and the other 25% (about 10-15 s of signal) for testing the algorithm (Figure 16). For each signal from the SHIMMER database, the first 75% of the samples (about 15 min of signal) were used to train the models, and the last 25% (about 5 min) to test them ( Figure 17). The signals in Figure 17 may not seem smooth and, at some points, they seem not to follow the trend of the real signal very well. This happens because the OMRON HeartGuide does not return continuous signals (as in Figure 16) but punctual BP values every minute. Thus, after interpolating these reference values, the real BP signals may seem "edgy".
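The chronological 75%/25% split and the four regressors can be sketched with scikit-learn. The hyperparameters and the synthetic data below are placeholders chosen for illustration; the tuned values used in the study are not reported here, and the helper name `chronological_split` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.svm import SVR

def chronological_split(X, y, train_fraction=0.75):
    """Split without shuffling: the first part trains, the rest tests."""
    cut = int(len(X) * train_fraction)
    return X[:cut], X[cut:], y[:cut], y[cut:]

# Hypothetical hyperparameters, not the tuned values of the study.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "svr": SVR(kernel="rbf", C=10.0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Synthetic PTT (s) and HR (bpm) features with a noisy linear BP target.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0.2, 0.35, 400), rng.uniform(55, 95, 400)])
y = -120.0 * X[:, 0] + 0.3 * X[:, 1] + 130.0 + rng.normal(0, 1.0, 400)

X_tr, X_te, y_tr, y_te = chronological_split(X, y)
mae = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    mae[name] = float(np.mean(np.abs(model.predict(X_te) - y_te)))
```

Keeping the split chronological matters for physiological time series, since shuffling would let the model see samples adjacent to the ones it is tested on.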

Testing
Prediction ability was evaluated by exploiting the mean absolute error (MAE) value [46][47][48]. In the current work, all algorithms tested on the 90 patients from the MIMIC III database and on the 50 measurements from the SHIMMER database exhibited an average error value below 5 mmHg for both DBP and SBP values [49] (Table 1).
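The error figures used in the evaluation can be computed as in the short sketch below. It assumes the SD refers to the standard deviation of the signed prediction errors and uses the 5 mmHg and 8 mmHg limits quoted in this paper as default thresholds; the helper name `error_summary` is hypothetical.

```python
import numpy as np

def error_summary(y_true, y_pred, mae_limit=5.0, sd_limit=8.0):
    """Return (MAE, SD of the errors, whether both limits are met), in mmHg."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    mae = float(np.mean(np.abs(err)))  # mean absolute error
    sd = float(np.std(err))            # spread of the signed errors
    return mae, sd, (mae < mae_limit) and (sd < sd_limit)
```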

Selection for MIMIC III Database
Error assessment does not highlight meaningful differences among algorithms, and thus other parameters need to be considered for algorithm selection. At first glance, it can be seen that linear, random forest, and ridge regressors are able to follow the trend of the signal in the testing phase and to recognize its variability, while SVR tends to maintain an average trend (Figure 16). Besides, SVR and random forest regressors demonstrated greater stability when the input feature values were quite dissimilar to the previous ones.
The ability to generalize was assessed by time-series-split cross-validation. Five iteration steps were constructed. The training fold consists of a number of samples equal to 15% of the batch in the first iteration. At each iteration, it increases by 15% until it reaches 75% of the samples (maintaining the chronological order). The size of the test set, instead, remains fixed at 15% of the batch (about 9 s): the test set samples follow those of the training set in each iteration. Table 2 reports the obtained results. According to stability observation, error evaluation, and cross-validation results, the best model for this application is the random forest regressor (see Table 3).

Selection for SHIMMER Database
Table 1 shows that linear and ridge regressors are equally valid methods for signals from the SHIMMER database, because they return the same MAE and SD values for each prediction. If a choice must be made between the two regression methods, the linear regressor (Figure 18) could be chosen because it is characterized by a shorter computational time than the ridge one. In fact, the latter could be considered a regularization of the linear regressor, as it introduces the regularization hyperparameter which keeps the learning weights of the function as low as possible; thus, the decision time of the algorithm increases [50].
The random forest regressor presented the highest SD values, while SVR showed the lowest SD values, but the predicted trend did not follow the real one very well ( Figure 19). For these reasons, both regressors were discarded, and the linear regressor was used as the best model for signals from the SHIMMER database, according to the guidelines referring to a reduced number of measurements. It is important to highlight that the instrumental error added by the OMRON HeartGuide device must be taken into account. The OMRON HeartGuide device has an accuracy equal to ±3 mmHg. According to the error propagation theory, to comply with the AAMI guidelines, it will not be sufficient to fall below 5 mmHg, but the 3 mmHg inserted by the smartwatch must also be considered. The limit will therefore be equal to 2 mmHg of maximum acceptable error (see Table 4).
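The time-series-split cross-validation used for the MIMIC III signals (training fold growing by 15% per iteration, test fold fixed at the following 15%) can be sketched with a small generator. The function name `expanding_window_splits` is hypothetical; scikit-learn's TimeSeriesSplit offers a similar scheme but places its folds differently, so a hand-written splitter is used here to match the description.

```python
def expanding_window_splits(n_samples, step_fraction=0.15, n_iterations=5):
    """Yield (train, test) index ranges in chronological order."""
    step = int(round(n_samples * step_fraction))
    for k in range(1, n_iterations + 1):
        train = range(0, k * step)              # grows by one step each time
        test = range(k * step, (k + 1) * step)  # the step right after training
        yield train, test

# Example with a 1000-sample batch.
splits = list(expanding_window_splits(1000))
train_sizes = [len(train) for train, _ in splits]
test_sizes = [len(test) for _, test in splits]
```

In the last iteration the training fold covers the first 75% of the batch and the test fold the next 15%, so the final 10% of the samples is never used.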

Conclusions
In conclusion, the proposed approach can be considered the springboard for the integration of these types of algorithms on wearable devices. The novelty of our approach is represented by:

• The innovative way for the removal of outliers (possibly caused by malfunctioning of the hardware and/or the malpositioning of the SHIMMER sensors) using SD for signals from both databases.
• The efficient and tailored method for the identification of peaks by KDE for signals from the MIMIC III database.
• The use of a sliding window with a time period equal to the period of about two cardiac cycles (T = 1.5 s) for signals from the MIMIC III database.
• The use of ECG and PPG signals acquired from healthy subjects with wearable (SHIMMER) devices.
The combination of the described approaches led to the ability to estimate BP with a better precision with respect to state-of-the-art algorithms; in fact, the results fulfilled the AAMI guidelines with a MAE < 5 mmHg (signals from the MIMIC III database) or MAE < 2 mmHg (signals from the SHIMMER database), and SD < 8 mmHg [46], even when the variability of parameters was high ( Table 1).
The next step, as the authors described in the introduction, will involve the test of this algorithm on the signals recorded through modules developed with the European SINTEC project [51]. With the development of wearable devices that measure BP non-invasively, it is hoped that the number of people who continuously measure their blood pressure will increase. In this way, it could be possible to prevent the onset or degeneration of CVDs which are still the leading cause of mortality in the world [32]. These diseases are often triggered by hypertension, a pathology which, despite the widely known risks, afflicts an increasing number of people. It is hoped that by providing a convenient and reliable means of monitoring, the number of people with hypertension will decrease.
The results obtained in this work show that the proposed algorithm for the cuff-less estimation of BP can potentially enable mobile devices to constantly monitor BP under different conditions.

Institutional Review Board Statement:
All subjects provided their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, by which all the experiments of the SINTEC project, whose identification code is 824984, are regulated.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study. Written informed consent has been obtained from the patients to publish this paper.

Data Availability Statement:
The authors provide the codes and material used at the following links: MIMIC III database: https://github.com/DanieleRussoGH/Sintec_project (accessed on 27 February 2022). SHIMMER database: https://github.com/SOFIAGALICI/BP_Evaluation_Shimmer (accessed on 27 February 2022). received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No. 824984. We acknowledge all the participants in the experimentation for their availability and collaboration.

Conflicts of Interest:
The authors declare no conflict of interest.
