Estimating Blood Pressure from the Photoplethysmogram Signal and Demographic Features Using Machine Learning Techniques

Hypertension is a potentially dangerous health condition that can be identified directly from the blood pressure (BP). Hypertension often leads to other health complications. Continuous monitoring of BP is very important; however, cuff-based BP measurements are discrete and uncomfortable for the user. To address this need, a cuff-less, continuous, and noninvasive BP measurement system is proposed that applies machine learning (ML) algorithms to the photoplethysmogram (PPG) signal and demographic features. PPG signals were acquired from 219 subjects and underwent preprocessing and feature extraction steps. Time, frequency, and time-frequency domain features were extracted from the PPG signals and their derivatives. Feature selection techniques were used to reduce the computational complexity and to decrease the chance of over-fitting the ML algorithms. The features were then used to train and evaluate ML algorithms. The best regression models were selected for systolic BP (SBP) and diastolic BP (DBP) estimation individually. Gaussian process regression (GPR) combined with the ReliefF feature selection algorithm outperformed the other algorithms in estimating SBP and DBP, with root mean square errors (RMSE) of 6.74 and 3.59, respectively. This ML model can be implemented in hardware systems to continuously monitor BP and avoid critical health conditions due to sudden changes.


Introduction
Measuring blood pressure (BP) is an important aspect in monitoring the health of a person. High blood pressure, generally, means that a person has a higher risk of health problems [1]. High blood pressure puts a huge amount of strain on the arteries and the heart. This strain can make the arteries less flexible over time. As they become more inflexible, the lumen becomes narrower. Therefore, the probability of it being clogged up (clot) increases. A clot is very dangerous and may cause heart attack, stroke, kidney diseases and dementia. As a result, it is important for a person to monitor their blood pressure regularly. In most cases, measuring blood pressure once or twice a day is more than enough.
Sensors 2020, 19, x; doi: FOR PEER REVIEW www.mdpi.com/journal/sensors
However, sometimes the doctor needs to track the blood pressure continuously. This is because blood pressure is known to decrease at night. So, it is useful to measure the blood pressure overnight, as an abnormal dip in blood pressure may suggest a higher risk of cardiovascular problems [2].
The current standard methods include either a cuff-based BP measurement or an invasive procedure. The cuff method measures the blood pressure at set intervals (e.g., every 15 minutes). This means that the measurements are discrete and the procedure is uncomfortable for the user. Furthermore, this process requires the arm to be kept steady, while the inflation and deflation disturb the patient's sleep. An arterial line is an invasive procedure that allows continuous blood pressure monitoring. However, the invasive procedure leaves the patient vulnerable to infection. Hence, there is a need for a non-invasive, cuff-less, continuous BP monitoring system. The advent of digital sensors, signal processing, machine-learning algorithms and advanced physiological models helps to gather important human vital signs using wearable sensors [3,4]. Even the indirect estimation of BP using photoplethysmography (PPG) has become more realistic [5-8].
Photoplethysmography (PPG) has been used for decades to measure the amount of light absorbed or reflected by blood vessels in living tissue. PPG is a versatile and low-cost technology [9], which can be extended to different aspects of cardiovascular surveillance, including identification of blood oxygen saturation, heart rate, BP estimation, cardiac output, respiration, arterial ageing, endothelial control, micro-vascular blood flow, and autonomic function [10]. Many different kinds of PPG signals have been identified and shown to be associated with age and cardiovascular pathology [11,12]. In clinical practice, PPG signals are recorded from micro-vascular beds at exterior body locations, such as the finger, earlobe, forehead, and toe [13]. The coverage area of the PPG sensor includes veins, arteries and numerous capillaries. PPG waveforms generally have three distinct features. As shown in Figure 1, a PPG waveform typically contains a systolic peak, a diastolic peak and a notch in between. The raw PPG signal typically includes pulsatile and non-pulsatile blood volumes [14]. The pulsatile portion of the PPG signal is attributed to the variation in blood pressure within the arteries and is synchronous with the pulse, while the non-pulsatile part is a result of normal blood volume, respiration, the sympathetic nervous system, and thermoregulation [15]. Green, red and infrared light are often used to extract PPG waveforms. Red and infrared light can reach approximately 2.5 mm, whereas green light can penetrate less than 1 mm into the tissue [16]. Therefore, infrared light is typically used for acquiring the PPG signal for the measurement of blood pressure. Although the PPG tool is a low-cost and portable optical electronic device, its measurement has several challenges, such as noise reduction [17-19] and multi-photodetector creation [20].
Several techniques to estimate BP from PPG have been proposed in recent works. Some algorithms [21] incorporate waveform analysis and biometrics of PPG to estimate BP, and have been tested in subjects of different age, height and weight. When calibrated, PPG shows great potential to track BP fluctuations, which can bring enormous health and economic benefits. A simple, bio-inspired mathematical model was proposed in [22] to estimate systolic BP (SBP) and diastolic BP (DBP) through careful mathematical analysis of the PPG signals. Systolic and diastolic blood pressure levels were predicted using pulse transit time (PTT) in [23,24] and a combination of pulse arrival time (PAT) and heart rate in [25], where the combination showed an improvement over PTT alone. A beat-to-beat optical BP measurement method was developed, tested and reported using only PPG from fingertips [26]. Key features such as the amplitudes and phases of the cardiac components were extracted through a fast Fourier transform (FFT) and used to train an artificial neural network (ANN), which was then used to estimate BP from PPG. In [27], the support vector machine (SVM) algorithm showed better accuracy than the linear regression method and the ANN.
The recent growth in the field of deep learning has made it a promising approach for this application. Su et al. [28] discussed the problem of accuracy reduction in current models for BP estimation from PPG due to the requirement of frequent calibration. A deep recurrent neural network (RNN) with long short-term memory (LSTM) was used to create a model for the time-series BP data. PPG and electrocardiogram (ECG) signals were taken as inputs, and PTT with some other features were used as predictors to estimate BP. This method showed improvements in BP prediction compared to other existing methods. In 2018, Gotlibovych et al. [29] investigated the potential of using raw PPG data to detect arrhythmia with reasonable success, which shows the possibility of using the raw PPG signal as input to deep learners. In [30], the authors created a novel spectro-temporal deep neural network that took the PPG signal and its first and second derivatives as inputs. The neural network model had residual connections and achieved a mean absolute error (MAE) of 6.88 and 9.43 for DBP and SBP, respectively.
Several research groups have analyzed and evaluated the quality of the open-source dataset used in this study [18,30-32]. A novel approach [33] for treating hypertension, based on the theory of arterial wave propagation and the morphological theory of PPG, was proposed to check the physiological changes at different levels of blood pressure. ECG and PPG signals were obtained simultaneously to detect hypertension. A model of the PPG characteristics was analyzed and an inherent relationship between systolic BP and the PPG characteristics was established [34]. In [35], PPG signal analysis based on pulse decomposition analysis was used to characterize obesity, age group and hypertension.
The features typically used for non-invasively estimating BP are: (i) t-domain, (ii) f-domain, (iii) (t, f)-domain, and (iv) statistical features. Several t-domain features, calculated from the original signal and its derivatives, were used by different groups [9,36-38]. In a different study, Zaid et al. [39] showed the use of frequency-domain features for identifying neurological disorders, and in this study the authors have taken inspiration from Zaid et al. to create features for estimating BP accurately from the PPG signal.
Several studies reported different features of the PPG signal for different applications [9,34,38,40]. Various groups have used these features for SBP and DBP measurement; however, there is still plenty of scope for improvement. Numerous automated ML techniques were evaluated and recorded for various PPG databases, as mentioned earlier. Nonetheless, to the best of our knowledge, no recent work has combined t-, f- and (t, f)-domain features to estimate BP with high accuracy using a machine-learning approach. PPG signal processing is comparatively simple, so more attention is being paid to novel methods that extract features from PPG signals. To reduce the error in BP estimation based on the PPG signal, this analysis not only extracts features from the PPG signal but also utilizes the demographic characteristics of the subjects, such as height, weight and age. Several features extracted from the PPG signal for BP estimation in this study have not been used before by any other group. The manuscript is divided into four sections. Section 1 discusses the basics of the PPG signal, related works and the inspiration for this research. The methodology and database are presented in Section 2, along with the pre-processing steps and system assessment. Section 3 summarizes and discusses the results, while Section 4 concludes the work.

Materials and Methods
This section discusses the dataset used in the study, the signal pre-processing techniques, the features extracted, the feature selection techniques, and the machine learning models trained and tested to estimate SBP and DBP.

Dataset Description
The dataset used in this study was taken from Liang et al. [31] and is publicly available. The dataset contained 657 PPG signal samples from 219 subjects [18]. The PPG signals were sampled at a rate of 1000 Hz and contained 2100 data points per signal, giving a signal duration of 2.1 s. In addition to the PPG signal, the patients' demographic information, such as age, gender, height, and weight, was recorded along with systolic pressure, diastolic pressure, and heart rate. A summary of the dataset is shown in Table 1. Of the 657 signals, many were of poor quality and could not be used for feature extraction. Liang et al. [18] used a skewness-based Signal Quality Index (SQI) to find the suitable signals. After the quality assurance process, 222 signals from 126 subjects were kept for this study. Figure 3 shows sample PPG signals classified as fit and unfit for the study. The unfit waveforms either do not have prominent features, or the diastolic part of the waveform is not visible in the recorded signal and the data length is very short. Hence, they were not used for the study.
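The skewness statistic behind such a quality index can be sketched as follows. This is a minimal illustration in Python, not the procedure of Liang et al.: the function name is hypothetical, and no acceptance threshold is given in the text, so the sketch only computes the statistic.

```python
import math

def skewness(samples):
    """Population skewness: mean of the cubed standardized deviations."""
    n = len(samples)
    mu = sum(samples) / n
    sigma = math.sqrt(sum((s - mu) ** 2 for s in samples) / n)
    if sigma == 0:
        return 0.0  # a flat segment has no asymmetry to measure
    return sum(((s - mu) / sigma) ** 3 for s in samples) / n

# A symmetric segment has skewness near zero; a right-tailed one is positive.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [1.0, 1.0, 1.0, 1.0, 10.0]
print(skewness(symmetric), skewness(right_tailed))
```

In a screening step, each 2.1 s recording would be scored this way and compared with a threshold chosen by the analyst.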

Preprocessing Signals
The raw PPG signals passed through several pre-processing stages before feature extraction, which are summarized below. It was also observed that, after normalization, the other pre-processing techniques were easier to implement. Figure 4 shows a sample PPG signal before and after normalization.

Signal Filtration
It was observed that the signals from the database [31] have high-frequency noise components. Thus, the signals were passed through a low-pass filter to remove these components. Several filtration techniques were tested to de-noise the signal: a moving average, a low-pass finite impulse response (FIR) filter, and a Butterworth infinite impulse response (IIR) zero-phase filter. Figure 5 shows the raw signal overlaid with the filtered output of the different filter types. From Figure 5, we can see that the Butterworth filter produced the best filtration. Hence, we used it to filter the PPG waveforms; it has also been used by others to remove noise from PPG signals [9,12,37,41]. In this work, a sixth-order IIR filter with a cut-off frequency of 25 Hz was designed in MATLAB.
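A zero-phase Butterworth low-pass stage of this kind can be sketched in Python with SciPy. This is an illustrative equivalent under stated assumptions, not the authors' MATLAB design: the function name is hypothetical, the order, cut-off and sampling rate are taken from the text, and forward-backward filtering is assumed as the way to obtain zero phase.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lowpass_zero_phase(ppg, fs=1000.0, cutoff_hz=25.0, order=6):
    """Low-pass a PPG segment without phase distortion.

    The filter is designed as second-order sections for numerical stability
    and applied forward and backward, which cancels the phase shift.
    """
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, np.asarray(ppg, dtype=float))

# Synthetic check: a 1.2 Hz "pulse" contaminated by 200 Hz noise.
t = np.arange(2100) / 1000.0
clean = np.sin(2 * np.pi * 1.2 * t)
noisy = clean + 0.2 * np.sin(2 * np.pi * 200.0 * t)
filtered = lowpass_zero_phase(noisy)
```

Note that forward-backward filtering applies the magnitude response twice, so the effective attenuation is steeper than that of a single sixth-order pass.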

Baseline Correction
The PPG waveform is commonly contaminated with baseline wandering due to respiration at frequencies ranging from 0.15 to 0.5 Hz [11,21,42,43]. It is therefore very important that the signal is properly filtered to remove baseline wandering while preserving the important information as far as possible. We used a polynomial fit to find the trend in the signal. We then subtracted the trend to obtain the baseline-corrected signal, as shown in Figure 6.
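The polynomial de-trending step can be sketched with NumPy as follows. This is a minimal illustration rather than the authors' code: the function name is hypothetical, the default degree of four follows the caption of Figure 6, and the rescaling of the time axis to [-1, 1] is an added assumption that keeps the fit well conditioned.

```python
import numpy as np

def remove_baseline(ppg, degree=4):
    """Fit a polynomial trend to the segment and subtract it.

    Returns the corrected signal and the fitted trend.
    """
    ppg = np.asarray(ppg, dtype=float)
    x = np.linspace(-1.0, 1.0, ppg.size)   # rescaled sample axis
    coeffs = np.polyfit(x, ppg, degree)    # least-squares polynomial fit
    trend = np.polyval(coeffs, x)
    return ppg - trend, trend

# A segment that is pure low-order drift is removed almost entirely.
axis = np.linspace(-1.0, 1.0, 2100)
drift = 0.3 + 0.5 * axis - 0.2 * axis ** 3
corrected, trend = remove_baseline(drift)
```

Because the segment is only 2.1 s long, a low-degree polynomial follows the slow wander without tracking individual pulses closely; a higher degree would start to remove pulse shape as well.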

Feature Extraction
The block diagram summarizing the feature extraction steps adopted in the study is shown in Figure 7. A PPG waveform contains a wealth of information, such as the systole, diastole, notch, pulse width, and peak-to-peak interval. Some of the distinctive features of the PPG waveform might not be dominant in some patients; for example, the prevalence of the notch changes with age [44]. To find the key points of the PPG signal shown in Figure 11(a), the authors followed the methods described in previous work [45]. The technique was largely based on the derivatives and thresholds defined in [46] and [47]. The dicrotic notch is an essential feature of the PPG signal. Figure 8 describes the algorithm used to detect the dicrotic notch. A line was drawn from the systolic peak to the diastolic peak. The minimum of the signal after subtracting this straight line is taken as the dicrotic notch. To make the detection more robust, a fix index was then used, which finds the local minimum within a given window (in this case 50 ms) around a given point. Reliable detection of the dicrotic notch in various situations is shown in Figure 9.
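The line-subtraction step and the local-minimum refinement can be sketched in Python as below. This is a hedged illustration, not the authors' implementation: the function name is hypothetical, the peak indices are assumed to be known already, and the 50 ms window is interpreted as a total width of 50 samples at 1000 Hz, centred on the initial estimate.

```python
import math

def detect_dicrotic_notch(sig, sys_idx, dia_idx, window=50):
    """Locate the notch between the systolic and diastolic peaks."""
    if dia_idx <= sys_idx:
        raise ValueError("diastolic peak must come after systolic peak")
    slope = (sig[dia_idx] - sig[sys_idx]) / (dia_idx - sys_idx)
    # Signal minus the straight line joining the two peaks.
    diff = [sig[i] - (sig[sys_idx] + slope * (i - sys_idx))
            for i in range(sys_idx, dia_idx + 1)]
    initial = sys_idx + min(range(len(diff)), key=diff.__getitem__)
    # Refinement: lowest signal sample in a window around the estimate.
    half = window // 2
    lo = max(sys_idx, initial - half)
    hi = min(dia_idx, initial + half)
    return min(range(lo, hi + 1), key=sig.__getitem__)

# Synthetic two-peak pulse sampled at 1000 Hz.
pulse = [math.exp(-((i / 1000 - 0.2) / 0.07) ** 2)
         + 0.5 * math.exp(-((i / 1000 - 0.5) / 0.1) ** 2)
         for i in range(800)]
sys_i = max(range(0, 300), key=pulse.__getitem__)
dia_i = max(range(400, 800), key=pulse.__getitem__)
notch_i = detect_dicrotic_notch(pulse, sys_i, dia_i)
```

The first estimate is biased toward the steep systolic down-slope because of the tilt of the subtracted line; the windowed search then moves it to the nearby valley of the signal itself.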
Another key feature is the foot of the PPG signal. To find the foot of the PPG waveform, the second derivative of the PPG waveform, also called the acceleration plethysmogram (APG), was first calculated. From the APG, a zone of interest was defined where the moving average of the APG is larger than an adaptive threshold. In the zone of interest, the highest point of the APG corresponds to the foot of the signal. This method is robust and detects the foot of the signal very accurately. Figure 10 shows that the algorithm can detect both a prominent foot and a flat foot accurately [45]. The PPG signal is analyzed to extract the a1 and b1 points from its first derivative and the a2 and b2 points from its second derivative. Figure 12 shows the frequency-domain representation of the PPG signal. The frequency-domain representation was analyzed and features related to the first three peaks were extracted. The length of the fast Fourier transform was 2100, which was equal to the number of data points in the signal. Furthermore, demographic data such as height, weight, BMI, gender, age and heart rate were also used as features. Several groups have reported that demographic features are important for BP estimation [48]. Elgendi [9] emphasized the need for height details for accurate estimation of the PPG waveform, while Kavasaoglu et al. [36] found that demographic features were useful and highly ranked in their machine learning algorithm based on the characteristic features of the PPG signal. In a real-time scenario, age and BMI will be known to the user and the heart rate can be easily calculated from the PPG signal. Definitions of the extracted time-domain and demographic features are listed in Tables 2, 3, 4 and 5.
Frequency-domain and statistical features can contribute significantly to BP estimation and are defined in Tables 6 and 7, respectively. In total, 107 features, encompassing seventy-five t-domain, sixteen f-domain, and ten statistical features, were derived for each PPG signal along with six demographic features. The t-domain, f-domain and statistical features were identified from previous works [3, 4, 9, 23, 25-27, 38, 39]. Features 1-24 and 42-58 are reported in the literature as having been used in PPG-related works [49]. These features are referred to as Literature Features in Section 3.

Features 1-24 (Table 2):
1. Systolic Peak: The amplitude of the systolic peak ('x') from the PPG waveform
2. Diastolic Peak: The amplitude of the diastolic peak ('y') from the PPG waveform
3. Height of Notch: The amplitude of the notch ('z') from the PPG waveform
4. Systolic Peak Time: The time interval from the foot of the waveform to the systolic peak ('t1')
5. Notch Time: The time interval from the foot of the waveform to the notch ('t2')
6. Diastolic Peak Time: The time interval from the foot of the waveform to the diastolic peak ('t3')
7. ∆T: The time interval from systolic peak time to diastolic peak time
8. Pulse Interval: The distance between the beginning and the end of the PPG waveform ('tpi')
9. Peak to Peak Interval: The distance between two consecutive systolic peaks ('tpp')
10. Pulse Width: The width of the waveform at half the height of the systolic peak
11. Inflection Point Area: The waveform is first split into two parts at the notch point. The area of the first part is A1 and the area of the second part is A2. The ratio of A1 to A2 is the inflection point area ('A1/A2')
12. Augmentation Index: The ratio of diastolic to systolic peak amplitude ('y/x')
13. Alternative Augmentation Index: The difference between systolic and diastolic peak amplitude divided by systolic peak amplitude ('(x-y)/x')
14. Systolic Peak Output Curve: The ratio of systolic peak time to systolic peak amplitude ('t1/x')
15. Diastolic Peak Downward Curve: The ratio of diastolic peak amplitude to the difference between pulse interval and diastolic peak time ('y/(tpi-t3)')
16. t1/tpp: The ratio of systolic peak time to the peak-to-peak interval of the PPG waveform
17. t2/tpp: The ratio of notch time to the peak-to-peak interval of the PPG waveform
18. t3/tpp: The ratio of diastolic peak time to the peak-to-peak interval of the PPG waveform
19. ∆T/tpp: The ratio of ∆T to the peak-to-peak interval of the PPG waveform
20. z/x: The ratio of the height of notch to the systolic peak amplitude
21. t2/z: The ratio of the notch time to the height of notch
22. t3/y: The ratio of the diastolic peak time to the diastolic peak amplitude
23. x/(tpi-t1): The ratio of systolic peak amplitude to the difference between pulse interval and systolic peak time
24. z/(tpi-t2): The ratio of the height of notch to the difference between pulse interval and notch time

Features 25-41 (Table 3):
25. Width(25%): The width of the waveform at 25% amplitude of systolic amplitude
26. Width(75%): The width of the waveform at 75% amplitude of systolic amplitude
27. Width(25%)/t1: The ratio of pulse width at 25% of systolic amplitude to systolic peak time
28. Width(25%)/t2: The ratio of pulse width at 25% of systolic amplitude to notch time
29. Width(25%)/t3: The ratio of pulse width at 25% of systolic amplitude to diastolic peak time
30. Width(25%)/∆T: The ratio of pulse width at 25% of systolic amplitude to ∆T
31. Width(25%)/tpi: The ratio of pulse width at 25% of systolic amplitude to pulse interval
32. Width(50%)/t1: The ratio of pulse width at 50% of systolic amplitude to systolic peak time
33. Width(50%)/t2: The ratio of pulse width at 50% of systolic amplitude to notch time
34. Width(50%)/t3: The ratio of pulse width at 50% of systolic amplitude to diastolic peak time
35. Width(50%)/∆T: The ratio of pulse width at 50% of systolic amplitude to ∆T
36. Width(50%)/tpi: The ratio of pulse width at 50% of systolic amplitude to pulse interval
37. Width(75%)/t1: The ratio of pulse width at 75% of systolic amplitude to systolic peak time
38. Width(75%)/t2: The ratio of pulse width at 75% of systolic amplitude to notch time
39. Width(75%)/t3: The ratio of pulse width at 75% of systolic amplitude to diastolic peak time
40. Width(75%)/∆T: The ratio of pulse width at 75% of systolic amplitude to ∆T
41. Width(75%)/tpi: The ratio of pulse width at 75% of systolic amplitude to pulse interval

Features 42-57 (Table 4):
42. a1: The first maximum peak from the 1st derivative of the PPG waveform
43. ta1: The time interval from the foot of the PPG waveform to the time at which a1 occurred
44. a2: The first maximum peak from the 2nd derivative of the PPG waveform after a1
45. ta2: The time interval from the foot of the PPG waveform to the time at which a2 occurred
46. b1: The first minimum peak from the 1st derivative of the PPG waveform after a1
47. tb1: The time interval from the foot of the PPG waveform to the time at which b1 occurred
48. b2: The first minimum peak from the 2nd derivative of the PPG waveform after a2
49. tb2: The time interval from the foot of the PPG waveform to the time at which b2 occurred
50. b2/a2: The ratio of b2 to a2
51. b1/a1: The ratio of b1 to a1
52. ta1/tpp: The ratio of ta1 to the peak-to-peak interval of the PPG waveform
53. tb1/tpp: The ratio of tb1 to the peak-to-peak interval of the PPG waveform
54. tb2/tpp: The ratio of tb2 to the peak-to-peak interval of the PPG waveform
55. ta2/tpp: The ratio of ta2 to the peak-to-peak interval of the PPG waveform
56. (ta1-ta2)/tpp: The ratio of the difference between ta1 and ta2 to the peak-to-peak interval of the PPG waveform
57. (tb1-tb2)/tpp: The ratio of the difference between tb1 and tb2 to the peak-to-peak interval of the PPG waveform

Frequency-domain features (Table 6):
The ratio of the area under the curve from 0 Hz to 2 Hz to the area under the curve from 2 Hz to 5 Hz
85. Peak-1/peak-2: The ratio of the first peak to the second peak from the Fast Fourier Transform of the PPG signal
86. Peak-1/peak-3: The ratio of the first peak to the third peak from the Fast Fourier Transform of the PPG signal
87. Freq-1/freq-2: The ratio of the frequency at the first peak to the frequency at the second peak from the Fast Fourier Transform of the PPG signal
88. Freq-1/freq-3: The ratio of the frequency at the first peak to the frequency at the third peak from the Fast Fourier Transform of the PPG signal
Maximum Frequency: The value of the highest frequency in the signal spectrum.
Magnitude at Fmax: Signal magnitude at the highest frequency.
Ratio of signal energy: Ratio of signal energy between  ) and the whole spectrum.
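Several of the ratio features defined above follow directly from the key-point amplitudes and times. The sketch below computes a handful of them in Python. The function name and dictionary keys are hypothetical, the symbols x, y, z, t1, t2, t3, tpi and tpp follow the definitions in the feature list, and ∆T is assumed to be t3 - t1.

```python
def ratio_features(x, y, z, t1, t2, t3, tpi, tpp):
    """Compute a subset of the time-domain ratio features.

    x, y, z: systolic peak, diastolic peak and notch amplitudes.
    t1, t2, t3: systolic peak, notch and diastolic peak times from the foot.
    tpi, tpp: pulse interval and peak-to-peak interval.
    """
    delta_t = t3 - t1  # systolic peak time to diastolic peak time
    return {
        "delta_t": delta_t,
        "augmentation_index": y / x,
        "alt_augmentation_index": (x - y) / x,
        "systolic_peak_output_curve": t1 / x,
        "diastolic_peak_downward_curve": y / (tpi - t3),
        "t1_over_tpp": t1 / tpp,
        "delta_t_over_tpp": delta_t / tpp,
        "z_over_x": z / x,
        "x_over_tpi_minus_t1": x / (tpi - t1),
    }

# Illustrative key-point values (amplitudes normalized, times in seconds).
features = ratio_features(x=1.0, y=0.5, z=0.25,
                          t1=0.2, t2=0.35, t3=0.5, tpi=0.8, tpp=0.8)
```

Keeping the key points in one place and deriving the ratios from them makes it easy to check that each ratio uses the intended numerator and denominator.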

Feature Selection
Feature selection or reduction is important to reduce the risk of over-fitting the algorithms. In this work, three feature selection methods were used: correlation-based feature selection (CFS), ReliefF feature selection [50], and the features for classification using minimum redundancy maximum relevance (fscmrmr) algorithm. ReliefF is a feature selection algorithm that randomly selects instances and adjusts the weights of the respective features depending on the nearest neighbors [51].
CFS evaluates a feature as useful when it is highly correlated with the class but not highly correlated with any of the other features [52,53]. On the other hand, the fscmrmr algorithm finds an optimal set of features that are mutually and maximally dissimilar and can effectively represent the response variable. The algorithm minimizes the redundancy of a feature set and maximizes its relevance to the response variable [54]. MATLAB built-in functions were used for the CFS, ReliefF and fscmrmr feature selection algorithms [55].
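The idea behind correlation-based selection can be sketched with the standard CFS merit heuristic, merit = k·r_cf / sqrt(k + k(k-1)·r_ff), where r_cf is the mean absolute feature-target correlation and r_ff the mean absolute feature-feature correlation of a subset of k features. The Python sketch below uses a greedy forward search. It is an assumption-laden illustration, not the MATLAB routine used in the study, and the function names and the tolerance are hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0 if constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = math.sqrt(sum((u - ma) ** 2 for u in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0

def cfs_forward_select(features, target, tol=1e-9):
    """Greedy forward search maximizing the CFS merit.

    features: list of feature columns; returns the selected column indices.
    """
    r_cf = [abs(pearson(f, target)) for f in features]

    def merit(subset):
        k = len(subset)
        mean_cf = sum(r_cf[j] for j in subset) / k
        pairs = list(combinations(subset, 2))
        mean_ff = (sum(abs(pearson(features[i], features[j]))
                       for i, j in pairs) / len(pairs)) if pairs else 0.0
        return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)

    selected, best = [], 0.0
    while True:
        best_j, round_best = None, best
        for j in range(len(features)):
            if j in selected:
                continue
            m = merit(selected + [j])
            if m > round_best + tol:  # accept only a real improvement
                best_j, round_best = j, m
        if best_j is None:
            return selected
        selected.append(best_j)
        best = round_best

# Column 0 tracks the target, column 1 duplicates it, column 2 is unrelated.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
f0 = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.2, 7.8]
f1 = [2 * v + 1 for v in f0]
f2 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
chosen = cfs_forward_select([f0, f1, f2], target)
```

In this toy case the redundant copy and the unrelated column add nothing to the merit, so only the first column is kept.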
In Table 9, the features selected by the feature reduction algorithms are listed. The features listed are those that produced the best results.
Gaussian Process Regression (GPR): Unlike many common supervised machine-learning algorithms that learn exact values for each parameter of a function, the Bayesian approach infers a probability distribution over all possible values.
Ensemble Trees: An ensemble tree is a predictive model consisting of a weighted combination of multiple regression trees [57]. The core idea behind the ensemble model is to pull together a set of weak learners to create a strong learner.

Hyper-parameters Optimization of the Best Performing Algorithm
The machine learning algorithms used were initially trained with default parameters. The performance of these algorithms can, however, be improved by optimizing their hyper-parameters. Hyper-parameter optimization was carried out on the algorithms using the MATLAB 2019b Regression Learner App [58].

Evaluation Criteria
To evaluate the performance of the ML algorithms for estimating BP, four criteria were used. Here, Xp is the predicted data, X is the ground truth data, and n is the number of samples.
Mean Absolute Error (MAE): The absolute error is the magnitude of the prediction error. The mean absolute error is the mean of all absolute errors.
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| X_{p,i} - X_i \right|$$
Mean Squared Error (MSE): MSE is the mean of the squared errors. MSE is a risk function, which corresponds to the expected value of the squared error loss. MSE contains both the estimator's variance and its bias.
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left( X_{p,i} - X_i \right)^2$$
Root Mean Squared Error (RMSE): RMSE is the standard deviation of the residuals (prediction errors). Residuals are a measure of how far the data points are from the regression line; RMSE is a measure of how spread out these residuals are.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( X_{p,i} - X_i \right)^2}$$
Correlation Coefficient (R): R is a statistical measure of how closely two variables (predictors and predictions) are related. It also tells us how close the predictions are to the trendline.
When using the Regression Learner App in MATLAB, the above criteria are automatically calculated by MATLAB and these values were used to evaluate the performance of the algorithms.Among these criteria, RMSE was chosen as the main criterion.
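For readers working outside MATLAB, the error criteria can be sketched in a few lines of Python. The function names are hypothetical, and the correlation coefficient is computed here as the Pearson correlation between predictions and ground truth, which is an assumption: the text does not give the exact formula used by the app.

```python
import math

def mae(pred, truth):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)

def mse(pred, truth):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)

def rmse(pred, truth):
    """Root mean squared error."""
    return math.sqrt(mse(pred, truth))

def pearson_r(pred, truth):
    """Pearson correlation between predictions and ground truth."""
    n = len(truth)
    mp, mt = sum(pred) / n, sum(truth) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, truth))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in truth))
    return cov / (sp * st)

# Illustrative SBP predictions against cuff readings (mmHg).
predicted = [120.0, 130.0, 110.0]
measured = [118.0, 133.0, 110.0]
print(mae(predicted, measured), mse(predicted, measured), rmse(predicted, measured))
```

Because RMSE squares each error before averaging, a few large misses raise it much more than they raise MAE.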

Results and Discussion
This section summarizes the performance of the machine-learning algorithms used in the study. As stated earlier, 19 different machine-learning algorithms were trained and validated. It is observed from Table 9 that the features of Table 5 contribute significantly to the estimation, along with the demographic features. Out of the 19 algorithms, GPR and Ensemble Trees outperformed the others in all cases in the estimation of both systolic and diastolic blood pressure. In Table 10, it can be seen that the ReliefF feature selection algorithm produced the best result when combined with GPR. The features selected by the ReliefF and GPR combination performed best in estimating SBP, while CFS and GPR performed best for DBP. Moreover, R was 0.74 and 0.68 for SBP and DBP, respectively, which indicates a strong correlation between the predictions and the ground truth. However, these results could be further improved by tuning the hyper-parameters. Bayesian optimization was used, which is efficient and effective. It operates by constructing a probabilistic model of the objective function, called the surrogate function, which is then searched with the acquisition function before candidate samples are selected for evaluation on the real objective function. As shown in Figure 13, 30 iterations of the model were trained during optimization. In each iteration, the hyper-parameters are tuned. If the resulting MSE is lower than the lowest MSE recorded so far, it becomes the new lowest MSE. If there is no over-fitting, the lowest MSE should be reported at the end of the iterations.
Table 11 summarizes the performances of the algorithms after optimization.It is clear that the ReliefF feature selection algorithm with GPR outperforms the other algorithms.After optimization, the combination produced a remarkable improvement in R score for SBP and DBP estimation (0.95/0.96).
In general, due to different evaluation criteria, and different and inadequately defined datasets, it is difficult to compare similar works in this field. Some reported the lowest errors using small selected subsets of public or private data, while others worked on large-scale data (Kachuee et al. [24] and Slapničar et al. [30]) and reported greater errors. Looking at the individual related works in Table 12, the method proposed by Kachuee et al. [24] employs physiological parameters, machine learning and signal processing algorithms using the PTT approach and some time-domain PPG features, and showed promising results according to the British Hypertension Society (BHS) standard. Kim et al. [23] compared an artificial neural network (ANN) with multiple regression as a BP estimation method, but their study was limited to 20 subjects and did not estimate DBP. Cattivelli et al. [25] introduced an algorithm for estimating BP, but used a very small amount of data (34 recordings for 25 subjects). Zhang et al. [27] described an SVM and neural network approach using time-domain features directly for BP regression, and obtained good results compared to previous work. In [59], Zadi et al. showed that calculating systolic and diastolic BP from PPG measurements is a viable method for continuous and non-invasive measurement of BP, although they used a very small dataset (15 subjects only). Slapničar et al. [30] worked with a large dataset and achieved a reasonable estimation accuracy using a deep-learning spectro-temporal ResNet algorithm. Su et al. [28] used a conventional deep learning LSTM model, but used the PTT approach rather than PPG alone, on a small database. Finally, using time-domain, frequency-domain and statistical features to train an optimized, feature-reduced regression model, a very low error rate was achieved in this work. To the best of our knowledge, no work has extracted all these features and achieved such an error rate using a classical machine learning approach. In Table 11, a comparative summary of recent works with this work is shown with respect to the evaluation parameters: MAE, MSE, RMSE and R.
It is also important to note that, under the standard for the evaluation of blood pressure measurement devices proposed by the Association for the Advancement of Medical Instrumentation (AAMI), the British Hypertension Society (BHS) and the International Organization for Standardization [60-63], a device is considered acceptable if the estimated blood pressure differs by less than 10 mmHg from the actual value. The machine-learning algorithm proposed in this study estimated BP with much higher precision and accuracy. According to Table 13, the results of the GPR algorithm for DBP fully satisfy the AAMI standard. However, the standard deviation (SD) of the model in the SBP evaluation is greater than the standard's maximum permissible range, although the mean is well within the acceptable range.
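A mean-and-SD check of the kind summarized in Table 13 can be sketched in Python as follows. The limits used here, a mean error within ±5 mmHg and an SD of at most 8 mmHg, are the commonly cited AAMI criterion and are an assumption brought in for illustration rather than values quoted from the text above. The function name is hypothetical.

```python
from statistics import mean, stdev

def aami_check(pred, truth, max_mean=5.0, max_sd=8.0):
    """Compare the error mean and sample SD against assumed AAMI limits."""
    errors = [p - t for p, t in zip(pred, truth)]
    m, sd = mean(errors), stdev(errors)
    return {"mean": m, "sd": sd,
            "passes": abs(m) <= max_mean and sd <= max_sd}

# Illustrative predictions and reference readings (mmHg).
estimated = [121.0, 118.0, 133.0, 110.0, 99.0]
reference = [120.0, 120.0, 130.0, 110.0, 100.0]
result = aami_check(estimated, reference)
```

A model can meet the mean criterion while failing the SD criterion, which is the situation described for SBP above.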

Conclusions
In this study, the authors have proposed and implemented a method for estimating systolic and diastolic blood pressure using PPG signal features and a machine learning algorithm. This demonstrates how the PPG signal can be used to accurately estimate the BP of patients non-invasively, without cuff-based pressure measurement. The entire pre-processing method of the PPG

Figure 1: A typical PPG waveform with notch, systolic peak and diastolic peak.

Figure 2: Overall system block diagram

Figure 3: Comparison of waveforms that are fit and unfit for study, (a) fit data, (b) unfit data.

Figure 5: Filtered signals overlaid on the raw PPG signal.

Figure 6: Baseline correction of PPG waveform: (a) PPG waveform with baseline wander and 4th degree polynomial trend, (b) PPG waveform after de-trending.

Figure 9: Demonstration of dicrotic notch detection for different age groups: Case 1 (26 years), Case 2 (45 years), and Case 3 (80 years): (a) filtered PPG signal with a line drawn from the systolic peak to the diastolic peak; (b) the line is subtracted from the signal and the minimum point is found; (c) initial notch detected; (d) the notch is adjusted using the fix index.

Figure 10: Detection of the foot of a PPG waveform: (a) filtered PPG signal; (b) 2nd derivative of PPG along with derivation of the zone of interest based on the moving average of the APG and an adaptive threshold; (c) foot of the signal detected.

Figure 11: (a) Illustration of time-domain features in a PPG signal; (b) first and second derivatives of the PPG signal.

Figure 12: Frequency-domain representation of the PPG signal with important features.

Figure 13: Optimization of GPR model during training.

Figure 14: Comparison of the predicted output vs actual target for SBP estimation using different GPR: (a-c) models without optimization, (d-f) models with optimization.

Figure 15: Comparison of the predicted output vs actual target for DBP estimation using different GPR: (a-c) models without optimization, (d-f) models with optimization.

Table 2. Twenty-four features from the PPG signal

Table 3. Seventeen width-related PPG features

Table 4. Sixteen features derived from the first and second derivatives

Table 8. Six demographic features

Table 9: Features chosen by the feature selection algorithms

Table 10. Evaluation of the best performing algorithm for SBP and DBP

Table 11. Evaluation of the outperforming algorithms for estimating SBP and DBP after optimization.

Table 12: Comparison with related work in relation to dataset, methodology and estimation error
* Deep learning algorithm on a small database

Table 13: Comparison of this paper's results with the AAMI standard

In addition, the accuracy of the proposed algorithm is tested from the point of view of the BHS grading criteria. Grades represent the cumulative percentage of readings falling within 5 mmHg, 10 mmHg, and 15 mmHg of the mercury standard. The GPR algorithm findings are shown in Table 14, based on the BHS standard. The GPR model performance is consistent with BHS grade B for both SBP and DBP estimation.
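The cumulative-percentage grading can be sketched in Python as below. The grade thresholds (A: 60/85/95, B: 50/75/90, C: 40/65/85 percent of readings within 5/10/15 mmHg) are the commonly published BHS protocol values and are an assumption brought in for illustration, since the text above does not list them. The function name is hypothetical.

```python
def bhs_grade(pred, truth):
    """Return the BHS grade and the cumulative percentages of absolute
    errors within 5, 10 and 15 mmHg."""
    abs_err = [abs(p - t) for p, t in zip(pred, truth)]
    n = len(abs_err)
    pct = tuple(100.0 * sum(e <= lim for e in abs_err) / n
                for lim in (5, 10, 15))
    thresholds = (("A", (60, 85, 95)),
                  ("B", (50, 75, 90)),
                  ("C", (40, 65, 85)))
    for grade, minimums in thresholds:
        # All three cumulative percentages must reach the grade's minimums.
        if all(p >= m for p, m in zip(pct, minimums)):
            return grade, pct
    return "D", pct

# 100 illustrative readings: 55 off by 3, 25 by 8, 12 by 12, 8 by 20 mmHg.
reference = [100.0] * 100
estimated = [103.0] * 55 + [108.0] * 25 + [112.0] * 12 + [120.0] * 8
grade, percentages = bhs_grade(estimated, reference)
```

In this example 55%, 80% and 92% of readings fall within the three limits, which misses the grade A minimum at 5 mmHg but satisfies grade B.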

Table 14: Comparison of this paper's results with the BHS standard