Contactless Blood Pressure Estimation System Using a Computer Vision System

Abstract: Blood pressure (BP) is one of the most common vital signs related to cardiovascular diseases. BP is traditionally measured by mercury, aneroid, or digital sphygmomanometers; however, these approaches are restrictive and inconvenient, and they require a pressure cuff to be attached directly to the patient. Therefore, it is clinically important to develop a system that can accurately measure BP without any direct physical contact with the patient. This work aims to create a new computer vision system that remotely measures BP using a digital camera without a pressure cuff. The proposed BP system extracts the optical properties of photoplethysmographic signals from two regions of the forehead captured by a digital camera and calculates BP based on specific formulas. The experiments were performed on 25 human participants with different skin tones and repeated at different times under ambient light conditions. Compared to the systolic/diastolic BP readings obtained from a commercial digital sphygmomanometer, the proposed BP system achieves an accuracy of 94.6% with a root mean square error (RMSE) of 9.2 mmHg for systolic BP readings and an accuracy of 95.4% with an RMSE of 7.6 mmHg for diastolic BP readings. Thus, the proposed BP system has the potential to be a promising tool in the upcoming generation of BP monitoring systems.


Introduction
Monitoring of blood pressure (BP) is considered one of the essential indicators of human health that helps to detect and manage cardiovascular diseases. According to the World Health Organization (WHO) [1], cardiovascular diseases are the leading cause of 32% of all global deaths, and about 17.9 million people globally died in 2019 from cardiovascular diseases. Furthermore, in the last three decades, the number of adults diagnosed with hypertension has risen from 650 million to 1.28 billion worldwide [2]. Current instrumentations for measuring BP are based on mechanical or oscillometric recordings, such as a mercury sphygmomanometer, an aneroid sphygmomanometer, or a digital sphygmomanometer. These instrumentations, however, are cuff-based measurement methods that are directly attached to the patient's upper arm or wrist, which can be uncomfortable and cumbersome for repeated measurements or for long-term monitoring. Moreover, the BP readings are easily affected by unskilled examiners and cannot achieve continuous BP monitoring [3]. In addition, physical access to patients during the COVID-19 pandemic is another challenge for these contact instruments [4]. Therefore, a significant need exists for innovative systems that can accurately measure BP and assess cardiovascular risk when physical contact with the patients is either undesirable or unsafe.
In recent years, a computer vision system based on plethysmographic signals (extracting color variations at the skin's surface that contains a wealth of physiological information using a camera as a photodetector) has attracted considerable attention as a technique to measure BP. Previous attempts have tested contact finger-based video plethysmography using smartphones along with the corresponding electrocardiogram (ECG) signal to measure pulse transit time (PTT) and to estimate the related BP [5][6][7][8][9][10]. Despite the contact finger-based video plethysmography offering some advantages, it may expose patients to the risk of infection and skin irritation as well as being unsuitable for people with skin ulcers, burns, congenital and contagious conditions [11][12][13][14]. To address the limitations mentioned above while reducing the wiring and increasing safety, the development of contactless BP monitoring systems is increasingly desirable.
Recent studies have proved that video plethysmography could be used to remotely measure cardiac activity, including BP. For example, McDuff et al. [15] proposed a new imaging system based on a digital camera to remotely measure the cardiovascular blood volume pulse (BVP) detected via plethysmographic signals from the face using ambient light at a distance of 3 m. Then, they used an independent component analysis (ICA) technique followed by a bandpass filter (0.75-4.5 Hz) to extract the signal of interest and applied a peak detection method to detect the systolic and diastolic peaks, where the head motion and light conditions were the main challenges in their study. Another example, a study by Murakami et al. [16] measured PTT signals from two regions (wrist and ankle) using a digital camera. The study used a finite impulse response (FIR) low pass filter (2 Hz) as noise removal, phase delay compensation, and a peak detection method to extract pulse peaks at both regions and investigated their relationship to systolic blood pressure. However, the regions of interest (ROI) should be clear, and the subject should lie down on a bed during the measurement, and removal of garments may be required. Another study by Secerbegovic et al. [17] used a digital camera as a plethysmographic detector in combination with an electrocardiogram (ECG) to measure PTT signals from two regions (forehead and palm) and predict only systolic blood pressure using the ICA technique. The selected signal was then filtered with a band of 0.6 to 4 Hz. However, synchronizing plethysmographic signals with ECG signals was required due to the reflection of the pressure waves, sensor noise, and some movement considerations. Patil et al. [18] proposed a non-contact imaging system for BP measurement using plethysmographic signals from the forehead captured by a webcam. 
Their study also used ICA on the plethysmographic signal and then leveraged features extracted from the output signals as inputs to a neural network system. However, their study was affected by lighting conditions and noise because the PTT is affected by variations in distance between the camera sensor and the selected ROIs. Another work by Sugita et al. [19] presented a new imaging system to predict BP variability using a pulse wave obtained from video plethysmography at three regions (right palm, forehead, and right cheek) without calculating the PTT. The plethysmographic signals were then directly filtered (around 1 Hz) to extract the heartbeat-related component. This study, however, was tested using an external LED light source, and there were individual differences in the correlation of the proposed system with BP. Fan et al. [20] proposed a contactless imaging system to estimate BP after improving and detecting the peaks in the PTT signal obtained from video plethysmography from two regions (face and palm). The study improved the PTT using an adaptive Gaussian fitting model because the relationship between BP and PTT is not completely linear. A remote estimation of pulse wave features related to BP and arterial stiffness based on a computer vision system was proposed by Djeldjli et al. [11]. A digital camera and contact probes (finger and earlobe sensors) in association with an external light source were utilized to capture the video plethysmography and contact plethysmographic signals. The extracted signals were then filtered using a finite impulse response high-pass filter and continuous wavelet transform with a bandwidth of 0.5-4 Hz. A contactless BP measurement based on video plethysmography and PTT was also proposed by [21]. The proposed system was then improved by using a neural network for training and BP prediction. A recent study by Iuchi et al. [22] used a digital camera to estimate continuous BP by continually tracking spatial information of facial pulse waves based on a convolutional neural network (CNN). However, the accuracy of neural network systems remains dependent on the accuracy of the training data.
Due to the natural characteristics of the plethysmographic signal, some physiological information is often lost, for two reasons. The first is the influence of the surrounding environment, the camera sensor, and the light-transmitting properties of the skin. The second is the loss of physiological information caused by component analysis, signal filtering, and peak detection. Both raise a number of challenges that must be considered. As reported by Cheng et al. [23], the plethysmographic signals can be recovered from the effect of lighting changes by decomposing the green channel of the facial ROI into a set of signals with different time scales using the ensemble empirical mode decomposition of the Hilbert-Huang transform, rather than ICA, which deals with motion artifacts. Therefore, this study proposes a contactless imaging system based on video plethysmography that overcomes the limitations mentioned above using a method that is tolerant of lighting changes during measurement. The proposed BP system uses an adaptive decomposition method, called complete ensemble empirical mode decomposition, to decompose the plethysmographic signals captured in two regions of the forehead into a set of signals with different time scales. The signals whose frequency spectra best correspond to cardiac activity are then selected to achieve a remote estimate of BP under ambient illumination conditions without using component analysis, filtering, or peak detection. The remainder of this paper is organized as follows: Section 2 describes the materials and methods of the study, including research ethics and participants, experimental setup, and system overview. Section 3 presents and discusses the results and statistical analysis of the proposed imaging system performed on human participants. Finally, Section 4 concludes the paper.

Research Ethics and Participants
The study followed the principles outlined in the Declaration of Helsinki and received ethical approval from the research committee in the Training and Human Development Centre, Ministry of Health and Environment, Iraq (research protocol number: 1040). The participant information sheets and written consent forms were collected electronically, and participants could withdraw at any time before the completion of data collection. Before starting the experiment, all participants were informed about how their personal data would be saved and protected after the study was completed. A total of 25 participants (18 males and 7 females) aged from 18 to 60 years, with different skin tones and without any known cardiovascular disease, took part in the study.

Experimental Setup
The experimental setup includes a digital camera (Nikon D5300, 10 MP, 18-55 mm lens) mounted on a tripod and a commercial blood pressure monitor (Rossmax, Harrisburg, PA, USA) as a benchmark device. Prior to the measurements, all participants were asked to rest on a chair for several minutes to ensure a stable physiological state and then to face the camera at a distance of approximately 50 cm, as shown in Figure 1. They were also asked not to move and to breathe gently during video capture. Four video recordings were made of each participant at different times. A total of 100 videos (each 10 s long) were collected with a resolution of 1920 × 1080 pixels at a frame rate of 60 frames per second and saved in MOV format without any compression. The experiments were conducted under ambient illumination conditions without any additional light sources. Figure 1 illustrates the proposed imaging system, which recorded videos with a digital camera and processed the obtained signals while BP readings were simultaneously recorded from a contact pressure cuff device for validation purposes.

System Overview
The framework of the proposed imaging system was designed to extract the cardiac relevant plethysmographic signals from the selected forehead regions and decompose the extracted plethysmographic signals into a set of signals with different scales of time series, as demonstrated in Figure 1.
The block diagram of the proposed imaging system is shown in Figure 2. The system consists of several main parts: face and forehead detection, ROI selection, signal decomposition, feature extraction, and determination of the blood pressure ratio. The process starts by converting the videos into frames. Then, the Viola-Jones detection method [24] is used to detect the facial region automatically. Kanade-Lucas-Tomasi (KLT) features are extracted from the face detected in the first frame, and these features are then tracked across the video frames to keep following the facial region. This method is used due to its accuracy and low computational cost [25,26]. To increase the robustness of the proposed imaging system, a skin detection method based on multi-level thresholding is used as an alternative for face detection if the Viola-Jones detection method fails to detect the face due to background clutter. Multi-level thresholding is an image segmentation method that applies two or more threshold values, here in the YCbCr color space. The forehead region is chosen as the ROI because it is the region least affected by facial expressions, talking, and eye blinking. The selected ROI is then divided into two regions: the middle region (ROI1) and the side region (ROI2), as shown in Figure 3. The green channel (G) is selected from the RGB color space since this component reportedly contains the strongest plethysmographic signals [27][28][29]. The G values in the ROIs are then processed to compute the photoplethysmographic signal.
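As an illustration of the fallback skin detection step, the following minimal Python sketch converts RGB pixels to YCbCr and applies a lower and an upper threshold to each chroma channel. The conversion uses the full-range ITU-R BT.601 (JPEG) coefficients. The threshold values `CB_RANGE` and `CR_RANGE` and all function names are hypothetical: the paper does not report its thresholds, and the ranges below are commonly quoted skin-chroma values used purely for illustration.

```python
# Hypothetical skin-chroma thresholds; NOT taken from the paper.
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)


def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (ITU-R BT.601 / JPEG coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def is_skin(r, g, b, cb_range=CB_RANGE, cr_range=CR_RANGE):
    """Return True when both chroma values fall inside their threshold pair."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return (cb_range[0] <= cb <= cb_range[1]
            and cr_range[0] <= cr <= cr_range[1])


def skin_mask(image):
    """image: list of rows of (r, g, b) tuples -> list of rows of booleans."""
    return [[is_skin(*pixel) for pixel in row] for row in image]


# Tiny two-pixel "image": a skin-like tone and pure blue.
mask = skin_mask([[(224, 172, 150), (0, 0, 255)]])
```

In practice the thresholds would need to be tuned on the recorded videos, since skin tone and ambient lighting shift the chroma distribution.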
The time-series plethysmographic signals from ROI1 and ROI2 were extracted by averaging the pixel values for each frame in the selected video as follows [30]:

$$i_{G_1}(t) = \frac{1}{H_1 W_1} \sum_{x=1}^{W_1} \sum_{y=1}^{H_1} F_i(x, y) \qquad (1)$$

$$i_{G_2}(t) = \frac{1}{H_2 W_2} \sum_{x=1}^{W_2} \sum_{y=1}^{H_2} F_i(x, y) \qquad (2)$$

where $H_1$ is the height of the selected middle forehead region ROI1 in pixels, $W_1$ is the width of ROI1 in pixels, $H_2$ is the height of the selected side forehead region ROI2 in pixels, $W_2$ is the width of ROI2 in pixels, and $F_i(x, y)$ represents the light level of the G component plane at the $(x, y)$ coordinates of frame $i$. The averaged values for Equations (1) and (2) are in the range 0-255. Since the averaged pixel values can be affected by illumination variation noise, a complete ensemble empirical mode decomposition method [31] is used to separate the cardiac pulse signal in the plethysmographic signals from the illumination variation noise without the need to filter the signal. This decomposition method is commonly used to remove illumination noise artifacts from plethysmographic signals [23,32,33]. It is a noise-assisted adaptive data analysis method developed by Colominas et al. [31] to improve the decomposition of nonstationary and nonlinear signals into several signals with different time scales, called intrinsic mode functions (IMFs). It decomposes signals based on their local characteristics by adding positive and negative white noise to them, thus avoiding the undesired mode mixing problem of previous mode decomposition methods [34][35][36] and achieving a negligible reconstruction error. The method applies a noise-adding scheme at each stage of the decomposition and calculates a unique residue to obtain each IMF, making the obtained signals almost complete. In addition, selecting the optimal maximum number of sifting iterations can reduce the execution time of this decomposition method. By applying this decomposition method, $i_G(t)$ from each region is decomposed into a finite number of IMFs, as shown in Figure 4.
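The ROI averaging step described above (the mean green-channel value of each forehead region in each frame) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' MATLAB implementation: the frame layout (a list of rows of `(r, g, b)` tuples), the ROI tuple `(top, left, height, width)`, and the function names are assumptions made for the example.

```python
def roi_mean_green(frame, top, left, height, width):
    """Average the green component over a rectangular ROI of one frame.

    frame is assumed to be a list of rows, each pixel an (r, g, b) tuple
    with 8-bit values, so the result lies in the range 0-255.
    """
    total = 0
    for y in range(top, top + height):
        for x in range(left, left + width):
            total += frame[y][x][1]  # index 1 selects the G component
    return total / (height * width)


def green_time_series(frames, roi):
    """Build the raw plethysmographic signal: one ROI average per frame."""
    top, left, height, width = roi
    return [roi_mean_green(f, top, left, height, width) for f in frames]


# Two tiny 2 x 3 frames; the ROI covers the right-hand 2 x 2 block.
frame_a = [[(0, 10, 0), (0, 20, 0), (0, 30, 0)],
           [(0, 40, 0), (0, 50, 0), (0, 60, 0)]]
frame_b = [[(0, 100, 0)] * 3, [(0, 100, 0)] * 3]
signal = green_time_series([frame_a, frame_b], roi=(0, 1, 2, 2))
```

Running the same function with two different ROI rectangles yields the two signals that are subsequently decomposed into IMFs.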
As illustrated in Figure 4, each signal is decomposed into five signals with different frequencies. The IMF signals were analyzed in the time-frequency domain using the MATLAB built-in command "pspectrum". With its spectrogram option, this command returns the short-time power spectrum of the selected signal together with a vector of time instants corresponding to the centers of the windowed segments used to estimate it. This function helps to visualize interference features embedded within the selected signal, here at a time resolution of 0.5 s with zero overlap between adjoining segments. Three IMFs (IMF3, IMF4, and IMF5) from each region with the largest maximal amplitude are selected as candidates to estimate BP, while IMF1 and IMF2, which fall within the frequency range of possible light changes, are neglected to reduce illumination noise artifacts, as shown in Figure 5. Then, the features (locations of peaks and the distance between consecutive peaks) are extracted from the selected IMFs to estimate BP. The ratio of systolic BP, $X_{sys}$, is estimated using Equation (3), where $x$ is the distance between the first two consecutive peaks from IMF4 (ROI1), $y$ is the distance between the first two consecutive peaks from IMF5 (ROI1), $z$ is the average distance among the peaks from IMF3 (ROI2), $dx_1 = x^{0.33}$, $dy_1 = y^{0.25}$, and $dz_1 = 2(1 + z)$.
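The peak features used above (peak locations, the distance between the first two consecutive peaks, and the average distance among peaks) can be illustrated with a minimal Python sketch. Several points are assumptions rather than details from the paper: peaks are taken as strict local maxima, distances are expressed in seconds by dividing by the frame rate, and the test signal is a synthetic 1.5 Hz sine standing in for an IMF. The function names are hypothetical.

```python
import math


def find_peaks(signal):
    """Return indices of strict local maxima (larger than both neighbours)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] > signal[i + 1]]


def peak_features(signal, fps):
    """Peak times plus the first and the mean inter-peak distance, in seconds.

    Expressing distances in seconds is an assumption; the paper does not
    state the unit it uses.
    """
    peaks = find_peaks(signal)
    if len(peaks) < 2:
        raise ValueError("need at least two peaks to measure a distance")
    gaps = [(b - a) / fps for a, b in zip(peaks, peaks[1:])]
    return {
        "peak_times": [p / fps for p in peaks],
        "first_gap": gaps[0],                 # plays the role of x or y
        "mean_gap": sum(gaps) / len(gaps),    # plays the role of z
    }


# Synthetic stand-in for an IMF: 1.5 Hz sine, 10 s at 60 frames per second.
fps = 60
imf = [math.sin(2 * math.pi * 1.5 * n / fps) for n in range(600)]
features = peak_features(imf, fps)
```

On real IMFs a more robust detector (for example with a minimum peak spacing) may be preferable, since noise can create spurious local maxima.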
The systolic BP can then be calculated using Equation (4), where $d_{t1} = 10^{(1.44 - X_{sys})}$ and 1.44 is the value of $X_{sys}$ corresponding to a systolic BP of approximately 90 mmHg. If $X_{sys} > 1$, the value of $d_{t1}$ is added to Equation (4); if $X_{sys} < 1$, the value of $d_{t1}$ is subtracted from Equation (4); and if $X_{sys} = 1$, the value of $d_{t1}$ is ignored.
The ratio of diastolic BP, $X_{dias}$, is estimated using Equation (5), where $dx_2 = x^{0.25}$, $dy_2 = y^{0.33}$, and $dz_2 = 2z$. The diastolic BP is then calculated using Equation (6), where $d_{t2} = 10^{(1.4 - X_{dias})}$ and 1.4 is the value of $X_{dias}$ corresponding to a diastolic BP of approximately 60 mmHg. If $X_{dias} > 1$, the value of $d_{t2}$ is added to Equation (6); if $X_{dias} < 1$, the value of $d_{t2}$ is subtracted from Equation (6); and if $X_{dias} = 1$, the value of $d_{t2}$ is ignored.
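The three-way rule for the correction term (added when the ratio exceeds 1, subtracted when it is below 1, ignored when it equals 1) can be written as a small helper. This is a sketch under stated assumptions: `base_bp` stands for the value given by Equation (4) or Equation (6), whose expressions are not reproduced here, `correction` stands for the corresponding correction term, and the optional `tol` parameter is a hypothetical addition for floating-point comparison that is not part of the paper.

```python
def apply_ratio_correction(base_bp, ratio, correction, tol=0.0):
    """Apply the sign rule for the correction term.

    base_bp    -- value from the BP formula (placeholder input, in mmHg)
    ratio      -- the BP ratio (systolic or diastolic)
    correction -- the correction term for that ratio
    tol        -- hypothetical tolerance for treating the ratio as equal to 1
    """
    if abs(ratio - 1.0) <= tol:
        return base_bp               # ratio equals 1: correction ignored
    if ratio > 1.0:
        return base_bp + correction  # ratio above 1: correction added
    return base_bp - correction      # ratio below 1: correction subtracted


# Placeholder numbers chosen only to exercise the three branches.
added = apply_ratio_correction(120.0, 1.2, 2.0)
subtracted = apply_ratio_correction(120.0, 0.8, 2.0)
ignored = apply_ratio_correction(120.0, 1.0, 2.0)
```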

Experimental Results and Discussion
The videos and extracted signals were processed offline using MATLAB (MathWorks Inc., Natick, MA, USA). For system verification, the results of the proposed imaging system were evaluated against their corresponding benchmark (BM) measurements using statistical parameters such as accuracy, root mean square error (RMSE), mean absolute error (MAE), absolute percentage error (APE), and mean absolute percentage error (MAPE). The comparison between the systolic/diastolic BP readings and their BM readings for all collected data is shown in Figure 6. Figure 6 shows that the accuracy of the proposed imaging system for systolic BP measurements was 94.6% with an RMSE of 9.2 mmHg, while it was 95.4% for diastolic BP measurements with an RMSE of 7.6 mmHg. The errors, including MAE and MAPE, of the measured systolic BP readings were also examined with respect to the BM measurements, as shown in Figure 7. Figure 8a shows that the diastolic error varies from 0 mmHg to 23 mmHg with an MAE of 5.47 mmHg, and Figure 8b illustrates that the APE of the diastolic BP readings varies from 0% to 28.125% with a MAPE of 7%.
The proposed imaging system was also examined via statistical analysis, including a histogram test, the probability density function (PDF), and the cumulative distribution function (CDF). A histogram test was first used to determine whether the BP readings measured by the proposed system were compatible with the benchmark measurements. The histogram test of the systolic BP data obtained from the proposed imaging system and from the benchmark is shown in Figure 9. Figure 9 shows peaks of 35 and 28 points in the 133 and 124 mmHg classes for the proposed imaging system and the benchmark, respectively, indicating good convergence between the proposed imaging system and the benchmark measurements. The histogram test of the diastolic BP data is shown in Figure 10.
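For reference, the error measures named above (RMSE, MAE, APE, and MAPE) can be computed as in the following Python sketch, which uses their standard definitions. The function name and the sample readings are hypothetical and are not data from the study; accuracy is omitted because its exact definition is not given in the text.

```python
import math


def error_metrics(estimates, references):
    """Standard error measures between estimated and benchmark readings.

    Returns RMSE and MAE in the unit of the inputs (mmHg here), the
    per-reading absolute percentage error (APE), and its mean (MAPE).
    """
    if not estimates or len(estimates) != len(references):
        raise ValueError("inputs must be non-empty and of equal length")
    abs_errors = [abs(e - r) for e, r in zip(estimates, references)]
    ape = [100.0 * a / r for a, r in zip(abs_errors, references)]
    n = len(abs_errors)
    return {
        "rmse": math.sqrt(sum(a * a for a in abs_errors) / n),
        "mae": sum(abs_errors) / n,
        "ape": ape,
        "mape": sum(ape) / n,
    }


# Hypothetical readings in mmHg, for illustration only.
metrics = error_metrics(estimates=[120, 130], references=[125, 120])
```

Because RMSE squares each error before averaging, it is never smaller than MAE, which is consistent with the RMSE values reported here exceeding the corresponding MAE values.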
The corresponding peaks for the proposed system and the BM likewise indicate good convergence between the data measured by the proposed system and those measured by the benchmark. The BP data measured by the proposed imaging system were also investigated through a statistical metric based on a PDF test, as shown in Figures 11 and 12. Finally, the CDF plot was used to compare the distribution of the BP measurements with that of their corresponding benchmark measurements, which preserves the relative relationship of the main data with a reference range. In this work, the CDF plot demonstrated close agreement between the systolic and diastolic BP measurements and the benchmark measurements, as shown in Figures 13 and 14, respectively. Figure 13 shows that the CDF plot indicates a good agreement of 90% and 92% for systolic BP measurement values of 143 and 144 mmHg, respectively, whereas the CDF plot in Figure 14 indicates a better agreement of 94% and 95% for diastolic BP measurement values of 95 and 94 mmHg, respectively.
The proposed imaging system achieves an average error percentage of 5.865% (systolic BP) and 7% (diastolic BP), compared to an average error percentage of 9.62% (systolic BP) and 11.63% (diastolic BP) obtained by [18], 8.42% (systolic BP) and 12.34% (diastolic BP) obtained by [20], and 9.28% (systolic BP) and 9.84% (diastolic BP) obtained by [21].
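To make the CDF comparison concrete, the sketch below evaluates an empirical CDF, that is, the fraction of readings at or below a given BP value, for two sets of readings. It is a minimal illustration under assumptions: the function name and the sample readings are hypothetical and are not data from the study, and the paper's plots may have been produced differently.

```python
def empirical_cdf(values, threshold):
    """Fraction of readings that are less than or equal to threshold."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(1 for v in values if v <= threshold) / len(values)


# Hypothetical systolic readings in mmHg, for illustration only.
proposed = [110, 120, 130, 140]
benchmark = [112, 118, 133, 150]

# Comparing the two empirical CDFs at the same BP value shows how closely
# the distributions agree at that point.
cdf_proposed = empirical_cdf(proposed, 125)
cdf_benchmark = empirical_cdf(benchmark, 125)
```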
In addition to the lighting changes effects, the plethysmographic signals may be slightly affected by changing the subject's skin tone; therefore, this study recruited participants with different skin tones to increase the performance and accuracy of the proposed system. Through the data obtained, the error range of the obtained systolic BP falls within 7.27-9.2 mmHg, while it was 5.47-7.6 mmHg for the obtained diastolic BP compared to the benchmark measurements; this is because the benchmark measurements were also affected by movement artifacts and were inaccurate under some conditions, creating difficulty in the validation process with the proposed imaging system. Therefore, we tried to repeat the reference measurements until a steady reading was obtained. Additionally, the efficiency of the proposed imaging system is affected by subject movements, such as head rotation and facial expressions. It was also noticed that the proposed imaging system was inaccurate when perspiration was present on the forehead region or when it was covered with makeup, so all participants in the study were asked to clean and wipe the forehead region before filming. Finally, changing the distances between the camera and the face was not considered in this study.

Conclusions
In this study, a non-interventional imaging system was proposed to remotely estimate BP using plethysmographic video signals obtained from two regions of the forehead. The issue of illumination conditions that highly affected the plethysmographic signal was addressed using an efficient noise removal decomposition method. System performance was tested by comparing the error and the accuracy of the BP readings from the proposed imaging system, with the method showing promising results compared to the benchmark measurements. Thus, the proposed system makes the BP measurement potentially feasible and convenient for subjects, and also inexpensive, which is of great significance for healthcare applications and continuous vital signs monitoring.