Enhancing the Robustness of Smartphone Photoplethysmography: A Signal Quality Index Approach.

Heart rate variability (HRV) provides essential health information such as the risks of heart attacks and mental disorders. However, inconvenience related to the accurate detection of HRV limits its potential applications. The ubiquitous use of smartphones makes them an excellent choice for regular and portable health monitoring. Following this trend, smartphone photoplethysmography (PPG) has recently garnered prominence; however, the lack of robustness has prevented both researchers and practitioners from embracing this technology. This study aimed to bridge the gap in the literature by developing a novel smartphone PPG quality index (SPQI) that can filter corrupted data. A total of 226 participants joined the study, and results from 1343 samples were used to validate the proposed sinusoidal function-based model. In both the correlation coefficient and Bland-Altman analyses, the agreement between HRV measurements generated by both the smartphone PPG and the reference electrocardiogram improved when data were filtered through the SPQI. Our results support not only the proposed approach but also the general value of using smartphone PPG in HRV analysis.


Introduction
Heart rate (HR) is an indicator of the balance of multiple physiological systems such as the cerebral cortex, autonomic nervous system, endocrine system, and baroreflex [1,2]. Even while at rest, HR continuously adapts to physiological adjustments such as changes in arterial pressure caused by breathing [3]. By observing HR variability (HRV), researchers can assess our physical capability to adapt to internal physiological requests or changes in our surroundings.
Despite the importance of HRV in clinical diagnoses and preventative medical applications, the cost and immobility of traditional electrocardiogram (ECG) equipment limit its potential for continuous health monitoring. The advances in information technology have introduced several

Signal Pre-Processing
The current study employed three steps to convert film frames of the fingertip to pulse waveforms for further analysis.

Signal Extraction and Conversion
For each data collection session, a self-developed app first activated the built-in flashlight and recorded 120 × 160 pixel videos at approximately 30 frames per second (the actual frame rate was determined by the underlying operating system) (see Figure 1a). Raw YUV-format picture frames retrieved from the preview function were then converted into the RGB format. Then, the input signals $(r_i, b_i, g_i)$ from the three color channels (red, blue, and green) were normalized using the 100-point moving average $(\bar{R}, \bar{B}, \bar{G})$ and standard deviation $(\sigma_R, \sigma_B, \sigma_G)$ of each channel, where $k$ is the amount of data collected in each data collection session. Because the standard deviation can represent the relative strength of each channel, the signals were combined into a standard deviation-weighted average $f(t_i)$ with the weights
$$\sigma'_C = \begin{cases} \sigma_C & \text{if } \sigma_C > 0.5 \text{ and } 3 < \bar{C} < 252, \\ 0 & \text{otherwise,} \end{cases} \qquad C \in \{R, B, G\},$$
where $t_i \in T = \{t_i \mid i \in 1, \ldots, k\}$ denotes the time at which the $i$th data point was collected. A color channel was removed from the weighted average $f(t_i)$ when the average of the input ($\bar{C}$) was either too small or too close to the upper limit of 255, or when the standard deviation ($\sigma_C$) was too small ($\sigma_C \le 0.5$).
In addition, the signs of the red and blue channels were reversed to denote the inverse relationship between the green channel and the other two channels.
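The channel-combination step can be sketched in Python as follows. This is a minimal illustration rather than the app's actual code. It assumes that "normalized" means subtracting the 100-point moving average and dividing by the channel's standard deviation, that the moving average is a trailing one, and that the weighted average divides by the sum of the retained weights. All function names are hypothetical.

```python
from statistics import mean, pstdev


def moving_average(x, window=100):
    """Trailing moving average; uses fewer points at the start of the series."""
    out, running = [], 0.0
    for i, v in enumerate(x):
        running += v
        if i >= window:
            running -= x[i - window]
        out.append(running / min(i + 1, window))
    return out


def channel_weight(x):
    """Weight of one colour channel: its standard deviation, or 0 if excluded.

    A channel is excluded when its standard deviation is <= 0.5 or when its
    average is too close to 0 or to the 8-bit upper limit of 255.
    """
    sd = pstdev(x)
    return sd if (sd > 0.5 and 3 < mean(x) < 252) else 0.0


def combine_channels(r, g, b, window=100):
    """Combine the R, G, B series into one pulse signal f(t_i)."""
    parts = []
    # Red and blue are sign-reversed relative to green.
    for x, sign in ((r, -1.0), (g, 1.0), (b, -1.0)):
        w = channel_weight(x)
        if w == 0.0:
            continue  # channel removed from the weighted average
        ma = moving_average(x, window)
        # Assumed normalization: detrend, then divide by the channel SD (== w).
        normalized = [(v - m) / w for v, m in zip(x, ma)]
        parts.append((sign, w, normalized))
    if not parts:
        raise ValueError("all colour channels were excluded")
    total = sum(w for _, w, _ in parts)
    return [sum(s * w * z[i] for s, w, z in parts) / total
            for i in range(len(r))]
```

Under these assumptions a saturated channel (for example, a blue channel stuck near 255) contributes nothing, and the remaining channels are blended according to how strongly they pulsate.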

Beat-to-Beat Interval (BBI) Segmentation
After converting the signals to a waveform input, we divided the "continuous" waveform dataset $f(t_i)$ into segments representing each beat-to-beat interval (BBI). Given that the radial pulse waveform follows a right-skewed bell shape, distinct plateaus can be observed on the first derivatives during each heartbeat (see Figure 1b). Although the exact position of the maximum of the waveform is susceptible to noise, the detection of the plateau is relatively robust. Therefore, this study used a set of local maxima ($M_i$) of the first derivatives $f'(t_i)$ that lie above their 70th percentile ($P_{70}$) to identify potential BBIs for further analysis.
The distances between two successive data points in $M$ were converted into HRs to filter out points with HRs over 150. All first and second derivatives were calculated using first- and second-order central difference approximations. The set $P$ consists of the maximum points ($P_i$) of the waveform between each pair of successive data points in $M$, and the intervals segmented by the data points in $P$ are then defined as the set of BBIs ($B$).
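The segmentation steps above can be sketched in Python as follows. This is a hedged illustration under several assumptions: the signal is uniformly sampled at `fs` Hz, the HR limit of 150 is in beats per minute, and when two candidate markers are closer together than one beat at that limit, only the steeper one is kept (the text does not say how such points are resolved). The function names are hypothetical.

```python
from statistics import quantiles


def central_diff(y, dt):
    """First derivative by central differences (one-sided at both ends)."""
    d = [0.0] * len(y)
    for i in range(1, len(y) - 1):
        d[i] = (y[i + 1] - y[i - 1]) / (2 * dt)
    d[0] = (y[1] - y[0]) / dt
    d[-1] = (y[-1] - y[-2]) / dt
    return d


def upstroke_markers(f, fs=30.0, max_hr=150.0):
    """Indices of local maxima of f' that lie above its 70th percentile."""
    d = central_diff(f, 1.0 / fs)
    p70 = quantiles(d, n=100)[69]  # 70th percentile of the first derivative
    candidates = [i for i in range(1, len(d) - 1)
                  if d[i] > p70 and d[i] >= d[i - 1] and d[i] > d[i + 1]]
    min_gap = fs * 60.0 / max_hr  # samples per beat at the HR limit
    markers = []
    for i in candidates:
        if markers and i - markers[-1] < min_gap:
            # Implied HR is above the limit: keep only the steeper marker.
            if d[i] > d[markers[-1]]:
                markers[-1] = i
            continue
        markers.append(i)
    return markers


def segment_bbis(f, fs=30.0):
    """Return (start, end) index pairs of peak-to-peak intervals."""
    m = upstroke_markers(f, fs)
    # One waveform maximum between each pair of successive markers.
    peaks = [max(range(a, b), key=lambda i: f[i]) for a, b in zip(m, m[1:])]
    return list(zip(peaks, peaks[1:]))
```

For a clean 1 Hz test signal sampled at 30 Hz, every returned interval spans 30 samples, i.e., one second per beat.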

Figure 1.
Steps taken to convert raw signals to segmented and normalized pulse waveform.

BBI Normalization
To reduce the problem caused by baseline drifting [40], the amplitudes of data points in each BBI were normalized in proportion to the height difference of the two successive peaks (see Figure 1c). The normalized data points were used to fit the regression model and to determine heartbeat points with various FPDTs. The current study then applied the R package RHRV [48] to convert the heartbeat points generated by the FPDTs to HRV measures for further analysis.

Sinusoidal Function-Based Photoplethysmography (PPG) Quality Index
Although the exact contour of the waveform is influenced by the characteristics of the individuals' systemic circulation [49], the general shapes are similar. For most healthy people, the contour is right skewed and bell shaped, with a dicrotic notch in the middle [50]. Since the shapes of the pulse waveforms are similar, PPG studies use waveform morphology to differentiate acceptable signals from contaminated ones [51].
Two families of functions have been used to describe the pulse waveform: Gaussian (or modifications, such as the Rayleigh functions) [46,52–54] and sinusoidal functions. Since sinusoidal functions are more commonly used in hemodynamic studies (i.e., Fourier analysis) to predict health-related variables [55], and are less computationally demanding, this study fit the pulse waveform with the sum of sinusoidal functions. In the model (Equation (1)), $s_i(t)$ is the $i$th sinusoidal function, $\omega_i \ge 0$ is the weighting of the $i$th sinusoidal function, $c \in \mathbb{R}$ is the scaling parameter, $h \in \mathbb{R}$ is the displacement parameter, and $n$ is the number of sinusoidal functions included in the model (Figure 2a,b). As an explorative study, the frequencies of the sinusoidal functions ($s_i(t)$, $i > 1$) were specified as multiples of the frequency of the base function $s_1(t)$ to reduce model complexity. In addition, the weightings $\omega_i$ were restricted to positive values and $\omega_1 \ge 2\omega_i \;\forall i > 1$ to keep the model (Equation (1)) right skewed and approximately bell shaped. Future studies may relax these constraints.
The model was fitted to the input data points using nonlinear least-squares optimization with a maximum of 10,000 iterations. Based on previous experience, we used a set of initial values: $(\omega_0, \omega_1, \omega_2, \omega_3, \omega_4, c, h) = (7, 7, 3, 1, 1, 2, 0.1)$. Since the sum of the squared errors in the regression did not change significantly when $n$ was greater than four in our preliminary data analysis, we set $n = 4$ in the model. When the waveform of an interval was severely corrupted, or the section contained many artifacts, the model failed to converge before reaching the maximum number of iterations or had a large root mean square error (RMSE). The model fitting was considered to have failed when the RMSE was larger than 0.5. The maximum number of iterations and the threshold for the RMSE were determined based on experience; future studies may re-examine these constraints.
Further, because the pulse waveform was relatively stable, for each BBI, we used the fitting results from previous BBIs to filter out parameters that were outliers. When either $\omega_1$ or $\omega_2$ was outside of the boundary determined by Tukey's method, i.e., more than 1.5 times the interquartile range beyond the quartiles, the fitting was considered to have failed. We then defined the SPQI as the success rate of the model-fitting process:
$$\mathrm{SPQI} = \frac{\text{number of successful model fittings}}{\text{number of BBIs}} \qquad (8)$$
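Given per-BBI fitting results, the SPQI computation can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's implementation. The model fitting itself is not shown. Each fit is represented by a hypothetical dictionary with the keys `converged`, `rmse`, `w1`, and `w2` (standing for $\omega_1$ and $\omega_2$). The outlier check starts only after `min_history` earlier fits are available, and quartiles are computed with the default method of Python's `statistics.quantiles`. The text does not specify either of these details.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Lower and upper Tukey fences: quartiles -/+ k times the IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def spqi(fits, rmse_limit=0.5, min_history=4):
    """Share of BBIs whose model fit is counted as successful."""
    if not fits:
        return 0.0
    history = {"w1": [], "w2": []}
    successes = 0
    for fit in fits:
        # Step 1: the optimizer must converge with an acceptable RMSE.
        fitted = fit["converged"] and fit["rmse"] <= rmse_limit
        ok = fitted
        if fitted:
            # Step 2: w1 and w2 must not be outliers relative to earlier BBIs.
            for key in ("w1", "w2"):
                past = history[key]
                if len(past) >= min_history:
                    lo, hi = tukey_fences(past)
                    ok = ok and lo <= fit[key] <= hi
            # Assumption: every fit that passed step 1 joins the history.
            for key in ("w1", "w2"):
                history[key].append(fit[key])
        successes += ok
    return successes / len(fits)
```

A recording would then be retained only when its SPQI exceeds a chosen threshold, such as the 0.8 and 0.95 levels examined in the results.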

HRV Measures
There are three types of HRV measures: (1) time-domain, (2) frequency-domain, and (3) nonlinear. The frequency-domain components of HRV consist of four frequency bands: high frequency (HF), low frequency (LF), very low frequency (VLF), and ultralow frequency (ULF) (Table 1). Given that this study only recorded 5-min videos, the ULF and VLF bands did not apply [56]. The HF and LF values were log-transformed because they are not distributed normally [43,57]. The time-domain indices of HRV quantify variability in the BBI. This study included three commonly used time-domain measures for comparison: rMSSD, pNN50, and SDNN. Nonlinear HRV measures are computationally complex and were accordingly excluded from this study.
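For reference, the three time-domain measures can be computed from a series of BBIs as in the following Python sketch. It uses the standard definitions (SDNN as the standard deviation of the intervals, rMSSD as the root mean square of successive differences, and pNN50 as the percentage of successive differences larger than 50 ms). It assumes intervals in milliseconds and uses the sample standard deviation. The study itself used the RHRV package, whose exact conventions may differ.

```python
from math import sqrt
from statistics import stdev


def successive_diffs(bbi_ms):
    """Differences between consecutive beat-to-beat intervals (ms)."""
    return [b - a for a, b in zip(bbi_ms, bbi_ms[1:])]


def sdnn(bbi_ms):
    """Standard deviation of the intervals (sample SD, in ms)."""
    return stdev(bbi_ms)


def rmssd(bbi_ms):
    """Root mean square of successive interval differences (ms)."""
    diffs = successive_diffs(bbi_ms)
    return sqrt(sum(d * d for d in diffs) / len(diffs))


def pnn50(bbi_ms):
    """Percentage of successive differences larger than 50 ms."""
    diffs = successive_diffs(bbi_ms)
    return 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)
```

Because rMSSD and pNN50 depend on differences between neighboring beats, a single misplaced heartbeat point affects two successive differences, which is consistent with these measures being more sensitive to corrupted data.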

| FPDT | Fiducial Point Definition |
| --- | --- |
| Peak | The maximum point in each BBI. |
| Valley | The minimum point in each BBI. |
| M1D | The maximum point of the first derivative in each BBI. |
| M2D | The maximum point of the second derivative in each BBI. |
| Tangent | The point where the tangent line from the M1D intersects the horizontal line from the Valley. |

The first derivatives of a discrete data set are determined by the difference function approximation.
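The Tangent definition combines two of the other fiducial points, so a short Python sketch may help. It is an illustration only, assuming one BBI of uniformly sampled values and a central-difference derivative. The function names are hypothetical, and the returned time is measured in seconds from the start of the segment.

```python
def central_diff(y, dt):
    """First derivative by central differences (one-sided at both ends)."""
    d = [0.0] * len(y)
    for i in range(1, len(y) - 1):
        d[i] = (y[i + 1] - y[i - 1]) / (2 * dt)
    d[0] = (y[1] - y[0]) / dt
    d[-1] = (y[-1] - y[-2]) / dt
    return d


def tangent_point(y, fs=30.0):
    """Time at which the tangent at M1D meets the horizontal line at Valley."""
    dt = 1.0 / fs
    d = central_diff(y, dt)
    valley = min(range(len(y)), key=lambda i: y[i])  # minimum of the waveform
    m1d = max(range(len(y)), key=lambda i: d[i])     # steepest upstroke
    if d[m1d] <= 0:
        raise ValueError("no rising edge in this segment")
    # Tangent line: y(t) = y[m1d] + d[m1d] * (t - t_m1d); solve y(t) = y[valley].
    return m1d * dt - (y[m1d] - y[valley]) / d[m1d]
```

Because the result is an intersection of two lines, it can fall between samples, which gives the Tangent point a finer time resolution than the sampling grid.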

Agreement Analysis
We used two methods to compare the agreement between the smartphone PPG and reference ECG. First, we examined the Pearson correlation coefficients between the data generated by the smartphone PPG and the reference ECG. The correlation coefficients ($r$) were assessed with the Student's t-test, where
$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
and $n$ is the number of paired samples. Second, we compared the agreement with the Bland-Altman method [65]. The Bland-Altman ratio (BAR) is defined as
$$\mathrm{BAR} = \frac{\mathrm{LA}}{\mathrm{MPM}},$$
where LA is the half range of the agreement limits ($\pm 1.96 \times \mathrm{SD}$), and MPM denotes the mean of the pairwise means. The two measurements are considered to have a good or acceptable agreement when the BAR is less than 10% or 20%, respectively [40,66].
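Both agreement statistics are simple to compute from paired measurements, as the following Python sketch shows. It assumes that SD is the sample standard deviation of the pairwise differences and that the t statistic is the standard one for a Pearson correlation with n − 2 degrees of freedom. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


def bland_altman_ratio(ppg, ecg):
    """Half range of the limits of agreement over the mean of pairwise means."""
    diffs = [p - e for p, e in zip(ppg, ecg)]
    la = 1.96 * stdev(diffs)                              # half range of limits
    mpm = mean([(p + e) / 2 for p, e in zip(ppg, ecg)])   # mean of pairwise means
    return la / mpm
```

With this definition, a BAR below 0.1 corresponds to "good" agreement and a BAR below 0.2 to "acceptable" agreement.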

Participants and Data Collection
The study protocol was approved by the Ethical Board of the Department of Psychology, Tsinghua University; 226 students and university employees in Shenzhen, China, joined the study. The average age was 23.4 years (σ = 3.36) with equal percentages of male and female participants. After a 5-min debriefing, participants were asked to remain seated for the entire data collection process. They were asked to wear an ECG chest strap (H10, Polar Electro Oy, Finland; sampling rate 1000 Hz [67]) and hold a smartphone (Mi 8 SE, Xiaomi, China; sampling rate 30 Hz) in their left hand. A self-developed app was then used to record 5-min videos of their fingertip multiple times during the 1-h session. A total of 1343 valid datasets were collected. The accuracy of the ECG chest strap is well-established in the literature [21], and studies employ the chest strap in detecting HRV for convenience when a 24-lead ECG is not available [68,69].

Correlation Coefficient Analysis
Before starting data analysis, this study applied Tukey's rule to remove outliers. All remaining HRV measurements using smartphone PPG were significantly correlated (p < 0.05) with the results detected using ECG (Table 3). In general, smartphone PPG provided better estimations for log HF (across the FPDTs, average r = 0.72), log LF (average r = 0.70), and SDNN (average r = 0.73) than for rMSSD and pNN50. Among the FPDTs, Tangent produced the best results, followed by Valley and M1D; however, M2D had the poorest performance. When the data were filtered with the SPQI, the correlation coefficients increased significantly. On average, they increased by 13% (from 0.669 to 0.758) and 26% (from 0.669 to 0.843) when the data were filtered with SPQI thresholds above 0.8 and 0.95, respectively. The improvement was particularly prominent for M2D, as the average correlation coefficient improved by 51% (from 0.472 to 0.712).
The advantage of using the SPQI is also apparent from the scatter plots (Figure 3). When all data are included, a significant portion lies away from the straight line; when corrupted data are filtered out, a higher portion of the data lies along the straight line. In particular, there is an asymmetric bias for rMSSD and pNN50. Smartphone PPG tended to produce larger values for these measures when the signal quality was low, and therefore, their correlation coefficients were lower. Since rMSSD and pNN50 are more susceptible to the randomness of the samples, the benefits of using the SPQI were also more prominent for these two measures. When we filtered the data with an SPQI threshold level of 0.95, the correlation coefficients for rMSSD and pNN50 increased from 0.64 and 0.77 to 0.88 and 0.92, respectively. However, when the data set was filtered with an SPQI > 0.95, the reduction in the number of samples was not negligible (see Table 4). On average, the number of samples decreased by 14% (from 1263 to 1060) and 56% (from 1263 to 557) when the data were filtered out by thresholds above 0.8 and 0.95, respectively.

Bland-Altman Ratio Analysis
The Bland-Altman analysis showed similar results to the correlation coefficient analysis. Among all FPDTs, Tangent generated the smallest BAR for SDNN, log HF, and log LF (see Table 5). M1D, in contrast, performed the best for rMSSD, and Valley had the lowest BAR for pNN50. The agreement of log HF and log LF was "acceptable" (BAR < 0.2) before filtering with the SPQI. The agreement of log HF and log LF generated by all FPDTs became "good" or close to "good" when the data were filtered with an SPQI > 0.95. The effect of the SPQI can also be observed from the Bland-Altman plot (Figure 4). Considering data generated by Tangent, the number of points that lie beyond the upper and lower agreement limits was significantly reduced when the data were filtered with the SPQI. The same pattern was observed for all HRV measures. Similar to the correlation coefficient analysis, rMSSD and pNN50 showed the least agreement among HRV measures. Although filtering with an SPQI > 0.95 could significantly reduce the BAR, these two measures were still above the "acceptable" level for all FPDTs.

Principal Findings
The results from both the correlation coefficient and Bland-Altman analyses validated our proposed strategy: removing data that cannot fit a pre-defined model can significantly increase the accuracy of smartphone PPG. When a relatively robust FPDT, such as Valley or Tangent, is chosen and the data are filtered with the SPQI, the agreement between smartphone PPG and ECG can reach the "good" level, and the correlation coefficient can exceed 0.9.
In previous studies, Tangent and M2D were generally the best performers among the FPDTs [40,62]. Our data showed that Tangent had the best agreement with ECG for SDNN, log HF, and log LF, while M1D and Valley had the highest agreement for rMSSD and pNN50, respectively; M2D was the worst performer in our data. Given that M2D performed well in previous studies, our results suggest that M2D may be more sensitive to the randomness found in corrupted data (which was manually removed in previous studies).
Compared with rMSSD, pNN50, and HF, SDNN and LF generated by smartphone PPG generally have a higher agreement with the reference ECG [33,40]; our data demonstrated similar results. Neither rMSSD nor pNN50 reached an acceptable agreement between the smartphone PPG and the ECG, even with SPQI > 0.95 filters. Our data showed that these two measures were more susceptible to randomness and systematic bias. SDNN and log LF had an acceptable or good agreement when the data were generated by Valley, M1D, or Tangent. The same was true for log HF.

Limitations
Our data support the proposed method; however, there are still some limitations, as listed below. First, the underlying assumption of the SPQI is that most people share common cardiac waveform characteristics. Although this hypothesis is based on empirical studies [47], this premise limits the application of the SPQI to individuals with abnormal cardiac waveforms since the SPQI classifies samples that do not meet the preset pattern as poor quality. We expect future studies may pursue this research direction and try to differentiate corrupted data from valid but abnormal samples.
Second, the assumed application scenario of the proposed method is to determine the quality of a sample collected from a new participant based on the theoretical waveform pattern found by previous studies. We did not consider the possibility of using historical data for each individual to build a personalized quality index. Since cardiac waveforms have a larger between-group deviation (compared to other people) and smaller within-group differences (compared to one's historical data), a personalized quality index may help increase the accuracy, and resolve the shortcoming that the SPQI is not suitable for individuals with abnormal waveforms.
Third, there are many methods for improving smartphone PPG accuracy [27,29–33]. For example, adding a suitable bandpass filter for signal processing [28] or excluding data with RR intervals that differ by more than a certain threshold [70] are simple and effective approaches to reduce noise. In the current study, however, we did not use other proven noise reduction methods because we aimed to compare the relative accuracy of data filtered with the quality index rather than increase absolute accuracy. Whether the combination of the noise reduction methods and a quality index can further increase the accuracy and usability of smartphone PPG still warrants further analysis.
Fourth, the sampling rate has a significant influence on the accuracy of HRV measures. Although there have been many different suggestions for the minimum sampling rate (ranging from 25 Hz [42] to 125 Hz [43] in PPG studies and 50 Hz [71] to 1000 Hz [72] in ECG studies), most smartphone cameras sample at about 30 Hz, which is below the level of most suggestions. Therefore, the poor measuring quality caused by low frame rates is generally considered a potential challenge to the validity of using the smartphone PPG method. However, researchers have proposed various methods to improve the accuracy rate at low frame rates [33], and empirical studies have also indicated that smartphone PPG results are comparable to those obtained using gold standard ECGs [21,28,37–40]. There seem to be conflicting suggestions and conclusions in the literature, and therefore, more empirical evidence is required to clarify this issue. Further, smartphone-based physiological assessment applications are usually considered low-cost, convenient tools for public health and personal use. Many smartphone PPG studies, including the current study, aim to validate this new technology as an acceptable alternative when more sophisticated devices are not available, rather than using it as a substitute for medical-grade equipment.
Fifth, several traditional PPG quality indicators have been proposed in the literature [51,73]. However, the design of smartphones differs from that of traditional medical PPG devices, which usually have a higher sampling rate [74], are designed to reduce motion artifacts, and use transmitted light sources rather than reflected light sources. In addition, most traditional PPG quality indicators were not validated with HRV measures [73]. It is still unclear whether these traditional PPG quality indicators are applicable to smartphone PPG data and HRV assessment.
Sixth, in this exploratory study, we proposed only one model design and did not compare other possible alternatives such as using the Gaussian function family or changing optimization constraints. Future studies may consider conducting an optimal parameter search and finding better model designs for the SPQI.
Finally, the higher the threshold, the higher the accuracy, and the fewer the valid samples (Figure 5). The balance of the measurement quality and external validity of the research results is an essential requirement that should be considered carefully in future studies.

Conclusions
Smartphone PPG provides an unprecedented opportunity for both researchers and practitioners in clinical diagnoses, telemedicine, preventative medicine, and public health. Research on smartphone PPG also provides a theoretical foundation for several new research directions such as remote photoplethysmography [75] and PPG-based blood pressure estimation [76]. Although smartphone PPG may not become a substitute for the gold standard ECG, smartphones are easily accessible and reasonably accurate alternatives when medical-grade devices are extremely costly or unavailable (e.g., when dealing with an unexpected large-scale public health crisis, such as the recent coronavirus outbreak [77,78]). It is, however, an unfortunate reality that only a few researchers and ordinary users have used this new technology. The proposed quality index enables users to assess the credibility of the gathered HRV measures, which is essential to win the trust of practitioners or researchers in applied disciplines.
The number of participants in this study (n = 226 participants and 1336 collected samples) was relatively large compared with several other smartphone PPG studies [21,[38][39][40]79]. Therefore, the results from this study provide support, not only for the validity of the proposed SPQI, but also for the general value and practicality of using smartphone PPG in HRV analysis.

Figure 5.
Trade-off between the number of valid samples and the agreement (BAR or correlation coefficient) of log HF between the smartphone PPG (Tangent) and reference ECG.

Conflicts of Interest:
The authors declare no conflict of interest.
