Description of a Database Containing Wrist PPG Signals Recorded during Physical Exercise with Both Accelerometer and Gyroscope Measures of Motion

Wearable heart rate sensors such as those found in smartwatches are commonly based upon photoplethysmography (PPG), which shines a light into the wrist and measures the amount of light reflected back. This method works well for stationary subjects, but in exercise situations, PPG signals are heavily corrupted by motion artifacts. The presence of these artifacts necessitates the creation of signal processing algorithms for removing the motion interference and allowing the true heart related information to be extracted from the PPG trace during exercise. Here, we describe a new publicly available database of PPG signals collected during exercise for the creation and validation of signal processing algorithms extracting heart rate and heart rate variability from PPG signals. PPG signals from the wrist are recorded together with chest electrocardiography (ECG) to allow a reference/comparison heart rate to be found, and the temporal alignment between the two signal sets is estimated from the signal timestamps. The new database differs from previously available public databases because it includes wrist PPG recorded during walking, running, easy bike riding and hard bike riding. It also provides estimates of the wrist movement recorded using a 3-axis low-noise accelerometer, a 3-axis wide-range accelerometer, and a 3-axis gyroscope. The inclusion of gyroscopic information allows, for the first time, separation of acceleration due to gravity and acceleration due to true motion of the sensor. The hypothesis is that the improved motion information provided could assist in the development of algorithms with better PPG motion artifact removal performance.


Summary
Photoplethysmography (PPG) is a well-known noninvasive method for monitoring the heart. It operates by shining a light into the body and measuring either the amount of light that is reflected back or the amount of light that is transmitted through an appendage such as the finger, both of which vary with the amount of blood flow present [1]. Unlike the electrocardiogram (ECG), which places adhesive metal electrodes on the chest to monitor the electrical activity of the heart, PPG monitoring can be performed at peripheral sites on the body and does not need a conductive gel to make a good body contact. As a result, PPG sensors are finding substantial new applications in wearable devices and smartwatches (see [2] for one example) as the preferred modality for everyday heart monitoring by non-specialist users.
However, raw PPG signals are severely corrupted by motion artifacts (Figure 1). These arise from a number of sources, principally relative movement between the PPG light source/detector and the skin of the user during motion [3]. These artifacts obscure the heart-related information and have historically limited the use of PPG to relatively motion-free clinical situations [4]. Recently, a number of signal processing techniques have been proposed for separating true PPG components from motion artifact components, allowing PPG-based heart monitoring during physical exercise for the first time. See [3,5-12] for a small number of example algorithms.
Here, we describe and provide a new dataset of publicly available PPG recordings to help in the design and verification of new PPG signal processing algorithms for extracting heart rate during physical exercise. PPG signals recorded from the wrist are provided, together with measurements of the motion of the wrist and an ECG recording from the chest. The ECG recording allows a gold standard heart rate to be calculated, within the limits of how accurately the signals from the different battery-powered sensors, which have different clock drift rates, can be aligned in time. (This is discussed fully in Section 4.) Our new database complements the well-known 2015 IEEE Signal Processing Cup database [13] of PPG signals, which were collected from 23 subjects during running and arm-intensive exercises such as boxing. Our data allows out-of-sample testing of algorithms developed using the existing database, and our database makes two further contributions compared to this and other previously available PPG databases.
Firstly, PPG signals are recorded in four different exercise conditions:
• While walking on a treadmill.
• While running on a treadmill.
• While using an exercise bike set to a low resistance (giving high cycling speeds).
• While using an exercise bike set to a high resistance (giving low cycling speeds).
Our database can be used for the development of PPG heart rate algorithms for a wider range of health, sports, and exercise applications, and for the further validation and out-of-sample testing of existing methods. Secondly, we provide multiple measurements of the motion of the wrist during the exercise. To date, most algorithms using PPG data during exercise have been based upon adaptive filtering [14], using a reference signal of the recorded motion to subtract the motion interference from the PPG spectrum so that the heart rate components can be seen. This motion estimate has been based upon only one 3-axis accelerometer co-located with the PPG sensor. We have recorded estimates of the PPG motion interference signal using:
• A ±2 g low-noise 3-axis accelerometer.
• A ±16 g wide-range 3-axis accelerometer.
• A 3-axis gyroscope.
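The adaptive filtering idea described above can be illustrated with a minimal Python/NumPy sketch of a generic normalised least-mean-squares (NLMS) noise canceller. It is not a reproduction of [14] or of any specific published PPG algorithm, and it works in the time domain rather than on the spectrum. The function name nlms_cancel, the 16-tap filter length and the step size mu = 0.05 are illustrative assumptions, and a single motion channel (for example one accelerometer axis) is assumed as the reference. The filter predicts the motion artifact in the PPG from recent reference samples, and the prediction error is taken as the cleaned PPG.

```python
import numpy as np

def nlms_cancel(ppg, motion_ref, n_taps=16, mu=0.05, eps=1e-8):
    """Remove motion interference from a PPG trace with a normalised LMS filter.

    ppg:        1-D motion-corrupted PPG samples
    motion_ref: 1-D motion reference, e.g. one accelerometer axis (assumption)
    Returns the filter error, i.e. the PPG with the estimated motion removed.
    """
    ppg = np.asarray(ppg, dtype=float)
    motion_ref = np.asarray(motion_ref, dtype=float)
    w = np.zeros(n_taps)             # adaptive FIR weights
    x = np.zeros(n_taps)             # most recent reference samples, newest first
    cleaned = np.empty_like(ppg)
    for n in range(ppg.size):
        x = np.roll(x, 1)
        x[0] = motion_ref[n]
        e = ppg[n] - w @ x           # PPG minus the current motion estimate
        w += mu * e * x / (eps + x @ x)  # normalised LMS weight update
        cleaned[n] = e
    return cleaned
```

Additional motion channels (the other axes, the second accelerometer, or the gyroscope) could be handled by cascading further filters of this kind; how best to combine them is left to the algorithm designer.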
The use of a gyroscope allows the angular rotation and orientation of the PPG sensor to be captured in addition to acceleration data, for the first time. From this, it is possible to separate acceleration components due to gravity from acceleration components due to true motion of the sensor [15]. (The mathematics required for using the gyroscope data to remove the effects of gravity from the accelerometer data is given in the Appendix.) Such processing is commonly done for activity tracking, with examples given in [16-18], and it allows a better estimate of the true motion present than accelerometer information alone. This could be used to develop bespoke gyroscope-driven algorithms for removing motion interference from PPG signals, or could potentially be used with existing algorithms by providing them with a more accurate motion estimate as input (as illustrated in the Appendix). To our knowledge, due to a lack of datasets that contain both accelerometer and gyroscope data, there are no current algorithms that use gyroscope information to help remove motion interference from PPG signals, and enabling the development of such algorithms is a major contribution of the new dataset.
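As a rough illustration of how gyroscope data can separate gravity from true motion, the Python sketch below tracks the gravity vector in the sensor frame by integrating the angular rate, then subtracts it from each accelerometer reading. This is not necessarily the procedure given in the Appendix. It assumes that the wrist is still for the first n_init samples, that the accelerometer reads in g and the gyroscope in degrees per second, and that both share the same sensor axes; remove_gravity and rotate are hypothetical helper names. Pure gyroscope integration drifts over long records, so practical systems usually add accelerometer corrections, for example with a complementary or Kalman filter.

```python
import numpy as np

def rotate(v, rotvec):
    """Rotate vector v by rotvec (unit axis times angle in rad), Rodrigues' formula."""
    angle = np.linalg.norm(rotvec)
    if angle < 1e-12:
        return v.copy()
    k = rotvec / angle
    return (v * np.cos(angle)
            + np.cross(k, v) * np.sin(angle)
            + k * np.dot(k, v) * (1.0 - np.cos(angle)))

def remove_gravity(accel_g, gyro_dps, fs=256.0, n_init=256):
    """Return the non-gravity (motion) part of (N, 3) accelerometer data in g.

    accel_g:  (N, 3) accelerometer samples in g (assumed units)
    gyro_dps: (N, 3) angular rate in degrees per second (assumed units),
              expressed in the same sensor axes as the accelerometer
    """
    accel = np.asarray(accel_g, dtype=float)
    omega = np.deg2rad(np.asarray(gyro_dps, dtype=float))
    dt = 1.0 / fs
    # Assume the wrist is still at the start, so the mean reading there is gravity.
    g = accel[:n_init].mean(axis=0)
    motion = np.empty_like(accel)
    for i in range(len(accel)):
        motion[i] = accel[i] - g
        # A world-fixed vector, seen from the rotating sensor, turns the opposite way.
        g = rotate(g, -omega[i] * dt)
    return motion
```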
Using multiple accelerometers is helpful when working across a wide range of exercise situations, as it allows both large accelerations and low-noise measurements to be captured simultaneously. The ±2 g accelerometer has a lower noise floor than the ±16 g accelerometer and so can accurately measure smaller motion components without them being corrupted by noise. In turn, the ±16 g accelerometer still measures the motion in cases where the ±2 g accelerometer saturates.
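One simple way to exploit both accelerometers, shown here as a hypothetical sketch rather than a recommended method, is to keep the low-noise ±2 g reading unless it is close to saturation and switch to the ±16 g reading otherwise. It assumes both accelerometers have been calibrated to g and expressed in the same axes, which should be checked against the Shimmer documentation; the 95% threshold and the function name are arbitrary choices.

```python
import numpy as np

def fuse_accelerometers(acc_low_noise, acc_wide_range, full_scale_g=2.0, margin=0.95):
    """Per-sample, per-axis choice between the two accelerometers.

    Both inputs are (N, 3) arrays in g, assumed to share the same axes.
    The low-noise reading is kept unless it is within `margin` of its
    full-scale range, where saturation is likely; then the wide-range
    reading is used instead.
    """
    low = np.asarray(acc_low_noise, dtype=float)
    wide = np.asarray(acc_wide_range, dtype=float)
    near_saturation = np.abs(low) >= margin * full_scale_g
    return np.where(near_saturation, wide, low)
```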
This article provides an overview of the new PPG database, which will be of use for the development of new heart rate and heart rate variability algorithms using ECG and PPG data during exercise. Section 2 describes how the data is stored and accessed. Section 3 describes the experimental methodology and data collection procedure. Finally, Section 4 gives important notes on how the data is best used, particularly with regard to the synchronization between the PPG and motion data and the reference ECG signal.

Data Description
The database consists of multiple data records, one per participant and exercise activity, as described in Section 3. Within each data record, thirteen signals are present, and their names, units, and descriptions are defined in Table 1. The data is stored in Physionet WaveForm DataBase (WFDB) format [19] such that each record consists of two files, one with a .hea extension and one with a .dat extension. The .hea file provides header information, including the signal names and units given in Table 1, and the parameters required to load the data values stored in the associated .dat file. These values are handled automatically by the Physionet toolkit software, as discussed below. The first line in the .hea file also contains the sampling frequency (256 Hz for all signals) and the number of samples in the record.
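The Physionet toolkit reads the header automatically, but to illustrate what the first line holds, a small parser sketch follows. It assumes the standard WFDB record-line layout (record name, number of signals, sampling frequency, number of samples), skips blank and comment lines, and falls back to the WFDB default of 250 Hz when the frequency field is absent; the function name parse_record_line is hypothetical.

```python
def parse_record_line(header_text):
    """Return (record_name, n_signals, fs_hz, n_samples) from a WFDB .hea header."""
    for raw in header_text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):   # skip blank and comment lines
            break
    else:
        raise ValueError("no record line found in header")
    fields = line.split()
    record_name = fields[0].split("/")[0]       # drop any multi-segment suffix
    n_signals = int(fields[1])
    fs_hz = 250.0                               # WFDB default if the field is omitted
    if len(fields) > 2:
        # The frequency field may carry extras such as "256/..." or "256(0)".
        fs_hz = float(fields[2].split("/")[0].split("(")[0])
    n_samples = int(fields[3]) if len(fields) > 3 else None
    return record_name, n_signals, fs_hz, n_samples
```

For a record such as s1_walk, passing the contents of s1_walk.hea should return 13 signals at 256.0 Hz together with the record length.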
A large number of tools are available for loading and processing WFDB format files. We recommend using the functions provided by the Physionet Toolkit [19] (https://physionet.org/physiotools/), which are available in C, C++, MATLAB, Python and other languages. The signals can be loaded using the rdsamp function. In MATLAB this is simply:
[tm, signals] = rdsamp(filename);
where filename is the name of the data record to load, tm gives the sample number or time in the record, and signals is an M × 13 matrix with one column for each signal and M time samples. Record names are given in the format sX_activity, where X is a unique identification number for each participant and activity is a description of the exercise being performed in the data record. For example, after running the rdsamp command, the ECG trace can be plotted as:
plot(signals(:,1))
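The same record can be loaded in Python with the wfdb package from PhysioNet, whose rdsamp function returns the signal array and a dictionary of header fields (including 'fs', 'sig_name' and 'units'). The sketch below assumes the .hea/.dat pair is in the working directory and that wfdb has been installed separately; the helpers load_record and get_signal are hypothetical, and the actual signal names are those listed in Table 1.

```python
import numpy as np

def load_record(record_name):
    """Load a WFDB record such as "s1_walk" (no extension) from the working directory."""
    import wfdb  # third-party PhysioNet package: pip install wfdb
    signals, fields = wfdb.rdsamp(record_name)  # (M x 13 array, header dictionary)
    return signals, fields

def get_signal(signals, fields, name):
    """Return the column whose header name matches `name` (names as in Table 1)."""
    return np.asarray(signals)[:, fields["sig_name"].index(name)]
```

NumPy indexing is 0-based, so signals[:, 0] in Python corresponds to signals(:,1) in MATLAB, i.e. the ECG trace, and fields["fs"] should report 256.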
An example of the signals collected during walking is shown in Figure 2. An example of the signals collected during low-resistance biking is shown in Figure 3, which gives seventy seconds of data from record s6_low_resistance_bike together with zoomed-in time domain views of the ECG and PPG traces between 10 s and 12 s. If desired, users can generate further example plots by running the commands given above.

Methods
Measurements were taken using an ECG unit placed on the chest, together with a PPG sensor and an inertial measurement unit placed on the left wrist, while participants used an indoor treadmill and exercise bike.
Single-channel, two-electrode ECG recordings were taken using an Actiwave (CamNtech, Cambridge, UK) recorder [20] and pre-gelled self-adhesive silver-silver chloride (Ag/AgCl) electrodes, as are standard for ECG monitoring. These were placed on the upper chest with one electrode on either side of the heart. The 4 mm snap connector electrodes were connected to the 1 mm non-touchproof connector on the Actiwave using 15 mm converter cables provided by CamNtech. Movement of these cables introduces artifacts into the recorded ECG trace, essentially standard ECG cable artifacts [21]. To minimize such movements, the cables were taped to the skin using micro-porous surgical tape. A typical set-up for the ECG unit is shown in Figure 4. R peaks in this ECG trace were identified by hand, and these times are included in the database to allow a gold standard reference heart rate comparison. The R peak times are referenced assuming that the first sample in the ECG trace occurs at time 0 s.
PPG and motion data were recorded using a Shimmer 3 GSR+ unit (Shimmer Sensing, Dublin, Ireland) [22]. This contains a gyroscope, a low-noise accelerometer and a wide-range accelerometer integrated into a single package. (Integrated magnetometer and pressure sensors are also present but were not used.) A reflective mode PPG sensor with a 510 nm green LED was connected to the main Shimmer unit using the 3.5 mm headphone port. This PPG sensor was then glued to the main Shimmer unit, as shown in Figure 5 (top), to give a rigid connection and allow the movement sensors inside the main Shimmer unit to accurately record the movement of the PPG sensor. The combined unit was then placed on the left wrist, as shown in Figure 5 (bottom), in approximately the position of a standard watch. Care was taken to ensure that the PPG light source was pointing into the skin with minimal light leakage between the sensor surface and the skin, as such leakage would also let ambient light into the PPG light detector.
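Assuming the manually annotated R peak times have been read into an array of times in seconds (measured from the first ECG sample, as described above), a reference heart rate can be computed in overlapping windows of the kind used by many PPG heart rate algorithms. This Python sketch computes 60 divided by the mean R-R interval in each window; the 8 s window, the 2 s step and the function name are assumptions, not requirements of the database.

```python
import numpy as np

def windowed_heart_rate(r_peak_times_s, duration_s, window_s=8.0, step_s=2.0):
    """Reference heart rate (beats per minute) in overlapping windows.

    r_peak_times_s: R peak times in seconds, with the first ECG sample at 0 s.
    Returns the window start times and the heart rate in each window,
    computed as 60 / mean R-R interval (NaN if fewer than two peaks).
    """
    r = np.sort(np.asarray(r_peak_times_s, dtype=float))
    starts = np.arange(0.0, duration_s - window_s + 1e-9, step_s)
    rates = np.full(starts.size, np.nan)
    for k, start in enumerate(starts):
        peaks = r[(r >= start) & (r < start + window_s)]
        if peaks.size >= 2:
            rates[k] = 60.0 / np.mean(np.diff(peaks))
    return starts, rates
```

To locate the peaks in the ECG array instead, each time can be converted to a sample index as round(t × 256).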
Participants were then asked to perform one or more types of exercise. Four options were available: walking on a treadmill at a normal pace for up to 10 min; a light jog/run on a treadmill, at a pace set by the participant, for up to 10 min; pedalling on an exercise bike set at a low resistance for up to 10 min; and pedalling on an exercise bike set at a higher resistance for up to 10 min. The objective was to introduce a range of representative motion artifacts into the collected heart signals, not to carry out a set exercise routine. As such, each participant was free to set the treadmill pace and the pedal rate on the bike so that they were comfortable, and to change these settings or stop the exercise at any time. Most participants spent between 4 and 6 min on each activity, and the duration of each data record is given in Table 2. All signals were sampled at 256 Hz, and the start and stop times of each activity were recorded. Records from eight participants are present (three male, five female), aged 22-32 (mean 26.5). In all cases the participant started from rest, so in each record the heart rate should begin at a low resting value and then increase during the activity. For participants with data records for more than one type of exercise, these were done as a single recording with a break of at least 10 min between activities. The data was then segmented offline into the portions corresponding to the different activities. (See notes in Section 4.) All procedures were approved by the University of Manchester Research Ethics Committee, and written consent was obtained from all participants. This written consent included the option to not have the recorded data publicly shared, and the database only contains signals from participants who agreed to data sharing.
For the walking and running records, the database contains the raw PPG and motion signals present after segmentation into the appropriate activity. No filtering is applied beyond that built into the Shimmer hardware. For the cycling records, large amounts of high-frequency noise were present in the PPG traces. Prior to conversion to WFDB format, the cycling PPG traces were therefore low-pass filtered using a second-order IIR Butterworth digital filter with a 15 Hz cut-off, applied with zero phase (and hence zero group delay) using the MATLAB filtfilt command. All ECG records have a 50 Hz notch filter applied by the Actiwave control software to remove mains interference.
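The two filters described above can be approximated in Python with SciPy. This is a sketch, not the exact processing used for the database, and users do not need to re-apply either filter to the released data. The low-pass filter follows the stated design (second-order Butterworth, 15 Hz cut-off, zero-phase forward-backward filtering, which squares the magnitude response). The 50 Hz notch is applied inside the Actiwave software and its design is not documented here, so the quality factor of 30 is an assumption; the function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 256.0  # Hz, sampling rate of all signals in the database

def lowpass_ppg(ppg, cutoff_hz=15.0, order=2, fs=FS):
    """Zero-phase Butterworth low-pass, as described for the cycling PPG traces."""
    b, a = butter(order, cutoff_hz, btype="low", fs=fs)
    return filtfilt(b, a, np.asarray(ppg, dtype=float))

def notch_mains(ecg, mains_hz=50.0, quality=30.0, fs=FS):
    """Zero-phase IIR notch at the mains frequency (illustrative design)."""
    b, a = iirnotch(mains_hz, quality, fs=fs)
    return filtfilt(b, a, np.asarray(ecg, dtype=float))
```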

Usage Notes
The data is freely available from the Physionet website, https://physionet.org/works/WristPPGduringexercise/, with the dataset name Wrist PPG during exercise. Nineteen records are available in total, mixing records from the same person doing different activities with records from different people doing different activities. This allows a standard leave-one-out cross-validation approach to be used when finding the parameters of developed heart rate detection algorithms: all but one of the records are used as the training set to search potential parameter values, and the remaining record is used to assess performance. This is repeated so that each record serves once as the test record, and such approaches have been shown to give very good generalisability [23].
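A minimal Python sketch of the leave-one-out split over record names follows; the function name is hypothetical. The optional by_participant mode, which holds out all records from one participant together using the sX_activity naming scheme, is an addition not described in the text. It may be useful because some participants contribute several records, so a plain leave-one-out split can place the same person in both the training and test sets.

```python
def leave_one_out(records, by_participant=False):
    """Yield (train, test) lists of record names for cross-validation.

    With by_participant=True, all records sharing the "sX" prefix of the
    sX_activity naming scheme are held out together.
    """
    if by_participant:
        groups = {}
        for rec in records:
            groups.setdefault(rec.split("_", 1)[0], []).append(rec)
        held_out = list(groups.values())
    else:
        held_out = [[rec] for rec in records]
    for test in held_out:
        train = [rec for rec in records if rec not in test]
        yield train, test
```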
When using the data, it is important to note that the Shimmer device was attached to the wrist using a continuously adjustable strap, rather than a watch-type band with discrete notches. Both attachment methods are used in commercial wrist PPG devices (see, for example, the Scosche Rhythm+ [24]); however, our attachment method may not be the same as that used for the IEEE Signal Processing Cup database [13]. Care should therefore be taken when directly comparing the two signal sets, as any strap-related artifacts may manifest differently. Our new dataset should be seen as a stand-alone complement that also provides signals during biking and from gyroscopes.
Beyond this, the principal challenge in collecting and using the data is the synchronization between the signals collected by the different devices. Signals 2-12, that is everything apart from the ECG trace and R peak times, were collected using a single Shimmer 3 device, so all of these signals are sampled simultaneously and are intrinsically aligned. The ECG trace is collected using a separate Actiwave device, so this trace must be aligned in time to the Shimmer data in order to use it as a comparison case. The Actiwave does not provide precise timing information for every sample taken. It natively stores its data in EDF format [25], which only provides the time of the first sample. After this, samples are assumed to be taken correctly, on average, every 3.9 ms (1/256 s, corresponding to a 256 Hz sampling rate). In contrast, the Shimmer device records the time at which each sample is taken in Unix time format, and although it is set to a 256 Hz sampling frequency, these samples are not uniformly spaced at 3.9 ms as would be ideal. Figure 6 illustrates an example of the actual effective sampling rate, which on average is very close to 256 Hz but varies slightly around this over time. Across all records, the average actual sampling rate of the Shimmer device is 255.69 Hz. In any one recording, the signals collected by the Shimmer device therefore consistently have fewer samples than those provided by the Actiwave. This error accumulates through the record and is largest at the end.
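A short worked example of the timing arithmetic: the nominal sample period at 256 Hz is 1/256 s = 3.90625 ms. If the Shimmer averages 255.69 Hz, a segment of T seconds contains about 0.31 × T fewer samples than the 256 Hz ECG, so a 5 min (300 s) activity is short by about 93 samples, or roughly 0.36 s. Individual records deviate from the 255.69 Hz average, which is why the values in Table 3 differ. The Python sketch below assumes the per-sample Shimmer time stamps have already been converted to seconds; the function names are hypothetical.

```python
import numpy as np

NOMINAL_FS = 256.0   # Hz, configured rate of both devices

def instantaneous_rate(timestamps_s):
    """Instantaneous sampling rate (Hz) from per-sample time stamps in seconds."""
    return 1.0 / np.diff(np.asarray(timestamps_s, dtype=float))

def sample_deficit(duration_s, actual_fs=255.69, nominal_fs=NOMINAL_FS):
    """Shimmer samples missing relative to an ideal 256 Hz record, in samples and seconds."""
    missing = duration_s * (nominal_fs - actual_fs)
    return missing, missing / nominal_fs
```

For example, sample_deficit(300.0) gives approximately (93.0, 0.363).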
The final database files only contain data from the exercise periods; data from set-up and rest periods is not included. The recorded start and stop times of each activity were used to extract the appropriate segments of data. Actiwave data is extracted by assuming the 256 Hz sampling rate and finding the samples between the start and stop times. For example, if the ECG recording started at 10:40:00, and the walking activity started at 10:42:12 and finished at 10:46:48, the ECG samples in this segment are extracted from the full trace as
ECG_segment = ECG_raw_trace(132×256 : 408×256)
where 132 is the number of seconds between the start of the ECG recording and the start of the activity, and 408 is the number of seconds between the start of the ECG recording and the end of the activity. In contrast, the Shimmer data is extracted by finding the sample numbers corresponding to the start and stop times, using the time stamp provided for each sample. For example, for the PPG trace:
PPG_segment = PPG_raw_trace(find(sample with time stamp 10:42:12) : find(sample with time stamp 10:46:48))
As the Shimmer device has an average sampling rate slightly below 256 Hz, this process results in signals 2-12 having fewer samples than the ECG signal from the Actiwave. To store the data in WFDB format, where all traces must have the same number of samples, NaNs have been appended to the end of each Shimmer signal to equalize the lengths.
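The two segmentation rules above can be sketched in Python with NumPy, together with the NaN padding used to equalize lengths. Assumptions: the Shimmer time stamps have already been converted to seconds on the same clock as the activity start and stop times, and the function names are hypothetical. Python slices are 0-based and half-open, so the ECG slice here takes samples from 132×256 up to, but not including, 408×256; depending on the indexing convention, the expression in the text may include one more sample at the end.

```python
import numpy as np

FS = 256  # nominal sampling rate in Hz

def segment_ecg(ecg_raw, start_offset_s, stop_offset_s, fs=FS):
    """Actiwave ECG: assume uniform sampling at fs from the recording start."""
    i0 = int(round(start_offset_s * fs))
    i1 = int(round(stop_offset_s * fs))
    return np.asarray(ecg_raw)[i0:i1]          # half-open, 0-based slice

def segment_shimmer(data_raw, timestamps_s, start_s, stop_s):
    """Shimmer signals: select samples by their own per-sample time stamps."""
    t = np.asarray(timestamps_s, dtype=float)
    i0 = np.searchsorted(t, start_s, side="left")
    i1 = np.searchsorted(t, stop_s, side="right")
    return np.asarray(data_raw)[i0:i1]

def pad_with_nan(x, length):
    """Append NaNs so that x has `length` samples, as done in the WFDB files."""
    x = np.asarray(x, dtype=float)
    if x.size > length:
        raise ValueError("signal is longer than the target length")
    out = np.full(length, np.nan)
    out[:x.size] = x
    return out
```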
The differences in duration between the Actiwave and Shimmer records are given in Table 3. In the worst case, a 1.21 s difference is present. Within any one record, we do not believe that a misalignment of at most 1.21 s, and often substantially less, should lead to a substantial error when comparing the reported heart rates. Most current PPG heart rate algorithms operate on data in overlapping 8 s windows, which blurs the information in time and removes the need for an extremely precise alignment between the PPG and ECG signals. For handling this timing misalignment, we suggest that the simplest procedure is to ignore it and treat the signals as if they had the same duration and the same number of samples. Nevertheless, if more precise alignment of the signals is required, signal 12 in each record gives the time in seconds of each sample taken by the Shimmer device. This wraps from 0 to 60 and is the absolute time of the sample with the day, hour, and minute information removed. If desired, this timing information could be used to resample one of the time series, either resampling the ECG at the Shimmer sample times, or resampling the Shimmer data onto a uniform grid with the same number of samples as the ECG trace.
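If more precise alignment is needed, signal 12 can be unwrapped and used to interpolate the ECG onto the Shimmer sample times. The Python sketch below assumes that signal 12 holds fractional seconds, that the first ECG and Shimmer samples of a record coincide (as in the database segmentation), and that linear interpolation is adequate; the function names are hypothetical. A wrap is detected wherever the time stamp jumps backwards by more than half a period.

```python
import numpy as np

def unwrap_seconds(t_wrapped, period=60.0):
    """Turn signal-12 style time stamps (wrapping 0 to 60 s) into a monotonic time axis."""
    t = np.asarray(t_wrapped, dtype=float)
    t = t[~np.isnan(t)]                        # drop the NaN padding at the end
    wraps = np.concatenate(([0], np.cumsum(np.diff(t) < -period / 2)))
    return t + period * wraps

def ecg_at_shimmer_times(ecg, shimmer_t_wrapped, fs=256.0):
    """Linearly interpolate the ECG onto the Shimmer sample times."""
    t = unwrap_seconds(shimmer_t_wrapped)
    t = t - t[0]                               # assume first ECG and Shimmer samples coincide
    ecg = np.asarray(ecg, dtype=float)
    ecg_t = np.arange(ecg.size) / fs           # ECG assumed uniformly sampled at fs
    return np.interp(t, ecg_t, ecg)
```

Linear interpolation slightly smooths sharp features such as R peaks; for heart rate comparison the annotated R peak times can be used directly instead.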
Finally, our segmentation procedure takes the Actiwave ECG data as the baseline and assumes that it is correctly sampled at 256 Hz after the given start time. Multiple activities done by the same person were performed sequentially in a fixed order: walking, running, low-resistance biking, high-resistance biking. The segmentation accounts for drift in the Shimmer clock, through its per-sample time stamps, but not for any drift in the Actiwave clock. Thus, if the Actiwave clock drifts significantly from 256 Hz, the synchronization errors for the later recordings may be larger than those reported in Table 3. As the Actiwave output provides no per-sample timing information, it is not possible to check this issue further or to provide a quantified estimate of the effect. As far as possible, given the PPG artifacts present during motion, we have checked by eye that the PPG and ECG traces in these later activities are correctly aligned, and we do not believe this to be a significant issue. We mention it here for completeness, and to help users understand how the database was generated and its limitations when interpreting results based upon it.

Figure 1. Examples of wrist photoplethysmography (PPG) signals. (Top) An example with no motion present shows clear peaks for each heart beat, here for a participant with a low resting heart rate of 42 beats per minute. Low-frequency baseline wander is seen but no other interference is present. (Bottom) An example taken during running shows many spurious peaks due to motion interference. Note that the two PPG traces are presented in arbitrary units and are not on the same y-scale.

Figure 2. Examples of the 12 signals collected during walking. ECG R peak times are also included in the database. Seventy seconds of data from record s1_walk. Zoomed-in time domain information for the ECG and PPG traces between times 67 s and 69 s is also shown.

Figure 3. Examples of the signals collected during low-resistance biking, from record s6_low_resistance_bike.

Figure 5. Shimmer 3 unit used for recording the PPG, low-noise acceleration, wide-range acceleration, and gyroscope data. (Top) The PPG sensor is glued to the main Shimmer unit to give a rigid connection, so that the motion sensors in the main Shimmer unit accurately record the movement of the PPG sensor. (Bottom) Placement of the PPG sensor on the left wrist in approximately the position of a standard watch.

Table 1. Name, unit, and description of each signal in every file of the database.

Table 2. Duration of each data record to the nearest second (MM:SS). "-" indicates that the database does not contain this activity for this participant.

Figure 6. Instantaneous sampling rate of the Shimmer, found from the sample time stamps provided. The average rate is very close to 256 Hz. Low values (<50 Hz) occur approximately once per minute.

Table 3. Difference between the number of samples provided by the Actiwave (ECG signal) and the Shimmer (all other signals), expressed in seconds. In all cases, the Actiwave has more samples, and the Shimmer data is padded with NaNs at the end of the record to equalize the lengths.