Automated P-wave quality assessment for wearable sensors



Introduction
Continuous electrocardiogram (ECG) monitoring using wearable sensors enables early identification of several types of arrhythmia. However, data acquired by wearable sensors are susceptible to artefacts (e.g., due to poor sensor contact or motion artefact). Hence, a challenge in the use of continuous wearable monitoring, where data are collected without clinical supervision, is ensuring that only high-quality signals are used to derive clinical measurements. Physiological parameters extracted from artefact-corrupted signals may be inaccurate, which can lead to a high frequency of false alarms [1]. Therefore, assessment of signal quality is a crucial step for accurate and precise data analysis, such as extracting features, identifying deteriorations, and generating alerts.
Hospital patients recovering from major cardiac surgery are at high risk of paroxysmal atrial fibrillation (AF), an arrhythmia which can be life-threatening. Wearable sensors are routinely used to monitor patients at risk of AF. The sensors typically have three or five ECG leads attached to the chest, plus an optional wired finger probe, and are carried in a bag tied to the patient. They provide ECG, respiratory rate, and arterial blood oxygen saturation monitoring. The data are transmitted in real time to a central monitoring station, and alarms are sounded when arrhythmias, such as AF, are detected. Hence, wearable sensors currently provide early recognition of AF, allow it to be treated promptly, and help minimize the resulting complications. However, wearable sensors could have greater impact if used to predict AF before its onset, rather than simply detecting it when it occurs. This would allow prophylactic treatments to be administered, potentially preventing the arrhythmia and consequently reducing both the risk of complications and healthcare costs.
Subtle alterations in P-wave morphology have been found to be predictive of AF [2,3]. However, the relatively low amplitude of the P-wave makes it highly susceptible to noise, which severely affects the extraction and quantification of its features. This makes it difficult to distinguish physiological changes in morphology from changes due to noise. Thus, the clinical utility of techniques for predicting AF could be highly limited if parameters are estimated erroneously from low-quality P-waves, causing false alerts. It is therefore essential to exclude artefact-corrupted P-waves from analyses. Previously, this has been done either manually, with expert cardiologists excluding unreliable P-waves by hand [2], or automatically, with a conventional template-comparison method [3,4]. In the latter, individual P-waves were discarded if their cross-correlation coefficient with a template P-wave, obtained with an averaging procedure, was lower than 0.7 [3,4].
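The conventional template-comparison rule of [3,4] can be sketched as follows. This is a minimal illustration, not the cited authors' code: it assumes the P-waves have already been extracted into equal-length windows, builds the template as a plain ensemble average, and uses the zero-lag Pearson correlation as the "cross-correlation coefficient" (the cited studies may also align beats before comparing them). The function name `template_correlation_filter` is hypothetical.

```python
import numpy as np

def template_correlation_filter(p_waves, threshold=0.7):
    """Keep P-waves whose correlation with the ensemble-average template
    is at least `threshold` (0.7 in the cited studies).

    p_waves: 2-D array with one equal-length P-wave window per row.
    Returns (template, coefficients, keep_mask).
    """
    p_waves = np.asarray(p_waves, dtype=float)
    template = p_waves.mean(axis=0)  # averaging procedure -> template P-wave
    # Zero-lag Pearson correlation of every beat with the template
    beats_c = p_waves - p_waves.mean(axis=1, keepdims=True)
    tmpl_c = template - template.mean()
    num = beats_c @ tmpl_c
    den = np.linalg.norm(beats_c, axis=1) * np.linalg.norm(tmpl_c)
    coeffs = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return template, coeffs, coeffs >= threshold
```

Because the template is itself an average of the beats being judged, heavily corrupted beats also leak into it; this is one motivation for a staged approach that removes gross artefacts before building finer templates.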
In this study, we built on the previous work by developing an optimized P-wave quality index (PQI) designed to identify high-quality P-waves, from which clinical measurements can be reliably derived. This tool was designed using several P-wave quality assessment features, and its performance was assessed through comparison with manually annotated P-waves from a database of ECG recordings acquired using wearable sensors.

Data Description
The openly available paroxysmal AF prediction database (AFPDB) [5] was used to develop the PQI and assess its performance. This database was originally assembled to develop techniques for predicting paroxysmal AF. The dataset contains wearable ECG recordings during sinus rhythm, acquired from both healthy controls and patients who subsequently developed AF. All ECG signals have a duration of 30 min and are excerpts from recordings acquired with a two-channel long-term Holter wearable sensor at a sampling frequency of 128 Hz and with a 12-bit resolution [5].
This study used 44 records (23 from controls and 21 from AF patients) from the AFPDB. The signals were pre-processed with a bandpass filter with cut-off frequencies of 0.5 and 40 Hz. For each record, the ECG lead in which P-waves were most visible was chosen, and P-wave quality was manually annotated and double-checked. P-waves were classified into three distinct classes, as illustrated in Figure 1:
Class A: High-quality, clean P-waves.
Class B: Complete noise or absent P-waves. This included P-waves with no resemblance to normal P-wave morphology, due to motion artefacts, severe baseline wander, or muscle activation interference, as well as the simple absence of a P-wave.
Class C: Unreliable, noise-distorted P-waves. This included P-waves that had some resemblance to normal P-wave morphology but were still unreliable (i.e., their morphology was still excessively affected by noise). The degree of distortion varied from mild to heavy (see Figure 1).
This resulted in 22 h of recording (44 records of 30 min each), corresponding to 97,989 P-waves: 88,606 annotated as class A (90.4%), 5102 as class B (5.2%), and 4281 as class C (4.4%).
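The 0.5 to 40 Hz band-pass pre-processing step can be illustrated with a short sketch. The paper does not state the filter design, so the version below is an assumption chosen only to keep the example dependency-light: a zero-phase "ideal" filter that zeroes every FFT bin outside the pass band. A practical implementation would more likely use an IIR or FIR design (for example a Butterworth filter applied forward and backward). The name `fft_bandpass` is hypothetical.

```python
import numpy as np

FS_HZ = 128.0  # AFPDB sampling frequency

def fft_bandpass(signal, fs=FS_HZ, low_hz=0.5, high_hz=40.0):
    """Zero-phase ideal band-pass: keep only FFT bins within [low_hz, high_hz].

    Removes DC offset and slow baseline wander (< 0.5 Hz) as well as
    high-frequency interference (> 40 Hz).
    """
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)  # bin centre frequencies in Hz
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=x.size)
```

An ideal brick-wall filter can introduce ringing around sharp transients such as QRS complexes, which is one reason a smoother filter design is usually preferred on real ECG.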

P-Wave Quality Index (PQI) Algorithm
The PQI tool algorithm is depicted in Figure 2. Briefly, the process started with P-wave detection and signal extraction and was followed by two different decision-making stages based on template comparisons. The first decision stage aimed to remove P-waves with no resemblance to the normal P-wave morphology (class B), while the second was more refined, removing P-waves still excessively distorted by noise, and hence unreliable (class C).

[Figure 2 flowchart: ECG → P-wave signal extraction → Decision Stage 1 (removal of class B P-waves) → Decision Stage 2 → P-wave quality labels]
Figure 2. The P-wave quality index (PQI) tool consisted of three steps. In the first, the P-waves were identified in the electrocardiogram (ECG) and extracted. P-wave template morphologies and P-wave features were extracted from these P-waves and were used in two different decision-making stages. This resulted in each P-wave being labelled as either high quality (class A) or low quality (classes B and C), allowing artefact-corrupted and unreliable P-waves to be excluded from the analysis.

P-Wave Signal Extraction
The first step towards assessing P-wave quality was to detect the P-waves in the ECG and extract them. This step was of high importance, as proper P-wave quality assessment is only possible with accurate P-wave signal extraction. First, P-wave peaks were identified: each R-peak was located using the widely used Pan, Hamilton and Tompkins algorithm [6], and P-wave peaks were then detected using the phasor transform delineation algorithm [7]. Finally, each P-wave signal was extracted as a 300-ms window centered on the corresponding P-wave peak. This window length ensured that the whole P-wave and its surrounding baseline were captured.
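The windowing step can be sketched as below, assuming the P-wave peak sample indices are already available from the delineation stage (the R-peak and phasor-transform detectors themselves are not reproduced here). At 128 Hz, 300 ms corresponds to 38.4 samples; this sketch assumes a symmetric window of 19 samples either side of the peak, i.e. 39 samples (about 305 ms), and skips peaks too close to the record edges. The function name `extract_p_windows` is hypothetical.

```python
import numpy as np

def extract_p_windows(ecg, p_peaks, fs=128.0, width_s=0.300):
    """Cut a window of ~width_s seconds centred on each P-wave peak.

    Peaks too close to the record edges for a full window are skipped.
    Returns (windows, kept_peak_indices); windows has one row per kept peak.
    """
    ecg = np.asarray(ecg, dtype=float)
    half = int(round(width_s * fs / 2))  # 19 samples at 128 Hz
    windows, kept = [], []
    for peak in p_peaks:
        start, stop = peak - half, peak + half + 1  # odd length, peak at centre
        if start < 0 or stop > ecg.size:
            continue
        windows.append(ecg[start:stop])
        kept.append(peak)
    return np.array(windows).reshape(len(kept), 2 * half + 1), np.array(kept, dtype=int)
```

Keeping the peak at the exact centre of an odd-length window makes later template averaging and alignment symmetric around the detected peak.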

Decision Stage 1: Removal of Completely Noisy or Absent P-Waves
The aim of the first decision-making stage was to exclude P-waves that were heavily distorted by noise, as well as absent P-waves (class B). This first stage had the additional purpose of removing P-waves that could otherwise bias the more refined templates created during the second decision-making stage.
A P-wave template was created for each 30-min recording as the average shape of the P-waves in that period. Those P-waves were then aligned with the template, and several features were extracted from each of them. These features were used as candidate features for the PQI to assess P-wave quality. The following features were tested:
A decision tree was built using a selection of those features (Figure 3). The tree was estimated assuming equal prior probabilities for both classes, which allows sensitivity and specificity to be balanced. Furthermore, to avoid overfitting, the tree was limited to a maximum of five splits (decisions).
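The template-and-alignment step of decision stage 1 can be sketched as follows. The specific features tested are not listed in this section, so the two features computed here (correlation after alignment and a normalised RMS error) are hypothetical examples, not the PQI's actual feature set. Alignment is assumed to be a circular shift of at most `max_lag` samples that maximises the Pearson correlation with the recording-level template; the names `pearson` and `stage1_features` are also hypothetical.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation of two equal-length 1-D arrays (0.0 if either is flat)."""
    a = a - a.mean()
    b = b - b.mean()
    den = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / den) if den > 0 else 0.0

def stage1_features(p_waves, max_lag=5):
    """Build one template for a whole recording, align each P-wave to it and
    compute two illustrative (hypothetical) quality features per P-wave.

    Returns (template, features); each features row is
    [best_lag, correlation_after_alignment, normalised_rms_error].
    """
    p_waves = np.asarray(p_waves, dtype=float)
    template = p_waves.mean(axis=0)   # average shape over the recording
    scale = np.ptp(template) or 1.0   # peak-to-peak amplitude for normalisation
    rows = []
    for beat in p_waves:
        # Alignment: circular shift within +/- max_lag samples maximising correlation
        lags = range(-max_lag, max_lag + 1)
        best = max(lags, key=lambda k: pearson(np.roll(beat, k), template))
        aligned = np.roll(beat, best)
        rms = np.sqrt(np.mean((aligned - template) ** 2)) / scale
        rows.append([best, pearson(aligned, template), rms])
    return template, np.array(rows)
```

Feature rows like these would then be the inputs to the stage-1 decision tree; with a library such as scikit-learn, equal class priors and a five-split limit could be approximated with balanced class weights and at most six leaf nodes, though the paper does not say which software was used.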

Decision Stage 2: Removal of Distorted P-Waves
More refined P-wave templates could be created after removal of completely distorted P-waves. These were then used in a final decision stage with the aim of removing distorted and unreliable P-waves (class C), while being able to accommodate possible P-wave morphological variations.
This second decision stage was similar to the first, the only difference being that a new template, to which P-waves were aligned, was created every 20 P-waves (instead of every 30 min). This aimed to accommodate both physiological variations in the ECG over time and the greater P-wave variability that precedes AF [2,3]. As in the first decision stage, a decision tree was built, but this time it was tuned to maximize the correct identification of class A P-waves (Figure 3).
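The local templates of decision stage 2 can be sketched as below. Two points are assumptions rather than details stated in the text: that only P-waves kept by stage 1 contribute, and that "every 20 P-waves" means consecutive, non-overlapping blocks of 20 surviving P-waves (a sliding window would be an equally plausible reading). The name `stage2_templates` is hypothetical.

```python
import numpy as np

def stage2_templates(p_waves, stage1_keep, block_size=20):
    """Local templates for decision stage 2.

    Only P-waves kept by stage 1 are used. Survivors are grouped, in
    temporal order, into consecutive non-overlapping blocks of `block_size`,
    and each block's template is the average of its members.
    Returns (survivors, templates) with one template row per survivor.
    """
    survivors = np.asarray(p_waves, dtype=float)[np.asarray(stage1_keep, dtype=bool)]
    templates = np.empty_like(survivors)
    for start in range(0, len(survivors), block_size):
        block = survivors[start:start + block_size]  # last block may be shorter
        templates[start:start + block_size] = block.mean(axis=0)
    return survivors, templates
```

Each survivor would then be aligned to its own block's template and the same kind of features recomputed before the stage-2 tree is applied, so slow morphological drift is absorbed by the template rather than flagged as noise.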

P-Wave Quality Index (PQI) Performance Assessment
The performance of the PQI was assessed on the 44 records through comparison with the manual annotations. Clean class A P-waves were considered the positive class, while classes B and C were merged into a single negative class. In addition, given the higher variability in P-wave morphology in patients susceptible to AF, a sub-group analysis compared performance between controls and patients who developed AF after the recording, and significance was assessed using a two-sample t-test at the 5% significance level. Sensitivity and specificity were used to assess performance because they are independent of class distributions and can therefore provide a comprehensive assessment of imbalanced learning problems such as the present one.
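The evaluation metrics follow directly from this class mapping: sensitivity is TP / (TP + FN) over class A P-waves, and specificity is TN / (TN + FP) over the merged class B and C P-waves. A minimal sketch, assuming the manual labels are given as the strings "A", "B" or "C" and the PQI output as a boolean "high quality" flag (the function name is hypothetical):

```python
import numpy as np

def sensitivity_specificity(true_classes, predicted_high_quality):
    """Class 'A' is the positive class; classes 'B' and 'C' form one negative class.

    true_classes: manual labels ('A', 'B' or 'C') for each P-wave.
    predicted_high_quality: boolean PQI output (True = predicted class A).
    """
    truth = np.asarray(true_classes) == "A"
    pred = np.asarray(predicted_high_quality, dtype=bool)
    tp = np.sum(truth & pred)    # clean P-waves correctly kept
    fn = np.sum(truth & ~pred)   # clean P-waves wrongly discarded
    tn = np.sum(~truth & ~pred)  # noisy/absent P-waves correctly discarded
    fp = np.sum(~truth & pred)   # noisy/absent P-waves wrongly kept
    return tp / (tp + fn), tn / (tn + fp)
```

For the sub-group comparison, the controls' and AF patients' values could be passed to a two-sample t-test such as `scipy.stats.ttest_ind`; this section does not state whether the test was run on per-record or per-P-wave values.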

Results
The PQI was able to identify high-quality P-waves with high sensitivity (93%) and good specificity (82%). Furthermore, no statistically significant difference was found in sensitivity or specificity between the control and AF groups (Table 1).

Discussion
Hospital patients recovering from major cardiac surgery are at risk of AF, which can be life-threatening. Wearable sensors are routinely used for ECG monitoring in the postoperative period and could have greater impact if used to identify the subtle changes in P-wave morphology which are predictive of AF. This would potentially allow AF to be prevented, reducing risks and costs. However, ECG signals acquired by wearable sensors are susceptible to artefact, making it difficult to distinguish physiological changes in P-wave morphology from changes due to noise. Hence, the future implementation of techniques that use the P-wave to predict AF will rely on proper P-wave quality assessment.
In this study, we proposed the PQI, a novel and optimised tool for automatically identifying low-quality P-waves. Briefly, the algorithm first detects and extracts each P-wave signal, which is then used in two decision-making stages: the first removes highly noisy or absent P-waves, and the second removes less distorted, but still unreliable, P-waves. The PQI identified high-quality P-waves with high sensitivity (93%) and good specificity (82%) and performed similarly on healthy subjects and patients susceptible to AF, indicating that it was able to accommodate the P-wave variability that precedes AF [2,3]. The high performance of the PQI suggests that it may have utility for identifying high-quality P-waves in wearable sensor data, which could be used to perform unsupervised predictions of AF.
The proposed tool was built and tested using a large dataset containing almost 100,000 manually annotated P-waves across different morphologies, and trialed 10 different P-wave quality assessment features, some of them novel. In addition, the proposed tool was built using simple decision tree models with at most five splits, which makes it less prone to overfitting and more likely to perform well on novel data. The proposed tool can work on a near real-time basis, an important feature of quality assessment tools for use in continuous monitoring. Although it requires 30-min signal segments, this time resolution is sufficient to predict and act upon atrial arrhythmias such as AF.
Even though the presented tool exhibited high performance, further assessment of its clinical utility is warranted for validation. For instance, future studies should investigate whether the use of the PQI results in more precise measurements of P-wave features, and whether the PQI can be used to improve unsupervised predictions of AF. In addition, future studies should assess the PQI on independent datasets containing heart rhythms other than sinus rhythm and AF. Finally, the methodology of the PQI might be further improved. For instance, it is assumed that, when a P-wave template is created every 30 min, noise cancels out and the resulting signal therefore reflects a clean P-wave. This may not hold during long periods of poor electrode contact. Future work could guard against this by adding a template-verification stage in which, for example, the template signals are compared with a Gaussian function.
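The suggested template-verification stage is future work, so the sketch below is only one possible reading of "compared with a Gaussian function", under stated assumptions: the template's baseline is crudely removed with its median, a Gaussian is centred on the template's maximum, its width is derived from the template's full width at half maximum (FWHM), and the Pearson correlation between the two is returned as a "Gaussian-likeness" score. The function name and any acceptance threshold applied to the score are hypothetical.

```python
import numpy as np

def gaussian_likeness(template):
    """Correlate a template with a Gaussian centred on its peak, whose width
    is set from the template's full width at half maximum (FWHM)."""
    x = np.asarray(template, dtype=float)
    x = x - np.median(x)                      # crude baseline removal
    peak = int(np.argmax(x))
    above = np.flatnonzero(x >= x[peak] / 2)  # samples at or above half maximum
    fwhm = max(above[-1] - above[0] + 1, 1)
    sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))  # FWHM = 2*sqrt(2 ln 2)*sigma
    n = np.arange(x.size)
    gauss = np.exp(-0.5 * ((n - peak) / sigma) ** 2)
    xc, gc = x - x.mean(), gauss - gauss.mean()
    den = np.linalg.norm(xc) * np.linalg.norm(gc)
    return float(xc @ gc / den) if den > 0 else 0.0
```

A template dominated by noise after a long period of poor electrode contact would tend to score low, so such a check could flag templates that should not be trusted before stage 1 uses them; a fixed Gaussian shape would, however, penalise physiologically notched or biphasic P-waves, so any threshold would need validation.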

Conclusions
This paper presented a novel P-wave quality assessment tool, which was able to identify high-quality P-waves with high sensitivity (93%) and good specificity (82%). Measurements of P-wave morphology derived from high-quality P-waves could be used to predict AF using wearable ECG monitoring, potentially improving patient outcomes and reducing healthcare costs. Further studies assessing the clinical utility of the PQI tool are warranted for validation.

Figure 1. P-waves were classified into three different classes: high-quality clean P-waves (class A), unreliable, noise-distorted P-waves (class C), and complete noise or absent P-waves (class B). Even though class C P-waves had some resemblance to normal P-wave morphology, they were still considered unreliable. For that class, the degree of distortion varied from mild to heavy.

Figure 3. Decision trees used in the decision-making stages of the P-wave quality index (PQI) tool. These models used features obtained from the P-wave signal and P-wave template comparisons.

Table 1. Performance of the P-wave quality index (PQI) tool on all 44 records, and comparison of performance between healthy controls and patients susceptible to atrial fibrillation (AF).