Quantitative Analysis Using Consecutive Time Window for Unobtrusive Atrial Fibrillation Detection Based on Ballistocardiogram Signal

Atrial fibrillation (AF) is the most common clinically significant arrhythmia; therefore, AF detection is crucial. Here, we propose a novel feature extraction method to improve AF detection performance using the ballistocardiogram (BCG), a weak vibration signal on the body surface transmitted by the cardiogenic force. In this paper, consecutive time windows (CTWs) are added to each BCG segment and recurrence quantification analysis (RQA) features are extracted from each time window. Then, the number of CTWs is discussed and the combined features from multiple time windows are ranked, which finally constitute the CTW-RQA features. For validation, the CTW-RQA features are extracted from 4000 BCG segments of 59 subjects and compared with classical time and time-frequency features and up-to-date energy features. The accuracy of the proposed features is superior, and the three types of features are fused to obtain the highest accuracy of 95.63%. To evaluate the importance of the proposed features, the fusion features are ranked using a chi-square test. CTW-RQA features account for 60% of the first 10 fusion features and 65% of the first 17 fusion features. It follows that the proposed CTW-RQA features effectively supplement the existing BCG features for AF detection.


Introduction
Atrial fibrillation (AF) is a common type of arrhythmia whose incidence is increasing year by year [1,2]. There are currently 33.5 million individuals suffering from AF worldwide, with an overall prevalence of 2.9% [3,4]. Daily, real-time, and continuous monitoring and diagnosis of AF is therefore particularly important. At present, AF is widely detected using electrocardiograms (ECGs), the current gold standard for diagnosis [5-8]. However, ECG-based AF detection is restricted by the need to attach electrodes to the body surface. Although wearable devices such as smartwatches provide a convenient way of collecting ECG data for daily monitoring, ECG detection is based on electrophysiology, which cannot characterize the dynamic changes of cardiac activity [9]. Therefore, unobtrusive methods based on cardiac dynamics have come into focus, including ballistocardiogram (BCG), seismocardiogram (SCG), and photoplethysmogram (PPG) methods [10-12].
Considering that the waveform components of PPG are less complex than those of SCG and BCG, and that SCG sensors must be attached to the chest, the AF detection method based on BCG, which reflects the status of the larger cardiovascular system, is widely studied.

Methods
The framework of the proposed method is shown in Figure 1.

Signal Acquisition and Preprocessing
BCG is a non-intrusive measurement of the vibration generated by the heartbeat and the circulation of blood through the aorta, and it has the same rhythm as an ECG. In this study, BCG signals were recorded with an acquisition system consisting of a polyvinylidene fluoride (PVDF) sensor and data acquisition hardware. The PVDF sensor (W × H: 30 cm × 60 cm, piezoelectric constant > 2 × 10^−11 C/N) was placed on top of a regular bed mattress, located underneath the subjects' thorax, and sampled at 125 Hz (see 2.1 of Figure 1). During the recording process, the raw BCG waveform was amplified, and a Butterworth band-pass filter [17,26] with a passband of 0.7 Hz to 10 Hz was applied to remove high-frequency noise, low-frequency respiratory components, and motion artifacts, with the aim of obtaining a clean BCG signal. The ECG signal was collected by a CT-08S Holter Recorder at a sampling rate of 200 Hz. To address the sampling-rate mismatch with the BCG signals, the ECG signal was down-sampled to 125 Hz based on synchronized time stamps.
In total, 59 volunteers with paroxysmal AF, aged between 27 and 93 years (34 males and 25 females), participated in this study. The BCG signal of each subject was recorded from 8 p.m. to 8 a.m. in a lying position. Medical experts manually labeled the BCG signal as AF periods and non-AF (NAF) periods, taking the synchronized ECG signals as the gold standard. The AF and NAF periods were split into 24-s BCG segments, and 2000 AF segments and 2000 NAF segments were obtained as the BCG dataset. For AF classification, 80% of the BCG data were used to train the classifiers, and the remaining 20% were reserved as independent testing data. To ensure the fairness of the experiments, segments were drawn from the 59 subjects as evenly as possible. Additionally, the training and testing datasets were derived from different subjects to avoid overfitting.
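The preprocessing chain above can be sketched in Python with SciPy; note that the filter order (4) and the zero-phase `filtfilt` implementation are assumptions, since the paper states only the 0.7-10 Hz passband and the 200 Hz to 125 Hz resampling.

```python
import numpy as np
from scipy.signal import butter, filtfilt, resample_poly

FS_BCG = 125  # BCG sampling rate (Hz)
FS_ECG = 200  # Holter ECG sampling rate (Hz)

def bandpass_bcg(x, fs=FS_BCG, lo=0.7, hi=10.0, order=4):
    """Butterworth band-pass (0.7-10 Hz): removes low-frequency respiration/motion
    and high-frequency noise. Zero-phase filtering avoids waveform distortion."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def downsample_ecg(ecg, fs_in=FS_ECG, fs_out=FS_BCG):
    """Polyphase resampling of the ECG from 200 Hz to 125 Hz (ratio 5:8)."""
    return resample_poly(ecg, up=fs_out, down=fs_in)

# toy usage: 24-s synthetic signal = "heartbeat" (1.25 Hz) + "respiration" (0.25 Hz)
t = np.arange(0, 24, 1 / FS_BCG)
raw = np.sin(2 * np.pi * 1.25 * t) + 0.5 * np.sin(2 * np.pi * 0.25 * t)
clean = bandpass_bcg(raw)
```

After filtering, the 0.25 Hz "respiratory" component is strongly attenuated while the 1.25 Hz "cardiac" component passes through nearly unchanged.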

RP Reconstruction
Recurrence is a fundamental property of dynamic systems, which can be exploited to characterize the system's behavior in phase space [27]. A powerful tool for their visualization and analysis, called RP, was introduced in the late 1980s by Eckmann, which reveals all the times when the phase space trajectory of the dynamic system visits roughly the same area in the phase space [21]. Cardiac activity is a dynamic system, so RP is used for nonlinear analysis of the BCG rhythm in this study. Figure 2 shows the BCG reconstruction process.

The basis of RP reconstruction is to reconstruct the phase space. Takens' theorem states that a phase space containing all of the information of the original time series can be reconstructed from a single system component after selecting an appropriate delay time τ and embedding dimension m [28].
Set the BCG time series {u_1, u_2, . . . , u_n}; then, the m-dimensional BCG reconstructed vector x_i is defined as Equation (1):

x_i = (u_i, u_{i+τ}, . . . , u_{i+(m−1)τ}), i = 1, 2, . . . , n − (m − 1)τ, (1)

where m is the embedding dimension and τ is the delay time. In this study, the mutual information method is used to determine τ, and the false nearest neighbors method is used to determine m, as discussed in the literature [20]. As shown in Figure 2, the distance between points i and j in the reconstructed phase space is calculated with Equation (2):

S_{ij} = ‖x_i − x_j‖. (2)
The RP is the collection of time pairs at which the phase space trajectory visits the same region, displayed in the two-dimensional time domain as shown in Equations (3) and (4):

R_{i,j}^{m,ε_i} = Θ(z_{ij}), (3)

z_{ij} = ε_i − S_{ij}, (4)

where R_{i,j}^{m,ε_i} is the recursion value, Θ(·) is the Heaviside step function, and z_{ij} represents the difference between the threshold ε_i and S_{ij}.
Equation (4) shows that if the distance between the m-dimensional trajectory point at time j and that at time i is smaller than the threshold, a black point is placed at the coordinates (i, j); otherwise, a white point is placed. A key parameter in the analysis is the threshold ε_i, which is chosen as 10% of the maximum phase space diameter based on the theory of the RP [27,29-31]. The thresholded recurrence plots (TRPs) of AF and NAF are drawn in Figure 3a,b. In the other case, without setting the threshold, the value of point (i, j) in the image matrix is calculated as the Euclidean distance between point i and point j in the phase space. The unthresholded recurrence plots (UTRPs) of AF and NAF are shown in Figure 3c,d.
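As an illustration, a minimal Python sketch of the reconstruction above, following Equations (1)-(4): the embedding parameters m = 3 and τ = 5 are placeholder values (the paper selects them by the mutual information and nearest-neighbor methods), and a single global threshold ε is used for simplicity.

```python
import numpy as np

def embed(u, m, tau):
    """Takens phase-space reconstruction: x_i = (u_i, u_{i+tau}, ..., u_{i+(m-1)tau})."""
    n = len(u) - (m - 1) * tau
    return np.stack([u[i:i + n] for i in range(0, m * tau, tau)], axis=1)

def recurrence_plot(u, m=3, tau=5, eps_frac=0.10):
    """Eq. (2): S_ij = ||x_i - x_j||; Eqs. (3)-(4): R_ij = 1 if eps - S_ij >= 0.
    The threshold eps is 10% of the maximum phase-space diameter, as in the paper."""
    X = embed(u, m, tau)
    S = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # distance matrix (UTRP)
    eps = eps_frac * S.max()
    R = (S <= eps).astype(int)                                   # thresholded RP (TRP)
    return R, S
```

The returned pair corresponds to the TRP and UTRP images of Figure 3: R is a symmetric binary matrix with an all-ones main diagonal, while S keeps the raw distances.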

CTW-RQA Feature Extraction
RQA defines new measures of complexity by using the geometrical structures of the RP. According to the characteristics of phase space trajectories, recurrence plots contain typical small-scale structures such as single points, diagonal lines, vertical lines, and horizontal lines, which together constitute large-scale texture structures. RQA quantifies both the small-scale and the large-scale texture structures [31]. In this paper, n consecutive time windows are designed to divide each BCG segment, which refines the region of interest of the original signal. Afterwards, k RQA features are extracted in each time window to constitute the CTW-RQA feature. The feature extraction process is shown in Figure 4.
In Figure 4, W_i are the time windows to be analyzed (i = 1, 2, . . . , n). W_ij refers to the jth RQA feature extracted from the ith time window (j = 1, 2, . . . , k), where k is the number of RQA features. M_i represents the feature vector composed of the k RQA features in the ith time window, and the CTW-RQA feature of each BCG segment is defined in Equation (5).
In this study, k is selected as 13, and the 13 RQA features are shown in Table 1, which are denoted as RQA1, RQA2, . . . , RQA13.
In Table 1, RR is the percentage of recurrence points in an RP; DET is the percentage of recurrence points that form diagonal lines, where P(l) is the histogram of the lengths l of the diagonal lines; LAM is the percentage of recurrence points that form vertical lines, where P(v) is the histogram of the lengths v of the vertical lines; RATIO is the ratio between DET and RR; L is the average length of the diagonal lines; TT is the average length of the vertical lines; L_max is the length of the longest diagonal line; V_max is the length of the longest vertical line; DIV is the reciprocal of L_max; ENTR is the Shannon entropy of the probability distribution of the diagonal line lengths P(l); TREND quantifies the decrease of recurrence point density towards the edges of the RP; CLUST is the ratio of the number of closed triplets to the number of all triplets; and the remaining measure is the length of the white vertical line.
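To make the windowing concrete, here is a hedged Python sketch of the CTW-RQA pipeline: it extracts only two of the thirteen Table 1 measures (RR and DET) per window, and the embedded helper `trp` reuses the placeholder parameters m = 3 and τ = 5 rather than the paper's data-driven choices.

```python
import numpy as np

def trp(u, m=3, tau=5, eps_frac=0.10):
    """Minimal thresholded RP (threshold = 10% of max phase-space diameter)."""
    n = len(u) - (m - 1) * tau
    X = np.stack([u[i:i + n] for i in range(0, m * tau, tau)], axis=1)
    S = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    return (S <= eps_frac * S.max()).astype(int)

def rqa_measures(R, l_min=2):
    """Two of the Table 1 measures: recurrence rate (RR) and determinism (DET,
    the fraction of recurrence points lying on diagonal lines of length >= l_min)."""
    n = R.shape[0]
    rr = R.sum() / (n * n)
    diag_pts = 0
    for k in range(-(n - 1), n):             # scan every diagonal of the RP
        run = 0
        for v in list(np.diag(R, k)) + [0]:  # sentinel 0 closes a trailing run
            if v:
                run += 1
            else:
                if run >= l_min:
                    diag_pts += run
                run = 0
    det = diag_pts / R.sum() if R.sum() else 0.0
    return rr, det

def ctw_rqa(segment, n_windows=3):
    """CTW-RQA sketch (Figure 4, Eq. (5)): split a BCG segment into n consecutive
    time windows and concatenate k RQA features per window (k = 2 here; 13 in the paper)."""
    feats = []
    for w in np.array_split(segment, n_windows):
        feats.extend(rqa_measures(trp(w)))
    return np.array(feats)
```

With n = 3 and k = 2 the sketch yields a 6-dimensional vector per segment; the paper's configuration (n = 3, k = 13) yields 39 dimensions before ranking.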

Feature Fusion
In order to validate the characterization performance of the proposed features, 17 time and time-frequency features and 16 energy features are extracted in this section. The three types of features are then ranked, and finally the fusion features and the ranked features are derived to improve the accuracy of AF detection.

Time and Time-Frequency Features and Energy Feature Extraction
In the literature [17], 6 time features and 11 time-frequency features were extracted from BCG signals, which were fed into 7 popular ML classifiers to detect AF. The definitions of the 17 time and time-frequency features are shown in Table 2, which are denoted as TTF1, TTF2, . . . , TTF17.
In Table 2, pp_10(l) is defined as the peak-to-peak amplitudes of the ten sub-segments into which each 24-s BCG segment is split. S[f, t] represents the energy spectral density of the data computed with a 5-s window, S(f) represents the time mean of S[f, t], f_peak represents the distance between peaks of S[f, t], and w_peak[k] represents the distance from the kth peak to the average peak height.
In the literature [19], BCG signals were transformed into BCG energy signals in order to highlight the features of AF and NAF, and four new data sequences representing different characteristics of the BCG energy signals were generated. The mean value, variance, skewness, and kurtosis of the four data sequences were calculated, and 16 energy features were extracted for each BCG segment. Five ML algorithms were used to distinguish AF and NAF. The definitions of the 16 energy features are shown in Table 3, which are denoted as E1, E2, . . . , E16.
In Table 3, P(i) values are sample points that correspond to peaks, where i indicates the ith peak in the segment; T(i) comprises the coordinates of troughs; DA(i) denotes the relative difference of the peak amplitude; RT(i) stores the relative value of the trough amplitude; and BP(i) was defined to denote the number of burrs between P(i) and P(i + 1).

Feature Ranking and Selection
In order to evaluate the features proposed, four types of features are ranked using the Fisher Score, chi-square test, minimum redundancy-maximum relevance (MRMR) and SHapley Additive exPlanations (SHAP) algorithms.
The Fisher Score is defined as the ratio of inter-class variance to intra-class variance [32] and is calculated as Equations (6)-(8). It can be deduced that when the inter-class variance is larger and the intra-class variance is smaller, the Fisher Score is larger; therefore, higher-ranked features are theoretically more discriminative.
where x^(k) represents the value of sample x for the kth feature, m_w is defined as the intra-class variance of the kth feature in the dataset, and J_fisher(k) denotes the Fisher Score of the kth feature in the dataset.
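Since Equations (6)-(8) are not reproduced here, the following Python sketch implements the standard two-class Fisher Score (class-size-weighted between-class variance over within-class variance); the exact weighting in the paper's equations may differ slightly.

```python
import numpy as np

def fisher_score(X, y):
    """Fisher Score per feature: ratio of inter-class variance to intra-class
    variance; larger scores indicate more discriminative features."""
    classes = np.unique(y)
    mu = X.mean(axis=0)  # overall mean per feature
    scores = np.empty(X.shape[1])
    for k in range(X.shape[1]):
        between = sum((y == c).sum() * (X[y == c, k].mean() - mu[k]) ** 2 for c in classes)
        within = sum((y == c).sum() * X[y == c, k].var() for c in classes)
        scores[k] = between / within if within > 0 else np.inf
    return scores
```

A feature whose class means are well separated relative to its spread (e.g. a discriminative RQA measure) receives a much larger score than a noise feature.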
The chi-square test is a common hypothesis testing method based on the χ² distribution of the test statistic. Its null hypothesis H_0 is that the observed frequency does not differ from the expected frequency. The basic idea of the test is as follows: assume that H_0 holds; then, calculate the χ² value on this premise, which represents the degree of deviation between the observed and theoretical values. According to the χ² distribution and the degrees of freedom, the probability p of obtaining the current statistic or a more extreme one can be determined under H_0. The chi-square test checks whether each feature is independent of the label. A small p-value indicates that the corresponding feature is dependent on the label and is therefore important [33]. To amplify the difference between features, importance scores are proposed, as shown in Equation (9).
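A minimal Python sketch of the chi-square ranking statistic for non-negative features is given below; it mirrors the common feature-selection formulation (class-wise feature sums versus expectations from class priors) and does not reproduce the paper's Equation (9) importance score.

```python
import numpy as np

def chi2_scores(X, y):
    """Chi-square statistic between each non-negative feature and the class label.
    observed = class-wise sums of each feature; expected = class priors x feature totals.
    A larger chi2 (smaller p-value) means the feature depends more strongly on the label."""
    Y = np.column_stack([(y == c).astype(float) for c in np.unique(y)])  # one-hot labels
    observed = Y.T @ X
    expected = Y.mean(axis=0)[:, None] * X.sum(axis=0)[None, :]
    return ((observed - expected) ** 2 / expected).sum(axis=0)
```

A constant feature carries no information about the label and scores zero, while a feature whose magnitude tracks the class scores highly.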
The MRMR algorithm finds an optimal set of features that are maximally different from each other and that can effectively represent the label variable [34]. The algorithm calculates the mutual information between features and the mutual information between features and labels to quantify redundancy and correlation. The features are ranked according to the criteria of minimizing the redundancy of the feature set and maximizing the correlation between the feature set and the label variable.
The SHAP algorithm can interpret each feature's importance to predictions [35]. The SHAP value has been used to guide feature selection [36]; it explains the deviation of the prediction for a query sample from the average prediction. In this algorithm, the model needs to be retrained on all feature subsets S ⊆ F, where F represents all features. It assigns to each feature an importance value that represents the effect on the model prediction of including that feature. To compute this effect, a model f_{S∪{i}} is trained with that feature present, and another model f_S is trained with the feature withheld. Then, the two models' predictions are compared on the current input, f_{S∪{i}}(x_{S∪{i}}) − f_S(x_S), to calculate the SHAP values, where x_S represents the values of the input features in the set S [35]. In this study, each BCG segment is used as x_S, and the average of the absolute values of the resulting SHAP values is used to draw a SHAP summary plot and is applied to feature ranking.

Fusion Feature Extraction
Based on the ranking results of the CTW-RQA features, the first 13 features are selected, which is consistent with the dimensionality of the RQA features without CTW. Afterwards, 17 time and time-frequency features, 16 energy features, and 13 ranked CTW-RQA features are combined into 46 fusion features, which provide abundant characterization information. However, there may be irrelevant or redundant information in the fused features. Therefore, the 46 fusion features are ranked holistically, and the first 13, 16, and 17 features are selected, denoted as the Top 13, Top 16, and Top 17 ranked features. The selected dimensions are consistent with the time and time-frequency features, the energy features, and the ranked CTW-RQA features, respectively. The purpose is to compare the classification performance of the ranked features with each type of independent feature at the same dimensionality. Figure 5 demonstrates the extraction process of the proposed fusion features and ranked features.
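The fusion-and-selection step can be sketched in a few lines of Python; the block dimensions (17, 16, 13) follow the paper, while the score vector is a placeholder for any of the four ranking algorithms.

```python
import numpy as np

def top_k_fusion(feature_blocks, scores, k):
    """Fuse feature blocks by concatenation, then keep the k highest-ranked columns
    (Top 13 / Top 16 / Top 17 in the paper). `scores` holds one importance value per
    fused column, e.g. from a chi-square ranking."""
    F = np.hstack(feature_blocks)     # 17 + 16 + 13 = 46 fusion features
    order = np.argsort(scores)[::-1]  # descending importance
    keep = order[:k]
    return F[:, keep], keep

# toy usage: 5 segments, three feature blocks, monotonically increasing toy scores
rng = np.random.default_rng(2)
ttf, energy, ctw = rng.random((5, 17)), rng.random((5, 16)), rng.random((5, 13))
scores = np.arange(46.0)  # toy ranking: the last fused column scores highest
F_top, keep = top_k_fusion([ttf, energy, ctw], scores, k=17)
```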

AF Detection Based on the RP Diagram
TRP and UTRP are reconstructed from the 4000 BCG segments and fed into the designed CNN directly. The network mainly consists of eight convolution layers, four pooling layers, four dropout layers, one flatten layer, and one fully connected layer, and outputs a binary classification result; this network has been successfully applied to AF detection from BCG signals [20]. The accuracy (ACC), sensitivity (SEN), precision (PRE), and specificity (SPE) are calculated to estimate the AF classification performance, as shown in Table 4. From Table 4, the classification performance of UTRP is superior to that of TRP. However, the classification accuracy of both UTRP and TRP is less than 80%, which cannot meet the demands of AF screening in daily life.

AF Detection Based on CTW-RQA Features
Based on Table 1, 13 RQA features are extracted from each 24-s BCG segment, and five ML classifiers are used for training and testing: K-Nearest Neighbors (KNN), Naive Bayes (NB), Ensemble Learning (ENS), Random Forest (RF), and Decision Tree (DT) [19,37,38]. The AF detection performance is shown in Table A1 of Appendix A. As a comparison, CTW-RQA features based on three different time window lengths are extracted, with n set to 6, 3, and 2. The same ML classifiers are used to detect AF, and the performance is shown in Tables A2-A4 of Appendix A. From Tables A1-A4, RF is the optimal classifier, so Table 5 reports the AF detection performance of the RQA features and CTW-RQA features with RF.
As shown in Table 5, the accuracy of the RQA features and CTW-RQA features is higher than that of the CNN based on UTRP in Table 4. When n is selected as 3, the CTW-RQA features obtain the highest classification accuracy of 89.38%, which is superior to the RQA features without CTW.
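As a hedged illustration of this evaluation protocol, the sketch below trains a scikit-learn random forest on synthetic stand-ins for the 39-dimensional CTW-RQA vectors (n = 3, k = 13) and computes ACC, SEN, PRE, and SPE from the confusion matrix; the data and hyperparameters are illustrative, not the paper's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

def evaluate(clf, X_tr, y_tr, X_te, y_te):
    """Train a classifier and report ACC/SEN/PRE/SPE (AF = positive class),
    the metrics used in Tables 4-7."""
    clf.fit(X_tr, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
    return {"ACC": (tp + tn) / (tp + tn + fp + fn),
            "SEN": tp / (tp + fn),
            "PRE": tp / (tp + fp),
            "SPE": tn / (tn + fp)}

# toy usage: synthetic 39-dimensional vectors standing in for CTW-RQA features
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 39)),   # NAF-like class
               rng.normal(1.5, 1.0, (100, 39))])  # AF-like class
y = np.array([0] * 100 + [1] * 100)
idx = rng.permutation(200)
X, y = X[idx], y[idx]
metrics = evaluate(RandomForestClassifier(n_estimators=100, random_state=0),
                   X[:160], y[:160], X[160:], y[160:])
```

In the paper the split is additionally subject-disjoint; a random row split, as here, is only adequate for synthetic data.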

AF Detection Based on Time and Time-Frequency Features and Energy Features
Based on Table 2, 17 time and time-frequency features (6 time-domain and 11 time-frequency features) are extracted from the 4000 BCG segments in this study. In addition, the 4000 BCG segments are transformed into BCG energy signals to extract 16 energy features based on Table 3. The same five ML algorithms are employed to classify AF and NAF. The classification performance is shown in Tables A5 and A6 of Appendix A.
From Tables A5 and A6, RF achieves the optimal classification performance. Therefore, RF is applied to evaluate two types of features with CTW. As a comparison, n is also selected as 6, 3, and 2. The AF detection performance is shown in Tables 6 and 7. From Tables 6 and 7, the time and time-frequency features and energy features without CTW exhibit a superior classification performance and are suitable for feature fusion.

AF Detection Based on Fusion Features
From Table 5, n is selected as 3. According to Method Section 2.4.2, the Fisher Score, chi-square test, MRMR, and SHAP algorithms are used for feature ranking. The calculated results of 13 RQA features based on four feature ranking algorithms are shown in Table 8. According to Figure 5, the 39 CTW-RQA features are reduced to 13 CTW-RQA features based on Table 8. Then, 46 fusion features, composed of 13 CTW-RQA features, 17 time and time-frequency features, and 16 energy features, are fed into five ML classifiers to detect AF. The results are shown in Table 9. As shown in Table 9, RF achieves the optimal classification accuracy, which is 95.63%.

AF Detection Based on Ranked Features
From Table 9, the classification performance of the fusion features is obviously improved. However, excessive feature dimensionality may affect the classification efficiency, so the 46 fusion features are ranked by the same four feature ranking algorithms. Figure 6 shows the ranking results of the fusion features by the chi-square test method, and Figure A1 in Appendix A shows the ranking results by the SHAP method. According to Method Section 2.4.3, the Top 13, Top 16, and Top 17 ranked features are selected based on the ranking results of the fusion features and fed into the optimal RF classifier. The AF detection accuracy based on the ranked features is shown in Table 10. From Table 10, the ranked features selected by the chi-square test have the highest classification accuracy. The Top 17 ranked features have a classification performance comparable to the 46 fusion features, which validates the effect of feature selection.

Discussion
In this study, CTW-RQA features are proposed to improve the accuracy of AF detection based on BCG signals. The main contributions of our work are the following: (1) RP is first applied to detect AF by means of BCG signals. TRP and UTRP are fed into the designed CNN to classify AF and NAF. (2) The CTW-RQA features are proposed for the first time to quantify the AF rhythm characteristic, which are compared with the 17 classical time and time-frequency features and 16 up-to-date energy features using five ML classifiers. The AF classification accuracy of the proposed feature is superior to the other existing features using the BCG dataset of this study. (3) The CTW-RQA features are combined with the two types of existing features to constitute the fusion features. Then, the fusion features are sorted and selected to constitute the ranked features. The 46 fusion features achieve the optimal classification accuracy of 95.63%, and the 17 ranked features result in a decrease of only 0.25%. It follows that the feature ranking and selection are necessary, and the proposed CTW-RQA features effectively supplement the existing BCG features for AF detection. Furthermore, the reasons underlying the experimental results are analyzed in detail.

Effects of RP and RQA
From Table 4, the AF detection performance based on TRP and UTRP is compared by means of a CNN, and the results show that UTRP has the superior classification accuracy of 73.25%. This may be because UTRP retains substantially more information from the signal that generated it than TRP [39]. However, the classification accuracy of both TRP and UTRP fails to satisfy the requirements of AF screening. This result may be explained as follows: the RP is a sparse image, and key features lie in non-contiguous areas, which are difficult for the convolution kernels of a CNN to extract [40]. Therefore, RQA is utilized to capture the textural features of the RP graph more concretely. From Table 5, 13 RQA features are extracted and fed into 5 ML classifiers, and the RF classifier performs best with an accuracy of 83.50%, which is superior to TRP and UTRP. It follows that RQA features effectively characterize the abnormal rhythm information of BCG signals, which improves the accuracy of AF diagnosis. Therefore, RQA features with the ML classifier are taken as the basis for the subsequent feature modifications.

Effect of the Proposed CTW-RQA Features
In Table 5, the AF classification accuracy of the CTW-RQA features with three types of time windows (n = 6, 3, 2) is compared to the accuracy of the RQA features. The CTW-RQA features with 12-s (n = 2) and 8-s (n = 3) time windows exhibit a higher AF detection accuracy than the RQA features without CTW. However, the CTW-RQA features with a 4-s (n = 6) time window exhibit worse performance. It follows that longer time windows are insensitive to small rhythm changes; conversely, shorter time windows may miss occasional rhythm abnormalities. Therefore, the CTW-RQA feature with 8-s time windows (n = 3) achieves the best classification performance.
Similarly, from Tables 6 and 7, the time and time-frequency features and energy features without CTW show superior classification performance. This is because the time features quantify the amplitude and location of wave peaks and inter-peak intervals in the BCG waveform, so adding time windows may discard effective information. Additionally, limited by the uncertainty principle [41], a shorter CTW used to capture the signal segment leads to poorer frequency-domain resolution, which in turn degrades AF detection performance. For the energy features, the BCG segments need to be transformed into BCG energy signals, which describe the change in energy distribution; therefore, longer segmentation lengths are better suited to emphasizing the distribution of energy. In summary, CTW is not added to the time and time-frequency features or the energy features during feature fusion.
Compared to the time and time-frequency features and the energy features without CTW, the proposed CTW-RQA features exhibit the optimal AF classification performance, as shown in Tables 5-7. This is likely because the two existing feature types depend more on the variability of the waveform, whereas the data used in this study are derived from a larger number of subjects. Therefore, the proposed CTW-RQA features are more suitable for daily AF screening.

Effect of Fusion Features and Ranked Features
From Table 9, the fusion features exhibit an optimal classification accuracy of 95.63%, a significant improvement. This is because more dimensions and more types of features generally provide more information. Nevertheless, the fusion features contain redundant information [42]; therefore, feature ranking is applied to reduce the feature dimensions and improve the efficiency of the model. From Table 10, the features selected by the chi-square test achieve the highest accuracy. This is because the method intuitively quantifies the degree of deviation between features and labels [33] and accurately identifies the features strongly correlated with the labels, which makes it suitable for feature selection in large data sets. Based on the results of the chi-square test, the Top 13, Top 16, and Top 17 ranked features achieve AF detection accuracies of 95.25%, 95.25%, and 95.38%, respectively, which are comparable to the 46 fusion features. Additionally, at the same feature dimension, the classification accuracy of the ranked features is higher than that of the 13 CTW-RQA features, the 16 energy features, and the 17 time and time-frequency features. The Top 17 ranked features reduce the feature dimensions and improve the efficiency of the algorithm, while their classification accuracy is only 0.25% lower than that of the 46 fusion features. This reflects the superiority of the ranked features and shows that feature selection is necessary and effective.
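The chi-square ranking step can be reproduced with a few lines of NumPy. The score below mirrors the per-feature statistic computed by `sklearn.feature_selection.chi2` (observed vs. expected per-class feature sums) and assumes the features have already been scaled to be non-negative, e.g., min-max normalized; the function names are illustrative.

```python
import numpy as np

def chi2_scores(X, y):
    """Per-feature chi-square statistic between non-negative features X
    and class labels y."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    # Observed: per-class sum of each feature.
    observed = np.array([X[y == c].sum(axis=0) for c in classes])
    # Expected under independence: class prior times the total feature sum.
    class_prob = np.array([(y == c).mean() for c in classes])
    expected = np.outer(class_prob, X.sum(axis=0))
    return ((observed - expected) ** 2 / expected).sum(axis=0)

def top_k(X, y, k):
    """Indices of the k features with the largest chi-square scores."""
    return np.argsort(chi2_scores(X, y))[::-1][:k]
```

A feature whose per-class sums deviate strongly from the class priors (i.e., one that separates AF from non-AF segments) receives a high score, while a feature distributed identically across classes scores near zero and is dropped.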
According to the ranking results based on the chi-square test in Figure 6, 6 of the first 10 features are CTW-RQA features, and CTW-RQA features account for 65% of the first 17 ranked features, verifying their characterization ability. Moreover, the proposed features supplement the existing BCG features and effectively improve the performance of AF detection.

Comparison with Existing Methods
For AF detection based on BCG signals, Bruser et al. extracted time and time-frequency features from 10 subjects, and the best classifier achieved an accuracy of 96% on 856 BCG segments [17]. Yu et al. extracted time-frequency features from 12 subjects, and the optimized classifier achieved an accuracy of 94.4% on 7816 BCG segments [18]. Wen et al. extracted energy features from 37 subjects, and the best classifier achieved an accuracy of 94.5% on 2915 BCG segments [19]. In this study, 4000 BCG segments were collected from 59 subjects, a more diverse cohort. This diversity makes the proposed method more generalizable and thus more relevant to the application environment of AF screening. Additionally, on the BCG dataset in this study, the AF detection accuracy of the proposed features is superior to that of the existing features: the optimal classifier achieves an accuracy of 95.63%, which could satisfy the demands of AF diagnosis in daily life. In summary, the proposed features present an advantage for AF detection on more diverse BCG datasets and effectively supplement the existing BCG features.
Compared to existing wearable AF detection devices, BCG-based AF detection offers the following advantages. Current technologies employed in wearables and evaluated for AF detection are most often based on single-lead ECG or PPG. For ECG-based AF detection, wearable devices include smartwatches/bands [43,44], smartphones [45][46][47], skin patch recorders [45], hand-held devices [46,47], and smart clothing [48]. These monitoring devices are convenient, but they are based on cardiac electrophysiology and lack an analysis of cardio-dynamic changes. For PPG-based AF detection, wearable devices include smartwatches/bands [49][50][51], smartphones [52,53], and cameras [54]. These devices can provide cardio-dynamic information, but their accuracy is currently affected by motion artefacts, measurement location, skin conditions, ectopic beats, peripheral vascular disease, poor skin contact, and limited battery life [44,55]. In contrast, BCG-based AF detection devices are usually cushions or mattresses [16], which keep individuals unaware of the test and reflect the status of the larger cardiovascular system [13,14].
However, the limitation of BCG-based AF detection is its susceptibility to motion artefacts, so stationary conditions are always desired. In state-of-the-art research, BCG datasets are generally collected during sleep, where high signal-to-noise-ratio periods cover most of the acquired data. Compared with techniques using wearable devices, BCG-based AF detection does not introduce an additional burden; for example, some devices require individuals to place their fingers on the electrodes of a smartphone [46,47]. Moreover, AF prevalence increases with age, especially for those aged > 65 years, whereas only 4.6% of smartwatch users in the United States are aged > 65 years [56]. Therefore, BCG-based AF detection devices are more suitable for elderly or bedridden patients.

Conclusions
In this study, we reconstruct BCG signals with the RP method and propose CTW-RQA features to optimize the AF classification performance. The experimental results prove that the proposed features are feasible for AF detection using BCG signals. In the future, we will expand the data volume of BCG signals, including multiple postures and more subjects, to verify the universality of the algorithm. Additionally, the data length for each subject will be increased for personalized analysis for specific subjects. For feature selection, more feature ranking methods, such as the SHAP dependence plot, could be applied to evaluate the importance of the BCG features. The CTW-RQA features and feature selection method in this paper could be extended for the diagnosis of other arrhythmia diseases by means of BCG signals, and the influence of respiratory movement could be considered.

Data Availability Statement:
Acknowledgments: The authors would like to thank Bin Yu for his guidance regarding data acquisition and preprocessing. Thanks are extended to Hangzhou Bobo Technology Co., Ltd. for providing original BCG data for these experiments.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Figure A1. The ranking results of fusion features based on the SHAP summary plot.
Table A4. AF detection performance based on CTW-RQA features (n = 2).