A Convolutional Neural Network Framework for Sleep Apnea Detection via Ballistocardiography Signals

Di Sivo, Domenico; Errico, Palma; Fusco, Pietro; Venticinque, Salvatore

doi:10.3390/app16073314


¹ Department of Engineering, Campus Bio-Medico University, Via Álvaro del Portillo, 21, 00128 Roma, Italy

² Department of Engineering, University of Campania “Luigi Vanvitelli”, 81031 Aversa, Italy

³ University of Salerno, 84081 Baronissi, Italy

* Authors to whom correspondence should be addressed.

Appl. Sci. 2026, 16(7), 3314; https://doi.org/10.3390/app16073314

Submission received: 16 January 2026 / Revised: 20 March 2026 / Accepted: 26 March 2026 / Published: 29 March 2026

(This article belongs to the Special Issue Research and Applications of Artificial Neural Network)


Featured Application

One specific application of this work is sleep monitoring using non-invasive, low-cost devices, particularly for measuring the ballistocardiogram, aimed at detecting sleep disorders for non-medical, well-being-oriented monitoring.

Abstract

The clinical diagnosis of sleep apnea conventionally necessitates resource-intensive Polysomnography (PSG). We propose a weakly supervised framework to detect apnea using non-invasive Ballistocardiography (BCG), thereby addressing the critical scarcity of labeled BCG data. Instead of manual annotation, our pipeline transfers knowledge from a synchronized ECG signal, using it as a “teacher” to generate pseudo-labels for the BCG model. We formulated a User-Defined Function (UDF) that combines Heart Rate Variability and ECG-Derived Respiration to autonomously label the BCG windows. These pseudo-labels were subsequently employed to train a 1D Convolutional Neural Network. Testing on a public dataset, the CNN model achieved 71.8% accuracy against the pseudo-labels. When projected against the clinical ground truth, we estimate a true accuracy of 77.7%. These results validate that ECG-based supervision can effectively train low-cost home sensors without the bottleneck of manual medical annotation.

Keywords:

artificial intelligence; machine learning; ballistocardiography; sleep disorder; electrocardiography; convolutional neural network

1. Introduction

Sleep disorders represent an increasingly significant public health concern, with substantial impacts on both physical and mental well-being. Accurate diagnosis generally requires overnight monitoring in specialized facilities through PSG, which, although regarded as the gold standard, is resource-intensive, invasive, and often uncomfortable for patients. The rising demand for sleep assessments has placed significant pressure on healthcare infrastructure and healthcare systems, particularly in hospital settings, where bed availability and staffing resources are limited.

In contrast, cost-effective, off-the-shelf devices can be comfortably worn or installed at home, enabling the collection of large volumes of data suitable for the remote monitoring of users’ health conditions. The fundamental challenge resides in extracting meaningful information from these data while addressing critical aspects such as detection accuracy, data availability and privacy for model training, computational performance, and overall cost.

This study presents a preliminary investigation into the use of a non-invasive BCG device for detecting sleep disorders. BCG technology enables contactless monitoring of cardiac and respiratory activity by capturing mechanical vibrations transmitted through the body, thus offering a promising alternative for both continuous and event-based sleep monitoring. The primary objective is to evaluate the feasibility of employing such a device for home-based sleep monitoring, which could substantially reduce the workload of clinical facilities by facilitating remote assessment and early diagnosis. This approach not only enhances patient comfort, by obviating the necessity for repeated hospital visits, but also contributes to more efficient resource allocation within healthcare institutions.

The proposed approach leverages artificial intelligence (AI) techniques with the aim of enabling, in the future, the transfer of knowledge from high-end medical devices, capable of extracting detailed physiological information, to models trained on data collected from low-cost, off-the-shelf sensors. Rather than focusing on the absolute diagnostic accuracy of medical-grade equipment, this study emphasizes the evaluation of general indicators of patient well-being.

To validate the proposed methodology, we conducted a real-world case study on apnea detection using BCG signals, training a deep learning algorithm on automatically labeled data. Preliminary experiments were performed on the dataset introduced in [1], extending the methodology with automatic data labeling and proposing novel deep learning architectures. Automatic labeling strategies were evaluated and compared against manual annotations to assess the generalization capability and robustness of the proposed framework.

Section 2 discusses related studies, while Section 3 outlines the overall methodology. The open dataset and the BCG labeling process are described in Section 4. Section 5 presents the technique for the automatic detection of apnea events from BCG signals. Data windowing and the neural network architecture are described in Section 6. Experimental results are discussed in Section 7. Section 8 analyzes the performance. Finally, conclusions are drawn in Section 9.

2. Related Work

A good night’s sleep is essential for maintaining psychophysical well-being. Lack of sleep or poor-quality sleep can contribute to the development of chronic conditions, including diabetes, cardiovascular disorders, kidney problems, anxiety, and depression [2]. The technological landscape of sleep monitoring encompasses a wide range of approaches and devices. Yin et al. [3] presented a comprehensive overview of long-term sleep monitoring technologies, categorizing physiological signals into three main types: bioelectrical, biomechanical, and biochemical.

PSG [4] remains the definitive gold standard for the clinical diagnosis of sleep disorders. It involves the use of a traditional device called a polysomnograph, which allows for the simultaneous monitoring of various physiological parameters, including brain activity (EEG), eye movements (EOG), muscle tone (EMG), heart rate (ECG), respiration, oxygen saturation, and body position. This device provides a comprehensive assessment of the patient’s neurological and cardiovascular status. Despite its diagnostic accuracy, PSG has notable limitations when used over extended periods or in home environments. The need to apply multiple electrodes can be uncomfortable for patients, which may adversely impact sleep quality and make home adoption challenging. These drawbacks have driven research toward alternative, less invasive solutions that are more suitable for daily and long-term monitoring.

In recent years, numerous studies have explored alternative technologies for sleep monitoring that do not require the use of PSG, proposing systems based on wearable sensors, piezoelectric sensors [5], multimodal sensors [6], and pulsed ultra-wideband (IR-UWB) radar technology [7]. In this context, ref. [8] introduced a contactless bed sensor for sleep apnea detection, providing a comparative study that validates its effectiveness against standard methods. The BCG is a non-invasive diagnostic technique that measures low-frequency oscillations generated by cardiac activity, enabling the analysis of events related to the cardiac cycle, such as the duration and intensity of heartbeats. This method is based on Newtonian mechanics and captures the mechanical forces produced by blood flow dynamics and myocardial contractions. Modern BCG has proven especially valuable for the early detection of cardiac anomalies, often identifying functional alterations even before obvious clinical symptoms appear. The technique relies on the analysis of characteristic BCG waves (G, H, I, J, K), each corresponding to specific events in the cardiac cycle, enabling precise differentiation between systolic and diastolic phases. This temporal correlation makes BCG a particularly useful tool for studying cardiovascular dynamics.

The article [9] reviews several technological solutions for acquiring the BCG signal. Several studies have investigated the potential of BCG as a non-invasive alternative to electrocardiography (ECG), demonstrating significant potential. One of the most notable studies in this area [10] compared heart rate measurements derived from the J wave of the BCG signal with those obtained from ECG, both recorded simultaneously using a BIOPAC system. Signal processing of the BCG data, performed using wavelet transform techniques, achieved an accuracy of 93%, demonstrating that BCG can effectively rival ECG for heart rate monitoring. At a more advanced level, Morokuma et al. [11] developed a deep-learning-based system capable of reconstructing ECG signals from BCG data. Leveraging a bidirectional LSTM neural network, the model estimated R–R intervals with an average error of only 0.034 s, paving the way for long-term, electrode-free cardiac monitoring, particularly useful during sleep. There are several case studies dealing with sleep monitoring using BCG, with particular attention to situations where PSG is too invasive or impractical. The study by [12] focused on children with severe autism, proposing an instrumental system installed in the bed capable of continuously monitoring physiological parameters during the night.

The main goal is to provide an objective and non-invasive assessment of sleep quality, minimizing discomfort in patients with complex neurological disabilities, for whom traditional devices such as ECG or PSG are not compatible with their behavioral needs. In the work of [13], an innovative approach for the detection of heart failure (HF) is presented based on the integration of BCG and respiratory signals analyzed through machine learning techniques. This system, designed for home monitoring, allows an early and non-invasive diagnosis, representing an economical and practical alternative to hospital investigations. Another case study is presented in [14], where a real-time cardiac monitoring system is proposed, capable of detecting abnormal heartbeats and generating automatic alerts.

The study presented in [1] proposes an automatic system for detecting breathing disorders during sleep, based on the analysis of BCG signals using a convolutional neural network (CNN). The approach is entirely non-invasive, thanks to the use of BCG sensors placed under the mattress, and it is independent of the sleeper’s body position. The paper begins by presenting the classical method for identifying the R-peak in the ECG signal, a crucial step for the synchronization and analysis of cardiac signals. It then introduces a formulation based on Cartan curvatures, which is used to describe and model the geometric characteristics of the BCG signal. This representation provides meaningful features for training the model and for the automatic detection of abnormal breathing events.

Extending and refining the methodology defined in [1], we introduce the automatic identification and labeling of abnormal breathing events in the high-resolution BCG signal. This overcomes a limitation of that approach at large scale, namely the excessive effort required for manual annotation of the ECG trace. Moreover, the integrated use of automatic unsupervised training and federated learning techniques allows for continuous improvement of the model. The deep learning model also differs from the one proposed in [1]: it does not take an image transformation of the signals as input, but works directly with the original values.

In [15], the use of low-cost off-the-shelf devices for training AI models on BCG data was investigated. In [16], the effectiveness of federated learning, compared with a centralized approach, was evaluated for detecting breathing anomalies during sleep on ECG signals.

This paper focuses on assessing the effectiveness of our AI model applied to BCG signals, evaluating its performance using a centralized approach on a public dataset. The ultimate aim is to enable the replacement of high-end medical devices with low-cost devices.

3. Methodology

PSG currently represents the gold standard for the recognition and annotation of apnea events during sleep. In this work, we aim to simplify the monitoring process both in the hospital setting, by detecting apnea events using only the ECG signal while simultaneously recording the BCG signal, and, more significantly, in the home environment, where monitoring is performed exclusively using BCG. This paradigm is equivalent to developing a robust model within a highly equipped laboratory (the hospital) and subsequently deploying it for everyday use in real-world environments (at home), ensuring that the knowledge acquired under clinical conditions remains effective and generalizable. The following section describes the complete methodological workflow, as illustrated in Figure 1.

The process begins within the hospital environment, where sleep-related physiological signals are recorded using high-precision medical devices, specifically an ECG. The ECG provides a clinically validated reference for detecting apnea events and serves as the gold-standard supervisory signal for subsequent analyses. Simultaneously, patients are equipped with low-cost devices, such as a BCG belt, capable of capturing indirect signals associated with cardiac and respiratory activity. Through synchronized monitoring, the data collected by the low-cost device can be temporally aligned with the ECG recordings, which are one of the signals recorded in the PSG, along with corresponding clinical annotations. This alignment enables the automatic labeling of low-cost BCG signals using the ECG as a supervisory reference to generate pseudo-labels, allowing the weakly supervised training of non-invasive and easily accessible monitoring systems. Subsequently, an artificial intelligence model is trained via supervised learning to recognize apnea-relevant patterns, using clinically validated annotations from the high-precision medical device as the gold standard.

4. BCG Analysis and Labeling

Obstructive sleep apnea (OSA) [17] is a disorder characterized by repeated interruptions in breathing during sleep. Traditional diagnosis, based on PSG, is accurate but expensive and poorly suited for home monitoring. This motivates the investigation of transferring knowledge from ECG signals (widely recognized for their effectiveness in detecting apnea-related cardiac alterations) to BCG signals, which are less informative but more practical for home-monitoring applications. The objective is to develop a predictive model based solely on BCG data. This approach aims to enable the replacement of ECG with BCG for home-based apnea detection while maintaining a high level of diagnostic accuracy, even in the absence of direct cardiac electrical measurements. It is important to note that the dataset used in this study was collected during controlled breath-holding sessions rather than overnight PSG-based OSA assessments.

4.1. Public Dataset

The dataset used in this study [18] was published in 2020 and made available via the Mendeley Data Repository (https://data.mendeley.com/datasets/9fmfn6kfn7/1 (accessed on 25 March 2026)). It was developed as part of a research project of the Faculty of Informatics and Management of the University of Hradec Králové, as well as the PERSONMED project - Center for the Development of Personalized Medicine in Age-Related Diseases.

The dataset is particularly suitable for our case study for the following reasons.

Synchronized ECG and BCG acquisition: The dataset contains simultaneous recordings of 12 BCG signals (acquired via a bedside platform with strain gauge sensors) and a reference ECG signal, sampled at 1 kHz with a 24-bit converter. This synchronization is essential for studying correlations between the two signals.
Presence of controlled breath-holding events: Participants performed voluntary breath-holding sessions, each lasting approximately 30 s. These events were manually annotated by experts, providing reliable time labels for training and validating classification models. Table 1 presents one of two experimental protocols, V1 or V2, which correspond to different acquisition procedures. A detailed description of the events performed by the volunteers, including the exact duration (in seconds) of each event, is provided in that table.
Well-organized data structure: Each measurement is represented as a temporal matrix with 14 columns—a validity flag, the 12 BCG signals, and the ECG signal. This format facilitates processing and temporal alignment between signals.

4.2. Labeling Procedure

The labeling procedure was carefully designed to produce high-quality annotations for training a CNN to analyze multi-channel BCG signals available in the selected dataset. A moving window was used to extract training samples from the original data. Each sample corresponds to 30 s (30,000 points) of the recorded signals, including 12 synchronized BCG signals recorded simultaneously at a sampling rate of 1 kHz from various sensor positions or orientations, as detailed in [18]. No preliminary signal processing was executed to remove noise or artifacts before labeling, ensuring that annotations reflect the raw characteristics of the signals. For each sample, the target label is computed by processing all twelve BCG synchronized signals in the same window.

In Figure 2, the top panel depicts one of the twelve BCG signals from the original dataset. A segment of the BCG signal, ranging from sample 300,000 to 480,000, was replaced with zeros to mitigate the influence of significant motion artifacts caused by subject movement during recording without affecting the feasibility of the approach; in fact, such an interval can be easily identified by the BCG device. This zeroing procedure was applied only to the signals belonging to the V1 subset of the dataset. This segment will therefore remain unlabeled. Two overlapping instances of the moving window are depicted with solid and dashed red lines.

The bottom panel of Figure 2 illustrates the User-Defined Function (UDF) employed in the labeling process (blue line), as defined in Equation (1). The value of $\mu$ is set to align each UDF with the right boundary of the apnea intervals defined in the original study. Meanwhile, $\sigma$ is calculated to normalize the area under the curve to unity.

$f(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{- \frac{1}{2} \left(\frac{x - \mu}{\sigma}\right)^{2}}$

(1)

Moreover, the UDF function is constructed such that a 30 s sliding window, centered on the peaks of the UDF itself, yields an integral value close to one. The synchronization between the BCG and the UDF adheres to the annotation protocol described in [1], which maps specific ECG segments to apnea-related events. The integral of the UDF function, i.e., the area under the curve, measures the probability of finding an apnea event in the window. Taking into account that the American Academy of Sleep Medicine (AASM) [19] defines apnea as a complete or near-complete (≥90%) cessation of airflow lasting at least 10 s in adults, the UDF starts to grow after 10 s of breathing pause and increases until the subject restarts breathing. In fact, the maximum impact in terms of ECG anomaly is expected to be observed right before and after the end of the apnea interval.

Labeling was performed by integrating the UDF function over the sliding windows aligned with the BCG signal. Each resulting integration value was then used to label the corresponding 30 s BCG segment (30,000 points) across all 12 channels, based on the range, or basket, in which the integration value fell.

For example, consider a binary classification problem with two possible labels, 0 and 1. In this case, the integration result of the UDF function can fall into one of two intervals: $[0, 0.5) \cup [0.5, 1]$. Each sample is assigned a label according to the interval in which its UDF integration value lies. This method can be seamlessly extended to support both binary and multiclass labeling of BCG data.
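The binary labeling rule can be sketched as follows. The Gaussian of Equation (1) is integrated in closed form through the error function, and a window receives label 1 when the integral falls in [0.5, 1]. The function names, the value of σ (5 s), and the apnea end time (100 s) are illustrative assumptions, not values taken from the original study.

```python
import math

def udf_integral(t_start, t_end, mu, sigma):
    """Area under the Gaussian UDF of Equation (1) between t_start and t_end (seconds)."""
    def cdf(t):
        return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))
    return cdf(t_end) - cdf(t_start)

def label_window(t_start, mu, sigma, window_s=30.0, threshold=0.5):
    """Binary label: 1 if the UDF integral over the window lies in [0.5, 1], else 0."""
    area = udf_integral(t_start, t_start + window_s, mu, sigma)
    return 1 if area >= threshold else 0

# Hypothetical setting: apnea interval ending at t = 100 s, sigma = 5 s (assumed).
MU, SIGMA = 100.0, 5.0
labels = {start: label_window(start, MU, SIGMA) for start in (40.0, 85.0, 101.0, 130.0)}
# Only windows that cover most of the Gaussian mass around MU are labeled 1.
```

A multiclass variant would replace the single threshold with a list of bin edges over [0, 1], under the same assumptions.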

Figure 3 provides a detailed view of the labeling process illustrated in Figure 2. Each labeled window corresponds to a fixed 30 s segment of BCG signals. By employing a sliding-window approach with adjustable stride parameters, the number of labeled training samples can be increased. This procedure produces a well-structured set of input–output pairs suitable for supervised learning, allowing the CNN to effectively capture and model the spatio-temporal features inherent in multi-channel BCG signals.

5. Automatic Apnea Detection

To enable unsupervised learning, the system must be able to automatically detect apnea episodes from BCG signals. To this end, we first conducted a preliminary analysis on ECG signals (the most representative of cardiac activity), which serve as a reference. ECG analysis allows us to visually determine the correct placement of the maximum of the User-Defined Function (UDF), which is subsequently used to generate automatic labels for training the network on BCG data.

During apnea, the temporary interruption of respiration triggers distinct autonomic and cardiac perturbations. To quantify these changes, we introduce three ECG-derived metrics capturing such physiological variations.

Heart Rate Variation (DHR) quantifies instantaneous changes in heart rate using RR intervals. A standard metric is the standard deviation of RR intervals (SDNN):

$\mathrm{HRV}_{\mathrm{SDNN}} = \sqrt{\frac{1}{N - 1} \sum_{i = 1}^{N} (RR_{i} - \overline{RR})^{2}}$

(2)

where $RR_{i}$ is the i-th RR interval and $\overline{RR}$ its mean value. Heart-rate variation can also be expressed as:

$\mathrm{DHR}(n) = |\mathrm{HR}(n + 1) - \mathrm{HR}(n)|$

(3)

A decrease in HRV and DHR reflects autonomic imbalance and is often associated with apnea [20,21].
LF/HF ratio quantifies autonomic balance through the power spectral density of RR intervals:

$\mathrm{LF/HF} = \frac{P_{LF}}{P_{HF}}$

(4)

where $P_{LF}$ and $P_{HF}$ are the spectral powers of the low-frequency (0.04–0.15 Hz) and high-frequency (0.15–0.4 Hz) bands, respectively. An elevated LF/HF ratio serves as a marker for sympathetic predominance, typically observed during apnea [20,21].
ECG-Derived Respiration (EDR) is a respiratory surrogate extracted from the ECG, often from R-peak amplitude modulation:

$\mathrm{EDR}(n) = A_{R}(n)$

(5)

where $A_{R}(n)$ is the amplitude of the n-th R-peak. An alternative respiration-related metric is:

$\mathrm{EDR}_{\mathrm{RSA}}(n) = RR_{\mathrm{peak}}(n) - RR_{\mathrm{trough}}(n)$

(6)

EDR reflects respiration-dependent cardiac modulation and is highly sensitive to apnea-related reductions in breathing effort [20].

Table 2 presents the main ECG-derived metrics and their expected changes during apnea events.

In the original dataset, each recording contains ECG signals sampled at 1 kHz without any preprocessing or artifact removal. To compute the introduced parameters, the first pre-processing step is to identify the R-peaks of the ECG signal, from which the RR intervals are obtained.

To compute the DHR parameter, the ECG signal is filtered between 5 and 15 Hz to enhance R-peaks, which are detected using amplitude analysis. RR intervals are computed and converted into heart rate values (bpm). DHR is calculated as the absolute value of the difference between consecutive heart rate samples, and can be smoothed to reduce noise.
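A minimal sketch of the RR-interval arithmetic behind Equations (2) and (3) is given below. It assumes the R-peak sample indices have already been detected; the 5–15 Hz band-pass filtering and the peak detection themselves are not reproduced. The function names and the example peak positions are hypothetical.

```python
import statistics

FS = 1000  # ECG sampling rate in Hz, as stated for the dataset

def rr_intervals(r_peaks, fs=FS):
    """RR intervals in seconds from R-peak sample indices."""
    return [(b - a) / fs for a, b in zip(r_peaks, r_peaks[1:])]

def heart_rate(rr):
    """Instantaneous heart rate in beats per minute, one value per RR interval."""
    return [60.0 / x for x in rr]

def dhr(hr):
    """Equation (3): absolute difference between consecutive heart-rate values."""
    return [abs(b - a) for a, b in zip(hr, hr[1:])]

def sdnn(rr):
    """Equation (2): sample standard deviation (N - 1 denominator) of the RR intervals."""
    return statistics.stdev(rr)

# Hypothetical R-peak positions (in samples): beats 1.0 s, 1.0 s, 0.8 s and 0.75 s apart.
peaks = [0, 1000, 2000, 2800, 3550]
rr = rr_intervals(peaks)
hr = heart_rate(rr)      # roughly 60, 60, 75 and 80 bpm
delta = dhr(hr)          # roughly 0, 15 and 5 bpm
```

The optional smoothing of DHR mentioned above could be applied to `delta` afterwards with any short moving average.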

To compute the LF/HF parameter, the heart rate time series is interpolated at 4 Hz to obtain uniformly spaced samples. A moving window of 60 s with a 1 s step length extracts segments for power spectral density estimation via Welch’s method. The power is integrated over the LF (0.04–0.15 Hz) and HF (0.15–0.4 Hz) bands, and their ratio (LF/HF) is computed.
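The band-power ratio can be illustrated with the sketch below. To stay within the Python standard library, it uses a single Hann-windowed periodogram per 60 s window rather than Welch's averaged estimate, so it is a simplified stand-in and not the estimator used in the study. It also assumes the heart-rate series has already been interpolated onto a uniform 4 Hz grid. All names and the synthetic signal are hypothetical.

```python
import cmath
import math

def band_power(x, fs, f_lo, f_hi):
    """Unnormalized power of x in [f_lo, f_hi) Hz from a Hann-windowed periodogram."""
    n = len(x)
    mean = sum(x) / n
    xw = [(v - mean) * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
          for i, v in enumerate(x)]
    power = 0.0
    for k in range(1, n // 2 + 1):
        if f_lo <= k * fs / n < f_hi:
            coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                        for i, v in enumerate(xw))
            power += abs(coeff) ** 2
    return power

def lf_hf_ratio(hr, fs=4.0):
    """Ratio of LF (0.04-0.15 Hz) to HF (0.15-0.4 Hz) power; scale factors cancel."""
    lf = band_power(hr, fs, 0.04, 0.15)
    hf = band_power(hr, fs, 0.15, 0.40)
    return lf / hf if hf > 0 else float("inf")

def sliding_lf_hf(hr, fs=4.0, window_s=60, step_s=1):
    """LF/HF ratio on a 60 s window moved in 1 s steps."""
    win, step = int(window_s * fs), int(step_s * fs)
    return [lf_hf_ratio(hr[i:i + win], fs)
            for i in range(0, len(hr) - win + 1, step)]

# Synthetic 90 s heart-rate series: a strong 0.10 Hz (LF) and a weak 0.25 Hz (HF) component.
FS = 4.0
t = [i / FS for i in range(int(90 * FS))]
hr = [60 + 3 * math.sin(2 * math.pi * 0.10 * s) + math.sin(2 * math.pi * 0.25 * s)
      for s in t]
ratios = sliding_lf_hf(hr, FS)  # every value is well above 1 for this LF-dominated signal
```

With SciPy available, `band_power` would typically be replaced by a Welch estimate followed by integration over each band.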

Finally, to compute the EDR signal, R-peak amplitudes are extracted from the filtered ECG to form the raw EDR signal, which is then interpolated to create a continuous respiratory surrogate. Respiratory pauses are detected as intervals in which the amplitude variability of this signal remains low for at least a minimum pause duration.

Figure 4 shows the values of the introduced parameters computed for an ECG trace extracted from the dataset. It can be observed that during each apnea interval, there is typically an increasing trend in all computed parameters, with the maximum values varying according to the individual physiological response of the subject. For example, the LF/HF signal demonstrates that the impact of apnea intensifies with each successive event, as the subject is unable to fully recover to the baseline state. Moreover, the peak effect of the phenomenon does not coincide exactly with the end (right border) of the apnea interval. In Figure 4, LF/HF and EDR peaks provide the most reliable diagnostic indicators, although this may vary between traces. Specifically, the y-axis of the EDR signal represents ADC counts, corresponding to a bipolar representation of the 24-bit ADC output. Artifacts caused by user movement, as well as individual variations in physiological response and reactivity, may also influence signal recordings and introduce errors during testing.

5.1. Performance of Detection

To facilitate apnea event recognition, each parameter was first smoothed with a 5 s moving-average filter. Peaks and sustained elevations were then detected on the smoothed signals using feature-specific thresholds. Event candidates were evaluated in a running window of length W (30–40 s depending on the feature) slid with a 1 s step across the recording. For peak-based methods (DHR and EDR), a min-peaks criterion required at least N peaks above threshold within the window (e.g., N = 3 in W = 30–40 s). For LF/HF, a sustained-elevation rule enforced a minimum duration of 10 s above threshold. Detected events were matched to ground truth using overlap (for LF/HF) or a ±30 s proximity tolerance (for peak detections). Finally, we quantified temporal displacement as the difference between the time of maximum response and the right border of the matched apnea interval. This multi-parameter consensus successfully attenuated the false-positive rate compared with single-indicator detection, and the resulting detection error must be added to the classification error of the neural model.
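The smoothing and the sustained-elevation rule can be sketched as follows, assuming a feature series sampled once per second, so that the 5 s filter spans five samples and the 10 s minimum duration spans ten. The threshold value and the toy series are hypothetical.

```python
def moving_average(x, width=5):
    """Centered moving average; the averaging window shrinks at the series edges."""
    half = width // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

def sustained_elevations(x, threshold, min_len=10):
    """Return (start, end) index pairs, end exclusive, of runs above threshold
    that last at least min_len samples."""
    events, start = [], None
    # The sentinel value closes a run that is still open at the end of the series.
    for i, v in enumerate(list(x) + [float("-inf")]):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            if i - start >= min_len:
                events.append((start, i))
            start = None
    return events

# Toy LF/HF-like series at 1 sample per second: a 15 s elevation and a 4 s blip.
series = [1.0] * 20 + [3.0] * 15 + [1.0] * 10 + [3.0] * 4 + [1.0] * 20
smoothed = moving_average(series, width=5)
events = sustained_elevations(smoothed, threshold=2.0, min_len=10)
# The long elevation is kept; the short blip is rejected by the duration rule.
```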

5.2. Event Detection Performance

To identify optimal detection parameters for apnea event recognition, we performed a comprehensive threshold optimization analysis across ECG-derived physiological indicators using labeled apnea intervals. Three parameters were evaluated: delta heart rate (DHR), the low-frequency to high-frequency power ratio (LF/HF), and ECG-derived respiration (EDR). For each parameter, we systematically searched the parameter space to maximize event-based F1 score, where events were matched to ground truth intervals using temporal overlap for sustained elevations or proximity tolerance (±30 s) for peak detections.

Two complementary strategies were evaluated: (i) peak-based detection on DHR and EDR, using a global amplitude threshold to mark transient HR accelerations and to capture respiration-related envelopes, and (ii) sustained-elevation detection on LF/HF, using a threshold plus a minimum duration to model gradual autonomic activation during apneic episodes. To reduce false positives in peak-based methods, a “min-peaks in window” criterion was added, requiring at least N peaks above threshold within W seconds to promote temporally coherent bursts over isolated excursions that often reflect noise or single ectopic beats.
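The “min-peaks in window” criterion can be illustrated with the sketch below. It assumes the peaks above threshold have already been located and are given as times in whole seconds. Merging overlapping flagged windows into a single event is an illustrative choice, and the peak times are hypothetical.

```python
def min_peaks_events(peak_times, duration_s, window_s=30, min_peaks=3, step_s=1):
    """Slide a window of window_s seconds in step_s increments and flag the windows
    holding at least min_peaks peaks; overlapping flagged windows are merged."""
    events = []
    for start in range(0, int(duration_s - window_s) + 1, step_s):
        count = sum(start <= t < start + window_s for t in peak_times)
        if count >= min_peaks:
            if events and start <= events[-1][1]:
                # Extend the previous event instead of opening a new one.
                events[-1] = (events[-1][0], start + window_s)
            else:
                events.append((start, start + window_s))
    return events

# Hypothetical peak times (s): a burst of three peaks plus two isolated peaks.
peaks = [12, 50, 55, 62, 140]
events = min_peaks_events(peaks, duration_s=200, window_s=30, min_peaks=3)
# Only the burst around 50-62 s forms an event; the isolated peaks are ignored.
```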

Temporal displacement was quantified for each matched event as the difference between the event’s time of maximum response and the right (end) border of its corresponding ground-truth interval, summarizing the systematic anticipation or lag of autonomic/respiratory markers relative to episode termination.
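A small sketch of the displacement computation is given below. It interprets the ±30 s proximity tolerance as the distance from the detection to the nearest annotated interval (zero when the detection lies inside it), which is an assumption, as are the function names and the example values. Negative displacements indicate a response that precedes the end of the interval.

```python
import statistics

def distance_to_interval(t, a, b):
    """Distance in seconds from time t to the interval [a, b]; zero if t lies inside."""
    if t < a:
        return a - t
    if t > b:
        return t - b
    return 0.0

def displacements(peak_times, intervals, tolerance_s=30.0):
    """For each matched detection, time of maximum response minus the right border
    of the nearest ground-truth interval; unmatched detections are skipped."""
    out = []
    for t in peak_times:
        a, b = min(intervals, key=lambda iv: distance_to_interval(t, iv[0], iv[1]))
        if distance_to_interval(t, a, b) <= tolerance_s:
            out.append(t - b)
    return out

# Hypothetical annotated intervals and detection times, in seconds.
intervals = [(100.0, 130.0), (200.0, 230.0)]
detections = [112.0, 215.0, 300.0]
d = displacements(detections, intervals)  # the detection at 300 s stays unmatched
mean_d = statistics.mean(d)
std_d = statistics.stdev(d)
```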

For each parameter and method, grid searches were conducted across threshold ranges derived from feature percentiles and, where applicable, across minimum duration (LF/HF) or min-peaks/window hyperparameters (DHR/EDR), selecting configurations that maximized event-level F1 on the combined dataset.

Table 3 includes aggregate performance, reporting precision/recall trade-offs, enabling fair cross-feature comparison and multi-parameter fusion design for practical ECG-based apnea screening.

The LF/HF ratio and DHR methods demonstrated superior performance. The sustained elevation approach for LF/HF proved particularly effective as it captures the gradual sympathovagal imbalance characteristic of apnea episodes, whereas instantaneous peak-based metrics are more susceptible to transient physiological fluctuations unrelated to respiratory events. On the other hand, considering multiple DHR peaks helps to increase precision, providing excellent balance between precision and recall. These results demonstrate clinical viability with 84% precision and 80% recall for DHR with the minPeaks strategy, F1 = 0.82; and practicality with only 16% false alarm rate. The temporal displacement analysis revealed that the mean time displacement of the detected event is 17 s prior to the apnea interval termination, with a standard deviation of 8 s. This displacement is computed on ECG-based detections (DHR minPeaks) relative to the right border of the annotated apnea interval. We will use these values to estimate the real accuracy of the BCG-based detection method.
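As a quick consistency check on the reported figures, the F1 score is the harmonic mean of precision and recall, and the 16% false alarm rate appears to correspond to one minus the precision, that is, the share of detections that are false alarms. The function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

precision, recall = 0.84, 0.80          # DHR with the minPeaks strategy, as reported
f1 = f1_score(precision, recall)        # about 0.82
false_alarm_share = 1 - precision       # about 0.16
```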

6. BCG Signal-Based Deep Learning Classification

The 12-channel BCG signal, sampled at $f_{s} = 100$ Hz, is segmented into fixed-length windows of duration T = 30 s with a stride of S samples. Each window is mapped by a deep network to a posterior probability of apnea, and labels are assigned by the presence of any annotated apnea within the window. The stride S is tuned by validation, and the final model is evaluated at the stride yielding the best validation performance, including an event-level time-displacement error to quantify detection timing relative to ground truth.

Data and windowing.

Let $X(t) \in \mathbb{R}^{C}$ be the multichannel signal acquired from the four bed supports, each equipped with a 3-axis sensor measuring the components x, y, and z. Hence, the total number of channels is $C = 4 \times 3 = 12$. The signal is sampled at $f_{s} = 100$ Hz, and its discrete-time representation is $X[n] \in \mathbb{R}^{C}$, $n = 0, \dots, N - 1$.

For a fixed window duration $T = 30$ s and stride S (in samples), the k-th analysis window is defined as:

$n_{k} = k S, \quad L = \lfloor T f_{s} \rfloor, \quad W_{k} = \{n_{k}, \dots, n_{k} + L - 1\}.$

(7)

The corresponding network input for window k is the tensor:

$X_{k} \in \mathbb{R}^{C \times L},$

(8)

obtained by stacking the $C = 12$ channels (three per bed foot) over the indices in $W_{k}$.

The total number of windows extracted with stride S is given by:

$K(S) = \left\lfloor \frac{N - L}{S} \right\rfloor + 1.$

(9)
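Equations (7) to (9) translate directly into index arithmetic, as in the sketch below. The sampling rate is left as a parameter; the example uses the 100 Hz stated in this section. The signal is represented as a plain list of C channels, and the recording length is hypothetical.

```python
def window_length(duration_s, fs):
    """L = floor(T * fs), the number of samples per window (Equation (7))."""
    return int(duration_s * fs)

def num_windows(n_samples, length, stride):
    """K(S) = floor((N - L) / S) + 1 (Equation (9)); zero if the signal is too short."""
    if n_samples < length:
        return 0
    return (n_samples - length) // stride + 1

def window_starts(n_samples, length, stride):
    """Start indices n_k = k * S for k = 0, ..., K(S) - 1."""
    return [k * stride for k in range(num_windows(n_samples, length, stride))]

def extract_windows(signal, length, stride):
    """signal is a list of C channels with N samples each; returns K windows,
    each a C x L nested list (Equation (8))."""
    n = len(signal[0])
    return [[channel[s:s + length] for channel in signal]
            for s in window_starts(n, length, stride)]

L = window_length(30, 100)            # 3000 samples per 30 s window at 100 Hz
K = num_windows(60_000, L, 500)       # hypothetical 10 min recording, stride 500

# Small toy signal: 12 channels of 100 samples, windows of 30 samples, stride 10.
toy = [[c * 1000 + n for n in range(100)] for c in range(12)]
toy_windows = extract_windows(toy, length=30, stride=10)
```

A smaller stride increases the overlap between consecutive windows and therefore the number of training samples drawn from the same recording.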

Ground truth and window labels.

Let the set of apnea intervals be $E = \{[a_{i}, b_{i}]\}_{i = 1}^{M}$, expressed in sample indices (or equivalently in seconds by dividing by $f_{s}$). A window is labeled as positive, i.e., affected by apnea, if the output of the integration step of the UDF function falls within the interval [0.5, 1].

Deep classifier and loss.

A deep network $f_{\theta} : \mathbb{R}^{C \times L} \to [0, 1]$ maps $X_{k}$ to a posterior $\hat{p}_{k} = f_{\theta}(X_{k})$ of apnea in window k.

To study the effect of temporal sampling density, models are trained for stride values $S \in \mathcal{S} = \{5000, 10{,}000, 15{,}000\}$ (equivalently, $\Delta = \frac{S}{f_s}$ s). Let $J(S)$ be a validation metric (e.g., event-level $F_1$ or average precision). The selected stride is

$$S^{\star} = \arg\max_{S \in \mathcal{S}} J(S). \tag{10}$$
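Equation (10) is a plain arg-max over the candidate strides. The sketch below is illustrative only: the validation scores $J(S)$ are not reported in the text, so the Table 4 test accuracies with matching train and test stride are used here purely as stand-in values, and the function name is hypothetical.

```python
def select_stride(scores: dict[int, float]) -> int:
    """Return S* = argmax_S J(S) over the candidate strides."""
    return max(scores, key=scores.get)


# Stand-in values for J(S): Table 4 accuracies with matching train/test stride.
validation_scores = {5000: 0.6042, 10_000: 0.7210, 15_000: 0.7739}

best_stride = select_stride(validation_scores)
print(best_stride)
```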

Neural Network Architecture

The proposed model architecture, illustrated in Figure 5, is a lightweight one-dimensional convolutional neural network (1D-CNN) designed for efficient feature extraction and classification from sequential data. The network follows a streamlined hierarchical structure, progressively transforming the input signal into a compact representation through a series of optimized layers.

The architecture begins with an input layer that receives the raw one-dimensional signal. This is followed by three successive convolutional blocks, Conv1–Conv3. Each block consists of a 1D convolutional layer with a kernel size of 3, a stride of 2, and padding of 1, followed by batch normalization (BN) and a ReLU activation function. Unlike traditional deep models, this architecture uses a minimal number of filters, increasing from 1 to 3 across the blocks, which reduces the computational footprint while capturing essential temporal patterns through strided convolutions.

Following the third convolutional block, the feature maps are flattened into a 1D tensor to be processed by a sequence of three fully connected (FC) layers. The first two layers (FC1 and FC2 in Figure 5) comprise 20 and 10 neurons, respectively, both employing ReLU activation functions to introduce non-linearity. This progression refines the high-level features extracted by the convolutional front-end into a low-dimensional discriminative space.

The network generates its final output through the last fully connected layer (FC3 in Figure 5), which maps the learned representations to the target number of classes. This concluding stage provides the logits necessary for multi-class classification.

In contrast to high-capacity models, this architecture is specifically engineered to minimize computational overhead. With only 225,330 trainable parameters and a memory footprint of approximately 0.86 MB, the model ensures high efficiency and rapid inference while maintaining the learning capacity required for robust performance.
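The parameter count can be cross-checked with a short calculation. The sketch below is not the authors' code and rests on several assumptions that the text does not state explicitly: 12 input channels, filter counts of 1, 2, and 3 in Conv1–Conv3, a bias term in every convolutional and fully connected layer, two trainable BN parameters per channel, two output classes, and 32-bit weights. Under these assumptions, an input of 30,000 samples reproduces the reported 225,330 parameters and 0.86 MB, whereas a 3000-sample input (30 s at 100 Hz) would give 22,830; the input length actually fed to the network is not given in the text, so this is only an arithmetic observation.

```python
def conv_out_len(length: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output length of a 1D convolution without dilation."""
    return (length + 2 * padding - kernel) // stride + 1


def count_parameters(in_channels: int, length: int,
                     filters=(1, 2, 3), hidden=(20, 10), n_classes: int = 2,
                     kernel: int = 3) -> int:
    """Trainable parameters of three Conv-BN-ReLU blocks followed by three FC layers."""
    total, channels = 0, in_channels
    for out_channels in filters:
        total += channels * out_channels * kernel + out_channels  # conv weights + bias
        total += 2 * out_channels                                  # BN scale + shift
        channels, length = out_channels, conv_out_len(length, kernel)
    features = channels * length                                   # flattened size
    for width in (*hidden, n_classes):
        total += features * width + width                          # dense weights + bias
        features = width
    return total


for window_len in (3_000, 30_000):
    n = count_parameters(in_channels=12, length=window_len)
    print(window_len, n, f"{n * 4 / 2**20:.2f} MiB as float32")
```

Almost all parameters sit in FC1, so the total scales roughly linearly with the input length.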

7. Experimental Results

The dataset was derived from the original BCG signals collected under controlled breath-holding protocols. A sliding window was applied to the signals, and the stride between consecutive windows was varied from 5 s to 15 s to evaluate the impact of window overlap. After generating the windowed samples, the data were divided into training and test sets using an 80–20% split. A group split was implemented to ensure that data from any given patient appeared exclusively in either the training or the test set. Training was conducted for 15 epochs, at which point the model reached convergence.
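A patient-wise (group) split of this kind can be sketched as follows. The function, the subject identifiers, and the number of windows per subject are hypothetical; the sketch only illustrates that the split is made over subjects rather than over individual windows.

```python
import random


def group_split(subject_ids: list[str], test_fraction: float = 0.2, seed: int = 0):
    """Split subjects (not windows) so that no subject appears in both sets."""
    unique = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    held_out = set(unique[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in held_out]
    test_idx = [i for i, s in enumerate(subject_ids) if s in held_out]
    return train_idx, test_idx


# Hypothetical window-to-subject mapping: 10 subjects, 50 windows each.
subjects = [f"subject_{i:02d}" for i in range(10) for _ in range(50)]
train_idx, test_idx = group_split(subjects)

train_subjects = {subjects[i] for i in train_idx}
test_subjects = {subjects[i] for i in test_idx}
print(len(train_idx), len(test_idx), train_subjects & test_subjects)
```

Because whole subjects are held out, the window-level ratio equals 80–20% only when every subject contributes the same number of windows.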

Three stride values were considered during the labeling stage to investigate their impact on model performance. Figure 6 shows the training behavior when the maximum of the UDF function is placed at the right border of each apnea window. The accuracy curves enable a comparative assessment of how stride influences classification performance.

The results show that with the smallest stride, the model does not effectively learn the temporal patterns, whereas increasing the stride up to 15 s allows the model to achieve stable performance, reaching its best accuracy after 7 epochs.

In the inference phase, the model was tested on the third portion of the dataset. We note that the stride value influences the composition of both the training and test datasets, affecting the number of samples as well as the overall difficulty of the task. To properly evaluate the effectiveness of the training process on the same model, it was therefore necessary to establish a common reference baseline. We addressed this by comparing the performance of models trained on three different training datasets against a challenging benchmark task, i.e., performing inference on the test dataset generated with a stride of 5000. This strategy provides a uniform basis for comparison, enabling the evaluation to focus on the models’ generalization capability independently of the original training sample size.

Table 4 and Table 5 summarize the performance results obtained during the inference phase, where a comprehensive evaluation was carried out across the model instances, trained with the different considered stride values.

These tables show that the highest accuracy (77.39%) and the lowest loss (0.79) are obtained when both training and testing are performed with 15,000-stride windows, in line with the training behavior in Figure 6. Furthermore, when training and inference are performed on different datasets, the model trained with 15,000-stride windows achieves superior performance compared to the 10,000- and 5000-stride configurations, while the 5000-stride model consistently shows the lowest accuracy and the highest loss.

The accuracy of the 15,000-stride model nevertheless decreases as the test stride shrinks (from 77.39% to 71.84%). A plausible explanation is that wider strides exclude samples that partially overlap with apnea events. Consequently, when the 10,000- and 15,000-stride models are tested on the 5000-stride dataset, they encounter sample variations, specifically those partial overlaps, that were absent during their training phase.

Once we identified an optimal stride size, we aimed to evaluate the impact of imperfect detection of apnea intervals from the ECG using our analytical method. In practice, this imperfection introduces both mislabeling errors and time displacements, as detected apnea intervals do not perfectly overlap with the gold-standard annotations.

To simulate this effect, we applied a random time-shift to each apnea interval in the gold-standard annotations. Specifically, for each interval, a random offset $x \in [0, \Delta\mu]$ was added, where $\Delta\mu$ represents the maximum allowed displacement. We then computed the mislabeling rate using the UDF under these time-shifted conditions. This procedure was repeated for different values of $\Delta\mu$ to study the relationship between displacement error and mislabeling rate.

Finally, to assess the impact of such displacement errors on model training, we trained the network multiple times using datasets corresponding to different mislabeling rates induced by these random offsets. This approach allowed us to quantify how sensitivity to interval misalignment affects training performance.
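The random time-shift procedure can be illustrated with a small simulation. This is a simplified sketch, not the authors' pipeline: windows are labeled with a plain overlap rule as a stand-in for the UDF-based labeling, and the recording length, stride, and 30 s event durations are hypothetical values.

```python
import random


def overlap_labels(starts, length, intervals):
    """Label a window 1 if it overlaps any apnea interval [a, b], else 0."""
    return [int(any(a < s + length and b >= s for a, b in intervals)) for s in starts]


def shift_intervals(intervals, max_shift, rng):
    """Add an independent random offset in [0, max_shift] samples to each interval."""
    shifted = []
    for a, b in intervals:
        x = rng.randint(0, max_shift)
        shifted.append((a + x, b + x))
    return shifted


def mislabeling_rate(starts, length, intervals, max_shift, seed=0):
    """Fraction of windows whose label changes after the random shift."""
    rng = random.Random(seed)
    clean = overlap_labels(starts, length, intervals)
    noisy = overlap_labels(starts, length, shift_intervals(intervals, max_shift, rng))
    return sum(c != n for c, n in zip(clean, noisy)) / len(clean)


# Hypothetical recording: 720 s at 100 Hz, 30 s windows, 5 s hop, four 30 s events.
fs, length, stride = 100, 3000, 500
starts = list(range(0, 72_000 - length + 1, stride))
intervals = [(t * fs, (t + 30) * fs) for t in (60, 120, 180, 240)]

for max_shift_s in (0, 5, 10, 20):
    rate = mislabeling_rate(starts, length, intervals, max_shift_s * fs)
    print(max_shift_s, f"{rate:.3f}")
```

Repeating the call with different seeds gives the spread of the mislabeling rate for a given maximum displacement.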

Figure 7 shows the training accuracy and loss for different mislabeling percentages of the time windows, using a stride of 10 s. As explained above, the mislabeling is generated by injecting a random time displacement in $[0, \Delta\mu]$ into the apnea intervals; in practice, this displacement is introduced by the detection error of the automatic labeling.

Figure 8 shows how the test accuracy decreases as the mislabeling rate increases. The mean and standard deviation are computed over 10 runs for each mislabeling rate.

Finally, Figure 9a shows the time required to preprocess patient files into windowed samples and calculate labels via UDF integrals, plotted against file size. Complementing this, Figure 9b depicts how training duration scales with the size of the training dataset.

8. Discussion and SOTA Comparison

The combination of the presented methods allows for the development of a weakly-supervised learning framework that employs a two-stage cascaded architecture, where ECG-based apnea detection serves as pseudo-ground truth for training a deep learning model on BCG data. The experimental results are here used to evaluate the end-to-end accuracy estimation under imperfect labeling supported by such a framework.

We clarify that the dataset used for the experimental evaluation is based on controlled breath-holding protocols, rather than full-night PSG-confirmed OSA cohorts, which limits the generalization of the quantitative results to clinical settings.

The ECG-based DHR + minPeaks method achieved an F1 score of 0.821 (precision = 0.842, recall = 0.800) when evaluated against clinical ground truth annotations.

Assuming an event prevalence of $p = 0.10$, and denoting precision by $P$ and recall by $R$, the expected confusion matrix components (as fractions of all windows) can be derived as follows:

$$\begin{aligned} TP &= R \cdot p = 0.8 \times 0.1 = 0.08, \\ FN &= p - TP = 0.1 - 0.08 = 0.02, \\ FP &= TP \times \frac{1 - P}{P} = 0.08 \times \frac{0.16}{0.84} = 0.0152, \\ TN &= 1 - (TP + FN + FP) = 1 - (0.08 + 0.02 + 0.0152) = 0.8848. \end{aligned}$$

This corresponds to an estimated true ECG accuracy of

$$\mathrm{Acc}_{ECG} = TP + TN = 0.08 + 0.8848 = 0.9648 \quad (\approx 96\%).$$
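The derivation above can be reproduced numerically. The following sketch uses the same inputs as the text (precision 0.84, recall 0.80, assumed prevalence 0.10); the function name is ours, and the specificity line is an additional quantity computed from the same fractions rather than a value reported by the authors.

```python
def confusion_from_pr(precision: float, recall: float, prevalence: float) -> dict[str, float]:
    """Expected confusion-matrix fractions from precision, recall and prevalence."""
    tp = recall * prevalence
    fn = prevalence - tp
    fp = tp * (1 - precision) / precision
    tn = 1 - (tp + fn + fp)
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


cm = confusion_from_pr(precision=0.84, recall=0.80, prevalence=0.10)
accuracy = cm["TP"] + cm["TN"]
specificity = cm["TN"] / (cm["TN"] + cm["FP"])

print({k: round(v, 4) for k, v in cm.items()})
print(round(accuracy, 4), round(specificity, 4))
```

The result depends on the assumed prevalence: at a low prevalence the true negatives dominate the accuracy.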

A secondary neural classifier, trained on the synchronized BCG signal using the detections from the first model as training labels, achieved an apparent accuracy of 79.85% with respect to these pseudo-ground-truth annotations. Considering that the reference classifier is not perfectly accurate, the propagated true accuracy of the neural classifier can be estimated as:

$$\mathrm{Acc}_{BCG}^{true} \approx P(B = A) \times \mathrm{Acc}_{ECG} + [1 - P(B = A)] \times (1 - \mathrm{Acc}_{ECG}),$$

where $P(B = A) = 0.7985$ represents the probability that the output $B$ of the BCG classifier agrees with the ECG-derived label $A$.

Substituting the values gives:

$$\mathrm{Acc}_{BCG}^{true} = 0.7985 \times 0.9648 + 0.2015 \times (1 - 0.9648) = 0.777,$$

that is, an estimated true accuracy of approximately 77.7%.
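The propagation formula is easy to verify numerically. The sketch below simply evaluates the expression from the text; it assumes binary labels and that agreement between the two classifiers is independent of whether the ECG label is correct, which is the simplification implicit in the formula. The function and argument names are ours.

```python
def propagated_accuracy(agreement: float, teacher_accuracy: float) -> float:
    """Student is right when it agrees with a correct teacher or disagrees with a wrong one."""
    return agreement * teacher_accuracy + (1 - agreement) * (1 - teacher_accuracy)


acc_true = propagated_accuracy(agreement=0.7985, teacher_accuracy=0.9648)
print(round(acc_true, 3))
```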

This analysis illustrates that, although the neural model shows a relatively high apparent accuracy when compared to the reference labels, its true performance with respect to real apnea events is obviously lower. The recorded performance primarily characterizes consistency with the training annotations rather than a genuine improvement in event detection accuracy. To avoid ambiguity, Table 6 provides a concise nomenclature of all accuracy metrics reported in the manuscript.

Table 7 compares our end-to-end accuracy with values reported in previous works [8,18], including the 98% reported in [18]. The comparison must account for substantial differences in experimental design and signal processing, which contrast idealized classification tasks with our continuous monitoring framework.

As shown in Table 7, direct comparison of accuracy values requires considering the evaluation protocol. State-of-the-art methods like [18] report 98% accuracy on pre-segmented event datasets, which effectively removes the challenge of detecting event boundaries and transitional artifacts. This figure is therefore not directly comparable with the continuous sliding-window evaluation adopted in this work.

In contrast, our framework operates in a continuous sliding-window regime essential for real-time alerts. Our ECG-based “Teacher” achieves 96% ($\mathrm{Acc}_{ECG}$) consistency with PSG, with true negatives accounting for about 88% of all windows (a specificity of about 98% under the assumed 10% prevalence), supporting the reliability of our training labels. However, when moving to the fully automated BCG “Student” model, the estimated true accuracy ($\mathrm{Acc}_{BCG}^{true}$) is 77.7%. While numerically lower than [18], this aligns with or exceeds other continuous monitoring baselines (e.g., [8] reported ≈50% accuracy for minute-by-minute bed sensor detection), reflecting the realistic trade-off between automation and precision in home settings.

9. Conclusions

This work demonstrates the feasibility of a centralized pipeline that leverages ECG-derived, automatically detected apnea events to pseudo-label multi-channel BCG and train a deep 1D-CNN for home-oriented monitoring, thereby bridging high-end clinical instrumentation and low-cost, unobtrusive devices. The approach integrates a principled UDF-based labeling strategy aligned with temporal criteria, multi-parameter ECG analytics (DHR, LF/HF, EDR) for robust event detection, and stride-aware BCG windowed classification over 12 channels sampled at 100 Hz with 30 s windows.

Methodologically, the study introduces an end-to-end pipeline that can scale beyond manual annotation by aligning PSG-supervised ECG events with synchronized low-cost BCG and converting them into probabilistic labels via a UDF integral consistent with apnea duration semantics. The analysis highlights the importance of temporal alignment and stride selection for balancing sample diversity and convergence, and shows that multi-parameter ECG consensus (notably, sustained LF/HF and peak-coherent DHR) improves robustness over single-indicator detection.

Propagating reference-label imperfection yields an estimated true end-to-end accuracy of approximately 77.7%, clarifying that apparent gains primarily reflect consistency with pseudo-labels. The framework could be used for unobtrusive, continuous monitoring of the user’s well-being at home, complementing clinical information.

The principal limitations are the small-scale optimization on a limited number of subjects with breath-holding protocols rather than full-night PSG-confirmed OSA cohorts and a large model footprint that currently constrains embedded deployment.

Future work will prioritize validation on larger, clinically representative datasets, expansion to PPG-based surrogates suitable for real-world acquisition, and model compression via distillation and pruning to enable efficient embedded inference. In parallel, evaluating federated learning against the centralized baseline and refining UDF timing to better account for systematic displacement can strengthen privacy-preserving, continuous home monitoring. Additionally, we will investigate hybrid time-frequency representations, such as spectrogram-based inputs, to potentially enhance performance by combining explicit frequency-domain information with our current time-domain approach.

Author Contributions

Conceptualization, S.V. and P.E.; methodology, S.V. and P.E.; software, P.F.; validation, D.D.S., P.E. and P.F.; formal analysis, D.D.S. and P.F.; investigation, P.E. and D.D.S.; resources, S.V. and P.E.; data curation, P.E.; writing–original draft preparation, D.D.S. and P.E.; writing—review and editing, S.V.; visualization, P.E.; and supervision, S.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Italian Ministry of Research and by the European Union under National Recovery and Resilience Plan, grant number P2022MWE3S (REDRAW Prin 2022 PNRR, DR n. 1409 of 14-09-2022).

Institutional Review Board Statement

This study uses publicly available anonymized biometric datasets. No new data were collected from human participants. Therefore, ethical approval was not required.

Informed Consent Statement

Not applicable.

Data Availability Statement

The archived dataset analyzed during the study is publicly available at https://data.mendeley.com/datasets/9fmfn6kfn7/1 (accessed on 25 March 2026).

Acknowledgments

The authors thank the scientists who created and made available the public dataset used in this work. Moreover, they thank the OpenAI ChatGPT free service based on GPT-5.3, https://chatgpt.com/ (accessed on 5 February 2026), which was used to improve the English of the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
AASM	American Academy of Sleep Medicine
BCG	Ballistocardiography
PSG	Polysomnography
ECG	Electrocardiography
EOG	Electrooculography
EMG	Electromyogram
OSA	Obstructive Sleep Apnea
DHR	Heart Rate Variation
PPG	Photoplethysmography
FC	Fully Connected
LF	Low Frequency
HF	High Frequency
EDR	ECG-Derived Respiration
UDF	User-Defined Function
LSTM	Long Short-Term Memory
SDNN	Standard Deviation of RR intervals
BN	Batch Normalization
CNN	Convolutional Neural Network
IR-UWB	Impulse Radio Ultra Wideband

References

1. Cimr, D.; Studnicka, F.; Fujita, H.; Tomaskova, H.; Cimler, R.; Kuhnova, J.; Slegr, J. Computer aided detection of breathing disorder from ballistocardiography signal using convolutional neural network. Inf. Sci. 2020, 541, 207–217.
2. Ramos, A.R.; Wheaton, A.G.; Johnson, D.A. Sleep Deprivation, Sleep Disorders, and Chronic Disease. Prev. Chronic Dis. 2023, 20, E77.
3. Yin, J.; Xu, J.; Ren, T.L. Recent Progress in Long-Term Sleep Monitoring Technology. Biosensors 2023, 13, 395.
4. Markun, L.C.; Sampat, A. Clinician-Focused Overview and Developments in Polysomnography. Curr. Sleep Med. Rep. 2020, 6, 309–321.
5. Tal, A.; Shinar, Z.; Shaki, D.; Codish, S.; Goldbart, A. Validation of Contact-Free Sleep Monitoring Device with Comparison to Polysomnography. J. Clin. Sleep Med. 2017, 13, 517–522.
6. Nam, Y.; Kim, Y.; Lee, J. Sleep Monitoring Based on a Tri-Axial Accelerometer and a Pressure Sensor. Sensors 2016, 16, 750.
7. Pallesen, S.; Grønli, J.; Myhre, K.; Moen, F.; Bjorvatn, B.; Hanssen, I.; Heglum, H.S.A. A Pilot Study of Impulse Radio Ultra Wideband Radar Technology as a New Tool for Sleep Assessment. J. Clin. Sleep Med. 2018, 14, 1249–1254.
8. Alqaraawi, A.; Alwosheel, A.; Alasaad, A. A new approach for detecting sleep apnea using a contactless bed sensor: Comparison study. J. Med. Internet Res. 2020, 22, e18297.
9. Sadek, I.; Biswas, J.; Abdulrazak, B. Ballistocardiogram signal processing: A review. Health Inf. Sci. Syst. 2019, 7, 10.
10. Pino, E.J.; Chávez, J.A.P.; Aqueveque, P. Noninvasive ambulatory measurement system of cardiac activity. In Proceedings of the 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 25–29 August 2015; pp. 7622–7625.
11. Morokuma, S.; Saitoh, T.; Kanegae, M.; Motomura, N.; Ikeda, S.; Niizeki, K. Prediction of ECG signals from ballistocardiography using deep learning for the unconstrained measurement of heartbeat intervals. Sci. Rep. 2025, 15, 999.
12. Carlson, C.; Suliman, A.; Prakash, P.; Thompson, D.; Wang, S.; Natarajan, B.; Warren, S. Bed-based instrumentation for unobtrusive sleep quality assessment in severely disabled autistic children. In Proceedings of the 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, FL, USA, 16–20 August 2016; pp. 4909–4912.
13. Feng, S.; Wu, X.; Bao, A.; Lin, G.; Sun, P.; Cen, H.; Chen, S.; Liu, Y.; He, W.; Pang, Z.; et al. Machine learning-aided detection of heart failure (LVEF ≤ 49%) by using ballistocardiography and respiratory effort signals. Front. Physiol. 2023, 13, 1068824.
14. Pino, E.J.; Chávez, J.A.P.; Aqueveque, P. BCG algorithm for unobtrusive heart rate monitoring. In Proceedings of the 2017 IEEE Healthcare Innovations and Point of Care Technologies (HI-POCT), Bethesda, MD, USA, 6–8 November 2017; pp. 180–183.
15. Tramontano, A.; Tamburis, O.; Cioce, S.; Venticinque, S.; Magliulo, M. Heart rate estimation from ballistocardiogram signals processing via low-cost telemedicine architectures: A comparative performance evaluation. Front. Digit. Health 2023, 5, 1222898.
16. Fusco, P.; Errico, P.; Venticinque, S. Federated Learning Algorithm for Identification of Apnea Sleeping Disorder. Lect. Notes Data Eng. Commun. Technol. 2025, 250, 253–261.
17. Osman, A.M.; Carter, S.G.; Carberry, J.C.; Eckert, D.J. Obstructive sleep apnea: Current perspectives. Nat. Sci. Sleep 2018, 10, 21–34.
18. Cimr, D.; Studnička, F. Automatic detection of breathing disorder from ballistocardiography signals. Knowl.-Based Syst. 2020, 188, 104973.
19. Berry, R.B.; Budhiraja, R.; Gottlieb, D.J.; Gozal, D.; Iber, C.; Kapur, V.K.; Marcus, C.L.; Mehra, R.; Parthasarathy, S.; Quan, S.F.; et al. Rules for Scoring Respiratory Events in Sleep: Update of the 2007 AASM Manual for the Scoring of Sleep and Associated Events. J. Clin. Sleep Med. 2012, 8, 597–619.
20. Attar, E.T. Detailed evaluation of sleep apnea using heart rate variability: A machine learning and statistical method using ECG data. Front. Neurol. 2025, 16, 1636983.
21. Narkiewicz, K.; Montano, N.; Cogliati, C.; van de Borne, P.J.H.; Dyken, M.E.; Somers, V.K. Altered Cardiovascular Variability in Obstructive Sleep Apnea. Circulation 1998, 98, 1071–1077.

Figure 1. Reference scenario: data collected by high-end medical devices are used to train an AI model for low-end devices, enabling continuous monitoring at the patient’s home.

Figure 2. Labeling procedure overview. (Top Panel): Example of a 12-channel BCG dataset after alignment, shown from one of the twelve recordings. (Bottom Panel): User-defined function (UDF) implemented to annotate the BCG data.

Figure 3. Zoomed-in view of the labeling process, showing the UDF function profile and how the stride is applied to create sample windows.

Figure 4. ECG-derived autonomic and respiratory surrogates (DHR, LF/HF, and EDR) used as a supervisory reference. Yellow areas correspond to apnea intervals.

Figure 5. CNN model architecture.

Figure 6. Training accuracy and loss for different sliding window strides of BCG signals. (a) Train accuracy. (b) Train loss.

Figure 7. Train accuracy and train loss for different mislabeling percentages of BCG signals. (a–c) Train accuracy. (d–f) Train loss.

Figure 8. Test accuracy and test loss against different mislabeling percentages. (a) Test accuracy. (b) Test loss. The shaded area corresponds to an interval of ±σ.

Figure 9. Preprocessing time and training time. (a) Preprocessing time. (b) Training time.

Table 1. Schedule of monitored events.

V1 Schedule		V2 Schedule
Time (s)	Event Type	Time (s)	Event Type
0	start of measuring on back	0	start of measuring on back
60	breath-holding/inhalation	60	breath-holding/inhalation
120	breath-holding/inhalation	150	breath-holding/exhalation
180	breath-holding/exhalation	240	breath-holding/inhalation
240	breath-holding/exhalation	330	breath-holding/exhalation
300	position change	420	end of measuring
420	turning on the side	–	–
480	breath-holding/inhalation	–	–
540	breath-holding/inhalation	–	–
600	breath-holding/exhalation	–	–
660	breath-holding/exhalation	–	–
720	end of measuring	–	–

Table 2. Summary of ECG-derived metrics during apnea events.

Metric	What It Measures	Expected Behavior During Apnea
DHR	Beat-to-beat change in heart rate estimated from RR intervals.	Decreases during the apneic pause; rebounds during recovery.
LF/HF	Autonomic balance between sympathetic and parasympathetic modulation in HRV.	Increases, indicating sympathetic predominance during recovery.
EDR	Respiratory surrogate obtained from the ECG.	Strongly reduced or nearly absent during apnea; resumes afterward.

Table 3. Event-based performance comparison of ECG-derived parameters for apnea detection.

Parameter	Method	Peaks	Window	Precision	Recall	F1	Disp.
LF/HF	Sustained	—	10 s	0.86	0.60	0.71	−21 s
DHR	minPeaks	3	30 s	0.84	0.80	0.82	−17 s (σ = 8 s)
EDR	minPeaks	3	40 s	0.68	0.65	0.67	−8 s

Table 4. BCG signal accuracy for test step.

		Test stride (samples)
		15,000	10,000	5000
Train stride (samples)	15,000	77.39%	74.13%	71.84%
	10,000	71.94%	72.10%	71.43%
	5000	59.02%	60.12%	60.42%

Table 5. BCG signal loss for test step.

		Test stride (samples)
		15,000	10,000	5000
Train stride (samples)	15,000	0.79	1.10	1.20
	10,000	1.37	1.26	1.39
	5000	2.00	1.95	1.94

Table 6. Summary of accuracy metrics used in this work.

Name/Symbol	Value	Description
$\mathrm{Acc}_{ECG}$: ECG-based detection accuracy	≈96%	Accuracy of the ECG-derived multi-parameter method (DHR minPeaks) against the clinical PSG ground truth. It validates the reliability of the pseudo-labels generated for BCG training.
$\mathrm{Acc}_{BCG}^{pseudo}$: Apparent accuracy (BCG-CNN vs. pseudo-labels)	≈71.8%	Accuracy of the trained 1D-CNN measured against the ECG-derived pseudo-labels on the test set. It reflects how well the model reproduces the ECG teacher’s labeling decisions, not how well it detects true apnea events.
$\mathrm{Acc}_{BCG}^{true}$: Estimated true accuracy (BCG-CNN vs. ground truth)	≈77.7%	Estimated accuracy of the 1D-CNN against the clinical ground truth, derived analytically via label-error propagation. This is the most meaningful end-to-end performance metric.
$\mathrm{Acc}_{S}$: State-of-the-art accuracy [18] (Cimr et al.)	≈98%	Accuracy reported in the literature under an idealized, pre-segmented event-based protocol. Not directly comparable with the metrics of this work, which adopts a continuous sliding-window evaluation.

Table 7. Comparison of apnea detection performance: idealized vs. continuous monitoring frameworks.

Method	Data Source	Evaluation Protocol	Accuracy	Notes
Cimr et al. [18] (SOTA)	BCG (Cartan features)	Event-based/pre-segmented: classification of discrete, pre-selected event windows.	98.0% ($\mathrm{Acc}_{S}$)	Idealized benchmark: does not account for continuous overlap or transition artifacts.
Proposed (Gold-standard)	ECG (DHR + minPeaks)	Continuous/sliding window: comparison vs. clinical PSG (gold standard).	96.0% ($\mathrm{Acc}_{ECG}$)	Labeling validation: validates the ECG-derived “Pseudo-GT” against expert PSG annotations.
Proposed (End-to-end)	BCG (raw signal)	Continuous/sliding window: end-to-end inference on continuous streams.	≈77.7% ($\mathrm{Acc}_{BCG}^{true}$)	Realistic field performance: accounts for labeling noise, artifacts, and continuous time overlap.
Alqaraawi et al. [8] (Baseline)	Bed sensor (MFOS)	Continuous/epoch-based: minute-by-minute evaluation against PSG.	≈50.0% ($\mathrm{Acc}_{S}$)	Baseline: highlights the difficulty of continuous home monitoring compared to event classification.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
