1. Introduction
Obstructive sleep apnea (OSA) is a prevalent sleep-related breathing disorder characterized by the repetitive partial or complete collapse of the upper airway during sleep. Clinical guidelines define an apnea event as a breathing pause lasting at least 10 s. OSA remains a significant global health challenge, affecting nearly 1 billion adults aged 30–69 years worldwide [1,2]. The American Heart Association (AHA) has identified OSA as an independent risk factor for several cardiovascular comorbidities, including resistant hypertension, atrial fibrillation, and heart failure [3]. Recent evidence from large-scale meta-analyses indicates that untreated OSA significantly elevates the risk of all-cause mortality and is specifically linked to an increased risk of sudden cardiac death, with a pooled odds ratio as high as 3.87 in untreated patients [4]. In addition, the condition frequently induces chronic fatigue, excessive daytime sleepiness, and cognitive impairment, which markedly degrade patients’ quality of life [5].
Currently, overnight in-laboratory multichannel polysomnography (PSG) is recognized as the “gold standard” for the definitive diagnosis and severity grading of OSA [5]. However, the multiple tubes and chest/abdominal straps used in PSG place a significant burden on subjects, and the procedure is costly, making widespread PSG screening difficult. These limitations contribute to a high global rate of underdiagnosis, with estimates of undetected cases exceeding 80% in certain populations, and render PSG impractical for large-scale population screening or longitudinal home monitoring [3,6]. To address these issues, various portable monitoring solutions have been developed. For instance, dedicated wearable devices have been designed to capture respiratory pressure signals using a pressure sensor, from which essential indicators such as respiratory rate, inspiratory and expiratory time, and the apnea–hypopnea index (AHI) can be calculated in real time. However, these devices still require physical contact with the subject, which may cause discomfort and introduce signal artifacts related to sleeping posture or mattress compression [7]. Furthermore, while commercially available wearable digital devices are increasingly used in population-based research owing to their low cost and ubiquity, issues concerning data privacy, limited data access, and low user adherence continue to hinder their effectiveness for reliable, long-term data acquisition [8]. Therefore, a non-invasive, comfortable, and efficient alternative to PSG is needed in both medical and home settings to improve cardiovascular risk stratification and increase early diagnosis rates [3,6].
In recent years, millimeter-wave (mmWave) radar technology has emerged as a promising solution for non-invasive physiological monitoring. Unlike optical sensors, which raise significant privacy concerns [9], or contact sensors, which require physical attachment, mmWave radar can detect sub-millimeter chest displacements caused by cardiorespiratory activity through clothing and bedding without disrupting the user’s natural sleep environment [10,11].
Motivated by these advantages, researchers have actively explored mmWave radar for detecting sleep-related breathing disorders (SRBDs). For instance, Jung et al. [12] proposed a method to extract respiratory parameters from FMCW radar signals, achieving high efficacy in monitoring sleep apnea events. However, their work primarily focused on event detection and did not quantitatively assess the temporal accuracy of individual events. Conversely, data-driven approaches, such as the hybrid CNN–Transformer architecture introduced by Choi et al. [13], have achieved significant success in diagnosing OSA. Nevertheless, while these deep learning models offer powerful feature representation capabilities, they typically require large-scale training datasets and substantial computational resources. From a methodological perspective, the proposed Sticky-HMM offers an unsupervised and lightweight alternative; however, its empirical performance relative to state-of-the-art deep learning architectures remains to be quantified in future head-to-head studies.
To achieve high-precision apnea detection while maintaining a lightweight computational framework, this article proposes an Adaptive Sticky Hidden Markov Model (HMM) framework. This approach integrates multi-dimensional characteristics of distinct signal states and introduces a dynamic mechanism to adaptively calibrate the HMM transition probabilities. By fusing these physiological features with adaptive probabilistic modeling, the proposed method significantly enhances the robustness and accuracy of respiratory state inference. Our main contributions are summarized as follows:
We propose a complete apnea detection framework based on the Adaptive Sticky Hidden Markov Model (HMM), which encompasses the entire signal chain, from raw data acquisition and signal processing to multi-dimensional feature extraction and final event inference.
We construct a high-fidelity mmWave radar dataset involving 15 subjects, strictly synchronized with a respiratory belt as the ground truth (GT). The dataset covers diverse experimental scenarios, including three standoff distances (1 m, 1.5 m, 2 m) and two apnea durations (10 s and 15 s), providing a valuable resource for future research in non-contact physiological monitoring.
Experimental results on our dataset demonstrate that our method outperforms the baseline methods and exhibits high consistency with the ground truth.
2. Methods
2.1. Signal Preprocessing
As illustrated in Stage 1 of Figure 1, the received raw radar ADC data are reorganized into a fast-time × slow-time matrix. We perform a Range-FFT on each chirp to generate the Range Profile matrix $R(k, m)$, where $k$ is the range bin index and $m$ is the slow-time frame index.
To robustly lock onto the chest wall and mitigate the effect of body shifts, we employ a smoothing tracker. The target range bin $k^{*}_{m}$ is updated dynamically by a recursive smoothing rule governed by the smoothing factor $\alpha$ and driven by $|R(k, m)|$, which denotes the magnitude at range index $k$ of the $m$-th frame after Range-FFT processing.
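A minimal sketch of such a tracker follows. It assumes one common form, an exponential moving average of the per-frame peak bin $\arg\max_k |R(k, m)|$ with `alpha` acting as the smoothing factor; this may differ from the exact update rule used here, and `track_range_bin` and its default `alpha` are hypothetical.

```python
import numpy as np


def track_range_bin(range_profile: np.ndarray, alpha: float = 0.9) -> np.ndarray:
    """Track the chest-wall range bin across slow-time frames.

    ASSUMPTION: exponential smoothing of the per-frame peak bin; this is an
    illustrative tracker, not necessarily the paper's exact update rule.

    range_profile: complex Range-FFT output R(k, m), shape (num_bins, num_frames).
    alpha: hypothetical smoothing factor in [0, 1); larger values react more slowly.
    Returns one integer range-bin index per frame.
    """
    magnitude = np.abs(range_profile)          # |R(k, m)|
    peak_bins = np.argmax(magnitude, axis=0)   # strongest reflector in each frame
    smoothed = float(peak_bins[0])             # initialise on the first frame's peak
    tracked = np.empty(magnitude.shape[1], dtype=int)
    for m, peak in enumerate(peak_bins):
        # Blend the previous estimate with the current peak to damp sudden jumps
        smoothed = alpha * smoothed + (1.0 - alpha) * peak
        tracked[m] = int(round(smoothed))
    return tracked
```

With `alpha = 0.9`, a single-frame spike at a distant bin moves the estimate by only one tenth of the jump, so a brief body movement does not pull the tracker off the chest wall.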
After the target range bin is identified, the phase is extracted from the complex signal using the arctangent function [14]. However, the raw phase is restricted to the principal range $(-\pi, \pi]$, which produces artificial sawtooth-like discontinuities whenever the chest-wall displacement exceeds half the radar wavelength (i.e., whenever the round-trip phase change exceeds $2\pi$). To reconstruct the actual continuous movement of the chest, a phase unwrapping algorithm is applied. This process detects phase jumps between consecutive samples and compensates for them by adding or subtracting integer multiples of $2\pi$, thereby restoring a smooth and continuous respiratory trajectory.
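The phase extraction and unwrapping steps can be sketched with NumPy: `np.angle` computes the four-quadrant arctangent, and `np.unwrap` adds or subtracts multiples of $2\pi$ wherever consecutive samples jump by more than $\pi$. The conversion to displacement assumes the standard FMCW round-trip relation $\phi = 4\pi d / \lambda$; the function name and its arguments are hypothetical.

```python
import numpy as np


def extract_displacement(range_profile, target_bins, wavelength_m):
    """Return (unwrapped phase in rad, chest displacement in m) at the tracked bin.

    range_profile: complex Range-FFT output R(k, m), shape (num_bins, num_frames).
    target_bins: tracked range-bin index for every frame, shape (num_frames,).
    wavelength_m: radar wavelength, assumed known from the carrier frequency.
    """
    frames = np.arange(range_profile.shape[1])
    samples = range_profile[np.asarray(target_bins), frames]  # one complex sample per frame
    wrapped = np.angle(samples)       # four-quadrant arctangent, confined to [-pi, pi]
    unwrapped = np.unwrap(wrapped)    # adds/subtracts 2*pi wherever a jump exceeds pi
    # ASSUMPTION: FMCW round-trip model, phase = 4*pi*d / wavelength
    displacement = unwrapped * wavelength_m / (4.0 * np.pi)
    return unwrapped, displacement
```

For example, at an assumed 5 mm wavelength (roughly 60 GHz), a 3 mm chest excursion corresponds to about 7.5 rad of phase, so the wrapped phase would fold several times without unwrapping.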
Finally, a 4th-order Butterworth bandpass filter with cutoff frequencies of 0.05 Hz and 0.8 Hz is applied to isolate the respiratory component, effectively suppressing cardiac signals (>0.8 Hz) and low-frequency drift (<0.05 Hz).
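A sketch of this filtering step with SciPy is shown below. Zero-phase (forward-backward) filtering and second-order sections are assumptions chosen so that apnea boundaries are not shifted in time; note also that SciPy's `butter` doubles the requested order for band-pass designs, so how "4th-order" maps onto its `N` argument is itself an assumption. The 20 Hz slow-time rate used in the test is hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def respiratory_bandpass(phase, fs_hz, low_hz=0.05, high_hz=0.8, order=4):
    """Band-pass the unwrapped phase to the 0.05-0.8 Hz respiratory band.

    ASSUMPTIONS: zero-phase forward-backward filtering and second-order
    sections for numerical stability; `order` is passed straight to SciPy's
    `butter`, which doubles it for band-pass designs.
    """
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs_hz, output="sos")
    return sosfiltfilt(sos, np.asarray(phase, dtype=float))
```

A 0.25 Hz breathing tone passes essentially unchanged, while a 1.2 Hz cardiac-like tone is attenuated well below a tenth of its amplitude.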
2.2. Multi-Dimensional Feature Fusion and Normalization
To capture the intricate dynamics of respiratory cessation, a three-dimensional feature vector $\mathbf{x}_m = [E_m, V_m, C_m]^{\top}$ is constructed for each slow-time sample $m$, corresponding to the feature extraction process illustrated in Stage 2 of Figure 1:
Phase Envelope ($E_m$): The Hilbert transform $\mathcal{H}\{\cdot\}$ is used to obtain the analytic signal of the phase $\phi$, and its magnitude is smoothed to reflect the overall respiratory energy:
$$E_m = \mathcal{M}_w\left\{\left|\phi_m + j\,\mathcal{H}\{\phi\}_m\right|\right\}$$
Velocity ($V_m$): The first derivative of the phase, $\dot{\phi}$, represents the rate of chest-wall movement.
Curvature ($C_m$): The second derivative of the phase, $\ddot{\phi}$, is highly sensitive to the sudden onset and offset of apnea events.
Here, $\mathcal{M}_w\{\cdot\}$ denotes a moving average filter with window size $w$.
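The three features can be sketched as follows. The envelope follows the description directly (smoothed magnitude of the analytic signal via `scipy.signal.hilbert`); for velocity and curvature, computing derivatives with `np.gradient` and summarizing them by the moving average of their magnitudes are assumptions, as are the helper names and the default window `w`.

```python
import numpy as np
from scipy.signal import hilbert


def moving_average(x, w):
    """Centred moving average M_w{.} with window size w (in samples)."""
    return np.convolve(x, np.ones(w) / w, mode="same")


def extract_features(phase, fs_hz, w=10):
    """Build the per-sample feature matrix [E_m, V_m, C_m], shape (num_samples, 3).

    ASSUMPTIONS: derivatives via np.gradient, and velocity/curvature are
    summarised by the moving average of their magnitudes.
    """
    phase = np.asarray(phase, dtype=float)
    envelope = moving_average(np.abs(hilbert(phase)), w)  # E_m: smoothed analytic-signal magnitude
    velocity = np.gradient(phase) * fs_hz                   # first derivative (per second)
    curvature = np.gradient(velocity) * fs_hz               # second derivative
    return np.column_stack([
        envelope,
        moving_average(np.abs(velocity), w),                # V_m
        moving_average(np.abs(curvature), w),               # C_m
    ])
```

During a breath-hold all three columns collapse toward zero, while normal breathing keeps them large, which is what makes the two HMM states separable.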
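To ensure distance-invariant performance, each feature dimension $d$ is normalized using the Median Absolute Deviation (MAD) instead of standard Z-score normalization:
$$\tilde{x}_{m,d} = \frac{x_{m,d} - \operatorname{med}(x_{\cdot,d})}{\operatorname{MAD}(x_{\cdot,d})}, \qquad \operatorname{MAD}(x_{\cdot,d}) = \operatorname{med}\left(\left|x_{\cdot,d} - \operatorname{med}(x_{\cdot,d})\right|\right)$$
This robust scaling suppresses the influence of outliers and maintains a stable feature distribution across varying Signal-to-Noise Ratio (SNR) environments.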
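Column-wise MAD scaling takes only a few lines of NumPy. In this sketch, the `eps` guard against constant features is an assumption, and the optional 1.4826 factor that some works apply (to make the MAD a consistent estimator of the Gaussian standard deviation) is deliberately omitted.

```python
import numpy as np


def mad_normalize(features, eps=1e-12):
    """Column-wise robust scaling: (x - median) / MAD.

    features: array of shape (num_samples, num_features).
    eps: small constant (an assumption) that avoids division by zero for
    constant features.
    """
    features = np.asarray(features, dtype=float)
    median = np.median(features, axis=0)
    mad = np.median(np.abs(features - median), axis=0)
    return (features - median) / (mad + eps)
```

For the column `[1, 2, 3, 4, 100]`, the median is 3 and the MAD is 1, so the outlier does not distort the scale of the other samples, and any positive rescaling or offset of the raw feature (for example, a weaker return at a longer distance) yields the same normalized values.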
2.3. Adaptive Sticky-HMM-Based State Decoding
We model the respiratory process as a two-state Hidden Markov Model (HMM) with state space $\mathcal{S} = \{1, 2\}$, where states 1 and 2 represent apnea and normal breathing, respectively.
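To eliminate the fragmentation effect caused by transient noise, we incorporate a “Sticky” parameter [15] into the transition matrix $\mathbf{A} = [a_{ij}]$, with state 1 denoting apnea and state 2 denoting normal breathing:
$$\mathbf{A} = \begin{bmatrix} a_{11} & 1 - a_{11} \\ 1 - a_{22} & a_{22} \end{bmatrix}$$
The self-transition probabilities $a_{11}$ and $a_{22}$ are derived from the expected physiological duration $D_i$ of each state $i$.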
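Under a geometric dwell-time model, a state with self-transition probability $p$ lasts on average $1/(1-p)$ samples, so one natural way (assumed here) to map an expected duration of $D$ samples to a self-transition probability is $p = 1 - 1/D$. The sketch converts durations in seconds with a hypothetical slow-time rate `fs_hz`; index 0 is apnea and index 1 is normal breathing.

```python
import numpy as np


def sticky_transition_matrix(apnea_s, normal_s, fs_hz):
    """Two-state transition matrix; row = current state, column = next state.

    State 0 = apnea, state 1 = normal breathing.
    ASSUMPTION: geometric dwell times, so E[duration] = 1 / (1 - p_stay) samples.
    """
    durations = np.array([apnea_s, normal_s], dtype=float) * fs_hz  # in samples
    if np.any(durations <= 1.0):
        raise ValueError("expected durations must exceed one sample")
    p_stay = 1.0 - 1.0 / durations
    return np.array([
        [p_stay[0], 1.0 - p_stay[0]],
        [1.0 - p_stay[1], p_stay[1]],
    ])
```

With `fs_hz = 20` and a 10 s apnea prior, the apnea self-transition probability is 0.995: leaving the apnea state on any single sample is a priori unlikely, which is the stickiness that suppresses one-sample state flickers.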
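This formulation imposes a temporal penalty on state switching, effectively forcing the model to maintain state continuity unless significant feature evidence suggests otherwise. The observation features are assumed to follow a diagonal Gaussian distribution for each state $j$:
$$b_j(\mathbf{x}) = \mathcal{N}\left(\mathbf{x};\, \boldsymbol{\mu}_j,\, \operatorname{diag}\left(\boldsymbol{\sigma}_j^{2}\right)\right)$$
where the parameters $\boldsymbol{\mu}_j$ and $\boldsymbol{\sigma}_j^{2}$ are statistically estimated from initial state masks.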
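The emission model can be sketched as two steps: fit a per-state mean and variance from a boolean initial mask, then evaluate the per-sample diagonal-Gaussian log-density for each state. The function names and the variance floor are hypothetical; state 0 is apnea and state 1 is normal breathing.

```python
import numpy as np


def fit_diag_gaussians(features, apnea_mask, var_floor=1e-6):
    """Estimate per-state mean and variance from an initial state mask.

    features: (num_samples, num_features); apnea_mask: boolean (num_samples,).
    Returns [(mean, var) for apnea (state 0), (mean, var) for normal (state 1)].
    var_floor (an assumption) keeps variances strictly positive.
    """
    apnea_mask = np.asarray(apnea_mask, dtype=bool)
    params = []
    for state_mask in (apnea_mask, ~apnea_mask):
        x = features[state_mask]
        params.append((x.mean(axis=0), np.maximum(x.var(axis=0), var_floor)))
    return params


def log_emissions(features, params):
    """Log-density of each sample under each state's diagonal Gaussian, shape (T, S)."""
    out = np.empty((features.shape[0], len(params)))
    for j, (mean, var) in enumerate(params):
        # Diagonal covariance: the log-density is a sum over independent dimensions
        out[:, j] = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (features - mean) ** 2 / var, axis=1)
    return out
```

Working with log-densities here feeds directly into a log-domain Viterbi decoder without any exponentiation.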
To handle subject-specific variability and different apnea durations, an Adaptive Prior Strategy is implemented. The respiratory prior is dynamically updated based on the peak frequency $f_{\mathrm{peak}}$ of the power spectral density (PSD) $\hat{P}(f)$ estimated via Welch’s method:
$$f_{\mathrm{peak}} = \arg\max_{f} \hat{P}(f)$$
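Locating the respiratory peak with `scipy.signal.welch` can be sketched as below. Restricting the search to the 0.05–0.8 Hz respiratory band is an assumption, as are the function name and its defaults; `nperseg` controls the frequency resolution (`fs_hz / nperseg`).

```python
import numpy as np
from scipy.signal import welch


def respiratory_peak_frequency(phase, fs_hz, nperseg=None, band=(0.05, 0.8)):
    """Return the frequency (Hz) of the largest Welch-PSD peak inside `band`.

    ASSUMPTION: the search is restricted to the respiratory band used by the
    band-pass filter; `nperseg` sets the frequency resolution (fs_hz / nperseg).
    """
    freqs, psd = welch(np.asarray(phase, dtype=float), fs=fs_hz, nperseg=nperseg)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    if not np.any(in_band):
        raise ValueError("no PSD bins inside the requested band")
    return float(freqs[in_band][np.argmax(psd[in_band])])
```

The corresponding breathing period is `1 / f_peak`; how that period is mapped onto the duration prior is not detailed here, so any specific mapping would be an additional assumption.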
The proposed decoding process employs a two-pass refinement strategy. In the first pass, the sequence is decoded using a preliminary apnea prior $D_1^{(0)}$ (the expected apnea duration). To sharpen the temporal boundaries, a refinement stage then sets the apnea prior $D_1$ to the median duration of the apnea segments identified in the first pass. In the second pass, the transition matrix is rebuilt from the refined prior, and the respiratory states are decoded again to obtain the final estimate.
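The refinement between the two passes can be sketched as: extract the apnea runs from the first-pass state path, take their median length, and convert it to seconds as the prior for the second pass. State 0 denotes apnea here; the function names and the `fallback_s` argument are hypothetical.

```python
import numpy as np


def state_runs(states, target):
    """Return (start, end_exclusive) index pairs of consecutive runs equal to `target`."""
    is_target = np.concatenate(([0], (np.asarray(states) == target).astype(int), [0]))
    edges = np.flatnonzero(np.diff(is_target))   # alternating starts and ends
    return list(zip(edges[::2].tolist(), edges[1::2].tolist()))


def refine_apnea_prior(first_pass_states, fs_hz, fallback_s, apnea_state=0):
    """Median apnea-segment duration (seconds) from a first-pass decode.

    Falls back to `fallback_s` (e.g. the preliminary prior) if no apnea was found.
    """
    runs = state_runs(first_pass_states, apnea_state)
    if not runs:
        return float(fallback_s)
    lengths = np.array([end - start for start, end in runs], dtype=float)
    return float(np.median(lengths)) / fs_hz
```

In the second pass, the returned duration would replace the preliminary apnea prior when the transition matrix is rebuilt, after which Viterbi decoding is run again.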
The optimal state sequence, denoted as $\hat{s}_{1:T}$, is inferred via the Viterbi algorithm implemented in the logarithmic domain to ensure numerical stability and avoid underflow. The recursive update is defined as
$$\delta_t(j) = \max_{i} \left[\delta_{t-1}(i) + \log a_{ij}\right] + \log b_j(\mathbf{x}_t)$$
where $\delta_t(j)$ represents the highest log-probability of a state sequence ending at state $j$ at time $t$; $a_{ij}$ denotes the transition probability from state $i$ to state $j$; and $b_j(\mathbf{x}_t)$ is the emission probability of the feature vector $\mathbf{x}_t$ given state $j$.
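A compact log-domain Viterbi decoder implementing this recursion with back-pointers is sketched below. The function name and argument layout are hypothetical; `log_trans[i, j]` holds $\log a_{ij}$.

```python
import numpy as np


def viterbi_log(log_emis, log_trans, log_init):
    """Most likely state path computed entirely with log-probabilities.

    log_emis: (T, S) log emission probabilities.
    log_trans: (S, S) with log_trans[i, j] = log a_ij (from state i to state j).
    log_init: (S,) log initial-state probabilities.
    """
    T, S = log_emis.shape
    delta = np.empty((T, S))
    backptr = np.zeros((T, S), dtype=int)
    delta[0] = log_init + log_emis[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans        # scores[i, j]
        backptr[t] = np.argmax(scores, axis=0)             # best predecessor i for each j
        delta[t] = scores[backptr[t], np.arange(S)] + log_emis[t]
    # Backtrack from the best final state
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path
```

With sticky self-transitions (for example 0.95), a single ambiguous sample inside an otherwise consistent segment does not flip the decoded state, whereas with uniform transitions each sample is decided independently and the same sample produces a spurious switch.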
To ensure that the decoded sequences align with standardized clinical criteria, a post-processing refinement stage is subsequently applied. Specifically, a bridging heuristic merges apnea segments separated by gaps shorter than 2 s, and a pruning operation then eliminates spurious segments that fail to satisfy the minimum duration constraint $T_{\min}$. This dual-filtering approach guarantees that every reported event satisfies the minimum-duration requirement of the clinical apnea definition.
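The bridging and pruning steps can be sketched on a boolean apnea mask. The 2 s gap threshold comes from the text; setting the minimum duration to the 10 s clinical criterion is an assumption, and the function names are hypothetical.

```python
import numpy as np


def _apnea_runs(mask):
    """(start, end_exclusive) pairs of consecutive True runs in a boolean mask."""
    padded = np.concatenate(([0], mask.astype(int), [0]))
    edges = np.flatnonzero(np.diff(padded))
    return list(zip(edges[::2].tolist(), edges[1::2].tolist()))


def postprocess_apnea_mask(apnea_mask, fs_hz, max_gap_s=2.0, min_len_s=10.0):
    """Bridge short gaps between apnea segments, then prune short segments.

    max_gap_s follows the 2 s bridging rule; min_len_s = 10 s is an assumed
    value for the minimum-duration constraint.
    """
    mask = np.asarray(apnea_mask, dtype=bool).copy()
    runs = _apnea_runs(mask)
    # Bridging: fill gaps shorter than max_gap_s between consecutive apnea runs
    for (_, prev_end), (next_start, _) in zip(runs, runs[1:]):
        if next_start - prev_end < max_gap_s * fs_hz:
            mask[prev_end:next_start] = True
    # Pruning: discard apnea runs shorter than min_len_s
    for start, end in _apnea_runs(mask):
        if end - start < min_len_s * fs_hz:
            mask[start:end] = False
    return mask
```

Bridging runs before pruning so that an event split by a brief breathing-like artifact is first reassembled and then judged on its full length.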
4. Results
4.1. Apnea Event Detection Performance
To comprehensively evaluate the performance of our proposed method, we reproduced three baseline methods for comparison, namely the threshold-based method, the peak-based method, and the traditional Hidden Markov Model (HMM) method [19,20,21,22].
Figure 3 illustrates the performance of the detection methods compared against the ground truth (GT) obtained from the respiratory belt. Among all evaluated approaches, our proposed method achieves the most accurate event boundary estimation. Specifically, while the GT recorded two distinct apnea events at 21.25–32.20 s and 60.70–76.10 s, the proposed Adaptive Sticky-HMM identified them at 21.20–31.50 s and 61.35–76.10 s, respectively. Every detected boundary lies within 0.70 s of the corresponding GT boundary, supporting the temporal precision of our approach.
In contrast, both the peak-based and traditional HMM-based methods suffer from significant inaccuracies and temporal fragmentation. For the first apnea event, these two methods produced imprecise boundaries of 19.75–26.95 s and 21.15–29.30 s, respectively, both failing to capture the full duration. This instability became more pronounced during the second event, where both approaches erroneously split a single apnea event into fragments. Specifically, the peak-based method divided the event into 60.40–66.85 s and 70.90–80.05 s, while the traditional HMM produced segments at 61.35–65.80 s and 70.05–76.55 s. Although the threshold-based method correctly detected both events (22.40–32.05 s and 60.50–77.80 s), it exhibited non-negligible duration errors: its onset for the first event lagged the GT by 1.15 s, and its offset for the second event exceeded the GT by 1.70 s. The proposed Adaptive Sticky-HMM, by comparison, maintained closer temporal alignment in both events.
Table 2 summarizes the Mean Absolute Error (MAE) of the four evaluated methods across sensing distances ranging from 1 m to 2 m. Our proposed Adaptive Sticky-HMM method demonstrates superior precision and robustness across all test intervals. At the initial distance of 1 m, the Adaptive Sticky-HMM method achieves a minimal MAE of 0.77 s, which is significantly lower than the traditional HMM-based (1.29 s), threshold-based (1.53 s), and peak-based (1.80 s) methods. As the range extends to the intermediate distance of 1.5 m, the Adaptive Sticky-HMM method maintains high stability with an MAE of 1.03 s, whereas the errors for the baseline methods increase notably to 1.62 s, 2.24 s, and 2.53 s, respectively. Even at the maximum test distance of 2 m, the Adaptive Sticky-HMM preserves an MAE of 1.18 s, which remains lower than the error of the traditional HMM-based method at a much closer 1 m range (1.29 s).
The performance disparity between the Adaptive Sticky-HMM method and the three baseline methods becomes more pronounced at greater distances. The peak-based method exhibits the highest sensitivity to distance, with its MAE escalating to 3.38 s at 2 m—a nearly twofold increase compared to its 1 m performance. Similarly, the threshold-based method shows substantial degradation, reaching an MAE of 2.65 s at the 2 m mark. Overall, the Adaptive Sticky-HMM method achieves an average MAE of 0.99 s, providing an error reduction of 40.7% compared to the traditional HMM-based method (1.67 s) and outperforming the threshold-based (2.14 s) and peak-based (2.57 s) approaches by a substantial margin. These results quantitatively confirm the effectiveness of the Adaptive Sticky-HMM method in mitigating environmental noise and state-switching instabilities in long-range sensing scenarios.
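To make the duration-error metric concrete, the sketch below computes absolute duration errors for the two Figure 3 events detected by the Adaptive Sticky-HMM and reproduces the 40.7% relative reduction from the reported averages. It assumes errors are computed on event durations (as in the duration-error analysis) with a one-to-one match between detected and GT events; fragmented outputs such as those of the peak-based and traditional HMM methods would need a matching rule that is not specified here.

```python
def abs_duration_errors(detected, reference):
    """Absolute duration error (s) for one-to-one matched (onset, offset) pairs."""
    return [abs((d_off - d_on) - (r_off - r_on))
            for (d_on, d_off), (r_on, r_off) in zip(detected, reference)]


# Event boundaries (s) reported for the Figure 3 example
gt = [(21.25, 32.20), (60.70, 76.10)]
proposed = [(21.20, 31.50), (61.35, 76.10)]

errors = abs_duration_errors(proposed, gt)
print([round(e, 2) for e in errors])          # [0.65, 0.65]

# Relative MAE reduction vs. the traditional HMM, from the reported averages
reduction_pct = (1.67 - 0.99) / 1.67 * 100
print(round(reduction_pct, 1))                # 40.7
```

Because a duration error can mask compensating onset and offset shifts, it is complementary to the boundary offsets discussed for Figure 3.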
4.2. Statistical Analysis of Duration Error
To validate the assumption of approximate event-level independence, the intraclass correlation coefficient ICC(2,1) (denoted ICC2) was computed for each method–distance combination. Under the two-way random-effects model, ICC2 is defined as
$$\mathrm{ICC2} = \frac{MS_R - MS_E}{MS_R + (k-1)\,MS_E + \frac{k}{n}\left(MS_C - MS_E\right)}$$
where $MS_R$, $MS_C$, and $MS_E$ denote the between-subject, between-rater, and error mean squares, respectively; $n$ is the number of subjects; and $k$ is the number of repeated measurements per subject (corresponding to the 10 s and 15 s apnea trials). As shown in Table 3, all ICC values fall below 0.5, indicating that between-subject variance accounts for less than half of the total variance and is small relative to event-level variability. Consequently, individual apnea events are treated as the unit of statistical inference, consistent with the event-based nature of clinical apnea evaluation.
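ICC(2,1) can be computed directly from the two-way ANOVA mean squares, as in the sketch below (the function name is hypothetical). The input is an $n \times k$ matrix with one row per subject and one column per repeated measurement.

```python
import numpy as np


def icc2(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: array of shape (n_subjects, k_measurements).
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * np.sum((y.mean(axis=1) - grand) ** 2)    # between-subject
    ss_cols = n * np.sum((y.mean(axis=0) - grand) ** 2)    # between-rater / repetition
    ss_err = np.sum((y - grand) ** 2) - ss_rows - ss_cols   # residual
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_err / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)
```

As a sanity check, applying it to the six-subject, four-judge example table of Shrout and Fleiss (1979) gives approximately 0.29, their published ICC(2,1).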
To formally assess whether the proposed Adaptive Sticky-HMM achieves statistically superior performance over the baseline methods, two levels of hypothesis testing were conducted on the event-level absolute duration errors.
First, a Friedman test was applied at each sensing distance to evaluate the overall difference among the four methods. As shown in Table 4, the differences are highly significant at all three distances, indicating that the observed performance hierarchy is unlikely to be attributable to chance.
Post-hoc Wilcoxon signed-rank tests further confirmed that the proposed Adaptive Sticky-HMM significantly outperformed all baseline methods at every distance. Specifically, for each pairwise comparison between the proposed method and a baseline method, the p-values were consistently below 0.001, indicating strong statistical significance (see Table 5).
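The two-level testing procedure can be sketched with `scipy.stats`. Each row of the input holds one event's absolute duration errors for all methods, so the design is paired; applying a Bonferroni correction over the three post-hoc comparisons follows the correction mentioned in the Conclusions, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon


def compare_methods(errors, names, proposed=0):
    """Friedman omnibus test plus Bonferroni-corrected Wilcoxon post-hoc tests.

    errors: (n_events, n_methods) absolute duration errors, one row per event,
    so every method is evaluated on the same events (paired design).
    proposed: column index of the method compared against every other method.
    """
    errors = np.asarray(errors, dtype=float)
    omnibus = friedmanchisquare(*errors.T)
    others = [j for j in range(errors.shape[1]) if j != proposed]
    posthoc = {}
    for j in others:
        p = wilcoxon(errors[:, proposed], errors[:, j]).pvalue
        posthoc[names[j]] = min(1.0, p * len(others))       # Bonferroni correction
    return omnibus.pvalue, posthoc
```

It would be called once per sensing distance, e.g. `compare_methods(errors_1m, ["sticky", "hmm", "threshold", "peak"])`, where `errors_1m` is a hypothetical event-by-method error matrix.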
Figure 4 illustrates the comparative boxplots of absolute duration errors for the four radar-based methods against the respiratory belt ground truth (GT).
Precision and Stability at Close Range (1 m): At a distance of 1 m, the Adaptive Sticky-HMM achieves a median error of 0.80 s with a compact IQR of 0.40 s. In comparison, the traditional HMM-based method yields a median error of 1.30 s, while the threshold-based and peak-based methods exhibit significantly higher median errors of 1.58 s and 1.83 s, respectively. Notably, the Adaptive Sticky-HMM provides a 38.5% reduction in median error relative to the traditional HMM-based approach.
Resilience to Intermediate and Long Ranges (1.5 m and 2 m): As the sensing range extends, the Adaptive Sticky-HMM demonstrates remarkable resilience to the degradation of the Signal-to-Noise Ratio (SNR). At 1.5 m, it maintains a median error of 1.05 s and a stable IQR of 0.39 s, whereas the peak-based method’s median error escalates to 2.55 s. At the maximum test distance of 2 m, the Adaptive Sticky-HMM preserves a median error of 1.20 s. In contrast, all three baseline methods exhibit median errors exceeding 2.10 s, with the peak-based method reaching 3.33 s.
Consistency Analysis: The Adaptive Sticky-HMM is the only method that maintains a median error below 1.50 s across all tested scenarios. Furthermore, the 75th percentile of the Adaptive Sticky-HMM error at 2 m remains lower than the 25th percentiles of both the threshold-based and peak-based methods at the same distance.
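The boxplot quantities used above (median, IQR) and the separation criterion in the consistency analysis (one method's 75th percentile below another's 25th percentile) can be checked directly from the error samples. The helpers are hypothetical, and NumPy's default linear-interpolation percentiles are assumed; plotting libraries may use a slightly different quartile rule.

```python
import numpy as np


def box_summary(errors):
    """Median, IQR and quartiles of absolute errors (as drawn in a boxplot)."""
    q1, median, q3 = np.percentile(np.asarray(errors, dtype=float), [25, 50, 75])
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


def separated(better, worse):
    """True if the 75th percentile of `better` lies below the 25th of `worse`."""
    return box_summary(better)["q3"] < box_summary(worse)["q1"]
```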
To evaluate systematic bias and the limits of agreement (LoAs), a Bland–Altman analysis was performed for the Adaptive Sticky-HMM across all experimental scenarios, as shown in Figure 5.
Unbiased Estimation: The system demonstrates negligible systematic bias regardless of the sensing distance. The mean bias remains near zero across all ranges: −0.07 s at 1 m, 0.09 s at 1.5 m, and −0.04 s at 2 m.
Agreement Limits and Reliability: At 1 m, the 95% LoAs are tightly constrained within [−1.22, 1.09] s, with a span of 2.31 s. Although the LoAs widen as the distance increases, reaching [−1.89, 1.81] s at 2 m with a span of 3.70 s, the vast majority of detection samples remain within these limits.
Subgroup Analysis (10 s vs. 15 s Events): The system exhibits consistent performance across different apnea durations. At 2 m, the biases for 10 s events (−0.11 s) and 15 s events (0.03 s) remain minimal, with nearly identical standard deviations (SD ≈ 0.95 s for both groups).
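The Bland–Altman quantities follow from the paired differences: the bias is their mean, and the 95% LoAs are bias ± 1.96 × SD. A minimal sketch (hypothetical function name) is shown below. As a consistency check on the reported values, a 3.70 s LoA span at 2 m implies SD ≈ 3.70 / (2 × 1.96) ≈ 0.94 s, in line with the ≈0.95 s subgroup standard deviations.

```python
import numpy as np


def bland_altman(estimated, reference, z=1.96):
    """Return (bias, lower LoA, upper LoA) for paired duration estimates."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(reference, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)             # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd
```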
5. Discussion
5.1. Impact of Sensing Distance
A primary challenge in millimeter-wave radar monitoring is sensing distance. As evidenced in Table 2 and Figure 4, the Mean Absolute Error (MAE) and median error of all methods increase with sensing distance. This trend is primarily due to the decrease in the Signal-to-Noise Ratio (SNR) at longer ranges, where environmental noise and subtle body movements interfere with the extracted respiratory phase. At 2 m, traditional approaches such as the peak-based method showed the highest sensitivity, with the median error escalating to 3.33 s and the IQR expanding to 1.10 s. This indicates that without a robust state-transition mechanism, noise-induced fluctuations are frequently misinterpreted as breathing activity, leading to the “fragmentation” of detected events.
5.2. Effectiveness of the Sticky Mechanism in State Estimation
The superiority of the Adaptive Sticky-HMM over the traditional HMM-based method is rooted in its ability to suppress spurious state transitions. Traditional Hidden Markov Models often suffer from “state-flickering” in low-SNR environments because they lack a prior bias toward state persistence. In our results, while the traditional HMM-based method outperformed the other two baselines, its median error still increased from 1.30 s at 1 m to 2.15 s at 2 m.
In contrast, the Adaptive Sticky-HMM effectively mitigates this by favoring the current state unless significant evidence of a transition is present. This is reflected in the stable Interquartile Range (IQR), which only grew from 0.40 s at 1 m to 0.60 s at 2 m. The compact IQR confirms that the “Sticky” parameter provides a crucial stabilizing effect, ensuring that the detected apnea boundaries remain continuous and precise despite signal fluctuations.
5.3. Statistical Reliability and Clinical Versatility
The Bland–Altman analysis in Figure 5 further supports the clinical potential of the Adaptive Sticky-HMM. The mean bias remained negligible across all distances: −0.07 s at 1 m, 0.09 s at 1.5 m, and −0.04 s at 2 m. This near-zero bias indicates that the system does not systematically over- or under-estimate the duration of apnea events.
Furthermore, the subgroup analysis (10 s vs. 15 s events) indicates that the system’s performance does not depend appreciably on apnea duration within the tested range. At 2 m, the biases for 10 s events (−0.11 s) and 15 s events (0.03 s) were similar, with nearly identical standard deviations of approximately 0.95 s. This stability across event lengths suggests that the Adaptive Sticky-HMM holds promise for varying clinical scenarios, pending further validation with diverse patient populations and clinical OSA data. By maintaining a sub-second average error (0.99 s overall MAE), the system represents a meaningful step toward bridging the gap between contact-based sensors and non-contact radar monitoring in controlled settings.
6. Conclusions
In this study, we presented a robust, non-contact framework for sleep apnea detection using mmWave FMCW radar and a novel Adaptive Sticky-HMM algorithm. The proposed approach integrates multi-dimensional signal features with a dynamic sticky transition mechanism, mitigating the inherent sensitivity of traditional heuristic and standard probabilistic models to environmental noise. The framework was validated through extensive experiments involving 15 subjects across three sensing distances (1 m, 1.5 m, and 2 m), achieving an overall Mean Absolute Error (MAE) of 0.99 s and significantly outperforming the threshold-based, peak-based, and standard HMM baselines. Formal statistical evaluations, including Wilcoxon signed-rank tests with Bonferroni correction, boxplot analysis, and Bland–Altman agreement analysis, demonstrated high concordance with ground-truth measurements and robust performance across sensing distances, with minimal systematic bias even under the reduced-SNR conditions at a 2 m sensing range.
Although the proposed method is benchmarked against classical signal processing baselines in this work, recent studies have explored machine learning and deep learning approaches for radar-based respiratory monitoring. For instance, Choi et al. [13] proposed a hybrid CNN–Transformer architecture that achieved strong OSA diagnostic performance from radar signals, and Jung et al. [12] demonstrated effective apnea event detection using learned features from FMCW radar data. While such data-driven models benefit from their capacity to capture complex nonlinear patterns, they typically require large-scale labeled training datasets and substantial computational resources, which limits their applicability in resource-constrained or data-scarce deployment scenarios. The proposed Adaptive Sticky-HMM occupies a complementary niche: it requires no training data, relies exclusively on physiologically motivated priors with adaptive parameter estimation, and maintains a lightweight computational footprint well-suited for edge deployment in long-term home monitoring. A direct quantitative comparison with deep learning approaches remains an important direction for future investigation, pending the availability of large-scale, publicly accessible radar-based apnea datasets.
Several limitations of the current study warrant acknowledgement. First, the evaluation is based on 15 healthy volunteers performing voluntary breath-holding in a controlled seated posture, rather than clinical OSA patients monitored during natural sleep. Although voluntary breath-holding produces the same radar-observable biomechanical signature as obstructive apnea—namely, the cessation of rhythmic thoracic displacement—real-world OSA involves additional physiological complexity, including hypopnea (partial airway obstruction with attenuated rather than absent chest motion), involuntary respiratory effort against a closed airway, and variable sleeping postures that alter the radar-to-chest geometry and signal characteristics. The controlled experimental design was deliberately chosen to establish a high-SNR validation baseline for rigorously assessing the algorithm’s temporal boundary precision, consistent with established practice in radar-based physiological monitoring research. Second, the study cohort comprises subjects within a relatively narrow age range (20–30 years), which may not fully represent the demographic and physiological diversity of clinical OSA populations. Future work will prioritize validation of the proposed framework using PSG-synchronized clinical recordings across diverse patient populations, with extensions to hypopnea detection and apnea–hypopnea index (AHI) estimation, thereby supporting scalable, non-invasive respiratory health management in real-world settings.