Article

An IoT-Enabled Smart Pillow with Multi-Spectrum Deep Learning Model for Real-Time Snoring Detection and Intervention

1 The Higher Educational Key Laboratory for Measuring & Control Technology and Instrumentations of Heilongjiang Province, Harbin University of Science and Technology, Harbin 150080, China
2 Faculty of Life Science and Education, University of South Wales, Treforest, Pontypridd CF37 1DL, UK
3 Faculty of Health Sciences, Durban University of Technology, Durban 1334, South Africa
* Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(24), 12891; https://doi.org/10.3390/app152412891
Submission received: 14 November 2025 / Revised: 30 November 2025 / Accepted: 2 December 2025 / Published: 6 December 2025
(This article belongs to the Special Issue Human Activity Recognition (HAR) in Healthcare, 3rd Edition)

Abstract

Snoring, a common sleep-disordered breathing phenomenon, impairs sleep quality for both the sufferer and any bed partner. While mild snoring primarily disrupts sleep continuity, severe cases often indicate obstructive sleep apnea (OSA), a disorder affecting 9–17% of the global population and linked to significant comorbidities and socioeconomic burden. Here, we propose a low-cost, real-time snoring detection and intervention system that integrates a multi-spectrum deep learning framework with an Internet of Things (IoT)-enabled smart pillow. The modified Parallel Convolutional Spatiotemporal Network (PCSN) combines three parallel convolutional neural network (CNN) branches processing Constant-Q Transform (CQT), Synchrosqueezing Wavelet Transform (SWT), and Hilbert–Huang Transform (HHT) features with a Long Short-Term Memory (LSTM) network to capture the spatial and temporal characteristics of sounds associated with snoring. The smart pillow prototype incorporates two Micro-Electro-Mechanical System (MEMS) microphones, an off-the-shelf ESP8266 board, a speaker, and two vibration motors for real-time audio acquisition, cloud-based processing via the Arduino cloud, and closed-loop haptic/audio feedback that encourages positional changes without fully awakening the snorer. Experiments demonstrated that the modified PCSN model achieves 98.33% accuracy, 99.29% sensitivity, 98.34% precision, 98.30% recall, and 98.32% F1-score, outperforming existing systems. Hardware costs are under USD 8, and a smartphone app provides authorized users with real-time visualization and secure data access. This solution offers a cost-effective and accurate approach for home-based OSA screening and intervention.

1. Introduction

Snoring is a common sleep-related breathing phenomenon. It starts when the muscles surrounding the throat, including the tongue, relax during sleep, allowing soft tissues such as the tongue to fall backwards when the head is in a supine posture, narrowing the upper airway and triggering the vibrations that produce snoring. Mild snoring (stertor) relates to the noise itself and may primarily affect the sleep quality of others; severe snoring, however, is loud and disruptive and, when accompanied by repeated pauses in breathing, indicates obstructive sleep apnea (OSA) [1]. The pathological feature of OSA is the recurrent collapse of the upper airway during sleep, resulting in partial or complete interruption of airflow [1]. The global prevalence of OSA is reportedly between 9% and 17% of the population [2], with over 176 million people in China—the highest number in the world—estimated as being affected [3,4]. Despite such a high prevalence, approximately 85% of patients remain undiagnosed [3]. These factors underscore the importance of developing a low-cost, non-invasive, yet reliable method for long-term monitoring.
OSA is not only strongly associated with increased mortality risk but is also linked to multiple comorbidities, including cardiovascular disease, diabetes, obesity, sexual dysfunction, arrhythmias, headaches, and stroke [5]; it can also lead to relationship problems as well as health complications [6]. Clinically, polysomnography (PSG) remains the gold standard for OSA diagnosis. However, the testing process is cumbersome, costly, and heavily reliant on specialized resources, making it unable to cover the vast potential patient population [7]. For OSA identification, different physiological signals have been utilized, including respiratory [8,9], SpO2 [10,11,12], and ECG [13,14,15] signals. These methods generally require feature extraction and feature selection, which demand considerable expertise and are therefore subject to the operator's experience. Such limitations directly hinder the widespread adoption of these methods in clinics and home settings, causing patients to miss opportunities for early diagnosis and intervention and risking further exacerbation of the disease burden. Annual per-patient expenditures are projected to reach USD 28,000 in the U.S. and range from EUR 1700 to EUR 5000 in Europe [16,17].
In recent years, deep learning has emerged as an attractive alternative to experienced clinicians in tasks such as recognizing disease characteristics, primarily due to its ability to automatically extract features and perform classification end-to-end. Motivated by the exceptional performance of deep learning in domains such as text recognition [18], speech recognition [19], and visual imagery [20], researchers have previously attempted to use deep learning algorithms to automatically detect snoring (Table 1).
However, existing models demonstrate significant limitations in robustness under complex and noisy environments. Research remains largely confined to offline analysis without real-time capabilities, and most models inadequately integrate with closed-loop intervention strategies, limiting their adoption in devices targeting clinical and home settings.
Given the aforementioned challenges, this paper proposes an innovative multi-spectrum deep learning framework by constructing a hybrid deep learning network that integrates parallel CNN models with LSTM for robust snoring detection. This model integrates multiple acoustic representations, including raw waveforms, Constant-Q Transform (CQT), Synchrosqueezing Wavelet Transform (SWT), and Hilbert–Huang Transform (HHT), enabling it to capture local spatial features while effectively modeling time-dependent relationships [24,25,26]. Building upon this algorithm, we have further developed a low-cost smart pillow prototype system. It incorporates MEMS microphones, haptic feedback via vibration motors, and miniature speakers to deliver real-time, personalized, closed-loop interventions through the embedded system and a cloud server connected via Wi-Fi. The main contributions of this work include the following: (1) constructing a multiple-spectrum dataset; (2) systematically training and testing the mixed deep learning model; (3) evaluating detection performance under different feature representations; and (4) testing the practicality of the prototype system in real-world scenarios. The results demonstrate that the proposed solution exhibits high accuracy and strong robustness in complex environments, while the prototype system validates its real-time performance and clinical feasibility.
In the following text, Section 2 summarizes the research underpinning snoring detection; Section 3 introduces the research methodology and system design, including the deep learning framework and hardware implementation; Section 4 presents the results and a performance evaluation; Section 5 compares the state-of-the-art algorithms with the proposed method and discusses different spectral feature combinations and limitations; Section 6 presents a summary.

2. Related Work

Traditional snoring detection methods include time-domain, frequency-domain, time–frequency, and chaotic signal analyses [27,28,29]. Time-domain analysis methods examine signal variations directly along the time axis and detect snoring through features such as short-time energy, zero-crossing rate, and amplitude [27]. Frequency-domain analysis transforms signals from the time domain into the frequency domain and examines their spectral components using techniques such as spectral analysis, power spectral density analysis, and Mel-frequency cepstral coefficients (MFCCs) [28]. Time–frequency analysis combines the advantages of both time- and frequency-domain characteristics and enables simultaneous investigation of the temporal and spectral variations, with commonly employed methods including short-time Fourier transform (STFT) and wavelet transform [25,26]. Chaotic signal analysis identifies snoring by analyzing chaotic characteristics within snoring signals, such as calculating Lyapunov exponents and fractal dimensions [29].
These methods cannot reliably recognize snoring individually due to key limitations, such as the following: vulnerability of time-domain techniques to ambient noise [30]; requirement for stationary signals in frequency-domain methods [31]; and inability of time–frequency approaches with fixed window lengths to simultaneously capture fast transients and low-frequency structures [32]. Although the wavelet transform provides adjustable resolution that highlights transient events, it is not shift-invariant: a small temporal shift in the snore burst can completely change the coefficient pattern, which can produce unstable features and reduce reproducibility when the sleeper changes posture. Chaotic signal analysis, although effective at extracting nonlinear characteristics from snoring signals, requires high-quality input signals [33]. In practical scenarios, snoring signals are often contaminated by various forms of noise and interference (e.g., cough, speech, and bed squeaks), which can compromise the accurate extraction of chaotic features.
In recent years, multiple feature extraction techniques have driven significant advances in reliable snoring detection [34,35]. For instance, Qian et al. [23] recorded snore sounds from 40 male patients and extracted nine complementary acoustic feature sets (crest-factor, F0, formants, spectral statistics, power ratio, sub-band energy ratio, MFCCs, EMD-based, and wavelet-energy features). After ReliefF-based feature selection, a multi-feature fusion fed to a random forest classifier achieved 78% unweighted average recall (UAR) in subject-independent tests. In another study, Mawla et al. [36] proposed a snoring-signal enhancement method that uses multivariate variational mode decomposition (MVMD) and non-negative matrix factorization (NMF) to isolate snoring within mixed interference; however, this approach is computationally intensive and requires manual tuning of the mode count and NMF rank: sub-optimal values immediately lower separation accuracy.
Despite the aforementioned advances in signal processing and algorithmic architectures, challenges remain in achieving cost-effectiveness, non-invasiveness, and practical deployment. In contrast to prior work, the proposed approach extracts multi-scale features that integrate CNN-based multi-spectrum spatial patterns within an LSTM temporal context, enabling high-sensitivity and low-false-alarm snore detection. The system supports real-time extraction of multiple features and model inference, while also facilitating closed-loop intervention. Additionally, IoT cloud-enabled remote monitoring balances accuracy, real-time performance, and power efficiency, rendering the proposed solution well-suited for both domestic and clinical applications.

3. Materials and Methods

The snoring detection and intervention system integrates a smart pillow, an IoT-enabled platform (Arduino cloud), and a cloud server (Figure 1). The smart pillow is equipped with an audio acquisition module, a microcontroller with Wi-Fi communication function, and haptic/sound feedback mechanisms. It continuously captures ambient sound and transmits the data to the cloud server via the IoT network. In the cloud, a deep learning model processes the incoming audio and classifies it as either snoring or non-snoring. Both the raw audio data and the classification results are stored in the cloud, enabling later review and analysis by authorized clinicians or users. When snoring is detected, the server sends a control command back to the microcontroller, activating the speaker and vibration motors to provide real-time intervention.

3.1. Smart Pillow

The smart pillow includes three functional layers: the sound-sense layer, the signal processing layer, and the alert-action layer. These three layers are powered by a 5 V, 2000 mAh Li-Po rechargeable battery (NX10C, Nuoxiang Digit Co., Shenzhen, China). All hardware components were purchased from Taobao.com for less than USD 8.
The sound-sense layer consists of two high-sensitivity MEMS microphone modules (INMP441, Zejie Electronic Co., Zhejiang, China), which capture respiratory sounds and forward the digitized audio signals to the signal processing layer. An off-the-shelf ESP8266 board (Tensilica Xtensa LX106, Shenzhen Guiyuanjing Technology Co., Ltd., Shenzhen, China) in the signal processing layer packages the raw acoustic signals and transfers them to the cloud server for computation. When snoring is detected, the microcontroller instantly activates vibration motors (9000 RPM, DZQJ Industrial Co., Guangdong, China) and a miniature speaker module (XFS-5152, Dengshitang Tech. Co., Shenzhen, China) in the alert-action layer to deliver vibration and gentle audio cues that encourage positional change without fully waking the user. Audio content is stored on and controlled via an SD card module (HC 4GB, SeeHope Digital Co., Fujian, China).
The ESP8266 board, the SD card module, and the battery were embedded in the center of the foam pillow, with the speaker module positioned in a top-surface slot for audio alerts (Figure 2). Two vibration motors and two microphones are evenly spaced on the left and right sides of the speaker. The left–right symmetric placement of the sensors is intentional: it ensures consistent audio pickup regardless of the sleeper's head orientation (e.g., side-lying or supine). This symmetry minimizes signal loss from snoring sounds and optimizes haptic feedback delivery, further improving detection reliability and intervention effectiveness. Through the integrated Wi-Fi module, the ESP8266 board connects seamlessly to the Arduino cloud.

3.2. Arduino Cloud

The Arduino cloud serves as a secure data warehouse that timestamps and stores every recorded event forwarded by the microcontroller inside the smart pillow. It also provides role-based access control, whereby authorized clinicians and individuals can retrieve both raw overnight traces and long-term history. All data are stored in the cloud, updated automatically, and can be exported to a trusted electronic device (e.g., computer, laptop, or smartphone) at any time.
We also developed a custom smartphone application (SnoreTrack) that synchronizes wirelessly via the Arduino cloud. Compatible with both Android (V10) and iOS (V4.1.1), SnoreTrack IoT Cloud Remote V1.1.3(83) displays real-time snoring intensity (0–100 scale) on a gauge chart while logging timestamped curves for analysis. It also visualizes real-time amplitude and historical trends and can be used to locate the snorer in emergencies. Privacy is protected through user authentication, which requires valid credentials.

3.3. Cloud Server

A cloud server provides a convenient place to implement the data processing and the deep learning algorithm, as well as to transmit the analyzed results back to the ESP8266 embedded in the smart pillow, thereby enabling closed-loop control.
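As an illustration of this closed loop, the sketch below shows a minimal cloud-side inference endpoint, assuming a Flask-based HTTP server; the route name, payload format, and exported model file are hypothetical and are not described in the paper.

```python
# Minimal sketch of the cloud-side closed loop (illustrative assumptions:
# a Flask HTTP server, a TorchScript-exported model file, and a /classify
# route; none of these are specified in the paper).
import io

import torch
import torchaudio
from flask import Flask, jsonify, request

app = Flask(__name__)
model = torch.jit.load("pcsn_snore.pt")  # hypothetical exported PCSN model
model.eval()

@app.route("/classify", methods=["POST"])
def classify():
    # The smart pillow POSTs a one-second audio clip as a WAV payload.
    waveform, sr = torchaudio.load(io.BytesIO(request.data), format="wav")
    with torch.no_grad():
        logits = model(waveform.unsqueeze(0))        # assumed (batch, 2) output
        prob_snore = torch.softmax(logits, dim=-1)[0, 1].item()
    # The response tells the ESP8266 whether to fire the vibration
    # motors and speaker, closing the intervention loop.
    return jsonify({"snoring": prob_snore > 0.5, "probability": prob_snore})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```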

3.3.1. Data Processing

Snoring sounds are inherently non-stationary, so Fourier transform (FT)-based methods cannot capture their time-varying details [25,34,35,36]. To reveal time–frequency characteristics of snoring sounds, we selected Constant-Q Transform (CQT), Synchrosqueezing Wavelet Transform (SWT), and Hilbert–Huang Transform (HHT) to represent the characteristics of the raw data.
1. CQT was originally developed for music analysis, as the human auditory system perceives pitch on a logarithmic rather than linear scale. By maintaining a constant quality factor across all spectral bins, the CQT allocates frequency channels that increase in absolute Hertz width while preserving a fixed relative bandwidth. The quality factor (Q) is defined as [37]:

Q = \frac{f_k}{\Delta f_k}

where $f_k$ is the center frequency, $k$ is the frequency index, and $\Delta f_k$ represents the bandwidth at frequency $f_k$. Unlike the FT's uniformly spaced bins, CQT employs a logarithmic frequency scale where each bin's bandwidth is proportional to its center frequency. This constant-Q property yields higher frequency resolution at low frequencies and superior time resolution at high frequencies, matching both human perception of musical intervals and the localization demands of transient events.

Mathematically, the CQT coefficient for frequency bin k is obtained by projecting the signal s(n) onto a complex exponential kernel [37]:

S_{\mathrm{cqt}}(k) = \frac{1}{N_k} \sum_{n=0}^{N_k - 1} s(n)\, \omega_{N_k}(n)\, \exp\!\left(-j \frac{2\pi n Q}{N_k}\right)

where $N_k$ is the length of the window function and $\omega_{N_k}(n)$ is the window function.

Since the window length $N_k$ scales inversely with $f_k$, CQT's time–frequency tiling is non-uniform: low-frequency kernels use long windows (high frequency resolution and low time resolution), while high-frequency kernels use short windows (low frequency resolution and high time resolution). This enables CQT to precisely capture signals with frequency-dependent characteristics.
2. SWT [38] is a reassignment-based technique that sharpens the classical Continuous Wavelet Transform (CWT). It achieves this by collapsing energy exclusively along the frequency dimension while leaving the time axis unaltered. The objective is to transform the blurred CWT scalogram into a sparse, invertible time–frequency representation that can accurately trace the instantaneous frequency (IF) of each oscillatory component.
Let the mother wavelet $\psi \in L^2(\mathbb{R})$ satisfy the usual admissibility condition [38]; for a signal $f \in L^2(\mathbb{R})$, the CWT with respect to the wavelet $\psi$ is defined as

W_f(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(t)\, \psi^*\!\left(\frac{t - b}{a}\right) dt, \quad a \in \mathbb{R}^+,\ b \in \mathbb{R}

where $a$ is the scale parameter, $b$ is the translation parameter, $\psi^*$ denotes the complex conjugate of the wavelet function $\psi$, and $1/\sqrt{a}$ is a normalization factor that ensures energy preservation across scales.

Let $\varepsilon > 0$; for every $(a, b)$ such that $|W_f(a, b)| > \varepsilon$, the local instantaneous frequency is estimated by

\omega_f(a, b) = \frac{1}{2\pi}\, \Im\!\left[\frac{\partial_b W_f(a, b)}{W_f(a, b)}\right]

where $\Im[\cdot]$ is the imaginary part operator: $\Im[z] = \mathrm{Im}(z)$.

After obtaining the instantaneous angular frequency, the CWT coefficients originally distributed in the time–scale domain can be reassigned to the SWT coefficients

T_f(\omega_\ell, b) = \sum_{a_k:\, |\omega_f(a_k, b) - \omega_\ell| \le \Delta\omega/2} W_f(a_k, b)\, a_k^{-3/2}\, \Delta a_k

where $\omega_\ell$ is a uniform frequency lattice with spacing $\Delta\omega$, and $\Delta a_k$ is the step size of the logarithmic scale grid used in practice.
3. The HHT is a data-driven, adaptive time–frequency analysis method introduced by Huang et al. [39], specifically designed for analyzing nonlinear and non-stationary signals. Unlike traditional transforms that rely on predefined basis functions (e.g., FT or CWT), the HHT decomposes a signal into a finite set of Intrinsic Mode Functions (IMFs) using a process called Empirical Mode Decomposition (EMD). Each IMF represents a simple oscillatory mode embedded in the data, satisfying two conditions:
  • The number of extrema and zero-crossings must either be equal or differ by at most one.
  • The mean value of the upper and lower envelopes defined by local maxima and minima is zero at any point.
Once the IMFs are extracted, the Hilbert transform is applied to each IMF to obtain the corresponding analytic signal, from which instantaneous amplitude and instantaneous frequency can be derived as functions of time. The final result is a high-resolution Hilbert spectrum, which provides a time–frequency–energy representation of the original signal.
For a real-valued signal x(t), the EMD yields

x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t)

where $c_i(t)$ are the IMFs and $r_n(t)$ is the residual.

Applying the Hilbert transform to each IMF $c_i(t)$,

H[c_i(t)] = \frac{1}{\pi}\, \mathrm{P.V.} \int_{-\infty}^{\infty} \frac{c_i(\tau)}{t - \tau}\, d\tau

where P.V. denotes the Cauchy principal value, which handles the singularity at $\tau = t$, and the analytic signal $z_i(t)$ is constructed as

z_i(t) = c_i(t) + j\,H[c_i(t)] = a_i(t)\, e^{j\theta_i(t)}

where $a_i(t)$ is the instantaneous amplitude and $\theta_i(t)$ is the instantaneous phase. The instantaneous frequency is then

\omega_i(t) = \frac{d\theta_i(t)}{dt}

The Hilbert–Huang spectrum is defined as

H(\omega, t) = \sum_{i=1}^{n} a_i(t)\, \delta(\omega - \omega_i(t))
which represents the time–frequency distribution of signal energy.
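To make the three representations concrete, the sketch below extracts CQT, SWT, and HHT features from a one-second clip using common open-source packages (librosa, ssqueezepy, and PyEMD with SciPy); the package choices, file name, and parameter values are illustrative assumptions rather than the authors' exact pipeline.

```python
# Illustrative extraction of the three time–frequency representations from
# a one-second clip. Package choices (librosa, ssqueezepy, PyEMD/SciPy) and
# all parameter values are assumptions, not the authors' exact pipeline.
import numpy as np
import librosa
from ssqueezepy import ssq_cwt          # synchrosqueezed CWT (SWT)
from PyEMD import EMD                    # empirical mode decomposition
from scipy.signal import hilbert

y, sr = librosa.load("clip_1s.wav", sr=16000, mono=True)  # assumed file

# 1. Constant-Q Transform: log-spaced bins with a constant quality factor.
cqt = np.abs(librosa.cqt(y, sr=sr, fmin=32.7, n_bins=84, bins_per_octave=12))

# 2. Synchrosqueezing Wavelet Transform: reassigns CWT energy along frequency.
Tx, Wx, ssq_freqs, scales = ssq_cwt(y, fs=sr)
swt = np.abs(Tx)

# 3. Hilbert–Huang Transform: EMD into IMFs, then per-IMF analytic signals.
imfs = EMD()(y)                                       # shape (n_imfs, N)
analytic = hilbert(imfs, axis=1)                      # z_i(t) = c_i + jH[c_i]
inst_amp = np.abs(analytic)                           # a_i(t)
phase = np.unwrap(np.angle(analytic), axis=1)         # θ_i(t)
inst_freq = np.diff(phase, axis=1) * sr / (2 * np.pi)  # instantaneous freq (Hz)

# Each representation is rendered to an image and fed to one CNN branch.
print(cqt.shape, swt.shape, inst_amp.shape, inst_freq.shape)
```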

3.3.2. Deep Learning Framework

The Parallel Convolutional Spatiotemporal Network (PCSN) [40] simultaneously captures spatial and temporal features; therefore, we adopted its architecture for the snoring detection task (Figure 3). The three parallel CNN branches allow any branch to be removed without disrupting the remaining components. Owing to this advantage, the algorithm code is easy to maintain and tailor to practical applications.
The three CNN blocks receive input from different time–frequency transforms: CQT, SWT, and HHT, respectively. Within every branch, convolution, ReLU, max-pooling, and average-pooling layers learn the local spatial structure while preserving the distinctive time–frequency signatures inherited from the input transform. The three complementary representations are fused into a single, highly discriminative feature space that accentuates both spatial correlations and transient spectral patterns. In the lower branch, we retain the LSTM network to scan the sequence in chronological order, enriching the representation with full contextual awareness across the entire time span.
The feature maps generated by the three parallel CNNs are concatenated with the LSTM outcomes along the feature axis. This fusion operation combines the spectral structure with time sequences, producing a unified embedding that captures both spatial awareness and temporal sensitivity. The fused feature vector is fed into a fully connected layer that compresses the high-dimensional representation into the label space; a subsequent Softmax yields the final class probabilities.
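Since the paper does not list layer-level hyperparameters, the following PyTorch sketch shows only the overall topology described above: three parallel CNN branches over the CQT/SWT/HHT images, an LSTM over the raw sequence, feature-axis concatenation, and a fully connected head. The channel counts, kernel sizes, and LSTM width are illustrative assumptions.

```python
# Sketch of the modified PCSN topology. Channel counts, kernel sizes, and
# the LSTM width are illustrative assumptions; only the overall structure
# (three CNN branches + LSTM + fused FC head) follows the paper.
import torch
import torch.nn as nn

class CNNBranch(nn.Module):
    """One spectral branch: conv/ReLU/max-pool stacks ending in average pooling."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.proj = nn.Linear(32 * 4 * 4, out_dim)

    def forward(self, x):
        return self.proj(self.features(x).flatten(1))

class ModifiedPCSN(nn.Module):
    def __init__(self, n_classes=2, seq_feats=64):
        super().__init__()
        self.cqt_branch = CNNBranch()
        self.swt_branch = CNNBranch()
        self.hht_branch = CNNBranch()
        self.lstm = nn.LSTM(input_size=seq_feats, hidden_size=128, batch_first=True)
        self.head = nn.Linear(128 * 3 + 128, n_classes)  # fused embedding -> labels

    def forward(self, cqt, swt, hht, seq):
        spatial = [self.cqt_branch(cqt), self.swt_branch(swt), self.hht_branch(hht)]
        _, (h_n, _) = self.lstm(seq)                   # temporal context
        fused = torch.cat(spatial + [h_n[-1]], dim=1)  # concat along feature axis
        return self.head(fused)   # Softmax is applied in the loss / at inference

model = ModifiedPCSN()
logits = model(torch.randn(8, 1, 84, 64), torch.randn(8, 1, 84, 64),
               torch.randn(8, 1, 84, 64), torch.randn(8, 16, 64))
print(logits.shape)  # (8, 2)
```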

3.4. Evaluation Metrics

In this study, a total of five performance indices were used to evaluate the model performance: accuracy, sensitivity, specificity, precision, and F1-score, which are defined as follows [40]:

\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}

\mathrm{Sensitivity} = \frac{TP}{TP + FN}

\mathrm{Specificity} = \frac{TN}{TN + FP}

\mathrm{Precision} = \frac{TP}{TP + FP}

F_1\text{-}\mathrm{score} = \frac{2\,TP}{2\,TP + FN + FP}
where TP, TN, FP, and FN denote the true positive, true negative, false positive, and false negative, respectively.
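As a quick check of these definitions, the five metrics can be computed directly from confusion-matrix counts; the counts in the example call below are placeholders, not the paper's results.

```python
# Computing the five evaluation metrics directly from confusion-matrix
# counts; the example counts are placeholders, not the paper's results.
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    return {
        "accuracy":    (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
        "f1_score":    2 * tp / (2 * tp + fn + fp),
    }

print(classification_metrics(tp=2780, tn=2010, fp=40, fn=25))
```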

3.5. Dataset

In this study, we compiled a diverse dataset combining online sources and participant recordings. First, 2 h of snoring audio were downloaded from publicly available sources (e.g., www.bilibili.com). To enhance diversity, we recruited 14 healthy international university students (Table 2) from the Democratic Republic of the Congo, Ivory Coast, Nigeria, Benin, Bangladesh, Madagascar, and China. Participants used their smartphones to record snoring during sleep, yielding a total of 1.5 h of snoring audio. The study was approved by the Faculty Research Committee (Approval No. 20250310) and conducted in accordance with the Declaration of Helsinki. All participants provided informed consent after receiving detailed verbal and written explanations of the procedures. All data were fully anonymized. Individuals reporting excessive daytime drowsiness, restless sleep, or morning headaches were excluded.
Additionally, we downloaded 3.5 h of non-snoring audio representing common night-time acoustic events, including baby crying, clock ticking, door slamming, toilet flushing, emergency sirens, rain, thunder, streetcar rumbling, conversational speech, and television news [24]. Audacity (v3.6.4), a free, open-source digital audio editor, was used to trim leading and trailing silences and segment audio files into one-second clips. This resulted in 9378 snoring and 6808 non-snoring samples. Both datasets were randomly split into training and testing sets at a 7:3 ratio.
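For reproducibility, the segmentation and 7:3 split can be scripted; the following sketch (using librosa and scikit-learn, with assumed folder names and thresholds) mirrors the described procedure of trimming silence, cutting one-second clips, and randomly splitting the data.

```python
# Sketch of the dataset preparation: trim silence, segment into one-second
# clips, and split 7:3. The folder layout, trim threshold, and use of
# librosa/scikit-learn are assumptions mirroring the described procedure.
from pathlib import Path

import librosa
import numpy as np
from sklearn.model_selection import train_test_split

def one_second_clips(path: Path, sr: int = 16000) -> list:
    y, _ = librosa.load(path, sr=sr, mono=True)
    y, _ = librosa.effects.trim(y, top_db=30)     # drop leading/trailing silence
    n = len(y) // sr
    return [y[i * sr:(i + 1) * sr] for i in range(n)]

clips, labels = [], []
for label, folder in enumerate(["non_snoring", "snoring"]):  # assumed folders
    for wav in sorted(Path(folder).glob("*.wav")):
        segs = one_second_clips(wav)
        clips.extend(segs)
        labels.extend([label] * len(segs))

# Random 7:3 split, stratified so both classes keep their proportions.
X_train, X_test, y_train, y_test = train_test_split(
    clips, labels, test_size=0.3, random_state=42, stratify=labels)
print(len(X_train), len(X_test))
```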

4. Results

4.1. Prototype System

An ammeter was connected in series with the system to assess power consumption. In idle mode, the system drew 66.8 mA; during active operation (i.e., snoring detected, with both vibration motors and the speaker running), this increased to 191.1 mA. The prototype is powered by a 2000 mAh Li-Po rechargeable battery. Assuming eight hours of nightly use (four hours in active mode and four in idle), total energy consumption would be approximately 4 h × 66.8 mA + 4 h × 191.1 mA = 1031.6 mAh. Thus, a fully charged battery can sustain the smart pillow for an 8 h sleep cycle.
The custom-made smartphone app, SnoreTrack, reports the real-time snoring amplitude using the snore score and snore percentage during sleep (as shown in Figure 4). In addition, it fuses GPS tagging based on Google Maps with time-series analytics, including the live curve and history graphs (e.g., 1 h, 1 day, and 7 days). In some situations, for instance, if the smart pillow cannot stop the sleeper from snoring, the LED alarm will blink. This can alert relatives or caregivers who use the app so they can take further action, if necessary, to mitigate the risk to the snorer (e.g., sleep apnea). All the data are stored on the Arduino cloud and can be downloaded for analysis in .CSV format.

4.2. Experimental Environment

Training deep learning models on large datasets is computationally intensive. We therefore accelerated all experiments on an NVIDIA GeForce RTX 5090 GPU with CUDA and cuDNN and implemented them in PyTorch version 2.7.0 for its optimized primitives. We trained for 100 epochs with the AdamW optimizer, a learning rate of 1 × 10⁻⁵, and a batch size of 32.
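The stated configuration maps directly onto a standard PyTorch training loop. In the sketch below, only the optimizer, learning rate, batch size, and epoch count come from the text; the `ModifiedPCSN` class is the assumed sketch from Section 3.3.2, and the dummy tensors stand in for the real feature images and sequences.

```python
# Standard PyTorch training loop matching the stated configuration
# (AdamW, lr = 1e-5, batch size 32, 100 epochs). ModifiedPCSN is the
# assumed class from the earlier sketch; dummy tensors replace real data.
import torch
from torch.utils.data import DataLoader, TensorDataset

N = 64  # placeholder sample count
dummy = TensorDataset(torch.randn(N, 1, 84, 64), torch.randn(N, 1, 84, 64),
                      torch.randn(N, 1, 84, 64), torch.randn(N, 16, 64),
                      torch.randint(0, 2, (N,)))
loader = DataLoader(dummy, batch_size=32, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = ModifiedPCSN().to(device)   # class from the Section 3.3.2 sketch
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(100):
    for cqt, swt, hht, seq, target in loader:
        optimizer.zero_grad()
        logits = model(cqt.to(device), swt.to(device),
                       hht.to(device), seq.to(device))
        loss = criterion(logits, target.to(device))
        loss.backward()
        optimizer.step()
```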

4.3. Results of CQT, SWT, and HHT

To illustrate the difference between snoring and non-snoring sounds in time–frequency analysis, the CQT, SWT, and HHT results for a randomly selected signal are shown in Figure 5. The amplitudes were normalized to facilitate comparison. All the time–frequency outcomes are stored as images and used as inputs for the modified PCSN model.

4.4. Deep Learning Model Performance Evaluation

In order to build an optimal snoring detection architecture, the original acoustic dataset was first converted to CQT, SWT, and HHT images. The images were then randomly shuffled and split, along with the corresponding time-domain signals, into two groups: 70% for training and 30% for testing. The model's performance was evaluated using accuracy, sensitivity, precision, recall, F1-score, and the loss function.
In Figure 6a, training accuracy began around 73% and climbed almost linearly to nearly 97% within the first 25 epochs, after which it plateaued. Testing accuracy followed a similar trajectory, starting at about 82% and reaching up to 98% by epoch 25. Both training and testing accuracy curves fluctuated minimally after 40 epochs. In Figure 6b, training loss dropped steeply from 0.6 to 0.07 within the first 20 epochs and continued to decrease asymptotically, reaching around 0.06 by epoch 100. Testing loss followed the training loss pattern, starting from 0.5 and stabilizing around 0.06 after 20 epochs.
The confusion matrix (Figure 7) of the model revealed a remarkable ability to correctly detect snoring, achieving 98.32% precision and 98.30% recall on the training set, with virtually identical performance on the test set (98.34% precision and 98.33% recall).

5. Discussion

In this article, we proposed a novel approach for real-time snoring detection utilizing the modified PCSN framework to acquire temporal and spatial information from input sequences and transmit the results to the low-cost in-field system embedded in a pillow. All results are stored on the cloud and can be retrieved by authorized users (e.g., clinicians).

5.1. Feature Performance Evaluation

To quantify the discriminative capability of individual time–frequency representations, each feature map (CQT, SWT, HHT) and their fusions were successively fed into the modified PCSN while the remaining components were held constant. The three parallel CNN branches can be independently enabled or disabled at runtime without perturbing the rest of the architecture. Throughout these experiments, the LSTM pathway, responsible for modeling temporal dynamics, was left untouched to ensure a fair comparison. The resulting metrics are summarized in Table 3.
The comparative analysis showed that the CQT branch alone attains 98.15% accuracy, 98.04% precision, 98.43% recall, and 98.23% F1-score in only 0.12 s. Augmenting the network with the SWT or HHT branch yields negligible further improvement yet increases inference time more than three-fold if all three features are utilized (0.43 s). Consequently, the CQT-CNN + LSTM model was retained for real-time snoring detection. The superior performance of the CQT representation is likely attributed to its high frequency resolution at lower frequencies, which effectively captures the fundamental harmonic structures characteristic of snoring sounds, while its progressive time resolution at higher frequencies adequately represents the transient onset of each snore.
Although the additional features did not enhance snoring detection, a preliminary experiment indicated that the three-branch (CQT + SWT + HHT) ensemble markedly improves discrimination between different non-snoring events (e.g., infant crying, door closing, television news), demonstrating the effectiveness of the modified PCSN architecture (Figure 8).

5.2. Comparison with Other Studies

Numerous studies have introduced experimental approaches to distinguish snoring from non-snoring sounds. Table 4 summarizes recent work and compares it with the proposed method, underscoring how different hardware and algorithm combinations affect classification accuracy. Several studies [39,40,41,42,43] employed time- and frequency-domain features paired with traditional machine learning algorithms; however, these methods demanded large feature sets and required PCA for dimensionality reduction.
On the embedded-system side, Penagos et al. [44] employed an ESP32 microcontroller together with a MEMS microphone to extract snoring-related parameters (e.g., intensity, frequency, and duration) and transmitted the data via Wi-Fi to a remote server for processing; however, no performance metrics were reported. Khan [24] developed a CNN-based anti-snoring system, using a Raspberry Pi as the central processing unit and achieving an accuracy of 96% on a dataset of 1000 samples. In contrast, the proposed method delivers higher classification performance (98.33% accuracy) by exploiting non-stationary signal characteristics. The real-time system is designed to prompt the sleeping snorer, via sound and haptics, to modify their sleeping position without fully waking them. Additionally, the sleep noise data are recorded and uploaded to the cloud for later offline analysis or review by suitably qualified staff.
When comparing performance metrics, it is noteworthy that studies like Azarbarzin et al. [43] and Dafna et al. [45] also report high accuracy (98.6% and 98.4%, respectively). However, these studies relied on extensive manual feature selection (e.g., 127 features in [45]). Additionally, prior high-accuracy methods required either tracheal microphones attached directly to the participant [43] or placement that may not be suitable for bedroom settings (1 m above the sleeper [45]); notably, the Rode microphone used in [45] costs over 20 USD per unit. In contrast, our method leverages an end-to-end deep learning framework that automatically learns optimal features from multi-spectrum time–frequency representations, reducing the need for expert-driven feature design. Furthermore, a key contribution of our work is the integration of this high-performance algorithm into a low-cost and real-time intervention system with closed-loop feedback, which was not the focus of the aforementioned studies.

5.3. Limitations

Whilst this technology demonstrates potential for the creation of home-based sleep health management for those individuals with sleep-disordered breathing, several study limitations must be acknowledged.
First, the dataset was limited in both size and clinical scope; it lacked live data from real-world patients and relied solely on pre-recorded audio. We plan to collect data across multiple medical centers to evaluate the effectiveness of the system on patients diagnosed with OSA.
Second, the testing duration was deemed insufficient. Although several hours of data were collected, this did not encompass a full overnight sleep period. Extended trials are necessary to evaluate performance throughout complete sleep cycles.
Third, while the device intervenes upon detection of ≥4 snore events exceeding 50 dB within a rolling 60 s epoch, this literature-derived threshold [43,51] was chosen to balance intervention responsiveness with minimizing false alarms from transient noises. However, this threshold has not been clinically validated for our specific device, and its sensitivity to variation remains a topic for future user-specific calibration. Without video monitoring, we cannot confirm whether interventions induce side-sleeping, fail to arouse the sleeper, or cause unintended awakenings and device removal. Subjective feedback from participants is also required to evaluate how system noticeability affected perceived sleep quality.
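The intervention rule stated above is straightforward to express in code; the sketch below implements the rolling-epoch trigger (≥4 snore events above 50 dB within 60 s) with an assumed event-stream interface that is not described in the paper.

```python
# Sketch of the described intervention rule: fire when at least 4 snore
# events above 50 dB fall inside a rolling 60 s epoch. The event-stream
# interface (timestamp, level pairs) is an assumption for illustration.
from collections import deque

class SnoreTrigger:
    def __init__(self, window_s: float = 60.0, threshold_db: float = 50.0,
                 min_events: int = 4):
        self.window_s = window_s
        self.threshold_db = threshold_db
        self.min_events = min_events
        self.events = deque()  # timestamps of qualifying snore events

    def update(self, t: float, level_db: float) -> bool:
        """Feed one detected snore event; return True if intervention fires."""
        if level_db >= self.threshold_db:
            self.events.append(t)
        while self.events and t - self.events[0] > self.window_s:
            self.events.popleft()  # drop events outside the rolling epoch
        return len(self.events) >= self.min_events

trigger = SnoreTrigger()
for t, db in [(0, 55), (12, 58), (30, 52), (45, 61)]:
    if trigger.update(t, db):
        print(f"t={t}s: activate vibration motors and speaker")
```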
Fourth, software integration can be improved. The current system uses Arduino cloud for data storage and visualization, while a separate cloud server runs the algorithm. These components should be consolidated into a unified architecture. The Arduino cloud was selected for its seamless IoT solutions and development-friendly widgets for control and visualization.
Fifth, pillow deformation during sleep may attenuate snoring sounds and thereby impact microphone sensitivity. In future trials, we plan to mitigate this by using medium-thickness foam to reinforce the sensor-embedded slots, which will enhance structural stability and minimize user discomfort. Furthermore, while our haptic feedback is designed to avoid disrupting sleep, its potential impact on light sleepers—including the risk of unintended awakenings—remains unevaluated in the current study. This gap requires further investigation through subjective user questionnaires and extended trials to assess sleep quality outcomes. The authors are aware that alternative mechanisms exist for modifying sleeping positions to facilitate snoring detection (e.g., pillow inflation). However, such technologies have not yet been compared to the approach trialed herein to determine their relative efficacy.
Sixth, it is important to clarify the clinical scope of the current system. While snoring is the target event, its primary significance lies in its association with OSA. This work focuses on robust snoring detection as a critical first step towards home-based OSA screening. A high frequency and intensity of detected snoring events can serve as a strong indicator for recommending professional PSG evaluation. Although OSA is a major sleep-related clinical concern affecting 9–17% of the population [1,2], snoring itself disrupts sleep for approximately 60% of men and 40% of women (either the snorer or their sleeping partner) [43]. Beyond clinical diagnoses, this high prevalence of sleep disruption underscores the importance of addressing snoring detection accuracy. Thus, accurate snoring detection is positioned as the primary challenge to resolve.
Despite these limitations, our proposed system has achieved significant progress toward real-time snoring detection in terms of
  • Cost-effectiveness: Hardware costs under USD 8, including two microphones, two vibration motors, a speaker, an SD card module, an off-the-shelf ESP8266 board, and a 2000 mAh battery.
  • High accuracy: Integration of temporal–spatial features with our modified PCSN model yields classification accuracy exceeding 98%.
  • Secure cloud storage: Historical data stored on the Arduino cloud are accessible by authorized personnel for post hoc clinical diagnosis and treatment.

6. Conclusions

This paper introduces a low-cost, IoT-enabled snoring detection and intervention system that combines a multi-spectral deep learning algorithm with real-time closed-loop feedback. The modified PCSN fuses CQT, SWT, and HHT features with an LSTM, capturing the spatial–temporal dynamics of non-stationary snoring sounds. Experimental results demonstrate authenticated cloud storage, high-accuracy snoring detection, and haptic/acoustic intervention capabilities.

Author Contributions

Conceptualization, Z.L. and P.W.M.; data collection and analysis, K.K.O.P., G.L. and J.W.; methodology, Z.L. and P.W.M.; original draft preparation, K.K.O.P., G.L., J.W., T.H. and Y.X.; review and editing, Z.L. and P.W.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Faculty Research Committee of Measuring and Control Technology and Instrumentation (No. 20250310).

Informed Consent Statement

Written informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available on request from the corresponding authors.

Acknowledgments

The authors would like to express their gratitude to the volunteers for their participation. AI was used only as a translation aid.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Molina, G.G.; Chellamuthu, V.; Gorski, B.; Siyahjani, F.; Babaeizadeh, S.; Mushtaq, F.; Mills, R.; McGhee, L.; DeFranco, S.; Aloia, M. 0325 Snoring and Obstructive Sleep Apnea Associations Through the Lens of a Smart Bed Platform. Sleep 2024, 47, A139–A140. [Google Scholar] [CrossRef]
  2. Senaratna, C.V.; Perret, J.L.; Lodge, C.J.; Lowe, A.J.; Campbell, B.E.; Matheson, M.C.; Hamilton, G.S.; Dharmage, S.C. Prevalence of obstructive sleep apnea in the general population: A systematic review. Sleep Med. Rev. 2017, 34, 70–81. [Google Scholar] [CrossRef] [PubMed]
  3. Wei, Y.; Liu, Y.; Ayas, N.; Laher, I. A narrative review on obstructive sleep apnea in China: A sleeping giant in disease pathology. Heart Mind 2022, 6, 232–241. [Google Scholar] [CrossRef]
  4. Peppard, P.E.; Young, T.; Barnet, J.H.; Palta, M.; Hagen, E.W.; Hla, K.M. Increased prevalence of sleep-disordered breathing in adults. Am. J. Epidemiol. 2013, 177, 1006–1014. [Google Scholar] [CrossRef]
  5. Jordan, A.S.; McSharry, D.G.; Malhotra, A. Adult obstructive sleep apnoea. Lancet 2014, 383, 736–747. [Google Scholar] [CrossRef]
  6. Costa, C.C.; Afreixo, V.; Cravo, J. Impact of Obstructive Sleep Apnea Treatment on Marital Relationships: Sleeping Together Again? Cureus 2023, 15, e46513. [Google Scholar] [CrossRef]
  7. Berry, R.B.; Brooks, R.; Gamaldo, C.E.; Harding, S.M.; Marcus, C.; Vaughn, B.V. The AASM Manual for the Scoring of Sleep and Associated Events. Rules, Terminology and Technical Specifications; American Academy of Sleep Medicine: Darien, IL, USA, 2012; Volume 176, p. 7. [Google Scholar]
  8. Lakhan, P.; Ditthapron, A.; Banluesombatkul, N.; Wilaiprasitporn, T. Deep Neural Networks with Weighted Averaged Overnight Airflow Features for Sleep Apnea–Hypopnea Severity Classification. In Proceedings of the TENCON 2018—IEEE Region 10 Conference, Jeju, Republic of Korea, 28–31 October 2018; pp. 441–445. [Google Scholar]
  9. Zhang, J.; Tang, Z.; Gao, J.; Lin, L.; Liu, Z.; Wu, H.; Liu, F.; Yao, R. Automatic detection of obstructive sleep apnea events using a deep CNN-LSTM model. Comput. Intell. Neurosci. 2021, 2021, 5594733. [Google Scholar] [CrossRef] [PubMed]
  10. Wang, D.; Wong, K.K.; Rowsell, L.; Don, G.W.; Yee, B.J.; Grunstein, R.R. Predicting response to oxygen therapy in obstructive sleep apnoea patients using a 10-minute daytime test. Eur. Respir. J. 2018, 51, 1701587. [Google Scholar] [CrossRef]
  11. Lin, Y.-Y.; Wu, H.-T.; Hsu, C.-A.; Huang, P.-C.; Huang, Y.-H.; Lo, Y.-L. Sleep Apnea Detection Based on Thoracic and Abdominal Movement Signals of Wearable Piezoelectric Bands. IEEE J. Biomed. Health Inform. 2016, 21, 1533–1545. [Google Scholar] [CrossRef]
  12. Gutta, S.; Cheng, Q. Modeling of oxygen saturation and respiration for sleep apnea detection. In Proceedings of the 2016 50th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 6–9 November 2016; pp. 1636–1640. [Google Scholar]
  13. Zarei, A.; Asl, B.M. Automatic detection of obstructive sleep apnea using wavelet transform and entropy-based features from single-lead ECG signal. IEEE J. Biomed. Health Inform. 2018, 23, 1011–1021. [Google Scholar] [CrossRef]
  14. Cheng, C.; Kan, C.; Yang, H. Heterogeneous recurrence analysis of heartbeat dynamics for the identification of sleep apnea events. Comput. Biol. Med. 2016, 75, 10–18. [Google Scholar] [CrossRef]
  15. Urtnasan, E.; Park, J.-U.; Joo, E.-Y.; Lee, K.-J. Automated detection of obstructive sleep apnea events from a single-lead electrocardiogram using a convolutional neural network. J. Med. Syst. 2018, 42, 104. [Google Scholar] [CrossRef]
  16. Zappalà, P.; Lentini, M.; Ronsivalle, S.; Lavalle, S.; La Via, L.; Maniaci, A. The Global Socioeconomic Burden of Obstructive Sleep Apnea: A Comprehensive Review. Healthcare 2025, 13, 2115. [Google Scholar] [CrossRef]
  17. Yang, M.S.; Abdallah, M.B.; Bashir, Z.; Khalife, W. Heart Failure Beyond the Diagnosis: A Narrative Review of Patients’ Perspectives on Daily Life and Challenges. J. Clin. Med. 2024, 13, 7278. [Google Scholar] [CrossRef]
  18. Wang, T.; Wu, D.J.; Coates, A.; Ng, A.Y. End-to-End Text Recognition with Convolutional Neural Networks. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012), Tsukuba, Japan, 11–15 November 2012; pp. 3304–3308. [Google Scholar]
  19. Abdel-Hamid, O.; Mohamed, A.-r.; Jiang, H.; Penn, G. Applying Convolutional Neural Networks Concepts to Hybrid NN-HMM Model for Speech Recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2012), Kyoto, Japan, 25–30 March 2012; pp. 4277–4280. [Google Scholar]
  20. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  21. Banluesombatkul, N.; Ouppaphan, P.; Leelaarporn, P.; Lakhan, P.; Chaitusaney, B.; Jaimchariyatam, N.; Chuangsuwanich, E.; Chen, W.; Phan, H.; Dilokthanakul, N. MetaSleepLearner: A Pilot Study on Fast Adaptation of Bio-Signals-Based Sleep Stage Classifier to New Individual Subjects Using Meta-Learning. IEEE J. Biomed. Health Inform. 2020, 25, 1949–1963. [Google Scholar] [CrossRef] [PubMed]
  22. Guilleminault, C.; Winkle, R.; Connolly, S.; Melvin, K.; Tilkian, A. Cyclical variation of the heart rate in sleep apnoea syndrome: Mechanisms, and usefulness of 24 h electrocardiography as a screening technique. Lancet 1984, 323, 126–131. [Google Scholar] [CrossRef]
  23. Jiang, Y.; Peng, J.; Zhang, X. Automatic snoring sounds detection from sleep sounds based on deep learning. Phys. Eng. Sci. Med. 2020, 43, 679–689. [Google Scholar] [CrossRef]
  24. Khan, T. A deep learning model for snoring detection and vibration notification using a smart wearable gadget. Electronics 2019, 8, 987. [Google Scholar] [CrossRef]
  25. Qian, K.; Janott, C.; Pandit, V.; Zhang, Z.; Heiser, C.; Hohenhorst, W.; Herzog, M.; Hemmert, W.; Schuller, B. Classification of the excitation location of snore sounds in the upper airway by acoustic multifeature analysis. IEEE Trans. Biomed. Eng. 2016, 64, 1731–1741. [Google Scholar] [CrossRef] [PubMed]
  26. Fang, Y.; Liu, D.; Zhao, S.; Deng, D. Improving OSAHS prevention based on multidimensional feature analysis of snoring. Electronics 2023, 12, 4148. [Google Scholar] [CrossRef]
  27. Li, Y.; Xu, L.; Wang, P.; Ding, B.; Zhao, S.; Wang, Z. Ultra-wideband radar detection based on target response and time reversal. IEEE Sens. J. 2024, 24, 14750–14762. [Google Scholar] [CrossRef]
  28. Wang, K.; Fu, X.; Ge, C.; Cao, C.; Zha, Z.-J. Towards generalized uav object detection: A novel perspective from frequency domain disentanglement. Int. J. Comput. Vis. 2024, 132, 5410–5438. [Google Scholar] [CrossRef]
  29. Jin, W.; Wang, X.; Zhan, Y. Environmental sound classification algorithm based on region joint signal analysis feature and boosting ensemble learning. Electronics 2022, 11, 3743. [Google Scholar] [CrossRef]
  30. Tadem, S.P. Traditional Methods in Edge, Corner and Boundary Detection. arXiv 2022, arXiv:2208.07714. [Google Scholar] [CrossRef]
  31. Wibisono, J.K.; Hang, H.-M. Traditional Method Inspired Deep Neural Network for Edge Detection. In Proceedings of the IEEE International Conference on Image Processing (ICIP 2020), Abu Dhabi, United Arab Emirates, 25–28 October 2020; pp. 678–682. [Google Scholar]
  32. Yang, H.; Wang, L.; Zhang, J.; Cheng, Y.; Xiang, A. Research on Edge Detection of LiDAR Images Based on Artificial Intelligence Technology. arXiv 2024, arXiv:2406.09773. [Google Scholar] [CrossRef]
  33. Yamout, Y.; Yeasar, T.S.; Iqbal, S.; Zulkernine, M. Beyond smart homes: An in-depth analysis of smart aging care system security. ACM Comput. Surv. 2023, 56, 1–35. [Google Scholar] [CrossRef]
  34. Mostafa, S.S.; Mendonça, F.; Ravelo-García, A.G.; Morgado-Dias, F. A systematic review of detecting sleep apnea using deep learning. Sensors 2019, 19, 4934. [Google Scholar] [CrossRef]
  35. Prabhakar, S.K.; Rajaguru, H.; Won, D.-O. Coherent Feature Extraction with Swarm Intelligence Based Hybrid Adaboost Weighted ELM Classification for Snoring Sound Classification. Diagnostics 2024, 14, 1857. [Google Scholar] [CrossRef] [PubMed]
  36. Al Mawla, M.; Chaccour, K.; Fares, H. A novel enhancement approach following MVMD and NMF separation of complex snoring signals. IEEE Trans. Biomed. Eng. 2023, 71, 494–503. [Google Scholar] [CrossRef]
  37. Yan, J.; Liao, J.; Zhang, W.; Dai, J.; Huang, C.; Li, H.; Yu, H. Graph Convolutional Network Based on CQT Spectrogram for Bearing Fault Diagnosis. Machines 2024, 12, 179. [Google Scholar] [CrossRef]
  38. Jiang, T.; Liu, B.; Liu, G.; Wang, B.; Li, X.; Zhang, J. Forced oscillation source location of bulk power systems using synchrosqueezing wavelet transform. IEEE Trans. Power Syst. 2024, 39, 6689–6701. [Google Scholar] [CrossRef]
  39. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.-C.; Tung, C.C.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. London. Ser. A Math. Phys. Eng. Sci. 1998, 454, 903–995. [Google Scholar] [CrossRef]
  40. Liu, Z.; Li, G.; Wang, C.; Cascioli, V.; McCarthy, P.W. Unobtrusive Sleep Posture Detection Using a Smart Bed Mattress with Optimally Distributed Triaxial Accelerometer Array and Parallel Convolutional Spatiotemporal Network. Sensors 2025, 25, 3609. [Google Scholar] [CrossRef]
  41. Duckitt, W.; Tuomi, S.; Niesler, T. Automatic detection, segmentation and assessment of snoring from ambient acoustic data. Physiol. Meas. 2006, 27, 1047. [Google Scholar] [CrossRef]
  42. Cavusoglu, M.; Kamasak, M.; Erogul, O.; Ciloglu, T.; Serinagaoglu, Y.; Akcam, T. An efficient method for snore/nonsnore classification of sleep sounds. Physiol. Meas. 2007, 28, 841. [Google Scholar] [CrossRef] [PubMed]
  43. Azarbarzin, A.; Moussavi, Z.M. Automatic and unsupervised snore sound extraction from respiratory sound signals. IEEE Trans. Biomed. Eng. 2010, 58, 1156–1162. [Google Scholar] [CrossRef]
  44. Penagos, H.P.; Mahecha, E.M.; Camargo, A.M.; Jimenez, E.S.; Sarmiento, D.A.C.; Salazar, S.V.H. Detection, recognition and transmission of snoring signals by ESP32. Meas. Sens. 2024, 36, 101397. [Google Scholar] [CrossRef]
  45. Dafna, E.; Tarasiuk, A.; Zigel, Y. Automatic detection of whole night snoring events using non-contact microphone. PLoS ONE 2013, 8, e84139. [Google Scholar] [CrossRef]
  46. Swarnkar, V.R.; Abeyratne, U.R.; Sharan, R.V. Automatic picking of snore events from overnight breath sound recordings. In Proceedings of the 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Jeju Island, Republic of Korea, 11–15 July 2017; pp. 2822–2825. [Google Scholar]
  47. Arsenali, B.; van Dijk, J.; Ouweltjes, O.; den Brinker, B.; Pevernagie, D.; Krijn, R.; van Gilst, M.; Overeem, S. Recurrent neural network for classification of snoring and non-snoring sound events. In Proceedings of the 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Honolulu, HI, USA, 18–21 July 2018; pp. 328–331. [Google Scholar]
  48. Shin, H.; Cho, J. Unconstrained snoring detection using a smartphone during ordinary sleep. Biomed. Eng. Online 2014, 13, 116. [Google Scholar] [CrossRef]
  49. Xie, J.; Aubert, X.; Long, X.; van Dijk, J.; Arsenali, B.; Fonseca, P.; Overeem, S. Audio-based snore detection using deep neural networks. Comput. Methods Programs Biomed. 2021, 200, 105917. [Google Scholar] [CrossRef] [PubMed]
  50. Chao, Y.-P.; Chuang, H.-H.; Lo, Y.-L.; Huang, S.-Y.; Zhan, W.-T.; Lee, G.-S.; Li, H.-Y.; Shyu, L.-Y.; Lee, L.-A. Automated sleep apnea detection from snoring and carotid pulse signals using an innovative neck wearable piezoelectric sensor. Measurement 2025, 242, 116102. [Google Scholar] [CrossRef]
  51. Chung, T.T.; Lee, M.T.; Ku, M.C.; Yang, K.C.; Wei, C.Y. Efficacy of a smart antisnore pillow in patients with obstructive sleep apnea syndrome. Behav. Neurol. 2021, 2021, 8824011. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Architecture of the snoring detection and intervention system integrating the smart pillow, the IoT platform (Arduino cloud), and the cloud server.
Figure 2. Configuration of the smart pillow. (a) Pillow with the cover, (b) sensors embedded in the slots of the foam pillow with internal wiring, and (c) ESP8266 board (Tensilica Xtensa LX106, Shenzhen Guiyuanjing Technology Co., Ltd., Shenzhen, China), SD card module, and the battery.
Figure 3. Structure of the modified Parallel Convolutional Spatiotemporal Network (PCSN) for snoring detection, including the three parallel CNN blocks and the lower single LSTM module.
Figure 4. Screenshots of the smartphone app using recorded snoring sound data to test the function. (a) Real-time snoring detection. (b) History curves for the 1 h snoring detection. AVG represents average amplitude in dB.
Figure 5. Comparison between non-snoring and snoring sound in the time and time–frequency domain: (a) normalized non-snoring sound (baby crying) in the time domain, (b) normalized snoring sound in the time domain, (c) CQT of non-snoring sound, (d) CQT of snoring sound, (e) SWT of non-snoring sound, (f) SWT of snoring sound, (g) HHT of non-snoring sound, and (h) HHT of snoring sound.
Figure 6. Averaged accuracy and loss curves of model training and testing sets based on 5 independent trials. (a) Accuracy curves with mean ± SD and (b) loss curves with mean ± SD.
Figure 7. Confusion matrix of the training and testing datasets: (a) results of the training dataset; (b) results of the testing dataset.
Figure 8. Classification of fifteen test samples (color-coded): 5 snoring (green), 4 baby crying (yellow), 3 TV news (purple), and 3 door closing (red) sounds. Gray denotes background environmental noise.
Table 1. Snoring detection using different deep learning algorithms.

Banluesombatkul et al. [21]
Strengths: A novel MAML*-based meta sleep learner; adapts sleep-stage classification to new individuals with minimal labeled data; reduces clinician workload; supports human–machine collaboration; provides interpretability through layer-wise relevance propagation.
Key metrics: A statistically significant 5.4–17.7% performance improvement over traditional deep learning-based methods (e.g., CNN* and RNN*).
Weaknesses: Requires substantial computational resources and lengthy training; exhibits reduced accuracy for the REM* stage and lacks validation on real-world clinical datasets; employs a simplified CNN architecture; demonstrates limited generalization across diverse populations.

Zhang et al. [9]
Strengths: Integrates CNN and LSTM* to automatically detect OSA* events from single-lead ECG*; eliminates handcrafted features; captures spatial and temporal ECG patterns effectively; enables reliable real-time apnea detection for portable monitoring applications.
Key metrics: Accuracy: 96.1%; sensitivity: 96.1%; specificity: 96.2%.
Weaknesses: Restricted to OSA and normal event detection (excluding hypopnea); exhibits reduced performance on noisy and transition epochs; remains unvalidated across diverse clinical datasets or real-world environments.

Guilleminault et al. [22]
Strengths: CVHR* provides a robust, physiologically grounded biomarker for non-invasive ECG/Holter-based screening of moderate-to-severe sleep-disordered breathing; detects clinically significant events without full PSG*.
Key metrics: Not applicable.
Weaknesses: Night-to-night variability; limited sensitivity for mild apnea/hypopnea; reduced accuracy with noisy/ectopic ECGs or comorbid cardiac conditions.

Jiang et al. [23]
Strengths: Employs CNN-based deep learning for automated snoring detection from sleep audio; achieves high accuracy, noise robustness, and minimal manual feature extraction; demonstrates strong potential for non-invasive sleep monitoring.
Key metrics: Accuracy: 95.07%; sensitivity: 95.42%; specificity: 95.82%.
Weaknesses: Limited dataset diversity and size; overfitting to controlled laboratory conditions; degraded performance in real-world noisy environments or with mixed sound sources; insufficient validation across diverse populations and device types.

Khan [24]
Strengths: Presents a complete, low-cost CNN-based snoring detection and prevention system; integrates a wearable vibration actuator, acoustic sensor module, and smartphone app; low-power, real-time operation.
Key metrics: Accuracy: 96%.
Weaknesses: Small and non-clinical dataset; no long-term validation; limited generalizability.

* Model-Agnostic Meta-Learning (MAML), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), obstructive sleep apnea (OSA), Recurrent Neural Network (RNN), polysomnography (PSG), Cyclical Variation in Heart Rate (CVHR), Rapid Eye Movement (REM), and electrocardiogram (ECG).
Table 2. Demographic information of all participants, reported as mean ± 1 standard deviation.
| Participants | Age (Years) | Height (cm) | Body Mass (kg) |
|---|---|---|---|
| Male (n = 11) | 26.0 ± 5.0 | 176.6 ± 3.1 | 84.8 ± 12.9 |
| Female (n = 3) | 36.0 ± 18.7 | 161.7 ± 9.0 | 72.3 ± 19.7 |
| Overall (n = 14) | 28.1 ± 9.6 | 173.4 ± 7.8 | 82.1 ± 14.7 |
Table 3. Classification results using different features.
| Feature Type | Accuracy [%] | Sensitivity [%] | Precision [%] | Recall [%] | F1-Score [%] | Running Time (s) |
|---|---|---|---|---|---|---|
| CQT | 98.15 | 99.06 | 98.04 | 98.43 | 98.23 | 0.12 |
| HHT | 96.80 | 97.37 | 96.87 | 97.21 | 97.04 | 0.20 |
| SWT | 98.14 | 98.73 | 98.24 | 98.21 | 98.22 | 0.24 |
| CQT + HHT | 98.35 | 99.02 | 98.40 | 98.47 | 98.43 | 0.25 |
| CQT + SWT | 98.22 | 99.42 | 98.21 | 98.36 | 98.28 | 0.30 |
| HHT + SWT | 98.12 | 99.31 | 98.07 | 98.38 | 98.22 | 0.37 |
| CQT + HHT + SWT | 98.33 | 99.29 | 98.34 | 98.30 | 98.32 | 0.43 |
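For reference, the metrics in Table 3 reduce to simple functions of confusion-matrix counts. In the strictly binary case sensitivity and recall coincide, so the distinct values in the table presumably reflect class-averaged (multi-class) evaluation; the sketch below states the standard binary definitions with illustrative counts, not the paper's data.

```python
# Standard binary metric definitions underlying Table 3; counts are
# illustrative, not taken from the paper.
def binary_metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)      # equals recall for a binary class
    precision = tp / (tp + fp)
    recall = sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, sensitivity, precision, recall, f1

print([round(100 * m, 2) for m in binary_metrics(tp=990, fp=17, tn=980, fn=7)])
```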
Table 4. Comparison of different snoring detection and intervention methods.
| References | Classifiers | Feature | Hardware | Experiment | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|
| Jiang et al. [23] | CNN–LSTM–DNN | Spectrum, spectrogram, Mel-spectrogram, and CQT | A microphone (RODE NTG-3, Sydney, Australia) and a digital audio recorder (Roland R-44, Roland Corporation, Hamamatsu, Japan) | 15 participants (11 patients diagnosed with sleep apnea hypopnea syndrome and 4 simple snorers) | 95.07% | 95.42% | 95.82% |
| Khan [24] | CNN | MFCC | nRF52832 Feather board (Adafruit Industries LLC, New York, NY, USA), Raspberry Pi (Sony UK Technology Centre, Pencoed, Bridgend, UK) | 1000 samples; vibration on the arm to prevent snoring | 96% | Not applicable | Not applicable |
| Duckitt et al. [41] | Hidden Markov model | Spectral features | Carol Sigma Plus 5 condenser microphone (Taiwan Carol Electronics Co., Ltd., Taichung, Taiwan) | 6 subjects, 1.5 h from each subject (1 h for training, 0.5 h for testing) | 82–89% | Not applicable | Not applicable |
| Cavusoglu et al. [42] | Robust logistic regression | MFCC | Sennheiser condenser microphone (Sennheiser electronic GmbH & Co. KG, Wedemark, Germany) | Full-night recordings from 18 simple snorers and 12 OSA patients | Simple snorers: 97.3%; OSA patients: 90.2%; mixed (simple snorers + OSA patients): 86.8% | Not applicable | Not applicable |
| Azarbarzin et al. [43] | Unsupervised fuzzy C-means clustering | Principal component analysis | Tracheal microphone, ambient microphone | A short period of the entire-night recordings of 30 participants | Tracheal microphone: 98.6%; ambient microphone: 93.1% | Not applicable | Not applicable |
| Penagos et al. [44] | YAMNet | Matlab Audio Toolbox (Wiener and parametric EQ filters to remove noise); spectrograms and periodograms for graph display; statistical values (maxima, minima, average, and standard deviation), powers, and entropies as features | INMP441 MEMS microphone (InvenSense, San Jose, CA, USA), ESP32 board (Espressif Systems, Shenzhen, China) | 23 potential snoring sounds | Not applicable | Not applicable | Not applicable |
| Dafna et al. [45] | AdaBoost classifier | Time-related and spectral-related features (127 features) | Directional condenser microphone | Full-night recordings from 67 subjects (42 for validation) | 98.4% | 98.1% | 98.2% |
| Swarnkar et al. [46] | Artificial neural network | Repetitive packets of energy | Microphone and computerized data-acquisition system | Full-night recordings from 34 subjects (21 for training, 13 for testing) | 86–89% | 82–87% | 87–89% |
| Arsenali et al. [47] | Recurrent neural network | MFCC | A field recorder (Zoom Corporation, Tokyo, Japan) and a non-contact microphone (Studiocare Professional Audio Ltd., Liverpool, UK) | Part of full-night recordings from 20 subjects (11 for training, 3 for validation, and 6 for testing) | 95% | 92% | 98% |
| Shin et al. [48] | Quadratic classifier | Autoregressive model and the local maximum of the spectral density | GT-I9300 (Galaxy S3) microphone (Samsung Electronics, Suwon, Republic of Korea) | 44 snoring datasets and 75 noise datasets | 95.07% | 98.58% | 94.62% |
| Xie et al. [49] | CNN + RNN | CQT and spectrogram | Two types of microphones: Earthworks M23 (Earthworks Inc., Milford, NH, USA) and Behringer ECM8000 (Behringer, Zhoushan, China); five microphone placements: two above the subject's head, two on the left/right sides of the bed, and one on the bedside table | Full-night recordings from 38 subjects | 95.3 ± 0.5% | 92.2 ± 0.9% | 97.7 ± 0.4% |
| Chao et al. [50] | Traditional Linear Regression (TLR), Automatic Linear Regression (ALR), and Categorical Regression (CR) with LASSO | From the snoring vibration signal: snoring index, snore duration and interval, duration and interval variance, and snoring vibration energy; from the carotid pulse signal: pulse rate and standard deviation | Advanced piezoelectric sensor (NPS, Eleceram Technology Co., Ltd., Taoyuan, Taiwan), PSG Alice system (Philips Respironics, MA, USA), portable digital sound recorder (Sony PCM-D50, Sony Electronics Inc., Tokyo, Japan), data acquisition card (USB-6008, National Instruments Corporation, Austin, TX, USA) | Simultaneous overnight recordings using the NPS, in-lab PSG, and snoring sound analysis in a controlled sleep laboratory from 29 patients with Sleep Apnea Syndrome (SAS) | 85–90% | Not applicable | Not applicable |
| Our work | Modified PCSN | CQT, HHT, and SWT | Sound sensors, ESP8266 off-the-shelf board (Tensilica Xtensa LX106, Shenzhen Guiyuanjing Technology Co., Ltd., Shenzhen, China), vibration motors, and a speaker | 14 participants plus downloaded snoring/non-snoring sounds; real-time detection and gentle haptic/sound feedback | 98.33% | 99.29% | 98.34% |
