Article

Adaptive Thermal Imaging Signal Analysis for Real-Time Non-Invasive Respiratory Rate Monitoring

1 School of Electronic and Electrical Engineering, University of Leeds, Leeds LS2 9JT, UK
2 Department of Electrical Engineering, Politeknik Negeri Batam, Batam 29461, Indonesia
3 Academic Unit for Ageing and Stroke Research, Leeds Institute of Health Sciences, University of Leeds, Leeds LS2 9JT, UK
* Author to whom correspondence should be addressed.
Sensors 2026, 26(1), 278; https://doi.org/10.3390/s26010278
Submission received: 20 November 2025 / Revised: 12 December 2025 / Accepted: 25 December 2025 / Published: 1 January 2026

Abstract

(1) Background: This study presents an adaptive, contactless, and privacy-preserving respiratory-rate monitoring system based on thermal imaging, designed for real-time operation on embedded edge hardware. The system continuously processes temperature data from a compact thermal camera without external computation, enabling practical deployment for home or clinical vital-sign monitoring. (2) Methods: Thermal frames are captured using a 256 × 192 TOPDON TC001 camera and processed entirely on an NVIDIA Jetson Orin Nano. A YOLO-based detector localizes the nostril region in every even frame (stride = 2) to reduce the computational load, while a Kalman filter predicts the ROI position on skipped frames to maintain spatial continuity and suppress motion jitter. From the stabilized ROI, a temperature-based breathing signal is extracted and analyzed through an adaptive median–MAD hysteresis algorithm that dynamically adjusts to signal amplitude and noise variations for breathing-phase detection. Respiratory rate (RR) is computed from inter-breath intervals (IBI) validated within physiological constraints. (3) Results: Ten healthy subjects participated in six experimental conditions, including resting, paced breathing, speech, off-axis yaw, posture (supine), and distance variations up to 2.0 m. Across these conditions, the system attained an MAE of 0.57 ± 0.36 BPM and an RMSE of 0.64 ± 0.42 BPM, demonstrating stable accuracy under motion and thermal drift. Compared with peak-based and FFT spectral baselines, the proposed method reduced errors by a large margin across all conditions. (4) Conclusions: The findings confirm that accurate and robust respiratory-rate estimation can be achieved using a low-resolution thermal sensor running entirely on an embedded edge device. The combination of a YOLO-based nostril detector, Kalman ROI prediction, and adaptive MAD–hysteresis phase detection that self-adjusts to signal variability provides a compact, efficient, and privacy-preserving solution for non-invasive vital-sign monitoring in real-world environments.

Graphical Abstract

1. Introduction

Continuous monitoring of respiratory rate (RR) has become central to care in both clinical and home settings, as RR is one of the most informative vital signs. RR plays a crucial role in assessing the physiological condition of a patient, especially during clinical deterioration [1,2,3]. Early detection through continuous RR tracking enables timely intervention, particularly for high-risk populations [4,5,6,7]. In long-lie conditions following a fall, continuous respiratory monitoring offers valuable physiological information for early detection and timely assistance [8,9,10].
Several conventional approaches have been employed in respiratory monitoring studies. These typically involve physical contact with the patient, including chest bands, nasal cannulas, or spirometry devices. While these tools are clinically validated, they often cause discomfort and may interfere with natural breathing behavior, especially during sleep or prolonged observation periods [11,12,13,14]. Non-contact alternatives, on the other hand, have shown promising results in estimating RR and could overcome these challenges. Recent studies have developed non-contact systems using radar-based techniques, acoustic sensors, and camera-based methods such as RGB or thermal imaging [15,16,17,18].
Among these non-contact modalities, thermal imaging presents distinct advantages for RR estimation. It enables the capture of temperature variations generated by inhaled and exhaled air without physical contact. These thermal fluctuations, observed around the nostril or mouth region, offer a natural and unobtrusive signal source for respiratory analysis [19,20]. This makes thermal-based systems particularly well suited for continuous monitoring in privacy-sensitive environments such as bedrooms or elder-care facilities.
Despite its promise, thermal-based respiratory monitoring still faces several technical challenges. Accurate detection and tracking of the nostril region are often hindered by the low spatial resolution of thermal cameras, which complicates region-of-interest (ROI) localization [21,22]. Furthermore, thermal signals are susceptible to noise introduced by subject movement, head rotations, and ambient temperature changes, all of which can degrade signal quality and affect RR estimation accuracy [22,23]. Moreover, many existing implementations rely on frequency-domain analysis or computationally expensive deep learning models, which limits their real-time feasibility on embedded platforms [19,24,25,26,27].
Recent biomedical monitoring systems have increasingly shifted toward embedded edge computing due to its advantages in latency, privacy, and deployment feasibility. Unlike cloud-based processing, which introduces transmission delays and raises concerns over sensitive health data exposure, edge computation allows all inference to occur locally on the device. This enables real-time responsiveness and preserves subject privacy, which are essential for continuous respiratory monitoring [28,29,30,31,32]. However, most existing thermal and camera-based respiratory monitoring studies still rely on offline processing pipelines due to their high computational requirements, limiting their applicability for real-time embedded deployment.
Therefore, current thermal-based RR approaches still leave three critical gaps unaddressed: (i) the lack of a robust nostril-specific localization strategy for low-resolution thermal imagery, resulting in unstable ROI tracking; (ii) the absence of the computational optimization needed for real-time deployment on embedded edge devices; and (iii) the limited robustness of time-domain phase detection, which remains sensitive to motion-induced disturbances, amplitude variability, and thermal drift. These unmet needs motivate the development of a thermal-specific, computation-efficient, and motion-resilient respiratory monitoring framework suitable for continuous operation in real-world environments.
To overcome these limitations, this study introduces a fully automated, privacy-preserving thermal-imaging system for real-time respiratory-rate monitoring on embedded edge hardware. The system begins with a thermal YOLO-based model that locates the nostril region as a small-object bounding box; this box defines the region of interest (ROI) from which an airflow-related temperature signal is extracted using the coldest pixel within the ROI, reflecting inhalation–exhalation temperature modulation. To reduce computational cost at the detector stage, an adaptive frame-skipping scheme (stride = 2) with Kalman prediction is applied, so that the YOLO detector runs at half the nominal frequency while a Kalman tracker propagates the nostril bounding box (bbox) between detections, preserving continuity and suppressing motion artifacts. The YOLO-based nostril detector was implemented using the YOLOv8n model, which naturally supports small-object detection and operates efficiently on low-resolution thermal imagery.
Respiratory-rate estimation begins with breathing-phase detection on the stabilized ROI signal using an adaptive hysteresis state machine driven by velocity-based thresholds. These thresholds are derived from the median absolute deviation (MAD) and integrated with a flicker-suppression mechanism to maintain signal stability during head movement and other motion disturbances. The resulting stable breathing-phase sequence is then used to determine inter-breath intervals (IBI), from which the respiratory rate (RR) is calculated.
The main contributions of this work are:
(i) a thermal-specific YOLO-based nostril detector designed for small-object detection in low-resolution 256 × 192 thermal imagery, overcoming the ROI instability common in prior thermal RR studies;
(ii) a detector-centric frame-skipping mechanism (stride = 2) integrated with Kalman ROI prediction, reducing detection computation by 50% while maintaining spatial continuity and enabling real-time embedded operation;
(iii) an adaptive time-domain respiratory-phase detection approach that combines median–MAD thresholds, hysteresis, and flicker suppression to achieve robust segmentation under motion, drift, and amplitude variability, without relying on frequency-domain analysis;
(iv) a fully on-device respiratory-rate monitoring pipeline, running entirely on an NVIDIA Jetson Orin Nano without cloud processing, ensuring privacy preservation and practical deployment feasibility for long-term ambient monitoring; and
(v) a comprehensive evaluation across six real-world conditions (resting, paced breathing, soft speech, off-axis yaw, distance variation up to 2.0 m, and supine posture), demonstrating clinically acceptable accuracy (overall MAE 0.57 ± 0.36 BPM) and outperforming previously reported thermal-based contactless RR systems.
To guide the development of this work, the following research questions are formulated:
RQ1: Can a low-resolution thermal camera, combined with automated nostril tracking, provide accurate and reliable respiratory-rate estimation across diverse real-world conditions?
RQ2: How can adaptive signal-processing strategies, such as MAD-based breathing-phase detection and IBI validation, improve robustness against facial movement, off-axis orientation, and varying thermal contrast?
RQ3: Is the proposed approach computationally lightweight enough to operate in real time on an embedded edge device without compromising accuracy?
These questions motivate the system design and experimental evaluation presented in the remainder of this paper.
The remainder of this paper is structured as follows: related work is reviewed in Section 2; Section 3 describes the proposed methodology, including data acquisition and model architecture; Section 4 presents experimental results; Section 5 discusses system performance and deployment feasibility; and Section 6 concludes the paper with a summary and directions for future work.

2. Related Work

Among the human vital signs, respiratory rate (RR) is widely recognized as a critical indicator of physiological stability. For RR estimation, thermal imaging has emerged as a promising non-contact and privacy-preserving modality. Unlike contact-based methods that require direct attachment to the body, thermal cameras detect temperature variations caused by airflow during inhalation and exhalation, typically around the nostrils or mouth. These thermal fluctuations form a natural and unobtrusive signal source for respiratory analysis, particularly suitable for continuous monitoring in both clinical and home care settings [33,34,35].
A variety of methods have been proposed to extract respiratory signals from thermal video data. Earlier approaches relied on manual region-of-interest (ROI) selection and simple pixel averaging, which offered limited robustness under motion or occlusion. More recent developments introduced computer vision and deep learning techniques for automated ROI localization, including three-dimensional convolutional neural networks, detection transformers, and single-shot detectors such as YOLO and SSD [26,27,36]. Facial landmark-based approaches have also been used to improve stability under head motion or partial occlusion [37,38]. Other studies align RGB landmarks with thermal images [22] or extract thermal-motion data to detect breathing regions even when facial features are obscured by masks or bedding [23,39,40]. However, small-object detection of the nostril region in low-resolution thermal frames remains challenging.
Once the ROI is detected, respiratory signals are typically obtained by tracking temperature variations over time. To enhance signal quality, various filtering methods such as Butterworth, Hampel, and Savitzky–Golay have been employed, along with adaptive decomposition techniques like the Hilbert–Huang Transform [36,41,42]. Other preprocessing strategies, including histogram equalization, optimal quantization, and super-resolution, have been applied to compensate for the low resolution of compact thermal cameras [43]. Moreover, some works directly apply deep models to thermal sequences, learning temporal breathing patterns without explicit ROI tracking [19,26]. Many studies estimate RR using dominant spectral components via Fourier, synchro-squeezed, or autocorrelation analysis [27,34,42], but frequency-domain approaches can be sensitive to noise, motion artifacts, and ambient temperature drift. Few works incorporate explicit flicker suppression or median-absolute-deviation-based adaptive thresholds in the breathing-phase logic to stabilize the signal during head movement.
Beyond algorithmic advances, several recent works emphasize that embedded edge computing has become a key requirement for biomedical sensing systems [28,29]. On-device processing reduces communication overhead, enhances data security, and enables deployment in resource-constrained settings where continuous cloud connectivity is impractical [30,31,32]. Despite the increasing adoption of edge-based architectures, existing thermal RR methods seldom address the computational constraints of embedded hardware, with many relying on high-resolution cameras or offline deep models. This gap further motivates the development of a lightweight and fully embedded respiratory-rate monitoring pipeline.
Based on the gaps outlined above, this paper proposes an adaptive, real-time thermal respiratory monitoring system for embedded deployment. The contributions comprise a thermal-specific YOLO-based detector for nostril localization, a detector-stage frame-skipping scheme with Kalman prediction to halve detection frequency while preserving ROI continuity, and an adaptive MAD–hysteresis phase detection framework with flicker suppression for motion-robust, physiologically consistent respiratory-rate estimation, all validated on-device under privacy-preserving constraints.

3. Materials and Methods

3.1. System Overview

The proposed system converts raw thermal video frames into RR estimates through a sequence of tightly coupled modules, as illustrated in Figure 1. Thermal frames are first captured by a low-resolution thermal camera that provides two synchronous outputs: a thermal image frame and a thermal data stream. A YOLO-based detector localizes the nostril on the thermal image every second frame, and the resulting ROI is projected onto the thermal mapping image; between detections, a Kalman prediction updates the ROI directly in thermal coordinates. The thermal data are decoded into per-pixel temperatures to form a calibrated temperature map; the tracked bounding box crops this map, and the coldest pixel temperature per frame serves as a one-dimensional airflow-related signal. This signal is band-pass filtered in the 0.08–0.7 Hz range with a fourth-order zero-phase Butterworth filter and analyzed in the time domain using velocity estimates with median-absolute-deviation (MAD)-derived thresholds. An adaptive hysteresis state machine with a minimum dwell of 0.15 s produces inhale, exhale, and hold phases. Phase transitions yield inter-breath intervals (IBI) that are validated within a physiological range and converted to breaths per minute (BPM), then stabilized by short weighted averaging and exponential moving averaging.

3.2. Thermal Camera Acquisition

The initial frame captured from the thermal camera carries both image and thermal information; splitting the two streams is therefore the first step in processing the acquisition data. The process of thermal acquisition is illustrated in Figure 2. The raw frame can be expressed as
$$I_{\mathrm{raw}} \in \mathbb{Z}_{256}^{H \times 2W},$$
where $H$ and $W$ represent the height and width of a single modality, and
$$\mathbb{Z}_{256} = \{0, 1, \ldots, 255\}$$
denotes the set of 8-bit integer values corresponding to raw pixel intensities. The raw frame is divided into two separate streams:
$$I_{\mathrm{raw}} \xrightarrow{\text{split}} \left\{\, I_{\mathrm{img}} \in \mathbb{Z}_{256}^{H \times W},\; D_{\mathrm{th}} \in \mathbb{Z}_{256}^{H \times W \times 2} \,\right\},$$
with $I_{\mathrm{img}}$ denoting the YUV image data and $D_{\mathrm{th}}$ denoting the paired bytes of thermal information. Each stream is subsequently processed within the same acquisition cycle but along independent pathways, as presented in Figure 2: the $I_{\mathrm{img}}$ branch is converted into a color-mapped thermal image for visualization, whereas the $D_{\mathrm{th}}$ branch is decoded into pixel-wise temperature values, creating the quantitative temperature map corresponding to each thermal frame.
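As a concrete illustration of this splitting step, the following minimal Python sketch separates the two streams. The concatenation axis is an assumption here, since capture pipelines differ in how the two halves are packed:

```python
import numpy as np

def split_raw_frame(raw: np.ndarray):
    """Split a raw frame into its image and thermal streams.

    A minimal sketch assuming the two modalities are concatenated along the
    width axis (shape H x 2W), matching the equation above; some capture
    pipelines stack them along the height axis instead, in which case the
    split axis changes accordingly.
    """
    w = raw.shape[1] // 2      # width of a single modality
    i_img = raw[:, :w]         # YUV image half, used for visualization
    d_th = raw[:, w:]          # thermal byte half, decoded into temperatures
    return i_img, d_th
```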

3.2.1. Image Data Processing

The image stream $I_{\mathrm{img}}$ undergoes a sequence of preprocessing operations to produce a heatmap suitable for visualization. The raw frames, initially captured in YUV format, are first converted into an RGB representation,
$$I_{\mathrm{RGB}} = \mathrm{YUV2RGB}(I_{\mathrm{img}}),$$
where $I_{\mathrm{RGB}}$ denotes the color image obtained from $I_{\mathrm{img}}$. To enhance visual clarity, a linear contrast adjustment is then applied, expressed as
$$I'_{\mathrm{RGB}} = \alpha \cdot I_{\mathrm{RGB}}, \qquad \alpha = 1.0,$$
where $\alpha$ denotes the contrast scaling factor. In this work, $\alpha$ is fixed to unity, implying no additional scaling beyond the raw dynamic range.
Subsequently, the frame is spatially upscaled by bicubic interpolation:
$$I''_{\mathrm{RGB}} = \mathrm{Resize}(I'_{\mathrm{RGB}}, W', H'), \qquad W' = 3W,\; H' = 3H,$$
where $\mathrm{Resize}(\cdot)$ denotes the interpolation operator and $(W', H')$ are the target spatial dimensions, set to three times the original resolution $(W, H)$. Finally, the enhanced frame is mapped into a false-color domain for visualization through
$$I_{\mathrm{heatmap}} = \mathrm{ColorMap}(I''_{\mathrm{RGB}}),$$
where $\mathrm{ColorMap}(\cdot)$ maps the intensity distribution of $I''_{\mathrm{RGB}}$ into a perceptually enhanced heatmap representation for display.
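This visualization branch maps directly onto a few OpenCV calls. The sketch below follows the equations above; the YUYV packing, the JET palette, and the use of `convertScaleAbs` are assumptions, as the exact YUV variant and colormap are not specified:

```python
import cv2
import numpy as np

def image_branch(i_img: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Visualization branch: YUV -> RGB, linear contrast scaling,
    3x bicubic upscaling, and false-color mapping.

    A sketch assuming i_img is packed YUYV with shape (H, W, 2).
    """
    # YUV -> RGB conversion (conversion code is an assumption)
    rgb = cv2.cvtColor(i_img, cv2.COLOR_YUV2RGB_YUYV)
    # Linear contrast adjustment; alpha = 1.0 leaves the dynamic range unchanged
    rgb = cv2.convertScaleAbs(rgb, alpha=alpha)
    # Bicubic upscaling to (W', H') = (3W, 3H)
    h, w = rgb.shape[:2]
    rgb = cv2.resize(rgb, (3 * w, 3 * h), interpolation=cv2.INTER_CUBIC)
    # Map intensity to a false-color heatmap for display
    gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
    return cv2.applyColorMap(gray, cv2.COLORMAP_JET)
```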

3.2.2. Thermal Data Processing

In the thermal data processing stage, pixel-wise temperatures are decoded from the paired bytes. Each thermal frame stores temperature values in two consecutive 8-bit values corresponding to the most significant byte (MSB) and least significant byte (LSB), which are combined and linearly calibrated to form the quantitative temperature map shown in Figure 3. For each pixel coordinate $(x, y)$ with
$$x \in \{0, \ldots, W-1\}, \qquad y \in \{0, \ldots, H-1\},$$
let $H(x,y)$ and $L(x,y)$ be the MSB and LSB, respectively, i.e., $H, L \in \mathbb{Z}_{256}$. The raw temperature proxy is reconstructed as
$$R(x,y) = \frac{256\,H(x,y) + L(x,y)}{64} - 273.15.$$
The combined value $256\,H(x,y) + L(x,y)$ represents raw temperature data encoded in Kelvin × 64 according to the manufacturer's format. Dividing by 64 converts this to Kelvin, and subtracting 273.15 yields the temperature in degrees Celsius, giving the raw temperature map $R(x,y)$. A linear calibration model is subsequently applied to compensate for sensor bias:
$$T(x,y) = \alpha\, R(x,y) + \beta,$$
where $\alpha$ is the calibration gain (scaling factor) and $\beta$ is the calibration offset (bias in °C). The calibrated value $T(x,y)$ corresponds to the corrected temperature at pixel $(x,y)$. The maximum temperature within a frame is then localized as
$$(x^*, y^*) = \arg\max_{(x,y)} T(x,y), \qquad T_{\max} = T(x^*, y^*),$$
where $(x^*, y^*)$ indicates the pixel coordinates of the hottest point in the frame and $T_{\max}$ is its corresponding temperature. For visualization, the calibrated temperature field is interpolated to generate a thermal map:
$$T_{\mathrm{map}} = \mathrm{Resize}\big(T(x,y), W', H'\big),$$
where $\mathrm{Resize}(\cdot, W', H')$ denotes a spatial interpolation operator mapping the original temperature matrix of size $H \times W$ to a new resolution $H' \times W'$ for display purposes. The resulting $T_{\mathrm{map}}$ provides the visualization branch of the processing pipeline.
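A compact decoding sketch follows, assuming `d_th` carries the MSB in channel 0 and the LSB in channel 1 (the actual byte order follows the manufacturer's format) and using illustrative calibration constants:

```python
import numpy as np

def decode_temperature(d_th: np.ndarray,
                       gain: float = 1.0, offset: float = 0.0) -> np.ndarray:
    """Decode paired MSB/LSB thermal bytes into a calibrated Celsius map.

    A sketch assuming d_th has shape (H, W, 2) with channel 0 = MSB and
    channel 1 = LSB; gain/offset are placeholders for the calibration.
    """
    msb = d_th[..., 0].astype(np.float32)
    lsb = d_th[..., 1].astype(np.float32)
    # (256*MSB + LSB) encodes Kelvin x 64; divide by 64, shift to Celsius
    r = (256.0 * msb + lsb) / 64.0 - 273.15
    # Linear calibration T = gain * R + offset to compensate sensor bias
    return gain * r + offset

# Hottest pixel in the frame and its temperature:
# t = decode_temperature(d_th)
# y_star, x_star = np.unravel_index(np.argmax(t), t.shape)
# t_max = t[y_star, x_star]
```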

3.3. ROI Localization and Temperature Feature Extraction

3.3.1. YOLO-Based Nostril Detection

Respiratory monitoring based on wearable sensors is often limited by discomfort, movement artifacts, and the need for frequent recalibration [44,45]. RGB video methods offer a non-contact alternative but remain highly sensitive to illumination changes, motion artifacts, and computational overhead [46,47,48]. Thermal imaging provides a more suitable modality because it is independent of lighting conditions and relatively robust to minor head movements, making it advantageous for continuous monitoring. Yet the low resolution and limited texture of thermal frames make nostril localization challenging. To overcome this difficulty, a YOLO-based detection model was adopted, as illustrated in Figure 4. YOLO was chosen for its balance between detection accuracy and computational efficiency, which makes it suitable for real-time deployment on embedded edge hardware. Unlike RGB data, thermal frames contain only coarse temperature gradients and lack color cues, which reduces the effectiveness of standard feature extraction; accordingly, architectural and training adaptations were required.
In this work, YOLOv8n was selected as the object detection framework for its computational efficiency and suitability for real-time thermal imaging. The lightweight model architecture enables effective feature extraction while maintaining low computational cost, which is appropriate for the relatively simple thermal domain, where nostril regions are primarily defined by local hot-cold gradients rather than complex textures. The detection head maintains high-resolution feature maps, improving the sensitivity of the model to small ROIs that occupy less than five percent of the thermal frame. In parallel, the training procedure was tailored to the thermal modality. A dataset of 7958 annotated thermal images (7113 training, 563 validation, 282 testing) was assembled, incorporating variations in head orientation, distance, and partial occlusion. Data augmentation strategies avoided color-based transformations and instead emphasized brightness and contrast adjustments, Gaussian noise injection, and mild geometric perturbations to reproduce sensor variability and natural subject motion.
For deployment, the trained detector is executed using the Ultralytics YOLO runtime on the embedded device, without reliance on external deep-learning frameworks or GPU acceleration. To reduce computational load, detection is performed on every even frame, and a Kalman filter predicts the ROI on intermediate frames. With frame index $k \in \mathbb{Z}_{\ge 0}$ and camera rate $f_{\mathrm{cam}}$ (Hz), the detection schedule is
$$\mathcal{S}_{\mathrm{det}} = \{\, k \mid k \bmod 2 = 0 \,\} \qquad (\text{stride } s = 2),$$
which yields the effective detection rate
$$f_{\mathrm{det}} = \frac{f_{\mathrm{cam}}}{s} = \frac{f_{\mathrm{cam}}}{2}.$$
The ROI used at frame $k$ is obtained from the detector when $k \in \mathcal{S}_{\mathrm{det}}$ (and a detection exists) and from the Kalman prediction otherwise:
$$\mathrm{ROI}_k = \begin{cases} \mathrm{ROI}_k^{\mathrm{det}}, & k \in \mathcal{S}_{\mathrm{det}} \text{ and a detection exists}, \\ \mathrm{ROI}_k^{\mathrm{trk}}, & \text{otherwise}. \end{cases}$$
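This scheduling logic reduces to a few lines per frame. In the sketch below, `detector` and `tracker` are hypothetical wrappers around the trained YOLOv8n model and the Kalman filter of Section 3.3.2:

```python
def roi_for_frame(k, detector, tracker, frame, stride=2):
    """Detector-centric frame skipping: run YOLO on scheduled frames only
    and fall back to the Kalman prediction elsewhere.

    A sketch; `detector` and `tracker` are hypothetical wrappers.
    """
    tracker.predict()                    # advance the motion model every frame
    if k % stride == 0:                  # scheduled detection frame
        box = detector.detect(frame)     # (x1, y1, x2, y2) or None
        if box is not None:
            tracker.update(box)          # correct the prediction with the measurement
    return tracker.current_box()         # ROI from detection or prediction
```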

3.3.2. Kalman Filter Tracking

Since the YOLO-based detector generates nostril detections only on even frames, the Kalman filter predicts the ROI on odd frames and whenever an even-frame detection is unavailable. To provide stable localization of the nostril region, an eight-dimensional Kalman filter jointly estimates the bounding-box position and its temporal dynamics. The state vector is defined as
$$\mathbf{x}_k = \left[\, c_x,\, c_y,\, w,\, h,\, \dot{c}_x,\, \dot{c}_y,\, \dot{w},\, \dot{h} \,\right]^{\!\top},$$
where $(c_x, c_y)$ are the bounding-box center coordinates (pixels), $(w, h)$ its width and height (pixels), and the dotted variables the corresponding temporal velocities.
The temporal evolution of the state follows a discrete constant-velocity model:
$$\mathbf{x}_k = F\,\mathbf{x}_{k-1} + \mathbf{w}_{k-1}, \qquad \mathbf{w}_{k-1} \sim \mathcal{N}(0, Q),$$
where $F$ is the $8 \times 8$ transition matrix, $\mathbf{w}_{k-1}$ the process noise, and $Q$ its covariance. Using the frame interval $\Delta t = 1 / f_{\mathrm{cam}}$, the transition matrix is
$$F = \begin{bmatrix} I_4 & \Delta t\, I_4 \\ 0_4 & I_4 \end{bmatrix},$$
with $I_4$ the $4 \times 4$ identity matrix and $0_4$ the $4 \times 4$ zero matrix. The choice of the constant-velocity state-space model in Equations (15) and (16) follows standard formulations widely used in visual object tracking, as it provides a minimal yet sufficiently expressive representation of bounding-box motion [49,50]. The transition matrix $F$ is block-structured with identity and $\Delta t\, I_4$ submatrices, so all of its eigenvalues equal 1. Consequently, the discrete-time system is marginally stable, as expected for constant-velocity motion models; the continuous-time Hurwitz condition does not directly apply in this setting. Since the Kalman filter is employed solely for state estimation rather than control, controllability is not required. The pair $(F, H)$ is observable, as the associated observability matrix has full rank for any $\Delta t > 0$, ensuring that all components of the state vector, including position, size, and their velocities, are inferable from the detector measurements.
At each detection frame, the YOLO-based detector produces a bounding box with corners $(x_1, y_1)$ (top-left) and $(x_2, y_2)$ (bottom-right). This is converted into the measurement vector
$$\mathbf{z}_k = \left[\, \frac{x_1 + x_2}{2},\; \frac{y_1 + y_2}{2},\; x_2 - x_1,\; y_2 - y_1 \,\right]^{\!\top},$$
which contains the observed center position and box dimensions. The measurement model is
$$\mathbf{z}_k = H\,\mathbf{x}_k + \mathbf{v}_k, \qquad \mathbf{v}_k \sim \mathcal{N}(0, R_k), \qquad H = \begin{bmatrix} I_4 & 0_{4 \times 4} \end{bmatrix}.$$
Let $d_k \in \{0, 1\}$ indicate whether a detector output exists at frame $k$ (on scheduled even frames). The measurement-use indicator is
$$m_k = \begin{cases} 1, & \text{if } k \bmod 2 = 0 \text{ and } d_k = 1, \\ 0, & \text{otherwise}, \end{cases}$$
and a fixed measurement covariance is used, $R_k \equiv R$.
The recursion runs on every frame. Prediction:
$$\hat{\mathbf{x}}_{k|k-1} = F\,\hat{\mathbf{x}}_{k-1}, \qquad P_{k|k-1} = F P_{k-1} F^{\top} + Q.$$
Update (only if $m_k = 1$):
$$\mathbf{y}_k = \mathbf{z}_k - H\,\hat{\mathbf{x}}_{k|k-1}, \qquad S_k = H P_{k|k-1} H^{\top} + R_k,$$
$$K_k = P_{k|k-1} H^{\top} S_k^{-1}, \qquad \hat{\mathbf{x}}_k = \hat{\mathbf{x}}_{k|k-1} + K_k \mathbf{y}_k,$$
$$P_k = \left( I - K_k H \right) P_{k|k-1}.$$
If $m_k = 0$, the prediction becomes the current estimate. The output box is reconstructed as
$$x_1 = c_x - \frac{w}{2}, \qquad y_1 = c_y - \frac{h}{2}, \qquad x_2 = c_x + \frac{w}{2}, \qquad y_2 = c_y + \frac{h}{2}.$$
This even–odd schedule halves detector invocations, provides ROI estimates for skipped frames via Kalman prediction, and preserves temporal continuity under short dropouts, head motion, or partial occlusion.
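A self-contained NumPy sketch of this constant-velocity filter is given below, matching the interface assumed in the scheduling sketch of Section 3.3.1. The noise covariances `Q` and `R` are illustrative placeholders rather than the tuned values used in the deployed system:

```python
import numpy as np

class BoxKalman:
    """Constant-velocity Kalman filter over [cx, cy, w, h] and velocities."""

    def __init__(self, dt: float, q: float = 1e-2, r: float = 1.0):
        self.F = np.eye(8)
        self.F[:4, 4:] = dt * np.eye(4)           # block [[I, dt*I], [0, I]]
        self.H = np.hstack([np.eye(4), np.zeros((4, 4))])
        self.Q = q * np.eye(8)                    # placeholder process noise
        self.R = r * np.eye(4)                    # placeholder measurement noise
        self.x = np.zeros(8)
        self.P = np.eye(8)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, box):
        x1, y1, x2, y2 = box                      # corners -> measurement vector
        z = np.array([(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1])
        y = z - self.H @ self.x                   # innovation
        S = self.H @ self.P @ self.H.T + self.R   # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(8) - K @ self.H) @ self.P

    def current_box(self):
        cx, cy, w, h = self.x[:4]
        return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```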

3.3.3. Temperature Extraction

Once the nostril ROI is localized by the detection–tracking pipeline, the calibrated thermal image at frame $k$ is treated as a discrete grid $T_k[x, y]$ (temperature in °C at pixel $(x, y)$). The ROI is the integer-indexed set
$$\mathrm{ROI}_k = \left\{\, (x, y) \in \mathbb{Z}^2 \;\middle|\; x_1(k) \le x \le x_2(k),\; y_1(k) \le y \le y_2(k) \,\right\},$$
where $(x_1(k), y_1(k))$ and $(x_2(k), y_2(k))$ are the top-left and bottom-right corners of the bounding box at frame $k$. The representative nostril temperature for frame $k$ is the minimum within the ROI,
$$\hat{T}_k = \min_{(x, y) \in \mathrm{ROI}_k} T_k[x, y],$$
and the pixel attaining this minimum is recorded as
$$p_k^{*} = \arg\min_{(x, y) \in \mathrm{ROI}_k} T_k[x, y].$$
Selecting the coldest pixel yields a physiologically consistent proxy of airflow, as inhalation introduces cooler ambient air whereas exhalation releases warmer expired air. Across frames, the extracted temperatures form the scalar sequence
$$\mathcal{S} = \{\, \hat{T}_k \,\}_{k=0}^{N-1},$$
where $N$ is the number of samples retained in the observation buffer. A sliding buffer of approximately 20 s (i.e., $N \approx 20 f_{\mathrm{cam}}$) preserves multiple respiratory cycles while maintaining responsiveness for real-time monitoring. This raw nostril-temperature signal is then used for band-pass filtering, phase detection, and respiratory-rate estimation. Figure 5 illustrates the extraction result: panel (a) shows the cropped thermal ROI with the coldest-temperature pixel at each frame, and panel (b) shows the resulting raw sequence $\mathcal{S}$, whose cyclic oscillations align with inhalation (cooling) and exhalation (warming) events.
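In code, the per-frame extraction and the sliding buffer amount to the following sketch; the 25 Hz frame rate is assumed for illustration only:

```python
from collections import deque
import numpy as np

def extract_nostril_sample(t_map: np.ndarray, box) -> float:
    """Return the coldest ROI temperature (airflow proxy) for one frame."""
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    roi = t_map[y1:y2 + 1, x1:x2 + 1]
    return float(roi.min())          # inhalation cools, exhalation warms

# Sliding ~20 s observation buffer (N ~ 20 * f_cam); f_cam is an assumption
f_cam = 25
buffer = deque(maxlen=20 * f_cam)
# per frame: buffer.append(extract_nostril_sample(t_map, roi_box))
```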

3.4. Adaptive Breathing Phase Detection and Respiratory Rate Calculation

3.4.1. Adaptive Breathing Phase Detection

The breathing-phase pipeline starts from the illustration in Figure 6: inhalation cools the nostril surface, whereas exhalation warms it. After ROI extraction, each frame $k$ provides a scalar sample $\hat{T}[k]$ (in °C). With camera frame rate $f_{\mathrm{cam}}$ (Hz), samples occur at $t_k = k / f_{\mathrm{cam}}$; equivalently,
$$\hat{T}[k] = \hat{T}(t_k) + \varepsilon_k,$$
where $\varepsilon_k$ denotes measurement noise. The processing that follows operates on the discrete sequence $\{\hat{T}[k]\}$. The raw sequence shows alternating cooling and warming, but is also affected by drift and noise. To describe its expected structure and guide filter design, it is convenient to write the quasi-periodic model
$$\hat{T}(t) = T_0 + A \sin\!\left( 2\pi f_{\mathrm{RR}}\, t + \phi \right) + d_{\mathrm{low}}(t) + n_{\mathrm{high}}(t),$$
where $T_0$ is the baseline temperature, $A$ the oscillation amplitude, $f_{\mathrm{RR}}$ the respiratory frequency (Hz), $\phi$ the phase, $d_{\mathrm{low}}(t)$ a slow drift term, and $n_{\mathrm{high}}(t)$ high-frequency noise. This model is illustrative; the digital operations below use the sampled signal $\hat{T}[k]$.
Since respiration lies in the 0.08–0.7 Hz band, a 4th-order Butterworth band-pass filter with cutoffs at 0.08 and 0.7 Hz is applied. The filter is implemented with forward–backward recursion to ensure a zero-phase response, thereby preserving the temporal integrity of breathing cycles. The band-pass output satisfies the standard IIR difference equation
$$\hat{T}_{\mathrm{bp}}[n] = \sum_{m=0}^{M} b_m\, \hat{T}[n-m] - \sum_{j=1}^{N} a_j\, \hat{T}_{\mathrm{bp}}[n-j],$$
where $b_m$ and $a_j$ are the Butterworth coefficients. This passband covers 5–42 BPM and suppresses baseline drift and high-frequency noise, preserving breath timing.
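A standard SciPy realization of this zero-phase band-pass stage is sketched below; note that forward–backward filtering doubles the effective attenuation of the designed fourth-order prototype:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_breathing(signal: np.ndarray, f_cam: float) -> np.ndarray:
    """Zero-phase 4th-order Butterworth band-pass over 0.08-0.7 Hz.

    filtfilt applies the filter forward and backward, cancelling phase
    distortion so breath timing is preserved.
    """
    nyq = f_cam / 2.0
    b, a = butter(N=4, Wn=[0.08 / nyq, 0.7 / nyq], btype="bandpass")
    return filtfilt(b, a, signal)
```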
Building on the zero-phase band-passed output $\hat{T}_{\mathrm{bp}}[n]$, the local heating/cooling trend is quantified using a rectangular moving-average window of length $W$. The discrete velocity surrogate is defined as the difference between two adjacent moving averages:
$$v[n] = \left( r_W * \hat{T}_{\mathrm{bp}} \right)[n] - \left( r_W * \hat{T}_{\mathrm{bp}} \right)[n - W],$$
where $*$ denotes discrete convolution and $r_W[k]$ is the rectangular (uniform) kernel
$$r_W[k] = \begin{cases} \dfrac{1}{W}, & 0 \le k \le W - 1, \\[4pt] 0, & \text{otherwise}. \end{cases}$$
An equivalent summation form is
$$v[n] = \frac{1}{W} \sum_{k=0}^{W-1} \hat{T}_{\mathrm{bp}}[n-k] \;-\; \frac{1}{W} \sum_{k=W}^{2W-1} \hat{T}_{\mathrm{bp}}[n-k].$$
Under the zero-phase design, the sign of $v[n]$ aligns with physiology:
$$v[n] > 0 \;\Rightarrow\; \text{exhalation (warming)}, \qquad v[n] < 0 \;\Rightarrow\; \text{inhalation (cooling)}.$$
In implementation, a small window (e.g., $W = 3$ samples) attenuates frame-to-frame jitter while preserving phase timing within the 5–42 BPM operating range.
To normalize across subjects and amplitudes, an adaptive, data-dependent threshold is derived from the most recent $L$ velocity samples. Let the length-$L$ window be
$$\mathcal{V}_n \triangleq \{\, v[n-i] \mid i = 0, 1, \ldots, L-1 \,\}.$$
The location statistic and dispersion are defined by the sample median and the median absolute deviation (MAD):
$$m_v[n] = \mathrm{median}\!\left( \mathcal{V}_n \right),$$
$$\mathrm{MAD}_v[n] = \operatorname*{median}_{i \in \{0, \ldots, L-1\}} \left| v[n-i] - m_v[n] \right|.$$
A symmetric, scale-invariant threshold is then
$$\theta[n] = \alpha \cdot \mathrm{MAD}_v[n] + \varepsilon,$$
with sensitivity coefficient $\alpha > 0$ (e.g., $\alpha = 0.6$), a short history length $L$ (e.g., 15–25 samples), and a small $\varepsilon > 0$ to avoid degeneracy when variability is minimal. The threshold is applied symmetrically around zero to map velocity to the physiological phase:
$$v[n] \ge \theta[n] \;\Rightarrow\; \text{exhalation (warming)}, \qquad v[n] \le -\theta[n] \;\Rightarrow\; \text{inhalation (cooling)}.$$
Optionally, a minimum dwell converts threshold crossings into stable segments; with sampling frequency $f_{\mathrm{cam}}$, the dwell length in samples is
$$N_{\min} = \left\lceil t_{\min}\, f_{\mathrm{cam}} \right\rceil,$$
ensuring physiologically plausible durations within the 5–42 BPM operating range.
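The velocity surrogate and the MAD-based threshold reduce to a few NumPy operations, as in the sketch below, which follows the constants quoted above ($W = 3$, $\alpha = 0.6$); the `eps` floor value is an illustrative choice:

```python
import numpy as np

def velocity_surrogate(t_bp: np.ndarray, w: int = 3) -> np.ndarray:
    """Difference of two adjacent length-w causal moving averages of the
    band-passed signal; positive values indicate warming (exhalation)."""
    kernel = np.ones(w) / w
    ma = np.convolve(t_bp, kernel, mode="full")[: len(t_bp)]
    v = np.empty_like(ma)
    v[:w] = 0.0                       # not enough history yet
    v[w:] = ma[w:] - ma[:-w]          # ma[n] - ma[n - w]
    return v

def mad_threshold(v_window: np.ndarray, alpha: float = 0.6,
                  eps: float = 1e-4) -> float:
    """Adaptive threshold theta = alpha * MAD + eps over the last L samples."""
    m = np.median(v_window)
    mad = np.median(np.abs(v_window - m))
    return alpha * mad + eps
```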
To convert the thresholded velocity into stable phase labels, a hysteretic state machine with a minimum dwell is employed. Let the instantaneous phase label be
$$s[n] \in \{-1, 0, +1\},$$
representing inhalation ($-1$), neutral/hold ($0$), and exhalation ($+1$). The dwell requirement, expressed in samples in Equation (43), ensures physiologically plausible segment durations within the 5–42 BPM operating range. Transitions are gated by the elapsed persistence of the current state. Let $\tau[n-1]$ denote the number of consecutive samples that state $s[n-1]$ has persisted up to time $n-1$. The phase update is
$$s[n] = \begin{cases} +1, & \text{if } v[n] \ge \theta[n] \,\wedge\, \tau[n-1] \ge N_{\min}, \\ -1, & \text{if } v[n] \le -\theta[n] \,\wedge\, \tau[n-1] \ge N_{\min}, \\ s[n-1], & \text{otherwise}. \end{cases}$$
The persistence counter is updated recursively as
$$\tau[n] = \begin{cases} 0, & \text{if } s[n] \ne s[n-1], \\ \tau[n-1] + 1, & \text{if } s[n] = s[n-1]. \end{cases}$$
Optionally, brief near-threshold fluctuations may be represented as a neutral state to emphasize ambiguity around zero velocity:
$$s[n] \leftarrow 0 \quad \text{if } |v[n]| < \theta[n] \qquad (\text{optional}).$$
This hysteretic formulation suppresses chatter from transient perturbations while preserving accurate timing of inhalation and exhalation transitions.
Following the hysteretic phase labeling, brief flicker patterns of the form A–B–A are suppressed by merging the short middle segment into its flanking phase. Here $A, B \in \{-1, 0, +1\}$ denote the phase labels in Equation (44). Let $\{n_i\}_{i \ge 0}$ denote the ordered change-points of $s[n]$,
$$n_0 = 0, \qquad n_{i+1} = \min\{\, n > n_i \mid s[n] \ne s[n-1] \,\},$$
and define the $i$-th segment state and duration by
$$q_i \triangleq s[n] \;\; \text{for all } n \in [n_i, n_{i+1} - 1], \qquad L_i \triangleq n_{i+1} - n_i.$$
With sampling frequency $f_{\mathrm{cam}}$, the sample-based consolidation threshold is
$$N_c = \left\lceil \tau_c\, f_{\mathrm{cam}} \right\rceil,$$
where $\tau_c$ is a short duration (e.g., $\tau_c = 0.3$ s) chosen to reject physiologically implausible micro-segments within 5–42 BPM.
The consolidation rule replaces the short intermediate phase $B$ (i.e., $q_i$) by its flanking phase $A$ (i.e., $q_{i-1} = q_{i+1}$) whenever an A–B–A pattern occurs and the intermediate duration is below $N_c$:
$$\text{if } q_{i-1} = q_{i+1} \,\wedge\, q_i \ne q_{i-1} \,\wedge\, L_i < N_c, \;\text{ then } s[n] \leftarrow q_{i-1} \;\; \forall\, n \in [n_i, n_{i+1} - 1].$$
This procedure can be applied iteratively over $\{(q_{i-1}, q_i, q_{i+1})\}$ until no violations remain, yielding a piecewise-constant phase trace without short-lived toggles and enabling stable inter-breath-interval and respiratory-rate estimation.
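The state machine and the A–B–A consolidation translate directly into the following sketch, where `segments` is a run-length encoding of the phase trace as (label, length) pairs; the function boundaries are illustrative:

```python
def update_phase(v_n, theta_n, state, dwell, n_min):
    """One step of the hysteretic phase machine: +1 exhale, -1 inhale.

    Transitions are only allowed once the current state has persisted
    for at least n_min samples; dwell is the persistence counter tau.
    """
    new_state = state
    if dwell >= n_min:
        if v_n >= theta_n:
            new_state = +1
        elif v_n <= -theta_n:
            new_state = -1
    dwell = 0 if new_state != state else dwell + 1
    return new_state, dwell

def consolidate(segments, n_c):
    """Merge short A-B-A flickers into a single A segment."""
    i = 1
    while i < len(segments) - 1:
        (qa, la), (qb, lb), (qc, lc) = segments[i - 1], segments[i], segments[i + 1]
        if qa == qc and qb != qa and lb < n_c:
            segments[i - 1] = (qa, la + lb + lc)   # absorb B and the right flank
            del segments[i:i + 2]
            i = max(i - 1, 1)                      # re-check around the merge
        else:
            i += 1
    return segments
```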

3.4.2. Respiratory Rate Calculation

Respiratory rate is derived from the consolidated phase labels $s[n]$ described previously. Figure 7 illustrates the outputs used here: the raw nostril-temperature sequence (green), its smoothed version (red), and the phase trace (orange) that marks inhalation, neutral/hold, and exhalation segments. Inter-breath intervals (IBI) are computed from consecutive phase transitions of a chosen event type (exhalation onsets in this implementation).
Let $f_{\mathrm{cam}}$ be the camera frame rate and define the exhalation-onset event set
$$\mathcal{E} \triangleq \{\, k \mid s[k-1] \ne +1 \,\wedge\, s[k] = +1 \,\}.$$
For consecutive events $k_{i-1}, k_i \in \mathcal{E}$ with $k_i > k_{i-1}$, the IBI (seconds) is
$$\Delta t_i = \frac{k_i - k_{i-1}}{f_{\mathrm{cam}}}.$$
Physiological plausibility is enforced consistently with the 5–42 BPM operating band by accepting only
$$\Delta t_{\min} \le \Delta t_i \le \Delta t_{\max}, \qquad \Delta t_{\min} = \frac{60}{42}\ \mathrm{s}, \qquad \Delta t_{\max} = \frac{60}{5}\ \mathrm{s}.$$
Each validated interval yields an instantaneous respiratory rate
$$RR_i = \frac{60}{\Delta t_i} \quad (\mathrm{BPM}).$$
To obtain a stable yet responsive trace as in Figure 7, two lightweight smoothers are applied sequentially: a causal weighted update
$$\widetilde{RR}_i = 0.6\, RR_i + 0.4\, \widetilde{RR}_{i-1}, \qquad \widetilde{RR}_0 = RR_0,$$
followed by an exponential moving average (EMA) with coefficient $\alpha = 0.7$,
$$RR_{\mathrm{final},i} = \alpha\, \widetilde{RR}_i + (1 - \alpha)\, RR_{\mathrm{final},i-1}, \qquad RR_{\mathrm{final},0} = \widetilde{RR}_0.$$
This IBI → weighted-average → EMA pipeline yields a robust BPM estimate under motion, thermal drift, and noise while remaining suitable for real-time embedded execution. For real-time visualization, a PyQt5 window renders the thermal video feed with ROI detection overlays alongside the breathing-waveform panel, as well as the current breathing phase, respiratory rate in BPM, nostril temperature, ROI pixel area, and the system FPS, as illustrated in Figure 8.
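Putting the final stage together, a minimal sketch of the IBI validation and the two smoothing passes could look as follows; the function signature and state handling are illustrative, not the deployed implementation:

```python
def rr_from_events(event_frames, f_cam, rr_prev=None, rr_ema=None, alpha=0.7):
    """Instantaneous RR from the two most recent exhalation onsets, with
    the causal weighted update and the EMA smoothing stage.

    event_frames holds frame indices of exhalation onsets; returns the
    updated (weighted, EMA-smoothed) RR pair.
    """
    if len(event_frames) < 2:
        return rr_prev, rr_ema
    ibi = (event_frames[-1] - event_frames[-2]) / f_cam   # seconds
    if not (60 / 42 <= ibi <= 60 / 5):                    # 5-42 BPM gate
        return rr_prev, rr_ema                            # reject implausible IBI
    rr = 60.0 / ibi                                       # instantaneous BPM
    rr_w = rr if rr_prev is None else 0.6 * rr + 0.4 * rr_prev
    rr_s = rr_w if rr_ema is None else alpha * rr_w + (1 - alpha) * rr_ema
    return rr_w, rr_s
```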

4. Experimental Results

4.1. Hardware and Software Configuration

The respiratory rate monitoring system was implemented on a Jetson Orin Nano Developer Kit (NVIDIA Corporation, Santa Clara, CA, USA; 6-core ARM Cortex-A78AE CPU, 8 GB LPDDR5 RAM) running Ubuntu 20.04 LTS. The YOLO-based nostril detection and respiratory-signal processing algorithms were developed in Python 3.10 with OpenCV 4.x and executed directly on the embedded device. Thermal video streams were processed in real-time, with inference and signal analysis performed entirely on the edge device, without requiring cloud-based computation. The raw temperature data were continuously collected from the TOPDON TC001 thermal camera (Topdon Technology Co., Ltd., Shenzhen, China) with a resolution of 256 × 192 pixels and a lightweight design (30 g). The camera was connected directly to the Jetson Orin Nano for on-device processing, ensuring minimal latency and maintaining a non-intrusive measurement environment.
To quantify the computational complexity of the proposed system on embedded hardware, the end-to-end execution was profiled over a 60 s continuous run on the Jetson Orin Nano. The average per-frame processing time was 65.2 ms, with a 95th-percentile latency of 85.3 ms, indicating that 95% of frames were processed within this bound. This corresponds to a real-time throughput of 22.5 FPS, with a standard deviation of 1.8 FPS, reflecting stable runtime characteristics throughout the measurement interval. As shown in Table 1, YOLO-based nostril detection constituted the primary computational load (35.5 ms, 54.5%), followed by thermal capture latency (15.0 ms, 23.3%) and graph/GUI updates (10.8 ms, 16.6%). All remaining modules, including temperature extraction (2.8 ms), adaptive phase detection and signal processing (1.5 ms), Kalman tracking (1.2 ms), and IBI calculation (0.5 ms), each contributed less than 5% of the total per-frame cost. System resource monitoring further showed moderate utilization, with mean CPU usage of 42.5%, GPU usage of 68.2%, and memory consumption of 850 MB. These results confirm that the computational burden is lightweight and well within the real-time operating envelope of low-power embedded edge devices.

4.2. Respiratory Rate Experimental Procedures

A pilot study was conducted with ten healthy adults (N = 10, aged 33.3 ± 4.38 years) under institutional ethics approval and with written informed consent. Ground-truth RR was obtained by dual-rater manual counts from the recordings; for metronome-paced blocks, the target rate was logged as an auxiliary reference. To establish ground-truth RR, manual tally counting [51] was performed on the experimental video recordings. Each breathing cycle was visually identified by observing airflow from the nostril, and the total number of cycles within a predefined time window was recorded. The respiratory rate (RR) was then calculated as
$$RR = \frac{N_{\mathrm{breaths}}}{T} \times 60 \quad [\mathrm{bpm}],$$
where $N_{\mathrm{breaths}}$ denotes the number of observed breathing cycles and $T$ represents the duration of the observation in seconds. For example, if 22 breaths were observed during a 60 s video, the reference RR was 22 bpm.
The experimental protocol, summarized in Table 2 and carried out with the subject facing the thermal camera as illustrated in Figure 9, was designed not only to validate respiratory-rate (RR) estimation under controlled conditions but also to emulate scenarios relevant to long-lie incidents. In a long-lie situation, an individual may remain immobile for an extended duration in various postures or under partial occlusions, where reliable respiration monitoring becomes a key indicator of consciousness and vitality. The resting and paced-breathing sessions establish baseline accuracy across normal and rhythmic respiration patterns, forming the reference for physiological consistency. The robustness (soft speech) condition introduces mild facial motion to evaluate tolerance to articulation, representative of irregular speech or groaning that may occur before or after a fall. The distance and off-axis yaw conditions simulate variations in camera placement and subject orientation that would naturally arise when the person is lying at different angles or when the thermal sensor is mounted in a fixed overhead position. Finally, the posture (supine) recordings directly mimic a post-fall scenario, where the subject lies facing upward with minimal motion. Collectively, these conditions ensure that the proposed system is trained and validated under realistic variability, facilitating robust respiratory monitoring during long-lie detection.

4.3. Nostril Detection Performance

The nostril detection model was trained using a YOLO-based architecture, showing rapid and stable convergence, as evidenced by the steady decrease in box, classification, and distribution focal losses across both the training and validation sets, as illustrated in Figure 10. The evaluation metrics, including precision, recall, mAP@0.5, and mAP@0.5–0.95, exhibit consistent improvement and stabilization across epochs, confirming the robustness and generalization capability of the trained model for reliable nostril localization in thermal imagery.
The training curves demonstrate a steady increase in precision and recall, reaching over 99% within the first few epochs. Quantitative evaluation metrics in Figure 11 further confirm the model’s robustness, with the Precision–Recall (PR) curve showing an area under the curve (AUC) of 0.992, and the F1–Confidence curve peaking at 0.99. Both Precision–Confidence and Recall–Confidence curves indicate stable predictions across a wide confidence range, with optimal performance observed at a confidence threshold of approximately 0.86.
To complement the quantitative analysis, Figure 12 provides an enlarged view of the nostril detection output, making the detection label and bounding box clearly visible. Meanwhile, Figure 13 presents qualitative examples of the YOLO-based nostril detector applied to diverse thermal video frames. These examples demonstrate consistent and reliable nostril localization under various conditions, including different head poses, lighting variations, and partial occlusions. The trained model accurately identifies the nostril region across diverse thermal video frames, demonstrating robustness and stability for downstream respiratory rate estimation tasks.

4.4. Respiratory Rate Estimation Accuracy

Before estimating the respiratory rate following the procedures in Section 4.2, the nostril detector was first validated to ensure reliable localization across different subject poses, as illustrated in Figure 14, which shows thermal frames collected in alignment with the experimental setup in Table 2. The automatically detected nostril regions (green bounding boxes) are shown across a wide range of conditions, including resting, metronome-paced breathing, soft-speech influence, distance variation, off-axis head orientation, and posture changes. This variability ensures that the evaluation reflects realistic operating scenarios with differences in viewpoint, articulation, and body orientation.
Once the nostril ROI is successfully detected, the system extracts the corresponding temperature signal to determine the breathing phases. Figure 15 presents representative nostril-temperature waveforms under four typical experimental conditions: resting, paced breathing (24 BPM), soft speech, and off-axis yaw. The green line indicates the raw temperature sequence, the red line depicts the smoothed and band-pass-filtered signal, and the orange-shaded regions denote exhalation phases identified by the adaptive MAD–hysteresis algorithm. During resting, as shown in Figure 15a, the thermal oscillations are smooth and periodic, reflecting stable nasal airflow. Under paced breathing in Figure 15b, the oscillation frequency increases in line with the metronome rhythm, confirming temporal consistency with the ground-truth reference. In soft-speech (Figure 15c) and off-axis (Figure 15d) scenarios, irregularities appear due to motion and partial ROI displacement; however, the system still successfully tracks the phase transitions, demonstrating robustness to moderate motion and physiological variability.
The respiratory-rate estimation experiment was conducted with ten healthy participants (age: 33.3 ± 4.38 years). Each subject completed six breathing conditions described in Table 2, performed in real-time under different room layouts and lighting environments.
A summary of the results is presented in Table 3, which provides a per-subject breakdown of MAE, RMSE, and ROI pixel area across the six conditions, highlighting both inter-subject and condition-specific variability. Most participants exhibit stable performance during resting and paced breathing, whereas higher errors emerge during soft-speech and distance-related conditions due to motion artifacts and reduced signal-to-noise ratio (SNR). Notably, although the ROI area during soft speech remains larger than that in the distance condition, the estimation error is still considerably higher, suggesting that dynamic facial motion, rather than ROI size, is the primary factor contributing to performance degradation.
The correlation between the estimated and reference respiratory rates is shown in Figure 16a, demonstrating a strong linear relationship with R² = 0.973. These results indicate that the system can reliably estimate respiratory rate with minimal error across different conditions and sessions. To further examine the condition-wise performance, Figure 16b compares the distribution of estimated and reference respiratory rates using box plots, revealing close alignment across most conditions, with only slight deviations observed during soft-speech and off-axis orientations.
Moreover, Figure 16c illustrates the per-subject MAE distribution across the six conditions, highlighting both individual variability and condition-dependent performance. Estimation remains stable during resting and paced breathing, whereas larger errors appear during soft speech, off-axis yaw, and increased camera distance due to motion-induced disturbances and weakened thermal signal fidelity. Figure 16d summarizes the average MAE, RMSE, and ROI size per condition, showing that accuracy declines beyond 1.5 m as the nostril region becomes smaller and the thermal contrast diminishes. Interestingly, despite having a larger ROI area than the distance condition, the soft-speech condition exhibits higher estimation errors, reinforcing that facial dynamics, rather than ROI scale, are the dominant factor affecting accuracy.
Table 4 summarizes the average respiratory-rate estimation performance across all tested conditions, expressed as mean ± SD for both MAE and RMSE. The results demonstrate that the system achieves consistently low errors across all scenarios, with the lowest MAE and RMSE observed during resting and paced breathing. Under conditions involving speech or posture change, the estimation error slightly increases, reflecting temperature fluctuations and ROI variation. For comparison, the included peak-based and FFT-based baseline methods show substantially higher errors across all conditions, confirming the advantage of the proposed adaptive approach. The overall error remains below 1 BPM on average, confirming clinically acceptable [52] performance for a lightweight thermal-based system operating on an embedded device.
To further investigate the influence of distance on detection scale and estimation accuracy, Table 5 reports the mean ± SD of MAE, RMSE, and ROI size across the three measurement distances. The results show a substantial reduction in detected ROI area as the camera moves farther away, from 597 px² at 1.0 m to just 165 px² at 2.0 m, representing a 72% decrease in spatial sampling. This loss of pixel coverage directly diminishes thermal contrast and reduces signal amplitude, leading to higher estimation errors at extended distances. The increasing error trend therefore aligns with the shrinking ROI size, confirming that reduced spatial resolution limits the system's ability to capture subtle nostril temperature variations. Nevertheless, within the practical monitoring range of 1.0–1.5 m, estimation performance remains stable, with MAE values below 0.7 BPM.
A more detailed visualization of the distance–accuracy relationship is provided in Figure 17. As shown in Figure 17a, the detection error increases markedly, from 0.27 BPM MAE and 0.31 BPM RMSE at 1 m to 1.38 BPM MAE and 1.52 BPM RMSE at 2 m, while the corresponding ROI area decreases from approximately 597 px² to 165 px² (Figure 17b). This consistent trend reinforces that distance-induced loss of spatial detail is the primary factor driving performance degradation.
To contextualize these device-level gains, the proposed system is compared with recent contactless respiratory-rate estimation studies. Table 6 summarizes relevant methods using thermal imaging, highlighting differences in hardware, ROI selection, tracking strategy, estimation algorithm, accuracy, runtime feasibility, and overall contribution. The developed system achieves markedly lower estimation errors than most recent thermal-based approaches, attaining a mean absolute error of 0.57 ± 0.36 BPM and an RMSE of 0.64 ± 0.42 BPM across diverse conditions, including speech, head rotation, and distances up to 2.0 m. These results fall well within the commonly accepted clinical tolerance for respiratory-rate monitoring (error < 2 BPM) [52], corresponding to an average deviation below 1 BPM. The system thus delivers clinically relevant accuracy using a lightweight 256 × 192 thermal camera.
Compared to deep learning–based approaches, which achieve good accuracy but require high-resolution cameras, complex models, and non-real-time post-processing [19,26], the proposed system emphasizes lightweight edge deployment with minimal computational cost. Cross-modality solutions that fuse RGB and thermal data have demonstrated clinical viability but introduce additional sensor complexity and are not optimized for embedded platforms [22]. Meanwhile, traditional spectral methods achieve competitive accuracy but typically lack automated ROI localization and real-time performance [52,53,54]. In contrast, the proposed system is the first to integrate nostril-specific ROI tracking, adaptive MAD–hysteresis phase detection, and IBI validation within a real-time, edge-deployable framework. This combination enables robustness to motion and viewpoint variation while achieving state-of-the-art accuracy and real-time operation suitable for continuous home monitoring.

5. Discussion

The experimental results demonstrate that the proposed thermal-based system, which integrates a YOLO-based nostril detector executed on every second frame with Kalman prediction and an adaptive breathing-phase and IBI validation module, enables accurate and robust respiratory-rate estimation entirely on an embedded edge device. The frame-skipping strategy effectively reduces computational demand without compromising ROI continuity, while the time-domain phase logic, based on median and MAD thresholds with hysteresis and short-segment consolidation, maintains stable breathing-phase labeling under thermal drift and motion-induced noise. To strengthen the evaluation, two baseline methods, peak detection and FFT-based spectral analysis, were implemented and tested on the same real-time data collected in the study. As shown in Table 4, these baselines exhibit substantially higher MAE and RMSE across all conditions, confirming the advantage of the proposed method. Although several recently published thermal-based respiratory-rate estimation methods exist, direct comparison was not feasible: most rely on substantially higher-resolution thermal sensors, computationally intensive 3D-CNN or transformer architectures, or RGB–thermal fusion pipelines that cannot be reproduced on the low-resolution dataset used in this study or deployed on embedded hardware.
In terms of privacy, the thermal modality used here does not capture facial texture, identity cues, or personally identifiable imagery. The 256 × 192 thermal frames contain only coarse temperature gradients, and the respiratory pipeline operates exclusively on a small nostril-level ROI, further reducing the possibility of re-identification. Although a formal privacy-impact assessment was not mandated for this pilot study, the sensing modality is inherently privacy-preserving compared with RGB-based approaches. When contrasted with genuinely anonymous alternatives such as radar or acoustic sensing, thermal imaging provides a favorable balance between privacy and spatial specificity: radar and acoustic systems offer strong anonymity but often exhibit reduced spatial precision, susceptibility to multipath or ambient noise, and difficulty maintaining stable anatomical anchoring for breath extraction [55,56]. By leveraging non-textured thermal data while retaining reliable nostril localization, the proposed approach achieves privacy-aware respiratory monitoring without compromising estimation accuracy.
Across all evaluated conditions, respiratory-rate estimation remained consistently accurate, with MAE values ranging from 0.34 to 0.98 BPM and RMSE values between 0.36 and 1.07 BPM across ten participants, for an overall average MAE of 0.57 ± 0.36 BPM and RMSE of 0.64 ± 0.42 BPM. Errors were lowest during resting and paced-breathing trials, and increased modestly under soft-speech and posture variations due to mouth motion and partial ROI displacement. Distance and off-axis tests showed moderate increases in error, indicating that the system maintains reliable estimation up to approximately 1.5 m, even with a low-resolution thermal sensor. These results confirm that accurate, real-time respiratory-rate monitoring can be achieved using an adaptive, privacy-preserving thermal-imaging system operating entirely on an edge platform. The experimental findings also demonstrate the adaptive nature of the proposed system across diverse conditions. Rather than relying on fixed thresholds or static parameters, the MAD-based phase logic continuously adjusts to variations in signal amplitude and noise, while the hysteresis and consolidation mechanisms ensure stable breathing-phase transitions. These adaptive behaviors collectively enable consistent performance under different breathing patterns and motion scenarios without manual recalibration.
A closer examination of failure modes provides further insight into the system's behavior under challenging scenarios. During the soft-speech condition, the primary source of degradation arose not only from general facial motion but also from rapid upper-lip deformation and transient nostril occlusion caused by articulation. These movements introduce abrupt thermal discontinuities within the ROI, reducing local temporal coherence and lowering the effective signal-to-noise ratio (SNR) of the extracted temperature waveform. A similar failure trend is observed with increasing camera distance, where the nostril region shrinks from 597 px² at 1.0 m to 165 px² at 2.0 m. This reduction leads to diminished thermal-gradient resolution, smaller oscillation amplitudes, and greater sensitivity to pixel-level quantization noise. Together, these analyses clarify the mechanisms underlying soft-speech- and distance-related performance degradation, complementing the quantitative results in Table 4.
Compared with existing methods summarized in Table 6, the primary advantage of the proposed system lies in its fully automated, real-time processing at minimal computational cost. Solely deep learning-based approaches can achieve strong performance but typically depend on high-resolution cameras and computationally intensive models, often requiring offline post-processing [19,26]. In contrast, conventional spectral methods can run on simpler hardware yet usually lack automated ROI localization and are sensitive to motion and baseline temperature drift [52,53]. The present design achieves both efficiency and stability by performing YOLO detections on thermal imagery, transferring the ROI via calibrated alignment, applying Kalman prediction on skipped frames, and estimating the breathing pattern directly in the time domain. This architecture maintains robustness to motion and noise while remaining fully compatible with real-time embedded execution.
A direct comparison with recent studies further contextualizes system performance. Mozafari et al. [26] reported an MAE of approximately 1.6 BPM using a 640 × 480 thermal camera with a 3D-CNN + BiLSTM model, while Nakai et al. [54] reported MAE values around 2.4 BPM using dual thermal ROIs. Gioia et al. [19] achieved an R² of roughly 0.10 with high-resolution imagery and offline 3D-CNN regression. Classical FFT/CZT-based approaches typically achieve MAE between 0.66 and 1.8 BPM under controlled conditions but depend on manual or semi-automated ROI selection [22]. In contrast, the proposed system delivers competitive accuracy using a lower-resolution 256 × 192 sensor while maintaining fully automated ROI tracking and real-time embedded execution. Furthermore, most prior thermal-based studies evaluated only one or two controlled breathing conditions, whereas the proposed system was validated across six diverse scenarios, including speech, off-axis rotation, posture variation, and distances up to 2.0 m, demonstrating robustness under a wider range of real-world variations.
Regarding model validity, the respiratory-rate estimation module does not involve any data-driven training and therefore cannot overfit to the subjects. All processing parameters, including band-pass filter settings, MAD-based thresholds, and hysteresis rules, were fixed in advance and applied identically to all participants. The YOLO nostril detector, the only trained component in the pipeline, was trained on an independent thermal dataset (7958 annotated frames) that did not include any of the subjects used in the RR evaluation. Ground-truth RR was obtained through dual-rater manual counting, and system accuracy was assessed using MAE and RMSE across all six experimental conditions.
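For reference, the error metrics used throughout the evaluation reduce to a few lines. The sketch below computes MAE and RMSE from paired estimated and reference respiratory rates; the numbers in the usage example are made up, not study data.

```python
import numpy as np

def rr_errors(estimated, reference):
    """MAE and RMSE between estimated and reference respiratory rates
    (BPM), e.g. one value pair per 60 s recording block."""
    err = np.asarray(estimated, dtype=float) - np.asarray(reference, dtype=float)
    return np.mean(np.abs(err)), np.sqrt(np.mean(err ** 2))

# Usage with made-up numbers (not study data):
mae, rmse = rr_errors([12.5, 18.2, 24.4], [12.0, 18.0, 24.0])
print(f"MAE = {mae:.2f} BPM, RMSE = {rmse:.2f} BPM")
```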
Despite these advantages, several limitations were identified. The system requires a clearly visible nostril region to estimate the respiratory rate accurately. When the nostrils are covered, exhibit low thermal contrast, or move outside the camera’s field of view, for example when subjects wear masks or turn their heads excessively, the system may fail to produce valid respiratory-rate estimates because no stable thermal signal can be extracted. In addition, accuracy decreases as the camera-to-subject distance increases. Beyond approximately 1.5–2.0 m, the nostril region becomes very small in the thermal frame (the average ROI area decreases from about 597 px² to 165 px²), reducing temperature contrast and making the signal more susceptible to noise and minor tracking errors. Moreover, the proposed method is effective only when breathing occurs predominantly through the nasal pathway. During diaphragmatic or abdominal breathing, where nasal airflow is minimal, the thermal contrast around the nostrils becomes negligible, leading to weak or undetectable respiratory oscillations. Finally, when the ambient temperature is high, the subject’s facial temperature rises, diminishing thermal contrast and making face or nostril detection unreliable.
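A practical consequence of these limitations is that the system should gate its output on ROI quality. The following sketch shows one hypothetical validity check: the area threshold loosely reflects the ~165–260 px² range observed beyond 1.5 m (Table 5), while the contrast threshold is an assumed placeholder rather than a value from the study.

```python
def rr_output_is_valid(roi_area_px2, contrast_c, min_area=200.0, min_contrast=0.15):
    """Hypothetical quality gate: suppress the RR estimate whenever the
    tracked nostril ROI is too small or its breathing-related temperature
    swing (deg C, peak-to-peak) is too weak to trust."""
    return roi_area_px2 >= min_area and contrast_c >= min_contrast
```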
As this work represents a feasibility study to demonstrate whether a low-resolution thermal camera can reliably provide non-invasive respiratory-rate monitoring in real time, the evaluation was intentionally limited to ten healthy adults within a narrow age range (mean age 33.3 ± 4.38 years). Consequently, elderly adults or individuals with respiratory conditions such as COPD, asthma, or sleep apnoea were not included, resulting in limited clinical validation and reduced population diversity. Future work will conduct broader clinical validation involving these patient groups, as well as individuals at risk of long-lie incidents, to ensure that the system performs reliably across diverse real-world populations.
To support these clinical applications, the system also requires further technical enhancements to ensure reliable respiration monitoring under more challenging real-world conditions. Future improvements may include integrating a low-power radar module to complement the thermal sensing, enabling respiration-phase estimation even when nasal airflow is weak or partially occluded. Beyond nasal-based measurements, future work will also explore extracting respiratory information from micro-motions of the shoulder or abdomen and identifying mouth-breathing episodes. Furthermore, fusing the proposed respiratory-rate estimator with the previously developed long-lie detection system [57] may improve reliability and reduce false alarms across varying distances, postures, and environmental conditions. This multi-modal integration would move the system toward more robust and clinically relevant continuous home monitoring.

6. Conclusions

This paper introduces an adaptive, fully automated, and privacy-preserving respiratory-rate monitoring system based on thermal imaging, designed for real-time execution on embedded edge hardware. The framework integrates a lightweight thermal-specific YOLO-based nostril detector, a detector-centric frame-skipping strategy with Kalman prediction for stable ROI continuity, and an adaptive median–MAD hysteresis algorithm with consolidation and IBI validation for robust time-domain respiration analysis.
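For completeness, the final stage of this pipeline, converting validated inter-breath intervals into a respiratory rate, can be sketched as follows; the IBI bounds simply mirror the 0.08–0.7 Hz band-pass (≈5–42 BPM) and are illustrative.

```python
import numpy as np

def rr_from_onsets(onset_times_s, ibi_bounds_s=(60 / 42, 60 / 4.8)):
    """Median respiratory rate (BPM) from breath-onset timestamps,
    discarding inter-breath intervals (IBIs) outside physiological bounds.
    Default bounds mirror the 0.08-0.7 Hz band-pass (~4.8-42 BPM)."""
    ibis = np.diff(np.asarray(onset_times_s, dtype=float))
    valid = ibis[(ibis >= ibi_bounds_s[0]) & (ibis <= ibi_bounds_s[1])]
    if valid.size == 0:
        return None                  # no trustworthy breaths in this window
    return 60.0 / np.median(valid)
```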
Across six experimental conditions, including speech, off-axis rotation, posture variation, and distances up to 2.0 m, the system achieved an average MAE of 0.57 ± 0.36 BPM and RMSE of 0.64 ± 0.42 BPM, demonstrating that accurate and reliable respiratory-rate estimation is achievable using a compact 256 × 192 thermal sensor operating fully on a low-power embedded platform. The adaptive signal-processing pipeline consistently adjusted to variations in breathing amplitude, rhythm, and motion-induced disturbances without requiring manual recalibration. Notably, the achieved accuracy falls well within clinically acceptable tolerances for respiratory-rate monitoring, reinforcing its suitability for practical deployment in home or long-term monitoring environments.
Future work will include expanding the participant population and range of respiratory scenarios, improving resilience against occlusion and abdomen- or mouth-dominant breathing through the integration of a low-power radar module, and embedding the proposed respiratory module into an existing long-lie detection system. This multi-modal fusion of thermal and physiological information aims to enhance robustness, reduce false alarms, and support continuous, privacy-preserving home monitoring.

Author Contributions

R.A.: Conceptualization, Data Curation, Formal Analysis, Methodology, Software, Validation, Visualization, Writing—original draft. A.F.: Methodology, Supervision, Validation, Writing—review and editing. S.-Q.X.: Supervision, Methodology, Writing—review and editing. Z.Z.: Conceptualization, Supervision, Validation, Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Indonesia Endowment Fund for Education (LPDP), Ministry of Finance of the Republic of Indonesia [Grant No. LOG-13094/LPDP.3/2024].

Institutional Review Board Statement

All procedures involving human participants were approved by the Faculty Research Ethics Committee (FREC) for Engineering and Physical Sciences (EPS), University of Leeds (Application No. 1499). Informed consent was obtained from all participants.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgments

The authors would like to thank all participants who took part in the respiratory rate estimation experiments.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
YOLO: You Only Look Once
ROI: Region of Interest
IIR: Infinite Impulse Response
IBI: Inter-Breath Interval
RR: Respiratory Rate
3D-CNNs: Three-Dimensional Convolutional Neural Networks
SSD: Single Shot MultiBox Detector
YUV: Luminance (Y) and chrominance components (U: blue-difference, V: red-difference)
C2f: Cross Stage Partial with 2 convolutions and fusion
BPM: Breaths Per Minute
GUI: Graphical User Interface
GPU: Graphics Processing Unit
CPU: Central Processing Unit
DFL: Distribution Focal Loss
PR: Precision–Recall
BiLSTM: Bidirectional Long Short-Term Memory
CZT: Chirp Z-Transform
RQI: Region Quality Index
FFT: Fast Fourier Transform
AMDF: Average Magnitude Difference Function
NICU: Neonatal Intensive Care Unit
FPS: Frames Per Second
RGB: Red, Green, Blue color space
MAD: Median Absolute Deviation
EMA: Exponential Moving Average
MAE: Mean Absolute Error
RMSE: Root Mean Square Error
SNR: Signal-to-Noise Ratio
p05: 5th percentile
p95: 95th percentile
MSB: Most Significant Byte
LSB: Least Significant Byte
Std Dev: Standard Deviation
COPD: Chronic Obstructive Pulmonary Disease

References

  1. Drummond, G.; Fischer, D.; Arvind, D. Current clinical methods of measurement of respiratory rate give imprecise values. ERJ Open Res. 2020, 6, 00023–2020. [Google Scholar] [CrossRef]
  2. Tobin, M.J. Breathing pattern analysis. Intensive Care Med. 2005, 18, 193–201. [Google Scholar] [CrossRef] [PubMed]
  3. Ashe, W.B.; McNamara, B.D.; Patel, S.M.; Shanno, J.N.; Innis, S.E.; Hochheimer, C.J.; Barros, A.J.; Williams, R.D.; Ratcliffe, S.J.; Moorman, J.; et al. Kinematic signature of high risk labored breathing revealed by novel signal analysis. Sci. Rep. 2024, 14, 27794. [Google Scholar] [CrossRef]
  4. Rivas, E.; López-Baamonde, M.; Sanahuja, J.; del Rio, E.; Ramis, T.; Recasens, A.; López, A.; Arias, M.; Kampakis, S.; Lauteslager, T.; et al. Early detection of deterioration in COVID-19 patients by continuous ward respiratory rate monitoring: A pilot prospective cohort study. Front. Med. 2023, 10, 1243050. [Google Scholar] [CrossRef]
  5. Peters, G.; Peelen, R.; Gilissen, V.; Koning, M.; Harten, W.; Doggen, C. Detecting patient deterioration early using continuous heart rate and respiratory rate measurements in hospitalized COVID-19 patients. J. Med. Syst. 2023, 47, 12. [Google Scholar] [CrossRef]
  6. Yadav, A.; Dandu, H.; Parchani, G.; Chokalingam, K.; Kadambi, P.; Mishra, R.; Jahan, A.; Teboul, J.; Latour, J. Early detection of deteriorating patients in general wards through continuous contactless vital signs monitoring. Front. Med. Technol. 2024, 6, 1436034. [Google Scholar] [CrossRef]
  7. McCartan, T.; Worrall, A.; Conluain, R.; Alaya, F.; Mulvey, C.; MacHale, E.; Brennan, V.; Lombard, L.; Walsh, J.; Murray, M.; et al. The effectiveness of continuous respiratory rate monitoring in predicting hypoxic and pyrexic events: A retrospective cohort study. Physiol. Meas. 2021, 42, 065005. [Google Scholar] [CrossRef] [PubMed]
  8. Ryynänen, O.P.; Kivelä, S.; Honkanen, R.; Laippala, P. Falls and lying helpless in the elderly. Z. Gerontol. 1992, 25, 278–282. [Google Scholar] [PubMed]
  9. Fleming, J.; Brayne, C. Inability to get up after falling, subsequent time on floor, and summoning help: Prospective cohort study in people over 90. BMJ 2008, 337, a2227. [Google Scholar] [CrossRef]
  10. Kubitza, J.; Schneider, I.T.; Reuschenbach, B. Concept of the term long lie: A scoping review. Eur. Rev. Aging Phys. Act. 2023, 20, 16. [Google Scholar] [CrossRef]
  11. Massaroni, C.; Nicolò, A.; Presti, D.; Sacchetti, M.; Silvestri, S.; Schena, E. Contact-Based Methods for Measuring Respiratory Rate. Sensors 2019, 19, 908. [Google Scholar] [CrossRef] [PubMed]
  12. Costanzo, I.; Sen, D.; Rhein, L.; Guler, U. Respiratory Monitoring: Current State of the Art and Future Roads. IEEE Rev. Biomed. Eng. 2020, 15, 103–121. [Google Scholar] [CrossRef]
  13. Liu, H.; Allen, J.; Zheng, D.; Chen, F. Recent development of respiratory rate measurement technologies. Physiol. Meas. 2019, 40, 07TR01. [Google Scholar] [CrossRef] [PubMed]
  14. Vitazkova, D.; Foltan, E.; Kosnacova, H.; Micjan, M.; Donoval, M.; Kuzma, A.; Kopani, M.; Vavrinsky, E. Advances in Respiratory Monitoring: A Comprehensive Review of Wearable and Remote Technologies. Biosensors 2024, 14, 90. [Google Scholar] [CrossRef]
  15. Xia, Z.; Shandhi, M.M.H.; Li, Y.; Inan, O.T.; Zhang, Y. The Delineation of Fiducial Points for Non-Contact Radar Seismocardiogram Signals Without Concurrent ECG. IEEE J. Biomed. Health Inform. 2021, 25, 1031–1040. [Google Scholar] [CrossRef] [PubMed]
  16. Jalaja, A.A.; Kavitha, M. Contactless face video based vital signs detection framework for continuous health monitoring using feature optimization and hybrid neural network. Biotechnol. Bioeng. 2024, 121, 1190–1214. [Google Scholar] [CrossRef]
  17. Chian, D.-M.; Wen, C.-K.; Wang, C.-J.; Hsu, M.-H.; Wang, F.-K. Vital Signs Identification System With Doppler Radars and Thermal Camera. IEEE Trans. Biomed. Circuits Syst. 2022, 16, 153–167. [Google Scholar] [CrossRef]
  18. Negishi, T.; Abe, S.; Matsui, T.; Liu, H.; Kurosawa, M.; Kirimoto, T.; Sun, G. Contactless Vital Signs Measurement System Using RGB-Thermal Image Sensors and Its Clinical Screening Test on Patients with Seasonal Influenza. Sensors 2020, 20, 2171. [Google Scholar] [CrossRef]
  19. Gioia, F.; Pura, F.; Greco, A.; Piga, D.; Merla, A.; Forgione, M. Contactless Estimation of Respiratory Frequency Using 3D-CNN on Thermal Images. IEEE J. Biomed. Health Inform. 2025, 29, 7387–7395. [Google Scholar] [CrossRef]
  20. Yin, C.; Wang, G.; Xie, Y.; Tu, J.; Sun, W.; Kong, X.; Guo, X.; Zhang, D. Separated Respiratory Phases for In Vivo Ultrasonic Thermal Strain Imaging. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2022, 69, 1219–1229. [Google Scholar] [CrossRef]
  21. Alves, R.; van Meulen, F.; Overeem, S.; Zinger, S.; Stuijk, S. Thermal Cameras for Continuous and Contactless Respiration Monitoring. Sensors 2024, 24, 8118. [Google Scholar] [CrossRef] [PubMed]
  22. Maurya, L.; Zwiggelaar, R.; Chawla, D.; Mahapatra, P. Non-contact respiratory rate monitoring using thermal and visible imaging: A pilot study on neonates. J. Clin. Monit. Comput. 2022, 37, 815–828. [Google Scholar] [CrossRef]
  23. Jakkaew, P.; Onoye, T. Non-Contact Respiration Monitoring and Body Movements Detection for Sleep Using Thermal Imaging. Sensors 2020, 20, 6307. [Google Scholar] [CrossRef]
  24. Hurtado, D.; Abusleme, A.; Chávez, J.A.P. Non-invasive continuous respiratory monitoring using temperature-based sensors. J. Clin. Monit. Comput. 2019, 34, 223–231. [Google Scholar] [CrossRef]
  25. Jisha, S.; Jayanthi, T. Non-Contact methods for assessment of respiratory parameters. In Proceedings of the 2023 International Conference on Innovations in Engineering and Technology (ICIET), Muvattupuzha, India, 13–14 July 2023; pp. 1–6. [Google Scholar] [CrossRef]
  26. Mozafari, M.; Law, A.J.; Goubran, R.A.; Green, J.R. Respiratory Rate Estimation from Thermal Video Data Using Spatio-Temporal Deep Learning. Sensors 2024, 24, 6386. [Google Scholar] [CrossRef]
  27. Choi, J.; Oh, K.; Kwon, O.; Kwon, J.; Kim, J.; Yoo, S. Non-Contact Respiration Rate Measurement From Thermal Images Using Multi-Resolution Window and Phase-Sensitive Processing. IEEE Access 2023, 11, 112706–112718. [Google Scholar] [CrossRef]
  28. Xu, Y.; Khan, T.M.; Song, Y.; Meijering, E. Edge deep learning in computer vision and medical diagnostics: A comprehensive survey. Artif. Intell. Rev. 2025, 58, 93. [Google Scholar] [CrossRef]
  29. Rancea, A.; Anghel, I.; Cioara, T. Edge computing in healthcare: Innovations, opportunities, and challenges. Future Internet 2024, 16, 329. [Google Scholar] [CrossRef]
  30. Pankaj, A.; Kumar, A.; Kumar, M.; Komaragiri, R. Edge-Based Computation of Super-Resolution Superlet Spectrograms for Real-Time Estimation of Heart Rate Using an IoMT-Based Reference-Signal-Less PPG Sensor. IEEE Internet Things J. 2024, 11, 8647–8657. [Google Scholar] [CrossRef]
  31. Kumar, R.H.; Rajaram, B. Design and Simulation of an Edge Compute Architecture for IoT-Based Clinical Decision Support System. IEEE Access 2024, 12, 45456–45474. [Google Scholar] [CrossRef]
  32. Kanungo, P. Edge computing in healthcare: Real-time patient monitoring systems. World J. Adv. Eng. Technol. Sci. 2025, 15, 1–9. [Google Scholar] [CrossRef]
  33. Catalina, L.; Doru, A.; Călin, C. The use of thermographic techniques and analysis of thermal images to monitor the respiratory rate of premature new-borns. Case Stud. Therm. Eng. 2021, 25, 100926. [Google Scholar] [CrossRef]
  34. Alva, R.; Talasila, V.; Tv, S.; Umar, S.A.R. Estimation of Respiratory Rate Using Thermography. In Proceedings of the 2024 5th International Conference on Circuits, Control, Communication and Computing (I4C), Bangalore, India, 4–5 October 2024; pp. 154–159. [Google Scholar] [CrossRef]
  35. Luca, C.; Andritoi, D.; Corciova, C. Monitoring the Respiratory Rate in the Premature Newborn by Analyzing Thermal Images. Nov. Perspect. Eng. Res. 2022, 5, 129–143. [Google Scholar] [CrossRef]
  36. Shu, S.; Liang, H.-R.; Zhang, Y.; Zhang, Y.; Yang, Z. Non-contact measurement of human respiration using an infrared thermal camera and the deep learning method. Meas. Sci. Technol. 2022, 33, 075202. [Google Scholar] [CrossRef]
  37. Lazri, Z.; Zhu, Q.; Chen, M.; Wu, M.; Wang, Q. Detecting Essential Landmarks Directly in Thermal Images for Remote Body Temperature and Respiratory Rate Measurement With a Two-Phase System. IEEE Access 2022, 10, 39080–39094. [Google Scholar] [CrossRef]
  38. Hu, M.; Zhai, G.; Li, D.; Fan, Y.; Duan, H.; Zhu, W.; Yang, X. Combination of near-infrared and thermal imaging techniques for the remote and simultaneous measurements of breathing and heart rates under sleep situation. PLoS ONE 2018, 13, e0190466. [Google Scholar] [CrossRef]
  39. Kowalczyk, N.; Rumiński, J. Respiratory Rate Estimation Based on Detected Mask Area in Thermal Images. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vancouver, BC, Canada, 17–24 June 2023; pp. 6042–6051. [Google Scholar] [CrossRef]
  40. Kwon, J.; Kwon, O.; Oh, K.; Kim, J.; Yoo, S. Breathing-Associated Facial Region Segmentation for Thermal Camera-Based Indirect Breathing Monitoring. IEEE J. Transl. Eng. Health Med. 2023, 11, 505–514. [Google Scholar] [CrossRef]
  41. Cho, Y.; Julier, S.; Marquardt, N.; Bianchi-Berthouze, N. Robust tracking of respiratory rate in high-dynamic range scenes using mobile thermal imaging. Biomed. Opt. Express 2017, 8, 4480–4503. [Google Scholar] [CrossRef] [PubMed]
  42. Chen, D.-Y.; Lai, J. HHT-based remote respiratory rate estimation in thermal images. In Proceedings of the 2017 18th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), Kanazawa, Japan, 26–28 June 2017; pp. 263–268. [Google Scholar] [CrossRef]
  43. Kwaśniewska, A.; Szankin, M.; Rumiński, J.; Kaczmarek, M. Evaluating Accuracy of Respiratory Rate Estimation from Super Resolved Thermal Imagery. In Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 23–27 July 2019; pp. 2744–2747. [Google Scholar] [CrossRef]
  44. Romano, C.; Innocenti, L.; Schena, E.; Sacchetti, M.; Nicolò, A.; Massaroni, C. A Signal Quality Index for Improving the Estimation of Breath-by-Breath Respiratory Rate During Sport and Exercise. IEEE Sens. J. 2023, 23, 31250–31258. [Google Scholar] [CrossRef]
  45. Sabz, M.; MacLean, J.; Martin, A.R.; Rouhani, H. Characterization of Wearable Respiratory Sensors for Breathing Parameter Measurements. IEEE Sens. J. 2024, 24, 32283–32290. [Google Scholar] [CrossRef]
  46. Cheng, J.; Liu, R.; Li, J.; Song, R.; Liu, Y.; Chen, X. Motion-Robust Respiratory Rate Estimation From Camera Videos via Fusing Pixel Movement and Pixel Intensity Information. IEEE Trans. Instrum. Meas. 2023, 72, 4008611. [Google Scholar] [CrossRef]
  47. Wang, W.; den Brinker, A.C. Algorithmic insights of camera-based respiratory motion extraction. Physiol. Meas. 2022, 43, 075004. [Google Scholar] [CrossRef] [PubMed]
  48. Romano, C.; Schena, E.; Silvestri, S.; Massaroni, C. Non-Contact Respiratory Monitoring Using an RGB Camera for Real-World Applications. Sensors 2021, 21, 5126. [Google Scholar] [CrossRef]
  49. Yin, J.; Wu, F.; Su, H.; Huang, P.; Qixuan, Y. Improvement of SAM2 algorithm based on Kalman filtering for long-term video object segmentation. Sensors 2025, 25, 4199. [Google Scholar] [CrossRef]
  50. Fernandez, I.C.; Magpantay, P.; Rosales, M.D.; Hizon, J.R.E. Review of Kalman filters in multiple object tracking algorithms. In Proceedings of the 2024 IEEE International Conference on Omni-layer Intelligent Systems (COINS), London, UK, 29–31 July 2024; pp. 1–4. [Google Scholar] [CrossRef]
  51. Kallioinen, N.; Hill, A.; Christofidis, M.; Horswill, M.; Watson, M. Quantitative systematic review: Sources of inaccuracy in manually measured adult respiratory rate data. J. Adv. Nurs. 2020, 77, 98–124. [Google Scholar] [CrossRef]
  52. Takahashi, Y.; Gu, Y.; Nakada, T.; Abe, R.; Nakaguchi, T. Estimation of Respiratory Rate from Thermography Using Respiratory Likelihood Index. Sensors 2021, 21, 4406. [Google Scholar] [CrossRef]
  53. Pereira, C.B.; Yu, X.; Goos, T.; Reiss, I.; Orlikowsky, T.; Heimann, K.; Venema, B.; Blazek, V.; Leonhardt, S.; Teichmann, D. Noncontact Monitoring of Respiratory Rate in Newborn Infants Using Thermal Imaging. IEEE Trans. Biomed. Eng. 2019, 66, 1105–1114. [Google Scholar] [CrossRef]
  54. Nakai, K.; Kurosawa, M.; Kirimoto, T.; Matsui, T.; Abe, S.; Suzuki, S.; Sun, G. Performance enhancement of thermal image analysis for noncontact cardiopulmonary signal extraction. Infrared Phys. Technol. 2024, 138, 105244. [Google Scholar] [CrossRef]
  55. Yang, Z.; Liu, Y.; Yang, H.; Shi, J.; Hu, A.; Xu, J.; Zhuge, X.; Miao, J. Noncontact Breathing Pattern Monitoring Using a 120 GHz Dual Radar System with Motion Interference Suppression. Biosensors 2025, 15, 486. [Google Scholar] [CrossRef] [PubMed]
  56. Husaini, M.; Kamarudin, L.; Zakaria, A.; Kamarudin, I.K.; Ibrahim, M.; Nishizaki, H.; Toyoura, M.; Mao, X. Non-Contact Breathing Monitoring Using Sleep Breathing Detection Algorithm (SBDA) Based on UWB Radar Sensors. Sensors 2022, 22, 5249. [Google Scholar] [CrossRef]
  57. Analia, R.; Forster, A.; Xie, S.-Q.; Zhang, Z. Privacy-Preserving Approach for Early Detection of Long-Lie Incidents: A Pilot Study with Healthy Subjects. Sensors 2025, 25, 3836. [Google Scholar] [CrossRef] [PubMed]
Figure 1. System overview of the proposed real-time thermal-based respiratory rate estimation. YOLO-based detector with Kalman tracking stabilizes the nostril region of interest (ROI), from which the coldest pixel temperature is extracted as the airflow-related signal. ROI min-temperature is band-pass filtered and analysed with MAD-based hysteresis and IBI validation to produce respiratory rate.
Figure 2. Processing pipeline of thermal camera acquisition. The raw frame is split into two streams: image data (top) for visualization and heatmap generation, and thermal data (bottom) for temperature calculation, calibration, and ROI detection. This split is specific to the TOPDON TC001 dual-stream frame format. The asterisk (*) indicates that each block is further explained in the subsequent text to emphasize its specific role within the pipeline.
Figure 3. Thermal data decoding and calibration process. Each pixel is formed from two 8-bit bytes (MSB + LSB) combined to reconstruct R(x, y), followed by linear calibration to produce the quantitative temperature map (°C).
Figure 4. Architecture of the YOLO-based nostril detection model for thermal imagery. The backbone extracts multi-scale thermal features, while the head performs feature fusion and detection across P3–P5 layers to localize the nostril region in real time. The blue dashed outline indicates the boundary of the detection head, and overlapping elements are used for compact visualization without affecting architectural interpretation.
Figure 5. Illustration of respiratory signal extraction. (a) Cropped nostril ROI represented in the thermal domain; (b) corresponding raw nostril temperature signal S, where the oscillations of T̂_k denote inhalation (cooling) and exhalation (warming), indicated by the blue and red lines, respectively, providing the input for subsequent signal processing and respiratory rate estimation.
Figure 6. Schematic illustration of the proposed respiration monitoring framework. Inhalation of cooler air lowers nostril temperature, while exhalation of warmer air raises it (left), illustrated by blue and red arrows, respectively. The airflow-induced modulation is captured from the thermal ROI and converted into a raw nostril temperature signal (middle). The extracted signal (green) exhibits quasi-periodic fluctuations that can be approximated by a sinusoidal model (red), reflecting the underlying breathing cycles (right).
Figure 7. Breathing phase detection from nostril temperature signals. Raw (green) and smoothed (red) sequences are shown, with detected breathing phases indicated by orange dashed lines corresponding to exhalation phases; inhalation and hold are implicitly identified by phase transitions.
Figure 8. Graphical user interface (GUI) of the proposed respiratory rate monitoring system. The left panel shows the thermal camera feed with automated nostril detection. The right panel illustrates the extracted nostril temperature signal; the green, red, and orange lines denote the raw signal, smoothed signal, and detected breathing phase, respectively. The bottom panel provides real-time vital signs, including current breathing phase, respiratory rate in BPM, nostril temperature, and processing frame rate (FPS).
Figure 9. Illustration of experimental setup. (a) Seated configuration: thermal camera aligned at nose height at 1–2 m to capture nostril temperature; (b) Supine configuration: camera pitched downward to preserve nasal visibility.
Figure 10. Training performance curves of the YOLO-based nostril detection model. The plots show the evolution of box loss, classification loss, distribution focal loss (DFL), and evaluation metrics (precision, recall, mAP@0.5, and mAP@0.5–0.95) for both training and validation datasets over the training epochs.
Figure 11. Performance curves of the YOLO-based nostril detection model on the validation dataset including F1–Confidence curve, Precision–Confidence curve, Precision–Recall (PR) curve, and Recall–Confidence curve. The results indicate consistently high precision, recall, and F1-scores across a wide confidence range, with optimal detection performance achieved at confidence thresholds between 0.66 and 0.86.
Figure 12. Close-up example of the detected nostril region. The enlarged crop highlights the detection label and bounding box that are less visible in the full-resolution montage presented in Figure 13.
Figure 13. Qualitative results of the YOLO-based nostril detection model on thermal video frames. The model consistently localizes the nostril region with high confidence across varying head poses, facial orientations, and partial occlusions.
Figure 14. Representative thermal frames showing automatically detected nostril regions (green bounding boxes) across different participants and experimental conditions, including relaxed, off-axis, distance (2 m) and supine.
Figure 15. Representative breathing pattern signals extracted from the nostril ROI across four experimental conditions: (a) resting, (b) paced breathing, (c) soft speech, (d) off-axis yaw.
Figure 16. Summary of respiratory rate estimation results across all experimental conditions. (a) Correlation between estimated and reference respiratory rates, showing strong agreement across all breathing conditions. (b) Box-plot comparison between reference and estimated respiratory rates under different conditions, illustrating close alignment and low dispersion. (c) Per-subject MAE distribution across six experimental conditions, highlighting individual and condition-specific variability. (d) Overall MAE, RMSE, and ROI-size analysis per condition, showing accuracy degradation as camera distance increases due to reduced nostril-region pixels and weakened thermal contrast; notably, during soft speech, increased facial motion leads to higher estimation error despite a larger ROI area compared with the distance condition.
Figure 17. The effect of measurement distance on respiratory rate estimation performance. (a) shows the variation of error metrics (MAE and RMSE) with increasing distance, where solid bars denote MAE and dashed lines denote RMSE; (b) presents the corresponding change in ROI size across different measurement distances, where the dashed line highlights the overall trend.
Table 1. Profiling summary of computational performance on the Jetson Orin Nano (60 s run).

(A) Runtime breakdown per component:

| Component | Mean (ms) | p95 (ms) | Percentage |
| Total Frame Time | 65.20 | 85.30 | 100% |
| YOLO Detection | 35.50 | 42.10 | 54.5% |
| Thermal Capture | 15.00 | 30.00 | 23.3% |
| Graph/GUI Update | 10.80 | 14.20 | 16.6% |
| Signal Processing | 3.50 | 3.80 | 5.4% |
| Temperature Extraction | 2.80 | 3.50 | 4.3% |
| Kalman Tracking | 1.20 | 0.80 | 1.8% |
| IBI Calculation | 0.50 | 0.80 | 0.8% |

(B) System resource utilization:

| Metric | Mean | p05 | p95 |
| CPU Usage | 42.5% | 35.2% | 58.3% |
| GPU Usage | 68.2% | 67.5% | 80.3% |
| Memory Usage | 850.3 MB | 820.5 MB | 890.2 MB |

(C) End-to-end throughput statistics:

| Metric | Value (FPS) |
| Mean FPS | 22.5 |
| Min FPS | 18.2 |
| Max FPS | 28.8 |
| p50 | 24.5 |
| p95 | 24.5 |
| Std Dev | 1.8 |
Table 2. Experimental design overview for respiratory rate data collection. Each block is a 60 s recording.

| Condition Set | Blocks per Subject × Duration | Description |
| Resting (spontaneous) | 2 × 60 s | Seated; natural nasal breathing, mouth closed |
| Paced breathing (metronome) | 3 × 60 s | Seated; guided at 12, 18, 24 BPM (randomized order); metronome target logged as auxiliary reference |
| Robustness (soft speech) | 2 × 60 s | Seated; counting aloud to emulate mild articulatory motion |
| Distance (stood) | 3 × 60 s | Spontaneous breathing at 1.0, 1.5, and 2.0 m; camera height/pitch held constant |
| Off-axis yaw (seated) | 2 × 60 s | Spontaneous breathing at ±30° yaw; neutral pitch/roll instructed |
| Posture (supine) | 2 × 60 s | Spontaneous breathing in supine facing camera; camera pitched downward |
Table 3. MAE, RMSE, and ROI pixel area (px²) of respiratory rate estimation per subject under six experimental conditions (pilot evaluation, N = 10). Each cell lists MAE (BPM) / RMSE (BPM) / ROI area (px²).

| Subj. | Resting | Paced | Soft Speech | Distance | Off-Axis | Posture |
| S1 | 0.10 / 0.10 / 2254 | 0.37 / 0.40 / 1847 | 0.75 / 0.99 / 1779 | 0.57 / 0.74 / 364 | 0.25 / 0.35 / 1216 | 0.85 / 1.33 / 808 |
| S2 | 0.15 / 0.21 / 1742 | 0.37 / 0.46 / 2045 | 0.90 / 0.98 / 1888 | 1.17 / 1.81 / 493 | 0.25 / 0.29 / 1420 | 1.05 / 1.18 / 769 |
| S3 | 0.40 / 0.41 / 809 | 0.27 / 0.32 / 785 | 1.25 / 1.57 / 709 | 0.87 / 1.14 / 286 | 0.40 / 0.50 / 831.5 | 0.85 / 0.91 / 544 |
| S4 | 0.25 / 0.25 / 1390 | 0.23 / 0.35 / 1065 | 1.20 / 1.30 / 1528 | 0.93 / 1.02 / 367 | 0.40 / 0.50 / 831 | 0.35 / 0.38 / 500 |
| S5 | 0.10 / 0.10 / 1401 | 0.33 / 0.42 / 1359 | 1.15 / 1.16 / 1984 | 0.57 / 0.70 / 273 | 0.40 / 0.41 / 1164 | 0.30 / 0.36 / 341 |
| S6 | 0.60 / 0.60 / 2359 | 0.20 / 0.22 / 1738 | 0.30 / 0.36 / 1705 | 0.90 / 0.95 / 329 | 0.35 / 0.49 / 1897 | 0.30 / 0.31 / 597 |
| S7 | 0.20 / 0.28 / 1102 | 0.13 / 0.14 / 846 | 0.65 / 0.65 / 861 | 0.70 / 0.79 / 261 | 0.30 / 0.42 / 777 | 0.80 / 1.06 / 301 |
| S8 | 0.50 / 0.59 / 713 | 0.30 / 0.34 / 624 | 0.90 / 0.95 / 731 | 0.57 / 0.68 / 360 | 0.40 / 0.41 / 542 | 0.50 / 0.54 / 427 |
| S9 | 0.55 / 0.57 / 1081 | 0.20 / 0.29 / 975 | 0.75 / 0.79 / 1046 | 0.43 / 0.55 / 317 | 0.45 / 0.51 / 935 | 0.05 / 0.07 / 496 |
| S10 | 0.55 / 0.57 / 1055 | 0.16 / 0.17 / 1559 | 1.90 / 1.94 / 1696 | 0.70 / 0.72 / 352 | 1.80 / 1.82 / 644 | 0.75 / 0.87 / 446 |
Table 4. Respiratory rate estimation errors across all experimental conditions (mean ± SD, N = 10). Baseline peak/FFT methods use one representative trial per subject due to their high sensitivity to artifacts, while the proposed method uses all recordings. All errors in BPM.

| Condition | MAE Proposed | MAE Peak | MAE FFT | RMSE Proposed | RMSE Peak | RMSE FFT | ROI (px²) |
| Resting (Spontaneous) | 0.34 ± 0.20 | 10.07 ± 5.48 | 1.51 ± 1.30 | 0.36 ± 0.20 | 11.24 ± 5.48 | 2.10 ± 1.30 | 1390 ± 614 |
| Paced Breathing (Metronome) | 0.26 ± 0.08 | 0.93 ± 0.65 | 5.11 ± 5.90 | 0.31 ± 0.11 | 1.12 ± 0.65 | 7.58 ± 5.90 | 1284 ± 491 |
| Robustness (Soft Speech) | 0.98 ± 0.43 | 4.39 ± 2.99 | 3.70 ± 3.92 | 1.07 ± 0.45 | 5.22 ± 2.99 | 5.24 ± 3.92 | 1393 ± 522 |
| Distance (1.0–2.0 m) | 0.74 ± 0.27 | 8.33 ± 5.19 | 5.76 ± 5.77 | 0.91 ± 0.37 | 9.68 ± 5.19 | 7.94 ± 5.78 | 340 ± 206 |
| Off-axis Yaw (±30°) | 0.50 ± 0.46 | 10.72 ± 3.80 | 2.31 ± 3.79 | 0.57 ± 0.45 | 11.30 ± 3.80 | 3.78 ± 3.79 | 1026 ± 428 |
| Posture (Supine) | 0.58 ± 0.32 | 6.31 ± 4.04 | 3.47 ± 3.94 | 0.68 ± 0.40 | 7.38 ± 4.04 | 5.10 ± 3.94 | 523 ± 192 |
| Overall | 0.57 ± 0.36 | 6.79 ± 3.86 | 3.64 ± 3.88 | 0.64 ± 0.42 | 7.58 ± 3.86 | 5.48 ± 3.88 | |
Table 5. Effect of distance on respiratory rate estimation accuracy and ROI size (mean ± SD, N = 10).

| Distance (m) | MAE (BPM) | RMSE (BPM) | ROI Size (px²) | Observation * |
| 1.0 | 0.27 ± 0.16 | 0.31 ± 0.16 | 597 ± 138 | Clear nostril region, distinct thermal contrast |
| 1.5 | 0.63 ± 0.31 | 0.69 ± 0.31 | 260 ± 46 | Reduced contrast, smaller ROI |
| 2.0 | 1.38 ± 0.67 | 1.52 ± 0.67 | 165 ± 33 | Weak contrast, partial pixel loss |

* Observations are based on visual inspection of thermal contrast and ROI clarity during recordings.
Table 6. Comparative summary of recent contactless respiratory rate (RR) estimation methods using thermal imaging. Each method is evaluated on subjects/conditions, camera specifications, setup (ROI + method), accuracy, runtime feasibility, and novelty. The proposed system emphasizes a nostril-focused ROI, IBI validation, and real-time edge deployment.

| Study | Subjects & Conditions | Camera Spec & Setup (ROI + Method) | Accuracy (BPM) | Real-Time | Contribution |
| Ours | 10 adults; six 60-s sets (resting, paced 12/18/24 BPM, soft speech, distance 1–2 m, yaw ±30°, supine posture) | TOPDON TC001 (256 × 192); ROI: thermal YOLOv8n on even frames (stride = 2) + Kalman tracking on skipped frames; band-pass 0.08–0.7 Hz; adaptive MAD–hysteresis phase detection + IBI validation | MAE 0.57 ± 0.36 (mean); RMSE 0.64 ± 0.42 (mean) | Yes | Thermal-based YOLO detector with Kalman tracking; adaptive MAD–hysteresis phase and IBI validation |
| Gioia et al. [19] | 30 adults; 5-min tasks (rest, Stroop, emotion); RR 9–30 | FLIR T640 (640 × 480); ROI: upper lip/nose (manual); 3D-CNN end-to-end regression | R² ≈ 0.61 (no MAE/RMSE) | No | Feasibility of end-to-end deep learning directly from thermal video |
| Mozafari et al. [26] | 22 adults; sitting/standing × mask/no-mask, 90 s | FLIR T650sc (640 × 480); ROI: full face (DeTr); 3D-CNN + BiLSTM with correlation loss | MAE 1.6 ± 0.4 | Yes | Deep learning robust to mask & posture; real-time feasibility focus |
| Maurya et al. [22] | 14 adults (rest, talking, variable); 8 neonates (NICU) | FLIR-E60 (320 × 240) + Logitech C922 RGB (960 × 720); ROI: nose–mouth (from RGB mapped to thermal); Hampel + MA + BP filtering; CZT spectral analysis | Adults: MAE 0.10–1.8; neonates: MAE ≈ 1.5 | No | Cross-modality ROI mapping; validated on adults & neonates |
| Takahashi et al. [52] | 7 adults; paced 15–30 BPM | FLIR Boson 320 (320 × 256); ROI: face subregions (scored by RQI); YOLOv3 + RQI; FFT on best region | MAE 0.66; LoA ±2 BPM | No | ROI quality index (RQI) for automated subregion selection |
| Pereira et al. [53] | 12 adults (rest, pathological); 8 neonates (NICU) | InfraTec VarioCAM HD (1024 × 768); ROI: full face (multi-grid, black-box); adaptive spectral analysis (autocorrelation, AMDF, peak detection) | RMSE 0.31 (rest), 3.27 (varied), 4.15 (neonates) | No | First NICU validation; black-box ROI without anatomical landmarks |
| Nakai et al. [54] | 11 healthy adults; seated at 1 m (lab environment) | FLIR A315 (320 × 240, 60 fps); manually defined nose & shoulder ROIs; dual-signal extraction (thermal variation + shoulder motion); band-pass filtering + autocorrelation/FFT for RR estimation | r = 0.83 vs. belt; MAE 2.4; RMSE 2.4 | No | Dual-ROI thermal approach combining nasal temperature and shoulder motion |