Respiration monitoring for premature neonates in NICU

In this paper, we investigate an automated pipeline to estimate respiration signals from videos for premature infants in neonatal intensive care units (NICUs). Two flow estimation methods, namely the conventional optical flow method and a deep learning-based flow estimation method, were employed and compared to estimate pixel motion vectors between adjacent video frames. The respiratory signal is then extracted via motion factorization. The proposed methods were evaluated by comparing the automatically extracted respiration signals to those extracted from chest impedance on videos of five premature infants. The overall average cross-correlation coefficients are 0.70 for the optical flow-based method and 0.74 for the deep flow-based method. The average root mean-squared errors are 6.10 and 4.55 for the optical flow- and the deep flow-based methods, respectively. The experimental results are promising for further investigation and clinical application of the video-based respiration monitoring method for infants in NICUs.


Introduction
Vital signs, such as heart rate, blood pressure, respiratory rate, and body temperature, are physical parameters that can be measured and used to assess physiological state and functioning. Monitoring of vital parameters is a crucial topic in neonatal daily care. Premature infants have an immature respiratory control that predisposes them to apnea/periodic breathing, haemoglobin oxygen desaturation, and bradycardia [1,2]. Apnea is defined as the status of cessation of respiratory airflow, whereas periodic breathing is characterized by groups of respiratory movements interrupted by small intervals of apnea [3]. Continuous monitoring of respiration for premature infants is critical to detect abnormalities in breathing and help develop early treatments to prevent significant hypoxia and central depression from apnea. A long-term continuous monitoring approach of respiration may also be used to assess sleep stage, which varies with different clinical conditions [4,5].
The existing methods for monitoring respiration include nasal thermocouples, spirometers, transthoracic inductance, respiratory effort belt transducers, piezoelectric transducers, optical sensors (pulse oximetry), strain gauges, impedance plethysmography, and electrocardiogram (ECG). Currently, the respiration of premature infants in a neonatal intensive care unit (NICU) is monitored by bedside monitors. ECG is considered the standard reference measurement for respiration, since it provides stable and robust monitoring in a NICU. However, the pressure of a contact sensor may change the local skin perfusion, so the measurement may deviate from the true value, unlike with a non-contact sensor. The electrodes may exert pressure on the skin, leading to tissue compression and vascular insufficiency. As a consequence, applying ECG electrodes to infant skin for a long time increases the risk of trauma and infections. Removing the adhesives, as part of regular care, can damage the immature skin of preterm infants and cause stress and pain [6,7]. This technique is also impractical for home care, since the sensors have to be placed by skilled caregivers, and wearing the sensors causes inconvenience in everyday life [8,9].
A non-contact monitoring system is a good alternative to improve infant comfort and safety, and it also has the potential for monitoring at home. Few works to date have investigated video-based contactless methods for monitoring respiration. In this study, we developed and evaluated a contactless respiration measurement method based on video monitoring. The contributions of this work are: (1) we propose an approach for non-contact respiration monitoring; and (2) we validate the method on clinical data acquired in a NICU.

Related Work
For adults, Deng et al. [10] presented the design and implementation of a novel sleep monitoring system that simultaneously analyzed respiration, head posture, and body posture. For respiration, the region of breathing movement was automatically determined and its intensity estimated, yielding a waveform indicating respiratory rhythms, with an accuracy of 96% in recognizing abnormal breathing. However, the accuracy of the respiration rate was not provided. Chazal et al. [11] applied a contactless Doppler radar biomotion sensor for respiration monitoring. Respiratory frequency and magnitude were used to classify sleep/wake states, achieving an accuracy of 69% for the wake state and 88% for the sleep state. Gupta et al. [12] estimated heart rate (HR) accurately from face videos of adults acquired with a low-cost camera. The face video, consisting of frontal, profile, or multiple faces, was divided into multiple overlapping fragments to determine HR estimates. The HR estimates were combined using quality-based fusion, which aimed to minimize the effects of illumination and face deformations. Prathosh et al. [13] proposed a general framework for estimating a periodic signal, which was applied to derive a computationally inexpensive method for estimating the respiratory pattern using two-dimensional cameras that did not critically depend on the region of interest. Specifically, the patterns were estimated by imaging changes in the reflected light caused by respiration-induced motion. Estimation of the pattern was cast as a blind deconvolution problem and was solved through a method comprising subspace projection and statistical aggregation.
For infants, Werth et al. [14] reviewed unobtrusive measurements for indicating sleep state in preterm infants. Abbas et al. [15] attempted to detect the respiration rate of neonates in real time using infrared thermography. They analyzed the anterior naris (nostril) temperature profile associated with the inspiration and expiration phases. However, the region of interest (ROI) was assumed to be fixed after initialization. Moreover, the method is not practical for NICU infants, since the faces of premature infants in a NICU are often occluded by feeding tubes and/or breathing masks. In practice, the temperature inside an incubator is continuously monitored and adjusted by caregivers in order to help infants maintain their body temperature in a normal range, and this controlled environmental temperature might affect the accuracy of the thermography-based respiration monitoring method. Koolen et al. [16] extracted the respiration rate from video data included in polysomnography. They used Eulerian video magnification (EVM) to amplify the respiration movements, followed by optical flow to estimate the respiration motion and thereby obtain a respiration signal. Independent component analysis and principal component analysis were applied to improve signal quality. The results showed a detection accuracy of 94.12% for sleeping-stage patients. Antognoli et al. [17] applied a digital webcam (WeC) and an EVM algorithm to measure HR and respiration rate (RR). The accumulated RGB values of a manually selected ROI were calculated as a single signal, from which the power spectral density was further estimated and used for peak extraction. The evaluation based on data of seven patients yielded a root mean-squared error (RMSE) of 12.2 for the HR and 7.6 for the RR.
However, the limitation of a pulse-based respiration extraction is that it extracts the respiratory modulation in blood volume changes, which is both subject dependent (different respiratory efforts) and measurement-location dependent. The modulation effect varies in different body parts. Moreover, the peak selection from spectrograms was based on common knowledge of normal HR and RR frequency ranges, which is not suitable for infants under clinical conditions of different diseases.
Methods for remote sensing of respiration and other contactless physiological measures have been developed for adults and infants, whereas our application focuses on premature infants in the NICU.

Material
Our study was conducted with videos recorded at the Máxima Medical Center in Veldhoven, The Netherlands, by a fixed-position high-definition camera (Camera model IDS uEye monochrome) filming the infant's entire body from the foot towards the head. Figure 1 shows an example of a captured video frame. We decided on the recording position of the camera by considering: (1) little or no interruption to daily routine care; and (2) a good viewpoint for observing vertical movement of the infant chest, where the respiratory motion energy is assumed to be at its maximum in the vertical direction. For all infant recordings, written consent was obtained from the parents. The resolution of each video frame is 736 × 480 pixels, and the frame rate is 8 fps. The videos were recorded under uncontrolled, regular hospital lighting conditions. Five infants with an average gestational age of 29.6 ± 2.8 weeks (range 27+0 to 33+6 weeks), an average postnatal age of 1.2 ± 0.6 weeks (range 0+4 to 2+1 weeks), and an average weight of 1555 ± 682.4 g (range 755-2410 g) were filmed. In parallel with the video capture, standard chest impedance (CI) signals were recorded as the reference standard.

Motion Matrix Calculation
The flowchart of the proposed system is shown in Figure 2, where motion matrix estimation is an essential step in the pipeline. We estimated the pattern of apparent motion, including respiration, of infants in videos. Motion matrix estimation can be defined as estimating the distribution of apparent velocities of movement in successive images. Two flow estimation methods were employed to estimate pixel motion vectors between adjacent video frames. First, the conventional optical flow method [18] was utilized and evaluated. However, optical flow is more sensitive to high-gradient texture, whereas in our case the infant chest is either bare or covered by a blanket. Thus, in both cases, the chest area, which mostly indicates the respiratory motion, lacks gradient texture. Deep flow [19] is sensitive to the entire moving object, with less dependence on texture information. Therefore, better performance is expected from a flow estimation method based on deep learning.
For both methods, the derived motion vectors contain respiration information, induced by the motion of the abdominal wall and chest wall. We only considered the vertical motion vector, since the respiration-related motion is mainly in the vertical direction in the captured videos. The American Academy of Pediatrics (AAP) also recommends that infants under one year of age be placed on their backs every time they are laid down to sleep, which lowers the risk of sudden infant death syndrome (SIDS).

Conventional Optical Flow
The optical flow should satisfy Equation (1):
$$\frac{\partial I}{\partial x} V_x + \frac{\partial I}{\partial y} V_y + \frac{\partial I}{\partial t} = 0, \tag{1}$$
where I is the intensity matrix of a frame in a video and V_x and V_y are the actual optical flow parameters, i.e., the velocity components in the x and y directions. We deployed the classic dense optical flow algorithm proposed by Barron et al. [18], where second-order differential equations based on the Hessian matrix are used to constrain the two-dimensional (2D) velocity. The Barron method creates flow fields with 100% density. However, the conventional optical flow method is limited in estimating motion in poorly textured areas, which lack gradient variation.
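The optical flow constraint referred to above is commonly written as I_x V_x + I_y V_y + I_t = 0, with one such equation per pixel. The following is a minimal NumPy sketch of how that constraint can be solved in the least-squares sense over a single window (a Lucas-Kanade style estimate). It is not the Barron et al. algorithm used in this study; the function name `estimate_global_flow` and the synthetic frames are assumptions made purely for illustration.

```python
import numpy as np


def estimate_global_flow(prev_frame, next_frame):
    """Least-squares solution of I_x*V_x + I_y*V_y + I_t = 0 over one window.

    Hypothetical helper for illustration; returns (V_x, V_y) in pixels/frame.
    """
    prev_frame = np.asarray(prev_frame, dtype=float)
    next_frame = np.asarray(next_frame, dtype=float)
    # Spatial derivatives (rows = y, columns = x) and temporal derivative.
    i_y, i_x = np.gradient(prev_frame)
    i_t = next_frame - prev_frame
    # One linear equation per pixel: [I_x I_y] @ [V_x V_y]^T = -I_t.
    a = np.stack([i_x.ravel(), i_y.ravel()], axis=1)
    b = -i_t.ravel()
    (v_x, v_y), *_ = np.linalg.lstsq(a, b, rcond=None)
    return v_x, v_y


# Synthetic smooth pattern shifted by half a pixel in the vertical direction.
y, x = np.mgrid[0:64, 0:64].astype(float)
frame0 = np.sin(0.2 * x) + np.cos(0.15 * y)
frame1 = np.sin(0.2 * x) + np.cos(0.15 * (y - 0.5))
v_x, v_y = estimate_global_flow(frame0, frame1)
print(f"V_x = {v_x:.2f}, V_y = {v_y:.2f}")
```

The sketch also shows why poorly textured regions are problematic: where the spatial derivatives are close to zero, the per-pixel equations carry almost no information about the velocity.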

Deep Flow
The above conventional optical flow approach only computes image matching at a single scale. However, for complicated situations with complex motions, the traditional approach may not be able to effectively capture the interaction or dependency relationships. Brox and Malik [20] proposed adding a descriptor matching term to the variational approach, which allows better handling of large displacements. The matching provides guidance using correspondences from sparse descriptor matching.
In our study, we applied the method of Weinzaepfel et al. [19]. It incorporates a descriptor matching algorithm based on deep convolutions with six layers, interleaving convolutions and max-pooling. In this framework, dense sampling is applied to efficiently retrieve quasi-dense correspondences, while incorporating a smoothing effect on the descriptor matches. Figure 3 shows the framework for deep flow estimation.

Respiratory Description
The results of flow calculation were captured in the rows of the derived motion matrix M (of size N × W, where N denotes the number of pixels in a video frame and W is the total number of frames in a video), which contain the motion derivatives that represent the velocity magnitudes of the pixel trajectories in the vertical direction.
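As a minimal sketch of the data layout described above, the vertical component of each flow field can be flattened into one column, so that each row of M follows one pixel over time. The input layout (height, width, 2) with the last axis holding (V_x, V_y) and the function name `build_motion_matrix` are assumptions for illustration; the number of columns simply equals the number of flow fields passed in.

```python
import numpy as np


def build_motion_matrix(flow_fields):
    """Stack the vertical flow of each field into one column of M.

    flow_fields: iterable of arrays shaped (height, width, 2), where the last
    axis holds (V_x, V_y). This layout is an assumption for illustration.
    """
    columns = [np.asarray(f, dtype=float)[..., 1].ravel() for f in flow_fields]
    return np.stack(columns, axis=1)  # shape: (pixels, number of flow fields)


# Toy example: 5 random flow fields for a 4 x 6 pixel frame.
rng = np.random.default_rng(0)
toy_flows = [rng.normal(size=(4, 6, 2)) for _ in range(5)]
M = build_motion_matrix(toy_flows)
print(M.shape)
```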
The spatial statistics of the flow matrix were further analyzed by applying robust principal component analysis (PCA) [21]. PCA includes the eigendecomposition of a data covariance matrix or singular value decomposition of a data matrix. The decomposition task projects the original data onto an orthogonal subspace, where each direction is mutually de-correlated and the most informative data information becomes available in the first several principal components. In our study, instead of considering the originally obtained matrix, the first eigenvalue of the covariance matrix of flows was analyzed, since the first eigenvalue represents the major motion component. Besides, it is beneficial to reduce residual motion noise that has lower energy than the respiratory motion in each video. The motion matrix M is masked by sub-spatiotemporal regions, m_i, where each m_i is a local mask that stores W consecutive squared blocks. For each m_i, we can generate the eigenvectors that satisfy the following general condition, specified by
$$\mathrm{Det}\big(\mathrm{Cov}(m_i) - \lambda_i I\big) = 0, \qquad \big(\mathrm{Cov}(m_i) - \lambda_i I\big)\, D_i = 0,$$
where Cov(m_i) denotes the covariance matrix of the flows in m_i, Det(·) denotes the matrix determinant, I represents the identity matrix, and D_i and λ_i are eigenvectors and eigenvalues, respectively.
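To make the decomposition step concrete, the following is a minimal sketch that extracts the temporal course of the first principal component from a motion matrix with one row per pixel and one column per frame. It uses plain PCA via singular value decomposition as a stand-in, not the robust PCA of [21], and it works on the whole matrix rather than on local masks. The function name `first_principal_motion` and the synthetic data are assumptions for illustration.

```python
import numpy as np


def first_principal_motion(M):
    """Return the temporal course of the first principal component of M.

    M has one row per pixel and one column per frame. Plain PCA via SVD is
    used here as a stand-in; it is not the robust PCA cited in the text.
    """
    M = np.asarray(M, dtype=float)
    centered = M - M.mean(axis=1, keepdims=True)  # remove per-pixel mean
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # fraction of variance per component
    return s[0] * vt[0], explained[0]


# Synthetic data: one shared 0.8 Hz oscillation plus noise, sampled at 8 fps.
rng = np.random.default_rng(0)
t = np.arange(240) / 8.0
breathing = np.sin(2 * np.pi * 0.8 * t)
weights = rng.normal(size=(200, 1))
M = weights * breathing + 0.1 * rng.normal(size=(200, 240))
signal, ratio = first_principal_motion(M)
corr = np.corrcoef(signal, breathing)[0, 1]
print(f"|correlation| with the simulated breathing: {abs(corr):.3f}")
```

The sign of a principal component is arbitrary, so the recovered signal may be inverted relative to the true motion; this does not affect a rate estimate based on peak intervals.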

Evaluation
In this study, we employed a sliding window of 120 s with a step size of 1 s to estimate the respiration rate. The respiration rate was calculated by first averaging the time intervals between breathing peaks, followed by converting those numbers to a frequency value, expressed in the unit of breaths per minute (bpm).
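The rate computation described above can be sketched in a few lines of standard-library Python. The simple three-point peak detector and the function names are assumptions for illustration; real signals would likely need smoothing before peak detection, which is not shown here.

```python
import math


def respiration_rate_bpm(window, fps):
    """Mean peak-to-peak interval of one window, converted to breaths/min."""
    peaks = [i for i in range(1, len(window) - 1)
             if window[i - 1] < window[i] > window[i + 1]]
    if len(peaks) < 2:
        return float("nan")  # not enough breaths to measure an interval
    mean_interval_s = (peaks[-1] - peaks[0]) / (len(peaks) - 1) / fps
    return 60.0 / mean_interval_s


def sliding_rates(signal, fps, window_s=120, step_s=1):
    """Respiration rate for each window position (120 s window, 1 s step)."""
    win, step = int(window_s * fps), int(step_s * fps)
    return [respiration_rate_bpm(signal[start:start + win], fps)
            for start in range(0, len(signal) - win + 1, step)]


# Synthetic 130 s signal at 8 fps with 0.75 Hz breathing (45 breaths/min).
fps = 8
signal = [math.sin(2 * math.pi * 0.75 * k / fps) for k in range(130 * fps)]
rates = sliding_rates(signal, fps)
print(len(rates), round(rates[0], 1))
```

With a 130 s recording, a 120 s window, and a 1 s step, the sketch yields 11 window positions, each with a rate close to the simulated 45 breaths per minute.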
It is well known that the respiration signal can be estimated from the CI signals. To evaluate the performance of respiration estimation using our video-based algorithms, we computed the cross-correlation coefficients and the RMSE as measures to compare the respiration rates of the reference breathing signal from the CI and our extracted respiration signals. The cross-correlation coefficient is computed on zero-mean signals. It therefore only compares the variation of the respiration rate between our estimate and the reference standard. In addition, correlation plots and Bland-Altman plots [22,23] were created.
Table 1 summarizes the estimation of respiration rate using the two flow estimation methods. Table 2 shows the root mean-squared errors (RMSE) and cross-correlation (CC) coefficients of the reference breathing signal from the CI, compared to our optical flow- and deep flow-based results. The results obtained from deep flow are more accurate than those of the conventional optical flow approach. Using the proposed deep learning-based method, the average cross-correlation coefficient over all videos is 0.74 and the average RMSE is 4.55. Despite the limited number of measurements, these preliminary results are promising for further investigation of the neonatal video-based respiration monitoring method. The overall RMSE of our deep learning-based method (4.55) indicates the feasibility of applying our automated respiration extraction in clinical use. From both the correlation and Bland-Altman analyses in Figure 6a,b, the deep flow-based method produces a lower error than the optical flow-based method, especially when breathing rates are less than 50 bpm. The absolute value of the overall mean error decreases from 4.8 bpm for the optical flow-based method to 2.7 bpm for the deep flow-based method.
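The agreement measures named above (RMSE, zero-mean cross-correlation coefficient, and Bland-Altman bias with limits of agreement) can be sketched as follows. The function names and the example rates are made up for illustration and are not data from this study; the cross-correlation is evaluated at zero lag only, which is an assumption about how the coefficient was computed.

```python
import math
import statistics


def rmse(estimate, reference):
    """Root mean-squared error between two equally long rate series."""
    return math.sqrt(statistics.fmean((e - r) ** 2
                                      for e, r in zip(estimate, reference)))


def zero_mean_cc(estimate, reference):
    """Normalized cross-correlation at zero lag after removing the means."""
    e_mean, r_mean = statistics.fmean(estimate), statistics.fmean(reference)
    e = [v - e_mean for v in estimate]
    r = [v - r_mean for v in reference]
    num = sum(a * b for a, b in zip(e, r))
    den = math.sqrt(sum(a * a for a in e) * sum(b * b for b in r))
    return num / den


def bland_altman(estimate, reference):
    """Bias and 95% limits of agreement of the differences."""
    diffs = [e - r for e, r in zip(estimate, reference)]
    bias, sd = statistics.fmean(diffs), statistics.stdev(diffs)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Made-up respiration rates in breaths per minute, for illustration only.
reference = [40, 42, 45, 50, 48]
estimate = [41, 43, 44, 52, 47]
print(rmse(estimate, reference), zero_mean_cc(estimate, reference))
print(bland_altman(estimate, reference))
```

Because the means are removed before correlating, a constant offset between the two series leaves the coefficient unchanged, which is why the RMSE and the Bland-Altman bias are reported alongside it.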

Experimental Results and Discussion
A residual error remains in our automated processing pipeline: the automated method underestimates the respiration rate. This occurs because our band filter fails to remove motion caused by interruptions in the video, for example movement resulting from nurse care-handling. In the future, we will investigate a supervised approach to replace the unsupervised filters for removing noise from the respiration signals.
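The parameters of the band filter are not given in the text. As a minimal sketch of what such a step might look like, the following FFT-based band-pass keeps only frequencies in an assumed respiratory band. The cut-offs of 0.3 and 1.5 Hz (18 to 90 breaths per minute) and the function name `bandpass_fft` are hypothetical choices for illustration, not the settings used in this study.

```python
import numpy as np


def bandpass_fft(signal, fps, low_hz=0.3, high_hz=1.5):
    """Zero all frequency bins outside [low_hz, high_hz].

    The cut-offs (18 to 90 breaths/min) are assumed values, not the
    parameters used in the study.
    """
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fps)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=signal.size)


# Simulated 0.8 Hz breathing with a slow 0.05 Hz drift, 120 s at 8 fps.
fps = 8
t = np.arange(120 * fps) / fps
breathing = np.sin(2 * np.pi * 0.8 * t)
drift = 2.0 * np.sin(2 * np.pi * 0.05 * t)
filtered = bandpass_fft(breathing + drift, fps)
print(f"max deviation from the breathing component: "
      f"{np.max(np.abs(filtered - breathing)):.2e}")
```

A filter of this kind removes slow drift, but any disturbance whose frequency content falls inside the pass band is kept, which is consistent with the limitation described above for care-handling motion.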
We compared the effect of two different flow estimation methods on the final respiration calculation. The results show that the deep flow approach is sensitive to both homogeneous regions and the boundary area, whereas the conventional approach is more sensitive to the boundary area. This advantage of the deep flow-based approach improves the whole processing pipeline by increasing both accuracy and robustness. We consider that the accuracy of our algorithm for extracting the respiration signal may be affected by the captured image resolution and the image sensor noise. If the resolution is rather low, the movement information from different regions can be blended within one pixel, which is not ideal for accurate motion extraction. If the image sensor noise is so high that the noise components in the pixel values dominate the pixel changes induced by respiratory motion, the measurement will be polluted. A quantitative analysis of these two effects is a complicated matter and will be performed as future work.
Currently, the work is carried out as a feasibility study. The focus of this study is the installation, adaptation, and validation of camera-based monitoring technology in the NICU setting. We have not investigated the performance on preterm infants related to specific diseases. In the future, we will further validate our algorithm on infants having different health situations.
Our recordings were taken from real clinical practice without interfering with the clinical workflow. Our algorithm works when the respiratory motion can be observed by the camera, even as subtle movement, i.e., infants can be either naked or covered by a blanket (as long as the infant body has contact with the blanket such that the movement information from the thorax-abdomen can still be derived).
Our algorithm relies on the intensity of video frames for motion extraction (i.e., no color or chromaticity information is used). Therefore, for the nighttime condition, it is possible to measure the respiration signal by using the same software algorithms that we have proposed and simply switching to an infrared camera with an infrared lighting source. In low-light conditions, the performance of our system may be affected by the noise induced by the camera sensor.
The main advantage of using a video-based method to monitor respiration is its contact-free operation. Both CI and polysomnography need electrodes attached to the patient's skin, which increases the risk of skin irritation. Therefore, our method can improve the comfort level and convenience. A further benefit of using a camera is that it enables more measurements than contact-based bio-sensors, including physiological signals (e.g., breathing rate, heart rate, and blood oxygen saturation) [24,25] and contextual signals (e.g., body motion, activities, and facial expressions) [26-29]. This will enrich the functionality of a health monitoring system. Our system can be constructed with a generic webcam and an embedded computing platform, which forms a cost-effective solution. In principle, one camera can monitor multiple subjects/infants simultaneously, as long as they are captured by the camera view, while each contact-based bio-sensor can only monitor one single subject/infant.

Conclusions
In this study, we applied an automated pipeline to estimate respiration signals from videos for premature infants in NICUs. We compared our automatically extracted respiration signals to those extracted from the CI. The preliminary results are promising for further investigation of the video-based respiration monitoring method and for applying our automated respiration extraction for infants in NICUs. Experiments showed that the deep learning-based method outperforms the optical flow-based method in accuracy (lower RMSE) and robustness. In the future, we will investigate the possibility of directly applying a deep learning framework to estimate respiration rate. For example, an LSTM-based system can effectively incorporate temporal information for a regression task and is expected to further enhance the obtained results.