Machine Learning-Based Automatic Classification of Video Recorded Neonatal Manipulations and Associated Physiological Parameters: A Feasibility Study

Our objective in this study was to determine whether machine learning (ML) can automatically recognize neonatal manipulations, along with associated changes in physiological parameters. A retrospective observational study was carried out in two Neonatal Intensive Care Units (NICUs) between December 2019 and April 2020. Both video and physiological data (heart rate (HR) and oxygen saturation (SpO2)) were captured during NICU hospitalization. Neonatal manipulations were classified by a deep learning system consisting of an Inception-v3 convolutional neural network (CNN), followed by transfer learning layers of Long Short-Term Memory (LSTM). Physiological signals prior to manipulations (baseline) were compared with those during and after manipulations. The system was validated using the leave-one-out strategy with an input of 8 s of video exhibiting manipulation activity. Ten neonates were video recorded during an average length of stay of 24.5 days. Each neonate had an average of 528 manipulations during their NICU hospitalization, with the average duration of these manipulations ranging from 28.9 s for patting and 45.5 s for a diaper change to 108.9 s for tube feeding. The accuracy of the system was 95% for the training dataset and 85% for the validation dataset. In neonates <32 weeks' gestation, diaper changes were associated with significant changes in HR and SpO2, and, for neonates ≥32 weeks' gestation, patting and tube feeding were associated with significant changes in HR. The presented system can classify and document the manipulations with high accuracy. Moreover, the study suggests that manipulations impact physiological parameters.

and tube feeding. Lastly, we demonstrate the value of synchronized video and physiological data by describing variations in physiological parameters associated with the identified manipulations.

Materials and Methods
This section describes the methodology of acquiring, synchronizing, and analyzing neonatal NICU data captured with respect to manipulations.

Setting and Study Sample
Digital data were collected from a sample of neonates admitted to two NICUs over a three-month (April 2020-June 2020) duration. The study sites included 22 urban beds and 17 rural beds; both were level III NICUs in India. The urban NICU is staffed by three neonatologists with a doctorate in neonatal sciences, three residents, and 20 nurses. The rural NICU is staffed by three neonatologists with a doctorate in neonatal sciences, four residents, and 18 nurses. The Institutional Review Board of both NICUs approved the study with a waiver of informed consent. All electronic health records were de-identified (in accordance with the Health Insurance Portability and Accountability Act (HIPAA)), and all the research was performed according to relevant guidelines. Prior to the study, written consent to the video monitoring and physiological data acquisition was obtained from the parents of eligible neonates at both study sites. All the data were stored in de-identified form in the protected health information environment in compliance with HIPAA. Hemodynamically stable neonates who stayed in the NICU for more than 24 h and did not have assisted ventilation were eligible. Neonates with congenital anomalies or on palliative care were excluded.

Data Collection
A sample of 10 neonates was recruited for this study. De-identified individual patient admission-to-discharge data were electronically recorded using the iNICU platform [25]. This study was purely observational, and at no point in time were clinical decisions or interventions affected by study data results. The data were entered at the bedside on an iPad Pro tablet (12.9 inches, 2nd generation) using a Chrome browser and were stored in a PostgreSQL database. The clinical diagnoses of each neonate were determined by consulting neonatologists using the International Classification of Diseases (ICD), ninth revision, during daily rounds (morning, afternoon, and evening) performed at the patient bedside.

Video Acquisition of Manipulation
During the study, the physiological data of neonates were collected using the NEO device [26]. The NEO system was improved with an additional camera module, and its size was further reduced (Appendix B, section B: NEO TINY system). Figure 1 shows the setup in a typical NICU setting. The wall mount was installed at the same height as the baby warmer's top to minimize interference in the routine NICU workflow (Appendix B, Figure A1). The installed wall mount could be adjusted at the discretion of onsite clinicians. A Logitech C920 Universal Serial Bus (USB) camera was installed facing the neonate. All the units' beds were handled in the same way, and all the beds were equipped with cameras. The camera videos had a resolution of 1280 × 720 pixels and were recorded at 30 frames per second.
Video recording was continuous for most of each neonate's NICU stay, but the parents or clinical staff could switch off the recording for privacy reasons while the neonate was removed from the bed, such as during weight measurement, kangaroo care, or breastfeeding. Thus, intermittent video data segments of each neonate were available for further analysis.

Physiological Parameters of Manipulation
Along with live video recording, real-time physiological data were simultaneously captured from the patient monitors (Appendix B: section D). Because respiratory rate (RR) could not be recorded by all the monitors, this parameter was not used in the analysis. Heart rate (HR) and oxygen saturation (SpO2) were continuously recorded before, during, and after the manipulations.

Selection of Manipulations to be Studied
Video data were annotated manually with clinicians' help, and a spreadsheet was maintained for ground truth labels of the manipulations. The overall system architecture is presented in the flow diagram shown in Figure 2. Appendix B describes the (A) hardware, data acquisition, and synchronization of video and physiological data and (B) software specifications. Appendix C describes the clinical staff interface that shows an annotated video frame with physiological signals, the handling of missing data in the NICU environment, and data security. For the current feasibility study, we chose three commonly used non-invasive manipulations: (i) patting, (ii) diaper change, and (iii) tube feeding (defined in Table 1). The interventions were selected post hoc.

Input Data, Training, and Validation Data Set
Examples of video-captured patting, diaper change, and tube feeding manipulations are shown in Figure 3. Acquired video sequences were down-sampled to 15 frames per second (fps) to reduce redundant computations, and images were resized from the original 1280 × 720 pixels to color images of 720 × 480 pixels. Manipulations were initially divided based on category and neonatal identifier. Based on discussions with the clinical team, it was hypothesized that 8 s of video data for any neonatal manipulation were sufficient to distinguish between the different types. Therefore, for each manipulation, data were processed in 8-s windows of 120 frames each. The video clip corresponding to a manipulation was first extracted manually and considered a training sequence. The next video sequence was then extracted by sliding the cursor programmatically by 1 s to build the next 8-s subset. Although only the first 8 s were used for classifying the type of manipulation, all the frames were used for activity recognition. This process was repeated for the entire duration of the captured video of each manipulation. Appendix C explains how the clinical team visualized the video and physiological data.
Table 1 (tube feeding entry). Definition: This manipulation uses a soft tube placed through the nose (nasogastric) or mouth (orogastric) into the stomach. The feeding is provided through the tube into the stomach until the baby can take food by mouth [29]. Spatial features: nurse's hand, milk, syringe attached to the feeding tube (with or without plunger). Temporal features: frequency, 2 h; duration, 10-30 min.
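The windowing described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names `downsample_indices` and `sliding_windows` are hypothetical, the 30 fps source rate is taken from the camera description, and windows that would run past the end of a clip are simply dropped, which the text does not specify.

```python
def downsample_indices(n_source_frames, source_fps=30, target_fps=15):
    """Indices of the source frames kept when reducing the frame rate.

    Assumes source_fps is an integer multiple of target_fps.
    """
    step = source_fps // target_fps
    return list(range(0, n_source_frames, step))


def sliding_windows(n_frames, fps=15, window_s=8, stride_s=1):
    """(start, end) frame index pairs, end exclusive, for each full window.

    With the defaults, each window holds 8 s * 15 fps = 120 frames and
    consecutive windows start 1 s (15 frames) apart.
    """
    window = fps * window_s
    stride = fps * stride_s
    return [(s, s + window) for s in range(0, n_frames - window + 1, stride)]


# A hypothetical 10 s manipulation recorded at 30 fps.
kept = downsample_indices(10 * 30)
windows = sliding_windows(len(kept))
print(len(kept), windows)
```

Under these assumptions a manipulation shorter than 8 s yields no window at all, so very brief manipulations would need separate handling.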

Classification of Manipulation Using Convolutional Neural Network (CNN)
Image classification has matured to the stage where facial recognition has become part of consumer phones. Industrial-grade algorithms trained on large existing datasets are now available and can be used to detect different images as per specific business domain requirements. In the current study (Figure 4), an existing pretrained Inception-v3 CNN model [30] was used with prior ImageNet weights for color (Red Green Blue; RGB) images. The CNN-based model was then further improved through transfer learning [31], wherein the output of a pre-trained model (such as InceptionV3) is trained for the specific task at hand. In our study, the task was to recognize the neonatal manipulations, and there are currently no established databases of neonatal procedures. We conducted the transfer learning process by training on our annotated images marked as (i) patting, (ii) diaper change, and (iii) tube feeding. This step improves the accuracy of the manipulation-tagging model.
The performance of the InceptionV3 CNN model with the transfer learning layer was also visualized with a t-Distributed Stochastic Neighbor Embedding (t-SNE) plot [31,32], which takes perplexity as a user-specified input parameter. Perplexity corresponds to the effective number of neighbors considered when obtaining the embeddings, and t-SNE was shown to be robust to perplexity values in the range of 5-50 [33]. We picked a perplexity value of 35, which gave the best visual segregation of the neonatal manipulations. The individual image frames of videos were resized to 226 × 226 pixels for input to the Inception-v3 model.

Activity Recognition Combining CNN Output with LSTM
From a computer vision perspective, a neonatal manipulation, such as a diaper change, is a collection of image frames collected over time that together encapsulate the activity. Therefore, we wrapped the pre-trained CNN model in a time-distributed layer to capture the concept of a manipulation as a sequence of images. The time-distributed CNN model outputs a 2048-dimensional feature vector. This vector conveys information about constituent objects, such as the neonate, the clinical staff, diapers, syringe, and plunger, along with their spatial attributes and how they correlate during the manipulations. It is not feasible to visualize these vectors in a human-readable format in the current deep learning landscape.
CNN models are very accurate in classifying individual images, whereas sequence models, such as Long Short-Term Memory (LSTM) networks, have progressed to identify activities that unfold over time. After training of the combined CNN and LSTM, the system can automatically classify the neonatal manipulations.
We used the weights of the CNN (InceptionV3) model to extract features from the images and combined them with LSTM layers to perform activity recognition. The sequence of 2048-dimensional feature vectors, an output of the InceptionV3 model representing the activity in a manipulation, was input to the LSTM model. The LSTM layers were followed by additional dense layers and a softmax output layer for the three manipulation classes. An early stopping criterion with a patience of 8 was employed: the validation loss is monitored, and training stops when the loss has not improved for eight successive epochs. The model was implemented in Keras [34] and TensorFlow [35] and used the 'categorical cross-entropy' loss function and the 'adam' optimizer. The EarlyStopping callback was used to stop training at the epoch when the monitored metric stopped improving [36].
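The patience rule described above can be made concrete with a small framework-free sketch. It is not the authors' training code: the function name `early_stopping_epoch` is hypothetical, the loss values are synthetic, and the rule shown (stop once the validation loss has failed to improve on its best value for `patience` consecutive epochs) is the usual reading of a patience-based criterion.

```python
def early_stopping_epoch(val_losses, patience=8):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss   # new best loss: reset the counter
            wait = 0
        else:
            wait += 1     # no improvement in this epoch
            if wait >= patience:
                return epoch
    return None


# Synthetic curve: improves for three epochs, then stays flat.
losses = [1.0, 0.8, 0.6] + [0.7] * 10
print(early_stopping_epoch(losses))
```

With a patience of 8, a single noisy epoch does not end training; only a sustained plateau does.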

Variation in Physiological Signals Associated with Manipulation
The variations in physiological parameters during manipulations were compared with those of baseline (defined as 5 min before the manipulation) and post-manipulation (defined as 5 min after the manipulation).
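As an illustration of this comparison, the sketch below averages a vital sign over the three phases of one manipulation. It assumes the samples are available as (timestamp in seconds, value) pairs and that the manipulation start and end times come from the video annotation; the function name `phase_means` and the sample values are hypothetical.

```python
from statistics import mean


def phase_means(samples, start_s, end_s, margin_s=300):
    """Mean value in the baseline, during, and after phases.

    samples: iterable of (timestamp_s, value) pairs, e.g. HR readings.
    Baseline is the 5 min (300 s) before the manipulation starts and
    'after' is the 5 min following its end. A phase with no samples
    maps to None.
    """
    phases = {
        "baseline": (start_s - margin_s, start_s),
        "during": (start_s, end_s),
        "after": (end_s, end_s + margin_s),
    }
    result = {}
    for name, (lo, hi) in phases.items():
        values = [v for t, v in samples if lo <= t < hi]
        result[name] = mean(values) if values else None
    return result


# Synthetic HR trace sampled every 10 s; manipulation from 300 s to 400 s.
trace = [(t, 140 if t < 300 else 150 if t < 400 else 145)
         for t in range(0, 900, 10)]
print(phase_means(trace, start_s=300, end_s=400))
```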

Performance Metrics
We measured the performance of the CNN/LSTM model in the classification of neonatal manipulations using Positive Predictive Value (PPV) (Equation (1)), Sensitivity (Equation (2)), and F-measure (Equation (3)), which are defined as:

$$\mathrm{PPV} = \frac{TP}{TP + FP} \tag{1}$$

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{2}$$

$$\text{F-measure} = \frac{2 \times \mathrm{PPV} \times \mathrm{Sensitivity}}{\mathrm{PPV} + \mathrm{Sensitivity}} \tag{3}$$

where TP, FP, and FN are the numbers of true positives (TP: a patting, diaper change, or tube feeding manipulation detected correctly), false positives (FP: the system detects a manipulation when there is none), and false negatives (FN: there is a manipulation that the system does not detect). For data with a normal distribution, a two-sided paired t-test with a significance level of p < 0.05 was used to compare physiological parameters during and after manipulations with the baseline. The two-sided test reflects our assumption that the physiological values may increase or decrease during and after manipulations in comparison to the baseline data.
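The metrics and the paired comparison described in this section can be sketched in a few lines. This is a minimal illustration, not the study's analysis code: the F-measure is assumed to take its standard form as the harmonic mean of PPV and sensitivity, the function names are hypothetical, and the heart rate values are synthetic. Only the t statistic and its degrees of freedom are computed; obtaining a p-value would require a t-distribution table or a statistics library.

```python
from math import sqrt
from statistics import mean, stdev


def ppv(tp, fp):
    """Positive predictive value: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def sensitivity(tp, fn):
    """Sensitivity: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(tp, fp, fn):
    """Harmonic mean of PPV and sensitivity (assumed standard form)."""
    p, s = ppv(tp, fp), sensitivity(tp, fn)
    return 2 * p * s / (p + s) if (p + s) else 0.0


def paired_t_statistic(baseline, other):
    """Paired t statistic and degrees of freedom for matched samples.

    Requires at least two pairs and non-identical differences
    (otherwise the standard deviation of the differences is zero).
    """
    diffs = [o - b for b, o in zip(baseline, other)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Synthetic counts and synthetic mean HR values (not study data).
print(ppv(8, 2), sensitivity(8, 2), f_measure(8, 2, 2))
hr_baseline = [140, 150, 145, 155, 160]
hr_during = [146, 154, 151, 159, 168]
print(paired_t_statistic(hr_baseline, hr_during))
```

For the synthetic example, the statistic would be compared against the two-sided critical value for 4 degrees of freedom (2.776 at the 0.05 level).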

Overall Activity Detection Model Evaluation
The model was evaluated using leave-one-out cross-validation (LOOCV) with the PPV and sensitivity metrics. In the NICUs involved in the current study, nurses did not document routine care activities, such as diaper change and patting, in the EMR system. The comparison of tube feeding records between documented nursing records and automated tube feeding notes highlights the additional temporal data captured by the machine learning-based automated classification system. The tube feeding duration and the time elapsed since the last tube feeding were not captured in current EMR records.
Based on the visual investigation of data with the clinical team, spatial and temporal features in manipulations were documented (Table 1) to understand the classification task.
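The leave-one-out evaluation mentioned in this section can be sketched as a fold generator. The text does not state what is left out in each fold, so the sketch assumes that all clips of one neonate are held out at a time; the function name `leave_one_out_folds` and the clip labels are hypothetical.

```python
def leave_one_out_folds(clips_by_neonate):
    """Yield (held_out_id, train_clips, validation_clips) per fold.

    clips_by_neonate: dict mapping a neonate identifier to its list of
    clips. Each fold holds out every clip of exactly one neonate
    (an assumption; the held-out unit is not specified in the text).
    """
    for held_out in sorted(clips_by_neonate):
        train = [clip
                 for neonate_id, clips in clips_by_neonate.items()
                 if neonate_id != held_out
                 for clip in clips]
        validation = list(clips_by_neonate[held_out])
        yield held_out, train, validation


# Hypothetical clip labels for three neonates.
data = {
    "n01": ["patting_a", "diaper_a"],
    "n02": ["tube_b"],
    "n03": ["patting_c", "tube_c"],
}
for held_out, train, validation in leave_one_out_folds(data):
    print(held_out, len(train), len(validation))
```

Holding out whole neonates keeps frames from the same bedside out of both the training and validation sets, which avoids an optimistic estimate when clips from one neonate look alike.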

Results
The results of the feasibility study conducted to verify the design of the automated tagging of manipulations are presented below.

Baseline Data
Ten neonates admitted to the NICU were enrolled from December 2019 to April 2020. The baseline characteristics of study subjects are displayed in Table 2. The mean gestational age was 34.7 weeks (range, 26 weeks to 40 weeks), and the mean birth weight of study subjects was 1893.8 g (range, 800 g to 3231 g). Table 3 shows the average duration of patting, diaper change, and tube feeding. A total of 64 diaper changes (average duration, 45.5 s), 108 tube feedings (average duration, 108.9 s), and 167 patting episodes (average duration, 28.9 s) were recorded and utilized for analysis.

RDS: Respiratory Distress Syndrome, NNH: Neonatal Hyperbilirubinemia.

CNN Based Classification of Manipulations
The 2048 features generated from manipulation images were plotted using t-SNE visualization (a) without transfer learning, that is, without knowledge of the current domain, and (b) with transfer learning. Without transfer learning (Figure 5a), the features of the ImageNet-based Inception-V3 pre-trained model do not separate the neonatal manipulations. After transfer learning, however, except for a few outliers, the features of the transfer learning-based Inception-V3 model separate the images of patting, diaper change, and tube feeding successfully (Figure 5b). The accuracy of the CNN-based model in classifying the manipulation frame/image is displayed in Figure 6. The validation accuracy (red) was achieved after eight epochs.

LSTM Based Classification of Manipulation Videos
The automatic video classification was validated in clinical settings, and the accuracy was 85% on the validation dataset. The comparison of data from the NEO TINY system (NTS) with nurse-documented procedures is shown in Table 4. The 2048 features from the Inception-V3 model were generated for all frames present in the duration of the manipulation video. The performance of the deep learning model is presented in Table 5. The model automatically annotates the manipulation of a given neonate. Figure 7 demonstrates different manipulations that were classified by the CNN/LSTM model during the validation phase.

Physiological Signal Variations during Manipulations
Figure 8a-c shows variations in physiological parameters during the patting, diaper change, and tube feeding manipulations, respectively. There was an associated increase in normalized heart rate from the period before to the period during the manipulation for neonates <32 weeks' gestation (shown in blue) for all three manipulations. Table 6 shows the HR and SpO2 physiological variables for each of the three manipulations. The significant changes (p < 0.05) were: (I) for <32 weeks: (a) HR increased during diaper changes and decreased afterward, and (b) SpO2 increased during the diaper change; (II) for ≥32 weeks: (a) HR increased during patting and decreased afterward, and (b) HR decreased after tube feeding.

Discussion
The NICU environment is highly complex, with critically ill neonates who require multiple medical devices, such as patient monitors, ventilators, syringe pumps, and infusion pumps. These many devices leave minimal working space for movement around the bedside. Therefore, a data aggregator, NTS, was developed to capture valuable data with a small footprint; with its pocket-sized design (5.8 cm × 4.1 cm × 7.7 cm), it is suited to cluttered workspaces and roaming device workflows. For video monitoring, the camera was wall-mounted above the neonate's bed to avoid interfering with routine workflow in the NICU. The NTS client device synchronizes the acquired medical device and video data and sends them to the EMR platform. The platform displays the live video feed of a neonate, along with all the acquired vital parameter data for clinical interpretation.
The framework presented in this study can enable automatic identification of manipulations, generate corresponding EMR documentation of those manipulations, and measure changes in physiological parameters. The study demonstrates a machine learning model to classify three common neonatal care manipulations: (a) patting, (b) diaper change, and (c) tube feeding. It is important to highlight that transfer learning for classifying manipulations like tube feeding will strongly depend on local practices, such as syringe use, the position of the tube end, and even the use of gloves (and their colors). The authors anticipate that NICUs in a given geographical region or associated with similar neonatal research networks can develop a unique dataset of images as per their practices. This dataset can be readily used as a 'training' module for the system for that group of NICUs.
In this study, the model classified the manipulations with 95% accuracy on the training dataset (accompanying loss of 0.0026) and 85% on the validation dataset (accompanying loss of 0.0409). Physiological parameters during the manipulations were compared with those captured before and after the manipulation. In neonates <32 weeks' gestation, diaper changes were associated with significant changes in HR and SpO2 (perhaps due to crying with subsequently increased minute ventilation). In comparison, for neonates ≥32 weeks' gestation, patting and tube feeding were associated with significant changes in HR. The health impact of these vital sign changes associated with routine care practices is unclear. The ability to detect continuous changes in physiological parameters associated with machine learning-driven monitoring of common neonatal manipulations in the NICU illustrates the capability of the NTS model, which could be used for further analysis of how neonatal manipulations and procedures impact short- and long-term outcomes.
Most NICUs have strict light and sound control protocols, both in the larger NICU environment and in the local environment of each neonate. In the current study, open incubators were used with most of the neonates. The lights in the NICU were recommended to be dim most of the time. We did not find any difference in the automatic classification of manipulations in different light conditions. However, this finding needs to be confirmed with a larger sample size. In future studies, the feasibility of the night vision mode of these cameras needs to be explored in poor light conditions. Moreover, the advent of 3-D cameras allows manipulation-specific data to be captured, which will also be explored in future efforts. With the emergence of artificial intelligence, it is anticipated that continuous monitoring and analysis will help avoid unnecessary manipulations that may cause a negative neuro-sensorial stimulus to premature and sick neonates. If specific neonatal manipulations and procedures are associated with worse outcomes, future research using the NTS model could assess how modifying routine care practices to target vital sign ranges could improve outcomes.

Limitations
While the presented study shows promise for future NICU neonatal monitoring applications, certain limitations need to be considered. As a pilot study to assess the feasibility of the system, only a small number of patients were recruited. Future studies will need to assess potential differences regarding gender, different gestational age groups, and other demographic parameters. A larger cohort of neonates will need to be recruited to build a physiological database that will provide more balanced data for machine learning models to simulate the NICU environment. The presented approach only utilized labeled data of three manipulations for ten neonates. The recognition capabilities of the deep learning model can be explored further by including the data of more manipulations and more neonates. In the current study, monitors did not capture per-second data; hence, the study lacks the resolution of physiological data required for the detailed analysis of brief manipulations or procedures (e.g., some neonatal manipulations or procedures, such as heel prick, last only a few seconds). The study did not consider the medications that neonates were receiving during their stay in the NICU; since sedatives and analgesics can potentially affect the stress experienced by neonates [37], future studies should consider individual patient drug dosages and half-lives.

Conclusions and Future Directions
The present study demonstrates a framework to help clinical staff evaluate changes in physiological parameters associated with common care manipulations in the NICU. Due to the limitations of human resources, close and constant observation of neonates on a 24-h basis is a challenge. The current study model, which utilizes state-of-the-art computer vision and analyzes physiological parameter variations, may be a useful adjunct to assess neonates. Moreover, this framework will be extended to build video databases for other neonatal manipulations and procedures, which can be used for (a) skill evaluation of clinical staff and (b) improving the care documentation. Although the current results showed the feasibility of the system, its efficiency still needs to be studied in the larger NICU population across different sites. Another future direction is to include surrounding contextual data, such as lighting conditions, ambient noise in the NICU, and the number of clinical staff around neonates, to study the overall effect on the neonates while conducting manipulations.
Future studies will capture real-time physiological data from bedside monitors in millisecond resolution synchronized with the video data. The millisecond data will help study the impact of non-invasive and invasive manipulations (such as heel prick, intubation, and extubation) in a more granular manner, together with associated clinical events such as apnea, bradycardia, and desaturations. Recent advances in the computer vision and deep learning community have shown successful use of semi-supervised and unsupervised domain adaptation techniques. These methods could be leveraged to further reduce the data labeling requirements while adapting the proposed system to new NICUs. In addition, given reported racial disparities in neonatal care in the United States [38,39], the NTS system could be used to study racial inequities in the NICU regarding the average time dedicated to care manipulations of neonates from different racial backgrounds to provide quantifiable, informative data to healthcare providers.

Code Availability
The code that underpins the video analytics documentation is openly available. A Jupyter Notebook containing the code used to generate the descriptive statistics and tables included in this paper is available at: https://github.com/CHIResearch/IEEEVideo. The README.md file contains the script-related and other details.

Funding: This research project is funded privately by Child Health Imprints (CHIL) Pte. Ltd., Singapore.

Conflicts of Interest:
The Child Health Imprints (CHI) as an organization is focused on using technology to improve outcomes in NICU. It is disclosed that all the associated members are employees of CHI. The team has created iNICU, NEO, and analytics modules focused on the early prediction of disease and optimizing outcomes. Harpreet Singh and Ravneet Kaur are co-founders and own stock in the CHI. The informatics and clinical advisory team are responsible for providing academic inputs.

A. Wall Mount
The camera was placed on a wall mount installed at the same height as the top of the baby warmer to minimize interference with the routine NICU workflow (Figure A1). The wall mount has three divisions that provide 360-degree rotation, along with horizontal and vertical shift, so that the camera's field of view can be placed on the neonate's body.

B. NEO TINY System: Hardware Design
The NEO TINY system is a small form factor NEO device that can easily be set up along with existing patient monitors, ventilators, and other biomedical devices connected to a neonate in the NICU setting (Figure A2). The NTS client module captures the video and physiological data of neonates. It collects physiological data from medical devices, such as bedside monitors and ventilators, along with video data from USB-based cameras. Figure A3 shows the NTS client hardware, including its dimensions and the various networking ports for integration.

Figure A2. Size comparison of a 3 × 3 Rubik's cube and NEO TINY system client.

C. Hardware Specifications
There is one RS232 interface that connects to the serial port of the internal NanoPi NEO Core2 single-board computer (SBC) and enables it to communicate with medical devices through the RS232 connector (Figure A4), which is mounted on the main Printed Circuit Board (PCB) and is visible through the external top face of the NEO TINY device.
There is an RJ45 connector that is mounted on the main PCB and is connected to the NanoPi NEO Core2 SBC's RJ45 interface. A USB hub is also provided on the main PCB to connect up to three USB-compatible devices. The front panel of the device has a Thin Film Transistor (TFT) Liquid Crystal Display (LCD) screen that shows notifications and alert messages. The NTS client can be powered using a 5 V, 2 A USB adaptor or battery backup and can be switched on/off using a slider switch. The NTS client's power supply includes step-up converters and a battery-charging integrated circuit (IC) to ensure a 5 V supply throughout the device. The mainboard of the NTS client is populated with one micro USB port for charging and one micro USB port for programming purposes. Table A2 provides the hardware specification of NEO TINY.
The NTS client software layer uses a Java version 1.8 based program to capture medical device data using the Health Level Seven (HL7) protocol on a Debian operating system. The acquisition of medical device data and the associated biomedical protocols have been previously described in detail [26].
The NTS client captures video stream data from a Logitech USB camera with a built-in H.264 encoder using the Video4Linux version 2 (V4L2) application programming interface (API) [45]. The on-camera H.264 encoding minimizes the compute load on the NTS client device and ensures higher compression than other encoders. The video stream is sent via Wi-Fi to the streaming engine server within the NICU premises [46]. The transmission of video data occurs in two stages: (I) First, the avconv command (a Unix command) is used to grab data from the USB camera and transmit the video stream to the NTS client over the low-latency User Datagram Protocol (UDP) [47]. (II) In the second stage, the Secure Reliable Transport (SRT) protocol is used to transmit the video stream from the NTS client to the server [48,49]. The NTS system has a latency of 1-2 s and uses up to 5 Mbps of network bandwidth to display live video feeds at a resolution of 1280 × 720 pixels.

E. Synchronization of Video and Physiological Data
The server layer, referred to as the NTS Server, receives both the video stream and the medical device data of the neonates. The live video is based on the Logitech camera clock, whereas the physiological data from the cardio-respiratory monitors are based on the equipment clock (Figure A5). The clocks of the camera acquiring video data and the bedside monitor capturing the physiological data were manually configured to the same time zone (UTC, Coordinated Universal Time), as described in Table A3.
Video Data Capturing by NTS Client
Two system services run on the NTS system: Stream Publish and SRT Wrapped.
The Stream Publish system service runs a script called streamPublish.sh, present in the "/usr/local/streampublish" directory. In the streamPublish.sh script, "capture" is the output build file of a V4L2 API program written in C, which captures the H.264-encoded stream from the camera at a resolution of 1280 × 720 pixels and 30 FPS. The avconv command then takes the captured stream from the capture file through a pipe and transmits it to localhost at port 1000 (127.0.0.1:1000) using UDP.
The SRT Wrapped system service runs a script called srtwrapped.sh. Here, also, if the srtwrapped.sh script crashes for some reason, the system service tries to restart the script automatically after 5 s. The srtwrapped.sh script sends the stream from localhost to the Wowza server IP, at a designated port assigned to the NEO TINY, using the SRT protocol.
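The restart-after-5-s behavior described for the system services can be illustrated with a minimal Python sketch. This is not the service definition used in the study (which is a system service, not Python); the function name `supervise`, the `max_restarts` budget, and the injectable `sleep` argument are assumptions introduced only for illustration.

```python
import time
from typing import Callable

RESTART_DELAY_S = 5  # restart delay stated in the text


def supervise(run_once: Callable[[], int],
              max_restarts: int,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `run_once` until it exits cleanly (returns 0) or the
    restart budget is used up. Returns the number of restarts performed.

    `max_restarts` is a hypothetical safety limit for this sketch;
    the text does not state one.
    """
    restarts = 0
    while True:
        try:
            exit_code = run_once()
        except Exception:
            exit_code = 1  # treat an unhandled exception as a crash
        if exit_code == 0 or restarts >= max_restarts:
            return restarts
        sleep(RESTART_DELAY_S)  # wait 5 s before restarting
        restarts += 1
```

In practice, `run_once` could wrap a call such as `subprocess.call(["/bin/sh", "srtwrapped.sh"])`, so that a non-zero exit code triggers the delayed restart.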
In the current study, the GE B40® patient monitor (GE Healthcare, Milwaukee, WI, USA), the SureSigns® VM6 patient monitor (Philips Medical Systems, Inc., Cleveland, OH, USA), and the Philips Intellivue MP70 (Philips, Andover, MA, USA) were used. Both video and physiological data, collected using the NTS client, are updated with the NTS client clock time. The NTS client clock is synchronized with NTP (Network Time Protocol) at regular time intervals. The time difference between video and physiological signals is adjusted at regular time intervals. In the current study, regular time intervals of 10, 30, 60, and 120 min were tried for offset calculations; 60-min video recordings were optimal. The server records the incoming video stream by splitting the recording every hour to manage the offsets [50]. The current time of the NTS client is injected as metadata into both video and physiological data streams. Moreover, a scheduled cron job runs every 30 min to synchronize the NTS client's clock with the NTP server (configurable using XML).
As both video and physiological signals are captured, the offset (difference) between the two clocks increases, depending on hardware and processing capabilities. This offset, in milliseconds, is given in Table A3. After 60 min, the time offset between the clocks was around 549 milliseconds. To ensure the synchronization of video and monitor data, the video clock was reset every 60 min by the offset time. The combined data are displayed to users in an HTML5-based web application (Figure A6). The live video stream is displayed using a Web Real-Time Communication (WebRTC) video player [51], and physiological trends are shown using Highcharts (a software library for charting written in JavaScript) [52].
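The offset handling described above can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes that the offset grows linearly within each 60-min segment (only the 549 ms value at 60 min is taken from the text) and that the video clock runs ahead of the monitor clock. The function names are hypothetical.

```python
SEGMENT_S = 60 * 60          # video is split and the clock reset every 60 min
OFFSET_AT_60_MIN_MS = 549.0  # offset reported in the text after 60 min


def estimated_offset_ms(elapsed_s: float) -> float:
    """Estimated video-vs-monitor offset since the last hourly reset.

    Assumption: linear drift within a segment.
    """
    within_segment = elapsed_s % SEGMENT_S
    return OFFSET_AT_60_MIN_MS * within_segment / SEGMENT_S


def align_video_time_ms(video_time_ms: float, elapsed_s: float) -> float:
    """Shift a video timestamp onto the monitor clock.

    Assumption: the video clock is ahead, so the offset is subtracted.
    """
    return video_time_ms - estimated_offset_ms(elapsed_s)
```

Under these assumptions, a frame captured 30 min into a segment would be shifted by about half of the hourly offset, and the estimate returns to zero at each hourly reset.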

B. Missing Data in the NICU Environment
The NTS client-server architecture can be affected by various factors, such as bandwidth and network issues, which can delay data reception on the NTS server. The payload of each client request, in JSON (JavaScript Object Notation) format, is one kilobyte in size and consists of (a) medical device information, (b) patient data, and (c) NTS client information. The small payload size allows the NTS client to operate in low-bandwidth settings with a minimum internet requirement of 5 Mbps. In addition, the acquired patient data are stored locally on the NTS client and are sent to a cloud-based NTS server at regular intervals. The NTS client has an on-device storage capacity of 8 GB. The server also evaluates the transmission performance of all the NTS clients and notifies the user of any data loss during a given timeframe. Since the device data capturing resolution in the present study is set to 1 min, a total of 1440 data points are received for each patient in 24 h.
During the patient's NICU stay, physiological signal acquisition is affected by data disconnection caused by sensors falling off or poor contact. The vital tracker displays the total number of data points received for a given patient (Figure A7). If the NTS server does not receive physiological data for 5 min, the patient's placard flashes red, and audio-visual alarms are generated to notify the onsite clinical staff (Figure A8). To account for the effect of these external factors on the quality of physiological signals, extreme values that were not associated with clinical events were excluded from the analysis.
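The two checks described above (data points received against the 1440 expected per day, and the 5-min disconnection alarm) can be sketched in Python. This is an illustrative sketch rather than the NTS server code; the function names and the use of received-sample timestamps as input are assumptions.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

EXPECTED_PER_DAY = 24 * 60        # 1-min resolution gives 1440 points per 24 h
ALARM_GAP = timedelta(minutes=5)  # gap after which the placard alarm is raised


def completeness(received: int, expected: int = EXPECTED_PER_DAY) -> float:
    """Fraction of the expected data points that were actually received."""
    return received / expected


def find_alarm_gaps(timestamps: List[datetime]) -> List[Tuple[datetime, datetime]]:
    """Return (last_seen, next_seen) pairs where no data arrived for 5 min or more."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a >= ALARM_GAP]
```

For example, samples received at minutes 0, 1, 2, 8, and 9 would yield a single gap between minute 2 and minute 8, which corresponds to a period in which the disconnection alarm would have been active.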

Figure A8. Baby placard notifying the disconnection of sensor capturing physiological data (the red icon is flashed continuously until the physiological data resumes).

C. Data Security
Data transmission frequencies vary among medical devices. Some devices, such as cardio-respiratory monitors, continuously send data at a regular 60-s interval, whereas others, such as blood gas devices, are used and transmit data intermittently. Depending on a patient's respiratory needs, continuous positive airway pressure devices or ventilators are utilized and provide data at defined intervals (usually multiple values per minute). NTS enables data acquisition from various devices based on specific protocols, such as HL7 (Health Level Seven) [53], ASCII (American Standard Code for Information Interchange), ASTM (American Society for Testing and Materials) [54], binary, or proprietary. Moreover, in the current study, the video camera sends the streaming feed at 30 fps. The data acquisition module transmits the acquired video and medical device data at a per-minute resolution.
The medical environment is highly regulated, and patient data need to adhere to HIPAA (Health Insurance Portability and Accountability Act). The data transmitted by NTS clients are protected by HTTPS (Hypertext Transfer Protocol Secure) with 256-bit encryption. Each NTS client is configured with an IP address and a server port to transmit the data. The ports on NTS clients are enabled only as required by the connected medical devices. Private keys, protected by PKI (public key infrastructure), are needed to enable remote access protocols, such as Secure Shell (SSH). Data stored on the different servers are protected by roles and rights assigned to the users. The servers are equipped with disaster recovery mechanisms and are protected by firewalls. Each data node is kept in three different data centers to provide replication in case one server crashes.