A Software-Defined Sensor System Using Semantic Segmentation for Monitoring Remaining Intravenous Fluids
Highlights
- What are the main findings?
- A smartphone-compatible computer vision system using PSPNet delivers highly accurate segmentation of IV fluid containers and fluid levels from image data alone.
- The proposed system outperforms general-purpose segmentation models such as the Segment Anything Model (SAM), particularly on transparent medical containers.
- What are the implications of the main finding?
- Enables low-cost, sensorless, real-time IV fluid monitoring in clinical and home-care settings, reducing nursing workload and improving patient safety.
- Demonstrates the feasibility of software-defined sensing in medical applications, replacing traditional hardware-based IV fluid sensors with deep learning solutions.
Abstract
1. Introduction
2. Related Work
2.1. Traditional IV Fluid Monitoring Methods
2.2. Commercial IV Monitoring Devices
2.3. Computer Vision in Medical Applications
2.4. Advances in Semantic Segmentation
2.5. Gaps and Contributions
- Traditional sensors and commercial monitors do not provide direct volume estimation of IV fluids.
- Many vision-based methods lack granularity or are constrained by hardware requirements.
- General-purpose segmentation models struggle with the transparency and reflectivity of IV containers.
3. Materials and Methods
3.1. Overview of Semantic Segmentation
3.2. PSPNet Architecture for Fluid Segmentation
- Feature Extraction (Backbone: ResNet-50): PSPNet uses ResNet-50 as its backbone for feature extraction. This version of ResNet was selected as a practical trade-off between segmentation accuracy and computational efficiency, which is essential for real-time deployment on smartphones and embedded devices. Table 1 summarizes the performance comparison between ResNet-50 and a deeper alternative, ResNet-101.
- Pyramid Pooling Module (PPM): The ResNet-generated feature maps are passed into the PPM, which performs pooling at multiple spatial scales (e.g., 1 × 1, 2 × 2, 3 × 3, 6 × 6). These pooled features are upsampled and concatenated with the original map to capture both local details and the global context—critical for segmenting transparent fluids in complex clinical settings.
- Segmentation Prediction: The integrated multi-scale features are processed through convolutional layers and upsampled using bilinear interpolation to match the input resolution. This allows the model to output dense segmentation maps distinguishing two classes: Vessel and Liquid General.
These design choices respond to three challenges specific to IV fluid segmentation (see the PPM sketch after this list):
- The need to distinguish transparent and reflective surfaces;
- Handling variations in container size and shape;
- Segmenting fluid that may have weak visual boundaries.
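For concreteness, the sketch below shows the pooling-and-fusion step described above in PyTorch. It is a minimal illustration rather than the authors' exact implementation: the channel counts and the use of `AdaptiveAvgPool2d` are assumptions consistent with common PSPNet reproductions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPoolingModule(nn.Module):
    """Pools the backbone feature map at several scales (1x1, 2x2, 3x3, 6x6),
    projects each pooled map to fewer channels, upsamples each back to the
    input resolution, and concatenates everything with the original map."""

    def __init__(self, in_channels: int = 2048, pool_sizes=(1, 2, 3, 6)):
        super().__init__()
        branch_channels = in_channels // len(pool_sizes)  # 512 for ResNet-50
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(size),  # pool down to size x size
                nn.Conv2d(in_channels, branch_channels, kernel_size=1, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for size in pool_sizes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[2:]
        # Upsample every pooled branch back to the input resolution.
        pooled = [
            F.interpolate(branch(x), size=(h, w), mode="bilinear", align_corners=False)
            for branch in self.branches
        ]
        # Fuse local detail (original map) with multi-scale global context.
        return torch.cat([x] + pooled, dim=1)
```

The fused map then feeds the prediction head, which outputs the two classes (Vessel and Liquid General) at input resolution.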
3.3. Model Architecture Components
- Skip Connections (ResNet-50): The model uses a residual architecture with skip connections, allowing the network to preserve both low-level and high-level spatial features essential for accurate boundary detection of transparent fluids.
- Dilated Convolutions: Within the ResNet backbone, dilated convolutions expand the receptive field without reducing feature resolution. This enables the network to capture a wider context, helping it delineate fluid regions that lack clear edges or contrast.
- Pyramid Pooling Module (PPM): While standard FCNs lack context aggregation, our model includes a multi-scale pooling module that enriches the feature map with local and global spatial contexts. This improves the model’s ability to handle irregular IV container shapes and occlusions, which are common in real-world clinical scenarios.
- Efficient Upsampling: Instead of the transposed convolutions often used in FCNs, we use bilinear interpolation to upsample feature maps to the original input size. This method is both computationally efficient and less prone to checkerboard artifacts, preserving smooth boundaries between the vessel and fluid regions.
- Clinical Adaptation of Canonical PSPNet: In contrast to the original PSPNet, which uses ResNet-101 and is trained on general-purpose datasets like PASCAL VOC, our model is optimized for domain-specific deployment. We use ResNet-50 to reduce inference time and memory usage for mobile compatibility (as detailed in Section 3.2). We fine-tune PSPNet exclusively on IV-specific data involving transparent fluids, irregular containers, and varied lighting, addressing segmentation challenges not considered in the canonical implementation.
These architectural choices—dilated residual encoding, pyramid-based context fusion, efficient upsampling, and task-specific training—enable robust segmentation performance under real-world hospital conditions. The resulting system is both technically optimized and clinically relevant for software-defined sensing.
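Two of these components are easy to make concrete. The sketch below is again a minimal illustration rather than the authors' code: it shows (a) a dilated 3 × 3 convolution that widens the receptive field without downsampling, and (b) a prediction head that restores input resolution with bilinear interpolation; channel sizes are assumed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# (a) A 3x3 convolution with dilation=2 covers a 5x5 neighborhood while
# keeping the spatial resolution unchanged: wider context, same map size.
dilated = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)

class SegmentationHead(nn.Module):
    """(b) Projects the fused PPM features to per-class logits, then restores
    input resolution with bilinear interpolation instead of transposed
    convolutions, avoiding checkerboard artifacts. The 4096 input channels
    assume 2048 backbone channels + 4 x 512 PPM branches."""

    def __init__(self, in_channels: int = 4096, num_classes: int = 2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 512, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, num_classes, kernel_size=1),  # Vessel, Liquid General
        )

    def forward(self, fused: torch.Tensor, out_size: tuple) -> torch.Tensor:
        logits = self.conv(fused)
        return F.interpolate(logits, size=out_size, mode="bilinear",
                             align_corners=False)
```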
3.4. System Design and Data Flow
- Vessel (the IV container boundary);
- Liquid General (the fluid region inside; see the post-processing sketch after the next list).
- Non-contact, low-cost monitoring using standard mobile hardware;
- Flexible deployment either locally at the bedside or via centralized hospital infrastructure;
- Integration into broader healthcare automation systems such as nurse robots or patient monitoring dashboards.
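This extraction does not state the exact volume-estimation formula, so the following is purely a hypothetical sketch: one plausible post-processing step derives the remaining-fluid percentage from the vertical extents of the two predicted masks. The function name and the upright-container assumption are ours, not the paper's.

```python
import numpy as np

def remaining_fluid_percent(vessel_mask: np.ndarray,
                            liquid_mask: np.ndarray) -> float:
    """Hypothetical post-processing (not the authors' stated method):
    estimate remaining fluid as the height of the Liquid General region
    relative to the Vessel region, assuming an upright container and
    boolean HxW masks."""
    vessel_rows = np.flatnonzero(vessel_mask.any(axis=1))
    liquid_rows = np.flatnonzero(liquid_mask.any(axis=1))
    if vessel_rows.size == 0 or liquid_rows.size == 0:
        return 0.0
    vessel_height = vessel_rows[-1] - vessel_rows[0] + 1
    liquid_height = liquid_rows[-1] - liquid_rows[0] + 1
    return 100.0 * liquid_height / vessel_height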
3.5. Dataset Creation and Implementation Details
3.6. Inference Time Evaluation on Mobile Platforms
4. Evaluation Criteria
4.1. Metrics for Evaluation
- Mean Intersection over Union (mIoU): mIoU is the primary metric for semantic segmentation, calculated as the average IoU across all classes [29]. IoU measures the overlap between the predicted segmentation and the ground truth, as shown in Figure 3. Higher IoU values indicate better alignment. A prediction is considered correct if the IoU exceeds a threshold of 0.5, based on the standards of the PASCAL VOC object recognition challenge and COCO dataset criteria [30].
- Precision and Recall: Precision evaluates the proportion of correctly predicted positive pixels among all predicted positive pixels, while recall assesses the proportion of true positives (TPs) identified among all actual positives [31]. These metrics complement mIoU by providing insight into the balance between false positives (FPs) and false negatives (FNs).
- Validation Loss: The loss values during the validation phase, as shown in Figure 4, track the model’s convergence and overfitting tendencies over training iterations.
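These definitions translate directly into code. The following NumPy sketch is a generic reference implementation rather than the authors' evaluation script; it computes IoU, precision, and recall from boolean masks.

```python
import numpy as np

def iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection over Union for one class (boolean HxW masks)."""
    union = np.logical_or(pred, truth).sum()
    return np.logical_and(pred, truth).sum() / union if union else 0.0

def precision_recall(pred: np.ndarray, truth: np.ndarray) -> tuple[float, float]:
    tp = np.logical_and(pred, truth).sum()    # true positives
    fp = np.logical_and(pred, ~truth).sum()   # false positives
    fn = np.logical_and(~pred, truth).sum()   # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# mIoU averages per-class IoU (here: Vessel and Liquid General), and a
# prediction counts as correct when IoU exceeds 0.5, per PASCAL VOC / COCO.
```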
4.2. Dataset-Specific Comparison
4.3. Model Performance
4.4. Comparative Analysis with SAM
4.5. Insights from Comparison with SAM
4.6. Limitations and Further Validation
5. Results
5.1. Model Performance Across Datasets
5.2. Validation Loss and Training Stability
5.3. Segmentation Evaluation Metrics
- Mean Intersection over Union (mIoU): Measures the overlap between the predicted and ground-truth segmentation, averaged across the vessel and fluid classes.
- Dice Coefficient: Provides a region-based overlap measure, calculated as twice the area of overlap divided by the sum of pixels in the prediction and ground-truth masks.
- Pixel-wise F1-Score: Computes the harmonic mean of precision and recall for each class, offering a balanced view of segmentation accuracy.
- Hausdorff Distance (Future Work): Although not included in the current experiments, the Hausdorff distance is a useful metric for assessing boundary agreement between predicted and ground-truth masks. Future work will incorporate Hausdorff distance analysis to evaluate fine boundary delineation.
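As with the earlier metrics, these measures are straightforward to implement. The sketch below is a generic reference using NumPy and SciPy (whose standard `directed_hausdorff` helper is used here), not the authors' evaluation code.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice coefficient: 2 * |A ∩ B| / (|A| + |B|) for boolean masks."""
    inter = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return 2.0 * inter / total if total else 0.0

def f1_score(precision: float, recall: float) -> float:
    """Pixel-wise F1: harmonic mean of precision and recall."""
    denom = precision + recall
    return 2.0 * precision * recall / denom if denom else 0.0

def hausdorff(pred: np.ndarray, truth: np.ndarray) -> float:
    """Symmetric Hausdorff distance between the point sets of two non-empty
    masks; for strict boundary agreement one would extract edge pixels
    first (e.g., mask XOR its morphological erosion)."""
    a, b = np.argwhere(pred), np.argwhere(truth)
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])
```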
5.4. Comparative Analysis with Segment Anything Model (SAM)
5.5. IoU Trends and Precision–Recall Analysis
- The IV-specific model maintained a recall of >90%, effectively detecting fluid regions;
- The LabPics-trained model showed lower precision, producing false positives in fluid classification.
5.6. Ablation Study
5.7. Fluid Volume Estimation Accuracy and Inference Time Validation
5.8. Robustness Under Varied Lighting Conditions
- Bright daylight (near a window);
- Low ambient light (evening indoor settings);
- Fluorescent overhead lighting (clinical simulation).
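The study evaluated real scenes under these conditions. For readers who want a quick software-only probe of the same effect, the hypothetical sketch below approximates comparable brightness shifts with torchvision's `ColorJitter`; the jitter ranges are our assumptions, not the authors' protocol.

```python
import torch
from torchvision import transforms

# Hypothetical jitter ranges approximating the three test conditions;
# this only simulates comparable brightness/hue shifts, it does not
# reproduce the physical lighting setups used in the evaluation.
lighting_variants = {
    "bright_daylight": transforms.ColorJitter(brightness=(1.3, 1.6)),
    "low_ambient_light": transforms.ColorJitter(brightness=(0.3, 0.6)),
    "fluorescent_clinical": transforms.ColorJitter(brightness=(0.9, 1.1), hue=0.05),
}

def simulate_lighting(variant: str, image: torch.Tensor) -> torch.Tensor:
    """Apply one lighting variant to a CxHxW float image in [0, 1]."""
    return lighting_variants[variant](image)
```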
6. Discussion
6.1. Interpretation of Findings in Context of Previous Studies
6.2. Practical Implications and Broader Impact
6.3. Limitations and Future Directions
- Dataset Diversity and Generalizability: Although the model was trained on an IV-specific dataset supplemented by online-sourced images, the dataset may not fully represent the diversity of real-world clinical environments. Variations in IV container types, fluid characteristics, lighting conditions, and patient settings could impact generalization performance. Expanding the dataset to include broader clinical scenarios remains a priority.
- Lighting Sensitivity: Preliminary evaluation demonstrated robust segmentation performance (mIoU > 0.90) across various lighting environments, including bright daylight, low ambient light, and clinical fluorescent lighting. Comprehensive controlled experiments across broader lighting variations will be pursued to further enhance deployment robustness.
- Mobile Inference: Although training was conducted on high-end hardware, real-time performance on smartphones or embedded systems was only preliminarily benchmarked and requires further optimization.
- Expanding the IV dataset to include a wider variety of container brands, clinical environments, and patient scenarios to improve generalization and robustness;
- Exploring lightweight models (e.g., BiSeNet V2, SegFormer) for faster inference on mobile devices;
- Evaluating hybrid architectures that combine SAM’s generalization capability with PSPNet’s domain-specific precision;
- Extending deployment to nurse-assistive robots, bedside monitoring systems, and telehealth platforms.
6.4. Additional Limitations: Lighting Conditions and Clinical Validation
6.5. Integration with Existing Monitoring Systems
6.6. Comparative Analysis with Commercial IV Monitoring Devices
- Comparable or superior estimation performance without requiring physical attachment;
- Significantly lower cost (no additional device cost beyond existing smartphones);
- Higher scalability across diverse clinical environments due to minimal setup and maintenance needs.
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Cleveland Clinic. IV Fluids. 31 July 2023. Available online: https://my.clevelandclinic.org/health/treatments/21635-iv-fluids (accessed on 4 December 2024).
- Wiseman, M.L.; Poole, S.; Ahlin, A.; Dooley, M.J. Reducing intravenous infusion errors: An observational study of 16,866 patients over five years. J. Pharm. Pract. Res. 2018, 48, 49–55. [Google Scholar] [CrossRef]
- Lee, H.; Pan, Y. The Medical Supplies KIT Design based on the Intravenous Fluid Therapy for Experience of Medical Staff. J. Korea Converg. Soc. 2019, 10, 121–128. [Google Scholar] [CrossRef]
- nurse.org. 6 Nurse AI Robots That Are Changing Healthcare in 2023. 15 May 2023. Available online: https://nurse.org/articles/nurse-robots/ (accessed on 10 November 2024).
- Kim, M. An Observation System of Giving an Injection of Ringer’s Solution. KR Patent 10-2004-0107764, 9 January 2006. [Google Scholar]
- Shin, S. Indicator of Remainder and Necessary Time of Ringer’s Solution. KR Patent 10-2007-0093238, 18 September 2007. [Google Scholar]
- Noh, B.; Kang, S. A System for Alarming the Exhaustion of Ringer Solution. KR Patent 10-2007-0080029, 9 August 2007. [Google Scholar]
- Chen, H. Intravenous Empty Bottle Alarm Device; CN202323649424.4; Hainan People’s Hospital, China National Intellectual Property Administration (CNIPA): Haikou, China, 2024. [Google Scholar]
- Contec Medical Systems Co., Ltd. SP750 Infusion Pump Product Overview. Available online: https://www.contecmedsystem.com/product/contec_sp750_ce_portable_infusion_pump_new_design (accessed on 23 March 2025).
- Zunair, H.; Ben Hamza, A. Masked Supervised Learning for Semantic Segmentation. In Proceedings of the British Machine Vision Conference (BMVC), London, UK, 21–24 November 2022. [Google Scholar]
- Wang, S.; Wang, W.; Zheng, Y. Reconfigurable Multimode Microwave Sensor with Resonance and Transmission Sensing Capabilities for Noninvasive Glucose Monitoring. IEEE Trans. Microw. Theory Tech. 2024, 72, 3102–3117. [Google Scholar] [CrossRef]
- Shift Labs. DripAssist. Available online: https://www.shiftlabs.com/ (accessed on 11 March 2025).
- Monidor. Monidrop. Available online: https://monidor.com/en/products/monidor-iv-screen (accessed on 12 March 2025).
- Evelabs. Dripo. Available online: https://evelabs.co/dripo (accessed on 13 March 2025).
- Edge Impulse. IV Drip Fluid-Level Monitoring. 18 June 2022. Available online: https://docs.edgeimpulse.com/experts/prototype-and-concept-projects/iv-drip-fluid-level-monitoring/ (accessed on 6 January 2025).
- Eppel, S.; Xu, H.; Bismuth, M.; Aspuru-Guzik, A. Computer Vision for Recognition of Materials and Vessels in Chemistry Lab Settings and the Vector-LabPics Data Set. ACS Cent. Sci. 2020, 6, 1743–1752. [Google Scholar] [CrossRef] [PubMed]
- Han, X.; Zhou, Y.; Chang, H.; Zhuang, Y.; Zhu, W. Divide-and-Conquer Predictor for Unbiased Scene Graph Generation. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 8611–8622. [Google Scholar] [CrossRef]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
- Chen, Y.; Xia, R.; Zou, K.; Yang, K. FFTI: Image inpainting algorithm via features fusion and two-steps inpainting. J. Vis. Commun. Image Represent. 2023, 91, 103776. [Google Scholar] [CrossRef]
- Chen, Y.; Xia, R.; Yang, K.; Zou, K. DARGS: Image inpainting algorithm via deep attention residuals group and semantics. J. King Saud Univ. Comput. Inf. Sci. 2023, 35, 101567. [Google Scholar] [CrossRef]
- Chen, Y.; Xia, R.; Yang, K.; Zou, K. MFFN: Image super-resolution via multi-level features fusion network. Vis. Comput. 2023, 40, 489–504. [Google Scholar] [CrossRef]
- Yu, C.; Gao, C.; Wang, J.; Yu, G.; Shen, C.; Sang, N. BiSeNet V2: Bilateral Network with Guided Aggregation for Real-Time Semantic Segmentation. Int. J. Comput. Vis. 2021, 129, 3051–3068. [Google Scholar] [CrossRef]
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. In Proceedings of the 35th International Conference on Neural Information Processing Systems (NIPS’21), Red Hook, NY, USA, 6–14 December 2021. [Google Scholar]
- Ke, L.; Ye, M.; Danelljan, M.; Liu, Y.; Tai, Y.; Tang, C.; Yu, F. Segment anything in high quality. In Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS’23), New Orleans, LA, USA, 10–16 December 2023. [Google Scholar]
- Wang, X.; Yang, J.; Darrell, T. Segment Anything without Supervision. arXiv 2024, arXiv:2406.20081. [Google Scholar]
- Cherian, A.; Jain, S.; Marks, T.K. Few-shot Transparent Instance Segmentation for Bin Picking. In Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, United Arab Emirates, 14–18 October 2024; pp. 5009–5016. [Google Scholar] [CrossRef]
- Yu, H.; Yang, L.T.; Zhang, Q.; Armstrong, D.; Deen, M.J. Convolutional neural networks for medical image analysis: State-of-the-art, comparisons, improvement and perspectives. Neurocomputing 2021, 444, 92–110. [Google Scholar] [CrossRef]
- Wada, K. Labelme: Image Polygonal Annotation with Python. GitHub Repository. 2023. Available online: https://github.com/wkentaro/labelme (accessed on 1 July 2024).
- Zhang, Y.F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157. [Google Scholar] [CrossRef]
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
- V7. Mean Average Precision (mAP) Explained: Everything You Need to Know. 7 March 2022. Available online: https://www.v7labs.com/blog/mean-average-precision (accessed on 23 October 2024).
| Backbone | Vessel mIoU (%) | Liquid mIoU (%) | Average mIoU (%) | Frames per Second | Memory Usage (MB) | Improvement in Average mIoU (%) |
|---|---|---|---|---|---|---|
| ResNet-50 | 93.24 | 91.67 | 92.46 | 12 | 1420 | - |
| ResNet-101 | 94.81 | 92.31 | 93.56 | 8 | 1950 | +1.19 |
| Dataset | Algorithm | Training (60%) | Validation (20%) | Test (20%) | Total (100%) |
|---|---|---|---|---|---|
| IV | FCN | 1584 | 526 | 526 | 2636 |
| LabPics | FCN | 1313 | 437 | 437 | 2187 |
| Mixed | FCN | 2897 | 963 | 963 | 4823 |
| Class | IV Dataset mIoU (%) | LabPics Dataset mIoU (%) | Mixed Dataset mIoU (%) | Percent Improvement, IV vs. LabPics (%) | Percent Improvement, IV vs. Mixed (%) |
|---|---|---|---|---|---|
| Vessel | 94.31 | 67.40 | 88.32 | +39.93 | +6.78 |
| Liquid General | 92.15 | 65.58 | 86.11 | +40.54 | +7.01 |
| Threshold | Vessel mIoU | Liquid mIoU | Average mIoU |
|---|---|---|---|
| 0.0 | 0.3382 | 0.3382 | 0.3382 |
| 0.1 | 0.6773 | 0.5334 | 0.5816 |
| 0.2 | 0.6875 | 0.5414 | 0.5904 |
| 0.3 | 0.6996 | 0.5557 | 0.6069 |
| 0.4 | 0.7474 | 0.5924 | 0.6457 |
| 0.5 | 0.7857 | 0.6405 | 0.6889 |
| 0.6 | 0.8317 | 0.7477 | 0.7757 |
| 0.7 | 0.8793 | 0.8078 | 0.8256 |
| 0.8 | 0.9152 | 0.8932 | 0.8862 |
| 0.9 | 0.9481 | 0.9516 | 0.9451 |
| Class | mIoU | Dice Coefficient | Pixel-Wise F1-Score |
|---|---|---|---|
| Vessel | 0.94 | 0.97 | 0.96 |
| Liquid (Fluid) | 0.92 | 0.96 | 0.95 |
| Average | 0.93 | 0.965 | 0.955 |
| Lighting Condition | Mean IoU |
|---|---|
| Standard Indoor Lighting | 0.932 |
| Bright Daylight | 0.921 |
| Low Ambient Light | 0.907 |
| Fluorescent Clinical Lighting | 0.929 |