Abstract
Sensor networks generate high-dimensional, temporally dependent data across healthcare, environmental monitoring, and energy management, demanding robust machine learning for reliable forecasting. While gradient boosting methods have emerged as powerful tools for sensor-based regression, systematic evaluation under realistic deployment conditions remains limited. This work provides a comprehensive review and empirical benchmark of boosting algorithms spanning classical methods (AdaBoost and GBM), modern gradient boosting frameworks (XGBoost, LightGBM, and CatBoost), and adaptive extensions for streaming data and hybrid architectures. We conduct a rigorous cross-domain evaluation on continuous glucose monitoring, urban air-quality forecasting, and building-energy prediction, assessing not only predictive accuracy but also robustness under sensor degradation, temporal generalization through proper time-series validation, feature-importance stability, and computational efficiency. Our analysis reveals fundamental trade-offs that challenge conventional assumptions. Algorithmic sophistication yields diminishing returns when intrinsic predictability collapses due to exogenous forcing. Random cross-validation (CV) systematically overestimates performance through temporal leakage, with the magnitude of the bias varying substantially across domains. Calibration drift emerges as the dominant failure mode, causing catastrophic degradation across all static models regardless of sophistication. Importantly, feature-importance stability does not guarantee predictive reliability. We synthesize these findings into actionable guidelines for algorithm selection, hyperparameter configuration, and deployment strategies, and identify critical open challenges, including uncertainty quantification, physics-informed architectures, and privacy-preserving distributed learning.