SensorAI: A Machine Learning Framework for Sensor Data
Abstract
1. Introduction
2. Materials and Methods
2.1. Background
2.2. Previous Efforts with Tutorials
2.3. CCPS Testbed Data Collection
2.4. ElectricDot
2.5. Design Requirements
- Must use Python.
- Must focus on time-series/sensor data.
- Must minimize required coding.
- Must be easy to use.
- Must assist in training and testing.
- Must include metrics.
- Must visualize data and results.
- Must have an accompanying tutorial.
- Must include digital signal processing, classification, clustering, regression, and anomaly detection.
- Must have a Graphical User Interface.
- Core functionality must be directly accessible.
- Must enable smart device and historical data connectivity to models.
2.6. Comparable Frameworks
2.7. Design Part 1: Core Functionality and Tutorial
2.7.1. Digital Signal Processing
2.7.2. Classification
2.7.3. Clustering
2.7.4. Detection
2.7.5. Regression
2.7.6. Utilities
2.7.7. Framework Tutorial
- DSP: wave generation, noise, filters, transforms, decomposition, power spectral density, and wavelet analysis.
- Classification: decision trees, nearest neighbors, support vector machines, bagging, boosting, and others.
- Clustering: hierarchical, k-means, k-shape, density-based, spectral, and others.
- Anomaly Detection: outlier vs. novelty detection, isolation forests, local outlier factor, and others.
- Regression: generalized linear models, LARS, LASSO, ridge, elastic net, nearest neighbors, support vector machines, and others.
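The framework's own `dsp.py` API is not reproduced in this excerpt; as an illustrative sketch, the wave-generation, noise, filtering, and power-spectral-density steps above can be exercised directly with SciPy, which SensorAI builds on. All signal parameters below are assumptions chosen for illustration, not values from the paper.

```python
import numpy as np
from scipy import signal

fs = 1000                      # sampling rate (Hz), illustrative
t = np.arange(0, 1.0, 1 / fs)  # 1 s of samples
rng = np.random.default_rng(0)

# 5 Hz tone corrupted by 50 Hz interference and white noise
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.5 * np.sin(2 * np.pi * 50 * t) + 0.2 * rng.standard_normal(t.size)

# 4th-order Butterworth low-pass at 10 Hz, applied forward-backward
# (zero phase) so the recovered tone is not delayed
b, a = signal.butter(4, 10, btype="low", fs=fs)
filtered = signal.filtfilt(b, a, noisy)

# power spectral density via Welch's method
freqs, psd = signal.welch(noisy, fs=fs, nperseg=256)
```

A low-pass cutoff between the 5 Hz tone and the 50 Hz interference recovers the tone; the Welch PSD shows both spectral peaks before filtering.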
2.8. Design Part 2: Graphical User Interface and Testbed Integration
2.8.1. Data
2.8.2. Digital Signal Processing
2.8.3. Classification
2.8.4. Clustering
2.8.5. Detection
2.8.6. Regression
2.8.7. Device Connector
2.8.8. Historical Download
2.8.9. Tutorials
3. Results
3.1. Data
3.2. Framework Model Comparisons
4. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
CCPS | Center for Cyber-Physical Systems
CWT | Continuous Wavelet Transform
DBSCAN | Density-Based Spatial Clustering of Applications with Noise
DSP | Digital Signal Processing
ECU | Electronic Control Unit
eDot | ElectricDot
Elastic-Net | Elastic-Net Regularization
EMD | Empirical Mode Decomposition
FDIA | False Data Injection Attack
FFT | Fast Fourier Transform
FOC | Field-Oriented Control
GAK | Global Alignment Kernel
GPU | Graphics Processing Unit
GUI | Graphical User Interface
IT | Information Technology
KNN | K Nearest Neighbor
LARS | Least Angle Regression
LASSO | Least Absolute Shrinkage and Selection Operator
MQTT | Message Queuing Telemetry Transport
OPTICS | Ordering Points To Identify the Clustering Structure
OT | Operational Technology
PCC | Point of Common Coupling
PCT | Polynomial Chirplet Transform
RANSAC | Random Sample Consensus
SCG | Seismocardiography
SSA | Singular Spectrum Analysis
SST | SynchroSqueezing Transform
STFT | Short-Time Fourier Transform
THD | Total Harmonic Distortion
TS KNN | Time-Series K Nearest Neighbor
UGA | University of Georgia
WVD | Wigner–Ville Distribution
References
- Weerts, H.J.P.; Pechenizkiy, M. Teaching responsible machine learning to engineers. In The Second Teaching Machine Learning and Artificial Intelligence Workshop; PMLR: New York, NY, USA, 2022; pp. 40–45.
- Lasi, H.; Fettke, P.; Kemper, H.G.; Feld, T.; Hoffmann, M. Industry 4.0. Bus. Inf. Syst. Eng. 2014, 6, 239–242.
- Lu, Y. The current status and developing trends of Industry 4.0: A review. Inf. Syst. Front. 2025, 27, 215–234.
- Ambadekar, P.K.; Ambadekar, S.; Choudhari, C.; Patil, S.A.; Gawande, S. Artificial intelligence and its relevance in mechanical engineering from Industry 4.0 perspective. Aust. J. Mech. Eng. 2025, 23, 110–130.
- Selmy, H.A.; Mohamed, H.K.; Medhat, W. A predictive analytics framework for sensor data using time series and deep learning techniques. Neural Comput. Appl. 2024, 36, 6119–6132.
- Putnik, G.D.; Ferreira, L.; Lopes, N.; Putnik, Z. What is a Cyber-Physical System: Definitions and Models Spectrum. FME Trans. 2019, 47, 663–674.
- Chen, Y.; Tu, L. Density-based clustering for real-time stream data. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, 12–15 August 2007; pp. 133–142.
- Coshatt, S.J.; Li, Q.; Yang, B.; Wu, S.; Shrivastava, D.; Ye, J.; Song, W.; Zahiri, F. Design of cyber-physical security testbed for multi-stage manufacturing system. In Proceedings of the GLOBECOM 2022-2022 IEEE Global Communications Conference, Rio de Janeiro, Brazil, 4–8 December 2022; pp. 1978–1983.
- Yang, H.; Yang, B.; Coshatt, S.; Li, Q.; Hu, K.; Hammond, B.C.; Ye, J.; Parasuraman, R.; Song, W. Real-world Cyber Security Demonstration for Networked Electric Drives. IEEE J. Emerg. Sel. Top. Power Electron. 2025, 13, 4659–4668.
- Microsoft Corporation. Microsoft Azure Machine Learning Studio. 2024. Available online: https://azure.microsoft.com/en-us/products/machine-learning (accessed on 22 September 2025).
- Frank, E.; Hall, M.A.; Witten, I.H. The WEKA Workbench. Online Appendix for “Data Mining: Practical Machine Learning Tools and Techniques”; Morgan Kaufmann: San Francisco, CA, USA, 2016.
- LeDell, E.; Poirier, S. H2O AutoML: Scalable Automatic Machine Learning. In Proceedings of the 7th ICML Workshop on Automated Machine Learning (AutoML), Online, 17–18 July 2020.
- Hernandez, J.G.; Saini, A.K.; Ghosh, A.; Moore, J.H. The tree-based pipeline optimization tool: Tackling biomedical research problems with genetic programming and automated machine learning. Patterns 2025, 6, 101314.
- Erickson, N.; Mueller, J.; Smola, A.J.; Weil, S.; Chan, J.C.W.; Shmakov, A.; Shchur, O.; Shi, X.; Huang, E.W.; Lorraine, J.; et al. AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. arXiv 2020, arXiv:2003.06505.
- Feurer, M.; Klein, A.; Eggensperger, K.; Springenberg, J.; Blum, M.; Hutter, F. Efficient and robust automated machine learning. Adv. Neural Inf. Process. Syst. 2015, 28, 2962–2970.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat. Methods 2020, 17, 261–272.
- Tavenard, R.; Faouzi, J.; Vandewiele, G.; Divo, F.; Androz, G.; Holtz, C.; Payne, M.; Yurchak, R.; Rußwurm, M.; Kolar, K.; et al. Tslearn, A Machine Learning Toolkit for Time Series Data. J. Mach. Learn. Res. 2020, 21, 1–6.
- Hunter, J.D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 2007, 9, 90–95.
- Plotly Technologies Inc. Collaborative Data Science; Plotly Technologies Inc.: Montréal, QC, Canada, 2015.
- Song, W.; Coshatt, S.; Zhang, Y.; Chen, J. Sensor Data Science and AI Tutorial. 2024. Available online: https://github.com/SensorWebEdu/SensorAI.git (accessed on 27 September 2025).
- Cuturi, M. Fast global alignment kernels. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), Bellevue, WA, USA, 28 June–2 July 2011; pp. 929–936.
- Löning, M.; Bagnall, A.; Ganesh, S.; Kazakov, V.; Lines, J.; Király, F.J. sktime: A unified interface for machine learning with time series. arXiv 2019, arXiv:1909.07872.
Module File Name | Module Information
---|---
classification.py | Classification, Supervised Anomaly Detection |
clustering.py | Clustering |
detection.py | Unsupervised Anomaly Detection |
dsp.py | Digital Signal Processing |
regression.py | Regression |
utils.py | Plotting and other Miscellaneous functions |
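The exact function signatures inside `detection.py` are not shown in this excerpt; as a hedged sketch, the unsupervised anomaly detectors the module covers (isolation forest and local outlier factor) can be exercised through scikit-learn, which the framework builds on. The synthetic two-dimensional data below is an illustrative stand-in, not testbed data.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(200, 2))     # dense normal cluster
outliers = rng.uniform(-6.0, 6.0, size=(10, 2))   # scattered anomalies
X = np.vstack([inliers, outliers])

# Both detectors label each sample +1 (inlier) or -1 (outlier).
iso_labels = IsolationForest(contamination=0.05, random_state=0).fit_predict(X)
lof_labels = LocalOutlierFactor(n_neighbors=20, contamination=0.05).fit_predict(X)
```

The `contamination` parameter sets the expected outlier fraction up front; in the unsupervised (outlier-detection) setting, both estimators fit and score the same unlabeled batch.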
Module | Tutorial Materials
---|---
Classification | classification_slides.pdf, classification_tutorial.ipynb |
Clustering | unsupervised_slides.pdf, unsupervised_tutorial.ipynb |
DSP | dsp_slides.pdf, dsp_tutorial.ipynb |
Regression | regression_slides.pdf, regression_tutorial.ipynb |
Supervised Detection | classification_slides.pdf, classification_tutorial.ipynb |
Unsupervised Detection | unsupervised_slides.pdf, unsupervised_tutorial.ipynb |
Average scores from 10 runs:

Model Type | Accuracy | Precision | Recall | F1 Score
---|---|---|---|---
Bagging | 0.9989474 | 0.9989864 | 0.9989474 | 0.9989519 |
Random Forest | 0.9968421 | 0.9970276 | 0.9968421 | 0.9968620 |
Extra Trees | 0.9957895 | 0.9959658 | 0.9957895 | 0.9958041 |
AdaBoost | 0.9936842 | 0.9937881 | 0.9936842 | 0.9936572 |
Decision Tree | 0.9926316 | 0.9927936 | 0.9926316 | 0.9926280 |
K Nearest Neighbors (KNN) | 0.9905264 | 0.9908982 | 0.9905264 | 0.9904392
Time-Series KNN | 0.9905264 | 0.9908982 | 0.9905264 | 0.9904392 |
Gradient Boost | 0.9884211 | 0.9899116 | 0.9884210 | 0.9884201 |
Histogram Grad. Boost | 0.9747369 | 0.9759421 | 0.9747369 | 0.9746843 |
Non-Myopic Early | 0.9347368 | 0.9418987 | 0.9347368 | 0.9350837 |
Radius NN | 0.8663157 | 0.8777572 | 0.8663157 | 0.8675915 |
Support Vector Classifier (SVC) | 0.7389473 | 0.7742732 | 0.7389473 | 0.7320654 |
Time-Series SVC | 0.7031580 | 0.7500177 | 0.7031580 | 0.6967196 |
Multilayer Perceptron | 0.6557896 | 0.6283973 | 0.6565679 | 0.6264954 |
Linear Discriminant Analysis | 0.6147367 | 0.4935034 | 0.6147367 | 0.5459568 |
Gaussian Naive-Bayes | 0.6063158 | 0.6692445 | 0.6063158 | 0.5770321 |
Quadratic Discriminant Analysis | 0.5989473 | 0.5934619 | 0.5989473 | 0.5924451 |
Gaussian Process | 0.5852632 | 0.5035906 | 0.5852632 | 0.5129866 |
Nearest Centroid | 0.5421051 | 0.6064616 | 0.5421051 | 0.5019513 |
Passive-Aggressive | 0.5010525 | 0.3211840 | 0.5010525 | 0.3677192 |
Bernoulli Naive-Bayes | 0.4842105 | 0.2359225 | 0.4842105 | 0.3168296 |
Standard deviation of scores from 10 runs:

Model Type | Accuracy | Precision | Recall | F1 Score
---|---|---|---|---
Bagging (trees) | 0.0033286 | 0.0032053 | 0.0033286 | 0.0033144 |
Random Forest | 0.0071048 | 0.0066445 | 0.0071048 | 0.0070537 |
Extra Trees | 0.0101694 | 0.0097522 | 0.0101694 | 0.0101381 |
AdaBoost (trees) | 0.0113155 | 0.0111929 | 0.0113155 | 0.0113950 |
Decision Tree | 0.0131754 | 0.0129107 | 0.0131754 | 0.0131879 |
KNN | 0.0195045 | 0.0186991 | 0.0195045 | 0.0198083 |
Time-Series KNN | 0.0195045 | 0.0186991 | 0.0195045 | 0.0198083 |
Gradient Boost | 0.0207285 | 0.0174075 | 0.0207286 | 0.0208220 |
Histogram Grad. Boost | 0.0281578 | 0.0265375 | 0.0281578 | 0.0282982 |
Non-Myopic Early | 0.0429161 | 0.0366547 | 0.0429161 | 0.0423254 |
Radius NN | 0.0732739 | 0.0661957 | 0.0732739 | 0.0717121 |
SVC | 0.0396352 | 0.0312425 | 0.0396352 | 0.0435080 |
Time-Series SVC | 0.0640867 | 0.0721340 | 0.0640867 | 0.0626575 |
Multilayer Perceptron | 0.0543689 | 0.1234110 | 0.0538698 | 0.0796094 |
Linear Discriminant Analysis | 0.0358514 | 0.0298841 | 0.0358514 | 0.0334393 |
Gaussian Naive-Bayes | 0.0484160 | 0.0671592 | 0.0484160 | 0.0472427 |
Quadratic Discriminant Analysis | 0.0395264 | 0.0389216 | 0.0395264 | 0.0387197 |
Gaussian Process | 0.0394486 | 0.0481147 | 0.0394486 | 0.0474366 |
Nearest Centroid | 0.0228744 | 0.0277980 | 0.0228744 | 0.0323248 |
Passive-Aggressive | 0.1406311 | 0.2279550 | 0.1406311 | 0.1838261 |
Bernoulli Naive-Bayes | 0.0403126 | 0.0394275 | 0.0403126 | 0.0441936 |
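The shape of both tables (mean and standard deviation of four scores over ten runs) can be reproduced in outline with scikit-learn. The synthetic dataset and training setup below are illustrative assumptions, not the paper's testbed data; note that with weighted averaging, recall coincides with accuracy, which matches the Accuracy and Recall columns being identical for most rows above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

scores = []
for run in range(10):
    # stand-in data; each run reseeds both the dataset and the split
    X, y = make_classification(n_samples=500, n_classes=3, n_informative=5,
                               random_state=run)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=run)
    model = BaggingClassifier(random_state=run).fit(X_tr, y_tr)
    y_hat = model.predict(X_te)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_te, y_hat, average="weighted", zero_division=0)
    scores.append((accuracy_score(y_te, y_hat), prec, rec, f1))

means = np.asarray(scores).mean(axis=0)  # Accuracy, Precision, Recall, F1
stds = np.asarray(scores).std(axis=0)
```

Swapping `BaggingClassifier` for any other estimator in the tables yields the corresponding row of each table.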
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Coshatt, S.; Yang, H.; Wu, S.; Ye, J.; Ma, P.; Song, W. SensorAI: A Machine Learning Framework for Sensor Data. Sensors 2025, 25, 6223. https://doi.org/10.3390/s25196223