Incremental Sparse Adaptive PCA for Streaming Industrial Sensor Data
Abstract
1. Introduction
1.1. Problem Statement
- Non-stationarity: Processes exhibit regime shifts and sensor drift; gas sensor arrays undergo aging-induced drift over months, while chemical processes experience abrupt fault transitions [7].
- Streaming constraints: Continuous data arrival necessitates online algorithms with constant memory usage; storing historical batches is infeasible for millions of samples [5].
1.2. Proposed Solution: ISAPCA Framework
1. Incremental updates via rank-one covariance updates to process new batches with constant memory.
2. Adaptive forgetting factor to discard outdated data and track drift.
3. L1-based sparsity regularization for feature selection and noise suppression.
1.3. Contributions
1. Incremental sparse adaptive PCA (ISAPCA): We propose ISAPCA, a unified streaming dimensionality reduction framework that integrates exponential forgetting, online Oja–Sanger subspace tracking, and L1-proximal sparsity within a single algorithm suitable for non-stationary industrial sensor streams.
2. Streaming-stable sparse subspace learning: ISAPCA combines proximal soft-thresholding with QR re-orthonormalization, which yields sparse, stable, and interpretable principal components under strict streaming and memory constraints.
3. Comprehensive benchmarking on industrial data: We evaluate ISAPCA on three representative IIoT datasets (SmartBuilding, Tennessee Eastman Process, and GasSensor) and benchmark it against an online method (incremental PCA) and offline upper-bound methods (randomized PCA, sparse PCA, and dictionary learning).
4. Quantitative analysis of accuracy-interpretability trade-offs: We evaluate ISAPCA using reconstruction error, the explained variance ratio (EVR), statistical significance testing, and sparsity-normalized metrics, showing that it captures variance at levels comparable to dense streaming methods while producing more interpretable components.
5. TinyML-tailored efficiency: ISAPCA is designed for microcontrollers and edge devices. Its per-mini-batch cost and memory footprint scale with the size of the loading matrix rather than with a full covariance matrix, which avoids the storage required by covariance-based PCA. These resource requirements enable deployment on TinyML platforms with kilobyte-level memory budgets.
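
As a rough illustration of the memory argument in the last contribution, the sketch below compares the bytes needed for a loading matrix with those needed for a full covariance matrix. It assumes 32-bit floats and counts only the matrix itself (the running mean and any temporaries are ignored); the function name `footprint_bytes` is hypothetical. The feature counts are the ones quoted for TEP and GasSensor, and the component counts come from the hyperparameter table.

```python
from typing import Tuple


def footprint_bytes(d: int, k: int, bytes_per_value: int = 4) -> Tuple[int, int]:
    """Return (loading-matrix bytes, covariance-matrix bytes).

    d: number of features, k: number of components.
    bytes_per_value=4 assumes float32 storage (an assumption of this sketch).
    """
    loading_matrix = d * k * bytes_per_value   # d x k subspace basis
    covariance = d * d * bytes_per_value       # d x d covariance matrix
    return loading_matrix, covariance


if __name__ == "__main__":
    for name, d, k in [("TEP", 55, 4), ("GasSensor", 129, 5)]:
        w_bytes, c_bytes = footprint_bytes(d, k)
        print(f"{name}: loadings {w_bytes} B, covariance {c_bytes} B, "
              f"ratio {c_bytes / w_bytes:.1f}x")
```

The ratio between the two footprints is simply the number of features divided by the number of components, so the saving grows with the sensor count.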
2. Literature Review
2.1. Principal Component Analysis for Industrial Monitoring
2.2. Incremental and Adaptive PCA
2.3. Sparse PCA and Structured Representations
2.4. Robust and Low-Rank Decomposition Methods
2.5. Non-Stationary Subspace and Drift-Aware Methods
2.6. TinyML and Edge Computing Constraints
2.7. Summary and Research Gap
- Strict streaming operation with constant memory and single-pass updates;
- Adaptive forgetting to track non-stationary industrial processes;
- Explicit sparsity for interpretability and noise suppression;
- Numerical stability under long-term streaming;
- TinyML compatibility for deployment on resource-constrained edge devices.
3. Methodology
3.1. Problem Formulation
- Maximizes retained variance;
- Adapts to concept drift;
- Enforces sparsity for interpretability;
- Operates with constant memory in streaming settings.
3.2. Adaptive Mean Update with Forgetting
3.3. Online Subspace Tracking
3.4. Sparse Regularization
3.5. Orthonormalization and Stability
3.6. ISAPCA Algorithm
| Algorithm 1: Incremental sparse adaptive PCA (ISAPCA) |
| Require: Components k, learning rate, sparsity weight, forgetting factor, mini-batches |
| 1: Initialize |
| 2: Initialize using top-k right singular vectors of |
| 3: for to T do |
| 4: |
| 5: |
| 6: |
| 7: |
| 8: |
| 9: |
| 10: |
| 11: |
| 12: end for |
| 13: return |
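
The individual steps of Algorithm 1 are not reproduced in this text, so the following is only a minimal sketch of how the ingredients named in Sections 3.2 to 3.5 could fit together, not the authors' implementation. It assumes NumPy, a forgetting-weighted mean update, a batch Oja–Sanger (generalized Hebbian) step, soft-thresholding at level `eta * alpha`, and a QR re-orthonormalization as the last step. The names `isapca_fit`, `soft_threshold`, `eta`, `alpha`, and `lam`, the default values, and the ordering of the steps are all assumptions.

```python
import numpy as np


def soft_threshold(W, tau):
    """Element-wise proximal operator of tau * ||W||_1."""
    return np.sign(W) * np.maximum(np.abs(W) - tau, 0.0)


def isapca_fit(batches, k, eta=0.05, alpha=0.01, lam=0.99):
    """Streaming sparse subspace tracking over an iterable of (b x d) batches.

    Returns the running mean (d,) and an orthonormal basis W (d x k).
    Only mu and W are kept between batches.
    """
    batches = iter(batches)
    X0 = np.asarray(next(batches), dtype=float)
    mu = X0.mean(axis=0)
    # Initialization: top-k right singular vectors of the centered first batch.
    _, _, Vt = np.linalg.svd(X0 - mu, full_matrices=False)
    W = Vt[:k].T

    for X in batches:
        X = np.asarray(X, dtype=float)
        # Mean update with exponential forgetting.
        mu = lam * mu + (1.0 - lam) * X.mean(axis=0)
        Xc = X - mu
        Y = Xc @ W  # projections, shape (b, k)
        # Oja-Sanger step: Hebbian term minus upper-triangular decorrelation.
        grad = (Xc.T @ Y - W @ np.triu(Y.T @ Y)) / X.shape[0]
        W = W + eta * grad
        # Proximal L1 step that shrinks small loadings to zero.
        W = soft_threshold(W, eta * alpha)
        # QR re-orthonormalization; flip signs so diag(R) is non-negative.
        Q, R = np.linalg.qr(W)
        signs = np.where(np.diag(R) < 0, -1.0, 1.0)
        W = Q * signs
    return mu, W
```

One design point worth noting: orthonormalizing after thresholding keeps the basis numerically stable but can reintroduce small non-zero loadings in later components, so a practical variant might threshold once more before reporting the loadings.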
3.7. Computational Complexity
4. Experimental Set-Up and Evaluation
4.1. Datasets
- SmartBuilding: This is a large-scale building energy dataset containing multi-sensor operational measurements. After removing identifiers and non-numeric fields, only continuous variables remain. To keep runtime and memory bounded in the streaming evaluation, the analysis was capped at a fixed number of leading samples while preserving chronological order.
- Tennessee Eastman Process (TEP): This is a benchmark chemical process dataset widely used for fault detection and process monitoring. It contains 55 variables that exhibit pronounced regime shifts, operational mode transitions, and short-term fluctuations.
- GasSensor: This dataset contains high-dimensional metal-oxide gas sensor array data comprising 129 numeric variables. The data exhibit both significant sensor drift and strong inter-sensor correlation, which makes real-time subspace tracking difficult.
4.2. Streaming Preprocessing Pipeline
1. Finite-value sanitization: Infinite values were replaced with NaN.
2. Robust imputation: Missing values were imputed using column-wise batch medians.
3. Outlier control: Winsorization was applied by clipping each feature to lower and upper batch quantiles.
4. Standardization: A two-pass incremental standardization was performed using StandardScaler with partial fitting, ensuring zero-mean, unit-variance features without information leakage.
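
The four preprocessing steps can be sketched per batch as follows. This is an illustration under stated assumptions rather than the pipeline actually used: the winsorization levels (1st and 99th percentiles) are placeholders because the quantile levels are not given in this text, the zero fallback for all-missing columns is an assumption, and `RunningStandardizer` is a hypothetical NumPy-only stand-in for incremental fitting with StandardScaler.

```python
import numpy as np


def clean_batch(X, lower_q=0.01, upper_q=0.99):
    """Sanitize, impute, and winsorize one (b x d) batch; returns a new array."""
    X = np.array(X, dtype=float)                  # copy so the input is untouched
    X[~np.isfinite(X)] = np.nan                   # step 1: inf -> NaN
    med = np.nanmedian(X, axis=0)                 # step 2: column-wise batch medians
    med = np.where(np.isnan(med), 0.0, med)       # assumed fallback: all-NaN column
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = med[cols]
    lo, hi = np.quantile(X, [lower_q, upper_q], axis=0)
    return np.clip(X, lo, hi)                     # step 3: clip to batch quantiles


class RunningStandardizer:
    """Accumulates per-feature sums so mean and variance need one pass to fit."""

    def __init__(self):
        self.n = 0
        self.total = None
        self.total_sq = None

    def partial_fit(self, X):
        X = np.asarray(X, dtype=float)
        if self.total is None:
            self.total = np.zeros(X.shape[1])
            self.total_sq = np.zeros(X.shape[1])
        self.n += X.shape[0]
        self.total += X.sum(axis=0)
        self.total_sq += (X ** 2).sum(axis=0)
        return self

    def transform(self, X):
        mean = self.total / self.n
        var = np.maximum(self.total_sq / self.n - mean ** 2, 0.0)
        std = np.sqrt(var)
        std[std == 0.0] = 1.0                     # leave constant features unscaled
        return (np.asarray(X, dtype=float) - mean) / std
```

In a two-pass arrangement, the first pass would call `partial_fit` on every cleaned batch and the second pass would call `transform` on each batch in chronological order. The sum-of-squares accumulation is kept for brevity; a numerically safer variant would merge batch means and variances instead.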
4.3. Baseline Methods
- Incremental PCA (IPCA): A dense online baseline that updates principal components incrementally using mini-batch SVD.
- Randomized PCA (RandPCA): A batch PCA approximation based on randomized SVD [27], evaluated on a fixed subset due to memory constraints.
- Sparse PCA (SparsePCA): A batch sparse PCA method enforcing L1 regularization on loadings.
- Dictionary learning (DictLearn): A sparse coding approach based on online dictionary learning [28], producing sparse latent codes.
4.4. Evaluation Metrics
- The mean squared error (MSE) and mean absolute error (MAE) between the standardized inputs and their reconstructions.
- The explained variance ratio (EVR), defined in a bounded reconstruction-based form:
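
The exact EVR expression is not reproduced in this text. Purely as an illustration, the sketch below computes MSE and MAE and one plausible bounded, reconstruction-based EVR: one minus the ratio of residual to total sum of squares, clipped to the interval [0, 1]. The clipping form and the function names are assumptions, and the reported values in this paper may follow a different definition.

```python
import numpy as np


def reconstruct(Z, W, mu):
    """Project centered data onto the d x k basis W and map back to input space."""
    return ((Z - mu) @ W) @ W.T + mu


def mse(Z, Z_hat):
    """Mean squared reconstruction error over all entries."""
    return float(np.mean((Z - Z_hat) ** 2))


def mae(Z, Z_hat):
    """Mean absolute reconstruction error over all entries."""
    return float(np.mean(np.abs(Z - Z_hat)))


def bounded_evr(Z, Z_hat):
    """Assumed form: clip(1 - SSE / SST, 0, 1), with SST taken around column means."""
    sse = np.sum((Z - Z_hat) ** 2)
    sst = np.sum((Z - Z.mean(axis=0)) ** 2)
    if sst == 0.0:
        return 0.0                      # no variance to explain
    return float(np.clip(1.0 - sse / sst, 0.0, 1.0))
```

Clipping keeps the score at zero when a reconstruction is worse than predicting the column means, which can happen early in a stream before the subspace has adapted.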
4.5. Statistical Analysis
4.6. Hyperparameters and Implementation Details
5. Experimental Results
5.1. Streaming Convergence of EVR
5.2. Comparative EVR Across Baselines
5.3. Sparsity-Normalized Performance
5.4. Ablation and Efficiency Analysis
5.5. Quantitative Summary
6. Discussion
6.1. Streaming Accuracy and Stability
6.2. Comparison with Batch Upper Bounds
6.3. Sparsity and Interpretability Trade-Offs
6.4. Statistical Significance and Variability
6.5. Practical Implications and Limitations
6.6. Summary
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Shafique, M.; Theocharides, T.; Reddy, V.J.; Murmann, B. TinyML: Current progress, research challenges and future roadmap. In Proceedings of the 58th ACM/IEEE Design Automation Conference (DAC 2021), San Francisco, CA, USA, 6–10 December 2021; IEEE: New York, NY, USA, 2021; pp. 1303–1306.
2. Reimringer, W.; Bur, C. Promoting quality in low-cost gas sensor devices for real-world applications. Front. Sens. 2023, 4, 1317533.
3. Warden, P. TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers; O’Reilly Media: Sebastopol, CA, USA, 2019.
4. Velasquez, J.D.; Cadavid, L.; Franco, C.J. Emerging trends and strategic opportunities in tiny machine learning: A comprehensive thematic analysis. Neurocomputing 2025, 648, 130746.
5. Yan, E.; Wang, H.; Xia, W. Temporal streaming batch principal component analysis for time series classification (student abstract). Proc. AAAI Conf. Artif. Intell. 2025, 39, 29543–29544.
6. Lu, C.; Zeng, J.; Dong, Y.; Xu, X. Streaming variational probabilistic principal component analysis for monitoring of nonstationary process. J. Process Control 2024, 133, 103134.
7. Anwer, A.H.; Saadaoui, M.; Mohamed, A.T.; Ahmad, N.; Benamor, A. State-of-the-art advances and challenges in wearable gas sensors for emerging applications: Innovations and future prospects. Chem. Eng. J. 2024, 502, 157899.
8. Zhang, J.; Wei, H.; Zhang, K.; Xiao, J.; Hong, X. An efficient multimodal attentional principal component analysis for continual learning-based dynamic process monitoring. Neurocomputing 2025, 611, 128642.
9. Zhang, J.; Zhou, D.; Chen, M. Monitoring multimode processes: A modified PCA algorithm with continual learning ability. J. Process Control 2021, 103, 76–86.
10. Søndergaard, H.A.N.; Shaker, H.R.; Jørgensen, B.N. Enhanced fault detection in energy systems using individual contextual forgetting factors in recursive principal component analysis. Energy Build. 2024, 324, 114851.
11. Migenda, N.; Möller, R.; Schenck, W. Adaptive local principal component analysis improves the clustering of high-dimensional data. Pattern Recognit. 2024, 146, 110030.
12. Zhou, Q.; Gao, Q.; Wang, Q.; Yang, M.; Gao, X. Sparse discriminant PCA based on contrastive learning and class-specificity distribution. Neural Netw. 2023, 167, 775–786.
13. Bertsimas, D.; Kitane, D.L. Sparse PCA: A geometric approach. J. Mach. Learn. Res. 2023, 24, 1–70.
14. Wang, T.; Xie, Y.; Jeong, Y.-S.; Jeong, M.K. Dynamic sparse PCA: A dimensional reduction method for sensor data in virtual metrology. Expert Syst. Appl. 2024, 251, 123995.
15. Migenda, N. Clustering in High-Dimensional Data Streams with Adaptive Local Principal Component Analysis. 2025. Available online: https://pub.uni-bielefeld.de/download/3006960/3006961/Clustering%20in%20High-Dimensional%20Data%20Streams%20with%20Adaptive%20Local%20Principal%20Component%20Analysis.pdf (accessed on 31 January 2026).
16. Kumar, S.; Sarkar, P. Oja’s algorithm for streaming sparse PCA. arXiv 2024, arXiv:2402.07240.
17. Lee, J.; Cho, H.; Yun, S.-Y.; Yun, C. Fair streaming principal component analysis: Statistical and algorithmic viewpoint. In Proceedings of the 37th International Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans, LA, USA, 10–16 December 2023.
18. Nokhwal, S.; Kumar, N. PBES: PCA-based exemplar sampling algorithm for continual learning. arXiv 2023, arXiv:2312.09352.
19. Bouwmans, T.; Zahzah, E.H. Robust PCA via principal component pursuit: A review for a comparative evaluation in video surveillance. Comput. Vis. Image Underst. 2014, 122, 22–34.
20. AlSalehy, A.S.; Bailey, M. Improving time series data quality: Identifying outliers and handling missing values in a multilocation gas and weather dataset. Smart Cities 2025, 8, 82.
21. Sundararajan, R.R.; Pipiras, V.; Pourahmadi, M. Stationary subspace analysis of nonstationary covariance processes: Eigenstructure description and testing. arXiv 2019, arXiv:1904.09420.
22. Wu, D.; Sheng, L.; Zhou, D.; Chen, M. Dynamic stationary subspace analysis for monitoring nonstationary dynamic processes. Ind. Eng. Chem. Res. 2020, 59, 20787–20797.
23. Koo, B.; Anderson, H.M.; Seo, M.H.; Yao, W. High-dimensional predictive regression in the presence of cointegration. J. Econ. 2020, 219, 456–477.
24. Yan, H.; Paynabar, K.; Shi, J. Real-time monitoring of high-dimensional functional data streams via spatio-temporal smooth sparse decomposition. Technometrics 2018, 60, 181–197.
25. Rafee, A.N.M. Edge-Optimized Machine Learning Models for Real-Time Personalized Health Monitoring on Wearables. Ph.D. Thesis, Brac University, Dhaka, Bangladesh, 2024.
26. Oja, E. Simplified neuron model as a principal component analyzer. J. Math. Biol. 1982, 15, 267–273.
27. Halko, N.; Martinsson, P.-G.; Tropp, J.A. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 2011, 53, 217–288.
28. Mairal, J.; Bach, F.; Ponce, J.; Sapiro, G. Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res. 2010, 11, 19–60.
29. Le, T.-T.; Abed-Meraim, K.; Trung, N.L.; Hafiane, A. OPIT: A simple but effective method for sparse subspace tracking. IEEE Trans. Signal Process. 2024, 72, 4350–4363.





| Dataset | k | b | ||||
|---|---|---|---|---|---|---|
| SmartBuilding | 3 | 256 | 0.99 | 0.05 | 0.01 | 100,000 |
| TEP | 4 | 256 | 0.95 | 0.05 | 0.005 | 50,000 |
| GasSensor | 5 | 256 | 0.98 | 0.05 | 0.005 | 50,000 |

| Dataset | Variant | EVR | MSE |
|---|---|---|---|
| SmartBuilding | Full model | 0.4935 | 0.2975 |
| SmartBuilding | No forgetting | 0.4707 | 0.3163 |
| SmartBuilding | No sparsity | 0.4977 | 0.3046 |
| Tennessee Eastman | Full model | 0.3114 | 0.6501 |
| Tennessee Eastman | No forgetting | 0.3113 | 0.7003 |
| Tennessee Eastman | No sparsity | 0.3109 | 0.6506 |
| GasSensor | Full model | 0.8623 | 0.1004 |
| GasSensor | No forgetting | 0.8618 | 0.1025 |
| GasSensor | No sparsity | 0.8628 | 0.1002 |

| Dataset | Method | Time [ms] | Memory [kB] |
|---|---|---|---|
| SmartBuilding | IPCA | 0.20 | 0.26 |
| SmartBuilding | ISAPCA | 0.234 | 0.10 |
| TEP | IPCA | 0.30 | 12.10 |
| TEP | ISAPCA | 0.315 | 0.88 |
| GasSensor | IPCA | 0.60 | 64.00 |
| GasSensor | ISAPCA | 0.606 | 2.60 |

| Dataset | Model | EVR | MSE |
|---|---|---|---|
| SmartBuilding | IPCA (online) | 0.5293 ± 0.0160 | — |
| SmartBuilding | ISAPCA (online) | 0.4935 ± 0.0218 | 0.2975 ± 0.2142 |
| SmartBuilding | RandPCA (subset) | 0.6961 | 0.2660 |
| SmartBuilding | SparsePCA (subset) | 0.6961 | 0.2660 |
| SmartBuilding | DictLearn (subset) | 0.6216 | 0.3432 |
| TEP | IPCA (online) | 0.3230 ± 0.0107 | — |
| TEP | ISAPCA (online) | 0.3114 ± 0.0103 | 0.6501 ± 0.0035 |
| TEP | RandPCA (subset) | 0.3283 | 0.6595 |
| TEP | SparsePCA (subset) | 0.3283 | 0.6595 |
| TEP | DictLearn (subset) | 0.3283 | 0.6595 |
| GasSensor | IPCA (online) | 0.8219 ± 0.0418 | — |
| GasSensor | ISAPCA (online) | 0.8623 ± 0.0407 | 0.1004 ± 0.0510 |
| GasSensor | RandPCA (subset) | 0.9049 | 0.0944 |
| GasSensor | SparsePCA (subset) | 0.9049 | 0.0944 |
| GasSensor | DictLearn (subset) | 0.9043 | 0.0949 |

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Saleh, R.; Villányi, B. Incremental Sparse Adaptive PCA for Streaming Industrial Sensor Data. Telecom 2026, 7, 50. https://doi.org/10.3390/telecom7030050

