From Statistical Filtering to Adaptive Reinforcement Learning: A Progressive Framework for IoT Time-Series Anomaly Detection
Abstract
1. Introduction
- A systematic evaluation of classical statistical filters for IoT anomaly detection under diverse and controlled anomaly scenarios.
- The development of a reinforcement learning (RL) environment that enables adaptive filter selection using reward functions based on detection accuracy and false negative penalization.
- An extended RL framework that jointly optimizes filter selection and parameter configuration.
- The development of a multi-agent reinforcement learning (MARL) framework enabling scenario-specific policy specialization in distributed IoT environments.
- A comprehensive comparison between statistical filtering, single-agent RL, and MARL approaches.
- The design of synthetic datasets with non-stationary characteristics for multi-node evaluation, together with a MARL formulation based on implicit specialization and global reward mechanisms without direct agent-to-agent communication.
2. Background and Related Work
2.1. Background
2.2. Related Works
3. Materials and Methods
3.1. Problem Definition
- Filter selection only:
- Filter and parameter configuration:
- High overall correctness (accuracy).
- Strong penalty for missed anomalies (false negatives), which are often more costly in monitoring contexts.
- Accuracy is computed over the full scenario from the counts of true positives (TPs, correctly detected anomalies), true negatives (TNs, correctly identified normal samples), false positives (FPs, normal samples incorrectly classified as anomalies), and false negatives (FNs, anomalies incorrectly classified as normal samples); FNR is the false negative rate, computed over the anomalous samples only;
- controls the severity of false negative penalization.
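As a minimal sketch of the reward described in this list, the following assumes a linear combination, reward = accuracy − penalty × FNR. This form is consistent with the result tables, where the reward equals the accuracy whenever FNR is zero, but the exact expression is an assumption. The function names and the `fn_penalty` parameter name are hypothetical.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 = anomaly and 0 = normal."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def reward(y_true: Sequence[int], y_pred: Sequence[int], fn_penalty: float) -> float:
    """Assumed reward form: accuracy minus a weighted false negative rate."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # FNR is computed over anomalous samples only (FN + TP).
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return accuracy - fn_penalty * fnr
```

With a larger `fn_penalty`, a configuration that misses anomalies is ranked below one that raises a few extra false alarms, which matches the stated preference for penalizing false negatives.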
3.2. Synthetic Dataset Generation
- represents the baseline environmental temperature signal;
- represents measurement noise produced by the sensor;
- represents injected anomalies simulating sensor faults or disturbances.
- is the initial temperature level;
- represents a small drift coefficient;
- is low-amplitude Gaussian noise representing natural environmental fluctuations.
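The additive structure described above (baseline plus sensor noise plus injected anomalies, with a baseline made of an initial level, a small drift, and low-amplitude fluctuations) can be sketched as follows. This is an illustrative generator, not the authors' implementation: the function and parameter names are hypothetical, the defaults are picked from the typical ranges listed in Appendix A, and the `env_std` value is an assumption.

```python
import random


def generate_signal(n, t0=21.0, drift=0.001, env_std=0.02,
                    sensor_std=0.1, anomalies=None, seed=0):
    """Return n temperature samples built as baseline + sensor noise + anomalies.

    anomalies maps a sample index to an additive offset in degrees Celsius.
    """
    rng = random.Random(seed)
    anomalies = anomalies or {}
    samples = []
    for t in range(n):
        # Baseline: initial level, linear drift, and small environmental fluctuation.
        baseline = t0 + drift * t + rng.gauss(0.0, env_std)
        # Measurement noise produced by the sensor.
        noise = rng.gauss(0.0, sensor_std)
        samples.append(baseline + noise + anomalies.get(t, 0.0))
    return samples
```

Setting both standard deviations to zero gives a deterministic ramp, which is convenient for checking that injected offsets land on the intended samples.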
3.3. Statistical Filtering Baseline
- is the window half-size;
- is the Hampel sensitivity parameter.
- is the IQR scaling factor controlling detection sensitivity.
- is the mean of the series;
- is the standard deviation.
- is the Z-score threshold.
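The three baseline detectors can be sketched with the standard library alone. The version below is a simplified illustration under stated assumptions: the Hampel filter uses the usual 1.4826 consistency constant to scale the MAD, the IQR rule uses the "inclusive" quantile method, and the IQR and Z-score rules are applied to the whole series rather than to a sliding window. All function and parameter names are hypothetical.

```python
import statistics


def hampel_flags(x, half_window, n_sigma):
    """Flag samples whose distance to the local median exceeds n_sigma scaled MADs."""
    flags = []
    for i, value in enumerate(x):
        window = x[max(0, i - half_window): i + half_window + 1]
        med = statistics.median(window)
        mad = statistics.median([abs(v - med) for v in window])
        # 1.4826 makes the MAD a consistent estimate of sigma for Gaussian data.
        flags.append(int(abs(value - med) > n_sigma * 1.4826 * mad))
    return flags


def iqr_flags(x, k):
    """Flag samples outside [Q1 - k*IQR, Q3 + k*IQR] computed over the series."""
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [int(v < low or v > high) for v in x]


def zscore_flags(x, threshold):
    """Flag samples whose absolute Z-score exceeds the threshold."""
    mu = statistics.fmean(x)
    sigma = statistics.pstdev(x)
    if sigma == 0:
        return [0] * len(x)  # a constant series has no deviating samples
    return [int(abs(v - mu) / sigma > threshold) for v in x]
```

Because the mean and standard deviation are themselves pulled by outliers, the Z-score rule tends to be the least robust of the three under dense impulsive noise, while the median-based Hampel rule is the most robust.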
3.4. RL for Adaptive Filter Selection
- Encourage high overall detection accuracy;
- Penalize missed anomalies (false negatives), which are typically more critical than false positives.
- α is the learning rate;
- γ is the discount factor;
- is the next state.
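A tabular Q-learning agent using a learning rate, a discount factor, and the next state, as defined above, can be sketched as follows. The class and method names are hypothetical, ε-greedy exploration is an assumption about how actions are explored, and the default values mirror the baseline row of the hyperparameter sensitivity table (α = 0.10, γ = 0.90, ε = 0.10).

```python
import random
from collections import defaultdict


class QLearner:
    """Minimal tabular Q-learning agent with epsilon-greedy action selection."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability (assumed epsilon-greedy)
        self.q = defaultdict(float)  # maps (state, action) to an estimated value
        self.rng = random.Random(seed)

    def select(self, state):
        """Explore with probability epsilon, otherwise pick the greedy action."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Apply Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

In the filter-selection setting, a state would identify the scenario and each action would name one of the statistical filters; the reward passed to `update` would be the scenario-level reward.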
3.5. RL with Parameter Adaptation
- denotes the selected filter;
- represents the parameter set associated with filter .
- w is the sliding window size;
- s is the MAD scaling factor.
- k is the IQR multiplier.
- t is the Z-score threshold.
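One way to represent the joint filter and parameter action is to enumerate a discrete grid, as sketched below. The parameter names w, s, k, and t follow the configuration labels in the result tables, and the listed values are those that appear in those tables; treating their Cartesian product as the full action space is an assumption, and the names `PARAM_GRID` and `build_action_space` are hypothetical.

```python
from itertools import product

WINDOWS = (21, 31)  # sliding window sizes seen in the reported configurations

# Assumed discrete grid: one entry per filter, one tuple of values per parameter.
PARAM_GRID = {
    "hampel": {"w": WINDOWS, "s": (2.5, 3.0)},
    "iqr": {"w": WINDOWS, "k": (1.5, 2.0)},
    "zscore": {"w": WINDOWS, "t": (2.5, 3.0)},
}


def build_action_space(grid):
    """Expand the grid into a list of (filter_name, parameter_dict) actions."""
    actions = []
    for filter_name, params in grid.items():
        names = list(params)
        for values in product(*(params[name] for name in names)):
            actions.append((filter_name, dict(zip(names, values))))
    return actions


ACTIONS = build_action_space(PARAM_GRID)
```

Keeping the grid small keeps the Q-table small, which suits a tabular method on constrained hardware, at the cost of restricting the agent to discrete parameter choices.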
3.6. MARL
- The same set of statistical filters.
- The same parameter configuration space.
- The same reward definition.
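Since every agent shares the same filters, parameter space, and reward definition, the multi-agent setup can be sketched as a set of independent tabular learners, one per scenario, that never exchange information. The sketch below is a simplification under stated assumptions: each agent has a single state, receives only its own scenario's reward, and explores with ε-greedy selection. The function name and the `reward_fn(scenario, action)` callback are hypothetical.

```python
import random


def train_independent_agents(scenarios, actions, reward_fn, episodes=200,
                             alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Train one single-state Q-table per scenario and return each greedy action.

    actions must be hashable (for example, configuration name strings).
    """
    rng = random.Random(seed)
    actions = list(actions)
    # One private Q-table per agent; there is no agent-to-agent communication.
    q_tables = {s: {a: 0.0 for a in actions} for s in scenarios}
    for _ in range(episodes):
        for s in scenarios:
            q = q_tables[s]
            if rng.random() < epsilon:
                a = rng.choice(actions)   # explore
            else:
                a = max(q, key=q.get)     # exploit the current estimate
            r = reward_fn(s, a)
            # Single-state Q-learning update: the next state is the same state.
            q[a] += alpha * (r + gamma * max(q.values()) - q[a])
    return {s: max(q, key=q.get) for s, q in q_tables.items()}
```

Adding a scenario only adds one more Q-table to the dictionary, so existing agents keep their learned values.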
3.7. Evaluation Metrics
4. Results
4.1. Dataset and Experimental Setup Overview
4.2. Performance of Statistical Filtering Baseline
4.3. RL with Filter–Parameter Adaptation
- Robust median-based configurations under impulsive noise;
- More sensitive parameterizations in stable environments;
- Conservative thresholds in severely corrupted signals.
4.4. MARL Performance
- Agents operating under impulsive noise favor robust median-based configurations;
- Agents handling stable signals prefer sensitive configurations;
- Agents exposed to severe corruption adopt conservative parameter choices.
- Improved specialization: each agent optimizes for a specific anomaly profile.
- Faster convergence: reduced state-action ambiguity accelerates learning.
- Scalability: new agents can be added for additional scenarios without retraining the entire system.
4.5. Cross-Method Comparative Analysis
4.6. Stress-Test Evaluation for MARL Under Complex and Distributed Conditions
4.7. Real-World Evaluation with RL and MARL
4.7.1. RL and MARL Configuration
4.7.2. Results: Applied Real Dataset
4.8. MARL Scalability Analysis in Distributed IoT Scenarios
4.9. Computational Cost Analysis
4.10. Comparison with Unsupervised Baseline Models
4.11. Sensitivity Analysis of RL/MARL Hyperparameters
5. Discussion
5.1. Interpretation of Statistical vs. Learning-Based Methods
5.2. Global vs. Local Learning Behavior
5.3. Insights from Real-World Evaluation
5.4. Limitations
5.5. Scalability
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| DL | Deep Learning |
| IQR | Interquartile Range |
| IoT | Internet of Things |
| MAD | Median Absolute Deviation |
| MARL | Multi-Agent Reinforcement Learning |
| MDP | Markov Decision Process |
| ML | Machine Learning |
| RF | Random Forest |
| RL | Reinforcement Learning |
| SVM | Support Vector Machines |
Appendix A. Scenarios 1 to 4
| Parameter | Description | Typical Value |
|---|---|---|
| | Initial baseline temperature | 20–22 °C |
| | Sensor noise standard deviation | 0.05–0.2 °C |
| | Spike magnitude | 2–5 °C |
| | Spike occurrence probability | 0.5–2% |
| | Impulsive noise duration | 2–5 samples |
| | Drift coefficient | 0.001–0.01 °C/sample |
| | Length of the flat corrupted segment | 20–50 samples |
| Scenario | Signal Characteristics | Anomaly Type | Typical Magnitude | Duration | Injection Frequency |
|---|---|---|---|---|---|
| Scenario 1: Stable Spikes | Stable baseline temperature with small natural noise | Isolated spikes | ±3–6 °C deviation | 1–2 samples | Rare (≈1–2% of samples) |
| Scenario 2: Impulsive Noise | Stable signal with frequent abrupt disturbances | Dense impulsive noise | ±2–5 °C deviation | 1 sample | Frequent (≈5–10%) |
| Scenario 3: Gradual Drift | Slowly increasing baseline | Drift anomaly | gradual ±3–8 °C shift | Long segments (50–150 samples) | Continuous |
| Scenario 4: Flat Corrupted Segments | Artificially constant temperature | Sensor freeze/flat signal | constant value | 40–120 samples | Occasional |
- 0 denotes normal behavior;
- 1 denotes anomalous behavior.
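The binary labelling above can be produced at injection time, as in the following sketch for two of the anomaly types (isolated spikes and a flat corrupted segment). The function names and defaults are hypothetical; the spike magnitude range follows the Scenario 1 row of the scenario table, and marking every sample of the frozen segment as anomalous, including its first sample, is an assumption.

```python
import random


def inject_spikes(signal, rate=0.01, magnitude=(3.0, 6.0), seed=0):
    """Add signed spikes at random positions; return (corrupted signal, labels)."""
    rng = random.Random(seed)
    out, labels = list(signal), [0] * len(signal)
    for i in range(len(out)):
        if rng.random() < rate:
            sign = rng.choice((-1.0, 1.0))
            out[i] += sign * rng.uniform(*magnitude)
            labels[i] = 1  # 1 denotes anomalous behavior
    return out, labels


def inject_flat_segment(signal, start, length):
    """Freeze the signal at its value at `start`; return (corrupted signal, labels)."""
    out, labels = list(signal), [0] * len(signal)
    end = min(start + length, len(out))
    for i in range(start, end):
        out[i] = out[start]  # the sensor keeps repeating the last reading
        labels[i] = 1
    return out, labels
```

Returning the labels together with the corrupted signal keeps the ground truth aligned with the samples, which is what the accuracy and false negative rate computations require.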
Appendix B. Scenarios 5 to 7
- 0–200 min: stable signal with sparse spike anomalies;
- 200–400 min: gradual drift;
- 400–600 min: impulsive noise;
- 600–800 min: flat-line corruption;
- 800–1000 min: mixed anomalies (drift/spikes/moderate noise).
- Drift combined with sparse spikes in early segments;
- Increased spike density and amplitude in intermediate segments;
- Flat-line corruption with superimposed impulsive noise;
- Gradual increase in noise variance and anomaly magnitude over time.
- Global events: simultaneous temperature changes affecting multiple nodes;
- Local anomalies: node-specific faults such as drift, spikes, or flat-line corruption.
- Scenario 5 highlights the impact of temporal non-stationarity;
- Scenario 6 introduces ambiguity due to overlapping anomaly patterns;
- Scenario 7 demonstrates the importance of distributed decision-making.
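The regime-switching timeline listed for the 0–1000 min signal can be encoded as a simple lookup table, sketched below. The regime labels and the function name are hypothetical, and treating each interval as half-open (start included, end excluded) is an assumption, since the listed boundaries overlap at 200, 400, 600, and 800 min.

```python
# (start_minute, end_minute, regime) with half-open intervals [start, end).
REGIME_SCHEDULE = (
    (0, 200, "sparse_spikes"),
    (200, 400, "gradual_drift"),
    (400, 600, "impulsive_noise"),
    (600, 800, "flat_line"),
    (800, 1000, "mixed"),
)


def regime_at(minute, schedule=REGIME_SCHEDULE):
    """Return the anomaly regime active at the given time in minutes."""
    for start, end, name in schedule:
        if start <= minute < end:
            return name
    raise ValueError(f"minute {minute} is outside the scheduled range")
```

A generator can call `regime_at` for each sample timestamp to decide which anomaly type to inject, so the same detector is exposed to all five regimes within a single series.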
References
1. Atzori, L.; Iera, A.; Morabito, G. The Internet of Things: A survey. Comput. Netw. 2010, 54, 2787–2805.
2. Zanella, A.; Bui, N.; Castellani, A.; Vangelista, L.; Zorzi, M. Internet of Things for Smart Cities. IEEE Internet Things J. 2014, 1, 22–32.
3. Pearson, R.K. Outliers in Process Modeling and Identification. IEEE Trans. Control Syst. Technol. 2002, 10, 55–63.
4. Hampel, F.R. The influence curve and its role in robust estimation. J. Am. Stat. Assoc. 1974, 69, 383–393.
5. Ahmed, M.; Mahmood, A.N.; Hu, J. A survey of network anomaly detection techniques. J. Netw. Comput. Appl. 2016, 60, 19–31.
6. Aggarwal, C.C. Outlier Analysis, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2017.
7. Chalapathy, S.; Chawla, S. Deep learning for anomaly detection: A survey. arXiv 2019, arXiv:1901.03407.
8. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018.
9. Ngo, M.V.; Luo, T.; Chaouchi, H.; Quek, T.Q.S. Contextual-Bandit Anomaly Detection for IoT Data in Distributed Hierarchical Edge Computing. In Proceedings of the 40th IEEE International Conference on Distributed Computing Systems (ICDCS), Singapore, 29 November–1 December 2020; pp. 1223–1228.
10. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly Detection: A Survey. ACM Comput. Surv. 2009, 41, 15.
11. Ruppert, D.; Hampel, F.R.; Ronchetti, E.M.; Rousseeuw, P.J.; Stahel, W.A. Robust Statistics: The Approach Based on Influence Functions; Wiley: Hoboken, NJ, USA, 1986.
12. Mao, Y.; You, C.; Zhang, J.; Huang, K.; Letaief, K.B. A Survey on Mobile Edge Computing: The Communication Perspective. IEEE Commun. Surv. Tutor. 2017, 19, 2322–2358.
13. Adhikari, D.; Jiang, W.; Zhan, J.; Rawat, D.B.; Bhattarai, A. Recent Advances in Anomaly Detection in Internet of Things: Status, Challenges and Perspectives. Comput. Sci. Rev. 2024, 54, 100665.
14. Hu, Y.; Li, X.; Zhang, L.; Wang, J. IoT-ONDDQN: A Detection Model Based on Deep Reinforcement Learning for IoT Data Security. Comput. Commun. 2025, 241, 108263.
15. Wali, S.; Khan, M.I.; Imran, M. Semantic-Aware Reinforcement Learning for Signal Management and Anomaly Detection in IoT Systems. Sci. Rep. 2025, 15, 26500.
16. Servin, A.; Kudenko, D. Multi-Agent Reinforcement Learning for Intrusion Detection. In Adaptive Agents and Multi-Agent Systems III. Adaptation and Multi-Agent Learning; Springer: Berlin/Heidelberg, Germany, 2008; pp. 211–223.
17. Chen, Q.; Zhang, Y.; Li, J. AI-Enabled IoT Security: A Survey on Advances, Challenges, and Future Directions. In Proceedings of the ACM Conference; Association for Computing Machinery: New York, NY, USA, 2025.
18. Belay, M.A.; Blakseth, S.S.; Rasheed, A.; Rossi, P.S. Unsupervised Anomaly Detection for IoT-Based Multivariate Time Series: Existing Solutions, Performance Analysis and Future Directions. Sensors 2023, 23, 2844.
19. Haque, A.; Chowdhury, N.-U.; Soliman, H.; Hossen, M.S.; Fatima, T.; Ahmed, I. Wireless Sensor Networks Anomaly Detection Using Machine Learning: A Survey. arXiv 2023, arXiv:2303.08823.
20. Gueriani, A.; Kheddar, H.; Mazari, A.C. Deep Reinforcement Learning for Intrusion Detection in IoT: A Survey. In Proceedings of the 2023 2nd International Conference on Electronics, Energy and Measurement (IC2EM), Medea, Algeria, 28–30 November 2023; pp. 1–7.
21. Agarwal, A.B.; Rajesh, R.; Arul, N. Spatially-Resolved Hyperlocal Weather Prediction and Anomaly Detection Using IoT Sensor Networks and Machine Learning Techniques. arXiv 2023, arXiv:2310.11001.
22. Garg, S.; Kaur, K.; Kumar, N.; Rodrigues, J.J.P.C. A Multi-Stage Anomaly Detection Scheme for Augmenting Security in IoT-Enabled Applications. Future Gener. Comput. Syst. 2020, 104, 328–342.
23. Warden, P.; Situnayake, D. TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers; O’Reilly Media: Sebastopol, CA, USA, 2019.
24. Shi, W.; Dustdar, S. The Promise of Edge Computing. Computer 2016, 49, 78–81.
25. Hoaglin, D.C.; Iglewicz, B.; Tukey, J.W. Performance of some resistant rules for outlier labelling. J. Am. Stat. Assoc. 1986, 81, 991–999.
26. Hawkins, D.M. Identification of Outliers; Chapman and Hall: Oxfordshire, UK, 1980.
27. Watkins, C.J.C.H.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
28. Busoniu, L.; Babuska, R.; De Schutter, B. Multi-Agent Reinforcement Learning; Springer: Berlin/Heidelberg, Germany, 2010.
29. Lapan, M. Deep Reinforcement Learning Hands-On; Packt Publishing: Birmingham, UK, 2018.
30. Busoniu, L.; Babuska, R.; De Schutter, B. A comprehensive survey of multiagent reinforcement learning. IEEE Trans. Syst. Man. Cybern. 2008, 38, 156–172.
31. Hernandez-Leal, P.; Kartal, B.; Taylor, M.E. A survey and critique of multiagent deep reinforcement learning. Auton. Agents Multi-Agent Syst. 2019, 33, 750–797.
32. Python Software Foundation. Python, version 3.11. Programming Language. Python Software Foundation: Beaverton, OR, USA, 2024. Available online: https://www.python.org (accessed on 2 January 2026).
33. Pires, L.M. Adaptive IoT RL Anomaly Detection: Source Code and Experimental Framework. GitHub Repository, 2025. Available online: https://github.com/prof-luispires/iot-anomaly-detection.git (accessed on 25 May 2026).
34. DataVic. Sensor Readings with Temperature, Light, Humidity Every 5 Minutes at 8 Locations (2014–2015). Available online: https://discover.data.vic.gov.au/dataset/sensor-readings-with-temperature-light-humidity-every-5-minutes-at-8-locations-trial-2014-2015 (accessed on 21 April 2026).

| Approach Type | Typical Model | Main Advantage | Main Limitation | Difference from This Work |
|---|---|---|---|---|
| Deep RL anomaly detection | DQN/neural RL | High adaptability | Higher computational cost and lower interpretability | This work uses tabular Q-learning with explicit filter–parameter actions |
| RL-based IoT anomaly detection | Single-agent RL | Adaptive policy learning | Often centralized or task-specific | This work focuses on lightweight statistical filter selection |
| MARL intrusion detection | Distributed agents | Decentralized decisions | Mainly cybersecurity-oriented | This work targets sensor time-series anomaly detection |
| Statistical anomaly detection | Hampel/IQR/Z-score | Low cost and interpretable | Fixed thresholds and limited adaptability | This work adds adaptive RL-based selection and tuning |
| Proposed framework | Statistical filters and tabular RL/MARL | Lightweight, interpretable, adaptive | Limited to discrete actions | Designed for edge-oriented IoT deployments |
| Ref. | Approach | Adaptivity | Learning Paradigm | Main Limitation |
|---|---|---|---|---|
| [13] | Survey of statistical and ML methods | Partial | Supervised and Unsupervised | Requires labeled datasets |
| [14] | Deep RL anomaly detection | Yes | Single-agent RL | Complex model architecture |
| [15] | RL-based signal anomaly detection | Yes | Single-agent RL | Focus on communication signals |
| [9] | Contextual RL model selection | Yes | Single-agent RL | Limited parameter adaptation |
| [16] | Distributed MARL intrusion detection | Yes | MARL | Focus on network security |
| [17] | AI-enabled IoT security framework | Yes | MARL | Not focused on time-series anomalies |
| [18] | Unsupervised multivariate IoT anomaly detection | Partial | Autoencoder/DL | High computational complexity |
| [19] | WSN anomaly detection survey | Partial | ML-based | Limited edge deployment discussion |
| [20] | DRL-based IoT intrusion detection survey | Yes | Deep RL | Cybersecurity-oriented focus |
| [21] | Hyperlocal IoT anomaly detection | Partial | Unsupervised ML | Environmental application specificity |
| [22] | Multi-stage IoT anomaly detection | Partial | Hybrid ML | Increased processing overhead |
| This work | Statistical filters, RL and MARL | Yes | Single-agent RL and MARL | Simulation-based validation |
| Scenario | Anomaly Type | Temporal Structure | Main Detection Challenge |
|---|---|---|---|
| 1 | Sporadic spikes | Isolated events | Avoid false positives |
| 2 | Impulsive noise | Dense, irregular | Robustness to noise |
| 3 | Drift and periodic | Progressive patterns | Contextual detection |
| 4 | Flat-line/corruption | Persistent failure | Detect low-variance faults |
| Scenario | Single-Agent Optimal Configuration | Multi-Agent Optimal Configuration |
|---|---|---|
| 0 | IQR (w = 21, k = 1.5) | IQR (w = 21, k = 1.5) |
| 1 | IQR (w = 31, k = 2.0) | IQR (w = 31, k = 2.0) |
| 2 | Z-Score (w = 21, t = 3.0) | Z-Score (w = 21, t = 3.0) |
| 3 | Z-Score (w = 31, t = 2.5) | Z-Score (w = 31, t = 2.5) |
| Scenario | Method | Accuracy | FNR | Reward |
|---|---|---|---|---|
| 0 | Static (single config): IQR (w = 31, k = 2.0) | 0.995 | 0.000 | 0.995 |
| 1 | Static (single config): IQR (w = 31, k = 2.0) | 0.993 | 0.000 | 0.993 |
| 2 | Static (single config): IQR (w = 31, k = 2.0) | 0.995 | 0.000 | 0.995 |
| 3 | Static (single config): IQR (w = 31, k = 2.0) | 0.992 | 0.000 | 0.992 |
| 0 | Static (best per scenario) | 0.995 | 0.000 | 0.995 |
| 1 | Static (best per scenario) | 0.993 | 0.000 | 0.993 |
| 2 | Static (best per scenario) | 1.000 | 0.000 | 1.000 |
| 3 | Static (best per scenario) | 0.996 | 0.000 | 0.996 |
| 0 | RL single-agent (filter and param) | 0.971 | 0.000 | 0.971 |
| 1 | RL single-agent (filter and param) | 0.993 | 0.000 | 0.993 |
| 2 | RL single-agent (filter and param) | 1.000 | 0.000 | 1.000 |
| 3 | RL single-agent (filter and param) | 0.996 | 0.000 | 0.996 |
| 0 | MARL (one agent per scenario) | 0.971 | 0.000 | 0.971 |
| 1 | MARL (one agent per scenario) | 0.993 | 0.000 | 0.993 |
| 2 | MARL (one agent per scenario) | 1.000 | 0.000 | 1.000 |
| 3 | MARL (one agent per scenario) | 0.996 | 0.000 | 0.996 |
| Avg | Static (single config): IQR (w = 31, k = 2.0) | 0.994 | 0.000 | 0.994 |
| Avg | Static (best per scenario) | 0.996 | 0.000 | 0.996 |
| Avg | RL single-agent (filter and param) | 0.990 | 0.000 | 0.990 |
| Avg | MARL (one agent per scenario) | 0.990 | 0.000 | 0.990 |
| Stress-Test Scenario | Method | Accuracy | FN Rate | Reward |
|---|---|---|---|---|
| Scenario 5–Regime switching | RL single-agent (filter param.) | 0.5222 | 0.5724 | 0.2073 |
| Scenario 5–Regime switching | MARL | 0.4831 | 0.5746 | 0.1654 |
| Scenario 6–Overlapping anomalies | RL single-agent (filter param.) | 0.2127 | 0.8340 | −0.3404 |
| Scenario 6–Overlapping anomalies | MARL | 0.1897 | 0.8326 | −0.3601 |
| Scenario 7–Multi-node correlated | RL single-agent (filter param.) | 0.8664 | 0.5178 | 0.5319 |
| Scenario 7–Multi-node correlated | MARL (improved) | 0.8580 | 0.5361 | 0.5526 |
| Average | RL single-agent (filter param.) | 0.5338 | 0.6414 | 0.1329 |
| Average | MARL/MARL (improved in Scenario 7) | 0.5103 | 0.6478 | 0.1193 |
| Anomaly Type | Description | Evaluation Purpose |
|---|---|---|
| Spike | Abrupt isolated deviation | Detect impulsive anomalies |
| Drift | Gradual deviation over time | Detect slow-changing faults |
| Flat | Constant-value segment | Simulate sensor freeze |
| Dropout | Missing/corrupted bursts | Simulate communication loss |
| Periodic | Repeated disturbance | Test robustness to structured noise |
| Component | RL | MARL |
|---|---|---|
| Learning paradigm | Single-agent Q-learning | Multi-agent Q-learning |
| Policy scope | Global | Local (per sensor) |
| State | Sensor scenario | Sensor-specific agent |
| Actions | Filter and parameters | Filter and parameters |
| Filters | Hampel, IQR, Z-score | Hampel, IQR, Z-score |
| Reward | Defined in (2) | Defined in (2) |
| Objective | Global adaptation | Local specialization |
| Sensor | RL Config | MARL Config | RL F1 | MARL F1 | RL Reward | MARL Reward |
|---|---|---|---|---|---|---|
| 501 | ZScore_w21_t3.0 | ZScore_w21_t3.0 | 0.1905 | 0.1905 | 0.2724 | 0.2724 |
| 502 | Hampel_w21_s3.0 | Hampel_w21_s3.0 | 0.2919 | 0.2934 | 0.3089 | 0.3100 |
| 505 | IQR_w31_k2.0 | IQR_w31_k2.0 | 0.1974 | 0.1974 | 0.2768 | 0.2768 |
| 506 | IQR_w21_k1.5 | IQR_w31_k2.0 | 0.1675 | 0.1646 | 0.2617 | 0.2604 |
| 507 | Hampel_w31_s2.5 | Hampel_w31_s2.5 | 0.2243 | 0.2233 | 0.2885 | 0.2880 |
| 508 | Hampel_w31_s2.5 | Hampel_w31_s2.5 | 0.1454 | 0.1454 | 0.2494 | 0.2494 |
| 509 | IQR_w31_k2.0 | IQR_w31_k2.0 | 0.2088 | 0.2116 | 0.2812 | 0.2832 |
| 510 | IQR_w31_k2.0 | IQR_w31_k2.0 | 0.1739 | 0.1739 | 0.2691 | 0.2691 |
| 511 | IQR_w31_k2.0 | IQR_w31_k2.0 | 0.1451 | 0.1367 | 0.2837 | 0.2825 |
| Method | Accuracy | Precision | Recall | F1-Score | FN Rate | Reward |
|---|---|---|---|---|---|---|
| RL | 0.8949 | 0.7218 | 0.1172 | 0.1938 | 0.8828 | 0.2769 |
| MARL | 0.8921 | 0.6391 | 0.1210 | 0.1930 | 0.8790 | 0.2769 |
| Number of Agents | Accuracy | Precision | Recall | F1-Score | FPR |
|---|---|---|---|---|---|
| 4 | 0.818 | 0.282 | 0.236 | 0.257 | 0.093 |
| 8 | 0.861 | 0.475 | 0.401 | 0.435 | 0.069 |
| 12 | 0.861 | 0.474 | 0.402 | 0.435 | 0.069 |
| Approach | Avg. Runtime (ms) | Peak Memory (MB) | Training Time (s) | Inference Cost (ms) |
|---|---|---|---|---|
| Hampel | 4.10 | ~0.30 | 0 | 4.10 |
| IQR | 5.39 | ~0.32 | 0 | 5.39 |
| Z-score | 4.40 | ~0.28 | 0 | 4.40 |
| RL (filter selection) | 1366.25 | ~8–12 | 1.37 | ~5 |
| MARL (improved) | 14,293.94 | ~25–40 | 14.29 | ~10 |
| Scenario/Dataset | Method | Accuracy | Precision | Recall | F1-Score | FPR | FNR |
|---|---|---|---|---|---|---|---|
| S5-Regime Switching | Isolation Forest | 0.625 | 0.356 | 0.626 | 0.454 | 0.317 | 0.374 |
| S5-Regime Switching | OC-SVM | 0.737 | 0.509 | 0.909 | 0.652 | 0.262 | 0.091 |
| S6-Overlap Severity | Isolation Forest | 0.810 | 0.489 | 0.536 | 0.511 | 0.147 | 0.464 |
| S6-Overlap Severity | OC-SVM | 0.789 | 0.437 | 0.531 | 0.480 | 0.175 | 0.469 |
| S7-Multi-node Correlated | Isolation Forest | 0.829 | 0.406 | 0.425 | 0.415 | 0.089 | 0.575 |
| S7-Multi-node Correlated | OC-SVM | 0.846 | 0.499 | 0.563 | 0.529 | 0.099 | 0.437 |
| Real-world sensors (average) | Isolation Forest | 0.850 | 0.331 | 0.295 | 0.312 | 0.084 | 0.675 |
| Real-world sensors (average) | OC-SVM | 0.845 | 0.319 | 0.320 | 0.319 | 0.089 | 0.624 |
| Hyperparameter Configuration | α | γ | ε | FN Penalty | Accuracy | Precision | Recall | F1-Score | FPR |
|---|---|---|---|---|---|---|---|---|---|
| Conservative learning configuration | 0.05 | 0.80 | 0.20 | 1.0 | 0.853 | 0.632 | 0.139 | 0.204 | 0.014 |
| Baseline configuration | 0.10 | 0.90 | 0.10 | 1.5 | 0.853 | 0.632 | 0.139 | 0.204 | 0.014 |
| Aggressive adaptation configuration | 0.20 | 0.95 | 0.05 | 2.0 | 0.853 | 0.632 | 0.139 | 0.204 | 0.014 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Pires, L.M.; Fialho, V. From Statistical Filtering to Adaptive Reinforcement Learning: A Progressive Framework for IoT Time-Series Anomaly Detection. Appl. Sci. 2026, 16, 5608. https://doi.org/10.3390/app16115608

