Search Results (1,363)

Search Parameters:
Keywords = outlier detection

17 pages, 1218 KB  
Article
Global Anomaly Detection Using Feedforward Symmetrical Autoencoder Neuronal Network: Comparison with Other Methods in a Case Study Using Real Industrial Data
by Andrei Nicolae and Adrian Korodi
Appl. Sci. 2026, 16(5), 2457; https://doi.org/10.3390/app16052457 - 3 Mar 2026
Abstract
The continuous functioning of any industrial manufacturing facility, especially of critical infrastructures, has become crucial in the current multi-risk context. Monitoring and detecting anomalies carries multiple significant practical benefits that are direct Industry 4.0 goals, some of which improve resiliency and sustainability, implicit targets of Industry 5.0. For this reason, this paper explores the use of feedforward autoencoder neural networks for anomaly detection. The proposed approach is designed to capture deviations in the overall operational behavior of a plant, enabling system-wide monitoring rather than being constrained to the identification of specific, predefined fault scenarios. The resulting autoencoder was then tested experimentally on synthetic data, and a direct comparison with five other anomaly detection methods (Z-Score, Interquartile Range, Isolation Forest, One-Class Support Vector Machines, and Local Outlier Factor) showed that the autoencoder performed best in terms of precision, recall, and F1 score. The case study focused on data from a real drinking water treatment plant. Full article
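As a quick illustration of the five baseline detectors named in this abstract (not the paper's autoencoder itself), the sketch below applies Z-Score, Interquartile Range, Isolation Forest, One-Class SVM, and Local Outlier Factor flags to synthetic one-dimensional process data; the data and thresholds are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Synthetic 1-D process signal: normal operation plus three injected anomalies.
normal = rng.normal(50.0, 2.0, size=500)
anomalies = np.array([80.0, 15.0, 95.0])
x = np.concatenate([normal, anomalies]).reshape(-1, 1)

# Z-Score: flag points more than 3 standard deviations from the mean.
z = np.abs((x - x.mean()) / x.std())
z_flags = (z > 3).ravel()

# Interquartile Range: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
iqr_flags = ((x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)).ravel()

# Isolation Forest, One-Class SVM, and LOF: fit_predict() returns -1 for outliers.
if_flags = IsolationForest(random_state=0).fit_predict(x) == -1
svm_flags = OneClassSVM(nu=0.01).fit_predict(x) == -1
lof_flags = LocalOutlierFactor(n_neighbors=20).fit_predict(x) == -1
```

The precision/recall/F1 comparison in the paper amounts to scoring each flag array against the known anomaly labels.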

27 pages, 2170 KB  
Article
What Knowledge Transfers in Tabular Anomaly Detection? A Teacher–Student Distillation Analysis
by Tea Krčmar, Dina Šabanović, Miljenko Švarcmajer and Ivica Lukić
Mach. Learn. Knowl. Extr. 2026, 8(3), 60; https://doi.org/10.3390/make8030060 - 3 Mar 2026
Abstract
Anomaly detection on tabular data is widely used in fraud detection, predictive maintenance, and medical screening. While heterogeneous ensembles combining multiple detection paradigms achieve strong performance, their computational cost limits deployment in latency-sensitive or resource-constrained environments. We propose KD-AnomalyNet, a teacher–student framework that distills anomaly knowledge from a high-capacity ensemble into a lightweight neural model for efficient inference. Beyond performance replication, we study how anomaly representations transfer during distillation. To this end, we introduce a noise perturbation analysis that serves as a diagnostic probe for representation stability without introducing additional trainable components. Experiments on ten benchmark datasets show that the distilled model preserves up to 98.5% of the teacher’s AUC-ROC on the nine capacity-sufficient datasets (84.7% mean retention across all ten datasets) while achieving 26–181× inference speedups. Our analysis reveals which forms of anomaly knowledge transfer reliably—global outliers (78% transfer) and isolation-based detection (88% retention)—and which degrade under compression—local outliers (20% transfer) and neighborhood-based detection (76% retention)—providing practical guidance for deploying distilled anomaly detectors. Full article
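KD-AnomalyNet itself is not reproduced here; as a generic sketch of the teacher–student idea this abstract describes, the snippet below distills the averaged anomaly scores of a small heterogeneous ensemble into a lightweight MLP regressor. All data, ensemble members, and layer sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 8))
X[:5] += 6.0  # a few global outliers

# "Teacher": average the standardized scores of a heterogeneous ensemble
# (higher score = more anomalous).
if_scores = -IsolationForest(random_state=0).fit(X).score_samples(X)
lof = LocalOutlierFactor(n_neighbors=20).fit(X)
lof_scores = -lof.negative_outlier_factor_

def standardize(s):
    return (s - s.mean()) / s.std()

teacher_scores = (standardize(if_scores) + standardize(lof_scores)) / 2

# "Student": a small MLP regressed onto the teacher's scores; only this
# cheap model is needed at inference time.
Xs = StandardScaler().fit_transform(X)
student = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
student.fit(Xs, teacher_scores)
student_scores = student.predict(Xs)
```

The speedup the paper reports comes from replacing the whole ensemble with the single forward pass of the student at deployment time.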

22 pages, 9538 KB  
Article
A Comprehensive Cleaning Method for Outliers in Wind Turbine Power Curves Based on the Quartile Method and Segmented Regression Detection Method
by Xiaolong Shang, Yelong Wei, Dongxing Wan, Peng Yuan, Gang An, Yulong Ma, Shoutu Li and Fuai Yang
Energies 2026, 19(5), 1161; https://doi.org/10.3390/en19051161 - 26 Feb 2026
Abstract
The actual power curve of a wind turbine is essential for performance evaluation and operational optimization. However, SCADA data frequently contain various abnormal data points that limit their direct and effective use. Existing methods often fail to provide high-quality data for accurate power-curve fitting. Therefore, this paper proposes a comprehensive outlier cleaning method (QRD). This method incorporates the operational mechanisms of wind turbines and establishes preprocessing rules to effectively remove extreme outliers and bottom horizontal accumulation exhibiting distinct numerical characteristics. By leveraging the data distribution features in pitch angle–power and wind speed–power relationships, it implements horizontal and vertical quartile methods to eliminate mid-level accumulation and discrete outliers. A segmented regression-based outlier detection method with metrics adaptive to the power-curve distribution characteristics is proposed to clean residual outliers. Comparative results demonstrate that, relative to the Bins, CPQ, CIF, and TTLOF methods, the QRD method achieves a cleaning speed of 0.152 s per 10,000 data points, improving the average dispersion difference by 32.94%, 11.74%, 13.05%, and 9.67%, respectively. In terms of power-curve fitting accuracy, the average NMAE decreases by 8.65%, 5.07%, 7.57%, and 4.06%, while the average NRMSE decreases by 10.78%, 7.99%, 7.66%, and 5.16% and R2 increases by 1.74%, 1.62%, 1.57%, and 1.03%, respectively. Overall, QRD demonstrates superior efficiency and accuracy in identifying abnormal wind power values, providing reliable support for high-quality power-curve modeling. Full article
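The paper's full QRD pipeline has several stages; the fragment below sketches only the generic "vertical quartile" step, applying an IQR fence to power readings within narrow wind-speed bins. The bin width and fence factor are assumed values, not the paper's.

```python
import numpy as np

def quartile_clean(wind_speed, power, bin_width=0.5, k=1.5):
    """Flag power readings outside the quartile fence within each wind-speed bin.

    Inside a narrow wind-speed bin, normal power values cluster tightly, so the
    IQR fence [Q1 - k*IQR, Q3 + k*IQR] isolates scattered outliers.
    Returns a boolean mask of points to keep.
    """
    keep = np.ones(power.shape, dtype=bool)
    bins = np.floor(wind_speed / bin_width)
    for b in np.unique(bins):
        idx = bins == b
        if idx.sum() < 4:
            continue  # too few points for meaningful quartiles
        q1, q3 = np.percentile(power[idx], [25, 75])
        fence = k * (q3 - q1)
        keep[idx] = (power[idx] >= q1 - fence) & (power[idx] <= q3 + fence)
    return keep

# Idealized cubic power curve with one stuck-at-zero SCADA reading.
wind = np.linspace(3.0, 12.0, 200)
power = wind ** 3
power[50] = 0.0
keep = quartile_clean(wind, power)
```

The "horizontal" variant in the paper applies the same fence along the wind-speed axis within power bins.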
(This article belongs to the Section A3: Wind, Wave and Tidal Energy)

24 pages, 5456 KB  
Article
A Study of Typical P-AEB Test Scenarios Based on Accident Data
by Yajun Luo, Zhenfei Zhan, Qing Mao and Zhenxing Yi
World Electr. Veh. J. 2026, 17(3), 114; https://doi.org/10.3390/wevj17030114 - 26 Feb 2026
Abstract
A large number of vulnerable road users such as pedestrians continue to be injured or killed in road accidents every year, and active safety systems such as automatic emergency braking are expected to improve the situation. However, pedestrian automatic emergency braking (P-AEB) systems are tested across a wide variety of real-world scenarios. The purpose of this paper is to derive, from real pedestrian–vehicle crash data, typical P-AEB test scenarios that reflect real collision conditions. Using a k-means clustering algorithm based on local outlier detection, the intersection data and the straight-road data are clustered and analyzed separately, yielding five types of typical P-AEB straight-road test scenarios and seven types of typical P-AEB intersection test scenarios. Comparison with existing test protocols shows that the proposed test scenarios offer good coverage and authenticity and can guide the construction of specific P-AEB system test scenarios. Full article
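The "clustering after local outlier detection" step described here can be sketched generically: screen records with Local Outlier Factor first, then run k-means only on the points that survive, so stray records do not distort the centroids. The features and data below are toy assumptions, not the accident dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(2)
# Toy crash-scenario features, e.g. (vehicle speed km/h, pedestrian speed km/h).
scenarios = np.vstack([
    rng.normal([30.0, 5.0], 2.0, size=(100, 2)),
    rng.normal([60.0, 3.0], 2.0, size=(100, 2)),
    [[120.0, 20.0]],  # an implausible record to be screened out
])

# Local outlier detection first; fit_predict() returns 1 for inliers.
mask = LocalOutlierFactor(n_neighbors=20).fit_predict(scenarios) == 1

# Cluster only the retained records into typical scenario types.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scenarios[mask])
```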
(This article belongs to the Section Vehicle and Transportation Systems)

16 pages, 2082 KB  
Article
MFF-AE: Enhanced Quality Control for Proteomics Mass Spectrometry Data via Multi-Scale Feature Fusion
by Guangkui Fan, Xinyu Ji, Hunyue Liao, Bo Meng, Duotao Pan, Jinze Huang and Yang Zhao
Int. J. Mol. Sci. 2026, 27(5), 2121; https://doi.org/10.3390/ijms27052121 - 25 Feb 2026
Abstract
Mass spectrometry (MS) is a core analytical tool in proteomics, and the quality of the generated data directly determines the effectiveness of downstream analyses and the reliability of final research conclusions. While MS is also widely used in other omics applications, this study focuses on label-free quantitative proteomics, where samples are represented as protein-abundance matrices derived from MaxQuant. However, MS data are typically characterized by high dimensionality and substantial noise, posing serious challenges for quality control (QC). Existing QC methods have limited feature extraction capabilities and struggle to capture the key information embedded in the data, resulting in poor performance in identifying anomalous samples. Here, we propose the Multi-Scale Feature Fusion-based Autoencoder (MFF-AE). This deep learning-based anomaly detection model achieves precise identification of anomalous samples by integrating both global and local data features. The model consists of three modules: an autoencoder-based backbone network that efficiently embeds raw data into a low-dimensional semantic space, a local feature extraction and fusion module designed to capture and integrate multi-scale features within MS data, and a sample identification module that enhances discriminative representations to enable accurate anomaly detection. To evaluate the effectiveness of the proposed model, we conduct extensive experiments on a benchmark dataset with synthesized anomalies. Quantitative results on the benchmark dataset show that, compared with 15 baseline models from statistical learning, deep learning, and ensemble learning, our model consistently achieves the best performance across key metrics. Furthermore, in linear relationship analyses on real-world clinical datasets, excluding the outlier samples significantly increased the statistical significance and fold change of the identified differential proteins. Overall, the proposed model establishes a solid data foundation, paving the way for downstream mechanistic studies and target discovery. Full article
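MFF-AE itself is not reproduced here; the sketch below shows only the underlying reconstruction-error principle, with a plain bottleneck MLP standing in for the autoencoder. Samples that lie off the learned manifold reconstruct poorly and receive high anomaly scores. All data shapes and layer sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# Rows are samples, columns toy stand-ins for protein abundances.
X_train = rng.normal(size=(300, 20))
X_test = np.vstack([rng.normal(size=(50, 20)),
                    rng.normal(5.0, 1.0, size=(3, 20))])  # 3 anomalous samples

scaler = StandardScaler().fit(X_train)
Xs = scaler.transform(X_train)

# An MLP trained to reproduce its own input acts as a plain autoencoder;
# the narrow hidden layer is the bottleneck.
ae = MLPRegressor(hidden_layer_sizes=(8,), max_iter=3000, random_state=0)
ae.fit(Xs, Xs)

# Anomaly score = per-sample mean squared reconstruction error.
Xt = scaler.transform(X_test)
err = ((ae.predict(Xt) - Xt) ** 2).mean(axis=1)
```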

24 pages, 5977 KB  
Article
Dam Deformation Prediction Based on MHA-BiGRU Framework Enhanced by CEEMD–iForest Outlier Detection
by Jinji Xie, Yuan Shao, Junzhuo Li, Zihao Jia, Chunjiang Fu, Bo Chen, Cong Ma and Sen Zheng
Water 2026, 18(4), 516; https://doi.org/10.3390/w18040516 - 21 Feb 2026
Abstract
One of the keys to addressing the low accuracy and delayed responsiveness of dam deformation prediction models lies in the timely detection of outliers in dam monitoring sequences caused by environmental disturbances, sensor failures, or operational anomalies. This paper presents a robust prediction framework that integrates Complete Ensemble Empirical Mode Decomposition (CEEMD) and Isolation Forest (iForest) for effective outlier detection, followed by a Multi-Head Attention Bidirectional Gated Recurrent Unit (MHA-BiGRU) model for dam deformation prediction. The original deformation time series is first decomposed using CEEMD into a set of intrinsic mode functions (IMFs), separating the series into trend-related components and noise components. The iForest algorithm is then applied to detect outliers in the noise components. Next, the BiGRU model is enhanced with an MHA mechanism to give more weight to the features that most affect the monitored deformation sequences. By enabling the model to focus on the key factors affecting dam deformation, the accuracy of the prediction results is improved. Finally, a case study using monitoring data from a practical project in China demonstrates the performance of the proposed method. The proposed MHA-BiGRU model performs best across all tested scenarios: the coefficient of determination is consistently maintained above 0.98, peaking at 0.9880, and the maximum mean absolute error is 0.1789, substantiating its prediction accuracy and robustness. Compared with classical time series forecasting models, including LSTM, GRU, and BiGRU, the proposed approach is more robust and delivers greater prediction accuracy. The findings provide a promising reference framework for dam structural behavior prediction in similar projects. Full article
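CEEMD is not reproduced here; in the sketch below a centered moving average stands in for the trend/noise separation, and iForest is applied to the high-frequency residual where sensor spikes live. Window length, contamination rate, and the synthetic series are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(4)
t = np.arange(1000)
# Toy deformation series: slow trend + noise, with two corrupted readings.
series = 0.01 * t + rng.normal(0.0, 0.2, size=1000)
series[[200, 600]] += 5.0

# Stand-in for CEEMD: a 25-sample centered moving average separates the slow
# trend from the residual; edge windows are left undetrended for simplicity.
kernel = np.ones(25) / 25
trend = np.convolve(series, kernel, mode="same")
half = 12
trend[:half] = series[:half]
trend[-half:] = series[-half:]
residual = (series - trend).reshape(-1, 1)

# iForest flags the most isolated residuals (-1 = outlier).
flags = IsolationForest(contamination=0.01, random_state=0).fit_predict(residual) == -1
```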

14 pages, 2998 KB  
Article
Clinical Validation of rPPG-Enabled Contactless Pulse Rate Monitoring Software in Cardiovascular Disease Patients
by Jing Wei Chin, Po Him David Chan, Shutao Chen, Chun Hong Cheng, Richard H. Y. So, Elaine Chow, Benny S. P. Fok and Kwan Long Wong
Bioengineering 2026, 13(2), 246; https://doi.org/10.3390/bioengineering13020246 - 20 Feb 2026
Abstract
Background: Cardiovascular disease (CVD) is the leading cause of mortality worldwide, creating demand for continuous, unobtrusive monitoring solutions. This clinical validation evaluates the accuracy of remote photoplethysmography (rPPG), a contactless method using camera video, for measuring pulse rate (PR) in patients with CVD. Methods: We enrolled 50 adults with confirmed CVD at a clinical trial center. In a 6 min rested session, synchronized facial video (under controlled lighting), electrocardiogram (ECG), and photoplethysmography (PPG) signals were recorded. PR was derived from 25 s video segments using rPPG-enabled software and compared to ECG-derived PR via regression and Bland–Altman analysis. Results: Data from 47 participants (n = 817 samples) were analyzed. rPPG-derived PR showed strong agreement with ECG, with a mean absolute error of 1.061 bpm, root-mean-squared error of 2.845 bpm, and Pearson correlation of 0.962. Mixed-effects regression analyses (after 2% outlier removal, n = 782) indicated minimal influence from demographic, environmental, or CVD factors on accuracy. PPG-ECG discrepancies reflected inherent methodological differences. Conclusion: The rPPG method provides accurate, contactless PR monitoring in CVD patients, supporting its potential for remote patient monitoring and early deterioration detection. Future work will validate rPPG for irregular rhythms, additional vital signs, and diverse cohorts to strengthen clinical robustness for cardiometabolic risk assessment. Full article
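The Bland–Altman agreement statistics used in this validation take only a few lines to compute. The sketch below, on simulated pulse-rate pairs rather than the study's data, returns the mean bias and the 95% limits of agreement:

```python
import numpy as np

def bland_altman(a, b):
    """Mean bias and 95% limits of agreement between two measurement series."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Toy paired pulse-rate readings: an rPPG estimate versus an ECG reference.
rng = np.random.default_rng(6)
ecg_pr = rng.normal(70.0, 10.0, size=500)
rppg_pr = ecg_pr + rng.normal(0.5, 1.0, size=500)  # small bias + noise
bias, (loa_low, loa_high) = bland_altman(rppg_pr, ecg_pr)
```

Plotting `diff` against the pairwise means, with horizontal lines at `bias`, `loa_low`, and `loa_high`, gives the standard Bland–Altman plot.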
(This article belongs to the Special Issue Contactless Technologies for Patient Health Monitoring)

27 pages, 4075 KB  
Article
Outlier Detection in Functional Data Using Adjusted Outlyingness
by Zhenghui Feng, Xiaodan Hong, Yingxing Li, Xiaofei Song and Ketao Zhang
Entropy 2026, 28(2), 233; https://doi.org/10.3390/e28020233 - 16 Feb 2026
Abstract
In signal processing and information analysis, the detection and identification of anomalies present in signals constitute a critical research focus. Accurately discerning these deviations using probabilistic, statistical, and information-theoretic methods is essential for ensuring data integrity and supporting reliable downstream analysis. Outlier detection in functional data aims to identify curves or trajectories that deviate significantly from the dominant pattern, a process vital for data cleaning and the discovery of anomalous events. This task is challenging due to the intrinsic infinite dimensionality of functional data, where outliers often appear as subtle shape deformations that are difficult to detect. Moving beyond conventional approaches that discretize curves into multivariate vectors, we introduce a novel framework that projects functional data into a low-dimensional space of meaningful features. This is achieved via a tailored weighting scheme designed to preserve essential curve variations. We then incorporate the Mahalanobis distance to detect directional outlyingness under non-Gaussian assumptions through a robustified bootstrap resampling method with data-driven threshold determination. Simulation studies validated its superior performance, demonstrating higher true-positive and lower false-positive rates across diverse anomaly types, including magnitude, shape-isolated, shape-persistent, and mixed outliers. The practical utility of our approach was further confirmed through applications in environmental monitoring using seawater spectral data, character trajectory analysis, and population data, underscoring its cross-domain versatility. Full article
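The paper's adjusted-outlyingness weighting and bootstrap thresholding are not reproduced here; the sketch below shows the simpler core idea of a robust Mahalanobis distance on low-dimensional feature projections of curves, with a data-driven quantile in place of a chi-squared cutoff. The data and the 97.5% threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(5)
# Toy 2-D feature projections of curves: most follow one pattern, a few do not.
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=300)
X[:4] = rng.normal([4.0, -4.0], 0.5, size=(4, 2))  # against the correlation

# Robust (minimum covariance determinant) estimate, so the outliers themselves
# do not inflate the covariance, then squared Mahalanobis distance as the
# outlyingness measure.
mcd = MinCovDet(random_state=0).fit(X)
d2 = mcd.mahalanobis(X)
threshold = np.quantile(d2, 0.975)  # data-driven cut
flags = d2 > threshold
```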
(This article belongs to the Section Information Theory, Probability and Statistics)

21 pages, 1511 KB  
Article
SKNet-GAT: A Novel Multi-Source Data Fusion Approach for Distribution Network State Estimation
by Huijia Liu, Chengkai Yin and Sheng Ye
Energies 2026, 19(4), 1012; https://doi.org/10.3390/en19041012 - 14 Feb 2026
Abstract
This paper tackles the growing uncertainty in distribution networks caused by distributed generation, load fluctuations, and frequent topological changes. It proposes a multi-source data fusion framework using enhanced selective convolution (SKNet) and graph attention networks (GAT). First, heterogeneous measurement data, including Phasor Measurement Unit (PMU) and Supervisory Control and Data Acquisition (SCADA) data, are processed through a unified normalization and outlier elimination technique to ensure data quality. Second, SKNet is utilized to extract spatiotemporal multi-scale features, improving the detection of both rapid disturbances and long-term trends. Third, the extracted features are fed into GAT to model node electrical couplings, while power flow residual constraints are embedded in the loss function to enforce the physical validity of the estimated states. This physics-informed design overcomes a key limitation of pure data-driven models and enables an end-to-end framework that integrates data-driven learning with physical mechanism constraints. Finally, comprehensive validation is performed on the improved IEEE 33-node and IEEE 123-node test systems. The test scenarios include Gaussian measurement noise, data outliers, missing measurements, and topological changes. The results show that the proposed method outperforms baseline models such as Multi-Scale Graph Attention Network (MS-GAT), Bidirectional Long Short-Term Memory (BiLSTM), and traditional weighted least squares (WLS). It achieves Root Mean Square Error (RMSE) reductions of up to 18% and Mean Absolute Error (MAE) reductions of up to 15%. The average inference latency is only 10–18 ms. Even under unknown topological changes, the estimation error increases by only 15–25%. These results demonstrate the superior accuracy, robustness, and real-time performance of the proposed method for intelligent distribution network state estimation. Full article

28 pages, 14898 KB  
Article
Deep Learning for Classification of Internal Defects in Fused Filament Fabrication Using Optical Coherence Tomography
by Valentin Lang, Qichen Zhu, Malgorzata Kopycinska-Müller and Steffen Ihlenfeldt
Appl. Syst. Innov. 2026, 9(2), 42; https://doi.org/10.3390/asi9020042 - 14 Feb 2026
Abstract
Additive manufacturing is increasingly adopted for the industrial production of small series of functional components, particularly in thermoplastic strand extrusion processes such as Fused Filament Fabrication. This transition relies on technological advances addressing key process limitations, including dimensional instability, weak interlayer bonding, extrusion defects, moisture sensitivity, and insufficient melting. Process monitoring therefore focuses on early defect detection to minimize failed builds and costs, while ultimately enabling process optimization and adaptive control to mitigate defects during fabrication. For this purpose, a data processing pipeline for monitoring Optical Coherence Tomography images acquired in Fused Filament Fabrication is introduced. Convolutional neural networks are used for the automatic classification of tomographic cross-sections. A dataset of tomographic images undergoes semi-automatic labeling, preprocessing, model training, and evaluation. A sliding window detects outlier regions in the tomographic cross-sections, while masks suppress peripheral noise, enabling label generation based on outlier ratios. Data are split into training, validation, and test sets using block-based partitioning to limit leakage. The classification model employs a ResNet-V2 architecture with BottleneckV2 modules. Hyperparameters are optimized, with N = 2, K = 2, dropout 0.5, and learning rate 0.001 yielding the best performance. The model achieves 0.9446 accuracy and outperforms EfficientNet-B0 and VGG16 in accuracy and efficiency. Full article
(This article belongs to the Special Issue AI-Driven Decision Support for Systemic Innovation)

28 pages, 4067 KB  
Article
Machine Learning Forecasting of Strong Subsequent Events in New Zealand Using the NESTORE Algorithm
by Letizia Caravella and Stefania Gentili
Forecasting 2026, 8(1), 16; https://doi.org/10.3390/forecast8010016 - 12 Feb 2026
Abstract
New Zealand, located along the boundary between the Pacific and Australian plates, is among the most seismically active regions in the world. In such an area, reliable short-term forecasting of strong aftershocks is essential for seismic risk mitigation. In this study, we apply NESTORE (NExt STrOng Related Earthquake), a machine learning probabilistic forecasting algorithm, to the New Zealand earthquake catalogue to evaluate the probability that a mainshock of magnitude Mm will be followed by an event of magnitude ≥ Mm − 1 within a defined space–time window. NESTORE uses nine features describing early post-mainshock seismicity and outputs the probability that a cluster is Type A (i.e., containing a strong aftershock) or not (Type B). We assess performance using two testing strategies: chronological training–testing splits and k-fold cross-validation and refine the training set using the REPENESE outlier-detection procedure. The k-fold approach proves more robust than the chronological one, despite changes in catalogue characteristics over time. Eighteen hours after the mainshock, NESTORE correctly classified 88% of clusters (75% for Type A and 92% for Type B; Precision = 0.75). Notably, the highly destructive 2010–2011 Canterbury–Christchurch sequence was correctly identified as Type A. These findings support the applicability of NESTORE for short-term aftershock forecasting in New Zealand. Full article
(This article belongs to the Special Issue Feature Papers of Forecasting 2025)

28 pages, 15959 KB  
Article
A Proof of Concept for an Agrifood Data Space Based on Open Data and Interoperability
by Cristina Martinez-Ruedas, Adela Pérez-Galvín and Rafael Linares-Burgos
Appl. Sci. 2026, 16(4), 1831; https://doi.org/10.3390/app16041831 - 12 Feb 2026
Abstract
The creation of unified, open, secure, reliable, and agile data spaces is essential for collecting, storing, and sharing data in a standardized and accessible manner, promoting data reuse and addressing current interoperability limitations. In this context, this research presents a proof of concept for a unified agronomic data space based on the structured integration of heterogeneous open data sources. The central hypothesis is that the automated acquisition, preprocessing, and harmonization of publicly available agronomic data can significantly improve accessibility, usability, and interoperability for agricultural decision support applications. To this end, a comprehensive analysis of relevant open data sources was conducted, followed by the design and implementation of configurable algorithms for automated data downloading, cleaning, validation, and integration. The proposed approach explicitly addresses key challenges such as heterogeneous data formats, inconsistent spatial and temporal resolutions, missing values, and outlier detection. As a result, a unified access point was developed, providing reliable agronomic information, including (i) preprocessed climatological time series, (ii) crop and phytosanitary data, (iii) high-resolution aerial orthophotography, (iv) remote-sensing imagery, (v) pest-related information, and (vi) time series of major vegetation indices. The proof of concept was implemented for olive groves in the Andalusian region of Spain; however, the methodology is fully transferable to other crops, regions, and institutional contexts where comparable open data sources are available. The results demonstrate the potential of shared agronomic data spaces to enhance data reuse, support scalable analytics, and facilitate interoperable, data-driven agricultural management beyond the specific regional case study. Full article
(This article belongs to the Special Issue Sustainable and Smart Agriculture)

29 pages, 6877 KB  
Article
Feature-Enhanced Erroneous Outlier Detection in Hydrological Time Series Using Ensemble Methods
by Banujan Kuhaneswaran, Golam Sorwar, Ali Reza Alaei and Feifei Tong
Water 2026, 18(4), 446; https://doi.org/10.3390/w18040446 - 8 Feb 2026
Abstract
Data quality issues in hydrological time series directly affect hydrological modelling applications, including flood forecasting and water resource management. A critical challenge in hydrological monitoring is distinguishing erroneous outliers caused by sensor malfunctions or data transmission errors from natural extreme events such as floods, which exhibit similar statistical characteristics but require opposite treatments in forecasting models. Current detection practices rely on generic algorithms without systematic validation or adaptation to hydrological temporal dependencies, limiting their effectiveness in operational contexts. This study addresses these gaps through a comprehensive framework for detecting erroneous outliers in daily hydrological time series. We engineered 19 features that capture temporal dependencies and hydrological patterns, and reduced them to six key features that capture raw measurements, temporal patterns, and hydrological dynamics. We evaluated 13 detection algorithms across three categories: statistical methods (e.g., Extreme Studentised Deviate and Hampel filter), ML approaches (e.g., Isolation Forest and Local Outlier Factor), and feature-enhanced variants. Three data-driven ensemble strategies were developed: Accurate (maximising F1-score), Diverse (balancing performance with method diversity), and Fast (prioritising computational efficiency). The outlier detection framework was validated by injecting controlled outliers into recorded hydrological data from five gauge stations in the Tweed River catchment, Australia. The outcomes showed that the ensemble methods achieved satisfactory F1 scores (0.6–0.9) in detecting the erroneous outliers. Statistical testing also identified the top-performing detection algorithms. The framework developed in this paper provides a validated tool for quality control in hydrological analysis, with potential applications in drought monitoring and flood forecasting systems. Full article
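Among the statistical baselines named in this abstract, the Hampel filter is simple enough to sketch. Below is a generic rolling-median implementation; the window and threshold are assumed defaults, not the paper's settings.

```python
import numpy as np

def hampel(x, window=7, n_sigmas=3.0):
    """Flag points deviating from the rolling median by > n_sigmas robust SDs.

    The MAD-based scale estimate makes the test robust to the very outliers
    it is hunting. Returns a boolean flag array the same length as x.
    """
    x = np.asarray(x, dtype=float)
    flags = np.zeros(x.size, dtype=bool)
    k = 1.4826  # MAD -> standard deviation under normality
    for i in range(x.size):
        lo, hi = max(0, i - window), min(x.size, i + window + 1)
        med = np.median(x[lo:hi])
        mad = k * np.median(np.abs(x[lo:hi] - med))
        if mad > 0 and abs(x[i] - med) > n_sigmas * mad:
            flags[i] = True
    return flags

# Toy water-level series with a single transmission spike.
t = np.linspace(0.0, 10.0, 200)
level = np.sin(t)
level[100] += 3.0
flags = hampel(level)
```

Because the local median follows the seasonal signal, the spike is flagged while genuine smooth variation is not, which is exactly the erroneous-outlier vs. natural-event distinction the paper is after.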
(This article belongs to the Section Hydraulics and Hydrodynamics)

19 pages, 3681 KB  
Article
Location Adaptive Model Predictive Controller for Autonomous Vehicle Path Tracking with Location Drifting
by Jia Xu, Xiang Xu, Xiaoyan Huang, Yuanyuan Wang, Yue Yu and Nan Zhou
Symmetry 2026, 18(2), 307; https://doi.org/10.3390/sym18020307 - 7 Feb 2026
Abstract
With the rapid development of autonomous driving, path tracking has emerged as a pivotal research direction. Model predictive control (MPC) has become one of the prevailing approaches for path tracking, owing to its superior capacity in dealing with multi-constrained control problems and compatibility with the symmetry of vehicle dynamic systems. Nevertheless, conventional MPC suffers from performance degradation in path tracking when vehicle localization drift occurs, i.e., a noticeable deviation between the sensor-measured position and the actual physical position that accumulates over time, mainly induced by sensor noise and outliers. To overcome this limitation and enhance the accuracy and stability of path tracking, this paper presents a location-adaptive model predictive control framework. Specifically, a supervisor is designed to detect localization drift, and a Runge–Kutta-based location estimator is activated to predict the current vehicle state once drift is identified. Furthermore, a linear time-varying MPC is utilized to compute the desired control input for real-time multi-objective optimization. A set of co-simulations based on Simulink and CarSim is conducted to validate the effectiveness of the proposed strategy. Numerical results demonstrate that the presented method outperforms traditional MPC in terms of tracking accuracy and stability under localization drift conditions. Full article
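The Runge–Kutta-based location estimation step can be illustrated with a classical RK4 integration of a kinematic bicycle model: once the supervisor flags drift, the last trusted pose is propagated forward from the control inputs instead of trusting the drifting measurement. The vehicle model, wheelbase, and inputs here are illustrative assumptions; the paper does not specify them in the abstract:

```python
import math

def bicycle_deriv(state, v, delta, L=2.7):
    """Kinematic bicycle model: state = (x, y, yaw); v = speed [m/s],
    delta = front steering angle [rad], L = wheelbase (assumed 2.7 m)."""
    x, y, yaw = state
    return (v * math.cos(yaw), v * math.sin(yaw), v * math.tan(delta) / L)

def rk4_step(state, v, delta, dt):
    """One classical Runge-Kutta (RK4) step; inputs held constant over dt."""
    f = lambda s: bicycle_deriv(s, v, delta)
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

# When drift is flagged, propagate the last trusted pose open-loop
# (supervisor logic not shown): 10 steps of 0.1 s at 10 m/s, straight.
state = (0.0, 0.0, 0.0)  # last trusted (x, y, yaw)
for _ in range(10):
    state = rk4_step(state, v=10.0, delta=0.0, dt=0.1)
```

The predicted pose then feeds the MPC in place of the drifting measurement until the supervisor re-trusts the localization.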
(This article belongs to the Section Computer)
30 pages, 10053 KB  
Article
A Methodological Framework for Incremental Capacity-Based Feature Engineering and Unsupervised Learning Across First-Life and Second-Life Battery Datasets
by Matthew Beatty, Dani Strickland and Pedro Ferreira
Batteries 2026, 12(2), 55; https://doi.org/10.3390/batteries12020055 - 6 Feb 2026
Abstract
Accurately assessing battery health across mixed datasets remains a challenge due to differences in chemistry, format, and usage history. This study presents a reproducible framework for preparing battery cycling data using incremental capacity analysis (ICA), with the aim of supporting machine learning (ML) workflows across both first-life and second-life battery datasets. The methodology includes IC curve generation, feature extraction, encoding and scaling, feature reduction, and unsupervised learning exploration. A two-tiered outlier detection system was introduced during preprocessing to flag edge-case samples. Two clustering algorithms, K-means and HDBSCAN, were applied to the engineered feature space to explore patterns in the IC features. K-means revealed broad health-related groupings with overlapping boundaries, while HDBSCAN identified finer clusters and flagged additional ambiguous samples as noise. To support interpretation, PCA and t-SNE were used to visualise the feature space in reduced dimensions. Rather than using clustering as a classification tool, the resulting cluster and noise labels are proposed as structure-aware meta-features for supervised learning. The framework accommodates heterogeneous battery datasets and addresses the challenges of integrating data from mixed sources with varying histories and characteristics. These outputs provide a structured foundation for future supervised classification of battery state of health. Full article
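The IC-curve feature step can be sketched as a finite-difference dQ/dV followed by a toy one-dimensional K-means on the resulting peak heights. The synthetic charge curves, the choice of peak height/position as features, and the k=2 grouping are illustrative assumptions, not the paper's pipeline:

```python
def ic_features(voltage, capacity):
    """Finite-difference incremental capacity dQ/dV over a charge curve,
    reduced to two common ICA features: peak height and peak voltage."""
    dqdv = [(capacity[i + 1] - capacity[i]) / (voltage[i + 1] - voltage[i])
            for i in range(len(voltage) - 1)]
    peak = max(range(len(dqdv)), key=dqdv.__getitem__)
    return dqdv[peak], voltage[peak]

def two_means_1d(xs, iters=20):
    """Tiny 1-D K-means (k=2) on one feature; stands in for the paper's
    exploration of the full feature space with K-means/HDBSCAN."""
    c = [min(xs), max(xs)]  # initialise centroids at the extremes
    for _ in range(iters):
        groups = ([], [])
        for x in xs:
            groups[abs(x - c[1]) < abs(x - c[0])].append(x)  # True -> cluster 1
        c = [sum(g) / len(g) if g else c[i] for i, g in enumerate(groups)]
    return [int(abs(x - c[1]) < abs(x - c[0])) for x in xs]

# Synthetic charge curves: a flat voltage plateau yields a tall dQ/dV peak;
# a fading cell shows a broader, lower peak.
q = [0, 1, 2, 3, 4, 5, 6]
v_fresh = [3.00, 3.20, 3.40, 3.45, 3.50, 3.70, 3.90]
v_faded = [3.00, 3.20, 3.40, 3.50, 3.60, 3.80, 4.00]
h_fresh, _ = ic_features(v_fresh, q)
h_faded, _ = ic_features(v_faded, q)
labels = two_means_1d([h_fresh, h_faded, h_fresh * 0.95, h_faded * 1.05])
```

In practice the IC curve would be smoothed and resampled before differentiation, and the cluster labels would be kept as meta-features rather than treated as health classes, mirroring the paper's framing.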
(This article belongs to the Special Issue Batteries: 10th Anniversary)