Search Results (46)

Search Parameters:
Keywords = subspace fitting

21 pages, 533 KiB  
Article
Angle-Based Dual-Association Evolutionary Algorithm for Many-Objective Optimization
by Xinzi Wang, Huimin Wang, Zhen Tian, Wenxiao Wang and Junming Chen
Mathematics 2025, 13(11), 1757; https://doi.org/10.3390/math13111757 - 26 May 2025
Viewed by 362
Abstract
As the number of objectives increases, the performance of algorithms for multi-objective optimization problems declines significantly. To address this challenge, this paper proposes an angle-based dual-association evolutionary algorithm for many-objective optimization (MOEA-AD). The algorithm enhances exploration of unknown regions by associating empty subspaces with the highest-fitness solutions through an angle-based dual-association strategy. Additionally, a novel quality-assessment scheme is designed to evaluate the convergence and diversity of solutions, introducing dynamic penalty coefficients to balance the two. Solutions are adaptively sorted into hierarchies based on the global diversity distribution to ensure that optimal solutions are selected. The performance of MOEA-AD is validated on several classic benchmark problems (with up to 20 objectives) and compared with five state-of-the-art multi-objective evolutionary algorithms. Experimental results demonstrate that the algorithm exhibits significant advantages in both convergence and diversity.
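To illustrate the angle-based association idea (a sketch of the general principle, not the paper's exact procedure), the snippet below assigns each solution of a toy bi-objective population to the reference vector with which it subtends the smallest angle; reference subspaces left without any solution are the "empty subspaces" that MOEA-AD would re-associate with high-fitness solutions. All vectors and values here are hypothetical.

```python
import numpy as np

def angles_to_refs(F, refs):
    """Angle (radians) between each normalized objective vector and each reference vector."""
    Fn = F / np.linalg.norm(F, axis=1, keepdims=True)
    Rn = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    cos = np.clip(Fn @ Rn.T, -1.0, 1.0)
    return np.arccos(cos)

# Toy population: 4 solutions, 2 objectives; 4 reference vectors partitioning the space.
F = np.array([[1.0, 0.1], [0.8, 0.8], [0.1, 1.0], [0.9, 0.2]])
refs = np.array([[1.0, 0.0], [1.0, 0.5], [1.0, 1.0], [0.0, 1.0]])

ang = angles_to_refs(F, refs)
assigned = ang.argmin(axis=1)                    # each solution joins its closest subspace
empty = set(range(len(refs))) - set(assigned)    # subspaces with no associated solution
```

In the full algorithm, each empty subspace would then be associated with the highest-fitness solution to drive exploration toward it.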

14 pages, 5623 KiB  
Article
Investigation on Traffic Carbon Emission Factor Based on Sensitivity and Uncertainty Analysis
by Jianan Chen, Hao Yu, Haocheng Xu, Qiang Lv, Zongqiang Zhu, Hao Chen, Feiyang Zhao and Wenbin Yu
Energies 2024, 17(7), 1774; https://doi.org/10.3390/en17071774 - 8 Apr 2024
Cited by 3 | Viewed by 1308
Abstract
The premise for formulating effective emission control strategies is to accurately and reasonably evaluate the actual emission level of vehicles. Firstly, the active subspace method is applied to build a low-dimensional model of the relationship between CO2 emission and multivariate vehicle driving data, in which vehicle specific power (VSP) is identified as the most significant influence on the CO2 emission factor, followed by speed; acceleration and exhaust temperature have the least impact. It is inferred that changes in data sampling alter the construction of the subspace matrices, affecting the calculation of the eigenvector components and the fitting of the final quadratic response surface, so that both the emission sensitivity and the final fitting accuracy are sensitive to the form of the data distribution. For the VSP, the best fitting result is obtained when the VSP follows a uniform distribution. Moreover, the Bayesian linear regression method estimates the fitting parameters between the VSP and the CO2 emission factor with uncertainties derived from heteroscedastic measurement errors, yielding the values and distributions of the intercept α and slope β. In general, a high-resolution inventory of the carbon emission factor of the tested vehicle is established through this systematic analysis, which offers a promising approach to data processing for further carbon footprint accounting.
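The active subspace construction this abstract relies on can be sketched in a few lines: average the outer products of response gradients over sampled inputs, eigendecompose the resulting matrix, and take the dominant eigenvector as the active direction. The response function and direction below are made up for illustration; they are not the paper's CO2 model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical response: f depends on the inputs only through one direction a
# (think of a as a VSP-dominated combination of driving variables).
a = np.array([0.8, 0.6, 0.0])           # unit vector, illustrative
grad_f = lambda x: 2.0 * (a @ x) * a    # gradient of f(x) = (a . x)^2

X = rng.standard_normal((500, 3))       # sampled operating points
G = np.array([grad_f(x) for x in X])
C = G.T @ G / len(X)                    # uncentered gradient covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)    # ascending eigenvalues
active_dir = eigvecs[:, -1]             # dominant eigenvector spans the active subspace
```

Because f varies only along a, C is rank one and the recovered active direction aligns with a (up to sign).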
(This article belongs to the Topic Zero Carbon Vehicles and Power Generation)

20 pages, 1228 KiB  
Article
Machine Learning Classification of Event-Related Brain Potentials during a Visual Go/NoGo Task
by Anna Bryniarska, José A. Ramos and Mercedes Fernández
Entropy 2024, 26(3), 220; https://doi.org/10.3390/e26030220 - 29 Feb 2024
Cited by 2 | Viewed by 2211
Abstract
Machine learning (ML) methods are increasingly being applied to analyze biological signals. For example, ML methods have been successfully applied to the human electroencephalogram (EEG) to classify neural signals as pathological or non-pathological and to predict working memory performance in healthy individuals and psychiatric patients. ML approaches can quickly process large volumes of data to reveal patterns that may be missed by humans. This study investigated the accuracy of ML methods at classifying the brain’s electrical responses to cognitive events, i.e., event-related brain potentials (ERPs). ERPs are extracted from the ongoing EEG and represent electrical potentials in response to specific events. ERPs were evoked during a visual Go/NoGo task, which requires a button press on Go trials and response withholding on NoGo trials; NoGo trials elicit neural activity associated with inhibitory control processes. We compared the accuracy of six ML algorithms at classifying the ERPs associated with each trial type. The raw electrical signals were fed to all ML algorithms to build predictive models. The same raw data were then truncated in length and fitted to multiple dynamic state-space models of order nx using a continuous-time subspace-based system identification algorithm. The 4nx numerator and denominator parameters of the transfer function of the state-space model were then used as substitutes for the data. This dimensionality reduction simplifies classification, reduces noise, and may ultimately improve the predictive power of ML models. Our findings revealed that all ML methods correctly classified the electrical signal associated with each trial type with a high degree of accuracy, and accuracy remained high after parameterization was applied. We discuss the models and the usefulness of the parameterization.
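The parameterization step — replacing each trial's raw signal with the numerator and denominator coefficients of its fitted state-space model's transfer function — can be sketched with plain NumPy, using the SISO identity num(s) = det(sI − A + BC) − det(sI − A) for D = 0. The order-2 model below is a made-up stand-in, not a model fitted to actual ERP data.

```python
import numpy as np

# Toy order-2 continuous-time state-space model standing in for one fitted to an ERP trial.
nx = 2
A = np.array([[0.0, 1.0], [-4.0, -0.4]])   # lightly damped oscillator (illustrative)
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

# Transfer function H(s) = C (sI - A)^{-1} B: the denominator is det(sI - A),
# and for D = 0 the numerator is det(sI - A + BC) - det(sI - A).
den = np.poly(A)                 # characteristic polynomial coefficients
num = np.poly(A - B @ C) - den   # numerator coefficients (same length, leading zeros)

# These coefficient vectors replace the raw trial signal as classifier features.
features = np.concatenate([num, den])
```

For this model, H(s) = 1 / (s² + 0.4 s + 4), so the feature vector is short, noise-robust, and fixed-length regardless of trial duration.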

14 pages, 5023 KiB  
Technical Note
Stain Detection Based on Unmanned Aerial Vehicle Hyperspectral Photovoltaic Module
by Da Li, Lan Li, Mingyang Cui, Pengliang Shi, Yintong Shi, Jian Zhu, Sui Dai and Meiping Song
Remote Sens. 2024, 16(1), 153; https://doi.org/10.3390/rs16010153 - 29 Dec 2023
Cited by 3 | Viewed by 1400
Abstract
Solar power generation has great development potential as an abundant and clean energy source. However, many factors affect the efficiency of photovoltaic (PV) modules; in particular, outdoor PV modules are inevitably affected by stains, which reduce the power generation efficiency of the PV panel. This paper proposes a framework for PV module stain detection based on UAV hyperspectral images (HSIs). The framework consists of two stain detection methods, one based on constrained energy minimization (CEM) and one based on orthogonal subspace projection (OSP). Firstly, the data from contaminated PV modules are analyzed and preprocessed to improve their suitability for analysis. Secondly, based on the known spectral signature of the PV module, the CEM-based and OSP-based stain detection methods are developed. Experimental results on real data illustrate that, in comparison with contrasting methods, the proposed method achieves stain detection results that closely align with known stain percentages. Additionally, it exhibits a fitting curve similar to that of the more mature electroluminescence-based methods currently in use.
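The CEM detector at the heart of the first method has a closed form: w = R⁻¹d / (dᵀR⁻¹d), which minimizes the average filter output energy over the background while passing the known signature d at unit gain. The spectra below are synthetic placeholders, not real PV-module measurements.

```python
import numpy as np

rng = np.random.default_rng(1)

bands = 20
X = rng.standard_normal((200, bands)) * 0.1 + 1.0   # background pixel spectra (rows)
d = np.linspace(0.5, 1.5, bands)                    # known target signature (illustrative)

R = X.T @ X / len(X)                  # sample correlation matrix of the background
Rinv_d = np.linalg.solve(R, d)
w = Rinv_d / (d @ Rinv_d)             # CEM filter: min output energy s.t. w . d = 1

score_target = w @ d                  # exactly 1 by construction
scores_bg = X @ w                     # background pixels score near 0
```

Thresholding the filter output then separates stained pixels (signature-like responses near 1) from the clean background.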
(This article belongs to the Special Issue New Methods and Approaches in Airborne Hyperspectral Data Processing)

14 pages, 4358 KiB  
Article
Meshless Search SR-STAP for Airborne Radar Based on Meta-Heuristic Algorithms
by Yunfei Hou, Yingnan Zhang, Wenzhu Gui, Di Wang and Wei Dong
Sensors 2023, 23(23), 9444; https://doi.org/10.3390/s23239444 - 27 Nov 2023
Cited by 2 | Viewed by 1234
Abstract
The sparse recovery (SR) space-time adaptive processing (STAP) method has excellent clutter suppression performance under the condition of limited observation samples. However, when the clutter is nonlinear in the spatial-Doppler profile, an off-grid effect arises that reduces the sparse recovery performance. A meshless search using a meta-heuristic (MH) algorithm can, in theory, completely eliminate the off-grid effect. Therefore, genetic algorithm (GA), differential evolution (DE), particle swarm optimization (PSO), and grey wolf optimization (GWO) methods are applied to SR-STAP in this paper to select exact clutter atoms. The simulation results show that MH-STAP estimates the clutter subspace more accurately than the traditional algorithm, with PSO-STAP and GWO-STAP showing the best clutter suppression performance among the four MH-STAP methods. To search for more accurate clutter atoms, PSO and GWO are combined to improve the method's capacity for global optimization. Meanwhile, the fitness function is improved by using prior knowledge of the clutter distribution. The simulation results show that the improved PSO-GWO-STAP algorithm provides excellent clutter suppression performance and solves the off-grid problem better than any single MH-STAP method.
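A minimal PSO loop of the kind the paper plugs into SR-STAP looks like the following. The quadratic objective here is only a stand-in for the paper's clutter-atom fitness function, and the hyperparameters are generic textbook values, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(2)

def fitness(x):
    # Stand-in objective; the paper's fitness scores candidate clutter atoms instead.
    return np.sum((x - 0.3) ** 2, axis=-1)

n_particles, dim, iters = 30, 2, 200
pos = rng.uniform(-1.0, 1.0, (n_particles, dim))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), fitness(pos)
gbest = pbest[pbest_val.argmin()].copy()

w, c1, c2 = 0.7, 1.5, 1.5          # inertia, cognitive, and social coefficients
for _ in range(iters):
    r1, r2 = rng.random((2, n_particles, dim))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    val = fitness(pos)
    improved = val < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], val[improved]
    gbest = pbest[pbest_val.argmin()].copy()
```

The swarm converges to the minimizer at (0.3, 0.3); in the paper, the analogous search returns off-grid clutter atoms directly, with no spatial-Doppler grid involved.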

16 pages, 6254 KiB  
Article
Correlation Analysis of Large-Span Cable-Stayed Bridge Structural Frequencies with Environmental Factors Based on Support Vector Regression
by Jingye Xu, Tugang Xiao, Yu Liu, Yu Hong, Qianhui Pu and Xuguang Wen
Sensors 2023, 23(23), 9442; https://doi.org/10.3390/s23239442 - 27 Nov 2023
Cited by 6 | Viewed by 1479
Abstract
The dynamic characteristics of bridge structures are influenced by various environmental factors, and exploring the impact of environmental temperature and humidity on structural modal parameters is of great significance for structural health assessment. This paper used the covariance-driven stochastic subspace identification method (SSI-COV) and clustering algorithms to identify modal frequencies from four months of acceleration data collected by the health monitoring system of the Jintang Hantan Twin-Island Bridge. A correlation analysis was then conducted to examine the relationship between higher-order frequencies and environmental factors, including temperature and humidity, and a support vector machine regression (SVR) model was employed to analyze the effect of environmental temperature on structural modal frequencies. The study reached the following conclusions: 1. Correlation analysis revealed that temperature is the primary driver of frequency variations; frequency exhibited a strong linear correlation with temperature and little correlation with humidity. 2. SVR regression was performed on frequency against temperature, and the fitting residuals were evaluated; the model fit the sample data effectively and provided reliable predictions. 3. The original frequency series was smoothed by removing the temperature-induced component predicted by the SVR model; after eliminating the temperature effects, the fluctuations in frequency within a 24 h period decreased significantly. The data presented in this paper can serve as a reference for further health assessments of similar bridge structures.
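The correct-and-subtract idea in conclusion 3 can be sketched compactly. Since the abstract reports a strong linear frequency-temperature correlation, a least-squares line is used here in place of the paper's SVR model; the temperatures, frequencies, and coefficients below are entirely synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-ins: ambient temperature (deg C) and a modal frequency (Hz)
# with a weak linear temperature dependence plus measurement noise.
temp = rng.uniform(5.0, 35.0, 300)
freq = 1.90 - 0.002 * temp + rng.normal(0.0, 1e-4, 300)

slope, intercept = np.polyfit(temp, freq, 1)        # the paper fits SVR instead
freq_corrected = freq - (slope * temp + intercept)  # temperature effect removed

reduction = freq_corrected.std() / freq.std()       # residual spread vs. raw spread
```

After the correction, the remaining frequency fluctuation reflects measurement noise and any non-thermal effects, which is what makes the corrected series useful for damage-sensitive monitoring.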

15 pages, 1050 KiB  
Article
Self-Position Determination Based on Array Signal Subspace Fitting under Multipath Environments
by Zhongkang Cao, Pan Li, Wanghao Tang, Jianfeng Li and Xiaofei Zhang
Sensors 2023, 23(23), 9356; https://doi.org/10.3390/s23239356 - 23 Nov 2023
Cited by 3 | Viewed by 1389
Abstract
A vehicle's position can be estimated from array-received signal data without the help of satellite navigation. However, traditional array self-position determination methods risk failure in multipath environments. To deal with this problem, an array signal subspace fitting method is proposed for suppressing the multipath effect. Firstly, all signal incidence angles are estimated with enhanced spatial smoothing and root multiple signal classification (Root-MUSIC). Then, non-line-of-sight (NLOS) components are distinguished from multipath signals using a K-means clustering algorithm. Finally, the signal subspace fitting (SSF) function with a P matrix is established to reduce the NLOS components in the multipath signals. Meanwhile, based on the initial clustering estimate, the search area can be significantly reduced, leading to lower computational complexity. Compared with the C-matrix, oblique projection, initial signal fitting (ISF), multiple signal classification (MUSIC), and signal subspace fitting (SSF) methods, the simulated experiments indicate that the proposed method has better NLOS component suppression, lower computational complexity, and higher positioning precision. A numerical analysis shows that the complexity of the proposed method is reduced by at least 7.64 dB. A cumulative distribution function (CDF) analysis demonstrates that the estimation accuracy of the proposed method is increased by 3.10 dB compared with the clustering algorithm and by 11.77 dB compared with MUSIC, ISF, and SSF under multipath environments.
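The subspace-fitting criterion underlying SSF — choose the steering direction whose span best matches the measured signal subspace — can be sketched for a single source on a half-wavelength uniform linear array. The NLOS-suppressing P matrix and the clustering stage of the paper are omitted, and all array parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

M, N, true_theta = 8, 200, 20.0           # sensors, snapshots, DOA in degrees

def steer(theta_deg):
    k = np.pi * np.sin(np.deg2rad(theta_deg))   # half-wavelength ULA phase step
    return np.exp(1j * k * np.arange(M))

s = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # source signal
noise = 0.05 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
X = np.outer(steer(true_theta), s) + noise

R = X @ X.conj().T / N                    # sample covariance
eigvals, eigvecs = np.linalg.eigh(R)
Us = eigvecs[:, -1:]                      # signal subspace (one source)

# Subspace-fitting criterion: maximize the projection of a(theta) onto span(Us).
grid = np.arange(-90.0, 90.0, 0.1)
cost = [np.linalg.norm(Us.conj().T @ (steer(t) / np.linalg.norm(steer(t)))) for t in grid]
theta_hat = grid[int(np.argmax(cost))]
```

With multiple sources and NLOS paths, the criterion generalizes to fitting the whole steering matrix to the signal subspace, which is where the paper's P matrix and reduced search region come in.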
(This article belongs to the Special Issue Feature Papers in Electronic Sensors)

15 pages, 5470 KiB  
Article
ASIDS: A Robust Data Synthesis Method for Generating Optimal Synthetic Samples
by Yukun Du, Yitao Cai, Xiao Jin, Hongxia Wang, Yao Li and Min Lu
Mathematics 2023, 11(18), 3891; https://doi.org/10.3390/math11183891 - 13 Sep 2023
Viewed by 1362
Abstract
Most existing data synthesis methods are designed to tackle problems of dataset imbalance, data anonymization, and insufficient sample size. Effective synthesis methods are lacking for cases where the actual datasets have a limited number of data points but a large number of features and unknown noise. Thus, in this paper we propose a data synthesis method named Adaptive Subspace Interpolation for Data Synthesis (ASIDS). The idea is to divide the original data feature space into several subspaces with an equal number of data points, and then perform interpolation on the data points in adjacent subspaces. This method can adaptively adjust the sample size of the synthetic dataset in the presence of unknown noise, and the generated sample data typically contain minimal errors. Moreover, it adjusts the feature composition of the data points, which can significantly reduce the proportion of data points with large fitting errors. Furthermore, the hyperparameters of this method have an intuitive interpretation and usually require little calibration. Analysis results obtained using simulated and benchmark datasets demonstrate that ASIDS is a robust and stable method for data synthesis.
(This article belongs to the Special Issue Advances in Computational Statistics and Data Analysis)

16 pages, 5857 KiB  
Article
Discovering Low-Dimensional Descriptions of Multineuronal Dependencies
by Lazaros Mitskopoulos and Arno Onken
Entropy 2023, 25(7), 1026; https://doi.org/10.3390/e25071026 - 6 Jul 2023
Viewed by 2405
Abstract
Coordinated activity in neural populations is crucial for information processing. Shedding light on the multivariate dependencies that shape multineuronal responses is important to understand neural codes. However, existing approaches based on pairwise linear correlations are inadequate at capturing complicated interaction patterns and miss features that shape aspects of the population function. Copula-based approaches address these shortcomings by extracting the dependence structures in the joint probability distribution of population responses. In this study, we aimed to dissect neural dependencies with a C-Vine copula approach coupled with normalizing flows for estimating copula densities. While this approach allows for more flexibility compared to fitting parametric copulas, drawing insights on the significance of these dependencies from large sets of copula densities is challenging. To alleviate this challenge, we used a weighted non-negative matrix factorization procedure to leverage shared latent features in neural population dependencies. We validated the method on simulated data and applied it on copulas we extracted from recordings of neurons in the mouse visual cortex as well as in the macaque motor cortex. Our findings reveal that neural dependencies occupy low-dimensional subspaces, but distinct modules are synergistically combined to give rise to diverse interaction patterns that may serve the population function.
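The factorization step can be sketched with plain multiplicative-update NMF on a synthetic non-negative matrix standing in for a stack of discretized copula densities; the paper uses a weighted variant, and the matrix sizes and rank below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy stand-in: 40 "dependence patterns" over 25 density bins, generated from
# 3 shared latent features so that a rank-3 factorization can recover them.
W_true = rng.random((40, 3))
H_true = rng.random((3, 25))
V = W_true @ H_true

# Multiplicative-update NMF (Lee-Seung, Frobenius loss); updates keep factors non-negative.
k = 3
W = rng.random((40, k)) + 0.1
H = rng.random((k, 25)) + 0.1
for _ in range(500):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-12)

rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

The rows of H play the role of the shared latent dependence features; the rows of W say how strongly each neuron pair's copula loads on each feature.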
(This article belongs to the Special Issue Neural Dynamics and Information Processing)

20 pages, 5345 KiB  
Article
The Accuracy and Computational Efficiency of the Loewner Framework for the System Identification of Mechanical Systems
by Gabriele Dessena, Marco Civera, Dmitry I. Ignatyev, James F. Whidborne, Luca Zanotti Fragonara and Bernardino Chiaia
Aerospace 2023, 10(6), 571; https://doi.org/10.3390/aerospace10060571 - 20 Jun 2023
Cited by 7 | Viewed by 2123
Abstract
The Loewner framework has recently been proposed for the system identification of mechanical systems, mitigating the limitations of current frequency-domain fitting processes for the extraction of modal parameters. In this work, the computational performance of the Loewner framework, in terms of the elapsed time to identification, is assessed on a hybrid numerical and experimental dataset against two well-established system identification methods: least-squares complex exponential (LSCE) and subspace state-space system identification (N4SID). Good results are achieved, with better accuracy than LSCE and better computational performance than N4SID.
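The central object of the framework is the Loewner matrix built from two disjoint sets of response samples, L[i, j] = (v_i − w_j)/(μ_i − λ_j), whose rank reveals the order of the underlying system. A minimal sketch with a made-up first-order system:

```python
import numpy as np

H = lambda s: 1.0 / (s + 1.0)            # order-1 test system (illustrative)

# Two disjoint interpolation point sets (kept real here for simplicity).
mu = np.array([0.1, 0.5, 1.0, 2.0])
lam = np.array([0.2, 0.7, 1.5, 3.0])
v, w = H(mu), H(lam)

# Loewner matrix: L[i, j] = (v_i - w_j) / (mu_i - lam_j)
L = (v[:, None] - w[None, :]) / (mu[:, None] - lam[None, :])

rank = np.linalg.matrix_rank(L, tol=1e-10)   # equals the system order: 1
```

For this H, every entry reduces to −1/((μ_i + 1)(λ_j + 1)), so L is exactly rank one; with measured frequency-response data, the numerical rank drop plays the same role in selecting the model order.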
(This article belongs to the Special Issue Structural Dynamics and Control (2nd Edition))

19 pages, 33540 KiB  
Article
The Calibration Method of Multi-Channel Spatially Varying Amplitude-Phase Inconsistency Errors in Airborne Array TomoSAR
by Dawei Wang, Fubo Zhang, Longyong Chen, Zhenhua Li and Ling Yang
Remote Sens. 2023, 15(12), 3032; https://doi.org/10.3390/rs15123032 - 9 Jun 2023
Cited by 8 | Viewed by 1936
Abstract
Airborne array tomographic synthetic aperture radar (TomoSAR) can acquire three-dimensional (3D) information about the observed scene in a single pass. In airborne array TomoSAR imaging, disturbances such as inconsistent antenna patterns and baseline errors introduce spatially varying amplitude-phase inconsistency errors into the multi-channel single-look complex (SLC) images. These errors degrade the quality of the 3D imaging results, which then suffer from positioning errors, stray points, and spurious targets. In this paper, a new calibration method based on multiple prominent points is proposed to calibrate the amplitude-phase inconsistency errors. Firstly, prominent points are selected from the multi-channel SLC data. Then, the subspace decomposition method and the maximum interference spectrum method are used to extract the multi-channel amplitude-phase inconsistency information at each point. The last step is to fit the spatially varying curve and compensate for the errors. The performance of the method is verified using actual data. The experimental results show that, compared with the traditional fixed amplitude-phase inconsistency calibration method, the proposed method can effectively calibrate spatially varying amplitude-phase inconsistency errors, thus improving the accuracy of 3D reconstruction results for large-scale scenes.

11 pages, 724 KiB  
Article
Reproducing Kernel Hilbert Spaces of Smooth Fractal Interpolation Functions
by Dah-Chin Luor and Liang-Yu Hsieh
Fractal Fract. 2023, 7(5), 357; https://doi.org/10.3390/fractalfract7050357 - 27 Apr 2023
Viewed by 1277
Abstract
The theory of reproducing kernel Hilbert spaces (RKHSs) has been developed into a powerful tool in mathematics and has many applications, especially in kernel machine learning. Fractal theory provides new techniques for constructing complicated curves and fitting experimental data. Recently, combinations of fractal interpolation functions (FIFs) and curve estimation methods have attracted the attention of researchers. We are interested in the connections between FIFs and RKHSs, and our aim is to develop the concept of smooth fractal-type reproducing kernels and RKHSs of smooth FIFs. In this paper, a linear space of smooth FIFs is considered. A condition for a given finite set of smooth FIFs to be linearly independent is established. For such a set, we build a fractal-type positive semi-definite kernel and show that the span of these linearly independent smooth FIFs is the corresponding RKHS. The nth derivatives of these FIFs are investigated, and properties of the related positive semi-definite kernels and the corresponding RKHSs are studied. We also introduce subspaces of these RKHSs that are important in curve-fitting applications.
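The kernel construction described in the abstract follows a standard pattern; assuming the linearly independent smooth FIFs are f_1, …, f_n (this is a generic sketch consistent with the abstract, not the paper's exact kernel), the fractal-type kernel and its positive semi-definiteness can be written as:

```latex
K(x, y) = \sum_{i=1}^{n} f_i(x)\, f_i(y),
\qquad
\sum_{j,k=1}^{m} c_j c_k K(x_j, x_k)
  = \sum_{i=1}^{n} \Bigl( \sum_{j=1}^{m} c_j f_i(x_j) \Bigr)^{2} \ge 0,
```

for any points x_1, …, x_m and scalars c_1, …, c_m, so K is positive semi-definite and the associated reproducing kernel Hilbert space is the span of f_1, …, f_n.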
20 pages, 14462 KiB  
Article
Surface Approximation by Means of Gaussian Process Latent Variable Models and Line Element Geometry
by Ivan De Boi, Carl Henrik Ek and Rudi Penne
Mathematics 2023, 11(2), 380; https://doi.org/10.3390/math11020380 - 11 Jan 2023
Cited by 1 | Viewed by 2842
Abstract
The close relation between spatial kinematics and line geometry has proven to be fruitful in surface detection and reconstruction. However, methods based on this approach are limited to simple geometric shapes that can be formulated as a linear subspace of line or line element space. The core of this approach is a principal component formulation that finds a best-fit approximant to a possibly noisy or partial surface given as an unordered set of points or point cloud. We expand on this by introducing the Gaussian process latent variable model, a probabilistic, non-linear, non-parametric dimensionality reduction approach following the Bayesian paradigm. This allows us to find structure in a lower-dimensional latent space for the surfaces of interest. We show how this can be applied to surface approximation and unsupervised segmentation for the surfaces mentioned above and demonstrate its benefits on surfaces that deviate from them. Experiments are conducted on synthetic and real-world objects.
(This article belongs to the Special Issue Statistical Data Modeling and Machine Learning with Applications II)

21 pages, 1496 KiB  
Article
A Dynamic Principal Component Analysis and Fréchet-Distance-Based Algorithm for Fault Detection and Isolation in Industrial Processes
by Bálint Levente Tarcsay, Ágnes Bárkányi, Tibor Chován and Sándor Németh
Processes 2022, 10(11), 2409; https://doi.org/10.3390/pr10112409 - 15 Nov 2022
Cited by 5 | Viewed by 2015
Abstract
Fault detection and isolation (FDI) methodology focuses on maintaining safe and reliable operating conditions in industrial practice, which is of crucial importance for the profitability of technologies. In this work, an FDI algorithm based on dynamic principal component analysis (DPCA) and the Fréchet distance metric δdF is developed. The three-tank benchmark problem is studied and used to demonstrate the performance of the FDI method for six fault types. A DPCA transformation of the system was established, and fault detection was conducted based on the Q statistic. Fault isolation is also critical for proper intervention to mitigate fault effects; to identify the type of detected faults, the fault responses within the PC subspace were analyzed using the δdF metric. To the best of the authors' knowledge, combining the Fréchet distance metric for fault isolation with DPCA for feature extraction is a novel technique, and it provides a robust computational tool with low computational cost for FDI purposes that fits well into the Industry 4.0 framework. The robustness and sensitivity of the method were validated for a wide variety of signal-to-noise ratio (SNR) conditions, with findings indicating a possible average false and missed alarm rate of 0.1 and a macro-averaged F-score above 0.8 in all cases.
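The detection half of this pipeline — project onto a retained principal-component subspace and flag samples whose residual Q statistic exceeds a threshold learned on normal data — can be sketched as follows. Static PCA is used for brevity (DPCA would first augment each row with time-lagged copies), and all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(5)

# Normal operating data: 3 correlated measurements driven by one latent factor.
t = rng.standard_normal(500)
X = np.column_stack([t, 2 * t, -t]) + 0.01 * rng.standard_normal((500, 3))
mean = X.mean(axis=0)
Xc = X - mean

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:1].T                                  # retain 1 principal component

def q_stat(x):
    r = (x - mean) - P @ (P.T @ (x - mean))   # residual off the PC subspace
    return float(r @ r)

Q_train = np.array([q_stat(x) for x in X])
threshold = np.quantile(Q_train, 0.99)        # empirical control limit

fault = np.array([1.0, -2.0, 1.0])            # breaks the normal correlation structure
detected = q_stat(fault) > threshold
```

The faulty sample has a small magnitude but violates the learned correlation structure, so its residual dwarfs the threshold; isolation would then compare the post-alarm trajectory in the PC subspace against known fault signatures via the Fréchet distance.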

22 pages, 4021 KiB  
Article
An Auto-Encoder with Genetic Algorithm for High Dimensional Data: Towards Accurate and Interpretable Outlier Detection
by Jiamu Li, Ji Zhang, Mohamed Jaward Bah, Jian Wang, Youwen Zhu, Gaoming Yang, Lingling Li and Kexin Zhang
Algorithms 2022, 15(11), 429; https://doi.org/10.3390/a15110429 - 15 Nov 2022
Cited by 6 | Viewed by 4366
Abstract
When dealing with high-dimensional data, such as in biometric, e-commerce, or industrial applications, it is extremely hard to capture abnormalities in the full space due to the curse of dimensionality. Furthermore, because of the large number of features, it is increasingly complicated but essential to provide interpretations for outlier detection results in high-dimensional space. To alleviate these issues, we propose a new model based on a Variational AutoEncoder and Genetic Algorithm (VAEGA) for detecting outliers in subspaces of high-dimensional data. The proposed model employs a neural network to create a probabilistic dimensionality-reduction variational autoencoder (VAE) that uses its low-dimensional hidden space to characterize the high-dimensional inputs. The hidden vector is sampled randomly from the hidden space to reconstruct the data so that it closely matches the input data. The reconstruction error is then computed to determine an outlier score, and samples exceeding the threshold are tentatively identified as outliers. In the second step, a genetic algorithm (GA) is used to examine and analyze the abnormal subspaces of the outlier set obtained by the VAE layer. After encoding the subspaces of the outlier dataset, the degree of anomaly of the detected subspaces is calculated using the redefined fitness function. Finally, the abnormal subspace of each detected point is determined by selecting the subspace with the highest degree of anomaly. Clustering the abnormal subspaces helps filter outliers that are mislabeled (false positives), and the VAE layer adjusts its network weights based on these false positives. When compared with other methods on five public datasets, the VAEGA outlier detection model produces highly interpretable results and outperforms or is competitive with contemporary methods.
