Mathematics

Research

24 pages, 2113 KB

Open Access Article

Structured Element Extraction from Official Documents Based on BERT-CRF and Knowledge Graph-Enhanced Retrieval

by Siyuan Chen, Liyuan Niu, Jinning Li, Xiaomin Zhu, Xuebin Zhuang and Yanqing Ye

Mathematics 2025, 13(17), 2779; https://doi.org/10.3390/math13172779 - 29 Aug 2025

Viewed by 539

The growth of e-government has rendered automated element extraction from official documents a critical bottleneck for administrative efficiency. The core challenge lies in unifying deep semantic understanding with the structured domain knowledge required to interpret complex formats and specialized terminology. To address the limitations of existing methods, we propose a hybrid framework. Our approach leverages a BERT-CRF model for robust sequence labeling, a knowledge graph (KG)-driven retrieval system to ground the model in verifiable facts, and a large language model (LLM) as a reasoning engine to resolve ambiguities and identify complex relationships. Validated on the DovDoc-CN dataset, our framework achieves a macro-average F1 score of 0.850, outperforming the BiLSTM-CRF baseline by 2.41 percentage points, and demonstrates high consistency, with a weighted F1 score of 0.984. The low standard deviation on the validation set further indicates the model’s stable performance across different subsets. These results confirm that our integrated approach provides an efficient and reliable solution for intelligent document processing, effectively handling the format diversity and specialized knowledge characteristic of government documents.
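The abstract reports both a macro-average F1 (0.850) and a weighted F1 (0.984). The sketch below is not the authors' evaluation code; it only illustrates, on hypothetical tags (`O`, `DATE`) and made-up predictions, how the two averages are commonly defined and why the weighted score can sit well above the macro score when one label dominates.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} computed from true and predicted label lists."""
    labels = set(y_true) | set(y_pred)
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every label counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each label's support in y_true."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in support) / len(y_true)


# Hypothetical, imbalanced example: 18 "O" tokens and 2 "DATE" tokens,
# with one of the two "DATE" tokens mislabelled as "O".
y_true = ["O"] * 18 + ["DATE"] * 2
y_pred = ["O"] * 18 + ["DATE", "O"]
macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
```

On this toy data the macro score is about 0.82 and the weighted score about 0.94: the rare `DATE` label pulls the macro average down, while the frequent `O` label dominates the weighted one.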

(This article belongs to the Special Issue Exploring Statistical Learning: Inference, Optimization, and Real-World Applications)

31 pages, 4141 KB

Open Access Feature Paper Article

Automated Quality Control of Candle Jars via Anomaly Detection Using OCSVM and CNN-Based Feature Extraction

by Azeddine Mjahad and Alfredo Rosado-Muñoz

Mathematics 2025, 13(15), 2507; https://doi.org/10.3390/math13152507 - 4 Aug 2025

Viewed by 517

Abstract

Automated quality control plays a critical role in modern industries, particularly in environments that handle large volumes of packaged products requiring fast, accurate, and consistent inspections. This work presents an anomaly detection system for candle jars commonly used in industrial and commercial applications, where obtaining labeled defective samples is challenging. Two anomaly detection strategies are explored: (1) a baseline model using convolutional neural networks (CNNs) as an end-to-end classifier and (2) a hybrid approach where features extracted by CNNs are fed into One-Class classification (OCC) algorithms, including One-Class SVM (OCSVM), One-Class Isolation Forest (OCIF), One-Class Local Outlier Factor (OCLOF), One-Class Elliptic Envelope (OCEE), One-Class Autoencoder (OCAutoencoder), and Support Vector Data Description (SVDD). Both strategies are trained primarily on non-defective samples, with only a limited number of anomalous examples used for evaluation. Experimental results show that both the pure CNN model and the hybrid methods achieve excellent classification performance. The end-to-end CNN reached 100% accuracy, precision, recall, F1-score, and AUC. The best-performing hybrid model, CNN-based feature extraction followed by OCIF, also achieved 100% across all evaluation metrics, confirming the effectiveness and robustness of the proposed approach. The other OCC algorithms consistently delivered strong results, with all metrics above 95%, indicating solid generalization from predominantly normal data. This approach demonstrates strong potential for quality inspection tasks in scenarios with scarce defective data. Its ability to generalize effectively from mostly normal samples makes it a practical and valuable solution for real-world industrial inspection systems. Future work will focus on optimizing real-time inference and exploring advanced feature extraction techniques to further enhance detection performance.

(This article belongs to the Special Issue Exploring Statistical Learning: Inference, Optimization, and Real-World Applications)

12 pages, 3026 KB

Open Access Article

Statistical Analysis of COVID-19 Impact on Italian Mortality

by Girolamo Franchetti, Carmela Iorio and Massimiliano Politano

Mathematics 2025, 13(15), 2368; https://doi.org/10.3390/math13152368 - 24 Jul 2025

Viewed by 328

Abstract

This study presents a methodology for evaluating the impact of the pandemic on mortality rates in Italy. The primary objectives are to define criteria for identifying a ‘rise in mortality’, establish a robust evaluation approach, and assess pandemic repercussions using the proposed framework. To conduct a comparative analysis of mortality estimates, two classical models were employed: the Lee–Carter and the Renshaw–Haberman models. The analysis involved utilising actuarial tables and mortality models to quantify pandemic-induced excess deaths by calculating the disparity between these estimates. The proposed method aims to provide a comprehensive and clear understanding of the impact of the pandemic on mortality in Italy.
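For orientation, the two models named in the abstract are usually written as below. This is the standard textbook form, not notation taken from the paper; the excess-death expression in the last line is one common definition and is an assumption here, as are the symbols for observed deaths and exposure.

```latex
% Lee–Carter: log central death rate at age x in year t
\ln m_{x,t} = a_x + b_x\,k_t + \varepsilon_{x,t}

% Renshaw–Haberman: Lee–Carter extended with a cohort term \gamma_{t-x}
\ln m_{x,t} = a_x + b_x^{(1)}\,k_t + b_x^{(2)}\,\gamma_{t-x} + \varepsilon_{x,t}

% Assumed definition of excess deaths: observed deaths D_{x,t} minus the
% deaths expected from the fitted rate \hat{m}_{x,t} and the exposure E_{x,t}
\mathrm{Excess}_{x,t} = D_{x,t} - \hat{m}_{x,t}\,E_{x,t}
```

Here a_x is the average age profile of log mortality, k_t the period index, and b_x the age-specific sensitivity to that index.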

(This article belongs to the Special Issue Exploring Statistical Learning: Inference, Optimization, and Real-World Applications)

25 pages, 1891 KB

Open Access Feature Paper Article

Classification Improvement with Integration of Radial Basis Function and Multilayer Perceptron Network Architectures

by László Kovács

Mathematics 2025, 13(9), 1471; https://doi.org/10.3390/math13091471 - 30 Apr 2025

Viewed by 587

Abstract

The radial basis function architecture and the multilayer perceptron architecture are very different approaches to neural networks in theory and practice. Considering their classification efficiency, both have different strengths; thus, the integration of these tools is an interesting but understudied problem domain. This paper presents a novel initialization method based on a distance-weighted homogeneity measure to construct a radial basis function network with fast convergence. The proposed radial basis function network is utilized in the development of an integrated RBF-MLP architecture. The proposed neural network model was tested in various classification tasks, and the test results show the superiority of the proposed architecture. The RBF-MLP model achieved nearly 40 percent better accuracy in the tests than the baseline MLP or RBF neural network architectures.

(This article belongs to the Special Issue Exploring Statistical Learning: Inference, Optimization, and Real-World Applications)

28 pages, 10436 KB

Open Access Article

ParDP: A Parallel Density Peaks-Based Clustering Algorithm

by Libero Nigro and Franco Cicirelli

Mathematics 2025, 13(8), 1285; https://doi.org/10.3390/math13081285 - 14 Apr 2025

Viewed by 426

Abstract

This paper proposes ParDP, an algorithm and concrete tool for unsupervised clustering, which belongs to the class of density peaks-based clustering methods. Such methods rely on the observation that cluster representative points (centroids) are points of higher local density surrounded by points of lesser density. Candidate centroids, though, must be far from each other. A key factor of ParDP is adopting a k-Nearest Neighbors (kNN) technique for estimating the density of points. Complete clustering depends on densities and distances among points. ParDP uses principal component analysis to cope with high-dimensional data points. The current implementation relies on Java parallel streams and the built-in lock-free fork/join mechanism, enabling the exploitation of the computing power of commodity multi/many-core machines. This paper demonstrates ParDP’s clustering capabilities by applying it to several benchmark and real-world datasets. ParDP’s operation can either be directed to observe the number of clusters in a dataset or to finalize clustering with an assigned number of clusters. Different internal and external measures can be used to assess the accuracy of a resultant clustering solution.
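The density-peaks idea described above (centroids are dense points that lie far from any denser point) can be sketched in a few lines. This is a sequential Python illustration under stated assumptions, not ParDP itself, which is a parallel Java tool: the density estimate used here (k divided by the summed distance to the k nearest neighbours), the `rho * delta` ranking, and all function names are illustrative choices, and the points are assumed to be distinct.

```python
import math


def knn_density(points, k):
    """Density of each point as k over the summed distance to its k nearest neighbours."""
    densities = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        densities.append(k / sum(dists[:k]))
    return densities


def delta_distances(points, densities):
    """Distance from each point to the nearest point of strictly higher density.

    A point with no denser point gets its largest distance to any other point.
    """
    deltas = []
    for i, p in enumerate(points):
        higher = [
            math.dist(p, q)
            for j, q in enumerate(points)
            if densities[j] > densities[i]
        ]
        if higher:
            deltas.append(min(higher))
        else:
            deltas.append(max(math.dist(p, q) for q in points))
    return deltas


def pick_centroids(points, k, n_clusters):
    """Return indices of the n_clusters points with the largest density * delta."""
    rho = knn_density(points, k)
    delta = delta_distances(points, rho)
    ranked = sorted(range(len(points)), key=lambda i: rho[i] * delta[i], reverse=True)
    return ranked[:n_clusters]


# Two hypothetical square-shaped groups, each with a point at its centre.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
    (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
]
centroids = pick_centroids(points, k=2, n_clusters=2)
```

With these toy points the two centre points (indices 4 and 9) are selected, since they are both the densest in their group and far from each other.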

(This article belongs to the Special Issue Exploring Statistical Learning: Inference, Optimization, and Real-World Applications)

22 pages, 7778 KB

Open Access Feature Paper Article

A New Approach to Estimate the Parameters of the Joint Distribution of the Wind Speed and the Wind Direction, Modelled with the Angular–Linear Model

by Samuel Martínez-Gutiérrez, Alejandro Merino, Luis A. Sarabia, Daniel Sarabia and Ruben Ruiz-Gonzalez

Mathematics 2025, 13(8), 1238; https://doi.org/10.3390/math13081238 - 9 Apr 2025

Viewed by 461

Abstract

In order to assess the potential and suitability of a location to deploy a wind farm, it is essential to have a model of the joint probability density function of the wind speed and direction, f_V,Θ(v,θ). The angular–linear model is widely used to obtain the analytical expression of the joint density from the parametric estimation of the probability density functions of wind speed, f_V(v), and wind direction, f_Θ(θ). In previous studies, the parameters of the marginal distributions were obtained by fitting the wind measurements to the cumulative distribution function (CDF) using the least squares method and then calculating the probability density function (PDF). In this study, we propose to directly fit the probability density function and then calculate the cumulative distribution function. This approach is shown to have both computational and goodness-of-fit advantages. In addition, previous studies are extended by analysing the effect of the number of intervals into which the wind speed and direction ranges are divided. The new parameter fitting method is evaluated and compared with the original proposal in terms of goodness of fit, using the coefficient of determination R² as an estimator both in the probability density function (R²_pdf) and in the cumulative distribution function (R²_cdf). The computational times required to estimate the parameters using both methods are also compared. The new approach is faster, and the goodness of fit is satisfactory for both estimators: it produces a better R²_pdf, without significantly affecting the R²_cdf, in contrast to the initial one where the R²_pdf is smaller.
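For readers unfamiliar with the angular–linear model, its usual form in the wind-energy literature is reproduced below, together with the standard definition of R². Both are given as commonly stated elsewhere and are assumed, not confirmed, to match the notation of the paper; the symbols g, ζ, y_i and ŷ_i do not appear in the abstract.

```latex
% Angular–linear joint density built from the two marginals
f_{V,\Theta}(v,\theta) = 2\pi\, g(\zeta)\, f_V(v)\, f_\Theta(\theta),
\qquad 0 \le \theta < 2\pi,

% where g is the density of the circular variable \zeta, defined from the marginal CDFs
\zeta = 2\pi \left[ F_V(v) - F_\Theta(\theta) \right]

% Coefficient of determination between observed values y_i and fitted values \hat{y}_i
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```

R²_pdf and R²_cdf then correspond to evaluating the last expression with y_i taken from the empirical density and the empirical cumulative distribution, respectively.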

(This article belongs to the Special Issue Exploring Statistical Learning: Inference, Optimization, and Real-World Applications)

14 pages, 1082 KB

Open Access Feature Paper Article

Interpreting Temporal Shifts in Global Annual Data Using Local Surrogate Models

by Shou Nakano and Yang Liu

Mathematics 2025, 13(4), 626; https://doi.org/10.3390/math13040626 - 14 Feb 2025

Cited by 2 | Viewed by 803

Abstract

This paper focuses on explaining changes over time in globally sourced annual temporal data with the specific objective of identifying features in black-box models that contribute to these temporal shifts. Leveraging local explanations, a part of explainable machine learning/XAI, can yield explanations behind a country’s growth or downfall after making economic or social decisions. We employ a Local Interpretable Model-Agnostic Explanation (LIME) to shed light on national happiness indices, economic freedom, and population metrics, spanning variable time frames. Acknowledging the presence of missing values, we employ three imputation approaches to generate robust multivariate temporal datasets apt for LIME’s input requirements. Our methodology’s efficacy is substantiated through a series of empirical evaluations involving multiple datasets. These evaluations include comparative analyses against random feature selection, correlation with real-world events as explained using LIME, and validation through Individual Conditional Expectation (ICE) plots, a state-of-the-art technique proficient in feature importance detection.

(This article belongs to the Special Issue Exploring Statistical Learning: Inference, Optimization, and Real-World Applications)

18 pages, 3527 KB

Open Access Article

Identification of Patterns in CO₂ Emissions among 208 Countries: K-Means Clustering Combined with PCA and Non-Linear t-SNE Visualization

by Ana Lorena Jiménez-Preciado, Salvador Cruz-Aké and Francisco Venegas-Martínez

Mathematics 2024, 12(16), 2591; https://doi.org/10.3390/math12162591 - 22 Aug 2024

Cited by 4 | Viewed by 2655

Abstract

This paper identifies patterns in total and per capita CO₂ emissions among 208 countries considering different emission sources, such as cement, flaring, gas, oil, and coal. This research uses linear and non-linear dimensional reduction techniques, combining K-means clustering with principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE), which allows the identification of distinct emission profiles among nations. This approach allows effective clustering of heterogeneous countries despite the high-dimensional nature of emissions data. The optimal number of clusters is determined using Calinski–Harabasz and Davies–Bouldin scores, yielding five and six clusters for total and per capita CO₂ emissions, respectively. The findings reveal that for total emissions, t-SNE brings together the world’s largest economies and emitters, i.e., China, USA, India, and Russia, into a single cluster, while PCA provides clusters with a single country for China, USA, and Russia. Regarding per capita emissions, PCA generates a cluster with only one country, Qatar, due to its significant flaring emissions, as a byproduct of the oil industry, and its low population. This study concludes that international collaboration and coherent global policies are crucial for effectively addressing CO₂ emissions and developing targeted climate change mitigation strategies.
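The abstract selects the number of clusters with the Calinski–Harabasz and Davies–Bouldin scores. As a rough illustration of the first of these, the sketch below computes the Calinski–Harabasz score from its usual definition (between-cluster dispersion over within-cluster dispersion, each scaled by its degrees of freedom). It is not the authors' pipeline; the function name and the four toy points are hypothetical, and the code assumes at least two clusters, more points than clusters, and non-zero within-cluster dispersion.

```python
def calinski_harabasz(points, labels):
    """Calinski–Harabasz score: [B / (k - 1)] / [W / (n - k)].

    B is the between-cluster dispersion, W the within-cluster dispersion,
    n the number of points and k the number of clusters. Higher is better.
    """
    n = len(points)
    dim = len(points[0])

    # Group the points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)

    def mean(ps):
        return [sum(p[d] for p in ps) / len(ps) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between = 0.0
    within = 0.0
    for members in clusters.values():
        centre = mean(members)
        between += len(members) * sq_dist(centre, overall)
        within += sum(sq_dist(p, centre) for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical 2-D data: two tight pairs that are far apart.
toy_points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good_score = calinski_harabasz(toy_points, [0, 0, 1, 1])  # pairs kept together
bad_score = calinski_harabasz(toy_points, [0, 1, 0, 1])   # pairs split apart
```

On the toy data the natural grouping scores 50.0 and the split grouping 0.08, which is the kind of contrast used to compare candidate cluster counts; in practice the score would be evaluated for each candidate number of K-means clusters.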

(This article belongs to the Special Issue Exploring Statistical Learning: Inference, Optimization, and Real-World Applications)
