Data-Centric Artificial Intelligence: New Methods for Data Processing

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 15 October 2025 | Viewed by 12298

Special Issue Editors


Guest Editor
Department of Computer Science, Kazimierz Wielki University, 85-064 Bydgoszcz, Poland
Interests: bee algorithms; fuzzy logic; artificial neural networks and their applications; language models; generative AI

Guest Editor
School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu, Fukushima 965-8580, Japan
Interests: intelligent software; smart learning; cloud robotics; programming environment; visual languages

Special Issue Information

Dear Colleagues,

Data-centric artificial intelligence is developing rapidly thanks to advances in machine learning, natural language processing, and data visualization. These modern AI techniques enable us to better understand and process huge data sets, and provide companies and scientists with tools for extracting hidden patterns, discovering new knowledge, and automating complex analytical processes. In this Special Issue, we present application examples of these AI methods in solving real-world business and scientific problems.

We invite you to submit papers for this Special Issue dedicated to data-centric artificial intelligence, focusing on the following:

  1. New methods and techniques for processing large data sets;
  2. Topics related to machine learning, natural language processing, and data visualization;
  3. Practical applications of these methods in various fields.

This publication will supplement the existing literature by focusing on the latest trends and solutions in this area.

Dr. Dawid Ewald
Dr. Yutaka Watanobe
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, you can access the submission form there. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • machine learning
  • data processing
  • data visualization
  • natural language processing
  • fuzzy logic

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (9 papers)


Research

36 pages, 3107 KiB  
Article
Estimating Calibrated Risks Using Focal Loss and Gradient-Boosted Trees for Clinical Risk Prediction
by Henry Johnston, Nandini Nair and Dongping Du
Electronics 2025, 14(9), 1838; https://doi.org/10.3390/electronics14091838 - 30 Apr 2025
Abstract
Probability calibration and decision threshold selection are fundamental aspects of risk prediction and classification, respectively. A strictly proper loss function is used in clinical risk prediction applications to encourage a model to predict calibrated class-posterior probabilities or risks. Recent studies have shown that training with focal loss can improve the discriminatory power of gradient-boosted decision trees (GBDT) for classification tasks with an imbalanced or skewed class distribution. However, the focal loss function is not a strictly proper loss function. Therefore, the output of GBDT trained using focal loss is not an accurate estimate of the true class-posterior probability. This study aims to address the issue of poor calibration of GBDT trained using focal loss in the context of clinical risk prediction applications. The methodology utilizes a closed-form transformation of the confidence scores of GBDT trained with focal loss to estimate calibrated risks. The closed-form transformation relates the focal loss minimizer and the true-class posterior probability. Algorithms based on Bayesian hyperparameter optimization are provided to choose the focal loss parameter that optimizes discriminatory power and calibration, as measured by the Brier score metric. We assess how the calibration of the confidence scores affects the selection of a decision threshold to optimize the balanced accuracy, defined as the arithmetic mean of sensitivity and specificity. The effectiveness of the proposed strategy was evaluated using lung transplant data extracted from the Scientific Registry of Transplant Recipients (SRTR) for predicting post-transplant cancer. The proposed strategy was also evaluated using data from the Behavioral Risk Factor Surveillance System (BRFSS) for predicting diabetes status. 
Probability calibration plots, calibration slope and intercept, and the Brier score show that the approach improves calibration while maintaining the same discriminatory power according to the area under the receiver operating characteristics curve (AUROC) and the H-measure. The calibrated focal-aware XGBoost achieved an AUROC, Brier score, and calibration slope of 0.700, 0.128, and 0.968 for predicting the 10-year cancer risk, respectively. The miscalibrated focal-aware XGBoost achieved equal AUROC but a worse Brier score and calibration slope (0.140 and 1.579). The proposed method compared favorably to the standard XGBoost trained using cross-entropy loss (AUROC of 0.755 versus 0.736 in predicting the 1-year risk of cancer). Comparable performance was observed with other risk prediction models in the diabetes prediction task. Full article
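The core of the described approach, inverting the focal-loss minimizer to recover a calibrated posterior, can be illustrated with a short sketch. Setting the derivative of the pointwise binary focal risk to zero and solving for the class posterior yields a closed-form mapping from a focal-trained score to a risk estimate. The function below is a generic illustration derived from that first-order condition, with `gamma` as an assumed parameter; it is not necessarily the paper's exact transformation.

```python
import math

def focal_calibrate(p, gamma):
    """Map a focal-loss confidence score p in (0, 1) to an estimated
    class-posterior probability.

    Derived by setting the derivative of the pointwise focal risk
        R(q) = -eta*(1-q)**g*log(q) - (1-eta)*q**g*log(1-q)
    to zero and solving for eta, giving eta = B / (A + B).
    With gamma = 0 (plain cross-entropy) the mapping is the identity,
    i.e. the scores are already calibrated.
    """
    a = (1 - p) ** gamma / p - gamma * (1 - p) ** (gamma - 1) * math.log(p)
    b = p ** gamma / (1 - p) - gamma * p ** (gamma - 1) * math.log(1 - p)
    return b / (a + b)
```

The mapping is monotone, so it changes calibration (and therefore threshold selection) without affecting AUROC-style discrimination, which matches the behavior reported above.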
(This article belongs to the Special Issue Data-Centric Artificial Intelligence: New Methods for Data Processing)

39 pages, 1360 KiB  
Article
Real-Time Monitoring of LTL Properties in Distributed Stream Processing Applications
by Loay Aladib, Guoxin Su and Jack Yang
Electronics 2025, 14(7), 1448; https://doi.org/10.3390/electronics14071448 - 3 Apr 2025
Viewed by 391
Abstract
Stream processing frameworks have become key enablers of real-time data processing in modern distributed systems. However, robust and scalable mechanisms for verifying temporal properties are often lacking in existing systems. To address this gap, a new runtime verification framework is proposed that integrates linear temporal logic (LTL) monitoring into stream processing applications, such as Apache Spark. The approach introduces reusable LTL monitoring patterns designed for seamless integration into existing streaming workflows. Our case study, applied to real-time financial data monitoring, demonstrates that LTL-based monitoring can effectively detect violations of safety and liveness properties while maintaining stable latency. A performance evaluation reveals that although the approach introduces computational overhead, it scales effectively with increasing data volume. The proposed framework extends beyond financial data processing and is applicable to domains such as real-time equipment failure detection, financial fraud monitoring, and industrial IoT analytics. These findings demonstrate the feasibility of real-time LTL monitoring in large-scale stream processing environments while highlighting trade-offs between verification accuracy, scalability, and system overhead. Full article
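As a concrete illustration of the kind of reusable LTL monitoring pattern described, the sketch below checks a bounded-response property, G(spike → F≤k alert): every spike event must be answered by an alert within k subsequent events. The class name, event fields, and bound are hypothetical, and the paper's framework targets distributed Spark workflows rather than this single in-process loop.

```python
from collections import deque

class BoundedResponseMonitor:
    """Runtime monitor for G(spike -> F<=k alert): every spike must be
    followed by an alert within the next k events."""

    def __init__(self, k):
        self.k = k
        self.pending = deque()   # event indices of unanswered spikes
        self.t = 0
        self.violations = []     # spike indices that went unanswered

    def observe(self, spike=False, alert=False):
        if alert:
            self.pending.clear()          # an alert answers all pending spikes
        if spike:
            self.pending.append(self.t)
        # any spike that is k or more events old without an alert is a violation
        while self.pending and self.t - self.pending[0] >= self.k:
            self.violations.append(self.pending.popleft())
        self.t += 1
        return not self.violations        # True while the property still holds
```

In a streaming deployment, `observe` would be invoked inside the per-record processing stage, with violations forwarded to an alerting sink.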
(This article belongs to the Special Issue Data-Centric Artificial Intelligence: New Methods for Data Processing)

23 pages, 1436 KiB  
Article
Forecasting Corporate Financial Performance Using Deep Learning with Environmental, Social, and Governance Data
by Wan-Lu Hsu, Ying-Lei Lin, Jung-Pin Lai, Yu-Hui Liu and Ping-Feng Pai
Electronics 2025, 14(3), 417; https://doi.org/10.3390/electronics14030417 - 21 Jan 2025
Viewed by 2500
Abstract
In recent years, extensive research has focused on the relationship between corporate social responsibility (CSR) and financial performance. While past studies have explored this connection, they often faced challenges in quantitatively assessing the effectiveness of CSR initiatives. However, advancements in research methodologies and the development of Environmental, Social, and Governance (ESG) measurement dimensions have led to the creation of more robust evaluation criteria. These criteria use ESG scores as primary reference indicators for assessing the effectiveness of CSR activities. This study aims to utilize ESG indicators from the ESG InfoHub website of the Taiwan Stock Exchange Corporation (TSEC) as benchmarks, comprising 15 items from the environmental (E), social (S), and governance (G) dimensions to form the CSR effectiveness indicators and predict financial performance. The data cover the years 2021–2022 for listed companies, using return on assets (ROA) and return on equity (ROE) as measures of financial performance. With the rapid development of artificial intelligence in recent years, the applications of machine learning and deep learning (DL) have proliferated across many fields. However, the use of machine learning to analyze ESG data remains rare. Therefore, this study employs machine learning models to predict financial performance based on ESG performance, utilizing both classification and regression approaches. Numerical results indicate that two deep learning models, Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN), outperform other models in regression and classification tasks, respectively. Consequently, deep learning techniques prove to be feasible, effective, and efficient alternatives for predicting corporations’ financial performance based on ESG metrics. Full article
(This article belongs to the Special Issue Data-Centric Artificial Intelligence: New Methods for Data Processing)

26 pages, 2692 KiB  
Article
Automated Research Review Support Using Machine Learning, Large Language Models, and Natural Language Processing
by Vishnu S. Pendyala, Karnavee Kamdar and Kapil Mulchandani
Electronics 2025, 14(2), 256; https://doi.org/10.3390/electronics14020256 - 9 Jan 2025
Viewed by 2040
Abstract
Research expands the boundaries of a subject, economy, and civilization. Peer review is at the heart of research and is understandably an expensive process. This work, with a human in the loop, aims to support the research community in multiple ways. It predicts quality and acceptance, and recommends reviewers. It helps authors and editors evaluate research work using machine learning models developed on a dataset comprising 18,000+ research papers, some of which are from highly acclaimed, top conferences in Artificial Intelligence such as NeurIPS and ICLR, along with their reviews, aspect scores, and accept/reject decisions. Using machine learning algorithms such as Support Vector Machines; deep learning recurrent neural network architectures such as LSTM; a wide variety of pre-trained word vectors using Word2Vec, GloVe, and FastText; transformer-based BERT and DistilBERT; Google’s Large Language Model (LLM) PaLM 2; and a TF-IDF vectorizer, a comprehensive system is built. For the system to be readily usable and to facilitate future enhancements, a frontend, a Flask server in the cloud, and a NoSQL database at the backend are implemented, making it a complete system. The work is novel in using a unique blend of tools and techniques to address most aspects of building a system to support the peer review process. The experiments result in an 86% test accuracy on acceptance prediction using DistilBERT. Results from other models are comparable, with PaLM-based LLM embeddings achieving 84% accuracy. Full article
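Of the listed feature extractors, the TF-IDF vectorizer is simple enough to sketch in a few lines. Below is a generic, smoothed variant in plain Python; the study presumably relies on a library implementation, and the smoothing and normalization choices here are illustrative rather than the paper's exact configuration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Minimal smoothed TF-IDF: term frequency times log((1+N)/(1+df)).
    Terms occurring in every document receive zero weight."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                       # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({term: (count / len(toks)) * math.log((1 + n) / (1 + df[term]))
                        for term, count in tf.items()})
    return vectors
```

Such sparse vectors can then feed a classical classifier (e.g. an SVM) as one baseline alongside the neural embeddings mentioned above.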
(This article belongs to the Special Issue Data-Centric Artificial Intelligence: New Methods for Data Processing)

26 pages, 2128 KiB  
Article
Gross Domestic Product Forecasting: Harnessing Machine Learning for Accurate Economic Predictions in a Univariate Setting
by Bogdan Oancea and Mihaela Simionescu
Electronics 2024, 13(24), 4918; https://doi.org/10.3390/electronics13244918 - 13 Dec 2024
Viewed by 1845
Abstract
In recent years, precise economic forecasting has primarily relied on econometric models, which often assume linearity and stationarity in time series data. However, the nonlinear and dynamic nature of economic data calls for more innovative approaches. Machine learning (ML) techniques offer significant advantages over traditional methods by capturing complex, nonlinear patterns without predefined specifications. This study investigates the effectiveness of Long Short-Term Memory (LSTM) networks for forecasting Gross Domestic Product (GDP) in a univariate setting using quarterly Romanian GDP data spanning from 1995 to 2023. The dataset encompasses significant economic events, including the 2008 financial crisis and the COVID-19 pandemic, highlighting its relevance for broader economic forecasting applications. While the univariate approach simplifies model development, it also limits the incorporation of additional economic indicators, potentially affecting generalizability. Furthermore, computational challenges, such as time-intensive hyperparameter tuning, emerged during model optimization. We implemented LSTM networks with input data based on four and six lags to predict GDP and compared their performance with Seasonal Autoregressive Integrated Moving Average (SARIMA), a classical econometric method. Our results reveal that LSTM networks consistently outperformed SARIMA in predictive accuracy, demonstrating their robustness in capturing economic trends. These findings underscore the potential of ML in enhancing economic forecasting methodologies. Full article
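The univariate setup described above, in which four or six past quarters predict the next one, amounts to a simple sliding-window transformation of the series. A minimal sketch, with the lag count left as a free parameter:

```python
def make_lagged_dataset(series, n_lags):
    """Build supervised (X, y) pairs from a univariate series:
    X[i] holds n_lags consecutive values and y[i] is the value
    that immediately follows them."""
    X, y = [], []
    for i in range(len(series) - n_lags):
        X.append(series[i:i + n_lags])
        y.append(series[i + n_lags])
    return X, y
```

The resulting windows can be fed to an LSTM (after reshaping to its expected 3-D input) or used to fit a classical benchmark such as SARIMA on the same evaluation splits.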
(This article belongs to the Special Issue Data-Centric Artificial Intelligence: New Methods for Data Processing)

16 pages, 2237 KiB  
Article
Improving Process Control Through Decision Tree-Based Pattern Recognition
by Izabela Rojek, Agnieszka Kujawińska, Robert Burduk and Dariusz Mikołajewski
Electronics 2024, 13(23), 4823; https://doi.org/10.3390/electronics13234823 - 6 Dec 2024
Viewed by 948
Abstract
This paper explores the integration of decision tree classifiers in the assessment of machining process stability using control charts. The inherent variability in manufacturing processes requires a robust system for the early detection and correction of disturbances, which has traditionally relied on operators’ experience. Using decision trees, this study presents an automated approach to pattern recognition on control charts that outperforms the accuracy of human operators and neural networks. Experimental research conducted on two datasets from surface finishing processes demonstrates that decision trees can achieve perfect classification under optimal parameters. The results suggest that decision trees offer a transparent and effective tool for quality control, capable of reducing human error, improving decision making, and fostering greater confidence among company employees. These results open up new possibilities for the automation and continuous improvement of machining process control. The contribution of this research to Industry 4.0 is to enable the real-time, data-driven monitoring of machining process stability through decision tree-based pattern recognition, which improves predictive maintenance and quality control. It supports the transition to intelligent manufacturing, where process anomalies are detected and resolved dynamically, reducing downtime and increasing productivity. Full article
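Control-chart pattern recognition of this kind typically starts from rule-based features such as the classic Western Electric run tests. The sketch below flags one such pattern; the run length and comparison side are conventional defaults, not the study's tuned parameters, and a decision tree would combine several features of this kind.

```python
def run_above_center(points, center, run_len=7):
    """Detect the classic 'run' control-chart pattern: run_len
    consecutive points strictly above the center line."""
    streak = 0
    for p in points:
        streak = streak + 1 if p > center else 0
        if streak >= run_len:
            return True
    return False
```

Analogous detectors for trends, cycles, and points beyond the control limits yield a small feature vector per chart window, which is exactly the kind of tabular input on which decision trees are transparent and effective.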
(This article belongs to the Special Issue Data-Centric Artificial Intelligence: New Methods for Data Processing)

19 pages, 3109 KiB  
Article
Text Command Intelligent Understanding for Cybersecurity Testing
by Junkai Yi, Yuan Liu, Zhongbai Jiang and Zhen Liu
Electronics 2024, 13(21), 4330; https://doi.org/10.3390/electronics13214330 - 4 Nov 2024
Viewed by 1000
Abstract
Research on named entity recognition (NER) and command-line generation for network security evaluation tools is relatively scarce, and no mature models for recognition or generation have been developed thus far. This study therefore aims to build a specialized corpus for network security evaluation tools by combining knowledge graphs and information entropy for automatic entity annotation. Additionally, a novel NER approach based on the KG-BERT-BiLSTM-CRF model is proposed. Compared to the traditional BERT-BiLSTM model, the KG-BERT-BiLSTM-CRF model demonstrates superior performance when applied to the specialized corpus of network security evaluation tools. The graph attention network (GAT) component effectively extracts relevant sequential content from datasets in the network security evaluation domain. The fusion layer then concatenates the feature sequences from the GAT and BiLSTM layers, enhancing the training process. Upon successful NER execution, the identified entities are mapped to pre-established command-line data for network security evaluation tools, achieving automatic conversion from textual content to evaluation commands. This process not only improves the efficiency and accuracy of command generation but also provides practical value for the development and optimization of network security evaluation tools. This approach enables the more precise automatic generation of evaluation commands tailored to specific security threats, thereby enhancing the timeliness and effectiveness of cybersecurity defenses. Full article
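The final text-to-command step can be sketched as template filling: once NER recovers the entities, each recognized intent selects a pre-established command template whose slots are filled from the entities. The intents, slot names, and the two `nmap` templates below are hypothetical stand-ins, since the study's actual command corpus is specialized and not reproduced here.

```python
# Hypothetical intent-to-template mapping; real evaluation tools would
# have a much larger, curated command corpus.
TEMPLATES = {
    "port_scan": "nmap -p {ports} {target}",
    "ping_sweep": "nmap -sn {target}",
}

def entities_to_command(intent, entities):
    """Fill a pre-established command-line template with the entities
    recovered by the NER step."""
    template = TEMPLATES[intent]
    return template.format(**entities)
```

A production system would additionally validate entity values (e.g. IP ranges and port lists) before executing any generated command.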
(This article belongs to the Special Issue Data-Centric Artificial Intelligence: New Methods for Data Processing)

13 pages, 5922 KiB  
Article
Evaluating Multimodal Techniques for Predicting Visibility in the Atmosphere Using Satellite Images and Environmental Data
by Hui-Yu Tsai and Ming-Hseng Tseng
Electronics 2024, 13(13), 2585; https://doi.org/10.3390/electronics13132585 - 1 Jul 2024
Cited by 1 | Viewed by 1317
Abstract
Visibility is a measure of the atmospheric transparency at an observation point, expressed as the maximum horizontal distance over which a person can see and identify objects. Low atmospheric visibility often occurs in conjunction with air pollution, posing hazards to both traffic safety and human health. In this study, we combined satellite remote sensing images with environmental data to explore the classification performance of two distinct multimodal data processing techniques. The first approach involves developing four multimodal data classification models using deep learning. The second approach integrates deep learning and machine learning to create twelve multimodal data classifiers. Based on the results of a five-fold cross-validation experiment, the inclusion of various environmental data significantly enhances the classification performance of satellite imagery. Specifically, the test accuracy increased from 0.880 to 0.903 when using the deep learning multimodal fusion technique. Furthermore, when combining deep learning and machine learning for multimodal data processing, the test accuracy improved even further, reaching 0.978. Notably, weather conditions, as part of the environmental data, play a crucial role in enhancing visibility prediction performance. Full article
(This article belongs to the Special Issue Data-Centric Artificial Intelligence: New Methods for Data Processing)

16 pages, 858 KiB  
Article
Periodic Transformer Encoder for Multi-Horizon Travel Time Prediction
by Hui-Ting Christine Lin and Vincent S. Tseng
Electronics 2024, 13(11), 2094; https://doi.org/10.3390/electronics13112094 - 28 May 2024
Cited by 1 | Viewed by 1292
Abstract
In the domain of Intelligent Transportation Systems (ITS), ensuring reliable travel time predictions is crucial for enhancing the efficiency of transportation management systems and supporting long-term planning. Recent advancements in deep learning have demonstrated the ability to effectively leverage large datasets for accurate travel time predictions. These innovations are particularly vital as they address both short-term and long-term travel demands, which are essential for effective traffic management and scheduled route planning. Despite advances in deep learning applications for traffic analysis, the dynamic nature of traffic patterns frequently challenges the forecasting capabilities of existing models, especially when forecasting both immediate and future traffic conditions across various time horizons. Additionally, long-term travel time forecasting remains underexplored in current research due to these complexities. In response to these challenges, this study introduces the Periodic Transformer Encoder (PTE). PTE is a Transformer-based model designed to enhance travel time predictions by effectively capturing temporal dependencies across various horizons. Utilizing attention mechanisms, PTE learns from long-range periodic traffic data to handle both short-term and long-term fluctuations. Furthermore, PTE employs a streamlined encoder-only architecture that eliminates the need for a traditional decoder, thus significantly simplifying the model’s structure and reducing its computational demands. This architecture enhances both the training efficiency and the performance of direct travel time predictions. With these enhancements, PTE effectively tackles the challenges presented by dynamic traffic patterns, significantly improving prediction performance across multiple time horizons. 
Comprehensive evaluations on an extensive real-world traffic dataset demonstrate PTE’s superior performance in predicting travel times over multiple horizons compared to existing methods. PTE is notably effective in adapting to high-variability road segments and peak traffic hours. These results prove PTE’s effectiveness and robustness across diverse traffic environments, indicating its significant contribution to advancing traffic prediction capabilities within ITS. Full article
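One common way to expose long-range periodicity to a Transformer encoder is a phase-based sinusoidal encoding tied to a known cycle, such as the daily traffic cycle. The sketch below is a generic illustration of that idea, not PTE's exact formulation; `period` and `d_model` are assumed parameters.

```python
import math

def periodic_encoding(t, period, d_model):
    """Encode position t by its phase within a known period using
    sin/cos pairs at increasing harmonics; positions exactly one
    full period apart receive identical encodings."""
    phase = 2 * math.pi * (t % period) / period
    return [math.sin((i // 2 + 1) * phase) if i % 2 == 0
            else math.cos((i // 2 + 1) * phase)
            for i in range(d_model)]
```

Because the encoding repeats with the cycle, the attention mechanism can relate, say, this morning's rush hour to the same hour on previous days, which is the behavior needed for both short- and long-horizon predictions.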
(This article belongs to the Special Issue Data-Centric Artificial Intelligence: New Methods for Data Processing)
