Data-Centric Artificial Intelligence: New Methods for Data Processing

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 15 October 2025 | Viewed by 12298

Special Issue Editors


Guest Editor
Department of Computer Science, Kazimierz Wielki University, 85-064 Bydgoszcz, Poland
Interests: bee algorithms; fuzzy logic; artificial neural networks and their applications; language models; generative AI

Guest Editor
School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu, Fukushima 965-8580, Japan
Interests: intelligent software; smart learning; cloud robotics; programming environment; visual languages

Special Issue Information

Dear Colleagues,

Data-centric artificial intelligence is developing rapidly thanks to advances in machine learning, natural language processing, and data visualization. These modern AI techniques enable us to better understand and process huge data sets, and provide companies and scientists with tools for extracting hidden patterns, discovering new knowledge, and automating complex analytical processes. In this Special Issue, we present application examples of these AI methods in solving real-world business and scientific problems.

We invite you to submit papers for this Special Issue dedicated to data-centric artificial intelligence, focusing on the following:

  1. New methods and techniques for processing large data sets;
  2. Topics related to machine learning, natural language processing, and data visualization;
  3. Practical applications of these methods in various fields.

This publication will supplement the existing literature by focusing on the latest trends and solutions in this area.

Dr. Dawid Ewald
Dr. Yutaka Watanobe
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, you can access the submission form there. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • machine learning
  • data processing
  • data visualization
  • natural language processing
  • fuzzy logic

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (9 papers)


Research

36 pages, 3107 KiB  
Article
Estimating Calibrated Risks Using Focal Loss and Gradient-Boosted Trees for Clinical Risk Prediction
by Henry Johnston, Nandini Nair and Dongping Du
Electronics 2025, 14(9), 1838; https://doi.org/10.3390/electronics14091838 - 30 Apr 2025
Abstract
Probability calibration and decision threshold selection are fundamental aspects of risk prediction and classification, respectively. A strictly proper loss function is used in clinical risk prediction applications to encourage a model to predict calibrated class-posterior probabilities or risks. Recent studies have shown that training with focal loss can improve the discriminatory power of gradient-boosted decision trees (GBDT) for classification tasks with an imbalanced or skewed class distribution. However, the focal loss function is not a strictly proper loss function. Therefore, the output of GBDT trained using focal loss is not an accurate estimate of the true class-posterior probability. This study aims to address the issue of poor calibration of GBDT trained using focal loss in the context of clinical risk prediction applications. The methodology utilizes a closed-form transformation of the confidence scores of GBDT trained with focal loss to estimate calibrated risks. The closed-form transformation relates the focal loss minimizer and the true-class posterior probability. Algorithms based on Bayesian hyperparameter optimization are provided to choose the focal loss parameter that optimizes discriminatory power and calibration, as measured by the Brier score metric. We assess how the calibration of the confidence scores affects the selection of a decision threshold to optimize the balanced accuracy, defined as the arithmetic mean of sensitivity and specificity. The effectiveness of the proposed strategy was evaluated using lung transplant data extracted from the Scientific Registry of Transplant Recipients (SRTR) for predicting post-transplant cancer. The proposed strategy was also evaluated using data from the Behavioral Risk Factor Surveillance System (BRFSS) for predicting diabetes status. 
Probability calibration plots, calibration slope and intercept, and the Brier score show that the approach improves calibration while maintaining the same discriminatory power according to the area under the receiver operating characteristics curve (AUROC) and the H-measure. The calibrated focal-aware XGBoost achieved an AUROC, Brier score, and calibration slope of 0.700, 0.128, and 0.968 for predicting the 10-year cancer risk, respectively. The miscalibrated focal-aware XGBoost achieved equal AUROC but a worse Brier score and calibration slope (0.140 and 1.579). The proposed method compared favorably to the standard XGBoost trained using cross-entropy loss (AUROC of 0.755 versus 0.736 in predicting the 1-year risk of cancer). Comparable performance was observed with other risk prediction models in the diabetes prediction task. Full article
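The core of the described approach, inverting the focal-loss minimizer to recover a calibrated posterior, can be illustrated with a short sketch. Setting the derivative of the pointwise binary focal risk to zero and solving for the class posterior yields a closed-form mapping from a focal-trained score to a risk estimate. The function below is a generic illustration derived from that first-order condition, with `gamma` as an assumed parameter; it is not necessarily the paper's exact transformation.

```python
import math

def focal_calibrate(p, gamma):
    """Map a focal-loss confidence score p in (0, 1) to an estimated
    class-posterior probability.

    Derived by setting the derivative of the pointwise focal risk
        R(q) = -eta*(1-q)**g*log(q) - (1-eta)*q**g*log(1-q)
    to zero and solving for eta, giving eta = B / (A + B).
    With gamma = 0 (plain cross-entropy) the mapping is the identity,
    i.e. the scores are already calibrated.
    """
    a = (1 - p) ** gamma / p - gamma * (1 - p) ** (gamma - 1) * math.log(p)
    b = p ** gamma / (1 - p) - gamma * p ** (gamma - 1) * math.log(1 - p)
    return b / (a + b)
```

The mapping is monotone, so it changes calibration (and therefore threshold selection) without affecting AUROC-style discrimination, which matches the behavior reported above.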
(This article belongs to the Special Issue Data-Centric Artificial Intelligence: New Methods for Data Processing)

39 pages, 1360 KiB  
Article
Real-Time Monitoring of LTL Properties in Distributed Stream Processing Applications
by Loay Aladib, Guoxin Su and Jack Yang
Electronics 2025, 14(7), 1448; https://doi.org/10.3390/electronics14071448 - 3 Apr 2025
Viewed by 391
Abstract
Stream processing frameworks have become key enablers of real-time data processing in modern distributed systems. However, robust and scalable mechanisms for verifying temporal properties are often lacking in existing systems. To address this gap, a new runtime verification framework is proposed that integrates linear temporal logic (LTL) monitoring into stream processing applications, such as Apache Spark. The approach introduces reusable LTL monitoring patterns designed for seamless integration into existing streaming workflows. Our case study, applied to real-time financial data monitoring, demonstrates that LTL-based monitoring can effectively detect violations of safety and liveness properties while maintaining stable latency. A performance evaluation reveals that although the approach introduces computational overhead, it scales effectively with increasing data volume. The proposed framework extends beyond financial data processing and is applicable to domains such as real-time equipment failure detection, financial fraud monitoring, and industrial IoT analytics. These findings demonstrate the feasibility of real-time LTL monitoring in large-scale stream processing environments while highlighting trade-offs between verification accuracy, scalability, and system overhead. Full article
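As a concrete illustration of the kind of reusable LTL monitoring pattern described, the sketch below checks a bounded-response property, G(spike → F≤k alert): every spike event must be answered by an alert within k subsequent events. The class name, event fields, and bound are hypothetical, and the paper's framework targets distributed Spark workflows rather than this single in-process loop.

```python
from collections import deque

class BoundedResponseMonitor:
    """Runtime monitor for G(spike -> F<=k alert): every spike must be
    followed by an alert within the next k events."""

    def __init__(self, k):
        self.k = k
        self.pending = deque()   # event indices of unanswered spikes
        self.t = 0
        self.violations = []     # spike indices that went unanswered

    def observe(self, spike=False, alert=False):
        if alert:
            self.pending.clear()          # an alert answers all pending spikes
        if spike:
            self.pending.append(self.t)
        # any spike that is k or more events old without an alert is a violation
        while self.pending and self.t - self.pending[0] >= self.k:
            self.violations.append(self.pending.popleft())
        self.t += 1
        return not self.violations        # True while the property still holds
```

In a streaming deployment, `observe` would be invoked inside the per-record processing stage, with violations forwarded to an alerting sink.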
(This article belongs to the Special Issue Data-Centric Artificial Intelligence: New Methods for Data Processing)

23 pages, 1436 KiB  
Article
Forecasting Corporate Financial Performance Using Deep Learning with Environmental, Social, and Governance Data
by Wan-Lu Hsu, Ying-Lei Lin, Jung-Pin Lai, Yu-Hui Liu and Ping-Feng Pai
Electronics 2025, 14(3), 417; https://doi.org/10.3390/electronics14030417 - 21 Jan 2025
Viewed by 2500
Abstract
In recent years, extensive research has focused on the relationship between corporate social responsibility (CSR) and financial performance. While past studies have explored this connection, they often faced challenges in quantitatively assessing the effectiveness of CSR initiatives. However, advancements in research methodologies and the development of Environmental, Social, and Governance (ESG) measurement dimensions have led to the creation of more robust evaluation criteria. These criteria use ESG scores as primary reference indicators for assessing the effectiveness of CSR activities. This study aims to utilize ESG indicators from the ESG InfoHub website of the Taiwan Stock Exchange Corporation (TSEC) as benchmarks, comprising 15 items from the environmental (E), social (S), and governance (G) dimensions to form the CSR effectiveness indicators and predict financial performance. The data cover the years 2021–2022 for listed companies, using return on assets (ROA) and return on equity (ROE) as measures of financial performance. With the rapid development of artificial intelligence in recent years, the applications of machine learning and deep learning (DL) have proliferated across many fields. However, the use of machine learning to analyze ESG data remains rare. Therefore, this study employs machine learning models to predict financial performance based on ESG performance, utilizing both classification and regression approaches. Numerical results indicate that two deep learning models, Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN), outperform other models in regression and classification tasks, respectively. Consequently, deep learning techniques prove to be feasible, effective, and efficient alternatives for predicting corporations’ financial performance based on ESG metrics. Full article
(This article belongs to the Special Issue Data-Centric Artificial Intelligence: New Methods for Data Processing)

26 pages, 2692 KiB  
Article
Automated Research Review Support Using Machine Learning, Large Language Models, and Natural Language Processing
by Vishnu S. Pendyala, Karnavee Kamdar and Kapil Mulchandani
Electronics 2025, 14(2), 256; https://doi.org/10.3390/electronics14020256 - 9 Jan 2025
Viewed by 2040
Abstract
Research expands the boundaries of a subject, economy, and civilization. Peer review is at the heart of research and is understandably an expensive process. This work, with a human in the loop, aims to support the research community in multiple ways. It predicts quality and acceptance, and recommends reviewers. It helps authors and editors evaluate research work using machine learning models developed on a dataset comprising 18,000+ research papers, some of which are from highly acclaimed, top conferences in Artificial Intelligence such as NeurIPS and ICLR, along with their reviews, aspect scores, and accept/reject decisions. Using machine learning algorithms such as Support Vector Machines; deep learning recurrent neural network architectures such as LSTM; a wide variety of pre-trained word vectors using Word2Vec, GloVe, and FastText; transformer-based BERT and DistilBERT; Google’s Large Language Model (LLM) PaLM 2; and a TF-IDF vectorizer, a comprehensive system is built. For the system to be readily usable and to facilitate future enhancements, a frontend, a Flask server in the cloud, and a NoSQL database at the backend are implemented, making it a complete system. The work is novel in using a unique blend of tools and techniques to address most aspects of building a system to support the peer review process. The experiments result in an 86% test accuracy on acceptance prediction using DistilBERT. Results from other models are comparable, with PaLM-based LLM embeddings achieving 84% accuracy. Full article
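Of the listed feature extractors, the TF-IDF vectorizer is simple enough to sketch in a few lines. Below is a generic, smoothed variant in plain Python; the study presumably relies on a library implementation, and the smoothing and normalization choices here are illustrative rather than the paper's exact configuration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Minimal smoothed TF-IDF: term frequency times log((1+N)/(1+df)).
    Terms occurring in every document receive zero weight."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                       # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({term: (count / len(toks)) * math.log((1 + n) / (1 + df[term]))
                        for term, count in tf.items()})
    return vectors
```

Such sparse vectors can then feed a classical classifier (e.g. an SVM) as one baseline alongside the neural embeddings mentioned above.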
(This article belongs to the Special Issue Data-Centric Artificial Intelligence: New Methods for Data Processing)

26 pages, 2128 KiB  
Article
Gross Domestic Product Forecasting: Harnessing Machine Learning for Accurate Economic Predictions in a Univariate Setting
by Bogdan Oancea and Mihaela Simionescu
Electronics 2024, 13(24), 4918; https://doi.org/10.3390/electronics13244918 - 13 Dec 2024
Viewed by 1845
Abstract
In recent years, precise economic forecasting has primarily relied on econometric models, which often assume linearity and stationarity in time series data. However, the nonlinear and dynamic nature of economic data calls for more innovative approaches. Machine learning (ML) techniques offer significant advantages over traditional methods by capturing complex, nonlinear patterns without predefined specifications. This study investigates the effectiveness of Long Short-Term Memory (LSTM) networks for forecasting Gross Domestic Product (GDP) in a univariate setting using quarterly Romanian GDP data spanning from 1995 to 2023. The dataset encompasses significant economic events, including the 2008 financial crisis and the COVID-19 pandemic, highlighting its relevance for broader economic forecasting applications. While the univariate approach simplifies model development, it also limits the incorporation of additional economic indicators, potentially affecting generalizability. Furthermore, computational challenges, such as time-intensive hyperparameter tuning, emerged during model optimization. We implemented LSTM networks with input data based on four and six lags to predict GDP and compared their performance with Seasonal Autoregressive Integrated Moving Average (SARIMA), a classical econometric method. Our results reveal that LSTM networks consistently outperformed SARIMA in predictive accuracy, demonstrating their robustness in capturing economic trends. These findings underscore the potential of ML in enhancing economic forecasting methodologies. Full article
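The univariate setup described above, in which four or six past quarters predict the next one, amounts to a simple sliding-window transformation of the series. A minimal sketch, with the lag count left as a free parameter:

```python
def make_lagged_dataset(series, n_lags):
    """Build supervised (X, y) pairs from a univariate series:
    X[i] holds n_lags consecutive values and y[i] is the value
    that immediately follows them."""
    X, y = [], []
    for i in range(len(series) - n_lags):
        X.append(series[i:i + n_lags])
        y.append(series[i + n_lags])
    return X, y
```

The resulting windows can be fed to an LSTM (after reshaping to its expected 3-D input) or used to fit a classical benchmark such as SARIMA on the same evaluation splits.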
(This article belongs to the Special Issue Data-Centric Artificial Intelligence: New Methods for Data Processing)

16 pages, 2237 KiB  
Article
Improving Process Control Through Decision Tree-Based Pattern Recognition
by Izabela Rojek, Agnieszka Kujawińska, Robert Burduk and Dariusz Mikołajewski
Electronics 2024, 13(23), 4823; https://doi.org/10.3390/electronics13234823 - 6 Dec 2024
Viewed by 948
Abstract
This paper explores the integration of decision tree classifiers in the assessment of machining process stability using control charts. The inherent variability in manufacturing processes requires a robust system for the early detection and correction of disturbances, which has traditionally relied on operators’ experience. Using decision trees, this study presents an automated approach to pattern recognition on control charts that outperforms the accuracy of human operators and neural networks. Experimental research conducted on two datasets from surface finishing processes demonstrates that decision trees can achieve perfect classification under optimal parameters. The results suggest that decision trees offer a transparent and effective tool for quality control, capable of reducing human error, improving decision making, and fostering greater confidence among company employees. These results open up new possibilities for the automation and continuous improvement of machining process control. The contribution of this research to Industry 4.0 is to enable the real-time, data-driven monitoring of machining process stability through decision tree-based pattern recognition, which improves predictive maintenance and quality control. It supports the transition to intelligent manufacturing, where process anomalies are detected and resolved dynamically, reducing downtime and increasing productivity. Full article
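Control-chart pattern recognition of this kind typically starts from rule-based features such as the classic Western Electric run tests. The sketch below flags one such pattern; the run length and comparison side are conventional defaults, not the study's tuned parameters, and a decision tree would combine several features of this kind.

```python
def run_above_center(points, center, run_len=7):
    """Detect the classic 'run' control-chart pattern: run_len
    consecutive points strictly above the center line."""
    streak = 0
    for p in points:
        streak = streak + 1 if p > center else 0
        if streak >= run_len:
            return True
    return False
```

Analogous detectors for trends, cycles, and points beyond the control limits yield a small feature vector per chart window, which is exactly the kind of tabular input on which decision trees are transparent and effective.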
(This article belongs to the Special Issue Data-Centric Artificial Intelligence: New Methods for Data Processing)

19 pages, 3109 KiB  
Article
Text Command Intelligent Understanding for Cybersecurity Testing
by Junkai Yi, Yuan Liu, Zhongbai Jiang and Zhen Liu
Electronics 2024, 13(21), 4330; https://doi.org/10.3390/electronics13214330 - 4 Nov 2024
Viewed by 1000
Abstract
Research on named entity recognition (NER) and command-line generation for network security evaluation tools is relatively scarce, and no mature models for recognition or generation have been developed thus far. This study therefore aims to build a specialized corpus for network security evaluation tools by combining knowledge graphs and information entropy for automatic entity annotation. Additionally, a novel NER approach based on the KG-BERT-BiLSTM-CRF model is proposed. Compared to the traditional BERT-BiLSTM model, the KG-BERT-BiLSTM-CRF model demonstrates superior performance when applied to the specialized corpus of network security evaluation tools. The graph attention network (GAT) component effectively extracts relevant sequential content from datasets in the network security evaluation domain. The fusion layer then concatenates the feature sequences from the GAT and BiLSTM layers, enhancing the training process. Upon successful NER execution, the identified entities are mapped to pre-established command-line data for network security evaluation tools, achieving automatic conversion from textual content to evaluation commands. This process not only improves the efficiency and accuracy of command generation but also provides practical value for the development and optimization of network security evaluation tools. This approach enables the more precise automatic generation of evaluation commands tailored to specific security threats, thereby enhancing the timeliness and effectiveness of cybersecurity defenses. Full article
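The final text-to-command step can be sketched as template filling: once NER recovers the entities, each recognized intent selects a pre-established command template whose slots are filled from the entities. The intents, slot names, and the two `nmap` templates below are hypothetical stand-ins, since the study's actual command corpus is specialized and not reproduced here.

```python
# Hypothetical intent-to-template mapping; real evaluation tools would
# have a much larger, curated command corpus.
TEMPLATES = {
    "port_scan": "nmap -p {ports} {target}",
    "ping_sweep": "nmap -sn {target}",
}

def entities_to_command(intent, entities):
    """Fill a pre-established command-line template with the entities
    recovered by the NER step."""
    template = TEMPLATES[intent]
    return template.format(**entities)
```

A production system would additionally validate entity values (e.g. IP ranges and port lists) before executing any generated command.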
(This article belongs to the Special Issue Data-Centric Artificial Intelligence: New Methods for Data Processing)

13 pages, 5922 KiB  
Article
Evaluating Multimodal Techniques for Predicting Visibility in the Atmosphere Using Satellite Images and Environmental Data
by Hui-Yu Tsai and Ming-Hseng Tseng
Electronics 2024, 13(13), 2585; https://doi.org/10.3390/electronics13132585 - 1 Jul 2024
Cited by 1 | Viewed by 1317
Abstract
Visibility is a measure of the atmospheric transparency at an observation point, expressed as the maximum horizontal distance over which a person can see and identify objects. Low atmospheric visibility often occurs in conjunction with air pollution, posing hazards to both traffic safety and human health. In this study, we combined satellite remote sensing images with environmental data to explore the classification performance of two distinct multimodal data processing techniques. The first approach involves developing four multimodal data classification models using deep learning. The second approach integrates deep learning and machine learning to create twelve multimodal data classifiers. Based on the results of a five-fold cross-validation experiment, the inclusion of various environmental data significantly enhances the classification performance of satellite imagery. Specifically, the test accuracy increased from 0.880 to 0.903 when using the deep learning multimodal fusion technique. Furthermore, when combining deep learning and machine learning for multimodal data processing, the test accuracy improved even further, reaching 0.978. Notably, weather conditions, as part of the environmental data, play a crucial role in enhancing visibility prediction performance. Full article
(This article belongs to the Special Issue Data-Centric Artificial Intelligence: New Methods for Data Processing)

16 pages, 858 KiB  
Article
Periodic Transformer Encoder for Multi-Horizon Travel Time Prediction
by Hui-Ting Christine Lin and Vincent S. Tseng
Electronics 2024, 13(11), 2094; https://doi.org/10.3390/electronics13112094 - 28 May 2024
Cited by 1 | Viewed by 1292
Abstract
In the domain of Intelligent Transportation Systems (ITS), ensuring reliable travel time predictions is crucial for enhancing the efficiency of transportation management systems and supporting long-term planning. Recent advancements in deep learning have demonstrated the ability to effectively leverage large datasets for accurate travel time predictions. These innovations are particularly vital as they address both short-term and long-term travel demands, which are essential for effective traffic management and scheduled route planning. Despite advances in deep learning applications for traffic analysis, the dynamic nature of traffic patterns frequently challenges the forecasting capabilities of existing models, especially when forecasting both immediate and future traffic conditions across various time horizons. Additionally, long-term travel time forecasting remains underexplored in current research due to these complexities. In response to these challenges, this study introduces the Periodic Transformer Encoder (PTE). PTE is a Transformer-based model designed to enhance travel time predictions by effectively capturing temporal dependencies across various horizons. Utilizing attention mechanisms, PTE learns from long-range periodic traffic data to handle both short-term and long-term fluctuations. Furthermore, PTE employs a streamlined encoder-only architecture that eliminates the need for a traditional decoder, thus significantly simplifying the model’s structure and reducing its computational demands. This architecture enhances both the training efficiency and the performance of direct travel time predictions. With these enhancements, PTE effectively tackles the challenges presented by dynamic traffic patterns, significantly improving prediction performance across multiple time horizons. 
Comprehensive evaluations on an extensive real-world traffic dataset demonstrate PTE’s superior performance in predicting travel times over multiple horizons compared to existing methods. PTE is notably effective in adapting to high-variability road segments and peak traffic hours. These results prove PTE’s effectiveness and robustness across diverse traffic environments, indicating its significant contribution to advancing traffic prediction capabilities within ITS. Full article
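One common way to expose long-range periodicity to a Transformer encoder is a phase-based sinusoidal encoding tied to a known cycle, such as the daily traffic cycle. The sketch below is a generic illustration of that idea, not PTE's exact formulation; `period` and `d_model` are assumed parameters.

```python
import math

def periodic_encoding(t, period, d_model):
    """Encode position t by its phase within a known period using
    sin/cos pairs at increasing harmonics; positions exactly one
    full period apart receive identical encodings."""
    phase = 2 * math.pi * (t % period) / period
    return [math.sin((i // 2 + 1) * phase) if i % 2 == 0
            else math.cos((i // 2 + 1) * phase)
            for i in range(d_model)]
```

Because the encoding repeats with the cycle, the attention mechanism can relate, say, this morning's rush hour to the same hour on previous days, which is the behavior needed for both short- and long-horizon predictions.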
(This article belongs to the Special Issue Data-Centric Artificial Intelligence: New Methods for Data Processing)
