Search Results (82)

Search Parameters:
Keywords = big data pipelines

25 pages, 2667 KB  
Article
Deterministic Data Governance in Hybrid Financial Architectures
by Sergiu-Alexandru Ionescu, Vlad Diaconita, Andreea-Oana Radu, Laurentiu Gabriel Dinca and Ioana Nagit
Electronics 2026, 15(8), 1716; https://doi.org/10.3390/electronics15081716 (registering DOI) - 18 Apr 2026
Abstract
Today, financial institutions’ architecture does not rely on a single technology. Instead, it uses a multi-technology approach to meet modern requirements while remaining relevant. It integrates technologies such as relational databases, Big Data for analysis, and Cloud environments for distributed capacity within a complex data architecture. To comply with strict European data governance regulations, governance mechanisms such as encryption, pseudonymization, and incremental versioning must be applied at each architectural layer. In this study, the impact of data governance is assessed by applying these mechanisms from the data-ingestion level onward, using diverse data types such as structured, semi-structured, and unstructured data, across relational databases, Big Data analysis, and distributed Cloud systems. In doing so, metrics such as execution time, CPU usage, and memory usage are assessed to evaluate the impact of governance mechanisms on financial systems. The results show that governance can be successfully integrated, provided these mechanisms are embedded at the architectural level, ensuring that performance, scalability, and compliance are maintained across the entire processing pipeline. Full article
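The governance mechanisms named above (pseudonymization, encryption, and incremental versioning) can be pictured at the ingestion layer with a short sketch. The Python fragment below is illustrative only and is not taken from the paper; the field names and the choice of SHA-256 hashing and Fernet encryption are assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone

from cryptography.fernet import Fernet  # pip install cryptography

KEY = Fernet.generate_key()   # in practice, managed by a key-management service
FERNET = Fernet(KEY)
SALT = b"per-deployment-secret-salt"

def pseudonymize(value: str) -> str:
    """Replace a direct identifier with a salted, irreversible hash."""
    return hashlib.sha256(SALT + value.encode()).hexdigest()

def govern_record(record: dict, version: int) -> dict:
    """Apply pseudonymization, encryption, and incremental versioning at ingestion."""
    governed = dict(record)
    governed["customer_id"] = pseudonymize(record["customer_id"])        # pseudonymization
    governed["iban"] = FERNET.encrypt(record["iban"].encode()).decode()  # encryption
    governed["_version"] = version                                       # incremental versioning
    governed["_ingested_at"] = datetime.now(timezone.utc).isoformat()
    return governed

# Hypothetical record; field names are placeholders, not the paper's schema.
raw = {"customer_id": "C-1043", "iban": "RO49AAAA1B31007593840000", "amount": 1250.0}
print(json.dumps(govern_record(raw, version=1), indent=2))
```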
26 pages, 1707 KB  
Article
Axiom Generation for Automated Ontology Construction from Texts Through Schema Mapping
by Tsitsi Zengeya, Jean Vincent Fonou-Dombeu and Mandlenkosi Gwetu
Mach. Learn. Knowl. Extr. 2026, 8(2), 29; https://doi.org/10.3390/make8020029 - 26 Jan 2026
Viewed by 1020
Abstract
Ontology learning from unstructured text has become a critical task for knowledge-driven applications in Big Data and Artificial Intelligence. While significant advances have been made in the automatic extraction of concepts and relations using neural and Transformer-based models, the generation of formal Description Logic axioms required for constructing logically consistent and computationally tractable ontologies remains largely underexplored. This paper puts forward a novel pipeline for automated axiom generation through schema mapping. Our paper introduces three key innovations: a deterministic mapping framework that guarantees logical consistency (unlike stochastic Large Language Models); guaranteed formal consistency verified by OWL reasoners (unaddressed by prior statistical methods); and a transparent, scalable bridge from neural extractions to symbolic logic, eliminating manual post-processing. Technically, the pipeline builds upon the outputs of a Transformer-based fusion model for joint concept and relation extraction. We then map lexical relational phrases to formal ontological properties through a lemmatization-based schema alignment step. Entity typing and hierarchical induction are then employed to infer class structures, as well as domain and range constraints. Using RDFLib and structured data processing, we transform the extracted triples into both assertional (ABox) and terminological (TBox) axioms expressed in Description Logic. Experimental evaluation on benchmark datasets (Conll04 and NYT) demonstrates the efficacy of the approach, with expert validation showing high acceptance rates (>95%) and reasoners confirming zero inconsistencies. The pipeline thus establishes a reliable, scalable foundation for automated ontology learning, advancing the field from extraction to formally verifiable knowledge base construction. Full article
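As a rough picture of the triples-to-axioms step, the RDFLib sketch below turns one extracted (subject, relation, object) triple into TBox and ABox statements; the namespace, classes, and property are invented for illustration and are not the paper's actual schema.

```python
from rdflib import Graph, Namespace, RDF, RDFS, OWL

EX = Namespace("http://example.org/onto#")   # hypothetical namespace
g = Graph()
g.bind("ex", EX)

# TBox: class hierarchy and a property with domain and range constraints.
g.add((EX.Person, RDF.type, OWL.Class))
g.add((EX.Organization, RDF.type, OWL.Class))
g.add((EX.Employee, RDF.type, OWL.Class))
g.add((EX.Employee, RDFS.subClassOf, EX.Person))
g.add((EX.worksFor, RDF.type, OWL.ObjectProperty))
g.add((EX.worksFor, RDFS.domain, EX.Person))
g.add((EX.worksFor, RDFS.range, EX.Organization))

# ABox: an assertion derived from an extracted (subject, relation, object) triple.
g.add((EX.alice, RDF.type, EX.Employee))
g.add((EX.acme, RDF.type, EX.Organization))
g.add((EX.alice, EX.worksFor, EX.acme))

print(g.serialize(format="turtle"))
```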
(This article belongs to the Section Data)

39 pages, 30009 KB  
Article
A Case Study on DNN-Based Surface Roughness QA Analysis of Hollow Metal AM Fabricated Parts in a DT-Enabled CW-GTAW Robotic Manufacturing Cell
by João Vítor A. Cabral, Alberto J. Alvares, Antonio Carlos da C. Facciolli and Guilherme C. de Carvalho
Sensors 2026, 26(1), 4; https://doi.org/10.3390/s26010004 - 19 Dec 2025
Viewed by 834
Abstract
In the context of Industry 4.0, new methods of manufacturing, monitoring, and data generation related to industrial processes have emerged. Over the last decade, a method of part manufacturing that has been revolutionizing the industry is Additive Manufacturing, which comes in various forms, including the more traditional Fused Deposition Modeling (FDM) and more innovative ones, such as Laser Metal Deposition (LMD) and Wire Arc Additive Manufacturing (WAAM). New technologies for monitoring these processes are also emerging, such as Cyber-Physical Systems (CPSs) and Digital Twins (DTs), which can enable Artificial Intelligence (AI)-powered analysis of the generated big data. However, few works have dealt with comprehensive data analysis, based on Digital Twin systems, to study the quality levels of manufactured parts using 3D models. With this background in mind, the present project uses a Digital Twin-enabled dataflow as the basis for a proposed data analysis pipeline. The pipeline analyzes the surface roughness quality levels of metal AM-manufactured parts by applying a Deep Neural Network (DNN) analytical model, and it enables the assessment and tuning of deposition parameters by comparing the AM-built models’ 3D representation, obtained by photogrammetry scanning, with the positional data acquired during the deposition process and stored in a cloud database. Stored and analyzed data may be further used to refine part manufacturing, sensor calibration, and the DT model. This work also presents a comprehensive study of experiments carried out using the CW-GTAW (Cold Wire Gas Tungsten Arc Welding) process as the means of depositing metal, resulting in hollow parts whose geometries were evaluated by means of both 3D scanned data, obtained via photogrammetry, and positional/deposition process parameters obtained from the Digital Twin architecture pipeline. Finally, an adapted PointNet DNN model was used to classify point clouds into three surface roughness quality levels (good, fair, and poor), obtaining an overall accuracy of 75.64% on the evaluation of real deposited metal parts. Full article
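A PointNet-style classifier of the kind mentioned reduces each point cloud to a permutation-invariant global feature before mapping it to the three quality classes. The PyTorch sketch below is a simplified stand-in; the layer sizes are assumptions and the paper's adaptations are not reproduced.

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Simplified PointNet-style classifier: shared per-point MLP, max pooling, FC head."""
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.point_mlp = nn.Sequential(           # shared MLP applied to every point
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.head = nn.Sequential(                # classify the global feature
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, 3, num_points)
        features = self.point_mlp(points)             # (batch, 1024, num_points)
        global_feature = features.max(dim=2).values   # permutation-invariant pooling
        return self.head(global_feature)              # logits for good / fair / poor

model = TinyPointNet()
cloud = torch.randn(2, 3, 2048)    # two dummy clouds of 2048 XYZ points
print(model(cloud).shape)          # torch.Size([2, 3])
```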
(This article belongs to the Section Internet of Things)

16 pages, 2354 KB  
Article
MTBseq-nf: Enabling Scalable Tuberculosis Genomics “Big Data” Analysis Through a User-Friendly Nextflow Wrapper for MTBseq Pipeline
by Abhinav Sharma, Davi Josué Marcon, Johannes Loubser, Karla Valéria Batista Lima, Gian van der Spuy and Emilyn Costa Conceição
Microorganisms 2025, 13(12), 2685; https://doi.org/10.3390/microorganisms13122685 - 25 Nov 2025
Cited by 1 | Viewed by 842
Abstract
The MTBseq pipeline, published in 2018, was designed to address bioinformatics challenges in tuberculosis (TB) research using whole-genome sequencing (WGS) data. It was the first publicly available tool on GitHub to perform full analysis of WGS data for the Mycobacterium tuberculosis complex (MTBC), encompassing quality control, mapping, and variant calling through to lineage classification, drug resistance prediction, and phylogenetic inference. However, the pipeline’s architecture is not optimal for analyses on high-performance computing or cloud computing environments that often involve large datasets. To overcome this limitation, we developed MTBseq-nf, a Nextflow wrapper that provides parallelization for faster execution in addition to several other significant enhancements. The MTBseq-nf wrapper can run several instances of the same step in parallel, fully utilizing the available resources, unlike the linear, batched analysis of samples in the TBfull step of the MTBseq pipeline. To evaluate scalability and reproducibility, we used 90 M. tuberculosis genomes (European Nucleotide Archive—ENA accession PRJEB7727) for benchmarking analysis on a dedicated computational server. In our benchmarks, MTBseq-nf in its parallel mode is at least twice as fast as the standard MTBseq pipeline for cohorts exceeding 20 samples. Through integration with the best practices of the nf-core, Bioconda, and BioContainers projects, MTBseq-nf ensures reproducibility and platform independence, providing a scalable and efficient solution for TB genomic surveillance. Full article
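MTBseq-nf itself is written in Nextflow, which is not shown here; the short Python sketch below only illustrates the scheduling idea the abstract describes, running independent samples concurrently instead of as one linear batch, and the sample names and per-sample step are placeholders.

```python
from concurrent.futures import ProcessPoolExecutor
import time

SAMPLES = ["ERR000001", "ERR000002", "ERR000003"]  # hypothetical sample accessions

def process_sample(sample: str) -> str:
    """Placeholder for one per-sample step (e.g., mapping and variant calling)."""
    time.sleep(1)  # stands in for the real, long-running analysis
    return f"{sample}: done"

if __name__ == "__main__":
    # Linear, batched analysis: each sample waits for the previous one.
    for sample in SAMPLES:
        print(process_sample(sample))

    # Parallel analysis: independent samples run concurrently, which is the
    # behaviour the Nextflow wrapper provides on HPC and cloud executors.
    with ProcessPoolExecutor(max_workers=3) as pool:
        for result in pool.map(process_sample, SAMPLES):
            print(result)
```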
(This article belongs to the Special Issue Mycobacterial Research)

26 pages, 2602 KB  
Article
A Big Data Pipeline Approach for Predicting Real-Time Pandemic Hospitalization Risk
by Vishnu S. Pendyala, Mayank Kapadia, Basanth Periyapatnaroopakumar, Manav Anandani and Nischitha Nagendran
Algorithms 2025, 18(12), 730; https://doi.org/10.3390/a18120730 - 21 Nov 2025
Viewed by 878
Abstract
Pandemics emphasize the importance of real-time, interpretable clinical decision-support systems for identifying high-risk patients and assisting with prompt triage, particularly in data-intensive healthcare systems. This paper describes a novel dual big-data pipeline that includes (i) a streaming module for real-time epidemiological hospitalization risk prediction and (ii) a supplementary imaging-based detection and reasoning module for chest X-rays, with COVID-19 as an example. The first pipeline uses state-of-the-art machine learning algorithms to estimate patient-level hospitalization risk based on data from the Centers for Disease Control and Prevention’s (CDC) COVID-19 Case Surveillance dataset. A Bloom filter accelerated triage by constant-time pre-screening of high-risk profiles. Specifically, after significant experimentation and optimization, one of the models, XGBoost, was selected because it achieved the best minority-class F1-score (0.76) and recall (0.80), outperforming baseline models. Synthetic data generation was employed to mimic streaming workloads, including a strategy that used the Conditional Tabular Generative Adversarial Network (CTGAN) to produce the most balanced and realistic distributions. The second pipeline focuses on diagnostic imaging and combines an advanced convolutional neural network, EfficientNet-B0, with Grad-CAM visual explanations, achieving 99.5% internal and 99.3% external accuracy. A lightweight Generative Pre-trained Transformer (GPT)-based reasoning layer converts model predictions into auditable triage comments (ALERT/FLAG/LOG), yielding traceable and interpretable decision logs. This scalable, explainable, and near-real-time framework provides a foundation for future multimodal and genomic advancements in public health readiness. Full article
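The combination of a Bloom filter for constant-time pre-screening with an XGBoost risk model can be sketched as follows; the feature encoding, filter parameters, and toy data are illustrative assumptions rather than the paper's configuration.

```python
import hashlib

import numpy as np
from xgboost import XGBClassifier  # pip install xgboost

class BloomFilter:
    """Minimal Bloom filter: constant-time membership test with no false negatives."""
    def __init__(self, size: int = 10_000, hashes: int = 3):
        self.size, self.hashes = size, hashes
        self.bits = np.zeros(size, dtype=bool)

    def _positions(self, item: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))

# Pre-screen: known high-risk profiles (hypothetical age-group/comorbidity encodings).
high_risk = BloomFilter()
for profile in ["65+|yes", "50-64|yes"]:
    high_risk.add(profile)

# Risk model: a small XGBoost classifier on toy tabular features (age, comorbidity flag).
X = np.array([[70, 1], [30, 0], [55, 1], [25, 0], [80, 1], [40, 0]])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = hospitalized
model = XGBClassifier(n_estimators=20, max_depth=2, eval_metric="logloss").fit(X, y)

incoming = {"age_group": "65+", "comorbidity": "yes", "features": [72, 1]}
if f'{incoming["age_group"]}|{incoming["comorbidity"]}' in high_risk:
    print("ALERT: pre-screened as high risk")  # constant-time path
print("hospitalization probability:", model.predict_proba([incoming["features"]])[0, 1])
```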

72 pages, 1461 KB  
Systematic Review
LLMs for Cybersecurity in the Big Data Era: A Comprehensive Review of Applications, Challenges, and Future Directions
by Aristeidis Karras, Leonidas Theodorakopoulos, Christos Karras, Alexandra Theodoropoulou, Ioanna Kalliampakou and Gerasimos Kalogeratos
Information 2025, 16(11), 957; https://doi.org/10.3390/info16110957 - 4 Nov 2025
Cited by 7 | Viewed by 7191
Abstract
This paper presents a systematic review of research (2020–2025) on the role of Large Language Models (LLMs) in cybersecurity, with emphasis on their integration into Big Data infrastructures. Based on a curated corpus of 235 peer-reviewed studies, this review synthesizes evidence across multiple domains to evaluate how models such as GPT-4, BERT, and domain-specific variants support threat detection, incident response, vulnerability assessment, and cyber threat intelligence. The findings confirm that LLMs, particularly when coupled with scalable Big Data pipelines, improve detection accuracy and reduce response latency compared with traditional approaches. However, challenges persist, including adversarial susceptibility, risks of data leakage, computational overhead, and limited transparency. The contribution of this study lies in consolidating fragmented research into a unified taxonomy, identifying sector-specific gaps, and outlining future research priorities: enhancing robustness, mitigating bias, advancing explainability, developing domain-specific models, and optimizing distributed integration. In doing so, this review provides a structured foundation for both academic inquiry and practical adoption of LLM-enabled cyberdefense strategies. The last literature search was conducted on 30 April 2025; the review followed the PRISMA 2020 methodology, risk of bias was assessed, and random-effects syntheses were conducted. Full article
(This article belongs to the Special Issue IoT, AI, and Blockchain: Applications, Security, and Perspectives)

30 pages, 4273 KB  
Article
Scalable Predictive Modeling for Hospitalization Prioritization: A Hybrid Batch–Streaming Approach
by Nisrine Berros, Youness Filaly, Fatna El Mendili and Younes El Bouzekri El Idrissi
Big Data Cogn. Comput. 2025, 9(11), 271; https://doi.org/10.3390/bdcc9110271 - 25 Oct 2025
Cited by 1 | Viewed by 1281
Abstract
Healthcare systems worldwide have faced unprecedented pressure during crises such as the COVID-19 pandemic, exposing limits in managing scarce hospital resources. Many predictive models remain static, unable to adapt to new variants, shifting conditions, or diverse patient populations. This work proposes a dynamic prioritization framework that recalculates severity scores in batch mode when new factors appear and applies them instantly through a streaming pipeline to incoming patients. Unlike approaches focused only on fixed mortality or severity risks, our model integrates dual datasets (survivors and non-survivors) to refine feature selection and weighting, enhancing robustness. Built on a big data infrastructure (Spark/Databricks), it ensures scalability and responsiveness, even with millions of records. Experimental results confirm the effectiveness of this architecture: The artificial neural network (ANN) achieved 98.7% accuracy, with higher precision and recall than traditional models, while random forest and logistic regression also showed strong AUC values. Additional tests, including temporal validation and real-time latency simulation, demonstrated both stability over time and feasibility for deployment in near-real-world conditions. By combining adaptability, robustness, and scalability, the proposed framework offers a methodological contribution to healthcare analytics, supporting fair and effective hospitalization prioritization during pandemics and other public health emergencies. Full article
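A hybrid batch–streaming layout of the kind described can be pictured with PySpark Structured Streaming: a batch job maintains per-factor severity weights while a streaming job applies them to incoming patient records. The schema, weights, and paths below are placeholder assumptions, not the study's implementation.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("hybrid-prioritization").getOrCreate()

# Batch side: recompute severity weights when new factors appear (placeholder values).
weights = spark.createDataFrame(
    [("age_65_plus", 3), ("diabetes", 2), ("obesity", 1)],
    ["factor", "weight"],
)

# Streaming side: score incoming patients as their records arrive.
schema = StructType([
    StructField("patient_id", StringType()),
    StructField("factor", StringType()),
    StructField("count", IntegerType()),
])
patients = spark.readStream.schema(schema).json("/tmp/incoming_patients")  # hypothetical path

scored = (
    patients.join(weights, on="factor", how="left")          # stream-static join
    .groupBy("patient_id")
    .agg(F.sum(F.col("count") * F.coalesce(F.col("weight"), F.lit(0))).alias("severity"))
)

query = scored.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()  # keeps the streaming scorer running
```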

30 pages, 2440 KB  
Article
Adaptive Segmentation and Statistical Analysis for Multivariate Big Data Forecasting
by Desmond Fomo and Aki-Hiro Sato
Big Data Cogn. Comput. 2025, 9(11), 268; https://doi.org/10.3390/bdcc9110268 - 24 Oct 2025
Cited by 1 | Viewed by 1377
Abstract
Forecasting high-volume, univariate, and multivariate longitudinal data streams is a critical challenge in Big Data systems, especially with constrained computational resources and pronounced data variability. However, existing approaches often neglect the multivariate statistical complexity (e.g., covariance, skewness, kurtosis) of multivariate time series or rely on recency-only windowing that discards informative historical fluctuation patterns, limiting robustness under strict resource budgets. This work makes two core contributions to big data forecasting. First, we establish a formal, multi-dimensional framework for quantifying “data bigness” across statistical, computational, and algorithmic complexities, providing a rigorous foundation for analyzing resource-constrained problems. Second, guided by this framework, we extend and validate the Adaptive High-Fluctuation Recursive Segmentation (AHFRS) algorithm for multivariate time series. By incorporating higher-order statistics such as covariance, skewness, and kurtosis, AHFRS improves predictive accuracy under strict computational budgets. We validate the approach in two stages. First, a real-world case study on a univariate Bitcoin time series provides a practical stress test using a Long Short-Term Memory (LSTM) network as a robust baseline. This validation reveals a significant increase in forecasting robustness, with our method reducing the Root Mean Squared Error (RMSE) by more than 76% in a challenging scenario. Second, its generalizability is established on synthetic multivariate data sets in Finance, Retail, and Healthcare using standard statistical models. Across domains, AHFRS consistently outperforms the baselines: in the multivariate simulations, RMSE decreases by up to 62.5% in Finance and Mean Absolute Percentage Error (MAPE) drops by more than 10 percentage points in Healthcare. These results demonstrate that the proposed framework and AHFRS advance the theoretical modeling of data complexity and the design of adaptive, resource-efficient forecasting pipelines for real-world, high-volume data ecosystems. Full article
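The core idea, recursively splitting the history into segments and retaining richer statistics (including skewness and kurtosis) for the most volatile ones, can be sketched as below; the splitting rule and thresholds are illustrative assumptions, not the published AHFRS specification.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def segment_stats(x: np.ndarray) -> dict:
    """Summary of one segment, including higher-order moments."""
    return {
        "mean": float(np.mean(x)),
        "std": float(np.std(x)),
        "skew": float(skew(x)),
        "kurtosis": float(kurtosis(x)),
    }

def recursive_segments(x: np.ndarray, max_depth: int = 3, vol_threshold: float = 1.0):
    """Recursively split high-fluctuation segments; return (slice bounds, stats) pairs."""
    def split(lo: int, hi: int, depth: int, out: list):
        stats = segment_stats(x[lo:hi])
        if depth == 0 or stats["std"] <= vol_threshold or hi - lo < 8:
            out.append(((lo, hi), stats))   # low fluctuation: keep as one segment
        else:
            mid = (lo + hi) // 2            # high fluctuation: refine recursively
            split(lo, mid, depth - 1, out)
            split(mid, hi, depth - 1, out)
    out: list = []
    split(0, len(x), max_depth, out)
    return out

rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(0, 0.2, 64), rng.normal(0, 2.0, 64)])  # calm, then volatile
for bounds, stats in recursive_segments(series):
    print(bounds, {k: round(v, 2) for k, v in stats.items()})
```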

17 pages, 414 KB  
Article
DQMAF—Data Quality Modeling and Assessment Framework
by Razan Al-Toq and Abdulaziz Almaslukh
Information 2025, 16(10), 911; https://doi.org/10.3390/info16100911 - 17 Oct 2025
Viewed by 1897
Abstract
In today’s digital ecosystem, where millions of users interact with diverse online services and generate vast amounts of textual, transactional, and behavioral data, ensuring the trustworthiness of this information has become a critical challenge. Low-quality data—manifesting as incompleteness, inconsistency, duplication, or noise—not only undermines analytics and machine learning models but also exposes unsuspecting users to unreliable services, compromised authentication mechanisms, and biased decision-making processes. Traditional data quality assessment methods, largely based on manual inspection or rigid rule-based validation, cannot cope with the scale, heterogeneity, and velocity of modern data streams. To address this gap, we propose DQMAF (Data Quality Modeling and Assessment Framework), a generalized machine learning–driven approach that systematically profiles, evaluates, and classifies data quality to protect end-users and enhance the reliability of Internet services. DQMAF introduces an automated profiling mechanism that measures multiple dimensions of data quality—completeness, consistency, accuracy, and structural conformity—and aggregates them into interpretable quality scores. Records are then categorized into high, medium, and low quality, enabling downstream systems to filter or adapt their behavior accordingly. A distinctive strength of DQMAF lies in integrating profiling with supervised machine learning models, producing scalable and reusable quality assessments applicable across domains such as social media, healthcare, IoT, and e-commerce. The framework incorporates modular preprocessing, feature engineering, and classification components using Decision Trees, Random Forest, XGBoost, AdaBoost, and CatBoost to balance performance and interpretability. We validate DQMAF on a publicly available Airbnb dataset, showing its effectiveness in detecting and classifying data issues with high accuracy. The results highlight its scalability and adaptability for real-world big data pipelines, supporting user protection, document and text-based classification, and proactive data governance while improving trust in analytics and AI-driven applications. Full article
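The profile-then-classify idea can be illustrated with a few lines of pandas and scikit-learn; the quality dimensions, weights, and thresholds below are placeholder assumptions rather than DQMAF's actual scoring rules.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.DataFrame({
    "price": [120, None, 95, 30, 30, 10_000],          # toy listing records
    "city": ["Paris", "paris", "Rome", "Rome", "Rome", None],
    "reviews": [12, 4, 0, 7, 7, -3],
})

def profile(row: pd.Series) -> pd.Series:
    """Per-record quality dimensions in [0, 1] (assumed definitions and weights)."""
    completeness = row.notna().mean()
    conformity = float(pd.isna(row["reviews"]) or row["reviews"] >= 0)   # no negative counts
    consistency = float(pd.isna(row["city"]) or row["city"] == str(row["city"]).title())
    score = 0.5 * completeness + 0.25 * conformity + 0.25 * consistency
    return pd.Series({"completeness": completeness, "conformity": conformity,
                      "consistency": consistency, "score": score})

features = df.apply(profile, axis=1)
labels = pd.cut(features["score"], bins=[0, 0.6, 0.85, 1.0],
                labels=["low", "medium", "high"], include_lowest=True)

# A supervised model reproduces the labels so new records can be scored at scale.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(features[["completeness", "conformity", "consistency"]], labels)
print(features.assign(quality=labels))
```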
(This article belongs to the Special Issue Machine Learning and Data Mining for User Classification)

38 pages, 913 KB  
Article
Towards the Adoption of Recommender Systems in Online Education: A Framework and Implementation
by Alex Martínez-Martínez, Águeda Gómez-Cambronero, Raul Montoliu and Inmaculada Remolar
Big Data Cogn. Comput. 2025, 9(10), 259; https://doi.org/10.3390/bdcc9100259 - 14 Oct 2025
Cited by 3 | Viewed by 3733
Abstract
The rapid expansion of online education has generated large volumes of learner interaction data, highlighting the need for intelligent systems capable of transforming this information into personalized guidance. Educational Recommender Systems (ERS) represent a key application of big data analytics and machine learning, offering adaptive learning pathways that respond to diverse student needs. For widespread adoption, these systems must align with pedagogical principles while ensuring transparency, interpretability, and seamless integration into Learning Management Systems (LMS). This paper introduces a comprehensive framework and implementation of an ERS designed for platforms such as Moodle. The system integrates big data processing pipelines to support scalability, real-time interaction, and multi-layered personalization, including data collection, preprocessing, recommendation generation, and retrieval. A detailed use case demonstrates its deployment in a real educational environment, underlining both technical feasibility and pedagogical value. Finally, the paper discusses challenges such as data sparsity, learner model complexity, and evaluation of effectiveness, offering directions for future research at the intersection of big data technologies and digital education. By bridging theoretical models with operational platforms, this work contributes to sustainable and data-driven personalization in online learning ecosystems. Full article
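A minimal recommendation step of the kind such a pipeline would run, turning logged learner-resource interactions into "learners who studied X also studied Y" suggestions, might look like the sketch below; the interaction log and the cosine-similarity choice are illustrative assumptions, not the paper's recommender.

```python
import numpy as np
import pandas as pd

# Hypothetical interaction log exported from an LMS such as Moodle.
log = pd.DataFrame({
    "learner":  ["u1", "u1", "u2", "u2", "u3", "u3", "u3"],
    "resource": ["intro", "quiz1", "intro", "video2", "quiz1", "video2", "exam_prep"],
})

# Learner x resource interaction matrix (implicit feedback: 1 = interacted).
matrix = pd.crosstab(log["learner"], log["resource"]).astype(float)

# Item-item cosine similarity.
norms = np.linalg.norm(matrix.values, axis=0)
sim = pd.DataFrame(matrix.values.T @ matrix.values / np.outer(norms, norms),
                   index=matrix.columns, columns=matrix.columns)

def recommend(learner: str, k: int = 2) -> list:
    """Score unseen resources by similarity to the learner's history."""
    seen = matrix.columns[matrix.loc[learner] > 0]
    scores = sim[seen].sum(axis=1).drop(labels=seen)
    return scores.sort_values(ascending=False).head(k).index.tolist()

print(recommend("u1"))   # resources u1 has not used yet, ranked by similarity
```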

23 pages, 1714 KB  
Article
Harnessing Digital Marketing Analytics for Knowledge-Driven Digital Transformation in the Hospitality Industry
by Dimitrios P. Reklitis, Marina C. Terzi, Damianos P. Sakas and Panagiotis Reklitis
Information 2025, 16(10), 868; https://doi.org/10.3390/info16100868 - 7 Oct 2025
Cited by 3 | Viewed by 3167
Abstract
In the digitally saturated hospitality environment, research on digital transformation remains dominated by macro-level adoption trends and user-generated content, while the potential of micro-level web-behavioural data remains largely untapped. Recent systematic reviews highlight a fragmented body of literature and note that hospitality studies seldom address first-party behavioural data or big-data analytics capabilities. To address this gap, we collected clickstream, navigation and booking-funnel data from five luxury hotels in the Mediterranean and employed big-data analytics integrated with simulation modelling—specifically fuzzy cognitive mapping (FCM)—to model causal relationships among digital touchpoints, managerial actions and customer outcomes. FCM is a robust simulation tool that captures stakeholder knowledge and causal influences across complex systems. Using a case-study methodology, we show that first-party behavioural data enable real-time insights, support knowledge-based decision-making and drive digital service innovation. Across a 12-month panel, visitor volume was strongly associated with search traffic and social traffic, with the total-visitors model explaining 99.8% of variance. Our findings extend digital-transformation models by embedding micro-level behavioural data flows and simulation modelling. Practically, this study offers a replicable framework that helps managers integrate web-analytics into decision-making and customer-centric innovation. Overall, embedding micro-level web-behavioural analytics within an FCM framework yields a decision-ready, replicable pipeline that translates behavioural evidence into high-leverage managerial interventions. Full article
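A fuzzy cognitive map is, at heart, a weighted causal graph iterated to a steady state; the concepts and weights in the sketch below are invented for illustration and are not the study's estimated map.

```python
import numpy as np

concepts = ["search_traffic", "social_traffic", "site_visits", "bookings"]

# W[i, j]: causal influence of concept i on concept j (hypothetical values).
W = np.array([
    [0.0, 0.0, 0.6, 0.0],   # search traffic -> site visits
    [0.0, 0.0, 0.4, 0.0],   # social traffic -> site visits
    [0.0, 0.0, 0.0, 0.7],   # site visits    -> bookings
    [0.0, 0.0, 0.0, 0.0],
])

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def simulate(activation: np.ndarray, steps: int = 20) -> np.ndarray:
    """Common FCM update rule: A(t+1) = f(A(t) + A(t) @ W)."""
    for _ in range(steps):
        activation = sigmoid(activation + activation @ W)
    return activation

# Scenario: boost search traffic and observe the downstream effect on bookings.
initial = np.array([0.9, 0.2, 0.0, 0.0])
print(dict(zip(concepts, simulate(initial).round(3))))
```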
(This article belongs to the Special Issue Emerging Research in Knowledge Management and Innovation)

72 pages, 22031 KB  
Article
AI-Enabled Sustainable Manufacturing: Intelligent Package Integrity Monitoring for Waste Reduction in Supply Chains
by Mohammad Shahin, Ali Hosseinzadeh and F. Frank Chen
Electronics 2025, 14(14), 2824; https://doi.org/10.3390/electronics14142824 - 14 Jul 2025
Cited by 7 | Viewed by 2976
Abstract
Despite advances in automation, the global manufacturing sector continues to rely heavily on manual package inspection, creating bottlenecks in production and increasing labor demands. Although disruptive technologies such as big data analytics, smart sensors, and machine learning have revolutionized industrial connectivity and strategic decision-making, real-time quality control (QC) on conveyor lines remains predominantly analog. This study proposes an intelligent package integrity monitoring system that integrates waste reduction strategies with both narrow and Generative AI approaches. Narrow AI models were deployed to detect package damage at full line speed, aiming to minimize manual intervention and reduce waste. Using a synthetically generated dataset of 200 paired top-and-side package images, we developed and evaluated 10 distinct detection pipelines combining various algorithms, image enhancements, model architectures, and data processing strategies. Several pipeline variants demonstrated high accuracy, precision, and recall, particularly those utilizing a YOLO v8 segmentation model. Notably, targeted preprocessing increased top-view MobileNetV2 accuracy from chance to 67.5%, advanced feature extractors with full enhancements achieved 77.5%, and a segmentation-based ensemble with feature extraction and binary classification reached 92.5% accuracy. These results underscore the feasibility of deploying AI-driven, real-time QC systems for sustainable and efficient manufacturing operations. Full article
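A segmentation-based damage check along the lines described can be run with the Ultralytics YOLOv8 API roughly as follows; the weights file, image path, class label, and decision rule are placeholders, and the paper's ensemble and preprocessing are not reproduced.

```python
from ultralytics import YOLO  # pip install ultralytics

# A YOLOv8 segmentation model; in the paper's setting this would be fine-tuned
# on labelled package images rather than the generic pretrained checkpoint.
model = YOLO("yolov8n-seg.pt")

results = model.predict(source="package_top_view.jpg", conf=0.25)  # hypothetical image
for result in results:
    if result.masks is None:
        print("no segmented regions found")
        continue
    for cls_id, confidence in zip(result.boxes.cls.tolist(), result.boxes.conf.tolist()):
        label = result.names[int(cls_id)]
        # Placeholder decision rule: flag anything the fine-tuned model labels as damage.
        flagged = "DAMAGED" if label == "damaged_package" else "ok"
        print(f"{label}: conf={confidence:.2f} -> {flagged}")
```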
(This article belongs to the Special Issue Applications of Artificial Intelligence in Intelligent Manufacturing)

25 pages, 1292 KB  
Article
Screening Decommissioned Oil and Gas Pipeline Cleaners Using Big Data Analytics Methods
by Rongguang Li, Junqi Zhao, Ling Sun, Long Jin, Sixun Chen and Lihui Zheng
Energies 2025, 18(13), 3496; https://doi.org/10.3390/en18133496 - 2 Jul 2025
Viewed by 769
Abstract
Traditional methods, such as full-factorial, orthogonal, and empirical experiments, show limited accuracy and efficiency in selecting cleaning agents for decommissioned oil and gas pipelines. They also lack the ability to quantitatively analyze the impact of multiple variables. This study proposes a data-driven optimization approach to address these limitations. Residue samples from six regions, including Dalian and Shenyang, were analyzed for inorganic components using XRD and for organic components using GC. Citric acid was used as a model cleaning agent, and cleaning efficiency was tested under varying temperature, agitation, and contact time. Key variables showed significant correlations with cleaning performance. To further quantify the combined effects of multiple factors, multivariate regression methods such as multiple linear regression and ridge regression were employed to establish predictive models. A weighted evaluation approach was used to identify the optimal model, and a method for inverse prediction was proposed. This study shows that, compared with traditional methods, the data-driven approach improves accuracy by 3.67% and efficiency by 82.5%. By efficiently integrating and analyzing multidimensional data, this method not only enables rapid identification of optimal formulations but also uncovers the underlying relationships and combined effects among variables. It offers a novel strategy for the efficient selection and optimization of cleaning agents for decommissioned oil and gas pipelines, as well as broader chemical systems. Full article
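The regression-plus-inverse-prediction step can be sketched with scikit-learn: fit a ridge model of cleaning efficiency against temperature, agitation, and contact time, then scan a grid of conditions for those predicted to reach a target efficiency. The measurements and ranges below are fabricated placeholders.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Placeholder lab results: temperature (C), agitation (rpm), contact time (min) -> efficiency (%)
X = np.array([[40, 200, 30], [50, 200, 30], [60, 300, 60],
              [70, 300, 60], [60, 400, 90], [80, 400, 90]])
y = np.array([52.0, 60.0, 71.0, 78.0, 80.0, 88.0])

model = Ridge(alpha=1.0).fit(X, y)

# Inverse prediction: scan a grid of conditions and keep those predicted to hit a target.
temps, rpms, times = np.meshgrid(np.arange(40, 81, 5),
                                 np.arange(200, 401, 50),
                                 np.arange(30, 91, 15))
grid = np.column_stack([temps.ravel(), rpms.ravel(), times.ravel()])
predicted = model.predict(grid)

target = 85.0
candidates = grid[predicted >= target]
print(f"{len(candidates)} condition sets predicted to reach >= {target}% efficiency")
print("example:", candidates[0] if len(candidates) else "none")
```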
(This article belongs to the Special Issue Enhanced Oil Recovery: Numerical Simulation and Deep Machine Learning)

31 pages, 2298 KB  
Review
Optical Fiber-Based Structural Health Monitoring: Advancements, Applications, and Integration with Artificial Intelligence for Civil and Urban Infrastructure
by Nikita V. Golovastikov, Nikolay L. Kazanskiy and Svetlana N. Khonina
Photonics 2025, 12(6), 615; https://doi.org/10.3390/photonics12060615 - 16 Jun 2025
Cited by 14 | Viewed by 8869
Abstract
Structural health monitoring (SHM) plays a vital role in ensuring the safety, durability, and performance of civil infrastructure. This review delves into the significant advancements in optical fiber sensor (OFS) technologies such as Fiber Bragg Gratings, Distributed Temperature Sensing, and Brillouin-based systems, which have emerged as powerful tools for enhancing SHM capabilities. Offering high sensitivity, resistance to electromagnetic interference, and real-time distributed monitoring, these sensors present a superior alternative to conventional methods. This paper also explores the integration of OFSs with Artificial Intelligence (AI), which enables automated damage detection, intelligent data analysis, and predictive maintenance. Through case studies across key infrastructure domains, including bridges, tunnels, high-rise buildings, pipelines, and offshore structures, the review demonstrates the adaptability and scalability of these sensor systems. Moreover, the role of SHM is examined within the broader context of civil and urban infrastructure, where IoT connectivity, AI-driven analytics, and big data platforms converge to create intelligent and responsive infrastructure. While challenges remain, such as installation complexity, calibration issues, and cost, ongoing innovation in hybrid sensor networks, low-power systems, and edge computing points to a promising future. This paper offers a comprehensive amalgamation of current progress and future directions, outlining a strategic path for next-generation SHM in resilient urban environments. Full article

28 pages, 2486 KB  
Article
A Framework for Rapidly Prototyping Data Mining Pipelines
by Flavio Corradini, Luca Mozzoni, Marco Piangerelli, Barbara Re and Lorenzo Rossi
Big Data Cogn. Comput. 2025, 9(6), 150; https://doi.org/10.3390/bdcc9060150 - 5 Jun 2025
Viewed by 2860
Abstract
With the advent of Big Data, data mining techniques have become crucial for improving decision-making across diverse sectors, yet applying them demands significant resources and time. Time is critical in industrial contexts, as delays can lead to increased costs, missed opportunities, and reduced competitive advantage. To address this, systems that support prototyping of data mining pipelines can mitigate the risks of failure and resource wastage, especially when experimenting with novel techniques. Moreover, business experts often lack deep technical expertise and need robust support to validate their pipeline designs quickly. This paper presents Rainfall, a novel framework for rapidly prototyping data mining pipelines, developed through collaborative projects with industry. The framework’s requirements stem from a combination of literature review findings, iterative industry engagement, and analysis of existing tools. Rainfall enables the visual programming, execution, monitoring, and management of data mining pipelines, lowering the barrier for non-technical users. Pipelines are composed of configurable nodes that encapsulate functionalities from popular libraries or custom user-defined code, fostering experimentation. The framework is evaluated through a case study and SWOT analysis with INGKA, a large-scale industry partner, alongside usability testing with real users and validation against scenarios from the literature. The paper then underscores the value of industry–academia collaboration in bridging theoretical innovation with practical application. Full article
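The node-based composition model described, configurable nodes wrapping library calls or custom code and chained into an executable pipeline, can be pictured with a minimal sketch such as the following; the class names and example nodes are illustrative and are not Rainfall's actual API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Node:
    """A configurable pipeline step wrapping a library call or custom user code."""
    name: str
    func: Callable[..., Any]
    params: dict = field(default_factory=dict)

    def run(self, data: Any) -> Any:
        return self.func(data, **self.params)

@dataclass
class Pipeline:
    nodes: list

    def run(self, data: Any) -> Any:
        for node in self.nodes:
            data = node.run(data)
            print(f"[{node.name}] -> {data}")   # simple execution monitoring
        return data

# Example: a tiny prototype pipeline over a list of numbers.
pipeline = Pipeline([
    Node("drop_missing", lambda xs: [x for x in xs if x is not None]),
    Node("scale", lambda xs, factor: [x * factor for x in xs], {"factor": 0.1}),
    Node("summarize", lambda xs: {"n": len(xs), "mean": sum(xs) / len(xs)}),
])
pipeline.run([10, None, 20, 30])
```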
