Search Results (583)

Search Parameters:
Keywords = scarce dataset

26 pages, 2560 KiB  
Article
Benchmarking YOLO Models for Marine Search and Rescue in Variable Weather Conditions
by Aysha Alshibli and Qurban Memon
Automation 2025, 6(3), 35; https://doi.org/10.3390/automation6030035 - 2 Aug 2025
Viewed by 94
Abstract
Deep learning with unmanned aerial vehicles (UAVs) is transforming maritime search and rescue (SAR) by enabling rapid object identification in challenging marine environments. This study benchmarks the performance of YOLO models for maritime SAR under diverse weather conditions using the SeaDronesSee and AFO datasets. The results show that while YOLOv7 achieved the highest mAP@50, it struggled with detecting small objects. In contrast, YOLOv10 and YOLOv11 deliver faster inference speeds but compromise slightly on precision. The key challenges discussed include environmental variability, sensor limitations, and scarce annotated data, which can be addressed by such techniques as attention modules and multimodal data fusion. Overall, the research results provide practical guidance for deploying efficient deep learning models in SAR, emphasizing specialized datasets and lightweight architectures for edge devices.
(This article belongs to the Section Intelligent Control and Machine Learning)

18 pages, 9470 KiB  
Article
DCS-ST for Classification of Breast Cancer Histopathology Images with Limited Annotations
by Suxing Liu and Byungwon Min
Appl. Sci. 2025, 15(15), 8457; https://doi.org/10.3390/app15158457 - 30 Jul 2025
Viewed by 238
Abstract
Accurate classification of breast cancer histopathology images is critical for early diagnosis and treatment planning. Yet, conventional deep learning models face significant challenges under limited annotation scenarios due to their reliance on large-scale labeled datasets. To address this, we propose Dynamic Cross-Scale Swin Transformer (DCS-ST), a robust and efficient framework tailored for histopathology image classification with scarce annotations. Specifically, DCS-ST integrates a dynamic window predictor and a cross-scale attention module to enhance multi-scale feature representation and interaction while employing a semi-supervised learning strategy based on pseudo-labeling and denoising to exploit unlabeled data effectively. This design enables the model to adaptively attend to diverse tissue structures and pathological patterns while maintaining classification stability. Extensive experiments on three public datasets (BreakHis, Mini-DDSM, and ICIAR2018) demonstrate that DCS-ST consistently outperforms existing state-of-the-art methods across various magnifications and classification tasks, achieving superior quantitative results and reliable visual classification. Furthermore, empirical evaluations validate its strong generalization capability and practical potential for real-world weakly-supervised medical image analysis.

42 pages, 1300 KiB  
Article
A Hybrid Human-AI Model for Enhanced Automated Vulnerability Scoring in Modern Vehicle Sensor Systems
by Mohamed Sayed Farghaly, Heba Kamal Aslan and Islam Tharwat Abdel Halim
Future Internet 2025, 17(8), 339; https://doi.org/10.3390/fi17080339 - 28 Jul 2025
Viewed by 260
Abstract
Modern vehicles are rapidly transforming into interconnected cyber–physical systems that rely on advanced sensor technologies and pervasive connectivity to support autonomous functionality. Yet, despite this evolution, standardized methods for quantifying cybersecurity vulnerabilities across critical automotive components remain scarce. This paper introduces a novel hybrid model that integrates expert-driven insights with generative AI tools to adapt and extend the Common Vulnerability Scoring System (CVSS) specifically for autonomous vehicle sensor systems. Following a three-phase methodology, the study conducted a systematic review of 16 peer-reviewed sources (2018–2024), applied CVSS version 4.0 scoring to 15 representative attack types, and evaluated four free source generative AI models (ChatGPT, DeepSeek, Gemini, and Copilot) on a dataset of 117 annotated automotive-related vulnerabilities. Expert validation from 10 domain professionals reveals that Light Detection and Ranging (LiDAR) sensors are the most vulnerable (9 distinct attack types), followed by Radio Detection And Ranging (radar) (8) and ultrasonic (6). Network-based attacks dominate (104 of 117 cases), with 92.3% of the dataset exhibiting low attack complexity and 82.9% requiring no user interaction. The most severe attack vectors, as scored by experts using CVSS, include eavesdropping (7.19), Sybil attacks (6.76), and replay attacks (6.35). Evaluation of large language models (LLMs) showed that DeepSeek achieved an F1 score of 99.07% on network-based attacks, while all models struggled with minority classes such as high complexity (e.g., ChatGPT F1 = 0%, Gemini F1 = 15.38%). The findings highlight the potential of integrating expert insight with AI efficiency to deliver more scalable and accurate vulnerability assessments for modern vehicular systems. This study offers actionable insights for vehicle manufacturers and cybersecurity practitioners, aiming to inform strategic efforts to fortify sensor integrity, optimize network resilience, and ultimately enhance the cybersecurity posture of next-generation autonomous vehicles.

27 pages, 4973 KiB  
Article
LSTM-Based River Discharge Forecasting Using Spatially Gridded Input Data
by Kamilla Rakhymbek, Balgaisha Mukanova, Andrey Bondarovich, Dmitry Chernykh, Almas Alzhanov, Dauren Nurekenov, Anatoliy Pavlenko and Aliya Nugumanova
Data 2025, 10(8), 122; https://doi.org/10.3390/data10080122 - 27 Jul 2025
Viewed by 512
Abstract
Accurate river discharge forecasting remains a critical challenge in hydrology, particularly in data-scarce mountainous regions where in situ observations are limited. This study investigated the potential of long short-term memory (LSTM) networks to improve discharge prediction by leveraging spatially distributed reanalysis data. Using the ERA5-Land dataset, we developed an LSTM model that integrates grid-based meteorological inputs and assesses their relative importance. We conducted experiments on two snow-dominated basins with contrasting physiographic characteristics, the Uba River basin in Kazakhstan and the Flathead River basin in the USA, to answer three research questions: (1) whether full-grid input outperforms reduced configurations and models trained on Caravan, (2) the impact of spatial resolution on accuracy and efficiency, and (3) the effect of partial spatial coverage on prediction reliability. Specifically, we compared the full-grid LSTM with a single-cell LSTM, a basin-average LSTM, a Caravan-trained LSTM, and coarser cell aggregations. The results demonstrate that the full-grid LSTM consistently yields the highest forecasting performance, achieving a median Nash–Sutcliffe efficiency of 0.905 for Uba and 0.93 for Middle Fork Flathead, while using coarser grids and random subsets reduces performance. Our findings highlight the critical importance of spatial input richness and provide a reproducible framework for grid selection in flood-prone basins lacking dense observation networks.
(This article belongs to the Special Issue New Progress in Big Earth Data)

10 pages, 2331 KiB  
Article
Early-Stage Melanoma Benchmark Dataset
by Aleksandra Dzieniszewska, Piotr Garbat, Paweł Pietkiewicz and Ryszard Piramidowicz
Cancers 2025, 17(15), 2476; https://doi.org/10.3390/cancers17152476 - 26 Jul 2025
Viewed by 284
Abstract
Background: The early detection of melanoma is crucial for improving patient outcomes, as survival rates decline dramatically with disease progression. Despite significant achievements in deep learning methods for skin lesion analysis, several challenges limit their effectiveness in clinical practice. One of the key issues is the lack of knowledge about the melanoma stage distribution in the training data, raising concerns about the ability of these models to detect early-stage melanoma accurately. Additionally, publicly available datasets that include detailed information on melanoma stage and tumor thickness remain scarce, restricting researchers from developing and benchmarking methods specifically tailored for early diagnosis. Another major limitation is the lack of cross-dataset evaluations. Most deep learning models are tested on the same dataset they were trained on, so they fail to assess their generalization ability when applied to unseen data. This reduces their reliability in real-world clinical settings. Methods: We introduce an early-stage melanoma benchmark dataset to address these issues, featuring images labeled according to T-category based on Breslow thickness. Results: We evaluated several state-of-the-art deep learning models on this dataset and observed a significant drop in performance compared to their results on the ISIC Challenge datasets. Conclusions: This finding highlights the models’ limited capability in detecting early-stage melanoma. This work seeks to advance the development and clinical applicability of automated melanoma diagnostic systems by providing a resource for T-category-specific analysis and supporting cross-dataset evaluation.
(This article belongs to the Special Issue Image Analysis and Machine Learning in Cancers: 2nd Edition)

19 pages, 28897 KiB  
Article
MetaRes-DMT-AS: A Meta-Learning Approach for Few-Shot Fault Diagnosis in Elevator Systems
by Hongming Hu, Shengying Yang, Yulai Zhang, Jianfeng Wu, Liang He and Jingsheng Lei
Sensors 2025, 25(15), 4611; https://doi.org/10.3390/s25154611 - 25 Jul 2025
Viewed by 256
Abstract
Recent advancements in deep learning have spurred significant research interest in fault diagnosis for elevator systems. However, conventional approaches typically require substantial labeled datasets that are often impractical to obtain in real-world industrial environments. This limitation poses a fundamental challenge for developing robust diagnostic models capable of performing reliably under data-scarce conditions. To address this critical gap, we propose MetaRes-DMT-AS (Meta-ResNet with Dynamic Meta-Training and Adaptive Scheduling), a novel meta-learning framework for few-shot fault diagnosis. Our methodology employs Gramian Angular Fields to transform 1D raw sensor data into 2D image representations, followed by episodic task construction through stochastic sampling. During meta-training, the system acquires transferable prior knowledge through optimized parameter initialization, while an adaptive scheduling module dynamically configures support/query sets. Subsequent regularization via prototype networks ensures stable feature extraction. Comprehensive validation using the Case Western Reserve University bearing dataset and proprietary elevator acceleration data demonstrates the framework’s superiority: MetaRes-DMT-AS achieves state-of-the-art few-shot classification performance, surpassing benchmark models by 0.94–1.78% in overall accuracy. For critical few-shot fault categories, particularly emergency stops and severe vibrations, the method delivers significant accuracy improvements of 3–16% and 17–29%, respectively.
(This article belongs to the Special Issue Signal Processing and Sensing Technologies for Fault Diagnosis)

26 pages, 453 KiB  
Article
Trend-Enabled Recommender System with Diversity Enhancer for Crop Recommendation
by Iulia Baraian, Rudolf Erdei, Rares Tamaian, Daniela Delinschi, Emil Marian Pasca and Oliviu Matei
Agriculture 2025, 15(15), 1614; https://doi.org/10.3390/agriculture15151614 - 25 Jul 2025
Viewed by 190
Abstract
Achieving optimal agricultural yields and promoting sustainable farming relies on accurate crop recommendations. However, the applicability of many current systems is limited by their considerable computational requirements and dependence on comprehensive datasets, especially in resource-limited contexts. This paper presents HOLISTIQ RS, a novel crop recommendation system explicitly designed for operation on low-specification hardware and in data-scarce regions. HOLISTIQ RS combines collaborative filtering with a Markov model to predict appropriate crop choices, drawing upon user profiles, regional agricultural data, and past crop performance. Results indicate that HOLISTIQ RS provides a significant increase in recommendation accuracy, achieving a MAP@5 of 0.31 and nDCG@5 of 0.41, outperforming standard collaborative filtering methods (the KNN achieved MAP@5 of 0.28 and nDCG@5 of 0.38, and the ANN achieved MAP@5 of 0.25 and nDCG@5 of 0.35). Significantly, the system also demonstrates enhanced recommendation diversity, achieving an Item Variety (IV@5) of 23%, which is absent in deterministic baselines. Significantly, the system is engineered for reduced energy consumption and can be deployed on low-cost hardware. This provides a feasible and adaptable method for encouraging informed decision-making and promoting sustainable agricultural practices in areas where resources are constrained, with an emphasis on lower energy usage.
(This article belongs to the Section Agricultural Systems and Management)

19 pages, 551 KiB  
Article
Open Energy Data in Spain and Its Contribution to Sustainability: Content and Reuse Potential
by Ricardo Curto-Rodríguez, Rafael Marcos-Sánchez, Alicia Zaragoza-Benzal and Daniel Ferrández
Sustainability 2025, 17(15), 6731; https://doi.org/10.3390/su17156731 - 24 Jul 2025
Viewed by 364
Abstract
This paper presents a study on open energy data in Spain and its contribution to sustainability, analyzing its content and its reuse potential. Since energy plays an important role in the sustainability and economic development of a country or region, energy strategies must be managed through public policies that promote the development of this sector. In this sense, open data is relevant for decision-making in the energy sector, especially in areas such as energy consumption and renewable energy policies. Our research aims to analyze the work of Spain’s autonomous communities in the field of energy information by conducting a population analysis of all datasets tagged in the energy category. After compiling the information and eliminating irrelevant datasets (those that are mislabeled, obsolete, or have a scope less than the level of the autonomous community), it can be seen that the supply is very scarce and that this category is one of the least populated among all existing categories. The typological analysis indicates that information on consumption is the one offering the most datasets, followed, at a short distance, by heterogeneous and difficult-to-classify information and by the set related to energy certificates or audits (the most recurrent, as it is offered only once by the autonomous communities). One of the main findings of the research is the heterogeneity of the initiatives and the significant differences in scores on an indicator created for this purpose. The ranking has taken into account both the existence of information and the quality of reuse, with Catalonia, the Basque Country, and Cantabria being the leaders (with Castilla y León, the performance reaches 60%, so the three remaining communities do not reach 40%). The research concludes with recommendations based on the gaps detected: more data should be published that can drive economic development and environmental sustainability, reduce heterogeneity, and facilitate the use of these data for greater applicability, which will increase the chances that open energy data can contribute more to sustainability.
(This article belongs to the Special Issue Energy Storage, Conversion and Sustainable Management)

18 pages, 3760 KiB  
Article
Transcriptomic Meta-Analysis Unveils Shared Neurodevelopmental Toxicity Pathways and Sex-Specific Transcriptional Signatures of Established Neurotoxicants and Polystyrene Nanoplastics as an Emerging Contaminant
by Wenhao Wang, Yutong Liu, Nanxin Ma, Rui Wang, Lifan Fan, Chen Chen, Qiqi Yan, Zhihua Ren, Xia Ning, Shuting Wei and Tingting Ku
Toxics 2025, 13(8), 613; https://doi.org/10.3390/toxics13080613 - 22 Jul 2025
Viewed by 277
Abstract
Environmental contaminants exhibit heterogeneous neurotoxicity profiles, yet systematic comparisons between legacy neurotoxicants and emerging pollutants remain scarce. To address this gap, we implemented an integrative transcriptome meta-analysis framework that harmonized eight transcriptomic datasets spanning in vivo and in vitro neural models exposed to two legacy neurotoxicants (bisphenol A [BPA], 2,2′,4,4′-tetrabromodiphenyl ether [BDE-47]) and polystyrene nanoplastics (PSNPs) as an emerging contaminant. Our analysis revealed a substantial overlap (68% consistency) in differentially expressed genes (DEGs) between BPA and PSNPs, with shared enrichment in extracellular matrix disruption pathways (e.g., “fibronectin binding” and “collagen binding”, p < 0.05). Network-based toxicogenomic mapping linked all three contaminants to six neurological disorders, with BPA showing the strongest associations with Hepatolenticular Degeneration. Crucially, a sex-stratified analysis uncovered male-specific transcriptional responses to BPA (e.g., lipid metabolism and immune response dysregulation), whereas female models showed no equivalent enrichment. This highlights the sex-specific transcriptional characteristics of BPA exposure. This study establishes a novel computational toxicology workflow that bridges legacy and emerging contaminant research, providing mechanistic insights for chemical prioritization and gender-specific risk assessment.

24 pages, 3409 KiB  
Article
DepressionMIGNN: A Multiple-Instance Learning-Based Depression Detection Model with Graph Neural Networks
by Shiwen Zhao, Yunze Zhang, Yikai Su, Kaifeng Su, Jiemin Liu, Tao Wang and Shiqi Yu
Sensors 2025, 25(14), 4520; https://doi.org/10.3390/s25144520 - 21 Jul 2025
Viewed by 382
Abstract
The global prevalence of depression necessitates the application of technological solutions, particularly sensor-based systems, to augment scarce resources for early diagnostic purposes. In this study, we use benchmark datasets that contain multimodal data including video, audio, and transcribed text. To address depression detection as a chronic long-term disorder reflected by temporal behavioral patterns, we propose a novel framework that segments videos into utterance-level instances using GRU for contextual representation, and then constructs graphs where utterance embeddings serve as nodes connected through dual relationships capturing both chronological development and intermittent relevant information. Graph neural networks are employed to learn multi-dimensional edge relationships and align multimodal representations across different temporal dependencies. Our approach achieves superior performance with an MAE of 5.25 and RMSE of 6.75 on AVEC2014, and CCC of 0.554 and RMSE of 4.61 on AVEC2019, demonstrating significant improvements over existing methods that focus primarily on momentary expressions.

29 pages, 5825 KiB  
Article
BBSNet: An Intelligent Grading Method for Pork Freshness Based on Few-Shot Learning
by Chao Liu, Jiayu Zhang, Kunjie Chen and Jichao Huang
Foods 2025, 14(14), 2480; https://doi.org/10.3390/foods14142480 - 15 Jul 2025
Viewed by 325
Abstract
Deep learning approaches for pork freshness grading typically require large datasets, which limits their practical application due to the high costs associated with data collection. To address this challenge, we propose BBSNet, a lightweight few-shot learning model designed for accurate freshness classification with a limited number of images. BBSNet incorporates a batch channel normalization (BCN) layer to enhance feature distinguishability and employs BiFormer for optimized fine-grained feature extraction. Trained on a dataset of 600 pork images graded by microbial cell concentration, BBSNet achieved an average accuracy of 96.36% in a challenging 5-way 80-shot task. This approach significantly reduces data dependency while maintaining high accuracy, presenting a viable solution for cost-effective real-time pork quality monitoring. This work introduces a novel framework that connects laboratory freshness indicators to industrial applications in data-scarce conditions. Future research will investigate its extension to various food types and optimization for deployment on portable devices.

24 pages, 7521 KiB  
Article
Developing a Remote Sensing-Based Approach for Agriculture Water Accounting in the Amman–Zarqa Basin
by Raya A. Al-Omoush, Jawad T. Al-Bakri, Qasem Abdelal, Muhammad Rasool Al-Kilani, Ibraheem Hamdan and Alia Aljarrah
Water 2025, 17(14), 2106; https://doi.org/10.3390/w17142106 - 15 Jul 2025
Viewed by 460
Abstract
In water-scarce regions such as Jordan, accurate tracking of water flows is critical for informed water management. This study applied the Water Accounting Plus (WA+) framework using open-source remote sensing data from the FAO WaPOR portal to develop agricultural water accounting (AWA) for the Amman–Zarqa Basin (AZB) during 2014–2022. Inflows, outflows, and water consumption were quantified using WaPOR and other open datasets. The results showed a strong correlation between WaPOR precipitation (P) and rainfall station data, while comparisons with other remote sensing sources were weaker. WaPOR evapotranspiration (ET) values were generally lower than those from alternative datasets. To improve classification accuracy, a correction of the WaPOR-derived land cover map was performed. The revised map achieved a producer’s accuracy of 15.9% and a user’s accuracy of 86.6% for irrigated areas. Additionally, ET values over irrigated zones were adjusted, resulting in a fivefold improvement in estimates. These corrections significantly enhanced the reliability of key AWA indicators such as basin closure, ET fraction, and managed fraction. The findings demonstrate that the accuracy of P and ET data strongly affects AWA outputs, particularly the estimation of percolation and beneficial water use. Therefore, calibrating remote sensing data is essential to ensure reliable water accounting, especially in agricultural settings where data uncertainty can lead to misleading conclusions. This study recommends the use of open-source datasets such as WaPOR, combined with field validation and calibration, to improve agricultural water resource assessments and support decision making at basin and national levels.

30 pages, 2389 KiB  
Communication
Beyond Expectations: Anomalies in Financial Statements and Their Application in Modelling
by Roman Blazek and Lucia Duricova
Stats 2025, 8(3), 63; https://doi.org/10.3390/stats8030063 - 15 Jul 2025
Cited by 1 | Viewed by 340
Abstract
The increasing complexity of financial reporting has enabled the implementation of innovative accounting practices that often obscure a company’s actual performance. This project seeks to uncover manipulative behaviours by constructing an anomaly detection model that utilises unsupervised machine learning techniques. We examined a dataset of 149,566 Slovak firms from 2016 to 2023, which included 12 financial parameters. Utilising TwoSteps and K-means clustering in IBM SPSS, we discerned patterns of normative financial activity and computed an abnormality index for each firm. Entities with the most significant deviation from cluster centroids were identified as suspicious. The model attained a silhouette score of 1.0, signifying outstanding clustering quality. We discovered a total of 231 anomalous firms, predominantly concentrated in sectors C (32.47%), G (13.42%), and L (7.36%). Our research indicates that anomaly-based models can markedly enhance the precision of fraud detection, especially in scenarios with scarce labelled data. The model integrates intricate data processing and delivers an exhaustive study of the regional and sectoral distribution of anomalies, thereby increasing its relevance in practical applications.
(This article belongs to the Section Applied Statistics and Machine Learning Methods)

27 pages, 648 KiB  
Article
An Algorithm for Mining Frequent Approximate Subgraphs with Structural and Label Variations in Graph Collections
by Daybelis Jaramillo-Olivares, Jesús Ariel Carrasco-Ochoa and José Francisco Martínez-Trinidad
Appl. Sci. 2025, 15(14), 7880; https://doi.org/10.3390/app15147880 - 15 Jul 2025
Viewed by 235
Abstract
Using graphs as a data structure is a simple way to represent relationships between objects. Consequently, it has raised the need for algorithms to process, analyze, and extract meaningful information from graphs. Therefore, frequent subgraph mining (FSM) algorithms have been reported in the literature to discover interesting, unexpected, and useful patterns in graph databases. Frequent subgraph mining involves discovering subgraphs that appear no less than a user-specified threshold; this can be performed exactly or approximately. Although several algorithms for mining frequent approximate subgraphs exist, mining this type of subgraph in graph collections has scarcely been addressed. Thus, we propose AGCM-SLV, an algorithm for mining frequent approximate subgraphs within a graph collection that allows structural and label variations. Unlike other FSM approaches, our proposed algorithm tracks subgraph occurrences and their structural dissimilarities, allowing user-defined partial similarities between node and edge labels, and captures frequent approximate subgraphs (patterns) that would otherwise be overlooked. Experiments on real-world datasets demonstrate that our algorithm identifies more patterns than the most similar state-of-the-art algorithm with a shorter runtime. We also present experiments in which we add white noise to the graph collection at different levels, revealing that over 99% of the patterns extracted without noise are preserved under noisy conditions, making the proposed algorithm noise-tolerant.

28 pages, 5774 KiB  
Article
Data-Driven Prediction of Polymer Nanocomposite Tensile Strength Through Gaussian Process Regression and Monte Carlo Simulation with Enhanced Model Reliability
by Pavan Hiremath, Subraya Krishna Bhat, Jayashree P. K., P. Krishnananda Rao, Krishnamurthy D. Ambiger, Murthy B. R. N., S. V. Udaya Kumar Shetty and Nithesh Naik
J. Compos. Sci. 2025, 9(7), 364; https://doi.org/10.3390/jcs9070364 - 14 Jul 2025
Viewed by 425
Abstract
This study presents a robust machine learning framework based on Gaussian process regression (GPR) to predict the tensile strength of polymer nanocomposites reinforced with various nanofillers and processed under diverse techniques. A comprehensive dataset comprising 25 polymer matrices, 22 surface functionalization methods, and 24 processing routes was constructed from the literature. GPR, coupled with Monte Carlo sampling across 2000 randomized iterations, was employed to capture nonlinear dependencies and uncertainty propagation within the dataset. The model achieved a mean coefficient of determination (R²) of 0.96, RMSE of 12.14 MPa, MAE of 7.56 MPa, and MAPE of 31.73% over 2000 Monte Carlo iterations, outperforming conventional models such as support vector machine (SVM), regression tree (RT), and artificial neural network (ANN). Sensitivity analysis revealed the dominant influence of Carbon Nanotubes (CNT) weight fraction, matrix tensile strength, and surface modification methods on predictive accuracy. The findings demonstrate the efficacy of the proposed GPR framework for accurate, reliable prediction of composite mechanical properties under data-scarce conditions, supporting informed material design and optimization.
(This article belongs to the Special Issue Characterization and Modelling of Composites, Volume III)