Search Results (6,046)

Search Parameters:
Keywords = automatic evaluation

13 pages, 687 KiB  
Article
Turkish Chest X-Ray Report Generation Model Using the Swin Enhanced Yield Transformer (Model-SEY) Framework
by Murat Ucan, Buket Kaya and Mehmet Kaya
Diagnostics 2025, 15(14), 1805; https://doi.org/10.3390/diagnostics15141805 - 17 Jul 2025
Abstract
Background/Objectives: Extracting meaningful medical information from chest X-ray images and transcribing it into text is a complex task that requires a high level of expertise and directly affects clinical decision-making processes. Automatic reporting systems for this field in Turkish represent an important gap in scientific research, as they have not been sufficiently addressed in the existing literature. Methods: A deep learning-based approach called Model-SEY was developed with the aim of automatically generating Turkish medical reports from chest X-ray images. The Swin Transformer structure was used in the encoder part of the model to extract image features, while the text generation process was carried out using the cosmosGPT architecture, which was adapted specifically for the Turkish language. Results: With the permission of the ethics committee, a new dataset was created using image–report pairs obtained from Elazıg Fethi Sekin City Hospital and the Indiana University Chest X-Ray dataset, and experiments were conducted on this new dataset. In the tests conducted within the scope of the study, scores of 0.6412, 0.5335, 0.4395, 0.4395, 0.3716, and 0.2240 were obtained in BLEU-1, BLEU-2, BLEU-3, BLEU-4, and ROUGE word overlap evaluation metrics, respectively. Conclusions: Quantitative and qualitative analyses of medical reports autonomously generated by the proposed model have shown that they are meaningful and consistent. This work is one of the first studies in the field of autonomous reporting using deep learning architectures specific to the Turkish language, representing an important step forward in this field. It will also reduce potential human errors during diagnosis by supporting doctors in their decision-making.
(This article belongs to the Special Issue Artificial Intelligence for Health and Medicine)
36 pages, 4468 KiB  
Article
Apis mellifera Bee Verification with IoT and Graph Neural Network
by Apolinar Velarde Martínez, Gilberto González Rodríguez and Juan Carlos Estrada Cabral
Appl. Sci. 2025, 15(14), 7969; https://doi.org/10.3390/app15147969 - 17 Jul 2025
Abstract
Automatic recognition systems (ARS) have been proposed in scientific and technological research for the care and preservation of endangered species; these systems, consisting of Internet of Things (IoT) devices and object-recognition techniques with artificial intelligence (AI), have emerged as proposed solutions to detect and prevent parasite attacks on Apis mellifera bees. This article presents a pilot ARS for the recognition and analysis of honeybees at the hive entrance using IoT devices and automatic object-recognition techniques, for the early detection of the Varroa mite in test apiaries. Two object-recognition techniques, namely the k-Nearest Neighbor Algorithm (kNN) and Graph Neural Network (GNN), were evaluated with an image dataset of 600 images from a single beehive. The results of the experiments show the viability of using GNN in real environments. GNN has greater accuracy in bee recognition, but with greater processing time, while the kNN classifier requires fewer processing resources but has lower recognition accuracy.
(This article belongs to the Special Issue Applications of Artificial Intelligence in the IoT)
27 pages, 3817 KiB  
Article
A Deep Learning-Based Diagnostic Framework for Shaft Earthing Brush Faults in Large Turbine Generators
by Katudi Oupa Mailula and Akshay Kumar Saha
Energies 2025, 18(14), 3793; https://doi.org/10.3390/en18143793 - 17 Jul 2025
Abstract
Large turbine generators rely on shaft earthing brushes to safely divert harmful shaft currents to ground, protecting bearings from electrical damage. This paper presents a novel deep learning-based diagnostic framework to detect and classify faults in shaft earthing brushes of large turbine generators. A key innovation lies in the use of FFT-derived spectrograms from both voltage and current waveforms as dual-channel inputs to the CNN, enabling automatic feature extraction of time–frequency patterns associated with different SEB fault types. The proposed framework combines advanced signal processing and convolutional neural networks (CNNs) to automatically recognize fault-related patterns in shaft grounding current and voltage signals. In the approach, raw time-domain signals are converted into informative time–frequency representations, which serve as input to a CNN model trained to distinguish normal and faulty conditions. The framework was evaluated using data from a fleet of large-scale generators under various brush fault scenarios (e.g., increased brush contact resistance, loss of brush contact, worn out brushes, and brush contamination). Experimental results demonstrate high fault detection accuracy (exceeding 98%) and the reliable identification of different fault types, outperforming conventional threshold-based monitoring techniques. The proposed deep learning framework offers a novel intelligent monitoring solution for predictive maintenance of turbine generators. The contributions include the following: (1) the development of a specialized deep learning model for shaft earthing brush fault diagnosis, (2) a systematic methodology for feature extraction from shaft current signals, and (3) the validation of the framework on real-world fault data. This work enables the early detection of brush degradation, thereby reducing unplanned downtime and maintenance costs in power generation facilities.
(This article belongs to the Section F: Electrical Engineering)
35 pages, 2924 KiB  
Article
A Monitoring System for Measuring the Cognitive Cycle via a Continuous Reaction Time Task
by Teodor Ukov, Georgi Tsochev and Radoslav Yoshinov
Systems 2025, 13(7), 597; https://doi.org/10.3390/systems13070597 - 17 Jul 2025
Abstract
The cognitive cycle has been studied via cognitive architectures and by analyzing cognitive experiments. An emerging theoretical approach suggests that several automatic cognitive processes retrieve information, making it available to an internal agent, which in turn decides which information to access. Derived from this view, four phases of the cognitive cycle can be formulated and reproduced within a cognitive monitoring system. This exploratory work presents a new theory, Attention as Internal Action, and proposes a hypothesis about the relationship between an iteration of the cognitive cycle and a conscious motor action. The design of a continuous reaction time task is presented as a tool for quick cognitive evaluation. Via continuously provided user responses, the computational system behind the task adapts triggering stimuli based on the suggested hypothesis. Its software implementation was employed to assess whether a previously conducted simulation of the cognitive cycle’s time range aligned with empirical data. A control group was assigned to perform a separate simple reaction time task in a sequence of five days. The analysis showed that the experimental cognitive monitoring system produced results more closely aligned with the established understanding of the timing of the cognitive cycle than the control task did.
21 pages, 1246 KiB  
Article
Does Control-Related Information Attenuate Biased Self-Control and Moral Perceptions Based on Weight?
by Casey L. Timbs and Heather M. Maranges
Behav. Sci. 2025, 15(7), 970; https://doi.org/10.3390/bs15070970 - 17 Jul 2025
Abstract
Negative weight-based attitudes are pervasive and difficult to change. One reason may be the moralization of weight: if people use higher weight as a cue for lower self-control, they may infer lower moral character, given the strong link between self-control and morality. Moralized attitudes tend to be resistant to change. Accordingly, we tested whether (1) people perceived others with higher (vs. lower) weight as having lower self-control and, in turn, morality and (2) targeting control-related perceptions attenuated the weight → self-control → morality links. To that end, in two preregistered experiments (see OSF), we employed intervention strategies targeting control-related perceptions to increase moral evaluations of higher-weight individuals. Specifically, we provided evidence of a higher-weight person’s (a) weight uncontrollability (Study 1) and (b) high self-control (Study 2). People perceived higher-weight targets as having lower self-control, and this predicted perceptions of lower moral character. However, as with extant weight-based attitude interventions, neither experimental intervention strategy attenuated less positive (i.e., made more positive) moral character perceptions. These findings suggest that it is not enough to intervene on control-related beliefs to reduce the moralization of weight. We suggest intervening on moral perceptions directly and the possibility that moralization of weight may be automatic, requiring interventions targeting automatic attitudes.
21 pages, 447 KiB  
Article
Aerodynamic Design of Wind Turbine Blades Using Multi-Fidelity Analysis and Surrogate Models
by Rosalba Cardamone, Riccardo Broglia, Francesco Papi, Franco Rispoli, Alessandro Corsini, Alessandro Bianchini and Alessio Castorrini
Int. J. Turbomach. Propuls. Power 2025, 10(3), 16; https://doi.org/10.3390/ijtpp10030016 - 16 Jul 2025
Abstract
A standard approach to design begins with scaling up state-of-the-art machines to new target dimensions, moving towards larger rotors with lower specific energy to maximize revenue and enable power production in lower wind speed areas. This trend is particularly crucial in floating offshore wind in the Mediterranean Sea, where the high levelized cost of energy poses significant risks to the sustainability of investments in new projects. In this context, the conventional approach of scaling up machines designed for fixed foundations and strong offshore winds may not be optimal. Additionally, modern large-scale wind turbines for offshore applications face challenges in achieving high aerodynamic performance in thick root regions. This study proposes a holistic optimization framework that combines multi-fidelity analyses and tools to address the new challenges in wind turbine rotor design, accounting for the novel demands of this application. The method is based on a modular optimization framework for the aerodynamic design of a new wind turbine rotor, where the cost function block is defined with the aid of a model reduction strategy. The link between the full-order model required to evaluate the target rotor’s performance, the physical aspects of blade aerodynamics, and the optimization algorithm that needs several evaluations of the cost function is provided by the definition of a surrogate model (SM). An intelligent SM definition strategy is adopted to minimize the computational effort required to build a reliable model of the cost function. The strategy is based on the construction of a self-adaptive, automatic refinement of the training space, while the particular SM is defined by the use of stochastic radial basis functions. The goal of this paper is to describe the new aerodynamic design strategy, its performance, and results, presenting a case study of 15 MW wind turbine blades optimized for specific deepwater sites in the Mediterranean Sea.
59 pages, 11250 KiB  
Article
Automated Analysis of Vertebral Body Surface Roughness for Adult Age Estimation: Ellipse Fitting and Machine-Learning Approach
by Erhan Kartal and Yasin Etli
Diagnostics 2025, 15(14), 1794; https://doi.org/10.3390/diagnostics15141794 - 16 Jul 2025
Abstract
Background/Objectives: Vertebral degenerative features are promising but often subjectively scored indicators for adult age estimation. We evaluated an objective surface roughness metric, the “average distance to the fitted ellipse” score (DS), calculated automatically for every vertebra from C7 to S1 on routine CT images. Methods: CT scans of 176 adults (94 males, 82 females; 21–94 years) were retrospectively analyzed. For each vertebra, the mean orthogonal deviation of the anterior superior endplate from an ideal ellipse was extracted. Sex-specific multiple linear regression served as a baseline; support vector regression (SVR), random forest (RF), k-nearest neighbors (k-NN), and Gaussian naïve-Bayes pseudo-regressor (GNB-R) were tuned with 10-fold cross-validation and evaluated on a 20% hold-out set. Performance was quantified with the standard error of the estimate (SEE). Results: DS values correlated moderately to strongly with age (peak r = 0.60 at L3–L5). Linear regression explained 40% (males) and 47% (females) of age variance (SEE ≈ 11–12 years). Non-parametric learners improved precision: RF achieved an SEE of 8.49 years in males (R² = 0.47), whereas k-NN attained 10.8 years (R² = 0.45) in females. Conclusions: Automated analysis of vertebral cortical roughness provides a transparent, observer-independent means of estimating adult age with accuracy approaching that of more complex deep learning pipelines. Streamlining image preparation and validating the approach across diverse populations are the next steps toward forensic adoption.
(This article belongs to the Special Issue New Advances in Forensic Radiology and Imaging)
36 pages, 8048 KiB  
Article
Characterization and Automated Classification of Underwater Acoustic Environments in the Western Black Sea Using Machine Learning Techniques
by Maria Emanuela Mihailov
J. Mar. Sci. Eng. 2025, 13(7), 1352; https://doi.org/10.3390/jmse13071352 - 16 Jul 2025
Abstract
Growing concern over anthropogenic underwater noise, highlighted by initiatives like the Marine Strategy Framework Directive (MSFD) and its Technical Group on Underwater Noise (TG Noise), emphasizes regions like the Western Black Sea, where increasing activities threaten marine habitats. This region is experiencing rapid growth in maritime traffic and resource exploitation, which is intensifying concerns over the noise impacts on its unique marine habitats. While machine learning offers promising solutions, a research gap persists in comprehensively evaluating diverse ML models within an integrated framework for complex underwater acoustic data, particularly concerning real-world data limitations like class imbalance. This paper addresses this by presenting a multi-faceted framework using passive acoustic monitoring (PAM) data from fixed locations (50–100 m depth). Acoustic data are processed using advanced signal processing (broadband Sound Pressure Level (SPL), Power Spectral Density (PSD)) for feature extraction (Mel-spectrograms for deep learning; PSD statistical moments for classical/unsupervised ML). The framework evaluates Convolutional Neural Networks (CNNs), Random Forest, and Support Vector Machines (SVMs) for noise event classification, alongside Gaussian Mixture Models (GMMs) for anomaly detection. Our results demonstrate that the CNN achieved the highest classification accuracy of 0.9359, significantly outperforming Random Forest (0.8494) and SVM (0.8397) on the test dataset. These findings emphasize the capability of deep learning in automatically extracting discriminative features, highlighting its potential for enhanced automated underwater acoustic monitoring.
(This article belongs to the Section Ocean Engineering)
23 pages, 1759 KiB  
Article
Discriminating Children with Speech Sound Disorders from Children with Typically Developing Speech Using the Motor Speech Hierarchy Probe Words: A Preliminary Analysis of Mandibular Control
by Linda Orton, Richard Palmer, Roslyn Ward, Petra Helmholz, Geoffrey R. Strauss, Paul Davey and Neville W. Hennessey
Diagnostics 2025, 15(14), 1793; https://doi.org/10.3390/diagnostics15141793 - 16 Jul 2025
Abstract
Background/Objectives: The Motor Speech Hierarchy (MSH) Probe Words (PWs) have yet to be validated as effective in discriminating between children with impaired and children with typically developing speech motor control. This preliminary study first examined the effectiveness of the mandibular control subtest of the MSH-PWs in distinguishing between typically developing (TD) and speech sound-disordered (SSD) children aged between 3 years 0 months and 3 years 6 months. Secondly, we compared automatically derived kinematic measures of jaw range and control with MSH-PW consensus scoring to assist in identifying deficits in mandibular control. Methods: Forty-one children with TD speech and 13 with SSD produced the 10 words of the mandibular stage of the MSH-PWs. A consensus team of speech pathologists observed video recordings of the words to score motor speech control and phonetic accuracy, as detailed in the MSH-PW scoring criteria. Specific measures of jaw and lip movements during speech were also extracted to derive the objective measurements, with agreement between the perceptual and objective measures of jaw range and jaw control evaluated. Results: A significant difference between TD and SSD groups was found for jaw range (p = 0.006), voicing transitions (p = 0.004) and total mandibular scores (p = 0.015). SSD and TD group discrimination was significant (at alpha = 0.01) with a balanced classification accuracy of 0.79. Initial analysis indicates objective kinematic measures using facial tracking show good agreement with perceptual judgements of jaw range and jaw control. Conclusions: The preliminary data indicate the MSH-PWs can discriminate TD speech from SSD at the level of mandibular control and can be used by clinicians to assess motor speech control. Further investigation of objective measures to support perceptual scoring is indicated.
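The abstract above reports a balanced classification accuracy for groups of unequal size (41 TD, 13 SSD). As a hedged illustration of why that metric is preferred over plain accuracy for such groups, the sketch below uses the common definition of balanced accuracy as the mean of sensitivity and specificity. The function name and the predictions are hypothetical and are not taken from the study; only the group sizes come from the abstract.

```python
def balanced_accuracy(y_true, y_pred, positive):
    """Mean of sensitivity and specificity for a two-class problem."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2


# Group sizes from the abstract; the predictions are made up for illustration.
y_true = ["TD"] * 41 + ["SSD"] * 13
always_td = ["TD"] * 54  # a trivial classifier that never flags SSD

plain = sum(t == p for t, p in zip(y_true, always_td)) / len(y_true)
print(round(plain, 2))                                        # about 0.76
print(balanced_accuracy(y_true, always_td, positive="SSD"))   # 0.5
```

A classifier that ignores the smaller group looks acceptable on plain accuracy but scores at chance level on balanced accuracy, which is the reason the metric suits imbalanced samples.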
25 pages, 6057 KiB  
Article
Physical Implementation and Experimental Validation of the Compensation Mechanism for a Ramp-Based AUV Recovery System
by Zhaoji Qi, Lingshuai Meng, Haitao Gu, Ziyang Guo, Jinyan Wu and Chenghui Li
J. Mar. Sci. Eng. 2025, 13(7), 1349; https://doi.org/10.3390/jmse13071349 - 16 Jul 2025
Abstract
In complex marine environments, ramp-based recovery systems for autonomous underwater vehicles (AUVs) often encounter engineering challenges such as reduced docking accuracy and success rate due to disturbances in the capture window attitude. In this study, a desktop-scale physical experimental platform for recovery compensation was designed and constructed. The system integrates attitude feedback provided by an attitude sensor and dual-motor actuation to achieve active roll and pitch compensation of the capture window. Based on the structural and geometric characteristics of the platform, a dual-channel closed-loop control strategy was proposed utilizing midpoint tracking of the capture window, accompanied by multi-level software limit protection and automatic centering mechanisms. The control algorithm was implemented using a discrete-time PID structure, with gain parameters optimized through experimental tuning under repeatable disturbance conditions. A first-order system approximation was adopted to model the actuator dynamics. Experiments were conducted under various disturbance scenarios and multiple control parameter configurations to evaluate the attitude tracking performance, dynamic response, and repeatability of the system. The results show that, compared to the uncompensated case, the proposed compensation mechanism reduces the MSE by up to 76.4% and the MaxAE by 73.5%, significantly improving the tracking accuracy and dynamic stability of the recovery window. The study also discusses the platform’s limitations and future optimization directions, providing theoretical and engineering references for practical AUV recovery operations.
(This article belongs to the Section Coastal Engineering)
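The abstract above mentions a discrete-time PID controller, a first-order actuator model, and software limits. A minimal single-channel sketch of that general combination follows. It is not the authors' implementation: the gains, time constant, sample time, and output limit are hypothetical values chosen only so the loop is easy to follow, and the plant is a forward-Euler discretization of tau * dy/dt + y = gain * u.

```python
class DiscretePID:
    """Textbook discrete-time PID with a symmetric output limit."""

    def __init__(self, kp, ki, kd, dt, u_limit):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt = dt
        self.u_limit = u_limit      # software limit on the command
        self.integral = 0.0
        self.prev_error = None      # no derivative on the first sample

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp the command to the allowed range.
        return max(-self.u_limit, min(self.u_limit, u))


def simulate(steps=3000, dt=0.01, tau=0.5, gain=1.0, setpoint=1.0):
    """Drive a first-order plant toward the setpoint; return the final output."""
    pid = DiscretePID(kp=2.0, ki=1.0, kd=0.05, dt=dt, u_limit=10.0)  # hypothetical gains
    y = 0.0
    for _ in range(steps):
        u = pid.update(setpoint, y)
        y += dt / tau * (gain * u - y)  # forward-Euler step of the plant
    return y


print(simulate())
```

A two-channel roll and pitch setup would, under the same assumptions, run one such controller per axis; the abstract does not state how the channels are coupled, so that is left out here.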
20 pages, 2382 KiB  
Article
Heterogeneity-Aware Personalized Federated Neural Architecture Search
by An Yang and Ying Liu
Entropy 2025, 27(7), 759; https://doi.org/10.3390/e27070759 - 16 Jul 2025
Abstract
Federated learning (FL), which enables collaborative learning across distributed nodes, confronts a significant heterogeneity challenge, primarily including resource heterogeneity induced by different hardware platforms, and statistical heterogeneity originating from non-IID private data distributions among clients. Neural architecture search (NAS), particularly one-shot NAS, holds great promise for automatically designing optimal personalized models tailored to such heterogeneous scenarios. However, the coexistence of both resource and statistical heterogeneity destabilizes the training of the one-shot supernet, impairs the evaluation of candidate architectures, and ultimately hinders the discovery of optimal personalized models. To address this problem, we propose a heterogeneity-aware personalized federated NAS (HAPFNAS) method. First, we leverage lightweight knowledge models to distill knowledge from clients to the server-side supernet, thereby effectively mitigating the effects of heterogeneity and enhancing the training stability. Then, we build random-forest-based personalized performance predictors to enable the efficient evaluation of candidate architectures across clients. Furthermore, we develop a model-heterogeneous FL algorithm called heteroFedAvg to facilitate collaborative model training for the discovered personalized models. Comprehensive experiments on CIFAR-10/100 and Tiny-ImageNet classification datasets demonstrate the effectiveness of our HAPFNAS, compared to state-of-the-art federated NAS methods.
(This article belongs to the Section Signal and Data Analysis)
19 pages, 6796 KiB  
Article
Performance Assessment of Advanced Daily Surface Soil Moisture Products in China for Sustainable Land and Water Management
by Dai Chen, Zhounan Dong and Jingnan Chen
Sustainability 2025, 17(14), 6482; https://doi.org/10.3390/su17146482 - 15 Jul 2025
Abstract
This study evaluates the performance of nine satellite and model-based daily surface soil moisture products, encompassing sixteen algorithm versions, across mainland China to support sustainable land and water management. The assessment utilizes 2018 in situ measurements from over 2400 stations in China’s Automatic Soil Moisture Monitoring Network. All products were standardized to a 0.25° × 0.25° grid in the WGS-84 coordinate system through reprojection and resampling for consistent comparison. Daily averaged station observations were matched to product pixels using a 10 km radius buffer, with the mean station value as the reference for each time series after rigorous quality control. Results reveal distinct performance rankings, with SMAP-based products, particularly the SMAP_IB descending orbit variant, achieving the lowest unbiased root mean square deviation (ubRMSD) and highest correlation with in situ data. Blended products like ESA CCI and NOAA SMOPS, alongside reanalysis datasets such as ERA5 and MERRA2, outperformed SMOS and China’s FY3 products. The SoMo.ml product showed the broadest spatial coverage and strong temporal consistency, while FY3-based products showed limitations in spatial reliability and seasonal dynamics capture. These findings provide critical insights for selecting appropriate soil moisture datasets to enhance sustainable agricultural practices, optimize water resource allocation, monitor ecosystem resilience, and support climate adaptation strategies, thereby advancing sustainable development across diverse geographical regions in China.
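The ranking above relies on the unbiased root mean square deviation (ubRMSD). As a hedged illustration, the sketch below uses the standard definition, in which each series has its own mean removed before the deviation is computed, so that ubRMSD² = RMSD² − bias². The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def bias(product, reference):
    """Mean difference between product and reference."""
    return sum(p - r for p, r in zip(product, reference)) / len(product)


def rmsd(product, reference):
    """Root mean square deviation, including any systematic offset."""
    return math.sqrt(
        sum((p - r) ** 2 for p, r in zip(product, reference)) / len(product)
    )


def ubrmsd(product, reference):
    """RMSD of the anomalies, i.e. after removing each series' mean."""
    if len(product) != len(reference) or not product:
        raise ValueError("series must be non-empty and of equal length")
    n = len(product)
    mean_p = sum(product) / n
    mean_r = sum(reference) / n
    squared = sum(
        ((p - mean_p) - (r - mean_r)) ** 2 for p, r in zip(product, reference)
    )
    return math.sqrt(squared / n)


# Made-up daily soil moisture values (m^3/m^3) for one pixel and its stations.
reference = [0.15, 0.21, 0.24, 0.18]
product = [0.20, 0.25, 0.30, 0.22]

print(bias(product, reference), rmsd(product, reference), ubrmsd(product, reference))
```

Because a constant offset cancels out of ubRMSD, a product that is uniformly too wet or too dry can still rank well on this metric, which is why it is usually reported together with bias and correlation.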
27 pages, 3562 KiB  
Article
Automated Test Generation and Marking Using LLMs
by Ioannis Papachristou, Grigoris Dimitroulakos and Costas Vassilakis
Electronics 2025, 14(14), 2835; https://doi.org/10.3390/electronics14142835 - 15 Jul 2025
Abstract
This paper presents an innovative exam-creation and grading system powered by advanced natural language processing and local large language models. The system automatically generates clear, grammatically accurate questions from both short passages and longer documents across different languages, supports multiple formats and difficulty levels, and ensures semantic diversity while minimizing redundancy, thus maximizing the percentage of the material that is covered in the generated exam paper. For grading, it employs a semantic-similarity model to evaluate essays and open-ended responses, awards partial credit, and mitigates bias from phrasing or syntax via named entity recognition. A major advantage of the proposed approach is its ability to run entirely on standard personal computers, without specialized artificial intelligence hardware, promoting privacy and exam security while maintaining low operational and maintenance costs. Moreover, its modular architecture allows the seamless swapping of models with minimal intervention, ensuring adaptability and the easy integration of future improvements. A requirements–compliance evaluation, combined with established performance metrics, was used to review and compare two popular multilingual LLMs and monolingual alternatives, demonstrating the system’s effectiveness and flexibility. The experimental results show that the system achieves a grading accuracy within a 17% normalized error margin compared to that of human experts, with generated questions reaching up to 89.5% semantic similarity to source content. The full exam generation and grading pipeline runs efficiently on consumer-grade hardware, with average inference times under 30 s.
16 pages, 2721 KiB  
Article
An Adapter and Segmentation Network-Based Approach for Automated Atmospheric Front Detection
by Xinya Ding, Xuan Peng, Yanguang Xue, Liang Zhang, Tianying Wang and Yunpeng Zhang
Appl. Sci. 2025, 15(14), 7855; https://doi.org/10.3390/app15147855 - 14 Jul 2025
Abstract
This study presents AD-MRCNN, an advanced deep learning framework for automated atmospheric front detection that addresses two critical limitations in existing methods. First, current approaches directly input raw meteorological data without optimizing feature compatibility, potentially hindering model performance. Second, they typically only provide frontal category information without identifying individual frontal systems. Our solution integrates two key innovations: 1. An intelligent adapter module that performs adaptive feature fusion, automatically weighting and combining multi-source meteorological inputs (including temperature, wind fields, and humidity data) to maximize their synergistic effects while minimizing feature conflicts; the utilized network achieves an average improvement of over 4% across various metrics. 2. An enhanced instance segmentation network based on Mask R-CNN architecture that simultaneously achieves (1) precise frontal type classification (cold/warm/stationary/occluded), (2) accurate spatial localization, and (3) identification of distinct frontal systems. Comprehensive evaluation using ERA5 reanalysis data (2009–2018) demonstrates significant improvements, including an 85.1% F1-score, outperforming traditional methods (TFP: 63.1%) and deep learning approaches (Unet: 83.3%), and a 31% reduction in false alarms compared to semantic segmentation methods. The framework’s modular design allows for potential application to other meteorological feature detection tasks. Future work will focus on incorporating temporal dynamics for frontal evolution prediction.
16 pages, 396 KiB  
Article
Investigating Reproducibility Challenges in LLM Bugfixing on the HumanEvalFix Benchmark
by Balázs Szalontai, Balázs Márton, Balázs Pintér and Tibor Gregorics
Software 2025, 4(3), 17; https://doi.org/10.3390/software4030017 - 14 Jul 2025
Abstract
Benchmark results for large language models often show inconsistencies across different studies. This paper investigates the challenges of reproducing these results in automatic bugfixing using LLMs, on the HumanEvalFix benchmark. To determine the cause of the differing results in the literature, we attempted to reproduce a subset of them by evaluating 12 models in the DeepSeekCoder, CodeGemma, CodeLlama, and WizardCoder model families, in different sizes and tunings. A total of 35 unique results were reported for these models across studies, of which we successfully reproduced 12. We identified several relevant factors that influenced the results. The base models can be confused with their instruction-tuned variants, making their results better than expected. Incorrect prompt templates or generation length can decrease benchmark performance, as can 4-bit quantization. Using sampling instead of greedy decoding can increase the variance, especially with higher temperature values. We found that precision and 8-bit quantization have less influence on benchmark results.
(This article belongs to the Topic Applications of NLP, AI, and ML in Software Engineering)
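The abstract above notes that sampling increases run-to-run variance relative to greedy decoding, especially at higher temperatures. The toy sketch below illustrates that general mechanism on a single next-token choice. It is not the benchmark's evaluation code: the logits are invented, and the function names are hypothetical.

```python
import math
import random


def softmax(logits, temperature):
    """Temperature-scaled softmax; higher temperature flattens the distribution."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def greedy(logits):
    """Greedy decoding: always pick the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample(logits, temperature, rng):
    """Sampling: draw a token index from the temperature-scaled distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [1.0, 3.0, 2.0]  # invented scores for three candidate tokens
rng = random.Random(0)

print(greedy(logits))  # deterministic across runs
for temperature in (0.2, 1.0, 2.0):
    draws = [sample(logits, temperature, rng) for _ in range(1000)]
    share_of_top = draws.count(greedy(logits)) / len(draws)
    print(temperature, share_of_top)
```

Greedy decoding returns the same token on every run, while the share of draws that agree with it shrinks as the temperature rises; repeated over many generated tokens, this is one plausible source of the variance the abstract describes.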