Search Results (213)

Search Parameters:
Keywords = overconfidence

34 pages, 699 KB  
Article
ChatGPT at University: The Definitive Transition from Adoption to Quality of Student Interaction
by Angel Deroncele-Acosta, María de los Ángeles Sánchez-Trujillo, Madeleine Lourdes Palacios-Núñez, Paul Neira Del Ben, Carlos Alberto Atúncar-Prieto and Edith Soria-Valencia
Educ. Sci. 2026, 16(4), 515; https://doi.org/10.3390/educsci16040515 - 26 Mar 2026
Viewed by 175
Abstract
Research on ChatGPT (GPT-4 and GPT-5) in higher education has focused on quantitative adoption models (intention to use and predictors) and fragmented effects (writing, performance, well-being, dependence, or ethics). However, this approach keeps the debate stuck in an outdated phase centred on the tool’s acceptance, even though ChatGPT is already part of the academic ecosystem. The objective of the study is to understand, from students’ voices, how the quality of academic interaction with ChatGPT is configured, and to identify patterns of decision-making, validation, ethical regulation, and communication (transparency/concealment) in university contexts. An interpretive qualitative approach was followed. A total of 418 university students participated, all of whom provided qualitative data through semi-structured virtual interviews. The data were analyzed using reflective thematic analysis in six phases, with the support of ATLAS.ti software for groundedness and density calculations. The results revealed ten categories that structure the phenomenon (adoption, attitudes, writing, translation, performance, cross-cutting skills, integrity, well-being, disciplinary use, and institutional integration). A continuum was observed between high-quality interaction (verification, rewriting, appropriation, and responsible authorship) and low-quality interaction (cognitive delegation, overconfidence, dependence, and concealment). The quality of student interaction with ChatGPT requires critical, ethical, and institutional regulation to guide and legitimize the academic process.
(This article belongs to the Special Issue ChatGPT as Educative and Pedagogical Tool: Perspectives and Prospects)

11 pages, 575 KB  
Proceeding Paper
Parameter-Efficient Adaptation of Qwen2.5 for Aspect-Based Sentiment Analysis Using Low-Rank Adaptation and Parameter-Efficient Fine-Tuning
by Pei Ying Lim, Chuk Fong Ho and Chi Wee Tan
Eng. Proc. 2026, 128(1), 15; https://doi.org/10.3390/engproc2026128015 - 9 Mar 2026
Viewed by 264
Abstract
Aspect-based sentiment analysis (ABSA) plays a vital role in deriving fine-grained sentiment from textual content. As large language models (LLMs) are increasingly adopted for automated data annotation in natural language processing (NLP), concerns have emerged regarding the accuracy of their outputs. Despite their capacity to generate large volumes of labeled data, LLMs often suffer from overconfidence in predictions, high uncertainty in complex contexts, and difficulty capturing nuanced meanings, which compromise the quality of annotations and, in turn, the performance of downstream models. This underscores the need to enhance LLM adaptability while maintaining annotation accuracy. To address these limitations, we integrated low-rank adaptation (LoRA) with parameter-efficient fine-tuning (PEFT) for adapting Qwen2.5 to ABSA. LoRA reduces the number of trainable parameters by decomposing weight updates into low-rank matrices, while PEFT introduces modular adapter layers with scaled gradient updates and dynamic rank allocation. Using the standard SemEval 2014 Laptop dataset, Qwen2.5-3B fine-tuned with LoRA and PEFT achieves 64.50% accuracy, outperforming its baseline of 24.05%. Likewise, Qwen2.5-7B attains 77.50%, compared with a baseline of 34.63%. These results highlight the potential of parameter-efficient methods to improve the accuracy of LLMs in ABSA annotation tasks, especially under resource constraints. Such results lay the groundwork for scalable, reproducible LLM deployment and open avenues for future research in cross-domain adapter transferability and dynamic rank optimization.
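The low-rank trick behind LoRA can be shown in a few lines of NumPy. This is a generic sketch with illustrative dimensions, not the paper's Qwen2.5 configuration: a frozen weight W is augmented by a trainable rank-r product B·A, so only a small fraction of the parameters needs updating.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 512, 512, 8, 16     # illustrative sizes, not the paper's

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = 0.01 * rng.standard_normal((r, d_in))   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init
                                            # so training starts exactly at W

def lora_forward(x):
    """Apply the layer with the scaled low-rank update folded in."""
    return x @ (W + (alpha / r) * (B @ A)).T

trainable_fraction = (A.size + B.size) / W.size
print(f"trainable fraction: {trainable_fraction:.3%}")  # 3.125% of full fine-tuning
```

Because B is zero-initialized, the adapted layer is bit-identical to the frozen one at step zero; gradients then flow only through A and B.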

25 pages, 25575 KB  
Article
Sea Ice Classification Enhancement Using Calibration-Focused Loss Functions
by Nima Ahmadian, Matthew Hamilton and Weimin Huang
Remote Sens. 2026, 18(5), 810; https://doi.org/10.3390/rs18050810 - 6 Mar 2026
Viewed by 207
Abstract
Deep learning has become a key approach for automated sea ice mapping in the AI4Arctic Sea Ice Challenge dataset, yet most studies focus on accuracy metrics and rarely evaluate whether predicted probabilities are reliable for operational use. This paper investigates calibration-aware training for multi-task sea ice segmentation of sea ice concentration (SIC), stage of development (SOD), and floe size (FLOE) using the U-Net model. We train the network with cross-entropy (CE) and augment the objective with focal loss, Brier loss, and an entropy-regularization term to reduce overconfidence and improve calibration. Experiments follow a scene-level Monte Carlo cross-validation protocol on the ready-to-train AI4Arctic Sea Ice Challenge (AI4Arctic) dataset and are evaluated using R2 for SIC, F1 for SOD and FLOE, a weighted combined score, and expected calibration error (ECE) and reliability diagrams. Results show that calibration-aware loss functions improve test performance relative to the CE loss, and the full objective (CE + Brier + focal + entropy) achieves the highest combined score of 84.73% and reduces FLOE ECE to 0.044. Qualitative comparisons further indicate cleaner spatial structures and fewer scattered errors, particularly for FLOE. Overall, the proposed loss design improves both segmentation quality and confidence reliability, supporting more trustworthy sea ice products for decision-making.
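A composite objective of this shape (CE plus Brier and focal terms, minus an entropy bonus that rewards less peaked predictions) can be sketched generically; the abstract does not give the authors' term weights, so the coefficients below are illustrative.

```python
import numpy as np

def combined_loss(probs, target, gamma=2.0, w_brier=1.0, w_focal=1.0, w_ent=0.1):
    """CE + Brier + focal loss, minus an entropy bonus that discourages
    overconfident outputs. probs: (N, C) softmax outputs; target: (N,) ints."""
    n, c = probs.shape
    p_true = probs[np.arange(n), target]
    onehot = np.eye(c)[target]
    ce = -np.log(p_true + 1e-12).mean()
    brier = ((probs - onehot) ** 2).sum(axis=1).mean()
    focal = -((1 - p_true) ** gamma * np.log(p_true + 1e-12)).mean()
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1).mean()
    return ce + w_brier * brier + w_focal * focal - w_ent * entropy
```

Note the sign on the entropy term: higher-entropy (less confident) predictions lower the loss, which is how the regularizer counteracts overconfidence.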

19 pages, 1064 KB  
Article
Metacognitive Monitoring in Reading Comprehension: Examining the Role of Cognitive Flexibility, Vocabulary, and Fluency in Young Readers
by Vered Markovich, Shoshi Dorfberger, Vered Halamish, Tami Katzir, Dana Tal and Rotem Yinon
J. Intell. 2026, 14(3), 42; https://doi.org/10.3390/jintelligence14030042 - 5 Mar 2026
Viewed by 519
Abstract
This study examined associations between vocabulary knowledge, reading fluency, cognitive flexibility, and metacognitive monitoring accuracy in reading comprehension among fifth-grade students. Participants (N = 104) completed measures of cognitive–linguistic abilities and reading comprehension, with global metacomprehension judgments after reading and item-level confidence ratings. Metacognitive monitoring accuracy was assessed using calibration of global metacomprehension judgments and item-level confidence ratings. Calibration bias (confidence minus performance) indexed miscalibration direction, and its absolute value indexed calibration accuracy. Resolution reflected discrimination between correct and incorrect item-level responses. Structural equation modeling (SEM) was used exploratorily to examine theoretically motivated direct and indirect pathways via reading comprehension. Vocabulary knowledge showed the strongest associations with calibration accuracy and resolution, fully mediated by comprehension. Reading fluency showed a dual pattern: it contributed positively to resolution through comprehension, while also showing direct associations with lower calibration accuracy, indicating greater miscalibration and overconfident judgment tendencies among more fluent readers. Cognitive flexibility was not significantly related to any monitoring index. By jointly examining distinct indices of monitoring accuracy and separating comprehension-mediated from direct pathways, the study clarifies how cognitive–linguistic abilities may support or bias metacognitive monitoring in developing readers. Linguistic abilities, particularly vocabulary and fluency, were central to students’ comprehension monitoring accuracy.
(This article belongs to the Section Studies on Cognitive Processes)
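The monitoring indices described above reduce to short computations. In this sketch, resolution is operationalized as the mean-confidence gap between correct and incorrect items; the study may use a different discrimination statistic (e.g., a gamma correlation).

```python
import numpy as np

def monitoring_indices(confidence, correct):
    """confidence: ratings in [0, 1]; correct: 0/1 item outcomes.
    Assumes at least one correct and one incorrect response."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bias = confidence.mean() - correct.mean()   # > 0: overconfident, < 0: under
    accuracy = abs(bias)                        # absolute calibration error
    # Resolution: how much higher confidence is on correct vs. incorrect items
    resolution = confidence[correct == 1].mean() - confidence[correct == 0].mean()
    return bias, accuracy, resolution
```

A reader with positive bias but high resolution is globally overconfident yet still distinguishes well between items they got right and wrong — the two indices capture different failures.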

20 pages, 836 KB  
Article
Examining Gender Differences in the Force Concept Inventory (FCI) in a Turkish Context: Accuracy, Confidence and Bias Score Comparisons
by Derya Kaltakci-Gurel and Kubra Ozmen
Soc. Sci. 2026, 15(3), 164; https://doi.org/10.3390/socsci15030164 - 4 Mar 2026
Viewed by 261
Abstract
This study investigates gender differences in conceptual understanding, confidence, and calibration among 369 Turkish university students completing the Force Concept Inventory (FCI). Using accuracy scores, confidence ratings, and bias indices as complementary measures, we examined how male and female students differed in both their conceptual reasoning and their self-evaluative judgments. The results show that male students achieved significantly higher accuracy scores than female students (M = 56.79 vs. 49.96), though the effect size was small, indicating modest conceptual differences. Confidence differences were more pronounced: male students reported substantially higher confidence (M = 68.17) than female students (M = 54.44), representing a moderate effect. Bias scores further revealed that male students exhibited greater overconfidence (M = 11.38), while female students were more likely to underestimate their performance (M = 4.47). Item-level analyses showed that gender differences were concentrated in well-documented areas of conceptual difficulty, including Newton’s first law and gravitation. These patterns align with international findings and suggest that gender differences in physics arise from a combination of conceptual challenges and metacognitive tendencies rather than large performance disparities. The findings highlight the importance of integrating confidence calibration, reflective metacognitive practices, and targeted conceptual support into introductory physics instruction to reduce gender-based differences in learning outcomes.

17 pages, 1437 KB  
Article
False Reality Bias in Treasury Management
by Óscar de los Reyes Marín, Iria Paz Gil, Jose Torres-Pruñonosa and Raul Gómez-Martínez
Int. J. Financial Stud. 2026, 14(3), 65; https://doi.org/10.3390/ijfs14030065 - 4 Mar 2026
Viewed by 811
Abstract
This study examines the False Reality Bias in treasury management, a cognitive distortion through which small and medium-sized enterprises (SMEs) infer financial stability from salient bank balances while overlooking pending obligations and cash-flow timing. Using a firm-level dataset of 50 Spanish meat-processing SMEs, the analysis develops two behavioral-finance indicators: the Liquidity Misperception Index (PEL), capturing the divergence between salient liquidity cues and effective short-term obligations, and the Liquidity Misconfidence Index (ICEL), measuring managerial overconfidence in liquidity assessments. Results show that 41% of firms overestimate liquidity (average PEL = 1.21), while 40% exhibit excessive confidence (ICEL > 1.3), both significantly associated with liquidity distress. Econometric estimates indicate that firms with PEL values above 1.2 are 4.48 times more likely to experience liquidity crises, even after controlling for bank balance levels. Predictive models are used in an exploratory capacity, achieving classification accuracies above 80% and supporting the robustness of the behavioral signals identified. In addition, AI-assisted cash-flow simulations reduce liquidity misperception by 34.7% (p < 0.01). Overall, the findings provide micro-level evidence that cognitive biases systematically distort SME treasury decisions but can be partially corrected through targeted decision-support tools, offering practical insights for managers, advisors, and policymakers.

26 pages, 1959 KB  
Article
Trustworthy Celestial Eye: Calibrated and Robust Planetary Classification via Self-Supervised Vision Transformers
by Ziqiang Xu, Young Choi, Changyong Yi, Chanjeong Park, Jinyoung Park, Hyungkeun Park and Sujeen Song
Aerospace 2026, 13(3), 222; https://doi.org/10.3390/aerospace13030222 - 27 Feb 2026
Viewed by 306
Abstract
Automated recognition of celestial bodies from observational imagery is a cornerstone of autonomous space exploration. However, deploying deep learning models in space environments entails rigorous requirements not only for accuracy but also for reliability (calibration) and safety (anomaly rejection). Traditional Convolutional Neural Networks (CNNs) trained on small-scale astronomical datasets often suffer from overfitting and overconfidence on Out-of-Distribution (OOD) artifacts. In this work, we present a robust classification framework based on DINOv2, a Vision Transformer pre-trained via discriminative self-supervised learning. We curate a high-fidelity dataset of seven planetary classes sourced from NASA archives and propose a two-stage domain adaptation strategy to transfer large-scale foundation model features to this fine-grained task. Extensive experiments show that our method reaches 100% Top-1 accuracy on the canonical split, and remains highly stable under split variation, achieving 99.43% ± 0.85% Top-1 accuracy across R = 5 repeated stratified splits. More importantly, we address the critical issue of model trustworthiness. Through post hoc temperature scaling, our model achieves a state-of-the-art Expected Calibration Error (ECE) of 0.08%, representing a 36-fold improvement over ResNet50 (2.90%) and a 4.5-fold improvement over the EfficientNet-B3 baseline (0.36%). Furthermore, by integrating Energy-based OOD detection, the system effectively rejects non-planetary artifacts with an AUROC of 93.7%. Qualitative analysis using Grad-CAM reveals that self-supervised attention mechanisms naturally focus on intrinsic planetary features (e.g., surface textures and rings) while ignoring background noise, confirming the superior robustness of vision foundation models in astronomical vision tasks.
(This article belongs to the Section Astronautics & Space Science)
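Both calibration tools named above — expected calibration error and post hoc temperature scaling — are standard techniques and can be sketched generically (this is not the authors' implementation):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; T > 1 softens the distribution."""
    z = z / T
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def ece(probs, labels, n_bins=10):
    """Expected Calibration Error: bin predictions by confidence and take the
    bin-weighted average gap between mean confidence and mean accuracy."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    acc = (pred == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            total += in_bin.mean() * abs(acc[in_bin].mean() - conf[in_bin].mean())
    return total
```

Dividing logits by a temperature T > 1 softens overconfident predictions without changing the predicted class, which is why accuracy is unaffected while ECE can drop sharply.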

24 pages, 11644 KB  
Article
Authenticating Matryoshka Nesting Dolls via an Auditable 2D–3D–Text Evidence Framework with BMA Compression and Zero-Shot 3D Completion
by Yulia Kumar and Srotriyo Sengupta
Electronics 2026, 15(5), 992; https://doi.org/10.3390/electronics15050992 - 27 Feb 2026
Viewed by 306
Abstract
Authenticating cultural heritage artifacts such as Matryoshka Nesting Dolls (MNDs) is increasingly complicated by high-fidelity replicas that successfully mimic surface textures and palettes, leading traditional 2D computer vision models to exhibit dangerous overconfidence in false-positive classifications. To address this, we propose an auditable multimodal framework that transitions from appearance-only detection to a robust verification system based on the following three technical pillars: (1) a 2D visual stream utilizing a ConvNeXt-Tiny backbone for fine-grained style recognition; (2) a 3D geometric stream employing a custom 2D-to-3D reconstruction pipeline based on the Blum Medial Axis (BMA) and surfaces of revolution to capture axisymmetric structural fidelity; and (3) a semantic stream leveraging the Qwen3-VL vision-language model to generate human-interpretable evidence cards. To support this framework, we introduce a novel multimodal dataset comprising 168 unique physical MND sets and 27,387 labeled frames, archived for reproducibility. Our experimental results demonstrate that while 2D-only baselines achieve 77.9% authenticity accuracy, they suffer from a high Expected Calibration Error (ECE) of 0.121. The integrated multimodal framework achieves a superior authenticity accuracy of 96.7% and reduces the ECE to 0.041, representing a 66% improvement in calibration reliability. Crucially, the system shifts the mean confidence for incorrect replica classifications from a high-risk 0.82 to a safe 0.45.
(This article belongs to the Section Computer Science & Engineering)

29 pages, 11323 KB  
Article
DenseNet-CSL: An Enhanced Network for Multi-Class Recognition of Agricultural Pests, Weeds, and Crop Diseases
by Yiqi Huang, Tao Huang, Jing Du, Jinxue Qiu, Conghui Liu, Fanghao Wan, Wanqiang Qian, Xi Qiao and Liang Wang
Agriculture 2026, 16(4), 394; https://doi.org/10.3390/agriculture16040394 - 8 Feb 2026
Viewed by 337
Abstract
Ensuring food security and agricultural biosecurity increasingly depends on the rapid and accurate identification of harmful organisms that threaten crop production. Traditional identification methods rely heavily on expert knowledge, are time-consuming, and often fail in complex multi-species scenarios. To address these limitations, this study establishes a comprehensive image dataset that includes three major categories of agricultural harmful organisms—pests, weeds, and crop diseases—and proposes an enhanced convolutional neural network, DenseNet-CSL (DenseNet with Coordinate Attention, Deep Supervision, and Label Smoothing), developed based on DenseNet121 for efficient multi-class recognition. The dataset comprises 62 pest species, 28 weed species, and 30 major crop diseases, totaling 23,995 images collected under diverse growth stages, ecological conditions, and imaging environments. DenseNet-CSL incorporates three targeted improvements: a Coordinate Attention mechanism to strengthen spatial and channel feature representation, Deep Supervision to accelerate convergence and enhance generalization, and Label Smoothing Loss to regularize the output distribution and reduce overconfidence, which is beneficial under imbalanced and noisy data. Experimental results demonstrate that DenseNet-CSL achieves a precision of 81.3%, a recall of 80.1%, and an F1-score of 80% on the constructed dataset—outperforming DenseNet121, ResNet101, EfficientNetV2, and MobileNetV3—while shortening inference time by 1.36 s and adding only 1.772 MB of model parameters. These findings highlight the effectiveness of DenseNet-CSL for multi-class recognition of agricultural pests, weeds, and diseases, and underscore the importance of multi-source, multi-scene datasets for improving model robustness and generalization. The proposed framework provides a viable technical pathway for intelligent diagnosis and monitoring of agricultural harmful organisms, supporting port quarantine and agricultural biosecurity applications.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
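Label Smoothing Loss is a standard regularizer; a minimal version (spreading eps uniformly over the non-target classes, one common convention) looks like:

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def label_smoothing_ce(logits, target, eps=0.1):
    """Cross-entropy against smoothed targets: the true class gets 1 - eps
    and the remaining eps is split over the other classes, so the optimal
    output distribution is never a one-hot spike."""
    n, c = logits.shape
    smooth = np.full((n, c), eps / (c - 1))
    smooth[np.arange(n), target] = 1.0 - eps
    return -(smooth * log_softmax(logits)).sum(axis=1).mean()
```

Because eps mass stays off the true class, pushing the true-class logit toward extreme values raises the loss again — this is the mechanism that tempers overconfident outputs under noisy or imbalanced labels.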

37 pages, 975 KB  
Review
Wearable Biosensing and Machine Learning for Data-Driven Training and Coaching Support
by Rubén Madrigal-Cerezo, Natalia Domínguez-Sanz and Alexandra Martín-Rodríguez
Biosensors 2026, 16(2), 97; https://doi.org/10.3390/bios16020097 - 4 Feb 2026
Cited by 1 | Viewed by 1398
Abstract
Background: Artificial Intelligence (AI) and Machine Learning (ML) are increasingly integrated into sport and exercise through wearable biosensing systems that enable continuous monitoring and data-driven training adaptation. However, their practical value for coaching depends on the validity of biosensor data, the robustness of analytical models, and the conditions under which these systems have been empirically evaluated. Methods: A structured narrative review was conducted using Scopus, PubMed, Web of Science, and Google Scholar (2010–2026), synthesising empirical and applied evidence on wearable biosensing, signal processing, and ML-based adaptive training systems. To enhance transparency, an evidence map of core empirical studies was constructed, summarising sensing modalities, cohort sizes, experimental settings (laboratory vs. field), model types, evaluation protocols, and key outcomes. Results: Evidence from field and laboratory studies indicates that wearable biosensors can reliably capture physiological (e.g., heart rate variability), biomechanical (e.g., inertial and electromyographic signals), and biochemical (e.g., sweat lactate and electrolytes) markers relevant to training load, fatigue, and recovery, provided that signal quality control and calibration procedures are applied. ML models trained on these data can support training adaptation and recovery estimation, with improved performance over traditional workload metrics in endurance, strength, and team-sport contexts when evaluated using athlete-wise or longitudinal validation schemes. Nevertheless, the evidence map also highlights recurring limitations, including sensitivity to motion artefacts, inter-session variability, distribution shift between laboratory and field settings, and overconfident predictions when contextual or psychosocial inputs are absent. Conclusions: Current empirical evidence supports the use of AI-driven biosensor systems as decision-support tools for monitoring and adaptive training, but not as autonomous coaching agents. Their effectiveness is bounded by sensor reliability, appropriate validation protocols, and human oversight. The most defensible model emerging from the evidence is human–AI collaboration, in which ML enhances precision and consistency in data interpretation, while coaches retain responsibility for contextual judgement, ethical decision-making, and athlete-centred care.
(This article belongs to the Special Issue Wearable Sensors for Precise Exercise Monitoring and Analysis)

26 pages, 6232 KB  
Article
MFE-YOLO: A Multi-Scale Feature Enhanced Network for PCB Defect Detection with Cross-Group Attention and FIoU Loss
by Ruohai Di, Hao Fan, Hanxiao Feng, Zhigang Lv, Lei Shu, Rui Xie and Ruoyu Qian
Entropy 2026, 28(2), 174; https://doi.org/10.3390/e28020174 - 2 Feb 2026
Viewed by 436
Abstract
The detection of defects in Printed Circuit Boards (PCBs) is a critical yet challenging task in industrial quality control, characterized by the prevalence of small targets and complex backgrounds. While deep learning models like YOLOv5 have shown promise, they often lack the ability to quantify predictive uncertainty, leading to overconfident errors in challenging scenarios—a major source of false alarms and reduced reliability in automated manufacturing inspection lines. From a Bayesian perspective, this overconfidence signifies a failure in probabilistic calibration, which is crucial for trustworthy automated inspection. To address this, we propose MFE-YOLO, a Bayesian-enhanced detection framework built upon YOLOv5 that systematically integrates uncertainty-aware mechanisms to improve both accuracy and operational reliability in real-world settings. First, we construct a multi-background PCB defect dataset with diverse substrate colors and shapes, enhancing the model’s ability to generalize beyond the single-background bias of existing data. Second, we integrate the Convolutional Block Attention Module (CBAM), reinterpreted through a Bayesian lens as a feature-wise uncertainty weighting mechanism, to suppress background interference and amplify salient defect features. Third, we propose a novel FIoU loss function, redesigned within a probabilistic framework to improve bounding box regression accuracy and implicitly capture localization uncertainty, particularly for small defects. Extensive experiments demonstrate that MFE-YOLO achieves state-of-the-art performance, with mAP@0.5 and mAP@0.5:0.95 values of 93.9% and 59.6%, respectively, outperforming existing detectors, including YOLOv8 and EfficientDet. More importantly, the proposed framework yields better-calibrated confidence scores, significantly reducing false alarms and enabling more reliable human-in-the-loop verification. This work provides a deployable, uncertainty-aware solution for high-throughput PCB inspection, advancing toward trustworthy and efficient quality control in modern manufacturing environments.
(This article belongs to the Special Issue Bayesian Networks and Causal Discovery)

35 pages, 1699 KB  
Review
Will AI Replace Physicians in the Near Future? AI Adoption Barriers in Medicine
by Rafał Obuchowicz, Adam Piórkowski, Karolina Nurzyńska, Barbara Obuchowicz, Michał Strzelecki and Marzena Bielecka
Diagnostics 2026, 16(3), 396; https://doi.org/10.3390/diagnostics16030396 - 26 Jan 2026
Cited by 5 | Viewed by 1760
Abstract
Objectives: This study aims to evaluate whether contemporary artificial intelligence (AI), including convolutional neural networks (CNNs) for medical imaging and large language models (LLMs) for language processing, could replace physicians in the near future and to identify the principal clinical, technical, and regulatory barriers. Methods: A narrative review is conducted on the scientific literature addressing AI performance and reproducibility in medical imaging, LLM competence in medical knowledge assessment and patient communication, limitations in out-of-distribution generalization, absence of physical examination and sensory inputs, and current regulatory and legal frameworks, particularly within the European Union. Results: AI systems demonstrate high accuracy and reproducibility in narrowly defined tasks, such as image interpretation, lesion measurement, triage, documentation support, and written communication. These capabilities reduce interobserver variability and support workflow efficiency. However, major obstacles to physician replacement persist, including limited generalization beyond training distributions, inability to perform physical examination or procedural tasks, susceptibility of LLMs to hallucinations and overconfidence, unresolved issues of legal liability at higher levels of autonomy, and the continued requirement for clinician oversight. Conclusions: In the foreseeable future, AI will augment rather than replace physicians. The most realistic trajectory involves automation of well-defined tasks under human supervision, while clinical integration, physical examination, procedural performance, ethical judgment, and accountability remain physician-dependent. Future adoption should prioritize robust clinical validation, uncertainty management, escalation pathways to clinicians, and clear regulatory and legal frameworks.
(This article belongs to the Topic Machine Learning and Deep Learning in Medical Imaging)

21 pages, 11032 KB  
Article
Scale Calibration and Pressure-Driven Knowledge Distillation for Image Classification
by Jing Xie, Penghui Guan, Han Li, Chunhua Tang, Li Wang and Yingcheng Lin
Symmetry 2026, 18(1), 177; https://doi.org/10.3390/sym18010177 - 18 Jan 2026
Viewed by 257
Abstract
Knowledge distillation achieves model compression by training a lightweight student network to mimic the output distribution of a larger teacher network. However, when the teacher becomes overconfident, its sharply peaked logits break the scale symmetry of supervision and induce high-variance gradients, leading to unstable optimization. Meanwhile, research that focuses only on final-logit alignment often fails to utilize intermediate semantic structure effectively. This causes weak discrimination of student representations, especially under class imbalance. To address these issues, we propose Scale Calibration and Pressure-Driven Knowledge Distillation (SPKD): a one-stage framework comprising two lightweight, complementary mechanisms. First, a dynamic scale calibration module normalizes the teacher’s logits to a consistent magnitude, reducing gradient variance. Second, an adaptive pressure-driven mechanism refines student learning by preventing feature collapse and promoting intra-class compactness and inter-class separability. Extensive experiments on CIFAR-100 and ImageNet demonstrate that SPKD achieves superior performance to distillation baselines across various teacher–student combinations. For example, SPKD achieves a score of 74.84% on CIFAR-100 for the homogeneous architecture VGG13-VGG8. Additional evidence from logit norm and gradient variance statistics, as well as representation analyses, confirms that SPKD stabilizes optimization while learning more discriminative and well-structured features.
(This article belongs to the Section Computer)
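As a rough illustration of the scale-calibration idea in this abstract, one can rescale each teacher logit vector to a fixed L2 norm before the distillation softmax, so a sharply peaked (overconfident) teacher yields a softer, more consistent supervision target. This is a minimal sketch under assumed details (the fixed target norm of 4.0 and the plain L2 rescaling are illustrative; the paper's dynamic calibration is likely more involved):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def calibrate_logits(logits, target_norm=4.0):
    """Rescale a logit vector to a fixed L2 norm.
    Illustrative stand-in for the paper's dynamic scale calibration."""
    norm = math.sqrt(sum(z * z for z in logits))
    scale = target_norm / max(norm, 1e-12)
    return [z * scale for z in logits]

# An overconfident teacher: one sharply peaked logit vector.
teacher = [25.0, 1.0, 0.5]
calibrated = calibrate_logits(teacher)
p_raw = softmax(teacher)     # almost a one-hot distribution
p_cal = softmax(calibrated)  # softer target for the student
```

Because every calibrated vector has the same magnitude, the KD loss gradients seen by the student no longer swing with the teacher's confidence level.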
21 pages, 581 KB  
Article
Pre–Post Evaluation of Slovenia’s Additional Training Programme for Novice Drivers: Implications for Reducing Risk and Promoting Sustainable Road Safety
by Darja Topolšek and Tina Cvahte Ojsteršek
Sustainability 2026, 18(2), 972; https://doi.org/10.3390/su18020972 - 17 Jan 2026
Viewed by 352
Abstract
Education and post-licencing training programmes for novice drivers are widely implemented to improve road safety, yet their effectiveness remains debated. This study evaluates short-term attitudinal changes associated with participation in a mandatory post-licencing training programme for novice drivers in Slovenia. A within-subject pre–post survey was used to assess self-reported driving attitudes across six safety-related domains among 225 novice drivers at a Slovenian driving training centre in 2024. Paired t-tests revealed small but statistically significant improvements following the programme: greater perceived support for additional driver training, reduced overconfidence, more cautious attitudes towards speeding and intersection behaviour, and more favourable attitudes towards vehicle operation and the use of safety equipment. Attitudes towards attention and adherence to traffic regulations changed negligibly, indicating a strong baseline attitude towards safe driving. Overall, the findings indicate a modest but fairly consistent short-term change in attitudes after programme participation. Because the study lacked a control group and relied on self-reported data, the findings should be interpreted as evaluative rather than causal, and longitudinal behavioural research is needed to assess long-term effects. Full article
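For readers unfamiliar with the statistic used here, a within-subject paired t-test compares each participant's post-programme score against their own pre-programme score. The sketch below uses invented attitude scores purely for illustration (the study's actual items, scales, and data differ):

```python
import math
from statistics import mean, stdev

def paired_t(pre, post):
    """Paired t statistic and degrees of freedom for a
    within-subject pre-post design (illustrative example)."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    # t = mean difference / standard error of the differences
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical 1-5 attitude scores before and after training.
pre = [3.1, 2.8, 3.4, 3.0, 2.9, 3.2, 3.3, 2.7]
post = [3.4, 3.0, 3.5, 3.3, 3.1, 3.4, 3.5, 3.0]
t, df = paired_t(pre, post)
```

A large positive t with small df-adjusted p-value would indicate a statistically significant attitude shift, though, as the abstract notes, not a causal effect.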
23 pages, 1503 KB  
Article
Hallucination-Aware Interpretable Sentiment Analysis Model: A Grounded Approach to Reliable Social Media Content Classification
by Abdul Rahaman Wahab Sait and Yazeed Alkhurayyif
Electronics 2026, 15(2), 409; https://doi.org/10.3390/electronics15020409 - 16 Jan 2026
Viewed by 426
Abstract
Sentiment analysis (SA) has become an essential tool for analyzing social media content to monitor public opinion and support digital analytics. Although transformer-based SA models exhibit remarkable performance, they lack mechanisms to mitigate hallucinated sentiment, i.e., unsupported or overconfident predictions made without explicit linguistic evidence. To address this limitation, this study presents a hallucination-aware SA model that incorporates semantic grounding, interpretability-congruent supervision, and neuro-symbolic reasoning within a unified architecture. The proposed model is based on a fine-tuned Open Pre-trained Transformer (OPT) and uses three core mechanisms: a Sentiment Integrity Filter (SIF), a SHapley Additive exPlanations (SHAP)-guided regularization technique, and a confidence-based lexicon–deep fusion module. The experimental analysis was conducted on two multi-class sentiment datasets containing Twitter (now X) and Reddit posts. On Dataset 1, the proposed model achieved an average accuracy of 97.6% and a hallucination rate of 2.3%, outperforming current transformer-based and hybrid sentiment models. On Dataset 2, the framework demonstrated strong external generalization, with an accuracy of 95.8% and a hallucination rate of 3.4%, significantly lower than state-of-the-art methods. These findings indicate that hallucination mitigation can be integrated into transformer optimization without performance degradation, offering a deployable, interpretable, and linguistically grounded social media SA framework that enhances the reliability of neural language-understanding systems. Full article
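The confidence-based lexicon–deep fusion described in this abstract can be pictured as a gating rule: accept the neural prediction when it is confident and lexically supported, fall back to the lexicon when confidence is low, and flag confident predictions that contradict the lexical evidence as potential hallucinations. The function names, label set, and 0.7 threshold below are illustrative assumptions, not the paper's implementation:

```python
def fuse(model_probs, lexicon_score, conf_threshold=0.7):
    """Confidence-gated fusion of a neural sentiment distribution
    (probabilities over negative/neutral/positive) with a signed
    lexicon polarity score. Illustrative sketch only."""
    labels = ["negative", "neutral", "positive"]
    conf = max(model_probs)
    pred = labels[model_probs.index(conf)]
    lex = ("positive" if lexicon_score > 0
           else "negative" if lexicon_score < 0
           else "neutral")
    # Confident prediction that contradicts lexical evidence:
    # a candidate hallucinated sentiment.
    if conf >= conf_threshold and pred != lex and lex != "neutral":
        return pred, "flagged"
    # Low confidence: defer to the lexicon signal.
    if conf < conf_threshold:
        return lex, "lexicon_fallback"
    return pred, "accepted"
```

For example, a confident "negative" prediction on text whose lexicon score is clearly positive would be flagged for review rather than silently emitted.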