Search Results (711)

Search Parameters:
Keywords = vision transformer-based system

45 pages, 6164 KB  
Systematic Review
Advances in Emerging Digital Technologies for Sustainable Agriculture: Applications and Future Perspectives
by Carlos Diego Rodríguez-Yparraguirre, Abel José Rodríguez-Yparraguirre, Cesar Moreno-Rojo, Wendy Akemmy Castañeda-Rodríguez, Janet Verónica Saavedra-Vera, Atilio Ruben Lopez-Carranza, Iván Martin Olivares-Espino, Andrés David Epifania-Huerta, Elías Guarniz-Vásquez and Wilson Arcenio Maco-Vasquez
Earth 2026, 7(2), 63; https://doi.org/10.3390/earth7020063 (registering DOI) - 11 Apr 2026
Abstract
The agricultural sector is undergoing a profound digital transformation driven by artificial intelligence, the Internet of Things, remote sensing, robotics, blockchain, and edge computing, which are being integrated into crop monitoring, irrigation management, disease detection, and supply chain transparency systems. This study employs systematic evidence mapping to characterize the applications of emerging digital technologies in sustainable agriculture; it delineates technological trajectories, areas of application, implementation gaps, and opportunities for improvement. Adhering to the PRISMA 2020 reporting protocol, 101 peer-reviewed articles indexed in Scopus and Web of Science (2020–2025) were identified, screened, and subjected to integrated thematic and bibliometric synthesis, using RStudio 2026.01.1+403 and VOSviewer 1.6.20 for data mining on keywords and technological evolution patterns. Results show that deep learning and computer vision models achieved diagnostic accuracies of 90–99%, smart irrigation systems reduced water consumption by 10–30%, predictive yield models frequently reported R2 values above 0.80, and greenhouse automation reduced energy consumption by approximately 20–30%. Blockchain-based architectures improved traceability and secure data transmission by 15–20%, while remote sensing integration enhanced spatial estimation accuracy up to R2 = 0.92. The findings demonstrate a measurable transition toward data-driven, resource-efficient agricultural ecosystems supported by validated digital architectures. However, interoperability limitations, lack of standardized performance metrics, scalability challenges, and uneven geographical implementation (identified in nearly 40% of studies) highlight the need for harmonized evaluation frameworks, cross-platform integration standards, and long-term field validation to ensure sustainable and scalable digital transformation.
28 pages, 3527 KB  
Article
Autonomous Tomato Harvesting System Integrating AI-Controlled Robotics in Greenhouses
by Mihai Gabriel Matache, Florin Bogdan Marin, Catalin Ioan Persu, Robert Dorin Cristea, Florin Nenciu and Atanas Z. Atanasov
Agriculture 2026, 16(8), 847; https://doi.org/10.3390/agriculture16080847 (registering DOI) - 11 Apr 2026
Abstract
Labor shortages and the need for increased productivity have accelerated the development of robotic harvesting systems for greenhouse crops; however, reliable operation under fruit occlusion and clustered arrangements remains a major challenge, particularly due to the limited integration between perception and motion planning modules. The paper presents the design and experimental validation of an autonomous robotic system for greenhouse tomato harvesting. The proposed platform integrates a rail-guided mobile base, a six-degrees-of-freedom robotic manipulator, and an adaptive end effector with a hybrid vision framework that combines convolutional neural networks and watershed-based segmentation to enable robust fruit detection and localization under occluded conditions. The proposed approach enables improved separation of overlapping fruits and provides accurate spatial localization through stereo vision combined with IMU-assisted camera-to-robot coordinate transformation. An occlusion-aware trajectory planning strategy was developed to generate collision-free manipulation paths in the presence of leaves and stems, enhancing harvesting safety and reliability. The system was trained and evaluated using a dataset of real greenhouse images supplemented with synthetic data augmentation. Experimental trials conducted under practical greenhouse conditions demonstrated a fruit detection precision of 96.9%, recall of 93.5%, and mean Intersection-over-Union of 79.2%. The robotic platform achieved an overall harvesting success rate of 78.5%, reaching 85% for unobstructed fruits, with an average cycle time of 15 s per fruit in direct harvesting scenarios. The rail-guided mobility significantly improved positioning stability and repeatability during manipulation compared with fully mobile platforms. The results confirm that integrating hybrid perception with occlusion-aware motion planning can substantially improve the functionality of robotic harvesting systems in protected cultivation environments. The proposed solution contributes to the advancement of automation technologies for greenhouse vegetable production and supports the transition toward more sustainable and labor-efficient agricultural practices.

28 pages, 16466 KB  
Article
SAW-YOLOv8l: An Enhanced Sewer Pipe Defect Detection Model for Sustainable Urban Drainage Infrastructure Management
by Linna Hu, Hao Li, Jiahao Guo, Penghao Xue, Weixian Zha, Shihan Sun, Bin Guo and Yanping Kang
Sustainability 2026, 18(8), 3685; https://doi.org/10.3390/su18083685 - 8 Apr 2026
Viewed by 187
Abstract
Urban underground sewage pipelines often suffer from defects such as cracks, irregular joint misalignment, and stratified sedimentation blockages, which may lead to pipeline bursts, sewage overflow, and water pollution. Timely detection of abnormal defects in sewage pipelines is critical to ensuring public health and environmental sustainability. Vision-based sewage pipeline defect detection plays a crucial role in modern urban wastewater treatment systems. However, it still faces challenges such as limited feature extraction capabilities, insufficient multi-scale defect characterization, and poor positioning stability when dealing with low-contrast images and in environments with severe background interference. To address these issues, this study proposes an enhanced SAW-YOLOv8l model that integrates RT-DETR (real-time detection Transformer) with a CNN (convolutional neural network) architecture. First, a C2f_SCA module improves the long-distance feature extraction capability and localization precision. Second, an AIFI-PRBN module enhances global feature correlation through attention-mechanism-based intra-scale feature interaction and reduces computational complexity using lightweight techniques. Finally, an adaptive dynamic weighted loss function based on Wise-IoU further improves training convergence and robustness by balancing the gradient distribution of samples. Experiments on a mixed dataset comprising Sewer-ML and industrial images demonstrate that the SAW-YOLOv8l model achieved an mAP@0.5 of 86.2% and a precision of 84.4%, improvements of 2.4% and 6.6%, respectively, over the baseline model, significantly enhancing the detection performance of abnormal defects in sewage pipelines.

25 pages, 1022 KB  
Article
Strategic Competence in Sustainability Education: Conceptual Patterns Identified Through AI-Assisted Qualitative Analysis
by Cathérine Conradty and Franz Xaver Bogner
Sustainability 2026, 18(7), 3643; https://doi.org/10.3390/su18073643 - 7 Apr 2026
Viewed by 166
Abstract
This study investigates how participants conceptualise sustainability and sustainability citizenship, as well as how these conceptualisations relate to perceived agency. Drawing on two open-ended prompts, it analyses participants’ visions of a sustainable future and the roles they would like to play within it. The dataset was based on 1714 coded response segments from 164 participants. Methodologically, the study combines qualitative content analysis, independent human-AI double coding, manual validation, inter-rater reliability assessment, and residual-based co-occurrence analysis within a qualitatively grounded mixed-methods design. The results show that sustainability is predominantly framed in civic, symbolic, and ecological terms, whereas strategic competence and professionally articulated agency remain less visible. Sustainability meanings and role conceptions also vary systematically across disciplinary contexts. In addition, the analyses reveal patterned gaps between participants’ future visions and their self-attributed roles in sustainability transformations. The study contributes empirical insights into sustainability meaning-making and perceived agency and shows how LLM-assisted coding can be embedded in a transparent mixed-methods workflow. For sustainability education, the findings underline the importance of strengthening strategic and systemic dimensions of competence and linking civic engagement more closely to professional pathways of action.

13 pages, 533 KB  
Review
Towards a Vision of Sustainable Health: Definitions, Related Concepts and Key Dimensions
by Samira Amil, Julie-Alexandra Moulin and Éric Gagnon
Sustainability 2026, 18(7), 3586; https://doi.org/10.3390/su18073586 - 6 Apr 2026
Viewed by 240
Abstract
Contemporary societies are facing converging crises, including environmental degradation, worsening social inequalities, aging populations, and increasingly costly healthcare systems, prompting sustainable health to be proposed as an integrative conceptual perspective for rethinking health, its determinants, and collective action. This narrative review aims to trace the historical evolution of the concept, clarify the vision it offers for public health, and identify its implications for research, policy, and intervention. A literature search (May 2025) was conducted in PubMed, Google Scholar, and Google, with no restrictions on language, time period, or document type. Of 40 relevant documents, 21 were selected for in-depth analysis by two independent reviewers, with duplicate data extraction. The results show that sustainable health broadens the World Health Organisation (WHO) definition of health by incorporating sustainability, intergenerational justice, ecological limits, and social equity. Close to, but distinct from, Planetary Health, One Health, and EcoHealth, sustainable health is based on ecological, social and ethical, economic, behavioral, intergenerational, and systemic/intersectoral dimensions. Sustainable health thus emerges as a systemic and transdisciplinary conceptual approach for transforming health systems, living environments, and public policy, requiring further conceptual clarification, robust interdisciplinary research programs, and intersectoral initiatives involving communities.

26 pages, 5737 KB  
Article
An Improved PST-Based Visual Pose Estimation Algorithm for UAV Navigation
by Shengxin Yu, Jinfa Xu and Tianhan Yang
Appl. Sci. 2026, 16(7), 3551; https://doi.org/10.3390/app16073551 - 5 Apr 2026
Viewed by 169
Abstract
Vision-based pose estimation has been widely applied in unmanned aerial vehicle (UAV) navigation. However, existing visual pose estimation algorithms are highly sensitive to camera imaging distortion, which degrades estimation accuracy, and often suffer from noticeable jitter between frames in dynamic scenarios. To address these issues, this paper proposes an improved visual pose estimation algorithm built upon the Perspective Similar Triangle (PST) geometric model. Using a planar fiducial marker as the observation target, the single-frame pose estimation problem is reformulated as a hierarchical geometric inference framework, including image point distortion correction, depth recovery based on a planar similar triangle constraint, and rigid transformation estimation between the camera and world coordinate systems. This formulation improves pose estimation accuracy under distorted imaging conditions. To accommodate distortion variations in practical scenarios, a radial distortion coefficient update method is further designed to adaptively adjust the radial distortion parameters under single-frame observations, ensuring that the distortion model remains consistent with the actual imaging distortion and providing reliable model inputs for distortion correction in pose estimation. In addition, to enhance pose stability in dynamic scenarios, a multi-frame optical center consistency constraint (MOCCC) method is introduced: by constraining pose estimation across adjacent frames, with the mean optical center over multiple frames as the optimization objective, it effectively suppresses pose jitter caused by single-frame observation noise. Finally, a three-degree-of-freedom (3-DOF) attitude motion platform is established, and both static and dynamic experimental scenarios are designed to validate the accuracy and stability of the proposed algorithm. Experimental results demonstrate that the proposed algorithm achieves accurate and stable pose estimation under imaging distortion and small perturbations, exhibiting good robustness and suitability for practical UAV visual navigation applications.

17 pages, 1372 KB  
Article
GastroMalign: Vision Transformer-Based Framework for Early Detection and Malignancy-Risk Stratification for High-Risk Gastrointestinal Lesions
by Sri Harsha Boppana, Sachin Sravan Kumar Komati, Medha Sharath, Aditya Chandrashekar, Gautam Maddineni, Raja Chandra Chakinala, Pradeep Yarra and C. David Mintz
J. Clin. Med. 2026, 15(7), 2701; https://doi.org/10.3390/jcm15072701 - 2 Apr 2026
Viewed by 305
Abstract
Background: Current artificial intelligence (AI) systems in gastrointestinal (GI) endoscopy primarily emphasize binary detection or static classification, providing limited support for the graded assessment of malignant potential that underpins clinical decision-making. We developed GastroMalign, a transformer-based framework designed to stratify GI lesions according to ordinal disease severity while maintaining clinical interpretability, addressing this unmet need in endoscopic risk assessment. Methods: This retrospective development and validation study used the publicly available GastroVision dataset, comprising 8000 de-identified endoscopic still images from the upper and lower gastrointestinal tract, including the esophagus, stomach, duodenum, colon, rectum, and terminal ileum. GastroMalign integrates a Vision Transformer (ViT) encoder with a Sequential Feature Learner that explicitly models ordinal disease severity along a benign-to-malignant spectrum. The framework produces both categorical risk classification and a continuous malignancy risk score. Images were stratified into training (80%), validation (10%), and test (10%) sets. Performance was compared with convolutional neural network (CNN) baselines and a Swin Transformer. Interpretability was assessed using Score-CAM visualizations reviewed by blinded expert endoscopists. Results: On the held-out test set (n = 800 images), GastroMalign achieved an overall accuracy of 80.06%, precision of 79.65%, recall of 80.06%, and F1-score of 79.17%, with a micro-averaged AUC of 0.98. In comparison, ResNet-50 and DenseNet-121 achieved accuracies of 32.42% and 36.77%, respectively, while the Swin Transformer achieved 60.56% accuracy (AUC = 0.93). Ablation analyses demonstrated a 17% absolute reduction in High-Risk lesion recall when the progression-aware module was removed. Continuous malignancy risk scores increased monotonically across ordinal classes, with mean values < 0.18 for Benign and > 0.72 for High-Risk/Malignant lesions. Score-CAM visualizations demonstrated 92% overlap with clinician-annotated lesion regions. Conclusions: GastroMalign delivers an interpretable, progression-aware AI framework for GI lesion risk stratification that outperforms existing CNN- and transformer-based models. Clinically, GastroMalign is intended as an adjunct decision-support tool during endoscopic review to standardize lesion risk stratification (benign-to-malignant spectrum), support management decisions (biopsy vs. resection vs. surveillance), and reduce operator-dependent variability by pairing ordinal risk outputs with interpretable visual explanations.

32 pages, 9172 KB  
Article
Design, Modeling, Self-Calibration and Grasping Method for Modular Cable-Driven Parallel Robots
by Wanlin Mai, Yonghe Wang, Zhiquan Yang, Bin Zhu, Lin Liu and Jianqing Peng
Sensors 2026, 26(7), 2204; https://doi.org/10.3390/s26072204 - 2 Apr 2026
Viewed by 237
Abstract
Cable-driven parallel robots (CDPRs) are attractive for large-space manipulation because of their lightweight structure, large workspace, and reconfigurability. However, existing systems still face three practical challenges: limited modularity of the mechanical architecture, repeated calibration after reconfiguration, and insufficient integration between visual perception and grasp execution. To address these issues, this paper presents a modular cable-driven parallel robot (MCDPR), together with its kinematic modeling, vision-based self-calibration, and visual grasping methods. First, a modular mechanical architecture is developed in which the drive, sensing, and cable-guiding functions are integrated to support rapid assembly/disassembly, convenient debugging, and cable anti-slack operation. Second, a pulley-considered multilayer kinematic model is established, and a vision-based self-calibration method is proposed to identify the structural parameters after assembly using onboard sensing and AprilTag observations, thereby reducing the number of recalibrations required during robot operation after reconfiguration. Third, a vision-guided bin-picking method is developed by combining RGB-D perception, coordinate transformation, and the calibrated robot model. Simulation and prototype experiments validate the proposed system within a combined software/hardware framework, in which the CoppeliaSim-based simulation and the hardware prototype are used together to verify the proposed design and methods. In simulation, self-calibration reduces the Euclidean grasping position error from 0.371 mm to 0.048 mm and the orientation error from 0.071° to 0.004°. In experiments, the relative position error is reduced by 58.33% after self-calibration.
(This article belongs to the Special Issue Motor Control and Remote Handling in Robotic Applications)

29 pages, 2627 KB  
Article
Building-Level Energy Disaggregation Using AI-Based NILM Techniques in Heterogeneous Environments
by Ana Rubio-Bustos, Gloria Calleja-Rodríguez, Jorge De-La-Torre-García, Unai Fernandez-Gamiz and Ekaitz Zulueta
AI 2026, 7(4), 122; https://doi.org/10.3390/ai7040122 - 1 Apr 2026
Viewed by 389
Abstract
Non-Intrusive Load Monitoring (NILM) represents a powerful approach for energy disaggregation, which enables detailed insights into energy consumption patterns without requiring extensive sensor deployment. While significant advances have been achieved in residential NILM applications, commercial and industrial buildings remain largely underexplored despite their substantial contribution to global energy consumption. This study addresses this gap by developing and evaluating multiple artificial intelligence approaches for energy disaggregation across residential, commercial, and industrial buildings under a unified experimental protocol. We implement and compare several AI-based models, including Vision Transformer (ViT), Variational Autoencoder (VAE), Random Forest (RF), and custom architectures inspired by TimeGPT and Prophet, alongside traditional baseline methods. The proposed framework is validated using three benchmark datasets representing residential (AMPds), commercial (COmBED), and industrial (IMDELD) environments. Experimental results demonstrate that architecture–load interactions, rather than model complexity alone, are the primary determinants of disaggregation accuracy: the ViT-small configuration achieves superior performance for complex industrial loads with R2 values exceeding 0.94, Random Forest proves most effective for finite-state commercial HVAC systems with R2 up to 0.97, and the Prophet-inspired model excels in capturing seasonal patterns in residential appliances. These findings provide evidence-based guidelines for selecting appropriate AI models based on load characteristics, signal-to-noise ratio, and building type, contributing to the practical deployment of NILM in heterogeneous building environments.

31 pages, 7864 KB  
Article
Development of a General-Purpose AI-Powered Robotic Platform for Strawberry Harvesting
by Muhammad Tufail, Jamshed Iqbal and Rafiq Ahmad
Agriculture 2026, 16(7), 769; https://doi.org/10.3390/agriculture16070769 - 31 Mar 2026
Viewed by 358
Abstract
The integration of emerging technologies such as robotics and artificial intelligence (AI) has the potential to transform agricultural harvesting by improving efficiency, reducing waste, lowering labor dependency, and enhancing produce quality. This paper presents the development of an intelligent robotic berry harvesting system that combines deep learning–based perception with autonomous robotic manipulation for real-time strawberry harvesting. A computer vision pipeline based on the YOLOv11 segmentation model was developed and integrated into a Smart Mobile Manipulator (SMM) equipped with autonomous navigation, a 6-degree-of-freedom (6-DoF) xArm 6 robotic arm, and ROS middleware to enable real-time operation. Using a publicly available strawberry dataset comprising 2,800 images collected under ridge-planted cultivation conditions, the proposed YOLOv11-small segmentation model achieved 84.41% mAP@0.5, outperforming YOLOv11 object detection, Faster R-CNN, and RT-DETR in segmentation quality while maintaining real-time performance at 10 FPS on an NVIDIA Jetson Orin Nano edge GPU. A PCA-based fruit orientation and geometric analysis method achieved 86.5% localization accuracy on 200 test images. Controlled indoor harvesting experiments using synthetic strawberries demonstrated an overall harvesting success rate of 72% across 50 trials. The proposed system provides a general-purpose platform for berry harvesting in controlled environments, offering a scalable and efficient solution for autonomous harvesting.
(This article belongs to the Special Issue Advances in Robotic Systems for Precision Orchard Operations)

21 pages, 4187 KB  
Article
Gender-Aware Driver Drowsiness Detection Using Multi-Stream Shifted-Window-Based Hierarchical Vision Transformers
by M. Faisal Nurnoby and El-Sayed M. El-Alfy
Appl. Sci. 2026, 16(7), 3353; https://doi.org/10.3390/app16073353 - 30 Mar 2026
Viewed by 211
Abstract
Given the substantial contribution of driver fatigue to traffic accidents, its detection and mitigation have become one of the main goals of intelligent driver-assistance systems aimed at enhancing driving safety and comfort. Among various approaches, vision-based facial analysis using deep learning has emerged as an effective and non-intrusive method for identifying driver drowsiness, a key manifestation of fatigue. However, current drowsiness detection models do not account for demographic factors like gender, even though recent research has shown gender-related behavioral differences in eye closure duration, blink frequency, yawning patterns, and facial muscle relaxation. In this paper, we present a fine-grained multi-stream transformer architecture that incorporates gender awareness and shifted-window attention for spatial feature fusion. Integrating a gender embedding that modulates the region-based features allows the model to learn gender-conditioned drowsiness features, minimizing bias and diluted representations. Using the NTHU-DDD dataset, we evaluated gender-aware and gender-agnostic two-stream and three-stream variants across three facial-region contexts: the face region with a 20% margin, the bare face region, and key facial regions (face, eyes, and mouth). A comprehensive ablation study was conducted to identify the most effective model setup. The results demonstrate that incorporating the gender embedding improves detection performance, achieving an accuracy of 95.47% on the evaluation set. Moreover, the proposed three-stream model (SWT-DD-3S) produced better results.

33 pages, 6064 KB  
Article
Federated Gastrointestinal Lesion Classification with Clinical-Entropy Guided Quantum-Inspired Token Pruning in Vision Transformers
by Muhammad Awais, Ali Mustafa Qamar, Umair Khalid and Rehan Ullah Khan
Diagnostics 2026, 16(7), 1027; https://doi.org/10.3390/diagnostics16071027 - 29 Mar 2026
Viewed by 400
Abstract
Background: Gastrointestinal (GI) cancers remain a major global health concern, where timely and accurate interpretation of endoscopic findings plays a decisive role in patient outcomes. In recent years, deep learning–based decision support systems have shown considerable potential in assisting GI diagnosis; however, their broader adoption is often limited by patient privacy regulations, uneven data availability, and the fragmented nature of clinical data across institutions. Federated learning (FL) offers a practical solution by enabling collaborative model training while keeping patient data local to each hospital. Methods: Vision Transformers (ViTs) are particularly well suited for endoscopic image analysis due to their ability to capture long-range contextual information. Nevertheless, their high computational and communication costs pose a significant challenge in federated settings, especially when data distributions vary across clients. To address this issue, we propose a privacy-preserving federated framework that combines ViTs with a Clinical-Entropy Guided Quantum Evolutionary Algorithm (CEQEA) for adaptive token pruning. The CEQEA leverages the diagnostic diversity of each client's local dataset to guide population initialization, evolutionary updates, and mutation strength, allowing the pruning strategy to adapt naturally to different clinical profiles. Results: The proposed framework was evaluated on curated upper- and lower-GI tract subsets of the HyperKVASIR dataset under realistic non-IID federated conditions. On the final test sets, the model achieved a mean micro-averaged accuracy of 92.33% for lower-GI classification and 90.19% for upper-GI classification, while maintaining high specificity across all diagnostic classes. At the same time, the adaptive pruning strategy reduced the number of tokens processed by approximately 40% and decreased the number of required federated communication rounds by 33% compared to ViT-based federated baselines. Conclusions: Overall, these results indicate that entropy-aware, quantum-inspired evolutionary optimization can effectively balance diagnostic performance and efficiency, making transformer-based models more practical for privacy-preserving, multi-institutional gastrointestinal endoscopy. Full article
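The entropy-guided pruning idea in this abstract can be illustrated with a minimal sketch: a client's local label-distribution entropy sets how aggressively ViT tokens are pruned. This is not the paper's CEQEA; the `keep_ratio` mapping, its `base`/`spread` constants, and the token importance scores are all hypothetical stand-ins.

```python
import math

def class_entropy(label_counts):
    """Shannon entropy (bits) of a client's local label distribution."""
    total = sum(label_counts)
    probs = [c / total for c in label_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

def keep_ratio(label_counts, num_classes, base=0.6, spread=0.3):
    """Map diagnostic diversity to a token keep ratio: clients whose local
    data is more diverse prune fewer tokens (hypothetical mapping)."""
    h = class_entropy(label_counts)
    h_max = math.log2(num_classes)
    return base + spread * (h / h_max)

def prune_tokens(tokens, scores, ratio):
    """Keep the top fraction of tokens ranked by an importance score,
    preserving the original token order."""
    k = max(1, round(len(tokens) * ratio))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])
    return [tokens[i] for i in keep]
```

For example, a client with a uniform four-class distribution gets the maximum keep ratio (here 0.9), while a client dominated by a single class prunes more aggressively.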
(This article belongs to the Special Issue Medical Image Analysis and Machine Learning)

26 pages, 2135 KB  
Article
Mapping Research Trends in Road Safety: A Topic Modeling Perspective
by Iulius Alexandru Tudor and Florin Gîrbacia
Vehicles 2026, 8(4), 69; https://doi.org/10.3390/vehicles8040069 - 27 Mar 2026
Abstract
Over the past decade, road safety research has developed rapidly, driven by the expansion of large crash databases, the adoption of artificial intelligence techniques, and the demand for proactive, predictive safety solutions. This study conducts a data-driven review of recent research trends in transport safety, focusing on key domains: crash severity analysis, human factors, vulnerable road users (VRUs), spatial modeling, and artificial intelligence applications. A systematic search of the Scopus database identified 15,599 relevant scientific papers published between 2016 and 2025. After constructing this corpus, titles, abstracts, and keywords were preprocessed using a natural language processing pipeline, and the analysis employed BERTopic, a transformer-based topic modeling framework. The analysis identified 29 distinct research topics, further synthesized into five major thematic areas: (1) crash severity and injury analysis, (2) driver behavior and human factors, (3) vulnerable road users, (4) artificial intelligence, machine learning, and computer vision in intelligent transportation systems, and (5) spatial analysis and hotspot detection. A marked increase in publications related to artificial intelligence and machine learning has been evident since 2020, and the results show a transition from descriptive, post-crash studies to integrated, multimodal, predictive analysis. The study also identifies ethical and economic issues associated with the use of artificial intelligence in intelligent transportation systems, including data management, infrastructure requirements, system security, and model transparency. Overall, the findings signal a paradigm shift from intuition-based models to explainable, spatially explicit, and data-intensive ones, ultimately facilitating proactive risk assessment and informed decision-making. Full article
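The corpus-preprocessing step described in this abstract can be sketched as a minimal token-cleaning pass applied to titles, abstracts, and keywords before the cleaned documents are handed to BERTopic (e.g., `BERTopic().fit_transform(docs)`). The stopword list and length threshold below are illustrative assumptions, not the authors' actual pipeline.

```python
import re

# Illustrative stopword list; a real pipeline would use a fuller set.
STOPWORDS = {"the", "of", "and", "in", "to", "a", "for", "on", "with", "is"}

def preprocess(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords and
    very short tokens -- a minimal stand-in for an NLP preprocessing step."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]
```

Applied to a title like "Crash Severity Analysis of the Road", this yields the content tokens `["crash", "severity", "analysis", "road"]`.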
(This article belongs to the Special Issue Intelligent Mobility and Sustainable Automotive Technologies)

34 pages, 6554 KB  
Article
Syncretic Grad-CAM Integrated ViT-CNN Hybrids with Inherent Explainability for Early Thyroid Cancer Diagnosis from Ultrasound
by Ahmed Y. Alhafdhi, Gibrael Abosamra and Abdulrhman M. Alshareef
Diagnostics 2026, 16(7), 999; https://doi.org/10.3390/diagnostics16070999 - 26 Mar 2026
Abstract
Background/Objectives: Accurate detection of thyroid cancer on ultrasound remains challenging, as malignant nodules can be small and heterogeneous, easily confused with point clusters and borderline-featured tissue. Recent deep learning studies demonstrate good performance with convolutional neural networks (CNNs) and clustering; however, many approaches focus only on local tissue and provide limited, non-quantitative interpretation, reducing clinical confidence. This study proposes an integrated framework combining enhanced convolutional feature encoders (DenseNet169 and VGG19) with an enhanced vision transformer (ViT-E) to integrate local features and global relational context during learning, rather than through delayed integration. Methods: The proposed framework couples the enhanced convolutional feature encoders (DenseNet169 and VGG19) with the enhanced vision transformer (ViT-E), enabling simultaneous learning of local feature representations and global relational context. This design performs feature fusion during the learning stage instead of delayed integration, aiming to improve diagnostic performance and interpretability in thyroid ultrasound image analysis. Results: The best-performing model, ViT-E–DenseNet169, achieved 98.5% accuracy, 98.9% sensitivity, 99.15% specificity, and 97.35% AUC, surpassing a strong baseline hybrid model (CNN–XGBoost/ANN) and existing systems. A second contribution is improved interpretability, moving from mere illustration to validation. Gradient-weighted class activation mapping (Grad-CAM) maps demonstrated distinct and clinically understandable concentration patterns across thyroid cancer types: precise intralesional concentration for high-confidence malignancies (PTC = 0.968), edge/interface concentration for capsule-risk patterns (PTC = 0.957), and broader-field activation consistent with infiltration concerns (PTC = 0.984), while benign scans showed low, diffuse activation (PTC = 0.002). Spatial audits reinforced this behavior (IoU/PAP: 0.72/91%, 0.65/78%, 0.58/62%). Conclusions: The integrated ViT-E–DenseNet169 framework provides highly accurate thyroid cancer detection while offering clinically meaningful interpretability through Grad-CAM-based spatial validation, supporting greater confidence in AI-assisted ultrasound diagnosis. Full article
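The spatial-audit metrics quoted in this abstract can be illustrated with a minimal sketch: intersection-over-union between a thresholded Grad-CAM mask and an expert lesion annotation, plus a pointing-game-style check of whether the heatmap peak lands inside the lesion. The paper's exact PAP definition is not reproduced here; this is a generic approximation over flat binary masks.

```python
def iou(mask_a, mask_b):
    """Intersection-over-union of two flat binary masks, e.g. a thresholded
    Grad-CAM heatmap vs. an expert lesion annotation."""
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return inter / union if union else 0.0

def peak_inside(heatmap, lesion_mask):
    """Pointing-game check: does the heatmap's peak fall inside the lesion?"""
    peak = max(range(len(heatmap)), key=lambda i: heatmap[i])
    return bool(lesion_mask[peak])
```

An audit like the one reported would aggregate these per-image scores across a test set, yielding figures such as IoU = 0.72 with the peak inside the lesion on 91% of cases.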
(This article belongs to the Special Issue Deep Learning Techniques for Medical Image Analysis)

20 pages, 2112 KB  
Article
CE-Fusion Botanic: A Lightweight Leaf Disease Detection Model via Adaptive Local–Global Information Fusion
by Yamei Bao, Xiaolong Qi, Huiling Wang, Tao Liu and Yuqi Bai
Appl. Sci. 2026, 16(7), 3177; https://doi.org/10.3390/app16073177 - 25 Mar 2026
Abstract
To address the limited generalization ability common in lightweight models for leaf disease detection, this paper proposes a lightweight detection model, CE-Fusion Botanic, built on adaptive control of local–global information fusion. The model includes a globally guided dynamic gating fusion mechanism that dynamically adjusts the fusion weights between local features, such as spot lesions, and global semantic features, such as symptoms of systemic infection, thereby achieving adaptive perception of the dual characteristics of plant diseases. The local information extraction branch combines an improved MobileNetV3-Small structure with a CBAM attention mechanism, while the global information extraction branch uses a lightweight Vision Transformer (ViT) design called EffiViT. Comprehensive comparison experiments were carried out against seven mainstream lightweight models on the PlantVillage tomato disease subset, the full-category PlantVillage leaf disease dataset, and the Grapevine leaf disease dataset, with models grouped into large-, medium-, and small-scale categories by parameter count. The results show that CE-Fusion Botanic significantly outperforms the comparison methods in both detection accuracy and generalization performance while retaining a lightweight profile, demonstrating superior cross-dataset adaptation capabilities. Full article
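The dynamic gating fusion described in this abstract can be sketched as a scalar sigmoid gate computed from both feature vectors, weighting the local (CNN) and global (ViT) representations per sample. The scalar-gate form and the weights below are illustrative assumptions, not the model's trained mechanism.

```python
import math

def sigmoid(x):
    """Logistic function mapping any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(local_feat, global_feat, w_local, w_global, bias):
    """Globally guided dynamic gate: a gate g in (0, 1), computed from both
    feature vectors, blends local and global features elementwise as
    g * local + (1 - g) * global. Weights are illustrative, not learned."""
    score = bias
    score += sum(w * f for w, f in zip(w_local, local_feat))
    score += sum(w * f for w, f in zip(w_global, global_feat))
    g = sigmoid(score)
    return [g * l + (1.0 - g) * gl for l, gl in zip(local_feat, global_feat)]
```

With zero weights and zero bias the gate sits at 0.5 and the fusion reduces to a plain average; a strongly positive gate score instead lets local lesion features dominate, which is the adaptive behavior the gating mechanism is meant to learn.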
(This article belongs to the Section Computing and Artificial Intelligence)
