Search Results (2,383)

Search Parameters:
Keywords = robust vision

37 pages, 7225 KB  
Review
Artificial Intelligence-Enabled Intelligent Sensory Systems for Quality Evaluation of Traditional Chinese Medicine: A Review of Electronic Nose, Electronic Tongue, and Machine Vision Approaches
by Jingqiu Shi, Jinyi Wu, Li Xu, Ce Tang and Yi Zhang
Molecules 2026, 31(7), 1140; https://doi.org/10.3390/molecules31071140 - 30 Mar 2026
Abstract
Traditional sensory evaluation of traditional Chinese medicine (TCM) and medicinal and food homologous products has long relied on human observation of appearance, color, aroma, and taste. However, this approach is highly subjective, difficult to quantify, and often lacks reproducibility across evaluators. Intelligent sensory systems, including the electronic nose, electronic tongue, and machine vision, provide objective and digitized sensory information for TCM quality evaluation. Nevertheless, these platforms generate high-dimensional and heterogeneous datasets, creating a strong demand for efficient artificial intelligence (AI)-based analytical tools. This review summarizes recent advances in the application of machine learning and deep learning methods, such as support vector machine, random forest, convolutional neural network, and long short-term memory networks, for intelligent sensory evaluation of TCM. Particular emphasis is placed on how AI supports feature extraction, pattern recognition, classification, regression, and multisource data fusion across electronic nose, electronic tongue, and machine vision systems. Representative applications in raw material authentication, geographical origin discrimination, processing monitoring, and quality grading are also discussed. In addition, the current challenges related to data standardization, sensor drift, model robustness, and interpretability are highlighted. Overall, this review provides an integrated overview of AI-enabled intelligent sensory technologies and clarifies their potential to advance TCM quality evaluation toward a more objective, efficient, and holistic framework. Full article
26 pages, 17618 KB  
Article
Foveated Retinotopy Improves Classification and Localization in Convolutional Neural Networks
by Jean-Nicolas Jérémie, Emmanuel Daucé and Laurent U. Perrinet
Vision 2026, 10(2), 17; https://doi.org/10.3390/vision10020017 - 30 Mar 2026
Abstract
From falcons spotting prey to humans recognizing faces, the ability to rapidly process visual information depends on a foveated retinal organization that provides high-acuity central vision while preserving low-resolution peripheral vision. This organization is conserved along early visual pathways, yet remains under-explored in machine learning. Here, we examine the impact of embedding a foveated retinotopic transformation as a preprocessing layer on convolutional neural networks (CNNs) for image classification. By applying a log-polar mapping to off-the-shelf models and retraining them, we achieve comparable accuracy while improving robustness to scale and rotation. We demonstrate that this architecture is highly sensitive to shifts in the fixation point and that this sensitivity provides an effective proxy for defining saliency maps that facilitate object localization. Our results demonstrate that foveated retinotopy encodes prior geometric knowledge, providing a solution for visual search and a meaningful trade-off between classification robustness and localization. These findings provide a proof of concept connecting principles of biological vision with artificial networks, suggesting new, robust, and efficient approaches for computer vision systems. Full article
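The log-polar remapping described in this abstract resamples an image so that resolution is dense near a fixation point and sparse in the periphery. The sketch below is an illustrative assumption, not the authors' implementation: grid sizes, nearest-neighbour sampling, and the list-of-lists image format are all choices made here for a dependency-free example.

```python
import math

def log_polar_map(img, out_r=8, out_theta=8, center=None):
    """Resample a 2D image (list of lists) onto a log-polar grid.

    Rows of the output index log-radius (fine near the fixation point,
    coarse in the periphery); columns index angle. Nearest-neighbour
    sampling keeps the sketch dependency-free.
    """
    h, w = len(img), len(img[0])
    cy, cx = center if center else (h / 2.0, w / 2.0)
    max_r = math.hypot(max(cy, h - cy), max(cx, w - cx))
    out = []
    for i in range(out_r):
        # log spacing: radius grows exponentially with the row index
        r = math.exp(math.log(max_r) * (i + 1) / out_r) - 1.0
        row = []
        for j in range(out_theta):
            t = 2.0 * math.pi * j / out_theta
            y = min(max(int(round(cy + r * math.sin(t))), 0), h - 1)
            x = min(max(int(round(cx + r * math.cos(t))), 0), w - 1)
            row.append(img[y][x])
        out.append(row)
    return out
```

Because the output grid is fixed, shifting the fixation `center` changes which pixels dominate the representation, which is the sensitivity the paper exploits for saliency.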

23 pages, 3504 KB  
Article
Spatially Time-Based Robust Tracking and Re-Identification of Kindergarten Students: A Hybrid Deep Learning Framework Combining YOLOv8n and Vision Transformer (ViT)
by Md. Rahatul Islam, Yui Kataoka, Keisuke Teramoto and Keiichi Horio
J. Imaging 2026, 12(4), 150; https://doi.org/10.3390/jimaging12040150 - 30 Mar 2026
Abstract
Detection, tracking, and re-identification (ReID) of children wearing similar uniforms in a kindergarten environment is a highly complex challenge for computer vision. Traditional surveillance systems or simple convolutional neural network (CNN) models often fail to distinguish children in crowded and occluded scenes. To address this challenge, this study proposes a novel hybrid framework combining YOLOv8 and Vision Transformer (ViT). Using YOLOv8 for detection and ViT for global feature extraction, we trained the model on a custom dataset of 31,521 images, achieving an overall accuracy of 93.75%, and on the public benchmark MOT20 dataset of 28,630 images, achieving an overall accuracy of 96.02%. Our system showed remarkable tracking performance, achieving 86.7% MOTA and 99.7% IDF1 scores. This high IDF1 score indicates that the model is highly effective in preventing identity switches. The main novelty of this study is the behavioral analysis of children beyond the boundaries of surveillance, where we measure walking distance, trajectory, and screen time. Finally, through cross-dataset comparison with the MOT20 public benchmark, we demonstrated that our proposed customized model is much more effective than current state-of-the-art methods in overcoming the domain gap in specific environments such as kindergartens. Full article
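At its core, the ViT-based ReID step matches each detection's embedding against a gallery of known identities. A minimal nearest-neighbour matcher by cosine similarity can sketch the idea; the gallery layout, the 0.7 threshold, and the toy 2-D embeddings are illustrative assumptions, not the paper's implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def reidentify(query, gallery, threshold=0.7):
    """Match a query embedding against a gallery {identity: embedding}.

    Returns the best-matching identity, or None if no gallery entry
    clears the similarity threshold (treated as a new identity).
    """
    best_id, best_sim = None, threshold
    for ident, emb in gallery.items():
        sim = cosine(query, emb)
        if sim > best_sim:
            best_id, best_sim = ident, sim
    return best_id
```

A high IDF1 corresponds to this matcher rarely assigning a track's detections to the wrong gallery identity over time.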

16 pages, 13705 KB  
Article
PRefiner: Enhancing Overlapped Cervical Cell Segmentation Through Progressive Refinement
by Linlin Zhu, Jiaxun Li and Jiaxi Liu
Electronics 2026, 15(7), 1418; https://doi.org/10.3390/electronics15071418 - 28 Mar 2026
Abstract
Cervical cancer is one of the most prevalent and easily contracted diseases among women, significantly impacting their daily lives. Computer vision-based cervical cell morphology diagnosis technology can offer robust support for cervical cell analysis at a lower cost. However, the presence of a substantial number of overlapping cells in cervical images renders existing cell segmentation methods less accurate, thereby complicating the guidance of medical diagnosis. In this paper, we introduce a tristage Progressive Refinement method (PRefiner) for overlapping cell segmentation that decouples the traditional end-to-end pipeline, with the final stage specifically correcting anomalous results to enhance precision. We achieve separable overlapping cervical cell segmentation results through a cell nucleus locator, a single-cell segmenter, and a Segmentation Result Mask Refiner. Specifically, we employ a hybrid U-Net as the primary network for the cell nucleus locator and single-cell segmenter, which determines the position of the cell nucleus and procures the initial coarse segmentation result. In the mask refiner, we incorporate a conditional generation framework to address the perception decision problem and design a local–global dual-scale discriminator to ensure that the segmentation result aligns with the prior of a single-cell mask. Experimental results on CCEDD and ISBI2015 demonstrate that PRefiner achieves optimal performance by effectively resolving abnormal segmentations. Notably, our method improves the Dice coefficient of abnormal results from five different models by an average of 2.62% (ranging from 1.0% to 5.1%). Full article
(This article belongs to the Special Issue AI-Driven Image Processing: Theory, Methods, and Applications)

24 pages, 1254 KB  
Article
ConvNeXt Meets Vision Transformers: A Powerful Hybrid Framework for Facial Age Estimation
by Gaby Maroun, Salah Eddine Bekhouche and Fadi Dornaika
Appl. Sci. 2026, 16(7), 3281; https://doi.org/10.3390/app16073281 - 28 Mar 2026
Abstract
Age estimation based on facial images is a challenging task due to the complex and nonlinear nature of facial aging, which is influenced by both genetic and environmental factors. To address this challenge, we propose a hybrid ConvNeXt–Transformer framework that combines convolutional local feature extraction with attention-based global contextual modeling within a unified age regression pipeline. The methodological contribution of this work lies in the sequential integration of these two complementary paradigms for facial age estimation, allowing the model to capture both fine-grained textural cues—such as wrinkles and skin spots—and long-range spatial dependencies. We evaluate the proposed framework on benchmark datasets including MORPH II, CACD, UTKFace, and AFAD. The results show competitive performance across these datasets and confirm the effectiveness of the proposed hybrid design through extensive ablation analyses. Experimental results demonstrate that our approach achieves state-of-the-art MAE on MORPH II (2.26), CACD (4.35), and AFAD (3.09) under the adopted benchmark settings while remaining competitive on UTKFace. To address computational efficiency, we employ ImageNet pre-trained backbones and explore different architectural configurations, including fusion strategies and varying depths of the Transformer module, as well as regularization techniques such as stochastic depth and label smoothing. Ablation studies confirm the contribution of each component, particularly the role of attention mechanisms, in enhancing the model’s sensitivity to age-relevant features. Overall, the proposed hybrid framework provides a robust and accurate solution for facial age estimation, effectively balancing performance and computational cost. Full article
(This article belongs to the Special Issue Applications of Data Science and Artificial Intelligence, 2nd Edition)

25 pages, 9331 KB  
Article
Numerical Investigation on Hydrodynamic Characteristics of Variable Flexible Tube Underwater Object Suction Robot
by Yida Zhu, Fenglei Han, Qing Chang, Wangyuan Zhao, Shuxuan Liang and Jiaqi Yu
J. Mar. Sci. Eng. 2026, 14(7), 624; https://doi.org/10.3390/jmse14070624 - 27 Mar 2026
Abstract
Remotely operated underwater vehicles (ROVs) play a significant role in the domain of underwater robotics, as observed in the field of deep-sea aquaculture. However, conventional stationary suction-tube underwater collection robots often struggle to efficiently collect target organisms located within complex reef environments. To address this limitation, this paper proposes an underwater object suction robot with a variable flexible tube. For vision-based object recognition tasks, stable vehicle motion is essential, as hydrodynamic disturbances can significantly degrade visual accuracy. Therefore, a systematic numerical investigation is conducted into the hydrodynamic characteristics of the ROV under different suction-tube shapes. Computational fluid dynamics (CFD) simulations are used to evaluate the resistance acting on the vehicle. The results provide guidance for motion control strategies aimed at reducing disturbance effects and improving the robustness of underwater robotic vision. Full article
(This article belongs to the Special Issue Infrastructure for Offshore Aquaculture Farms)

50 pages, 7780 KB  
Systematic Review
Intelligent Eyes on Buildings: A Scientometric Mapping and Systematic Review of AI-Based Crack Detection and Predictive Diagnostics of Building Structures
by Mehdi Mohagheghi, Ali Bahadori-Jahromi and Shah Room
Encyclopedia 2026, 6(4), 75; https://doi.org/10.3390/encyclopedia6040075 - 27 Mar 2026
Abstract
Artificial Intelligence (AI)-based crack detection in buildings uses computer vision and deep learning to automatically identify structural cracks from inspection images. In recent years, many studies have explored this topic, but the overall development of the field, its methodological practices, and the remaining challenges are still not fully clear. Unlike most previous reviews that focus mainly on technical methods, this study combines a large-scale scientometric mapping of the research field with a focused technical analysis of recent AI-based crack detection methods specifically applied to building structures. This study therefore provides a dual-layer review covering research published between 2015 and 2025. A total of 146 Scopus-indexed publications were analysed using Visualization of Similarities viewer (VOSviewer) to examine publication growth, thematic evolution, collaboration patterns, and citation structures. In addition, a focused technical review of 36 highly relevant studies was carried out to analyse task formulations, model families, datasets, evaluation protocols, and methodological practices. The results show a rapid increase in research activity after 2020, largely driven by advances in deep-learning and Unmanned Aerial Vehicle (UAV)-based inspections. At the same time, collaboration networks remain uneven, and citation influence is concentrated in a limited number of research communities. The technical review further shows that most studies focus on detection-level tasks, particularly You Only Look Once (YOLO)-based models, while predictive diagnostics, automated inspection reporting, and decision-oriented Structural Health Monitoring (SHM) are still rarely addressed. Current datasets and evaluation protocols also remain mostly perception-oriented, which makes it difficult to assess robustness, generalisability and long-term predictive capability. Full article

36 pages, 7711 KB  
Article
Integrating Visual Perception with Conservative Enhanced Bio-Inspired Optimization for Safe UAV Trajectory Planning
by Qiushuang Gao, Zhenshen Qu, Qihang Zhang and Yuhao Shang
Appl. Sci. 2026, 16(7), 3245; https://doi.org/10.3390/app16073245 - 27 Mar 2026
Abstract
Unmanned Aerial Vehicle (UAV) trajectory planning in complex three-dimensional environments with threats remains a challenging optimization problem requiring efficient algorithms and threat detection capabilities. This study proposes the Conservative Enhanced Dwarf Mongoose Optimization Algorithm (CEDMOA), which introduces four key innovations to the original DMOA: hybrid population initialization, adaptive vocalization parameters, an elite-guided learning strategy, and intelligent restart mechanisms. This work proposes the integration of CEDMOA with a novel vision-based threat detection system using YOLO object detection technology, enabling the identification and incorporation of threats into the optimization process. CEDMOA was comprehensively evaluated on the CEC2022 benchmark test suite, demonstrating superior performance compared to other state-of-the-art algorithms in solution quality and convergence stability. The results show that the approach successfully generates optimal collision-free flight trajectories in complex UAV planning environments with both static and dynamic threats. Combining metaheuristic optimization with computer vision technology provides a robust framework for autonomous navigation that adapts to changing threat conditions. Experimental results validate the effectiveness of both the enhanced algorithm and the vision-based threat integration approach for practical UAV operations. Full article
(This article belongs to the Special Issue Latest Research on Computer Vision and Its Application)

19 pages, 3480 KB  
Article
Adapting Vision–Language Models for Few-Shot Industrial Defect Detection
by Chayanon Sub-r-pa and Rung-Ching Chen
Algorithms 2026, 19(4), 259; https://doi.org/10.3390/a19040259 - 27 Mar 2026
Abstract
Automated surface defect detection often faces a “cold-start” problem due to limited annotated data for new anomalies. Traditional object detectors struggle to converge in such few-shot settings. To address this, we adapt Vision–Language Models (VLMs), specifically YOLO-World. We use semantic pre-training to mitigate data scarcity. We evaluate this approach on the MVTec AD dataset in bounding-box format. We use a strict 1:9 train-validation split, resulting in an average of 11.8 defect instances per category. YOLO-World surpasses traditional baselines, like YOLOv11s and YOLOv26s, in 12 of 15 categories. The optimized VLM pipeline achieves up to 64.9% mAP@50 on texture-heavy categories, such as Tile, with only nine training instances. Ablation studies show standard optimization techniques are limited under 10-shot constraints. We find a critical augmentation divide. Disabling spatial distortions (Mosaic) is vital to preserving rigid-object geometry. The Normalized Wasserstein Distance (NWD) improves the localization of microscopic anomalies. Varifocal Loss (VFL) often causes model collapse. Ultimately, VLMs offer a superior foundation for cold-start inspection but require carefully tailored pipelines for robustness. Full article
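The Normalized Wasserstein Distance mentioned above compares boxes by modelling each as a 2D Gaussian, which stays smooth even when tiny boxes do not overlap. The sketch below follows the commonly used NWD formulation; the constant `c` is a dataset-dependent assumption, and this is not necessarily the exact variant used in the paper's pipeline.

```python
import math

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between boxes (cx, cy, w, h).

    Each box is modelled as a 2D Gaussian N([cx, cy], diag(w^2/4, h^2/4));
    the 2-Wasserstein distance between the Gaussians is mapped into (0, 1]
    with exp(-W2 / c), where c is a dataset-dependent constant.
    """
    (cx1, cy1, w1, h1), (cx2, cy2, w2, h2) = box_a, box_b
    w2_sq = ((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
             + ((w1 - w2) / 2.0) ** 2 + ((h1 - h2) / 2.0) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)
```

Unlike IoU, which drops to zero for any non-overlapping pair, NWD decays gradually with distance, which is why it can improve localization of microscopic anomalies.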

30 pages, 2925 KB  
Review
Microalgae-Driven Circular Agriculture: System Integration, Nutrient Recovery, and AI-Assisted Optimization
by Xiaoyan Liu, Lijuan Wang, Chunyu Xing, Haiyan Liu, Guanghong Luo and Shenghui Yang
Microorganisms 2026, 14(4), 753; https://doi.org/10.3390/microorganisms14040753 - 27 Mar 2026
Abstract
With rising global pressures on resources and the environment, transitioning away from traditional linear agricultural models is long overdue. Circular agriculture seeks to close nutrient loops, but its progress is constrained by fragmented process integration, a lack of system-level integration and optimization, and poor adaptive decision-making under the highly variable conditions of agricultural systems. Microalgae are a versatile photosynthetic platform with unique value in this context: they can simultaneously recover key nutrients (nitrogen, phosphorus, and carbon) from agricultural wastes and convert them into multipurpose biomass. This review synthesizes the multifunctional roles of microalgae in sustainable agriculture, with particular emphasis on nutrient recycling and the use of whole microalgal biomass. Downstream applications are manifold, ranging from agricultural outputs, such as biofertilizers and biostimulants, to high-value products (HVPs). Realizing this potential requires addressing practical challenges in integrated system design, coupling, and scale-up, and AI-assisted modelling and optimization are emerging as important tools for this purpose. Reliable system optimization relies on defining objective functions that balance resource recovery efficiency and economic output, which in turn enables robust multi-objective decision-making. We conclude by proposing a holistic vision centered on an integrated biorefinery concept, demonstrating how co-integrated high-value utilization and nutrient cycling can enhance the competitiveness, sustainability, and scalability of microalgae-based agricultural systems. Full article
(This article belongs to the Section Environmental Microbiology)

34 pages, 6554 KB  
Article
Syncretic Grad-CAM Integrated ViT-CNN Hybrids with Inherent Explainability for Early Thyroid Cancer Diagnosis from Ultrasound
by Ahmed Y. Alhafdhi, Gibrael Abosamra and Abdulrhman M. Alshareef
Diagnostics 2026, 16(7), 999; https://doi.org/10.3390/diagnostics16070999 - 26 Mar 2026
Abstract
Background/Objectives: Accurate detection of thyroid cancer using ultrasound remains a challenge, as malignant nodules can be microscopic and heterogeneous, easily confused with point clusters and borderline-featured tissues. Current studies in deep learning demonstrate good performance with convolutional neural networks (CNNs) and clustering; however, many approaches focus on local tissue and provide limited, non-quantitative interpretation, reducing clinical confidence. This study proposes an integrated framework combining enhanced convolutional feature encoders (DenseNet169 and VGG19) with an enhanced vision transformer (ViT-E) to integrate local feature and global relational context during learning, rather than delayed integration. Methods: The proposed framework integrates enhanced convolutional feature encoders (DenseNet169 and VGG19) with an enhanced vision transformer (ViT-E), enabling simultaneous learning of local feature representations and global relational context. This design allows feature fusion during the learning stage instead of delayed integration, aiming to improve diagnostic performance and interpretability in thyroid ultrasound image analysis. Results: The best-performing model, ViT-E–DenseNet169, achieved 98.5% accuracy, 98.9% sensitivity, 99.15% specificity, and 97.35% AUC, surpassing the robust basic hybrid model (CNN–XGBoost/ANN) and existing systems. A second contribution is improved interpretability, moving from mere illustration to validation. Gradient-weighted class activation mapping (Grad-CAM) maps demonstrated distinct and clinically understandable concentration patterns across various thyroid cancers: precise intralesional concentration for high-confidence malignancies (PTC = 0.968), edge/interface concentration for capsule risk patterns (PTC = 0.957), and broader-field activation consistent with infiltration concerns (PTC = 0.984), while benign scans showed low and diffuse activation (PTC = 0.002). Spatial audits reinforced this behavior (IoU/PAP: 0.72/91%, 0.65/78%, 0.58/62%). Conclusions: The integrated ViT-E–DenseNet169 framework provides highly accurate thyroid cancer detection while offering clinically meaningful interpretability through Grad-CAM-based spatial validation, supporting improved confidence in AI-assisted ultrasound diagnosis. Full article
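The spatial audits above score how well Grad-CAM activation regions overlap annotated lesions via IoU, which for two binarized maps reduces to intersection-over-union of masks. A minimal sketch, assuming masks are same-shaped grids of 0/1 values (the paper's exact thresholding and mask format are not given here):

```python
def mask_iou(mask_a, mask_b):
    """IoU between two same-shaped binary masks (lists of 0/1 rows).

    Counts cells set in both masks (intersection) and in either mask
    (union); returns 0.0 for two empty masks.
    """
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return inter / union if union else 0.0
```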
(This article belongs to the Special Issue Deep Learning Techniques for Medical Image Analysis)

33 pages, 15024 KB  
Article
HFA-Net: Explainable Multi-Scale Deep Learning Framework for Illumination-Invariant Plant Disease Diagnosis in Precision Agriculture
by Muhammad Hassaan Ashraf, Farhana Jabeen, Muhammad Waqar and Ajung Kim
Sensors 2026, 26(7), 2067; https://doi.org/10.3390/s26072067 - 26 Mar 2026
Abstract
Robust plant disease detection in real-world agricultural environments remains challenging due to dynamic environmental conditions. Accurate and reliable disease identification is essential for precision agriculture and effective crop management. Although computer vision and Artificial Intelligence (AI) have shown promising results in controlled settings, their performance often drops under lesion scale variability, inter- and intra-class similarity among diseases, class imbalance, and illumination fluctuations. To overcome these challenges, we propose a Heterogeneous Feature Aggregation Network (HFA-Net) that brings together architectural improvements, illumination-aware preprocessing, and training-level enhancements into a single cohesive framework. To extract richer and more discriminative features from the early layers of the network, HFA-Net introduces a multi-scale, multi-level feature aggregation stem. The Reduction-Expansion (RE) mechanism helps preserve important lesion details while adapting to variations in scale. An Illumination-Adaptive Contrast Enhancement (IACE) preprocessing pipeline is designed to address illumination variability in real agricultural environments. Experimental results show that HFA-Net achieves 96.03% accuracy under normal conditions and maintains strong performance under challenging lighting scenarios, achieving 92.95% and 93.07% accuracy in extremely dark and bright environments, respectively. Furthermore, quantitative explainability analysis using perturbation-based metrics demonstrates that the model's predictions are not only accurate but also faithful to disease-relevant regions. Finally, Grad-CAM-based visual explanations confirm that the model's predictions are driven by disease-specific regions, enhancing interpretability and practical reliability. Full article
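The IACE pipeline itself is not specified in this abstract. As a generic illustration of illumination-adaptive preprocessing, the sketch below applies a gamma correction whose exponent is derived from the image's mean brightness; the target mean and the gamma law are assumptions for this example, not the authors' method.

```python
import math

def adaptive_gamma(pixels, target_mean=0.5):
    """Brighten dark images and darken bright ones via gamma correction.

    `pixels` is a flat list of intensities in [0, 1]. The gamma exponent
    is chosen so the mean intensity is pulled toward `target_mean`:
    gamma = log(target_mean) / log(current_mean).
    """
    mean = sum(pixels) / len(pixels)
    if mean <= 0.0 or mean >= 1.0:
        return list(pixels)  # degenerate image: nothing to adapt
    gamma = math.log(target_mean) / math.log(mean)
    return [p ** gamma for p in pixels]
```

For a uniformly dark image (mean 0.1), gamma is about 0.3, lifting intensities toward mid-gray; for a uniformly bright one, gamma exceeds 1 and compresses highlights.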
(This article belongs to the Section Smart Agriculture)

33 pages, 1803 KB  
Article
An AI-Driven Dual-Spectral Vision–Language Sensing Framework for Intelligent Agricultural Phenotyping
by Lei Shi, Zhiyuan Chen, Chengze Li, Yang Hu, Xintong Wang, Haibo Wang and Yihong Song
Sensors 2026, 26(7), 2045; https://doi.org/10.3390/s26072045 - 25 Mar 2026
Abstract
Seed varietal purity and physiological viability are critical determinants of crop yield and quality. However, non-destructive assessment faces significant challenges in fine-grained variety discrimination and the perception of internal defects. This study proposes S3-Net, an AI-driven multimodal sensing framework that integrates vision–language alignment with dual-spectral sensor fusion for autonomous seed quality evaluation. We introduce a Knowledge–Vision Alignment (KVA) module that incorporates encyclopedic morphological descriptions to guide feature learning, significantly enhancing few-shot generalization. Complementarily, a Dual-Spectral Fusion (DSF) module combines high-resolution RGB textures with penetrative Short-Wave Infrared (SWIR) sensing to jointly characterize external and internal traits. Experimental results on a custom multimodal dataset of 6000 samples across 12 crop categories demonstrate that S3-Net achieves 96.9% accuracy for species identification and 95.8% for viability detection. Notably, S3-Net outperforms ResNet-50 by 40.3% in extreme 1-shot scenarios. With a stable inference throughput of 95 fps, the system meets the high-throughput demands of industrial-scale applications, providing a robust and efficient solution for intelligent agricultural phenotyping. Full article
(This article belongs to the Special Issue Artificial Intelligence-Driven Sensing)

22 pages, 2243 KB  
Article
Multimodal Fake News Detection via Evidence Retrieval and Visual Forensics with Large Vision-Language Models
by Liwei Dong, Yanli Chen, Wei Ke, Hanzhou Wu, Lunzhi Deng and Guixiang Liao
Information 2026, 17(4), 317; https://doi.org/10.3390/info17040317 - 25 Mar 2026
Abstract
Fake news has caused significant harm and disruption across various sectors of society. With the rapid advancement of the Internet and social media platforms, both academic and industrial communities have shown growing interest in multimodal fake news detection. In this work, we propose MERF (Multimodal Evidence Retrieval and Forensics with LVLM), a unified framework for multimodal fake news detection that leverages the reasoning capabilities of Large Vision-Language Models (LVLMs). While LVLMs outperform traditional Large Language Models (LLMs) in processing multimodal content, our study reveals that their reasoning abilities remain limited in the absence of sufficient supporting evidence. MERF addresses this challenge by integrating web-based content retrieval, reverse image search, and image manipulation detection into a coherent pipeline, enabling the model to generate informed and explainable veracity judgments. Specifically, our approach performs cross-modal consistency checking, retrieves corroborative information for both textual and visual content, and applies forensic analysis to detect potential visual forgeries. The aggregated evidence is then fed into the LVLM, facilitating comprehensive reasoning and evidence-based decision-making. Experimental results on two public benchmark datasets—Weibo and Twitter—demonstrate that MERF consistently outperforms state-of-the-art baselines across all major evaluation metrics, achieving substantial improvements in accuracy, robustness, and interpretability. Full article
(This article belongs to the Section Artificial Intelligence)

25 pages, 3673 KB  
Systematic Review
Recent Advances in Multi-Camera Computer Vision for Industry 4.0 and Smart Cities: A Systematic Review
by Carlos Julio Fierro-Silva, Carolina Del-Valle-Soto, Samih M. Mostafa and José Varela-Aldás
Algorithms 2026, 19(4), 249; https://doi.org/10.3390/a19040249 - 25 Mar 2026
Abstract
The rapid deployment of surveillance cameras in urban, industrial, and domestic environments has intensified the need for intelligent systems capable of analyzing video streams beyond the limitations of single-camera setups. Unlike traditional single-camera approaches, multi-camera systems expand spatial coverage, reduce blind spots, and enable consistent tracking of people and objects across non-overlapping views, thereby improving robustness against occlusions and viewpoint changes. This article presents a comprehensive review of multi-camera vision systems published between 2020 and 2025, covering application domains including public security and biometrics, intelligent transportation, smart cities and IoT, healthcare monitoring, precision agriculture, industry and robotics, pan–tilt–zoom (PTZ) camera networks, and emerging areas such as retail and forensic analysis. The review synthesizes predominant technical approaches, including deep-learning-based detection, multi-target multi-camera tracking (MTMCT), re-identification (Re-ID), spatiotemporal fusion, and edge computing architectures. Persistent challenges are identified, particularly in inter-camera data association, scalability, computational efficiency, privacy preservation, and dataset availability. Emerging trends such as distributed edge AI, cooperative camera networks, and active perception are discussed to outline future research directions toward scalable, privacy-aware, and intelligent multi-camera infrastructures. Full article
