Search Results (1,191)

Search Parameters:
Keywords = vision-based monitoring

24 pages, 911 KB  
Article
Lightweight Remote Sensing Image Change Caption with Hierarchical Distillation and Dual-Constrained Attention
by Xiude Wang, Xiaolan Xie and Zhongyi Zhai
Electronics 2026, 15(1), 17; https://doi.org/10.3390/electronics15010017 (registering DOI) - 19 Dec 2025
Abstract
Remote sensing image change captioning (RSICC) fuses computer vision and natural language processing to translate visual differences between bi-temporal remote sensing images into interpretable text, with applications in environmental monitoring, urban planning, and disaster assessment. Multimodal Large Language Models (MLLMs) boost RSICC performance but suffer from inefficient inference due to massive parameters, whereas lightweight models enable fast inference yet lack generalization across diverse scenes, which creates a critical timeliness-generalization trade-off. To address this, we propose the Dual-Constrained Transformer (DCT), an end-to-end lightweight RSICC model with three core modules and a decoder. Full-Level Feature Distillation (FLFD) transfers hierarchical knowledge from a pre-trained Dinov3 teacher to a Generalizable Lightweight Visual Encoder (GLVE), enhancing generalization while retaining compactness. Key Change Region Adaptive Weighting (KCR-AW) generates Region Difference Weights (RDW) to emphasize critical changes and suppress backgrounds. Hierarchical encoding and Difference weight Constrained Attention (HDC-Attention) refine multi-scale features via hierarchical encoding and RDW-guided noise suppression; these features are fused by multi-head self-attention and fed into a Transformer decoder for accurate descriptions. The DCT resolves three core issues: lightweight encoder generalization, key change recognition, and multi-scale feature-text association noise, achieving a dynamic balance between inference efficiency and description quality. Experiments on the public LEVIR-CC dataset show our method attains SOTA among lightweight approaches and matches advanced MLLM-based methods with only 0.98% of their parameters. Full article
(This article belongs to the Section Artificial Intelligence)
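As a rough illustration of the feature-distillation idea behind FLFD — not the authors' implementation, and with the dimensions, projection matrices, and plain MSE objective all chosen for the sketch — the student's hierarchy-level features can be projected into the teacher's space and penalized for disagreement:

```python
import numpy as np

rng = np.random.default_rng(0)

def distill_loss(student_feats, teacher_feats, projections):
    """Mean-squared error between linearly projected student features
    and teacher features, summed over the hierarchy levels."""
    total = 0.0
    for s, t, W in zip(student_feats, teacher_feats, projections):
        total += float(np.mean((s @ W - t) ** 2))
    return total

# Toy setup: two hierarchy levels, student dim 8 projected to teacher dim 16.
student = [rng.normal(size=(4, 8)) for _ in range(2)]
proj = [rng.normal(size=(8, 16)) for _ in range(2)]
teacher = [s @ W for s, W in zip(student, proj)]  # perfectly aligned teacher
```

In training, the projections would be learned jointly with the student encoder while the teacher (here Dinov3) stays frozen.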

24 pages, 2676 KB  
Article
The Adaptive Lab Mentor (ALM): An AI-Driven IoT Framework for Real-Time Personalized Guidance in Hands-On Engineering Education
by Md Shakib Hasan, Awais Ahmed, Nouman Rasool, MST Mosaddeka Naher Jabe, Xiaoyang Zeng and Farman Ali Pirzado
Sensors 2025, 25(24), 7688; https://doi.org/10.3390/s25247688 - 18 Dec 2025
Abstract
Engineering education is grounded in experiential learning, yet providing students with real-time, personalized feedback under laboratory conditions remains difficult. This paper proposes the Adaptive Lab Mentor (ALM), an approach that combines Artificial Intelligence (AI), the Internet of Things (IoT), and sensor technology to create an intelligent, personalized laboratory setting. ALM is built on a new real-time multimodal sensor-fusion model in which a sensor-instrumented laboratory records live electrical measurements (voltage and current) that are processed alongside symbolic component measurements (target resistance) by a lightweight, dual-input Convolutional Neural Network (1D-CNN) running on an edge device. In this initial validation, visual context is represented as a symbolic target value, establishing a pathway for the future integration of full computer vision. The architecture enables monitoring of student progress, rapid error diagnosis, and adaptive, context-aware feedback. To test this strategy, a high-fidelity model of an Ohm's-law laboratory was developed, and LTspice was used to generate a large set of current and voltage time series covering various circuit states. The trained model achieved 93.3% test accuracy, demonstrating the feasibility of the proposed system. Compared with current Intelligent Tutoring Systems, ALM rests on physical sensing, real-time edge AI inference, and adaptive, safety-aware feedback throughout hands-on engineering exercises, and the framework serves as a blueprint for a new class of smart laboratory assistants. Full article
(This article belongs to the Special Issue AI and Sensors in Computer-Based Educational Systems)
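The kind of symbolic error diagnosis ALM performs on live voltage/current readings against a target resistance can be sketched as follows; the thresholds and category names here are illustrative assumptions, not taken from the paper:

```python
def diagnose(voltage, current, target_resistance, tol=0.05):
    """Classify a student's circuit state from live V/I readings
    against the symbolic target resistance (Ohm's law: R = V/I)."""
    if current <= 1e-9:                      # no measurable current flowing
        return "open_circuit"
    measured_r = voltage / current
    if measured_r < 0.1 * target_resistance:  # far too little resistance
        return "possible_short"
    rel_err = abs(measured_r - target_resistance) / target_resistance
    return "correct" if rel_err <= tol else "wrong_component"
```

A rule layer like this is what the 1D-CNN in the paper replaces with learned classification over time-series windows.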
12 pages, 2468 KB  
Article
A Real-World Underwater Video Dataset with Labeled Frames and Water-Quality Metadata for Aquaculture Monitoring
by Osbaldo Aragón-Banderas, Leonardo Trujillo, Yolocuauhtli Salazar, Guillaume J. V. E. Baguette and Jesús L. Arce-Valdez
Data 2025, 10(12), 211; https://doi.org/10.3390/data10120211 - 18 Dec 2025
Abstract
Aquaculture monitoring increasingly relies on computer vision to evaluate fish behavior and welfare under farming conditions. This dataset was collected in a commercial recirculating aquaculture system (RAS) integrated with hydroponics in Queretaro, Mexico, to support the development of robust visual models for Nile tilapia (Oreochromis niloticus). More than ten hours of underwater recordings were curated into 31 clips of 30 s each, a duration selected to balance representativeness of fish activity with a manageable size for annotation and training. Videos were captured using commercial action cameras at multiple resolutions (1920 × 1080 to 5312 × 4648 px), frame rates (24–60 fps), depths, and lighting configurations, reproducing real-world challenges such as turbidity, suspended solids, and variable illumination. For each recording, physicochemical parameters were measured, including temperature, pH, dissolved oxygen and turbidity, and are provided in a structured CSV file. In addition to the raw videos, the dataset includes 3520 extracted frames annotated using a polygon-based JSON format, enabling direct use for training object detection and behavior recognition models. This dual resource of unprocessed clips and annotated images enhances reproducibility, benchmarking, and comparative studies. By combining synchronized environmental data with annotated underwater imagery, the dataset contributes a non-invasive and versatile resource for advancing aquaculture monitoring through computer vision. Full article
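Polygon annotations in a JSON format like the one described can be consumed directly for detection training; the record layout below (`label` and `points` keys) is a hypothetical example of such a format, and the shoelace formula gives the annotated region's pixel area:

```python
import json

def polygon_area(points):
    """Shoelace area of a polygon given [[x, y], ...] vertices."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# Hypothetical annotation record in a polygon-based JSON style.
record = json.loads('{"label": "tilapia", "points": [[0,0],[40,0],[40,20],[0,20]]}')
```

Per-instance areas like this are useful for filtering tiny, likely spurious annotations before training.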

26 pages, 7907 KB  
Review
Non-Destructive Testing for Conveyor Belt Monitoring and Diagnostics: A Review
by Aleksandra Rzeszowska, Ryszard Błażej and Leszek Jurdziak
Appl. Sci. 2025, 15(24), 13272; https://doi.org/10.3390/app152413272 - 18 Dec 2025
Abstract
Conveyor belts are among the most critical components of material transport systems across various industrial sectors, including mining, energy, cement production, metallurgy, and logistics. Their reliability directly affects production continuity and operational costs. Traditional methods for assessing belt condition often require downtime, are labor-intensive, and involve a degree of subjectivity. In recent years, there has been growing interest in non-destructive and remote diagnostic techniques that enable continuous, automated condition monitoring. This paper provides a comprehensive review of current diagnostic solutions, including machine vision systems, infrared thermography, ultrasonic and acoustic techniques, magnetic inspection methods, vibration sensors, and modern approaches based on radar and hyperspectral imaging. Particular attention is paid to the integration of measurement systems with artificial intelligence algorithms for automated damage detection, classification, and failure prediction. The advantages and limitations of each method are discussed, along with prospects for future development, such as digital twin concepts and predictive maintenance. The review aims to present recent trends in non-invasive diagnostics of conveyor belts using remote and non-destructive testing techniques, and to identify research directions that can enhance the reliability and efficiency of industrial transport systems. Full article
(This article belongs to the Special Issue Nondestructive Testing and Metrology for Advanced Manufacturing)

17 pages, 5885 KB  
Article
Real-Time Detection of Dynamic Targets in Dynamic Scattering Media
by Ying Jin, Wenbo Zhao, Siyu Guo, Jiakuan Zhang, Lixun Ye, Chen Nie, Yiyang Zhu, Hongfei Yu, Cangtao Zhou and Wanjun Dai
Photonics 2025, 12(12), 1242; https://doi.org/10.3390/photonics12121242 - 18 Dec 2025
Abstract
In environments with dynamic scattering media (such as rain, fog, or biological tissue), scattered light severely degrades target images, directly causing a sharp drop in the confidence of object-detection models and a significant increase in missed detections. This is a key challenge at the intersection of optical imaging and computer vision. To address the poor generalization and slow inference of existing schemes, we construct an end-to-end framework of multi-stage preprocessing, customized network reconstruction, and object detection built on an existing network framework. First, preprocessing optimizes the original degraded image to suppress scattered noise at the source while retaining the key features needed for detection. A lightweight, customized network (with only 8.20 M parameters) then performs high-fidelity reconstruction to further reduce scattering interference before the final detection stage. The framework's inference speed significantly exceeds that of the existing network, reaching 147.93 frames per second on an RTX 4060. After reconstruction, the average confidence of dynamic object detection is 0.95, with a maximum of 0.99, effectively solving the problem of detection failure in dynamic scattering media. The framework can provide technical support for scenarios such as unmanned aerial vehicle (UAV) monitoring in foggy weather, biomedical target recognition, and low-altitude security. Full article

9 pages, 1886 KB  
Proceeding Paper
On the Optimization of Additively Manufactured Part Quality Through Process Monitoring: The Wire DED-LB Case
by Konstantinos Tzimanis, Michail S. Koutsokeras, Nikolas Porevopoulos and Panagiotis Stavropoulos
Eng. Proc. 2025, 119(1), 26; https://doi.org/10.3390/engproc2025119026 - 17 Dec 2025
Abstract
The wire Laser-based Directed Energy Deposition (DED-LB) metal additive manufacturing (AM) process is time- and cost-effective, providing high-quality, dense parts while supporting multi-scale manufacturing, repair, and repurposing services. However, its ability to consistently produce parts of uniform quality depends on process stability, which can be achieved through monitoring and controlling key process phenomena, such as heat accumulation and variations in the distance between the deposition head and the working surface (standoff distance). Part quality is closely linked to achieving predictable melt pool dimensions and stable thermal conditions, which in turn influence the end-part’s cross-sectional stability, overall dimensions, and mechanical properties. This work presents a workflow that correlates process and metrology data, enabling the determination of tunable process parameters and their operating process window. The process data are acquired using a vision-based monitoring system and a load-cell embedded in the deposition head, which together detect variations in melt pool area and standoff distance during the process, while metrology devices assess the part quality. Finally, this monitoring setup and its ability to capture the complete process history are fundamental for developing in-line control strategies, enabling optimized, supervision-free, and repeatable processes. Full article
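Determining an operating process window from correlated process and metrology data can, in its simplest form, look like the sketch below; the function name and the tolerance-based criterion are our illustration, not the authors' workflow:

```python
def operating_window(param_values, deviations, tol):
    """Return the (min, max) of tunable-parameter values whose measured
    quality deviation from nominal stays within tolerance, or None if
    no tested value qualifies."""
    ok = [p for p, d in zip(param_values, deviations) if abs(d) <= tol]
    return (min(ok), max(ok)) if ok else None

# Hypothetical example: wire feed rate (mm/s) vs. cross-section deviation (mm).
window = operating_window([8, 9, 10, 11, 12], [0.4, 0.1, 0.0, 0.05, 0.5], tol=0.2)
```

In practice the deviations would come from the metrology devices and the parameter values from the monitored process history, with melt pool area and standoff distance as the observed covariates.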

14 pages, 893 KB  
Entry
NOOR: Saudi Arabia’s National Platform for Educational Data Governance and Digital Transformation
by Dalia EL Khaled, Nuria Novas, Jose Antonio Gazquez and Wiam Ragheb
Encyclopedia 2025, 5(4), 216; https://doi.org/10.3390/encyclopedia5040216 - 16 Dec 2025
Definition
NOOR is the Kingdom of Saudi Arabia’s national Educational Management Information System (EMIS), developed by the Ministry of Education to digitize and streamline academic and administrative processes across public schools. Through its unified digital infrastructure, the platform enables essential functions such as student enrolment, grade and attendance management, curriculum administration, and communication with families. Beyond its operational role, NOOR is regarded as a flexible digital foundation, with a predictive architecture, modular integration, and distributed infrastructure which position it as a potential model for broader public-service domains, including healthcare and digital governance. NOOR’s design supports equitable access, facilitates cooperation between educational organizations, and provides real-time data to inform evidence-based decision making. These capabilities contribute to improving learning processes, though their impact depends on wider institutional and pedagogical environments. The system has already demonstrated progress in areas such as data accuracy, academic monitoring, family engagement, and reporting efficiency. Aligned with Saudi Arabia’s Vision 2030 and the Tanweer educational reform program, NOOR reflects the national shift toward centralized, data-driven management of public education. With more than 12 million users, it is one of the largest EMIS platforms in the Middle East and contributes to global discussions on how integrated digital infrastructures can support impactful educational reform. Full article
(This article belongs to the Collection Encyclopedia of Social Sciences)

27 pages, 4420 KB  
Article
Real-Time Quarry Truck Monitoring with Deep Learning and License Plate Recognition: Weighbridge Reconciliation for Production Control
by Ibrahima Dia, Bocar Sy, Ousmane Diagne, Sidy Mané and Lamine Diouf
Mining 2025, 5(4), 84; https://doi.org/10.3390/mining5040084 - 14 Dec 2025
Abstract
This paper presents a real-time quarry truck monitoring system that combines deep learning and license plate recognition (LPR) for operational monitoring and weighbridge reconciliation. Rather than estimating load volumes directly from imagery, the system ensures auditable matching between detected trucks and official weight records. Deployed at quarry checkpoints, fixed cameras stream to an edge stack that performs truck detection, line-crossing counts, and per-frame plate Optical Character Recognition (OCR); a temporal voting and format-constrained post-processing step consolidates plate strings for registry matching. The system exposes a dashboard with auditable session bundles (model/version hashes, Region of Interest (ROI)/line geometry, thresholds, logs) to ensure replay and traceability between offline evaluation and live operations. We evaluate detection (precision, recall, mAP@0.5, and mAP@0.5:0.95), tracking (ID metrics), and LPR usability, and we quantify operational validity by reconciling estimated shift-level tonnage T against weighbridge tonnage T* using Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), R2, and Bland–Altman analysis. Results show stable convergence of the detection models, reliable plate usability under varied optics (day, dusk, night, and dust), low-latency processing suitable for commodity hardware, and close agreement with weighbridge references at the shift level. The study demonstrates that vision-based counting coupled with plate linkage can provide regulator-ready KPIs and auditable evidence for production control in quarry operations. Full article
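A minimal sketch of the temporal-voting and format-constrained post-processing step for consolidating per-frame plate reads might look like this; the plate-format regex and the tie-breaking behavior are assumptions for illustration, not the paper's exact rules:

```python
from collections import Counter
import re

def consolidate_plate(frame_reads, pattern=r"^[A-Z]{2}\d{4}$"):
    """Majority-vote each character position across per-frame OCR reads,
    then accept the result only if it matches the expected plate format."""
    reads = [r for r in frame_reads if r]
    if not reads:
        return None
    # Keep only reads of the most common length so positions align.
    length = Counter(len(r) for r in reads).most_common(1)[0][0]
    reads = [r for r in reads if len(r) == length]
    voted = "".join(Counter(chars).most_common(1)[0][0] for chars in zip(*reads))
    return voted if re.match(pattern, voted) else None
```

Voting across frames suppresses single-frame OCR glitches, and the format constraint rejects consolidations that still cannot be a valid plate.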
(This article belongs to the Special Issue Mine Management Optimization in the Era of AI and Advanced Analytics)
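The shift-level reconciliation metrics named in the abstract (MAE, MAPE, R2, and Bland–Altman bias with 95% limits of agreement) are standard and can be computed directly:

```python
import math

def reconcile(estimated, reference):
    """Agreement metrics between estimated shift tonnage T and
    weighbridge tonnage T*: MAE, MAPE, R^2, Bland-Altman bias/limits."""
    n = len(estimated)
    errors = [e - r for e, r in zip(estimated, reference)]
    mae = sum(abs(d) for d in errors) / n
    mape = 100.0 * sum(abs(d) / r for d, r in zip(errors, reference)) / n
    mean_ref = sum(reference) / n
    ss_res = sum(d * d for d in errors)
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    r2 = 1.0 - ss_res / ss_tot
    bias = sum(errors) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in errors) / (n - 1))
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)  # 95% limits of agreement
    return {"MAE": mae, "MAPE": mape, "R2": r2, "bias": bias, "LoA": loa}
```

Bland–Altman limits complement R2 because they expose systematic bias and the spread of disagreement, not just correlation.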

27 pages, 17286 KB  
Article
Vision-Based Trajectory Reconstruction in Human Activities: Methodology and Application
by Jasper Lottefier, Peter Van den Broeck and Katrien Van Nimmen
Sensors 2025, 25(24), 7577; https://doi.org/10.3390/s25247577 - 13 Dec 2025
Abstract
Modern civil engineering structures, such as footbridges, are increasingly susceptible to vibrations induced by human activities, emphasizing the importance of accurately assessing crowd-induced loading. Developing realistic load models requires detailed insight into the underlying crowd dynamics, which in turn depend on the coordination between individuals and the spatial organization of the group. A deeper understanding of these human–human interactions is therefore essential for capturing the collective behaviour that governs crowd-induced vibrations. This paper presents a vision-based trajectory reconstruction methodology that captures individual movement trajectories in both small groups and large-scale running events. The approach integrates colour-based image segmentation for instrumented participants, deep learning–based object detection for uninstrumented crowds, and a homography-based projection method to map image coordinates to world space. The methodology is applied to empirical data from two urban running events and controlled experiments, including both stationary and dynamic camera perspectives. Results show that the framework reliably reconstructs individual trajectories under varied field conditions, applicable to both walking and running activities. The approach enables scalable monitoring of human activities and provides high-resolution spatio-temporal data for studying human–human interactions and modelling crowd dynamics. In this way, the findings highlight the potential of vision-based methods as practical, non-intrusive tools for analysing human-induced loading in both research and applied engineering contexts. Full article
(This article belongs to the Section Optical Sensors)
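The homography-based projection from image to world coordinates is a standard operation: multiply homogeneous pixel coordinates by the 3×3 matrix H and dehomogenize. A minimal sketch, where the example H is an arbitrary uniform scale rather than a calibrated camera:

```python
import numpy as np

def project_to_world(H, pixels):
    """Map N pixel coordinates (N x 2) to world coordinates
    via a 3x3 homography H."""
    pts = np.hstack([pixels, np.ones((len(pixels), 1))])  # to homogeneous
    world = pts @ H.T
    return world[:, :2] / world[:, 2:3]                   # dehomogenize

# Toy homography: uniform scale by 0.5 (e.g. pixels to metres on a flat plane).
H = np.diag([0.5, 0.5, 1.0])
```

In the paper's setting, H would be estimated from known ground control points in each camera view (and re-estimated per frame for the dynamic camera perspectives).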

13 pages, 434 KB  
Review
Home Monitoring for the Management of Age-Related Macular Degeneration: A Review of the Development and Implementation of Digital Health Solutions over a 25-Year Scientific Journey
by Miguel A. Busquets, Richard A. Garfinkel, Deepak Sambhara, Nishant Mohan, Kester Nahen, Gidi Benyamini and Anat Loewenstein
Medicina 2025, 61(12), 2193; https://doi.org/10.3390/medicina61122193 - 11 Dec 2025
Abstract
The management of age-related macular degeneration (AMD) presents a significant challenge attributable to high disease heterogeneity. Patients are often unaware of their symptoms, yet treatment is urgent before permanent anatomic damage results in vision loss. This is true both for the initial conversion from non-exudative intermediate AMD (iAMD) to exudative AMD (nAMD) and for the recurrence of nAMD under treatment. Starting from the essential requirements that any practical solution must fulfill, we reflect on how a persistent, 25-year pursuit of innovative solutions yielded significant advances in personalized care. An early insight was that the acute nature of AMD progression requires frequent monitoring, and that diagnostic testing should therefore be performed at the patient's home. Four key requirements were identified: (1) a tele-connected home device with acceptable diagnostic performance and a supportive patient user interface, both hardware and software; (2) automated analytics capabilities that can process large volumes of data; (3) efficient remote patient engagement and support through a digital healthcare provider; and (4) a low-cost medical system that enables digital healthcare delivery through appropriate compensation for both the monitoring provider and the prescribing physician. We reviewed the published literature accompanying first the development of Preferential Hyperacuity Perimetry (PHP) for monitoring iAMD, then Spectral Domain Optical Coherence Tomography (SD-OCT) for monitoring nAMD. Emphasis was given to the validation of the core technologies, the regulatory process, and real-world studies, and to how they led to the release of commercial services now covered by Medicare in the USA. We concluded that while the two main pillars of AMD management during the first quarter of the 21st century were anti-VEGF intravitreal injections and in-office OCT, home-monitoring-based digital health services can become the third pillar. Full article
(This article belongs to the Special Issue Modern Diagnostics and Therapy for Vitreoretinal Diseases)

25 pages, 5139 KB  
Article
A Mobile Robot Designed to Detect Hazardous and Explosive Materials in Airport Parking Lots
by Ireneusz Celiński, Jan Warczek and Tadeusz Opasiak
Electronics 2025, 14(24), 4866; https://doi.org/10.3390/electronics14244866 - 10 Dec 2025
Abstract
The article proposes a concept for a mobile robot designed to detect hazardous and explosive materials in airport parking lots. Operating such a robot poses two problems. First, it must move through a dynamic environment, among vehicles that are parked or in motion, without forcing moving vehicles to stop. Second, it must detect hazardous and explosive materials. For mobility, an obstacle-analysis system based on popular, low-cost LIDAR sensors and cameras is proposed. For detection, a dual vehicle-monitoring system for airport parking lots is proposed. The first subsystem relies on vision techniques: cameras and image-recognition procedures examine the undercarriages of parked vehicles to detect unusual objects mounted there. The second subsystem analyzes volatile substances produced by explosives and hazardous materials located under or inside car chassis, as well as gasoline and oils. The aim of the project is to develop a functional prototype of such a robot and describe its capabilities; the article reports the preliminary findings of this research. Full article
(This article belongs to the Special Issue Multi-UAV Systems and Mobile Robots)

23 pages, 3326 KB  
Article
Hybrid Multi-Scale Neural Network with Attention-Based Fusion for Fruit Crop Disease Identification
by Shakhmaran Seilov, Akniyet Nurzhaubayev, Marat Baideldinov, Bibinur Zhursinbek, Medet Ashimgaliyev and Ainur Zhumadillayeva
J. Imaging 2025, 11(12), 440; https://doi.org/10.3390/jimaging11120440 - 10 Dec 2025
Abstract
Undetected fruit crop diseases are a major threat to agricultural productivity worldwide and frequently cause farmers large financial losses. Disease detection based on manual field inspection is time-consuming, unreliable, and unsuitable for large-scale monitoring. Deep learning approaches, in particular convolutional neural networks, have shown promise for automated plant disease identification, but they still face significant obstacles: poor generalization across complex visual backgrounds, limited robustness to diseases appearing at different scales, and high computational requirements that hinder deployment on resource-constrained edge devices. To overcome these drawbacks, we propose a Hybrid Multi-Scale Neural Network (HMCT-AF with GSAF) architecture for accurate and efficient fruit crop disease identification. HMCT-AF combines a Vision Transformer-based structural branch, which extracts long-range dependencies and high-level contextual patterns, with multi-scale convolutional branches that capture fine-grained local information. These disparate features are adaptively combined by a novel GSAF module, which enhances model interpretability and classification performance. We conduct evaluations on both PlantVillage (controlled environment) and CLD (real-world in-field conditions), observing consistent performance gains that indicate strong resilience to natural lighting variations and background complexity. With an accuracy of up to 93.79%, HMCT-AF with GSAF outperforms vanilla Transformer models, EfficientNet, and traditional CNNs. These findings demonstrate that the model captures scale-variant disease symptoms well and can serve real-time agricultural applications on edge-compatible hardware. Our results indicate that HMCT-AF with GSAF offers a viable basis for intelligent, scalable plant disease monitoring systems in contemporary precision farming. Full article
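The abstract does not specify GSAF's internals; purely as a generic illustration of attention-based gated fusion, a per-dimension sigmoid gate can blend a convolutional (local) feature with a Transformer (global) feature — every name and shape below is a sketch assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(local_feat, global_feat, Wg, bg):
    """Per-dimension gate (computed from both branches) decides how much
    of the convolutional versus Transformer feature to keep."""
    gate = sigmoid(np.concatenate([local_feat, global_feat]) @ Wg + bg)
    return gate * local_feat + (1.0 - gate) * global_feat
```

With zero weights the gate is 0.5 everywhere and the fusion degenerates to a plain average; training moves the gate toward whichever branch is more informative per dimension.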

21 pages, 3243 KB  
Article
A Multimodal Agent Framework for Construction Scenarios: Accurate Perception, Dynamic Retrieval, and Explainable Hazard Reasoning
by Sihan Cheng, Yujun Qi, Rui Wu and Yangyang Guan
Buildings 2025, 15(24), 4439; https://doi.org/10.3390/buildings15244439 - 9 Dec 2025
Abstract
Construction sites are complex environments where traditional safety monitoring methods often suffer from low detection accuracy and limited interpretability. To address these challenges, this study proposes a modular multimodal agent framework that integrates computer vision, knowledge representation, and large language model (LLM)–based reasoning. First, the CLIP model fine-tuned with Low-Rank Adaptation (LoRA) is combined with YOLOv10 to achieve precise recognition of construction activities and personal protective equipment (PPE). Second, a construction safety knowledge graph integrating Retrieval-Augmented Generation (RAG) is constructed to provide structured domain knowledge and enhance contextual understanding. Third, the FusedChain prompting strategy is designed to guide large language models (LLMs) to perform step-by-step safety risk reasoning. Experimental results show that the proposed approach achieves 97.35% accuracy in activity recognition, an average F1-score of 0.84 in PPE detection, and significantly higher performance than existing methods in hazard reasoning. The modular design also facilitates scalable integration with more advanced foundation models, indicating strong potential for real-world deployment in intelligent construction safety management. Full article
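The retrieval step of a RAG pipeline like the one described can be reduced, for illustration, to scoring knowledge snippets against the detected scene and prepending the best matches to the LLM prompt; the toy knowledge base and word-overlap scoring below are assumptions, not the paper's knowledge-graph retriever:

```python
def retrieve(query, knowledge, k=2):
    """Rank knowledge-base snippets by word overlap with the scene
    description and return the top-k as reasoning context."""
    q = set(query.lower().split())
    scored = sorted(knowledge, key=lambda s: -len(q & set(s.lower().split())))
    return scored[:k]

# Hypothetical safety-rule snippets standing in for the knowledge graph.
kb = [
    "workers at height must wear a safety harness",
    "hard hats are required in all active zones",
    "welding requires eye protection",
]
```

The retrieved snippets would then be injected into the FusedChain prompt so the LLM's step-by-step hazard reasoning is grounded in domain rules rather than free association.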

22 pages, 8773 KB  
Article
Reconfigurable Multispectral Imaging System Design and Implementation with FPGA Control
by Shuyang Chen, Min Huang, Wenbin Ge, Guangming Wang, Xiangning Lu, Yixin Zhao, Jinlin Chen, Lulu Qian and Zhanchao Wang
Appl. Sci. 2025, 15(24), 12951; https://doi.org/10.3390/app152412951 - 8 Dec 2025
Abstract
Multispectral imaging plays an important role in fields such as environmental monitoring and industrial inspection. To meet the demands for high spatial resolution, portability, and multi-scenario use, this study presents a reconfigurable 2 × 3 multispectral camera-array imaging system. The system features a modular architecture, allowing for the flexible exchange of lenses and narrowband filters. Each camera node is equipped with an FPGA that performs real-time sensor control and data preprocessing. A companion host program, based on the GigE Vision protocol, was developed for synchronous control, multi-channel real-time visualization, and unified parameter configuration. End-to-end performance verification confirmed stable, lossless, and synchronous acquisition from all six 3072 × 2048-pixel resolution channels. Following field alignment, the 16 mm lens achieves an effective 4.7 MP spatial resolution. Spectral profile measurements further confirm that the system exhibits favorable spectral response characteristics. The proposed framework provides a high-resolution and flexible solution for portable multispectral imaging. Full article
(This article belongs to the Section Optics and Lasers)
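To see why lossless, synchronous acquisition from six such channels is demanding, the raw data rate can be estimated directly; the frame rate and 8-bit mono pixel depth below are assumptions, since the abstract does not state them:

```python
def channel_data_rate(width, height, fps, bytes_per_px=1):
    """Raw data rate of one camera channel, in megabytes per second."""
    return width * height * fps * bytes_per_px / 1e6

# Six 3072 x 2048 channels at an assumed 10 fps with 8-bit mono pixels.
total_MBps = 6 * channel_data_rate(3072, 2048, 10)
```

At these assumed settings the aggregate is roughly 377 MB/s, which is why per-node FPGA preprocessing and a GigE Vision host pipeline matter for sustaining lossless capture.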

31 pages, 9712 KB  
Article
YOLO-HRNet with Attention Mechanism: For Automated Ergonomic Risk Assessment in Garment Manufacturing
by Yichen Tan, Ziqian Yang and Zhihui Wu
Appl. Sci. 2025, 15(24), 12950; https://doi.org/10.3390/app152412950 - 8 Dec 2025
Abstract
In garment manufacturing, efficient and precise ergonomic assessment is vital to prevent work-related musculoskeletal disorders. This study develops a computer vision-based algorithm for fast and accurate risk analysis. Specifically, we introduced SE and CBAM attention mechanisms into the YOLO network and integrated the optimized modules into the HRNet architecture to improve the accuracy of human pose recognition. This approach effectively addresses common interferences in garment production environments, such as fabric accumulation, equipment occlusion, and complex hand movements, while significantly enhancing the accuracy of human detection: on the COCO dataset, it increased mAP and recall by 4.43% and 5.99%, respectively, over YOLOv8. Furthermore, by analyzing key postural features from videos of workers cutting, sewing, and pressing, we achieved a quantified ergonomic risk assessment. Experimental results indicate that RULA scores calculated with this algorithm are highly consistent with expert evaluations, remain stable, and accurately reflect the dynamic changes in ergonomic risk levels across different processes. It is important to note that the validation was based on a pilot study involving a limited number of workers and task types, so the findings primarily demonstrate feasibility rather than full-scale generalizability. Even so, the algorithm outperforms existing lightweight solutions and can be deployed in real time on edge devices within factories, providing a low-cost ergonomic monitoring tool for the garment manufacturing industry and helping prevent musculoskeletal injuries among workers. Full article
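RULA is a published worksheet method; as a simplified illustration of how pose-estimation output feeds it, the upper-arm step scores the shoulder flexion angle against the standard angle bands, with a +1 adjustment for a raised shoulder (this sketch omits most of RULA's other adjustments and tables):

```python
def upper_arm_score(flexion_deg, shoulder_raised=False):
    """Simplified RULA upper-arm step: base score from flexion angle
    (negative = extension), plus 1 if the shoulder is raised."""
    a = flexion_deg
    if -20 <= a <= 20:          # near-neutral arm position
        score = 1
    elif a < -20 or a <= 45:    # marked extension, or moderate flexion
        score = 2
    elif a <= 90:               # high flexion
        score = 3
    else:                       # arm above shoulder height
        score = 4
    return score + (1 if shoulder_raised else 0)
```

In a pipeline like the paper's, the flexion angle would be derived from HRNet keypoints (shoulder, elbow, trunk) frame by frame, and scores aggregated over each work process.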
