Search Results (612)

Search Parameters:
Keywords = video/image enhancement

24 pages, 23817 KiB  
Article
Dual-Path Adversarial Denoising Network Based on UNet
by Jinchi Yu, Yu Zhou, Mingchen Sun and Dadong Wang
Sensors 2025, 25(15), 4751; https://doi.org/10.3390/s25154751 - 1 Aug 2025
Abstract
Digital image quality is crucial for reliable analysis in applications such as medical imaging, satellite remote sensing, and video surveillance. However, traditional denoising methods struggle to balance noise removal with detail preservation and lack adaptability to various types of noise. We propose a novel three-module architecture for image denoising, comprising a generator, a dual-path-UNet-based denoiser, and a discriminator. The generator creates synthetic noise patterns to augment training data, while the dual-path-UNet denoiser uses multiple receptive field modules to preserve fine details and dense feature fusion to maintain global structural integrity. The discriminator provides adversarial feedback to enhance denoising performance. This dual-path adversarial training mechanism addresses the limitations of traditional methods by simultaneously capturing both local details and global structures. Experiments on the SIDD, DND, and PolyU datasets demonstrate superior performance. We compare our architecture with the latest state-of-the-art GAN variants through comprehensive qualitative and quantitative evaluations. These results confirm the effectiveness of noise removal with minimal loss of critical image details. The proposed architecture enhances image denoising capabilities in complex noise scenarios, providing a robust solution for applications that require high image fidelity. By enhancing adaptability to various types of noise while maintaining structural integrity, this method provides a versatile tool for image processing tasks that require preserving detail. Full article
(This article belongs to the Section Sensing and Imaging)
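
The adversarial training loop described above (noise generator, dual-path UNet denoiser, discriminator) can be illustrated with a minimal PyTorch sketch. The module definitions, layer sizes, and loss weighting below are placeholder assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class NoiseGenerator(nn.Module):
    """Toy generator: predicts a synthetic noise map to add to a clean image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())
    def forward(self, clean):
        return clean + 0.1 * self.net(clean)            # noisy image

class DualPathUNet(nn.Module):
    """Stand-in for the dual-path denoiser: a small-receptive-field detail path
    plus a large-receptive-field structure path, combined residually."""
    def __init__(self):
        super().__init__()
        self.detail = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(32, 3, 3, padding=1))
        self.structure = nn.Sequential(nn.Conv2d(3, 32, 7, padding=3), nn.ReLU(),
                                       nn.Conv2d(32, 3, 7, padding=3))
    def forward(self, noisy):
        return noisy - 0.5 * (self.detail(noisy) + self.structure(noisy))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                                 nn.Conv2d(32, 1, 4, stride=2, padding=1))
    def forward(self, img):
        return self.net(img).mean(dim=(1, 2, 3))         # one realism score per image

gen, den, disc = NoiseGenerator(), DualPathUNet(), Discriminator()
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(list(gen.parameters()) + list(den.parameters()), lr=1e-4)
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

clean = torch.rand(4, 3, 64, 64)                         # dummy clean patches

# Discriminator step: real clean patches vs. denoised fakes.
fake = den(gen(clean)).detach()
loss_d = bce(disc(clean), torch.ones(4)) + bce(disc(fake), torch.zeros(4))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator/denoiser step: reconstruction loss plus adversarial feedback.
denoised = den(gen(clean))
loss_g = l1(denoised, clean) + 0.01 * bce(disc(denoised), torch.ones(4))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```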

28 pages, 5699 KiB  
Article
Multi-Modal Excavator Activity Recognition Using Two-Stream CNN-LSTM with RGB and Point Cloud Inputs
by Hyuk Soo Cho, Kamran Latif, Abubakar Sharafat and Jongwon Seo
Appl. Sci. 2025, 15(15), 8505; https://doi.org/10.3390/app15158505 - 31 Jul 2025
Abstract
Recently, deep learning algorithms have been increasingly applied in construction for activity recognition, particularly for excavators, to automate processes and enhance safety and productivity through continuous monitoring of earthmoving activities. These deep learning algorithms analyze construction videos to classify excavator activities for earthmoving purposes. However, previous studies have solely focused on single-source external videos, which limits the activity recognition capabilities of the deep learning algorithm. This paper introduces a novel multi-modal deep learning-based methodology for recognizing excavator activities, utilizing multi-stream input data. It processes point clouds and RGB images using the two-stream long short-term memory convolutional neural network (CNN-LSTM) method to extract spatiotemporal features, enabling the recognition of excavator activities. A comprehensive dataset comprising 495,000 video frames of synchronized RGB and point cloud data was collected across multiple construction sites under varying conditions. The dataset encompasses five key excavator activities: Approach, Digging, Dumping, Idle, and Leveling. To assess the effectiveness of the proposed method, the performance of the two-stream CNN-LSTM architecture is compared with that of single-stream CNN-LSTM models on the same RGB and point cloud datasets, separately. The results demonstrate that the proposed multi-stream approach achieved an accuracy of 94.67%, outperforming existing state-of-the-art single-stream models, which achieved 90.67% accuracy for the RGB-based model and 92.00% for the point cloud-based model. These findings underscore the potential of the proposed activity recognition method, making it highly effective for automatic real-time monitoring of excavator activities, thereby laying the groundwork for future integration into digital twin systems for proactive maintenance and intelligent equipment management. Full article
(This article belongs to the Special Issue AI-Based Machinery Health Monitoring)
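
As a rough illustration of the two-stream idea (not the authors' network), the sketch below encodes RGB frames and point-cloud maps with separate small CNNs, runs each sequence through its own LSTM, and fuses the final hidden states for five-way activity classification; all layer sizes and the point-cloud rasterization are assumptions.

```python
import torch
import torch.nn as nn

class FrameCNN(nn.Module):
    """Tiny per-frame encoder; in_ch=3 for RGB, in_ch=1 for a point-cloud depth/occupancy map."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())        # -> (B, 32)
    def forward(self, x):
        return self.net(x)

class TwoStreamCNNLSTM(nn.Module):
    def __init__(self, n_classes=5):                      # Approach, Digging, Dumping, Idle, Leveling
        super().__init__()
        self.rgb_cnn, self.pc_cnn = FrameCNN(3), FrameCNN(1)
        self.rgb_lstm = nn.LSTM(32, 64, batch_first=True)
        self.pc_lstm = nn.LSTM(32, 64, batch_first=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, rgb_seq, pc_seq):
        # rgb_seq: (B, T, 3, H, W), pc_seq: (B, T, 1, H, W)
        B, T = rgb_seq.shape[:2]
        rgb_feat = self.rgb_cnn(rgb_seq.flatten(0, 1)).view(B, T, -1)
        pc_feat = self.pc_cnn(pc_seq.flatten(0, 1)).view(B, T, -1)
        _, (h_rgb, _) = self.rgb_lstm(rgb_feat)            # final hidden state per stream
        _, (h_pc, _) = self.pc_lstm(pc_feat)
        fused = torch.cat([h_rgb[-1], h_pc[-1]], dim=1)    # fuse the two temporal summaries
        return self.head(fused)

model = TwoStreamCNNLSTM()
logits = model(torch.rand(2, 8, 3, 64, 64), torch.rand(2, 8, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 5])
```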

19 pages, 3130 KiB  
Article
Deep Learning-Based Instance Segmentation of Galloping High-Speed Railway Overhead Contact System Conductors in Video Images
by Xiaotong Yao, Huayu Yuan, Shanpeng Zhao, Wei Tian, Dongzhao Han, Xiaoping Li, Feng Wang and Sihua Wang
Sensors 2025, 25(15), 4714; https://doi.org/10.3390/s25154714 - 30 Jul 2025
Abstract
The conductors of high-speed railway OCSs (Overhead Contact Systems) are susceptible to conductor galloping due to the impact of natural elements such as strong winds, rain, and snow, resulting in conductor fatigue damage and significantly compromising train operational safety. Consequently, monitoring the galloping status of conductors is crucial, and instance segmentation techniques, by delineating the pixel-level contours of each conductor, can significantly aid in the identification and study of galloping phenomena. This work expands upon the YOLO11-seg model and introduces an instance segmentation approach for galloping video and image sensor data of OCS conductors. The algorithm, designed for the stripe-like distribution of OCS conductors in the data, employs four-direction Sobel filters to extract edge features in horizontal, vertical, and diagonal orientations. These features are subsequently integrated with the original convolutional branch to form the FDSE (Four Direction Sobel Enhancement) module. The model also integrates the ECA (Efficient Channel Attention) mechanism for the adaptive augmentation of conductor characteristics and utilizes the FL (Focal Loss) function to mitigate the class-imbalance issue between positive and negative samples, hence enhancing the model’s sensitivity to conductors. Subsequently, segmentation outcomes from neighboring frames are utilized, and mask-difference analysis is performed to autonomously detect conductor galloping locations, emphasizing their contours for the clear depiction of galloping characteristics. Experimental results demonstrate that the enhanced YOLO11-seg model achieves 85.38% precision, 77.30% recall, 84.25% AP@0.5, 81.14% F1-score, and a real-time processing speed of 44.78 FPS. When combined with the galloping visualization module, it can issue real-time alerts of conductor galloping anomalies, providing robust technical support for railway OCS safety monitoring. Full article
(This article belongs to the Section Industrial Sensors)
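
The FDSE idea, fixed Sobel-style kernels in four orientations whose responses are concatenated with a learned convolution branch, can be sketched as follows; the exact kernels, channel counts, and how YOLO11-seg wires the module in are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Fixed 3x3 edge kernels: horizontal, vertical, and the two diagonals.
_SOBELS = torch.tensor([
    [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],      # horizontal edges
    [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],      # vertical edges
    [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]],      # 45-degree diagonal
    [[-2, -1, 0], [-1, 0, 1], [0, 1, 2]],      # 135-degree diagonal
], dtype=torch.float32)

class FDSE(nn.Module):
    """Four-Direction Sobel Enhancement (sketch): fixed edge filters plus a learned branch."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Apply the four directional filters to every input channel (depthwise).
        kernels = _SOBELS.repeat(in_ch, 1, 1).unsqueeze(1)        # (4*in_ch, 1, 3, 3)
        self.register_buffer("kernels", kernels)
        self.in_ch = in_ch
        self.learned = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.fuse = nn.Conv2d(out_ch + 4 * in_ch, out_ch, 1)

    def forward(self, x):
        edges = F.conv2d(x, self.kernels, padding=1, groups=self.in_ch)
        return self.fuse(torch.cat([self.learned(x), edges], dim=1))

feat = torch.rand(1, 8, 80, 80)
print(FDSE(8, 16)(feat).shape)   # torch.Size([1, 16, 80, 80])
```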

18 pages, 2688 KiB  
Article
Generalized Hierarchical Co-Saliency Learning for Label-Efficient Tracking
by Jie Zhao, Ying Gao, Chunjuan Bo and Dong Wang
Sensors 2025, 25(15), 4691; https://doi.org/10.3390/s25154691 - 29 Jul 2025
Abstract
Visual object tracking is one of the core techniques in human-centered artificial intelligence, which is very useful for human–machine interaction. State-of-the-art tracking methods have shown their robustness and accuracy on many challenges. However, a large number of videos with precise, dense annotations is required for fully supervised training of their models. Considering that annotating videos frame-by-frame is a labor- and time-consuming workload, reducing the reliance on manual annotations during the tracking models’ training is an important problem to be resolved. To make a trade-off between the annotating costs and the tracking performance, we propose a weakly supervised tracking method based on co-saliency learning, which can be flexibly integrated into various tracking frameworks to reduce annotation costs and further enhance the target representation in current search images. Since our method enables the model to explore valuable visual information from unlabeled frames and to calculate co-salient attention maps based on multiple frames, our weakly supervised method can obtain competitive performance compared to fully supervised baseline trackers, using only 3.33% of manual annotations. We integrate our method into two CNN-based trackers and a Transformer-based tracker; extensive experiments on four general tracking benchmarks demonstrate the effectiveness of our method. Furthermore, we also demonstrate the advantages of our method on the egocentric tracking task; our weakly supervised method obtains 0.538 success on TREK-150, surpassing the prior state-of-the-art fully supervised tracker by 7.7%. Full article
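
The co-saliency step, using several unlabeled frames to highlight regions they share with the current search image, can be approximated by a simple cross-frame feature-affinity map. The sketch below uses made-up tensor shapes and is not the paper's attention formulation.

```python
import torch
import torch.nn.functional as F

def co_salient_attention(search_feat, ref_feats):
    """
    search_feat: (C, H, W) features of the current search image.
    ref_feats:   list of (C, H, W) features from unlabeled reference frames.
    Returns an (H, W) attention map: high where the search image matches many references.
    """
    C, H, W = search_feat.shape
    q = F.normalize(search_feat.reshape(C, -1), dim=0)           # (C, H*W), unit norm per location
    maps = []
    for ref in ref_feats:
        k = F.normalize(ref.reshape(C, -1), dim=0)               # (C, Hr*Wr)
        affinity = q.t() @ k                                      # cosine similarity of every location pair
        maps.append(affinity.max(dim=1).values)                   # best match per search location
    co_sal = torch.stack(maps).mean(dim=0).reshape(H, W)
    return (co_sal - co_sal.min()) / (co_sal.max() - co_sal.min() + 1e-6)

# Dummy features; in a tracker these would come from the backbone.
attn = co_salient_attention(torch.rand(64, 16, 16), [torch.rand(64, 16, 16) for _ in range(3)])
enhanced = torch.rand(64, 16, 16) * (1.0 + attn)                  # re-weight search features
print(attn.shape)  # torch.Size([16, 16])
```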

24 pages, 1408 KiB  
Systematic Review
Fear Detection Using Electroencephalogram and Artificial Intelligence: A Systematic Review
by Bladimir Serna, Ricardo Salazar, Gustavo A. Alonso-Silverio, Rosario Baltazar, Elías Ventura-Molina and Antonio Alarcón-Paredes
Brain Sci. 2025, 15(8), 815; https://doi.org/10.3390/brainsci15080815 - 29 Jul 2025
Abstract
Background/Objectives: Fear detection through EEG signals has gained increasing attention due to its applications in affective computing, mental health monitoring, and intelligent safety systems. This systematic review aimed to identify the most effective methods, algorithms, and configurations reported in the literature for detecting fear from EEG signals using artificial intelligence (AI). Methods: Following the PRISMA 2020 methodology, a structured search was conducted using the string (“fear detection” AND “artificial intelligence” OR “machine learning” AND NOT “fnirs OR mri OR ct OR pet OR image”). After applying inclusion and exclusion criteria, 11 relevant studies were selected. Results: The review examined key methodological aspects such as algorithms (e.g., SVM, CNN, Decision Trees), EEG devices (Emotiv, Biosemi), experimental paradigms (videos, interactive games), dominant brainwave bands (beta, gamma, alpha), and electrode placement. Non-linear models, particularly when combined with immersive stimulation, achieved the highest classification accuracy (up to 92%). Beta and gamma frequencies were consistently associated with fear states, while frontotemporal electrode positioning and proprietary datasets further enhanced model performance. Conclusions: EEG-based fear detection using AI demonstrates high potential and rapid growth, offering significant interdisciplinary applications in healthcare, safety systems, and affective computing. Full article
(This article belongs to the Special Issue Neuropeptides, Behavior and Psychiatric Disorders)
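
For readers unfamiliar with the pipelines surveyed here, a toy band-power-plus-SVM fear classifier is sketched below; the sampling rate, band limits, channel count, and synthetic data are illustrative assumptions and are not taken from any of the 11 reviewed studies.

```python
import numpy as np
from scipy.signal import welch
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

FS = 128                      # assumed sampling rate (Hz)
BANDS = {"alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}   # bands highlighted in the review

def band_powers(epoch):
    """epoch: (n_channels, n_samples) EEG segment -> flat vector of band powers per channel."""
    freqs, psd = welch(epoch, fs=FS, nperseg=FS * 2, axis=-1)
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(psd[:, mask].mean(axis=-1))
    return np.concatenate(feats)

# Synthetic stand-in data: 60 epochs, 14 channels (Emotiv-like), 4 s each, binary fear label.
rng = np.random.default_rng(0)
X = np.stack([band_powers(rng.standard_normal((14, FS * 4))) for _ in range(60)])
y = rng.integers(0, 2, size=60)

clf = SVC(kernel="rbf", C=1.0)
print(cross_val_score(clf, X, y, cv=5).mean())   # chance-level here because the data are random
```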

17 pages, 1603 KiB  
Perspective
A Perspective on Quality Evaluation for AI-Generated Videos
by Zhichao Zhang, Wei Sun and Guangtao Zhai
Sensors 2025, 25(15), 4668; https://doi.org/10.3390/s25154668 - 28 Jul 2025
Abstract
Recent breakthroughs in AI-generated content (AIGC) have transformed video creation, empowering systems to translate text, images, or audio into visually compelling stories. Yet reliable evaluation of these machine-crafted videos remains elusive because quality is governed not only by spatial fidelity within individual frames but also by temporal coherence across frames and precise semantic alignment with the intended message. The foundational role of sensor technologies is critical, as they determine the physical plausibility of AIGC outputs. In this perspective, we argue that multimodal large language models (MLLMs) are poised to become the cornerstone of next-generation video quality assessment (VQA). By jointly encoding cues from multiple modalities such as vision, language, sound, and even depth, the MLLM can leverage its powerful language understanding capabilities to assess the quality of scene composition, motion dynamics, and narrative consistency, overcoming the fragmentation of hand-engineered metrics and the poor generalization ability of CNN-based methods. Furthermore, we provide a comprehensive analysis of current methodologies for assessing AIGC video quality, including the evolution of generation models, dataset design, quality dimensions, and evaluation frameworks. We argue that advances in sensor fusion enable MLLMs to combine low-level physical constraints with high-level semantic interpretations, further enhancing the accuracy of visual quality assessment. Full article
(This article belongs to the Special Issue Perspectives in Intelligent Sensors and Sensing Systems)
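
To make the proposed direction concrete, the sketch below splits assessment along the three axes named above (spatial fidelity, temporal coherence, prompt alignment). The query_mllm function is a hypothetical placeholder for whatever multimodal model is plugged in, and the frame-level coherence proxy is illustrative only.

```python
import cv2
import numpy as np

def load_frames(path, every_n=10):
    """Decode every n-th frame of a video as grayscale."""
    cap, frames, i = cv2.VideoCapture(path), [], 0
    ok, frame = cap.read()
    while ok:
        if i % every_n == 0:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        ok, frame = cap.read()
        i += 1
    cap.release()
    return frames

def temporal_coherence(frames):
    """Crude proxy: mean normalized cross-correlation between consecutive frames."""
    scores = []
    for a, b in zip(frames, frames[1:]):
        a, b = a.astype(np.float32), b.astype(np.float32)
        scores.append(float(cv2.matchTemplate(a, b, cv2.TM_CCOEFF_NORMED)[0, 0]))
    return float(np.mean(scores)) if scores else 0.0

def query_mllm(frames, prompt):
    """Hypothetical placeholder: send frames plus a prompt to a multimodal LLM, get a 1-5 rating."""
    raise NotImplementedError("wire up an MLLM of choice here")

def assess(path, text_prompt):
    frames = load_frames(path)
    return {
        "temporal_coherence": temporal_coherence(frames),
        "spatial_fidelity": None,    # e.g. query_mllm(frames, "Rate per-frame visual quality 1-5")
        "prompt_alignment": None,    # e.g. query_mllm(frames, f"Rate how well this matches: {text_prompt}")
    }
```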

18 pages, 4836 KiB  
Article
Deep Learning to Analyze Spatter and Melt Pool Behavior During Additive Manufacturing
by Deepak Gadde, Alaa Elwany and Yang Du
Metals 2025, 15(8), 840; https://doi.org/10.3390/met15080840 - 28 Jul 2025
Abstract
To capture the complex metallic spatter and melt pool behavior during the rapid interaction between the laser and metal material, high-speed cameras are applied to record the laser powder bed fusion process and generate a large volume of image data. In this study, four deep learning algorithms are applied: YOLOv5, Fast R-CNN, RetinaNet, and EfficientDet. They are trained on the recorded videos to learn and extract information on spatter and melt pool behavior during the laser powder bed fusion process. The well-trained models achieved high accuracy and low loss, demonstrating strong capability in accurately detecting and tracking spatter and melt pool dynamics. A stability index is proposed and calculated based on the melt pool length change rate. A greater index value reflects a more stable melt pool. We found that more spatters were detected for the unstable melt pool, while fewer spatters were found for the stable melt pool. A spatter’s size affects its initial ejection speed: large spatters are ejected slowly, while small spatters are ejected rapidly. In addition, more than 58% of detected spatters have their initial ejection angle in the range of 60–120°. These findings provide a better understanding of spatter and melt pool dynamics and behavior, uncover the influence of melt pool stability on spatter formation, and demonstrate the correlation between the spatter size and its initial ejection speed. This work will contribute to the extraction of important information from high-speed recorded videos for additive manufacturing to reduce waste, lower cost, enhance part quality, and increase process reliability. Full article
(This article belongs to the Special Issue Machine Learning in Metal Additive Manufacturing)
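
The stability index and the ejection-angle statistic can be illustrated in a few lines of NumPy. The abstract only says the index is "based on the melt pool length change rate", so the exact formula below is an assumption.

```python
import numpy as np

def stability_index(melt_pool_lengths, dt=1.0):
    """Assumed form: larger values mean more stable (smaller mean absolute length change rate)."""
    rates = np.abs(np.diff(melt_pool_lengths)) / dt
    return 1.0 / (1.0 + rates.mean())

def ejection_angle_fraction(angles_deg, lo=60.0, hi=120.0):
    """Fraction of detected spatters whose initial ejection angle falls in [lo, hi] degrees."""
    angles = np.asarray(angles_deg)
    return np.mean((angles >= lo) & (angles <= hi))

# Toy per-frame melt pool lengths (pixels) and spatter ejection angles (degrees).
lengths_stable = np.array([100, 101, 99, 100, 102, 101], dtype=float)
lengths_unstable = np.array([100, 130, 85, 140, 90, 125], dtype=float)
print(stability_index(lengths_stable), stability_index(lengths_unstable))  # stable >> unstable

angles = np.array([75, 88, 95, 110, 30, 150, 100, 65])
print(ejection_angle_fraction(angles))   # 0.75 here, cf. the reported >58% in 60-120 degrees
```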

21 pages, 9651 KiB  
Article
Self-Supervised Visual Tracking via Image Synthesis and Domain Adversarial Learning
by Gu Geng, Sida Zhou, Jianing Tang, Xinming Zhang, Qiao Liu and Di Yuan
Sensors 2025, 25(15), 4621; https://doi.org/10.3390/s25154621 - 25 Jul 2025
Abstract
With the widespread use of sensors in applications such as autonomous driving and intelligent security, stable and efficient target tracking from diverse sensor data has become increasingly important. Self-supervised visual tracking has attracted increasing attention due to its potential to eliminate reliance on costly manual annotations; however, existing methods often train on incomplete object representations, resulting in inaccurate localization during inference. In addition, current methods typically struggle when applied to deep networks. To address these limitations, we propose a novel self-supervised tracking framework based on image synthesis and domain adversarial learning. We first construct a large-scale database of real-world target objects, then synthesize training video pairs by randomly inserting these targets into background frames while applying geometric and appearance transformations to simulate realistic variations. To reduce domain shift introduced by synthetic content, we incorporate a domain classification branch after feature extraction and adopt domain adversarial training to encourage feature alignment between real and synthetic domains. Experimental results on five standard tracking benchmarks demonstrate that our method significantly enhances tracking accuracy compared to existing self-supervised approaches without introducing any additional labeling cost. The proposed framework not only ensures complete target coverage during training but also shows strong scalability to deeper network architectures, offering a practical and effective solution for real-world tracking applications. Full article
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems)
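
Two ingredients of the framework, pasting a stored target into a background frame with random transforms and a gradient-reversal domain classifier, are sketched below; crop handling, transform ranges, and the feature dimension are assumptions.

```python
import cv2
import numpy as np
import torch
import torch.nn as nn

def synthesize_pair(background, target_crop, rng):
    """Paste a target crop into a background frame at a random scale and location (naive, no blending)."""
    H, W, _ = background.shape
    scale = rng.uniform(0.7, 1.3)
    h = min(H - 1, max(8, int(target_crop.shape[0] * scale)))
    w = min(W - 1, max(8, int(target_crop.shape[1] * scale)))
    crop = cv2.resize(target_crop, (w, h))
    y, x = rng.integers(0, H - h), rng.integers(0, W - w)
    frame = background.copy()
    frame[y:y + h, x:x + w] = crop
    return frame, (x, y, w, h)               # synthetic frame plus pseudo ground-truth box

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips (and scales) gradients in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

class DomainHead(nn.Module):
    """Real-vs-synthetic classifier; reversed gradients push the backbone toward domain-invariant features."""
    def __init__(self, dim=256, lam=0.1):
        super().__init__()
        self.lam = lam
        self.fc = nn.Linear(dim, 2)
    def forward(self, feat):                 # feat: (B, dim) backbone features
        return self.fc(GradReverse.apply(feat, self.lam))

rng = np.random.default_rng(0)
bg = rng.integers(0, 255, (240, 320, 3), dtype=np.uint8)
obj = rng.integers(0, 255, (40, 60, 3), dtype=np.uint8)
frame, box = synthesize_pair(bg, obj, rng)
logits = DomainHead()(torch.rand(4, 256))
print(frame.shape, box, logits.shape)
```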

25 pages, 5652 KiB  
Article
Modeling and Optimization of the Vacuum Degassing Process in Electric Steelmaking Route
by Bikram Konar, Noah Quintana and Mukesh Sharma
Processes 2025, 13(8), 2368; https://doi.org/10.3390/pr13082368 - 25 Jul 2025
Abstract
Vacuum degassing (VD) is a critical refining step in electric arc furnace (EAF) steelmaking for producing clean steel with reduced nitrogen and hydrogen content. This study develops an Effective Equilibrium Reaction Zone (EERZ) model focused on denitrogenation (de-N) by simulating interfacial reactions at the bubble–steel interface (Z1). The model incorporates key process parameters such as argon flow rate, vacuum pressure, and initial nitrogen and sulfur concentrations. A robust empirical correlation was established between de-N efficiency and the mass of Z1, reducing prediction time from a day to under a minute. Additionally, the model was further improved by incorporating a dynamic surface exposure zone (Z_eye) to account for transient ladle eye effects on nitrogen removal under deep vacuum (<10 torr), validated using synchronized plant trials and Python-based video analysis. The integrated approach—combining thermodynamic-kinetic modeling, plant validation, and image-based diagnostics—provides a robust framework for optimizing VD control and enhancing nitrogen removal control in EAF-based steelmaking. Full article
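
The abstract does not give the fitted correlation, so the snippet below only illustrates the kind of lumped kinetics such a model reduces to: a single effective reaction-zone fraction driving nitrogen toward its equilibrium value. The rate constant, equilibrium level, and time step are invented numbers.

```python
import numpy as np

def simulate_de_n(n0_ppm, n_eq_ppm, m_z1_frac, k=0.05, dt=10.0, t_end=1800.0):
    """
    Toy first-order denitrogenation: only the melt fraction exchanging with bubbles (m_z1_frac)
    relaxes toward equilibrium each step. k [1/s], dt and t_end [s]. Illustrative only.
    """
    times = np.arange(0.0, t_end + dt, dt)
    n = np.empty_like(times)
    n[0] = n0_ppm
    for i in range(1, len(times)):
        dn = -k * m_z1_frac * (n[i - 1] - n_eq_ppm) * dt
        n[i] = max(n_eq_ppm, n[i - 1] + dn)
    return times, n

# A larger effective reaction-zone mass gives faster nitrogen removal.
for frac in (0.05, 0.15, 0.30):
    t, n = simulate_de_n(n0_ppm=80.0, n_eq_ppm=30.0, m_z1_frac=frac)
    print(f"Z1 fraction {frac:.2f}: N after 30 min = {n[-1]:.1f} ppm")
```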

30 pages, 3451 KiB  
Article
Integrating Google Maps and Smooth Street View Videos for Route Planning
by Federica Massimi, Antonio Tedeschi, Kalapraveen Bagadi and Francesco Benedetto
J. Imaging 2025, 11(8), 251; https://doi.org/10.3390/jimaging11080251 - 25 Jul 2025
Abstract
This research addresses the long-standing dependence on printed maps for navigation and highlights the limitations of existing digital services like Google Street View and Google Street View Player in providing comprehensive solutions for route analysis and understanding. The absence of a systematic approach to route analysis, issues related to insufficient street view images, and the lack of proper image mapping for desired roads remain unaddressed by current applications, which are predominantly client-based. In response, we propose an innovative automatic system designed to generate videos depicting road routes between two geographic locations. The system calculates and presents the route conventionally, emphasizing the path on a two-dimensional representation, and in a multimedia format. A prototype is developed based on a cloud-based client–server architecture, featuring three core modules: frames acquisition, frames analysis and elaboration, and the persistence of metadata information and computed videos. The tests, encompassing both real-world and synthetic scenarios, have produced promising results, showcasing the efficiency of our system. By providing users with a real and immersive understanding of requested routes, our approach fills a crucial gap in existing navigation solutions. This research contributes to the advancement of route planning technologies, offering a comprehensive and user-friendly system that leverages cloud computing and multimedia visualization for an enhanced navigation experience. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
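
As one possible sketch of the frame-acquisition module, the code below samples points along a route polyline, requests images from the public Street View Static API, and writes them to a video with OpenCV. The endpoint and query parameters are the standard Google ones, but the sampling, heading estimate, and video settings are assumptions rather than the paper's implementation.

```python
import cv2
import numpy as np
import requests

STREETVIEW_URL = "https://maps.googleapis.com/maps/api/streetview"

def fetch_frames(waypoints, api_key, size="640x480"):
    """waypoints: list of (lat, lng) along the route. Returns decoded BGR frames."""
    frames = []
    for (lat, lng), nxt in zip(waypoints, waypoints[1:] + [waypoints[-1]]):
        heading = np.degrees(np.arctan2(nxt[1] - lng, nxt[0] - lat)) % 360   # rough bearing to next point
        params = {"size": size, "location": f"{lat},{lng}",
                  "heading": f"{heading:.1f}", "fov": 90, "key": api_key}
        resp = requests.get(STREETVIEW_URL, params=params, timeout=10)
        if resp.ok:
            img = cv2.imdecode(np.frombuffer(resp.content, np.uint8), cv2.IMREAD_COLOR)
            if img is not None:
                frames.append(img)
    return frames

def write_video(frames, path="route.mp4", fps=2):
    h, w = frames[0].shape[:2]
    out = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for f in frames:
        out.write(f)
    out.release()

# Usage (requires a valid API key):
# frames = fetch_frames([(41.8902, 12.4922), (41.8905, 12.4930)], api_key="YOUR_KEY")
# write_video(frames)
```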

19 pages, 3116 KiB  
Article
Deep Learning for Visual Leading of Ships: AI for Human Factor Accident Prevention
by Manuel Vázquez Neira, Genaro Cao Feijóo, Blanca Sánchez Fernández and José A. Orosa
Appl. Sci. 2025, 15(15), 8261; https://doi.org/10.3390/app15158261 - 24 Jul 2025
Abstract
Traditional navigation relies on visual alignment with leading lights, a task typically monitored by bridge officers over extended periods. This process can lead to fatigue-related human factor errors, increasing the risk of maritime accidents and environmental damage. To address this issue, this study explores the use of convolutional neural networks (CNNs), evaluating different training strategies and hyperparameter configurations to assist officers in identifying deviations from proper visual leading. Using video data captured from a navigation simulator, we trained a lightweight CNN capable of advising bridge personnel with an accuracy of 86% during night-time operations. Notably, the model demonstrated robustness against visual interference from other light sources, such as lighthouses or coastal lights. The primary source of classification error was linked to images with low bow deviation, largely influenced by human mislabeling during dataset preparation. Future work will focus on refining the classification scheme to enhance model performance. We (1) propose a lightweight CNN based on SqueezeNet for night-time ship navigation, (2) expand the traditional binary risk classification into six operational categories, and (3) demonstrate improved performance over human judgment in visually ambiguous conditions. Full article
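
A SqueezeNet backbone can be adapted to the six operational categories with a few lines of torchvision. The snippet below is a generic transfer-learning sketch (frozen features, final 1x1-convolution classifier swapped to six outputs), not the authors' training configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 6    # the six operational categories mentioned in the abstract

model = models.squeezenet1_1(weights=models.SqueezeNet1_1_Weights.IMAGENET1K_V1)
for p in model.features.parameters():
    p.requires_grad = False                      # freeze the ImageNet-pretrained features

# SqueezeNet classifies with a final 1x1 convolution; swap it for a 6-way head.
model.classifier[1] = nn.Conv2d(512, NUM_CLASSES, kernel_size=1)
model.num_classes = NUM_CLASSES

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One dummy training step on a batch of simulator frames (B, 3, 224, 224).
images, labels = torch.rand(4, 3, 224, 224), torch.randint(0, NUM_CLASSES, (4,))
loss = criterion(model(images), labels)
optimizer.zero_grad(); loss.backward(); optimizer.step()
print(loss.item())
```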

14 pages, 691 KiB  
Article
Three-Dimensional-Printed Models: A Novel Approach to Ultrasound Education of the Placental Cord Insertion Site
by Samantha Ward, Sharon Maresse and Zhonghua Sun
Appl. Sci. 2025, 15(15), 8221; https://doi.org/10.3390/app15158221 - 24 Jul 2025
Abstract
Assessment of the placental cord insertion (PCI) is a vital component of antenatal ultrasound examinations. PCI assessment can be complex, particularly in cases of abnormal PCI, and requires proficient sonographer spatial perception. The current literature describes the increasing potential of three-dimensional (3D) modelling to enhance spatial awareness and understanding of complex anatomical structures. This study aimed to evaluate sonographers’ confidence in ultrasound assessment of the PCI and the potential benefit of novel 3D-printed models (3DPMs) of the PCI in ultrasound education. Sonographers employed at a large private medical imaging practice in Western Australia were invited to participate in a face-to-face presentation of two-dimensional (2D) ultrasound images, ultrasound videos, and 3DPMs of normal cord insertion (NCI), marginal cord insertion (MCI), and velamentous cord insertion (VCI). Our objective was to determine the benefit of 3DPMs in improving sonographers’ confidence and ability to spatially visualise the PCI. Thirty-three participants completed questionnaires designed to compare their confidence in assessing the PCI and their ability to spatially visualise the anatomical relationship between the placenta and PCI, before and after the presentation. There was a significant association between a participant’s years of experience and their confidence levels and spatial awareness of the PCI prior to the demonstration. The results showed that the 3DPMs increased participants’ confidence and their spatial awareness of the PCI, with no significant association with years of experience. Additionally, participating sonographers were asked to rate the 3DPMs as an educational device. The 3DPMs were ranked as being a more useful educational tool for spatially visualising the NCI, MCI, and VCI than 2D ultrasound images and videos. Most participants responded favourably when asked whether the 3DPMs would be useful in ultrasound education, with 75.8%, 84.8%, and 97% indicating the models of NCI, MCI, and VCI, respectively, would be extremely useful. Our study has demonstrated a potential role for 3DPMs of the PCI in ultrasound education, supplementing traditional 2D educational resources. Full article

28 pages, 8982 KiB  
Article
Decision-Level Multi-Sensor Fusion to Improve Limitations of Single-Camera-Based CNN Classification in Precision Farming: Application in Weed Detection
by Md. Nazmuzzaman Khan, Adibuzzaman Rahi, Mohammad Al Hasan and Sohel Anwar
Computation 2025, 13(7), 174; https://doi.org/10.3390/computation13070174 - 18 Jul 2025
Abstract
The United States leads the world in corn production and consumption, with an estimated value of USD 50 billion per year. There is a pressing need for the development of novel and efficient techniques aimed at enhancing the identification and eradication of weeds in a manner that is both environmentally sustainable and economically advantageous. Weed classification for autonomous agricultural robots is a challenging task for a single-camera-based system due to noise, vibration, and occlusion. To address this issue, in this paper we present a multi-camera-based system with decision-level sensor fusion to overcome the limitations of a single-camera-based system. This study involves the utilization of a convolutional neural network (CNN) that was pre-trained on the ImageNet dataset. The CNN subsequently underwent re-training using a limited weed dataset to facilitate the classification of three distinct weed species: Xanthium strumarium (Common Cocklebur), Amaranthus retroflexus (Redroot Pigweed), and Ambrosia trifida (Giant Ragweed). These weed species are frequently encountered within corn fields. The test results showed that the re-trained VGG16 with a transfer-learning-based classifier exhibited acceptable accuracy (99% training, 97% validation, 94% testing accuracy) and that its inference time for weed classification from the video feed was suitable for real-time implementation. However, the accuracy of CNN-based classification from a single-camera video feed was found to deteriorate due to noise, vibration, and partial occlusion of weeds. Test results from a single-camera video feed show that weed classification is not always accurate enough for the spray system of an agricultural robot (AgBot). To improve the accuracy of the weed classification system and to overcome the shortcomings of single-sensor-based CNN classification, an improved Dempster–Shafer (DS)-based decision-level multi-sensor fusion algorithm was developed and implemented. The proposed algorithm improves on CNN-based weed classification when the weed is partially occluded. This algorithm can also detect if a sensor is faulty within an array of sensors and improves the overall classification accuracy by penalizing the evidence from a faulty sensor. Overall, the proposed fusion algorithm showed robust results in challenging scenarios, overcoming the limitations of a single-sensor-based system. Full article
(This article belongs to the Special Issue Moving Object Detection Using Computational Methods and Modeling)
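
Dempster's rule of combination, the core of the decision-level fusion described above, fits in a few lines. The three-weed frame of discernment matches the abstract, but the mass values and the simple conflict-based fault check are illustrative, not the paper's exact algorithm.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule for mass functions keyed by frozenset focal elements."""
    conflict = sum(a * b for (A, a), (B, b) in product(m1.items(), m2.items()) if not (A & B))
    if conflict >= 1.0:
        raise ValueError("total conflict; sources cannot be combined")
    fused = {}
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        inter = A & B
        if inter:
            fused[inter] = fused.get(inter, 0.0) + a * b / (1.0 - conflict)
    return fused, conflict

WEEDS = ["cocklebur", "pigweed", "ragweed"]
# Per-camera CNN softmax outputs turned into simple mass functions (singletons only).
cam1 = {frozenset({w}): p for w, p in zip(WEEDS, [0.70, 0.20, 0.10])}
cam2 = {frozenset({w}): p for w, p in zip(WEEDS, [0.60, 0.30, 0.10])}

fused, conflict = combine(cam1, cam2)
print({next(iter(k)): round(v, 3) for k, v in fused.items()}, "conflict:", round(conflict, 3))
# Persistently high conflict of one camera against the others could flag a faulty or occluded sensor.
```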

21 pages, 9571 KiB  
Article
Performance Evaluation of Real-Time Image-Based Heat Release Rate Prediction Model Using Deep Learning and Image Processing Methods
by Joohyung Roh, Sehong Min and Minsuk Kong
Fire 2025, 8(7), 283; https://doi.org/10.3390/fire8070283 - 18 Jul 2025
Abstract
Heat release rate (HRR) is a key indicator for characterizing fire behavior, and it is conventionally measured under laboratory conditions. However, this measurement is limited in its widespread application to various fire conditions, due to its high cost, operational complexity, and lack of real-time predictive capability. Therefore, this study proposes an image-based HRR prediction model that uses deep learning and image processing techniques. The flame region in a fire video was segmented using the YOLO-YCbCr model, which integrates YCbCr color-space-based segmentation with YOLO object detection. For comparative analysis, the YOLO segmentation model was used. Furthermore, the fire diameter and flame height were determined from the spatial information of the segmented flame, and the HRR was predicted based on the correlation between flame size and HRR. The proposed models were applied to various experimental fire videos, and their prediction performances were quantitatively assessed. The results indicated that the proposed models accurately captured the HRR variations over time, and applying the average flame height calculation enhanced the prediction performance by reducing fluctuations in the predicted HRR. These findings demonstrate that the image-based HRR prediction model can be used to estimate real-time HRR values in diverse fire environments. Full article
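
A minimal version of color-space flame segmentation and a size-to-HRR step is sketched below. The YCbCr thresholds are generic rules of thumb, and the HRR step inverts the widely used Heskestad flame-height correlation rather than the paper's fitted relation, so both should be read as assumptions.

```python
import cv2
import numpy as np

def segment_flame(frame_bgr):
    """Rule-of-thumb flame mask in YCbCr: bright, red-shifted pixels (thresholds are assumptions)."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)   # OpenCV channel order is Y, Cr, Cb
    y, cr, cb = cv2.split(ycrcb)
    mask = (y > 150) & (cr > 140) & (cb < 120)
    return mask.astype(np.uint8)

def flame_size_m(mask, meters_per_pixel):
    """Flame base diameter and height (m) from the mask's bounding box."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return 0.0, 0.0
    width_px = xs.max() - xs.min() + 1
    height_px = ys.max() - ys.min() + 1
    return width_px * meters_per_pixel, height_px * meters_per_pixel

def hrr_kw(flame_height_m, diameter_m):
    """Invert Heskestad's correlation L = 0.235*Q^(2/5) - 1.02*D to get Q in kW."""
    return ((flame_height_m + 1.02 * diameter_m) / 0.235) ** 2.5

frame = cv2.imread("fire_frame.png")                        # any test frame
if frame is not None:
    d, h = flame_size_m(segment_flame(frame), meters_per_pixel=0.005)
    print(f"D = {d:.2f} m, L = {h:.2f} m, HRR ~ {hrr_kw(h, d):.0f} kW")
```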

22 pages, 845 KiB  
Article
Bridging Cities and Citizens with Generative AI: Public Readiness and Trust in Urban Planning
by Adnan Alshahrani
Buildings 2025, 15(14), 2494; https://doi.org/10.3390/buildings15142494 - 16 Jul 2025
Abstract
As part of its modernisation and economic diversification policies, Saudi Arabia is building smart, sustainable cities intended to improve quality of life and meet environmental goals. However, involving the public in urban planning remains complex, with traditional methods often proving expensive, time-consuming, and inaccessible to many groups. Integrating artificial intelligence (AI) into public participation may help to address these limitations. This study explores whether Saudi residents are ready to engage with AI-driven tools in urban planning, how they prefer to interact with them, and what ethical concerns may arise. Using a quantitative, survey-based approach, the study collected data from 232 Saudi residents using non-probability stratified sampling. The survey assessed demographic influences on AI readiness, preferred engagement methods, and perceptions of ethical risks. The results showed a strong willingness among participants (200 respondents, 86%)—especially younger and university-educated respondents—to engage through AI platforms. Visual tools such as image and video analysis were the most preferred (96 respondents, 41%), while chatbots were less favoured (16 respondents, 17%). However, concerns were raised about privacy (76 respondents, 33%), bias (52 respondents, 22%), and over-reliance on technology (84 respondents, 36%). By exploring the intersection of generative AI and participatory urban governance, this study contributes directly to the discourse on inclusive smart city development. The research also offers insights into how AI-driven public engagement tools can be integrated into urban planning workflows to enhance the design, governance, and performance of the built environment. The findings suggest that AI has the potential to improve inclusivity and responsiveness in urban planning, but that its success depends on public trust, ethical safeguards, and the thoughtful design of accessible, user-friendly engagement platforms. Full article
(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)
