Search Results (1,427)

Search Parameters:
Keywords = visual grounding

45 pages, 10039 KiB  
Article
Design of an Interactive System by Combining Affective Computing Technology with Music for Stress Relief
by Chao-Ming Wang and Ching-Hsuan Lin
Electronics 2025, 14(15), 3087; https://doi.org/10.3390/electronics14153087 (registering DOI) - 1 Aug 2025
Abstract
In response to the stress commonly experienced by young people in high-pressure daily environments, a music-based stress-relief interactive system was developed by integrating music-assisted care with emotion-sensing technology. The design principles of the system were established through a literature review on stress, music listening, emotion detection, and interactive devices. A prototype was created accordingly and refined through interviews with four experts and eleven users participating in a preliminary experiment. The system is grounded in a four-stage guided imagery and music framework, along with a static activity model focused on relaxation-based stress management. Emotion detection was achieved using a wearable EEG device (NeuroSky’s MindWave Mobile device) and a two-dimensional emotion model, and the emotional states were translated into visual representations using seasonal and weather metaphors. A formal experiment involving 52 users was conducted. The system was evaluated, and its effectiveness confirmed, through user interviews and questionnaire surveys, with statistical analysis conducted using SPSS 26 and AMOS 23. The findings reveal that: (1) integrating emotion sensing with music listening creates a novel and engaging interactive experience; (2) emotional states can be effectively visualized using nature-inspired metaphors, enhancing user immersion and understanding; and (3) the combination of music listening, guided imagery, and real-time emotional feedback successfully promotes emotional relaxation and increases self-awareness. Full article
(This article belongs to the Special Issue New Trends in Human-Computer Interactions for Smart Devices)
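The abstract does not specify how points in the two-dimensional (valence–arousal) emotion model are mapped to seasonal and weather metaphors; the sketch below illustrates one plausible quadrant-based mapping. The thresholds and metaphor labels are assumptions for illustration, not the authors' design.

```python
def emotion_to_metaphor(valence: float, arousal: float) -> str:
    """Map a point in a valence-arousal plane (both in [-1, 1]) to a
    season/weather metaphor. Quadrant boundaries and labels are illustrative
    assumptions, not the mapping used in the cited system."""
    if valence >= 0 and arousal >= 0:
        return "summer / clear sky"      # excited, happy
    if valence >= 0 and arousal < 0:
        return "autumn / gentle breeze"  # calm, relaxed
    if valence < 0 and arousal >= 0:
        return "winter / thunderstorm"   # tense, stressed
    return "late winter / fog"           # tired, low mood

print(emotion_to_metaphor(0.4, -0.6))  # -> "autumn / gentle breeze"
```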

22 pages, 24173 KiB  
Article
ScaleViM-PDD: Multi-Scale EfficientViM with Physical Decoupling and Dual-Domain Fusion for Remote Sensing Image Dehazing
by Hao Zhou, Yalun Wang, Wanting Peng, Xin Guan and Tao Tao
Remote Sens. 2025, 17(15), 2664; https://doi.org/10.3390/rs17152664 (registering DOI) - 1 Aug 2025
Abstract
Remote sensing images are often degraded by atmospheric haze, which not only reduces image quality but also complicates information extraction, particularly in high-level visual analysis tasks such as object detection and scene classification. State-space models (SSMs) have recently emerged as a powerful paradigm for vision tasks, showing great promise due to their computational efficiency and robust capacity to model global dependencies. However, most existing learning-based dehazing methods lack physical interpretability, leading to weak generalization. Furthermore, they typically rely on spatial features while neglecting crucial frequency domain information, resulting in incomplete feature representation. To address these challenges, we propose ScaleViM-PDD, a novel network that enhances an SSM backbone with two key innovations: a Multi-scale EfficientViM with Physical Decoupling (ScaleViM-P) module and a Dual-Domain Fusion (DD Fusion) module. The ScaleViM-P module synergistically integrates a Physical Decoupling block within a Multi-scale EfficientViM architecture. This design enables the network to mitigate haze interference in a physically grounded manner at each representational scale while simultaneously capturing global contextual information to adaptively handle complex haze distributions. To further address detail loss, the DD Fusion module replaces conventional skip connections by incorporating a novel Frequency Domain Module (FDM) alongside channel and position attention. This allows for a more effective fusion of spatial and frequency features, significantly improving the recovery of fine-grained details, including color and texture information. Extensive experiments on nine publicly available remote sensing datasets demonstrate that ScaleViM-PDD consistently surpasses state-of-the-art baselines in both qualitative and quantitative evaluations, highlighting its strong generalization ability. Full article
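The abstract does not detail the internals of the Frequency Domain Module (FDM); the sketch below only illustrates the general idea of fusing a spatial feature map with a gated version of its 2D Fourier spectrum. The module structure, gating scheme, and fusion rule are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class FrequencyDomainMixer(nn.Module):
    """Illustrative sketch only: fuse a feature map with a learnable
    re-weighting of its 2D Fourier spectrum. Not the paper's FDM."""
    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)
        self.freq_gate = nn.Parameter(torch.ones(channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frequency branch: FFT -> channel-wise gating -> inverse FFT.
        spec = torch.fft.rfft2(x, norm="ortho")
        spec = spec * self.freq_gate
        freq_feat = torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")
        # Fuse spatial and frequency branches.
        return self.spatial(x) + freq_feat

feat = torch.randn(1, 16, 64, 64)
print(FrequencyDomainMixer(16)(feat).shape)  # torch.Size([1, 16, 64, 64])
```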

24 pages, 6260 KiB  
Article
Transforming Product Discovery and Interpretation Using Vision–Language Models
by Simona-Vasilica Oprea and Adela Bâra
J. Theor. Appl. Electron. Commer. Res. 2025, 20(3), 191; https://doi.org/10.3390/jtaer20030191 (registering DOI) - 1 Aug 2025
Abstract
In this work, the utility of multimodal vision–language models (VLMs) for visual product understanding in e-commerce is investigated, focusing on two complementary models: ColQwen2 (vidore/colqwen2-v1.0) and ColPali (vidore/colpali-v1.2-hf). These models are integrated into two architectures and evaluated across various product interpretation tasks, including image-grounded question answering, brand recognition and visual retrieval based on natural language prompts. ColQwen2, built on the Qwen2-VL backbone with LoRA-based adapter hot-swapping, demonstrates strong performance, allowing end-to-end image querying and text response synthesis. It excels at identifying attributes such as brand, color or usage based solely on product images and responds fluently to user questions. In contrast, ColPali, which utilizes the PaliGemma backbone, is optimized for explainability. It delivers detailed visual-token alignment maps that reveal how specific regions of an image contribute to retrieval decisions, offering transparency ideal for diagnostics or educational applications. Through comparative experiments using footwear imagery, it is demonstrated that ColQwen2 is highly effective in generating accurate responses to product-related questions, while ColPali provides fine-grained visual explanations that reinforce trust and model accountability. Full article
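ColQwen2 and ColPali belong to the ColBERT-style late-interaction family of retrievers, in which each text query token is matched against image patch embeddings and the per-token maxima are summed. The sketch below shows that generic MaxSim scoring; dimensions and data are placeholders, and the exact scoring used by the vidore checkpoints may differ.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, patch_emb: np.ndarray) -> float:
    """ColBERT-style late-interaction score: for each query token, take its
    maximum cosine similarity over all image patch embeddings, then sum over
    query tokens.

    query_emb: (n_query_tokens, dim), patch_emb: (n_patches, dim)."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    sim = q @ p.T                      # (n_query_tokens, n_patches)
    return float(sim.max(axis=1).sum())

# Toy example: rank two product images against a text query.
rng = np.random.default_rng(0)
query = rng.normal(size=(8, 128))
images = [rng.normal(size=(196, 128)) for _ in range(2)]
scores = [maxsim_score(query, img) for img in images]
print(int(np.argmax(scores)))  # index of the best-matching image
```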

23 pages, 6315 KiB  
Article
A Kansei-Oriented Morphological Design Method for Industrial Cleaning Robots Integrating Extenics-Based Semantic Quantification and Eye-Tracking Analysis
by Qingchen Li, Yiqian Zhao, Yajun Li and Tianyu Wu
Appl. Sci. 2025, 15(15), 8459; https://doi.org/10.3390/app15158459 - 30 Jul 2025
Abstract
In the context of Industry 4.0, user demands for industrial robots have shifted toward diversification and experience-orientation. Effectively integrating users’ affective imagery requirements into industrial-robot form design remains a critical challenge. Traditional methods rely heavily on designers’ subjective judgments and lack objective data on user cognition. To address these limitations, this study develops a comprehensive methodology grounded in Kansei engineering that combines Extenics-based semantic analysis, eye-tracking experiments, and user imagery evaluation. First, we used web crawlers to harvest user-generated descriptors for industrial floor-cleaning robots and applied Extenics theory to quantify and filter key perceptual imagery features. Second, eye-tracking experiments captured users’ visual-attention patterns during robot observation, allowing us to identify pivotal design elements and assemble a sample repository. Finally, the semantic differential method collected users’ evaluations of these design elements, and correlation analysis mapped emotional needs onto stylistic features. Our findings reveal strong positive correlations between four core imagery preferences—“dignified,” “technological,” “agile,” and “minimalist”—and their corresponding styling elements. By integrating qualitative semantic data with quantitative eye-tracking metrics, this research provides a scientific foundation and novel insights for emotion-driven design in industrial floor-cleaning robots. Full article
(This article belongs to the Special Issue Intelligent Robotics in the Era of Industry 5.0)
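The correlation analysis described above can be illustrated with a standard Pearson correlation between an eye-tracking metric and a semantic-differential rating. The arrays below are hypothetical placeholder values, not the study's data.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical data: per-design-element mean fixation duration (seconds, from
# eye tracking) and mean 7-point semantic-differential rating for "technological".
fixation_duration = np.array([0.42, 0.61, 0.35, 0.58, 0.49, 0.66, 0.31, 0.53])
rating_technological = np.array([4.1, 5.6, 3.8, 5.2, 4.6, 5.9, 3.5, 5.0])

r, p_value = pearsonr(fixation_duration, rating_technological)
print(f"Pearson r = {r:.2f}, p = {p_value:.3f}")
```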

41 pages, 1202 KiB  
Article
Exploring Key Factors Influencing the Processual Experience of Visitors in Metaverse Museum Exhibitions: An Approach Based on the Experience Economy and the SOR Model
by Ronghui Wu, Lin Gao, Jiaxin Li, Anxin Xie and Xiao Zhang
Electronics 2025, 14(15), 3045; https://doi.org/10.3390/electronics14153045 - 30 Jul 2025
Abstract
With the advancement of immersive technologies, metaverse museum exhibitions have become an increasingly important medium through which audiences access cultural content and experience artistic works. This study aims to identify the key factors influencing visitors’ processual experiences in metaverse museum exhibitions and to explore how these factors collectively contribute to the formation of satisfaction with the visiting experience. Adopting an interdisciplinary theoretical perspective, the study integrates the Experience Economy theory with the Stimulus–Organism–Response (SOR) model to construct a systematic theoretical framework. This framework reveals how exhibition-related stimuli affect visitors’ behavioral intentions through psychological response pathways. Specifically, perceived educational appeal, interactive entertainment, escapist experience, and perceived visual aesthetics are defined as stimulus variables, while psychological immersion, emotional trigger, and cognitive engagement are introduced as organismic variables to explain their effects on satisfaction with the visiting experience and social sharing intention as response variables. Based on 507 valid responses, Partial Least Squares Structural Equation Modeling (PLS-SEM) was employed for empirical analysis. The results indicate that interactive entertainment and escapist experience have significant positive effects on psychological responses, serving as key drivers of deep visitor engagement. Emotional Trigger acts as a significant mediator between exhibition stimuli and satisfaction with the visiting experience, which in turn significantly predicts social sharing intention. In contrast, perceived educational appeal and perceived visual aesthetics exhibit weaker impacts at the cognitive and behavioral levels. This study not only identifies these weakened pathways but also proposes optimization strategies grounded in experiential construction and cognitive synergy, offering guidance for enhancing the educational function and deep experiential design of metaverse exhibitions. The findings validate the applicability of the Experience Economy theory and the SOR model in metaverse cultural contexts and deepen our understanding of the psychological mechanisms underlying immersive cultural experiences. This study further provides a pathway for shifting exhibition design from a “content-oriented” to an “experience-driven” approach, offering theoretical and practical insights into enhancing audience engagement and cultural communication effectiveness in metaverse museums. Full article
(This article belongs to the Special Issue Metaverse, Digital Twins and AI, 3rd Edition)

30 pages, 37977 KiB  
Article
Text-Guided Visual Representation Optimization for Sensor-Acquired Video Temporal Grounding
by Yun Tian, Xiaobo Guo, Jinsong Wang and Xinyue Liang
Sensors 2025, 25(15), 4704; https://doi.org/10.3390/s25154704 - 30 Jul 2025
Abstract
Video temporal grounding (VTG) aims to localize a semantically relevant temporal segment within an untrimmed video based on a natural language query. The task continues to face challenges arising from cross-modal semantic misalignment, which is largely attributed to redundant visual content in sensor-acquired video streams, linguistic ambiguity, and discrepancies in modality-specific representations. Most existing approaches rely on intra-modal feature modeling, processing video and text independently throughout the representation learning stage. However, this isolation undermines semantic alignment by neglecting the potential of cross-modal interactions. In practice, a natural language query typically corresponds to spatiotemporal content in video signals collected through camera-based sensing systems, encompassing a particular sequence of frames and its associated salient subregions. We propose a text-guided visual representation optimization framework tailored to enhance semantic interpretation over video signals captured by visual sensors. This framework leverages textual information to focus on spatiotemporal video content, thereby narrowing the cross-modal gap. Built upon the unified cross-modal embedding space provided by CLIP, our model leverages video data from sensing devices to structure representations and introduces two dedicated modules to semantically refine visual representations across spatial and temporal dimensions. First, we design a Spatial Visual Representation Optimization (SVRO) module to learn spatial information within intra-frames. It selects salient patches related to the text, capturing more fine-grained visual details. Second, we introduce a Temporal Visual Representation Optimization (TVRO) module to learn temporal relations from inter-frames. Temporal triplet loss is employed in TVRO to enhance attention on text-relevant frames and capture clip semantics. Additionally, a self-supervised contrastive loss is introduced at the clip–text level to improve inter-clip discrimination by maximizing semantic variance during training. Experiments on Charades-STA, ActivityNet Captions, and TACoS, widely used benchmark datasets, demonstrate that our method outperforms state-of-the-art methods across multiple metrics. Full article
(This article belongs to the Section Sensing and Imaging)
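As one concrete illustration of the clip–text objective mentioned above, the sketch below shows a standard InfoNCE-style contrastive loss over clip and text embeddings in a shared (CLIP-like) space. The loss form, temperature, and batch construction are assumptions and may differ from the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def clip_text_contrastive_loss(clip_emb, text_emb, temperature=0.07):
    """InfoNCE-style sketch of a clip-text contrastive objective: matched
    clip/text pairs (same index) are pulled together, all other clips in the
    batch act as negatives.

    clip_emb, text_emb: (batch, dim) embeddings from a shared embedding space."""
    clip_emb = F.normalize(clip_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = text_emb @ clip_emb.t() / temperature   # (batch, batch)
    targets = torch.arange(logits.size(0))           # positives on the diagonal
    return F.cross_entropy(logits, targets)

loss = clip_text_contrastive_loss(torch.randn(4, 512), torch.randn(4, 512))
print(loss.item())
```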

27 pages, 6715 KiB  
Article
Structural Component Identification and Damage Localization of Civil Infrastructure Using Semantic Segmentation
by Piotr Tauzowski, Mariusz Ostrowski, Dominik Bogucki, Piotr Jarosik and Bartłomiej Błachowski
Sensors 2025, 25(15), 4698; https://doi.org/10.3390/s25154698 - 30 Jul 2025
Viewed by 35
Abstract
Visual inspection of civil infrastructure for structural health assessment, as performed by structural engineers, is expensive and time-consuming. Automating this process is therefore highly attractive and has received significant attention in recent years. With the increasing capabilities of computers, deep neural networks have become a standard tool and can be used for structural health inspections. A key challenge, however, is the availability of reliable datasets. In this work, the U-net and DeepLab v3+ convolutional neural networks are trained on the synthetic Tokaido dataset. This dataset comprises images representative of unmanned aerial vehicle (UAV) imagery and corresponding ground truth data. The data include semantic segmentation masks for both categorizing structural elements (slabs, beams, and columns) and assessing structural damage (concrete spalling or exposed rebars). Data augmentation, including both image quality degradation (e.g., brightness modification, added noise) and image transformations (e.g., image flipping), is applied to the synthetic dataset. The selected neural network architectures achieve excellent performance, reaching 97% accuracy and 87% Mean Intersection over Union (mIoU) on the validation data. They also demonstrate promising results in the semantic segmentation of real-world structures captured in photographs, despite being trained solely on synthetic data. Additionally, based on the semantic segmentation results, DeepLabV3+ outperforms U-net in structural component identification, but not in the damage identification task. Full article
(This article belongs to the Special Issue AI-Assisted Condition Monitoring and Fault Diagnosis)
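Mean Intersection over Union (mIoU), the segmentation metric reported above, is computed per class and averaged; a minimal sketch of the standard calculation follows (the toy masks are placeholders).

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean Intersection over Union across classes for integer label masks.
    Classes absent from both prediction and ground truth are skipped."""
    ious = []
    for c in range(num_classes):
        pred_c, target_c = pred == c, target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))

# Toy masks with 4 classes (e.g., background, slab, beam, column).
pred = np.random.randint(0, 4, size=(256, 256))
target = np.random.randint(0, 4, size=(256, 256))
print(f"mIoU = {mean_iou(pred, target, num_classes=4):.3f}")
```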

28 pages, 3441 KiB  
Article
Which AI Sees Like Us? Investigating the Cognitive Plausibility of Language and Vision Models via Eye-Tracking in Human-Robot Interaction
by Khashayar Ghamati, Maryam Banitalebi Dehkordi and Abolfazl Zaraki
Sensors 2025, 25(15), 4687; https://doi.org/10.3390/s25154687 - 29 Jul 2025
Viewed by 196
Abstract
As large language models (LLMs) and vision–language models (VLMs) become increasingly used in robotics, a crucial question arises: to what extent do these models replicate human-like cognitive processes, particularly within socially interactive contexts? Whilst these models demonstrate impressive multimodal reasoning and perception capabilities, their cognitive plausibility remains underexplored. In this study, we address this gap by using human visual attention as a behavioural proxy for cognition in a naturalistic human-robot interaction (HRI) scenario. Eye-tracking data were previously collected from participants engaging in social human-human interactions, providing frame-level gaze fixations as a human attentional ground truth. We then prompted a state-of-the-art VLM (LLaVA) to generate scene descriptions, which were processed by four LLMs (DeepSeek-R1-Distill-Qwen-7B, Qwen1.5-7B-Chat, LLaMA-3.1-8b-instruct, and Gemma-7b-it) to infer saliency points. Critically, we evaluated each model in both stateless and memory-augmented (short-term memory, STM) modes to assess the influence of temporal context on saliency prediction. Our results show that whilst stateless LLaVA most closely replicates human gaze patterns, STM confers measurable benefits only for DeepSeek, whose lexical anchoring mirrors human rehearsal mechanisms. Other models exhibited degraded performance with memory due to prompt interference or limited contextual integration. This work introduces a novel, empirically grounded framework for assessing cognitive plausibility in generative models and underscores the role of short-term memory in shaping human-like visual attention in robotic systems. Full article
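The abstract does not state the metric used to compare model-inferred saliency points with human gaze fixations; the sketch below shows one simple illustrative proxy, a distance-based hit rate, under assumed pixel coordinates and an assumed radius.

```python
import numpy as np

def saliency_hit_rate(pred_points, gaze_fixations, radius_px=50):
    """Fraction of frames in which a model-predicted saliency point falls
    within `radius_px` of the human fixation for that frame. This is an
    illustrative proxy only, not the paper's evaluation metric.

    pred_points, gaze_fixations: (n_frames, 2) arrays of (x, y) pixel coords."""
    dists = np.linalg.norm(np.asarray(pred_points) - np.asarray(gaze_fixations), axis=1)
    return float((dists <= radius_px).mean())

pred = np.array([[320, 240], [400, 260], [150, 300]])
gaze = np.array([[330, 250], [600, 100], [160, 310]])
print(saliency_hit_rate(pred, gaze))  # 2 of 3 frames within 50 px -> ~0.667
```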

19 pages, 8766 KiB  
Article
Fusion of Airborne, SLAM-Based, and iPhone LiDAR for Accurate Forest Road Mapping in Harvesting Areas
by Evangelia Siafali, Vasilis Polychronos and Petros A. Tsioras
Land 2025, 14(8), 1553; https://doi.org/10.3390/land14081553 - 28 Jul 2025
Viewed by 207
Abstract
This study examined the integration of airborne Light Detection and Ranging (LiDAR), Simultaneous Localization and Mapping (SLAM)-based handheld LiDAR, and iPhone LiDAR to inspect forest road networks following forest operations. The goal is to overcome the challenges posed by dense canopy cover and ensure accurate and efficient data collection and mapping. Airborne data were collected using the DJI Matrice 300 RTK UAV equipped with a Zenmuse L2 LiDAR sensor, which achieved a high point density of 285 points/m² at an altitude of 80 m. Ground-level data were collected using the BLK2GO handheld laser scanner (HPLS) with SLAM methods (LiDAR SLAM, Visual SLAM, Inertial Measurement Unit) and the iPhone 13 Pro Max LiDAR. Data processing included generating DEMs, DSMs, and True Digital Orthophotos (TDOMs) via DJI Terra, LiDAR360 V8, and Cyclone REGISTER 360 PLUS, with additional processing and merging using CloudCompare V2 and ArcGIS Pro 3.4.0. The pairwise comparison analysis between ALS data and each alternative method revealed notable differences in elevation, highlighting discrepancies between methods. ALS + iPhone demonstrated the smallest deviation from ALS (MAE = 0.011, RMSE = 0.011, RE = 0.003%) and HPLS the largest deviation from ALS (MAE = 0.507, RMSE = 0.542, RE = 0.123%). The findings highlight the potential of fusing point clouds from diverse platforms to enhance forest road mapping accuracy. However, the selection of technology should consider trade-offs among accuracy, cost, and operational constraints. Mobile LiDAR solutions, particularly the iPhone, offer promising low-cost alternatives for certain applications. Future research should explore real-time fusion workflows and strategies to improve the cost-effectiveness and scalability of multisensor approaches for forest road monitoring. Full article
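The pairwise elevation comparison reported above (MAE, RMSE, RE) can in principle be reproduced with a few lines over co-registered elevation arrays; the sketch below assumes a relative-error definition (MAE over mean reference elevation) that the abstract does not confirm, and uses synthetic values.

```python
import numpy as np

def elevation_agreement(reference: np.ndarray, candidate: np.ndarray):
    """Pairwise elevation comparison between two co-registered elevation
    arrays (e.g., ALS vs. ALS + iPhone), reporting MAE, RMSE, and relative
    error in percent. The RE definition here (MAE over mean reference
    elevation) is an assumption."""
    diff = candidate - reference
    mae = np.abs(diff).mean()
    rmse = np.sqrt((diff ** 2).mean())
    re_percent = 100.0 * mae / np.abs(reference).mean()
    return mae, rmse, re_percent

ref = np.random.uniform(400.0, 420.0, size=10_000)        # reference elevations (m)
cand = ref + np.random.normal(0.0, 0.012, size=ref.shape)  # candidate with ~1 cm noise
mae, rmse, re = elevation_agreement(ref, cand)
print(f"MAE={mae:.3f} m, RMSE={rmse:.3f} m, RE={re:.3f}%")
```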

18 pages, 2335 KiB  
Article
MLLM-Search: A Zero-Shot Approach to Finding People Using Multimodal Large Language Models
by Angus Fung, Aaron Hao Tan, Haitong Wang, Bensiyon Benhabib and Goldie Nejat
Robotics 2025, 14(8), 102; https://doi.org/10.3390/robotics14080102 - 28 Jul 2025
Viewed by 230
Abstract
Robotic search of people in human-centered environments, including healthcare settings, is challenging, as autonomous robots need to locate people without complete or any prior knowledge of their schedules, plans, or locations. Furthermore, robots need to be able to adapt to real-time events that can influence a person’s plan in an environment. In this paper, we present MLLM-Search, a novel zero-shot person search architecture that leverages multimodal large language models (MLLM) to address the mobile robot problem of searching for a person under event-driven scenarios with varying user schedules. Our approach introduces a novel visual prompting method to provide robots with spatial understanding of the environment by generating a spatially grounded waypoint map, representing navigable waypoints using a topological graph and regions by semantic labels. This is incorporated into an MLLM with a region planner that selects the next search region based on the semantic relevance to the search scenario and a waypoint planner that generates a search path by considering the semantically relevant objects and the local spatial context through our unique spatial chain-of-thought prompting approach. Extensive 3D photorealistic experiments were conducted to validate the performance of MLLM-Search in searching for a person with a changing schedule in different environments. An ablation study was also conducted to validate the main design choices of MLLM-Search. Furthermore, a comparison study with state-of-the-art search methods demonstrated that MLLM-Search outperforms existing methods with respect to search efficiency. Real-world experiments with a mobile robot in a multi-room floor of a building showed that MLLM-Search was able to generalize to new and unseen environments. Full article
(This article belongs to the Section Intelligent Robots and Mechatronics)

26 pages, 11912 KiB  
Article
Multi-Dimensional Estimation of Leaf Loss Rate from Larch Caterpillar Under Insect Pest Stress Using UAV-Based Multi-Source Remote Sensing
by He-Ya Sa, Xiaojun Huang, Li Ling, Debao Zhou, Junsheng Zhang, Gang Bao, Siqin Tong, Yuhai Bao, Dashzebeg Ganbat, Mungunkhuyag Ariunaa, Dorjsuren Altanchimeg and Davaadorj Enkhnasan
Drones 2025, 9(8), 529; https://doi.org/10.3390/drones9080529 - 28 Jul 2025
Viewed by 249
Abstract
Leaf loss caused by pest infestations poses a serious threat to forest health. The leaf loss rate (LLR) refers to the percentage of the overall tree-crown leaf loss per unit area and is an important indicator for evaluating forest health. Therefore, rapid and accurate acquisition of the LLR via remote sensing monitoring is crucial. This study is based on drone hyperspectral and LiDAR data as well as ground survey data, calculating hyperspectral indices (HSI), multispectral indices (MSI), and LiDAR indices (LI). It employs Savitzky–Golay (S–G) smoothing with different window sizes (W) and polynomial orders (P) combined with recursive feature elimination (RFE) to select sensitive features. Using Random Forest Regression (RFR) and Convolutional Neural Network Regression (CNNR) to construct a multidimensional (horizontal and vertical) estimation model for LLR, combined with LiDAR point cloud data, achieved a three-dimensional visualization of the leaf loss rate of trees. The results of the study showed: (1) The optimal combination for the HSI and MSI was determined to be W11P3, and for the LI it was W5P2. (2) The optimal combination of the number of sensitive features extracted by the RFE algorithm was 13 HSI, 16 MSI, and hierarchical LI (2 in layer I, 9 in layer II, and 11 in layer III). (3) In terms of the horizontal estimation of the defoliation rate, the model performance index of the CNNR-HSI model (MPI = 0.9383) was significantly better than that of RFR-MSI (MPI = 0.8817), indicating that the continuous bands of hyperspectral data could better monitor the subtle changes of LLR. (4) The I-CNNR-HSI+LI, II-CNNR-HSI+LI, and III-CNNR-HSI+LI vertical estimation models were constructed by combining the CNNR-HSI model with the best accuracy and the LI sensitive to different vertical levels, respectively, and their MPIs reached more than 0.8, indicating that the LLR estimation at different vertical levels had high accuracy. According to the model, the pixel-level LLR of the sample tree was estimated, and a three-dimensional display of the LLR for forest trees under the pest stress of larch caterpillars was generated, providing a high-precision research scheme for LLR estimation under pest stress. Full article
(This article belongs to the Section Drones in Agriculture and Forestry)
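A minimal sketch of the preprocessing and feature-selection chain described above, Savitzky–Golay smoothing followed by recursive feature elimination with a random forest regressor, is shown below using SciPy and scikit-learn. Array shapes and the random data are placeholders; only the W11P3 window/order pair and the 13-feature HSI subset size are taken from the abstract.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

# Hypothetical inputs: per-tree hyperspectral index values and measured leaf
# loss rates; shapes are placeholders, not the study's data.
X = np.random.rand(120, 40)          # 120 trees x 40 spectral index features
y = np.random.rand(120)              # measured LLR per tree (0..1)

# Savitzky-Golay smoothing along the feature (band/index) axis, using the
# window/order combination the study found optimal for HSI (W=11, P=3).
X_smooth = savgol_filter(X, window_length=11, polyorder=3, axis=1)

# Recursive feature elimination to keep the 13 most informative indices,
# with a random forest regressor as both the selector and the final model.
selector = RFE(RandomForestRegressor(n_estimators=200, random_state=0),
               n_features_to_select=13)
selector.fit(X_smooth, y)
print("selected feature indices:", np.flatnonzero(selector.support_))
```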

27 pages, 2978 KiB  
Article
Dynamic Monitoring and Precision Fertilization Decision System for Agricultural Soil Nutrients Using UAV Remote Sensing and GIS
by Xiaolong Chen, Hongfeng Zhang and Cora Un In Wong
Agriculture 2025, 15(15), 1627; https://doi.org/10.3390/agriculture15151627 - 27 Jul 2025
Viewed by 289
Abstract
We propose a dynamic monitoring and precision fertilization decision system for agricultural soil nutrients, integrating UAV remote sensing and GIS technologies to address the limitations of traditional soil nutrient assessment methods. The proposed method combines multi-source data fusion, including hyperspectral and multispectral UAV imagery with ground sensor data, to achieve high-resolution spatial and spectral analysis of soil nutrients. Real-time data processing algorithms enable rapid updates of soil nutrient status, while a time-series dynamic model captures seasonal variations and crop growth stage influences, improving prediction accuracy (RMSE reductions of 43–70% for nitrogen, phosphorus, and potassium compared to conventional laboratory-based methods and satellite NDVI approaches). The experimental validation compared the proposed system against two conventional approaches: (1) laboratory soil testing with standardized fertilization recommendations and (2) satellite NDVI-based fertilization. Field trials across three distinct agroecological zones demonstrated that the proposed system reduced fertilizer inputs by 18–27% while increasing crop yields by 4–11%, outperforming both conventional methods. Furthermore, an intelligent fertilization decision model generates tailored fertilization plans by analyzing real-time soil conditions, crop demands, and climate factors, with continuous learning enhancing its precision over time. The system also incorporates GIS-based visualization tools, providing intuitive spatial representations of nutrient distributions and interactive functionalities for detailed insights. Our approach significantly advances precision agriculture by automating the entire workflow from data collection to decision-making, reducing resource waste and optimizing crop yields. The integration of UAV remote sensing, dynamic modeling, and machine learning distinguishes this work from conventional static systems, offering a scalable and adaptive framework for sustainable farming practices. Full article
(This article belongs to the Section Agricultural Soils)

10 pages, 1114 KiB  
Article
Restoration of Joint Line Obliquity May Not Influence Lower Extremity Peak Frontal Plane Moments During Stair Negotiation
by Alexis K. Nelson-Tranum, Marcus C. Ford, Nuanqiu Hou, Douglas W. Powell, Christopher T. Holland and William M. Mihalko
Bioengineering 2025, 12(8), 803; https://doi.org/10.3390/bioengineering12080803 - 26 Jul 2025
Viewed by 253
Abstract
Approximately 15% of total knee arthroplasty (TKA) patients remain dissatisfied after surgery, with joint line obliquity (JLO) potentially affecting patient outcomes. This study investigated whether JLO restoration influenced lower extremity frontal plane joint moments during stair negotiation by TKA patients. Thirty unrestored and twenty-two restored JLO patients participated in this study and were asked to perform five trials on each limb for stair negotiation while three-dimensional kinematics and ground reaction forces were recorded. Frontal plane moments at the ankle, knee and hip were calculated using Visual 3D. The restoration of JLO did not alter frontal plane joint moments during stair negotiation. Both groups showed symmetrical moment profiles, indicating no significant biomechanical differences between the restored and unrestored JLO groups. Restoring JLO did not affect frontal plane joint moments during stair negotiation, suggesting it may not contribute to patient satisfaction disparities post-TKA. Further research should explore other factors, such as surgical technique and implant design, that might influence recovery. Full article

21 pages, 3293 KiB  
Article
A Fusion of Entropy-Enhanced Image Processing and Improved YOLOv8 for Smoke Recognition in Mine Fires
by Xiaowei Li and Yi Liu
Entropy 2025, 27(8), 791; https://doi.org/10.3390/e27080791 - 25 Jul 2025
Viewed by 160
Abstract
Smoke appears earlier than flames, so image-based fire monitoring techniques focus mainly on smoke detection, which is regarded as an effective strategy for preventing initial fires from spreading and evolving into serious fires. Smoke monitoring in mine fires faces serious challenges: the underground environment is complex, smoke and background are highly intermixed, and visual features are blurred, making it difficult for existing image-based monitoring techniques to meet practical requirements for accuracy and robustness. Conventional ground-based methods, when applied directly underground, suffer from high rates of missed and false detections. Aiming at the core problems of mixed target and background information and high boundary uncertainty in smoke images, this paper, inspired by the principle of information entropy, proposes a method for recognizing smoke from mine fires that integrates entropy-enhanced image processing with an improved YOLOv8. Firstly, exploiting the spatio-temporal entropy changes produced by smoke diffusion, an equidistant frame image differential fusion method based on spatio-temporal entropy separation is proposed, which effectively suppresses low-entropy background noise, enhances detail clarity in high-entropy smoke regions, and significantly improves the image signal-to-noise ratio. Further, to cope with the variable scale and complex texture (high information entropy) of smoke targets, an entropy-constrained feature-focusing mechanism is introduced on top of the YOLOv8m model, allowing the rich detailed features and uncertain information of smoke regions to be captured and distinguished more effectively and enabling balanced, accurate detection of both large and small smoke targets. Experiments show that the overall performance of the proposed method is significantly better than the baseline model and comparable algorithms, and it meets real-time detection requirements. Compared with YOLOv9m, YOLOv10n, and YOLOv11n, inference speed decreases, but precision, recall, mAP(50), and mAP(50–95) are all substantially improved. The precision and robustness of smoke recognition in complex mine scenarios are thus effectively improved. Full article
(This article belongs to the Section Multidisciplinary Applications)
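The equidistant frame image differential fusion step can be illustrated with simple frame differencing at a fixed frame offset followed by averaging; the sketch below shows only this basic idea, with the step size and fusion rule as assumptions rather than the paper's exact procedure.

```python
import numpy as np

def equidistant_frame_difference(frames: np.ndarray, step: int = 5) -> np.ndarray:
    """Illustrative sketch of equidistant-frame differential fusion for smoke
    enhancement: difference grayscale frames that are `step` frames apart and
    average the absolute differences, so slowly diffusing smoke (changing,
    high-entropy regions) is emphasised and the static background is suppressed.

    frames: (n_frames, H, W) grayscale stack as float arrays."""
    diffs = np.abs(frames[step:] - frames[:-step])
    fused = diffs.mean(axis=0)
    return fused / (fused.max() + 1e-8)   # normalise to [0, 1]

video = np.random.rand(30, 240, 320).astype(np.float32)
print(equidistant_frame_difference(video, step=5).shape)  # (240, 320)
```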

18 pages, 2878 KiB  
Article
Flow Field Reconstruction and Prediction of Powder Fuel Transport Based on Scattering Images and Deep Learning
by Hongyuan Du, Zhen Cao, Yingjie Song, Jiangbo Peng, Chaobo Yang and Xin Yu
Sensors 2025, 25(15), 4613; https://doi.org/10.3390/s25154613 - 25 Jul 2025
Viewed by 130
Abstract
This paper presents the flow field reconstruction and prediction of powder fuel transport systems based on representative feature extraction from scattering images using deep learning techniques. A laboratory-built powder fuel supply system was used to conduct scattering spectroscopy experiments on boron-based fuel under various flow rate conditions. Based on the acquired scattering images, a prediction and reconstruction method was developed using a deep network framework composed of a Stacked Autoencoder (SAE), a Backpropagation Neural Network (BP), and a Long Short-Term Memory (LSTM) model. The proposed framework enables accurate classification and prediction of the dynamic evolution of flow structures based on learned representations from scattering images. Experimental results show that the feature vectors extracted by the SAE form clearly separable clusters in the latent space, leading to high classification accuracy under varying flow conditions. In the prediction task, the feature vectors predicted by the LSTM exhibit strong agreement with ground truth, with average mean square error, mean absolute error, and r-square values of 0.0027, 0.0398, and 0.9897, respectively. Furthermore, the reconstructed images offer a visual representation of the changing flow field, validating the model’s effectiveness in structure-level recovery. These results suggest that the proposed method provides reliable support for future real-time prediction of powder fuel mass flow rates based on optical sensing and imaging techniques. Full article
(This article belongs to the Special Issue Important Achievements in Optical Measurements in China 2024–2025)
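The reported agreement between predicted and ground-truth feature vectors (mean square error, mean absolute error, r-square) uses standard regression metrics; a minimal sketch of their computation over latent vectors follows, with placeholder data.

```python
import numpy as np

def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray):
    """MSE, MAE, and R-squared between ground-truth and predicted feature
    vectors (flattened). The arrays below are placeholders, not the paper's data."""
    err = y_pred - y_true
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return mse, mae, r2

true_feats = np.random.rand(200, 32)                          # latent vectors
pred_feats = true_feats + np.random.normal(0, 0.05, true_feats.shape)
mse, mae, r2 = regression_metrics(true_feats.ravel(), pred_feats.ravel())
print(f"MSE={mse:.4f}, MAE={mae:.4f}, R2={r2:.4f}")
```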
