Search Results (2,119)

Search Parameters:
Keywords = scenario recognition

18 pages, 1019 KB  
Article
Pose-Driven Cow Behavior Recognition in Complex Barn Environments: A Method Combining Knowledge Distillation and Deployment Optimization
by Jie Hu, Xuan Li, Ruyue Ren, Shujie Wang, Mingkai Yang, Jianing Zhao, Juan Liu and Fuzhong Li
Animals 2026, 16(9), 1301; https://doi.org/10.3390/ani16091301 - 23 Apr 2026
Abstract
Cattle behavior constitutes important phenotypic information reflecting animals’ health status, activity level, and welfare condition, and is therefore of considerable significance for automated monitoring and precision management in smart livestock farming. However, under complex barn conditions, cattle behavior recognition is easily affected by factors such as illumination variation, partial occlusion, background interference, and individual differences, thereby reducing recognition stability and generalization capability. To address these challenges, this study proposes a pose-driven method for cattle behavior recognition in complex barn environments. First, a 16-keypoint annotation scheme suitable for describing bovine posture, termed cow16, was constructed. Based on this scheme, OpenPose was employed to extract heatmaps (HMs) and part affinity fields (PAFs), which were then used to build an intermediate HM/PAF posture representation. Subsequently, this representation was taken as the input to a lightweight convolutional neural network for classifying three behavioral categories: standing, walking, and lying. On this basis, class-imbalance correction during training and a multi-random-seed logits ensemble strategy during inference were further introduced. In addition, knowledge distillation was adopted to transfer knowledge from a high-performance teacher model to a lightweight student model. Experimental results demonstrate that training-stage class-imbalance correction and inference-stage multi-random-seed logits ensembling exhibit strong complementarity; when combined, the AB configuration improves the test-set Macro-F1 by 3.83 percentage points. Moreover, the distilled student model still achieves competitive recognition performance while maintaining 1× inference cost, indicating a favorable trade-off between accuracy and efficiency. This study provides a useful reference for deployment-oriented cattle behavior recognition in smart farming scenarios and offers a lightweight technical basis for subsequent practical applications.
(This article belongs to the Section Cattle)
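The two inference-time techniques this abstract names, multi-random-seed logits ensembling and teacher–student knowledge distillation, are standard and can be sketched generically. The sketch below is not the authors' implementation; the class order, logit values, and temperature are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ensemble_logits(logit_list):
    """Multi-seed logits ensembling: average per-seed logits before the softmax."""
    return np.mean(np.stack(logit_list, axis=0), axis=0)

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as is conventional in knowledge distillation."""
    t = softmax(np.asarray(teacher_logits) / temperature)
    s = softmax(np.asarray(student_logits) / temperature)
    return float(np.sum(t * (np.log(t + 1e-12) - np.log(s + 1e-12))) * temperature**2)

# Hypothetical logits from three training seeds for one frame,
# classes ordered as [standing, walking, lying]
seed_logits = [np.array([2.0, 0.5, -1.0]),
               np.array([1.5, 1.0, -0.5]),
               np.array([2.5, 0.0, -1.5])]
fused = ensemble_logits(seed_logits)   # [2.0, 0.5, -1.0]
pred = int(np.argmax(fused))           # → 0 (standing)
```

Averaging logits rather than post-softmax probabilities preserves relative confidence between seeds; the distillation loss is zero whenever student and teacher logits coincide.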
25 pages, 2360 KB  
Article
ACF-YOLO: Feature Enhancement and Multi-Scale Alignment for Sustainable Crop Small Object Detection
by Chuanxiang Li, Yihang Li, Wenzhong Yang and Danny Chen
Sustainability 2026, 18(9), 4168; https://doi.org/10.3390/su18094168 - 22 Apr 2026
Abstract
Sustainable precision agriculture is crucial for optimizing resource utilization, reducing chemical inputs, and ensuring global food security. High-precision automatic recognition and monitoring of key crop organs (e.g., wheat heads and flower clusters) serve as the technological foundation for sustainable agricultural management decisions. However, visual perception in natural field environments is highly susceptible to external conditions. To address the challenges of severe background interference and feature dilution in crop small object detection within complex agricultural scenarios, this paper proposes an enhanced detection network, ACF-YOLO, based on YOLO11. First, an Aggregated Multi-scale Local-Global Attention (AMLGA) module is designed to enhance the feature representation of weak targets by fusing local details with global semantics. Second, a Context-Guided Fusion Module (CGFM) and a Soft-Neighbor Interpolation (SNI) strategy are introduced. Their synergy alleviates feature aliasing effects and ensures the precise alignment of deep semantic information with shallow spatial details. Furthermore, the Inner-MPDIoU loss function is employed to optimize the bounding box regression accuracy for non-rigid targets by incorporating geometric constraints and auxiliary scale factors. To verify the detection capability of the proposed method, we constructed a UAV Wheat Head Dataset (UWHD) and conducted extensive experiments on the UWHD, GWHD2021, and RFRB datasets. The experimental results demonstrate that ACF-YOLO outperforms other comparative methods, confirming its stable detection performance and contributing to the sustainable development of agriculture.
(This article belongs to the Section Sustainable Agriculture)
15 pages, 5996 KB  
Article
Real-Time Analysis and Intervention of Classroom Behavior Using Multi-Modal Fusion and Spatiotemporal Context
by Kai Zhao and Guiling Sun
Appl. Sci. 2026, 16(9), 4069; https://doi.org/10.3390/app16094069 - 22 Apr 2026
Abstract
Analyzing classroom engagement is essential for developing effective smart learning environments. Conventional methods often face challenges in achieving reliable identification of individual students, accurately recognizing their behavioral states, and providing timely support. This paper presents a multimodal sensing and supportive feedback system built upon an end–edge–cloud collaborative architecture. By integrating RFID-based seat association, fingerprint verification, and computer vision-based activity analysis, the system establishes a reliable link between student identity and observed activities. Key computational tasks, including activity recognition, spatiotemporal context matching, and rule-based assessment, are executed locally on edge nodes. This enables low-latency, privacy-conscious feedback delivered via Bluetooth, effectively avoiding delays associated with cloud processing. Experimental results indicate that the proposed system significantly enhances both activity recognition accuracy and identity–behavior association reliability in typical classroom scenarios while substantially reducing the average feedback latency compared to traditional approaches.
(This article belongs to the Section Electrical, Electronics and Communications Engineering)
18 pages, 3878 KB  
Article
Research on Vision-Based Autonomous Landing Fusion Positioning Algorithm for Unmanned Aerial Vehicle
by Hongyuan Zhu, Jing Ni, Nan Yang, Boyang Gao and Xiaoxiong Liu
Machines 2026, 14(5), 460; https://doi.org/10.3390/machines14050460 - 22 Apr 2026
Abstract
A multi-task network for runway lines and runway markings based on deep learning was designed to address the issue of prior information dependence on runway width in unmanned aerial vehicle visual autonomous landing application scenarios. By detecting runway images captured at different positions during flight, the parameters of the runway start line, left and right boundary lines, and runway markings were obtained. On this basis, a runway width estimation model and visual positioning algorithm based on line features were designed. In standard runway scenarios, the recognition of runway markings provides valuable prior information about the runway width. For simplified runways or cases where markings are missing, we have devised a width estimation model based on the left/right boundary lines. Furthermore, considering the variation in pitch angle during the UAV’s landing process, we have analyzed and refined the width estimation model to ensure its applicability throughout the entire landing process. Additionally, we have developed a visual positioning algorithm that utilizes the runway width and runway line parameters to calculate the relative position between the UAV and the runway. Considering the limitations of a single visual positioning algorithm, we adopt a visual and inertial navigation fusion positioning algorithm to enhance the reliability of landing positioning. To validate our algorithms, we constructed a visual simulation platform and conducted flight tests. These tests confirm the effectiveness and accuracy of our detection algorithm and width estimation model. Furthermore, by utilizing the estimated runway width and the detected runway line parameters, we have successfully calculated the relative position, further validating the effectiveness of our positioning algorithm.
(This article belongs to the Special Issue Advanced Flight Control and Intelligent Trajectory Planning in UAVs)
28 pages, 7294 KB  
Article
Nighttime Encounter Situation Recognition for Unmanned Surface Vessels Based on Images of Vessel Navigation Lights
by Ruoyun Huang, Xiang Zheng, Jianhua Wang, Gongxing Wu, Yu Tian and Yining Tian
J. Mar. Sci. Eng. 2026, 14(8), 761; https://doi.org/10.3390/jmse14080761 - 21 Apr 2026
Abstract
To address the limitations of existing perception methods for nighttime encounter situation recognition of unmanned surface vessels (USVs), this study proposes an image-based method for navigation-light recognition and encounter situation recognition. In accordance with the International Regulations for Preventing Collisions at Sea (COLREGs), a parameterized 3D geometric model of vessel navigation lights and encounter scenario models is established. Based on the camera imaging principle, a dataset of navigation-light images under various encounter situations is generated through simulation experiments. By analyzing the variation patterns of navigation-light images in different encounter situations, a feature vector composed of area-domain and azimuth-domain features is constructed, and an encounter situation recognition method is developed accordingly. To mitigate the effects of water reflections and interfering light sources in real images, a navigation-light image-processing method is designed for the stable extraction of feature parameters. Simulation results show that the classification accuracy ranges from 96.6% to 98.3% under different distance conditions. In field experiments conducted with a small USV under a three-light configuration, the proposed method achieves a navigation-light recognition accuracy of 96.2% and an encounter situation recognition accuracy of 94.94%. The proposed method provides an interpretable and lightweight visual solution that complements existing nighttime perception technologies for encounter situation recognition.
(This article belongs to the Section Ocean Engineering)
27 pages, 3977 KB  
Review
Recovering Speech from Vibrations: Principles and Algorithms in Radar and Laser Sensing
by Emily Bederov, Baruch Berdugo and Israel Cohen
Sensors 2026, 26(8), 2553; https://doi.org/10.3390/s26082553 - 21 Apr 2026
Abstract
Sensing audio using non-acoustic modalities such as millimeter-wave radar and laser-based systems has emerged as an active research area with significant implications for privacy, security, and robust speech processing. These approaches recover speech-related information from vibration measurements captured by non-acoustic sensing modalities. Prior work spans a wide range of techniques, from classical signal-processing pipelines to modern machine-learning and deep-learning models, enabling applications such as speech reconstruction, eavesdropping, automatic speech recognition, and noise-robust enhancement. Some systems rely on radar or laser sensing as a standalone audio surrogate, while others fuse radar-derived features with microphone signals to improve robustness in noisy or non-line-of-sight environments. Experimental results across the literature demonstrate that recovering intelligible speech or discriminative speech features from radar- or laser-sensed vibrations is feasible under controlled conditions. However, performance remains sensitive to practical factors including sensing distance, object materials and geometries, environmental interference, multipath effects, and task complexity. Not all speech-related tasks are reliably solved, particularly in unconstrained real-world scenarios. Overall, the field is rapidly evolving, with open challenges in robustness, generalization, and deployment, offering several promising directions for future research.
13 pages, 758 KB  
Review
Incidental Gastric Neuroendocrine Tumor on Histology: What Should the Gastroenterologist Do Next?
by Elisabetta Dell’Unto, Maria Rinzivillo, Gianluca Esposito and Francesco Panzuto
Gastroenterol. Insights 2026, 17(2), 28; https://doi.org/10.3390/gastroent17020028 - 18 Apr 2026
Abstract
Gastric neuroendocrine tumors (NETs) are increasingly diagnosed as incidental findings during upper gastrointestinal endoscopy. For the gastroenterologist, the crucial challenge is not only at the time of endoscopic recognition but also when the pathology report states “well-differentiated gastric NET”. At that moment, the key clinical question is how to manage it correctly. Gastric NETs are biologically heterogeneous, and their management depends primarily on the pathophysiological setting in which they arise. Type 1 tumors develop in chronic atrophic gastritis and are usually indolent; type 2 tumors arise in the context of gastrinoma and MEN1; type 3 tumors are sporadic and carry a substantially higher metastatic risk. Consequently, the same histological label may correspond to profoundly different clinical scenarios. This review addresses what the gastroenterologist should do after receiving an incidental histological diagnosis of gastric NET, how to reconstruct the gastric background, when to suspect a sporadic type 3 lesion, how to select patients for endoscopic treatment versus staging or surgery, and how to interpret incomplete endoscopic resection. Particular attention is devoted to the emerging concept of proton pump inhibitor-associated gastric NETs, which may represent a distinct gastrin-driven subgroup with lower malignant potential than truly sporadic type 3 tumors. A practical algorithm and a clinicopathological comparison of the classic three gastric NET types are provided to support decision-making in daily practice.
(This article belongs to the Section Alimentary Tract)
10 pages, 1056 KB  
Proceeding Paper
Low-Resolution Script Recognition for Chinese Characters with Similar Radicals Based on Local Relations and Global Statistical Features
by Yu-Jie Yao, Pin-Wen Huang and Jian-Jiun Ding
Eng. Proc. 2026, 134(1), 61; https://doi.org/10.3390/engproc2026134061 - 17 Apr 2026
Abstract
Chinese handwritten character recognition is challenging due to structural similarities among visually similar radicals and limited available training data, especially in the low-resolution case. In this study, a multi-dimensional feature fusion method combining histograms of oriented gradients, Hu moments, Zernike moments, local binary patterns, gray-level co-occurrence matrices, and stroke-based descriptors is proposed. Region-specific segmentation strategies enable fine-grained feature extraction, and recursive feature elimination with cross-validation effectively reduces feature redundancy. Experiments demonstrate that the proposed algorithm achieves superior recognition performance compared to existing methods, including deep learning-based methods, especially under data-constrained or low-resolution scenarios, highlighting the effectiveness and practicality of the proposed approach.
29 pages, 23295 KB  
Article
Improving the Robustness of Odour Recognition with Odour-Image Data Fusion in Open-Air Settings
by Fanny Monori and Alin Tisan
Sensors 2026, 26(8), 2493; https://doi.org/10.3390/s26082493 - 17 Apr 2026
Abstract
Odour recognition with low-cost gas sensors is challenging in open-air settings due to the non-specificity of the sensors and environmental variability. This can be mitigated by incorporating additional information into the classification process. This paper investigates odour-image multimodality in two case studies of increasing complexity: banana ripening in an open-air environment and strawberry ripening in a glasshouse environment. Data were collected using custom acquisition platforms equipped with cameras and MOX gas sensors operated with temperature modulation. For the visual modality, image classification (MobileNetV3) and object detection (YoloV5) models are trained. For the odour modality, established classical machine learning methods (Random Forest, XGBoost, SVM and Logistic Regression) and neural networks (1D-CNN, LSTM, MLP, and ELM) are employed. Each modality is analysed independently and together to critically assess scenarios in which combining modalities provides a clear advantage over using either modality alone. Results show that models trained on odour data achieve high accuracy in controlled environments but underperform in more dynamic open-air settings. Image-based models are sensitive to the image quality in all environments; however, they are more robust when deployed in different environments. Lastly, it is demonstrated that decision fusion consistently increases accuracy, by as much as +12.36% in the banana ripening scenario and +3.63% in the strawberry ripening scenario. Where decision fusion does not improve classification accuracy significantly, it is shown that the multimodal approach can still be leveraged to identify high-confidence predictions by selecting samples where both modalities agree on the label.
(This article belongs to the Special Issue Recent Advances in Gas Sensors)
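The decision-level fusion and agreement filtering described in this abstract can be sketched minimally: a weighted average of the two modalities' class probabilities, plus a check that both modalities predict the same label to flag high-confidence samples. This is an illustration under assumed class names, weights, and probabilities, not the paper's implementation.

```python
import numpy as np

def fuse_decisions(p_odour, p_image, w_odour=0.5):
    """Decision-level (late) fusion: weighted average of per-class probabilities."""
    return w_odour * np.asarray(p_odour) + (1.0 - w_odour) * np.asarray(p_image)

def modalities_agree(p_odour, p_image):
    """True when both modalities predict the same label (high-confidence sample)."""
    return int(np.argmax(p_odour)) == int(np.argmax(p_image))

# Hypothetical ripeness probabilities, classes = [unripe, ripe, overripe]
p_odour = np.array([0.55, 0.30, 0.15])
p_image = np.array([0.40, 0.45, 0.15])
fused = fuse_decisions(p_odour, p_image)   # [0.475, 0.375, 0.15]
agree = modalities_agree(p_odour, p_image) # False: modalities predict 0 vs 1
```

When the modalities disagree, as here, the sample would be excluded from the high-confidence subset even though the fused prediction still picks a single label.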
18 pages, 1867 KB  
Article
An Edge-Aware Change Detection Network Toward Urban Construction Land Change Identification
by Wuyi Cai, Gongming Li, Yanlong Zhang and Yonghong Mo
Buildings 2026, 16(8), 1573; https://doi.org/10.3390/buildings16081573 - 16 Apr 2026
Abstract
As urbanization transitions from incremental expansion to the optimized utilization of existing construction land, the precise identification of land-use status and changes has become a core requirement for enhancing refined land resource management. However, in urban built environments characterized by dense object distributions and complex geometric contours, existing change detection methods often struggle to capture subtle boundaries, leading to edge blurring and loss of detail. To address these challenges, this study proposes an Edge-aware Change Detection Network for urban construction land change identification. The model features a shared Siamese encoding network based on MiT-B1, leveraging its hierarchical multi-scale attention mechanism to balance local detail extraction with long-range semantic dependency capture, thereby overcoming the limitations of monolithic feature extraction. Furthermore, a multi-level feature concatenation and fusion strategy is designed to align bi-temporal features and enable their interaction along the channel dimension, significantly enhancing the saliency and discriminative representation of change areas. Experimental results on the Yongzhou building change detection dataset demonstrate that the proposed model outperforms state-of-the-art methods in both visual recognition and quantitative metrics. It effectively resolves the difficulty of boundary definition in complex urban scenarios, providing localized high-precision technical support for the assessment and dynamic monitoring of construction land within the study area.
(This article belongs to the Section Building Energy, Physics, Environment, and Systems)
36 pages, 2125 KB  
Article
Hybrid Neural Network-Based PDR with Multi-Layer Heading Correction Across Smartphone Carrying Modes
by Junhua Ye, Anzhe Ye, Ahmed Mansour, Shusu Qiu, Zhenzhen Li and Xuanyu Qu
Sensors 2026, 26(8), 2421; https://doi.org/10.3390/s26082421 - 15 Apr 2026
Abstract
Traditional pedestrian dead reckoning (PDR) algorithms usually assume that the carrying mode of a smartphone is fixed and remains horizontal, ignoring the significant impact of dynamic changes in the carrying mode on heading estimation, the core element of PDR. In practical scenarios, pedestrians often change how they carry a smart terminal (e.g., to make a call), and each carrying mode calls for a different heading estimation method; a mode switch in particular causes a sudden change in heading that, if not corrected in time, leads to a significant increase in localization error. Existing carrying-mode recognition methods that rely on traditional machine learning or fixed thresholds have poor robustness and limited generality, are especially weak at detecting abrupt changes, and cannot effectively reduce the heading error. To address these practical problems, this paper proposes a PDR framework that overcomes these limitations. First, four common carrying modes are classified based on practical applications, and a CNN-LSTM hybrid model is designed that classifies the four modes in near real time with a recognition accuracy of 99.68%. Second, based on the mode recognition results, a multi-layer heading correction strategy is introduced: (1) a quaternion-based orientation filter (VQF) is introduced for accurate estimation of the initial heading; (2) an algorithm is designed to accurately detect the mode switching point, together with an adaptive offset correction algorithm that dynamically compensates the heading during mode switching to reduce the impact of sudden changes; and (3) exploiting the observation that lateral displacement tends toward zero when a pedestrian walks along a straight-line segment, a heading optimization method with lateral displacement constraints is designed to further suppress heading drift caused by slight swaying of the smart terminal. Two validation experiments were carried out in two different environments, an indoor corridor and a tree-sheltered outdoor area. The results show that with the proposed multi-layer heading optimization strategy, the average heading error of the system is below 1.5°, the cumulative positioning error is below 1% of the walking distance, and the root-mean-square error at the checkpoints is below 2 m, significantly reducing the positioning error and demonstrating the effectiveness of the framework in complex environments.
(This article belongs to the Section Navigation and Positioning)
20 pages, 10822 KB  
Article
T-CASP: Time-Aware Continual Aspect Semantic-Driven Incremental Pretraining
by Shuai Feng, Pan Su, Zaishan Qi and Liran Yang
Appl. Sci. 2026, 16(8), 3837; https://doi.org/10.3390/app16083837 - 15 Apr 2026
Abstract
With the rapid advancement of medical foundation models, their deployment in clinical practice is increasingly required. However, privacy constraints of hospital-specific data make large-scale retraining infeasible, limiting model adaptability. To address this issue, we propose a Continual Aspect Semantic-driven Incremental Pretraining (CASP) framework, which enables efficient adaptation of foundation models to private data so that the pretrained models can be effectively applied to downstream tasks. In this paper, we focus on fundus fluorescein angiography (FFA) in ophthalmology as a representative application scenario to validate the proposed approach. FFA is a critical imaging modality for retinal disease diagnosis, as it captures dynamic vascular changes across multiple angiographic phases. However, most existing learning-based methods treat FFA images statically and independently, failing to exploit the rich temporal and phase-specific semantics that are essential for accurate diagnosis. In this paper, a Time-aware Continual Aspect Semantic-driven incremental Pretraining (T-CASP) framework is proposed for FFA sequences. To compensate for limited temporal descriptions in clinical reports, large language models are first used to construct a temporal disease knowledge dictionary with phase-specific semantic descriptions. Based on this dictionary, a disease correlation matrix is injected into contrastive learning to guide fine-grained image–text alignment. A multi-layer gated residual adapter is further designed to capture phase-level semantics and enable knowledge transfer across learning stages through phase-wise continual pretraining. Extensive experiments demonstrate that T-CASP effectively models dynamic temporal semantics in FFA sequences, yielding consistent performance improvements over time-unaware and static baselines in retinal disease recognition. By explicitly integrating phase-wise temporal knowledge and continual semantic refinement, T-CASP provides a clinically consistent and effective solution for temporal FFA analysis, enhancing robustness and diagnostic accuracy in ophthalmic multimodal learning.
(This article belongs to the Section Computing and Artificial Intelligence)
18 pages, 5307 KB  
Article
MSA-DETR: A Multi-Scale Attention Augmented Model for Small Object Detection in UAV Imagery
by Zhihao Li and Liang Qi
Remote Sens. 2026, 18(8), 1179; https://doi.org/10.3390/rs18081179 - 15 Apr 2026
Abstract
Small object detection in UAV imagery presents challenges due to factors such as minute scale, indistinct features, and severe background clutter, which constrain the recognition performance of end-to-end models like RT-DETR. To enhance detection accuracy for small-scale objects, this paper proposes MSA-DETR, a Multi-scale Spatial Attention-enhanced detection model based on RT-DETR (Res18). Three specific structural improvements are introduced. First, a PercepConv module is designed to capture comprehensive multi-scale information through 1 × 1, 3 × 3, and 5 × 5 convolutions, as well as dilated convolutions. This module integrates a lightweight channel attention mechanism to adaptively emphasize regions containing small objects. Second, the SODAttention module is introduced to jointly model local spatial details and global contextual information, thereby enhancing the discriminative capability in key regions and significantly suppressing interference from complex backgrounds. Finally, a dedicated small object detection layer is added to the detection head, incorporating shallow fine-grained features to compensate for the semantic limitations of deep layers concerning small targets. Experimental results demonstrate that the proposed MSA-DETR achieves significant performance gains on the VisDrone2019 dataset, increasing mAP@50 from 47.5% to 52.2% and mAP@50–95 from 29.3% to 33.2%. Moreover, the proposed model outperforms the baseline by an absolute margin of 1.9% on the small-object-specific metric APs, achieving 20.3%. These results validate the effectiveness of the proposed method for small object detection in UAV scenarios.
14 pages, 579 KB  
Article
Wearable Sensor-Free Adult Physical Activity Monitoring Using Smartphone IMU Signals: Cross-Subject Deep Learning with Window-Length and Sensor Modality Studies
by Mussa Turdalyuly, Ay Zholdassova, Tolganay Turdalykyzy and Aydin Doshybekov
Information 2026, 17(4), 368; https://doi.org/10.3390/info17040368 - 14 Apr 2026
Abstract
Human activity recognition (HAR) using inertial sensors is essential for health monitoring and wellness applications, yet robust classification in real-world adult scenarios remains challenging due to subject variability and activity transitions in smartphone sensing environments. This study investigated smartphone-based physical activity recognition using accelerometer and gyroscope signals under a cross-subject evaluation protocol. To reduce label ambiguity and improve generalization, the original activity set was grouped into a reduced 6-class taxonomy. We evaluated lightweight deep learning models, including a smartphone-only convolutional neural network (CNN) and a multimodal fusion model combining smartphone and smartwatch signals. Under GroupKFold cross-subject validation, the smartphone-only CNN achieved competitive performance (Macro-F1 ≈ 0.46), while multimodal fusion did not provide consistent improvements. We also examined temporal segmentation and found that shorter windows (2.0 s) yield better results than longer ones. Sensor ablation confirmed the importance of gyroscope information, and per-class analysis indicated that dynamic activities could be recognized reliably, whereas stair and static categories remained difficult. Overall, the results demonstrate the practicality of adult activity monitoring using only built-in smartphone sensors, without external wearable devices, and provide recommendations for window length and sensor selection in cross-subject HAR.
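The cross-subject GroupKFold protocol above keeps every window from a given subject entirely in either the training or the test side of each fold, which is what prevents subject identity from leaking into the evaluation. A minimal sketch of that grouping logic (scikit-learn's GroupKFold provides this in practice; the helper name and subject layout below are illustrative assumptions):

```python
import numpy as np

def group_kfold_splits(subject_ids, n_splits=5):
    """Cross-subject K-fold: each fold's test set holds whole subjects,
    so no subject appears in both train and test of the same fold."""
    subjects = np.unique(subject_ids)
    folds = np.array_split(subjects, n_splits)
    for held_out in folds:
        test_mask = np.isin(subject_ids, held_out)
        yield np.where(~test_mask)[0], np.where(test_mask)[0]

# Toy setup: 60 signal windows from 6 subjects (10 windows each).
subject_ids = np.repeat(np.arange(6), 10)
splits = list(group_kfold_splits(subject_ids, n_splits=3))
```

Averaging per-fold metrics such as Macro-F1 over these splits estimates how the model generalizes to people it has never seen, which is the setting the abstract reports.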
16 pages, 1470 KB  
Article
Physics-Guided Deep Learning for Interpretable Biomedical Image Reconstruction and Pattern Recognition in Diagnostic Frameworks
by Akeel Qadir, Saad Arif, Prajoona Valsalan and Osama Khan
Bioengineering 2026, 13(4), 457; https://doi.org/10.3390/bioengineering13040457 - 13 Apr 2026
Abstract
This study introduces a physics-guided deep learning architecture designed for the simulation, reconstruction, and pattern recognition of biomedical images. By explicitly integrating physical priors into the learning model, the framework addresses the black-box nature of traditional artificial intelligence (AI) and provides an explainable AI pathway that enhances diagnostic accuracy, robustness, and clinical interpretation. The framework was evaluated through systematic simulation studies involving complex geometric configurations, multimodal physical fields, and noise-corrupted synthetic three-dimensional brain volumes. Quantitative analysis demonstrates consistent improvements in reconstruction fidelity, with the peak signal-to-noise ratio (PSNR) reaching 47 dB and the structural similarity index exceeding 0.90 across all scenarios. Notably, at a moderate noise level (0.05), the framework maintains a PSNR greater than 32 dB, ensuring the structural integrity essential for computer-aided diagnosis. Volumetric brain experiments further reveal a 38–44% reduction in activation localization errors, highlighting the framework's utility in functional imaging and disease prognosis. By grounding deep learning in physical constraints, this study provides a transparent and robust solution for automated disease classification and advanced biomedical imaging tasks within clinical decision support systems.
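The PSNR figures quoted above follow the standard definition PSNR = 10 · log10(MAX² / MSE), where MAX is the signal's dynamic range. A short sketch of how such a value is computed (the synthetic image and noise level below are illustrative, not the study's data):

```python
import numpy as np

def psnr(reference, reconstruction, data_range=1.0):
    """Peak signal-to-noise ratio in decibels: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((reference - reconstruction) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

rng = np.random.default_rng(0)
ref = rng.random((64, 64))                # toy reference image in [0, 1]
# A reconstruction corrupted by small additive Gaussian error.
recon = np.clip(ref + rng.normal(0.0, 0.01, ref.shape), 0.0, 1.0)
value = psnr(ref, recon)
```

Every 10 dB corresponds to a tenfold drop in mean squared error, so a reconstruction held above 32 dB, as the abstract reports under moderate noise, retains far more structural detail than one near 20 dB.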