Search Results (752)

Search Parameters:
Keywords = Focal Loss

19 pages, 17158 KiB  
Article
Deep Learning Strategy for UAV-Based Multi-Class Damage Detection on Railway Bridges Using U-Net with Different Loss Functions
by Yong-Hyoun Na and Doo-Kie Kim
Appl. Sci. 2025, 15(15), 8719; https://doi.org/10.3390/app15158719 - 7 Aug 2025
Abstract
Periodic visual inspections are currently conducted to maintain the condition of railway bridges. These inspections rely on direct visual assessments by human inspectors, often requiring specialized equipment such as aerial ladders. However, this method is not only time-consuming and costly but also poses significant safety risks. There is therefore a growing need for a more efficient and reliable alternative to traditional visual inspections of railway bridges. In this study, we evaluated and compared the performance of damage detection using U-Net-based deep learning models on images captured by unmanned aerial vehicles (UAVs). The target damage types include cracks, concrete spalling and delamination, water leakage, exposed reinforcement, and paint peeling. To enable multi-class segmentation, the U-Net model was trained with three different loss functions: Cross-Entropy Loss, Focal Loss, and Intersection over Union (IoU) Loss. We compared these losses to determine their ability to distinguish actual structural damage from environmental factors and surface contamination, particularly under real-world site conditions. The results showed that the U-Net model trained with IoU Loss outperformed the others in detection accuracy. When applied to field inspection scenarios, this approach demonstrates strong potential for objective and precise damage detection. Furthermore, the use of UAVs in the inspection process is expected to significantly reduce both the time and cost of railway infrastructure maintenance. Future research will focus on extending the detection capabilities to additional damage types such as efflorescence and corrosion, with the ultimate aim of replacing manual visual inspections of railway bridge surfaces with deep-learning-based methods.
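
As a point of reference for the comparison above, the following is a minimal PyTorch sketch (not the authors' code) of the three segmentation losses being compared, assuming logits of shape (N, C, H, W) and integer class targets of shape (N, H, W):

```python
# Minimal sketches of the three multi-class segmentation losses compared above.
import torch
import torch.nn.functional as F

def cross_entropy_loss(logits, targets):
    return F.cross_entropy(logits, targets)

def focal_loss(logits, targets, gamma=2.0):
    # Focal Loss: down-weight well-classified pixels by (1 - p_t)^gamma.
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log prob of true class
    pt = log_pt.exp()
    return (-(1.0 - pt) ** gamma * log_pt).mean()

def iou_loss(logits, targets, eps=1e-6):
    # Soft IoU (Jaccard) loss averaged over classes.
    num_classes = logits.shape[1]
    probs = logits.softmax(dim=1)
    onehot = F.one_hot(targets, num_classes).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(0, 2, 3))
    union = (probs + onehot - probs * onehot).sum(dim=(0, 2, 3))
    return (1.0 - (inter + eps) / (union + eps)).mean()
```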

19 pages, 3549 KiB  
Article
Method for Target Detection in a High Noise Environment Through Frequency Analysis Using an Event-Based Vision Sensor
by Will Johnston, Shannon Young, David Howe, Rachel Oliver, Zachry Theis, Brian McReynolds and Michael Dexter
Signals 2025, 6(3), 39; https://doi.org/10.3390/signals6030039 - 5 Aug 2025
Abstract
Event-based vision sensors (EVSs), often referred to as neuromorphic cameras, operate by responding to changes in brightness on a pixel-by-pixel basis. In contrast, traditional framing cameras employ a fixed sampling interval at which integrated intensity is read off the entire focal plane at once. Like traditional cameras, EVSs can suffer a loss of sensitivity in scenes with high intensity and dynamic clutter, reducing the ability to see points of interest with traditional event-processing methods. This paper describes a method to reduce the negative impacts of these types of EVS clutter and enable more robust target detection through the use of individual-pixel frequency analysis, background suppression, and statistical filtering. Additionally, issues found in standard frequency analysis, such as phase differences between sources, aliasing, and spectral leakage, are less relevant in this method: the statistical filtering simply determines which pixels have significant frequency content after background suppression, instead of focusing on the actual frequencies in the scene. Initial testing on simulated data demonstrates a proof of concept for this method, which reduces artificial scene noise and enables improved target detection.
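
To make the pipeline concrete, here is a rough NumPy sketch of the general idea, assuming events have already been binned into per-pixel time series; it illustrates per-pixel frequency analysis with background suppression and statistical filtering, not the authors' implementation, and the threshold is an assumption:

```python
# Keep pixels whose frequency content stands out after background suppression.
import numpy as np

def significant_pixels(event_counts, num_sigma=3.0):
    """event_counts: (T, H, W) array of per-pixel event counts per time bin."""
    spectra = np.abs(np.fft.rfft(event_counts, axis=0))        # (F, H, W) magnitudes
    spectra[0] = 0.0                                           # drop the DC term
    background = spectra.mean(axis=(1, 2), keepdims=True)      # scene-wide mean spectrum
    residual = (spectra - background).clip(min=0).sum(axis=0)  # per-pixel excess power
    threshold = residual.mean() + num_sigma * residual.std()
    return residual > threshold                                # (H, W) detection mask
```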

21 pages, 4331 KiB  
Article
Research on Lightweight Tracking of Small-Sized UAVs Based on the Improved YOLOv8N-Drone Architecture
by Yongjuan Zhao, Qiang Ma, Guannan Lei, Lijin Wang and Chaozhe Guo
Drones 2025, 9(8), 551; https://doi.org/10.3390/drones9080551 - 5 Aug 2025
Abstract
Traditional unmanned aerial vehicle (UAV) detection and tracking methods have long faced the twin challenges of high cost and poor efficiency. In real-world battlefield environments with complex backgrounds, occlusions, and varying speeds, existing techniques struggle to track small UAVs accurately and stably. To tackle these issues, this paper presents an enhanced YOLOv8N-Drone-based algorithm for improved tracking of small UAV targets. Firstly, a novel module named C2f-DSFEM (Depthwise-Separable and Sobel Feature Enhancement Module) is designed, integrating Sobel convolution with depthwise separable convolution across layers. Edge detail extraction and multi-scale feature representation are synchronized through a bidirectional feature enhancement mechanism, significantly enhancing the discriminability of target features in complex backgrounds. To address the feature confusion problem, an improved lightweight Context Anchored Attention (CAA) mechanism is integrated into the Neck network, which effectively improves the system's adaptability to complex scenes. By employing a position-aware weight allocation strategy, this approach enables adaptive suppression of background interference and precise focus on the target region, thereby improving localization accuracy. At the level of loss function optimization, the traditional classification loss is replaced with Focal Loss. This mechanism suppresses the contribution of easy-to-classify samples through a dynamic weight adjustment strategy while significantly increasing the priority of difficult samples during training, substantially mitigating the class imbalance between positive and negative samples. Experimental results show that the enhanced YOLOv8 improves mean average precision (mAP@0.5) by 12.3%, reaching 99.2%. In terms of tracking performance, the proposed YOLOv8N-Drone algorithm achieves a 19.2% improvement in Multiple Object Tracking Accuracy (MOTA) under complex multi-scenario conditions. Additionally, the IDF1 score increases by 6.8%, and the number of ID switches is reduced by 85.2%, indicating significant improvements in both the accuracy and stability of UAV tracking. Compared with other mainstream algorithms, the proposed method demonstrates significant advantages in tracking performance, offering a more effective and reliable solution for small-target tracking tasks in UAV applications.
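
For readers unfamiliar with the substitution described above, a binary focal loss of the kind that can replace a detector's BCE classification loss looks roughly as follows in PyTorch; the alpha and gamma defaults follow the original Focal Loss paper rather than this article:

```python
# Binary focal loss: BCE reweighted so easy samples contribute little.
import torch
import torch.nn.functional as F

def binary_focal_loss(pred_logits, targets, alpha=0.25, gamma=2.0):
    bce = F.binary_cross_entropy_with_logits(pred_logits, targets, reduction="none")
    p = torch.sigmoid(pred_logits)
    p_t = p * targets + (1 - p) * (1 - targets)          # prob assigned to true label
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()   # easy samples down-weighted
```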

17 pages, 6471 KiB  
Article
A Deep Learning Framework for Traffic Accident Detection Based on Improved YOLO11
by Weijun Li, Liyan Huang and Xiaofeng Lai
Vehicles 2025, 7(3), 81; https://doi.org/10.3390/vehicles7030081 - 4 Aug 2025
Abstract
The automatic detection of traffic accidents plays an increasingly vital role in advancing intelligent traffic monitoring systems and improving road safety. Leveraging computer vision techniques offers a promising solution, enabling rapid, reliable, and automated identification of accidents, thereby significantly reducing emergency response times. This study proposes an enhanced version of the YOLO11 architecture, termed YOLO11-AMF. The proposed model integrates a Mamba-Like Linear Attention (MLLA) mechanism, an Asymptotic Feature Pyramid Network (AFPN), and a novel Focaler-IoU loss function to optimize traffic accident detection performance under complex and diverse conditions. The MLLA module introduces efficient linear attention to improve contextual representation, while the AFPN adopts an asymptotic feature fusion strategy to enhance the expressiveness of the detection head. The Focaler-IoU further refines bounding box regression for improved localization accuracy. To evaluate the proposed model, a custom dataset of traffic accident images was constructed. Experimental results demonstrate that the enhanced model achieves precision, recall, mAP50, and mAP50–95 scores of 96.5%, 82.9%, 90.0%, and 66.0%, respectively, surpassing the baseline YOLO11n by 6.5%, 6.0%, 6.3%, and 6.3% on these metrics. These findings demonstrate the effectiveness of the proposed enhancements and suggest the model's potential for robust and accurate traffic accident detection under real-world conditions.
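
The Focaler-IoU component can be summarized compactly. Below is a hedged sketch following the Focaler-IoU paper's formulation, in which IoU is linearly remapped onto an interval [d, u] so that training focuses on a chosen difficulty band; the d and u values here are assumptions, not this article's settings:

```python
# Focaler-IoU: remap IoU onto [d, u], then use 1 - IoU_focaler as the loss.
import torch

def focaler_iou_loss(iou, d=0.0, u=0.95):
    iou_focaler = ((iou - d) / (u - d)).clamp(0.0, 1.0)  # 0 below d, 1 above u
    return (1.0 - iou_focaler).mean()
```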

9 pages, 477 KiB  
Opinion
Underlying Piezo2 Channelopathy-Induced Neural Switch of COVID-19 Infection
by Balázs Sonkodi
Cells 2025, 14(15), 1182; https://doi.org/10.3390/cells14151182 - 31 Jul 2025
Abstract
The focal “hot spot” neuropathologies in COVID-19 infection are revealing footprints of a hidden underlying collapse of a novel ultrafast ultradian Piezo2 signaling system within the nervous system. Paradoxically, the same initiating pathophysiology may underpin the systemic findings in COVID-19 infection, namely the multiorgan SARS-CoV-2 infection-induced vascular pathologies and brain–body-wide systemic pro-inflammatory signaling, depending on the concentration of and exposure to infecting SARS-CoV-2 viruses. This common initiating microdamage is suggested to be the primary damage, or the acquired channelopathy of the Piezo2 ion channel, constituting a principal gateway to pathophysiology. This Piezo2 channelopathy-induced neural switch could explain not only the initiation of disrupted cell–cell interactions, metabolic failure, microglial dysfunction, mitochondrial injury, glutamatergic synapse loss, inflammation, and neurological states with the central involvement of the hippocampus and the medulla, but also the initiating pathophysiology in the absence of SARS-CoV-2 intracellular entry into neurons. Therefore, the impairment of the proposed Piezo2-induced quantum mechanical free-energy-stimulated ultrafast proton-coupled tunneling seems to be the principal and critical underlying primary damage of COVID-19 infection along the brain axes, depending on the loci of SARS-CoV-2 viral infection and intracellular entry. Moreover, this initiating Piezo2 channelopathy may also explain the resultant autonomic dysregulation involving the medulla, hippocampus, and heart rate regulation, not to mention sleep disturbance with altered rapid eye movement sleep and cognitive deficit in the short term, and even as a consequence of long COVID. This opinion piece aims to promote future directions of research that further elucidate the incompletely understood initiating pathophysiology of SARS-CoV-2 infection.
(This article belongs to the Special Issue Insights into the Pathophysiology of NeuroCOVID: Current Topics)

19 pages, 3130 KiB  
Article
Deep Learning-Based Instance Segmentation of Galloping High-Speed Railway Overhead Contact System Conductors in Video Images
by Xiaotong Yao, Huayu Yuan, Shanpeng Zhao, Wei Tian, Dongzhao Han, Xiaoping Li, Feng Wang and Sihua Wang
Sensors 2025, 25(15), 4714; https://doi.org/10.3390/s25154714 - 30 Jul 2025
Abstract
The conductors of high-speed railway OCSs (Overhead Contact Systems) are susceptible to galloping due to the impact of natural elements such as strong winds, rain, and snow, resulting in conductor fatigue damage and significantly compromising train operational safety. Monitoring the galloping status of conductors is therefore crucial, and instance segmentation techniques, by delineating the pixel-level contours of each conductor, can significantly aid the identification and study of galloping phenomena. This work builds upon the YOLO11-seg model and introduces an instance segmentation approach for galloping video and image sensor data of OCS conductors. Designed for the stripe-like distribution of OCS conductors in the data, the algorithm employs four-direction Sobel filters to extract edge features in the horizontal, vertical, and diagonal orientations. These features are subsequently integrated with the original convolutional branch to form the FDSE (Four Direction Sobel Enhancement) module. The model also integrates the ECA (Efficient Channel Attention) mechanism for adaptive augmentation of conductor characteristics and utilizes the FL (Focal Loss) function to mitigate the class imbalance between positive and negative samples, enhancing the model's sensitivity to conductors. Segmentation outcomes from neighboring frames are then compared via mask-difference analysis to automatically detect conductor galloping locations, and the affected contours are highlighted for a clear depiction of galloping characteristics. Experimental results demonstrate that the enhanced YOLO11-seg model achieves 85.38% precision, 77.30% recall, 84.25% AP@0.5, an 81.14% F1-score, and a real-time processing speed of 44.78 FPS. Combined with the galloping visualization module, it can issue real-time alerts of conductor galloping anomalies, providing robust technical support for railway OCS safety monitoring.
(This article belongs to the Section Industrial Sensors)
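
The four-direction Sobel filtering at the heart of the FDSE module can be expressed as a fixed depthwise convolution filter bank. The sketch below is illustrative only; the module's exact wiring and its fusion with the original convolutional branch are not reproduced:

```python
# Four-direction Sobel edge extraction as fixed depthwise conv kernels.
import torch
import torch.nn.functional as F

SOBEL = torch.tensor([
    [[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],   # vertical edges (d/dx)
    [[-1., -2., -1.], [0., 0., 0.], [1., 2., 1.]],   # horizontal edges (d/dy)
    [[0., 1., 2.], [-1., 0., 1.], [-2., -1., 0.]],   # 45-degree diagonal
    [[-2., -1., 0.], [-1., 0., 1.], [0., 1., 2.]],   # 135-degree diagonal
])

def four_direction_sobel(x):
    """x: (N, C, H, W) feature map -> (N, 4*C, H, W) edge responses."""
    c = x.shape[1]
    weight = SOBEL.repeat(c, 1, 1).unsqueeze(1)      # (4*C, 1, 3, 3)
    return F.conv2d(x, weight, padding=1, groups=c)  # depthwise: 4 filters per channel
```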

18 pages, 1498 KiB  
Article
A Proactive Predictive Model for Machine Failure Forecasting
by Olusola O. Ajayi, Anish M. Kurien, Karim Djouani and Lamine Dieng
Machines 2025, 13(8), 663; https://doi.org/10.3390/machines13080663 - 29 Jul 2025
Abstract
Unexpected machine failures in industrial environments lead to high maintenance costs, unplanned downtime, and safety risks. This study proposes a proactive predictive model using a hybrid of eXtreme Gradient Boosting (XGBoost) and Neural Networks (NNs) to forecast machine failures. A synthetic dataset capturing recent breakdown history and time since last failure was used to simulate industrial scenarios. To address class imbalance, SMOTE and class weighting were applied, alongside a focal loss function to emphasize difficult-to-classify failures. The XGBoost model was tuned via GridSearchCV, while the NN model utilized ReLU-activated hidden layers with dropout. Evaluation using stratified 5-fold cross-validation showed that the NN achieved an F1-score of 0.7199 and a recall of 0.9545 for the minority class. XGBoost attained a higher PR AUC of 0.7126 and a more balanced precision–recall trade-off. Sample predictions demonstrated strong recall (100%) for failures but also a high false positive rate, with most prediction probabilities clustered between 0.50 and 0.55. Additional benchmarking against Logistic Regression, Random Forest, and SVM further confirmed the superiority of the proposed hybrid model. Model interpretability was enhanced using SHAP and LIME, confirming that recent breakdowns and time since last failure were key predictors. While the model effectively detects failures, further improvements in feature engineering and threshold tuning are recommended to reduce false alarms and boost decision confidence.
(This article belongs to the Section Machines Testing and Maintenance)
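
A minimal sketch of the imbalance-handling pipeline described above, assuming scikit-learn-style arrays; SMOTE comes from imbalanced-learn, and the hyperparameter grid is a placeholder rather than the study's actual search space:

```python
# SMOTE oversampling plus a GridSearchCV-tuned XGBoost classifier.
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

def fit_failure_model(X, y):
    X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
    # scale_pos_weight can compensate for any residual minority-class imbalance
    base = XGBClassifier(eval_metric="logloss", scale_pos_weight=1.0)
    grid = GridSearchCV(
        base,
        param_grid={"max_depth": [3, 5], "n_estimators": [200, 400]},
        scoring="average_precision",  # PR AUC, as reported in the abstract
        cv=5,
    )
    return grid.fit(X_res, y_res)
```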

25 pages, 4296 KiB  
Article
StripSurface-YOLO: An Enhanced Yolov8n-Based Framework for Detecting Surface Defects on Strip Steel in Industrial Environments
by Haomin Li, Huanzun Zhang and Wenke Zang
Electronics 2025, 14(15), 2994; https://doi.org/10.3390/electronics14152994 - 27 Jul 2025
Abstract
Recent advances in precision manufacturing and high-end equipment technologies have imposed ever more stringent requirements on the accuracy, real-time performance, and lightweight design of online steel strip surface defect detection systems. To reconcile the persistent trade-off between detection precision and inference efficiency in complex industrial environments, this study proposes StripSurface-YOLO, a novel real-time defect detection framework built upon YOLOv8n. The core architecture integrates an Efficient Cross-Stage Local Perception module (ResGSCSP), which synergistically combines GSConv lightweight convolutions with a one-shot aggregation strategy, markedly reducing both model parameters and computational complexity. To further enhance multi-scale feature representation, this study introduces an Efficient Multi-Scale Attention (EMA) mechanism at the feature-fusion stage, enabling the network to attend more effectively to critical defect regions. Moreover, conventional nearest-neighbor upsampling is replaced by DySample, which produces deeper, high-resolution feature maps enriched with semantic content, improving both inference speed and fusion quality. To heighten sensitivity to small-scale and low-contrast defects, the model adopts Focal Loss, which dynamically adjusts to sample difficulty. Extensive evaluations on the NEU-DET dataset demonstrate that StripSurface-YOLO reduces FLOPs by 11.6% and parameter count by 7.4% relative to the baseline YOLOv8n, while achieving respective improvements of 1.4%, 3.1%, 4.1%, and 3.0% in precision, recall, mAP50, and mAP50:95. Under adverse conditions, including contrast variations, brightness fluctuations, and Gaussian noise, StripSurface-YOLO outperforms the baseline model, delivering improvements of 5.0% in mAP50 and 4.7% in mAP50:95, attesting to the model's robust resistance to interference. These findings underscore the potential of StripSurface-YOLO to meet the rigorous performance demands of real-time surface defect detection in the metal forging industry.

23 pages, 4467 KiB  
Article
Research on Indoor Object Detection and Scene Recognition Algorithm Based on Apriori Algorithm and Mobile-EFSSD Model
by Wenda Zheng, Yibo Ai and Weidong Zhang
Mathematics 2025, 13(15), 2408; https://doi.org/10.3390/math13152408 - 26 Jul 2025
Abstract
With the advancement of computer vision and image processing technologies, scene recognition has gradually become a research hotspot. In practical applications, however, it is necessary to detect the categories and locations of objects in images while recognizing scenes. To address these issues, this paper proposes an indoor object detection and scene recognition algorithm based on the Apriori algorithm and the Mobile-EFSSD model, which can simultaneously obtain object category and location information while recognizing scenes. The specific research contents are as follows: (1) To handle complex indoor scenes and occlusion, this paper proposes an improved Mobile-EFSSD object detection algorithm. An optimized MobileNetV3 with ECA attention is used as the backbone, and multi-scale feature maps are fused via an FPN. The localization loss is augmented with a tunable hyperparameter, and Focal Loss replaces the confidence loss. Experiments show that the method achieves stable performance, effectively detects occluded objects, and accurately extracts category and location information. (2) To improve classification stability in indoor scene recognition, this paper proposes a naive Bayes-based method. Object detection results are converted into text features, and the Apriori algorithm extracts object associations. Prior probabilities are calculated from these associations and fed into a naive Bayes classifier for scene recognition. Evaluated on the ADE20K dataset, the method outperforms existing approaches by achieving a better accuracy–speed trade-off and enhanced classification stability. The proposed algorithm is applied to indoor scene images, enabling the simultaneous acquisition of object categories and location information while recognizing scenes. Moreover, the algorithm has a simple structure, with an object detection average precision of 82.7% and a scene recognition average accuracy of 95.23%, making it suitable for practical detection requirements.
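
To illustrate the association-mining step, the following sketch uses mlxtend's Apriori implementation over per-image object lists; the object names and support threshold are invented for the example:

```python
# Mine frequent object co-occurrences whose supports could feed naive Bayes priors.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

detections = [            # one transaction per image: objects the detector found
    ["bed", "lamp", "nightstand"],
    ["sofa", "tv", "lamp"],
    ["bed", "lamp", "wardrobe"],
]
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(detections).transform(detections), columns=te.columns_)
itemsets = apriori(onehot, min_support=0.5, use_colnames=True)
print(itemsets)           # frequent object combinations, e.g. {bed, lamp}
```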

25 pages, 27219 KiB  
Article
KCUNET: Multi-Focus Image Fusion via the Parallel Integration of KAN and Convolutional Layers
by Jing Fang, Ruxian Wang, Xinglin Ning, Ruiqing Wang, Shuyun Teng, Xuran Liu, Zhipeng Zhang, Wenfeng Lu, Shaohai Hu and Jingjing Wang
Entropy 2025, 27(8), 785; https://doi.org/10.3390/e27080785 - 24 Jul 2025
Abstract
Multi-focus image fusion (MFIF) is an image-processing method that aims to generate fully focused images by integrating source images from different focal planes. However, the defocus spread effect (DSE) often leads to blurred or jagged focus/defocus boundaries in fused images, which degrades image quality. To address this issue, this paper proposes a novel model that embeds the Kolmogorov–Arnold network in parallel with convolutional layers within the U-Net architecture (KCUNet). The model keeps the spatial dimensions of the feature map constant to maintain high-resolution details while progressively increasing the number of channels to capture multi-level features at the encoding stage. In addition, KCUNet incorporates a content-guided attention mechanism to enhance edge information processing, which is crucial for DSE reduction and edge preservation. The model's performance is optimized through a hybrid loss function that evaluates several aspects, including edge alignment, mask prediction, and image quality. Comparative evaluations against 15 state-of-the-art methods demonstrate KCUNet's superior performance in both qualitative and quantitative analyses.
(This article belongs to the Section Signal and Data Analysis)

18 pages, 4203 KiB  
Article
SRW-YOLO: A Detection Model for Environmental Risk Factors During the Grid Construction Phase
by Yu Zhao, Fei Liu, Qiang He, Fang Liu, Xiaohu Sun and Jiyong Zhang
Remote Sens. 2025, 17(15), 2576; https://doi.org/10.3390/rs17152576 - 24 Jul 2025
Abstract
With the rapid advancement of UAV-based remote sensing and image recognition techniques, identifying environmental risk factors from aerial imagery has emerged as a focal point of intelligent inspection during the construction phase of power transmission and distribution projects. The uneven spatial distribution of risk factors on construction sites, their weak texture signatures, and the inherently multi-scale nature of UAV imagery pose significant detection challenges. To address these issues, we propose a one-stage SRW-YOLO algorithm built upon the YOLOv11 framework. First, a P2-scale shallow feature detection layer is added to capture high-resolution fine details of small targets. Second, we integrate a reparameterized convolution based on channel shuffle with one-shot aggregation (RCS-OSA) module into the shallow layers of the backbone and neck, enhancing feature extraction while significantly reducing inference latency. Finally, the WIoU v3 loss function, with its dynamic non-monotonic focusing mechanism, is employed to reweight low-quality annotations, thereby improving small-object localization accuracy. Experimental results demonstrate that SRW-YOLO achieves an overall precision of 80.6% and an mAP of 79.1% on the State Grid dataset, and exhibits similarly superior performance on the VisDrone2019 dataset. Compared with other one-stage detectors, SRW-YOLO delivers markedly higher detection accuracy, offering critical technical support for multi-scale, heterogeneous environmental risk monitoring during the construction phase of power transmission and distribution projects, and establishing a theoretical foundation for rapid and accurate UAV-based intelligent inspection.
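
As background for the loss choice above, here is a heavily simplified sketch of WIoU v3's dynamic non-monotonic focusing, following the formulation in the WIoU paper rather than the SRW-YOLO code: an outlier degree computed against a running mean of the IoU loss gates each box's gradient gain, down-weighting both very easy boxes and likely mislabeled ones. The alpha/delta values and the running-mean update are assumptions:

```python
# Non-monotonic focusing gain of WIoU v3 applied to a per-box IoU loss.
import torch

class WIoUv3Focus:
    def __init__(self, alpha=1.9, delta=3.0, momentum=0.01):
        self.alpha, self.delta, self.momentum = alpha, delta, momentum
        self.mean_iou_loss = 1.0  # running mean of L_IoU across batches

    def __call__(self, iou_loss):                      # iou_loss: per-box 1 - IoU
        beta = iou_loss.detach() / self.mean_iou_loss  # outlier degree
        r = beta / (self.delta * self.alpha ** (beta - self.delta))  # focusing gain
        self.mean_iou_loss += self.momentum * (
            iou_loss.detach().mean().item() - self.mean_iou_loss)
        return (r * iou_loss).mean()
```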

23 pages, 13739 KiB  
Article
Traffic Accident Rescue Action Recognition Method Based on Real-Time UAV Video
by Bo Yang, Jianan Lu, Tao Liu, Bixing Zhang, Chen Geng, Yan Tian and Siyu Zhang
Drones 2025, 9(8), 519; https://doi.org/10.3390/drones9080519 - 24 Jul 2025
Abstract
Low-altitude drones, which are unimpeded by traffic congestion or urban terrain, have become a critical asset in emergency rescue missions. To address the current lack of emergency rescue data, UAV aerial videos were collected to create an experimental dataset with action classification and localization annotations. A total of 5082 keyframes were labeled with 1–5 targets each, and 14,412 instances were prepared (including flight altitude and camera angle) with action classes and position annotations. To mitigate the challenges posed by high-resolution drone footage with excessive redundant information, we propose the SlowFast-Traffic (SF-T) framework, a spatio-temporal-sequence-based algorithm for recognizing traffic accident rescue actions. For more efficient extraction of target–background correlation features, we introduce the Actor-Centric Relation Network (ACRN) module, which employs temporal max pooling to enhance the time-dimensional features of static backgrounds, significantly reducing redundancy-induced interference; smaller ROI feature map outputs are also adopted to boost computational speed. To tackle class imbalance in incident samples, we integrate a Class-Balanced Focal Loss (CB-Focal Loss) function, which effectively addresses rare-action recognition in specific rescue scenarios. We also replace the original Faster R-CNN with YOLOX-s to improve the target detection rate. On our proposed dataset, the SF-T model achieves a mean average precision (mAP) of 83.9%, 8.5% higher than that of the standard SlowFast architecture, while maintaining a processing speed of 34.9 tasks/s. Both accuracy-related metrics and computational efficiency are substantially improved. The proposed method demonstrates strong robustness and real-time analysis capabilities for modern traffic rescue action recognition.
(This article belongs to the Special Issue Cooperative Perception for Modern Transportation)
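
The CB-Focal Loss builds on the class-balanced weighting of Cui et al. (2019). Below is a hedged PyTorch sketch of that combination, with illustrative class counts rather than this dataset's statistics:

```python
# Class-balanced focal loss: per-class weights from the "effective number of
# samples" (1 - beta^n) / (1 - beta), multiplied into a softmax focal loss.
import torch
import torch.nn.functional as F

def cb_focal_loss(logits, targets, samples_per_class, beta=0.9999, gamma=2.0):
    counts = torch.as_tensor(samples_per_class, dtype=torch.float,
                             device=logits.device)
    weights = (1.0 - beta) / (1.0 - beta ** counts)   # effective-number weights
    weights = weights / weights.sum() * len(counts)   # normalize to mean 1
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    w = weights[targets]                              # weight of each sample's class
    return (-w * (1.0 - pt) ** gamma * log_pt).mean()
```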

24 pages, 9379 KiB  
Article
Performance Evaluation of YOLOv11 and YOLOv12 Deep Learning Architectures for Automated Detection and Classification of Immature Macauba (Acrocomia aculeata) Fruits
by David Ribeiro, Dennis Tavares, Eduardo Tiradentes, Fabio Santos and Demostenes Rodriguez
Agriculture 2025, 15(15), 1571; https://doi.org/10.3390/agriculture15151571 - 22 Jul 2025
Abstract
The automated detection and classification of immature macauba (Acrocomia aculeata) fruits is critical for improving post-harvest processing and quality control. In this study, we present a comparative evaluation of two state-of-the-art YOLO architectures, YOLOv11x and YOLOv12x, trained on the newly constructed VIC01 dataset comprising 1600 annotated images captured under both background-free and natural background conditions. Both models were implemented in PyTorch and trained until the convergence of the box regression, classification, and distribution-focal losses. At an IoU (intersection over union) threshold of 0.50, YOLOv11x and YOLOv12x achieved an identical mean average precision (mAP50) of 0.995 with perfect precision and recall (true positive rate, TPR). Averaged over IoU thresholds from 0.50 to 0.95, YOLOv11x demonstrated superior spatial localization performance (mAP50–95 = 0.973), while YOLOv12x exhibited robust performance in complex background scenarios, achieving a competitive mAP50–95. Inference throughput averaged 3.9 ms per image for YOLOv11x and 6.7 ms for YOLOv12x, highlighting a trade-off between speed and architectural complexity. Fused model representations revealed optimized layer fusion and reduced computational overhead (GFLOPs), facilitating efficient deployment. Confusion-matrix analyses confirmed YOLOv11x's ability to reject background clutter more effectively than YOLOv12x, whereas precision–recall and F1-score curves indicated that both models maintain a near-perfect detection balance across thresholds. The public release of the VIC01 dataset and trained weights ensures reproducibility and supports future research. Our results underscore the importance of selecting architectures based on application-specific requirements, balancing detection accuracy, background discrimination, and computational constraints. Future work will extend this framework to additional maturation stages, sensor fusion modalities, and lightweight edge-deployment variants. By facilitating precise immature fruit identification, this work contributes to sustainable production and value addition in macauba processing.
(This article belongs to the Section Agricultural Technology)
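
For reference, the IoU criterion underlying the mAP50 and mAP50–95 figures above reduces to a few lines; a detection counts as a true positive when its IoU with a ground-truth box exceeds the threshold (0.50, or averaged over 0.50:0.95), with boxes assumed to be in (x1, y1, x2, y2) form:

```python
# Intersection over union of two axis-aligned boxes.
def iou(box_a, box_b):
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)
```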

23 pages, 2543 KiB  
Article
Beyond Standard Losses: Redefining Text-to-SQL with Task-Specific Optimization
by Iker Azurmendi, Ekaitz Zulueta, Gustavo García, Nekane Uriarte-Arrazola and Jose Manuel Lopez-Guede
Mathematics 2025, 13(14), 2315; https://doi.org/10.3390/math13142315 - 20 Jul 2025
Abstract
In recent years, large language models (LLMs) have shown an impressive ability to translate text into SQL queries. However, in real-world applications, standard loss functions frequently fail to capture the complexity of queries adequately. Therefore, this study proposes a dynamic loss function that assigns different weights to specific groups of tokens, such as SQL keywords or table names. The objective is to guide the model during training toward mastering the more fundamental concepts of SQL. Our custom loss function is composed of four components: cross-entropy with sequence-matching loss, focal loss, F-beta loss, and contrastive sequence loss. During training, the weights of each component are dynamically adjusted to prioritize different aspects of query generation at the appropriate stage. This approach avoids computationally expensive steps such as SQL validation or detokenization, improving the efficiency of the learning process compared with alternative methods. We empirically tested this method on several open-source LLMs with fewer than 2 billion parameters, using a custom real-world vehicle diagnostics dataset. The findings demonstrate that our dynamic loss function can improve SQL execution accuracy by up to 20% compared with standard cross-entropy loss. This shows that task-specific loss functions can improve the performance of LLMs without enlarging the model or acquiring additional labelled data. The proposed technique is also scalable and adaptable to new domains or more complex weighting schemes, highlighting the importance of custom loss function design in real-world applications.
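
A minimal sketch of the dynamic weighting idea follows, with stand-in component losses and an assumed schedule; the paper's exact weighting strategy and loss definitions are not reproduced:

```python
# Combine four component losses with weights that shift over training.
import torch

def combined_loss(components, step, total_steps):
    """components: dict of scalar tensors with keys
    'ce_seq', 'focal', 'f_beta', 'contrastive'."""
    t = step / total_steps
    weights = {                     # assumed schedule: syntax first, semantics later
        "ce_seq": 1.0,
        "focal": 0.5 + 0.5 * t,                   # ramp up hard-token emphasis
        "f_beta": t,                              # phase in precision/recall balance
        "contrastive": max(0.0, t - 0.5) * 2.0,   # only in the second half
    }
    return sum(weights[k] * components[k] for k in components)
```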

24 pages, 7474 KiB  
Article
YOLO11m-SCFPose: An Improved Detection Framework for Keypoint Extraction in Cucumber Fruit Phenotyping
by Huijiao Yu, Xuehui Zhang, Jun Yan and Xianyong Meng
Horticulturae 2025, 11(7), 858; https://doi.org/10.3390/horticulturae11070858 - 20 Jul 2025
Abstract
To address the low efficiency and large errors of traditional manual cucumber fruit phenotyping methods, this paper applies keypoint detection technology to cucumber phenotyping and designs an improved lightweight model called YOLO11m-SCFPose. Based on YOLO11m-pose, the original backbone network is replaced with the lightweight StarNet-S1 backbone, reducing model complexity. Additionally, an improved C3K2_PartialConv neck module is used to enhance information interaction and fusion among multi-scale features while maintaining computational efficiency, and the Focaler-IoU loss function is employed to improve keypoint localization accuracy. Results show that the improved model achieves an mAP50-95 of 0.924 with a floating-point cost of 32.1 GFLOPs, and reduces the model size to 1.229 × 10^7 parameters. The model demonstrates better computational efficiency and lower resource consumption, providing an effective lightweight solution for crop phenotypic analysis.
(This article belongs to the Section Vegetable Production Systems)
