
Search Results (116)

Search Parameters:
Keywords = smart-pixel

21 pages, 21215 KiB  
Article
ES-Net Empowers Forest Disturbance Monitoring: Edge–Semantic Collaborative Network for Canopy Gap Mapping
by Yutong Wang, Zhang Zhang, Jisheng Xia, Fei Zhao and Pinliang Dong
Remote Sens. 2025, 17(14), 2427; https://doi.org/10.3390/rs17142427 - 12 Jul 2025
Viewed by 392
Abstract
Canopy gaps are vital microhabitats for forest carbon cycling and species regeneration, and their accurate extraction is crucial for ecological modeling and smart forestry. However, traditional monitoring methods have notable limitations: ground-based measurements are inefficient; remote-sensing interpretation is susceptible to terrain and spectral interference; and traditional algorithms exhibit insufficient feature representation capability. To overcome these bottlenecks in canopy gap identification in mountainous forest regions, we constructed a multi-task deep learning model (ES-Net) integrating an edge–semantic collaborative perception mechanism. First, a refined sample library containing multi-scale interference features was constructed, comprising 2808 annotated UAV images. On this basis, a dual-branch feature interaction architecture was designed: a cross-layer attention mechanism was embedded in the semantic segmentation module (SSM) to enhance the discriminative ability for heterogeneous features, while an edge detection module (EDM) was built to strengthen geometric constraints. Results from selected areas in Yunnan Province (China) demonstrate that ES-Net outperforms U-Net, boosting the Intersection over Union (IoU) by 0.86% (95.41% vs. 94.55%), improving the edge coverage rate by 3.14% (85.32% vs. 82.18%), and reducing the Hausdorff Distance by 38.6% (28.26 pixels vs. 46.02 pixels). Ablation studies further verify that the synergy between the SSM and EDM yields a 13.0% IoU gain over the baseline, highlighting the effectiveness of joint semantic–edge optimization. This study provides a terrain-adaptive intelligent interpretation method for forest disturbance monitoring and holds significant practical value for advancing smart forestry and sustainable ecosystem management. Full article
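The IoU metric these comparisons rest on has a compact definition. A minimal sketch (not the authors' implementation), assuming binary NumPy masks where 1 marks a canopy-gap pixel:

```python
import numpy as np

def mask_iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over Union for binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, target).sum() / union)

# Toy example: two overlapping 4x4 gap masks
a = np.zeros((4, 4), dtype=int); a[:2, :] = 1   # 8 pixels in rows 0-1
b = np.zeros((4, 4), dtype=int); b[1:3, :] = 1  # 8 pixels in rows 1-2
print(mask_iou(a, b))  # intersection 4, union 12 -> 0.3333...
```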

18 pages, 8193 KiB  
Article
An Ensemble Deep Learning Framework for Smart Tourism Landmark Recognition Using Pixel-Enhanced YOLO11 Models
by Ulugbek Hudayberdiev and Junyeong Lee
Sustainability 2025, 17(12), 5420; https://doi.org/10.3390/su17125420 - 12 Jun 2025
Viewed by 508
Abstract
Tourist destination classification is pivotal for enhancing the travel experience, supporting cultural heritage preservation, and enabling smart tourism services. With recent advancements in artificial intelligence, deep learning-based systems have significantly improved the accuracy and efficiency of landmark recognition. To address the limitations of existing datasets, we developed the Samarkand dataset, containing diverse images of historical landmarks captured under varying environmental conditions. Additionally, we created enhanced image variants by squaring pixel values greater than 225 to emphasize high-intensity architectural features, improving the model’s ability to recognize subtle visual patterns. Using these datasets, we trained two parallel YOLO11 models on original and enhanced images, respectively. Each model was independently trained and validated, preserving only the best-performing epoch for final inference. We then ensembled the models by averaging the model outputs from the best checkpoints to leverage their complementary strengths. Our proposed approach outperforms conventional single-model baselines, achieving an accuracy of 99.07%, precision of 99.15%, recall of 99.21%, and F1-score of 99.14%, particularly excelling in challenging scenarios involving poor lighting or occlusions. The model’s robustness and high performance underscore its practical value for smart tourism systems. Future work will explore broader geographic datasets and real-time deployment on mobile platforms. Full article
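The pixel-enhancement step described above (squaring values greater than 225) can be sketched in a few lines. The paper does not state how the squared values are mapped back into the 8-bit range, so clipping to 255 is an assumption here:

```python
import numpy as np

def enhance_high_intensity(img: np.ndarray, thresh: int = 225) -> np.ndarray:
    """Square pixel values above `thresh` to emphasize bright architectural
    features. Squaring in a wide dtype avoids uint8 overflow; clipping back
    to [0, 255] is one plausible rescaling choice (assumed, not stated)."""
    out = img.astype(np.int32)
    mask = out > thresh
    out[mask] = out[mask] ** 2
    return np.clip(out, 0, 255).astype(np.uint8)

img = np.array([[100, 225, 226, 240]], dtype=np.uint8)
print(enhance_high_intensity(img))  # [[100 225 255 255]]
```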

25 pages, 11680 KiB  
Article
ETAFHrNet: A Transformer-Based Multi-Scale Network for Asymmetric Pavement Crack Segmentation
by Chao Tan, Jiaqi Liu, Zhedong Zhao, Rufei Liu, Peng Tan, Aishu Yao, Shoudao Pan and Jingyi Dong
Appl. Sci. 2025, 15(11), 6183; https://doi.org/10.3390/app15116183 - 30 May 2025
Viewed by 640
Abstract
Accurate segmentation of pavement cracks from high-resolution remote sensing imagery plays a crucial role in automated road condition assessment and infrastructure maintenance. However, crack structures often exhibit asymmetry, irregular morphology, and multi-scale variations, posing significant challenges to conventional CNN-based methods in real-world environments. In this work, we present ETAFHrNet, a novel attention-guided segmentation network designed to address the limitations of traditional architectures in detecting fine-grained and asymmetric patterns. ETAFHrNet focuses on two predominant pavement-distress morphologies—linear cracks (transverse and longitudinal) and alligator cracks—and has been empirically validated on their intersections and branching patterns over both asphalt and concrete road surfaces. The network integrates Transformer-based global attention and multi-scale hybrid feature fusion, enhancing both contextual perception and detail sensitivity, and introduces two key modules: the Efficient Hybrid Attention Transformer (EHAT), which captures long-range dependencies, and the Cross-Scale Hybrid Attention Module (CSHAM), which adaptively fuses features across spatial resolutions. To support model training and benchmarking, we also propose QD-Crack, a high-resolution, pixel-level annotated dataset collected from real-world road inspection scenarios. Experimental results show that ETAFHrNet significantly outperforms existing methods—including U-Net, DeepLabv3+, and HRNet—in both segmentation accuracy and generalization ability. These findings demonstrate the effectiveness of interpretable, multi-scale attention architectures in complex object detection and image classification tasks, making our approach relevant for broader applications such as autonomous driving, remote sensing, and smart infrastructure systems. Full article
(This article belongs to the Special Issue Object Detection and Image Classification)

23 pages, 5897 KiB  
Article
A Vision-Based Procedure with Subpixel Resolution for Motion Estimation
by Samira Azizi, Kaveh Karami and Stefano Mariani
Sensors 2025, 25(10), 3101; https://doi.org/10.3390/s25103101 - 14 May 2025
Cited by 2 | Viewed by 515
Abstract
Vision-based motion estimation for structural systems has attracted significant interest in recent years. As the design of robust algorithms to accurately estimate motion still represents a challenge, a multi-step framework is proposed to deal with both large and small motion amplitudes. The solution combines a stochastic search method for coarse-level measurements with a deterministic method for fine-level measurements. A population-based block matching approach, featuring adaptive search limit selection for robust estimation and a subsampled block strategy, is implemented to reduce the computational burden of integer pixel motion estimation. A Reduced-Error Gradient-based method is next adopted to achieve subpixel resolution accuracy. This hybrid Smart Block Matching with Reduced-Error Gradient (SBM-REG) approach therefore provides a powerful solution for motion estimation. By employing Complexity Pursuit, a blind source separation method for output-only modal analysis, structural mode shapes and vibration frequencies are finally extracted from video data. The method’s efficiency and accuracy are assessed here against synthetic shifted patterns, a cantilever beam, and six-story laboratory tests. Full article
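A toy illustration of the integer-pixel block matching that SBM-REG builds on. This is plain exhaustive SAD (sum of absolute differences) search, not the authors' population-based stochastic variant with adaptive search limits:

```python
import numpy as np

def block_match(ref, cur, top, left, size, radius):
    """Exhaustive integer-pixel block matching: find the (dy, dx) shift that
    minimizes the SAD between a reference block and candidate blocks in the
    current frame, searching within +/- radius pixels."""
    block = ref[top:top + size, left:left + size].astype(float)
    best, best_shift = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > cur.shape[0] or x + size > cur.shape[1]:
                continue  # candidate block would fall outside the frame
            sad = np.abs(cur[y:y + size, x:x + size] - block).sum()
            if sad < best:
                best, best_shift = sad, (dy, dx)
    return best_shift

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (32, 32)).astype(float)
cur = np.roll(ref, shift=(2, 3), axis=(0, 1))  # frame shifted by (2, 3) pixels
print(block_match(ref, cur, top=8, left=8, size=8, radius=4))  # (2, 3)
```

A gradient-based refinement step, as in the paper, would then recover the subpixel remainder around this integer estimate.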

23 pages, 3358 KiB  
Article
A Software-Defined Sensor System Using Semantic Segmentation for Monitoring Remaining Intravenous Fluids
by Hasik Sunwoo, Seungwoo Lee and Woojin Paik
Sensors 2025, 25(10), 3082; https://doi.org/10.3390/s25103082 - 13 May 2025
Cited by 1 | Viewed by 520
Abstract
Accurate intravenous (IV) fluid monitoring is critical in healthcare to prevent infusion errors and ensure patient safety. Traditional monitoring methods often depend on dedicated hardware, such as weight sensors or optical systems, which can be costly, complex, and challenging to scale across diverse clinical settings. This study introduces a software-defined sensing approach that leverages semantic segmentation using the pyramid scene parsing network (PSPNet) to estimate the remaining IV fluid volumes directly from images captured by standard smartphones. The system identifies the IV container (vessel) and its fluid content (liquid) using pixel-level segmentation and estimates the remaining fluid volume without requiring physical sensors. Trained on a custom IV-specific image dataset, the proposed model achieved high accuracy with mean intersection over union (mIoU) scores of 0.94 for the vessel and 0.92 for the fluid regions. Comparative analysis with the segment anything model (SAM) demonstrated that the PSPNet-based system significantly outperformed the SAM, particularly in segmenting transparent fluids without requiring manual threshold tuning. This approach provides a scalable, cost-effective alternative to hardware-dependent monitoring systems and opens the door to AI-powered fluid sensing in smart healthcare environments. Preliminary benchmarking demonstrated that the system achieves near-real-time inference on mobile devices such as the iPhone 12, confirming its suitability for bedside and point-of-care use. Full article
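One plausible way to turn the vessel/liquid segmentation into a remaining-volume estimate is a pixel-count ratio. The paper's actual pixel-to-volume calibration is not given, so this proxy (which ignores container geometry) is an assumption:

```python
import numpy as np

def remaining_fluid_fraction(vessel_mask, liquid_mask):
    """Estimate remaining IV fluid as the fraction of vessel pixels that are
    classified as liquid. A pixel-count proxy only: a real system would map
    pixels to milliliters using container geometry and calibration."""
    vessel = vessel_mask.astype(bool)
    liquid = np.logical_and(liquid_mask.astype(bool), vessel)
    if vessel.sum() == 0:
        return 0.0  # no vessel detected in the frame
    return float(liquid.sum() / vessel.sum())

vessel = np.ones((10, 4), dtype=int)   # 40 vessel pixels
liquid = np.zeros((10, 4), dtype=int)
liquid[6:, :] = 1                      # bottom 4 rows segmented as fluid
print(remaining_fluid_fraction(vessel, liquid))  # 0.4
```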

14 pages, 2896 KiB  
Article
Optical Design of a Smart-Pixel-Based Optical Convolutional Neural Network
by Young-Gu Ju
Optics 2025, 6(2), 19; https://doi.org/10.3390/opt6020019 - 13 May 2025
Viewed by 438
Abstract
We designed lens systems for a smart-pixel-based optical convolutional neural network (SPOCNN) using optical software to analyze image spread and estimate alignment tolerance for various kernel sizes. The design, based on a three-element lens, was reoptimized to minimize spot size while meeting system constraints. Simulations included root-mean-square (RMS) spot diagrams and encircled-energy diagrams, showing that geometric aberration increases with the scale factor while the diffraction effect remains constant. Alignment tolerance was determined by combining geometric image size with image spread analysis. While the preliminary scaling analysis predicted a limit at a kernel array size of 66 × 66, simulations showed that a size of 61 × 61 maintains sufficient alignment tolerance, well above the critical threshold. The discrepancy is likely due to lower angular aberration in the simulated optical design. This study confirms that an array size of 61 × 61 is feasible for the SPOCNN, validating the scaling analysis for predicting image spread trends caused by aberration and diffraction. Full article
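The RMS spot size used in such simulations has a simple definition. A sketch, assuming the ray intersection coordinates at the image plane are available (e.g. exported from the optical design software):

```python
import numpy as np

def rms_spot_radius(x, y):
    """RMS spot radius of a ray bundle at the image plane: the root mean
    square distance of ray intersection points from their centroid."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sqrt(np.mean(dx**2 + dy**2)))

# Four rays landing symmetrically 1 unit from the chief ray
print(rms_spot_radius([1, -1, 0, 0], [0, 0, 1, -1]))  # 1.0
```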

37 pages, 46669 KiB  
Article
ViX-MangoEFormer: An Enhanced Vision Transformer–EfficientFormer and Stacking Ensemble Approach for Mango Leaf Disease Recognition with Explainable Artificial Intelligence
by Abdullah Al Noman, Amira Hossain, Anamul Sakib, Jesika Debnath, Hasib Fardin, Abdullah Al Sakib, Rezaul Haque, Md. Redwan Ahmed, Ahmed Wasif Reza and M. Ali Akber Dewan
Computers 2025, 14(5), 171; https://doi.org/10.3390/computers14050171 - 2 May 2025
Viewed by 1743
Abstract
Mango productivity suffers greatly from leaf diseases, leading to economic and food security issues. Current visual inspection methods are slow and subjective. Previous Deep-Learning (DL) solutions have shown promise but suffer from imbalanced datasets, modest generalization, and limited interpretability. To address these challenges, this study introduces ViX-MangoEFormer, which combines convolutional kernels and self-attention to effectively diagnose multiple mango leaf conditions in both balanced and imbalanced image sets. To benchmark against ViX-MangoEFormer, we developed a stacking ensemble model (MangoNet-Stack) that utilizes five transfer learning networks as base learners. Grad-CAM was used to produce pixel-level explanations for all trained models. On a combined dataset of 25,530 images, ViX-MangoEFormer achieved an F1 score of 99.78% and a Matthews Correlation Coefficient (MCC) of 99.34%, consistently outperforming the individual pre-trained models and MangoNet-Stack. Additionally, data augmentation improved the performance of every architecture compared to its non-augmented version. Cross-domain tests on morphologically similar crop leaves confirmed strong generalization. Our findings validate the effectiveness of transformer attention and XAI in mango leaf disease detection. ViX-MangoEFormer is deployed as a web application that delivers real-time predictions, probability scores, and visual rationales, enabling growers to respond quickly and enhancing large-scale smart crop health monitoring. Full article
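For reference, the Matthews Correlation Coefficient reported above is defined, in the binary case, directly from confusion-matrix counts (the paper's multi-class MCC is a generalization of this):

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts.
    Ranges from -1 (total disagreement) to +1 (perfect prediction); returns
    0.0 when any marginal is empty, following common convention."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(mcc(tp=90, fp=10, fn=5, tn=95))   # ~0.851 on an illustrative split
print(mcc(tp=10, fp=0, fn=0, tn=10))    # 1.0 for a perfect classifier
```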
(This article belongs to the Special Issue Deep Learning and Explainable Artificial Intelligence)

12 pages, 3079 KiB  
Essay
An Automated Image Segmentation, Annotation, and Training Framework of Plant Leaves by Joining the SAM and the YOLOv8 Models
by Lumiao Zhao, Kubwimana Olivier and Liping Chen
Agronomy 2025, 15(5), 1081; https://doi.org/10.3390/agronomy15051081 - 29 Apr 2025
Viewed by 869
Abstract
Recognizing plant leaves in complex agricultural scenes is challenging due to high manual annotation costs and real-time detection demands. Current deep learning methods, such as YOLOv8 and SAM, face trade-offs between annotation efficiency and inference speed. This paper proposes an automated framework integrating SAM for offline semantic segmentation and YOLOv8 for real-time detection. SAM generates pixel-level leaf masks, which are converted to YOLOv8-compatible bounding boxes, eliminating manual labeling. Experiments on three plant species show the framework achieves 87% detection accuracy and 0.03 s per image inference time, reducing annotation labor by 100% compared to traditional methods. The proposed pipeline balances high-quality annotation and lightweight detection, enabling scalable smart agriculture applications. Full article
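The mask-to-bounding-box conversion at the heart of this pipeline can be sketched as follows, assuming a binary SAM mask and the normalized (x_center, y_center, width, height) YOLO label format:

```python
import numpy as np

def mask_to_yolo_bbox(mask: np.ndarray):
    """Convert a binary segmentation mask (e.g. one SAM leaf mask) into a
    YOLO-format bounding box: (x_center, y_center, width, height), each
    normalized to [0, 1] by the image dimensions. Returns None if empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    h, w = mask.shape
    x0, x1 = xs.min(), xs.max() + 1  # half-open pixel extents
    y0, y1 = ys.min(), ys.max() + 1
    return (float((x0 + x1) / 2 / w), float((y0 + y1) / 2 / h),
            float((x1 - x0) / w), float((y1 - y0) / h))

mask = np.zeros((100, 200), dtype=np.uint8)
mask[20:60, 50:150] = 1  # a leaf occupying rows 20-59, cols 50-149
print(mask_to_yolo_bbox(mask))  # (0.5, 0.4, 0.5, 0.4)
```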
(This article belongs to the Section Precision and Digital Agriculture)

15 pages, 5009 KiB  
Article
Integrating Visual Cryptography for Efficient and Secure Image Sharing on Social Networks
by Lijing Ren and Denghui Zhang
Appl. Sci. 2025, 15(8), 4150; https://doi.org/10.3390/app15084150 - 9 Apr 2025
Viewed by 719
Abstract
The widespread use of smart devices, such as phones and live-streaming cameras, has ushered in an era where digital images can be captured and shared on social networks anytime and anywhere. Sharing images demands more bandwidth and stricter security than sharing text, and forwarded images are susceptible to privacy leaks. While standard encryption algorithms can safeguard the privacy of textual data, image data entail larger volumes and significant redundancy, and the limited computing power of smart devices complicates the encrypted transmission of images, creating substantial obstacles to implementing security policies on low-computing devices. To address privacy concerns regarding image sharing on social networks, we propose a lightweight data forwarding mechanism for resource-constrained environments. By integrating large-scale data forwarding with visual cryptography, we enhance data security and resource utilization while minimizing overhead. We introduce a downsampling-based non-expansive scheme that reduces pixel expansion and decreases encrypted image size without compromising decryption quality. Experimental results demonstrate that our method achieves a peak signal-to-noise ratio of up to 20.54 dB and a structural similarity index of 0.72, outperforming existing methods such as random-grid schemes. Our approach prevents size expansion while maintaining high decryption quality, addresses access control gaps, and enables secure and efficient data exchange between interconnected systems. Full article
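For context, the random-grid baseline that the paper compares against works roughly as follows. This is the classic (2, 2) random-grid scheme, not the proposed downsampling-based method:

```python
import numpy as np

def random_grid_shares(secret: np.ndarray, rng=None):
    """Classic (2, 2) random-grid visual cryptography. `secret` is a binary
    image with 1 = black. Share 1 is pure noise; share 2 copies it on white
    pixels and inverts it on black pixels, so stacking the shares (pixelwise
    OR) renders every black secret pixel fully black while white pixels stay
    50% black on average. Shares are the same size as the secret (no pixel
    expansion)."""
    rng = rng if rng is not None else np.random.default_rng()
    s1 = rng.integers(0, 2, secret.shape)
    s2 = np.where(secret == 1, 1 - s1, s1)
    return s1, s2

secret = np.array([[1, 0],
                   [0, 1]])
s1, s2 = random_grid_shares(secret)
stacked = s1 | s2  # physical stacking of transparencies ~ pixelwise OR
print(int(stacked[0, 0]), int(stacked[1, 1]))  # black pixels always black: 1 1
```

Each share alone is statistically independent of the secret, which is what makes forwarding an individual share safe.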
(This article belongs to the Special Issue Novel Insights into Cryptography and Network Security)

35 pages, 9403 KiB  
Article
An AI-Based Nested Large–Small Model for Passive Microwave Soil Moisture and Land Surface Temperature Retrieval Method
by Mengjie Liang, Kebiao Mao, Jiancheng Shi, Sayed M. Bateni and Fei Meng
Remote Sens. 2025, 17(7), 1198; https://doi.org/10.3390/rs17071198 - 27 Mar 2025
Cited by 1 | Viewed by 525
Abstract
Retrieving soil moisture (SM) and land surface temperature (LST) provides crucial environmental data for smart agriculture, enabling precise irrigation, crop health monitoring, and yield optimization. The rapid advancement of artificial intelligence (AI) hardware offers new opportunities to overcome the limitations of traditional geophysical parameter retrieval methods. We propose a nested large–small model method that uses AI techniques for the joint iterative retrieval of passive microwave SM and LST. This method retains the strengths of traditional physical and statistical methods while incorporating the spatiotemporal factors influencing surface emissivity for multi-hierarchical classification, preserving the physical significance and interpretability of traditional methods while significantly improving the accuracy of passive microwave SM and LST retrieval. Using the terrestrial area of China as a case study, multi-hierarchical classification was applied to verify the feasibility of the method, and the experimental data show a significant improvement in retrieval accuracy after hierarchical classification. In ground-based validation, the ascending- and descending-orbit SM retrieval models at hierarchy level 5 achieved MAEs of 0.026 m³/m³ and 0.030 m³/m³, respectively, improving by 0.015 m³/m³ and 0.012 m³/m³ over the large model, and by 0.032 m³/m³ and 0.028 m³/m³ over AMSR2 SM products. The corresponding ascending- and descending-orbit LST retrieval models achieved MAEs of 1.67 K and 1.72 K, respectively, with improvements of 0.67 K and 0.49 K over the large model, and of 0.57 K and 0.56 K over the MODIS LST products. The retrieval model can theoretically be refined to the pixel level, potentially maximizing retrieval accuracy, which provides a theoretical and technical basis for parameter retrieval with AI passive microwave large models. Full article

23 pages, 5670 KiB  
Article
A Conceptual Study of Rapidly Reconfigurable and Scalable Optical Convolutional Neural Networks Based on Free-Space Optics Using a Smart Pixel Light Modulator
by Young-Gu Ju
Computers 2025, 14(3), 111; https://doi.org/10.3390/computers14030111 - 20 Mar 2025
Cited by 1 | Viewed by 399
Abstract
The smart-pixel-based optical convolutional neural network was proposed to improve kernel refresh rates in scalable optical convolutional neural networks (CNNs) by replacing the spatial light modulator with a smart pixel light modulator while preserving benefits such as an unlimited input node size, cascadability, and direct kernel representation. The smart pixel light modulator enhances weight update speed, enabling rapid reconfigurability. Its fast updating capability and memory expand the application scope of scalable optical CNNs, supporting operations like convolution with multiple kernel sets and difference mode. Simplifications using electrical fan-out reduce hardware complexity and costs. An evolution of this system, the smart-pixel-based bidirectional optical CNN, employs a bidirectional architecture and single lens-array optics, achieving a computational throughput of 8.3 × 10¹⁴ MAC/s with a smart pixel light modulator resolution of 3840 × 2160. Further advancements led to the two-mirror-like smart-pixel-based bidirectional optical CNN, which emulates 2n layers using only two physical layers, significantly reducing hardware requirements despite increased time delay. This architecture was demonstrated for solving partial differential equations by leveraging local interactions as a sequence of convolutions. These advancements position smart-pixel-based optical CNNs and their derivatives as promising solutions for future CNN applications. Full article
(This article belongs to the Special Issue Emerging Trends in Machine Learning and Artificial Intelligence)

15 pages, 4072 KiB  
Article
A Conceptual Study of Rapidly Reconfigurable and Scalable Bidirectional Optical Neural Networks Leveraging a Smart Pixel Light Modulator
by Young-Gu Ju
Photonics 2025, 12(2), 132; https://doi.org/10.3390/photonics12020132 - 2 Feb 2025
Cited by 2 | Viewed by 818
Abstract
We explore the integration of smart pixel light modulators (SPLMs) into bidirectional optical neural networks (BONNs), highlighting their advantages over traditional spatial light modulators (SLMs). SPLMs enhance BONN performance by enabling faster light modulation in both directions, significantly increasing the refresh rate of neural network weights to hundreds of megahertz, thus facilitating the practical implementation of the backpropagation algorithm and two-mirror-like BONN structures. The architecture of an SPLM-based BONN (SPBONN) features bidirectional modulation, simplifying hardware with electrical fan-in and fan-out. An SPBONN with an array size of 96 × 96 can achieve high throughput, up to 4.3 × 10¹⁶ MAC/s with 10 layers. Energy assessments showed that the SPLM array, despite its higher power consumption compared to the SLM array, is manageable via effective heat dissipation. Smart pixels with programmable memory in the SPBONN provide a cost-effective solution for expanding network node size and overcoming scalability limitations without the need for additional hardware. Full article
(This article belongs to the Special Issue Advances in Free-Space Optical Communications)

14 pages, 1825 KiB  
Article
A Deep Learning-Based Method for Measuring Apparent Disease Areas of Sling Sheaths
by Jinsheng Du, Haibin Liu, Yaoyang Liu, Zhiqiang Xu, Sen Liu and Shunquan Lu
Buildings 2025, 15(3), 375; https://doi.org/10.3390/buildings15030375 - 25 Jan 2025
Viewed by 761
Abstract
The sling sheath plays an important protective role in the sling of suspension bridges, effectively preventing accidental damage to the sling caused by wind, fatigue and other impacts. To conduct a quantitative analysis of the apparent disease of suspension bridge slings, a method for segmenting and quantifying the apparent disease of the sling sheath using deep learning and image processing was proposed. A total of 1408 disease images were obtained after image acquisition of a suspension bridge following sling replacement. MATLAB 2021a Image Labeler software was used to establish a disease dataset by manual labelling. Then, the MobileNetV2 model was trained and tested on the dataset to determine disease segmentation; additionally, an area measurement algorithm was proposed based on the images’ projection relationships. Finally, the measurement results were compared with the manually acquired crack area. The results show that the accuracy of background and sheath category pixels in the MobileNetV2 model is above 97%, indicating that the model achieves satisfactory results in these classifications. However, the accuracy of crack category pixels and the intersection over union ratio only reaches 80%, which needs to be improved by setting model correction coefficients. When measuring directly, it was found that the area measurement error of the test image mainly ranged between 8% and 30%, and the measurement error of the crack area after correction mainly ranged between −3% and 15%, indicating that the area measurement method can achieve a higher degree of measurement accuracy. The method for segmenting and quantifying the apparent disease of the sling sheath based on deep learning and image processing fills the research gap in the measurement of the surface damage area caused by apparent disease and has the advantages of high efficiency and high recognition accuracy. Reducing the maintenance costs of suspension bridge slings is crucial for promoting comprehensive intelligent detection of bridges and advancing the smart transformation of the civil engineering industry. Full article

23 pages, 5215 KiB  
Article
A Feature-Enhanced Small Object Detection Algorithm Based on Attention Mechanism
by Zhe Quan and Jun Sun
Sensors 2025, 25(2), 589; https://doi.org/10.3390/s25020589 - 20 Jan 2025
Viewed by 2499
Abstract
With the rapid development of AI algorithms and computational power, object recognition based on deep learning frameworks has become a major research direction in computer vision. UAVs equipped with object detection systems are increasingly used in fields like smart transportation, disaster warning, and emergency rescue. However, due to factors such as the environment, lighting, altitude, and angle, UAV images face challenges like small object sizes, high object density, and significant background interference, making object detection tasks difficult. To address these issues, we use YOLOv8s as the basic framework and introduce a multi-level feature fusion algorithm. Additionally, we design an attention mechanism that links distant pixels to improve small object feature extraction. To address missed detections and inaccurate localization, we replace the detection head with a dynamic head, allowing the model to route objects to the appropriate head for final output. We also introduce Slideloss to improve the model’s learning of difficult samples and ShapeIoU to better account for the shape and scale of bounding boxes. Experiments on datasets like VisDrone2019 show that our method improves accuracy by nearly 10% and recall by about 11% compared to the baseline. Additionally, on the AI-TODv1.5 dataset, our method improves the mAP50 from 38.8 to 45.2. Full article
(This article belongs to the Section Remote Sensors)

27 pages, 3367 KiB  
Article
Binocular Video-Based Automatic Pixel-Level Crack Detection and Quantification Using Deep Convolutional Neural Networks for Concrete Structures
by Liqu Liu, Bo Shen, Shuchen Huang, Runlin Liu, Weizhang Liao, Bin Wang and Shuo Diao
Buildings 2025, 15(2), 258; https://doi.org/10.3390/buildings15020258 - 17 Jan 2025
Cited by 5 | Viewed by 1098
Abstract
Crack detection and quantification play crucial roles in assessing the condition of concrete structures. Herein, a novel real-time crack detection and quantification method that leverages binocular vision and a lightweight deep learning model is proposed. The method comprises four modules: a lightweight classification algorithm, a high-precision segmentation algorithm, a semi-global block matching (SGBM) algorithm, and a crack quantification technique. Based on the crack segmentation results, a framework is developed for quantitative analysis of the major geometric parameters, including crack length, crack width, and crack angle of orientation, at the pixel level. The results indicate that, by incorporating channel attention and spatial attention mechanisms in the MBConv module, the detection accuracy of the improved EfficientNetV2 increased by 1.6% compared with the original EfficientNetV2, and that the proposed quantification method achieves low quantification errors of 2%, 4.5%, and 4% for the crack length, width, and angle of orientation, respectively. The proposed method can contribute to crack detection and quantification in practical use by being deployed on smart devices. Full article
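Once SGBM has produced a disparity map, pixel measurements can be converted to metric crack dimensions via the standard pinhole stereo relations. A sketch with hypothetical camera parameters (the paper's calibration values are not given):

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Pinhole stereo relation used after SGBM matching: depth Z = f*B / d,
    with focal length f in pixels, baseline B in meters, disparity d in pixels."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

def pixel_width_to_metric(width_px, depth_m, focal_px):
    """Convert a crack width measured in pixels to meters at a given depth."""
    return width_px * depth_m / focal_px

# Hypothetical rig: f = 1200 px, baseline 0.12 m, measured disparity 48 px
z = disparity_to_depth(48, 1200, 0.12)
print(z)                                  # 3.0 m to the concrete surface
print(pixel_width_to_metric(4, z, 1200))  # a 4-pixel crack is 0.01 m wide
```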
(This article belongs to the Special Issue Seismic Performance and Durability of Engineering Structures)
