Industrial equipment fault diagnosis faces dual challenges: significant data distribution discrepancies caused by diverse operating conditions impair generalization capabilities, while underutilized spatio-temporal information from multi-source data hinders feature extraction. To address this, we propose a spatio-temporal collaborative perception-driven feature graph construction and topology mining methodology for variable-condition diagnosis. First, leveraging the operational condition invariance and cross-condition consistency of fault features, we construct fault feature graphs using single-source data and similarity clustering, validating topological similarity and representational consistency under varying conditions. Second, we reveal spatio-temporal correlations within multi-source feature topologies. By embedding multi-source spatio-temporal information into fault feature graphs via spatio-temporal collaborative perception, we establish high-dimensional spatio-temporal feature topology graphs based on spectral similarity, extending generalized feature representations into the spatio-temporal domain. Finally, we develop a graph residual convolutional network to mine topological information from multi-source spatio-temporal features under complex operating conditions. Experiments on variable/multi-condition datasets demonstrate the following: feature graphs seamlessly integrate multi-source information with operational variations; the methodology precisely captures spatio-temporal delays induced by vibrational direction/path discrepancies; and the proposed model maintains both high diagnostic accuracy and strong generalization capacity under complex operating conditions, delivering a highly reliable framework for rotating machinery fault diagnosis. Full article

(This article belongs to the Section Fault Diagnosis & Sensors)

20 pages, 2631 KiB

Open Access Article

Automatic 3D Reconstruction: Mesh Extraction Based on Gaussian Splatting from Romanesque–Mudéjar Churches

by Nelson Montas-Laracuente, Emilio Delgado Martos, Carlos Pesqueira-Calvo, Giovanni Intra Sidola, Ana Maitín, Alberto Nogales and Álvaro José García-Tejedor

Appl. Sci. 2025, 15(15), 8379; https://doi.org/10.3390/app15158379 - 28 Jul 2025

Abstract

This research introduces an automated 3D virtual reconstruction system tailored for architectural heritage (AH) applications, contributing to the ongoing paradigm shift from traditional CAD-based workflows to artificial intelligence-driven methodologies. It reviews recent advancements in machine learning and deep learning, particularly neural radiance fields (NeRFs) and their successor, Gaussian splatting (GS), as state-of-the-art techniques in the domain. The study advocates for replacing point cloud data in heritage building information modeling (HBIM) workflows with image-based inputs, proposing a novel “photo-to-BIM” pipeline. A proof-of-concept system is presented that processes photographs or video footage of ancient ruins, specifically Romanesque–Mudéjar churches, to automatically generate 3D mesh reconstructions. The system’s performance is assessed using both objective metrics and subjective evaluations of mesh quality. The results confirm the feasibility and promise of image-based reconstruction as a viable alternative to conventional methods. The study successfully developed a system for automated 3D mesh reconstruction of AH from images. It applied GS and Mip-splatting for NeRFs, proving superior in noise reduction for subsequent mesh extraction via surface-aligned Gaussian splatting for efficient 3D mesh reconstruction. This photo-to-mesh pipeline represents a viable step towards HBIM.

(This article belongs to the Special Issue Intelligent Techniques and 3D Virtual Reconstruction for Architectural Heritage)

41 pages, 5668 KiB

Open Access Systematic Review

How Architectural Heritage Is Moving to Smart: A Systematic Review of HBIM

by Huachun Cui and Jiawei Wu

Buildings 2025, 15(15), 2664; https://doi.org/10.3390/buildings15152664 - 28 Jul 2025

Abstract

Heritage Building Information Modeling (HBIM) has emerged as a key tool in advancing heritage conservation and sustainable management. Previous reviews have typically concentrated on specific technical aspects without providing sufficient bibliometric analysis. This study integrates existing HBIM research to identify key research patterns and emerging trends and to forecast future directions. A total of 1516 documents were initially retrieved from the Web of Science Core Collection using targeted search terms. Following relevance screening, 1175 documents were retained as related to the topic. Three bibliometric tools, CiteSpace 6.4.R1, VOSviewer 1.6.20, and Bibliometrix 4.1, were employed to conduct both quantitative and qualitative assessments. The results show three historical phases of HBIM, identify core journals, influential authors, and leading regions, and extract six major keyword clusters: risk assessment, data acquisition, semantic annotation, digital twins, and energy and equipment management. Nine co-citation clusters further outline the foundational literature in the field. The results highlight growing scholarly interest in workflow integration and digital twin applications. Future projections emphasize the transformative potential of artificial intelligence in HBIM, while also recognizing critical implementation barriers, particularly in developing countries and resource-constrained contexts. This study provides a comprehensive and systematic framework for HBIM research, offering valuable insights for scholars, practitioners, and policymakers involved in heritage preservation and digital management.

(This article belongs to the Special Issue Application of Digital Technology in the Preservation and Restoration of Historic Buildings)

25 pages, 837 KiB

Open Access Article

DASF-Net: A Multimodal Framework for Stock Price Forecasting with Diffusion-Based Graph Learning and Optimized Sentiment Fusion

by Nhat-Hai Nguyen, Thi-Thu Nguyen and Quan T. Ngo

J. Risk Financial Manag. 2025, 18(8), 417; https://doi.org/10.3390/jrfm18080417 - 28 Jul 2025

Abstract

Stock price forecasting remains a persistent challenge in time series analysis due to complex inter-stock relationships and dynamic textual signals such as financial news. While Graph Neural Networks (GNNs) can model relational structures, they often struggle with capturing higher-order dependencies and are sensitive to noise. Moreover, sentiment signals are typically aggregated using fixed time windows, which may introduce temporal bias. To address these issues, we propose DASF-Net (Diffusion-Aware Sentiment Fusion Network), a multimodal framework that integrates structural and textual information for robust prediction. DASF-Net leverages diffusion processes over two complementary financial graphs—one based on industry relationships, the other on fundamental indicators—to learn richer stock representations. Simultaneously, sentiment embeddings extracted from financial news using FinBERT are aggregated over an empirically optimized window to preserve temporal relevance. These modalities are fused via a multi-head attention mechanism and passed to a temporal forecasting module. DASF-Net integrates daily stock prices and news sentiment, using a 3-day sentiment aggregation window, to forecast stock prices over daily horizons (1–3 days). Experiments on 12 large-cap S&P 500 stocks over four years demonstrate that DASF-Net outperforms competitive baselines, achieving up to 91.6% relative reduction in Mean Squared Error (MSE). Results highlight the effectiveness of combining graph diffusion and sentiment-aware features for improved financial forecasting. Full article
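The windowed sentiment aggregation mentioned in the abstract can be illustrated with a minimal sketch. It assumes one scalar sentiment score per trading day and a plain trailing mean; the abstract does not state how DASF-Net pools its FinBERT embeddings, so the function name `aggregate_sentiment` and the averaging rule are illustrative assumptions only.

```python
def aggregate_sentiment(daily_scores, window=3):
    """Trailing mean of daily sentiment scores.

    Assumption: each day has a single scalar score and the window is
    averaged uniformly. The first days use a shorter window because
    fewer past scores are available.
    """
    aggregated = []
    for i in range(len(daily_scores)):
        start = max(0, i - window + 1)      # first day inside the window
        chunk = daily_scores[start:i + 1]   # scores for days start..i
        aggregated.append(sum(chunk) / len(chunk))
    return aggregated
```

A trailing window of this kind uses only past and current days, so the aggregated feature for day t never leaks news published after day t.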

(This article belongs to the Special Issue Machine Learning Applications in Finance, 2nd Edition)

21 pages, 3448 KiB

Open Access Article

A Welding Defect Detection Model Based on Hybrid-Enhanced Multi-Granularity Spatiotemporal Representation Learning

by Chenbo Shi, Shaojia Yan, Lei Wang, Changsheng Zhu, Yue Yu, Xiangteng Zang, Aiping Liu, Chun Zhang and Xiaobing Feng

Sensors 2025, 25(15), 4656; https://doi.org/10.3390/s25154656 - 27 Jul 2025

Abstract

Real-time quality monitoring using molten pool images is a critical focus in researching high-quality, intelligent automated welding. To address interference problems in molten pool images under complex welding scenarios (e.g., reflected laser spots from spatter misclassified as porosity defects) and the limited interpretability of deep learning models, this paper proposes a multi-granularity spatiotemporal representation learning algorithm based on the hybrid enhancement of handcrafted and deep learning features. A MobileNetV2 backbone network integrated with a Temporal Shift Module (TSM) is designed to progressively capture the short-term dynamic features of the molten pool and integrate temporal information across both low-level and high-level features. A multi-granularity attention-based feature aggregation module is developed to select key interference-free frames using cross-frame attention, generate multi-granularity features via grouped pooling, and apply the Convolutional Block Attention Module (CBAM) at each granularity level. Finally, these multi-granularity spatiotemporal features are adaptively fused. Meanwhile, an independent branch utilizes the Histogram of Oriented Gradient (HOG) and Scale-Invariant Feature Transform (SIFT) features to extract long-term spatial structural information from historical edge images, enhancing the model’s interpretability. The proposed method achieves an accuracy of 99.187% on a self-constructed dataset. Additionally, it attains a real-time inference speed of 20.983 ms per sample on a hardware platform equipped with an Intel i9-12900H CPU and an RTX 3060 GPU, thus effectively balancing accuracy, speed, and interpretability. Full article

(This article belongs to the Topic Applied Computing and Machine Intelligence (ACMI))

23 pages, 1154 KiB

Open Access Article

Impact of Paintings Made from Waste Materials from a Particular Region on Viewers’ Behavioral Intention Regarding Social and Environmental Issues

by Ryuzo Furukawa and Ayami Tamura

Sustainability 2025, 17(15), 6822; https://doi.org/10.3390/su17156822 - 27 Jul 2025

Abstract

The purpose of this study is to analyze how future landscape paintings using paints and binders made from the waste materials of a particular region and the background information of these artworks affect viewers’ behavioral intentions regarding social and environmental issues. First, 30 beauty evaluation items were extracted by a repertory grid analysis. Then, we asked participants to view the artworks at an exhibition in order to determine the impact of the background information behind the painting and the painting itself on participants’ imagination of the future, reconsidering themselves, social and environmental issues, and their behavioral intentions. The results showed that viewing paintings with their background information and using paints made of waste materials from a particular region improved participants’ behavioral intentions to imagine the future, to reconsider themselves, and to reconsider social and environmental issues. The elements of beauty of the paintings were found to have the potential to trigger the first step toward lifestyle change for sustainability. Full article

17 pages, 2864 KiB

Open Access Article

A Deep-Learning-Based Diffusion Tensor Imaging Pathological Auto-Analysis Method for Cervical Spondylotic Myelopathy

by Shuoheng Yang, Junpeng Li, Ningbo Fei, Guangsheng Li and Yong Hu

Bioengineering 2025, 12(8), 806; https://doi.org/10.3390/bioengineering12080806 - 27 Jul 2025

Abstract

Pathological conditions of the spinal cord have been found to be associated with cervical spondylotic myelopathy (CSM). This study aims to explore the feasibility of automatic deep-learning-based classification of the pathological condition of the spinal cord to quantify its severity. A Diffusion Tensor Imaging (DTI)-based spinal cord pathological assessment method was proposed. A multi-dimensional feature fusion model, referred to as DCSANet-MD (DTI-Based CSM Severity Assessment Network-Multi-Dimensional), was developed to extract both 2D and 3D features from DTI slices, incorporating a feature integration mechanism to enhance the representation of spatial information. To evaluate this method, 176 CSM patients with cervical DTI slices and clinical records were collected. The proposed assessment model demonstrated an accuracy of 82% in predicting two categories of severity levels (mild and severe). Furthermore, in a more refined three-category severity classification (mild, moderate, and severe), using a hierarchical classification strategy, the model achieved an accuracy of approximately 68%, which significantly exceeded the baseline performance. In conclusion, these findings highlight the potential of the deep-learning-based method as a decision-making support tool for DTI-based pathological assessments of CSM, offering great value in monitoring disease progression and guiding the intervention strategies. Full article

(This article belongs to the Special Issue Artificial Intelligence-Based Medical Imaging Processing)

23 pages, 4467 KiB

Open Access Article

Research on Indoor Object Detection and Scene Recognition Algorithm Based on Apriori Algorithm and Mobile-EFSSD Model

by Wenda Zheng, Yibo Ai and Weidong Zhang

Mathematics 2025, 13(15), 2408; https://doi.org/10.3390/math13152408 - 26 Jul 2025

Viewed by 67

Abstract

With the advancement of computer vision and image processing technologies, scene recognition has gradually become a research hotspot. However, in practical applications, it is necessary to detect the categories and locations of objects in images while recognizing scenes. To address these issues, this paper proposes an indoor object detection and scene recognition algorithm based on the Apriori algorithm and the Mobile-EFSSD model, which can simultaneously obtain object category and location information while recognizing scenes. The specific research contents are as follows: (1) To address complex indoor scenes and occlusion, this paper proposes an improved Mobile-EFSSD object detection algorithm. An optimized MobileNetV3 with ECA attention is used as the backbone. Multi-scale feature maps are fused via FPN. The localization loss includes a hyperparameter, and focal loss replaces confidence loss. Experiments show that the method achieves stable performance, effectively detects occluded objects, and accurately extracts category and location information. (2) To improve classification stability in indoor scene recognition, this paper proposes a naive Bayes-based method. Object detection results are converted into text features, and the Apriori algorithm extracts object associations. Prior probabilities are calculated and fed into a naive Bayes classifier for scene recognition. Evaluated using the ADE20K dataset, the method outperforms existing approaches by achieving a better accuracy–speed trade-off and enhanced classification stability. The proposed algorithm is applied to indoor scene images, enabling the simultaneous acquisition of object categories and location information while recognizing scenes. Moreover, the algorithm has a simple structure, with an object detection average precision of 82.7% and a scene recognition average accuracy of 95.23%, making it suitable for practical detection requirements. Full article
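The second stage described above, recognizing a scene from the objects detected in it, can be sketched with a small Bernoulli naive Bayes classifier. This is a simplified illustration under stated assumptions: it omits the Apriori association-mining step, uses Laplace smoothing that the abstract does not mention, and the names `train` and `predict` as well as the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (scene_label, set_of_detected_object_names)."""
    scene_counts = Counter(label for label, _ in samples)
    object_counts = defaultdict(Counter)   # label -> object -> image count
    vocab = set()
    for label, objects in samples:
        for name in objects:
            object_counts[label][name] += 1
            vocab.add(name)
    return scene_counts, object_counts, vocab

def predict(model, objects):
    """Return the scene label with the highest log-posterior."""
    scene_counts, object_counts, vocab = model
    total = sum(scene_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in scene_counts.items():
        score = math.log(n / total)        # scene prior
        for name in vocab:
            # Laplace-smoothed probability that `name` appears in this scene
            p = (object_counts[label][name] + 1) / (n + 2)
            score += math.log(p if name in objects else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For example, after training on a few labelled images, `predict(model, {"sink", "oven"})` returns the scene whose training images most often contained those objects.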

20 pages, 5843 KiB

Open Access Article

Accurate and Robust Train Localization: Fusing Degeneracy-Aware LiDAR-Inertial Odometry and Visual Landmark Correction

by Lin Yue, Peng Wang, Jinchao Mu, Chen Cai, Dingyi Wang and Hao Ren

Sensors 2025, 25(15), 4637; https://doi.org/10.3390/s25154637 - 26 Jul 2025

Viewed by 65

Abstract

To overcome the limitations of current train positioning systems, including low positioning accuracy and heavy reliance on track transponders or GNSS signals, this paper proposes a novel LiDAR-inertial and visual landmark fusion framework. First, an IMU preintegration factor that accounts for the Earth’s rotation and a LiDAR-inertial odometry factor that accounts for degenerate states are constructed to adapt to railway operating environments. Next, a lightweight network based on an improved YOLO model recognizes reflective kilometer posts, while PaddleOCR extracts their numerical codes. High-precision vertex coordinates of the kilometer posts are obtained by jointly using the LiDAR point cloud and the image detection box. A kilometer post factor is then constructed, and the multi-source information is optimized within a factor graph framework. Finally, onboard experiments conducted on real railway vehicles demonstrate high-precision landmark detection at 35 FPS with 94.8% average precision. The proposed method delivers robust positioning with an RMSE within 5 m for high-speed, long-distance train travel, establishing a novel framework for intelligent railway development.

(This article belongs to the Section Navigation and Positioning)

13 pages, 704 KiB

Open Access Article

Population Substructures of Castanopsis tribuloides in Northern Thailand Revealed Using Autosomal STR Variations

by Patcharawadee Thongkumkoon, Jatupol Kampuansai, Maneesawan Dansawan, Pimonrat Tiansawat, Nuttapol Noirungsee, Kittiyut Punchay, Nuttaluck Khamyong and Prasit Wangpakapattanawong

Plants 2025, 14(15), 2306; https://doi.org/10.3390/plants14152306 - 26 Jul 2025

Viewed by 99

Abstract

This study investigates the genetic diversity and population structure of Castanopsis tribuloides, a vital tree species in Asian forest ecosystems. Understanding the genetic patterns of keystone forest species provides critical insights into forest resilience and ecosystem function and informs conservation strategies. We analyzed population samples collected from three distinct locations within Doi Suthep Mountain in northern Thailand using Short Tandem Repeat (STR) markers to assess both intra- and inter-population genetic relationships. DNA was extracted from leaf samples and analyzed using a panel of polymorphic microsatellite loci specifically optimized for Castanopsis species. Statistical analyses included the assessment of forensic parameters (number of alleles, observed and expected heterozygosity, gene diversity, polymorphic information content), population differentiation metrics (G_ST), inbreeding coefficients (F_IS), and gene flow estimates (N_m). We further examined population history through bottleneck analysis using three models (IAM, SMM, and TPM) and visualized genetic relationships through principal coordinate analysis and cluster analysis. Our results revealed significant patterns of genetic structuring across the sampled populations, with genetic distance metrics showing statistically significant differentiation between certain population pairs. The PCA and cluster analyses confirmed distinct population groupings that correspond to geographic distribution patterns. These findings provide the first comprehensive assessment of C. tribuloides population genetics in this region, establishing baseline data for monitoring genetic diversity and informing conservation strategies. This research contributes to our understanding of how landscape features and ecological factors shape genetic diversity patterns in essential forest tree species, with implications for managing forest genetic resources in the face of environmental change. Full article
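Several of the statistics listed in the abstract have standard textbook definitions, sketched below for a single diploid locus. The sketch assumes the uncorrected expected heterozygosity (one minus the sum of squared allele frequencies), the usual F_IS = 1 - H_O / H_E, and the island-model approximation N_m ≈ (1 / G_ST - 1) / 4. The abstract does not say which estimators the authors used, so these choices and the function names are assumptions.

```python
from collections import Counter

def heterozygosity(genotypes):
    """genotypes: list of (allele_a, allele_b) pairs for one STR locus.

    Returns (observed heterozygosity, expected heterozygosity, F_IS).
    Expected heterozygosity is the uncorrected estimate 1 - sum(p_i^2).
    """
    n = len(genotypes)
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n
    h_exp = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    f_is = 1.0 - h_obs / h_exp if h_exp > 0 else 0.0
    return h_obs, h_exp, f_is

def gene_flow(g_st):
    """Island-model approximation of the number of migrants per generation."""
    return (1.0 / g_st - 1.0) / 4.0
```

A positive F_IS indicates fewer heterozygotes than expected under random mating, and a value near zero indicates agreement with Hardy-Weinberg proportions.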

(This article belongs to the Section Plant Genetic Resources)

40 pages, 3221 KiB

Open Access Article

Balancing Multi-Source Heterogeneous User Requirement Information in Complex Product Design

by Cengjuan Wu, Tianlu Zhu, Yajun Li, Zhizheng Zhang and Tianyu Wu

Symmetry 2025, 17(8), 1192; https://doi.org/10.3390/sym17081192 - 25 Jul 2025

Viewed by 83

Abstract

User requirements are the core driving force behind the iterative development of complex products. Their comprehensive collection, accurate interpretation, and effective integration directly affect design outcomes. However, current practices often depend heavily on single-source data and designer intuition, resulting in incomplete, biased, and fragile design decisions. Moreover, multi-source heterogeneous user requirements often exhibit inherent asymmetry and imbalance in both structure and contribution. To address these issues, this study proposes a symmetric and balanced optimization method for multi-source heterogeneous user requirements in complex product design. Multiple acquisition and analysis approaches are integrated to mitigate the limitations of single-source data by fusing complementary information and enabling balanced decision-making. Firstly, unstructured text data from online reviews are used to extract initial user requirements, and a topic analysis method is applied for modeling and clustering. Secondly, user interviews are analyzed using a fuzzy satisfaction analysis, while eye-tracking experiments capture physiological behavior to support correlation analysis between internal preferences and external behavior. Finally, a cooperative game-based model is introduced to optimize conflicts among data sources, ensuring fairness in decision-making. The method was validated using a case study of oxygen concentrators. The findings demonstrate improvements in both decision robustness and requirement representation. Full article

(This article belongs to the Section Engineering and Materials)

14 pages, 6202 KiB

Open Access Article

Masked Channel Modeling Enables Vision Transformers to Learn Better Semantics

by Jiayi Chen, Yanbiao Ma, Wei Dai and Zhihao Li

Entropy 2025, 27(8), 794; https://doi.org/10.3390/e27080794 - 25 Jul 2025

Viewed by 86

Abstract

Leveraging the ability of Vision Transformers (ViTs) to model contextual information across spatial patches, Masked Image Modeling (MIM) has emerged as a successful pre-training paradigm for visual representation learning by masking parts of the input and reconstructing the original image. However, this characteristic of ViTs has led many existing MIM methods to focus primarily on spatial patch reconstruction, overlooking the importance of semantic continuity in the channel dimension. Therefore, we propose a novel Masked Channel Modeling (MCM) pre-training paradigm, which reconstructs masked channel features using the contextual information from unmasked channels, thereby enhancing the model’s understanding of images from the perspective of channel semantic continuity. Considering that traditional RGB reconstruction targets lack sufficient semantic attributes in the channel dimension, MCM introduces advanced features extracted by the CLIP image encoder as reconstruction targets. This guides the model to better capture semantic continuity across feature channels. Extensive experiments on downstream tasks, including image classification, object detection, and semantic segmentation, demonstrate the effectiveness and superiority of MCM. Our code will be available later. Full article

(This article belongs to the Section Information Theory, Probability and Statistics)

26 pages, 16392 KiB

Open Access Article

TOSD: A Hierarchical Object-Centric Descriptor Integrating Shape, Color, and Topology

by Jun-Hyeon Choi, Jeong-Won Pyo, Ye-Chan An and Tae-Yong Kuc

Sensors 2025, 25(15), 4614; https://doi.org/10.3390/s25154614 - 25 Jul 2025

Viewed by 179

Abstract

This paper introduces a hierarchical object-centric descriptor framework called TOSD (Triplet Object-Centric Semantic Descriptor). The goal of this method is to overcome the limitations of existing pixel-based and global feature embedding approaches. To this end, the framework adopts a hierarchical representation that is explicitly designed for multi-level reasoning. TOSD combines shape, color, and topological information without depending on predefined class labels. The shape descriptor captures the geometric configuration of each object. The color descriptor focuses on internal appearance by extracting normalized color features. The topology descriptor models the spatial and semantic relationships between objects in a scene. These components are integrated at both object and scene levels to produce compact and consistent embeddings. The resulting representation covers three levels of abstraction: low-level pixel details, mid-level object features, and high-level semantic structure. This hierarchical organization makes it possible to represent both local cues and global context in a unified form. We evaluate the proposed method on multiple vision tasks. The results show that TOSD performs competitively compared to baseline methods, while maintaining robustness in challenging cases such as occlusion and viewpoint changes. The framework is applicable to visual odometry, SLAM, object tracking, global localization, scene clustering, and image retrieval. In addition, this work extends our previous research on the Semantic Modeling Framework, which represents environments using layered structures of places, objects, and their ontological relations. Full article

(This article belongs to the Special Issue Event-Driven Vision Sensor Architectures and Application Scenarios)

16 pages, 589 KiB

Open Access Article

CT-Based Radiomics Enhance Respiratory Function Analysis for Lung SBRT

by Alice Porazzi, Mattia Zaffaroni, Vanessa Eleonora Pierini, Maria Giulia Vincini, Aurora Gaeta, Sara Raimondi, Lucrezia Berton, Lars Johannes Isaksson, Federico Mastroleo, Sara Gandini, Monica Casiraghi, Gaia Piperno, Lorenzo Spaggiari, Juliana Guarize, Stefano Maria Donghi, Łukasz Kuncman, Roberto Orecchia, Stefania Volpe and Barbara Alicja Jereczek-Fossa

Bioengineering 2025, 12(8), 800; https://doi.org/10.3390/bioengineering12080800 - 25 Jul 2025

Viewed by 181

Abstract

Introduction: Radiomics is the extraction of non-invasive and reproducible quantitative imaging features, which may yield mineable information for clinical practice implementation. Quantification of lung function through radiomics could play a role in the management of patients with pulmonary lesions. The aim of this study is to test the capability of radiomic features to predict pulmonary function parameters, focusing on the diffusing capacity of the lungs for carbon monoxide (DL_CO). Methods: Retrospective data were retrieved from the electronic medical records of patients treated with Stereotactic Body Radiation Therapy (SBRT) at a single institution. Inclusion criteria were as follows: (1) SBRT treatment performed for primary early-stage non-small cell lung cancer (ES-NSCLC) or oligometastatic lung nodules, (2) availability of a simulation four-dimensional computed tomography (4DCT) scan, (3) availability of baseline spirometry data, (4) availability of baseline clinical data, and (5) written informed consent for the anonymized use of data. The gross tumor volume (GTV) was segmented on the 4DCT reconstructed phases representing maximum inhalation and maximum exhalation (Phase 0 and Phase 50, respectively), and radiomic features were extracted from the lung parenchyma after subtracting the lesion(s). An iterative algorithm clustered the features based on correlation, keeping only those most associated with baseline and post-treatment DL_CO. Three models were built to predict DL_CO abnormality: the clinical model, containing clinical information; the radiomic model, containing the radiomic score; and the clinical-radiomic model, containing both clinical information and the radiomic score. Each of these models was constructed in three variants: Model 1, based on the features in Phase 0; Model 2, based on the features in Phase 50; and Model 3, based on the difference between the two phases. The AUC was used to compare their performances.
Results: A total of 98 patients met the inclusion criteria. The Charlson Comorbidity Index (CCI) scored as the clinical variable most associated with baseline DL_CO (p = 0.014), while the most associated features were mainly texture features and similar among the two phases. Clinical-radiomic models were the best at predicting both baseline and post-treatment abnormal DL_CO. In particular, the performances for the three clinical-radiomic models at predicting baseline abnormal DL_CO were AUC₁ = 0.72, AUC₂ = 0.72, and AUC₃ = 0.75, for Model 1, Model 2, and Model 3, respectively. Regarding the prediction of post-treatment abnormal DL_CO, the performances of the three clinical-radiomic models were AUC₁ = 0.91, AUC₂ = 0.91, and AUC₃ = 0.95, for Model 1, Model 2, and Model 3, respectively. Conclusions: This study demonstrates that radiomic features extracted from healthy lung parenchyma on a 4DCT scan are associated with baseline pulmonary function parameters, showing that radiomics can add a layer of information in surrogate models for lung function assessment. Preliminary results suggest the potential applicability of these models for predicting post-SBRT lung function, warranting validation in larger, prospective cohorts. Full article

(This article belongs to the Special Issue Engineering the Future of Radiotherapy: Innovations and Challenges)

28 pages, 42031 KiB

Open Access Article

A Building Crack Detection UAV System Based on Deep Learning and Linear Active Disturbance Rejection Control Algorithm

by Lei Zhang, Lili Gong, Le Wang, Zhou Wang and Song Yan

Electronics 2025, 14(15), 2975; https://doi.org/10.3390/electronics14152975 - 25 Jul 2025

Viewed by 96

Abstract

This paper presents a UAV-based building crack real-time detection system that integrates an improved YOLOv8 algorithm with Linear Active Disturbance Rejection Control (LADRC). The system is equipped with a high-resolution camera and sensors to capture high-definition images and height information. First, a trajectory tracking controller based on LADRC was designed for the UAV, which uses a linear extended state observer to estimate and compensate for unknown disturbances such as wind interference, significantly enhancing the flight stability of the UAV in complex environments and ensuring stable crack image acquisition. Secondly, we integrated Convolutional Block Attention Module (CBAM) into the YOLOv8 model, dynamically enhancing crack feature extraction through both channel and spatial attention mechanisms, thereby improving recognition robustness in complex backgrounds. Lastly, a skeleton extraction algorithm was applied for the secondary processing of the segmented cracks, enabling precise calculations of crack length and average width and outputting the results to a user interface for visualization. The experimental results demonstrate that the system successfully identifies and extracts crack regions, accurately calculates crack dimensions, and enables real-time monitoring through high-speed data transmission to the ground station. Compared to traditional manual inspection methods, the system significantly improves detection efficiency while maintaining high accuracy and reliability. Full article
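The disturbance-rejection idea behind LADRC can be illustrated on a deliberately simplified plant. The sketch below assumes a first-order system y' = d + b0·u with an unknown constant disturbance d (standing in for wind), a second-order linear extended state observer with both poles at -omega_o, and forward-Euler integration. The UAV controller in the paper is certainly more elaborate; all names, gains, and the plant model here are illustrative assumptions.

```python
def simulate_ladrc(d=2.0, r=1.0, b0=1.0, omega_o=20.0, kp=5.0,
                   dt=0.001, steps=3000):
    """Track reference r on the plant y' = d + b0*u using LADRC.

    z1 estimates the output y and z2 estimates the total disturbance d.
    Returns the final (y, z1, z2).
    """
    beta1, beta2 = 2.0 * omega_o, omega_o ** 2   # observer gains
    y = z1 = z2 = 0.0
    for _ in range(steps):
        # Control law: proportional action minus the estimated disturbance
        u = (kp * (r - z1) - z2) / b0
        err = y - z1                              # observer output error
        y_next = y + dt * (d + b0 * u)            # plant (unknown d)
        z1 += dt * (z2 + b0 * u + beta1 * err)    # output estimate
        z2 += dt * (beta2 * err)                  # disturbance estimate
        y = y_next
    return y, z1, z2
```

With these example gains, the disturbance estimate z2 settles at d and the output settles at r, which is the sense in which the observer "estimates and compensates" for an unmodelled disturbance.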
