Search Results (1,128)

Search Parameters:
Keywords = pixel-level feature

23 pages, 11949 KB  
Article
MDAS-YOLO: A Lightweight Adaptive Framework for Multi-Scale and Dense Pest Detection in Apple Orchards
by Bo Ma, Jiawei Xu, Ruofei Liu, Junlin Mu, Biye Li, Rongsen Xie, Shuangxi Liu, Xianliang Hu, Yongqiang Zheng, Hongjian Zhang and Jinxing Wang
Horticulturae 2025, 11(11), 1273; https://doi.org/10.3390/horticulturae11111273 - 22 Oct 2025
Abstract
Accurate monitoring of orchard pests is vital for green and efficient apple production. Yet images captured by intelligent pest-monitoring lamps often contain small targets, weak boundaries, and crowded scenes, which hamper detection accuracy. We present MDAS-YOLO, a lightweight detection framework tailored for smart pest monitoring in apple orchards. At the input stage, we adopt the LIME++ enhancement to mitigate low illumination and non-uniform lighting, improving image quality at the source. On the model side, we integrate three structural innovations: (1) a C3k2-MESA-DSM module in the backbone to explicitly strengthen contours and fine textures via multi-scale edge enhancement and dual-domain feature selection; (2) an AP-BiFPN in the neck to achieve adaptive cross-scale fusion through learnable weighting and differentiated pooling; and (3) a SimAM block before the detection head to perform zero-parameter, pixel-level saliency re-calibration, suppressing background redundancy without extra computation. On a self-built apple-orchard pest dataset, MDAS-YOLO attains 95.68% mAP, outperforming YOLOv11n by 6.97 percentage points while maintaining a superior trade-off among accuracy, model size, and inference speed. Overall, the proposed synergistic pipeline—input enhancement, early edge fidelity, mid-level adaptive fusion, and end-stage lightweight re-calibration—effectively addresses small-scale, weak-boundary, and densely distributed pests, providing a promising and regionally validated approach for intelligent pest monitoring and sustainable orchard management, and offering methodological insights for future multi-regional pest monitoring research. Full article
(This article belongs to the Section Insect Pest Management)
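The "zero-parameter, pixel-level saliency re-calibration" refers to a SimAM-style attention block. Below is a minimal PyTorch sketch of that general mechanism, assuming the standard SimAM energy formulation; the regularizer value and the exact placement before the detection head are illustrative, not taken from the paper.

```python
import torch

def simam(x: torch.Tensor, e_lambda: float = 1e-4) -> torch.Tensor:
    """Zero-parameter spatial attention in the spirit of SimAM.

    x: feature map of shape (N, C, H, W). Each pixel is re-weighted by a
    sigmoid of its energy, computed from its deviation from the channel mean.
    """
    n = x.shape[2] * x.shape[3] - 1                      # pixels per channel minus one
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)    # squared deviation per pixel
    v = d.sum(dim=(2, 3), keepdim=True) / n              # channel-wise variance estimate
    e_inv = d / (4 * (v + e_lambda)) + 0.5               # inverse energy: higher = more salient
    return x * torch.sigmoid(e_inv)                      # pixel-level re-calibration, no learnable weights

feats = torch.randn(1, 64, 80, 80)
recalibrated = simam(feats)
```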

29 pages, 48102 KB  
Article
Infrared Temporal Differential Perception for Space-Based Aerial Targets
by Lan Guo, Xin Chen, Cong Gao, Zhiqi Zhao and Peng Rao
Remote Sens. 2025, 17(20), 3487; https://doi.org/10.3390/rs17203487 - 20 Oct 2025
Abstract
Space-based infrared (IR) detection, with wide coverage, all-time operation, and stealth, is crucial for aerial target surveillance. Under low signal-to-noise ratio (SNR) conditions, however, its small target size, limited features, and strong clutter often lead to missed detections and false alarms, reducing stability and real-time performance. To overcome these limitations of energy-integration imaging in perceiving dim targets, this paper proposes a biomimetic vision-inspired Infrared Temporal Differential Detection (ITDD) method. The ITDD method generates sparse event streams by triggering on pixel-level radiation variations and establishes an irradiance-based sensitivity model with optimized threshold voltage, spectral bands, and optical aperture parameters. IR sequences are converted into differential event streams with inherent noise, upon which a lightweight multi-modal fusion detection network is developed. Simulation experiments demonstrate that ITDD reduces data volume by three orders of magnitude and improves the SNR by a factor of 4.21. On the SITP-QLEF dataset, the network achieves a detection rate of 99.31% and a false alarm rate of 1.97 × 10⁻⁵, confirming its effectiveness and application potential under complex backgrounds. As the current findings are based on simulated data, future work will focus on building an ITDD demonstration system to validate the approach with real-world IR measurements. Full article
(This article belongs to the Special Issue Deep Learning-Based Small-Target Detection in Remote Sensing)
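As a rough illustration of the temporal-differential idea (emitting sparse, signed events wherever the frame-to-frame change at a pixel exceeds a trigger threshold), the numpy sketch below may help; the threshold value and event format are assumptions and do not reproduce the paper's irradiance-based sensitivity model.

```python
import numpy as np

def differential_events(frames: np.ndarray, threshold: float = 0.05):
    """Convert an IR frame stack (T, H, W), values in [0, 1], into sparse events.

    An event (t, y, x, polarity) is emitted wherever the frame-to-frame change
    in pixel intensity exceeds the trigger threshold.
    """
    events = []
    for t in range(1, frames.shape[0]):
        diff = frames[t].astype(np.float32) - frames[t - 1].astype(np.float32)
        ys, xs = np.nonzero(np.abs(diff) > threshold)
        for y, x in zip(ys, xs):
            events.append((t, int(y), int(x), 1 if diff[y, x] > 0 else -1))
    return events

frames = np.random.rand(10, 64, 64).astype(np.float32)
print(len(differential_events(frames)))   # sparse event count vs. 10*64*64 raw pixels
```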

18 pages, 11753 KB  
Article
SemiSeg-CAW: Semi-Supervised Segmentation of Ultrasound Images by Leveraging Class-Level Information and an Adaptive Multi-Loss Function
by Somayeh Barzegar and Naimul Khan
Mach. Learn. Knowl. Extr. 2025, 7(4), 124; https://doi.org/10.3390/make7040124 - 20 Oct 2025
Abstract
The limited availability of pixel-level annotated medical images complicates training supervised segmentation models, as these models require large datasets. To deal with this issue, SemiSeg-CAW, a semi-supervised segmentation framework that leverages class-level information and an adaptive multi-loss function, is proposed to reduce dependency on extensive annotations. The model combines segmentation and classification tasks in a multitask architecture that includes segmentation, classification, weight generation, and ClassElevateSeg modules. In this framework, the ClassElevateSeg module is initially pre-trained and then fine-tuned jointly with the main model to produce auxiliary feature maps that support the main model, while the adaptive weighting strategy computes a dynamic combination of classification and segmentation losses using trainable weights. The proposed approach enables effective use of both labeled and unlabeled images with class-level information by compensating for the shortage of pixel-level labels. Experimental evaluation on two public ultrasound datasets demonstrates that SemiSeg-CAW consistently outperforms fully supervised segmentation models when trained with equal or fewer labeled samples. The results suggest that incorporating class-level information with adaptive loss weighting provides an effective strategy for semi-supervised medical image segmentation and can improve the segmentation performance in situations with limited annotations. Full article
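A common way to realize an adaptive combination of segmentation and classification losses with trainable weights is homoscedastic-uncertainty weighting, sketched below in PyTorch. The paper's weight-generation module is more elaborate, so the parameterization here is only an assumption.

```python
import torch
import torch.nn as nn

class AdaptiveMultiLoss(nn.Module):
    """Combine segmentation and classification losses with trainable weights."""

    def __init__(self):
        super().__init__()
        # Log-variance parameters, learned jointly with the network.
        self.log_var_seg = nn.Parameter(torch.zeros(1))
        self.log_var_cls = nn.Parameter(torch.zeros(1))

    def forward(self, loss_seg: torch.Tensor, loss_cls: torch.Tensor) -> torch.Tensor:
        w_seg = torch.exp(-self.log_var_seg)
        w_cls = torch.exp(-self.log_var_cls)
        # The additive terms regularize the weights so they cannot collapse to zero.
        return w_seg * loss_seg + w_cls * loss_cls + self.log_var_seg + self.log_var_cls

criterion = AdaptiveMultiLoss()
total = criterion(torch.tensor(0.8), torch.tensor(0.3))
```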

17 pages, 10635 KB  
Article
Hybrid Convolutional Transformer with Dynamic Prompting for Adaptive Image Restoration
by Jinmei Zhang, Guorong Chen, Junliang Yang, Qingru Zhang, Shaofeng Liu and Weijie Zhang
Mathematics 2025, 13(20), 3329; https://doi.org/10.3390/math13203329 - 19 Oct 2025
Abstract
High-quality image restoration (IR) is a fundamental task in computer vision, aiming to recover a clear image from its degraded version. Prevailing methods typically employ a static inference pipeline, neglecting the spatial variability of image content and degradation, which makes it difficult for them to adaptively handle complex and diverse restoration scenarios. To address this issue, we propose a novel adaptive image restoration framework named Hybrid Convolutional Transformer with Dynamic Prompting (HCTDP). Our approach introduces two key architectural innovations: a Spatially Aware Dynamic Prompt Head Attention (SADPHA) module, which performs fine-grained local restoration by generating spatially variant prompts through real-time analysis of image content, and a Gated Skip-Connection (GSC) module that refines multi-scale feature flow using efficient channel attention. To guide the network in generating more visually plausible results, the framework is optimized with a hybrid objective function that combines a pixel-wise L1 loss and a feature-level perceptual loss. Extensive experiments on multiple public benchmarks, including image deraining, dehazing, and denoising, demonstrate that our proposed HCTDP exhibits superior performance in both quantitative and qualitative evaluations, validating the effectiveness of the adaptive restoration framework while utilizing fewer parameters than key competitors. Full article
(This article belongs to the Special Issue Intelligent Mathematics and Applications)
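The hybrid objective (pixel-wise L1 plus a feature-level perceptual term) is a standard construction; one possible sketch using VGG16 features follows. The chosen feature layer and perceptual weight are assumptions, not the paper's settings, and input normalization is omitted for brevity.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class HybridLoss(nn.Module):
    """Pixel-wise L1 loss plus a VGG-feature perceptual loss."""

    def __init__(self, perceptual_weight: float = 0.1):
        super().__init__()
        self.vgg = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)          # fixed feature extractor
        self.l1 = nn.L1Loss()
        self.w = perceptual_weight

    def forward(self, restored: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        pixel = self.l1(restored, target)                      # pixel-wise fidelity
        perceptual = self.l1(self.vgg(restored), self.vgg(target))  # feature-level similarity
        return pixel + self.w * perceptual

loss_fn = HybridLoss()
loss = loss_fn(torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128))
```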

20 pages, 11103 KB  
Data Descriptor
VitralColor-12: A Synthetic Twelve-Color Segmentation Dataset from GPT-Generated Stained-Glass Images
by Martín Montes Rivera, Carlos Guerrero-Mendez, Daniela Lopez-Betancur, Tonatiuh Saucedo-Anaya, Manuel Sánchez-Cárdenas and Salvador Gómez-Jiménez
Data 2025, 10(10), 165; https://doi.org/10.3390/data10100165 - 18 Oct 2025
Abstract
The segmentation and classification of color are crucial stages in image processing, computer vision, and pattern recognition, as they significantly impact the results. The diverse, hand-labeled datasets in the literature are applied for monochromatic or color segmentation in specific domains. On the other hand, synthetic datasets are generated using statistics, artificial intelligence algorithms, or generative artificial intelligence (AI); the latter includes Large Language Models (LLMs), Generative Adversarial Networks (GANs), and Variational Autoencoders (VAEs), among others. In this work, we propose VitralColor-12, a synthetic dataset for color classification and segmentation comprising twelve colors: black, blue, brown, cyan, gray, green, orange, pink, purple, red, white, and yellow. VitralColor-12 addresses the limitations of color segmentation and classification datasets by leveraging the capabilities of LLMs, including adaptability, variability, copyright-free content, and lower-cost data, properties that are desirable in image datasets. VitralColor-12 includes pixel-level classification labels and segmentation maps, which makes the dataset broadly applicable and highly variable for a range of computer vision applications. VitralColor-12 uses GPT-5 and DALL·E 3 to generate stained-glass images. These images simplify the annotation process, since stained-glass images have isolated colors with distinct boundaries within the steel structure, which provide easy regions to label with a single color per region. Once we obtain the images, we use at least one hand-labeled centroid per color to automatically cluster all pixels based on Euclidean distance and morphological operations, including erosion and dilation. This process enables us to automatically label a classification dataset and generate segmentation maps. Our dataset comprises 910 images: 70 generated images and 12 pixel segmentation maps per image, one for each color, which together include 9,509,524 labeled pixels, 1,794,758 of which are unique. These annotated pixels are represented by RGB, HSL, CIELAB, and YCbCr values, enabling a detailed color analysis. Moreover, VitralColor-12 offers features that address gaps in public resources, such as violin diagrams of color frequencies across images, per-color channel histograms, 3D color maps, descriptive statistics, and standardized metrics such as ΔE76, ΔE94, and CIELAB chromaticity, which demonstrate the dataset's distribution and applicability and reveal realistic perceptual structure (warm, neutral, and cold colors, and high contrast between black and white), providing meaningful perceptual clusters and reinforcing its utility for color segmentation and classification. Full article
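The automatic labeling step (assigning every pixel to its nearest hand-labeled color centroid by Euclidean distance and cleaning the per-color masks with erosion and dilation) can be sketched as follows; the centroid values, the RGB distance space, and the structuring-element settings are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def label_by_centroids(image: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """image: (H, W, 3) RGB array; centroids: (K, 3), one RGB value per color class.
    Returns an (H, W) map of nearest-centroid indices (Euclidean distance in RGB)."""
    dists = np.linalg.norm(image[:, :, None, :].astype(float) - centroids[None, None, :, :], axis=-1)
    return np.argmin(dists, axis=-1)

def clean_mask(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Morphological erosion followed by dilation to remove speckle noise."""
    eroded = binary_erosion(mask, iterations=iterations)
    return binary_dilation(eroded, iterations=iterations)

img = (np.random.rand(32, 32, 3) * 255).astype(np.uint8)
cents = np.array([[0, 0, 0], [255, 0, 0], [255, 255, 255]])   # black, red, white (illustrative)
labels = label_by_centroids(img, cents)
red_mask = clean_mask(labels == 1)                             # per-color segmentation map
```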

29 pages, 7838 KB  
Article
MSLNet and Perceptual Grouping for Guidewire Segmentation and Localization
by Adrian Barbu
Sensors 2025, 25(20), 6426; https://doi.org/10.3390/s25206426 - 17 Oct 2025
Abstract
Fluoroscopy (real-time X-ray) images are used for monitoring minimally invasive coronary angioplasty operations such as stent placement. During these operations, a thin wire called a guidewire is used to guide different tools, such as a stent or a balloon, in order to repair the vessels. However, fluoroscopy images are noisy, and the guidewire is very thin, practically invisible in many places, making its localization very difficult. Guidewire segmentation is the task of finding the guidewire pixels, while guidewire localization is the higher-level task aimed at finding a parameterized curve describing the guidewire points. This paper presents a method for guidewire localization that starts from a guidewire segmentation, from which it extracts a number of initial curves as pixel chains and uses a novel perceptual grouping method to merge these initial curves into a small number of curves. The paper also introduces a novel guidewire segmentation method that uses a residual network (ResNet) as a feature extractor and predicts a coarse segmentation that is refined only in promising locations to a fine segmentation. Experiments on two challenging datasets, one with 871 frames and one with 23,449 frames, show that the method obtains results competitive with existing segmentation methods such as Res-UNet and nnU-Net, while having no skip connections and a faster inference time. Full article
(This article belongs to the Special Issue Advanced Deep Learning for Biomedical Sensing and Imaging)
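As a schematic illustration of grouping pixel chains into longer curves, the toy sketch below merges chains whose endpoints lie close together; the paper's perceptual-grouping method also uses direction and other cues, so this should be read only as a simplified stand-in.

```python
import numpy as np

def merge_chains(chains, max_gap: float = 10.0):
    """Greedy grouping of pixel chains: repeatedly join a pair of chains whose
    endpoints are closer than max_gap. Each chain is an (N, 2) array of
    (row, col) points; a toy stand-in for the paper's perceptual grouping."""
    chains = [np.asarray(c, dtype=float) for c in chains]
    merged = True
    while merged and len(chains) > 1:
        merged = False
        for i in range(len(chains)):
            for j in range(i + 1, len(chains)):
                gap = np.linalg.norm(chains[i][-1] - chains[j][0])
                if gap < max_gap:
                    chains[i] = np.vstack([chains[i], chains[j]])
                    del chains[j]
                    merged = True
                    break
            if merged:
                break
    return chains

curves = merge_chains([[(0, 0), (0, 5)], [(0, 8), (0, 15)], [(30, 30), (40, 40)]])
print(len(curves))   # the first two chains are grouped; the distant one stays separate
```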

19 pages, 2488 KB  
Article
Unsupervised Segmentation of Bolus and Residue in Videofluoroscopy Swallowing Studies
by Farnaz Khodami, Mehdy Dousty, James L. Coyle and Ervin Sejdić
J. Imaging 2025, 11(10), 368; https://doi.org/10.3390/jimaging11100368 - 17 Oct 2025
Abstract
Bolus tracking is a critical component of swallowing analysis, as the speed, course, and integrity of bolus movement from the mouth to the stomach, along with the presence of residue, serve as key indicators of potential abnormalities. Existing machine learning approaches for videofluoroscopic swallowing study (VFSS) analysis heavily rely on annotated data and often struggle to detect residue, which is visually subtle and underrepresented. This study proposes an unsupervised architecture to segment both bolus and residue, marking the first successful machine learning-based residue segmentation in swallowing analysis with quantitative evaluation. We introduce an unsupervised convolutional autoencoder that segments bolus and residue without requiring pixel-level annotations. To address the locality bias inherent in convolutional architectures, we incorporate positional encoding into the input representation, enabling the model to capture global spatial context. The proposed model was validated on a diverse set of VFSS images annotated by certified raters. Our method achieves an intersection over union (IoU) of 61% for bolus segmentation—comparable to state-of-the-art supervised methods—and 52% for residue detection. Despite not using pixel-wise labels for training, our model significantly outperforms top-performing supervised baselines in residue detection, as confirmed by statistical testing. These findings suggest that learning from negative space provides a robust and generalizable pathway for detecting clinically significant but sparsely represented features like residue. Full article
(This article belongs to the Section Medical Imaging)
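One simple way to inject positional information into a convolutional autoencoder's input is to concatenate normalized coordinate channels, as sketched below; the paper's positional encoding may differ, so treat this as an assumption about the general idea rather than the exact implementation.

```python
import torch

def add_coordinate_channels(x: torch.Tensor) -> torch.Tensor:
    """Append normalized (row, col) coordinate maps to an image batch (N, C, H, W),
    giving the convolutional encoder access to global spatial position."""
    n, _, h, w = x.shape
    ys = torch.linspace(-1.0, 1.0, h, device=x.device).view(1, 1, h, 1).expand(n, 1, h, w)
    xs = torch.linspace(-1.0, 1.0, w, device=x.device).view(1, 1, 1, w).expand(n, 1, h, w)
    return torch.cat([x, ys, xs], dim=1)

frames = torch.rand(4, 1, 128, 128)                 # grayscale VFSS frames (illustrative shape)
encoder_input = add_coordinate_channels(frames)     # shape (4, 3, 128, 128)
```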

22 pages, 51772 KB  
Article
On a Software Framework for Automated Pore Identification and Quantification for SEM Images of Metals
by Michael Mulligan, Oliver Fowler, Joshua Voell, Mark Atwater and Howie Fang
Computers 2025, 14(10), 442; https://doi.org/10.3390/computers14100442 - 16 Oct 2025
Abstract
The functional performance of porous metals and alloys is dictated by pore features such as size, connectivity, and morphology. While methods like mercury porosimetry or gas pycnometry provide cumulative information, direct observation via scanning electron microscopy (SEM) offers detailed insights unavailable through other means, especially for microscale or nanoscale pores. Each scanned image can contain hundreds or thousands of pores, making efficient identification, classification, and quantification challenging due to the processing time required for pixel-level edge recognition. Traditionally, pore outlines on scanned images were hand-traced and analyzed using image-processing software, a process that is time-consuming and often inconsistent for capturing both large and small pores while accurately removing noise. In this work, a software framework was developed that leverages modern computing tools and methodologies for automated image processing for pore identification, classification, and quantification. Vectorization was implemented as the final step to utilize the direction and magnitude of unconnected endpoints to reconstruct incomplete or broken edges. Combined with other existing pore analysis methods, this automated approach reduces manual effort dramatically, reducing analysis time from multiple hours per image to only minutes, while maintaining acceptable accuracy in quantified pore metrics. Full article
(This article belongs to the Section Human–Computer Interactions)
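For context on the quantification step, the sketch below derives per-pore size and shape metrics from a binary pore mask with scikit-image; it is a generic illustration (with an assumed pixel size), not the paper's framework, which additionally reconstructs broken edges through vectorization.

```python
import numpy as np
from skimage.measure import label, regionprops

def quantify_pores(pore_mask: np.ndarray, pixel_size_um: float = 0.1):
    """Label connected pores in a binary mask and report area, equivalent
    diameter, and circularity for each pore (pixel_size_um is an assumed scale)."""
    results = []
    for region in regionprops(label(pore_mask)):
        area_um2 = region.area * pixel_size_um ** 2
        diameter_um = region.equivalent_diameter * pixel_size_um
        circularity = 4 * np.pi * region.area / max(region.perimeter, 1) ** 2
        results.append({"area_um2": area_um2, "diameter_um": diameter_um,
                        "circularity": circularity})
    return results

mask = np.zeros((64, 64), dtype=bool)
mask[10:20, 10:20] = True          # one synthetic square pore
print(quantify_pores(mask))
```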

22 pages, 6107 KB  
Article
FPGA-Based Real-Time Deblurring and Enhancement for UAV-Captured Infrared Imagery
by Jianghua Cheng, Lehao Pan, Tong Liu, Bang Cheng and Yahui Cai
Remote Sens. 2025, 17(20), 3446; https://doi.org/10.3390/rs17203446 - 15 Oct 2025
Abstract
In response to the inherent limitations of uncooled infrared imaging devices and the image degradation caused by UAV (Unmanned Aerial Vehicle) platform motion, which together result in low contrast and blurred details, a novel single-image blind deblurring and enhancement network is proposed for UAV infrared imagery. This network achieves global blind deblurring and local feature enhancement, laying a foundation for subsequent high-level vision tasks. The proposed architecture comprises three key modules: feature extraction, feature fusion, and simulated diffusion. Furthermore, a region-specific pixel loss is introduced to strengthen local feature perception, while a progressive training strategy is adopted to optimize model performance. Experimental results on public infrared datasets demonstrate that the presented method outperforms the state-of-the-art method HCTIRdeblur, reducing parameter count by 18.4%, improving PSNR by 10.7%, and decreasing edge inference time by 25.6%. This work addresses critical challenges in UAV infrared image processing and offers a promising solution for real-world applications. Full article
(This article belongs to the Special Issue Advances in Deep Learning Approaches: UAV Data Analysis)
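A region-specific pixel loss of the kind mentioned above typically up-weights reconstruction error inside regions of interest. The sketch below shows one hedged interpretation; the mask source and weighting factor are assumptions.

```python
import torch

def region_weighted_l1(pred: torch.Tensor, target: torch.Tensor,
                       region_mask: torch.Tensor, region_weight: float = 4.0) -> torch.Tensor:
    """L1 loss where pixels inside region_mask (e.g. target or edge regions)
    contribute region_weight times more than background pixels."""
    weights = 1.0 + (region_weight - 1.0) * region_mask.float()
    return (weights * (pred - target).abs()).sum() / weights.sum()

pred = torch.rand(1, 1, 64, 64)
target = torch.rand(1, 1, 64, 64)
mask = torch.zeros(1, 1, 64, 64)
mask[..., 20:40, 20:40] = 1            # assumed region of interest
loss = region_weighted_l1(pred, target, mask)
```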

22 pages, 3532 KB  
Article
Dual Weakly Supervised Anomaly Detection and Unsupervised Segmentation for Real-Time Railway Perimeter Intrusion Monitoring
by Donghua Wu, Yi Tian, Fangqing Gao, Xiukun Wei and Changfan Wang
Sensors 2025, 25(20), 6344; https://doi.org/10.3390/s25206344 - 14 Oct 2025
Abstract
The high operational velocities of high-speed trains impose constraints on onboard track intrusion detection systems for real-time capture and analysis, including limited computational resources and motion blur. This underscores the critical need for track perimeter intrusion monitoring systems. Consequently, an intelligent monitoring system employing trackside cameras is constructed, integrating weakly supervised video anomaly detection and unsupervised foreground segmentation, offering a solution for monitoring foreign objects on high-speed train tracks. To address the challenges of complex dataset annotation and the detection of unidentified targets, a weakly supervised, video-based detection approach is proposed for foreign object intrusions. Pretraining of Xception3D and the integration of multiple attention mechanisms markedly enhance feature extraction, and Top-K sample selection together with the amplitude score/feature loss function effectively discriminates abnormal from normal samples, with time-smoothing constraints ensuring detection consistency across consecutive frames. Once abnormal video frames are identified, a multiscale variational autoencoder is proposed for localizing foreign objects. A downsampling/upsampling module is optimized to increase feature extraction efficiency, and a pixel-level background weight distribution loss function is engineered to jointly balance background authenticity and noise resistance. The experimental results indicate that the video anomaly detection model achieves an AUC of 0.99 on the track anomaly detection dataset and processes 2 s video segments in 0.41 s. The proposed foreground segmentation algorithm achieves an F1 score of 0.9030 on the track anomaly dataset and 0.8375 on CDnet2014 at 91 frames per second, confirming its efficacy. Full article
(This article belongs to the Section Sensing and Imaging)
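The Top-K sample selection follows the multiple-instance spirit: only the highest-scoring clips of a video contribute to the video-level loss. A minimal sketch is given below; the value of K and the loss form are assumptions, and the paper additionally uses amplitude score/feature losses and temporal smoothing.

```python
import torch
import torch.nn.functional as F

def topk_video_loss(clip_scores: torch.Tensor, video_label: torch.Tensor, k: int = 3) -> torch.Tensor:
    """clip_scores: (N_clips,) anomaly scores in [0, 1] for one video.
    video_label: scalar 1.0 (abnormal video) or 0.0 (normal video).
    The mean of the top-k clip scores is trained toward the weak video-level label."""
    topk = torch.topk(clip_scores, k=min(k, clip_scores.numel())).values
    video_score = topk.mean()
    return F.binary_cross_entropy(video_score, video_label)

scores = torch.sigmoid(torch.randn(16))          # per-clip scores from the detector
loss = topk_video_loss(scores, torch.tensor(1.0))
```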

17 pages, 550 KB  
Article
AnomalyNLP: Noisy-Label Prompt Learning for Few-Shot Industrial Anomaly Detection
by Li Hua and Jin Qian
Electronics 2025, 14(20), 4016; https://doi.org/10.3390/electronics14204016 - 13 Oct 2025
Abstract
Few-Shot Industrial Anomaly Detection (FSIAD) is an essential yet challenging problem in practical scenarios such as industrial quality inspection. Its objective is to identify previously unseen anomalous regions using only a limited number of normal support images from the same category. Recently, large pre-trained vision-language models (VLMs), such as CLIP, have exhibited remarkable few-shot image-text representation abilities across a range of visual tasks, including anomaly detection. Despite their promise, real-world industrial anomaly datasets often contain noisy labels, which can degrade prompt learning and detection performance. In this paper, we propose AnomalyNLP, a new Noisy-Label Prompt Learning approach designed to tackle the challenge of few-shot anomaly detection. This framework offers a simple and efficient approach that leverages the expressive representations and precise alignment capabilities of VLMs for industrial anomaly detection. First, we design a Noisy-Label Prompt Learning (NLPL) strategy. This strategy utilizes feature learning principles to suppress the influence of noisy samples via Mean Absolute Error (MAE) loss, thereby improving the signal-to-noise ratio and enhancing overall model robustness. Furthermore, we introduce a prompt-driven optimal transport feature purification method to accurately partition datasets into clean and noisy subsets. For both image-level and pixel-level anomaly detection, AnomalyNLP achieves state-of-the-art performance across various few-shot settings on the MVTecAD and VisA public datasets. Qualitative and quantitative results on two datasets demonstrate that our method achieves the largest average AUC improvement over baseline methods across 1-, 2-, and 4-shot settings, with gains of up to 10.60%, 10.11%, and 9.55% in practical anomaly detection scenarios. Full article
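The robustness argument for MAE under label noise is that the per-sample loss is bounded, so mislabeled samples cannot dominate the gradient. The sketch below illustrates a generic MAE classification loss; it is not the exact AnomalyNLP objective.

```python
import torch
import torch.nn.functional as F

def mae_classification_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Mean Absolute Error between softmax probabilities and one-hot labels.
    Unlike cross-entropy, the per-sample loss is bounded in [0, 2], which limits
    the influence of noisy labels."""
    probs = F.softmax(logits, dim=-1)
    one_hot = F.one_hot(targets, num_classes=logits.shape[-1]).float()
    return (probs - one_hot).abs().sum(dim=-1).mean()

logits = torch.randn(8, 2)                 # normal vs. anomalous
labels = torch.randint(0, 2, (8,))         # possibly noisy labels
loss = mae_classification_loss(logits, labels)
```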

19 pages, 12919 KB  
Article
Mapping Flat Peaches Using GF-1 Imagery and Overwintering Features by Comparing Pixel/Object-Based Random Forest Algorithm
by Yawen Wang, Jing Wang and Cheng Tang
Forests 2025, 16(10), 1566; https://doi.org/10.3390/f16101566 - 10 Oct 2025
Abstract
The flat peach, an important commercial crop in the 143rd Regiment of Shihezi, China, is overwintered using plastic film mulching. Flat peaches are cultivated to boost the local temperate rural economy. The development of accurate maps of the spatial distribution of flat peach plantations is crucial for the intelligent management of economic orchards. This study evaluated the performance of pixel-based and object-based random forest algorithms for mapping flat peaches using the GF-1 image acquired during the overwintering period. A total of 45 variables, including spectral bands, vegetation indices, and texture, were used as input features. To assess the importance of different features on classification accuracy, the five different sets of variables (5, 15, 25, and 35 input variables and all 45 variables) were classified using pixel/object-based classification methods. Results of the feature optimization suggested that vegetation indices played a key role in the study, and the mean and variance of Gray-Level Co-occurrence Matrix (GLCM) texture features were important variables for distinguishing flat peach orchards. The object-based classification method was superior to the pixel-based classification method with statistically significant differences. The optimal performance was achieved by the object-based method using 25 input variables, with an overall accuracy of 94.47% and a Kappa coefficient of 0.9273. Furthermore, there were no statistically significant differences between the image-derived flat peach cultivated area and the statistical yearbook data. The result indicated that high-resolution images based on the overwintering period can successfully achieve the mapping of flat peach planting areas, which will provide a useful reference for temperate lands with similar agricultural management. Full article
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)
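Since GLCM mean and variance emerged as important variables, the sketch below shows how such texture features can be computed for an image window and fed to a random forest; the window size, gray-level quantization, GLCM offsets, and dummy labels are illustrative assumptions.

```python
import numpy as np
from skimage.feature import graycomatrix
from sklearn.ensemble import RandomForestClassifier

def glcm_mean_variance(window: np.ndarray, levels: int = 32):
    """GLCM mean and variance of an 8-bit image window (one distance, one angle)."""
    quantized = (window.astype(float) / 256 * levels).astype(np.uint8)
    glcm = graycomatrix(quantized, distances=[1], angles=[0],
                        levels=levels, symmetric=True, normed=True)[:, :, 0, 0]
    i = np.arange(levels)
    p_i = glcm.sum(axis=1)                       # marginal distribution of gray levels
    mean = (i * p_i).sum()
    variance = (((i - mean) ** 2) * p_i).sum()
    return mean, variance

# Illustrative training on random windows; real inputs would be GF-1 band and texture stacks.
windows = [np.random.randint(0, 256, (16, 16)) for _ in range(20)]
X = np.array([glcm_mean_variance(w) for w in windows])
y = np.random.randint(0, 2, 20)                  # flat peach vs. other (dummy labels)
clf = RandomForestClassifier(n_estimators=100).fit(X, y)
```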

25 pages, 1214 KB  
Article
Towards Realistic Industrial Anomaly Detection: MADE-Net Framework and ManuDefect-21 Benchmark
by Junyang Yang, Jiuxin Cao and Chengge Duan
Appl. Sci. 2025, 15(20), 10885; https://doi.org/10.3390/app152010885 - 10 Oct 2025
Abstract
Visual anomaly detection (VAD) plays a critical role in manufacturing and quality inspection, where the scarcity of anomalous samples poses challenges for developing reliable models. Existing approaches primarily rely on unsupervised training with synthetic anomalies, which often favor specific defect types and struggle to generalize across diverse categories. To address these limitations, we propose MADE-Net (Multi-model Adaptive anomaly Detection Ensemble Network), an industrial anomaly detection framework that integrates three complementary submodels: a reconstruction-based submodel (SRAD), a feature embedding-based submodel (SFAD), and a patch discrimination submodel (LPD). A dynamic integration and selection module (ISM) adaptively determines the most suitable submodel output according to input characteristics. We further introduce ManuDefect-21, a large-scale benchmark dataset comprising 11 categories of electronic components with both normal and anomalous samples in the training and test sets. The dataset reflects realistic positive-to-negative ratios and diverse defect types encountered in real manufacturing environments, addressing several limitations of previous datasets such as MVTec-AD and VisA. Experiments conducted on ManuDefect-21 demonstrate that MADE-Net achieves consistent improvements in both detection and localization metrics (e.g., average AUROC of 98.5%, Pixel-AP of 68.7%) compared with existing methods. While MADE-Net requires pixel-level annotations for fine-tuning and introduces additional computational overhead, it provides enhanced adaptability to complex industrial conditions. The proposed framework and dataset jointly contribute to advancing practical and reproducible research in industrial anomaly detection. Full article
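For reference, the image-level AUROC and pixel-level AP cited above correspond to standard metrics that can be computed with scikit-learn as sketched below; the scores and labels here are synthetic placeholders.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# Image-level AUROC: one anomaly score per image vs. binary image labels.
image_labels = np.array([0, 0, 1, 1, 0, 1])
image_scores = np.array([0.1, 0.3, 0.9, 0.7, 0.2, 0.8])
auroc = roc_auc_score(image_labels, image_scores)

# Pixel-level AP: per-pixel anomaly scores vs. per-pixel ground-truth masks, flattened.
pixel_masks = np.random.randint(0, 2, size=(6, 64, 64))
pixel_scores = np.random.rand(6, 64, 64)
pixel_ap = average_precision_score(pixel_masks.ravel(), pixel_scores.ravel())

print(f"AUROC={auroc:.3f}  Pixel-AP={pixel_ap:.3f}")
```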

19 pages, 8850 KB  
Article
Intelligent Defect Recognition of Glazed Components in Ancient Buildings Based on Binocular Vision
by Youshan Zhao, Xiaolan Zhang, Ming Guo, Haoyu Han, Jiayi Wang, Yaofeng Wang, Xiaoxu Li and Ming Huang
Buildings 2025, 15(20), 3641; https://doi.org/10.3390/buildings15203641 - 10 Oct 2025
Abstract
Glazed components in ancient Chinese architecture hold profound historical and cultural value. However, over time, environmental erosion, physical impacts, and human disturbances gradually lead to various forms of damage, severely impacting the durability and stability of the buildings. Therefore, preventive protection of glazed components is crucial. The key to preventive protection lies in the early detection and repair of damage, thereby extending the components' service life and preventing significant structural damage. To address this challenge, this study proposes a Restoration-Scale Identification (RSI) method that integrates depth information. By combining RGB-D images acquired from a depth camera with intrinsic camera parameters, and embedding a Convolutional Block Attention Module (CBAM) into the backbone network, the method dynamically enhances critical feature regions. It then employs a scale restoration strategy to accurately identify damage areas and recover the physical dimensions of glazed components from a global perspective. In addition, we constructed a dedicated semantic segmentation dataset for glazed tile damage, focusing on cracks and spalling. Both qualitative and quantitative evaluation results demonstrate that, compared with various high-performance semantic segmentation methods, our approach significantly improves the accuracy and robustness of damage detection in glazed components. The achieved accuracy deviates by only ±10 mm from high-precision laser scanning, a level of precision that is essential for reliably identifying and assessing subtle damage in complex glazed architectural elements. By integrating depth information, the intelligent recognition process recovers real-world scale, enabling the type and size of damage on glazed components to be identified efficiently and accurately and converting two-dimensional (2D) pixel coordinates into local three-dimensional (3D) coordinates. This provides a scientific basis for the protection and restoration of ancient buildings and helps ensure the long-term stability of cultural heritage and the transmission of its historical value. Full article
(This article belongs to the Section Building Materials, and Repair & Renovation)
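The 2D-to-3D conversion mentioned at the end follows standard pinhole back-projection using depth and the camera intrinsics; a sketch is given below, with placeholder intrinsic values (real values come from calibration).

```python
import numpy as np

def pixel_to_camera_3d(u: float, v: float, depth_m: float,
                       fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Back-project a pixel (u, v) with measured depth into local 3D camera coordinates."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

# Placeholder intrinsics for an RGB-D camera; real values come from calibration.
fx, fy, cx, cy = 615.0, 615.0, 320.0, 240.0
point = pixel_to_camera_3d(u=400, v=260, depth_m=2.5, fx=fx, fy=fy, cx=cx, cy=cy)
print(point)    # e.g. a damage location in metres relative to the camera
```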

23 pages, 4731 KB  
Article
Advancing Urban Roof Segmentation: Transformative Deep Learning Models from CNNs to Transformers for Scalable and Accurate Urban Imaging Solutions—A Case Study in Ben Guerir City, Morocco
by Hachem Saadaoui, Saad Farah, Hatim Lechgar, Abdellatif Ghennioui and Hassan Rhinane
Technologies 2025, 13(10), 452; https://doi.org/10.3390/technologies13100452 - 6 Oct 2025
Abstract
Urban roof segmentation plays a pivotal role in applications such as urban planning, infrastructure management, and renewable energy deployment. This study explores the evolution of deep learning techniques from traditional Convolutional Neural Networks (CNNs) to cutting-edge transformer-based models in the context of roof segmentation from satellite imagery. We highlight the limitations of conventional methods when applied to urban environments, including resolution constraints and the complexity of roof structures. To address these challenges, we evaluate two advanced deep learning models, Mask R-CNN and MaskFormer, which have shown significant promise in accurately segmenting roofs, even in dense urban settings with diverse roof geometries. These models, especially the one based on transformers, offer improved segmentation accuracy by capturing both global and local image features, enhancing their performance in tasks where fine detail and contextual awareness are critical. A case study on Ben Guerir City in Morocco, an urban area experiencing rapid development, serves as the foundation for testing these models. Using high-resolution satellite imagery, the segmentation results offer a deeper understanding of the accuracy and effectiveness of these models, particularly in optimizing urban planning and renewable energy assessments. Quantitative metrics such as Intersection over Union (IoU), precision, recall, and F1-score are used to benchmark model performance. Mask R-CNN achieved a mean IoU of 74.6%, precision of 81.3%, recall of 78.9%, and F1-score of 80.1%, while MaskFormer reached a mean IoU of 79.8%, precision of 85.6%, recall of 82.7%, and F1-score of 84.1% (pixel-level, micro-averaged at IoU = 0.50 on the held-out test set), highlighting the transformative potential of transformer-based architectures for scalable and precise urban imaging. The study also outlines future work in 3D modeling and height estimation, positioning these advancements as critical tools for sustainable urban development. Full article
(This article belongs to the Section Information and Communication Technologies)
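The reported segmentation metrics are pixel-level and micro-averaged; the sketch below shows how IoU, precision, recall, and F1 are obtained from pixel counts aggregated over the test set, using synthetic masks as placeholders.

```python
import numpy as np

def pixel_metrics(pred_masks: np.ndarray, gt_masks: np.ndarray) -> dict:
    """Micro-averaged pixel metrics for binary roof masks, aggregated over all images."""
    pred = pred_masks.astype(bool).ravel()
    gt = gt_masks.astype(bool).ravel()
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"IoU": iou, "precision": precision, "recall": recall, "F1": f1}

pred = np.random.randint(0, 2, size=(4, 128, 128))
gt = np.random.randint(0, 2, size=(4, 128, 128))
print(pixel_metrics(pred, gt))
```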
