Search Results (4,073)

Search Parameters:
Keywords = image representations

15 pages, 1142 KiB  
Technical Note
Terrain and Atmosphere Classification Framework on Satellite Data Through Attentional Feature Fusion Network
by Antoni Jaszcz and Dawid Połap
Remote Sens. 2025, 17(14), 2477; https://doi.org/10.3390/rs17142477 - 17 Jul 2025
Abstract
Analysis of surface, terrain, and even atmosphere from images or image fragments is important because it enables further processing, and satellite and drone images in particular require careful attention. Classifying image elements into given classes is important for obtaining information about space for autonomous systems, identifying landscape elements, and monitoring and maintaining infrastructure and the environment. Hence, in this paper, we propose a neural classifier architecture that analyzes different features through parallel processing paths and combines them with a feature fusion mechanism. The model captures different types of features by extracting them with a focus on spatial structure, local patterns, and multi-scale representation. In addition, the classifier is guided by attention mechanisms over channels and spatial locations, as well as a feature pyramid mechanism. Atrous convolution operators are also used in the architecture as stronger context feature extractors. The proposed classifier is the main element of the modeled framework for satellite data analysis, which can be trained according to the client's needs. The proposed methodology was evaluated on three publicly available remote sensing classification datasets (satellite images, Visual Terrain Recognition, and USTC SmokeRS), where the model achieved accuracy scores of 97.8%, 100.0%, and 92.4%, respectively. The results indicate the effectiveness of the proposed attention mechanisms across different remote sensing challenges.

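As a rough illustration of the parallel feature extraction and attention guidance described above, here is a minimal PyTorch sketch (not the authors' code): a local convolution branch and two atrous branches are fused and re-weighted by squeeze-and-excitation-style channel attention. Branch widths and dilation rates are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
    def forward(self, x):
        return x * self.fc(x)

class AttentionalFusionBlock(nn.Module):
    """Parallel local / multi-scale (atrous) branches fused under attention."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.local = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.atrous2 = nn.Conv2d(in_ch, out_ch, 3, padding=2, dilation=2)
        self.atrous4 = nn.Conv2d(in_ch, out_ch, 3, padding=4, dilation=4)
        self.fuse = nn.Conv2d(3 * out_ch, out_ch, 1)
        self.attn = ChannelAttention(out_ch)
    def forward(self, x):
        feats = torch.cat([self.local(x), self.atrous2(x), self.atrous4(x)], dim=1)
        return self.attn(self.fuse(feats))

x = torch.randn(1, 3, 64, 64)                  # dummy satellite image patch
print(AttentionalFusionBlock(3, 32)(x).shape)  # torch.Size([1, 32, 64, 64])
```
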
27 pages, 7645 KiB  
Article
VMMT-Net: A Dual-Branch Parallel Network Combining Visual State Space Model and Mix Transformer for Land–Sea Segmentation of Remote Sensing Images
by Jiawei Wu, Zijian Liu, Zhipeng Zhu, Chunhui Song, Xinghui Wu and Haihua Xing
Remote Sens. 2025, 17(14), 2473; https://doi.org/10.3390/rs17142473 - 16 Jul 2025
Abstract
Land–sea segmentation is a fundamental task in remote sensing image analysis and plays a vital role in dynamic coastline monitoring. The complex morphology and blurred boundaries of coastlines in remote sensing imagery make fast and accurate segmentation challenging. Recent deep learning approaches lack the ability to model spatial continuity effectively, thereby limiting a comprehensive understanding of coastline features in remote sensing imagery. To address this issue, we have developed VMMT-Net, a novel dual-branch semantic segmentation framework. By constructing a parallel heterogeneous dual-branch encoder, VMMT-Net integrates the complementary strengths of the Mix Transformer and the Visual State Space Model, enabling comprehensive modeling of local details, global semantics, and spatial continuity. We design a Cross-Branch Fusion Module to facilitate deep feature interaction and collaborative representation across branches, and implement a customized decoder module that enhances the integration of multiscale features and improves boundary refinement of coastlines. Extensive experiments conducted on two benchmark remote sensing datasets, GF-HNCD and BSD, demonstrate that the proposed VMMT-Net outperforms existing state-of-the-art methods in both quantitative metrics and visual quality. Specifically, the model achieves mean F1-scores of 98.48% (GF-HNCD) and 98.53% (BSD) and mean intersection-over-union values of 97.02% (GF-HNCD) and 97.11% (BSD). The model maintains reasonable computational complexity, with only 28.24 M parameters and 25.21 GFLOPs, striking a favorable balance between accuracy and efficiency. These results indicate the strong generalization ability and practical applicability of VMMT-Net in real-world remote sensing segmentation tasks.
(This article belongs to the Special Issue Application of Remote Sensing in Coastline Monitoring)

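A minimal sketch of the dual-branch idea under stated assumptions: two stand-in encoders (plain convolutions in place of the Mix Transformer and Visual State Space branches) exchange information through a gated cross-branch fusion module before a segmentation head. None of this reproduces the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossBranchFusion(nn.Module):
    """Each branch is re-weighted by a gate computed from the other branch."""
    def __init__(self, ch):
        super().__init__()
        self.gate_a = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.gate_b = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())
    def forward(self, a, b):
        return a * self.gate_b(b), b * self.gate_a(a)

branch_a = nn.Conv2d(3, 16, 3, padding=1)   # stand-in for the Mix Transformer branch
branch_b = nn.Conv2d(3, 16, 3, padding=1)   # stand-in for the VSS branch
fusion = CrossBranchFusion(16)

x = torch.randn(1, 3, 128, 128)
fa, fb = fusion(branch_a(x), branch_b(x))
seg_head = nn.Conv2d(32, 2, 1)              # land/sea logits
print(seg_head(torch.cat([fa, fb], dim=1)).shape)  # torch.Size([1, 2, 128, 128])
```
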
21 pages, 41202 KiB  
Article
Copper Stress Levels Classification in Oilseed Rape Using Deep Residual Networks and Hyperspectral False-Color Images
by Yifei Peng, Jun Sun, Zhentao Cai, Lei Shi, Xiaohong Wu, Chunxia Dai and Yubin Xie
Horticulturae 2025, 11(7), 840; https://doi.org/10.3390/horticulturae11070840 - 16 Jul 2025
Abstract
In recent years, heavy metal contamination in agricultural products has become a growing concern in the field of food safety. Copper (Cu) stress in crops not only leads to significant reductions in both yield and quality but also poses potential health risks to humans. This study proposes an efficient and precise non-destructive detection method for Cu stress in oilseed rape based on hyperspectral false-color image construction using principal component analysis (PCA). By comprehensively capturing the spectral representation of oilseed rape plants, both one-dimensional (1D) spectral sequences and spatial image data were utilized for multi-class classification. The classification performance of models based on 1D spectral sequences was compared from two perspectives: first, between machine learning and deep learning methods (best accuracy: 93.49% vs. 96.69%); and second, between shallow and deep convolutional neural networks (CNNs) (best accuracy: 95.15% vs. 96.69%). For spatial image data, deep residual networks were employed to evaluate the effectiveness of visible-light and false-color images. The RegNet architecture was chosen for its flexible parameterization and proven effectiveness in extracting multi-scale features from hyperspectral false-color images. This flexibility enabled RegNetX-6.4GF to achieve optimal performance on the dataset constructed from three types of false-color images, reaching a Macro-Precision, Macro-Recall, Macro-F1, and Accuracy of 98.17%, 98.15%, 98.15%, and 98.15%, respectively. Furthermore, Grad-CAM visualizations revealed that latent physiological changes in plants under heavy metal stress guided feature learning within the CNNs and demonstrated the effectiveness of false-color image construction in extracting discriminative features. Overall, the proposed technique can be integrated into portable hyperspectral imaging devices, enabling real-time and non-destructive detection of heavy metal stress in modern agricultural practices.

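The PCA-based false-color construction can be illustrated in a few lines of Python: the top three principal components of the spectral axis become the three channels of an image. The cube dimensions below are placeholders, not the paper's data.

```python
import numpy as np
from sklearn.decomposition import PCA

cube = np.random.rand(64, 64, 120)            # H x W x bands (dummy hyperspectral cube)
h, w, bands = cube.shape
pixels = cube.reshape(-1, bands)              # one spectrum per row

pca = PCA(n_components=3)
scores = pca.fit_transform(pixels)            # project each spectrum onto 3 PCs

# Stretch each component to [0, 1] so it can be viewed or saved as an RGB image.
scores -= scores.min(axis=0)
scores /= scores.max(axis=0)
false_color = scores.reshape(h, w, 3)
print(false_color.shape, pca.explained_variance_ratio_)
```
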
21 pages, 3937 KiB  
Article
Wind Turbine Blade Defect Recognition Method Based on Large-Vision-Model Transfer Learning
by Xin Li, Jinghe Tian, Xinfu Pang, Li Shen, Haibo Li and Zedong Zheng
Sensors 2025, 25(14), 4414; https://doi.org/10.3390/s25144414 - 15 Jul 2025
Abstract
Timely and accurate detection of wind turbine blade surface defects is crucial for ensuring operational safety and improving maintenance efficiency in large-scale wind farms. However, existing methods often suffer from poor generalization, background interference, and inadequate real-time performance. To overcome these limitations, we developed an end-to-end defect recognition framework structured as a three-stage process: blade localization using YOLOv5, robust feature extraction via the large vision model DINOv2, and defect classification using a Stochastic Configuration Network (SCN). Unlike conventional CNN-based approaches, the use of DINOv2 significantly improves representation capability under complex textures. The experimental results reveal that the proposed method achieved a classification accuracy of 97.8% and an average inference time of 19.65 ms per image, satisfying real-time requirements. Compared to traditional methods, this framework provides a more scalable, accurate, and efficient solution for the intelligent inspection and maintenance of wind turbine blades.
(This article belongs to the Special Issue Deep Learning for Perception and Recognition: Method and Applications)

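A minimal sketch of the middle and final stages of this pipeline: frozen DINOv2 features (loaded through the public torch.hub entry point) feed a lightweight classifier. A linear probe stands in for the paper's Stochastic Configuration Network, and the number of defect classes is an assumption.

```python
import torch

# Load a small frozen DINOv2 backbone from the official hub entry point.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

crop = torch.randn(1, 3, 224, 224)            # a YOLOv5-cropped blade region (dummy)
with torch.no_grad():
    feat = model(crop)                        # (1, 384) CLS embedding

classifier = torch.nn.Linear(384, 4)          # e.g., 4 defect classes (assumed)
print(classifier(feat).softmax(dim=-1))
```
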
26 pages, 26792 KiB  
Article
Evaluating the Interferometric Performance of China’s Dual-Star SAR Satellite Constellation in Large Deformation Scenarios: A Case Study in the Jinchuan Mining Area, Gansu
by Zixuan Ge, Wenhao Wu, Jiyuan Hu, Nijiati Muhetaer, Peijie Zhu, Jie Guo, Zhihui Li, Gonghai Zhang, Yuxing Bai and Weijia Ren
Remote Sens. 2025, 17(14), 2451; https://doi.org/10.3390/rs17142451 - 15 Jul 2025
Abstract
Mining activities can trigger geological disasters, including slope instability and surface subsidence, posing a serious threat to the surrounding environment and miners’ safety. Consequently, the development of reasonable, effective, and rapid deformation monitoring methods for mining areas is essential. Traditional synthetic aperture radar (SAR) satellites are often limited by their revisit period and image resolution, leading to unwrapping errors and decorrelation in the central mining area and posing challenges for deformation monitoring. In this study, persistent scatterer interferometric synthetic aperture radar (PS-InSAR) technology is used to monitor and analyze surface deformation of the Jinchuan mining area in Jinchang City, based on SAR images from the small satellites “Fucheng-1” and “Shenqi”, launched by the Tianyi Research Institute in Hunan Province, China. Notably, the dual-star constellation offers high-resolution SAR data with a spatial resolution of up to 3 m and a minimum revisit period of 4 days. We also assessed the interferometric stability, imaging quality, and time-series monitoring capability of the “Fucheng-1” and “Shenqi” satellites and compared their time-series results with those of Sentinel-1A. The results show that the mean values of the sum of phase difference (SPD) and phase standard deviation (PSD) for the “Fucheng-1” and “Shenqi” interferograms improve by 21.47% and 35.47%, respectively, compared to Sentinel-1A interferograms. Additionally, the processing results of the dual-satellite constellation exhibit spatial distribution characteristics highly consistent with those of Sentinel-1A, while demonstrating relatively better detail representation at certain measurement points. For rapid deformation monitoring in mining areas, the constellation's higher revisit frequency and spatial resolution give it high practical value.
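
The two interferogram quality measures reported above can be sketched on a dummy wrapped-phase image as follows; the simplified, windowless definitions used here are common textbook forms and are assumptions about the paper's exact formulas.

```python
import numpy as np

phase = np.random.uniform(-np.pi, np.pi, (256, 256))  # wrapped phase (dummy interferogram)

def wrap(p):
    """Wrap phase differences back into (-pi, pi]."""
    return (p + np.pi) % (2 * np.pi) - np.pi

# Sum of phase differences (SPD): mean absolute wrapped gradient in x and y.
dx = wrap(np.diff(phase, axis=1))
dy = wrap(np.diff(phase, axis=0))
spd = np.abs(dx).mean() + np.abs(dy).mean()

# Phase standard deviation (PSD): std of the wrapped residual phase.
residual = wrap(phase - phase.mean())
psd = residual.std()

print(f"SPD = {spd:.3f} rad, PSD = {psd:.3f} rad")
```
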
20 pages, 2926 KiB  
Article
SonarNet: Global Feature-Based Hybrid Attention Network for Side-Scan Sonar Image Segmentation
by Juan Lei, Huigang Wang, Liming Fan, Qingyue Gu, Shaowei Rong and Huaxia Zhang
Remote Sens. 2025, 17(14), 2450; https://doi.org/10.3390/rs17142450 - 15 Jul 2025
Abstract
With the rapid advancement of deep learning techniques, side-scan sonar image segmentation has become a crucial task in underwater scene understanding. However, the complex and variable underwater environment poses significant challenges for salient object detection, and traditional deep learning approaches often suffer from inadequate feature representation and the loss of global context during downsampling, compromising the segmentation accuracy of fine structures. To address these issues, we propose SonarNet, a Global Feature-Based Hybrid Attention Network specifically designed for side-scan sonar image segmentation. SonarNet features a dual-encoder architecture that leverages residual blocks and a self-attention mechanism to simultaneously capture both global structural and local contextual information. In addition, an adaptive hybrid attention module is introduced to effectively integrate channel and spatial features, while a global enhancement block fuses multi-scale global and spatial representations from the dual encoders, mitigating information loss throughout the network. Comprehensive experiments on a dedicated underwater sonar dataset demonstrate that SonarNet outperforms ten state-of-the-art saliency detection methods, achieving a mean absolute error as low as 2.35%. These results highlight the superior performance of SonarNet in challenging sonar image segmentation tasks.

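A CBAM-style sketch of the hybrid attention idea, assuming channel attention followed by spatial attention on the same feature map; the kernel size and reduction ratio are illustrative, not SonarNet's settings.

```python
import torch
import torch.nn as nn

class HybridAttention(nn.Module):
    """Channel re-weighting followed by spatial re-weighting."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(
            nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())
    def forward(self, x):
        x = x * self.channel(x)                        # re-weight channels
        pooled = torch.cat([x.mean(1, keepdim=True),   # average over channels
                            x.amax(1, keepdim=True)],  # max over channels
                           dim=1)
        return x * self.spatial(pooled)                # re-weight locations

feat = torch.randn(1, 32, 64, 64)   # sonar feature map (dummy)
print(HybridAttention(32)(feat).shape)  # torch.Size([1, 32, 64, 64])
```
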
23 pages, 29759 KiB  
Article
UAV-Satellite Cross-View Image Matching Based on Adaptive Threshold-Guided Ring Partitioning Framework
by Yushi Liao, Juan Su, Decao Ma and Chao Niu
Remote Sens. 2025, 17(14), 2448; https://doi.org/10.3390/rs17142448 - 15 Jul 2025
Abstract
Cross-view image matching between UAV and satellite platforms is critical for geographic localization but remains challenging due to domain gaps caused by disparities in imaging sensors, viewpoints, and illumination conditions. To address these challenges, this paper proposes an Adaptive Threshold-guided Ring Partitioning Framework (ATRPF) for UAV–satellite cross-view image matching. Unlike conventional ring-based methods with fixed partitioning rules, ATRPF incorporates heatmap-guided adaptive thresholds and learnable hyperparameters to dynamically adjust ring-wise feature extraction regions, significantly enhancing cross-domain representation learning through context-aware adaptability. The framework synergizes three core components: brightness-aligned preprocessing to reduce illumination-induced domain shifts, hybrid loss functions to improve feature discriminability across domains, and keypoint-aware re-ranking to refine retrieval results by compensating for neural networks’ localization uncertainty. Comprehensive evaluations on the University-1652 benchmark demonstrate the framework’s superiority: it achieves 82.50% Recall@1 and 84.28% AP for UAV→Satellite geo-localization, along with 90.87% Recall@1 and 80.25% AP for Satellite→UAV navigation. These results validate the framework’s capability to bridge UAV–satellite domain gaps while maintaining robust matching precision under heterogeneous imaging conditions, providing a viable solution for practical applications such as UAV navigation in GNSS-denied environments.
(This article belongs to the Special Issue Temporal and Spatial Analysis of Multi-Source Remote Sensing Images)

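A minimal NumPy sketch of heatmap-guided ring partitioning under stated assumptions: ring boundaries are chosen as the radii that enclose fixed shares of total heatmap activation, and a feature descriptor is pooled per ring. The quantile levels and feature width are illustrative.

```python
import numpy as np

h, w = 128, 128
heatmap = np.random.rand(h, w)                   # dummy activation heatmap
yy, xx = np.mgrid[0:h, 0:w]
radius = np.hypot(yy - h / 2, xx - w / 2)        # distance of each pixel from centre

# Heatmap-guided thresholds: radii enclosing fixed shares of total activation.
order = np.argsort(radius, axis=None)
cum = np.cumsum(heatmap.ravel()[order]) / heatmap.sum()
bounds = [radius.ravel()[order][np.searchsorted(cum, q)] for q in (0.25, 0.5, 0.75)]

rings = np.digitize(radius, bounds)              # ring index 0..3 per pixel
features = np.random.rand(h, w, 16)              # dummy per-pixel features
ring_descriptors = [features[rings == r].mean(axis=0) for r in range(4)]
print(np.stack(ring_descriptors).shape)          # (4, 16)
```
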
21 pages, 7084 KiB  
Article
Chinese Paper-Cutting Style Transfer via Vision Transformer
by Chao Wu, Yao Ren, Yuying Zhou, Ming Lou and Qing Zhang
Entropy 2025, 27(7), 754; https://doi.org/10.3390/e27070754 - 15 Jul 2025
Abstract
Style transfer technology has seen substantial attention in image synthesis, notably in applications like oil painting, digital printing, and Chinese landscape painting. However, when the unique style of Chinese paper-cutting art is applied to style transfer, it is often difficult to generate transferred images that retain the essence of paper-cutting art and have strong visual appeal. Therefore, this paper proposes a new Transformer-based method for Chinese paper-cutting style transfer, aiming to realize efficient transformation of Chinese paper-cutting art styles. Specifically, the network consists of a frequency-domain mixture block and a multi-level feature contrastive learning module. The frequency-domain mixture block explores spatial and frequency-domain interaction information, integrates multiple attention windows along with frequency-domain features, preserves critical details, and enhances the effectiveness of style conversion. To further embody the symmetrical structures and hollowed hierarchical patterns intrinsic to Chinese paper-cutting, the multi-level feature contrastive learning module is designed based on a contrastive learning strategy. This module maximizes mutual information between multi-level transferred features and content features, improves the consistency of representations across different layers, and thus accentuates the unique symmetrical aesthetics and artistic expression of paper-cutting. Extensive experimental results demonstrate that the proposed method outperforms existing state-of-the-art approaches in both qualitative and quantitative evaluations. Additionally, we created a Chinese paper-cutting dataset that, although modest in size, represents an important initial step towards enriching existing resources. This dataset provides valuable training data and a reference benchmark for future research in this field.
(This article belongs to the Section Multidisciplinary Applications)

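One plausible reading of frequency-domain mixing, sketched with torch.fft: the low-frequency amplitude of a style image is injected into a content image while the content phase is kept. The cutoff and the amplitude/phase split are assumptions, not the paper's frequency-domain mixture block.

```python
import torch

def mix_low_freq_amplitude(content, style, cutoff=0.1):
    """Replace low-frequency amplitude of `content` with that of `style`."""
    C = torch.fft.fftshift(torch.fft.fft2(content))
    S = torch.fft.fftshift(torch.fft.fft2(style))
    h, w = content.shape[-2:]
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    low = ((yy - h // 2).abs() < cutoff * h) & ((xx - w // 2).abs() < cutoff * w)
    amp, phase = C.abs(), C.angle()
    amp = torch.where(low, S.abs(), amp)          # inject style amplitude at low freqs
    mixed = torch.polar(amp, phase)               # recombine with content phase
    return torch.fft.ifft2(torch.fft.ifftshift(mixed)).real

content = torch.randn(1, 3, 64, 64)               # content image/feature (dummy)
style = torch.randn(1, 3, 64, 64)                 # paper-cutting style (dummy)
print(mix_low_freq_amplitude(content, style).shape)
```
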
19 pages, 3619 KiB  
Article
An Adaptive Underwater Image Enhancement Framework Combining Structural Detail Enhancement and Unsupervised Deep Fusion
by Semih Kahveci and Erdinç Avaroğlu
Appl. Sci. 2025, 15(14), 7883; https://doi.org/10.3390/app15147883 - 15 Jul 2025
Abstract
The underwater environment severely degrades image quality by absorbing and scattering light. This causes significant challenges, including non-uniform illumination, low contrast, color distortion, and blurring. These degradations compromise the performance of critical underwater applications, including water quality monitoring, object detection, and identification. To address these issues, this study proposes a detail-oriented hybrid framework for underwater image enhancement that synergizes the strengths of traditional image processing with the powerful feature extraction capabilities of unsupervised deep learning. Our framework introduces a novel multi-scale detail enhancement unit to accentuate structural information, followed by a Latent Low-Rank Representation (LatLRR)-based simplification step. This unique combination effectively suppresses common artifacts like oversharpening, spurious edges, and noise by decomposing the image into meaningful subspaces. The principal structural features are then optimally combined with a gamma-corrected luminance channel using an unsupervised MU-Fusion network, achieving a balanced optimization of both global contrast and local details. The experimental results on the challenging Test-C60 and OceanDark datasets demonstrate that our method consistently outperforms state-of-the-art fusion-based approaches, achieving average improvements of 7.5% in UIQM, 6% in IL-NIQE, and 3% in AG. Wilcoxon signed-rank tests confirm that these performance gains are statistically significant (p < 0.01). Consequently, the proposed method significantly mitigates prevalent issues such as color aberration, detail loss, and artificial haze, which are frequently encountered in existing techniques.
(This article belongs to the Section Computing and Artificial Intelligence)

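Two of the framework's classical building blocks can be sketched directly: a multi-scale detail boost built from differences of Gaussians and a gamma-corrected luminance channel. The radii, weights, and gamma value are illustrative assumptions, and a naive average stands in for the unsupervised MU-Fusion network.

```python
import cv2
import numpy as np

img = (np.random.rand(240, 320, 3) * 255).astype(np.uint8)  # dummy underwater frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0

# Multi-scale detail: accumulate residuals against blurs of increasing radius.
detail = np.zeros_like(gray)
for sigma in (1, 2, 4):
    detail += gray - cv2.GaussianBlur(gray, (0, 0), sigma)
enhanced = np.clip(gray + 0.5 * detail, 0, 1)

# Gamma-corrected luminance to lift dark regions.
luminance = np.power(gray, 0.6)

# Naive average in place of the learned fusion network.
fused = np.clip(0.5 * enhanced + 0.5 * luminance, 0, 1)
print(fused.shape, float(fused.min()), float(fused.max()))
```
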
20 pages, 4820 KiB  
Article
Sem-SLAM: Semantic-Integrated SLAM Approach for 3D Reconstruction
by Shuqi Liu, Yufeng Zhuang, Chenxu Zhang, Qifei Li and Jiayu Hou
Appl. Sci. 2025, 15(14), 7881; https://doi.org/10.3390/app15147881 - 15 Jul 2025
Abstract
Amid the surge of research on integrating Simultaneous Localization and Mapping (SLAM) with neural implicit representations, existing methods exhibit obvious limitations in environmental semantic parsing and scene understanding. In response, this paper proposes a SLAM system that integrates a full attention mechanism and a multi-scale information extractor. This system constructs a more accurate 3D environmental model by fusing semantic, shape, and geometric orientation features. Meanwhile, to mine the semantic information in images more deeply, a pre-trained frozen 2D segmentation algorithm is employed to extract semantic features, providing powerful support for 3D environmental reconstruction. Furthermore, a multi-layer perceptron and interpolation techniques are utilized to extract multi-scale features, distinguishing information at different scales. This enables the effective decoding of semantic, RGB, and Truncated Signed Distance Field (TSDF) values from the fused features, achieving high-quality information rendering. Experimental results demonstrate that this method significantly outperforms baseline methods in mapping and tracking accuracy on the Replica and ScanNet datasets. It also shows superior performance in semantic segmentation and real-time semantic mapping tasks, offering a new direction for the development of SLAM technology.
(This article belongs to the Special Issue Applications of Data Science and Artificial Intelligence)

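A minimal sketch of the decoding step described above: a 3D feature grid is sampled at query points by trilinear interpolation, and a small MLP decodes TSDF, RGB, and semantic logits from the interpolated features. The grid size, feature width, and class count are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

grid = torch.randn(1, 32, 16, 16, 16)          # learned feature volume (dummy)
points = torch.rand(1, 1, 1, 1000, 3) * 2 - 1  # query points in [-1, 1]^3

feat = F.grid_sample(grid, points, align_corners=True)  # trilinear interpolation
feat = feat.view(32, -1).t()                   # (1000, 32) per-point features

# Small MLP decoding 1 TSDF value + 3 RGB values + 20 semantic logits per point.
decoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1 + 3 + 20))
out = decoder(feat)
tsdf, rgb, sem = out[:, :1], out[:, 1:4], out[:, 4:]
print(tsdf.shape, rgb.shape, sem.shape)
```
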
24 pages, 8171 KiB  
Article
Breast Cancer Image Classification Using Phase Features and Deep Ensemble Models
by Edgar Omar Molina Molina and Victor H. Diaz-Ramirez
Appl. Sci. 2025, 15(14), 7879; https://doi.org/10.3390/app15147879 - 15 Jul 2025
Abstract
Breast cancer is a leading cause of mortality among women worldwide. Early detection is crucial for increasing patient survival rates. Artificial intelligence, particularly convolutional neural networks (CNNs), has enabled the development of effective diagnostic systems by digitally processing mammograms. CNNs have been widely used for the classification of breast cancer in images, obtaining accurate results that are in many cases similar to those of medical specialists. This work presents a hybrid feature extraction approach for breast cancer detection that employs variants of the EfficientNetV2 network and a convenient image representation based on phase features. First, a region of interest (ROI) is extracted from the mammogram. Next, a three-channel image is created from the local phase, amplitude, and orientation features of the ROI. A feature vector is constructed for the processed mammogram using the developed CNN model. The size of the feature vector is reduced using simple statistics, achieving a redundancy suppression of 99.65%. The reduced feature vector is classified as either malignant or benign using a classifier ensemble. Experimental results using a training/testing ratio of 70/30 on 15,506 mammography images from three datasets produced an accuracy of 86.28%, a precision of 78.75%, a recall of 86.14%, and an F1-score of 80.09% with the modified EfficientNetV2 model and stacking classifier. However, an accuracy of 93.47%, a precision of 87.61%, a recall of 93.19%, and an F1-score of 90.32% were obtained using only CSAW-M dataset images.
(This article belongs to the Special Issue Object Detection and Image Processing Based on Computer Vision)

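The three-channel phase representation can be illustrated with the monogenic signal (Riesz transform), one standard way of obtaining local phase, amplitude, and orientation; the paper's exact filters are not given here, so treat this as an assumption-laden stand-in.

```python
import numpy as np

roi = np.random.rand(128, 128)                 # dummy mammogram ROI
fy = np.fft.fftfreq(roi.shape[0])[:, None]     # vertical frequencies
fx = np.fft.fftfreq(roi.shape[1])[None, :]     # horizontal frequencies
norm = np.sqrt(fx**2 + fy**2)
norm[0, 0] = 1.0                               # avoid division by zero at DC

F = np.fft.fft2(roi - roi.mean())
r1 = np.real(np.fft.ifft2(-1j * fx / norm * F))  # Riesz component x
r2 = np.real(np.fft.ifft2(-1j * fy / norm * F))  # Riesz component y

amplitude = np.sqrt(roi**2 + r1**2 + r2**2)      # local energy
phase = np.arctan2(np.sqrt(r1**2 + r2**2), roi)  # local phase
orientation = np.arctan2(r2, r1)                 # local orientation

three_channel = np.dstack([phase, amplitude, orientation])
print(three_channel.shape)                     # (128, 128, 3)
```
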
21 pages, 3826 KiB  
Article
UAV-OVD: Open-Vocabulary Object Detection in UAV Imagery via Multi-Level Text-Guided Decoding
by Lijie Tao, Guoting Wei, Zhuo Wang, Zhaoshuai Qi, Ying Li and Haokui Zhang
Drones 2025, 9(7), 495; https://doi.org/10.3390/drones9070495 - 14 Jul 2025
Abstract
Object detection in drone-captured imagery has attracted significant attention due to its wide range of real-world applications, including surveillance, disaster response, and environmental monitoring. The majority of existing methods are developed under closed-set assumptions, and although some recent studies have begun to explore open-vocabulary or open-world detection, their application to UAV imagery remains limited and underexplored. In this paper, we address this limitation by exploiting the relationship between images and textual semantics to extend object detection in UAV imagery to an open-vocabulary setting. We propose a novel and efficient detector named Unmanned Aerial Vehicle Open-Vocabulary Detector (UAV-OVD), specifically designed for drone-captured scenes. To facilitate open-vocabulary object detection, we propose improvements from three complementary perspectives. First, at the training level, we design a region–text contrastive loss to replace the conventional classification loss, allowing the model to align visual regions with textual descriptions beyond fixed category sets. Structurally, building on this, we introduce a multi-level text-guided fusion decoder that integrates visual features across multiple spatial scales under language guidance, thereby improving overall detection performance and enhancing the representation and perception of small objects. Finally, from the data perspective, we enrich the original dataset with synonym-augmented category labels, enabling more flexible and semantically expressive supervision. Experiments conducted on two widely used benchmark datasets demonstrate that our approach achieves significant improvements in both mAP and Recall. For instance, for zero-shot detection on xView, UAV-OVD achieves 9.9 mAP and 67.3 Recall, 1.1 and 25.6 points higher than those of YOLO-World. In terms of speed, UAV-OVD achieves 53.8 FPS, nearly twice as fast as YOLO-World and five times faster than DetrReg, demonstrating its strong potential for real-time open-vocabulary detection in UAV imagery.
(This article belongs to the Special Issue Applications of UVs in Digital Photogrammetry and Image Processing)

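A minimal sketch of a region–text contrastive loss of the kind the abstract names: normalized region and text embeddings, a temperature-scaled similarity matrix, and a symmetric cross-entropy over matched pairs. The embedding size and temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

regions = torch.randn(8, 256)                 # detected-region embeddings (dummy)
texts = torch.randn(8, 256)                   # matching category-text embeddings (dummy)
tau = 0.07                                    # temperature (assumed)

r = F.normalize(regions, dim=-1)
t = F.normalize(texts, dim=-1)
logits = r @ t.t() / tau                      # (8, 8) similarity matrix
labels = torch.arange(8)                      # i-th region matches i-th text

# Symmetric InfoNCE: align regions to texts and texts to regions.
loss = (F.cross_entropy(logits, labels) +
        F.cross_entropy(logits.t(), labels)) / 2
print(loss.item())
```
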
13 pages, 3767 KiB  
Article
An Analysis of Audio Information Streaming in Georg Philipp Telemann’s Sonata in C Major for Recorder and Basso Continuo, Allegro (TWV 41:C2)
by Adam Rosiński
Arts 2025, 14(4), 76; https://doi.org/10.3390/arts14040076 - 14 Jul 2025
Abstract
This paper presents an analysis of G. P. Telemann’s Sonata in C Major for Recorder and Basso Continuo (TWV 41:C2, Allegro), with the aim of investigating the occurrence of perceptual streams. The presence of perceptual streams in musical works helps to organise the sound stimuli received by the listener in a specific manner. This enables each listener to perceive the piece in an individual and distinctive way, granting primacy to selected sounds over others. Directing the listener’s attention to particular elements of the auditory image leads to the formation of specific mental representations, which in turn results in distinctive interpretations of the acoustic stimuli. All of these processes are explored and illustrated in this analysis.
(This article belongs to the Special Issue Sound, Space, and Creativity in Performing Arts)

19 pages, 38984 KiB  
Article
AFNE-Net: Semantic Segmentation of Remote Sensing Images via Attention-Based Feature Fusion and Neighborhood Feature Enhancement
by Ke Li, Hao Ji, Zhijiang Li, Zeyu Cui and Chengkai Liu
Remote Sens. 2025, 17(14), 2443; https://doi.org/10.3390/rs17142443 - 14 Jul 2025
Abstract
Understanding remote sensing imagery is vital for object observation and planning. However, the acquisition of optical images is inevitably affected by shadows and occlusions, resulting in local discrepancies in object representation. To address these challenges, this paper proposes AFNE-Net, a general network architecture for remote sensing image segmentation. First, the model introduces an attention-based feature fusion module. Through weighted fusion of multi-resolution features, this module effectively expands the receptive field and enhances semantic associations between categories. Subsequently, a feature enhancement module based on the consistency of neighborhood semantic representations is introduced, aiming to improve feature representation and reduce segmentation errors caused by local perturbations. Finally, evaluations are conducted on the ISPRS Potsdam, UAVid, and LoveDA datasets to verify the effectiveness of the proposed model.
(This article belongs to the Section AI Remote Sensing)

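A minimal PyTorch sketch of attention-weighted fusion of multi-resolution features: lower-resolution maps are upsampled, and a softmax over per-scale scores decides each pixel's blend. The channel widths and shared scoring layer are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFusion(nn.Module):
    """Per-pixel softmax weighting over multi-resolution feature maps."""
    def __init__(self, ch):
        super().__init__()
        self.score = nn.Conv2d(ch, 1, 1)       # one attention score per scale
    def forward(self, feats):                  # feats: list of (B, ch, h_i, w_i)
        size = feats[0].shape[-2:]
        up = [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
              for f in feats]
        scores = torch.cat([self.score(f) for f in up], dim=1)  # (B, S, H, W)
        weights = scores.softmax(dim=1)
        return sum(w.unsqueeze(1) * f
                   for w, f in zip(weights.unbind(dim=1), up))

feats = [torch.randn(1, 32, 64, 64), torch.randn(1, 32, 32, 32),
         torch.randn(1, 32, 16, 16)]
print(AttentionFusion(32)(feats).shape)        # torch.Size([1, 32, 64, 64])
```
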
22 pages, 2492 KiB  
Article
VJDNet: A Simple Variational Joint Discrimination Network for Cross-Image Hyperspectral Anomaly Detection
by Shiqi Wu, Xiangrong Zhang, Guanchun Wang, Puhua Chen, Jing Gu, Xina Cheng and Licheng Jiao
Remote Sens. 2025, 17(14), 2438; https://doi.org/10.3390/rs17142438 - 14 Jul 2025
Abstract
To enhance the generalization of networks and avoid redundant training effort, cross-image hyperspectral anomaly detection (HAD) based on deep learning has been gradually studied in recent years. Cross-image HAD aims to perform anomaly detection on unknown hyperspectral images after a single training process, thereby improving detection efficiency in practical applications. However, existing approaches may require additional supervised information or stacked networks to improve model performance, which imposes high demands on data or hardware in practice. In this paper, a simple and lightweight unsupervised cross-image HAD method called the Variational Joint Discrimination Network (VJDNet) is proposed. We leverage the reconstruction and distribution representation abilities of the variational autoencoder (VAE), jointly learning the global and local discriminability of anomalies. To integrate these representations from the VAE, a probability distribution joint discrimination (PDJD) module is proposed, through which VJDNet can directly output a per-pixel anomaly score mask. To further facilitate the unsupervised paradigm, a sample pair generation module is proposed, which generates anomaly samples and background representation samples tailored to the cross-image HAD task. The experimental results show that the proposed method maintains detection accuracy with only a small number of parameters.

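A minimal sketch of the VAE backbone behind this approach, under stated assumptions: pixel spectra are encoded, reconstructed, and scored by reconstruction error. Layer sizes are illustrative, and the paper's PDJD module and sample pair generation are not reproduced here.

```python
import torch
import torch.nn as nn

class SpectralVAE(nn.Module):
    """Tiny VAE over per-pixel spectra."""
    def __init__(self, bands=100, latent=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(bands, 64), nn.ReLU())
        self.mu, self.logvar = nn.Linear(64, latent), nn.Linear(64, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(),
                                 nn.Linear(64, bands))
    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), mu, logvar

pixels = torch.randn(4096, 100)                # flattened HSI pixels (dummy)
vae = SpectralVAE()
recon, mu, logvar = vae(pixels)
anomaly_score = ((pixels - recon) ** 2).mean(dim=1)  # high = likely anomalous
print(anomaly_score.shape)                     # torch.Size([4096])
```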