Search Results (783)

Search Parameters:
Keywords = visual representation learning

22 pages, 1273 KB  
Article
Explainable Instrument Classification: From MFCC Mean-Vector Models to CNNs on MFCC and Mel-Spectrograms with t-SNE and Grad-CAM Insights
by Tommaso Senatori, Daniela Nardone, Michele Lo Giudice and Alessandro Salvini
Information 2025, 16(10), 864; https://doi.org/10.3390/info16100864 - 5 Oct 2025
Abstract
This paper presents an automatic system for the classification of musical instruments from audio recordings. The project leverages deep learning (DL) techniques to achieve its objective, exploring three different classification approaches based on distinct input representations. The first method involves the extraction of Mel-Frequency Cepstral Coefficients (MFCCs) from the audio files, which are then fed into a two-dimensional convolutional neural network (Conv2D). The second approach makes use of mel-spectrogram images as input to a similar Conv2D architecture. The third approach employs conventional machine learning (ML) classifiers, including Logistic Regression, K-Nearest Neighbors, and Random Forest, trained on MFCC-derived feature vectors. To gain insight into the behavior of the DL model, explainability techniques were applied to the Conv2D model using mel-spectrograms, allowing for a better understanding of how the network interprets relevant features for classification. Additionally, t-distributed stochastic neighbor embedding (t-SNE) was employed on the MFCC vectors to visualize how instrument classes are organized in the feature space. One of the main challenges encountered was the class imbalance within the dataset, which was addressed by assigning class-specific weights during training. The results, in terms of classification accuracy, were very satisfactory across all approaches, with the convolutional models and Random Forest achieving around 97–98%, and Logistic Regression yielding slightly lower performance. In conclusion, the proposed methods proved effective for the selected dataset, and future work may focus on further improving class balance techniques. Full article
(This article belongs to the Special Issue Artificial Intelligence for Acoustics and Audio Signal Processing)
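A minimal sketch of the kind of MFCC-plus-Conv2D pipeline this abstract describes, assuming librosa for feature extraction and PyTorch for the network; the layer sizes, the three-class placeholder, and the inverse-frequency weighting are illustrative choices, not the authors' exact configuration.

```python
import librosa
import torch
import torch.nn as nn

def mfcc_features(path, sr=22050, n_mfcc=40):
    """Load an audio file and return its MFCC matrix (n_mfcc x frames)."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

class MfccConv2D(nn.Module):
    """Small Conv2D classifier over MFCC 'images' of shape (1, n_mfcc, frames)."""
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Class-specific weights to counter imbalance: rarer instruments get larger weights.
counts = torch.tensor([500., 120., 60.])          # hypothetical per-class sample counts
weights = counts.sum() / (len(counts) * counts)
criterion = nn.CrossEntropyLoss(weight=weights)

model = MfccConv2D(n_classes=len(counts))
dummy = torch.randn(4, 1, 40, 130)                 # a batch of 4 MFCC "images"
loss = criterion(model(dummy), torch.tensor([0, 1, 2, 0]))
```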
22 pages, 782 KB  
Article
Hybrid CNN-Swin Transformer Model to Advance the Diagnosis of Maxillary Sinus Abnormalities on CT Images Using Explainable AI
by Mohammad Alhumaid and Ayman G. Fayoumi
Computers 2025, 14(10), 419; https://doi.org/10.3390/computers14100419 - 2 Oct 2025
Abstract
Accurate diagnosis of sinusitis is essential due to its widespread prevalence and its considerable impact on patient quality of life. While multiple imaging techniques are available for detecting maxillary sinus abnormalities, computed tomography (CT) remains the preferred modality because of its high sensitivity and spatial resolution. Although recent advances in deep learning have led to the development of automated methods for sinusitis classification, many existing models perform poorly in the presence of complex pathological features and offer limited interpretability, which hinders their integration into clinical workflows. In this study, we propose a hybrid deep learning framework that combines EfficientNetB0, a convolutional neural network, with the Swin Transformer, a vision transformer, to improve feature representation. An attention-based fusion module is used to integrate both local and global information, thereby enhancing diagnostic accuracy. To improve transparency and support clinical adoption, the model incorporates explainable artificial intelligence (XAI) techniques using Gradient-weighted Class Activation Mapping (Grad-CAM). This allows for visualization of the regions influencing the model’s predictions, helping radiologists assess the clinical relevance of the results. We evaluate the proposed method on a curated maxillary sinus CT dataset covering four diagnostic categories: Normal, Opacified, Polyposis, and Retention Cysts. The model achieves a classification accuracy of 95.83%, with precision, recall, and F1 score all at 95%. Grad-CAM visualizations indicate that the model consistently focuses on clinically significant regions of the sinus anatomy, supporting its potential utility as a reliable diagnostic aid in medical practice. Full article
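A rough sketch of a CNN-plus-Swin dual-branch classifier with a simple learned fusion gate, assuming the timm library for both backbones; the paper's attention-based fusion module is more elaborate than this gate, and the four-class head only mirrors the diagnostic categories for illustration.

```python
import torch
import torch.nn as nn
import timm

class HybridSinusClassifier(nn.Module):
    """EfficientNet-B0 (local texture) + Swin-Tiny (global context), fused by a soft gate."""
    def __init__(self, n_classes=4, d=256):
        super().__init__()
        self.cnn = timm.create_model("efficientnet_b0", pretrained=False, num_classes=0)
        self.vit = timm.create_model("swin_tiny_patch4_window7_224", pretrained=False, num_classes=0)
        self.proj_cnn = nn.Linear(self.cnn.num_features, d)
        self.proj_vit = nn.Linear(self.vit.num_features, d)
        self.gate = nn.Sequential(nn.Linear(2 * d, 2), nn.Softmax(dim=-1))  # per-branch weights
        self.head = nn.Linear(d, n_classes)

    def forward(self, x):
        local_f = self.proj_cnn(self.cnn(x))     # pooled CNN features
        global_f = self.proj_vit(self.vit(x))    # pooled transformer features
        w = self.gate(torch.cat([local_f, global_f], dim=-1))
        fused = w[:, :1] * local_f + w[:, 1:] * global_f
        return self.head(fused)

logits = HybridSinusClassifier()(torch.randn(2, 3, 224, 224))   # -> shape (2, 4)
```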
18 pages, 24741 KB  
Article
Cross-Domain Residual Learning for Shared Representation Discovery
by Baoqi Zhao, Jie Pan, Zhijie Zhang and Fang Yang
Information 2025, 16(10), 852; https://doi.org/10.3390/info16100852 - 2 Oct 2025
Abstract
To address the problem of inconsistent data distributions in machine learning, feature-representation-based domain adaptation methods extract features from the source domain and transfer them to the target domain for classification. Existing feature-representation-based methods mainly address inconsistent feature distributions between source domain and target domain data, but only a few analyze the correlation of cross-domain features between the original space and the shared latent space, which reduces the performance of domain adaptation. To this end, we propose a domain adaptation method with a residual module, whose main ideas are as follows: (1) transfer source domain features to the target domain through a shared latent space to achieve feature sharing; (2) build a cross-domain residual learning model that uses the latent feature space as a residual connection to the original feature space, which improves the propagation efficiency of features; (3) regularize the feature space to obtain a sparse feature representation, which can improve the robustness of the model; and (4) provide an optimization algorithm. Experiments on public visual datasets (Office31, Office-Caltech, Office-Home, PIE, MNIST-USPS, COIL20) show that our method achieves 92.7% accuracy on Office-Caltech and 83.2% on PIE and attains the highest recognition accuracy on three datasets, verifying the effectiveness of the method. Full article
(This article belongs to the Special Issue Machine Learning in Image Processing and Computer Vision)
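The residual idea in (2) can be pictured with a toy module in which the shared latent code is added back to the original feature space through a learned decoder; this is only a loose sketch under assumed dimensions and losses (the distribution-alignment term and L1-style sparsity penalty here stand in for the paper's actual formulation).

```python
import torch
import torch.nn as nn

class ResidualSharedSpace(nn.Module):
    """Original features + a reconstruction from a shared latent space (residual link)."""
    def __init__(self, d_in=800, d_latent=128, n_classes=10):
        super().__init__()
        self.encode = nn.Linear(d_in, d_latent)    # shared latent space for both domains
        self.decode = nn.Linear(d_latent, d_in)    # maps latent codes back to the original space
        self.classify = nn.Linear(d_in, n_classes)

    def forward(self, x):
        z = self.encode(x)
        x_res = x + self.decode(z)                 # residual connection through the latent space
        return self.classify(x_res), z

model = ResidualSharedSpace()
xs, ys = torch.randn(32, 800), torch.randint(0, 10, (32,))   # labelled source features
xt = torch.randn(32, 800)                                     # unlabelled target features

logits_s, zs = model(xs)
_, zt = model(xt)
loss = (nn.functional.cross_entropy(logits_s, ys)
        + (zs.mean(0) - zt.mean(0)).pow(2).sum()              # crude latent-distribution alignment
        + 1e-3 * zs.abs().mean())                             # sparsity-style regularizer
```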
29 pages, 3628 KB  
Article
From Questionnaires to Heatmaps: Visual Classification and Interpretation of Quantitative Response Data Using Convolutional Neural Networks
by Michael Woelk, Modelice Nam, Björn Häckel and Matthias Spörrle
Appl. Sci. 2025, 15(19), 10642; https://doi.org/10.3390/app151910642 - 1 Oct 2025
Abstract
Structured quantitative data, such as survey responses in human resource management research, are often analysed using machine learning methods, including logistic regression. Although these methods provide accurate statistical predictions, their results are frequently abstract and difficult for non-specialists to comprehend. This limits their usefulness in practice, particularly in contexts where eXplainable Artificial Intelligence (XAI) is essential. This study proposes a domain-independent approach for the autonomous classification and interpretation of quantitative data using visual processing. This method transforms individual responses based on rating scales into visual representations, which are subsequently processed by Convolutional Neural Networks (CNNs). In combination with Class Activation Maps (CAMs), image-based CNN models enable not only accurate and reproducible classification but also visual interpretability of the underlying decision-making process. Our evaluation found that CNN models with bar chart coding achieved an accuracy of between 93.05% and 93.16%, comparable to the 93.19% achieved by logistic regression. Compared with conventional numerical approaches, exemplified by logistic regression in this study, the approach achieves comparable classification accuracy while providing additional comprehensibility and transparency through graphical representations. Robustness is demonstrated by consistent results across different visualisations generated from the same underlying data. By converting abstract numerical information into visual explanations, this approach addresses a core challenge: bridging the gap between model performance and human understanding. Its transparency, domain-agnostic design, and straightforward interpretability make it particularly suitable for XAI-driven applications across diverse disciplines that use quantitative response data. Full article
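One way to picture the questionnaire-to-image step: render each respondent's rating-scale answers as a small bar chart and save it as the pixel input a CNN would consume. Matplotlib is assumed here; the chart style, figure size, and the 5-point scale are illustrative choices, not the authors' exact visual encoding.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                      # render off-screen, no display needed
import matplotlib.pyplot as plt

def responses_to_image(responses, path, scale_max=5):
    """Render one respondent's rating-scale answers as a bar-chart image for a CNN."""
    fig, ax = plt.subplots(figsize=(2.24, 2.24), dpi=100)    # 224 x 224 pixel canvas
    fig.subplots_adjust(left=0, bottom=0, right=1, top=1)    # bars fill the whole image
    ax.bar(range(len(responses)), responses, color="black")
    ax.set_ylim(0, scale_max)
    ax.axis("off")                          # the CNN should see only the bars
    fig.savefig(path)
    plt.close(fig)

rng = np.random.default_rng(0)
sample = rng.integers(1, 6, size=12)        # 12 questionnaire items on a 1-5 scale
responses_to_image(sample, "respondent_0001.png")
```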
28 pages, 32809 KB  
Article
LiteSAM: Lightweight and Robust Feature Matching for Satellite and Aerial Imagery
by Boya Wang, Shuo Wang, Yibin Han, Linfeng Xu and Dong Ye
Remote Sens. 2025, 17(19), 3349; https://doi.org/10.3390/rs17193349 - 1 Oct 2025
Abstract
We present a (Light)weight (S)atellite–(A)erial feature (M)atching framework (LiteSAM) for robust UAV absolute visual localization (AVL) in GPS-denied environments. Existing satellite–aerial matching methods struggle with large appearance variations, texture-scarce regions, and limited efficiency for real-time UAV applications. LiteSAM integrates three key components to address these issues. First, efficient multi-scale feature extraction optimizes representation, reducing inference latency for edge devices. Second, a Token Aggregation–Interaction Transformer (TAIFormer) with a convolutional token mixer (CTM) models inter- and intra-image correlations, enabling robust global–local feature fusion. Third, a MinGRU-based dynamic subpixel refinement module adaptively learns spatial offsets, enhancing subpixel-level matching accuracy and cross-scenario generalization. The experiments show that LiteSAM achieves competitive performance across multiple datasets. On UAV-VisLoc, LiteSAM attains an RMSE@30 of 17.86 m, outperforming state-of-the-art semi-dense methods such as EfficientLoFTR. Its optimized variant, LiteSAM (opt., without dual softmax), delivers inference times of 61.98 ms on standard GPUs and 497.49 ms on NVIDIA Jetson AGX Orin, which are 22.9% and 19.8% faster than EfficientLoFTR (opt.), respectively. With 6.31M parameters, which is 2.4× fewer than EfficientLoFTR’s 15.05M, LiteSAM proves to be suitable for edge deployment. Extensive evaluations on natural image matching and downstream vision tasks confirm its superior accuracy and efficiency for general feature matching. Full article
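The "without dual softmax" variant mentioned above refers to the dual-softmax scoring commonly used by semi-dense matchers such as LoFTR; a minimal version of that scoring plus mutual-nearest-neighbour selection is sketched below. Descriptor shapes, the temperature, and the confidence threshold are arbitrary, and this is not LiteSAM's own code.

```python
import torch

def dual_softmax_match(desc_a, desc_b, temperature=0.1, threshold=0.2):
    """Score all descriptor pairs, then keep mutual nearest neighbours above a threshold."""
    sim = desc_a @ desc_b.t() / temperature             # (Na, Nb) similarity matrix
    conf = sim.softmax(dim=0) * sim.softmax(dim=1)      # dual-softmax confidence
    mutual = (conf == conf.max(dim=1, keepdim=True).values) & \
             (conf == conf.max(dim=0, keepdim=True).values)
    idx_a, idx_b = torch.nonzero(mutual & (conf > threshold), as_tuple=True)
    return idx_a, idx_b, conf[idx_a, idx_b]

a = torch.nn.functional.normalize(torch.randn(500, 256), dim=1)   # aerial descriptors
b = torch.nn.functional.normalize(torch.randn(480, 256), dim=1)   # satellite descriptors
ia, ib, scores = dual_softmax_match(a, b)
```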
38 pages, 14848 KB  
Article
Image Sand–Dust Removal Using Reinforced Multiscale Image Pair Training
by Dong-Min Son, Jun-Ru Huang and Sung-Hak Lee
Sensors 2025, 25(19), 5981; https://doi.org/10.3390/s25195981 - 26 Sep 2025
Abstract
This study proposes an image-enhancement method to address the challenges of low visibility and color distortion in images captured during yellow sandstorms for an image-sensor-based outdoor surveillance system. The technique combines traditional image processing with deep learning to improve image quality while preserving color consistency during transformation. Conventional methods can partially improve color representation and reduce blurriness in sand–dust environments. However, they are limited in their ability to restore fine details and sharp object boundaries effectively. In contrast, the proposed method incorporates Retinex-based processing into the training phase, enabling enhanced clarity and sharpness in the restored images. The proposed framework comprises three main steps. First, a cycle-consistent generative adversarial network (CycleGAN) is trained with unpaired images to generate synthetically paired data. Second, CycleGAN is retrained using these generated images along with clear images obtained through multiscale image decomposition, allowing the model to transform dust-interfered images into clear ones. Finally, color preservation is achieved by selecting the A and B chrominance channels from the small-scale model to maintain the original color characteristics. The experimental results confirmed that the proposed method effectively restores image color and removes sand–dust-related interference, thereby providing enhanced visual quality under sandstorm conditions. Specifically, it outperformed algorithm-based dust removal methods such as Sand-Dust Image Enhancement (SDIE), Chromatic Variance Consistency Gamma and Correction-Based Dehazing (CVCGCBD), and Rank-One Prior (ROP+), as well as machine learning-based methods including Fusion strategy and Two-in-One Low-Visibility Enhancement Network (TOENet), achieving a Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) score of 17.238, which demonstrates improved perceptual quality, and a Local Phase Coherence-Sharpness Index (LPC-SI) value of 0.973, indicating enhanced sharpness. Both metrics showed superior performance compared to conventional methods. When applied to Closed-Circuit Television (CCTV) systems, the proposed method is expected to mitigate the adverse effects of color distortion and image blurring caused by sand–dust, thereby effectively improving visual clarity in practical surveillance applications. Full article
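The colour-preservation step (keeping chrominance from one restoration while taking luminance from another) can be sketched with a plain OpenCV LAB recombination; the actual method selects the A and B channels from its small-scale CycleGAN output, which is simplified here to two already-restored images fed in as arguments.

```python
import cv2
import numpy as np

def merge_luma_chroma(detail_bgr, color_bgr):
    """Take L (luminance) from one restored image and A/B (chrominance) from another."""
    l_channel = cv2.cvtColor(detail_bgr, cv2.COLOR_BGR2LAB)[:, :, 0]
    _, a_channel, b_channel = cv2.split(cv2.cvtColor(color_bgr, cv2.COLOR_BGR2LAB))
    merged = cv2.merge([l_channel, a_channel, b_channel])
    return cv2.cvtColor(merged, cv2.COLOR_LAB2BGR)

# Hypothetical inputs: outputs of a large-scale and a small-scale restoration model.
detail = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
color = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
restored = merge_luma_chroma(detail, color)
```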
37 pages, 8653 KB  
Article
AI-Driven Recognition and Sustainable Preservation of Ancient Murals: The DKR-YOLO Framework
by Zixuan Guo, Sameer Kumar, Houbin Wang and Jingyi Li
Heritage 2025, 8(10), 402; https://doi.org/10.3390/heritage8100402 - 25 Sep 2025
Abstract
This paper introduces DKR-YOLO, an advanced deep learning framework designed to empower the digital preservation and sustainable management of ancient mural heritage. Building upon YOLOv8, DKR-YOLO integrates innovative components—including the DySnake Conv layer for refined feature extraction and an Adaptive Convolutional Kernel Warehouse to optimize representation—addressing challenges posed by intricate details, diverse artistic styles, and mural degradation. The network’s architecture further incorporates a Residual Feature Augmentation (RFA)-enhanced FPN (RE-FPN), prioritizing the most critical visual features and enhancing interpretability. Extensive experiments on mural datasets demonstrate that DKR-YOLO achieves a 43.6% reduction in FLOPs, a 3.7% increase in accuracy, and a 5.1% improvement in mAP compared to baseline models. This performance, combined with an emphasis on robustness and interpretability, supports more inclusive and accessible applications of AI for cultural institutions, thereby fostering broader participation and equity in digital heritage preservation. Full article
(This article belongs to the Special Issue AI and the Future of Cultural Heritage)
16 pages, 7045 KB  
Article
Convolutional Neural Networks for Hole Inspection in Aerospace Systems
by Garrett Madison, Grayson Michael Griser, Gage Truelson, Cole Farris, Christopher Lee Colaw and Yildirim Hurmuzlu
Sensors 2025, 25(18), 5921; https://doi.org/10.3390/s25185921 - 22 Sep 2025
Abstract
Foreign object debris (FOd) in rivet holes, machined holes, and fastener sites poses a critical risk to aerospace manufacturing, where current inspections rely on manual visual checks with flashlights and mirrors. These methods are slow, fatiguing, and prone to error. This work introduces HANNDI, a compact handheld inspection device that integrates controlled optics, illumination, and onboard deep learning for rapid and reliable inspection directly on the factory floor. The system performs focal sweeps, aligns and fuses the images into an all-in-focus representation, and applies a dual CNN pipeline based on the YOLO architecture: one network detects and localizes holes, while the other classifies debris. All training images were collected with the prototype, ensuring consistent geometry and lighting. On a withheld test set from a proprietary ≈3700 image dataset of aerospace assets, HANNDI achieved per-class precision and recall near 95%. An end-to-end demonstration on representative aircraft parts yielded an effective task time of 13.6 s per hole. To our knowledge, this is the first handheld automated optical inspection system that combines mechanical enforcement of imaging geometry, controlled illumination, and embedded CNN inference, providing a practical path toward robust factory floor deployment. Full article
(This article belongs to the Section Sensing and Imaging)
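The focal-sweep fusion step resembles classical focus stacking: estimate per-pixel sharpness in every frame (Laplacian response here) and keep, for each pixel, the frame that is sharpest. This OpenCV sketch is a generic illustration rather than HANNDI's implementation, and it assumes the frames have already been aligned.

```python
import cv2
import numpy as np

def all_in_focus(frames):
    """Fuse an aligned focal sweep: per pixel, keep the frame with the strongest Laplacian."""
    sharpness = []
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        lap = np.abs(cv2.Laplacian(gray, cv2.CV_64F, ksize=3))
        sharpness.append(cv2.GaussianBlur(lap, (9, 9), 0))    # smooth the sharpness map
    best = np.argmax(np.stack(sharpness), axis=0)             # index of sharpest frame per pixel
    stack = np.stack(frames)                                   # (n, H, W, 3)
    return np.take_along_axis(stack, best[None, :, :, None], axis=0)[0]

sweep = [np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8) for _ in range(5)]
fused = all_in_focus(sweep)
```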
18 pages, 1070 KB  
Article
Saliency-Guided Local Semantic Mixing for Long-Tailed Image Classification
by Jiahui Lv, Jun Lei, Jun Zhang, Chao Chen and Shuohao Li
Mach. Learn. Knowl. Extr. 2025, 7(3), 107; https://doi.org/10.3390/make7030107 - 22 Sep 2025
Abstract
In real-world visual recognition tasks, long-tailed distributions pose a widespread challenge, with extreme class imbalance severely limiting the representational learning capability of deep models. In practice, due to this imbalance, deep models often exhibit poor generalization performance on tail classes. To address this issue, data augmentation through the synthesis of new tail-class samples has become an effective method. One popular approach is CutMix, which explicitly mixes images from tail and other classes, constructing labels based on the ratio of the regions cropped from both images. However, region-based labels completely ignore the inherent semantic information of the augmented samples. To overcome this problem, we propose a saliency-guided local semantic mixing (LSM) method, which uses differentiable block decoupling and semantic-aware local mixing techniques. This method integrates head-class backgrounds while preserving the key discriminative features of tail classes and dynamically assigns labels to effectively augment tail-class samples. This results in efficient balancing of long-tailed data distributions and significant improvements in classification performance. The experimental validation shows that this method demonstrates significant advantages across three long-tailed benchmark datasets, improving classification accuracy by 5.0%, 7.3%, and 6.1%, respectively. Notably, the LSM framework is highly compatible, seamlessly integrating with existing classification models and providing significant performance gains, validating its broad applicability. Full article
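For contrast with the proposed LSM, here is the standard CutMix operation the abstract criticises, in which the mixed label is set purely by the area ratio of the pasted region. LSM instead places semantically salient tail-class patches and derives labels from saliency; that part is not implemented in this sketch.

```python
import numpy as np

def cutmix(img_a, label_a, img_b, label_b, n_classes, rng=np.random.default_rng()):
    """Paste a random box from img_b into img_a; label weights come from the box's area only."""
    h, w = img_a.shape[:2]
    lam = rng.beta(1.0, 1.0)
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    y1, y2 = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, h)
    x1, x2 = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, w)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    area_ratio = 1 - (y2 - y1) * (x2 - x1) / (h * w)   # the label ignores what the box contains
    target = np.zeros(n_classes)
    target[label_a] = area_ratio
    target[label_b] += 1.0 - area_ratio
    return mixed, target

head = np.random.rand(224, 224, 3)   # head-class image
tail = np.random.rand(224, 224, 3)   # tail-class image
img, target = cutmix(head, 3, tail, 7, n_classes=10)
```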
31 pages, 3788 KB  
Article
Multi-Scale Feature Convolutional Modeling for Industrial Weld Defects Detection in Battery Manufacturing
by Waqar Riaz, Xiaozhi Qi, Jiancheng (Charles) Ji and Asif Ullah
Fractal Fract. 2025, 9(9), 611; https://doi.org/10.3390/fractalfract9090611 - 21 Sep 2025
Abstract
Defect detection in lithium-ion battery (LIB) welding presents unique challenges, including scale heterogeneity, subtle texture variations, and severe class imbalance. We propose a multi-scale convolutional framework that integrates EfficientNet-B0 for lightweight representation learning, PANet for cross-scale feature aggregation, and a YOLOv8 detection head augmented with multi-head attention. Parallel dilated convolutions are employed to approximate self-similar receptive fields, enabling simultaneous sensitivity to fine-grained microstructural anomalies and large-scale geometric irregularities. The approach is validated on three datasets including RIAWELC, GC10-DET, and an industrial LIB defects dataset, where it consistently outperforms competitive baselines, achieving 8–10% improvements in recall and F1-score while preserving real-time inference on GPU. Ablation experiments and statistical significance tests isolate the contributions of attention and multi-scale design, confirming their role in reducing false negatives. Attention-based visualizations further enhance interpretability by exposing spatial regions driving predictions. Limitations remain regarding fixed imaging conditions and partial reliance on synthetic augmentation, but the framework establishes a principled direction toward efficient, interpretable, and scalable defect inspection in industrial manufacturing. Full article
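A minimal version of the parallel dilated-convolution idea: several branches with different dilation rates see different receptive-field scales and are concatenated channel-wise. Channel counts and rates are placeholders, and the paper's attention-augmented YOLOv8 head is not reproduced here.

```python
import torch
import torch.nn as nn

class ParallelDilatedBlock(nn.Module):
    """Parallel 3x3 convolutions with different dilation rates, concatenated channel-wise."""
    def __init__(self, in_ch=64, branch_ch=32, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        )
        self.fuse = nn.Conv2d(branch_ch * len(rates), in_ch, 1)   # project back to in_ch channels

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

features = torch.randn(2, 64, 80, 80)          # a feature map from the backbone
out = ParallelDilatedBlock()(features)         # same spatial size because padding == dilation
```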
25 pages, 783 KB  
Systematic Review
KAVAI: A Systematic Review of the Building Blocks for Knowledge-Assisted Visual Analytics in Industrial Manufacturing
by Adrian J. Böck, Stefanie Größbacher, Jan Vrablicz, Christina Stoiber, Alexander Rind, Josef Suschnigg, Tobias Schreck, Wolfgang Aigner and Markus Wagner
Appl. Sci. 2025, 15(18), 10172; https://doi.org/10.3390/app151810172 - 18 Sep 2025
Abstract
Industry 4.0 produces large volumes of sensor and machine data, offering new possibilities for manufacturing analytics but also creating challenges in combining domain knowledge with visual analysis. We present a systematic review of 13 peer-reviewed knowledge-assisted visual analytics (KAVA) systems published between 2014 and 2024, following PRISMA guidelines for the identification, screening, and inclusion processes. The survey is organized around six predefined building blocks, namely, user group, industrial domain, visualization, knowledge, data and machine learning, with a specific emphasis on the integration of knowledge and visualization in the reviewed studies. We find that ontologies, taxonomies, rule sets, and knowledge graphs provide explicit representations of expert understanding, sometimes enriched with annotations and threshold specifications. These structures are stored in RDF or graph databases, relational tables, or flat files, though interoperability is limited, and post-design contributions are not always persisted. Explicit knowledge is visualized through standard and specialized techniques, including thresholds in time-series plots, annotated dashboards, node–link diagrams, customized machine views from ontologies, and 3D digital twins with expert-defined rules. Line graphs, bar charts, and scatterplots are the most frequently used chart types, often augmented with thresholds and annotations derived from explicit knowledge. Recurring challenges include fragmented storage, heterogeneous data and knowledge types, limited automation, inconsistent validation of user input, and scarce long-term evaluations. Addressing these gaps will be essential for developing adaptable, reusable KAVA systems for industrial analytics. Full article
(This article belongs to the Section Applied Industrial Technologies)
18 pages, 6012 KB  
Article
Vision-AQ: Explainable Multi-Modal Deep Learning for Air Pollution Classification in Smart Cities
by Faisal Mehmood, Sajid Ur Rehman and Ahyoung Choi
Mathematics 2025, 13(18), 3017; https://doi.org/10.3390/math13183017 - 18 Sep 2025
Abstract
Accurate air quality prediction (AQP) is crucial for safeguarding public health and guiding smart city management. However, reliable assessment remains challenging due to complex emission patterns, meteorological variability, and chemical interactions, compounded by the limited coverage of ground-based monitoring networks. To address this gap, we propose Vision-AQ (Visual Integrated Operational Network for Air Quality), a novel multi-modal deep learning framework that classifies Air Quality Index (AQI) levels by integrating environmental imagery with pollutant data. Vision-AQ employs a dual-input neural architecture: (1) a pre-trained ResNet50 convolutional neural network (CNN) that extracts high-level features from city-scale environmental photographs in India and Nepal, capturing haze, smog, and visibility patterns, and (2) a multi-layer perceptron (MLP) that processes tabular sensor data, including PM2.5, PM10, and AQI values. The fused representations are passed to a classifier to predict six AQI categories. Trained on a comprehensive dataset, the 23.7-million-parameter model achieves strong predictive performance, with accuracy, precision, recall, and F1-score of 99%. To ensure interpretability, we use Grad-CAM visualization to highlight the model’s reliance on meaningful atmospheric features, confirming its explainability. The results demonstrate that Vision-AQ is a reliable, scalable, and cost-effective approach for localized AQI classification, offering the potential to augment conventional monitoring networks and enable more granular air quality management in urban South Asia. Full article
(This article belongs to the Special Issue Explainable and Trustworthy AI Models for Data Analytics)
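A compact sketch of the dual-input design described above, assuming torchvision's ResNet50 for the image branch and a small MLP for the tabular branch (PM2.5, PM10, AQI); the hidden sizes and the fusion-by-concatenation head are illustrative, not the published architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

class VisionAQSketch(nn.Module):
    """Image branch (ResNet50 features) + tabular branch (MLP), fused for 6 AQI classes."""
    def __init__(self, n_tabular=3, n_classes=6):
        super().__init__()
        backbone = models.resnet50(weights=None)
        backbone.fc = nn.Identity()                  # expose the 2048-d pooled features
        self.image_branch = backbone
        self.tabular_branch = nn.Sequential(
            nn.Linear(n_tabular, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU()
        )
        self.classifier = nn.Sequential(
            nn.Linear(2048 + 64, 256), nn.ReLU(), nn.Dropout(0.3), nn.Linear(256, n_classes)
        )

    def forward(self, image, tabular):
        fused = torch.cat([self.image_branch(image), self.tabular_branch(tabular)], dim=1)
        return self.classifier(fused)

model = VisionAQSketch()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 3))   # -> shape (2, 6)
```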
14 pages, 1296 KB  
Essay
Visual Quantification, Spatial Distribution and Combination Association of Tourist Attractions in Qingdao Based on Social Media Images
by Xiaomeng Ji, Simeng Zhang and Jia Liu
Land 2025, 14(9), 1900; https://doi.org/10.3390/land14091900 - 17 Sep 2025
Abstract
Focusing on the deficiencies of traditional tourism attraction survey methods in terms of accuracy, efficiency, and large-scale visual representation, this study selects Qingdao as the research case, collects tourism image data from the Weibo platform, applies a deep learning model to identify the visual elements of tourism images, and employs kernel density analysis and Apriori association analysis to clarify further the distribution characteristics and associated features of tourist attractions. Its core objective is to systematically reveal the visual composition, spatial distribution patterns, and related features of tourist attractions in the case study area by identifying and extracting tourist attraction elements from images, thereby providing a decision-making basis for effectively identifying tourism demands and their spatial distribution characteristics, as well as for tourism spatial planning. The findings are as follows: Buildings, sea, and other elements are the main components of tourist attractions in Qingdao. Regarding spatial distribution, tourist attractions in Qingdao exhibit the spatial characteristic of “distributed around the bay and converging towards the sea”, with a certain circular structure and multi-level core distribution pattern. Regarding associated features, tourist attractions in Qingdao form combinations centered on buildings, sea, and signs—such as building-centric, sea-centric, cityscape-centric, and sign-centric combinations—around elements including buildings, sea, and signs. The contribution and significance of this study lie in providing technical support for resolving the contradiction between traditional tourist attraction survey methods and precise demands, offering a scientific basis for decision-making in tourism spatial layout planning, and opening up a new path for the intelligent and refined development of tourism resources using massive visual data. Full article
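The Apriori step can be reproduced with mlxtend on one-hot-encoded sets of visual elements detected per image; the element names, support, and confidence thresholds below are made-up placeholders for illustration only.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Each row: the visual elements detected in one social-media photo (hypothetical data).
transactions = [
    {"building", "sea"}, {"building", "sign"}, {"sea", "beach"},
    {"building", "sea", "sign"}, {"building", "sea"}, {"sea", "beach", "building"},
]
elements = sorted(set().union(*transactions))
onehot = pd.DataFrame([{e: (e in t) for e in elements} for t in transactions])

frequent = apriori(onehot, min_support=0.3, use_colnames=True)   # frequent element combinations
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```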
25 pages, 4520 KB  
Article
A Multimodal Fake News Detection Model Based on Bidirectional Semantic Enhancement and Adversarial Network Under Web3.0
by Ying Xing, Changhe Zhai, Zhanbin Che, Heng Pan, Kunyang Li, Bowei Zhang, Zhongyuan Yao and Xueming Si
Electronics 2025, 14(18), 3652; https://doi.org/10.3390/electronics14183652 - 15 Sep 2025
Abstract
Web3.0 aims to foster a trustworthy environment enabling user trust and content verifiability. However, the proliferation of fake news undermines this trust and disrupts social ecosystems, making the effective alignment of visual-textual semantics and accurate content verification a pivotal challenge. Existing methods still struggle with deep cross-modal interaction and the adaptive calibration of discrepancies. To address this, we introduce the Bidirectional Semantic Enhancement and Adversarial Network (BSEAN). BSEAN first extracts features using large pre-trained models: a hybrid encoder for text and the Swin Transformer for images. It then employs a Bidirectional Modality Mapping Network, governed by cycle consistency, to achieve preliminary semantic alignment. Building on this, a Semantic Enhancement and Calibration Network explores inter-modal dependencies and quantifies semantic deviations to enhance discriminative capability. Finally, a Dual Adversarial Learning framework bolsters event generalization and representation consistency through adversarial training with event and modality discriminators. Experiments on public Weibo and Twitter datasets validate BSEAN’s superior performance across all metrics, demonstrating its efficacy in tackling the complex challenges of deep cross-modal interaction and dynamic modality calibration within Web3.0 social networks. Full article
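The bidirectional mapping with cycle consistency can be illustrated with two small networks, text-to-image-space and image-to-text-space, penalised when a round trip fails to return the original embedding; dimensions, loss weights, and the alignment term are placeholders, not BSEAN's configuration.

```python
import torch
import torch.nn as nn

d_text, d_image = 768, 1024
text_to_image = nn.Sequential(nn.Linear(d_text, 512), nn.ReLU(), nn.Linear(512, d_image))
image_to_text = nn.Sequential(nn.Linear(d_image, 512), nn.ReLU(), nn.Linear(512, d_text))

text_emb = torch.randn(16, d_text)     # e.g., from a pre-trained text encoder
image_emb = torch.randn(16, d_image)   # e.g., from a Swin Transformer

# Cycle consistency: mapping across modalities and back should recover the input.
cycle_text = image_to_text(text_to_image(text_emb))
cycle_image = text_to_image(image_to_text(image_emb))
cycle_loss = nn.functional.l1_loss(cycle_text, text_emb) + \
             nn.functional.l1_loss(cycle_image, image_emb)

# Preliminary alignment: mapped text should sit close to the paired image embedding.
align_loss = nn.functional.mse_loss(text_to_image(text_emb), image_emb)
loss = cycle_loss + 0.5 * align_loss
```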
20 pages, 55265 KB  
Article
Learning Precise Mask Representation for Siamese Visual Tracking
by Peng Yang, Fen Hu, Qinghui Wang and Lei Dou
Sensors 2025, 25(18), 5743; https://doi.org/10.3390/s25185743 - 15 Sep 2025
Abstract
Siamese network trackers are a prominent paradigm in visual object tracking due to their efficient similarity learning. However, most Siamese trackers are restricted to the bounding box tracking format, which often fails to accurately describe the appearance of non-rigid targets with complex deformations. Additionally, since the bounding box frequently includes excessive background pixels, trackers are sensitive to similar distractors. To address these issues, we propose a novel segmentation-assisted model that learns binary mask representations of targets. This model is generic and can be seamlessly integrated into various Siamese frameworks, enabling pixel-wise segmentation tracking instead of the suboptimal bounding box tracking. Specifically, our model features two core components: (i) a multi-stage precise mask representation module composed of cascaded U-Net decoders, designed to predict segmentation masks of targets, and (ii) a saliency localization head based on the Euclidean model, which extracts spatial position constraints to boost the decoder’s discriminative capability. Extensive experiments on five tracking benchmarks demonstrate that our method effectively improves the performance of both anchor-based and anchor-free Siamese trackers. Notably, on GOT-10k, our method increases the AO scores of the baseline trackers SiamRPN++ (anchor-based) and SiamBAN (anchor-free) by 5.2% and 7.5%, respectively, while maintaining speeds exceeding 60 FPS. Full article
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)
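The role of the saliency localization head, loosely pictured: a Euclidean-distance prior centred on the estimated target position down-weights responses far from the target before the decoder refines the mask. The Gaussian-of-distance form and its bandwidth are illustrative assumptions, not the paper's exact head.

```python
import numpy as np

def euclidean_prior(score_map, center, sigma=12.0):
    """Weight a response map by Euclidean distance from the estimated target centre."""
    h, w = score_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dist2 = (ys - center[0]) ** 2 + (xs - center[1]) ** 2    # squared distance to the centre
    prior = np.exp(-dist2 / (2 * sigma ** 2))                 # near the centre -> weight close to 1
    return score_map * prior

response = np.random.rand(64, 64)                 # e.g., a Siamese correlation response map
peak = np.unravel_index(response.argmax(), response.shape)
constrained = euclidean_prior(response, peak)     # suppresses distant, distractor-like peaks
```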
