MDPI - Publisher of Open Access Journals

28 pages, 3012 KB

Open Access Article

Context-Aware Visual Emotion Recognition Through Hierarchical Fusion of Facial Micro-Features and Scene Semantics

by Karn Yongsiriwit, Parkpoom Chaisiriprasert, Thannob Aribarg and Sokliv Kork

Appl. Sci. 2025, 15(24), 13160; https://doi.org/10.3390/app152413160 - 15 Dec 2025

Visual emotion recognition in unconstrained environments remains challenging, as single-stream deep learning models often fail to capture the localized facial cues and contextual information necessary for accurate classification. This study introduces a hierarchical multi-level feature fusion framework that systematically combines low-level micro-textural features (Local Binary Patterns), mid-level facial cues (Facial Action Units), and high-level scene semantics (Places365) with ResNet-50 global embeddings. Evaluated on the large-scale EmoSet-3.3M dataset, which contains 3.3 million images across eight emotion categories, the framework achieves 74% accuracy and a macro-averaged F1-score of 0.75 under its best configuration (LBP-FAUs-Places365-ResNet), a five-percentage-point improvement over the ResNet-50 baseline. The approach excels at distinguishing high-intensity emotions while maintaining efficient inference (2.2 ms per image, 29 M parameters), and analysis confirms that integrating facial muscle activations with scene context enables nuanced emotional differentiation. These results validate that hierarchical feature integration significantly advances robust, human-aligned visual emotion recognition, making it suitable for real-world Human–Computer Interaction (HCI) and affective computing applications. Full article
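For readers unfamiliar with the low-level descriptor named above, the following is a minimal sketch of the basic 8-neighbour Local Binary Pattern on a grayscale image. The abstract does not say which LBP variant, radius, or neighbourhood the authors use, so the 3x3 neighbourhood, the bit ordering, and the function names `lbp_code` and `lbp_histogram` are assumptions made for illustration only.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code of a 3x3 grayscale patch (list of 3 lists).

    Assumption: neighbours are visited clockwise from the top-left corner,
    and a neighbour contributes a 1-bit when it is >= the centre pixel.
    """
    center = patch[1][1]
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels of a 2D image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

A 256-bin histogram of this kind is one common way to turn LBP codes into a fixed-length texture feature that can be concatenated with other embeddings; whether the paper fuses LBP this way is not stated in the abstract.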

(This article belongs to the Section Computing and Artificial Intelligence)

18 pages, 1070 KB

Open Access Article

Advancing Real-Time Polyp Detection in Colonoscopy Imaging: An Anchor-Free Deep Learning Framework with Adaptive Multi-Scale Perception

by Wanyu Qiu, Xiao Yang, Zirui Liu and Chen Qiu

Sensors 2025, 25(24), 7524; https://doi.org/10.3390/s25247524 - 11 Dec 2025

Viewed by 147

Abstract

Accurate and real-time detection of polyps in colonoscopy is a critical task for the early prevention of colorectal cancer. The primary difficulties include insufficient extraction of multi-scale contextual cues for polyps of different sizes, inefficient fusion of multi-level features, and a reliance on hand-crafted anchor priors that require extensive tuning and compromise generalization performance. Therefore, we introduce a one-stage anchor-free detector that achieves state-of-the-art accuracy whilst running in real-time on a GTX 1080-Ti GPU workstation. Specifically, to enrich contextual information across a wide spectrum, our Cross-Stage Pyramid Pooling module efficiently aggregates multi-scale contexts through cascaded pooling and cross-stage partial connections. Subsequently, to achieve a robust equilibrium between low-level spatial details and high-level semantics, our Weighted Bidirectional Feature Pyramid Network adaptively integrates features across all scales using learnable channel-wise weights. Furthermore, by reconceptualizing detection as a direct point-to-boundary regression task, our anchor-free head obviates the dependency on hand-tuned priors. This regression is supervised by a Scale-invariant Distance with Aspect-ratio IoU loss, substantially improving localization accuracy for polyps of diverse morphologies. Comprehensive experiments on a large dataset comprising 103,469 colonoscopy frames substantiate the superiority of our method, achieving 98.8% mAP@0.5 and 82.5% mAP@0.5:0.95 at 35.8 FPS. Our method outperforms widely used CNN-based models (e.g., EfficientDet, YOLO series) and recent Transformer-based competitors (e.g., Adamixer, HDETR), demonstrating its potential for clinical application. Full article
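The "point-to-boundary regression" idea can be illustrated with a generic anchor-free decoding step: each feature-map location predicts its distances to the four box sides, and the box is recovered directly from the point. The sketch below is an assumption about the general technique, not the authors' head; the function names are hypothetical, and the overlap measure is plain IoU rather than the paper's Scale-invariant Distance with Aspect-ratio IoU loss, whose formula is not given in the abstract.

```python
def decode_box(px, py, left, top, right, bottom):
    """Turn a point and its four predicted side distances into (x1, y1, x2, y2)."""
    return (px - left, py - top, px + right, py + bottom)


def iou(box_a, box_b):
    """Plain Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Because the box is expressed relative to the point itself, no anchor sizes or aspect ratios need to be chosen in advance, which is the dependency the abstract says the anchor-free head removes.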

(This article belongs to the Special Issue Advanced Biomedical Imaging and Signal Processing)

28 pages, 20548 KB

Open Access Article

KGGCN: A Unified Knowledge Graph-Enhanced Graph Convolutional Network Framework for Chinese Named Entity Recognition

by Xin Chen, Liang He, Weiwei Hu and Sheng Yi

AI 2025, 6(11), 290; https://doi.org/10.3390/ai6110290 - 13 Nov 2025

Viewed by 578

Abstract

Recent advances in Chinese Named Entity Recognition (CNER) have integrated lexical features and factual knowledge into pretrained language models. However, existing lexicon-based methods often inject knowledge as restricted, isolated token-level information, lacking rich semantic and structural context. Knowledge graphs (KGs), comprising relational triples, offer explicit relational semantics and reasoning capabilities, while Graph Convolutional Networks (GCNs) effectively capture complex sentence structures. We propose KGGCN, a unified KG-enhanced GCN framework for CNER. KGGCN introduces external factual knowledge without disrupting the original word order, employing a novel end-append serialization scheme and a visibility matrix to control interaction scope. The model further utilizes a two-phase GCN stack, combining a standard GCN for robust aggregation with a multi-head attention GCN for adaptive structural refinement, to capture multi-level structural information. Experiments on four Chinese benchmark datasets demonstrate KGGCN’s superior performance. It achieves the highest F1-scores on MSRA (95.96%) and Weibo (71.98%), surpassing previous bests by 0.26 and 1.18 percentage points, respectively. Additionally, KGGCN obtains the highest Recall on OntoNotes (84.28%) and MSRA (96.14%), and the highest Precision on MSRA (95.82%), Resume (96.40%), and Weibo (72.14%). These results highlight KGGCN’s effectiveness in leveraging structured knowledge and multi-phase graph modeling to enhance entity recognition accuracy and coverage across diverse Chinese texts. Full article

22 pages, 38803 KB

Open Access Article

VG-SAM: Visual In-Context Guided SAM for Universal Medical Image Segmentation

by Gang Dai, Qingfeng Wang, Yutao Qin, Gang Wei and Shuangping Huang

Fractal Fract. 2025, 9(11), 722; https://doi.org/10.3390/fractalfract9110722 - 8 Nov 2025

Viewed by 1020

Abstract

Medical image segmentation, driven by the intrinsic fractal characteristics of biological patterns, plays a crucial role in medical image analysis. Recently, universal image segmentation, which aims to build models that generalize robustly to unseen anatomical structures and imaging modalities, has emerged as a promising research direction. To achieve this, previous solutions typically follow the in-context learning (ICL) framework, leveraging segmentation priors from a few labeled in-context references to improve prediction performance on out-of-distribution samples. However, these ICL-based methods often overlook the quality of the in-context set and struggle with capturing intricate anatomical details, thus limiting their segmentation accuracy. To address these issues, we propose VG-SAM, which employs a multi-scale in-context retrieval phase and a visual in-context guided segmentation phase. Specifically, inspired by the hierarchical and self-similar properties in fractal structures, we introduce a multi-level feature similarity strategy to select in-context samples that closely match the query image, thereby ensuring the quality of the in-context samples. In the segmentation phase, we propose to generate multi-granularity visual prompts based on the high-quality priors from the selected in-context set. Following this, these visual prompts, along with the semantic guidance signal derived from the in-context set, are seamlessly integrated into an adaptive fusion module, which effectively guides the Segment Anything Model (SAM) with powerful segmentation capabilities to achieve accurate predictions on out-of-distribution query images. Extensive experiments across multiple datasets demonstrate the effectiveness and superiority of our VG-SAM over the state-of-the-art (SOTA) methods. Notably, under the challenging one-shot reference setting, our VG-SAM surpasses SOTA methods by an average of 6.61% in DSC across all datasets. Full article

(This article belongs to the Special Issue Fractional and Fractal Methods in Biomedical Imaging and Time Series Learning)

22 pages, 1208 KB

Open Access Article

Geo-MRC: Dynamic Boundary Inference in Machine Reading Comprehension for Nested Geographic Named Entity Recognition

by Yuting Zhang, Jingzhong Li, Pengpeng Li, Tao Liu, Ping Du and Xuan Hao

ISPRS Int. J. Geo-Inf. 2025, 14(11), 431; https://doi.org/10.3390/ijgi14110431 - 2 Nov 2025

Viewed by 583

Abstract

Geographic Named Entity Recognition (Geo-NER) is a crucial task for extracting geography-related entities from unstructured text, and it plays an essential role in geographic information extraction and spatial semantic understanding. Traditional approaches typically treat Geo-NER as a sequence labeling problem, where each token is assigned a single label. However, this formulation struggles to handle nested entities effectively. To overcome this limitation, we propose Geo-MRC, an improved model based on a Machine Reading Comprehension (MRC) framework that reformulates Geo-NER as a question-answering task. The model identifies entities by predicting their start positions, end positions, and lengths, enabling precise detection of overlapping and nested entities. Specifically, it constructs a unified input sequence by concatenating a type-specific question (e.g., “What are the location names in the text?”) with the context. This sequence is encoded using BERT, followed by feature extraction and fusion through Gated Recurrent Units (GRU) and multi-scale 1D convolutions, which improve the model’s sensitivity to both multi-level semantics and local contextual information. Finally, a feed-forward neural network (FFN) predicts whether each token corresponds to the start or end of an entity and estimates the span length, allowing for dynamic inference of entity boundaries. Experimental results on multiple public datasets demonstrate that Geo-MRC consistently outperforms strong baselines, with particularly significant gains on datasets containing nested entities. Full article
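The boundary-inference step described above can be sketched as follows. This is an illustrative guess at how start, end, and length predictions might be combined, not the authors' decoding rule: the abstract does not specify it. The assumption here is that each predicted start token carries a predicted span length, and a span is accepted only when the token at the implied end position is also flagged as an end. The function names and the `[CLS]`/`[SEP]` layout (standard BERT sentence-pair formatting) are likewise assumptions.

```python
def build_mrc_input(question, context_tokens):
    """Concatenate a type-specific question with the context, BERT pair style."""
    return ["[CLS]"] + question.split() + ["[SEP]"] + context_tokens + ["[SEP]"]


def decode_spans(start_flags, end_flags, lengths):
    """Return (start, end) index pairs, inclusive, over the context tokens.

    Assumed rule: a start at position s with predicted length n implies an
    end at s + n - 1; the span is kept only if that position is an end.
    """
    spans = []
    for s, is_start in enumerate(start_flags):
        if not is_start or lengths[s] < 1:
            continue
        e = s + lengths[s] - 1
        if e < len(end_flags) and end_flags[e]:
            spans.append((s, e))
    return spans
```

With the hypothetical tokens `["Port", "of", "New", "York"]`, starts at positions 0 and 2, a shared end at position 3, and lengths 4 and 2, the decoder returns both the outer span (0, 3) and the nested span (2, 3), which is the situation a one-label-per-token tagger cannot represent.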

18 pages, 13712 KB

Open Access Article

Integrating Multiple Semantics of Street View Imagery for Semi-Supervised Building Function Identification

by Fang Fang, Nan Min, Shengwen Li, Yuxiang Zhao, Sishi Gong, Yu Wang and Shunping Zhou

ISPRS Int. J. Geo-Inf. 2025, 14(11), 423; https://doi.org/10.3390/ijgi14110423 - 29 Oct 2025

Viewed by 433

Abstract

Building function identification plays a crucial role in providing basic data for urban planning, management, and various intelligent applications. Today, building function identification methods using Street View Images (SVIs) have made significant progress. However, these methods use only the visual features of SVIs to infer building functions and ignore the contributions of the multiple potential semantics of SVIs, resulting in suboptimal identification accuracy. To address this issue, this study proposes a multi-semantic semi-supervised building function identification (MS-SS-BFI) method, which integrates multi-level visual semantics and spatial contextual semantics to improve building function identification from SVIs. Specifically, a location mapping module was designed to align SVIs with buildings. Additionally, a multi-level visual semantic extraction module was developed to integrate the visual semantics and visual-textual semantics of SVIs. Finally, a semi-supervised spatial interaction module was designed to characterize the spatial context of buildings. Extensive experiments on the Brooklyn dataset show that the proposed method achieves a 7.98% improvement in F1-score over the state-of-the-art baseline, demonstrating superior performance and robustness. This work explores a novel approach to building function identification and provides a methodological reference for various SVI-based applications. Full article

27 pages, 1378 KB

Open Access Article

Automated Taxonomy Construction Using Large Language Models: A Comparative Study of Fine-Tuning and Prompt Engineering

by Binh Vu, Rashmi Govindraju Naik, Bao Khanh Nguyen, Sina Mehraeen and Matthias Hemmje

Eng 2025, 6(11), 283; https://doi.org/10.3390/eng6110283 - 22 Oct 2025

Viewed by 1407

Abstract

Taxonomies provide essential hierarchical structures for classifying information, enabling effective retrieval and knowledge organization in diverse domains such as e-commerce, academic research, and web search. Traditional taxonomy construction, heavily reliant on manual curation by domain experts, faces significant challenges in scalability, cost, and consistency when dealing with the exponential growth of digital data. Recent advancements in Large Language Models (LLMs) and Natural Language Processing (NLP) present powerful opportunities for automating this complex process. This paper explores the potential of LLMs for automated taxonomy generation, focusing on methodologies incorporating semantic embedding generation, keyword extraction, and machine learning clustering algorithms. We specifically investigate and conduct a comparative analysis of two primary LLM-based approaches using a dataset of eBay product descriptions. The first approach involves fine-tuning a pre-trained LLM using structured hierarchical data derived from chain-of-layer clustering outputs. The second employs prompt-engineering techniques to guide LLMs in generating context-aware hierarchical taxonomies based on clustered keywords without explicit model retraining. Both methodologies are evaluated for their efficacy in constructing organized multi-level hierarchical taxonomies. Evaluation using semantic similarity metrics (BERTScore and Cosine Similarity) against a ground truth reveals that the fine-tuning approach yields higher overall accuracy and consistency (BERTScore F1: 70.91%; Cosine Similarity: 66.40%) compared to the prompt-engineering approach (BERTScore F1: 61.66%; Cosine Similarity: 60.34%). We delve into the inherent trade-offs between these methods concerning semantic fidelity, computational resource requirements, result stability, and scalability. 
Finally, we outline potential directions for future research aimed at refining LLM-based taxonomy construction systems to handle large dynamic datasets with enhanced accuracy, robustness, and granularity. Full article
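As a reminder of what the Cosine Similarity figure measures, here is a minimal sketch that compares two embedding vectors. How the paper embeds taxonomy labels and aggregates scores is not stated in the abstract, so the vectors in the usage note are made-up values and `cosine_similarity` is a hypothetical helper name. BERTScore is a separate, token-level metric and is not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

For example, `cosine_similarity([1, 2, 3], [2, 4, 6])` is 1 up to floating-point rounding, since the vectors point in the same direction, while orthogonal vectors score 0.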

23 pages, 548 KB

Open Access Article

Symmetry- and Asymmetry-Aware Dual-Path Retrieval and In-Context Learning-Based LLM for Equipment Relation Extraction

by Mingfei Tang, Liang Zhang, Zhipeng Yu, Xiaolong Shi and Xiulei Liu

Symmetry 2025, 17(10), 1647; https://doi.org/10.3390/sym17101647 - 4 Oct 2025

Viewed by 619

Abstract

Relation extraction in the equipment domain often exhibits asymmetric patterns, where entities participate in multiple overlapping relations that break the expected structural symmetry of semantic associations. Such asymmetry increases task complexity and reduces extraction accuracy in conventional approaches. To address this issue, we propose a symmetry- and asymmetry-aware dual-path retrieval and in-context learning-based large language model. Specifically, the BGE-M3 embedding model is fine-tuned for domain-specific adaptation, and a multi-level retrieval database is constructed to capture both global semantic symmetry at the sentence level and local asymmetric interactions at the relation level. A dual-path retrieval strategy, combined with Reciprocal Rank Fusion, integrates these complementary perspectives, while task-specific prompt templates further enhance extraction accuracy. Experimental results show that our approach mitigates the challenges posed by overlapping and asymmetric relations while exploiting the latent symmetry of semantic structures, achieving an F1-score of 88.53%, a 1.86% improvement over the strongest baseline (GPT-RE). Full article
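Reciprocal Rank Fusion, which the abstract uses to merge the sentence-level and relation-level retrieval paths, scores each candidate by summing 1 / (k + rank) over the rankings it appears in. The sketch below shows the standard formulation; the constant k = 60 is the value commonly used in the literature and is an assumption here, as are the example identifiers, since the abstract gives neither.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    Each ranking contributes 1 / (k + rank) to an item's score, with ranks
    starting at 1. Ties are broken alphabetically for determinism.
    """
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda item: (-scores[item], item))
```

With a hypothetical sentence-level ranking `["a", "b", "c"]` and relation-level ranking `["b", "c", "a"]`, the fused order is `["b", "a", "c"]`: item "b" ranks near the top of both lists and so outscores "a", which is first in one list but last in the other.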

(This article belongs to the Special Issue Symmetry and Its Applications in Computer Vision)

27 pages, 2645 KB

Open Access Article

Short-Text Sentiment Classification Model Based on BERT and Dual-Stream Transformer Gated Attention Mechanism

by Song Yang, Jiayao Xing, Zhaoxia Liu and Yunhao Sun

Electronics 2025, 14(19), 3904; https://doi.org/10.3390/electronics14193904 - 30 Sep 2025

Viewed by 1241

Abstract

With the rapid development of social media, short-text data have become increasingly important in fields such as public opinion monitoring, user feedback analysis, and intelligent recommendation systems. However, existing short-text sentiment analysis models often suffer from limited cross-domain adaptability and poor generalization performance. To address these challenges, this study proposes a novel short-text sentiment classification model based on Bidirectional Encoder Representations from Transformers (BERT) and a dual-stream Transformer gated attention mechanism. The model first employs BERT and the Chinese Robustly Optimized BERT Pretraining Approach (Chinese-RoBERTa) to achieve data augmentation and multilevel semantic mining, thereby expanding the training corpus and enhancing minority class coverage. Second, a dual-stream Transformer gated attention mechanism is developed to dynamically adjust feature fusion weights, enhancing adaptability to heterogeneous texts. Finally, the model integrates a Bidirectional Gated Recurrent Unit (BiGRU) with Multi-Head Self-Attention (MHSA) to strengthen sequence information modeling and global context capture, enabling the precise identification of key sentiment dependencies. The experimental results demonstrate the model's superior performance in handling data imbalance and complex textual sentiment logic, with significant improvements in accuracy and F1 score. The F1 score reached 92.4%, representing an average increase of 8.7% over the baseline models. This provides an effective solution for enhancing the performance and expanding the application scenarios of short-text sentiment analysis models. Full article

(This article belongs to the Special Issue Deep Generative Models and Recommender Systems)

17 pages, 3106 KB

Open Access Article

Weakly Supervised Gland Segmentation Based on Hierarchical Attention Fusion and Pixel Affinity Learning

by Yanli Liu, Mengchen Lin, Xiaoqian Sang, Guidong Bao and Yunfeng Wu

Bioengineering 2025, 12(9), 992; https://doi.org/10.3390/bioengineering12090992 - 18 Sep 2025

Viewed by 654

Abstract

Precise segmentation of glands in histopathological images is essential for the diagnosis of colorectal cancer, as changes in gland morphology are associated with pathological progression. Conventional computer-assisted methods rely on dense pixel-level annotations, which are costly and labor-intensive to obtain. The present study proposes a two-stage weakly supervised segmentation framework named Multi-Level Attention and Affinity (MAA). The MAA framework uses image-level labels and combines the Multi-Level Attention Fusion (MAF) and Affinity Refinement (AR) modules. The MAF module extracts hierarchical features from multiple transformer layers to capture global semantic context and generates more comprehensive initial class activation maps. By modeling inter-pixel semantic consistency, the AR module refines pseudo-labels, which sharpens boundary delineation and reduces label noise. Experiments on the GlaS dataset show that the proposed MAA framework achieves an Intersection over Union (IoU) of 81.99% and a Dice coefficient of 90.10%, outperforming the state-of-the-art Online Easy Example Mining (OEEM) method by 4.43% in IoU. These results demonstrate the effectiveness of integrating hierarchical attention mechanisms with affinity-guided refinement for annotation-efficient and robust gland segmentation. Full article
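The two reported metrics are closely related: for a single pair of masks, Dice = 2·IoU / (1 + IoU). The sketch below computes both from pixel sets and checks the conversion. Plugging in the reported IoU of 81.99% gives roughly 90.10%, which is consistent with the reported Dice coefficient; note that the identity is exact only for one mask pair or pooled pixel counts, so this agreement is a sanity check under that assumption rather than a statement about how the authors averaged their scores. The function names are hypothetical.

```python
def iou_and_dice(pred_pixels, true_pixels):
    """Compute (IoU, Dice) for two sets of foreground pixel coordinates."""
    inter = len(pred_pixels & true_pixels)
    union = len(pred_pixels | true_pixels)
    total = len(pred_pixels) + len(true_pixels)
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return iou, dice


def dice_from_iou(iou):
    """Convert IoU to Dice for a single mask pair: Dice = 2*IoU / (1 + IoU)."""
    return 2 * iou / (1 + iou)
```

Because Dice is always at least as large as IoU for the same masks, a 4.43% gain in IoU corresponds to a smaller gain in Dice, which is worth keeping in mind when comparing papers that report only one of the two.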

(This article belongs to the Special Issue Recent Progress in Biomedical Image Processing and Analysis)

20 pages, 4568 KB

Open Access Article

Dual-Branch Transformer–CNN Fusion for Enhanced Cloud Segmentation in Remote Sensing Imagery

by Shengyi Cheng, Hangfei Guo, Hailei Wu and Xianjun Du

Appl. Sci. 2025, 15(18), 9870; https://doi.org/10.3390/app15189870 - 9 Sep 2025

Viewed by 866

Abstract

Cloud coverage and obstruction significantly affect the usability of remote sensing images, making cloud detection a key prerequisite for optical remote sensing applications. In existing cloud detection methods, using U-shaped convolutional networks alone has limitations in modeling long-range contexts, while Vision Transformers fall short in capturing local spatial features. To address these issues, this study proposes a dual-branch framework, TransCNet, which combines Transformer and CNN architectures to enhance the accuracy and effectiveness of cloud detection. TransCNet addresses this by designing dual encoder branches: a Transformer branch capturing global dependencies and a CNN branch extracting local details. A novel feature aggregation module enables the complementary fusion of multi-level features from both branches at each encoder stage, enhanced by channel attention mechanisms. To mitigate feature dilution during decoding, aggregated features compensate for information loss from sampling operations. Evaluations on 38-Cloud, SPARCS, and a high-resolution Landsat-8 dataset demonstrate TransCNet’s competitive performance across metrics, effectively balancing global semantic understanding and local edge preservation for clearer cloud boundary detection. The approach resolves key limitations in existing cloud detection frameworks through synergistic multi-branch feature integration. Full article

27 pages, 1157 KB

Open Access Article

An Ultra-Lightweight and High-Precision Underwater Object Detection Algorithm for SAS Images

by Deyin Xu, Yisong He, Jiahui Su, Lu Qiu, Lixiong Lin, Jiachun Zheng and Zhiping Xu

Remote Sens. 2025, 17(17), 3027; https://doi.org/10.3390/rs17173027 - 1 Sep 2025

Viewed by 1346

Abstract

Underwater Object Detection (UOD) based on Synthetic Aperture Sonar (SAS) images is one of the core tasks of underwater intelligent perception systems. However, the existing UOD methods suffer from excessive model redundancy, high computational demands, and severe image quality degradation due to noise. To mitigate these issues, this paper proposes an ultra-lightweight and high-precision underwater object detection method for SAS images. Based on a single-stage detection framework, four efficient and representative lightweight modules are developed, focusing on three key stages: feature extraction, feature fusion, and feature enhancement. For feature extraction, the Dilated-Attention Aggregation Feature Module (DAAFM) is introduced, which leverages a multi-scale Dilated Attention mechanism for strengthening the model’s capability to perceive key information, thereby improving the expressiveness and spatial coverage of extracted features. For feature fusion, the Channel–Spatial Parallel Attention with Gated Enhancement (CSPA-Gate) module is proposed, which integrates channel–spatial parallel modeling and gated enhancement to achieve effective fusion of multi-level semantic features and dynamic response to salient regions. In terms of feature enhancement, the Spatial Gated Channel Attention Module (SGCAM) is introduced to strengthen the model’s ability to discriminate the importance of feature channels through spatial gating, thereby improving robustness to complex background interference. Furthermore, the Context-Aware Feature Enhancement Module (CAFEM) is designed to guide feature learning using contextual structural information, enhancing semantic consistency and feature stability from a global perspective. To alleviate the challenge of limited sample size of real sonar images, a diffusion generative model is employed to synthesize a set of pseudo-sonar images, which are then combined with the real sonar dataset to construct an augmented training set. 
A two-stage training strategy is proposed: the model is first trained on the real dataset and then fine-tuned on the synthetic dataset to enhance generalization and improve detection robustness. The SCTD dataset results confirm that the proposed technique achieves better precision than the baseline model with only 10% of its parameter size. Notably, on a hybrid dataset, the proposed method surpasses Faster R-CNN by 10.3% in mAP50 while using only 9% of its parameters. Full article

(This article belongs to the Special Issue Underwater Remote Sensing: Status, New Challenges and Opportunities)

30 pages, 25011 KB

Open Access Article

Multi-Level Contextual and Semantic Information Aggregation Network for Small Object Detection in UAV Aerial Images

by Zhe Liu, Guiqing He and Yang Hu

Drones 2025, 9(9), 610; https://doi.org/10.3390/drones9090610 - 29 Aug 2025

Viewed by 939

Abstract

In recent years, methods for generic object detection have achieved significant progress. However, due to the large number of small objects in aerial images, mainstream detectors struggle to achieve satisfactory detection performance. The challenges of small object detection in aerial images are primarily twofold: (1) Insufficient feature representation: The limited visual information for small objects makes it difficult for models to learn discriminative feature representations. (2) Background confusion: Abundant background information introduces more noise and interference, causing the features of small objects to easily be confused with the background. To address these issues, we propose a Multi-Level Contextual and Semantic Information Aggregation Network (MCSA-Net). MCSA-Net includes three key components: a Spatial-Aware Feature Selection Module (SAFM), a Multi-Level Joint Feature Pyramid Network (MJFPN), and an Attention-Enhanced Head (AEHead). The SAFM employs a sequence of dilated convolutions to extract multi-scale local context features and combines a spatial selection mechanism to adaptively merge these features, thereby obtaining the critical local context required for the objects, which enriches the feature representation of small objects. The MJFPN introduces multi-level connections and weighted fusion to fully leverage the spatial detail features of small objects in feature fusion and enhances the fused features further through a feature aggregation network. Finally, the AEHead is constructed by incorporating a sparse attention mechanism into the detection head. The sparse attention mechanism efficiently models long-range dependencies by computing the attention between the most relevant regions in the image while suppressing background interference, thereby enhancing the model’s ability to perceive targets and effectively improving the detection performance. Extensive experiments on four datasets, VisDrone, UAVDT, MS COCO, and DOTA, demonstrate that the proposed MCSA-Net achieves excellent detection performance, particularly in small object detection, surpassing several state-of-the-art methods.

(This article belongs to the Special Issue Intelligent Image Processing and Sensing for Drones, 2nd Edition)

27 pages, 8196 KB

Open Access Article

Enhancing Electric Vehicle Charging Infrastructure Planning with Pre-Trained Language Models and Spatial Analysis: Insights from Beijing User Reviews

by Yanxin Hou, Peipei Wang, Zhuozhuang Yao, Xinqi Zheng and Ziying Chen

ISPRS Int. J. Geo-Inf. 2025, 14(9), 325; https://doi.org/10.3390/ijgi14090325 - 24 Aug 2025

Cited by 1 | Viewed by 1280

Abstract

With the growing adoption of electric vehicles, optimizing the user experience of charging infrastructure has become critical. However, extracting actionable insights from the vast number of user reviews remains a significant challenge, impeding demand-driven operational planning for charging stations and degrading the user experience. This study leverages three pre-trained language models to perform sentiment classification and multi-level topic identification on 168,129 user reviews from Beijing, facilitating a comprehensive understanding of user feedback. The experimental results reveal significant task-model specialization: RoBERTa-WWM excels in sentiment analysis (accuracy = 0.917) and fine-grained topic identification (Micro-F1 = 0.844), making it ideal for deep semantic extraction. Conversely, ELECTRA, after sufficient training, demonstrates a strong aptitude for coarse-grained topic summarization, highlighting its strength in high-level semantic generalization. Notably, the models offer capabilities beyond simple classification, including autonomous label normalization and the extraction of valuable information from comments with low information density. Furthermore, integrating textual and spatial analyses revealed striking patterns. We identified an urban–rural emotional gap (suburban users are more satisfied despite fewer facilities) and used geographically weighted regression (GWR) to quantify the spatial differences in the factors affecting user satisfaction in Beijing’s districts. We identified three types of areas requiring differentiated strategies, as follows: the northwestern region is highly sensitive to equipment quality, the central urban area has a complex relationship between supporting facilities and satisfaction, and the emerging adoption area is more sensitive to accessibility and price factors. These findings offer a data-driven framework for charging infrastructure planning, enabling operators to base decisions on real-world user feedback and tailor solutions to specific local contexts.

20 pages, 2833 KB

Open Access Article

A Multi-Level Annotation Model for Fake News Detection: Implementing Kazakh-Russian Corpus via Label Studio

by Madina Sambetbayeva, Anargul Nekessova, Aigerim Yerimbetova, Abdygalym Bayangali, Mira Kaldarova, Duman Telman and Nurzhigit Smailov

Big Data Cogn. Comput. 2025, 9(8), 215; https://doi.org/10.3390/bdcc9080215 - 20 Aug 2025

Cited by 1 | Viewed by 2056

Abstract

This paper presents a multi-level annotation model for detecting fake news in the Kazakh and Russian languages, aiming to enhance understanding of disinformation strategies in multilingual digital media environments. Unlike traditional binary models, our approach captures the complexity of disinformation by accounting for both linguistic and cultural factors. To support this, a corpus of over 5000 news texts was manually annotated using the Label Studio platform. The annotation scheme consists of seven interrelated categories: CLAIM, SOURCE, EVIDENCE, DISINFORMATION_TECHNIQUE, AUTHOR_INTENT, TARGET_AUDIENCE, and TIMESTAMP. Inter-annotator agreement, evaluated using Cohen’s Kappa, ranged from 0.72 to 0.81, indicating substantial consistency. The annotated data reveal recurring patterns of disinformation, such as emotional manipulation, targeting of vulnerable individuals, and the strategic concealment of intent. Semantic relations between entities, such as CLAIM → EVIDENCE and CLAIM → AUTHOR_INTENT, were formalized to represent disinformation narratives as knowledge graphs. This study contributes the first linguistically and culturally adapted annotation model for the Kazakh and Russian languages, providing a robust and empirical resource for building interpretable and context-aware fake news detection systems. The resulting annotated corpus and its semantic structure offer valuable empirical material for further research in natural language processing, computational linguistics, and media studies in low-resource language environments.
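
For readers unfamiliar with the agreement statistic cited in this abstract, the following is a minimal sketch of how Cohen's Kappa is computed for two annotators. It is an illustration only: the function name `cohens_kappa` and the two label sequences are hypothetical, and the abstract does not state how the authors aggregated agreement across categories or spans.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the two annotators' marginal frequencies,
    # summed over labels (a label missing from one annotator contributes 0).
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations for ten items (not taken from the corpus).
annotator_1 = ["CLAIM", "CLAIM", "EVIDENCE", "SOURCE", "CLAIM",
               "EVIDENCE", "SOURCE", "CLAIM", "EVIDENCE", "SOURCE"]
annotator_2 = ["CLAIM", "CLAIM", "EVIDENCE", "SOURCE", "EVIDENCE",
               "EVIDENCE", "SOURCE", "CLAIM", "EVIDENCE", "CLAIM"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"Cohen's Kappa: {kappa:.3f}")
```

In this toy case the annotators agree on 8 of 10 items, but the statistic is lower than 0.8 because part of that agreement is expected by chance.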

Search Results (68)
