Search Results (594)

Search Parameters:
Keywords = feature cues

31 pages, 1563 KB  
Article
Artificial Intelligence-Assisted Determination of Suitable Age Values for Children’s Books
by Feyza Nur Kılıçaslan, Burkay Genç, Fatih Saglam and Arif Altun
Appl. Sci. 2025, 15(21), 11438; https://doi.org/10.3390/app152111438 - 26 Oct 2025
Abstract
Identifying age-appropriate books for children is a complex task that requires balancing linguistic, cognitive, and thematic factors. This study introduces an artificial intelligence–supported framework to estimate the Suitable Age Value (SAV) of Turkish children’s books targeting the 2–18-year age range. We employ repeated, stratified 5×5 cross-validation and report out-of-fold (OOF) metrics with 95% confidence intervals for a dataset of 300 Turkish children’s books. As classical baselines, linear/ElasticNet, SVR, Random Forest (RF), and XGBoost are trained on the engineered features; we also evaluate a rule-based Ateşman readability baseline. For text, we use a frozen dbmdz/bert-base-turkish-uncased encoder inside two hybrid variants, Concat and Attention-gated, with fold-internal PCA and metadata selection; augmentation is applied only to the training folds. Finally, we probe a few-shot LLM pipeline (GPT-4o-mini) and a convex blend of RF and LLM predictions. A few-shot LLM markedly outperforms the zero-shot model, whose performance is unreliable. Among hybrids, Concat performs better than Attention-gated, yet both trail our best classical baseline. A convex RF + LLM blend, learned via bootstrap out-of-bag sampling, achieves a lower RMSE/MAE than either component and a slightly higher QWK. The Ateşman baseline is substantially weaker. Overall, the findings are as follows: feature-based RF remains a strong baseline, few-shot LLMs add semantic cues, blending consistently helps, and simple hybrid concatenation beats a lightweight attention gate under our small-N regime.
(This article belongs to the Special Issue Machine Learning-Based Feature Extraction and Selection: 2nd Edition)
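
As a rough illustration of the RF + LLM convex blend described above: the sketch below picks a single weight w in [0, 1] by grid search on held-out predictions, a simplified stand-in for the paper's bootstrap out-of-bag procedure. The function name and the toy data are invented for illustration.

```python
import numpy as np

def fit_convex_blend(y_true, pred_rf, pred_llm, grid=101):
    """Find w in [0, 1] minimizing RMSE of w*RF + (1-w)*LLM.

    A stand-in for the paper's bootstrap out-of-bag procedure: here the
    weight is chosen by a simple grid search on held-out predictions.
    """
    ws = np.linspace(0.0, 1.0, grid)
    rmse = [np.sqrt(np.mean((w * pred_rf + (1 - w) * pred_llm - y_true) ** 2))
            for w in ws]
    return ws[int(np.argmin(rmse))]

# Toy data: true suitable-age values and two imperfect predictors.
rng = np.random.default_rng(0)
y = rng.uniform(2, 18, size=200)
rf = y + rng.normal(0, 1.5, size=200)     # feature-based RF: low bias
llm = y + rng.normal(0.5, 2.5, size=200)  # few-shot LLM: noisier

w = fit_convex_blend(y, rf, llm)
blend = w * rf + (1 - w) * llm
print(f"w={w:.2f}, blend RMSE={np.sqrt(np.mean((blend - y) ** 2)):.3f}")
```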

22 pages, 1512 KB  
Article
A Data-Driven Multi-Granularity Attention Framework for Sentiment Recognition in News and User Reviews
by Wenjie Hong, Shaozu Ling, Siyuan Zhang, Yinke Huang, Yiyan Wang, Zhengyang Li, Xiangjun Dong and Yan Zhan
Appl. Sci. 2025, 15(21), 11424; https://doi.org/10.3390/app152111424 - 25 Oct 2025
Abstract
Sentiment analysis plays a crucial role in domains such as financial news, user reviews, and public opinion monitoring, yet existing approaches face challenges when dealing with long and domain-specific texts due to semantic dilution, insufficient context modeling, and dispersed emotional signals. To address these issues, a multi-granularity attention-based sentiment analysis model built on a transformer backbone is proposed. The framework integrates sentence-level and document-level hierarchical modeling, a different-dimensional embedding strategy, and a cross-granularity contrastive fusion mechanism, thereby achieving unified representation and dynamic alignment of local and global emotional features. Static word embeddings combined with dynamic contextual embeddings enhance both semantic stability and context sensitivity, while the cross-granularity fusion module alleviates sparsity and dispersion of emotional cues in long texts, improving robustness and discriminability. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of the proposed model. On the Financial Forum Reviews dataset, it achieves an accuracy of 0.932, precision of 0.928, recall of 0.925, F1-score of 0.926, and AUC of 0.951, surpassing state-of-the-art baselines such as BERT and RoBERTa. On the Financial Product User Reviews dataset, the model obtains an accuracy of 0.902, precision of 0.898, recall of 0.894, and AUC of 0.921, showing significant improvements for short-text sentiment tasks. On the Financial News dataset, it achieves an accuracy of 0.874, precision of 0.869, recall of 0.864, and AUC of 0.895, highlighting its strong adaptability to professional and domain-specific texts. Ablation studies further confirm that the multi-granularity transformer structure, the different-dimensional embedding strategy, and the cross-granularity fusion module each contribute critically to overall performance improvements.
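
The sentence-level and document-level hierarchical modeling described above can be sketched as two stacked attention-pooling steps: tokens are pooled into sentence vectors, and sentences into a document vector. This is an illustrative reading, not the paper's architecture; the single-query attention form and the dimensions are assumptions.

```python
import torch
import torch.nn as nn

class HierarchicalAttentionPool(nn.Module):
    """Two-level attention pooling: tokens -> sentence vectors -> document vector."""
    def __init__(self, dim):
        super().__init__()
        self.tok_q = nn.Linear(dim, 1)   # token-level attention scores
        self.sent_q = nn.Linear(dim, 1)  # sentence-level attention scores

    def forward(self, x):                              # x: (sentences, tokens, dim)
        a_tok = torch.softmax(self.tok_q(x), dim=1)       # weight tokens per sentence
        sents = (a_tok * x).sum(dim=1)                    # (sentences, dim)
        a_sent = torch.softmax(self.sent_q(sents), dim=0) # weight sentences
        return (a_sent * sents).sum(dim=0)                # (dim,) document vector

doc = torch.randn(12, 30, 256)  # 12 sentences, 30 tokens each, 256-d embeddings
pooled = HierarchicalAttentionPool(256)(doc)
print(pooled.shape)  # torch.Size([256])
```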

22 pages, 61965 KB  
Article
The Cercal Sensilla of the Praying Mantis Hierodula patellifera and Statilia maculata: A New Partition Based on the Cerci Ultrastructure
by Yang Wang, Xiaoqun Ding, Huan Li and Yang Liu
Insects 2025, 16(11), 1093; https://doi.org/10.3390/insects16111093 - 24 Oct 2025
Abstract
Cerci function as crucial sensory organs in insects, featuring a diverse array of sensilla on their surface, analogous to those found on antennae. Using scanning electron microscopy (SEM), we characterized the ultrastructure and distribution of cercal sensilla in Hierodula patellifera (H. patellifera) and Statilia maculata (S. maculata). Results show that the cerci of H. patellifera and S. maculata are highly similar, with the main differences observed in the number of cercal articles and the length of the cerci. The cerci of both species and sexes are composed of multiple cylindrical articles, and the number and types of sensilla on the surface of these articles vary within sexes and among individuals. Females possess more cercal articles than males, and their cerci are generally longer than those of males. In both sexes of these praying mantises, four types of cercal sensilla were identified: sensilla filiformia (Sf), sensilla chaetica (Sc), sensilla campaniformia (Sca), and cuticular pores (CP), with sensilla chaetica further classified into two subtypes (ScI, ScII). Sc are widely distributed over the entire cerci, while Sf are distributed in a circular pattern on the cercal articles. While the overall distribution patterns of cercal sensilla were conserved between the sexes, significant sexual dimorphism was observed in the morphological parameters of the sensory hairs, including their quantity, length, and basal diameter. Based on distinct sensilla arrangements on the cerci, we propose a novel zoning of the cerci into four parts (I–IV), which reflects a functional gradient specialized for reproductive roles: the proximal region is enriched with robust mechanoreceptors likely involved in mating and oviposition, the central region serves as a multimodal hub for integrating courtship and mating cues, and the distal region is simplified for close-range substrate assessment. These findings highlight the adaptive evolution of cercal sensilla in relation to reproductive behaviors and provide a morphological basis for future studies on mantis phylogeny and sensory ecology.
(This article belongs to the Section Insect Physiology, Reproduction and Development)

19 pages, 961 KB  
Review
Context-Dependent Roles of Siglec-F+ Neutrophils
by Kisung Sheen, Taesoo Choi and Man S. Kim
Biomedicines 2025, 13(11), 2601; https://doi.org/10.3390/biomedicines13112601 - 24 Oct 2025
Abstract
Recent studies in murine disease models have identified Siglec-F+ neutrophils, which express a marker traditionally associated with eosinophils, as a functionally distinct population characterized by extended lifespans and context-dependent roles. While conventional neutrophils typically return to the bone marrow or undergo apoptosis at the site of inflammation, these cells remain in tissues for extended periods. These cells demonstrate remarkable functional plasticity, promote bacterial clearance and immune activation during infections, foster immunosuppression and tumor progression in cancer, and contribute to tissue remodeling in fibrotic diseases. In this review, we examine the key features governing Siglec-F+ neutrophil differentiation and function—including Siglec-F signaling, metabolic programming, and upstream cytokine cues—and explore how targeting these pathways may offer promising avenues for precision immunomodulation.
(This article belongs to the Collection Feature Papers in Immunology and Immunotherapy)

26 pages, 32866 KB  
Article
Low-Altitude Multi-Object Tracking via Graph Neural Networks with Cross-Attention and Reliable Neighbor Guidance
by Hanxiang Qian, Xiaoyong Sun, Runze Guo, Shaojing Su, Bing Ding and Xiaojun Guo
Remote Sens. 2025, 17(20), 3502; https://doi.org/10.3390/rs17203502 - 21 Oct 2025
Abstract
In low-altitude multi-object tracking (MOT), challenges such as frequent inter-object occlusion and complex non-linear motion disrupt the appearance of individual targets and the continuity of their trajectories, leading to frequent tracking failures. We posit that the relatively stable spatio-temporal relationships within object groups (e.g., pedestrians and vehicles) offer powerful contextual cues to resolve such ambiguities. We present NOWA-MOT (Neighbors Know Who We Are), a novel tracking-by-detection framework designed to systematically exploit this principle through a multi-stage association process. We make three primary contributions. First, we introduce a Low-Confidence Occlusion Recovery (LOR) module that dynamically adjusts detection scores by integrating IoU, a novel Recovery IoU (RIoU) metric, and location similarity to surrounding objects, enabling occluded targets to participate in high-priority matching. Second, for initial data association, we propose a Graph Cross-Attention (GCA) mechanism. In this module, separate graphs are constructed for detections and trajectories, and a cross-attention architecture is employed to propagate rich contextual information between them, yielding highly discriminative feature representations for robust matching. Third, to resolve the remaining ambiguities, we design a cascaded Matched Neighbor Guidance (MNG) module, which uniquely leverages the reliably matched pairs from the first stage as contextual anchors. Through MNG, star-shaped topological features are built for unmatched objects relative to their stable neighbors, enabling accurate association even when intrinsic features are weak. Our comprehensive experimental evaluation on the VisDrone2019 and UAVDT datasets confirms the superiority of our approach, achieving state-of-the-art HOTA scores of 51.34% and 62.69%, respectively, and drastically reducing identity switches compared to previous methods.
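
The core of the GCA idea, detections and trajectories exchanging context before matching, can be sketched with a standard multi-head cross-attention layer. The graph construction and exact layer layout of the paper are not reproduced; the shapes, the shared attention module, and the cosine-affinity matching below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, n_det, n_trk = 128, 7, 5
det = torch.randn(n_det, 1, dim)   # detection node features (seq, batch, dim)
trk = torch.randn(n_trk, 1, dim)   # trajectory node features

attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4)

# Detections attend to trajectories, and vice versa, so each side's
# representation absorbs context from the other before matching.
det_ctx, _ = attn(query=det, key=trk, value=trk)
trk_ctx, _ = attn(query=trk, key=det, value=det)

# Cosine affinity between context-enriched features -> association matrix.
det_n = F.normalize(det_ctx.squeeze(1), dim=-1)
trk_n = F.normalize(trk_ctx.squeeze(1), dim=-1)
affinity = det_n @ trk_n.T
print(affinity.shape)  # torch.Size([7, 5]) detection-trajectory scores
```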

23 pages, 12870 KB  
Article
Time-Frequency Conditional Enhanced Transformer-TimeGAN for Motor Fault Data Augmentation
by Binbin Li, Yu Zhang, Ruijie Ren, Weijia Liu and Gang Xu
Machines 2025, 13(10), 969; https://doi.org/10.3390/machines13100969 - 20 Oct 2025
Abstract
Data augmentation is crucial for electric motor fault diagnosis and lifetime prediction. However, the diversity of operating conditions and the challenge of augmenting small datasets often limit existing models. To address this, we propose an enhanced TimeGAN framework that couples the original architecture with transformer modules to jointly exploit time- and frequency-domain information to improve the fidelity of synthetic motor signals. The method fuses raw waveforms, envelope features, and instantaneous phase-change cues to strengthen temporal representation learning. The generator further incorporates frequency-domain descriptors and adaptively balances time–frequency contributions through learnable weighting, thereby improving generative performance. In addition, a state-conditioning mechanism (via explicit state annotations) enables controlled synthesis across distinct operating states. Comprehensive evaluations—including PCA and t-SNE visualizations, distance metrics such as DTW and FID, and downstream classifier tests—demonstrate strong adaptability and robustness on both public and in-house datasets, substantially enhancing the quality of generated time series.
(This article belongs to the Section Electrical Machines and Drives)
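
The learnable time-frequency weighting mentioned above admits a very small sketch: a single learnable logit balances the contributions of the two domains' features. This is a minimal stand-in for the paper's generator-side mechanism, with invented shapes.

```python
import torch
import torch.nn as nn

class TimeFreqGate(nn.Module):
    """Fuse time- and frequency-domain features with a learnable balance.

    One learnable logit decides how much each domain contributes; the
    weight is trained jointly with the rest of the network.
    """
    def __init__(self):
        super().__init__()
        self.logit = nn.Parameter(torch.zeros(1))  # alpha starts at 0.5

    def forward(self, f_time, f_freq):
        alpha = torch.sigmoid(self.logit)
        return alpha * f_time + (1 - alpha) * f_freq

gate = TimeFreqGate()
f_time = torch.randn(8, 64)   # e.g., waveform/envelope/phase features
f_freq = torch.randn(8, 64)   # e.g., spectral descriptors
fused = gate(f_time, f_freq)  # gradients adjust the balance during training
print(fused.shape)            # torch.Size([8, 64])
```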

18 pages, 2704 KB  
Article
Enhanced Real-Time Highway Object Detection for Construction Zone Safety Using YOLOv8s-MTAM
by Wen-Piao Lin, Chun-Chieh Wang, En-Cheng Li and Chien-Hung Yeh
Sensors 2025, 25(20), 6420; https://doi.org/10.3390/s25206420 - 17 Oct 2025
Abstract
Reliable object detection is crucial for autonomous driving, particularly in highway construction zones where early hazard recognition ensures safety. This paper introduces an enhanced YOLOv8s-based detection system incorporating a motion-temporal attention module (MTAM) to improve robustness under high-speed and dynamic conditions. The proposed architecture integrates a cross-stage partial (CSP) backbone, feature pyramid network-path aggregation network (FPN-PAN) feature fusion, and advanced loss functions to achieve high accuracy and temporal consistency. MTAM leverages temporal convolutions and attention mechanisms to capture motion cues, enabling effective detection of blurred or partially occluded objects. A custom dataset of 34,240 images, expanded through extensive data augmentation and 9-Mosaic transformations, is used for training. Experimental results demonstrate strong performance with mAP(IoU[0.5]) of 90.77 ± 0.68% and mAP(IoU[0.5:0.95]) of 70.20 ± 0.33%. Real-world highway tests confirm recognition rates of 96% for construction vehicles, 92% for roadside warning signs, and 84% for flag bearers. The results validate the framework’s suitability for real-time deployment in intelligent transportation systems.
(This article belongs to the Section Sensing and Imaging)
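
One plausible shape for a module combining temporal convolutions with attention, as the abstract describes, is sketched below: a temporal convolution summarizes frame-to-frame change and a sigmoid gate re-weights each frame's features. Kernel sizes and tensor shapes are assumptions, not the published MTAM.

```python
import torch
import torch.nn as nn

class MotionTemporalAttention(nn.Module):
    """Re-weight a stack of per-frame features by temporal motion cues."""
    def __init__(self, channels):
        super().__init__()
        self.temporal = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.gate = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):            # x: (batch, frames, channels)
        h = x.transpose(1, 2)        # (batch, channels, frames) for Conv1d
        attn = torch.sigmoid(self.gate(torch.relu(self.temporal(h))))
        return (h * attn).transpose(1, 2)

feats = torch.randn(2, 5, 256)       # 5 consecutive frames, pooled features
out = MotionTemporalAttention(256)(feats)
print(out.shape)  # torch.Size([2, 5, 256])
```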

23 pages, 1945 KB  
Article
A Symmetry-Informed Multimodal LLM-Driven Approach to Robotic Object Manipulation: Lowering Entry Barriers in Mechatronics Education
by Jorge Gudiño-Lau, Miguel Durán-Fonseca, Luis E. Anido-Rifón and Pedro C. Santana-Mancilla
Symmetry 2025, 17(10), 1756; https://doi.org/10.3390/sym17101756 - 17 Oct 2025
Abstract
The integration of Large Language Models (LLMs), particularly Visual Language Models (VLMs), into robotics promises more intuitive human–robot interactions; however, challenges remain in efficiently translating high-level commands into precise physical actions. This paper presents a novel architecture for vision-based object manipulation that leverages a VLM’s reasoning capabilities while incorporating symmetry principles to enhance operational efficiency. Implemented on a Yahboom DOFBOT educational robot with a Jetson Nano platform, our system introduces a prompt-based framework that uniquely embeds symmetry-related cues to streamline feature extraction and object detection from visual data. This methodology, which utilizes few-shot learning, enables the VLM to generate more accurate and contextually relevant commands for manipulation tasks by efficiently interpreting the symmetric and asymmetric features of objects. The experimental results in controlled scenarios demonstrate that our symmetry-informed approach significantly improves the robot’s interaction efficiency and decision-making accuracy compared to generic prompting strategies. This work contributes a robust method for integrating fundamental vision principles into modern generative AI workflows for robotics. Furthermore, its implementation on an accessible educational platform shows its potential to simplify complex robotics concepts for engineering education and research.
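
A symmetry-informed, few-shot prompting strategy of the kind described above can be sketched as a prompt template plus a message builder. The wording, the examples, and the chat-style message shape below are invented for illustration; they are not the authors' prompts or API.

```python
# Hypothetical few-shot prompt embedding symmetry cues; invented text.
SYMMETRY_PROMPT = """You control a robot arm with a camera.
For each object, report: name, symmetry (radial / bilateral / none),
and the grasp axis that exploits that symmetry.

Example 1: a mug -> name=mug, symmetry=radial (ignoring the handle),
grasp=any diameter of the rim.
Example 2: a remote control -> name=remote, symmetry=bilateral,
grasp=across the short axis at the midline.

Now analyze the attached image and answer in the same format."""

def build_messages(image_b64: str):
    """Package the prompt and image in one common chat-VLM message shape."""
    return [{"role": "user", "content": [
        {"type": "text", "text": SYMMETRY_PROMPT},
        {"type": "image_url",
         "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
    ]}]
```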

24 pages, 3732 KB  
Review
The Elias University Hospital Approach: A Visual Guide to Ultrasound-Guided Botulinum Toxin Injection in Spasticity, Part IV—Distal Lower Limb Muscles
by Marius Nicolae Popescu, Claudiu Căpeț, Cristina Popescu and Mihai Berteanu
Toxins 2025, 17(10), 508; https://doi.org/10.3390/toxins17100508 - 16 Oct 2025
Abstract
Spasticity of the distal lower limb substantially impairs stance, gait, and quality of life in patients with upper motor neuron lesions. Although ultrasound-guided botulinum toxin A (BoNT-A) injections are increasingly employed, structured, muscle-specific visual guidance for the distal lower limb remains limited. This study provides a comprehensive guide for ultrasound-guided BoNT-A injections across ten key distal lower limb muscles: gastrocnemius, soleus, tibialis posterior, flexor hallucis longus, flexor digitorum longus, tibialis anterior, extensor hallucis longus, flexor digitorum brevis, flexor hallucis brevis, and extensor digitorum longus. For each muscle, we present (1) anatomical positioning relative to osseous landmarks; (2) sonographic identification cues and dynamic features; (3) zones of intramuscular neural arborization optimal for injection; and (4) practical injection protocols derived from the literature and clinical experience. High-resolution ultrasound images and dynamic videos illustrate real-life muscle behavior and guide injection site selection. This guide facilitates precise targeting by correlating sonographic signs with optimal injection zones, addresses common spastic patterns—including equinus, varus, claw toe, and hallux deformities—and integrates fascial anatomy with motor-point mapping. This article completes the Elias University Hospital visual series, providing clinicians with a unified framework for effective spasticity management to improve gait, posture, and patient autonomy.

21 pages, 2429 KB  
Article
Visualizing Spatial Cognition for Wayfinding Design: Examining Gaze Behaviors Using Mobile Eye Tracking in Counseling Service Settings
by Jain Kwon, Alea Schmidt, Chenyi Luo, Eunwoo Jun and Karina Martinez
ISPRS Int. J. Geo-Inf. 2025, 14(10), 406; https://doi.org/10.3390/ijgi14100406 - 16 Oct 2025
Abstract
Wayfinding with minimal effort is essential for reducing cognitive load and emotional stress in unfamiliar environments. This exploratory quasi-experimental study investigated wayfinding challenges in a university building housing three spatially dispersed counseling centers and three academic departments that share the building entrances, lobby, and hallways. Using mobile eye tracking with concurrent think-aloud protocols and schematic mapping, we examined visual attention patterns during predefined navigation tasks performed by 24 first-time visitors. Findings revealed frequent fixations on non-informative structural features, while existing wayfinding cues were often overlooked. High rates of null gazes indicated unsuccessful visual searching. Thematic analysis of verbal data identified eight key issues, including spatial confusion, aesthetic monotony, and inadequate signage. Participants frequently described the environment as disorienting and emotionally taxing, comparing it to institutional settings such as hospitals. In response, we developed wayfinding design proposals informed by our research findings, stakeholder needs, and contextual priorities. We used an experiential digital twin that prioritized perceptual fidelity to analyze the current wayfinding challenges, develop experimental protocols, and discuss design options and costs. This study offers a transferable methodological framework for identifying wayfinding challenges through convergent analysis of gaze patterns and verbal protocols, demonstrating how empirical findings can inform targeted wayfinding design interventions.
(This article belongs to the Special Issue Indoor Mobile Mapping and Location-Based Knowledge Services)

16 pages, 5544 KB  
Article
Visual Feature Domain Audio Coding for Anomaly Sound Detection Application
by Subin Byun and Jeongil Seo
Algorithms 2025, 18(10), 646; https://doi.org/10.3390/a18100646 - 15 Oct 2025
Abstract
Conventional audio and video codecs are designed for human perception, often discarding subtle spectral cues that are essential for machine-based analysis. To overcome this limitation, we propose a machine-oriented compression framework that reinterprets spectrograms as visual objects and applies Feature Coding for Machines (FCM) to anomalous sound detection (ASD). In our approach, audio signals are transformed into log-mel spectrograms, from which intermediate feature maps are extracted, compressed, and reconstructed through the FCM pipeline. For comparison, we implement AAC-LC (Advanced Audio Coding Low Complexity) as a representative perceptual audio codec and VVC (Versatile Video Coding) as a spectrogram-based video codec. Experiments were conducted on the DCASE (Detection and Classification of Acoustic Scenes and Events) 2023 Task 2 dataset, covering four machine types (fan, valve, toycar, slider), with anomaly detection performed using the official Autoencoder baseline model released in DCASE 2024. Detection scores were computed from reconstruction error and Mahalanobis distance. The results show that the proposed FCM-based ACoM (Audio Coding for Machines) achieves comparable or superior performance to AAC at less than half the bitrate, reliably preserving critical features even under ultra-low bitrate conditions (1.3–6.3 kbps). While VVC retains competitive performance only at high bitrates, it degrades sharply at low bitrates. These findings demonstrate that feature-based compression offers a promising direction for next-generation ACoM standardization, enabling efficient and robust ASD in bandwidth-constrained industrial environments.
(This article belongs to the Special Issue Visual Attributes in Computer Vision Applications)
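
The scoring the abstract describes, reconstruction error combined with Mahalanobis distance, can be sketched directly. The equal-weight sum and the pseudo-inverse covariance are assumptions; the toy arrays stand in for autoencoder bottleneck features and log-mel inputs.

```python
import numpy as np

def anomaly_scores(feats_train, feats_test, recon_test, x_test):
    """Combine per-sample reconstruction error with Mahalanobis distance."""
    mu = feats_train.mean(axis=0)
    cov = np.cov(feats_train, rowvar=False)
    cov_inv = np.linalg.pinv(cov)                        # robust to singular cov
    diff = feats_test - mu
    maha = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
    recon = np.mean((x_test - recon_test) ** 2, axis=1)  # per-sample MSE
    return recon + maha                                  # equal weights: an assumption

rng = np.random.default_rng(1)
train = rng.normal(size=(500, 16))        # bottleneck features of normal clips
test = rng.normal(size=(10, 16)) + 0.5    # slightly shifted "anomalous" clips
x = rng.normal(size=(10, 128))            # flattened log-mel frames
x_hat = x + rng.normal(0, 0.3, size=x.shape)
print(anomaly_scores(train, test, x_hat, x))
```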

28 pages, 32292 KB  
Article
Contextual Feature Fusion-Based Keyframe Selection Using Semantic Attention and Diversity-Aware Optimization for Video Summarization
by Chitrakala S and Aparyay Kumar
Symmetry 2025, 17(10), 1737; https://doi.org/10.3390/sym17101737 - 15 Oct 2025
Abstract
Training-free video summarization tackles the challenge of selecting the most informative keyframes from a video without relying on costly training or complex deep models. This work introduces C2FVS-DPP (Contextual Feature Fusion Video Summarization with Determinantal Point Process), a lightweight framework that generates concise video summaries by jointly modeling semantic importance, visual diversity, temporal structure, and symmetry. The design centers on a symmetry-aware fusion strategy, where appearance, motion, and semantic cues are aligned in a unified embedding space, and on a reward-guided optimization logic that balances representativeness and diversity. Specifically, appearance features from ResNet-50, motion cues from optical flow, and semantic representations from BERT-encoded BLIP captions are fused into a contextual embedding. A Transformer encoder assigns importance scores, followed by shot boundary detection and K-Medoids clustering to identify candidate keyframes. These candidates are refined through a reward-based re-ranking mechanism that integrates semantic relevance, representativeness, and visual uniqueness, while a Determinantal Point Process (DPP) enforces globally diverse selection under a keyframe budget. To enable reliable evaluation, enhanced versions of the SumMe and TVSum50 datasets were curated to reduce redundancy and increase semantic density. On these curated benchmarks, C2FVS-DPP achieves F1-scores of 0.22 and 0.43 and fidelity scores of 0.16 and 0.40 on SumMe and TVSum50, respectively, surpassing existing models on both metrics. In terms of compression ratio, the framework records 0.9959 on SumMe and 0.9940 on TVSum50, remaining highly competitive with the best-reported values of 0.9981 and 0.9983. These results highlight the strength of C2FVS-DPP as an inference-driven, symmetry-aware, and resource-efficient solution for video summarization.
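
Budgeted diverse selection under a DPP, as used above, can be sketched with naive greedy MAP inference over a quality-times-similarity kernel L = q_i * S_ij * q_j. This is an O(budget·n²) illustration; the paper's kernel construction and solver may differ.

```python
import numpy as np

def greedy_dpp(similarity, quality, budget):
    """Greedily grow the subset whose DPP kernel submatrix has maximal log-det."""
    n = len(quality)
    L = np.outer(quality, quality) * similarity  # quality-modulated kernel
    selected = []
    for _ in range(budget):
        best, best_gain = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            gain = np.linalg.slogdet(L[np.ix_(idx, idx)])[1]  # log det of submatrix
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
    return selected

rng = np.random.default_rng(2)
feats = rng.normal(size=(40, 64))                  # candidate keyframe embeddings
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
S = 0.5 * (feats @ feats.T + 1)                    # cosine similarity mapped to [0, 1]
q = rng.uniform(0.5, 1.0, size=40)                 # importance scores from the ranker
print(greedy_dpp(S, q, budget=5))                  # 5 diverse, important keyframes
```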

22 pages, 3239 KB  
Article
Feature-Level Vehicle-Infrastructure Cooperative Perception with Adaptive Fusion for 3D Object Detection
by Shuangzhi Yu, Jiankun Peng, Shaojie Wang, Di Wu and Chunye Ma
Smart Cities 2025, 8(5), 171; https://doi.org/10.3390/smartcities8050171 - 14 Oct 2025
Abstract
As vehicle-centric perception struggles with occlusion and dense traffic, vehicle-infrastructure cooperative perception (VICP) offers a viable route to extend sensing coverage and robustness. This study proposes a feature-level VICP framework that fuses vehicle- and roadside-derived visual features via V2X communication. The model integrates four components: regional feature reconstruction (RFR) for transferring region-specific roadside cues, context-driven channel attention (CDCA) for channel recalibration, uncertainty-weighted fusion (UWF) for confidence-guided weighting, and point sampling voxel fusion (PSVF) for efficient alignment. Evaluated on the DAIR-V2X-C benchmark, our method consistently outperforms state-of-the-art feature-level fusion baselines, achieving improved AP3D and APBEV (reported settings: 16.31% and 21.49%, respectively). Ablations show that RFR provides the largest single-module gain (+3.27% AP3D, +3.85% APBEV), UWF yields substantial robustness gains, and CDCA offers modest calibration benefits. The framework enhances occlusion handling and cross-view detection while reducing dependence on explicit camera calibration, supporting more generalizable cooperative perception.
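
Confidence-guided weighting of the kind UWF performs is often realized as inverse-variance weighting, sketched below. The paper's module may compute its weights differently; the shapes and the per-cell variance inputs are assumptions.

```python
import numpy as np

def uncertainty_weighted_fusion(feat_vehicle, feat_road, var_vehicle, var_road):
    """Weight each source by the inverse of its predicted variance."""
    w_v = 1.0 / (var_vehicle + 1e-6)
    w_r = 1.0 / (var_road + 1e-6)
    return (w_v * feat_vehicle + w_r * feat_road) / (w_v + w_r)

rng = np.random.default_rng(3)
f_v = rng.normal(size=(100, 32))            # vehicle-side BEV cell features
f_r = rng.normal(size=(100, 32))            # roadside BEV cell features (warped)
s_v = rng.uniform(0.1, 1.0, size=(100, 1))  # per-cell predicted variance
s_r = rng.uniform(0.1, 1.0, size=(100, 1))
fused = uncertainty_weighted_fusion(f_v, f_r, s_v, s_r)
print(fused.shape)  # (100, 32): low-uncertainty cells dominate the blend
```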

18 pages, 5377 KB  
Article
M3ENet: A Multi-Modal Fusion Network for Efficient Micro-Expression Recognition
by Ke Zhao, Xuanyu Liu and Guangqian Yang
Sensors 2025, 25(20), 6276; https://doi.org/10.3390/s25206276 - 10 Oct 2025
Abstract
Micro-expression recognition (MER) aims to detect brief and subtle facial movements that reveal suppressed emotions, discerning authentic emotional responses in scenarios such as visitor experience analysis in museum settings. However, it remains a highly challenging task due to the fleeting duration, low intensity, and limited availability of annotated data. Most existing approaches rely solely on either appearance or motion cues, thereby restricting their ability to capture expressive information fully. To overcome these limitations, we propose a lightweight multi-modal fusion network, termed M3ENet, which integrates both motion and appearance cues through early-stage feature fusion. Specifically, our model extracts horizontal, vertical, and strain-based optical flow between the onset and apex frames, alongside RGB images from the onset, apex, and offset frames. These inputs are processed by two modality-specific subnetworks, whose features are fused to exploit complementary information for robust classification. To improve generalization in low data regimes, we employ targeted data augmentation and adopt focal loss to mitigate class imbalance. Extensive experiments on five benchmark datasets, including CASME I, CASME II, CAS(ME)2, SAMM, and MMEW, demonstrate that M3ENet achieves state-of-the-art performance with high efficiency. Ablation studies and Grad-CAM visualizations further confirm the effectiveness and interpretability of the proposed architecture.
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems—2nd Edition)
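
The motion inputs listed above, horizontal and vertical optical flow plus a strain channel between onset and apex frames, can be sketched with OpenCV. Farneback flow and the strain-from-flow-gradients formula are common choices assumed here, not necessarily the paper's exact estimator.

```python
import cv2
import numpy as np

def flow_features(onset_gray, apex_gray):
    """Horizontal/vertical flow plus an optical-strain magnitude map."""
    flow = cv2.calcOpticalFlowFarneback(onset_gray, apex_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    u, v = flow[..., 0], flow[..., 1]
    ux, uy = np.gradient(u, axis=1), np.gradient(u, axis=0)
    vx, vy = np.gradient(v, axis=1), np.gradient(v, axis=0)
    # Magnitude of the 2-D strain tensor built from flow derivatives.
    strain = np.sqrt(ux**2 + vy**2 + 0.5 * (uy + vx) ** 2)
    return np.stack([u, v, strain], axis=-1)  # 3-channel motion input

onset = np.random.randint(0, 255, (112, 112), dtype=np.uint8)
apex = np.roll(onset, 2, axis=1)  # fake a subtle shift between frames
print(flow_features(onset, apex).shape)  # (112, 112, 3)
```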

20 pages, 3126 KB  
Article
Few-Shot Image Classification Algorithm Based on Global–Local Feature Fusion
by Lei Zhang, Xinyu Yang, Xiyuan Cheng, Wenbin Cheng and Yiting Lin
AI 2025, 6(10), 265; https://doi.org/10.3390/ai6100265 - 9 Oct 2025
Abstract
Few-shot image classification seeks to recognize novel categories from only a handful of labeled examples, but conventional metric-based methods that rely mainly on global image features often produce unstable prototypes under extreme data scarcity, while local-descriptor approaches can lose context and suffer from inter-class local-pattern overlap. To address these limitations, we propose a Global–Local Feature Fusion network that combines a frozen, pretrained global feature branch with a self-attention based multi-local feature fusion branch. Multiple random crops are encoded by a shared backbone (ResNet-12), projected to Query/Key/Value embeddings, and fused via scaled dot-product self-attention to suppress background noise and highlight discriminative local cues. The fused local representation is concatenated with the global feature to form robust class prototypes used in a prototypical-network style classifier. On four benchmarks, our method achieves strong improvements: Mini-ImageNet 70.31% ± 0.20 (1-shot)/85.91% ± 0.13 (5-shot), Tiered-ImageNet 73.37% ± 0.22/87.62% ± 0.14, FC-100 47.01% ± 0.20/64.13% ± 0.19, and CUB-200-2011 82.80% ± 0.18/93.19% ± 0.09, demonstrating consistent gains over competitive baselines. Ablation studies show that (1) naive local averaging improves over global-only baselines, (2) self-attention fusion yields a large additional gain (e.g., +4.50% in 1-shot on Mini-ImageNet), and (3) concatenating global and fused local features gives the best overall performance. These results indicate that explicitly modeling inter-patch relations and fusing multi-granularity cues produces markedly more discriminative prototypes in few-shot regimes.
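
The self-attention fusion of multiple random crops can be sketched in a few lines: crop features attend to each other via scaled dot-product attention, the mixed features are averaged, and the result is concatenated with the global feature. The learned Q/K/V projections of the paper are omitted here for brevity; the raw features serve all three roles.

```python
import numpy as np

def self_attention_fuse(crops):
    """Scaled dot-product self-attention over crop features, then mean-pool."""
    d = crops.shape[-1]
    scores = crops @ crops.T / np.sqrt(d)                  # (crops, crops)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # row-wise softmax
    attended = weights @ crops                             # context-mixed crops
    return attended.mean(axis=0)                           # fused local vector

rng = np.random.default_rng(4)
local = rng.normal(size=(6, 640))      # 6 random crops through the backbone
global_feat = rng.normal(size=640)     # whole-image feature (frozen branch)
prototype = np.concatenate([global_feat, self_attention_fuse(local)])
print(prototype.shape)  # (1280,): global + fused-local class prototype
```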
