Search Results (60)

Search Parameters:
Keywords = multi-label semantic fusion

28 pages, 3503 KiB  
Article
Structure-Aware and Format-Enhanced Transformer for Accident Report Modeling
by Wenhua Zeng, Wenhu Tang, Diping Yuan, Hui Zhang, Pinsheng Duan and Shikun Hu
Appl. Sci. 2025, 15(14), 7928; https://doi.org/10.3390/app15147928 - 16 Jul 2025
Abstract
Modeling accident investigation reports is crucial for elucidating accident causation mechanisms, analyzing risk evolution processes, and formulating effective accident prevention strategies. However, such reports are typically long, hierarchically structured, and information-dense, posing unique challenges for existing language models. To address these domain-specific characteristics, this study proposes SAFE-Transformer, a Structure-Aware and Format-Enhanced Transformer designed for long-document modeling in the emergency safety context. SAFE-Transformer adopts a dual-stream encoding architecture to separately model symbolic section features and heading text, integrates hierarchical depth and format types into positional encodings, and introduces a dynamic gating unit to adaptively fuse headings with paragraph semantics. We evaluate the model on a multi-label accident intelligence classification task using a real-world corpus of 1632 official reports from high-risk industries. Results demonstrate that SAFE-Transformer effectively captures hierarchical semantic structure and outperforms strong long-text baselines. Further analysis reveals an inverted U-shaped performance trend across varying report lengths and highlights the role of attention sparsity and label distribution in long-text modeling. This work offers a practical solution for structurally complex safety documents and provides methodological insights for downstream applications in safety supervision and risk analysis.
(This article belongs to the Special Issue Advances in Smart Construction and Intelligent Buildings)
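As a reading aid for the abstract above, the sketch below shows one plausible form of a dynamic gating unit that adaptively fuses heading features with paragraph semantics (Python/PyTorch). The class name, dimensions, and gate formulation are assumptions for illustration, not the authors' released SAFE-Transformer code.

import torch
import torch.nn as nn

class HeadingParagraphGate(nn.Module):
    # Hypothetical gating unit: learns how much of the heading vs. paragraph
    # representation to keep at each position.
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, heading: torch.Tensor, paragraph: torch.Tensor) -> torch.Tensor:
        # heading, paragraph: (batch, seq_len, dim)
        g = torch.sigmoid(self.gate(torch.cat([heading, paragraph], dim=-1)))
        return g * heading + (1.0 - g) * paragraph

fuse = HeadingParagraphGate(dim=256)
h = torch.randn(2, 128, 256)   # heading-stream features
p = torch.randn(2, 128, 256)   # paragraph-stream features
fused = fuse(h, p)             # (2, 128, 256)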

21 pages, 3826 KiB  
Article
UAV-OVD: Open-Vocabulary Object Detection in UAV Imagery via Multi-Level Text-Guided Decoding
by Lijie Tao, Guoting Wei, Zhuo Wang, Zhaoshuai Qi, Ying Li and Haokui Zhang
Drones 2025, 9(7), 495; https://doi.org/10.3390/drones9070495 - 14 Jul 2025
Viewed by 157
Abstract
Object detection in drone-captured imagery has attracted significant attention due to its wide range of real-world applications, including surveillance, disaster response, and environmental monitoring. The majority of existing methods are developed under closed-set assumptions; although some recent studies have begun to explore open-vocabulary or open-world detection, their application to UAV imagery remains limited and underexplored. In this paper, we address this limitation by exploring the relationship between images and textual semantics to extend object detection in UAV imagery to an open-vocabulary setting. We propose a novel and efficient detector named Unmanned Aerial Vehicle Open-Vocabulary Detector (UAV-OVD), specifically designed for drone-captured scenes. To facilitate open-vocabulary object detection, we propose improvements from three complementary perspectives. First, at the training level, we design a region–text contrastive loss to replace conventional classification loss, allowing the model to align visual regions with textual descriptions beyond fixed category sets. Structurally, building on this, we introduce a multi-level text-guided fusion decoder that integrates visual features across multiple spatial scales under language guidance, thereby improving overall detection performance and enhancing the representation and perception of small objects. Finally, from the data perspective, we enrich the original dataset with synonym-augmented category labels, enabling more flexible and semantically expressive supervision. Experiments conducted on two widely used benchmark datasets demonstrate that our approach achieves significant improvements in both mAP and recall. For instance, for zero-shot detection on xView, UAV-OVD achieves 9.9 mAP and 67.3 recall, 1.1 and 25.6 points higher than YOLO-World. In terms of speed, UAV-OVD achieves 53.8 FPS, nearly twice as fast as YOLO-World and five times faster than DetrReg, demonstrating its strong potential for real-time open-vocabulary detection in UAV imagery.
(This article belongs to the Special Issue Applications of UVs in Digital Photogrammetry and Image Processing)
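The region–text contrastive loss described above can be sketched generically as follows; the function name, temperature, and the use of one matching phrase per region are assumptions rather than the exact UAV-OVD training objective.

import torch
import torch.nn.functional as F

def region_text_contrastive_loss(region_feats, text_feats, text_idx, temperature=0.07):
    # region_feats: (N, d) visual region embeddings
    # text_feats:   (C, d) embeddings of category phrases (incl. synonyms)
    # text_idx:     (N,) index of the matching phrase for each region
    region = F.normalize(region_feats, dim=-1)
    text = F.normalize(text_feats, dim=-1)
    logits = region @ text.t() / temperature      # (N, C) similarity logits
    return F.cross_entropy(logits, text_idx)

regions = torch.randn(8, 512)
texts = torch.randn(20, 512)
labels = torch.randint(0, 20, (8,))
loss = region_text_contrastive_loss(regions, texts, labels)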

17 pages, 1609 KiB  
Article
Parallel Multi-Scale Semantic-Depth Interactive Fusion Network for Depth Estimation
by Chenchen Fu, Sujunjie Sun, Ning Wei, Vincent Chau, Xueyong Xu and Weiwei Wu
J. Imaging 2025, 11(7), 218; https://doi.org/10.3390/jimaging11070218 - 1 Jul 2025
Viewed by 273
Abstract
Self-supervised depth estimation from monocular image sequences provides depth information without costly sensors like LiDAR, offering significant value for autonomous driving. Although self-supervised algorithms can reduce the dependence on labeled data, their performance is still affected by scene occlusions, lighting differences, and sparse textures, and existing methods rarely consider feature enhancement and interactive feature fusion. In this paper, we propose a novel parallel multi-scale semantic-depth interactive fusion network. First, we adopt a multi-stage feature attention network for feature extraction, and a parallel semantic-depth interactive fusion module is introduced to refine edges. Furthermore, we also employ a metric loss based on semantic edges to take full advantage of semantic geometric information. Our network is trained and evaluated on the KITTI dataset. The experimental results show that the method achieves satisfactory performance compared to other existing methods.
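One plausible reading of the semantic-edge-based metric loss is a depth-smoothness term that is relaxed at semantic boundaries; the sketch below illustrates that idea only and is not the paper's exact formulation.

import torch

def semantic_edge_aware_smoothness(depth, semantics):
    # depth:     (B, 1, H, W) predicted depth
    # semantics: (B, 1, H, W) integer semantic labels
    # Penalise depth gradients except where a semantic boundary suggests a true edge.
    d_dx = torch.abs(depth[:, :, :, 1:] - depth[:, :, :, :-1])
    d_dy = torch.abs(depth[:, :, 1:, :] - depth[:, :, :-1, :])
    edge_x = (semantics[:, :, :, 1:] != semantics[:, :, :, :-1]).float()
    edge_y = (semantics[:, :, 1:, :] != semantics[:, :, :-1, :]).float()
    return (d_dx * (1.0 - edge_x)).mean() + (d_dy * (1.0 - edge_y)).mean()

depth = torch.rand(2, 1, 64, 64)
seg = torch.randint(0, 19, (2, 1, 64, 64))
loss = semantic_edge_aware_smoothness(depth, seg)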

24 pages, 2802 KiB  
Article
MSDCA: A Multi-Scale Dual-Branch Network with Enhanced Cross-Attention for Hyperspectral Image Classification
by Ning Jiang, Shengling Geng, Yuhui Zheng and Le Sun
Remote Sens. 2025, 17(13), 2198; https://doi.org/10.3390/rs17132198 - 26 Jun 2025
Viewed by 324
Abstract
The high dimensionality of hyperspectral data, coupled with limited labeled samples and complex scene structures, makes spatial–spectral feature learning particularly challenging. To address these limitations, we propose a dual-branch deep learning framework named MSDCA, which performs spatial–spectral joint modeling under limited supervision. First, a multiscale 3D spatial–spectral feature extraction module (3D-SSF) employs parallel 3D convolutional branches with diverse kernel sizes and dilation rates, enabling hierarchical modeling of spatial–spectral representations from large-scale patches and effectively capturing both fine-grained textures and global context. Second, a multi-branch directional feature module (MBDFM) enhances the network’s sensitivity to directional patterns and long-range spatial relationships. It achieves this by applying axis-aware depthwise separable convolutions along both horizontal and vertical axes, thereby significantly improving the representation of spatial features. Finally, the enhanced cross-attention Transformer encoder (ECATE) integrates a dual-branch fusion strategy, where a cross-attention stream learns semantic dependencies across multi-scale tokens, and a residual path ensures the preservation of structural integrity. The fused features are further refined through lightweight channel and spatial attention modules. This adaptive alignment process enhances the discriminative power of heterogeneous spatial–spectral features. The experimental results on three widely used benchmark datasets demonstrate that the proposed method consistently outperforms state-of-the-art approaches in terms of classification accuracy and robustness. Notably, the framework is particularly effective for small-sample classes and complex boundary regions, while maintaining high computational efficiency.
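The axis-aware depthwise separable convolutions of the MBDFM can be illustrated with a small PyTorch block; the kernel size, the additive combination of the two axes, and the pointwise mixing layer are assumptions made for this sketch.

import torch
import torch.nn as nn

class AxisAwareDWConv(nn.Module):
    # Hypothetical directional block: depthwise convolutions along the horizontal
    # and vertical axes, followed by a pointwise mix.
    def __init__(self, channels: int, k: int = 7):
        super().__init__()
        self.horizontal = nn.Conv2d(channels, channels, kernel_size=(1, k),
                                    padding=(0, k // 2), groups=channels)
        self.vertical = nn.Conv2d(channels, channels, kernel_size=(k, 1),
                                  padding=(k // 2, 0), groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.horizontal(x) + self.vertical(x))

block = AxisAwareDWConv(64)
out = block(torch.randn(1, 64, 32, 32))   # (1, 64, 32, 32)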

21 pages, 4359 KiB  
Article
Identification of NAPL Contamination Occurrence States in Low-Permeability Sites Using UNet Segmentation and Electrical Resistivity Tomography
by Mengwen Gao, Yu Xiao and Xiaolei Zhang
Appl. Sci. 2025, 15(13), 7109; https://doi.org/10.3390/app15137109 - 24 Jun 2025
Viewed by 193
Abstract
To address the challenges in identifying NAPL contamination within low-permeability clay sites, this study innovatively integrates high-density electrical resistivity tomography (ERT) with a UNet deep learning model to establish an intelligent contamination detection system. Taking an industrial site in Shanghai as the research object, we collected apparent resistivity data using the WGMD-9 system, obtained resistivity profiles through inversion imaging, and constructed training sets by generating contamination labels via K-means clustering. A semantic segmentation model with skip connections and multi-scale feature fusion was developed based on the UNet architecture to achieve automatic identification of contaminated areas. Experimental results demonstrate that the model achieves a mean Intersection over Union (mIoU) of 86.58%, an accuracy (Acc) of 99.42%, a precision (Pre) of 75.72%, a recall (Rec) of 76.80%, and an F1 score (f1) of 76.23%, effectively overcoming the noise interference in electrical anomaly interpretation through conventional geophysical methods in low-permeability clay, while outperforming DeepLabV3, DeepLabV3+, PSPNet, and LinkNet models. Time-lapse resistivity imaging verifies the feasibility of dynamic monitoring for contaminant migration, while the integration of the VGG-16 encoder and hyperparameter optimization (learning rate of 0.0001 and batch size of 8) significantly enhances model performance. Case visualization reveals high consistency between segmentation results and actual contamination distribution, enabling precise localization of spatial morphology for contamination plumes. This technological breakthrough overcomes the high-cost and low-efficiency limitations of traditional borehole sampling, providing a high-precision, non-destructive intelligent detection solution for contaminated site remediation.
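Generating contamination labels via K-means clustering, as described above, might look roughly like the following; the single-feature clustering on resistivity values and the choice of two clusters are simplifying assumptions, and mapping clusters to contaminated versus clean classes would rely on site knowledge.

import numpy as np
from sklearn.cluster import KMeans

def resistivity_to_pseudolabels(resistivity, n_clusters=2, seed=0):
    # resistivity: (H, W) inverted resistivity section (ohm-m)
    # Clusters resistivity values into pseudo-labels for UNet training; which
    # cluster corresponds to NAPL contamination is decided from site knowledge.
    h, w = resistivity.shape
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(resistivity.reshape(-1, 1)).reshape(h, w)

profile = np.random.lognormal(mean=3.0, sigma=0.5, size=(96, 200))
pseudo_mask = resistivity_to_pseudolabels(profile)   # (96, 200) training label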

20 pages, 2848 KiB  
Article
A Dual-Branch Network for Intra-Class Diversity Extraction in Panchromatic and Multispectral Classification
by Zihan Huang, Pengyu Tian, Hao Zhu, Pute Guo and Xiaotong Li
Remote Sens. 2025, 17(12), 1998; https://doi.org/10.3390/rs17121998 - 10 Jun 2025
Viewed by 322
Abstract
With the rapid development of remote sensing technology, satellites can now capture multispectral (MS) and panchromatic (PAN) images simultaneously. MS images offer rich spectral details, while PAN images provide high spatial resolutions. Effectively leveraging their complementary strengths and addressing modality gaps are key challenges in improving the classification performance. From the perspective of deep learning, this paper proposes a novel dual-source remote sensing classification framework named the Diversity Extraction and Fusion Classifier (DEFC-Net). A central innovation of our method lies in introducing a modality-specific intra-class diversity modeling mechanism for the first time in dual-source classification. Specifically, the intra-class diversity identification and splitting (IDIS) module independently analyzes the intra-class variance within each modality to identify semantically broad classes, and it applies an optimized K-means method to split such classes into fine-grained sub-classes. In particular, due to the inherent representation differences between the MS and PAN modalities, the same class may be split differently in each modality, allowing modality-aware class refinement that better captures fine-grained discriminative features in dual perspectives. To handle the class imbalance introduced by both natural long-tailed distributions and class splitting, we design a long-tailed ensemble learning module (LELM) based on a multi-expert structure to reduce bias toward head classes. Furthermore, a dual-modal knowledge distillation (DKD) module is developed to align cross-modal feature spaces and reconcile the label inconsistency arising from modality-specific class splitting, thereby facilitating effective information fusion across modalities. Extensive experiments on datasets show that our method significantly improves the classification performance. The code was accessed on 11 April 2025.
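A simplified sketch of the intra-class diversity identification and splitting idea (variance screening followed by K-means sub-classing) is given below; the variance threshold, the fixed upper bound on sub-classes, and the relabeling scheme are assumptions, not the IDIS module itself.

import numpy as np
from sklearn.cluster import KMeans

def split_broad_classes(features, labels, var_threshold, max_sub=4, seed=0):
    # features: (N, d) per-sample features from one modality (MS or PAN)
    # labels:   (N,) original class ids
    # Classes whose mean feature variance exceeds var_threshold are split
    # into sub-classes with K-means; the others keep their original label.
    new_labels = labels.copy()
    next_id = labels.max() + 1
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        if idx.size < max_sub:
            continue
        if features[idx].var(axis=0).mean() <= var_threshold:
            continue
        k = min(max_sub, idx.size)
        sub = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(features[idx])
        for s in range(1, k):                 # sub-cluster 0 keeps the old id
            new_labels[idx[sub == s]] = next_id
            next_id += 1
    return new_labels

feats = np.random.randn(500, 64)
labs = np.random.randint(0, 6, 500)
refined = split_broad_classes(feats, labs, var_threshold=0.9)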

22 pages, 12020 KiB  
Article
TFF-Net: A Feature Fusion Graph Neural Network-Based Vehicle Type Recognition Approach for Low-Light Conditions
by Huizhi Xu, Wenting Tan, Yamei Li and Yue Tian
Sensors 2025, 25(12), 3613; https://doi.org/10.3390/s25123613 - 9 Jun 2025
Viewed by 579
Abstract
Accurate vehicle type recognition in low-light environments remains a critical challenge for intelligent transportation systems (ITSs). To address the performance degradation caused by insufficient lighting, complex backgrounds, and light interference, this paper proposes a Twin-Stream Feature Fusion Graph Neural Network (TFF-Net) model. The model employs multi-scale convolutional operations combined with an Efficient Channel Attention (ECA) module to extract discriminative local features, while independent convolutional layers capture hierarchical global representations. These features are mapped as nodes to construct fully connected graph structures. Hybrid graph neural networks (GNNs) process the graph structures and model spatial dependencies and semantic associations. TFF-Net enhances the representation of features by fusing local details and global context information from the output of GNNs. To further improve its robustness, we propose an Adaptive Weighted Fusion-Bagging (AWF-Bagging) algorithm, which dynamically assigns weights to base classifiers based on their F1 scores. TFF-Net also includes dynamic feature weighting and label smoothing techniques for solving the category imbalance problem. Finally, the proposed TFF-Net is integrated into YOLOv11n (a lightweight real-time object detector) with an improved adaptive loss function. For experimental validation in low-light scenarios, we constructed the low-light vehicle dataset VDD-Light based on the public dataset UA-DETRAC. Experimental results demonstrate that our model achieves 2.6% and 2.2% improvements in mAP50 and mAP50-95 metrics over the baseline model. Compared to mainstream models and methods, the proposed model shows excellent performance and practical deployment potential.
(This article belongs to the Section Vehicular Sensing)
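The F1-weighted ensemble idea behind AWF-Bagging can be sketched as follows; using macro-F1 on a validation split and averaging predicted probabilities are assumptions made for this illustration, and the decision trees are stand-in base classifiers.

import numpy as np
from sklearn.metrics import f1_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

def awf_bagging_predict(classifiers, X_val, y_val, X_test, n_classes):
    # Weight each base classifier by its macro-F1 on a validation split,
    # then combine their class probabilities by weighted averaging.
    weights = np.array([f1_score(y_val, clf.predict(X_val), average="macro")
                        for clf in classifiers])
    weights = weights / weights.sum()
    votes = np.zeros((len(X_test), n_classes))
    for w, clf in zip(weights, classifiers):
        votes += w * clf.predict_proba(X_test)
    return votes.argmax(axis=1)

X, y = make_classification(n_samples=400, n_classes=3, n_informative=6, random_state=0)
clfs = [DecisionTreeClassifier(max_depth=d, random_state=0).fit(X[:300], y[:300])
        for d in (3, 5, 7)]
pred = awf_bagging_predict(clfs, X[300:], y[300:], X[300:], n_classes=3)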

22 pages, 1877 KiB  
Article
Malicious Cloud Service Traffic Detection Based on Multi-Feature Fusion
by Zhouguo Chen, Chen Deng, Xiang Gao, Xinze Li and Hangyu Hu
Electronics 2025, 14(11), 2190; https://doi.org/10.3390/electronics14112190 - 28 May 2025
Viewed by 287
Abstract
With the rapid growth of cloud computing, malicious attacks targeting cloud services have become increasingly sophisticated and prevalent. To address the limitations of traditional detection methods—such as reliance on single-dimensional features and poor generalization—we propose a novel malicious request detection model based on multi-feature fusion. The model adopts a dual-branch architecture that independently extracts and learns from statistical attributes (e.g., field lengths, entropy) and field attributes (e.g., semantic content of the requested fields). To enhance feature representation, an attention-based fusion mechanism is introduced to dynamically weight and integrate field-level features, while a Gini coefficient-guided random forest algorithm is used to select the most informative statistical features. This design enables the model to capture both structural and semantic characteristics of cloud service traffic. Extensive experiments on the benchmark CSIC 2010 dataset and a real-world labeled cloud service dataset demonstrated that the proposed model significantly outperforms existing approaches in terms of accuracy, precision, recall, and F1 score. These results validate the effectiveness and robustness of our multi-feature fusion approach for detecting malicious requests in cloud environments.
(This article belongs to the Special Issue Advancements in AI-Driven Cybersecurity and Securing AI Systems)
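Gini-guided feature selection with a random forest, as mentioned above, can be approximated with scikit-learn's impurity-based importances; the synthetic data and the top-k cutoff below are placeholders, not the paper's configuration.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Rank statistical features (e.g., field lengths, entropies) by Gini importance
# and keep the top-k for the statistical branch of the detector.
X, y = make_classification(n_samples=1000, n_features=30, n_informative=8, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top_k = 10
selected = np.argsort(forest.feature_importances_)[::-1][:top_k]
X_selected = X[:, selected]    # features fed to the downstream fusion model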

21 pages, 3538 KiB  
Article
MFFP-Net: Building Segmentation in Remote Sensing Images via Multi-Scale Feature Fusion and Foreground Perception Enhancement
by Huajie Xu, Qiukai Huang, Haikun Liao, Ganxiao Nong and Wei Wei
Remote Sens. 2025, 17(11), 1875; https://doi.org/10.3390/rs17111875 - 28 May 2025
Viewed by 448
Abstract
The accurate segmentation of small target buildings in high-resolution remote sensing images remains challenging due to two critical issues: (1) small target buildings often occupy few pixels in complex backgrounds, leading to frequent background confusion, and (2) significant intra-class variance complicates feature representation compared to conventional semantic segmentation tasks. To address these challenges, we propose a novel Multi-Scale Feature Fusion and Foreground Perception Enhancement Network (MFFP-Net). This framework introduces three key innovations: (1) a Multi-Scale Feature Fusion (MFF) module that hierarchically aggregates shallow features through cross-level connections to enhance fine-grained detail preservation, (2) a Foreground Perception Enhancement (FPE) module that establishes pixel-wise affinity relationships within foreground regions to mitigate intra-class variance effects, and (3) a Dual-Path Attention (DPA) mechanism combining parallel global and local attention pathways to jointly capture structural details and long-range contextual dependencies. Experimental results demonstrate that the IoU of the proposed method achieves improvements of 0.44%, 0.98% and 0.61% compared to mainstream state-of-the-art methods on the WHU Building, Massachusetts Building, and Inria Aerial Image Labeling datasets, respectively, validating its effectiveness in handling small targets and intra-class variance while maintaining robustness in complex scenarios.
(This article belongs to the Section AI Remote Sensing)
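A minimal interpretation of a dual-path attention block (a global channel-attention path in parallel with a local spatial-attention path) is sketched below; the specific layer choices are assumptions, not the published MFFP-Net design.

import torch
import torch.nn as nn

class DualPathAttention(nn.Module):
    # Hypothetical DPA block: global path reweights channels from pooled context,
    # local path reweights positions from a spatial convolution.
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.global_path = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.local_path = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x):
        return x * self.global_path(x) + x * self.local_path(x)

dpa = DualPathAttention(64)
y = dpa(torch.randn(1, 64, 32, 32))   # (1, 64, 32, 32)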

35 pages, 5356 KiB  
Article
SAM-Guided Concrete Bridge Damage Segmentation with Mamba–ResNet Hierarchical Fusion Network
by Hao Li, Jianxi Yang, Shixin Jiang and Xiaoxia Yang
Electronics 2025, 14(8), 1497; https://doi.org/10.3390/electronics14081497 - 8 Apr 2025
Cited by 1 | Viewed by 742
Abstract
Automated damage segmentation for concrete bridges is a fundamental task in infrastructure maintenance, yet existing systems often depend heavily on large annotated datasets, which are costly and time-consuming to produce. This paper presents an innovative framework for concrete bridge damage segmentation, leveraging the Segment Anything Model (SAM) to reduce the reliance on extensive annotated data while enhancing segmentation accuracy and efficiency. Firstly, a SAM-guided mask generation network is introduced, which utilizes SAM’s segmentation capabilities to generate supplementary supervision labels for damage segmentation. Then, a novel point-prompting strategy, incorporating saliency information, is proposed to refine SAM’s prompts, ensuring accurate mask generation for complex damage patterns. Next, a trainable semantic segmentation network is designed, integrating MambaVision and ResNet as dual backbones to capture multi-level features from concrete bridge damages. To fuse these features effectively, a Hierarchical Attention Fusion (HAF) mechanism is introduced. Finally, a Polarized Self-Attention (PSA) decoder is employed to improve segmentation precision. Experiments on a dataset of 10,000 concrete bridge images with box-level annotations achieved state-of-the-art performance, with an MIoU of 60.13%, PA of 74.02%, and MDice of 75.40%, outperforming existing segmentation models. In summary, this study improves the accuracy of concrete bridge damage segmentation through a series of complementary methods and strategies, opening up new directions for bridge damage segmentation research.
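The saliency-informed point-prompting strategy could be approximated by picking mutually distant high-saliency points to serve as positive point prompts; the greedy selection and suppression radius below are assumptions, and the actual call into SAM is omitted.

import numpy as np

def saliency_point_prompts(saliency, num_points=5, min_dist=20):
    # saliency: (H, W) map highlighting likely damage regions.
    # Greedily pick high-saliency points that are mutually far apart; the
    # (x, y) coordinates and positive labels could then be fed to a SAM-style
    # point-prompt interface to generate supplementary masks.
    points = []
    sal = saliency.copy()
    h, w = sal.shape
    for _ in range(num_points):
        y, x = np.unravel_index(np.argmax(sal), sal.shape)
        if sal[y, x] <= 0:
            break
        points.append((int(x), int(y)))
        y0, y1 = max(0, y - min_dist), min(h, y + min_dist)
        x0, x1 = max(0, x - min_dist), min(w, x + min_dist)
        sal[y0:y1, x0:x1] = 0          # suppress the neighbourhood
    coords = np.array(points)                       # (k, 2) in (x, y) order
    labels = np.ones(len(points), dtype=np.int64)   # all foreground prompts
    return coords, labels

sal_map = np.random.rand(256, 256)
coords, labels = saliency_point_prompts(sal_map)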

27 pages, 17923 KiB  
Article
A Semantically Guided Deep Supervised Hashing Model for Multi-Label Remote Sensing Image Retrieval
by Bowen Liu, Shibin Liu and Wei Liu
Remote Sens. 2025, 17(5), 838; https://doi.org/10.3390/rs17050838 - 27 Feb 2025
Viewed by 754
Abstract
With the rapid growth of remote sensing data, efficiently managing and retrieving large-scale remote sensing images has become a significant challenge. Specifically, for multi-label image retrieval, single-scale feature extraction methods often fail to capture the rich and complex information inherent in these images. Additionally, the sheer volume of data creates challenges in retrieval efficiency. Furthermore, leveraging semantic information for more accurate retrieval remains an open issue. In this paper, we propose a multi-label remote sensing image retrieval method based on an improved Swin Transformer, called Semantically Guided Deep Supervised Hashing (SGDSH). The method aims to enhance feature extraction capabilities and improve retrieval precision. By utilizing multi-scale information through an end-to-end learning approach with a multi-scale feature fusion module, SGDSH effectively integrates both shallow and deep features. A classification layer is introduced to assist in training the hash codes, incorporating RS image category information to improve retrieval accuracy. The model is optimized for multi-label retrieval through a novel loss function that combines classification loss, pairwise similarity loss, and hash code quantization loss. Experimental results on three publicly available remote sensing datasets, with varying sizes and label distributions, demonstrate that SGDSH outperforms state-of-the-art multi-label hashing methods in terms of average accuracy and weighted average precision. Moreover, SGDSH returns more relevant images with higher label similarity to query images. These findings confirm the effectiveness of SGDSH for large-scale remote sensing image retrieval tasks and provide new insights for future research on multi-label remote sensing image retrieval.
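A hedged sketch of a combined hashing objective of the kind described (classification plus pairwise similarity plus quantization) is shown below; the loss weights and the particular pairwise formulation are assumptions rather than SGDSH's exact loss.

import torch
import torch.nn.functional as F

def combined_hashing_loss(hash_logits, class_logits, labels, alpha=1.0, beta=0.5, gamma=0.1):
    # hash_logits:  (B, K) relaxed hash codes (tanh-activated outputs)
    # class_logits: (B, C) outputs of the auxiliary classification layer
    # labels:       (B, C) multi-hot label matrix
    cls_loss = F.binary_cross_entropy_with_logits(class_logits, labels.float())
    # Pairwise similarity: pairs sharing at least one label should have similar codes.
    sim = (labels.float() @ labels.float().t() > 0).float()           # (B, B)
    inner = hash_logits @ hash_logits.t() / hash_logits.shape[1]       # in [-1, 1]
    pair_loss = F.mse_loss(inner, 2.0 * sim - 1.0)
    # Quantization: push relaxed codes toward {-1, +1}.
    quant_loss = (hash_logits - hash_logits.sign()).pow(2).mean()
    return alpha * cls_loss + beta * pair_loss + gamma * quant_loss

codes = torch.tanh(torch.randn(16, 64))
logits = torch.randn(16, 10)
multi_hot = (torch.rand(16, 10) > 0.7).long()
loss = combined_hashing_loss(codes, logits, multi_hot)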

20 pages, 2858 KiB  
Article
A Hybrid Intention Recognition Framework with Semantic Inference for Financial Customer Service
by Nian Cai, Shishan Li, Jiajie Xu, Yinfeng Tian, Yinghong Zhou and Jiacheng Liao
Electronics 2025, 14(3), 495; https://doi.org/10.3390/electronics14030495 - 25 Jan 2025
Viewed by 906
Abstract
Automatic intention recognition in financial service scenarios faces challenges such as limited corpus size, high colloquialism, and ambiguous intentions. This paper proposes a hybrid intention recognition framework for financial customer service, which involves semi-supervised learning data augmentation, label semantic inference, and text classification. A semi-supervised learning method is designed to augment the limited corpus data obtained from the Chinese financial service scenario, which combines back-translation with BERT models. Then, a K-means-based semantic inference method is introduced to extract label semantic information from categorized corpus data, serving as constraints for subsequent text classification. Finally, a BERT-based text classification network is designed to recognize the intentions in financial customer service, involving a multi-level feature fusion for corpus information and label semantic information. During the multi-level feature fusion, a shallow-to-deep (StD) mechanism is designed to alleviate feature collapse. To validate our hybrid framework, 2977 corpus texts about loan service are provided by a financial company in China. Experimental results demonstrate that our hybrid framework outperforms existing deep learning methods in financial customer service intention recognition, achieving an accuracy of 89.06%, precision of 90.27%, recall of 90.40%, and an F1 score of 90.07%. This study demonstrates the potential of the hybrid framework for automatic intention recognition in financial customer service, which is beneficial for improving financial service quality.
(This article belongs to the Section Artificial Intelligence)
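K-means-based label semantic inference could be prototyped as below, clustering each category's utterances and keeping the centroids as label semantic vectors; the TF-IDF features stand in for the BERT representations presumably used in the paper, and the cluster count is an assumption.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def infer_label_semantics(texts, labels, n_clusters=2, seed=0):
    # texts:  list of customer utterances; labels: their intention categories.
    # For each category, cluster its utterance vectors and keep the cluster
    # centroids as label semantic vectors that can constrain the classifier.
    vec = TfidfVectorizer(max_features=2000)
    X = vec.fit_transform(texts).toarray()
    semantics = {}
    for c in sorted(set(labels)):
        Xc = X[np.array(labels) == c]
        k = min(n_clusters, len(Xc))
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Xc)
        semantics[c] = km.cluster_centers_          # (k, vocab_dim)
    return semantics

texts = ["how do I repay my loan early", "loan repayment date",
         "raise my credit limit", "increase limit please"]
labels = ["repayment", "repayment", "limit", "limit"]
label_vectors = infer_label_semantics(texts, labels)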

17 pages, 3811 KiB  
Article
A Named Entity Recognition Model for Chinese Electricity Violation Descriptions Based on Word-Character Fusion and Multi-Head Attention Mechanisms
by Lingwen Meng, Yulin Wang, Yuanjun Huang, Dingli Ma, Xinshan Zhu and Shumei Zhang
Energies 2025, 18(2), 401; https://doi.org/10.3390/en18020401 - 17 Jan 2025
Viewed by 749
Abstract
Due to the complexity and technicality of named entity recognition (NER) in the power grid field, existing methods are ineffective at identifying specialized terms in power grid operation record texts. Therefore, this paper proposes a Chinese power violation description entity recognition model based on word-character fusion and multi-head attention mechanisms. The model first utilizes a collected power grid domain corpus to train a Word2Vec model, which produces static word vector representations. These static word vectors are then integrated with the dynamic character vector features of the input text generated by the BERT model, thereby mitigating the impact of segmentation errors on the NER model and enhancing the model’s ability to identify entity boundaries. The combined vectors are subsequently input into a BiGRU model for learning contextual features. The output from the BiGRU layer is then passed to an attention mechanism layer to obtain enhanced semantic features, which highlight key semantics and improve the model’s contextual understanding ability. Finally, the CRF layer decodes the output to generate the globally optimal label sequence with the highest probability. Experimental results on the constructed power grid field operation violation description dataset demonstrate that the proposed NER model outperforms the traditional BERT-BiLSTM-CRF model, with an average improvement of 1.58% in precision, recall, and F1-score. This demonstrates the effectiveness of the model design and further enhances the accuracy of entity recognition in the power grid domain.
(This article belongs to the Section A1: Smart Grids and Microgrids)
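The word-character fusion step can be sketched as concatenating a static word vector (repeated over the characters of its word) with dynamic character vectors before a BiGRU; the dimensions and tag count below are placeholders, and the attention and CRF decoding layers are omitted.

import torch
import torch.nn as nn

class WordCharFusionEncoder(nn.Module):
    # Hypothetical fusion encoder: concatenates a static Word2Vec word vector,
    # aligned to each character of its word, with dynamic character vectors
    # (e.g., from BERT), then models context with a BiGRU.
    def __init__(self, char_dim=768, word_dim=100, hidden=256, num_tags=9):
        super().__init__()
        self.bigru = nn.GRU(char_dim + word_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.emit = nn.Linear(2 * hidden, num_tags)

    def forward(self, char_vecs, word_vecs_per_char):
        # char_vecs:          (B, T, char_dim) dynamic character representations
        # word_vecs_per_char: (B, T, word_dim) static word vector per character
        x = torch.cat([char_vecs, word_vecs_per_char], dim=-1)
        h, _ = self.bigru(x)
        return self.emit(h)   # (B, T, num_tags) emission scores for a CRF layer

enc = WordCharFusionEncoder()
scores = enc(torch.randn(2, 40, 768), torch.randn(2, 40, 100))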

19 pages, 6995 KiB  
Article
A Classification Model for Fine-Grained Silkworm Cocoon Images Based on Bilinear Pooling and Adaptive Feature Fusion
by Mochen Liu, Xin Hou, Mingrui Shang, Eunice Oluwabunmi Owoola, Guizheng Zhang, Wei Wei, Zhanhua Song and Yinfa Yan
Agriculture 2024, 14(12), 2363; https://doi.org/10.3390/agriculture14122363 - 22 Dec 2024
Viewed by 1291
Abstract
The quality of silkworm cocoons affects the quality and cost of silk processing. It is necessary to sort silkworm cocoons prior to silk production. Cocoon images consist of fine-grained images with large intra-class differences and small inter-class differences. The subtle intra-class features pose a serious challenge in accurately locating the effective areas and classifying silkworm cocoons. To improve the perception of intra-class features and the classification accuracy, this paper proposes a bilinear pooling classification model (B-Res41-ASE) based on adaptive multi-scale feature fusion and enhancement. B-Res41-ASE consists of three parts: a feature extraction module, a feature fusion module, and a feature enhancement module. Firstly, the backbone network, ResNet41, is constructed based on the bilinear pooling algorithm to extract complete cocoon features. Secondly, the adaptive spatial feature fusion module (ASFF) is introduced to fuse different semantic information to solve the problem of fine-grained information loss in the process of feature extraction. Finally, the squeeze and excitation module (SE) is used to suppress redundant information, enhance the weight of distinguishable regions, and reduce classification bias. Compared with the widely used classification network, the proposed model achieves the highest classification performance in the test set, with accuracy of 97.0% and an F1-score of 97.5%. The accuracy of B-Res41-ASE is 3.1% and 2.6% higher than that of the classification networks AlexNet and GoogLeNet, respectively, while the F1-score is 2.5% and 2.2% higher, respectively. Additionally, the accuracy of B-Res41-ASE is 1.9% and 7.7% higher than that of the Bilinear CNN and HBP, respectively, while the F1-score is 1.6% and 5.7% higher. The experimental results show that the proposed classification model without complex labelling outperforms other cocoon classification algorithms in terms of classification accuracy and robustness, providing a theoretical basis for the intelligent sorting of silkworm cocoons.
(This article belongs to the Section Digital Agriculture)
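The bilinear pooling at the heart of B-Res41-ASE follows the standard bilinear-CNN recipe, which can be written compactly as below; the feature map size is a placeholder, and the ASFF and SE modules are not shown.

import torch
import torch.nn.functional as F

def bilinear_pool(features):
    # features: (B, C, H, W) backbone feature maps.
    # Outer product over channels, averaged over locations, then the usual
    # signed square-root and L2 normalisation used in bilinear CNNs.
    b, c, h, w = features.shape
    x = features.reshape(b, c, h * w)
    bilinear = torch.bmm(x, x.transpose(1, 2)) / (h * w)       # (B, C, C)
    bilinear = bilinear.reshape(b, c * c)
    bilinear = torch.sign(bilinear) * torch.sqrt(torch.abs(bilinear) + 1e-10)
    return F.normalize(bilinear, dim=1)                         # classifier input

feat = torch.randn(2, 128, 14, 14)
desc = bilinear_pool(feat)    # (2, 16384)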

22 pages, 9696 KiB  
Article
Text-Enhanced Graph Attention Hashing for Cross-Modal Retrieval
by Qiang Zou, Shuli Cheng, Anyu Du and Jiayi Chen
Entropy 2024, 26(11), 911; https://doi.org/10.3390/e26110911 - 27 Oct 2024
Viewed by 1591
Abstract
Deep hashing technology, known for its low-cost storage and rapid retrieval, has become a focal point in cross-modal retrieval research as multimodal data continue to grow. However, existing supervised methods often overlook noisy labels and multiscale features in different modal datasets, leading to higher information entropy in the generated hash codes and features, which reduces retrieval performance. The variation in text annotation information across datasets further increases the information entropy during text feature extraction, resulting in suboptimal outcomes. Consequently, reducing the information entropy in text feature extraction, supplementing text feature information, and enhancing the retrieval efficiency of large-scale media data are critical challenges in cross-modal retrieval research. To tackle these, this paper introduces the Text-Enhanced Graph Attention Hashing for Cross-Modal Retrieval (TEGAH) framework. TEGAH incorporates a deep text feature extraction network and a multiscale label region fusion network to minimize information entropy and optimize feature extraction. Additionally, a Graph-Attention-based modal feature fusion network is designed to efficiently integrate multimodal information, enhance the affinity of the network for different modes, and retain more semantic information. Extensive experiments on three multilabel datasets demonstrate that the TEGAH framework significantly outperforms state-of-the-art cross-modal hashing methods.
(This article belongs to the Section Multidisciplinary Applications)
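A minimal single-head graph attention layer, of the kind that could underlie graph-attention-based modal feature fusion, is sketched below; it is a generic illustration with assumed dimensions, not the TEGAH network.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGraphAttention(nn.Module):
    # Minimal single-head graph attention layer for fusing modality features
    # that have been mapped to nodes of a fully connected graph.
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, nodes):
        # nodes: (N, in_dim) node features (e.g., image and text feature tokens)
        h = self.proj(nodes)                                   # (N, out_dim)
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        scores = F.leaky_relu(self.attn(pairs).squeeze(-1))    # (N, N)
        alpha = torch.softmax(scores, dim=-1)
        return alpha @ h                                       # (N, out_dim)

gat = SimpleGraphAttention(512, 256)
fused_nodes = gat(torch.randn(12, 512))   # 12 nodes from image and text features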
