Search Results (105)

Search Parameters:
Keywords = multi-label text classification

18 pages, 3705 KB  
Article
Cross-Platform Multi-Modal Transfer Learning Framework for Cyberbullying Detection
by Weiqi Zhang, Chengzu Dong, Aiting Yao, Asef Nazari and Anuroop Gaddam
Electronics 2026, 15(2), 442; https://doi.org/10.3390/electronics15020442 - 20 Jan 2026
Viewed by 170
Abstract
Cyberbullying and hate speech increasingly appear in multi-modal social media posts, where images and text are combined in diverse and fast-changing ways across platforms. These posts differ in style, vocabulary, and layout, and labeled data are sparse and noisy, which makes it difficult to train detectors that are both reliable and deployable under tight computational budgets. Many high-performing systems rely on large vision–language backbones, full-parameter fine-tuning, online retrieval, or model ensembles, which raises training and inference costs. We present a parameter-efficient cross-platform multi-modal transfer learning framework for cyberbullying and hateful content detection. Our framework has three components. First, we perform domain-adaptive pretraining of a compact ViLT backbone on in-domain image–text corpora. Second, we apply parameter-efficient fine-tuning that updates only bias terms, a small subset of LayerNorm parameters, and the classification head, leaving the inference computation graph unchanged. Third, we use noise-aware knowledge distillation from a stronger teacher built from pretrained text and CLIP-based image–text encoders, where only high-confidence, temperature-scaled predictions are used as soft labels during training, and teacher models and any retrieval components are used only offline. We evaluate primarily on Hateful Memes and use IMDB as an auxiliary text-only benchmark to show that the deployment-aware PEFT + offline-KD recipe can still be applied when other modalities are unavailable. On Hateful Memes, our student updates only 0.11% of parameters and retains about 96% of the AUROC of full fine-tuning.
(This article belongs to the Special Issue Data Privacy and Protection in IoT Systems)
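The bias/LayerNorm/head update rule described in the abstract can be sketched as a name-based parameter filter. A minimal sketch, assuming hypothetical layer names and sizes (this is not the authors' implementation):

```python
# Parameter-efficient fine-tuning sketch: unfreeze only bias terms,
# LayerNorm parameters, and the classification head, then report the
# trainable fraction. Parameter names and sizes below are hypothetical.

def select_trainable(param_shapes, patterns=("bias", "LayerNorm", "classifier")):
    """Return the parameter names to unfreeze and the trainable fraction."""
    trainable = [n for n in param_shapes if any(p in n for p in patterns)]
    total = sum(param_shapes.values())
    updated = sum(param_shapes[n] for n in trainable)
    return trainable, updated / total

params = {
    "encoder.layer.0.attention.weight": 589_824,
    "encoder.layer.0.attention.bias": 768,
    "encoder.layer.0.LayerNorm.weight": 768,
    "encoder.layer.0.LayerNorm.bias": 768,
    "classifier.weight": 1_536,
    "classifier.bias": 2,
}
names, frac = select_trainable(params)
# frac is well under 1% of all parameters, in the spirit of the 0.11% figure
```

Because only names are matched, the forward computation graph is untouched, which is what makes the recipe deployment-friendly.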

33 pages, 10634 KB  
Article
Examining the Nature and Dimensions of Artificial Intelligence Incidents: A Machine Learning Text Analytics Approach
by Wullianallur Raghupathi, Jie Ren and Tanush Kulkarni
AppliedMath 2026, 6(1), 11; https://doi.org/10.3390/appliedmath6010011 - 9 Jan 2026
Viewed by 276
Abstract
As artificial intelligence systems proliferate across critical societal domains, understanding the nature, patterns, and evolution of AI-related harms has become essential for effective governance. Despite growing incident repositories, systematic computational analysis of AI incident discourse remains limited, with prior research constrained by small samples, single-method approaches, and the absence of temporal analysis spanning major capability advances. This study addresses these gaps through a comprehensive multi-method text analysis of 3494 AI incident records from the OECD AI Policy Observatory, spanning January 2014 through October 2024. Six complementary analytical approaches were applied: Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) topic modeling to discover thematic structures; K-Means and BERTopic clustering for pattern identification; VADER sentiment analysis for emotional framing assessment; and LIWC psycholinguistic profiling for cognitive and communicative dimension analysis. Cross-method comparison quantified categorization robustness across all four clustering and topic modeling approaches. Key findings reveal dramatic temporal shifts and systematic risk patterns. Incident reporting increased 4.6-fold following ChatGPT’s November 2022 release (from 12.0 to 95.9 monthly incidents), accompanied by a vocabulary transformation from embodied-AI terminology (facial recognition, autonomous vehicles) toward generative-AI discourse (ChatGPT, hallucination, jailbreak). Six robust thematic categories emerged consistently across methods: autonomous vehicles (84–89% cross-method alignment), facial recognition (66–68%), deepfakes, ChatGPT/generative AI, social media platforms, and algorithmic bias. Risk concentration is pronounced: 49.7% of incidents fall within two harm categories (system safety 29.1%, physical harms 20.6%); private sector actors account for 70.3%; and 48% occur in the United States. Sentiment analysis reveals that physical safety incidents receive notably negative framing (autonomous vehicles: −0.077; child safety: −0.326), while policy and generative AI coverage trend positive (+0.586 to +0.633). These findings have direct governance implications. The thematic concentration supports sector-specific regulatory frameworks: mandatory audit trails for hiring algorithms, simulation testing for autonomous vehicles, transparency requirements for recommender systems, accuracy standards for facial recognition, and output labeling for generative AI. Cross-method validation demonstrates which incident categories are robust enough for standardized regulatory classification versus those requiring context-dependent treatment. The rapid emergence of generative AI incidents underscores the need for governance mechanisms responsive to capability advances within months rather than years.
(This article belongs to the Section Computational and Numerical Mathematics)
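The cross-method alignment percentages quoted above (e.g., 84–89% for autonomous vehicles) can be read as a best-overlap fraction between two clusterings. A minimal sketch with made-up cluster assignments (not the study's data):

```python
# Cross-method alignment sketch: for each cluster found by method A, the
# fraction of its items that land in the single best-matching cluster of
# method B. Assignments below are illustrative, not from the paper.
from collections import Counter

def alignment(labels_a, labels_b):
    clusters = {}
    for item, a in enumerate(labels_a):
        clusters.setdefault(a, []).append(item)
    out = {}
    for a, items in clusters.items():
        counts = Counter(labels_b[i] for i in items)
        out[a] = counts.most_common(1)[0][1] / len(items)  # best overlap
    return out

# Hypothetical assignments from two methods (say, LDA vs. BERTopic)
lda      = [0, 0, 0, 0, 1, 1, 1, 1]
bertopic = [2, 2, 2, 5, 7, 7, 7, 7]
scores = alignment(lda, bertopic)  # per-cluster alignment fractions
```

A category is "robust" in this sense when its alignment fraction stays high regardless of which pair of methods is compared.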

25 pages, 1075 KB  
Article
Prompt-Based Few-Shot Text Classification with Multi-Granularity Label Augmentation and Adaptive Verbalizer
by Deling Huang, Zanxiong Li, Jian Yu and Yulong Zhou
Information 2026, 17(1), 58; https://doi.org/10.3390/info17010058 - 8 Jan 2026
Viewed by 279
Abstract
Few-Shot Text Classification (FSTC) aims to classify text accurately into predefined categories using minimal training samples. Recently, prompt-tuning-based methods have achieved promising results by constructing verbalizers that map input data to the label space, thereby maximizing the utilization of pre-trained model features. However, existing verbalizer construction methods often rely on external knowledge bases, which require complex noise filtering and manual refinement, making the process time-consuming and labor-intensive; approaches based on pre-trained language models (PLMs) frequently overlook inherent prediction biases. Furthermore, conventional data augmentation methods focus on modifying input instances while overlooking the integral role of label semantics in prompt tuning. This disconnect often leads to a trade-off in which increased sample diversity comes at the cost of semantic consistency, yielding only marginal improvements. To address these limitations, this paper first proposes a novel Bayesian mutual-information-based method that optimizes label mapping to retain general PLM features while reducing reliance on irrelevant or unfair attributes, mitigating latent biases. Building on this method, we propose two synergistic generators that synthesize semantically consistent samples by integrating label-word information from the verbalizer, effectively enriching the data distribution and alleviating sparsity. To guarantee the reliability of the augmented set, we propose a Low-Entropy Selector that serves as a semantic filter, retaining only high-confidence samples to safeguard the model against ambiguous supervision signals. Furthermore, we propose a Difficulty-Aware Adversarial Training framework that fosters generalized feature learning, enabling the model to withstand subtle input perturbations. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on most few-shot and full-data splits, with F1-score improvements of up to +2.8% on the standard AG’s News benchmark and +1.0% on the challenging DBPedia benchmark.
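A verbalizer, as described above, maps the masked-position scores of label words to class scores. A minimal sketch with made-up scores and label words (this is generic prompt-tuning machinery, not the paper's Bayesian method):

```python
# Verbalizer sketch: each class owns a set of label words; the class score
# is the average of the [MASK]-position scores of its words. Scores and
# label words below are made up for illustration.

def verbalize(mask_scores, verbalizer):
    return {
        label: sum(mask_scores.get(w, 0.0) for w in words) / len(words)
        for label, words in verbalizer.items()
    }

mask_scores = {"sports": 4.1, "game": 3.8, "politics": 1.2, "election": 0.9}
verbalizer = {"Sports": ["sports", "game"], "World": ["politics", "election"]}
scores = verbalize(mask_scores, verbalizer)
pred = max(scores, key=scores.get)
```

The paper's contribution sits one level up: choosing and weighting those label words so the mapping keeps useful PLM features while suppressing biased ones.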

23 pages, 1485 KB  
Article
TextShelter: Text Adversarial Example Defense Based on Input Reconstruction
by Guoqin Chang, Haichang Gao, Nuo Cheng, Zhou Yao and Haodong Li
Electronics 2025, 14(23), 4706; https://doi.org/10.3390/electronics14234706 - 29 Nov 2025
Viewed by 541
Abstract
Effective identification of textual adversarial examples is a pressing need for safeguarding application security and maintaining cybersecurity. However, most existing adversarial defense methods for natural language processing can only resist a single form of attack and lack generalizability. To address this issue, this paper proposes a simple, efficient, and versatile defense method named TextShelter, which mitigates the limitations of existing approaches that rely on specific attack assumptions and struggle to handle real-world complex adversarial samples. TextShelter integrates three modules—Homoglyph Reversion, Spelling Correction, and Reconstruction-based Backtranslation—and enhances the defense efficiency of each module through careful design and optimization. By collaboratively combining the outputs of these modules, the method achieves effective defense against multi-granularity hybrid perturbations without requiring knowledge of the target model’s structure or parameters, nor any model retraining. Experiments on three datasets including IMDb show that TextShelter can effectively restore the original output labels of adversarial examples, improving classification accuracy by up to 60%. Compared with existing mainstream defense methods, it enhances defensive capability by approximately 50%. Furthermore, TextShelter performs well in terms of sentiment preservation, robustness, and transferability, demonstrating promising extensibility.
(This article belongs to the Special Issue Advancements in AI-Driven Cybersecurity and Securing AI Systems)
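The Homoglyph Reversion module can be illustrated with a small character-mapping sketch. The table below is a tiny illustrative subset of Unicode confusables, not the paper's actual mapping:

```python
# Homoglyph reversion sketch: map visually confusable Unicode characters
# back to their ASCII counterparts before classification. Only a few
# Cyrillic lookalikes are included here for illustration.

HOMOGLYPHS = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}

def revert_homoglyphs(text):
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

# "good movie" written with Cyrillic o's, as an attacker might
raw = "g\u043e\u043ed m\u043evie"
cleaned = revert_homoglyphs(raw)
```

A production mapping would draw on the full Unicode confusables data rather than a hand-picked table.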

30 pages, 2155 KB  
Article
Extreme Multi-Label Text Classification for Less-Represented Languages and Low-Resource Environments: Advances and Lessons Learned
by Nikola Ivačič, Blaž Škrlj, Boshko Koloski, Senja Pollak, Nada Lavrač and Matthew Purver
Mach. Learn. Knowl. Extr. 2025, 7(4), 142; https://doi.org/10.3390/make7040142 - 11 Nov 2025
Viewed by 1020
Abstract
Amid ongoing efforts to develop extremely large, multimodal models, there is increasing interest in efficient Small Language Models (SLMs) that can operate without reliance on large data-centre infrastructure. However, recent SLMs (e.g., LLaMA or Phi) with up to three billion parameters are predominantly trained in high-resource languages, such as English, which limits their applicability to industries that require robust NLP solutions for less-represented languages and low-resource settings, particularly those requiring low latency and adaptability to evolving label spaces. This paper examines a retrieval-based approach to multi-label text classification (MLC) for a media monitoring dataset, with a particular focus on less-represented languages, such as Slovene. This dataset presents an extreme MLC challenge, with instances labelled using up to twelve thousand categories. The proposed method, which combines retrieval with computationally efficient prediction, effectively addresses challenges related to multilinguality, resource constraints, and frequent label changes. We adopt a model-agnostic approach that does not rely on a specific model architecture or language selection. Our results demonstrate that techniques from the extreme multi-label text classification (XMC) domain outperform traditional Transformer-based encoder models, particularly in handling dynamic label spaces without requiring continuous fine-tuning. Additionally, we highlight the effectiveness of this approach in scenarios involving rare labels, where baseline models struggle with generalisation.
(This article belongs to the Topic Applications of NLP, AI, and ML in Software Engineering)
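The retrieval-based idea above — predict labels by looking up the nearest labelled neighbours rather than running a fixed classification head — can be sketched in a few lines. Bag-of-words vectors stand in for real embeddings, and the corpus is made up:

```python
# Retrieval-based multi-label classification sketch: embed the query,
# retrieve the k nearest labelled documents, and return labels appearing
# in at least `min_votes` of them. New labels need no retraining: adding
# a labelled document to the corpus is enough.
from collections import Counter
import math

def bow(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_labels(query, corpus, k=2, min_votes=1):
    q = bow(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, bow(d["text"])), reverse=True)
    votes = Counter(lab for d in ranked[:k] for lab in d["labels"])
    return sorted(lab for lab, v in votes.items() if v >= min_votes)

corpus = [
    {"text": "election results parliament", "labels": ["politics"]},
    {"text": "football match score", "labels": ["sports"]},
    {"text": "parliament budget vote economy", "labels": ["politics", "economy"]},
]
labels = knn_labels("parliament vote", corpus, k=2)
```

This is why the approach copes with evolving label spaces: the label set lives in the index, not in the model's output layer.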

18 pages, 1914 KB  
Article
Leveraging Transformer with Self-Attention for Multi-Label Emotion Classification in Crisis Tweets
by Patricia Anthony and Jing Zhou
Informatics 2025, 12(4), 114; https://doi.org/10.3390/informatics12040114 - 22 Oct 2025
Viewed by 1795
Abstract
Social media platforms have become a widely used medium for individuals to express complex and multifaceted emotions. Traditional single-label emotion classification methods fall short in accurately capturing the simultaneous presence of multiple emotions within these texts. To address this limitation, we propose a classification model that enhances the pre-trained Cardiff NLP transformer by integrating additional self-attention layers. Experimental results show our approach achieves a micro-F1 score of 0.7208, a macro-F1 score of 0.6192, and an average Jaccard index of 0.6066, which is an overall improvement of approximately 3.00% compared to the baseline. We apply this model to a real-world dataset of tweets related to the 2011 Christchurch earthquakes as a case study to demonstrate its ability to capture multi-category emotional expressions and detect co-occurring emotions that single-label approaches would miss. Our analysis revealed distinct emotional patterns aligned with key seismic events, including overlapping positive and negative emotions, and temporal dynamics of emotional response. This work contributes a robust method for fine-grained emotion analysis which can aid disaster response, mental health monitoring and social research.
(This article belongs to the Special Issue Practical Applications of Sentiment Analysis)
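The micro-F1 and Jaccard figures reported above are standard multi-label metrics; a minimal sketch with toy gold/predicted emotion sets (not the paper's data):

```python
# Multi-label evaluation sketch: micro-F1 pools true/false positives over
# all samples; the Jaccard index averages per-sample set overlap.

def micro_f1(gold, pred):
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def mean_jaccard(gold, pred):
    scores = [len(g & p) / len(g | p) if g | p else 1.0
              for g, p in zip(gold, pred)]
    return sum(scores) / len(scores)

gold = [{"fear", "sadness"}, {"joy"}, {"anger", "disgust"}]
pred = [{"fear"}, {"joy", "optimism"}, {"anger", "disgust"}]
mf1 = micro_f1(gold, pred)
mj = mean_jaccard(gold, pred)
```

Micro-F1 rewards getting frequent labels right overall, while the Jaccard average penalises each tweet whose predicted emotion set only partially overlaps the gold set.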

36 pages, 3396 KB  
Article
Graph-Enhanced Prompt Tuning for Evidence-Grounded HFACS Classification in Power-System Safety
by Wenhua Zeng, Wenhu Tang, Diping Yuan, Bo Zhang, Na Xu and Hui Zhang
Energies 2025, 18(20), 5389; https://doi.org/10.3390/en18205389 - 13 Oct 2025
Viewed by 734
Abstract
Power-system safety is fundamental to protecting lives and ensuring reliable grid operation. Yet, hierarchical text classification (HTC) methods struggle with domain-dense accident narratives that require cross-sentence reasoning, often yielding limited fine-grained recognition, inconsistent label paths, and weak evidence traceability. We propose EG-HPT (Evidence-Grounded Hierarchy-Aware Prompt Tuning), which augments hierarchical prompt tuning with Global Pointer-based nested-entity recognition and a sentence–entity heterogeneous graph to aggregate cross-sentence cues; label-aware attention selects Top-k evidence nodes and a weighted InfoNCE objective aligns label and evidence representations, while a hierarchical separation loss and an ancestor-completeness constraint regularize the taxonomy. On a HFACS-based power-accident corpus, EG-HPT consistently outperforms strong baselines in Micro-F1, Macro-F1, and path-constrained Micro-F1 (C-Micro-F1), with ablations confirming the contributions of entity evidence and graph aggregation. These results indicate a deployable, interpretable solution for automated risk factor analysis, enabling auditable evidence chains and supporting multi-granularity accident intelligence in safety-critical operations.
(This article belongs to the Special Issue AI, Big Data, and IoT for Smart Grids and Electric Vehicles)
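The weighted InfoNCE objective mentioned above pulls a label representation toward its evidence while pushing it away from negatives. A minimal one-pair sketch with made-up similarities (the paper's exact weighting scheme is not reproduced here):

```python
# InfoNCE sketch for one (label, evidence) pair: the loss is small when the
# positive similarity clearly exceeds the negatives, large otherwise.
# Similarity values and the weight are illustrative.
import math

def info_nce(sim_pos, sim_negs, temperature=0.1, weight=1.0):
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    m = max(logits)  # subtract max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -weight * (sim_pos / temperature - log_denom)

loss_aligned = info_nce(0.9, [0.1, 0.2])      # positive well separated
loss_confused = info_nce(0.3, [0.25, 0.28])   # positive barely above negatives
```

In EG-HPT the weight would reflect evidence confidence, so unreliable evidence nodes contribute less to the alignment signal.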

29 pages, 3273 KB  
Article
Development Analysis of China’s New-Type Power System Based on Governmental and Media Texts via Multi-Label BERT Classification
by Mingyuan Zhou, Heng Chen, Minghong Liu, Yinan Wang, Lingshuang Liu and Yan Zhang
Energies 2025, 18(17), 4650; https://doi.org/10.3390/en18174650 - 2 Sep 2025
Viewed by 1317
Abstract
In response to China’s dual-carbon strategy, this study proposes a comprehensive analytical framework to identify the evolutionary pathways of key policy tasks in developing a new-type power system. A dual-channel data acquisition process was designed to extract, standardize, and segment policy documents and online texts into a unified corpus. A multi-label BERT classification model was then developed, incorporating domain-specific terminology injection, label-wise attention, dynamic threshold scanning, and imbalance-aware weighting. The model was trained and validated on 200 energy news articles, 100 official policy releases, and 10 strategic planning documents. By the 10th epoch, it achieved convergence with a Macro-F1 of 0.831, Micro-F1 of 0.849, and Samples-F1 of 0.855. Ablation studies confirmed the significant performance gain over simplified configurations. Structural label analysis showed “Build system-friendly new energy power stations” was the most frequent label (107 in plans, 80 in news, 24 in policies) and had the highest co-occurrence (81 times) with “Optimize and strengthen the main grid framework.” The label co-occurrence network revealed multi-layered couplings across generation, transmission, and storage. The Priority Evaluation Index (PEI) further identified “Build shared energy storage power stations” as a structurally central task (centrality = 0.71) despite its lower frequency, highlighting its latent strategic importance. Within the domain of national-level public policy and planning documents, the proposed framework shows reliable and reusable performance. Generalization to sub-national and project-level corpora is left for future work, where we will extend the corpus and reassess robustness without altering the core methodology.
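The dynamic threshold scanning mentioned above can be sketched as a per-label sweep over candidate cut-offs, keeping the one that maximises validation F1. Probabilities and gold labels below are made up:

```python
# Dynamic threshold scanning sketch for one label: sweep candidate
# thresholds over predicted probabilities and keep the F1-maximising one.
# Validation probabilities and gold flags are illustrative.

def best_threshold(probs, gold, candidates=None):
    candidates = candidates or [i / 20 for i in range(1, 20)]

    def f1(th):
        pred = [p >= th for p in probs]
        tp = sum(p and g for p, g in zip(pred, gold))
        fp = sum(p and not g for p, g in zip(pred, gold))
        fn = sum((not p) and g for p, g in zip(pred, gold))
        return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

    return max(candidates, key=f1)

probs = [0.92, 0.81, 0.40, 0.35, 0.10]
gold  = [True, True, True, False, False]
th = best_threshold(probs, gold)
```

Scanning per label rather than using a global 0.5 cut-off helps exactly in the imbalanced setting the abstract describes, where rare labels tend to receive systematically lower probabilities.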

28 pages, 1712 KB  
Article
Identifying Literary Microgenres and Writing Style Differences in Romanian Novels with ReaderBench and Large Language Models
by Aura Cristina Udrea, Stefan Ruseti, Vlad Pojoga, Stefan Baghiu, Andrei Terian and Mihai Dascalu
Future Internet 2025, 17(9), 397; https://doi.org/10.3390/fi17090397 - 30 Aug 2025
Cited by 1 | Viewed by 1186
Abstract
Recent developments in natural language processing, particularly large language models (LLMs), create new opportunities for literary analysis in underexplored languages like Romanian. This study investigates stylistic heterogeneity and genre blending in 175 late 19th- and early 20th-century Romanian novels, each classified by literary historians into one of 17 genres. Our findings reveal that most novels do not adhere to a single genre label but instead combine elements of multiple (micro)genres, challenging traditional single-label classification approaches. We employed a dual computational methodology that combines Romanian-tailored linguistic-feature analysis with general-purpose LLMs. ReaderBench, a Romanian-specific framework, was used to extract surface, syntactic, semantic, and discourse features, capturing fine-grained linguistic patterns. In parallel, we prompted two LLMs (Llama3.3 70B and DeepSeek-R1 70B) to predict genres at the paragraph level, leveraging their ability to detect contextual and thematic coherence across multiple narrative scales. Statistical analyses using Kruskal–Wallis and Mann–Whitney tests identified genre-defining features at both novel and chapter levels. The integration of these complementary approaches enhances microgenre detection beyond traditional classification capabilities. ReaderBench provides quantifiable linguistic evidence, while LLMs capture broader contextual patterns; together, they provide a multi-layered perspective on literary genre that reflects the complex and heterogeneous character of fictional texts. Our results argue that both language-specific and general-purpose computational tools can effectively detect stylistic diversity in Romanian fiction, opening new avenues for computational literary analysis in low-resource languages.
(This article belongs to the Special Issue Artificial Intelligence (AI) and Natural Language Processing (NLP))

42 pages, 3460 KB  
Review
A Survey of Multi-Label Text Classification Under Few-Shot Scenarios
by Wenlong Hu, Qiang Fan, Hao Yan, Xinyao Xu, Shan Huang and Ke Zhang
Appl. Sci. 2025, 15(16), 8872; https://doi.org/10.3390/app15168872 - 12 Aug 2025
Cited by 3 | Viewed by 5575
Abstract
Multi-label text classification is a fundamental and important task in natural language processing, with widespread applications in specialized domains such as sentiment analysis, legal document classification, and medical coding. However, real-world applications often face challenges such as high annotation costs, data scarcity, and long-tailed label distributions. These issues are particularly pronounced in professional fields like healthcare and law, significantly limiting the performance of classification models. This paper focuses on the topic of few-shot multi-label text classification and provides a systematic survey of current research progress and mainstream techniques. From multiple perspectives, including modeling under few-shot settings, research status, technical approaches, commonly used datasets, and evaluation metrics, this study comprehensively reviews the existing literature and advances. At the technical level, the methods are broadly categorized into data augmentation and model training. The latter includes paradigms such as transfer learning, prompt learning, metric learning, meta-learning, graph neural networks, and attention mechanisms. In addition, this survey explores the research and progress of specific tasks under few-shot multi-label scenarios, such as multi-label aspect category detection, multi-label intent detection, and hierarchical multi-label text classification. In terms of experimental resources, this review compiles commonly used datasets along with their characteristics and categorizes evaluation metrics that are widely adopted in few-shot multi-label classification settings. Finally, it discusses the key research challenges and outlines future directions, offering insights to guide further investigation in this field.

19 pages, 303 KB  
Article
Beyond Traditional Classifiers: Evaluating Large Language Models for Robust Hate Speech Detection
by Basel Barakat and Sardar Jaf
Computation 2025, 13(8), 196; https://doi.org/10.3390/computation13080196 - 10 Aug 2025
Viewed by 3748
Abstract
Hate speech detection remains a significant challenge due to the nuanced and context-dependent nature of hateful language. Traditional classifiers, trained on specialized corpora, often struggle to accurately identify subtle or manipulated hate speech. This paper explores the potential of utilizing large language models (LLMs) to address these limitations. By leveraging their extensive training on diverse texts, LLMs demonstrate a superior ability to understand context, which is crucial for effective hate speech detection. We conduct a comprehensive evaluation of various LLMs on both binary and multi-label hate speech datasets to assess their performance. Our findings aim to clarify the extent to which LLMs can enhance hate speech classification accuracy, particularly in complex and challenging cases.

27 pages, 3503 KB  
Article
Structure-Aware and Format-Enhanced Transformer for Accident Report Modeling
by Wenhua Zeng, Wenhu Tang, Diping Yuan, Hui Zhang, Pinsheng Duan and Shikun Hu
Appl. Sci. 2025, 15(14), 7928; https://doi.org/10.3390/app15147928 - 16 Jul 2025
Cited by 1 | Viewed by 1374
Abstract
Modeling accident investigation reports is crucial for elucidating accident causation mechanisms, analyzing risk evolution processes, and formulating effective accident prevention strategies. However, such reports are typically long, hierarchically structured, and information-dense, posing unique challenges for existing language models. To address these domain-specific characteristics, this study proposes SAFE-Transformer, a Structure-Aware and Format-Enhanced Transformer designed for long-document modeling in the emergency safety context. SAFE-Transformer adopts a dual-stream encoding architecture to separately model symbolic section features and heading text, integrates hierarchical depth and format types into positional encodings, and introduces a dynamic gating unit to adaptively fuse headings with paragraph semantics. We evaluate the model on a multi-label accident intelligence classification task using a real-world corpus of 1632 official reports from high-risk industries. Results demonstrate that SAFE-Transformer effectively captures hierarchical semantic structure and outperforms strong long-text baselines. Further analysis reveals an inverted U-shaped performance trend across varying report lengths and highlights the role of attention sparsity and label distribution in long-text modeling. This work offers a practical solution for structurally complex safety documents and provides methodological insights for downstream applications in safety supervision and risk analysis.
(This article belongs to the Special Issue Advances in Smart Construction and Intelligent Buildings)
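The dynamic gating unit described above can be sketched as a sigmoid gate that mixes a heading embedding with its paragraph embedding elementwise. Weights here are illustrative scalars, not the model's learned parameters:

```python
# Gated fusion sketch: g_i = sigmoid(w_h * h_i + w_p * p_i + b), then
# fused_i = g_i * h_i + (1 - g_i) * p_i, so each dimension interpolates
# between heading and paragraph semantics. All weights are illustrative.
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def gated_fuse(heading, paragraph, w_h=1.0, w_p=1.0, bias=0.0):
    fused = []
    for h, p in zip(heading, paragraph):
        g = sigmoid(w_h * h + w_p * p + bias)
        fused.append(g * h + (1 - g) * p)
    return fused

fused = gated_fuse([1.0, -2.0], [0.0, 1.0])
```

Because the gate is a convex combination per dimension, each fused value stays between the corresponding heading and paragraph values, letting the model lean on headings where they are informative and on paragraph content elsewhere.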

21 pages, 3826 KB  
Article
UAV-OVD: Open-Vocabulary Object Detection in UAV Imagery via Multi-Level Text-Guided Decoding
by Lijie Tao, Guoting Wei, Zhuo Wang, Zhaoshuai Qi, Ying Li and Haokui Zhang
Drones 2025, 9(7), 495; https://doi.org/10.3390/drones9070495 - 14 Jul 2025
Viewed by 2891
Abstract
Object detection in drone-captured imagery has attracted significant attention due to its wide range of real-world applications, including surveillance, disaster response, and environmental monitoring. The majority of existing methods are developed under closed-set assumptions, and although some recent studies have begun to explore open-vocabulary or open-world detection, their application to UAV imagery remains limited and underexplored. In this paper, we address this limitation by exploiting the relationship between images and textual semantics to extend object detection in UAV imagery to an open-vocabulary setting. We propose a novel and efficient detector named Unmanned Aerial Vehicle Open-Vocabulary Detector (UAV-OVD), specifically designed for drone-captured scenes. To facilitate open-vocabulary object detection, we propose improvements from three complementary perspectives. First, at the training level, we design a region–text contrastive loss to replace the conventional classification loss, allowing the model to align visual regions with textual descriptions beyond fixed category sets. Second, at the structural level, building on this loss, we introduce a multi-level text-guided fusion decoder that integrates visual features across multiple spatial scales under language guidance, improving overall detection performance and enhancing the representation and perception of small objects. Finally, from the data perspective, we enrich the original dataset with synonym-augmented category labels, enabling more flexible and semantically expressive supervision. Experiments conducted on two widely used benchmark datasets demonstrate that our approach achieves significant improvements in both mAP and Recall. For zero-shot detection on xView, UAV-OVD achieves 9.9 mAP and 67.3 Recall, 1.1 and 25.6 points higher than YOLO-World, respectively. In terms of speed, UAV-OVD reaches 53.8 FPS, nearly twice as fast as YOLO-World and five times faster than DetrReg, demonstrating its strong potential for real-time open-vocabulary detection in UAV imagery.
(This article belongs to the Special Issue Applications of UVs in Digital Photogrammetry and Image Processing)
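The region–text contrastive loss mentioned in this abstract replaces fixed per-class logits with similarity scores against text embeddings, which is what allows recognition beyond a closed category set. The following is a minimal sketch of that general idea, not UAV-OVD's actual implementation; the feature shapes, normalization, and temperature value are illustrative assumptions:

```python
import numpy as np

def region_text_contrastive_loss(region_feats, text_feats, labels, temperature=0.07):
    """Cross-entropy over region-to-text cosine similarities.

    region_feats: (R, D) embeddings of R candidate regions
    text_feats:   (C, D) embeddings of C category text prompts
    labels:       (R,)   index of each region's matching prompt
    """
    # L2-normalize so dot products become cosine similarities
    r = region_feats / np.linalg.norm(region_feats, axis=-1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)
    logits = r @ t.T / temperature                       # (R, C) similarity logits
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Pull each region toward its matching prompt, push it from the others
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Because the "classes" are just text embeddings, new categories can be added at inference time by encoding new prompts, with no change to this loss.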

17 pages, 380 KB  
Article
Multi-Head Hierarchical Attention Framework with Multi-Level Learning Optimization Strategy for Legal Text Recognition
by Ke Zhang, Yufei Tu, Jun Lu, Zhongliang Ai, Zhonglin Liu, Licai Wang and Xuelin Liu
Electronics 2025, 14(10), 1946; https://doi.org/10.3390/electronics14101946 - 10 May 2025
Cited by 2 | Viewed by 1674
Abstract
Owing to the rapid growth of legal text data and the increasing demand for intelligent processing, multi-label legal text recognition is becoming increasingly important in practical applications such as legal information retrieval and case classification. However, traditional methods have limitations in handling the complex semantics and multi-label characteristics of legal texts, making it difficult to accurately extract features and effective category information. Therefore, this study proposes a novel multi-head hierarchical attention framework for multi-label legal text recognition tasks. The framework comprises a feature extraction module and a hierarchical module: the former extracts multi-level semantic representations of the text, while the latter obtains multi-label category information. In addition, this study proposes a novel hierarchical learning optimization strategy that balances the learning needs of multi-level semantic representation and multi-label category information through data preprocessing, loss calculation, and weight updating, effectively accelerating the convergence of framework training. We conducted comparative experiments on the legal-domain dataset CAIL2021 and the general multi-label recognition datasets AAPD and Web of Science (WOS). The results indicate that the proposed method significantly outperforms mainstream methods in both legal and general scenarios, demonstrating excellent performance. The findings are expected to be widely applied in the intelligent processing of legal information, improving the accuracy of automated classification of judicial cases and further advancing the digitalization and intelligent transformation of the legal industry. Full article
(This article belongs to the Special Issue Image Processing Based on Convolution Neural Network: 2nd Edition)
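The "multi-label" setting this abstract describes differs from ordinary classification in that each text may carry several labels at once, so each label becomes an independent binary decision rather than one softmax choice. A minimal NumPy sketch of the standard sigmoid-plus-binary-cross-entropy formulation commonly used for such output heads (illustrative only; the paper's hierarchical framework and optimization strategy are more elaborate):

```python
import numpy as np

def multi_label_bce(logits, targets, eps=1e-12):
    """Mean binary cross-entropy over all (sample, label) pairs.

    logits:  (N, L) raw scores, one column per label
    targets: (N, L) binary ground truth; a row may contain several 1s
    """
    probs = 1.0 / (1.0 + np.exp(-logits))  # independent sigmoid per label
    loss = -(targets * np.log(probs + eps)
             + (1 - targets) * np.log(1 - probs + eps))
    return loss.mean()

def predict_labels(logits, threshold=0.5):
    """Each label fires independently -- no softmax competition."""
    return (1.0 / (1.0 + np.exp(-logits))) >= threshold
```

The per-label sigmoid is what lets a single legal text activate, say, both a retrieval-topic label and a case-type label simultaneously.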

18 pages, 2052 KB  
Article
Research on the Automatic Multi-Label Classification of Flight Instructor Comments Based on Transformer and Graph Neural Networks
by Zejian Liang, Yunxiang Zhao, Mengyuan Wang, Hong Huang and Haiwen Xu
Aerospace 2025, 12(5), 407; https://doi.org/10.3390/aerospace12050407 - 4 May 2025
Cited by 2 | Viewed by 1060
Abstract
With the rapid advancement of the civil aviation sector and the concurrent expansion of pilot training programs, a pressing need arises for more efficient assessment methodologies during the pilot training process. Traditional written evaluations conducted by flight instructors are often marred by subjectivity and inefficiency, rendering them inadequate to satisfy the stringent demands of Competency-Based Training and Assessment (CBTA) frameworks. To address this challenge, this study presents a novel multi-label classification model that seamlessly integrates RoBERTa, a robust language model, with Graph Convolutional Networks (GCNs). By simultaneously modeling text features and label interdependencies, this model enables the automated, multi-dimensional classification of instructor evaluations. It incorporates a dynamic weight fusion strategy, which intelligently adjusts the output weights of RoBERTa and GCNs based on label correlations. Additionally, it introduces a label co-occurrence graph convolution layer, designed to capture intricate higher-order dependencies among labels. This study is based on a real-world dataset comprising 1078 evaluations and 158 labels, covering six major dimensions, including operational capabilities and communication skills. To provide context for the improvement, the proposed RoBERTa + GCN model is compared with key baseline models, such as BERT and LSTM. The results show that the RoBERTa + GCN model achieves an F1 score of 0.9737, representing an average improvement of 4.73% over these traditional methods. This approach enhances the consistency and efficiency of flight training assessments and provides new insights into integrating natural language processing and graph neural networks, demonstrating broad application prospects. Full article
(This article belongs to the Special Issue New Trends in Aviation Development 2024–2025)
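The label co-occurrence graph convolution layer mentioned above can be illustrated in miniature: build an adjacency matrix from how often labels appear together in the training annotations, then propagate label features through a normalized graph convolution so that correlated labels inform one another. A hedged sketch of this general pattern (the conditional-probability threshold, symmetric normalization, and shapes are assumptions, not the paper's exact design):

```python
import numpy as np

def cooccurrence_adjacency(label_matrix, threshold=0.1):
    """Build a label graph: edge i->j if label j often accompanies label i.

    label_matrix: (N, L) binary matrix -- sample n carries label l.
    """
    counts = label_matrix.T @ label_matrix          # (L, L) co-occurrence counts
    occur = np.diag(counts).clip(min=1)             # how often each label appears
    cond = counts / occur[:, None]                  # P(label j | label i)
    adj = (cond >= threshold).astype(float)
    np.fill_diagonal(adj, 1.0)                      # self-loops keep own features
    return adj

def gcn_layer(adj, features, weight):
    """One graph-convolution step with symmetric normalization:
    H' = relu(D^{-1/2} A D^{-1/2} H W)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weight, 0.0)
```

In a RoBERTa + GCN pipeline of the kind described, the propagated label features would then be combined with the text encoder's output to score all 158 labels jointly.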
