Search Results (48)

Search Parameters:
Keywords = regional vocabulary

21 pages, 2805 KB  
Article
The Transformation and Cultural Adaptation of Jātaka Elements in Classic Malay Literature
by Siaw Hung Ng
Religions 2025, 16(12), 1532; https://doi.org/10.3390/rel16121532 - 5 Dec 2025
Viewed by 554
Abstract
The literature of the Malay world, profoundly influenced by Indian traditions, frequently adheres to the narrative patterns found in Indian literature. With the rise of Islam, literary works in the Parrot Story collection were used to propagate Islamic teachings, while subsequent adaptations and reinterpretations produced relatively independent content. Within the framework of Sanskrit culture, the Jātaka Tales have also exerted a significant influence. Before the widespread adoption of written texts, these tales were transmitted orally and gradually evolved into written literature as local languages developed. Traveling along maritime trade routes, they were adapted in the Malay world through the use of indigenous vocabulary, reinterpretation of plots, and structural imitation. While grounded in Buddhist thought, these tales also reflect the social and cultural realities of the Malay world. The dissemination of the Jātaka Tales across Southeast Asia underscores the broader patterns of religious and cultural diffusion facilitated by maritime networks. This paper situates Jātaka literature within the wider context of religious and cultural exchange across the Asian maritime realm, examining the intersection of the Jātaka Tales with early Malay regional narrative traditions and Indian literature. Specifically, it compares several parallel Jātaka stories in parrot-story collections, such as the Persian Tūtī Nāmah and its Malay translation Hikayat Bayan Budiman, demonstrating their transformation across languages and cultures and revealing a complex process of cultural negotiation. Beyond Indic influences, the Malay literary tradition was also shaped by interactions with Sinitic religious and artistic currents, fostering a syncretic environment in which Hindu, Buddhist, and later Islamic elements coexisted and merged. This illuminates the dynamic interplay of Indic and Sinitic influences on the development of Malay literary traditions.
(This article belongs to the Special Issue Buddhist Literature and Art across Eurasia)
11 pages, 771 KB  
Article
VisPower: Curriculum-Guided Multimodal Alignment for Fine-Grained Anomaly Perception in Power Systems
by Huaguang Yan, Zhenyu Chen, Jianguang Du, Yunfeng Yan and Shuai Zhao
Electronics 2025, 14(23), 4747; https://doi.org/10.3390/electronics14234747 - 2 Dec 2025
Cited by 1 | Viewed by 380
Abstract
Precise perception of subtle anomalies in power equipment—such as insulator cracks, conductor corrosion, or foreign intrusions—is vital for ensuring the reliability of smart grids. However, foundational vision-language models (VLMs) like CLIP exhibit poor domain transfer and fail to capture minute defect semantics. We propose VisPower, a curriculum-guided multimodal alignment framework that progressively enhances fine-grained perception through two training stages: (1) Semantic Grounding, leveraging 100 K long-caption pairs to establish a robust linguistic-visual foundation, and (2) Contrastive Refinement, using 24 K region-level and hard-negative samples to strengthen discrimination among visually similar anomalies. Trained on our curated PowerAnomalyVL dataset, VisPower achieves an 18.4% absolute gain in zero-shot retrieval accuracy and a 16.8% improvement in open-vocabulary defect detection (OV-DD) over strong CLIP baselines. These results demonstrate the effectiveness of curriculum-based multimodal alignment for high-stakes industrial anomaly perception.
(This article belongs to the Section Industrial Electronics)
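To make the two-stage curriculum concrete, here is a minimal PyTorch sketch of curriculum-guided contrastive alignment. The model's encode_image/encode_text interface, the symmetric InfoNCE loss, and all hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of the two-stage curriculum, assuming a CLIP-style model
# exposing encode_image/encode_text (hypothetical interface, not the paper's).
import torch
import torch.nn.functional as F

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

def train_curriculum(model, stage1_loader, stage2_loader, epochs=(5, 3)):
    opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
    # Stage 1: semantic grounding on long-caption pairs.
    for _ in range(epochs[0]):
        for images, captions in stage1_loader:
            loss = info_nce(model.encode_image(images), model.encode_text(captions))
            opt.zero_grad()
            loss.backward()
            opt.step()
    # Stage 2: contrastive refinement on region-level, hard-negative pairs;
    # hard negatives share a batch, forcing separation of look-alike anomalies.
    for _ in range(epochs[1]):
        for regions, descriptions in stage2_loader:
            loss = info_nce(model.encode_image(regions), model.encode_text(descriptions))
            opt.zero_grad()
            loss.backward()
            opt.step()
```

In this reading, the only curriculum mechanism is data ordering: easy long-caption pairs first, then harder region-level negatives.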
29 pages, 166274 KB  
Article
Bridging Vision Foundation and Vision–Language Models for Open-Vocabulary Semantic Segmentation of UAV Imagery
by Fan Li, Zhaoxiang Zhang, Xuanbin Wang, Xuan Wang and Yuelei Xu
Remote Sens. 2025, 17(22), 3704; https://doi.org/10.3390/rs17223704 - 13 Nov 2025
Viewed by 1158
Abstract
Open-vocabulary semantic segmentation (OVSS) is of critical importance for unmanned aerial vehicle (UAV) imagery, as UAV scenes are highly dynamic and characterized by diverse, unpredictable object categories. Current OVSS approaches mainly rely on the zero-shot capabilities of vision–language models (VLMs), but their image-level pretraining objectives yield ambiguous spatial relationships and coarse-grained feature representations, resulting in suboptimal performance in UAV scenes. In this work, we propose a novel hybrid framework for OVSS in UAV imagery, named HOSU, which leverages the priors of vision foundation models to unleash the potential of vision–language models in representing complex spatial distributions and capturing fine-grained small-object details in UAV scenes. Specifically, we propose a distribution-aware fine-tuning method that aligns CLIP with DINOv2 across intra- and inter-region feature distributions, enhancing the capacity of CLIP to model complex scene semantics and capture fine-grained details critical for UAV imagery. Meanwhile, we propose a text-guided multi-level regularization mechanism that leverages the text embeddings of CLIP to impose semantic constraints on the visual features, preventing their drift from the original semantic space during fine-tuning and ensuring stable vision–language correspondence. Finally, to address the pervasive occlusion in UAV imagery, we propose a mask-based feature consistency strategy that enables the model to learn stable representations, remaining robust against viewpoint-induced occlusions. Extensive experiments across four training settings on six UAV datasets demonstrate that our approach consistently achieves state-of-the-art performance compared with previous methods, while comprehensive ablation studies and analyses further validate its effectiveness.
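As a rough illustration of what aligning a fine-tuned CLIP encoder with a frozen DINOv2 teacher while anchoring to text embeddings could look like, here is a hedged sketch; the projection head, loss forms, and tensor shapes are assumptions rather than the paper's actual formulation.

```python
# A hedged sketch: align fine-tuned CLIP patch tokens to a frozen DINOv2
# teacher, and anchor image-text similarities to the frozen model so features
# do not drift from CLIP's semantic space. Shapes and loss forms are assumed.
import torch
import torch.nn.functional as F

def alignment_loss(clip_patch_feats, dino_patch_feats, proj):
    """clip_patch_feats: (B, N, D_clip); dino_patch_feats: (B, N, D_dino);
    proj: learned linear head mapping D_clip -> D_dino."""
    aligned = proj(clip_patch_feats)
    return 1 - F.cosine_similarity(aligned, dino_patch_feats, dim=-1).mean()

def text_regularizer(visual_feats, text_embeds, frozen_logits):
    """Penalize drift of image-text similarity away from the frozen model's
    logits. visual_feats: (B, D); text_embeds: (C, D); frozen_logits: (B, C)."""
    logits = F.normalize(visual_feats, dim=-1) @ F.normalize(text_embeds, dim=-1).t()
    return F.mse_loss(logits, frozen_logits)
```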
14 pages, 447 KB  
Systematic Review
Meat Adulteration in the MENA and GCC Regions: A Scoping Review of Risks, Detection Technologies, and Regulatory Challenges
by Zeina Daher, Mahmoud Mohamadin, Adem Rama, Amal Salem Saeed Albedwawi, Hind Mahmoud Mahaba and Sultan Ali Al Taher
Foods 2025, 14(21), 3743; https://doi.org/10.3390/foods14213743 - 31 Oct 2025
Viewed by 1398
Abstract
Background: Meat adulteration poses serious public health, economic, and religious concerns, particularly in the Middle East and North Africa (MENA) and Gulf Cooperation Council (GCC) regions where halal authenticity is essential. While isolated studies have reported undeclared species in meat products, a comprehensive regional synthesis of prevalence, detection technologies, and regulatory responses has been lacking. Methods: This scoping review followed PRISMA-ScR guidelines. A systematic search of PubMed, Scopus, and Web of Science from database inception to 15 September 2025 was conducted using controlled vocabulary (MeSH) and free-text terms. Eligible studies included laboratory-based investigations of meat adulteration in MENA and GCC countries. Data were charted on study characteristics, adulteration types, detection methods, and regulatory context. Results: Out of 50 records screened, 35 studies were included, covering 27 MENA/GCC countries. Prevalence of adulteration varied widely, from 5% in UAE surveillance studies to 66.7% in Egyptian native sausages. Undeclared species most frequently detected were poultry, donkey, equine, pig, and dog. Molecular methods, particularly PCR and qPCR, were most widely applied, followed by ELISA and spectroscopy. Recent studies introduced biosensors, AI-assisted spectroscopy, and blockchain traceability, but adoption in regulatory practice remains limited. Conclusions: Meat adulteration in the MENA and GCC regions is localized and product-specific rather than uniformly widespread. Detection technologies are advancing, yet regulatory enforcement and halal-sensitive verification remain fragmented. Strengthening laboratory capacity, harmonizing regional standards, and investing in portable biosensors, AI-enhanced spectral tools, and blockchain-based traceability are critical for consumer trust, halal integrity, and food safety.
25 pages, 4531 KB  
Article
Interoperable Knowledge Graphs for Localized Supply Chains: Leveraging Graph Databases and RDF Standards
by Vishnu Kumar
Logistics 2025, 9(4), 144; https://doi.org/10.3390/logistics9040144 - 13 Oct 2025
Viewed by 2791
Abstract
Background: Ongoing challenges such as geopolitical conflicts, trade disruptions, economic sanctions, and political instability have underscored the urgent need for large manufacturing enterprises to improve resilience and reduce dependence on global supply chains. Integrating regional and local Small- and Medium-Sized Enterprises (SMEs) has been proposed as a strategic approach to enhance supply chain localization, yet barriers such as limited visibility, qualification hurdles, and integration difficulties persist. Methods: This study proposes a comprehensive knowledge-graph-driven framework for representing and discovering SMEs, implemented as a proof-of-concept in the U.S. BioPharma sector. The framework constructs a curated knowledge graph in Neo4j, converts it to Resource Description Framework (RDF) format, and aligns it with the Schema.org vocabulary to enable semantic interoperability and enhance the discoverability of SMEs. Results: The developed knowledge graph, consisting of 488 nodes and 11,520 edges, enabled accurate multi-hop SME discovery with query response times under 10 milliseconds. RDF serialization produced 16,086 triples, validated across platforms to confirm interoperability and semantic consistency. Conclusions: The proposed framework provides a scalable, adaptable, and generalizable solution for SME discovery and supply chain localization, offering a practical pathway to strengthen resilience in diverse manufacturing industries.
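The Neo4j-to-RDF step with Schema.org alignment can be sketched with rdflib. The node fields and the custom ex:supplies relation below are invented for illustration; the paper's actual graph schema is richer (488 nodes, 11,520 edges).

```python
# A minimal sketch of converting exported graph data to RDF aligned with the
# Schema.org vocabulary. Node fields and ex:supplies are illustrative only.
from rdflib import Graph, Literal, Namespace, RDF

SCHEMA = Namespace("https://schema.org/")
EX = Namespace("https://example.org/biopharma/")   # hypothetical base IRI

def sme_to_rdf(smes, supply_edges):
    g = Graph()
    g.bind("schema", SCHEMA)
    for sme in smes:                        # e.g., rows exported from Neo4j
        node = EX[sme["id"]]
        g.add((node, RDF.type, SCHEMA.Organization))
        g.add((node, SCHEMA.name, Literal(sme["name"])))
        g.add((node, SCHEMA.location, Literal(sme["state"])))
    for src, dst in supply_edges:           # supplier -> buyer relations
        g.add((EX[src], EX.supplies, EX[dst]))
    return g

g = sme_to_rdf(
    [{"id": "sme-001", "name": "Acme Biologics", "state": "NC"}],
    [("sme-001", "oem-042")],
)
print(g.serialize(format="turtle"))         # Turtle triples for validation
```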
17 pages, 6174 KB  
Article
Tracking Change in Rock Art Vocabularies and Styles at Marapikurrinya (Port Hedland, Northwest Australia)
by Sam Harper
Arts 2025, 14(5), 123; https://doi.org/10.3390/arts14050123 - 11 Oct 2025
Viewed by 854
Abstract
Track engravings dominate the rock art assemblage across Marapikurrinya (Port Hedland) in Northwest Australia, with social change through time linked to changes in how and when this graphic vocabulary is employed. Discrete styles have been identified within the broader engraving body, which is argued to have been produced semi-continuously over the last 7000 years, from the point of sea-level stabilisation in this region. It is proposed that changes in these styles reflect and negotiate environmental, demographic, and social changes. In the most recent stylistic phases, track motifs dominate, which is argued to reflect a change in marking strategy, from localised, territorially bounded art to regional social harmonisation. This paper explores the potential functions of track motifs as a vocabulary distinct from other figurative art, using Marapikurrinya as a case study.
(This article belongs to the Special Issue Advances in Rock Art Studies)
24 pages, 4764 KB  
Article
Mask-Guided Teacher–Student Learning for Open-Vocabulary Object Detection in Remote Sensing Images
by Shuojie Wang, Yu Song, Jiajun Xiang, Yanyan Chen, Ping Zhong and Ruigang Fu
Remote Sens. 2025, 17(19), 3385; https://doi.org/10.3390/rs17193385 - 9 Oct 2025
Viewed by 1512
Abstract
Open-vocabulary object detection in remote sensing aims to detect novel categories not seen during training, which is crucial for practical aerial image analysis applications. While some approaches accomplish this task through large-scale data construction, such methods incur substantial annotation and computational costs. In contrast, we focus on efficient utilization of limited datasets. However, existing methods such as CastDet struggle with inefficient data utilization and class imbalance issues in pseudo-label generation for novel categories. We propose an enhanced open-vocabulary detection framework that addresses these limitations through two key innovations. First, we introduce a selective masking strategy that enables direct utilization of partially annotated images by masking base category regions in teacher model inputs. This approach eliminates the need for strict data separation and significantly improves data efficiency. Second, we develop a dynamic frequency-based class weighting that automatically adjusts category weights based on real-time pseudo-label statistics to mitigate class imbalance issues. Our approach integrates these components into a student–teacher learning framework with RemoteCLIP for novel category classification. Comprehensive experiments demonstrate significant improvements on both datasets: on VisDroneZSD, we achieve 42.7% overall mAP and 41.4% harmonic mean, substantially outperforming existing methods. On the DIOR dataset, our method achieves 63.7% overall mAP with 49.5% harmonic mean. Our framework achieves more balanced performance between base and novel categories, providing a practical and data-efficient solution for open-vocabulary aerial object detection.
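The dynamic frequency-based class weighting idea can be illustrated with a small sketch that recomputes inverse-frequency weights from running pseudo-label counts; the smoothing and normalization choices below are assumptions, not the authors' exact scheme.

```python
# A sketch of dynamic class weighting from pseudo-label statistics: rare novel
# classes receive larger weights. Smoothing/normalization are assumed choices.
from collections import Counter

class DynamicClassWeights:
    def __init__(self, classes, smoothing=1.0):
        self.counts = Counter({c: 0 for c in classes})
        self.smoothing = smoothing

    def update(self, pseudo_labels):
        """Accumulate class names from the latest batch of pseudo-labels."""
        self.counts.update(pseudo_labels)

    def weights(self):
        # Inverse-frequency weights, normalized to mean 1 across classes.
        inv = {c: 1.0 / (n + self.smoothing) for c, n in self.counts.items()}
        mean = sum(inv.values()) / len(inv)
        return {c: w / mean for c, w in inv.items()}

w = DynamicClassWeights(["airport", "windmill", "helipad"])
w.update(["airport", "airport", "windmill"])
print(w.weights())   # unseen "helipad" gets the largest weight
```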
21 pages, 3747 KB  
Article
Open-Vocabulary Crack Object Detection Through Attribute-Guided Similarity Probing
by Hyemin Yoon and Sangjin Kim
Appl. Sci. 2025, 15(19), 10350; https://doi.org/10.3390/app151910350 - 24 Sep 2025
Viewed by 1879
Abstract
Timely detection of road surface defects such as cracks and potholes is critical for ensuring traffic safety and reducing infrastructure maintenance costs. While recent advances in image-based deep learning techniques have shown promise for automated road defect detection, existing models remain limited to closed-set detection settings, making it difficult to recognize newly emerging or fine-grained defect types. To address this limitation, we propose an attribute-aware open-vocabulary crack detection (AOVCD) framework, which leverages the alignment capability of pretrained vision–language models to generalize beyond fixed class labels. In this framework, crack types are represented as combinations of visual attributes, enabling semantic grounding between image regions and natural language descriptions. To support this, we extend the existing PPDD dataset with attribute-level annotations and incorporate a multi-label attribute recognition task as an auxiliary objective. Experimental results demonstrate that the proposed AOVCD model outperforms existing baselines. In particular, compared to CLIP-based zero-shot inference, the proposed model achieves approximately a 10-fold improvement in average precision (AP) for novel crack categories. Attribute classification performance—covering geometric, spatial, and textural features—also increases by 40% in balanced accuracy (BACC) and 23% in AP. These results indicate that integrating structured attribute information enhances generalization to previously unseen defect types, especially those involving subtle visual cues. Our study suggests that incorporating attribute-level alignment within a vision–language framework can lead to more adaptive and semantically grounded defect recognition systems.
(This article belongs to the Section Computing and Artificial Intelligence)
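As an illustration of attribute-guided similarity probing, the sketch below scores an image region against attribute phrases and averages per crack type. The attribute lists and the text_encoder callable are hypothetical stand-ins for the paper's attribute annotations and its CLIP text tower.

```python
# An illustrative sketch of attribute-guided scoring: a region embedding is
# compared with attribute phrases, and each crack type is scored by the mean
# similarity over its attribute set. All names here are invented examples.
import torch
import torch.nn.functional as F

CRACK_ATTRIBUTES = {
    "alligator crack": ["interconnected polygonal cells", "rough surface texture"],
    "longitudinal crack": ["single thin line", "runs parallel to the lane"],
}

def score_crack_types(region_emb, text_encoder):
    """region_emb: (D,) torch tensor; text_encoder: phrases -> (A, D) tensor."""
    region_emb = F.normalize(region_emb, dim=-1)
    scores = {}
    for crack_type, attrs in CRACK_ATTRIBUTES.items():
        attr_embs = F.normalize(text_encoder(attrs), dim=-1)
        scores[crack_type] = (attr_embs @ region_emb).mean().item()
    return scores
```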
16 pages, 881 KB  
Article
Text-Guided Spatio-Temporal 2D and 3D Data Fusion for Multi-Object Tracking with RegionCLIP
by Youlin Liu, Zainal Rasyid Mahayuddin and Mohammad Faidzul Nasrudin
Appl. Sci. 2025, 15(18), 10112; https://doi.org/10.3390/app151810112 - 16 Sep 2025
Viewed by 1336
Abstract
3D Multi-Object Tracking (3D MOT) is a critical task in autonomous systems, where accurate and robust tracking of multiple objects in dynamic environments is essential. Traditional approaches primarily rely on visual or geometric features, often neglecting the rich semantic information available in textual modalities. In this paper, we propose Text-Guided 3D Multi-Object Tracking (TG3MOT), a novel framework that incorporates Vision-Language Models (VLMs) into the YONTD architecture to improve 3D MOT performance. Our framework leverages RegionCLIP, a multimodal open-vocabulary detector, to achieve fine-grained alignment between image regions and textual concepts, enabling the incorporation of semantic information into the tracking process. To address challenges such as occlusion, blurring, and ambiguous object appearances, we introduce the Target Semantic Matching Module (TSM), which quantifies the uncertainty of semantic alignment and filters out unreliable regions. Additionally, we propose the 3D Feature Exponential Moving Average Module (3D F-EMA) to incorporate temporal information, improving robustness in noisy or occluded scenarios. Furthermore, the Gaussian Confidence Fusion Module (GCF) is introduced to weight historical trajectory confidences based on temporal proximity, enhancing the accuracy of trajectory management. We evaluate our framework on the KITTI dataset and compare it with the YONTD baseline. Extensive experiments demonstrate that although the overall HOTA gain of TG3MOT is modest (+0.64%), our method achieves substantial improvements in association accuracy (+0.83%) and significantly reduces ID switches (−16.7%). These improvements are particularly valuable in real-world autonomous driving scenarios, where maintaining consistent trajectories under occlusion and ambiguous appearances is crucial for downstream tasks such as trajectory prediction and motion planning. The code will be made publicly available.
(This article belongs to the Section Computing and Artificial Intelligence)
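The two temporal components named above lend themselves to a compact sketch: an exponential moving average over per-track 3D features (in the spirit of 3D F-EMA) and Gaussian-weighted fusion of historical confidences (in the spirit of GCF). The decay constants here are assumed values, not the paper's settings.

```python
# A compact sketch of EMA feature smoothing and Gaussian confidence fusion.
# alpha and sigma are illustrative assumptions, not the authors' parameters.
import math

def ema_update(track_feat, new_feat, alpha=0.9):
    """Blend the incoming 3D feature into the track's running feature vector."""
    return [alpha * t + (1 - alpha) * n for t, n in zip(track_feat, new_feat)]

def gaussian_confidence(history, sigma=3.0):
    """Fuse historical confidences so observations closer in time count more.
    `history` is a list of (age_in_frames, confidence) pairs."""
    weights = [math.exp(-(age ** 2) / (2 * sigma ** 2)) for age, _ in history]
    total = sum(weights)
    return sum(w * c for w, (_, c) in zip(weights, history)) / total

fused = gaussian_confidence([(0, 0.9), (3, 0.6), (8, 0.4)])  # recent dominates
```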
20 pages, 6635 KB  
Article
Research on the Language System of Rural Cultural Landscapes in Jiufanggou, Dawu County, Based on the Concept of Isomorphism
by Rui Li, Yawei Zhang, Chenshuo Wang, Xuanxuan Xu and Wanshi Li
Land 2025, 14(9), 1895; https://doi.org/10.3390/land14091895 - 16 Sep 2025
Viewed by 620
Abstract
[Objective] Currently, there are limitations in the understanding of rural cultural landscapes: they are often perceived as material spatial entities, with a lack of exploration of their intangible elements and neglect of the isomorphism between the material and intangible elements of cultural landscapes. In the context of rural cultural revitalization, it is necessary to explore the regional protection elements of rural cultural landscapes from the perspective of isomorphism. [Methods/Process] This study employs relevant linguistic theories to extract and construct a framework for a language system with regional characteristics for rural cultural landscapes from an isomorphic perspective. By deconstructing the rural cultural landscape pattern of Jiufanggou in Dawu County, it summarizes the relationships among, and the isomorphic nature of, the constituent elements of this language system. [Results/Conclusions] The study identifies eight core landscape terms. These lexical units form landscape sentences based on four typical scenarios. The study then analyzes the landscape grammatical structures of different scenarios across four dimensions and explores the deep semantic meanings and contextual rules of Jiufanggou Village's cultural landscape. Finally, this study utilizes a schematic diagram of the "vocabulary–grammar–sentence" nested structure of the Jiufanggou cultural landscape to visually illustrate the interconnections and patterns of cultural landscape elements in Jiufanggou Village across different contexts. Building on this, the study explores the structural equivalence between the material and immaterial elements of rural cultural landscapes. Overall, the construction of a nested linguistic system for rural cultural landscapes is not only about analyzing spatial forms but, more importantly, about exploring the underlying logical order and traditional wisdom behind spatial creation, thereby achieving the goals of associative protection, the inheritance of diverse cultures, and the continuation of the vitality of rural cultural landscapes.
(This article belongs to the Special Issue Land Use, Heritage and Ecosystem Services)
14 pages, 2231 KB  
Article
OpenMamba: Introducing State Space Models to Open-Vocabulary Semantic Segmentation
by Viktor Ungur and Călin-Adrian Popa
Appl. Sci. 2025, 15(16), 9087; https://doi.org/10.3390/app15169087 - 18 Aug 2025
Viewed by 3142
Abstract
Open-vocabulary semantic segmentation aims to label each pixel of an image based on text descriptions provided at inference time. Recent approaches for this task require two stages: the first uses a mask generator to produce mask proposals, while the second classifies each segment using a pre-trained vision–language model, such as CLIP. However, since CLIP is pre-trained on natural images, the model struggles with segmentation masks because of their abstract nature. In this paper, we introduce OpenMamba, a novel approach to creating high-level guidance maps to assist in extracting CLIP features within the masked regions for classification. High-level guidance maps are generated by leveraging both visual and textual modalities and introducing State Space Duality (SSD) as an efficient way to tackle the open-vocabulary semantic segmentation task. We also propose a new matching technique for the mask proposals, based on IoU with a dynamic threshold conditioned on mask quality, and we introduce a contrastive loss to ensure that similar mask proposals receive similar CLIP embeddings. Comprehensive experiments across open-vocabulary benchmarks show that our method achieves superior performance compared to other approaches while reducing memory consumption.
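The quality-conditioned matching idea can be sketched as IoU matching whose threshold tightens with mask quality; the linear threshold schedule below is an assumption, and masks are taken to be NumPy boolean arrays.

```python
# A sketch of IoU matching with a quality-conditioned dynamic threshold:
# higher-quality proposals must clear a stricter bar. The linear schedule
# between base_thr and max_thr is an assumed, illustrative choice.
def iou(mask_a, mask_b):
    """mask_a, mask_b: NumPy boolean arrays of equal shape."""
    inter = (mask_a & mask_b).sum()
    union = (mask_a | mask_b).sum()
    return inter / union if union else 0.0

def match_proposals(proposals, targets, base_thr=0.5, max_thr=0.8):
    """proposals: list of (binary mask, quality score in [0, 1])."""
    matches = []
    for i, (p_mask, quality) in enumerate(proposals):
        thr = base_thr + (max_thr - base_thr) * quality   # dynamic threshold
        best_j, best_iou = max(
            ((j, iou(p_mask, t)) for j, t in enumerate(targets)),
            key=lambda x: x[1], default=(None, 0.0),
        )
        if best_j is not None and best_iou >= thr:
            matches.append((i, best_j))
    return matches
```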
14 pages, 288 KB  
Article
Cross-Regional Students’ Engagement and Teacher Relationships Across Online and In-School Learning
by Huiqi Hu, Yijun Wang and Wolfgang Jacquet
Educ. Sci. 2025, 15(8), 993; https://doi.org/10.3390/educsci15080993 - 5 Aug 2025
Viewed by 1771
Abstract
This study examines how teacher–student relationships and school engagement change across online and in-school learning, based on the experiences of 105 cross-regional secondary vocational students in China. Using questionnaire surveys, the study explores students’ perceptions and learning needs in both settings. The findings confirm that teachers play a consistently positive role in promoting student engagement across both online and in-school learning modalities. While affective engagement was higher during online learning, driven by stronger teacher responsiveness and improved student–teacher relationships, students reported increased pride in their schools after returning home, reflecting a renewed appreciation. In-school learning was associated with higher behavioral engagement and greater motivation, despite tensions over intensified academic tasks. Online learning facilitated cognitive engagement through easier vocabulary searches; nevertheless, poor home environments reduced motivation. Enhancing engagement may require offering students autonomy, valuing their input, and clarifying the relevance of the learning content.
21 pages, 3826 KB  
Article
UAV-OVD: Open-Vocabulary Object Detection in UAV Imagery via Multi-Level Text-Guided Decoding
by Lijie Tao, Guoting Wei, Zhuo Wang, Zhaoshuai Qi, Ying Li and Haokui Zhang
Drones 2025, 9(7), 495; https://doi.org/10.3390/drones9070495 - 14 Jul 2025
Viewed by 2814
Abstract
Object detection in drone-captured imagery has attracted significant attention due to its wide range of real-world applications, including surveillance, disaster response, and environmental monitoring. The majority of existing methods are developed under closed-set assumptions, and although some recent studies have begun to explore open-vocabulary or open-world detection, their application to UAV imagery remains limited and underexplored. In this paper, we address this limitation by exploring the relationship between images and textual semantics to extend object detection in UAV imagery to an open-vocabulary setting. We propose a novel and efficient detector named Unmanned Aerial Vehicle Open-Vocabulary Detector (UAV-OVD), specifically designed for drone-captured scenes. To facilitate open-vocabulary object detection, we propose improvements from three complementary perspectives. First, at the training level, we design a region–text contrastive loss to replace conventional classification loss, allowing the model to align visual regions with textual descriptions beyond fixed category sets. Structurally, building on this, we introduce a multi-level text-guided fusion decoder that integrates visual features across multiple spatial scales under language guidance, thereby improving overall detection performance and enhancing the representation and perception of small objects. Finally, from the data perspective, we enrich the original dataset with synonym-augmented category labels, enabling more flexible and semantically expressive supervision. Experiments conducted on two widely used benchmark datasets demonstrate that our approach achieves significant improvements in both mAP and Recall. For instance, for Zero-Shot Detection on xView, UAV-OVD achieves 9.9 mAP and 67.3 Recall, 1.1 and 25.6 points higher than YOLO-World. In terms of speed, UAV-OVD achieves 53.8 FPS, nearly twice as fast as YOLO-World and five times faster than DetrReg, demonstrating its strong potential for real-time open-vocabulary detection in UAV imagery.
(This article belongs to the Special Issue Applications of UVs in Digital Photogrammetry and Image Processing)
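The synonym-augmented labels are straightforward to sketch: each class name expands into several phrases that the text tower embeds, with every phrase mapping back to its owning class. The synonym lists below are invented examples, not the dataset's actual augmentations.

```python
# A small sketch of synonym-augmented category labels: each class expands to
# several phrases, so supervision is more semantically expressive. The synonym
# table is a made-up example, not taken from the paper's data.
SYNONYMS = {
    "fixed-wing aircraft": ["airplane", "plane", "fixed-wing aircraft"],
    "maritime vessel": ["ship", "boat", "maritime vessel"],
}

def expand_labels(class_names):
    """Return (phrases, owner_index) so each phrase maps back to its class."""
    phrases, owners = [], []
    for idx, name in enumerate(class_names):
        for phrase in SYNONYMS.get(name, [name]):
            phrases.append(phrase)
            owners.append(idx)
    return phrases, owners

phrases, owners = expand_labels(["fixed-wing aircraft", "maritime vessel"])
print(list(zip(phrases, owners)))   # every synonym keeps its class index
```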
24 pages, 4913 KB  
Article
Region-Wise Recognition and Classification of Arabic Dialects and Vocabulary: A Deep Learning Approach
by Fawaz S. Al–Anzi and Bibin Shalini Sundaram Thankaleela
Appl. Sci. 2025, 15(12), 6516; https://doi.org/10.3390/app15126516 - 10 Jun 2025
Viewed by 2546
Abstract
This article presents a unique approach to Arabic dialect identification using a pre-trained speech classification model. The system categorizes Arabic audio clips into their respective dialects by employing 1D and 2D convolutional neural network technologies built from diverse dialects of the Arab region using deep learning models. Its objective is to enhance traditional linguistic handling and speech technology by accurately classifying Arabic audio clips into their corresponding dialects. The pipeline involves data gathering, preprocessing, feature extraction, model architecture design, and assessment metrics. The algorithm distinguishes various Arabic dialects, such as A (Arab nation authorized dialect), EGY (Egyptian Arabic), GLF (Gulf Arabic), LAV and LF (Levantine Arabic, spoken in Syria, Lebanon, and Jordan), MSA (Modern Standard Arabic), NOR (North African Arabic), and SA (Saudi Arabic). Experimental results demonstrate the efficiency of the proposed approach in accurately identifying diverse Arabic dialects, achieving a testing accuracy of 94.28% and a validation accuracy of 95.55%, surpassing traditional machine learning models such as Random Forest and SVM as well as deep learning baselines such as CNN and CNN2D.
(This article belongs to the Special Issue Speech Recognition and Natural Language Processing)
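For a sense of what the 1D-CNN branch of such a pipeline can look like, here is a bare-bones classifier over MFCC sequences; the layer sizes, the 40-coefficient input, and the 8-way head (matching the dialect list above) are assumptions rather than the authors' architecture.

```python
# A bare-bones 1D-CNN dialect classifier over MFCC features. All layer sizes
# and the 8-dialect head are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn

class DialectCNN1D(nn.Module):
    def __init__(self, n_mfcc=40, n_dialects=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_mfcc, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),         # pool over the time axis
        )
        self.head = nn.Linear(128, n_dialects)

    def forward(self, x):                    # x: (batch, n_mfcc, frames)
        return self.head(self.features(x).squeeze(-1))

model = DialectCNN1D()
logits = model(torch.randn(4, 40, 300))      # 4 clips, 300 MFCC frames each
```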
25 pages, 2508 KB  
Article
OVSLT: Advancing Sign Language Translation with Open Vocabulary
by Ai Wang, Junhui Li, Wuyang Luan and Lei Pan
Electronics 2025, 14(5), 1044; https://doi.org/10.3390/electronics14051044 - 6 Mar 2025
Viewed by 3485
Abstract
Hearing impairments affect approximately 1.5 billion individuals worldwide, highlighting the critical need for effective communication tools between deaf and hearing populations. Traditional sign language translation (SLT) models predominantly rely on gloss-based methods, which convert visual sign language inputs into intermediate gloss sequences before generating textual translations. However, these methods are constrained by their reliance on extensive annotated data, susceptibility to error propagation, and inadequate handling of low-frequency or unseen sign language vocabulary, thus limiting their scalability and practical application. Drawing upon multimodal translation theory, this study proposes the open-vocabulary sign language translation (OVSLT) method, designed to overcome these challenges by integrating open-vocabulary principles. OVSLT introduces two pivotal modules: Enhanced Caption Generation and Description (CGD), and Grid Feature Grouping with advanced alignment techniques. The Enhanced CGD module employs a GPT model augmented with a Negative Retriever and Semantic Retrieval-Augmented Features (SRAF) to produce semantically rich textual descriptions of sign gestures. In parallel, the Grid Feature Grouping module combines contrastive learning, a feature-discriminative contrastive loss, and balanced region-loss scaling to refine visual feature representations, ensuring robust alignment with textual descriptions. We evaluated OVSLT on the PHOENIX-14T and CSLDaily datasets. The results demonstrated a ROUGE score of 29.6% on the PHOENIX-14T dataset and 30.72% on the CSLDaily dataset, significantly outperforming existing models. These findings underscore the versatility and effectiveness of OVSLT, showcasing the potential of open-vocabulary approaches to surpass the limitations of traditional SLT systems and contribute to the evolving field of multimodal translation.
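The grid-feature-grouping idea can be loosely sketched as soft assignment of grid tokens to learnable group tokens followed by weighted pooling; the paper's module is more elaborate, so everything below is an assumption-level approximation.

```python
# A loose sketch of grouping grid-level visual tokens: soft-assign each grid
# token to its nearest learnable group token, then average per group. The
# paper's actual grouping and alignment mechanism is more sophisticated.
import torch
import torch.nn.functional as F

def group_grid_features(grid_feats, group_tokens):
    """grid_feats: (N, D) flattened grid tokens; group_tokens: (G, D)."""
    assign = F.softmax(
        F.normalize(grid_feats, dim=-1) @ F.normalize(group_tokens, dim=-1).t(),
        dim=-1,
    )                                              # (N, G) soft assignments
    grouped = assign.t() @ grid_feats              # (G, D) weighted sums
    return grouped / assign.sum(dim=0, keepdim=True).t().clamp(min=1e-6)

groups = group_grid_features(torch.randn(196, 512), torch.randn(8, 512))
```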