Search Results (2,217)

Search Parameters:
Keywords = visual representation

16 pages, 1163 KiB  
Article
Crowds of Feminists: The Hybrid Activist Poetics of “No Manifesto” and Jennif(f)er Tamayo’s YOU DA ONE
by Becca Klaver
Humanities 2025, 14(7), 153; https://doi.org/10.3390/h14070153 - 18 Jul 2025
Abstract
This essay examines two hybrid poetic texts that emerged from a period of feminist activism in U.S. and global poetry communities from 2014 to 2017: the collaboratively, anonymously authored “No Manifesto” (2015) and the radically revised second edition of the book of poetry and visual art YOU DA ONE by Jennif(f)er Tamayo. “No Manifesto” and YOU DA ONE embrace the hybrid tactics of collectivity, incongruity, and nonresolution as ways of protesting sexism and sexual violence in poetry communities. Synthesizing theories of hybridity from poetry criticism as well as immigrant and borderlands studies, the essay defines hybridity as a literary representation of cultural positions forcefully imposed upon subjects. Born out of the domination of sexual and state violence, hybridity marks the wound that remakes the subject, who develops strategies for resistance. By refusing to play by the rules of poetic or social discourse—the logics of domination that would have them be singular, cohesive, and compliant—Tamayo and the authors of “No Manifesto” insist on alternative ways of performing activism, composing literature, and entering the public sphere. These socially engaged, hybrid poetic texts demonstrate the power of the collective to disrupt the social and literary status quo.
(This article belongs to the Special Issue Hybridity and Border Crossings in Contemporary North American Poetry)
22 pages, 1342 KiB  
Article
Multi-Scale Attention-Driven Hierarchical Learning for Fine-Grained Visual Categorization
by Zhihuai Hu, Rihito Kojima and Xian-Hua Han
Electronics 2025, 14(14), 2869; https://doi.org/10.3390/electronics14142869 - 18 Jul 2025
Abstract
Fine-grained visual categorization (FGVC) presents significant challenges due to subtle inter-class variation and significant intra-class diversity, often leading to limited discriminative capacity in global representations. Existing methods inadequately capture localized, class-relevant features across multiple semantic levels, especially under complex spatial configurations. To address these challenges, we introduce a Multi-scale Attention-driven Hierarchical Learning (MAHL) framework that iteratively refines feature representations via scale-adaptive attention mechanisms. Specifically, fully connected (FC) classifiers are applied to spatially pooled feature maps at multiple network stages to capture global semantic context. The learned FC weights are then projected onto the original high-resolution feature maps to compute spatial contribution scores for the predicted class, serving as attention cues. These multi-scale attention maps guide the selection of discriminative regions, which are hierarchically integrated into successive training iterations to reinforce both global and local contextual dependencies. Moreover, we explore a generalized pooling operation that parametrically fuses average and max pooling, enabling richer contextual retention in the encoded features. Comprehensive evaluations on benchmark FGVC datasets demonstrate that MAHL consistently outperforms state-of-the-art methods, validating its efficacy in learning robust, class-discriminative, high-resolution representations through attention-guided hierarchical refinement.
(This article belongs to the Special Issue Advances in Machine Learning for Image Classification)
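The "generalized pooling operation that parametrically fuses average and max pooling" mentioned in the MAHL abstract admits a short sketch. The convex blend below is an illustrative assumption (the paper's exact parameterization may differ); in practice `alpha` would be a learnable scalar rather than a fixed argument:

```python
import numpy as np

def generalized_pool(x, alpha):
    """Blend average and max pooling over the spatial dims of a feature map.

    x: (C, H, W) feature map; alpha in [0, 1] interpolates between
    pure average pooling (alpha=1) and pure max pooling (alpha=0).
    """
    avg = x.mean(axis=(1, 2))   # (C,) global average pool
    mx = x.max(axis=(1, 2))     # (C,) global max pool
    return alpha * avg + (1.0 - alpha) * mx
```

Setting `alpha` to the endpoints recovers the two classical pooling operators, which is what makes the fusion a strict generalization of both.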

15 pages, 1599 KiB  
Article
Visual Representations in AI: A Study on the Most Discriminatory Algorithmic Biases in Image Generation
by Yazmina Vargas-Veleda, María del Mar Rodríguez-González and Iñigo Marauri-Castillo
Journal. Media 2025, 6(3), 110; https://doi.org/10.3390/journalmedia6030110 - 18 Jul 2025
Abstract
This study analyses algorithmic biases in AI-generated images, focusing on aesthetic violence, gender stereotypes, and weight discrimination. By examining images produced by the DALL-E Nature and Flux 1 systems, it becomes evident how these tools reproduce and amplify hegemonic beauty standards, excluding bodily diversity. Likewise, gender representations reinforce traditional roles, sexualising women and limiting the presence of non-normative bodies in positive contexts. The results show that training data and the algorithms used significantly influence these trends, perpetuating exclusionary visual narratives. The research highlights the need to develop more inclusive and ethical AI models, with diverse data that reflect the plurality of bodies and social realities. The study concludes that artificial intelligence (AI), far from being neutral, actively contributes to the reproduction of power structures and inequality, posing an urgent challenge for the development and regulation of these technologies.

21 pages, 5917 KiB  
Article
VML-UNet: Fusing Vision Mamba and Lightweight Attention Mechanism for Skin Lesion Segmentation
by Tang Tang, Haihui Wang, Qiang Rao, Ke Zuo and Wen Gan
Electronics 2025, 14(14), 2866; https://doi.org/10.3390/electronics14142866 - 17 Jul 2025
Abstract
Deep learning has advanced medical image segmentation, yet existing methods struggle with complex anatomical structures. Mainstream models, such as CNN, Transformer, and hybrid architectures, face challenges including insufficient information representation and redundant complexity, which limit their clinical deployment. Developing efficient and lightweight networks is crucial for accurate lesion localization and optimized clinical workflows. We propose the VML-UNet, a lightweight segmentation network with core innovations including the CPMamba module and the multi-scale local supervision module (MLSM). The CPMamba module integrates the visual state space (VSS) block and a channel prior attention mechanism to enable efficient modeling of spatial relationships with linear computational complexity through dynamic channel-space weight allocation, while preserving channel feature integrity. The MLSM enhances local feature perception and reduces the inference burden. Comparative experiments were conducted on three public datasets, including ISIC2017, ISIC2018, and PH2, with ablation experiments performed on ISIC2017. VML-UNet achieves 0.53 M parameters, 2.18 MB memory usage, and 1.24 GFLOPs time complexity, with its performance on the datasets outperforming comparative networks, validating its effectiveness. This study provides valuable references for developing lightweight, high-performance skin lesion segmentation networks, advancing the field of skin lesion segmentation.
(This article belongs to the Section Bioelectronics)

17 pages, 3612 KiB  
Article
MPVT: An Efficient Multi-Modal Prompt Vision Tracker for Visual Target Tracking
by Jianyu Xie, Yan Fu, Junlin Zhou, Tianxiang He, Xiaopeng Wang, Yuke Fang and Duanbing Chen
Appl. Sci. 2025, 15(14), 7967; https://doi.org/10.3390/app15147967 - 17 Jul 2025
Abstract
Visual target tracking is a fundamental task in computer vision. Combining multi-modal information with tracking leverages complementary information, which improves the precision and robustness of trackers. Traditional multi-modal tracking methods typically employ a full fine-tuning scheme, i.e., fine-tuning pre-trained single-modal models to multi-modal tasks. However, this approach suffers from low transfer learning efficiency, catastrophic forgetting, and high cross-task deployment costs. To address these issues, we propose an efficient model named multi-modal prompt vision tracker (MPVT) based on an efficient prompt-tuning paradigm. Three key components are involved in the model: a decoupled input enhancement module, a dynamic adaptive prompt fusion module, and a fully connected head network module. The decoupled input enhancement module enhances input representations via positional and type embedding. The dynamic adaptive prompt fusion module achieves efficient prompt tuning and multi-modal interaction using scaled convolution and low-rank cross-modal attention mechanisms. The fully connected head network module addresses the shortcomings of traditional convolutional head networks such as inductive biases. Experimental results from RGB-T, RGB-D, and RGB-E scenarios show that MPVT outperforms state-of-the-art methods. Moreover, MPVT can save 43.8% GPU memory usage and reduce training time by 62.9% compared with a full-parameter fine-tuning model.
(This article belongs to the Special Issue Advanced Technologies Applied for Object Detection and Tracking)
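The "low-rank cross-modal attention" the MPVT abstract mentions can be read as ordinary scaled dot-product attention whose query/key projections have rank r much smaller than the feature dimension D, which is what keeps prompt tuning cheap. The shapes and names below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def low_rank_cross_attention(q_feats, kv_feats, Wq, Wk, Wv):
    """Cross-modal attention with rank-r query/key projections.

    q_feats: (Nq, D) tokens of one modality; kv_feats: (Nk, D) of the other.
    Wq, Wk: (D, r) low-rank projections with r << D; Wv: (D, D).
    """
    q = q_feats @ Wq                      # (Nq, r)
    k = kv_feats @ Wk                     # (Nk, r)
    v = kv_feats @ Wv                     # (Nk, D)
    scores = q @ k.T / np.sqrt(q.shape[1])
    # row-wise softmax over the key axis
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v                       # (Nq, D)
```

The attention matrix costs O(Nq·Nk·r) to form instead of O(Nq·Nk·D), which is the usual motivation for the low-rank factorization.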

15 pages, 1142 KiB  
Technical Note
Terrain and Atmosphere Classification Framework on Satellite Data Through Attentional Feature Fusion Network
by Antoni Jaszcz and Dawid Połap
Remote Sens. 2025, 17(14), 2477; https://doi.org/10.3390/rs17142477 - 17 Jul 2025
Abstract
Surface, terrain, or even atmosphere analysis using images or their fragments is important due to the possibilities of further processing. In particular, attention is necessary for satellite and/or drone images. Analyzing image elements by classifying the given classes is important for obtaining information about space for autonomous systems, identifying landscape elements, or monitoring and maintaining the infrastructure and environment. Hence, in this paper, we propose a neural classifier architecture that analyzes different features by the parallel processing of information in the network and combines them with a feature fusion mechanism. The neural architecture model takes into account different types of features by extracting them by focusing on spatial, local patterns and multi-scale representation. In addition, the classifier is guided by an attention mechanism for focusing more on different channels, spatial information, and even feature pyramid mechanisms. Atrous convolutional operators were also used in such an architecture as better context feature extractors. The proposed classifier architecture is the main element of the modeled framework for satellite data analysis, which is based on the possibility of training depending on the client’s desire. The proposed methodology was evaluated on three publicly available classification datasets for remote sensing: satellite images, Visual Terrain Recognition, and USTC SmokeRS, where the proposed model achieved accuracy scores of 97.8%, 100.0%, and 92.4%, respectively. The obtained results indicate the effectiveness of the proposed attention mechanisms across different remote sensing challenges.

14 pages, 4648 KiB  
Article
Cyber-Physical System and 3D Visualization for a SCADA-Based Drinking Water Supply: A Case Study in the Lerma Basin, Mexico City
by Gabriel Sepúlveda-Cervantes, Eduardo Vega-Alvarado, Edgar Alfredo Portilla-Flores and Eduardo Vivanco-Rodríguez
Future Internet 2025, 17(7), 306; https://doi.org/10.3390/fi17070306 - 17 Jul 2025
Abstract
Cyber-physical systems such as Supervisory Control and Data Acquisition (SCADA) have been applied in industrial automation and infrastructure management for decades. They are hybrid tools for administration, monitoring, and continuous control of real physical systems through their computational representation. SCADA systems have evolved along with computing technology, from their beginnings with low-performance computers, monochrome monitors and communication networks with a range of a few hundred meters, to high-performance systems with advanced 3D graphics and wired and wireless computer networks. This article presents a methodology for the design of a SCADA system with a 3D Visualization for Drinking Water Supply, and its implementation in the Lerma Basin System of Mexico City as a case study. The monitoring of water consumption from the wells is presented, as well as the pressure levels throughout the system. The 3D visualization is generated from the GIS information and the communication is carried out using a hybrid radio frequency transmission system, satellite, and telephone network. The pumps that extract water from each well are teleoperated and monitored in real time. The developed system can be scaled to generate a simulator of water behavior of the Lerma Basin System and perform contingency planning.

27 pages, 7645 KiB  
Article
VMMT-Net: A Dual-Branch Parallel Network Combining Visual State Space Model and Mix Transformer for Land–Sea Segmentation of Remote Sensing Images
by Jiawei Wu, Zijian Liu, Zhipeng Zhu, Chunhui Song, Xinghui Wu and Haihua Xing
Remote Sens. 2025, 17(14), 2473; https://doi.org/10.3390/rs17142473 - 16 Jul 2025
Abstract
Land–sea segmentation is a fundamental task in remote sensing image analysis, and plays a vital role in dynamic coastline monitoring. The complex morphology and blurred boundaries of coastlines in remote sensing imagery make fast and accurate segmentation challenging. Recent deep learning approaches lack the ability to model spatial continuity effectively, thereby limiting a comprehensive understanding of coastline features in remote sensing imagery. To address this issue, we have developed VMMT-Net, a novel dual-branch semantic segmentation framework. By constructing a parallel heterogeneous dual-branch encoder, VMMT-Net integrates the complementary strengths of the Mix Transformer and the Visual State Space Model, enabling comprehensive modeling of local details, global semantics, and spatial continuity. We design a Cross-Branch Fusion Module to facilitate deep feature interaction and collaborative representation across branches, and implement a customized decoder module that enhances the integration of multiscale features and improves boundary refinement of coastlines. Extensive experiments conducted on two benchmark remote sensing datasets, GF-HNCD and BSD, demonstrate that the proposed VMMT-Net outperforms existing state-of-the-art methods in both quantitative metrics and visual quality. Specifically, the model achieves mean F1-scores of 98.48% (GF-HNCD) and 98.53% (BSD) and mean intersection-over-union values of 97.02% (GF-HNCD) and 97.11% (BSD). The model maintains reasonable computational complexity, with only 28.24 M parameters and 25.21 GFLOPs, striking a favorable balance between accuracy and efficiency. These results indicate the strong generalization ability and practical applicability of VMMT-Net in real-world remote sensing segmentation tasks.
(This article belongs to the Special Issue Application of Remote Sensing in Coastline Monitoring)

21 pages, 41202 KiB  
Article
Copper Stress Levels Classification in Oilseed Rape Using Deep Residual Networks and Hyperspectral False-Color Images
by Yifei Peng, Jun Sun, Zhentao Cai, Lei Shi, Xiaohong Wu, Chunxia Dai and Yubin Xie
Horticulturae 2025, 11(7), 840; https://doi.org/10.3390/horticulturae11070840 - 16 Jul 2025
Abstract
In recent years, heavy metal contamination in agricultural products has become a growing concern in the field of food safety. Copper (Cu) stress in crops not only leads to significant reductions in both yield and quality but also poses potential health risks to humans. This study proposes an efficient and precise non-destructive detection method for Cu stress in oilseed rape, which is based on hyperspectral false-color image construction using principal component analysis (PCA). By comprehensively capturing the spectral representation of oilseed rape plants, both the one-dimensional (1D) spectral sequence and spatial image data were utilized for multi-class classification. The classification performance of models based on 1D spectral sequences was compared from two perspectives: first, between machine learning and deep learning methods (best accuracy: 93.49% vs. 96.69%); and second, between shallow and deep convolutional neural networks (CNNs) (best accuracy: 95.15% vs. 96.69%). For spatial image data, deep residual networks were employed to evaluate the effectiveness of visible-light and false-color images. The RegNet architecture was chosen for its flexible parameterization and proven effectiveness in extracting multi-scale features from hyperspectral false-color images. This flexibility enabled RegNetX-6.4GF to achieve optimal performance on the dataset constructed from three types of false-color images, with the model reaching a Macro-Precision, Macro-Recall, Macro-F1, and Accuracy of 98.17%, 98.15%, 98.15%, and 98.15%, respectively. Furthermore, Grad-CAM visualizations revealed that latent physiological changes in plants under heavy metal stress guided feature learning within CNNs, and demonstrated the effectiveness of false-color image construction in extracting discriminative features. Overall, the proposed technique can be integrated into portable hyperspectral imaging devices, enabling real-time and non-destructive detection of heavy metal stress in modern agricultural practices.
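The PCA-based false-color construction described in this abstract (projecting the hyperspectral bands onto the top three principal components and rendering them as RGB channels) can be sketched generically; the normalization and component count below are common defaults assumed for illustration, not details taken from the paper:

```python
import numpy as np

def pca_false_color(cube):
    """Render a hyperspectral cube (H, W, B) as a 3-channel false-color image."""
    H, W, B = cube.shape
    X = cube.reshape(-1, B).astype(float)
    X -= X.mean(axis=0)                       # center each band
    cov = np.cov(X, rowvar=False)             # (B, B) band covariance
    vals, vecs = np.linalg.eigh(cov)          # eigh returns ascending order
    top3 = vecs[:, np.argsort(vals)[::-1][:3]]
    pcs = X @ top3                            # (H*W, 3) principal-component scores
    # min-max normalize each channel to [0, 1] so it displays as RGB
    pcs = (pcs - pcs.min(axis=0)) / (np.ptp(pcs, axis=0) + 1e-12)
    return pcs.reshape(H, W, 3)
```

The three highest-variance components capture most of the spectral contrast, which is why such renderings tend to make stress-related differences visible to a downstream CNN.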

20 pages, 3714 KiB  
Article
Seed Mixes in Landscape Design and Management: An Untapped Conservation Tool for Pollinators in Cities
by Cláudia Fernandes, Ana Medeiros, Catarina Teixeira, Miguel Porto, Mafalda Xavier, Sónia Ferreira and Ana Afonso
Land 2025, 14(7), 1477; https://doi.org/10.3390/land14071477 - 16 Jul 2025
Abstract
Urban green spaces are increasingly recognized as important habitats for pollinators, and wildflower seed mixes marketed as pollinator-friendly are gaining popularity, though their actual conservation value remains poorly understood. This study provides the first systematic screening of commercially available seed mixes in Portugal, evaluating their taxonomic composition, origin, life cycle traits, and potential to support pollinator communities. A total of 229 seed mixes were identified. Although these have a predominance of native species (median 86%), the taxonomic diversity was limited, with 91% of mixes comprising species from only one or two families, predominantly Poaceae and Fabaceae, potentially restricting the range of floral resources available to pollinators. Only 21 seed mixes met the criteria for being pollinator-friendly, based on a three-step decision tree prioritizing native species, extended flowering periods, and visual diversity. These showed the highest percentage of native species (median 87%) and a greater representation of flowering plants. However, 76% of all mixes still included at least one non-native species, although none is considered invasive. Perennial species dominated all seed mix types, indicating the potential for the long-term persistence of wildflower meadows in urban spaces. Despite their promise, the ecological quality and transparency of the seed mix composition remain inconsistent, with limited certification or information on species origin. This highlights the need for clearer labeling, regulatory guidance, and ecologically informed formulations. Seed mixes, if properly designed and implemented, represent a largely untapped yet cost-effective tool for enhancing the pollinator habitats and biodiversity within urban landscapes.

37 pages, 8356 KiB  
Article
Voxel-Based Digital Twin Framework for Earthwork Construction
by Muhammad Shoaib Khan, Hyuk Soo Cho and Jongwon Seo
Appl. Sci. 2025, 15(14), 7899; https://doi.org/10.3390/app15147899 - 15 Jul 2025
Abstract
Earthwork construction presents significant challenges due to its unique characteristics, including irregular topography, inhomogeneous geotechnical properties, dynamic operations involving heavy equipment, and continuous terrain updates over time. Existing methods often fail to accurately capture these complexities, support semantic attributes, simulate realistic equipment–environment interactions, and update the model dynamically during construction. Moreover, most current digital solutions lack an integrated framework capable of linking geotechnical semantics with construction progress in a continuously evolving terrain. This study introduces a novel, voxel-based digital twin framework tailored for earthwork construction. Unlike previous studies that relied on surface, mesh, or layer-based representations, our approach leverages semantically enriched voxelization to encode spatial, material, and behavioral attributes at a high resolution. The proposed framework connects the physical and digital representations of the earthwork environment and is structured into five modules. The data acquisition module gathers terrain, geotechnical, design, and construction data. Virtual models are created for the earthwork in as-planned and as-built models. The digital twin core module utilizes voxels to create a realistic earthwork environment that integrates the as-planned and as-built models, facilitating model–equipment interaction and updating models for progress monitoring. The visualization and simulation module enables model–equipment interaction based on evolving as-built conditions. Finally, the monitoring and analysis module provides volumetric progress insights, semantic material information, and excavation tracking. The key innovation of this framework lies in multi-resolution voxel modeling, semantic mapping of geotechnical properties, and supporting dynamic updates during ongoing construction, enabling model–equipment interaction and material-specific construction progress monitoring. The framework is validated through real-world case studies, demonstrating its effectiveness in providing realistic representations, model–equipment interactions, and supporting progress information and operational insights.
(This article belongs to the Section Civil Engineering)

21 pages, 7084 KiB  
Article
Chinese Paper-Cutting Style Transfer via Vision Transformer
by Chao Wu, Yao Ren, Yuying Zhou, Ming Lou and Qing Zhang
Entropy 2025, 27(7), 754; https://doi.org/10.3390/e27070754 - 15 Jul 2025
Abstract
Style transfer technology has seen substantial attention in image synthesis, notably in applications like oil painting, digital printing, and Chinese landscape painting. However, it is often difficult to generate migrated images that retain the essence of paper-cutting art and have strong visual appeal when trying to apply the unique style of Chinese paper-cutting art to style transfer. Therefore, this paper proposes a new method for Chinese paper-cutting style transformation based on the Transformer, aiming at realizing the efficient transformation of Chinese paper-cutting art styles. Specifically, the network consists of a frequency-domain mixture block and a multi-level feature contrastive learning module. The frequency-domain mixture block explores spatial and frequency-domain interaction information, integrates multiple attention windows along with frequency-domain features, preserves critical details, and enhances the effectiveness of style conversion. To further embody the symmetrical structures and hollowed hierarchical patterns intrinsic to Chinese paper-cutting, the multi-level feature contrastive learning module is designed based on a contrastive learning strategy. This module maximizes mutual information between multi-level transferred features and content features, improves the consistency of representations across different layers, and thus accentuates the unique symmetrical aesthetics and artistic expression of paper-cutting. Extensive experimental results demonstrate that the proposed method outperforms existing state-of-the-art approaches in both qualitative and quantitative evaluations. Additionally, we created a Chinese paper-cutting dataset that, although modest in size, represents an important initial step towards enriching existing resources. This dataset provides valuable training data and a reference benchmark for future research in this field.
(This article belongs to the Section Multidisciplinary Applications)

20 pages, 5700 KiB  
Article
Multimodal Personality Recognition Using Self-Attention-Based Fusion of Audio, Visual, and Text Features
by Hyeonuk Bhin and Jongsuk Choi
Electronics 2025, 14(14), 2837; https://doi.org/10.3390/electronics14142837 - 15 Jul 2025
Abstract
Personality is a fundamental psychological trait that exerts a long-term influence on human behavior patterns and social interactions. Automatic personality recognition (APR) has exhibited increasing importance across various domains, including Human–Robot Interaction (HRI), personalized services, and psychological assessments. In this study, we propose a multimodal personality recognition model that classifies the Big Five personality traits by extracting features from three heterogeneous sources: audio processed using Wav2Vec2, video represented as Skeleton Landmark time series, and text encoded through Bidirectional Encoder Representations from Transformers (BERT) and Doc2Vec embeddings. Each modality is handled through an independent Self-Attention block that highlights salient temporal information, and these representations are then summarized and integrated using a late fusion approach to effectively reflect both the inter-modal complementarity and cross-modal interactions. Compared to traditional recurrent neural network (RNN)-based multimodal models and unimodal classifiers, the proposed model achieves an improvement of up to 12 percent in the F1-score. It also maintains a high prediction accuracy and robustness under limited input conditions. Furthermore, a visualization based on t-distributed Stochastic Neighbor Embedding (t-SNE) demonstrates clear distributional separation across the personality classes, enhancing the interpretability of the model and providing insights into the structural characteristics of its latent representations. To support real-time deployment, a lightweight thread-based processing architecture is implemented, ensuring computational efficiency. By leveraging deep learning-based feature extraction and the Self-Attention mechanism, we present a novel personality recognition framework that balances performance with interpretability. The proposed approach establishes a strong foundation for practical applications in HRI, counseling, education, and other interactive systems that require personalized adaptation.
(This article belongs to the Special Issue Explainable Machine Learning and Data Mining)
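The per-modality Self-Attention blocks followed by late fusion described in this abstract can be sketched minimally: attend within each modality's sequence, pool the result, then concatenate the pooled summaries. Everything below (single-head unparameterized attention, mean pooling) is an illustrative assumption, not the authors' architecture:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_pool(seq):
    """Self-attention over a (T, D) sequence, then mean-pool to a (D,) summary."""
    scores = seq @ seq.T / np.sqrt(seq.shape[1])
    ctx = softmax(scores) @ seq        # (T, D) context-weighted features
    return ctx.mean(axis=0)            # (D,) modality summary

def late_fuse(audio, video, text):
    """Late fusion: concatenate the pooled summary of each modality."""
    return np.concatenate([attention_pool(m) for m in (audio, video, text)])
```

Late fusion keeps each modality's encoder independent, so a missing or degraded stream only affects its own slice of the fused vector, which is consistent with the robustness-under-limited-input claim in the abstract.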
21 pages, 3826 KiB  
Article
UAV-OVD: Open-Vocabulary Object Detection in UAV Imagery via Multi-Level Text-Guided Decoding
by Lijie Tao, Guoting Wei, Zhuo Wang, Zhaoshuai Qi, Ying Li and Haokui Zhang
Drones 2025, 9(7), 495; https://doi.org/10.3390/drones9070495 - 14 Jul 2025
Abstract
Object detection in drone-captured imagery has attracted significant attention due to its wide range of real-world applications, including surveillance, disaster response, and environmental monitoring. The majority of existing methods are developed under closed-set assumptions; although some recent studies have begun to explore open-vocabulary or open-world detection, their application to UAV imagery remains limited and underexplored. In this paper, we address this limitation by exploring the relationship between images and textual semantics to extend object detection in UAV imagery to an open-vocabulary setting. We propose a novel and efficient detector named Unmanned Aerial Vehicle Open-Vocabulary Detector (UAV-OVD), specifically designed for drone-captured scenes. To facilitate open-vocabulary object detection, we propose improvements from three complementary perspectives. First, at the training level, we design a region–text contrastive loss to replace conventional classification loss, allowing the model to align visual regions with textual descriptions beyond fixed category sets. Structurally, building on this, we introduce a multi-level text-guided fusion decoder that integrates visual features across multiple spatial scales under language guidance, thereby improving overall detection performance and enhancing the representation and perception of small objects. Finally, from the data perspective, we enrich the original dataset with synonym-augmented category labels, enabling more flexible and semantically expressive supervision. Experiments conducted on two widely used benchmark datasets demonstrate that our approach achieves significant improvements in both mAP and Recall. For instance, for Zero-Shot Detection on xView, UAV-OVD achieves 9.9 mAP and 67.3 Recall, 1.1 and 25.6 points higher, respectively, than those of YOLO-World.
In terms of speed, UAV-OVD achieves 53.8 FPS, nearly twice as fast as YOLO-World and five times faster than DetrReg, demonstrating its strong potential for real-time open-vocabulary detection in UAV imagery. Full article
(This article belongs to the Special Issue Applications of UVs in Digital Photogrammetry and Image Processing)
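The region–text contrastive objective mentioned above can be illustrated with an InfoNCE-style sketch. The embedding size, batch size, and temperature below are hypothetical, and this generic formulation stands in for, rather than reproduces, the UAV-OVD loss.

```python
import numpy as np

rng = np.random.default_rng(1)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def region_text_contrastive_loss(regions, texts, temperature=0.07):
    """InfoNCE-style loss aligning N region embeddings with their paired
    text embeddings: region i's positive is text i, all others are negatives."""
    r = l2_normalize(regions)
    t = l2_normalize(texts)
    logits = r @ t.T / temperature                    # (N, N) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))               # cross-entropy on diagonal

N, d = 8, 32
texts = l2_normalize(rng.normal(size=(N, d)))
# Regions nearly aligned with their text labels vs. unrelated regions:
aligned = region_text_contrastive_loss(texts + 0.01 * rng.normal(size=(N, d)), texts)
random_ = region_text_contrastive_loss(rng.normal(size=(N, d)), texts)
print(aligned < random_)  # aligned region–text pairs incur a lower loss
```

Unlike a fixed-category classification loss, nothing here ties the text side to a closed label set, which is what allows open-vocabulary queries at inference time.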
37 pages, 2921 KiB  
Article
A Machine-Learning-Based Data Science Framework for Effectively and Efficiently Processing, Managing, and Visualizing Big Sequential Data
by Alfredo Cuzzocrea, Islam Belmerabet, Abderraouf Hafsaoui and Carson K. Leung
Computers 2025, 14(7), 276; https://doi.org/10.3390/computers14070276 - 14 Jul 2025
Abstract
In recent years, the open data initiative has led many governments, researchers, and organizations to share their data and make it publicly available. Healthcare, disease, and epidemiological data, such as privacy statistics on patients who have suffered from epidemic diseases such as the Coronavirus disease 2019 (COVID-19), are examples of open big data. Consequently, huge volumes of valuable data have been generated and collected at high speed from a wide variety of rich data sources. Analyzing these open big data can be of social benefit. For example, people gain a better understanding of a disease by analyzing and mining its statistics, which can inspire them to help prevent, detect, control, and combat it. Visual representation further improves the understanding of the data and of the corresponding analysis and mining results, as a picture is worth a thousand words. In this paper, we present a visual data science solution for the visualization and visual analysis of large sequential data. These ideas are illustrated by the visualization and visual analysis of real sequential epidemiological data on COVID-19. Our solution enables users to visualize the epidemiological data of COVID-19 over time. It also allows them to visually analyze the data and discover relationships between salient features associated with COVID-19 cases. The effectiveness of our visual data science solution in improving the user experience of visualizing and visually analyzing large sequential data is demonstrated by a real-life evaluation on these sequential epidemiological COVID-19 data. Full article
(This article belongs to the Special Issue Computational Science and Its Applications 2024 (ICCSA 2024))
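As one concrete example of processing sequential epidemiological data before charting it, a 7-day moving average is a common smoothing step for daily case counts. The counts below are invented for illustration; the paper's actual pipeline is not shown here.

```python
from collections import deque

def rolling_mean(series, window=7):
    """Moving average over a sliding window; emits one value per day
    once the window is full. Commonly used to smooth daily case counts
    before plotting them as a time series."""
    buf, out = deque(maxlen=window), []
    for v in series:
        buf.append(v)
        if len(buf) == window:
            out.append(sum(buf) / window)
    return out

# Hypothetical daily new-case counts for two weeks:
daily_cases = [5, 8, 13, 21, 34, 40, 38, 45, 52, 49, 61, 70, 66, 72]
smoothed = rolling_mean(daily_cases)
print([round(v, 1) for v in smoothed])
# [22.7, 28.4, 34.7, 39.9, 45.6, 50.7, 54.4, 59.3]
```

Smoothing like this removes day-of-week reporting artifacts so that the visualized trend reflects the underlying epidemic curve rather than reporting noise.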
