Search Results (1,015)

Search Parameters:
Keywords = text–visual

17 pages, 2699 KiB  
Article
How to Talk to Your Classifier: Conditional Text Generation with Radar–Visual Latent Space
by Julius Ott, Huawei Sun, Lorenzo Servadei and Robert Wille
Sensors 2025, 25(14), 4467; https://doi.org/10.3390/s25144467 - 17 Jul 2025
Abstract
Many radar applications rely primarily on visual classification for their evaluations. However, new research is integrating textual descriptions alongside visual input and showing that such multimodal fusion improves contextual understanding. A critical issue in this area is the effective alignment of coded text with corresponding images. To this end, our paper presents an adversarial training framework that generates descriptive text from the latent space of a visual radar classifier. Our quantitative evaluations show that this dual-task approach maintains a robust classification accuracy of 98.3% despite the inclusion of Gaussian-distributed latent spaces. Beyond these numerical validations, we conduct a qualitative study of the text output in relation to the classifier’s predictions. This analysis highlights the correlation between the generated descriptions and the assigned categories and provides insight into the classifier’s visual interpretation processes, particularly in the context of normally uninterpretable radar data. Full article
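
The dual-task setup described here — a classifier whose Gaussian-distributed latent space also conditions a text decoder — can be sketched in a few lines of PyTorch. This is an illustrative toy, not the authors' architecture: the encoder layout, vocabulary size, GRU decoder, and reparameterization choice are all assumptions.

```python
import torch
import torch.nn as nn

class LatentClassifierWithCaptioner(nn.Module):
    """Toy dual-task model: classify an input from a Gaussian latent
    and decode a short token sequence from the same latent."""
    def __init__(self, num_classes=4, vocab_size=100, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_mu = nn.Linear(32, latent_dim)
        self.to_logvar = nn.Linear(32, latent_dim)
        self.classifier = nn.Linear(latent_dim, num_classes)
        self.embed = nn.Embedding(vocab_size, latent_dim)
        self.decoder = nn.GRU(latent_dim, latent_dim, batch_first=True)
        self.to_vocab = nn.Linear(latent_dim, vocab_size)

    def forward(self, image, tokens):
        h = self.encoder(image)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterized Gaussian latent
        logits_cls = self.classifier(z)
        # Condition the text decoder on the latent via its initial hidden state.
        out, _ = self.decoder(self.embed(tokens), z.unsqueeze(0).contiguous())
        return logits_cls, self.to_vocab(out)

model = LatentClassifierWithCaptioner()
image = torch.randn(2, 1, 64, 64)        # stand-in for a radar spectrogram batch
tokens = torch.randint(0, 100, (2, 7))   # teacher-forced caption tokens
cls_logits, token_logits = model(image, tokens)
print(cls_logits.shape, token_logits.shape)  # torch.Size([2, 4]) torch.Size([2, 7, 100])
```
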
35 pages, 1458 KiB  
Article
User Comment-Guided Cross-Modal Attention for Interpretable Multimodal Fake News Detection
by Zepu Yi, Chenxu Tang and Songfeng Lu
Appl. Sci. 2025, 15(14), 7904; https://doi.org/10.3390/app15147904 - 15 Jul 2025
Viewed by 50
Abstract
In order to address the pressing challenge posed by the proliferation of fake news in the digital age, we emphasize its profound and harmful impact on societal structures, including the misguidance of public opinion, the erosion of social trust, and the exacerbation of social polarization. Current fake news detection methods are largely limited to superficial text analysis or basic text–image integration, which face significant limitations in accurately identifying deceptive information. To bridge this gap, we propose the UC-CMAF framework, which comprehensively integrates news text, images, and user comments through an adaptive co-attention fusion mechanism. The UC-CMAF workflow consists of four key subprocesses: multimodal feature extraction, cross-modal adaptive collaborative attention fusion of news text and images, cross-modal attention fusion of user comments with news text and images, and finally, input of fusion features into a fake news detector. Specifically, we introduce multi-head cross-modal attention heatmaps and comment importance visualizations to provide interpretability support for the model’s predictions, revealing key semantic areas and user perspectives that influence judgments. Through the cross-modal adaptive collaborative attention mechanism, UC-CMAF achieves deep semantic alignment between news text and images and uses social signals from user comments to build an enhanced credibility evaluation path, offering a new paradigm for interpretable fake information detection. Experimental results demonstrate that UC-CMAF consistently outperforms 15 baseline models across two benchmark datasets, achieving F1 Scores of 0.894 and 0.909. These results validate the effectiveness of its adaptive cross-modal attention mechanism and the incorporation of user comments in enhancing both detection accuracy and interpretability. Full article
(This article belongs to the Special Issue Explainable Artificial Intelligence Technology and Its Applications)
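A minimal sketch of the co-attention fusion idea the abstract describes — text, image regions, and user comments attending across modalities before a fake-news head — using torch.nn.MultiheadAttention. The dimensions, fusion order, and pooling below are illustrative assumptions rather than the UC-CMAF specification.

```python
import torch
import torch.nn as nn

class CrossModalCoAttention(nn.Module):
    """Sketch of co-attention fusion: each modality queries the others,
    then the attended summaries are concatenated for a fake-news classifier."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.text_to_image = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.comment_to_news = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, 2))

    def forward(self, text_tokens, image_regions, comment_tokens):
        # Text queries image regions and vice versa (adaptive co-attention).
        t2i, _ = self.text_to_image(text_tokens, image_regions, image_regions)
        i2t, _ = self.image_to_text(image_regions, text_tokens, text_tokens)
        # Comments query the concatenated news content (text + image).
        news = torch.cat([t2i, i2t], dim=1)
        c2n, attn = self.comment_to_news(comment_tokens, news, news)
        fused = torch.cat([t2i.mean(1), i2t.mean(1), c2n.mean(1)], dim=-1)
        return self.classifier(fused), attn  # attn can be rendered as a heatmap

model = CrossModalCoAttention()
text = torch.randn(2, 30, 256)      # token features (e.g., BERT output projected to 256-d)
image = torch.randn(2, 49, 256)     # 7x7 grid of visual region features
comments = torch.randn(2, 20, 256)  # user-comment token features
logits, attn = model(text, image, comments)
print(logits.shape, attn.shape)     # torch.Size([2, 2]) torch.Size([2, 20, 79])
```

Returning the comment-over-news attention weights is what makes the interpretability visualizations in the abstract possible: each row shows which news tokens or regions a given comment attended to.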

20 pages, 5700 KiB  
Article
Multimodal Personality Recognition Using Self-Attention-Based Fusion of Audio, Visual, and Text Features
by Hyeonuk Bhin and Jongsuk Choi
Electronics 2025, 14(14), 2837; https://doi.org/10.3390/electronics14142837 - 15 Jul 2025
Viewed by 155
Abstract
Personality is a fundamental psychological trait that exerts a long-term influence on human behavior patterns and social interactions. Automatic personality recognition (APR) has exhibited increasing importance across various domains, including Human–Robot Interaction (HRI), personalized services, and psychological assessments. In this study, we propose a multimodal personality recognition model that classifies the Big Five personality traits by extracting features from three heterogeneous sources: audio processed using Wav2Vec2, video represented as Skeleton Landmark time series, and text encoded through Bidirectional Encoder Representations from Transformers (BERT) and Doc2Vec embeddings. Each modality is handled through an independent Self-Attention block that highlights salient temporal information, and these representations are then summarized and integrated using a late fusion approach to effectively reflect both the inter-modal complementarity and cross-modal interactions. Compared to traditional recurrent neural network (RNN)-based multimodal models and unimodal classifiers, the proposed model achieves an improvement of up to 12 percent in the F1-score. It also maintains a high prediction accuracy and robustness under limited input conditions. Furthermore, a visualization based on t-distributed Stochastic Neighbor Embedding (t-SNE) demonstrates clear distributional separation across the personality classes, enhancing the interpretability of the model and providing insights into the structural characteristics of its latent representations. To support real-time deployment, a lightweight thread-based processing architecture is implemented, ensuring computational efficiency. By leveraging deep learning-based feature extraction and the Self-Attention mechanism, we present a novel personality recognition framework that balances performance with interpretability. The proposed approach establishes a strong foundation for practical applications in HRI, counseling, education, and other interactive systems that require personalized adaptation. Full article
(This article belongs to the Special Issue Explainable Machine Learning and Data Mining)
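The per-modality Self-Attention blocks followed by late fusion can be sketched as below. The feature dimensions (Wav2Vec2-like 768-d audio frames, 33 landmarks × 3 coordinates for video, BERT-like 768-d text) are assumptions standing in for the real extractors.

```python
import torch
import torch.nn as nn

class ModalityBlock(nn.Module):
    """Self-attention over one modality's time series, summarized by mean pooling."""
    def __init__(self, in_dim, dim=128, heads=4):
        super().__init__()
        self.proj = nn.Linear(in_dim, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        h = self.proj(x)
        h, _ = self.attn(h, h, h)
        return h.mean(dim=1)

class LateFusionPersonality(nn.Module):
    """Independent blocks per modality; concatenation (late fusion) feeds the trait head."""
    def __init__(self, audio_dim=768, video_dim=99, text_dim=768, num_traits=5):
        super().__init__()
        self.audio = ModalityBlock(audio_dim)
        self.video = ModalityBlock(video_dim)
        self.text = ModalityBlock(text_dim)
        self.head = nn.Linear(3 * 128, num_traits)

    def forward(self, audio, video, text):
        fused = torch.cat([self.audio(audio), self.video(video), self.text(text)], dim=-1)
        return self.head(fused)  # one logit per Big Five trait

model = LateFusionPersonality()
out = model(torch.randn(2, 50, 768),   # e.g., Wav2Vec2 frame features
            torch.randn(2, 50, 99),    # e.g., 33 skeleton landmarks x 3 coords per frame
            torch.randn(2, 50, 768))   # e.g., BERT token embeddings
print(out.shape)  # torch.Size([2, 5])
```
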

24 pages, 1040 KiB  
Article
The Role of Visual Cues in Online Reviews: How Image Complexity Shapes Review Helpfulness
by Yongjie Chu, Xinru Liu and Cengceng Liu
J. Theor. Appl. Electron. Commer. Res. 2025, 20(3), 181; https://doi.org/10.3390/jtaer20030181 - 15 Jul 2025
Viewed by 166
Abstract
Online reviews play a critical role in shaping consumer decisions and providing valuable insights to enhance the products and services for businesses. As visual content becomes increasingly prevalent in reviews, it is essential to understand how image complexity influences review helpfulness. Despite the growing importance of images, the impact of color diversity and texture homogeneity on review helpfulness remains underexplored. Grounded in Information Diagnosticity Theory and Dual Coding Theory, this study investigates the relationship between image complexity and review helpfulness, as well as the moderating role of review text readability. Using a large-scale dataset from the hotel and travel sectors, the findings reveal that color diversity has a positive effect on review helpfulness, while texture homogeneity follows an inverted U-shaped relationship with helpfulness. Furthermore, text readability strengthens the positive impact of texture homogeneity, making moderately homogeneous images more effective when paired with clear and well-structured text. Heterogeneity analysis demonstrates that these effects vary across product categories. The results advance the understanding of multimodal information processing in online reviews, providing actionable guidance for platforms and businesses to refine the review systems. Full article
(This article belongs to the Section e-Commerce Analytics)
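The two image-complexity measures can be approximated with simple NumPy proxies, and the reported inverted-U effect corresponds to a negative quadratic term in a regression. The operationalizations below (histogram entropy for color diversity, a GLCM-style homogeneity statistic, synthetic data for the fit) are illustrative, not the paper's exact measures.

```python
import numpy as np

def color_diversity(rgb, bins=4):
    """Shannon entropy of a coarsely quantized RGB histogram (higher = more diverse)."""
    q = (rgb // (256 // bins)).reshape(-1, 3)
    idx = q[:, 0] * bins * bins + q[:, 1] * bins + q[:, 2]
    p = np.bincount(idx, minlength=bins ** 3).astype(float)
    p = p[p > 0] / p.sum()
    return float(-(p * np.log2(p)).sum())

def texture_homogeneity(gray, levels=16):
    """GLCM-style homogeneity for the horizontal neighbor offset (higher = more uniform)."""
    g = (gray.astype(float) / 256 * levels).astype(int)
    pairs = np.stack([g[:, :-1].ravel(), g[:, 1:].ravel()], axis=1)
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (pairs[:, 0], pairs[:, 1]), 1.0)
    glcm /= glcm.sum()
    i, j = np.indices((levels, levels))
    return float((glcm / (1.0 + np.abs(i - j))).sum())

rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
print(round(color_diversity(rgb), 2), round(texture_homogeneity(rgb.mean(axis=2)), 2))

# Inverted-U check on synthetic data: regress helpfulness on homogeneity and its
# square; a negative quadratic coefficient indicates the curvilinear effect.
homog = rng.uniform(0, 1, 500)
helpfulness = 2.0 * homog - 2.0 * homog ** 2 + rng.normal(0, 0.1, 500)
b2, b1, b0 = np.polyfit(homog, helpfulness, deg=2)
print(f"quadratic coefficient: {b2:.2f}")  # negative -> inverted U
```
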

21 pages, 3826 KiB  
Article
UAV-OVD: Open-Vocabulary Object Detection in UAV Imagery via Multi-Level Text-Guided Decoding
by Lijie Tao, Guoting Wei, Zhuo Wang, Zhaoshuai Qi, Ying Li and Haokui Zhang
Drones 2025, 9(7), 495; https://doi.org/10.3390/drones9070495 - 14 Jul 2025
Viewed by 157
Abstract
Object detection in drone-captured imagery has attracted significant attention due to its wide range of real-world applications, including surveillance, disaster response, and environmental monitoring. Although the majority of existing methods are developed under closed-set assumptions, and some recent studies have begun to explore open-vocabulary or open-world detection, their application to UAV imagery remains limited and underexplored. In this paper, we address this limitation by exploring the relationship between images and textual semantics to extend object detection in UAV imagery to an open-vocabulary setting. We propose a novel and efficient detector named Unmanned Aerial Vehicle Open-Vocabulary Detector (UAV-OVD), specifically designed for drone-captured scenes. To facilitate open-vocabulary object detection, we propose improvements from three complementary perspectives. First, at the training level, we design a region–text contrastive loss to replace conventional classification loss, allowing the model to align visual regions with textual descriptions beyond fixed category sets. Structurally, building on this, we introduce a multi-level text-guided fusion decoder that integrates visual features across multiple spatial scales under language guidance, thereby improving overall detection performance and enhancing the representation and perception of small objects. Finally, from the data perspective, we enrich the original dataset with synonym-augmented category labels, enabling more flexible and semantically expressive supervision. Experiments conducted on two widely used benchmark datasets demonstrate that our approach achieves significant improvements in both mean mAP and Recall. For instance, for Zero-Shot Detection on xView, UAV-OVD achieves 9.9 mAP and 67.3 Recall, 1.1 and 25.6 higher than that of YOLO-World. In terms of speed, UAV-OVD achieves 53.8 FPS, nearly twice as fast as YOLO-World and five times faster than DetrReg, demonstrating its strong potential for real-time open-vocabulary detection in UAV imagery. Full article
(This article belongs to the Special Issue Applications of UVs in Digital Photogrammetry and Image Processing)
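The region–text contrastive loss that replaces the conventional classification loss can be illustrated with a CLIP-style symmetric InfoNCE over matched region/label embeddings; the temperature and the one-to-one pairing below are assumptions, not the UAV-OVD training recipe.

```python
import torch
import torch.nn.functional as F

def region_text_contrastive_loss(region_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over matched (region i, text i) pairs.

    region_emb: (N, D) embeddings of detected regions
    text_emb:   (N, D) embeddings of their category phrases (synonyms handled upstream)
    """
    region_emb = F.normalize(region_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = region_emb @ text_emb.t() / temperature   # (N, N) similarity matrix
    targets = torch.arange(region_emb.size(0), device=region_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

regions = torch.randn(8, 256, requires_grad=True)  # pooled features of 8 matched regions
texts = torch.randn(8, 256)                        # text-encoder embeddings of their labels
loss = region_text_contrastive_loss(regions, texts)
loss.backward()
print(float(loss))
```

Because the targets are indices into an open set of text embeddings rather than a fixed label list, new category phrases can be queried at inference time without retraining the head — the core of the open-vocabulary setting.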

23 pages, 3614 KiB  
Article
A Multimodal Semantic-Enhanced Attention Network for Fake News Detection
by Weijie Chen, Yuzhuo Dang and Xin Zhang
Entropy 2025, 27(7), 746; https://doi.org/10.3390/e27070746 - 12 Jul 2025
Viewed by 272
Abstract
The proliferation of social media platforms has triggered an unprecedented increase in multimodal fake news, creating pressing challenges for content authenticity verification. Current fake news detection systems predominantly rely on isolated unimodal analysis (text or image), failing to exploit critical cross-modal correlations or leverage latent social context cues. To bridge this gap, we introduce the SCCN (Semantic-enhanced Cross-modal Co-attention Network), a novel framework that synergistically combines multimodal features with refined social graph signals. Our approach innovatively combines text, image, and social relation features through a hierarchical fusion framework. First, we extract modality-specific features and enhance semantics by identifying entities in both text and visual data. Second, an improved co-attention mechanism selectively integrates social relations while removing irrelevant connections to reduce noise and exploring latent informative links. Finally, the model is optimized via cross-entropy loss with entropy minimization. Experimental results for benchmark datasets (PHEME and Weibo) show that SCCN consistently outperforms existing approaches, achieving relative accuracy enhancements of 1.7% and 1.6% over the best-performing baseline methods in each dataset. Full article
(This article belongs to the Section Multidisciplinary Applications)
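The stated optimization objective — cross-entropy with entropy minimization — is easy to make concrete; the regularization weight below is an assumed value, not the one used for SCCN.

```python
import torch
import torch.nn.functional as F

def ce_with_entropy_minimization(logits, labels, weight=0.1):
    """Cross-entropy on labeled predictions plus a penalty on prediction entropy,
    which pushes the classifier toward confident (low-entropy) outputs."""
    ce = F.cross_entropy(logits, labels)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-12))).sum(dim=-1).mean()
    return ce + weight * entropy

logits = torch.randn(16, 2, requires_grad=True)   # real/fake logits for a batch of posts
labels = torch.randint(0, 2, (16,))
loss = ce_with_entropy_minimization(logits, labels)
loss.backward()
print(float(loss))
```
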

20 pages, 4538 KiB  
Article
Image Captioning Method Based on CLIP-Combined Local Feature Enhancement and Multi-Scale Semantic Guidance
by Liang Wang, Mengxue Zhang, Meiqing Jiao, Enru Chen, Yuru Ma and Jun Wang
Electronics 2025, 14(14), 2809; https://doi.org/10.3390/electronics14142809 - 12 Jul 2025
Viewed by 201
Abstract
To address the issues of modeling the relationships between multiple local region objects in images and enhancing local region features, as well as mapping global image semantics to global text semantics and local region image semantics to local text semantics, a novel image captioning method based on CLIP and integrating local feature enhancement and multi-scale semantic guidance is proposed. The model employs ViT as the global visual encoder, Faster R-CNN as the local region visual encoder, BERT as the text encoder, and GPT-2 as the text decoder. By constructing a KNN graph of local image features, the model models the relationships between local region objects and then enhances the local region features using a graph attention network. Additionally, a multi-scale semantic guidance method is utilized to calculate the global and local semantic weights, thereby improving the accuracy of scene description and attribute detail description generated by the GPT-2 decoder. Evaluated on MSCOCO and Flickr30k datasets, the model achieves a significant improvement in the core metric CIDEr over established strong baselines, with 4.7% higher CIDEr than OFA on MSCOCO, and 16.6% higher CIDEr than Unified VLP on Flickr30k. Ablation studies and qualitative analysis validate the effectiveness of each proposed module. Full article
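
Building a KNN graph over local region features and enhancing them with graph attention, as the abstract outlines, can be sketched as follows; the single-head layer, cosine-similarity edges, and k = 4 are illustrative choices rather than the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def knn_adjacency(features, k=4):
    """Boolean adjacency over regions: each node keeps its k most similar neighbors."""
    sim = F.normalize(features, dim=-1) @ F.normalize(features, dim=-1).t()
    sim.fill_diagonal_(float("-inf"))               # no self-edges
    idx = sim.topk(k, dim=-1).indices
    adj = torch.zeros(sim.shape, dtype=torch.bool)
    adj[torch.arange(sim.size(0)).unsqueeze(1), idx] = True
    return adj | adj.t()                            # symmetrize

class GraphAttentionLayer(nn.Module):
    """Single-head graph attention: scores are masked to KNN edges before softmax."""
    def __init__(self, dim):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))

    def forward(self, x, adj):
        scores = self.q(x) @ self.k(x).t() / x.size(-1) ** 0.5
        scores = scores.masked_fill(~adj, float("-inf"))
        return F.softmax(scores, dim=-1) @ self.v(x) + x   # residual enhancement

regions = torch.randn(36, 512)          # e.g., 36 Faster R-CNN region features
adj = knn_adjacency(regions, k=4)
enhanced = GraphAttentionLayer(512)(regions, adj)
print(enhanced.shape)                   # torch.Size([36, 512])
```
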

13 pages, 1574 KiB  
Article
SnapStick: Merging AI and Accessibility to Enhance Navigation for Blind Users
by Shehzaib Shafique, Gian Luca Bailo, Silvia Zanchi, Mattia Barbieri, Walter Setti, Giulio Sciortino, Carlos Beltran, Alice De Luca, Alessio Del Bue and Monica Gori
Technologies 2025, 13(7), 297; https://doi.org/10.3390/technologies13070297 - 11 Jul 2025
Viewed by 200
Abstract
Navigational aids play a vital role in enhancing the mobility and independence of blind and visually impaired (VI) individuals. However, existing solutions often present challenges related to discomfort, complexity, and limited ability to provide detailed environmental awareness. To address these limitations, we introduce SnapStick, an innovative assistive technology designed to improve spatial perception and navigation. SnapStick integrates a Bluetooth-enabled smart cane, bone-conduction headphones, and a smartphone application powered by the Florence-2 Vision Language Model (VLM) to deliver real-time object recognition, text reading, bus route detection, and detailed scene descriptions. To assess the system’s effectiveness and user experience, eleven blind participants evaluated SnapStick, and usability was measured using the System Usability Scale (SUS). In addition to the 94% accuracy, the device received an SUS score of 84.7%, indicating high user satisfaction, ease of use, and comfort. Participants reported that SnapStick significantly improved their ability to navigate, recognize objects, identify text, and detect landmarks with greater confidence. The system’s ability to provide accurate and accessible auditory feedback proved essential for real-world applications, making it a practical and user-friendly solution. These findings highlight SnapStick’s potential to serve as an effective assistive device for blind individuals, enhancing autonomy, safety, and navigation capabilities in daily life. Future work will explore further refinements to optimize user experience and adaptability across different environments. Full article
(This article belongs to the Section Assistive Technologies)
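The reported usability figure uses the standard System Usability Scale, whose scoring is fixed by convention: odd items contribute (rating − 1), even items contribute (5 − rating), and the summed contributions are scaled by 2.5 onto a 0–100 range. The responses below are hypothetical, not the study's data.

```python
def sus_score(responses):
    """Standard SUS scoring for a single 10-item questionnaire (ratings 1-5)."""
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    contributions = [(r - 1) if i % 2 == 0 else (5 - r) for i, r in enumerate(responses)]
    return sum(contributions) * 2.5

# Hypothetical participant (1 = strongly disagree ... 5 = strongly agree).
example = [5, 1, 4, 2, 5, 1, 4, 1, 5, 2]
print(sus_score(example))  # 90.0
```
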

16 pages, 1610 KiB  
Article
Cascaded Dual-Inpainting Network for Scene Text
by Chunmei Liu
Appl. Sci. 2025, 15(14), 7742; https://doi.org/10.3390/app15147742 - 10 Jul 2025
Viewed by 115
Abstract
Scene text inpainting is a significant research challenge in visual text processing, with critical applications spanning incomplete traffic sign comprehension, degraded container-code recognition, occluded vehicle license plate processing, and other incomplete scene text processing systems. In this paper, a cascaded dual-inpainting network for scene text (CDINST) is proposed. The architecture integrates two scene text inpainting models to reconstruct the text foreground: the Structure Generation Module (SGM) and Structure Reconstruction Module (SRM). The SGM primarily performs preliminary foreground text reconstruction and extracts text structures. Building upon the SGM’s guidance, the SRM subsequently enhances the foreground structure reconstruction through structure-guided refinement. The experimental results demonstrate compelling performance on the benchmark dataset, showcasing both the effectiveness of the proposed dual-inpainting network and its accuracy in incomplete scene text recognition. The proposed network achieves an average recognition accuracy improvement of 11.94% compared to baseline methods for incomplete scene text recognition tasks. Full article
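
The cascade idea — a first stage producing a coarse foreground plus a structure map, and a second stage refining the foreground under that structural guidance — can be sketched with placeholder convolutional blocks. The channel counts and the sigmoid structure map are assumptions, not the SGM/SRM internals.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class CascadedInpainter(nn.Module):
    """Stage 1 (SGM-like): coarse text foreground + structure map from the masked input.
    Stage 2 (SRM-like): refinement conditioned on the stage-1 outputs."""
    def __init__(self):
        super().__init__()
        self.stage1 = conv_block(4, 32)                  # masked image (3) + mask (1)
        self.to_coarse = nn.Conv2d(32, 3, 1)
        self.to_structure = nn.Conv2d(32, 1, 1)
        self.stage2 = conv_block(3 + 3 + 1, 32)          # image + coarse + structure
        self.to_refined = nn.Conv2d(32, 3, 1)

    def forward(self, masked_image, mask):
        h1 = self.stage1(torch.cat([masked_image, mask], dim=1))
        coarse, structure = self.to_coarse(h1), torch.sigmoid(self.to_structure(h1))
        h2 = self.stage2(torch.cat([masked_image, coarse, structure], dim=1))
        return coarse, structure, self.to_refined(h2)

model = CascadedInpainter()
img = torch.rand(2, 3, 64, 128)                      # occluded scene-text crops
mask = (torch.rand(2, 1, 64, 128) > 0.7).float()     # 1 = missing pixels
coarse, structure, refined = model(img * (1 - mask), mask)
print(coarse.shape, structure.shape, refined.shape)
```
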

22 pages, 1661 KiB  
Article
UniText: A Unified Framework for Chinese Text Detection, Recognition, and Restoration in Ancient Document and Inscription Images
by Lu Shen, Zewei Wu, Xiaoyuan Huang, Boliang Zhang, Su-Kit Tang, Jorge Henriques and Silvia Mirri
Appl. Sci. 2025, 15(14), 7662; https://doi.org/10.3390/app15147662 - 8 Jul 2025
Viewed by 260
Abstract
Processing ancient text images presents significant challenges due to severe visual degradation, missing glyph structures, and various types of noise caused by aging. These issues are particularly prominent in Chinese historical documents and stone inscriptions, where diverse writing styles, multi-angle capturing, uneven lighting, and low contrast further hinder the performance of traditional OCR techniques. In this paper, we propose a unified neural framework, UniText, for the detection, recognition, and glyph restoration of Chinese characters in images of historical documents and inscriptions. UniText operates at the character level and processes full-page inputs, making it robust to multi-scale, multi-oriented, and noise-corrupted text. The model adopts a multi-task architecture that integrates spatial localization, semantic recognition, and visual restoration through stroke-aware supervision and multi-scale feature aggregation. Experimental results on our curated dataset of ancient Chinese texts demonstrate that UniText achieves a competitive performance in detection and recognition while producing visually faithful restorations under challenging conditions. This work provides a technically scalable and generalizable framework for image-based document analysis, with potential applications in historical document processing, digital archiving, and broader tasks in text image understanding. Full article
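
A multi-task objective joining detection, recognition, and restoration heads — in the spirit of the architecture described, though the individual loss functions and weights here are assumptions — could look like this:

```python
import torch
import torch.nn.functional as F

def multitask_loss(outputs, targets, w_det=1.0, w_rec=1.0, w_res=1.0):
    """Weighted sum of a character-localization loss, a recognition loss,
    and a pixel-level glyph-restoration loss."""
    det = F.smooth_l1_loss(outputs["boxes"], targets["boxes"])           # box regression
    rec = F.cross_entropy(outputs["char_logits"], targets["char_ids"])   # character classes
    res = F.l1_loss(outputs["restored"], targets["clean_glyphs"])        # restored glyphs
    return w_det * det + w_rec * rec + w_res * res

outputs = {
    "boxes": torch.randn(20, 4, requires_grad=True),          # 20 character boxes
    "char_logits": torch.randn(20, 500, requires_grad=True),  # toy 500-character charset
    "restored": torch.rand(20, 1, 32, 32, requires_grad=True),
}
targets = {
    "boxes": torch.randn(20, 4),
    "char_ids": torch.randint(0, 500, (20,)),
    "clean_glyphs": torch.rand(20, 1, 32, 32),
}
loss = multitask_loss(outputs, targets)
loss.backward()
print(float(loss))
```
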

44 pages, 1067 KiB  
Review
Toward Adaptive and Immune-Inspired Viable Supply Chains: A PRISMA Systematic Review of Mathematical Modeling Trends
by Andrés Polo, Daniel Morillo-Torres and John Willmer Escobar
Mathematics 2025, 13(14), 2225; https://doi.org/10.3390/math13142225 - 8 Jul 2025
Viewed by 411
Abstract
This study presents a systematic literature review on the mathematical modeling of resilient and viable supply chains, grounded in the PRISMA methodology and applied to a curated corpus of 235 peer-reviewed scientific articles published between 2011 and 2025. The search strategy was implemented across four major academic databases (Scopus and Web of Science) using Boolean operators to capture intersections among the core concepts of supply chains, resilience, viability, and advanced optimization techniques. The screening process involved a double manual assessment of titles, abstracts, and full texts, based on inclusion criteria centered on the presence of formal mathematical models, computational approaches, and thematic relevance. As a result of the selection process, six thematic categories were identified, clustering the literature according to their analytical objectives and methodological approaches: viability-oriented modeling, resilient supply chain optimization, agile and digitally enabled supply chains, logistics optimization and network configuration, uncertainty modeling, and immune system-inspired approaches. These categories were validated through a bibliometric analysis and a thematic map that visually represents the density and centrality of core research topics. Descriptive analysis revealed a significant increase in scientific output starting in 2020, driven by post-pandemic concerns and the accelerated digitalization of logistics operations. At the methodological level, a high degree of diversity in modeling techniques was observed, with an emphasis on mixed-integer linear programming (MILP), robust optimization, multi-objective modeling, and the increasing use of bio-inspired algorithms, artificial intelligence, and simulation frameworks. The results confirm a paradigm shift toward integrative frameworks that combine robustness, adaptability, and Industry 4.0 technologies, as well as a growing interest in biological metaphors applied to resilient system design. Finally, the review identifies research gaps related to the formal integration of viability under disruptive scenarios, the operationalization of immune-inspired models in logistics environments, and the need for hybrid approaches that jointly address resilience, agility, and sustainability. Full article
(This article belongs to the Section D2: Operations Research and Fuzzy Decision Making)

20 pages, 4752 KiB  
Article
Designing an AI-Supported Framework for Literary Text Adaptation in Primary Classrooms
by Savvas A. Chatzichristofis, Alexandros Tsopozidis, Avgousta Kyriakidou-Zacharoudiou, Salomi Evripidou and Angelos Amanatiadis
AI 2025, 6(7), 150; https://doi.org/10.3390/ai6070150 - 8 Jul 2025
Viewed by 350
Abstract
Background/Objectives: This paper introduces a pedagogically grounded framework for transforming canonical literary texts in primary education through generative AI. Guided by multiliteracies theory, Vygotskian pedagogy, and epistemic justice, the system aims to enhance interpretive literacy, developmental alignment, and cultural responsiveness among learners aged 7–12. Methods: The proposed system enables educators to perform age-specific text simplification, visual re-narration, lexical reinvention, and multilingual augmentation through a suite of modular tools. Central to the design is the Ethical–Pedagogical Validation Layer (EPVL), a GPT-powered auditing module that evaluates AI-generated content across four normative dimensions: developmental appropriateness, cultural sensitivity, semantic fidelity, and ethical transparency. Results: The framework was fully implemented and piloted with primary educators (N = 8). The pilot demonstrated high usability, curricular alignment, and perceived value for classroom application. Unlike commercial Large Language Models (LLMs), the system requires no prompt engineering and supports editable, policy-aligned controls for normative localization. Conclusions: By embedding ethical evaluation within the generative loop, the framework fosters calibrated trust in human–AI collaboration and mitigates cultural stereotyping and ideological distortion. It advances a scalable, inclusive model for educator-centered AI integration, offering a new pathway for explainable and developmentally appropriate AI use in literary education. Full article
(This article belongs to the Special Issue AI Bias in the Media and Beyond)

22 pages, 7735 KiB  
Article
Visual Perception of Peripheral Screen Elements: The Impact of Text and Background Colors
by Snježana Ivančić Valenko, Marko Čačić, Ivana Žiljak Stanimirović and Anja Zorko
Appl. Sci. 2025, 15(14), 7636; https://doi.org/10.3390/app15147636 - 8 Jul 2025
Viewed by 231
Abstract
Visual perception of screen elements depends on their color, font, and position in the user interface design. Objects in the central part of the screen are perceived more easily than those in the peripheral areas. However, the peripheral space is valuable for applications like advertising and promotion and should not be overlooked. Optimizing the design of elements in this area can improve user attention to peripheral visual stimuli during focused tasks. This study aims to evaluate how different combinations of text and background color affect the visibility of moving textual stimuli in the peripheral areas of the screen, while attention is focused on a central task. This study investigates how background color, combined with white or black text, affects the attention of participants. It also identifies which background color makes a specific word most noticeable in the peripheral part of the screen. We designed quizzes to present stimuli with black or white text on various background colors in the peripheral regions of the screen. The background colors tested were blue, red, yellow, green, white, and black. While saturation and brightness were kept constant, the color tone was varied. Among ten combinations of background and text color, we aimed to determine the most noticeable combination in the peripheral part of the screen. The combination of white text on a blue background resulted in the shortest detection time (1.376 s), while black text on a white background achieved the highest accuracy rate at 79%. The results offer valuable insights for improving peripheral text visibility in user interfaces across various visual communication domains such as video games, television content, and websites, where peripheral information must remain noticeable despite centrally focused user attention and complex viewing conditions. Full article

26 pages, 1804 KiB  
Article
Dependency-Aware Entity–Attribute Relationship Learning for Text-Based Person Search
by Wei Xia, Wenguang Gan and Xinpan Yuan
Big Data Cogn. Comput. 2025, 9(7), 182; https://doi.org/10.3390/bdcc9070182 - 7 Jul 2025
Viewed by 283
Abstract
Text-based person search (TPS), a critical technology for security and surveillance, aims to retrieve target individuals from image galleries using textual descriptions. The existing methods face two challenges: (1) ambiguous attribute–noun association (AANA), where syntactic ambiguities lead to incorrect associations between attributes and the intended nouns; and (2) textual noise and relevance imbalance (TNRI), where irrelevant or non-discriminative tokens (e.g., ‘wearing’) reduce the saliency of critical visual attributes in the textual description. To address these aspects, we propose the dependency-aware entity–attribute alignment network (DEAAN), a novel framework that explicitly tackles AANA through dependency-guided attention and TNRI via adaptive token filtering. The DEAAN introduces two modules: (1) dependency-assisted implicit reasoning (DAIR) to resolve AANA through syntactic parsing, and (2) relevance-adaptive token selection (RATS) to suppress TNRI by learning token saliency. Experiments on CUHK-PEDES, ICFG-PEDES, and RSTPReid demonstrate state-of-the-art performance, with the DEAAN achieving a Rank-1 accuracy of 76.71% and an mAP of 69.07% on CUHK-PEDES, surpassing RDE by 0.77% in Rank-1 and 1.51% in mAP. Ablation studies reveal that DAIR and RATS individually improve Rank-1 by 2.54% and 3.42%, while their combination elevates the performance by 6.35%, validating their synergy. This work bridges structured linguistic analysis with adaptive feature selection, demonstrating practical robustness in surveillance-oriented TPS scenarios. Full article
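
The two ideas named in the abstract — resolving attribute–noun associations via dependency parsing and filtering low-saliency tokens — can be illustrated with spaCy (assuming the en_core_web_sm model is installed). The hand-made stop-list is a stand-in for the learned RATS module, and the dependency rules are a simplification of DAIR.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

LOW_SALIENCY = {"wearing", "wear", "person", "is", "a", "the", "with", "and"}

def attribute_noun_pairs(description):
    """Resolve attribute->noun associations from the dependency tree and
    drop tokens unlikely to discriminate between pedestrians."""
    doc = nlp(description)
    pairs = [(tok.text, tok.head.text) for tok in doc
             if tok.dep_ in ("amod", "compound") and tok.head.pos_ == "NOUN"]
    kept = [tok.text for tok in doc
            if tok.is_alpha and tok.text.lower() not in LOW_SALIENCY]
    return pairs, kept

pairs, kept = attribute_noun_pairs(
    "A young woman wearing a red backpack and blue jeans is walking.")
print(pairs)  # e.g., [('young', 'woman'), ('red', 'backpack'), ('blue', 'jeans')]
print(kept)
```
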

40 pages, 2828 KiB  
Review
Generative Artificial Intelligence in Healthcare: Applications, Implementation Challenges, and Future Directions
by Syed Arman Rabbani, Mohamed El-Tanani, Shrestha Sharma, Syed Salman Rabbani, Yahia El-Tanani, Rakesh Kumar and Manita Saini
BioMedInformatics 2025, 5(3), 37; https://doi.org/10.3390/biomedinformatics5030037 - 7 Jul 2025
Viewed by 874
Abstract
Generative artificial intelligence (AI) is rapidly transforming healthcare systems since the advent of OpenAI in 2022. It encompasses a class of machine learning techniques designed to create new content and is classified into large language models (LLMs) for text generation and image-generating models for creating or enhancing visual data. These generative AI models have shown widespread applications in clinical practice and research. Such applications range from medical documentation and diagnostics to patient communication and drug discovery. These models are capable of generating text messages, answering clinical questions, interpreting CT scan and MRI images, assisting in rare diagnoses, discovering new molecules, and providing medical education and training. Early studies have indicated that generative AI models can improve efficiency, reduce administrative burdens, and enhance patient engagement, although most findings are preliminary and require rigorous validation. However, the technology also raises serious concerns around accuracy, bias, privacy, ethical use, and clinical safety. Regulatory bodies, including the FDA and EMA, are beginning to define governance frameworks, while academic institutions and healthcare organizations emphasize the need for transparency, supervision, and evidence-based implementation. Generative AI is not a replacement for medical professionals but a potential partner—augmenting decision-making, streamlining communication, and supporting personalized care. Its responsible integration into healthcare could mark a paradigm shift toward more proactive, precise, and patient-centered systems. Full article