Search Results (47)

Search Parameters:
Keywords = blip

28 pages, 32292 KB  
Article
Contextual Feature Fusion-Based Keyframe Selection Using Semantic Attention and Diversity-Aware Optimization for Video Summarization
by Chitrakala S and Aparyay Kumar
Symmetry 2025, 17(10), 1737; https://doi.org/10.3390/sym17101737 - 15 Oct 2025
Viewed by 372
Abstract
Training-free video summarization tackles the challenge of selecting the most informative keyframes from a video without relying on costly training or complex deep models. This work introduces C2FVS-DPP (Contextual Feature Fusion Video Summarization with Determinantal Point Process), a lightweight framework that generates concise video summaries by jointly modeling semantic importance, visual diversity, temporal structure, and symmetry. The design centers on a symmetry-aware fusion strategy, where appearance, motion, and semantic cues are aligned in a unified embedding space, and on a reward-guided optimization logic that balances representativeness and diversity. Specifically, appearance features from ResNet-50, motion cues from optical flow, and semantic representations from BERT-encoded BLIP captions are fused into a contextual embedding. A Transformer encoder assigns importance scores, followed by shot boundary detection and K-Medoids clustering to identify candidate keyframes. These candidates are refined through a reward-based re-ranking mechanism that integrates semantic relevance, representativeness, and visual uniqueness, while a Determinantal Point Process (DPP) enforces globally diverse selection under a keyframe budget. To enable reliable evaluation, enhanced versions of the SumMe and TVSum50 datasets were curated to reduce redundancy and increase semantic density. On these curated benchmarks, C2FVS-DPP achieves F1-scores of 0.22 and 0.43 and fidelity scores of 0.16 and 0.40 on SumMe and TVSum50, respectively, surpassing existing models on both metrics. In terms of compression ratio, the framework records 0.9959 on SumMe and 0.9940 on TVSum50, remaining highly competitive with the best-reported values of 0.9981 and 0.9983. These results highlight the strength of C2FVS-DPP as an inference-driven, symmetry-aware, and resource-efficient solution for video summarization. Full article
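As an illustration of the diversity-aware selection step described above, the sketch below builds a quality-weighted, DPP-style kernel over fused frame embeddings and greedily picks a fixed keyframe budget. The random features, quality scores, and greedy MAP loop are placeholders for illustration, not the authors' implementation.

```python
# Minimal sketch: diversity-aware keyframe selection over fused frame features.
# Placeholder inputs; quality scores and feature fusion are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_frames, dim = 200, 128
features = rng.normal(size=(n_frames, dim))          # stand-in for fused appearance/motion/semantic embeddings
features /= np.linalg.norm(features, axis=1, keepdims=True)
quality = rng.uniform(0.1, 1.0, size=n_frames)       # stand-in for Transformer importance / reward scores

# DPP-style kernel: L[i, j] = q_i * sim(i, j) * q_j
L = quality[:, None] * (features @ features.T) * quality[None, :]

def greedy_dpp(L, budget):
    """Greedy MAP approximation: repeatedly add the frame with the largest log-det gain."""
    selected = []
    for _ in range(budget):
        best, best_gain = None, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            gain = logdet if sign > 0 else -np.inf
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
    return sorted(selected)

keyframes = greedy_dpp(L, budget=10)
print("selected keyframe indices:", keyframes)
```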

19 pages, 19843 KB  
Article
Distinguishing Human- and AI-Generated Image Descriptions Using CLIP Similarity and Transformer-Based Classification
by Daniela Onita, Matei-Vasile Căpîlnaș and Adriana Baciu (Birlutiu)
Mathematics 2025, 13(19), 3228; https://doi.org/10.3390/math13193228 - 9 Oct 2025
Viewed by 458
Abstract
Recent advances in vision-language models such as BLIP-2 have made AI-generated image descriptions increasingly fluent and difficult to distinguish from human-authored texts. This paper investigates whether such differences can still be reliably detected by introducing a novel bilingual dataset of English and Romanian captions. The English subset was derived from the T4SA dataset, while AI-generated captions were produced with BLIP-2 and translated into Romanian using MarianMT; human-written Romanian captions were collected via manual annotation. We analyze the problem from two perspectives: (i) semantic alignment, using CLIP similarity, and (ii) supervised classification with both traditional and transformer-based models. Our results show that BERT achieves over 95% cross-validation accuracy (F1 = 0.95, ROC AUC = 0.99) in distinguishing AI from human texts, while simpler classifiers such as Logistic Regression also reach competitive scores (F1 ≈ 0.88). Beyond classification, semantic and linguistic analyses reveal systematic cross-lingual differences: English captions are significantly longer and more verbose, whereas Romanian texts—often more concise—exhibit higher alignment with visual content. Romanian was chosen as a representative low-resource language, where studying such differences provides insights into multilingual AI detection and challenges in vision-language modeling. These findings emphasize the novelty of our contribution: a publicly available bilingual dataset and the first systematic comparison of human vs. AI-generated captions in both high- and low-resource languages. Full article
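The semantic-alignment analysis can be approximated with an off-the-shelf CLIP checkpoint from Hugging Face, as in the sketch below; the image path, caption strings, and checkpoint choice are placeholders rather than the paper's evaluation setup.

```python
# Sketch: CLIP-based image-caption alignment score (cosine similarity of embeddings).
# "photo.jpg" and the two captions are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")
captions = ["a human-written caption", "an AI-generated caption"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
print("cosine similarity per caption:", (txt @ img.T).squeeze(-1).tolist())
```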

29 pages, 7882 KB  
Article
From Concept to Representation: Modeling Driving Capability and Task Demand with a Multimodal Large Language Model
by Haoran Zhou, Alexander Carballo, Keisuke Fujii and Kazuya Takeda
Sensors 2025, 25(18), 5805; https://doi.org/10.3390/s25185805 - 17 Sep 2025
Viewed by 562
Abstract
Driving safety hinges on the dynamic interplay between task demand and driving capability, yet these concepts lack a unified, quantifiable formulation. In this work, we present a framework based on a multimodal large language model that transforms heterogeneous driving signals—scene images, maneuver descriptions, control inputs, and surrounding traffic states—into low-dimensional embeddings of task demand and driving capability. By projecting both embeddings into a shared latent space, the framework yields an interpretable measurement of task difficulty that flags capability shortfalls before unsafe behavior arises. Built upon a customized BLIP-2 backbone and fine-tuned on diverse simulated driving scenarios, the model respects consistency within tasks, captures impairment-related capability degradation, and can transfer to real-world motorway data without additional training. These findings endorse the framework as a concise yet effective step toward proactive, explainable risk assessment in intelligent vehicles. Full article
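A toy version of the shared-latent-space idea is sketched below: two randomly initialized projection heads stand in for the fine-tuned BLIP-2-based encoders, and task difficulty is scored as the gap between the normalized embeddings. The dimensions, heads, and threshold are assumptions for illustration only.

```python
# Toy illustration: project task-demand and capability embeddings into a shared latent
# space and score task difficulty as their gap. All weights are random stand-ins for the
# fine-tuned BLIP-2-based encoders described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim = 32
demand_head = nn.Linear(768, latent_dim)       # projection for task-demand features
capability_head = nn.Linear(768, latent_dim)   # projection for driving-capability features

demand_feat = torch.randn(1, 768)              # stand-in for scene/maneuver/traffic embedding
capability_feat = torch.randn(1, 768)          # stand-in for control-input/driver-state embedding

z_d = F.normalize(demand_head(demand_feat), dim=-1)
z_c = F.normalize(capability_head(capability_feat), dim=-1)

# Higher score = demand further ahead of capability; the 1.0 threshold is arbitrary.
difficulty = 1.0 - torch.cosine_similarity(z_d, z_c).item()
print(f"difficulty score: {difficulty:.3f}", "-> warn" if difficulty > 1.0 else "-> ok")
```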

18 pages, 808 KB  
Article
Towards AI-Based Strep Throat Detection and Interpretation for Remote Australian Indigenous Communities
by Prasanna Asokan, Thanh Thu Truong, Duc Son Pham, Kit Yan Chan, Susannah Soon, Andrew Maiorana and Cate Hollingsworth
Sensors 2025, 25(18), 5636; https://doi.org/10.3390/s25185636 - 10 Sep 2025
Viewed by 510
Abstract
Streptococcus pharyngitis (strep throat) poses a significant health challenge in rural and remote Indigenous communities in Australia, where access to medical resources is limited. Delays in diagnosis and treatment increase the risk of serious complications, including acute rheumatic fever and rheumatic heart disease. This paper presents a proof-of-concept AI-based diagnostic model designed to support clinicians in underserved communities. The model combines a lightweight Swin Transformer–based image classifier with a BLIP-2-based explainable image annotation system. The classifier predicts strep throat from throat images, while the explainable model enhances transparency by identifying key clinical features such as tonsillar swelling, erythema, and exudate, with synthetic labels generated using GPT-4o-mini. The classifier achieves 97.1% accuracy and an ROC-AUC of 0.993, with an inference time of 13.8 ms and a model size of 28 million parameters; these results demonstrate suitability for deployment in resource-constrained settings. As a proof-of-concept, this work illustrates the potential of AI-assisted diagnostics to improve healthcare access and could benefit similar research efforts that support clinical decision-making in remote and underserved regions. Full article
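The classification stage could be prototyped along the lines below with a small Swin Transformer from timm; the model variant, ImageNet weights, and image path are placeholders standing in for the authors' trained 28M-parameter classifier.

```python
# Sketch: lightweight Swin Transformer binary classifier for throat images via timm.
# Model variant, pretrained weights, and image path are placeholders.
import timm
import torch
from PIL import Image

model = timm.create_model("swin_tiny_patch4_window7_224", pretrained=True, num_classes=2)
model.eval()

cfg = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**cfg)

image = Image.open("throat.jpg").convert("RGB")   # placeholder path
with torch.no_grad():
    probs = torch.softmax(model(transform(image).unsqueeze(0)), dim=-1)[0]
print({"no_strep": float(probs[0]), "strep": float(probs[1])})
```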

24 pages, 1747 KB  
Article
HortiVQA-PP: Multitask Framework for Pest Segmentation and Visual Question Answering in Horticulture
by Zhongxu Li, Chenxi Du, Shengrong Li, Yaqi Jiang, Linwan Zhang, Changhao Ju, Fansen Yue and Min Dong
Horticulturae 2025, 11(9), 1009; https://doi.org/10.3390/horticulturae11091009 - 25 Aug 2025
Viewed by 1047
Abstract
A multimodal interactive system, HortiVQA-PP, is proposed for horticultural scenarios, with the aim of achieving precise identification of pests and their natural predators, modeling ecological co-occurrence relationships, and providing intelligent question-answering services tailored to agricultural users. The system integrates three core modules: semantic segmentation, pest–predator co-occurrence detection, and knowledge-enhanced visual question answering. A multimodal dataset comprising 30 pest categories and 10 predator categories has been constructed, encompassing annotated images and corresponding question–answer pairs. In the semantic segmentation task, HortiVQA-PP outperformed existing models across all five evaluation metrics, achieving a precision of 89.6%, recall of 85.2%, F1-score of 87.3%, mAP@50 of 82.4%, and IoU of 75.1%, representing an average improvement of approximately 4.1% over the Segment Anything model. For the pest–predator co-occurrence matching task, the model attained a multi-label accuracy of 83.5%, a reduced Hamming Loss of 0.063, and a macro-F1 score of 79.4%, significantly surpassing methods such as ASL and ML-GCN, thereby demonstrating robust structural modeling capability. In the visual question answering task, the incorporation of a horticulture-specific knowledge graph enhanced the model’s reasoning ability. The system achieved 48.7% in BLEU-4, 54.8% in ROUGE-L, 43.3% in METEOR, 36.9% in exact match (EM), and a GPT expert score of 4.5, outperforming mainstream models including BLIP-2, Flamingo, and MiniGPT-4 across all metrics. Experimental results indicate that HortiVQA-PP exhibits strong recognition and interaction capabilities in complex pest scenarios, offering a high-precision, interpretable, and widely applicable artificial intelligence solution for digital horticulture. Full article
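The co-occurrence metrics quoted above (multi-label accuracy, Hamming loss, macro-F1) can be computed with scikit-learn as in the sketch below on toy label matrices; the label layout is illustrative, and the paper's "multi-label accuracy" is assumed here to mean subset accuracy.

```python
# Sketch: multi-label co-occurrence metrics on toy pest/predator label matrices.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

# Rows = images, columns = pest/predator classes (toy data, 4 classes).
y_true = np.array([[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 1]])

print("subset (multi-label) accuracy:", accuracy_score(y_true, y_pred))
print("Hamming loss:", hamming_loss(y_true, y_pred))
print("macro-F1:", f1_score(y_true, y_pred, average="macro"))
```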

27 pages, 5654 KB  
Article
Intelligent Detection and Description of Foreign Object Debris on Airport Pavements via Enhanced YOLOv7 and GPT-Based Prompt Engineering
by Hanglin Cheng, Ruoxi Zhang, Ruiheng Zhang, Yihao Li, Yang Lei and Weiguang Zhang
Sensors 2025, 25(16), 5116; https://doi.org/10.3390/s25165116 - 18 Aug 2025
Viewed by 912
Abstract
Foreign Object Debris (FOD) on airport pavements poses a serious threat to aviation safety, making accurate detection and interpretable scene understanding crucial for operational risk management. This paper presents an integrated multi-modal framework that combines an enhanced YOLOv7-X detector, a cascaded YOLO-SAM segmentation module, and a structured prompt engineering mechanism to generate detailed semantic descriptions of detected FOD. Detection performance is improved through the integration of Coordinate Attention, Spatial–Depth Conversion (SPD-Conv), and a Gaussian Similarity IoU (GSIoU) loss, leading to a 3.9% gain in mAP@0.5 for small objects with only a 1.7% increase in inference latency. The YOLO-SAM cascade leverages high-quality masks to guide structured prompt generation, which incorporates spatial encoding, material attributes, and operational risk cues, resulting in a substantial improvement in description accuracy from 76.0% to 91.3%. Extensive experiments on a dataset of 12,000 real airport images demonstrate competitive detection and segmentation performance compared to recent CNN- and transformer-based baselines while achieving robust semantic generalization in challenging scenarios, such as complete darkness, low-light, high-glare nighttime conditions, and rainy weather. A runtime breakdown shows that the enhanced YOLOv7-X requires 40.2 ms per image, SAM segmentation takes 142.5 ms, structured prompt construction adds 23.5 ms, and BLIP-2 description generation requires 178.6 ms, resulting in an end-to-end latency of 384.8 ms per image. Although this does not meet strict real-time video requirements, it is suitable for semi-real-time or edge-assisted asynchronous deployment, where detection robustness and semantic interpretability are prioritized over ultra-low latency. The proposed framework offers a practical, deployable solution for airport FOD monitoring, combining high-precision detection with context-aware description generation to support intelligent runway inspection and maintenance decision-making. Full article
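A minimal sketch of the structured-prompt idea is shown below: detection and segmentation outputs are folded into a text prompt that a captioning model such as BLIP-2 would then expand into a description. The field names and template wording are hypothetical, not the paper's schema.

```python
# Sketch of structured prompt construction from detection/segmentation outputs.
# Field names and template are illustrative; the paper's exact schema is not reproduced.
def build_fod_prompt(detection: dict) -> str:
    spatial = (f"located in the {detection['zone']} of the runway image, "
               f"bounding box {detection['bbox']}, mask area {detection['mask_area_px']} px")
    return (
        f"Describe the foreign object debris for a runway inspection report. "
        f"Object class: {detection['cls']}. Material: {detection['material']}. "
        f"Position: {spatial}. Lighting: {detection['lighting']}. "
        f"State the potential operational risk to aircraft."
    )

example = {
    "cls": "metal bolt", "material": "steel", "zone": "touchdown zone",
    "bbox": [412, 230, 441, 262], "mask_area_px": 780, "lighting": "low-light",
}
print(build_fod_prompt(example))  # this text would then be passed to the description-generation stage
```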
(This article belongs to the Special Issue AI and Smart Sensors for Intelligent Transportation Systems)

25 pages, 3906 KB  
Article
When Pixels Speak Louder: Unravelling the Synergy of Text–Image Integration in Multimodal Review Helpfulness
by Chao Ma, Chen Yang and Ying Yu
J. Theor. Appl. Electron. Commer. Res. 2025, 20(2), 144; https://doi.org/10.3390/jtaer20020144 - 12 Jun 2025
Cited by 1 | Viewed by 1683
Abstract
Images carry rich visual semantic information, and consumers tend to view image-bearing multimodal online reviews first. Research on the helpfulness of reviews on e-commerce platforms mainly focuses on text, lacking insights into the product attributes reflected by review images and the relationship between images and text. Studying the relationship between images and text in online reviews can better explain consumer behavior and help consumers make purchasing decisions. Taking multimodal online review data from shopping platforms as the research object, this study proposes a research framework based on the Cognitive Theory of Multimedia Learning (CTML). It utilizes multiple pre-trained models, such as BLIP-2, together with machine learning methods to construct metrics. A fuzzy-set qualitative comparative analysis (fsQCA) is conducted to explore the configurational effects of antecedent variables of multimodal online reviews on review helpfulness. The study identifies five configurational paths that lead to high review helpfulness. Specific review cases are used to examine the contribution paths of these configurations to perceived helpfulness, providing a new perspective for future research on multimodal online reviews. Targeted recommendations are made for operators and merchants based on the research findings, offering theoretical support for platforms to fully leverage the potential value of user-generated content. Full article
(This article belongs to the Topic Digital Marketing Dynamics: From Browsing to Buying)

8 pages, 1840 KB  
Proceeding Paper
Image Descriptions for Visually Impaired Individuals to Locate Restroom Facilities
by Cheng-Si He, Nan-Kai Lo, Yu-Huan Chien and Siao-Si Lin
Eng. Proc. 2025, 92(1), 13; https://doi.org/10.3390/engproc2025092013 - 25 Apr 2025
Cited by 1 | Viewed by 485
Abstract
Since visually impaired individuals cannot observe their surroundings, they face challenges in accurately locating objects. Particularly in restrooms, where various facilities are spread across a limited space, the risk of tripping and being injured significantly increases. To prevent such accidents, individuals with visual impairments need help to navigate these facilities. Therefore, we designed a head-mounted assistive device that uses artificial intelligence (AI). An ESP32-CAM was used to capture and transmit images to a computer. The images were then converted into a model-compatible format for the bootstrapping language-image pre-training (BLIP) model to process and generate English descriptions (i.e., written captions). Then, Google Text-to-Speech (gTTS) was employed to convert these descriptions into speech, which was delivered audibly through a speaker. The SacreBLEU and MOS scores indicated that the developed device produced relatively accurate, natural, and intelligible spoken directions. The device assists visually impaired individuals in navigating and locating restroom facilities to a satisfactory level. Full article
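The caption-then-speak pipeline can be sketched with the public BLIP captioning checkpoint and gTTS, as below; the image path and checkpoint are placeholders standing in for the frames streamed from the ESP32-CAM and the authors' configuration.

```python
# Sketch of the caption-then-speak pipeline: BLIP generates an English description of a
# restroom image, and gTTS converts it to audio. Paths and checkpoint are placeholders.
from PIL import Image
from gtts import gTTS
from transformers import BlipForConditionalGeneration, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("esp32_frame.jpg").convert("RGB")   # frame received from the camera
inputs = processor(images=image, return_tensors="pt")
caption = processor.decode(model.generate(**inputs, max_new_tokens=30)[0], skip_special_tokens=True)

gTTS(text=caption, lang="en").save("description.mp3")  # played back through the speaker
print(caption)
```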
(This article belongs to the Proceedings of 2024 IEEE 6th Eurasia Conference on IoT, Communication and Engineering)

9 pages, 515 KB  
Article
The Effect of SARS-CoV-2 Vaccination on HIV Viral Load in Patients Under Bictegravir/Tenofovir Alafenamide/Emtricitabine Therapy: A Retrospective Observational Study
by Giuseppe Pipitone, Giacomo Ciusa, Stefano Agrenzano, Francesco Di Lorenzo, Caterina Sagnelli, Antonio Cascio, Chiara Iaria and the BICivico Study Group
Healthcare 2025, 13(8), 926; https://doi.org/10.3390/healthcare13080926 - 17 Apr 2025
Viewed by 794
Abstract
Background: The aim of our study is to evaluate the impact of SARS-CoV-2 vaccination on HIV viremia in patients treated with bictegravir-based therapy. Methods: We conducted a retrospective observational study in a tertiary hospital, analyzing data from 152 patients treated with BIC/TAF/FTC between 2020 and 2022. Patients were divided into two groups: “vaccinated” (110/152) and “unvaccinated” (42/152) against SARS-CoV-2. The outcomes considered were the presence of “blips” (detectable viremia ≥ 20 copies/mL), “rebound” (viremia ≥ 50 copies/mL), and virological failures. Results: The vaccinated group showed a lower incidence of blips than the unvaccinated group (9.1% vs. 28.6%, p = 0.002) and a reduced risk of blips (OR 3.8, 95% CI 1.4–9.8). The rebound rate was also significantly lower in the vaccinated group than in the unvaccinated group (2.7% vs. 11.9%, p = 0.037). Conclusions: Our data suggest that SARS-CoV-2 vaccination may stimulate an immune response that enhances CD4+ and CD8+ cell function, contributing to a reduction in the number of blips and maintaining good viro-immunological control in patients with HIV, supporting the importance of vaccination in this population. Full article
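For readers who want to see how such a group comparison is computed, the sketch below runs Fisher's exact test on a 2×2 table reconstructed approximately from the abstract's percentages (roughly 10/110 vaccinated and 12/42 unvaccinated patients with blips). The exact counts are not given in the abstract, so this illustrates only the calculation, not the study's analysis.

```python
# Illustration of the 2x2 odds-ratio calculation behind the reported blip comparison.
# Counts are reconstructed approximately from the abstract's percentages (9.1% of 110,
# 28.6% of 42) and are NOT the study's source data.
from scipy.stats import fisher_exact

table = [[12, 30],    # unvaccinated: blips, no blips (approx. 28.6% of 42)
         [10, 100]]   # vaccinated:   blips, no blips (approx. 9.1% of 110)

odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio ~ {odds_ratio:.1f}, Fisher exact p = {p_value:.3f}")  # abstract reports OR 3.8
```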

16 pages, 4281 KB  
Article
Analysis of Operational Effects of Bus Lanes with Intermittent Priority with Spatio-Temporal Clear Distance and CAV Platoon Coordinated Lane Changing in Intelligent Transportation Environment
by Pei Jiang, Xinlu Ma and Yibo Li
Sensors 2025, 25(8), 2538; https://doi.org/10.3390/s25082538 - 17 Apr 2025
Viewed by 678
Abstract
Bus lanes with intermittent priority (BLIP) are designed to optimize road resource allocation. The advent of connected and automated vehicles (CAVs) promotes the implementation of BLIP. However, it is crucial to find an effective method to intermittently grant right-of-way to CAVs. In this paper, we introduce a BLIP method with spatio-temporal clear distance (BLIP-ST) and a CAV control method in an intelligent transportation environment. When CAVs enter BLIP-ST, the constraints of the moving gap between buses are considered. When CAVs leave BLIP-ST, coordination with the nearest CAV platoon on the adjacent lane is considered to cope with situations where CAVs cannot find an appropriate space. The proposed method was then simulated with an open-boundary cellular automaton model. The results showed that, for the same inflow, a bus lane shared with CAVs could significantly improve road traffic efficiency, with the largest gain at a medium CAV penetration rate, where the average road speed increased from 6.67 km/h to 30.53 km/h. BLIP-ST also operated most efficiently across the different strategies at a medium penetration rate: when the penetration rate is too high, BLIP-ST is excessively occupied, which undermines public transportation priority, and when it is too low, BLIP-ST cannot be fully utilized. In addition, regardless of the CAV penetration rate, CAV platoon coordinated lane changing improves road traffic efficiency more than single-CAV coordinated lane changing and can increase the average road speed by 8–19%. Full article
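The simulation style referenced above can be illustrated with a toy open-boundary, single-lane cellular automaton in the Nagel–Schreckenberg spirit; it is far simpler than the multi-lane BLIP-ST model with CAV platoon coordination, and all parameters are arbitrary.

```python
# Toy single-lane, open-boundary cellular automaton (Nagel-Schreckenberg style).
# Far simpler than the paper's multi-lane BLIP-ST model; parameters are illustrative.
import random

L, VMAX, P_SLOW, P_IN, STEPS = 200, 5, 0.2, 0.4, 500
road = [-1] * L                      # -1 = empty cell, otherwise the vehicle's speed

random.seed(1)
speeds_sum, vehicles_sum = 0, 0
for _ in range(STEPS):
    # open left boundary: inject a vehicle with probability P_IN if the entry cell is free
    if road[0] == -1 and random.random() < P_IN:
        road[0] = VMAX
    new_road = [-1] * L
    for i, v in enumerate(road):
        if v == -1:
            continue
        gap = next((d - 1 for d in range(1, VMAX + 2) if i + d < L and road[i + d] != -1), VMAX + 1)
        v = min(v + 1, VMAX, gap)                       # accelerate, but keep a safe gap
        if v > 0 and random.random() < P_SLOW:          # random slowdown
            v -= 1
        if i + v < L:                                   # open right boundary: vehicles leave the road
            new_road[i + v] = v
    road = new_road
    speeds_sum += sum(v for v in road if v >= 0)
    vehicles_sum += sum(1 for v in road if v >= 0)

print("mean speed (cells/step):", round(speeds_sum / max(vehicles_sum, 1), 2))
```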
(This article belongs to the Section Vehicular Sensing)

19 pages, 1054 KB  
Article
Enhanced BLIP-2 Optimization Using LoRA for Generating Dashcam Captions
by Minjun Cho, Sungwoo Kim, Dooho Choi and Yunsick Sung
Appl. Sci. 2025, 15(7), 3712; https://doi.org/10.3390/app15073712 - 28 Mar 2025
Cited by 3 | Viewed by 4133
Abstract
Autonomous driving technology has advanced significantly. However, it is challenging to accurately generate captions for driving environment scenes, which involve dynamic elements such as vehicles, traffic signals, road conditions, weather, and the time of day. Capturing these elements accurately is important for improving situational awareness in autonomous systems. Driving environment scene captioning is an important part of generating driving scenarios and enhancing the interpretability of autonomous systems. However, traditional vision–language models struggle with domain adaptation since autonomous driving datasets with detailed captions of dashcam-recorded scenes are limited and they cannot effectively capture diverse driving environment factors. In this paper, we propose an enhanced method based on bootstrapping language-image pre-training with frozen vision encoders and large language models (BLIP-2) to optimize domain adaptation by improving scene captioning in autonomous driving environments. It comprises two steps: (1) transforming structured dataset labels into descriptive captions in natural language using a large language model (LLM), and (2) optimizing the Q-Former in a BLIP-2 module with low-rank adaptation (LoRA) to achieve efficient domain adaptation. The structured dataset labels are originally stored in JSON format, where driving environment scene factors are encoded as key–value pairs that represent attributes such as the object type, position, and state. Using the Large-Scale Diverse Driving Video Database (BDD-100K), our method significantly improves performance, achieving BLEU-4, CIDEr, and SPICE scores that were each approximately 1.5 times those of the baseline BLIP-2. These higher scores show the effectiveness of LoRA-based optimization and, hence, its suitability for autonomous driving applications. Our method effectively enhances accuracy, contextual relevance, and interpretability, contributing to improved scene understanding in autonomous driving systems. Full article
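Step (2) can be sketched with the PEFT library on a public BLIP-2 checkpoint, as below; the target module names are an assumption about the Hugging Face Q-Former implementation, and the LoRA hyperparameters are illustrative rather than the paper's configuration.

```python
# Sketch: attaching LoRA adapters to the Q-Former of a BLIP-2 checkpoint with PEFT.
# The checkpoint is a public baseline; target module names ("query"/"value" inside the
# Q-Former attention blocks) are an assumption, not the paper's exact configuration.
from peft import LoraConfig, get_peft_model
from transformers import Blip2ForConditionalGeneration

model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query", "value"],   # intended to match the Q-Former self-attention projections
    bias="none",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()       # only the low-rank adapter weights remain trainable
```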
(This article belongs to the Section Computing and Artificial Intelligence)

28 pages, 6705 KB  
Article
Multimodal AI and Large Language Models for Orthopantomography Radiology Report Generation and Q&A
by Chirath Dasanayaka, Kanishka Dandeniya, Maheshi B. Dissanayake, Chandira Gunasena and Ruwan Jayasinghe
Appl. Syst. Innov. 2025, 8(2), 39; https://doi.org/10.3390/asi8020039 - 17 Mar 2025
Cited by 4 | Viewed by 3789
Abstract
Access to high-quality dental healthcare remains a challenge in many countries due to limited resources, lack of trained professionals, and time-consuming report generation tasks. An intelligent clinical decision support system (ICDSS), which can make informed decisions based on past data, is an innovative solution to address these shortcomings while improving continuous patient support in dental healthcare. This study proposes a viable solution with the aid of multimodal artificial intelligence (AI) and large language models (LLMs), focusing on their application for generating orthopantomography radiology reports and answering questions in the dental domain. This work also discusses efficient adaptation methods of LLMs for specific language and application domains. The proposed system primarily consists of a BLIP-2-based caption generator tuned on DPT images, followed by a Llama 3 8B-based LLM for radiology report generation. The performance of the entire system is evaluated in two ways. The diagnostic performance of the system achieved an overall accuracy of 81.3%, with specific detection rates of 87.9% for dental caries, 89.7% for impacted teeth, 88% for bone loss, and 81.8% for periapical lesions. Subjective evaluation of AI-generated radiology reports by certified dental professionals demonstrates an overall accuracy score of 7.5 out of 10. In addition, the proposed solution includes a question-answering platform in the native Sinhala language, alongside the English language, designed to function as a chatbot for dental-related queries. We hope that this platform will eventually bridge the gap between dental services and patients, created due to a lack of human resources. Overall, our proposed solution creates new opportunities for LLMs in healthcare by introducing a robust end-to-end system for the automated generation of dental radiology reports and enhancing patient interaction and awareness. Full article
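The two-stage report pipeline might be prototyped as below, with public BLIP-2 and Llama 3 8B checkpoints standing in for the fine-tuned caption generator and report LLM; the image path and prompt wording are placeholders.

```python
# Sketch of the two-stage pipeline: a BLIP-2 captioner describes the radiograph, then an
# LLM turns the caption into a draft report. Public checkpoints stand in for the
# fine-tuned models described in the paper; the prompt wording is illustrative.
import torch
from PIL import Image
from transformers import Blip2ForConditionalGeneration, Blip2Processor, pipeline

cap_proc = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
cap_model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16, device_map="auto")

image = Image.open("opg.png").convert("RGB")          # placeholder orthopantomogram
inputs = cap_proc(images=image, return_tensors="pt").to(cap_model.device, torch.float16)
caption = cap_proc.decode(cap_model.generate(**inputs, max_new_tokens=60)[0], skip_special_tokens=True)

llm = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct", device_map="auto")
prompt = f"Findings from an orthopantomogram: {caption}\nWrite a concise dental radiology report."
print(llm(prompt, max_new_tokens=300)[0]["generated_text"])
```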

20 pages, 8720 KB  
Article
Impacts of an Intermittent Bus Lane on Local Air Quality: Lessons from an Effectiveness Study
by Neelakshi Hudda, Isabelle S. Woollacott, Nisitaa Karen Clement Pradeep and John L. Durant
Environments 2025, 12(1), 33; https://doi.org/10.3390/environments12010033 - 20 Jan 2025
Viewed by 1543
Abstract
Bus lanes with intermittent prioritization (BLIPs) have been proposed as a way to reduce traffic burden and improve air quality along busy urban streets; however, to date, the impacts of BLIPs on local-scale air quality have not been thoroughly evaluated, due in part to challenges in study design. We measured traffic-emission proxies—black carbon aerosol and ultrafine particles—before and after the installation of a BLIP in the Boston area (Massachusetts, USA) in 2021, and compared our data with traffic measurements to determine whether changes in air quality were attributable to changes in traffic patterns. We used both stationary and mobile monitoring to characterize temporal and spatial variations in air quality both before and after the BLIP went into operation. Although the BLIP led to a reduction in traffic volume (~20%), we did not find evidence that this reduction caused a significant change in local air quality. Nonetheless, substantial spatial and temporal differences in pollutant concentrations were observed; the highest concentrations occurred closest to a nearby highway along a section of the bus lane that was in an urban canyon, likely causing pollutant trapping. Wind direction was a dominant influence: pollutant concentrations were generally higher during winds that oriented the bus lane downwind of or parallel to the highway. Based on our findings, for future studies evaluating the effectiveness of BLIPs we recommend that: (i) traffic and air quality measurements be collected simultaneously for several non-weekend days immediately before and immediately after bus lanes are first put into operation; (ii) the evaluation be performed when other significant changes in motorists’ driving behavior and bus ridership are not anticipated; and (iii) coordinated efforts be made to increase bus ridership and incentivize motorists to avoid using the bus lane during the hours of intermittent prioritization. Full article
(This article belongs to the Special Issue Advances in Urban Air Pollution)

17 pages, 464 KB  
Article
FGeo-Parser: Autoformalization and Solution of Plane Geometric Problems
by Na Zhu, Xiaokai Zhang, Qike Huang, Fangzhen Zhu, Zhenbing Zeng and Tuo Leng
Symmetry 2025, 17(1), 8; https://doi.org/10.3390/sym17010008 - 24 Dec 2024
Cited by 2 | Viewed by 1953
Abstract
Automatic geometric problem-solving is an active and challenging subfield at the intersection of AI and mathematics, where geometric problem parsing plays a critical role. It involves converting geometric diagrams and text into a formal language. Due to the complexity of geometric shapes and the diversity of geometric relationships, geometric problem parsing demands that the parser exhibit cross-modal comprehension and reasoning capabilities. In this paper, we propose an enhanced geometric problem parsing method called FGeo-Parser, which converts problem diagrams and text into the formal language of FormalGeo. It also supports reverse formalization to generate human-like solutions, reflecting the symmetry between parsing and generating. Specifically, the diagram parser leverages BLIP to generate the construction CDL and image CDL, while the text parser employs T5 to produce the text CDL and goal CDL; both neural networks are based on a symmetric encoder–decoder architecture. With the assistance of a theorem predictor, these CDLs are automatically parsed and step-by-step reasoning is executed within FGPS. Finally, the reasoning process is fed into a solution generator, which produces a human-like solution process. Additionally, we re-annotated problem diagrams and text based on the FormalGeo7K dataset. The formalization experiments on the new dataset achieved a match accuracy of 91.51% and a perfect accuracy of 56.47%, while the combination with the theorem predictor achieved a problem-solving accuracy of 63.45%. Full article
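The text-parser step can be sketched as a plain T5 seq2seq call, as below; a base T5 checkpoint stands in for the fine-tuned parser, so the decoded output is only illustrative and the snippet does not enforce the FormalGeo CDL grammar.

```python
# Sketch: the text-parser step as a T5 seq2seq call that maps problem text toward formal CDL.
# "t5-base" is a stand-in for the fine-tuned parser; the prompt prefix and problem text are examples.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

problem_text = ("In triangle ABC, D is the midpoint of BC and AD is perpendicular to BC. "
                "Find the measure of angle ADB.")
inputs = tokenizer("translate problem to CDL: " + problem_text, return_tensors="pt")
ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(ids[0], skip_special_tokens=True))  # would be text CDL / goal CDL after fine-tuning
```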
(This article belongs to the Special Issue Symmetry and Asymmetry in Machine Learning)

15 pages, 2741 KB  
Article
SC-Phi2: A Fine-Tuned Small Language Model for StarCraft II Build Order Prediction
by Muhammad Junaid Khan and Gita Sukthankar
AI 2024, 5(4), 2338-2352; https://doi.org/10.3390/ai5040115 - 13 Nov 2024
Cited by 3 | Viewed by 2820
Abstract
Background: This article introduces SC-Phi2, a fine-tuned StarCraft II small language model. Small language models, like Phi2, Gemma, and DistilBERT, are streamlined versions of large language models (LLMs) with fewer parameters that require less computational power and memory to run. Method: To teach Microsoft’s Phi2 model about StarCraft, we create a new SC2 text dataset with information about StarCraft races, roles, and actions and use it to fine-tune Phi-2 with self-supervised learning. We pair this language model with a Vision Transformer (ViT) from the pre-trained BLIP-2 (Bootstrapping Language Image Pre-training) model, fine-tuning it on the StarCraft replay dataset, MSC. This enables us to construct dynamic prompts that include visual game state information. Results: Unlike the large models used in StarCraft LLMs such as GPT-3.5, Phi2 is trained primarily on textbook data and contains little inherent knowledge of StarCraft II beyond what is provided by our training process. By using LoRA (Low-rank Adaptation) and quantization, our model can be trained on a single GPU. We demonstrate that our model performs well at build order prediction, an important StarCraft macromanagement task. Conclusions: Our research on the usage of small models is a step towards reducing the carbon footprint of AI agents. Full article
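The single-GPU training setup mentioned in the results could be approximated as below, loading Phi-2 in 4-bit with bitsandbytes and attaching LoRA adapters via PEFT; the quantization settings, target modules, and LoRA hyperparameters are assumptions for illustration, not the paper's exact configuration.

```python
# Sketch: 4-bit quantized Phi-2 with LoRA adapters so fine-tuning fits on a single GPU.
# Hyperparameters and target module names are illustrative assumptions.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", quantization_config=bnb, device_map="auto")

lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "k_proj", "v_proj"],  # assumed attention projections
                      task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()   # only the adapter weights are trainable
```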
(This article belongs to the Section AI Systems: Theory and Applications)
