Search Results (1,173)

Search Parameters:
Keywords = large multimodal models

31 pages, 15638 KB  
Article
Multimodal Generative AI for Construction-Site Management and Monitoring: A Field-Based Evaluation
by Alon Urlainis, Eran Haronian and Amichai Mitelman
Smart Cities 2026, 9(7), 114; https://doi.org/10.3390/smartcities9070114 - 2 Jul 2026
Abstract
Modern construction sites generate large volumes of visual, spatial, and operational data that can support data-driven project delivery, improved monitoring, and reliable decision-making within the smart-city built environment. However, construction management still relies heavily on human observation and manual interpretation, limiting the transformation of field data into structured information for sustainable urban infrastructure delivery. Multimodal generative artificial intelligence (GenAI) offers a promising approach for interpreting construction-site data, yet its performance under real site conditions remains insufficiently examined, particularly across tasks requiring different levels of visual recognition, contextual reasoning, and professional judgment. This paper presents a field-based evaluation of multimodal GenAI models using 1186 images collected from 17 active construction sites. The evaluation considered three widely available general-purpose multimodal GenAI assistants: Gemini, ChatGPT, and Microsoft Copilot. Four major construction management tasks were assessed: construction activity identification, progress tracking, execution defect detection, and safety hazard identification. The GenAI outputs were compared against ground-truth evaluations established by human experts. The results suggest that GenAI performs more reliably in descriptive and visually explicit tasks than in judgment-intensive tasks requiring engineering interpretation. Activity identification achieved the strongest performance, whereas execution defect detection was the most challenging. The findings indicate that GenAI can support visual site interpretation and improve construction management efficiency, while highlighting the need for human oversight and verification in smart-city infrastructure delivery.

26 pages, 2389 KB  
Article
LLMs in Automated Assessment: A Role-Based Taxonomy and Framework for Controlled Educational Integration
by Anastasia Vangelova and Veska Gancheva
Appl. Sci. 2026, 16(13), 6617; https://doi.org/10.3390/app16136617 - 2 Jul 2026
Abstract
Large language models (LLMs) are reshaping the automated assessment of open-ended student responses. Compared with earlier rule-based, statistical, and feature-engineered approaches, they enable a deeper interpretation of meaning, context, and argumentation. This development can be understood as a fifth generation of automated scoring systems, but it also raises a new question: not only what LLMs can do, but also how they can be deployed in education in a controlled and reliable manner. This paper presents a role-based taxonomy that distinguishes between generative LLMs used as direct virtual graders, encoder transformers used as semantic tools, and intermediate text-to-text models used in more formalized assessment tasks. It also discusses the main limitations of standalone LLM graders, including hallucinations, probabilistic instability, limited interpretability, bias, and weak grounding in domain-specific content. To address these issues, the paper presents a developed framework implemented in an integrated assessment system built on role prompting, rubric-constrained grading, Retrieval-Augmented Generation (RAG), structured machine-readable outputs, workflow orchestration, and LMS integration. The framework is further extended to multimodal assessment through vision-based evaluation of visual artifacts such as UML state diagrams. The main contribution of the paper is not only a conceptual framework, but also its realization in a working integrated system for automated assessment in a more traceable, pedagogically grounded, and institutionally reliable way.
(This article belongs to the Special Issue Application of Semantic Web Technologies for E-Learning)
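Rubric-constrained grading combined with structured machine-readable outputs can be illustrated with a validation step that rejects a grader reply before it reaches the LMS. The sketch below assumes a hypothetical JSON format (a `scores` list of `criterion`/`score`/`justification` objects) and a hypothetical rubric; neither is taken from the paper, and the function name `validate_grade` is illustrative only.

```python
import json

# Hypothetical rubric (criterion -> maximum points); not taken from the paper.
RUBRIC = {"correctness": 5, "argumentation": 3, "terminology": 2}


def validate_grade(raw: str, rubric: dict[str, int]) -> dict[str, int]:
    """Parse an LLM grader's JSON reply and enforce rubric constraints.

    Raises ValueError if the reply is not valid JSON, omits or invents a
    criterion, lacks a justification, or scores outside [0, max].
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"grader output is not valid JSON: {exc}") from exc

    if not isinstance(data, dict) or not isinstance(data.get("scores"), list):
        raise ValueError("expected an object with a 'scores' list")

    scores: dict[str, int] = {}
    for item in data["scores"]:
        if not isinstance(item, dict):
            raise ValueError(f"malformed score entry: {item!r}")
        name, score = item.get("criterion"), item.get("score")
        if name not in rubric:
            raise ValueError(f"unknown criterion: {name!r}")
        if not isinstance(score, int) or not 0 <= score <= rubric[name]:
            raise ValueError(f"score for {name!r} out of range: {score!r}")
        if not item.get("justification"):
            raise ValueError(f"missing justification for {name!r}")
        scores[name] = score

    # Every rubric criterion must be scored exactly once.
    missing = set(rubric) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return scores
```

In such a design, a rejected reply could be sent back to the model for a retry or routed to a human grader, which keeps out-of-rubric scores from ever being recorded.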

24 pages, 950 KB  
Review
Reimagining Nodal Staging in Colorectal Cancer: Toward a Novel Non-Invasive Imaging Approach
by Perla Moreno, Michela Orsi, Karl-Philippe Beaudet, Rania Benyahya, Leonardo Sosa-Valencia, Stéphane Cotin, Alfonso Lapergola and Alain García Vázquez
Cancers 2026, 18(13), 2139; https://doi.org/10.3390/cancers18132139 - 2 Jul 2026
Abstract
Colorectal cancer (CRC) remains the third most common malignancy worldwide and a leading cause of cancer mortality, largely driven by metastatic dissemination. Among metastatic routes, lymphatic spread is crucial to determine the prognosis and establish an adequate therapeutic strategy. Lymph node metastasis (LNM) defines stage III disease in the TNM classification, guiding adjuvant chemotherapy and surgical planning. However, nodal staging based on lymphadenectomy and histopathology is invasive, time-consuming, and may lead to overtreatment. Conventional imaging modalities, including computed tomography, magnetic resonance imaging, and endorectal ultrasound, show limited sensitivity and specificity for small or micro-metastatic nodes. Despite multimodal progress, no non-invasive technique reliably identifies malignant nodes in real time. PET–MRI, contrast-enhanced ultrasound, photoacoustic and fluorescence approaches, ICG mapping, and sentinel node biopsy improve detection but remain limited by specificity, cost, or availability. Extranodal extension (ENE) and tumor deposits (TDs) carry major prognostic value, reflecting aggressive biology and association with distant spread. Meanwhile, phylogenetic studies challenge linear dissemination models, indicating that some metastases arise directly from the primary tumor or TDs rather than LNMs. These data support refinement of staging and surgical strategies according to tumor biology rather than purely anatomical criteria. High-frequency quantitative ultrasound (HF-QUS) enables real-time, operator-independent, three-dimensional nodal assessment with reported sensitivity and specificity exceeding 85%. Combined with artificial intelligence and molecular profiling, it may support biologically informed staging, reduce unnecessary surgery, and foster precision oncology. 
Lymphatic dissemination in CRC offers a platform to merge tumor biology with technological innovation, where advanced imaging, molecular insight, and artificial intelligence may redefine nodal staging toward precision, non-invasive care.
(This article belongs to the Special Issue Innovations in Colorectal Cancer)

46 pages, 3026 KB  
Article
Keyframe Selection and Multimodal Fusion for Product Recognition in E-Commerce Live Streaming
by Yichuan Zheng, Jin Shi and Wei Shen
Appl. Sci. 2026, 16(13), 6585; https://doi.org/10.3390/app16136585 - 1 Jul 2026
Abstract
Product recognition in e-commerce live streaming is hindered by rapid viewpoint changes, occlusions, motion blur, and inconsistencies between visual and spoken information. Existing approaches typically focus on individual components such as detection, OCR, or speech recognition, which limits their effectiveness in end-to-end structured product understanding. To address this problem, we propose an integrated framework that combines task-oriented keyframe selection with multimodal semantic fusion. The framework first uses D-FINE to localize product regions and then selects informative frames through two complementary strategies. Strategy A considers both detection confidence and Laplacian-based sharpness, while Strategy B combines detection confidence with a learned quality component estimated by an EfficientNetV2-M regression model. OCR, visual-semantic recognition, and ASR are then applied to extract complementary evidence, and a Qwen3.5-27B large language model is used to structure and fuse multimodal evidence into standardized product outputs, including brand, product name, and category. Experiments on an in-house e-commerce livestreaming dataset demonstrate substantial gains over a last-frame baseline. Strategy B achieves the best overall result, improving the Perfect Match Rate from 0.609 to 0.775 and the Semantic Similarity from 0.697 to 0.802. Ablation studies further show that the full multimodal framework consistently outperforms unimodal and dual-modality variants under both frame selection strategies. In addition, Top-K analysis indicates that single-frame inference provides a practical balance between OCR evidence completeness and efficiency. Efficiency analysis shows that the per-video API monetary cost remains low under the pricing configuration used in this study, while API latency is mainly limited by Qwen3.5-27B LLM calls for evidence structuring and final fusion. 
Overall, the proposed framework offers an effective and extensible solution for structured product recognition in complex live-streaming scenarios.
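Strategy A ranks frames by detection confidence together with Laplacian-based sharpness. The abstract does not say how the two terms are combined, so the sketch below assumes a hypothetical weighted sum of the detector confidence and a min-max-normalized variance-of-Laplacian score (weight `alpha`). The D-FINE detector itself is not reproduced; its per-frame confidences are simply passed in.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian, a common blur/sharpness proxy."""
    g = gray.astype(np.float64)
    # Discrete Laplacian evaluated on interior pixels (no padding needed).
    lap = (
        g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
        - 4.0 * g[1:-1, 1:-1]
    )
    return float(lap.var())


def select_keyframe(frames: list[np.ndarray], confidences: list[float], alpha: float = 0.5) -> int:
    """Return the index of the frame with the best combined score.

    Hypothetical scoring: alpha * confidence + (1 - alpha) * normalized sharpness.
    """
    sharp = np.array([laplacian_variance(f) for f in frames])
    span = sharp.max() - sharp.min()
    # Min-max normalize sharpness to [0, 1] so it is comparable to confidence.
    sharp_norm = (sharp - sharp.min()) / span if span > 0 else np.zeros_like(sharp)
    conf = np.asarray(confidences, dtype=np.float64)
    scores = alpha * conf + (1.0 - alpha) * sharp_norm
    return int(np.argmax(scores))
```

With equal detector confidence, a textured frame beats a featureless (blurred) one; setting `alpha=1.0` falls back to confidence-only selection.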

25 pages, 20391 KB  
Article
Deformable Medical Image Registration with KAN-Based Implicit Neural Representations
by Nikita A. Drozdov, Marat O. Zinovev and Dmitry V. Sorokin
Mach. Learn. Knowl. Extr. 2026, 8(7), 184; https://doi.org/10.3390/make8070184 - 1 Jul 2026
Abstract
Deformable image registration (DIR) is central to medical image analysis, supporting spatial alignment for longitudinal studies and multi-modal fusion. Learning-based methods such as CNNs and transformers provide rapid inference but often require large training datasets and can underperform classical iterative methods for specific anatomies or modalities. Implicit neural representations (INRs) offer a data-efficient alternative by modeling deformation fields as continuous coordinate-to-displacement mappings, yet their per-pair optimization makes runtime efficiency and robustness to initialization essential. We introduce KAN-IDIR and RandKAN-IDIR, the first Kolmogorov–Arnold network (KAN)-based INR framework for pairwise-optimized, resolution-independent DIR, designed to improve seed stability and resource efficiency without requiring a large training dataset. KANs use learnable activation functions that are well suited to continuous, physically structured deformation fields. RandKAN-IDIR further reduces cost through randomized basis sampling, preserving registration quality with fewer basis functions. We evaluate the methods on lung CT, brain MRI, and cardiac MRI datasets against pairwise-optimized neural approaches, dataset-trained deep models, and classical baselines. KAN-IDIR and RandKAN-IDIR provide the strongest overall performance among pairwise-optimized neural registration methods across all three datasets, with low computational overhead and superior stability across random initializations. On ACDC, KAN-IDIR also achieves the highest DSC and best deformation regularity among all compared methods. RandKAN-IDIR slightly outperforms adaptive basis selection variants while avoiding their additional training-time complexity. This makes the approach practical for reproducible clinical research use. Source code is publicly available.
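A KAN replaces fixed activations with a learnable univariate function on every input-output edge. To make the coordinate-to-displacement idea concrete, the sketch below builds a tiny two-layer KAN that maps 3D coordinates to 3D displacements. It assumes Gaussian radial basis functions as a simple stand-in for the B-splines of the original KAN formulation (the abstract does not state the basis used), and it imitates "randomized basis sampling" by evaluating a random subset of basis centres. Coefficients are random here; in per-pair registration they would be optimized against an image-similarity loss, which is omitted. All class and function names are hypothetical.

```python
import numpy as np


class RBFKANLayer:
    """One KAN layer: y_j = sum_i phi_ij(x_i), each phi_ij an RBF expansion."""

    def __init__(self, d_in: int, d_out: int, n_basis: int = 16, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        self.centers = np.linspace(-1.0, 1.0, n_basis)  # basis centres mu_k
        self.width = 2.0 / (n_basis - 1)                  # shared bandwidth h
        # Coefficients c[i, j, k] for input i, output j, basis k (trainable in practice).
        self.coef = rng.normal(scale=0.1, size=(d_in, d_out, n_basis))

    def __call__(self, x: np.ndarray, basis_idx=None) -> np.ndarray:
        centers, coef = self.centers, self.coef
        if basis_idx is not None:  # evaluate only a subset of the basis functions
            centers, coef = centers[basis_idx], coef[:, :, basis_idx]
        # phi[n, i, k] = exp(-((x[n, i] - mu_k) / h) ** 2)
        phi = np.exp(-(((x[:, :, None] - centers[None, None, :]) / self.width) ** 2))
        # Sum over inputs i and basis functions k -> shape (N, d_out).
        return np.einsum("nik,ijk->nj", phi, coef)


def displacement_field(coords: np.ndarray, layers, rng=None, n_sample=None) -> np.ndarray:
    """Map (N, 3) coordinates in [-1, 1] to (N, 3) displacement vectors."""
    h = coords
    for layer in layers:
        idx = None
        if rng is not None and n_sample is not None:
            # Randomly sample n_sample of the layer's basis centres, without replacement.
            idx = np.sort(rng.choice(layer.centers.size, size=n_sample, replace=False))
        h = layer(h, idx)
    return h
```

Because the field is a continuous function of coordinates, it can be queried at any resolution, which is the resolution-independence property the abstract refers to.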
22 pages, 5361 KB  
Article
Multi-Engine Collaborative Large Language Models Enhance the Intelligence of Eco-Environmental Monitoring and Governance in China
by Wenpan Li, Yu Feng, Luyu Yan, Kebin Ji, Wanglong Yang, Ming Chang, Qi Zhang and Chuanzhong Chen
Appl. Sci. 2026, 16(13), 6557; https://doi.org/10.3390/app16136557 - 1 Jul 2026
Abstract
The expansion of China’s modernized eco-environmental monitoring networks has generated vast amounts of data. Consequently, traditional, expertise-reliant analysis is increasingly ill-suited for agile regulatory decision-making. Although large language models (LLMs) present a promising alternative, their practical deployment remains limited by domain-specific knowledge gaps, hallucinations, and an inherent difficulty in managing multi-faceted ecological tasks. This study introduces EnvSentry, a novel multi-engine collaborative LLM framework designed for intelligent eco-environmental monitoring and governance. EnvSentry coordinates reasoning, instruction, and multimodal engines, supported by a dynamic, vector-indexed knowledge base and retrieval-augmented generation (RAG) to ensure factual veracity. By transitioning operational workflows from fragmented, high-latency batch processing to integrated, real-time intelligent agent chains, the system achieves a closed-loop capability of intent recognition, data retrieval, and quality control. The model was evaluated across distinct environmental contexts, specifically water quality anomaly detection and air quality forecasting. Results show that EnvSentry yields higher analytical precision and attribution rates than baseline methods, while compressing decision-making latency from hours to seconds. Relative to baseline models, EnvSentry achieves a 25-percentage-point improvement in water quality attribution accuracy (from 50% to 75%), a 90% reduction in decision-making latency for anomaly detection, and a 10% absolute gain in data anomaly detection accuracy. In air quality forecasting, it reduces expert judgment time from 60 to 20 min and attains >85% agreement with expert forecasts when used by non-specialist personnel. These improvements suggest a practical shift in eco-environmental monitoring, moving from fragmented, reactive measures toward an integrated and proactive system.
Consequently, this approach offers a viable path toward data-driven autonomous ecological management.
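The EnvSentry figures mix absolute and relative changes: moving attribution accuracy from 50% to 75% is a 25-percentage-point gain but a 50% relative improvement, and cutting expert judgment time from 60 to 20 min is a reduction of about two thirds. A quick check using only the numbers quoted in the abstract (the helper names are illustrative):

```python
def absolute_gain(before: float, after: float) -> float:
    """Difference between two percentages, in percentage points."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Change relative to the baseline value, in percent."""
    return (after - before) / before * 100.0


print(absolute_gain(50, 75))    # 25 percentage points
print(relative_change(50, 75))  # 50.0 (% relative improvement)
print(relative_change(60, 20))  # about -66.7 (% change in expert judgment time)
```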

32 pages, 1088 KB  
Article
Multisource Port Inspection Sensor Fusion with Causal Representation Learning for Cross-Border Anomaly Monitoring
by Jiaxin Yin, Zhengjia Lu, Baodi Xiong, Kai Sun, Ruijia Liu, Yachi Liu and Manzhou Li
Sensors 2026, 26(13), 4142; https://doi.org/10.3390/s26134142 - 1 Jul 2026
Abstract
With the rapid development of cross-border collaboration, intelligent port construction, and international logistics networks, large volumes of multisource heterogeneous data are continuously generated during cross-border circulation. To address the limitations of traditional financial review and compliance auditing methods in characterizing multisource signal coupling, as well as the tendency of conventional deep models to rely on spurious correlated features with insufficient interpretability, a multisource sensing signal fusion and causally explainable risk identification framework is proposed for cross-border trade anomaly detection. In this framework, electronic trade texts, structured financial declaration fields, GPS/AIS trajectories, port weighing records, RFID data, electronic seal status, X-ray inspection images, cold-chain temperature and humidity records, and vibration data are uniformly modeled as multisource sensing signals in cross-border trade and circulation processes. Subsequently, collaborative representation among textual semantics, attribute fields, logistics status, device records, and entity relationships is achieved through a cross-modal alignment mechanism. On this basis, an engineering-constraint-guided causal risk representation module is designed to reduce the interference of spurious correlated factors, such as regions, ports, transportation modes, and textual styles, in model decisions. Meanwhile, a counterfactual anomaly response module is introduced to analyze the influence of key variable changes on risk outputs, thereby enhancing the model’s ability to identify and explain true anomaly-driving factors. 
Experimental results show that the proposed method achieves the best overall performance in the cross-border trade anomaly detection task, with Accuracy, Precision, Recall, F1-score, AUC, and PR-AUC reaching 0.927, 0.842, 0.811, 0.826, 0.958, and 0.817, respectively, clearly outperforming baseline models including Logistic Regression, Random Forest, XGBoost, BERT, BERT+MLP, and Multimodal Transformer. In cross-time, cross-region, cross-port, and cross-entity testing scenarios, high F1-score and AUC values are still maintained. Under complex conditions such as text noise, missing modalities, logistics trajectory perturbations, and missing sensing records, only limited performance degradation is observed. Ablation experiments further verify the effective contributions of cross-modal attention, contrastive alignment, causal financial debiasing, counterfactual response, and engineering constraints to performance improvement.
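Since the F1-score is the harmonic mean of precision and recall, the reported values can be cross-checked against each other directly:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.842, 0.811), 3))  # 0.826, matching the reported F1-score
```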

25 pages, 8542 KB  
Article
MMTR: Strategy-Guided Multimodal Table Reasoning with Reflective Self-Correction
by Lixin Bai, Yibo Ming and Yanmin Chen
Information 2026, 17(7), 641; https://doi.org/10.3390/info17070641 - 1 Jul 2026
Abstract
Although multimodal large language models (MLLMs) have achieved remarkable progress in visual question answering, they remain limited in tabular tasks that require fine-grained structured information perception and complex logical reasoning. This limitation primarily stems from the high density of structured information inherent in tables and the scarcity of high-quality instruction tuning data. To address these challenges and improve the model’s reasoning accuracy in tables, we propose MMTR, a strategy-guided multimodal table reasoning method with reflective self-correction. Mechanistically, we design a dual-LoRA architecture: the Strategy LoRA is responsible for generating structured reasoning steps, while the Reflection LoRA verifies and self-corrects these initial outputs. Their synergy empowers the model with a closed-loop capability of “reasoning–reflection–correction”. On the data front, we construct StrTab-QA, a large-scale dataset comprising question-answering, negative, and reflection samples, providing diverse supervision signals. During training, we further introduce a progressive “reasoning-to-reflection” fine-tuning strategy to gradually achieve cross-modal alignment and structural adaptation. Furthermore, coupled with an adaptive resizing and padding scheme, our approach effectively preserves table structures and minimizes information distortion during visual encoding. Extensive experiments demonstrate that MMTR consistently outperforms strong baselines across multiple table reasoning benchmarks.
(This article belongs to the Section Artificial Intelligence)
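MMTR pairs the model with an adaptive resizing and padding scheme so that table images are not distorted during visual encoding. One common way to achieve this, assumed here rather than taken from the paper, is to scale the longer side to the encoder's input size and pad the shorter side symmetrically, so rows and columns keep their aspect ratio. The 448-pixel target and the function name are hypothetical.

```python
def resize_and_pad_plan(height: int, width: int, target: int = 448):
    """Plan an aspect-preserving resize into a target x target canvas.

    Returns (new_h, new_w, pad_top, pad_bottom, pad_left, pad_right).
    """
    # Scale so the longer side exactly fits the target size.
    scale = target / max(height, width)
    new_h = max(1, round(height * scale))
    new_w = max(1, round(width * scale))
    # Split the remaining space evenly (extra pixel goes to bottom/right).
    pad_h, pad_w = target - new_h, target - new_w
    return new_h, new_w, pad_h // 2, pad_h - pad_h // 2, pad_w // 2, pad_w - pad_w // 2


print(resize_and_pad_plan(200, 800))  # wide table: (112, 448, 168, 168, 0, 0)
```

A plain resize to 448 x 448 would stretch this wide table vertically by a factor of four, which is the kind of structural distortion such a scheme is meant to avoid.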

23 pages, 8127 KB  
Article
A Super Memory Processing Unit Based on 3D Stacking and Hybrid Bonding for High-Efficiency AI Computing
by Ruiyong Zhao, Yibo Hu and Jing Chen
Micromachines 2026, 17(7), 802; https://doi.org/10.3390/mi17070802 - 30 Jun 2026
Abstract
DRAM-based in-memory computing integrates computational regions into the main memory, enabling local data processing within the memory, thereby achieving faster and more efficient data computation. However, enhancing system performance requires addressing a critical challenge: achieving more general and sufficiently powerful data processing capabilities within DRAM-PIM. Existing DRAM-PIM implementations often suffer from limited computational capabilities, either because memory cells and computational circuits share the area of a standard DRAM package or because the operator circuits are overly customized, which limits their ability to meet required data processing demands. To address this issue, in this paper, we propose a Super Memory Processing Unit (SMPU). The SMPU uses Hybrid Bonding technology to 3D-stack DRAM and many-core computational clusters, enabling high-bandwidth (0.25 TB/s per bank, 2 TB/s aggregate for an 8-bank system) on-chip data transmission between DRAM and the computational cluster via copper interconnects, effectively breaking the memory wall bottleneck of existing computing architectures. The SMPU constructs a dual-channel fine-grained computational cluster at the logical computing layer, providing flexible and ample computing power for various AI models, such as ResNet50 and Llama2. The SMPU uses standard DDR protocols and integrates a new memory space allocation and parsing controller to ensure system compatibility without modifying the host-end hardware, facilitating the integration and invocation of computing power within the memory chips. Additionally, the SMPU features an independent dual-channel memory-management mechanism within the memory chips, enabling simultaneous multi-channel, multi-modal AI model inference. We compared a CPU system equipped with an SMPU to current computing systems using FPGA simulations.
The FPGA simulation results show that, under the same computational configuration, the system with the SMPU improves the performance of ResNet50-v1.5 by up to 5.1× and Llama by up to 27.43× compared to the base system, while reducing system power consumption by 71.6% (ResNet50-v1.5) to 77.8% (Llama 7B).

21 pages, 10117 KB  
Article
Activity-Independent Estimation of VO2max from Short-Duration Multimodal Wearable Signals
by Laura Saldaña-Aristizábal, Jhonathan L. Rivas-Caicedo, Kevin Niño-Tejada and Juan F. Patarroyo-Montenegro
Electronics 2026, 15(13), 2843; https://doi.org/10.3390/electronics15132843 - 30 Jun 2026
Abstract
Cardiorespiratory fitness is a key indicator of overall health, yet its assessment still largely depends on structured protocols such as cardiopulmonary exercise testing (CPET), which require specialized equipment, trained personnel, and controlled laboratory conditions that limit accessibility. Wearable sensing technologies offer a practical alternative by continuously capturing physiological and biomechanical signals during daily life. However, most wearable-based approaches remain constrained by activity-specific modeling, structured exercise protocols, or prolonged monitoring periods, limiting generalization across real-world behaviors. This work presents an activity-independent machine learning framework for estimating VO2max from short-duration multimodal wearable signals acquired during semi-structured real-world daily activities. The proposed two-stage framework first estimates the metabolic equivalent of task (MET) as a continuous representation of activity intensity, then integrates this estimate with physiological, biomechanical, and demographic features to predict subject-level VO2max. By decoupling physiological demand from explicit activity labels, the framework improves robustness to unseen activities while preserving physiological interpretability. Evaluation under the Leave-One-Subject-Out validation protocol demonstrates that short-duration wearable-derived signals encode meaningful information related to inter-subject differences in cardiorespiratory fitness. These findings support the feasibility of activity-independent, wearable-based fitness estimation and provide a practical foundation for scalable preventive health monitoring in everyday life.
(This article belongs to the Special Issue Ubiquitous Computing and Mobile Computing)
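Leave-One-Subject-Out validation holds out every recording of one participant per fold, so each model is always evaluated on a person it has never seen, which is what makes the subject-level VO2max estimate meaningful. A minimal split generator, with hypothetical subject identifiers:

```python
from collections.abc import Iterator, Sequence


def leave_one_subject_out(subject_ids: Sequence[str]) -> Iterator[tuple[str, list[int], list[int]]]:
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test


# Six recordings from three participants -> three folds.
for subject, train, test in leave_one_subject_out(["s1", "s1", "s2", "s3", "s3", "s3"]):
    print(subject, train, test)
```

scikit-learn's `LeaveOneGroupOut` produces the same splits when the subject IDs are passed as `groups`.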

18 pages, 2842 KB  
Article
CollectivIA: Two-Pipeline Multilingual Legal RAG for Moroccan Territorial Governance with LLM-Assisted and Regex-Based Chunking
by Firiel Zouak, Omar El Beqqali and Jamal Riffi
Big Data Cogn. Comput. 2026, 10(7), 212; https://doi.org/10.3390/bdcc10070212 - 29 Jun 2026
Abstract
Retrieval grounding is crucial for high-stakes administrative applications, since large language models remain prone to hallucinations when addressing legal questions. This problem is particularly relevant in Moroccan territorial governance, where official legislative PDFs have highly heterogeneous digital quality, user interactions often occur in Moroccan Darija, and the legal corpus is bilingual Arabic–French. This paper presents CollectivIA, a multilingual Retrieval-Augmented Generation system implemented for Moroccan territorial governance law. The system supports queries in French, Arabic, and Moroccan Darija and indexes 2272 article-level segments from sixteen official legislative documents. We compare two end-to-end retrieval pipelines: an LLM-assisted semantic chunking architecture using Gemini and ChromaDB and a regex-based chunking architecture using FAISS. Based on an expanded multilingual benchmark of 150 legal queries, with 50 queries per language group, the LLM-assisted pipeline achieved higher RAGAS scores than the regex-based pipeline, particularly improving Context Precision from 0.315 to 0.818. The multimodal Vision fallback successfully recovered 456 articles that remained inaccessible under the regex-based pipeline. Overall, the LLM-assisted pipeline yielded more coherent legal boundaries and more focused retrieved contexts, while the regex-based design maintained a broader source diversity. These results suggest that LLM-assisted semantic chunking with multimodal fallback is a promising approach to enhance multilingual legal RAG over heterogeneous Moroccan legal corpora.
(This article belongs to the Special Issue Artificial Intelligence (AI) and Natural Language Processing (NLP))
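For contrast with the LLM-assisted pipeline, a regex-based baseline can be pictured as splitting extracted text at article headings. The sketch below is an assumption about what such a chunker might look like; the paper's actual patterns are not given. It splits at lines beginning with "Article N" (French-style headings; Arabic headings would need an additional pattern), and the function name is hypothetical.

```python
import re

# Hypothetical heading pattern: a line that starts with "Article" and a number.
ARTICLE_START = re.compile(r"^(?=Article\s+\d+)", re.MULTILINE)


def chunk_by_article(text: str) -> list[str]:
    """Split a legal text into article-level chunks at each 'Article N' heading.

    Any text before the first heading (e.g. a preamble) becomes its own chunk.
    """
    parts = ARTICLE_START.split(text)  # zero-width split keeps each heading
    return [part.strip() for part in parts if part.strip()]


sample = "Préambule\nArticle 1\nLe conseil se réunit.\nArticle 2\nLes membres sont élus."
print(chunk_by_article(sample))
# ['Préambule', 'Article 1\nLe conseil se réunit.', 'Article 2\nLes membres sont élus.']
```

The same simplicity is also the weakness: a cross-reference that happens to start a line ("Article 5 of the law ...") or OCR noise in a scanned PDF would produce a false or missed split, which is consistent with the lower Context Precision reported for the regex-based pipeline.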

32 pages, 8144 KB  
Article
Evaluating In-Vehicle Multimodal Interaction via Multimodal Behavioral Signals: A Theory-Driven Tool Chain and Sim-to-Real Pilot Study
by Xinyi Li, Gang Guo, Qihang Sun, Yingzhang Wu and Wenbo Li
Multimodal Technol. Interact. 2026, 10(7), 73; https://doi.org/10.3390/mti10070073 - 29 Jun 2026
Abstract
Multitasking is pervasive in multimodal interaction, particularly within safety-critical domains like driving. Evaluating the impact of In-Vehicle Multimodal Interaction (IVMI) on drivers is critical, yet existing methods predominantly rely on post hoc subjective surveys or coarse unimodal monitoring. Grounded in Multiple Resource Theory and following a Research through Design methodology, we operationalized this theory into a non-intrusive tool chain that evaluates IVMI impact from multimodal behavioral signals (visual, touch, and driving) and supports real-time, objective evaluation in both simulated and real-world domains. To mitigate the Sim-to-Real gap, the method combines real-world multimodal data acquisition with a modality-decoupled cross-domain calibration. Its feasibility was evaluated through a simulator study (n=27) and a small-scale real-world on-road pilot study (n=3). The results suggest that the tool chain effectively acquires high-fidelity data to support the previously developed evaluation model (Quadratic Weighted Kappa = 0.916) and achieves a preliminary calibration of cross-domain latent feature spaces. As its reference labels are behaviorally derived and share a common basis with the model inputs, this agreement indicates internal consistency rather than independent construct validation. Crucially, while multimodal interaction behaviors (visual and touch) exhibited relatively high cross-domain consistency, real-world driving behaviors showed systematic magnitude suppression. This finding is tentatively interpreted, as a hypothesis to be tested in future work, through the lens of Risk Homeostasis Theory, and highlights the necessity of monitoring multimodal interaction behaviors rather than relying solely on vehicle telemetry.
Overall, this research develops and provides preliminary feasibility evidence for a theory-driven cross-domain tool chain, indicating its potential to objectively quantify multimodal interaction impacts in real-world multitasking contexts. Given the small, homogeneous on-road sample, these pilot-stage results should be read as feasibility evidence and a methodological basis for future large-scale, demographically diverse validation. Full article
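The tool chain's agreement is reported as Quadratic Weighted Kappa (QWK). QWK penalizes disagreement between ordinal labels by the squared distance between them: κ = 1 − Σ w_ij O_ij / Σ w_ij E_ij, with w_ij = (i − j)² / (K − 1)². Here O is the observed confusion matrix and E is the matrix expected if the two raters labelled independently. The following is a minimal, dependency-free Python sketch of this standard formula. The function name and the encoding of labels as integers 0..K−1 are illustrative assumptions, and the code is not the authors' implementation.

```python
from collections import Counter
from typing import Sequence


def quadratic_weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Cohen's kappa with quadratic weights for ordinal labels 0..num_classes-1."""
    if num_classes < 2:
        raise ValueError("need at least two ordinal classes")
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    if any(not 0 <= label < num_classes for label in list(y_true) + list(y_pred)):
        raise ValueError("labels must be integers in [0, num_classes)")
    n = len(y_true)
    # Observed confusion matrix: observed[i][j] counts items labelled i by the reference and j by the model.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    # Marginal label histograms of the two raters.
    hist_true = Counter(y_true)
    hist_pred = Counter(y_pred)
    disagreement_observed = 0.0
    disagreement_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Quadratic penalty: 0 on the diagonal, 1 for the most distant pair of classes.
            w = (i - j) ** 2 / (num_classes - 1) ** 2
            # Expected count if the two raters labelled independently.
            expected = hist_true[i] * hist_pred[j] / n
            disagreement_observed += w * observed[i][j]
            disagreement_expected += w * expected
    if disagreement_expected == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - disagreement_observed / disagreement_expected


print(quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 1, 1], 2))  # 0.5
```

A value of 1.0 means perfect agreement and 0 means chance-level agreement. As the abstract notes, a high value here reflects internal consistency, because the reference labels share a basis with the model inputs.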

25 pages, 510 KB  
Article
Adaptive Self-Attention Graph Pooling for Drug–Target Affinity Prediction
by Changli Li and Guangyue Li
Int. J. Mol. Sci. 2026, 27(13), 5861; https://doi.org/10.3390/ijms27135861 - 29 Jun 2026
Viewed by 142
Abstract
Drug–target affinity (DTA) prediction is a critical step in drug discovery and precision medicine. Although graph neural networks (GNNs) have achieved remarkable progress, existing graph pooling methods rely on fixed ratios, failing to adapt to the structural diversity of molecules and proteins, which leads to information loss or redundant feature retention. To address this issue, we propose the Adaptive Self-Attention Graph Pooling (ASAGPooling) mechanism, which introduces a learnable pooling ratio that dynamically adjusts node retention during training. Furthermore, we develop ASAG-DTA, a multi-modal framework that integrates GNNs with Transformers to jointly model molecular graphs, protein contact maps, SMILES sequences, and FASTA sequences. While ASAGPooling achieves competitive prediction accuracy (MSE = 0.186 on Davis), we acknowledge that it does not surpass the state-of-the-art DynHeter-DTA (MSE = 0.130), which incorporates a more complex dynamic heterogeneous graph architecture. Instead, the key contribution of ASAGPooling lies in its adaptability, interpretability, and computational efficiency. It can eliminate the need for manually tuned pooling ratios, enable direct visualization of retained key residues/atoms, and reduce model complexity. This makes ASAG-DTA a practical lightweight alternative for large-scale virtual screening scenarios where computational resources are constrained.
(This article belongs to the Section Molecular Informatics)
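The abstract does not give ASAGPooling's equations, so the following is only a hypothetical forward-pass sketch. It shows self-attention graph pooling in the style of SAGPool, with the pooling ratio derived from a learnable logit rather than fixed by hand. Several parts are assumptions rather than the authors' design: the names `adaptive_sag_pool`, `attention_scores`, and `ratio_logit`, the mean-aggregation scoring (a simplification of SAGPool's GCN scoring layer), and the sigmoid parameterization of the ratio.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def attention_scores(adj: List[List[int]], features: Matrix, weight: List[float]) -> List[float]:
    """Score each node by averaging the projection x_j . w over itself and its neighbours.

    A simplified stand-in for the one-layer GCN scoring used in SAGPool-style pooling.
    """
    n = len(features)
    scores = []
    for i in range(n):
        neighbourhood = [i] + [j for j in range(n) if j != i and adj[i][j]]
        total = sum(sum(f * w for f, w in zip(features[j], weight)) for j in neighbourhood)
        scores.append(total / len(neighbourhood))
    return scores


def adaptive_sag_pool(
    adj: List[List[int]], features: Matrix, weight: List[float], ratio_logit: float
) -> Tuple[List[int], Matrix, List[List[int]]]:
    """Hypothetical forward pass: keep the top ceil(sigmoid(ratio_logit) * n) nodes."""
    n = len(features)
    ratio = sigmoid(ratio_logit)        # assumed learnable ratio in (0, 1)
    k = max(1, math.ceil(ratio * n))    # always keep at least one node
    scores = attention_scores(adj, features, weight)
    # Rank nodes by score (sorted() is stable, so ties keep node order), then restore original order.
    keep = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
    # Gate retained features by tanh(score), as in SAGPool.
    pooled_x = [[v * math.tanh(scores[i]) for v in features[i]] for i in keep]
    # Induced subgraph on the retained nodes.
    pooled_adj = [[adj[i][j] for j in keep] for i in keep]
    return keep, pooled_x, pooled_adj


# Usage on a 4-node path graph with one scalar feature per node.
path_adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
node_x = [[1.0], [2.0], [3.0], [4.0]]
kept, _, _ = adaptive_sag_pool(path_adj, node_x, [1.0], ratio_logit=0.0)
print(kept)  # [2, 3]
```

Because `math.ceil` is not differentiable, a trainable version would need a relaxation, such as a soft mask, to learn the ratio by gradient descent. The abstract does not say how the authors handle this. The sketch only shows how a ratio in (0, 1) turns into a per-graph node count, which is the flexibility a fixed-ratio pooling layer lacks.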

26 pages, 2182 KB  
Review
An Overview of Large Agricultural Models: Current Status, Applications, and Future Perspectives
by Rui Guo, Dongbo Wang, Xue Zhao and Haotian Hu
Agriculture 2026, 16(13), 1419; https://doi.org/10.3390/agriculture16131419 - 29 Jun 2026
Viewed by 92
Abstract
With the rapid development of general artificial intelligence, large models have gradually become a key force driving digital transformation across many fields. Agriculture has distinct domain characteristics, and traditional deep learning models struggle to meet its cross-regional and cross-task requirements. Large models designed specifically for agriculture can integrate multi-source data and prior knowledge to overcome this bottleneck. Tracking the development of large agricultural models is therefore an important prerequisite for building new quality productive forces in smart agriculture and advancing the digital transformation of agriculture. Following the PRISMA guidelines, this article reviews research on large agricultural models. The literature search combined keywords such as large models, crops, and livestock breeding, and included only journal papers published from 2022 to 2026, totaling 713 articles. Topic modeling is then performed to clarify the current state of research and application, summarize the challenges faced, and outline directions for future research. Existing evidence indicates that current large agricultural models are gradually developing toward agents and embodied intelligence, and are widely applied in scenarios such as agricultural knowledge services, pest and disease diagnosis and prevention, livestock and fishery breeding, and smart agricultural machinery control. However, they still face many key challenges, and further exploration is needed in both theoretical methods and practical applications. Future research could deepen and expand work on constructing high-quality datasets, building domain-specific evaluation systems, strengthening model reliability, developing multi-agent systems, and the lightweight deployment of large models and embodied intelligence.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
53 pages, 1656 KB  
Review
Application of Artificial Intelligence Algorithms in the Comprehensive Care of Patients with Breast Cancer
by Dorota Bartusik-Aebisher, Sara Czech, Jakub Szpara, Avijit Paul, Marvin Xavierselvan and David Aebisher
Algorithms 2026, 19(7), 524; https://doi.org/10.3390/a19070524 - 29 Jun 2026
Viewed by 247
Abstract
Breast cancer remains one of the most significant challenges in modern oncology, while advances in artificial intelligence (AI) are creating new opportunities to improve diagnosis, prognosis, and treatment personalization. The aim of this review was to summarize current and emerging applications of AI in the comprehensive care of patients with breast cancer. This study was conducted as a structured narrative review with elements of integrative evidence synthesis based on publications retrieved from PubMed/MEDLINE, Scopus, Web of Science, and Embase. The review included studies evaluating machine learning and deep learning approaches, such as support vector machines, random forests, convolutional neural networks, Vision Transformers, foundation models, self-supervised learning, federated learning, and multimodal AI systems. The strongest clinical evidence currently concerns AI-supported mammographic screening, where large prospective and real-world studies suggest improvements in cancer detection and workflow efficiency. Applications involving MRI, ultrasound, histopathology, molecular prediction, treatment-response assessment, and treatment selection have shown promising performance, but most remain investigational because of limited prospective multicenter validation. Emerging approaches integrating imaging, pathological, molecular, and clinical data show considerable potential for precision oncology. AI may also support treatment selection, patient monitoring, and survivorship care. Despite promising results, widespread clinical implementation remains limited by challenges related to data heterogeneity, model interpretability, external validation, and integration into clinical workflows. Further prospective multicenter studies are required to establish the safety, reliability, and clinical utility of AI-driven systems in breast cancer care.