Search Results (5,212)

Search Parameters:
Keywords = tasking capability

25 pages, 1621 KB  
Article
Transfer Learning Approach with Features Block Selection via Genetic Algorithm for High-Imbalance and Multi-Label Classification of HPA Confocal Microscopy Images
by Vincenzo Taormina, Domenico Tegolo and Cesare Valenti
Bioengineering 2025, 12(12), 1379; https://doi.org/10.3390/bioengineering12121379 - 18 Dec 2025
Abstract
Advances in deep learning are impressive in various fields and have achieved performance beyond human capabilities in tasks such as image classification, as demonstrated in competitions such as the ImageNet Large Scale Visual Recognition Challenge. Nonetheless, complex applications like medical imaging continue to present significant challenges; a prime example is the Human Protein Atlas (HPA) dataset, which is computationally challenging due to its high class imbalance, the presence of rare patterns, and the need for multi-label classification. It includes 28 distinct patterns and more than 500 unique label combinations, with protein localizations that can appear in different cellular regions such as the nucleus, the cytoplasm, and the nuclear membrane. Moreover, the dataset provides four distinct channels for each sample, adding to its complexity, with green representing the target protein, red indicating microtubules, blue showing the nucleus, and yellow depicting the endoplasmic reticulum. We propose a two-phase transfer learning approach based on feature-block extraction from twelve ImageNet-pretrained CNNs. In the first phase, we address single-label multiclass classification using CNNs as feature extractors combined with SVM classifiers on a subset of the HPA dataset. We demonstrate that the simple concatenation of feature blocks extracted from different CNNs improves performance. Furthermore, we apply a genetic algorithm to select a near-optimal combination of feature blocks. In the second phase, based on the results of the previous stage, we apply two simple multi-label classification strategies and compare their performance with four classifiers. Our method integrates image-level and cell-level analysis. At the image level, we assess the discriminative contribution of individual and combined channels, showing that the green channel is the strongest individually but benefits from combinations with red and yellow. At the cellular level, we extract features from the nucleus and nuclear-membrane ring, an analysis not previously explored in the HPA literature, which proves effective for recognizing rare patterns. Combining these perspectives enhances the detection of rare classes, achieving an F1 score of 0.8 for “Rods & Rings”, outperforming existing approaches. Accurate identification of rare patterns is essential for biological and clinical applications, underscoring the significance of our contribution. Full article
(This article belongs to the Section Biosignal Processing)
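
The feature-block selection described above can be sketched roughly as follows: a genetic algorithm evolves a binary mask over per-CNN feature blocks, scoring each mask by the cross-validated macro-F1 of a linear SVM trained on the concatenated selected blocks. This is a minimal illustration under assumed GA settings and with synthetic stand-in features, not the authors' implementation.

```python
# Minimal sketch of GA-based feature-block selection with an SVM fitness
# (illustrative only; block contents and GA settings are assumptions).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(mask, blocks, y):
    """Macro-F1 of a linear SVM trained on the concatenation of selected blocks."""
    if not mask.any():
        return 0.0
    X = np.concatenate([b for b, keep in zip(blocks, mask) if keep], axis=1)
    return cross_val_score(LinearSVC(), X, y, cv=3, scoring="f1_macro").mean()

def ga_select(blocks, y, pop_size=20, generations=30, p_mut=0.1):
    n_blocks = len(blocks)
    pop = rng.integers(0, 2, size=(pop_size, n_blocks), dtype=bool)
    for _ in range(generations):
        scores = np.array([fitness(ind, blocks, y) for ind in pop])
        # Tournament selection of parents.
        idx = np.array([max(rng.choice(pop_size, 2, replace=False),
                            key=lambda i: scores[i])
                        for _ in range(pop_size)])
        parents = pop[idx]
        # Uniform crossover between consecutive parents, then bit-flip mutation.
        cross = rng.random((pop_size, n_blocks)) < 0.5
        children = np.where(cross, parents, np.roll(parents, 1, axis=0))
        children ^= rng.random((pop_size, n_blocks)) < p_mut
        # Elitism: keep the best individual found so far.
        children[0] = pop[scores.argmax()]
        pop = children
    scores = np.array([fitness(ind, blocks, y) for ind in pop])
    return pop[scores.argmax()]

# Usage with synthetic stand-ins for per-CNN feature blocks:
blocks = [rng.normal(size=(200, 64)) for _ in range(6)]   # 6 hypothetical feature blocks
y = rng.integers(0, 5, size=200)                          # 5 hypothetical classes
best_mask = ga_select(blocks, y)
```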

38 pages, 3484 KB  
Article
From Prompts to Paths: Large Language Models for Zero-Shot Planning in Unmanned Ground Vehicle Simulation
by Kelvin Olaiya, Giovanni Delnevo, Chan-Tong Lam, Giovanni Pau and Paola Salomoni
Drones 2025, 9(12), 875; https://doi.org/10.3390/drones9120875 - 18 Dec 2025
Abstract
This paper explores the capability of Large Language Models (LLMs) to perform zero-shot planning through multimodal reasoning, with a particular emphasis on applications to Unmanned Ground Vehicles (UGVs) and unmanned platforms in general. We present a modular system architecture that integrates a general-purpose LLM with visual and spatial inputs for adaptive planning to iteratively guide UGV behavior. Although the framework is demonstrated in a ground-based setting, it directly extends to other unmanned systems, where semantic reasoning and adaptive planning are increasingly critical for autonomous mission execution. To assess performance, we employ a continuous evaluation metric that jointly considers distance and orientation, offering a more informative and fine-grained alternative to binary success measures. We evaluate a foundational LLM (i.e., Gemini 2.0 Flash, Google DeepMind) on a suite of zero-shot navigation and exploration tasks in simulated environments. Unlike prior LLM-robot systems that rely on fine-tuning or learned waypoint policies, we evaluate a purely zero-shot, stepwise LLM planner that receives no task demonstrations and reasons only from the sensed data. Our findings show that LLMs exhibit encouraging signs of goal-directed spatial planning and partial task completion, even in a zero-shot setting. However, inconsistencies in plan generation across models highlight the need for task-specific adaptation or fine-tuning. These findings highlight the potential of LLM-based multimodal reasoning to enhance autonomy in UGV and drone navigation, bridging high-level semantic understanding with robust spatial planning. Full article
(This article belongs to the Special Issue Advances in Guidance, Navigation, and Control)
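
The continuous evaluation metric is described only as jointly considering distance and orientation, so the sketch below is one plausible form (exponential distance decay combined with a cosine heading term); the weights, scale, and exact formula are assumptions rather than the paper's definition.

```python
# Hedged sketch of a continuous navigation score combining distance and heading
# error; the exponential decay and weighting below are assumptions.
import math

def navigation_score(pos, goal, heading, goal_heading,
                     dist_scale=5.0, w_dist=0.7, w_orient=0.3):
    """Return a score in [0, 1]: 1 means at the goal and facing the right way."""
    dist = math.dist(pos, goal)                       # Euclidean distance to goal
    dist_term = math.exp(-dist / dist_scale)          # 1 at the goal, decays smoothly
    err = (heading - goal_heading + math.pi) % (2 * math.pi) - math.pi
    orient_term = 0.5 * (1.0 + math.cos(err))         # 1 when aligned, 0 when opposite
    return w_dist * dist_term + w_orient * orient_term

print(navigation_score(pos=(1.0, 2.0), goal=(1.5, 2.0),
                       heading=0.1, goal_heading=0.0))
```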

21 pages, 2054 KB  
Article
Attack Detection of Federated Learning Model Based on Attention Mechanism Optimization in Connected Vehicles
by Lanying Liu, Fujun Wang and Ning Du
World Electr. Veh. J. 2025, 16(12), 679; https://doi.org/10.3390/wevj16120679 - 18 Dec 2025
Abstract
To address the problem of decreased model accuracy and poor global aggregation performance of existing methods under non-independent and identically distributed (non-IID) data, the authors propose a method for attack detection in the Internet of Vehicles based on attention-mechanism optimization of federated learning models. A combination of CNN and LSTM serves as the basic detection framework, with self-attention modules integrated to improve spatiotemporal feature modeling. In addition, an adaptive aggregation algorithm based on attention weights is designed for the federated aggregation stage, giving the model stronger stability and generalization ability when dealing with data differences among nodes. To comprehensively evaluate the model, the experiments are based on real datasets such as CICDDoS2019. The results show that the proposed attention-optimized federated learning model offers significant advantages in detecting attacks on connected vehicles. Compared with traditional methods, the new model improves attack detection accuracy by more than 5% in non-IID data environments, accelerates aggregation convergence, reduces aggregation epochs by more than 20%, and achieves stronger data privacy protection and real-time defense capabilities. In conclusion, this method not only improves the adaptability of the model in complex vehicle networking environments but also effectively reduces the overall computational and communication overhead of the system. Full article
(This article belongs to the Section Automated and Connected Vehicles)
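
The attention-weighted aggregation step can be illustrated as below: each client update receives a softmax weight based on its similarity to the current global model, and the weighted average becomes the new global state. The paper's actual attention formulation is not given in the abstract, so this similarity-based weighting is an assumption.

```python
# Minimal sketch of attention-weighted federated averaging: clients whose
# updates are closer to the current global model receive larger aggregation
# weights via a softmax. The weighting scheme is an illustrative assumption.
import torch
import torch.nn as nn

def attention_aggregate(global_state, client_states, temperature=1.0):
    def flatten(state):
        # Flatten a state dict into one vector for a simple similarity score.
        return torch.cat([p.detach().flatten().float() for p in state.values()])

    g = flatten(global_state)
    scores = torch.stack([-torch.norm(flatten(c) - g) / temperature
                          for c in client_states])
    weights = torch.softmax(scores, dim=0)            # attention weights over clients

    # Weighted average, parameter tensor by parameter tensor.
    new_state = {}
    for key in global_state:
        stacked = torch.stack([c[key].float() for c in client_states])
        w = weights.view(-1, *([1] * (stacked.dim() - 1)))
        new_state[key] = (w * stacked).sum(dim=0)
    return new_state, weights

# Usage: two toy "clients" sharing a single linear layer.
global_model = nn.Linear(4, 2)
clients = [nn.Linear(4, 2).state_dict() for _ in range(2)]
new_state, w = attention_aggregate(global_model.state_dict(), clients)
print(w)
```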

11 pages, 669 KB  
Article
Sensorimotor Parameters Predict Performance on the Bead Maze Hand Function Test
by Vivian L. Rose, Komal K. Kukkar, Tzuan A. Chen and Pranav J. Parikh
Sensors 2025, 25(24), 7670; https://doi.org/10.3390/s25247670 - 18 Dec 2025
Abstract
Understanding the forces imparted onto an object during manipulation can shed light on the quality of daily manual behaviors. We have developed an objective measure of the quality of hand function in children, the Bead Maze Hand Function Test (BMHFT), which quantifies how well the individual performs the activity by integrating measures of time and force control. Our main objectives were to examine associations between performance (total force output) on the BMHFT and (1) performance on a sensitive measure of force scaling obtained on a laboratory-based dexterous manipulation task, and (2) general sensory and motor parameters important for fine motor skills. A total of 39 typically developing participants took part, ranging in age from 5 to 10 years (n = 28) and 15 to 17 years (n = 11). We found that the anticipatory coordination of digit forces was the best predictor of performance on the BMHFT. We also found that factors such as age, gender, and pinch strength were associated with BMHFT performance. These findings support the integration of more sensitive sensorimotor metrics, such as the total applied force, into clinical assessments. Linking the development of sensorimotor capabilities to functional task performance may facilitate more targeted and effective intervention strategies, ultimately improving a child’s participation in daily activities. Full article
(This article belongs to the Section Biomedical Sensors)

21 pages, 1957 KB  
Article
Temporal Capsule Feature Network for Eye-Tracking Emotion Recognition
by Qingfeng Gu, Jiannan Chi, Cong Zhang, Boxiang Cao, Jiahui Liu and Yu Wang
Brain Sci. 2025, 15(12), 1343; https://doi.org/10.3390/brainsci15121343 - 18 Dec 2025
Abstract
Eye Tracking (ET) parameters, as physiological signals, are widely applied in emotion recognition and show promising performance. However, emotion recognition relying on ET parameters still faces several challenges: (1) insufficient extraction of temporal dynamic information from the ET parameters; (2) a lack of sophisticated features with strong emotional specificity, which restricts the model’s robustness and individual generalization capability. To address these issues, we propose a novel Temporal Capsule Feature Network (TCFN) for ET parameter-based emotion recognition. The network incorporates a Window Feature Module to extract Eye Movement temporal dynamic information and a specialized Capsule Network Module to mine complementary and collaborative relationships among features. The MLP Classification Module realizes feature-to-category conversion, and a Dual-Loss Mechanism is integrated to optimize overall performance. Experimental results demonstrate the superiority of the proposed model: the average accuracy reaches 83.27% for Arousal and 89.94% for Valence (three-class tasks) on the eSEE-d dataset, and the accuracy rate of four-category across-session emotion recognition is 63.85% on the SEED-IV dataset. Full article
(This article belongs to the Section Behavioral Neuroscience)
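
The Dual-Loss Mechanism is not spelled out in the abstract; one common pairing for capsule-based classifiers, shown below, combines cross-entropy on the MLP output with a margin loss on per-class capsule lengths. The weighting and margins are assumptions for illustration only.

```python
# Hedged sketch of a dual-loss setup: cross-entropy on the classifier output
# combined with a capsule-style margin loss on capsule vector lengths. The
# pairing and the weights are assumptions, not the paper's exact design.
import torch
import torch.nn.functional as F

def margin_loss(caps_lengths, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Standard capsule margin loss over per-class capsule lengths."""
    one_hot = F.one_hot(targets, caps_lengths.size(1)).float()
    pos = one_hot * F.relu(m_pos - caps_lengths).pow(2)
    neg = lam * (1 - one_hot) * F.relu(caps_lengths - m_neg).pow(2)
    return (pos + neg).sum(dim=1).mean()

def dual_loss(logits, caps_lengths, targets, alpha=0.7):
    ce = F.cross_entropy(logits, targets)
    ml = margin_loss(caps_lengths, targets)
    return alpha * ce + (1 - alpha) * ml

# Example shapes: batch of 8, 3 emotion classes (e.g., Arousal levels).
logits = torch.randn(8, 3)
caps_lengths = torch.rand(8, 3)
targets = torch.randint(0, 3, (8,))
print(dual_loss(logits, caps_lengths, targets))
```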

31 pages, 5350 KB  
Article
Deep Learning-Based Fatigue Monitoring in Natural Environments: Multi-Level Fatigue State Classification
by Yuqi Wang, Ruochen Dang, Bingliang Hu and Quan Wang
Bioengineering 2025, 12(12), 1374; https://doi.org/10.3390/bioengineering12121374 - 18 Dec 2025
Abstract
In today’s fast-paced world, the escalating workloads faced by individuals have rendered fatigue a pressing concern that cannot be overlooked. Fatigue not only signals the need for individuals to take a break but also has far-reaching implications for both individuals and society across various domains, including health, safety, productivity, and the economy. While numerous prior studies have explored fatigue monitoring, many of them have been conducted within controlled experimental settings. These experiments typically require subjects to engage in specific tasks over extended periods to induce profound fatigue. However, there has been a limited focus on assessing daily fatigue in natural, real-world environments. To address this gap, this study introduces a daily fatigue monitoring system. We have developed a wearable device capable of capturing subjects’ ECG signals in their everyday lives. We recruited 12 subjects to participate in a 14-day fatigue monitoring experiment. Leveraging the acquired ECG data, we propose machine learning models based on manually extracted features as well as a deep learning model called C-BL to classify subjects’ fatigue levels into three categories: normal, slight fatigue, and fatigued. Our results demonstrate that the proposed end-to-end deep learning model outperforms other approaches with an accuracy rate of 83.3%, establishing its reliability for daily fatigue monitoring. Full article
(This article belongs to the Special Issue Computational Intelligence for Healthcare)
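
"C-BL" is not expanded in the abstract; assuming it denotes a 1-D CNN front end followed by a bidirectional LSTM, a minimal PyTorch skeleton for three-class classification of ECG windows might look like the following. Layer sizes and the window length are placeholders.

```python
# Minimal PyTorch skeleton of a 1-D CNN followed by a bidirectional LSTM for
# three-class fatigue classification from ECG windows. The "C-BL" architecture
# is an assumption based on the name, not the authors' published code.
import torch
import torch.nn as nn

class CBL(nn.Module):
    def __init__(self, n_classes=3):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(4),
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=64,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 64, n_classes)

    def forward(self, x):              # x: (batch, 1, samples)
        feats = self.cnn(x)            # (batch, 64, samples / 16)
        feats = feats.transpose(1, 2)  # (batch, time, 64) for the LSTM
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])   # last time step -> class logits

logits = CBL()(torch.randn(4, 1, 3000))   # e.g. 4 ten-second ECG windows
print(logits.shape)                        # torch.Size([4, 3])
```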

31 pages, 1805 KB  
Article
Fractional-Order African Vulture Optimization for Optimal Power Flow and Global Engineering Optimization
by Abdul Wadood, Hani Albalawi, Shahbaz Khan, Bakht Muhammad Khan and Aadel Mohammed Alatwi
Fractal Fract. 2025, 9(12), 825; https://doi.org/10.3390/fractalfract9120825 - 17 Dec 2025
Abstract
This paper proposes a novel fractional-order African vulture optimization algorithm (FO-AVOA) for solving the optimal reactive power dispatch (ORPD) problem. By integrating fractional calculus into the conventional AVOA framework, the proposed method enhances the exploration–exploitation balance, accelerates convergence, and improves solution robustness. The ORPD problem is formulated as a constrained optimization task with the objective of minimizing real power losses while satisfying generator voltage limits, transformer tap ratios, and reactive power compensator constraints. The general optimization capability of the FO-AVOA is verified using the CEC 2017, 2020, and 2022 benchmark functions. In addition, the method is applied to the IEEE 30-bus and IEEE 57-bus test systems. The results demonstrate significant power loss reductions of up to 15.888% and 24.39% for the IEEE 30-bus and IEEE 57-bus systems, respectively, compared with the conventional AVOA and other state-of-the-art optimization algorithms, along with strong robustness and stability across independent runs. These findings confirm the effectiveness of the FO-AVOA as a reliable optimization tool for modern power system applications. Full article
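
How FO-AVOA embeds fractional calculus is not detailed in the abstract; a common pattern in fractional-order metaheuristics, sketched below, uses Grünwald-Letnikov coefficients to blend a short memory of past positions into each position update. The order alpha, the memory depth, and the update form are illustrative assumptions.

```python
# Hedged sketch of a fractional-order position update in the Grünwald-Letnikov
# style: the new step blends a short memory of past positions with weights
# derived from the fractional order alpha. Illustrative only.
import numpy as np

def gl_coefficients(alpha, n_terms):
    """c_0 = 1, c_k = c_{k-1} * (1 - (1 + alpha) / k)."""
    c = [1.0]
    for k in range(1, n_terms):
        c.append(c[-1] * (1 - (1 + alpha) / k))
    return np.array(c)

def fractional_step(history, velocity, alpha=0.6):
    """history: recent positions, newest first; velocity: proposed move."""
    coeffs = gl_coefficients(alpha, len(history))
    memory = sum(c * h for c, h in zip(coeffs, history))
    return memory + velocity

# Toy usage on a 2-D search point:
history = [np.array([1.0, 2.0]), np.array([0.9, 1.8]), np.array([0.7, 1.5])]
print(fractional_step(history, velocity=np.array([0.05, -0.02])))
```
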
37 pages, 8987 KB  
Article
A Method for UAV Path Planning Based on G-MAPONet Reinforcement Learning
by Jian Deng, Honghai Zhang, Yuetan Zhang, Mingzhuang Hua and Yaru Sun
Drones 2025, 9(12), 871; https://doi.org/10.3390/drones9120871 - 17 Dec 2025
Abstract
To address the issues of efficiency and robustness in UAV trajectory planning under complex environments, this paper proposes a Graph Multi-Head Attention Policy Optimization Network (G-MAPONet) algorithm that integrates Graph Attention (GAT), Multi-Head Attention (MHA), and Group Relative Policy Optimization (GRPO). The algorithm adopts a three-layer architecture, with a GAT layer for local feature perception, MHA for global semantic reasoning, and GRPO for policy optimization, achieving dynamic graph convolution quantization and globally adaptive, parallel, decoupled dynamic strategy adjustment. Comparative experiments in multi-dimensional spatial environments demonstrate that the combined GAT and MHA mechanism is significantly superior to single attention mechanisms, verifying the efficient representation capability of the dual-layer hybrid attention mechanism in capturing environmental features. Additionally, ablation experiments on GAT, MHA, and GRPO confirm that the dual-layer fusion of GAT and MHA yields the larger improvement. Finally, comparisons with traditional reinforcement learning algorithms across multiple performance metrics show that the G-MAPONet algorithm reduces the number of convergence episodes (NCE) by an average of more than 19.14%, increases the average reward (AR) by over 16.20%, and successfully completes all dynamic path planning (PPTC) tasks; meanwhile, the algorithm’s reward values and obstacle avoidance success rate are significantly higher than those of other algorithms. Compared with the baseline APF algorithm, its reward value is improved by 8.66%, and the obstacle avoidance repetition rate is also enhanced, which further verifies the effectiveness of the improved G-MAPONet algorithm. In summary, through the dual-layer complementary mode of GAT and MHA, the G-MAPONet algorithm overcomes the bottlenecks of traditional dynamic environment modeling and multi-scale optimization, enhances the decision-making capability of UAVs in unstructured environments, and provides a new technical solution for trajectory planning in intelligent logistics and distribution. Full article
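
The GRPO component can be illustrated by its core step, the group-relative advantage: rewards from a group of rollouts of the same task are normalized by that group's own mean and standard deviation, avoiding a separate value network. How G-MAPONet couples this with its GAT and MHA layers is not shown here; the sketch is generic.

```python
# Minimal sketch of the group-relative advantage at the heart of GRPO.
# Rewards are synthetic; the policy-gradient machinery around it is omitted.
import torch

def group_relative_advantages(group_rewards, eps=1e-8):
    """group_rewards: (n_groups, group_size) rewards for rollouts of the same task."""
    mean = group_rewards.mean(dim=1, keepdim=True)
    std = group_rewards.std(dim=1, keepdim=True)
    return (group_rewards - mean) / (std + eps)

rewards = torch.tensor([[1.0, 2.0, 0.5, 3.0],    # rollouts for task A
                        [0.2, 0.1, 0.4, 0.3]])   # rollouts for task B
print(group_relative_advantages(rewards))
```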

33 pages, 2685 KB  
Review
Predicting Coastal Flooding and Overtopping with Machine Learning: Review and Future Prospects
by Moeketsi L. Duiker, Victor Ramos, Francisco Taveira-Pinto and Paulo Rosa-Santos
J. Mar. Sci. Eng. 2025, 13(12), 2384; https://doi.org/10.3390/jmse13122384 - 16 Dec 2025
Abstract
Flooding and overtopping are major concerns in coastal areas due to their potential to cause severe damage to infrastructure, economic activities, and human lives. Traditional methods for predicting these phenomena include numerical and physical models, as well as empirical formulations. However, these methods have limitations, such as the high computational costs, reliance on extensive field data, and reduced accuracy under complex conditions. Recent advances in machine learning (ML) offer new opportunities to improve predictive capabilities in coastal engineering. This paper reviews ML applications for coastal flooding and overtopping prediction, analyzing commonly used models, data sources, and preprocessing techniques. Several studies report that ML models can match or exceed the performance of traditional approaches, such as empirical EurOtop formulas or high-fidelity numerical models, particularly in controlled laboratory datasets where numerical models are computationally intensive and empirical methods show larger estimation errors. However, their advantages remain task- and data-dependent, and their generalization and interpretability may lag behind physics-based methods. This review also examines recent developments, such as hybrid approaches, real-time monitoring, and explainable artificial intelligence, which show promise in addressing these limitations and advancing the operational use of ML in coastal flooding and overtopping prediction. Full article
(This article belongs to the Special Issue Coastal Disaster Assessment and Response—2nd Edition)

26 pages, 1232 KB  
Article
DLF: A Deep Active Ensemble Learning Framework for Test Case Generation
by Yaogang Lu, Yibo Peng and Dongqing Zhu
Information 2025, 16(12), 1109; https://doi.org/10.3390/info16121109 - 16 Dec 2025
Abstract
High-quality test cases are vital for ensuring software reliability and security. However, existing symbolic execution tools generally rely on single-path search strategies, have limited feature extraction capability, and exhibit unstable model predictions. These limitations make them prone to local optima in complex or cross-scenario tasks and hinder their ability to balance testing quality with execution efficiency. To address these challenges, this paper proposes a Deep Active Ensemble Learning Framework for symbolic execution path exploration. During training, the framework integrates active learning with ensemble learning to reduce annotation costs and improve model robustness, while constructing a heterogeneous model pool to leverage complementary model strengths. In the testing stage, a dynamic ensemble mechanism based on sample similarity adaptively selects the optimal predictive model to guide symbolic path exploration. In addition, a gated graph neural network is employed to extract structural and semantic features from the control flow graph, improving program behavior understanding. To balance efficiency and coverage, a dynamic sliding window mechanism based on branch density enables real-time window adjustment under path complexity awareness. Experimental results on multiple real-world benchmark programs show that the proposed framework detects up to 16 vulnerabilities and achieves a cumulative 27.5% increase in discovered execution paths in hybrid fuzzing. Furthermore, the dynamic sliding window mechanism raises the F1 score to 93%. Full article
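
The branch-density-driven sliding window can be illustrated with a simple adjustment rule: shrink the window where the control flow branches densely and widen it where branching is sparse. The thresholds, growth factors, and bounds below are assumptions; the paper's exact rule may differ.

```python
# Hedged sketch of a branch-density-aware sliding window: the window shrinks
# in densely branching regions of the control-flow graph and grows in sparse
# ones. Thresholds and size bounds are illustrative assumptions.
def adjust_window(current_size, branch_density,
                  low=0.2, high=0.6, min_size=4, max_size=64):
    """branch_density: branching nodes / total nodes inside the current window."""
    if branch_density > high:        # dense branching -> narrow the window
        return max(min_size, current_size // 2)
    if branch_density < low:         # sparse branching -> widen the window
        return min(max_size, current_size * 2)
    return current_size              # moderate density -> keep the size

print(adjust_window(16, branch_density=0.75))   # -> 8
print(adjust_window(16, branch_density=0.05))   # -> 32
```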

21 pages, 1406 KB  
Article
Receipt Information Extraction with Joint Multi-Modal Transformer and Rule-Based Model
by Xandru Mifsud, Leander Grech, Adriana Baldacchino, Léa Keller, Gianluca Valentino and Adrian Muscat
Mach. Learn. Knowl. Extr. 2025, 7(4), 167; https://doi.org/10.3390/make7040167 - 16 Dec 2025
Abstract
A receipt information extraction task requires both textual and spatial analyses. Early receipt analysis systems primarily relied on template matching to extract data from spatially structured documents. However, these methods lack generalizability across various document layouts and require defining the specific spatial characteristics of unseen document sources. The advent of convolutional and recurrent neural networks has led to models that generalize better over unseen document layouts, and more recently, multi-modal transformer-based models, which consider a combination of text, visual, and layout inputs, have led to an even more significant boost in document-understanding capabilities. This work focuses on the joint use of a neural multi-modal transformer and a rule-based model and studies whether this combination achieves higher performance levels than the transformer on its own. A comprehensively annotated dataset, comprising real-world and synthetic receipts, was specifically developed for this study. The open source optical character recognition model DocTR was used to textually scan receipts and, together with an image, provided input to the classifier model. The open-source pre-trained LayoutLMv3 transformer-based model was augmented with a classifier model head, which was trained for classifying textual data into 12 predefined labels, such as date, price, and shop name. The methods implemented in the rule-based model were manually designed and consisted of four types: pattern-matching rules based on regular expressions and logic, database search-based methods for named entities, spatial pattern discovery guided by statistical metrics, and error correcting mechanisms based on confidence scores and local distance metrics. Following hyperparameter tuning of the classifier head and the integration of a rule-based model, the system achieved an overall F1 score of 0.98 in classifying textual data, including line items, from receipts. Full article
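
The pattern-matching rule type can be illustrated with two regexes, for dates and prices, that confirm or override low-confidence labels from the transformer head. The label names and confidence threshold below are hypothetical and not taken from the paper.

```python
# Illustrative sketch of regex-based rules backing up a transformer classifier:
# a rule label overrides the model only when the model is not confident.
# Label names and the threshold are hypothetical.
import re

DATE_RE = re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b")
PRICE_RE = re.compile(r"\b\d+[.,]\d{2}\b")

def apply_rules(token_text, model_label, model_conf, conf_threshold=0.6):
    """Return the final label for one OCR token."""
    if DATE_RE.search(token_text):
        rule_label = "DATE"
    elif PRICE_RE.search(token_text):
        rule_label = "PRICE"
    else:
        rule_label = None
    # Trust the transformer when it is confident; fall back to the rule otherwise.
    if rule_label and model_conf < conf_threshold:
        return rule_label
    return model_label

print(apply_rules("12/03/2024", model_label="OTHER", model_conf=0.4))  # -> DATE
print(apply_rules("4.99", model_label="PRICE", model_conf=0.9))        # -> PRICE
```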

27 pages, 814 KB  
Article
Concurrency Bug Detection via Static Analysis and Large Language Models
by Zuocheng Feng, Yiming Chen, Kaiwen Zhang, Xiaofeng Li and Guanjun Liu
Future Internet 2025, 17(12), 578; https://doi.org/10.3390/fi17120578 - 15 Dec 2025
Abstract
Concurrency bugs originate from complex and improper synchronization of shared resources, presenting a significant challenge for detection. Traditional static analysis relies heavily on expert knowledge and frequently fails when code is non-compilable. Conversely, large language models struggle with semantic sparsity, inadequate comprehension of concurrent semantics, and the tendency to hallucinate. To address the limitations of static analysis in capturing complex concurrency semantics and the hallucination risks associated with large language models, this study proposes ConSynergy. This novel framework integrates the structural rigor of static analysis with the semantic reasoning capabilities of large language models. The core design employs a robust task decomposition strategy that decomposes concurrency bug detection into a four-stage pipeline: shared resource identification, concurrency-aware slicing, data-flow reasoning, and formal verification. This approach fundamentally mitigates hallucinations from large language models caused by insufficient program context. First, the framework identifies shared resources and applies a concurrency-aware program slicing technique to precisely extract concurrency-related structural features, thereby alleviating semantic sparsity. Second, to enhance the large language model’s comprehension of concurrent semantics, we design a concurrency data-flow analysis based on Chain-of-Thought prompting. Third, the framework incorporates a Satisfiability Modulo Theories solver to ensure the reliability of detection results, alongside an iterative repair mechanism based on large language models that dramatically reduces dependency on code compilability. Extensive experiments on three mainstream concurrency bug datasets, including DataRaceBench, the concurrency subset of Juliet, and DeepRace, demonstrate that ConSynergy achieves an average precision and recall of 80.0% and 87.1%, respectively. ConSynergy outperforms state-of-the-art baselines by 10.9% to 68.2% in average F1 score, demonstrating significant potential for practical application. Full article
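
The four-stage decomposition can be sketched as a simple pipeline skeleton: shared-resource identification, concurrency-aware slicing, LLM-based data-flow reasoning, and a verification pass. Every stage below is a placeholder (the real system uses static analysis, chain-of-thought prompting, and an SMT solver), and all function names are hypothetical.

```python
# Skeleton of the four-stage pipeline described in the abstract; each stage is
# a stand-in, not ConSynergy's actual analysis.
from dataclasses import dataclass

@dataclass
class Finding:
    resource: str
    threads: tuple
    verdict: str          # "race", "safe", or "unknown"

def identify_shared_resources(source: str) -> list[str]:
    # Placeholder: a real pass finds globals / heap objects reachable from
    # more than one thread.
    return [name for name in ("counter", "queue") if name in source]

def concurrency_aware_slice(source: str, resource: str) -> str:
    # Placeholder: keep only lines that mention the resource or a lock.
    return "\n".join(l for l in source.splitlines()
                     if resource in l or "lock" in l)

def llm_dataflow_reasoning(slice_text: str, resource: str) -> str:
    # Placeholder for the chain-of-thought LLM call.
    return "race" if "lock" not in slice_text else "safe"

def verify(verdict: str, slice_text: str) -> str:
    # Placeholder for the SMT check that filters hallucinated reports.
    return verdict

def detect(source: str) -> list[Finding]:
    findings = []
    for res in identify_shared_resources(source):
        sl = concurrency_aware_slice(source, res)
        verdict = verify(llm_dataflow_reasoning(sl, res), sl)
        findings.append(Finding(res, ("t1", "t2"), verdict))
    return findings

print(detect("counter = counter + 1  # updated by two worker threads"))
```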

16 pages, 4368 KB  
Article
DistMLLM: Enhancing Multimodal Large Language Model Serving in Heterogeneous Edge Computing
by Xingyu Yuan, Hui Chen, Lei Liu and He Li
Sensors 2025, 25(24), 7612; https://doi.org/10.3390/s25247612 - 15 Dec 2025
Abstract
Multimodal Large Language Models (MLLMs) offer powerful capabilities for processing and generating text, image, and audio data, enabling real-time intelligence in diverse applications. Deploying MLLM services at the edge can reduce transmission latency and enhance responsiveness, but it also introduces significant challenges due to the high computational demands of these models and the heterogeneity of edge devices. In this paper, we propose DistMLLM, a profit-oriented framework that enables efficient MLLM service deployment in heterogeneous edge environments. DistMLLM disaggregates multimodal tasks into encoding and inference stages, assigning them to different devices based on capability. To optimize task allocation under uncertain device conditions and competing provider interests, it employs a multi-agent bandit algorithm that jointly learns and schedules encoder and inference tasks. Extensive simulations demonstrate that DistMLLM consistently achieves higher long-term profit and lower regret than strong baselines, offering a scalable and adaptive solution for edge-based MLLM services. Full article
(This article belongs to the Special Issue Edge Computing for Beyond 5G and Wireless Sensor Networks)
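
The bandit view of task placement can be illustrated with a single-agent UCB1 learner over (device, stage) arms that discovers which edge device is most profitable for encoding versus inference tasks. DistMLLM's multi-agent formulation and profit model are richer than this; the device names and reward function below are synthetic.

```python
# Single-agent UCB1 sketch over (device, stage) arms; rewards are synthetic
# stand-ins for the profit signal a scheduler would observe.
import math
import random

devices = ["edge_gpu", "edge_cpu", "jetson"]
stages = ["encode", "infer"]
arms = [(d, s) for d in devices for s in stages]

counts = {a: 0 for a in arms}
totals = {a: 0.0 for a in arms}

def true_profit(device, stage):                 # hidden synthetic ground truth
    base = {"edge_gpu": 0.8, "edge_cpu": 0.4, "jetson": 0.6}[device]
    penalty = 0.2 if stage == "infer" and device == "edge_cpu" else 0.0
    return base - penalty + random.gauss(0, 0.05)

def pick_arm(t):
    for a in arms:                              # play every arm once first
        if counts[a] == 0:
            return a
    return max(arms, key=lambda a: totals[a] / counts[a]
               + math.sqrt(2 * math.log(t) / counts[a]))

for t in range(1, 501):
    arm = pick_arm(t)
    reward = true_profit(*arm)
    counts[arm] += 1
    totals[arm] += reward

best = max(arms, key=lambda a: totals[a] / counts[a])
print("learned best placement:", best)
```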

18 pages, 2485 KB  
Article
Adaptive Token Boundaries: Towards Integrating Human Chunking Mechanisms into Multimodal LLMs
by Dongxing Yu
Information 2025, 16(12), 1106; https://doi.org/10.3390/info16121106 - 15 Dec 2025
Abstract
Recent advancements in multimodal large language models (MLLMs) have demonstrated remarkable capabilities in processing diverse data types, yet significant disparities persist between human cognitive processes and computational approaches to multimodal information integration. This research presents a systematic investigation into the parallels between human cross-modal chunking mechanisms and token representation methodologies in MLLMs. Through empirical studies comparing human performance patterns with model behaviors across visual–linguistic tasks, we demonstrate that conventional static tokenization schemes fundamentally constrain current models’ capacity to simulate the dynamic, context-sensitive nature of human information processing. We propose a novel framework for dynamic cross-modal tokenization that incorporates adaptive boundaries, hierarchical representations, and alignment mechanisms grounded in cognitive science principles. Quantitative evaluations demonstrate that our approach yields statistically significant improvements over state-of-the-art models on benchmark tasks (+7.8% on Visual Question Answering (p < 0.001), 5.3% on Complex Scene Description) while exhibiting more human-aligned error patterns and attention distributions. These findings contribute to the theoretical understanding of the relationship between human cognition and artificial intelligence, while providing empirical evidence for developing more cognitively plausible AI systems. Full article
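
The adaptive-boundary idea can be illustrated by a simple merging rule: adjacent token embeddings whose cosine similarity exceeds a threshold are pooled into one chunk, so boundaries follow content rather than a fixed grid. The threshold and mean-pooling rule are assumptions, not the paper's mechanism.

```python
# Hedged sketch of adaptive token boundaries via similarity-based merging of
# adjacent token embeddings; threshold and pooling rule are assumptions.
import torch
import torch.nn.functional as F

def adaptive_chunks(embeddings, threshold=0.9):
    """embeddings: (n_tokens, dim). Returns a list of mean-pooled chunk vectors."""
    chunks, current = [], [embeddings[0]]
    for prev, nxt in zip(embeddings[:-1], embeddings[1:]):
        if F.cosine_similarity(prev, nxt, dim=0) >= threshold:
            current.append(nxt)                # same chunk: extend it
        else:
            chunks.append(torch.stack(current).mean(dim=0))
            current = [nxt]                    # similarity dropped: new boundary
    chunks.append(torch.stack(current).mean(dim=0))
    return chunks

tokens = torch.randn(12, 16)
print(len(adaptive_chunks(tokens)), "chunks from", tokens.size(0), "tokens")
```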

46 pages, 10909 KB  
Article
NDFNGO: Enhanced Northern Goshawk Optimization Algorithm for Image Segmentation
by Xiajie Zhao, Zuowen Bao, Yu Shao and Na Liang
Biomimetics 2025, 10(12), 837; https://doi.org/10.3390/biomimetics10120837 - 15 Dec 2025
Abstract
The gradual deterioration of fresco pictorial information presents a formidable obstacle for conservators dedicated to protecting humanity’s shared cultural legacy. Currently, scholars in the field of mural conservation predominantly focus on image segmentation techniques as a vital tool for facilitating mural restoration and protection. However, the existing image segmentation methods frequently fall short of delivering optimal segmentation results. To address this issue, this study introduces a novel mural image segmentation approach termed NDFNGO, which integrates a nonlinear differential learning strategy, a decay factor, and a Fractional-order adaptive learning strategy into the Northern Goshawk Optimization (NGO) algorithm to enhance segmentation performance. Firstly, the nonlinear differential learning strategy is incorporated to harness the diversity and adaptability of differential tactics, thereby augmenting the algorithm’s global exploration capabilities and effectively improving its ability to pinpoint optimal segmentation threshold regions. Secondly, drawing on the properties of nonlinear functions, a decay factor is proposed to achieve a more harmonious balance between the exploration and exploitation phases. Finally, by integrating historical individual data, the Fractional-order adaptive learning strategy is employed to reinforce the algorithm’s exploitation capabilities, thereby further refining the quality of image segmentation. Subsequently, the proposed method was evaluated through tests on twelve mural image segmentation tasks. The results indicate that the NDFNGO algorithm achieves victory rates of 95.85%, 97.9%, 97.9%, and 95.8% in terms of the fitness function metric, PSNR metric, SSIM metric, and FSIM metric, respectively. These findings demonstrate the algorithm’s high performance in mural image segmentation, as it retains a significant amount of original image information, thereby underscoring the superiority of the technology proposed in this study for addressing this challenge. Full article
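
The fitness that a multilevel-threshold search such as NDFNGO typically maximizes is an Otsu-style between-class variance computed from the image histogram for a candidate set of thresholds, as sketched below. The abstract does not state its exact objective, so Otsu here is an assumption.

```python
# Hedged sketch of a multilevel-threshold objective (Otsu between-class
# variance); a metaheuristic like NDFNGO would search over the thresholds.
import numpy as np

def between_class_variance(hist, thresholds):
    """hist: 256-bin normalized grayscale histogram; thresholds: sorted ints."""
    levels = np.arange(256)
    edges = [0, *sorted(thresholds), 256]
    total_mean = (hist * levels).sum()
    variance = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        w = hist[lo:hi].sum()                      # class probability
        if w == 0:
            continue
        mu = (hist[lo:hi] * levels[lo:hi]).sum() / w
        variance += w * (mu - total_mean) ** 2
    return variance

# Toy usage with a random image and two candidate thresholds:
img = np.random.randint(0, 256, size=(64, 64))
hist = np.bincount(img.ravel(), minlength=256) / img.size
print(between_class_variance(hist, thresholds=[85, 170]))
```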
