Search Results (1,752)

Search Parameters:
Keywords = text generation models

17 pages, 259 KB  
Article
Combating Economic Disinformation with AI: Insights from the EkonInfoChecker Project
by Vesna Buterin, Dragan Čišić and Ivan Gržeta
FinTech 2025, 4(4), 60; https://doi.org/10.3390/fintech4040060 (registering DOI) - 1 Nov 2025
Abstract
Economic disinformation causes significant harm, resulting in substantial losses for the global economy. Each year, it is estimated that around USD 78 billion is lost due to the spread of false or misleading information, with a major share stemming from stock market fluctuations and misguided decisions. In Croatia, the rapid spread of economic misinformation further threatens decision-making and institutional credibility. The EkonInfoChecker project was established to address this issue by combining human fact-checking with AI-based detection. This paper presents the project’s AI component, which adapts English-language datasets (FakeNews Corpus 1.0 and WELFake) into Croatian, yielding over 170,000 articles in economics, finance, and business. We trained and evaluated six models—FastText, NBSVM, BiGRU, BERT, DistilBERT, and the Croatian-specific BERTić—using precision, recall, F1-score, and ROC-AUC. Results show that transformer-based models consistently outperform traditional approaches, with BERTić achieving the highest accuracy, reflecting its advantage as a language-specific model. The study demonstrates that AI can effectively support fact-checking by pre-screening economic content and flagging high-risk items for human review. However, limitations include reliance on translated datasets, reduced performance on complex categories such as satire and pseudoscience, and challenges in generalizing to real-time Croatian media. These findings underscore the need for native datasets, hybrid human-AI workflows, and governance aligned with the EU AI Act. Full article
16 pages, 416 KB  
Article
From Claims to Stance: Zero-Shot Detection with Pragmatic-Aware Multi-Agent Reasoning
by Zhiyu Xie, Fuqiang Niu, Genan Dai and Bowen Zhang
Electronics 2025, 14(21), 4298; https://doi.org/10.3390/electronics14214298 (registering DOI) - 31 Oct 2025
Abstract
Stance detection aims to identify whether a text expresses a favorable, opposing, or neutral attitude toward a given target and has become increasingly important for analyzing public discourse on social media. Existing approaches, ranging from supervised neural models to prompt-based large language models (LLMs), face two persistent challenges: the scarcity of annotated stance data across diverse targets and the difficulty of generalizing to unseen targets under pragmatic and rhetorical variation. To address these issues, we propose PAMR (Pragmatic-Aware Multi-Agent Reasoning), a zero-shot stance detection framework that decomposes stance inference into structured reasoning steps. PAMR orchestrates three LLM-driven agents—a linguistic parser that extracts pragmatic markers and canonicalizes claims, an NLI-based estimator that produces calibrated stance probabilities through consensus voting, and a counterfactual and view-switching auditor that probes robustness under controlled rewrites. A stability-aware fusion integrates these signals, conservatively abstaining when evidence is uncertain or inconsistent. Experiments on SemEval-2016 and COVID-19-Stance show that PAMR achieves macro-F1 scores of 71.9% and 73.0%, surpassing strong zero-shot baselines (FOLAR and LogiMDF) by +2.0% and +3.1%. Ablation results confirm that pragmatic cues and counterfactual reasoning substantially enhance robustness and interpretability, underscoring the value of explicit reasoning and pragmatic awareness for reliable zero-shot stance detection on social media. Full article
(This article belongs to the Special Issue Advances in Social Bots)
17 pages, 1397 KB  
Article
A Novel Approach for Reliable Classification of Marine Low Cloud Morphologies with Vision–Language Models
by Ehsan Erfani and Farnaz Hosseinpour
Atmosphere 2025, 16(11), 1252; https://doi.org/10.3390/atmos16111252 (registering DOI) - 31 Oct 2025
Abstract
Marine low clouds have a strong impact on Earth’s system but remain a major source of uncertainty in anthropogenic radiative forcing simulated by general circulation models. This uncertainty arises from incomplete understanding of the many processes controlling their evolution and interactions. A key feature of these clouds is their diverse mesoscale morphologies, which are closely tied to their microphysical and radiative properties but remain difficult to characterize with satellite retrievals and numerical models. Here, we develop and apply a vision–language model (VLM) to classify marine low cloud morphologies using two independent datasets based on Moderate Resolution Imaging Spectroradiometer (MODIS) satellite imagery: (1) mesoscale cellular convection types of sugar, gravel, fish, and flower (SGFF; 8800 total samples) and (2) marine stratocumulus (Sc) types of stratus, closed cells, open cells, and other cells (260 total samples). By conditioning frozen image encoders on descriptive prompts, the VLM leverages multimodal priors learned from large-scale image–text training, making it less sensitive to limited sample size. Results show that the k-fold cross-validation of VLM achieves an overall accuracy of 0.84 for SGFF, comparable to prior deep learning benchmarks for the same cloud types, and retains robust performance under the reduction in SGFF training size. For the Sc dataset, the VLM attains 0.86 accuracy, whereas the image-only model is unreliable under such a limited training set. These findings highlight the potential of VLMs as efficient and accurate tools for cloud classification under very low samples, offering new opportunities for satellite remote sensing and climate model evaluation. Full article
14 pages, 3066 KB  
Article
Unpaired Image Captioning via Cross-Modal Semantic Alignment
by Yong Yang, Kai Zhou and Ge Ren
Appl. Sci. 2025, 15(21), 11588; https://doi.org/10.3390/app152111588 - 30 Oct 2025
Viewed by 67
Abstract
Image captioning, as a representative cross-modal task, faces significant challenges, including high annotation costs and modality alignment difficulties. To address these issues, this paper proposes CMSA, an image captioning framework that does not require paired image-text data. The framework integrates a generator, a discriminator, and a reward module, employing a collaborative multi-module optimization strategy to enhance caption quality. The generator builds multi-level joint feature representations based on a contrastive language-image pretraining model, effectively mitigating the modality alignment problem and guiding the language model to generate text highly consistent with image semantics. The discriminator learns linguistic styles from external corpora and evaluates textual naturalness, providing critical reward signals to the generator. The reward module combines image-text relevance and textual quality metrics, optimizing the generator parameters through reinforcement learning to further improve semantic accuracy and language expressiveness. CMSA adopts a progressive multi-stage training strategy that, combined with joint feature modeling and reinforcement learning mechanisms, significantly reduces reliance on costly annotated data. Experimental results demonstrate that CMSA significantly outperforms existing methods across multiple evaluation metrics on the MSCOCO and Flickr30k datasets, exhibiting superior performance and strong cross-dataset generalization ability. Full article
38 pages, 23830 KB  
Article
Improving Audio Steganography Transmission over Various Wireless Channels
by Azhar A. Hamdi, Asmaa A. Eyssa, Mahmoud I. Abdalla, Mohammed ElAffendi, Ali Abdullah S. AlQahtani, Abdelhamied A. Ateya and Rania A. Elsayed
J. Sens. Actuator Netw. 2025, 14(6), 106; https://doi.org/10.3390/jsan14060106 - 30 Oct 2025
Viewed by 147
Abstract
Ensuring the security and privacy of confidential data during transmission is a critical challenge, necessitating advanced techniques to protect against unwarranted disclosures. Steganography, a concealment technique, enables secret information to be embedded in seemingly harmless carriers such as images, audio, and video. This work proposes two secure audio steganography models based on the least significant bit (LSB) and discrete wavelet transform (DWT) techniques for concealing different types of multimedia data (i.e., text, image, and audio) in audio files, representing an enhancement of current research that tends to focus on embedding a single type of multimedia data. The first model (secured model (1)) focuses on high embedding capacity, while the second model (secured model (2)) focuses on improved security. The performance of the two proposed secure models was tested under various conditions. The models’ robustness was greatly enhanced using convolutional encoding with binary phase shift keying (BPSK). Experimental results indicated that the correlation coefficient (Cr) of the extracted secret audio in secured model (1) increased by 18.88% and by 16.18% in secured model (2) compared to existing methods. In addition, the Cr of the extracted secret image in secured model (1) was improved by 0.1% compared to existing methods. The peak signal-to-noise ratio (PSNR) of the steganography audio of secured model (1) was improved by 49.95% and 14.44% compared to secured model (2) and previous work, respectively. Furthermore, both models were evaluated in an orthogonal frequency division multiplexing (OFDM) system over various wireless channels, i.e., Additive White Gaussian Noise (AWGN), fading, and SUI-6 channels. In order to enhance the system performance, OFDM was combined with differential phase shift keying (DPSK) modulation and convolutional coding. 
The results demonstrate that secured model (1) is highly immune to noise generated by wireless channels and is the optimum technique for secure audio steganography on noisy communication channels. Full article
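As a rough illustration of the least-significant-bit (LSB) idea mentioned in the abstract above, the following minimal sketch hides a short byte string in integer audio samples and recovers it. It is not the authors' secured model: the function names `embed_lsb` and `extract_lsb`, the sample values, and the payload are all hypothetical, and no DWT, encryption, channel coding, or modulation is included.

```python
def embed_lsb(samples, payload):
    """Hide payload bytes in the least significant bit of successive samples."""
    # Unpack the payload into bits, most significant bit of each byte first.
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("carrier too short for payload")
    stego = list(samples)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(samples, n_bytes):
    """Rebuild n_bytes of payload from the lowest bit of the first 8*n_bytes samples."""
    out = bytearray()
    for i in range(n_bytes):
        byte = 0
        for sample in samples[8 * i: 8 * i + 8]:
            byte = (byte << 1) | (sample & 1)
        out.append(byte)
    return bytes(out)


# Hypothetical carrier: 32 integer samples standing in for 16-bit PCM audio.
carrier = [1000, -2001, 3002, -4003, 5004, -6005, 7006, -8007] * 4
stego = embed_lsb(carrier, b"hi")
recovered = extract_lsb(stego, 2)
max_change = max(abs(a - b) for a, b in zip(carrier, stego))
```

Each carrier sample changes by at most one quantization step, which is why plain LSB embedding offers high capacity with low audible distortion but little robustness on its own.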
22 pages, 1329 KB  
Article
Voices of Researchers: Ethics and Artificial Intelligence in Qualitative Inquiry
by Juan Luis Cabanillas-García, María Cruz Sánchez-Gómez and Irene del Brío-Alonso
Information 2025, 16(11), 938; https://doi.org/10.3390/info16110938 - 28 Oct 2025
Viewed by 281
Abstract
The rapid emergence of Generative Artificial Intelligence (GenAI) has sparked a growing debate about its ethical, methodological, and epistemological implications for qualitative research. This study aimed to examine and deeply understand researchers’ perceptions regarding the use of GenAI tools in different phases of the qualitative research process. The study involved a sample of 214 researchers from diverse disciplinary areas, with publications indexed in Web of Science or Scopus that apply qualitative methods. Data collection was conducted using an open-ended questionnaire, and analysis was carried out using coding and thematic analysis procedures, which allowed us to identify patterns of perception, user experiences, and barriers. The findings show that, while GenAI is valued for its ability to optimize tasks such as corpus organization, initial coding, transcription, translation, and information synthesis, its implementation raises concerns regarding privacy, consent, authorship, the reliability of results, and the loss of interpretive depth. Furthermore, a dual ecosystem is observed, where some researchers already incorporate it, mainly generative text assistants like ChatGPT, while others have yet to use it or are unfamiliar with it. Overall, the results suggest that the most solid path is an assisted model, supported by clear ethical frameworks, adapted methodological guidelines, and critical training for responsible and humanistic use. Full article
(This article belongs to the Special Issue Generative AI Technologies: Shaping the Future of Higher Education)
20 pages, 690 KB  
Article
VLM-as-a-Judge Approaches for Evaluating Visual Narrative Coherence in Historical Photographical Records
by Brian Keith, Claudio Meneses, Mauricio Matus, María Constanza Castro and Diego Urrutia
Electronics 2025, 14(21), 4199; https://doi.org/10.3390/electronics14214199 - 27 Oct 2025
Viewed by 197
Abstract
Evaluating the coherence of visual narrative sequences extracted from image collections remains a challenge in digital humanities and computational journalism. While mathematical coherence metrics based on visual embeddings provide objective measures, they require computational resources and technical expertise to interpret. We propose using vision-language models (VLMs) as judges to evaluate visual narrative coherence, comparing two approaches: caption-based evaluation that converts images to text descriptions and direct vision evaluation that processes images without intermediate text generation. Through experiments on 126 narratives from historical photographs, we show that both approaches achieve weak-to-moderate correlations with mathematical coherence metrics (r = 0.28–0.36) while differing in reliability and efficiency. Direct VLM evaluation achieves higher inter-rater reliability (ICC = 0.718 vs. 0.339) but requires 10.8× more computation time after initial caption generation. Both methods successfully discriminate between human-curated, algorithmically extracted, and random narratives, with all pairwise comparisons achieving statistical significance (p < 0.05, with five of six comparisons at p < 0.001). Human sequences consistently score highest, followed by algorithmic extractions, then random sequences. Our findings indicate that the choice between approaches depends on application requirements: caption-based for efficient large-scale screening versus direct vision for consistent curatorial assessment. Full article
(This article belongs to the Special Issue Artificial Intelligence-Driven Emerging Applications)
26 pages, 3155 KB  
Article
Symmetry and Asymmetry in Pre-Trained Transformer Models: A Comparative Study of TinyBERT, BERT, and RoBERTa for Chinese Educational Text Classification
by Munire Muhetaer, Xiaoyan Meng, Jing Zhu, Aixiding Aikebaier, Liyaer Zu and Yawen Bai
Symmetry 2025, 17(11), 1812; https://doi.org/10.3390/sym17111812 - 27 Oct 2025
Viewed by 283
Abstract
With the advancement of educational informatization, vast amounts of Chinese text are generated across online platforms and digital textbooks. Effectively classifying such text is essential for intelligent education systems. This study conducts a systematic comparative evaluation of three Transformer-based models—TinyBERT-4L, BERT-base-Chinese, and RoBERTa-wwm-ext—for Chinese educational text classification. Using a balanced four-category subset of the THUCNews corpus (Education, Technology, Finance, and Stock), the research investigates the trade-off between classification effectiveness and computational efficiency under a unified experimental framework. The experimental results show that RoBERTa-wwm-ext achieves the highest effectiveness (93.12% Accuracy, 93.08% weighted F1), validating the benefits of whole-word masking and extended pre-training. BERT-base-Chinese maintains a balanced performance (91.74% Accuracy, 91.66% F1) with moderate computational demand. These findings reveal a clear symmetry–asymmetry dynamic: structural symmetry arises from the shared Transformer encoder and identical fine-tuning setup, while asymmetry emerges from differences in model scale and pre-training strategy. This interplay leads to distinct accuracy–latency trade-offs, providing practical guidance for deploying pre-trained language models in resource-constrained intelligent education systems. Full article
(This article belongs to the Section Computer)
31 pages, 2985 KB  
Article
Heterogeneous Ensemble Sentiment Classification Model Integrating Multi-View Features and Dynamic Weighting
by Song Yang, Jiayao Xing, Zongran Dong and Zhaoxia Liu
Electronics 2025, 14(21), 4189; https://doi.org/10.3390/electronics14214189 - 27 Oct 2025
Viewed by 225
Abstract
With the continuous growth of user reviews, identifying underlying sentiment across multi-source texts efficiently and accurately has become a significant challenge in NLP. Traditional single models in cross-domain sentiment analysis often exhibit insufficient stability, limited generalization capabilities, and sensitivity to class imbalance. Existing ensemble methods predominantly rely on static weighting or voting strategies among homogeneous models, failing to fully leverage the complementary advantages between models. To address these issues, this study proposes a heterogeneous ensemble sentiment classification model integrating multi-view features and dynamic weighting. At the feature learning layer, the model constructs three complementary base learners, a RoBERTa-FC for extracting global semantic features, a BERT-BiGRU for capturing temporal dependencies, and a TextCNN-Attention for focusing on local semantic features, thereby achieving multi-level text representation. At the decision layer, a meta-learner is used to fuse multi-view features, and dynamic uncertainty weighting and attention weighting strategies are employed to adaptively adjust outputs from different base learners. Experimental results across multiple domains demonstrate that the proposed model consistently outperforms single learners and comparison methods in terms of Accuracy, Precision, Recall, F1 Score, and Macro-AUC. On average, the ensemble model achieves a Macro-AUC of 0.9582 ± 0.023 across five datasets, with an Accuracy of 0.9423, an F1 Score of 0.9590, and a Macro-AUC of 0.9797 on the AlY_ds dataset. Moreover, in cross-dataset ranking evaluation based on equally weighted metrics, the model consistently ranks within the top two, confirming its superior cross-domain adaptability and robustness. 
These findings highlight the effectiveness of the proposed framework in enhancing sentiment classification performance and provide valuable insights for future research on lightweight dynamic ensembles and on multilingual and multimodal applications. Full article
(This article belongs to the Section Artificial Intelligence)
22 pages, 1550 KB  
Article
Leveraging RAG with ACP & MCP for Adaptive Intelligent Tutoring
by Horia Alexandru Modran
Appl. Sci. 2025, 15(21), 11443; https://doi.org/10.3390/app152111443 - 26 Oct 2025
Viewed by 455
Abstract
This paper presents a protocol-driven hybrid architecture that integrates Retrieval-Augmented Generation (RAG) with two complementary protocols, a Model Context Protocol (MCP) and an Agent Communication Protocol (ACP), to deliver adaptive, transparent, and interoperable intelligent tutoring for higher-education STEM courses. MCP stores, fuses, and exposes session-, task-, and course-level context (learning goals, prior errors, instructor flags, and policy constraints), while ACP standardizes multipart messaging and orchestration among specialized tutor agents (retrievers, context managers, pedagogical policy agents, execution tools, and generators). A Python prototype indexes curated course materials (two course corpora: a text-focused PDF and a multimodal PDF/transcript corpus) into a vector store and applies MCP-mediated re-ranking (linear fusion of semantic similarity, MCP relevance, instructor tags, and recency) before RAG prompt assembly. In a held-out evaluation (240 annotated QA pairs) and human studies (36 students, 12 instructors), MCP-aware re-ranking improved Recall@1, increased citation fidelity, reduced unsupported numerical claims, and raised human ratings for factuality and pedagogical appropriateness. Case studies demonstrate improved context continuity, scaffolded hinting under instructor policies, and useful multimodal grounding. The paper concludes that the ACP–MCP–RAG combination enables more trustworthy, auditable, and pedagogically aligned tutoring agents and outlines directions for multimodal extensions, learned re-rankers, and large-scale institutional deployment. Full article
(This article belongs to the Special Issue Applied Machine Learning for Information Retrieval)
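The linear fusion re-ranking named in the abstract above can be sketched as a weighted sum over the four listed signals. This is only an illustration under stated assumptions: the `Candidate` fields, the weights in `WEIGHTS`, the document IDs, and the scores are hypothetical, since the listing does not give the prototype's actual weights or data model.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One retrieved passage with four signals, each assumed scaled to [0, 1]."""
    doc_id: str
    semantic_sim: float    # vector-store similarity to the query
    mcp_relevance: float   # match against stored session/task/course context
    instructor_tag: float  # 1.0 if flagged by an instructor, else 0.0
    recency: float         # newer material scores closer to 1.0


# Hypothetical weights; the real prototype's values are not given in the abstract.
WEIGHTS = {
    "semantic_sim": 0.5,
    "mcp_relevance": 0.3,
    "instructor_tag": 0.1,
    "recency": 0.1,
}


def fused_score(candidate, weights=WEIGHTS):
    """Linear fusion: weighted sum of the candidate's signals."""
    return sum(w * getattr(candidate, name) for name, w in weights.items())


def rerank(candidates, weights=WEIGHTS):
    """Return candidates ordered from highest to lowest fused score."""
    return sorted(candidates, key=lambda c: fused_score(c, weights), reverse=True)


candidates = [
    Candidate("lecture-03", 0.82, 0.20, 0.0, 0.5),
    Candidate("lab-guide-02", 0.74, 0.90, 1.0, 0.4),
    Candidate("forum-post-17", 0.60, 0.10, 0.0, 0.9),
]
ranked_ids = [c.doc_id for c in rerank(candidates)]
```

In this toy data the passage with the best raw semantic similarity (`lecture-03`) is overtaken by one that matches the stored context and carries an instructor tag, which is the kind of reordering a context-aware re-ranker is meant to produce.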
29 pages, 2242 KB  
Systematic Review
Artificial Intelligence for Optimizing Solar Power Systems with Integrated Storage: A Critical Review of Techniques, Challenges, and Emerging Trends
by Raphael I. Areola, Abayomi A. Adebiyi and Katleho Moloi
Electricity 2025, 6(4), 60; https://doi.org/10.3390/electricity6040060 - 25 Oct 2025
Viewed by 526
Abstract
The global transition toward sustainable energy has significantly accelerated the deployment of solar power systems. Yet, the inherent variability of solar energy continues to present considerable challenges in ensuring its stable and efficient integration into modern power grids. As the demand for clean and dependable energy sources intensifies, the integration of artificial intelligence (AI) with solar systems, particularly those coupled with energy storage, has emerged as a promising and increasingly vital solution. This review explores the practical applications of machine learning (ML), deep learning (DL), fuzzy logic, and emerging generative AI models, focusing on their roles in areas such as solar irradiance forecasting, energy management, fault detection, and overall operational optimisation. Alongside these advancements, the review also addresses persistent challenges, including data limitations, difficulties in model generalization, and the integration of AI in real-time control scenarios. We included peer-reviewed journal articles published between 2015 and 2025 that apply AI methods to PV + ESS, with empirical evaluation. We excluded studies lacking evaluation against baselines or those focusing solely on PV or ESS in isolation. We searched IEEE Xplore, Scopus, Web of Science, and Google Scholar up to 1 July 2025. Two reviewers independently screened titles/abstracts and full texts; disagreements were resolved via discussion. Risk of bias was assessed with a custom tool evaluating validation method, dataset partitioning, baseline comparison, overfitting risk, and reporting clarity. Results were synthesized narratively by grouping AI techniques (forecasting, MPPT/control, dispatch, data augmentation). We screened 412 records and included 67 studies published between 2018 and 2025, following a documented PRISMA process. The review revealed that AI-driven techniques significantly enhance performance in solar + battery energy storage system (BESS) applications. 
In solar irradiance and PV output forecasting, deep learning models, in particular long short-term memory (LSTM) and hybrid convolutional neural network–LSTM (CNN–LSTM) architectures, repeatedly outperform conventional statistical methods, obtaining significantly lower Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) and higher R-squared. Smarter energy dispatch and market-based storage decisions are made possible by reinforcement learning and deep reinforcement learning frameworks, which increase economic returns and lower curtailment risks. Furthermore, hybrid metaheuristic–AI optimisation improves control tuning and system sizing with increased efficiency and convergence. In conclusion, AI enables transformative gains in forecasting, dispatch, and optimisation for solar-BESSs. Future efforts should focus on explainable, robust AI models, standardized benchmark datasets, and real-world pilot deployments to ensure scalability, reliability, and stakeholder trust. Full article
37 pages, 10732 KB  
Review
Advances on Multimodal Remote Sensing Foundation Models for Earth Observation Downstream Tasks: A Survey
by Guoqing Zhou, Lihuang Qian and Paolo Gamba
Remote Sens. 2025, 17(21), 3532; https://doi.org/10.3390/rs17213532 - 24 Oct 2025
Viewed by 338
Abstract
Remote sensing foundation models (RSFMs) have demonstrated excellent feature extraction and reasoning capabilities under the self-supervised learning paradigm of “unlabeled datasets – model pre-training – downstream tasks”. These models achieve superior accuracy and performance compared to existing models across numerous open benchmark datasets. However, when confronted with multimodal data, such as optical, LiDAR, SAR, text, video, and audio, RSFMs exhibit limitations in cross-modal generalization and multi-task learning. Although several reviews have addressed RSFMs, there is currently no comprehensive survey dedicated to vision–X (vision, language, audio, position) multimodal RSFMs (MM-RSFMs). To fill this gap, this article provides a systematic review of MM-RSFMs from a novel perspective. First, the key technologies underlying MM-RSFMs are reviewed and analyzed, and the available multimodal RS pre-training datasets are summarized. Then, recent advances in MM-RSFMs are classified according to the development of backbone networks and the vision–X cross-modal interaction methods, such as vision–vision, vision–language, vision–audio, vision–position, and vision–language–audio. Finally, potential challenges are analyzed, and perspectives for MM-RSFMs are outlined. The survey reveals that current MM-RSFMs face the following key challenges: (1) a scarcity of high-quality multimodal datasets, (2) limited capability for multimodal feature extraction, (3) weak cross-task generalization, (4) absence of unified evaluation criteria, and (5) insufficient security measures. Full article
(This article belongs to the Section AI Remote Sensing)
15 pages, 1823 KB  
Article
Improved Quadtree-Based Selection of Single Images for 3D Generation
by Wanyun Li, Yue Liu, Yuqiang Fang, Yasheng Zhang, Yao Lu and Gege Sun
Sensors 2025, 25(21), 6559; https://doi.org/10.3390/s25216559 - 24 Oct 2025
Viewed by 393
Abstract
With the rapid development of large generative models for 3D content, image-to-3D and text-to-3D generation has become a major focus in computer vision and graphics. Single-view 3D reconstruction, in particular, offers a convenient and practical solution. However, reconstruction quality and efficiency depend on automatically choosing the best image from a large collection. This paper proposes a novel image selection framework based on a multi-feature fusion quadtree structure. Our approach integrates various visual and semantic features and uses a hierarchical quadtree to efficiently evaluate image content. This allows us to identify the most informative and reconstruction-friendly image from large datasets. We then use Tencent’s Hunyuan 3D model to verify that the selected image improves reconstruction performance. Experimental results show that our method outperforms existing approaches across key metrics. Baseline methods achieved average scores of 6.357 in Accuracy, 6.967 in Completeness, and 6.662 Overall. Our method reduced these to 4.238, 5.166, and 4.702, corresponding to an average error reduction of 29.5%. These results confirm that our approach reduces reconstruction errors, improves geometric consistency, and yields more visually plausible 3D models. Full article
(This article belongs to the Section Sensing and Imaging)
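The abstract above does not say how the 29.5% figure is averaged. A minimal sketch, assuming it is the mean of the per-metric relative reductions, reproduces it from the quoted scores; the variable names are illustrative only and not taken from the paper.

```python
# Error scores quoted in the abstract (lower is better).
baseline = {"Accuracy": 6.357, "Completeness": 6.967, "Overall": 6.662}
proposed = {"Accuracy": 4.238, "Completeness": 5.166, "Overall": 4.702}

# Assumption: "average error reduction" is the mean of the
# per-metric relative reductions (baseline - proposed) / baseline.
reductions = {
    name: (baseline[name] - proposed[name]) / baseline[name]
    for name in baseline
}
mean_reduction = sum(reductions.values()) / len(reductions)

for name, value in reductions.items():
    print(f"{name}: {value:.1%}")
print(f"Mean reduction: {mean_reduction:.1%}")
```

Under this reading the three metrics improve by roughly a third, a quarter, and somewhat under a third respectively, so the headline figure is not driven by a single metric.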
34 pages, 385 KB  
Review
Machine Learning in MRI Brain Imaging: A Review of Methods, Challenges, and Future Directions
by Martyna Ottoni, Anna Kasperczuk and Luis M. N. Tavora
Diagnostics 2025, 15(21), 2692; https://doi.org/10.3390/diagnostics15212692 - 24 Oct 2025
Viewed by 462
Abstract
In recent years, machine learning (ML) has been increasingly used in many fields, including medicine. Magnetic resonance imaging (MRI) is a non-invasive and effective diagnostic technique; however, manual image analysis is time-consuming and prone to human variability. In response, ML models have been developed to support MRI analysis, particularly in segmentation and classification tasks. This work presents an updated narrative review of ML applications in brain MRI, with a focus on tumor classification and segmentation. A literature search was conducted in the PubMed and Scopus databases and the Mendeley Catalog (MC), a publicly accessible bibliographic catalog linked to Elsevier’s Scopus indexing system, covering the period from January 2020 to April 2025. The included studies focused on patients with primary or secondary brain neoplasms and applied machine learning techniques to MRI data for classification or segmentation purposes. Only original research articles written in English and reporting model validation were considered. Studies using animal models, using non-imaging data, lacking proper validation, or without accessible full texts (e.g., abstract-only records or publications unavailable through institutional access) were excluded. In total, 108 studies met all inclusion criteria and were analyzed qualitatively. In general, models based on convolutional neural networks (CNNs) were found to dominate current research due to their ability to extract spatial features directly from imaging data. Reported classification accuracies ranged from 95% to 99%, while Dice coefficients for segmentation tasks varied between 0.83 and 0.94. Hybrid architectures (e.g., CNN-SVM, CNN-LSTM) achieved strong results in both classification and segmentation tasks, with accuracies above 95% and Dice scores around 0.90. Transformer-based models, such as the Swin Transformer, reached the highest reported performance, up to 99.9%.
Despite high reported accuracy, challenges remain regarding overfitting, generalization to real-world clinical data, and lack of standardized evaluation protocols. Transfer learning and data augmentation were frequently applied to mitigate limited data availability, while radiomics-based models introduced new avenues for personalized diagnostics. ML has demonstrated substantial potential in enhancing brain MRI analysis and supporting clinical decision-making. Nevertheless, further progress requires rigorous clinical validation, methodological standardization, and comparative benchmarking to bridge the gap between research settings and practical deployment.
(This article belongs to the Special Issue Brain/Neuroimaging 2025–2026)
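For readers unfamiliar with the Dice coefficient cited in the abstract above, the following is a minimal sketch of the standard definition for binary masks, Dice = 2|A ∩ B| / (|A| + |B|). The function name and the toy masks are hypothetical and are not drawn from any of the reviewed studies.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two equal-length binary masks (values 0 or 1)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    # Count positions where both masks mark the voxel as foreground.
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Convention assumed here: two empty masks count as perfect overlap.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical flattened masks for illustration only.
predicted_mask = [1, 1, 0, 0, 1]
reference_mask = [1, 0, 0, 1, 1]
print(dice_coefficient(predicted_mask, reference_mask))
```

A score of 1.0 means the predicted and reference masks coincide exactly, which is why values between 0.83 and 0.94 indicate substantial but imperfect overlap.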
27 pages, 5184 KB  
Article
Making Smart Cities Human-Centric: A Framework for Dynamic Resident Demand Identification and Forecasting
by Wen Zhang, Bin Guo, Wei Zhao, Yutong He and Xinyu Wang
Sustainability 2025, 17(21), 9423; https://doi.org/10.3390/su17219423 - 23 Oct 2025
Viewed by 328
Abstract
Smart cities offer new opportunities for urban governance and sustainable development. However, at the current stage, the construction and development of smart cities generally exhibit a technology-driven tendency, neglecting real resident demand, which contradicts the “human-centric” principle. Traditional top-down methods of demand collection struggle to capture the dynamics and heterogeneity of public demand. At the same time, government service platforms, as one dimension of smart city construction, have accumulated massive amounts of user-generated data, providing new solutions for this challenge. This paper aims to construct a big data-driven analytical framework for dynamically identifying and accurately forecasting core resident demand. The study uses Xi’an City, Shaanxi Province, China, as a case study, taking as its data source user messages from People.cn spanning 2011 to 2023, which cover domains including urban construction, healthcare, education, and transportation. The People.cn message board is China’s most significant nationwide online political platform. Its institutionalised feedback mechanism ensures data content focuses on highly representative specific grievances, rather than the broad emotional expressions on social media. First, a large language model (LLM) was used to preprocess and clean the raw data. Subsequently, the BERTopic model was applied to identify ten core demand themes and construct their monthly time series, thereby overcoming the limitations of traditional methods in short-text semantic recognition. Finally, by integrating variational mode decomposition (VMD) with support vector machines (SVMs), a hybrid demand forecasting model was established to mitigate the risk of overfitting in deep learning when forecasting small-sample time series.
The empirical results show that the proposed LLM-BERTopic-VMD-SVM framework exhibits excellent performance, with the goodness-of-fit (R²) on various demand themes ranging from 0.93 to 0.96. This study proposes an effective analytical framework for identifying and forecasting resident demand. It provides a decision-support tool for city managers to achieve proactive and fine-grained governance, thereby offering a viable empirical pathway to promote the transformation of smart cities from technology-centric to human-centric.
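The abstract above reports goodness-of-fit without defining it. A minimal sketch, assuming the usual coefficient of determination, R² = 1 − SS_res / SS_tot, is given below; the function name and the monthly counts are hypothetical and are not data from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted):
        raise ValueError("series must have the same length")
    mean_observed = sum(observed) / len(observed)
    # Residual sum of squares: error left after the forecast.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical monthly message counts for one demand theme.
observed_counts = [1.0, 2.0, 3.0, 4.0]
forecast_counts = [1.1, 1.9, 3.2, 3.8]
print(f"R^2 = {r_squared(observed_counts, forecast_counts):.2f}")
```

Values close to 1 mean the forecast explains nearly all of the month-to-month variation in the observed series.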