Journal Description
Information
Information
is a scientific, peer-reviewed, open access journal of information science and technology, data, knowledge, and communication, published monthly online by MDPI. The International Society for the Study of Information (IS4SI) is affiliated with Information and its members receive discounts on the article processing charges.
- Open Access— free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), Ei Compendex, dblp, and other databases.
- Journal Rank: JCR - Q2 (Computer Science, Information Systems) / CiteScore - Q2 (Information Systems)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 20.9 days after submission; acceptance to publication is undertaken in 3.6 days (median values for papers published in this journal in the second half of 2025).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
- Journal Cluster of Information Systems and Technology: Analytics, Applied System Innovation, Cryptography, Data, Digital, Informatics, Information, Journal of Cybersecurity and Privacy and Multimedia.
Impact Factor:
2.9 (2024);
5-Year Impact Factor:
3.0 (2024)
Latest Articles
Hierarchical Knowledge Distillation for Efficient Model Compression and Transfer: A Multi-Level Aggregation Approach
Information 2026, 17(1), 70; https://doi.org/10.3390/info17010070 (registering DOI) - 12 Jan 2026
Abstract
The success of large-scale deep learning models in remote sensing tasks has been transformative, enabling significant advances in image classification, object detection, and image–text retrieval. However, their computational and memory demands pose challenges for deployment in resource-constrained environments. Knowledge distillation (KD) alleviates these
[...] Read more.
The success of large-scale deep learning models in remote sensing tasks has been transformative, enabling significant advances in image classification, object detection, and image–text retrieval. However, their computational and memory demands pose challenges for deployment in resource-constrained environments. Knowledge distillation (KD) alleviates these issues by transferring knowledge from a strong teacher to a student model, which can be compact for efficient deployment or architecturally matched to improve accuracy under the same inference budget. In this paper, we introduce Hierarchical Multi-Segment Knowledge Distillation (HIMS_KD), a multi-stage framework that sequentially distills knowledge from a teacher into multiple assistant models specialized in low-, mid-, and high-level representations, and then aggregates their knowledge into the final student. We integrate feature-level alignment, auxiliary similarity-logit alignment, and supervised loss during distillation. Experiments on benchmark remote sensing datasets (RSITMD and RSICD) show that HIMS_KD improves retrieval performance and enhances zero-shot classification; and when a compact student is used, it reduces deployment cost while retaining strong accuracy.
Full article
(This article belongs to the Special Issue AI-Based Image Processing and Computer Vision)
►
Show Figures
Open AccessArticle
Efficient Quantization of Pretrained Deep Networks via Adaptive Block Transform Coding
by
Milan Dubljanin, Stefan Panić, Milan Savić, Milan Dejanović and Oliver Popović
Information 2026, 17(1), 69; https://doi.org/10.3390/info17010069 (registering DOI) - 12 Jan 2026
Abstract
This work investigates the effectiveness of block transform coding (BTC) as a lightweight, training-free quantization strategy for compressing the weights of pretrained deep neural networks. The proposed method applies a rule-based block transform with variance and root mean square error (RMSE)-driven stopping criteria,
[...] Read more.
This work investigates the effectiveness of block transform coding (BTC) as a lightweight, training-free quantization strategy for compressing the weights of pretrained deep neural networks. The proposed method applies a rule-based block transform with variance and root mean square error (RMSE)-driven stopping criteria, enabling substantial reductions in bit precision while preserving the statistical structure of convolutional and fully connected layer weights. Unlike uniform 8-bit quantization, BTC dynamically adjusts bit usage across layers and achieves significantly lower distortion for the same compression budget. We evaluate BTC across many pretrained architectures and tabular benchmarks. Experimental results show that BTC consistently reduces storage to 4–7.7 bits per weight while maintaining accuracy within 2–3% of the 32-bit floating point (FP32) baseline. To further assess scalability and baseline strength, BTC is additionally evaluated on large-scale ImageNet models and compared against a calibrated percentile-based uniform post-training quantization method. The results show that BTC achieves a substantially lower effective bit-width while incurring only a modest accuracy reduction relative to calibration-aware 8-bit quantization, highlighting a favorable compression–accuracy trade-off. BTC also exhibits stable behavior across successive post-training quantization (PTQ) configurations, low quantization noise, and smooth RMSE trends, outperforming naïve uniform quantization under aggressive compression. These findings confirm that BTC provides a scalable, architecture-agnostic, and training-free quantization mechanism suitable for deployment in memory- and computing-constrained environments.
Full article
(This article belongs to the Special Issue Feature Papers in Information in 2024–2025)
►▼
Show Figures

Graphical abstract
Open AccessArticle
A Machine Learning-Based AQM to Synergize Heterogeneous Congestion Control Algorithms
by
Ya Gao, Yunji Li and Chunjuan Diao
Information 2026, 17(1), 68; https://doi.org/10.3390/info17010068 (registering DOI) - 11 Jan 2026
Abstract
The coexistence of heterogeneous congestion control algorithms causes network unfairness and performance degradation. However, existing solutions suffer from the following issues: poor isolation reduces the overall performance, while sensitivity to tuning complicates deployment. In this work, we propose Warbler, a machine learning-driven active
[...] Read more.
The coexistence of heterogeneous congestion control algorithms causes network unfairness and performance degradation. However, existing solutions suffer from the following issues: poor isolation reduces the overall performance, while sensitivity to tuning complicates deployment. In this work, we propose Warbler, a machine learning-driven active queue management (AQM) framework. Warbler classifies flows based on traffic characteristics and utilizes machine learning to adaptively control the bandwidth allocation to improve fairness. We implemented and evaluated the Warbler prototype on a programmable switch. The experimental results show that Warbler significantly improves the network performance, achieving a near-optimal Jain’s fairness index of 0.99, while reducing the delay to 60% of the baseline, cutting jitter by half, and saving 43% of buffer usage. In terms of scalability, it supports 10,000 concurrent long flows with latency below 0.7 s. The Warbler has a low cost and strong adaptability with no need for precise tuning, demonstrating its potential in dealing with heterogeneous CCAs.
Full article
(This article belongs to the Section Information Systems)
►▼
Show Figures

Figure 1
Open AccessArticle
Breaking the Spatio-Temporal Mismatch: A Preemptive Deep Reinforcement Learning Framework for Misinformation Defense
by
Fulian Yin, Zhiqiang Zhang, Zhenyu Yu, Chang Wu, Junyi Chen and Yuewei Wu
Information 2026, 17(1), 67; https://doi.org/10.3390/info17010067 (registering DOI) - 11 Jan 2026
Abstract
The containment of misinformation diffusion on social media is a critical challenge in computational social science. However, prevailing intervention strategies predominantly rely on static topological metrics or time-agnostic learning models, thereby overlooking the profound impact of temporal–demographic heterogeneity. This oversight frequently results in
[...] Read more.
The containment of misinformation diffusion on social media is a critical challenge in computational social science. However, prevailing intervention strategies predominantly rely on static topological metrics or time-agnostic learning models, thereby overlooking the profound impact of temporal–demographic heterogeneity. This oversight frequently results in a “spatio-temporal mismatch”, where limited intervention resources are misallocated to structurally central but temporarily inactive nodes, particularly during non-stationary propagation bursts driven by exogenous triggers. To bridge this gap, we propose a Spatio-Temporal Deep Reinforcement Learning (ST-DRL) framework for proactive misinformation defense. By seamlessly integrating continuous trigonometric time encoding with demographic-aware Graph Attention Networks, our model explicitly captures the coupling dynamics between group-specific circadian rhythms and event-driven transmission surges. Extensive simulations on heterogeneous networks demonstrate that ST-DRL achieves a Peak Prevalence Reduction of 93.2%, significantly outperforming static heuristics and approaching the theoretical upper bound of oracle-assisted baselines. Crucially, interpretability analysis reveals that the agent autonomously evolves a “Preemptive Strike” strategy—prioritizing the sanitization of high-risk bridge nodes, such as bots, prior to event onsets—thus establishing a new paradigm for predictive rather than reactive network governance.
Full article
(This article belongs to the Special Issue AI and Machine Learning in the Big Data Era: Advanced Algorithms and Real-World Applications)
►▼
Show Figures

Figure 1
Open AccessArticle
Weighted Sampling Enclosing Subgraphs-Based Link Prediction in Attributed Graphs
by
Ganglin Hu
Information 2026, 17(1), 66; https://doi.org/10.3390/info17010066 (registering DOI) - 11 Jan 2026
Abstract
Link prediction is a fundamental problem for graphs, which can reveal the potential relationships between users. Graph embedding can easily encode graph structural relations, and heterogeneous attribute features in a continuous vector space, which is effective in link prediction. However, graph embedding methods
[...] Read more.
Link prediction is a fundamental problem for graphs, which can reveal the potential relationships between users. Graph embedding can easily encode graph structural relations, and heterogeneous attribute features in a continuous vector space, which is effective in link prediction. However, graph embedding methods for large-scale graphs suffer high computation and space costs, and sampling enclosing subgraphs is a practical yet efficient way to obtain the most features at the least cost. Nevertheless, the existing sampling techniques may lose essential features when the random sampling number of nodes is not large, as node features are assumed to follow a uniform distribution. In this paper, we propose a novel large-scale graph sampling strategy for link prediction named Weighted Sampling Enclosing subgraphs-based Link prediction (WSEL ) to resolve this issue, which maximumly preserves the structural and attribute features of enclosing subgraphs with less sampling. More specifically, we first extract the feature importance of each node in an enclosing subgraph and then take the node importance as node weight. Then, random walk node sequences are obtained by multiple weighted random walks from a target pair of nodes, generating a weighted sampling of enclosing subgraphs. By leveraging the weighted sampling enclosing subgraphs, WSEL can scale to larger graphs with much less overhead while maintaining some essential information of the original graph. Experiments on real-world datasets demonstrate that our model can scale to larger graphs while maintaining competitive link prediction performance under substantially reduced computational cost.
Full article
(This article belongs to the Special Issue Graph Neural Networks and Transformers for Intelligent Data-Driven Systems)
Open AccessArticle
An SQL Query Description Problem with AI Assistance for an SQL Programming Learning Assistant System
by
Ni Wayan Wardani, Nobuo Funabiki, Htoo Htoo Sandi Kyaw, Zihao Zhu, I Nyoman Darma Kotama, Putu Sugiartawan and I Nyoman Agus Suarya Putra
Information 2026, 17(1), 65; https://doi.org/10.3390/info17010065 - 9 Jan 2026
Abstract
Today, relational databases are widely used in information systems. SQL (structured query language) is taught extensively in universities and professional schools across the globe as a programming language for its data management and accesses. Previously, we have studied a web-based programming learning assistant
[...] Read more.
Today, relational databases are widely used in information systems. SQL (structured query language) is taught extensively in universities and professional schools across the globe as a programming language for its data management and accesses. Previously, we have studied a web-based programming learning assistant system (PLAS) to help novice students learn popular programming languages by themselves through solving various types of exercises. For SQL programming, we have implemented the grammar-concept understanding problem (GUP) and the comment insertion problem (CIP) for its initial studies. In this paper, we propose an SQL Query Description Problem (SDP) as a new exercise type for describing the SQL query to a specified request in a MySQL database system. To reduce teachers’ preparation workloads, we integrate a generative AI-assisted SQL query generator to automatically generate a new SDP instance with a given dataset. An SDP instance consists of a table, a set of questions and corresponding queries. Answer correctness is determined by enhanced string matching against an answer module that includes multiple semantically equivalent canonical queries. For evaluation, we generated 11 SDP instances on basic topics using the generator, where we found that Gemini 3.0 Pro exhibited higher pedagogical consistency compared to ChatGPT-5.0, achieving perfect scores in Sensibleness, Topicality, and Readiness metrics. Then, we assigned the generated instances to 32 undergraduate students at the Indonesian Institute of Business and Technology (INSTIKI). The results showed an average correct answer rate of 95.2% and a mean SUS score of 78, which demonstrates strong initial student performance and system acceptance.
Full article
(This article belongs to the Special Issue Generative AI Transformations in Industrial and Societal Applications)
►▼
Show Figures

Graphical abstract
Open AccessArticle
PhishCluster: Real-Time, Density-Based Discovery of Malicious URL Campaigns from Semantic Embeddings
by
Dimitrios Karapiperis, Georgios Feretzakis and Sarandis Mitropoulos
Information 2026, 17(1), 64; https://doi.org/10.3390/info17010064 - 9 Jan 2026
Abstract
The proliferation of algorithmically generated malicious URLs has overwhelmed traditional threat intelligence systems, necessitating a paradigm shift from reactive, single-instance analysis to proactive, automated campaign discovery. Existing systems excel at finding semantically similar URLs given a known malicious seed but fail to provide
[...] Read more.
The proliferation of algorithmically generated malicious URLs has overwhelmed traditional threat intelligence systems, necessitating a paradigm shift from reactive, single-instance analysis to proactive, automated campaign discovery. Existing systems excel at finding semantically similar URLs given a known malicious seed but fail to provide a real-time, macroscopic view of emerging and evolving attack campaigns from high-velocity data streams. This paper introduces PhishCluster, a novel framework designed to bridge this critical gap. PhishCluster implements a two-phase, online–offline architecture that synergistically combines large-scale Approximate Nearest Neighbor (ANN) search with advanced density-based clustering. The online phase employs an ANN-accelerated maintenance algorithm to process a stream of URL embeddings at unprecedented throughput, summarizing the data into compact, evolving Campaign Micro-Clusters (CMCs). The offline, on-demand phase then applies a hierarchical density-based algorithm to these CMCs, enabling the discovery of arbitrarily shaped, varying-density campaigns without prior knowledge of their number. Our comprehensive experimental evaluation on a synthetic billion-point dataset, designed to mimic real-world campaign dynamics, demonstrates that PhishCluster’s architecture resolves the fundamental trade-off between speed and quality in streaming data analysis. The results validate that PhishCluster achieves an order-of-magnitude improvement in processing throughput over state-of-the-art streaming clustering baselines while simultaneously attaining a superior clustering quality and campaign detection fidelity.
Full article
(This article belongs to the Section Information and Communications Technology)
Open AccessArticle
Unveiling the Impact of Mandatory IP Location Disclosure on Social Media Users’ Shared Emotions: A Regression Discontinuity Analysis Based on Weibo Data
by
Heng Zhang, Aiping Gao, Zhuyu Chen and Xinyuan Lu
Information 2026, 17(1), 63; https://doi.org/10.3390/info17010063 - 9 Jan 2026
Abstract
Social media serves as a vital channel for emotional expression, yet mandatory IP location disclosure raises concerns about how reducing anonymity affects users’ shared emotions, particularly in privacy-sensitive contexts such as mental health discussions. In 2022, all Chinese social media platforms implemented this
[...] Read more.
Social media serves as a vital channel for emotional expression, yet mandatory IP location disclosure raises concerns about how reducing anonymity affects users’ shared emotions, particularly in privacy-sensitive contexts such as mental health discussions. In 2022, all Chinese social media platforms implemented this disclosure feature. This study examines the emotional and behavioral consequences of Sina Weibo’s mandatory IP location disclosure policy, which took effect on 28 April 2022. We collected 193,761 Weibo posts published under the topic of depression from 1 March to 30 June 2022, and applied sentiment analysis combined with regression discontinuity in time (RDiT) to estimate causal effects around the policy threshold. Results indicate that the policy significantly intensified negative emotional expression: the estimated discontinuity is −1.3506 (p < 0.01), meaning posts became more negative immediately after implementation. In contrast, the effect on positive sentiment was comparatively weak and mostly statistically insignificant. Behavioral changes were also observed: both average daily posting volume and average text length are declined. These findings demonstrate that mandatory disclosure can suppress self-disclosure and amplify negative emotional tone in privacy-sensitive settings, offering practical guidance for users, platform designers, and policymakers on implementing transparency features responsibly.
Full article
(This article belongs to the Special Issue Machine Learning and Data Mining for User Classification, 2nd Edition)
►▼
Show Figures

Figure 1
Open AccessArticle
Lexicographic Preferences Similarity for Coalition Formation in Complex Markets: Introducing PLPSim, HRECS, ContractLex, PriceLex, F@Lex, and PLPGen
by
Faria Nassiri-Mofakham, Shadi Farid and Katsuhide Fujita
Information 2026, 17(1), 62; https://doi.org/10.3390/info17010062 - 9 Jan 2026
Abstract
►▼
Show Figures
Lexicographic preference trees (LP-Trees) provide a compact and expressive representation for modeling complex decision-making scenarios, yet measuring similarity between complete or partial structures remains a challenge. This study introduces PLPSim, a novel metric for quantifying alignment between partial lexicographic preference trees (PLP-Trees) and
[...] Read more.
Lexicographic preference trees (LP-Trees) provide a compact and expressive representation for modeling complex decision-making scenarios, yet measuring similarity between complete or partial structures remains a challenge. This study introduces PLPSim, a novel metric for quantifying alignment between partial lexicographic preference trees (PLP-Trees) and develops three coalition formation algorithms—HRECS1, HRECS2, and HRECS3—that leverage PLPSim to group agents with similar preferences. We further propose ContractLex and PriceLex protocols (comprising CLF, CFB, CFW, CFA, CFP) for coalition-based contract and pricing strategies, along with a new evaluation metric, F@Lex, which is designed to assess satisfaction under lexicographic preferences. To illustrate the framework, we generate a synthetic dataset (PLPGen) contextualized in a hybrid renewable energy market, where consumers’ PLP-Trees are aggregated and matched with suppliers’ tariff contracts. Experiments across 162 market scenarios, evaluated using Normalized Discounted Cumulative Gain (nDCG), Davies–Bouldin dispersion, and F@Lex, demonstrate that PLPSim-based coalitions outperform baseline approaches. The combination HRECS3 + CFP yields the highest consumer satisfaction, while HRECS3 + CFB achieves balanced satisfaction for both consumers and suppliers. While electricity tariffs and renewable energy contracts—static and dynamic—serve as the motivating example, the proposed framework generalizes to diverse multi-agent systems, offering a foundation for preference-driven coalition formation, adaptive policy design, and sustainable market optimization.
Full article

Graphical abstract
Open AccessArticle
Bidirectional Temporal Attention Convolutional Networks for High-Performance Network Traffic Anomaly Detection
by
Feng Wang, Yufeng Huang and Yifei Shi
Information 2026, 17(1), 61; https://doi.org/10.3390/info17010061 - 9 Jan 2026
Abstract
►▼
Show Figures
Deep learning-based network traffic anomaly detection, particularly using Recurrent Neural Networks (RNNs), often struggles with high computational overhead and difficulties in capturing long-range temporal dependencies. To address these limitations, this paper proposes a Bidirectional Temporal Attention Convolutional Network (Bi-TACN) for robust and efficient
[...] Read more.
Deep learning-based network traffic anomaly detection, particularly using Recurrent Neural Networks (RNNs), often struggles with high computational overhead and difficulties in capturing long-range temporal dependencies. To address these limitations, this paper proposes a Bidirectional Temporal Attention Convolutional Network (Bi-TACN) for robust and efficient network traffic anomaly detection. Specifically, dilated causal convolutions with expanding receptive fields and residual modules are employed to capture multi-scale temporal patterns while effectively mitigating the vanishing gradient. Furthermore, a bidirectional structure integrated with Efficient Channel Attention (ECA) is designed to adaptively weight contextual features, preventing sparse attack indicators from being overwhelmed by dominant normal traffic. A Softmax-based classifier then leverages these refined representations to execute high-performance anomaly detection. Extensive experiments on the NSL-KDD and UNSW-NB15 datasets demonstrate that Bi-TACN achieves average accuracies of 88.51% and 82.5%, respectively, significantly outperforming baseline models such as Bi-TCN and Bi-GRU in terms of both precision and convergence speed.
Full article

Figure 1
Open AccessArticle
Machine Learning Approaches for Early Student Performance Prediction in Programming Education
by
Seifeddine Bouallegue, Aymen Omri and Salem Al-Naemi
Information 2026, 17(1), 60; https://doi.org/10.3390/info17010060 - 8 Jan 2026
Abstract
Intelligent recommender systems are essential for identifying at-risk students and personalizing learning through tailored resources. Accurate prediction of student performance enables these systems to deliver timely interventions and data-driven support. This paper presents the application of machine learning models to predict final exam
[...] Read more.
Intelligent recommender systems are essential for identifying at-risk students and personalizing learning through tailored resources. Accurate prediction of student performance enables these systems to deliver timely interventions and data-driven support. This paper presents the application of machine learning models to predict final exam grades in a university-level programming course, leveraging multi-modal student data to improve prediction accuracy. In particular, a recent raw dataset of students enrolled in a programming course across 36 class sections from the Fall 2024 and Winter 2025 terms was initially processed. The data was collected up to one month before the final exam. From this data, a comprehensive set of features was engineered, including the student’s background, assessment grades and completion times, digital learning interactions, and engagement metrics. Building on this feature set, six machine learning prediction models were initially developed using data from the Fall 2024 term. Both training and testing were conducted on this dataset using cross-validation combined with hyperparameter tuning. The XGBoost model demonstrated strong performance, achieving an accuracy exceeding 91%. To assess the generalizability of the considered models, all models were retrained on the complete Fall 2024 dataset. They were then evaluated on an independent dataset from Winter 2025, with XGBoost achieving the highest accuracy, exceeding 84%. Feature importance analysis has revealed that the midterm grade and the average completion duration of lab assessments are the most influential predictors. This data-driven approach empowers instructors to proactively identify and support at-risk students, enabling adaptive learning environments that deliver personalized learning and timely interventions.
Full article
(This article belongs to the Special Issue Human–Computer Interactions and Computer-Assisted Education)
►▼
Show Figures

Graphical abstract
Open AccessArticle
Analysis of Japanese Twitter Posts Related to COVID-19 Vaccination Focusing on Frequently Occurring Words and Emotional Expressions
by
Keisuke Utsu and Osamu Uchida
Information 2026, 17(1), 59; https://doi.org/10.3390/info17010059 - 8 Jan 2026
Abstract
►▼
Show Figures
The Coronavirus Disease 2019 (COVID-19) pandemic and its prolonged effects have been widely discussed on social media, and these discussions have been analyzed in various studies. A long-term analysis of Twitter (now “X”) posts regarding COVID-19 vaccination is essential for informing policy and
[...] Read more.
The Coronavirus Disease 2019 (COVID-19) pandemic and its prolonged effects have been widely discussed on social media, and these discussions have been analyzed in various studies. A long-term analysis of Twitter (now “X”) posts regarding COVID-19 vaccination is essential for informing policy and improving public health communication strategies. In addition, to prevent the spread of infectious diseases, it is crucial to rapidly promote vaccination while mitigating the impact of negative sentiment toward vaccination on social media platforms. Therefore, identifying the key factors behind negative discussions is important for guiding policy decisions and shaping responses. In this study, we collected Japanese tweets (posts) containing the words “Corona” and “vaccine” that were posted from February 2021 to December 2022. The results indicate that negative sentiment was primarily driven by concerns about adverse reactions and general fear and anxiety, which were particularly prominent before vaccination for the general public began, as well as mentions of pain during and after vaccination. While concerns about adverse reactions persisted throughout the analysis period, their prominence decreased over time as positive reactions became more frequent. Our findings provide insights into the characteristics and key factors behind negative discussions on COVID-19 vaccination in the Japanese context and may help improve public health communication strategies.
Full article

Graphical abstract
Open AccessArticle
Prompt-Based Few-Shot Text Classification with Multi-Granularity Label Augmentation and Adaptive Verbalizer
by
Deling Huang, Zanxiong Li, Jian Yu and Yulong Zhou
Information 2026, 17(1), 58; https://doi.org/10.3390/info17010058 - 8 Jan 2026
Abstract
Few-Shot Text Classification (FSTC) aims to classify text accurately into predefined categories using minimal training samples. Recently, prompt-tuning-based methods have achieved promising results by constructing verbalizers that map input data to the label space, thereby maximizing the utilization of pre-trained model features. However,
[...] Read more.
Few-Shot Text Classification (FSTC) aims to classify text accurately into predefined categories using minimal training samples. Recently, prompt-tuning-based methods have achieved promising results by constructing verbalizers that map input data to the label space, thereby maximizing the utilization of pre-trained model features. However, existing verbalizer construction methods often rely on external knowledge bases, which require complex noise filtering and manual refinement, making the process time-consuming and labor-intensive, while approaches based on pre-trained language models (PLMs) frequently overlook inherent prediction biases. Furthermore, conventional data augmentation methods focus on modifying input instances while overlooking the integral role of label semantics in prompt tuning. This disconnection often leads to a trade-off where increased sample diversity comes at the cost of semantic consistency, resulting in marginal improvements. To address these limitations, this paper first proposes a novel Bayesian Mutual Information-based method that optimizes label mapping to retain general PLM features while reducing reliance on irrelevant or unfair attributes to mitigate latent biases. Based on this method, we propose two synergistic generators that synthesize semantically consistent samples by integrating label word information from the verbalizer to effectively enrich data distribution and alleviate sparsity. To guarantee the reliability of the augmented set, we propose a Low-Entropy Selector that serves as a semantic filter, retaining only high-confidence samples to safeguard the model against ambiguous supervision signals. Furthermore, we propose a Difficulty-Aware Adversarial Training framework that fosters generalized feature learning, enabling the model to withstand subtle input perturbations. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on most few-shot and full-data splits, with F1 score improvements of up to +2.8% on the standard AG’s News benchmark and +1.0% on the challenging DBPedia benchmark.
Full article
(This article belongs to the Collection Natural Language Processing and Applications: Challenges and Perspectives)
►▼
Show Figures

Graphical abstract
Open AccessArticle
Visitor Satisfaction at the Macau Science Center and Its Influencing Factors Based on Multi-Source Social Media Data
by
Jingwei Liang, Qingnian Deng, Yufei Zhu, Jiahai Liang, Chunhong Wu, Liang Zheng and Yile Chen
Information 2026, 17(1), 57; https://doi.org/10.3390/info17010057 - 8 Jan 2026
Abstract
With the rise in experience economy and the popularization of digital technology, user-generated content (UGC) has become a core data source for understanding tourist needs and evaluating the service quality of venues. As a landmark venue that combines science education, interactive experience, and
[...] Read more.
With the rise in experience economy and the popularization of digital technology, user-generated content (UGC) has become a core data source for understanding tourist needs and evaluating the service quality of venues. As a landmark venue that combines science education, interactive experience, and landscape viewing, the service quality of the Macau Science Center directly affects tourists’ travel experience and word-of-mouth dissemination. However, existing studies mostly rely on traditional questionnaire surveys and lack multi-technology collaborative analysis. In order to accurately identify the factors affecting satisfaction, this study uses 788 valid UGC data from five major platforms, namely Google Maps reviews, TripAdvisor, Sina Weibo, Xiaohongshu (Rednote), and Ctrip, from January 2023 to November 2025. It integrates word frequency analysis, semantic network analysis, latent Dirichlet allocation (LDA) topic modeling, and Valence Aware Dictionary and sEntiment Reasoner (VADER) sentiment computing to construct a systematic research framework. The study found that (1) the core attention dimensions of users cover the needs of parent–child and family visits, exhibitions and interactive experiences, ticketing and consumption services, surrounding environment and landscape, emotional evaluation, and recommendation intention. (2) The keyword association network has gradually developed from a loose network in the early stage to a comprehensive experience-dense network. (3) LDA analysis identified five main potential demand themes: comprehensive visiting experience and scenario integration, parent–child interaction and characteristic scenario experience, core venue facilities and ticketing services, visiting value and emotional evaluation, and transportation and surrounding landscapes. (4) User emotions were predominantly positive, accounting for 82.7%, while negative emotions were concentrated in local service details, and the emotional scores showed a fluctuating upward trend. This study provides targeted suggestions for the service optimization of the Macau Science Center and also provides a methodological reference for UGC-driven research in similar cultural venues.
Full article
(This article belongs to the Special Issue Social Media Mining: Algorithms, Insights, and Applications)
►▼
Show Figures

Figure 1
Open AccessArticle
Uncertainty-Aware Machine Learning for NBA Forecasting in Digital Betting Markets
by
Matteo Montrucchio, Enrico Barbierato and Alice Gatti
Information 2026, 17(1), 56; https://doi.org/10.3390/info17010056 - 8 Jan 2026
Abstract
►▼
Show Figures
This study introduces a fully uncertainty-aware forecasting framework for NBA games that integrates team-level performance metrics, rolling-form indicators, and spatial shot-chart embeddings. The predictive backbone is a recurrent neural network equipped with Monte Carlo dropout, yielding calibrated sequential probabilities. The model is evaluated
[...] Read more.
This study introduces a fully uncertainty-aware forecasting framework for NBA games that integrates team-level performance metrics, rolling-form indicators, and spatial shot-chart embeddings. The predictive backbone is a recurrent neural network equipped with Monte Carlo dropout, yielding calibrated sequential probabilities. The model is evaluated against strong baselines including logistic regression, XGBoost, convolutional models, a GRU sequence model, and both market-only and non-market-only benchmarks. All experiments rely on strict chronological partitioning (train ≤ 2022, validation 2023, test 2024), ablation tests designed to eliminate any circularity with bookmaker odds, and cross-season robustness checks spanning 2012–2024. Predictive performance is assessed through accuracy, Brier score, log-loss, AUC, and calibration metrics (ECE/MCE), complemented by SHAP-based interpretability to verify that only pre-game information influences predictions. To quantify economic value, calibrated probabilities are fed into a frictionless betting simulator using fractional-Kelly staking, an expected-value threshold, and bootstrap-based uncertainty estimation. Empirically, the uncertainty-aware model delivers systematically better calibration than non-Bayesian baselines and benefits materially from the combination of shot-chart embeddings and recent-form features. Economic value emerges primarily in less-efficient segments of the market: The fused predictor outperforms both market-only and non-market-only variants on moneylines, while spreads and totals show limited exploitable edge, consistent with higher pricing efficiency. Sensitivity studies across Kelly multipliers, EV thresholds, odds caps, and sequence lengths confirm that the findings are robust to modelling and decision-layer perturbations. The paper contributes a reproducible, decision-focused framework linking uncertainty-aware prediction to economic outcomes, clarifying when predictive lift can be monetized in NBA markets, and outlining methodological pathways for improving robustness, calibration, and execution realism in sports forecasting.
Full article

Graphical abstract
Open AccessArticle
RAG-Based Natural Language Interface for Goal-Oriented Knowledge Graphs and Its Evaluation
by
Kosuke Yano, Yoshinobu Kitamura and Kazuhiro Kuwabara
Information 2026, 17(1), 55; https://doi.org/10.3390/info17010055 - 7 Jan 2026
Abstract
Procedural knowledge is essential in specialized domains, and natural language tools for retrieving procedural knowledge are necessary for non-expert users to facilitate their understanding and learning. In this study, we focus on function decomposition trees, a framework for representing procedural knowledge, and propose
[...] Read more.
Procedural knowledge is essential in specialized domains, and natural language tools for retrieving procedural knowledge are necessary for non-expert users to facilitate their understanding and learning. In this study, we focus on function decomposition trees, a framework for representing procedural knowledge, and propose a natural language interface leveraging Retrieval-Augmented Generation (RAG). The natural language interface converts the user’s inputs into SPARQL queries, retrieving relevant data and subsequently presenting them in an accessible and chat-based format. Such a flexible and purpose-driven search facilitates users’ understanding of functions of artifacts or human actions and their performance of these actions. We demonstrate that the tool effectively retrieves actions, goals, and dependencies using an illustrative real-world example of a function decomposition tree. In addition, we evaluated the system by comparing it with ChatGPT 4o and Microsoft GraphRAG. The results suggest that the system can deliver responses that are both necessary and sufficient for users’ needs, while the outputs of other systems lack the key elements and return redundant information.
Full article
(This article belongs to the Special Issue Exploring Traditional and AI-Driven Approaches on Knowledge Graphs and Semantic Web Technologies)
►▼
Show Figures

Figure 1
Open AccessReview
Prompt Injection Attacks in Large Language Models and AI Agent Systems: A Comprehensive Review of Vulnerabilities, Attack Vectors, and Defense Mechanisms
by
Saidakhror Gulyamov, Said Gulyamov, Andrey Rodionov, Rustam Khursanov, Kambariddin Mekhmonov, Djakhongir Babaev and Akmaljon Rakhimjonov
Information 2026, 17(1), 54; https://doi.org/10.3390/info17010054 - 7 Jan 2026
Abstract
Large language models (LLMs) have rapidly transformed artificial intelligence applications across industries, yet their integration into production systems has unveiled critical security vulnerabilities, chief among them prompt injection attacks. This comprehensive review synthesizes research from 2023 to 2025, analyzing 45 key sources, industry
[...] Read more.
Large language models (LLMs) have rapidly transformed artificial intelligence applications across industries, yet their integration into production systems has unveiled critical security vulnerabilities, chief among them prompt injection attacks. This comprehensive review synthesizes research from 2023 to 2025, analyzing 45 key sources, industry security reports, and documented real-world exploits. We examine the taxonomy of prompt injection techniques, including direct jailbreaking and indirect injection through external content. The rise of AI agent systems and the Model Context Protocol (MCP) has dramatically expanded attack surfaces, introducing vulnerabilities such as tool poisoning and credential theft. We document critical incidents including GitHub Copilot’s CVE-2025-53773 remote code execution vulnerability (CVSS 9.6) and ChatGPT’s Windows license key exposure. Research demonstrates that just five carefully crafted documents can manipulate AI responses 90% of the time through Retrieval-Augmented Generation (RAG) poisoning. We propose PALADIN, a defense-in-depth framework implementing five protective layers. This review provides actionable mitigation strategies based on OWASP Top 10 for LLM Applications 2025, identifies fundamental limitations including the stochastic nature problem and alignment paradox, and proposes research directions for architecturally secure AI systems. Our analysis reveals that prompt injection represents a fundamental architectural vulnerability requiring defense-in-depth approaches rather than singular solutions.
Full article
(This article belongs to the Special Issue Emerging Trends in AI-Driven Cyber Security and Digital Forensics)
►▼
Show Figures

Graphical abstract
Open AccessArticle
Evaluating Intralingual Machine Translation Quality: Application of an Adapted MQM Scheme to German Plain Language
by
Silvana Deilen, Sergio Hernández Garrido, Ekaterina Lapshinova-Koltunski, Chris Maaß and Annie Werner
Information 2026, 17(1), 53; https://doi.org/10.3390/info17010053 - 6 Jan 2026
Abstract
This paper presents the results of a study in which we conducted a fine-grained error analysis for intralingual machine translations into Plain Language. As there are no established error schemes for intralingual translations, we adapted the MQM scheme to fit the purposes of
[...] Read more.
This paper presents the results of a study in which we conducted a fine-grained error analysis for intralingual machine translations into Plain Language. As there are no established error schemes for intralingual translations, we adapted the MQM scheme to fit the purposes of intralingual translation and expanded the scheme by error categories that are only relevant to intralingual translation. Our study has revealed that substantial differences exist between general-purpose and domain-specific models, with fine-tuned systems achieving notably higher accuracy and fewer severe errors across most categories. We found that across all four models, most errors occurred in the “Accuracy” category, closely followed by errors in the “Linguistic conventions” category and that all evaluated models produced persistent issues, particularly in terms of accuracy, linguistic conventions, and alignment with the target audience. In addition, we identified subcategories from the MQM scheme that are primarily relevant to interlingual translation, such as “Textual conventions”. Furthermore, we found that manual error annotation is resource-intensive and subjective, highlighting the urgent need for the development of automatic or semi-automatic error annotation tools. We also discuss difficulties that arose in the annotation process and show how methodological limitations might be overcome in future studies. Our findings provide practical directions for improving both machine translation technology and quality assurance frameworks for intralingual translation into Plain Language.
Full article
(This article belongs to the Special Issue Human and Machine Translation: Recent Trends and Foundations)
►▼
Show Figures

Figure 1
Open AccessArticle
TransferLearning-Driven Large-Scale CNN Benchmarking with Explainable AI for Image-Based Dust Detection on Solar Panels
by
Hafeez Anwar
Information 2026, 17(1), 52; https://doi.org/10.3390/info17010052 - 6 Jan 2026
Abstract
Solar panel power plants are typically established in regions with maximum solar irradiation, yet these conditions result in heavy dust accumulation on the panels causing significant performance degradation and reduced power output. The paper addresses this issue via an image-based dust detection solution
[...] Read more.
Solar panel power plants are typically established in regions with maximum solar irradiation, yet these conditions result in heavy dust accumulation on the panels causing significant performance degradation and reduced power output. The paper addresses this issue via an image-based dust detection solution powered by deep learning, particularly convolutional neural networks (CNNs). Most of such solutions use state-of-the-art CNNs either as backbones/features extractors, or propose custom models built upon them. Given such a reliance, future research requires a comprehensive benchmarking of CNN models to identify the ones that achieve superior performance on classifying clean vs. dusty solar panels both with respect to accuracy and efficiency. To this end, we evaluate 100 CNN models that belong to 16 families for image-based dust detection on solar panels, where the pre-trained models of these CNN architectures are used to encode solar panel images. Upon these image encodings, we then train and test a linear support vector machine (SVM) to determine the best-performing models in terms of classification accuracy and training time. The use of such a simple classifier ensures a fair comparison where the encodings do not benefit from the classifier itself and their performance reflects each CNN’s ability to capture the underlying image features. Experiments were conducted on a publicly available dust detection dataset, using stratified shuffle-split with 70–30, 80–20, and 90–10 splits, repeated 10 times. convnext_xxlarge and resnetv2_152 achieved the best classification rates of above 90%, with resnetv2_152 offering superior efficiency that is also supported by features analysis such as tSNE and UMAP, and explainableAI (XAI) such as LIME visualization. To prove their generalization capability, we tested the image encodings of resnetv2_152 on an unseen real-world image dataset captured via a drone camera, which achieved a remarkable accuracy of 96%. Consequently, our findings guide the selection of optimal CNN backbones for future image-based dust detection systems.
Full article
(This article belongs to the Special Issue Addressing Real-World Challenges in Recognition and Classification with Cutting-Edge AI Models and Methods)
►▼
Show Figures

Figure 1
Open AccessArticle
Insights on the Pedagogical Abilities of AI-Powered Tutors in Math Dialogues
by
Verónica Parra, Ana Corica and Daniela Godoy
Information 2026, 17(1), 51; https://doi.org/10.3390/info17010051 - 6 Jan 2026
Abstract
AI-powered tutors that interact with students in question-answering scenarios using large language models (LLMs) as foundational models for generating responses represent a potential scalable solution to the growing demand for one-to-one tutoring. In fields like mathematics, where students often face difficulties, sometimes leading
[...] Read more.
AI-powered tutors that interact with students in question-answering scenarios using large language models (LLMs) as foundational models for generating responses represent a potential scalable solution to the growing demand for one-to-one tutoring. In fields like mathematics, where students often face difficulties, sometimes leading to frustration, easy-to-use natural language interactions emerge as an alternative for enhancing engagement and providing personalized advice. Despite their promising potential, the challenges for LLM-based tutors in the math domain are twofold. First, the absence of genuine reasoning and generalization abilities in LLMs frequently results in mathematical errors, ranging from inaccurate calculations to flawed reasoning steps and even the appearance of contradictions. Second, the pedagogical capabilities of AI-powered tutors must be examined beyond simple question-answering scenarios since their effectiveness in math tutoring largely depends on their ability to guide students in building mathematical knowledge. In this paper, we present a study exploring the pedagogical aspects of LLM-based tutors through the analysis of their responses in math dialogues using feature extraction techniques applied to textual data. The use of natural language processing (NLP) techniques enables the quantification and characterization of several aspects of pedagogical strategies deployed in the answers, which the literature identifies as essential for engaging students and providing valuable guidance in mathematical problem-solving. The findings of this study have direct practical implications in the design of more effective math AI-powered tutors as they highlight the most salient characteristics of valuable responses and can thus inform the training of LLMs.
Full article
(This article belongs to the Special Issue AI Technology-Enhanced Learning and Teaching)
►▼
Show Figures

Figure 1
Journal Menu
► ▼ Journal Menu-
- Information Home
- Aims & Scope
- Editorial Board
- Reviewer Board
- Topical Advisory Panel
- Instructions for Authors
- Special Issues
- Topics
- Sections & Collections
- Article Processing Charge
- Indexing & Archiving
- Editor’s Choice Articles
- Most Cited & Viewed
- Journal Statistics
- Journal History
- Journal Awards
- Society Collaborations
- Conferences
- Editorial Office
Journal Browser
► ▼ Journal BrowserHighly Accessed Articles
Latest Books
E-Mail Alert
News
Topics
Topic in
AI, Computers, Electronics, Information, MAKE, Signals
Recent Advances in Label Distribution Learning
Topic Editors: Xin Geng, Ning Xu, Liangxiao JiangDeadline: 31 January 2026
Topic in
Applied Sciences, Computers, Electronics, Information, J. Imaging
Visual Computing and Understanding: New Developments and Trends
Topic Editors: Wei Zhou, Guanghui Yue, Wenhan YangDeadline: 31 March 2026
Topic in
Applied Sciences, Information, Systems, Technologies, Electronics, AI
Challenges and Opportunities of Integrating Service Science with Data Science and Artificial Intelligence
Topic Editors: Dickson K. W. Chiu, Stuart SoDeadline: 30 April 2026
Topic in
Electronics, Future Internet, Technologies, Telecom, Network, Microwave, Information, Signals
Advanced Propagation Channel Estimation Techniques for Sixth-Generation (6G) Wireless Communications
Topic Editors: Han Wang, Fangqing Wen, Xianpeng WangDeadline: 31 May 2026
Conferences
Special Issues
Special Issue in
Information
Smarter Systems: Innovations at the Intersection of AI and Sensor Technology
Guest Editor: Joseph KushDeadline: 15 January 2026
Special Issue in
Information
Machine Learning for the Blockchain
Guest Editors: Georgios Alexandridis, Thanasis Papaioannou, Georgios Siolas, Paraskevi TzouveliDeadline: 31 January 2026
Special Issue in
Information
Emerging Applications of Machine Learning in Healthcare, Industry, and Beyond
Guest Editors: Francesco Isgrò, Huiyu Zhou, Daniele RaviDeadline: 31 January 2026
Special Issue in
Information
Selected Papers of the 10th North American International Conference on Industrial Engineering and Operations Management
Guest Editors: Luis Rabelo, Shahram TajDeadline: 31 January 2026
Topical Collections
Topical Collection in
Information
Knowledge Graphs for Search and Recommendation
Collection Editors: Pierpaolo Basile, Annalina Caputo
Topical Collection in
Information
Augmented Reality Technologies, Systems and Applications
Collection Editors: Ramon Fabregat, Jorge Bacca-Acosta, N.D. Duque-Mendez
Topical Collection in
Information
Natural Language Processing and Applications: Challenges and Perspectives
Collection Editor: Diego Reforgiato Recupero




