MDPI - Publisher of Open Access Journals

27 pages, 2316 KB

Open Access Article

Linking Customer Sentiment to Patent-Based Solutions Through Semantic Analysis

by Sakire Nesli Demircioglu, Serkan Altuntas and Turkay Dereli

Appl. Sci. 2026, 16(5), 2570; https://doi.org/10.3390/app16052570 - 7 Mar 2026

Viewed by 198

Analyzing customer feedback is critical for identifying unmet needs in product development and innovation processes. However, current studies often focus only on identifying customer-expressed problems, neglecting to systematically match these problems with technological solutions and transform them into potential product features. This study aims to propose a sentiment and semantic analysis-based approach that correlates problems derived from customer feedback with patent-based solutions. The proposed approach utilizes Aspect-Based Sentiment Analysis to identify unmet needs from customer feedback, the BERTopic algorithm to extract solution-oriented themes from patent documents, and short text semantic similarity methods to associate problem-solution pairs. The applicability of the approach is demonstrated using 476 customer product reviews and 3548 patents in the Heating, Ventilation, and Air Conditioning (HVAC) field. The results show that customer-expressed problems can be semantically correlated with patent-based technological solutions, and these matches contribute to the identification of potential product features. The resulting problem-solution matches are structured along technological development horizons and presented as a technology roadmap output. The proposed approach offers a framework supporting systematic problem–solution matching based on sentiment and semantic analysis in technology-intensive sectors with large volumes of unstructured text data. Full article

(This article belongs to the Special Issue Advancements in Natural Language Processing, Semantic Networks, and Sentiment Analysis: 2nd Edition)

22 pages, 1311 KB

Open Access Systematic Review

Simulation and Predictive Environmental Modeling for Marine Forecasting: A Systematic Review

by Annamaria Souri and Angelika Kokkinaki

J. Mar. Sci. Eng. 2026, 14(5), 493; https://doi.org/10.3390/jmse14050493 - 4 Mar 2026

Viewed by 415

Abstract

Coastal and marine systems are governed by fragile water-quality dynamics, where disturbances can trigger harmful algal blooms with significant ecological and societal consequences. These pressures have intensified interest in forecasting systems that can anticipate bloom development and support environmental management. This study presents a systematic review of simulation-based and predictive environmental modeling approaches used for marine forecasting of water quality and harmful algal bloom phenomena. Following PRISMA guidelines, 11,185 records were identified, 127 articles were screened in full text for eligibility, and 40 peer-reviewed studies published between 2015 and 2025 were included and synthesized using a structured extraction framework capturing modeling paradigms, forecast targets, data inputs, spatial and temporal scope, validation practices, operational context, and reported limitations. The reviewed literature indicates the dominance of predictive and hybrid modeling approaches, with forecasting efforts primarily focused on coastal systems and short-term applications. Harmful algal blooms and chlorophyll-a emerge as dominant forecast targets, commonly supported by satellite observations, in situ measurements, and environmental forcing variables. Despite substantial methodological advances, persistent challenges related to data availability and quality, validation rigor, system integration, and operational deployment remain evident across modeling paradigms. Overall, the findings suggest that while marine forecasting models have become increasingly sophisticated, their translation into reliable and operational systems remains uneven, highlighting the need for closer alignment. Full article

(This article belongs to the Section Marine Environmental Science)

30 pages, 573 KB

Open Access Article

Managerial Myopia, Willingness for Proactive Risk-Taking, and Digital Transformation in Commercial Banks: Evidence from China

by Yuanyuan Huo, Shengnan Wang and Wenlong Miao

Int. J. Financial Stud. 2026, 14(3), 56; https://doi.org/10.3390/ijfs14030056 - 2 Mar 2026

Viewed by 256

Abstract

Digital transformation in commercial banks is a critical enabler of modern financial development. While technological advancement and resource allocation are key drivers, managerial attributes also play a decisive role in shaping transformation trajectories. Managerial myopia—often arising from short-term performance pressures, evolving regulatory expectations, and cyclical macroeconomic conditions—warrants particular attention. This study examines how managerial myopia constrains banks’ digital transformation by analyzing its direct impact, underlying behavioral mechanisms, and contingent boundary conditions. Using panel data from 55 Chinese listed commercial banks from 2010 to 2021, we construct a text-based measure of managerial myopia through linguistic analysis of annual reports and employ fixed-effects models for estimation. The results show that a short-term managerial orientation significantly impedes digital transformation, primarily by reducing banks’ propensity for proactive risk-taking. However, this inhibitory effect weakens when managers anticipate longer tenures, management teams exhibit greater diversity in overseas experience and functional expertise, or the average educational level is higher. Moreover, the adverse effects are less pronounced in larger banks and those with stronger corporate governance. Increased external scrutiny and intensified market competition further mitigate this negative influence. These findings offer actionable insights for banking stakeholders aiming to strengthen governance, extend managerial time horizons, and foster an innovation-oriented culture conducive to sustained digital advancement. Full article

(This article belongs to the Special Issue InsurTech and FinTech Innovations: Transforming Risk Management and Governance in the Digital Era)

40 pages, 4394 KB

Open Access Article

Forecasting the Price of Gold with Integrated Media Sentiment—A Prediction Framework Based on Online News Sentiment Mining with CNN-QRLSTM

by Yu Ji, Xinyue Lei, Lining Zhang, Jiani Heng and Jianwei Fan

Entropy 2026, 28(3), 271; https://doi.org/10.3390/e28030271 - 28 Feb 2026

Viewed by 270

Abstract

Accurate gold price forecasting is crucial for economic stability and investment decision-making. To improve the accuracy of gold price prediction and quantify the uncertainty of gold price fluctuation, this paper proposes a hybrid model (CNN-QRLSTM) that integrates a convolutional neural network (CNN) with a quantile regression long short-term memory network (QRLSTM) and introduces news text data to quantify media sentiment. We combine ensemble empirical mode decomposition (EEMD) with the Hurst index to remove white noise from the original signal, and the processed data are used as the input to the prediction model. Furthermore, to demonstrate the impact of news sentiment on gold prices, this paper employs entropy measurement methods based on information theory to quantify the uncertainty and information content embedded within processed gold price sequences and derived sentiment indicators. The mutual information (MI) algorithm, based on information entropy, captures the nonlinear correlations between financial keywords and market sentiment. It constructs a financial sentiment lexicon (covering keywords such as economic policies and geopolitical conflicts), combines semantic rules with context-weighted strategies, calculates sentiment scores for news texts, and generates daily aggregated media sentiment indicators. This entropy-based perception method not only enhances the interpretability of emotion-driven fluctuations but also provides a theoretical foundation for reducing prediction uncertainty through multi-source data fusion. The experiments use daily London gold spot prices and Shanghai Gold Exchange gold prices for 2022–2025, together with gold market news from the Gold Investment Network for the same period. The empirical study shows that combining multi-source data fusion with the quantile regression mechanism can improve the accuracy of gold price prediction and support risk interpretation, while providing theoretical support for the formulation of quantitative investment strategies. Full article

(This article belongs to the Section Multidisciplinary Applications)

27 pages, 1058 KB

Open Access Article

An AI-Driven Multimodal Sensor Fusion Framework for Fraud Perception in Short-Video and Live-Streaming Platforms

by Ruixiang Zhao, Xuanhao Zhang, Jinfan Yang, Haofei Li, Zhengjia Lu, Wenrui Xu and Manzhou Li

Sensors 2026, 26(5), 1525; https://doi.org/10.3390/s26051525 - 28 Feb 2026

Viewed by 276

Abstract

With the rapid proliferation of short-video platforms and live-streaming commerce ecosystems, marketing activities are increasingly manifested through complex multimodal sensing signals. These heterogeneous sensor data streams exhibit strong temporal dependency, high cross-modal coupling, and progressive evolutionary characteristics, making early-stage fraud perception particularly challenging for conventional unimodal or static analytical paradigms. Existing approaches often fail to effectively capture weak anomalous cues emerging across multimodal channels during the initial stages of fraudulent campaigns. To address these limitations, an artificial intelligence-driven multimodal sensor perception framework is proposed for temporal fraud detection in short-video environments. A multimodal temporal alignment module is first designed to synchronize heterogeneous sensor signals with inconsistent sampling granularities. Subsequently, a shared temporal encoding network is constructed to learn evolution-aware representations across multimodal sensor sequences. On this basis, a cross-modal temporal attention fusion mechanism is introduced to dynamically weight sensor contributions at different behavioral stages. Finally, a fraud evolution modeling and early risk prediction module is developed to characterize the progressive intensification of fraudulent activities and to enable risk assessment under incomplete temporal observations. Extensive experiments conducted on real-world datasets collected from multiple mainstream short-video platforms demonstrate the effectiveness of the proposed AI-driven sensing framework. The model achieves an overall accuracy of 0.941, precision of 0.865, recall of 0.812, and F1 score of 0.838, with the AUC further reaching 0.956, significantly outperforming text-based, vision-based, temporal, and conventional multimodal baselines. 
In early-stage detection scenarios utilizing only the first 30% of video content, the framework maintains stable performance advantages, achieving a precision of 0.812, recall of 0.704, and F1 score of 0.754, validating its capability for proactive fraud warning. Full article
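The reported F1 scores can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below is only an arithmetic check on the figures quoted in the abstract; the function name `f1_score` is chosen here for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall exactly as quoted in the abstract.
overall_f1 = f1_score(0.865, 0.812)   # full-video setting
early_f1 = f1_score(0.812, 0.704)     # first 30% of video content

print(round(overall_f1, 3), round(early_f1, 3))
```

Both values round to the F1 scores stated in the abstract (0.838 and 0.754), so the three reported metrics are mutually consistent in each setting.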

(This article belongs to the Special Issue Artificial Intelligence-Driven Sensing)

25 pages, 1558 KB

Open Access Article

Towards Scalable Monitoring: An Interpretable Multimodal Framework for Migration Content Detection on TikTok Under Data Scarcity

by Dimitrios Taranis, Gerasimos Razis and Ioannis Anagnostopoulos

Electronics 2026, 15(4), 850; https://doi.org/10.3390/electronics15040850 - 17 Feb 2026

Viewed by 325

Abstract

Short-form video platforms such as TikTok (TikTok Pte. Ltd., Singapore) host large volumes of user-generated, often ephemeral, content related to irregular migration, where relevant cues are distributed across visual scenes, on-screen text, and multilingual captions. Automatically identifying migration-related videos is challenging due to this multimodal complexity and the scarcity of labeled data in sensitive domains. This paper presents an interpretable multimodal classification framework designed for deployment under data-scarce conditions. We extract features from platform metadata, automated video analysis (Google Cloud Video Intelligence), and Optical Character Recognition (OCR) text, and compare text-only, OCR-only, and vision-only baselines against a multimodal fusion approach using Logistic Regression, Random Forest, and XGBoost. In this pilot study, multimodal fusion consistently improves class separation over single-modality models, achieving an F1-score of 0.92 for the migration-related class under stratified cross-validation. Given the limited sample size, these results are interpreted as evidence of feature separability rather than definitive generalization. Feature importance and SHAP analyses identify OCR-derived keywords, maritime cues, and regional indicators as the most influential predictors. To assess robustness under data scarcity, we apply SMOTE to synthetically expand the training set to 500 samples and evaluate performance on a small held-out set of real videos, observing stable results that further support feature-level robustness. Finally, we demonstrate scalability by constructing a weakly labeled corpus of 600 videos using the identified multimodal cues, highlighting the suitability of the proposed feature set for weakly supervised monitoring at scale. Overall, this work serves as a methodological blueprint for building interpretable multimodal monitoring pipelines in sensitive, low-resource settings. Full article

(This article belongs to the Special Issue Multimodal Learning for Multimedia Content Analysis and Understanding)

17 pages, 490 KB

Open Access Article

Virtual-Document-Augmented Retrieval-Augmented Generation for Power-Domain Knowledge-Base Question Answering with Noise-Enhanced Robustness

by Yanwen Chen, Xiong Luo, Ying Zhou, Qiaojuan Peng, Yuqi Yuan, Ke Chen and Yinghui Liu

Processes 2026, 14(4), 670; https://doi.org/10.3390/pr14040670 - 15 Feb 2026

Viewed by 313

Abstract

Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to access external knowledge bases by introducing external documents, enhancing their capability for knowledge question answering in professional domains and generating more reliable responses. It effectively addresses issues such as LLM hallucinations and knowledge obsolescence. In the electric power domain, RAG can be leveraged to fully utilize accumulated corporate data and resources. However, in the retrieval phase of RAG, there are significant differences in the semantic space representation between short sentences and long text documents. Additionally, when generating answers based on retrieved relevant documents, the generator prioritizes highly relevant document fragments, a strategy that may overlook sub-relevant documents containing useful information. This paper uses an LLM to generate hypothetical documents. These documents are combined with the original question to perform similarity retrieval in the corpus, followed by the first round of answer generation. Subsequently, the original question is combined with the answer generated in the first round, and this combined content is used to retrieve relevant documents. Finally, irrelevant documents are added to the context of the retrieved relevant documents to enhance the LLM’s attention to the relevant documents. Based on these strategies, experiments are conducted on an electricity dataset. The results show that, compared with the naive RAG method, the proposed model achieves a relative improvement of 4.63% in the ROUGE-L metric and 11.32% in the BLEU-4 metric on this dataset. Experiments on the public CMRC dataset further verify the effectiveness of the proposed method. Full article
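The retrieval steps described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `generate` is a hypothetical callable standing in for any LLM, `overlap` is a toy token-overlap score standing in for embedding similarity, and picking the lowest-ranked documents as the "irrelevant" ones is an assumption made here for the example.

```python
from typing import Callable, List, Tuple


def overlap(a: str, b: str) -> float:
    """Toy Jaccard similarity on lowercase tokens (stand-in for embeddings)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def rank(query: str, corpus: List[str]) -> List[str]:
    """Return the corpus ordered from most to least similar to the query."""
    return sorted(corpus, key=lambda doc: overlap(query, doc), reverse=True)


def two_round_rag(
    question: str,
    corpus: List[str],
    generate: Callable[[str], str],  # hypothetical LLM interface
    k: int = 2,
    n_noise: int = 1,
) -> Tuple[str, List[str], List[str]]:
    # Step 1: ask the model for a hypothetical document answering the question.
    hypothetical = generate("Write a short passage that answers: " + question)

    # Step 2: retrieve with question + hypothetical document, then draft an answer.
    first_docs = rank(question + " " + hypothetical, corpus)[:k]
    draft = generate("Context: " + " | ".join(first_docs) + "\nQuestion: " + question)

    # Step 3: retrieve again with question + first-round answer.
    ranked = rank(question + " " + draft, corpus)
    relevant = ranked[:k]

    # Step 4: add some irrelevant documents to the context (assumed here to be
    # the lowest-ranked ones) before the final generation.
    leftovers = ranked[k:]
    noise = leftovers[-n_noise:] if n_noise > 0 else []
    context = relevant + noise
    answer = generate("Context: " + " | ".join(context) + "\nQuestion: " + question)
    return answer, relevant, noise


def fake_llm(prompt: str) -> str:
    """Deterministic placeholder so the sketch runs without a real model."""
    return "transformer oil temperature"


demo_corpus = [
    "transformer oil temperature limits",
    "relay protection settings",
    "cafeteria lunch menu",
]
demo_answer, demo_relevant, demo_noise = two_round_rag(
    "What is the transformer oil limit?", demo_corpus, fake_llm, k=1, n_noise=1
)
print(demo_relevant, demo_noise)
```

Passing the model in as a callable keeps the sketch independent of any particular LLM library; a real system would replace `overlap` with a dense retriever and `fake_llm` with an actual model call.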

(This article belongs to the Section AI-Enabled Process Engineering)

22 pages, 1730 KB

Open Access Article

Toward a Hybrid Intrusion Detection Framework for IIoT Using a Large Language Model

by Musaad Algarni, Mohamed Y. Dahab, Abdulaziz A. Alsulami, Badraddin Alturki and Raed Alsini

Sensors 2026, 26(4), 1231; https://doi.org/10.3390/s26041231 - 13 Feb 2026

Viewed by 385

Abstract

The widespread connectivity of the Industrial Internet of Things (IIoT) improves the efficiency and functionality of connected devices. However, it also raises serious concerns about cybersecurity threats. Implementing an effective intrusion detection system (IDS) for IIoT is challenging due to heterogeneous data, high feature dimensionality, class imbalance, and the risk of data leakage during evaluation. This paper presents a leakage-safe hybrid intrusion detection framework that combines text-based and numerical network flow features in an IIoT environment. Each network flow is converted into a short text description and encoded using a frozen Large Language Model (LLM) called the Bidirectional Encoder Representations from Transformers (BERT) model to obtain fixed semantic embeddings, while numerical traffic features are standardized in parallel. To improve class separation, class prototypes are computed in Principal Component Analysis (PCA) space, and cosine similarity scores for these prototypes are added to the feature set. Class imbalance is handled only in the training data using the Synthetic Minority Over-sampling Technique (SMOTE). A Random Forest (RF) is used to select the top features, followed by a Histogram-based Gradient Boosting (HGB) classifier for final prediction. The proposed framework is evaluated on the Edge-IIoTset and ToN_IoT datasets and achieves promising results. Empirically, the framework attains 98.19% accuracy on Edge-IIoTset and 99.15% accuracy on ToN_IoT, indicating robust, leakage-safe performance. Full article
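The prototype-similarity step in this abstract can be illustrated with a small sketch. It is an assumption-laden simplification rather than the paper's pipeline: prototypes are taken as per-class mean vectors, the PCA projection, BERT embeddings, SMOTE, and classifiers are omitted, and the function names and toy vectors are hypothetical. The one point it does preserve is that prototypes are computed from training data only, which is what keeps the added features leakage-safe.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def class_prototypes(X: Sequence[Sequence[float]], y: Sequence[str]) -> Dict[str, List[float]]:
    """Mean feature vector per class, computed from TRAINING rows only."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def add_prototype_features(
    X: Sequence[Sequence[float]], prototypes: Dict[str, List[float]]
) -> List[List[float]]:
    """Append one cosine-similarity feature per class (classes in sorted order)."""
    labels = sorted(prototypes)
    return [list(row) + [cosine(row, prototypes[label]) for label in labels] for row in X]


# Hypothetical toy data: two features, two classes.
X_train = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
y_train = ["benign", "benign", "attack", "attack"]
X_test = [[1.0, 0.0]]

protos = class_prototypes(X_train, y_train)      # fitted on the training split only
X_test_aug = add_prototype_features(X_test, protos)
print(X_test_aug)
```

Each augmented row carries its original features followed by its similarity to the "attack" and "benign" prototypes, in that order.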

(This article belongs to the Special Issue AI, Machine Learning (ML), and Large Language Models (LLMs) for Cybersecurity in Sensor Networks)

27 pages, 2824 KB

Open Access Article

De-Identification of Electronic Health Records Using Deep Learning and Transformers

by Fatih Dilmaç and Adil Alpkocak

Appl. Sci. 2026, 16(4), 1692; https://doi.org/10.3390/app16041692 - 8 Feb 2026

Viewed by 306

Abstract

Adoption of electronic health records (EHRs) has significantly advanced healthcare by enabling extensive data storage and analysis for clinical decisions and research. However, sensitive personally identifiable information (PII) within EHRs presents major challenges concerning patient privacy, data security, and regulatory compliance. Effective automated de-identification techniques for detecting and removing protected health information (PHI) are thus essential. This work is one of the first focused studies on Turkish EHR de-identification, comparing traditional sequence-based neural architectures with advanced transformer-based large language models (LLMs) for PHI detection. We introduce and publicly release a manually annotated benchmark dataset of Turkish EHRs (TEHRs), covering diverse PHI types and supporting further research in Turkish clinical text. Two methodologies were evaluated: bidirectional long short-term memory (BiLSTM) models (with and without Conditional Random Fields (CRFs)) and six fine-tuned pre-trained LLMs. Experiments demonstrated the superior performance of transformer-based LLMs, which achieved a macro F1 score of 92.20%, significantly outperforming traditional methods. Among sequence-based models, BiLSTM + CRF attained an 83.00% F1 score, exceeding the baseline BiLSTM's 78.40%. Results highlight the potential of transformer-based models for privacy-preserving Turkish clinical text and underscore the importance of annotated benchmark datasets. Full article

(This article belongs to the Special Issue Text Mining with Information Extraction: Latest Advances and Prospects)

19 pages, 2661 KB

Open Access Article

Data-Driven Reconstruction of the Singapore Stone: A Numerical Imputation Method of Epigraphic Restoration

by Tehreem Zahra, Francesco Perono Cacciafoco and Muhammad Tayyab Zamir

Information 2026, 17(2), 170; https://doi.org/10.3390/info17020170 - 7 Feb 2026

Viewed by 303

Abstract

One of the key artefacts of epigraphy in Southeast Asia is the Singapore Stone inscription, which is, unfortunately, in a poor condition. Large gaps separate the readable characters, leaving the text incomplete. This makes traditional reconstruction and interpretation by philologists extremely challenging. We consider epigraphic restoration as a data-restoration task in this paper. We represent the inscription as a system of categorical symbols, in keeping with the original spatial disposition of characters and spaces. Our model is trained in a conservative, data-driven manner using the observed symbols to learn the local transition statistics, and it takes advantage of this information to make plausible predictions of the most likely characters in missing sequences that are short and well-constrained. The procedure generates a probabilistic hypothesis of restoration, which can be audited, as opposed to one definitive reading. The validation of masked-character recovery demonstrates that the model has a mean top-one accuracy of 53.3%, which represents a significantly better performance compared with simple baseline methods. The process is focused on interaction and transparency with experts. It relies upon assurance scores and prioritised alternative completions of each proposed reconstruction, as a useful means to produce hypotheses in computational epigraphy and the digital humanities. Full article
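The idea of learning local transition statistics from observed symbols and ranking candidates for a short gap can be illustrated with a simple bigram model. This is a sketch under assumptions, not the paper's model: it fills a single missing position, uses add-one smoothing (the `alpha` parameter), and the symbol labels "A" and "B" are placeholders rather than characters from the inscription.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

GAP = None  # marker for a missing (unreadable) symbol


def transition_counts(sequence: List[Optional[str]]) -> Dict[str, Counter]:
    """Count adjacent symbol pairs, skipping any pair that touches a gap."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for left, right in zip(sequence, sequence[1:]):
        if left is not GAP and right is not GAP:
            counts[left][right] += 1
    return counts


def rank_candidates(
    sequence: List[Optional[str]],
    index: int,
    counts: Dict[str, Counter],
    alpha: float = 1.0,
) -> List[Tuple[str, float]]:
    """Rank symbols for one gap by P(c | left) * P(right | c), normalised to sum to 1."""
    symbols = sorted({s for s in sequence if s is not GAP})
    left = sequence[index - 1] if index > 0 else GAP
    right = sequence[index + 1] if index + 1 < len(sequence) else GAP

    def prob(a: str, b: str) -> float:
        # Smoothed bigram probability of b following a.
        total = sum(counts[a].values())
        return (counts[a][b] + alpha) / (total + alpha * len(symbols))

    scores = {}
    for cand in symbols:
        score = 1.0
        if left is not GAP:
            score *= prob(left, cand)
        if right is not GAP:
            score *= prob(cand, right)
        scores[cand] = score

    norm = sum(scores.values())
    return sorted(((c, s / norm) for c, s in scores.items()),
                  key=lambda item: item[1], reverse=True)


# Placeholder sequence with one gap at position 3.
inscription = ["A", "B", "A", GAP, "A", "B"]
stats = transition_counts(inscription)
ranking = rank_candidates(inscription, 3, stats)
print(ranking)
```

Returning the full ranked list with normalised scores, rather than a single best guess, mirrors the abstract's emphasis on auditable hypotheses and prioritised alternatives.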

11 pages, 1353 KB

Open Access Data Descriptor

Dual-Source Synthetic Uzbek Corpora for Sentiment Analysis and NER with Controlled Emoji Signals

by Bobur Saidov, Vladimir Barakhnin, Shohrux Madirimov, Umid Ibragimov, Shakhboz Meylikulov, Sultonbek Normamatov, Feruza Bahodirova, Javlonbek Matnazarov and Zarnigor Fayzullaeva

Data 2026, 11(2), 28; https://doi.org/10.3390/data11020028 - 1 Feb 2026

Viewed by 387

Abstract

This data descriptor presents two fully synthetic corpora for sentiment analysis and named entity recognition (NER) in Uzbek. The first corpus contains 12,000 hybrid synthetic sentences generated from templates with lexical randomization, automatic insertion of named entities (PER/ORG/LOC), lexicon-based polarity scoring, and a controlled emoji distribution. The second corpus includes 3000 “manual-style” sentences designed to resemble short, naturally structured messages. Although the manual-style subset was initially intended to be emoji-free, the released version includes a 39.6% emoji presence (sentences containing at least one emoji) to maintain comparability in emotional markers across corpora. Both corpora are released in CSV, XLSX, and JSONL formats and share a unified schema (id, text, sentiment, entities, entity_type, polarity_score, polarity_source, token_count, emojis, emoji_position, emoji_sentiment, conflict_flag, sentiment_from_polarity_score, split). The dataset is publicly available via Mendeley Data (DOI: 10.17632/y2d5pcyrzz.3). Full article

33 pages, 550 KB

Open Access Article

Intelligent Information Processing for Corporate Performance Prediction: A Hybrid Natural Language Processing (NLP) and Deep Learning Approach

by Qidi Yu, Chen Xing, Yanjing He, Sunghee Ahn and Hyung Jong Na

Electronics 2026, 15(2), 443; https://doi.org/10.3390/electronics15020443 - 20 Jan 2026

Viewed by 389

Abstract

This study proposes a hybrid machine learning framework that integrates structured financial indicators and unstructured textual strategy disclosures to improve firm-level management performance prediction. Using corporate business reports from South Korean listed firms, strategic text was extracted and categorized under the Balanced Scorecard (BSC) framework into financial, customer, internal process, and learning and growth dimensions. Various machine learning and deep learning models—including k-nearest neighbors (KNNs), support vector machine (SVM), light gradient boosting machine (LightGBM), convolutional neural network (CNN), long short-term memory (LSTM), autoencoder, and transformer—were evaluated, with results showing that the inclusion of strategic textual data significantly enhanced prediction accuracy, precision, recall, area under the curve (AUC), and F1-score. Among individual models, the transformer architecture demonstrated superior performance in extracting context-rich semantic features. A soft-voting ensemble model combining autoencoder, LSTM, and transformer achieved the best overall performance, leading in accuracy and AUC, while the best single deep learning model (transformer) obtained a marginally higher F1 score, confirming the value of hybrid learning. Furthermore, analysis revealed that customer-oriented strategy disclosures were the most predictive among BSC dimensions. These findings highlight the value of integrating financial and narrative data using advanced NLP and artificial intelligence (AI) techniques to develop interpretable and robust corporate performance forecasting models. In addition, we operationalize information security narratives using a reproducible cybersecurity lexicon and derive security disclosure intensity and weight share features that are jointly evaluated with BSC-based strategic vectors. Full article
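Soft voting, as used for the ensemble in this abstract, averages the class probabilities produced by each model and predicts the class with the highest average. The sketch below shows the mechanism only; the probability values are invented for illustration, and equal weighting is an assumption since the abstract does not state the weights.

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(
    model_probs: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[List[int], List[List[float]]]:
    """Average per-class probabilities across models and pick the argmax.

    model_probs[m][i][c] is model m's probability that sample i is class c.
    """
    n_models = len(model_probs)
    weights = list(weights) if weights is not None else [1.0] * n_models
    total_weight = sum(weights)

    averaged: List[List[float]] = []
    for i in range(len(model_probs[0])):
        n_classes = len(model_probs[0][i])
        averaged.append([
            sum(w * probs[i][c] for w, probs in zip(weights, model_probs)) / total_weight
            for c in range(n_classes)
        ])

    labels = [max(range(len(avg)), key=avg.__getitem__) for avg in averaged]
    return labels, averaged


# Hypothetical probabilities from three models for two samples (binary task).
autoencoder_probs = [[0.6, 0.4], [0.3, 0.7]]
lstm_probs = [[0.4, 0.6], [0.2, 0.8]]
transformer_probs = [[0.7, 0.3], [0.6, 0.4]]

predictions, mean_probs = soft_vote([autoencoder_probs, lstm_probs, transformer_probs])
print(predictions)
```

In the second sample the transformer alone would predict class 0, but the averaged probabilities favour class 1, which shows how an ensemble can differ from its strongest single member.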

(This article belongs to the Special Issue Advances in Intelligent Information Processing)

27 pages, 1843 KB

Open Access Article

AI-Driven Modeling of Near-Mid-Air Collisions Using Machine Learning and Natural Language Processing Techniques

by Dothang Truong

Aerospace 2026, 13(1), 80; https://doi.org/10.3390/aerospace13010080 - 12 Jan 2026

Viewed by 437

Abstract

As global airspace operations grow increasingly complex, the risk of near-mid-air collisions (NMACs) poses a persistent and critical challenge to aviation safety. Traditional collision-avoidance systems, while effective in many scenarios, are limited by rule-based logic and reliance on transponder data, particularly in environments featuring diverse aircraft types, unmanned aerial systems (UAS), and evolving urban air mobility platforms. This paper introduces a novel, integrative machine learning framework designed to analyze NMAC incidents using the rich, contextual information contained within the NASA Aviation Safety Reporting System (ASRS) database. The methodology is structured around three pillars: (1) natural language processing (NLP) techniques are applied to extract latent topics and semantic features from pilot and crew incident narratives; (2) cluster analysis is conducted on both textual and structured incident features to empirically define distinct typologies of NMAC events; and (3) supervised machine learning models are developed to predict pilot decision outcomes (evasive action vs. no action) based on integrated data sources. The analysis reveals seven operationally coherent topics that reflect communication demands, pattern geometry, visibility challenges, airspace transitions, and advisory-driven interactions. A four-cluster solution further distinguishes incident contexts ranging from tower-directed approaches to general aviation pattern and cruise operations. The Random Forest model produces the strongest predictive performance, with topic-based indicators, miss distance, altitude, and operating rule emerging as influential features. The results show that narrative semantics provide measurable signals of coordination load and acquisition difficulty, and that integrating text with structured variables enhances the prediction of maneuvering decisions in NMAC situations. 
These findings highlight opportunities to strengthen radio practice, manage pattern spacing, improve mixed equipage awareness, and refine alerting in short-range airport area encounters.

(This article belongs to the Section Air Traffic and Transportation)

16 pages, 64671 KB

Open AccessArticle

A Dual-UNet Diffusion Framework for Personalized Panoramic Generation

by Jing Shen, Leigang Huo, Chunlei Huo and Shiming Xiang

J. Imaging 2026, 12(1), 40; https://doi.org/10.3390/jimaging12010040 - 11 Jan 2026

Viewed by 341

Abstract

While text-to-image and customized generation methods demonstrate strong capabilities in single-image generation, they fall short in supporting immersive applications that require coherent 360° panoramas. Conversely, existing panorama generation models lack customization capabilities. In panoramic scenes, reference objects often appear as minor background elements and may be multiple in number, while reference images across different views exhibit weak correlations. To address these challenges, we propose a diffusion-based framework for customized multi-view image generation. Our approach introduces a decoupled feature injection mechanism within a dual-UNet architecture to handle weakly correlated reference images, effectively integrating spatial information by concurrently feeding both reference images and noise into the denoising branch. A hybrid attention mechanism enables deep fusion of reference features and multi-view representations. Furthermore, a data augmentation strategy facilitates viewpoint-adaptive pose adjustments, and panoramic coordinates are employed to guide multi-view attention. The experimental results demonstrate our model’s effectiveness in generating coherent, high-quality customized multi-view images.

(This article belongs to the Section AI in Imaging)

16 pages, 1418 KB

Open AccessArticle

Sentiment Analysis of the Public’s Attitude Towards Emergency Infrastructure Projects: A Text Mining Study

by Caiyun Cui, Jinxu Fang, Yong Liu, Xiaowei Han, Qian Li and Yaming Li

Buildings 2026, 16(1), 6; https://doi.org/10.3390/buildings16010006 - 19 Dec 2025

Viewed by 502

Abstract

Considering the significant role that emergency infrastructure projects (EIPs) play globally in responding to emergency events, public sentiment towards EIPs has become an increasingly important factor to consider. However, few studies have analysed the public’s sentiment specifically towards EIPs in emergency and urgent circumstances. This study analyses public sentiment characteristics by collecting objective big data from popular posts and comments related to EIPs on Sina Weibo. Sentiment information was extracted using text mining methods, and sentiment was measured using a long short-term memory (LSTM) model. Findings indicate that (1) positive sentiment predominates in the data; (2) public sentiment towards temporary EIPs remains relatively stable, while long-term adaptive EIPs show more pronounced sentiment fluctuation; and (3) there are regional differences in public sentiment: Hebei, Shandong and Shanghai exhibit slightly lower stability, with positive sentiment being slightly lower than or equal to neutral sentiment. The findings contribute to the literature through a novel focus on the public perspective of EIPs under urgent circumstances, exploring the characteristics and evolution of public sentiment, and are of particular significance for related government departments and project managers in decision-making and construction management.

(This article belongs to the Special Issue Urban Infrastructure and Resilient, Sustainable Buildings—2nd Edition)

Search Results (345)
