MDPI - Publisher of Open Access Journals

18 pages, 2586 KB

Open Access Article

A Comparative Study of X Data About the NHS Using Sentiment Analysis

by Saeed Ur Rehman, Obi Oluchi Blessing and Anwar Ali

Big Data Cogn. Comput. 2025, 9(10), 244; https://doi.org/10.3390/bdcc9100244 - 24 Sep 2025

Viewed by 1075

This study investigates sentiment analysis of X data about the National Health Service (NHS) during a politically charged period, using lexicon-based, machine learning, and deep learning approaches, as well as topic modelling and aspect-based sentiment analysis (ABSA). This study is distinct in its comparative evaluation of sentiment analysis techniques on NHS-related tweets during a politically sensitive period, offering insights into public opinion shaped by political discourse. A dataset of 35,000 tweets was collected and analysed using various techniques, including VADER, TextBlob, Naive Bayes, Support Vector Machines, Logistic Regression, Ensemble Learning, and BERT. Unlike previous studies that focus on structured feedback or general sentiment, this research uniquely explores unstructured public discourse during an election period, capturing real-time political sentiment towards NHS policies. The sentiment distribution from lexicon-based methods indicated that the presence of stop words could affect model performance. While all models achieved high accuracy on the validation dataset, challenges such as class imbalance and limited labelled data impacted performance, with signs of overfitting observed. Topic modelling identified nine topic clusters, with “waiting list,” “service,” and “immigration” carrying negative sentiments. At the same time, words like “thank,” “support,” “care,” and “team” had the most positive sentiments, reflecting public delight in these areas. ABSA identified positive sentiments towards aspects like “useful service”. This study contributes a comparative framework for evaluating sentiment analysis techniques in politically contextualised healthcare discourse, offering insights for policymakers and researchers. The study underscores the importance of data quality in sentiment analysis.
Future research should consider incorporating multilingual datasets, extending data collection periods, optimising deep learning models, and employing hybrid approaches to enhance performance.

22 pages, 4300 KB

Open Access Article

Optimised DNN-Based Agricultural Land Mapping Using Sentinel-2 and Landsat-8 with Google Earth Engine

by Nisha Sharma, Sartajvir Singh and Kawaljit Kaur

Land 2025, 14(8), 1578; https://doi.org/10.3390/land14081578 - 1 Aug 2025

Cited by 2 | Viewed by 2530

Abstract

Agriculture is the backbone of Punjab’s economy, and with much of India’s population dependent on agriculture, the requirement for accurate and timely monitoring of land has become even more crucial. Blending remote sensing with state-of-the-art machine learning algorithms enables the detailed classification of agricultural lands through thematic mapping, which is critical for crop monitoring, land management, and sustainable development. Here, a Hyper-tuned Deep Neural Network (Hy-DNN) model was created and used for land use and land cover (LULC) classification into four classes: agricultural land, vegetation, water bodies, and built-up areas. The technique made use of multispectral data from Sentinel-2 and Landsat-8, processed on the Google Earth Engine (GEE) platform. To measure classification performance, Hy-DNN was compared with traditional classifiers, namely Convolutional Neural Network (CNN), Random Forest (RF), Classification and Regression Tree (CART), Minimum Distance Classifier (MDC), and Naive Bayes (NB), using performance metrics including producer’s and consumer’s accuracy, Kappa coefficient, and overall accuracy. Hy-DNN performed the best, with an overall accuracy of 97.60% using Sentinel-2 and 91.10% using Landsat-8, outperforming all base models. These results further highlight the superiority of the optimised Hy-DNN in agricultural land mapping and its potential use in crop health monitoring, disease diagnosis, and strategic agricultural planning.

(This article belongs to the Special Issue Advances on Land Cover/Land Use Ontologies for Innovative Production/Utilization of Land Information)

19 pages, 1774 KB

Open Access Editor’s Choice Article

Effective Machine Learning Techniques for Dealing with Poor Credit Data

by Dumisani Selby Nkambule, Bhekisipho Twala and Jan Harm Christiaan Pretorius

Risks 2024, 12(11), 172; https://doi.org/10.3390/risks12110172 - 30 Oct 2024

Cited by 5 | Viewed by 3292

Abstract

Credit risk is a crucial component of daily financial services operations; it measures the likelihood that a borrower will default on a loan, incurring an economic loss. By analysing historical data for assessment of the creditworthiness of a borrower, lenders can reduce credit risk. Data are vital at the core of the credit decision-making processes. Decision-making depends heavily on accurate, complete data, and failure to harness high-quality data would impact credit lenders when assessing the loan applicants’ risk profiles. In this paper, an empirical comparison of the robustness of seven machine learning algorithms to credit risk, namely support vector machines (SVMs), naïve Bayes, decision trees (DT), random forest (RF), gradient boosting (GB), K-nearest neighbour (K-NN), and logistic regression (LR), is carried out using the Lending Club credit data from Kaggle. This task uses seven performance measures: the F1 score, recall, accuracy, precision, ROC-AUC, and the HL and MCC metrics. Then, the harnessing of generative adversarial networks (GANs) simulation to enhance the robustness of the single machine learning classifiers for predicting credit risk is proposed. The results show that when GANs imputation is incorporated, the decision tree is the best-performing classifier with an accuracy rate of 93.01%, followed by random forest (92.92%), gradient boosting (92.33%), support vector machine (90.83%), logistic regression (90.76%), and naïve Bayes (89.29%), respectively. The K-NN classifier is the worst-performing method, with an accuracy rate of 88.68%. Subsequently, when GANs are optimised, the accuracy rate of the naïve Bayes classifier improves significantly, to 90%. Additionally, the average error rate for these classifiers is over 9%, which implies that the estimates are not far from the actual values. In summary, most individual classifiers are more robust to missing data when GANs are used as an imputation technique.
The differences in performance of all seven machine learning algorithms are significant at the 95% level.

(This article belongs to the Special Issue Financial Analysis, Corporate Finance and Risk Management)

21 pages, 2082 KB

Open Access Review

The Many Roles of Precision in Action

by Jakub Limanowski, Rick A. Adams, James Kilner and Thomas Parr

Entropy 2024, 26(9), 790; https://doi.org/10.3390/e26090790 - 14 Sep 2024

Cited by 8 | Viewed by 7681

Abstract

Active inference describes (Bayes-optimal) behaviour as being motivated by the minimisation of surprise of one’s sensory observations, through the optimisation of a generative model (of the hidden causes of one’s sensory data) in the brain. One of active inference’s key appeals is its conceptualisation of precision as biasing neuronal communication and, thus, inference within generative models. The importance of precision in perceptual inference is evident: many studies have demonstrated the importance of ensuring precision estimates are correct for normal (healthy) sensation and perception. Here, we highlight the many roles precision plays in action, i.e., the key processes that rely on adequate estimates of precision, from decision making and action planning to the initiation and control of muscle movement itself. Thereby, we focus on the recent development of hierarchical, “mixed” models, i.e., generative models spanning multiple levels of discrete and continuous inference. These kinds of models open up new perspectives on the unified description of hierarchical computation, and its implementation, in action. Here, we highlight how these models reflect the many roles of precision in action, from planning to execution, and the associated pathologies if precision estimation goes wrong. We also discuss the potential biological implementation of the associated message passing, focusing on the role of neuromodulatory systems in mediating different kinds of precision.

(This article belongs to the Special Issue From Functional Imaging to Free Energy—Dedicated to Professor Karl Friston on the Occasion of His 65th Birthday)

20 pages, 951 KB

Open Access Review

Bayesian Networks for the Diagnosis and Prognosis of Diseases: A Scoping Review

by Kristina Polotskaya, Carlos S. Muñoz-Valencia, Alejandro Rabasa, Jose A. Quesada-Rico, Domingo Orozco-Beltrán and Xavier Barber

Mach. Learn. Knowl. Extr. 2024, 6(2), 1243-1262; https://doi.org/10.3390/make6020058 - 4 Jun 2024

Cited by 29 | Viewed by 17424

Abstract

Bayesian networks (BNs) are probabilistic graphical models that leverage Bayes’ theorem to portray dependencies and cause-and-effect relationships between variables. These networks have gained prominence in the field of health sciences, particularly in diagnostic processes, by allowing the integration of medical knowledge into models and addressing uncertainty in a probabilistic manner. Objectives: This review aims to provide an exhaustive overview of the current state of Bayesian networks in disease diagnosis and prognosis. Additionally, it seeks to introduce readers to the fundamental methodology of BNs, emphasising their versatility and applicability across varied medical domains. Employing a meticulous search strategy with MeSH descriptors in diverse scientific databases, we identified 190 relevant references. These were subjected to a rigorous analysis, resulting in the retention of 60 papers for in-depth review. The robustness of our approach minimised the risk of selection bias. Results: The selected studies encompass a wide range of medical areas, providing insights into the statistical methodology, implementation feasibility, and predictive accuracy of BNs, as evidenced by an average area under the curve (AUC) exceeding 75%. The comprehensive analysis underscores the adaptability and efficacy of Bayesian networks in diverse clinical scenarios. The majority of the examined studies demonstrate the potential of BNs as reliable adjuncts to clinical decision-making. The findings of this review affirm the role of Bayesian networks as accessible and versatile artificial intelligence tools in healthcare. They offer a viable solution to address complex medical challenges, facilitating timely and informed decision-making under conditions of uncertainty. The extensive exploration of Bayesian networks presented in this review highlights their significance and growing impact in the realm of disease diagnosis and prognosis. 
It underscores the need for further research and development to optimise their capabilities and broaden their applicability in addressing diverse and intricate healthcare challenges.

(This article belongs to the Collection Extravaganza Feature Papers on Hot Topics in Machine Learning and Knowledge Extraction)

27 pages, 2559 KB

Open Access Article

Enhancement of Classifier Performance with Adam and RanAdam Hyper-Parameter Tuning for Lung Cancer Detection from Microarray Data—In Pursuit of Precision

by Karthika M S, Harikumar Rajaguru and Ajin R. Nair

Bioengineering 2024, 11(4), 314; https://doi.org/10.3390/bioengineering11040314 - 26 Mar 2024

Cited by 3 | Viewed by 1929

Abstract

Microarray gene expression analysis is a powerful technique used in cancer classification and research to identify and understand gene expression patterns that can differentiate between different cancer types, subtypes, and stages. However, microarray databases are highly redundant, inherently nonlinear, and noisy. Therefore, extracting meaningful information from such a huge database is challenging. The paper adopts the Fast Fourier Transform (FFT) and Mixture Model (MM) for dimensionality reduction and utilises the Dragonfly optimisation algorithm as the feature selection technique. The classifiers employed in this research are Nonlinear Regression, Naïve Bayes, Decision Tree, Random Forest and SVM (RBF). The classifiers’ performances are analysed with and without feature selection methods. Finally, Adaptive Moment Estimation (Adam) and Random Adaptive Moment Estimation (RanAdam) hyper-parameter tuning techniques are used as improvisation techniques for classifiers. The SVM (RBF) classifier with the Fast Fourier Transform Dimensionality Reduction method and Dragonfly feature selection achieved the highest accuracy of 98.343% with RanAdam hyper-parameter tuning compared to other classifiers.

(This article belongs to the Section Biosignal Processing)

23 pages, 5003 KB

Open Access Article

Active Data Selection and Information Seeking

by Thomas Parr, Karl Friston and Peter Zeidman

Algorithms 2024, 17(3), 118; https://doi.org/10.3390/a17030118 - 12 Mar 2024

Cited by 3 | Viewed by 4942

Abstract

Bayesian inference typically focuses upon two issues. The first is estimating the parameters of some model from data, and the second is quantifying the evidence for alternative hypotheses, formulated as alternative models. This paper focuses upon a third issue. Our interest is in the selection of data, either through sampling subsets of data from a large dataset or through optimising experimental design, based upon the models we have of how those data are generated. Optimising data-selection ensures we can achieve good inference with fewer data, saving on computational and experimental costs. This paper aims to unpack the principles of active sampling of data by drawing from neurobiological research on animal exploration and from the theory of optimal experimental design. We offer an overview of the salient points from these fields and illustrate their application in simple toy examples, ranging from function approximation with basis sets to inference about processes that evolve over time. Finally, we consider how this approach to data selection could be applied to the design of (Bayes-adaptive) clinical trials.

(This article belongs to the Special Issue Bayesian Networks and Causal Reasoning)

16 pages, 5487 KB

Open Access Article

Rapid Forecasting of Cyber Events Using Machine Learning-Enabled Features

by Yussuf Ahmed, Muhammad Ajmal Azad and Taufiq Asyhari

Information 2024, 15(1), 36; https://doi.org/10.3390/info15010036 - 11 Jan 2024

Cited by 39 | Viewed by 6382

Abstract

In recent years, there has been a notable surge in both the complexity and volume of targeted cyber attacks, largely due to heightened vulnerabilities in widely adopted technologies. The prediction and detection of early attacks are vital to mitigating potential risks from cyber attacks and network resilience. With the rapid increase in digital data and the increasing complexity of cyber attacks, big data has become a crucial tool for intrusion detection and forecasting. By leveraging the capabilities of unstructured big data, intrusion detection and forecasting systems can become more effective in detecting and preventing cyber attacks and anomalies. While some progress has been made on attack prediction, little attention has been given to forecasting cyber events based on time series and unstructured big data. In this research, we used the CSE-CIC-IDS2018 dataset, a comprehensive dataset containing several attacks on a realistic network. Then we used time-series forecasting techniques to construct time-series models with tuned parameters to assess the effectiveness of these techniques, which include Sequential Minimal Optimisation for regression (SMOreg), linear regression and Long Short-Term Memory (LSTM) to forecast the cyber events. We used machine learning algorithms such as Naive Bayes and random forest to evaluate the performance of the models. The best performance results of 90.4% were achieved with Support Vector Machine (SVM) and random forest. Additionally, Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) metrics were used to evaluate forecasted event performance. SMOreg’s forecasted events yielded the lowest MAE, while those from linear regression exhibited the lowest RMSE. This work is anticipated to contribute to effective cyber threat detection, aiming to reduce security breaches within critical infrastructure.

(This article belongs to the Special Issue Emerging Research on Neural Networks and Anomaly Detection)

20 pages, 10895 KB

Open Access Article

In-Situ GNSS-R and Radiometer Fusion Soil Moisture Retrieval Model Based on LSTM

by Tianlong Zhang, Lei Yang, Hongtao Nan, Cong Yin, Bo Sun, Dongkai Yang, Xuebao Hong and Ernesto Lopez-Baeza

Remote Sens. 2023, 15(10), 2693; https://doi.org/10.3390/rs15102693 - 22 May 2023

Cited by 3 | Viewed by 3177

Abstract

Global navigation satellite system reflectometry (GNSS-R) is a remote sensing technology for soil moisture measurement using signals of opportunity from GNSS, which has the advantages of low cost, all-weather detection, and multi-platform application. An in situ GNSS-R and radiometer fusion soil moisture retrieval model based on LSTM (long short-term memory) is proposed to improve accuracy and robustness with respect to the impacts of vegetation cover and soil surface roughness. The Oceanpal GNSS-R data obtained from the experimental campaign at the Valencia Anchor Station are used as the main input data, and the TB (brightness temperature) and TR (soil roughness and vegetation integrated attenuation coefficient) outputs of the ELBARA-II radiometer are used as auxiliary input data, while field measurements with a Delta-T ML2x ThetaProbe soil moisture sensor were used for reference and validation. The results show that the LSTM model can be used to retrieve soil moisture, and that it performs better in the data fusion scenario with GNSS-R and radiometer. The STD of the multi-satellite fusion model is 0.013. Among the single-satellite models, PRN13, 20, and 32 gave the best retrieval results with STD = 0.011, 0.012, and 0.007, respectively.

(This article belongs to the Special Issue Earth Observation in Support of Sustainable Water Resources Management)

10 pages, 821 KB

Open Access Article

Decision Strategies for Absorbance Readings from an Enzyme-Linked Immunosorbent Assay—A Case Study about Testing Genotypes of Sugar Beet (Beta vulgaris L.) for Resistance against Beet Necrotic Yellow Vein Virus (BNYVV)

by Thomas M. Lange, Martin Wutke, Lisa Bertram, Harald Keunecke, Friedrich Kopisch-Obuch and Armin O. Schmitt

Agriculture 2021, 11(10), 956; https://doi.org/10.3390/agriculture11100956 - 2 Oct 2021

Cited by 4 | Viewed by 2593

Abstract

The Beet necrotic yellow vein virus (BNYVV) causes rhizomania in sugar beet (Beta vulgaris L.), which is one of the most destructive diseases in sugar beet worldwide. In breeding projects towards resistance against BNYVV, the enzyme-linked immunosorbent assay (ELISA) is used to determine the virus concentration in plant roots and, thus, the resistance levels of genotypes. Here, we present a simulation study to generate 10,000 small samples from the estimated density functions of ELISA values from susceptible and resistant sugar beet genotypes. We apply receiver operating characteristic (ROC) analysis to these samples to optimise the cutoff values for sample sizes from two to eight and determine the false positive rates (FPR), true positive rates (TPR), and area under the curve (AUC). We present, furthermore, an alternative approach based upon Bayes factors to improve the decision procedure. The Bayesian approach has proven to be superior to the simple cutoff approach. The presented results could help evaluate or improve existing breeding programs and help design future selection procedures based upon ELISA. An R-script for the classification of sample data based upon Bayes factors is provided.

(This article belongs to the Section Crop Genetics, Genomics and Breeding)

19 pages, 1586 KB

Open Access Article

Predicting Risks of Machine Translations of Public Health Resources by Developing Interpretable Machine Learning Classifiers

by Wenxiu Xie, Meng Ji, Riliu Huang, Tianyong Hao and Chi-Yin Chow

Int. J. Environ. Res. Public Health 2021, 18(16), 8789; https://doi.org/10.3390/ijerph18168789 - 20 Aug 2021

Cited by 5 | Viewed by 3912

Abstract

We aimed to develop machine learning classifiers as a risk-prevention mechanism to help medical professionals with little or no knowledge of the patient’s languages predict the likelihood of clinically significant mistakes or incomprehensible MT outputs based on the features of English source information as input to the MT systems. An MNB classifier was developed to provide intuitive probabilistic predictions of erroneous health translation outputs based on the computational modelling of a small number of optimised features of the original English source texts. The best performing multinomial Naïve Bayes classifier (MNB) using a small number of optimised features (8) achieved statistically higher AUC (M = 0.760, SD = 0.03) than the classifier using high-dimension natural features (135) (M = 0.631, SD = 0.006, p < 0.0001, SE = 0.004) and the automatically optimised classifier (22) (M = 0.7231, SD = 0.0084, p < 0.0001, SE = 0.004). Furthermore, MNB (8) had statistically higher sensitivity (M = 0.885, SD = 0.100) compared with the full-feature classifier (135) (M = 0.577, SD = 0.155, p < 0.0001, SE = 0.005) and the automatically optimised classifier (22) (M = 0.731, SD = 0.139, p < 0.0001, SE = 0.0023). Finally, MNB (8) reached statistically higher specificity (M = 0.667, SD = 0.138) compared to the full-feature classifier (135) (M = 0.567, SD = 0.139, p = 0.0002, SE = 0.026) and the automatically optimised classifier (22) (M = 0.633, SD = 0.141, p = 0.0133, SE = 0.026).

(This article belongs to the Special Issue Health Humanities: Social Determinants of Access to Healthcare in Migrant and Minority Populations)

Search Results (11)
