Search Results (13)

Search Parameters:
Authors = Mohamed Medhat Gaber ORCID = 0000-0003-0339-4474

24 pages, 2338 KiB  
Article
XDecompo: Explainable Decomposition Approach in Convolutional Neural Networks for Tumour Image Classification
by Asmaa Abbas, Mohamed Medhat Gaber and Mohammed M. Abdelsamea
Sensors 2022, 22(24), 9875; https://doi.org/10.3390/s22249875 - 15 Dec 2022
Cited by 6 | Viewed by 2993
Abstract
Of the various tumour types, colorectal cancer and brain tumours are still considered among the most serious and deadly diseases in the world. Therefore, many researchers are interested in improving the accuracy and reliability of diagnostic medical machine learning models. In computer-aided diagnosis, self-supervised learning has been proven to be an effective solution when dealing with datasets with insufficient data annotations. However, medical image datasets often suffer from data irregularities, making the recognition task even more challenging. The class decomposition approach has provided a robust solution to such a challenging problem by simplifying the learning of class boundaries of a dataset. In this paper, we propose a robust self-supervised model, called XDecompo, to improve the transferability of features from the pretext task to the downstream task. XDecompo has been designed based on an affinity propagation-based class decomposition to effectively encourage learning of the class boundaries in the downstream task. XDecompo has an explainable component to highlight important pixels that contribute to classification and explain the effect of class decomposition on improving the speciality of extracted features. We also explore the generalisability of XDecompo in handling different medical datasets, such as histopathology for colorectal cancer and brain tumour images. The quantitative results demonstrate the robustness of XDecompo with high accuracy of 96.16% and 94.30% for CRC and brain tumour images, respectively. XDecompo has demonstrated its generalization capability and achieved high classification accuracy (both quantitatively and qualitatively) in different medical image datasets, compared with other models. Moreover, a post hoc explainable method has been used to validate the feature transferability, demonstrating highly accurate feature representations. Full article
(This article belongs to the Collection Medical Applications of Sensor Systems and Devices)
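
To make the class-decomposition idea concrete, the following minimal sketch (not the authors' code) splits each original class into sub-classes with scikit-learn's affinity propagation and relabels the samples with the resulting pseudo-labels; the feature matrix here is a random placeholder standing in for CNN embeddings.

```python
# Sketch of affinity-propagation class decomposition: each original class is
# split into data-driven sub-classes, and a downstream classifier would be
# trained on the finer-grained pseudo-labels.
import numpy as np
from sklearn.cluster import AffinityPropagation

def decompose_classes(features, labels, damping=0.9):
    """Return pseudo-labels where every original class is split into
    sub-classes discovered by affinity propagation on its feature vectors."""
    pseudo_labels = np.empty(len(labels), dtype=object)
    for cls in np.unique(labels):
        idx = np.where(labels == cls)[0]
        sub = AffinityPropagation(damping=damping, random_state=0)
        sub_ids = sub.fit_predict(features[idx])          # one cluster id per sample
        for i, s in zip(idx, sub_ids):
            pseudo_labels[i] = f"{cls}_{s}"               # e.g. "tumour_2"
    return pseudo_labels

# Random placeholder features (stand-ins for deep feature embeddings).
X = np.random.rand(200, 64)
y = np.array(["tumour"] * 100 + ["normal"] * 100)
print(sorted(set(decompose_classes(X, y))))
```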

26 pages, 9087 KiB  
Article
AHA-AO: Artificial Hummingbird Algorithm with Aquila Optimization for Efficient Feature Selection in Medical Image Classification
by Mohamed Abd Elaziz, Abdelghani Dahou, Shaker El-Sappagh, Alhassan Mabrouk and Mohamed Medhat Gaber
Appl. Sci. 2022, 12(19), 9710; https://doi.org/10.3390/app12199710 - 27 Sep 2022
Cited by 21 | Viewed by 3314
Abstract
This paper presents a system for medical image diagnosis that uses transfer learning (TL) and feature selection techniques. The main aim of TL on pre-trained models such as MobileNetV3 is to extract features from raw images. Here, a novel feature selection optimization algorithm called the Artificial Hummingbird Algorithm based on Aquila Optimization (AHA-AO) is proposed. The AHA-AO is used to select only the most relevant features and ensure the improvement of the overall model classification. Our methodology was evaluated using four datasets, namely, ISIC-2016, PH2, Chest-XRay, and Blood-Cell. We compared the proposed feature selection algorithm with five of the most popular feature selection optimization algorithms. We obtained an accuracy of 87.30% for the ISIC-2016 dataset, 97.50% for the PH2 dataset, 86.90% for the Chest-XRay dataset, and 88.60% for the Blood-cell dataset. The AHA-AO outperformed the other optimization techniques. Moreover, the developed AHA-AO was faster than the other feature selection models during the process of determining the relevant features. The proposed feature selection algorithm successfully improved the performance and the speed of the overall deep learning models. Full article
(This article belongs to the Special Issue The Applications of Machine Learning in Biomedical Science)
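
The AHA-AO metaheuristic itself is not reproduced below; this sketch only illustrates the general wrapper-style scheme the paper describes, in which an optimizer searches over binary feature masks and the fitness of a mask is the cross-validated accuracy of a classifier trained on the selected features. Plain random search stands in for AHA-AO, and the dataset is a small scikit-learn example.

```python
# Wrapper-style feature selection with a random-search stand-in for AHA-AO.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

def fitness(mask):
    """Cross-validated accuracy of a classifier on the selected features."""
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()

all_features = np.ones(X.shape[1], dtype=bool)
best_mask, best_fit = all_features, fitness(all_features)
for _ in range(50):                       # a real optimizer would update candidates adaptively
    cand = rng.random(X.shape[1]) < 0.5   # random binary feature mask
    f = fitness(cand)
    if f > best_fit:
        best_mask, best_fit = cand, f

print(f"selected {best_mask.sum()} of {X.shape[1]} features, CV accuracy {best_fit:.3f}")
```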

16 pages, 1317 KiB  
Data Descriptor
TED-S: Twitter Event Data in Sports and Politics with Aggregated Sentiments
by Hansi Hettiarachchi, Doaa Al-Turkey, Mariam Adedoyin-Olowe, Jagdev Bhogal and Mohamed Medhat Gaber
Data 2022, 7(7), 90; https://doi.org/10.3390/data7070090 - 30 Jun 2022
Cited by 3 | Viewed by 4108
Abstract
Even though social media contain rich information on events and public opinions, it is impractical to filter this information manually because of the volume and dynamicity of the data. Thus, automated extraction mechanisms are invaluable to the community. Real data with ground-truth labels are needed to build and evaluate such systems. Still, to the best of our knowledge, no available social media dataset covers continuous periods with both event and sentiment labels; existing datasets are labelled for events or sentiments only. Datasets without time gaps are huge due to high data generation and require extensive effort for manual labelling. Different approaches, ranging from unsupervised to supervised, have been proposed by previous research targeting such datasets. However, their generic nature mainly fails to capture event-specific sentiment expressions, making them inappropriate for labelling event sentiments. Filling this gap, we propose a novel data annotation approach in this paper involving several neural networks. Our approach outperforms commonly used sentiment annotation models such as VADER and TextBlob. It also generates probability values for all sentiment categories, in addition to a single category per tweet, supporting aggregated sentiment analyses. Using this approach, we annotate and release a dataset named TED-S, covering two diverse domains, sports and politics. TED-S has complete subsets of Twitter data streams with both sub-event and sentiment labels, supporting event sentiment-based research. Full article
(This article belongs to the Section Information Systems and Data Management)
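
The TED-S annotation ensemble is not reproduced here; the sketch below only shows the VADER baseline the paper compares against and the per-category scores it produces, which are analogous in form to the per-class probabilities TED-S attaches to each tweet. The tweets are invented examples, and the snippet assumes the `vaderSentiment` package is installed.

```python
# VADER baseline: per-category sentiment scores for example tweets.
# Requires `pip install vaderSentiment`.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
tweets = [
    "What a goal! Absolutely brilliant finish!",      # hypothetical sports tweet
    "Another broken promise from the government...",  # hypothetical politics tweet
]
for t in tweets:
    scores = analyzer.polarity_scores(t)   # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
    print(scores, "->", max(("neg", "neu", "pos"), key=scores.get))
```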

31 pages, 1601 KiB  
Article
PGraphD*: Methods for Drift Detection and Localisation Using Deep Learning Modelling of Business Processes
by Khadijah Muzzammil Hanga, Yevgeniya Kovalchuk and Mohamed Medhat Gaber
Entropy 2022, 24(7), 910; https://doi.org/10.3390/e24070910 - 30 Jun 2022
Cited by 1 | Viewed by 2429
Abstract
This paper presents a set of methods, jointly called PGraphD*, which includes two new methods (PGraphDD-QM and PGraphDD-SS) for drift detection and one new method (PGraphDL) for drift localisation in business processes. The methods are based on deep learning and graphs, with PGraphDD-QM and PGraphDD-SS employing a quality metric and a similarity score for detecting drifts, respectively. According to experimental results, PGraphDD-SS outperforms PGraphDD-QM in drift detection, achieving an accuracy score of 100% over the majority of synthetic logs and an accuracy score of 80% over a complex real-life log. Furthermore, PGraphDD-SS detects drifts with delays that are 59% shorter on average compared to the best performing state-of-the-art method. Full article
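
A greatly simplified sketch of the similarity-score idea behind PGraphDD-SS: each window of process traces is summarised by a vector (here a plain activity-frequency vector rather than the learned graph embedding used in the paper), and a drift is flagged when the cosine similarity between consecutive windows drops below a threshold. The event log below is synthetic.

```python
# Similarity-score drift detection over windows of a synthetic event log.
import numpy as np

def window_vector(traces, vocab):
    """Frequency vector of activities observed in a window of traces."""
    v = np.zeros(len(vocab))
    for trace in traces:
        for act in trace:
            v[vocab[act]] += 1
    return v

def detect_drift(windows, vocab, threshold=0.9):
    vectors = [window_vector(w, vocab) for w in windows]
    drifts = []
    for i in range(1, len(vectors)):
        a, b = vectors[i - 1], vectors[i]
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        if sim < threshold:
            drifts.append(i)            # drift between window i-1 and window i
    return drifts

vocab = {"register": 0, "check": 1, "approve": 2, "reject": 3, "archive": 4}
before = [["register", "check", "approve", "archive"]] * 50
after = [["register", "check", "reject", "archive"]] * 50       # behaviour changes
print(detect_drift([before, before, after, after], vocab))      # -> [2]
```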

17 pages, 659 KiB  
Article
Vec2Dynamics: A Temporal Word Embedding Approach to Exploring the Dynamics of Scientific Keywords—Machine Learning as a Case Study
by Amna Dridi, Mohamed Medhat Gaber, Raja Muhammad Atif Azad and Jagdev Bhogal
Big Data Cogn. Comput. 2022, 6(1), 21; https://doi.org/10.3390/bdcc6010021 - 21 Feb 2022
Cited by 2 | Viewed by 5284
Abstract
The study of the dynamics or the progress of science has been widely explored with descriptive and statistical analyses. This study has also attracted several computational approaches, collectively labelled the Computational History of Science, especially with the rise of data science and the development of increasingly powerful computers. Among these approaches, some works have studied dynamism in scientific literature by employing text analysis techniques that rely on topic models to study the dynamics of research topics. Unlike topic models, which do not delve deeply into the content of scientific publications, this paper uses, for the first time, temporal word embeddings to automatically track the dynamics of scientific keywords over time. To this end, we propose Vec2Dynamics, a neural-based computational history approach that reports the stability of the k-nearest neighbors of scientific keywords over time; the stability indicates whether a keyword is acquiring a new neighborhood as the scientific literature evolves. To evaluate how Vec2Dynamics models such relationships in the domain of Machine Learning (ML), we constructed scientific corpora from the papers published in the Neural Information Processing Systems (NIPS, now abbreviated NeurIPS) conference between 1987 and 2016. The descriptive analysis performed in this paper verifies the efficacy of our proposed approach: we found a generally strong consistency between the obtained results and the Machine Learning timeline. Full article
(This article belongs to the Special Issue Machine Learning for Dependable Edge Computing Systems and Services)
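
A minimal sketch of the neighbourhood-stability measurement: one Word2Vec model is trained per time slice (using gensim) and the overlap of a keyword's k nearest neighbours across consecutive slices is reported. The toy corpora below are placeholders, not the NeurIPS papers used in the study.

```python
# Neighbourhood stability of a keyword across two time slices of a corpus.
# Requires `pip install gensim`.
from gensim.models import Word2Vec

def knn_set(model, word, k=5):
    return {w for w, _ in model.wv.most_similar(word, topn=k)}

def neighbourhood_stability(slice_a, slice_b, word, k=5):
    m_a = Word2Vec(sentences=slice_a, vector_size=50, window=3, min_count=1, seed=0)
    m_b = Word2Vec(sentences=slice_b, vector_size=50, window=3, min_count=1, seed=0)
    a, b = knn_set(m_a, word, k), knn_set(m_b, word, k)
    return len(a & b) / k            # 1.0 = identical neighbourhood, 0.0 = fully changed

# Tiny placeholder corpora standing in for papers from two eras.
slice_1995 = [["neural", "network", "backpropagation", "gradient", "kernel", "svm"]] * 200
slice_2015 = [["neural", "network", "deep", "convolutional", "dropout", "gpu"]] * 200
print(neighbourhood_stability(slice_1995, slice_2015, "neural"))
```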

18 pages, 1533 KiB  
Article
A Time-Series Self-Supervised Learning Approach to Detection of Cyber-physical Attacks in Water Distribution Systems
by Haitham Mahmoud, Wenyan Wu and Mohamed Medhat Gaber
Energies 2022, 15(3), 914; https://doi.org/10.3390/en15030914 - 27 Jan 2022
Cited by 23 | Viewed by 3384
Abstract
Water Distribution System (WDS) threats have significantly grown following the Maroochy Shire incident, as evidenced by proven attacks on water premises. As a result, in addition to traditional solutions (e.g., data encryption and authentication), attack detection is being proposed in WDS to reduce disruption cases. The attack detection system must meet two critical requirements: high accuracy and near real-time detection. This drives us to propose a two-stage detection system that uses self-supervised and unsupervised algorithms to detect Cyber-Physical (CP) attacks. Stage 1 uses heuristic adaptive self-supervised algorithms to achieve near real-time decision-making and a detection sensitivity of 66% utilizing BOSS. Stage 2 attempts to validate the detection of attacks using an unsupervised algorithm, maintaining a detection accuracy of 94% utilizing Isolation Forest. Both stages are examined against time granularity and are empirically analyzed against a variety of performance evaluation indicators. Our findings demonstrate that the algorithms in stage 1 compare less favorably with those in the literature, but their presence enables near real-time decision-making and detection reliability. In stage 2, by contrast, the Isolation Forest algorithm gives excellent accuracy. As a result, both stages can collaborate to maximize accuracy in a near real-time attack detection system. Full article
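
A sketch of the two-stage pipeline only: a simple threshold rule stands in for the self-supervised time-series stage (the paper uses BOSS), and scikit-learn's Isolation Forest plays the role of the stage 2 validator. Sensor values and thresholds are invented.

```python
# Two-stage anomaly screening: cheap stage 1 flagging, Isolation Forest validation.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=2.0, scale=0.1, size=(500, 3))     # e.g. pressure/flow/level readings
attack = rng.normal(loc=2.8, scale=0.1, size=(20, 3))      # spoofed readings
stream = np.vstack([normal, attack])

# Stage 1: cheap, near real-time screening (placeholder for the BOSS-based stage).
suspect = np.abs(stream - normal.mean(axis=0)).max(axis=1) > 0.5

# Stage 2: validate suspects against a model of normal behaviour.
forest = IsolationForest(contamination=0.05, random_state=0).fit(normal)
confirmed = suspect & (forest.predict(stream) == -1)        # -1 means anomalous

print(f"stage 1 flagged {suspect.sum()} samples, stage 2 confirmed {confirmed.sum()}")
```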

19 pages, 1792 KiB  
Article
Predicting the Economic Impact of the COVID-19 Pandemic in the United Kingdom Using Time-Series Mining
by Ahmed Rakha, Hansi Hettiarachchi, Dina Rady, Mohamed Medhat Gaber, Emad Rakha and Mohammed M. Abdelsamea
Economies 2021, 9(4), 137; https://doi.org/10.3390/economies9040137 - 27 Sep 2021
Cited by 17 | Viewed by 8562
Abstract
The COVID-19 pandemic has brought economic activity to a near standstill as many countries imposed very strict restrictions on movement to halt the spread of the virus. This study aims to assess the economic impacts of COVID-19 in the United Kingdom (UK) using artificial intelligence (AI) and data from previous economic crises to predict future economic impacts. The macroeconomic indicators, gross domestic product (GDP) and GDP growth, and data on the performance of three primary industries in the UK (the construction, production and service industries) were analysed by comparison with the patterns of previous economic crises. In this research, we experimented with the effectiveness of both continuous and categorical time-series forecasting in predicting future values to generate more accurate and useful results in the economic domain. Continuous value predictions indicate that GDP growth in 2021 will remain steady, but at around −8.5% contraction compared to the baseline figures before the pandemic. Further, the categorical predictions indicate that there will be no quarterly drop in GDP following the first quarter of 2021. This study provided evidence-based data on the economic effects of COVID-19 that can be used to plan necessary recovery procedures and to take appropriate actions to support the economy. Full article
(This article belongs to the Special Issue The Economics of Health Outbreaks and Epidemics)
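
To illustrate the two framings mentioned above, the sketch below builds lagged features from a synthetic quarterly series (a stand-in for the UK growth figures used in the study) and fits both a regressor for continuous forecasts and a classifier for the categorical "quarterly drop or not" forecast.

```python
# Continuous vs. categorical framing of the same time-series forecasting task.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(1)
growth = np.cumsum(rng.normal(0, 0.5, 120))          # synthetic quarterly series

lags = 4
X = np.array([growth[i - lags:i] for i in range(lags, len(growth))])
y_cont = growth[lags:]                               # next-quarter value
y_cat = (np.diff(growth)[lags - 1:] < 0).astype(int) # 1 = quarterly drop

X_train, X_test = X[:-8], X[-8:]
reg = RandomForestRegressor(random_state=0).fit(X_train, y_cont[:-8])
clf = RandomForestClassifier(random_state=0).fit(X_train, y_cat[:-8])

print("continuous forecasts:", np.round(reg.predict(X_test), 2))
print("drop predicted:      ", clf.predict(X_test))
```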

21 pages, 6872 KiB  
Article
3E-Net: Entropy-Based Elastic Ensemble of Deep Convolutional Neural Networks for Grading of Invasive Breast Carcinoma Histopathological Microscopic Images
by Zakaria Senousy, Mohammed M. Abdelsamea, Mona Mostafa Mohamed and Mohamed Medhat Gaber
Entropy 2021, 23(5), 620; https://doi.org/10.3390/e23050620 - 16 May 2021
Cited by 27 | Viewed by 4581
Abstract
Automated grading systems using deep convolutional neural networks (DCNNs) have proven their capability and potential to distinguish between different breast cancer grades using digitized histopathological images. In digital breast pathology, it is vital to measure how confident a DCNN is in grading using a machine-confidence metric, especially in the presence of challenging computer vision problems such as the high visual variability of the images. Such a quantitative metric can be employed not only to improve the robustness of automated systems, but also to assist medical professionals in identifying complex cases. In this paper, we propose an Entropy-based Elastic Ensemble of DCNN models (3E-Net) for grading invasive breast carcinoma microscopy images, which provides an initial stage of explainability through an entropy-based, uncertainty-aware mechanism. Our proposed model is designed to (1) exclude images to which the ensemble is less sensitive and about which it is highly uncertain and (2) dynamically grade the non-excluded images using the most certain models in the ensemble architecture. We evaluated two variations of 3E-Net on an invasive breast carcinoma dataset and achieved grading accuracies of 96.15% and 99.50%. Full article
(This article belongs to the Special Issue Medical Information Processing)
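
A rough sketch of the entropy-based mechanism: each model's softmax output is scored by its entropy, the most uncertain images are excluded, and the remaining images are graded by averaging only the more confident models. The probabilities below are random placeholders, not DCNN outputs.

```python
# Entropy-based exclusion and dynamic model selection over an ensemble.
import numpy as np

def entropy(p, eps=1e-12):
    return -(p * np.log(p + eps)).sum(axis=-1)

rng = np.random.default_rng(0)
n_models, n_images, n_classes = 5, 8, 4
probs = rng.dirichlet(np.ones(n_classes), size=(n_models, n_images))  # fake softmax outputs

h = entropy(probs)                                   # shape (n_models, n_images)
image_uncertainty = h.mean(axis=0)
keep = image_uncertainty < 0.8 * np.log(n_classes)   # exclude the most uncertain images

preds = []
for i in np.where(keep)[0]:
    confident = h[:, i] < np.median(h[:, i])         # the more certain models for this image
    vote = probs[confident, i].mean(axis=0)          # average their softmax outputs
    preds.append(int(vote.argmax()))
print("graded images:", np.where(keep)[0].tolist(), "predicted grades:", preds)
```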

17 pages, 1781 KiB  
Article
gbt-HIPS: Explaining the Classifications of Gradient Boosted Tree Ensembles
by Julian Hatwell, Mohamed Medhat Gaber and R. Muhammad Atif Azad
Appl. Sci. 2021, 11(6), 2511; https://doi.org/10.3390/app11062511 - 11 Mar 2021
Cited by 8 | Viewed by 3641
Abstract
This research presents Gradient Boosted Tree High Importance Path Snippets (gbt-HIPS), a novel, heuristic method for explaining gradient boosted tree (GBT) classification models by extracting a single classification rule (CR) from the ensemble of decision trees that make up the GBT model. This CR contains the most statistically important boundary values of the input space as antecedent terms. The CR represents a hyper-rectangle of the input space inside which the GBT model is, very reliably, classifying all instances with the same class label as the explanandum instance. In a benchmark test using nine data sets and five competing state-of-the-art methods, gbt-HIPS offered the best trade-off between coverage (0.16–0.75) and precision (0.85–0.98). Unlike competing methods, gbt-HIPS is also demonstrably guarded against under- and over-fitting. A further distinguishing feature of our method is that, unlike much prior work, our explanations also provide counterfactual detail in accordance with widely accepted recommendations for what makes a good explanation. Full article
(This article belongs to the Special Issue Explainable Artificial Intelligence (XAI))
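
The rule-extraction procedure itself is not shown here; this sketch only illustrates how a single classification rule, viewed as a hyper-rectangle over the input space, is scored by coverage and precision against a fitted gradient boosted tree model. The rule is hand-written for illustration, whereas gbt-HIPS derives it from high-importance paths in the ensemble.

```python
# Scoring a classification rule (hyper-rectangle) by coverage and precision.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

data = load_breast_cancer()
X, y = data.data, data.target
gbt = GradientBoostingClassifier(random_state=0).fit(X, y)
pred = gbt.predict(X)

# A rule is a conjunction of interval conditions: feature index -> (low, high).
rule = {data.feature_names.tolist().index("worst radius"): (-np.inf, 16.8)}

covered = np.all([(X[:, f] > lo) & (X[:, f] <= hi) for f, (lo, hi) in rule.items()], axis=0)
target_class = 1                                  # class of the instance being explained
coverage = covered.mean()                         # fraction of the data inside the rule
precision = (pred[covered] == target_class).mean()  # agreement of the GBT inside the rule
print(f"coverage={coverage:.2f}, precision={precision:.2f}")
```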

26 pages, 1546 KiB  
Article
A Frequent Pattern Conjunction Heuristic for Rule Generation in Data Streams
by Frederic Stahl, Thien Le, Atta Badii and Mohamed Medhat Gaber
Information 2021, 12(1), 24; https://doi.org/10.3390/info12010024 - 9 Jan 2021
Cited by 3 | Viewed by 3707
Abstract
This paper introduces a new and expressive algorithm for inducing descriptive rule-sets from streaming data in real-time in order to describe frequent patterns explicitly encoded in the stream. Data Stream Mining (DSM) is concerned with the automatic analysis of data streams in real-time. Rapid flows of data challenge the state-of-the-art processing and communication infrastructure, hence the motivation for research and innovation into real-time algorithms that analyse data streams on-the-fly and can automatically adapt to concept drifts. To date, DSM techniques have largely focused on predictive data mining applications that aim to forecast the value of a particular target feature of unseen data instances, answering questions such as whether a credit card transaction is fraudulent or not. A real-time, expressive and descriptive Data Mining technique for streaming data has not been previously established as part of the DSM toolkit. This has motivated the work reported in this paper, which has resulted in developing and validating a Generalised Rule Induction (GRI) tool, thus producing expressive rules as explanations that can be easily understood by human analysts. The expressiveness of decision models in data streams serves the objectives of transparency, underpinning the vision of ‘explainable AI’, and yet it is an area of research that has attracted less attention despite being of high practical importance. The algorithm introduced and described in this paper is termed Fast Generalised Rule Induction (FGRI). FGRI is able to induce descriptive rules incrementally for raw data from both categorical and numerical features. FGRI is able to adapt rule-sets to changes of the pattern encoded in the data stream (concept drift) on the fly as new data arrives and can thus be applied continuously in real-time. The paper also provides a theoretical, qualitative and empirical evaluation of FGRI. Full article

14 pages, 315 KiB  
Article
eGAP: An Evolutionary Game Theoretic Approach to Random Forest Pruning
by Khaled Fawagreh and Mohamed Medhat Gaber
Big Data Cogn. Comput. 2020, 4(4), 37; https://doi.org/10.3390/bdcc4040037 - 28 Nov 2020
Cited by 5 | Viewed by 5298
Abstract
The Internet of Things (IoT), which paved the way for the construction of smart cities, has given rise to many smart applications in numerous areas, including healthcare, with the aim of making healthcare available and easily accessible. As a result, smart healthcare applications have been and are being developed that use mobile and electronic technology to provide higher-quality diagnosis of diseases, better treatment of patients, and improved quality of life. Since smart healthcare applications that are mainly concerned with the prediction of healthcare data (diseases, for example) rely on predictive healthcare data analytics, it is imperative for such analytics to be as accurate as possible. In this paper, we exploit supervised machine learning methods in classification and regression to improve the performance of the traditional Random Forest on healthcare datasets, both in terms of accuracy and classification/regression speed, in order to produce an effective and efficient smart healthcare application, which we have termed eGAP. eGAP uses the evolutionary game theoretic approach of replicator dynamics to evolve a Random Forest ensemble. Trees of high resemblance in an initial Random Forest are clustered, and the clusters then grow and shrink by adding and removing trees using replicator dynamics, according to the predictive accuracy of each subforest represented by a cluster of trees. All clusters start with a number of trees equal to the number of trees in the smallest cluster. Cluster growth is performed using trees that were not initially sampled. The speed and accuracy of the proposed method have been demonstrated by an experimental study on 10 classification and 10 regression medical datasets. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2020)
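
The replicator-dynamics step at the core of eGAP can be written in a few lines: each cluster of similar trees holds a share of the ensemble, and shares are updated in proportion to the predictive accuracy of the sub-forest each cluster represents. The clustering and tree add/remove bookkeeping are omitted, and the fitness values below are placeholders.

```python
# Discrete replicator dynamics over clusters of trees in a forest.
import numpy as np

def replicator_step(shares, fitness):
    """x_i <- x_i * f_i / average fitness, so fitter clusters grow."""
    avg = (shares * fitness).sum()
    return shares * fitness / avg

shares = np.full(4, 0.25)                       # four equally sized clusters of trees
fitness = np.array([0.93, 0.88, 0.75, 0.91])    # placeholder sub-forest validation accuracies

for _ in range(10):
    shares = replicator_step(shares, fitness)

trees_total = 100
print("trees per cluster after evolution:", np.round(shares * trees_total).astype(int))
```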

17 pages, 2944 KiB  
Article
Edge Machine Learning: Enabling Smart Internet of Things Applications
by Mahmut Taha Yazici, Shadi Basurra and Mohamed Medhat Gaber
Big Data Cogn. Comput. 2018, 2(3), 26; https://doi.org/10.3390/bdcc2030026 - 3 Sep 2018
Cited by 93 | Viewed by 12594
Abstract
Machine learning has traditionally been performed solely on servers and high-performance machines. However, advances in chip technology have given us miniature devices that fit in our pockets, and mobile processors have vastly increased in capability, narrowing the gap between the simple processors embedded in such devices and their more complex cousins in personal computers. Thus, with the current advancement of these devices in terms of processing power, energy storage and memory capacity, the opportunity has arisen to extract great value from on-device machine learning for Internet of Things (IoT) devices. Implementing machine learning inference on edge devices has huge potential and is still in its early stages. However, it is already more powerful than most realise. In this paper, a step forward has been taken to understand the feasibility of running machine learning algorithms, both training and inference, on a Raspberry Pi, an embedded computing platform commonly used for IoT device development. Three different algorithms, Random Forests, Support Vector Machine (SVM) and Multi-Layer Perceptron, have been tested using ten diverse data sets on the Raspberry Pi to profile their performance in terms of speed (training and inference), accuracy, and power consumption. As a result of the conducted tests, the SVM algorithm proved to be slightly faster in inference and more efficient in power consumption, but the Random Forest algorithm exhibited the highest accuracy. In addition to the performance results, we discuss their usability scenarios and the idea of implementing more complex and taxing algorithms, such as Deep Learning, on these small devices in more detail. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2018)
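
A minimal version of this kind of on-device benchmark, using scikit-learn: train and time Random Forest, SVM and MLP classifiers and report training time, inference time and accuracy. The digits dataset is a small stand-in for the ten data sets used in the paper; run the script on a Raspberry Pi (or any machine) to compare devices.

```python
# Timing and accuracy comparison of three classifiers on a small dataset.
import time
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(*load_digits(return_X_y=True), random_state=0)

models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=500, random_state=0),
}
for name, model in models.items():
    t0 = time.perf_counter(); model.fit(X_train, y_train); train_t = time.perf_counter() - t0
    t0 = time.perf_counter(); acc = model.score(X_test, y_test); infer_t = time.perf_counter() - t0
    print(f"{name:13s} train {train_t:.2f}s  infer {infer_t:.3f}s  accuracy {acc:.3f}")
```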

22 pages, 2140 KiB  
Article
RedEdge: A Novel Architecture for Big Data Processing in Mobile Edge Computing Environments
by Muhammad Habib ur Rehman, Prem Prakash Jayaraman, Saif Ur Rehman Malik, Atta Ur Rehman Khan and Mohamed Medhat Gaber
J. Sens. Actuator Netw. 2017, 6(3), 17; https://doi.org/10.3390/jsan6030017 - 15 Aug 2017
Cited by 43 | Viewed by 14061
Abstract
We are witnessing the emergence of new big data processing architectures due to the convergence of the Internet of Things (IoT), edge computing and cloud computing. Existing big data processing architectures are underpinned by the transfer of raw data streams to the cloud computing environment for processing and analysis. This operation is expensive and fails to meet the real-time processing needs of IoT applications. In this article, we present and evaluate a novel big data processing architecture named RedEdge (i.e., data reduction on the edge) that incorporates mechanisms to facilitate the processing of big data streams near the source of the data. The RedEdge model leverages mobile IoT devices, termed mobile edge devices, as primary data processing platforms. However, when computational and battery power resources are unavailable, it offloads data streams to nearby mobile edge devices or to the cloud. We evaluate the RedEdge architecture and the related mechanism within a real-world experimental setting involving 12 mobile users. The experimental evaluation reveals that the RedEdge model has the capability to reduce the big data stream by up to 92.86% without compromising energy and memory consumption on mobile edge devices. Full article
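
A toy sketch of the placement decision described above: process the stream on the local edge device when it has enough free CPU and battery, otherwise offload to a nearby edge device, and fall back to the cloud as a last resort. Device attributes and thresholds are invented for illustration.

```python
# Placement decision for edge-first big data stream processing.
from dataclasses import dataclass

@dataclass
class EdgeDevice:
    name: str
    cpu_free: float      # fraction of CPU available
    battery: float       # fraction of battery remaining

def choose_processing_site(local, neighbours, min_cpu=0.3, min_battery=0.2):
    if local.cpu_free >= min_cpu and local.battery >= min_battery:
        return local.name                       # reduce data at the source
    for dev in neighbours:
        if dev.cpu_free >= min_cpu and dev.battery >= min_battery:
            return dev.name                     # offload to a nearby edge device
    return "cloud"                              # last resort: ship the raw stream to the cloud

local = EdgeDevice("phone-A", cpu_free=0.1, battery=0.15)
neighbours = [EdgeDevice("phone-B", cpu_free=0.6, battery=0.8)]
print(choose_processing_site(local, neighbours))   # -> phone-B
```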
