Editor's Choice Articles

Editor’s Choice articles are based on recommendations by the scientific editors of MDPI journals from around the world. Editors select a small number of articles recently published in the journal that they believe will be particularly interesting to readers, or important in the respective research area. The aim is to provide a snapshot of some of the most exciting work published in the various research areas of the journal.


Article
Early Diagnosis of Alzheimer’s Disease Using Cerebral Catheter Angiogram Neuroimaging: A Novel Model Based on Deep Learning Approaches
Big Data Cogn. Comput. 2022, 6(1), 2; https://doi.org/10.3390/bdcc6010002 - 28 Dec 2021
Cited by 7 | Viewed by 2263
Abstract
Neuroimaging refers to techniques that provide information about the neural structure of the human brain and are used for diagnosis, treatment, and scientific research. Classifying neuroimages is one of the most important steps medical staff rely on to diagnose patients early by investigating the indicators in different neuroimaging types. Early diagnosis of Alzheimer’s disease is of great importance in preventing the deterioration of the patient’s condition. In this research, a novel approach was devised based on digital subtraction angiogram scans, which provide sufficient features of a new biomarker, cerebral blood flow. The dataset was acquired from the database of the K.A.U.H hospital and contains digital subtraction angiograms of participants diagnosed with Alzheimer’s disease as well as samples of normal controls. Since each scan includes multiple frames for the left and right ICAs, pre-processing steps were applied to prepare the dataset for the subsequent feature-extraction and classification stages. The multiple frames of each scan were transformed from real space into DCT space and averaged to remove noise. The averaged image was then transformed back into real space, and both sides were filtered with the Meijering filter and concatenated into a single image. The proposed model extracts features using two pre-trained models, InceptionV3 and DenseNet201. PCA was then used to select the features with a 0.99 explained-variance ratio, and the combination of selected features from both pre-trained models was fed into machine learning classifiers. Overall, the experimental results are at least as good as other state-of-the-art approaches in the literature, achieving 99.14% accuracy, and are more efficient by recent medical standards, considering the difference in dataset samples and the cerebral blood flow biomarker used.
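The DCT-space frame-averaging step described above can be sketched in a few lines. This is a minimal illustration, not the authors' actual pre-processing code: the 8×8 frame size, the random frames, and the orthonormal DCT-II construction are all assumptions made here for demonstration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II transform matrix of size n x n."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    c[0, :] /= np.sqrt(2.0)  # scale first row for orthonormality
    return c

def average_frames_in_dct_space(frames):
    """Transform each frame to DCT space, average the coefficients,
    and transform the result back to real (image) space."""
    n = frames[0].shape[0]
    c = dct_matrix(n)
    coeffs = [c @ f @ c.T for f in frames]   # forward 2D DCT per frame
    mean_coeff = np.mean(coeffs, axis=0)     # average in DCT space
    return c.T @ mean_coeff @ c              # inverse 2D DCT

rng = np.random.default_rng(0)
frames = [rng.random((8, 8)) for _ in range(5)]  # toy stand-ins for scan frames
denoised = average_frames_in_dct_space(frames)
```

Because the DCT is linear and orthonormal, plain coefficient averaging is equivalent to pixel-wise averaging of the frames; the transform domain is where any additional coefficient filtering or thresholding would further suppress noise.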
(This article belongs to the Special Issue Machine Learning and Data Analysis for Image Processing)

Article
DASentimental: Detecting Depression, Anxiety, and Stress in Texts via Emotional Recall, Cognitive Networks, and Machine Learning
Big Data Cogn. Comput. 2021, 5(4), 77; https://doi.org/10.3390/bdcc5040077 - 13 Dec 2021
Viewed by 2179
Abstract
Most current affect scales and sentiment analysis on written text focus on quantifying valence/sentiment, the primary dimension of emotion. Distinguishing broader, more complex negative emotions of similar valence is key to evaluating mental health. We propose a semi-supervised machine learning model, DASentimental, to extract depression, anxiety, and stress from written text. We trained DASentimental to identify how N = 200 sequences of recalled emotional words correlate with recallers’ depression, anxiety, and stress from the Depression Anxiety Stress Scale (DASS-21). Using cognitive network science, we modeled every recall list as a bag-of-words (BOW) vector and as a walk over a network representation of semantic memory (in this case, free associations). This weights BOW entries according to their centrality (degree) in semantic memory and informs recalls using semantic network distances, thus embedding recalls in a cognitive representation. This embedding translated into state-of-the-art, cross-validated predictions for depression (R = 0.7), anxiety (R = 0.44), and stress (R = 0.52), equivalent to previous results employing additional human data. Powered by a multilayer perceptron neural network, DASentimental opens the door to probing the semantic organization of emotional distress. We found that the semantic distance between recalls (i.e., walk coverage) was key for estimating depression levels but redundant for anxiety and stress levels. Semantic distances from “fear” boosted anxiety predictions but were redundant when the “sad–happy” dyad was considered. We applied DASentimental to a clinical dataset of 142 suicide notes and found that the predicted depression and anxiety levels (high/low) corresponded to differences in valence and arousal, as expected from a circumplex model of affect. We discuss key directions for future research enabled by artificial intelligence detecting stress, anxiety, and depression in texts.
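The centrality-weighted bag-of-words idea described above can be illustrated with a toy free-association network. The words and edges below are invented for illustration only; the actual model is built on a large free-association dataset and the DASS-21 scores.

```python
from collections import Counter

# Toy free-association network (undirected edges); purely illustrative.
edges = [("sad", "cry"), ("sad", "fear"), ("fear", "dark"),
         ("happy", "smile"), ("happy", "sad")]

# Degree centrality of each word in the semantic network.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

def weighted_bow(recall_list):
    """Bag-of-words vector whose entries are weighted by each word's
    degree centrality in the semantic-memory network."""
    counts = Counter(recall_list)
    return {w: counts[w] * degree.get(w, 0) for w in counts}

vec = weighted_bow(["sad", "fear", "sad"])
```

A recall that repeatedly touches highly connected words ("sad" appears in three associations here) gets a proportionally larger entry than a rarer, more peripheral word, which is the embedding effect the abstract describes.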
(This article belongs to the Special Issue Knowledge Modelling and Learning through Cognitive Networks)

Article
Exploring Ensemble-Based Class Imbalance Learners for Intrusion Detection in Industrial Control Networks
Big Data Cogn. Comput. 2021, 5(4), 72; https://doi.org/10.3390/bdcc5040072 - 06 Dec 2021
Cited by 9 | Viewed by 1757
Abstract
Classifier ensembles have been utilized in the industrial cybersecurity sector for many years. However, their efficacy and reliability for intrusion detection systems remain questionable in current research, owing to the severe class imbalance in the data. The purpose of this article is to address a gap in the literature by illustrating the benefits of ensemble-based models for identifying threats and attacks in a cyber-physical power grid. We provide a framework that compares nine cost-sensitive individual and ensemble models designed specifically for handling imbalanced data, including cost-sensitive C4.5, roughly balanced bagging, random oversampling bagging, random undersampling bagging, synthetic minority oversampling bagging, random undersampling boosting, synthetic minority oversampling boosting, AdaC2, and EasyEnsemble. Each ensemble’s performance is tested against a range of benchmarked power system datasets utilizing balanced accuracy, Kappa statistics, and AUC metrics. Our findings demonstrate that EasyEnsemble significantly outperformed its rivals across the board. Furthermore, undersampling and oversampling strategies were effective in a boosting-based ensemble but not in a bagging-based ensemble.
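One of the evaluation metrics named above, balanced accuracy, can be sketched directly: it is the mean of per-class recall, which is why it is preferred over plain accuracy on imbalanced intrusion data. The toy labels below are illustrative, not from the paper's datasets.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; unlike plain accuracy, a majority-class
    guesser cannot score well on imbalanced data."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

# 8 benign samples, 2 attacks: a classifier that predicts "benign" for
# everything gets 80% plain accuracy but only 50% balanced accuracy.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
score = balanced_accuracy(y_true, y_pred)
```

This gap between the two metrics is exactly the failure mode that makes plain accuracy misleading for intrusion detection, where attack samples are rare.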
(This article belongs to the Special Issue Artificial Intelligence for Trustworthy Industrial Internet of Things)

Article
An Enhanced Parallelisation Model for Performance Prediction of Apache Spark on a Multinode Hadoop Cluster
Big Data Cogn. Comput. 2021, 5(4), 65; https://doi.org/10.3390/bdcc5040065 - 05 Nov 2021
Cited by 1 | Viewed by 1787
Abstract
Big data frameworks play a vital role in storing, processing, and analysing large datasets. Apache Spark has been established as one of the most popular big data engines for its efficiency and reliability. However, one of the significant problems of the Spark system is performance prediction. Spark has more than 150 configurable parameters, and configuring so many of them is a challenging task when determining the suitable parameters for the system. In this paper, we propose two distinct parallelisation models for performance prediction. Our insight is that each node in a Hadoop cluster can communicate with identical nodes, and a certain function of the non-parallelisable runtime can be estimated accordingly. Both models use simple equations that allow us to predict the runtime when the size of the job and the number of executables are known. The proposed models were evaluated on five HiBench workloads: Kmeans, PageRank, Graph (NWeight), SVM, and WordCount. Each workload’s empirical data were fitted with whichever of the two models met the accuracy requirements. Finally, the experimental findings show that the model can be a handy and helpful tool for scheduling and planning system deployment.
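The abstract does not reproduce the authors' exact equations, but an Amdahl-style model of the general shape it describes — a fixed non-parallelisable part plus a part that shrinks with the degree of parallelism — can be sketched as follows. All constants and measurements below are invented for illustration.

```python
import numpy as np

def fit_runtime_model(parallelism, runtimes):
    """Least-squares fit of t(n) = a + b / n: 'a' approximates the
    non-parallelisable runtime, 'b' the perfectly parallelisable part."""
    n = np.asarray(parallelism, dtype=float)
    x = np.column_stack([np.ones_like(n), 1.0 / n])
    (a, b), *_ = np.linalg.lstsq(x, np.asarray(runtimes, float), rcond=None)
    return a, b

def predict_runtime(a, b, n):
    return a + b / n

# Synthetic measurements following t(n) = 30 + 240 / n seconds.
ns = np.array([1, 2, 4, 8])
ts = 30 + 240 / ns
a, b = fit_runtime_model(ns, ts)
estimate_16 = predict_runtime(a, b, 16)
```

Once fitted from a few benchmark runs, such a model extrapolates the runtime for unseen degrees of parallelism, which is the scheduling use case the abstract points to.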

Article
Prediction of Cloud Fractional Cover Using Machine Learning
Big Data Cogn. Comput. 2021, 5(4), 62; https://doi.org/10.3390/bdcc5040062 - 03 Nov 2021
Viewed by 1924
Abstract
Climate change is regarded as one of the largest issues of our time, resulting in many unwanted effects on life on Earth. Cloud fractional cover (CFC), the portion of the sky covered by clouds, might affect global warming and various other aspects of human society such as agriculture and solar energy production. It is therefore important to improve the projection of future CFC, which is usually projected using numerical climate methods. In this paper, we explore the potential of using machine learning as part of a statistical downscaling framework to project future CFC; we are not aware of any other research that has explored this. We evaluated the potential of two different methods, a convolutional long short-term memory model (ConvLSTM) and a multiple regression equation, to predict CFC from other environmental variables. The predictions were associated with much uncertainty, indicating that there might not be much information in the environmental variables used in the study for predicting CFC. Overall, the regression equation performed best, but the ConvLSTM was the better-performing model along some coastal and mountain areas. All aspects of the research analyses are explained, including data preparation, model development, ML training, performance evaluation, and visualization.
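The multiple-regression baseline mentioned above can be sketched with ordinary least squares. The two predictor columns and the coefficients below are invented stand-ins for the environmental variables used in the paper, and the toy relationship is noise-free so the fit is exact.

```python
import numpy as np

# Hypothetical environmental predictors (e.g. humidity, pressure) and a
# toy cloud-fractional-cover target; names and values are illustrative.
rng = np.random.default_rng(1)
X = rng.random((100, 2))
true_w = np.array([0.6, -0.3])
cfc = X @ true_w + 0.5          # intercept 0.5, no noise

# Multiple regression: solve for intercept and coefficients jointly.
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, cfc, rcond=None)
```

With real data the residual variance would quantify the uncertainty the abstract reports, i.e. how much CFC information the environmental variables actually carry.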
(This article belongs to the Special Issue Multimedia Systems for Multimedia Big Data)

Article
6G Cognitive Information Theory: A Mailbox Perspective
Big Data Cogn. Comput. 2021, 5(4), 56; https://doi.org/10.3390/bdcc5040056 - 16 Oct 2021
Cited by 11 | Viewed by 19793
Abstract
With the rapid development of 5G communications, enhanced mobile broadband, massive machine type communications, and ultra-reliable low latency communications are widely supported. However, a 5G communication system is still based on Shannon’s information theory, and the meaning and value of information itself are not taken into account during transmission. It is therefore difficult to meet the 6G requirements of intelligence, customization, and value transmission. To address these challenges, we propose a 6G mailbox theory, namely a cognitive information carrier that enables distributed algorithm embedding for intelligent networking. Based on the mailbox, a 6G network will form an intelligent agent with self-organization, self-learning, self-adaptation, and continuous-evolution capabilities. With the intelligent agent, redundant data transmission can be reduced while the value transmission of information is improved. The features of the mailbox principle are then introduced, including polarity, traceability, dynamics, convergence, figurability, and dependence. Furthermore, key technologies for realizing the value transmission of information are introduced, including knowledge graphs, distributed learning, and blockchain. Finally, we establish a cognitive communication system assisted by deep learning. The experimental results show that, compared with a traditional communication system, our system transmits less data with fewer errors.
(This article belongs to the Special Issue Big Data and Cognitive Computing: 5th Anniversary Feature Papers)

Article
Hardening the Security of Multi-Access Edge Computing through Bio-Inspired VM Introspection
Big Data Cogn. Comput. 2021, 5(4), 52; https://doi.org/10.3390/bdcc5040052 - 08 Oct 2021
Viewed by 1693
Abstract
The extreme bandwidth and performance of 5G mobile networks change the way we develop and utilize digital services. Within a few years, 5G will not only touch technology and applications but dramatically change the economy, our society, and individual life. One of the emerging technologies that enables the evolution to 5G by bringing cloud capabilities near to the end users is Edge Computing, also known as Multi-Access Edge Computing (MEC), which will become pertinent to the evolution of 5G. This evolution also entails growth in the threat landscape and increased privacy concerns across different application areas; hence, security and privacy play a central role in the evolution towards 5G. Since MEC applications are instantiated in virtualized infrastructure, in this paper we present a distributed application that aims to constantly introspect multiple virtual machines (VMs) in order to detect malicious activities based on their anomalous behavior. Once suspicious processes are detected, our IDS notifies the system administrator in real time about the potential threat. The developed software is able to detect keyloggers, rootkits, trojans, process hiding, and other intrusion artifacts via agent-less operation, working either remotely or directly from the host machine. Remote memory introspection means there is no software to install and no warning that would prompt malware to evacuate or destroy data. Experimental results of remote VMI on more than 50 different malicious code samples demonstrate an average anomaly detection rate close to 97%. We established a wide testbed environment connecting the networks of two universities, Kyushu Institute of Technology and The City College of New York, through a secure GRE tunnel. Experiments conducted on this testbed demonstrate the responsiveness of the proposed system.
(This article belongs to the Special Issue Information Security and Cyber Intelligence)

Article
Bag of Features (BoF) Based Deep Learning Framework for Bleached Corals Detection
Big Data Cogn. Comput. 2021, 5(4), 53; https://doi.org/10.3390/bdcc5040053 - 08 Oct 2021
Cited by 8 | Viewed by 1940
Abstract
Coral reefs are sub-aqueous calcium carbonate structures built by invertebrates known as corals. The charm and beauty of coral reefs attract tourists, and they play a vital role in preserving biodiversity, ceasing coastal erosion, and promoting business trade; coral reefs also help treat human immunodeficiency virus (HIV) and heart disease. However, they are declining because of over-exploitation, damaging fishery, marine pollution, and global climate change. The corals of Australia’s Great Barrier Reef have started bleaching due to ocean acidification and global warming, which is an alarming threat to the Earth’s ecosystem. Many techniques have been developed to address such issues, but each has limitations due to the low resolution of images, diverse weather conditions, etc. In this paper, we propose a bag of features (BoF) based approach that can detect and localize bleached corals before safety measures are applied. The dataset contains images of bleached and unbleached corals, and various kernels are used with a support vector machine to classify the extracted features. The accuracy of handcrafted descriptors and deep convolutional neural networks is analyzed and reported in detail, with comparison to current methods. Various handcrafted descriptors, such as the local binary pattern, histogram of oriented gradients, locally encoded transform feature histogram, gray-level co-occurrence matrix, and completed joint scale local binary pattern, are used for feature extraction; deep convolutional neural networks such as AlexNet, GoogLeNet, VGG-19, ResNet-50, Inception v3, and CoralNet are used as well. Experimental analysis shows that the proposed technique outperforms the current state-of-the-art methods, achieving 99.08% accuracy with a classification error of 0.92%. A novel bleached coral positioning algorithm is also proposed to locate bleached corals in coral reef images.
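The bag-of-features encoding at the heart of the approach can be sketched as nearest-codeword assignment followed by a normalized histogram. The 2-D descriptors and 3-word codebook below are toy values; real descriptors would come from the handcrafted or CNN features listed above, and the codebook from clustering them.

```python
import numpy as np

def bof_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codebook word and
    return the normalized word-frequency histogram (the BoF vector)."""
    # Pairwise Euclidean distances: (num_descriptors, num_words).
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Toy 2-D descriptors and a 3-word codebook (illustrative values only).
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
desc = np.array([[0.1, 0.0], [0.9, 1.1], [1.9, 0.1], [0.0, 0.1]])
hist = bof_histogram(desc, codebook)
```

The resulting fixed-length histogram is what a kernel SVM would then classify as bleached versus unbleached, regardless of how many local descriptors each image produced.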
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

Article
Big Data Contribution in Desktop and Mobile Devices Comparison, Regarding Airlines’ Digital Brand Name Effect
Big Data Cogn. Comput. 2021, 5(4), 48; https://doi.org/10.3390/bdcc5040048 - 26 Sep 2021
Cited by 7 | Viewed by 2000
Abstract
Rising demand for optimized digital marketing strategies has led firms on a hunt to harvest every possible signal of users’ experience and preferences. People regularly visit numerous websites throughout the day using both desktop and mobile devices, so it is extremely important for businesses to understand each device’s usage rates. This research therefore focuses on analyzing each device’s usage and its effect on airline firms’ digital brand name. In the first phase of the research, we gathered web data from 10 airline firms over an observation period of 180 days. We then proceeded to develop an exploratory model using Fuzzy Cognitive Mapping, as well as a predictive and simulation model using Agent-Based Modeling. We inferred that various factors of airlines’ digital brand name are affected by both desktop and mobile usage, with mobile usage having a slightly bigger impact on most of them, with gradually rising values. Desktop device usage also appeared to be quite significant, especially for traffic coming from referral sources. The paper’s contribution is to provide a handful of time-accurate insights for marketers regarding airlines’ digital marketing strategies.
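A Fuzzy Cognitive Map update of the kind used in the exploratory model can be sketched as repeated sigmoid-squashed matrix products. The three concepts and edge weights below are invented for illustration and are not the map from the paper.

```python
import numpy as np

def fcm_step(state, weights, lam=1.0):
    """One Fuzzy Cognitive Map update: each concept's next activation is
    a sigmoid of the weighted sum of all concept activations."""
    return 1.0 / (1.0 + np.exp(-lam * (state @ weights)))

def fcm_run(state, weights, steps=50):
    """Iterate the map until (in this toy case) it settles."""
    for _ in range(steps):
        state = fcm_step(state, weights)
    return state

# Toy 3-concept map: mobile usage -> brand traffic -> brand-name score.
W = np.array([[0.0, 0.7, 0.0],
              [0.0, 0.0, 0.8],
              [0.0, 0.0, 0.0]])
final = fcm_run(np.array([0.9, 0.1, 0.1]), W)
```

Reading off the settled activations shows how an initial push on one concept propagates through the causal links, which is the kind of what-if analysis an FCM supports.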

Article
Indoor Localization for Personalized Ambient Assisted Living of Multiple Users in Multi-Floor Smart Environments
Big Data Cogn. Comput. 2021, 5(3), 42; https://doi.org/10.3390/bdcc5030042 - 08 Sep 2021
Cited by 12 | Viewed by 2141
Abstract
This paper presents a multifunctional interdisciplinary framework that makes four scientific contributions towards the development of personalized ambient assisted living (AAL), with a specific focus on addressing the different and dynamic needs of the diverse aging population in future smart living environments. First, it presents a probabilistic reasoning-based mathematical approach to model all possible forms of user interactions for any activity arising from the diversity of multiple users in such environments. Second, it presents a system that uses this approach with a machine learning method to model individual user profiles and user-specific interactions for detecting the dynamic indoor location of each specific user. Third, to address the need for highly accurate indoor localization systems for increased trust, reliance, and seamless user acceptance, the framework introduces a novel methodology where two boosting approaches, Gradient Boosting and the AdaBoost algorithm, are integrated and used on a decision tree-based learning model to perform indoor localization. Fourth, the framework introduces two novel functionalities that provide semantic context to indoor localization by detecting each user’s floor-specific location and by tracking whether a specific user was located inside or outside a given spatial region in a multi-floor indoor setting. These novel functionalities were tested on a dataset of localization-related Big Data collected from 18 different users who navigated 3 buildings consisting of 5 floors and 254 indoor spatial regions, with the aim of addressing the limitation in prior works in this field, namely the lack of training data from diverse users. The results show that this approach of indoor localization for personalized AAL, which models each specific user, always achieves higher accuracy than the traditional approach of modeling an average user. The results further demonstrate that the proposed framework outperforms all prior works in this field in terms of functionalities, performance characteristics, and operational features.
(This article belongs to the Special Issue Advanced Data Mining Techniques for IoT and Big Data)

Article
Big Data Research in Fighting COVID-19: Contributions and Techniques
Big Data Cogn. Comput. 2021, 5(3), 30; https://doi.org/10.3390/bdcc5030030 - 12 Jul 2021
Cited by 6 | Viewed by 3280
Abstract
The COVID-19 pandemic has induced many problems in various sectors of human life. After more than one year of the pandemic, many studies have been conducted to discover technological innovations and applications to combat the virus that has claimed many lives, and the use of Big Data technology to mitigate the threats of the pandemic has accelerated. This survey therefore aims to explore Big Data technology research in fighting the pandemic. The relevance of Big Data technology is analyzed, and technological contributions to five main areas are highlighted: healthcare, social life, government policy, business and management, and the environment. Analytical techniques from machine learning, deep learning, statistics, and mathematics for solving issues related to the pandemic are discussed. The data sources used in previous studies are also presented; they consist of government official sources, institutional services, IoT-generated data, online media, and open data. This study thus presents the role of Big Data technologies in enhancing COVID-19 research, provides insights into the current state of knowledge within the domain, and supplies references for further development or for starting new studies.
(This article belongs to the Special Issue Advanced Data Mining Techniques for IoT and Big Data)

Article
Big Data and the United Nations Sustainable Development Goals (UN SDGs) at a Glance
Big Data Cogn. Comput. 2021, 5(3), 28; https://doi.org/10.3390/bdcc5030028 - 28 Jun 2021
Cited by 11 | Viewed by 3973
Abstract
The launch of the United Nations (UN) 17 Sustainable Development Goals (SDGs) in 2015 was a historic event, uniting countries around the world behind the shared agenda of sustainable development and a more balanced relationship between human beings and the planet. The SDGs affect or impact almost all aspects of life, as indeed does the technological revolution empowered by Big Data and its related technologies. It is inevitable that these two significant domains and their integration will play central roles in achieving the 2030 Agenda. This research aims to provide a comprehensive overview of how these domains currently interact by illustrating the impact of Big Data on sustainable development in the context of each of the 17 UN SDGs.
(This article belongs to the Special Issue Big Data and UN Sustainable Development Goals (SDGs))

Article
Big Remote Sensing Image Classification Based on Deep Learning Extraction Features and Distributed Spark Frameworks
Big Data Cogn. Comput. 2021, 5(2), 21; https://doi.org/10.3390/bdcc5020021 - 05 May 2021
Cited by 2 | Viewed by 3255
Abstract
Big data analysis assumes a significant role in Earth observation using remote sensing images, since the explosion of image data from multiple sensors is used in several fields. Traditional data analysis techniques have various limitations in storing and processing massive volumes of data. Moreover, big remote sensing data analytics demand sophisticated algorithms based on specific techniques to store and process the data in real time or near real time with high accuracy, efficiency, and speed. In this paper, we present a method for storing a huge number of heterogeneous satellite images based on the Hadoop distributed file system (HDFS) and Apache Spark. We also show how deep learning algorithms such as VGGNet and UNet can benefit big remote sensing data processing for feature extraction and classification. The obtained results show that our approach outperforms other methods.
(This article belongs to the Special Issue Machine Learning and Data Analysis for Image Processing)

Article
From Data Processing to Knowledge Processing: Working with Operational Schemas by Autopoietic Machines
Big Data Cogn. Comput. 2021, 5(1), 13; https://doi.org/10.3390/bdcc5010013 - 10 Mar 2021
Cited by 10 | Viewed by 3349
Abstract
Knowledge processing is an important feature of intelligence in general and artificial intelligence in particular. To develop computing systems that work with knowledge, it is necessary to elaborate the means of working with knowledge representations (as opposed to data), because knowledge is an abstract structure. There are different forms of knowledge representations derived from data. One of the basic forms is called a schema, which can belong to one of three classes: operational, descriptive, and representation schemas. The goal of this paper is the development of theoretical and practical tools for processing operational schemas. To achieve this goal, we use schema representations elaborated in the mathematical theory of schemas and use structural machines as a powerful theoretical tool for modeling parallel and concurrent computational processes. We describe the schema of autopoietic machines as physical realizations of structural machines. An autopoietic machine is a technical system capable of regenerating, reproducing, and maintaining itself through the production, transformation, and destruction of its components and the networks of processes contained in them. We present the theory and practice of designing and implementing autopoietic machines as information processing structures integrating both symbolic computing and neural networks. Autopoietic machines use knowledge structures containing the behavioral evolution of the system and its interactions with the environment to maintain stability by counteracting fluctuations.
(This article belongs to the Special Issue Big Data Analytics and Cloud Data Management)

Article
Processing Big Data with Apache Hadoop in the Current Challenging Era of COVID-19
Big Data Cogn. Comput. 2021, 5(1), 12; https://doi.org/10.3390/bdcc5010012 - 09 Mar 2021
Cited by 9 | Viewed by 4515
Abstract
Big data have become a global strategic issue, as increasingly large amounts of unstructured data challenge the IT infrastructure of global organizations and threaten their capacity for strategic forecasting. As experienced in former massive information issues, big data technologies, such as Hadoop, should efficiently tackle the incoming large amounts of data and provide organizations with relevant processed information that was formerly neither visible nor manageable. After briefly recalling the strategic advantages of big data solutions in the introductory remarks, in the first part of this paper we focus on the advantages of big data solutions in the currently difficult time of the COVID-19 pandemic, which we characterize as an endemic heterogeneous data context; we then outline the advantages of technologies such as Hadoop and their IT suitability in this context. In the second part, we identify two specific advantages of Hadoop solutions, globality combined with flexibility, and we note that they are at work in a “Hadoop Fusion Approach” that we describe as an optimal response to the context. In the third part, we justify the selected qualifications of globality and flexibility by the fact that Hadoop solutions enable comparable returns in the opposite contexts of models of partial submodels and models of final exact systems. In part four, we remark that in both of these opposite contexts, Hadoop’s solutions allow a large range of needs to be fulfilled, which fits the requirements previously identified as the current heterogeneous data structure of COVID-19 information. In the final part, we propose a framework of strategic data processing conditions which, to the best of our knowledge, appear to be the most suitable for overcoming COVID-19’s massive information challenges.
(This article belongs to the Special Issue Big Data Analytics and Cloud Data Management)

Article
Automatic Defects Segmentation and Identification by Deep Learning Algorithm with Pulsed Thermography: Synthetic and Experimental Data
Big Data Cogn. Comput. 2021, 5(1), 9; https://doi.org/10.3390/bdcc5010009 - 26 Feb 2021
Cited by 12 | Viewed by 3054
Abstract
In quality evaluation (QE) in industrial production, infrared thermography (IRT) is one of the most important techniques for evaluating composite materials, owing to its low cost, fast inspection of large surfaces, and safety. The application of deep neural networks is a prominent direction in IRT Non-Destructive Testing (NDT). The Achilles heel of training such networks is the need for a large database, and collecting huge amounts of training data is expensive. In NDT with deep learning, the use of synthetic data to support training in infrared thermography remains relatively unexplored. In this paper, synthetic data from standard Finite Element Models are combined with experimental data to build training repositories for Mask Region-based Convolutional Neural Networks (Mask-RCNN), strengthening the network so that it learns the essential features of the objects of interest and segments defects automatically. These results indicate the possibility of merging inexpensive synthetic data with a certain amount of experimental data to train neural networks, achieving compelling performance from a limited collection of annotated experimental data from a real-world thermography experiment. Full article
(This article belongs to the Special Issue Machine Learning and Data Analysis for Image Processing)
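The merging of inexpensive synthetic and scarce experimental data described above can be sketched as a simple dataset-mixing step; the sample names, the `synth_fraction` ratio, and the helper itself are hypothetical illustrations, not the paper's actual pipeline:

```python
import random

def build_training_set(synthetic, experimental, synth_fraction=0.7, seed=0):
    """Mix FEM-generated synthetic samples with annotated experimental
    samples into one shuffled training pool (toy helper)."""
    rng = random.Random(seed)
    n_synth = int(len(synthetic) * synth_fraction)
    pool = rng.sample(synthetic, n_synth) + list(experimental)
    rng.shuffle(pool)
    return pool

synthetic = [f"fem_{i}" for i in range(10)]     # cheap simulated thermograms
experimental = [f"exp_{i}" for i in range(4)]   # scarce annotated real scans
train = build_training_set(synthetic, experimental)
print(len(train))  # 7 synthetic + 4 experimental = 11
```

The ratio of synthetic to experimental samples would in practice be tuned against segmentation accuracy on held-out experimental data.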

Article
Big Data and Personalisation for Non-Intrusive Smart Home Automation
Big Data Cogn. Comput. 2021, 5(1), 6; https://doi.org/10.3390/bdcc5010006 - 30 Jan 2021
Cited by 10 | Viewed by 3891
Abstract
With the advent of the Internet of Things (IoT), many different smart home technologies are commercially available. However, the adoption of such technologies is slow, as many of them are not cost-effective and focus on specific functions such as energy efficiency. Recently, IoT devices and sensors have been designed to enhance the quality of personal life by generating continuous data streams that users can monitor and draw inferences from. While smart home devices connect to the home Wi-Fi network, there are still compatibility issues between devices from different manufacturers. Smart devices get even smarter when they can communicate with and control each other: the information collected by one device can be shared with others to achieve enhanced automation of their operations. This paper proposes a non-intrusive approach to integrating and collecting data from open-standard IoT devices for personalised smart home automation using big data analytics and machine learning. We demonstrate the implementation of our proposed technology instantiation approach for non-intrusive IoT-based big data analytics with a use case of a smart home environment. We employ open-source frameworks such as Apache Spark, Apache NiFi and FB-Prophet, along with popular vendor tech stacks such as Azure and Databricks. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2020)
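The forecasting stage of such a pipeline can be illustrated with a toy moving-average predictor; the paper itself uses FB-Prophet on real sensor streams, and the temperature series here is invented:

```python
def moving_average_forecast(series, window=3):
    """Predict the next sensor reading as the mean of the last `window`
    values; a toy stand-in for the FB-Prophet forecasting stage."""
    if len(series) < window:
        raise ValueError("not enough history")
    return sum(series[-window:]) / window

temps = [21.0, 21.5, 22.0, 22.5, 23.0]  # hypothetical thermostat readings
print(moving_average_forecast(temps))   # (22.0 + 22.5 + 23.0) / 3 = 22.5
```

A real deployment would replace this with a seasonal model fitted to the streamed history, and the prediction would feed the automation rules.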

Article
NLP-Based Customer Loyalty Improvement Recommender System (CLIRS2)
Big Data Cogn. Comput. 2021, 5(1), 4; https://doi.org/10.3390/bdcc5010004 - 19 Jan 2021
Cited by 8 | Viewed by 3356
Abstract
Structured data on customer feedback is becoming more costly and time-consuming to collect and organize. On the other hand, unstructured opinionated data, e.g., in the form of free-text comments, is proliferating and available on public websites, such as social media sites, blogs, forums, and recommendation websites. This research proposes a novel method for developing a knowledge-based recommender system from unstructured (text) data. The method applies an opinion mining algorithm, extracts an aspect-based sentiment score per text item, and thereby transforms text into a structured form. An action rule mining algorithm is then applied to the data table constructed from sentiment mining. The proposed application of the method is the problem of improving customer satisfaction ratings. The results, obtained from a dataset of customer comments related to repair services, were evaluated in terms of accuracy and coverage. Further, the results were incorporated into a user-friendly, web-based recommender system that advises the business on how to maximally increase profits by introducing minimal sets of changes to its service. Experiments and evaluation results comparing the structured data-based version of the system, CLIRS (Customer Loyalty Improvement Recommender System), with the unstructured data-based version (CLIRS2) are provided. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2020)
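The core transformation, from free text to a structured aspect-score table, can be sketched as follows; the two-aspect lexicon is a deliberately tiny stand-in for the paper's full opinion-mining algorithm:

```python
# Toy aspect lexicons (hypothetical; the paper uses a full opinion-mining step)
ASPECTS = {"price": {"cheap": 1, "expensive": -1},
           "service": {"friendly": 1, "rude": -1, "slow": -1}}

def text_to_row(comment):
    """Map a free-text comment to one sentiment score per aspect,
    producing a row of the structured table used for action rule mining."""
    words = comment.lower().split()
    return {aspect: sum(lex.get(w, 0) for w in words)
            for aspect, lex in ASPECTS.items()}

row = text_to_row("Friendly staff but expensive and slow repair")
print(row)  # {'price': -1, 'service': 0}
```

Each comment becomes one row; stacking rows yields the decision table that the action rule miner consumes.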

Article
eGAP: An Evolutionary Game Theoretic Approach to Random Forest Pruning
Big Data Cogn. Comput. 2020, 4(4), 37; https://doi.org/10.3390/bdcc4040037 - 28 Nov 2020
Cited by 1 | Viewed by 2900
Abstract
To make healthcare available and easily accessible, the Internet of Things (IoT), which paved the way to smart cities, marked the birth of many smart applications in numerous areas, including healthcare. As a result, smart healthcare applications have been and are being developed to provide, using mobile and electronic technology, higher-quality diagnosis of diseases, better treatment of patients, and improved quality of life. Since smart healthcare applications that predict healthcare data (such as diseases) rely on predictive healthcare data analytics, it is imperative for such analytics to be as accurate as possible. In this paper, we exploit supervised machine learning methods for classification and regression to improve the performance of the traditional Random Forest on healthcare datasets, both in terms of accuracy and classification/regression speed, in order to produce an effective and efficient smart healthcare application, which we have termed eGAP. eGAP uses replicator dynamics, an evolutionary game-theoretic approach, to evolve a Random Forest ensemble. Trees of high resemblance in an initial Random Forest are clustered, and clusters then grow and shrink by adding and removing trees using replicator dynamics, according to the predictive accuracy of each sub-forest represented by a cluster of trees. All clusters start with a number of trees equal to the size of the smallest cluster, and cluster growth is performed using trees that were not initially sampled. The speed and accuracy of the proposed method have been demonstrated by an experimental study on 10 classification and 10 regression medical datasets. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2020)
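The replicator-dynamics update at the heart of eGAP can be sketched as follows; the fitness values and the discrete-time update shown here are illustrative assumptions, not the paper's exact formulation:

```python
def replicator_step(shares, fitness):
    """One replicator-dynamics update: a cluster's share of the forest grows
    when its fitness (sub-forest accuracy) beats the population average."""
    avg = sum(s * f for s, f in zip(shares, fitness))
    return [s * f / avg for s, f in zip(shares, fitness)]

shares = [1 / 3, 1 / 3, 1 / 3]    # three equally sized clusters of trees
fitness = [0.90, 0.80, 0.70]      # assumed accuracy of each sub-forest
for _ in range(5):
    shares = replicator_step(shares, fitness)
print([round(s, 3) for s in shares])  # the best cluster holds the largest share
```

Iterating the update concentrates the ensemble on the more accurate sub-forests, which is the pruning effect the paper exploits.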

Article
Ticket Sales Prediction and Dynamic Pricing Strategies in Public Transport
Big Data Cogn. Comput. 2020, 4(4), 36; https://doi.org/10.3390/bdcc4040036 - 27 Nov 2020
Cited by 6 | Viewed by 3712
Abstract
In recent years, the demand for collective mobility services has registered significant growth. In particular, the long-distance coach market underwent an important change in Europe when FlixBus adopted a dynamic pricing strategy, providing low-cost transport services and an efficient and fast information system. This paper presents a methodology, called DA4PT (Data Analytics for Public Transport), for discovering the factors that influence travelers in booking and purchasing bus tickets. Starting from a set of 3.23 million user-generated event logs of a bus ticketing platform, the methodology extracts correlation rules between booking factors and ticket purchases. Such rules are then used to train machine learning models for predicting whether or not a user will buy a ticket. The rules are also used to define various dynamic pricing strategies with the purpose of increasing ticket sales on the platform and the related revenue. The methodology reaches an accuracy of 95% in forecasting the purchase of a ticket, with low variance in results. Exploiting a dynamic pricing strategy, DA4PT is able to increase the number of purchased tickets by 6% and the total revenue by 9%, showing the effectiveness of the proposed approach. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2020)
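A dynamic-pricing strategy of the kind evaluated here can be sketched with a toy rule; the factors and coefficients below are invented for illustration and are not DA4PT's actual strategy:

```python
def dynamic_price(base_price, occupancy, days_to_departure):
    """Toy dynamic-pricing rule (illustrative only): the fare rises with
    bus occupancy and as the departure date approaches."""
    demand_factor = 1 + 0.5 * occupancy               # up to +50% when full
    urgency_factor = 1 + 0.3 / max(days_to_departure, 1)
    return round(base_price * demand_factor * urgency_factor, 2)

print(dynamic_price(20.0, occupancy=0.6, days_to_departure=2))
```

In the paper, the booking-factor rules mined from the event logs would drive which factors enter such a rule and with what weights.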

Article
Engineering Human–Machine Teams for Trusted Collaboration
Big Data Cogn. Comput. 2020, 4(4), 35; https://doi.org/10.3390/bdcc4040035 - 23 Nov 2020
Cited by 4 | Viewed by 3629
Abstract
The way humans and artificially intelligent machines interact is undergoing a dramatic change. This change becomes particularly apparent in domains where humans and machines collaboratively work on joint tasks or objects in teams, such as in industrial assembly or disassembly processes. While there is intensive research on human–machine collaboration in different disciplines, systematic and interdisciplinary approaches towards engineering systems that consist of or comprise human–machine teams are still rare. In this paper, we review and analyze the state of the art, and derive and discuss core requirements and concepts by means of an illustrative scenario. In terms of methods, we focus on how reciprocal trust between humans and intelligent machines is defined, built, measured, and maintained from a systems engineering and planning perspective in the literature. Based on our analysis, we propose and outline three important areas of future research on engineering and operating human–machine teams for trusted collaboration. For each area, we describe exemplary research opportunities. Full article

Article
A Complete VADER-Based Sentiment Analysis of Bitcoin (BTC) Tweets during the Era of COVID-19
Big Data Cogn. Comput. 2020, 4(4), 33; https://doi.org/10.3390/bdcc4040033 - 09 Nov 2020
Cited by 29 | Viewed by 7224
Abstract
During the COVID-19 pandemic, many research studies have been conducted to examine the impact of the outbreak on the financial sector, especially on cryptocurrencies. Social media, such as Twitter, plays a significant role as a meaningful indicator in forecasting Bitcoin (BTC) prices. However, there is a research gap in determining the optimal preprocessing strategy for BTC tweets when developing an accurate machine learning prediction model for Bitcoin prices. This paper develops different text preprocessing strategies for correlating the sentiment scores of Twitter text with Bitcoin prices during the COVID-19 pandemic. We explore the effect of different preprocessing functions, features, and time lengths of data on the correlation results. Out of 13 strategies, we discover that splitting sentences, removing Twitter-specific tags, or their combination generally improves the correlation of sentiment scores and volume polarity scores with Bitcoin prices. The prices only correlate well with sentiment scores over shorter timespans. Selecting the optimal preprocessing strategy would help machine learning prediction models achieve better accuracy relative to actual prices. Full article
(This article belongs to the Special Issue Knowledge Modelling and Learning through Cognitive Networks)
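Measuring how well a preprocessing strategy works boils down to a Pearson correlation between aligned daily series; the sentiment and price values below are invented, and a real pipeline would feed in VADER compound scores computed from the preprocessed tweets:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical daily compound sentiment scores and BTC closing prices
sentiment = [0.10, 0.35, 0.20, 0.60, 0.55]
price = [9100, 9300, 9200, 9700, 9600]
print(round(pearson(sentiment, price), 3))
```

Each of the 13 preprocessing strategies would produce a different sentiment series, and the one maximizing this correlation is the candidate for the prediction model.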

Article
Using Big and Open Data to Generate Content for an Educational Game to Increase Student Performance and Interest
Big Data Cogn. Comput. 2020, 4(4), 30; https://doi.org/10.3390/bdcc4040030 - 22 Oct 2020
Cited by 1 | Viewed by 3298
Abstract
The goal of this paper is to utilize available big and open data sets to create content for a board game and a digital game, and to implement an educational environment that improves students' familiarity with concepts and relations in the data and, in the process, their academic performance and engagement. To this end, we used Wikipedia data to generate content for a Monopoly clone called Geopoly and designed a game-based learning experiment. Our research examines whether this game had any impact on students' performance, which is related to identifying the implied ranking and grouping mechanisms in the game, whether performance is correlated with interest, and whether performance differs across genders. Student performance and knowledge about the relationships contained in the data improved significantly after playing the game, while the positive correlation between student interest and performance illustrated the relationship between them. This was also verified by a digital version of the game, evaluated by the students during the COVID-19 pandemic; initial results revealed that students found the game more attractive and rewarding than a traditional geography lesson. Full article
(This article belongs to the Special Issue Big Data Analytics for Cultural Heritage)

Article
Multi-Level Clustering-Based Outlier’s Detection (MCOD) Using Self-Organizing Maps
Big Data Cogn. Comput. 2020, 4(4), 24; https://doi.org/10.3390/bdcc4040024 - 23 Sep 2020
Cited by 9 | Viewed by 2980
Abstract
Outlier detection is critical in many business applications, as it recognizes unusual behaviours to prevent losses and optimize revenue. For example, illegitimate online transactions can be detected from their patterns with outlier detection. The performance of existing outlier detection methods is limited by the pattern/behaviour of the dataset; these methods may not perform well without prior knowledge of the dataset. This paper proposes a multi-level clustering-based outlier detection algorithm (MCOD) that uses multi-level unsupervised learning to cluster the data and discover outliers. The proposed detection method is tested on datasets from different fields with different sizes and dimensions. Experimental analysis has shown that the proposed MCOD algorithm improves the outlier detection rate compared to traditional anomaly detection methods. Enterprises and organizations can adopt the proposed MCOD algorithm to ensure sustainable and efficient detection of frauds/outliers to increase profitability and/or enhance business outcomes. Full article
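A simplified, distance-based version of clustering-driven outlier detection can be sketched as follows; the actual MCOD uses multi-level clustering with self-organizing maps, and the threshold rule and data here are assumptions:

```python
def flag_outliers(points, assignments, centroids, threshold=2.0):
    """Flag points whose distance to their cluster centroid exceeds
    `threshold` times the cluster's mean distance; a simplified,
    distance-based take on clustering-driven outlier detection."""
    dists = [abs(p - centroids[c]) for p, c in zip(points, assignments)]
    mean_dist = {c: sum(d for d, ci in zip(dists, assignments) if ci == c) /
                    assignments.count(c)
                 for c in set(assignments)}
    return [mean_dist[c] > 0 and d > threshold * mean_dist[c]
            for d, c in zip(dists, assignments)]

points = [1.0, 1.2, 0.8, 5.0, 10.0, 10.4]   # 5.0 is the injected outlier
assignments = [0, 0, 0, 0, 1, 1]            # cluster labels from a first pass
centroids = {0: 1.0, 1: 10.2}
flagged = flag_outliers(points, assignments, centroids)
print(flagged)
```

In a multi-level scheme, points surviving one level would be re-clustered and re-scored at the next, refining which observations remain flagged.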

Article
Keyword Search over RDF: Is a Single Perspective Enough?
Big Data Cogn. Comput. 2020, 4(3), 22; https://doi.org/10.3390/bdcc4030022 - 27 Aug 2020
Cited by 8 | Viewed by 3249
Abstract
Since the task of accessing RDF datasets through structured query languages like SPARQL is rather demanding for ordinary users, various approaches attempt to exploit the simpler and widely used keyword-based search paradigm. However, this task is challenging, since there is no clear unit of retrieval and presentation, user information needs are in most cases not clearly formulated, the underlying RDF datasets are in most cases incomplete, and no single presentation method is appropriate for all kinds of information needs. As a means to alleviate these problems, in this paper we investigate an interaction approach that offers multiple presentation methods for the search results (multiple perspectives), allowing the user to easily switch between these perspectives and thus exploit the added value that each perspective offers. We focus on a set of fundamental perspectives, discuss the benefits of each one, compare this approach with related existing systems, and report the results of a task-based evaluation with users. The key finding of the task-based evaluation is that users not familiar with RDF (a) managed to complete the information-seeking tasks (with performance very close to that of experienced users), and (b) rated the approach positively. Full article
(This article belongs to the Special Issue Semantic Web Technology and Recommender Systems)

Article
MOBDA: Microservice-Oriented Big Data Architecture for Smart City Transport Systems
Big Data Cogn. Comput. 2020, 4(3), 17; https://doi.org/10.3390/bdcc4030017 - 09 Jul 2020
Cited by 12 | Viewed by 4200
Abstract
Highly populated cities depend heavily on intelligent transportation systems (ITSs) for reliable and efficient resource utilization and traffic management. Current transportation systems struggle to meet different stakeholder expectations while trying to optimize resources in providing various transport services. This paper proposes a Microservice-Oriented Big Data Architecture (MOBDA) incorporating data processing techniques, such as predictive modelling, for achieving the smart transportation and analytics microservices required for the smart cities of the future. We postulate key transportation metrics applied to various sources of transportation data to serve this objective. A novel hybrid architecture is proposed to combine stream processing and batch processing of big data for smart computation of microservice-oriented transportation metrics that can serve the different needs of stakeholders. Development of such an architecture for smart transportation and analytics will improve the predictability of transport supply for transport providers and transport authorities, as well as enhance consumer satisfaction during peak periods. Full article

Article
TranScreen: Transfer Learning on Graph-Based Anti-Cancer Virtual Screening Model
Big Data Cogn. Comput. 2020, 4(3), 16; https://doi.org/10.3390/bdcc4030016 - 29 Jun 2020
Cited by 5 | Viewed by 4030
Abstract
Deep learning’s automatic feature extraction has proven superior to traditional fingerprint-based features in virtual screening models. However, these models face multiple challenges in early drug discovery, such as over-training and poor generalization to unseen data, due to the inherently unbalanced and small datasets. In this work, the TranScreen pipeline is proposed, which utilizes transfer learning and a collection of weight initializations to overcome these challenges. A total of 182 graph convolutional neural networks are trained on molecular source datasets, and the learned knowledge is transferred to the target task for fine-tuning. The target task of p53-based bioactivity prediction, an important factor in anti-cancer discovery, is chosen to showcase the capability of the pipeline. Having trained a collection of source models, three different approaches are implemented to compare and rank them for a given task before fine-tuning. The results show improved model performance in multiple cases, with the best model increasing the area under the receiver operating characteristic curve (ROC-AUC) from 0.75 to 0.91 and the recall from 0.25 to 1. This improvement is vital for practical virtual screening, as it lowers false negatives, and demonstrates the potential of transfer learning. The code and pre-trained models are made accessible online. Full article
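One plausible reading of ranking source models before fine-tuning is to score each model zero-shot on a target validation set, e.g. by ROC-AUC; the model names and scores below are invented for illustration and are not the paper's three ranking approaches:

```python
def roc_auc(labels, scores):
    """Rank-based ROC-AUC: the probability that a positive example
    is scored above a negative one (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical zero-shot scores of two pre-trained source models on a
# target validation set; the better-ranked model is fine-tuned first.
labels = [1, 0, 1, 0, 1]
model_a = [0.9, 0.2, 0.8, 0.4, 0.7]
model_b = [0.5, 0.6, 0.4, 0.7, 0.9]
best = max([("model_a", model_a), ("model_b", model_b)],
           key=lambda m: roc_auc(labels, m[1]))
print(best[0])  # model_a ranks higher here
```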

Article
#lockdown: Network-Enhanced Emotional Profiling in the Time of COVID-19
Big Data Cogn. Comput. 2020, 4(2), 14; https://doi.org/10.3390/bdcc4020014 - 16 Jun 2020
Cited by 31 | Viewed by 5495
Abstract
The COVID-19 pandemic forced countries all over the world to take unprecedented measures, like nationwide lockdowns. To adequately understand the emotional and social repercussions, a large-scale reconstruction of how people perceived these unexpected events is necessary but currently missing. We address this gap through social media by introducing MERCURIAL (Multi-layer Co-occurrence Networks for Emotional Profiling), a framework which exploits linguistic networks of words and hashtags to reconstruct the social discourse describing real-world events. We use MERCURIAL to analyse 101,767 tweets from Italy, the first country to react to the COVID-19 threat with a nationwide lockdown. The data were collected between the 11th and 17th of March, immediately after the announcement of the Italian lockdown and the WHO declaring COVID-19 a pandemic. Our analysis provides unique insights into the psychological burden of this crisis, focusing on (i) the Italian official campaign for self-quarantine (#iorestoacasa), (ii) the national lockdown (#italylockdown), and (iii) social denouncement (#sciacalli). Our exploration unveils the emergence of complex emotional profiles, where anger and fear (towards political debates and socio-economic repercussions) coexisted with trust, solidarity, and hope (related to the institutions and local communities). We discuss our findings in relation to mental well-being issues and coping mechanisms, like instigation to violence, grieving, and solidarity. We argue that our framework represents an innovative thermometer of emotional status, a powerful tool for policy makers to quickly gauge feelings in massive audiences and devise appropriate responses based on cognitive data. Full article
(This article belongs to the Special Issue Knowledge Modelling and Learning through Cognitive Networks)
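The linguistic-network backbone of such a framework starts from hashtag co-occurrence counts; a minimal sketch, with three invented tweets, looks like this:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(tweets):
    """Count how often each pair of hashtags appears in the same tweet;
    these weighted edges form one layer of a co-occurrence network."""
    edges = Counter()
    for tags in tweets:
        for a, b in combinations(sorted(set(tags)), 2):
            edges[(a, b)] += 1
    return edges

tweets = [["#iorestoacasa", "#italylockdown"],
          ["#iorestoacasa", "#italylockdown", "#sciacalli"],
          ["#sciacalli"]]
edges = cooccurrence_edges(tweets)
print(edges[("#iorestoacasa", "#italylockdown")])  # 2
```

The full framework layers word and hashtag networks and attaches emotional lexicon scores to nodes, which this sketch omits.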

Article
Artificial Intelligence-Enhanced Predictive Insights for Advancing Financial Inclusion: A Human-Centric AI-Thinking Approach
Big Data Cogn. Comput. 2020, 4(2), 8; https://doi.org/10.3390/bdcc4020008 - 27 Apr 2020
Cited by 7 | Viewed by 3898
Abstract
According to the World Bank, a key factor in poverty reduction and improving prosperity is financial inclusion. Financial service providers (FSPs) offering financially inclusive solutions need to understand how to approach the underserved successfully. The application of artificial intelligence (AI) to legacy data can help FSPs anticipate how prospective customers may respond when they are approached. However, it remains challenging for FSPs who are not well-versed in computer programming to implement AI projects. This paper proffers a no-coding, human-centric AI-based approach to simulate the possible dynamics between the financial profiles of prospective customers, collected from 45,211 contact encounters, and to predict their intentions toward the financial products being offered. This approach contributes to the literature by illustrating how AI for social good can also be accessible to people who are not well-versed in computer science. A rudimentary AI-based predictive modeling approach that does not require programming skills is illustrated in this paper. In these AI-generated multi-criteria optimizations, analysts in FSPs can simulate scenarios to better understand their prospective customers. In conjunction with the usage of AI, this paper also suggests how AI-Thinking could be utilized as a cognitive scaffold for educing (drawing out) actionable insights to advance financial inclusion. Full article

Article
Hydria: An Online Data Lake for Multi-Faceted Analytics in the Cultural Heritage Domain
Big Data Cogn. Comput. 2020, 4(2), 7; https://doi.org/10.3390/bdcc4020007 - 23 Apr 2020
Cited by 6 | Viewed by 3154
Abstract
Advancements in cultural informatics have significantly influenced the way we perceive, analyze, communicate and understand culture. New data sources, such as social media, digitized cultural content, and Internet of Things (IoT) devices, have allowed us to enrich and customize the cultural experience, but at the same time have created an avalanche of new data that needs to be stored and appropriately managed in order to be of value. Although data management plays a central role in driving forward the cultural heritage domain, the solutions applied so far are fragmented, physically distributed, require specialized IT knowledge to deploy, and entail significant IT experience to operate even for trivial tasks. In this work, we present Hydria, an online data lake that allows users without any IT background to harvest, store, organize, analyze and share heterogeneous, multi-faceted cultural heritage data. Hydria provides a zero-administration, zero-cost, integrated framework that enables researchers, museum curators and other stakeholders within the cultural heritage domain to easily (i) deploy data acquisition services (like social media scrapers, focused web crawlers, dataset imports, questionnaire forms), (ii) design and manage versatile customizable data stores, (iii) share whole datasets or horizontal/vertical data shards with other stakeholders, (iv) search, filter and analyze data via an expressive yet simple-to-use graphical query engine and visualization tools, and (v) perform user management and access control operations on the stored data. To the best of our knowledge, this is the first solution in the literature that focuses on collecting, managing, analyzing, and sharing diverse, multi-faceted data in the cultural heritage domain and targets users without an IT background. Full article
(This article belongs to the Special Issue Big Data Analytics for Cultural Heritage)

Article
Text Mining in Big Data Analytics
Big Data Cogn. Comput. 2020, 4(1), 1; https://doi.org/10.3390/bdcc4010001 - 16 Jan 2020
Cited by 59 | Viewed by 8103
Abstract
Text mining in big data analytics is emerging as a powerful tool for harnessing the power of unstructured textual data by analyzing it to extract new knowledge and identify significant patterns and correlations hidden in the data. This study seeks to determine the state of text mining research by examining developments in the published literature over the past years, and to provide valuable insights for practitioners and researchers on the predominant trends, methods, and applications of text mining research. Accordingly, more than 200 academic journal articles on the subject are included and discussed in this review; the state-of-the-art text mining approaches and techniques used for analyzing transcripts and speeches, meeting transcripts, and academic journal articles, as well as websites, emails, blogs, and social media platforms, across a broad range of application areas are also investigated. Additionally, the benefits and challenges related to text mining are briefly outlined. Full article
(This article belongs to the Special Issue Knowledge Modelling and Learning through Cognitive Networks)

Article
Emotional Decision-Making Biases Prediction in Cyber-Physical Systems
Big Data Cogn. Comput. 2019, 3(3), 49; https://doi.org/10.3390/bdcc3030049 - 30 Aug 2019
Cited by 1 | Viewed by 2559
Abstract
This article faces the challenge of discovering trends in decision-making based on captured emotional data and the influence of possible external stimuli. We conducted an experiment with a significant sample of the workforce and used machine-learning techniques to model the decision-making process. We studied the trends introduced by the emotional status and by the external stimulus that makes these personnel act or report to the supervisor. The main result of this study is a model capable of predicting the bias to act in a specific context. We studied the relationship between emotions and the probability of acting or correcting the system. The main interest of these findings lies in the ability to influence personnel in advance to make their work more efficient and productive, which would open a whole new line of research for the future. Full article
Article
Optimal Number of Choices in Rating Contexts
Big Data Cogn. Comput. 2019, 3(3), 48; https://doi.org/10.3390/bdcc3030048 - 27 Aug 2019
Cited by 2 | Viewed by 2573
Abstract
In many settings, people must give numerical scores to entities from a small discrete set—for instance, rating physical attractiveness from 1–5 on dating sites, or papers from 1–10 for conference reviewing. We study the problem of understanding when using a different number of options is optimal. We consider the case when scores are uniform random and Gaussian. We study computationally when using 2, 3, 4, 5, and 10 options out of a total of 100 is optimal in these models (though our theoretical analysis is for a more general setting with k choices from n total options as well as a continuous underlying space). One may expect that using more options would always improve performance in this model, but we show that this is not necessarily the case, and that using fewer choices—even just two—can surprisingly be optimal in certain situations. While in theory for this setting it would be optimal to use all 100 options, in practice, this is prohibitive, and it is preferable to utilize a smaller number of options due to humans’ limited computational resources. Our results could have many potential applications, as settings requiring entities to be ranked by humans are ubiquitous. There could also be applications to other fields such as signal or image processing where input values from a large set must be mapped to output values in a smaller set. Full article
(This article belongs to the Special Issue Computational Models of Cognition and Learning)
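The tradeoff studied here can be illustrated with a short Monte Carlo sketch that is not the paper's exact model: estimate the squared error introduced when a continuous uniform score is forced into one of k discrete options. Under this naive metric criterion more options always help (error shrinks roughly as 1/(12k²)); the paper's contribution is showing that other criteria and practical constraints can favour fewer choices. Function names here are illustrative.

```python
import random

def quantize(score, k):
    """Map a continuous score in [0, 1) to the midpoint of one of k buckets."""
    bucket = min(int(score * k), k - 1)
    return (bucket + 0.5) / k

def mean_squared_error(k, trials=10_000, seed=0):
    """Monte Carlo estimate of the quantization error for k rating options."""
    rng = random.Random(seed)
    return sum((x - quantize(x, k)) ** 2
               for x in (rng.random() for _ in range(trials))) / trials

# Error for 2, 3, 5, and 10 options on a uniform underlying score.
errors = {k: mean_squared_error(k) for k in (2, 3, 5, 10)}
```

In this purely metric model the error is strictly decreasing in k; the paper's rank-based analysis is what makes small k surprisingly competitive.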
Article
PerTract: Model Extraction and Specification of Big Data Systems for Performance Prediction by the Example of Apache Spark and Hadoop
Big Data Cogn. Comput. 2019, 3(3), 47; https://doi.org/10.3390/bdcc3030047 - 09 Aug 2019
Cited by 5 | Viewed by 3283
Abstract
Evaluating and predicting the performance of big data applications is required to efficiently size capacities and manage operations. Gaining profound insights into the system architecture, component dependencies, resource demands, and configurations is difficult for engineers. To address these challenges, this paper presents an approach to automatically extract and transform system specifications to predict the performance of applications. It consists of three components. First, a system- and tool-agnostic domain-specific language (DSL) allows the modeling of performance-relevant factors of big data applications, computing resources, and data workload. Second, DSL instances are automatically extracted from monitored measurements of Apache Spark and Apache Hadoop (i.e., YARN and HDFS) systems. Third, these instances are transformed to model- and simulation-based performance evaluation tools to allow predictions. By adapting DSL instances, our approach enables engineers to predict the performance of applications for different scenarios such as changing data input and resources. We evaluate our approach by predicting the performance of linear regression and random forest applications of the HiBench benchmark suite. Simulation results of adjusted DSL instances, compared to measurement results, show accurate predictions with errors below 15% based upon averages for response times and resource utilization. Full article
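The quoted evaluation criterion (prediction errors below 15% on averaged response times) can be made concrete with a small helper; the scenario values below are invented purely for illustration, not taken from the paper.

```python
def relative_error_pct(simulated, measured):
    """Percentage deviation of a simulated average from the measured one."""
    return abs(simulated - measured) / measured * 100.0

# Hypothetical (simulated, measured) average response times in seconds.
scenarios = [(41.0, 45.2), (88.1, 92.0), (12.3, 11.5)]
errors = [relative_error_pct(s, m) for s, m in scenarios]
within_budget = all(e < 15.0 for e in errors)
```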
Article
Future-Ready Strategic Oversight of Multiple Artificial Superintelligence-Enabled Adaptive Learning Systems via Human-Centric Explainable AI-Empowered Predictive Optimizations of Educational Outcomes
Big Data Cogn. Comput. 2019, 3(3), 46; https://doi.org/10.3390/bdcc3030046 - 31 Jul 2019
Cited by 6 | Viewed by 3547
Abstract
Artificial intelligence-enabled adaptive learning systems (AI-ALS) have been increasingly utilized in education. Schools are usually afforded the freedom to deploy the AI-ALS that they prefer. However, even before artificial intelligence autonomously develops into artificial superintelligence in the future, it would be remiss to entirely leave the students to the AI-ALS without any independent oversight of the potential issues. For example, if the students score well in formative assessments within the AI-ALS but subsequently perform badly in paper-based post-tests, or if the relentless algorithm of a particular AI-ALS is suspected of causing undue stress for the students, these issues should be addressed by educational stakeholders. Policy makers and educational stakeholders should collaborate to analyze the data from multiple AI-ALS deployed in different schools to achieve strategic oversight. The current paper provides exemplars to illustrate how this future-ready strategic oversight could be implemented using artificial intelligence-based Bayesian network software to analyze the data from five dissimilar AI-ALS, each deployed in a different school. Besides using descriptive analytics to reveal potential issues experienced by students within each AI-ALS, this human-centric AI-empowered approach also enables explainable predictive analytics of the students' learning outcomes in paper-based summative assessments after training is completed in each AI-ALS. Full article
(This article belongs to the Special Issue Artificial Superintelligence: Coordination & Strategy)
Article
Viability in Multiplex Lexical Networks and Machine Learning Characterizes Human Creativity
Big Data Cogn. Comput. 2019, 3(3), 45; https://doi.org/10.3390/bdcc3030045 - 31 Jul 2019
Cited by 19 | Viewed by 4204
Abstract
Previous studies have shown how individual differences in creativity relate to differences in the structure of semantic memory. However, the latter is only one aspect of the whole mental lexicon, a repository of conceptual knowledge that is considered to simultaneously include multiple types of conceptual similarities. In the current study, we apply a multiplex network approach to compute a representation of the mental lexicon combining semantics and phonology and examine how it relates to individual differences in creativity. This multiplex combination of 150,000 phonological and semantic associations identifies a core of words in the mental lexicon known as viable cluster, a kernel containing simpler to parse, more general, concrete words acquired early during language learning. We focus on low (N = 47) and high (N = 47) creative individuals’ performance in generating animal names during a semantic fluency task. We model this performance as the outcome of a mental navigation on the multiplex lexical network, going within, outside, and in-between the viable cluster. We find that low and high creative individuals differ substantially in their access to the viable cluster during the semantic fluency task. Higher creative individuals tend to access the viable cluster less frequently, with a lower uncertainty/entropy, reaching out to more peripheral words and covering longer multiplex network distances between concepts in comparison to lower creative individuals. We use these differences for constructing a machine learning classifier of creativity levels, which leads to an accuracy of 65.0 ± 0.9% and an area under the curve of 68.0 ± 0.8%, which are both higher than the random expectation of 50%. These results highlight the potential relevance of combining psycholinguistic measures with multiplex network models of the mental lexicon for modelling mental navigation and, consequently, classifying people automatically according to their creativity levels. Full article
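The "uncertainty/entropy" of access to the viable cluster can be illustrated as the Shannon entropy of a navigation trace labelled by where each step lands (within, outside, or between the cluster, following the abstract's three-way description). This is a generic sketch; the name and labels are illustrative.

```python
from math import log2
from collections import Counter

def navigation_entropy(trace):
    """Shannon entropy (bits) of a sequence of navigation labels,
    e.g. 'within' / 'outside' / 'between' the viable cluster."""
    counts = Counter(trace)
    n = len(trace)
    return -sum((c / n) * log2(c / n) for c in counts.values())
```

A lower value indicates a more predictable access pattern, as reported here for the higher-creativity participants.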
Article
RazorNet: Adversarial Training and Noise Training on a Deep Neural Network Fooled by a Shallow Neural Network
Big Data Cogn. Comput. 2019, 3(3), 43; https://doi.org/10.3390/bdcc3030043 - 23 Jul 2019
Cited by 4 | Viewed by 3207
Abstract
In this work, we propose ShallowDeepNet, a novel system architecture that includes a shallow and a deep neural network. The shallow neural network has the duty of data preprocessing and generating adversarial samples. The deep neural network has the duty of understanding data and information as well as detecting adversarial samples. The deep neural network gets its weights from transfer learning, adversarial training, and noise training. The system is examined on biometric (fingerprint and iris) and pharmaceutical (pill image) data. According to the simulation results, the system is capable of improving the detection accuracy of the biometric data from 1.31% to 80.65% when the adversarial data is used, and to 93.4% when the noisy data is given to the network as well. The system performance on the pill image data correspondingly increases from 34.55% to 96.03% and then to 98.2%. Training on different types of noise can help in detecting samples from unknown and unseen adversarial attacks. Meanwhile, the system training on the adversarial data as well as the noisy data occurs only once. In fact, retraining the system may improve the performance further. Furthermore, training the system on new types of attacks and noise can help in enhancing the system performance. Full article
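The noise-training idea above, training once on noisy copies of the data so the network generalises to unseen perturbations, can be sketched framework-agnostically as a data-augmentation step. The function and parameter names are my own, not from the paper.

```python
import random

def noise_augment(samples, sigma=0.1, copies=2, seed=0):
    """Return the original feature vectors plus `copies` Gaussian-noised
    versions of each, for noise training of a downstream classifier."""
    rng = random.Random(seed)
    augmented = list(samples)
    for _ in range(copies):
        augmented.extend([[x + rng.gauss(0.0, sigma) for x in sample]
                          for sample in samples])
    return augmented
```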
Article
Cooking Is Creating Emotion: A Study on Hinglish Sentiments of Youtube Cookery Channels Using Semi-Supervised Approach
Big Data Cogn. Comput. 2019, 3(3), 37; https://doi.org/10.3390/bdcc3030037 - 03 Jul 2019
Cited by 12 | Viewed by 4625
Abstract
The success of YouTube has attracted a lot of users, which results in an increase in the number of comments on YouTube channels. By analyzing those comments, we could provide insights to the YouTubers that would help them to deliver better quality. YouTube is very popular in India. A majority of the population in India speaks and writes a mixture of two languages, known as Hinglish, for casual communication on social media. Our study focuses on the sentiment analysis of Hinglish comments on cookery channels. The unsupervised learning technique DBSCAN was employed in our work to find the different patterns in the comments data. We have modelled and evaluated both parametric and non-parametric learning algorithms. Logistic regression with the term frequency vectorizer gave 74.01% accuracy on Nisha Madulika’s dataset and 75.37% accuracy on Kabita’s Kitchen dataset. Each classifier is statistically tested in our study. Full article
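A term-frequency vectorizer of the kind fed to the logistic regression classifier can be sketched in a few lines as a plain bag-of-words; the study itself presumably used a library implementation, so this is only a minimal stand-in to show what the feature vectors look like.

```python
from collections import Counter

def tf_vectorize(docs):
    """Build a vocabulary and raw term-frequency vectors for a list of
    (already cleaned) comment strings."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors
```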
Article
Data-Driven Load Forecasting of Air Conditioners for Demand Response Using Levenberg–Marquardt Algorithm-Based ANN
Big Data Cogn. Comput. 2019, 3(3), 36; https://doi.org/10.3390/bdcc3030036 - 02 Jul 2019
Cited by 15 | Viewed by 3252
Abstract
The impact of air conditioners (ACs) on overall electricity consumption in buildings is very high. Therefore, controlling AC power consumption is a significant factor for demand response. With the advancement of demand-side management techniques and the smart grid, precise AC load forecasting for electrical utilities and end-users is required. In this paper, big data analysis and its applications in power systems are introduced. After this, various load forecasting categories and the various techniques applied for load forecasting in the context of big data analysis in power systems are explored. Then, a Levenberg–Marquardt Algorithm (LMA)-based Artificial Neural Network (ANN) for residential AC short-term load forecasting is presented. This forecasting approach utilizes past hourly temperature observations and AC load as input variables for assessment. Different performance assessment indices have also been investigated. Error formulations show that the LMA-based ANN gives better results than the Scaled Conjugate Gradient (SCG) and statistical regression approaches. Furthermore, information on AC load is obtainable for different time horizons (weekly, hourly, and monthly) due to the better prediction accuracy of the LMA-based ANN, which is helpful for efficient demand response (DR) implementation. Full article
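One of the "performance assessment indices" commonly used for such load forecasts is the mean absolute percentage error (MAPE). The abstract does not name its exact formulas, so this is a representative choice rather than the paper's own error formulation.

```python
def mape(actual, forecast):
    """Mean absolute percentage error between hourly AC loads (e.g. kW)
    and their forecasts."""
    return 100.0 / len(actual) * sum(abs(a - f) / abs(a)
                                     for a, f in zip(actual, forecast))
```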
Article
Automatic Human Brain Tumor Detection in MRI Image Using Template-Based K Means and Improved Fuzzy C Means Clustering Algorithm
Big Data Cogn. Comput. 2019, 3(2), 27; https://doi.org/10.3390/bdcc3020027 - 13 May 2019
Cited by 41 | Viewed by 4107
Abstract
In recent decades, human brain tumor detection has become one of the most challenging issues in medical science. In this paper, we propose a model that includes the template-based K-means and improved fuzzy C-means (TKFCM) algorithm for detecting human brain tumors in magnetic resonance imaging (MRI) images. In the proposed algorithm, firstly, the template-based K-means algorithm is used to initialize the segmentation through the careful selection of a template based on the gray-level intensity of the image; secondly, the updated membership is determined by the distances from the cluster centroid to the cluster data points using the fuzzy C-means (FCM) algorithm until it reaches its best result; and finally, the improved FCM clustering algorithm is used for detecting the tumor position by updating the membership function, which is obtained based on different features of the tumor image, including contrast, energy, dissimilarity, homogeneity, entropy, and correlation. Simulation results show that the proposed algorithm achieves better detection of abnormal and normal tissues in the human brain under small detachments of gray-level intensity. In addition, this algorithm detects human brain tumors within a very short time, in seconds compared to minutes with other algorithms. Full article
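The fuzzy C-means membership update mentioned above has a standard closed form. A one-dimensional sketch (scalar intensities, fuzzifier m) conveys the intuition; the paper's improved variant updates this membership using texture features, which is not reproduced here.

```python
def fcm_membership(x, centroids, m=2.0):
    """Standard FCM membership of scalar point x in each cluster."""
    d = [abs(x - c) for c in centroids]
    if any(di == 0.0 for di in d):            # point sits exactly on a centroid
        return [1.0 if di == 0.0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[i] / d[j]) ** p for j in range(len(d)))
            for i in range(len(d))]
```

Memberships across clusters always sum to one, and points closer to a centroid get proportionally larger weight.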
Article
AI Governance and the Policymaking Process: Key Considerations for Reducing AI Risk
Big Data Cogn. Comput. 2019, 3(2), 26; https://doi.org/10.3390/bdcc3020026 - 08 May 2019
Cited by 11 | Viewed by 5171
Abstract
This essay argues that a new subfield of AI governance should be explored that examines the policymaking process and its implications for AI governance. A growing number of researchers have begun working on the question of how to mitigate the catastrophic risks of transformative artificial intelligence, including what policies states should adopt. However, this essay identifies a preceding, meta-level problem: how the space of possible policies is affected by the politics and administrative mechanisms through which those policies are created and implemented. This creates a new set of key considerations for the field of AI governance and should influence the actions of future policymakers. This essay examines some of the theories of the policymaking process, how they compare to current work in AI governance, and their implications for the field at large, and ends by identifying areas for future research. Full article
(This article belongs to the Special Issue Artificial Superintelligence: Coordination & Strategy)
Article
Pruning Fuzzy Neural Network Applied to the Construction of Expert Systems to Aid in the Diagnosis of the Treatment of Cryotherapy and Immunotherapy
Big Data Cogn. Comput. 2019, 3(2), 22; https://doi.org/10.3390/bdcc3020022 - 09 Apr 2019
Cited by 14 | Viewed by 2904
Abstract
Human papillomavirus (HPV) infection is related to frequent cases of cervical cancer and genital condyloma in humans. Numerous methods now exist for the prevention and treatment of this disease. In this context, this paper aims to help predict a patient’s susceptibility to two forms of treatment, cryotherapy and immunotherapy. Such predictions facilitate the choice of medications for treatments that can be painful and embarrassing for patients who have warts on intimate parts. However, while intelligent models generate efficient results, they often do not allow a good interpretation of those results. To solve this problem, we present a fuzzy neural network (FNN) method: a hybrid model capable of solving complex problems and extracting knowledge from the database, which is pruned through F-score techniques to perform pattern classification in the treatment of warts and to produce a specialist system based on if/then rules, according to the experience obtained from the database collected through medical research. Finally, binary pattern-classification tests performed with the FNN and compared with other models commonly used for classification tasks achieve greater accuracy than the current state of the art for this type of problem (84.32% for immunotherapy and 88.64% for cryotherapy) and extract fuzzy rules from the problem database. We found that the hybrid approach based on neural networks and fuzzy systems can be an excellent tool to aid the prediction of cryotherapy and immunotherapy treatments. Full article
(This article belongs to the Special Issue Health Assessment in the Big Data Era)
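The F-score pruning criterion ranks each feature by how well it separates the two outcome classes. The usual feature-selection definition (between-class separation over within-class spread) is implemented below; whether the paper uses exactly this variant is an assumption.

```python
def f_score(pos, neg):
    """Feature-selection F-score for one feature: separation of the
    positive- and negative-class means over the within-class variances."""
    everything = pos + neg
    mean_all = sum(everything) / len(everything)
    mean_p = sum(pos) / len(pos)
    mean_n = sum(neg) / len(neg)
    var_p = sum((x - mean_p) ** 2 for x in pos) / (len(pos) - 1)
    var_n = sum((x - mean_n) ** 2 for x in neg) / (len(neg) - 1)
    return ((mean_p - mean_all) ** 2 + (mean_n - mean_all) ** 2) / (var_p + var_n)
```

Features with low scores discriminate poorly between classes and are candidates for pruning from the network.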
Communication
Multiparty Dynamics and Failure Modes for Machine Learning and Artificial Intelligence
Big Data Cogn. Comput. 2019, 3(2), 21; https://doi.org/10.3390/bdcc3020021 - 05 Apr 2019
Cited by 8 | Viewed by 2950
Abstract
An important challenge for safety in machine learning and artificial intelligence systems is a set of related failures involving specification gaming, reward hacking, fragility to distributional shifts, and Goodhart’s or Campbell’s law. This paper presents additional failure modes for interactions within multi-agent systems that are closely related. These multi-agent failure modes are more complex, more problematic, and less well understood than the single-agent case, and are also already occurring, largely unnoticed. After motivating the discussion with examples from poker-playing artificial intelligence (AI), the paper explains why these failure modes are in some senses unavoidable. Following this, the paper categorizes failure modes, provides definitions, and cites examples for each of the modes: accidental steering, coordination failures, adversarial misalignment, input spoofing and filtering, and goal co-option or direct hacking. The paper then discusses how extant literature on multi-agent AI fails to address these failure modes, and identifies work which may be useful for the mitigation of these failure modes. Full article
(This article belongs to the Special Issue Artificial Superintelligence: Coordination & Strategy)
Article
Big Data Management Canvas: A Reference Model for Value Creation from Data
Big Data Cogn. Comput. 2019, 3(1), 19; https://doi.org/10.3390/bdcc3010019 - 11 Mar 2019
Cited by 12 | Viewed by 8104
Abstract
Many big data projects are technology-driven and thus, expensive and inefficient. It is often unclear how to exploit existing data resources and map data, systems and analytics results to actual use cases. Existing big data reference models are mostly either technological or business-oriented in nature, but do not consequently align both aspects. To address this issue, a reference model for big data management is proposed that operationalizes value creation from big data by linking business targets with technical implementation. The purpose of this model is to provide a goal- and value-oriented framework to effectively map and plan purposeful big data systems aligned with a clear value proposition. Based on an epistemic model that conceptualizes big data management as a cognitive system, the solution space of data value creation is divided into five layers: preparation, analysis, interaction, effectuation, and intelligence. To operationalize the model, each of these layers is subdivided into corresponding business and IT aspects to create a link from use cases to technological implementation. The resulting reference model, the big data management canvas, can be applied to classify and extend existing big data applications and to derive and plan new big data solutions, visions, and strategies for future projects. To validate the model in the context of existing information systems, the paper describes three cases of big data management in existing companies. Full article
Article
Global Solutions vs. Local Solutions for the AI Safety Problem
Big Data Cogn. Comput. 2019, 3(1), 16; https://doi.org/10.3390/bdcc3010016 - 20 Feb 2019
Cited by 5 | Viewed by 2764
Abstract
There are two types of artificial general intelligence (AGI) safety solutions: global and local. Most previously suggested solutions are local: they explain how to align or “box” a specific AI (Artificial Intelligence), but do not explain how to prevent the creation of dangerous AI in other places. Global solutions are those that ensure any AI on Earth is not dangerous. The number of suggested global solutions is much smaller than the number of proposed local solutions. Global solutions can be divided into four groups: 1. No AI: AGI technology is banned or its use is otherwise prevented; 2. One AI: the first superintelligent AI is used to prevent the creation of any others; 3. Net of AIs as AI police: a balance is created between many AIs, so they evolve as a net and can prevent any rogue AI from taking over the world; 4. Humans inside AI: humans are augmented or part of AI. We explore many ideas, both old and new, regarding global solutions for AI safety. They include changing the number of AI teams, different forms of “AI Nanny” (non-self-improving global control AI system able to prevent creation of dangerous AIs), selling AI safety solutions, and sending messages to future AI. Not every local solution scales to a global solution or does it ethically and safely. The choice of the best local solution should include understanding of the ways in which it will be scaled up. Human-AI teams or a superintelligent AI Service as suggested by Drexler may be examples of such ethically scalable local solutions, but the final choice depends on some unknown variables such as the speed of AI progress. Full article
(This article belongs to the Special Issue Artificial Superintelligence: Coordination & Strategy)
Article
Intelligent Recommender System for Big Data Applications Based on the Random Neural Network
Big Data Cogn. Comput. 2019, 3(1), 15; https://doi.org/10.3390/bdcc3010015 - 18 Feb 2019
Cited by 5 | Viewed by 1888
Abstract
Online marketplaces make their profit from advertisements or sales commissions, while businesses have a commercial interest in ranking higher in recommendations to attract more customers. Web users cannot be guaranteed that the products provided by recommender systems within Big Data are either exhaustive or relevant to their needs. This article analyses the product rank relevance provided by different commercial Big Data recommender systems (Grouplens film, Trip Advisor, and Amazon); it also proposes an Intelligent Recommender System (IRS) based on the Random Neural Network; the IRS acts as an interface between the customer and the different recommender systems that iteratively adapts to the perceived user relevance. In addition, a relevance metric that combines both relevance and rank is presented; this metric is used to validate and compare the performance of the proposed algorithm. On average, the IRS outperforms the Big Data recommender systems after learning iteratively from its customer. Full article
(This article belongs to the Special Issue Big-Data Driven Multi-Criteria Decision-Making)
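A metric that "combines both relevance and rank" can take the familiar discounted form, where each item's relevance is divided by a logarithm of its rank position. This DCG-style sketch is one plausible shape for such a metric, not necessarily the paper's exact definition.

```python
from math import log2

def rank_weighted_relevance(relevances):
    """Sum of per-item relevances discounted by rank position, so a
    relevant product ranked first counts more than one ranked last."""
    return sum(rel / log2(rank + 2) for rank, rel in enumerate(relevances))
```

Two result lists with the same relevance values then score differently depending on whether the relevant items appear near the top.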
Article
Modelling Early Word Acquisition through Multiplex Lexical Networks and Machine Learning
Big Data Cogn. Comput. 2019, 3(1), 10; https://doi.org/10.3390/bdcc3010010 - 24 Jan 2019
Cited by 17 | Viewed by 2977
Abstract
Early language acquisition is a complex cognitive task. Recent data-informed approaches showed that children do not learn words uniformly at random but rather follow specific strategies based on the associative representation of words in the mental lexicon, a conceptual system enabling human cognitive computing. Building on this evidence, the current investigation introduces a combination of machine learning techniques, psycholinguistic features (i.e., frequency, length, polysemy and class) and multiplex lexical networks, representing the semantics and phonology of the mental lexicon, with the aim of predicting normative acquisition of 529 English words by toddlers between 22 and 26 months. Classifications using logistic regression and based on four psycholinguistic features achieve the best baseline cross-validated accuracy of 61.7% when half of the words have been acquired. Adding network information through multiplex closeness centrality enhances accuracy (up to 67.7%) more than adding multiplex neighbourhood density/degree (62.4%) or multiplex PageRank versatility (63.0%) or the best single-layer network metric, i.e., free association degree (65.2%), instead. Multiplex closeness operationalises the structural relevance of words for semantic and phonological information flow. These results indicate that the whole, global, multi-level flow of information and structure of the mental lexicon influence word acquisition more than single-layer or local network features of words when considered in conjunction with language norms. The highlighted synergy of multiplex lexical structure and psycholinguistic norms opens new ways for understanding human cognition and language processing through powerful and data-parsimonious cognitive computing approaches. Full article
(This article belongs to the Special Issue Computational Models of Cognition and Learning)
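Multiplex closeness centrality, the best predictor above, builds on the single-layer notion: the inverse of a node's average shortest-path distance to all reachable nodes. A single-layer BFS version over an adjacency dictionary shows the basic ingredient (the multiplex version aggregates distances across the semantic and phonological layers, which is not reproduced here).

```python
from collections import deque

def closeness(graph, node):
    """Closeness centrality of `node`: number of reachable nodes divided
    by the sum of BFS shortest-path distances to them."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    distances = [d for n, d in dist.items() if n != node]
    return len(distances) / sum(distances) if distances else 0.0
```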
Article
Fog Computing for Internet of Things (IoT)-Aided Smart Grid Architectures
Big Data Cogn. Comput. 2019, 3(1), 8; https://doi.org/10.3390/bdcc3010008 - 19 Jan 2019
Cited by 34 | Viewed by 3397
Abstract
The fast-paced development of power systems calls for the smart grid (SG) to facilitate real-time control and monitoring with bidirectional communication and electricity flows. To meet the computational requirements of SG applications, cloud computing (CC) provides flexible resources and services shared over the network, parallel processing, and ubiquitous access. Even though the CC model is considered efficient for the SG, it fails to guarantee the Quality-of-Experience (QoE) requirements of SG services, viz. latency, bandwidth, energy consumption, and network cost. Fog Computing (FC) extends CC by deploying localized computing and processing facilities at the edge of the network, offering location awareness, low latency, and latency-sensitive analytics for the mission-critical requirements of SG applications. By deploying localized computing facilities at the premises of users, it pre-stores cloud data and distributes it to SG users over fast local connections. In this paper, we first examine the current state of cloud-based SG architectures and highlight the motivation for adopting FC as a technology enabler for real-time SG analytics. We also present a three-layer FC-based SG architecture, characterizing its features towards integrating a massive number of Internet of Things (IoT) devices into the future SG. We then propose a cost optimization model for FC that jointly investigates data consumer association, workload distribution, virtual machine placement, and Quality-of-Service (QoS) constraints. The formulated model is a Mixed-Integer Nonlinear Programming (MINLP) problem, which is solved using a Modified Differential Evolution (MDE) algorithm. We evaluate the proposed framework on real-world parameters and show that, for a network with approximately 50% time-critical applications, the overall service latency for FC is nearly half that of the cloud paradigm.
We also observe that FC lowers the aggregated power consumption of the generic CC model by more than 44%. Full article
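The abstract above leaves the MDE solver as a black box. As a rough illustration only (not the paper's actual MINLP model), the sketch below minimizes a toy fog/cloud workload-split cost with a basic differential evolution loop; all constants, the single decision variable, and the capacity penalty are illustrative assumptions.

```python
# Toy differential-evolution sketch in the spirit of the MDE approach above.
# Decision variable x = fraction of requests served at the fog layer.
# FOG_LATENCY, CLOUD_LATENCY, and FOG_CAPACITY are invented for illustration.
import random

random.seed(42)

FOG_LATENCY, CLOUD_LATENCY = 5.0, 50.0  # assumed ms per request
FOG_CAPACITY = 0.6                      # fog can absorb 60% of the load

def cost(x):
    """Average service latency plus a soft penalty for exceeding fog capacity."""
    fog_share = min(max(x, 0.0), 1.0)
    latency = fog_share * FOG_LATENCY + (1 - fog_share) * CLOUD_LATENCY
    overload = max(0.0, fog_share - FOG_CAPACITY)
    return latency + 1000.0 * overload ** 2

def differential_evolution(f, pop_size=20, gens=100, F=0.8, CR=0.9):
    """Minimal DE/rand/1 loop over a scalar decision variable in [0, 1]."""
    pop = [random.random() for _ in range(pop_size)]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = random.sample([p for j, p in enumerate(pop) if j != i], 3)
            trial = a + F * (b - c) if random.random() < CR else pop[i]
            if f(trial) < f(pop[i]):  # greedy selection
                pop[i] = trial
    return min(pop, key=f)

best = differential_evolution(cost)
# The optimum lands just past the 0.6 capacity limit, trading a small
# penalty for the much lower fog latency.
```

The real model in the paper is multi-dimensional (consumer association, workload distribution, VM placement) with hard QoS constraints; this sketch only shows the evolutionary mutate/crossover/select mechanics on a one-variable surrogate.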

Article
An Enhanced Inference Algorithm for Data Sampling Efficiency and Accuracy Using Periodic Beacons and Optimization
Big Data Cogn. Comput. 2019, 3(1), 7; https://doi.org/10.3390/bdcc3010007 - 16 Jan 2019
Cited by 1 | Viewed by 1696
Abstract
Transferring data from a sensor or monitoring device in electronic health, vehicular informatics, or Internet of Things (IoT) networks has posed the enduring challenge of improving data accuracy with relative efficiency. Previous works have proposed the use of an inference system at the sensor device to minimize the data transfer frequency as well as the size of the data, saving network usage and battery resources. This has been implemented using various sampling and inference algorithms, with a tradeoff between accuracy and efficiency. This paper proposes to enhance accuracy without compromising efficiency by introducing new sampling algorithms through a hybrid inference method. The experimental results show that accuracy can be significantly improved whilst efficiency is not diminished. These algorithms will contribute to saving operation and maintenance costs in data sampling where computational and battery resources are constrained, such as in wireless personal area networks integrated with IoT networks. Full article
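The sensor-side inference idea the abstract builds on can be sketched as a dual-prediction scheme: sensor and sink run the same predictor (here, a simple last-value hold), and the sensor transmits only when the true reading drifts beyond a threshold. This is a generic minimal sketch, not the paper's hybrid method; the function name, threshold, and readings are illustrative.

```python
# Minimal dual-prediction sampling sketch: transmit only on significant change.
def sample_with_inference(readings, threshold=0.5):
    """Return the (index, value) pairs a sensor would actually transmit."""
    transmitted = [(0, readings[0])]   # first reading always sent
    predicted = readings[0]            # sink infers this value until updated
    for i, value in enumerate(readings[1:], start=1):
        if abs(value - predicted) > threshold:
            transmitted.append((i, value))
            predicted = value          # sink resynchronizes on each send
    return transmitted

readings = [20.0, 20.1, 20.2, 21.5, 21.6, 21.4, 23.0]
sent = sample_with_inference(readings)
# Only the first reading and the two jumps (21.5 and 23.0) are transmitted,
# so 3 of 7 samples cross the network.
```

The accuracy/efficiency tradeoff the abstract mentions lives in the threshold: a larger threshold saves more transmissions but lets the sink's inferred value drift further from the truth.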

Article
The Next Generation Cognitive Security Operations Center: Adaptive Analytic Lambda Architecture for Efficient Defense against Adversarial Attacks
Big Data Cogn. Comput. 2019, 3(1), 6; https://doi.org/10.3390/bdcc3010006 - 10 Jan 2019
Cited by 17 | Viewed by 4087
Abstract
A Security Operations Center (SOC) is a central technical-level unit responsible for monitoring, analyzing, assessing, and defending an organization’s security posture on an ongoing basis. The SOC staff works closely with incident response teams, security analysts, network engineers, and organization managers, using sophisticated data processing technologies such as security analytics, threat intelligence, and asset criticality to ensure security issues are detected, analyzed, and addressed quickly. These techniques are part of a reactive security strategy because they rely on the human factor: the experience and judgment of security experts, using supplementary technology to evaluate the risk impact and minimize the attack surface. This study suggests a proactive security strategy that adopts a vigorous method combining ingenuity, data analysis, processing, and decision-making support to face various cyber hazards. Specifically, the paper introduces a novel intelligence-driven, cognitive-computing SOC that is based exclusively on progressive, fully automatic procedures. The proposed λ-Architecture Network Flow Forensics Framework (λ-ΝF3) is an efficient cybersecurity defense framework against adversarial attacks. It implements the Lambda machine learning architecture, which can analyze a mixture of batch and streaming data, using two accurate, novel computational intelligence algorithms. Specifically, it uses an Extreme Learning Machine neural network with a Gaussian Radial Basis Function kernel (ELM/GRBFk) for batch data analysis and a Self-Adjusting Memory k-Nearest Neighbors classifier (SAM/k-NN) to examine patterns from real-time streams. It is a forensics tool for big data that can enhance the automated defense strategies of SOCs to respond effectively to the threats their environments face. Full article
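The Lambda split described above pairs a batch layer trained on historical data with a speed layer that classifies live records. The sketch below illustrates that split with deliberately simple stand-ins: a nearest-centroid model in place of ELM/GRBFk, and a bounded sliding-window 1-NN in place of SAM/k-NN. The flow features and labels are invented, not real network-flow data.

```python
# Lambda-style sketch: batch layer over history, speed layer over a stream.
from collections import deque

class BatchCentroid:
    """Batch layer stand-in: nearest-class-centroid over the historical set."""
    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            s = sums.setdefault(label, [0.0] * len(x))
            for j, v in enumerate(x):
                s[j] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {l: [v / counts[l] for v in s] for l, s in sums.items()}
    def predict(self, x):
        dist = lambda c: sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda l: dist(self.centroids[l]))

class StreamKNN:
    """Speed layer stand-in: 1-NN over a fixed-size memory of recent flows."""
    def __init__(self, memory=50):
        self.window = deque(maxlen=memory)  # old flows fall out automatically
    def learn(self, x, label):
        self.window.append((x, label))
    def predict(self, x):
        dist = lambda p: sum((a - b) ** 2 for a, b in zip(x, p[0]))
        return min(self.window, key=dist)[1]

# Batch layer learns from history; speed layer tracks the live stream.
batch = BatchCentroid()
batch.fit([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.8]],
          ["benign", "benign", "attack", "attack"])
speed = StreamKNN()
speed.learn([0.1, 0.1], "benign")
speed.learn([5.1, 5.1], "attack")
```

The design point the architecture makes is that the two layers answer the same question at different latencies: the batch model is periodically refit over all data, while the bounded memory lets the speed layer adapt to concept drift between refits.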