Editor’s Choice Articles

Editor’s Choice articles are based on recommendations by the scientific editors of MDPI journals from around the world. Editors select a small number of articles recently published in the journal that they believe will be particularly interesting to readers, or important in the respective research area. The aim is to provide a snapshot of some of the most exciting work published in the various research areas of the journal.


20 pages, 3064 KiB  
Article
Explaining Intrusion Detection-Based Convolutional Neural Networks Using Shapley Additive Explanations (SHAP)
by Remah Younisse, Ashraf Ahmad and Qasem Abu Al-Haija
Big Data Cogn. Comput. 2022, 6(4), 126; https://doi.org/10.3390/bdcc6040126 - 25 Oct 2022
Cited by 44 | Viewed by 5320
Abstract
Artificial intelligence (AI) and machine learning (ML) models have become essential tools in many critical systems, where they make significant decisions that must often be trusted and explained. At the same time, the performance of different ML and AI models varies even on the same dataset. Developers often try multiple models before deciding which one to use, without understanding the reasons behind this variance in performance. Explainable artificial intelligence (XAI) models explain a model's performance by highlighting the features the model considered important when making its decision. This work presents an analytical approach to studying the density functions of intrusion detection dataset features. The study explains how and why these features are essential during the XAI process. We aim, in this study, to explain XAI behavior in order to add an extra layer of explainability. The density function analysis presented in this paper deepens the understanding of feature importance across different AI models. Specifically, we present a method to explain the SHAP (Shapley additive explanations) results for different machine learning models based on KDE (kernel density estimation) plots of the feature data. We also survey the characteristics of dataset features that perform better for convolutional neural network (CNN)-based models.
(This article belongs to the Special Issue Machine Learning for Dependable Edge Computing Systems and Services)
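The density-function idea at the heart of this abstract can be sketched compactly. The snippet below is an illustrative stand-in rather than the authors' code: it implements a plain Gaussian KDE with Silverman's rule-of-thumb bandwidth over hypothetical values of one traffic feature, split by class. A feature whose class-conditional densities barely overlap is exactly the kind that SHAP tends to flag as important.

```python
import math

def gaussian_kde(samples, bandwidth=None):
    """Return a callable estimating the density of `samples`.

    Uses a Gaussian kernel with Silverman's rule-of-thumb bandwidth,
    mirroring the per-feature, per-class KDE plots the paper analyzes.
    """
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / n) or 1e-9
    h = bandwidth or 1.06 * std * n ** (-1 / 5)

    def density(x):
        return sum(
            math.exp(-0.5 * ((x - xi) / h) ** 2) / (h * math.sqrt(2 * math.pi))
            for xi in samples
        ) / n

    return density

# Hypothetical values of one feature for two traffic classes: the
# class-conditional densities barely overlap, so this feature is
# highly discriminative.
benign = [0.10, 0.20, 0.15, 0.22, 0.18]
attack = [0.80, 0.90, 0.85, 0.95, 0.88]
d_benign, d_attack = gaussian_kde(benign), gaussian_kde(attack)
# The benign density dominates near 0.2, the attack density near 0.9.
print(d_benign(0.2) > d_attack(0.2), d_attack(0.9) > d_benign(0.9))
```

In practice one would plot these densities, as the paper does with KDE plots, and compare them against the SHAP ranking of the same features.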

15 pages, 1058 KiB  
Article
White Blood Cell Classification Using Multi-Attention Data Augmentation and Regularization
by Nasrin Bayat, Diane D. Davey, Melanie Coathup and Joon-Hyuk Park
Big Data Cogn. Comput. 2022, 6(4), 122; https://doi.org/10.3390/bdcc6040122 - 21 Oct 2022
Cited by 13 | Viewed by 6192
Abstract
Accurate and robust assessment of the human immune system through white blood cell evaluation requires computer-aided tools with pathologist-level accuracy. This work presents a multi-attention leukocyte subtype classification method that leverages the fine-grained and spatially local attributes of white blood cells. The proposed framework comprises three main components: texture-aware/attention map generation blocks, attention regularization, and attention-based data augmentation. The framework is applicable to general CNN-based architectures and enhances decision making by paying specific attention to the discriminative regions of a white blood cell. The performance of the proposed model was evaluated through an extensive set of experiments and validation. The results demonstrate the superior performance of the model, which achieved 99.69% accuracy, compared to other state-of-the-art approaches. The proposed model is a useful complement to existing computer-aided diagnosis tools for assisting pathologists in evaluating white blood cells from blood smear images.
(This article belongs to the Special Issue Data Science in Health Care)

24 pages, 1286 KiB  
Article
Ontology-Based Personalized Job Recommendation Framework for Migrants and Refugees
by Dimos Ntioudis, Panagiota Masa, Anastasios Karakostas, Georgios Meditskos, Stefanos Vrochidis and Ioannis Kompatsiaris
Big Data Cogn. Comput. 2022, 6(4), 120; https://doi.org/10.3390/bdcc6040120 - 19 Oct 2022
Cited by 13 | Viewed by 3837
Abstract
Participation in the labor market is seen as the most important factor favoring the long-term integration of migrants and refugees into society. This paper describes the job recommendation framework of the Integration of Migrants MatchER SErvice (IMMERSE). The proposed framework acts as a matching tool: the context of each migrant or refugee, including their expectations, languages, educational background, previous job experience, and skills, is captured in an ontology and matched against the job opportunities available in their host country. Profile information and job listings are processed in real time in the back-end, and matches are revealed in the front-end. Moreover, the matching tool considers users' activity on the platform to provide recommendations based on the similarity between jobs they have already shown interest in and new jobs posted on the platform. Finally, the framework takes the location of the users into account to rank the results and shows only the most relevant location-based recommendations.
(This article belongs to the Special Issue Semantic Web Technology and Recommender Systems)
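The final, location-aware ranking step can be illustrated with a small sketch. This is not IMMERSE code: the job titles and coordinates are invented, and great-circle (haversine) distance stands in for whatever geo-ranking the platform actually uses.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical matched job postings with coordinates; the user's location
# is used to rank the matches so nearby openings surface first.
user = (52.52, 13.405)  # Berlin
jobs = [
    ("warehouse operative, Hamburg", 53.55, 9.99),
    ("kitchen assistant, Berlin", 52.50, 13.42),
    ("farm worker, Munich", 48.14, 11.58),
]
ranked = sorted(jobs, key=lambda j: haversine_km(*user, j[1], j[2]))
print([title for title, _, _ in ranked])  # Berlin job ranks first
```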

14 pages, 307 KiB  
Article
A Survey on Medical Image Segmentation Based on Deep Learning Techniques
by Jayashree Moorthy and Usha Devi Gandhi
Big Data Cogn. Comput. 2022, 6(4), 117; https://doi.org/10.3390/bdcc6040117 - 17 Oct 2022
Cited by 31 | Viewed by 8859
Abstract
Deep learning techniques have rapidly become a preferred method for medical image segmentation. This survey analyses different contributions in the deep learning medical field, including the major common issues published in recent years, and also discusses the fundamentals of deep learning concepts applicable to medical image segmentation. Deep learning can be applied to image categorization, object recognition, segmentation, registration, and other tasks. First, the basic ideas of deep learning techniques, applications, and frameworks are introduced, and the deep learning techniques suited to each application are briefly explained. The paper then reviews prior experience with the different classes of techniques for medical image segmentation. Deep learning has been applied to various challenges in medical image analysis, such as low accuracy of image classification, low segmentation resolution, and poor image enhancement. Aiming to address these open issues and advance medical image segmentation, we provide suggestions for future research.
(This article belongs to the Special Issue Computational Collective Intelligence with Big Data–AI Society)

40 pages, 4281 KiB  
Article
A Probabilistic Data Fusion Modeling Approach for Extracting True Values from Uncertain and Conflicting Attributes
by Ashraf Jaradat, Fadi Safieddine, Aziz Deraman, Omar Ali, Ahmad Al-Ahmad and Yehia Ibrahim Alzoubi
Big Data Cogn. Comput. 2022, 6(4), 114; https://doi.org/10.3390/bdcc6040114 - 13 Oct 2022
Cited by 6 | Viewed by 3325
Abstract
Real-world data obtained from integrating heterogeneous data sources are often multi-valued, uncertain, imprecise, error-prone, outdated, and have different degrees of accuracy and correctness. It is critical to resolve data uncertainty and conflicts to present quality data that reflect actual world values. This task is called data fusion. In this paper, we deal with the problem of data fusion based on probabilistic entity linkage and uncertainty management in conflict data. Data fusion has been widely explored in the research community. However, concerns such as explicit uncertainty management and on-demand data fusion, which can cope with dynamic data sources, have not been studied well. This paper proposes a new probabilistic data fusion modeling approach that attempts to find true data values under conditions of uncertain or conflicted multi-valued attributes. These attributes are generated from the probabilistic linkage and merging alternatives of multi-corresponding entities. Consequently, the paper identifies and formulates several data fusion cases and sample spaces that require further conditional computation using our computational fusion method. The identification is established to fit with a real-world data fusion problem. In the real world, there is always the possibility of heterogeneous data sources, the integration of probabilistic entities, single or multiple truth values for certain attributes, and different combinations of attribute values as alternatives for each generated entity. We validate our probabilistic data fusion approach through mathematical representation based on three data sources with different reliability scores. The validity of the approach was assessed via implementation into our probabilistic integration system to show how it can manage and resolve different cases of data conflicts and inconsistencies. The outcome showed improved accuracy in identifying true values due to the association of constructive evidence.
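The core setting of the abstract, three sources with different reliability scores asserting conflicting attribute values, can be illustrated with a toy weighted-vote sketch. This is not the paper's computational fusion method: the sources, values, and reliabilities are hypothetical, and a simple reliability-weighted score stands in for the conditional computations the paper develops.

```python
from collections import defaultdict

def fuse(claims):
    """Pick the most probable true value for one attribute.

    `claims` maps source -> (value, reliability). Each candidate value is
    scored by the combined reliability of the sources asserting it, and the
    scores are normalised into a probability distribution over candidates.
    """
    score = defaultdict(float)
    for value, reliability in claims.values():
        score[value] += reliability
    total = sum(score.values())
    probs = {v: s / total for v, s in score.items()}
    best = max(probs, key=probs.get)
    return best, probs

# Three hypothetical sources with different reliability scores disagree
# on a person's employer attribute.
claims = {
    "src_a": ("Acme", 0.9),
    "src_b": ("Acme Ltd", 0.6),
    "src_c": ("Acme", 0.7),
}
value, probs = fuse(claims)
print(value, round(probs[value], 2))  # "Acme" wins: 1.6 / 2.2 of the mass
```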

44 pages, 5439 KiB  
Article
Graph-Based Conversation Analysis in Social Media
by Marco Brambilla, Alireza Javadian Sabet, Kalyani Kharmale and Amin Endah Sulistiawati
Big Data Cogn. Comput. 2022, 6(4), 113; https://doi.org/10.3390/bdcc6040113 - 12 Oct 2022
Cited by 6 | Viewed by 7809
Abstract
Social media platforms offer their audience the possibility to reply to posts through comments and reactions. This allows social media users to express their ideas and opinions on shared content, thus opening virtual discussions. Most studies on social networks have focused only on user relationships or on the shared content, while ignoring the valuable information hidden in the digital conversations, in terms of structure of the discussion and relation between contents, which is essential for understanding online communication behavior. This work proposes a graph-based framework to assess the shape and structure of online conversations. The analysis was composed of two main stages: intent analysis and network generation. Users’ intention was detected using keyword-based classification, followed by the implementation of machine learning-based classification algorithms for uncategorized comments. Afterwards, human-in-the-loop was involved in improving the keyword-based classification. To extract essential information on social media communication patterns among the users, we built conversation graphs using a directed multigraph network and we show our model at work in two real-life experiments. The first experiment used data from a real social media challenge and it was able to categorize 90% of comments with 98% accuracy. The second experiment focused on COVID vaccine-related discussions in online forums and investigated the stance and sentiment to understand how the comments are affected by their parent discussion. Finally, the most popular online discussion patterns were mined and interpreted. We see that the dynamics obtained from conversation graphs are similar to traditional communication activities.
(This article belongs to the Special Issue Graph-Based Data Mining and Social Network Analysis)
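The directed-multigraph representation of a conversation can be sketched with the standard library alone. The reply pairs below are invented; `networkx`'s `MultiDiGraph` would be the natural production choice, but a dict of edge lists already shows the essential property, namely that repeated interactions between the same pair of users keep their multiplicity.

```python
from collections import defaultdict

# Minimal directed multigraph of a comment thread: one edge src -> dst is
# added for every reply, so repeated interactions are not collapsed.
replies = [
    ("alice", "post"), ("bob", "alice"), ("carol", "alice"),
    ("alice", "bob"), ("bob", "alice"),
]
graph = defaultdict(list)
for src, dst in replies:
    graph[src].append(dst)

# Out-degree counts every reply a user wrote, parallel edges included.
out_degree = {u: len(vs) for u, vs in graph.items()}
# bob replied to alice twice: the multigraph keeps both edges.
print(out_degree, graph["bob"].count("alice"))
```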

33 pages, 2658 KiB  
Article
Question Answer System: A State-of-Art Representation of Quantitative and Qualitative Analysis
by Bhushan Zope, Sashikala Mishra, Kailash Shaw, Deepali Rahul Vora, Ketan Kotecha and Ranjeet Vasant Bidwe
Big Data Cogn. Comput. 2022, 6(4), 109; https://doi.org/10.3390/bdcc6040109 - 7 Oct 2022
Cited by 19 | Viewed by 8186
Abstract
Question Answer System (QAS) automatically answers questions asked in natural language. Due to the varying dimensions and approaches available, QAS has a very diverse solution space, and a proper bibliometric study is required to map the entire domain. This work presents a bibliometric and literature analysis of QAS. Scopus and Web of Science, two well-known research databases, are used for the study. A systematic analytical study comprising performance analysis and science mapping is performed. Recent research trends, seminal works, and influential authors are identified in the performance analysis using statistical tools on research constituents. Science mapping, in turn, is performed using network analysis on citation and co-citation network graphs. Through this analysis, the domain’s conceptual evolution and intellectual structure are shown. We have divided the literature into four important architecture types and provide a literature analysis of Knowledge Base (KB)-based and GNN-based approaches for QAS.

13 pages, 2235 KiB  
Article
Deep Learning-Based Computer-Aided Classification of Amniotic Fluid Using Ultrasound Images from Saudi Arabia
by Irfan Ullah Khan, Nida Aslam, Fatima M. Anis, Samiha Mirza, Alanoud AlOwayed, Reef M. Aljuaid, Razan M. Bakr and Nourah Hasan Al Qahtani
Big Data Cogn. Comput. 2022, 6(4), 107; https://doi.org/10.3390/bdcc6040107 - 3 Oct 2022
Cited by 9 | Viewed by 3507
Abstract
Amniotic Fluid (AF) is a protective liquid surrounding the fetus inside the amniotic sac that serves multiple purposes and is hence a key indicator of fetal health. Determining AF levels at an early stage helps to ascertain the maturation of the lungs, gastrointestinal development, and related factors. Low AF entails the risk of premature birth and perinatal mortality, and thereby admission to an intensive care unit (ICU). Moreover, the AF level is also a critical factor in determining early deliveries. Hence, AF detection is a vital measurement during early ultrasound (US), and its automation is essential. The detection of AF is usually a time-consuming process, as it is patient specific, and its measurement and accuracy are prone to errors because they depend heavily on the sonographer’s experience. Automating this process by developing robust, precise, and effective detection methods will therefore benefit the healthcare community. In this paper, we utilized transfer learning models to classify AF levels as normal or abnormal using US images. The dataset consisted of 166 US images of pregnant women, which were preprocessed before training the model. Five transfer learning models, namely Xception, Densenet, InceptionResNet, MobileNet, and ResNet, were applied. The results showed that MobileNet achieved an overall accuracy of 0.94. Overall, the proposed study successfully classifies AF levels, building automated, effective models based on transfer learning to aid sonographers in evaluating fetal health.
(This article belongs to the Special Issue Data Science in Health Care)

17 pages, 935 KiB  
Article
Supporting Meteorologists in Data Analysis through Knowledge-Based Recommendations
by Thoralf Reis, Tim Funke, Sebastian Bruchhaus, Florian Freund, Marco X. Bornschlegl and Matthias L. Hemmje
Big Data Cogn. Comput. 2022, 6(4), 103; https://doi.org/10.3390/bdcc6040103 - 28 Sep 2022
Cited by 3 | Viewed by 2817
Abstract
Climate change means coping directly or indirectly with extreme weather conditions for everybody. Therefore, analyzing meteorological data to create precise models is gaining importance and might become inevitable. Meteorologists have extensive domain knowledge about meteorological data yet often lack practical data analysis skills. This paper presents a method to bridge this gap by empowering the data knowledge carriers to analyze the data themselves. The proposed system utilizes symbolic AI, a knowledge base created by experts, and a recommendation expert system to offer suitable data analysis methods or pre-processing steps to meteorologists. This paper systematically analyzes the target user group of meteorologists and practical use cases to arrive at a conceptual and technical system design implemented in the CAMeRI prototype. The concepts in this paper are aligned with the AI2VIS4BigData Reference Model and comprise a novel first-order logic knowledge base that represents analysis methods and related pre-processing steps. The prototype implementation was qualitatively and quantitatively evaluated. This evaluation included recommendation validation for real-world data, a cognitive walkthrough, and measuring computation timings of the different system components.
(This article belongs to the Topic Big Data and Artificial Intelligence)

42 pages, 4691 KiB  
Article
An Improved African Vulture Optimization Algorithm for Feature Selection Problems and Its Application of Sentiment Analysis on Movie Reviews
by Aitak Shaddeli, Farhad Soleimanian Gharehchopogh, Mohammad Masdari and Vahid Solouk
Big Data Cogn. Comput. 2022, 6(4), 104; https://doi.org/10.3390/bdcc6040104 - 28 Sep 2022
Cited by 19 | Viewed by 4264
Abstract
The African Vulture Optimization Algorithm (AVOA) is inspired by African vultures’ feeding and orienting behaviors. It comprises powerful operators while maintaining the balance between exploration and exploitation in solving optimization problems. The algorithm must be discretized before it can be used in discrete applications. This paper introduces two binary versions of the AVOA (BAOVAH) based on S-shaped and V-shaped transfer functions, while avoiding any increase in computational complexity. A disruption operator and a bitwise strategy are also used to maximize the model’s performance. In addition, a multi-strategy version of the AVOA, called BAVOA-v1, is presented. In BAVOA-v1, strategies such as IPRS, a mutation neighborhood search strategy (MNSS) (balancing exploration and exploitation), multi-parent crossover (increasing exploitation), and the bitwise strategy (increasing diversity and exploration) are used to provide solutions with greater variety and to assure solution quality. The proposed methods are evaluated on 30 UCI datasets of different dimensions. The simulation results showed that the proposed BAOVAH algorithm performed better than other binary meta-heuristic algorithms, achieving the highest accuracy on 67% of the datasets and the best fitness value on 93% of them, and demonstrating high performance in feature selection. Finally, in a case study, the proposed method was used to determine the number of neurons and the activation function in order to improve deep learning results for sentiment analysis of movie reviews, for which the CNNEM model is designed. The results of experiments on three sentiment analysis datasets (IMDB, Amazon, and Yelp) show that the BAOVAH algorithm increases the accuracy of the CNNEM network by 6% on IMDB, 33% on Amazon, and 30% on Yelp.
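The role of S-shaped and V-shaped transfer functions in binarizing a continuous metaheuristic for feature selection can be shown in a few lines. This is an illustrative sketch, not the paper's BAOVAH implementation: the position vector is invented, and only the probability-threshold rule conventionally paired with S-shaped transfers is demonstrated (V-shaped transfers are usually paired with a bit-flip update instead).

```python
import math
import random

def s_shaped(x):
    """S-shaped (sigmoid) transfer: maps a continuous position component
    to a probability of selecting the corresponding feature."""
    return 1 / (1 + math.exp(-x))

def v_shaped(x):
    """V-shaped transfer: |tanh(x)|, large for big moves in either
    direction (conventionally used with a bit-flip rule, not shown here)."""
    return abs(math.tanh(x))

def binarize(position, transfer, rng):
    """Turn a continuous position vector into a 0/1 feature-selection mask."""
    return [1 if rng.random() < transfer(x) else 0 for x in position]

rng = random.Random(0)
position = [2.5, -3.0, 0.1, -0.2]  # hypothetical continuous positions
mask = binarize(position, s_shaped, rng)
print(mask)  # strongly positive components are likely selected
```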

20 pages, 3926 KiB  
Article
An Efficient and Secure Big Data Storage in Cloud Environment by Using Triple Data Encryption Standard
by Mohan Naik Ramachandra, Madala Srinivasa Rao, Wen Cheng Lai, Bidare Divakarachari Parameshachari, Jayachandra Ananda Babu and Kivudujogappa Lingappa Hemalatha
Big Data Cogn. Comput. 2022, 6(4), 101; https://doi.org/10.3390/bdcc6040101 - 26 Sep 2022
Cited by 90 | Viewed by 7320
Abstract
In recent decades, big data analysis has become one of the most important research topics. Big data security offers Cloud application security and monitoring for hosting highly sensitive data on Cloud platforms. However, the privacy and security of big data have become emerging issues that discourage organizations from utilizing Cloud services. Existing privacy-preserving approaches have several drawbacks, such as a lack of data privacy and accurate data analysis, poor performance, and complete reliance on third parties. To overcome these issues, the Triple Data Encryption Standard (TDES) methodology is proposed to secure big data in the Cloud environment. The proposed TDES methodology provides a relatively simple technique that increases the key size of the Data Encryption Standard (DES) to protect against attacks and defend the privacy of data. The experimental results showed that the proposed TDES method is effective in providing security and privacy for big healthcare data in the Cloud environment, and that it requires less encryption and decryption time than the existing Intelligent Framework for Healthcare Data Security (IFHDS) method.
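The EDE (encrypt-decrypt-encrypt) keying pattern that gives TDES its larger effective key is easy to show with a toy cipher. The block cipher below is emphatically not DES: it is a byte-wise shift used only so the three-key composition and its inverse are visible. A real deployment would use an actual TDES implementation such as PyCryptodome's `DES3`.

```python
def toy_encrypt(block, key):
    """Toy stand-in for DES block encryption (NOT real DES): a byte-wise
    modular shift, used only to make the EDE keying pattern visible."""
    return bytes((b + k) % 256 for b, k in zip(block, key))

def toy_decrypt(block, key):
    """Inverse of toy_encrypt."""
    return bytes((b - k) % 256 for b, k in zip(block, key))

def tdes_ede_encrypt(block, k1, k2, k3):
    # C = E_k3(D_k2(E_k1(P))): the three-key EDE construction.
    return toy_encrypt(toy_decrypt(toy_encrypt(block, k1), k2), k3)

def tdes_ede_decrypt(block, k1, k2, k3):
    # P = D_k1(E_k2(D_k3(C))): the inverse operations in reverse order.
    return toy_decrypt(toy_encrypt(toy_decrypt(block, k3), k2), k1)

k1, k2, k3 = b"12345678", b"abcdefgh", b"ABCDEFGH"
plain = b"patient1"  # one 8-byte block of hypothetical health data
cipher = tdes_ede_encrypt(plain, k1, k2, k3)
print(tdes_ede_decrypt(cipher, k1, k2, k3) == plain)  # round-trip holds
```

Note that setting k1 == k2 collapses the construction to a single encryption under k3, mirroring how real TDES remains backward-compatible with single DES.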

23 pages, 6422 KiB  
Article
Triggers and Tweets: Implicit Aspect-Based Sentiment and Emotion Analysis of Community Chatter Relevant to Education Post-COVID-19
by Heba Ismail, Ashraf Khalil, Nada Hussein and Rawan Elabyad
Big Data Cogn. Comput. 2022, 6(3), 99; https://doi.org/10.3390/bdcc6030099 - 16 Sep 2022
Cited by 11 | Viewed by 4300
Abstract
This research proposes a well-being analytical framework using social media chatter data. The proposed framework infers analytics and provides insights into the public’s well-being relevant to education during and after the COVID-19 pandemic through comprehensive Emotion and Aspect-based Sentiment Analysis (ABSA). Moreover, this research examines the variability in the emotions of students, parents, and faculty toward the e-learning process over time and across different locations. The proposed framework curates Twitter chatter data relevant to the education sector, identifies tweets carrying sentiment, and then identifies the exact emotion and the emotional triggers associated with those feelings through implicit ABSA. The produced analytics are then factored by location and time to provide more comprehensive insights that aim to assist decision-makers and personnel in the educational sector in enhancing and adapting the educational process during and after the pandemic and into the future. The experimental results for emotion classification show that the Linear Support Vector Classifier (SVC) outperformed the other classifiers, with overall accuracy, precision, recall, and F-measure of 91%. For aspect classification, the Logistic Regression classifier outperformed all other classifiers, with overall accuracy, recall, and F-measure of 81% and precision of 83%. In online experiments using UAE COVID-19 education-related data, the analytics showed high relevance to the public concerns around the education process reported during the experiment’s timeframe.

15 pages, 364 KiB  
Article
Machine Learning Techniques for Chronic Kidney Disease Risk Prediction
by Elias Dritsas and Maria Trigka
Big Data Cogn. Comput. 2022, 6(3), 98; https://doi.org/10.3390/bdcc6030098 - 14 Sep 2022
Cited by 84 | Viewed by 10966
Abstract
Chronic kidney disease (CKD) is a condition characterized by the progressive loss of kidney function over time. It describes a clinical entity that causes kidney damage and affects the general health of the human body. Improper diagnosis and treatment of the disease can eventually lead to end-stage renal disease and, ultimately, the patient’s death. Machine Learning (ML) techniques have acquired an important role in disease prediction and are a useful tool in the field of medical science. In the present research work, we aim to build efficient tools for predicting CKD occurrence by following an approach that exploits ML techniques. More specifically, we first apply class balancing to tackle the non-uniform distribution of instances between the two classes; then, feature ranking and analysis are performed; and finally, several ML models are trained and evaluated based on various performance metrics. The derived results highlighted the Rotation Forest (RotF) model, which prevailed over the compared models with an Area Under the Curve (AUC) of 100% and Precision, Recall, F-Measure, and Accuracy of 99.2%.
(This article belongs to the Special Issue Digital Health and Data Analytics in Public Health)
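The class-balancing step mentioned above can be illustrated with the simplest variant, random oversampling. This is a minimal stand-in, not the authors' pipeline: the rows and labels are invented, and a SMOTE-style synthetic-sample generator would be a drop-in refinement.

```python
import random

def oversample(rows, labels, rng):
    """Random oversampling: duplicate minority-class rows (chosen at
    random) until every class matches the majority-class count."""
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels

rng = random.Random(42)
rows = [[1], [2], [3], [4], [5], [6]]
labels = ["ckd", "ckd", "ckd", "ckd", "ckd", "healthy"]  # 5:1 imbalance
_, balanced = oversample(rows, labels, rng)
print(balanced.count("ckd"), balanced.count("healthy"))  # → 5 5
```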

21 pages, 6061 KiB  
Article
Improving Real Estate Rental Estimations with Visual Data
by Ilia Azizi and Iegor Rudnytskyi
Big Data Cogn. Comput. 2022, 6(3), 96; https://doi.org/10.3390/bdcc6030096 - 9 Sep 2022
Cited by 5 | Viewed by 4377
Abstract
Multi-modal data are widely available for online real estate listings. Announcements can contain various forms of data, including visual data and unstructured textual descriptions. Nonetheless, many traditional real estate pricing models rely solely on well-structured tabular features. This work investigates whether it is possible to improve the performance of the pricing model using additional unstructured data, namely images of the property and satellite images. We compare four models based on the type of input data they use: (1) tabular data only, (2) tabular data and property images, (3) tabular data and satellite images, and (4) tabular data and a combination of property and satellite images. In a supervised context, the branches of dedicated neural networks for each data type are fused (concatenated) to predict log rental prices. The novel dataset devised for the study (SRED) consists of 11,105 flat rentals advertised over the internet in Switzerland. The results reveal that using all three sources of data generally outperforms machine learning models built on only tabular information. The findings pave the way for further research on integrating other non-structured inputs, for instance, the textual descriptions of properties.

21 pages, 7858 KiB  
Article
Multimodal Emotional Classification Based on Meaningful Learning
by Hajar Filali, Jamal Riffi, Chafik Boulealam, Mohamed Adnane Mahraz and Hamid Tairi
Big Data Cogn. Comput. 2022, 6(3), 95; https://doi.org/10.3390/bdcc6030095 - 8 Sep 2022
Cited by 9 | Viewed by 4035
Abstract
Emotion recognition has become one of the most researched subjects in the scientific community, especially in the field of human–computer interaction. Decades of scientific research have been conducted on unimodal emotion analysis, whereas recent contributions concentrate on multimodal emotion recognition. These efforts have achieved great success in terms of accuracy in diverse areas of Deep Learning applications. To achieve better performance for multimodal emotion recognition systems, we exploit the effectiveness of the Meaningful Neural Network to enable emotion prediction during a conversation. Using the text and audio modalities, we propose feature extraction methods based on Deep Learning; a bimodal modality is then created by fusing the text and audio features. The feature vectors from these three modalities are fed to a Meaningful Neural Network to learn each characteristic separately. Its architecture consists of a set of neurons for each component of the input vector before combining them all together in the last layer. Our model was evaluated on MELD, a multimodal and multiparty dataset for emotion recognition in conversation. The proposed approach reached an accuracy of 86.69%, which significantly outperforms all current multimodal systems. Several evaluation techniques applied to our work demonstrate the robustness and superiority of our model over other state-of-the-art MELD models.

14 pages, 421 KiB  
Article
Hierarchical Co-Attention Selection Network for Interpretable Fake News Detection
by Xiaoyi Ge, Shuai Hao, Yuxiao Li, Bin Wei and Mingshu Zhang
Big Data Cogn. Comput. 2022, 6(3), 93; https://doi.org/10.3390/bdcc6030093 - 5 Sep 2022
Cited by 3 | Viewed by 4619
Abstract
Social media fake news has become a pervasive and problematic issue today with the development of the internet. Recent studies have utilized different artificial intelligence technologies to verify the truth of the news and provide explanations for the results, which have shown remarkable [...] Read more.
Social media fake news has become a pervasive and problematic issue today with the development of the internet. Recent studies have utilized different artificial intelligence technologies to verify the truth of the news and provide explanations for the results, which have shown remarkable success in interpretable fake news detection. However, individuals’ judgments of news are usually hierarchical, prioritizing valuable words above essential sentences, which is neglected by existing fake news detection models. In this paper, we propose a novel, interpretable neural network-based model, the hierarchical co-attention selection network (HCSN), to predict whether the source post is fake and to produce an explanation that emphasizes important comments and particular words. The key insight of the HCSN model is to incorporate the Gumbel–Max trick in the hierarchical co-attention selection mechanism, which captures sentence-level and word-level information from the source post and comments following the sequence words–sentences–words–event. In addition, HCSN enjoys the additional benefit of interpretability: it provides a conscious explanation of how it reaches certain results by selecting comments and highlighting words. According to experiments conducted on real-world datasets, our model outperformed state-of-the-art methods and generated reasonable explanations. Full article
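As a toy illustration of the Gumbel–Max trick mentioned in the abstract (a sketch of the general technique, not the HCSN implementation), the snippet below samples a discrete index from attention-style scores by adding Gumbel noise to each logit and taking the argmax, which is equivalent to sampling from the softmax distribution; the logits and "comment" setup are invented for the example:

```python
import math
import random
from collections import Counter

def gumbel_max_sample(logits, rng):
    """Sample index i with probability softmax(logits)[i] via the
    Gumbel-Max trick: argmax_i (logit_i + g_i), g_i ~ Gumbel(0, 1)."""
    noisy = [l - math.log(-math.log(rng.random())) for l in logits]
    return max(range(len(noisy)), key=lambda i: noisy[i])

rng = random.Random(0)
logits = [2.0, 0.5, 0.0]  # hypothetical attention scores for three comments
counts = Counter(gumbel_max_sample(logits, rng) for _ in range(10_000))
# Empirical frequencies approximate softmax(logits) ~ [0.74, 0.16, 0.10]
print(counts)
```

Selecting comments this way keeps the choice stochastic while remaining a hard (discrete) selection, which is what makes the mechanism usable for picking out specific comments and words as explanations.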
(This article belongs to the Topic Big Data and Artificial Intelligence)

19 pages, 705 KiB  
Article
PRIVAFRAME: A Frame-Based Knowledge Graph for Sensitive Personal Data
by Gaia Gambarelli and Aldo Gangemi
Big Data Cogn. Comput. 2022, 6(3), 90; https://doi.org/10.3390/bdcc6030090 - 26 Aug 2022
Cited by 7 | Viewed by 3609
Abstract
The pervasiveness of dialogue systems and virtual conversation applications raises an important theme: the potential of sharing sensitive information, and the consequent need for protection. To guarantee the subject’s right to privacy, and avoid the leakage of private content, it is important to [...] Read more.
The pervasiveness of dialogue systems and virtual conversation applications raises an important theme: the potential of sharing sensitive information, and the consequent need for protection. To guarantee the subject’s right to privacy and avoid the leakage of private content, it is important to treat sensitive information. However, any treatment first requires identifying sensitive text, which calls for appropriate techniques to do so automatically. The Sensitive Information Detection (SID) task has been explored in the literature in different domains and languages, but there is no common benchmark. Current approaches are mostly based on artificial neural networks (ANNs) or transformer architectures built on them. Our research focuses on identifying categories of personal data in informal English sentences by adopting a new logical-symbolic approach, and eventually hybridising it with ANN models. We present a frame-based knowledge graph built for personal data categories defined in the Data Privacy Vocabulary (DPV). The knowledge graph is designed through the logical composition of already existing frames, and has been evaluated as background knowledge for a SID system against a labeled sensitive information dataset. The accuracy of PRIVAFRAME reached 78%. By comparison, a transformer-based model achieved 12% lower performance on the same dataset. The top-down logical-symbolic frame-based model allows a granular analysis and does not require a training dataset. These advantages led us to use it as a layer in a hybrid model, where the logical SID is combined with an ANN-based SID tested in a previous study by the authors. Full article
(This article belongs to the Special Issue Artificial Intelligence for Online Safety)

17 pages, 715 KiB  
Article
Argumentation-Based Query Answering under Uncertainty with Application to Cybersecurity
by Mario A. Leiva, Alejandro J. García, Paulo Shakarian and Gerardo I. Simari
Big Data Cogn. Comput. 2022, 6(3), 91; https://doi.org/10.3390/bdcc6030091 - 26 Aug 2022
Cited by 8 | Viewed by 3019
Abstract
Decision support tools are key components of intelligent sociotechnical systems, and their successful implementation faces a variety of challenges, including the multiplicity of information sources, heterogeneous format, and constant changes. Handling such challenges requires the ability to analyze and process inconsistent and incomplete [...] Read more.
Decision support tools are key components of intelligent sociotechnical systems, and their successful implementation faces a variety of challenges, including the multiplicity of information sources, heterogeneous formats, and constant changes. Handling such challenges requires the ability to analyze and process inconsistent and incomplete information with varying degrees of associated uncertainty. Moreover, some domains require the system’s outputs to be explainable and interpretable; an example of this is cyberthreat analysis (CTA) in cybersecurity domains. In this paper, we first present the P-DAQAP system, an extension of a recently developed query-answering platform based on defeasible logic programming (DeLP) that incorporates a probabilistic model and focuses on delivering these capabilities. After discussing the details of its design and implementation, and describing how it can be applied in a CTA use case, we report on the results of an empirical evaluation designed to explore the effectiveness and efficiency of a possible world sampling-based approximate query answering approach that addresses the intractability of exact computations. Full article
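The possible-world sampling idea can be sketched independently of P-DAQAP: draw random worlds from independent probabilistic facts and estimate the fraction in which a query is warranted. The facts, probabilities, and toy query rule below are hypothetical, not taken from the paper:

```python
import random

# Hypothetical probabilistic facts: each holds independently
# with the given probability (a stand-in environment model).
facts = {"malware_x": 0.8, "exploit_y": 0.6, "insider_z": 0.3}

def query_holds(world):
    # Toy rule: the threat claim is warranted if both malware_x and
    # exploit_y hold, or if insider_z holds.
    return (world["malware_x"] and world["exploit_y"]) or world["insider_z"]

def approximate_query(n_samples, seed=0):
    """Monte Carlo estimate of P(query) over sampled possible worlds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        world = {f: rng.random() < p for f, p in facts.items()}
        hits += query_holds(world)
    return hits / n_samples

# Exact value for this toy rule: 1 - (1 - 0.8*0.6) * (1 - 0.3) = 0.636
print(approximate_query(50_000))
```

Sampling trades exactness for tractability: the estimate converges to the true probability as the number of sampled worlds grows, without enumerating all 2^n worlds.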

19 pages, 33832 KiB  
Article
Large-Scale Oil Palm Trees Detection from High-Resolution Remote Sensing Images Using Deep Learning
by Hery Wibowo, Imas Sukaesih Sitanggang, Mushthofa Mushthofa and Hari Agung Adrianto
Big Data Cogn. Comput. 2022, 6(3), 89; https://doi.org/10.3390/bdcc6030089 - 24 Aug 2022
Cited by 22 | Viewed by 7579
Abstract
Tree counting is an important plantation practice for biological asset inventories, etc. The application of precision agriculture in counting oil palm trees can be implemented by detecting oil palm trees from aerial imagery. This research uses the deep learning approach using YOLOv3, YOLOv4, [...] Read more.
Tree counting is an important plantation practice for biological asset inventories, among other uses. Precision agriculture can support oil palm tree counting by detecting the trees in aerial imagery. This research uses a deep learning approach, comparing YOLOv3, YOLOv4, and YOLOv5m for detecting oil palm trees. The dataset consists of drone images of an oil palm plantation acquired using a Fixed Wing VTOL drone at a resolution of 5 cm/pixel, covering an area of 730 ha annotated with 56,614 oil palm labels. The test dataset covers an area of 180 ha with flat and hilly terrain, sparse, dense, and overlapping canopy, and oil palm trees intermixed with other vegetation. Model testing on images from 24 regions, each covering 12 ha with up to 1000 trees (17,343 oil palm trees in total), yielded F1-scores of 97.28%, 97.74%, and 94.94%, with average detection times of 43 s, 45 s, and 21 s for the models trained with YOLOv3, YOLOv4, and YOLOv5m, respectively. These results show that the method is sufficiently accurate and efficient in detecting oil palm trees and has the potential to be implemented in commercial applications for plantation companies. Full article
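The F1-scores reported above combine detection precision and recall; a minimal sketch of how such a score is computed, using an invented tally of correct detections, spurious boxes, and missed trees for one test region:

```python
def detection_f1(true_positives, false_positives, false_negatives):
    """F1-score for object detection, from per-region detection counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Hypothetical region: 970 trees correctly found, 20 spurious
# detections, 30 trees missed.
score = detection_f1(970, 20, 30)
print(f"{score * 100:.2f}%")
```

Equivalently, F1 = 2·TP / (2·TP + FP + FN), so it penalizes both spurious detections and missed trees symmetrically.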

17 pages, 1280 KiB  
Article
Impactful Digital Twin in the Healthcare Revolution
by Hossein Hassani, Xu Huang and Steve MacFeely
Big Data Cogn. Comput. 2022, 6(3), 83; https://doi.org/10.3390/bdcc6030083 - 8 Aug 2022
Cited by 130 | Viewed by 13730
Abstract
Over the last few decades, our digitally expanding world has experienced another significant digitalization boost because of the COVID-19 pandemic. Digital transformations are changing every aspect of this world. New technological innovations are springing up continuously, attracting increasing attention and investments. Digital twin, [...] Read more.
Over the last few decades, our digitally expanding world has experienced another significant digitalization boost because of the COVID-19 pandemic. Digital transformations are changing every aspect of this world. New technological innovations are springing up continuously, attracting increasing attention and investment. The digital twin, one of the top trending technologies of recent years, is now joining forces with the healthcare sector, which has been under the spotlight since the outbreak of COVID-19. This paper sets out to promote a better understanding of digital twin technology, clarify some common misconceptions, and review the current trajectory of digital twin applications in healthcare. Furthermore, the functionalities of the digital twin in different life stages are summarized in the context of a digital twin model in healthcare. Following the Internet of Things as a service concept and the digital twinning as a service model supporting Industry 4.0, we propose a paradigm of digital twinning everything as a healthcare service, and different groups of physical entities are clarified for clear reference to the digital twin architecture in healthcare. This research discusses the value of digital twin technology in healthcare, as well as current challenges and insights for future research. Full article

26 pages, 5309 KiB  
Article
RSS-Based Wireless LAN Indoor Localization and Tracking Using Deep Architectures
by Muhammed Zahid Karakusak, Hasan Kivrak, Hasan Fehmi Ates and Mehmet Kemal Ozdemir
Big Data Cogn. Comput. 2022, 6(3), 84; https://doi.org/10.3390/bdcc6030084 - 8 Aug 2022
Cited by 18 | Viewed by 4761
Abstract
Wireless Local Area Network (WLAN) positioning is a challenging task indoors due to environmental constraints and the unpredictable behavior of signal propagation, even at a fixed location. The aim of this work is to develop deep learning-based approaches for indoor localization and tracking [...] Read more.
Wireless Local Area Network (WLAN) positioning is a challenging task indoors due to environmental constraints and the unpredictable behavior of signal propagation, even at a fixed location. The aim of this work is to develop deep learning-based approaches for indoor localization and tracking by utilizing Received Signal Strength (RSS). The study proposes Multi-Layer Perceptron (MLP), One- and Two-Dimensional Convolutional Neural Network (1D CNN and 2D CNN), and Long Short-Term Memory (LSTM) deep network architectures for WLAN indoor positioning, based on actual RSS measurements from an existing WLAN infrastructure in a mobile user scenario. The results obtained with these deep architectures are compared against existing WLAN algorithms, using the Root Mean Square Error (RMSE) as the assessment criterion. The proposed LSTM Model 2 achieved a dynamic positioning RMSE of 1.73 m, which outperforms probabilistic WLAN algorithms such as Memoryless Positioning (RMSE: 10.35 m) and the Nonparametric Information (NI) filter with variable acceleration (RMSE: 5.2 m) under the same experimental environment. Full article
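The RMSE criterion used above is computed over paired position estimates; a minimal sketch with a hypothetical three-point trajectory (the coordinates are invented for illustration):

```python
import math

def rmse(predicted, actual):
    """Root Mean Square Error over paired 2-D positions (metres):
    sqrt(mean of squared Euclidean distances between estimate and truth)."""
    errors = [
        (px - ax) ** 2 + (py - ay) ** 2
        for (px, py), (ax, ay) in zip(predicted, actual)
    ]
    return math.sqrt(sum(errors) / len(errors))

# Hypothetical trajectory: three estimated positions vs. ground truth.
predicted = [(1.0, 2.0), (3.5, 4.0), (5.0, 6.5)]
actual = [(1.0, 2.5), (3.0, 4.0), (5.0, 6.0)]
print(rmse(predicted, actual))  # each point is 0.5 m off -> RMSE 0.5 m
```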
(This article belongs to the Topic Machine and Deep Learning)

17 pages, 26907 KiB  
Article
Real-Time End-to-End Speech Emotion Recognition with Cross-Domain Adaptation
by Konlakorn Wongpatikaseree, Sattaya Singkul, Narit Hnoohom and Sumeth Yuenyong
Big Data Cogn. Comput. 2022, 6(3), 79; https://doi.org/10.3390/bdcc6030079 - 15 Jul 2022
Cited by 11 | Viewed by 6395
Abstract
Language resources are the main factor in speech-emotion-recognition (SER)-based deep learning models. Thai is a low-resource language that has a smaller data size than high-resource languages such as German. This paper describes the framework of using a pretrained-model-based front-end and back-end network to [...] Read more.
Language resources are the main factor in speech-emotion-recognition (SER)-based deep learning models. Thai is a low-resource language with a smaller data size than high-resource languages such as German. This paper describes a framework that uses a pretrained-model-based front-end and back-end network to adapt feature spaces from the speech recognition domain to the speech emotion classification domain. It consists of two parts: a speech recognition front-end network and a speech emotion recognition back-end network. For speech recognition, Wav2Vec2 is the state-of-the-art for high-resource languages, while XLSR targets low-resource languages. Wav2Vec2 and XLSR offer generalized end-to-end learning for speech understanding, producing feature-space representations from feature encoding in the speech recognition domain. This is one reason why Wav2Vec2 and XLSR were selected as the pretrained models for our front-end network. The pre-trained Wav2Vec2 and XLSR models are used as front-end networks and fine-tuned for specific languages using the Common Voice 7.0 dataset. The feature vectors of the front-end network are then fed to the back-end network, which includes convolution time reduction (CTR) and linear mean encoding transformation (LMET). Experiments using two different datasets show that our proposed framework outperforms the baselines in terms of unweighted and weighted accuracies. Full article

22 pages, 1108 KiB  
Article
We Know You Are Living in Bali: Location Prediction of Twitter Users Using BERT Language Model
by Lihardo Faisal Simanjuntak, Rahmad Mahendra and Evi Yulianti
Big Data Cogn. Comput. 2022, 6(3), 77; https://doi.org/10.3390/bdcc6030077 - 7 Jul 2022
Cited by 32 | Viewed by 5298
Abstract
Twitter user location data provide essential information that can be used for various purposes. However, user location is not easy to identify because many profiles omit this information, or users enter data that do not correspond to their actual locations. Several related works [...] Read more.
Twitter user location data provide essential information that can be used for various purposes. However, user location is not easy to identify because many profiles omit this information, or users enter data that do not correspond to their actual locations. Several related works have attempted to predict location from English-language tweets. In this study, we attempted to predict the location of Indonesian tweets. We utilized machine learning approaches, i.e., long short-term memory (LSTM) and bidirectional encoder representations from transformers (BERT), to infer Twitter users’ home locations using the profile display name, user description, and user tweets. By concatenating the display name, description, and aggregated tweets, the model achieved a best accuracy of 0.77. The IndoBERT model outperformed several baseline models. Full article
(This article belongs to the Topic Machine and Deep Learning)

15 pages, 695 KiB  
Article
Topological Data Analysis Helps to Improve Accuracy of Deep Learning Models for Fake News Detection Trained on Very Small Training Sets
by Ran Deng and Fedor Duzhin
Big Data Cogn. Comput. 2022, 6(3), 74; https://doi.org/10.3390/bdcc6030074 - 5 Jul 2022
Cited by 14 | Viewed by 5843
Abstract
Topological data analysis has recently found applications in various areas of science, such as computer vision and understanding of protein folding. However, applications of topological data analysis to natural language processing remain under-researched. This study applies topological data analysis to a particular natural [...] Read more.
Topological data analysis has recently found applications in various areas of science, such as computer vision and understanding of protein folding. However, applications of topological data analysis to natural language processing remain under-researched. This study applies topological data analysis to a particular natural language processing task: fake news detection. We have found that deep learning models are more accurate in this task than topological data analysis. However, assembling a deep learning model with topological data analysis significantly improves the model’s accuracy if the available training set is very small. Full article
(This article belongs to the Topic Machine and Deep Learning)

19 pages, 491 KiB  
Article
Digital Technologies and the Role of Data in Cultural Heritage: The Past, the Present, and the Future
by Vassilis Poulopoulos and Manolis Wallace
Big Data Cogn. Comput. 2022, 6(3), 73; https://doi.org/10.3390/bdcc6030073 - 4 Jul 2022
Cited by 52 | Viewed by 16052
Abstract
Is culture considered to be our past, our roots, ancient ruins, or an old piece of art? Culture is all the factors that define who we are, how we act and interact in our world, in our daily activities, in our personal and [...] Read more.
Is culture considered to be our past, our roots, ancient ruins, or an old piece of art? Culture is all the factors that define who we are, how we act and interact in our world, in our daily activities, in our personal and public relations, in our life. Culture is all the things we are not obliged to do. However, today we live in a mixed environment, a combination of the “offline” world and the online, digital world. In this mixed environment, it is technology that defines our behaviour, technology that unites people in a large world and that, finally, defines a status of “monoculture”. In this article, we examine the role of technology, and especially big data, in relation to culture. We present the advances that led to paradigm shifts in the research area of cultural informatics, and forecast the future of culture as it will be defined in this mixed world. Full article
(This article belongs to the Special Issue Big Data Analytics for Cultural Heritage)

16 pages, 4437 KiB  
Article
Lightweight AI Framework for Industry 4.0 Case Study: Water Meter Recognition
by Jalel Ktari, Tarek Frikha, Monia Hamdi, Hela Elmannai and Habib Hmam
Big Data Cogn. Comput. 2022, 6(3), 72; https://doi.org/10.3390/bdcc6030072 - 1 Jul 2022
Cited by 29 | Viewed by 5501
Abstract
The evolution of applications in telecommunication, network, computing, and embedded systems has led to the emergence of the Internet of Things and Artificial Intelligence. The combination of these technologies enabled improving productivity by optimizing consumption and facilitating access to real-time information. In this [...] Read more.
The evolution of applications in telecommunication, networking, computing, and embedded systems has led to the emergence of the Internet of Things and Artificial Intelligence. The combination of these technologies has improved productivity by optimizing consumption and facilitating access to real-time information. This work focuses on the Industry 4.0 and Smart City paradigms and proposes a new approach to monitoring and tracking water consumption using OCR and artificial intelligence, in particular the YOLOv4 machine learning model. The goal of this work is to provide optimized results in real time. The recognition rate obtained with the proposed algorithms is around 98%. Full article
(This article belongs to the Special Issue Advancements in Deep Learning and Deep Federated Learning Models)

20 pages, 6876 KiB  
Article
DeepWings©: Automatic Wing Geometric Morphometrics Classification of Honey Bee (Apis mellifera) Subspecies Using Deep Learning for Detecting Landmarks
by Pedro João Rodrigues, Walter Gomes and Maria Alice Pinto
Big Data Cogn. Comput. 2022, 6(3), 70; https://doi.org/10.3390/bdcc6030070 - 27 Jun 2022
Cited by 21 | Viewed by 8412
Abstract
Honey bee classification by wing geometric morphometrics entails the first step of manual annotation of 19 landmarks in the forewing vein junctions. This is a time-consuming and error-prone endeavor, with implications for classification accuracy. Herein, we developed a software called DeepWings© that overcomes [...] Read more.
Honey bee classification by wing geometric morphometrics entails, as a first step, the manual annotation of 19 landmarks in the forewing vein junctions. This is a time-consuming and error-prone endeavor, with implications for classification accuracy. Herein, we developed a software tool called DeepWings© that overcomes this constraint in wing geometric morphometrics classification by automatically detecting the 19 landmarks on digital images of the right forewing. We used a database containing 7634 forewing images, including 1864 analyzed by F. Ruttner in the original delineation of 26 honey bee subspecies, to tune a convolutional neural network as a wing detector, a deep learning U-Net as a landmarks segmenter, and a support vector machine as a subspecies classifier. The implemented MobileNet wing detector achieved a mAP of 0.975, and the landmarks segmenter detected the 19 landmarks with 91.8% accuracy and an average positional precision of 0.943 relative to manually annotated landmarks. The subspecies classifier, in turn, presented an average accuracy of 86.6% for 26 subspecies and 95.8% for a subset of five important subspecies. The final implementation of the system showed good speed performance, requiring only 14 s to process 10 images. DeepWings© is very user-friendly and is the first fully automated software, offered as a free Web service, for honey bee classification from wing geometric morphometrics. DeepWings© can be used for honey bee breeding, conservation, and even scientific purposes, as it provides the coordinates of the landmarks in Excel format, facilitating the work of research teams using classical identification approaches and alternative analytical tools. Full article

25 pages, 3658 KiB  
Article
A Comprehensive Spark-Based Layer for Converting Relational Databases to NoSQL
by Manal A. Abdel-Fattah, Wael Mohamed and Sayed Abdelgaber
Big Data Cogn. Comput. 2022, 6(3), 71; https://doi.org/10.3390/bdcc6030071 - 27 Jun 2022
Cited by 2 | Viewed by 5061
Abstract
Currently, the continuous massive growth in the size, variety, and velocity of data is defined as big data. Relational databases have a limited ability to work with big data. Consequently, not only structured query language (NoSQL) databases were utilized to handle big data [...] Read more.
Currently, the continuous massive growth in the size, variety, and velocity of data is defined as big data. Relational databases have a limited ability to work with big data. Consequently, Not Only SQL (NoSQL) databases have been utilized to handle big data, because NoSQL represents data in diverse models and uses a variety of query languages, unlike traditional relational databases. Therefore, using NoSQL has become essential, and many studies have attempted to propose different layers to convert relational databases to NoSQL; however, most of them targeted only one or two NoSQL models and evaluated their layers on a single node, not in a distributed environment. This study proposes a Spark-based layer for mapping relational databases to NoSQL models, focusing on the document, column, and key–value NoSQL databases. The proposed Spark-based layer comprises two parts. The first part converts relational databases to document, column, and key–value databases and encompasses two phases: a metadata analyzer of relational databases, and Spark-based transformation and migration. The second part focuses on executing structured query language (SQL) queries on the NoSQL databases. The suggested layer was applied and compared with Unity, as it has similar components and features and supports sub-queries and join operations, in a single-node environment. The experimental results show that the proposed layer outperformed Unity in terms of query execution time by a factor of three. In addition, the proposed layer was applied to multi-node clusters using different scenarios, and the results show that integrating the Spark cluster with NoSQL databases on multi-node clusters provided better read and write performance as the dataset size increased than using a single node. Full article
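The relational-to-NoSQL mapping can be illustrated in miniature (a conceptual sketch, not the paper's Spark-based layer): embedding child rows inside their parent for the document model, and flattening rows under composite keys for the key–value model. The customer/order tables below are invented:

```python
import json

# Hypothetical relational rows: a customers table and an orders table
# linked by a foreign key.
customers = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
orders = [
    {"id": 10, "customer_id": 1, "total": 25.0},
    {"id": 11, "customer_id": 1, "total": 40.0},
    {"id": 12, "customer_id": 2, "total": 15.0},
]

def to_documents(parents, children, fk):
    """Document model: embed each parent's child rows inside it."""
    docs = []
    for p in parents:
        doc = dict(p)
        doc["orders"] = [c for c in children if c[fk] == p["id"]]
        docs.append(doc)
    return docs

def to_key_value(rows, table):
    """Key-value model: one entry per row under a 'table:id' key."""
    return {f"{table}:{r['id']}": json.dumps(r, sort_keys=True) for r in rows}

docs = to_documents(customers, orders, "customer_id")
kv = to_key_value(customers, "customers")
print(docs[0]["orders"], kv["customers:1"])
```

The design trade-off each target model makes is visible even at this scale: the document form pre-joins data that relational engines would join at query time, while the key–value form gives O(1) lookup by key at the cost of query expressiveness.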

19 pages, 5202 KiB  
Article
Iris Liveness Detection Using Multiple Deep Convolution Networks
by Smita Khade, Shilpa Gite and Biswajeet Pradhan
Big Data Cogn. Comput. 2022, 6(2), 67; https://doi.org/10.3390/bdcc6020067 - 15 Jun 2022
Cited by 15 | Viewed by 4967
Abstract
In the recent decade, comprehensive research has been carried out in terms of promising biometrics modalities regarding humans’ physical features for person recognition. This work focuses on iris characteristics and traits for person identification and iris liveness detection. This study used five pre-trained [...] Read more.
In the recent decade, comprehensive research has been carried out on promising biometric modalities based on humans’ physical features for person recognition. This work focuses on iris characteristics and traits for person identification and iris liveness detection. The study used five pre-trained networks, including VGG-16, InceptionV3, ResNet50, DenseNet121, and EfficientNetB7, to recognize iris liveness using transfer learning techniques. These models are compared using three state-of-the-art biometric databases: the LivDet-Iris 2015 dataset, the IIITD contact dataset, and the ND Iris3D 2020 dataset. Validation accuracy, loss, precision, recall, F1-score, APCER (attack presentation classification error rate), NPCER (normal presentation classification error rate), and ACER (average classification error rate) were used to evaluate the performance of all pre-trained models. According to the observational data, these models have a considerable ability to transfer their experience to the field of iris recognition and to recognize the nanostructures within the iris region. Using the ND Iris3D 2020 dataset, the EfficientNetB7 model achieved 99.97% identification accuracy. Experiments show that pre-trained models outperform other current iris biometrics variants. Full article
(This article belongs to the Special Issue Data, Structure, and Information in Artificial Intelligence)

32 pages, 7749 KiB  
Article
CompositeView: A Network-Based Visualization Tool
by Stephen A. Allegri, Kevin McCoy and Cassie S. Mitchell
Big Data Cogn. Comput. 2022, 6(2), 66; https://doi.org/10.3390/bdcc6020066 - 14 Jun 2022
Cited by 6 | Viewed by 5910
Abstract
Large networks are quintessential to bioinformatics, knowledge graphs, social network analysis, and graph-based learning. CompositeView is a Python-based open-source application that improves interactive complex network visualization and extraction of actionable insight. CompositeView utilizes specifically formatted input data to calculate composite scores and display [...] Read more.
Large networks are quintessential to bioinformatics, knowledge graphs, social network analysis, and graph-based learning. CompositeView is a Python-based open-source application that improves interactive complex network visualization and the extraction of actionable insight. CompositeView utilizes specifically formatted input data to calculate composite scores and display them using the Cytoscape component of Dash. Composite scores are defined representations of smaller sets of conceptually similar data that, when combined, generate a single score to reduce information overload. Visualized interactive results are user-refined via filtering elements such as node value and edge weight sliders and graph manipulation options (e.g., node color and layout spread). The primary difference between CompositeView and other network visualization tools is its ability to auto-calculate and auto-update composite scores as the user interactively filters or aggregates data. CompositeView was developed to visualize network relevance rankings, but it performs well with non-network data. Three disparate CompositeView use cases are shown: relevance rankings from SemNet 2.0, an open-source knowledge graph relationship ranking software for biomedical literature-based discovery; Human Development Index (HDI) data; and the Framingham cardiovascular study. CompositeView was stress tested to construct reference benchmarks that define the breadth and size of data it can effectively visualize. Finally, CompositeView is compared to Excel, Tableau, Cytoscape, Neo4j, NodeXL, and Gephi. Full article
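One plausible reading of a composite score, combining several conceptually similar sub-scores into a single value via a weighted mean, can be sketched as follows; the sub-scores and weights are invented, and CompositeView's actual aggregation formula may differ:

```python
def composite_score(scores, weights=None):
    """Collapse related sub-scores into one value via a weighted mean,
    reducing information overload to a single displayable number."""
    if weights is None:
        weights = [1.0] * len(scores)
    total_weight = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total_weight

# Hypothetical node with three relevance sub-scores; the middle
# sub-score is weighted double.
print(composite_score([0.9, 0.6, 0.3], weights=[1.0, 2.0, 1.0]))
```

The point of auto-updating such a score during filtering is that removing a sub-score (or changing its weight) immediately changes the single number a node displays, so the visualization stays consistent with the user's current selection.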
(This article belongs to the Special Issue Graph-Based Data Mining and Social Network Analysis)

22 pages, 3553 KiB  
Article
Synthesizing a Talking Child Avatar to Train Interviewers Working with Maltreated Children
by Pegah Salehi, Syed Zohaib Hassan, Myrthe Lammerse, Saeed Shafiee Sabet, Ingvild Riiser, Ragnhild Klingenberg Røed, Miriam S. Johnson, Vajira Thambawita, Steven A. Hicks, Martine Powell, Michael E. Lamb, Gunn Astrid Baugerud, Pål Halvorsen and Michael A. Riegler
Big Data Cogn. Comput. 2022, 6(2), 62; https://doi.org/10.3390/bdcc6020062 - 1 Jun 2022
Cited by 22 | Viewed by 6790
Abstract
When responding to allegations of child sexual, physical, and psychological abuse, Child Protection Service (CPS) workers and police personnel need to elicit detailed and accurate accounts of the abuse to assist in decision-making and prosecution. Current research emphasizes the importance of the interviewer’s [...] Read more.
When responding to allegations of child sexual, physical, and psychological abuse, Child Protection Service (CPS) workers and police personnel need to elicit detailed and accurate accounts of the abuse to assist in decision-making and prosecution. Current research emphasizes the importance of the interviewer’s ability to follow empirically based guidelines. In doing so, it is essential to implement economical and scientific training courses for interviewers. Due to recent advances in artificial intelligence, we propose to generate a realistic and interactive child avatar, aiming to mimic a child. Our ongoing research involves the integration and interaction of different components with each other, including how to handle the language, auditory, emotional, and visual components of the avatar. This paper presents three subjective studies that investigate and compare various state-of-the-art methods for implementing multiple aspects of the child avatar. The first user study evaluates the whole system and shows that the system is well received by the expert and highlights the importance of its realism. The second user study investigates the emotional component and how it can be integrated with video and audio, and the third user study investigates realism in the auditory and visual components of the avatar created by different methods. The insights and feedback from these studies have contributed to the refined and improved architecture of the child avatar system which we present here. Full article
(This article belongs to the Special Issue Multimedia Systems for Multimedia Big Data)

20 pages, 9323 KiB  
Article
COVID-19 Tweets Classification Based on a Hybrid Word Embedding Method
by Yosra Didi, Ahlam Walha and Ali Wali
Big Data Cogn. Comput. 2022, 6(2), 58; https://doi.org/10.3390/bdcc6020058 - 18 May 2022
Cited by 25 | Viewed by 5933
Abstract
In March 2020, the World Health Organisation declared COVID-19 a pandemic. This deadly virus spread to and affected many countries in the world. During the outbreak, social media platforms such as Twitter contributed valuable and massive amounts of data that can support health-related decision making. We therefore propose that users’ sentiments be analysed with effective supervised machine learning approaches to predict disease prevalence and provide early warnings. The collected tweets were preprocessed and categorised as negative, positive, or neutral. In the second phase, different features were extracted from the posts by applying several widely used techniques, such as TF-IDF, Word2Vec, GloVe, and FastText, to build feature datasets. The novelty of this study lies in its hybrid feature extraction: we combined syntactic features (TF-IDF) with semantic features (FastText and GloVe) to represent posts accurately, which helps to improve the classification process. Experimental results show that FastText combined with TF-IDF performed better with SVM than with the other models: SVM achieved the highest accuracy at 88.72%, followed by XGBoost with an accuracy score of 85.29%. This study shows that the hybrid methods are capable of extracting features from the tweets and increasing classification performance. Full article
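The hybrid representation described above can be sketched as follows; the toy corpus, embedding dimension, and random "semantic" vectors are purely illustrative stand-ins for the paper's tweet data and trained FastText/GloVe models:

```python
import math
import numpy as np

# Toy corpus standing in for preprocessed tweets (illustrative data).
tweets = [["vaccine", "rollout", "good"], ["cases", "rising", "fast"],
          ["good", "news", "today"], ["cases", "rising", "again"]]
vocab = sorted({w for t in tweets for w in t})

# Syntactic features: TF-IDF computed from scratch.
def tfidf(doc):
    n = len(tweets)
    row = []
    for w in vocab:
        tf = doc.count(w) / len(doc)
        df = sum(w in t for t in tweets)
        row.append(tf * math.log((1 + n) / (1 + df)))
    return row

X_syntactic = np.array([tfidf(t) for t in tweets])

# Semantic features: mean of per-word embeddings. A trained FastText or
# GloVe model would supply these vectors; random ones stand in here.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=25) for w in vocab}
X_semantic = np.array([np.mean([emb[w] for w in t], axis=0) for t in tweets])

# Hybrid representation: concatenate syntactic and semantic features,
# then feed the result to a classifier such as SVM or XGBoost.
X_hybrid = np.hstack([X_syntactic, X_semantic])
print(X_hybrid.shape)  # (4, len(vocab) + 25)
```

The concatenated matrix is what a downstream classifier would train on.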

19 pages, 2274 KiB  
Article
Virtual Reality Adaptation Using Electrodermal Activity to Support the User Experience
by Francesco Chiossi, Robin Welsch, Steeven Villa, Lewis Chuang and Sven Mayer
Big Data Cogn. Comput. 2022, 6(2), 55; https://doi.org/10.3390/bdcc6020055 - 13 May 2022
Cited by 31 | Viewed by 6443
Abstract
Virtual reality is increasingly used for tasks such as work and education. Thus, rendering scenarios that neither interfere with such goals nor degrade the user experience is becoming progressively more relevant. We present a physiologically adaptive system that optimizes the virtual environment based on physiological arousal, i.e., electrodermal activity. We investigated the usability of the adaptive system in a simulated social virtual reality scenario. Participants completed an n-back task (primary) and a visual detection task (secondary). Here, we adapted the visual complexity of the secondary task, in the form of its number of non-player characters, so that participants could accomplish the primary task. We show that an adaptive virtual reality can improve users’ comfort by adapting task complexity to physiological arousal. Our findings suggest that physiologically adaptive virtual reality systems can improve users’ experience in a wide range of scenarios. Full article
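The adaptation logic can be illustrated with a minimal closed-loop sketch; the thresholds, simulated EDA readings, and NPC bounds below are invented for illustration and are not the study's actual parameters:

```python
# Toy closed-loop adaptation: raise or lower scene complexity (the number
# of non-player characters) to keep electrodermal arousal in a comfort band.
def adapt_npc_count(npcs, eda, low=2.0, high=6.0, min_npcs=0, max_npcs=20):
    if eda > high:        # over-aroused: simplify the scene
        return max(min_npcs, npcs - 1)
    if eda < low:         # under-aroused: add visual complexity
        return min(max_npcs, npcs + 1)
    return npcs           # within the comfort band: leave unchanged

npcs = 10
for eda in [7.1, 6.5, 4.0, 1.2, 1.0]:  # simulated EDA readings
    npcs = adapt_npc_count(npcs, eda)
print(npcs)  # 10 -> 9 -> 8 -> 8 -> 9 -> 10
```

A real system would smooth the physiological signal and rate-limit changes so the scene does not visibly flicker.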
(This article belongs to the Special Issue Cognitive and Physiological Assessments in Human-Computer Interaction)

21 pages, 4585 KiB  
Article
Cognitive Networks Extract Insights on COVID-19 Vaccines from English and Italian Popular Tweets: Anticipation, Logistics, Conspiracy and Loss of Trust
by Massimo Stella, Michael S. Vitevitch and Federico Botta
Big Data Cogn. Comput. 2022, 6(2), 52; https://doi.org/10.3390/bdcc6020052 - 12 May 2022
Cited by 16 | Viewed by 5342
Abstract
Monitoring social discourse about COVID-19 vaccines is key to understanding how large populations perceive vaccination campaigns. This work reconstructs how popular and trending posts framed COVID-19 vaccines semantically and emotionally on Twitter. We achieve this by merging natural language processing, cognitive network science and AI-based image analysis. We focus on 4765 unique popular tweets in English or Italian about COVID-19 vaccines between December 2020 and March 2021. One popular English tweet in our data set was liked around 495,000 times, highlighting how popular tweets can cognitively affect large parts of the population. We investigate both text and multimedia content in tweets and build a cognitive network of syntactic/semantic associations in messages, including emotional cues and pictures. This network representation indicates how online users linked ideas in social discourse and framed vaccines along specific semantic/emotional content. The English semantic frame of “vaccine” was highly polarised between trust/anticipation (towards the vaccine as a scientific asset saving lives) and anger/sadness (mentioning critical issues with dose administering). Semantic associations between “vaccine,” “hoax” and conspiratorial jargon indicated the persistence of conspiracy theories about vaccines in extremely popular English posts. Interestingly, these were absent in Italian messages. Popular tweets with images of people wearing face masks used language that lacked the trust and joy found in tweets showing people without masks. This difference indicates a negative effect attributed to face-covering in social discourse. Behavioural analysis revealed a tendency for users to share content eliciting joy, sadness and disgust, and to like sad messages less. Both patterns indicate an interplay between emotions and content diffusion beyond sentiment. After its suspension in mid-March 2021, “AstraZeneca” was associated with trustful language driven by experts. After the deaths of a small number of vaccinated people in mid-March, popular Italian tweets framed “vaccine” by crucially replacing earlier levels of trust with deep sadness. Our results stress how cognitive networks and innovative multimedia processing open new ways for reconstructing online perceptions about vaccines and trust. Full article

24 pages, 7573 KiB  
Article
Robust Multi-Mode Synchronization of Chaotic Fractional Order Systems in the Presence of Disturbance, Time Delay and Uncertainty with Application in Secure Communications
by Ali Akbar Kekha Javan, Assef Zare, Roohallah Alizadehsani and Saeed Balochian
Big Data Cogn. Comput. 2022, 6(2), 51; https://doi.org/10.3390/bdcc6020051 - 8 May 2022
Cited by 6 | Viewed by 3028
Abstract
This paper investigates the robust adaptive synchronization of multi-mode fractional-order chaotic systems (MMFOCS). To that end, synchronization was performed with unknown parameters, unknown time delays, the presence of disturbance, and uncertainty with an unknown boundary. The convergence of the synchronization error to zero was guaranteed using a Lyapunov function. Additionally, the control rules were extracted as explicit continuous functions. An image encryption approach based on maps with time-dependent coding was proposed for secure communication. The simulations indicated the effectiveness of the proposed design regarding the suitability of the parameters, the convergence of errors, and robustness. Subsequently, the presented method was applied to fractional-order Chen systems, and different benchmark images were encrypted using chaotic masking. The results indicated the desirable performance of the proposed method in encrypting the benchmark images. Full article

32 pages, 5511 KiB  
Article
Gender Stereotypes in Hollywood Movies and Their Evolution over Time: Insights from Network Analysis
by Arjun M. Kumar, Jasmine Y. Q. Goh, Tiffany H. H. Tan and Cynthia S. Q. Siew
Big Data Cogn. Comput. 2022, 6(2), 50; https://doi.org/10.3390/bdcc6020050 - 6 May 2022
Cited by 8 | Viewed by 64737
Abstract
The present analysis of more than 180,000 sentences from movie plots across the period from 1940 to 2019 emphasizes how gender stereotypes are expressed through the cultural products of society. By applying network analysis to the word co-occurrence networks of movie plots and using a novel method of identifying story tropes, we demonstrate that gender stereotypes exist in Hollywood movies. An analysis of specific paths in the network and of the words reflecting various domains shows the dynamic changes in some of these stereotypical associations. Our results suggest that gender stereotypes are complex and dynamic in nature. Specifically, whereas male characters appear to be associated with a diversity of themes in movies, female characters seem predominantly associated with the theme of romance. Although associations of female characters with physical beauty and marriage are declining over time, associations of female characters with sexual relationships and weddings are increasing. Our results demonstrate how the application of cognitive network science methods can enable a more nuanced investigation of gender stereotypes in textual data. Full article
(This article belongs to the Special Issue Knowledge Modelling and Learning through Cognitive Networks)

19 pages, 3173 KiB  
Article
A Comparative Study of MongoDB and Document-Based MySQL for Big Data Application Data Management
by Cornelia A. Győrödi, Diana V. Dumşe-Burescu, Doina R. Zmaranda and Robert Ş. Győrödi
Big Data Cogn. Comput. 2022, 6(2), 49; https://doi.org/10.3390/bdcc6020049 - 5 May 2022
Cited by 15 | Viewed by 17261
Abstract
In the context of the heavy demands of Big Data, software developers have also begun to consider NoSQL data storage solutions. One of the important criteria when choosing a NoSQL database for an application is its performance in terms of speed of data access and processing, including response times for the most important CRUD operations (CREATE, READ, UPDATE, DELETE). In this paper, the behavior of two major document-based NoSQL databases, MongoDB and document-based MySQL, was analyzed in terms of the complexity and performance of CRUD operations, especially query operations. The main objective of the paper is a comparative analysis of the impact that each database has on application performance when handling CRUD requests. To perform this analysis, a case-study application was developed using the two document-based databases, MongoDB and MySQL, which aims to model and streamline the activity of service providers that handle large amounts of data. The results obtained demonstrate the performance of both databases for different volumes of data; based on these, a detailed analysis and several conclusions are presented to support the choice of an appropriate solution for a big-data application. Full article

19 pages, 2456 KiB  
Article
A New Ontology-Based Method for Arabic Sentiment Analysis
by Safaa M. Khabour, Qasem A. Al-Radaideh and Dheya Mustafa
Big Data Cogn. Comput. 2022, 6(2), 48; https://doi.org/10.3390/bdcc6020048 - 29 Apr 2022
Cited by 16 | Viewed by 5228
Abstract
Arabic sentiment analysis is a process that aims to extract the subjective opinions of different users about different subjects, since these opinions and sentiments are used to recognize their perspectives and judgments in a particular domain. Few research studies have addressed semantic-oriented approaches for Arabic sentiment analysis based on domain ontologies and feature importance. In this paper, we built a semantic orientation approach for calculating overall polarity from Arabic subjective texts based on a purpose-built domain ontology and an available sentiment lexicon. We used the ontology concepts to extract and weight the semantic domain features by considering their levels in the ontology tree and their frequencies in the dataset, computing the overall polarity of a given textual review from the importance of each domain feature. For evaluation, an Arabic dataset from the hotel domain was selected to build the domain ontology and to test the proposed approach. The overall accuracy and f-measure reached 79.20% and 78.75%, respectively. Results showed that the approach outperformed other semantic orientation approaches, making it an appealing approach for Arabic sentiment analysis. Full article
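As a rough illustration of level- and frequency-based feature weighting, the following sketch computes an overall polarity score; the feature names, ontology levels, frequencies, and lexicon scores are hypothetical, and the weighting rule is one plausible reading of the approach, not the paper's exact formula:

```python
# Hypothetical hotel-domain features with (ontology level, corpus frequency)
# and per-feature sentiment scores taken from a lexicon; all values are
# illustrative, not the paper's actual ontology or data.
features = {
    "room":  {"level": 2, "freq": 120, "polarity": +0.6},
    "staff": {"level": 2, "freq": 90,  "polarity": +0.8},
    "wifi":  {"level": 3, "freq": 30,  "polarity": -0.4},
}

def overall_polarity(feats):
    # One plausible weighting: frequent features matter more, deeper (more
    # specific) ontology concepts matter less.
    weights = {f: v["freq"] / v["level"] for f, v in feats.items()}
    total = sum(weights.values())
    return sum(weights[f] * feats[f]["polarity"] for f in feats) / total

score = overall_polarity(features)
print("positive" if score > 0 else "negative", round(score, 3))
```

The review-level polarity is then a weighted average of feature polarities rather than a plain word count.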

28 pages, 772 KiB  
Article
Incentive Mechanisms for Smart Grid: State of the Art, Challenges, Open Issues, Future Directions
by Sweta Bhattacharya, Rajeswari Chengoden, Gautam Srivastava, Mamoun Alazab, Abdul Rehman Javed, Nancy Victor, Praveen Kumar Reddy Maddikunta and Thippa Reddy Gadekallu
Big Data Cogn. Comput. 2022, 6(2), 47; https://doi.org/10.3390/bdcc6020047 - 27 Apr 2022
Cited by 50 | Viewed by 8202
Abstract
Smart grids (SG) are electricity grids that communicate with each other, provide reliable information, and enable administrators to operate energy supplies across the country, ensuring optimized reliability and efficiency. The smart grid contains sensors that measure and transmit data to adjust the flow of electricity automatically based on supply and demand, and thus, responding to problems becomes quicker and easier. This also plays a crucial role in controlling carbon emissions, by avoiding energy losses during peak load hours and ensuring optimal energy management. The scope of big data analytics in smart grids is huge, as they collect information from raw data and derive intelligent information from it. However, these benefits of the smart grid depend on the active and voluntary participation of consumers in real time. Consumers need to be motivated and conscious to avail themselves of the achievable benefits. Incentivizing the appropriate actor is an absolute necessity to encourage prosumers to generate renewable energy sources (RES) and to motivate industries to establish plants that support sustainable and green-energy-based processes or products. The current study emphasizes these aspects and presents a comprehensive survey of the state-of-the-art contributions pertinent to incentive mechanisms in smart grids, which can be used to optimize power distribution during peak times and also reduce carbon emissions. The various technologies used in implementing incentive mechanisms in smart grids, such as game theory, blockchain, and artificial intelligence, are discussed, followed by different incentive projects being implemented across the globe. The lessons learnt, challenges faced in such implementations, and open issues such as data quality, privacy, security, and pricing related to incentive mechanisms in SG are identified to guide the future scope of research in this sector. Full article

25 pages, 664 KiB  
Article
New Efficient Approach to Solve Big Data Systems Using Parallel Gauss–Seidel Algorithms
by Shih Yu Chang, Hsiao-Chun Wu and Yifan Wang
Big Data Cogn. Comput. 2022, 6(2), 43; https://doi.org/10.3390/bdcc6020043 - 19 Apr 2022
Viewed by 3332
Abstract
In order to perform big-data analytics, regression involving large matrices is often necessary. In particular, large scale regression problems are encountered when one wishes to extract semantic patterns for knowledge discovery and data mining. When a large matrix can be processed in its factorized form, advantages arise in terms of computation, implementation, and data-compression. In this work, we propose two new parallel iterative algorithms as extensions of the Gauss–Seidel algorithm (GSA) to solve regression problems involving many variables. The convergence study in terms of error-bounds of the proposed iterative algorithms is also performed, and the required computation resources, namely time- and memory-complexities, are evaluated to benchmark the efficiency of the proposed new algorithms. Finally, the numerical results from both Monte Carlo simulations and real-world datasets are presented to demonstrate the striking effectiveness of our proposed new methods. Full article
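For context, the classic serial Gauss–Seidel iteration that the paper extends can be sketched as follows; the parallel variants and error-bound analysis are the paper's contribution and are not shown here:

```python
import numpy as np

def gauss_seidel(A, b, iters=50):
    # Plain serial Gauss-Seidel: sweep the rows, reusing each updated
    # component of x immediately within the same sweep.
    x = np.zeros_like(b, dtype=float)
    n = len(b)
    for _ in range(iters):
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
    return x

# Diagonally dominant system, for which Gauss-Seidel is guaranteed to converge.
A = np.array([[4.0, 1.0, 1.0],
              [1.0, 5.0, 2.0],
              [1.0, 2.0, 6.0]])
b = np.array([6.0, 8.0, 9.0])
x = gauss_seidel(A, b)
print(np.allclose(A @ x, b))  # True once the iteration has converged
```

Parallelizing this update is non-trivial precisely because each component depends on the components updated just before it, which is what the proposed algorithms address.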

18 pages, 1307 KiB  
Article
An Emergency Event Detection Ensemble Model Based on Big Data
by Khalid Alfalqi and Martine Bellaiche
Big Data Cogn. Comput. 2022, 6(2), 42; https://doi.org/10.3390/bdcc6020042 - 16 Apr 2022
Cited by 7 | Viewed by 4580
Abstract
Emergency events arise when a serious, unexpected, and often dangerous threat affects normal life. Hence, knowing what is occurring during and after emergency events is critical to mitigate the effect of the incident on human life, on the environment and our infrastructures, as well as the inherent financial consequences. Social network utilization in emergency event detection models can play an important role, as information is shared and users’ statuses are updated once an emergency event occurs. Besides, big data has proved its significance as a tool to assist and alleviate emergency events by processing an enormous amount of data over a short time interval. This paper shows that it is necessary to have an appropriate emergency event detection ensemble model (EEDEM) to respond quickly once such unfortunate events occur. Furthermore, it integrates Snapchat maps to propose a novel method to pinpoint the exact location of an emergency event. Moreover, merging social networks and big data can accelerate the emergency event detection system: social network data, such as those from Twitter and Snapchat, allow us to manage, monitor, analyze and detect emergency events. The main objective of this paper is to propose a novel and efficient big data-based EEDEM to pinpoint the exact location of emergency events by employing data collected from social networks, such as “Twitter” and “Snapchat”, while integrating big data (BD) and machine learning (ML). Furthermore, this paper evaluates the performance of five ML base models and the proposed ensemble approach to detect emergency events. Results show that the proposed ensemble approach achieved a very high accuracy of 99.87%, outperforming the other base models. Moreover, the best base models also yield high accuracies: 99.72% and 99.70% for LSTM and decision tree, respectively, with acceptable training times. Full article
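A hard-voting ensemble over base-model predictions is one common way to combine classifiers and can be sketched as follows; the labels and per-model outputs are illustrative, and the paper's actual EEDEM may combine its five base models differently:

```python
from collections import Counter

# Hypothetical per-post predictions from five base models
# (e.g., LSTM, decision tree, ...); labels are purely illustrative.
base_predictions = [
    ["fire", "fire", "none", "fire", "fire"],    # post 1
    ["flood", "none", "none", "flood", "none"],  # post 2
]

def majority_vote(preds_per_model):
    # Hard voting: the most common base-model label wins.
    return Counter(preds_per_model).most_common(1)[0][0]

print([majority_vote(p) for p in base_predictions])  # ['fire', 'none']
```

Soft voting (averaging predicted probabilities) is a common alternative when base models expose confidence scores.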
(This article belongs to the Topic Big Data and Artificial Intelligence)

21 pages, 312 KiB  
Article
Operations with Nested Named Sets as a Tool for Artificial Intelligence
by Mark Burgin
Big Data Cogn. Comput. 2022, 6(2), 37; https://doi.org/10.3390/bdcc6020037 - 1 Apr 2022
Viewed by 3123
Abstract
Knowledge and data representations are important for artificial intelligence (AI), as well as for intelligence in general. Intelligent functioning presupposes efficient operation with knowledge and data representations in particular. At the same time, it has been demonstrated that named sets, which are also called fundamental triads, instantiate the most fundamental structure in general and for knowledge and data representations in particular. In this context, named sets allow for an effective mathematical portrayal of the key phenomenon called nesting. Nesting plays a weighty role in a variety of fields, such as mathematics and computer science. Computing tools of AI include nested levels of parentheses in arithmetical expressions; different types of recursion; nesting of several levels of subroutines; nesting in recursive calls; multilevel nesting in information hiding; a variety of nested data structures, such as records, objects, and classes; and nested blocks of imperative source code, such as nested repeat-until clauses, while clauses, if clauses, etc. In this paper, different operations with nested named sets are constructed and their properties obtained, reflecting different attributes of nesting. An AI system receives information in the form of data and knowledge and, in processing this information, performs operations on these data and knowledge; such a system therefore needs various operations for these processes. The operations constructed in this paper process data and knowledge in the form of nested named sets, and knowing the properties of these operations can help to optimize the processing of data and knowledge in AI systems. Full article
(This article belongs to the Special Issue Data, Structure, and Information in Artificial Intelligence)
22 pages, 2824 KiB  
Article
Startups and Consumer Purchase Behavior: Application of Support Vector Machine Algorithm
by Pejman Ebrahimi, Aidin Salamzadeh, Maryam Soleimani, Seyed Mohammad Khansari, Hadi Zarea and Maria Fekete-Farkas
Big Data Cogn. Comput. 2022, 6(2), 34; https://doi.org/10.3390/bdcc6020034 - 25 Mar 2022
Cited by 31 | Viewed by 6880
Abstract
This study evaluated the impact of startup technology innovations and customer relationship management (CRM) performance on customer participation, value co-creation, and consumer purchase behavior (CPB). This analytical study empirically tested the proposed hypotheses using structural equation modeling (SEM) with SmartPLS 3. Moreover, we used a support vector machine (SVM) algorithm to verify the model’s accuracy; the SVM algorithm supports four different kernels, and we checked all of them against the accuracy criterion. This research used the convenience sampling approach to gather data, together with the conventional bias test method; a total of 466 completed responses were obtained. Technological innovations of startups and CRM have a positive and significant effect on customer participation, and customer participation significantly affects pleasure value, economic value, and relationship value. Based on the importance-performance map analysis (IPMA) matrix results, “customer participation”, with a score of 0.782, had the highest importance: if customers increase their participation performance by one unit during the COVID-19 epidemic, overall CPB increases by 0.782. In addition, our results showed that the lowest performance is related to the technological innovations of startups, which indicates an excellent opportunity for development in this area. SVM results showed that a high-degree polynomial kernel is the best kernel for confirming the model’s accuracy. Full article
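The kernel-selection step can be illustrated with scikit-learn's SVC, which supports the four standard kernels; the synthetic dataset below merely stands in for the survey-derived features:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the survey-derived features (not the study's data).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Compare the four standard SVC kernels by cross-validated accuracy,
# mirroring the kernel-selection step described in the abstract.
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    acc = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()
    print(f"{kernel:>8s}: mean accuracy = {acc:.3f}")
```

On real data, the degree of the polynomial kernel would also be tuned (SVC's `degree` parameter) rather than left at its default.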
(This article belongs to the Special Issue Advancements in Deep Learning and Deep Federated Learning Models)

18 pages, 1612 KiB  
Article
Social Networks Marketing and Consumer Purchase Behavior: The Combination of SEM and Unsupervised Machine Learning Approaches
by Pejman Ebrahimi, Marjan Basirat, Ali Yousefi, Md. Nekmahmud, Abbas Gholampour and Maria Fekete-Farkas
Big Data Cogn. Comput. 2022, 6(2), 35; https://doi.org/10.3390/bdcc6020035 - 25 Mar 2022
Cited by 53 | Viewed by 21612
Abstract
The purpose of this paper is to reveal how social network marketing (SNM) can affect consumers’ purchase behavior (CPB). We used the combination of structural equation modeling (SEM) and unsupervised machine learning approaches as an innovative method. The statistical population of the study comprised users who live in Hungary and use Facebook Marketplace, and this research used the convenience sampling approach to overcome bias. Out of 475 surveys distributed, a total of 466 respondents successfully filled out the entire survey, a response rate of 98.1%. The results showed that all dimensions of social network marketing, such as entertainment, customization, interaction, WoM and trend, positively and significantly influenced consumer purchase behavior (CPB) on Facebook Marketplace. Furthermore, we used hierarchical clustering and K-means unsupervised algorithms to cluster consumers. The results show that the respondents of this research can be clustered into nine different groups based on behavior and demographic attributes, which means that distinctive strategies can be used for the different clusters; marketing managers can provide different options, products and services for each group. This study is also notable for adopting the plspm and Matrixpls packages in R to show the model’s predictive power, and for using unsupervised machine learning algorithms to cluster consumer behaviors. Full article
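A minimal K-means sketch in the spirit of the consumer-clustering step is shown below; the two-dimensional synthetic "respondent" groups are illustrative, not the paper's survey features, and the hierarchical-clustering stage is omitted:

```python
import numpy as np

def kmeans(X, centers, iters=50):
    # Minimal Lloyd's k-means: alternate nearest-center assignment and
    # center recomputation until the assignments stabilize.
    centers = centers.copy()
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean).
        labels = ((X[:, None] - centers) ** 2).sum(-1).argmin(1)
        # Move each center to the mean of its assigned points.
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels, centers

# Two well-separated synthetic "respondent" groups (illustrative data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
labels, _ = kmeans(X, centers=X[[0, 20]])  # one seed point per group
print(sorted(np.bincount(labels)))  # [20, 20]: each group forms one cluster
```

In practice the number of clusters (nine in the study) is chosen with diagnostics such as silhouette scores or a dendrogram from the hierarchical step.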
(This article belongs to the Special Issue Machine Learning for Dependable Edge Computing Systems and Services)

23 pages, 830 KiB  
Article
Service Oriented R-ANN Knowledge Model for Social Internet of Things
by Mohana S. D., S. P. Shiva Prakash and Kirill Krinkin
Big Data Cogn. Comput. 2022, 6(1), 32; https://doi.org/10.3390/bdcc6010032 - 18 Mar 2022
Cited by 7 | Viewed by 4021
Abstract
The increase in technologies around the world requires adding intelligence to objects, and making an object smart in an environment leads to the Social Internet of Things (SIoT). These social objects are uniquely identifiable and transferable, and they share information through user-to-object and object-to-object interactions in smart environments such as smart homes, smart cities and many other applications. SIoT faces certain challenges, such as handling heterogeneous objects, selecting the data generated by objects, and dealing with missing values in data. Therefore, the discovery and communication of meaningful patterns in data are all the more important for every application, and the analysis of data is essential for making smarter decisions and qualifying the performance of data for various applications. In a smart environment, social networks of intelligent objects offer an increasing number of services, and sharing resources and services reliably and efficiently depends on the relationships between objects. Hence, this work proposes a feature selection method based on proposed semantic rules and establishes relationships to classify services using a relationship artificial neural network (R-ANN). R-ANN uses inversely proportional relationships between objects, based on certain rules and conditions governing object-to-object and user-to-object interactions, and provides a service-oriented knowledge model for decision-making that delivers services to users. The proposed R-ANN provides an accuracy of 89.62% for various services, namely weather, air quality, parking, light status, and people presence, in the SIoT environment, compared to the existing model. Full article

17 pages, 2573 KiB  
Article
Big Data Management in Drug–Drug Interaction: A Modern Deep Learning Approach for Smart Healthcare
by Muhammad Salman, Hafiz Suliman Munawar, Khalid Latif, Muhammad Waseem Akram, Sara Imran Khan and Fahim Ullah
Big Data Cogn. Comput. 2022, 6(1), 30; https://doi.org/10.3390/bdcc6010030 - 9 Mar 2022
Cited by 10 | Viewed by 7324
Abstract
The detection and classification of drug–drug interactions (DDI) from existing data are of high importance because recent reports show that DDIs are among the major causes of hospital-acquired conditions and readmissions and are also necessary for smart healthcare. Therefore, to avoid adverse drug interactions, it is necessary to have up-to-date knowledge of DDIs. This knowledge can be extracted by applying text-processing techniques to the medical literature published in the form of ‘Big Data’ because, whenever a drug interaction is investigated, it is typically reported and published in healthcare and clinical pharmacology journals. However, it is crucial to automate the extraction of the interactions taking place between drugs because the medical literature is being published in immense volumes, and it is impossible for healthcare professionals to read and collect all of the investigated DDI reports from these Big Data. To avoid this time-consuming procedure, the Information Extraction (IE) and Relationship Extraction (RE) techniques that have been studied in depth in Natural Language Processing (NLP) are very promising. Since 2011, a lot of research has been reported in this particular area, and many approaches have been implemented that can also be applied to biomedical texts to extract DDI-related information. A benchmark corpus is also publicly available for the advancement of DDI extraction tasks. The current state-of-the-art implementations for extracting DDIs from biomedical texts have employed Support Vector Machines (SVM) or other machine learning methods that work on manually defined features, which might be the cause of the low precision and recall achieved in this domain so far. Modern deep learning techniques have also been applied for the automatic extraction of DDIs from the scientific literature and have proven to be very promising for the advancement of DDI extraction tasks. 
As such, it is pertinent to investigate deep learning techniques for the extraction and classification of DDIs in order for them to be used in the smart healthcare domain. We proposed a deep neural network-based method (SEV-DDI: Severity-Drug–Drug Interaction) with some further-integrated units/layers to achieve higher precision and accuracy. After successfully outperforming other methods in the DDI classification task, we moved a step further and utilized the methods in a sentiment analysis task to investigate the severity of an interaction. The ability to determine the severity of a DDI will be very helpful for clinical decision support systems in making more accurate and informed decisions, ensuring the safety of the patients. Full article
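The abstract frames DDI extraction as classifying candidate sentences from the literature into interaction types. As a rough illustration of that task framing only, the sketch below trains a hand-rolled bag-of-words perceptron on a few invented sentences labelled with the DDIExtraction 2013 corpus classes ("mechanism", "effect", "advice"); the paper's SEV-DDI model is a deep neural network, not this stand-in, and all training sentences here are made up.

```python
from collections import defaultdict

# Invented example sentences labelled with DDI corpus relation classes.
TRAIN = [
    ("aspirin increases the plasma concentration of methotrexate", "mechanism"),
    ("ketoconazole inhibits the metabolism of terfenadine", "mechanism"),
    ("combined use may enhance the sedative effect", "effect"),
    ("concurrent administration potentiates the hypotensive effect", "effect"),
    ("coadministration should be avoided in renal impairment", "advice"),
    ("dose adjustment is recommended when given together", "advice"),
]

def featurize(sentence):
    """Bag-of-words feature set for one candidate DDI sentence."""
    return set(sentence.lower().split())

class BowPerceptron:
    """Multiclass perceptron over bag-of-words features."""

    def __init__(self, classes):
        self.classes = sorted(classes)
        self.weights = {c: defaultdict(float) for c in self.classes}

    def score(self, feats, cls):
        return sum(self.weights[cls][f] for f in feats)

    def predict(self, feats):
        # Ties break deterministically toward the alphabetically last class.
        return max(self.classes, key=lambda c: (self.score(feats, c), c))

    def fit(self, data, epochs=5):
        for _ in range(epochs):
            for sentence, label in data:
                feats = featurize(sentence)
                pred = self.predict(feats)
                if pred != label:
                    # Reward the true class, penalise the mistaken one.
                    for f in feats:
                        self.weights[label][f] += 1.0
                        self.weights[pred][f] -= 1.0

model = BowPerceptron({label for _, label in TRAIN})
model.fit(TRAIN)
print(model.predict(featurize("verapamil inhibits the metabolism of simvastatin")))
```

An unseen sentence sharing "metabolism of"-style wording with the mechanism examples is classified as "mechanism"; a real system would replace the bag-of-words features with learned embeddings and the perceptron with the paper's deep layers.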
29 pages, 2381 KiB  
Article
Radiology Imaging Scans for Early Diagnosis of Kidney Tumors: A Review of Data Analytics-Based Machine Learning and Deep Learning Approaches
by Maha Gharaibeh, Dalia Alzu’bi, Malak Abdullah, Ismail Hmeidi, Mohammad Rustom Al Nasar, Laith Abualigah and Amir H. Gandomi
Big Data Cogn. Comput. 2022, 6(1), 29; https://doi.org/10.3390/bdcc6010029 - 8 Mar 2022
Cited by 48 | Viewed by 16335
Abstract
Plenty of disease types exist in world communities that can be explained by humans’ lifestyles or the economic, social, genetic, and other factors of the country of residence. Recently, most research has focused on studying common diseases in the population to reduce death [...] Read more.
Plenty of disease types exist in world communities that can be explained by humans’ lifestyles or the economic, social, genetic, and other factors of the country of residence. Recently, most research has focused on studying common diseases in the population to reduce death risks, take the best procedure for treatment, and enhance the healthcare level of the communities. Kidney Disease is one of the common diseases that have affected our societies. Particularly, Kidney Tumors (KT) are the 10th most prevalent tumor for men and women worldwide. Overall, the lifetime likelihood of developing a kidney tumor is about 1 in 46 (2.02 percent) for males and around 1 in 80 (1.03 percent) for females. Still, more research is needed on new, early, and innovative diagnostic methods for finding an appropriate treatment for KT. Compared to the tedious and time-consuming traditional diagnosis, automatic machine learning detection algorithms can save diagnosis time, improve test accuracy, and reduce costs. Previous studies have shown that deep learning can play a role in dealing with complex tasks such as the diagnosis, segmentation, and classification of Kidney Tumors, one of the most malignant tumors. The goals of this review article on deep learning in radiology imaging are to summarize what has already been accomplished, determine the techniques used by researchers in previous years to diagnose Kidney Tumors through medical imaging, and identify promising future avenues, whether in terms of applications or technological developments, as well as to identify common problems, describe ways to expand the data set, summarize the knowledge and best practices, and determine remaining challenges and future directions. Full article
19 pages, 14228 KiB  
Article
Comparison of Object Detection in Head-Mounted and Desktop Displays for Congruent and Incongruent Environments
by René Reinhard, Erinchan Telatar and Shah Rukh Humayoun
Big Data Cogn. Comput. 2022, 6(1), 28; https://doi.org/10.3390/bdcc6010028 - 7 Mar 2022
Cited by 3 | Viewed by 3707
Abstract
Virtual reality technologies, including head-mounted displays (HMD), can provide benefits to psychological research by combining high degrees of experimental control with improved ecological validity. This is due to the strong feeling of being in the displayed environment (presence) experienced by VR users. As [...] Read more.
Virtual reality technologies, including head-mounted displays (HMD), can provide benefits to psychological research by combining high degrees of experimental control with improved ecological validity. This is due to the strong feeling of being in the displayed environment (presence) experienced by VR users. As of yet, it has not been fully explored how using HMDs impacts basic perceptual tasks, such as object perception. In traditional display setups, the congruency between background environment and object category has been shown to impact response times in object perception tasks. In this study, we investigated whether this well-established effect is comparable when using desktop and HMD devices. In the study, 21 participants used both desktop and HMD setups to perform an object identification task and, subsequently, their subjective presence while experiencing two distinct virtual environments (a beach and a home environment) was evaluated. Participants were quicker to identify objects in the HMD condition, independent of object-environment congruency, while congruency effects were not impacted. Furthermore, participants reported significantly higher presence in the HMD condition. Full article
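The study's 2x2 design (display type x object-environment congruency) can be illustrated with a small response-time tabulation. The numbers below are invented solely to mirror the reported pattern: HMD faster overall, with a congruency effect of similar size on both displays; the paper's actual data and statistics are in the article.

```python
from statistics import mean

# (display, congruent, response time in ms) - invented trial data.
TRIALS = [
    ("desktop", True, 620), ("desktop", True, 640),
    ("desktop", False, 680), ("desktop", False, 700),
    ("hmd", True, 540), ("hmd", True, 560),
    ("hmd", False, 600), ("hmd", False, 620),
]

def mean_rt(display, congruent):
    """Mean response time for one cell of the 2x2 design."""
    return mean(rt for d, c, rt in TRIALS if d == display and c == congruent)

# Congruency effect per display: incongruent minus congruent mean RT.
desktop_effect = mean_rt("desktop", False) - mean_rt("desktop", True)
hmd_effect = mean_rt("hmd", False) - mean_rt("hmd", True)

# Overall HMD speed advantage, averaged across congruency.
hmd_advantage = (
    (mean_rt("desktop", True) + mean_rt("desktop", False)) / 2
    - (mean_rt("hmd", True) + mean_rt("hmd", False)) / 2
)
```

With this toy data the congruency effect is identical across displays (no interaction) while the HMD condition is faster overall, which is the qualitative result the abstract reports.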
(This article belongs to the Special Issue Virtual Reality, Augmented Reality, and Human-Computer Interaction)
28 pages, 1211 KiB  
Article
A Combined System Metrics Approach to Cloud Service Reliability Using Artificial Intelligence
by Tek Raj Chhetri, Chinmaya Kumar Dehury, Artjom Lind, Satish Narayana Srirama and Anna Fensel
Big Data Cogn. Comput. 2022, 6(1), 26; https://doi.org/10.3390/bdcc6010026 - 1 Mar 2022
Cited by 5 | Viewed by 5280
Abstract
Identifying and anticipating potential failures in the cloud is an effective method for increasing cloud reliability and proactive failure management. Many studies have been conducted to predict potential failure, but none have combined SMART (self-monitoring, analysis, and reporting technology) hard drive metrics with [...] Read more.
Identifying and anticipating potential failures in the cloud is an effective method for increasing cloud reliability and proactive failure management. Many studies have been conducted to predict potential failure, but none have combined SMART (self-monitoring, analysis, and reporting technology) hard drive metrics with other system metrics, such as central processing unit (CPU) utilisation. Therefore, we propose a combined system metrics approach for failure prediction based on artificial intelligence to improve reliability. We tested data from over 100 cloud servers with four artificial intelligence algorithms: random forest, gradient boosting, long short-term memory, and gated recurrent unit, and also performed correlation analysis. Our correlation analysis sheds light on the relationships that exist between system metrics and failure, and the experimental results demonstrate the advantages of combining system metrics, outperforming the state-of-the-art. Full article
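The correlation-analysis step the abstract mentions can be sketched as follows: pair a SMART hard-drive attribute with CPU utilisation per server and compute each metric's Pearson correlation with an observed-failure label. The server records below are invented, and this is only the analysis step; the paper trains random forest, gradient boosting, LSTM, and GRU models on real cloud telemetry.

```python
import math

# (SMART 5 reallocated sector count, CPU utilisation %, failed within window)
# Invented records for eight servers.
SERVERS = [
    (0, 35, 0), (2, 40, 0), (1, 90, 0), (150, 45, 1),
    (300, 95, 1), (5, 50, 0), (220, 88, 1), (0, 20, 0),
]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

smart = [s for s, _, _ in SERVERS]
cpu = [c for _, c, _ in SERVERS]
failed = [f for _, _, f in SERVERS]

r_smart = pearson(smart, failed)  # drive health vs. failure
r_cpu = pearson(cpu, failed)      # CPU load vs. failure
```

In this toy data the SMART attribute correlates strongly with failure and CPU utilisation only moderately; combining both as model features is the paper's core proposal.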
42 pages, 679 KiB  
Article
Optimizations for Computing Relatedness in Biomedical Heterogeneous Information Networks: SemNet 2.0
by Anna Kirkpatrick, Chidozie Onyeze, David Kartchner, Stephen Allegri, Davi Nakajima An, Kevin McCoy, Evie Davalbhakta and Cassie S. Mitchell
Big Data Cogn. Comput. 2022, 6(1), 27; https://doi.org/10.3390/bdcc6010027 - 1 Mar 2022
Cited by 10 | Viewed by 5454
Abstract
Literature-based discovery (LBD) summarizes information and generates insight from large text corpuses. The SemNet framework utilizes a large heterogeneous information network or “knowledge graph” of nodes and edges to compute relatedness and rank concepts pertinent to a user-specified target. SemNet provides a way [...] Read more.
Literature-based discovery (LBD) summarizes information and generates insight from large text corpora. The SemNet framework utilizes a large heterogeneous information network or “knowledge graph” of nodes and edges to compute relatedness and rank concepts pertinent to a user-specified target. SemNet provides a way to perform multi-factorial and multi-scalar analysis of complex disease etiology and therapeutic identification using the 33+ million articles in PubMed. The present work improves the efficacy and efficiency of LBD for end users by augmenting SemNet to create SemNet 2.0. A custom Python data structure replaced reliance on Neo4j to improve knowledge graph query times by several orders of magnitude. Additionally, two randomized algorithms were built to optimize the HeteSim metric calculation for computing metapath similarity. The unsupervised learning algorithm for rank aggregation (ULARA), which ranks concepts with respect to the user-specified target, was reconstructed using derived mathematical proofs of correctness and probabilistic performance guarantees for optimization. The upgraded ULARA is generalizable to other rank aggregation problems outside of SemNet. In summary, SemNet 2.0 is comprehensive open-source software for significantly faster, more effective, and user-friendly automated biomedical LBD. An example case is performed to rank relationships between Alzheimer’s disease and metabolic co-morbidities. Full article
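The HeteSim metric mentioned above scores how related two nodes are along a typed metapath. A minimal sketch, assuming an invented toy graph and the metapath Drug -> Gene <- Disease: walk forward from the source and backward from the target to the middle node type, then take the cosine similarity of the two reachability distributions. The node and edge names below are made up for illustration; SemNet 2.0 computes this at UMLS scale with randomized approximation algorithms.

```python
import math
from collections import defaultdict

# Invented toy knowledge graph: drug->gene and disease->gene associations.
DRUG_TO_GENE = {
    "metformin": ["PRKAA1", "SLC22A1"],
    "donepezil": ["ACHE"],
}
DISEASE_TO_GENE = {
    "alzheimers": ["ACHE", "APP", "PRKAA1"],
    "diabetes": ["PRKAA1", "SLC22A1"],
}

def distribution(edges, node):
    """Uniform random-walk distribution over the middle (gene) layer."""
    targets = edges[node]
    dist = defaultdict(float)
    for t in targets:
        dist[t] += 1.0 / len(targets)
    return dist

def hetesim(drug, disease):
    """Cosine similarity of the forward and backward mid-point distributions."""
    a = distribution(DRUG_TO_GENE, drug)
    b = distribution(DISEASE_TO_GENE, disease)
    dot = sum(a[g] * b.get(g, 0.0) for g in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)
```

Here hetesim("metformin", "diabetes") is 1.0 because the two nodes reach exactly the same gene distribution, while drug-disease pairs with no shared genes score 0; ranking targets by such scores is the relatedness computation SemNet optimizes.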
(This article belongs to the Special Issue Graph-Based Data Mining and Social Network Analysis)