Editor’s Choice Articles

Editor’s Choice articles are based on recommendations by the scientific editors of MDPI journals from around the world. Editors select a small number of articles recently published in the journal that they believe will be particularly interesting to readers, or important in the respective research area. The aim is to provide a snapshot of some of the most exciting work published in the various research areas of the journal.

20 pages, 1799 KiB  
Article
An Information System Supporting Insurance Use Cases by Automated Anomaly Detection
Big Data Cogn. Comput. 2023, 7(1), 4; https://doi.org/10.3390/bdcc7010004 - 28 Dec 2022
Viewed by 2242
Abstract
The increasing availability of vast quantities of data from various sources significantly impacts the insurance industry, even though this industry has always been data driven. It accelerates manual processes and enables new products or business models. On the other hand, it also burdens insurance analysts and other users who need to cope with this development in parallel with other global changes. A novel information system (IS) for artificial intelligence (AI)-supported big data analysis, introduced within this paper, is intended to overcome user overload and to empower human data analysts in the insurance industry. The research focus lies neither in novel algorithms nor in datasets but in concepts that combine AI and big data analysis for synergies, such as usability enhancements. For this purpose, this paper systematically designs and implements an IS conforming to the AI2VIS4BigData reference model that automatically detects anomalies and increases its users’ confidence and efficiency. Practical relevance is assured by an interview with an insurance analyst to verify the demand for the developed system, and all requirements are derived from two insurance industry user stories. A core contribution is the introduction of the IS. Another significant contribution is an extension of the AI2VIS4BigData service-based architecture and user interface (UI) concept for AI and machine learning (ML)-based user empowerment and data transformation. The implemented prototype was applied to synthetic data to enable the evaluation of the system. The quantitative and qualitative evaluations confirm the system’s usability and applicability to the insurance domain, yet reveal the need for improvements toward larger quantities of data and further evaluations with a more extensive user group.
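The abstract above describes AI-supported anomaly detection applied to synthetic insurance data. As a generic illustration of that technique (not the paper’s IS, whose design follows the AI2VIS4BigData reference model), a minimal sketch using scikit-learn’s IsolationForest on hypothetical claim features might look as follows; all field names, parameters, and data are assumptions.

```python
# Minimal sketch: automated anomaly detection on synthetic insurance claims.
# Illustrative only -- field names, parameters, and data are hypothetical,
# not the information system described in the paper.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(seed=42)

# Synthetic claims: [claim_amount, days_to_report, customer_tenure_years]
normal = rng.normal(loc=[2000, 10, 6], scale=[500, 3, 2], size=(990, 3))
outliers = rng.normal(loc=[15000, 60, 0.5], scale=[3000, 10, 0.3], size=(10, 3))
claims = np.vstack([normal, outliers])

# Fit an Isolation Forest; `contamination` is the assumed anomaly rate.
model = IsolationForest(contamination=0.01, random_state=42).fit(claims)
flags = model.predict(claims)  # -1 = anomaly, 1 = normal

print(f"Flagged {np.sum(flags == -1)} of {len(claims)} claims for analyst review")
```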

20 pages, 1359 KiB  
Article
A Scientific Perspective on Using Artificial Intelligence in Sustainable Urban Development
Big Data Cogn. Comput. 2023, 7(1), 3; https://doi.org/10.3390/bdcc7010003 - 20 Dec 2022
Cited by 5 | Viewed by 2966
Abstract
Digital transformation (or digitalization) is the ongoing development of digital technologies (such as smart devices, cloud services, and Big Data) that have a lasting impact on our economy and society. In this manner, digitalization is a major driver of permanent change, even in the field of Sustainable Urban Development. In the wake of digitalization, expectations are changing, placing societal pressure on the design and development of smart environments for everything that Sustainable Urban Development entails. In this sense, the solution is the integration of Artificial Intelligence into Sustainable Urban Development, because technology can simplify people’s lives. The aim of this paper is to ascertain which Sustainable Urban Development dimensions are taken into account when integrating Artificial Intelligence and what results can be achieved. These questions formed the basic framework for this research article. To provide a snapshot of the current state of Artificial Intelligence in Sustainable Urban Development, a systematic review of the literature published between 2012 and 2022 was conducted. The data were collected and analyzed using PRISMA. Based on the studies identified, we found significant growth in studies starting in 2018, and that Artificial Intelligence applications address the Sustainable Urban Development dimensions of environmental protection, economic development, social justice and equity, culture, and governance. The Artificial Intelligence techniques used in Sustainable Urban Development cover a broad field, including Artificial Intelligence in general, Machine Learning, Deep Learning, Artificial Neural Networks, Operations Research, Predictive Analytics, and Data Mining. However, the integration of Artificial Intelligence in Sustainable Urban Development also brings challenges. These include responsible municipal policies, awareness of data quality, privacy and data security, the formation of partnerships among stakeholders (e.g., local citizens, civil society, industry, and various levels of government), and transparency and traceability in the implementation and rollout of Artificial Intelligence. This is a first step towards providing an overview of the possible applications of Artificial Intelligence in Sustainable Urban Development, and it clearly shows that Artificial Intelligence is gaining ground in this sector.

20 pages, 1982 KiB  
Article
Using an Evidence-Based Approach for Policy-Making Based on Big Data Analysis and Applying Detection Techniques on Twitter
Big Data Cogn. Comput. 2022, 6(4), 160; https://doi.org/10.3390/bdcc6040160 - 19 Dec 2022
Viewed by 2328
Abstract
Evidence-based policy seeks to use evidence in public policy in a systematic way in order to improve decision-making quality. Evidence-based policy cannot work properly and achieve the expected results without accurate, appropriate, and sufficient evidence. Given the prevalence of social media and intense user engagement, the question to ask is whether the data on social media can be used as evidence in the policy-making process. The question gives rise to the debate on what characteristics of data should be considered as evidence. Despite the numerous research studies carried out on social media analysis or policy-making, this domain has not been examined through an “evidence detection” lens. Thus, this study addresses the gap in the literature on how to analyze the big text data produced by social media and how to use it for policy-making based on evidence detection. The present paper seeks to fill this gap by developing and offering a model that can help policy-makers distinguish “evidence” from “non-evidence”. To do so, in the first phase of the study, the researchers elicited the characteristics of “evidence” by conducting a thematic analysis of semi-structured interviews with experts and policy-makers. In the second phase, the developed model was tested against six months of data collected from Twitter accounts. The experimental results show that the evidence detection model performed best with a decision tree (DT), which outperformed the other algorithms with an accuracy of 85.9%. The model thus fulfilled the aim of the study: detecting Twitter posts that can be used as evidence. This study contributes to the body of knowledge by exploring novel models of text processing and offering an efficient method for analyzing big text data. The practical implication of the study also lies in its efficiency and ease of use, which provide policy-makers with the evidence they need.
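Since the paper’s best-performing classifier is a decision tree over Twitter text, a minimal sketch of that general setup (TF-IDF features feeding a decision tree) is shown below; the toy tweets, labels, and features are hypothetical, not the authors’ data.

```python
# Minimal sketch of the evidence-detection idea: classify tweets as
# "evidence" vs. "non-evidence" with a decision tree over TF-IDF features.
# The toy texts and labels are hypothetical stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

tweets = [
    "Official statistics show unemployment rose 3% this quarter",
    "I just feel like everything is getting worse lol",
    "Survey of 2,000 households finds broadband access at 87%",
    "Politicians never listen to anyone anyway",
]
labels = [1, 0, 1, 0]  # 1 = evidence, 0 = non-evidence

clf = make_pipeline(TfidfVectorizer(), DecisionTreeClassifier(random_state=0))
clf.fit(tweets, labels)

print(clf.predict(["New census data reports a 12% increase in urban migration"]))
```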

13 pages, 2240 KiB  
Article
Proposal of Decentralized P2P Service Model for Transfer between Blockchain-Based Heterogeneous Cryptocurrencies and CBDCs
Big Data Cogn. Comput. 2022, 6(4), 159; https://doi.org/10.3390/bdcc6040159 - 19 Dec 2022
Cited by 2 | Viewed by 2430
Abstract
This paper proposes a solution to the transfer problem between blockchain-based heterogeneous cryptocurrencies and CBDCs, with research derived from an analysis of the existing literature. Interoperability between heterogeneous blockchains has been an obstacle to service diversity and user convenience. Many types of cryptocurrencies are currently trading on the market, and many countries are researching and testing central bank digital currencies (CBDCs). This paper describes existing interoperability studies and solutions for heterogeneous blockchains and how the proposed service model differs from them. To enhance digital financial services and improve user convenience, transfers between heterogeneous cryptocurrencies, between heterogeneous CBDCs, and between cryptocurrency and CBDC are required. This paper proposes an interoperable architecture between heterogeneous blockchains, and a decentralized peer-to-peer (P2P) service model based on that architecture for transfers between blockchain-based heterogeneous cryptocurrencies and CBDCs. Security threats to the proposed service model are identified, and security requirements to prevent them are specified; both should be considered when implementing the proposed service model.
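The paper specifies a service model and its security requirements rather than an implementation. Purely as background, one widely known building block for atomic cross-chain transfers is the hashed time-lock commitment, sketched here in toy form; this is a generic technique and not the protocol proposed in the paper.

```python
# Toy sketch of a hashed time-lock commitment, a common building block for
# atomic transfers between heterogeneous chains. Generic illustration only;
# this is not the service model proposed in the paper.
import hashlib
import time

def make_lock(secret: bytes) -> str:
    """Sender publishes the hash; funds can be claimed only with the preimage."""
    return hashlib.sha256(secret).hexdigest()

def claim(lock: str, preimage: bytes, deadline: float) -> bool:
    """Receiver claims before the deadline by revealing the matching preimage."""
    if time.time() > deadline:
        return False  # timed out: the sender can reclaim the locked funds
    return hashlib.sha256(preimage).hexdigest() == lock

secret = b"example-preimage"
lock = make_lock(secret)
deadline = time.time() + 3600  # one hour to claim

print(claim(lock, b"wrong-guess", deadline))  # False
print(claim(lock, secret, deadline))          # True
```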

19 pages, 784 KiB  
Review
A Survey on Big Data in Pharmacology, Toxicology and Pharmaceutics
Big Data Cogn. Comput. 2022, 6(4), 161; https://doi.org/10.3390/bdcc6040161 - 19 Dec 2022
Cited by 4 | Viewed by 2157
Abstract
Patients, hospitals, sensors, researchers, providers, phones, and healthcare organisations are producing enormous amounts of data in both the healthcare and drug detection sectors. The real challenge in these sectors is to find, investigate, manage, and collect information from patients in order to make their lives easier and healthier, not only in terms of formulating new therapies and understanding diseases, but also in predicting outcomes at earlier stages and making effective decisions. The volumes of data available in the fields of pharmacology, toxicology, and pharmaceutics are constantly increasing, driven by advances in technology that allow for the analysis of ever-larger data sets. Big Data (BD) has the potential to transform drug development and safety testing by providing new insights into the effects of drugs on human health. However, harnessing this potential involves several challenges, including the need for specialised skills and infrastructure. In this survey, we explore how BD approaches are currently being used in the pharmacology, toxicology, and pharmaceutics fields; in particular, we highlight how researchers have applied BD in these fields to address various challenges and establish solutions. A comparative analysis helps to trace the implementation of big data in each of the three fields. Relevant limitations and directions for future research are emphasised. The pharmacology, toxicology, and pharmaceutics fields are still at an early stage of BD adoption, and many research challenges remain to be overcome in order to effectively employ BD to address specific issues.

29 pages, 464 KiB  
Article
What Is (Not) Big Data Based on Its 7Vs Challenges: A Survey
Big Data Cogn. Comput. 2022, 6(4), 158; https://doi.org/10.3390/bdcc6040158 - 14 Dec 2022
Cited by 1 | Viewed by 2271
Abstract
Big Data has changed how enterprises and people manage knowledge and make decisions. However, when talking about Big Data, there are often differing definitions of what it is and what it is used for, reflecting many interpretations and disagreements. For these reasons, we have reviewed the literature to compile and provide a possible resolution of the existing discrepancies between the terms Data Analysis, Data Mining, Knowledge Discovery in Databases, and Big Data. In addition, we have gathered the patterns used in Data Mining, the different phases of Knowledge Discovery in Databases, and definitions of Big Data according to some important companies and organisations. Moreover, Big Data has challenges that sometimes coincide with its own characteristics, which are known as the Vs. Depending on the author, there can be from 3 to 5, or even 7, of these Vs, and the 4Vs or 5Vs listed are not always the same. Therefore, in this survey, we reviewed the literature to explain how many Vs have been identified and how they are explained in relation to different existing problems. We identified 7Vs, three of which have subtypes.
23 pages, 735 KiB  
Review
Explore Big Data Analytics Applications and Opportunities: A Review
Big Data Cogn. Comput. 2022, 6(4), 157; https://doi.org/10.3390/bdcc6040157 - 14 Dec 2022
Cited by 9 | Viewed by 5343
Abstract
Big data applications and analytics are vital in supporting strategic decisions. The existing literature emphasizes that big data applications and analytics can empower those who applied them during the COVID-19 pandemic. This paper reviews the existing literature on big data applications before and during COVID-19. A comparison of Big Data applications pre- and peri-pandemic is presented, expanded to four highly recognized industry fields: healthcare, education, transportation, and banking. A discussion of the effectiveness of the four major types of data analytics across the mentioned industries is highlighted. Hence, this paper provides an illustrative description of the importance of big data applications in the era of COVID-19, as well as aligning the applications to their relevant big data analytics models. This review concludes that applying the appropriate big data applications and their associated data analytics models can overcome the significant limitations faced by organizations during one of the most fateful pandemics worldwide. Future work will conduct a systematic literature review and a comparative analysis of existing Big Data systems and models, and will investigate the critical challenges of Big Data Analytics and applications during the COVID-19 pandemic.

31 pages, 6664 KiB  
Review
Machine Learning Styles for Diabetic Retinopathy Detection: A Review and Bibliometric Analysis
Big Data Cogn. Comput. 2022, 6(4), 154; https://doi.org/10.3390/bdcc6040154 - 12 Dec 2022
Cited by 4 | Viewed by 4416
Abstract
Diabetic retinopathy (DR) is a medical condition caused by diabetes. The development of retinopathy significantly depends on how long a person has had diabetes. Initially, there may be no symptoms or just a slight vision problem due to impairment of the retinal blood vessels. Later, it may lead to blindness. Recognizing the early clinical signs of DR is very important for intervening in and effectively treating DR. Thus, regular eye check-ups are necessary to direct the person to a doctor for a comprehensive ocular examination and treatment as soon as possible to avoid permanent vision loss. Nevertheless, due to limited resources, such screening is not feasible at scale. As a result, emerging technologies, such as artificial intelligence, for the automatic detection and classification of DR are alternative screening methodologies that make screening cost-effective. Researchers have been working on artificial-intelligence-based technologies to detect and analyze DR in recent years. This study aimed to investigate the different machine learning styles chosen for diagnosing retinopathy. Thus, a bibliometric analysis was systematically conducted to discover the different machine learning styles used for detecting diabetic retinopathy. The data were exported from popular databases, namely Web of Science (WoS) and Scopus, and were analyzed using Biblioshiny and VOSviewer in terms of publications, top countries, sources, subject areas, top authors, trend topics, co-occurrences, thematic evolution, factorial maps, citation analysis, etc., which form the basis for researchers to identify the research gaps in diabetic retinopathy detection and classification.

17 pages, 1860 KiB  
Article
Explaining Exploration–Exploitation in Humans
Big Data Cogn. Comput. 2022, 6(4), 155; https://doi.org/10.3390/bdcc6040155 - 12 Dec 2022
Cited by 1 | Viewed by 1250
Abstract
Human as well as algorithmic searches must balance exploration and exploitation. The search task in this paper is the global optimization of a 2D multimodal function, unknown to the searcher. The task thus presents the following features: (i) uncertainty (information about the function can be acquired only through function observations), (ii) sequentiality (the choice of the next point to observe depends on the previous ones), and (iii) a limited budget (a maximum number of sequential choices allowed to the players). The data about human behavior are gathered through a gaming app whose screen represents all the possible locations the player can click on; the associated value of the unknown function is shown to the player. Experimental data were gathered from 39 subjects, each playing 10 different tasks. Decisions are analyzed in a Pareto optimality setting: improvement vs. uncertainty. The experimental results show that the most significant deviations from Pareto rationality are associated with a behavior named “exasperated exploration”, close to random search. This behavior shows a statistically significant association with stressful situations that occur when, according to their current belief, players feel there is no chance of improving over the best value observed so far while the remaining budget runs out. To classify decisions as Pareto or non-Pareto, an explainable/interpretable machine learning model based on decision tree learning is developed. The resulting model is used to implement a synthetic human searcher/optimizer, which is then compared against Bayesian optimization. On half of the test problems, the synthetic human proves more effective and efficient.
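A minimal sketch of the Pareto test implied here, assuming improvement and uncertainty are both treated as objectives to maximize (exploitation vs. exploration) and with hypothetical scores:

```python
# Minimal sketch of the Pareto test used to label decisions: a candidate is
# Pareto-rational if no other candidate is at least as good in both objectives
# and strictly better in one. Scores are hypothetical stand-ins.
def dominates(a, b):
    """a dominates b under the assumption that both improvement (a[0]) and
    uncertainty (a[1]) are objectives to maximize."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])

def pareto_front(candidates):
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]

# (expected improvement, uncertainty) for each clickable location
candidates = [(0.9, 0.1), (0.6, 0.4), (0.3, 0.8), (0.5, 0.2), (0.2, 0.3)]
print(pareto_front(candidates))  # decisions on the front count as Pareto-rational
```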

22 pages, 2447 KiB  
Article
An Advanced Big Data Quality Framework Based on Weighted Metrics
Big Data Cogn. Comput. 2022, 6(4), 153; https://doi.org/10.3390/bdcc6040153 - 09 Dec 2022
Cited by 3 | Viewed by 2215
Abstract
While the benefits of big data are numerous, its use requires addressing new challenges related to data processing, data security, and especially degradation of data quality. Despite the increased importance of data quality for big data, data quality measurement is currently limited to a few metrics: while more than 50 data quality dimensions have been defined in the literature, the number of measured dimensions is limited to 11. Therefore, this paper aims to extend the measured dimensions by defining four new data quality metrics: Integrity, Accessibility, Ease of manipulation, and Security. Thus, we propose a comprehensive Big Data Quality Assessment Framework based on 12 metrics: Completeness, Timeliness, Volatility, Uniqueness, Conformity, Consistency, Ease of manipulation, Relevancy, Readability, Security, Accessibility, and Integrity. In addition, to ensure accurate data quality assessment, we apply data weights at three data unit levels: data fields, quality metrics, and quality aspects. Furthermore, we define and measure five quality aspects to provide a macro-view of data quality. Finally, an experiment is performed to implement the defined measures. The results show that the suggested methodology allows a more exhaustive and accurate big data quality assessment, defining a weighted quality score based on the 12 metrics and achieving a best quality model score of 9/10.
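As a toy illustration of a weighted quality score over the 12 metrics named above (the per-metric scores, weights, and 0–10 normalization are assumptions, not the paper’s definitions):

```python
# Minimal sketch of a weighted quality score: per-metric scores combined with
# metric weights, normalized here to a 0-10 scale. Metric names follow the
# paper; the scores and weights are hypothetical.
metric_scores = {  # each metric scored in [0, 1]
    "Completeness": 0.95, "Timeliness": 0.80, "Volatility": 0.70,
    "Uniqueness": 0.99, "Conformity": 0.90, "Consistency": 0.85,
    "Ease of manipulation": 0.75, "Relevancy": 0.88, "Readability": 0.92,
    "Security": 0.60, "Accessibility": 0.83, "Integrity": 0.97,
}
metric_weights = {m: 1.0 for m in metric_scores}  # assumed equal weights...
metric_weights["Security"] = 2.0                  # ...with security weighted more

total_w = sum(metric_weights.values())
score = sum(metric_scores[m] * metric_weights[m] for m in metric_scores) / total_w

print(f"Weighted quality score: {10 * score:.1f}/10")
```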

21 pages, 3495 KiB  
Article
Innovative Business Process Reengineering Adoption: Framework of Big Data Sentiment, Improving Customers’ Service Level Agreement
Big Data Cogn. Comput. 2022, 6(4), 151; https://doi.org/10.3390/bdcc6040151 - 08 Dec 2022
Cited by 2 | Viewed by 2017
Abstract
Social media is now regarded as the most valuable source of data for trend analysis and innovative business process reengineering preferences. Data made accessible through social media can be utilized for a variety of purposes, such as by an entrepreneur who wants to learn more about the market they intend to enter and uncover their consumers’ requirements before launching new products or services. Sentiment analysis and text mining of telecommunication businesses via social media posts and comments are the subject of this study. A proposed framework is used as a guideline and tested for sentiment analysis: lexicon-based sentiment categorization labels the training dataset for a supervised machine learning support vector machine. The results are promising; the accuracy and the quantity of true sentiments detected are compared. This signifies the usefulness of text mining and sentiment analysis on social media data, while the use of machine learning classifiers for predicting sentiment orientation provides a useful tool for operations and marketing departments. The availability of large amounts of data in this digitally active society is advantageous for sectors such as the telecommunication industry. With text mining and sentiment analysis, these companies can stay two steps ahead with their strategy, become more cohesive, make customers happier, and mitigate problems more easily, further supporting the adoption of innovative business process reengineering for service improvements within the telecommunications industry.
(This article belongs to the Special Issue Advanced Data Mining Techniques for IoT and Big Data)
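A minimal sketch of the pipeline described above, labeling posts with a tiny stand-in lexicon and training an SVM on those weak labels; the lexicon, posts, and features are hypothetical, not the study’s data.

```python
# Minimal sketch: label posts with a tiny sentiment lexicon, then train an
# SVM on TF-IDF features using those weak labels. Toy stand-ins throughout.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

POS, NEG = {"great", "fast", "love", "reliable"}, {"slow", "drop", "hate", "awful"}

def lexicon_label(text: str) -> int:
    words = set(text.lower().split())
    return 1 if len(words & POS) >= len(words & NEG) else 0

posts = [
    "love the fast and reliable coverage",
    "calls drop all the time, awful service",
    "great signal even in the basement",
    "so slow during peak hours, i hate it",
]
labels = [lexicon_label(p) for p in posts]  # weak labels from the lexicon

svm = make_pipeline(TfidfVectorizer(), LinearSVC())
svm.fit(posts, labels)
print(svm.predict(["the network is fast and great"]))
```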

41 pages, 6572 KiB  
Review
A Systematic Literature Review on Diabetic Retinopathy Using an Artificial Intelligence Approach
Big Data Cogn. Comput. 2022, 6(4), 152; https://doi.org/10.3390/bdcc6040152 - 08 Dec 2022
Cited by 4 | Viewed by 5791
Abstract
Diabetic retinopathy occurs due to long-term diabetes with fluctuating blood glucose levels and has become the most common cause of vision loss worldwide. It has become a severe problem among the working-age group that needs to be addressed early to avoid future vision loss. Artificial-intelligence-based technologies have been utilized to detect and grade diabetic retinopathy at an initial level; early detection allows for proper treatment and, as a result, eyesight complications can be avoided. This in-depth analysis details the various methods for diagnosing diabetic retinopathy using blood vessels, microaneurysms, exudates, the macula, optic discs, and hemorrhages. Most studies use fundus images of the retina, taken with a fundus camera. This survey discusses the basics of diabetes, its prevalence and complications, and artificial intelligence approaches for the early detection and classification of diabetic retinopathy. The research also discusses artificial-intelligence-based techniques such as machine learning and deep learning. New research fields such as transfer learning using generative adversarial networks, domain adaptation, multitask learning, and explainable artificial intelligence in diabetic retinopathy are also considered. A list of existing datasets, screening systems, performance measurements, biomarkers in diabetic retinopathy, and potential issues and challenges faced in ophthalmology is discussed, followed by the future scope and conclusions. To the authors’ knowledge, no other literature review has analyzed recent state-of-the-art techniques considering the PRISMA approach with artificial intelligence at its core.

16 pages, 4442 KiB  
Article
Yolov5 Series Algorithm for Road Marking Sign Identification
Big Data Cogn. Comput. 2022, 6(4), 149; https://doi.org/10.3390/bdcc6040149 - 07 Dec 2022
Cited by 5 | Viewed by 3882
Abstract
Road markings and signs provide vehicles and pedestrians with essential information that helps them follow the traffic regulations. Road surface markings include pedestrian crossings, directional arrows, zebra crossings, speed limit signs, and similar signs and text, which are usually painted directly onto the road surface. Road markings fulfill a variety of important functions, such as alerting drivers to potentially hazardous road sections, directing traffic, prohibiting certain actions, and slowing vehicles down. This research paper provides a summary of the Yolov5 algorithm series for road marking sign identification, which includes Yolov5s, Yolov5m, Yolov5n, Yolov5l, and Yolov5x. This study explores a wide range of contemporary object detectors, such as those used to determine the location of road marking signs. Performance metrics track important measures, including the number of BFLOPS, the mean average precision (mAP), the intersection over union (IoU), and the detection time. Our findings show that Yolov5m is the most stable method compared to the other methods, with 76% precision, 86% recall, and 83% mAP during the training stage. Moreover, Yolov5m and Yolov5l achieve the highest scores, with an average mAP of 87% in the testing stage. In addition, we have created a new dataset for road marking signs in Taiwan, called TRMSD.
(This article belongs to the Special Issue Computational Collective Intelligence with Big Data–AI Society)
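For readers who want to try a comparable setup, Yolov5 models can be run through the public ultralytics/yolov5 torch.hub entry point; the weights file and image below are hypothetical placeholders, not the TRMSD-trained models from the paper.

```python
# Minimal sketch of Yolov5m inference for road-marking detection via the
# public torch.hub entry point. The weights path and image are assumed
# placeholders -- a model trained on a dataset such as TRMSD would be
# supplied by the user.
import torch

model = torch.hub.load("ultralytics/yolov5", "custom", path="trmsd_yolov5m.pt")
model.conf = 0.25  # confidence threshold for reported detections

results = model("road_scene.jpg")   # accepts paths, URLs, or arrays
results.print()                     # per-class counts and inference speed
boxes = results.pandas().xyxy[0]    # detections as a DataFrame
print(boxes[["name", "confidence"]])
```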

13 pages, 879 KiB  
Article
Trust-Based Data Communication in Wireless Body Area Network for Healthcare Applications
Big Data Cogn. Comput. 2022, 6(4), 148; https://doi.org/10.3390/bdcc6040148 - 01 Dec 2022
Cited by 1 | Viewed by 1530
Abstract
A subset of Wireless Sensor Networks, Wireless Body Area Networks (WBANs) are an emerging technology. A WBAN is a collection of tiny wireless body sensors with limited computational capability that communicate over short distances using ZigBee or Bluetooth, with applications mainly in the healthcare industry, such as remote patient monitoring. The sensors monitor health factors like body temperature, pulse rate, ECG, and heart rate, and communicate them to the base station or central coordinator for aggregation or data computation. The final data are communicated to remote monitoring devices through the internet or cloud service providers. The main challenges for this technology are energy consumption, secure communication within the network, and the possibility of attacks executed by malicious nodes, which create problems for the network. This paper proposes a suitable trust model for secure communication in WBANs based on node trust and data trust. Node trust is calculated using direct trust calculation and node behaviours; data trust is calculated using consistent data success and data aging. The performance is compared with existing protocols like Trust Evaluation (TE)-WBAN and Body Area Network (BAN)-Trust, which are not cryptographic techniques. The proposed protocol is lightweight, has low overhead, and is rated best for throughput, packet delivery ratio, and minimum delay. In extensive simulations, on-off attacks, selfishness attacks, sleeper attacks, and message suppression attacks were prevented.
(This article belongs to the Special Issue Computational Collective Intelligence with Big Data–AI Society)
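A toy sketch of how node trust and data trust might be combined, in the spirit of the model described above; the blending weights and exponential aging function are assumptions for illustration, not the paper’s equations.

```python
# Toy sketch of combining node trust and data trust. The weighting scheme
# and decay half-life are assumed for illustration only.
import math

def node_trust(direct_trust: float, behaviour_score: float, w: float = 0.6) -> float:
    """Blend direct trust (from past interactions) with observed behaviour."""
    return w * direct_trust + (1 - w) * behaviour_score

def data_trust(success_ratio: float, age_s: float, half_life_s: float = 300.0) -> float:
    """Discount consistent data success by data age (exponential decay)."""
    return success_ratio * math.exp(-age_s * math.log(2) / half_life_s)

def overall_trust(nt: float, dt: float) -> float:
    return 0.5 * (nt + dt)  # assumed equal weighting of the two components

nt = node_trust(direct_trust=0.9, behaviour_score=0.7)
dt = data_trust(success_ratio=0.95, age_s=120)
print(f"node={nt:.2f} data={dt:.2f} overall={overall_trust(nt, dt):.2f}")
```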

19 pages, 6928 KiB  
Article
Image Fundus Classification System for Diabetic Retinopathy Stage Detection Using Hybrid CNN-DELM
Big Data Cogn. Comput. 2022, 6(4), 146; https://doi.org/10.3390/bdcc6040146 - 01 Dec 2022
Cited by 6 | Viewed by 1684
Abstract
Diabetic retinopathy (DR) is the leading cause of blindness in working-age adults. The increase in the population diagnosed with DR can be countered by screening and early treatment of eye damage. This screening process can be conducted by utilizing deep learning techniques. In this study, the detection of DR severity was carried out using the hybrid CNN-DELM method (CDELM). The CNN architectures used were ResNet-18, ResNet-50, ResNet-101, GoogleNet, and DenseNet, and the learned features were further classified using the DELM algorithm. The comparison of CNN architectures aimed to find the best architecture for fundus image feature extraction. This research also compared the effect of the kernel function on the performance of DELM in fundus image classification. All experiments using CDELM showed maximum results, with an accuracy of 100% on the DRIVE data and the two-class MESSIDOR data, while the best accuracy on the four-class MESSIDOR data reached 98.20%. The advantage of the DELM method over the conventional CNN method is its much shorter training time: CNN takes an average of 30 min for training, while the CDELM method takes only an average of 2.5 min. Based on accuracy and training time, the CDELM method performed better than the conventional CNN method.
(This article belongs to the Topic Machine and Deep Learning)
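A minimal sketch of the CNN-DELM idea, pairing a pretrained ResNet-18 feature extractor with a basic extreme-learning-machine classifier trained in closed form; the single-layer ELM here is an assumed simplification of DELM, and the data are random stand-ins for fundus images.

```python
# Minimal sketch of the CNN + ELM hybrid: extract features with a pretrained
# CNN, then fit a fast ELM head. Simplified single-layer ELM, assumed here;
# random tensors stand in for fundus images and DR labels.
import numpy as np
import torch
from torchvision import models

# 1) Pretrained ResNet-18 as a frozen feature extractor (drop the final FC).
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
extractor = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()

with torch.no_grad():
    images = torch.randn(32, 3, 224, 224)         # stand-in image batch
    feats = extractor(images).flatten(1).numpy()  # (32, 512) feature matrix

labels = np.random.randint(0, 2, size=32)         # stand-in class labels

# 2) Basic ELM: random hidden layer + closed-form output weights.
rng = np.random.default_rng(0)
W = rng.standard_normal((feats.shape[1], 256))
H = np.tanh(feats @ W)             # hidden-layer activations
T = np.eye(2)[labels]              # one-hot targets
beta = np.linalg.pinv(H) @ T       # single pseudoinverse = "training"

pred = np.argmax(np.tanh(feats @ W) @ beta, axis=1)
print("train accuracy:", (pred == labels).mean())
```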

20 pages, 6697 KiB  
Article
Image Segmentation for Mitral Regurgitation with Convolutional Neural Network Based on UNet, Resnet, Vnet, FractalNet and SegNet: A Preliminary Study
Big Data Cogn. Comput. 2022, 6(4), 141; https://doi.org/10.3390/bdcc6040141 - 25 Nov 2022
Cited by 3 | Viewed by 1788
Abstract
The heart’s mitral valve separates the left atrium from the left ventricle. Heart valve disease is fairly common, and one type is mitral regurgitation, an abnormality of the mitral valve on the left side of the heart that prevents the valve from closing properly. The Convolutional Neural Network (CNN) is a type of deep learning that is well suited to image analysis. Segmentation is widely used in analyzing medical images because it divides an image into simpler parts, separating the objects to be analyzed (foreground) from those that are not (background), which facilitates the analysis process. This study builds a dataset from the data of patients with mitral regurgitation and patients with normal hearts, and heart valve image analysis is performed by segmenting the images of their mitral valves. Several types of CNN architecture were applied in this research, including the U-Net, SegNet, V-Net, FractalNet, and ResNet architectures. The experimental results show that the best architecture is U-Net3 in terms of Pixel Accuracy (97.59%), Intersection over Union (86.98%), Mean Accuracy (93.46%), Precision (85.60%), Recall (88.39%), and Dice Coefficient (86.58%).
(This article belongs to the Special Issue Advancements in Deep Learning and Deep Federated Learning Models)
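Two of the metrics reported above, computed on binary masks, might be implemented as follows; the masks are random stand-ins for predicted and ground-truth valve segmentations.

```python
# Minimal sketch of two standard segmentation metrics on binary masks.
import numpy as np

def iou(pred: np.ndarray, true: np.ndarray) -> float:
    inter = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    return inter / union if union else 1.0

def dice(pred: np.ndarray, true: np.ndarray) -> float:
    inter = np.logical_and(pred, true).sum()
    total = pred.sum() + true.sum()
    return 2 * inter / total if total else 1.0

rng = np.random.default_rng(1)
pred = rng.random((128, 128)) > 0.5  # stand-in predicted mask
true = rng.random((128, 128)) > 0.5  # stand-in ground-truth mask
print(f"IoU={iou(pred, true):.3f}  Dice={dice(pred, true):.3f}")
```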

13 pages, 367 KiB  
Article
PSO-Driven Feature Selection and Hybrid Ensemble for Network Anomaly Detection
Big Data Cogn. Comput. 2022, 6(4), 137; https://doi.org/10.3390/bdcc6040137 - 13 Nov 2022
Cited by 2 | Viewed by 1632
Abstract
As a system capable of monitoring and evaluating illegitimate network access, an intrusion detection system (IDS) profoundly impacts information security research. Since machine learning techniques constitute the backbone of IDSs, developing an accurate detection mechanism has been challenging. This study aims to enhance the detection performance of IDSs by using a particle swarm optimization (PSO)-driven feature selection approach and a hybrid ensemble. Specifically, the final feature subsets derived from different IDS datasets, i.e., NSL-KDD, UNSW-NB15, and CICIDS-2017, are trained using a hybrid ensemble comprising two well-known ensemble learners: the gradient boosting machine (GBM) and bootstrap aggregation (bagging). Instead of training a GBM with individual ensemble learning, we train GBMs on subsamples of each intrusion dataset and combine the final class predictions using majority voting. Our proposed scheme yields notable improvements over existing baselines, such as TSE-IDS, voting ensembles, weighted majority voting, and other individual ensemble-based IDSs such as LightGBM.
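A minimal sketch of the hybrid ensemble idea, training a GBM per bootstrap subsample and majority-voting the predictions; the synthetic dataset stands in for NSL-KDD/UNSW-NB15/CICIDS-2017, and the PSO feature-selection step is omitted.

```python
# Minimal sketch: bagging of gradient boosting machines with majority voting.
# Synthetic data stands in for the intrusion datasets; PSO feature selection
# is omitted from this sketch.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
models = []
for _ in range(5):  # bagging: one GBM per bootstrap subsample
    idx = rng.choice(len(X_tr), size=len(X_tr), replace=True)
    models.append(GradientBoostingClassifier(random_state=0).fit(X_tr[idx], y_tr[idx]))

# Majority vote across the five GBMs (odd count avoids ties).
votes = np.stack([m.predict(X_te) for m in models])
y_hat = (votes.mean(axis=0) > 0.5).astype(int)
print("accuracy:", (y_hat == y_te).mean())
```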

24 pages, 718 KiB  
Review
An Overview of Data Warehouse and Data Lake in Modern Enterprise Data Management
Big Data Cogn. Comput. 2022, 6(4), 132; https://doi.org/10.3390/bdcc6040132 - 07 Nov 2022
Cited by 10 | Viewed by 21028
Abstract
Data is the lifeblood of any organization. In today’s world, organizations recognize the vital role of data in modern business intelligence systems for making meaningful decisions and staying competitive in the field. Efficient and optimal data analytics provides a competitive edge to an organization’s performance and services. Major organizations generate, collect, and process vast amounts of data, falling under the category of big data. Managing and analyzing the sheer volume and variety of big data is a cumbersome process. At the same time, proper utilization of the vast collection of an organization’s information can generate meaningful insights into business tactics. In this regard, two popular data management systems in the area of big data analytics (i.e., the data warehouse and the data lake) act as platforms to accumulate the big data generated and used by organizations. Although seemingly similar, the two differ in their characteristics and applications. This article presents a detailed overview of the roles of data warehouses and data lakes in modern enterprise data management. We detail the definitions, characteristics, and related work for the respective data management frameworks. Furthermore, we explain the architecture and design considerations of the current state of the art. Finally, we provide a perspective on the challenges and promising research directions for the future.

19 pages, 686 KiB  
Article
THOR: A Hybrid Recommender System for the Personalized Travel Experience
Big Data Cogn. Comput. 2022, 6(4), 131; https://doi.org/10.3390/bdcc6040131 - 04 Nov 2022
Cited by 1 | Viewed by 2797
Abstract
One of travelers’ main challenges is the great effort needed to find and choose the most desirable travel offer(s) among a vast list of non-categorized and non-personalized items. Recommendation systems provide an effective way to solve this problem of information overload. In this work, we design and implement “The Hybrid Offer Ranker” (THOR), a hybrid, personalized recommender system for the transportation domain. THOR assigns every traveler a unique contextual preference model built using solely their personal data, which makes the model sensitive to the user’s choices. This model is used to rank travel offers presented to each user according to their personal preferences. We reduce the recommendation problem to one of binary classification, predicting the probability with which the traveler will buy each available travel offer. Travel offers are ranked according to the computed probabilities, hence according to the user’s personal preference model. Moreover, to tackle the cold start problem for new users, we apply clustering algorithms to identify groups of travelers with similar profiles and build a preference model for each group. To test the system’s performance, we generate a dataset according to carefully designed rules. The results of the experiments show that THOR is capable of learning the contextual preferences of each traveler and ranks offers starting from those with the highest probability of being selected.
(This article belongs to the Special Issue Semantic Web Technology and Recommender Systems)
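A minimal sketch of THOR’s core reduction, assuming a per-user logistic-regression classifier over hypothetical offer features; the real system’s contextual preference model is richer than this.

```python
# Minimal sketch of the reduction to binary classification: a per-user model
# predicts purchase probability, and offers are ranked by that probability.
# Features and data are hypothetical stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Past offers shown to one traveler: [price_eur, duration_h, n_transfers]
X_hist = np.array([[90, 2.0, 0], [40, 6.5, 2], [120, 1.5, 0],
                   [55, 5.0, 1], [35, 8.0, 3], [100, 2.5, 0]])
y_hist = np.array([1, 0, 1, 0, 0, 1])  # 1 = the traveler bought the offer

user_model = LogisticRegression().fit(X_hist, y_hist)  # per-user preference model

# Rank new offers by predicted purchase probability, highest first.
new_offers = np.array([[80, 3.0, 0], [30, 9.0, 2], [110, 1.0, 0]])
probs = user_model.predict_proba(new_offers)[:, 1]
for i in np.argsort(-probs):
    print(f"offer {i}: p(buy) = {probs[i]:.2f}")
```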

29 pages, 620 KiB  
Article
A Space-Time Framework for Sentiment Scope Analysis in Social Media
Big Data Cogn. Comput. 2022, 6(4), 130; https://doi.org/10.3390/bdcc6040130 - 03 Nov 2022
Cited by 18 | Viewed by 2206
Abstract
The concept of scope was introduced in Social Network Analysis to assess the authoritativeness and convincing ability of a user toward other users on one or more social platforms. It has been studied in the past in some specific contexts, for example, to assess the ability of a user to spread information on Twitter. In this paper, we propose a new investigation of scope, as we want to assess the scope of the sentiment of a user on a topic. We also propose a multi-dimensional definition of scope: besides the traditional spatial scope, we introduce the temporal one, which has never been addressed in the literature, and we propose a model that allows the concept of scope to be extended to further dimensions in the future. Furthermore, we propose an approach and a related set of parameters for measuring the scope of the sentiment of a user on a topic in a social network. Finally, we illustrate the results of an experimental campaign conducted to evaluate the proposed framework on a dataset derived from Reddit. The main novelties of this paper are: (i) a multi-dimensional view of scope; (ii) the introduction of the concept of sentiment scope; (iii) the definition of a general framework capable of analyzing the sentiment scope related to any subject on any social network.

22 pages, 601 KiB  
Review
Facial Age Estimation Using Machine Learning Techniques: An Overview
Big Data Cogn. Comput. 2022, 6(4), 128; https://doi.org/10.3390/bdcc6040128 - 26 Oct 2022
Cited by 6 | Viewed by 5782
Abstract
Automatic age estimation from facial images is an exciting machine learning topic that has attracted researchers’ attention over the past several years. Numerous human–computer interaction applications, such as targeted marketing, content access control, and soft-biometrics systems, employ age estimation models to carry out secondary tasks such as user filtering or identification. Despite the vast array of applications that could benefit from automatic age estimation, building such a system comes with issues such as data disparity, the unique ageing pattern of each individual, and facial photo quality. This paper provides a survey of the standard methods for building automatic age estimation models, the benchmark datasets for building these models, and some of the latest literature introducing new age estimation methods. We then present and discuss the standard evaluation metrics used to assess age estimation models. In addition to the survey, we discuss the gaps identified in the reviewed literature and present recommendations for future research.

23 pages, 598 KiB  
Review
Applications and Challenges of Federated Learning Paradigm in the Big Data Era with Special Emphasis on COVID-19
Big Data Cogn. Comput. 2022, 6(4), 127; https://doi.org/10.3390/bdcc6040127 - 26 Oct 2022
Cited by 1 | Viewed by 2931
Abstract
Federated learning (FL) is one of the leading paradigms of modern times, with higher privacy guarantees than any other digital solution. Since its inception in 2016, FL has been rigorously investigated from multiple perspectives, including its applications in different sectors, communication overheads, statistical heterogeneity problems, client dropout issues, the legitimacy of FL system results, and privacy preservation. Recently, FL has been increasingly used in the medical domain for multiple purposes, and many successful applications exist that serve mankind in various ways. In this work, we describe the novel applications and challenges of the FL paradigm with special emphasis on the COVID-19 pandemic. We describe the synergies of FL with other emerging technologies in delivering multiple services to fight the COVID-19 pandemic. We analyze recent open-source developments in FL that can help in designing scalable and reliable FL models. Lastly, we offer valuable recommendations to enhance the technical persuasiveness of the FL paradigm. To the best of the authors’ knowledge, this is the first work that highlights the efficacy of FL in the era of COVID-19. The analysis in this article can pave the way for understanding the technical efficacy of FL in the medical field, specifically COVID-19.
(This article belongs to the Special Issue Cyber Security in Big Data Era)

20 pages, 3064 KiB  
Article
Explaining Intrusion Detection-Based Convolutional Neural Networks Using Shapley Additive Explanations (SHAP)
Big Data Cogn. Comput. 2022, 6(4), 126; https://doi.org/10.3390/bdcc6040126 - 25 Oct 2022
Cited by 8 | Viewed by 2130
Abstract
Artificial intelligence (AI) and machine learning (ML) models have become essential tools used in many critical systems to make significant decisions, and the decisions taken by these models often need to be trusted and explained. At the same time, the performance of different ML and AI models varies even on the same dataset. Developers have often tried multiple models before deciding which one to use, without understanding the reasons behind this variance in performance. Explainable artificial intelligence (XAI) models explain a model’s performance by highlighting the features that the model considered necessary while making its decision. This work presents an analytical approach to studying the density functions of intrusion detection dataset features. The study explains how and why these features are essential during the XAI process. We aim, in this study, to explain XAI behavior in order to add an extra layer of explainability. The density function analysis presented in this paper adds a deeper understanding of the importance of features in different AI models. Specifically, we present a method to explain the results of SHAP (Shapley additive explanations) for different machine learning models based on KDE (kernel density estimation) plots of the feature data. We also survey the specifications of dataset features that perform better for convolutional neural network (CNN)-based models.
(This article belongs to the Special Issue Machine Learning for Dependable Edge Computing Systems and Services)
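A minimal sketch combining the two ingredients named above, SHAP values for a tree model plus a KDE over one feature; it assumes the shap package is installed and uses synthetic data in place of an intrusion dataset.

```python
# Minimal sketch: SHAP attributions for a tree model plus a KDE of one
# feature's distribution. Assumes the `shap` package; synthetic data stands
# in for an intrusion detection dataset.
import numpy as np
import shap
from scipy.stats import gaussian_kde
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# SHAP feature attributions for the tree model.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
print("SHAP values computed for", len(X), "samples")

# KDE of feature 0, the kind of density curve used to interpret SHAP output.
kde = gaussian_kde(X[:, 0])
grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 5)
print("density at grid points:", np.round(kde(grid), 3))
```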

15 pages, 1058 KiB  
Article
White Blood Cell Classification Using Multi-Attention Data Augmentation and Regularization
Big Data Cogn. Comput. 2022, 6(4), 122; https://doi.org/10.3390/bdcc6040122 - 21 Oct 2022
Cited by 7 | Viewed by 3487
Abstract
Accurate and robust assessment of the human immune system through white blood cell evaluation requires computer-aided tools with pathologist-level accuracy. This work presents a multi-attention leukocyte subtype classification method that leverages the fine-grained and spatial-locality attributes of white blood cells. The proposed framework comprises three main components: texture-aware/attention map generation blocks, attention regularization, and attention-based data augmentation. The framework is applicable to general CNN-based architectures and enhances decision making by paying specific attention to the discriminative regions of a white blood cell. The performance of the proposed method was evaluated through an extensive set of experiments and validation. The obtained results demonstrate the superior performance of the model, which achieves 99.69% accuracy compared to other state-of-the-art approaches. The proposed model is a good alternative and complement to existing computer-aided diagnosis tools for assisting pathologists in evaluating white blood cells from blood smear images.
(This article belongs to the Special Issue Data Science in Health Care)

24 pages, 1286 KiB  
Article
Ontology-Based Personalized Job Recommendation Framework for Migrants and Refugees
Big Data Cogn. Comput. 2022, 6(4), 120; https://doi.org/10.3390/bdcc6040120 - 19 Oct 2022
Cited by 5 | Viewed by 2102
Abstract
Participation in the labor market is seen as the most important factor favoring the long-term integration of migrants and refugees into society. This paper describes the job recommendation framework of the Integration of Migrants MatchER SErvice (IMMERSE). The proposed framework acts as a matching tool that captures the contexts of individual migrants and refugees, including their expectations, languages, educational background, previous job experience, and skills, in the ontology and facilitates their matching with the job opportunities available in their host country. Profile information and job listings are processed in real time in the back-end, and matches are revealed in the front-end. Moreover, the matching tool considers the activity of users on the platform to provide recommendations based on the similarity between existing jobs they have already shown interest in and new jobs posted on the platform. Finally, the framework takes into account the location of the users to rank the results and shows only the most relevant location-based recommendations.
(This article belongs to the Special Issue Semantic Web Technology and Recommender Systems)
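As a toy stand-in for the matching step (the actual framework matches through an ontology, not raw text), profiles and job listings can be compared with TF-IDF cosine similarity; all texts below are hypothetical.

```python
# Toy sketch: rank job listings against a candidate profile by TF-IDF cosine
# similarity. Illustrative only -- the paper's framework matches via an
# ontology, not raw text vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

profile = "carpenter woodworking furniture assembly basic german"
jobs = [
    "experienced carpenter for furniture workshop, german helpful",
    "software developer python backend",
    "warehouse worker assembly line, no language requirement",
]

vec = TfidfVectorizer()
matrix = vec.fit_transform([profile] + jobs)
scores = cosine_similarity(matrix[0], matrix[1:]).ravel()

for rank, j in enumerate(scores.argsort()[::-1], start=1):
    print(f"{rank}. score={scores[j]:.2f}  {jobs[j]}")
```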

14 pages, 307 KiB  
Article
A Survey on Medical Image Segmentation Based on Deep Learning Techniques
Big Data Cogn. Comput. 2022, 6(4), 117; https://doi.org/10.3390/bdcc6040117 - 17 Oct 2022
Cited by 12 | Viewed by 3675
Abstract
Deep learning techniques have rapidly become a preferred method for medical image segmentation. This survey analyses different contributions in the deep learning medical field, including the major common issues published in recent years, and also discusses the fundamentals of deep learning concepts applicable to medical image segmentation. Deep learning can be applied to image categorization, object recognition, segmentation, registration, and other tasks. First, the basic ideas of deep learning techniques, applications, and frameworks are introduced, and the deep learning techniques that best fit these applications are briefly explained. This paper then surveys prior experience with different techniques for medical image segmentation. Deep learning has been designed to describe and respond to various challenges in the field of medical image analysis, such as low accuracy of image classification, low segmentation resolution, and poor image enhancement. Aiming to solve these issues and improve the evolution of medical image segmentation, we provide suggestions for future research.
(This article belongs to the Special Issue Computational Collective Intelligence with Big Data–AI Society)

40 pages, 4281 KiB  
Article
A Probabilistic Data Fusion Modeling Approach for Extracting True Values from Uncertain and Conflicting Attributes
Big Data Cogn. Comput. 2022, 6(4), 114; https://doi.org/10.3390/bdcc6040114 - 13 Oct 2022
Cited by 1 | Viewed by 1807
Abstract
Real-world data obtained from integrating heterogeneous data sources are often multi-valued, uncertain, imprecise, error-prone, outdated, and of varying degrees of accuracy and correctness. It is critical to resolve data uncertainty and conflicts to present quality data that reflect actual-world values. This task is called data fusion. In this paper, we deal with the problem of data fusion based on probabilistic entity linkage and uncertainty management in conflicting data. Data fusion has been widely explored in the research community. However, concerns such as explicit uncertainty management and on-demand data fusion, which can cope with dynamic data sources, have not been studied well. This paper proposes a new probabilistic data fusion modeling approach that attempts to find true data values under conditions of uncertain or conflicting multi-valued attributes. These attributes are generated from the probabilistic linkage and merging alternatives of multiple corresponding entities. Consequently, the paper identifies and formulates several data fusion cases and sample spaces that require further conditional computation using our computational fusion method. The identification is designed to fit a real-world data fusion problem in which there may be heterogeneous data sources, integration of probabilistic entities, single or multiple truth values for certain attributes, and different combinations of attribute values as alternatives for each generated entity. We validate our probabilistic data fusion approach through a mathematical representation based on three data sources with different reliability scores. The validity of the approach was assessed via implementation in our probabilistic integration system to show how it can manage and resolve different cases of data conflicts and inconsistencies. The outcome showed improved accuracy in identifying true values due to the association of constructive evidence.
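A minimal sketch of reliability-weighted truth selection, one simple instance of the general idea; the sources, reliability scores, and values are hypothetical, and the paper’s probabilistic model is considerably richer.

```python
# Minimal sketch of reliability-weighted truth selection: each source votes
# for an attribute value with its reliability score, and the value with the
# highest total weight is taken as true. Scores and values are hypothetical.
from collections import defaultdict

# Three sources with different reliability scores report a customer's city.
reports = [
    ("source_A", 0.9, "Berlin"),
    ("source_B", 0.6, "Berlin"),
    ("source_C", 0.4, "Munich"),
]

weights = defaultdict(float)
for _, reliability, value in reports:
    weights[value] += reliability

total = sum(weights.values())
best = max(weights, key=weights.get)
print(f"fused value: {best} (confidence {weights[best] / total:.2f})")
```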

44 pages, 5439 KiB  
Article
Graph-Based Conversation Analysis in Social Media
Big Data Cogn. Comput. 2022, 6(4), 113; https://doi.org/10.3390/bdcc6040113 - 12 Oct 2022
Cited by 3 | Viewed by 3544
Abstract
Social media platforms offer their audience the possibility to reply to posts through comments and reactions. This allows social media users to express their ideas and opinions on shared content, thus opening virtual discussions. Most studies on social networks have focused only on user relationships or on the shared content, while ignoring the valuable information hidden in digital conversations, namely the structure of the discussion and the relations between contents, which are essential for understanding online communication behavior. This work proposes a graph-based framework to assess the shape and structure of online conversations. The analysis comprises two main stages: intent analysis and network generation. Users' intentions were detected using keyword-based classification, followed by machine-learning-based classification for uncategorized comments; a human-in-the-loop step was then involved in improving the keyword-based classification. To extract essential information on communication patterns among users, we built conversation graphs using a directed multigraph network, and we show our model at work in two real-life experiments. The first experiment, using data from a real social media challenge, categorized 90% of comments with 98% accuracy. The second experiment focused on COVID vaccine-related discussions in online forums and investigated stance and sentiment to understand how comments are affected by their parent discussion. Finally, the most popular online discussion patterns were mined and interpreted. We see that the dynamics obtained from conversation graphs are similar to those of traditional communication activities. Full article
(This article belongs to the Special Issue Graph-Based Data Mining and Social Network Analysis)
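A minimal sketch of the conversation-graph idea described above, using networkx (an assumption; the paper does not name its tooling): comments are nodes in a directed multigraph, edges point from a reply to its parent and carry an intent label, and simple structural descriptors summarize the discussion shape. All node names and intents are invented.

```python
import networkx as nx

# Invented post/comment thread: each edge points from a reply to its parent.
G = nx.MultiDiGraph()
G.add_node("post", kind="source")
G.add_edge("c1", "post", intent="question")
G.add_edge("c2", "post", intent="agreement")
G.add_edge("c3", "c1", intent="answer")
G.add_edge("c4", "c1", intent="disagreement")

# Simple structural descriptors of the discussion shape.
depth = nx.dag_longest_path_length(G)       # longest reply chain
fan_in = max(d for _, d in G.in_degree())   # replies to the most-discussed node
print(f"conversation depth: {depth}, max replies to one node: {fan_in}")
```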

33 pages, 2658 KiB  
Article
Question Answer System: A State-of-Art Representation of Quantitative and Qualitative Analysis
Big Data Cogn. Comput. 2022, 6(4), 109; https://doi.org/10.3390/bdcc6040109 - 07 Oct 2022
Cited by 6 | Viewed by 4564
Abstract
A Question Answer System (QAS) automatically answers questions asked in natural language. Due to the varying dimensions and approaches that are available, QAS has a very diverse solution space, and a proper bibliometric study is required to map the entire domain. This work presents a bibliometric and literature analysis of QAS, drawing on Scopus and Web of Science, two well-known research databases. A systematic analytical study comprising performance analysis and science mapping is performed. In the performance analysis, recent research trends, seminal works, and influential authors are identified using statistical tools on research constituents. Science mapping, in turn, is performed through network analysis on citation and co-citation graphs, revealing the domain's conceptual evolution and intellectual structure. We divide the literature into four important architecture types and provide a literature analysis of Knowledge Base (KB)-based and Graph Neural Network (GNN)-based approaches to QAS. Full article

13 pages, 2235 KiB  
Article
Deep Learning-Based Computer-Aided Classification of Amniotic Fluid Using Ultrasound Images from Saudi Arabia
Big Data Cogn. Comput. 2022, 6(4), 107; https://doi.org/10.3390/bdcc6040107 - 03 Oct 2022
Cited by 2 | Viewed by 1911
Abstract
Amniotic Fluid (AF) is a protective liquid surrounding the fetus inside the amniotic sac; it serves multiple purposes and hence is a key indicator of fetal health. Determining AF levels at an early stage helps to ascertain lung maturation and gastrointestinal development. Low AF entails the risk of premature birth and perinatal mortality, and thereby admission to the intensive care unit (ICU); the AF level is also a critical factor in determining early deliveries. Hence, AF detection is a vital measurement during early ultrasound (US), and its automation is essential. AF detection is usually time-consuming because it is patient specific, and its accuracy is prone to errors as it heavily depends on the sonographer's experience. Automating this process with robust, precise, and effective detection methods would therefore benefit the healthcare community. In this paper, we utilized transfer learning models to classify AF levels as normal or abnormal using US images. The dataset consisted of 166 US images of pregnant women and was preprocessed before training. Five transfer learning models, namely Xception, DenseNet, InceptionResNet, MobileNet, and ResNet, were applied. MobileNet achieved the best overall accuracy of 0.94. Overall, the study shows that effective models relying on transfer learning can successfully classify AF levels and thereby aid sonographers in evaluating fetal health. Full article
(This article belongs to the Special Issue Data Science in Health Care)
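Since MobileNet-based transfer learning is the best performer reported above, here is a minimal Keras sketch of that general recipe: a frozen ImageNet backbone with a new binary head for normal/abnormal AF. The input size, dropout rate, and training settings are assumptions, not the paper's configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Frozen ImageNet backbone plus a new binary head (normal vs. abnormal AF).
base = tf.keras.applications.MobileNet(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # transfer learning: reuse the pretrained features

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),                    # assumed regularization
    layers.Dense(1, activation="sigmoid"),  # P(abnormal)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # image datasets not shown
```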

17 pages, 935 KiB  
Article
Supporting Meteorologists in Data Analysis through Knowledge-Based Recommendations
Big Data Cogn. Comput. 2022, 6(4), 103; https://doi.org/10.3390/bdcc6040103 - 28 Sep 2022
Cited by 2 | Viewed by 1625
Abstract
Climate change means coping directly or indirectly with extreme weather conditions for everybody. Therefore, analyzing meteorological data to create precise models is gaining importance and might become inevitable. Meteorologists have extensive domain knowledge about meteorological data yet often lack practical data analysis skills. This paper presents a method to bridge this gap by empowering the data knowledge carriers to analyze the data themselves. The proposed system utilizes symbolic AI, a knowledge base created by experts, and a recommendation expert system to offer suitable data analysis methods or data pre-processing steps to meteorologists. This paper systematically analyzes the target user group of meteorologists and practical use cases to arrive at a conceptual and technical system design implemented in the CAMeRI prototype. The concepts in this paper are aligned with the AI2VIS4BigData Reference Model and comprise a novel first-order logic knowledge base that represents analysis methods and related pre-processing steps. The prototype implementation was qualitatively and quantitatively evaluated, including recommendation validation on real-world data, a cognitive walkthrough, and measurements of the computation timings of the different system components. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
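To make the recommendation idea tangible, here is a toy rule base in Python that maps observed dataset properties to candidate analysis methods. It only mimics the flavor of a first-order logic knowledge base; the rules, profile fields, and method names are all invented and much simpler than the CAMeRI design.

```python
# Invented rule base: each rule maps observed data properties to a
# recommended analysis method, loosely mimicking a logic knowledge base.
RULES = [
    (lambda d: d["temporal"] and d["n_samples"] > 1000, "LSTM forecasting"),
    (lambda d: d["temporal"],                           "ARIMA forecasting"),
    (lambda d: d["has_missing"],                        "imputation pre-processing"),
    (lambda d: True,                                    "exploratory statistics"),
]

def recommend(dataset_profile: dict) -> list:
    """Return every method whose precondition holds for the profile."""
    return [method for cond, method in RULES if cond(dataset_profile)]

profile = {"temporal": True, "n_samples": 52_000, "has_missing": True}
print(recommend(profile))
```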

42 pages, 4691 KiB  
Article
An Improved African Vulture Optimization Algorithm for Feature Selection Problems and Its Application of Sentiment Analysis on Movie Reviews
Big Data Cogn. Comput. 2022, 6(4), 104; https://doi.org/10.3390/bdcc6040104 - 28 Sep 2022
Cited by 10 | Viewed by 2479
Abstract
The African Vulture Optimization Algorithm (AVOA) is inspired by African vultures' feeding and orienting behaviors. It comprises powerful operators while maintaining a balance between exploration and exploitation in solving optimization problems, but it must be discretized before it can be applied to discrete problems. This paper introduces two binary versions of AVOA, called BAOVAH, based on S-shaped and V-shaped transfer functions, while avoiding any increase in computational complexity; a disruption operator and a Bitwise strategy are also used to maximize the model's performance. In addition, a multi-strategy version of AVOA called BAVOA-v1 is presented, in which strategies such as IPRS, a mutation neighborhood search strategy (MNSS, balancing exploration and exploitation), multi-parent crossover (increasing exploitation), and the Bitwise strategy (increasing diversity and exploration) are used to provide solutions with greater variety and to assure solution quality. The proposed methods are evaluated on 30 UCI datasets of different dimensions. The simulation results show that the proposed BAOVAH algorithm outperforms other binary meta-heuristic algorithms: it is the most accurate on 67% of the datasets and achieves the best fitness values on 93% of them, demonstrating high performance in feature selection. Finally, in a case study on the sentiment analysis of movie reviews, the proposed method was used to determine the number of neurons and the activation function of the designed CNNEM model to improve deep learning results. Experiments on three sentiment analysis datasets, IMDB, Amazon, and Yelp, show that the BAOVAH algorithm increases the accuracy of the CNNEM network by 6% on IMDB, 33% on Amazon, and 30% on Yelp. Full article
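The binarization step mentioned above (a transfer function turning a continuous search position into a feature mask) can be sketched in a few lines. This shows the widely used V-shaped function |tanh(x)| with the simpler "set bit with probability V(x)" variant, assuming a 10-feature problem; it illustrates the general technique, not the paper's exact operators.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def v_transfer(x: np.ndarray) -> np.ndarray:
    """A common V-shaped transfer function, |tanh(x)|, mapping positions to [0, 1]."""
    return np.abs(np.tanh(x))

def binarize(position: np.ndarray) -> np.ndarray:
    """Set each feature bit to 1 with probability given by the transfer function.
    (Other variants flip the previous bit with this probability instead.)"""
    return (rng.random(position.shape) < v_transfer(position)).astype(int)

continuous_position = rng.normal(size=10)     # one vulture's position, 10 features
feature_mask = binarize(continuous_position)  # 1 = feature selected
print(feature_mask)
```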

20 pages, 3926 KiB  
Article
An Efficient and Secure Big Data Storage in Cloud Environment by Using Triple Data Encryption Standard
Big Data Cogn. Comput. 2022, 6(4), 101; https://doi.org/10.3390/bdcc6040101 - 26 Sep 2022
Cited by 15 | Viewed by 3386
Abstract
In recent decades, big data analysis has become one of the most important research topics. Big data security involves securing and monitoring the Cloud applications that host highly sensitive data. However, the privacy and security of big data have become an emerging issue that restricts organizations from utilizing Cloud services. Existing privacy-preserving approaches have shown several drawbacks, such as a lack of data privacy and accurate data analysis, inefficient performance, and complete reliance on third parties. To overcome these issues, the Triple Data Encryption Standard (TDES) methodology is proposed to provide security for big data in the Cloud environment. The proposed TDES methodology provides a relatively simple technique that increases the key size of the Data Encryption Standard (DES) to protect against attacks and defend data privacy. The experimental results showed that the proposed TDES method is effective in providing security and privacy for big healthcare data in the Cloud environment, with less encryption and decryption time than the existing Intelligent Framework for Healthcare Data Security (IFHDS) method. Full article
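As an illustration of the underlying primitive, here is a minimal Triple DES round trip using the pycryptodome library (an assumption; the paper does not specify an implementation). The patient record is fabricated, and a real deployment would also need key management, which this sketch omits.

```python
from Crypto.Cipher import DES3
from Crypto.Random import get_random_bytes
from Crypto.Util.Padding import pad, unpad

key = DES3.adjust_key_parity(get_random_bytes(24))  # three 8-byte DES keys
iv = get_random_bytes(8)

record = b'{"patient_id": 42, "diagnosis": "hypothetical"}'
cipher = DES3.new(key, DES3.MODE_CBC, iv)
ciphertext = cipher.encrypt(pad(record, DES3.block_size))

decipher = DES3.new(key, DES3.MODE_CBC, iv)
assert unpad(decipher.decrypt(ciphertext), DES3.block_size) == record
print(f"{len(record)} plaintext bytes -> {len(ciphertext)} ciphertext bytes")
```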

23 pages, 6422 KiB  
Article
Triggers and Tweets: Implicit Aspect-Based Sentiment and Emotion Analysis of Community Chatter Relevant to Education Post-COVID-19
Big Data Cogn. Comput. 2022, 6(3), 99; https://doi.org/10.3390/bdcc6030099 - 16 Sep 2022
Cited by 5 | Viewed by 2671
Abstract
This research proposes a well-being analytical framework using social media chatter data. The proposed framework infers analytics and provides insights into the public's well-being relevant to education during and after the COVID-19 pandemic through a comprehensive Emotion and Aspect-based Sentiment Analysis (ABSA). Moreover, this research examines the variability in the emotions of students, parents, and faculty toward the e-learning process over time and across different locations. The proposed framework curates Twitter chatter data relevant to the education sector, identifies tweets carrying sentiment, and then identifies the exact emotion and the emotional triggers associated with those feelings through implicit ABSA. The produced analytics are then factored by location and time to provide more comprehensive insights that aim to assist decision-makers and personnel in the educational sector in enhancing and adapting the educational process during and after the pandemic. The experimental results for emotion classification show that the Linear Support Vector Classifier (SVC) outperformed the other classifiers, with overall accuracy, precision, recall, and F-measure of 91%. For aspect classification, the Logistic Regression classifier outperformed all others, with overall accuracy, recall, and F-measure of 81% and precision of 83%. In online experiments using UAE COVID-19 education-related data, the analytics showed high relevance to the public concerns around the education process reported during the experiment's timeframe. Full article
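The winning emotion classifier above is a Linear SVC over tweet text. A minimal scikit-learn sketch of that generic pipeline follows; the four toy tweets, labels, and vectorizer settings are invented and are not the paper's curated data or tuned configuration.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Invented labeled tweets standing in for the curated education chatter.
tweets = [
    "So grateful for how supportive my professors were this semester",
    "Online exams are stressing me out, nothing works",
    "Proud of my kids finishing the school year remotely",
    "Frustrated with the unstable platform during lectures",
]
emotions = ["joy", "fear", "joy", "anger"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("svc", LinearSVC()),
])
clf.fit(tweets, emotions)
print(clf.predict(["The exam portal crashed again, I am so annoyed"]))
```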

15 pages, 364 KiB  
Article
Machine Learning Techniques for Chronic Kidney Disease Risk Prediction
Big Data Cogn. Comput. 2022, 6(3), 98; https://doi.org/10.3390/bdcc6030098 - 14 Sep 2022
Cited by 20 | Viewed by 4857
Abstract
Chronic kidney disease (CKD) is a condition characterized by progressive loss of kidney function over time. It describes a clinical entity that causes kidney damage and affects the general health of the human body. Improper diagnosis and treatment of the disease can eventually lead to end-stage renal disease and, ultimately, the patient's death. Machine Learning (ML) techniques have acquired an important role in disease prediction and are a useful tool in the field of medical science. In the present research work, we aim to build efficient tools for predicting CKD occurrence, following an approach that exploits ML techniques. More specifically, we first apply class balancing to tackle the non-uniform distribution of instances between the two classes, then perform feature ranking and analysis, and finally train and evaluate several ML models based on various performance metrics. The derived results highlighted the Rotation Forest (RotF), which prevailed over the compared models with an Area Under the Curve (AUC) of 100% and Precision, Recall, F-Measure, and Accuracy all equal to 99.2%. Full article
(This article belongs to the Special Issue Digital Health and Data Analytics in Public Health)
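The pipeline described above (class balancing, then model training and evaluation) can be sketched generically with scikit-learn and imbalanced-learn. Rotation Forest has no stock scikit-learn implementation, so a RandomForest stands in here, and the data are synthetic; this illustrates the workflow, not the paper's exact setup.

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic imbalanced data standing in for the CKD records.
X, y = make_classification(n_samples=500, weights=[0.85], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Balance only the training split, then fit and score.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
model = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"AUC: {auc:.3f}")
```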

21 pages, 6061 KiB  
Article
Improving Real Estate Rental Estimations with Visual Data
Big Data Cogn. Comput. 2022, 6(3), 96; https://doi.org/10.3390/bdcc6030096 - 09 Sep 2022
Cited by 3 | Viewed by 2370
Abstract
Multi-modal data are widely available for online real estate listings. Announcements can contain various forms of data, including visual data and unstructured textual descriptions. Nonetheless, many traditional real estate pricing models rely solely on well-structured tabular features. This work investigates whether it is possible to improve the performance of the pricing model using additional unstructured data, namely images of the property and satellite images. We compare four models based on the type of input data they use: (1) tabular data only, (2) tabular data and property images, (3) tabular data and satellite images, and (4) tabular data and a combination of property and satellite images. In a supervised context, the branches of dedicated neural networks for each data type are fused (concatenated) to predict log rental prices. The novel dataset devised for the study (SRED) consists of 11,105 flat rentals advertised over the internet in Switzerland. The results reveal that using all three sources of data generally outperforms machine learning models built only on tabular information. The findings pave the way for further research on integrating other non-structured inputs, for instance, the textual descriptions of properties. Full article
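A minimal Keras sketch of the late-fusion architecture described above: one branch for tabular features, one CNN branch for property images, concatenated to regress the log rental price. The input sizes, backbone choice, and layer widths are assumptions, and the satellite branch (omitted here) would be added the same way.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Tabular branch; the 12 structured features are hypothetical (rooms, area, ...).
tab_in = layers.Input(shape=(12,), name="tabular")
tab = layers.Dense(32, activation="relu")(tab_in)

# Image branch for property photos, with a small frozen CNN backbone.
img_in = layers.Input(shape=(128, 128, 3), name="property_image")
backbone = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, input_shape=(128, 128, 3))
backbone.trainable = False
img = layers.GlobalAveragePooling2D()(backbone(img_in))

# Late fusion by concatenation, regressing the log rental price.
fused = layers.Concatenate()([tab, img])
hidden = layers.Dense(64, activation="relu")(fused)
out = layers.Dense(1, name="log_price")(hidden)

model = Model(inputs=[tab_in, img_in], outputs=out)
model.compile(optimizer="adam", loss="mse")
model.summary()
```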

21 pages, 7858 KiB  
Article
Multimodal Emotional Classification Based on Meaningful Learning
Big Data Cogn. Comput. 2022, 6(3), 95; https://doi.org/10.3390/bdcc6030095 - 08 Sep 2022
Cited by 2 | Viewed by 2189
Abstract
Emotion recognition has become one of the most researched subjects in the scientific community, especially in the human–computer interface field. Decades of scientific research have been conducted on unimodal emotion analysis, whereas recent contributions concentrate on multimodal emotion recognition. These efforts have achieved great success in terms of accuracy in diverse areas of Deep Learning applications. To achieve better performance for multimodal emotion recognition systems, we exploit the effectiveness of the Meaningful Neural Network to enable emotion prediction during a conversation. Using the text and audio modalities, we propose Deep Learning-based feature extraction methods; a bimodal modality is then created by fusing the text and audio features. The feature vectors from these three modalities are fed to a Meaningful Neural Network to learn each characteristic separately; its architecture consists of a set of neurons for each component of the input vector before combining them all together in the last layer. Our model was evaluated on MELD, a multimodal and multiparty dataset for emotion recognition in conversation. The proposed approach reached an accuracy of 86.69%, which significantly outperforms all current multimodal systems. To sum up, several evaluation techniques applied to our work demonstrate the robustness and superiority of our model over other state-of-the-art MELD models. Full article

14 pages, 421 KiB  
Article
Hierarchical Co-Attention Selection Network for Interpretable Fake News Detection
Big Data Cogn. Comput. 2022, 6(3), 93; https://doi.org/10.3390/bdcc6030093 - 05 Sep 2022
Viewed by 3069
Abstract
Social media fake news has become a pervasive and problematic issue with the development of the internet. Recent studies have utilized different artificial intelligence technologies to verify the truth of news and provide explanations for the results, showing remarkable success in interpretable fake news detection. However, individuals' judgments of news are usually hierarchical, prioritizing valuable words above essential sentences, which existing fake news detection models neglect. In this paper, we propose a novel interpretable neural-network-based model, the hierarchical co-attention selection network (HCSN), to predict whether a source post is fake and to produce an explanation that emphasizes important comments and particular words. The key insight of the HCSN model is to incorporate the Gumbel–Max trick in the hierarchical co-attention selection mechanism, which captures sentence-level and word-level information from the source post and comments following the sequence of words–sentences–words–event. In addition, HCSN enjoys the benefit of interpretability: it provides a conscious explanation of how it reaches certain results by selecting comments and highlighting words. In experiments conducted on real-world datasets, our model outperformed state-of-the-art methods and generated reasonable explanations. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
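The Gumbel–Max trick at the heart of HCSN's selection mechanism is easy to demonstrate in isolation: adding Gumbel noise to scores and taking the argmax draws an exact sample from the corresponding softmax distribution. The sketch below verifies this empirically with invented comment scores; it shows the trick itself, not the HCSN architecture.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def gumbel_max_sample(logits: np.ndarray) -> int:
    """Draw one index from softmax(logits) by perturbing with Gumbel(0, 1) noise."""
    return int(np.argmax(logits + rng.gumbel(size=logits.shape)))

# Invented attention scores over five comments on a source post.
scores = np.array([2.1, 0.3, -1.0, 1.7, 0.0])
draws = [gumbel_max_sample(scores) for _ in range(10_000)]
empirical = np.bincount(draws, minlength=scores.size) / len(draws)

softmax = np.exp(scores) / np.exp(scores).sum()
print(np.round(empirical, 3))  # closely matches the softmax probabilities
print(np.round(softmax, 3))
```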

19 pages, 705 KiB  
Article
PRIVAFRAME: A Frame-Based Knowledge Graph for Sensitive Personal Data
Big Data Cogn. Comput. 2022, 6(3), 90; https://doi.org/10.3390/bdcc6030090 - 26 Aug 2022
Cited by 2 | Viewed by 2200
Abstract
The pervasiveness of dialogue systems and virtual conversation applications raises an important theme: the potential of sharing sensitive information, and the consequent need for protection. To guarantee the subject's right to privacy and avoid the leakage of private content, it is important to treat sensitive information. However, any treatment first requires identifying sensitive text, along with appropriate techniques to do so automatically. The Sensitive Information Detection (SID) task has been explored in the literature in different domains and languages, but there is no common benchmark. Current approaches are mostly based on artificial neural networks (ANNs) or transformers built on them. Our research focuses on identifying categories of personal data in informal English sentences by adopting a new logical-symbolic approach, and eventually hybridizing it with ANN models. We present a frame-based knowledge graph built for the personal data categories defined in the Data Privacy Vocabulary (DPV). The knowledge graph is designed through the logical composition of already existing frames and has been evaluated as background knowledge for a SID system against a labeled sensitive information dataset. The accuracy of PRIVAFRAME reached 78%; by comparison, a transformer-based model achieved 12% lower performance on the same dataset. The top-down logical-symbolic frame-based model allows a granular analysis and does not require a training dataset. These advantages lead us to use it as a layer in a hybrid model, where the logical SID is combined with an ANN-based SID tested in a previous study by the authors. Full article
(This article belongs to the Special Issue Artificial Intelligence for Online Safety)

17 pages, 715 KiB  
Article
Argumentation-Based Query Answering under Uncertainty with Application to Cybersecurity
Big Data Cogn. Comput. 2022, 6(3), 91; https://doi.org/10.3390/bdcc6030091 - 26 Aug 2022
Cited by 3 | Viewed by 1766
Abstract
Decision support tools are key components of intelligent sociotechnical systems, and their successful implementation faces a variety of challenges, including the multiplicity of information sources, heterogeneous formats, and constant changes. Handling such challenges requires the ability to analyze and process inconsistent and incomplete information with varying degrees of associated uncertainty. Moreover, some domains require the system's outputs to be explainable and interpretable; an example of this is cyberthreat analysis (CTA) in cybersecurity domains. In this paper, we first present the P-DAQAP system, an extension of a recently developed query-answering platform based on defeasible logic programming (DeLP) that incorporates a probabilistic model and focuses on delivering these capabilities. After discussing the details of its design and implementation, and describing how it can be applied in a CTA use case, we report on the results of an empirical evaluation designed to explore the effectiveness and efficiency of a possible-worlds sampling-based approximate query answering approach that addresses the intractability of exact computations. Full article

19 pages, 33832 KiB  
Article
Large-Scale Oil Palm Trees Detection from High-Resolution Remote Sensing Images Using Deep Learning
Big Data Cogn. Comput. 2022, 6(3), 89; https://doi.org/10.3390/bdcc6030089 - 24 Aug 2022
Cited by 8 | Viewed by 3690
Abstract
Tree counting is an important plantation practice for biological asset inventories and related purposes. Precision agriculture can be applied to counting oil palm trees by detecting them in aerial imagery. This research uses a deep learning approach, applying YOLOv3, YOLOv4, and YOLOv5m to detect oil palm trees. The dataset consists of drone images of an oil palm plantation acquired using a fixed-wing VTOL drone at a resolution of 5 cm/pixel, covering an area of 730 ha annotated with a single oil palm class totaling 56,614 labels. The test dataset covers an area of 180 ha with flat and hilly terrain, sparse, dense, and overlapping canopies, and oil palm trees intermixed with other vegetation. Model testing on images from 24 regions, each covering 12 ha with up to 1000 trees (17,343 oil palm trees in total), yielded F1-scores of 97.28%, 97.74%, and 94.94%, with average detection times of 43 s, 45 s, and 21 s for the models trained with YOLOv3, YOLOv4, and YOLOv5m, respectively. These results show that the method is sufficiently accurate and efficient in detecting oil palm trees and has the potential to be implemented in commercial applications for plantation companies. Full article
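For readers who want to try the general approach, here is a minimal YOLOv5 inference sketch via torch.hub; the tile filename and the fine-tuned weights file are hypothetical, and the actual study trained on its own labeled drone imagery.

```python
import torch

# Pretrained YOLOv5m from torch.hub; in the study the model would instead be
# fine-tuned on labeled palm-tree tiles (the weights file below is hypothetical).
model = torch.hub.load("ultralytics/yolov5", "yolov5m", pretrained=True)
# model = torch.hub.load("ultralytics/yolov5", "custom", path="palm_best.pt")

results = model("plantation_tile.jpg")  # hypothetical drone image tile
detections = results.pandas().xyxy[0]   # one row per detected object
print(f"objects detected: {len(detections)}")
```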

26 pages, 5309 KiB  
Article
RSS-Based Wireless LAN Indoor Localization and Tracking Using Deep Architectures
Big Data Cogn. Comput. 2022, 6(3), 84; https://doi.org/10.3390/bdcc6030084 - 08 Aug 2022
Cited by 5 | Viewed by 2496
Abstract
Wireless Local Area Network (WLAN) positioning is a challenging task indoors due to environmental constraints and the unpredictable behavior of signal propagation, even at a fixed location. The aim of this work is to develop deep learning-based approaches for indoor localization and tracking utilizing Received Signal Strength (RSS). The study proposes Multi-Layer Perceptron (MLP), One- and Two-Dimensional Convolutional Neural Network (1D CNN and 2D CNN), and Long Short-Term Memory (LSTM) deep network architectures for WLAN indoor positioning, based on actual RSS measurements from an existing WLAN infrastructure in a mobile user scenario. Results for these deep architectures are presented alongside those of existing WLAN positioning algorithms, with the Root Mean Square Error (RMSE) as the assessment criterion. The proposed LSTM Model 2 achieved a dynamic positioning RMSE of 1.73 m, which outperforms probabilistic WLAN algorithms such as Memoryless Positioning (RMSE: 10.35 m) and the Nonparametric Information (NI) filter with variable acceleration (RMSE: 5.2 m) in the same experimental environment. Full article
(This article belongs to the Topic Machine and Deep Learning)
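A minimal Keras sketch of the LSTM idea above: sequences of RSS scans regressed to (x, y) coordinates, scored with RMSE. The sequence length, number of access points, and the random stand-in data are all assumptions, not the paper's measurement campaign.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Random stand-in data: 200 tracks of 10 consecutive RSS scans from 6 access
# points, each labeled with the user's (x, y) position in meters.
X = np.random.randn(200, 10, 6).astype("float32")
y = (np.random.rand(200, 2) * 20).astype("float32")

model = keras.Sequential([
    layers.LSTM(64, input_shape=(10, 6)),  # temporal smoothing over the track
    layers.Dense(32, activation="relu"),
    layers.Dense(2),                       # regressed (x, y) coordinates
])
model.compile(optimizer="adam", loss="mse",
              metrics=[keras.metrics.RootMeanSquaredError()])
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print(model.evaluate(X, y, verbose=0))  # [mse, rmse] on the toy data
```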

17 pages, 1280 KiB  
Article
Impactful Digital Twin in the Healthcare Revolution
Big Data Cogn. Comput. 2022, 6(3), 83; https://doi.org/10.3390/bdcc6030083 - 08 Aug 2022
Cited by 43 | Viewed by 6859
Abstract
Over the last few decades, our digitally expanding world has experienced another significant digitalization boost because of the COVID-19 pandemic. Digital transformations are changing every aspect of this world. New technological innovations are springing up continuously, attracting increasing attention and investments. Digital twin, one of the highest trending technologies of recent years, is now joining forces with the healthcare sector, which has been under the spotlight since the outbreak of COVID-19. This paper sets out to promote a better understanding of digital twin technology, clarify some common misconceptions, and review the current trajectory of digital twin applications in healthcare. Furthermore, the functionalities of the digital twin in different life stages are summarized in the context of a digital twin model in healthcare. Following the Internet-of-Things-as-a-service concept and the digital-twinning-as-a-service model supporting Industry 4.0, we propose a paradigm of digital twinning everything as a healthcare service, and different groups of physical entities are clarified for clear reference to digital twin architecture in healthcare. This research discusses the value of digital twin technology in healthcare, as well as current challenges and insights for future research. Full article

17 pages, 26907 KiB  
Article
Real-Time End-to-End Speech Emotion Recognition with Cross-Domain Adaptation
Big Data Cogn. Comput. 2022, 6(3), 79; https://doi.org/10.3390/bdcc6030079 - 15 Jul 2022
Cited by 5 | Viewed by 4063
Abstract
Language resources are the main factor in speech-emotion-recognition (SER)-based deep learning models. Thai is a low-resource language with a smaller data size than high-resource languages such as German. This paper describes a framework that uses a pretrained-model-based front-end and a back-end network to adapt feature spaces from the speech recognition domain to the speech emotion classification domain. It consists of two parts: a speech recognition front-end network and a speech emotion recognition back-end network. For speech recognition, Wav2Vec2 is the state of the art for high-resource languages, while XLSR targets low-resource languages; both offer generalized end-to-end learning for speech understanding, producing feature space representations from feature encoding in the speech recognition domain, which is why they were selected as the pretrained front-end networks. The pretrained Wav2Vec2 and XLSR models are fine-tuned for specific languages using the Common Voice 7.0 dataset. Feature vectors from the front-end network are then input to the back-end networks, which include convolution time reduction (CTR) and linear mean encoding transformation (LMET). Experiments using two different datasets show that our proposed framework outperforms the baselines in terms of unweighted and weighted accuracies. Full article
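A minimal sketch of the front-end idea using the Hugging Face transformers library (the checkpoint name is an assumption, not the paper's): a pretrained Wav2Vec2 encoder turns raw audio into frame-level feature vectors that a back-end such as CTR or LMET would then consume.

```python
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

checkpoint = "facebook/wav2vec2-base"  # assumed checkpoint
extractor = Wav2Vec2FeatureExtractor.from_pretrained(checkpoint)
encoder = Wav2Vec2Model.from_pretrained(checkpoint)

# One second of dummy 16 kHz audio standing in for an utterance.
waveform = np.random.randn(16_000).astype("float32")
inputs = extractor(waveform, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    frames = encoder(**inputs).last_hidden_state  # shape: (1, ~49 frames, 768)

# A back-end network (e.g., CTR or LMET plus a classifier) would consume these.
print(frames.shape)
```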

22 pages, 1108 KiB  
Article
We Know You Are Living in Bali: Location Prediction of Twitter Users Using BERT Language Model
Big Data Cogn. Comput. 2022, 6(3), 77; https://doi.org/10.3390/bdcc6030077 - 07 Jul 2022
Cited by 12 | Viewed by 3126
Abstract
Twitter user location data provide essential information that can be used for various purposes. However, user location is not easy to identify because many profiles omit this information or contain entries that do not correspond to users' actual locations. Several related works have attempted to predict location from English-language tweets. In this study, we predict the location of Indonesian tweets. We utilize machine learning approaches, i.e., long short-term memory (LSTM) and bidirectional encoder representations from transformers (BERT), to infer Twitter users' home locations from the display name in the profile, the user description, and the user's tweets. By concatenating the display name, description, and aggregated tweets, the model achieved the best accuracy of 0.77. The IndoBERT model outperformed several baseline models. Full article
(This article belongs to the Topic Machine and Deep Learning)

15 pages, 695 KiB  
Article
Topological Data Analysis Helps to Improve Accuracy of Deep Learning Models for Fake News Detection Trained on Very Small Training Sets
Big Data Cogn. Comput. 2022, 6(3), 74; https://doi.org/10.3390/bdcc6030074 - 05 Jul 2022
Cited by 4 | Viewed by 3599
Abstract
Topological data analysis has recently found applications in various areas of science, such as computer vision and understanding of protein folding. However, applications of topological data analysis to natural language processing remain under-researched. This study applies topological data analysis to a particular natural language processing task: fake news detection. We found that deep learning models are more accurate in this task than topological data analysis alone. However, combining a deep learning model with topological data analysis significantly improves the model's accuracy if the available training set is very small. Full article
(This article belongs to the Topic Machine and Deep Learning)
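To show what topological features might look like in practice, here is a sketch using the ripser package (an assumption; the study does not name its tooling): persistence diagrams are computed from a point cloud, and summary statistics of them become extra inputs for a classifier. The embedding point cloud is random stand-in data.

```python
import numpy as np
from ripser import ripser

# Random stand-in for a small point cloud of text embeddings for one article.
rng = np.random.default_rng(seed=0)
embeddings = rng.normal(size=(40, 8))

diagrams = ripser(embeddings, maxdim=1)["dgms"]  # H0 and H1 persistence diagrams

def total_persistence(diagram: np.ndarray) -> float:
    """Sum of lifetimes (death - birth) over the finite points of one diagram."""
    finite = diagram[np.isfinite(diagram[:, 1])]
    return float((finite[:, 1] - finite[:, 0]).sum())

# Two scalar topological features that could be appended to a classifier input.
print([round(total_persistence(d), 3) for d in diagrams])
```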

19 pages, 491 KiB  
Article
Digital Technologies and the Role of Data in Cultural Heritage: The Past, the Present, and the Future
Big Data Cogn. Comput. 2022, 6(3), 73; https://doi.org/10.3390/bdcc6030073 - 04 Jul 2022
Cited by 15 | Viewed by 5501
Abstract
Is culture considered to be our past, our roots, ancient ruins, or an old piece of art? Culture is all the factors that define who we are, how we act and interact in our world, in our daily activities, in our personal and public relations, in our life. Culture is all the things we are not obliged to do. However, today we live in a mixed environment, a combination of the "offline" world and the online, digital world. In this mixed environment, it is technology that defines our behaviour, technology that unites people across a large world and, finally, defines a status of "monoculture". In this article, we examine the role of technology, and especially big data, in relation to culture. We present the advances that led to paradigm shifts in the research area of cultural informatics and forecast the future of culture as it will be defined in this mixed world. Full article
(This article belongs to the Special Issue Big Data Analytics for Cultural Heritage)

16 pages, 4437 KiB  
Article
Lightweight AI Framework for Industry 4.0 Case Study: Water Meter Recognition
Big Data Cogn. Comput. 2022, 6(3), 72; https://doi.org/10.3390/bdcc6030072 - 01 Jul 2022
Cited by 14 | Viewed by 3323
Abstract
The evolution of applications in telecommunication, networking, computing, and embedded systems has led to the emergence of the Internet of Things and Artificial Intelligence. The combination of these technologies has improved productivity by optimizing consumption and facilitating access to real-time information. This work focuses on the Industry 4.0 and Smart City paradigms and proposes a new approach to monitoring and tracking water consumption using optical character recognition (OCR) together with an artificial intelligence algorithm, in particular the YOLOv4 machine learning model. The goal of this work is to provide optimized results in real time. The recognition rate obtained with the proposed algorithms is around 98%. Full article
(This article belongs to the Special Issue Advancements in Deep Learning and Deep Federated Learning Models)

25 pages, 3658 KiB  
Article
A Comprehensive Spark-Based Layer for Converting Relational Databases to NoSQL
Big Data Cogn. Comput. 2022, 6(3), 71; https://doi.org/10.3390/bdcc6030071 - 27 Jun 2022
Cited by 1 | Viewed by 3222
Abstract
The continuous, massive growth in the size, variety, and velocity of data is what currently defines big data. Relational databases have a limited ability to work with big data; consequently, Not Only SQL (NoSQL) databases have been utilized to handle it, because NoSQL represents data in diverse models and uses a variety of query languages, unlike traditional relational databases. Using NoSQL has therefore become essential, and many studies have proposed different layers to convert relational databases to NoSQL; however, most of them targeted only one or two NoSQL models and evaluated their layers on a single node rather than in a distributed environment. This study proposes a Spark-based layer for mapping relational databases to NoSQL models, focusing on the document, column, and key–value databases. The proposed layer comprises two parts. The first part converts relational databases to document, column, and key–value databases and encompasses two phases: a metadata analyzer for relational databases, and Spark-based transformation and migration. The second part focuses on executing structured query language (SQL) queries on the NoSQL databases. The suggested layer was applied and compared with Unity, which has similar components and features and supports sub-queries and join operations, in a single-node environment. The experimental results show that the proposed layer outperformed Unity in query execution time by a factor of three. In addition, the proposed layer was applied to multi-node clusters using different scenarios, and the results show that integrating the Spark cluster with NoSQL databases on multi-node clusters provided better read and write performance with increasing dataset sizes than a single node. Full article
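A minimal PySpark sketch of the relational-to-NoSQL migration path: read a table over JDBC, write it to a document store. The connection URIs, table, and credentials are placeholders, and the MongoDB Spark connector (format "mongodb", v10-style config key) plus a JDBC driver are assumed to be on the classpath; the paper's layer adds metadata analysis and query translation on top of this basic movement of rows.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("rdb-to-nosql")
    # MongoDB Spark connector (v10-style) config; the URI is a placeholder.
    .config("spark.mongodb.write.connection.uri", "mongodb://localhost/shop.customers")
    .getOrCreate()
)

# Read one relational table over JDBC (URL, table, and credentials are placeholders).
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://localhost:3306/shop")
    .option("dbtable", "customers")
    .option("user", "reader")
    .option("password", "secret")
    .load()
)

# Write the same rows out as documents in the NoSQL store.
df.write.format("mongodb").mode("overwrite").save()
spark.stop()
```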

20 pages, 6876 KiB  
Article
DeepWings©: Automatic Wing Geometric Morphometrics Classification of Honey Bee (Apis mellifera) Subspecies Using Deep Learning for Detecting Landmarks
Big Data Cogn. Comput. 2022, 6(3), 70; https://doi.org/10.3390/bdcc6030070 - 27 Jun 2022
Cited by 8 | Viewed by 4413
Abstract
Honey bee classification by wing geometric morphometrics entails, as a first step, the manual annotation of 19 landmarks at the forewing vein junctions. This is a time-consuming and error-prone endeavor, with implications for classification accuracy. Herein, we developed a software package called DeepWings© that overcomes this constraint by automatically detecting the 19 landmarks on digital images of the right forewing. We used a database containing 7634 forewing images, including 1864 analyzed by F. Ruttner in the original delineation of 26 honey bee subspecies, to tune a convolutional neural network as a wing detector, a deep learning U-Net as a landmark segmenter, and a support vector machine as a subspecies classifier. The implemented MobileNet wing detector achieved a mAP of 0.975, and the landmark segmenter detected the 19 landmarks with 91.8% accuracy and an average positional precision of 0.943 relative to manually annotated landmarks. The subspecies classifier, in turn, presented an average accuracy of 86.6% for 26 subspecies and 95.8% for a subset of five important subspecies. The final implementation of the system showed good speed, requiring only 14 s to process 10 images. DeepWings© is very user-friendly and is the first fully automated software, offered as a free Web service, for honey bee classification from wing geometric morphometrics. DeepWings© can be used for honey bee breeding, conservation, and even scientific purposes, as it provides the coordinates of the landmarks in Excel format, facilitating the work of research teams using classical identification approaches and alternative analytical tools. Full article
