Big Data Cogn. Comput., Volume 7, Issue 2 (June 2023) – 53 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link, and use the free Adobe Reader to open them.
Article
DSpamOnto: An Ontology Modelling for Domain-Specific Social Spammers in Microblogging
Big Data Cogn. Comput. 2023, 7(2), 109; https://doi.org/10.3390/bdcc7020109 - 02 Jun 2023
Abstract
The lack of regulations and oversight on Online Social Networks (OSNs) has resulted in the rise of social spam, which is the dissemination of unsolicited and low-quality content that aims to deceive and manipulate users. Social spam can cause a range of negative consequences for individuals and businesses, such as the spread of malware, phishing scams, and reputational damage. While machine learning techniques can be used to detect social spammers by analysing patterns in data, they have limitations such as the potential for false positives and false negatives. In contrast, ontologies allow for the explicit modelling and representation of domain knowledge, which can be used to create a set of rules for identifying social spammers. However, the literature exposes a deficiency of ontologies that conceptualize domain-based social spam. This paper aims to address this gap by designing a domain-specific ontology called DSpamOnto to detect social spammers in microblogging who target a specific domain. DSpamOnto can identify social spammers based on their domain-specific behaviour, such as posting repetitive or irrelevant content and using misleading information. The proposed model is compared and benchmarked against well-proven ML models using various evaluation metrics to verify and validate its utility in capturing social spammers. Full article
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)

Review
Privacy-Enhancing Digital Contact Tracing with Machine Learning for Pandemic Response: A Comprehensive Review
Big Data Cogn. Comput. 2023, 7(2), 108; https://doi.org/10.3390/bdcc7020108 - 01 Jun 2023
Abstract
The rapid global spread of the coronavirus disease (COVID-19) has severely impacted daily life worldwide. As potential solutions, various digital contact tracing (DCT) strategies have emerged to mitigate the virus’s spread while maintaining economic and social activities. The computational epidemiology problems of DCT often involve parameter optimization through learning processes, making it crucial to understand how to apply machine learning techniques for effective DCT optimization. While numerous research studies on DCT have emerged recently, most existing reviews primarily focus on DCT application design and implementation. This paper offers a comprehensive overview of privacy-preserving machine learning-based DCT in preparation for future pandemics. We propose a new taxonomy to classify existing DCT strategies into forward, backward, and proactive contact tracing. We then categorize several DCT apps developed during the COVID-19 pandemic based on their tracing strategies. Furthermore, we derive three research questions related to computational epidemiology for DCT and provide a detailed description of machine learning techniques to address these problems. We discuss the challenges of learning-based DCT and suggest potential solutions. Additionally, we include a case study demonstrating the review’s insights into the pandemic response. Finally, we summarize the study’s limitations and highlight promising future research directions in DCT. Full article
(This article belongs to the Special Issue Digital Health and Data Analytics in Public Health)

Article
Semantic Hierarchical Indexing for Online Video Lessons Using Natural Language Processing
Big Data Cogn. Comput. 2023, 7(2), 107; https://doi.org/10.3390/bdcc7020107 - 31 May 2023
Abstract
Huge quantities of audio and video material are available at universities and teaching institutions, but their use can be limited because of the lack of intelligent search tools. This paper describes a possible way to set up an indexing scheme that offers a smart search modality that combines semantic analysis of video/audio transcripts with the exact time positioning of uttered words. The proposal leverages NLP methods for topic modeling with lexical analysis of lessons' transcripts and builds a semantic hierarchical index into the corpus of lessons analyzed. Moreover, using abstractive summarization, the system can offer short summaries on the subject semantically implied by the search carried out. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing in 2023)

Article
Adaptive KNN-Based Extended Collaborative Filtering Recommendation Services
Big Data Cogn. Comput. 2023, 7(2), 106; https://doi.org/10.3390/bdcc7020106 - 31 May 2023
Abstract
In the current era of e-commerce, users are overwhelmed with countless products, making it difficult to find relevant items. Recommendation systems generate suggestions based on user preferences to avoid information overload. Collaborative filtering is a widely used model in modern recommendation systems. Despite its popularity, collaborative filtering has limitations that researchers aim to overcome. In this paper, we enhance the K-nearest neighbor (KNN)-based collaborative filtering algorithm for a recommendation system by considering the similarity of user cognition. This enhancement aims to improve the accuracy of grouping users and to generate more relevant recommendations for the active user. The experimental results showed that the proposed model outperformed benchmark models in terms of MAE, RMSE, MAP, and NDCG metrics. Full article
(This article belongs to the Special Issue Semantic Web Technology and Recommender Systems, Volume II)
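To make the KNN collaborative filtering idea above concrete, here is a minimal, self-contained sketch; the toy rating matrix, the cosine metric, and the neighbor-average predictor are illustrative assumptions and omit the paper's cognition-similarity weighting.

```python
# Minimal user-based KNN collaborative filtering sketch (illustrative only;
# the paper's adaptive variant additionally weighs user-cognition similarity).
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy user-item rating matrix (0 = unrated); rows are users, columns are items.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 0, 5, 4]], dtype=float)

k = 2
knn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(R)
_, idx = knn.kneighbors(R)  # nearest neighbors include the user itself

def predict(user: int, item: int) -> float:
    """Average the ratings the user's k nearest neighbors gave to the item."""
    neighbors = [u for u in idx[user] if u != user][:k]
    rated = [R[u, item] for u in neighbors if R[u, item] > 0]
    return float(np.mean(rated)) if rated else 0.0

print(predict(1, 1))  # predicted rating of user 1 for item 1
```

In practice, such predictions are ranked per user and scored with MAE, RMSE, MAP, and NDCG, as in the paper's evaluation.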

Article
Breaking Barriers: Unveiling Factors Influencing the Adoption of Artificial Intelligence by Healthcare Providers
Big Data Cogn. Comput. 2023, 7(2), 105; https://doi.org/10.3390/bdcc7020105 - 30 May 2023
Abstract
Artificial intelligence (AI) is an emerging technological system that provides a platform to manage and analyze data by emulating human cognitive functions with greater accuracy, revolutionizing patient care and introducing a paradigm shift to the healthcare industry. The purpose of this study is to identify the underlying factors that affect the adoption of artificial intelligence in healthcare (AIH) by healthcare providers and to understand “What are the factors that influence healthcare providers’ behavioral intentions to adopt AIH in their routine practice?” An integrated survey was conducted among healthcare providers, including consultants, residents/students, and nurses. The survey included items related to performance expectancy, effort expectancy, initial trust, personal innovativeness, task complexity, and technology characteristics. The collected data were analyzed using structural equation modeling. A total of 392 healthcare professionals participated in the survey, with 72.4% being male and 50.7% being 30 years old or younger. The results showed that performance expectancy, effort expectancy, and initial trust have a positive influence on the behavioral intentions of healthcare providers to use AIH. Personal innovativeness was found to have a positive influence on effort expectancy, while task complexity and technology characteristics have a positive influence on effort expectancy for AIH. The study’s empirically validated model sheds light on healthcare providers’ intention to adopt AIH, while the study’s findings can be used to develop strategies to encourage this adoption. However, further investigation is necessary to understand the individual factors affecting the adoption of AIH by healthcare providers. Full article
(This article belongs to the Special Issue Deep Network Learning and Its Applications)

Editorial
Perspectives on Big Data, Cloud-Based Data Analysis and Machine Learning Systems
Big Data Cogn. Comput. 2023, 7(2), 104; https://doi.org/10.3390/bdcc7020104 - 30 May 2023
Abstract
Huge amounts of digital data are continuously generated and collected from different sources, such as sensors, cameras, in-vehicle infotainment, smart meters, mobile devices, social media platforms, and web applications and services [...] Full article
Article
On-Shore Plastic Waste Detection with YOLOv5 and RGB-Near-Infrared Fusion: A State-of-the-Art Solution for Accurate and Efficient Environmental Monitoring
Big Data Cogn. Comput. 2023, 7(2), 103; https://doi.org/10.3390/bdcc7020103 - 29 May 2023
Abstract
Plastic waste is a growing environmental concern that poses a significant threat to onshore ecosystems, human health, and wildlife. The accumulation of plastic waste in oceans has reached a staggering estimate of over eight million tons annually, leading to hazardous outcomes in marine life and the food chain. Plastic waste is prevalent in urban areas, posing risks to animals that may ingest it or become entangled in it, and negatively impacting the economy and tourism industry. Effective plastic waste management requires a comprehensive approach that includes reducing consumption, promoting recycling, and developing innovative technologies such as automated plastic detection systems. The development of accurate and efficient plastic detection methods is therefore essential for effective waste management. To address this challenge, machine learning techniques such as the YOLOv5 model have emerged as promising tools for developing automated plastic detection systems. Furthermore, there is a need to study both visible light (RGB) and near-infrared (RGNIR) as part of plastic waste detection due to the unique properties of plastic waste in different environmental settings. To this end, two plastic waste datasets, comprising RGB and RGNIR images, were utilized to train the proposed model, YOLOv5m. The performance of the model was then evaluated using a 10-fold cross-validation method on both datasets. The experiment was extended by adding background images into the training dataset to reduce false positives. An additional experiment was carried out to fuse both the RGB and RGNIR datasets. A performance-metric score called the Weighted Metric Score (WMS) was proposed, where the WMS equaled the sum of the mean average precision at the intersection over union (IoU) threshold of 0.5 (mAP@0.5) × 0.1 and the mean average precision averaged over different IoU thresholds ranging from 0.5 to 0.95 (mAP@0.5:0.95) × 0.9. In addition, a 10-fold cross-validation procedure was implemented. Based on the results, the proposed model achieved the best performance using the fusion of the RGB and RGNIR datasets when evaluated on the testing dataset, with a mean mAP@0.5, mAP@0.5:0.95, and WMS of 92.96% ± 2.63%, 69.47% ± 3.11%, and 71.82% ± 3.04%, respectively. These findings indicate that utilizing both normal visible light and the near-infrared spectrum as feature representations in machine learning could lead to improved performance in plastic waste detection. This opens new opportunities in the development of automated plastic detection systems for use in fields such as automation, environmental management, and resource management. Full article
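The reported means can be checked directly against the WMS definition above; a quick arithmetic sketch:

```python
# Worked check of the Weighted Metric Score on the paper's reported means:
# WMS = mAP@0.5 * 0.1 + mAP@0.5:0.95 * 0.9
map_50 = 92.96     # mean mAP@0.5 (%) on the fused RGB+RGNIR test set
map_50_95 = 69.47  # mean mAP@0.5:0.95 (%)
wms = 0.1 * map_50 + 0.9 * map_50_95
print(f"WMS = {wms:.2f}%")  # 71.82%, matching the reported value
```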

Article
Hand Gesture Recognition Using Automatic Feature Extraction and Deep Learning Algorithms with Memory
Big Data Cogn. Comput. 2023, 7(2), 102; https://doi.org/10.3390/bdcc7020102 - 23 May 2023
Abstract
Gesture recognition is widely used to express emotions or to communicate with other people or machines. Hand gesture recognition is a problem of great interest to researchers because it is a high-dimensional pattern recognition problem. The high dimensionality of the problem is directly related to the performance of machine learning models. The dimensionality problem can be addressed through feature selection and feature extraction. In this sense, the evaluation of a model with manual feature extraction and automatic feature extraction was proposed. The manual feature extraction was performed using the statistical functions of central tendency, while the automatic extraction was performed by means of a CNN and BiLSTM. These features were also evaluated in classifiers such as Softmax, ANN, and SVM. The best-performing model was the combination of BiLSTM and ANN (BiLSTM-ANN), with an accuracy of 99.9912%. Full article
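As a rough illustration of the automatic-feature-extraction route described above, the sketch below pairs a BiLSTM extractor with a small ANN head in Keras; the input window shape and class count are assumptions, not the paper's exact configuration.

```python
# Hedged sketch: BiLSTM feature extraction feeding an ANN classifier.
from tensorflow.keras import layers, models

timesteps, channels = 100, 8  # assumed sensor window: 100 steps, 8 channels
num_classes = 5               # assumed number of gesture classes

model = models.Sequential([
    layers.Input(shape=(timesteps, channels)),
    layers.Bidirectional(layers.LSTM(64)),  # automatic temporal feature extraction
    layers.Dense(32, activation="relu"),    # ANN classification head
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```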

Article
An Ontology Development Methodology Based on Ontology-Driven Conceptual Modeling and Natural Language Processing: Tourism Case Study
Big Data Cogn. Comput. 2023, 7(2), 101; https://doi.org/10.3390/bdcc7020101 - 21 May 2023
Abstract
Ontologies provide a powerful method for representing, reusing, and sharing domain knowledge. They are extensively used in a wide range of disciplines, including artificial intelligence, knowledge engineering, biomedical informatics, and many more. For several reasons, developing domain ontologies is a challenging task. One of these reasons is that it is a complicated and time-consuming process. Multiple ontology development methodologies have already been proposed. However, there is room for improvement in terms of covering more activities during development (such as enrichment) and enhancing others (such as conceptualization). In this research, an enhanced ontology development methodology (ON-ODM) is proposed. Ontology-driven conceptual modeling (ODCM) and natural language processing (NLP) serve as the foundation of the proposed methodology. ODCM is defined as the utilization of ontological ideas from various areas to build engineering artifacts that improve conceptual modeling. NLP refers to the scientific discipline that employs computer techniques to analyze human language. The proposed ON-ODM is applied to build a tourism ontology that will be beneficial for a variety of applications, including e-tourism. The produced ontology is evaluated based on competency questions (CQs) and quality metrics. It is verified that the ontology answers SPARQL queries covering all CQ groups specified by domain experts. Quality metrics are used to compare the produced ontology with four existing tourism ontologies. For instance, according to the metrics related to conciseness, the produced ontology received a first place ranking when compared to the others, whereas it received a second place ranking regarding understandability. These results show that utilizing ODCM and NLP could facilitate and improve the development process, respectively. Full article
(This article belongs to the Special Issue Big Data Analytics for Cultural Heritage, Volume II)
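To illustrate the CQ-based evaluation mentioned above, here is a hypothetical competency-question check against a toy tourism graph using rdflib; the namespace, classes, and query are invented for the example.

```python
# Hypothetical competency-question (CQ) check: build a tiny tourism graph
# and answer a CQ with SPARQL (all names are invented for illustration).
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/tourism#")
g = Graph()
g.add((EX.Louvre, RDF.type, EX.Museum))
g.add((EX.Louvre, EX.locatedIn, EX.Paris))

# CQ: "Which museums are located in Paris?"
q = """
PREFIX ex: <http://example.org/tourism#>
SELECT ?m WHERE { ?m a ex:Museum ; ex:locatedIn ex:Paris . }
"""
for row in g.query(q):
    print(row.m)  # http://example.org/tourism#Louvre
```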

Article
Investigating the Accuracy of Autoregressive Recurrent Networks Using Hierarchical Aggregation Structure-Based Data Partitioning
Big Data Cogn. Comput. 2023, 7(2), 100; https://doi.org/10.3390/bdcc7020100 - 18 May 2023
Abstract
Global models have been developed to tackle the challenge of forecasting sets of series that are related or share similarities, but they have not been developed for heterogeneous datasets. Various methods of partitioning by relatedness have been introduced to enhance the similarities of sets, resulting in improved forecasting accuracy but often at the cost of a reduced sample size, which could be harmful. To shed light on how the relatedness between series impacts the effectiveness of global models in real-world demand-forecasting problems, we perform an extensive empirical study using the M5 competition dataset. We examine cross-learning scenarios driven by the product hierarchy commonly employed in retail planning to allow global models to capture interdependencies across products and regions more effectively. Our findings show that global models outperform state-of-the-art local benchmarks by a considerable margin, indicating that they are not inherently more limited than local models and can handle unrelated time-series data effectively. The accuracy of data-partitioning approaches increases as the sizes of the data pools and the models’ complexity decrease. However, there is a trade-off between data availability and data relatedness. Smaller data pools lead to increased similarity among time series, making it easier to capture cross-product and cross-region dependencies, but this comes at the cost of a reduced sample, which may not be beneficial. Finally, it is worth noting that the successful implementation of global models for heterogeneous datasets can significantly impact forecasting practice. Full article

Article
Unsupervised Deep Learning for Structural Health Monitoring
Big Data Cogn. Comput. 2023, 7(2), 99; https://doi.org/10.3390/bdcc7020099 - 17 May 2023
Abstract
In the last few decades, structural health monitoring has gained relevance in the context of civil engineering, and much effort has been made to automate the process of data acquisition and analysis through the use of data-driven methods. Currently, the main issues arising in automated monitoring processing concern the establishment of a robust approach that covers all intermediate steps from data acquisition to output production and interpretation. To overcome this limitation, we introduce a dedicated artificial-intelligence-based monitoring approach for the assessment of the health conditions of structures in near-real time. The proposed approach is based on the construction of an unsupervised deep learning algorithm, with the aim of establishing a reliable method of anomaly detection for data acquired from sensors positioned on buildings. After preprocessing, the data are fed into various types of artificial neural network autoencoders, which are trained to produce outputs as close as possible to the inputs. We tested the proposed approach on data generated from an OpenSees numerical model of a railway bridge and data acquired from physical sensors positioned on the Historical Tower of Ravenna (Italy). The results show that the approach actually flags the data produced when damage scenarios are activated in the OpenSees model as coming from a damaged structure. The proposed method is also able to reliably detect anomalous structural behaviors of the tower, preventing critical scenarios. Compared to other state-of-the-art methods for anomaly detection, the proposed approach shows very promising results. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing in 2023)
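A minimal version of the autoencoder-based anomaly detection described above might look as follows; the dense architecture, the synthetic "healthy" data, and the percentile threshold are illustrative assumptions.

```python
# Minimal dense-autoencoder anomaly detector (illustrative; real inputs
# would be preprocessed windows of structural sensor data).
import numpy as np
from tensorflow.keras import layers, models

rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, size=(1000, 32)).astype("float32")

ae = models.Sequential([
    layers.Input(shape=(32,)),
    layers.Dense(8, activation="relu"),     # bottleneck encoder
    layers.Dense(32, activation="linear"),  # decoder reconstructs the input
])
ae.compile(optimizer="adam", loss="mse")
ae.fit(healthy, healthy, epochs=5, batch_size=64, verbose=0)

# Flag samples whose reconstruction error exceeds a percentile threshold.
err = np.mean((healthy - ae.predict(healthy, verbose=0)) ** 2, axis=1)
threshold = np.percentile(err, 99)
shifted = rng.normal(3.0, 1.0, size=(1, 32)).astype("float32")  # "damage"
is_anomaly = np.mean((shifted - ae.predict(shifted, verbose=0)) ** 2) > threshold
print(is_anomaly)  # expected: True
```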

Article
Massive Parallel Alignment of RNA-seq Reads in Serverless Computing
Big Data Cogn. Comput. 2023, 7(2), 98; https://doi.org/10.3390/bdcc7020098 - 15 May 2023
Abstract
In recent years, the use of Cloud infrastructures for data processing has proven useful, with a computing potential that is not affected by the limitations of a local infrastructure. In this context, Serverless computing is the fastest-growing Cloud service model due to its auto-scaling methodologies, reliability, and fault tolerance. We present a solution based on in-house Serverless infrastructure, which is able to perform large-scale RNA-seq data analysis focused on the mapping of sequencing reads to a reference genome. The main contribution was bringing the computation of genomic data into serverless computing, focusing on RNA-seq read-mapping to a reference genome, as this is the most time-consuming task for some pipelines. The proposed solution handles massive parallel instances to maximize the efficiency in terms of running time. We evaluated the performance of our solution by performing two main tests, both based on the mapping of RNA-seq reads to Human GRCh38. Our experiments demonstrated a reduction of 79.838%, 90.079%, and 96.382%, compared to the local environments with 16, 8, and 4 virtual cores, respectively. Furthermore, serverless limitations were investigated. Full article
(This article belongs to the Special Issue Data-Based Bioinformatics and Applications)
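The reported runtime reductions translate into speedup factors via speedup = 1 / (1 − reduction); a quick check on the paper's numbers:

```python
# Convert the reported runtime reductions into approximate speedup factors.
for cores, reduction in [(16, 0.79838), (8, 0.90079), (4, 0.96382)]:
    print(f"vs {cores}-core local run: {1 / (1 - reduction):.1f}x faster")
# vs 16-core: ~5.0x, vs 8-core: ~10.1x, vs 4-core: ~27.6x
```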

Systematic Review
SQL and NoSQL Database Software Architecture Performance Analysis and Assessments—A Systematic Literature Review
Big Data Cogn. Comput. 2023, 7(2), 97; https://doi.org/10.3390/bdcc7020097 - 12 May 2023
Abstract
Competent software architecture plays a crucial role in the difficult task of big data processing for SQL and NoSQL databases. SQL databases were created to organize data and allow for vertical expansion. NoSQL databases, on the other hand, support horizontal scalability and can efficiently process large amounts of unstructured data. Organizational needs determine which paradigm is appropriate, yet selecting the best option is not always easy. Differences in database design are what set SQL and NoSQL databases apart. Each NoSQL database type also consistently employs a mixed-model approach. Therefore, it is challenging for cloud users to transfer their data among different cloud storage providers (CSPs). Several different paradigms are supported by the various cloud platforms (IaaS, PaaS, SaaS, and DBaaS). The purpose of this SLR is to examine the articles that address cloud data portability and interoperability, as well as the software architectures of SQL and NoSQL databases. Numerous studies comparing the capabilities of SQL and NoSQL databases, particularly Oracle RDBMS and the NoSQL document database MongoDB, in terms of scale, performance, availability, consistency, and sharding, were presented as part of the state of the art. Research indicates that NoSQL databases, with their specifically tailored structures, may be the best option for big data analytics, while SQL databases are best suited for online transaction processing (OLTP) purposes. Full article

Article
Design Proposal for a Virtual Shopping Assistant for People with Vision Problems Applying Artificial Intelligence Techniques
Big Data Cogn. Comput. 2023, 7(2), 96; https://doi.org/10.3390/bdcc7020096 - 12 May 2023
Abstract
Accessibility is an increasingly important topic for e-commerce, especially for individuals with vision problems. To improve their online experience, the design of a voice assistant has been proposed to allow these individuals to browse and shop online more quickly and efficiently. This voice assistant forms an intelligent system that can understand and respond to users' voice commands. The design considers the visual limitations of the users, such as difficulty reading information on the screen or identifying images. The voice assistant provides detailed product descriptions and ideas in a clear, easy-to-understand voice. In addition, the voice assistant has a series of additional features to improve the shopping experience. For example, the assistant can provide product recommendations based on the user's previous purchases and information about special promotions and discounts. The main goal of this design is to create an accessible and inclusive online shopping experience for the visually impaired. The voice assistant is based on a conversational user interface, allowing users to easily navigate an e-commerce website, search for products, and make purchases. Full article
(This article belongs to the Special Issue Speech Recognition and Machine Learning: Current Trends and Future)

Article
Virtual Reality-Based Digital Twins: A Case Study on Pharmaceutical Cannabis
Big Data Cogn. Comput. 2023, 7(2), 95; https://doi.org/10.3390/bdcc7020095 - 10 May 2023
Abstract
Digital Twins are digital equivalents of real-life objects. They allow producers to act immediately in case of (expected) deviations and to simulate effects of interventions based on real-life data. Digital Twin and eXtended Reality technologies (including Augmented Reality, Mixed Reality and Virtual Reality technologies), when coupled, are promising solutions to address the challenges of highly regulated crop production, namely the complexity of modern production environments for pharmaceutical cannabis, which are growing constantly as a result of legislative changes. Cannabis farms not only have to meet very high quality standards and regulatory requirements but also have to deal with high production and market uncertainties, including energy considerations. Thus, the main contributions of the research include an architecture design for eXtended-Reality-based Digital Twins for pharmaceutical cannabis production and a proof of concept, which was demonstrated at the Wageningen University Digital Twins conference. A convenience sampling method was used to recruit 30 participants who provided feedback on the application. The findings indicate that, despite 70% being unfamiliar with the concept, 80% of the participants were positive regarding the innovation and creativity. Full article
(This article belongs to the Special Issue Digital Twins for Complex Systems)

Article
Recognizing Similar Musical Instruments with YOLO Models
Big Data Cogn. Comput. 2023, 7(2), 94; https://doi.org/10.3390/bdcc7020094 - 10 May 2023
Abstract
Researchers in the fields of machine learning and artificial intelligence have recently begun to focus their attention on object recognition. One of the biggest obstacles in image recognition through computer vision is the detection and identification of similar items. Identifying similar musical instruments can be approached as a classification problem, where the goal is to train a machine learning model to classify instruments based on their features and shape. Cellos, clarinets, erhus, guitars, saxophones, trumpets, French horns, harps, recorders, bassoons, and violins were all classified in this investigation. There are many different musical instruments that have the same size, shape, and sound. In addition, we were amazed by the simplicity with which humans can identify items that are very similar to one another, but this is a challenging task for computers. For this study, we used YOLOv7 to identify pairs of musical instruments that are most like one another. Next, we compared and evaluated the results from YOLOv7 with those from YOLOv5. Furthermore, the results of our tests allowed us to enhance the performance in terms of detecting similar musical instruments. Moreover, with an average accuracy of 86.7%, YOLOv7 outperformed previous approaches and other research results. Full article
(This article belongs to the Special Issue Computational Collective Intelligence with Big Data–AI Society)

Article
Application of Artificial Intelligence for Fraudulent Banking Operations Recognition
Big Data Cogn. Comput. 2023, 7(2), 93; https://doi.org/10.3390/bdcc7020093 - 10 May 2023
Abstract
This study considers the task of applying artificial intelligence to recognize bank fraud. In recent years, due to the COVID-19 pandemic, bank fraud has become even more common due to the massive transition of many operations to online platforms and the creation of many charitable funds that criminals can use to deceive users. The present work focuses on machine learning algorithms as a tool well suited for analyzing and recognizing online banking transactions. The study’s scientific novelty is the development of machine learning models for identifying fraudulent banking transactions and techniques for preprocessing bank data for further comparison and selection of the best results. This paper also details various methods for improving detection accuracy, i.e., handling highly imbalanced datasets, feature transformation, and feature engineering. The proposed model, which is based on an artificial neural network, effectively improves the accuracy of fraudulent transaction detection. The results of the different algorithms are visualized, and the logistic regression algorithm performs the best, with an output AUC value of approximately 0.946. The stacked generalization shows a better AUC of 0.954. The recognition of banking fraud using artificial intelligence algorithms is a topical issue in our digital society. Full article
(This article belongs to the Special Issue Quality and Security of Critical Infrastructure Systems)

Article
An Improved Pattern Sequence-Based Energy Load Forecast Algorithm Based on Self-Organizing Maps and Artificial Neural Networks
Big Data Cogn. Comput. 2023, 7(2), 92; https://doi.org/10.3390/bdcc7020092 - 10 May 2023
Abstract
Pattern sequence-based models are a type of forecasting algorithm that utilizes clustering and other techniques to produce easily interpretable predictions faster than traditional machine learning models. This research focuses on their application in energy demand forecasting and introduces two significant contributions to the field. Firstly, this study evaluates the use of pattern sequence-based models with large datasets. Unlike previous works that use only one dataset or multiple datasets with less than two years of data, this work evaluates the models in three different public datasets, each containing eleven years of data. Secondly, we propose a new pattern sequence-based algorithm that uses a genetic algorithm to optimize the number of clusters alongside all other hyperparameters of the forecasting method, instead of using the Cluster Validity Indices (CVIs) commonly used in previous proposals. The results indicate that neural networks provide more accurate results than any pattern sequence-based algorithm and that our proposed algorithm outperforms other pattern sequence-based algorithms, albeit with a longer training time. Full article

Article
Predicting Cell Cleavage Timings from Time-Lapse Videos of Human Embryos
Big Data Cogn. Comput. 2023, 7(2), 91; https://doi.org/10.3390/bdcc7020091 - 09 May 2023
Abstract
Assisted reproductive technology is used for treating infertility, and its success relies on the quality and viability of embryos chosen for uterine transfer. Currently, embryologists manually assess embryo development, including the time duration between the cell cleavages. This paper introduces a machine learning methodology for automating the computations for the start of cell cleavage stages, in hours post insemination, in time-lapse videos. The methodology detects embryo cells in video frames and predicts the frame with the onset of the cell cleavage stage. Next, the methodology reads hours post insemination from the frame using optical character recognition. Unlike traditional embryo cell detection techniques, our suggested approach eliminates the need for extra image processing tasks such as locating embryos or removing extracellular material (fragmentation). The methodology accurately predicts cell cleavage stages up to five cells. The methodology was also able to detect the morphological structures of later cell cleavage stages, such as morula and blastocyst. It takes about one minute for the methodology to annotate the times of all the cell cleavages in a time-lapse video. Full article
(This article belongs to the Special Issue Multimedia Systems for Multimedia Big Data)
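The OCR step described above could be approximated as follows; the frame path, the overlay region, and the Tesseract settings are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical OCR of the hours-post-insemination (HPI) overlay in a frame.
import cv2
import pytesseract

frame = cv2.imread("frame_0420.png", cv2.IMREAD_GRAYSCALE)  # assumed frame file
roi = frame[0:40, 0:200]  # assumed corner region holding the HPI text
text = pytesseract.image_to_string(roi, config="--psm 7")  # single text line
print(text.strip())  # e.g. "25.3 h"
```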

Article
Blockchain-Based Double-Layer Byzantine Fault Tolerance for Scalability Enhancement for Building Information Modeling Information Exchange
Big Data Cogn. Comput. 2023, 7(2), 90; https://doi.org/10.3390/bdcc7020090 - 09 May 2023
Abstract
Practical Byzantine Fault Tolerance (PBFT) is a consensus algorithm deployed in a consortium blockchain that connects a group of related participants. This type of blockchain suits the implementation of Building Information Modeling (BIM) information exchange with few participants. However, when many more participants are involved in the BIM information exchange, the PBFT algorithm, which inherently requires intensive communication among participating nodes, has limitations in terms of scalability and performance. Multi-layer BFT designs rest on the hypothesis that additional layers reduce communication complexity; however, having more layers introduces more latency. Therefore, in this paper, Double-Layer Byzantine Fault Tolerance (DLBFT) is proposed to improve the blockchain scalability and performance of BIM information exchange. This study shows a double-layer network structure in which each node on the first layer connects with, and forms a group with, several nodes on the second layer. This network runs the Byzantine Fault Tolerance algorithm to reach a consensus. Instead of having one node send messages to all the nodes in the peer-to-peer network, a node only sends messages to a limited number of nodes on Layer 1 and up to three nodes in each corresponding group on Layer 2 in a hierarchical network. The DLBFT algorithm has been shown to reduce the required number of messages exchanged among nodes by 84% and the time to reach a consensus by 70%, thus improving blockchain scalability. Further research is required if more than one party is involved in multi-BIM projects. Full article
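A back-of-the-envelope model helps show where the message savings come from; the quadratic per-round cost and the 8×8 grouping below are simplifying assumptions, not the paper's exact protocol.

```python
# Simplified message-count model: flat PBFT vs. a two-layer variant.
def flat_pbft_messages(n: int) -> int:
    return n * n  # assume every node exchanges messages with every node

def two_layer_messages(groups: int, group_size: int) -> int:
    # One round among group leaders, plus one round inside each group.
    return flat_pbft_messages(groups) + groups * flat_pbft_messages(group_size)

n, g = 64, 8  # 64 nodes split into 8 groups of 8 (illustrative)
print(flat_pbft_messages(n))          # 4096
print(two_layer_messages(g, n // g))  # 64 + 8 * 64 = 576
```

Under these toy numbers the two-layer scheme cuts messages by roughly 86%, the same order as the 84% reduction the paper reports.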

Article
Digital Twin Four-Dimension Fusion Modeling Method Design and Application to the Discrete Manufacturing Line
Big Data Cogn. Comput. 2023, 7(2), 89; https://doi.org/10.3390/bdcc7020089 - 08 May 2023
Abstract
With the development of new-generation information technologies, such as big data and artificial intelligence, digital twins have become a key technology in intelligent manufacturing. The introduction of digital twin technology has addressed many problems in discrete manufacturing lines, including low visualization and difficult cyber–physical integration. However, the application of digital twin technology to discrete manufacturing lines still faces problems of low modeling accuracy, response delay, and insufficient production line control accuracy. Therefore, this paper proposes a digital twin four-dimension fusion modeling method to solve the above problems. First, a digital twin system architecture for a discrete manufacturing production line is designed. Then, the information control dimension is integrated into traditional digital twin modeling methods. Further, a digital twin geometry–physics–behavior–information control four-dimension fusion modeling method is proposed. This method can describe the geometric and physical characteristics of a physical entity and map its behavior mechanism. More importantly, it reveals the control logic and virtual–real mapping rules, which provides important support for the virtual–real intelligent mutual control. Finally, the feasibility and effectiveness of the proposed method are verified by experiments on a fidget spinner discrete manufacturing line, and a digital twin operation and maintenance management system is developed. The results presented in this study could provide ideas for the digital transformation of discrete manufacturing enterprises. Full article
(This article belongs to the Special Issue Digital Twins for Complex Systems)

Article
Finding the Time-Period-Based Most Frequent Path from Trajectory–Topology
Big Data Cogn. Comput. 2023, 7(2), 88; https://doi.org/10.3390/bdcc7020088 - 08 May 2023
Abstract
The Time-Period-Based Most Frequent Path (TPMFP) problem has been a hot topic in traffic studies for many years. The TPMFP problem involves finding the most frequent path between two locations by observing the travelling behaviors of drivers in a specific time period. However, previous research over-simplified the road network, ignoring transfer costs at intersections. To address this problem more elegantly, we built an urban topology model consisting of Intersection Vertices and Connection Vertices. Specifically, we split the Intersection Vertices to eliminate the influence of transfer costs on finding the TPMFP and generated a Trajectory–Topology from GPS record data. In addition, we further leveraged the Footmark Graph method to find the TPMFP. Finally, we conducted extensive experiments using a real-world dataset containing over eight million GPS records. Compared to the current state-of-the-art method, our proposed approach finds a more reasonable MFP in approximately 10% of cases during off-peak hours and 40% of cases during peak hours. Full article

Article
Attention Mechanism and Support Vector Machine for Image-Based E-Mail Spam Filtering
Big Data Cogn. Comput. 2023, 7(2), 87; https://doi.org/10.3390/bdcc7020087 - 06 May 2023
Abstract
Spammers have created a new kind of electronic mail (e-mail) called image-based spam to bypass text-based spam filters. Unfortunately, these images contain harmful links that can infect the user’s computer system and take a long time to be deleted, which can hamper users’ productivity and security. In this paper, a hybrid deep neural network architecture is suggested to address this problem. It is based on the convolution neural network (CNN), which has been enhanced with the convolutional block attention module (CBAM). Initially, CNN enhanced with CBAM is used to extract the most crucial information from each image-based e-mail. Then, the generated feature vectors are fed to the support vector machine (SVM) model to classify them as either spam or ham. Four datasets—including Image Spam Hunter (ISH), Annadatha, Chavda Approach 1, and Chavda Approach 2—are used in the experiments. The obtained results demonstrated that in terms of accuracy, our model exceeds the existing state-of-the-art methods. Full article
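The second stage of the pipeline above reduces to a standard SVM over extracted feature vectors; in the sketch below, random vectors stand in for the CNN+CBAM features so that the example stays self-contained.

```python
# Stage 2 sketch: classify (stand-in) CNN feature vectors with an SVM.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(1)
ham = rng.normal(0.0, 1.0, (200, 128))   # stand-in "ham" feature vectors
spam = rng.normal(1.0, 1.0, (200, 128))  # stand-in "spam" feature vectors
X = np.vstack([ham, spam])
y = np.array([0] * 200 + [1] * 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print(f"accuracy: {clf.score(X_te, y_te):.2f}")
```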

Systematic Review
A Systematic Review of Blockchain Technology Adoption Barriers and Enablers for Smart and Sustainable Agriculture
Big Data Cogn. Comput. 2023, 7(2), 86; https://doi.org/10.3390/bdcc7020086 - 04 May 2023
Abstract
Smart and sustainable agricultural practices are more complex than those of other industries, as production depends on many pre- and post-harvesting factors which are difficult to predict and control. Previous studies have shown that technologies such as blockchain, along with sustainable practices, can achieve smart and sustainable agriculture. These studies state that there is a need for a reliable and trustworthy environment among the intermediaries throughout the agrifood supply chain to achieve sustainability. However, there are limited studies on blockchain technology adoption for smart and sustainable agriculture. Therefore, this systematic review uses the PRISMA technique to explore the barriers and enablers of blockchain adoption for smart and sustainable agriculture. Data were collected using exhaustive selection criteria and filters to evaluate the barriers and enablers of blockchain technology for smart and sustainable agriculture. The results identify, on the one hand, adoption enablers such as stakeholder collaboration, enhanced customer trust, and democratization and, on the other hand, barriers such as a lack of global standards, industry-level best practices, and policies for blockchain adoption in the agrifood sector. The outcome of this review highlights that adoption barriers currently outweigh enablers of blockchain technology for smart and sustainable agriculture. Furthermore, several recommendations and implications are presented for addressing knowledge gaps for successful implementation. Full article

Article
An Ensemble-Learning-Based Technique for Bimodal Sentiment Analysis
Big Data Cogn. Comput. 2023, 7(2), 85; https://doi.org/10.3390/bdcc7020085 - 30 Apr 2023
Abstract
Human communication is predominantly expressed through speech and writing, which are powerful mediums for conveying thoughts and opinions. Researchers have been studying the analysis of human sentiments for a long time, including the emerging area of bimodal sentiment analysis in natural language processing (NLP). Bimodal sentiment analysis has gained attention in various areas such as social opinion mining, healthcare, banking, and more. However, there is a limited amount of research on bimodal conversational sentiment analysis, which is challenging due to the complex nature of how humans express sentiment cues across different modalities. To address this gap in research, a comparison of multiple data modality models has been conducted on the widely used MELD dataset, which serves as a benchmark for sentiment analysis in the research community. The results show the effectiveness of combining acoustic and linguistic representations using a proposed neural-network-based ensemble learning technique over six transformer and deep-learning-based models, achieving state-of-the-art accuracy. Full article

Article
A Comparative Study of Secure Outsourced Matrix Multiplication Based on Homomorphic Encryption
Big Data Cogn. Comput. 2023, 7(2), 84; https://doi.org/10.3390/bdcc7020084 - 28 Apr 2023
Abstract
Homomorphic encryption (HE) is a promising solution for handling sensitive data in semi-trusted third-party computing environments, as it enables processing of encrypted data. However, applying sophisticated techniques such as machine learning, statistics, and image processing to encrypted data remains a challenge. The computational complexity of some encrypted operations can significantly increase processing time. In this paper, we focus on the analysis of two state-of-the-art HE matrix multiplication algorithms with the best time and space complexities. We show how their performance depends on the libraries and the execution context, considering the standard Cheon–Kim–Kim–Song (CKKS) HE scheme with fixed-point numbers based on the Microsoft SEAL and PALISADE libraries. We show that Windows OS for the SEAL library and Linux OS for the PALISADE library are the best options. In general, PALISADE-Linux outperforms PALISADE-Windows, SEAL-Linux, and SEAL-Windows by 1.28, 1.59, and 1.67 times on average for different matrix sizes, respectively. We derive high-precision extrapolation formulas to estimate the processing time of HE multiplication of larger matrices. Full article
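For a feel of what CKKS computation on encrypted data looks like, here is a hedged sketch using TenSEAL, a Python wrapper around Microsoft SEAL (the paper itself benchmarks the SEAL and PALISADE C++ libraries; the parameters below are common illustrative defaults, not the paper's settings).

```python
# Hedged CKKS sketch with TenSEAL (a Python wrapper around Microsoft SEAL):
# encrypt a vector, compute on the ciphertext, and decrypt the result.
import tenseal as ts

ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40
ctx.generate_galois_keys()  # rotations are needed for dot products

enc = ts.ckks_vector(ctx, [1.0, 2.0, 3.0])
result = enc.dot([4.0, 5.0, 6.0])  # homomorphic inner product with a plain vector
print(result.decrypt())           # ≈ [32.0], up to CKKS approximation noise
```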

Article
Two-Stage PNN–SVM Ensemble for Higher Education Admission Prediction
Big Data Cogn. Comput. 2023, 7(2), 83; https://doi.org/10.3390/bdcc7020083 - 23 Apr 2023
Abstract
In this paper, we investigate the methods used to evaluate the admission chances of higher education institutions' (HEI) entrants as a crucial factor that directly influences the admission efficiency, quality of education results, and future students' life-long trajectories. Due to the conditions of uncertainty surrounding the decision-making process that determines the admission of entrants and the inability to independently assess the probability of potential outcomes, we propose the application of a machine learning (ML) model as an algorithm that provides decision-making support. The proposed model includes a support vector machine (SVM) stacking ensemble, which expands the input data set obtained using a Probabilistic Neural Network (PNN). The base algorithms include four SVMs with different kernel functions, with Logistic Regression (LR) as the meta-algorithm. We evaluate the accuracy of the developed model in three stages: comparison with existing ML methods; comparison with the single models that compose it; and comparison with a similar stacking model and with other types of ensembles (boosting, bagging). The designed two-stage PNN–SVM ensemble model provided an accuracy of 94% and demonstrated superiority at each comparison stage. The obtained results enable the use of the presented model in the subsequent stages of the development of an intellectual support system for decision making regarding entrants' admission. Full article
(This article belongs to the Special Issue Quality and Security of Critical Infrastructure Systems)
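A minimal scikit-learn analogue of the stacking stage is sketched below; it uses four SVM kernels with a logistic regression meta-learner on synthetic data and omits the paper's PNN-based input expansion.

```python
# Illustrative stacking ensemble: four SVM kernels as base learners with a
# logistic regression meta-learner (the PNN expansion step is omitted).
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
base = [(k, SVC(kernel=k, probability=True))
        for k in ("linear", "rbf", "poly", "sigmoid")]
stack = StackingClassifier(estimators=base, final_estimator=LogisticRegression())
print(stack.fit(X, y).score(X, y))
```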

Article
Analysis of 2D and 3D Convolution Models for Volumetric Segmentation of the Human Hippocampus
Big Data Cogn. Comput. 2023, 7(2), 82; https://doi.org/10.3390/bdcc7020082 - 23 Apr 2023
Abstract
Extensive medical research has revealed evidence of a strong association between hippocampus atrophy and age-related diseases such as Alzheimer's disease (AD). Therefore, segmentation of the hippocampus is an important task that can help clinicians and researchers in diagnosing cognitive impairment and uncovering the mechanisms behind hippocampal changes and diseases of the brain. The main aim of this paper was to provide a fair comparison of 2D and 3D convolution-based architectures for the specific task of hippocampus segmentation from brain MRI volumes, to determine whether 3D convolution models truly perform better in hippocampus segmentation, and also to assess any additional costs in terms of time and computational resources. Our optimized model, which used 50 epochs and a mini-batch size of 2, achieved the best validation loss and Dice Similarity Score (DSC) of 0.0129 and 0.8541, respectively, across all experiment runs. Based on the model comparisons, we concluded that 2D convolution models can surpass their 3D counterparts in terms of both hippocampus segmentation performance and training efficiency. Our automatic hippocampus segmentation demonstrated potential savings of thousands of clinician person-hours spent on manually analyzing and segmenting brain MRI scans. Full article
(This article belongs to the Topic Recent Trends in Image Processing and Pattern Recognition)
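The Dice Similarity Score used above is DSC = 2|A ∩ B| / (|A| + |B|); a quick check on toy binary masks:

```python
# Quick numpy check of DSC = 2|A ∩ B| / (|A| + |B|) on toy binary masks.
import numpy as np

pred = np.array([[1, 1, 0],
                 [0, 1, 0],
                 [0, 0, 0]])
truth = np.array([[1, 0, 0],
                  [0, 1, 1],
                  [0, 0, 0]])
dsc = 2 * np.logical_and(pred, truth).sum() / (pred.sum() + truth.sum())
print(round(float(dsc), 4))  # 0.6667
```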

Article
Analyzing Online Fake News Using Latent Semantic Analysis: Case of USA Election Campaign
Big Data Cogn. Comput. 2023, 7(2), 81; https://doi.org/10.3390/bdcc7020081 - 20 Apr 2023
Abstract
Recent studies have indicated that fake news is always produced to manipulate readers and that it spreads very fast and brings great damage to human society through social media. From the available literature, most studies focused on fake news detection and identification and fake news sentiment analysis using machine learning or deep learning techniques. However, relatively few researchers have paid attention to fake news analysis. This is especially true for fake political news. Unlike other published works which built fake news detection models from computer scientists' viewpoints, this study aims to develop an effective method that combines natural language processing (NLP) and latent semantic analysis (LSA) using singular value decomposition (SVD) techniques to help social scientists analyze fake news and discover its key elements. In addition, the authors analyze the characteristics of true news and fake news. A real case from the USA election campaign in 2016 is employed to demonstrate the effectiveness of our methods. The experimental results could give useful suggestions to future researchers to distinguish fake news. This study extracted five concepts using LSA and found them to be representative of political fake news during the election. Full article
(This article belongs to the Special Issue Artificial Intelligence in Digital Humanities)
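The LSA-via-SVD pipeline described above can be sketched in a few lines; the toy corpus and the number of components are illustrative (the study extracts five concepts from real election-period news).

```python
# Minimal LSA sketch: TF-IDF vectors reduced by truncated SVD into "concepts".
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "candidate wins state poll",
    "poll shows candidate leading state",
    "shocking secret video of candidate",  # fake-news-flavored toy example
]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)
svd = TruncatedSVD(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for i, comp in enumerate(svd.components_):
    top = comp.argsort()[::-1][:3]
    print(f"concept {i}:", [terms[j] for j in top])
```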

Article
The Dataset for Optimal Circulant Topologies
Big Data Cogn. Comput. 2023, 7(2), 80; https://doi.org/10.3390/bdcc7020080 - 20 Apr 2023
Abstract
This article presents software for the synthesis of circulant graphs and the dataset obtained. An algorithm and new methods, which increase the speed of finding optimal circulant topologies, are proposed. The results obtained confirm an increase in performance and a decrease in memory consumption compared to the previous implementation of the circulant topologies synthesis method. The developed software is designed to generate circulant topologies for the construction of networks-on-chip (NoCs) and multi-core systems reaching thousands of computing nodes. The developed software makes it possible to achieve, on an ordinary research workstation, performance commensurate with similar solutions created for a supercomputer. Two use cases of the created software are described: the analysis of routing algorithms in circulants, and regression analysis of the generated dataset of graph signatures to predict the characteristics of graphs of any size. Full article
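For readers unfamiliar with the object of study, a circulant graph is fully determined by its node count and a set of offsets (generators); a quick networkx illustration, with arbitrary parameters:

```python
# Build a small circulant graph and inspect characteristics of the kind
# the dataset catalogs (node count and offsets are arbitrary here).
import networkx as nx

G = nx.circulant_graph(16, [1, 5])  # 16 nodes, generators {1, 5}
print(G.number_of_edges())                 # 32
print(nx.diameter(G))                      # graph diameter
print(nx.average_shortest_path_length(G))  # average path length
```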
