Big Data and Cognitive Computing

27 pages, 4746 KiB

Open Access Article

Building Trust in Conversational AI: A Review and Solution Architecture Using Large Language Models and Knowledge Graphs

by Ahtsham Zafar, Venkatesh Balavadhani Parthasarathy, Chan Le Van, Saad Shahid, Aafaq Iqbal Khan and Arsalan Shahid

Big Data Cogn. Comput. 2024, 8(6), 70; https://doi.org/10.3390/bdcc8060070 - 17 Jun 2024

Cited by 6 | Viewed by 5371

Abstract

Conversational AI systems have emerged as key enablers of human-like interactions across diverse sectors. Nevertheless, the balance between linguistic nuance and factual accuracy has proven elusive. In this paper, we first introduce LLMXplorer, a comprehensive tool that provides an in-depth review of over 205 large language models (LLMs), elucidating their practical implications, ranging from social and ethical to regulatory, as well as their applicability across industries. Building on this foundation, we propose a novel functional architecture that seamlessly integrates the structured dynamics of knowledge graphs with the linguistic capabilities of LLMs. Validated using real-world AI news data, our architecture adeptly blends linguistic sophistication with factual rigor and further strengthens data security through role-based access control. This research provides insights into the evolving landscape of conversational AI, emphasizing the imperative for systems that are efficient, transparent, and trustworthy.

30 pages, 2117 KiB

Open Access Article

Towards a Refined Heuristic Evaluation: Incorporating Hierarchical Analysis for Weighted Usability Assessment

by Leonardo Talero-Sarmiento, Marc Gonzalez-Capdevila, Antoni Granollers, Henry Lamos-Diaz and Karine Pistili-Rodrigues

Big Data Cogn. Comput. 2024, 8(6), 69; https://doi.org/10.3390/bdcc8060069 - 13 Jun 2024

Cited by 1 | Viewed by 3396

Abstract

This study explores the implementation of the analytic hierarchy process in usability evaluations, specifically focusing on user interface assessment during software development phases. Addressing the challenge of diverse and unstandardized evaluation methodologies, our research develops and applies a tailored algorithm that simplifies heuristic prioritization. This novel method combines the analytic hierarchy process framework with a bespoke algorithm that leverages transitive properties for efficient pairwise comparisons, significantly reducing the evaluative workload. The algorithm is designed to facilitate the estimation of heuristic relevance regardless of the number of items per heuristic or the item scale, thereby streamlining the evaluation process. Rigorous simulation testing of this tailored algorithm is complemented by its empirical application, where seven usability experts evaluate a web interface. This practical implementation demonstrates our method’s ability to decrease the number of necessary comparisons and reduce the complexity and workload associated with the traditional prioritization process. Additionally, it improves the accuracy and relevance of the user interface usability heuristic testing results. By prioritizing heuristics based on their importance as determined by the Usability Testing Leader, rather than merely depending on the number of items, scale, or heuristics, our approach ensures that evaluations focus on the most critical usability aspects from the start. The findings from this study highlight the importance of expert-driven evaluations for gaining a thorough understanding of heuristic UI assessment, offering a wider perspective than user-perception-based methods like the questionnaire approach. Our research contributes to advancing UI evaluation methodologies, offering an organized and effective framework for future usability testing endeavors.
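
The idea of using transitivity to cut down pairwise comparisons can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details the abstract does not give. It assumes that an evaluator supplies only the n - 1 judgments between adjacent heuristics, that every other entry follows from a[i][k] = a[i][j] * a[j][k], and that weights are taken with the row geometric-mean method. The function names and the example ratios are hypothetical.

```python
import math


def matrix_from_chain(chain):
    """Build a full reciprocal comparison matrix from n - 1 adjacent judgments.

    chain[i] states how many times more important item i is than item i + 1.
    All remaining entries are derived by transitivity.
    """
    n = len(chain) + 1
    # log_score[i] is the log-importance of item i relative to the last item.
    log_score = [0.0] * n
    for i in range(n - 2, -1, -1):
        log_score[i] = log_score[i + 1] + math.log(chain[i])
    return [
        [math.exp(log_score[i] - log_score[k]) for k in range(n)]
        for i in range(n)
    ]


def ahp_weights(matrix):
    """Priority weights by the row geometric-mean method, normalised to sum to 1."""
    n = len(matrix)
    geo = [math.exp(sum(math.log(v) for v in row) / n) for row in matrix]
    total = sum(geo)
    return [g / total for g in geo]


# Hypothetical judgments: H1 is 2x as important as H2, H2 is 3x as important as H3.
chain = [2.0, 3.0]
matrix = matrix_from_chain(chain)
weights = ahp_weights(matrix)
print([round(w, 3) for w in weights])
```

With n heuristics this needs n - 1 judgments instead of the n(n - 1)/2 required by a full pairwise matrix, at the cost of assuming the evaluator's judgments are perfectly consistent.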

(This article belongs to the Special Issue Human Factor in Information Systems Development and Management)

18 pages, 577 KiB

Open Access Article

Application of Natural Language Processing and Genetic Algorithm to Fine-Tune Hyperparameters of Classifiers for Economic Activities Analysis

by Ivan Malashin, Igor Masich, Vadim Tynchenko, Vladimir Nelyub, Aleksei Borodulin and Andrei Gantimurov

Big Data Cogn. Comput. 2024, 8(6), 68; https://doi.org/10.3390/bdcc8060068 - 13 Jun 2024

Cited by 5 | Viewed by 1669

Abstract

This study proposes a method for classifying economic activity descriptors to match Nomenclature of Economic Activities (NACE) codes, employing a blend of machine learning techniques and expert evaluation. By leveraging natural language processing (NLP) methods to vectorize activity descriptors and utilizing genetic algorithm (GA) optimization to fine-tune hyperparameters in multi-class classifiers like Naive Bayes, Decision Trees, Random Forests, and Multilayer Perceptrons, our aim is to boost the accuracy and reliability of an economic classification system. This system faces challenges due to the absence of precise target labels in the dataset. Hence, it is essential to initially check the accuracy of utilized methods based on expert evaluations using a small dataset before generalizing to a larger one.
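
To make the genetic-algorithm tuning step concrete, the following is a generic sketch rather than the study's implementation. The search space, the operators (uniform crossover, resampling mutation, elitism), and the `fitness` function are all assumptions; in particular, `fitness` is a synthetic stand-in for the cross-validated accuracy that a real run would measure on vectorized activity descriptors.

```python
import random

# Hypothetical search space for a tree-ensemble classifier.
SPACE = {
    "max_depth": list(range(1, 21)),
    "n_estimators": list(range(10, 210, 10)),
}


def fitness(params):
    """Synthetic stand-in for cross-validated accuracy (peak at depth 8, 120 trees)."""
    return (
        1.0
        - abs(params["max_depth"] - 8) / 20
        - abs(params["n_estimators"] - 120) / 400
    )


def random_individual(rng):
    return {name: rng.choice(values) for name, values in SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one of the two parents.
    return {name: rng.choice((a[name], b[name])) for name in SPACE}


def mutate(ind, rng, rate=0.2):
    # Resample each gene from its allowed values with probability `rate`.
    return {
        name: rng.choice(SPACE[name]) if rng.random() < rate else value
        for name, value in ind.items()
    }


def evolve(generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        elite = population[: pop_size // 4]  # the best quarter survives unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return max(population, key=fitness), history


best, history = evolve()
print(best, round(fitness(best), 3))
```

Because the elite is carried over unchanged, the best fitness can never decrease from one generation to the next, which is one common reason to include elitism when each fitness evaluation is as expensive as a model training run.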

(This article belongs to the Special Issue Recent Advances in Big Data-Driven Prescriptive Analytics)

22 pages, 5439 KiB

Open Access Article

Research on Multimodal Transport of Electronic Documents Based on Blockchain

by Xueqi Qian, Lixin Shen, Dong Yang, Zhiwen Zhang and Zhihong Jin

Big Data Cogn. Comput. 2024, 8(6), 67; https://doi.org/10.3390/bdcc8060067 - 7 Jun 2024

Cited by 2 | Viewed by 2139

Abstract

Multimodal transport document collaboration is the foundation of multimodal transport operations. Blockchain technology can effectively address issues such as a lack of trust and difficulties in information sharing in current multimodal transport document collaboration. However, in current research on blockchain-based electronic documents, the bottleneck lies in the collaboration aspect of multimodal transport among multiple entities, known as the “one-bill coverage system” collaborative problem. The collaboration problem studied in this paper involves selecting suitable transport routes according to the shipper’s transport needs, and selecting the most suitable specific carrier from numerous carriers. To address the collaboration problem among multiple parties in the multimodal transport “one-bill coverage system”, a multiparty collaboration mechanism is designed. This mechanism includes two aspects: firstly, designing the architecture of the multimodal transport blockchain transport platform, which reengineers the operation process of the “one-bill coverage system” for container multimodal transport; secondly, constructing a multiparty collaboration decision-making model for the “one-bill coverage system” in multimodal transport. The model is solved and analyzed, and the collaboration strategy obtained is embedded in the application layer of the platform. Smart contracts related to the “one-bill coverage system” for multimodal transport are written in the Solidity language and deployed and executed on the Remix platform. The design of this mechanism can effectively improve the collaboration efficiency of participants in the “one-bill coverage system” for multimodal transport.

(This article belongs to the Special Issue Blockchain Meets IoT for Big Data)

30 pages, 3527 KiB

Open Access Review

Advancing Dental Diagnostics: A Review of Artificial Intelligence Applications and Challenges in Dentistry

by Dhiaa Musleh, Haya Almossaeed, Fay Balhareth, Ghadah Alqahtani, Norah Alobaidan, Jana Altalag and May Issa Aldossary

Big Data Cogn. Comput. 2024, 8(6), 66; https://doi.org/10.3390/bdcc8060066 - 7 Jun 2024

Cited by 9 | Viewed by 9929

Abstract

The rise of artificial intelligence has created and facilitated numerous everyday tasks in a variety of industries, including dentistry. Dentists have utilized X-rays for diagnosing patients’ ailments for many years. However, the procedure is typically performed manually, which can be challenging and time-consuming for non-specialists and carries a significant risk of error. As a result, researchers have turned to machine and deep learning modeling approaches to precisely identify dental disorders using X-ray images. This review is motivated by the need to address these challenges and to explore the potential of AI to enhance diagnostic accuracy, efficiency, and reliability in dental practice. Although artificial intelligence is frequently employed in dentistry, the approaches’ outcomes are still influenced by aspects such as dataset availability and quantity, class balance, and data interpretation capability. Consequently, it is critical to work with the research community to address these issues in order to identify the most effective approaches for use in ongoing investigations. This article, which is based on a literature review, provides a concise summary of the diagnosis process using X-ray imaging systems, offers a thorough understanding of the difficulties that dental researchers face, and presents a consolidated evaluation of the performances and methodologies assessed using publicly available benchmarks.

(This article belongs to the Special Issue Revolutionizing Healthcare: Exploring the Latest Advances in Digital Health Technology)

23 pages, 1647 KiB

Open Access Article

Harnessing Graph Neural Networks to Predict International Trade Flows

by Bassem Sellami, Chahinez Ounoughi, Tarmo Kalvet, Marek Tiits and Diego Rincon-Yanez

Big Data Cogn. Comput. 2024, 8(6), 65; https://doi.org/10.3390/bdcc8060065 - 7 Jun 2024

Cited by 5 | Viewed by 4160

Abstract

In the realm of international trade and economic development, the prediction of trade flows between countries is crucial for identifying export opportunities. Commonly used log-linear regression models are constrained due to difficulties when dealing with extensive, high-cardinality datasets, and the utilization of machine learning techniques in predictions offers new possibilities. We examine the predictive power of Graph Neural Networks (GNNs) in estimating the value of bilateral trade between countries. We work with detailed UN Comtrade data that represent annual bilateral trade in goods between any two countries in the world and more than 5000 product groups. We explore two different types of GNNs, namely Graph Convolutional Networks (GCNs) and Graph Attention Networks (GATs), by applying them to trade flow data. This study evaluates the effectiveness of GNNs relative to traditional machine learning techniques such as random forest and examines the possible effects of data drift on their performance. Our findings reveal the superior predictive capability of GNNs, suggesting their effectiveness in modeling complex trade relationships. The research presented in this work offers a data-driven foundation for decision-making and is relevant for business strategies and policymaking as it helps in identifying markets, products, and sectors with significant development potential.

(This article belongs to the Special Issue Recent Advances in Big Data-Driven Prescriptive Analytics)

29 pages, 2444 KiB

Open Access Review

Integrating OLAP with NoSQL Databases in Big Data Environments: Systematic Mapping

by Diana Martinez-Mosquera, Rosa Navarrete, Sergio Luján-Mora, Lorena Recalde and Andres Andrade-Cabrera

Big Data Cogn. Comput. 2024, 8(6), 64; https://doi.org/10.3390/bdcc8060064 - 5 Jun 2024

Cited by 2 | Viewed by 3747

Abstract

The growing importance of data analytics is leading to a shift in data management strategy at many companies, moving away from simple data storage towards adopting Online Analytical Processing (OLAP) query analysis. Concurrently, NoSQL databases are gaining ground as the preferred choice for storing and querying analytical data. This article presents a comprehensive, systematic mapping, aiming to consolidate research efforts related to the integration of OLAP with NoSQL databases in Big Data environments. After identifying 1646 initial research studies from scientific digital repositories, a thorough examination of their content resulted in the acceptance of 22 studies. Utilizing the snowballing technique, an additional three studies were selected, culminating in a final corpus of twenty-five relevant articles. This review addresses the growing importance of leveraging NoSQL databases for OLAP query analysis in response to increasing data analytics demands. By identifying the most commonly used NoSQL databases with OLAP, such as column-oriented and document-oriented, prevalent OLAP modeling methods, such as Relational Online Analytical Processing (ROLAP) and Multidimensional Online Analytical Processing (MOLAP), and suggested models for batch and real-time processing, among other results, this research provides a roadmap for organizations navigating the integration of OLAP with NoSQL. Additionally, exploring computational resource requirements and performance benchmarks facilitates informed decision making and promotes advancements in Big Data analytics. The main findings of this review provide valuable insights and updated information regarding the integration of OLAP cubes with NoSQL databases to benefit future research, industry practitioners, and academia alike. This consolidation of research efforts not only promotes innovative solutions but also promises reduced operational costs compared to traditional database systems.

17 pages, 1493 KiB

Open Access Article

LLMs and NLP Models in Cryptocurrency Sentiment Analysis: A Comparative Classification Study

by Konstantinos I. Roumeliotis, Nikolaos D. Tselikas and Dimitrios K. Nasiopoulos

Big Data Cogn. Comput. 2024, 8(6), 63; https://doi.org/10.3390/bdcc8060063 - 5 Jun 2024

Cited by 11 | Viewed by 10824

Abstract

Cryptocurrencies are becoming increasingly prominent in financial investments, with more investors diversifying their portfolios and individuals drawn to their ease of use and decentralized financial opportunities. However, this accessibility also brings significant risks and rewards, often influenced by news and the sentiments of crypto investors, known as crypto signals. This paper explores the capabilities of large language models (LLMs) and natural language processing (NLP) models in analyzing sentiment from cryptocurrency-related news articles. We fine-tune state-of-the-art models such as GPT-4, BERT, and FinBERT for this specific task, evaluating their performance and comparing their effectiveness in sentiment classification. By leveraging these advanced techniques, we aim to enhance the understanding of sentiment dynamics in the cryptocurrency market, providing insights that can inform investment decisions and risk management strategies. The outcomes of this comparative study contribute to the broader discourse on applying advanced NLP models to cryptocurrency sentiment analysis, with implications for both academic research and practical applications in financial markets.

(This article belongs to the Special Issue Generative AI and Large Language Models)

29 pages, 5167 KiB

Open Access Review

Insights into Industrial Efficiency: An Empirical Study of Blockchain Technology

by Kaoutar Douaioui and Othmane Benmoussa

Big Data Cogn. Comput. 2024, 8(6), 62; https://doi.org/10.3390/bdcc8060062 - 4 Jun 2024

Cited by 6 | Viewed by 2495

Abstract

Blockchain technology is expected to have a radical impact on most industries by boosting security, transparency, and efficiency. This work considers the potential benefits of blockchain-focused applications in industrial process monitoring. The research design facilitates a detailed bibliometric analysis and delivers insights into the intellectual structure of blockchain technology’s application in industry via scientometric approaches. The work also draws on numerous sources from various industrial sectors to identify the transformative role of blockchain in industrial processes. Aspects such as blockchain technology’s impact on the transparency of industrial processes are discussed, and the paper acknowledges that success stories about applying blockchain in industrial sectors are often exaggerated because the cryptocurrency domain has become a highly competitive environment. Finally, the work presents major research avenues and decision-making areas that should be tackled to maximize the disruptive potential of blockchain and create a secure, transparent, and inclusive future.

(This article belongs to the Special Issue Industrial Applications of IoT and Blockchain for Sustainable Environment)

16 pages, 2955 KiB

Open Access Article

Analyzing Trends in Digital Transformation Korean Social Media Data: A Semantic Network Analysis

by Jong-Hwi Song and Byung-Suk Seo

Big Data Cogn. Comput. 2024, 8(6), 61; https://doi.org/10.3390/bdcc8060061 - 4 Jun 2024

Viewed by 2245

Abstract

This study explores the impact of digital transformation on Korean society by analyzing Korean social media data, focusing on the societal and economic effects triggered by advancements in digital technology. Utilizing text mining techniques and semantic network analysis, we extracted key terms and their relationships from online news and blogs, identifying major themes related to digital transformation. Our analysis, based on data collected from major Korean portals using various related search terms, provides deep insights into how digital evolution influences individuals, businesses, and government sectors. The findings offer a comprehensive view of the technological and social trends emerging from digital transformation, including its policy, economic, and educational implications. This research not only sheds light on the understanding and strategic approaches to digital transformation in Korea but also demonstrates the potential of social media data in analyzing the societal impact of technological advancements, offering valuable resources for future research in effectively navigating the era of digital change.

(This article belongs to the Special Issue Challenges and Perspectives of Social Networks within Social Computing)

18 pages, 3050 KiB

Open Access Article

Quantifying Variations in Controversial Discussions within Kuwaiti Social Networks

by Yeonjung Lee, Hana Alostad and Hasan Davulcu

Big Data Cogn. Comput. 2024, 8(6), 60; https://doi.org/10.3390/bdcc8060060 - 4 Jun 2024

Cited by 1 | Viewed by 1384

Abstract

During the COVID-19 pandemic, pro-vaccine and anti-vaccine groups emerged, influencing others to vaccinate or abstain and leading to polarized debates. Due to incomplete user data and the complexity of social network interactions, understanding the dynamics of these discussions is challenging. This study aims to discover and quantify the factors driving the controversy related to vaccine stances across Kuwaiti social networks. To tackle these challenges, a graph convolutional network (GCN) and feature propagation (FP) were utilized to accurately detect users’ stances despite incomplete features, achieving an accuracy of 96%. Additionally, the random walk controversy (RWC) score was employed to quantify polarization points within the social networks. Experiments were conducted using a dataset of vaccine-related retweets and discussions from X (formerly Twitter) during the Kuwait COVID-19 vaccine rollout period. The analysis revealed high polarization periods correlating with specific vaccination rates and governmental announcements. This research provides a novel approach to accurately detecting user stances in low-resource languages like the Kuwaiti dialect without the need for costly annotations, offering valuable insights to help policymakers understand public opinion and address misinformation effectively.

19 pages, 331 KiB

Open Access Article

An Efficient Probabilistic Algorithm to Detect Periodic Patterns in Spatio-Temporal Datasets

by Claudio Gutiérrez-Soto, Patricio Galdames and Marco A. Palomino

Big Data Cogn. Comput. 2024, 8(6), 59; https://doi.org/10.3390/bdcc8060059 - 3 Jun 2024

Viewed by 1558

Abstract

Deriving insight from data is a challenging task for researchers and practitioners, especially when working on spatio-temporal domains. If pattern searching is involved, the complications introduced by temporal data dimensions create additional obstacles, as traditional data mining techniques are insufficient to address spatio-temporal databases (STDBs). We hereby present a new algorithm, which we refer to as F1/FP and which can be described as a probabilistic version of the Minus-F1 algorithm for finding periodic patterns. To the best of our knowledge, no previous work has compared the most cited algorithms in the literature for finding periodic patterns, namely Apriori, MS-Apriori, FP-Growth, Max-Subpattern, and PPA. Thus, we have carried out such comparisons and then evaluated our algorithm empirically using two datasets, showcasing its ability to handle different types of periodicity and data distributions. By conducting such a comprehensive comparative analysis, we have demonstrated that our newly proposed algorithm has a lower complexity than the existing alternatives and speeds up the performance regardless of the size of the dataset. We expect our work to contribute greatly to the mining of astronomical data and the permanently growing online streams derived from social media.

(This article belongs to the Special Issue Big Data and Information Science Technology)

23 pages, 1380 KiB

Open Access Article

Enhancing Self-Supervised Learning through Explainable Artificial Intelligence Mechanisms: A Computational Analysis

by Elie Neghawi and Yan Liu

Big Data Cogn. Comput. 2024, 8(6), 58; https://doi.org/10.3390/bdcc8060058 - 3 Jun 2024

Cited by 2 | Viewed by 2367

Abstract

Self-supervised learning continues to drive advancements in machine learning. However, the absence of unified computational processes for benchmarking and evaluation remains a challenge. This study conducts a comprehensive analysis of state-of-the-art self-supervised learning algorithms, emphasizing their underlying mechanisms and computational intricacies. Building upon this analysis, we introduce a unified model-agnostic computation (UMAC) process, tailored to complement modern self-supervised learning algorithms. UMAC serves as a model-agnostic and global explainable artificial intelligence (XAI) methodology that is capable of systematically integrating and enhancing state-of-the-art algorithms. Through UMAC, we identify key computational mechanisms and craft a unified framework for self-supervised learning evaluation. Leveraging UMAC, we integrate an XAI methodology to enhance transparency and interpretability. Our systematic approach yields a 17.12% improvement in training time complexity and a 13.1% improvement in testing time complexity. Notably, improvements are observed in augmentation, encoder architecture, and auxiliary components within the network classifier. These findings underscore the importance of structured computational processes in enhancing model efficiency and fortifying algorithmic transparency in self-supervised learning, paving the way for more interpretable and efficient AI models.

16 pages, 4754 KiB

Open Access Article

Dynamic Electrocardiogram Signal Quality Assessment Method Based on Convolutional Neural Network and Long Short-Term Memory Network

by Chen He, Yuxuan Wei, Yeru Wei, Qiang Liu and Xiang An

Big Data Cogn. Comput. 2024, 8(6), 57; https://doi.org/10.3390/bdcc8060057 - 31 May 2024

Cited by 1 | Viewed by 1799

Abstract

Cardiovascular diseases (CVDs) are highly prevalent, strike suddenly, and carry a relatively high fatality rate, posing a significant public health burden. Long-term dynamic electrocardiography, which continuously records the ECG activity of individuals in their daily lives, has high research value. However, ECG signals are weak and highly susceptible to external interference, which may lead to false alarms and misdiagnosis, affecting diagnostic efficiency and the utilization rate of healthcare resources. Research on the quality of dynamic ECG signals is therefore necessary. To address these problems, this paper proposes a dynamic ECG signal quality assessment method based on a convolutional neural network (CNN) and a long short-term memory (LSTM) network that divides the signal into three quality categories: the signal of the Q1 category has a lower noise level, which can be used for reliable diagnosis of arrhythmia, etc.; the signal of the Q2 category has a higher noise level, but it still contains information that can be used for heart rate calculation, HRV analysis, etc.; and the signal of the Q3 category has a higher noise level that can interfere with the diagnosis of cardiovascular disease and should be discarded or labeled. We use the widely recognized MIT-BIH database and, on that basis, apply the model to experimental data collected during real exercise to assess its performance in real-world situations. The model achieves an accuracy of 98.65% on the test set, a macro-averaged F1 score of 98.5%, and a high F1 score of 99.71% for the prediction of Q3 category signals, which shows that the model has good accuracy and generalization performance.
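
For readers unfamiliar with the reported metric, the following sketch shows how a macro-averaged F1 score is computed for a three-class problem such as Q1/Q2/Q3. The labels and predictions are invented toy values, not data from the study, and `macro_f1` is a hypothetical helper name.

```python
def macro_f1(y_true, y_pred, labels):
    """Return (macro-averaged F1, per-class F1) for the given class labels."""
    per_class = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        per_class[label] = 2 * precision * recall / denom if denom else 0.0
    # Macro averaging weights every class equally, regardless of its size.
    return sum(per_class.values()) / len(labels), per_class


# Toy example: one Q1 segment is misclassified as Q2.
y_true = ["Q1", "Q1", "Q2", "Q2", "Q3", "Q3"]
y_pred = ["Q1", "Q2", "Q2", "Q2", "Q3", "Q3"]
score, per_class = macro_f1(y_true, y_pred, ["Q1", "Q2", "Q3"])
print(round(score, 4), per_class)
```

Because each class contributes equally to the average, a macro F1 close to the overall accuracy suggests that performance is not being carried by the most frequent category alone.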

18 pages, 4725 KiB

Open Access Article

Stock Trend Prediction with Machine Learning: Incorporating Inter-Stock Correlation Information through Laplacian Matrix

by Wenxuan Zhang and Benzhuo Lu

Big Data Cogn. Comput. 2024, 8(6), 56; https://doi.org/10.3390/bdcc8060056 - 30 May 2024

Cited by 1 | Viewed by 3442

Abstract

Predicting stock trends in financial markets is of significant importance to investors and portfolio managers. In addition to a stock’s historical price information, the correlation between that stock and others can also provide valuable information for forecasting future returns. Existing methods often fail to capture the intricate interdependencies between stocks in a straightforward and effective way. In this research, we introduce the concept of a Laplacian correlation graph (LOG), designed to explicitly model the correlations in stock price changes as the edges of a graph. After constructing the LOG, we build a machine learning model, such as a graph attention network (GAT), and incorporate the LOG into the loss term. This innovative loss term is designed to empower the neural network to learn and leverage price correlations among different stocks in a straightforward but effective manner. The advantage of a Laplacian matrix is that its matrix operation form is well suited to current machine learning frameworks, thus achieving high computational efficiency and simpler model representation. Experimental results demonstrate improvements across multiple evaluation metrics using our LOG. Incorporating our LOG into five base machine learning models consistently enhances their predictive performance. Furthermore, backtesting results reveal superior returns and information ratios, underscoring the practical implications of our approach for real-world investment decisions. Our study addresses the limitations of existing methods that miss the correlation between stocks or fail to model correlation in a simple and effective way, and the proposed LOG emerges as a promising tool for stock returns prediction, offering enhanced predictive accuracy and improved investment outcomes.
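
A graph Laplacian can enter a loss term as a quadratic penalty, which the sketch below illustrates in plain Python. The abstract does not specify how the LOG is built or weighted, so everything here is an assumption: edges come from thresholded Pearson correlation of return series, the Laplacian is L = D - A, and the penalty is p^T L p, which equals half the sum over i and j of A[i][j] * (p[i] - p[j])^2. The return series and predictions are invented toy values.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def correlation_laplacian(returns, threshold=0.5):
    """Laplacian L = D - A, with A[i][j] = correlation if it exceeds the threshold."""
    n = len(returns)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            c = pearson(returns[i], returns[j])
            if c > threshold:  # assumed rule: keep only strongly positive links
                adj[i][j] = adj[j][i] = c
    return [
        [sum(adj[i]) if i == j else -adj[i][j] for j in range(n)]
        for i in range(n)
    ]


def laplacian_penalty(pred, lap):
    """Quadratic form pred^T L pred; small when linked stocks get similar predictions."""
    n = len(pred)
    return sum(pred[i] * lap[i][j] * pred[j] for i in range(n) for j in range(n))


# Toy return series: stocks 0 and 1 move together, stock 2 moves against them.
returns = [
    [0.01, -0.02, 0.03, 0.00],
    [0.02, -0.04, 0.06, 0.00],
    [-0.01, 0.02, -0.03, 0.01],
]
lap = correlation_laplacian(returns)
agree = laplacian_penalty([0.5, 0.5, -0.2], lap)      # linked stocks predicted alike
disagree = laplacian_penalty([0.5, -0.5, -0.2], lap)  # linked stocks predicted apart
print(round(agree, 6), round(disagree, 6))
```

In a training loop, a term of this kind would typically be added to the base prediction loss with a tunable coefficient, so that the model is nudged toward consistent predictions for correlated stocks without being forced to make them identical.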

(This article belongs to the Special Issue Big Data Analytics and Edge Computing: Recent Trends and Future)

25 pages, 8282 KiB

Open AccessArticle

A Secure Data Publishing and Access Service for Sensitive Data from Living Labs: Enabling Collaboration with External Researchers via Shareable Data

by Mikel Hernandez, Evdokimos Konstantinidis, Gorka Epelde, Francisco Londoño, Despoina Petsani, Michalis Timoleon, Vasiliki Fiska, Lampros Mpaltadoros, Christoniki Maga-Nteve, Ilias Machairas and Panagiotis D. Bamidis

Big Data Cogn. Comput. 2024, 8(6), 55; https://doi.org/10.3390/bdcc8060055 - 28 May 2024

Cited by 4 | Viewed by 1862

Abstract

To enable broader collaboration with the scientific community while maintaining the privacy of the data stored and generated in Living Labs, this paper presents the Shareable Data Publishing and Access Service for Living Labs, implemented within the framework of the H2020 VITALISE project. Building upon previous work, we present significant enhancements to the architecture, enabling Living Labs to securely publish collected data in an internal and isolated node for external use. External researchers can access a portal to discover and download shareable data versions (anonymised or synthetic data) derived from the data stored across different Living Labs, which they can use to develop, test, and debug their processing scripts locally while adhering to legal and ethical data handling practices. Subsequently, they may request remote execution of the same algorithms against the real internal data in Living Lab nodes and compare the outcomes with those obtained using shareable data. The paper details the architecture, data flows, technical details, and validation of the service with real-world usage examples, demonstrating its efficacy in promoting data-driven research in digital health while preserving privacy. The presented service can be used as an intermediary between Living Labs and external researchers for secure data exchange and to accelerate research on data analytics paradigms in digital health, ensuring compliance with data protection laws.

(This article belongs to the Special Issue Privacy-Enhancing Technologies of Data for Sustainable and Secure Cooperation)

20 pages, 70388 KiB

Open AccessArticle

Analyzing the Attractiveness of Food Images Using an Ensemble of Deep Learning Models Trained via Social Media Images

by Tanyaboon Morinaga, Karn Patanukhom and Yuthapong Somchit

Big Data Cogn. Comput. 2024, 8(6), 54; https://doi.org/10.3390/bdcc8060054 - 27 May 2024

Viewed by 1810

Abstract

With the growth of digital media and social networks, sharing visual content has become common in people’s daily lives. In the food industry, visually appealing food images can attract attention, drive engagement, and influence consumer behavior. Therefore, it is crucial for businesses to understand what constitutes attractive food images. Assessing the attractiveness of food images poses significant challenges due to the lack of large labeled datasets that align with diverse public preferences. Additionally, it is challenging for computer assessments to approach human judgment in evaluating aesthetic quality. This paper presents a novel framework that circumvents the need for explicit human annotation by leveraging user engagement data that are readily available on social media platforms. We propose procedures to collect, filter, and automatically label the attractiveness classes of food images based on their user engagement levels. The data gathered from social media are used to create predictive models for category-specific attractiveness assessments. Our experiments across five food categories demonstrate the efficiency of our approach. The experimental results show that our proposed user-engagement-based attractiveness class labeling achieves 97.2% consistency with human judgments obtained through A/B testing. Separate attractiveness assessment models were created for each food category using convolutional neural networks (CNNs). When analyzing unseen food images, our models achieve 76.0% consistency with human judgments. The experimental results suggest that the food image dataset collected from social networks using the proposed framework can be successfully utilized for learning food attractiveness assessment models.

(This article belongs to the Special Issue Advances and Applications of Deep Learning Methods and Image Processing)

23 pages, 2866 KiB

Open AccessArticle

Exploiting Rating Prediction Certainty for Recommendation Formulation in Collaborative Filtering

by Dionisis Margaris, Kiriakos Sgardelis, Dimitris Spiliotopoulos and Costas Vassilakis

Big Data Cogn. Comput. 2024, 8(6), 53; https://doi.org/10.3390/bdcc8060053 - 27 May 2024

Cited by 5 | Viewed by 1514

Abstract

Collaborative filtering is a popular recommender system (RecSys) method that produces rating prediction values for products by combining the ratings that close users have already given to the same products. The products that achieve the highest prediction values are then recommended to the user. However, prediction estimates may contain errors, which, in the case of a RecSys, lead either to not recommending a product that the user would actually like (i.e., purchase, watch, or listen to) or to recommending a product that the user would not like; both cases degrade recommendation quality. Especially in the latter case, the RecSys would be deemed unreliable. In this work, we design and develop a recommendation algorithm that considers both the rating prediction values and the prediction confidence, derived from features associated with rating prediction accuracy in collaborative filtering. The presented algorithm is based on the rationale that it is preferable to recommend an item with a slightly lower prediction value, if that prediction seems to be certain and safe, over another that has a higher value but lower certainty. The proposed algorithm prevents low-confidence rating predictions from being included in recommendations, ensuring the recommendation quality and reliability of the RecSys.
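
The rationale of preferring a certain prediction over a higher but uncertain one can be sketched as a filter-then-rank step. This is a minimal illustration, not the algorithm from the article: the `confidence` score in [0, 1], the `min_confidence` cut-off, and the `Prediction` and `recommend` names are all hypothetical, and the abstract does not state how confidence is computed beyond its being derived from features associated with prediction accuracy.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    item: str
    value: float       # predicted rating for the target user
    confidence: float  # hypothetical certainty score in [0, 1]


def recommend(predictions, k=3, min_confidence=0.6):
    """Return up to k item names, ranked by predicted rating.

    Predictions below `min_confidence` are dropped before ranking, so a
    high-valued but uncertain item never displaces a confident one.
    """
    confident = [p for p in predictions if p.confidence >= min_confidence]
    ranked = sorted(confident, key=lambda p: p.value, reverse=True)
    return [p.item for p in ranked[:k]]
```

With the threshold set to zero, the function reduces to the plain top-k ranking by prediction value described at the start of the abstract, which makes the effect of the confidence filter easy to compare.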

(This article belongs to the Special Issue Business Intelligence and Big Data in E-commerce)

14 pages, 4246 KiB

Open AccessArticle

Image-Based Leaf Disease Recognition Using Transfer Deep Learning with a Novel Versatile Optimization Module

by Petra Radočaj, Dorijan Radočaj and Goran Martinović

Big Data Cogn. Comput. 2024, 8(6), 52; https://doi.org/10.3390/bdcc8060052 - 23 May 2024

Cited by 6 | Viewed by 2817

Abstract

Given the projected 70% increase in food production by 2050, crops should be additionally protected from diseases and pests to ensure a sufficient food supply. Transfer deep learning approaches provide a more efficient solution than traditional methods, which are labor-intensive and struggle to effectively monitor large areas, leading to delayed disease detection. This study proposed a versatile module based on the Inception module, Mish activation function, and Batch normalization (IncMB) as a part of deep neural networks. A convolutional neural network (CNN) with transfer learning was used as the base for the evaluated approaches to tomato disease detection: (1) CNNs, (2) CNNs with a support vector machine (SVM), and (3) CNNs with the proposed IncMB module. In the experiment, the public dataset PlantVillage was used, containing images of six different tomato leaf diseases. The best result, an accuracy of 97.78%, was achieved by the pre-trained InceptionV3 network containing an IncMB module. In three out of four cases, the highest accuracy was achieved by networks containing the proposed IncMB module in comparison to the evaluated CNNs. The proposed IncMB module represented an improvement in the early detection of plant diseases, providing a basis for timely leaf disease detection.
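
For readers unfamiliar with the Mish activation named above, its standard definition is Mish(x) = x · tanh(softplus(x)), where softplus(x) = ln(1 + e^x). The scalar sketch below shows only this activation; it does not reproduce the IncMB module, whose layer layout is not given in the abstract. The function names are illustrative.

```python
import math


def softplus(x):
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: smooth, non-monotonic, close to identity for large x."""
    return x * math.tanh(softplus(x))
```

Unlike ReLU, Mish lets small negative values pass through with a small negative output instead of clamping them to zero.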

(This article belongs to the Topic Big Data and Artificial Intelligence, 2nd Volume)

20 pages, 4936 KiB

Open AccessArticle

Development of Context-Based Sentiment Classification for Intelligent Stock Market Prediction

by Nurmaganbet Smatov, Ruslan Kalashnikov and Amandyk Kartbayev

Big Data Cogn. Comput. 2024, 8(6), 51; https://doi.org/10.3390/bdcc8060051 - 22 May 2024

Cited by 4 | Viewed by 2280

Abstract

This paper presents a novel approach to sentiment analysis tailored to predicting stock market movements, bypassing the need for external dictionaries, which are unavailable for many languages. Our methodology directly analyzes textual data, with a particular focus on context-specific sentiment words within neural network models. This specificity ensures that our sentiment analysis is both relevant and accurate in identifying trends in the stock market. We employ sophisticated mathematical modeling techniques to enhance both the precision and interpretability of our models. Through meticulous data handling and advanced machine learning methods, we leverage large datasets from Twitter and financial markets to examine the impact of social media sentiment on financial trends. We achieved an accuracy exceeding 75%, highlighting the effectiveness of our modeling approach, which we further refined into a convolutional neural network model. This achievement contributes valuable insights into sentiment analysis within the financial domain, thereby improving the overall clarity of forecasting in this field.

(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)

Big Data Cogn. Comput., Volume 8, Issue 6 (June 2024) – 20 articles
