Big Data and Information Science Technology

A special issue of Big Data and Cognitive Computing (ISSN 2504-2289).

Deadline for manuscript submissions: closed (30 September 2024) | Viewed by 11186

Special Issue Editors


Guest Editor
Department of Information Engineering (DII), Polytechnic University of Marche, 60121 Ancona, Italy
Interests: big data analytics; social network analysis; network theory and practice; machine learning

Guest Editor
Department of Information Engineering, Polytechnic University of Marche, 60121 Ancona, Italy
Interests: social and complex network analysis; hypernetwork and network science; Internet of Things; advanced algorithms for sequence comparison; pattern mining; logic programming; data science

Guest Editor
Department of Information Engineering, Polytechnic University of Marche, 60121 Ancona, Italy
Interests: big data analytics; social network analysis; deep learning; machine learning

Special Issue Information

Dear Colleagues,

The digital age has brought about an unprecedented surge in data, leading to the emergence of big data and information science technology. The integration and use of big data technologies is of utmost importance within information science. This can be furthered by highlighting innovative research and applications in data science across various fields such as business, healthcare, education, and more. In doing so, we aim to revolutionize decision-making processes and offer unprecedented insights.

This Special Issue seeks to showcase original research articles and reviews on themes including big data analytics, machine learning, artificial intelligence for data management, real-time data processing, and data security in big data. By exploring these topics, we aim to contribute to the ongoing discourse on the potential of big data and information science technology to transform various sectors.

In this Special Issue, original research articles and reviews are welcome. Research areas may include (but are not limited to) the following:

  • Predictive analytics and machine learning algorithms for big data;
  • Data mining and knowledge discovery in large-scale datasets;
  • Natural language processing and text mining in big data;
  • Visualization techniques for big data analysis and exploration;
  • Big data-driven decision support systems and applications;
  • Scalable and distributed computing frameworks for big data processing;
  • Privacy-preserving techniques in big data analytics;
  • Real-time stream processing and analytics for big data;
  • Big data integration, fusion, and interoperability;
  • Ethical and legal considerations in the era of big data.

We look forward to receiving your contributions.

Dr. Enrico Corradini
Dr. Francesco Cauteruccio
Dr. Luca Virgili
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Big Data and Cognitive Computing is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • big data
  • information science
  • data analytics
  • data security
  • machine learning
  • data mining
  • real-time data processing

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (6 papers)


Research

20 pages, 831 KiB  
Article
Leveraging Mixture of Experts and Deep Learning-Based Data Rebalancing to Improve Credit Fraud Detection
by Zeyuan Yang, Yixuan Wang, Haokun Shi and Qiang Qiu
Big Data Cogn. Comput. 2024, 8(11), 151; https://doi.org/10.3390/bdcc8110151 - 5 Nov 2024
Viewed by 1020
Abstract
Credit card fraud detection is a critical challenge in the financial sector due to the rapidly evolving tactics of fraudsters and the significant class imbalance between legitimate and fraudulent transactions. Traditional models, while effective to some extent, often suffer from high false positive rates and fail to generalize well to emerging fraud patterns. In this paper, we propose a novel approach that integrates a Mixture of Experts (MoE) model with a Deep Neural Network-based Synthetic Minority Over-sampling Technique (DNN-SMOTE) to enhance fraud detection performance. The MoE model leverages multiple specialized expert networks, each trained to detect specific types of fraud, while the DNN-SMOTE generates high-quality synthetic samples to address the class imbalance. Our experimental results on a publicly available dataset demonstrate that the proposed method achieves a classification accuracy of 99.93%, a true positive rate of 84.69%, and a true negative rate of 99.95%. The Matthews Correlation Coefficient (MCC) of 0.7883 further highlights the model’s balanced performance in detecting fraudulent transactions. These results underscore the effectiveness of combining MoE with DNN-SMOTE, offering a robust solution for real-world credit card fraud detection scenarios. Full article
(This article belongs to the Special Issue Big Data and Information Science Technology)

23 pages, 7702 KiB  
Article
MedNER: A Service-Oriented Framework for Chinese Medical Named-Entity Recognition with Real-World Application
by Weisi Chen, Pengxiang Qiu and Francesco Cauteruccio
Big Data Cogn. Comput. 2024, 8(8), 86; https://doi.org/10.3390/bdcc8080086 - 2 Aug 2024
Viewed by 943
Abstract
Named-entity recognition (NER) is a crucial task in natural language processing, especially for extracting meaningful information from unstructured text data. In the healthcare domain, accurate NER can significantly enhance patient care by enabling efficient extraction and analysis of clinical information. This paper presents MedNER, a novel service-oriented framework designed specifically for medical NER in Chinese medical texts. MedNER leverages advanced deep learning techniques and domain-specific linguistic resources to achieve good performance in identifying diabetes-related entities such as symptoms, tests, and drugs. The framework integrates seamlessly with real-world healthcare systems, offering scalable and efficient solutions for processing large volumes of clinical data. This paper provides an in-depth discussion on the architecture and implementation of MedNER, featuring the concept of Deep Learning as a Service (DLaaS). A prototype has encapsulated BiLSTM-CRF and BERT-BiLSTM-CRF models into the core service, demonstrating its flexibility, usability, and effectiveness in addressing the unique challenges of Chinese medical text processing. Full article
(This article belongs to the Special Issue Big Data and Information Science Technology)

19 pages, 331 KiB  
Article
An Efficient Probabilistic Algorithm to Detect Periodic Patterns in Spatio-Temporal Datasets
by Claudio Gutiérrez-Soto, Patricio Galdames and Marco A. Palomino
Big Data Cogn. Comput. 2024, 8(6), 59; https://doi.org/10.3390/bdcc8060059 - 3 Jun 2024
Viewed by 1058
Abstract
Deriving insight from data is a challenging task for researchers and practitioners, especially when working on spatio-temporal domains. If pattern searching is involved, the complications introduced by temporal data dimensions create additional obstacles, as traditional data mining techniques are insufficient to address spatio-temporal databases (STDBs). We hereby present a new algorithm, which we refer to as F1/FP, and can be described as a probabilistic version of the Minus-F1 algorithm to look for periodic patterns. To the best of our knowledge, no previous work has compared the most cited algorithms in the literature to look for periodic patterns—namely, Apriori, MS-Apriori, FP-Growth, Max-Subpattern, and PPA. Thus, we have carried out such comparisons and then evaluated our algorithm empirically using two datasets, showcasing its ability to handle different types of periodicity and data distributions. By conducting such a comprehensive comparative analysis, we have demonstrated that our newly proposed algorithm has a smaller complexity than the existing alternatives and speeds up the performance regardless of the size of the dataset. We expect our work to contribute greatly to the mining of astronomical data and the permanently growing online streams derived from social media. Full article
(This article belongs to the Special Issue Big Data and Information Science Technology)
24 pages, 429 KiB  
Article
Cancer Detection Using a New Hybrid Method Based on Pattern Recognition in MicroRNAs Combining Particle Swarm Optimization Algorithm and Artificial Neural Network
by Sepideh Molaei, Stefano Cirillo and Giandomenico Solimando
Big Data Cogn. Comput. 2024, 8(3), 33; https://doi.org/10.3390/bdcc8030033 - 19 Mar 2024
Cited by 1 | Viewed by 2307
Abstract
MicroRNAs (miRNAs) play a crucial role in cancer development, but not all miRNAs are equally significant in cancer detection. Traditional methods face challenges in effectively identifying cancer-associated miRNAs due to data complexity and volume. This study introduces a novel, feature-based technique for detecting attributes related to cancer-affecting microRNAs. It aims to enhance cancer diagnosis accuracy by identifying the most relevant miRNAs for various cancer types using a hybrid approach. In particular, we used a combination of particle swarm optimization (PSO) and artificial neural networks (ANNs) for this purpose. PSO was employed for feature selection, focusing on identifying the most informative miRNAs, while ANNs were used for recognizing patterns within the miRNA data. This hybrid method aims to overcome limitations in traditional miRNA analysis by reducing data redundancy and focusing on key genetic markers. The application of this method showed a significant improvement in the detection accuracy for various cancers, including breast and lung cancer and melanoma. Our approach demonstrated a higher precision in identifying relevant miRNAs compared to existing methods, as evidenced by the analysis of different datasets. The study concludes that the integration of PSO and ANNs provides a more efficient, cost-effective, and accurate method for cancer detection via miRNA analysis. This method can serve as a supplementary tool for cancer diagnosis and potentially aid in developing personalized cancer treatments. Full article
(This article belongs to the Special Issue Big Data and Information Science Technology)

22 pages, 9109 KiB  
Article
Temporal Dynamics of Citizen-Reported Urban Challenges: A Comprehensive Time Series Analysis
by Andreas F. Gkontzis, Sotiris Kotsiantis, Georgios Feretzakis and Vassilios S. Verykios
Big Data Cogn. Comput. 2024, 8(3), 27; https://doi.org/10.3390/bdcc8030027 - 4 Mar 2024
Cited by 1 | Viewed by 1923
Abstract
In an epoch characterized by the swift pace of digitalization and urbanization, the essence of community well-being hinges on the efficacy of urban management. As cities burgeon and transform, the need for astute strategies to navigate the complexities of urban life becomes increasingly paramount. This study employs time series analysis to scrutinize citizen interactions with the coordinate-based problem mapping platform in the Municipality of Patras in Greece. The research explores the temporal dynamics of reported urban issues, with a specific focus on identifying recurring patterns through the lens of seasonality. The analysis, employing the seasonal decomposition technique, dissects time series data to expose trends in reported issues and areas of the city that might be obscured in raw big data. It accentuates a distinct seasonal pattern, with concentrations peaking during the summer months. The study extends its approach to forecasting, providing insights into the anticipated evolution of urban issues over time. Projections for the coming years show a consistent upward trend in both overall city issues and those reported in specific areas, with distinct seasonal variations. This comprehensive exploration of time series analysis and seasonality provides valuable insights for city stakeholders, enabling informed decision-making and predictions regarding future urban challenges. Full article
(This article belongs to the Special Issue Big Data and Information Science Technology)

26 pages, 4290 KiB  
Article
A Model for Enhancing Unstructured Big Data Warehouse Execution Time
by Marwa Salah Farhan, Amira Youssef and Laila Abdelhamid
Big Data Cogn. Comput. 2024, 8(2), 17; https://doi.org/10.3390/bdcc8020017 - 6 Feb 2024
Cited by 1 | Viewed by 2611
Abstract
Traditional data warehouses (DWs) have played a key role in business intelligence and decision support systems. However, the rapid growth of the data generated by the current applications requires new data warehousing systems. In big data, it is important to adapt the existing warehouse systems to overcome new issues and limitations. The main drawbacks of traditional Extract–Transform–Load (ETL) are that a huge amount of data cannot be processed over ETL and that the execution time is very high when the data are unstructured. This paper focuses on a new model consisting of four layers: Extract–Clean–Load–Transform (ECLT), designed for processing unstructured big data, with specific emphasis on text. The model aims to reduce execution time through experimental procedures. ECLT is applied and tested using Spark, which is a framework employed in Python. Finally, this paper compares the execution time of ECLT with different models by applying two datasets. Experimental results showed that for a data size of 1 TB, the execution time of ECLT is 41.8 s. When the data size increases to 1 million articles, the execution time is 119.6 s. These findings demonstrate that ECLT outperforms ETL, ELT, DELT, ELTL, and ELTA in terms of execution time. Full article
(This article belongs to the Special Issue Big Data and Information Science Technology)
