Big Data and Large-Scale Data Processing Applications

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (15 February 2024) | Viewed by 5936

Special Issue Editors


Dr. Leonidas Akritidis
Guest Editor
Department of Information and Electronic Engineering, International Hellenic University, 57001 Thermi, Greece
Interests: machine learning; deep learning; data mining; parallel algorithms; data structures; information retrieval

Prof. Dr. Panayiotis Bozanis
Guest Editor
Department of Science and Technology, International Hellenic University, 57001 Thessaloniki, Greece
Interests: algorithms; data structures; data mining; machine learning

Special Issue Information

Dear Colleagues,

Nowadays, data of all kinds are generated at unprecedented rates. Huge online social communities with billions of users, together with billions of IoT devices and sensors, contribute to this rapid data production on a daily basis. Expectedly, these massive volumes of data have given rise to challenging problems in data management, engineering and processing. As a natural consequence, numerous researchers worldwide are working to introduce robust and efficient solutions to these problems.

The relevant literature includes studies from two major categories: parallel algorithm development, and infrastructure organization and enhancement. The first category introduces methods for efficiently managing and processing large volumes of data via new or existing algorithmic approaches. Due to the overwhelming amount of underlying data, these solutions are systematically deployed on massively parallel and distributed architectures such as large computer clusters, multi-CPU systems and highly parallel GPUs. The second category, in turn, presents methods for effectively organizing and enhancing the architecture of these parallel infrastructures.

This Special Issue aims to attract high-quality research works that deal with “Big Data and Large-Scale Data Processing Applications” in the fields of:

  1. e-Commerce;
  2. Social networks, blogs and online communities;
  3. Internet of Things;
  4. e-Government;
  5. Smart cities and smart transportation;
  6. Power data management and processing;
  7. Text management and processing;
  8. Image/Multimedia data management and processing;
  9. Sensor-generated data management and processing;
  10. Cloud and fog computing;
  11. Cybersecurity.

Dr. Leonidas Akritidis
Prof. Dr. Panayiotis Bozanis
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • big data
  • parallel algorithms
  • distributed systems
  • large-scale data processing
  • cloud computing

Published Papers (5 papers)


Research

20 pages, 3047 KiB  
Article
Synergism of Fuzzy Leaky Bucket with Virtual Buffer for Large Scale Social Driven Energy Allocation in Emergencies in Smart City Zones
by Miltiadis Alamaniotis and Michail Alexiou
Electronics 2024, 13(4), 762; https://doi.org/10.3390/electronics13040762 - 14 Feb 2024
Viewed by 421
Abstract
Smart cities can be viewed as expansive systems that optimize operational quality and deliver a range of services, particularly in the realm of energy management. Identifying energy zones within smart cities marks an initial step towards ensuring equitable energy distribution driven by factors beyond energy considerations. This study introduces a socially oriented methodology for energy allocation during emergencies, implemented at the zone level to address justice concerns. The proposed method integrates a fuzzy leaky bucket model with an energy virtual buffer, leveraging extensive data from diverse city zones to allocate energy resources during emergent situations. By employing fuzzy sets and rules, the leaky bucket mechanism distributes buffered energy to zones, aiming to maximize energy utilization while promoting social justice principles. Evaluation of the approach utilizes consumption data from simulated smart city zones during energy-constrained emergencies, comparing it against a uniform allocation method. Results demonstrate the socially equitable allocation facilitated by the proposed methodology.
(This article belongs to the Special Issue Big Data and Large-Scale Data Processing Applications)
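
For readers less familiar with the mechanism, a minimal Python sketch of a leaky-bucket style allocation is given below, in which each zone's share of the buffered energy is weighted by simple triangular fuzzy memberships of its demand and social criticality. The zone data, membership shapes and weighting rule are illustrative assumptions, not values taken from the paper.

    # Illustrative sketch only: fuzzy-weighted leaky-bucket allocation of a
    # buffered energy reserve across city zones. All figures are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class Zone:
        name: str
        demand_kwh: float   # energy requested during the emergency
        criticality: float  # social criticality score in [0, 1]

    def tri(x: float, a: float, b: float, c: float) -> float:
        """Triangular membership function peaking at b."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    def fuzzy_weight(zone: Zone, max_demand: float) -> float:
        """Combine 'high demand' and 'high criticality' memberships (max rule)."""
        high_demand = tri(zone.demand_kwh / max_demand, 0.3, 1.0, 1.7)
        high_crit = tri(zone.criticality, 0.3, 1.0, 1.7)
        return max(high_demand, high_crit)

    def allocate(buffer_kwh: float, zones: list) -> dict:
        """Leak buffered energy to zones in proportion to their fuzzy weights,
        never exceeding a zone's own demand."""
        max_demand = max(z.demand_kwh for z in zones)
        weights = {z.name: fuzzy_weight(z, max_demand) for z in zones}
        total = sum(weights.values()) or 1.0
        return {z.name: min(buffer_kwh * weights[z.name] / total, z.demand_kwh)
                for z in zones}

    zones = [Zone("hospital-district", 800, 0.95),
             Zone("residential-a", 500, 0.40),
             Zone("industrial-park", 1200, 0.20)]
    print(allocate(1500, zones))

Any surplus left after capping a zone at its own demand would, in a fuller implementation, be redistributed in further leak rounds.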

23 pages, 3805 KiB  
Article
A Software Testing Workflow Analysis Tool Based on the ADCV Method
by Zijian Mao, Qiang Han, Yu He, Nan Li, Cong Li, Zhihui Shan and Sheng Han
Electronics 2023, 12(21), 4464; https://doi.org/10.3390/electronics12214464 - 30 Oct 2023
Viewed by 970
Abstract
Based on two progressive aspects of the modeling problems in business process management (BPM), (1) in order to address the increasing complexity of user requirements on workflows underlying various BPM application scenarios, a more verifiable fundamental modeling method must be invented; (2) to address the diversification of software testing processes, more formalized advanced modeling technology must also be applied based on the fundamental modeling method. Aiming to address these modeling problems, this paper first proposes an ADCV (acquisition, decomposition, combination, and verification) method that runs through the core management links of four types of business processes (mining, decomposition, recombination, and verification) and then describes the compositional structure of the ADCV method and the design of corresponding algorithms. Then, the software testing workflow is managed and monitored using the method, and the corresponding analysis tool is implemented based on Petri nets. At the same time, the tool is applied to the case processing of the software testing workflow. Specifically, the workflow models are established successively through ADCV during the process of business iteration. Then, the analysis tool developed with the ADCV method, the model–view–controller (MVC) design pattern, and Java Swing technology are applied to instances of the software testing workflow to realize the modeling and management of the testing processes. Thus, the analysis tool can guarantee the accuracy of the parameter estimations of related software reliability growth models (SRGMs) and ultimately improve the quality of software products.
(This article belongs to the Special Issue Big Data and Large-Scale Data Processing Applications)
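
As a rough illustration of the Petri-net workflow models such a tool operates on, the sketch below implements a generic net with places, transitions and token firing; the toy testing workflow (design, execute, report) and all names are hypothetical and do not reproduce the ADCV algorithms themselves.

    # Generic Petri-net sketch: a transition consumes one token from each of
    # its input places and produces one token in each of its output places.
    class PetriNet:
        def __init__(self, transitions, marking):
            # transitions: name -> (list of input places, list of output places)
            self.transitions = transitions
            self.marking = dict(marking)

        def enabled(self, name):
            inputs, _ = self.transitions[name]
            return all(self.marking.get(p, 0) > 0 for p in inputs)

        def fire(self, name):
            if not self.enabled(name):
                raise ValueError(f"transition {name!r} is not enabled")
            inputs, outputs = self.transitions[name]
            for p in inputs:
                self.marking[p] -= 1
            for p in outputs:
                self.marking[p] = self.marking.get(p, 0) + 1

    # Toy testing workflow: design test cases -> execute them -> report defects.
    net = PetriNet(
        transitions={
            "design":  (["requirements"], ["test_cases"]),
            "execute": (["test_cases", "build"], ["results"]),
            "report":  (["results"], ["defect_log"]),
        },
        marking={"requirements": 1, "build": 1},
    )
    for t in ["design", "execute", "report"]:
        net.fire(t)
    print(net.marking)  # the remaining token ends up in 'defect_log'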

17 pages, 2904 KiB  
Article
Large-Scale Service Function Chaining Management and Orchestration in Smart City
by Prohim Tam, Seungwoo Kang, Seyha Ros, Inseok Song and Seokhoon Kim
Electronics 2023, 12(19), 4018; https://doi.org/10.3390/electronics12194018 - 24 Sep 2023
Viewed by 985
Abstract
In the core networking of smart cities, mobile network operators need solutions to reflect service function chaining (SFC) orchestration policies while ensuring efficient resource utilization and preserving quality of service (QoS) in large-scale networking congestion states. To offer this solution, we observe the standardized QoS class identifiers of smart city scenarios. Then, we reflect the service criticalities via cloning virtual network function (VNF) with reserved resources for ensuring effective scheduling of request queue management. We employ graph neural networks (GNN) with a message-passing mechanism to iteratively update hidden states of VNF nodes with the objectives of enhancing allocation of resource blocks, accurate detection of availability statuses, and duplication of heavily congested instances. The deployment properties of smart city use cases are presented along with their intelligent service functions, and we aim to activate a modular architecture with multi-purpose VNFs and chaining isolation for generalizing global instances. Experimental simulation is conducted to illustrate how the proposed scheme performs under different congestion levels of SFC request rates, while capturing the key performance metrics of average delay, acceptance ratios, and completion ratios.
(This article belongs to the Special Issue Big Data and Large-Scale Data Processing Applications)
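
The message-passing idea applied to the VNF nodes can be sketched in a few lines: each node's hidden state is repeatedly updated from its neighbours' states, and a readout can then rank nodes for resource-block allocation or duplication. The graph, dimensions and random weights below are illustrative only and do not reflect the authors' trained model.

    # Bare-bones message passing over a small service-function-chain graph.
    import numpy as np

    rng = np.random.default_rng(0)
    num_vnfs, dim = 5, 8

    h = rng.normal(size=(num_vnfs, dim))      # hidden state per VNF node
    adj = np.array([[0, 1, 0, 0, 1],          # adjacency of the SFC graph
                    [1, 0, 1, 0, 0],
                    [0, 1, 0, 1, 0],
                    [0, 0, 1, 0, 1],
                    [1, 0, 0, 1, 0]], dtype=float)

    w_self = rng.normal(size=(dim, dim)) * 0.1
    w_nbr = rng.normal(size=(dim, dim)) * 0.1

    for _ in range(3):  # a few message-passing rounds
        degree = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
        messages = (adj @ h) / degree                        # mean over neighbours
        h = np.maximum(h @ w_self + messages @ w_nbr, 0.0)   # ReLU update

    # A simple readout: one score per node, e.g. for ranking congested VNFs.
    scores = h @ rng.normal(size=dim)
    print(scores)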

13 pages, 602 KiB  
Article
Learning Analytics on YouTube Educational Videos: Exploring Sentiment Analysis Methods and Topic Clustering
by Ilias Chalkias, Katerina Tzafilkou, Dimitrios Karapiperis and Christos Tjortjis
Electronics 2023, 12(18), 3949; https://doi.org/10.3390/electronics12183949 - 19 Sep 2023
Cited by 2 | Viewed by 1516
Abstract
The popularity of social media is continuously growing, as it endeavors to bridge the gap in communication between individuals. YouTube, one of the most well-known social media platforms with millions of users, stands out due to its remarkable ability to facilitate communication through the exchange of video content. Despite its primary purpose being entertainment, YouTube also offers individuals the valuable opportunity to learn from its vast array of educational content. The primary objective of this study is to explore the sentiments of YouTube learners by analyzing their comments on educational YouTube videos. A total of 167,987 comments were extracted and processed from educational YouTube channels through the YouTube Data API and Google Sheets. Lexicon-based sentiment analysis was conducted using two different methods, VADER and TextBlob, with the aim of detecting the prevailing sentiment. The sentiment analysis results revealed that the dominant sentiment expressed in the comments was neutral, followed by positive sentiment, while negative sentiment was the least common. VADER and TextBlob algorithms produced comparable results. Nevertheless, TextBlob yielded higher scores in both positive and negative sentiments, whereas VADER detected a greater number of neutral statements. Furthermore, the Latent Dirichlet Allocation (LDA) topic clustering outcomes shed light on various video attributes that potentially influence viewers’ experiences. These attributes included animation, music, and the conveyed messages within the videos. These findings make a significant contribution to ongoing research efforts aimed at understanding the educational advantages of YouTube and discerning viewers’ preferences regarding video components and educational topics.
(This article belongs to the Special Issue Big Data and Large-Scale Data Processing Applications)
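
The two lexicon-based scorers the study compares can be exercised in a few lines, assuming the vaderSentiment and textblob Python packages; the sample comments and the ±0.05 neutrality thresholds below are a common convention, not necessarily the authors' exact configuration.

    # Score a few comments with both VADER and TextBlob and label each one.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    from textblob import TextBlob

    def label(score, threshold=0.05):
        if score >= threshold:
            return "positive"
        if score <= -threshold:
            return "negative"
        return "neutral"

    analyzer = SentimentIntensityAnalyzer()
    comments = [
        "This video finally made recursion click for me, thank you!",
        "The animation was distracting and the audio quality is poor.",
        "Covered in lecture 3.",
    ]

    for text in comments:
        vader = analyzer.polarity_scores(text)["compound"]  # in [-1, 1]
        blob = TextBlob(text).sentiment.polarity            # in [-1, 1]
        print(f"{label(vader):8} / VADER {vader:+.2f} | "
              f"{label(blob):8} / TextBlob {blob:+.2f} | {text}")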

21 pages, 5357 KiB  
Article
Efficient Large-Scale GPS Trajectory Compression on Spark: A Pipeline-Based Approach
by Wen Xiong, Xiaoxuan Wang and Hao Li
Electronics 2023, 12(17), 3569; https://doi.org/10.3390/electronics12173569 - 24 Aug 2023
Cited by 1 | Viewed by 859
Abstract
Every day, hundreds of thousands of vehicles, including buses, taxis, and ride-hailing cars, continuously generate GPS positioning records. Simultaneously, the traffic big data platform of urban transportation systems has already collected a large amount of GPS trajectory datasets. These incremental and historical GPS datasets require more and more storage space, placing unprecedented cost pressure on the big data platform. Therefore, it is imperative to efficiently compress these large-scale GPS trajectory datasets, saving storage cost and subsequent computing cost. However, a set of classical trajectory compression algorithms can only be executed in a single-threaded manner and are limited to running in a single-node environment. Therefore, these trajectory compression algorithms are insufficient to compress this incremental data, which often amounts to hundreds of gigabytes, within an acceptable time frame. This paper utilizes Spark, a popular big data processing engine, to parallelize a set of classical trajectory compression algorithms. These algorithms consist of the DP (Douglas–Peucker), the TD-TR (Top-Down Time-Ratio), the SW (Sliding Window), SQUISH (Spatial Quality Simplification Heuristic), and the V-DP (Velocity-Aware Douglas–Peucker). We systematically evaluate these parallelized algorithms on a very large GPS trajectory dataset, which contains 117.5 GB of data produced by 20,000 taxis. The experimental results show that: (1) It takes only 438 s to compress this dataset in a Spark cluster with 14 nodes; (2) These parallelized algorithms can save an average of 26% on storage cost, and up to 40%. In addition, we design and implement a pipeline-based solution that automatically performs preprocessing and compression for continuous GPS trajectories on the Spark platform.
(This article belongs to the Special Issue Big Data and Large-Scale Data Processing Applications)
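
The per-trajectory parallelism described above can be sketched with PySpark by grouping GPS records by vehicle and simplifying each trajectory with the classic Douglas–Peucker routine; the input layout (taxi_id,timestamp,lon,lat), file paths and tolerance below are illustrative assumptions rather than the authors' pipeline, and coordinates are treated as planar for simplicity.

    # Parallel Douglas-Peucker over per-taxi trajectories with PySpark.
    from pyspark.sql import SparkSession

    def point_line_distance(p, a, b):
        """Perpendicular distance from p to the line through a and b."""
        (px, py), (ax, ay), (bx, by) = p, a, b
        dx, dy = bx - ax, by - ay
        if dx == 0.0 and dy == 0.0:
            return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
        return abs(dy * px - dx * py + bx * ay - by * ax) / (dx * dx + dy * dy) ** 0.5

    def douglas_peucker(points, epsilon):
        """Classic recursive simplification of one trajectory."""
        if len(points) < 3:
            return points
        dmax, index = 0.0, 0
        for i in range(1, len(points) - 1):
            d = point_line_distance(points[i], points[0], points[-1])
            if d > dmax:
                dmax, index = d, i
        if dmax <= epsilon:
            return [points[0], points[-1]]
        left = douglas_peucker(points[:index + 1], epsilon)
        return left[:-1] + douglas_peucker(points[index:], epsilon)

    spark = SparkSession.builder.appName("gps-compression-sketch").getOrCreate()
    records = spark.sparkContext.textFile("hdfs:///trajectories/*.csv")

    compressed = (
        records.map(lambda line: line.split(","))
               .map(lambda f: (f[0], (float(f[1]), float(f[2]), float(f[3]))))
               .groupByKey()                      # one trajectory per taxi id
               .mapValues(lambda pts: douglas_peucker(
                   [(lon, lat) for _, lon, lat in sorted(pts)],
                   epsilon=1e-4))                 # tolerance in degrees (illustrative)
    )
    compressed.saveAsTextFile("hdfs:///trajectories-compressed")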
