Topical Collection "Big Data Analysis and Visualization Ⅱ"

Editors

Prof. Dr. Kwan-Hee Yoo
Guest Editor
Department of Computer Science, Chungbuk National University, 1, Chungdae-ro, Seowon-gu, Cheongju-si 28644, Chungcheongbuk-do, Korea
Interests: big data analysis; data visualization; visual analytics; smart manufacturing; virtual reality; augmented reality
Prof. Dr. Carson K. Leung
Guest Editor
Department of Computer Science, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
Interests: data mining; big data processing; social network analysis
Prof. Dr. Nakhoon Baek
Guest Editor
School of Computer Science and Engineering, Kyungpook National University, 80 Daehakro, Bukgu 41566, Daegu, Korea
Interests: big data processing; data visualization; massively parallel computing

Topical Collection Information

Dear Colleagues,

Big data have become a core technology for providing innovative solutions in many fields. Big data analytics is the process of examining data to discover information, such as hidden patterns, unknown correlations, market insights, and customer preferences, that can be useful for making business decisions. Deep learning, machine learning, and data mining have advanced to the point where these techniques can be used to analyze big data in healthcare, manufacturing, social life, and other domains.

On the other hand, big data are being investigated using various visual analytical tools. These tools assist in visualizing new meanings and interpretations of the big data and, thus, can help better explore the data and simplify the complex big data analytics processes.

Hence, we invite the academic community and relevant industrial partners to submit papers to this Special Issue, on relevant fields and topics including (but not limited to) the following:

  • Novel algorithms for big data analysis
  • Big data preprocessing techniques (acquisition, integration, and cleaning)
  • Data mining, machine learning, and deep learning for big data analysis
  • Application of computer vision techniques in big data analysis
  • Big database engineering and applications
  • Visual analytics of big database engineering and applications
  • Visualization and visual analytics for supporting the big data analysis process
  • Data structures for big data visualization
  • Application of big data visualization to a variety of fields
  • Big data visualization: case studies and applications

In addition to papers submitted by researchers, invited papers based on excellent contributions to recent conferences in this field will be included in this Special Issue; for example, from IDEAS 2020, IEEE CBDCom 2020, and BigDAS 2020.

Prof. Dr. Kwan-Hee Yoo
Prof. Dr. Carson K. Leung
Prof. Dr. Nakhoon Baek
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the collection website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2300 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Big data
  • Big data preprocessing
  • Big data analysis
  • Big data visualization
  • Visual analytics
  • Data mining
  • Machine learning
  • Deep learning
  • Computer vision
  • Multimedia big data

Published Papers (14 papers)

2021

Article
Visual Analysis of Spatiotemporal Data Predictions with Deep Learning Models
Appl. Sci. 2021, 11(13), 5853; https://doi.org/10.3390/app11135853 - 24 Jun 2021
Cited by 2 | Viewed by 583
Abstract
The output of a deep learning model delivers different predictions depending on its input. In particular, the input characteristics might affect the output of a deep learning model. When predicting data that are measured with sensors in multiple locations, it is necessary to train a deep learning model with spatiotemporal characteristics of the data. Additionally, since not all of the data measured together increase the accuracy of the deep learning model, we need to utilize the correlation characteristics between the data features. However, it is difficult to interpret how the deep learning output depends on the input characteristics. Therefore, it is necessary to analyze how the input characteristics affect prediction results to interpret deep learning models. In this paper, we propose a visualization system to analyze deep learning models with air pollution data. The proposed system visualizes the predictions according to the input characteristics. The input characteristics include space-time and data features, and we apply temporal prediction networks, including gated recurrent units (GRU), long short-term memory (LSTM), and spatiotemporal prediction networks (convolutional LSTM) as deep learning models. We interpret the output according to the characteristics of input to show the effectiveness of the system.

Article
s2p: Provenance Research for Stream Processing System
Appl. Sci. 2021, 11(12), 5523; https://doi.org/10.3390/app11125523 - 15 Jun 2021
Cited by 2 | Viewed by 641
Abstract
The main purpose of our provenance research for DSP (distributed stream processing) systems is to analyze abnormal results. Provenance for these systems is nontrivial because of the ephemerality of stream data and the instant data processing mode in modern DSP systems. Challenges include but are not limited to an optimization solution for avoiding excessive runtime overhead, reducing provenance-related data storage, and providing it in an easy-to-use fashion. Without any prior knowledge about which kinds of data may finally lead to abnormal results, we have to track all transformations in detail, which potentially places a heavy burden on the system. This paper proposes s2p (Stream Process Provenance), which mainly consists of online provenance and offline provenance, to provide fine- and coarse-grained provenance at different precisions. We base our design of s2p on the fact that, for a mature online DSP system, abnormal results are rare, and the results that require a detailed analysis are even rarer. We also consider state transition in our provenance explanation. We implement s2p on Apache Flink as s2p-flink and conduct three experiments to evaluate its scalability, efficiency, and overhead in terms of end-to-end cost, throughput, and space overhead. Our evaluation shows that s2p-flink incurs a 13% to 32% cost overhead, an 11% to 24% decline in throughput, and few additional space costs in the online provenance phase. Experiments also demonstrate that s2p-flink can scale well. A case study is presented to demonstrate the feasibility of the whole s2p solution.

Article
Threshold Effects of Infectious Disease Outbreaks on Livestock Prices: Cases of African Swine Fever and Avian Influenza in South Korea
Appl. Sci. 2021, 11(11), 5114; https://doi.org/10.3390/app11115114 - 31 May 2021
Viewed by 921
Abstract
In this paper we demonstrate the threshold effects of infectious diseases on livestock prices. Daily retail prices of pork and chicken were used as structured data; news and SNS mentions of African Swine Fever (ASF) and Avian Influenza (AI) were used as unstructured data. Models were tested for the threshold effects of disease-related news and SNS frequencies, specifically those related to ASF and AI, on the retail prices of pork and chicken, respectively. The effects were found to exist, and the values of ASF-related news on pork prices were estimated to be −9 and 8, indicating that the threshold autoregressive (TAR) model can be divided into three regimes. The coefficients of the ASF-related SNS frequencies on pork prices were 1.1666, 0.2663 and −0.1035 for regimes 1, 2 and 3, respectively, suggesting that pork prices increased by 1.1666 Korean won in regime 1 when ASF-related SNS frequencies increased. To promote pork consumption by SNS posts, the required SNS frequencies were estimated to have impacts as great as one standard deviation in the pork price. These values were 247.057, 1309.158 and 2817.266 for regimes 1, 2 and 3, respectively. The impact response periods for pork prices were estimated to last 48, 6, and 8 days for regimes 1, 2 and 3, respectively. When the prediction accuracies of the TAR and autoregressive (AR) models with regard to pork prices were compared for the root mean square error, the prediction accuracy of the TAR model was found to be slightly better than that of the AR. When the threshold effect of AI-related news on chicken prices was tested, a linear relationship appeared without a threshold effect. These findings suggest that when infectious diseases such as ASF occur for the first time, the impact on livestock prices is significant, as indicated by the threshold effect and the long impact response period. Our findings also suggest that the impact on livestock prices is not remarkable when infectious diseases occur multiple times, as in the case of AI. To date, this study is the first to suggest the use of SNS to promote meat consumption.

Article
A Person Re-Identification Scheme Using Local Multiscale Feature Embedding with Dual Pyramids
Appl. Sci. 2021, 11(8), 3363; https://doi.org/10.3390/app11083363 - 08 Apr 2021
Viewed by 656
Abstract
In this paper, we propose a new person re-identification scheme that uses dual pyramids to construct and utilize the local multiscale feature embedding that reflects different sizes and shapes of visual feature elements appearing in various areas of a person image. In the dual pyramids, a scale pyramid reflects the visual feature elements in various sizes and shapes, and a part pyramid selects elements and combines them differently for the feature embedding of each region of the person image. In the experiments, the performance with and without each pyramid was compared to verify that the proposed scheme has an optimal structure. The state-of-the-art studies known in the field of person re-identification were also compared for accuracy. According to the experimental results, the method proposed in this study showed a maximum of 99.25% Rank-1 accuracy, depending on the dataset used in the experiments. Based on the same dataset, the accuracy was determined to be about 3.55% higher than the previous studies, which used only person images, and about 1.25% higher than the other studies using additional meta-information besides images of persons.

Article
A Corner-Highlighting Method for Ambient Occlusion
Appl. Sci. 2021, 11(7), 3276; https://doi.org/10.3390/app11073276 - 06 Apr 2021
Viewed by 576
Abstract
Graphical user experiences are now ubiquitous. Specifically, the computer graphics field and the game industry have continually favored the ambient occlusion post-processing method for its superb indirect light approximation and its effectiveness. Despite its canonical performance, its operation on non-occluded surfaces is often seen as redundant and unfavorable. In this paper, we propose a new perspective to handle such issues by highlighting the corners where ambient occlusion is likely to occur. Potential illumination occlusions are highlighted by checking the corners of the surfaces in the screen-space. Our algorithm showed that renderers can feasibly avoid unwanted computations, achieving 15% to 28% acceleration in comparison to the previous works.

Article
A Shader-Based Ray Tracing Engine
Appl. Sci. 2021, 11(7), 3264; https://doi.org/10.3390/app11073264 - 06 Apr 2021
Cited by 1 | Viewed by 1146
Abstract
Recently, ray tracing techniques have been widely adopted to produce high-quality images and animations. In this paper, we present our design and implementation of a real-time ray-traced rendering engine. We achieved real-time capability for triangle primitives, based on the ray tracing techniques on GPGPU (general-purpose graphics processing unit) compute shaders. To accelerate the ray tracing engine, we used a set of acceleration techniques, including bounding volume hierarchy, its roped representation, joint up-sampling, and bilateral filtering. Our current implementation shows remarkable speed-ups, with acceptable error values. Experimental results show 2.5–13.6 times acceleration, and less than 3% error values for the 95% confidence range. Our next step will be enhancing bilateral filter behaviors.

Article
DyEgoVis: Visual Exploration of Dynamic Ego-Network Evolution
Appl. Sci. 2021, 11(5), 2399; https://doi.org/10.3390/app11052399 - 08 Mar 2021
Viewed by 600
Abstract
An ego-network, which describes relationships between a focus node (i.e., ego) and its neighbor nodes (i.e., alters), often changes over time. Exploring dynamic ego-networks can help users gain insight into how each ego interacts with and is influenced by the outside world. However, most of the existing methods do not fully consider the multilevel analysis of dynamic ego-networks, resulting in some evolution information at different granularities being ignored. In this paper, we present an interactive visualization system called DyEgoVis which allows users to explore the evolutions of dynamic ego-networks at global, local and individual levels. At the global level, DyEgoVis reduces dynamic ego-networks and their snapshots to 2D points to reveal global patterns such as clusters and outliers. At the local level, DyEgoVis projects all snapshots of the selected dynamic ego-networks onto a 2D space to identify similar or abnormal states. At the individual level, DyEgoVis utilizes a novel layout method to visualize the selected dynamic ego-network so that users can track, compare and analyze changes in the relationships between the ego and alters. Through two case studies on real datasets, we demonstrate the usability and effectiveness of DyEgoVis.

Article
Re-Enrichment Learning: Metadata Saliency for the Evolutive Personalization of a Recommender System
Appl. Sci. 2021, 11(4), 1733; https://doi.org/10.3390/app11041733 - 16 Feb 2021
Cited by 1 | Viewed by 777
Abstract
Many studies have been conducted on recommender systems in both the academic and industrial fields, as they are currently broadly used in various digital platforms to make personalized suggestions. Despite the improvement in the accuracy of recommenders, the diversity of interest areas recommended to a user tends to be reduced, and the sparsity of explicit feedback from users has been an important issue for making progress in recommender systems. In this paper, we introduce a novel approach, namely re-enrichment learning, which effectively leverages the implicit logged feedback from users to enhance user retention in a platform by enriching their interest areas. The approach consists of (i) graph-based domain transfer and (ii) metadata saliency, which (i) find an adaptive and collaborative domain representing the relations among many users’ metadata and (ii) extract attentional features from a user’s implicit logged feedback, respectively. The experimental results show that our proposed approach has a better capacity to enrich the diversity of interests of a user by means of implicit feedback and to help recommender systems achieve more balanced personalization. Finally, our approach helps recommenders improve user retention, i.e., encouraging users to click more items or dwell longer on the platform.

Article
Locating Core Modules through the Association between Software Source Structure and Execution
Appl. Sci. 2021, 11(4), 1685; https://doi.org/10.3390/app11041685 - 13 Feb 2021
Viewed by 595
Abstract
To improve software quality, the source code that composes software has to be improved, and improving the important code that largely affects the software quality should be a cost-effective method. Static analysis defines important code as that which occupies important positions in the source network, while dynamic analysis defines important code as that with high execution frequency. However, neither method analyzes the association between network structure and execution frequency, and both have their disadvantages. Thus, this study analyzed the association between source network structure and execution frequency to address these disadvantages. The source functions of Notepad++ were analyzed, and the function ranking was derived using the association between network structure and execution frequency. For verification, the Spearman correlation between the newly derived function ranking and the function rankings by network structure and by execution frequency obtained with the conventional methods was measured. According to the Spearman correlation, the newly derived function ranking had strong correlations with execution frequency and included the network structure’s characteristics. Moreover, similar to the Pareto principle, the analysis showed that 20% of Notepad++’s functions could be categorized as important functions, largely affecting the software’s quality.
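As an illustration of the verification step described in this abstract, the following minimal sketch computes a Spearman rank correlation between two scores for the same set of functions. It is not the authors' implementation; the function names (`average_ranks`, `pearson`, `spearman`) and the sample scores are hypothetical and chosen only for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five functions (illustrative values only).
combined_score = [0.91, 0.40, 0.75, 0.10, 0.55]
execution_count = [1200, 300, 800, 20, 310]
rho = spearman(combined_score, execution_count)
```

A value of `rho` close to 1 would indicate that the two rankings order the functions similarly; in practice a library routine such as `scipy.stats.spearmanr` would normally be used instead of this hand-rolled version.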

2020

Article
Prediction of Machine Inactivation Status Using Statistical Feature Extraction and Machine Learning
Appl. Sci. 2020, 10(21), 7413; https://doi.org/10.3390/app10217413 - 22 Oct 2020
Cited by 2 | Viewed by 824
Abstract
In modern manufacturing, the detection and prediction of machine anomalies, i.e., the inactive state of the machine during operation, is an important issue. Accurate inactive state detection models for factory machines can result in increased productivity. Moreover, they can guide engineers in implementing appropriate maintenance actions, which can prevent catastrophic failures and minimize economic losses. In this paper, we present a novel two-step data-driven method for detecting the non-active state of industrial machines. First, we propose a feature extraction approach that aims to better distinguish the patterns of the active and non-active states of the machine by multiple statistical analyses, such as reliability, time-domain, and frequency-domain analyses. Next, we construct a method to detect the active and non-active status of an industrial machine by applying various machine learning methods. The performance evaluation with a real-world dataset from an automobile parts manufacturer demonstrates that the proposed method achieves high accuracy.

Article
The Derivation of Defect Priorities and Core Defects through Impact Relationship Analysis between Embedded Software Defects
Appl. Sci. 2020, 10(19), 6946; https://doi.org/10.3390/app10196946 - 04 Oct 2020
Cited by 2 | Viewed by 681
Abstract
As embedded software is closely related to hardware equipment, any defect in embedded software can lead to major accidents. Thus, all defects must be collected, classified, and tested based on their severity. In the pure software field, a method of deriving core defects already exists, enabling the collection and classification of all possible defects. However, in the embedded software field, studies that have collected and categorized relevant defects from an integrated perspective are scarce, and none of them have identified core defects. Therefore, the present study collected embedded software defects worldwide and identified 12 types of embedded software defect classifications through iterative consensus processes with embedded software experts. The impact relation map of the defects was drawn using the decision-making trial and evaluation laboratory (DEMATEL) method, which analyzes the influence relationship between elements. By analyzing the impact relation map, the following core embedded software defects were derived: hardware interrupt, external interface, timing error, device error, and task management. All defects can be tested using this defect classification. Moreover, knowing the correct test order of all defects can eliminate critical defects and improve the reliability of embedded systems.

Article
Audio-Visual Tensor Fusion Network for Piano Player Posture Classification
Appl. Sci. 2020, 10(19), 6857; https://doi.org/10.3390/app10196857 - 29 Sep 2020
Cited by 2 | Viewed by 1251
Abstract
Playing the piano in the correct position is important because the correct position helps to produce good sound and prevents injuries. Many studies that combine various techniques have been conducted in the field of piano playing posture recognition. Most of these techniques are based on analyzing visual information. However, in the piano education field, it is essential to utilize audio information in addition to visual information due to the deep relationship between posture and sound. In this paper, we propose an audio-visual tensor fusion network (simply, AV-TFN) for piano performance posture classification. Unlike existing studies that used only visual information, the proposed method uses audio information to improve the accuracy in classifying the postures of professional and amateur pianists. For this, we first introduce a dataset called C3Pap (Classic piano performance postures of amateur and professionals) that contains actual piano performance videos in diverse environments. Furthermore, we propose a data structure that represents audio-visual information. The proposed data structure represents audio information on the color scale and visual information on the black and white scale to represent the relationship between them. We call this data structure an audio-visual tensor. Finally, we compare the performance of the proposed method with state-of-the-art approaches: VN (Visual Network), AN (Audio Network), AVN (Audio-Visual Network) with concatenation and attention techniques. The experiment results demonstrate that AV-TFN outperforms existing studies and, thus, can be effectively used in the classification of piano playing postures.

Article
NAP: Natural App Processing for Predictive User Contexts in Mobile Smartphones
Appl. Sci. 2020, 10(19), 6657; https://doi.org/10.3390/app10196657 - 23 Sep 2020
Cited by 1 | Viewed by 1049
Abstract
The resource management of an application is an essential task in smartphones. Optimizing the application launch process results in a faster and more efficient system, directly impacting the user experience. Predicting the next application that will be used can orient the smartphone to direct the system resources to the correct application, making the system more intelligent and efficient. Neural networks have been presenting outstanding results in the state-of-the-art for mapping large sequences of data, outperforming all previous classification and prediction models. A recurrent neural network (RNN) is an artificial neural network associated with sequence models, and it can recognize patterns in sequences. One of the areas that use RNN is language modeling (LM). Given an arrangement of words, LM can learn how the words are organized in sentences, making it possible to predict the next word given a group of previous words. We propose building a predictive model inspired by LM. However, instead of using words, we will use previous applications to predict the next application. Moreover, some context features, such as timestamp and energy record, will be included in the prediction model to evaluate the impact of the features on the performance. We will provide the next-application prediction result and extend it to the top-k possible candidates for the next application.
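To make the idea of predicting the next application from previous ones concrete, here is a minimal sketch that returns top-k candidates. It deliberately swaps the RNN described in the abstract for a much simpler bigram frequency model, so it illustrates only the language-modeling analogy, not the NAP method itself. The function names (`train_bigram`, `top_k_next`) and the launch log are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(launch_log):
    """Count how often each app is launched directly after each other app."""
    counts = defaultdict(Counter)
    for previous_app, next_app in zip(launch_log, launch_log[1:]):
        counts[previous_app][next_app] += 1
    return counts


def top_k_next(counts, current_app, k=3):
    """Return up to k most frequent followers of current_app (empty if unseen)."""
    followers = counts.get(current_app, Counter())
    return [app for app, _ in followers.most_common(k)]


# Hypothetical launch history (illustrative values only).
log = ["mail", "browser", "mail", "chat", "mail", "browser", "camera"]
model = train_bigram(log)
candidates = top_k_next(model, "mail", k=2)
```

Context features such as the timestamp could be folded into this sketch by keying the counts on a pair like `(previous_app, hour_of_day)`, although a sequence model would capture longer histories than a single previous app.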

Article
Analyzing Zone-Based Registration Using a Three Zone System: A Semi-Markov Process Approach
Appl. Sci. 2020, 10(16), 5705; https://doi.org/10.3390/app10165705 - 17 Aug 2020
Cited by 3 | Viewed by 693
Abstract
The location of user equipment (UE) should always be maintained in order to connect any incoming calls within a mobile network. While several methods of location registration have been proposed, most mobile networks have adopted zone-based registration due to its superior performance. Even though recommendations from research on these zone-based systems state that multiple zones can be stored in a zone-based registration system, current mobile networks only employ a zone-based registration system that stores a single zone. Therefore, some studies have been conducted on zone-based registration using multiple zones. However, most of these studies consider only two zones. In this study, through the development of a semi-Markov process approach, we present a simple but accurate mathematical model for zone-based registration using three zones. In addition, we use our results for zone-based registration systems with one, two, and three zones to suggest the optimal management scheme for zone-based registration. Given that most mobile networks have already adopted some kind of zone-based registration, these results can directly enhance the performance of actual mobile networks in the near future with minimal implementation effort.