Applied Data Analytics

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Network Science".

Deadline for manuscript submissions: closed (31 August 2021) | Viewed by 21837

Special Issue Editor


Prof. Dr. Tomasz Wiktorski
Guest Editor
Faculty of Science and Technology, Department of Electrical Engineering and Computer Science, University of Stavanger, Stavanger, Norway
Interests: time series; data science; big data

Special Issue Information

Dear Colleagues,

In recent years, we have seen the proliferation of new data analytics methods and approaches. These developments have happened under the labels of machine learning, data mining, deep learning, and smart X (where X can stand for any domain of application, such as smart cities and smart energy).

However, the road from generic methods to practical applications is not always straightforward. Typical problems include incomplete or dirty data, datasets that are too small or biased, a lack of reproducible experiments, missing code for all or part of a method, unclear parameter or hyperparameter values, etc.

The aim of this Special Issue is to provide a forum for applied analytics researchers to present original contributions describing their experience with, and approaches to, the aforementioned or similar problems in real-life applications of data analytics. Improvements and modifications to existing methods are also of interest.

Submissions should be original and unpublished. Extended versions of conference publications will be considered if they contain at least 50% new content.

Prof. Dr. Tomasz Wiktorski
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Data analysis
  • Data science
  • Big data
  • Algorithms
  • Applied data analytics
  • Machine learning
  • Data mining

Published Papers (8 papers)


Research

21 pages, 580 KiB  
Article
Association Rules Mining for Hospital Readmission: A Case Study
by Nor Hamizah Miswan, ‘Ismat Mohd Sulaiman, Chee Seng Chan and Chong Guan Ng
Mathematics 2021, 9(21), 2706; https://doi.org/10.3390/math9212706 - 25 Oct 2021
Cited by 8 | Viewed by 3790
Abstract
As an indicator of healthcare quality and performance, hospital readmission incurs major costs for healthcare systems worldwide. Understanding the relationships between readmission factors, such as input features and readmission length, is challenging owing to the intricacy of hospital readmission procedures. This study discovered significant correlations between potential readmission factors (thresholds of various settings for readmission length) and basic demographic variables. Association rule mining (ARM), particularly the Apriori algorithm, was utilised to extract hidden input variable patterns and relationships among admitted patients by generating supervised learning rules. The mined rules were categorised into two outcomes to comprehend the readmission data: (i) rules associated with various readmission lengths and (ii) several expert-validated variables related to basic demographics (gender, race, and age group). The extracted rules proved useful in facilitating decision-making and resource preparation to minimise patient readmission.
(This article belongs to the Special Issue Applied Data Analytics)
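
To make the core technique concrete, the following is a minimal sketch of Apriori-based association rule mining using the mlxtend library; the toy transactions and the "readmitted<30d" label are invented stand-ins for the study's hospital variables, not the authors' data.

```python
# A minimal Apriori association-rule-mining sketch with mlxtend;
# transactions and item names are invented for illustration.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Each "transaction" lists the attributes observed for one admitted patient.
transactions = [
    ["male", "age>65", "readmitted<30d"],
    ["female", "age>65", "readmitted<30d"],
    ["male", "age<=65"],
    ["female", "age>65", "readmitted<30d"],
    ["male", "age<=65"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Frequent itemsets above a minimum support threshold.
itemsets = apriori(onehot, min_support=0.4, use_colnames=True)

# Derive rules and keep those whose consequent is the readmission outcome.
rules = association_rules(itemsets, metric="confidence", min_threshold=0.7)
rules = rules[rules["consequents"].apply(lambda c: "readmitted<30d" in c)]
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```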

20 pages, 3151 KiB  
Article
Improved Multi-Scale Deep Integration Paradigm for Point and Interval Carbon Trading Price Forecasting
by Jujie Wang and Shiyao Qiu
Mathematics 2021, 9(20), 2595; https://doi.org/10.3390/math9202595 - 15 Oct 2021
Cited by 5 | Viewed by 1179
Abstract
The forecast of the carbon trading price is crucial to both sellers and purchasers, and multi-scale integration models have been widely used in this process. However, these multi-scale models ignore the feature reconstruction process as well as the residual part, and they often focus only on linear integration. Meanwhile, most of these models cannot provide a prediction interval, which means they neglect uncertainty. In this paper, an improved multi-scale nonlinear integration model is proposed. The original dataset is divided into subgroups through variational mode decomposition (VMD), and all subgroups go through a sample entropy (SE) process to reconstruct the features. Then, random forest and long short-term memory (LSTM) integration are used to model the feature sub-sequences. For the residual part, an LSTM residual correction strategy based on a white noise test corrects the residuals to obtain point prediction results. Finally, a Gaussian process (GP) is applied to obtain the prediction interval estimate. The results show that, compared with other methods, the proposed method obtains satisfactory accuracy with the minimum statistical error, so it is safe to conclude that the proposed method can efficiently predict the carbon price and provide a prediction interval estimate.
(This article belongs to the Special Issue Applied Data Analytics)
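
The sample entropy (SE) statistic used to group decomposed sub-sequences can be sketched as follows; this is the standard SampEn definition implemented in NumPy, not the authors' code, and the parameters m and r are common defaults.

```python
# Standard sample entropy SampEn(m, r) = -ln(A/B), where B counts pairs of
# length-m templates within Chebyshev distance r and A does the same for m+1.
import numpy as np

def sample_entropy(x, m=2, r=None):
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)  # a common default tolerance

    def count_matches(length):
        # All overlapping templates of the given length.
        templates = np.array([x[i:i + length] for i in range(n - length + 1)])
        matches = 0
        for t in templates:
            dist = np.max(np.abs(templates - t), axis=1)  # Chebyshev distance
            matches += np.sum(dist <= r) - 1              # exclude self-match
        return matches

    b, a = count_matches(m), count_matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else float("inf")

# Sub-sequences (e.g., VMD modes) with similar SampEn values could then be
# merged into one feature before fitting the forecasting models.
rng = np.random.default_rng(0)
print(sample_entropy(np.sin(np.linspace(0, 20, 200))))  # low: regular signal
print(sample_entropy(rng.standard_normal(200)))         # higher: irregular
```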

13 pages, 273 KiB  
Article
Application of Exploratory Factor Analysis in the Construction of a Self-Perception Model of Informational Competences in Higher Education
by Belén Quintero Ordóñez, Ignacio González López, Eloísa Reche Urbano and Juan Antonio Fuentes Esparrell
Mathematics 2021, 9(18), 2332; https://doi.org/10.3390/math9182332 - 20 Sep 2021
Cited by 4 | Viewed by 1960
Abstract
The progress society has experienced as a result of the ready availability of information through technology highlights the need to develop specific learning related to informational competences (IC) in educational settings where future professionals are trained to educate others, specifically in university degrees in the social sciences. Through the practical application of exploratory factor analysis, this study seeks to ascertain the opinions of students enrolled in these degrees at the Universidad de Córdoba (Spain) regarding the knowledge they consider they possess about IC for their future professional development. The methodology is based on a descriptive, non-experimental, correlational survey design. The results show that factor analysis is a fundamental tool for capturing students' perception of their knowledge of IC: its psychometric value confirmed construct validity and enabled the items making up the four initial dimensions of IC to be broken down into eight factors, improving the understanding and explanation of these IC.
(This article belongs to the Special Issue Applied Data Analytics)
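
A hedged sketch of the exploratory-factor-analysis step with scikit-learn: the eight-factor varimax solution mirrors the structure reported in the abstract, but the questionnaire matrix below is a random placeholder, not the study's data.

```python
# A minimal exploratory factor analysis sketch with scikit-learn;
# random responses stand in for the real questionnaire data.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n_students, n_items = 300, 24  # invented survey dimensions
responses = rng.integers(1, 6, size=(n_students, n_items)).astype(float)

# Standardize items, then extract eight varimax-rotated factors,
# matching the eight-factor structure reported in the paper.
z = StandardScaler().fit_transform(responses)
fa = FactorAnalysis(n_components=8, rotation="varimax", random_state=0)
fa.fit(z)

# Loadings: correlation-like weights of each item on each factor.
loadings = fa.components_.T  # shape: (n_items, 8)
for item in range(n_items):
    top = np.argmax(np.abs(loadings[item]))
    print(f"item {item:2d} loads most strongly on factor {top}")
```
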
23 pages, 3708 KiB  
Article
PM2.5 Prediction Model Based on Combinational Hammerstein Recurrent Neural Networks
by Yi-Chung Chen, Tsu-Chiang Lei, Shun Yao and Hsin-Ping Wang
Mathematics 2020, 8(12), 2178; https://doi.org/10.3390/math8122178 - 6 Dec 2020
Cited by 15 | Viewed by 2471
Abstract
Airborne particulate matter 2.5 (PM2.5) can have a profound effect on the health of the population. Many researchers have been reporting highly accurate numerical predictions based on raw PM2.5 data imported directly into deep learning models; however, there is still considerable room for improvement in terms of implementation costs due to heavy computational overhead. From the perspective of environmental science, PM2.5 values in a given location can be attributed to local sources as well as external sources. Local sources tend to have a dramatic short-term impact on PM2.5 values, whereas external sources tend to have more subtle but longer-lasting effects. In the presence of PM2.5 from both sources at the same time, this combination of effects can undermine the predictive accuracy of the model. This paper presents a novel combinational Hammerstein recurrent neural network (CHRNN) to enhance predictive accuracy and overcome the heavy computational and monetary burden imposed by deep learning models. The CHRNN comprises a base neural network tasked with learning gradual (long-term) fluctuations in conjunction with add-on neural networks to deal with dramatic (short-term) fluctuations. The CHRNN can be coupled with a random forest model to determine the degree to which short-term effects influence long-term outcomes. We also developed novel feature selection and normalization methods to enhance prediction accuracy. Using real-world air quality measurements and PM2.5 datasets from Taiwan, the precision of the proposed system in the numerical prediction of PM2.5 levels was comparable to that of state-of-the-art deep learning models, such as deep recurrent neural networks and long short-term memory, despite far lower implementation costs and computational overhead.
(This article belongs to the Special Issue Applied Data Analytics)
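
The CHRNN itself is not reproduced here, but the sketch below illustrates the combinational idea the abstract describes: a base model tracks gradual fluctuations, an add-on signal corrects for dramatic short-term ones, and a random forest learns how strongly the short-term correction applies. All models and data are simplified stand-ins, not the authors' architecture.

```python
# Illustrative base + add-on combination (not the CHRNN itself): a smooth
# baseline plus a short-term correction, blended by a random forest.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
t = np.arange(500, dtype=float)
long_term = 20 + 5 * np.sin(2 * np.pi * t / 144)           # gradual component
spikes = np.where(rng.random(500) < 0.05, 15.0, 0.0)       # local-source spikes
pm25 = long_term + spikes + rng.normal(0, 1, 500)          # synthetic PM2.5

# "Base" forecast: a moving average capturing the gradual component.
window = 24
base = np.convolve(pm25, np.ones(window) / window, mode="same")

# "Add-on" correction: the previous step's residual, a crude short-term signal.
addon = np.roll(pm25 - base, 1)

# The random forest learns how much weight the short-term correction deserves,
# given simple context features (here: recent volatility).
volatility = np.abs(np.roll(pm25, 1) - np.roll(pm25, 2))
X = np.column_stack([base, addon, volatility])[2:-1]
y = pm25[2:-1]
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print("in-sample MAE:", np.mean(np.abs(rf.predict(X) - y)))
```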

17 pages, 5951 KiB  
Article
Improving Accuracy and Generalization Performance of Small-Size Recurrent Neural Networks Applied to Short-Term Load Forecasting
by Pavel V. Matrenin, Vadim Z. Manusov, Alexandra I. Khalyasmaa, Dmitry V. Antonenkov, Stanislav A. Eroshenko and Denis N. Butusov
Mathematics 2020, 8(12), 2169; https://doi.org/10.3390/math8122169 - 4 Dec 2020
Cited by 26 | Viewed by 3308
Abstract
The load forecasting of a coal mining enterprise is a complicated problem due to the irregular technological process of mining. It is necessary to apply models that can distinguish both cyclic components and complex rules in energy consumption data that reflect a highly volatile technological process. For such tasks, Artificial Neural Networks demonstrate advanced performance. In recent years, the effectiveness of Artificial Neural Networks has been significantly improved thanks to new state-of-the-art architectures, training methods, and approaches to reduce overfitting. In this paper, a Recurrent Neural Network architecture with a small-size model was applied to the short-term load forecasting of a coal mining enterprise. A single recurrent model was developed and trained for the entire four-year operational period of the enterprise, with significant changes in the energy consumption pattern during that period. This task was challenging since it required high-level generalization performance from the model. It was shown that the accuracy and generalization properties of small-size recurrent models can be significantly improved by the proper selection of hyperparameters and the training method. The effectiveness of the proposed approach was validated using a real-case dataset.
(This article belongs to the Special Issue Applied Data Analytics)
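
As a hedged sketch of a small-size recurrent load forecaster of the kind discussed, here is a compact GRU model in Keras; the architecture, window size, and hyperparameters are illustrative guesses, not the paper's values, and the series is synthetic.

```python
# A compact recurrent load forecaster in Keras; all settings are
# illustrative, and the hourly load series is synthetic.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(7)
series = (50 + 10 * np.sin(np.arange(2000) * 2 * np.pi / 24)
          + rng.normal(0, 2, 2000))                   # synthetic hourly load

window = 24                                           # one day of history
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., None]                                      # (samples, steps, features)

# Small-size model: one narrow GRU layer plus a linear head.
model = tf.keras.Sequential([
    tf.keras.layers.GRU(16, input_shape=(window, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mae")
model.fit(X, y, epochs=5, batch_size=64, validation_split=0.2, verbose=0)
print("MAE:", model.evaluate(X, y, verbose=0))
```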

15 pages, 912 KiB  
Article
WINFRA: A Web-Based Platform for Semantic Data Retrieval and Data Analytics
by Addi Ait-Mlouk, Xuan-Son Vu and Lili Jiang
Mathematics 2020, 8(11), 2090; https://doi.org/10.3390/math8112090 - 23 Nov 2020
Cited by 4 | Viewed by 3151
Abstract
Given the huge amounts of heterogeneous data stored in different locations, these data need to be federated and semantically interconnected for further use. This paper introduces WINFRA, a comprehensive open-access platform for semantic web data and advanced analytics based on natural language processing (NLP) and data mining techniques (e.g., association rules, clustering, and classification based on associations). The system is designed to facilitate federated data analysis, knowledge discovery, information retrieval, and new techniques for dealing with semantic web and knowledge graph representation. The processing step integrates data from multiple sources virtually by creating virtual databases. Afterwards, an RDF generator produces RDF files for the different data sources, together with SPARQL queries, to support semantic data search and knowledge graph representation. Furthermore, several application cases are provided to demonstrate how the platform facilitates advanced data analytics over semantic data and to showcase our proposed approach toward semantic association rules.
(This article belongs to the Special Issue Applied Data Analytics)
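
A minimal sketch of the RDF-plus-SPARQL pattern such a platform is built around, using the rdflib library; the namespace, triples, and query below are invented examples, not WINFRA's actual schema.

```python
# Turning tabular records into RDF triples and querying them with SPARQL,
# via rdflib; the vocabulary is invented for illustration.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")  # hypothetical namespace
g = Graph()

rows = [("alice", "Dataset_A"), ("bob", "Dataset_B")]  # stand-in source data
for person, dataset in rows:
    subject = EX[person]
    g.add((subject, RDF.type, EX.Researcher))
    g.add((subject, EX.uses, Literal(dataset)))

# SPARQL query over the generated graph.
query = """
    PREFIX ex: <http://example.org/>
    SELECT ?who ?dataset
    WHERE { ?who a ex:Researcher ; ex:uses ?dataset . }
"""
for who, dataset in g.query(query):
    print(who, dataset)
```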

22 pages, 490 KiB  
Article
A Novel Data Analytics Method for Predicting the Delivery Speed of Software Enhancement Projects
by Elías Ventura-Molina, Cuauhtémoc López-Martín, Itzamá López-Yáñez and Cornelio Yáñez-Márquez
Mathematics 2020, 8(11), 2002; https://doi.org/10.3390/math8112002 - 10 Nov 2020
Cited by 3 | Viewed by 2098
Abstract
A fundamental issue in software engineering economics is productivity. In this regard, one measure of software productivity is delivery speed. Software productivity prediction is useful for determining corrective activities, as well as for identifying improvement alternatives. One type of software maintenance is enhancement. In this paper, we propose a data analytics-based software engineering algorithm called the search method based on feature construction (SMFC) for predicting the delivery speed of software enhancement projects. The SMFC belongs to the minimalist machine learning paradigm and as such always generates a two-dimensional model. Unlike the usual data analytics methods, SMFC includes an original algorithmic training procedure in which both the independent and dependent variables are considered for transformation. SMFC prediction performance is compared to that of statistical regression, neural networks, support vector regression, and fuzzy regression. To do this, seven datasets of software enhancement projects obtained from the International Software Benchmarking Standards Group (ISBSG) Release 2017 were used. The validation method is leave-one-out cross validation, and absolute residuals were chosen as the performance measure. The results indicate that SMFC is statistically better than statistical regression, while the remaining methods are not statistically better than SMFC; this represents a clear advantage in favor of SMFC.
(This article belongs to the Special Issue Applied Data Analytics)
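
The SMFC algorithm itself is not public here, so the sketch below reproduces only the evaluation protocol the abstract describes: leave-one-out cross validation with absolute residuals as the performance measure, applied to a placeholder regression model on invented project data.

```python
# Leave-one-out cross validation with absolute residuals, as in the paper's
# evaluation protocol; the model and project data are placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(3)
size = rng.uniform(10, 500, 60)                 # invented project-size feature
speed = 0.8 * size + rng.normal(0, 20, 60)      # invented delivery speed
X, y = size.reshape(-1, 1), speed

abs_residuals = []
for train_idx, test_idx in LeaveOneOut().split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    abs_residuals.append(abs(pred[0] - y[test_idx][0]))

# Competing methods would be compared on these per-project absolute
# residuals, e.g., with a paired non-parametric significance test.
print("median absolute residual:", np.median(abs_residuals))
```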

15 pages, 2631 KiB  
Article
Corporate Performance and Economic Convergence between Europe and the US: A Cluster Analysis Along Industry Lines
by Călin Vâlsan and Elena Druică
Mathematics 2020, 8(3), 451; https://doi.org/10.3390/math8030451 - 20 Mar 2020
Cited by 6 | Viewed by 2750
Abstract
We investigate the extent to which the United States and the countries of Europe have achieved economic convergence of their corporate sectors. We define convergence as the homogenization of economic performance, institutional arrangements, and market valuation taking place at the meso-economic level. We perform a cluster analysis along industry lines and find that industries and corporations on both continents cluster into four groups, based on six variables measuring operating performance, ownership, and market valuation. The clusters resulting from the US data are more unstable than those resulting from the European data. We are also able to pair a handful of highly similar clusters between the US and European data. These findings suggest a complex dynamic. It seems that the US corporate sector is more homogeneous than the European one. Moreover, some degree of convergence between the European Union and the United States appears to have already occurred.
(This article belongs to the Special Issue Applied Data Analytics)
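
A hedged sketch of the clustering setup the abstract describes: four clusters over six standardized performance, ownership, and valuation variables. The variable names and data below are invented, and since the exact algorithm is not specified in the abstract, k-means is used as a common default.

```python
# Clustering industries into four groups on six standardized variables;
# k-means is a generic stand-in, and the data are synthetic.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
variables = ["roa", "roe", "margin", "ownership", "pe_ratio", "mb_ratio"]
industries = pd.DataFrame(rng.normal(size=(80, 6)), columns=variables)

z = StandardScaler().fit_transform(industries)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(z)

# Inspect cluster centroids in the original variable space.
industries["cluster"] = km.labels_
print(industries.groupby("cluster")[variables].mean().round(2))
```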
