Special Issue "Data Science: Measuring Uncertainties"

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Multidisciplinary Applications".

Deadline for manuscript submissions: closed (31 July 2020).

Special Issue Editors

Prof. Dr. Carlos Alberto De Bragança Pereira
Guest Editor
Federal University of Mato Grosso do Sul, Campo Grande, MS, Brazil and University of São Paulo, São Paulo, SP, Brazil
Interests: Bayesian statistics; controversies and paradoxes in probability and statistics; Bayesian reliability; Bayesian analysis of discrete data (BADD); applied statistics
Prof. Dr. Adriano Polpo
Guest Editor
School of Physics, Maths and Computing, Mathematics and Statistics, University of Western Australia, Crawley WA 6009, Australia
Interests: Bayesian inference; foundations of statistics; significance tests; reliability and survival analysis; model selection; biostatistics
Assist. Prof. Agatha Rodrigues
Guest Editor
Federal University of Espírito Santo, Vitória, ES, Brazil
Interests: data analysis; statistical analysis; statistical modeling; applied statistics; R statistical package

Special Issue Information

Dear Colleagues,

The demand for data analysis grows day by day, reflected in a large number of jobs and a high number of published articles. New solutions to data-analysis problems appear at a massive rate. A new era is coming! The dazzle is so great that many of us do not stop to check whether a solution actually suits the problem it is intended to solve. Current and future challenges require greater care in the creation of new solutions that respect the rationale of each type of problem. Labels such as big data, data science, machine learning, statistical learning, and artificial intelligence demand more sophistication in their foundations and in the way they are applied.

This Special Issue is dedicated to solutions for, and discussions of, measuring uncertainty in data analysis problems. Whether facing the large volumes of data typical of IoT (Internet of Things) applications, or the small sample sizes and huge dimensionality of many biological studies, one must understand the data properly, develop a sound process of analysis, and, finally, show how theoretically derived solutions apply in practice. We seek to respond to these challenges and publish papers that consider both the rationale for a solution and how to apply it. Papers may also cover existing methodologies, elucidating why they were selected and how they are used.

We are open to innovative solutions and theoretical works that justify the use of a method and to applied works that describe a good implementation of a theoretical method.

Prof. Dr. Carlos Alberto De Bragança Pereira
Prof. Dr. Adriano Polpo
Assist. Prof. Agatha Rodrigues
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (13 papers)


Editorial


Open Access Editorial
Data Science: Measuring Uncertainties
Entropy 2020, 22(12), 1438; https://doi.org/10.3390/e22121438 - 20 Dec 2020
Viewed by 541
Abstract
With the increase in data processing and storage capacity, a large amount of data is available [...] Full article
(This article belongs to the Special Issue Data Science: Measuring Uncertainties)

Research


Open Access Article
A Novel Comprehensive Evaluation Method for Estimating the Bank Profile Shape and Dimensions of Stable Channels Using the Maximum Entropy Principle
Entropy 2020, 22(11), 1218; https://doi.org/10.3390/e22111218 - 26 Oct 2020
Cited by 1 | Viewed by 495
Abstract
This paper presents an extensive and practical study of the estimation of stable channel bank shape and dimensions using the maximum entropy principle. The transverse slope (St) distribution of threshold channel bank cross-sections satisfies the properties of a probability space. The entropy of St is subject to two constraint conditions, and the principle of maximum entropy is applied to find the least biased probability distribution. Accordingly, the Lagrange multiplier (λ), a critical parameter in the entropy equation, is calculated numerically from the maximum entropy principle. The main goal of the present paper is a comprehensive investigation of the influence of the hydraulic parameters governing the mean transverse slope (St¯), using Gene Expression Programming (GEP) and only the initial information of the problem at hand (discharge (Q) and mean sediment size (d50)). An explicit and simple equation relating St¯ to the geometric and hydraulic parameters of flow is introduced, based on the GEP model in combination with a shape-profile equation from earlier research. On this basis, a reliable numerical hybrid model, the Entropy-based Design Model of Threshold Channels (EDMTC), is designed by combining entropy theory with the evolutionary GEP algorithm to estimate the bank profile shape and the dimensions of threshold channels. A wide range of laboratory and field data is used to verify the proposed EDMTC. The results demonstrate that the Shannon entropy model is accurate in estimating the bank profile shape of threshold channels, with a lower average Mean Absolute Relative Error (MARE) of 0.317 than the 0.98 of the model previously proposed by Cao and Knight (1997).
Furthermore, the proposed EDMTC attains acceptable accuracy in predicting the shape profile and, consequently, the dimensions of threshold channel banks over a wide range of laboratory and field data when only the channel hydraulic characteristics (e.g., Q and d50) are known. Thus, EDMTC can be used in threshold channel design and implementation when the channel characteristics are otherwise unknown. An uncertainty analysis further supports the model's high reliability, with a Width of Uncertainty Bound (WUB) of ±0.03 and standard deviation (Sd) of 0.24. Full article
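The numerical step described above, computing the Lagrange multiplier λ of a maximum-entropy density, can be sketched generically. The example below solves for λ in a mean-constrained maximum-entropy density on [0, 1] by bisection; the interval, the single mean constraint, and the solver are illustrative assumptions rather than the paper's transverse-slope model:

```python
import numpy as np

def maxent_lambda(mean, lo=-50.0, hi=50.0, tol=1e-10):
    """Numerically solve for the Lagrange multiplier lam of the
    maximum-entropy density p(x) proportional to exp(lam * x) on [0, 1]
    subject to a prescribed mean.

    The mean as a function of lam, m(lam) = 1/(1 - exp(-lam)) - 1/lam,
    is monotone increasing, so plain bisection suffices.
    """
    def m(lam):
        if abs(lam) < 1e-8:        # lam -> 0 recovers the uniform density
            return 0.5
        return 1.0 / (1.0 - np.exp(-lam)) - 1.0 / lam

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if m(mid) < mean:          # mean too small: a larger lam is needed
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A mean of 0.5 yields λ ≈ 0 (the uniform density), while means above or below 0.5 yield positive or negative multipliers, respectively.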

Open Access Article
Application of Cloud Model in Qualitative Forecasting for Stock Market Trends
Entropy 2020, 22(9), 991; https://doi.org/10.3390/e22090991 - 06 Sep 2020
Cited by 1 | Viewed by 941
Abstract
Forecasting stock prices plays an important role in setting a trading strategy or determining the appropriate timing for buying or selling a stock. Technical analysis has been successfully employed for financial forecasting by many researchers. However, the existing qualitative methods built on fuzzy reasoning techniques cannot describe the data comprehensively, which has greatly limited the objectivity of fuzzy time series in forecasting uncertain data. Extended fuzzy sets (e.g., the fuzzy probabilistic set) study the fuzziness of the membership grade of a concept. The cloud model, based on a probability measure space, automatically produces random membership grades of a concept through a cloud generator. In this paper, a cloud model-based approach is proposed to confirm candlestick patterns accurately in Japanese candlestick charts. By incorporating probability statistics and fuzzy set theories, the cloud model can perform the required transformation between qualitative concepts and quantitative data. The degree of certainty associated with candlestick patterns can be calculated through repeated assessments using the normal cloud model. A hybrid weighting method comprising the fuzzy time series and Heikin–Ashi candlesticks is employed to determine the weights of the indicators in the multi-criteria decision-making process. Fuzzy membership functions are constructed by the cloud model to deal effectively with the uncertainty and vagueness of the historical stock data, with the aim of predicting the next open, high, low, and close prices for the stock. The experimental results demonstrate the feasibility and high forecasting accuracy of the proposed model. Full article
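The forward normal cloud generator invoked above is a standard construction in cloud-model theory; a minimal sketch follows, using the conventional parameters Ex (expectation), En (entropy, i.e., spread) and He (hyper-entropy, the uncertainty about the spread). This is a generic illustration, not the paper's implementation:

```python
import numpy as np

def normal_cloud(Ex, En, He, n, rng=None):
    """Forward normal cloud generator: produce n cloud drops (x_i, mu_i).

    Each drop samples its own spread En_i ~ N(En, He^2), a position
    x_i ~ N(Ex, En_i^2), and a random membership grade mu_i in (0, 1].
    """
    rng = np.random.default_rng(rng)
    En_i = rng.normal(En, He, size=n)              # randomised spread per drop
    x = rng.normal(Ex, np.abs(En_i))               # drop position
    mu = np.exp(-(x - Ex) ** 2 / (2 * En_i ** 2))  # random membership grade
    return x, mu
```

Repeated calls yield the "repeated assessments" the abstract mentions: the membership grade of a given x is itself random, which is what distinguishes the cloud model from an ordinary fuzzy membership function.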

Open Access Article
A Novel Perspective of the Kalman Filter from the Rényi Entropy
Entropy 2020, 22(9), 982; https://doi.org/10.3390/e22090982 - 03 Sep 2020
Cited by 2 | Viewed by 880
Abstract
Rényi entropy, as a generalization of Shannon entropy, allows for different averaging of probabilities via a control parameter α. This paper gives a new perspective on the Kalman filter from the viewpoint of Rényi entropy. Firstly, Rényi entropy is employed to measure the uncertainty of a multivariate Gaussian probability density function. Then, we calculate the temporal derivative of the Rényi entropy of the Kalman filter's mean square error matrix, which is minimized to obtain the Kalman filter's gain. Moreover, the continuous Kalman filter approaches a steady state when the temporal derivative of the Rényi entropy equals zero, meaning that the Rényi entropy remains stable. As this temporal derivative is independent of the parameter α and coincides with the temporal derivative of the Shannon entropy, the result is the same as for Shannon entropy. Finally, an experiment tracking a falling body by radar with an unscented Kalman filter (UKF) in noisy conditions and a loosely coupled navigation experiment demonstrate the effectiveness of the conclusion. Full article
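The first step above, measuring the uncertainty of a multivariate Gaussian with Rényi entropy, has a closed form: for a k-variate N(μ, Σ), H_α = (k/2) ln 2π + (1/2) ln|Σ| + (k/2) ln α/(α − 1), which recovers the Shannon entropy as α → 1. A minimal sketch (the function name and the limiting branch are ours):

```python
import numpy as np

def renyi_entropy_gaussian(cov, alpha):
    """Renyi entropy of order alpha for a multivariate Gaussian N(mu, cov).

    H_alpha = (k/2) ln(2*pi) + (1/2) ln|cov| + (k/2) * ln(alpha)/(alpha - 1);
    the mean does not enter. The alpha -> 1 limit is the Shannon entropy.
    """
    cov = np.atleast_2d(cov)
    k = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    assert sign > 0, "covariance must be positive definite"
    if np.isclose(alpha, 1.0):   # lim ln(alpha)/(alpha-1) = 1, Shannon case
        correction = k / 2.0
    else:
        correction = (k / 2.0) * np.log(alpha) / (alpha - 1.0)
    return 0.5 * k * np.log(2 * np.pi) + 0.5 * logdet + correction
```

Note that only the α-dependent constant changes with the order, which is consistent with the paper's observation that the temporal derivative of the entropy is independent of α.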

Open Access Article
Cointegration and Unit Root Tests: A Fully Bayesian Approach
Entropy 2020, 22(9), 968; https://doi.org/10.3390/e22090968 - 31 Aug 2020
Cited by 2 | Viewed by 701
Abstract
To perform statistical inference for time series, one should be able to assess whether they present deterministic or stochastic trends. In univariate analysis, one way to detect stochastic trends is to test whether the series has unit roots; in multivariate studies it is often relevant to search for stationary linear relationships between the series, i.e., to test whether they cointegrate. The main goal of this article is to briefly review the shortcomings of Bayesian unit root and cointegration tests and to show how they can be overcome by the Full Bayesian Significance Test (FBST), a procedure designed to test sharp or precise hypotheses. We compare its performance with the most commonly used frequentist alternatives, namely the Augmented Dickey–Fuller test for unit roots and the maximum eigenvalue test for cointegration. Full article
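The frequentist baseline named above can be sketched in a few lines. The version below is the plain (non-augmented, intercept-only) Dickey–Fuller regression, a simplified stand-in for the Augmented Dickey–Fuller test rather than the article's implementation:

```python
import numpy as np

def dickey_fuller_tstat(y):
    """t-statistic of rho in the regression dy_t = c + rho * y_{t-1} + e_t.

    Under the unit-root null, rho = 0; large negative t-statistics
    (compared against Dickey-Fuller critical values) reject the null.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])  # intercept + lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - X.shape[1])      # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])              # t-stat of rho
```

For a stationary series the statistic is strongly negative, while for a random walk it stays near the Dickey–Fuller null distribution; note the statistic must be compared against Dickey–Fuller, not Student-t, critical values.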

Open Access Article
A New Multi-Attribute Emergency Decision-Making Algorithm Based on Intuitionistic Fuzzy Cross-Entropy and Comprehensive Grey Correlation Analysis
Entropy 2020, 22(7), 768; https://doi.org/10.3390/e22070768 - 14 Jul 2020
Cited by 3 | Viewed by 828
Abstract
Intuitionistic fuzzy distance measurement is an effective method for studying multi-attribute emergency decision-making (MAEDM) problems. Unfortunately, the traditional intuitionistic fuzzy distance measure cannot accurately reflect the difference between membership and non-membership data and can easily cause information confusion. Therefore, starting from the intuitionistic fuzzy number (IFN), this paper constructs a decision-making model based on intuitionistic fuzzy cross-entropy and a comprehensive grey correlation analysis algorithm. For MAEDM problems with completely unknown or partially known attribute weights, the method establishes a grey correlation analysis algorithm based on the objective evaluation values and the subjective preference values of the decision makers (DMs), which compensates for the information loss of traditional models and greatly improves the accuracy of MAEDM. Finally, taking the Wenchuan Earthquake of 12 May 2008 as a case study, this paper constructs and solves the ranking problem of shelters. Sensitivity analysis shows that the ranking of building shelters remains stable as the grey resolution coefficient increases from 0.4 to 1.0. Compared to the traditional intuitionistic fuzzy distance, this method is shown to be more reliable. Full article

Open Access Article
Objective Bayesian Inference in Probit Models with Intrinsic Priors Using Variational Approximations
Entropy 2020, 22(5), 513; https://doi.org/10.3390/e22050513 - 30 Apr 2020
Cited by 1 | Viewed by 869
Abstract
There is not much literature on objective Bayesian analysis for binary classification problems, especially for methods based on intrinsic priors. On the other hand, variational inference methods have been employed to solve classification problems using probit regression and logistic regression with normal priors. In this article, we propose applying the variational approximation to probit regression models with an intrinsic prior. We review the mean-field variational method and the procedure for developing an intrinsic prior for the probit regression model. We then present our work on implementing the variational Bayesian probit regression model using the intrinsic prior. Publicly available data from the world's largest peer-to-peer lending platform, LendingClub, is used to illustrate how model output uncertainties are addressed through the proposed framework. In the LendingClub data, the target variable is the final status of a loan, either charged-off or fully paid. Investors may well be interested in how predictive features such as FICO score, amount financed, and income affect the final loan status. Full article

Open Access Article
On a Class of Tensor Markov Fields
Entropy 2020, 22(4), 451; https://doi.org/10.3390/e22040451 - 16 Apr 2020
Cited by 3 | Viewed by 884
Abstract
Here, we introduce a class of Tensor Markov Fields intended as probabilistic graphical models for random variables spanned over multiplexed contexts. These fields are an extension of Markov random fields to tensor-valued random variables. By extending the results of Dobruschin, Hammersley, and Clifford to such tensor-valued fields, we prove that tensor Markov fields are indeed Gibbs fields whenever strictly positive probability measures are considered. Hence, there is a direct relationship with many results from theoretical statistical mechanics. We show how this class of Markov fields can be built from statistical dependency structures inferred on information-theoretical grounds over empirical data. Thus, aside from their purely theoretical interest, the Tensor Markov Fields described here may be useful for mathematical modeling and data analysis due to their intrinsic simplicity and generality. Full article

Open Access Article
Channels’ Confirmation and Predictions’ Confirmation: From the Medical Test to the Raven Paradox
Entropy 2020, 22(4), 384; https://doi.org/10.3390/e22040384 - 26 Mar 2020
Cited by 3 | Viewed by 1783
Abstract
After long arguments between positivism and falsificationism, the verification of universal hypotheses was replaced with the confirmation of uncertain major premises. Unfortunately, Hempel proposed the Raven Paradox. Then, Carnap used the increment of logical probability as the confirmation measure. Many confirmation measures have since been proposed. Among them, measure F, proposed by Kemeny and Oppenheim, possesses the symmetries and asymmetries proposed by Eells and Fitelson, the monotonicity proposed by Greco et al., and the normalizing property suggested by many researchers. Based on the semantic information theory, a measure b* similar to F is derived from the medical test. Like the likelihood ratio, measures b* and F can only indicate the quality of channels, or of the testing means, rather than the quality of probability predictions. Furthermore, it is still not easy to use b*, F, or another measure to clarify the Raven Paradox. For this reason, measure c*, similar to the correct rate, is derived. Measure c* supports the Nicod Criterion and undermines the Equivalence Condition, and hence can be used to eliminate the Raven Paradox. An example indicates that measures F and b* are helpful for diagnosing infection with the novel coronavirus, whereas most popular confirmation measures are not. Another example reveals that no popular confirmation measure can explain why a black raven confirms "Ravens are black" more strongly than a piece of chalk does. Measures F, b*, and c* indicate that the existence of fewer counterexamples matters more than the existence of more positive examples, and hence are compatible with Popper's falsification thought. Full article
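The Kemeny–Oppenheim measure F mentioned above has a simple closed form, F(H, E) = [P(E|H) − P(E|¬H)] / [P(E|H) + P(E|¬H)]. A sketch in terms of the medical-test quantities (sensitivity P(E|H) and false-positive rate P(E|¬H)); the function name is ours:

```python
def confirmation_F(p_e_given_h, p_e_given_not_h):
    """Kemeny-Oppenheim confirmation measure F(H, E).

    F = (P(E|H) - P(E|~H)) / (P(E|H) + P(E|~H)).
    F lies in [-1, 1]: F > 0 when the evidence E confirms H,
    F < 0 when it disconfirms H, and F = 0 when E is irrelevant.
    """
    num = p_e_given_h - p_e_given_not_h
    den = p_e_given_h + p_e_given_not_h
    return num / den
```

Like the likelihood ratio, F depends only on the two conditional probabilities of the "channel" (the test), not on the prior P(H), which is why, as the abstract notes, it gauges the quality of the testing means rather than of a probability prediction.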

Open Access Article
The Decomposition and Forecasting of Mutual Investment Funds Using Singular Spectrum Analysis
Entropy 2020, 22(1), 83; https://doi.org/10.3390/e22010083 - 09 Jan 2020
Cited by 4 | Viewed by 1076
Abstract
Singular spectrum analysis (SSA) is a non-parametric method that decomposes a time series into a set of components that can be interpreted and grouped as trend, periodicity, and noise, emphasizing the separability of the underlying components and of periodicities that occur at different time scales. The original time series can be recovered by summing all components; however, only the components associated with the signal should be used to reconstruct the noise-free time series and to produce forecasts. When the data contain outliers, SSA and other classic parametric and non-parametric methods may lead to misleading conclusions, and robust methodologies should be used instead. In this paper we consider two robust SSA algorithms for model fitting and one for forecasting. The classic SSA model, the robust SSA alternatives, and the autoregressive integrated moving average (ARIMA) model are compared in terms of computational time and accuracy for model fitting and forecasting, using a simulated example and time series data from the quotas and returns of six mutual investment funds. When outliers are present in the data, the simulation study shows that the robust SSA algorithms outperform the classical ARIMA and SSA models. Full article
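Basic SSA, the decomposition on which the robust variants build, can be sketched as an embedding step, an SVD, and diagonal averaging; the window length and helper name below are illustrative choices, not the paper's code:

```python
import numpy as np

def ssa_components(y, L):
    """Basic SSA: embed y into an L x K trajectory (Hankel) matrix, take
    its SVD, and diagonal-average each elementary matrix back into a
    series. Returns an array of shape (rank, N) whose sum reconstructs y;
    leading components are typically grouped as trend/periodicity, the
    rest as noise.
    """
    y = np.asarray(y, dtype=float)
    N = len(y)
    K = N - L + 1
    X = np.column_stack([y[i:i + L] for i in range(K)])  # trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    comps = []
    for i in range(len(s)):
        Xi = s[i] * np.outer(U[:, i], Vt[i])             # elementary matrix
        # Hankel (anti-diagonal) averaging back to a series of length N
        comp = np.array([np.mean(Xi[::-1].diagonal(k)) for k in range(-(L - 1), K)])
        comps.append(comp)
    return np.array(comps)
```

Because diagonal averaging is linear and the trajectory matrix is Hankel, summing every component reproduces the original series exactly; denoising amounts to summing only the signal components.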

Open Access Article
Prior Sensitivity Analysis in a Semi-Parametric Integer-Valued Time Series Model
Entropy 2020, 22(1), 69; https://doi.org/10.3390/e22010069 - 06 Jan 2020
Cited by 1 | Viewed by 1101
Abstract
We examine issues of prior sensitivity in a semi-parametric hierarchical extension of the INAR(p) model with innovation rates clustered according to a Pitman–Yor process placed at the top of the model hierarchy. Our main finding is a graphical criterion that guides the specification of the hyperparameters of the Pitman–Yor process base measure. We show how the discount and concentration parameters interact with the chosen base measure to yield a gain in terms of the robustness of the inferential results. The forecasting performance of the model is exemplified in the analysis of a time series of worldwide earthquake events, for which the new model outperforms the original INAR(p) model. Full article
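Sampling from a Pitman–Yor process, the prior placed at the top of the model hierarchy above, can be sketched with the generalised Chinese-restaurant scheme; the function below is a generic illustration of how the discount and concentration parameters shape the clustering of innovation rates, not the paper's sampler:

```python
import numpy as np

def pitman_yor_partition(n, d, theta, rng=None):
    """Sample a random partition of n items from a Pitman-Yor process
    via the generalised Chinese-restaurant scheme.

    d in [0, 1) is the discount, theta > -d the concentration. Item i
    joins existing cluster k with probability (n_k - d)/(i + theta) and
    opens a new cluster with probability (theta + d*K)/(i + theta).
    Returns a cluster label per item.
    """
    rng = np.random.default_rng(rng)
    labels = [0]
    sizes = [1]
    for i in range(1, n):
        probs = np.array([nk - d for nk in sizes] + [theta + d * len(sizes)])
        probs /= i + theta                      # probabilities sum to one
        k = rng.choice(len(probs), p=probs)
        if k == len(sizes):                     # a new cluster is opened
            sizes.append(1)
        else:
            sizes[k] += 1
        labels.append(k)
    return np.array(labels)
```

Larger discounts produce heavier-tailed cluster-size distributions, which is the kind of interaction between discount, concentration, and base measure the prior sensitivity analysis above studies.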

Open Access Article
Universal Sample Size Invariant Measures for Uncertainty Quantification in Density Estimation
Entropy 2019, 21(11), 1120; https://doi.org/10.3390/e21111120 - 15 Nov 2019
Cited by 2 | Viewed by 984
Abstract
Previously, we developed a high-throughput non-parametric maximum entropy method (PLOS ONE, 13(5): e0196937, 2018) that employs a log-likelihood scoring function to characterize uncertainty in trial probability density estimates through a scaled quantile residual (SQR). The SQR for the true probability density has universal, sample-size-invariant properties equivalent to sampled uniform random data (SURD). Alternative scoring functions are considered, including the Anderson–Darling test. Scoring-function effectiveness is evaluated using receiver operating characteristics to quantify efficacy in discriminating SURD from decoy-SURD, and by comparing overall performance characteristics during density estimation across a diverse test set of known probability distributions. Full article
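The Anderson–Darling score mentioned above, applied to the SQR's uniform reference behaviour (SURD), reduces to the classical statistic for a fully specified Uniform(0, 1) null; a minimal sketch with our own function name:

```python
import numpy as np

def anderson_darling_uniform(u):
    """Anderson-Darling statistic A^2 for testing that a sample u comes
    from Uniform(0, 1), the reference behaviour of a scaled quantile
    residual under the true density.

    A^2 = -n - (1/n) * sum_i (2i - 1) * (ln u_(i) + ln(1 - u_(n+1-i))),
    with u_(i) the ascending order statistics. Small values are
    consistent with uniformity; large values reject it.
    """
    u = np.sort(np.asarray(u, dtype=float))
    n = len(u)
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(u) + np.log(1 - u[::-1])))
```

The statistic weights discrepancies in the tails heavily, which is one reason it is a natural alternative scoring function for discriminating SURD from decoy-SURD.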

Open Access Article
An Integrated Approach for Making Inference on the Number of Clusters in a Mixture Model
Entropy 2019, 21(11), 1063; https://doi.org/10.3390/e21111063 - 30 Oct 2019
Cited by 1 | Viewed by 681
Abstract
This paper presents an integrated approach to estimating the parameters of a mixture model in the context of data clustering. The method is designed to estimate the unknown number of clusters from observed data. To this end, we marginalize out the weights to obtain allocation probabilities that depend on the number of clusters but not on the number of components of the mixture model. As an alternative to the stochastic expectation maximization (SEM) algorithm, we propose the integrated stochastic expectation maximization (ISEM) algorithm, which, in contrast to SEM, does not require the a priori specification of the number of components of the mixture. Using this algorithm, one estimates the parameters associated with clusters having at least two observations via local maximization of the likelihood function. In addition, at each iteration of the algorithm, there is a positive probability of a new cluster being created by a single observation. Using simulated datasets, we compare the performance of the ISEM algorithm against both the SEM and reversible jump (RJ) algorithms; the results show that ISEM outperforms both. We also assess the performance of the three algorithms on two real datasets. Full article
