Statistical Methods and Applications

A special issue of Axioms (ISSN 2075-1680). This special issue belongs to the section "Mathematical Analysis".

Deadline for manuscript submissions: closed (30 September 2023) | Viewed by 49149

Special Issue Editor


Guest Editor
Statistics Discipline, Division of Science and Mathematics, University of Minnesota at Morris, Morris, MN 56267, USA
Interests: probability and stochastic processes; functional data analysis; financial time series

Special Issue Information

Dear Colleagues,

This Special Issue welcomes original papers and critical reviews on statistical methodologies and their broad applications in different scientific domains, including environmental, social, and governance (ESG).

ESG is an important research topic in economics, finance, mathematics, statistics, and many other fields. To promote ESG research in applied statistics, diverse machine learning and artificial intelligence techniques have been developed for large and complex data, such as satellite imagery. This Special Issue presents modern machine learning and data analysis methods for ESG. Suitable topics include, but are not limited to, the following:

  • Stochastic modeling and statistical methods.
  • Applied mathematics.
  • Environmental, social, and governance (ESG). 
  • Green technologies:
    • Artificial intelligence.
    • Blockchain.
    • Big data.
    • Cryptocurrencies.
    • Cyber security.
    • Data analytics.
    • Data mining.
    • Deep learning.
    • Electronic data interchange (EDI).
    • E-learning.
    • Internet security.
    • Internet of things.
    • Neural networks.
    • Fuzzy logic.
    • Expert systems.
    • Sentiment analysis.

Prof. Dr. Jong-Min Kim
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Axioms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (22 papers)


Research

20 pages, 371 KiB  
Article
Probability Distributions Approximation via Fractional Moments and Maximum Entropy: Theoretical and Computational Aspects
by Pier Luigi Novi Inverardi and Aldo Tagliani
Axioms 2024, 13(1), 28; https://doi.org/10.3390/axioms13010028 - 30 Dec 2023
Cited by 5 | Viewed by 1439
Abstract
In the literature, the use of fractional moments to express the available information in the framework of maximum entropy (MaxEnt) approximation of a distribution F having finite or unbounded positive support has been considered essentially as a computational tool to improve the performance of the analogous procedure based on integer moments. No attention has been paid to two formal aspects concerning fractional moments, namely conditions for the existence of the maximum entropy approximation based on them and convergence in entropy of this approximation to F. This paper aims to fill this gap by providing proofs of these two fundamental results. In fact, convergence in entropy can be involved in the optimal selection of the order of fractional moments for accelerating the convergence of the MaxEnt approximation to F, to clarify the entailment relationships of this type of convergence with other types of convergence useful in statistical applications, and to preserve some important prior features of the underlying F distribution. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)
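For orientation, the MaxEnt density constrained by M fractional moments takes the familiar exponential form sketched below; the notation is generic and not taken verbatim from the paper.

    f_M(x) = \exp\Big(-\lambda_0 - \sum_{j=1}^{M} \lambda_j x^{\alpha_j}\Big), \quad x > 0,
    \text{subject to } \int_0^{\infty} x^{\alpha_j} f_M(x)\,dx = \mu_{\alpha_j}, \quad j = 1, \dots, M,

where the alpha_j > 0 are the (possibly non-integer) moment orders and the Lagrange multipliers lambda_j are chosen so that the constraints hold; the paper's results concern when such an f_M exists and when its entropy converges to that of F.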
17 pages, 895 KiB  
Article
Modeling High-Frequency Zeros in Time Series with Generalized Autoregressive Score Models with Explanatory Variables: An Application to Precipitation
by Pedro Vidal-Gutiérrez, Sergio Contreras-Espinoza and Francisco Novoa-Muñoz
Axioms 2024, 13(1), 15; https://doi.org/10.3390/axioms13010015 - 25 Dec 2023
Viewed by 1185
Abstract
An extension of the Generalized Autoregressive Score (GAS) model is presented for time series with excess null observations to include explanatory variables. An extension of the GAS model proposed by Harvey and Ito is suggested, and it is applied to precipitation data from a city in Chile. It is concluded that the model provides adequate prediction, and furthermore, an analysis of the relationship between the precipitation variable and the explanatory variables is shown. This relationship is compared with the meteorology literature, demonstrating concurrence. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)
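Schematically, a GAS(1,1) recursion augmented with exogenous regressors x_t updates a time-varying parameter f_t by the scaled score of the conditional density; this is a generic sketch, not the paper's exact specification.

    f_{t+1} = \omega + \beta f_t + \alpha s_t + \gamma^{\top} x_t, \qquad
    s_t = S_t \nabla_t, \quad \nabla_t = \frac{\partial \log p(y_t \mid f_t)}{\partial f_t},

where \nabla_t is the score, S_t is a scaling matrix (for example, the inverse Fisher information), and f_t could drive, for instance, the probability of a zero-precipitation observation.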

21 pages, 1111 KiB  
Article
Estimation of Entropy for Generalized Rayleigh Distribution under Progressively Type-II Censored Samples
by Haiping Ren, Qin Gong and Xue Hu
Axioms 2023, 12(8), 776; https://doi.org/10.3390/axioms12080776 - 10 Aug 2023
Cited by 3 | Viewed by 1001
Abstract
This paper investigates the problem of entropy estimation for the generalized Rayleigh distribution under progressively type-II censored samples. Based on progressively type-II censored samples, we first discuss the maximum likelihood estimation and interval estimation of Shannon entropy for the generalized Rayleigh distribution. Then, we explore the Bayesian estimation problem of entropy under three types of loss functions: K-loss function, weighted squared error loss function, and precautionary loss function. Due to the complexity of Bayesian estimation computation, we use the Lindley approximation and MCMC method for calculating Bayesian estimates. Finally, using a Monte Carlo statistical simulation, we compare the mean square errors to examine the superiority of maximum likelihood estimation and Bayesian estimation under different loss functions. An actual example is provided to verify the feasibility and practicality of various estimations. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)
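As a point of reference, the Shannon entropy of a fitted generalized Rayleigh (Burr Type X) model can be evaluated numerically once parameter estimates are available. The sketch below assumes the common parameterization f(x; alpha, lambda) = 2 alpha lambda^2 x exp(-(lambda x)^2) (1 - exp(-(lambda x)^2))^(alpha - 1) and plugs in hypothetical estimates; it is an illustration, not the authors' procedure.

    import numpy as np
    from scipy.integrate import quad

    def gr_pdf(x, alpha, lam):
        # Generalized Rayleigh (Burr X) density, assumed parameterization.
        z = np.exp(-(lam * x) ** 2)
        return 2 * alpha * lam**2 * x * z * (1 - z) ** (alpha - 1)

    def shannon_entropy(alpha, lam):
        # H = -E[log f(X)], computed by numerical integration.
        def integrand(x):
            f = gr_pdf(x, alpha, lam)
            return -f * np.log(f) if f > 0 else 0.0
        H, _ = quad(integrand, 0, np.inf)
        return H

    # Plug in hypothetical maximum likelihood estimates (alpha_hat, lambda_hat).
    print(shannon_entropy(alpha=2.0, lam=1.5))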

26 pages, 1150 KiB  
Article
Statistical Inference for Two Gumbel Type-II Distributions under Joint Type-II Censoring Scheme
by Yu Qiu and Wenhao Gui
Axioms 2023, 12(6), 572; https://doi.org/10.3390/axioms12060572 - 8 Jun 2023
Cited by 1 | Viewed by 1248
Abstract
Comparative lifetime tests are extremely significant when the experimenters study the reliability of the comparative advantages of two products in competition. Considering joint type-II censoring, we deal with the inference when two product lines conform to two Gumbel type-II distributions. The maximum likelihood estimations of Gumbel type-II population parameters were obtained in the current research. An approximate confidence interval and a simultaneous confidence interval based on a Fisher information matrix were also constructed and compared with two bootstrap confidence intervals. Moreover, to evaluate the influence of the prior information, based on the concept of importance sampling, we calculated the Bayesian estimator together with their posterior risks in the case of gamma and non-informative priors under different loss functions. To compare the performances of the overall parameters’ estimator, a Monte Carlo simulation was performed using numerical and graphical methods. Finally, a real data analysis was conducted to verify the accuracy of all the models and methods mentioned. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)

22 pages, 2006 KiB  
Article
Modeling Socioeconomic Determinants of Building Fires through Backward Elimination by Robust Final Prediction Error Criterion
by Albertus Untadi, Lily D. Li, Michael Li and Roland Dodd
Axioms 2023, 12(6), 524; https://doi.org/10.3390/axioms12060524 - 26 May 2023
Cited by 2 | Viewed by 1780
Abstract
Fires in buildings are significant public safety hazards and can result in fatalities and substantial financial losses. Studies have shown that the socioeconomic makeup of a region can impact the occurrence of building fires. However, existing models based on the classical stepwise regression procedure have limitations. This paper proposes a more accurate predictive model of building fire rates using a set of socioeconomic variables. To improve the model’s forecasting ability, a backward elimination by robust final prediction error (RFPE) criterion is introduced. The proposed approach is applied to census and fire incident data from the South East Queensland region of Australia. A cross-validation procedure is used to assess the model’s accuracy, and comparative analyses are conducted using other elimination criteria such as the p-value, Akaike’s information criterion (AIC), Bayesian information criterion (BIC), and predicted residual error sum of squares (PRESS). The results demonstrate that the RFPE criterion yields a more accurate predictive model based on several goodness-of-fit measures. Overall, the RFPE equation was found to be a suitable criterion for the backward elimination procedure in the socioeconomic modeling of building fires. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)
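The backward-elimination loop itself is generic; the sketch below shows its structure with a pluggable criterion function, using AIC from statsmodels as a stand-in since the RFPE formula is not reproduced in the abstract. It assumes X is a pandas DataFrame of candidate socioeconomic predictors and y is the fire-rate response.

    import statsmodels.api as sm

    def backward_eliminate(X, y, criterion=lambda res: res.aic):
        # Greedy backward elimination: drop one predictor at a time whenever
        # doing so lowers the chosen criterion.
        cols = list(X.columns)
        best = criterion(sm.OLS(y, sm.add_constant(X[cols])).fit())
        improved = True
        while improved and len(cols) > 1:
            improved = False
            for c in cols:
                trial = [v for v in cols if v != c]
                score = criterion(sm.OLS(y, sm.add_constant(X[trial])).fit())
                if score < best:          # smaller criterion value = better model
                    best, cols, improved = score, trial, True
                    break
        return cols, best

A robust criterion such as the paper's RFPE would simply replace the default lambda with a function that computes that quantity from the fitted (robust) regression.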

16 pages, 10110 KiB  
Article
On Construction and Estimation of Mixture of Log-Bilal Distributions
by Showkat Ahmad Lone, Tabassum Naz Sindhu, Sadia Anwar, Marwa K. H. Hassan, Sarah A. Alsahli and Tahani A. Abushal
Axioms 2023, 12(3), 309; https://doi.org/10.3390/axioms12030309 - 19 Mar 2023
Cited by 4 | Viewed by 1772
Abstract
Recently, the use of mixed models for analyzing real data sets with infinite domains has gained favor. However, only a specific type of mixture model using mostly maximum likelihood estimation technique has been exercised in the literature, and fitting the mixture models for bounded data (between zero and one) has been neglected. In statistical mechanics, unit distributions are widely utilized to explain practical numeric values ranging between zero and one. We presented a classical examination for the trade share data set using a mixture of two log-Bilal distributions (MLBDs). We examine the features and statistical estimation of the MLBD in connection with three techniques. The sensitivity of the presented estimators with respect to model parameters, weighting proportions, sample size, and different evaluation methodologies has also been discussed. A simulation investigation is also used to endorse the estimation results. The findings on maximum likelihood estimation were more persuasive than those of existing mixture models. The flexibility and importance of the proposed distribution are illustrated by means of real datasets. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)

8 pages, 726 KiB  
Article
Nonparametric Directional Dependence Estimation and Its Application to Cryptocurrency
by Hohsuk Noh, Hyuna Jang, Kun Ho Kim and Jong-Min Kim
Axioms 2023, 12(3), 293; https://doi.org/10.3390/axioms12030293 - 11 Mar 2023
Viewed by 1428
Abstract
This paper proposes a nonparametric directional dependence by using the local polynomial regression technique. With data generated from a bivariate copula having a nonmonotone regression structure, we show that our nonparametric directional dependence is superior to the copula directional dependence method in terms of the root-mean-square error. To validate the directional dependence with real data, we use the log returns of daily prices of Bitcoin, Ethereum, Ripple, and Stellar. We conclude that our nonparametric directional dependence, by using the local polynomial regression technique with asymmetric-threshold GARCH models for marginal distributions, detects the directional dependence better than the copula directional dependence method by an asymmetric GARCH model. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)
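To make the idea concrete, a directional-dependence measure of the form rho^2(U -> V) = Var(E[V | U]) / Var(V) can be estimated on copula-scale pseudo-observations with a local (lowess) smoother. This is an illustrative sketch with simulated data, not the authors' estimator or their asymmetric-GARCH marginal modeling.

    import numpy as np
    from scipy.stats import rankdata
    from statsmodels.nonparametric.smoothers_lowess import lowess

    def directional_dependence(x, y, frac=0.3):
        n = len(x)
        u = rankdata(x) / (n + 1)          # pseudo-observations (copula scale)
        v = rankdata(y) / (n + 1)
        m_hat = lowess(v, u, frac=frac, return_sorted=False)   # estimate of E[V | U = u]
        return np.var(m_hat) / np.var(v)

    rng = np.random.default_rng(1)
    x = rng.normal(size=500)
    y = x**2 + rng.normal(scale=0.5, size=500)    # nonmonotone dependence
    print(directional_dependence(x, y), directional_dependence(y, x))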

18 pages, 1535 KiB  
Article
Slow Manifolds for Stochastic Koper Models with Stable Lévy Noises
by Hina Zulfiqar, Shenglan Yuan and Muhammad Shoaib Saleem
Axioms 2023, 12(3), 261; https://doi.org/10.3390/axioms12030261 - 3 Mar 2023
Cited by 1 | Viewed by 1302
Abstract
The Koper model is a vector field in which the differential equations describe the electrochemical oscillations appearing in diffusion processes. This work focuses on the understanding of the slow dynamics of a stochastic Koper model perturbed by stable Lévy noise. We establish the slow manifold for a stochastic Koper model with stable Lévy noise and verify exponential tracking properties. We also present two practical examples to demonstrate the analytical results with numerical simulations. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)

18 pages, 2633 KiB  
Article
Estimation of Uncertainty for Technology Evaluation Factors via Bayesian Neural Networks
by Juhyun Lee, Sangsung Park and Junseok Lee
Axioms 2023, 12(2), 145; https://doi.org/10.3390/axioms12020145 - 31 Jan 2023
Cited by 1 | Viewed by 1673
Abstract
In contemporary times, science-based technologies are needed for launching innovative products and services in the market. As technology-based management strategies are gaining importance, associated patents need to be comprehensively studied. Previous studies have proposed predictive models based on patent factors. However, technology-based management strategies can influence the growth and decline of firms. Thus, this study aims to estimate uncertainties of the factors that are frequently used in technology-based studies. Furthermore, the importance of the factors may fluctuate over time. Therefore, we propose a Bayesian neural network model based on Flipout and four research hypotheses to evaluate the validity of our method. The proposed method not only estimates the uncertainties of the factors, but also predicts the future value of technologies. Our contribution is to (i) provide a tractable Bayesian neural network applicable to big data, (ii) discover factors that affect the value of technology, and (iii) present empirical evidence for the timeliness and objectivity of technology evaluation. In our experiments, 3781 healthcare-related cases of patents were used, and we found that the proposed hypotheses were all statistically significant. Therefore, we believe that reliable and stable technology-based management strategies can be established through our method. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)
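A minimal sketch of the Flipout idea is given below. It assumes a TensorFlow 2 / TensorFlow Probability environment in which tfp.layers.DenseFlipout is available, uses random placeholder data, and only illustrates how weight uncertainty propagates to predictions; it is not the authors' architecture, data, or training setup.

    import numpy as np
    import tensorflow as tf
    import tensorflow_probability as tfp

    model = tf.keras.Sequential([
        tfp.layers.DenseFlipout(32, activation="relu"),
        tfp.layers.DenseFlipout(1),
    ])
    # KL terms from the Flipout layers are added through the layer losses
    # (unscaled here; in practice they are usually divided by the sample size).
    model.compile(optimizer="adam", loss="mse")

    X = np.random.rand(256, 8).astype("float32")   # placeholder evaluation factors
    y = np.random.rand(256, 1).astype("float32")   # placeholder technology value
    model.fit(X, y, epochs=5, verbose=0)

    # Flipout layers resample weight perturbations on each forward pass, so
    # repeated calls give a Monte Carlo spread (predictive uncertainty).
    draws = np.stack([model(X[:5]).numpy() for _ in range(50)])
    print(draws.mean(axis=0).ravel(), draws.std(axis=0).ravel())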

23 pages, 2434 KiB  
Article
Complete Study of an Original Power-Exponential Transformation Approach for Generalizing Probability Distributions
by Mustafa S. Shama, Farid El Ktaibi, Jamal N. Al Abbasi, Christophe Chesneau and Ahmed Z. Afify
Axioms 2023, 12(1), 67; https://doi.org/10.3390/axioms12010067 - 7 Jan 2023
Cited by 6 | Viewed by 1760
Abstract
In this paper, we propose a flexible and general family of distributions based on an original power-exponential transformation approach. We call it the modified generalized-G (MGG) family. The elegance and significance of this family lie in the ability to modify the standard distributions by changing their functional forms without adding new parameters, by compounding two distributions, or by adding one or two shape parameters. The aim of this modification is to provide flexible shapes for the corresponding probability functions. In particular, the distributions of the MGG family can possess increasing, constant, decreasing, “unimodal”, or “bathtub-shaped“ hazard rate functions, which are ideal for fitting several real data sets encountered in applied fields. Some members of the MGG family are proposed for special distributions. Following that, the uniform distribution is chosen as a baseline distribution to yield the modified uniform (MU) distribution with the goal of efficiently modeling measures with bounded values. Some useful key properties of the MU distribution are determined. The estimation of the unknown parameters of the MU model is discussed using seven methods, and then, a simulation study is carried out to explore the performance of the estimates. The flexibility of this model is illustrated by the analysis of two real-life data sets. When compared to fair and well-known competitor models in contemporary literature, better-fitting results are obtained for the new model. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)

13 pages, 335 KiB  
Article
Goodness-of-Fit Test for the Bivariate Hermite Distribution
by Pablo González-Albornoz and Francisco Novoa-Muñoz
Axioms 2023, 12(1), 7; https://doi.org/10.3390/axioms12010007 - 22 Dec 2022
Cited by 2 | Viewed by 1446
Abstract
This paper studies the goodness of fit test for the bivariate Hermite distribution. Specifically, we propose and study a Cramér–von Mises-type test based on the empirical probability generation function. The bootstrap can be used to consistently estimate the null distribution of the test statistics. A simulation study investigates the goodness of the bootstrap approach for finite sample sizes. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)
16 pages, 3989 KiB  
Article
Technology Opportunity Analysis Based on Machine Learning
by Junseok Lee, Sangsung Park and Juhyun Lee
Axioms 2022, 11(12), 708; https://doi.org/10.3390/axioms11120708 - 8 Dec 2022
Cited by 1 | Viewed by 1827
Abstract
The sustainable growth of a company requires a differentiated research and development strategy through the discovery of technology opportunities. However, previous studies fell short of the need for utilizing outlier keywords, based on approaches from various perspectives, to discover technology opportunities. In this study, a technology opportunity discovery method utilizing outlier keywords is proposed. First, the collected patent data are divided into several subsets, and outlier keywords are derived using Word2Vec (W2V) and the local outlier factor (LOF). The derived keywords are clustered through the K-means algorithm. Finally, the similarity between the clusters is evaluated to determine the cluster with the most similarity as a potential technology. In this study, 5679 cases of unmanned aerial vehicle (UAV) patent data were utilized, from which three technology opportunities were derived: UAV defense technology, UAV charging station technology, and UAV measurement precision improvement technology. The proposed method will contribute to discovering differentiated technology fields in advance using technologies with semantic differences and outlier keywords, in which the meaning of words is considered through W2V application. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)
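A rough sketch of this keyword pipeline is given below, with a toy corpus standing in for the patent text. The package choices (gensim, scikit-learn) and all names are illustrative assumptions, not the authors' implementation.

    from gensim.models import Word2Vec
    from sklearn.neighbors import LocalOutlierFactor
    from sklearn.cluster import KMeans

    # Toy tokenized "patent" documents.
    docs = [["uav", "battery", "charging"], ["uav", "radar", "jamming"],
            ["uav", "camera", "gimbal"], ["uav", "swarm", "defense"]]

    w2v = Word2Vec(docs, vector_size=50, min_count=1, seed=0)   # keyword embeddings
    words = list(w2v.wv.index_to_key)
    vecs = w2v.wv[words]

    lof = LocalOutlierFactor(n_neighbors=3)
    labels = lof.fit_predict(vecs)                 # -1 marks outlier keywords
    outlier_vecs = vecs[labels == -1]

    if len(outlier_vecs) >= 2:
        clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(outlier_vecs)
        print(clusters)                            # candidate technology-opportunity clusters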

15 pages, 3690 KiB  
Article
New Financial Ratios Based on the Compositional Data Methodology
by Salvador Linares-Mustarós, Maria Àngels Farreras-Noguer, Núria Arimany-Serrat and Germà Coenders
Axioms 2022, 11(12), 694; https://doi.org/10.3390/axioms11120694 - 4 Dec 2022
Cited by 7 | Viewed by 2796
Abstract
Due to the type of mathematical construction, the use of standard financial ratios in studies analyzing the financial health of a group of firms leads to a series of statistical problems that can invalidate the results obtained. These problems originate from the asymmetry of financial ratios. The present article justifies the use of a new methodology using Compositional Data (CoDa) to analyze the financial statements of an industry, improving analyses using conventional ratios, since the new methodology enables statistical techniques to be applied without encountering any serious drawbacks, such as skewness and outliers, and without the results depending on the arbitrary choice as to which of the accounting figures is the numerator of the ratio and which is the denominator. An example with data on the wine industry is provided. The results show that when using CoDa, outliers and skewness are much reduced, and results are invariant to numerator and denominator permutation. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)
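The core CoDa device can be illustrated with the centered log-ratio (clr) transform, which replaces asymmetric ratios by symmetric log-ratio coordinates; the account names in the sketch below are illustrative, not the paper's variables.

    import numpy as np

    def clr(parts):
        """Centered log-ratio transform of an (n_firms x n_parts) positive array."""
        logp = np.log(parts)
        return logp - logp.mean(axis=1, keepdims=True)

    accounts = np.array([[120.0,  80.0, 200.0],    # e.g., current assets,
                         [ 40.0,  90.0, 150.0],    # current liabilities, equity
                         [300.0,  50.0, 400.0]])
    print(clr(accounts))
    # Since log(a/b) = clr(a) - clr(b), swapping numerator and denominator only
    # changes the sign of a coordinate, so the analysis no longer depends on
    # which accounting figure is placed on top of the ratio.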

11 pages, 2026 KiB  
Article
Text Data Analysis Using Generalized Linear Mixed Model and Bayesian Visualization
by Sunghae Jun
Axioms 2022, 11(12), 674; https://doi.org/10.3390/axioms11120674 - 26 Nov 2022
Cited by 1 | Viewed by 1949
Abstract
Many parts of big data, such as web documents, online posts, papers, patents, and articles, are in text form. So, the analysis of text data in the big data domain is an important task. Many methods based on statistics or machine learning algorithms have been studied for text data analysis. Most of them are analytical methods based on the generalized linear model (GLM). In the GLM, text data analysis is performed under the assumption that the error included in the given data follows a Gaussian distribution. However, the GLM has shown limitations in the analysis of text data, including data sparseness. This is because the preprocessed text data has a zero-inflation problem. To solve this problem, we propose a text data analysis using the generalized linear mixed model (GLMM) and Bayesian visualization. Therefore, the objective of our study is to propose the use of the GLMM to overcome the limitations of the conventional GLM in the analysis of text data with a zero-inflation problem. The GLMM uses various probability distributions besides the Gaussian for the error terms and accounts for differences between observations by clustering. We also use Bayesian visualization to find meaningful associations between keywords. Lastly, we carry out the analysis of text data retrieved from real domains and provide the analytical results to show the performance and validity of our proposed method. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)

17 pages, 786 KiB  
Article
Copula Dynamic Conditional Correlation and Functional Principal Component Analysis of COVID-19 Mortality in the United States
by Jong-Min Kim
Axioms 2022, 11(11), 619; https://doi.org/10.3390/axioms11110619 - 7 Nov 2022
Cited by 3 | Viewed by 2079
Abstract
This paper shows a visual analysis and the dependence relationships of COVID-19 mortality data in 50 states plus Washington, D.C., from January 2020 to 1 September 2022. Since the mortality data are severely skewed and highly dispersed, a traditional linear model is not suitable for the data. As such, we use a Gaussian copula marginal regression (GCMR) model and vine copula-based quantile regression to analyze the COVID-19 mortality data. For a visual analysis of the COVID-19 mortality data, a functional principal component analysis (FPCA), graphical model, and copula dynamic conditional correlation (copula-DCC) are applied. The visual from the graphical model shows five COVID-19 mortality equivalence groups in the US, and the results of the FPCA visualize the COVID-19 daily mortality time trends for 50 states plus Washington, D.C. The GCMR model investigates the COVID-19 daily mortality relationship between four major states and the rest of the states in the US. The copula-DCC models investigate the time-trend dependence relationship between the COVID-19 daily mortality data of four major states. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)
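As background, FPCA on curves observed over a common grid reduces, after centering, to a singular value decomposition of the discretized trajectories; the sketch below uses simulated curves in place of the state-level mortality series.

    import numpy as np

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 200)
    # 51 simulated "state" trajectories on a common time grid.
    curves = np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=(51, 200))

    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)

    explained = s**2 / np.sum(s**2)     # fraction of variance per component
    scores = U * s                      # FPC scores, one row per state
    harmonics = Vt                      # discretized eigenfunctions (time trends)
    print(explained[:3])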

19 pages, 2975 KiB  
Article
A Method for Analyzing the Performance Impact of Imbalanced Binary Data on Machine Learning Models
by Ming Zheng, Fei Wang, Xiaowen Hu, Yuhao Miao, Huo Cao and Mingjing Tang
Axioms 2022, 11(11), 607; https://doi.org/10.3390/axioms11110607 - 1 Nov 2022
Cited by 11 | Viewed by 3255
Abstract
Machine learning models may not be able to effectively learn and predict from imbalanced data in the fields of machine learning and data mining. This study proposes a method for analyzing the performance impact of imbalanced binary data on machine learning models. It systematically analyzes (1) the relationship between the varying performance of machine learning models and the imbalance rate (IR), and (2) the performance stability of machine learning models on imbalanced binary data. In the proposed method, imbalanced data augmentation algorithms are first designed to obtain imbalanced datasets with gradually varying IR. Then, in order to obtain more objective classification results, the evaluation metric AFG, the arithmetic mean of the area under the receiver operating characteristic curve (AUC), F-measure, and G-mean, is used to evaluate the classification performance of machine learning models. Finally, based on AFG and the coefficient of variation (CV), a performance stability evaluation method for machine learning models is proposed. Experiments with eight widely used machine learning models on 48 different imbalanced datasets demonstrate that the classification performance of machine learning models decreases as the IR of the same imbalanced data increases. Meanwhile, the classification performances of LR, DT, and SVC are unstable, while GNB, BNB, KNN, RF, and GBDT are relatively stable and not susceptible to imbalanced data. In particular, BNB has the most stable classification performance. The Friedman and Nemenyi post hoc statistical tests also confirmed this result. The SMOTE method is used in oversampling-based imbalanced data augmentation, and determining whether other oversampling methods can obtain consistent results needs further research. In the future, imbalanced data augmentation algorithms based on undersampling and hybrid sampling should be used to analyze the performance impact of imbalanced binary data on machine learning models. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)
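The evaluation side of the method can be sketched directly from the description above: AFG is the arithmetic mean of AUC, F-measure, and G-mean, and stability is summarized by the coefficient of variation of AFG across imbalance rates. The helper below is an illustration, with the per-IR loop left as a hypothetical comment.

    import numpy as np
    from sklearn.metrics import roc_auc_score, f1_score, recall_score

    def afg(y_true, y_prob, threshold=0.5):
        # AFG = (AUC + F-measure + G-mean) / 3 for a binary classifier.
        y_pred = (y_prob >= threshold).astype(int)
        auc = roc_auc_score(y_true, y_prob)
        f = f1_score(y_true, y_pred)
        sens = recall_score(y_true, y_pred, pos_label=1)
        spec = recall_score(y_true, y_pred, pos_label=0)
        g_mean = np.sqrt(sens * spec)
        return (auc + f + g_mean) / 3.0

    def cv(values):
        # Coefficient of variation: smaller values indicate a more stable model.
        values = np.asarray(values, dtype=float)
        return values.std(ddof=1) / values.mean()

    # afg_by_ir = [afg(y, p) for (y, p) in results_per_imbalance_rate]   # hypothetical loop
    # print(cv(afg_by_ir))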

16 pages, 663 KiB  
Article
The Use of a Log-Normal Prior for the Student t-Distribution
by Se Yoon Lee
Axioms 2022, 11(9), 462; https://doi.org/10.3390/axioms11090462 - 8 Sep 2022
Cited by 8 | Viewed by 3818
Abstract
It is typically difficult to estimate the number of degrees of freedom due to the leptokurtic nature of the Student t-distribution. Particularly in studies with small sample sizes, special care is needed concerning prior choice in order to ensure that the analysis is not overly dominated by any prior distribution. In this article, popular priors used in the existing literature are examined by characterizing their distributional properties on an effective support where it is desirable to concentrate on most of the prior probability mass. Additionally, we suggest a log-normal prior as a viable prior option. We show that the Bayesian estimator based on a log-normal prior compares favorably to other Bayesian estimators based on the priors previously proposed via simulation studies and financial applications. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)
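A toy version of the idea, with the location and scale fixed for simplicity and an arbitrarily chosen log-normal prior, can be written as a grid approximation of the posterior for the degrees of freedom; this is only an illustration, not the paper's estimator or simulation design.

    import numpy as np
    from scipy.stats import t, lognorm

    rng = np.random.default_rng(0)
    data = t.rvs(df=5, size=100, random_state=rng)       # simulated heavy-tailed returns

    nu_grid = np.linspace(1.0, 60.0, 600)
    # Log-normal prior on nu with assumed hyperparameters (mu = 1, sigma = 1).
    log_prior = lognorm.logpdf(nu_grid, s=1.0, scale=np.exp(1.0))
    log_lik = np.array([t.logpdf(data, df=nu).sum() for nu in nu_grid])

    log_post = log_prior + log_lik
    post = np.exp(log_post - log_post.max())
    post /= np.trapz(post, nu_grid)                      # normalize on the grid

    print(nu_grid[np.argmax(post)])                      # posterior mode of nu
    print(np.trapz(nu_grid * post, nu_grid))             # posterior mean of nu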

17 pages, 1993 KiB  
Article
A Study on Cryptocurrency Log-Return Price Prediction Using Multivariate Time-Series Model
by Sang-Ha Sung, Jong-Min Kim, Byung-Kwon Park and Sangjin Kim
Axioms 2022, 11(9), 448; https://doi.org/10.3390/axioms11090448 - 1 Sep 2022
Cited by 6 | Viewed by 4087
Abstract
Cryptocurrencies are highly volatile investment assets and are difficult to predict. In this study, various cryptocurrency data are used as features to predict the log-return price of major cryptocurrencies. The original contribution of this study is the selection of the most influential major features for each cryptocurrency using the volatility features of cryptocurrency, derived from the autoregressive conditional heteroskedasticity (ARCH) and generalized autoregressive conditional heteroskedasticity (GARCH) models, along with the closing price of the cryptocurrency. In addition, we sought to predict the log-return price of cryptocurrencies by implementing various types of time-series model. Based on the selected major features, the log-return price of cryptocurrency was predicted through the autoregressive integrated moving average (ARIMA) time-series prediction model and the artificial neural network-based time-series prediction model. As a result of log-return price prediction, the neural-network-based time-series prediction models showed superior predictive power compared to the traditional time-series prediction model. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)
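One way to reproduce the flavor of the volatility-feature construction is sketched below: a GARCH(1,1) conditional-volatility series is extracted from log returns with the arch package and passed as an exogenous regressor to an ARIMA model. The data are random placeholders and the package choices are assumptions, not the authors' pipeline.

    import numpy as np
    import pandas as pd
    from arch import arch_model
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(0)
    log_returns = pd.Series(rng.normal(scale=0.02, size=500))   # placeholder log returns

    # GARCH(1,1) fit on percentage returns; the conditional volatility becomes a feature.
    garch_res = arch_model(log_returns * 100, vol="GARCH", p=1, q=1).fit(disp="off")
    vol_feature = garch_res.conditional_volatility / 100

    # ARIMA(1,0,1) for the log returns with the volatility series as an exogenous regressor.
    arima_res = ARIMA(log_returns, exog=vol_feature, order=(1, 0, 1)).fit()
    print(arima_res.params)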

13 pages, 12532 KiB  
Article
Machine Learning-Based Modelling and Meta-Heuristic-Based Optimization of Specific Tool Wear and Surface Roughness in the Milling Process
by Siamak Pedrammehr, Mahsa Hejazian, Mohammad Reza Chalak Qazani, Hadi Parvaz, Sajjad Pakzad, Mir Mohammad Ettefagh and Adeel H. Suhail
Axioms 2022, 11(9), 430; https://doi.org/10.3390/axioms11090430 - 26 Aug 2022
Cited by 7 | Viewed by 2489
Abstract
The purpose of this research is to investigate different milling parameters for optimization to achieve the maximum rate of material removal with the minimum tool wear and surface roughness. In this study, a tool wear factor is specified to investigate tool wear parameters and the amount of material removed during machining simultaneously. The second output parameter is surface roughness. The design of experiments (DOE) technique is used to design the experiments and applied to the milling machine. The practical data are used to develop different mathematical models. In addition, a single-objective genetic algorithm (GA) is applied to determine the optimal hyperparameters of the proposed adaptive network-based fuzzy inference system (ANFIS) to achieve the best possible efficiency. Afterwards, the multi-objective GA is employed to extract the optimum cutting parameters to reach the specified tool wear and the least surface roughness. The proposed method is developed under MATLAB using the practically extracted dataset and neural network. The optimization results revealed that optimum values for feed rate, cutting speed, and depth of cut vary from 252.6 to 256.9 (m/min), 0.1005 to 0.1431 (mm/rev tooth), and from 1.2735 to 1.3108 (mm), respectively. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)

18 pages, 1153 KiB  
Article
Effect of Fuzzy Time Series on Smoothing Estimation of the INAR(1) Process
by Mahmoud El-Morshedy, Mohammed H. El-Menshawy, Mohammed M. A. Almazah, Rashad M. El-Sagheer and Mohamed S. Eliwa
Axioms 2022, 11(9), 423; https://doi.org/10.3390/axioms11090423 - 24 Aug 2022
Cited by 4 | Viewed by 1685
Abstract
In this paper, the effect of fuzzy time series on estimates of the spectral, bispectral, and normalized bispectral density functions is studied. This study is conducted for one of the integer-valued autoregressive of order one (INAR(1)) models. The model of interest here is the dependent counting geometric INAR(1), symbolized by DCGINAR(1). A realization of size n = 500 is generated from this model for estimation. Based on fuzzy time series, the forecasted observations of this model are obtained. The estimators of the spectral, bispectral, and normalized bispectral density functions are smoothed by different one- and two-dimensional lag windows. Finally, after the smoothing, all estimators are studied in the case of the generated and forecasted observations of the DCGINAR(1) model. We investigate the contribution of the fuzzy time series to the smoothing of these estimates through the results. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)

35 pages, 4863 KiB  
Article
Fréchet Binomial Distribution: Statistical Properties, Acceptance Sampling Plan, Statistical Inference and Applications to Lifetime Data
by Salem A. Alyami, Mohammed Elgarhy, Ibrahim Elbatal, Ehab M. Almetwally, Naif Alotaibi and Ahmed R. El-Saeed
Axioms 2022, 11(8), 389; https://doi.org/10.3390/axioms11080389 - 8 Aug 2022
Cited by 8 | Viewed by 1789
Abstract
A new class of distribution called the Fréchet binomial (FB) distribution is proposed. The new suggested model is very flexible because its probability density function can be unimodal, decreasing and skewed to the right. Furthermore, the hazard rate function can be increasing, decreasing, up-side-down and reversed-J form. Important mixture representations of the probability density function (pdf) and cumulative distribution function (cdf) are computed. Numerous sub-models of the FB distribution are explored. Numerous statistical and mathematical features of the FB distribution such as the quantile function (QUNF); moments (MO); incomplete MO (IMO); conditional MO (CMO); MO generating function (MOGF); probability weighted MO (PWMO); order statistics; and entropy are computed. When the life test is shortened at a certain time, acceptance sampling (ACS) plans for the new proposed distribution, FB distribution, are produced. The truncation time is supposed to be the median lifetime of the FB distribution multiplied by a set of parameters. The smallest sample size required ensures that the specified life test is obtained at a particular consumer’s risk. The numerical results for a particular consumer’s risk, FB distribution parameters and truncation time are generated. We discuss the method of maximum likelihood to estimate the model parameters. A simulation study was performed to assess the behavior of the estimates. Three real datasets are used to illustrate the importance and flexibility of the proposed model. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)

15 pages, 1179 KiB  
Article
Forecasting Crude Oil Prices with Major S&P 500 Stock Prices: Deep Learning, Gaussian Process, and Vine Copula
by Jong-Min Kim, Hope H. Han and Sangjin Kim
Axioms 2022, 11(8), 375; https://doi.org/10.3390/axioms11080375 - 29 Jul 2022
Cited by 6 | Viewed by 3208
Abstract
This paper introduces methodologies in forecasting oil prices (Brent and WTI) with multivariate time series of major S&P 500 stock prices using Gaussian process modeling, deep learning, and vine copula regression. We also apply Bayesian variable selection and nonlinear principal component analysis (NLPCA) for data dimension reduction. With a reduced number of important covariates, we also forecast oil prices (Brent and WTI) with multivariate time series of major S&P 500 stock prices using Gaussian process modeling, deep learning, and vine copula regression. To apply real data to the proposed methods, we select monthly log returns of 2 oil prices and 74 large-cap, major S&P 500 stock prices across the period of February 2001–October 2019. We conclude that vine copula regression with NLPCA is superior overall to other proposed methods in terms of the measures of prediction errors. Full article
(This article belongs to the Special Issue Statistical Methods and Applications)
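For the Gaussian-process component, a bare-bones analogue with scikit-learn is sketched below, regressing a placeholder oil-return series on a handful of placeholder stock-return covariates; it illustrates the modeling step only, not the paper's variable selection or NLPCA reduction.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))            # placeholder: 5 selected stock log returns
    y = X @ np.array([0.4, -0.2, 0.1, 0.0, 0.3]) + 0.1 * rng.normal(size=200)  # placeholder oil returns

    kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X[:150], y[:150])

    pred_mean, pred_std = gpr.predict(X[150:], return_std=True)
    rmse = np.sqrt(np.mean((pred_mean - y[150:]) ** 2))
    print(rmse, pred_std.mean())             # out-of-sample error and predictive uncertainty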
