Computational Statistics, Data Analysis and Applications

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "D1: Probability and Statistics".

Deadline for manuscript submissions: 31 January 2026 | Viewed by 5918

Special Issue Editor


Guest Editor
School of Economics and Management, Beihang University, Beijing 100876, China
Interests: high-dimensional data analysis; non-parametric statistical analysis; machine learning; survival data analysis; statistical algorithm and applications

Special Issue Information

Dear Colleagues,

In today's data-driven world, computational statistics and data analysis are crucial for uncovering insights, making informed decisions, and driving innovation across disciplines. We are pleased to invite contributions to our upcoming Special Issue of Mathematics, entitled "Computational Statistics, Data Analysis and Applications". This Special Issue focuses on the intersection of computational methods and statistical analysis, and aims to bring together cutting-edge research and practical applications that showcase the power and versatility of these fields.

In this Special Issue, we welcome original research articles and reviews. Topics of interest include, but are not limited to, advanced statistical models and algorithms for big data, advanced data visualization, predictive analytics, time series analysis, machine learning, and applications of computational statistics in specific industries.

By presenting cutting-edge research and real-world case studies, this Special Issue aims to provide valuable insights and practical solutions for researchers, practitioners, and decision-makers interested in using computational statistics and data analysis to solve complex problems and drive innovation.

We look forward to receiving your contributions.

Dr. Shanshan Wang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • statistical models and algorithms
  • renewable estimation
  • data visualization
  • complex data analytics
  • machine learning
  • data analysis and applications
  • computational statistics
  • forecasting and time series analysis
  • censored data analysis and applications

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (4 papers)

Research

22 pages, 472 KB  
Article
Domain-Driven Identification of Football Probabilities
by Artur Karimov, Aleksandr Koshkin, Dmitrii Kaplun and Denis Butusov
Mathematics 2025, 13(24), 3976; https://doi.org/10.3390/math13243976 - 13 Dec 2025
Viewed by 275
Abstract
Obtaining accurate estimates of the true probabilities of sporting events remains a long-standing problem in sports analytics. In this paper we propose a new domain-driven approach that infers true probabilities from betting odds. This task is not trivial, as betting odds are noisy because of bookmaker margins (vig), insider bets, and model imperfections. In this study, we present a novel approach that integrates estimates across multiple groups of betting markets to obtain more robust estimates of true probability. Our method takes market structure into account and constructs a constrained optimisation problem that is solved using the Dixon–Coles model of a football match. We compare our approach with a wide range of existing methods, using a large dataset of 359,035 matches from more than 6000 leagues. The proposed method achieves the lowest log-loss and the best probability calibration among all tested approaches. It also performs best in terms of expected profit convergence in Monte Carlo simulations, outperforming its competitors in terms of MSE and bias. This study contributes a new margin-removal (devig) method and provides a comprehensive comparative analysis of other known methods. Beyond football, this approach has potential applications in other sports with discrete scoring systems and potentially in other areas involving stochastic processes and market inference, such as prediction markets, finance, reliability engineering, and social prediction systems.
(This article belongs to the Special Issue Computational Statistics, Data Analysis and Applications)
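
For readers unfamiliar with margin removal, the simplest baseline is proportional normalization of the implied probabilities. The sketch below illustrates that baseline only; it is not the constrained Dixon–Coles method proposed in the article, and the function name and example odds are hypothetical.

```python
def remove_margin_proportional(decimal_odds):
    """Baseline devig: rescale implied probabilities so they sum to 1.

    Returns (probabilities, margin), where margin is the bookmaker's
    overround minus 1. Assumes decimal (European) odds for a complete
    set of mutually exclusive outcomes.
    """
    if not decimal_odds or any(o <= 1.0 for o in decimal_odds):
        raise ValueError("decimal odds must all be greater than 1")
    # Implied probability of each outcome is the reciprocal of its odds.
    implied = [1.0 / o for o in decimal_odds]
    # With a margin, the implied probabilities sum to more than 1.
    overround = sum(implied)
    probabilities = [p / overround for p in implied]
    return probabilities, overround - 1.0


# Hypothetical home/draw/away odds for a single match.
probs, margin = remove_margin_proportional([1.9, 3.5, 4.2])
print(probs, margin)
```

For these example odds the implied probabilities sum to roughly 1.05, so the estimated margin is about 5%. Proportional rescaling spreads that margin evenly in relative terms across outcomes, which is a simplifying assumption.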

24 pages, 511 KB  
Article
A Novel Feature Representation and Clustering for Histogram-Valued Data
by Qing Zhao and Huiwen Wang
Mathematics 2025, 13(23), 3840; https://doi.org/10.3390/math13233840 - 30 Nov 2025
Viewed by 287
Abstract
In an era where large-scale data are produced and collected rapidly, there is great interest in symbolic data analysis as a means of extracting implicit and significant information from massive data. Recently, novel statistical techniques for histogram-valued data have been proposed and widely applied in various fields where traditional methods are not suitable. However, existing research faces modeling challenges posed by the complicated expression and intrinsic constraints of histogram-valued data. In this work, we introduce a novel representation for a histogram that captures the location and shape information of the corresponding probability distribution. On this basis, an effective graph clustering method is developed to partition multivariate histogram-valued data by learning a high-quality similarity matrix. Simulation experiments and empirical case analysis demonstrate that the proposed method significantly improves clustering performance for histogram-valued data and has clear advantages over competing approaches.

27 pages, 1355 KB  
Article
Comparing Weighted RMSD, Weighted MD, Infit, and Outfit Item Fit Statistics Under Uniform Differential Item Functioning
by Alexander Robitzsch
Mathematics 2025, 13(23), 3752; https://doi.org/10.3390/math13233752 - 23 Nov 2025
Viewed by 430
Abstract
In educational large-scale assessment studies, uniform differential item functioning (DIF) across countries often challenges the application of a common item response model, such as the two-parameter logistic (2PL) model, to all participating countries. DIF occurs when certain items provide systematic advantages or disadvantages to specific groups, potentially biasing ability estimates and secondary analyses. Identifying misfitting items caused by DIF is therefore essential, and several item fit statistics have been proposed in the literature for this purpose. This article investigates the performance of four commonly used item fit statistics under uniform DIF: the weighted root mean square deviation (RMSD), the weighted mean deviation (MD), the infit, and the outfit statistics. Analytical approximations were derived to relate the uniform DIF effect size to these item fit statistics, and the theoretical findings were confirmed through a comprehensive simulation study. The results indicate that distribution-weighted RMSD and MD statistics are less sensitive to DIF in very easy or very difficult items, whereas difficulty-weighted RMSD and MD exhibit consistent detection performance across all item difficulty levels. However, the sampling variance of the difficulty-weighted statistics is notably higher for items with extreme difficulty. Infit and outfit statistics were largely ineffective in detecting DIF in items of moderate difficulty, with sensitivity limited to very easy or very difficult items. To illustrate the practical application of these statistics, they were computed for the PISA 2006 reading study, and the distribution of the statistics across participating countries was descriptively examined. The findings guide the selection of appropriate item fit statistics in large-scale assessments and highlight the strengths and limitations of different approaches under uniform DIF conditions.
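
As a rough illustration of the quantities named in this abstract, the sketch below computes a distribution-weighted RMSD and MD between a 2PL item response curve and the same curve with its difficulty shifted by a uniform DIF effect. It is a minimal sketch under stated assumptions, not the article's derivation: the standard normal ability distribution, the grid-based integration, and all function and parameter names are assumptions made here for illustration.

```python
import math


def irf_2pl(theta, a, b):
    """2PL item response function: P(correct | ability theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def distribution_weighted_fit(a, b, dif, n_grid=201, lo=-6.0, hi=6.0):
    """Return (RMSD, MD) between the group curve and the common curve.

    The group curve has difficulty b + dif (uniform DIF); the common
    curve has difficulty b. Deviations are weighted by a standard
    normal ability density evaluated on an equally spaced grid.
    """
    step = (hi - lo) / (n_grid - 1)
    thetas = [lo + i * step for i in range(n_grid)]
    # Unnormalized N(0, 1) density, normalized to sum to 1 on the grid.
    dens = [math.exp(-0.5 * t * t) for t in thetas]
    total = sum(dens)
    weights = [d / total for d in dens]
    # Deviation of the group-specific curve from the common curve.
    diffs = [irf_2pl(t, a, b + dif) - irf_2pl(t, a, b) for t in thetas]
    md = sum(w * d for w, d in zip(weights, diffs))
    rmsd = math.sqrt(sum(w * d * d for w, d in zip(weights, diffs)))
    return rmsd, md


# Same DIF effect applied to a moderate item and to a very easy item.
print(distribution_weighted_fit(a=1.0, b=0.0, dif=0.5))
print(distribution_weighted_fit(a=1.0, b=-4.0, dif=0.5))
```

In this toy setup the same DIF shift yields a smaller distribution-weighted RMSD for the very easy item than for the moderate one, which is in line with the reduced sensitivity for extreme items described in the abstract.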

Review

30 pages, 865 KB  
Review
A Selective Overview of Quantile Regression for Large-Scale Data
by Shanshan Wang, Wei Cao, Xiaoxue Hu, Hanyu Zhong and Weixi Sun
Mathematics 2025, 13(5), 837; https://doi.org/10.3390/math13050837 - 2 Mar 2025
Cited by 3 | Viewed by 4018
Abstract
Large-scale data, characterized by heterogeneity due to heteroskedastic variance or inhomogeneous covariate effects, arises in diverse fields of scientific research and technological development. Quantile regression (QR) is a valuable tool for detecting heteroskedasticity, and numerous QR statistical methods for large-scale data have been rapidly developed. This paper provides a selective review of recent advances in QR theory, methods, and implementations, particularly in the context of massive and streaming data. We focus on three key strategies for large-scale QR analysis: (1) distributed computing, (2) subsampling methods, and (3) online updating. The main contribution of this paper is a comprehensive review of existing work and advancements in these areas, addressing challenges such as managing the non-smooth QR loss function, developing distributed and online updating formulations, and conducting statistical inference. Finally, we highlight several issues that require further study.
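
The non-smooth QR loss mentioned in this abstract is the standard check (pinball) loss. The sketch below shows that loss and, for the intercept-only case, verifies by brute force that its minimizer is a sample quantile. It is a small illustration only, not any of the distributed, subsampling, or online-updating methods reviewed; the function names and the toy data are hypothetical.

```python
def check_loss(residual, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0}); has a kink at u = 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def empirical_risk(q, ys, tau):
    """Average check loss when every observation is predicted by q."""
    return sum(check_loss(y - q, tau) for y in ys) / len(ys)


def best_constant(ys, tau):
    """Minimize the empirical risk over constants by brute force.

    The risk is convex and piecewise linear in q, so a minimizer is
    attained at one of the observed values.
    """
    return min(sorted(ys), key=lambda q: empirical_risk(q, ys, tau))


# Toy sample with one large outlier.
ys = [1.0, 2.0, 3.0, 4.0, 100.0]
print(best_constant(ys, tau=0.5))  # the sample median, 3.0
```

Because the loss is not differentiable at zero, gradient-based updates cannot be applied directly, which is the difficulty that the smoothing and reformulation strategies surveyed in the review address.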