Advances in Statistical Approaches with Applications for Multivariate Data Analysis

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "D1: Probability and Statistics".

Deadline for manuscript submissions: 31 December 2026

Special Issue Editor


Dr. Md Erfanul Hoque
Guest Editor
Department of Community Health & Epidemiology, College of Medicine, University of Saskatchewan, SK S7N 5E5, Canada
Interests: longitudinal data analysis; statistical learning/machine learning; dynamic data science; spatial statistics; mixed models; statistical computing; time series analysis; biostatistics

Special Issue Information

Dear Colleagues,

Multivariate data analysis is a cornerstone of modern statistics and data science, enabling researchers to uncover complex relationships, patterns, and structures across multiple dimensions. With the exponential growth in data dimensions and complexity, traditional statistical methods often fall short in providing reliable and interpretable insights. This Special Issue, “Advances in Statistical Approaches with Applications for Multivariate Data Analysis”, brings together cutting-edge methodologies, theoretical developments, and innovative applications tailored to address challenges in multivariate data analysis across diverse domains.

In addition to methodological advancements, the Special Issue emphasizes practical applications. Papers showcase solutions to real-world problems in diverse domains, including genomics, image analysis, financial modeling, health sciences, and environmental sciences. This application-focused perspective demonstrates the relevance and adaptability of these advanced techniques in handling complex and heterogeneous data structures.

This Special Issue covers a wide range of topics, including, but not limited to: novel developments in dimension reduction methods; advances in clustering and classification for multivariate data; robust multivariate statistical models for high-dimensional data; time series and longitudinal data analysis; machine/statistical learning methods for multivariate data; applications in genomics, neuroimaging, and computational biology; multivariate approaches for spatial and environmental data; and statistical tools for dynamic systems and functional data.

Dr. Md Erfanul Hoque
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • multivariate data
  • biostatistical methods
  • time series analysis
  • dynamic data science
  • machine/statistical learning
  • computational statistics
  • Bayesian analysis
  • longitudinal data analysis
  • image analysis
  • spatial/spatio-temporal analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (7 papers)


Research

26 pages, 12712 KB  
Article
Subsampling-Based Consensus Hierarchical Clustering for Robust Customer Segmentation with Mixed-Type Data
by Nooshin Marefat, Purificación Galindo-Villardón and Purificación Vicente-Galindo
Mathematics 2026, 14(8), 1294; https://doi.org/10.3390/math14081294 - 13 Apr 2026
Abstract
Hierarchical clustering is an unsupervised framework that organizes observations according to pairwise similarity relationships. In this study, an agglomerative hierarchical approach combined with Gower dissimilarity is employed to accommodate mixed-type customer data. To address data quality issues such as missing values and outliers, Multiple Imputation by Chained Equations (MICE) and Winsorization are incorporated into the preprocessing pipeline. To validate cluster stability and identify the optimal number of clusters, we employ silhouette analysis, the Davies–Bouldin Index (DBI), the Proportion of Ambiguous Clustering (PAC), and a subsampling-based consensus clustering framework. A consensus-based hierarchical tree derived from the consensus matrix is employed to assess the robustness of the segmentation structure. The resulting clusters are further evaluated through comparisons with baseline algorithms for mixed-type data, including Partitioning Around Medoids (PAM) based on Gower dissimilarity and the K-prototypes method, together with statistical tests confirming significant behavioral differences between the identified segments. From an application standpoint, these results provide a data-driven basis for customer targeting by identifying distinct behavioral patterns, thereby supporting more effective engagement strategies and optimized resource allocation. Full article
43 pages, 10109 KB  
Article
Stabilizer Variables for Measurement Invariance–Induced Heterogeneity: Identification Theory and Testing in Multi-Group Models
by Salim Yilmaz and Erhan Cene
Mathematics 2026, 14(6), 1064; https://doi.org/10.3390/math14061064 - 21 Mar 2026
Abstract
When measurement invariance (MI) is violated in multi-group structural equation models, group-specific measurement artifacts inflate the between-group variance of structural parameters beyond their true values. Existing remedies (partial invariance, group-specific estimation, or moderation analysis) address the consequences of inflation but not its mechanism. This article introduces the stabilizer variable, a covariate that absorbs measurement-induced parameter heterogeneity while maintaining structural independence from the focal relationship. Two theoretical results are established: a variance decomposition theorem showing that MI violations inflate dispersion through an identifiable artifactual component, and a purification theorem proving that a stabilizer reduces this dispersion via Frisch–Waugh–Lovell projection. Two stabilization mechanisms are identified: variance purification (Type A) and directional alignment (Type B). We then develop the stabilizer variable test, a dual-criterion procedure combining nonparametric bootstrap testing for stabilization magnitude with binomial testing for directional consistency, incorporating adaptive MI severity scoring with calibrated fit-index weights. Simulations comprising 949,100 replications across varying group counts, sample sizes, and MI severity levels demonstrate 80–99% power with false-positive rates below 2%. Practical guidelines recommend K ≥ 10 groups and n ≥ 100 per group for conservative applications. The framework generalizes to any multi-group regression context where systematic measurement error induces spurious parameter heterogeneity. Full article

33 pages, 1665 KB  
Article
Modeling Healthcare Data with a Novel Flexible Three-Parameter Distribution
by Thamer Manshi, Ammar M. Sarhan and M. E. Sobh
Mathematics 2026, 14(2), 359; https://doi.org/10.3390/math14020359 - 21 Jan 2026
Abstract
Developing flexible lifetime distributions is essential for accurately modeling reliability and lifetime data across various scientific and engineering contexts. In this work, we introduce a new three-parameter lifetime distribution, which extends the well-known two-parameter Sarhan–Tadj–Hamilton model. We derive and discuss several of its important theoretical properties, including the reliability characteristics and moments. The parameter estimation is carried out using both maximum likelihood and Bayesian approaches, providing a comprehensive comparison of inferential techniques. To further examine the efficiency and robustness of the proposed estimators, a detailed Monte Carlo simulation study is conducted under different sample sizes and parameter settings. The practical usefulness of the distribution is illustrated through its application to three real-world datasets, namely cancer and COVID-19 data, where it demonstrates superior fit and flexibility compared to existing and nested lifetime models. These findings highlight the potential of the proposed model as a valuable addition to the toolbox of applied statisticians and reliability practitioners. Full article

22 pages, 12979 KB  
Article
A High-Breakdown MCD-Based Robust Concordance Correlation Coefficient
by Hasan Bulut, Müjgan Zobu and Vedat Sağlam
Mathematics 2026, 14(1), 196; https://doi.org/10.3390/math14010196 - 4 Jan 2026
Abstract
The concordance correlation coefficient (CCC) is a popular measure of agreement between two continuous variables but is highly sensitive to outliers and data contamination. In this study, we propose a robust reformulation of the CCC by replacing classical moment estimators with Minimum Covariance Determinant (MCD) estimators. The proposed robust CCC preserves the interpretability of the classical coefficient while providing substantially improved robustness. Comprehensive Monte Carlo simulations under normal and non-normal distributions, varying sample sizes, correlation levels, and contamination schemes compare the proposed coefficient with the classical CCC and existing robust alternatives. The results show that the proposed robust CCC achieves superior stability and accuracy in contaminated settings while remaining competitive under clean data. Theoretical properties of the estimator are discussed, and its practical usefulness is demonstrated using real glucose measurement and blood pressure data sets. The proposed method is implemented in the MVTests R package, enabling straightforward application to real-world data. Full article
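As a rough illustration of the quantity discussed in this abstract, the sketch below computes the classical CCC from sample moments and exposes a second helper that evaluates the same formula from any supplied location vector and scatter matrix. The function names `concordance_ccc` and `ccc_from_moments` are hypothetical, the 1/n moment convention is a choice made for this sketch, and plugging MCD estimates into the second helper is only an assumption about how a robust variant could be assembled; it is not the MVTests implementation.

```python
import numpy as np


def concordance_ccc(x, y):
    """Classical concordance correlation coefficient from sample moments.

    Uses the 1/n convention for variances and covariance (a choice of this sketch).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mean_x) * (y - mean_y))
    # CCC = 2*cov / (var_x + var_y + squared difference of means)
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


def ccc_from_moments(location, scatter):
    """Same formula, evaluated from a length-2 location and a 2x2 scatter matrix.

    Robust estimates (for example from an MCD fit) could be passed here;
    that pairing is an assumption of this sketch.
    """
    location = np.asarray(location, dtype=float)
    scatter = np.asarray(scatter, dtype=float)
    shift_sq = (location[0] - location[1]) ** 2
    return 2.0 * scatter[0, 1] / (scatter[0, 0] + scatter[1, 1] + shift_sq)
```

Separating the formula from the moment estimation keeps the coefficient itself unchanged while letting the estimator be swapped, which mirrors the idea of replacing classical moments with robust ones.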

26 pages, 634 KB  
Article
Time-Weighted Result-Based Strength Indicators from Head-to-Head Outcomes: An Application to Trotter (Harness) Racing
by Manuel Ligero-Acosta, Juan M. Muñoz-Pichardo, María Dolores Gómez, María Ripollés-Lobo and Mercedes Valera
Mathematics 2026, 14(1), 167; https://doi.org/10.3390/math14010167 - 1 Jan 2026
Abstract
We propose a general methodology for constructing dynamic performance indicators (or strength metrics) in any sport that relies on comparative outcomes among competitors, using chronological positional data. Specifically, we develop a family of strength indicators for harness trotting races based on time-weighted, head-to-head results. Using the official Balearic trotting records (1990–2023), we construct win, draw, and confrontation matrices up to each event and apply a triweight kernel to reduce the influence of older results. From these matrices, we derive a family of five bounded, interpretable indicators on the interval [0,1]: an overall average win rate, a category-adjusted version, and three distance-specific versions (short, medium, and long). Indicator validation is performed via predictive validation, employing regularized logistic regression models (Elastic Net) based on indicator differences between horse pairs. Standard metrics (accuracy, calibration, discrimination, and Brier score) are used for the validation analysis. The results confirm that the indicators are coherent, stable, and interpretable, demonstrating that the generic construction procedure yields robust outcomes. We conclude that these indicators establish a solid and easily updatable foundation for developing dynamic ranking systems and practical selection/handicap procedures in trotting. Full article
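To make the idea of time weighting with a triweight kernel concrete, here is a minimal sketch of a kernel-weighted win rate. The triweight kernel formula is standard; everything else (the function names, the `bandwidth` parameter, measuring age as time elapsed since each confrontation, and ignoring draws) is an assumption for illustration and may differ from the construction used in the paper.

```python
import numpy as np


def triweight(u):
    """Triweight kernel: (35/32) * (1 - u^2)^3 for |u| <= 1, and 0 otherwise."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, (35.0 / 32.0) * (1.0 - u ** 2) ** 3, 0.0)


def time_weighted_win_rate(wins, ages, bandwidth):
    """Kernel-weighted share of wins over past head-to-head confrontations.

    wins: 1 for a win, 0 for a loss, one entry per confrontation (draws ignored here).
    ages: time elapsed since each confrontation, in the same unit as bandwidth.
    bandwidth: hypothetical horizon beyond which a result gets zero weight.
    """
    wins = np.asarray(wins, dtype=float)
    weights = triweight(np.asarray(ages, dtype=float) / bandwidth)
    total = weights.sum()
    if total == 0.0:
        return float("nan")  # no confrontation inside the time window
    return float(np.dot(weights, wins) / total)
```

Because the result is a weighted average of 0/1 outcomes, it stays in [0, 1], and results older than the bandwidth drop out entirely.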

25 pages, 1808 KB  
Article
A Dependent Bivariate Burr XII Inverse Weibull Model: Application to Diabetic Retinopathy and Dependent Competing Risks Data
by Ammar M. Sarhan, Ahlam H. Tolba, Dina A. Ramadan and Thamer Manshi
Mathematics 2026, 14(1), 120; https://doi.org/10.3390/math14010120 - 28 Dec 2025
Abstract
This paper introduces a novel bivariate distribution, referred to as the Bivariate Burr XII Inverse Weibull (BBXII-IW) distribution, constructed via the Marshall–Olkin approach from the univariate Burr XII Inverse Weibull (BXII-IW) distribution. The proposed BBXII-IW model provides a flexible framework for modeling dependent bivariate data, including competing risk scenarios. The key statistical properties of the distribution are derived, and parameter estimation is conducted using the maximum likelihood method. The model’s performance is evaluated using two types of real-world datasets: (1) bivariate data and (2) dependent competing risk data related to diabetic retinopathy. The results demonstrate that the BBXII-IW distribution offers an improved fit compared to existing models, highlighting its flexibility and practical relevance in modeling complex dependent structures. Full article

33 pages, 752 KB  
Article
Flux and First-Passage Time Distributions in One-Dimensional Integrated Stochastic Processes with Arbitrary Temporal Correlation and Drift
by Holger Nobach and Stephan Eule
Mathematics 2025, 13(19), 3163; https://doi.org/10.3390/math13193163 - 2 Oct 2025
Abstract
The arrival of tracers at boundaries with defined distances from the origin of their motion in stochastically fluctuating advection processes is investigated. The advection model is a stationary one-dimensional integrated stochastic process with an arbitrary a priori known correlation and with possible mean drift. The current (direction-sensitive), the total flux (direction-insensitive) of tracers through a non-absorbing boundary, and the first-passage times of the tracers at an absorbing boundary are derived depending on the correlation function of the carrying flow velocity. While the general derivations are universal with respect to the distribution function of the advection’s increments, the current and the total flux are explicitly derived for a Gaussian distribution. The first-passage time is derived implicitly through an integral that is solved numerically in the present study. No approximations or restrictions to special cases of the advection process are used. One application is one-dimensional Gaussian turbulence, where the one-dimensional random velocity carries tracer particles through space. Finally, subdiffusive or superdiffusive behavior can temporarily be reached by such a stochastic process with an adequately designed correlation function. Full article