Multivariate Statistical Analysis and Application

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "D1: Probability and Statistics".

Deadline for manuscript submissions: 31 July 2025

Special Issue Editors


Guest Editor
Department of Statistics, University of Salamanca, 37008 Salamanca, Spain
Interests: multivariate data analysis; statistical data analysis; corporate social responsibility; sustainability

Guest Editor
Department of Statistics and Operational Research, University of Cádiz, Cádiz, Spain
Interests: statgraphics; coaching

Special Issue Information

Dear Colleagues,

Statistics is among the most widely used mathematical tools in scientific research. Almost by definition, every scientific field takes the study of some phenomenon as its object, and statistics provides the methodology for analyzing the data systematically collected to describe that phenomenon. This Special Issue invites papers on data analysis topics with potential applications in the life and social sciences. We solicit papers on statistics applied to any science or field in which the analysis methods are a central part of the investigation and the social impact of the results is also considered. Within applied statistics, this Special Issue focuses on multivariate methods, with particular emphasis on dimensionality reduction, textual content analysis, and composite indices. We also encourage submissions that include software modules and computational packages that allow the results to be reproduced and implemented.

Dr. Víctor Amor-Esteban
Prof. Dr. David Almorza-Gomar
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • multivariate statistical methods
  • applied statistics
  • biplot methods
  • data analysis
  • dimensionality reduction techniques
  • composite index
  • statistical text analysis
  • sentiment analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (9 papers)

Research

22 pages, 8462 KiB  
Article
Comparison of Trivariate Copula-Based Conditional Quantile Regression Versus Machine Learning Methods for Estimating Copper Recovery
by Heber Hernández, Martín Alberto Díaz-Viera, Elisabete Alberdi and Aitor Goti
Mathematics 2025, 13(4), 576; https://doi.org/10.3390/math13040576 - 10 Feb 2025
Cited by 1 | Viewed by 774
Abstract
In this study, an innovative methodology using trivariate copula-based conditional quantile regression (CBQR) is proposed for estimating copper recovery. This approach is compared with six supervised machine learning regression methods, namely, Decision Tree, Extra Tree, Support Vector Regression (linear and epsilon), Multilayer Perceptron, and Random Forest. For comparison purposes, an open access database representative of a porphyry copper deposit is used. The database contains geochemical information on minerals, mineral zoning data, and metallurgical test results related to copper recovery by flotation. To simulate a high undersampling scenario, only 5% of the copper recovery information was used for training and validation, while the remaining 95% was used for prediction, applying in all these stages error metrics, such as R2, MaxRE, MAE, MSE, MedAE, and MAPE. The results demonstrate that trivariate CBQR outperforms machine learning methods in accuracy and flexibility, offering a robust alternative solution to model complex relationships between variables under limited data conditions. This approach not only avoids the need for intensive tuning of multiple hyperparameters, but also effectively addresses estimation challenges in scenarios where traditional methods are insufficient. Finally, the feasibility of applying this methodology to different data scales is evaluated, integrating the error associated with the change in scale as an inherent part of the estimation of conditioning variables in the geostatistical context. Full article
(This article belongs to the Special Issue Multivariate Statistical Analysis and Application)
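
The error metrics named in the abstract above (R2, MaxRE, MAE, MSE, MedAE, and MAPE) can be illustrated with a minimal sketch. It uses their common textbook definitions, which the abstract itself does not spell out: in particular, reading MaxRE as the largest absolute residual is an assumption, and the function name `regression_metrics` is hypothetical rather than taken from the article.

```python
from statistics import mean, median


def regression_metrics(y_true, y_pred):
    """Common regression error metrics (textbook definitions, assumed here)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    abs_res = [abs(r) for r in residuals]
    y_bar = mean(y_true)
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)   # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,                 # coefficient of determination
        "MaxRE": max(abs_res),                       # assumed: largest absolute residual
        "MAE": mean(abs_res),                        # mean absolute error
        "MSE": ss_res / len(residuals),              # mean squared error
        "MedAE": median(abs_res),                    # median absolute error
        # mean absolute percentage error as a fraction; requires nonzero y_true
        "MAPE": mean([a / abs(t) for a, t in zip(abs_res, y_true)]),
    }


# Hypothetical recovery values (percent), for illustration only
metrics = regression_metrics([80.0, 85.0, 90.0], [82.0, 85.0, 87.0])
```

Reporting several metrics side by side, as the study does, guards against a single metric hiding large individual errors: MaxRE and MedAE respond very differently to one badly predicted sample.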

17 pages, 573 KiB  
Article
Fitting Penalized Estimator for Sparse Covariance Matrix with Left-Censored Data by the EM Algorithm
by Shanyi Lin, Qian-Zhen Zheng, Laixu Shang, Ping-Feng Xu and Man-Lai Tang
Mathematics 2025, 13(3), 423; https://doi.org/10.3390/math13030423 - 27 Jan 2025
Cited by 1 | Viewed by 633
Abstract
Estimating the sparse covariance matrix can effectively identify important features and patterns, and traditional estimation methods require complete data vectors on all subjects. When data are left-censored due to detection limits, common strategies such as excluding censored individuals or replacing censored values with suitable constants may result in large biases. In this paper, we propose two penalized log-likelihood estimators, incorporating the L1 penalty and SCAD penalty, for estimating the sparse covariance matrix of a multivariate normal distribution in the presence of left-censored data. However, the fitting of these penalized estimators poses challenges due to the observed log-likelihood involving high-dimensional integration over the censored variables. To address this issue, we treat censored data as a special case of incomplete data and employ the Expectation Maximization algorithm combined with the coordinate descent algorithm to efficiently fit the two penalized estimators. Through simulation studies, we demonstrate that both penalized estimators achieve greater estimation accuracy compared to methods that replace censored values with constants. Moreover, the SCAD penalized estimator generally outperforms the L1 penalized estimator. Our method is used to analyze the proteomic datasets. Full article
(This article belongs to the Special Issue Multivariate Statistical Analysis and Application)

14 pages, 899 KiB  
Article
Extensions to Mean–Geometric Mean Linking
by Alexander Robitzsch
Mathematics 2025, 13(1), 35; https://doi.org/10.3390/math13010035 - 26 Dec 2024
Cited by 2 | Viewed by 631
Abstract
Mean-geometric mean (MGM) linking is a widely used method for linking two groups within the two-parameter logistic (2PL) item response model. However, the presence of differential item functioning (DIF) can lead to biased parameter estimates using the traditional MGM method. To address this, alternative linking methods based on robust loss functions have been proposed. In this article, the conventional L2 loss function is compared with the L0.5 and L0 loss functions in MGM linking. Our results suggest that robust loss functions are preferable when dealing with outlying DIF effects, with the L0 function showing particular advantages in tests with larger item sets and sample sizes. Additionally, a simulation study demonstrates that defining MGM linking based on item intercepts rather than item difficulties leads to more accurate linking parameter estimates. Finally, robust Haberman linking slightly outperforms robust MGM linking in two-group comparisons. Full article
(This article belongs to the Special Issue Multivariate Statistical Analysis and Application)
22 pages, 541 KiB  
Article
Cross-Validated Functional Generalized Partially Linear Single-Functional Index Model
by Mustapha Rachdi, Mohamed Alahiane, Idir Ouassou, Abdelaziz Alahiane and Lahoucine Hobbad
Mathematics 2024, 12(17), 2649; https://doi.org/10.3390/math12172649 - 26 Aug 2024
Viewed by 883
Abstract
In this paper, we have introduced a functional approach for approximating nonparametric functions and coefficients in the presence of multivariate and functional predictors. By utilizing the Fisher scoring algorithm and the cross-validation technique, we derived the necessary components that allow us to explain scalar responses, including the functional index, the nonlinear regression operator, the single-index component, and the systematic component. This approach effectively addresses the curse of dimensionality and can be applied to the analysis of multivariate and functional random variables in a separable Hilbert space. We employed an iterative Fisher scoring procedure with normalized B-splines to estimate the parameters, and both the theoretical and practical evaluations demonstrated its favorable performance. The results indicate that the nonparametric functions, the coefficients, and the regression operators can be estimated accurately, and our method exhibits strong predictive capabilities when applied to real or simulated data. Full article
(This article belongs to the Special Issue Multivariate Statistical Analysis and Application)

16 pages, 735 KiB  
Article
Lp-Norm for Compositional Data: Exploring the CoDa L1-Norm in Penalised Regression
by Jordi Saperas-Riera, Glòria Mateu-Figueras and Josep Antoni Martín-Fernández
Mathematics 2024, 12(9), 1388; https://doi.org/10.3390/math12091388 - 1 May 2024
Viewed by 1417
Abstract
The Least Absolute Shrinkage and Selection Operator (LASSO) regression technique has proven to be a valuable tool for fitting and reducing linear models. The trend of applying LASSO to compositional data is growing, thereby expanding its applicability to diverse scientific domains. This paper aims to contribute to this evolving landscape by undertaking a comprehensive exploration of the L1-norm for the penalty term of a LASSO regression in a compositional context. This implies first introducing a rigorous definition of the compositional Lp-norm, as the particular geometric structure of the compositional sample space needs to be taken into account. The focus is subsequently extended to a meticulous data-driven analysis of the dimension reduction effects on linear models, providing valuable insights into the interplay between penalty term norms and model performance. An analysis of a microbial dataset illustrates the proposed approach. Full article
(This article belongs to the Special Issue Multivariate Statistical Analysis and Application)
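
As background for the compositional setting mentioned in the abstract above, the following minimal sketch shows the centred log-ratio (clr) transform, a standard way to map a composition into ordinary real coordinates. It is only an illustration of why the compositional sample space needs its own geometry: the abstract does not give the definition of the compositional Lp-norm, so nothing here should be read as the authors' construction, and the function name `clr` is an assumption.

```python
import math


def clr(composition):
    """Centred log-ratio transform of a composition with strictly positive parts."""
    logs = [math.log(x) for x in composition]
    center = sum(logs) / len(logs)      # log of the geometric mean of the parts
    return [v - center for v in logs]


# Rescaling a composition does not change its clr coordinates,
# because only the ratios between parts carry information.
a = clr([1.0, 2.0, 4.0])
b = clr([10.0, 20.0, 40.0])
```

The clr coordinates always sum to zero, which is one reason a penalty term applied to compositional covariates cannot simply reuse the usual norm without further care.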

23 pages, 1126 KiB  
Article
Bayesian Feature Extraction for Two-Part Latent Variable Model with Polytomous Manifestations
by Qi Zhang, Yihui Zhang and Yemao Xia
Mathematics 2024, 12(5), 783; https://doi.org/10.3390/math12050783 - 6 Mar 2024
Viewed by 1342
Abstract
Semi-continuous data are very common in social sciences and economics. In this paper, a Bayesian variable selection procedure is developed to assess the influence of observed and/or unobserved exogenous factors on semi-continuous data. Our formulation is based on a two-part latent variable model with polytomous responses. We consider two schemes for the penalties of regression coefficients and factor loadings: a Bayesian spike and slab bimodal prior and a Bayesian lasso prior. Within the Bayesian framework, we implement a Markov chain Monte Carlo sampling method to conduct posterior inference. To facilitate posterior sampling, we recast the logistic model from Part One as a norm-type mixture model. A Gibbs sampler is designed to draw observations from the posterior. Our empirical results show that with suitable values of hyperparameters, the spike and slab bimodal method slightly outperforms Bayesian lasso in the current analysis. Finally, a real example related to the Chinese Household Financial Survey is analyzed to illustrate application of the methodology. Full article
(This article belongs to the Special Issue Multivariate Statistical Analysis and Application)

22 pages, 3073 KiB  
Article
The Waste Hierarchy at the Business Level: An International Outlook
by Beatriz Aibar-Guzmán, Sónia Monteiro, Fátima David and Francisco M. Somohano-Rodríguez
Mathematics 2023, 11(22), 4574; https://doi.org/10.3390/math11224574 - 8 Nov 2023
Cited by 1 | Viewed by 1873
Abstract
Sustainable waste management is becoming a common goal in most countries. The national legal framework largely determines the waste management practices, the socio-demographic characteristics, and the economic level of the country and, in the case of businesses, the type of business, the industry in which it operates, and the sector-specific regulations to which it is subject. This paper aims to examine the importance that firms worldwide place on waste management by analyzing the evolution over time of waste management practices used by firms and how this evolution has varied across countries and sectors. The X-STATIS technique is applied to conduct a multivariate analysis using data from seven-hundred and eighty firms from twenty-eight countries and eight sectors from 2016 to 2020 (3900 observations). The results show that waste management has become more important worldwide over time. In terms of waste management practices, the management of the impacts of generated waste occupies the first place in the ranking, performed by 97.5% of the sampled firms in 2020; this is followed by the methods of the disposal of non-hazardous waste (66%) while waste prevention policies occupy the last place in the ranking (30.6%). At the country level, the most committed countries are Taiwan (74.3%) and Finland (70.6%), followed by France, Spain, Russia, Italy, and the United States (60.0–66.9%); meanwhile, the least committed countries are the United Kingdom, Australia, and Ireland (35–36%). At the sector level, consumer goods (63.7%) and oil and gas (63.0%) lead the ranking while the least committed sectors are technology and telecommunications (50.0%) and real estate services (49.3%). The evolution of companies’ commitment to waste management is gradual in all sectors, with oil and gas at the top, with a percentage variation of 21.4%, and consumer goods at the bottom, with 5.2%. 
In addition, our results suggest that the sector influences waste management practices more than the country of origin of the firms. Full article
(This article belongs to the Special Issue Multivariate Statistical Analysis and Application)

22 pages, 4186 KiB  
Article
Application of Multivariate Statistical Techniques as an Indicator of Variability of the Effects of COVID-19 on the Paris Memorandum of Understanding on Port State Control
by Jose Manuel Prieto, Víctor Amor-Esteban, David Almorza-Gomar, Ignacio Turias and Francisco Piniella
Mathematics 2023, 11(14), 3188; https://doi.org/10.3390/math11143188 - 20 Jul 2023
Cited by 5 | Viewed by 1263
Abstract
The first pandemic of the 21st Century was declared at the beginning of the year 2020 due to the spread of the COVID-19 virus. Its effects devastated the world economy and greatly affected maritime transport, one of the precursors of globalisation. This paper studies the effects of the pandemic on this type of transport, using data from 23,803 Paris Memorandum of Understanding Port State Control (PSC) inspections conducted in the top 10 major European ports. Comparisons have been made between Pre-COVID (2013–2019) and COVID (2020–2021) years, by way of multivariate methodologies: CO-X-STATIS, X-STATIS, and correspondence tables. The results were striking and indicate a clear change in the conduct of inspections during the COVID period, both quantitatively and qualitatively, showing a drastic reduction in the number of inspections and a change in type, with exhaustive inspections assuming a secondary role. Another notable result came from the use of the same methodology to study the different countries of registry and their evolution within PSC inspections during the Pre-COVID and COVID periods, where different behaviours were identified based on a ship’s flag. These results can help us to determine important supervisory objectives for each country’s maritime administration and their inspectors, to indicate weaknesses in the inspection routines caused by the pandemic, and to attempt corrections to improve maritime safety. Full article
(This article belongs to the Special Issue Multivariate Statistical Analysis and Application)

15 pages, 7918 KiB  
Article
Trends in Agroforestry Research from 1993 to 2022: A Topic Model Using Latent Dirichlet Allocation and HJ-Biplot
by Karime Montes-Escobar, Javier De la Hoz-M, Mónica Daniela Barreiro-Linzán, Carolina Fonseca-Restrepo, Miguel Ángel Lapo-Palacios, Douglas Andrés Verduga-Alcívar and Carlos Alfredo Salas-Macias
Mathematics 2023, 11(10), 2250; https://doi.org/10.3390/math11102250 - 11 May 2023
Cited by 8 | Viewed by 2730
Abstract
Background: There is an immense debate about the factors that could limit the adoption of agroforestry systems. However, one of the most important is the generation of scientific information that supports the viability and benefits of the proposed techniques. Statistical analysis: This work used the Latent Dirichlet Allocation (LDA) modeling method to identify and interpret scientific information on topics in relation to existing categories in a set of documents. It also used the HJ-Biplot method to determine the relationship between the analyzed topics, taking into consideration the years under study. Results: A review of the literature was conducted in this study and a total of 9794 abstracts of scientific articles published between 1993 and 2022 were obtained. The United States, India, Brazil, the United Kingdom, and Germany were the five countries that published the largest number of studies about agroforestry, particularly soil organic carbon, which was the most studied case. The five more frequently studied topics were: soil organic carbon, adoption of agroforestry practices, biodiversity, climatic change global policies, and carbon and climatic change. Conclusion: the LDA and HJ-Biplot statistical methods are useful tools for determining topicality in text analysis in agroforestry and related topics. Full article
(This article belongs to the Special Issue Multivariate Statistical Analysis and Application)
