Statistical Methods for Modeling High-Dimensional and Complex Data: Second Edition

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: 31 July 2025 | Viewed by 6518

Special Issue Editor


Prof. Dr. Yuehua Wu
Guest Editor
Department of Mathematics and Statistics, York University, Toronto, ON M3J 1P3, Canada
Interests: statistical modeling and inference for data with a very complex structure and/or with high dimension

Special Issue Information

Dear Colleagues,

Statistical models help us to understand the structure of systems or processes in various fields of engineering, natural sciences, and social sciences. One of the most important tasks in statistics is the development of methods and theories for building statistical models, which approximate the reality embodied in the observed data. In general, such models are not unique. For a given set of competing models, it is important to choose the best approximating model among them before performing statistical analysis.

Since data often exhibit complex structures, statistical models are expected to capture this complexity, which can further deepen our understanding of the underlying data-generating mechanisms and advance related fields in science and engineering. This Special Issue calls for newly developed statistical methods to model high-dimensional, complex data, especially methods based on entropy or information theory.

Prof. Dr. Yuehua Wu
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • model selection
  • spatiotemporal modeling
  • cluster analysis
  • high-dimensional statistics
  • data mining
  • multiple change-point detection

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (6 papers)

Research

22 pages, 7837 KiB  
Article
Online Monitoring and Fault Diagnosis for High-Dimensional Stream with Application in Electron Probe X-Ray Microanalysis
by Tao Wang, Yunfei Guo, Fubo Zhu and Zhonghua Li
Entropy 2025, 27(3), 297; https://doi.org/10.3390/e27030297 - 13 Mar 2025
Viewed by 481
Abstract
This study introduces an innovative two-stage framework for monitoring and diagnosing high-dimensional data streams with sparse changes. The first stage utilizes an exponentially weighted moving average (EWMA) statistic for online monitoring, identifying change points through extreme value theory and multiple hypothesis testing. The second stage involves a fault diagnosis mechanism that accurately pinpoints abnormal components upon detecting anomalies. Through extensive numerical simulations and electron probe X-ray microanalysis applications, the method demonstrates exceptional performance. It rapidly detects anomalies, often within one or two sampling intervals post-change, achieves near 100% detection power, and maintains type-I error rates around the nominal 5%. The fault diagnosis mechanism shows a 99.1% accuracy in identifying components in 200-dimensional anomaly streams, surpassing principal component analysis (PCA)-based methods by 28.0% in precision and controlling the false discovery rate within 3%. Case analyses confirm the method’s effectiveness in monitoring and identifying abnormal data, aligning with previous studies. These findings represent significant progress in managing high-dimensional sparse-change data streams over existing methods.

28 pages, 413 KiB  
Article
Penalized Exponentially Tilted Likelihood for Growing Dimensional Models with Missing Data
by Xiaoming Sha, Puying Zhao and Niansheng Tang
Entropy 2025, 27(2), 146; https://doi.org/10.3390/e27020146 - 1 Feb 2025
Viewed by 581
Abstract
This paper develops a penalized exponentially tilted (ET) likelihood to simultaneously estimate unknown parameters and select variables for growing dimensional models with responses missing at random. The inverse probability weighted approach is employed to compensate for missing information and to ensure the consistency of parameter estimators. Based on the penalized ET likelihood, we construct an ET likelihood ratio statistic to test the contrast hypothesis of parameters. Under some mild conditions, we obtain the consistency, asymptotic properties, and oracle properties of parameter estimators and show that the constrained penalized ET likelihood ratio statistic for testing the contrast hypothesis possesses the Wilks’ property. Simulation studies are conducted to validate the finite sample performance of the proposed methodologies. Thyroid data taken from the First People’s Hospital of Yunnan Province is employed to illustrate the proposed methodologies.

31 pages, 13420 KiB  
Article
Subspace Learning for Dual High-Order Graph Learning Based on Boolean Weight
by Yilong Wei, Jinlin Ma, Ziping Ma and Yulei Huang
Entropy 2025, 27(2), 107; https://doi.org/10.3390/e27020107 - 22 Jan 2025
Viewed by 966
Abstract
Subspace learning has achieved promising performance as a key technique for unsupervised feature selection. The strength of subspace learning lies in its ability to identify a representative subspace encompassing a cluster of features that are capable of effectively approximating the space of the original features. Nonetheless, most existing unsupervised feature selection methods based on subspace learning are constrained by two primary challenges. (1) Many methods focus predominantly on the relationships between samples in the data space but ignore the correlated information between features in the feature space, which is unreliable for exploiting the intrinsic spatial structure. (2) Graph-based methods typically take account of only first-order neighborhood structures, neglecting high-order neighborhood structures inherent in the original data, thereby failing to accurately preserve local geometric characteristics of the data. To fill this research gap, taking dual high-order graph learning into account, we propose a framework called subspace learning for dual high-order graph learning based on Boolean weight (DHBWSL). Firstly, a framework for unsupervised feature selection based on subspace learning is proposed, which is extended by dual-graph regularization to fully investigate geometric structure information on dual spaces. Secondly, the dual high-order graph is designed by embedding Boolean weights to learn a more extensive node from the original space such that the appropriate high-order adjacency matrix can be selected adaptively and flexibly. Experimental results on 12 public datasets demonstrate that the proposed DHBWSL outperforms nine recent state-of-the-art algorithms.

24 pages, 74482 KiB  
Article
Bayesian Regression Analysis for Dependent Data with an Elliptical Shape
by Yian Yu, Long Tang, Kang Ren, Zhonglue Chen, Shengdi Chen and Jianqing Shi
Entropy 2024, 26(12), 1072; https://doi.org/10.3390/e26121072 - 9 Dec 2024
Viewed by 1066
Abstract
This paper proposes a parametric hierarchical model for functional data with an elliptical shape, using a Gaussian process prior to capture the data dependencies that reflect systematic errors while modeling the underlying curved shape through a von Mises–Fisher distribution. The model definition, Bayesian inference, and MCMC algorithm are discussed. The effectiveness of the model is demonstrated through the reconstruction of curved trajectories using both simulated and real-world examples. The discussion in this paper focuses on two-dimensional problems, but the framework can be extended to higher-dimensional spaces, making it adaptable to a wide range of applications.

17 pages, 1232 KiB  
Article
Optimizing Prognostic Predictions in Liver Cancer with Machine Learning and Survival Analysis
by Kaida Cai, Wenzhi Fu, Zhengyan Wang, Xiaofang Yang, Hanwen Liu and Ziyang Ji
Entropy 2024, 26(9), 767; https://doi.org/10.3390/e26090767 - 7 Sep 2024
Cited by 1 | Viewed by 1727
Abstract
This study harnesses RNA sequencing data from the Cancer Genome Atlas to unearth pivotal genetic markers linked to the progression of liver hepatocellular carcinoma (LIHC), a major contributor to cancer-related deaths worldwide, characterized by a dire prognosis and limited treatment avenues. We employ advanced feature selection techniques, including sure independence screening (SIS) combined with the least absolute shrinkage and selection operator (Lasso), smoothly clipped absolute deviation (SCAD), information gain (IG), and permutation variable importance (VIMP) methods, to effectively navigate the challenges posed by ultra-high-dimensional data. Through these methods, we identify critical genes like MED8 as significant markers for LIHC. These markers are further analyzed using advanced survival analysis models, including the Cox proportional hazards model, survival tree, and random survival forests. Our findings reveal that SIS-Lasso demonstrates strong predictive accuracy, particularly in combination with the Cox proportional hazards model. However, when coupled with the random survival forests method, the SIS-VIMP approach achieves the highest overall performance. This comprehensive approach not only enhances the prediction of LIHC outcomes but also provides valuable insights into the genetic mechanisms underlying the disease, thereby paving the way for personalized treatment strategies and advancing the field of cancer genomics.

21 pages, 3074 KiB  
Article
Tail Risk Dynamics under Price-Limited Constraint: A Censored Autoregressive Conditional Fréchet Model
by Tao Xu, Lei Shu and Yu Chen
Entropy 2024, 26(7), 555; https://doi.org/10.3390/e26070555 - 28 Jun 2024
Viewed by 1026
Abstract
This paper proposes a novel censored autoregressive conditional Fréchet (CAcF) model with a flexible evolution scheme for the time-varying parameters, which allows deciphering tail risk dynamics constrained by price limits from the viewpoints of different risk preferences. The proposed model can well accommodate many important empirical characteristics of financial data, such as heavy-tailedness, volatility clustering, extreme event clustering, and price limits. We then investigate tail risk dynamics via the CAcF model in the price-limited stock markets, taking entropic value at risk (EVaR) as a risk measurement. Our findings suggest that tail risk will be seriously underestimated in price-limited stock markets when the censored property of limit prices is ignored. Additionally, the evidence from the Chinese Taiwan stock market shows that widening price limits would lead to a decrease in the incidence of extreme events (hitting limit-down) but a significant increase in tail risk. Moreover, we find that investors with different risk preferences may make opposing decisions about an extreme event. In summary, the empirical results reveal the effectiveness of our model in interpreting and predicting time-varying tail behaviors in price-limited stock markets, providing a new tool for financial risk management.
