Special Issue "Statistical Inference from High Dimensional Data"

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Statistical Physics".

Deadline for manuscript submissions: 29 February 2020.

Special Issue Editor

Dr. Carlos Fernandez-Lozano
Guest Editor
Department of Computer Science, Faculty of Computer Science, University of A Coruña, CITIC, A Coruña 15071, Spain
Interests: machine learning; feature selection; complex biological systems; cancer systems; bioinformatics; biomedical data science; computational biology

Special Issue Information

Dear Colleagues,

Continuous improvement and cost reduction in next-generation sequencing platforms are enabling a better understanding of multifactorial and complex pathologies such as cancer. This is a typical problem in which the amount of data matters and in which the so-called curse of dimensionality also arises (the number of variables is many orders of magnitude greater than the number of cases). In this Special Issue, we welcome contributions that apply different Statistical Inference or Machine Learning approaches to the characterization of complex pathologies using -omic data. We strongly encourage interdisciplinary works with real data (TCGA, HMP, clinicogenomic data or related datasets) and heterogeneous data integration (clinical, genomic, proteomic, and so on).

This Special Issue solicits submissions in areas including, but not limited to, the following:

  • Applications based on statistical inference from high dimensional data;
  • Dimensionality reduction with imbalanced biological datasets;
  • Applications based on feature selection (e.g., text processing, bioinformatics, medical informatics and natural language processing);
  • Applications based on Information Theory for data integration (e.g., semantic interoperability, clustering, classification);
  • Applications based on feature selection methods using meta-heuristic search methods such as genetic algorithms, particle swarm optimization and so on;
  • Applications based on feature extraction (e.g., PCA, LDA);
  • Applications based on prior knowledge (e.g., ontologies, pathways).

Dr. Carlos Fernandez-Lozano
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for the submission of manuscripts are available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Feature selection
  • Machine learning
  • Statistical inference
  • Dimensionality
  • Complex biological systems
  • Multifactorial diseases
  • Computational biology
  • Bioinformatics
  • Information theory
  • Large-scale data analysis
  • Data mining

Published Papers (3 papers)

Research

Open Access Article
Sub-Graph Regularization on Kernel Regression for Robust Semi-Supervised Dimensionality Reduction
Entropy 2019, 21(11), 1125; https://doi.org/10.3390/e21111125 - 15 Nov 2019
Abstract
Dimensionality reduction has always been a major problem in handling high-dimensional datasets. Because they make use of labeled data, supervised dimensionality reduction methods such as Linear Discriminant Analysis tend to achieve better classification performance than unsupervised methods. However, supervised methods need sufficient labeled data in order to achieve satisfying results. Therefore, semi-supervised learning (SSL) methods can be a practical alternative to relying on labeled data alone. In this paper, we develop a novel SSL method by extending anchor graph regularization (AGR) for dimensionality reduction. In detail, AGR is an accelerated semi-supervised learning method that propagates class labels to unlabeled data. However, it cannot handle new incoming samples. We therefore improve AGR by adding kernel regression to its basic objective function. As a result, the proposed method can not only estimate the class labels of unlabeled data but also achieve dimensionality reduction. Extensive simulations on several benchmark datasets are conducted, and the simulation results verify the effectiveness of the proposed method.
(This article belongs to the Special Issue Statistical Inference from High Dimensional Data)
Open Access Article
Radiomics Analysis on Contrast-Enhanced Spectral Mammography Images for Breast Cancer Diagnosis: A Pilot Study
Entropy 2019, 21(11), 1110; https://doi.org/10.3390/e21111110 - 13 Nov 2019
Abstract
Contrast-enhanced spectral mammography (CESM) is one of the latest diagnostic tools for breast care; the literature therefore still offers little radiomics image analysis to drive the development of automatic diagnostic support systems. In this work, we propose a preliminary exploratory analysis to evaluate the impact of different sets of textural features in the discrimination of benign and malignant breast lesions. The analysis is performed on 55 ROIs extracted from 51 patients referred to Istituto Tumori “Giovanni Paolo II” of Bari (Italy) from the breast cancer screening phase between March 2017 and June 2018. We extracted feature sets by calculating statistical measures on original ROIs, gradiented images, Haar decompositions of the same original ROIs, and on gray-level co-occurrence matrices (GLCMs) of each sub-ROI obtained by Haar transform. First, we evaluated the overall impact of each feature set on the diagnosis through a principal component analysis by training a support vector machine (SVM) classifier. Then, in order to identify a sub-set for each set of features with higher diagnostic power, we developed a feature importance analysis by means of wrapper and embedded methods. Finally, we trained an SVM classifier on each sub-set of previously selected features to compare their classification performances with respect to those of the overall set. We found a sub-set of significant features extracted from the original ROIs with a diagnostic accuracy greater than 80%. The features extracted from each sub-ROI decomposed by two levels of Haar transform were predictive only when they were all used without any selection, reaching the best mean accuracy of about 80%. Moreover, most of the significant features calculated by Haar decompositions and their GLCMs were extracted from recombined CESM images. Our pilot study suggested that textural features could provide complementary information about the characterization of breast lesions. In particular, we found a sub-set of significant features extracted from the original ROIs, gradiented ROI images, and GLCMs calculated from each sub-ROI previously decomposed by the Haar transform.
(This article belongs to the Special Issue Statistical Inference from High Dimensional Data)
Open Access Article
Identify Risk Pattern of E-Bike Riders in China Based on Machine Learning Framework
Entropy 2019, 21(11), 1084; https://doi.org/10.3390/e21111084 - 06 Nov 2019
Abstract
In this paper, the risk pattern of e-bike riders in China was examined, based on tree-structured machine learning techniques. Three-year crash/violation data were acquired from the Kunshan traffic police department, China. Firstly, high-risk (HR) electric bicycle (e-bike) riders were defined as those with at-fault crash involvement, while others (i.e., non-at-fault or without crash involvement) were considered non-high-risk (NHR) riders, based on quasi-induced exposure theory. Then, for e-bike riders, their demographics and previous violation-related features were developed based on the crash/violation records. After that, a systematic machine learning (ML) framework was proposed so as to capture the complex risk patterns of those e-bike riders. An ensemble sampling method was selected to deal with the imbalanced datasets. Four tree-structured machine learning methods were compared, and a gradient boosting decision tree (GBDT) appeared to be the best. The feature importance and partial dependence were further examined. Interesting findings include the following: (1) tree-structured ML models are able to capture complex risk patterns and interpret them properly; (2) spatial-temporal violation features were found to be important indicators of high-risk e-bike riders; and (3) violation behavior features appeared to be more effective than violation punishment-related features, in terms of identifying high-risk e-bike riders. In general, the proposed ML framework is able to identify the complex crash risk pattern of e-bike riders. This paper provides useful insights for policy-makers and traffic practitioners regarding e-bike safety improvement in China.
(This article belongs to the Special Issue Statistical Inference from High Dimensional Data)

Planned Papers

The list below includes only planned manuscripts. Some of these manuscripts have not been received by the Editorial Office yet. Papers submitted to MDPI journals are subject to peer review.

Title: Feature Selection for Land Cover Classification with Sentinel-II
Authors: Klemen Kenda and Filip Koprivec
Abstract: Earth observation data has become one of the most prominent sources of Big Data and requires capable infrastructure for real-time processing. One of the most important challenges in the field is land-cover classification. Numerous algorithms and approaches based on advanced machine learning techniques have pushed model accuracy to its limits. The efficiency of those methodologies, however, has rarely been addressed. We propose a multi-objective optimization-based approach (using genetic algorithms) to reduce the number of features in state-of-the-art feature sets in order to reduce processing while keeping almost the same accuracy.