Big Data Analytics and Information Science for Business and Biomedical Applications

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Signal and Data Analysis".

Deadline for manuscript submissions: closed (30 April 2021) | Viewed by 21815

A printed edition of this Special Issue is available.

Special Issue Editors


Prof. Dr. S. Ejaz Ahmed
Guest Editor
Department of Mathematics and Statistics, Brock University, St. Catharines, ON L2S 3A1, Canada
Interests: model selection; post-estimation and prediction; shrinkage and empirical Bayes; Bayesian data analysis; machine learning; business; information science; statistical genetics; image analysis

Dr. Farouk Nathoo
Guest Editor
Department of Mathematics and Statistics, University of Victoria, Victoria, BC V8W 3P4, Canada
Interests: Bayesian methods; statistical computing; spatial statistics; high-dimensional data; statistical modeling; neuroimaging statistics

Special Issue Information

Dear Colleagues,

In today’s data-centric world, a host of buzzwords appears everywhere in digital and print media. We encounter data in every walk of life, and the information it contains can be used to improve society, business, health, and medicine. This presents a substantial opportunity for analytically and objectively minded researchers. Making sense of data and extracting meaningful information from it is not an easy task. The rapid growth in the size and scope of datasets across many disciplines has created the need for innovative statistical strategies for analyzing and visualizing such data.

An enormous trove of digital data has been produced by biomedical researchers worldwide, including genetic variants genotyped or sequenced at genome-wide scales, gene expression measured under different experimental conditions, biomedical imaging data including neuroimaging data, electronic medical records (EMR) of patients, and many more.

The rise of ‘Big Data’ will not only deepen our understanding of complex human traits and diseases but will also shed light on disease prevention, diagnosis, and treatment. Comprehensive analysis of Big Data in genomics and neuroimaging calls for statistically rigorous methods. Various statistical methods have been developed to accommodate the features of genomic studies as well as studies examining the function and structure of the brain, and the corresponding statistical theory has been developed alongside them.

Alongside biomedical applications, there has been a tremendous increase in interest in the use of Big Data for business and financial applications. Financial time series analysis and prediction problems present many challenges for the development of statistical methodology and computational strategies for streaming data.

The analysis of Big Data in biomedical as well as business and financial research has drawn much attention from researchers worldwide. This Special Issue aims to provide a platform for the deep discussion of novel statistical methods developed for the analysis of Big Data in these areas. Both applied and theoretical contributions to these areas will be showcased.

The contributions to this Special Issue will present new and original research in statistical methods and applications in biomedical and business research. Contributions can have either an applied or a theoretical perspective and may address different statistical problems, with special emphasis on data analytics and statistical methodology. Manuscripts summarizing the current state of the art on these topics are also welcome.

Prof. Dr. S. Ejaz Ahmed
Dr. Farouk Nathoo
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, you can access the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (10 papers)

Research

13 pages, 3809 KiB  
Article
Cancer Research Trend Analysis Based on Fusion Feature Representation
by Jingqiao Wu, Xiaoyue Feng, Renchu Guan and Yanchun Liang
Entropy 2021, 23(3), 338; https://doi.org/10.3390/e23030338 - 12 Mar 2021
Cited by 2 | Viewed by 1642
Abstract
Machine learning models can automatically discover biomedical research trends and promote the dissemination of information and knowledge. Text feature representation is a critical and challenging task in natural language processing. Most methods of text feature representation are based on word representation. A good representation can capture semantic and structural information. In this paper, two fusion algorithms are proposed, namely, the Tr-W2v and Ti-W2v algorithms. They are based on the classical text feature representation model and consider the importance of words. The results show that both fusion text representation models outperform the classical text representation model, with the Tr-W2v algorithm performing best. Furthermore, based on the Tr-W2v algorithm, trend analyses of cancer research are conducted, including correlation analysis, keyword trend analysis, and improved keyword trend analysis. The discovery of research trends and the evolution of hotspots for cancers can help doctors and biological researchers collect information and can provide guidance for further research.
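The abstract says both fusion algorithms weight word vectors by word importance but does not define the weights. As an illustration only, the sketch below builds a document vector as a TF-IDF-weighted average of word vectors; TF-IDF as the importance score, the function names, and the plain dict standing in for a trained word2vec model are all assumptions, not the exact Tr-W2v or Ti-W2v definitions.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Return one {word: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # document frequency: in how many documents each word appears
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({w: (c / len(doc)) * math.log(n_docs / df[w])
                        for w, c in tf.items()})
    return weights

def weighted_doc_vector(doc_weights, word_vectors, dim):
    """Importance-weighted average of word vectors (zeros if no word carries weight)."""
    total = [0.0] * dim
    weight_sum = 0.0
    for word, w in doc_weights.items():
        vec = word_vectors.get(word)
        if vec is None:
            continue  # skip out-of-vocabulary words
        total = [t + w * v for t, v in zip(total, vec)]
        weight_sum += w
    return total if weight_sum == 0.0 else [t / weight_sum for t in total]

# Toy example: hypothetical 2-d vectors in place of a trained word2vec model.
docs = [["tumor", "gene", "expression"], ["tumor", "immune", "therapy"]]
vectors = {"tumor": [5.0, 5.0], "gene": [1.0, 0.0], "expression": [0.0, 1.0]}
print(weighted_doc_vector(tfidf_weights(docs)[0], vectors, dim=2))  # prints [0.5, 0.5]
```

Words that occur in every document get an inverse document frequency of zero and drop out of the average, which is the intended effect of an importance weighting: here "tumor" contributes nothing despite its large vector.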

35 pages, 8030 KiB  
Article
Ant Colony System Optimization for Spatiotemporal Modelling of Combined EEG and MEG Data
by Eugene A. Opoku, Syed Ejaz Ahmed, Yin Song and Farouk S. Nathoo
Entropy 2021, 23(3), 329; https://doi.org/10.3390/e23030329 - 11 Mar 2021
Cited by 3 | Viewed by 1858
Abstract
Electroencephalography/Magnetoencephalography (EEG/MEG) source localization involves the estimation of neural activity inside the brain volume that underlies the EEG/MEG measurements observed at the sensor array. In this paper, we consider a Bayesian finite spatial mixture model for source reconstruction and implement Ant Colony System (ACS) optimization coupled with Iterated Conditional Modes (ICM) for computing estimates of the neural source activity. Our approach is evaluated using simulation studies and a real data application in which we implement a nonparametric bootstrap for interval estimation. We demonstrate improved performance of the ACS-ICM algorithm compared with existing methodology for the same spatiotemporal model.

15 pages, 830 KiB  
Article
Ensemble Linear Subspace Analysis of High-Dimensional Data
by S. Ejaz Ahmed, Saeid Amiri and Kjell Doksum
Entropy 2021, 23(3), 324; https://doi.org/10.3390/e23030324 - 9 Mar 2021
Cited by 4 | Viewed by 1517
Abstract
Regression models provide prediction frameworks for multivariate mutual information analysis that uses information concepts when choosing covariates (also called features) that are important for analysis and prediction. We consider a high-dimensional regression framework where the number of covariates (p) exceeds the sample size (n). Recent work in high-dimensional regression analysis has embraced an ensemble subspace approach that consists of selecting random subsets of fewer than p covariates, performing statistical analysis on each subset, and then merging the results from the subsets. We examine conditions under which penalty methods such as Lasso perform better when used in the ensemble approach by computing mean squared prediction errors for simulations and a real data example. Linear models with both random and fixed designs are considered. We examine two versions of penalty methods: one where the tuning parameter is selected by cross-validation, and one where the final predictor is a trimmed average of individual predictors corresponding to the members of a set of fixed tuning parameters. We find that the ensemble approach improves on penalty methods for several important real data and model scenarios. The improvement occurs when covariates are strongly associated with the response and when the complexity of the model is high. In such cases, the trimmed average version of ensemble Lasso is often the best predictor.
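The ensemble subspace idea described above (fit a penalized regression on random covariate subsets, merge the results, and take a trimmed average over a fixed grid of tuning parameters) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: centred data with no intercept, a hand-written coordinate-descent Lasso, merging by a plain average across subsets, and trimming across the lambda grid. The function names and defaults are hypothetical, and the authors' exact procedure may differ.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent on (1/(2n))||y - Xb||^2 + lam * ||b||_1.
    Assumes X and y are centred, so no intercept is fitted."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]  # correlation with partial residual
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new_bj)           # keep residual in sync with beta
            beta[j] = new_bj
    return beta

def ensemble_lasso_predict(X, y, X_new, lambdas, subset_size,
                           n_subsets=50, trim=0.1, seed=0):
    """Random-subspace ensemble Lasso with a trimmed average over a fixed lambda grid."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    subsets = [rng.choice(p, size=subset_size, replace=False) for _ in range(n_subsets)]
    per_lambda = []
    for lam in lambdas:
        # fit on each random covariate subset, then merge by simple averaging
        preds = [X_new[:, cols] @ lasso_cd(X[:, cols], y, lam) for cols in subsets]
        per_lambda.append(np.mean(preds, axis=0))
    per_lambda = np.sort(np.array(per_lambda), axis=0)  # sort over lambdas, per observation
    k = int(trim * len(lambdas))                          # number trimmed from each end
    return per_lambda[k:len(lambdas) - k].mean(axis=0)
```

For example, `ensemble_lasso_predict(X, y, X_test, lambdas=[0.05, 0.1, 0.2], subset_size=10)` returns one prediction per row of `X_test`. With `trim=0.1` and only three tuning parameters nothing is actually trimmed, so a larger grid is needed before the trimming has any effect.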

15 pages, 1291 KiB  
Article
Evaluation of Survival Outcomes of Endovascular Versus Open Aortic Repair for Abdominal Aortic Aneurysms with a Big Data Approach
by Hao Mei, Yaqing Xu, Jiping Wang and Shuangge Ma
Entropy 2020, 22(12), 1349; https://doi.org/10.3390/e22121349 - 30 Nov 2020
Cited by 2 | Viewed by 1886
Abstract
Abdominal aortic aneurysm (AAA) is a localized enlargement of the abdominal aorta. When an AAA ruptures (ruptured AAA, rAAA), repair must be performed immediately, and there are two main options: open aortic repair (OAR) and endovascular aortic repair (EVAR). Objectively comparing the survival outcomes of OAR versus EVAR through randomized clinical trials is of great clinical significance, but such trials face serious feasibility issues. In this study, using Medicare data, we conduct an emulation analysis and explicitly “assemble” a clinical trial with rigorously defined inclusion/exclusion criteria. A total of 7826 patients are “recruited”, with 3866 and 3960 in the OAR and EVAR arms, respectively. Building on, and substantially extending, the regression-based literature, we adopt a deep learning-based analysis strategy consisting of a propensity score step, a weighted survival analysis step, and a bootstrap step. The key finding is that EVAR has survival advantages for both short- and long-term mortality. This study delivers a new big data strategy for addressing critical clinical problems and provides valuable insights into treating rAAA with OAR and EVAR.
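The first two steps of the three-step strategy in the abstract (propensity score, weighted survival analysis, bootstrap) can be illustrated with a short NumPy sketch. Here a logistic regression fitted by Newton-Raphson stands in for the deep learning propensity model, inverse probability of treatment weights are used, and survival is summarized with a weighted Kaplan-Meier curve. All three choices and the function names are assumptions for illustration; the bootstrap step (resampling patients and repeating both steps) is omitted.

```python
import numpy as np

def fit_propensity(X, treat, n_iter=25):
    """Estimate P(treat = 1 | X) by logistic regression fitted with Newton-Raphson
    (a simple stand-in for a neural-network propensity model)."""
    Xd = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    coef = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ coef))
        grad = Xd.T @ (treat - p)                      # score of the log-likelihood
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])  # observed information matrix
        coef += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-Xd @ coef))

def ipw_weights(treat, ps):
    """Inverse probability of treatment weights: 1/ps if treated, 1/(1 - ps) otherwise."""
    return np.where(treat == 1, 1.0 / ps, 1.0 / (1.0 - ps))

def weighted_km(time, event, weights):
    """Weighted Kaplan-Meier estimate; returns distinct event times and S(t) at them."""
    event_times = np.unique(time[event == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = weights[time >= t].sum()                # weighted number still at risk
        died = weights[(time == t) & (event == 1)].sum()  # weighted events at time t
        s *= 1.0 - died / at_risk
        surv.append(s)
    return event_times, np.array(surv)
```

Each arm's curve comes from calling `weighted_km` on that arm's rows only, for example `weighted_km(time[treat == 1], event[treat == 1], w[treat == 1])` with `w = ipw_weights(treat, fit_propensity(X, treat))`. With all weights equal, `weighted_km` reduces to the ordinary Kaplan-Meier estimator.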

34 pages, 443 KiB  
Article
Sparse Multicategory Generalized Distance Weighted Discrimination in Ultra-High Dimensions
by Tong Su, Yafei Wang, Yi Liu, William G. Branton, Eugene Asahchop, Christopher Power, Bei Jiang, Linglong Kong and Niansheng Tang
Entropy 2020, 22(11), 1257; https://doi.org/10.3390/e22111257 - 5 Nov 2020
Cited by 1 | Viewed by 1855
Abstract
Distance weighted discrimination (DWD) is an appealing classification method that can overcome data piling problems in high-dimensional settings. Variable selection in multicategory classification poses great challenges, especially when various sparsity structures are assumed in these settings. In this paper, we propose a multicategory generalized DWD (MgDWD) method that maintains intrinsic variable group structures during selection using a sparse group lasso penalty. Theoretically, we derive minimizer uniqueness for the penalized MgDWD loss function and consistency properties for the proposed classifier. We further develop an efficient algorithm based on the proximal operator to solve the optimization problem. The performance of MgDWD is evaluated using finite sample simulations and miRNA data from an HIV study.

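Proximal algorithms alternate a gradient step on the smooth part of the objective with the proximal operator of the penalty. The sketch below shows only the latter for a sparse group lasso penalty with non-overlapping groups: elementwise soft-thresholding followed by group-wise shrinkage. The MgDWD loss itself is not reproduced, group-size weights are omitted, and the function name and arguments are hypothetical.

```python
import numpy as np

def prox_sparse_group_lasso(v, groups, step, lam1, lam2):
    """Proximal operator of step * (lam1 * ||b||_1 + lam2 * sum_g ||b_g||_2)
    for non-overlapping groups given as an integer label per coefficient."""
    # 1) elementwise soft-thresholding handles the L1 (within-group sparsity) part
    z = np.sign(v) * np.maximum(np.abs(v) - step * lam1, 0.0)
    out = np.zeros_like(z)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(z[idx])
        if norm > step * lam2:
            # 2) group-wise shrinkage scales the group's norm down by step * lam2
            out[idx] = (1.0 - step * lam2 / norm) * z[idx]
        # otherwise the whole group stays at zero
    return out

v = np.array([3.0, -1.0, 0.5, 0.2])
groups = np.array([0, 0, 1, 1])
print(prox_sparse_group_lasso(v, groups, step=1.0, lam1=0.5, lam2=1.0))
```

A group whose soft-thresholded coefficients have Euclidean norm at most `step * lam2` is set entirely to zero (the second group in the example), which is how the penalty removes whole variable groups while the L1 part still zeroes individual coefficients inside the groups that survive.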
25 pages, 1715 KiB  
Article
Segmentation of High Dimensional Time-Series Data Using Mixture of Sparse Principal Component Regression Model with Information Complexity
by Yaojin Sun and Hamparsum Bozdogan
Entropy 2020, 22(10), 1170; https://doi.org/10.3390/e22101170 - 17 Oct 2020
Cited by 5 | Viewed by 2493
Abstract
This paper presents a novel hybrid modeling method for the segmentation of high-dimensional time-series data using the mixture of sparse principal component regression (MIX-SPCR) model with the information complexity (ICOMP) criterion as the fitness function. Our approach performs dimension reduction on high-dimensional time-series data and, at the same time, determines the number of component clusters (i.e., the number of segments across the time series) and selects the best subset of predictors. A large-scale Monte Carlo simulation shows that the MIX-SPCR model successfully identifies the correct structure of the time-series data. The MIX-SPCR model is also applied to high-dimensional Standard & Poor’s 500 (S&P 500) index data to uncover the hidden structure of the time series and identify structural change points. The approach determines both the relationships among the predictor variables and, through cluster-wise sparsity settings, how the various predictor variables contribute to the explanatory power for the response variable.

21 pages, 430 KiB  
Article
A Nuisance-Free Inference Procedure Accounting for the Unknown Missingness with Application to Electronic Health Records
by Jiwei Zhao and Chi Chen
Entropy 2020, 22(10), 1154; https://doi.org/10.3390/e22101154 - 14 Oct 2020
Cited by 1 | Viewed by 1657
Abstract
We study how to conduct statistical inference in a regression model where the outcome variable is prone to missing values and the missingness mechanism is unknown. The model we consider may be a traditional setting or a modern high-dimensional setting, where a sparsity assumption is usually imposed and regularization is commonly used. Motivated by the fact that the missingness mechanism, albeit usually treated as a nuisance, is difficult to specify correctly, we adopt the conditional likelihood approach so that the nuisance can be completely ignored throughout our procedure. We establish the asymptotic theory of the proposed estimator and develop an easy-to-implement algorithm via a data manipulation strategy. In particular, under the high-dimensional setting where regularization is needed, we propose a data perturbation method for post-selection inference. The proposed methodology is especially appealing when the true missingness mechanism tends to be missing not at random, e.g., for patient-reported outcomes or real-world data such as electronic health records. The performance of the proposed method is evaluated by comprehensive simulation experiments as well as a study of the albumin level in the MIMIC-III database.

20 pages, 973 KiB  
Article
Forecasting Financial Time Series through Causal and Dilated Convolutional Neural Networks
by Lukas Börjesson and Martin Singull
Entropy 2020, 22(10), 1094; https://doi.org/10.3390/e22101094 - 29 Sep 2020
Cited by 11 | Viewed by 2967
Abstract
In this paper, predictions of future price movements of a major American stock index were made by analyzing past movements of the same and other correlated indices. A model that has shown very good results in audio and speech generation was modified to suit the analysis of financial data and was then compared to a base model restricted by assumptions made for an efficient market. The performance of any model trained on past observations is heavily influenced by how the data are divided into training, validation, and test sets. This is further exacerbated by the temporal structure of financial data, which means that the causal relationship between the predictors and the response depends on time. The complexity of the financial system further increases the difficulty of making accurate predictions, but the model suggested here was still able to outperform the naive base model by more than 20% and 37%, respectively, when predicting the next day’s closing price and the next day’s trend.
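The title refers to causal and dilated convolutions, the building block popularized by WaveNet. A minimal NumPy sketch of one such layer follows: left-only zero padding keeps the output at time t from depending on future inputs, and the dilation spaces out the kernel taps. Stacking layers with dilations 1, 2 and 4 and a kernel of width 2 gives a receptive field of 8 time steps. This illustrates the operation only; the function name is hypothetical and it is not the network architecture used in the paper.

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation):
    """y[t] = sum_i kernel[i] * x[t - i * dilation], treating x as zero before t = 0."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # pad on the left only: no look-ahead
    y = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(k):
            y[t] += kernel[i] * xp[pad + t - i * dilation]
    return y

# Impulse response of three stacked layers with dilations 1, 2, 4.
signal = np.zeros(10)
signal[0] = 1.0
for d in (1, 2, 4):
    signal = causal_dilated_conv1d(signal, np.array([1.0, 1.0]), d)
print(signal)  # [1. 1. 1. 1. 1. 1. 1. 1. 0. 0.]
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth while the number of weights grows only linearly, which is why this construction suits long price histories.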

28 pages, 1290 KiB  
Article
Consistent Estimation of Generalized Linear Models with High Dimensional Predictors via Stepwise Regression
by Alex Pijyan, Qi Zheng, Hyokyoung G. Hong and Yi Li
Entropy 2020, 22(9), 965; https://doi.org/10.3390/e22090965 - 31 Aug 2020
Cited by 4 | Viewed by 2667
Abstract
Predictive models play a central role in decision making. Penalized regression approaches, such as the least absolute shrinkage and selection operator (LASSO), have been widely used to construct predictive models and explain the impacts of the selected predictors, but the estimates are typically biased. Moreover, when data are ultrahigh-dimensional, penalized regression is usable only after applying variable screening methods to reduce the number of variables. We propose a stepwise procedure for fitting generalized linear models with ultrahigh-dimensional predictors. Our procedure can provide a final model, control both false negatives and false positives, and yield consistent estimates, which are useful for gauging the actual effect size of risk factors. Simulations and applications to two clinical studies verify the utility of the method.
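To make the idea of a stepwise procedure concrete, the sketch below runs plain forward selection for a Gaussian linear model (the simplest GLM), adding at each step the predictor that most lowers BIC and stopping when no addition helps. This is an assumption-laden stand-in: the paper's procedure targets general GLMs with ultrahigh-dimensional predictors and uses its own selection and stopping criteria, which the abstract does not specify. The function names are hypothetical.

```python
import numpy as np

def bic(X_sub, y):
    """BIC of a least-squares fit with intercept on the given columns (Gaussian model)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X_sub])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = np.sum((y - Xd @ beta) ** 2)
    return n * np.log(rss / n) + Xd.shape[1] * np.log(n)

def forward_stepwise(X, y, max_steps=None):
    """Greedy forward selection; returns column indices in the order they were added."""
    n, p = X.shape
    max_steps = min(n - 2, p) if max_steps is None else max_steps
    selected = []
    current = bic(X[:, selected], y)  # intercept-only model
    while len(selected) < max_steps:
        candidates = [j for j in range(p) if j not in selected]
        scores = [bic(X[:, selected + [j]], y) for j in candidates]
        best = int(np.argmin(scores))
        if scores[best] >= current:
            break  # no remaining predictor lowers BIC: stop
        current = scores[best]
        selected.append(candidates[best])
    return selected
```

Plain BIC is known to be too lenient when the number of candidate predictors is very large, which is one reason procedures for ultrahigh-dimensional data use stricter criteria; the greedy add-one-at-a-time structure is the part this sketch is meant to show.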

22 pages, 883 KiB  
Article
Variable Selection Using Nonlocal Priors in High-Dimensional Generalized Linear Models With Application to fMRI Data Analysis
by Xuan Cao and Kyoungjae Lee
Entropy 2020, 22(8), 807; https://doi.org/10.3390/e22080807 - 23 Jul 2020
Cited by 4 | Viewed by 2118
Abstract
High-dimensional variable selection is an important research topic in modern statistics. While methods using nonlocal priors have been thoroughly studied for variable selection in linear regression, the crucial high-dimensional model selection properties of nonlocal priors in generalized linear models have not been investigated. In this paper, we consider a hierarchical generalized linear regression model with the product moment nonlocal prior over coefficients and examine its properties. Under standard regularity assumptions, we establish strong model selection consistency in a high-dimensional setting, where the number of covariates is allowed to increase at a sub-exponential rate with the sample size. The Laplace approximation is implemented for computing the posterior probabilities, and the shotgun stochastic search procedure is suggested for exploring the posterior space. The proposed method is validated through simulation studies and illustrated by a real data example on functional activity analysis in an fMRI study for predicting Parkinson’s disease.
