Article

Entropy Methods on Finding Optimal Linear Combinations with an Application to Biomarkers

by Mehmet Sinan İyisoy 1,* and Pınar Özdemir 2

1 Department of Medical Education and Informatics, Necmettin Erbakan University, Konya 42090, Turkey
2 Department of Biostatistics, Hacettepe University, Ankara 06100, Turkey
* Author to whom correspondence should be addressed.
Entropy 2025, 27(9), 985; https://doi.org/10.3390/e27090985
Submission received: 28 July 2025 / Revised: 16 September 2025 / Accepted: 17 September 2025 / Published: 21 September 2025
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

Identifying an optimal linear combination of continuous variables is a key objective in various fields of research, such as medicine. This manuscript explores the use of information-theoretical approaches to establish such linear combinations. Coefficients obtained from logistic regression can be used to construct a linear combination, and this approach has been commonly adopted in the literature for comparison purposes. The main contribution of this work is to propose novel ways of determining the linear combination coefficients by optimizing information-theoretical objective functions. Biomarkers are usually continuous measurements used to diagnose whether a patient has an underlying disease. Certain disease contexts may lack biomarkers with high diagnostic power, making their optimal combination a critical area of interest. We apply the above-mentioned novel methods to the problem of combining biomarkers. We assess the performance of our proposed methods against combinations derived from logistic regression coefficients by comparing area under the ROC curve (AUC) values and other metrics in a broad simulation and a real-life data application.
MSC:
62B10; 94A15

1. Introduction

Information-theoretical concepts are inherently and deeply linked to statistical principles. These concepts play a crucial role in the processes of data analysis and modeling. The rich set of tools that information theory offers focuses on quantifying uncertainty in a random variable, divergence, dependence, and information gain.
They facilitate the understanding of how data can be utilized more efficiently, which variables provide greater informational value, and how uncertainty within the data can be quantified. They are used in areas like model selection, estimation, hypothesis testing, learning, and decision-making.
By integrating information-theoretic objectives into statistical inference and learning, researchers can develop models that are both interpretable and data-efficient, making these methods especially valuable in high-dimensional and data-scarce domains such as bioinformatics, neuroscience, and medical diagnostics.
To name a few, tools and notions from information theory such as Shannon entropy, Kullback–Leibler divergence, and mutual information provide rigorous means for assessing the complexity and informativeness of models and data. These measures are central to a wide range of statistical applications, including model selection (e.g., through criteria like Akaike's Information Criterion, AIC, and the Minimum Description Length, MDL), feature selection, clustering, density estimation, and hypothesis testing.
Additionally, methods based on the principle of maximum entropy allow for the estimation of probability distributions under partial information, yielding minimally biased solutions constrained only by known data characteristics. Cross-entropy and information gain are widely used in classification tasks and decision tree algorithms, respectively, to optimize predictive performance.
To define the core problem addressed in this manuscript, consider a given matrix $X$ of dimension $n \times d$ with continuous individual predictors $x_1, x_2, \ldots, x_d$ as its columns and a given binary outcome vector $y$ of $0/1$ values. Our aim is to find a coefficient vector $w$ such that the linear combination $Xw$ is a better predictor of $y$ than every individual column $x_1, x_2, \ldots, x_d$ of $X$.
It is well known that logistic regression maximizes the logarithm of the likelihood function

$$L(\beta_0, \beta_1, \beta_2, \ldots, \beta_d) = \prod_{i=1}^{n} P(y_i \mid X_i, \beta_0, \beta_1, \beta_2, \ldots, \beta_d) = \prod_{i=1}^{n} p(X_i)^{y_i} \left(1 - p(X_i)\right)^{1 - y_i}$$

with $p(X_i) = P(y = 1 \mid X_i, \beta) = \sigma(X_i \beta)$ to estimate the coefficient vector $\beta$, where $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the sigmoid function, $n$ is the number of samples, and $X_i$ is the $i$th row of $X$. After estimating the coefficient vector $\beta$ from logistic regression, one can simply select $w = (\beta_1, \beta_2, \ldots, \beta_d)$ as a candidate coefficient vector for the linear combination of individual predictors.
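For concreteness, the following is a minimal sketch of this logistic regression baseline in Julia using the GLM.jl package mentioned in Section 4; the two-predictor synthetic data are illustrative only, not taken from the paper:

```julia
using GLM, DataFrames

# Synthetic data: two continuous predictors and a 0/1 outcome (illustrative only).
df = DataFrame(x1 = randn(200), x2 = randn(200))
df.y = Int.(df.x1 .+ 0.5 .* df.x2 .+ randn(200) .> 0)

# Fit logistic regression and take the slope coefficients as w.
model = glm(@formula(y ~ x1 + x2), df, Binomial(), LogitLink())
β = coef(model)                          # [β₀, β₁, β₂]
w = β[2:end]                             # drop the intercept
score = Matrix(df[:, [:x1, :x2]]) * w    # linear combination Xw
```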
The novel approaches we present in the following sections utilize information-theoretical principles to determine the coefficient vector $w$. Section 2 reviews previous methods used for linear combinations of biomarkers, all of which are non-information-theoretic in nature. Section 3 details our proposed methods. Section 4 is devoted to the details and results of the simulation, and Section 5 showcases an application to real-world data.

2. Previous Methods Used in Linear Combinations of Biomarkers

A biomarker is typically employed as a diagnostic or evaluative tool in medical research. Identifying a single ideal biomarker that offers both high sensitivity and specificity is challenging, particularly when high specificity is needed for population screening. An effective alternative is the combination of multiple biomarkers, which can deliver superior performance compared to using just one biomarker.
Especially when several weak biomarkers are available, it is important to use them in combination [1]. Combining biomarkers has several advantages, such as increased diagnostic accuracy, better prediction of disease course and prognosis, and the development of personalized medical applications.
Numerous studies in the literature employ ROC (Receiver Operating Characteristic) analysis to assess the performance of combined biomarkers developed through various methodologies. Studies addressing binary classification have focused on maximizing various non-parametric estimates of the area under the ROC curve (AUC), or the Mann–Whitney U statistic, to derive the optimal linear combination of single biomarkers.
AUC is an essential performance indicator for binary models. AUC values quantify the model's ability to distinguish between the two classes (0/1) of the binary outcome (discriminative ability); higher AUC values indicate more accurate discrimination.
The authors in [2] used Fisher’s discriminant functions under multivariate normality and (non)proportional covariance settings to produce a linear combination which maximizes AUC. Later, the authors in [3] enhanced and investigated some other properties of these linear combinations and proposed alternative combinations.
The authors in [4] proposed a distribution-free, rank-based approach for deriving linear combinations of two biomarkers that maximize the area or partial area under the empirical ROC curve. They compared its performance to combinations obtained by optimizing the logistic likelihood function and by linear discriminant analysis, and argued that linear discriminant analysis optimizes the AUC when multivariate normality holds. In another publication [5], they provided further insights, such as that choosing the empirical AUC as the objective function may yield better performance than choosing the logistic likelihood function.
The nonparametric Min–Max procedure proposed in [6] is claimed to be more robust to distributional assumptions, easier to compute, and better performing. The methodology proposed in [7] extended combination approaches to problems where the outcome is not dichotomous but ordinal with more than two categories; the authors proposed two new methods and compared their performance with existing ones.
The study [1] compared existing methods in settings where the number of biomarkers is large, the biomarkers are weak, and the number of observations is not an order of magnitude greater than the number of biomarkers. It also proposed a new combination method.
Underlining certain inadequacies of present combination methods, the authors in [8] proposed a new kernel-based AUC optimization method and claimed that it outperformed the smoothed AUC method previously proposed in [9].
The authors in [10] proposed a derivative-free black-box optimization technique, called the Spherically Constrained Optimization Routine (SCOR), to identify optimal linear combinations when the outcome is ordinal. The method proposed in [11], Nonparametric Predictive Inference (NPI), examines the best linear combination of two biomarkers, where the dependency between the two biomarkers is modeled using parametric copulas. Another copula-based combination method, in [12], utilized different copulas for an optimal linear combination of two biomarkers with a binary outcome.
It is important to note that none of the preceding methods employed information-theoretical concepts in their methodologies.

3. Information Theoretical Methods for Linear Combinations

This section details maximum entropy, minimum cross entropy, minimum relative entropy, and maximum mutual information concepts.

3.1. Entropy Maximization (MaxEnt)

Having historical roots in physics, the maximum entropy principle is an approach to finding the distribution that maximizes the entropy of a probability distribution under certain constraints. It was proposed by Jaynes [13] as a general principle of inference and has been applied successfully in numerous fields. Maximizing entropy means choosing the distribution that reflects maximum uncertainty. An axiomatic derivation of the maximum entropy and minimum cross-entropy principles was given in [14]. The authors in [15] applied the maximum entropy principle to recover information from multinomial data.
Suppose we know that a system has a set of possible states $y_k$, $k \in \{0, 1\}$, with unknown probabilities $p_{ik} = P(Y = y_k \mid X)$ for $i = 1, 2, \ldots, n$, and that we are given a matrix $X$ of dimension $n \times d$ and a binary vector $y$ of dimension $n$. We want to find a coefficient vector $w$ of dimension $d$ so that the vector $Xw$ is a better predictor of $Y$.
The maximum entropy principle explains how to select a certain distribution $\hat{p}$ among the different possible distributions that satisfy our constraints. We treat the $p_{ik}$ as variables when finding the maximum entropy solution to our problem. Notice that, since we deal with a binary vector $y$ and $p_{i0} = 1 - p_{i1}$, solving for $p_{i1} = P(Y = y_1 \mid X)$ suffices and reduces the complexity of the problem.
In mathematical terms, we want to find $\hat{p}$ that maximizes $H(p)$:

$$\hat{p} = \arg\max_p H(p) = \arg\max_p \left( -\sum_{i=1}^{n} \left[ p_{i0} \log p_{i0} + p_{i1} \log p_{i1} \right] \right)$$

Replacing $p_{i0} = 1 - p_{i1}$, this becomes

$$\hat{p} = \arg\max_p \left( -\sum_{i=1}^{n} \left[ (1 - p_{i1}) \log (1 - p_{i1}) + p_{i1} \log p_{i1} \right] \right)$$
We impose the following constraints to fit the data:

$$\sum_{i=1}^{n} X_{ij} y_i = \sum_{i=1}^{n} X_{ij} p_{i1}, \quad j = 1, 2, \ldots, d, \tag{1}$$

and

$$\sum_{k=1}^{2} p_{ik} = 1 \tag{2}$$

where the $\sum_{i=1}^{n} X_{ij} y_i$ values are empirical expectations obtained from the data and $d + 1 \ll n$. Since, in general, we have many more observations than predictors, $n$ is much larger than $d + 1$. We can relax the second constraint (2) to $0 \le p_{i1} \le 1$, since the $p_{i0}$ values are easily obtained from $p_{i0} = 1 - p_{i1}$. Hence, the final optimization problem becomes

$$\hat{p} = \arg\max_p \left( -\sum_{i=1}^{n} \left[ (1 - p_{i1}) \log (1 - p_{i1}) + p_{i1} \log p_{i1} \right] \right)$$

under the constraints

$$\sum_{i=1}^{n} X_{ij} y_i = \sum_{i=1}^{n} X_{ij} p_{i1}, \quad j = 1, 2, \ldots, d$$

and

$$0 \le p_{i1} \le 1.$$
This is a solvable convex optimization problem with linear constraints. After determining $\hat{p}$, namely the $p_{i1}$ values, $w$ still needs to be found. Since we formulated the problem in terms of the probabilities rather than the coefficient vector $w$, we need to tie these probabilities to the given matrix $X$. We use the logits

$$l_i = \log \frac{p_{i1}}{1 - p_{i1}}$$

obtained from those probabilities and perform least squares estimation on the equation $Xw = l$, where $l$ is the vector of logits.
Other ways of estimating $w$ without directly finding probabilities are possible, as we do below in cross-entropy minimization. However, we deliberately focused on this formulation to show that the method finds probabilities complying with the maximum entropy principle.
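The two-step procedure can be sketched in Julia as follows, using the Ipopt.jl solver mentioned in Section 4 together with the JuMP modeling layer (an assumption on our part; this is an illustration of the formulation above, not the authors' released code). The variable bounds and starting value are illustrative numerical safeguards:

```julia
using JuMP, Ipopt

# MaxEnt step: maximize the binary entropy of p subject to the moment
# constraints, then recover w by least squares on the logits (solve Xw = l).
function maxent_weights(X::Matrix{Float64}, y::Vector{Float64})
    n, d = size(X)
    model = Model(Ipopt.Optimizer)
    set_silent(model)
    # Bounds keep log() well defined; 1e-6 is an illustrative safeguard.
    @variable(model, 1e-6 <= p[1:n] <= 1 - 1e-6, start = 0.5)
    # Moment constraints: sum_i X[i,j]*y[i] == sum_i X[i,j]*p[i] for each j.
    @constraint(model, [j = 1:d],
        sum(X[i, j] * (y[i] - p[i]) for i in 1:n) == 0)
    # Binary entropy objective H(p) (needs a recent JuMP with nonlinear
    # support in @objective).
    @objective(model, Max,
        -sum((1 - p[i]) * log(1 - p[i]) + p[i] * log(p[i]) for i in 1:n))
    optimize!(model)
    phat = value.(p)
    l = log.(phat ./ (1 .- phat))   # logits
    return X \ l                    # least-squares solution of Xw = l
end
```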

3.2. Cross-Entropy Minimization (MinCrEnt)

First proposed by Kullback and axiomatized in [14], this principle uses the cross entropy of two distributions to find the optimal solution.
Suppose we know that a system has a set of possible states $y_k$, $k \in \{0, 1\}$, with unknown probabilities $p_{ik} = P(Y = y_k \mid X)$ for $i = 1, 2, \ldots, n$, as described in Section 3.1.
The principle states that, among different possible distributions p that satisfy certain constraints, one should select the one which minimizes the cross entropy with q where q is a known a priori distribution. In our case, we select q to be the empirical distribution of the outcome (namely, Y):
$$\hat{p} = \arg\min_p H(p, q) = \arg\min_p \left( -\sum_{i=1}^{n} \left[ p_{i0} \log q_{i0} + p_{i1} \log q_{i1} \right] \right).$$

Again using $p_{i0} = 1 - p_{i1}$, we can simplify the objective function to

$$\hat{p} = \arg\min_p \left( -\sum_{i=1}^{n} \left[ (1 - p_{i1}) \log (1 - q_{i1}) + p_{i1} \log q_{i1} \right] \right).$$
The objective function obtained on the right-hand side indeed coincides with the negative of the log-likelihood function from logistic regression. We know the $q_{i1}$ values a priori, but we do not know the $p_{i1}$ values. A natural choice for obtaining the $p_{i1}$ values is to use a transformation of the given data matrix $X$ with the coefficient vector $w$. Thus, we define

$$p_{i1} = \sigma\!\left( \sum_{j=1}^{d} X_{ij} w_j \right)$$
where $\sigma$ is the sigmoid function. Hence, the optimization problem becomes

$$\hat{p} = \arg\min_p H(p, q) = \arg\min_p \left( -\sum_{i=1}^{n} \left[ \left( 1 - \sigma\!\Big( \sum_{j=1}^{d} X_{ij} w_j \Big) \right) \log (1 - q_{i1}) + \sigma\!\Big( \sum_{j=1}^{d} X_{ij} w_j \Big) \log q_{i1} \right] \right)$$

where the $q_{i1}$ values are known probabilities obtained from the vector $y$.
We do not impose any further constraints here because the sigmoid function confines the probabilities between 0 and 1, and we have already made use of $X$. This is an unconstrained convex optimization problem that can easily be solved.
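Because the sigmoid keeps the probabilities in $(0, 1)$, a few lines of plain gradient descent suffice. The sketch below implements the standard logistic cross-entropy (the negative log-likelihood that the objective above coincides with); the step size and iteration count are illustrative choices, not settings from the paper:

```julia
sigmoid(t) = 1 / (1 + exp(-t))

# MinCrEnt step: minimize the cross-entropy objective by gradient descent.
function mincrent_weights(X::Matrix{Float64}, y::Vector{Float64};
                          lr = 0.1, iters = 5_000)
    n, d = size(X)
    w = zeros(d)
    for _ in 1:iters
        p = sigmoid.(X * w)           # model probabilities p_i1 = σ(Xw)
        grad = X' * (p .- y) ./ n     # gradient of the cross-entropy objective
        w .-= lr .* grad
    end
    return w
end
```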

3.3. Relative Entropy Minimization (MinRelEnt)

Relative entropy minimization is a less well-known technique that is sometimes conflated with cross-entropy minimization [16]. Like cross entropy, relative entropy requires a reference distribution. We used the empirical distribution of $y$ as the reference distribution $q$ in cross-entropy minimization, but here we choose the uniform distribution $q_{i1} = 0.5$. The optimization problem becomes
$$\hat{p} = \arg\min_p D(p \,\|\, q) = \arg\min_p \sum_{i=1}^{n} \left[ p_{i0} \log \frac{p_{i0}}{q_{i0}} + p_{i1} \log \frac{p_{i1}}{q_{i1}} \right]$$

or

$$\hat{p} = \arg\min_p \sum_{i=1}^{n} \left[ (1 - p_{i1}) \log \frac{1 - p_{i1}}{1 - q_{i1}} + p_{i1} \log \frac{p_{i1}}{q_{i1}} \right]$$
As we did in entropy maximization, we use the same constraint on empirical expectations to determine the probabilities

$$\sum_{i=1}^{n} X_{ij} y_i = \sum_{i=1}^{n} X_{ij} p_{i1}, \quad j = 1, 2, \ldots, d$$

and for the probabilities we set

$$0 \le p_{i1} \le 1.$$
This is also a convex optimization problem with linear constraints. After finding the probabilities, we transform them to logits and solve $Xw = l$, where $l$ is the logits vector, using least squares as we did for entropy maximization. It is worth noting that alternative choices for $q_{i1}$ are possible, such as the empirical distribution of $y$ or incorporating any other prior belief about $y$.
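A sketch of the MinRelEnt step, again assuming JuMP with the Ipopt.jl solver, is given below. The reference probability `q1` defaults to the uniform choice of 0.5 used above; note that with a uniform reference, minimizing $D(p \,\|\, q)$ differs from maximizing $H(p)$ only by a constant, so the two constrained problems coincide in that case:

```julia
using JuMP, Ipopt

# MinRelEnt step: minimize KL divergence to a reference q under the same
# moment constraints as MaxEnt, then recover w from the logits.
function minrelent_weights(X::Matrix{Float64}, y::Vector{Float64}; q1 = 0.5)
    n, d = size(X)
    model = Model(Ipopt.Optimizer)
    set_silent(model)
    @variable(model, 1e-6 <= p[1:n] <= 1 - 1e-6, start = 0.5)
    # Same moment constraints as in the MaxEnt formulation.
    @constraint(model, [j = 1:d],
        sum(X[i, j] * (y[i] - p[i]) for i in 1:n) == 0)
    # KL divergence D(p || q) with reference probability q1 for class 1.
    @objective(model, Min,
        sum((1 - p[i]) * log((1 - p[i]) / (1 - q1)) +
            p[i] * log(p[i] / q1) for i in 1:n))
    optimize!(model)
    phat = value.(p)
    return X \ log.(phat ./ (1 .- phat))   # least-squares solve of Xw = logits
end
```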

3.4. Mutual Information Maximization (MaxMutInf)

In a setting similar to, though not the same as, ours, Faivishevsky and Goldberger [17] estimated coefficients through mutual information maximization; we follow their approach to estimate the coefficient vector $w$. We use the following identity for mutual information:

$$I_{\mathrm{mean}}(X; Y) = H_{\mathrm{mean}}(X) - H_{\mathrm{mean}}(X \mid Y)$$

where $X$ is the predictor matrix, $Y$ is the empirical distribution of the outcome vector $y$ in our problem, and $I_{\mathrm{mean}}$ is the mutual information estimator defined by Faivishevsky and Goldberger. Since $Y$ is categorical, we deal with conditional entropies corresponding to the states $y_k$ ($k \in \{0, 1\}$) of $Y$:

$$H_{\mathrm{mean}}(X \mid Y) = \sum_{k \in \{0, 1\}} p(Y = y_k)\, H_{\mathrm{mean}}(X \mid Y = y_k)$$
where $H_{\mathrm{mean}}(X \mid Y = y_k)$ is the entropy of $X$ restricted to the observations where $Y$ takes the value $y_k$ (also called the in-class entropy). Hence, the optimization problem becomes

$$\hat{w} = \arg\max_w I_{\mathrm{mean}}(Xw; Y) = \arg\max_w \left[ H_{\mathrm{mean}}(Xw) - \sum_{k \in \{0, 1\}} p(Y = y_k)\, H_{\mathrm{mean}}(Xw \mid Y = y_k) \right]$$

or

$$\hat{w} = \arg\max_w \left[ \frac{1}{n(n-1)} \sum_{i \neq j} \log \left\| (x_i - x_j) w \right\|^2 - \sum_{k \in \{0, 1\}} \frac{p(Y = y_k)}{n_k (n_k - 1)} \sum_{i_k \neq j_k} \log \left\| (x_{i_k} - x_{j_k}) w \right\|^2 \right]$$

where $n_k$ is the number of observations having class value $y_k$. It is easy to see that $\sum_{k \in \{0, 1\}} n_k = n$. The smoothness of the MeanNN entropy estimator enables its gradient to be computed analytically. Therefore, the gradient of $I_{\mathrm{mean}}(Xw; Y)$ with respect to $w$ becomes

$$\frac{\partial I_{\mathrm{mean}}}{\partial w} = \frac{2}{n(n-1)} \sum_{i \neq j} \frac{(x_i - x_j)(x_i - x_j)^{T} w}{\left\| (x_i - x_j) w \right\|^2} - \sum_{k \in \{0, 1\}} \frac{2\, p(Y = y_k)}{n_k (n_k - 1)} \sum_{i_k \neq j_k} \frac{(x_{i_k} - x_{j_k})(x_{i_k} - x_{j_k})^{T} w}{\left\| (x_{i_k} - x_{j_k}) w \right\|^2}$$
Since the gradient of the mutual information is available in closed form, we use the gradient ascent method to maximize the mutual information.
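A minimal sketch of this gradient ascent in Julia is shown below, assuming the MeanNN-based objective and gradient above; the learning rate, iteration count, random initialization, and the small `tol` guard against zero projected differences are illustrative choices, not the authors' tuned settings:

```julia
using LinearAlgebra

# MaxMutInf step: gradient ascent on I_mean(Xw; Y) with 0/1 labels y.
function maxmutinf_weights(X::Matrix{Float64}, y::Vector{Int};
                           lr = 0.01, iters = 500, tol = 1e-12)
    n, d = size(X)

    # Gradient of the MeanNN entropy term over the rows in `idx` at projection Xw:
    # (1/(m(m-1))) * sum_{i != j} grad_w log((x_i - x_j)ᵀ w)²
    function entropy_grad(idx, w)
        g = zeros(d)
        m = length(idx)
        for i in idx, j in idx
            i == j && continue
            δ = X[i, :] .- X[j, :]
            s = dot(δ, w)                       # scalar projected difference
            g .+= 2 .* δ .* s ./ (s^2 + tol)    # grad_w log s²
        end
        return g ./ (m * (m - 1))
    end

    idx0, idx1 = findall(==(0), y), findall(==(1), y)
    p0, p1 = length(idx0) / n, length(idx1) / n
    w = randn(d)
    for _ in 1:iters
        grad = entropy_grad(1:n, w) .-
               p0 .* entropy_grad(idx0, w) .- p1 .* entropy_grad(idx1, w)
        w .+= lr .* grad                        # gradient ascent on I_mean
    end
    return w
end
```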

4. Simulation

An extensive simulation study was conducted to compare the efficiencies of these methods for combining continuous variables (or biomarkers). Imitating, but also enriching, the simulation performed previously in [1] for comparison purposes, we considered normal, gamma, and beta distributions under different settings.
In all settings, we assumed mean values $\mu_0 = 0$ corresponding to class $y_0$. For class $y_1$, we set the mean components as $\mu_{1i} = 0.4\,\frac{i}{d-1}$, where $d$ denotes the number of predictors and $0 \le i \le d - 1$. As a result, the mean values for the $y_0$ and $y_1$ classes were evenly spaced and dependent on the number of predictors $d$. Candidate biomarkers generated through this approach typically have AUC values in the range $0.5 \le \mathrm{AUC} \le 0.75$, most commonly falling below $0.7$.
We used multivariate normal variables with equal and unequal covariance structures to generate normal, beta, and gamma variates using a normal copula and inverse transform sampling. When covariances are equal, we fixed $\Sigma_0 = \Sigma_1 = (1 - \gamma) I_{d \times d} + \gamma J_{d \times d}$, where $I$ is the identity matrix, $J$ is the matrix of all ones, and $\gamma = 0.15$. For unequal covariances, we set $\Sigma_0 = 0.9\, I_{d \times d} + 0.1\, J_{d \times d}$, keeping $\Sigma_1 = (1 - \gamma) I_{d \times d} + \gamma J_{d \times d}$ unchanged.
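The normal-copula construction can be sketched in Julia with Distributions.jl as follows; the Gamma marginal and its parameters are illustrative placeholders rather than the paper's exact simulation settings:

```julia
using Distributions, LinearAlgebra

# Draw n observations with correlation structure Σ and a given marginal:
# multivariate normal -> uniforms via the normal CDF -> inverse transform.
function copula_sample(n, μ::Vector{Float64}, Σ::Matrix{Float64}, marginal)
    Z = Matrix(rand(MvNormal(μ, Σ), n)')   # n × d multivariate normal draws
    U = cdf.(Normal(), Z)                  # map to (0, 1) via the normal CDF
    return quantile.(marginal, U)          # inverse transform to the marginal
end

d, γ = 5, 0.15
Σ = (1 - γ) .* Matrix(1.0I, d, d) .+ γ .* ones(d, d)   # (1-γ)I + γJ
X0 = copula_sample(100, zeros(d), Σ, Gamma(2.0, 1.0))  # class y = 0 block
```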
For each setting, 1000 datasets were generated and randomly divided into two sets of equal size. One set was used for training and the other for testing. Coefficient estimates obtained from training datasets were recorded for each method. Using the test datasets, linear combinations were computed based on the previously estimated coefficients, and corresponding performance metrics were evaluated. These metrics included AUC, Area Under the Precision–Recall Curve (AUPRC), and Matthews Correlation Coefficient (MCC). In addition to mean values, 95% confidence intervals and median values were reported for each metric.
Sample sizes were set to $n = 25, 100, 200$, and the number of predictors was varied across $d = 3, 5, 10, 15$. Unlike the referenced simulation study, we also examined scenarios with unequal class allocation, specifically cases where $n_0 = 2 n_1$. In the unequal class allocation scenarios, the sample sizes for the $y = 1$ class were adjusted to $n_1 = 12, 50, 100$. Table 1, Table 2 and Table 3 present AUC values obtained from the simulation results. Additional results, including those for AUPRC, MCC, and all unequal allocation cases, are provided in the Supplementary Material.
All simulations were conducted using Julia version 1.11.0-rc3. We utilized the Ipopt.jl package version 1.11.0 for nonlinear constrained optimization in entropy-based methods, the GLM.jl package version 1.9.0 for logistic regression, and custom gradient ascent code for the mutual information maximization method.
The code used in the simulations will be released as public Julia and R packages in future publications. Interested researchers may contact the author for access.

5. Application to Real-Life Data

We used the publicly available Wisconsin datasets for the prognosis [18] and diagnosis [19] of breast cancer. There are 30 continuous predictors in each dataset to be used for the prediction of either the prognosis or the diagnosis of breast cancer, coded as 0/1.
The prognostic dataset has 198 observations, while the diagnostic dataset has 569. The frequencies of the $y_1$ class are 47/198 and 212/569 in these datasets, respectively. We considered the first five predictors in each dataset for our calculations, namely the variables radius_mean, texture_mean, perimeter_mean, area_mean, and smoothness_mean.
The predictive ability of these variables is given in Table 4. As can be seen from Table 4, these predictors have very low predictive ability for the prognosis of breast cancer, but high ability for its diagnosis.
We applied logistic regression and the entropy optimization techniques described in Section 3 and gathered the coefficients shown in Table 5 and Table 6, representing candidate linear combinations for both datasets.
Finally, calculating linear combinations using those coefficients, we obtained AUC, AUPRC, and MCC values given in Table 7 and Table 8. Our methods yielded metric values very similar to those derived from logistic regression.

6. Discussion and Conclusions

The primary objective of this study was to explore how information-theoretical methods can be applied to construct linear combinations of continuous variables in the context of a binary classification problem.
We addressed this question by introducing four distinct approaches (MaxEnt, MinCrEnt, MinRelEnt, and MaxMutInf), each grounded in fundamental principles of information theory.
We believe that identifying and formalizing these approaches may guide future research directions by drawing greater attention to information-theoretical concepts and encouraging their broader application in this problem setting. This study also represents the first systematic evaluation of information-theoretic approaches applied in this setting.
Earlier methods were often constrained by strong distributional assumptions about biomarkers, such as normality and equal or proportional covariance structures. Additionally, the number of predictors they could handle was typically limited to just two.
More recent approaches have relaxed these assumptions, allowing for the inclusion of a larger number of biomarkers in linear combinations. However, most of these methods rely on performance metrics like sensitivity, specific segments of the ROC curve, or the AUC (Area Under the Curve) as the optimization objective. Some later methods extended their applicability to multi-class classification problems using metrics such as the Volume Under the Surface (VUS) or the Hypervolume Under the Manifold (HUM).
Among these, the SCOR algorithm [10] and the method proposed in [1] have shown promising results in maximizing the AUC or HUM objective.
Notably, the methods established in [11,12] distinguish themselves by incorporating copulas, representing the first known use of copulas in the context of biomarker combination. Despite this innovation, they currently remain limited to handling only two biomarkers.
An important distinction of our proposed methods is that, whereas previous methods primarily aimed to maximize a metric such as the AUC value, the methods proposed in this article rely solely on information-theoretic criteria and never incorporate AUC or any other performance metric as an optimization objective.
As another distinction, unlike many previous studies that relied on a single evaluation metric, our assessment of model performance employed a comprehensive set of three metrics: AUC, AUPRC, and MCC. This multi-faceted approach provides a more robust and nuanced comparison of model performance across different aspects of classification quality.
The methods proposed here are straightforward to apply to binary classification problems and can be extended to a multiclass setting. These methods are computationally simpler than most existing approaches, with the exception of MaxMutInf. Owing to the need to compute complex gradient functions when optimizing mutual information, MaxMutInf is relatively more challenging to implement. Extending our methods to classification problems involving more than two outcome levels may represent a valuable direction for future research.
The three entropy-based methods (MaxEnt, MinCrEnt, and MinRelEnt) demonstrated test performances that were consistently comparable to those derived from logistic regression, across all simulation conditions and real data applications. The MaxMutInf method yielded slightly greater test AUC, AUPRC, and MCC values compared to the other approaches, particularly in simulation settings involving Beta and Gamma distributed data. The differences were more pronounced in simulations with smaller sample sizes. However, in some settings involving normally distributed data, the MaxMutInf method produced test AUC and AUPRC metrics that were comparable or even marginally lower than those of other methods.
The Maximum Mutual Information method appeared more robust to distributional asymmetries, as well as to smaller and unequal sample sizes, except in a few settings involving normally distributed variables. Exploring and implementing different differential entropy estimators beyond the Kozachenko–Leonenko entropy estimator may be a future research direction that could potentially reveal more insights into the performance of Mutual Information Maximization.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/e27090985/s1, Table S1: Mean AUPRC values with 95% confidence intervals obtained for multivariate Normal distributions in equal allocation settings. Table S2: Mean AUPRC values with 95% confidence intervals obtained for Gamma distributions in equal allocation settings. Table S3: Mean AUPRC values with 95% confidence intervals obtained for Beta distributions in equal allocation settings. Table S4: Mean MCC values with 95% confidence intervals obtained for multivariate Normal distributions in equal allocation settings. Table S5: Mean MCC values with 95% confidence intervals obtained for Gamma distributions in equal allocation settings. Table S6: Mean MCC values with 95% confidence intervals obtained for Beta distributions in equal allocation settings. Table S7: Mean AUC values with 95% confidence intervals obtained for multivariate Normal distributions in unequal allocation settings. Table S8: Mean AUC values with 95% confidence intervals obtained for Gamma distributions in unequal allocation settings. Table S9: Mean AUC values with 95% confidence intervals obtained for Beta distributions in unequal allocation settings. Table S10: Mean AUPRC values with 95% confidence intervals obtained for multivariate Normal distributions in unequal allocation settings. Table S11: Mean AUPRC values with 95% confidence intervals obtained for Gamma distributions in unequal allocation settings. Table S12: Mean AUPRC values with 95% confidence intervals obtained for Beta distributions in unequal allocation settings. Table S13: Mean MCC values with 95% confidence intervals obtained for multivariate Normal distributions in unequal allocation settings. Table S14: Mean MCC values with 95% confidence intervals obtained for Gamma distributions in unequal allocation settings. Table S15: Mean MCC values with 95% confidence intervals obtained for Beta distributions in unequal allocation settings.

Author Contributions

Software, M.S.İ.; Validation, M.S.İ.; Formal Analysis, M.S.İ.; Resources, M.S.İ.; Writing—Original Draft, M.S.İ.; Writing—Review & Editing, M.S.İ.; Visualization, M.S.İ.; Supervision, P.Ö. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Acknowledgments

We thank Yasemin Öztürk for her help in building this manuscript. We would also like to thank the anonymous reviewers for their insightful comments that led to significant contributions to the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ROC: Receiver Operating Characteristic
AUC: Area under the ROC curve
AUPRC: Area under the precision–recall curve
MCC: Matthews Correlation Coefficient
MaxMutInf: Maximum Mutual Information
MaxEnt: Maximum Entropy
MinCrEnt: Minimum Cross Entropy
MinRelEnt: Minimum Relative Entropy
LogRes: Logistic Regression

References

1. Yan, L.; Tian, L.; Liu, S. Combining large number of weak biomarkers based on AUC. Stat. Med. 2015, 34, 3811–3830.
2. Su, J.Q.; Liu, J.S. Linear Combinations of Multiple Diagnostic Markers. J. Am. Stat. Assoc. 1993, 88, 1350–1355.
3. Liu, A.; Schisterman, E.F.; Zhu, Y. On linear combinations of biomarkers to improve diagnostic accuracy. Stat. Med. 2005, 24, 37–47.
4. Pepe, M.S.; Thompson, M.L. Combining diagnostic test results to increase accuracy. Biostatistics 2000, 1, 123–140.
5. Pepe, M.S.; Cai, T.; Longton, G. Combining Predictors for Classification Using the Area under the Receiver Operating Characteristic Curve. Biometrics 2006, 62, 221–229.
6. Liu, C.; Liu, A.; Halabi, S. A min–max combination of biomarkers to improve diagnostic accuracy. Stat. Med. 2011, 30, 2005–2014.
7. Kang, L.; Xiong, C.; Crane, P.; Tian, L. Linear combinations of biomarkers to improve diagnostic accuracy with three ordinal diagnostic categories. Stat. Med. 2013, 32, 631–643.
8. Fong, Y.; Yin, S.; Huang, Y. Combining biomarkers linearly and nonlinearly for classification using the area under the ROC curve. Stat. Med. 2016, 35, 3792–3809.
9. Lloyd, C.J. Using Smoothed Receiver Operating Characteristic Curves to Summarize and Compare Diagnostic Systems. J. Am. Stat. Assoc. 1998, 93, 1356–1364.
10. Das, P.; De, D.; Maiti, R.; Kamal, M.; Hutcheson, K.A.; Fuller, C.D.; Chakraborty, B.; Peterson, C.B. Estimating the optimal linear combination of predictors using spherically constrained optimization. BMC Bioinform. 2022, 23, 436.
11. Muhammad, N.; Coolen-Maturi, T.; Coolen, F.P. Nonparametric predictive inference with parametric copulas for combining bivariate diagnostic tests. Stat. Optim. Inf. Comput. 2018, 6, 398–408.
12. Islam, S.; Anand, S.; Hamid, J.; Thabane, L.; Beyene, J. A copula-based method of classifying individuals into binary disease categories using dependent biomarkers. Stat. Methods Appl. 2020, 29, 871–897.
13. Jaynes, E.T. Information Theory and Statistical Mechanics. Phys. Rev. 1957, 106, 620–630.
14. Shore, J.; Johnson, R. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans. Inf. Theory 1980, 26, 26–37.
15. Golan, A.; Judge, G.; Perloff, J.M. A Maximum Entropy Approach to Recovering Information from Multinomial Response Data. J. Am. Stat. Assoc. 1996, 91, 841–853.
16. Banavar, J.; Maritan, A. The maximum relative entropy principle. arXiv 2007, arXiv:cond-mat/0703622.
17. Faivishevsky, L.; Goldberger, J. Dimensionality reduction based on non-parametric mutual information. Neurocomputing 2012, 80, 31–37.
18. Wolberg, W.; Street, W.; Mangasarian, O. Breast Cancer Wisconsin (Prognostic). UCI Machine Learning Repository, 1995. Available online: https://archive.ics.uci.edu/dataset/16/breast+cancer+wisconsin+prognostic (accessed on 17 April 2025).
19. Wolberg, W.; Mangasarian, O.; Street, N.; Street, W. Breast Cancer Wisconsin (Diagnostic). UCI Machine Learning Repository, 1993. Available online: https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic (accessed on 17 April 2025).
Table 1. Mean AUC values with 95% confidence intervals obtained for multivariate Normal distributions. Each method cell shows mean (95% CI) / median.

| n | d | Covariance | LogRes | MaxMutInf | MaxEnt | MinRelEnt | MinCrEnt |
|---|---|---|---|---|---|---|---|
| 25 | 3 | Equal | 0.615 (0.545, 0.673) / 0.597 | 0.617 (0.547, 0.669) / 0.603 | 0.615 (0.545, 0.673) / 0.597 | 0.615 (0.545, 0.673) / 0.597 | 0.615 (0.545, 0.673) / 0.597 |
| 25 | 5 | Equal | 0.619 (0.551, 0.673) / 0.604 | 0.625 (0.557, 0.686) / 0.609 | 0.62 (0.552, 0.673) / 0.608 | 0.62 (0.552, 0.673) / 0.608 | 0.62 (0.552, 0.673) / 0.607 |
| 25 | 10 | Equal | 0.613 (0.545, 0.662) / 0.597 | 0.634 (0.555, 0.693) / 0.622 | 0.612 (0.545, 0.667) / 0.597 | 0.612 (0.545, 0.667) / 0.597 | 0.61 (0.545, 0.662) / 0.591 |
| 25 | 15 | Equal | 0.614 (0.551, 0.662) / 0.597 | 0.634 (0.558, 0.691) / 0.622 | 0.612 (0.545, 0.667) / 0.596 | 0.612 (0.545, 0.667) / 0.596 | 0.611 (0.545, 0.66) / 0.596 |
| 100 | 3 | Equal | 0.599 (0.555, 0.639) / 0.598 | 0.585 (0.546, 0.619) / 0.580 | 0.599 (0.556, 0.638) / 0.599 | 0.599 (0.556, 0.638) / 0.599 | 0.599 (0.556, 0.638) / 0.599 |
| 100 | 5 | Equal | 0.601 (0.556, 0.642) / 0.596 | 0.593 (0.55, 0.631) / 0.591 | 0.601 (0.557, 0.643) / 0.597 | 0.601 (0.557, 0.643) / 0.597 | 0.601 (0.557, 0.643) / 0.597 |
| 100 | 10 | Equal | 0.612 (0.571, 0.652) / 0.610 | 0.608 (0.569, 0.646) / 0.606 | 0.612 (0.572, 0.653) / 0.612 | 0.612 (0.572, 0.653) / 0.612 | 0.612 (0.572, 0.653) / 0.612 |
| 100 | 15 | Equal | 0.617 (0.575, 0.658) / 0.617 | 0.616 (0.575, 0.653) / 0.615 | 0.618 (0.575, 0.657) / 0.617 | 0.618 (0.575, 0.657) / 0.617 | 0.618 (0.575, 0.657) / 0.617 |
| 200 | 3 | Equal | 0.607 (0.577, 0.636) / 0.608 | 0.582 (0.547, 0.612) / 0.580 | 0.607 (0.578, 0.636) / 0.607 | 0.607 (0.578, 0.636) / 0.607 | 0.607 (0.578, 0.636) / 0.607 |
| 200 | 5 | Equal | 0.614 (0.587, 0.642) / 0.614 | 0.592 (0.562, 0.62) / 0.592 | 0.615 (0.587, 0.642) / 0.615 | 0.615 (0.587, 0.642) / 0.615 | 0.615 (0.587, 0.642) / 0.615 |
| 200 | 10 | Equal | 0.633 (0.605, 0.664) / 0.635 | 0.61 (0.582, 0.638) / 0.611 | 0.634 (0.605, 0.664) / 0.636 | 0.634 (0.605, 0.664) / 0.636 | 0.634 (0.605, 0.664) / 0.636 |
| 200 | 15 | Equal | 0.641 (0.612, 0.673) / 0.643 | 0.616 (0.588, 0.646) / 0.616 | 0.642 (0.613, 0.673) / 0.644 | 0.642 (0.613, 0.673) / 0.644 | 0.642 (0.613, 0.673) / 0.644 |
| 25 | 3 | Not Equal | 0.615 (0.545, 0.669) / 0.597 | 0.617 (0.547, 0.673) / 0.603 | 0.614 (0.545, 0.673) / 0.597 | 0.614 (0.545, 0.673) / 0.597 | 0.614 (0.545, 0.673) / 0.597 |
| 25 | 5 | Not Equal | 0.62 (0.551, 0.673) / 0.604 | 0.626 (0.558, 0.687) / 0.610 | 0.621 (0.551, 0.673) / 0.609 | 0.621 (0.551, 0.673) / 0.609 | 0.621 (0.551, 0.673) / 0.609 |
| 25 | 10 | Not Equal | 0.615 (0.549, 0.667) / 0.603 | 0.637 (0.564, 0.7) / 0.623 | 0.614 (0.547, 0.667) / 0.603 | 0.614 (0.547, 0.667) / 0.603 | 0.612 (0.545, 0.667) / 0.596 |
| 25 | 15 | Not Equal | 0.615 (0.551, 0.667) / 0.603 | 0.638 (0.564, 0.699) / 0.623 | 0.613 (0.547, 0.667) / 0.597 | 0.613 (0.547, 0.667) / 0.597 | 0.611 (0.547, 0.662) / 0.596 |
| 100 | 3 | Not Equal | 0.599 (0.556, 0.638) / 0.600 | 0.586 (0.546, 0.621) / 0.582 | 0.599 (0.556, 0.639) / 0.599 | 0.599 (0.556, 0.639) / 0.599 | 0.599 (0.556, 0.639) / 0.599 |
| 100 | 5 | Not Equal | 0.602 (0.559, 0.643) / 0.599 | 0.595 (0.551, 0.633) / 0.593 | 0.602 (0.558, 0.644) / 0.600 | 0.602 (0.558, 0.644) / 0.600 | 0.602 (0.558, 0.644) / 0.600 |
| 100 | 10 | Not Equal | 0.614 (0.574, 0.653) / 0.614 | 0.613 (0.574, 0.65) / 0.611 | 0.615 (0.575, 0.655) / 0.615 | 0.615 (0.575, 0.655) / 0.615 | 0.615 (0.575, 0.655) / 0.615 |
| 100 | 15 | Not Equal | 0.62 (0.578, 0.659) / 0.620 | 0.621 (0.582, 0.658) / 0.621 | 0.621 (0.579, 0.66) / 0.622 | 0.621 (0.579, 0.66) / 0.622 | 0.621 (0.579, 0.66) / 0.622 |
| 200 | 3 | Not Equal | 0.607 (0.578, 0.635) / 0.607 | 0.583 (0.548, 0.614) / 0.581 | 0.607 (0.579, 0.636) / 0.607 | 0.607 (0.579, 0.636) / 0.607 | 0.607 (0.579, 0.636) / 0.607 |
| 200 | 5 | Not Equal | 0.616 (0.589, 0.644) / 0.615 | 0.594 (0.564, 0.623) / 0.594 | 0.616 (0.589, 0.643) / 0.615 | 0.616 (0.589, 0.643) / 0.615 | 0.616 (0.589, 0.643) / 0.615 |
| 200 | 10 | Not Equal | 0.636 (0.608, 0.667) / 0.637 | 0.615 (0.586, 0.643) / 0.617 | 0.637 (0.61, 0.666) / 0.638 | 0.637 (0.61, 0.666) / 0.638 | 0.637 (0.61, 0.666) / 0.638 |
| 200 | 15 | Not Equal | 0.644 (0.616, 0.676) / 0.646 | 0.623 (0.594, 0.653) / 0.622 | 0.645 (0.616, 0.676) / 0.647 | 0.645 (0.616, 0.676) / 0.647 | 0.645 (0.616, 0.676) / 0.647 |

Note: Statistics presented in this table were derived from test datasets described in Section 4.
Table 2. Mean AUC values with 95% confidence intervals obtained for Gamma distributions. Each method cell shows mean (95% CI) / median.

| n | d | Covariance | LogRes | MaxMutInf | MaxEnt | MinRelEnt | MinCrEnt |
|---|---|---|---|---|---|---|---|
| 25 | 3 | Equal | 0.613 (0.545, 0.66) / 0.600 | 0.63 (0.564, 0.682) / 0.620 | 0.613 (0.545, 0.661) / 0.603 | 0.613 (0.545, 0.661) / 0.603 | 0.613 (0.545, 0.661) / 0.603 |
| 25 | 5 | Equal | 0.613 (0.551, 0.66) / 0.596 | 0.636 (0.558, 0.695) / 0.627 | 0.612 (0.545, 0.662) / 0.597 | 0.612 (0.545, 0.662) / 0.597 | 0.612 (0.545, 0.662) / 0.597 |
| 25 | 10 | Equal | 0.601 (0.54, 0.643) / 0.584 | 0.641 (0.565, 0.705) / 0.635 | 0.604 (0.542, 0.647) / 0.590 | 0.604 (0.542, 0.647) / 0.590 | 0.602 (0.543, 0.643) / 0.590 |
| 25 | 15 | Equal | 0.606 (0.544, 0.654) / 0.590 | 0.649 (0.571, 0.712) / 0.643 | 0.606 (0.545, 0.649) / 0.590 | 0.606 (0.545, 0.649) / 0.590 | 0.604 (0.545, 0.649) / 0.584 |
| 100 | 3 | Equal | 0.593 (0.55, 0.631) / 0.592 | 0.603 (0.563, 0.64) / 0.602 | 0.594 (0.549, 0.632) / 0.592 | 0.594 (0.549, 0.632) / 0.592 | 0.594 (0.549, 0.632) / 0.592 |
| 100 | 5 | Equal | 0.596 (0.555, 0.633) / 0.592 | 0.617 (0.581, 0.654) / 0.616 | 0.597 (0.555, 0.633) / 0.593 | 0.597 (0.555, 0.633) / 0.593 | 0.597 (0.555, 0.633) / 0.593 |
| 100 | 10 | Equal | 0.596 (0.553, 0.632) / 0.594 | 0.629 (0.591, 0.667) / 0.629 | 0.597 (0.554, 0.633) / 0.597 | 0.597 (0.554, 0.633) / 0.597 | 0.597 (0.554, 0.633) / 0.597 |
| 100 | 15 | Equal | 0.591 (0.55, 0.628) / 0.589 | 0.639 (0.602, 0.678) / 0.639 | 0.593 (0.552, 0.629) / 0.590 | 0.593 (0.552, 0.629) / 0.590 | 0.593 (0.552, 0.629) / 0.590 |
| 200 | 3 | Equal | 0.602 (0.574, 0.629) / 0.604 | 0.602 (0.574, 0.63) / 0.604 | 0.602 (0.576, 0.629) / 0.605 | 0.602 (0.576, 0.629) / 0.605 | 0.602 (0.576, 0.629) / 0.605 |
| 200 | 5 | Equal | 0.608 (0.58, 0.637) / 0.608 | 0.614 (0.586, 0.644) / 0.614 | 0.608 (0.58, 0.638) / 0.609 | 0.608 (0.58, 0.638) / 0.609 | 0.608 (0.58, 0.638) / 0.609 |
| 200 | 10 | Equal | 0.612 (0.583, 0.642) / 0.614 | 0.63 (0.604, 0.657) / 0.631 | 0.613 (0.584, 0.643) / 0.614 | 0.613 (0.584, 0.643) / 0.614 | 0.613 (0.584, 0.643) / 0.614 |
| 200 | 15 | Equal | 0.616 (0.586, 0.646) / 0.616 | 0.642 (0.615, 0.669) / 0.643 | 0.617 (0.587, 0.646) / 0.616 | 0.617 (0.587, 0.646) / 0.616 | 0.617 (0.587, 0.646) / 0.616 |
| 25 | 3 | Not Equal | 0.613 (0.545, 0.662) / 0.603 | 0.63 (0.564, 0.682) / 0.622 | 0.614 (0.545, 0.667) / 0.604 | 0.614 (0.545, 0.667) / 0.604 | 0.614 (0.545, 0.667) / 0.604 |
| 25 | 5 | Not Equal | 0.613 (0.551, 0.663) / 0.597 | 0.636 (0.558, 0.695) / 0.627 | 0.612 (0.545, 0.662) / 0.597 | 0.612 (0.545, 0.662) / 0.597 | 0.613 (0.545, 0.662) / 0.597 |
| 25 | 10 | Not Equal | 0.601 (0.54, 0.647) / 0.584 | 0.643 (0.567, 0.707) / 0.636 | 0.604 (0.545, 0.647) / 0.590 | 0.604 (0.545, 0.647) / 0.590 | 0.603 (0.545, 0.647) / 0.590 |
| 25 | 15 | Not Equal | 0.607 (0.539, 0.654) / 0.591 | 0.653 (0.577, 0.718) / 0.647 | 0.606 (0.54, 0.649) / 0.590 | 0.606 (0.54, 0.649) / 0.590 | 0.604 (0.54, 0.65) / 0.590 |
| 100 | 3 | Not Equal | 0.594 (0.551, 0.632) / 0.593 | 0.603 (0.563, 0.641) / 0.602 | 0.594 (0.552, 0.633) / 0.593 | 0.594 (0.552, 0.633) / 0.593 | 0.594 (0.552, 0.633) / 0.593 |
| 100 | 5 | Not Equal | 0.599 (0.558, 0.636) / 0.596 | 0.619 (0.581, 0.656) / 0.617 | 0.599 (0.557, 0.637) / 0.597 | 0.599 (0.557, 0.637) / 0.597 | 0.599 (0.557, 0.637) / 0.597 |
| 100 | 10 | Not Equal | 0.6 (0.557, 0.638) / 0.600 | 0.632 (0.593, 0.669) / 0.631 | 0.601 (0.558, 0.639) / 0.601 | 0.601 (0.558, 0.639) / 0.601 | 0.601 (0.558, 0.639) / 0.601 |
| 100 | 15 | Not Equal | 0.596 (0.556, 0.633) / 0.594 | 0.643 (0.607, 0.681) / 0.643 | 0.597 (0.557, 0.635) / 0.595 | 0.597 (0.557, 0.635) / 0.595 | 0.597 (0.557, 0.635) / 0.595 |
| 200 | 3 | Not Equal | 0.603 (0.576, 0.63) / 0.606 | 0.603 (0.575, 0.631) / 0.604 | 0.603 (0.577, 0.63) / 0.606 | 0.603 (0.577, 0.63) / 0.606 | 0.603 (0.577, 0.63) / 0.606 |
| 200 | 5 | Not Equal | 0.61 (0.583, 0.639) / 0.611 | 0.615 (0.587, 0.645) / 0.615 | 0.611 (0.583, 0.64) / 0.611 | 0.611 (0.583, 0.64) / 0.611 | 0.611 (0.583, 0.64) / 0.611 |
| 200 | 10 | Not Equal | 0.617 (0.587, 0.646) / 0.618 | 0.633 (0.606, 0.66) / 0.633 | 0.617 (0.588, 0.647) / 0.619 | 0.617 (0.588, 0.647) / 0.619 | 0.617 (0.588, 0.647) / 0.619 |
| 200 | 15 | Not Equal | 0.621 (0.591, 0.651) / 0.622 | 0.646 (0.619, 0.672) / 0.646 | 0.622 (0.592, 0.652) / 0.622 | 0.622 (0.592, 0.652) / 0.622 | 0.622 (0.592, 0.652) / 0.622 |

Note: Statistics presented in this table were derived from test datasets described in Section 4.
Table 3. Mean AUC values with 95% confidence intervals obtained for Beta distributions. Each method cell shows mean (95% CI) / median.

| n | d | Covariance | LogRes | MaxMutInf | MaxEnt | MinRelEnt | MinCrEnt |
|---|---|---|---|---|---|---|---|
| 25 | 3 | Equal | 0.631 (0.558, 0.688) / 0.617 | 0.648 (0.571, 0.714) / 0.641 | 0.63 (0.558, 0.692) / 0.617 | 0.63 (0.558, 0.692) / 0.617 | 0.63 (0.558, 0.692) / 0.617 |
| 25 | 5 | Equal | 0.622 (0.552, 0.679) / 0.610 | 0.656 (0.578, 0.723) / 0.649 | 0.623 (0.552, 0.68) / 0.610 | 0.623 (0.552, 0.68) / 0.610 | 0.623 (0.552, 0.68) / 0.610 |
| 25 | 10 | Equal | 0.614 (0.549, 0.667) / 0.597 | 0.669 (0.59, 0.737) / 0.667 | 0.616 (0.551, 0.673) / 0.604 | 0.616 (0.551, 0.673) / 0.604 | 0.615 (0.545, 0.667) / 0.604 |
| 25 | 15 | Equal | 0.611 (0.546, 0.662) / 0.593 | 0.679 (0.604, 0.75) / 0.679 | 0.614 (0.545, 0.667) / 0.600 | 0.614 (0.545, 0.667) / 0.600 | 0.61 (0.545, 0.667) / 0.596 |
| 100 | 3 | Equal | 0.626 (0.583, 0.665) / 0.628 | 0.63 (0.592, 0.668) / 0.630 | 0.626 (0.583, 0.666) / 0.629 | 0.626 (0.583, 0.666) / 0.629 | 0.626 (0.583, 0.666) / 0.629 |
| 100 | 5 | Equal | 0.634 (0.595, 0.673) / 0.636 | 0.646 (0.609, 0.685) / 0.646 | 0.634 (0.595, 0.674) / 0.637 | 0.634 (0.595, 0.674) / 0.637 | 0.634 (0.595, 0.674) / 0.637 |
| 100 | 10 | Equal | 0.636 (0.599, 0.673) / 0.637 | 0.663 (0.627, 0.7) / 0.664 | 0.637 (0.599, 0.674) / 0.638 | 0.637 (0.599, 0.674) / 0.638 | 0.637 (0.599, 0.674) / 0.638 |
| 100 | 15 | Equal | 0.635 (0.595, 0.678) / 0.635 | 0.676 (0.64, 0.714) / 0.677 | 0.636 (0.595, 0.678) / 0.636 | 0.636 (0.595, 0.678) / 0.636 | 0.636 (0.595, 0.678) / 0.636 |
| 200 | 3 | Equal | 0.638 (0.611, 0.665) / 0.640 | 0.631 (0.604, 0.661) / 0.633 | 0.638 (0.612, 0.665) / 0.640 | 0.638 (0.612, 0.665) / 0.640 | 0.638 (0.612, 0.665) / 0.640 |
| 200 | 5 | Equal | 0.648 (0.621, 0.676) / 0.649 | 0.645 (0.617, 0.675) / 0.646 | 0.648 (0.621, 0.677) / 0.650 | 0.648 (0.621, 0.677) / 0.650 | 0.648 (0.621, 0.677) / 0.650 |
| 200 | 10 | Equal | 0.658 (0.634, 0.684) / 0.658 | 0.664 (0.638, 0.691) / 0.664 | 0.659 (0.633, 0.684) / 0.659 | 0.659 (0.633, 0.684) / 0.659 | 0.659 (0.633, 0.684) / 0.659 |
| 200 | 15 | Equal | 0.665 (0.638, 0.693) / 0.666 | 0.679 (0.655, 0.705) / 0.680 | 0.666 (0.638, 0.694) / 0.666 | 0.666 (0.638, 0.694) / 0.666 | 0.666 (0.638, 0.694) / 0.666 |
| 25 | 3 | Not Equal | 0.632 (0.563, 0.692) / 0.620 | 0.648 (0.571, 0.714) / 0.643 | 0.631 (0.564, 0.691) / 0.620 | 0.631 (0.564, 0.691) / 0.620 | 0.631 (0.564, 0.691) / 0.620 |
| 25 | 5 | Not Equal | 0.624 (0.551, 0.68) / 0.615 | 0.658 (0.578, 0.725) / 0.651 | 0.625 (0.552, 0.682) / 0.613 | 0.625 (0.552, 0.682) / 0.613 | 0.625 (0.552, 0.682) / 0.613 |
| 25 | 10 | Not Equal | 0.616 (0.551, 0.667) / 0.600 | 0.672 (0.593, 0.74) / 0.667 | 0.618 (0.551, 0.673) / 0.604 | 0.618 (0.551, 0.673) / 0.604 | 0.617 (0.551, 0.669) / 0.607 |
| 25 | 15 | Not Equal | 0.614 (0.547, 0.667) / 0.597 | 0.686 (0.61, 0.76) / 0.686 | 0.616 (0.545, 0.669) / 0.603 | 0.616 (0.545, 0.669) / 0.603 | 0.612 (0.545, 0.667) / 0.597 |
| 100 | 3 | Not Equal | 0.628 (0.585, 0.667) / 0.629 | 0.631 (0.593, 0.669) / 0.630 | 0.628 (0.585, 0.669) / 0.630 | 0.628 (0.585, 0.669) / 0.630 | 0.628 (0.585, 0.669) / 0.630 |
| 100 | 5 | Not Equal | 0.638 (0.599, 0.677) / 0.641 | 0.649 (0.612, 0.688) / 0.648 | 0.638 (0.599, 0.678) / 0.641 | 0.638 (0.599, 0.678) / 0.641 | 0.638 (0.599, 0.678) / 0.641 |
| 100 | 10 | Not Equal | 0.642 (0.605, 0.681) / 0.644 | 0.668 (0.631, 0.705) / 0.669 | 0.643 (0.606, 0.682) / 0.645 | 0.643 (0.606, 0.682) / 0.645 | 0.643 (0.606, 0.682) / 0.645 |
| 100 | 15 | Not Equal | 0.643 (0.605, 0.687) / 0.643 | 0.682 (0.647, 0.72) / 0.684 | 0.643 (0.604, 0.687) / 0.644 | 0.643 (0.604, 0.687) / 0.644 | 0.643 (0.604, 0.687) / 0.644 |
| 200 | 3 | Not Equal | 0.64 (0.613, 0.667) / 0.642 | 0.632 (0.605, 0.662) / 0.634 | 0.64 (0.613, 0.667) / 0.642 | 0.64 (0.613, 0.667) / 0.642 | 0.64 (0.613, 0.667) / 0.642 |
| 200 | 5 | Not Equal | 0.652 (0.624, 0.68) / 0.653 | 0.647 (0.618, 0.676) / 0.648 | 0.652 (0.624, 0.68) / 0.653 | 0.652 (0.624, 0.68) / 0.653 | 0.652 (0.624, 0.68) / 0.653 |
| 200 | 10 | Not Equal | 0.665 (0.64, 0.692) / 0.665 | 0.67 (0.643, 0.695) / 0.670 | 0.665 (0.64, 0.692) / 0.665 | 0.665 (0.64, 0.692) / 0.665 | 0.665 (0.64, 0.692) / 0.665 |
| 200 | 15 | Not Equal | 0.673 (0.647, 0.701) / 0.674 | 0.685 (0.66, 0.711) / 0.687 | 0.674 (0.648, 0.702) / 0.674 | 0.674 (0.648, 0.702) / 0.674 | 0.674 (0.648, 0.702) / 0.674 |

Note: Statistics presented in this table were derived from test datasets described in Section 4.
Table 4. AUC values for the first five predictors in each dataset.

| Variable | Prognostic | Diagnostic |
|---|---|---|
| radius_mean | 0.611 | 0.938 |
| texture_mean | 0.535 | 0.776 |
| perimeter_mean | 0.613 | 0.947 |
| area_mean | 0.618 | 0.938 |
| smoothness_mean | 0.532 | 0.722 |
Table 5. Coefficients obtained for the prognostic dataset.

| Prognostic | LogRes | MaxMutInf | MaxEnt | MinRelEnt | MinCrEnt |
|---|---|---|---|---|---|
| radius_mean | −0.609209 | 0.537218 | −2.03368 | −2.03368 | −2.03368 |
| texture_mean | −0.0518859 | 0.995454 | −0.159585 | −0.159585 | −0.159585 |
| perimeter_mean | 4.83955 × 10⁻⁵ | 0.0269349 | −0.0419467 | −0.0419467 | −0.0419467 |
| area_mean | 0.00667978 | 0.0693375 | 2.45539 | 2.45539 | 2.45539 |
| smoothness_mean | 2.93637 | 0.115625 | 0.0284566 | 0.0284566 | 0.0284566 |
Table 6. Coefficients obtained for the diagnostic dataset.

| Diagnostic | LogRes | MaxMutInf | MaxEnt | MinRelEnt | MinCrEnt |
|---|---|---|---|---|---|
| radius_mean | −6.27525 | 0.538316 | −1.98606 | −1.98606 | −17.6992 |
| texture_mean | 0.3641 | 0.997358 | 1.69841 | 1.69841 | 1.57074 |
| perimeter_mean | 0.607157 | 0.0275477 | 7.14262 | 7.14262 | 13.7528 |
| area_mean | 0.0417776 | 0.0702387 | −0.053743 | −0.053743 | 10.5983 |
| smoothness_mean | 118.462 | 0.115856 | 1.74935 | 1.74935 | 1.67574 |
Table 7. Predictive ability of calculated linear combinations for prognostic data.

| Metric | LogRes | MaxMutInf | MaxEnt | MinRelEnt | MinCrEnt |
|---|---|---|---|---|---|
| AUC | 0.626 | 0.610 | 0.618 | 0.618 | 0.618 |
| AUPRC | 0.388 | 0.332 | 0.340 | 0.340 | 0.340 |
| MCC | 0.194 | 0.165 | 0.0203 | 0.0203 | 0.0203 |
Table 8. Predictive ability of calculated linear combinations for diagnostic data.

| Metric | LogRes | MaxMutInf | MaxEnt | MinRelEnt | MinCrEnt |
|---|---|---|---|---|---|
| AUC | 0.984 | 0.951 | 0.953 | 0.953 | 0.940 |
| AUPRC | 0.973 | 0.935 | 0.936 | 0.936 | 0.921 |
| MCC | 0.853 | 0.577 | 0.603 | 0.603 | 0.293 |