Abstract
Identifying an optimal linear combination of continuous variables is a key objective in various fields of research, such as medicine. This manuscript explores information-theoretical approaches for establishing such linear combinations. Coefficients obtained from logistic regression can be used to construct a linear combination, and this approach has commonly been adopted in the literature for comparison purposes. The main contribution of this work is to propose novel ways of determining linear combination coefficients by optimizing information-theoretical objective functions. Biomarkers are usually continuous measurements used to diagnose whether a patient has an underlying disease. Certain disease contexts may lack biomarkers with high diagnostic power, making their optimal combination a critical area of interest. We apply the above-mentioned novel methods to the problem of combining biomarkers. We assess the performance of our proposed methods against combinations derived from logistic regression coefficients by comparing area under the ROC curve (AUC) values and other metrics in a broad simulation study and a real-life data application.
MSC:
62B10; 94A15
1. Introduction
Information-theoretical concepts are inherently and deeply linked to statistical principles. These concepts play a crucial role in data analysis and modeling. The rich set of tools information theory offers focuses on quantifying uncertainty in a random variable, divergence, dependence, and information gain.
They facilitate the understanding of how data can be utilized more efficiently, which variables provide greater informational value, and how uncertainty within the data can be quantified. They are used in areas like model selection, estimation, hypothesis testing, learning, and decision-making.
By integrating information-theoretic objectives into statistical inference and learning, researchers can develop models that are both interpretable and data-efficient, making these methods especially valuable in high-dimensional and data-scarce domains such as bioinformatics, neuroscience, and medical diagnostics.
To name a few, tools and notions from information theory such as Shannon entropy, Kullback–Leibler divergence, and mutual information provide rigorous means for assessing the complexity and informativeness of models and data. These measures are central to a wide range of statistical applications, including model selection (e.g., through criteria like Akaike's Information Criterion (AIC) and the Minimum Description Length (MDL)), feature selection, clustering, density estimation, and hypothesis testing.
Additionally, methods based on the principle of maximum entropy allow for the estimation of probability distributions under partial information, yielding minimally biased solutions constrained only by known data characteristics. Cross-entropy and information gain are widely used in classification tasks and decision tree algorithms, respectively, to optimize predictive performance.
To define the core problem addressed in this manuscript, consider a given matrix $X$ of dimension $n \times d$ with continuous individual predictors as its columns and a given binary outcome vector $y$ of dimension $n$. Our aim is to find a coefficient vector $w$ such that the linear combination $Xw$ would be a better predictor of $y$ than each individual column of $X$.
It is well known that logistic regression maximizes the logarithm of the likelihood function
$$\ell(w) = \sum_{i=1}^{n} \left[ y_i \log \sigma(x_i^{\top} w) + (1 - y_i) \log\left(1 - \sigma(x_i^{\top} w)\right) \right]$$
to estimate the coefficient vector $w$, where $\sigma(t) = 1/(1 + e^{-t})$ is the sigmoid function, $n$ is the number of samples, and $x_i$ is the $i$-th row of $X$. After estimating the coefficient vector $\hat{w}$ from logistic regression, one can easily select $\hat{w}$ as a candidate coefficient vector to be used for the linear combination of individual predictors.
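For concreteness, a minimal Julia sketch of this baseline, assuming a predictor matrix `X` and a 0/1 outcome vector `y` and using the matrix interface of GLM.jl (the package the simulations in Section 4 rely on; variable names are ours):

```julia
using GLM

# Fit logistic regression on the raw predictor matrix and take the fitted
# coefficients as the candidate combination vector w; Xw is then the
# combined predictor to be evaluated (e.g., by AUC).
model = glm(X, y, Binomial(), LogitLink())
w = coef(model)
combined = X * w
```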
The novel approaches we exhibit in the following pages utilize information-theoretical principles for the determination of the coefficient vector $w$. Section 2 reviews previous methods used for linear combinations of biomarkers, all of which are non-information-theoretic in nature. Section 3 details our proposed methods. Section 4 is devoted to the details and results of the simulation, and Section 5 showcases an application to real-world data.
2. Previous Methods Used in Linear Combinations of Biomarkers
A biomarker is typically employed as a diagnostic or evaluative tool in medical research. Identifying a single ideal biomarker that offers both high sensitivity and specificity is challenging, particularly when high specificity is needed for population screening. An effective alternative is the combination of multiple biomarkers, which can deliver superior performance compared to using just one biomarker.
Especially in cases where there is more than one weak biomarker, it is important to use them in combination [1]. Combining biomarkers has several advantages such as increased diagnostic accuracy, better prediction of disease course and prognosis, and the development of personalized medical applications.
Numerous studies in the literature employ ROC (Receiver Operating Characteristic) analysis to assess the performance of combined biomarkers developed through various methodologies. Those addressing binary classification focused on maximizing various non-parametric estimates of the area under the ROC curve (AUC) or the Mann–Whitney U statistic to derive the optimal linear combination of single biomarkers.
AUC is an essential performance indicator of binary models. AUC values show the model’s ability to distinguish between two classes of the binary outcome (discriminative ability). High AUC values indicate better and more accurate discrimination.
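Since the empirical AUC equals the normalized Mann–Whitney U statistic, it can be computed directly from scores and labels; a minimal Julia sketch (the function name is ours):

```julia
# Empirical AUC of scores s against 0/1 labels y: the fraction of
# (positive, negative) pairs in which the positive score is higher,
# with ties counted as 1/2 (the normalized Mann–Whitney U statistic).
function empirical_auc(s::AbstractVector, y::AbstractVector)
    pos = s[y .== 1]
    neg = s[y .== 0]
    total = sum((a > b) + 0.5 * (a == b) for a in pos, b in neg)
    return total / (length(pos) * length(neg))
end
```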
The authors in [2] used Fisher’s discriminant functions under multivariate normality and (non)proportional covariance settings to produce a linear combination which maximizes AUC. Later, the authors in [3] enhanced and investigated some other properties of these linear combinations and proposed alternative combinations.
The authors in [4] proposed a distribution-free, rank-based approach for deriving linear combinations of two biomarkers that maximize the area or partial area under the empirical ROC curve. They compared its performance to combinations obtained by optimizing the logistic likelihood function and by linear discriminant analysis. They claimed that linear discriminant analysis optimizes the AUC when multivariate normality holds. In another publication [5], they provided further insights, such as that selecting the empirical AUC as an objective function may yield better performance than selecting the logistic likelihood function.
The nonparametric Min–Max procedure proposed in [6] is claimed to be more robust to distributional assumptions, easier to compute, and better performing. The methodology proposed in [7] extended combination approaches to the problem where the outcome is not dichotomous but ordinal with more than two categories. They proposed two new methods and compared their performances with existing methods.
In the study [1], the authors compared existing methods in settings where the number of biomarkers is large, the biomarkers are weak, and the number of observations is not an order of magnitude greater than the number of biomarkers. They also proposed a new method for combinations.
Underlining certain inadequacies of present combination methods, the authors in [8] proposed a new kernel-based AUC optimization method and claimed that it outperformed the smoothed AUC method previously proposed in [9].
The authors in [10] proposed a derivative-free black-box optimization technique, called the Spherically Constrained Optimization Routine (SCOR), to identify optimal linear combinations where the outcome is ordinal. The method proposed in [11], Nonparametric Predictive Inference (NPI), examines the best linear combination of two biomarkers, where the dependency between the two biomarkers is modeled using parametric copulas. Another copula-based combination method in [12] utilized different copulas for an optimal linear combination of two biomarkers with a binary outcome.
It is important to note that none of the preceding methods employed information-theoretical concepts in their methodologies.
3. Information Theoretical Methods for Linear Combinations
This section details maximum entropy, minimum cross entropy, minimum relative entropy, and maximum mutual information concepts.
3.1. Entropy Maximization (MaxEnt)
Having historical roots in physics, the maximum entropy principle is an approach to finding the distribution that maximizes the entropy of a probability distribution under certain constraints. It was proposed by Jaynes [13] as a general principle of inference and has been applied successfully in numerous fields. Maximizing entropy means choosing the distribution that reflects maximum uncertainty. An axiomatic derivation of the maximum entropy and minimum cross-entropy principles was given in [14]. The authors in [15] applied the maximum entropy principle to recover information from multinomial data.
Suppose we know that a system has a set of possible states with unknown probabilities $p_{ij}$ for $i = 1, \dots, n$ and $j \in \{0, 1\}$, and we have a given matrix $X$ of dimension $n \times d$ and a binary vector $y$ of dimension $n$. We want to find a coefficient vector $w$ of dimension $d$ so that the vector $Xw$ would be a better predictor of $Y$.
The maximum entropy principle explains how to select a certain distribution among different possible distributions that satisfy our constraints. We treat the $p_{ij}$ as variables when finding the maximum entropy solution to our problem. Notice that, since we deal with a binary vector $y$ and $p_{i0} + p_{i1} = 1$, solving for $p_i := p_{i1}$ will suffice and reduces the complexity of the problem.
In mathematical terms, we want to find $p$ that maximizes $H(p)$:
$$H(p) = -\sum_{i=1}^{n} \sum_{j \in \{0,1\}} p_{ij} \log p_{ij}.$$
Replacing $p_{i0} = 1 - p_i$, this would become
$$H(p) = -\sum_{i=1}^{n} \left[ p_i \log p_i + (1 - p_i) \log(1 - p_i) \right].$$
We impose the following constraints to fit the data:
$$\sum_{i=1}^{n} x_{ik}\, p_i = \sum_{i=1}^{n} x_{ik}\, y_i, \quad k = 1, \dots, d, \qquad (1)$$
and
$$p_i = y_i, \quad i = 1, \dots, n, \qquad (2)$$
where the right-hand sides of (1) are empirical expectations obtained from data and $y_i \in \{0, 1\}$. Since, in general, we have many more observations than predictors, $n$ is much larger than $d$. We can relax the second constraint (2) to $0 \le p_i \le 1$, since the $p_i$ values can then be easily obtained by solving the relaxed problem while the data are still fitted through (1). Hence, the final optimization problem becomes
$$\max_{p} \; -\sum_{i=1}^{n} \left[ p_i \log p_i + (1 - p_i) \log(1 - p_i) \right]$$
under the constraints
$$\sum_{i=1}^{n} x_{ik}\, p_i = \sum_{i=1}^{n} x_{ik}\, y_i, \quad k = 1, \dots, d,$$
and
$$0 \le p_i \le 1, \quad i = 1, \dots, n.$$
This is a solvable convex optimization problem with linear constraints. After determining $p$, namely the $p_i$ values, $w$ still needs to be found. Since we based the problem on finding the probabilities first rather than the coefficient vector $w$ first, we need to glue these probabilities to our given matrix $X$. We use the logits
$$l_i = \log\left(\frac{p_i}{1 - p_i}\right)$$
obtained from those probabilities and perform least squares estimation for the equation $Xw = l$, where $l$ is the logits vector.
Other ways of estimating $w$ without directly finding probabilities were possible, as we do below in cross-entropy minimization. However, we deliberately focused here on this formulation to show that the method finds probabilities complying with the maximum entropy principle.
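A minimal Julia sketch of this two-step procedure, assuming the constraint set as reconstructed above and using JuMP with the Ipopt solver mentioned in Section 4 (the function name and boundary tolerance are ours):

```julia
using JuMP, Ipopt

# MaxEnt step: maximize the binary entropy of p subject to moment constraints,
# then map the probabilities to logits and recover w by least squares.
function maxent_weights(X::Matrix{Float64}, y::Vector{Float64})
    n, d = size(X)
    m = Model(Ipopt.Optimizer)
    set_silent(m)
    @variable(m, 1e-6 <= p[1:n] <= 1 - 1e-6)          # keep p off the boundary
    @constraint(m, [k = 1:d],
        sum(X[i, k] * p[i] for i in 1:n) == sum(X[i, k] * y[i] for i in 1:n))
    @objective(m, Max,
        -sum(p[i] * log(p[i]) + (1 - p[i]) * log(1 - p[i]) for i in 1:n))
    optimize!(m)
    phat = value.(p)
    l = log.(phat ./ (1 .- phat))                      # logits
    return X \ l                                        # least squares: X w = l
end
```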
3.2. Cross-Entropy Minimization (MinCrEnt)
First proposed by Kullback [14], this principle uses the cross entropy of two distributions to find the optimal solution.
Suppose we know that a system has a set of possible states with unknown probabilities $p_{ij}$ for $i = 1, \dots, n$ and $j \in \{0, 1\}$, as described in Section 3.1.
The principle states that, among different possible distributions $p$ that satisfy certain constraints, one should select the one which minimizes the cross entropy with $q$, where $q$ is a known a priori distribution. In our case, we select $q$ to be the empirical distribution of the outcome (namely, $Y$):
$$H(q, p) = -\sum_{i=1}^{n} \sum_{j \in \{0,1\}} q_{ij} \log p_{ij}.$$
Again using $p_{i0} = 1 - p_i$, we can simplify the objective function to
$$H(q, p) = -\sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right].$$
The objective function we obtained on the right-hand side indeed coincides with the negative of the log-likelihood function from logistic regression. We know the distribution of the $q_{ij}$ values a priori, but we do not know the $p_i$ values. A natural and inevitable choice for obtaining the $p_i$ values is to use a transformation of the given data matrix $X$ with the coefficient vector $w$. Thus, we define
$$p_i = \sigma(x_i^{\top} w),$$
where $\sigma$ is the sigmoid function. Hence, the optimization problem becomes
$$\min_{w} \; -\sum_{i=1}^{n} \left[ y_i \log \sigma(x_i^{\top} w) + (1 - y_i) \log\left(1 - \sigma(x_i^{\top} w)\right) \right],$$
where the $y_i$ values are known probabilities obtained from the vector $y$.
We do not impose any further constraints here because the use of the sigmoid function confines the probabilities between 0 and 1, and we have already made use of $X$. This is an unconstrained convex optimization problem that can easily be solved.
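A minimal sketch of this unconstrained problem, here solved with the Optim.jl package for illustration rather than the packages used in Section 4 (names are ours):

```julia
using Optim

sigmoid(t) = 1 / (1 + exp(-t))

# MinCrEnt: minimize the cross entropy between the empirical outcome
# distribution and the model probabilities sigmoid.(X * w).
function mincrent_weights(X::Matrix{Float64}, y::Vector{Float64})
    crossentropy(w) = begin
        p = sigmoid.(X * w)
        -sum(@. y * log(p) + (1 - y) * log(1 - p))
    end
    res = optimize(crossentropy, zeros(size(X, 2)), BFGS())
    return Optim.minimizer(res)
end
```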
3.3. Relative Entropy Minimization (MinRelEnt)
Relative entropy minimization is a less known technique, partly misidentified with cross-entropy minimization [16]. Like cross entropy, relative entropy requires a reference distribution. We used the empirical distribution of $y$ as the reference distribution $q$ in cross-entropy minimization, but we choose the uniform distribution here, $q_{ij} = 1/2$. The optimization problem becomes
$$\min_{p} \; D(p \,\|\, q) = \sum_{i=1}^{n} \sum_{j \in \{0,1\}} p_{ij} \log \frac{p_{ij}}{q_{ij}},$$
or
$$\min_{p} \; \sum_{i=1}^{n} \left[ p_i \log p_i + (1 - p_i) \log(1 - p_i) \right] + n \log 2.$$
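To make the reduction between the two displayed forms explicit (a one-line check under the uniform reference $q_{ij} = 1/2$ used above):

$$D(p \,\|\, q) = \sum_{i=1}^{n} \sum_{j \in \{0,1\}} p_{ij} \log p_{ij} \;-\; \sum_{i=1}^{n} \sum_{j \in \{0,1\}} p_{ij} \log \tfrac{1}{2} = -H(p) + n \log 2,$$

so, under the uniform reference, minimizing the relative entropy coincides with maximizing the entropy up to the additive constant $n \log 2$; a different choice of $q$, as noted at the end of this subsection, yields a genuinely different solution.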
As we did in entropy maximization, we use the same constraint on empirical expectations to determine the probabilities,
$$\sum_{i=1}^{n} x_{ik}\, p_i = \sum_{i=1}^{n} x_{ik}\, y_i, \quad k = 1, \dots, d,$$
and for the probabilities we set
$$0 \le p_i \le 1, \quad i = 1, \dots, n.$$
This is also a convex optimization problem with linear constraints. After finding the probabilities, we transform them to logits and solve $Xw = l$, where $l$ is the logits vector, using least squares as we did for entropy maximization. It is worth noting that alternative choices for $q$ are possible, such as the empirical distribution of $y$ or any other prior belief about $y$.
3.4. Mutual Information Maximization (MaxMutInf)
As was previously done in a similar but not identical setting by Faivishevsky and Goldberger [17], we estimate the coefficient vector $w$ through mutual information maximization. We use the following identity for mutual information:
$$\hat{I}(Xw; Y) = \hat{H}(Xw) - \hat{H}(Xw \mid Y),$$
where $X$ is the predictor matrix, $Y$ is the empirical distribution of the outcome vector $y$ in our problem, and $\hat{I}$ is the mutual information estimator defined by Faivishevsky and Goldberger, built from their MeanNN entropy estimator $\hat{H}$. Since $Y$ is categorical, we deal with conditional entropies corresponding to the states of $Y$:
$$\hat{H}(Xw \mid Y) = \sum_{c \in \{0,1\}} \frac{n_c}{n}\, \hat{H}_c(Xw),$$
where $\hat{H}_c(Xw)$ is the entropy of $Xw$ restricted to the observations where $Y$ takes the value $c$ (also called the in-class entropy). Hence, the optimization problem becomes
$$\max_{w} \; \hat{I}(Xw; Y),$$
or
$$\max_{w} \; \hat{H}(Xw) - \sum_{c \in \{0,1\}} \frac{n_c}{n}\, \hat{H}_c(Xw),$$
where $n_c$ is the number of observations having class value $c$. It is easy to see that $n_0 + n_1 = n$. For the projected sample $z = Xw$, the MeanNN estimate takes the form
$$\hat{H}(z) = \frac{1}{n(n-1)} \sum_{i \neq j} \log |z_i - z_j| + \text{const}.$$
The smoothness of the MeanNN entropy estimator enables its gradient to be computed analytically. Therefore, the gradient of $\hat{I}(Xw; Y)$ with respect to $w$ becomes
$$\nabla_w \hat{I} = \frac{1}{n(n-1)} \sum_{i \neq j} \frac{x_i - x_j}{(x_i - x_j)^{\top} w} \;-\; \sum_{c \in \{0,1\}} \frac{n_c}{n} \cdot \frac{1}{n_c(n_c - 1)} \sum_{\substack{i \neq j \\ y_i = y_j = c}} \frac{x_i - x_j}{(x_i - x_j)^{\top} w}.$$
Since the gradient of the mutual information is available, we use the gradient ascent method to maximize the mutual information.
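A minimal sketch of this procedure, assuming the MeanNN-based gradient written above (all names, the step size, and the normalization are ours; the paper's own implementation is custom gradient ascent code, see Section 4):

```julia
using LinearAlgebra

# Gradient of the MeanNN entropy of the projection z = X * w with respect to w.
function meannn_entropy_grad(z::Vector{Float64}, X::Matrix{Float64})
    n = length(z)
    g = zeros(size(X, 2))
    for i in 1:n, j in 1:n
        (i == j || z[i] == z[j]) && continue          # skip ties to avoid division by zero
        g .+= (X[i, :] .- X[j, :]) ./ (z[i] - z[j])
    end
    return g ./ (n * (n - 1))
end

# MaxMutInf: ascend the gradient of I(Xw; Y) = H(Xw) - sum_c (n_c/n) H_c(Xw).
function maxmutinf_weights(X::Matrix{Float64}, y::Vector{Float64};
                           steps::Int = 500, lr::Float64 = 0.01)
    n, d = size(X)
    w = randn(d) / sqrt(d)
    for _ in 1:steps
        z = X * w
        g = meannn_entropy_grad(z, X)                  # full-sample entropy term
        for c in (0.0, 1.0)                            # subtract weighted in-class terms
            idx = findall(==(c), y)
            g .-= (length(idx) / n) .* meannn_entropy_grad(z[idx], X[idx, :])
        end
        w .+= lr .* g
        w ./= norm(w)                                  # fix the scale; MI is scale-invariant
    end
    return w
end
```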
4. Simulation
An extensive simulation study was conducted to compare the efficiencies of these methods for the combination of continuous variables (or biomarkers). Imitating but also enriching the one performed previously in [1] for comparison purposes, we considered normal, gamma, and beta distributions under different settings.
In all settings, we assumed zero mean values for one class. For the other class, the mean values were set to be evenly spaced, with the spacing depending on the number of predictors $d$. Candidate biomarkers generated through this approach typically have modest individual AUC values, reflecting weak discriminatory power.
We used multivariate normal variables with equal and unequal covariance structures to generate normal, beta, and gamma variates via the normal copula and inverse transform sampling. When covariances are equal, we fixed the common covariance matrix as a weighted combination of $I$ and $J$, where $I$ is the identity matrix and $J$ is a matrix of all ones, inducing a common correlation between predictors. For unequal covariance, we rescaled the covariance matrix of one class while keeping the other unchanged.
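A minimal sketch of this generation scheme, assuming Distributions.jl and purely illustrative parameter values (the correlation and Gamma parameters below are placeholders, not the simulation's actual settings):

```julia
using Distributions, LinearAlgebra

# Draw n observations whose dependence comes from a normal copula and whose
# marginals are Gamma, via inverse transform sampling.
function gamma_via_normal_copula(n::Int, Σ::Matrix{Float64},
                                 shape::Float64, scale::Float64)
    d = size(Σ, 1)
    Z = rand(MvNormal(zeros(d), Σ), n)'                 # n × d zero-mean normal draws
    U = cdf.(Normal(), Z ./ sqrt.(diag(Σ))')            # standardize, then map to uniforms
    return quantile.(Gamma(shape, scale), U)            # inverse transform to Gamma marginals
end

d = 4
ρ = 0.3
Σ = (1 - ρ) * Matrix{Float64}(I, d, d) + ρ * ones(d, d) # compound-symmetric structure
X1 = gamma_via_normal_copula(100, Σ, 2.0, 1.0)          # one class; vary parameters per class
```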
For each setting, 1000 datasets were generated and randomly divided into two sets of equal size. One set was used for training and the other for testing. Coefficient estimates obtained from training datasets were recorded for each method. Using the test datasets, linear combinations were computed based on the previously estimated coefficients, and corresponding performance metrics were evaluated. These metrics included AUC, Area Under the Precision–Recall Curve (AUPRC), and Matthews Correlation Coefficient (MCC). In addition to mean values, 95% confidence intervals and median values were reported for each metric.
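For reference, with $TP$, $TN$, $FP$, and $FN$ denoting the confusion matrix counts obtained after thresholding a combination's scores, MCC takes its standard form:

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}.$$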
Sample sizes and the number of predictors $d$ were varied across the settings shown in the tables. Unlike the referenced simulation study, we also examined scenarios with unequal class allocation, in which the sample size of one class was reduced relative to the other. Table 1, Table 2 and Table 3 present AUC values obtained from the simulation results. Additional results, including those for AUPRC, MCC, and all unequal allocation cases, are provided in the Supplementary Material.
Table 1.
Mean AUC values with 95% confidence intervals obtained for multivariate Normal distributions.
Table 2.
Mean AUC values with 95% confidence intervals obtained for Gamma distributions.
Table 3.
Mean AUC values with 95% confidence intervals obtained for Beta distributions.
All simulations were conducted using Julia version 1.11.0-rc3. We utilized the Ipopt.jl package version 1.11.0 for nonlinear constrained optimization in entropy-based methods, the GLM.jl package version 1.9.0 for logistic regression, and custom gradient ascent code for the mutual information maximization method.
The code used in the simulations will be released as public Julia and R packages in future publications. Interested researchers may contact the author for access.
5. Application to Real-Life Data
We used the publicly available Wisconsin datasets for the prognosis [18] and diagnosis [19] of breast cancer. Each dataset contains 30 continuous predictors to be used for the prediction of either the prognosis or the diagnosis of breast cancer, encoded as a binary outcome.
The prognostic dataset has 198 observations, while the diagnostic one has 569. The positive class comprises 47 of 198 and 212 of 569 observations in those datasets, respectively. We considered the first 5 predictors in each dataset for our calculations, namely the variables radius_mean, texture_mean, perimeter_mean, area_mean, and smoothness_mean.
The predictive ability of these variables is given in Table 4. As can be seen from Table 4, these predictors have a very low predictive ability for the prognosis of breast cancer, but a high ability for diagnosis.
Table 4.
AUC values for the first five predictors in each dataset.
We performed logistic regression and entropy optimization techniques described in Section 3 and gathered coefficients shown in Table 5 and Table 6, representing candidate linear combinations for both datasets.
Table 5.
Coefficients obtained for prognostic dataset.
Table 6.
Coefficients obtained for the diagnostic dataset.
Finally, calculating linear combinations using those coefficients, we obtained AUC, AUPRC, and MCC values given in Table 7 and Table 8. Our methods yielded metric values very similar to those derived from logistic regression.
Table 7.
Predictive ability of calculated linear combinations for prognostic data.
Table 8.
Predictive ability of calculated linear combinations for diagnostic data.
6. Discussion and Conclusions
The primary objective of this study was to explore how information-theoretical methods can be applied to construct linear combinations of continuous variables in the context of a binary classification problem.
We addressed this question by introducing four distinct approaches (MaxEnt, MinCrEnt, MinRelEnt, and MaxMutInf), each grounded in fundamental principles of information theory.
We believe that identifying and formalizing these approaches may guide future research directions by drawing greater attention to information-theoretical concepts and encouraging their broader application in this problem setting. This study also represents the first systematic evaluation of information-theoretic approaches applied in this setting.
Earlier methods were often constrained by strong distributional assumptions about biomarkers, such as normality and equal or proportional covariance structures. Additionally, the number of predictors they could handle was typically limited to just two.
More recent approaches have relaxed these assumptions, allowing for the inclusion of a larger number of biomarkers in linear combinations. However, most of these methods rely on performance metrics like sensitivity, specific segments of the ROC curve, or the AUC (Area Under the Curve) as the optimization objective. Some later methods extended their applicability to multi-class classification problems using metrics such as the Volume Under the Surface (VUS) or the Hypervolume Under the Manifold (HUM).
Among these, the SCOR algorithm [10] and the method proposed in [1] have shown promising results in maximizing the AUC or HUM objective.
Notably, the methods established in [11,12] distinguish themselves by incorporating copulas, representing the first known use of copulas in the context of biomarker combination. Despite this innovation, they currently remain limited to handling only two biomarkers.
An important distinction of our proposed methods is that, whereas previous methods primarily aimed to maximize a metric such as the AUC, the methods proposed in this article rely solely on information-theoretic criteria and never incorporate AUC or any other performance metric as an optimization objective.
As another distinction, unlike many previous studies that relied on a single evaluation metric, our assessment of model performance employed a comprehensive set of three metrics: AUC, AUPRC, and MCC. This multi-faceted approach provides a more robust and nuanced comparison of model performance across different aspects of classification quality.
The methods proposed here are straightforward to apply in binary classification problems and to extend to a multiclass setting. These methods are computationally simpler than most existing approaches, with the exception of MaxMutInf. Due to the need to compute complex gradient functions when optimizing mutual information, MaxMutInf is relatively more challenging to implement. Extending our methods to classification problems involving more than two outcome levels may represent a valuable direction for future research.
The three entropy-based methods (MaxEnt, MinCrEnt, and MinRelEnt) demonstrated test performances that were consistently comparable to those derived from logistic regression, across all simulation conditions and real data applications. The MaxMutInf method yielded slightly greater test AUC, AUPRC, and MCC values compared to the other approaches, particularly in simulation settings involving Beta and Gamma distributed data. The differences were more pronounced in simulations with smaller sample sizes. However, in some settings involving normally distributed data, the MaxMutInf method produced test AUC and AUPRC metrics that were comparable or even marginally lower than those of other methods.
The Maximum Mutual Information method appeared more robust to distributional asymmetries, as well as to smaller and unequal sample sizes, except in a few settings involving normally distributed variables. Exploring and implementing different differential entropy estimators beyond the Kozachenko–Leonenko entropy estimator may be a future research direction that could potentially reveal more insights into the performance of Mutual Information Maximization.
Supplementary Materials
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/e27090985/s1, Table S1: Mean AUPRC values with 95% confidence intervals obtained for multivariate Normal distributions in equal allocation settings. Table S2: Mean AUPRC values with 95% confidence intervals obtained for Gamma distributions in equal allocation settings. Table S3: Mean AUPRC values with 95% confidence intervals obtained for Beta distributions in equal allocation settings. Table S4: Mean MCC values with 95% confidence intervals obtained for multivariate Normal distributions in equal allocation settings. Table S5: Mean MCC values with 95% confidence intervals obtained for Gamma distributions in equal allocation settings. Table S6: Mean MCC values with 95% confidence intervals obtained for Beta distributions in equal allocation settings. Table S7: Mean AUC values with 95% confidence intervals obtained for multivariate Normal distributions in unequal allocation settings. Table S8: Mean AUC values with 95% confidence intervals obtained for Gamma distributions in unequal allocation settings. Table S9: Mean AUC values with 95% confidence intervals obtained for Beta distributions in unequal allocation settings. Table S10: Mean AUPRC values with 95% confidence intervals obtained for multivariate Normal distributions in unequal allocation settings. Table S11: Mean AUPRC values with 95% confidence intervals obtained for Gamma distributions in unequal allocation settings. Table S12: Mean AUPRC values with 95% confidence intervals obtained for Beta distributions in unequal allocation settings. Table S13: Mean MCC values with 95% confidence intervals obtained for multivariate Normal distributions in unequal allocation settings. Table S14: Mean MCC values with 95% confidence intervals obtained for Gamma distributions in unequal allocation settings. Table S15: Mean MCC values with 95% confidence intervals obtained for Beta distributions in unequal allocation settings.
Author Contributions
Software, M.S.İ.; Validation, M.S.İ.; Formal Analysis, M.S.İ.; Resources, M.S.İ.; Writing—Original Draft, M.S.İ.; Writing—Review & Editing, M.S.İ.; Visualization, M.S.İ.; Supervision, P.Ö. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
The Wisconsin datasets can be found at https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic and https://archive.ics.uci.edu/dataset/16/breast+cancer+wisconsin+prognostic (accessed on 17 April 2025).
Acknowledgments
We thank Yasemin Öztürk for her help in building this manuscript. We would also like to thank the anonymous reviewers for their insightful comments that led to significant contributions to the manuscript.
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| ROC | Receiver Operating Characteristic |
| AUC | Area under the ROC curve |
| AUPRC | Area under the precision recall curve |
| MCC | Matthews Correlation Coefficient |
| MaxMutInf | Maximum Mutual Information |
| MaxEnt | Maximum Entropy |
| MinCrEnt | Minimum Cross Entropy |
| MinRelEnt | Minimum Relative Entropy |
| LogRes | Logistic Regression |
References
1. Yan, L.; Tian, L.; Liu, S. Combining large number of weak biomarkers based on AUC. Stat. Med. 2015, 34, 3811–3830.
2. Su, J.Q.; Liu, J.S. Linear Combinations of Multiple Diagnostic Markers. J. Am. Stat. Assoc. 1993, 88, 1350–1355.
3. Liu, A.; Schisterman, E.F.; Zhu, Y. On linear combinations of biomarkers to improve diagnostic accuracy. Stat. Med. 2005, 24, 37–47.
4. Pepe, M.S.; Thompson, M.L. Combining diagnostic test results to increase accuracy. Biostatistics 2000, 1, 123–140.
5. Pepe, M.S.; Cai, T.; Longton, G. Combining Predictors for Classification Using the Area under the Receiver Operating Characteristic Curve. Biometrics 2006, 62, 221–229.
6. Liu, C.; Liu, A.; Halabi, S. A min–max combination of biomarkers to improve diagnostic accuracy. Stat. Med. 2011, 30, 2005–2014.
7. Kang, L.; Xiong, C.; Crane, P.; Tian, L. Linear combinations of biomarkers to improve diagnostic accuracy with three ordinal diagnostic categories. Stat. Med. 2013, 32, 631–643.
8. Fong, Y.; Yin, S.; Huang, Y. Combining biomarkers linearly and nonlinearly for classification using the area under the ROC curve. Stat. Med. 2016, 35, 3792–3809.
9. Lloyd, C.J. Using Smoothed Receiver Operating Characteristic Curves to Summarize and Compare Diagnostic Systems. J. Am. Stat. Assoc. 1998, 93, 1356–1364.
10. Das, P.; De, D.; Maiti, R.; Kamal, M.; Hutcheson, K.A.; Fuller, C.D.; Chakraborty, B.; Peterson, C.B. Estimating the optimal linear combination of predictors using spherically constrained optimization. BMC Bioinform. 2022, 23, 436.
11. Muhammad, N.; Coolen-Maturi, T.; Coolen, F.P. Nonparametric predictive inference with parametric copulas for combining bivariate diagnostic tests. Stat. Optim. Inf. Comput. 2018, 6, 398–408.
12. Islam, S.; Anand, S.; Hamid, J.; Thabane, L.; Beyene, J. A copula-based method of classifying individuals into binary disease categories using dependent biomarkers. Stat. Methods Appl. 2020, 29, 871–897.
13. Jaynes, E.T. Information Theory and Statistical Mechanics. Phys. Rev. 1957, 106, 620–630.
14. Shore, J.; Johnson, R. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans. Inf. Theory 1980, 26, 26–37.
15. Golan, A.; Judge, G.; Perloff, J.M. A Maximum Entropy Approach to Recovering Information from Multinomial Response Data. J. Am. Stat. Assoc. 1996, 91, 841–853.
16. Banavar, J.; Maritan, A. The maximum relative entropy principle. arXiv 2007, arXiv:cond-mat/0703622.
17. Faivishevsky, L.; Goldberger, J. Dimensionality reduction based on non-parametric mutual information. Neurocomputing 2012, 80, 31–37.
18. Wolberg, W.; Street, W.; Mangasarian, O. Breast Cancer Wisconsin (Prognostic). UCI Machine Learning Repository, 1995. Available online: https://archive.ics.uci.edu/dataset/16/breast+cancer+wisconsin+prognostic (accessed on 17 April 2025).
19. Wolberg, W.; Mangasarian, O.; Street, N.; Street, W. Breast Cancer Wisconsin (Diagnostic). UCI Machine Learning Repository, 1993. Available online: https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic (accessed on 17 April 2025).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).