Geo-information Targeting: Logistic Regression, Special Cases and Extensions

Logistic regression is a classical linear model for logit-transformed conditional probabilities of a binary target variable. It recovers the true conditional probabilities if the joint distribution of predictors and the target is of log-linear form. Weights-of-evidence is an ordinary logistic regression with parameters equal to the differences of the weights of evidence if all predictor variables are discrete and conditionally independent given the target variable. The hypothesis of conditional independence can be tested in terms of log-linear models. If the assumption of conditional independence is violated, the application of weights-of-evidence corrupts not only the predicted conditional probabilities, but also their rank transform. Logistic regression models including interaction terms can account for the lack of conditional independence; appropriate interaction terms compensate exactly for violations of conditional independence. Multilayer artificial neural nets may be seen as nested regression-like models with some sigmoidal activation function. Most often, the logistic function is used as the activation function. If the net topology, i.e., its control, is sufficiently versatile to mimic interaction terms, artificial neural nets are able to account for violations of conditional independence and yield very similar results. Weights-of-evidence cannot reasonably include interaction terms; subsequent modifications of the weights, as often suggested, cannot emulate the effect of interaction terms.


Introduction
The objective of potential modeling or targeting [1] is to identify locations, i.e., pixels or voxels, for which the probability of an event spatially referenced in this way, e.g., a well-defined type of ore mineralization, is a relative maximum, i.e., is larger than in neighboring pixels or voxels. The major prerequisite for such predictions is a sufficient understanding of the causes of the target to be predicted. Conceptual models of ore deposits have been compiled by [2]. They may be read as factor models (in the sense of mathematical statistics), and a proper factor model may be turned into a regression-type model when using the factors as spatially-referenced predictors, which are favorable to or prohibitive of the target event. Thus, we may distinguish necessary or sufficient dependencies between the binary target $T(x)$ indicating the presence or absence of the target at an areal or volumetric location $x \subset D \subset \mathbb{R}^d$, $d = 1, 2, 3$, and the spatially referenced predictors $(B_0(x), B_1(x), \ldots, B_m(x))^{\mathsf T} = B(x)$, which may be binary, discrete or continuous. Then, mathematical models and their numerical realizations are required to turn descriptive models into constructive ones, i.e., into quantitatively predictive models. Generally, a model considers the predictor $B(x)$, with $B_0(x) \equiv 1$ for all $x \subset D$, and assigns a parameter $(\theta_0, \ldots, \theta_m)^{\mathsf T} = \theta$ to them, which quantifies, by means of a link function $F$, the extent of dependence of the conditional probability $P(T(x) = 1 \mid B(x))$ on the predictors, i.e.,
$$P(T(x) = 1 \mid B(x)) = F(\theta \mid B(x)).$$
Since the target $T(x)$, as well as the predictor $B(x)$, refer to areal or volumetric locations $x \subset D$, we may think of a two-dimensional digital map image of pixels or a three-dimensional digital geomodel of voxels. The pixels or voxels initially provide the physical support of the predictors and the target and will then be assigned the predicted conditional probability and the associated estimation errors, respectively. Then, the numerical results of targeting depend on the size of the objects, pixels or voxels, i.e., on the spatial resolution they provide. If the actual spatial reference of the target (or the predictors) is rather pointwise, i.e., if their physical support is rather of zero measure, then the dependence on the spatial resolution must not be ignored, because even the estimate $O(T = 1) = P(T = 1)/(1 - P(T = 1))$ of the unconditional odds will be largely affected, as the total number of pixels or voxels depends on the spatial resolution, while the total number of pointwise occurrences is constant. If the spatial resolution provided by the pixels or voxels is poor with respect to the area or volume of the actual physical support of the predictors or target, then the numerical results of any kind of mathematical method of targeting are rather an artifact of the inappropriate spatial resolution.
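The dependence of the estimated unconditional odds on the spatial resolution can be illustrated with a small sketch. The region size, the number of occurrences and the cell sizes below are hypothetical, and the sketch assumes that every pointwise occurrence falls into its own cell at each resolution.

```python
def unconditional_odds(n_occurrences: int, n_cells: int) -> float:
    """Estimate O(T=1) = P(T=1) / (1 - P(T=1)) by counting cells with T = 1."""
    if not 0 <= n_occurrences < n_cells:
        raise ValueError("need 0 <= n_occurrences < n_cells")
    p = n_occurrences / n_cells
    return p / (1.0 - p)


# Hypothetical setting: 50 pointwise occurrences in a 10 km x 10 km region,
# each assumed to fall into its own cell at every resolution considered.
EXTENT_M = 10_000
N_OCCURRENCES = 50

odds_by_cell_size = {}
for cell_m in (1000, 500, 250, 100):
    n_cells = (EXTENT_M // cell_m) ** 2
    odds_by_cell_size[cell_m] = unconditional_odds(N_OCCURRENCES, n_cells)
    print(f"cell size {cell_m:>4} m: {n_cells:>6} cells, "
          f"odds = {odds_by_cell_size[cell_m]:.5f}")
```

The number of occurrences stays fixed while the number of cells grows with finer resolution, so the estimated odds shrink accordingly.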
To estimate the model parameters $\theta$, data within a training region are required. The mathematical modeling assumption associated with a training dataset is complete knowledge, i.e., in particular, we assume that we know all occurrences of the target variable $T = 1$. However, in contrast to geostatistics [3], potential modeling does not consider spatially-induced dependencies between the predictors and the target. In fact, potential modeling applies the assumption of independent identically distributed random variables. Their distribution does not depend on the location. Therefore, any spatial reference can be dropped, and only models of the form
$$P(T = 1 \mid B) = F(\theta \mid B)$$
are considered.

The Modeling Assumption of Conditional Independence
The random variables $B_1, \ldots, B_m$ are conditionally independent given the random target variable $T$, if the joint conditional probability factorizes into the individual conditional probabilities:
$$P(B_1, \ldots, B_m \mid T) = \prod_{\ell=1}^{m} P(B_\ell \mid T).$$
Equivalently, but more instructively in terms of irrelevance, the random variables $B_1, \ldots, B_m$ are conditionally independent given the random variable $T$, if knowing $T$ renders all other $B_j$ except $B_i$ irrelevant for predicting $B_i$, i.e., in terms of conditional probabilities:
$$P(B_i \mid B_1, \ldots, B_{i-1}, B_{i+1}, \ldots, B_m, T) = P(B_i \mid T).$$
It is emphasized that independence does not imply conditional independence and vice versa. The significant correlation of predictor variables does not imply that they are not conditionally independent. On the contrary, variables $B_1$ and $B_2$ may be significantly correlated and conditionally independent given the variable $T$, in particular when $T$ can be interpreted to represent a common cause for $B_1$ and $B_2$, cf. the illustrative example [4]. In this way, conditional independence is a probabilistic approach to causality, while correlation is not. To relax the restrictive assumption that all predictor variables are conditionally independent given the target variable, the assumption of conditional independence of subsets of predictor variables, referred to as the Bayesian belief network, provides intermediate models that are less restrictive, but more tractable than general models [5]. A suitable choice of subsets are the cliques of the graphical model [6] representing the variables and their conditional independence relationships leading to interaction terms in logistic regression models [7].
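That correlation and conditional independence can coexist is easy to check numerically. The following sketch builds a hypothetical joint distribution in which $T$ acts as a common cause of two indicator predictors; all probabilities are illustrative assumptions, not values from the datasets discussed later.

```python
from itertools import product

# Hypothetical common-cause model: T drives both B1 and B2.
P_T = {1: 0.5, 0: 0.5}
P_B1_GIVEN_T = {1: 0.9, 0: 0.1}  # P(B_l = 1 | T = t), same for B1 and B2


def p_b(b, t):
    """P(B_l = b | T = t) for an indicator predictor."""
    q = P_B1_GIVEN_T[t]
    return q if b == 1 else 1.0 - q


# Joint distribution built so that B1 and B2 are conditionally independent given T.
joint = {(b1, b2, t): P_T[t] * p_b(b1, t) * p_b(b2, t)
         for b1, b2, t in product((0, 1), repeat=3)}

p1 = sum(p for (b1, _, _), p in joint.items() if b1 == 1)
p2 = sum(p for (_, b2, _), p in joint.items() if b2 == 1)
p12 = sum(p for (b1, b2, _), p in joint.items() if b1 == 1 and b2 == 1)

# Pearson correlation of the two indicators (marginally, i.e., ignoring T).
corr = (p12 - p1 * p2) / ((p1 * (1 - p1) * p2 * (1 - p2)) ** 0.5)


def factorization_gap(t):
    """Largest deviation of P(B1, B2 | T=t) from P(B1 | T=t) * P(B2 | T=t)."""
    p_t = sum(p for (_, _, tt), p in joint.items() if tt == t)
    gap = 0.0
    for b1, b2 in product((0, 1), repeat=2):
        p_b1 = sum(p for (x, _, tt), p in joint.items() if tt == t and x == b1) / p_t
        p_b2 = sum(p for (_, y, tt), p in joint.items() if tt == t and y == b2) / p_t
        gap = max(gap, abs(joint[(b1, b2, t)] / p_t - p_b1 * p_b2))
    return gap


print(f"correlation of B1 and B2: {corr:.2f}")
print(f"factorization gaps given T: {factorization_gap(0):.1e}, {factorization_gap(1):.1e}")
```

The predictors are clearly correlated, yet the conditional joint probabilities factorize for both values of $T$.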

Logistic Regression
A modern account of logistic regression is given by [8]. The conditional expectation of an indicator random target variable $T$ given an $(m + 1)$-variate random predictor variable $B = (B_0, B_1, \ldots, B_m)^{\mathsf T}$ with $B_0 \equiv 1$ is equal to its conditional probability, i.e.,
$$E(T \mid B) = P(T = 1 \mid B).$$
Omitting the binomially distributed error term [8], as is often done, the ordinary logistic regression model without interaction terms for the conditional probability to be predicted can be written as [8]:
• in terms of a logit:
$$\operatorname{logit} P(T = 1 \mid B) = \beta^{\mathsf T} B = \beta_0 + \sum_{\ell=1}^{m} \beta_\ell B_\ell,$$
• in terms of a probability:
$$P(T = 1 \mid B) = \Lambda\Big(\beta_0 + \sum_{\ell=1}^{m} \beta_\ell B_\ell\Big),$$
• with the logistic function:
$$\Lambda(z) = \frac{1}{1 + \exp(-z)}.$$
The ordinary logistic regression model is optimum, i.e., it agrees with the true conditional probability, if the predictor variables are discrete and conditionally independent given the target variable [7]. Here, the predictor variables are assumed to be discrete to ensure that the joint probability of $B$ and $T$ has a representation as a log-linear model, which is then subject to factorization, according to the Hammersley-Clifford theorem [7].
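The logit and the logistic function are inverse to each other, which a few lines of Python make concrete. This is a minimal sketch assuming the usual definitions logit(p) = ln(p/(1 − p)) and Λ(z) = 1/(1 + exp(−z)); the function names and the coefficient values are illustrative choices.

```python
import math


def logistic(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z)), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(p: float) -> float:
    """Log-odds ln(p / (1 - p)) of a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))


def predict(beta, b):
    """P(T=1 | B=b) for coefficients beta and predictors b, with b[0] = 1."""
    return logistic(sum(beta_l * b_l for beta_l, b_l in zip(beta, b)))


# Hypothetical coefficients (beta_0, beta_1, beta_2) and one object with B1 = 1, B2 = 0.
beta = (-2.0, 1.5, 0.5)
p = predict(beta, (1, 1, 0))
print(f"P(T=1 | B) = {p:.4f}, logit = {logit(p):.4f}")
```

The printed logit equals the linear predictor β₀ + β₁, which is the sense in which the model is linear on the logit scale.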
The logistic regression model can be generalized to include any interaction terms of the form $B_i * \ldots * B_j$, i.e., any product terms of predictors:
$$\operatorname{logit} P(T = 1 \mid B) = \beta_0 + \sum_{\ell} \beta_\ell B_\ell + \sum_{i < j} \beta_{ij} B_i B_j + \ldots + \beta_{1 \ldots m} B_1 \cdots B_m.$$
A lack of conditional independence can be exactly compensated for by corresponding interaction terms included in the logistic regression model, and the resulting logistic regression model with interaction terms is optimum for continuous predictor variables if the joint distribution of the target variable and the predictor variables is of a log-linear form. A log-linear form is ensured if the predictor variables are discrete. Thus, for discrete predictor variables, the logistic regression model, including appropriate interaction terms, is optimum [7].
Given $m \geq 2$ predictor variables, there is a total of $\sum_{\ell=2}^{m} \binom{m}{\ell} = 2^m - (m + 1)$ possible interaction terms. To be a feasible model, the total number $2^m$ of all possible terms would have to be reasonably smaller than the sample size $n$. However, the interaction term $B_1 \otimes \ldots \otimes B_k$, $k \leq m$, is actually required if $B_1, \ldots, B_k$ are not conditionally independent given $T$.
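How quickly the number of candidate terms grows can be tabulated directly. The sketch below counts all products of at least two distinct predictors and adds the intercept and the $m$ main effects; the function names are illustrative.

```python
from math import comb


def n_interaction_terms(m: int) -> int:
    """Number of products of at least two distinct predictors out of m."""
    return sum(comb(m, k) for k in range(2, m + 1))


def n_all_terms(m: int) -> int:
    """Intercept, m main effects and all interaction terms."""
    return 1 + m + n_interaction_terms(m)


for m in (2, 5, 10, 20):
    print(f"m = {m:>2}: {n_interaction_terms(m):>8} interaction terms, "
          f"{n_all_terms(m):>8} terms in total")
```

Already for a moderate number of predictors the saturated model exceeds what a typical training dataset can support, which is why only the interaction terms required by actual violations of conditional independence are of interest.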
Logistic regression parameters can be interpreted with respect to logits analogously to the parameters of a linear regression model, e.g., $\beta_\ell$ represents the increment of $\operatorname{logit} P(T = 1 \mid B)$ if $B_\ell$ is increased by one unit [8]. There are more involved interpretations to come, cf. Appendix B.
Given a sample $b_{\ell,i}, t_i$, $i = 1, \ldots, n$, $\ell = 1, \ldots, m$, the parameters of the logistic regression model are estimated with the maximum likelihood method numerically realized in Fisher's scoring algorithm (a form of Newton-Raphson, a special case of an iteratively reweighted least squares algorithm) and encoded in any major statistical software package.
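To make the estimation step tangible, here is a minimal sketch of Fisher scoring for the smallest possible case, an intercept and a single predictor, where the 2×2 information matrix can be inverted in closed form. It is an illustration under that restriction, not the routine of any particular statistical package; the sample is hypothetical.

```python
import math


def fit_logistic_1d(b, t, n_iter=25, tol=1e-10):
    """Fisher scoring for logit P(T=1 | B) = beta0 + beta1 * B."""
    beta0, beta1 = 0.0, 0.0
    for _ in range(n_iter):
        # Score vector (g0, g1) and Fisher information [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for b_i, t_i in zip(b, t):
            p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * b_i)))
            w = p * (1.0 - p)
            g0 += t_i - p
            g1 += (t_i - p) * b_i
            h00 += w
            h01 += w * b_i
            h11 += w * b_i * b_i
        # Newton step: solve the 2x2 system information * step = score.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        beta0 += d0
        beta1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return beta0, beta1


# Hypothetical sample: 40 objects with B = 0 (4 of them with T = 1)
# and 20 objects with B = 1 (10 of them with T = 1).
b = [0] * 40 + [1] * 20
t = [1] * 4 + [0] * 36 + [1] * 10 + [0] * 10
beta0, beta1 = fit_logistic_1d(b, t)
print(f"beta0 = {beta0:.4f}, beta1 = {beta1:.4f}")
```

For a single indicator predictor the maximum likelihood solution is known in closed form (the intercept is the logit of the relative frequency of $T = 1$ among objects with $B = 0$), which provides a convenient check of the iteration.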

Weights-of-Evidence
The model of weights-of-evidence is the special case of a logistic regression model without interaction terms, if all predictor variables are binary and conditionally independent given the target variable [9]. It reads, e.g., in terms of the conditional probability to be predicted:
$$P(T = 1 \mid B) = \Lambda\Big(\beta_0 + \sum_{\ell=1}^{m} C_\ell B_\ell\Big), \qquad (2)$$
where:
$$\beta_0 = \operatorname{logit} P(T = 1) + W^{(0)},$$
with contrasts $C_\ell$ defined as:
$$C_\ell = W_\ell^{(1)} - W_\ell^{(0)}, \quad \ell = 1, \ldots, m,$$
with weights-of-evidence:
$$W_\ell^{(1)} = \ln \frac{P(B_\ell = 1 \mid T = 1)}{P(B_\ell = 1 \mid T = 0)}, \quad W_\ell^{(0)} = \ln \frac{P(B_\ell = 0 \mid T = 1)}{P(B_\ell = 0 \mid T = 0)}, \qquad (4)$$
and with:
$$W^{(0)} = \sum_{\ell=1}^{m} W_\ell^{(0)},$$
provided that:
$$P(B_\ell = b \mid T = t) > 0 \quad \text{for all } b, t \in \{0, 1\},\ \ell = 1, \ldots, m. \qquad (5)$$
Since the model of weights-of-evidence [10-14] is based on the naive Bayesian approach [5,14-17] assuming conditional independence of $B$ given $T$, it can be derived in elementary terms from Bayes' theorem for indicator random variables $B_0, B_1, \ldots, B_m$:
$$O(T = 1 \mid B) = O(T = 1)\, \frac{P(B \mid T = 1)}{P(B \mid T = 0)},$$
where the (conditional) odds $O$ of an event are defined as the ratio of the (conditional) probabilities of an event and its complement. Now, the naive Bayes' assumption of conditional independence of all predictor variables $B$ given the target variable $T$ leads to the most efficient simplification:
$$P(B \mid T = t) = \prod_{\ell=1}^{m} P(B_\ell \mid T = t), \quad t = 0, 1,$$
and, in turn, to weights-of-evidence in terms of odds:
$$O(T = 1 \mid B) = O(T = 1) \prod_{\ell=1}^{m} \frac{P(B_\ell \mid T = 1)}{P(B_\ell \mid T = 0)}, \qquad (6)$$
i.e., updating the unconditional "prior" odds $O(T = 1)$ by successive multiplication with "Bayes factors" $P(B_\ell \mid T = 1)/P(B_\ell \mid T = 0)$ to result in final conditional "posterior" odds $O(T = 1 \mid B)$ [13]; see Appendix 1 for a complete derivation. Due to the simplifying assumption of conditional independence and in contrast to general logistic regression, the ratios of conditional probabilities involved in the definition of the weights of evidence, Equation (4), can be estimated by mere counting. Moreover, weights-of-evidence can easily be generalized to discrete random variables, as a discrete variable with $s$ different states can be split into $(s - 1)$ different binary random variables to be used in regression models.
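Estimating weights of evidence "by mere counting" can be sketched in a few lines. The sketch assumes the common definition of the weights as logarithms of ratios of conditional relative frequencies of an indicator predictor given $T = 1$ and $T = 0$; the counts and function names are hypothetical.

```python
import math


def weights_of_evidence(n11, n10, n01, n00):
    """Weights (W1, W0) of one indicator predictor from counts n_{b t}.

    n11: B=1,T=1   n10: B=1,T=0   n01: B=0,T=1   n00: B=0,T=0
    All four counts are assumed to be positive.
    """
    n_t1 = n11 + n01
    n_t0 = n10 + n00
    w1 = math.log((n11 / n_t1) / (n10 / n_t0))  # weight for B = 1
    w0 = math.log((n01 / n_t1) / (n00 / n_t0))  # weight for B = 0
    return w1, w0


def posterior_odds(prior_odds, weights, b):
    """Update prior odds with the weight matching each observed indicator."""
    log_odds = math.log(prior_odds)
    for (w1, w0), b_l in zip(weights, b):
        log_odds += w1 if b_l == 1 else w0
    return math.exp(log_odds)


# Hypothetical counts for a single predictor: 60 objects, 14 with T = 1.
w1, w0 = weights_of_evidence(n11=10, n10=10, n01=4, n00=36)
contrast = w1 - w0
odds_b1 = posterior_odds(14 / 46, [(w1, w0)], [1])
print(f"W1 = {w1:.4f}, W0 = {w0:.4f}, contrast = {contrast:.4f}")
print(f"posterior odds given B = 1: {odds_b1:.4f}")
```

With a single predictor there is nothing to be conditionally independent of, so the contrast coincides with the log odds ratio of the 2×2 table and the posterior odds reproduce the counted conditional odds.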

Testing Conditional Independence
A straightforward test of conditional independence employs the relationship of weights-of-evidence and log-linear models. If predictor variables are discrete and conditionally independent given the target variable, then by virtue of the Hammersley-Clifford theorem, a correspondingly factorized simple log-linear model without interaction terms is sufficiently large to represent the joint distribution [7,9]. Thus, if the likelihood ratio test of this null-hypothesis with respect to an appropriate log-linear model leads to its reasonable rejection, then the assumption of conditional independence can be rejected, too. This test does not rely on any assumption involving the normal distribution, as the omnibus tests [18,19] do.
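For two indicator predictors, a likelihood ratio test of this kind can be sketched without any statistical library. The sketch below compares observed counts with the counts expected under conditional independence given $T$ (i.e., against the saturated model, which is an assumption about the "appropriate" reference model); the statistic then has two degrees of freedom, for which the chi-square tail probability is exactly exp(−G²/2). The contingency tables are hypothetical.

```python
import math
from itertools import product


def g2_conditional_independence(counts):
    """Likelihood ratio statistic for H0: B1 and B2 independent given T.

    counts[(b1, b2, t)] holds the number of objects with that combination.
    Returns (G2, degrees of freedom, p-value) for indicator variables.
    """
    g2 = 0.0
    for t in (0, 1):
        n_t = sum(counts[(b1, b2, t)] for b1, b2 in product((0, 1), repeat=2))
        for b1, b2 in product((0, 1), repeat=2):
            n = counts[(b1, b2, t)]
            if n == 0:
                continue  # the term 0 * ln(0) is taken as 0
            n_b1 = counts[(b1, 0, t)] + counts[(b1, 1, t)]
            n_b2 = counts[(0, b2, t)] + counts[(1, b2, t)]
            expected = n_b1 * n_b2 / n_t
            g2 += 2.0 * n * math.log(n / expected)
    df = 2  # (2 - 1) * (2 - 1) for each of the two values of T
    p_value = math.exp(-g2 / 2.0)  # chi-square survival function for df = 2
    return g2, df, p_value


# Hypothetical table that factorizes exactly given T ...
independent = {(1, 1, 1): 5, (1, 0, 1): 5, (0, 1, 1): 5, (0, 0, 1): 5,
               (1, 1, 0): 10, (1, 0, 0): 10, (0, 1, 0): 30, (0, 0, 0): 30}
# ... and one where B1 and B2 always agree whenever T = 1.
dependent = dict(independent)
dependent.update({(1, 1, 1): 10, (1, 0, 1): 0, (0, 1, 1): 0, (0, 0, 1): 10})

g2_ind, _, p_ind = g2_conditional_independence(independent)
g2_dep, _, p_dep = g2_conditional_independence(dependent)
print(f"factorizing table: G2 = {g2_ind:.3f}, p = {p_ind:.3g}")
print(f"violating table:   G2 = {g2_dep:.3f}, p = {p_dep:.3g}")
```

A large p-value means that conditional independence cannot reasonably be rejected; a small one indicates that an interaction term is called for.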
These omnibus tests use deviations of a characteristic of a fitted model $F(\hat\theta \mid b_i, t_i, i = 1, \ldots, n)$ from properties of the mathematical model $F(\theta \mid B)$ known from probability and mathematical statistics, to infer on the validity of the modeling assumption of conditional independence. The omnibus tests take as characteristic the mean of conditional probabilities over all objects, i.e., pixels or voxels, in the training dataset of sample size $n$,
$$\frac{1}{n} \sum_{i=1}^{n} \hat P(T = 1 \mid b_i).$$
Thus, for a proper ("true") model, the mean of $\hat P(T = 1 \mid b_i)$ is equal to $P(T = 1)$, estimated by the relative frequency of $T = 1$ in the training dataset:
$$\frac{1}{n} \sum_{i=1}^{n} \hat P(T = 1 \mid b_i) = \hat P(T = 1). \qquad (7)$$
Deviations of the mean from $P(T = 1)$ would indicate that the model may not be true. For a weights-of-evidence model, deviations could be caused by a lack of conditional independence, while for a logistic regression model, Equation (7) is always satisfied (up to numerical accuracy). Based on Equation (7), [19] developed the omnibus test, and [18] the new omnibus test.
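The characteristic used by the omnibus tests can be computed by hand for a small table. The sketch below applies weights-of-evidence, with weights estimated by counting, to hypothetical counts in which the two predictors are strongly dependent given $T = 1$, and compares the mean predicted probability with the relative frequency of $T = 1$. It illustrates the characteristic only and does not implement either test statistic.

```python
import math

# Hypothetical counts[(b1, b2, t)]; B1 and B2 always agree when T = 1.
counts = {(1, 1, 1): 5, (1, 0, 1): 0, (0, 1, 1): 0, (0, 0, 1): 5,
          (1, 1, 0): 10, (1, 0, 0): 20, (0, 1, 0): 20, (0, 0, 0): 40}
n = sum(counts.values())
n_t1 = sum(c for (_, _, t), c in counts.items() if t == 1)
n_t0 = n - n_t1


def cond_freq(index, value, t):
    """Relative frequency of predictor number `index` taking `value` given T = t."""
    n_t = n_t1 if t == 1 else n_t0
    hits = sum(c for key, c in counts.items() if key[index] == value and key[2] == t)
    return hits / n_t


def wofe_probability(b1, b2):
    """Weights-of-evidence prediction of P(T=1 | B1=b1, B2=b2)."""
    log_odds = math.log(n_t1 / n_t0)  # prior log-odds
    for index, value in ((0, b1), (1, b2)):
        log_odds += math.log(cond_freq(index, value, 1) / cond_freq(index, value, 0))
    return 1.0 / (1.0 + math.exp(-log_odds))


def counted_probability(b1, b2):
    """Conditional relative frequency of T = 1 obtained by elementary counting."""
    return counts[(b1, b2, 1)] / (counts[(b1, b2, 1)] + counts[(b1, b2, 0)])


mean_wofe = sum((counts[(b1, b2, 1)] + counts[(b1, b2, 0)]) * wofe_probability(b1, b2)
                for b1 in (0, 1) for b2 in (0, 1)) / n
relative_frequency = n_t1 / n
print(f"mean predicted probability: {mean_wofe:.5f}")
print(f"relative frequency of T=1:  {relative_frequency:.5f}")
for cell in ((1, 1), (1, 0), (0, 1), (0, 0)):
    print(cell, f"WofE {wofe_probability(*cell):.4f}", f"counted {counted_probability(*cell):.4f}")
```

In this constructed example the mean deviates from the relative frequency, and the per-cell comparison shows that the ordering of the predicted probabilities differs from the ordering of the counted frequencies as well.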
A more sophisticated statistical test for real predictor variables was recently suggested by [20].

Weights-of-Evidence vs. Logistic Regression
The parameters of ordinary logistic regression are equal to the contrasts of the weights, if all predictor variables are indicators and conditionally independent given the target variable. Thus, weights-of-evidence is the special case of ordinary logistic regression if the predictors $B$ are indicator variables and conditionally independent given $T$. Conversely, logistic regression is the canonical generalization of weights-of-evidence [14,17]. Note that the weights-of-evidence model cannot be enlarged to include interaction terms. Generally, i.e., without assuming conditional independence, the relationship of ordinary logistic regression parameters and the contrast of weights of evidence is clearly non-linear [21,22]; see Appendix 2 for an explicit derivation. When the conditional probabilities $\hat P(T = 1 \mid b_i)$, $i = 1, \ldots, n$, are estimated by maximum likelihood applied to the ordinary logistic regression model, Equation (7) always holds, because it is part of the maximum likelihood system of equations. Having recognized weights-of-evidence as a special case of logistic regression, when predictors are indicator variables and conditionally independent given the target variable, the above comparison may now be seen as checking the statistics of different models. Analogously, the estimated contrasts $\hat C_\ell$ of weights of evidence may be compared with the estimated logistic regression coefficients $\hat\beta_\ell$. Then, any deviation between them is indicative of violations of the modeling assumption of conditional independence.

Weights-of-Evidence vs. the τ- or ν-Model
The modeling assumption with respect to the factors of Equation (6) of the τ-model [23-25] is: Then, modified weights are defined as: and: The modeling assumption with respect to the factors of Equation (6) of the ν-model [26,27] is: Then, modified weights are defined as: $= \ln \nu^{(1)} + W^{(1)}$, $W = \ln \nu$, $\ell = 1, \ldots, m$, and: At this point, we may conclude that there is no way to emulate the effect of interaction terms of logistic regression models by manipulating the weights of evidence or their contrasts.

Artificial Neural Nets
General regression models can be tackled by various approaches of statistical learning, including artificial neural nets [15]. With respect to artificial neural nets and statistical learning [5,15,16,28], the logistic regression model, Equation (1), is called a single-layer perceptron or single-layer feed-forward artificial neural net; minimization of the sum of squared residuals is referred to as training; gradient methods to solve for the model parameters are referred to as the linear perceptron training rule; the stepsize along the negative gradient is called the learning rate, etc. The notions of random variables, conditional independence, estimation and estimation error and the significance of model parameters do not seem to exist in the realm of artificial neural nets, not even under new labels. Nevertheless, artificial neural nets may be seen as providing a generalization to enlarge the logistic regression model by way of nesting logistic regression models with or without interaction terms. A minor additional generalization is to replace the logistic function $\Lambda$ by other sigmoidal functions and to model the conditional probability of a categorical variable $T$ (of more than two categories) given predictor variables $B$.
The basic multi-layer neural network model can be described as a sequence of functional transformations [15,29-31], often depicted as a graph:
• input: predictor variables $B$;
• first layer: linear combinations $A_j^{(1)}$, $j = 1, \ldots, J$, of the predictor variables $B$, referred to as input units or activations:
$$A_j^{(1)} = \sum_{\ell=0}^{m} w_{j\ell}^{(1)} B_\ell, \qquad (8)$$
or, to mimic interaction terms:
$$A_j^{(1)} = \sum_{\ell=0}^{m} w_{j\ell}^{(1)} B_\ell + \sum_{\ell < \ell'} w_{j\ell\ell'}^{(1)} B_\ell B_{\ell'} + \ldots; \qquad (9)$$
• hidden layer: each of them is subject to a transformation applying a nonlinear differentiable activation function $h$ usually of sigmoidal shape, referred to as hidden units:
$$Z_j = h\big(A_j^{(1)}\big), \quad j = 1, \ldots, J;$$
• second layer: linear combinations $A_k^{(2)}$, $k = 1, \ldots, K$, of hidden units, referred to as output unit activations:
$$A_k^{(2)} = \sum_{j=1}^{J} w_{kj}^{(2)} Z_j;$$
• output: each of the output unit activations is subject to an activation function $S$, e.g., the logistic function:
$$S\big(A_k^{(2)}\big), \quad k = 1, \ldots, K.$$
Then:
$$P(T_k = 1 \mid B) = S\Big(\sum_{j=1}^{J} w_{kj}^{(2)}\, h\Big(\sum_{\ell=0}^{m} w_{j\ell}^{(1)} B_\ell\Big)\Big). \qquad (10)$$
If $K = 1$ and $S = \Lambda$, $h = \mathrm{id}$, $J = 0$, then we are of course back to ordinary logistic regression, Equation (1). On the other hand, in Equation (10), the linear combination of predictors, as given by Equation (8), can easily be replaced by the enlarged combination, including interaction terms, as given by Equation (9), where the consideration of interaction terms requires a more versatile net topology. The lack of the notion of significant parameters and, in turn, significant models is prohibitive of the successive construction of proper models. Instead, all variables and a sufficiently versatile net topology are plugged in, and coefficients for all variables are determined numerically with some gradient method. This procedure does not seem to meet the idea of parsimonious models.
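The sequence of transformations described above can be written as a short forward pass. The following sketch uses plain Python lists; the weight layout (`w1` with one row per hidden unit, `w2` with one row per output unit) and all numerical values are assumptions made for illustration.

```python
import math


def logistic(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(b, w1, w2, h=math.tanh, s=logistic):
    """Forward pass of a net with one hidden layer.

    b  : predictor values including the constant b[0] = 1
    w1 : J rows of first-layer weights, one row per hidden unit
    w2 : K rows of second-layer weights, one row per output unit
    h  : hidden activation function, s : output activation function
    """
    a1 = [sum(w * x for w, x in zip(row, b)) for row in w1]    # first layer
    z = [h(a) for a in a1]                                     # hidden units
    a2 = [sum(w * z_j for w, z_j in zip(row, z)) for row in w2]  # second layer
    return [s(a) for a in a2]                                  # output


b = [1.0, 1.0, 0.0]  # hypothetical object: constant, B1 = 1, B2 = 0

# A net with two hidden units and one output (hypothetical weights).
p_net = forward(b, w1=[[-1.0, 2.0, 0.5], [0.3, -1.2, 0.8]], w2=[[1.5, -0.7]])

# With an identity hidden activation and one hidden unit, the net collapses
# to a logistic regression with coefficients given by the first-layer row.
beta = [-2.0, 1.0, 0.5]
p_collapsed = forward(b, w1=[beta], w2=[[1.0]], h=lambda a: a)
p_logreg = logistic(sum(beta_l * b_l for beta_l, b_l in zip(beta, b)))
print(p_net, p_collapsed, p_logreg)
```

The collapsed case shows in which sense a net nests logistic regression: only the nonlinear hidden activation and the additional units go beyond it.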

Balancing
Methods of statistical learning are prone to fail if the odds $O(T = 1)$ are too small, cf. [32-37]. Simple balancing mimics preferential sampling, i.e., a new balanced dataset is constructed by weighting all objects with $T = 1$ with a weight $1 < \mu \in \mathbb{R}$. This kind of balancing immediately results in:
$$O_{\mathrm{bal}}(T = 1) = \mu\, O(T = 1),$$
and then in:
$$\operatorname{logit} P_{\mathrm{bal}}(T = 1 \mid B) = \ln \mu + \operatorname{logit} P(T = 1 \mid B).$$
If $F(\theta \mid B_{\mathrm{bal}})$ is a proper ("true") model for the balanced sample, then:
$$F(\theta \mid B_{\mathrm{bal}}) - \ln \mu$$
is a proper model of $\operatorname{logit} P(T = 1 \mid B)$ with respect to the initial sample, and:
$$F(\theta \mid B) = F(\theta \mid B_{\mathrm{bal}}) - \ln \mu, \qquad (11)$$
whatever the proper models are. For instance, if ordinary logistic regression without interaction terms, i.e., assuming conditional independence, is a proper model for the balanced sample, then:
$$\beta_0 - \ln \mu + \sum_{\ell=1}^{m} \beta_\ell B_\ell$$
is a proper model for the initial sample. Thus, balancing by weighting objects, pixels or voxels, supporting $T = 1$ with $\mu > 1$, does exactly what it is designed for: it increases the odds $O(T = 1)$ by a factor of $\mu$ and preserves proper models, i.e., their parameters, otherwise.
It is emphasized that Equation (11) holds for mathematical models, if they are proper. Then, Equation (11) may be read as back transformation of balancing by weighting objects with $T = 1$ with weight $\mu$. It may not hold for fitted models, i.e., estimation of the parameters of a poor model may corrupt the equation; then, $F(\hat\theta \mid B) \neq F(\hat\theta \mid B_{\mathrm{bal}}) - \ln \mu$. Note that the weights of evidence are not at all affected by this kind of balancing.
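The effect of weighting all objects with $T = 1$ can be verified on counts. The sketch below uses a hypothetical table for a single indicator predictor and an arbitrary weight; it checks that every conditional logit obtained by counting is shifted by ln µ, while the contrast of the weights of evidence stays the same.

```python
import math

# Hypothetical counts[(b, t)] for one indicator predictor B and target T.
counts = {(1, 1): 10, (1, 0): 10, (0, 1): 4, (0, 0): 36}


def weighted(counts, mu):
    """Weight every object with T = 1 by mu; objects with T = 0 keep weight 1."""
    return {(b, t): (mu * c if t == 1 else c) for (b, t), c in counts.items()}


def conditional_logit(counts, b):
    """logit of P(T=1 | B=b) estimated by counting."""
    return math.log(counts[(b, 1)] / counts[(b, 0)])


def contrast(counts):
    """Contrast W1 - W0 of the weights of evidence estimated by counting."""
    n_t1 = counts[(1, 1)] + counts[(0, 1)]
    n_t0 = counts[(1, 0)] + counts[(0, 0)]
    w1 = math.log((counts[(1, 1)] / n_t1) / (counts[(1, 0)] / n_t0))
    w0 = math.log((counts[(0, 1)] / n_t1) / (counts[(0, 0)] / n_t0))
    return w1 - w0


MU = 10.0  # hypothetical balancing weight
balanced = weighted(counts, MU)
shifts = [conditional_logit(balanced, b) - conditional_logit(counts, b) for b in (0, 1)]
print(f"logit shifts: {shifts}, ln(mu) = {math.log(MU):.4f}")
print(f"contrast before: {contrast(counts):.4f}, after: {contrast(balanced):.4f}")
```

Subtracting ln µ from the logits of the balanced sample therefore recovers the logits of the initial sample, at least for the counted (saturated) estimates used here.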

Numerical Complexity of Logistic Regression
Potential modeling with logistic regression using a 3D training dataset of $n$ voxels and $(m + 1)$ predictor variables to fit the regression parameters requires solving a system of $(m + 1)$ non-linear equations. Usually, the total number of predictor variables is much smaller than the total number of voxels. Statisticians' numerical method of choice is iteratively reweighted least squares. The numerical complexity of one iteration step is of the order of $2n(m + 1)^2$ flops; the total number of iterations cannot generally be estimated. Considering the size of the problem for 3D geomodels with a reasonable spatial resolution clearly indicates that its numerical solution requires a highly efficient data management of 3D geomodels in voxel mode and very fast numerics based on massively parallel processing.
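To get a feeling for the order of magnitude, the per-iteration estimate can be evaluated for a concrete problem size. The geomodel dimensions and the number of predictors below are hypothetical.

```python
def irls_flops_per_iteration(n: int, m: int) -> int:
    """Order-of-magnitude flop count 2 n (m + 1)^2 of one IRLS iteration."""
    return 2 * n * (m + 1) ** 2


# Hypothetical 3D geomodel of 1000 x 1000 x 100 voxels with m = 10 predictors.
n_voxels = 1000 * 1000 * 100
flops = irls_flops_per_iteration(n_voxels, m=10)
print(f"{n_voxels:.1e} voxels: about {flops:.2e} flops per iteration")
```

The count grows linearly in the number of voxels but quadratically in the number of predictors, so adding interaction terms is considerably more expensive than refining the grid by the same factor.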

Examples
Both datasets are fabricated to serve certain purposes. The mathematical assumption associated with a training dataset is complete knowledge, i.e., in particular, we assume that we know all occurrences of the target variable $T = 1$. Otherwise, not even the odds or logits of $T = 1$ could be estimated properly. Thus, previously unknown occurrences or their probabilities cannot be predicted with respect to the training dataset. Comparing the estimated conditional probabilities with counted conditional frequencies provides a check of the appropriateness of the applied model. A more powerful check is to use only part of the data out of the training dataset to estimate the parameters of a model and, then, to validate the model with the remaining data that were not used before. All computations were done with the free statistical software, R [38].

Dataset RANKIT Revisited
A first presentation and discussion of the dataset RANKIT (Figure 1) has been given in [9]. The RANKIT dataset comprises two predictor variables $B_1$, $B_2$ and a target variable $T$ referring to pixels of a digital map image. The predictor variables $B_1$, $B_2$ are uncorrelated and not conditionally independent given the target variable $T$. Here, the example is completed by considering a randomly rearranged dataset RANKITMIX (Figure 1), which originates from the dataset RANKIT by rearranging the pixel references $(i, j)$ of triplets $(b_{k1}, b_{k2}, t_k)$, $k = 1, \ldots, n$, of realizations of $B_1$, $B_2$ and $T$ in the dataset at random. The uni-directional variograms of Figure 1 clearly indicate that the two datasets differ in their spatial statistics.
However, the datasets RANKIT and RANKITMIX share identical ordinary statistics, such as the contingency tables (Tables 1 and 2) and the correlation matrix (Table 3).
Table 2. Contingency tables of T and B 1 and B 2 , respectively, of datasets RANKIT and RANKITMIX, respectively.
The indicator predictor variables $B_1$ and $B_2$ seem to be uncorrelated, while $B_1$ and $T$, and $B_2$ and $T$, respectively, are significantly correlated for all significance levels $\alpha > 0.002213$ and $\alpha > 0.02101$, respectively; cf. Table 3. Therefore, for both the RANKIT and the RANKITMIX dataset, respectively, the models of weights-of-evidence, ordinary logistic regression without interaction term and enlarged logistic regression with the interaction term read explicitly: oLogReg: LogRegwI:

Table 4. Comparison of conditional probabilities predicted with elementary counting, weights-of-evidence (WofE), ordinary logistic regression without interaction terms (oLogReg), logistic regression including interaction terms (LogRegwI), and neural nets using a genetic algorithm (ANNGA [38]) applied to the training dataset RANKIT. Since the mathematical modeling assumption of conditional independence is violated, only logistic regression with interaction terms yields a proper model and predicts the conditional probabilities almost exactly.

The results of weights-of-evidence, logistic regression with or without interaction terms and artificial neural net applied to the fabricated datasets RANKIT and RANKITMIX, respectively, are summarized in Table 4. Figure 2 depicts the results of dataset RANKIT, and Figure 3 depicts the results of dataset RANKITMIX.
Obviously, the digital map images of Figures 2 and 3 are related to each other by the same rearrangement as the datasets RANKIT and RANKITMIX of the top row of Figure 1. This relationship can be depicted like a commutative diagram (Figure 4), for instance with respect to logistic regression, including interaction terms. To state it explicitly, each of the methods of targeting considered here commutes with any random rearrangement applied simultaneously to all digital map images involved in or resulting from targeting. Thus, targeting and potential modeling, respectively, are not spatial methods; they do not employ spatially-induced dependencies, which have been shown to be different by looking at the semi-variograms of the datasets (Figure 1).
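The commutation property can be checked with any non-spatial estimator. The sketch below uses elementary counting of conditional frequencies on a hypothetical random "map" stored as a flat list of triplets, rearranges the pixels at random, and compares both the fitted model and the predicted map; the data are synthetic and unrelated to RANKIT.

```python
import random
from collections import Counter


def fit_by_counting(triplets):
    """Conditional relative frequencies of T = 1 for each observed (b1, b2)."""
    n_all = Counter((b1, b2) for b1, b2, _ in triplets)
    n_pos = Counter((b1, b2) for b1, b2, t in triplets if t == 1)
    return {key: n_pos[key] / n for key, n in n_all.items()}


def predict(model, triplets):
    """Predicted probability for every pixel, in pixel order."""
    return [model[(b1, b2)] for b1, b2, _ in triplets]


rng = random.Random(0)
# Hypothetical map of 200 pixels, each carrying a triplet (b1, b2, t).
pixels = [(rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1)) for _ in range(200)]

# Random rearrangement applied simultaneously to all three map images.
perm = list(range(len(pixels)))
rng.shuffle(perm)
rearranged = [pixels[i] for i in perm]

model = fit_by_counting(pixels)
model_mix = fit_by_counting(rearranged)

same_model = model == model_mix
# Rearranging the prediction of the original map gives the prediction of the rearranged map.
commutes = predict(model_mix, rearranged) == [predict(model, pixels)[i] for i in perm]
print(f"identical fitted model: {same_model}, prediction commutes: {commutes}")
```

The fitted model only sees the multiset of triplets, so the pixel arrangement cannot influence it; the predicted map simply inherits the rearrangement.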
After balancing with $\mu = 10$, the models of weights-of-evidence and enlarged logistic regression with interaction terms read explicitly:

Dataset DFQR
The dataset DFQR is visualized as a digital map image in Figure 5. The contingencies are given in Tables 5 and 6.
Table 6. Contingency tables of $T$ and $B_1$ and $B_2$, respectively, of dataset DFQR.
The correlation matrix (Table 7) indicates that $B_1$ and $B_2$ are uncorrelated, and significantly correlated with $T$ for all significance levels $\alpha > 0.001652$. The test of conditional independence referring to log-linear models (Table 8) shows that the null-hypothesis of conditional independence of $B_1$ and $B_2$ given $T$ cannot reasonably be rejected. The corresponding conditional relative frequencies factorize almost exactly, i.e., and: With $O(T = 1) = 0.1904$, $\operatorname{logit} P(T = 1) = -1.6582$, the weights-of-evidence model reads explicitly: The ordinary logistic regression model without interaction terms reads explicitly: where:
• $\beta_0$ is significant for all $\alpha > 1.12\mathrm{e}{-08}$, and
• $\beta_1$, $\beta_2$ are significant for all $\alpha > 0.00651$.
The two models are almost identical; small deviations of their parameters result from small violations of conditional independence. While the test with $p = 0.999950$ indicates that the null-hypothesis of conditional independence cannot reasonably be rejected, the conditional relative frequencies do not factorize perfectly, but only approximately. The conditional probabilities estimated with weights-of-evidence or ordinary logistic regression almost exactly recover the conditional probabilities estimated elementarily by counting conditional frequencies for the training dataset DFQR; cf. Table 9.

Conclusions
Targeting or potential modeling applies regression or regression-like models to estimate the conditional probability of a target variable given predictor variables. All models considered here:
• assume independent identically distributed random predictor and target variables, respectively, i.e., all models are non-spatial and do not consider spatially-induced dependencies, as, for instance, geostatistics does; therefore, rearranging the dataset at random results in random map images or geomodels, but does not change the fitted models.
• are not pointwise; they involve random variables referring to locations given in terms of areal pixels of 2D digital map images or volumetric voxels of 3D geomodels; thus, their results depend on the spatial resolution of the map image or the geomodel, respectively.
• require a training region to fit the model parameters; that is to say that the mathematical modeling assumption associated with the training region is that it provides "ground truth".
Then, the models can be put in a hierarchy, beginning with the naive Bayesian model of weights-of-evidence depending on the modeling assumption of conditional independence of all predictor variables given the target variable.It is the special case of the logistic regression model if the predictor variables (i) are indicator or discrete random variables and (ii) conditionally independent given the target variable.In this case, the contrasts of weights-of-evidence are identical to the logistic regression coefficients.Otherwise, there is no linear relationship between weights of evidence and logistic regression parameters.
The canonical generalization of the naive Bayesian model featuring weights of evidence to the case of lacking conditional independence is logistic regression, including interaction terms. If the interaction terms are chosen to correspond to violations of conditional independence, they compensate exactly for these violations, if the predictor variables are discrete; for continuous predictor variables, they compensate exactly only if the joint probability is log-linear; otherwise, they may compensate approximately. Thus, in the case of discrete predictor variables, the logistic regression model is optimum.
Applying weights-of-evidence despite lacking conditional independence corrupts both the predicted conditional probabilities and their rank-transforms. There is no way to emulate the effect of interaction terms by "correcting" the weights of evidence subsequently, e.g., by powering or multiplying with some τ- or ν-coefficients.
To further enlarge the models, nesting logistic regression-like models is an option. Irrespective of the vocabulary, nesting giving rise to "hidden layers" is the hard core of artificial neural nets. If the configuration of the net topology is sufficiently versatile, artificial neural net models can compensate for the lack of conditional independence, much in the same way as logistic regression models, including interaction terms. When the odds of the target are too small, some "balancing" may be required. A simple balancing method was shown to leave the model parameters unchanged, if the model itself is proper.
The possibility to include interaction terms in logistic regression models or other models originating in statistical learning opens a promising route toward an effective means to abandon the severe modeling assumption of conditional independence and to cope with the lack of conditional independence in practice.
A promising future perspective is regression accounting for spatially-induced dependencies to get rid of (i) the training region partitioned in pixels or voxels and the dependence on the spatial resolution that they provide; and (ii) the modeling assumption of independent identically distributed random variables.

with $W^{(1)} = \ln S$, $W^{(0)} = \ln N$, if Equation (5) holds. Then, the weights-of-evidence model in terms of contrasts, Equation (2), is derived from its initial representation in terms of weights as follows: Thus, for the slightly more general Equation (15) than Equation (13), the initial correspondence Equation (3) becomes a little bit more involved, i.e., with the notation of Equation (14 for all $B_j$, $j = 1, \ldots, m$, $j \neq \ell$, fixed, cf. [8]. Applying Bayes' formula and the assumption of the conditional independence of $B$ given $T$, the left-hand side of Equation (17) simplifies to the contrast $C$ of weights of $W^{(1)}$ and $W^{(0)}$. In this way, Equation (17

Figure 1. Spatial distribution of two indicator predictor variables $B_1$, $B_2$ and the indicator target variable $T$ of the dataset RANKIT and two uni-directional semi-variograms (left); and the spatial distribution of two indicator predictor variables $B_1$, $B_2$ and the indicator target variable $T$ of the dataset RANKITMIX and two uni-directional semi-variograms (right), revealing different spatial distributions and different geostatistical characteristics than RANKIT. The red lines indicate the values of the classical sample variances.

Figure 2. Spatial distribution of predicted conditional probabilities $P(T = 1 \mid B_1 B_2)$ for the training dataset RANKIT according to: elementary estimation (top left); logistic regression with interaction term (top center); artificial neural net ANNGA of R (top right); weights-of-evidence (bottom left); logistic regression without interaction (bottom right).

Figure 4. Commutation of targeting and simultaneous random rearrangement of all digital map images.

Figure 5. Spatial distribution of two indicator predictor variables $B_1$, $B_2$ and the indicator target variable $T$ of dataset DFQR.

Figure 6. Spatial distribution of predicted conditional probabilities $P(T = 1 \mid B_1 B_2)$ for the training dataset DFQR according to: elementary estimation (top left); weights-of-evidence (top center); artificial neural net ANNGA of R (top right); ordinary logistic regression (bottom left); logistic regression with interaction term (bottom right).

Table 1. Unconditional contingency table of $B_1$ and $B_2$, and conditional contingency tables of $B_1$ and $B_2$ given $T$ of datasets RANKIT and RANKITMIX, respectively.

Table 3. Correlation matrix of the datasets RANKIT and RANKITMIX, respectively.

Table 5. Unconditional contingency table of $B_1$ and $B_2$, and conditional contingency tables of $B_1$ and $B_2$ given $T$ of dataset DFQR.

Table 7. Correlation matrix of dataset DFQR.

Table 8. Significance test of the null-hypothesis of conditional independence, referring to a log-linear model for dataset DFQR.