# Measuring Interactions in Categorical Datasets Using Multivariate Symmetrical Uncertainty


## Abstract


## 1. Introduction

**Multiple Correlation**. In the multivariate world, many observed phenomena require a nonlinear model; hence, a good measure of correlation should be able to detect both linear and nonlinear correlations. The so-called Coefficient of Multiple Correlation ${R}^{2}$ is computed in multiple regression from the square matrix ${R}_{xx}$ formed by all the paired correlations between variables [3]. It measures how well a given variable can be predicted using a linear function of the set of other variables. In effect, $R$ measures the linear correlation between the observed and the predicted values of the target attribute or response Y.
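As a quick numerical check (synthetic data and variable names of my own, not from the paper), the two readings of ${R}^{2}$ — the fraction of variance explained by the linear fit, and the squared linear correlation between observed and predicted values — coincide for ordinary least squares with an intercept:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                                  # three predictors
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200)    # linear response plus noise

# Ordinary least squares fit of y on X1..X3 (with intercept)
A = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta

# R^2 as variance explained by the fit ...
ss_res = float(np.sum((y - y_hat) ** 2))
ss_tot = float(np.sum((y - y.mean()) ** 2))
r2_fit = 1 - ss_res / ss_tot

# ... equals the squared correlation between observed and predicted values
r2_corr = float(np.corrcoef(y, y_hat)[0, 1] ** 2)
print(abs(r2_fit - r2_corr) < 1e-9)  # True
```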

**Interaction.** Consider a pure multivariate linear regression model of a continuous random variable Y explained by a set of continuous variables ${X}_{1},{X}_{2},\dots ,{X}_{n}$. From here on, we adopt the statistical usage whereby capital letters refer to random variables and the corresponding lowercase letters refer to particular values or outcomes observed. Each outcome ${y}_{i}$ is modeled as a linear combination of the observed variable values [31], ${y}_{i}={\beta }_{0}+{\beta }_{1}{x}_{1i}+\cdots +{\beta }_{n}{x}_{ni}+{\varepsilon }_{i}$.

**Contributions**. The main contribution of this paper is a formalization of the concept of interaction for both continuous and categorical responses. Interaction is often found in Multiple Linear Regression [31] and Analysis of Variance models [34], where it is described as a departure from the linearity of the effect of each variable. In an all-categorical context, however, no such definition of interaction exists. This work proposes a definition that is facilitated by the MSU measure and shows that it is suitable for both types of variables. The second aim of the work is the detection and quantification of interactions in any group of features of a categorical dataset.

## 2. Patterned Records and the Detection of Interactions

**Definition 1.**

**Definition 2.**

**Example 1.**

**Definition of MSU.** Let ${X}_{i}$ be a categorical (discrete) random variable with cardinality $c\left({X}_{i}\right)\in \mathbb{N}$, and possible values ${x}_{ij}$ with $j=\{1,\dots ,c\left({X}_{i}\right)\}$. Let $P\left({X}_{i}\right)$ be its probability mass function. The entropy H of the individual variable ${X}_{i}$ is a measure of the uncertainty in predicting the value of ${X}_{i}$ and is defined as $H({X}_{i})=-{\sum }_{j=1}^{c({X}_{i})}P({x}_{ij})\,{\mathrm{log}}_{2}\,P({x}_{ij})$.
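The entropy of an observed attribute can be estimated directly from value frequencies. A minimal sketch (the helper name `entropy` is mine):

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy H (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

print(entropy([0, 1, 0, 1]))  # → 1.0 (a fair binary variable carries one bit)
```

A constant attribute has zero entropy: its value is predicted with no uncertainty.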

- (a) The MSU values are in the unit range, $MSU\left({X}_{1:n}\right)\in [0,1]$;
- (b) Higher values of the measure correspond to higher correlation among variables, i.e., a value of 0 implies that all variables are independent, while a value of 1 corresponds to a perfect correlation among variables; and
- (c) MSU detects linear and nonlinear correlations between any mix of categorical and/or discretized numerical variables.
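These properties can be checked numerically. The sketch below uses the normalized-total-correlation form of MSU, $MSU({X}_{1:n})=\frac{n}{n-1}\left(1-H({X}_{1},\dots ,{X}_{n})/{\sum }_{i}H({X}_{i})\right)$, which reduces to the classical symmetrical uncertainty for $n=2$ and reproduces the values in Table 1; treat it as an illustration rather than the authors' code:

```python
import math
from collections import Counter

def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def msu(cols):
    """MSU(X_1..X_k) = k/(k-1) * (1 - H(joint) / sum of marginal entropies)."""
    k = len(cols)
    h_joint = entropy(list(zip(*cols)))      # joint entropy of the k-tuples
    h_marg = sum(entropy(c) for c in cols)   # sum of the marginal entropies
    return (k / (k - 1)) * (1 - h_joint / h_marg)

# 3-way XOR with equally likely rows (first scenario of Table 1)
A = [0, 0, 1, 1]
B = [0, 1, 0, 1]
C = [a ^ b for a, b in zip(A, B)]
print(round(msu([A, B, C]), 6))  # → 0.5
```

For the uniform 3-way XOR, every marginal carries one bit while the joint carries two, so the measure is $\frac{3}{2}(1-2/3)=0.5$, in agreement with the first scenario of Table 1.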

**Interaction among continuous variables.** Let us begin with a two-variable example. Consider the regression model with a product (interaction) term, $y={\beta }_{0}+{\beta }_{1}{x}_{1}+{\beta }_{2}{x}_{2}+{\beta }_{3}{x}_{1}{x}_{2}+\varepsilon$.
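In this continuous setting, an interaction reveals itself as a gain in ${R}^{2}$ when a product term is added to the additive model. A synthetic sketch (coefficients, noise level, and sample size are illustrative only, not the paper's body-fat data):

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)
# Response with a genuine x1*x2 interaction term
y = 1.0 + x1 + x2 + 2.0 * x1 * x2 + rng.normal(scale=0.3, size=300)

def r_squared(design, y):
    """R^2 of an ordinary least squares fit for the given design matrix."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

ones = np.ones_like(y)
r2_additive = r_squared(np.column_stack([ones, x1, x2]), y)            # main effects only
r2_with_int = r_squared(np.column_stack([ones, x1, x2, x1 * x2]), y)   # plus product term
print(r2_with_int > r2_additive)  # True: the product term captures the interaction
```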

**Interaction among categorical variables.** Categorical or nominal features are also employed to build various types of multivariate models with a categorical response. Established modeling techniques include, for example, Categorical Principal Components Analysis, Multiple Correspondence Analysis, and Multiple Factor Analysis [36]. In this realm, we can measure the strength of association between two, three, or more categorical variables by means of both MSU and the study of the patterns’ behavior; this will, in turn, allow us to detect interactions.

## 3. Simulations Using Patterns

#### 3.1. Three-Way XOR

#### 3.2. Four-Way XOR

#### 3.3. Four-Way AND

#### 3.4. Further Simulations

#### 3.5. Discussion and Interpretation of Results

**Definition 3.** The **gain in multiple correlation** obtained by adding B (or BC) to AC, forming ABC, is defined as the difference $G=MSU\left(ABC\right)-MSU\left(AC\right)$.

**Definition 4.** Let $\mathcal{A}\subset \mathcal{C}$ range over the subsets of j variables of $\mathcal{C}$. We define the **interaction** among variables in $\mathcal{C}$ on top of j variables as ${M}_{L}\left(\mathcal{C}\right)=MSU\left(\mathcal{C}\right)-{\mathrm{max}}_{\mathcal{A}}\,MSU\left(\mathcal{A}\right)$.
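Reading the interaction ${M}_{L}$ together with the values reported in Table 11, it can be computed as the MSU of the full set minus the best MSU among its j-variable subsets. The sketch below applies this reading to the 3-way XOR; it is an inferred operationalization, not the authors' implementation:

```python
import math
from collections import Counter
from itertools import combinations

def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def msu(cols):
    k = len(cols)
    h_joint = entropy(list(zip(*cols)))
    h_marg = sum(entropy(c) for c in cols)
    return (k / (k - 1)) * (1 - h_joint / h_marg)

def interaction(cols, j):
    """M_L: MSU of the full set minus the best MSU among its j-variable subsets."""
    best = max(msu([cols[i] for i in idx])
               for idx in combinations(range(len(cols)), j))
    return msu(cols) - best

# 3-way XOR: every pair looks independent, yet the triple is correlated
A = [0, 0, 1, 1]
B = [0, 1, 0, 1]
C = [a ^ b for a, b in zip(A, B)]
print(round(interaction([A, B, C], 2), 3))  # → 0.5
```

Every pair of XOR variables has a pairwise MSU of 0, while the triple attains an MSU of 0.5, so all of that correlation is a genuine 3-way interaction.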

**Example 2.**

**Complexity of Interaction Calculation**. The following cost analysis proceeds module by module. In a dataset of r observation rows on n variables, let ${c}_{i}$ be the cardinality of the i-th variable. The two sets being considered are $\mathcal{C}$ with k variables and $\mathcal{A}$ with j variables, such that $\mathcal{A}\subset \mathcal{C}$.

- Entropy of each attribute—For each attribute ${X}_{i}$, there are ${c}_{i}$ frequencies $P\left({x}_{i}\right)$ and ${c}_{i}$ logarithms ${log}_{2}\left(P\left({x}_{i}\right)\right)$, which are multiplied according to Equation (4), giving $3{c}_{i}$ operations. This is done k times, giving $3{\sum}_{1}^{k}{c}_{i}$ operations in total.
- Joint entropy of all k attributes—There are ${\prod}_{1}^{k}{c}_{i}$ combinations of values, and for each of them the frequency and its logarithm are calculated and multiplied according to Equation (5), giving $3{\prod}_{1}^{k}{c}_{i}$ operations. This is done once.
- $msucost$($\mathcal{C}$)—Using Equation (6), the costs of the numerator and the denominator are added, followed by one division and one difference. This gives $3{\sum}_{1}^{k}{c}_{i}+3{\prod}_{1}^{k}{c}_{i}+2$ operations.
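The total above can be evaluated directly; `msucost_ops` is a hypothetical helper name of mine for the closed-form count $3{\sum}_{1}^{k}{c}_{i}+3{\prod}_{1}^{k}{c}_{i}+2$:

```python
from math import prod

def msucost_ops(cards):
    """Operation count of msucost over attributes with cardinalities `cards`."""
    return 3 * sum(cards) + 3 * prod(cards) + 2

# Three binary attributes: 3*(2+2+2) + 3*(2*2*2) + 2 = 18 + 24 + 2
print(msucost_ops([2, 2, 2]))  # → 44
```

The product term dominates: the count grows exponentially in k, which is why the number of variables considered jointly matters far more than their individual cardinalities.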

**Theorem 1.**

**Proof.**

**Definition 5.** The interaction computed from the pattern $\mathcal{P}$ alone, taking all distinct records as equally likely, is called the **intrinsic interaction** due to pattern $\mathcal{P}$.

## 4. Comparison with Interaction on Continuous Variables

- Discretize bf, st and mc;
- Take as pattern the set of distinct observed records, discretized;
- Simulate sampling scenarios to find ${M}_{L}$;
- Check whether the ${M}_{L}$ value reveals interactions.
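The first two steps can be sketched as follows. The equal-frequency (tercile) cutoffs are an assumption of this sketch — the paper's actual cutoff values are not reproduced here — and only the first six rows of Table 5 are used:

```python
def discretize(values, labels=("low", "med", "high")):
    """Equal-frequency (tercile) binning into three categories."""
    s = sorted(values)
    n = len(s)
    c1, c2 = s[n // 3], s[2 * n // 3]
    return [labels[0] if v < c1 else labels[1] if v < c2 else labels[2]
            for v in values]

# First six st.c and mc.c values from the body fat data (Table 5)
st_c = [-5.805, -0.605, 5.395, 4.495, -6.205, 0.295]
mc_c = [1.48, 0.58, 9.38, 3.48, 3.28, -3.92]
dst, dmc = discretize(st_c), discretize(mc_c)
print(dst)  # → ['low', 'med', 'high', 'high', 'low', 'med']

# The pattern is the set of distinct discretized records
pattern = sorted(set(zip(dst, dmc)))
print(len(pattern))  # → 3
```

The pattern then consists of the distinct discretized records, and the sampling scenarios assign different frequencies to those records before computing ${M}_{L}$.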

#### 4.1. Discretization

#### 4.2. Seeking Interaction in the Pattern

#### 4.3. Creating Ad Hoc Interaction

#### 4.4. Discretizing the Modified Data

Records marked with the superscript o have been recategorized merely because of the modified cutoff values. All this can be verified by comparing Table 9 with Table 6.

#### 4.5. Interaction in the New Pattern

## 5. Discussion on ${\mathbf{M}}_{\mathbf{L}}$ and Linear Models

## 6. Conclusions and Future Work

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Pearson, K. Note on regression and inheritance in the case of two parents. Proc. R. Soc. Lond. **1895**, 58, 240–242.
2. Spearman, C. The proof and measurement of association between two things. Am. J. Psychol. **1904**, 15, 72–101.
3. Crocker, D.C. Some Interpretations of the Multiple Correlation Coefficient. Am. Stat. **1972**, 26, 31–33.
4. Viole, F.; Nawrocki, D.N. Deriving Nonlinear Correlation Coefficients from Partial Moments. SSRN Electron. J. **2012**.
5. Colignatus, T. Correlation and Regression in Contingency Tables. A Measure of Association or Correlation in Nominal Data (Contingency Tables), Using Determinants. 2007. Available online: https://mpra.ub.uni-muenchen.de/3660/ (accessed on 30 September 2021).
6. McGill, W. Multivariate information transmission. Trans. IRE Prof. Group Inf. Theory **1954**, 4, 93–111.
7. Watanabe, S. Information theoretical analysis of multivariate correlation. IBM J. Res. Dev. **1960**, 4, 66–82.
8. Han, T.S. Multiple mutual informations and multiple interactions in frequency data. Inf. Control **1980**, 46, 26–45.
9. Williams, P.L.; Beer, R.D. Nonnegative Decomposition of Multivariate Information. 2010. Available online: https://arxiv.org/abs/1004.2515 (accessed on 30 September 2021).
10. Lizier, J.T.; Heinzle, J.; Horstmann, A.; Haynes, J.D.; Prokopenko, M. Multivariate information-theoretic measures reveal directed information structure and task relevant changes in fMRI connectivity. J. Comput. Neurosci. **2011**, 30, 85–107.
11. Timme, N.; Alford, W.; Flecker, B.; Beggs, J.M. Synergy, redundancy, and multivariate information measures: An experimentalist’s perspective. J. Comput. Neurosci. **2014**, 36, 119–140.
12. Sakhanenko, N.A.; Galas, D.J. Biological data analysis as an information theory problem: Multivariable dependence measures and the Shadows algorithm. J. Comput. Biol. **2015**, 22, 1005–1024.
13. Mohammadi, S.; Desai, V.; Karimipour, H. Multivariate mutual information-based feature selection for cyber intrusion detection. In Proceedings of the 2018 IEEE Electrical Power and Energy Conference (EPEC), Toronto, ON, Canada, 10–11 October 2018; pp. 1–6.
14. Cerf, N.J.; Adami, C. Negative entropy and information in quantum mechanics. Phys. Rev. Lett. **1997**, 79, 5194.
15. Chanda, P.; Zhang, A.; Brazeau, D.; Sucheston, L.; Freudenheim, J.L.; Ambrosone, C.; Ramanathan, M. Information-theoretic metrics for visualizing gene-environment interactions. Am. J. Hum. Genet. **2007**, 81, 939–963.
16. Jakulin, A.; Bratko, I.; Smrke, D.; Demšar, J.; Zupan, B. Attribute interactions in medical data analysis. In Conference on Artificial Intelligence in Medicine in Europe; Springer: Berlin/Heidelberg, Germany, 2003; pp. 229–238.
17. Brenner, N.; Strong, S.P.; Koberle, R.; Bialek, W.; Steveninck, R.R.d.R.v. Synergy in a neural code. Neural Comput. **2000**, 12, 1531–1552.
18. Sosa-Cabrera, G.; García-Torres, M.; Gómez-Guerrero, S.; Schaerer, C.; Divina, F. A Multivariate approach to the Symmetrical Uncertainty Measure: Application to Feature Selection Problem. Inf. Sci. **2019**, 494, 1–20.
19. Ince, R.A. Measuring multivariate redundant information with pointwise common change in surprisal. Entropy **2017**, 19, 318.
20. Arias-Michel, R.; García-Torres, M.; Schaerer, C.; Divina, F. Feature Selection Using Approximate Multivariate Markov Blankets. In Proceedings of the Hybrid Artificial Intelligent Systems—11th International Conference, HAIS 2016, Seville, Spain, 18–20 April 2016; pp. 114–125.
21. Sosa-Cabrera, G.; Gómez-Guerrero, S.; Schaerer, C.; García-Torres, M.; Divina, F. Understanding a Version of Multivariate Symmetric Uncertainty to assist in Feature Selection. In Proceedings of the 4th Conference of Computational Interdisciplinary Science, São José dos Campos, Brazil, 7–10 November 2016; pp. 54–59.
22. Sosa-Cabrera, G.; García-Torres, M.; Gómez-Guerrero, S.; Schaerer, C.; Divina, F. Understanding a multivariate semi-metric in the search strategies for attributes subset selection. In Proceedings of the Brazilian Society of Computational and Applied Mathematics, Campinas, Brazil, 17–21 September 2018; Volume 6.
23. Kohavi, R.; John, G.H. Wrappers for feature subset selection. Artif. Intell. **1997**, 97, 273–324.
24. Yu, L.; Liu, H. Efficient feature selection via analysis of relevance and redundancy. J. Mach. Learn. Res. **2004**, 5, 1205–1224.
25. Janzing, D.; Minorics, L.; Blöbaum, P. Feature relevance quantification in explainable AI: A causal problem. In Proceedings of the International Conference on Artificial Intelligence and Statistics, PMLR, Online, 26–28 August 2020; pp. 2907–2916.
26. Zeng, Z.; Zhang, H.; Zhang, R.; Yin, C. A novel feature selection method considering feature interaction. Pattern Recognit. **2015**, 48, 2656–2666.
27. Chen, Z.; Wu, C.; Zhang, Y.; Huang, Z.; Ran, B.; Zhong, M.; Lyu, N. Feature selection with redundancy-complementariness dispersion. Knowl.-Based Syst. **2015**, 89, 203–217.
28. Lopez-Arevalo, I.; Aldana-Bobadilla, E.; Molina-Villegas, A.; Galeana-Zapién, H.; Muñiz-Sanchez, V.; Gausin-Valle, S. A Memory-Efficient Encoding Method for Processing Mixed-Type Data on Machine Learning. Entropy **2020**, 22, 1391.
29. Dinh, D.T.; Huynh, V.N. k-PbC: An improved cluster center initialization for categorical data clustering. Appl. Intell. **2020**, 50.
30. Rivera Rios, E.J.; Medina-Pérez, M.A.; Lazo-Cortés, M.S.; Monroy, R. Learning-Based Dissimilarity for Clustering Categorical Data. Appl. Sci. **2021**, 11, 3509.
31. Hanck, C.; Arnold, M.; Gerber, A.; Schmelzer, M. Introduction to Econometrics with R; University of Duisburg: Essen, Germany, 2020.
32. McCabe, C.J.; Kim, D.S.; King, K.M. Improving present practices in the visual display of interactions. Adv. Methods Pract. Psychol. Sci. **2018**, 1, 147–165.
33. Freitas, A.A. Understanding the crucial role of attribute interaction in data mining. Artif. Intell. Rev. **2001**, 16, 177–199.
34. Jaccard, J.J. Interaction Effects in Factorial Analysis of Variance; Quantitative Applications in the Social Sciences Series; SAGE Publications, Inc: Newbury Park, CA, USA, 1997.
35. Vajapeyam, S. Understanding Shannon’s Entropy Metric for Information. 2014. Available online: https://arxiv.org/ftp/arxiv/papers/1405/1405.2061.pdf (accessed on 30 September 2021).
36. Johnson, R.A.; Wichern, D.W. Applied Multivariate Statistical Analysis, 6th ed.; Pearson-Prentice Hall: Upper Saddle River, NJ, USA, 2007.
37. Joseph, L. Interactions in Multiple Linear Regression. Department of Epidemiology and Biostatistics, McGill University. Available online: https://www.medicine.mcgill.ca/epidemiology/joseph/courses/EPIB-621/interaction.pdf (accessed on 30 September 2021).
38. Stats.Blue. Multiple Linear Regression Calculator. Available online: https://stats.blue/Stats_Suite/multiple_linear_regression_calculator.html (accessed on 30 September 2021).

**Figure 1.** Dataset, pattern, and sample in a 3-variable example. The dataset (or population) may contain many records, of which only a sample is actually collected. Pattern is the name given to the set of distinct records in the sample.

**Figure 2.** Moving a few body fat data points to produce an interaction: on a graph of bf as a function of the product $st.c\cdot mc.c$, six points were moved to induce interaction in the linear regression.

**Table 1.** MSU values of the 3-way XOR: minimum of 0.5 and maximum of 0.75. Here $C=A\oplus B$, where $\oplus$ represents the XOR operation.

A | B | C | X | $P(X)$ | 3-Way ABC $P(X)\log P(X)$ | 1-Way A $P(X)\log P(X)$ | 1-Way B $P(X)\log P(X)$ | 1-Way C $P(X)\log P(X)$
---|---|---|---|---|---|---|---|---
0 | 0 | 0 | 000 | 0.25 | −0.5 | | |
0 | 1 | 1 | 011 | 0.25 | −0.5 | −0.5 | −0.5 | −0.5
1 | 0 | 1 | 101 | 0.25 | −0.5 | | |
1 | 1 | 0 | 110 | 0.25 | −0.5 | −0.5 | −0.5 | −0.5
$H(X)$ | | | | | 2 | 1 | 1 | 1
$MSU$ | | | | | 0.5 | | |

A | B | C | X | $P(X)$ | 3-Way ABC $P(X)\log P(X)$ | 1-Way A $P(X)\log P(X)$ | 1-Way B $P(X)\log P(X)$ | 1-Way C $P(X)\log P(X)$
---|---|---|---|---|---|---|---|---
0 | 0 | 0 | 000 | 0.25 | −0.5 | | |
0 | 1 | 1 | 011 | $1.00\times 10^{-80}$ | $-2.66\times 10^{-78}$ | −0.5 | −0.31 | $-5.30\times 10^{-78}$
1 | 0 | 1 | 101 | $1.00\times 10^{-80}$ | $-2.66\times 10^{-78}$ | | |
1 | 1 | 0 | 110 | 0.75 | −0.311 | −0.311 | −0.5 | 0.0
$H(X)$ | | | | | 0.811 | 0.811 | 0.811 | $5.30\times 10^{-78}$
$MSU$ | | | | | 0.75 | | |

**Table 2.** MSU values of the 4-way XOR with a minimum of 1/3 and a maximum of 0.746. Here $D=A\oplus B\oplus C$.

A | B | C | D | X | $P(X)$ | 4-Way ABCD $P(X)\log P(X)$ | 1-Way A $P(X)\log P(X)$ | 1-Way B $P(X)\log P(X)$ | 1-Way C $P(X)\log P(X)$ | 1-Way D $P(X)\log P(X)$
---|---|---|---|---|---|---|---|---|---|---
0 | 0 | 0 | 0 | 0000 | 0.125 | −0.375 | | | |
0 | 0 | 1 | 1 | 0011 | 0.125 | −0.375 | | | |
0 | 1 | 0 | 1 | 0101 | 0.125 | −0.375 | | | |
0 | 1 | 1 | 0 | 0110 | 0.125 | −0.375 | −0.5 | −0.5 | −0.5 | −0.5
1 | 0 | 0 | 1 | 1001 | 0.125 | −0.375 | | | |
1 | 0 | 1 | 0 | 1010 | 0.125 | −0.375 | | | |
1 | 1 | 0 | 0 | 1100 | 0.125 | −0.375 | | | |
1 | 1 | 1 | 1 | 1111 | 0.125 | −0.375 | −0.5 | −0.5 | −0.5 | −0.5
$H(X)$ | | | | | | 3 | 1 | 1 | 1 | 1
$MSU$ | | | | | | 0.333 | | | |

A | B | C | D | X | $P(X)$ | 4-Way ABCD $P(X)\log P(X)$ | 1-Way A $P(X)\log P(X)$ | 1-Way B $P(X)\log P(X)$ | 1-Way C $P(X)\log P(X)$ | 1-Way D $P(X)\log P(X)$
---|---|---|---|---|---|---|---|---|---|---
0 | 0 | 0 | 0 | 0000 | 1.000 | 0.000 | | | |
0 | 0 | 1 | 1 | 0011 | $1.00\times 10^{-80}$ | $-2.66\times 10^{-78}$ | | | |
0 | 1 | 0 | 1 | 0101 | $1.00\times 10^{-80}$ | $-2.66\times 10^{-78}$ | | | |
0 | 1 | 1 | 0 | 0110 | $1.00\times 10^{-80}$ | $-2.66\times 10^{-78}$ | 0.0 | 0.0 | 0.0 | 0.0
1 | 0 | 0 | 1 | 1001 | $1.00\times 10^{-80}$ | $-2.66\times 10^{-78}$ | | | |
1 | 0 | 1 | 0 | 1010 | $1.00\times 10^{-80}$ | $-2.66\times 10^{-78}$ | | | |
1 | 1 | 0 | 0 | 1100 | $1.00\times 10^{-80}$ | $-2.66\times 10^{-78}$ | | | |
1 | 1 | 1 | 1 | 1111 | $1.00\times 10^{-80}$ | $-2.66\times 10^{-78}$ | $-1.06\times 10^{-77}$ | $-1.06\times 10^{-77}$ | $-1.06\times 10^{-77}$ | $-1.06\times 10^{-77}$
$H(X)$ | | | | | | $1.86\times 10^{-77}$ | $1.06\times 10^{-77}$ | $1.06\times 10^{-77}$ | $1.06\times 10^{-77}$ | $1.06\times 10^{-77}$
$MSU$ | | | | | | 0.746 | | | |

**Table 3.** MSU values of the 4-way AND show a minimum of 0.2045 and a maximum of 1. Here, $D=A\wedge B\wedge C$.

A | B | C | D | X | $P(X)$ | 4-Way ABCD $P(X)\log P(X)$ | 1-Way A $P(X)\log P(X)$ | 1-Way B $P(X)\log P(X)$ | 1-Way C $P(X)\log P(X)$ | 1-Way D $P(X)\log P(X)$
---|---|---|---|---|---|---|---|---|---|---
0 | 0 | 0 | 0 | 0000 | 0.125 | −0.375 | | | |
0 | 0 | 1 | 0 | 0010 | 0.125 | −0.375 | | | |
0 | 1 | 0 | 0 | 0100 | 0.125 | −0.375 | | | |
0 | 1 | 1 | 0 | 0110 | 0.125 | −0.375 | −0.5 | −0.5 | −0.5 | −0.169
1 | 0 | 0 | 0 | 1000 | 0.125 | −0.375 | | | |
1 | 0 | 1 | 0 | 1010 | 0.125 | −0.375 | | | |
1 | 1 | 0 | 0 | 1100 | 0.125 | −0.375 | | | |
1 | 1 | 1 | 1 | 1111 | 0.125 | −0.375 | −0.5 | −0.5 | −0.5 | −0.375
$H(X)$ | | | | | | 3 | 1 | 1 | 1 | 0.544
$MSU$ | | | | | | 0.205 | | | |

**Table 4.** Summary of simulation results for the XOR, AND, OR, and $A\wedge \neg B$ patterns.

Name | n | c | k | Probability Distribution | Partial MSU Values | Global MSU
---|---|---|---|---|---|---
XOR | 3 | 2 | 4 | Equal likelihoods | MSU(AC) = 0<br>MSU(BC) = 0 | MSU(ABC) = 0.5
 | 3 | 2 | 4 | 0.25; $1.00\times 10^{-80}$; $1.00\times 10^{-80}$; 0.75 | MSU(AC) = 0<br>MSU(BC) = 0 | MSU(ABC) = 0.75
XOR | 4 | 2 | 8 | Equal likelihoods | MSU(AD) = 0<br>MSU(BD) = 0<br>MSU(CD) = 0 | MSU(ABCD) = 0.333
 | 4 | 2 | 8 | 1; $1.00\times 10^{-80}$; $1.00\times 10^{-80}$; … | MSU(AD) = 0.371<br>MSU(BD) = 0.371<br>MSU(CD) = 0.371 | MSU(ABCD) = 0.746
AND | 3 | 2 | 4 | Equal likelihoods | MSU(AC) = 0.258<br>MSU(BC) = 0.258 | MSU(ABC) = 0.433
 | 3 | 2 | 4 | 0.25; $1.00\times 10^{-21}$; $1.00\times 10^{-21}$; 0.75 | MSU(AC) = 0.75<br>MSU(BC) = 0.75 | MSU(ABC) = 1
AND | 4 | 2 | 8 | Equal likelihoods | MSU(AD) = 0.179<br>MSU(BD) = 0.179<br>MSU(CD) = 0.179 | MSU(ABCD) = 0.205
 | 4 | 2 | 8 | 0.2; $1.00\times 10^{-80}$; …; $1.00\times 10^{-80}$; 0.8 | MSU(AD) = 1<br>MSU(BD) = 1<br>MSU(CD) = 1 | MSU(ABCD) = 1
OR | 3 | 2 | 4 | $1.00\times 10^{-21}$; 0.1; $1.00\times 10^{-21}$; 0.9 | MSU(AC) = 0<br>MSU(BC) = 0.654 | MSU(ABC) = 0
 | 3 | 2 | 4 | Equal likelihoods | MSU(AC) = 0.344<br>MSU(BC) = 0.344 | MSU(ABC) = 0.433
 | 3 | 2 | 4 | 0.4; $1.00\times 10^{-21}$; $1.00\times 10^{-21}$; 0.6 | MSU(AC) = 1<br>MSU(BC) = 1 | MSU(ABC) = 1
OR | 4 | 2 | 8 | $1.00\times 10^{-80}$; 0.001; 0.001; 0.009; 0.01; 0.125; 0.125; 0.729 | MSU(AD) = 0<br>MSU(BD) = 0<br>MSU(CD) = 0 | MSU(ABCD) = 0.005
 | 4 | 2 | 8 | Equal likelihoods | MSU(AD) = 0.179<br>MSU(BD) = 0.179<br>MSU(CD) = 0.179 | MSU(ABCD) = 0.205
 | 4 | 2 | 8 | 0.2; $1.00\times 10^{-80}$; …; $1.00\times 10^{-80}$; 0.8 | MSU(AD) = 1<br>MSU(BD) = 1<br>MSU(CD) = 1 | MSU(ABCD) = 1
$A\wedge \neg B$ | 3 | 2 | 4 | $1.00\times 10^{-21}$; 0.25; $1.00\times 10^{-21}$; 0.75 | MSU(AC) = 0<br>MSU(BC) = 0.654 | MSU(ABC) = 0
 | 3 | 2 | 4 | $1.00\times 10^{-21}$; $1.00\times 10^{-21}$; 0.1; 0.9 | MSU(AC) = 0<br>MSU(BC) = 1 | MSU(ABC) = 0.75

**Table 5.** Body fat data: st.c, mc.c, and bf.

# | st.c | mc.c | bf
---|---|---|---
1 | −5.805 | 1.48 | 11.9
2 | −0.605 | 0.58 | 22.8
3 | 5.395 | 9.38 | 18.7
4 | 4.495 | 3.48 | 20.1
5 | −6.205 | 3.28 | 12.9
6 | 0.295 | −3.92 | 21.7
7 | 6.095 | −0.02 | 27.1
8 | 2.595 | 2.98 | 25.4
9 | −3.205 | −4.42 | 21.3
10 | 0.195 | −2.82 | 19.3
11 | 5.795 | 2.38 | 25.4
12 | 5.095 | 0.68 | 27.2
13 | −6.605 | −4.62 | 11.7
14 | −5.605 | 0.98 | 17.8
15 | −10.705 | −6.32 | 12.8
16 | 4.195 | 2.48 | 23.9
17 | 2.395 | −1.92 | 22.6
18 | 4.895 | −3.02 | 25.4
19 | −2.605 | −0.52 | 14.8
20 | −0.105 | −0.12 | 21.1

**Table 6.** Body fat data discretized into dst, dmc, and dbf.

# | dst | dmc | dbf
---|---|---|---
1 | low | high | low
2 | med | med | high
3 | high | high | low
4 | high | high | med
5 | low | high | low
6 | med | low | med
7 | high | med | high
8 | med | high | high
9 | low | low | med
10 | med | low | med
11 | high | high | high
12 | high | med | high
13 | low | low | low
14 | low | med | low
15 | low | low | low
16 | high | high | high
17 | med | low | med
18 | high | low | high
19 | low | med | low
20 | med | med | med

**Table 7.** Pattern 1: record probabilities, entropies, and MSU.

dst | dmc | dbf | $P(X)$ | $P(X)\log P(X)$ | 1-Way dst | 1-Way dmc | 1-Way dbf
---|---|---|---|---|---|---|---
low | low | low | 0.027 | −0.141 | −0.302 | −0.360 | −0.390
low | low | med | 0.027 | −0.141 | | |
low | med | low | 0.008 | −0.054 | | |
low | high | low | 0.023 | −0.126 | | |
med | low | med | 0.015 | −0.093 | −0.228 | −0.194 | −0.530
med | med | med | 0.008 | −0.054 | | |
med | high | high | 0.023 | −0.126 | | |
high | low | high | 0.046 | −0.205 | −0.186 | −0.209 | −0.507
high | med | high | 0.019 | −0.110 | | |
high | high | low | 0.077 | −0.285 | | |
high | high | med | 0.332 | −0.528 | | |
high | high | high | 0.386 | −0.530 | | |
Entropy: | | | | 2.448 | 0.716 | 0.763 | 1.428
MSU: | | | | 0.237 | | |

**Table 8.** Modified body fat data (response bf.mod).

# | st.c | mc.c | bf.mod
---|---|---|---
1 | −5.805 | 1.48 | 11.9
2 | −0.605 | 0.58 | 22.8
3 | 5.395 | 9.38 | 31
4 | 4.495 | 3.48 | 20.1
5 | −6.205 | 3.28 | 12.9
6 | 0.295 | −3.92 | 21.7
7 | 6.095 | −0.02 | 24
8 | 2.595 | 2.98 | 25.4
9 | −3.205 | −4.42 | 21.3
10 | 0.195 | −2.82 | 19.3
11 | 5.795 | 2.38 | 25.4
12 | 5.095 | 0.68 | 22
13 | −6.605 | −4.62 | 28
14 | −5.605 | 0.98 | 17.8
15 | −10.705 | −6.32 | 32
16 | 4.195 | 2.48 | 23.9
17 | 2.395 | −1.92 | 22.6
18 | 4.895 | −3.02 | 17
19 | −2.605 | −0.52 | 14.8
20 | −0.105 | −0.12 | 21.1

**Table 9.** Modified body fat data discretized. The superscript symbol o denotes data recategorized because of modified cutoff values. The superscript symbol * denotes an underlying numerical value modified to produce interaction.

# | dst | dmc | dbf
---|---|---|---
1 | low | high | low
2 | med | med | med ^{o}
3 | high | high | high *
4 | high | high | low ^{o}
5 | low | high | low
6 | med | low | med
7 | high | med | high *
8 | med | high | high
9 | low | low | med
10 | med | low | low ^{o}
11 | high | high | high
12 | high | med | med *
13 | low | low | high *
14 | low | med | low
15 | low | low | high *
16 | high | high | high
17 | med | low | med
18 | high | low | low *
19 | low | med | low
20 | med | med | med

**Table 10.** Pattern 2: record probabilities, entropies, and MSU.

dst | dmc | dbf | $P(X)$ | $P(X)\log P(X)$ | 1-Way dst | 1-Way dmc | 1-Way dbf
---|---|---|---|---|---|---|---
low | low | med | 0.04 | −0.185 | −0.523 | −0.521 | −0.468
low | low | high | 0.06 | −0.244 | | |
low | med | low | 0.08 | −0.292 | | |
low | high | low | 0.13 | −0.383 | | |
med | low | low | 0.06 | −0.244 | −0.435 | −0.494 | −0.423
med | low | med | 0.03 | −0.152 | | |
med | med | med | 0.03 | −0.152 | | |
med | high | high | 0.05 | −0.216 | | |
high | low | low | 0.11 | −0.350 | −0.491 | −0.515 | −0.514
high | med | med | 0.06 | −0.244 | | |
high | med | high | 0.07 | −0.269 | | |
high | high | low | 0.18 | −0.445 | | |
high | high | high | 0.1 | −0.332 | | |
Entropy: | | | | 3.506 | 1.449 | 1.530 | 1.406
MSU: | | | | 0.301 | | |

**Table 11.** Interaction values for Patterns 1 and 2 under different record frequencies.

Name | n | c | k | Record Frequencies | Partial MSU Values | Global MSU | Interaction
---|---|---|---|---|---|---|---
Pattern1 | 3 | 3 | 13 | 7, 7, 2, 6, 4, 2, 2, 6, 12, 5, 20, 86, 100 | MSU(dst, dbf) = 0.142<br>MSU(dmc, dbf) = 0.012 | MSU(dst, dmc, dbf) = 0.237 | 0.095
 | 3 | 3 | 13 | 2, 1, 2, 2, 3, 1, 1, 1, 1, 2, 1, 1, 2 (original observations) | MSU(dst, dbf) = 0.441<br>MSU(dmc, dbf) = 0.097 | MSU(dst, dmc, dbf) = 0.367 | −0.074
 | 3 | 3 | 13 | Equal frequencies | MSU(dst, dbf) = 0.312<br>MSU(dmc, dbf) = 0.043 | MSU(dst, dmc, dbf) = 0.326 | 0.014
Pattern2 | 3 | 3 | 13 | 4, 6, 8, 13, 6, 3, 3, 5, 11, 6, 7, 18, 10 | MSU(dst, dbf) = 0.037<br>MSU(dmc, dbf) = 0.124 | MSU(dst, dmc, dbf) = 0.301 | 0.176
 | 3 | 3 | 13 | 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 1, 3 (original observations) | MSU(dst, dbf) = 0.152<br>MSU(dmc, dbf) = 0.161 | MSU(dst, dmc, dbf) = 0.367 | 0.206
 | 3 | 3 | 13 | Equal frequencies | MSU(dst, dbf) = 0.043<br>MSU(dmc, dbf) = 0.141 | MSU(dst, dmc, dbf) = 0.326 | 0.186


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Gómez-Guerrero, S.; Ortiz, I.; Sosa-Cabrera, G.; García-Torres, M.; Schaerer, C.E.
Measuring Interactions in Categorical Datasets Using Multivariate Symmetrical Uncertainty. *Entropy* **2022**, *24*, 64.
https://doi.org/10.3390/e24010064
