# Analysis of Categorical Data with the R Package confreq


## Abstract


## 1. Introduction

## 2. A Person-Centered Perspective on Data

Next to the wide form of data, there is also the so-called long form. In long-form data, every row of the data matrix represents a single observation, that is, the interaction of one particular case with one particular variable. In this form, the data matrix essentially comprises three columns: the first column holds a case identifier (person ID), the second column holds a variable identifier (the variable name), and the third column holds the respective measure resulting from the observed interaction between the person (first column) and the measurement instrument or unit (item) identified in the second column (see the middle column in Table 2). The long form is what software for log-linear models typically uses, at least internally.
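The relation between the three data forms can be sketched in base R. The data frame `wide` and its values below are hypothetical and serve only as an illustration; the sketch uses base R functions rather than the conversion utilities of confreq.

```r
# Hypothetical wide-form data: one row per case, one column per variable
wide <- data.frame(
  case = paste0("c", 1:6),
  varA = c("f", "m", "f", "f", "m", "m"),
  varB = c("-", "+", "-", "-", "-", "+"),
  varC = c("-", "-", "-", "+", "-", "-"),
  stringsAsFactors = FALSE
)
vars <- c("varA", "varB", "varC")

# Long form: one row per case-by-variable observation, three columns
long <- data.frame(
  case     = rep(wide$case, times = length(vars)),
  variable = rep(vars, each = nrow(wide)),
  measure  = unlist(wide[vars], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Tabulated form: one row per possible pattern plus its observed frequency
tab <- as.data.frame(table(wide[vars]))
```

The long form has one row per case and variable (6 x 3 = 18 rows here), whereas the tabulated form has one row per configuration (2 x 2 x 2 = 8 rows), including configurations that were never observed.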

## 3. Introduction to the confreq Framework in R

`install.packages("confreq",dependencies = TRUE)`

‘`lazar`’, ‘`Lienert1978`’, ‘`LienertLSD`’, and ‘`newborns`’) are in tabulated form (cf. Table 2) and therefore carry the confreq-specific R class label `c("data.frame", "Pfreq")`, but the fifth one (‘`suicide`’) is in the (classic) wide form and thus is of `class` `"data.frame"`. Under the keyword mainfunction, only two functions are listed; they represent the core feature of confreq and are what most users will probably apply most often. The function `CFA()` calculates different variants of the CFA, all of which can be derived from the basic principle of residual analysis when searching for types and antitypes. The function `S2CFA()` deals with a variant of CFA with which significantly discriminating configurations can be found between two (sub)samples. The feature section methods includes two generic S3 methods, `plot()` and `summary()`, for both main functions in confreq. Another area with miscellaneous items lists different functions under the keyword misc. The functions listed there are typically called internally by the two main functions, but are nevertheless exported in confreq and are thus optionally available to the user to directly address specific analysis questions. Finally, there is a section named utilities under which various functions for data preparation and reorganization are subsumed. The function `dat2fre()` converts categorical data from the classic wide format to tabulated data, which can then be used with the functions `CFA()` and `S2CFA()`; the function `fre2dat()` does the opposite. The function `dat2cov()` also converts data from the classic wide format into tabulated data, but can additionally take continuous variables into account, which are aggregated (e.g., by mean aggregation) for each configuration of the categorical variables in the data set. Finally, an important function is `fre2tab()`, which converts the tabulated data format to the typical R ‘table’ format (R class `"table"`). This functionality connects confreq to other R packages and to the basic R functionality for categorical data.

## 4. Working with confreq

#### 4.1. A First Look at a Classical Data Example

- Clouding of consciousness [Bewußtseinstrübungen] (C)
- Thought disturbance [Denkstörung] (T)
- Affective disturbance [Affektivitätsbeeinflussung] (A)

`R_snippet_001`’ loads the package and makes them available in R.

Listing 1. R_snippet_001.R.

`R_snippet_002.R`’, will result in two different forms of visualizing the cross tabulated data as shown in the two graphical panels in Figure 1.

Listing 2. R_snippet_002.R.

`R_snippet_003.R`’.

Listing 3. R_snippet_003.R.

#### 4.2. The CFA Main Effect Model of Independency

`R_snippet_004`’ is used.

Listing 4. R_snippet_004.R.

‘`form`’ is explicitly defined using the R-like representation of the equation given in (1). However, if the argument ‘`form`’ is not further specified, the CFA main effect model is automatically assumed in confreq.
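As a minimal sketch of what the main effect model of independence implies, the expected frequencies for the Lienert LSD data (Table 3) can be computed with a Poisson log-linear model in base R. This uses `glm()` rather than `CFA()`, and the object names `lsd` and `fit_main` are assumptions made for illustration only.

```r
# Lienert LSD data as given in Table 3
lsd <- data.frame(
  C = c("+", "+", "+", "+", "-", "-", "-", "-"),
  T = c("+", "+", "-", "-", "+", "+", "-", "-"),
  A = c("+", "-", "+", "-", "+", "-", "+", "-"),
  Freq = c(20, 1, 4, 12, 3, 10, 15, 0)
)

# Main effect model: only C, T, and A enter, no interaction terms
fit_main <- glm(Freq ~ C + T + A, family = poisson(), data = lsd)

# Expected frequencies under independence, one per configuration
f_exp_main <- as.numeric(fitted(fit_main))
round(f_exp_main, 3)
```

For the first configuration (`C = +, T = +, A = +`), the fitted value agrees with the closed form $37 \cdot 34 \cdot 42 / {65}^{2} \approx 12.506$, that is, the product of the three marginal frequencies divided by the squared sample size.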

`R_snippet_004`’ will return the summarized results, which are basically divided into three sections. The first section recapitulates the function call, the second section contains the results of the global model testing and the third part refers to the local tests for identifying types and antitypes (see console output below).

‘`adjalpha`’ in the summary function (see examples in ‘`R_snippet_005`’ below).

Listing 5. R_snippet_005.R.

`"pChi"`), the ${\chi}^{2}$-approximation to the z-test (`"z.pChi"`), the binomial approximation to the z-test (`"z.pBin"`), the binomial test using Stirling’s approximation (`"p.stir"`; see, e.g., [32], p. 52, for the p values), and Fisher’s exact binomial test (`"ex.bin.test"`) [33]. Further information on the different test statistics is given in Stemmler [9]. Which test statistic is used is specified (post hoc) by selecting the appropriate character expression (one of `c("pChi", "z.pChi", "z.pBin", "p.stir", "ex.bin.test")`) in the argument `type` of the `summary` function (see examples in ‘`R_snippet_005`’). With the default settings of the function `CFA()`, all test statistics are calculated in advance, so that any of them can be selected later on when applying `summary` to the respective result object. Note, however, that there is one exception to this general behavior in confreq, which arises from the way Fisher’s exact test has to be implemented. As shown in [9], for example, the test requires the (repeated) calculation of fractions with factorials of large numbers, especially for larger sample sizes (and thus cell sizes) and for contingency tables of higher dimensionality. Using multiple precision arithmetic as provided in the package ‘`gmp`’ [34], the test has been implemented in confreq in such a way that, on any computer system, there are in principle no numerical limitations with respect to the size of the contingency tables to be analyzed. However, the problem of computation times growing with the size of the analysis task remains. For this reason, the function `CFA()` offers the option to suppress the (a priori) calculation of the exact test by setting the argument ‘`bintest = FALSE`’, in contrast to the default setting ‘`bintest = TRUE`’. If the test is disabled when calling `CFA()` but still requested with the method function `summary()`, confreq will return an error message suggesting to run `CFA()` again with ‘`bintest = TRUE`’ (try the subsequent ‘`R_snippet_006`’).
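The logic of local testing with alpha adjustment can be sketched in base R. The chi-square component and z statistic below are the generic textbook forms and are not claimed to reproduce the exact computations behind the `type` options of confreq; the adjustment uses `p.adjust()` from the stats package, and all object names are assumptions made for illustration.

```r
# Lienert LSD data (Table 3) and main effect model expectations
lsd <- data.frame(
  C = c("+", "+", "+", "+", "-", "-", "-", "-"),
  T = c("+", "+", "-", "-", "+", "+", "-", "-"),
  A = c("+", "-", "+", "-", "+", "-", "+", "-"),
  Freq = c(20, 1, 4, 12, 3, 10, 15, 0)
)
f_obs <- lsd$Freq
f_exp <- as.numeric(fitted(glm(Freq ~ C + T + A, family = poisson(), data = lsd)))

# Local chi-square component per cell (df = 1) and the corresponding z statistic
chi   <- (f_obs - f_exp)^2 / f_exp
p_chi <- pchisq(chi, df = 1, lower.tail = FALSE)
z     <- (f_obs - f_exp) / sqrt(f_exp)
p_z   <- pnorm(abs(z), lower.tail = FALSE)   # one-sided p value

# Control alpha inflation over the eight local tests
p_bonf <- p.adjust(p_chi, method = "bonferroni")
p_holm <- p.adjust(p_chi, method = "holm")

local_tests <- data.frame(lsd[c("C", "T", "A")], f_obs,
                          f_exp = round(f_exp, 3), z = round(z, 3),
                          p_chi = round(p_chi, 4), p_bonf = round(p_bonf, 4))
local_tests
```

A positive z value points toward a type (more cases than expected) and a negative value toward an antitype. Whether a configuration is flagged depends on the chosen test and on the adjustment method, which is why it is useful to be able to select both after the model has been fitted.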

Listing 6. R_snippet_006.R.

‘`plot()`’ provided in confreq to the result object from the application of the `CFA()` function to the tabulated data (cf. ‘`R_snippet_007`’).

Listing 7. R_snippet_007.R.

‘`summary()`’ method, also for the ‘`plot()`’ method one can specify which significance test should be used for the display of the types and antitypes. In addition, the ‘`fill`’ argument can be used to specify the colors of the types, antitypes, and non-significant cells (see, e.g., code line 5 in ‘`R_snippet_007`’). Because the plotting functionality in confreq is based on the grid graphics package [13], which is also used in the package vcd [14], single cells in the graphical display can be controlled and colored individually at a later time (cf. code lines 14 and 16 in ‘`R_snippet_007`’).

#### 4.3. Modifying the CFA-Model Design Matrices

`CFA()`) and second modified or extended, and then, third, used for a recalculation of the expected frequencies based on the new model. This offers maximum flexibility for realizing a wide variety of CFA models. Let us first look at the design matrix from the previous CFA main effect model. The result object from the ‘`CFA()`’ function is ultimately a list with different entries, one of which is the design matrix. Therefore, based on the Lienert LSD data example, it can be displayed by simply entering the command ‘`res1$designmatrix`’ (see second line in ‘`R_snippet_008`’). Below is a shortened display of the design matrix for the CFA main effect model with the Lienert LSD data.

`R_snippet_008`’.

Listing 8. R_snippet_008.R.

‘`R_snippet_008`’) assumes the null hypothesis that the frequencies are equally distributed across the cells. In concrete terms, the underlying assumption is that the frequencies are the same for each cell (configuration) of the multidimensional contingency table. This model is referred to as configural cluster analysis (CCA) or as the zero-order CFA model because it does not contain any main effects, cf. [9].
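To make the zero-order assumption concrete, the following base R sketch fits an intercept-only Poisson model to the Lienert LSD data from Table 3. It relies on `glm()` and not on confreq; the object names are chosen for illustration.

```r
# Lienert LSD data as given in Table 3
lsd <- data.frame(
  C = c("+", "+", "+", "+", "-", "-", "-", "-"),
  T = c("+", "+", "-", "-", "+", "+", "-", "-"),
  A = c("+", "-", "+", "-", "+", "-", "+", "-"),
  Freq = c(20, 1, 4, 12, 3, 10, 15, 0)
)

# Zero-order model: intercept only, no main effects
fit_zero <- glm(Freq ~ 1, family = poisson(), data = lsd)

# Every configuration receives the same expected frequency N / number of cells
f_exp_zero <- as.numeric(fitted(fit_zero))
f_exp_zero
```

With 65 cases and eight configurations, each cell is expected to contain 65/8 = 8.125 cases, and the model leaves 8 - 1 = 7 degrees of freedom.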

`C:T` and `C:A` and thus represents a link to the first analysis by Lienert (see Table 4), according to which the two groups `C = +` and `C = −` were analyzed separately. The finding from this model that the configuration ‘`C = +, T = −, A = −`’ is shown as a significant type suggests that this configuration is apparently (at least partly) responsible for the nonlinear relationship between the variables in the total sample. Moreover, if the model does not fit, it is a test of significance for the three-way interaction.

`R_snippet_008`’ represents the so-called saturated model. The saturated model takes into account all interaction terms of each order (here all double and one triple interaction) between the variables involved. This model reproduces the observed frequencies perfectly and thus represents a baseline for the comparison of different CFA models. Furthermore, the saturated model (in comparison with others) can emphasize the importance of the interaction terms.
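That the saturated model reproduces the observed frequencies can be checked with a short base R sketch. It uses `loglin()` from the stats package on the Lienert LSD data from Table 3, not the confreq implementation, and the object names are assumptions.

```r
# Lienert LSD data as given in Table 3, cross-tabulated into a 2 x 2 x 2 table
lsd <- data.frame(
  C = c("+", "+", "+", "+", "-", "-", "-", "-"),
  T = c("+", "+", "-", "-", "+", "+", "-", "-"),
  A = c("+", "-", "+", "-", "+", "-", "+", "-"),
  Freq = c(20, 1, 4, 12, 3, 10, 15, 0)
)
lsd_tab <- xtabs(Freq ~ C + T + A, data = lsd)

# Saturated model: the full three-way margin is fitted
sat <- loglin(lsd_tab, margin = list(c(1, 2, 3)), fit = TRUE, print = FALSE)

# Main effect model for comparison: only the three one-way margins are fitted
main <- loglin(lsd_tab, margin = list(1, 2, 3), fit = TRUE, print = FALSE)

c(df_saturated = sat$df, df_main = main$df)
```

The fitted table of the saturated model equals the observed table and no degrees of freedom remain, which is why this model serves as the reference point against which more restrictive CFA models are compared.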

‘`R_snippet_008`’, the different CFA models are specified by entering a model formula in the argument ‘`form`’. In confreq, however, the argument `form` in the function `CFA()` can also be directly assigned a design matrix that was previously modified according to one’s own model ideas. Code lines 17 to 19 in ‘`R_snippet_008`’ demonstrate the model specification using a modified design matrix for the example of the CFA zero-order model. The comparison with the specification of the same model via the model formula (cf. code lines 22 and 23) shows that there are no differences.

`C = −, T = −, A = −`’) has a frequency of zero for the data collected. We now assume (hypothetically, for demonstration purposes) that this combination of (non) observed symptoms is an impossible combination of attributes—which, by the way, might not seem so implausible from a clinical perspective, as this configuration would imply the complete ineffectiveness of LSD.

‘`blank`’ in the `CFA()` function (see examples in ‘`R_snippet_009`’).

Listing 9. R_snippet_009.R.

‘`res1`’ in code line 7 in ‘`R_snippet_005`’) and the other one with the excluded configuration number 8 (cf. summary of result object ‘`res6`’ in code line 3 in ‘`R_snippet_009`’) clearly shows the biasing influence of the structural extreme cell frequencies. The local test of the most frequent pattern number 1 in the Lienert data (‘`C = +, T = +, A = +`’; ${f}_{obs.}=20$) in the first model (‘`res1`’), with ${f}_{exp.}=12.506$, surprisingly does not lead to a significant type, whereas in the second model (‘`res6`’) this pattern is (correctly) recognized as a significant type with ${f}_{exp.}=9.828$. This finding underlines the importance of comparing different CFA models applied to the same data.

`res6$designmatrix`’, shows how this is implemented in the context of the log-linear modeling of the expected frequencies (cf. R-output below).

`C = −, T = −, A = −`’), is consistently coded with ‘0’. Note that if several configurations are to be excluded from the analyses as ‘extreme cells’, a column must be added to the model matrix for each configuration to code this ‘effect’, respectively.
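The effect of excluding one configuration can be sketched in base R without confreq. The sketch below fits the main effect model to the seven remaining cells of the Lienert LSD data (Table 3); this is assumed to give the same expected frequencies for those cells as adding one extra design matrix column for the excluded cell, and it is not the internal implementation of the `blank` argument.

```r
# Lienert LSD data as given in Table 3
lsd <- data.frame(
  C = c("+", "+", "+", "+", "-", "-", "-", "-"),
  T = c("+", "+", "-", "-", "+", "+", "-", "-"),
  A = c("+", "-", "+", "-", "+", "-", "+", "-"),
  Freq = c(20, 1, 4, 12, 3, 10, 15, 0)
)

# Treat configuration 8 (C = -, T = -, A = -) as a structural zero
lsd_blank <- lsd[-8, ]

# Quasi-independence: main effects fitted to the remaining seven cells only
fit_quasi <- glm(Freq ~ C + T + A, family = poisson(), data = lsd_blank)
f_exp_quasi <- as.numeric(fitted(fit_quasi))

data.frame(lsd_blank, f_exp = round(f_exp_quasi, 3))
```

Because the expected frequencies of the remaining cells must still sum to 65, removing the empty cell redistributes the expectation, and the expected frequency of configuration 1 drops below its value of 12.506 under the ordinary main effect model. The residual degrees of freedom fall from 4 to 3, in line with the additional parameter spent on the excluded cell.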

`res3`, `res4`, and `res5` (see ‘`R_snippet_008`’) as well as in the R object `res6` (see ‘`R_snippet_009`’) in conjunction with the respective degrees of freedom ($df$) of the global model test shows that the degrees of freedom of any CFA model are determined by the number of rows and columns of the respective design matrix. The number of rows of the design matrix minus 1 represents the information $s$ given by the data, and the number of columns (without intercept) represents the number of parameters $t$ ‘consumed’ by the respective (explanatory) model. The degrees of freedom for any CFA model are generally defined as the difference between the given information $s$ and the number of model parameters $t$, as given in Equation (2), cf. also [29]:

$$df = s - t \qquad (2)$$
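The counting rule for the degrees of freedom can be illustrated with design matrices built by `model.matrix()` in base R. The helper `cfa_df()` and the matrix names are hypothetical and only mirror the rule described above (rows minus 1, minus columns without the intercept); the design matrices stored by confreq may be coded differently.

```r
# All eight configurations of three dichotomous variables (ordered as in Table 3)
cells <- expand.grid(A = c("+", "-"), T = c("+", "-"), C = c("+", "-"))

# Design matrices of several CFA models (first column is the intercept)
X_zero  <- model.matrix(~ 1, data = cells)           # zero-order model
X_main  <- model.matrix(~ C + T + A, data = cells)   # main effect model
X_sat   <- model.matrix(~ C * T * A, data = cells)   # saturated model
# Main effect model plus one column marking configuration 8 as excluded
X_blank <- cbind(X_main, blank8 = as.numeric(seq_len(nrow(cells)) == 8))

# Hypothetical helper: information s minus number of model parameters t_par
cfa_df <- function(X) {
  s     <- nrow(X) - 1   # information given by the data
  t_par <- ncol(X) - 1   # parameters, not counting the intercept
  s - t_par
}

df_all <- c(zero = cfa_df(X_zero), main = cfa_df(X_main),
            blank = cfa_df(X_blank), saturated = cfa_df(X_sat))
df_all
```

Each additional column of the design matrix, whether it codes an interaction, a covariate, or an excluded cell, costs one degree of freedom.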

#### 4.4. Introducing Covariates into the CFA-Model

`R_snippet_010`’.

Listing 10. R_snippet_0010.R.

‘`R_snippet_010`’ create a `data.frame` assigned to the R object ‘`d`’ comprising four categorical variables (as ‘R factors’) with their respective frequencies, and code line 10 assigns a special ‘`class`’ to ‘`d`’ to let confreq “know” that these are tabulated data. Code lines 13 to 20 create a matrix object (‘`dcov`’) comprising the means of the covariates for each of the 16 configurations given in ‘`d`’. Stemmler [9] points out that, as in this example, “Usually, the cell means of the continuous covariate are used …” [9] p. 105, but other summary statistics of the covariates can also be used for the respective configuration (cell of the contingency table).
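When the raw data are available in wide form, cell means of a covariate can be obtained by aggregating over configurations. The following base R sketch uses entirely hypothetical data (two dichotomous variables `A` and `B` and a continuous covariate `x`) and `aggregate()` from the stats package; it is not the confreq function `dat2cov()`.

```r
# Hypothetical wide-form data: two categorical variables and one covariate
raw <- data.frame(
  A = c("+", "+", "+", "-", "-", "-"),
  B = c("+", "+", "-", "+", "-", "-"),
  x = c(10, 12, 7, 5, 3, 5)
)

# Mean of the covariate and number of cases per observed configuration
cell_means <- aggregate(x ~ A + B, data = raw, FUN = mean)
cell_freq  <- aggregate(x ~ A + B, data = raw, FUN = length)

# Tabulated data with the covariate cell mean attached to each configuration
tab_cov <- data.frame(cell_means[c("A", "B")],
                      Freq   = cell_freq$x,
                      x_mean = cell_means$x)
tab_cov
```

Note that `aggregate()` only returns configurations that occur in the data, so unobserved configurations would have to be added separately before the covariate matrix is combined with a complete table of configurations.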

`R_snippet_011.R`’.

Listing 11. R_snippet_0011.R.

‘`cova`’, to which we can assign a matrix object holding the means of the covariates for the respective configurations. Code line 7 in ‘`R_snippet_011.R`’ runs a model that contains three covariates (‘$DIF$’, ‘$SCO$’, and ‘$CON$’) in addition to the CFA main effects model, which itself contains a functional extension as a quasi-independence model. Code line 11 in ‘`R_snippet_011.R`’ fits the CFA model with only one covariate: right-handedness (‘$RHD$’).

`res8$designmatrix`’ into the R console (see output below).

#### 4.5. Comparing Pattern Frequencies for Two Samples with CFA

‘`W`’) that can be assigned to either the natural sciences (category ‘`N`’) or the social sciences (category ‘`S`’). To investigate the coherence of epistemological beliefs, three aspects were recorded: the ontological aspect ‘`O`’, which refers to the question of whether there is a reality at all that is independent of our representations, our thinking, our language, and our perceptions; the epistemological aspect ‘`E`’, which refers to the epistemological question (in the narrower sense of philosophical terminology) of whether the truth of scientific knowledge can be established in principle; and the science-critical aspect ‘`K`’, which refers to a more or less optimistic or pessimistic view of the present state of knowledge in the sciences. The three variables were recoded dichotomously, with ‘`+`’ representing agreement and ‘`−`’ representing disagreement. Based on theoretical considerations, Schmid and Lutz [44] initially state that some combinations of the three aspects represent non-coherent belief systems. The data can be reconstructed from the information given in the original publication by Schmid and Lutz [44], p. 36, by running code lines 3 to 11 of ‘`R_snippet_012`’ below.

Listing 12. R_snippet_0012.R.

`R_snippet_012`’ can be executed for the calculations and code line 14 can be used to output the results to the R console (see display of the results below).

`R_snippet_012`’.
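The general idea of comparing pattern frequencies between two samples can be sketched in base R: for each configuration, a 2 x 2 table (this configuration versus all others, by sample) is tested, and the resulting p values are adjusted for multiple testing. The frequencies below are hypothetical and are not the Schmid and Lutz data; the choice of `fisher.test()` with a Bonferroni adjustment is an assumption for illustration and not necessarily what `S2CFA()` computes.

```r
# Hypothetical pattern frequencies in two samples
patterns <- c("+++", "++-", "+-+", "+--", "-++", "-+-", "--+", "---")
n_A <- c(30, 10, 8, 5, 12, 6, 4, 5)
n_B <- c(12, 11, 9, 6, 10, 7, 5, 20)

# 2 x 2 test per configuration: this pattern vs. all others, sample A vs. B
two_sample_p <- function(i) {
  tab <- matrix(c(n_A[i], sum(n_A) - n_A[i],
                  n_B[i], sum(n_B) - n_B[i]), nrow = 2)
  fisher.test(tab)$p.value
}
p_raw <- vapply(seq_along(patterns), two_sample_p, numeric(1))

# Adjust for the eight simultaneous comparisons
p_adj <- p.adjust(p_raw, method = "bonferroni")
names(p_adj) <- patterns
round(p_adj, 4)
```

Configurations with a small adjusted p value are those whose relative frequency differs between the two samples, which is the sense in which a pattern discriminates between them.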

## 5. Summary and Conclusions

^{®} software SPSS^{®}, e.g., [47] using the command ‘LOGLINEAR’, cf. [9] p. 35 for an example. However, these existing alternatives each have drawbacks. For example, the program by von Eye [45] is not cross-platform compatible and is only available for Windows. The package cfa by Mair and Funke [46], although implementing some interesting automated procedures in the area of functional CFA, has a serious computational inaccuracy [9] p. 17, and moreover has not been updated since 2017. The SPSS^{®} function ‘LOGLINEAR’ provides some coefficients of the CFA, but does not provide a comprehensive implementation of the CFA as in the R package confreq.

`CFA()` and `S2CFA()` with several arguments, which implement different variants of the CFA in R. To identify types and antitypes, five different significance tests and two methods for controlling alpha level inflation are currently available. Future releases of confreq might include further procedures or test statistics for testing the significance of configurations. Furthermore, the confreq framework allows for the implementation of further alpha adjustment procedures. Additional wrapper functions that access the core functions are also conceivable for future versions in order to implement procedural variants of the CFA in an automated manner, such as the successive exclusion of significant patterns until the optimal fit of the LLM is achieved.

## Abbreviations

Abbreviation | Meaning |
---|---|
CFA | configural frequencies analysis [Konfigurationsfrequenzanalyse, KFA] |
LLM | log-linear modeling |
LSD | lysergic acid diethylamide, a psychogenic drug used in the 1970s for psychopathological experiments because it was believed that LSD could (temporarily) mimic pathological phenomena such as psychosis |

## References

1. Stevens, S.S. On the theory of scales of measurement. Science **1946**, 103, 677–680.
2. Lord, F.M. On the Statistical Treatment of Football Numbers. Am. Psychol. **1953**, 8, 750–751.
3. Zand Scholten, A.; Borsboom, D. A reanalysis of Lord’s statistical treatment of football numbers. J. Math. Psychol. **2009**, 53, 69–75.
4. Velleman, P.F.; Wilkinson, L. Nominal, Ordinal, Interval, and Ratio Typologies Are Misleading. Am. Stat. **1993**, 47, 65.
5. Niederee, R.; Mausfeld, R. Skalenniveau, Invarianz und Bedeutsamkeit [Scale level, invariance, and meaningfulness]. In Handbuch Quantitative Methoden [Handbook Quantitative Methods]; Erdfelder, E., Mausfeld, R., Meiser, T., Rudinger, R., Eds.; Psychologie Verlags Union: Weinheim, Germany, 1996; pp. 385–398.
6. Niederée, R. There Is More to Measurement than Just Measurement: Measurement Theory, Symmetry, and Substantive Theorizing. Review of Foundations of Measurement. Vol. 3: Representation, Axiomatization, and Invariance, by R. Duncan Luce, David H. Krantz, Patrick Suppes, and Amos Tversky. J. Math. Psychol. **1994**, 38, 527–594.
7. Kyngdon, A. Psychological Measurement Needs Units, Ratios, and Real Quantities: A Commentary on Humphry. Meas. Interdiscip. Res. Perspect. **2011**, 9, 55–58.
8. Green, P.E.; Carroll, J.D. Mathematical Tools for Applied Multivariate Analysis; Academic Press: New York, NY, USA, 1976.
9. Stemmler, M. Person-Centered Methods: Configural Frequency Analysis (CFA) and other Methods for the Analysis of Contingency Tables, 2nd ed.; Springer Briefs in Statistics; Springer Publishing Company: New York, NY, USA, 2020.
10. Krauth, J.; Lienert, G.A. Die Konfigurationsfrequenzanalyse (KFA) und Ihre Anwendung in Psychologie und Medizin: Ein Multivariates Nichtparametrisches Verfahren zur Aufdeckung von Typen und Syndromen; mit 70 Tabellen; Alber-Broschur Psychologie, Alber Karl: Freiburg, Germany, 1973.
11. Lienert, G.A. Die Konfigurationsfrequenzanalyse: I. Ein neuer Weg zu Typen und Syndromen. Z. Klin. Psychol. Psychother. **1971**, 19, 99–115.
12. Heine, J.H.; Alexandrowicz, R.W.; Stemmler, M. Confreq: Configural Frequencies Analysis Using Log-Linear Modeling; R Package Version 1.5.6-4. 2021. Available online: https://CRAN.R-project.org/package=confreq (accessed on 2 September 2021).
13. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021; Available online: https://www.R-project.org/ (accessed on 2 September 2021).
14. Meyer, D.; Zeileis, A.; Hornik, K. Vcd: Visualizing Categorical Data; R Package Version 1.4-8. 2020. Available online: https://CRAN.R-project.org/package=vcd (accessed on 2 September 2021).
15. Meyer, D.; Zeileis, A.; Hornik, K. The Strucplot Framework: Visualizing Multi-Way Contingency Tables with vcd. J. Stat. Softw. **2006**, 17, 1–48.
16. Rosato, N.S.; Baer, J.C. Latent Class Analysis: A Method for Capturing Heterogeneity. Soc. Work. Res. **2012**, 36, 61–69.
17. Heine, J.H. Untersuchungen zum Antwortverhalten und zu Modellen der Skalierung bei der Messung Psychologischer Konstrukte [Studies on the Response Behavior and Models of Scaling in the Measurement of Psychological Constructs]; Monographie [monograph]; Universität der Bundeswehr: München, Germany, 2020.
18. Stern, W. Die Differentielle Psychologie in Ihren Methodischen Grundlagen; Verlag von Johann Ambrosius Barth: Leipzig, Germany, 1911.
19. Von Eye, A.; Bogat, G.A. Person-Oriented and Variable-Oriented Research: Concepts, Results, and Development. Merrill-Palmer Q. **2006**, 52, 390–420.
20. Leuner, H. Die Experimentelle Psychose: Ihre Psychopharmakologie, Phänomenologie und Dynamik in Beziehung zur Person. Versuch Einer Konditional-Genetischen und Funktionalen Psychopathologie der Psychose; Springer: Göttingen, Germany, 1962.
21. Meehl, P.E. Configural scoring. J. Consult. Psychol. **1950**, 14, 165–171.
22. Simpson, E.H. The Interpretation of Interaction in Contingency Tables. J. R. Stat. Soc. Ser. **1951**, 13, 238–241.
23. Yule, G.U. Notes on the theory of association of attributes in statistics. Biometrika **1903**, 2, 121–134.
24. Bortz, J.; Döring, N. Forschungsmethoden und Evaluation für Human- und Sozialwissenschaftler, 4th ed.; Springer: Heidelberg, Germany, 2006.
25. Sälzer, C.; Heine, J.H. Students’ skipping behavior on truancy items and (school) subjects and its relation to test performance in PISA 2012. Int. J. Educ. Dev. **2016**, 46, 103–113.
26. Stemmler, M.; Heine, J.H. Using Configural Frequency Analysis as a Person-centered Analytic Approach with Categorical Data. Int. J. Behav. Dev. **2017**, 41, 632–646.
27. Börnert-Ringleb, M.; Wilbert, J. The Association of Strategy Use and Concrete-Operational Thinking in Primary School. Front. Educ. **2018**, 3.
28. Lazarides, R.; Dietrich, J.; Taskinen, P.H. Stability and change in students’ motivational profiles in mathematics classrooms: The role of perceived teaching. Teach. Teach. Educ. **2019**, 79, 164–175.
29. Heine, J.H.; Stemmler, M. Die (Nicht-)Bedeutsamkeit des »Migrationshintergrundes« für die PISA-Leistung – Eine Analyse mittels KFA und LCA. In Klassifikationsanalysen in den Sozialwissenschaften; Reinecke, J., Tarnai, C., Eds.; Waxmann Verlag: Münster, Germany, 2021; pp. 75–99.
30. Bonferroni, C.E. Il calcolo delle assicurazioni su gruppi di teste. In Studi in Onore del Professore Salvatore Ortu Carboni; Carboni, S.O., Ed.; Bardi: Rome, Italy, 1935; pp. 13–60.
31. Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. **1979**, 6, 65–70.
32. Feller, W. An Introduction to Probability Theory and Its Applications; John Wiley: New York, NY, USA, 1967.
33. Fisher, R.A. The Logic of Inductive Inference. J. R. Stat. Soc. **1935**, 98, 39–82.
34. Lucas, A.; Scholz, I.; Boehme, R.; Jasson, S.; Maechler, M. Gmp: Multiple Precision Arithmetic; R Package Version 0.6-2. 2021. Available online: https://CRAN.R-project.org/package=gmp (accessed on 2 September 2021).
35. Langeheine, R. Log-Lineare Modelle zur Multivariaten Analyse Qualitativer Daten. Eine Einführung; Oldenbourg Verlag: München, Germany, 1980.
36. Wermuth, N. Anmerkungen zur Konfigurationsfrequenzanalyse. Z. Klin. Psychol. Psychother. **1973**, 3, 5–21.
37. Victor, N.; Kieser, M. A test procedure for an alternative approach to Configural Frequency Analysis. Methodika **1991**, 5, 87–97.
38. Victor, N. An alternative approach to Configural Frequency Analysis. Methodika **1989**, 3, 61–73.
39. Kieser, M.; Victor, N. Configural frequency analysis (CFA) revisited – A new look at an old approach. Biom. J. **1999**, 41, 967–983.
40. Victor, N. A note on contingency tables with one structural zero. Biom. J. **1983**, 25, 283–289.
41. Glück, J.; von Eye, A. Including covariates in Configural Frequency Analysis. Psychol. Beitr. **2000**, 42, 405–417.
42. Glück, J. Spatial Strategies – Kognitive Strategien Bei Raumvorstellungsleistungen. [Spatial Strategies – Cognitive Strategies on Spatial Tasks.]. Unpublished Ph.D. Thesis, University of Vienna, Vienna, Austria, 1999.
43. Stemmler, M.; Heine, J.H.; Wallner, S. Person-centered data analysis with covariates and the R-package confreq. Methodology **2021**, 17, 149–167.
44. Schmid, S.; Lutz, A. Epistemologische Überzeugungen als kohärente Laientheorien [Epistemological Beliefs as Coherent Lay Theories]. Z. Pädagogische Psychol. Ger. J. Educ. Psychol. **2007**, 21, 29–40.
45. Von Eye, A. Configural Frequency Analysis–Version 2000. A program for 32 Bit Windows Operating Systems. Methods Psychol. Res. Online **2001**, 6, 129–139.
46. Mair, P.; Funke, S. cfa: Configural Frequency Analysis (CFA); R Package Version 0.10-0. 2017. Available online: https://CRAN.R-project.org/package=cfa (accessed on 2 September 2021).
47. IBM Corporation. IBM SPSS Statistics for Windows, Version 26.0; IBM Corporation: Armonk, NY, USA, 2019.

**Figure 1.** Different types of graphical displays for the data from the Lienert LSD trial, see [11] p. 103, ‘Tabelle 1’; left panel (**a**): ‘`strucplot`’ with labeling and cell frequencies added; right panel (**b**): ‘`doubledecker`’ plot to visualize the influence of explanatory variables on one dependent variable.

**Table 1.** Different perspectives on data analysis according to William Stern [18].

Perspective on Data | Object of Research | Research Discipline |
---|---|---|
variable-centered | one characteristic on many individuals | variation research |
variable-centered | two or more characteristics on many individuals | correlation research |
person-centered | one individuality (person) with regard to many characteristics | psychography |
person-centered | two or more individualities (persons) with regard to many characteristics | comparative research |

**Table 2.** Wide form, long form, and tabulated data.

**Wide Form**

case | $var_{A}$ | $var_{B}$ | $var_{C}$ |
---|---|---|---|
$c1$ | f | − | − |
$c2$ | m | + | − |
$c3$ | f | − | − |
$c4$ | f | − | + |
$c5$ | m | − | − |
$c6$ | m | + | − |
$c7$ | m | + | + |
$c8$ | f | − | + |
⋮ | ⋮ | ⋮ | ⋮ |
$c100$ | f | + | + |

**Long Form**

case | variable | measure |
---|---|---|
$c1$ | $var_{A}$ | f |
$c2$ | $var_{A}$ | m |
⋮ | ⋮ | ⋮ |
$c100$ | $var_{A}$ | f |
$c1$ | $var_{B}$ | − |
$c2$ | $var_{B}$ | + |
⋮ | ⋮ | ⋮ |
$c100$ | $var_{B}$ | + |
$c1$ | $var_{C}$ | − |
$c2$ | $var_{C}$ | − |
⋮ | ⋮ | ⋮ |
$c100$ | $var_{C}$ | + |

**Tabulated Data** (pattern and measure)

$var_{A}$ | $var_{B}$ | $var_{C}$ | Freq |
---|---|---|---|
f | − | − | 19 |
f | − | + | 15 |
f | + | − | 7 |
f | + | + | 10 |
m | − | − | 16 |
m | − | + | 12 |
m | + | − | 9 |
m | + | + | 12 |

**Table 3.** Data from the Lienert LSD trial, see Lienert [11], p. 103, ‘Tabelle 1’.

No. | C | T | A | Freq |
---|---|---|---|---|
1 | + | + | + | 20 |
2 | + | + | − | 1 |
3 | + | − | + | 4 |
4 | + | − | − | 12 |
5 | − | + | + | 3 |
6 | − | + | − | 10 |
7 | − | − | + | 15 |
8 | − | − | − | 0 |

**Table 4.** Results for bivariate analysis for data from the Lienert LSD trial, see Lienert [11], p. 103.

T | $C=+$: A = + | $C=+$: A = − | $C=-$: A = + | $C=-$: A = − | Total: A = + | Total: A = − |
---|---|---|---|---|---|---|
+ | 20 | 1 | 3 | 10 | 23 | 11 |
− | 4 | 12 | 15 | 0 | 19 | 12 |

Group | Test result |
---|---|
$C=+$ | ${\chi}^{2}=19.66,df=1,p=0.00$ * |
$C=-$ | ${\chi}^{2}=17.95,df=1,p=0.00$ * |
Total | ${\chi}^{2}=0.29,df=1,p=0.62$ * |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Heine, J.-H.; Stemmler, M.
Analysis of Categorical Data with the R Package *confreq*. *Psych* **2021**, *3*, 522-541.
https://doi.org/10.3390/psych3030034
