Exploring Learning Difficulties in Abstract Algebra: The Case of Group Theory

In an earlier contribution to Education Sciences we presented a new concept inventory to assess students' conceptual understanding of introductory group theory, the CI 2 GT. This concept inventory is now leveraged in a pretest-post-test design with N = 143 pre-service teachers to enrich this body of work with quantitative results. On the one hand, our findings indicate three recurring learning difficulties, which will be discussed in detail. On the other hand, we provide a summative evaluation of the Hildesheim Teaching Concept and discuss students' learning gain in different sub-domains of group theory. Together, the results allow for an empirical perspective on educational aspects of group theory and thus bridge the gap between qualitative and quantitative research in this field, which has so far constituted a desideratum.


Literature Review
Over the past 10 years, research into educational aspects of abstract algebra has gained increasing traction. With the works of Wasserman et al. [1–4] and Melhuish [5–7], various studies have shown the benefits of learning abstract algebra, and group theory in particular, as well as different pitfalls for students; an overview of the research results in this regard is presented in [8].
For example, Melhuish and Fagan [6] derived from a subsample of N = 286 undergraduate students that learners tend to conflate and overgeneralize basic properties such as associativity and commutativity. This finding was substantiated both by (a) Larsen [9], who, using teaching experiments with N = 5 undergraduate mathematics students, found that associativity and commutativity have the potential to lead to many errors in algebra education, as both properties are related to order in ways that are often not carefully distinguished, and by (b) Zaslavsky and Peled [10], who, in the context of an in-service professional development course for N = 67 mathematics teachers, showed that the participants believed the properties of associativity and commutativity to be logically dependent. Another learning difficulty was identified by Veith et al. [11], who used an acceptance survey with N = 9 secondary school students to identify linguistic preconceptions that posed learning difficulties regarding binary operations and isometries of the equilateral triangle.
On a more positive note, the concepts of abstract algebra also offer beneficial opportunities regarding affective learner characteristics and mathematical stances in general. Specifically, in a mathematics-for-teachers course presented in [1], the introduction to algebraic structures such as groups had a positive impact on the beliefs and intended practices of N = 12 K-12 teachers. Moreover, in a further study by Even [12], N = 15 mathematics teachers participating in an advanced mathematics course stated that dealing with the concepts of algebraic structures helped them develop more knowledge of the nature of the discipline itself.
It can be concluded that the studies in abstract algebra education research so far share two striking similarities, namely that (a) the body of research consists mainly of qualitative investigations and (b) the samples mostly consist of secondary school teachers and mathematics majors. The first observation can be ascribed to a lack of suitable test instruments. Therefore, to overcome this lack of quantitative insights into student learning of abstract algebra, two concept inventories have recently been developed, the GTCA (Group Theory Concept Assessment) by Melhuish [5] and the CI 2 GT (Concept Inventory for Introductory Group Theory) by Veith et al. [13], which differ mainly in their target group. While the GTCA was developed for mathematics majors, the CI 2 GT was developed for students who "only enter this area on a superficial level" [13] (p. 2). With the CI 2 GT operationalizing conceptual understanding of introductory group theory, it is now possible to study group theory education from an empirical perspective. Hence, in this paper we use the CI 2 GT to enrich this body of research with quantitative insights, tackling, among other aspects, the frequently described problems tied to associativity and commutativity with new methods.
Regarding statement (b), the question arises whether the benefits of engaging with concepts of abstract algebra, as well as the reported learning difficulties, can also be observed in samples of primary school teachers. In this regard, Chick and Harris [14] found in their 2007 study that the N = 14 primary school teachers examined displayed an overall poor sense of how their mathematical content builds the foundation for later algebra. They concluded that "for some teachers, this limited perspective may be due to their own educational history" [14] (p. 133) and further demanded that "more needs to be done to help teachers understand what the key aspects are and how they contribute to the understanding that needs to be developed in the secondary school." [14] (p. 133). Thus, in order to better identify which content domains precisely profit from abstract algebra concepts and how these may be transformed, Wasserman [2] explored the potential abstract algebra offers for school mathematics instruction across the entire spectrum, ranging from elementary school to high school content areas. For elementary school, it was elaborated how inverse operations and arithmetic properties such as commutativity and associativity manifest themselves in primary mathematics education. To this end, building on the CCSS-M (Common Core State Standards in Mathematics, cf. [15]), Wasserman concluded that "transforming teachers' knowledge regarding these content areas through understanding more abstract ideas about algebraic structures likely is accomplished through fostering reflection on their connections to and their importance for more elementary content in school mathematics." [2] (p. 42). In the context of elementary school mathematics, this reflection may be facilitated by group theory, which is, after all, precisely the generalization of arithmetic properties of binary operations and inverses.
Thus, as teachers "need to know the mathematics they are teaching, as well as how to teach it" [16] (p. 5), we argue that introducing pre-service primary teachers to basic notions of group theory holds great potential, as these notions can be built upon in geometry (e.g., dihedral groups) and arithmetic (e.g., cyclic groups) as well as in linear algebra courses, where further algebraic structures such as fields and vector spaces are studied.
In addition, in our prior research contributions (cf. [8,11,13]) we derived desiderata regarding abstract algebra education from the literature (cf. [12,17–22]). This article contributes empirical evidence regarding these desiderata, for example by asking the following questions (for our research questions, see Section 2):
• How should instructional elements be designed when teaching group theory?
• Do learning difficulties found with qualitative methods also present themselves in a quantitative setting? If so, which difficulties can be observed and how pronounced are they?
Investigating these questions required analysis methods, addressed in Section 3.3, which we adopted from physics education research, namely the normalized gain expressed by Hake's g [23] and Hasan et al.'s Certainty of Response Index (CRI) [24].

The Hildesheim Teaching Concept
The Hildesheim Teaching Concept focuses on introductory group theory and is aimed at secondary and undergraduate mathematics education. Details are presented in our earlier contribution [8]; hence, we only outline the main aspects here.
The curriculum is the result of an in-depth literature review in which viewpoints from the new math era of the 1960s (cf. [25]) were merged with viewpoints from contemporary works on abstract algebra education. In particular, the development process was guided by Larsen's TAAFU project (Teaching Abstract Algebra for Understanding), presented in [26,27]. The main differences lie in (a) "exploring groups via symmetries" [8] (p. 12) in a hands-on way using haptic learning material, thus translating the introduction presented in [26] into a more physically engaging process, and (b) adjusting the content depth to be more in line with the curricula pre-service primary school teachers are presented with. For example, cosets, quotient groups, normal subgroups and kernels are omitted. From this, the multifaceted perspectives have been synthesized into a coherent teaching trajectory spanning three units (of 90 minutes each) across multiple aspects of introductory group theory (cf. Section 3.2). From the mathematics education literature, it was derived that this introduction should be guided by three groups specifically: the dihedral group D3 of the regular triangle, the dihedral group D4 of the square and the cyclic groups Zn (cf. [28]). We refer readers unfamiliar with these mathematical concepts to the reference handbook [29], where these notions are explained in rigorous detail.
The Hildesheim Teaching Concept has been subject to a formative evaluation using the method of probing acceptance among students in a laboratory setting (cf. [11]). The results of this pilot study suggested that the instructional elements of the concept are well accepted by learners. In addition, the instructional elements were found to be potentially conducive to fostering algebraic thinking (cf. [11]). These findings are now to be complemented and substantiated by a summative evaluation that
• examines the impact of the curriculum on students' development of conceptual understanding of group theory, and
• explores possible learning difficulties regarding introductory group theory.

Research Questions
As elaborated in Section 1, we aim to clarify the following research questions with this contribution:
RQ1: Do learners achieve an adequate conceptual understanding of introductory group theory when instructed with the Hildesheim Teaching Concept, and which concepts pose the most hurdles for learners?
RQ2: Which learning difficulties regarding introductory group theory can be identified?
We elaborate on the operationalization of adequate conceptual understanding in Section 3.3.

Study Design and Samples
To clarify the research questions, two studies were conducted:
1. An expert survey with N = 9 experts from mathematics and mathematics education.
2. A quantitative evaluation of the Hildesheim Teaching Concept with N = 143 pre-service teachers.
The quantitative evaluation was conducted as part of a two-week group theory programme. The instruction in this programme was based on the Hildesheim Teaching Concept, and the CI 2 GT was administered in a pretest-post-test design. Both weeks of the programme had the same structure: one lecture (90 min), followed by an exercise session (90 min) and a problem sheet for the participants to solve at home. The first week focused on introducing groups via the dihedral groups D3 and D4, as suggested by the Hildesheim Teaching Concept, and the second week focused on the cyclic groups Zn and applications of group theory.
The participants in this study were pre-service primary school mathematics teachers in their first semester. Thus, the vast majority of participants could be expected to have no prior knowledge of abstract algebra; this assumption was tested with the pretest.

Instruments
To assess the learners' conceptual understanding of introductory group theory, we used the Concept Inventory for Introductory Group Theory, the CI 2 GT [13]. The CI 2 GT consists of 20 two-tier single-choice items, where exactly one of three answer options in tier one is correct. In the second tier, respondents additionally rate their confidence in their answer on a 5-point rating scale (1 = guessed, ..., 5 = very confident). A point was only assigned if the correct answer option was chosen and the respondent was confident (4) or very confident (5). Consequently, if the respondent indicated uncertainty (CRI of 1, 2 or 3), no points were assigned, regardless of which answer option was chosen. The internal consistency is Cronbach's α = 0.76. All items of the CI 2 GT can be found in Appendix A. Note that in this article the answer options of all items have been sorted such that option 1 is always the correct answer; during test administration, however, the answer options appeared in randomized order.
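The scoring rule above can be sketched as follows. This is a minimal illustration, assuming each response is stored as a (chosen option, CRI) pair and that option 1 is correct, as in the sorted items of Appendix A; the function names `score_item` and `total_score` are hypothetical.

```python
def score_item(chosen_option: int, cri: int, correct_option: int = 1) -> int:
    """Return 1 point only for a correct answer given with CRI 4 or 5."""
    if not 1 <= cri <= 5:
        raise ValueError("CRI must lie on the 5-point scale 1..5")
    return int(chosen_option == correct_option and cri >= 4)

def total_score(responses):
    """Sum item scores over a list of (chosen_option, cri) pairs."""
    return sum(score_item(option, cri) for option, cri in responses)

# A correct but guessed answer (CRI 2) earns nothing; wrong answers never score.
example = [(1, 2), (1, 4), (2, 5), (1, 5)]
print(total_score(example))  # → 2
```

Treating a correct but unconfident answer like a wrong one makes the total score a conservative measure of conceptual understanding rather than of lucky guesses.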
To analyze the strengths and weaknesses of the Hildesheim Teaching Concept, we examined the students' growth in conceptual understanding across different content domains. These content domains were derived from an expert survey (N = 9). The experts (mathematicians and mathematics educators) were asked to assign each item of the CI 2 GT to one or more of six sub-domains of group theory (D1 to D6). A free response option was included in case no domain seemed suitable to the expert. In the first round, the experts' assignments were summed for each item (cf. Figure 1).
In the next step, the sub-domains D1 to D6 were merged such that each item could be assigned precisely to one overarching domain. This resulted in a total of three domains: Domain 1 including D1, Domain 2 including D2 and D3, Domain 3 including D4, D5 and D6 (cf. Figure 2).
The CI 2 GT items corresponding to these domains are shown in Table 1. As is typical for concept inventories covering broad domains, high values of α are not to be expected. Accordingly, following Lienert and Raatz [30], values of α ≥ 0.55 are considered sufficient for concept inventories. In combination with the adjustment $\alpha' = \frac{n-1}{n} \cdot \alpha$ for small scales of length n by Bauer [31] …

The students' learning gain through the intervention with the Hildesheim Teaching Concept was investigated using the CI 2 GT. The students' pre- and post-test scores were compared using Hake's g [23], as is common practice for this study design [32]. The pretest score itself was used solely to check prior knowledge. The idea behind Hake's g is to take the students' possible learning gain into account: for example, if student A scored 80 out of 100 possible points in the pretest and student B only scored 10, A cannot gain more than 20 points, whereas B can. Additionally, an increase from 80 to 100 is certainly more challenging than one from 10 to 30, as it leaves no room for errors. In other words, the difference between pre- and post-test scores does not measure reliably at the ends of the scale [33]. Hake's g therefore expresses the normalized gain

$g = \frac{\text{post-score}\,\% - \text{pre-score}\,\%}{100\,\% - \text{pre-score}\,\%}$,

which takes values of at most 1 and indicates how much of the overall possible learning gain was achieved. Values of 0.30 ≤ g < 0.70 are considered a medium learning gain according to Hake [23] (p. 65). In this regard, it is noteworthy that high-g values are exceptionally rare: of the 62 courses with a total of N = 6542 students analysed by Hake, no course lay in the high-g region, and the average normalized gain was g = 0.48 ± 0.14 [23] (p. 66). In summary, Hake's g allows learning gains of different teaching concepts to be compared and will be used to investigate which domains show the greatest learning gain.
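Applying the formula to students A and B from the example above makes the normalization concrete. This is a minimal sketch, assuming scores are given as percentages; the helper names `normalized_gain` and `classify_gain` are hypothetical, and the low (g < 0.30) and high (g ≥ 0.70) bands follow Hake's classification [23], of which the text above states only the medium band.

```python
def normalized_gain(pre_percent: float, post_percent: float) -> float:
    """Hake's normalized gain g = (post% - pre%) / (100% - pre%)."""
    if pre_percent >= 100:
        raise ValueError("g is undefined for a pretest score of 100%")
    return (post_percent - pre_percent) / (100 - pre_percent)

def classify_gain(g: float) -> str:
    """Hake's bands: low for g < 0.30, medium for 0.30 <= g < 0.70, high otherwise."""
    if g < 0.30:
        return "low"
    if g < 0.70:
        return "medium"
    return "high"

# Students A and B both gain 20 raw points, but with very different normalized gains.
g_a = normalized_gain(80, 100)  # A realizes all 20 of 20 possible points
g_b = normalized_gain(10, 30)   # B realizes 20 of 90 possible points
print(g_a, classify_gain(g_a))            # → 1.0 high
print(round(g_b, 2), classify_gain(g_b))  # → 0.22 low
```

Here g is computed per student, which matches reporting a distribution (mean ± standard deviation, boxplots) of individual gains; Hake's original study computed g from class averages instead.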
Furthermore, in this article we consider adequate conceptual understanding to be achieved by students who scored at least 50% of the total post-test score, as has been done in prior research (cf. [34]). In this regard, it is important to note that the pretest score is not relevant for answering the first part of RQ1. As mentioned in Section 3.1, the participants had received no prior instruction, and thus a significant difference between pretest and post-test scores is to be expected by design of this research project (cf. Table 3). By defining adequate conceptual understanding solely on the basis of the post-test scores, however, the investigation of this part of the research question is decoupled from the difference in test scores.
Lastly, to test the difference between pre- and post-test scores we used a Wilcoxon signed-rank test (cf. [35]), and to test for differences between the three domain-specific learning gains we used a Kruskal-Wallis test (cf. [36]). The differences were further specified using Dwass-Steel-Critchlow-Fligner pairwise comparisons [37]. These non-parametric tests achieve up to 95% of the test power of their parametric analogues [38].
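For readers unfamiliar with the Kruskal-Wallis test, the statistic can be computed from ranks alone. The following is a dependency-free sketch for three samples (for instance, the per-student gains g1, g2 and g3); the function names are hypothetical and the example values are toy data, not study data. In practice, `scipy.stats.kruskal` computes the same tie-corrected statistic, `scipy.stats.wilcoxon` covers the paired pre/post comparison, and, to our knowledge, the third-party scikit-posthocs package offers `posthoc_dscf` for the Dwass-Steel-Critchlow-Fligner comparisons.

```python
import math

def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 0-based positions i..j share one rank
        i = j + 1
    return ranks

def kruskal_wallis_3(g1, g2, g3):
    """Tie-corrected Kruskal-Wallis H for three samples and its chi-square (df = 2) p-value."""
    groups = [g1, g2, g3]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    tie_sizes = {}
    for x in pooled:
        tie_sizes[x] = tie_sizes.get(x, 0) + 1
    h /= 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    return h, math.exp(-h / 2)  # survival function of chi-square with 2 degrees of freedom

h, p = kruskal_wallis_3([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h, 2), round(p, 4))  # → 7.2 0.0273
```

The closed-form p-value works only because three groups give two degrees of freedom, for which the chi-square survival function is exactly $e^{-H/2}$; as a consistency check, the reported H(2) = 10.6 corresponds to p ≈ 0.005 < 0.01.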

Analysis Carried Out to Answer RQ2
In order to identify learning difficulties, we utilized the Certainty of Response Index (CRI) established by Hasan et al. [24]. As mentioned in Section 3.2, each question of the CI 2 GT was accompanied by an additional request for respondents to rate their confidence in the given answer from 1 (guessed) to 5 (very confident). The CRI is the value selected on this scale. This makes it possible to classify answers in a matrix scheme (cf. Table 2).

Table 2. Decision matrix based on combinations of correct or wrong answer and low or high CRI adapted from [24].
Correct answer and low CRI: Uncertainty of knowledge
Correct answer and high CRI: Knowledge of scientific concept
Wrong answer and low CRI: Lack of knowledge
Wrong answer and high CRI: Misconceptions
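The decision matrix in Table 2 amounts to a two-way lookup. The sketch below is illustrative only; it assumes, as in the surrounding text, that "high" means CRI > 3, and the function name `classify_response` is hypothetical.

```python
def classify_response(correct: bool, cri: int, threshold: int = 3) -> str:
    """Map one answer to its cell in the decision matrix; high CRI means CRI > threshold."""
    high = cri > threshold
    if correct:
        return "knowledge of scientific concept" if high else "uncertainty of knowledge"
    return "misconception" if high else "lack of knowledge"

print(classify_response(False, 5))  # → misconception
print(classify_response(True, 2))   # → uncertainty of knowledge
```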
As seen in Table 2, wrong answer options that were confidently selected (CRI > 3) indicate the presence of misconceptions or learning difficulties (cf. [39]). When investigating learning difficulties in larger samples, this method may be utilized in two different ways:
1. Calculate the average CRI for each wrong answer option and investigate options with an average CRI > 3.
2. Calculate, for each wrong answer option, the number of responses that were given confidently (CRI = 4) or very confidently (CRI = 5).
While the first method was used in [24], we argue that relying on the average CRI alone may conceal learning difficulties. For example, if every one of the N respondents selected the wrong option 2 for some item, with half of those selections due to guessing (CRI = 1) and the other half given confidently (CRI = 4), the average CRI for option 2 would be

$\overline{\mathrm{CRI}} = \frac{\frac{N}{2} \cdot 1 + \frac{N}{2} \cdot 4}{N} = 2.5 < 3.$

Thus, in this case the answer pattern would be deemed unproblematic even though N/2 participants confidently gave a wrong answer. The drawback of the second method, however, is that no thresholds have been established in the literature so far. To leverage the second method nonetheless, we set a lower threshold: at least 10% of all responses must select the wrong answer option confidently or very confidently. We then combined both methods to analyze answer patterns. The results are provided in Table 5 and show that each learning difficulty obtained from method 2 is also identified by method 1.
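The two methods can be compared directly on the hypothetical example above. This sketch assumes responses for one item are (option, CRI) pairs with option 1 correct; the name `analyse_options` and the choice N = 10 are illustrative assumptions.

```python
from collections import defaultdict

def analyse_options(responses, correct_option=1, threshold=0.10):
    """Per wrong option: average CRI (method 1) and share of all responses choosing it
    with CRI 4 or 5 (method 2), plus whether each method flags the option."""
    n_total = len(responses)
    cris = defaultdict(list)
    for option, cri in responses:
        if option != correct_option:
            cris[option].append(cri)
    report = {}
    for option, values in sorted(cris.items()):
        mean_cri = sum(values) / len(values)
        confident_share = sum(c >= 4 for c in values) / n_total
        report[option] = {
            "mean_cri": mean_cri,
            "confident_share": confident_share,
            "method1": mean_cri > 3,
            "method2": confident_share >= threshold,
        }
    return report

# The example from the text with N = 10: everyone picks wrong option 2,
# half guessing (CRI 1) and half confident (CRI 4).
example = [(2, 1)] * 5 + [(2, 4)] * 5
print(analyse_options(example)[2])
# → {'mean_cri': 2.5, 'confident_share': 0.5, 'method1': False, 'method2': True}
```

Method 1 misses the option because the guesses pull the mean below 3, whereas method 2 flags it because half of all respondents answered wrongly with confidence.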

Results and Discussion
In the following, we will present the results and their discussion bundled for each research question.

Results Regarding RQ1
The descriptive statistics of the pre- and post-test scores are provided in Table 3, alongside the statistics of the Wilcoxon signed-rank test used to check whether the difference is significant. The overall normalized gain and the normalized gains for each of the three domains are presented in Table 4 and Figures 3 and 4. A Kruskal-Wallis test comparing the learning gains of the three domains was highly significant (H(2) = 10.6, p < 0.01). Dwass-Steel-Critchlow-Fligner pairwise comparisons further indicate that the difference between g1 and g2 is significant (p < 0.05) and the difference between g1 and g3 is highly significant (p < 0.01). In contrast, the difference between g2 and g3 is not statistically significant (p = 0.37).
Table 4. Mean value µ and standard deviation σ of the normalized gain g for each of the three domains as well as the total gain g_tot.

Discussion of RQ1
The total normalized gain of 0.40 ± 0.21 is satisfactory and comparable to similar research projects (cf. 0.35 ± 0.21 in [34] (p. 156) or 0.37 ± 0.18 in [33] (p. 68)). The smallest individual learning gain was approximately 6%, as indicated by the boxplot of g_tot in Figure 3; thus, a positive learning gain was recorded for all participants. Consequently, with the Hildesheim Teaching Concept, all students increased their conceptual understanding of group theory and on average reached a reasonable learning gain. In addition, 85 of the 143 participants (59%) reached an adequate understanding of group theory as defined in Section 3.3.
For the different domains, it can be observed that, while the values are close, the first domain (Definitional Fundamentals) records the smallest increase and the third domain (Intermediate Concepts) the largest (cf. Table 4 and Figure 4). As presented in Section 4, the differences between g1 and g2, as well as between g1 and g3, are statistically significant. Thus, learners recorded the largest gain in intermediate concepts, while the lowest progress concerned the fundamentals of group theory, i.e., naive set theory, binary operations, associativity and commutativity (cf. Section 3).
Lastly, to ensure that a higher learning gain is not due to higher prior knowledge (measured via the pretest score), we divided the sample into two groups. The median was not suitable for splitting the sample, as the majority of participants scored 0 in the pretest. Thus, we formed one group (N1 = 49) of students with a pretest score greater than 0 and a second group of the remaining students (N2 = 94). The average normalized gain for the group with prior knowledge was g>0 = 0.37 (σ = 0.19) and for the group without prior knowledge g0 = 0.44 (σ = 0.17). Thus, a higher learning gain cannot be attributed to higher prior knowledge.
Summarizing all results with reference to RQ1, we conclude that the Hildesheim Teaching Concept seems to be conducive to learning about abstract algebra. Overall, the majority of participants (59%) achieved an adequate conceptual understanding of introductory group theory. The strength of our teaching-learning sequence lies in fostering intermediate concepts, while it can be improved regarding definitional fundamentals. This insight will be used to further refine the teaching concept by reworking the instructional elements concerning basic notions.

Table 5 displays the number of respondents who selected one of an item's distractors but also stated that they were confident or very confident that their answer was correct. As described in Section 3.3, answers of this type serve as fruitful indicators for identifying learning difficulties and systematic errors. We see that 16 out of 20 items have an option with an average CRI > 3, and the lower threshold of 10% concerning method 2 is reached for 12 out of 20 items. Thus, the data gathered with the CI 2 GT provide opportunities to uncover learning difficulties. In Section 4.4, we will present particularly conspicuous examples and show how they tie in with similar findings from abstract algebra education research, addressing our second research question.
Table 5. Number of responses regarding the wrong answer options 2 and 3 chosen confidently or very confidently by our study participants for each item. The first column provides the total number of responses (tot. #), the second column the relative number of responses (rel. #) and the third column the average CRI.

Discussion of RQ2
To identify learning difficulties, we analyzed the response behaviour presented in Table 5. If one of the wrong answer options was selected confidently or very confidently by at least 10% of the participants, the respective option was investigated more thoroughly to identify learning obstacles regarding introductory group theory among our participants. Fourteen options met this criterion. Within these learning difficulties, three categories emerged that summarize similar obstacles; we present them in the following. Each of these categories can be associated with precisely one of the content domains (obtained from the expert survey) represented in our concept inventory (cf. Section 3). Thus, the separation of group theory by content also reflects different themes of learning difficulties that come along with the established domains. The themes are presented in ascending order to match the hierarchical structure of the domains; an overview is given in Table 6.
Since we cannot present every item of the CI 2 GT in detail in this discussion, we refer the reader to Appendix A. For readability, answer option x of item y is abbreviated as option y-x.
Table 6. An overview of the three recurring themes in learning difficulties. The percentages describe how many of the participants selected the respective answer option confidently (CRI = 4) or very confidently (CRI = 5).

Columns of Table 6: Domain; Domain Description; Learning Difficulty; Item; Option 2; Option 3.

Problems with Associativity and Commutativity
The most glaring learning obstacle is directly observed with option 1-2: roughly 50% of all participants stated confidently that the purpose of associativity is to be able to neglect the order of concatenation. In other words, commutativity and associativity get mixed up, which is consistent with the results of Larsen [9], who found that students struggle to differentiate between these properties. Additionally, associativity is not checked when verifying the properties of a group structure (cf. option 3-3), as 10% of the students confidently marked (Z, −) as a group. Tirosh et al. [40] have shown that in some cases students may even regard associativity as a direct consequence of commutativity, which is not the case in general. On the other hand, commutativity is often left unchecked (cf. options 7-2 and 7-3) or assumed even when explicitly excluded (cf. option 4-3), leading to multiple different hurdles. In summary, associativity and commutativity are properties whose purpose is somewhat unclear to many learners: they are confused with one another and often falsely generalized. This particular finding was also reported by Melhuish and Fagan [6], who researched introductory group theory with the concept inventory GTCA (cf. [5]).
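That associativity and commutativity are logically independent can be checked mechanically on small samples. The sketch below tests both properties for subtraction (as in option 3-3) and for the midpoint operation a ∘ b = (a + b)/2 on the rationals, which is our own illustrative example and not a CI 2 GT item. Because a finite sample can only refute a property on an infinite set, never prove it, the helpers are best read as counterexample searches.

```python
from fractions import Fraction
from itertools import product

def is_commutative(op, sample):
    """True if op(a, b) == op(b, a) for every pair drawn from the sample."""
    return all(op(a, b) == op(b, a) for a, b in product(sample, repeat=2))

def is_associative(op, sample):
    """True if op(op(a, b), c) == op(a, op(b, c)) for every triple drawn from the sample."""
    return all(op(op(a, b), c) == op(a, op(b, c)) for a, b, c in product(sample, repeat=3))

def subtract(a, b):
    return a - b

def midpoint(a, b):  # hypothetical example: commutative but not associative
    return (a + b) / 2

sample = [Fraction(k) for k in range(-3, 4)]  # exact rationals avoid float rounding
print(is_commutative(subtract, sample), is_associative(subtract, sample))  # → False False
print(is_commutative(midpoint, sample), is_associative(midpoint, sample))  # → True False
```

The midpoint operation is commutative, yet ((a + b)/2 + c)/2 differs from (a + (b + c)/2)/2 whenever a ≠ c, so commutativity alone cannot imply associativity.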

Problems with Inverses and the Neutral Element
The next recurring theme is located in the second domain and concerns the role of inverses and the neutral element. As with associativity, these concepts are prone to being overlooked when studying groups. Option 3-2 shows that 13% of participants confidently stated that (Q, ·) is a group, missing that 0 does not have an inverse in this structure. Furthermore, options 5-2 and 10-2 suggest that starting examples in introductory courses such as (Z, +), (Q, +) or (R \ {0}, ·) get overgeneralized in the sense that 0 and 1 are regarded a priori as special elements and always self-evident candidates for neutral elements, even when dealing with completely different binary operations. In addition, the concepts of inverse elements and the neutral element are mixed up in more abstract scenarios, as option 12-3 shows, where an inverse element had to be extracted from a Cayley Table (cf. Table 7).
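Both pitfalls can be reproduced on finite structures. As a stand-in for the infinite (Q, ·), the sketch below uses multiplication modulo 5, where 0 likewise has no inverse until it is removed; the operation a ∘ b = a + b + 1 (mod 5) is a hypothetical example showing that the neutral element need not be 0 or 1. The helper names are our own.

```python
def neutral_and_non_invertible(elements, op):
    """Return the two-sided neutral element (or None) and the elements without an inverse."""
    elements = list(elements)
    neutral = next((e for e in elements
                    if all(op(e, x) == x == op(x, e) for x in elements)), None)
    if neutral is None:
        return None, elements
    missing = [x for x in elements
               if not any(op(x, y) == neutral == op(y, x) for y in elements)]
    return neutral, missing

def times_mod5(a, b):
    return (a * b) % 5

def shifted_plus_mod5(a, b):  # hypothetical operation a ∘ b = a + b + 1 (mod 5)
    return (a + b + 1) % 5

print(neutral_and_non_invertible(range(5), times_mod5))         # → (1, [0])
print(neutral_and_non_invertible(range(1, 5), times_mod5))      # → (1, [])
print(neutral_and_non_invertible(range(5), shifted_plus_mod5))  # → (4, [])
```

In the last case the neutral element is 4, since 4 + x + 1 ≡ x (mod 5); searching for it by its defining property, rather than guessing 0 or 1, is exactly the habit the items probe.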
Here, it is clear that w must be the neutral element for •, and thus we have to look for w in the row (or column) of z to find its inverse. A total of 25% of students, however, noticed that z • z = w and confidently jumped to the conclusion that w must be the inverse of z, reversing the roles of inverse and neutral element. This learning difficulty might be tied to Cayley Tables, as a similar problem can be observed with option 20-3 (cf. Table 8).
Table 8. The tables from item 20 of the CI 2 GT: the task lies in identifying which of the three presented tables is a Cayley Table.
Here, the first table can be ruled out immediately, as the column of c contains b twice. However, the second table does not work either, as the row of c suggests that c is the neutral element, contradicting the column of c. The fact that neutral elements in groups are always both left- and right-neutral was disregarded. A total of 14% confidently stated that the second table is a Cayley Table. These observations result in two central aspects for instructors: firstly, the usual mathematical simplification of using the symbol 1 for multiplicative identities and the symbol 0 for additive identities is to be treated with care, especially in introductory courses. Secondly, moving from vivid and concrete examples such as (Z, +) to more abstract ones poses hurdles for beginners, which are expressed by mistakes even in the most basic fundamentals such as inverses and the neutral element.
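The reasoning about Tables 7 and 8 can be made explicit by checking each group axiom against a table. Since the original tables are not reproduced here, the sketch uses two hypothetical tables: one for Z4 written with letters (neutral element w, with z • z = w), and a Latin square in which c is left-neutral but not right-neutral, mirroring the situation described for item 20. All helper names are our own.

```python
from itertools import product

def make_table(elements, rows):
    """Build x • y lookups; rows[i][j] is elements[i] • elements[j]."""
    return {x: dict(zip(elements, row)) for x, row in zip(elements, rows)}

def two_sided_identity(elements, t):
    return next((e for e in elements
                 if all(t[e][x] == x and t[x][e] == x for x in elements)), None)

def is_latin_square(elements, t):
    """Each element exactly once in every row and column (the 'Sudoku rule')."""
    target = sorted(elements)
    return (all(sorted(t[x][y] for y in elements) == target for x in elements)
            and all(sorted(t[x][y] for x in elements) == target for y in elements))

def is_group_table(elements, t):
    if not all(t[x][y] in elements for x, y in product(elements, repeat=2)):
        return False  # not closed
    e = two_sided_identity(elements, t)
    if e is None:
        return False  # no element is both left- and right-neutral
    if not all(any(t[x][y] == e for y in elements) for x in elements):
        return False  # some element has no inverse
    return all(t[t[x][y]][z] == t[x][t[y][z]] for x, y, z in product(elements, repeat=3))

def inverse(x, elements, t):
    """Find the neutral element in the row of x; its column label is x's inverse."""
    e = two_sided_identity(elements, t)
    return next(y for y in elements if t[x][y] == e)

# Hypothetical table of Z4 with neutral element w: z • z = w, so z is its own inverse.
z4 = ("w", "x", "z", "y")
t_z4 = make_table(z4, ["wxzy", "xzyw", "zywx", "ywxz"])
print(is_group_table(z4, t_z4), inverse("z", z4, t_z4))  # → True z

# Hypothetical Latin square: row c equals the header (c is left-neutral), column c does not.
abc = ("a", "b", "c")
t_abc = make_table(abc, ["cab", "bca", "abc"])
print(is_latin_square(abc, t_abc), is_group_table(abc, t_abc))  # → True False
```

The second table obeys the Sudoku rule yet fails the group test because no element is neutral from both sides, which is precisely the check that was skipped in option 20-3.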

Problems with Visualizations of Abstract Notions
The last theme is more subtle and concerns abstract concepts such as isomorphisms and their relation to symmetry. In this regard (cf. options 9-2 and 9-3), learners confidently believe that two groups being isomorphic means that they are identical (27%) or that their Cayley Tables are identical (17%). On the surface, this does not seem too harmful; however, the whole point of the concept of isomorphism is to give this intuitive sense of sameness mathematical precision. This level of abstraction is vital not only for group theory but for mathematics in general, as equivalence relations often represent the highest degree of distinction possible; thus, phrases such as "up to isomorphism", "up to homeomorphism", "up to congruence", etc. are ubiquitous, and further differentiation is neither necessary nor possible from a mathematical point of view.
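The difference between "isomorphic" and "identical Cayley Tables" can be demonstrated with a brute-force search for a structure-preserving bijection. The groups below are standard examples chosen for illustration (they are not CI 2 GT items): (Z4, + mod 4) and (Z5 \ {0}, · mod 5) have different elements and different tables but are isomorphic, whereas the Klein four-group has the same order as Z4 but is not isomorphic to it. The helper names are hypothetical, and trying all bijections is only feasible for very small groups.

```python
from itertools import permutations

def cayley(elements, op):
    """Cayley Table as a dict mapping (x, y) to x * y."""
    return {(x, y): op(x, y) for x in elements for y in elements}

def find_isomorphism(g, table_g, h, table_h):
    """Try every bijection phi: g -> h and return one with phi(x*y) = phi(x)*phi(y), else None."""
    if len(g) != len(h):
        return None
    for image in permutations(h):
        phi = dict(zip(g, image))
        if all(phi[table_g[(x, y)]] == table_h[(phi[x], phi[y])] for x in g for y in g):
            return phi
    return None

z4 = [0, 1, 2, 3]
t_z4 = cayley(z4, lambda a, b: (a + b) % 4)   # (Z4, + mod 4)
u5 = [1, 2, 3, 4]
t_u5 = cayley(u5, lambda a, b: (a * b) % 5)   # (Z5 \ {0}, · mod 5)
v4 = [(0, 0), (0, 1), (1, 0), (1, 1)]
t_v4 = cayley(v4, lambda a, b: ((a[0] + b[0]) % 2, (a[1] + b[1]) % 2))  # Klein four-group

print(t_z4 == t_u5)                                       # → False: different tables
print(find_isomorphism(z4, t_z4, u5, t_u5) is not None)  # → True: yet isomorphic
print(find_isomorphism(z4, t_z4, v4, t_v4))              # → None: same order, not isomorphic
```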
This observation carries over to options 13-2 and 14-2, where respondents had to classify the symmetry group of a given figure up to isomorphism. For both figures (cf. Figure 5), the wrong options D2 and D3, respectively, seemed attractive: 17% were confident that the left figure has symmetry group D2, and 18% were confident that the right figure has symmetry group D3. From this, we assume that the students focused on the circles in the left figure and on the resemblance of the right figure to a triangle, and concluded that the figures must have the same symmetry as the regular 2-gon and 3-gon, respectively. The underlying structural components of these groups, however, were neglected: the left figure has no rotational symmetry other than the identity, and the right figure has no axial symmetry at all. Thus, with the structure of the dihedral groups in mind, one can quickly rule out these options.
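To make this structural argument concrete, the sketch below builds Dn from the standard presentation with a rotation r and a reflection s satisfying s r s = r^(-1), encoding r^k s^f as the pair (k, f). The encoding and the helper names are our own illustrative choice, not part of the CI 2 GT. It shows that D2 already contains a non-trivial rotation (by 180°) and that D3 contains three reflections, so a figure without non-trivial rotations cannot have symmetry group D2, and a figure without axial symmetry cannot have symmetry group D3.

```python
def dihedral(n):
    """Dn as pairs (k, f) standing for r^k s^f: rotate by k*360/n degrees, f = 1 adds a reflection."""
    elements = [(k, f) for f in (0, 1) for k in range(n)]

    def op(a, b):
        (k1, f1), (k2, f2) = a, b
        # s r = r^(-1) s, so r^k1 s^f1 r^k2 s^f2 = r^(k1 + (-1)^f1 * k2) s^(f1 + f2)
        return ((k1 + (-1) ** f1 * k2) % n, (f1 + f2) % 2)

    return elements, op

def symmetry_profile(n):
    """Order, number of non-trivial rotations, number of reflections, and commutativity of Dn."""
    elements, op = dihedral(n)
    rotations = [e for e in elements if e[1] == 0 and e[0] != 0]
    reflections = [e for e in elements if e[1] == 1]
    abelian = all(op(a, b) == op(b, a) for a in elements for b in elements)
    return len(elements), len(rotations), len(reflections), abelian

print(symmetry_profile(2))  # → (4, 1, 2, True)
print(symmetry_profile(3))  # → (6, 2, 3, False)
```

As a side effect, the profile also shows that D3 is not commutative, connecting this theme back to the difficulties with commutativity discussed above.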
In both cases, a component of gestalt simplified the underlying functionality of the mathematical object when modeling a mathematical concept: uniqueness in the first example and symmetry in the second. An explanation for this observation could be found in a separation of two cognitive dimensions. In this regard, Ubben and Heusler [41] empirically extracted two cognitive dimensions underlying students' mental models from an exploratory study in the context of physics education research, namely the Fidelity of Gestalt and the Functional Fidelity. The first dimension is described as an understanding of one's mental models as "exact visual representations of phenomena or exact depictions of how things look" [41] (p. 1356), while the second dimension captures how "much the mental models' underlying abstract functionality [...] is perceived as accurate" [41] (p. 1360). It is noteworthy that this two-factorial structure of students' mental models has been confirmed in different thematic contexts and is supported by literature from educational psychology and neurology (cf. [42]). Although this two-dimensional model stems from physics education research, we argue that its psychological components accurately describe the observations discussed in this article.
With this in mind, the results suggest that the discussed learning difficulties arise from predominantly gestalt-based thinking, indicating a lack of abstraction. Furthermore, the learning difficulty observed with option 20-3 may also be described as a lack of functional thinking (cf. Table 8): the second table follows the filling rules of Sudoku, in the sense that each element appears exactly once in every row and every column, and thus looks like a promising solution on the surface. However, as pointed out above, a closer look reveals how and why the group axioms are nevertheless violated.
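The gap between this surface rule and the group axioms can be checked mechanically. Because Table 8 is not reproduced here, the sketch below uses a hypothetical 5×5 table `LOOP5` instead. This table satisfies the row and column rule, has an identity element, and every element is its own inverse, yet associativity fails. Elements are labeled 0 to 4, with 0 as the identity. The function names are our own.

```python
from itertools import product


def is_latin_square(table):
    """Sudoku-style rule: every element occurs exactly once per row and per column."""
    n = len(table)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in table)
    cols_ok = all({table[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok


def group_axiom_violations(table):
    """List violated group axioms for a * b = table[a][b] on {0, ..., n-1}.

    Closure is assumed, i.e. all entries are taken to lie in {0, ..., n-1}.
    """
    n = len(table)

    def op(a, b):
        return table[a][b]

    problems = []
    identities = [e for e in range(n) if all(op(e, x) == x == op(x, e) for x in range(n))]
    if not identities:
        problems.append("no identity element")
    else:
        e = identities[0]
        for x in range(n):
            if not any(op(x, y) == e == op(y, x) for y in range(n)):
                problems.append(f"{x} has no inverse")
    for a, b, c in product(range(n), repeat=3):
        if op(op(a, b), c) != op(a, op(b, c)):
            problems.append(f"({a}*{b})*{c} != {a}*({b}*{c})")
            break  # one counterexample suffices
    return problems


# Hypothetical example: a Latin square with identity 0 that is not a group table.
LOOP5 = [
    [0, 1, 2, 3, 4],
    [1, 0, 3, 4, 2],
    [2, 4, 0, 1, 3],
    [3, 2, 4, 0, 1],
    [4, 3, 1, 2, 0],
]
# Addition modulo 3 for comparison: a genuine group table.
Z3 = [[(a + b) % 3 for b in range(3)] for a in range(3)]

print(is_latin_square(LOOP5), group_axiom_violations(LOOP5))  # True ['(1*1)*2 != 1*(1*2)']
print(is_latin_square(Z3), group_axiom_violations(Z3))        # True []
```

Both tables pass the Sudoku-style test, but only the second one is a group table. A check of the surface pattern alone therefore cannot replace a check of the axioms.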
Ultimately, we refrain from delving too deeply into this characterization of mental models at this point, as the CI²GT was not designed with this specific cognitive structure in mind. However, since our findings indicate a possible opportunity for educational research into group theory, we plan to incorporate this theory into instruments for future investigations. After all, extracting a two-factorial structure empirically in mathematics education might open up entirely new possibilities for this body of research.

Limitations of This Study
Before we present our conclusion, it is necessary to address the limitations of the study presented here. The most striking one is the lack of a control group treated with a different teaching concept. This is due to a lack of other published material in this field: the only other teaching concept covering abstract algebra content is provided by Larsen's TAAFU project (cf. [26,27]). However, as mentioned in Section 1, it covers different topics, so the CI²GT cannot be applied to a potential control group taught with this concept. Consequently, no direct comparisons are drawn in the results presented in this article. Nonetheless, Hake's g, as a measure of normalized gain, is designed to bypass this problem to some extent. As mentioned in Section 3.3, this parameter was used to compare the learning gains of different treatments, and with a value of g = 0.40, the normalized gain of the Hildesheim Teaching Concept can be classified as medium, which places it alongside the vast majority of interventions presented in the aforementioned study (cf. [23]).
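For reference, Hake's original formulation relates the gain in the class average to the maximum possible gain: ⟨g⟩ = (post - pre)/(100 - pre), with scores in percent. His conventional bands classify ⟨g⟩ < 0.3 as low, 0.3 ≤ ⟨g⟩ < 0.7 as medium and ⟨g⟩ ≥ 0.7 as high. The sketch below uses hypothetical class averages, not the scores of this study. It only shows how a value such as g = 0.40 arises and why it falls into the medium band.

```python
def normalized_gain(pre_percent, post_percent):
    """Hake's <g>: gain in the class average divided by the maximum possible gain."""
    if not 0 <= pre_percent < 100:
        raise ValueError("pre-test average must lie in [0, 100)")
    return (post_percent - pre_percent) / (100 - pre_percent)


def hake_band(g):
    """Hake's conventional classification of normalized gain."""
    if g < 0.3:
        return "low"
    if g < 0.7:
        return "medium"
    return "high"


# Hypothetical class averages (percent correct), chosen only for illustration.
g = normalized_gain(pre_percent=40.0, post_percent=64.0)
print(round(g, 2), hake_band(g))  # 0.4 medium
```

Normalizing by the room left for improvement is what allows groups with different pre-test levels to be compared. This is the property that lets g partly stand in for a missing control group.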
With regard to RQ2, another crucial aspect arises from linguistic subtleties in two items of the CI²GT, namely items 1 and 9, which we address here to better contextualize our discussion of this research question. Answer option 2 of item 1 reads "The associativity property is required because we do not want the order of composition to matter." This distractor is ambiguous. Associativity is a property of binary operations that becomes relevant when composing three or more elements. Since a priori the expression a • b • c (for some a, b, c ∈ M and some set M) bears no mathematical meaning, one needs to interpret it as either a • (b • c) or (a • b) • c. If • is associative, both expressions are equal and the abuse of notation a • b • c is justified. However, one could also describe associativity by stating that the order implied by the brackets does not matter, whereas commutativity (a • b = b • a) concerns the order of the operands themselves. Thus, this answer option might be misinterpreted. It is to be revised (still hinting at commutativity) in future studies to examine whether the misconception identified in Section 4.4 still holds.
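Associativity and commutativity are logically independent, and the word "order" in the distractor can refer to either of them. Both properties can be checked by brute force on small finite sets. In the following sketch, which is our illustration with hypothetical helper names, composition of permutations of three objects is associative but not commutative. The operation a • b = (-a - b) mod 3 on {0, 1, 2} is commutative but not associative.

```python
from itertools import permutations, product


def is_associative(elements, op):
    """(a . b) . c == a . (b . c) for all a, b, c: only the bracketing changes."""
    return all(op(op(a, b), c) == op(a, op(b, c)) for a, b, c in product(elements, repeat=3))


def is_commutative(elements, op):
    """a . b == b . a for all a, b: the order of the operands changes."""
    return all(op(a, b) == op(b, a) for a, b in product(elements, repeat=2))


# Symmetric group S_3: permutations of (0, 1, 2) under composition.
S3 = list(permutations(range(3)))


def compose(p, q):
    """Apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


# A commutative but non-associative operation on {0, 1, 2}.
Z3 = list(range(3))


def neg_sum(a, b):
    return (-a - b) % 3


print(is_associative(S3, compose), is_commutative(S3, compose))  # True False
print(is_associative(Z3, neg_sum), is_commutative(Z3, neg_sum))  # False True
```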
Answer option 2 of item 9 reads "The notion isomorphic means that the Cayley tables are identical." Here, even though one could argue that the word identical has a well-defined, almost sacrosanct meaning within mathematics, it might be mistaken for similar. In that reading, both Cayley tables are only required to have a matching pattern, as is the case for the Cayley tables presented in items 10 and 12. This is crucial because the statement then becomes seemingly true (even though the notion of a pattern does not exist formally). Therefore, this answer option should be revised to eliminate possible confusion of notions. The revision of this item's distractor may be informed by recently gathered expert opinions on the topic of sameness in mathematics, presented in [43].
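The difference between "identical" and "isomorphic" Cayley tables can be illustrated with a brute-force search for a structure-preserving bijection φ, i.e., one with φ(a • b) = φ(a) ∗ φ(b). The sketch below uses a hypothetical function name and is feasible only for small tables. Listing the elements of Z₄ in a different order yields a table that is not identical to the original but is isomorphic to it. By contrast, Z₄ and the Klein four-group V₄ are not isomorphic, although both tables satisfy the row and column rule.

```python
from itertools import permutations


def find_isomorphism(t1, t2):
    """Return a bijection phi (as a tuple) with phi[t1[a][b]] == t2[phi[a]][phi[b]], or None.

    Elements of both tables are labeled 0, ..., n-1; all n! bijections are tried.
    """
    n = len(t1)
    if n != len(t2):
        return None
    for phi in permutations(range(n)):
        if all(phi[t1[a][b]] == t2[phi[a]][phi[b]] for a in range(n) for b in range(n)):
            return phi
    return None


# Z_4: addition modulo 4, elements listed as 0, 1, 2, 3.
Z4 = [[(a + b) % 4 for b in range(4)] for a in range(4)]

# The same group with its elements listed in the order 0, 2, 1, 3;
# row/column index i and entry value i both stand for the element order[i].
order = [0, 2, 1, 3]
Z4_relisted = [[order.index((order[i] + order[j]) % 4) for j in range(4)] for i in range(4)]

# Klein four-group V_4, realized as bitwise XOR on {0, 1, 2, 3}.
V4 = [[a ^ b for b in range(4)] for a in range(4)]

print(Z4_relisted == Z4)                              # False: the tables differ
print(find_isomorphism(Z4, Z4_relisted) is not None)  # True: the groups are isomorphic
print(find_isomorphism(Z4, V4))                       # None: no isomorphism exists
```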

Conclusions and Outlook
In this contribution, we demonstrated how data assessed with the CI²GT may be used to research various educational aspects of group theory. By measuring conceptual understanding of introductory group theory, we were able to (a) investigate the subdomains defined by experts in the field and the respective learning gains and (b) leverage the CRI to uncover and discuss recurring learning difficulties.
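The CRI-based reading of distractors used above, such as the share of respondents who chose a wrong option and were confident about it, can be tallied as in the following sketch. The certainty scale, the threshold and the responses are hypothetical placeholders, not the CI²GT's actual scale or data. For each wrong option, the function returns the proportion of all respondents who selected it with a certainty at or above the threshold.

```python
from collections import Counter


def confident_distractor_shares(responses, correct_option, threshold):
    """Share of all respondents choosing each wrong option with certainty >= threshold.

    responses: list of (chosen_option, certainty) pairs for one item.
    """
    total = len(responses)
    confident_wrong = Counter(
        option for option, certainty in responses
        if option != correct_option and certainty >= threshold
    )
    return {option: count / total for option, count in confident_wrong.items()}


# Hypothetical responses to one item on an assumed 0-5 certainty scale.
responses = [("A", 4), ("B", 5), ("B", 1), ("C", 3), ("A", 2),
             ("B", 4), ("D", 0), ("A", 5), ("C", 2), ("B", 3)]
shares = confident_distractor_shares(responses, correct_option="A", threshold=3)
print(shares)  # {'B': 0.3, 'C': 0.1}
```

A distractor chosen by a sizeable share of confident respondents points to a systematic learning difficulty rather than to guessing.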
On the one hand, these findings substantiate the qualities of the Hildesheim Teaching Concept from an empirical point of view, complementing the formative assessment presented in [11] and providing useful information for group theory instruction. On the other hand, the discussed results mark a fruitful starting point for future research and contribute to theory building in abstract algebra education research in the sense of Design-Based Research. Methods from neurology, physics education and mathematics education were synthesized to describe learning barriers in detail. In this regard, we provided the first empirical indications that Functional Fidelity and Fidelity of Gestalt are potentially relevant cognitive dimensions for describing mental models and thus of great use when characterizing learning difficulties within mathematics education. However, some of the items' distractors also proved ambiguous, which might have influenced the observed learning difficulties. With this in mind, the following research questions should be investigated in the future:

1. Does a revision of items 1 and 9 lead to a disappearance of the observed learning difficulties?
2. How and to what extent do the expounded learning difficulties impede learning gain?
3. Can these systematic learning obstacles also be observed in qualitative settings with individual learners? If so, can they be characterized in greater detail?
4. Can the cognitive structure of gestalt and functionality be extracted empirically to enrich the understanding of learning processes in introductory group theory?

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study to publish this paper.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.