## Appendix B. Detailed Timeline of Methodological Steps Taken in This Study

February: For coding of the geometry content knowledge for the area of a trapezoid, we adopted the IQA instrument rubric [26]; for each participant, we individually evaluated and compared our assessments of their answers to question (a) on the first five exemplars from the Trapezoid PCK Instrument (i.e., “(a) Based on the diagram above, describe [student]’s thinking. If [s/he] were to complete the formal derivation of the area formula in [her/his] diagrams, would [her/his] method work for any trapezoid? Why, or why not?”). We examined our individual scores and identified those that did not match. During the peer debriefing sessions, we adjusted our rubric to develop consistent scoring of the participants’ responses and reached consensus on the scores. We then completed the process for the other three exemplars. Based on the average scores per teacher and their ranking, we selected four marker cases for this study, one for each level (1–4). From the start, we wrote memos to record nascent ideas, document what prompted our data-gathering and initial sampling decisions, and note the development of theoretical categories.
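The marker-case selection step can be sketched as follows. This is a minimal illustration, not the study’s actual procedure: the function name and its inputs are hypothetical, and we assume a teacher’s level corresponds to the rounded mean rubric score, which the study does not specify.

```python
def select_marker_cases(scores_by_teacher):
    """Pick one marker case per rubric level (1-4).

    scores_by_teacher maps a teacher ID to that teacher's rubric scores.
    Teachers are ranked by mean score; the first teacher encountered at
    each (rounded-mean) level becomes the marker case for that level.
    NOTE: an illustrative sketch only; the level-assignment rule is assumed.
    """
    ranked = sorted(scores_by_teacher.items(),
                    key=lambda item: sum(item[1]) / len(item[1]))
    markers = {}
    for teacher, scores in ranked:
        level = round(sum(scores) / len(scores))
        markers.setdefault(level, teacher)  # keep the first teacher per level
    return markers
```

For example, with five teachers whose mean scores round to levels 1, 1, 2, 3, and 4, the function returns one representative teacher for each of the four levels.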

March: Individually, we developed rubrics for the evaluation of sections (b) and (c) for each exemplar (i.e., “(b) If […]’s approach presents a mathematical limitation, what kind of thinking might lead [her/him] to the limitation presented in this item?” and “(c) If [student]’s approach presents a mathematical limitation, how might [s/he] have developed it?”). The rubrics were based on the research literature on developmental levels in geometry, such as that discussed by van Hiele and Piaget. We then focused on two main mathematical concepts in the rubrics, those of trapezoid and area; compared and modified our rubrics to reach consensus; and used the newly developed rubric to complete the scoring of sections (b) and (c) for the first five exemplars provided by the marker cases. Further, we discussed possible changes to the instrument for clarity. These changes acknowledged the breadth and richness of teachers’ responses to the exemplars and required that, to be highly ranked, a response should recognize the student’s developmental level on the van Hiele scale with respect to both the area concept and the trapezoid concept.

April: Memo-writing helped us to “flag incomplete categories and gaps in [our] analysis”; it prompted us “to predict *where* and *how* [we] can find needed data to fill such gaps and to saturate categories” [22] (p. 199, italics in original). Aligned with the logic of theoretical sampling, we decided to add new marker cases to illuminate our categories. Based on the average scores per teacher and our consensus on their ordering according to the coding, we added two more marker cases, for a total of six. Using the rubric for sections (b) and (c), we compared our individual analyses of exemplars 1 and 5 (which presented two non-generalizable approaches using an enclosing-trapezoid technique) for the marker cases, reached consensus, and made changes to the rubric. We then completed the marker-case analyses for exemplars 3 and 6 (which presented two generalizable approaches, one using an enclosing-trapezoid strategy and the other a decomposing-trapezoid strategy).

June: We reviewed the rubric for sub-questions (b) and (c) and agreed on what worked and what should be changed. Then we completed the analysis of exemplars 1, 3, 5, and 6 for the marker cases. After considering the marker cases’ data and re-examining them, we decided to add one more marker case, to expose our theoretical interpretation to further empirical inspection [22]. We then created an initial version of a rubric for sub-question “(d) What further question(s) might you ask [student] to understand [student’s] thinking?”

Next, we discussed the grouping of sub-questions into two clusters, the first representing Pre-Active Behaviors [32] and consisting of sub-questions (e), (f), and (g) (i.e., “(e) What instructional strategies and/or tasks would you use during the next instructional period to address [student]’s challenge(s) (if any presented)? Why?”; “(f) If applicable, how would you use technology or manipulatives to address [student]’s challenge(s)?”; “(g) How would you extend this problem to help [student] further develop [her/his] understanding of the area of a trapezoid?”). These questions were related to Pre-Active Behaviors because they addressed planning, thinking about how to deal with students’ learning and behavioral issues, and assessment outside the classroom; in other words, things a teacher does before and after school or during recess.

Sub-questions (b) and (c) were considered as describing Interactive Behaviors [32], since they relate to what happens when a teacher is with students (in our case, the participants used these sub-questions to think about the mathematics task). After inspecting the Pre-Active Behaviors, we decided to examine (d) separately, or in combination with (b) and (c). We checked participants’ answers under (d) to see whether we could find evidence that one can be mathematically weak but pedagogically strong. Then we completed the marker-case analyses of all items for sub-question (d), went over the current version of the rubric for (d), and suggested modifications to better discriminate between marker cases. After modifying the rubric by evaluating the marker cases’ responses and seeing how many responses addressed the characteristics in sub-questions (a)–(g), we decided how to split the levels in the rubric. Going back and forth with the creation of the rubrics, looking into the literature and exploring the data through that lens, revising the rubrics iteratively, and treating data analysis and collection simultaneously were all aligned with the emergent nature of the Grounded Theory method.

July: We finalized the rubric for sub-question (d) and analyzed exemplars 1, 3, 5, and 6 for the seven marker cases. After creating a draft rubric for sub-questions (e), (f), and (g), we consulted the IQA rubrics [26] to see whether they could be utilized. We identified IQA RUBRIC 1: Potential of the Task as relevant for (e), and IQA RUBRIC 2: Implementation of the Task as relevant for both (f) and (g). Then we located and read articles related to the theoretical framework of our project and completed the rubrics for sub-questions (e), (f), and (g). This stage ended with the compilation of marker-case data for exemplars 1, 3, 5, and 6 for (e) and (f), and exemplars 1–9 for (g).

August: We completed the marker-case analyses for sub-questions (e), (f), and (g) using the developed rubrics and identified five dimensions of PCK, namely: geometric knowledge (a); knowledge of student challenges and conceptions (b) and (c); the ability to ask diagnostic questions (d); knowledge of applicable instructional strategies and tools (e)–(f); and the ability to extend understanding of a geometric problem (g). We then created profiles for the marker cases. With this research phase, we completed the steps of theoretical sorting, diagramming, and integrating [22].

September: We contacted the marker cases to schedule observations of those participants who were teaching Geometry. To complete our analysis, we checked the marker cases’ reflections to identify their individual characteristics, such as attitudes, perspectives, and motivations. We then analyzed the marker cases’ answers to the question, “Given that all of these students were in the same class you taught, what level 1–4 would you assign to each response [using the criteria of: (a) math appropriateness (suitability to generate expected outcome/formula for the area of any trapezoid), (b) clarity (how clear/unambiguous is this student’s strategy/approach), (c) sophistication (how sophisticated/complex is the student’s approach), and (d) limitations (how limited is this approach)]” [25]. Then we adjusted the profiles, calculated correlations, and created demographic summaries. Using the IQA rubrics [26], the course instructor completed two in-school observations for each teacher.

Throughout October–January 2015: We created qualitative descriptions of the observed lessons, illustrated with additional artifacts gathered during the lessons (e.g., pictures, screenshots, etc.). To allow for easy use of the Trapezoid PCK instrument, we created an online, dynamic version of its six representative exemplars and examined the reliability and validity of the instrument. Reliability was addressed through multi-rater agreement: all exemplar data were first evaluated separately by each author, after which any disagreements were discussed and resolved. Face validity of the Trapezoid PCK instrument was achieved before data collection commenced, through multiple rounds of peer review, and content validity through an analysis of the items by two authors, both experts in the field. Finally, we agreed upon new labels for the categories (e.g., a focus on why vs. how; an explorative/exploration vs. formulaic approach) and selected representative quotes for our report.
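The multi-rater agreement check can be illustrated with a short sketch. This is illustrative only: the appendix does not specify which agreement statistic was computed, so both raw percent agreement and Cohen’s kappa (a common chance-corrected alternative) are shown, and the function names are our own.

```python
from collections import Counter

def percent_agreement(rater1, rater2):
    """Proportion of items on which two raters gave identical scores."""
    if len(rater1) != len(rater2):
        raise ValueError("raters must score the same items")
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater1)
    p_observed = percent_agreement(rater1, rater2)
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # Chance agreement: probability both raters assign the same score
    # if each scored items at random with their observed frequencies.
    p_chance = sum(counts1[s] * counts2[s]
                   for s in set(rater1) | set(rater2)) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)
```

Two raters who agree on five of six level scores, for instance, reach a percent agreement of about 0.83, which kappa deflates once chance agreement on the small four-level scale is accounted for.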