Collaboration and Shared Responsibility in Team Teaching: A Large-Scale Survey Study

The practice of team teaching (how teachers deliver team teaching in the classroom) substantially determines its effect. The collaboration between the teachers and the level of shared responsibility between them are two important dimensions of this practice. To date, no instrument exists to measure these dimensions, although such an instrument is needed for empirical research on team teaching. Therefore, the Collaboration and Shared Responsibility in Team Teaching (CSTT) scale was developed to assess these two dimensions. The CSTT scale was used in a large-scale cross-sectional survey study (n = 555). In addition to validating the scale, this study provides empirical evidence on differences between groups of teachers regarding (a) teaching experience, (b) education type, and (c) frequency of team teaching. Results show that teachers overall report high scores on both dimensions. Further, there are no significant differences in either collaboration or shared responsibility between groups based on (a) teaching experience and (b) education type. There are, however, significant differences between groups based on (c) frequency of team teaching.


Introduction
Until now, teaching has remained a highly individualised activity, with little collaboration between teachers [1]. Teachers seem to work primarily in their own classroom, largely isolated from their colleagues [2]. This isolation may limit learning opportunities for teachers as well as for students [3]. In this respect, evidence suggests that teacher collaboration results in positive outcomes for both teachers and students [4]. Teachers who collaborate can feel less isolated [5] and be more effective in their teaching [6,7]. Further, students whose teachers collaborate may achieve better learning outcomes, experience richer and more varied lessons, and receive more support [8]. Hence, scholars have put forward collaboration as a way to improve teachers' teaching practice (e.g., [6,9,10]).
Collaboration within schools has, therefore, gained importance [11]. Educational institutions show a growing interest in teaching models in which teachers are more committed to collaborating, sharing expertise and experiences, supporting each other, and learning collaboratively [12], such as team teaching [13]. Team teaching can be described as two or more teachers in some level of collaboration in the planning, delivery, and/or evaluation of a course or courses [14]. Given its promising character [15], attention to team teaching has increased significantly during the past two decades [16], in both fundamental research and educational practice [17]. Despite this increased interest in and emphasis on teacher collaboration [18], team teaching has only been studied to a limited extent [19].
The practice of team teaching (how teachers deliver team teaching in the classroom) substantially determines its effect [20]. Because this practice plays a crucial role in the impact of team teaching, Sweigart and Landrum [21] recommend further analysis of it. However, to date, no appropriate measurement instrument captures important dimensions of the practice of team teaching. This study therefore contributes to the research base on team teaching by developing an instrument to assess two such dimensions, namely collaboration and shared responsibility. In addition, it investigates whether groups of teachers differ on these dimensions. The study thus goes beyond instrument development: it fills a gap in research on collaborative learning environments and provides insights that can foster more effective team teaching practices.

Team Teaching
In the literature, many synonyms for team teaching (e.g., co-teaching, collaborative teaching, and cooperative teaching) are often used interchangeably [19,22]. In this regard, team teaching is considered an umbrella term [19] that generally refers to the collaboration between two or more teachers in the planning, delivery, and/or evaluation of a course or courses [14]. Various definitions of team teaching can be found [16,19,23]. For instance, Welch et al. [24] (p. 38) define team teaching as "the simultaneous presence of two educators in a classroom setting who share responsibility in the development, implementation, and evaluation of direct service in the form of an instructional or behavioural intervention to a group of students with diverse needs". Thousand et al. [25] (p. 5) describe it as "when two or more people share responsibility for teaching some or all of the students assigned to a classroom". Similarly, Fuller and Bail [26] define team teaching as two or more people sharing responsibility for teaching some or all of the students assigned to a class. Overall, these definitions highlight the collaborative nature of team teaching, as well as the shared responsibility among the teachers involved.
Although some research exists, theory development on team teaching is still in its infancy [19], especially in compulsory education. Furthermore, research is mainly small-scale and qualitative [21,27] and focuses almost exclusively on experiences and perceptions of team teaching. Despite these gaps, the literature regards team teaching as a teaching model with several benefits for both teachers and students. Teachers report increased emotional and professional support, increased reflective dialogue, professional and personal growth, and learning gains [23]. Nevertheless, concerns have been recognised as well. For instance, teachers using team teaching indicate that their workload increases compared to individual teaching [28,29]. Moreover, they perceive that students compare the teachers, which could even lead to competition [30]. The literature also mentions benefits for students. Students taught in a team teaching environment report richer and more varied learning opportunities, quicker assistance, and more individualised attention [23]. However, there are also concerns for students. For instance, students may be confused when confronted with multiple teachers in the classroom and may not know which teacher to turn to [31].

Measurement Instruments
Although no appropriate instrument yet exists that provides insight into teachers' team teaching practice, several measurement instruments have been developed to capture related aspects. These instruments focus either on learners' perceptions of team teaching [32], on student teachers' perceptions of team teaching [33], or on perceived advantages and disadvantages of team teaching [34]. The existing instruments thus mainly address experiences and perceptions rather than the typical features of the practice of team teaching itself, which is the focus of this study. The newly developed instrument could therefore complement the existing instruments.

The Practice of Team Teaching
The practice of team teaching (how teachers deliver team teaching in the classroom) is expressed in the literature through models of team teaching [25,35]. These models represent the ways in which team teaching is established in the classroom (e.g., observation model, parallel model, teaming model). However, they do not correspond to the complex reality of the classroom. Simons et al. [22] recently placed the most common models of team teaching on two continua: collaboration and shared responsibility. For this reason, the objective of this study is to develop an instrument able to capture collaboration and shared responsibility as two important dimensions of the practice of team teaching.
Following Vangrieken et al. [4], collaboration can be defined as "joint interaction in the group in all activities that are needed to perform a shared task". It refers to teachers actually doing things together [36], which entails negotiation, discussion, and consideration of opposing viewpoints [37]. Following research by Valckx et al. [38], shared responsibility means that colleagues create a common sense of responsibility for all students' learning [39]. According to Sleegers et al. [40], shared responsibility is part of interpersonal capacity. This notion is echoed by Griffin and Robertson [41], who shift responsibility for a class from the individual teacher to the team in which that teacher works. More specifically, it represents a shift from the traditional approach, in which a teacher is mainly responsible for his or her own class, to one in which the whole team takes responsibility for the learning of the students in the classes it covers [42]. To accomplish this, team members must take on additional work, support other teachers, and contribute to decisions that support students in other classes [42].

Research Goals
The main aim of this study is to provide empirical evidence on important dimensions of the practice of team teaching (i.e., collaboration and shared responsibility). Given the inherent nature of team teaching, high levels of collaboration and shared responsibility are expected. However, to the best of our knowledge, no appropriate instrument is available to capture these dimensions. It is therefore necessary to first develop such an instrument, and it is hypothesised that this is possible. Next, this study investigates differences between groups of teachers regarding these dimensions, more specifically differences based on (a) teaching experience, (b) education type, and (c) frequency of team teaching. Following [22], it is assumed that the degree of collaboration and shared responsibility depends on how team teaching is practised. However, no empirical research has yet demonstrated how this occurs. This study aims to fill this research gap and thus to increase knowledge in this area. In addition, it provides insights that can inform practice and policy regarding team teaching.
In accordance with the purposes of this study, three research goals (RG) are formulated:
• RG1: Development of an instrument to capture collaboration and shared responsibility in team teaching;
• RG2: Providing empirical evidence on collaboration and shared responsibility in the practice of team teaching;
• RG3: Investigating whether differences exist between groups of teachers regarding important dimensions of the practice of team teaching.

Method
The threefold purpose of this study is reflected in the method. First, an instrument to measure collaboration and shared responsibility in team teaching was developed in four phases. Second, empirical evidence was gathered on these two dimensions of the practice of team teaching. Third, differences in teachers' practice of team teaching across several groups of teachers (i.e., based on (a) teaching experience, (b) education type, and (c) frequency of team teaching) were explored using two-sample t-tests and one-way analysis of variance (ANOVA), after tests for measurement invariance. To address the first research goal, a measurement instrument was developed and descriptive statistics were used. The instrument (the CSTT scale) was developed in four phases, following a procedure inspired by Gehlbach and Brinkworth [43]. First, based on the research literature on team teaching, a list of items was generated for each dimension (i.e., collaboration and shared responsibility). Second, an expert review was conducted with a panel of 12 teachers, teacher educators, and researchers. Third, a pilot study was conducted with 20 team teachers to improve content validity. Fourth, a validation and reliability study was carried out with 555 teachers, based on exploratory factor analysis, confirmatory factor analysis, and internal consistency analysis.

Phase 1-Preliminary Version
The preliminary version of the CSTT scale consisted of 20 items organised in two scales: collaboration (12 items) and shared responsibility (8 items). For the dimension of collaboration, five items were based on items of the Student Teachers' Team Teaching Perceptions Questionnaire (STTPQ) from De Backer et al. [33]. Seven items were inspired by conditions for successful collaboration derived from the literature review on student teachers' team teaching [14]. For the dimension of shared responsibility, four items were based on items of an instrument developed by Vangrieken et al. [44]. Four additional items were newly constructed to capture other key tasks in the classroom context (e.g., evaluating the lesson) as well as students' school outcomes (i.e., learning outcomes, wellbeing, and motivation). A 5-point Likert scale was used, ranging from 'I totally disagree' (0) to 'I totally agree' (4).

Phase 2-Expert Review
An expert review of the items was conducted to assess the content validity of the survey by requesting detailed feedback on the clarity, relevance, and quality of the items. The expert panel consisted of 12 experts from the field of education (1 teacher, 4 teacher educators, and 7 researchers). One item of the collaboration scale (i.e., 'During team teaching classes, I feel that my team teaching colleague(s) and I work together efficiently.') was deleted, because it was too general and was already covered by several other items. Furthermore, one item of the shared responsibility scale was reworded to enhance clarity (i.e., 'reflection' instead of 'evaluation'). Lastly, the order of the items was changed: the last five items of the collaboration scale were placed first, because they are more straightforward to answer.

Phase 3-Pilot Study
After the measurement instrument was developed and reviewed by experts, a pilot study was conducted over a three-week period (from 16 March to 6 April 2022) with 20 team teachers. Of the 20 participants, 17 were female. The mean age of the participating team teachers was 35 years (standard deviation (SD) = 6.6; range: 23-44), and they had on average 9 years of teaching experience (SD = 6.3; range: 0-20). Two teachers worked in pre-primary education, eight in primary education, eight in secondary education, and two in adult education. Participants were asked to fill out the instrument and could afterwards leave written remarks. No major changes were made after the pilot study. At this stage, the instrument consisted of 11 items for collaboration and 8 items for shared responsibility.

Phase 4-Validation and Reliability Study
A total of 555 participants from 86 schools in Flanders (the Dutch-speaking part of Belgium) completed the survey, which was conducted from March to June 2022. Data were collected through convenience sampling: all Flemish schools, encompassing pre-primary, primary, secondary, and adult education, were contacted by e-mail with information about the purpose and design of the study and asked to participate. In schools that agreed to participate, the survey was administered to all teachers with team teaching experience; teachers who indicated that they had never engaged in team teaching were not included. The survey ran on the online platform Qualtrics, and informed consent was obtained from all participants. The data were cleaned in the R statistical environment. Cases with missing data on the collaboration dimension (11 items) and the shared responsibility dimension (8 items) were identified and excluded, which resulted in the removal of 101 participants from the analyses.
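The exclusion step can be sketched as listwise deletion. This is a minimal Python illustration, not the authors' R code, and it assumes that a case is dropped when any of the 19 item responses is missing; the item labels and the dictionary-per-respondent layout are hypothetical.

```python
# Hypothetical item labels for the 19-item version of the survey:
# 11 collaboration items (C1-C11) and 8 shared responsibility items (SR1-SR8).
ITEMS = [f"C{i}" for i in range(1, 12)] + [f"SR{i}" for i in range(1, 9)]


def is_complete(response: dict) -> bool:
    """Return True when every one of the 19 items has a recorded answer."""
    return all(response.get(item) is not None for item in ITEMS)


def listwise_delete(responses: list) -> tuple:
    """Keep only complete cases and report how many cases were dropped."""
    kept = [r for r in responses if is_complete(r)]
    return kept, len(responses) - len(kept)


# Usage with three made-up respondents: one complete, one missing SR8, one empty.
complete = {item: 3 for item in ITEMS}
missing_one = {**complete, "SR8": None}
kept, dropped = listwise_delete([complete, missing_one, {}])
print(len(kept), dropped)  # 1 2
```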
Most participants are female (85%), hold a bachelor's degree (86%), and work full-time as a teacher (70%). Their age ranges from 22 to 62 years, with a mean of 39 years (SD = 10.37). The participants have on average 15 years of teaching experience (SD = 10.67), ranging from 0 to 41 years. More details are shown in Table 1. The data of these 555 participants were used to validate the measurement instrument by conducting (1) exploratory factor analyses (EFA) to examine the factor structure (i.e., the number of factors), (2) confirmatory factor analyses (CFA) to assess the stability of the factor structure, and (3) reliability analyses, based on Cronbach's alpha, to determine the internal consistency of the factors. These successive steps of scale construction follow the recommendations of the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education (AERA, APA, NCME) [45].
To check normality, the skewness and kurtosis of each item were screened [46,47] (see Appendix A; C = Collaboration; SR = Shared Responsibility). According to Kline [46], non-normality may be a concern if the absolute value of skewness is greater than three or the absolute value of kurtosis is greater than ten. Neither was the case, so the data in the present study could be considered approximately normal. However, visual inspection of the distributions showed that some items tended to deviate from normality. Therefore, an estimator that calculates robust standard errors was used, namely weighted least squares (WLS). The correlations between the items are all positive and range from 0.17 (i.e., C11-SR3) to 0.86 (i.e., SR7-SR8), indicating that when the score on one item increases, the score on the other item tends to increase as well.
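The screening rule can be sketched as follows. This Python sketch assumes moment-based skewness (g1) and excess kurtosis (g2, zero for a normal curve); statistical software may apply small-sample corrections, so values can differ slightly from those in Appendix A. The function names are hypothetical.

```python
def _central_moment(xs: list, k: int) -> float:
    """k-th central moment, using n (not n - 1) in the denominator."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs: list) -> float:
    """Moment-based skewness g1 = m3 / m2**1.5."""
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs: list) -> float:
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3."""
    return _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3


def flag_non_normal(xs: list, skew_cut: float = 3.0, kurt_cut: float = 10.0) -> bool:
    """Kline's rule of thumb: flag an item if |skewness| > 3 or |kurtosis| > 10."""
    return abs(skewness(xs)) > skew_cut or abs(excess_kurtosis(xs)) > kurt_cut


# Usage: a symmetric item passes; an item where almost everyone answers 0 is flagged.
print(flag_non_normal([0, 1, 2, 3, 4]))   # False
print(flag_non_normal([0] * 99 + [4]))    # True
```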
The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy [48] and Bartlett's test of sphericity [49] were used to assess the suitability of the data for factor analysis. The overall KMO value for this dataset is 0.94, exceeding the recommended minimum of 0.60 [50], and all KMO values for the individual items are > 0.86. Bartlett's test reached statistical significance (χ² = 8186.78, df = 171, p < 0.001). Both measures indicate that the properties of the correlation matrix justified carrying out a factor analysis.
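Both checks are closed-form functions of the item correlation matrix. The NumPy sketch below is illustrative only (the study used R) and runs on simulated data; it uses the standard formulas χ² = −(n − 1 − (2p + 5)/6)·ln|R| with df = p(p − 1)/2 for Bartlett's test, and KMO = Σr²/(Σr² + Σq²) over off-diagonal correlations r and partial correlations q.

```python
import numpy as np


def bartlett_sphericity(data: np.ndarray) -> tuple:
    """Bartlett's test of sphericity for an (n respondents x p items) matrix.

    chi2 = -(n - 1 - (2p + 5) / 6) * ln|R|, with df = p(p - 1) / 2,
    where R is the item correlation matrix.
    """
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)  # log-determinant, numerically stable
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    return chi2, p * (p - 1) // 2


def kmo(data: np.ndarray) -> tuple:
    """Overall and per-item Kaiser-Meyer-Olkin measures of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations: -inv_ij / sqrt(inv_ii * inv_jj).
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.where(off_diag, corr ** 2, 0.0)
    q2 = np.where(off_diag, partial ** 2, 0.0)
    overall = r2.sum() / (r2.sum() + q2.sum())
    per_item = r2.sum(axis=0) / (r2.sum(axis=0) + q2.sum(axis=0))
    return overall, per_item


# Usage on simulated one-factor data with 19 items (made-up, not the study data).
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 1)) + rng.normal(size=(500, 19))
chi2, df = bartlett_sphericity(data)
overall, per_item = kmo(data)
print(df)  # 171, since 19 * 18 / 2 = 171
```

With 19 items the degrees of freedom match the df = 171 reported above; a p-value could be obtained from the χ² survival function (e.g., `scipy.stats.chi2.sf`).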
To minimise the chance of overanalysing the data, the total sample (n = 555) was randomly divided into two (nearly) equal groups using the odds and evens split method [47]. This produced a development subsample (n1 = 278) and a validation subsample (n2 = 277), in which all team teachers were equally represented, so that the EFA and the CFA could be carried out on separate subsamples. Two-sample t-tests were conducted to examine possible significant differences between the two subsamples. Both subsamples were equivalent with regard to gender (t(552) = −0.710, p = 0.478), age (t(552) = 1.083, p = 0.279), and teaching experience (t(553) = 0.834, p = 0.405).
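A simple reading of the odds and evens split is to shuffle the cases and then assign them alternately to the two subsamples. The Python sketch below assumes exactly that; it does not stratify by any teacher characteristic, and the function name is hypothetical.

```python
import random


def odd_even_split(case_ids, seed=None):
    """Shuffle cases, then alternate them into a development and a validation subsample."""
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    development = shuffled[0::2]  # 1st, 3rd, 5th, ... case after shuffling
    validation = shuffled[1::2]   # 2nd, 4th, 6th, ... case after shuffling
    return development, validation


# Usage: 555 cases give subsamples of 278 and 277.
dev, val = odd_even_split(range(1, 556), seed=42)
print(len(dev), len(val))  # 278 277
```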
The EFA was conducted on the data of the first subsample (n1 = 278) to identify the number of latent variables underlying the measured items, without strong theoretical assumptions about how many factors existed [50]. Because team teachers' collaboration and shared responsibility tend to correlate [22] and the preliminary analysis confirmed that the factors were related, an exploratory factor analysis with weighted least squares estimation was performed, using direct oblimin rotation (an oblique rotation technique) [50]. Oblique rotation allows factors to be correlated and produces estimates of the correlations among factors [51]. The EFA was performed with the psych package [52] in R.
As advised by O'Connor [53], several statistical criteria were used to determine the number of factors to retain: the Kaiser criterion [54], which retains factors with eigenvalues greater than one, and Cattell's scree test [55]. Because both criteria sometimes overestimate the number of factors to retain [56], Horn's parallel analysis [57] and Velicer's Minimum Average Partial (MAP) test [58,59] were also conducted.
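Two of these retention criteria can be sketched in NumPy. This is an illustration, not the authors' R procedure: the parallel analysis below uses the component-based variant with the mean of random eigenvalues as benchmark (implementations also offer factor-based eigenvalues or the 95th percentile), and the simulated two-factor data are made up.

```python
import numpy as np


def sorted_eigenvalues(data: np.ndarray) -> np.ndarray:
    """Eigenvalues of the item correlation matrix, largest first."""
    return np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]


def kaiser_count(data: np.ndarray) -> int:
    """Kaiser criterion: number of eigenvalues greater than one."""
    return int((sorted_eigenvalues(data) > 1.0).sum())


def parallel_analysis(data: np.ndarray, n_iter: int = 100, seed: int = 0) -> int:
    """Horn's parallel analysis: retain leading factors whose eigenvalues exceed
    the mean eigenvalues of random normal data with the same n and p."""
    n, p = data.shape
    rng = np.random.default_rng(seed)
    observed = sorted_eigenvalues(data)
    random_eigs = np.array(
        [sorted_eigenvalues(rng.normal(size=(n, p))) for _ in range(n_iter)]
    )
    benchmark = random_eigs.mean(axis=0)
    keep = 0
    for obs, ref in zip(observed, benchmark):
        if obs <= ref:  # stop at the first eigenvalue that random data match
            break
        keep += 1
    return keep


# Usage on simulated data: two independent factors, each measured by five items.
rng = np.random.default_rng(1)
factors = rng.normal(size=(400, 2))
loadings = np.zeros((2, 10))
loadings[0, :5] = 1.0
loadings[1, 5:] = 1.0
data = factors @ loadings + rng.normal(size=(400, 10))
print(kaiser_count(data), parallel_analysis(data))
```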
In addition, item factor loadings were screened. Following the recommendations of Hair et al. [60], all items with loadings below 0.70 were excluded from further analyses. Furthermore, items in the retained factor solution with strong cross-loadings on another factor (i.e., when the gap between the primary loading and the cross-loading is smaller than 0.25) were also removed [47], because such items are affected by more than one factor and are thus deemed too complex [47].
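The two screening rules can be expressed as a small filter over a pattern matrix. The Python sketch below applies them to hypothetical loadings (not the values in Table 2); it keeps an item only if its highest absolute loading is at least 0.70 and exceeds its largest cross-loading by at least 0.25.

```python
def retain_items(loadings: dict, min_loading: float = 0.70, min_gap: float = 0.25) -> tuple:
    """Split items into kept and dropped using the loading and cross-loading rules.

    `loadings` maps an item label to its loadings on each factor.
    """
    kept, dropped = [], []
    for item, row in loadings.items():
        ranked = sorted((abs(x) for x in row), reverse=True)
        primary = ranked[0]
        cross = ranked[1] if len(ranked) > 1 else 0.0
        if primary >= min_loading and primary - cross >= min_gap:
            kept.append(item)
        else:
            dropped.append(item)
    return kept, dropped


# Usage with made-up loadings on two factors.
example = {
    "C1": [0.82, 0.05],   # strong, clean loading: kept
    "C10": [0.65, 0.10],  # primary loading below 0.70: dropped
    "SR1": [0.10, 0.78],  # strong, clean loading: kept
    "X1": [0.72, 0.55],   # gap of 0.17 < 0.25: dropped as too complex
}
print(retain_items(example))  # (['C1', 'SR1'], ['C10', 'X1'])
```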
A confirmatory factor analysis with weighted least squares mean and variance adjusted (WLSMV) estimation was performed on the data of the second subsample (n2 = 277) to assess the stability of the factor structure proposed by the EFA [50]. Several fit indices were calculated to determine whether this factor structure fits the empirical data. The CFA was conducted with the lavaan package in R [61].
The following fit indices were evaluated: the χ² test and its p-value, the χ²/df ratio, the comparative fit index (CFI) [62], the Tucker-Lewis index (TLI) [63], the root mean square error of approximation (RMSEA) [64], and the standardised root mean square residual (SRMR) [65]. The χ² value is reported along with its p-value and degrees of freedom but is not used to judge fit, because of the known limitations of the χ² test (e.g., its sensitivity to sample size) [66]. For the χ²/df ratio, a value ≤ 3 indicates acceptable fit [67]. CFI and TLI values ≥ 0.90 indicate adequate fit, and values ≥ 0.95 indicate good fit [68]. Following Hu and Bentler [68], cut-off values of ≤ 0.06 for RMSEA and ≤ 0.08 for SRMR indicate good fit.
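The cut-offs above can be collected in a small checker, together with the textbook point estimate RMSEA = √(max(χ² − df, 0) / (df · (N − 1))). This Python sketch is illustrative: lavaan reports scaled or robust versions of these indices under WLSMV, so the plain formula need not match its output in general, although with the χ² = 108.094, df = 89, and n2 = 277 reported in the Results it gives 0.028.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA = sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def meets_cutoffs(chi2: float, df: int, cfi: float, tli: float,
                  rmsea_value: float, srmr: float) -> dict:
    """Check each index against the cut-offs used in this study
    (chi2/df <= 3; CFI and TLI >= 0.95 for good fit; RMSEA <= 0.06; SRMR <= 0.08)."""
    return {
        "chi2/df": chi2 / df <= 3,
        "CFI": cfi >= 0.95,
        "TLI": tli >= 0.95,
        "RMSEA": rmsea_value <= 0.06,
        "SRMR": srmr <= 0.08,
    }


# Usage with the CFA values reported in the Results.
print(round(rmsea(108.094, 89, 277), 3))  # 0.028
print(meets_cutoffs(108.094, 89, 0.975, 0.971, 0.028, 0.039))
```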
Cronbach's alpha was calculated on the complete sample (n = 555) as a measure of internal consistency to determine the psychometric quality of the scales. Factors with a Cronbach's alpha of at least 0.80 are considered reliable [47]. The reliability analyses were performed with the psych package in R [52].
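Cronbach's alpha follows directly from item and total-score variances: α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₜₒₜₐₗ). The Python sketch below computes it from raw scores; the function name and the tiny example data are hypothetical, and the study itself used the psych package in R.

```python
from statistics import variance


def cronbach_alpha(items: list) -> float:
    """alpha = k / (k - 1) * (1 - sum of item variances / variance of total score).

    `items` holds one list of scores per item, all over the same respondents.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Usage: two identical items are perfectly consistent; two items correlating 0.5 are not.
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))  # 1.0
print(cronbach_alpha([[1, 2, 3], [1, 3, 2]]))
```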

Empirical Evidence on Collaboration and Shared Responsibility in the Practice of Team Teaching (RG2)
To gain a first empirical insight into the dimensions of collaboration and shared responsibility in the practice of team teaching, descriptive statistics were used. The mean scores of the dimensions, their standard deviations, their correlation, and their internal consistency (Cronbach's alpha) were calculated. Furthermore, a paired-samples t-test was conducted to assess whether the mean scores of the two dimensions differed significantly. These analyses were performed with the psych package in R [52].
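Because every teacher provides a score on both dimensions, the comparison is a paired-samples t-test on the within-person differences: t = mean(d) / (sd(d) / √n), with df = n − 1 (hence df = 554 for 555 teachers). The Python sketch below shows the statistic only; the scores are made up, and a p-value would come from the t distribution (e.g., `scipy.stats.ttest_rel`).

```python
import math
from statistics import mean, stdev


def paired_t(x: list, y: list) -> tuple:
    """Paired-samples t statistic and degrees of freedom for scores x and y
    measured on the same respondents."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Usage with made-up collaboration and shared responsibility scores of four teachers.
collaboration = [3, 4, 5, 4]
shared_responsibility = [2, 4, 3, 3]
print(paired_t(collaboration, shared_responsibility))
```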

Differences in Teachers' Practice of Team Teaching across Several Groups of Teachers (RG3)
To address the third research goal, two-sample t-tests and one-way ANOVA were used, after testing for measurement invariance. Multiple-group measurement invariance, based on multigroup confirmatory factor analysis, was tested to determine whether the factor structure of the developed measurement instrument is invariant across (a) teaching experience, (b) education type, and (c) frequency of team teaching.
To verify whether the developed instrument measures the same constructs with the same structure across (a) teaching experience, participants were divided into two groups: teachers with less than five years of experience and teachers with more than five years of experience. Five years was chosen as the cut-off because it is common in previous research (e.g., [69,70]). Regarding (b) education type, participants from four types of education were surveyed: pre-primary, primary, secondary, and adult education. The third group characteristic, (c) frequency of team teaching, indicates how often the participant uses team teaching. For this characteristic, a cut-off of once a week was chosen: teachers who use team teaching less than once a week were compared with teachers who use it more than once a week. At each step of the analysis, a series of factor models were estimated that impose increasingly strict constraints on the parameters across groups. Before testing measurement invariance, a baseline model was determined for each group [71]. The two-factor structure of the CSTT scale served as the initial model tested when creating the baseline models for each group.
Four levels of measurement invariance are distinguished (from least to most constrained): (1) configural invariance, (2) metric invariance, (3) scalar invariance, and (4) strict invariance [72]. First, the baseline model was tested for equivalent factor structures (i.e., (1) configural invariance) by specifying the same measurement model across the groups. In this model, both the number of factors and the factor-indicator correspondence are the same, but all factor loadings and item intercepts are freely estimated within each group [46]. If only configural invariance is established, teachers conceptualise the constructs (i.e., collaboration and shared responsibility) similarly, but this does not guarantee that individual items are interpreted in the same way. In the second model, the factor loadings were constrained to be equal across groups [46], while the item intercepts were allowed to vary freely. (2) Metric invariance indicates that factor loadings are equal across groups and that items are, therefore, interpreted in a similar way. The third model additionally constrains the item intercepts to be equal across groups [46]. If (3) scalar (or strong) invariance is achieved, differences in the means of the observed items can be interpreted as a consequence of differences in the means of the latent constructs. The (4) strict invariance model is a more constrained version of the scalar model, in which the factor loadings, intercepts, and residual variances are all fixed to be equal across groups [73]. Multiple-group measurement invariance was analysed using multigroup CFA with WLSMV estimation [74] in the lavaan package in R [61].
To compare the successive models, changes in CFI, RMSEA, and SRMR were evaluated. According to Cheung and Rensvold [75] and Chen [76], ΔCFI should be smaller than or equal to 0.010 and ΔRMSEA smaller than or equal to 0.015. Chen [76] additionally suggests a cut-off of 0.030 for ΔSRMR.
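The decision rule for moving from one invariance level to the next can be sketched as follows. This Python sketch uses the cut-offs stated above and hypothetical fit values; it assumes the configural model itself already fits adequately, since configural invariance is judged by absolute fit rather than by these change criteria.

```python
from dataclasses import dataclass


@dataclass
class Fit:
    cfi: float
    rmsea: float
    srmr: float


LEVELS = ["configural", "metric", "scalar", "strict"]


def invariance_holds(less: Fit, more: Fit, d_cfi: float = 0.010,
                     d_rmsea: float = 0.015, d_srmr: float = 0.030) -> bool:
    """Compare a more constrained model with the preceding, less constrained one:
    CFI may drop by at most d_cfi; RMSEA and SRMR may rise by at most d_rmsea and d_srmr."""
    return (
        less.cfi - more.cfi <= d_cfi
        and more.rmsea - less.rmsea <= d_rmsea
        and more.srmr - less.srmr <= d_srmr
    )


def highest_level(fits: list) -> str:
    """Walk through configural -> metric -> scalar -> strict and return the last level retained."""
    reached = LEVELS[0]
    for level, previous, current in zip(LEVELS[1:], fits, fits[1:]):
        if not invariance_holds(previous, current):
            break
        reached = level
    return reached


# Usage with made-up fit values: the scalar model loses too much CFI.
configural = Fit(cfi=0.975, rmsea=0.030, srmr=0.040)
metric = Fit(cfi=0.972, rmsea=0.031, srmr=0.045)
scalar = Fit(cfi=0.950, rmsea=0.050, srmr=0.050)
print(highest_level([configural, metric, scalar]))  # metric
```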
When measurement invariance is established across (a) teaching experience, (b) education type, or (c) frequency of team teaching, the mean sum scores for collaboration and shared responsibility can be compared across these groups. To assess whether these groups differ significantly, two-sample t-tests and one-way ANOVAs with Bonferroni post hoc tests were performed. When a significant difference was identified, effect sizes (Cohen's d and omega squared (ω²)) were also calculated to interpret the practical importance of the results. All data analyses were conducted in R, using the psych [52] and rstatix [77] packages.
To interpret the factor structure, an exploratory factor analysis with weighted least squares estimation and direct oblimin rotation was applied to the first subsample (n1 = 278). The factor loadings of the two-factor solution, with sums of squared loadings (SS) of 6.753 (Factor 1) and 4.167 (Factor 2), were examined. Of the 19 items, four have a factor loading below 0.70 (i.e., C10, SR2, SR7, and SR8). No item has a cross-loading of more than 0.25 with the other factor. Thus, 15 items were retained. The deleted items are shown in italics in Table 2.
A CFA with WLSMV estimation was performed on the second subsample (n2 = 277) to confirm the two-factor structure with 15 items resulting from the EFA and to determine whether the factors are independent or related (see Table 3). Several fit indices were calculated to determine whether this factor structure fits the empirical data: χ² = 108.094, df = 89, p = 0.082; χ²/df = 1.213; CFI = 0.975; TLI = 0.971; RMSEA = 0.028; SRMR = 0.039. All fit indices meet the generally accepted norms for CFA [68]. The results of the CFA thus show a good fit for the two-factor model with collaboration (10 items) and shared responsibility (five items) as factors. Collaboration is defined as the joint interaction in the group in all activities that are needed to perform a shared task [4]. Shared responsibility means that colleagues create a common sense of responsibility for all students' learning [39].
A reliability analysis was performed on the complete dataset (n = 555) to examine the internal consistency of the two factors. The newly constructed scale is highly reliable, with Cronbach's alphas of 0.949 for collaboration and 0.879 for shared responsibility as two important dimensions of the practice of team teaching (see Table 3).

Empirical Evidence on Collaboration and Shared Responsibility in the Practice of Team Teaching (RG2)
Descriptive statistics show that teachers report high scores on both dimensions. Collaboration (10 items) has a mean score of 3.54 (SD = 0.587) on a scale from 0 to 4, and shared responsibility (five items) a mean score of 3.05 (SD = 0.880). The two dimensions are moderately positively correlated (r = 0.433): teachers who report more collaboration tend to report more shared responsibility as well, and vice versa. These results indicate that teachers experience a high degree of collaboration and a high degree of shared responsibility as two related dimensions of the practice of team teaching. Additionally, a paired-samples t-test showed a significant difference between the mean scores for collaboration and shared responsibility (t(554) = 14.167, p < 0.001): teachers reported significantly higher scores for collaboration than for shared responsibility.

Differences in Teachers' Practice of Team Teaching across Several Groups of Teachers (RG3)
The current study attempted to establish scalar invariance for (a) teaching experience (i.e., teachers with less than five years of experience, teachers with more than five years of experience), (b) education type (i.e., pre-primary, primary, secondary, and adult education), and (c) frequency of team teaching (i.e., teachers who team teach less than once a week, teachers who team teach more than once a week).
Because measurement invariance was established across (a) teaching experience, (b) education type, and (c) frequency of team teaching, the mean sum scores can be compared across these groups.
The mean scores for collaboration (M_C) and shared responsibility (M_SR) of teachers with less than five years of experience (n = 127, M_C = 3.51, M_SR = 2.98) and teachers with more than five years of experience (n = 428, M_C = 3.55, M_SR = 3.07) were compared using a two-sample t-test with Welch's correction for unequal variances (hence the non-integer degrees of freedom). The results show no significant differences between the two groups for either collaboration (t(210.068) = −0.750, p = 0.454) or shared responsibility (t(212.881) = −0.963, p = 0.337). Thus, teachers with less than five years of experience report levels of collaboration and shared responsibility comparable to those of teachers with more than five years of experience.
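The non-integer degrees of freedom reported above (e.g., t(210.068)) are characteristic of Welch's unequal-variances t-test, whose degrees of freedom come from the Welch–Satterthwaite approximation. A minimal standard-library sketch of the statistic and its degrees of freedom from group summaries follows. The helper name `welch_t` and the example numbers are hypothetical, since the group SDs are not reported in this section.

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each group mean.
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; usually not an integer.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Hypothetical summaries for two unequal groups (not the study's data).
t, df = welch_t(3.0, 0.9, 120, 3.2, 0.6, 400)
print(f"t({df:.3f}) = {t:.3f}")
```

The p-value is then read from a t distribution with df degrees of freedom; SciPy's `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` performs the same Welch test directly from summary statistics.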
Furthermore, one-way analyses of variance with Bonferroni post hoc tests were used to compare the mean scores of teachers in pre-primary (n = 100, M_C = 3.54, M_SR = 3. […]. This means that teachers from pre-primary, primary, secondary, and adult education report comparable levels of collaboration and shared responsibility. Subsequently, the mean scores for collaboration and shared responsibility of teachers who team teach less than once a week (n = 148, M_C = 3.39, M_SR = 2.81) and teachers who team teach more than once a week (n = 407, M_C = 3.60, M_SR = 3.14) were compared using a two-sample t-test with Welch's correction. The results show significant differences between the two groups for both collaboration (t(216.279) = 3.252, p = 0.001, d = 0.35) and shared responsibility (t(237.401) = 3.737, p < 0.001, d = 0.38). Teachers who team teach less than once a week thus report significantly lower scores for collaboration and shared responsibility than teachers who team teach more than once a week. According to Cohen's d effect size benchmarks, these are small differences between the two groups of teachers [78].
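Two further quantities in this paragraph can be made concrete. Cohen's d expresses a mean difference in standard-deviation units; one common variant divides by the pooled SD, and by Cohen's conventional benchmarks d ≈ 0.2 is small, 0.5 medium, and 0.8 large. A Bonferroni correction for post hoc comparisons multiplies each pairwise p-value by the number of comparisons (six pairs for four education types), capping the result at 1. The sketch below assumes the pooled-SD variant of d, which the text does not specify; the function names and the p-values are hypothetical.

```python
import math
from itertools import combinations

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d using the pooled standard deviation of two groups."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

groups = ["pre-primary", "primary", "secondary", "adult"]
pairs = list(combinations(groups, 2))            # 6 pairwise post hoc comparisons
raw_p = [0.004, 0.03, 0.20, 0.45, 0.60, 0.90]    # hypothetical unadjusted p-values
for (a, b), p_adj in zip(pairs, bonferroni(raw_p)):
    print(f"{a} vs {b}: adjusted p = {p_adj:.3f}")
```

Adjusting the p-values is equivalent to testing each raw p-value against 0.05 divided by the number of comparisons.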

Discussion
The impact of team teaching is determined primarily by how it is put into practice [20]. To conduct further and more in-depth research on team teaching [21], an instrument is needed that can map how team teaching is actually realised. Therefore, the Collaboration and Shared Responsibility in Team Teaching (CSTT) scale was developed. EFA, CFA, and reliability analyses based on a large-scale cross-sectional survey dataset (N = 555) identified two factors: collaboration (10 items, α = 0.951) and shared responsibility (5 items, α = 0.879). Collaboration is defined as the joint interaction in the group in all activities that are needed to perform a shared task [4]. Shared responsibility means that colleagues create a common sense of responsibility for all students' learning [39]. This two-factor structure fully aligns with the theoretically assumed two-dimensional structure [22]. The CSTT scale therefore makes it possible to assess collaboration and shared responsibility as two important dimensions of the team teaching practice. It offers a specific tool for systematically assessing these two dimensions, which are considered crucial to the effectiveness of team teaching, and can thus help identify both strengths and areas for improvement in the practice of team teaching. Its development enriches the toolkit available to researchers and teachers alike: this study fills a gap in research and may help teachers develop more effective team teaching practices.
Next, this study provides first empirical insights into the practice of team teaching. The results show that teachers with team teaching experience report a high degree of collaboration and a high degree of shared responsibility. This means that teachers can count on each other for questions and concerns and give each other emotional and professional support. They trust and respect each other, are open to reflection, and give each other feedback. It also implies that the teachers are jointly responsible for the course or courses and for their students' learning outcomes, well-being, and motivation. Previous research likewise indicates that collaboration and shared responsibility can have a major impact on both teachers and students. For instance, the review study of Vangrieken, Dochy [4] shows that, although teacher collaboration is challenging to achieve, it has many benefits for teachers, students, and the school as a whole. A recent study by Berry [79] indicates that a shared sense of responsibility for the education of students with disabilities can have positive effects on both teachers and students. It is therefore encouraging that teachers report high levels of collaboration and shared responsibility, since the research literature considers these two important dimensions of the practice of team teaching [22]. Although most teachers report high scores on both dimensions, some team teachers report a lower score on one dimension, or even on both. These lower scores could be explained by the team teaching model used. The models of team teaching represent the ways in which team teaching is established in the classroom (e.g., observation model, parallel model, teaming model). For instance, the observation model would imply a lower level of collaboration and shared responsibility than the teaming model [14].
In the observation model, one teacher observes the students while the other teacher teaches the course [13]. Further research measuring the relationship between the team teaching model used and the two dimensions could address this. One may also ask whether both dimensions need to score high for team teaching to be considered of high quality. In our view, a lower score on one or both dimensions is not necessarily a negative sign. In this respect, it is important to emphasise that this measurement instrument is not normative; rather, it seeks to reflect important dimensions of team teaching, without being all-encompassing.
Furthermore, tests for measurement invariance on the two factors of the CSTT scale support configural, metric, scalar, and strict invariance across (a) length of teaching experience, (b) education type, and (c) frequency of team teaching. This means that teachers in these groups interpret the instrument in a consistent manner. The CSTT scale can therefore be considered a solid and robust instrument for use with both experienced and less experienced teachers, with teachers from pre-primary, primary, secondary, and adult education, and with teachers who team teach at both low and high frequency. With 15 items (10 for collaboration and five for shared responsibility), the CSTT scale is simple and quick to administer and can serve as a diagnostic measure of teachers' practice of team teaching. It can also inform the planning of teaching and learning activities aimed at improving team teaching practice. For example, a team of team teachers could use the scale as a starting point for discussing their collaboration and shared responsibility. If one or more teachers report lower scores on collaboration than their colleagues, for instance, this may signal a need to talk about it. To deepen the conversation, individual items can be discussed concretely: a fairly low score on the item about discussing experiences openly, for example, could spark a conversation.
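The diagnostic use described above can be sketched in a few lines: compute each teacher's mean score per subscale on the 0 to 4 scale, then flag subscale scores that fall clearly below the team average as conversation starters. This is only an illustration under assumptions: the item order (items 1 to 10 for collaboration, 11 to 15 for shared responsibility), the helper names, and the 0.5-point margin are hypothetical choices, not part of the published CSTT scoring instructions.

```python
from statistics import mean

# Hypothetical item layout; check against the actual CSTT item order before use.
COLLABORATION_ITEMS = range(0, 10)   # 10 collaboration items
SHARED_RESP_ITEMS = range(10, 15)    # 5 shared responsibility items

def subscale_scores(answers):
    """Mean score (0-4 scale) per subscale for one teacher's 15 responses."""
    if len(answers) != 15:
        raise ValueError("expected 15 item responses")
    return {
        "collaboration": mean(answers[i] for i in COLLABORATION_ITEMS),
        "shared responsibility": mean(answers[i] for i in SHARED_RESP_ITEMS),
    }

def flag_lower_scores(team, margin=0.5):
    """List (teacher, subscale) pairs at least `margin` below the team mean."""
    scores = {name: subscale_scores(a) for name, a in team.items()}
    flags = []
    for subscale in ("collaboration", "shared responsibility"):
        team_mean = mean(s[subscale] for s in scores.values())
        for name, s in scores.items():
            if team_mean - s[subscale] >= margin:
                flags.append((name, subscale))
    return flags

# Hypothetical three-teacher team.
team = {
    "teacher A": [4] * 15,
    "teacher B": [4] * 14 + [3],
    "teacher C": [2] * 10 + [4] * 5,
}
print(flag_lower_scores(team))  # [('teacher C', 'collaboration')]
```

A flag here is not a verdict on quality; in line with the non-normative intent of the scale, it only marks where a team conversation might start.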
Because measurement invariance of the CSTT scale was established across (a) teaching experience, (b) education type, and (c) frequency of team teaching, differences between these groups could be examined. The results indicate no significant differences between groups based on (a) teaching experience or (b) education type for either collaboration or shared responsibility. There are, however, significant differences between groups in terms of (c) frequency of team teaching: teachers who team teach less than once a week experience less collaboration and shared responsibility with their team teaching colleague(s) than teachers who team teach more than once a week. More frequent team teaching thus goes hand in hand with more collaboration and shared responsibility.
As one of the first large-scale quantitative survey studies on team teaching, this study presents an instrument (i.e., the CSTT scale) to measure two important dimensions of the practice of team teaching. This instrument is an important contribution to the field, as it opens up new avenues of research on the practice of team teaching. Further research can, for example, investigate the relationship between the practice of team teaching and teachers' effective teaching behaviour. The first empirical insights show that team teachers experience a high degree of collaboration and shared responsibility, and that the frequency of team teaching is related to both dimensions.
This study is not without limitations. First, although there are good theoretical reasons to believe that the practice of team teaching can be further conceptualised as collaboration and shared responsibility [22], other conceptualisations are also possible. For example, further research could take other dimensions such as team similarity, team efficacy, and team potency into account, as team teaching is a complex concept.
Second, the outcomes of the CSTT scale are self-reported data. Teachers' answers may therefore have been influenced by social desirability, a risk inherent in any form of subjective data collection [80]. However, throughout survey development and administration, several steps were taken to reduce social desirability bias, including an expert review and a pilot study. Future research should combine these data with other data collection methods, such as observations or interviews, which could further verify the validity of the CSTT scale.
Third, although the sample met all criteria required to develop the questionnaire, it consists solely of teachers from Flemish schools. This limits the generalisability of the questionnaire and the results to other contexts. Future research is therefore encouraged to translate, adapt, and validate the CSTT scale in other educational settings. Translating the CSTT scale into different languages and validating it in different contexts will also create opportunities for additional and comparative research on the practice of team teaching in other regions and contexts. To facilitate this, the original Dutch version and an English translation are included as an Appendix. To adapt and use this scale in different cultures, contexts, and countries, researchers are advised to follow a systematic process that repeats the steps completed in this study while taking both linguistic and cultural nuances into account. First, the scale should be translated so that the essence of its items is preserved. Next, pilot and/or expert testing is crucial to uncover possible language or comprehension problems. Finally, psychometric analyses must be conducted to determine the reliability and validity of the adapted scale.

Conclusions
This manuscript reports on the development of the Collaboration and Shared Responsibility in Team Teaching (CSTT) scale. The CSTT scale is an instrument to measure collaboration and shared responsibility, as two important dimensions of the team teaching practice. The first empirical evidence shows overall high scores on both dimensions. Further results indicate that there are no significant differences between the groups based on (a) teaching experience and (b) education type for both collaboration and shared responsibility. There are, however, significant differences between groups in terms of the (c) frequency of team teaching. In sum, the CSTT scale is a solid and robust instrument, which can be useful as a diagnostic measure to assess teachers' team teaching practice.