Development of an ESD Indicator for Teacher Training and the National Monitoring for ESD Implementation in Germany

Education for Sustainable Development (ESD) is a core element of Target 4.7 of the United Nations Sustainable Development Goals (SDGs), which seeks to ensure that all learners acquire the knowledge and skills needed to promote sustainable development. The German Federal Ministry for Education and Research (BMBF) followed suit in 2015 and launched a large-scale national monitoring of the current state of ESD implementation. In this context, suitable ESD indicators should be analyzed to inform policy and research agendas. The present project is part of the national monitoring within Germany's Global Action Program (GAP) actions. The research team at the University of Education in Freiburg conducted a study to evaluate the accessibility of data and the measurability of ESD-relevant teacher training (TT). During the two-step procedure for data collection on ESD-relevant TTs in Germany, an extensive list of ESD-related search terms first captured 66,935 TTs with possible ESD relevance in the evaluation period. Second, the collected data was analyzed using Mayring's qualitative content analysis. The 66,935 TTs were thereby reduced to 3818 TTs with different degrees of ESD relevance. The results of the evaluation study show that suitable ESD indicators, the FESD (formula for the ESD indicator for TTs) (basic), FESD (basic, rated), and FESD (pro), could be developed and calculated for 15 of 16 federal states in Germany. The gathered insights show a path towards ESD monitoring in TT to clarify the needs and achievements of ESD implementation in the field of continuing education of teachers. However, the presented indicators only show one possible path for ESD indicator development. A comprehensive set of ESD indicators should also focus on the micro or output level (e.g., ESD competencies). These insights seem worth striving for not only in Germany or on the national level but also internationally, to foster ESD, Target 4.7 of the SDGs, and the SDGs in general.


Introduction
ESD is placed at the center of the 2030 Sustainable Development Agenda and has been widely recognized as a key enabler of sustainable development and an integral element of quality education [1] (p. 4). This agenda is an equally important focus of European and transnational endeavors connected to the SDGs and ESD. These national endeavors, processes, and analyses are not only 'glocally' relevant; they also contribute to a general understanding of the challenges, needs, and success criteria of the ESD implementation process (e.g., [2]). Accordingly, we present the project located in Germany, ESD_indicator_teacher training, placed in the broader concept of the national monitoring actions of ESD, hoping that the project contributes to a broad transnational understanding of ESD implementation efforts, measurement possibilities, and remaining challenges.
In 2015, the German Federal Ministry for Education and Research (BMBF) established a multi-level implementation structure to realize the GAP on ESD. It includes a steering body (the national platform), six expert forums (early childhood education, school, vocational education and training, higher education, informal and non-formal learning/youth, and local authorities), a youth forum, and 10 partner networks consisting of practitioners in the different thematic and educational fields of ESD (such as early childhood education and higher education but also biodiversity or media). The main bodies of the structure (national platform, expert forums, and partner networks) are designed to facilitate the information flow between top-down and bottom-up hierarchies and within the vertical structures of the complex system of actors involved in ESD governance in Germany (for ESD governance, see [3,4]) and to connect (inter)national and sometimes abstract political aims with concrete educational practices. No central or national German educational policy exists. Given the importance of the federal states concerning education in Germany, representatives of the individual federal states are included in almost all the expert forums. The implementation process is informed by an international advisor at the German UNESCO Commission and a scientific advisor at the Institut Futur, located at the Freie Universität Berlin. The department of the scientific advisor is also responsible for the wide-scale national monitoring of the current state of ESD implementation, with the aim of informing the stakeholders involved in the GAP structure, the scientific community, and the public about the status quo of ESD in Germany and effective measures for upscaling it.
This implementation structure of the GAP addresses what Laessøe and Mochizuki [5] have identified as missing elements in the actual governance efforts related to ESD, i.e., a lack of concrete national action plans, curricula frameworks and guidelines, fragmentation, a lack of coordinating bodies, and a lack of systematic monitoring and evaluation of ESD policy efforts as well as of what actually happens in practice (see [5], p. 38). Consequently, a consensus has formed among stakeholders on the need for ESD operationalization, and the development of quantitative indicators is seen as a major desideratum (see [6], p. 85). The aim to develop indicators for ESD has been explicitly voiced since the beginning of the UN Decade for ESD, and first attempts at its operationalization have been suggested (see, e.g., [7–9]). However, many tasks on how to generate a set of ESD indicators that are specific, measurable, attainable, realistic, and timely [10] (p. 35) remain incomplete. Moreover, preceding analyses on ESD indicators have mostly drawn on either qualitative interviews or state government reports [8,9,11]. Additionally, these studies mainly had a macro focus or relied on self-report surveys with dichotomous (yes/no) scale formats (e.g., see [12–15]).
This article seeks to contribute to the emerging research agenda on how to measure ESD implementation. Agreeing on standardized ways of operationalizing ESD is crucial, given that the operationalization can be understood as being "influenced by international as well as unique socio-political and cultural contexts which make them [...] more or less floating concepts rather than consistent and fixed concepts" [5] (p. 32). This statement mirrors the level of operationalization that is seen as an important characteristic of ESD, that is, standardization should address 'glocality'. When operationalizing ESD, the definition must allow for large-scale or international comparability while also incorporating aspects that reflect national priorities or other local specifics (see [9], p. 100).
On the content level, ESD is linked to an idea of universal concepts and human rights applicable to everyone, but at the same time concrete links to the local circumstances (and lifeworlds) should also be taken into account (see [16], p. 28). Although discussions about the openness or vagueness of the concept are ongoing, the increasing research and monitoring efforts of ESD (indicators) provide a chance not only to enhance the visibility of the concept but also to bring about a new level of concreteness by means of indicators. In that context and on an international level, the observation is that "the lack of common visions for SD and ESD hampers the definition and the selection of common patterns for ESDI" (ESDI: ESD indicators; note by the authors) ([9], p. 102). Clearly, a certain level of openness is essential for ESD implementation, given the participatory demand of the concept and its applicability to cultures and local contexts (see, e.g., [16]). Thus, the increasing efforts to monitor and evaluate ESD must carefully preserve the necessary contingency of the concept while effectively contributing to upscaling it by operationalizing the core ideas of where and what ESD should be.
Against this background, the University of Education in Freiburg conducted an evaluation study to analyze the accessibility of data and the measurability of ESD in the specific field of in-service TTs accredited by the states. The databases for TT therefore equally include extracurricular TTs from the non-formal sector or TT cooperation between the states and NGOs (if accredited by the states). The 16 federal states are each responsible and, to a large extent, sovereign in terms of educational policy, which is why the TTs are also listed in non-centralized databases of each federal state. Through collecting primary data, the project, ESD_indicator_teacher training, aimed to answer the following two research questions.
RQ 1: How accessible is the relevant data on TTs in the 16 German federal states for continuous monitoring?
RQ 2: Is it possible to develop a replicable and applicable ESD indicator for teacher training (as part of a larger, longitudinal set of ESD indicators)?
To situate these questions in the broader framework of the ESD implementation process in Germany, Section 2 provides a brief status of the national ESD monitoring process and then discusses the theoretical implications for (ESD) indicators (and TT in ESD). Section 3 describes the method, the development of several potential indicators for ESD relevance in TTs, and the analysis. Section 4 contains the findings and the proposed indicators: FESD (basic), FESD (basic, rated), and FESD (pro). Section 5 concludes the study. Section 6 provides an outlook and discusses the results of the indicator development and their implications.

Broader Framework of National Monitoring for ESD
National as well as international trends of quantifying and measuring can be observed in the context of education [17,18] (for a critical view, see [19]), for example, in the visibility and influence of the Program for International Student Assessment (PISA) and the Trends in International Mathematics and Science Study (TIMSS).
The same trend applies to the context of sustainability-related research, for example, the indicator development of the Sustainable Development Goals (SDGs) [20] and the German Sustainable Development Strategy [21] (pp. 146-147). This trend also extends to the field of ESD (e.g., [22,23]). These tendencies can be framed and explained in the context of the growing importance of evidence-based policy making, which deepens cooperation between research and policy making and strengthens the role of research outcomes for political decision making and practice [24–27]. The described trends contribute to the growing demand for monitoring efforts in education, which can be understood as an indicator-based, systematic observation of different aspects of an educational system ([28], p. 163). This focus on measurability underscores the crucial role of developing adequate indicators, especially for complex phenomena such as educational processes, and particularly for ESD or sustainability in general [13,29–33].
The implementation of the UNESCO GAP on ESD in Germany is monitored with the aim of providing robust information about the extent and quality of the ESD implementation. The complexity of ESD monitoring in six different educational areas mirroring the expert forums (early childhood education, school, vocational education and training, higher education, informal and non-formal learning, and local authorities) provides ample opportunity for scientific advice on effective national and regional strategies for further upscaling ESD in Germany. Additionally, the monitoring enables the crosscutting approach to become a vital part of the structures underlying educational systems.
The general monitoring process stretches over a period of three years and comprises four research phases. During the first phase, the main aim was to provide an overview of the use of different ESD-related concepts (for more details, see [34]) within different groups of central documents for five of the six educational areas (excluding informal learning because of a lack of comparable documents in this field of education). The encompassing monitoring process reflects not only the need for a differentiated view according to the educational areas but also the need to balance and complement the strengths and drawbacks of the various methodological approaches to grasping ESD.
The research in the different monitoring phases was based on national [7,8] and international indicator propositions, for example, on the suggestions of UNECE (see, e.g., [11]), which include information about whether, for example, the proposition refers to ESD in national policy documents, in curricula of formal education, or in teacher training [35–38]. The indicators for that broader area of national monitoring have been partly adapted to fit the specifics of the six educational fields (for the set of indicators, see [34], pp. 4–5). This monitoring has been set up as an encompassing process not only to fulfill the specifics of the different educational areas but also to go beyond a merely descriptive approach and include explanatory aims to better grasp the diffusion process of ESD. This research at the science-policy interface, therefore, was conducted to provide an overview of the status quo of ESD in Germany and to provide recommendations on how to accelerate the uptake of ESD at the structural level of educational systems. The results of the first monitoring phases show that a great need still exists for advancing the indicators on ESD to address the different educational areas and concurrently base them on suitable datasets (e.g., datasets that show a certain dynamic and are meaningful for ESD).
The national ESD monitoring in Germany is the broader context in which the TT indicators described in this article have been developed. The study on TTs was commissioned by the scientific advisor of the GAP as a part of the national monitoring of ESD in Germany. The underlying theoretical and methodological orientations of the evaluation study for the ESD_indicator_teacher training will be described in the following section.

Theoretical Implications for the Indicator Development
For the present project, we used the indicator definition given by the authoring group for educational reporting in Germany. This definition nonetheless also comprises the purpose of indicators in a global perspective. In their report on the development of indicators for educational reporting, they defined indicators as quantitative tools (or proxy variables) providing a simple and comprehensible status report on the quality or the state of the art of a more complex, usually multi-dimensional system [39] (p. 15). This definition, however, fails to capture another important aspect of indicators: their use as a tool or policy instrument to show trends and development tendencies (e.g., [40], p. 234). Thus, indicators "are considered to provide condensed information that can be transformed into knowledge relevant for decision making" [41], which is why they are equally used in educational monitoring to assure quality education or in other relevant policy fields (e.g., [9,14,42–49]).
Important preconditions of indicators are that they need to be seen as comprehensive, credible, and fair in the eyes of their recipients (see [50], pp. 23–24). In developing the method of this study, we placed special emphasis on the target of developing an indicator that is "based on adequate samples, [and has] appropriate levels of reliability, good validity, and above all, positive reactivity" ([51], p. 3). Other important aspects are the criteria of applicability and acceptance. To ensure that these criteria are addressed, many stakeholders, including all spokespersons for ESD and TTs from the 16 federal ministries of education, have been included in the process of developing the national ESD indicators. Given that the discourse on how to operationalize and define ESD is still an ongoing debate in research, all stakeholders need to be committed to a high level of transparency from the outset in the development of the indicators.
Compared to the ongoing discussions on different ESD definitions, a consensus exists on the essential role of teachers in the ESD implementation (see e.g., [52], p. 172) and that teacher professional development can be an effective way to enhance and ensure the quality of schooling [53][54][55].
Additionally, Lipsky's classic work on implementation research drew attention to the distinctive role of teachers (called street ministers of education) as policy makers on the ground (see [56], p. 12). This conceptualization of policy makers is equally important in the context of indicators relevant for policy making, discussed later in Sections 5 and 6. In line with this statement is the assumption of the essential role of the teacher as change agent in ESD development. However, ESD research investigating the state of the art and effects of ESD, especially in the first phase of teacher education, is limited (e.g., see [57–59]). Few studies have conducted research on in-service teacher education courses related to ESD, conceptualized TT for ESD, or developed assessment tools (see, e.g., [15,60]). ESD research needs to follow the example of other research disciplines in education, especially the research field of vocational education and training (i.e., continuing education of teachers). Considering these aspects, Lipowsky's [61] and Huber's [62] findings on the importance and success criteria of TT served as a general theoretical background for the presented study.
In the context of ESD, the role of the teacher is equally stressed by the UNESCO Roadmap, which describes "[e]ducators and trainers [as] powerful agents of change for delivering the educational response to sustainable development. However, for them to help usher in the transition to a sustainable society, they must first acquire the necessary knowledge, skills, attitudes and values" [63] (p. 20). As described above, research on ESD (implementation) still lacks data and quantitative methods for operationalization. In the future, it would be desirable to develop indicators or measurement instruments that assess the more complex construct of ESD, for example, competencies or the above-mentioned dimensions of knowledge, skills, attitudes, and values.
The research on indicators of the ESD implementation process offers many possibilities for mapping the progress of ESD, with TT as one important field. In this analysis, quantitative data was gathered to examine multiple aspects of ESD-relevant TT offered at the federal state level.

Implications of Method for Indicator Development
The German Institute for International Educational Research (DIPF) describes two possibilities for indicator development. One approach is a deductive top-down procedure in which indicators are derived from scientific modelling. The other approach is an inductive bottom-up procedure that includes the selection and aggregation of relevant factors extracted from existing data (see [44], p. 13). The chosen indicator development approach for the described project combines the two aspects. We employed statistical modelling to analyze ESD indicators for TT (FESD), and we analyzed the existing datasets from the federal states' databases that list the TTs. Section 3.2 describes the first step of the data retrieval. Subsequently, the coding process of the raw data will be presented in Section 3.3.

Data Collection
A two-step procedure was chosen for data collection on ESD-relevant TT. In reference to other ESD-monitoring projects, we worked with the given keywords (see [34], p. 7) and further extended the list to incorporate the complex implications of the ESD concept. Note that this first step was applied only to ensure that no TT containing ESD-relevant content would be overlooked. The output of the keyword retrieval was scanned in a second step to ascertain the degree to which the TT courses were ESD-relevant (see Section 3.3). The list of keywords was sent to ESD spokespersons and those responsible for TT in the 16 federal states. We received positive feedback on the keyword list from all 32 stakeholders of the ministries and the TT centers.
A translated excerpt of the search string reads: [...] OR "econom*" OR "energ*" OR "resources*" OR "material circ*" OR "seasonal*" OR "nutrition*" OR "futur*" OR "transformation*" OR "change*") (own translation; original accessible under www.mdpi.com/link, Table S1).
These retrieved TTs were further analyzed in the second content-analytical step, which was a major part of the analysis. If the search string results exceeded 1500 TT courses, 1500 TT courses were chosen randomly from the list extracted by the keyword retrieval and coded following the usual procedure. This procedure was repeated for four federal states (Bavaria, Berlin, Lower Saxony, and North Rhine-Westphalia).
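The first step (truncated keyword retrieval followed by a random cap of 1500 courses) can be illustrated with a minimal Python sketch. This is not the project's actual tooling: the term list is a small hypothetical subset of the translated search string, and the function names, the regular-expression treatment of the `*` wildcard, and the fixed random seed are all assumptions made for illustration.

```python
import random
import re

# Hypothetical subset of the translated search terms; "*" marks truncation.
SEARCH_TERMS = ["econom*", "energ*", "nutrition*", "futur*", "transformation*"]


def term_to_regex(term: str) -> re.Pattern:
    """Turn a truncated term such as 'energ*' into a case-insensitive regex."""
    stem = re.escape(term.rstrip("*"))
    suffix = r"\w*" if term.endswith("*") else ""
    return re.compile(rf"\b{stem}{suffix}\b", re.IGNORECASE)


PATTERNS = [term_to_regex(t) for t in SEARCH_TERMS]


def matches_any_term(description: str) -> bool:
    """True if the course description contains at least one search term."""
    return any(p.search(description) for p in PATTERNS)


def retrieve_and_cap(descriptions, cap=1500, seed=0):
    """Keep keyword hits; if there are more than `cap`, draw a random sample."""
    hits = [d for d in descriptions if matches_any_term(d)]
    if len(hits) > cap:
        # Assumption: simple random sampling without replacement.
        hits = random.Random(seed).sample(hits, cap)
    return hits
```

Because the retrieval is deliberately broad, a hit only marks a course as a candidate; the degree of ESD relevance is decided in the subsequent content-analytical step.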
Step two of the data analysis is described in Section 3.3. It defines the qualitative approach that was applied to investigate the degrees of ESD relevance in the TTs. The databases of the federal states contained all in-service TTs accredited by the states. As stated above, the list includes extracurricular TTs from the non-formal sector or TT cooperation between the states and NGOs. Altogether, 111,589 TTs were listed in the databases of the federal states during the evaluation period, which included the school year 2015/16 and half of the school year 2016/17. This evaluation period was chosen because TTs were still being added to the databases for the second half of the school year 2016/17 by the time the study began. Therefore, the data collection was limited to the 1.5 years for which all TT events were listed in the databases. The extensive list of search terms captured 66,935 TTs with possible ESD relevance to be further analyzed in the second data analysis described below.

Data Analysis-Coding ESD-Relevant TT Using a Weighted Ordinal Coding System
The collected data, consisting of more than half of the TT courses that were offered during the evaluation phase (66,935 out of 111,589), was analyzed using Mayring's qualitative content analysis [64]. The name of this method might be misleading, as it involves systematic quantitative steps of analysis. This methodological approach has several advantages: it allows a rule-bound procedure, has a theory-guided character, and integrates quantitative steps of analysis (see [64], pp. 39–42). We formulated strict content-analytical rules for the scanning process of ESD relevance, measured by defined categories. Figure 1 shows a translated extract of the coding agenda; only the highest value of Category 1 is displayed (original accessible under www.mdpi.com/link, Document S2).
The coding agenda with nine categories (see Figure 2) was validated and tested by four external ESD experts. It contained the names of the variables, their values (on an ordinal grading scale), the definitions, anchor samples, and the coding rules. The category system is the central point in quali-/quantitative content analysis. The coding agenda consisted of nine categories and 20 coding rules. The gradations were designated using several categories of the above-mentioned comprehensive coding system, obtained from scanning the descriptions of the teacher training courses retrieved by the keyword list given above. As stated above, wide acceptance and understanding are the basis of successful indicator development. For this reason, transparency and the consultation of other ESD experts were considered essential criteria in the development process of the ESD category system. To assure validity of the coding agenda, it was tested in an expert hearing by four external ESD experts from different universities and different disciplines and backgrounds. They provided feedback and recommendations for the coding agenda. Furthermore, the stakeholders from the ministries were equally consulted. To ensure validity of the coding agenda beyond the state and academic stakeholders, we integrated ESD topics listed by the teachers from a former study and used them as anchor samples [65]. In a second validation step, three young practitioners (teacher students with practical school experience) tested and evaluated the coding agenda (in addition to the ESD academics and ministerial experts).
Afterwards, the final coding procedure was given to different coders, who applied the content-analytic coding rules to 235 TT courses. The intercoder reliabilities were good, ranging from 0.71 to 0.93 (Cohen's kappa coefficient). "For content analysis it is inter-coder reliability which is of particular significance. Several content analysts work on the same material independently from one another and their findings are compared" (see [64], p. 42). That means that different people using our coding plan for the same TT course came to the same or similar results when designating the categories. In the data analysis phase of the evaluation project, the data gathered by the described procedure was analyzed and interpreted using MAXQDA software. To determine ESD relevance, we did not rely on a binary decision. Instead, we used the above-mentioned gradation system ranging on an ordinal scale from highly to potentially ESD-relevant (see Figure 2). The coding procedure consisted of two main categories, Category 1: ESD-relevant content, and Category 2: ESD-relevant targets. If the texts of the TTs were not sufficient to qualify for the two categories, coders coded them with missing values. This procedure addressed coding errors caused by coders' uncertainty.
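To make the reported intercoder reliability more concrete, the following Python sketch computes an unweighted Cohen's kappa for two coders. It is an illustration only: the text does not state whether an unweighted or a weighted variant was used for the ordinal codes, and the function name and the example codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Unweighted Cohen's kappa for two coders rating the same units."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must rate the same non-empty set of units.")
    n = len(codes_a)
    # Observed agreement: share of units with identical codes.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Expected chance agreement from each coder's marginal code frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n**2
    if expected == 1.0:
        return 1.0  # Both coders used one identical category throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical ordinal codes assigned by two coders to ten course descriptions.
coder_a = [2, 2, 1, 0, 0, 1, 2, 0, 1, 0]
coder_b = [2, 2, 1, 0, 1, 1, 2, 0, 0, 0]
kappa = cohens_kappa(coder_a, coder_b)
```

In this made-up example the coders agree on 8 of 10 units, chance agreement is 0.34, and kappa is therefore (0.80 - 0.34) / (1 - 0.34), roughly 0.70.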
Further categories (Category 3-9, see Figure 2) were coded when the unit of analysis was at least partially ESD relevant according to Category 1 or 2 and when the information given in the TT course description was sufficient.
Figure 2 displays the conversion of the coding process into the rating scale. Given that the accessibility of information on Categories 3-9 varied greatly across the different states, we built our rating scale based only on the first two categories. These two categories parallel the structure of other international ESD-indicator grids, for example, the grid of indicators and sub-indicators from the UNECE adapted by Capelo et al. [9] (p. 107). We equally used "key themes of SD" as our first category. Instead of a binary 'yes/no' self-reporting format, we used strict coding rules to assign the category values, which also contained a gradation system from potentially ESD-relevant to highly relevant. Based on the coding (using an ordinal scale) of the first two categories, we classified the TT courses as highly ESD relevant if they, according to the strict coding rules, (1) were coded as containing highly ESD-relevant content (see Figure 1, extract of the coding agenda) and (2) additionally mentioned an ESD-relevant goal (see Category 2). Instead of focusing on learning outcomes (see [9], p. 107), as expressed in the UNECE documents, we analyzed whether the TT courses stated any ESD-relevant goals at all and coded them according to the strict content-analytic rules for Category 2 (ESD-relevant targets). This procedure was used because, for the present evaluation study, we could not yet utilize other appropriate assessment tools that capture ESD competencies or the expected learning outcomes. Additional measurement instruments would be needed to integrate further outcome indicators that capture, for example, SD-related competencies or the dimensions of knowledge, skills, attitudes, values, and behavior.
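The conjunctive rule for the highest rating can be sketched in a few lines of Python. The numeric code `HIGH_CONTENT`, the function name, and the returned labels are hypothetical; the actual values and coding rules are defined in the coding agenda (Figures 1 and 2), which this sketch does not reproduce.

```python
from typing import Optional

# Hypothetical numeric code for the highest value of Category 1 (ESD-relevant content).
HIGH_CONTENT = 3


def classify_tt(content_code: Optional[int], has_esd_goal: Optional[bool]) -> str:
    """Apply the conjunctive rule for 'highly ESD-relevant' described in the text."""
    # Missing value: the course description was insufficient for Category 1 or 2.
    if content_code is None or has_esd_goal is None:
        return "missing"
    # Both conditions must hold: highly ESD-relevant content AND a stated ESD-relevant goal.
    if content_code == HIGH_CONTENT and has_esd_goal:
        return "highly ESD-relevant"
    # All other codings fall on lower steps of the gradation, which are not resolved here.
    return "lower degree of ESD relevance"
```

The point of the sketch is that a high content code alone is not sufficient: a course without a stated ESD-relevant goal stays below the highest rating.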

Total Numbers and Proportions for Germany: The Indicator FESD (Basic)
Primary data on ESD-relevant TT could be gathered for 15 of 16 federal states in Germany. Only 0.6% of the 66,935 TTs (see Section 3.2) were coded highly ESD relevant. In total, the 15 federal states accounted for 3818 TTs with different degrees of ESD relevance. For the international audience, we mainly focused on the results for Germany. However, the ministries and ESD spokespersons from the 15 federal states received the exact numbers and indicators for each of their specific states in the form of a report. TTs received the designation "highly ESD relevant" if they contained at least ESD-relevant content and ESD-relevant targets. Table 1 shows the findings for the 15 states analyzed in Germany. The numbers are not yet multiplied by the rating coefficient, as explained below.
Notes to Table 1: 1 Without data for TT in Baden-Wuerttemberg; numbers are not yet weighted. 2 Total numbers of teachers from the Federal Statistical Office (Destatis) [66], without Baden-Wuerttemberg.
Note that potentially ESD-relevant TT is not a placeholder for coding uncertainty. Potentially ESD-relevant means that the identified TT dealt with topics that could be conceivable starting points for teaching ESD or SD topics, but the course description contained neither ESD-relevant goals nor explicit text stating ESD or SD as course content. Examples of potentially ESD-relevant TT courses: (1) a TT course that included waste management but no further discussion of the environmental protection dimension; (2) a TT course about the political or economic system (without discussing any other sustainability dimensions); (3) a TT course only discussing integration (without the cross-social or global context).
If the texts of the TTs were not sufficient to decide whether a TT course contained ESD-relevant topics or goals (i.e., the coding Categories 1 and 2), the courses were coded with missing values.
Based on the gathered data, we developed several indicator possibilities: the FESD (basic), the FESD (basic, rated), and the FESD (pro), given in Formulas (1)-(3). Formula (1) shows the basic indicator for ESD-relevant TT:

$$\mathrm{FESD\ (basic)} = \frac{TT_{\mathrm{ESD}}}{TT_{\mathrm{total}}} \times 100 \tag{1}$$

where $TT_{\mathrm{ESD}}$ is the number of ESD-relevant TTs accredited by the state(s) and $TT_{\mathrm{total}}$ is the total number of TTs accredited by the state(s).

FESD (basic) is the proportion of ESD-relevant TTs to the total number of TTs offered by the states, calculated for 15 of the 16 federal states. Afterwards, the indicator was also calculated for the entire Federal Republic of Germany. The FESD (basic) for each federal state varied between 1.08 and 9.01 and was 3.42 for Germany. Eight of the federal states obtained indicators below this ratio and seven were above. These findings clearly show a high variance in ESD-relevant TT offers among the 15 federal states.
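As a minimal sketch, Formula (1) can be written as a small function. The function name `fesd_basic` and the example counts are hypothetical; the counts are chosen only so that the result matches the order of magnitude of the reported value for Germany and are not the study's data.

```python
def fesd_basic(tt_esd, tt_total):
    """FESD (basic): ESD-relevant TTs as a percentage of all accredited TTs."""
    if tt_total <= 0:
        raise ValueError("tt_total must be positive.")
    if not 0 <= tt_esd <= tt_total:
        raise ValueError("tt_esd must lie between 0 and tt_total.")
    return 100 * tt_esd / tt_total


# Hypothetical state: 342 ESD-relevant TTs out of 10,000 accredited TTs.
example_value = fesd_basic(342, 10_000)
```

Because the indicator is a simple percentage, it can be recalculated whenever a state database is updated, which suits the longitudinal use described in this article.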

Rated ESD Relevance: FESD (Basic, Rated)
Table 1 also shows the applied ratings y4, y3, y2, and y1 for highly to potentially ESD-relevant TTs. The TTs coded highly to potentially ESD-relevant were rated using quartile weights based on their ESD relevance (see Table 1), as shown in Formula (2). We used this rating procedure to stress those TTs that explicitly mentioned ESD-relevant content or goals (multiplied by y4 = 1) in comparison to those that only contained a possible ESD-relevant topic, that is, without calling it an ESD topic or explicitly stressing one of the sustainability dimensions. Additionally, as explained earlier in Section 3.3, these latter courses did not contain any ESD goals. The potentially ESD-relevant TTs were multiplied by y1 = 0.25.
The FESD (basic, rated) values are calculated in the same way as the FESD (basic) values, but with the weighted numbers of TTs. The results for the federal states varied between 0.63 and 4.20. Six federal states achieved indicator results that were above the ratio for Germany, FESD (basic, rated, Germany) = 1.77. Nine of the federal states were below this ratio.
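The weighting described above can be sketched as follows. Several points are assumptions rather than statements from the source: only y4 = 1 and y1 = 0.25 are given explicitly, so y3 = 0.75 and y2 = 0.5 are inferred from the phrase "quartile weights"; the weighted sum divided by the total number of TTs is assumed from the remark that the calculation parallels FESD (basic); and the function name, level keys, and example counts are hypothetical.

```python
# Assumed quartile weights; only y4 = 1 and y1 = 0.25 are stated in the text.
RATING_WEIGHTS = {4: 1.00, 3: 0.75, 2: 0.50, 1: 0.25}


def fesd_basic_rated(counts_by_level, tt_total, weights=RATING_WEIGHTS):
    """Weighted share of ESD-relevant TTs (in percent).

    counts_by_level maps a relevance level (4 = highly ... 1 = potentially
    ESD-relevant) to the number of TTs coded at that level.
    """
    if tt_total <= 0:
        raise ValueError("tt_total must be positive.")
    unknown_levels = set(counts_by_level) - set(weights)
    if unknown_levels:
        raise ValueError(f"No weight defined for levels: {sorted(unknown_levels)}")
    weighted_sum = sum(weights[level] * count for level, count in counts_by_level.items())
    return 100 * weighted_sum / tt_total


# Hypothetical state: 200 ESD-relevant TTs spread over the four levels.
example_counts = {4: 100, 3: 40, 2: 20, 1: 40}
rated_value = fesd_basic_rated(example_counts, 10_000)
unrated_value = 100 * sum(example_counts.values()) / 10_000
```

In this hypothetical case the unweighted indicator would be 2.0, while the rated indicator is lower because courses with weaker ESD relevance count only partially.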
The clear advantage of the FESD (basic, rated) lies in its graded ratings instead of a simple binary ESD-relevant/non-ESD-relevant assessment. The underlying assumption of the rating system is that a highly ESD-relevant TT course will promote the development of teachers' ESD competencies to a greater extent than a TT that only marginally addresses ESD topics. However, further research is needed on the effectiveness of ESD TTs and on factors that contribute to successful developmental outcomes to substantiate this assumption. Further implications and interpretation of these results are discussed in Sections 4.4, 5 and 6.

FESD (Pro): Qualitative Rating System for ESD Relevance with a Quantitative Aspect
Formula (3) displays the third indicator possibility, the FESD (pro). It combines a qualitative rating system for ESD relevance with a quantitative aspect. The qualitative aspect, already included in the preceding FESD (basic, rated) indicator, consists of the ordinal rating weights applied according to the results of the coding process. The quantitative aspect consists of the integration of the ratio for the number of teachers that theoretically could attend a highly ESD-relevant TT course during the evaluation phase. Therefore, the FESD (pro) captures the degree and frequency of professionalization possibilities, that is, the highly ESD-relevant TT courses in relation to the total number of teachers in the country.
The indicator answers the question: What theoretical possibility did teachers have in the observation period to effectively develop their ESD competencies (by attending a highly ESD-relevant TT)? FESD (pro) varied between 0.01 and 0.19 among the 15 states. The FESD (pro, min) of 0.01, for example, indicates that every 100th teacher could attend a highly ESD-relevant TT. In contrast, the FESD (pro) of 0.19 signifies that roughly every fifth teacher could attend a highly ESD-relevant TT. The national mean for Germany was 0.07, that is, every 14th teacher could attend a highly ESD-relevant TT during the evaluation phase. According to calculations based on the weighted total numbers for the FESD (pro), six federal states were below the German indicator result of 0.07. The nine remaining states reached 0.07 or above.
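The "every n-th teacher" reading of the FESD (pro) values is the rounded reciprocal of the indicator, as the following sketch shows. It assumes, in line with the interpretation given in the text, that FESD (pro) can be read as the share of teachers who could attend a highly ESD-relevant TT; the function name `every_nth_teacher` is hypothetical, and the sketch does not reproduce Formula (3) itself.

```python
def every_nth_teacher(fesd_pro):
    """Convert an FESD (pro) value into 'every n-th teacher' (rounded)."""
    if fesd_pro <= 0:
        raise ValueError("fesd_pro must be positive.")
    return round(1 / fesd_pro)


# Values reported in the text: minimum, maximum, and national mean.
reported_values = [0.01, 0.19, 0.07]
readings = {value: every_nth_teacher(value) for value in reported_values}
```

For the maximum of 0.19 the exact reciprocal is about 5.3, so "every fifth teacher" is a rounded statement.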

Summary of the Results
By the end of this project, the states' coding results were applied in different formulas to calculate the indicators for ESD relevance in TTs. The results of the evaluation study showed that suitable ESD indicators, the FESD (basic), FESD (basic, rated), and FESD (pro), could be developed and assessed. The calculated indicators constitute a major evaluation that covers the entire German TT system (except one federal state), the proportion of ESD-relevant TTs for 15 of the 16 federal states, and distinct dimensions and characteristics of ESD relevance in TTs.
The results in Table 2 show the high variance among the 15 federal states. However, note that the purpose of these indicator results is not to determine strengths or weaknesses by comparing indicators between the states (intercomparisons) but to use these indicators for comparisons of longitudinal results within each state (intracomparisons).

Discussion
This article presented the indicator development process for the FESDs and gave several possible indicators for capturing ESD relevance in TT. We discussed the underlying guidelines for indicators and their development. We then presented the methods and the data as well as the three indicator possibilities, FESD (basic), FESD (basic, rated), and FESD (pro), for practical application. The main contribution and strength of the study is that the results make it possible to calculate a transparent indicator ratio that provides condensed information on ESD-relevant TTs, based on a dynamic dataset (i.e., the state databases of the TT course listings). The presented findings of indicator development constitute the most recent research on ESD relevance in TTs.
Against this background, the FESD can be understood as a robust proxy indicator that quickly enables stakeholders dealing with ESD or TT to evaluate how ESD implementation evolves over time. Recall the research questions of the evaluation study. RQ 1: How accessible is the relevant data on TTs in the 16 German federal states for continuous monitoring? RQ 2: Is it possible to develop a replicable and applicable ESD indicator for teacher training (as part of a larger, longitudinal set of ESD indicators)? We can state the following: (1) The relevant data was accessible in 15 of 16 federal states in Germany at the time of the evaluation phase. (2) These databases could be used for continuous monitoring. (3) A replicable and applicable ESD indicator (the FESD variants) could be developed.
Nevertheless, some shortcomings of the indicator development and application occurred. As stated above, no central database exists listing all the TT courses for the federal states in Germany. Consequently, unavoidable differences were encountered in the characteristics of databases from the federal states (e.g., differences in the length of the description and the amount of information on the training). These differences should be considered when comparing the results at the federal state level. Another consequence of the differences among the TT course descriptions was that Categories 3-9 (see Figure 2) could not be coded in many cases because of insufficient course descriptions. Hence, only Categories 1-2 were used as a basis of information for calculating the FESD indicator. Structural changes of the different federal state databases (to facilitate access to relevant information) would improve future research on TTs. For example, course attendance records of teachers would be a very important component for all the federal states.
Some shortcomings are project specific and thus mirror general limitations of single indicators. The proposed indicators, FESD (basic), FESD (basic, rated), and FESD (pro), constitute only partial indicators that focus on the field of ESD in TT. Ideally, however, they should be embedded in a more comprehensive set of ESD indicators. Only such a set of ESD indicators would be able to capture the interactions, multiple facets, and developmental levels to provide a more holistic view of the state of the art of ESD in Germany (see e.g., [40], p. 234). Another limitation of the proposed indicators relates to a general problem associated with data collection in field studies that lack experimental control, especially in the social sciences. The use of indicators (or other research methods) to capture a multi-dimensional and highly complex system always entails a considerable reduction of the complexity of the examined system (see [39], p. 15). Additionally, further ESD-specific research is needed on the effectiveness of ESD TT and on factors that contribute to successful developmental outcomes of TTs. For a general overview, see e.g., [53,67-69].
The study also assessed detailed characteristics of the TTs (e.g., methods, duration, learning material, subject specific vs. interdisciplinary). Although the evaluation study took place in Germany, the approaches for the indicator development at the meso and micro level are nevertheless transferable to transnational application. As stated above, one of the main characteristics of the indicators is that they provide an easily understood index to report on a more complex domain that is relevant for decision making. Accordingly, the FESDs offer insights into the status of ESD implementation in TTs. Hence, the main aim of these indicators is to provide relevant and reliable measures that inform decision makers about the status of ESD development in their state (see [65], p. 8), including for long-term perspectives. Each administrative region should therefore decide where it wants to set its targets or benchmarks. However, this aim is only one aspect of how ESD indicators can be used. In fact, they provide information relevant for different levels, such as the institutional and political levels, and for various stakeholders (e.g., curriculum developers, teacher educators, teachers, school leaders, and researchers; [9], p. 95).
For the political level, this study has put forward a suggestion for applicable and replicable indicators that provide condensed, relevant, and reliable data that can be used for decision making [41] (p. 510). In general, federal agencies "view indicators as essential for monitoring the status of the nation's educational system and for tracking how it changes over time. [...] policy makers also seek indicator data that can inform new improvement efforts" ([70], p. 181). These two statements apply to the FESD indicators and to ESD indicators in a broader context. As already stated above, subsequent to the FESD indicator results, each federal state should decide on its own targets, which are determined and influenced by many factors.
Moreover, in line with Lipsky, teachers, as policy makers on the ground, can also refer to these indicator results when asking for more or better TT courses that incorporate ESD. Additionally, teachers should be given more ownership in the development of indicators in the future (see [71], p. 71). However, with regard to developing a comprehensive set of ESD indicators for portraying ESD implementation, this article constitutes only a starting point that suggests a possible path for ESD monitoring in the field of TT. Further implications for research and practice and an outlook are given in the concluding section of this article.

Conclusions & Outlook
The ESD indicator on TT, as part of a larger set of ESD indicators, can serve to promote "comparison of performance, monitoring and data collection" [72] and foster benchmarking processes and political target setting. It can therefore be seen as an important driver in promoting the structural implementation of ESD, while addressing the following questions: "How do stakeholders know that an increased understanding of ESD and indicators develops? How can stakeholders work productively to build ownership and commitment? and Are stakeholders improving their ability to assess ESD progress?" ([10], p. 32). In sum, the monitoring phases of ESD implementation are well complemented by the indicators on TT. They address the need for dynamic information by which changes can be observed in relatively short time periods and can be compared on different scales. One of the central features of the national monitoring is to grasp the level of ESD implementation for each of the educational fields and thereby contribute to a broader picture of ESD. Without the development of powerful indicators for single educational fields, this overview would remain too blurred at a macro or meso level of analysis of ESD implementation [73]. The shift to the micro level at which ESD effects are measured on outcomes, such as SD-related knowledge, skills, attitudes, values, and behavior, is still needed. However, the complementary advantage of the broader scope of the national monitoring and the TT indicator also lies in the contrast between the analysis of documents with relatively long update cycles of up to 10 years, as in the case of school curricula, and the dynamic change in the database for TTs. Another complementary element is the analysis across educational fields, which focused on the conceptual level and therefore will not be able to capture the breadth on the thematic level of ESD, whereas the TT indicator does the reverse and might therefore not cover the whole conceptual range and all educational areas of ESD. It is therefore insightful to view the project findings in the context of the teacher-relevant outcomes examined in the document analysis of the ESD national monitoring (see [34]). The findings of the teacher education documents underscore the importance of further integrating ESD into TT, and they suggest that a lack of ESD-relevant knowledge is not just the concern of an older generation of teachers but also of teachers who have recently finished their education. The findings also stress the relevance of TT as a valuable continuation and complement of teacher education. The concentration of references to ESD on single subjects in teacher education highlights the need for TT on ESD that addresses teachers in general and all subjects if ESD is to be effectively implemented as a cross-cutting educational concept.
Future research on ESD indicators should focus not only on the inputs and processes related to ESD but also on its outputs (see also the definition for monitoring education systems [28], p. 163). Although the indicators for TT proposed in this article, depending on the perspective, contain elements of input, process, and output (for differentiations of indicators, see [14]), a complement within a set of indicators that targets the level of teacher and student ESD competencies [74-80] appears worth striving for. As stated above, this shift to the micro level at which ESD effects are measured on outcomes, such as SD-related knowledge, skills, attitudes, values, and behavior, is still needed.

Figure 1. Translated extract of the coding agenda for the second step of the TT analysis.


Sustainability 2018
Figure 2. Categories of the coding procedure for analyzing the TT course descriptions with nine categories validated and tested by external ESD experts.


Table 1. Total numbers for Germany (15 of 16 federal states) 1 in the evaluation period (school year 2015/16 and half of the school year 2016/17).

Table 2. Indicator results for the FESD (basic), FESD (basic, rated), and FESD (pro) in Germany 1.
1 Only the min. and max. indicator and the national mean for Germany are displayed. We decided not to display the ratios by state rankings, because each state should set its own FESD-indicator targets independent of other states.