School Development in Culturally Diverse U.S. Schools: Balancing Evidence-Based Policies and Education Values

This article problematizes evidence-based policies in the USA, using Dewey’s (1916) education theory and findings from a school development project in 71 culturally diverse Arizona schools. The study asked three questions: (1) How do formal and informal school leaders work in teams to mediate between evidence-based policy requirements at federal, state, and district levels and the needs of culturally diverse students? (2) What leadership team practices contribute to school development as measured by improved student outcomes in school letter grades? (3) What values from evidence-based policies and democratic education are evident in effective school development? Evaluation methods featured qualitative interviews with leadership team members in 71 schools as well as a descriptive analysis of school letter grades based primarily upon student outcomes. Results indicated improved student outcomes in letter grades and enhanced leadership capacity and democratic values as well as evidence-based values that contributed to school development. The article concludes with next steps to expand the project to another region of the USA and a call for a balanced use of evidence (including standardized test scores) constructed through Dewey’s notion of democratic values of education.


Introduction
This article presents a school development project that attempts to balance democratic educational values [1][2][3] with evidence-based values [4] within culturally diverse Arizona (U.S.) schools. In recent decades, educational policymakers in many nation states, including the U.S., have legitimized and funded evidence-based innovations developed from a particular set of values, research designs and methods. The logic is also driven by a theory of utility, meaning that if educators utilize more practices grounded in strong evidence about "what works" and reduce the use of practices that do not work, schools will improve. Here, with respect to school development, innovations with strong evidence are those that value standardized test results, the knowledge tested within these tests, and a particular research methodology. Such innovations have been legitimized with federal department of education funds that established the What Works Clearinghouse as a mechanism to identify innovations with "strong or moderate evidence" of effectiveness linked to gains in student outcomes, with strong evidence often garnered from studies that use randomized controlled trials or quasi-experiments [4]. In order to attain funding from the federal Department of Education and many other funding agencies in the U.S., scholars must build their research on such prior studies that produced strong evidence. Some researchers have, at least in part, argued for this type of research and the logic behind evidence-based innovations. See frequently cited arguments for such evidence-based innovations from Hattie and Slavin and related interventions in the What Works Clearinghouse [4][5][6]. We argue that these evidence-based interventions provide resources for educational decision-making about innovations that may improve student outcomes, but they do not consider or reflect a language of education or the traditional humanistic values of U.S. education [1,7], all of which we argue are also critical for increasingly diverse students/citizens to make deliberative judgments in democracy [8,9].
Like many democratic nation states worldwide, the U.S. is becoming increasingly culturally diverse due to global population migrations, refugees, and internal demographic shifts, and schools (teachers and principals) must be able to support diverse students' needs and backgrounds in pedagogical experiences that parallel critical thinking, deliberative judgments, and experiences of a democratic way of living [1]. In other words, today's leaders (formal and informal) must be able to mediate between evidence-based policy trends that value knowledge as measured by standardized tests and documented with randomized controlled trials or quasi-experimental designs and the needs of increasingly culturally diverse students in a democracy.
Few evidence-based innovations give explicit attention to humanistic and effective education aimed at democracy with culturally diverse citizens. In this article, we present an ongoing school development project designed to build leadership team capacity for continuous development in schools that educate culturally diverse students in a democratic society. Unlike approaches to evidence-based school development that define "what works" strictly according to standardized tests, our school development approach also purposely drew on Dewey's philosophy of democracy and education [1]. Here, in the contemporary environment for democracy and education, we use evidence (including test results) as a source of reflection and deliberative educational activity and planning for pedagogical experiences [3] in schools and classrooms with an increasingly culturally diverse student population [10,11].
For purposes of our school development project, then, we expanded Dewey's [1] (pp. 24-25) perspective on diversity, which he developed during the industrial revolution, a time of increasing "diversity of populations, of varying languages, religions, moral codes and tradition." Dewey noted at that time that the American child's primary association is the ethnic family and neighborhood [1]. The schools of the industrial city bring children from different groups together, and thus are truly international and intercultural. Their diverse student groups bring with them their distinct yet pervasive unconscious perspectives shaped by their primary groups. Dewey did not distinguish explicitly between voluntary immigrants and people who were involuntary immigrants (e.g., incorporated by slavery or conquest), nor did he address the role of leadership in mediation between cultural diversity and broader societal or policy trends. Therefore, we also drew on literature from culturally responsive leadership (e.g., [11]).
This article is organized into six main sections. To begin, we contextualize the school development model in relation to policy shifts toward evidence-based reforms and values. In a review of research, we discuss popular evidence-based education innovations. The next section considers the traditional humanistic values of democracy and education, drawing primarily on Dewey [1,7] and more recent scholarship on culturally responsive leadership [10,11]. Following a discussion of the literature, we describe the school development process as well as our methods for evaluation and results. The paper concludes with a discussion of implications and next steps for the school development project.

Recent US Policy Shifts toward Evidence-Based Innovations and Values
In recent decades, U.S. education policies have reflected related trends toward "scientific" research and evidence-based practice. The rise in the use of evidence for educational policymaking rests on two common epistemological perspectives or beliefs about knowledge: one is the belief that school knowledge (curriculum) is abstract and universal, and the other is the belief that empirical evidence in the form of student outcomes on tests is an efficient indicator of knowledge and learning. As recent examples, policy documents under both Republican and Democratic administrations dating back to the late 1990s have reflected the importance of comprehensive school reforms in federal education funding with externalized evaluations in the form of standardized tests to guide school accountability (No Child Left Behind with its emphasis on "scientifically based research to guide educational practice" whereby research relies on empiricism) later tied to funding (Race to the Top). Moreover, since 2002, 75% of education research funded by the Office of Educational Research and Improvement (OERI) addresses causal questions using random assignment designs. (Previously, funding for randomized controlled trials represented 5% of federal funding for education research) [12]. These research trends are not completely new as the U.S. has a long history of grounding educational work in psychology. We see key differences in the legitimacy of government funding for particular research in the wake of externalized evaluation trends and declining federal and state funding for higher education research of other types. In current funding applications for the U.S. Department of Education [13] grants, researchers must demonstrate that their research designs are based upon prior studies with "strong evidence" explicitly defined by large-scale quantitative studies with randomized controlled trials or quasi-experimental designs that primarily measure what works in terms of gains in student outcomes. That is, federally funded research channels future research in a particular and similar direction, and this research is considered legitimate with "strong evidence". Internationally, multinational organizations, such as the Organization for Economic Co-operation and Development (OECD) and the World Bank have also made evidence-based policymaking a priority both in their own work as influential research and policy organizations as well as for their members [14].
A number of influential scholars [5,6,15-18] have argued persuasively for the use of such evidence to inform educational practice. Slavin, for instance, advocates for the linkage between research and practice similar to the medical field [6]. Using his Success for All project as an example, Slavin argues for the importance of studies that seek to make causal conclusions that include correlational and descriptive dimensions. Success for All is one of the innovations featured in the What Works Clearinghouse with strong evidence of effectiveness, and, thus, one of the innovations that may be selected for funding by scholars and educators seeking grant funding for school improvements [6,18]. The What Works Clearinghouse [4], sponsored by the Institute of Education Sciences to provide educators with the information they need to make evidence-based decisions, includes other innovations with similar evidence of effectiveness from randomized controlled trials of reading programs [19] and summer programs [20]. Across these innovations, researchers tested effective leadership components established in previous literature. Since the 1970s, effective leadership studies [21][22][23][24][25][26] have provided important understandings about "best practices" common to schools that improved outcomes for all students regardless of socioeconomic status. From the most recent literature, best practices included data literacy, supportive school culture, trust, relationships, shared leadership team capacity, motivation, and professional learning communities focused on curriculum, instruction, and formative student assessments. While not explicit, all of these innovations imply a primarily closed system for implementation of the innovation or program, meaning that if school leaders apply understandings from effective leadership and school development research within schools and decrease the use of other practices, schools will improve with improvements measured on state tests.
In these descriptors, we also see a void in the humanistic values of education and an ontology of education emanating from the later enlightenment and romantic heritage that impacted Dewey's early work [7] (pp. 29-35). In other words, we see traditional education values and pedagogical methods that support being and becoming in a democratic way of living being replaced by numerical evidence-based values and data analysis methods that support the use of externally developed programs proven to improve student outcomes on standardized tests. As examples, there are differences between evidence-based values of external measures and Deweyian educational values of internal humanistic interactions. Evidence-based programs value replication while Dewey values choice based on context and perspectives of the collective. Dewey values the process of growth (being and becoming) while evidence-based traditions value the outcomes.
Formal leadership development innovations or programs aimed at improvements in persistently underperforming schools promote an evidence-based approach to effective leadership and school development, including most prominently the Virginia School Turnaround Specialist Program (STSP), the Mass Insight and Research Institute model based in New York, and the Chicago Reconstitution Effort. For example, the Virginia STSP is an intervention for principals with a focus on effective leadership practices with effective practices identified as mediators between leadership practice and gains in student outcomes, including data literacy, professional learning, motivation, and curriculum mapping as well as use of evidence-based strategies from the business field, including the development of a 90-day plan for rapid improvement, implementation support, long-term strategic planning and on-site visits. Since 2004, the Virginia STSP program has provided 95 principals with training in business strategies as well as individual coaching to school leaders in more than 82 school districts in numerous states, including Pennsylvania, Illinois, Florida, Missouri, Louisiana, Arizona, New Mexico, Nevada, Utah, Colorado, Texas, Ohio, and the Dakotas, as well as Virginia [27]. According to the Virginia STSP report [28], 46% of participants (44) made AYP compared to only 16% (15) prior to participation in the project.
The Mass Insight and Research Institute's project for rapid school improvement [29] proposes a similar evidence-based focus on improvement but aligns schools and service providers into clusters of three to five low-performing schools. Districts and states commit to flexible operating conditions for zone schools with an emphasis on people (recruitment and retention), extended time, money or budget allocation, and program implementation of a rigorous standards-based curriculum and effective leadership practices (e.g., culture building, data literacy, professional learning communities). Results from the Partnership Zones indicated that two-thirds of participants reported gains and one-third reported declines in school performance. Researchers in the School Turnaround Group, a division of Mass Insight Education [30], attributed the declines in performance to loose coupling between schools and districts.
Chicago's Academy for Urban School Leadership (AUSL) drew on the Mass Insight project to improve student achievement in participating schools, including attention to positive school culture; parent engagement; setting goals; shared responsibility for achievement; standards-based, college-prep K-12 curriculum; aligned assessment systems; and engaging personalized instruction. Results indicated that some schools have attained high performance on district benchmarks; however, there were concerns about the sample of students included in the testing. Hood and Ahmed-Ullah reported that these schools have "pushed out the lowest performing children who could not attain the benchmark scores, thus artificially elevating their scores" [31] (p. 1). Surprisingly, despite the emphasis on instructional leadership in school effectiveness studies, education theory [1,3,7] with an emphasis on democratic growth and pedagogy has received little attention in educational leadership studies. Moreover, despite recent demographic shifts, cultural relevance has not been explicitly addressed in the intervention models reviewed above. In other words, none of these popular evidence-based interventions explicitly considered the humanistic values of education for continuous growth and democracy [1].

Traditional Humanistic Values of Democracy and Education
Over the course of his career, Dewey [1,3,7] encouraged continuous growth throughout life and cautioned against following a path to a point where learning could stop. In Democracy and Education, Dewey argued, in particular, that an education which only emphasizes the achievement of "external aims" (e.g., evidence from standardized test scores, grades, school letter grades, etc.) hinders students' capacity for continuous growth and leads them toward viewing learning as an overly burdensome activity which they should seek to end as quickly as possible [1]. Rather, in a Deweyian approach to education, there is a reliance on ordinary individuals increasing their wisdom through experience; democracy requires that everyone continuously grows and adapts to changing conditions. Earlier in The School and Society, Dewey wrote of unifying the student with other students so that the school "gets a chance to be a miniature community, an embryonic society" [2] (p. 320). Dewey went on to encourage the use of deliberative judgements about curriculum subjects and learning from experiences. Here, we note a gap with regard to evidence-based interventions that subsume professionals into the subject (i.e., "evidence") without explicit attention to professional educators as subjects using data as one source of reflection within a process of democratic deliberations. Biesta [31] refers to this gap in evidence-based reforms as a democratic deficit, emphasizing how a particular use of evidence threatens to replace professional judgment and the wider democratic deliberation about the aims and ends and the conduct of education. Biesta argues for a value-based education as an alternative to evidence-based education: "Calling the idea of value-based education an alternative is not meant to suggest that evidence plays no role at all in value-based education but is to highlight that its role is subordinate to the values that constitute practices as educational practices" [32] (p. 493).
To date, few U.S. scholars have grounded school development and leadership practices in Biesta's notion of value-based education or Dewey's education theory. Henderson, Castner and Schneider (2018) applied Dewey's theories as well as curriculum theorizing to teacher leadership development in a leadership development framework [33]. Their framework supports study and growth in curriculum leadership, with growth defined in Deweyian terms and related democratic values. Henderson et al. do not, however, consider democratic values in relation to or in tension with contemporary evidence-based trends and demographic shifts. In light of demographic shifts in Arizona and throughout much of the U.S. and elsewhere, we explicitly considered understandings from culturally responsive leadership.

Culturally Responsive Leadership
Culturally responsive leaders are aware of the increasingly diverse student demographics and respond to the changes through culturally responsive practices (CRP). In an international study of successful leaders, Ylimaki and Jacobson define CRP as practices "that incorporate the history, values and cultural knowledge of students' home communities in the school curriculum to develop a critical consciousness among students and faculty to challenge inequalities in the larger society and empower parents from diverse communities" [34] (p. 15). Culturally responsive leaders strive to develop teachers who legitimize what students already know and acknowledge the sociocultural realities and histories of students through what and how they teach [35]. Thus, culturally responsive leaders encourage their teachers to utilize the knowledge of their students' culture and their knowledge of the dominant culture to construct intercultural bridges that acknowledge differences "without shining the deficit light on students' cultural knowledge" [36] (p. 18).
Moreover, culturally responsive leaders recognize the impact of deficit thinking on student learning and work to remove those and other barriers. For example, Scanlan and Lopez examined 79 empirical studies that focused on culturally and linguistically diverse (CLD) students and found "the literature guides school leaders to promote school communities that normalize culturally responsive instruction, advancing the sociocultural integration of CLD students" [37] (p. 29). Additionally, Gandara et al. [38] found that culturally responsive leaders needed to create their school environment with intention and "must ensure teachers incorporate instruction that provides knowledge about how to access the majority culture to ameliorate issues with access and power that perpetuate inequalities among CLD students in order to develop a culturally responsive environment in schools" [37] (p. 27). The evidence from the literature signals the importance of culturally responsive leaders whose efforts positively impact the learning of culturally diverse students and their experiences. Here, we do not see an explicit grounding in the humanistic values of Dewey or an educational ontology which would suggest additional use of evidence as a source of reflection for pedagogical decisions in classrooms and schools. Regardless, as noted below, we do not see culturally responsive leadership or the broader humanistic values of education theory reflected in dominant leadership development programs and views of evidence. In the next two sections, we describe (1) the school development model and (2) methods for evaluation of the project.

Description of the Intervention
The Arizona Initiative for Leadership Development and Research (AZiLDR) model featured research-based and theoretically driven content as well as a research-based delivery system. We describe both aspects of the model in this section.

Content
The U.S. (Arizona) school development project was designed to provide school leadership teams with a three-year intervention model focused on curriculum/pedagogical work within and between school leadership teams, other teachers, and district leaders. The project conceptualized leadership as a shared, pedagogical, and often mediational activity grounded in trust, relationships, communication, and decision-making processes, all of which include using evidence (formative, external/summative) as sources of reflection. The content focused on two interrelated processes: (1) interpersonal, democratic (team member) interaction and (2) reflection on content/pedagogy with content including understandings from effective leadership.
To begin, we used survey results [39] designed from findings of effective leadership studies [21,38-42] and interviews to inform the content. Specifically, we drew upon effective schools research and findings from the International Successful School Principalship Project (ISSPP) that expanded the effective schools literature [21][22][23] to an international sample reviewed above. Culturally responsive practices were also an integral part of our approach with content including asset thinking and funds of knowledge [43,44] applied to leadership [37] (pp. 583-625). Topics included, for example, professional learning communities, school culture, the state version of the Common Core (a national curriculum mediation), data as a source of reflection (i.e., survey results, summative/formative assessment data and other pertinent data), parent-community involvement, recognition, and culturally relevant practices.
Interpersonal interaction and reflection were integral components of the project, grounding the work accomplished by school teams. Teams were provided guidance in team development, democratic deliberative judgements about curriculum subjects and learning from experiences, and conflict resolution skills. Time was provided throughout the project for team members to reflect at both the individual and team levels. Teams were asked to reflect on the content they were receiving, the issues at their own sites and how they might address those needs, as well as ways to diffuse the content and pedagogy throughout the school.
It is also important to note that the Arizona Initiative for Leadership Development and Research (AZiLDR) was developed in response to an Arizona state education department policy requirement to improve persistently underperforming schools with improvement defined by student outcomes and school letter grades. Beyond analysis of student outcomes on state tests, with AZiLDR, then, we drew on Dewey [1] and aimed to build leadership capacity to use leadership survey data, readings, and other student information as a source of reflection, to work in teams (principals, assistant principals, teacher leaders, coaches, and district leaders) to identify problems of practice and set goals for individual and collective improvement.

Delivery System
The second interrelated process revolved around the delivery system. The delivery system featured direct instruction during institutes (10 days annually attended by all school teams), monthly regional network meetings for the purposes of both reflection and content follow-up, and in-school coaching and walk-through observations. Institutes and regional meetings were experiential, modeling processes to intervene and mediate among common core standards, individual learner (student, teacher, leader) needs, and local school-community traditions. Further, institutes and other meetings also provided school team participants and district leaders with structured (discursive) spaces for dialogue and reflection within and between levels; time for school planning for diffusion was embedded within all meetings.

Methodology
For purposes of evaluation, we asked three questions:

1. How do formal and informal school leaders work in teams to mediate between evidence-based policy requirements at federal, state, and district levels and the needs of culturally diverse students?
2. What leadership team practices contribute to school development as measured by improved student outcomes in school letter grades?
3. What values from evidence-based policies and democratic education are evident in effective school development?
Evaluation methods included a qualitative analysis of interviews with leadership teams in 71 schools, accompanied by a descriptive quantitative analysis of student outcomes and school letter grades. In the next sub-sections, we describe sampling, data collection, and data analysis strategies.

Sampling
Our sample of participating schools featured 71 Arizona schools that serve increasingly culturally diverse students and that have been identified as underperforming according to state outcomes summarized in school letter grades. We identified participating schools according to state lists and recommendations from superintendents. The school development project was piloted and refined over several iterations. Participants and demographics are illustrated in Table 1.

Data Collection and Data Analysis
Data collection and analysis of results from participating schools have been ongoing; however, we did not include a control group as recommended by What Works Clearinghouse methods. We used the Arizona Department of Education website to determine letter grades for schools with differing levels of participation (full participation, partial participation, and no participation). Letter grades were only available for Phases 1 and 2 because the state of Arizona suspended their use during Phase 3 due to a change in state testing.
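To make the descriptive comparison concrete, the following minimal Python sketch tabulates, for each participation level, the share of schools whose letter grade rose. It is an illustration only and not the project's analysis procedure: the function name `share_improved`, the A-F to 0-4 mapping, and every record in `example_records` are hypothetical assumptions and are not project data.

```python
from collections import defaultdict

# Assumed ordinal mapping of letter grades (illustrative only).
GRADE_POINTS = {"F": 0, "D": 1, "C": 2, "B": 3, "A": 4}


def share_improved(records):
    """Return, per participation level, the share of schools whose grade rose.

    Each record is (participation_level, grade_before, grade_after).
    """
    totals = defaultdict(int)
    improved = defaultdict(int)
    for level, before, after in records:
        totals[level] += 1
        if GRADE_POINTS[after] > GRADE_POINTS[before]:
            improved[level] += 1
    return {level: improved[level] / totals[level] for level in totals}


# Hypothetical records, NOT data from the AZiLDR project.
example_records = [
    ("full", "D", "C"),
    ("full", "D", "B"),
    ("full", "C", "C"),
    ("partial", "D", "D"),
    ("partial", "C", "B"),
    ("none", "D", "D"),
]

shares = share_improved(example_records)
for level, share in shares.items():
    print(f"{level}: {share:.0%} of schools improved")
```

Because the project had no control group, a tabulation of this kind is descriptive only and does not support causal claims about participation.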
Semi-structured qualitative interviews (35-40 min) were conducted by interviewers (outside of the internal researchers), paid by the grant and trained in qualitative interviewing techniques. Interview questions featured leadership practices in relation to the three stages of turnaround leadership [45], including levels of capacity building, collaboration, community involvement, assessment literacy, curriculum, and overall priorities. Interviews examined participants' (principals and teachers) understandings of turnaround stages, conceptions of leadership, and capacities. Specifically, semi-structured interviews were conducted in the last two institutes in order to determine the perceptions of changes in capacity building that occurred throughout the intervention period. Observational data was noted during walk-through observations, site visitations, and through observations of the team interactions. The research team provided team members with feedback on their pedagogical interactions with other teachers and students, progress toward goals, team interactions and, on occasion, ethical issues around, for example, disparities in student discipline or parent involvement.

Results
Results were analyzed using quantitative (school letter grades based on student outcomes) and qualitative (interviews and observations) methods.

Improved School Letter Grades
State assessments and data were used to analyze the movement of lowest quartile students, within-school gaps, and graduation rate changes, all of which impacted the state letter grade designation. In Arizona, as in most other U.S. states, student outcome data is summarized into a school grade designation (A-F). Letter grades are reported to parents and communities as a summary of school effectiveness or quality. Figure 1 shows that, in the first test group, full participation in AZiLDR training increased the likelihood of an improved accountability rating by one to two grade levels. Specifically, over 50% of those schools that participated in all AZiLDR sessions and activities improved their letter grades by one or two letters. A few schools with lower levels of participation (i.e., some, none) were still able to make improvements. Those schools that had participated, either fully or partially, in the intervention showed greater improvement overall than those schools that had not participated in the training, with greater improvement defined by increased letter grades.
Educ. Sci. 2019, 9, x FOR PEER REVIEW 8 of 15
In Phase 2, about 87% of schools that fully participated in the project increased by one letter grade. The participating schools that had only some participation in the project had no change in their state-designated letter grade.
During the third phase of the project, Arizona changed the state assessment and suspended reporting letter grade determinations for three years in order to gather three years of data from the new assessment which was part of the formula for determining the letter grade; therefore, this information is not available for the third iteration of the project.
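The descriptive analysis summarized above (the share of schools at each participation level whose letter grade improved) can be illustrated with a minimal Python sketch. This is not the project's actual analysis code. The function names, the participation labels ("full", "some", "none"), and every record in `example_records` are hypothetical placeholders chosen for illustration and do not reproduce the study's data or results.

```python
from collections import Counter

# Ordinal coding of A-F letter grades (an assumed coding for illustration).
GRADE_ORDER = {"F": 0, "D": 1, "C": 2, "B": 3, "A": 4}


def grade_change(before, after):
    """Number of letter-grade steps gained (positive) or lost (negative)."""
    return GRADE_ORDER[after] - GRADE_ORDER[before]


def share_improved(records):
    """Share of schools with an improved grade, per participation level.

    `records` is an iterable of (participation, grade_before, grade_after).
    """
    totals = Counter()
    improved = Counter()
    for participation, before, after in records:
        totals[participation] += 1
        if grade_change(before, after) > 0:
            improved[participation] += 1
    return {level: improved[level] / totals[level] for level in totals}


# Hypothetical records, NOT data from the project.
example_records = [
    ("full", "D", "C"),
    ("full", "C", "A"),
    ("full", "B", "B"),
    ("some", "C", "C"),
    ("some", "D", "C"),
    ("none", "C", "C"),
]

shares = share_improved(example_records)
for level, share in shares.items():
    print(f"{level}: {share:.0%} of schools improved")
```

A tabulation of this kind is purely descriptive. Without a comparison group it cannot separate the effect of participation from other influences on letter grades.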

Qualitative Interview Results
Results from interviews of principals and teacher leaders during Phase 1 indicated that their schools were making positive changes regarding formative assessments, data use, and growth in some community interactions, but noted that they were largely lacking in more authentic forms of democratic engagement. Here, we encouraged participants to retain clarity about the importance of authentic collaboration to education aimed at democracy. In much the same way we encouraged teams to use student outcome data and other student data as a source of reflection, we worked to facilitate schools toward higher capacity for collaboration, using evidence from the survey as well as interview data. Participants also reported the need to move beyond their low capacity status and develop into high capacity learning communities but described little consciousness of the broader socio-cultural dimension and cultural-political shifts in developing the potential for sustainable improvement in the Arizona context [46] (pp. . The varied implementation of professional learning communities (PLCs) was evident in the multiple and sometimes conflicting district priorities, range of instructional leadership perspectives, and levels of resistance. As one principal stated, PLCs "forced us to look a little bit deeper at our data and think about who was involved genuinely and who might be excluded or ignored. That was kind of alarming . . . We maybe had that before and really didn't focus on it". In many cases, priorities shifted away from building authentic and culturally respectful relationships with families and communities, which served only to reinforce deficit thinking and lack of coherent direction within schools for collaboration and cultural responsiveness. This deficit thinking is illustrated by a principal in a high Native population school who talks about "those students" who are not coming from homes with college education. She does not recognize students' funds of knowledge when she states: "It's very important for us to try and help those students coming from those homes so that they have a better chance at the future. I live where the educated people live. And I said that if you get a good education, you can live down there too." In these instances, team members were asked to think about students' funds of knowledge and their agency to leverage opportunities for all students.
Prior to the beginning of Phase 2, we solicited input from teachers within the project schools. Strong trust in the principals was evident. However, several issues were identified immediately by teachers. These included a lack of focused vision for the schools, limited capacity for collaborative leadership, and deficit thinking. These issues were considered in relation to survey evidence as well as observations and interview data that illuminated values. Teams were asked to set goals, make plans, and reflect on their actions over time.
In all schools at the beginning of the project, the focus was on the state letter grade rather than on enhancing pedagogical interactions and learning for all students. One teacher said, You know, I really don't think we have a real, definite vision. I think right now we were a C-minus school . . . So I know that is definitely one of his visions, is to get us to improve our C-minus standing. But other than that, I don't think we all know, okay, definitely what is the vision for the school as a unit, other than moving forward from a C-minus; and that is a big goal.
As indicated earlier, state letter grades are based on performance on state-mandated norm-referenced tests, factoring in growth over time, actual achievement levels, and examining sub-group performance (gender, ethnicity, special needs, etc.). Additionally, an understanding of how decisions are made within school sites and a sense of being part of those decisions were lacking. Teachers described the process as problematic.
I know he meets with department chairs, I think it's once a month, and they meet I think for about an hour. And then their department chairs go back to their department and pass on information that they have gathered from that. So there is-he's trying to get that communication down. But from there there's not an avenue, really, to put input back up, so it comes down. And I know I've given suggestions to my chair, but I don't know if it has gotten back up. So there's no real follow-through on when you have ideas.
Finally, deficit thinking was revealed across the board, yet teachers believed they were doing what was in the best interest of students. One teacher shared, "Well, right now, because the Hispanic culture tends to be very kind of laid-back, you kind of really have to push. They're very much, what's the word, 'mañana,' 'tomorrow,' that's the word; it's their culture. So they're not in any big rush; that's their culture." By the end of Phase 2, participants were applying the leadership skills that they had learned over the course of the project. For example, interview data indicated that the principals and teacher leaders developed a shared mission for school improvement. As one principal commented, Since we started, I have seen changes in the school vision and mission, the directions that we are going in the capacity-building groups that we have, our curriculum action team, as well as the revamped and rejuvenated leadership council with better direction . . . We have better communication across the board and better professional development for our staff focused on student learning.
As another example, during the course of the project, participants increased their data literacy skills and use of data and reflection in their daily practices. As another principal noted, "we are using data and the strategies we learned in the institutes in our PLCs . . . Primarily we've been modeling leadership processes and making data-based decision-making but really pushing people to think deeply behind the numbers as well. Everyone has a voice at the table, but the voice needs to be informed by research and data. It is helping slowly." Capacity building was key. One principal noted, "We executed a great turnaround so that when I left the school a couple years later, we had been recognized as an A+ school of excellence for the state of Arizona . . . " One assistant superintendent, who served as his district representative and attended all of the institutes, believed that the project made a difference to the capacity of the school to move forward. He stated, [The School Improvement Project] has provided the research, the systems, the applications to start small, look at the low-hanging fruit, start to build momentum, have clarity in purpose and direction, and get the buy-in to start moving forward . . . it's showing the principal how to build teams to have, for example, to help with issues on curriculum and culture. It is no longer just the principal trying to lead the way. It's all encompassing of staff trying to get on board.
Interestingly, during final interviews, no mention was made of culturally relevant pedagogy, yet at the same time, there was no evidence of deficit language.
Finally, Phase 3 began with school teams analyzing their own school data, including achievement data as well as the survey data. When talking about that analysis, one principal talked about the school survey results, stating, "I think looking at the trusting culture among the staff-that was a huge area, that they don't really trust; and collective efficacy was bad-they jump out at us . . . " In contrast, at the end of Phase 3, a principal shared, "I think just reflecting . . . that it's not me. It's this team of people communicating and determining these are the needs. This is what we need to do. This is where we get feedback from teachers what they need, and now let's put it together. That's what I think has been really wonderful this year." In other words, teams recognized the value of diverse perspectives engaged in collaborative deliberations and reflection using multiple sources of data.
Early in the project, participants had limited understanding about effective leadership practices, including the importance of trust and school culture. Survey results were corroborated by interview data, indicating that school culture and a focused vision were areas of concern for many schools; however, progress was evident. One principal shared, "But, the biggest thing is we have been able to build our leadership team in democratic ways....and really look and see, what is our school culture? What defines [us]?" A teacher on her leadership team was enthusiastic about the changes occurring; she stated, " . . . definitely shared collaboration time, shared vision. I don't feel like Katie's [principal] telling us what to do. I feel like Katie's involving us in the process, and that has never happened before ever." Building the capacity of the site to continue was a focus of the project. One principal, who was retiring at the end of the school year, was excited about what could continue to develop: I'm still sitting here with my team going, "Okay, we need to do this next year, dah dah dah," and I have no idea if it's going to happen or not . . . I know they'll carry forward, or hopefully whoever takes over will be open to where we've been, and where we were thinking we would be going, and I'm sure they'll add their own expertise. We want it to be better, and it will be...
Another principal focused on shared accountability. He stated, We have increased accountability at [our school] . . . This means we have made our goals and outcomes clearer. We have also further defined individual roles and what they look like so that people can truly be more included and feel their own importance to our shared goals. The further we move along with every individual having clearly defined roles/value/and importance to our team, the more people embrace that and make us more effective as a whole school.
Although there was still evidence of deficit thinking or focusing on what participants consider 'wrong', the idea of asset-based instruction or focusing on strengths was at the beginning stages. One teacher reported, "I try to integrate things that are related, like topics that are related to the students . . . So, I choose a topic that they know in order to teach them a new strategy. So, that way, I'm now teaching a new topic and a new strategy. So, I try to integrate things that have to do with agriculture and things related that students can relate to . . . " And while the evidence-based reforms described earlier all identify culture, they do not go deeply into the cultural bias and deficit thinking that restricts or inhibits goals of equality and freedom.
Many participants spoke at length about the school development process itself. The interactions with other teams were highlighted, with one teacher sharing, "Just by talking to the other teams, some of them are also going through the same problems, seeing the same things. Some of the things that we're doing, a lot of times, sparks ideas and reflection for them. Some of the things they're doing sparks ideas for us." The walk-through process resonated with the participants, with one teacher highlighting this aspect of the process. "Walk-throughs . . . wow, that was an amazing... because I had an idea what my team does, but I don't have a chance to go into my 7th grade team, just as he doesn't have time to go into his 8th grade teams. What I expected them to see wasn't necessarily there. We actually collected data and then we shared it with our teams." Finally, the structure of the institutes, taking teams away from their schools, while difficult, was appreciated. One principal shared, " . . . it gave us a time to think and really process, and maybe still not process as far as we need to, but I so appreciate that because it made the noise, all that outside noise, go away for a little while so we're going to do this, and now here we go with it, and then we'll come back to it again, and push forward with it. I think that was the helpful piece." A team member summarized the process, stating, It's kind of like we're progressing hand in hand, or what I see this training has enabled us to do is become a team. Before we were every two weeks for 30 min before school. Where we weren't given time to gel, to use your word, to become a unit. Then what we do, then we take it back to our teams . . . We went back and we have our thing ready to go.
Across the three phases of school development implementation, we consistently observed the importance of providing teams with an immersion experience away from schools during the institutes. The other two delivery modes, regional meetings and school observations, were conducted within schools.

Discussion
Many school teams were focused on outcome data as evidenced by letter grades, standardized tests, survey results, and other numerical data, such as discipline and attendance statistics. While some participants indicated concerns about the strong district and state value on numerical data, participants overwhelmingly identified the district and state trend toward the use of numerical evidence to make decisions [5,7]. Participants, perhaps unconsciously, reflected this dominant perspective about numerical tests as evidence of knowledge acquisition and other behaviors. In some cases, schools were also talking about curriculum in terms of state standards or textbooks. Across participating schools, many principals and teachers talked about leadership in terms of the individual principal with the knowledge and skills to leverage instructional improvements and gains in student outcomes. Such conceptions of individual, and even directive, leadership are frequently featured in traditional, evidence-based interventions, such as UVA, as well as mainstream effective schools literature [21]. Full participation in the process of school development generally yielded improved results according to the evidence-based conceptions held by Slavin and Hattie. State and district discursive pressures for accountability as measured by numerical evidence reinforced such values and perspectives on knowledge and leadership.
At the same time, this attention to letter grade outcomes and the need to produce evidence of growth and improvement in those terms resulted in a disconnect with the humanistic conceptions and values of Dewey. Evidence of growth in a Deweyian sense was demonstrated through democratic work as a collective aimed at continuous growth in learning. As noted in the literature, in Democracy and Education Dewey argued, in particular, that an education which only emphasizes the achievement of "external aims" (e.g., evidence from standardized test scores, grades, school letter grades, etc.) hinders students' (in this case, team leadership members') capacities for continuous growth and leads them toward viewing learning as an overly burdensome activity which they should seek to end as quickly as possible [1]. In our application of a Deweyian approach to education, we also emphasized participation and the use of achievement and survey evidence as a source of reflection. Here, we considered Biesta's [32] caution about evidence-based reforms as a democratic deficit, emphasizing how a particular use of evidence threatens to replace professional judgment and the wider democratic deliberation about the aims and ends and the conduct of education. "Calling the idea of value-based education an alternative is not meant to suggest that evidence plays no role at all in value-based education but is to highlight that its role is subordinate to the values that constitute practices as educational practices" [32] (p. 493).
It was clear that teams struggled with balancing Deweyian notions of education as noted by Biesta, finding it unnatural to hold evidence-based values and democratic education values in tension. They felt pressured to work on one area or the other rather than to balance the needs of both outcome data evidence and democratic values. Additionally, we surmise, the deficit thinking that persisted in many ways was a result of this inability to balance the tensions, which prevented schools from fully addressing their internal biases. For example, the focus on test scores and the mandated breakdown and analysis by subgroups detracts from a humanistic focus. We see the need in future work to examine how leadership teams work with and through these tensions.

Limitations
One limitation of the study, from the perspective of evidence-based research, was the lack of a control group. Additionally, the only outcome data available to the researchers were state-assigned letter grades, and those were available for Phases 1 and 2 only due to a change in tests by the state. Participants in this study were all drawn from a single state with its particular cultural and historical context, limiting generalizability to other contexts. Finally, researchers had limited access to state-level policymaker perspectives on evidence and what constitutes evidence in evidence-based education and what constitutes education and its values.

Future Directions
Yet at the end of the last phase of implementation in Arizona, we wondered about implementation of school development in other contexts in the U.S. For example, we see the relevance of data evidence as a source of reflection across contexts; however, education is a culturally and historically situated phenomenon. How might leadership teams work with data and plan for change in settings shaped by historical developments and other types of cultural diversity in different U.S. states? Thus, next steps in this project feature adding school development teams in another U.S. state with a different historical and cultural backdrop for education in school development.
Specifically, we intend to extend the school development process to South Carolina with its history of inequality in terms of Black-White disparities, now complicated with increasing diversity in what is becoming the global South. We seek to support educational leadership teams to reflect on the way evidence (including standardized test scores) can be thought about and constructed through Dewey's notion of democratic values of education in contexts with increasingly diverse students and differing cultural histories (e.g., the border region with its history of border politics and increasing diversity and the global South with its history of Black-White disparities and increasing diversity). As Dewey [1] argued so well, education/life is always a simultaneity of past-present and a source of renewal. In closing, we hope this discussion of evidence-based reform in school development will inspire balanced and theoretically informed approaches to meet the needs of culturally diverse students.

Conclusions
In this paper, we provided a brief overview of democratic education aims and values from Dewey and others as these aims and values are essential for education in an increasingly pluralistic world. This literature was supplemented with a brief discussion of recent empirical work on culturally responsive leadership and pedagogy [11,47]. We presented a school development process for building capacity through evidence-based reforms and the democratic, humanistic values of education in culturally diverse schools along the Arizona-Mexico border. Lessons from three phases of the school development process indicate the importance of leadership teams and the use of evidence as a source of reflection and democratic deliberations about curriculum, culture, and other dimensions of school development. We considered these evidence-based practices as important to creating deliberative spaces for our school development approach. From this perspective, evidence, including state test results and survey results, was a source of reflection and planning. In so doing, we considered leadership teams as a microcosm of democratic deliberations that leaders could model, teach, and implement throughout the school. During institutes, regional coaching meetings, and individual school visits, teams were provided with the opportunity to develop plans and get feedback from other teams as well as from the research team. Results were promising in terms of improved school letter grades and team member perceptions of school development in the interviews.

Funding: The school development work was funded through Improving Teacher Quality grants awarded through the Arizona Board of Regents, as well as contracted work through the Arizona Department of Education. The research itself received no specific funding.