1. Introduction
School discipline refers to the policies, practices, and actions used by educational institutions to manage student behavior and maintain a safe and productive learning environment. While intended to promote order and accountability, school discipline, particularly exclusionary practices such as suspension and expulsion, has been shown to disproportionately affect low-income and minority students in the United States. Evidence indicates that Black, Latino, and Native American students are significantly more likely to receive punitive responses for similar behaviors compared to their White peers, often for more subjective infractions. These disciplinary measures are not only unequally applied, but are also strongly associated with negative long-term outcomes, including higher dropout rates, increased likelihood of arrest, and greater risks of adult incarceration [1,2,3,4]. During the 2015–2016 school year, Black students in the United States were suspended at more than double the rate (8%) of White students (3.8%) and Latinx students (3.5%). Black children, regardless of age or grade level, have been found to receive disproportionate punishment for the same misbehaviors as their White peers [5,6,7,8]. Further, Native American students have been found to be at similar risk of receiving this disparate treatment. In 2015–2016, American Indian and Alaska Native students were 10 times more likely than White students to receive suspension [9,10]. Collectively, this equates to approximately 5% to 6% of all students in the United States receiving one or more suspensions.
There is also robust evidence that school suspensions are a strong predictor of school dropout [11]. Other researchers [12] have found that approximately one-third of sophomores who had dropped out of school had been suspended over three times as often as their peers who remained in school. Further, longitudinal research has demonstrated that a child who was expelled or suspended was more than twice as likely to be arrested within the same month as a non-expelled or suspended peer [13]. Exclusionary school discipline practices have also been found to be associated with a higher risk of adult incarceration [14,15].
On a larger scale, one investigation encompassing over 70,000 U.S. educational districts found a significant disproportion: Black students faced over three times the likelihood of suspension or expulsion compared to their White counterparts [16]. Additionally, a separate nationwide dataset from 2007 revealed a stark contrast—while only 18% of White high school students had ever experienced suspension, nearly half (49%) of Black students had [17]. Moreover, suspension rates for Black and Latino children increased from 1999 to 2007, contrasting with a decrease observed among White and Asian students. Other researchers have shown [8] that White students are predominantly referred for concrete infractions such as smoking or vandalism, whereas Black students are disproportionately referred for subjective infractions like perceived disrespect or threats.
Ultimately, the goal of equitable discipline must be to affirm student agency, restore relationships, and disrupt cycles of marginalization. Doing so requires that emerging AI tools such as ChatGPT, the focus of the present study, not only be subjected to bias audits and human oversight, but also be situated within a broader commitment to educational justice—one informed by the voices and scholarship of those who have long exposed the structural racism embedded within school practices.
Artificial Intelligence (AI) has emerged as a promising tool in revolutionizing various facets of education, including K-12 school discipline. Its applications span from behavior monitoring to predictive analysis, offering educators novel means to address disciplinary issues effectively. Within the K-12 context, AI-driven systems can be designed to analyze patterns in student behavior, predict potential disruptions, and recommend personalized interventions. What remains unclear, however, is the extent to which these tools make diagnostic predictions consistently and without bias. The present study, therefore, examined two research questions: (a) To what extent do the recommendations generated by ChatGPT in response to school discipline referral vignettes reflect alignment with evidence-based, equitable, and developmentally appropriate disciplinary practices? and (b) To what extent does ChatGPT’s output remain consistent when the ethnicity of the student is varied?
While much of the existing research on artificial intelligence in fields such as medicine, psychology, and education has focused on evaluating the accuracy and reliability of AI-generated outputs by comparing them to those produced by panels of human experts, such studies have largely emphasized diagnostic validity or performance metrics. These efforts, while valuable, often overlook the internal consistency and potential biases embedded within the AI systems themselves. In contrast, the current study represents a novel methodological contribution by comparing ChatGPT’s output against itself, using identical prompts differing only in racial or ethnic identifiers to investigate whether racial disparities are inherently encoded in the model’s responses.
This self-comparison approach moves beyond traditional validation methods by examining the model’s internal logic and the influence of sociocultural bias in algorithmic language generation. By isolating race as the only variable and assessing possible differential responses, this study provides direct evidence of potential algorithmic bias without relying solely on external expert judgments. As such, the research not only sheds light on how large language models may perpetuate or amplify racial disparities in professional contexts (e.g., disciplinary decisions, psychological diagnoses, or educational recommendations), but also offers a new paradigm for bias detection that can be applied to future studies involving generative AI systems.
1.1. The Possible Role of Artificial Intelligence in School Discipline
One prominent application of AI in school discipline is behavior monitoring through advanced algorithms capable of processing vast amounts of data. These systems can track behavioral patterns in students, identify deviations, and flag potential concerns for further assessment. For instance, AI-powered software can analyze attendance records, academic performance, and even sentiment analysis from written assignments or discussions to detect signs of distress or behavioral issues that might otherwise go unnoticed.
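As a concrete (and deliberately simplified) illustration of the kind of monitoring logic described above, the sketch below flags students whose recent attendance deviates sharply from their own historical baseline. The thresholds, field names, and data are hypothetical placeholders; deployed systems would be considerably more sophisticated.

```python
from statistics import mean, stdev

def flag_deviation(history: list[float], recent: float, z: float = 2.0) -> bool:
    """Flag when the recent value falls more than z standard deviations
    below the student's own historical mean (an illustrative heuristic)."""
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and recent < mu - z * sigma

attendance = [0.97, 0.95, 0.98, 0.96, 0.94]      # weekly attendance rates (placeholder data)
print(flag_deviation(attendance, recent=0.70))   # True: a sharp drop worth human review
```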
Moreover, predictive analysis is a pivotal feature offered by AI systems. By leveraging machine learning algorithms, these systems forecast potential behavioral incidents based on historical data and patterns. This proactive approach enables educators to intervene preemptively, implementing targeted interventions and support strategies to mitigate or prevent disciplinary problems. Studies [18,19] underscore the effectiveness of predictive analysis in reducing disruptive incidents and fostering a positive school climate. Additionally, systems can analyze student data to anticipate potential conflicts or behavioral concerns, allowing for early intervention [20]. AI-driven chatbots and virtual counselors have also been implemented to provide students with support and guidance, offering resources for conflict resolution and emotional regulation [21].
However, the integration of AI in K-12 school discipline is not devoid of challenges and ethical considerations. One primary concern is the issue of algorithmic bias. AI models trained on historical data might perpetuate biases present in the data, leading to unfair or discriminatory outcomes, especially concerning marginalized student groups [22]. Ensuring fairness and equity in AI algorithms demands meticulous scrutiny and continuous refinement to minimize biases. Moreover, privacy concerns loom large in the implementation of AI in schools. The collection and analysis of sensitive student data raise ethical dilemmas regarding consent, data security, and the responsible use of personal information. Safeguarding student privacy and establishing transparent protocols for data handling are imperative in the ethical deployment of AI systems in educational settings.
1.2. ChatGPT as a Particular Tool
ChatGPT, a language generation model developed by OpenAI, presents a unique opportunity for monitoring K-12 school discipline due to its natural language processing capabilities and potential for personalized interaction. One of the key applications of ChatGPT in school discipline involves providing personalized counseling and support to students. The model’s ability to engage in conversation and offer guidance can be harnessed in scenarios where students require immediate emotional support or assistance. ChatGPT can provide a non-judgmental and confidential space for students to express their concerns, seek advice, or discuss issues they might hesitate to share with adults. Studies [23] have demonstrated the positive impact of AI-based counseling in alleviating stress and improving emotional well-being among students.
Moreover, ChatGPT can assist educators in delivering personalized interventions and behavior management strategies. By analyzing student responses and behavior patterns, the model can generate tailored recommendations for addressing specific disciplinary issues. For instance, based on inputs regarding a student’s behavior, ChatGPT could suggest appropriate intervention strategies, resources, or referrals to support services. Some research suggests that AI interventions can positively influence student behavior by fostering self-awareness and providing personalized support [24]. However, there are concerns that overreliance on AI systems might diminish the role of educators in understanding and addressing students’ individual needs [23].
1.3. Challenges and Ethical Considerations
The integration of ChatGPT into K-12 school discipline, however, also raises ethical considerations that necessitate careful attention. Privacy and data security are paramount concerns when implementing AI models in educational settings. Safeguarding student data and ensuring confidentiality in conversations between ChatGPT and students are critical ethical imperatives [25]. Moreover, transparency in how the AI model operates and the limitations of its advice must be communicated clearly to students, parents, and educators to build trust and mitigate potential misunderstandings.
Another ethical consideration involves the responsibility of using AI as a supplement to, not a replacement for, human interaction and decision-making. While ChatGPT can provide valuable support, it should not replace the essential role of human educators and counselors in understanding the complexities of student behavior and emotions. Studies [19] emphasize the importance of maintaining a balance between AI-driven support and human intervention to ensure holistic student well-being.
The final, and perhaps most critical, ethical consideration is the issue of bias inherent in AI algorithms. Studies have highlighted the risk of perpetuating biases present in historical disciplinary data, potentially leading to discriminatory practices [26]. AI algorithms are already being used to monitor the online activity of students in the United States and are similar to the predictive models used in the criminal justice system to inform sentencing [27]. Unfortunately, in several instances, these algorithms have been found to contain biases against minority individuals [28,29,30,31].
1.4. Vignettes in Psychological and Educational Research
Vignettes have become a valuable tool in psychological and educational research, offering controlled scenarios that simulate real-life situations. They allow researchers to manipulate various elements within the vignette—such as demographic characteristics or specific behaviors—to analyze how these factors influence perceptions, decision-making, and behavior [32]. In clinical settings, vignettes are often used to present detailed case histories, enabling practitioners and trainees to develop their diagnostic and treatment skills for psychological disorders [33,34]. In education, vignettes help explore the decision-making processes of educators and students, as well as the effectiveness of different teaching approaches [35].
This methodology offers several advantages. Vignettes allow researchers to isolate and manipulate specific variables, making it easier to study sensitive topics while preserving participant anonymity. They are also more cost-effective than large-scale studies and provide a standardized approach that enhances the replicability of research findings. However, vignettes may oversimplify complex real-world dynamics, potentially limiting the generalizability of results. Additionally, participants may respond in socially desirable ways, rather than providing authentic reactions. Ethical considerations are critical when using vignettes, as researchers must ensure that participants are not misled or emotionally distressed. Providing informed consent and a thorough debriefing helps mitigate these concerns [36].
2. Materials and Methods
2.1. Overview of the Vignettes
A series of ten sets of vignettes (Appendix A) was created that modeled discipline referral problems commonly encountered by school principals (e.g., fighting, truancy, disrespect of school personnel or property). Each vignette contained background information on a hypothetical student, including age, gender, and grade level, as well as information describing the reason for referral. The sets of vignettes were identical except for the ethnic background of the student, which varied across sets (e.g., White, Black, Hispanic, Asian–American, Native American). These ethnic categories were selected because they represent the major racial and ethnic groups identified in U.S. K-12 public education demographic data and are commonly referenced in education equity research. Additionally, prior literature on school discipline disparities consistently highlights the overrepresentation of Black, Hispanic, and Native American students in disciplinary actions, making their inclusion essential. White students served as a baseline comparison group, and Asian–American students were included to allow for a fuller spectrum of racial analysis, especially given that they are often underrepresented in disciplinary disparity research.
The vignettes intentionally varied in terms of the type of disciplinary issue (e.g., truancy, fighting, defiance) and severity/risk level to mirror the diversity and complexity of real-world disciplinary situations faced by school personnel. This variation allowed for the examination of whether bias in AI-generated recommendations is sensitive to context, that is, whether disparities in suggested responses are more pronounced in high-stakes versus low-stakes scenarios. Including a range of behaviors also helped to assess whether the AI treats students of different ethnicities differently when the infraction is subjective (e.g., “disrespect”) versus objective (e.g., “vandalism”), a distinction found to be important in prior research on school discipline disparities.
Vignettes were kept intentionally brief yet content-rich to reflect the real-world format of many school discipline referrals, which often contain limited information. This brevity also serves to standardize the inputs across trials, ensuring consistency in what ChatGPT is evaluating, and to limit irrelevant variation. The goal was to provide enough contextual and behavioral detail to allow for meaningful analyses and interpretations of the AI’s recommendations, while minimizing the possibility that excessive information would introduce confounding variables or distract from the key factor being studied—student ethnicity. Brief vignettes also make the design more feasible for repeated trials, and easier to replicate in future studies. The vignettes attempted to capture the complexity of real-life cases, where students often present with multiple overlapping issues. These complexities make the vignettes suitable for evaluating nuanced psychological or educational recommendations, as supported by APA guidelines.
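To make the manipulation concrete, the sketch below shows how matched vignette variants could be generated from a single template in which only the ethnicity label changes. The template wording, infraction text, and labels here are illustrative placeholders, not the study’s actual vignettes (which appear in Appendix A).

```python
# Hypothetical sketch: one prompt per ethnicity, identical in all other wording.
ETHNICITIES = ["White", "Black", "Hispanic", "Asian-American", "Native American"]

VIGNETTE_TEMPLATE = (
    "A 15-year-old {ethnicity} male in the tenth grade was referred to the "
    "principal for {infraction}. As the principal, what disciplinary actions "
    "would you recommend?"
)

def build_variants(infraction: str) -> dict[str, str]:
    """Return one prompt per ethnicity; ethnicity is the only varied token."""
    return {
        ethnicity: VIGNETTE_TEMPLATE.format(ethnicity=ethnicity, infraction=infraction)
        for ethnicity in ETHNICITIES
    }

if __name__ == "__main__":
    for ethnicity, prompt in build_variants("cursing at a teacher").items():
        print(f"[{ethnicity}] {prompt}\n")
```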
2.2. Data Collection
Vignettes were entered into ChatGPT 3.5, and output was collected and recorded for analysis. Vignettes were entered one at a time, across multiple computers and using a VPN over a three-month period. ChatGPT 3.5 was selected because at the time of the data collection, it represented the most widely accessible and publicly used version of OpenAI’s large language models. Entering vignettes individually ensured that each response was isolated and independent, reducing the likelihood that the model’s short-term memory of previous prompts influenced subsequent outputs. This was particularly important as the study examined potential differences in how the model treated variations (e.g., ethnicity) in similar scenarios. The use of multiple devices reduced the chance that model behavior was influenced by session-specific memory or device-based identifiers. Large language models like ChatGPT can sometimes retain brief session memory, which could influence the tone, content, or assumptions made in successive responses. The use of a VPN was intended to mask the IP address and location of the user to help reduce the potential for geolocation-based variation in ChatGPT responses. There is some evidence that outputs may subtly vary based on perceived user region or context, and the use of a VPN helped ensure that all prompts were treated uniformly, regardless of geographic location, improving the study’s reliability and generalizability.
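The study itself entered prompts manually through the ChatGPT 3.5 web interface; a programmatic replication could approximate the same isolation by issuing each prompt as an independent, single-turn API request, as in the hypothetical sketch below. The model name (gpt-3.5-turbo) and zero-temperature setting are assumptions for replication purposes, not the study’s actual configuration.

```python
# Hedged replication sketch: each call contains only one message, so the
# model has no memory of earlier vignettes, mirroring one-at-a-time entry.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def collect_response(prompt: str) -> str:
    """Send a single vignette in a fresh, single-turn request."""
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",   # assumed analogue of ChatGPT 3.5
        messages=[{"role": "user", "content": prompt}],
        temperature=0,           # reduce run-to-run variation
    )
    return completion.choices[0].message.content
```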
The order of ethnicity was counterbalanced across trials, and all ethnicities were examined for each of the vignettes. Counterbalancing the order of ethnicities across trials was done to control for order effects and reduce potential bias in how ChatGPT processed sequential inputs. Large language models can retain the short-term memory of previous prompts in a session, so randomizing the order in which ethnicities appear helps ensure that no particular group consistently benefits from or is disadvantaged by prompt position. This methodological step enhances the internal validity of the study by helping isolate ethnicity as the only variable systematically varied.
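The paper does not specify the exact counterbalancing scheme used; one minimal approach consistent with the description is a rotation-based Latin square, sketched below, in which every ethnicity occupies every serial position exactly once across a set of trials.

```python
# Illustrative counterbalancing sketch; the scheme is an assumption.
ETHNICITIES = ["White", "Black", "Hispanic", "Asian-American", "Native American"]

def latin_square_orders(items: list[str]) -> list[list[str]]:
    """Each row rotates the list by one position, so every item appears
    in every serial position exactly once across the full set of orders."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]

for trial, order in enumerate(latin_square_orders(ETHNICITIES), start=1):
    print(f"Trial {trial}: {order}")
```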
2.3. Data Analysis
A qualitative content analysis design was used, with quantitative elements introduced via expert ratings. ChatGPT-generated responses to the discipline referral vignettes were evaluated by a panel of experienced educators for their alignment with evidence-based discipline practices. A panel of four experts served as evaluators; two were university professors in educational administration and two were public high school teachers. Each of the evaluators had over 15 years of experience in their current professional roles. Each panelist independently reviewed the ChatGPT recommendations and provided ratings on a 5-point Likert scale (1 = not aligned at all, 5 = strongly aligned) based on clarity, developmental appropriateness, and alignment with the U.S. Department of Education’s Guiding Principles: A Resource Guide for Improving School Climate and Discipline [37].
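As a minimal sketch of the quantitative element, the snippet below averages four panelists’ 1–5 ratings per vignette and reports the score range as a crude agreement check. The vignette labels and rating values are fabricated placeholders; only the aggregation procedure mirrors the method described.

```python
import statistics

# ratings[vignette_label] = one 1-5 score per panelist (four raters); placeholder data
ratings = {
    "fighting":   [5, 4, 5, 4],
    "truancy":    [4, 4, 5, 4],
    "disrespect": [5, 5, 4, 4],
}

for vignette, scores in ratings.items():
    mean_score = statistics.mean(scores)
    spread = max(scores) - min(scores)  # simple within-panel agreement index
    print(f"{vignette:>10}: mean={mean_score:.2f}, range={spread}")
```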
3. Results
Research Question 1 (Alignment of ChatGPT recommendations with disciplinary practices).
The first research question examined the utilization of ChatGPT in simulated disciplinary scenarios to determine whether the AI-generated output was reasonable for use in a K-12 school setting. The panel of expert evaluators demonstrated a high level of agreement in their ratings, with the majority of ChatGPT-generated recommendations receiving scores of 4 or 5 on the Likert scale across all criteria. Overall, the responses were judged to be clear, developmentally appropriate, and well aligned with the U.S. Department of Education’s Guiding Principles [37]. Mean ratings across the ten vignettes ranged from 4.2 to 4.8, with particularly strong alignment noted in recommendations that emphasized restorative approaches, non-punitive interventions, and culturally responsive strategies. Taken together, the results indicate that the responses generated by ChatGPT aligned with what would be expected from a school principal making decisions based upon established school district policies regarding disciplinary actions. The model demonstrated an understanding of the principles and guidelines outlined in these policies, and its output reflected a commitment to upholding these standards. For example, recommendations to contact authorities when appropriate, such as in cases involving serious misconduct or safety concerns, were consistently provided by ChatGPT. This indicates that the model is capable of incorporating legal and regulatory considerations into its decision-making process (See Table 1).
Furthermore, ChatGPT frequently recommended engaging in dialogue with students to obtain their side of the story before determining appropriate disciplinary measures. This aligns with best practices in school discipline, which emphasize the importance of gathering all relevant information and perspectives before acting. By encouraging open communication and considering the perspectives of students, ChatGPT demonstrated a commitment to fairness and procedural justice in disciplinary proceedings.
In addition, the model consistently recommended engaging with parents or guardians as part of the disciplinary process. Recognizing the importance of parental involvement in addressing disciplinary issues, ChatGPT emphasized the need to communicate effectively with parents and collaborate with them in finding solutions. This recommendation reflects the holistic approach to discipline advocated by many educational professionals, which emphasizes the importance of partnerships between schools and families in supporting student behavior and development. Beyond these holistic recommendations involving off-campus stakeholders, many of the recommendations included interdisciplinary supports within the school setting, such as counseling, the development of behavioral intervention plans, and academic supports like tutoring or after-school support sessions.
Importantly, the output generated by ChatGPT was characterized by clarity and straightforwardness, mirroring what would be expected in real-life disciplinary situations. The model’s responses were comprehensible and actionable, providing clear guidance on how to address disciplinary incidents effectively. This suggests that ChatGPT has the potential to serve as a valuable tool for educators and administrators in navigating complex disciplinary challenges. Restorative measures were also frequently mentioned as a way of addressing the harm caused by the student’s actions and promoting accountability and understanding within the school community.
Research Question 2. (Examination of potential racial bias in ChatGPT’s recommendations).
For the second research question, however, the results were not as consistent. For the most part, disciplinary recommendations were agnostic to race, but when variation occurred, it tended to be subtle yet problematic. When disciplinary scenarios were dangerous, violent, or possibly criminal in nature, ChatGPT’s recommendations were clear, reasonable, and consistent across student racial identities. For example, in a scenario involving the possibility of a student having drugs in his locker, the recommendations were invariant across race and included the following: secure the area, notify school authorities, follow school policies, involve law enforcement if necessary, document the search, contact parents, and enforce consequences. However, in the scenario describing a fight in the cafeteria, the first consideration for the White student was “suspension”, while for the Black, Hispanic, Asian–American, and Native American students the first consideration was “immediate suspension”. The recommendation then continued with identical language for all students: “This suspension could range from a few days to several weeks, providing time for the principal to investigate the incident and determine appropriate disciplinary measures.” (See Table 2).
A similar finding arose with the AI-generated output associated with the scenario describing a student being disrespectful and cursing in class. For the White, Black, Hispanic, and Asian–American students, a series of logical and reasonable recommendations was put forth: investigate the incident, meet with the student, engage with the parents, review school policies, offer counseling support, develop a behavioral intervention plan with positive reinforcement, provide professional development for teachers, and explore restorative practices. Unexpectedly, for only the Native American student, the very first consideration was for the principal to consider cultural sensitivity—“Recognize and respect the student’s cultural background and heritage. Be aware of any cultural factors that may be influencing his behavior and approach the situation with cultural sensitivity.” This suggestion preceded even the top suggestion for all other racial groups: “Conduct a thorough investigation to gather information about the incident, including speaking with the teacher, any witnesses, and the student himself, to understand the context and factors contributing to the behavior. Consider any potential cultural misunderstandings or misinterpretations,” which was moved to number two for the Native American student. Further, while cultural sensitivity was added for only the Native American student, it replaced the recommendation to “Provide training or resources for teachers on effective classroom management strategies and techniques for addressing challenging behaviors”, which was missing for the Native American student but included in the recommendations for all other racial groups. A similar divergence occurred in the scenario involving a tenth-grade student wearing a t-shirt with offensive content, such as promoting drug usage; the first AI-generated recommendation for the principal was to “Recognize and understand the cultural context of the student’s background. Some popular songs or lyrics within certain cultural contexts may not carry the same connotations as they do in others” for the Black and Hispanic students, but not for the Asian–American or Native American students.
As mentioned previously, ChatGPT output was most consistent across student racial identities when disciplinary scenarios involved dangerous, violent, or potentially criminal behavior. However, in cases of less concretely defined infractions, such as defiance or disrespect, the model’s responses demonstrated far greater variability, raising significant concerns about embedded bias. For instance, in a scenario involving a student referred for frequent defiance, the AI considered cultural sensitivity for all minority groups, but uniquely recommended “Consider Further Support or Referrals...to external resources” only for the White student. Similarly, in a case describing frequent disrespect, the directive to “Enforce Consequences...in accordance with the school’s disciplinary policies” was made only for the White and Black students, not for their Hispanic, Native American, or Asian–American peers. Yet, in the same scenario, a recommendation to explore restorative interventions prior to suspension was offered to all students of color but was withheld from the White student entirely.
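The self-comparison analysis reported above can be expressed as a simple set operation: collect the recommendations produced for each ethnicity variant of a vignette and flag any item not shared by all variants. The recommendation strings below are shortened paraphrases of the outputs described in this section, used purely for illustration.

```python
# Sketch of divergence detection across ethnicity variants of one vignette.
recommendations = {
    "White":           {"investigate incident", "meet with student", "suspension"},
    "Black":           {"investigate incident", "meet with student", "immediate suspension"},
    "Hispanic":        {"investigate incident", "meet with student", "immediate suspension"},
    "Asian-American":  {"investigate incident", "meet with student", "immediate suspension"},
    "Native American": {"cultural sensitivity", "investigate incident", "immediate suspension"},
}

shared = set.intersection(*recommendations.values())  # items common to all variants
for ethnicity, recs in recommendations.items():
    divergent = recs - shared
    if divergent:
        print(f"{ethnicity}: not shared across all variants: {sorted(divergent)}")
```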
4. Discussion
4.1. Alignment of ChatGPT Recommendations with Disciplinary Practices
The findings of this study provide valuable insights into the potential applications and ethical considerations surrounding the use of AI [28,38], particularly ChatGPT, in K-12 school discipline scenarios. Overall, the results suggest that ChatGPT can serve as a valuable tool for educators and administrators, as its recommendations typically align closely with established school district policies and best practices in school discipline. The model consistently offered clear and actionable guidance on addressing disciplinary incidents, emphasizing important principles such as procedural fairness, parental involvement, and holistic approaches to discipline [39,40].
4.2. Examination of Potential Racial Bias in ChatGPT Recommendations
However, amidst these promising results, the study also reveals subtle yet concerning disparities in ChatGPT’s output across different racial identities. While the model appears generally agnostic to race in scenarios involving dangerous or criminal behavior, variations emerge in scenarios where infractions are less concretely defined, such as defiance or disrespect. In these instances, ChatGPT’s recommendations exhibited subtle biases, with certain racial groups receiving differential treatment or considerations—a pattern that mirrors real-life disciplinary inequities observed in K-12 education settings [41,42]. For example, in scenarios involving disciplinary incidents related to disrespect or defiance, the model occasionally suggested more punitive measures or fewer alternative interventions depending on the racial identity of the student involved. Similarly, when cultural sensitivity was at issue, the model demonstrated inconsistency, at times offering nuanced contextual understanding for students from certain cultural backgrounds while failing to do so for others [43,44]. These findings carry significant implications for the real-world adoption of AI tools in educational settings, particularly in areas involving student discipline and behavioral intervention. Disparities in AI-generated recommendations, even when subtle, risk undermining due process for students by introducing an opaque layer of algorithmic influence that may perpetuate or even reinforce existing biases. Without proper oversight, educators might unintentionally rely on biased outputs when making decisions that carry long-term consequences for students’ educational trajectories, disciplinary records, or access to support services.
These biases pose a direct threat to student success, as students from historically marginalized backgrounds may find themselves subject to AI-driven judgments that are neither transparent nor fairly contextualized. If students or parents perceive AI-influenced processes as biased or unfair, it may erode trust in the school system and contribute to feelings of disempowerment and disengagement. Systemic equity is also at stake. Integrating AI tools into school disciplinary frameworks without addressing and correcting these biases could codify inequitable treatment into automated systems, making it more difficult to detect and challenge discriminatory patterns. As such, any implementation of AI in school discipline must be accompanied by rigorous equity audits, stakeholder engagement (including students and families), and clear guidelines that preserve human discretion and prioritize culturally responsive practices. Ensuring that AI augments rather than undermines equitable educational experiences is not only a technical challenge, but an ethical imperative.
These findings raise important questions about the potential perpetuation of biases within AI algorithms, underscoring the need for careful consideration and scrutiny in the development and implementation of AI-driven systems in educational settings [45]. Algorithmic bias, whether inadvertent or systemic, can significantly impact students’ experiences and outcomes, highlighting the critical importance of addressing fairness and equity in AI algorithms used in school discipline. Moreover, the study underscores the indispensable role of human oversight and intervention in the use of AI in school discipline [46]. While ChatGPT can offer valuable support and guidance, it should not replace the essential role of human educators and administrators in understanding the nuances of student behavior and addressing disciplinary issues in a fair and equitable manner. Human judgment and discretion remain crucial in interpreting and contextualizing AI-generated recommendations, especially in sensitive areas such as school discipline where individual circumstances and cultural factors play significant roles. It is important to recognize that ChatGPT’s responses are shaped by its training data, which may not fully capture the complexities or diversity of specific school communities. Therefore, generalizing its use to broader AI applications in schools should be approached with caution and supported by ongoing research and validation.
4.3. Practical Implications
The role of K-12 school principals encompasses multifaceted responsibilities, ranging from instructional leadership to administrative management. In this digital age, the integration of Artificial Intelligence (AI) tools such as ChatGPT offers transformative opportunities for school leaders to streamline administrative tasks and workflows, enhance communication, and make data-informed decisions [47]. Furthermore, ChatGPT can support principals in data analysis and decision-making processes. By analyzing school performance data, student achievement metrics, and staff feedback, ChatGPT can provide synthesized insights and trend analyses. This enables principals to make informed decisions regarding curriculum modifications, resource allocation, and intervention strategies. The model’s capacity to generate reports and summaries based on complex data sets facilitates a more comprehensive understanding of the school’s strengths and areas needing improvement. To ensure the responsible and effective deployment of AI tools like ChatGPT, school administrators should emphasize ethical integration throughout their implementation practices. Transparency is critical; educators, students, and families should be informed when AI is used, including explanations of its purpose, the data it draws from, and the logic behind its recommendations. Employing explainable AI approaches can help demystify complex processes and make the reasoning behind AI-generated insights more accessible to all stakeholders.
Human oversight remains essential. AI-generated outputs should not replace professional judgment, but rather serve as one of many tools to support it. Principals and educators must contextualize AI recommendations, especially when interpreting student data or making decisions that impact instructional practices or disciplinary outcomes [48,49]. A hybrid approach, combining algorithmic analysis with human expertise, can preserve nuance and promote more equitable educational decisions. Ethical integration also requires ongoing attention to bias mitigation and equity. Regular audits of AI tools can help detect patterns of bias in data processing, particularly when the tools influence interventions, resource distribution, or performance evaluations [50,51]. School leaders must ensure that the data used to train AI systems reflect student diversity and avoid reinforcing systemic disparities. Periodic audits should be documented, with outcomes and corrective actions communicated transparently to staff and the broader school community. Privacy and data security are foundational to ethical AI use in education. Administrators must ensure that any data processed by AI tools comply with existing privacy regulations such as FERPA, COPPA, and GDPR. Effective safeguards such as data minimization, controlled access, and anonymization are necessary to protect sensitive student and staff information. When administrators implement AI responsibly, with transparency, oversight, and strong governance structures, these tools can enhance leadership effectiveness while upholding the trust and rights of the school community.
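As one hedged illustration of data minimization, the sketch below strips direct identifiers from a referral record and replaces the student ID with a salted one-way hash before any text would be sent to an external tool. The field names and record contents are hypothetical, and real deployments would require vetted key management and legal review.

```python
import hashlib

SALT = "replace-with-secret-salt"  # kept server-side, never shared (placeholder)

def minimize_referral(record: dict) -> dict:
    """Return a referral stripped of direct identifiers, with a pseudonym."""
    pseudonym = hashlib.sha256((SALT + record["student_id"]).encode()).hexdigest()[:12]
    return {
        "pseudonym": pseudonym,
        "grade": record["grade"],
        "incident": record["incident_description"],
        # name, date of birth, address, etc. are deliberately omitted
    }

referral = {
    "student_id": "123456",                     # placeholder record
    "name": "EXAMPLE ONLY",
    "grade": 10,
    "incident_description": "Refused to follow classroom instructions.",
}
print(minimize_referral(referral))
```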
ChatGPT can serve as a supplemental tool for school principals in managing discipline referrals by providing a structured and standardized platform for incident reporting and analysis. Principals can utilize ChatGPT to streamline the referral submission process, allowing teachers and staff to input incident details, student information, and contextual factors. The model’s natural language processing capabilities enable the extraction of key information from referral submissions, helping in categorizing incidents and identifying patterns or trends in student behavior. To maximize the effectiveness and ethical use of AI tools like ChatGPT in school discipline practices, it is essential to provide targeted professional development for educators, school leaders, and policymakers [52,53]. Training should focus on building foundational knowledge about artificial intelligence, including how language models work, their limitations, and potential biases. Educators must understand how to critically assess AI-generated outputs, ensuring that such tools complement, rather than replace, human judgment and culturally responsive practices. Professional development should also address issues of data privacy, equitable implementation, and the importance of transparency when using AI in student-related decisions [54,55,56].
For policymakers, learning opportunities should emphasize the regulatory, ethical, and policy implications of AI integration in education, including establishing clear guidelines for data governance and accountability. Encouraging collaboration between educators, technologists, and researchers can also foster responsible innovation and help ensure that AI tools are used in ways that support equity, student well-being, and instructional improvement. By equipping all stakeholders with the knowledge and skills to apply AI responsibly, schools can better harness its potential while safeguarding against misuse.
Moreover, ChatGPT’s ability to generate personalized responses offers a unique opportunity for principals to engage with students involved in disciplinary incidents. When a referral is submitted, ChatGPT can initiate a conversation with the student, offering immediate guidance, empathy, and support. When ChatGPT was first introduced to the public, the Terms of Use required users to be at least 18 years old; in the spring of 2023, the minimum age was lowered to 13, with parental consent required for users under 18. This approach allows for timely intervention and the provision of resources or interventions tailored to the specific needs of the student. Studies [57] have demonstrated the effectiveness of AI-driven personalized interventions in reducing recurrent disciplinary incidents among students.
However, the implementation of ChatGPT by school principals also necessitates considerations of ethical and practical implications. Data privacy and confidentiality remain paramount concerns when utilizing AI tools in educational settings. Principals must ensure that sensitive information shared with ChatGPT adheres to data protection regulations and that conversations are secure and confidential. Moreover, proper training and guidelines for principals on the ethical use of AI are crucial to maintain accountability and transparency in decision-making processes. Additionally, ChatGPT can assist principals in analyzing disciplinary data over time, identifying recurring issues, and assessing the effectiveness of implemented interventions. By aggregating and analyzing referral data, the model can generate reports highlighting trends, behavioral patterns, and the impacts of interventions. This data-driven approach empowers principals to make informed decisions regarding discipline policies, targeted interventions, and resource allocation to address prevalent issues.
5. Conclusions
In conclusion, while AI holds promise in revolutionizing K-12 school discipline, its integration must be accompanied by rigorous ethical considerations and ongoing efforts to address algorithmic bias. The findings of this study underscore the need for further research and development to ensure that AI-driven systems in education are fair, equitable, and aligned with the principles of procedural justice and student well-being. Additionally, continued human oversight and intervention are essential to complement the capabilities of AI and ensure that disciplinary practices remain grounded in fairness, equity, and respect for all students, regardless of their racial identity. As posited [58,59], school administrators will benefit from coaching and mentoring practices that encourage effective instructional management strategies, as part of system-wide school reform. In this regard, principals and educators involved in school discipline decisions must recognize the complementary nature of AI tools in combination with their expertise and experience, using ChatGPT as a support mechanism rather than a sole decision-making authority.
The findings affirm a central tenet of current scholarship: that AI systems can reproduce or amplify existing racial and systemic biases if not carefully designed and monitored. This is consistent with foundational studies, such as Benjamin’s Race After Technology [60], which underscores how data-driven systems often reflect the inequities of the environments in which they are developed. Similarly, other scholars [61,62] argue that algorithmic tools can encode forms of structural discrimination, particularly against historically marginalized communities. The recommendation to complement AI with educator expertise reflects best practices recommended in the literature for AI integration in learning and administration [63].
These patterns reflect broader, systemic concerns about bias in school discipline that have long been documented in education research [64,65]. Foundational scholarship, such as Gloria Ladson-Billings’ work on culturally relevant pedagogy, emphasizes how educational systems often fail to affirm and respond to the cultural knowledge and lived experiences of racially minoritized students [66,67]. The selective application of cultural considerations by an AI model, as seen in the examples above, may unintentionally reinforce a deficit-oriented view whereby only non-White students are framed as requiring cultural or behavioral “adjustment,” while White students are framed as warranting individualized support or referrals. This dynamic mirrors a persistent inequity in schools, where students of color are often pathologized rather than supported holistically.
Furthermore, Derrick Bell’s theory of interest convergence within critical race theory suggests that systemic changes, such as efforts toward racial equity, are often tolerated only when they align with the interests of those in power [68,69]. Algorithmic decision-making systems like ChatGPT risk reproducing such structures if they are deployed without critical oversight, reinforcing existing hierarchies under the guise of neutrality. While AI tools are often perceived as objective, they are trained on data shaped by historical and institutional inequities. As a result, AI outputs may encode and perpetuate the very disciplinary disparities they are intended to help mitigate.
The inconsistent application of support, consequences, and cultural framing in these scenarios reveals the danger of relying on AI tools without embedding a deep understanding of systemic racism and its manifestations in schools. If AI recommendations guide principal decision-making in ways that unevenly distribute punishment or support across racial lines, they may entrench rather than disrupt the school-to-prison pipeline. This makes the integration of critical pedagogy and anti-racist frameworks—not just technical refinement—essential in the design and application of AI in education. The work of Ladson-Billings and Bell calls on educators and policymakers to look beyond surface-level fairness and ask deeper questions about whose interests are being served, whose behaviors are being scrutinized, and whose dignity is being preserved or denied in systems that adopt AI.
6. Limitations and Future Research
Despite its contributions, this study has several limitations that warrant consideration. First, the analysis was limited to a single AI model, ChatGPT 3.5, at a fixed point in time. As AI models are frequently updated, the findings may not reflect changes made in subsequent iterations, potentially limiting the generalizability of the results over time. Second, the study relied on a relatively small set of vignettes and a limited range of disciplinary scenarios. While efforts were made to capture a variety of referral types and student backgrounds, the sample may not encompass the full range of behaviors or contextual nuances encountered in real-world school settings. Additionally, the study examined only five racial/ethnic categories, which, while representative of major demographic groups in U.S. schools, may oversimplify the complexity of student identity and intersectionality (e.g., race, gender, disability status). Another limitation involves the inherent subjectivity in interpreting what constitutes a “biased” or “equitable” recommendation. Although expert evaluators were used to assess alignment with best practices, their judgments are shaped by personal, professional, and cultural perspectives.
Future research should expand the scope of analysis to include a broader range of AI models and disciplinary contexts, including those involving intersectional student identities. Longitudinal designs that examine how AI-generated recommendations evolve over time as the technology changes would also be valuable. Moreover, engaging a more diverse panel of evaluators—including school counselors, students, parents, and equity specialists—could enrich the interpretation of what constitutes fairness and appropriateness.