Article

Exploring the Use of AI to Optimize the Evaluation of a Faculty Training Program

by Alexandra Míguez-Souto 1,*, María Ángeles Gutiérrez García 2 and José Luis Martín-Núñez 3
1 UAM Doctoral School, Universidad Autónoma de Madrid (UAM), 28049 Madrid, Spain
2 Pedagogy University Department, Universidad Autónoma de Madrid (UAM), 28049 Madrid, Spain
3 Institute for Educational Sciences (ICE), Universidad Politécnica de Madrid (UPM), 28040 Madrid, Spain
* Author to whom correspondence should be addressed.
Educ. Sci. 2025, 15(10), 1394; https://doi.org/10.3390/educsci15101394
Submission received: 9 August 2025 / Revised: 3 October 2025 / Accepted: 10 October 2025 / Published: 17 October 2025
(This article belongs to the Topic AI Trends in Teacher and Student Training)

Abstract

This study examines the potential of the AI chatbot ChatGPT-4o to support human-centered tasks such as qualitative research analysis. It focuses on a case study involving an initial university teaching training program at the Universidad Politécnica de Madrid (UPM), evaluated through student feedback. The findings indicate that ChatGPT can assist in the qualitative analysis of student assessments by identifying specific issues and suggesting possible solutions. However, expert oversight remains necessary, as the tool lacks a full contextual understanding of the actions evaluated. The study concludes that AI systems like ChatGPT offer powerful means to complement complex human-centered tasks and anticipates their growing role in the evaluation of formative programs. By examining ChatGPT’s performance in this context, the study lays the groundwork for prototyping a customized automated system built on the insights gained here, capable of assessing program outcomes and supporting iterative improvements throughout each module, with the ultimate goal of enhancing the quality of the training program.

1. Introduction

The continuous enhancement of university teaching quality is crucial for fostering effective student learning (Gibbs & Coffey, 2004), particularly within demanding fields such as sciences and technologies. Initial teacher training programs play a critical role in equipping novice faculty members with the pedagogical skills and contextual understanding necessary to excel in higher education (De la Cruz, 2000). These programs aim to transform instructors into impactful educators, ultimately benefiting the learning outcomes of future faculty.
However, evaluating the effectiveness of such training programs, especially through rich qualitative data like student feedback, is a complex and human-intensive task. Traditional qualitative analysis, while insightful, can be time-consuming and resource-demanding, posing challenges for timely and iterative program improvements. This challenge underscores the need for innovative approaches to streamline and enhance the evaluation process for formative programs.
In recent years, Artificial Intelligence (AI) and large language models, such as ChatGPT, have demonstrated considerable potential to support complex, human-centered tasks, including qualitative data analysis (Hamilton et al., 2023). These AI tools can assist in identifying patterns, synthesizing information, and even proposing solutions from unstructured textual data. While their capabilities are rapidly expanding, their specific application and effectiveness in the nuanced domain of evaluating initial university teaching training programs remain an area ripe for focused investigation.
This study explores the potential of the AI chatbot ChatGPT to support the qualitative analysis of student feedback collected from an initial university teaching training program at the Universidad Politécnica de Madrid (UPM). By focusing on a real-world case study, our research seeks to understand how AI can assist the UPM faculty development team in identifying critical issues within a training program, analyzing their root causes, and proposing strategic solutions.
This study addresses a specific gap in current research: the use of AI to assist in the evaluation of training programs. While previous literature on AI in education has mainly focused on learning outcomes or student assessment, our work explores how AI, specifically the ChatGPT-4o model, can support the systematic evaluation of training programs based on student feedback. We propose a reproducible human–AI hybrid workflow and demonstrate its value through a longitudinal case study at the Universidad Politécnica de Madrid (2012–2024). The research contributes in three key areas: (i) it introduces a method for qualitative analysis using AI to support program evaluation; (ii) it applies this method to a large-scale, real-world dataset to justify the role of AI; (iii) it explores how effectively ChatGPT can support the work of faculty development teams, laying the groundwork for a future custom GPT system that assists in real-time analysis and strategic feedback.
The insights gained aim to provide valuable recommendations for optimizing faculty development. We argue that the careful integration of AI systems can enhance complex, human-centered processes and play a growing role in the ongoing improvement of formative programs.
This article proceeds by first detailing the context of faculty training in Spain and program evaluation models, followed by a discussion on the role of AI in educational contexts. Subsequently, it outlines the methodological approach, presents the findings, and concludes with a discussion of the implications, limitations, and future research directions.

1.1. Faculty Training in Spain

The 21st century has brought profound changes that directly impact higher education. Digitalization, globalization, and the growing need for lifelong learning have created new demands for university faculty. Universities are now expected not only to train highly qualified professionals but also to ensure that their faculty can deliver inclusive, high-quality, and future-ready teaching. This evolving context requires a profound rethinking of faculty development, aligning it with the demands of the knowledge society and the principles of professional teaching practice.
Despite policy advances, faculty development programs in Spanish universities remain fragmented and uneven. According to Pérez-Rodríguez (2019), most institutions continue to offer short, targeted courses, while comprehensive and sustained training pathways are still scarce. Montes and Suárez (2016) point out that such initiatives are often voluntary and poorly connected to actual teaching practice, limiting their transformative potential. Haras (2018) argues for an authentic approach to professional development based on situated work, peer mentoring, and communities of practice that integrate learning into real educational contexts.
A persistent gap remains between faculty training and its practical application in the classroom. Feixas et al. (2015) and Haras (2018) emphasize that many programs fall short in generating meaningful pedagogical change. In particular, Feixas et al. (2015) review the limited empirical evidence available in Spain regarding the effectiveness and actual transfer of faculty development programs into daily teaching. As a result, even well-intentioned initiatives often fail to produce sustainable improvements. Benavides and López (2020) underscore the need to strengthen impact evaluation as a core component of faculty development.
Recent literature has advanced the conceptualization of teaching competencies. Cristi-González et al. (2023) propose a comprehensive framework organized into seven cross-cutting axes and four functional areas, enabling universities to tailor training interventions to specific priorities. In the field of science education, Porlán et al. (2024b) suggest an integrated model that links theory, practice, and pedagogical research. Buils et al. (2024) highlight the increasing relevance of digital competencies, such as networked collaboration, digital communication, and professional engagement. These contributions converge on a shared premise: faculty development should not merely inform but empower teaching transformation.
Initial teacher training has assumed a central role in higher education reforms in Spain and across Europe. Traditionally, teaching in universities was subordinated to disciplinary expertise, with little or no formal pedagogical preparation (Tiana, 2013). However, recent studies (Malagón et al., 2025) emphasize the need to equip university educators with explicit teaching competencies from the start of their careers. This shift reflects the growing recognition of university teaching as a profession, requiring structured, research-based preparation.
At the European level, the European Higher Education Area (EHEA), driven by the Bologna Process, has contributed to this paradigm shift. The Bologna Process Implementation Report recognizes the pedagogical training of novice faculty as a strategic priority and promotes student-centered learning and the recognition of teaching as an essential function (European Commission/EACEA/Eurydice, 2020). A key milestone in Spain has been the approval of Organic Law 2/2023 of the University System (LOSU), which mandates the completion of initial pedagogical training during the first year of employment for Assistant Professors (Gobierno de España, 2023). Although this legal measure establishes a formal requirement, research indicates that implementation remains insufficient and lacks coherence across institutions (Malagón et al., 2025; Hinojo et al., 2020).
Teacher training centers and Institutes for Educational Sciences (ICE) play a strategic role in articulating institutional needs with professional development initiatives. As intermediaries, they ensure that training is aligned with evolving academic demands and quality standards. Strengthening these structures is essential to the sustainability of faculty development and to ensure its institutionalization (Porlán et al., 2024a).
In this context, the work of Paricio et al. (2019) and Hinojo et al. (2020) highlights the increasing complexity of faculty roles, which must integrate teaching, research, and academic management. Addressing this complexity requires overcoming reductive views of university teaching and adopting academic approaches rooted in educational research and continuous improvement.
Initial teacher training is a foundational element for improving the quality of higher education. While legislative and institutional progress has been made, much remains to be done to develop structured, coherent, and sustainable training policies. These policies must bridge theory and practice, foster professional learning communities, and recognize university teaching as a specialized field of knowledge and practice. Only under these conditions can higher education institutions ensure meaningful student learning and sustained faculty development from the earliest stages of academic careers.

1.2. Evaluating Training Programs in Higher Education

The systematic evaluation of faculty development programs is essential for ensuring their relevance, quality, and contribution to institutional improvement. These evaluations help identify strengths and areas for improvement, ensure alignment with the evolving needs of academic staff, and support timely updates that keep teaching practices effective and current. In Spain, as Rué et al. (2013) note, higher education has traditionally prioritized research output and student performance indicators, often overlooking how teaching actually happens and how it can be improved. Moving beyond this limited view requires institutional frameworks that support skill-based training and incorporate evaluation from the start.
Recent models propose more structured and participatory approaches. Losada and Moreno (2024) developed a validated tool to assess program design using five key dimensions: contextual fit, content, technical quality, evaluability and feasibility. Similarly, García et al. (2020) outline eight broader criteria used in institutional evaluation, ranging from teaching and student engagement to infrastructure and social responsibility, emphasizing the need to rethink rigid evaluation models and adopt more dynamic and student-centered approaches. These developments also open the door to integrating technologies such as AI to support data analysis on a large scale. Ochoa Oliva (2022) reinforces this perspective, calling for flexible, tech-enabled systems that respond to real institutional needs.
However, most training programs still rely heavily on satisfaction surveys and do not track whether participants apply what they have learned. Fernández (2008) addresses this gap with a comprehensive evaluation model structured around four key dimensions—planning, implementation, satisfaction, and impact. Her proposal emphasizes the importance of evaluating programs across the full lifecycle of training, from initial design to long-term effects on teaching performance. In a similar direction, Tejada and Ferrández (2012) emphasize two key dimensions that should guide evaluation: the effectiveness of the learning experience and the applicability of acquired skills to teaching practice. Effective programs align with institutional goals, promote active participation, offer timely feedback, and create conditions that support the practical transfer of learning.
Involving faculty and students in the evaluation process is crucial. Tello Díaz-Maroto (2010) highlights that participant feedback, particularly from students, should be central in assessing satisfaction, learning, and transfer. She proposes a systemic and continuous cycle that integrates design, implementation, evaluation, and modification phases. This model encourages identifying areas for improvement not only after a course concludes, but also before it starts (through careful planning and relevance checks) and during its delivery (by monitoring participation, adjusting content, and collecting real-time feedback). Supporting this innovative view, Parra and Ruiz (2020) highlight that feedback from faculty offers key insights into how programs are received and applied. Rather than being limited to accountability, evaluation should serve as a tool for internal learning and redesign.
Institutional self-evaluation has also gained traction as a valuable strategy. Freire and Intriago (2025) present self-assessment plans as tools to identify weaknesses and drive professional development, taking this further with a participatory model based on surveys, focus groups, SWOT matrices, and quality indicators. Their holistic and participatory approach promotes collaborative reflection and emphasizes quality as an ongoing process involving pedagogical, organizational, and social dimensions.
Finally, several institutions have begun embedding evaluation into faculty development programs with measurable impact. Ruiz-Cabezas et al. (2022) report improvements across core teaching competencies when training is well-structured and strategically supported. These findings confirm the importance of evaluation not only for judging outcomes but for sustaining a culture of continuous improvement.

1.3. The Use of ChatGPT to Support Qualitative Analysis: Potential, Limitations and Hybrid Approaches

ChatGPT, developed by OpenAI and released in November 2022 (Hamilton et al., 2023), is a generative AI model built on the GPT Transformer architecture. It was trained on large volumes of internet text and fine-tuned using reinforcement learning from human feedback (OpenAI, n.d.). As a natural language processing (NLP) system, it generates human-like text word by word while maintaining coherence across conversational turns. In addition to its general-purpose capabilities, OpenAI offers customizable versions, known as custom GPTs, that can be tailored for specific tasks using prompt engineering, integrated knowledge sources, and domain-specific constraints (Ogundoyin et al., 2025).
This capacity for customization has opened the door to academic applications. Kabir et al. (2025), for example, developed domain-specific GPTs designed for scientific writing. Their models, Neurosurgical Research Paper Writer and Medi Research Assistant, were configured to improve the structure, accuracy, and reliability of research writing. These findings demonstrate that careful calibration reduces hallucinations and enhances the quality of academic outputs. In the context of qualitative analysis, this suggests that GPT models can be configured not only to support writing but also to extract, summarize, and structure complex textual data, provided that the process is monitored critically.
Building on this potential, researchers have started evaluating ChatGPT’s performance in actual qualitative research contexts. Hamilton et al. (2023) compared AI-assisted analysis with human-led coding of interview data from a guaranteed-admission program. ChatGPT was able to identify themes that broadly aligned with those detected by human analysts, while also proposing novel groupings and patterns. Its ability to handle large datasets and produce coherent drafts highlights its usefulness in the early stages of analysis.
However, the benefits come with important caveats. Multiple studies caution that ChatGPT lacks interpretive depth and contextual awareness. For instance, Morgan (2023) emphasizes that while the model can detect patterns, it struggles with the nuance required in reflexive thematic analysis. Moreover, issues such as factual inaccuracies, citation errors, and inconsistency across runs have been well-documented (Klyshbekova & Abbott, 2024; Dengel et al., 2023). Its effectiveness also varies by domain, with better results reported in scientific contexts than in more abstract or human-centered fields like education (Hunkoog & Minsu, 2024).
These limitations are not just technical; they are epistemological. As D’Oria (2023) points out, language models do not generate new insights but reorganize existing knowledge based on statistical regularities. This makes them unsuitable as stand-alone tools in contexts that require interpretation, synthesis, and critical engagement. Instead, scholars recommend integrating AI within hybrid frameworks where it complements, rather than replaces, human reasoning (Parker et al., 2023).
Indeed, several authors stress the importance of maintaining human oversight when deploying ChatGPT in research workflows. Goyanes and Lopezosa (2024), for example, argue that while large language models can assist with systematic reviews, interviews, and content analysis, their output must be critically validated. Similarly, Morgan (2023) limits their role to exploration tasks within ethical research frameworks. Together, these studies converge on a key point: AI can extend human capabilities but must remain subordinate to expert judgment.
This human-in-the-loop approach is particularly relevant in education, where the stakes of misinterpretation are high. Recent research shows that AI tools like ChatGPT are reshaping how educators approach feedback, assessment, and learning design. Studies by Saltos-García et al. (2024) and Guijarro (2024) highlight the efficiency gains in handling large volumes of student feedback, while others, such as Chaves and Saborío-Taylor (2025) and Reyes-Zúñiga et al. (2024), emphasize improvements in student engagement, reflection, and satisfaction.
Expanding on this, Oshanova et al. (2025) explored ChatGPT’s ability to provide formative feedback on student writing. Their study revealed that the model not only identified structural and argumentative weaknesses but also generated constructive, personalized recommendations. Students rated the feedback as clear, helpful, and timely, especially in large courses where instructors faced time constraints.
The growing use of NLP techniques in education reinforces these findings. According to Shaik et al. (2022), automated analysis of student comments allows institutions to uncover patterns, emotions, and suggestions that are often missed in manual reviews. Techniques such as sentiment classification, topic modeling, and semantic annotation help educators make data-informed decisions and personalize the learning experience. Notably, these tools are evolving to detect subtleties like sarcasm or ambiguity, increasing their interpretive power.
Liu et al. (2016) offer a complementary perspective with their DEI-Text Mining model, which combined emotion recognition and topic mining to analyze feedback in MOOCs. By identifying emotion–theme pairs, the system revealed strengths like instructor expressiveness, as well as weaknesses such as limited interaction or fast pacing. These insights directly informed course improvements, demonstrating how AI can support pedagogical refinement.
At the institutional level, DiSabito et al. (2025) propose integrating ChatGPT into evaluation systems to reduce administrative workload and support decision-making. Their model allowed instructors to focus more on interpreting results and planning pedagogical improvements. However, they also flagged risks such as implementation bias and detachment from students’ actual performance, reinforcing the need for human oversight.
In a closely related effort, Borakati (2021) applied machine learning and NLP to evaluate a medical e-learning course. His analysis of large-scale qualitative and quantitative feedback produced actionable recommendations for course design. While the domain differs from the present study, methodological alignment makes this work an important precedent.
Additional research from fields such as linguistics, health, and social sciences shows similar trends. Studies by Sbalchiero and Eder (2020), Koltai et al. (2021), and Natukunda and Muchene (2023) used text mining and topic modeling to analyze massive datasets and identify meaningful patterns. Others, like Rosalind and Suguna (2022), have taken a predictive turn, using AI to anticipate student satisfaction through behavioral data. These developments reflect a growing consensus: AI can augment analysis at scale, but its output must be interpreted through human expertise.
Following this hybrid model, our own study involved a team of teacher educators working alongside ChatGPT to analyze qualitative feedback from participants in professional development courses. The AI helped organize and explore the data, identifying recurring patterns and structuring initial insights. However, all interpretations and conclusions were ultimately refined and validated by human analysts.
Finally, ChatGPT can be situated within a broader trajectory of automated qualitative research tools. Earlier techniques, such as topic modeling, grouped words into themes using unsupervised algorithms (Borakati, 2021). While useful for extracting sentiment and dominant topics, these methods offered limited contextual understanding. ChatGPT represents a leap forward in that it combines textual generation with contextual coherence—but it does not eliminate the need for careful, critical interpretation. Especially in education and other socially sensitive domains, responsible use requires ethical vigilance, transparency, and human judgment at every stage.
Although artificial intelligence has been increasingly applied to educational contexts, most existing studies focus on predictive analytics, for example, forecasting student performance or satisfaction (Rosalind & Suguna, 2022). Far less research has explored the retrospective use of AI, particularly large language models such as ChatGPT, to evaluate and improve a faculty training program. This gap is especially evident in initial teacher training within higher education, where program evaluation still relies heavily on manual coding and interpretation of open-ended student feedback. By targeting this underexplored area, our study responds to the need for evidence-based, scalable methods that combine the strengths of human expertise with AI-driven analysis.
This study is relevant because it extends existing methodological approaches in two important ways. First, while previous research has shown that AI can handle large datasets, classify sentiment, or identify patterns in student comments (Shaik et al., 2022; Liu et al., 2016), it has not yet been systematically applied to the evaluation of teacher training programs. Second, our approach goes beyond descriptive analytics by integrating ChatGPT into a hybrid human–AI process designed to generate actionable recommendations for program improvement. This novelty positions our work at the intersection of educational evaluation, AI application, and faculty development.
To address this gap, we pose the following research questions: (i) How can ChatGPT-assisted analysis of student feedback enhance the identification of critical issues in a university’s initial teacher training program? (ii) Can a hybrid human–AI approach provide deeper insights into the causes of these issues and generate more targeted recommendations than traditional manual analysis alone?
Our working hypothesis is that a hybrid model combining ChatGPT and human expertise will offer added value over traditional methods by enabling faster detection of patterns, more systematic organization of data, and more consistent formulation of actionable recommendations.
This research focuses on the Initial Teacher Training program delivered by the Institute for Educational Sciences (ICE) at the Universidad Politécnica de Madrid (UPM), one of Spain’s top technical universities. UPM has a long-standing reputation in engineering and applied sciences, with a faculty primarily composed of specialists from technical fields. Ranked among Spain’s top polytechnic universities, UPM remains at the forefront of technological teaching and research (QS Top Universities, 2025). In this context, initial teacher training must be tailored not only to general pedagogical principles, but also to the specific demands of an innovation-driven, engineering-oriented academic environment.
The ICE program is both consolidated and nationally recognized. Sánchez Núñez (2007) conceptualized it as a comprehensive, reflective process that combines a pedagogical course, a supervised practicum, and a final innovation project, all supported by institutional structures and grounded in critical reflection. More than a decade later, Martín et al. (2018) conducted a large-scale empirical study with 198 participants who completed the program. Their findings confirmed the program’s strong value for faculty: 77% enrolled with the aim of improving their teaching competencies, particularly appreciating tools for planning, assessment, and active learning. Furthermore, over 96% of participants stated they would recommend the training, citing increased confidence in innovating in the classroom and the opportunity to share experiences with colleagues from different disciplines. The authors also identified areas for future improvement, such as enhancing post-training follow-up and providing stronger institutional recognition for pedagogical development efforts.
Building on this strong foundation, the ICE faculty development team launched a new phase of program review, driven by the need to keep pace with the shifting demands and expectations of a new generation of novice instructors at UPM. To support this renewal process, an in-depth and systematic analysis of student feedback was deemed essential. This study was designed to meet that need, using a hybrid human–AI approach to analyze qualitative perceptions and generate evidence-based insights that could inform strategic improvements and ensure the program remains aligned with UPM’s educational mission.
The general aim of this study is to evaluate and optimize the ICE initial teacher training program through a hybrid human–AI process using ChatGPT, enabling the identification of critical issues, understanding their root causes, and formulating strategic, evidence-based recommendations for continuous improvement.
Building on this overarching aim, the study pursues the following specific objectives:
  • To analyze survey responses evaluating the initial teacher training program offered by ICE, with the aim of identifying critical areas and recurring issues, assisted by ChatGPT.
  • To explore the root causes of the identified critical aspects through thematic grouping and visual organization into a cause-and-effect diagram, assisted by ChatGPT.
  • To propose strategic solutions and concrete actions by compiling a table that links each identified issue with potential interventions, assisted by ChatGPT.
  • To formulate student-based recommendations to address the program’s deficiencies, serving as a basis for discussion by the faculty development team, assisted by ChatGPT.
By filling the identified gap in the literature, this study contributes to both research and practice. It offers a replicable model for using ChatGPT in faculty training evaluation, demonstrates the feasibility of hybrid human–AI approaches in higher education program assessment, and provides empirical evidence on how AI can support continuous improvement in teacher training programs. These contributions address a critical need in educational research for scalable, transparent, and ethically supervised methods that go beyond predictive analytics to actively inform decision-making and program design. By examining ChatGPT’s performance in this context, the study also lays the groundwork for prototyping a customized automated system, built on the insights gained here, capable of assessing program outcomes and supporting iterative improvements throughout each module, with the ultimate goal of enhancing the quality of the training program.

2. Materials and Methods

2.1. Study Context and Methodological Framework

This study examines the Initial Teacher Training program offered by the Institute for Educational Sciences (ICE) at the Universidad Politécnica de Madrid (UPM), a 15-ECTS course primarily designed for novice faculty beginning their university teaching careers, although experienced instructors seeking to update their teaching skills are also encouraged to participate (Instituto de Ciencias de la Educación–UPM, 2025). The program follows a blended-learning model, combining weekly four-hour in-person sessions with complementary online assignments. Its curriculum is structured around eight autonomous 50 h modules delivered from November to June, covering topics such as teaching planning (PLAN), instructional methods (METs), learning technologies (TECs), student assessment (EVA), academic guidance (TUT), university organization and psychology (PSICO), educational innovation (INNOVA), and classroom practice and communication techniques (PRACDOs).
An optional 5 ECTS practicum allows participants to apply their learning in real academic settings, supported by a senior subject-matter tutor and an ICE mentor. Throughout the program, participants develop key competencies such as competency-based planning, instructional design, digital tool integration, learning assessment, tutorial support, application of psychological theories to teaching, and the implementation of pedagogical innovations. These are complemented by transversal skills such as teamwork, institutional awareness, and a student-centered mindset, all oriented towards the professionalization of university teaching and fostering a reflective teaching culture.
The present study emerged from the ICE faculty development team’s strategic interest in reviewing and improving the program’s design. While previous studies (e.g., Martín et al., 2018) have confirmed the program’s strengths and positive impact, the team sought a fresh and more systematic perspective that would support the program’s continuous improvement, considering emerging challenges and changing faculty needs. Two of the three authors of this article are directly involved in the design and delivery of the program, which poses a challenge in terms of evaluative objectivity. To mitigate this insider bias and introduce a degree of analytical distance, we adopted a qualitative research approach assisted by artificial intelligence, specifically, ChatGPT.
Qualitative analysis was chosen because it enables an in-depth exploration of how participants perceive, interpret, and experience the program (Moustakas, 1994). Our data consists of open-ended responses collected through the ICE’s final course survey, which gathers feedback from participants upon completion of the training. Given the volume of qualitative data and the need for efficient yet nuanced analysis, we opted to integrate ChatGPT as a virtual assistant in a hybrid process combining machine-assisted processing with expert human oversight.
This decision is grounded in two rationales. First, leveraging a customized GPT system allows for faster, more scalable analysis of open-ended responses, with the potential to detect patterns, group themes, and propose recommendations in a more structured manner. Second, and more importantly, ChatGPT introduces a form of analytical neutrality: by generating summaries and interpretations based solely on textual input, and without prior exposure to the institutional context (OpenAI, 2025a), it offers a perspective less affected by the implicit subjectivity of trainers evaluating their own program.
Furthermore, we chose to prioritize student feedback because the participants, novice university lecturers already engaged in teaching, are in a strong position to critically reflect on the relevance and applicability of the training. Their insights, grounded in early teaching practice, are not only fresh but also context-aware, making them a valuable source of information for evidence-based program improvement.
In this process, ChatGPT is not used to replace human judgment, but to assist in data synthesis and interpretation. The ICE faculty development team, acting as domain experts, oversees each step of the analysis, reviews AI-generated outputs, and validates or adjusts findings based on pedagogical expertise. The central research question guiding the study is: What specific actions can be taken to enhance the ICE initial training program, based on student feedback and AI-assisted analysis? The ultimate goal is to inform a redesign of the program’s core components that better meet the evolving needs of early-career university faculty, while aligning with broader institutional and regulatory expectations.

2.2. Participants and Sampling

The participants in this study were university faculty enrolled in the Initial Teacher Training program at the Universidad Politécnica de Madrid (UPM) between the academic years 2011–2012 and 2023–2024. Over this 13-year period, the program was delivered annually, with each cohort averaging approximately 25 participants. Enrolled instructors included early-career teaching staff such as PhD candidates, teaching assistants, and assistant professors. Cohort sizes varied slightly from year to year, largely due to participant attrition caused by scheduling conflicts, particularly among those engaged in international research activities during their doctoral training.
All participants were invited to complete a final online survey (see Appendix A) at the end of the course. Participation in the survey was voluntary, and responses were anonymized. In this study, only open-ended responses were analyzed. After screening for completeness and internal coherence, the final dataset comprised three categories of qualitative feedback: 211 comments describing the most valued aspects of the program, 195 comments identifying the least appreciated features, and 190 suggestions for program improvement.
These responses, expressed in participants’ own words, reflect personal teaching experiences and professional expectations at early stages of academic careers. They form the empirical basis for the qualitative analysis carried out using a hybrid human–AI approach.

2.3. Data Collection

Data were collected from 13 editions of the Initial Training Program for University Teaching at UPM, spanning the academic years 2011–2012 to 2023–2024. At the conclusion of each course edition, participants were invited to complete an anonymous and voluntary online evaluation survey (see Appendix A), accessible only to those enrolled in the program. The survey was conducted during the final face-to-face session and required approximately 15 min to complete.
The questionnaire gathered both quantitative and qualitative data regarding participants’ perceptions of the course content, structure, materials, and perceived gains in teaching competence. It included a combination of Likert-scale items and open-ended questions. Participation was incentivized and each participant was allowed to submit only one response.
For the purposes of this study, we focused exclusively on the qualitative data derived from open-ended questions 8, 9, and 10 (see Table 1). These questions asked participants to describe: (1) the most valued aspects of the course, (2) the least appreciated elements, and (3) their specific suggestions for improvement. These responses were expressed in the participants’ own words and provided rich insights into the course’s perceived strengths, limitations, and potential areas for enhancement.
All responses were submitted in Spanish, the language of instruction of the program. No revisions were allowed after submission, and only complete, coherent responses were included in the final dataset for analysis. The questionnaire remained unchanged throughout the entire 13-year period of data collection, ensuring consistency across cohorts and facilitating longitudinal comparability of participant feedback.

2.4. Data Analysis

We conducted the qualitative data analysis using a hybrid human–AI workflow in seven structured phases (see Figure 1). The process focused on the first research objective: identifying critical aspects of the ICE training program and generating student-informed suggestions for its improvement. To carry out this analysis, we used ChatGPT-4o (GPT-4 Omni), OpenAI’s multimodal language model launched in May 2024. Although the model can process text, voice, and image inputs, our use was limited to textual processing (OpenAI, 2025b).
The role of ChatGPT was assistive rather than decisional, supporting the coding, clustering, and interpretation of free-text responses. All outputs were reviewed, validated, or corrected by two subject-matter experts from the ICE faculty development team. This approach allowed us to combine the pattern recognition speed of AI with the contextual judgment of experienced educators, while mitigating potential model biases or hallucinations.
We used standardized prompts to guide the model in performing content clustering, thematic synthesis, causal mapping, and action-oriented reformulation. These prompts were iteratively refined through expert testing and are documented in Figure A1 in Appendix B. The analysis was applied to the answers to three open-ended questions (see Table 1), which collected participants’ perceptions about the most valued elements of the course, its shortcomings, and ideas for improvement.
The step-by-step process described in Figure 1 serves as the framework for the development and programming of our own pilot chatbot, designed to systematize the evaluation of our training program. The seven phases of the analysis were as follows:

2.4.1. Step 1. Data Preprocessing

Faculty development team: data review and cleaning. As a preliminary step, the faculty development team manually reviewed all responses to survey questions 8, 9, and 10, which focused on the course’s most valued aspects, least valued aspects, and suggestions for improvement. Comments from all editions were examined, and only relevant and coherent responses were retained. Personal references were removed to ensure anonymity, and responses were tagged by course module. Experts also removed “stopwords” and punctuation to clean the text before providing it to ChatGPT. For this first stage, ChatGPT-4o received only the cleaned negative-comment dataset for initial processing.
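For readers who wish to reproduce this cleaning step programmatically, a minimal Python sketch is shown below. It assumes the survey responses have been exported to a tabular file; the file name, column names, and the abbreviated Spanish stopword list are illustrative assumptions, not the team’s actual materials.

```python
import re
import pandas as pd

# Illustrative (incomplete) Spanish stopword list; the team's actual list is not documented.
SPANISH_STOPWORDS = {"el", "la", "los", "las", "de", "del", "que", "y", "en", "un", "una", "es", "se"}

def clean_comment(text: str) -> str:
    """Lowercase a comment, strip punctuation, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation; \w keeps accented Spanish letters
    return " ".join(t for t in text.split() if t not in SPANISH_STOPWORDS)

# Hypothetical export of the survey responses, one row per open-ended answer,
# tagged by cohort, module, and question number during the manual review.
df = pd.read_csv("ice_survey_responses.csv")          # columns: cohort, module, question, comment
df = df.dropna(subset=["comment"])                    # keep only non-empty, coherent answers
df["clean_comment"] = df["comment"].apply(clean_comment)

# Step 1 passes only the cleaned negative comments (question 9) to ChatGPT-4o.
negative = df[df["question"] == 9]
negative.to_csv("negative_comments_clean.csv", index=False)
```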
To capture thematic variation over time, the team analyzed responses separately for each of the 13 cohorts (from 2011 to 2012, to 2023 to 2024). The resulting categorizations were logged in a master file, noting the frequency of each theme across editions. This allowed us to track emerging patterns as well as inconsistencies or rare mentions. The review of AI-generated outputs helped surface three broad recurring themes: program organization and logistics, content and structure and workload and assignments.
Additionally, isolated or inconsistently labeled categories were examined in context. For example, early mentions of collaboration were initially classified under “organization”, but after further review, they were grouped into a new category called ‘peer collaboration’, improving thematic accuracy.
ChatGPT-4o: Preliminary thematic categorization. ChatGPT-4o was prompted (see Figure A1 in Appendix B) to act as a qualitative researcher and identify recurring negative themes from the cleaned dataset. The prompt instructed the model to analyze student feedback and suggest an initial set of thematic categories. For each cohort, the model generated a list of categories with brief descriptions. These initial codes were used to construct a first-layer map of student concerns.
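The prompts actually used are reproduced in Figure A1 (Appendix B). As an illustration only, the same Step 1 request could be issued through the OpenAI Python SDK roughly as follows; the prompt wording, temperature setting, and function name are our own assumptions rather than the team’s configuration.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "Act as a qualitative researcher. Analyze the following negative comments from students "
    "of a university teacher training course and propose an initial set of thematic categories. "
    "Give each category a short label and a one-sentence description."
)

def propose_categories(comments: list[str], cohort: str) -> str:
    """Step 1: ask the model for a preliminary list of negative themes for one cohort."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,  # limit run-to-run variability in the proposed categories
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Cohort {cohort}:\n" + "\n".join(f"- {c}" for c in comments)},
        ],
    )
    return response.choices[0].message.content
```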

2.4.2. Step 2. Theme Validation

Faculty development team: thematic validation and refinement. The team examined the preliminary categories generated by ChatGPT for all 13 cohorts, checking for semantic coherence, conceptual accuracy, and relevance to the training program context. They compared outputs across editions to assess consistency and detect overlap or ambiguity. In several cases, category labels were modified for clarity, grouped under broader categories, or split into subthemes. Discrepant outputs were discussed and resolved through expert consensus. A master list of refined, cross-cohort thematic categories was created, serving as the validated framework for subsequent analysis phases.
ChatGPT-4o: Classification. ChatGPT was prompted to act as a qualitative researcher and assign each comment to one of the four expert-validated categories. The same focused prompt (see Figure A1 in Appendix B) was used for each cohort to ensure consistent classification across all 13 editions.
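A hedged sketch of how this classification step could be scripted so that every comment receives exactly one of the four validated labels is given below; the JSON response format and prompt phrasing are illustrative assumptions, not the exact prompt from Figure A1.

```python
import json
from openai import OpenAI

client = OpenAI()

VALIDATED_CATEGORIES = [
    "Organizational aspects of in-person attendance",
    "Content and structure",
    "Workload and assignments",
    "Peer collaboration",
]

def classify_comments(comments: list[str]) -> list[dict]:
    """Step 2: assign each negative comment to exactly one expert-validated category."""
    system = (
        "Act as a qualitative researcher. Classify each comment into exactly one of these categories: "
        + "; ".join(VALIDATED_CATEGORIES)
        + '. Return JSON of the form {"labels": [{"comment": "...", "category": "..."}]}.'
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        response_format={"type": "json_object"},  # constrain the reply to parseable JSON
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": "\n".join(f"- {c}" for c in comments)},
        ],
    )
    return json.loads(response.choices[0].message.content)["labels"]
```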

2.4.3. Step 3. Problem Definition

Faculty development team: data review and synthesis per theme. Before engaging ChatGPT, the team reviewed and condensed all student comments in each of the four validated categories. This involved removing duplicates, clarifying phrasing when needed, and selecting the most representative statements for each theme. The goal was to ensure ChatGPT received a clean, focused input for identifying the core issue in each area.
The output from this step laid the groundwork for defining the four critical problems underpinning the training program’s shortcomings.
ChatGPT-4o: Problem summarization by theme. ChatGPT-4o was instructed to formulate a core problem statement for each thematic area, based on the curated comments. Each input consisted of all relevant critical feedback grouped by theme. This process (prompt in Figure A1 in Appendix B) was repeated for all four themes and resulted in the formulation of a general problem statement for each of the four thematic areas: (1) program organization and logistics, (2) content and structure, (3) workload and assignments, and (4) peer collaboration.
The output consisted of one concise diagnostic statement per theme, to be used in the root cause analysis phase.
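For Step 3, the per-theme summarization can be expressed as a small helper around the same API call; the wording below is a sketch of the request described above, not the exact prompt documented in Figure A1.

```python
from openai import OpenAI

client = OpenAI()

def summarize_problem(theme: str, curated_comments: list[str]) -> str:
    """Step 3: condense the curated critical feedback for one theme into a single problem statement."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[
            {"role": "system", "content": "Act as a qualitative researcher."},
            {
                "role": "user",
                "content": (
                    f"Theme: {theme}\n"
                    "Summarize the core issue raised in the comments below as one concise, diagnostic "
                    "problem statement.\n" + "\n".join(f"- {c}" for c in curated_comments)
                ),
            },
        ],
    )
    return response.choices[0].message.content.strip()
```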

2.4.4. Step 4. Subcategorization of Main Problems

Faculty development team: prompt design for causal exploration. For each of the four core problems identified in the previous step, the team crafted a focused prompt asking ChatGPT-4o to organize relevant issues into subcategories (see Figure A1 in Appendix B). These thematic subdivisions would help uncover the underlying causes behind each critical problem.
ChatGPT-4o: Thematic subcategorization for each problem. ChatGPT received grouped concerns related to each critical theme and was instructed to classify them into subcategories to support a cause-and-effect analysis. The model generated clear thematic groupings for each problem area. These subcategories represented intermediate-level explanations for each issue and served as a conceptual scaffold for the causal diagrams built in step 5.
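A possible scripted version of this Step 4 request is sketched below, returning the subcategories as structured JSON so they can later serve as the branches of the fishbone diagram; the schema and prompt text are our own illustrative choices.

```python
import json
from openai import OpenAI

client = OpenAI()

def subcategorize(problem_statement: str, comments: list[str]) -> dict[str, list[str]]:
    """Step 4: group the concerns behind one core problem into named subcategories."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "Act as a qualitative researcher. Organize the comments into subcategories that "
                    "explain the stated problem. Return JSON mapping each subcategory label to a list "
                    "of representative comments."
                ),
            },
            {
                "role": "user",
                "content": f"Problem: {problem_statement}\nComments:\n" + "\n".join(f"- {c}" for c in comments),
            },
        ],
    )
    return json.loads(response.choices[0].message.content)
```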

2.4.5. Step 5. Visual Mapping

Faculty development team: prompt design for diagram generation. The team selected the Ishikawa (fishbone) diagram as a tool for visualizing the root causes behind each major problem. To support this, they provided ChatGPT with the subcategories generated in Step 4 and a curated set of student suggestions for improvement, which served as a source for identifying likely causes. Although the fishbone format can become detailed in complex scenarios (Burgasí Delgado et al., 2021), it remains a valuable tool for guiding targeted improvements. The goal was to break down each complex problem into its main contributing factors and their underlying causes, in order to guide targeted, data-driven interventions.
ChatGPT-4o: Ishikawa diagram generation. ChatGPT was prompted (see Figure A1 in Appendix B) to act as an analyst and build a cause-and-effect (Ishikawa) diagram for each of the four critical themes. It used the subcategories as main branches, and for each, listed three specific causes extracted from student feedback and improvement suggestions.
The result was a set of detailed and structured diagrams that clearly linked student perceptions to root causes across all four problem areas.
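Since the diagrams in Appendix C were refined and drawn by the expert team, the sketch below only illustrates how the Step 5 content (three causes per subcategory branch) might be requested and rendered as a plain-text outline for expert review; the structure, labels, and prompt are assumptions.

```python
import json
from openai import OpenAI

client = OpenAI()

def build_fishbone(problem: str, branches: list[str], suggestions: list[str]) -> dict[str, list[str]]:
    """Step 5: ask the model for three causes per branch, drawing on student improvement suggestions."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "Act as an analyst building a cause-and-effect (Ishikawa) diagram. For each branch, "
                    "list exactly three specific causes drawn from the suggestions. Return JSON mapping "
                    "each branch to its list of causes."
                ),
            },
            {
                "role": "user",
                "content": (
                    f"Problem (fish head): {problem}\n"
                    f"Branches: {', '.join(branches)}\n"
                    "Student suggestions:\n" + "\n".join(f"- {s}" for s in suggestions)
                ),
            },
        ],
    )
    return json.loads(response.choices[0].message.content)

def print_fishbone(problem: str, diagram: dict[str, list[str]]) -> None:
    """Render the diagram content as an indented outline for the faculty development team to refine."""
    print(f"Problem: {problem}")
    for branch, causes in diagram.items():
        print(f"  Branch: {branch}")
        for cause in causes:
            print(f"    - {cause}")
```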

2.4.6. Step 6. Solution Design

Faculty development team: prompt design and data contextualization for action planning. The team compiled all outputs from the Ishikawa diagrams and additionally supplied ChatGPT with a list of positive aspects of the program, as reported by students. These elements were introduced to ground the solutions in existing strengths and ensure that proposals aligned with the program’s pedagogical identity.
ChatGPT-4o: Generation of structured improvement plans. Using the root causes identified in Step 5 and the program’s positive feedback, ChatGPT was asked to generate a three-column table matching each identified problem with potential solutions and with concrete actions for implementation.
The model produced a clear and structured improvement table, designed to support the faculty development team’s decision-making process.
The team also exported the AI-generated table into Excel to enable better visual organization, cross-comparison, and expert review. This process was repeated independently for each of the four main problem areas.
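The export to Excel mentioned above can be reproduced with pandas; in the sketch below, the column names mirror the three-column structure described in the text, while the example row, file name, and the openpyxl dependency are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows for one problem area, mirroring the three-column structure described above.
improvement_rows = [
    {
        "Problem": "Workload and assignments",
        "Potential solution": "Reduce and integrate tasks strategically",
        "Concrete action": "Merge related tasks into one comprehensive assignment per module",
    },
    # ... one row per solution/action pair returned by ChatGPT and reviewed by the team
]

table = pd.DataFrame(improvement_rows, columns=["Problem", "Potential solution", "Concrete action"])

# Writing .xlsx requires the openpyxl package; one sheet per problem area eases cross-comparison.
with pd.ExcelWriter("improvement_plans.xlsx") as writer:
    table.to_excel(writer, sheet_name="Workload and assignments", index=False)
```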

2.4.7. Step 7. Recommendations Extraction

Faculty development team: strategic synthesis and decision-making support. Based on the structured tables of problems, solutions, and actions (from Step 6), the team tasked ChatGPT with extracting the most impactful suggestions to guide program redesign. This helped reduce complexity and prioritize actionable improvements across the four thematic areas. The team repeated this task for each category. These final recommendations will be considered by the faculty development team when making decisions to redesign the training program.
ChatGPT-4o: Generation of prioritized recommendations. ChatGPT was asked to synthesize all prior insights and produce five clear and strategic recommendations per category. This prompt (see Figure A1 in Appendix B) was executed separately for each of the four main issues, and the result was a concise and targeted set of prioritized recommendations ready for expert evaluation and implementation.
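As an illustration of this final step, the prioritization request could be scripted as follows; the prompt text paraphrases the description above and is not the exact wording in Figure A1.

```python
from openai import OpenAI

client = OpenAI()

def extract_recommendations(category: str, improvement_table_text: str) -> str:
    """Step 7: distill the Step 6 problem/solution/action table into five strategic recommendations."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": (
                    "Synthesize the improvement table below into five clear, strategic, prioritized "
                    "recommendations for redesigning the training program."
                ),
            },
            {"role": "user", "content": f"Category: {category}\n{improvement_table_text}"},
        ],
    )
    return response.choices[0].message.content
```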

3. Results

3.1. Critical Areas and Recurring Issues of the Initial Teacher Training Program

Considering our first research objective, we reviewed student feedback through survey responses, assisted by ChatGPT, with the aim of identifying critical areas and recurring issues.
As illustrated in Figure 1, the identification of the program’s core issues emerged from a multi-step, human–AI collaborative process. In the first two steps, the faculty development team conducted a manual review and cleaning of open-ended survey responses, focusing specifically on students’ negative comments. These were then processed by ChatGPT-4o to propose an initial set of thematic categories.
The outputs from the Step 1 prompt (see Figure A1, Appendix B) included a broad range of raw categories, such as: workload and course demands, teaching modality and flexibility, relevance and applicability of content, bureaucratic aspects and teaching usefulness, evaluation of specific modules, balance between theory and practice, and session methodology and dynamics, among others. These categories reflect recurring concerns, though with some overlaps and inconsistencies.
In Step 2, the faculty development team refined and consolidated these raw themes to produce a validated and more cohesive set of categories. This process led to the identification of four overarching thematic areas that encompassed the diversity of student feedback and served as the basis for further analysis:
  • Organizational aspects of in-person attendance.
  • Content and structure.
  • Workload and assignments.
  • Peer collaboration.
Following the expert review in Step 3 (see Figure 1), we were able to reintroduce the consolidated student comments, now cleaned and grouped by theme, into ChatGPT-4o. At this stage, the goal was to formulate a clear, student-centered problem statement for each thematic category, based on the critical feedback provided.
We prompted the model to “summarize the core issue” for each theme, drawing on the full set of negative responses. As expected, the output was phrased negatively, in line with the critical nature of the input. For instance, under the theme Organizational aspects of in-person attendance, ChatGPT generated several alternatives, such as “Inefficient Organizational Aspects of Face-to-Face Attendance” or “Inefficiencies in the Organization of Face-to-Face Attendance”.
While these initial problem statements effectively reflected students’ concerns, the faculty development team refined them to remove bias, enhance clarity, and frame each issue in a more neutral, diagnostic tone. Step 3 ensured that the final problem statements remained faithful to students’ perspectives while presenting each issue with precision and academic rigor.
Based on the AI-supported thematic analysis and expert review, four core problem areas were identified as central concerns in participants’ feedback on the Initial Teacher Training program at UPM.
The first issue relates to the organizational aspects of in-person attendance. Students frequently reported that the face-to-face sessions did not justify the time and effort invested. These sessions were often perceived as inefficient or repetitive, particularly when compared to the flexibility and perceived usefulness of the online components. The rigidity of scheduling and limited adaptability were seen as obstacles for working professionals balancing multiple responsibilities.
A second problem concerns the content and structure of the program. Many participants questioned the usefulness of the course material, expressing that it lacked practical relevance to their current teaching practice or long-term professional goals. The structure of the modules was also perceived as uneven, with certain areas seen as overly theoretical and disconnected from the realities of university-level teaching.
The third critical issue identified was the workload and design of assignments. Students described the volume of tasks as excessive and poorly aligned with the realities of academic life. Instructions were sometimes unclear, deadlines inflexible, and the overall effort required was difficult to reconcile with their teaching and research responsibilities. These challenges contributed to a general sense of overload and frustration.
Finally, the theme of peer collaboration emerged as an important concern. Participants felt that the program lacked structured opportunities for meaningful interaction with colleagues. Limited space for discussion, collaboration, and exchange of experiences was perceived as a missed opportunity to enhance the learning process and build professional networks across disciplines.
These four issues represent the foundation for the following stages of analysis, which seek to identify root causes and develop targeted strategies for improvement.

3.2. Root Causes of the Critical Aspects and Cause-and-Effect Diagram

Before proceeding with the root cause identification through cause-and-effect diagrams, step 4 focused on the subcategorization of each critical issue into more specific dimensions. This phase was essential to structure a more granular and exhaustive cause analysis.
To do this, we asked ChatGPT to break down each of the four major problems into relevant subcategories. The AI processed the full dataset of student comments and returned multiple subcategory suggestions per theme. These raw outputs were rich in detail. For instance, for Problem 1: Organizational aspects of in-person attendance, one subcategory proposed by ChatGPT was “Class schedule and time load”. This included recurring student concerns such as: “Too many hours in a row”, “Sessions are too long”, “Too many face-to-face sessions just to justify ECTS”, or “The four-hour blocks feel exhausting”.
Given the density and variability of the data, the faculty development team played a key role at this stage. Experts carefully reviewed and refined the subcategories, ensuring that each group of issues was coherent and that no relevant concern was overlooked. These validated subcategories then served as the structural branches, or “bones”, for the Ishikawa diagrams generated in the next phase.
In step 5, ChatGPT was fed these subcategories along with students’ improvement suggestions to construct a cause-and-effect (fishbone) diagram for each of the four problem areas. These AI-generated suggestions provided a rich overview of perceived inefficiencies. However, as seen in Figure A2 (Appendix C), which already reflects the expert team’s refinements, not all elements initially included in the diagram were carried forward to the final action plan. One such case is the point “No hybrid/online option.” Although it appears in the diagram as a recurring concern raised by students, it was later excluded from the list of actionable items because decisions regarding the in-person nature of the program had already been made in previous evaluations. Given that the ICE program maintains an institutional commitment to face-to-face training for novice teachers, this item was deemed outside the scope of feasible interventions. The final, expert-refined versions of these diagrams can be consulted in Appendix C. We repeated the same process separately for each of the four critical aspects.
For the first problem, ChatGPT identified six key factors: evaluation and attendance; course structure and scheduling; course recognition and motivation; teaching methodology and engagement; flexibility and accessibility; and digitalization and resource use. The revised Ishikawa diagram is presented in Figure A2 (see Appendix C).
ChatGPT organized the root causes of the second problem into six main categories: difficult practical application, profile and experience mismatch, poor module structure, theory–practice imbalance, outdated and irrelevant content, and criticism of key modules. The revised Ishikawa diagram is presented in Figure A3 (see Appendix C).
ChatGPT organized the root causes of the third problem into the following subthemes: workload overload, time management and compatibility, task design and relevance, instructions and clarity, feedback and evaluation, and coordination and structure. The revised diagram is shown in Figure A4 (see Appendix C).
For the fourth key issue, the ChatGPT-assisted analysis yielded the following subthemes from participants’ feedback: few group activities, lack of networking tools, time constraints, missed pedagogical opportunities, and limited interaction space. The revised diagram is shown in Figure A5 (see Appendix C). These improved diagrams served as the basis for identifying actionable solutions in the following stages of analysis.
Note that all the Ishikawa diagrams generated in Step 5, which visually map out the core challenges raised by students, also required expert revision. As illustrated in Figure A6 and Figure A7 in Appendix E, the diagrams provided by ChatGPT offered an initial structure that needed further refinement to ensure consistency, relevance, and applicability.
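Given these formatting inconsistencies, one practical safeguard is to request the content of each fishbone as structured data rather than as a drawing, keeping the drawing step under the team’s control. The snippet below sketches an assumed JSON-like structure for Problem 1, populated with labels taken from the categories listed above; it illustrates the idea and is not the format actually used in the study.
```python
# Minimal sketch of a structured cause-effect representation (assumed format).
import json

fishbone_problem_1 = {
    "effect": "Students consider the in-person sessions inefficient",
    "branches": {
        "Evaluation and attendance": ["Attendance requirement is too high"],
        "Course structure and scheduling": ["Sessions are too long", "Sessions scheduled too frequently"],
        "Course recognition and motivation": ["Lack of professional incentives"],
        "Teaching methodology and engagement": ["Too much theory, not enough practice"],
        "Flexibility and accessibility": ["Hard to balance with professional work"],
        "Digitalization and resource use": ["Excessive paper-based documentation"],
    },
}

# Asking the model for this structure keeps its output machine-checkable and easy to revise.
print(json.dumps(fishbone_problem_1, indent=2, ensure_ascii=False))
```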

3.3. Strategic Solutions and Concrete Actions for Each Problem

In the sixth step of the process (see Figure 1), the aim was to translate the previously identified root causes into a coherent improvement plan. To do so, we asked ChatGPT to generate a table outlining potential solutions and concrete actions for each of the four critical problems identified in the program.
Before prompting the model, the expert team reviewed, adjusted, and refined all cause-and-effect diagrams generated in step 5, discarding issues that did not align with the context or constraints of the ICE training program. Additionally, to enhance the relevance and accuracy of the proposed solutions, we supplied ChatGPT with a curated selection of positive aspects mentioned by students in the final survey. These positive insights served as a source of pedagogical and organizational strengths that could be reinforced or expanded in the redesign.
Drawing on this combined input, ChatGPT proposed a series of structured tables aligning each identified problem with targeted interventions and actionable steps. These outputs were then reviewed by the expert team, who focused on refining the language for clarity, tone, and pedagogical appropriateness. In most cases, no changes were made to the substance of the proposals. However, in the case of Table A4, corresponding to the issue of peer collaboration, the AI did not generate specific actionable steps, which were therefore completed by the expert team. The refinement processes carried out by the experts are indicated with asterisks (*) in the tables, as they reflect minor adjustments only. The final validated versions of the improvement plans are presented in Appendix D (Table A1, Table A2, Table A3 and Table A4).
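As an illustration of how the step-6 request can be kept uniform across the four problems, the sketch below assembles a prompt that fixes the table columns in advance and embeds both the validated causes and the positive aspects supplied as context. The wording and the build_solution_prompt helper are assumptions for illustration, not the verbatim prompts used in the study.
```python
# Illustrative prompt builder for step 6 (assumed wording).
TABLE_COLUMNS = ["Category", "Problem", "General solution", "Specific actions"]

def build_solution_prompt(problem: str, causes: list[str], positives: list[str]) -> str:
    """Compose a solution-table request with a fixed layout and grounded context."""
    causes_block = "\n".join(f"- {c}" for c in causes)
    positives_block = "\n".join(f"- {p}" for p in positives)
    return (
        f"Problem: {problem}\n"
        f"Validated root causes:\n{causes_block}\n"
        f"Strengths reported by students (to reinforce, not replace):\n{positives_block}\n"
        f"Propose improvements as a table with exactly these columns: {', '.join(TABLE_COLUMNS)}. "
        "Every row must trace back to one of the causes above; do not invent new problems."
    )

# Example with illustrative content:
print(build_solution_prompt(
    "Students perceive the workload as excessive",
    ["Too many tasks per module", "Overlapping deadlines"],
    ["Practical activities were appreciated"],
))
```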

3.4. Student-Based Prioritized Recommendations to Address the Program’s Deficiencies

As the final step addressing study objective 4, the strategic solutions identified in the previous section were translated into a set of student-based recommendations to improve the overall quality, structure, and effectiveness of the training program (step 7 in Figure 1). These recommendations are grounded in participants’ perspectives and experiences, as well as the diagnostic process supported by ChatGPT. They are intended to directly address the deficiencies identified across the four main problem areas, with an emphasis on feasibility, relevance, and alignment with student needs.
Below is an example of the raw output generated for Problem 3: Workload and Assignments, which we present verbatim to illustrate the format and tone of the AI’s proposals:
Five Key Recommendations for Improving Workload and Assignments:
  • Reduce and Integrate Tasks Strategically:
    • Limit the number of assignments per module.
    • Merge related tasks into one comprehensive assignment.
    • Focus on tasks with high practical relevance, such as designing rubrics or lesson plans.
  • Balance Workload with Professional Duties:
    • Allow flexible deadlines and extended submission windows.
    • Integrate task completion time into in-person sessions when possible.
    • Provide alternatives for group work to accommodate varying schedules.
  • Improve Task Clarity and Guidance:
    • Provide clear instructions, time estimates, and example submissions.
    • Link each task explicitly to learning outcomes or module objectives.
    • Use standardized templates and rubrics across modules.
  • Ensure Timely and Formative Feedback:
    • Set maximum time limits for returning feedback (e.g., 2 weeks).
    • Offer feedback opportunities during in-class sessions or via peer review.
    • Incorporate guided self-assessment as part of the learning process.
  • Coordinate Assignment Planning Across Modules:
    • Create and share a unified calendar of all course deadlines.
    • Avoid overlapping submission dates between modules.
    • Organize Moodle task areas consistently to reduce confusion.
These initial suggestions were subsequently reviewed by the expert team to ensure alignment with the institutional context and the specific structure of the training program. In addition to refining the recommendations, the team designed a simplified visual summary of the proposed improvements (see Figure 2) to facilitate internal discussion. This figure was conceived as a practical tool to support decision-making in upcoming faculty development meetings, where the actions to be implemented will be collectively agreed upon.
Drawing on these results, we defined the strategic actions ultimately adopted to update the program. Rather than treating each minor suggestion in isolation, our faculty development experts, guided by an in-depth understanding of the program’s context, evaluated and synthesized the many improvement ideas into coherent, realistic measures. For example, while the AI recommended broad solutions like limiting or merging related tasks, our experts determined exactly which module assignments could be combined or removed, and which must remain. Focusing on the four critical areas identified, this collaborative process culminated in the development of an action plan that organizes the proposed initiatives into a practical and structured roadmap to guide the redesign of the Initial Teacher Training Program at the Institute for Educational Sciences (UPM):
To address the organizational challenges posed by mandatory in-person attendance, we recommend fostering the hybrid format and implementing a fully modular postgraduate program that awards official credits and certification and is seamlessly integrated into UPM’s existing accreditation frameworks. The assessment system should be reinforced by shifting the emphasis from physical presence to demonstrable individual academic performance through coursework and submissions on a virtual learning platform, ensuring that participants’ progression reflects their mastery of content rather than mere attendance. This approach will enhance flexibility, uphold rigorous standards, and align the Initial Teacher Training program more closely with the needs of a modern, competency-based educational environment.
The module content must be updated, and, above all, the pedagogical approach needs to be firmly rooted in engineering contexts. A field study should be conducted to gather real-world examples and best practices from UPM faculty for each module, enabling participants to see how pedagogical theories translate into science and engineering teaching. Moreover, all coursework should be designed around hands-on teaching activities, ensuring that the modules’ contents are meaningful and directly applicable to the practice of university teaching. The workload across modules will be standardized, and the practicum module will be reinforced with peer observation and feedback, aligning its assessment with the rest of the program.
The workload will be streamlined by focusing on practical, in-class group tasks that leverage face-to-face time for sharing and enriching perspectives among peers, mirroring collaborative activities common in professional settings. In parallel, each participant should develop an individual, evidence-based teaching portfolio. Building on a model proven effective in other programs, this portfolio would become the program’s sole transversal assignment: each novice teacher would design a complete teaching project, developing a course from initial planning through classroom delivery and addressing every competency covered in the content modules. This approach ensures that each module feeds directly into the teaching portfolio and reinforces authentic, real-world instructional skills. All tasks should follow a unified format, structure, and design framework, ensuring consistency across the program even as each one approaches the material from a different perspective. This standardized template will streamline development and assessment while still allowing each assignment’s unique focus to shine through.
We will strengthen peer collaboration by increasing group tasks during in-person sessions and integrating peer observation and feedback within the practicum module. Additionally, we will foster teaching communities and establish a dedicated ICE forum where program alumni can share experiences and best practices.

4. Discussion

This study examined how generative AI, specifically ChatGPT-4o, can assist in the qualitative evaluation of an Initial Teacher Training program offered at a leading Spanish polytechnic university. By analyzing student feedback collected over 13 academic years, the study demonstrated that integrating AI into the review process is both feasible and effective for identifying critical areas of improvement, formulating targeted solutions, and proposing context-specific actions.
The findings show that students consistently expressed concerns across four thematic areas: the efficiency of in-person sessions, the perceived relevance of the program content, the workload and task design, and the lack of opportunities for peer collaboration. These results were obtained through an iterative process in which ChatGPT was used as a support tool for generating categories, synthesizing issues, and drafting initial recommendations, while the faculty development team provided expert oversight, contextual refinement, and validation.
This study not only confirms the relevance of student feedback for program improvement but also extends existing knowledge by showing how a hybrid human–AI process can structure and accelerate qualitative data analysis in meaningful ways. While some concerns identified reflect common tensions in teacher education, such as the balance between theory and practice or the manageability of workload, our approach adds value by demonstrating how AI can help transform unstructured feedback into actionable insights, especially when scaled across multiple program editions.
Furthermore, the study contributes to the emerging body of work on the use of large language models (LLMs) in higher education evaluation (Kabir et al., 2025). Unlike previous uses of AI focused on content generation or tutoring, this research positions AI as a decision-support tool in the context of faculty development, reinforcing the potential of human–AI collaboration to improve quality assurance processes in university teaching programs.
However, this process also presented difficulties, which we now address in detail. Our first research question asked which critical areas and recurring issues could be identified in the Initial Teacher Training Program with the support of ChatGPT. The hybrid human–AI analysis allowed us to detect problems more broadly and systematically than a purely human approach might, yet it also exposed the limits of using a large language model for this purpose.
Although all four dimensions—organizational, pedagogical, structural, and interpersonal—proved equally critical, the administrative and logistical aspects were more prominently voiced in student comments, as they directly affect the day-to-day experience. Pedagogical concerns, in contrast, often remain more implicit, reflecting the fact that novice teachers may be less aware of deeper instructional issues. This pattern underscores the need for a layered analysis that captures both explicit complaints and latent concerns.
One of the main difficulties encountered in this first stage was that ChatGPT occasionally mixed topics and testimonials, producing categories or issues that were not truly recurrent. Several factors contributed to this problem. Personal references within some student responses introduced biases that distorted the model’s focus and, in some cases, led it to create new, non-representative categories (Hamilton et al., 2023). Moreover, the model’s reliance on pattern recognition, rather than genuine semantic understanding, limited its ability to grasp contextual nuances in the feedback (Hamilton et al., 2023). Finally, the high variability and breadth of student comments collected over more than a decade increased the risk of misclassification (Hunkoog & Minsu, 2024).
To address these limitations, the expert team refined the thematic categories by reviewing and adjusting the AI-generated classifications to ensure each theme accurately reflected recurring issues. Additionally, testimonials containing personal or irrelevant references were removed to avoid skewing the analysis. In this initial categorization step, the contribution of human experts was fundamental; while traditional qualitative analysis alone might have produced similar results, the hybrid approach demonstrated how AI can accelerate and systematize early stages of large-scale text analysis.
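Part of this screening can be assisted by a simple automated pre-filter applied before any prompting, although it cannot replace the manual review described above. The sketch below assumes a short list of personal-reference patterns; both the patterns and the example comments are illustrative.
```python
# Illustrative pre-filter for comments with likely personal references (assumed patterns).
import re

PERSONAL_MARKERS = [
    r"\b[Mm]y (?:colleague|supervisor|department head)\b",
    r"\b(?:Professor|Dr\.?)\s+[A-Z][a-z]+\b",  # a named instructor, e.g. "Professor Smith"
]

def keep_comment(comment: str) -> bool:
    """Return False when a comment contains a likely personal reference."""
    return not any(re.search(pattern, comment) for pattern in PERSONAL_MARKERS)

comments = [
    "Sessions are too long and exhausting.",
    "Professor Smith was unfair to me in the last module.",  # flagged and excluded
]
print([c for c in comments if keep_comment(c)])  # -> ['Sessions are too long and exhausting.']
```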
In this second phase of the study, we aimed to go beyond surface-level issues and uncover the underlying causes of the problems detected in the ICE training program. The working hypothesis proposed that the use of ChatGPT would facilitate the identification and structuring of these root causes through thematic subcategorization and cause–effect (Ishikawa) diagrams, thereby increasing the clarity and interpretability of findings for the faculty development team.
Indeed, one of the main advantages of integrating ChatGPT in this step was its ability to quickly organize large volumes of student comments into subcategories that could later be used as “branches” for the cause–effect diagrams. However, this process was not without challenges. For instance, while the AI could group recurring concerns under labels such as “teaching methodology and engagement” or “course scheduling and workload”, these outputs often required refinement. Some of the phrases proposed, like “inefficient teaching techniques”, needed clarification or contextual adaptation to accurately reflect the pedagogical practices and policies of the ICE program. In other cases, feedback on grading was summarized by ChatGPT as a “demotivating grading system”, which the expert team reformulated in more neutral and actionable terms. These adaptations ensured the subcategories retained fidelity to student feedback while aligning with institutional language and feasibility.
This process demonstrates the power of a hybrid human–AI workflow: while ChatGPT accelerated the data structuring process, human experts ensured that final outputs were meaningful, coherent, and useful for programmatic decision-making. Without this refinement, many AI-generated formulations would have remained too vague, overly technical, or contextually misaligned.
A greater technical limitation emerged in step 5, when we prompted ChatGPT to generate Ishikawa diagrams to visualize the root causes identified. As illustrated in Figure A6 (Appendix E), the model’s initial outputs were inconsistent and often incompatible with a standard fishbone format. While some diagrams resembled hierarchical trees, others took the form of unstructured lists or spreadsheet-style layouts. Even after providing highly specific prompts, such as “generate a fishbone diagram with six branches based on these subcategories”, the model returned outputs with overlapping categories, misaligned branches, and inconsistent terminology.
These formatting issues can be attributed to known limitations in ChatGPT’s visual generation capabilities. As noted by Dengel et al. (2023) and corroborated by internal documentation (OpenAI, 2025b), the model does not follow a fixed template when generating visual content. Even small variations in prompt wording, or repeated use of the same prompt, can yield significantly different outputs. Additionally, in our iterative process, requests such as “make the diagram more visual” or “clean up the layout” often led to unintended structural changes, introducing further inconsistencies. The absence of a standardized visual template from the outset meant that each new output defaulted to the model’s own internal formatting, which lacked consistency and analytical clarity.
Despite these challenges, the content provided by ChatGPT was generally rich and usable after expert review. The final cause–effect diagrams were refined manually by the faculty development team to ensure conceptual coherence and graphical clarity (see Appendix C, Figure A2, Figure A3, Figure A4 and Figure A5). Based on this experience, we recommend that future uses of generative AI in evaluation design establish a fixed visual template early in the process, ensuring uniformity in structure, labeling, and visual style across all diagrams.
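One way to realize this fixed-template recommendation is to keep the diagram content in a structured form (as sketched earlier) and render it with a deterministic tool instead of asking the model to draw. The snippet below uses the graphviz Python package as one possible renderer; the simple effect-and-branches layout (not a true fishbone shape) and the example labels are illustrative assumptions rather than the study’s final diagrams.
```python
# Illustrative deterministic rendering of a cause-effect structure.
# Requires the graphviz Python package and the Graphviz system binaries.
import graphviz

def render_cause_effect(effect: str, branches: dict[str, list[str]], filename: str) -> None:
    """Render one problem area as an effect node fed by branch and cause nodes."""
    dot = graphviz.Digraph(comment=effect)
    dot.attr(rankdir="LR")
    dot.node("effect", effect, shape="box")
    for i, (branch, causes) in enumerate(branches.items()):
        branch_id = f"branch_{i}"
        dot.node(branch_id, branch, shape="ellipse")
        dot.edge(branch_id, "effect")
        for j, cause in enumerate(causes):
            cause_id = f"{branch_id}_cause_{j}"
            dot.node(cause_id, cause, shape="plaintext")
            dot.edge(cause_id, branch_id)
    dot.render(filename, format="png", cleanup=True)

# Example with illustrative labels:
render_cause_effect(
    "Students consider the in-person sessions inefficient",
    {"Course structure and scheduling": ["Sessions are too long"]},
    "fishbone_problem1",
)
```
Because the layout is produced by code rather than by the model, every diagram shares the same structure, labeling, and visual style.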
Our third research question explored whether the combination of ChatGPT with expert human judgment could yield actionable and contextually appropriate strategies for improving the ICE Initial Training Program. The underlying hypothesis posited that this hybrid collaboration would generate a broader range of concrete proposals which, once validated by experts, would prove viable and useful for program improvement.
This phase of the study demonstrated both the potential and the limits of using ChatGPT to inform educational changes. In Step 6, the model was prompted to build tables aligning each previously identified problem with possible solutions and corresponding concrete actions. While the output was well-structured and relatively comprehensive, several challenges quickly became apparent. First, the pedagogical language used by ChatGPT was often imprecise or overly abstract, making it difficult to interpret and apply directly within the ICE program’s specific instructional and institutional context. The model occasionally proposed interventions, such as radical changes to session formats or universal grading adjustments, that, although theoretically interesting, were incompatible with existing academic policies or practices.
Second, and more critically, the model showed difficulty understanding the specific characteristics of a highly technical university like UPM, where novice instructors often possess strong subject-matter expertise but limited formal training in pedagogy. Many of the AI-generated suggestions, especially those relating to instructional methods, assumed a baseline of teaching competence that does not align with the actual profile of the participants. As such, several proposals risked being overly ambitious, misaligned, or simply unfeasible without substantial institutional support.
These issues reflect broader limitations in generative AI systems, particularly their lack of common-sense reasoning and inability to produce genuinely original insights beyond pre-trained patterns (Hamilton et al., 2023). Additionally, ChatGPT’s output was inconsistent: in some cases, it generated a detailed list of actions; in others, it failed to return any concrete proposals for a given problem. To address these shortcomings, we embedded student suggestions from the same survey into the AI prompts, ensuring the model had access to grounded, program-specific experiences. This strategy anchored the proposed solutions more firmly in real challenges faced by participants.
Nevertheless, this anchoring came at a cost. While it improved the contextual accuracy of the suggestions, it also narrowed the creative range of the outputs. The model became more conservative in its proposals, tending to iterate on known concerns rather than offer fresh perspectives. Even so, the variety and specificity of ideas remained valuable, especially when reframed or grouped by the expert team into more pedagogically coherent categories.
This phase reaffirmed the core value of the hybrid model: ChatGPT accelerated the ideation process, offering structured proposals, while expert reviewers ensured those proposals were viable, appropriately formulated, and realistically actionable. All solution tables were carefully reviewed, refined, and checked by the faculty development team to contextualize ideas and adapt them to the program’s institutional and pedagogical framework. In many cases, this involved rephrasing suggestions using pedagogical language, merging overlapping items, and discarding proposals that conflicted with UPM’s operational constraints.
As such, the refined judgment of domain experts proved essential for producing valid, implementable enhancements. Neither form of intelligence, human or artificial, was sufficient on its own, but together they enabled faster, fairer, and more context-aware decisions at scale, combining the efficiency of computational analysis with the nuanced understanding required for educational planning.
The findings from the final phase of the process led to the generation of student-centered recommendations that aligned well with the overarching goals of the ICE Initial Teacher Training Program. However, many of these proposals remained highly idealized and proved difficult to implement in practice—particularly within an environment where human judgment, interpersonal dynamics, and institutional constraints play a critical role. This limitation stems, in part, from ChatGPT’s lack of emotional intelligence and its inability to assess the real-world feasibility of its suggestions (Hamilton et al., 2023).
Nevertheless, the expert team was able to leverage the structuring potential of these recommendations as a springboard for internal discussion and strategic decision-making. Rather than being treated as prescriptive solutions, they served as a structured input to support team deliberations and to inform the redesign of the training program. As a result of this work, an initial action plan has been developed and is currently under review by the broader ICE faculty team. While some measures will require longer-term planning, others are expected to be implemented as early as the 2025–26 academic year. In this regard, the findings support the original hypothesis: the human–AI collaborative process provided valuable, student-relevant insights that enhanced the faculty development team’s capacity to make informed, context-sensitive decisions.
This approach offers several practical implications for faculty development. First, its scalability allows teams to process thousands of open-ended responses across multiple cohorts in a fraction of the time required for traditional manual coding. Second, the seven-step hybrid protocol we implemented is replicable and adaptable to other training contexts, if participant anonymity is maintained and prompt libraries are fine-tuned to reflect the specific educational domain. Third, by democratizing the analysis process, this workflow empowers not only academic researchers but also administrative and support staff, often lacking formal training in qualitative methods, to extract meaningful insights, assuming they receive basic guidance in prompt design and output validation.
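A prompt library of the kind mentioned above can be as simple as a versioned mapping from protocol step to template, which also eases the reproducibility concerns discussed later. The step names, wording, and fill helper below are illustrative assumptions, not the library used in this study.
```python
# Illustrative prompt library keyed by protocol step (assumed wording; adapt to the target domain).
PROMPT_LIBRARY: dict[str, str] = {
    "step4_subcategorize": (
        "Break the issue '{issue}' into 4-6 subcategories, each supported by the anonymized "
        "comments provided.\n\n{comments}"
    ),
    "step6_solutions": (
        "For the issue '{issue}' and the validated causes below, propose general solutions and "
        "specific actions as a four-column table.\n\n{causes}"
    ),
    "step7_recommendations": (
        "Summarize the agreed solutions for '{issue}' as five prioritized, student-based "
        "recommendations.\n\n{solutions}"
    ),
}

def fill(step: str, **fields: str) -> str:
    """Retrieve a template for a protocol step and substitute the caller's fields."""
    return PROMPT_LIBRARY[step].format(**fields)

print(fill("step4_subcategorize",
           issue="Workload and assignments",
           comments="- Too many tasks per module"))
```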
This study does not merely confirm that AI can assist in qualitative analysis; it proposes a reproducible hybrid method that improves the efficiency, coverage, and pedagogical grounding of recommendations in faculty development settings. While the current focus was on ChatGPT’s capacity to assist in end-of-program analysis, the study opens the door to a prospective application: the development of a customized GPT model capable of accompanying the program’s implementation in real time. Such a system, still conceptual, could serve as a continuous monitoring and feedback tool—provided it remains embedded within a human–AI framework that ensures the pedagogical relevance and validity of its outputs.
Recent work by Kabir et al. (2025) supports this direction, emphasizing both the potential of customized GPTs and the critical need for human oversight in prompt design and output supervision. Their findings reinforce the importance of carefully calibrating these systems to prevent distortion and maintain the educational value of AI-assisted recommendations.
To our knowledge, this is the first study in Spain to evaluate an Initial Teacher Training Program using a structured AI-assisted methodology.
However, the study also presents several limitations. First, there is a risk of AI-induced bias: poorly formulated prompts or noisy input data can lead to misleading conclusions. To mitigate this, we recommend implementing rigorous internal quality controls and peer-reviewed coding validation. Second, the quality of participant feedback is critical; responses containing severe language issues or personal references had to be discarded, as they frequently led to topic drift and inconsistent outputs. Third, the evolving nature of ChatGPT means that future model versions may alter prompt behavior and affect reproducibility. Fourth, we did not conduct a formal quantitative comparison between AI-assisted and manual analysis in terms of time savings or analytical accuracy, so efficiency claims remain tentative. Fifth, while AI-generated Ishikawa diagrams are useful for structuring complex issues, they introduce a level of subjectivity and may be difficult to interpret for those unfamiliar with the format, especially if the suggestions are not directly grounded in student data. Lastly, the external validation of results is still pending: the recommended actions have not yet been implemented nor assessed longitudinally by the ICE team.

5. Conclusions

In conclusion, this research provides new insights into the complex process of evaluating and redesigning initial teacher training programs in Spanish higher education through AI-assisted methodologies. The findings confirm that generative AI can accelerate the processing of large qualitative datasets by identifying recurring themes, proposing coding schemes for deeper analysis, and producing broad recommendations that mitigate individual bias. At the same time, AI-only outputs often lack contextual granularity and may fail to generate actionable proposals. Without expert oversight, such suggestions risk being vague, overly abstract, or misaligned with institutional practices. The study demonstrates that a hybrid human–AI workflow, combining computational capacity with expert pedagogical judgment, is essential for producing valid, feasible, and context-sensitive recommendations. Neither form of intelligence suffices alone, but together they enable faster, fairer, and more scalable decision-making.
Practical implications for faculty development emerge directly from these findings. Institutions can adopt hybrid evaluation protocols to systematically transform unstructured student feedback into evidence for program redesign, strengthening continuous quality assurance, especially when scaled across multiple program editions. Embedding AI-assisted tools in formative evaluation also opens possibilities for iterative monitoring after each module, thereby aligning training provision with the real challenges of novice instructors. Strategic investment in teacher training centers such as ICEs remains essential to sustain innovation, integrate AI responsibly, and ensure that professional development programs translate into meaningful teaching improvements.
Several limitations must be acknowledged. The study focused on a single case, which constrains generalizability. Efficiency gains were observed qualitatively but not formally compared with traditional manual coding. AI limitations included the generation of non-representative categories, inconsistent visual outputs (e.g., Ishikawa diagrams), and difficulties in contextual adaptation. Moreover, student feedback itself varied widely in quality, requiring manual filtering to avoid distortions. Finally, external validation of the proposed recommendations has not yet been completed, as their implementation is still pending.
Future research should explore how faculty teams interpret and apply these AI-supported outputs, and evaluate their actual impact on program redesign, sustainability, and long-term teaching improvement. It should also examine the development of customized GPT systems for continuous monitoring, provided they incorporate strong human oversight. Governance and ethical guidelines must be strengthened to address risks such as algorithmic bias, depersonalization, or security weaknesses in custom GPTs (Ogundoyin et al., 2025). Further studies are needed to assess the impact of hybrid human–AI evaluation on student learning outcomes, faculty workload, and institutional sustainability, as well as to compare AI-assisted and traditional qualitative methods in terms of efficiency and reliability.
Overall, this research contributes both a validated seven-step protocol for AI-assisted qualitative analysis and an operational demonstration of a hybrid human–AI model. Together, these advances offer a transferable blueprint for evidence-based program evaluation and redesign in higher education. AI brings speed and consistency, while expert judgment safeguards nuance, ethics, and contextual fit. This productive tension constitutes the foundation for the future of faculty development in Spain and beyond.

6. Patents

There are no patents resulting from the work reported in this manuscript.

Author Contributions

Conceptualization, A.M.-S. and J.L.M.-N.; Methodology, A.M.-S., M.Á.G.G. and J.L.M.-N.; Software, A.M.-S. and J.L.M.-N.; Validation, A.M.-S., M.Á.G.G. and J.L.M.-N.; Formal analysis, A.M.-S., M.Á.G.G. and J.L.M.-N.; Investigation, A.M.-S. and J.L.M.-N.; Resources, A.M.-S., M.Á.G.G. and J.L.M.-N.; Data curation, A.M.-S., M.Á.G.G. and J.L.M.-N.; Writing—original draft, A.M.-S., M.Á.G.G. and J.L.M.-N.; Writing—review and editing, A.M.-S., M.Á.G.G. and J.L.M.-N.; Visualization, A.M.-S., M.Á.G.G. and J.L.M.-N.; Supervision, M.Á.G.G. and J.L.M.-N.; Project administration, J.L.M.-N.; Funding acquisition, J.L.M.-N. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Faculty Development Project of the Institute for Educational Sciences (ICE) at the Universidad Politécnica de Madrid (UPM).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Comité de ética de actividades de I+D+i de la Universidad Politécnica de Madrid (protocol code CE220204 and date of approval 2022/02/04) for studies involving humans.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
UPM: Polytechnic University of Madrid
ICE: Institute of Educational Sciences
PLAN: Teaching planning module
MET: University teaching methods
TEC: Learning technologies
EVA: Assessment
TUT: Student guidance
PSICO: University organization and applied psychology
INNOVA: Educational innovation
PRACDO: Classroom practice and communication techniques

Appendix A

COURSE EVALUATION SURVEY Initial Training for University Teaching
Polytechnic University of Madrid (UPM) · Institute of Educational Sciences (ICE)
Academic Year 20XX–20XX
Below are some questions about the Initial Training course you have completed. Please answer all of them honestly. There are no right or wrong answers; we want to know your opinion. Thank you very much for your participation!
Write in or mark with an “X” the appropriate response.
Gender: □ Male  □ Female
Age: _____  Years of University Teaching Experience: _____
Field of Knowledge:
□ Health Sciences  □ Sciences
□ Arts and Humanities  □ Social and Legal Sciences
□ Physical Education  □ Engineering/Architecture/Computer Science
Professional Category:
□ Predoctoral Researcher  □ Postdoctoral Researcher
□ Assistant Lecturer  □ Doctoral Assistant Lecturer
□ Contracted Doctor  □ Interim Associate Professor
□ Associate Professor  □ Other: _______________
Circle the response that best applies on a scale from 1 (Strongly Disagree) to 6 (Strongly Agree).
1. In the face-to-face sessions of the Initial Training course:
  1.1. The instructors’ presentations helped me assimilate the scheduled content. 1 2 3 4 5 6
  1.2. The activities carried out helped me acquire teaching competencies. 1 2 3 4 5 6
2. In the Virtual Classroom of the Initial Training course:
  2.1. The topic documents helped me assimilate the scheduled content. 1 2 3 4 5 6
  2.2. The proposed tasks helped me achieve the intended learning objectives. 1 2 3 4 5 6
  2.3. The use of the learning platform was appropriate. 1 2 3 4 5 6
3. I believe I have acquired competencies useful for my teaching activity. 1 2 3 4 5 6
4. Regarding my level of satisfaction:
  4.1. I am satisfied with the delivery of the course’s face-to-face sessions. 1 2 3 4 5 6
  4.2. I am satisfied with the online delivery of the course. 1 2 3 4 5 6
  4.3. Overall, I am satisfied with the delivery of the course. 1 2 3 4 5 6
5. I would recommend this course to someone with needs similar to mine. 1 2 3 4 5 6
6. Currently, the 15 ECTS of the Initial Training course are distributed as 100 contact hours and 275 distance-learning hours.
  6.1. In your opinion, it should be organized as:
  □ Fully face-to-face  □ More face-to-face  □ As it is  □ More online  □ Fully online
7. In my teaching practice, I plan to apply or I am applying the learning achieved in (1 = Very little, 6 = Very much):
  7.1. Planning University Teaching 1 2 3 4 5 6
  7.2. Methodology for University Teaching 1 2 3 4 5 6
  7.3. Technologies for Networked Learning and Knowledge 1 2 3 4 5 6
  7.4. Assessment of Learning 1 2 3 4 5 6
  7.5. Tutorial Action in the University 1 2 3 4 5 6
  7.6. University Organization and Applied Psychology to Teaching 1 2 3 4 5 6
  7.7. Innovation and Educational Research in the Classroom 1 2 3 4 5 6
  7.8. Communication Techniques in the Classroom and Analysis of Teaching Practice 1 2 3 4 5 6
8. Which aspects of the course did you like the most, and why? (Please elaborate)
9. Which aspects of the course did you like the least, and why? (Please elaborate)
10. General opinion and suggestions for improvement: (Please elaborate)

Appendix B

Figure A1. Workflow of the AI-assisted qualitative analysis and sample prompts used by the expert team to guide ChatGPT responses.

Appendix C

  • Ishikawa Diagrams Breaking Down Each Main Problem into Specific Causes
Figure A2. Ishikawa diagram for problem 1.
Figure A3. Ishikawa diagram for problem 2.
Figure A4. Ishikawa diagram for problem 3.
Figure A5. Ishikawa diagram for problem 4.

Appendix D

  • General Solutions and Specific Actions Proposed by ChatGPT to Tackle the Four Problems
Table A1. General solutions and specific actions proposed by ChatGPT to tackle problem 1 and refined by experts in faculty development.
Main Problem 1 *: “Students Consider the In-Person Sessions Inefficient”
Subcategory | Problem | Strategic Solutions | Specific Actions
Evaluation and attendance | Attendance requirement is too high (75%) | Offer flexible attendance through hybrid and asynchronous alternatives | Allow students to complete part of the course asynchronously through Moodle forums and assignments. Reduce the 75% attendance requirement. Allow alternatives to mandatory attendance such as evaluative activities or forum participation assignments.
Evaluation and attendance | Demotivating grading system linked to attendance | Implement competency-based assessment with formative feedback | Introduce competency-based grading with rubrics, self-assessment, and peer feedback. Reduce dependency on attendance as a grading factor.
Course structure and scheduling | Sessions are too long (exceed four hours) | Reduce session duration and space sessions out | Limit sessions to a maximum of three hours. Space sessions out across the semester. Offer modular or two-semester formats to balance workload and certification needs. Alternate weeks of classes with self-study to prevent overload.
Course structure and scheduling | Sessions are scheduled too frequently to meet ECTS requirements | Reorganize the course schedule to balance in-person and asynchronous activities | Space sessions strategically across the semester. Restructure the course into two semesters, offering a mid-year certification to support agencies accreditation.
Teaching methodology and student engagement | Too much theory, not enough practice | Prioritize practical learning | Avoid passive theory-heavy blocks. Adopt problem-based learning and expert-led workshops. Incorporate practical activities: microteaching, case studies with real examples, peer collaboration and feedback, role-playing, etc. Invite guest experts to share real-world teaching experiences.
Teaching methodology and student engagement | Inefficient teaching techniques | Implementing active learning strategies |
Digitalization and use of resources | Excessive paper-based documentation | Optimize the use of digital resources and promote paperless policies | Minimize the use of printed materials by transitioning to a fully digital format. Printed handouts should be provided only when strictly necessary.
Digitalization and use of resources | Poor Moodle content organization | Standardize and organize Moodle course content into a clear and consistent structure | Standardize the structure of all Moodle course modules. Ensure consistent use of themes, materials and sections across modules. Organize the grading section for transparency and usability. Digitize all course documentation. Promote a fully paperless learning environment wherever possible.
Flexibility and accessibility | Hard to balance with professional work | Offer hybrid formats and modular course structures | Provide modular course options so teachers can progress at their own pace. Offer alternative schedules to fit their professional duties.
Flexibility and accessibility | No hybrid/online option for those who cannot attend in person | Offer hybrid formats with asynchronous participation options | Introduce a hybrid format with recorded key sessions available online. Record sessions and allow flexible completion through Moodle-based activities. Support those with heavy teaching loads with flexible, blended participation.
Course recognition and motivation | Lack of professional incentives or certification value | Integrate the course into a modular master’s program | Integrate the course into a modular postgraduate program with official credit and recognition: certify the training as part of a structured Master’s track, offering interim credentials and aligning it with UPM’s accreditation systems. Establish the course as a formal requirement for new faculty, reinforcing its role in academic career progression.
Course recognition and motivation | Demotivating grading system (Pass/fail) | Implement a traditional grading system | Adopt a standard letter-grade system (F to A) to reflect a student’s performance.
* Refined by the expert team: concrete actions were grouped under broader pedagogical solutions for clarity and alignment with the program’s context.
Table A2. General solutions and specific actions proposed by ChatGPT to tackle problem 2 and refined by experts in faculty development.
Main Problem 2: “Low Perceived Usefulness and Applicability of Program Content”
Category | Problems | General Solutions * | Specific Actions
Difficult practical application | Not transferable to real teaching practice | Increase connection with teaching practice including more classroom-tested strategies | Include activities directly applicable to real subjects, with contextualized and real cases. Redesign modules to include more hands-on activities, simulations, real case studies, and examples directly applicable to university teaching contexts.
Difficult practical application | Activities not adapted to context | Design specific tasks for university environments, connecting tasks to real teaching experiences | Differentiate tasks according to teaching profile and real teaching subjects.
Difficult practical application | Unrealistic for current constraints | Make realistic and scalable proposals by adapting activities to various disciplines and teaching scenarios | Provide examples adapted to large classes or limited resources. Design tasks grounded in participants’ real teaching contexts.
Profile and experience mismatch | Varying relevance by profile | Make the program design more flexible, design leveled content paths | Offer differentiated pathways based on faculty level teaching experience. Adapt content and tasks to different levels of teaching experience and disciplinary backgrounds by providing alternative routes or task options based on participants’ roles and expertise.
Profile and experience mismatch | Prior knowledge required for some content | Ensure leveling or base support for participants | Create optional or elective pathways according to previous experience and individual needs. Provide introductory materials or bridge modules for teachers without prior teaching experience.
Profile and experience mismatch | Contents are too complex for novices | Adapt the level of task difficulty | Offer graded or progressive versions of tasks based on experience level. Include step-by-step guides to support completion.
Poor module structure | Unbalanced durations | Reorganize the duration of content modules to better match their workload | Reduce or adjust the length of less valued modules: student guidance (TUT), university organization and applied psychology (PSICO). Expand classroom practice and communication techniques (PRACDOs).
Poor module structure | Lack of cohesion across modules | Improve coordination among teachers and promote more instructional design consistency | Stable teaching teams for each thematic module. Improve the logical sequence of contents. Strengthen connections among modules to avoid repetition, overload, or inconsistencies in terminology and focus.
Poor module structure | Content repetition and overload | Review the sequence of modules and eliminate overlaps | Streamline overlapping content to remove redundancies and better balance the theoretical and practical workload.
Theory–practice imbalance | Too much theory, little application * | Reversing the teaching approach to integrate theory with hands-on tasks | Apply the flipped classroom model: theory as reading, practice in class. Dedicate face-to-face time to project design, peer learning, or classroom simulations. Use classroom time for active learning, not lecturing.
Theory–practice imbalance | Missing theoretical foundation | Balancing theory and practice through offering real-life examples from university teaching | Introduce brief, clear and practical fundamentals. Combine theory with micro-workshops or hands-on activities.
Theory–practice imbalance | Incomprehensible pedagogical jargon | Clarify pedagogical concepts with concrete examples | Use a teaching glossary and link terms to their use in the university classroom. Align theoretical concepts across modules with consistent vocabulary and complementary timing and reinforce key ideas across sessions.
Outdated and irrelevant content | Antiquated or repetitive topics | Update content and focus regularly | Update content with current references. Review content to avoid redundancy and outdated materials, ensure clarity, and provide context-specific examples, especially in modules like TEC, PSICO, and INNOVA.
Outdated and irrelevant content | Focus on non-university education | Prioritize higher education contexts | Use examples and activities specific to university teaching.
Outdated and irrelevant content | Missing concrete tools or examples | Incorporate current ready-to-use tools | Add practical sessions on Moodle use, authentic assessment, mentoring for master’s theses, etc.
Criticism of key modules | PSICO: obvious or poorly applicable content | Prioritize the redesign of these modules to make them more practical, relevant and better aligned with the needs of faculty | Redesign PSICO to include more applied content and sessions focused on real classroom situations and student management.
Criticism of key modules | TUT: impractical approach and low utility |  | Refocus TUT on realistic tutorial scenarios.
Criticism of key modules | TEC: outdated basic content; difficult to apply in a face-to-face setting |  | Update TEC contents and replace generic digital literacy topics with practical training on Moodle and UPM-specific tools.
Criticism of key modules | EVA and INNOVA: missing concreteness and currently successful practices |  | Revise EVA and INNOVA to include current practices and successful case examples.
Criticism of key modules | PLAN and MET: excessive load, dense and repetitive content |  | Make PLAN and MET more interactive and modular in structure.
Criticism of key modules | PRACDO: need for more practical work and time dedication |  | Restructure the program to allocate more time to observing teaching practice. Implement guided microteaching sessions and include several feedback loops.
* Refined by experts to group actions under broader pedagogical solutions. For example, in response to the problem “too much abstract content”, ChatGPT suggested reducing lecture time and increasing demonstrations and classroom simulations, concrete ideas that were reclassified under the ‘specific actions’ column.
Table A3. General solutions and specific actions proposed by ChatGPT to tackle problem 3 and refined by experts in faculty development.
Main Problem 3: “Students Perceive the Workload and Task Design as Excessive, Unclear, and Difficult to Balance with Their Academic Responsibilities”
Category | Problems | General Solutions * | Specific Actions
Workload overload | Too many tasks | Limit number of tasks | Provide one integrated task per module and avoid duplication. Limit the number of assignments per module. Merge related tasks into a single comprehensive assignment whenever possible.
Workload overload | Excessive homework hours | Adjust estimated task time to align with ECTS credit guidelines | Estimate actual workload hours and adjust accordingly to match ECTS credits.
Workload overload | Tasks accumulate across modules | Distribute tasks more evenly across the course timeline | Set assignment deadlines using a shared planning tool.
Time management and compatibility | Hard to combine with work | Include in-class time for working on tasks | Reserve part of the in-person session to start or complete tasks. Integrate task completion time into in-person sessions when possible.
Time management and compatibility | Rushed deadlines | Allow flexible deadlines whenever possible | Allow submission windows longer than one week. Offer flexible deadlines with extended submission periods.
Time management and compatibility | Group coordination issues | Offer individual alternatives to group assignments | Offer individual alternatives to group work for participants with scheduling conflicts.
Task design and relevance | Too long tasks or redundant | Align tasks with essential teaching activities | Design tasks focused on practical activities such as lesson planning, rubric creation, and real case studies.
Task design and relevance | Misaligned with real teaching | Adapt tasks to different academic profiles | Include flexible options based on role.
Task design and relevance | Too many short tasks | Prioritize integrated tasks that offer clear practical value | Replace small, unfocused tasks with a single comprehensive assignment that applies the course content in a practical context.
Instructions and clarity | Unclear guidelines | Clarify and improve task instructions | Provide concise instructions for each task. Include grading rubrics and templates. Upload sample tasks to Moodle with estimated completion times. Offer clear examples to guide participants.
Instructions and clarity | No time estimation | Add time estimate | Include estimated durations for each task. Require students to note the time they spend on each task. Balance and regulate workload based on students’ records.
Instructions and clarity | Misaligned objectives | Align each task with specific learning goals | Map each task explicitly to its corresponding module objective. Link every task directly to the module’s defined learning outcomes.
Feedback and evaluation * | Late feedback | Establish deadline for returning feedback | Provide feedback within two weeks of submission. Set a maximum time of two weeks for all feedback.
Feedback and evaluation * | Overlapping corrections | Feedback should be part of each module session | Allocate time in each module session to discuss tasks. Return oral and group feedback on assignments. Offer feedback opportunities during in-class sessions using instructor-led discussions or via peer review.
Feedback and evaluation * | Lack of guidance | Use guided rubrics | Use standardized rubrics across all modules to clarify the criteria for task assessment and provide formative feedback. Incorporate guided self-assessment and peer-review activities as part of the learning process.
Coordination and course structure | Overlapping deadlines | Create shared calendars for all tasks | Publish a course-wide calendar with all deadlines. Create and share a unified calendar of all course deadlines. Avoid overlapping submission dates between modules.
Coordination and course structure | One task per session | Reduce frequency of required submissions | Limit tasks to one every 2–3 sessions if possible.
Coordination and course structure | Moodle disorganized | Standardized task posting and deadlines on Moodle | Unify format and deadlines for all tasks in Moodle. Organize Moodle task areas consistently to reduce confusion.
* Refined by the expert team: concrete actions were grouped under broader pedagogical solutions for clarity and alignment with the program’s context.
Table A4. General solutions and specific actions proposed by ChatGPT to tackle problem 4 and refined by experts in faculty development.
Main Problem 4: “Limited Opportunities for Meaningful Peer Collaboration”
Category | Problems | General Solutions | Specific Actions
Few group activities | Most tasks are individual | Increase the number of group-based tasks | Design specific group assignments within core modules to foster peer collaboration.
Few group activities | Group work not encouraged across modules | Standardize the inclusion of group tasks | Include at least one collaborative activity across all modules.
Few group activities | Final group work discouraged | Emphasize the value of teamwork in assessment criteria | Incorporate mixed assessment formats (individual and group) taking advantage of the added value of teamwork for developing teaching competences related to collaborative skills.
Lack of networking tools | No platform for ongoing communication | Enable digital tools to support networking and peer interaction | Activate Moodle forums to encourage discussion among peers. Suggest professional networks such as LinkedIn for ongoing peer connection.
Lack of networking tools | No group continuity after course | Facilitate alumni connections | Create an ICE alumni mailing list to keep contact and share information about university teaching events. Invite to webinars and meet-ups to support continued learning and networking.
Lack of networking tools | No digital community building | Use online platforms for community development | Launch Slack or Teams group to share teaching resources and program updates and events.
Time constraints | Tight course schedule | Rebalance program schedule | Distribute collaborative tasks in a balanced and systematic way throughout the weeks to avoid clustering.
Time constraints | Heavy workload in class moments | Reduce individual workload | Replace one long individual task with a shorter group task.
Time constraints | Difficult coordination | Use asynchronous collaboration | Enable collaborative documents and forum debates to allow flexibility between participants.
Missed pedagogical opportunities | No peer teaching activities | Include peer instruction | Assign microteaching activities where participants teach their peers.
Missed pedagogical opportunities | Little role-playing or simulation | Use experiential learning techniques | Add role-play activities across module sessions.
Missed pedagogical opportunities | Peer feedback underused | Include structured peer review | Use peer-evaluation rubrics in PRACDO and MET module tasks.
Limited interaction spaces | Lack of collaboration in-class moments | Incorporate group tasks * | Introduce debate techniques, case study or any methodology which means exchanging views *
Limited interaction spaces | No structured moments for peer exchange |  |
Limited interaction spaces | Lack of materials for collaborative tasks | Provide resources for teamwork | Offer templates, post-its, shared documents, or digital whiteboards.
* Gaps in the ChatGPT response were filled in by the faculty development team.

Appendix E

  • Original Cause–Effect Diagrams Generated by ChatGPT
Figure A6. Format inconsistencies in first ChatGPT attempt.
Figure A7. Second attempt after requiring further refinement to ensure clarity.

References

  1. Benavides, C., & López, N. (2020). Retos contemporáneos para la formación permanente del profesorado universitario. Educación y Educadores, 23(1), 71–88. [Google Scholar] [CrossRef]
  2. Borakati, A. (2021). Evaluation of an international medical E-learning course with natural language processing and machine learning. BMC Medical Education, 21(1), 181. [Google Scholar] [CrossRef]
  3. Buils, S., Viñoles-Cosentino, V., Esteve-Mon, F. M., & Sánchez-Tarazaga, L. (2024). La formación digital en los programas de iniciación a la docencia universitaria en España: Un análisis comparativo a partir del DigComp y DigCompEdu. Educación XX1, 27(2), 37–64. [Google Scholar] [CrossRef]
  4. Burgasí Delgado, D. D., Cobo Panchi, D. V., Pérez Salazar, K. T., Pilacuan Pinos, R. L., & Rocha Guano, M. B. (2021). El diagrama de Ishikawa como herramienta de calidad en la educación: Una revisión de los últimos 7 años. Revista Electrónica Tambara, 14(84), 1212–1230. [Google Scholar]
  5. Chaves, A. A., & Saborío-Taylor, S. (2025). Integración de la inteligencia artificial en los procesos de investigación educativa y evaluación de aprendizajes: Una experiencia con estudiantes de la carrera de Estudios Sociales y Educación Cívica en la Universidad Nacional de Costa Rica. Revista de Investigación e Innovación Educativa, 3(1), 22–37. [Google Scholar] [CrossRef]
  6. Cristi-González, R., Mella-Huenul, Y., Fuentealba-Ortiz, C., Soto-Salcedo, A., & García-Hormazábal, R. (2023). Competencias docentes para el aprendizaje profundo en estudiantes universitarios: Una revisión sistemática. Revista de Estudios y Experiencias en Educación, 22(50), 28–46. [Google Scholar] [CrossRef]
  7. De la Cruz, M. Á. (2000). Formación pedagógica inicial y permanente del profesor universitario en España: Reflexiones y propuestas. Revista Interuniversitaria de Formación del Profesorado, 37(1), 95–114. [Google Scholar]
  8. Dengel, A., Gehrlein, R., Fernes, D., Görlich, S., Maurer, J., Pham, H. H., Großmann, G., & Dietrich genannt Eisermann, N. (2023). Qualitative research methods for large language models: Conducting semi-structured interviews with ChatGPT and BARD on computer science education. Informatics, 10(4), 78. [Google Scholar] [CrossRef]
  9. DiSabito, D., Hansen, L., Mennella, T., & Rodriguez, J. (2025). Exploring the frontiers of generative AI in assessment: Is there potential for a human–AI partnership? New Directions for Teaching and Learning, 182, 81–96. [Google Scholar] [CrossRef]
  10. D’Oria, M. (2023). Can AI language models improve human sciences research? A phenomenological analysis and future directions. Encyclopaideia, 27(66), 77–92. [Google Scholar] [CrossRef]
  11. European Commission/EACEA/Eurydice. (2020). The European higher education area in 2020: Bologna process implementation report. Publications Office of the European Union. [Google Scholar] [CrossRef]
  12. Feixas, M., Lagos, P., Fernández, I., & Sabaté, S. (2015). Modelos y tendencias en la investigación sobre efectividad, impacto y transferencia de la formación docente en educación superior. Educar, 51(1), 81–107. [Google Scholar] [CrossRef]
  13. Fernández, A. (2008). La formación inicial del profesorado universitario: El título de Especialista Universitario en Pedagogía Universitaria de la Universidad Politécnica de Valencia. Revista Interuniversitaria de Formación del Profesorado, 22(3), 161–187. [Google Scholar]
  14. Freire, M., & Intriago, L. (2025). Diseño de un plan de autoevaluación para mejorar la calidad educativa en la Educación Superior. Revista Científica Arbitrada Multidisciplinaria Pentaciencias, 7(1), 223–243. [Google Scholar] [CrossRef]
  15. García, G. Y., García, R. I., & Lozano, A. (2020). Calidad en la educación superior en línea: Un análisis teórico. Educación, 44(2). [Google Scholar] [CrossRef]
  16. Gibbs, G., & Coffey, M. (2004). The impact of training of university teachers on their teaching skills, their approach to teaching and the approach to learning of their students. Active Learning in Higher Education, 5(1), 87–100. [Google Scholar] [CrossRef]
  17. Gobierno de España. (2023, May 23). Ley orgánica 2/2023, de 22 de marzo, del Sistema Universitario. Boletín Oficial del Estado, núm. 70. Available online: https://www.boe.es/eli/es/lo/2023/03/22/2/con (accessed on 8 August 2025).
  18. Goyanes, M., & Lopezosa, C. (2024). ChatGPT en Ciencias Sociales: Revisión de la literatura sobre el uso de inteligencia artificial (IA) de OpenAI en investigación cualitativa y cuantitativa. Anuario ThinkEPI, 18, e18e04. [Google Scholar] [CrossRef]
  19. Guijarro, A. D. L. Á. (2024). Impacto de la inteligencia artificial en la evaluación y retroalimentación educativa. Revista Retos para la Investigación, 3(1), 19–32. [Google Scholar] [CrossRef]
  20. Hamilton, L., Elliott, D., Quick, A., Smith, S., & Choplin, V. (2023). Exploring the use of AI in qualitative analysis: A comparative study of guaranteed income data. International Journal of Qualitative Methods, 22, 16094069231201504. [Google Scholar] [CrossRef]
  21. Haras, C. (2018, January 17). Faculty development as authentic professional practice. Higher Ed Today. Available online: https://www.higheredtoday.org/2018/01/17/faculty-development-authentic-professional-practice/ (accessed on 1 March 2025).
  22. Hinojo, F. J., Aznar, I., Rodríguez, A. M., & Romero, J. M. (2020). La carrera docente universitaria en España: Perspectiva profesional de los contratados predoctorales FPU y FPI. Revista Electrónica Interuniversitaria de Formación del Profesorado, 23(3), 1–16. [Google Scholar]
  23. Jho, H., & Ha, M. (2024). Towards effective argumentation: Design and implementation of a generative AI-based evaluation and feedback system. Journal of Baltic Science Education, 23(2), 280–291. [Google Scholar] [CrossRef]
  24. Instituto de Ciencias de la Educación–UPM. (2025). Programa superior de formación para la docencia universitaria (15 ECTS). Available online: https://ice.upm.es/formacion/docencia-universitaria (accessed on 7 August 2025).
  25. Kabir, A., Shah, S., Haddad, A., & Raper, D. M. S. (2025). Introducing our custom GPT: An example of the potential impact of personalized GPT builders on scientific writing. World Neurosurgery, 193, 461–468. [Google Scholar] [CrossRef]
  26. Klyshbekova, M., & Abbott, P. (2024). ChatGPT and assessment in higher education: A magic wand or a disruptor? Electronic Journal of e-Learning, 22(2), 30–45. [Google Scholar] [CrossRef]
  27. Koltai, J., Kmetty, Z., & Bozsonyi, K. (2021). From Durkheim to machine learning: Finding the relevant sociological content in depression and suicide-related social media discourses. In B. D. Loader, M. Stephenson, & J. Busher (Eds.), Pathways between social science and computational social science: Theories, methods, and interpretations (pp. 237–258). Springer International Publishing. [Google Scholar] [CrossRef]
  28. Liu, Z., Zhang, W., Sun, J., Cheng, H. N. H., Peng, X., & Liu, S. (2016, September 22–24). Emotion and associated topic detection for course comments in a MOOC platform. 2016 International Conference on Educational Innovation through Technology (EITT) (pp. 15–20), Tainan, Taiwan. [Google Scholar] [CrossRef]
  29. Losada, L., & Moreno, Ó. (2024). Evaluación de programas diseñados bajo el enfoque de competencias: Validación de un instrumento. In Congreso Internacional IDEICE (Vol. 14, pp. 389–395). [Google Scholar]
  30. Malagón, F. J., Cadilla, M., Sánchez-Sánchez, A. M., & Graell, M. (2025). Literatura científica sobre la formación del profesorado universitario en España: Análisis temático. European Public & Social Innovation Review, 10, 1–21. [Google Scholar] [CrossRef]
  31. Martín, J. L., Pablo-Lerchundi, I., Núñez-del-Río, M. C., Del-Mazo-Fernández, J. C., & Bravo-Ramos, J. L. (2018). Impact of the initial training of engineering schools’ lecturers. International Journal of Engineering Education, 34(5), 1440–1450. [Google Scholar]
  32. Montes, D. A., & Suárez, C. I. (2016). La formación docente universitaria: Claves formativas de universidades españolas. Revista Electrónica de Investigación Educativa, 18(3), 51–64. [Google Scholar]
  33. Morgan, D. L. (2023). Exploring the use of artificial intelligence for qualitative data analysis: The case of ChatGPT. International Journal of Qualitative Methods, 22, 1–10. [Google Scholar] [CrossRef]
  34. Moustakas, C. (1994). Phenomenological research methods. Sage. [Google Scholar]
  35. Natukunda, A., & Muchene, L. K. (2023). Unsupervised title and abstract screening for systematic review: A retrospective case-study using topic modelling methodology. Systematic Reviews, 12(1), 1. [Google Scholar] [CrossRef] [PubMed]
  36. Ochoa Oliva, M. (2022). Aseguramiento y reconocimiento de la calidad en la Educación Superior a través de las nuevas formas de medición: Desafíos, oportunidades y mejores prácticas. Tecnología Educativa. Revista CONAIC, 8(3), 14–21. [Google Scholar] [CrossRef]
  37. Ogundoyin, S. O., Ikram, M., Asghar, H. J., Zhao, B. Z. H., & Kaafar, D. (2025). A large-scale empirical analysis of custom GPTs’ vulnerabilities in the OpenAI ecosystem. arXiv, arXiv:2505.08148. [Google Scholar]
  38. OpenAI. (2025a). GPT-4o and more tools to ChatGPT free. OpenAI. Available online: https://openai.com/index/gpt-4o-and-more-tools-to-chatgpt-free/ (accessed on 7 May 2025).
  39. OpenAI. (2025b). ChatGPT (o4-mini) [Large language model]. Available online: https://chat.openai.com/ (accessed on 5 August 2025).
  40. OpenAI. (n.d.). ChatGPT general FAQ. OpenAI Help Center. Available online: https://help.openai.com/en/articles/6783457-chatgpt-general-faq (accessed on 16 February 2023).
  41. Oshanova, A., Sargeant, J., Karim, M., & Yates, M. (2025). Assessing the efficacy of an artificial intelligence–driven essay feedback tool in undergraduate education: A randomized controlled trial. Computers & Education, 216, 104870. [Google Scholar] [CrossRef]
  42. Paricio, J., Fernández, A., & Fernández, I. (2019). Cartografía de la buena docencia universitaria: Un marco para el desarrollo del profesorado basado en la investigación (Vol. 52). Narcea Ediciones. [Google Scholar]
  43. Parker, R. D., Mancini, K., & Abram, M. D. (2023). Natural language processing enhanced qualitative methods: An opportunity to improve health outcomes. International Journal of Qualitative Methods, 22, 280. [Google Scholar] [CrossRef]
  44. Parra, R., & Ruiz, C. (2020). Evaluación de impacto de los programas formativos: Aspectos fundamentales, modelos y perspectivas actuales. Revista Educación, 44(2), 541–554. [Google Scholar] [CrossRef]
  45. Pérez-Rodríguez, N. (2019). Programas de formación docente en educación superior en el contexto español. Investigación en la Escuela, 97, 1–17. [Google Scholar] [CrossRef]
  46. Porlán, R., Martín-Lope, M., Villarejo, Á. F., Moncada, B., Obispo, B., Morón, C., Aguilar, D., Campos, M., & Rubio, M. R. (2024a). Recomendaciones estratégicas para la formación docente de ayudantes doctores: Hacia un modelo docente centrado en el aprendizaje activo del estudiante. Red Colaborativa Interuniversitaria de Formación, Innovación e Investigación Docente. Available online: https://institucional.us.es/fidopus/ (accessed on 1 August 2025).
  47. Porlán, R., Pérez-Robles, A., & Delord, G. (2024b). La didáctica de las ciencias y la formación docente del profesorado universitario. Enseñanza de las Ciencias, 42(1), 5–22. [Google Scholar] [CrossRef]
  48. QS Top Universities. (2025). University subject rankings: Engineering and technology. Available online: https://www.topuniversities.com/university-subject-rankings/engineering-technology (accessed on 8 August 2025).
  49. Reyes-Zúñiga, C. G., Sandoval-Acosta, J. A., & Osuna-Armenta, M. O. (2024). Integración de IA en la retroalimentación académica: Un análisis exploratorio en ingeniería en sistemas computacionales. Revista Interdisciplinaria de Ingeniería Sustentable y Desarrollo Social, 10(1), 330–343. [Google Scholar] [CrossRef]
  50. Rosalind, J., & Suguna, S. (2022). Predicting students’ satisfaction towards online courses using aspect-based sentiment analysis. In E. J. Neuhold, X. Fernando, J. Lu, S. Piramuthu, & A. Chandrabose (Eds.), Computer, communication, and signal processing: ICCCSP 2022 (Vol. 651, pp. 27–39). IFIP Advances in Information and Communication Technology. Springer. [Google Scholar] [CrossRef]
  51. Rué, J., Arana, A., González de Audícana, M., Abadía Valle, A. R., Blanco Lorente, F., Bueno García, C., & Fernández March, A. (2013). Teaching development in higher education in Spain: The optimism of the will within a black box model. Revista de Docencia Universitaria, 11(3), 125–158. [Google Scholar]
  52. Ruiz-Cabezas, A., Medina-Domínguez, M. C., Subía-Álava, A. B., & Delgado-Salazar, J. L. (2022). Evaluación de un programa de formación de profesores universitarios en competencias: Un estudio de caso. Formación Universitaria, 15(2), 41–52. [Google Scholar] [CrossRef]
  53. Saltos-García, P. A., Zambrano-Loja, C. M., Rodríguez-Carló, D. F., & Cobeña-Talledo, R. A. (2024). Análisis del impacto de las estrategias de seguimiento académico basados en la inteligencia artificial en el rendimiento de estudiantes universitarios en programas de administración. MQRInvestigar, 8(2), 1930–1949. [Google Scholar] [CrossRef]
  54. Sánchez Núñez, J. A. (2007). Formación inicial para la docencia universitaria. Revista Iberoamericana de Educación, 42(5), 1–17. Available online: https://rieoei.org/historico/deloslectores/sanchez.PDF (accessed on 1 February 2025).
  55. Sbalchiero, S., & Eder, M. (2020). Topic modeling, long texts and the best number of topics: Some problems and solutions. Quality & Quantity, 54(4), 1095–1108. [Google Scholar] [CrossRef]
  56. Shaik, T., Tao, X., Li, Y., Dann, C., McDonald, J., Redmond, P., & Galligan, L. (2022). A review of the trends and challenges in adopting natural language processing methods for education feedback analysis. IEEE Access, 10, 56720–56737. [Google Scholar] [CrossRef]
  57. Tejada, J., & Ferrández, E. (2012). El impacto de la formación continua: Claves y problemáticas. Revista Iberoamericana de Educación, 58(3), 1–14. [Google Scholar] [CrossRef]
  58. Tello Díaz-Maroto, I. (2010). Modelo de evaluación de la calidad de cursos formativos impartidos a través de Internet. RIED. Revista Iberoamericana de Educación a Distancia, 13(1), 209–240. [Google Scholar] [CrossRef]
  59. Tiana, A. (2013). Tiempos de cambio en la universidad: La formación pedagógica del profesorado universitario en cuestión. Revista de Educación, 362, 12–38. [Google Scholar]
Figure 1. Step-by-step process of AI-supported qualitative data analysis.
Figure 2. Student-based prioritized recommendations.
Table 1. Open-ended survey questions used for qualitative analysis.
Question | Response Options
(8) What aspects of the course did you like the most, and why? * | Free text
(9) What aspects of the course did you like the least, and why? * | Free text
(10) What suggestions do you have to improve this program? * | Free text
* Indicates mandatory question.
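
For readers wishing to prototype a comparable AI-supported analysis of responses to the questions in Table 1, the following minimal Python sketch illustrates how free-text answers could be submitted to a GPT-4o model for thematic grouping via the official OpenAI API. The question wording is taken from Table 1; the prompt design, the function name summarize_themes, and the model choice are illustrative assumptions, not the procedure used in this study.

```python
# Illustrative sketch only: prompt wording, function name, and model choice are
# assumptions, not the authors' actual pipeline. Requires the official `openai`
# Python package (v1.x) and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

# Open-ended questions from Table 1, keyed by their survey item number.
QUESTIONS = {
    8: "What aspects of the course did you like the most, and why?",
    9: "What aspects of the course did you like the least, and why?",
    10: "What suggestions do you have to improve this program?",
}


def summarize_themes(question_id: int, responses: list[str]) -> str:
    """Ask the model to group free-text answers to one question into themes."""
    prompt = (
        f"Survey question: {QUESTIONS[question_id]}\n\n"
        "Student responses:\n"
        + "\n".join(f"- {r}" for r in responses)
        + "\n\nGroup these responses into recurring themes, give each theme a "
        "short label, and list one representative quote per theme."
    )
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "You assist with qualitative analysis of course evaluations.",
            },
            {"role": "user", "content": prompt},
        ],
    )
    return completion.choices[0].message.content


# Example usage with placeholder responses:
# print(summarize_themes(9, ["Too many overlapping deadlines.", "Sessions felt rushed."]))
```

As the article stresses, output from such a sketch would still require expert review, since the model lacks full contextual knowledge of the training program being evaluated.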