Education Sciences
  • Article
  • Open Access

9 January 2026

Language Assessment Literacy Development: A Case Study of Three EFL Teachers

1 Doctoral School of Education, University of Szeged, Petőfi sgt. 30, 6722 Szeged, Hungary
2 Department of English Applied Linguistics, Institute of English Studies, Faculty of Humanities, University of Pécs, Ifjúság u. 6, 7624 Pécs, Hungary
3 Department of Kindergarten Education, Institute of Applied Pedagogy, Juhász Gyula Faculty of Education, University of Szeged, Hattyas utca 10, 6725 Szeged, Hungary
* Author to whom correspondence should be addressed.

Abstract

Language Assessment Literacy (LAL) is critical for teachers to perform their assessment tasks, but many teachers in low-resource contexts do not receive adequate assessment training. This qualitative multiple-case study examined the impact of a short-term Professional Development (PD) program on three in-service English as a Foreign Language (EFL) teachers in developing their LAL and in shaping their assessment conceptions, knowledge and practices as assessors. The PD training program consisted of a 30 h workshop delivered over one week and integrated theory, practice, collaborative learning, reflection, and feedback. Data collection instruments included classroom observations and interviews. Findings showed that the PD program improved teachers’ LAL by developing their assessment conceptions, knowledge, skills, and confidence, although the degree of improvement varied across participants. The findings also identified challenges teachers encountered in their assessment practices, including limited time, large class sizes, insufficient resources, and sociocultural factors that constrained teachers’ assessment conceptions and restricted their LAL development. The findings showed that PD programs could strengthen teachers’ professional identity as assessors by incorporating relevant content, practice opportunities, feedback, a supportive learning community, and self-reflection. The study findings have broader implications for professional development of LAL in other low-resource and exam-oriented EFL contexts with strong sociocultural constraints.

1. Introduction

Professional development (PD) in language assessment literacy (LAL) is crucial for developing teachers’ identity as assessors, maintaining the quality of LAL (Vogt & Tsagari, 2014) and increasing teachers’ confidence in their LAL practices (Levi & Inbar-Lourie, 2020). Adequate LAL among in-service English as a foreign language (EFL) teachers has been a key concern for researchers, as many studies revealed a need for formal training to improve teachers’ classroom assessment practices (Lan & Fan, 2019; Tsagari & Vogt, 2017; Yamtim & Wongwanich, 2014). Although in-service teachers might have participated in pre-service LAL training, the field of language assessment is dynamic, so teachers need to update their knowledge and refine their skills. Additionally, pre-service programs are often insufficient for reaching an adequate level of LAL (Gan & Lam, 2020; López & Bernal, 2009). Therefore, LAL development should be continued through in-service professional development programs to keep teachers well-prepared for assessment practice (Berry et al., 2019; Vogt & Tsagari, 2014).
In the Yemeni context, serious deficiencies characterize the support and training offered to EFL teachers. These include limited professional development opportunities, insufficient hands-on training, and a lack of sustained support (Al-Jaro et al., 2020). The absence of LAL PD programs has resulted in narrow assessment practices that prioritize students’ written competencies over their oral communicative skills (Ahmed & Qasem, 2019). Furthermore, the assessment system is exam-oriented, and classroom assessments remain largely grammar-based, with teachers emphasizing grammatical accuracy at the expense of learners’ fluency (Al-Sohbani, 2013). In 2008, a national report identified test construction and evaluation as the primary areas requiring teacher training at the Basic Education level (International Standard Classification of Education [ISCED] 1-2; Republic of Yemen, Ministry of Education, 2008). Therefore, the Yemeni government proposed a five-year professional development plan. However, no evidence-based improvements in teachers’ assessment practice have been reported since then. Recent empirical research by Al-Akbari et al. (2025) confirms that the same gap persists for Yemeni EFL teachers who continue to demonstrate insufficient training in ‘knowledge of assessment’ and ‘assessment construction and evaluation’ areas within LAL. These findings suggest that addressing teachers’ specific training needs alone is not sufficient, but it is also important to evaluate the effectiveness of the PD programs and the factors that enhance their impact.
Empirical studies have identified three factors that make such programs effective: grounding in needs analysis results (Tsagari & Vogt, 2017), self-reflection (Berry et al., 2019; Yan & Fan, 2021; Weng & Shen, 2022), and collaborative learning communities (Yamtim & Wongwanich, 2014; Xu, 2019). Designing an effective language assessment PD program requires a clear understanding of teachers’ needs (Weng & Shen, 2022). Thus, needs analysis results should inform the development of these programs (Muhammad & Bardakçı, 2019; Tsagari & Vogt, 2017). Additionally, for better learning outcomes, teachers should reflect on their previous assessment experiences and imagine how they would assess their learners in the future (Berry et al., 2019). Reflection provides teachers with opportunities to pause and think about where they are and where they want to go (Weng & Shen, 2022). Teachers also need time to process new information and apply what they have learned (González, 2021), a practice that is facilitated by reflection. Thus, self-reflection helps teachers improve their understanding of language assessment, connect it to their practice (Yan & Fan, 2021), and promote their LAL learning (Xu, 2019). Furthermore, teachers in training programs need to have a sense of community because a supportive environment among participants gives them the opportunity for a free and open exchange of ideas. Interacting with others while sharing and negotiating what they have learned can lead to knowledge construction (Yan & Fan, 2021). Teachers who are involved in learning communities not only receive professional support but also gain autonomy in their assessment practices as assessors (Xu, 2019).
How LAL PD programs can effectively support teachers’ assessment development in exam-oriented and low-resource contexts like Yemen remains under-researched. Addressing this gap, the current study aims to explore the effects of a specific LAL professional development program on teachers’ assessment literacy identity by working closely with three EFL in-service teachers. The results of this study are expected to contribute to improving the design and implementation of LAL PD programs for in-service teachers by exploring the opportunities and challenges of enhancing teachers’ LAL. While this study investigates the Yemeni EFL context, Yemen reflects conditions common to many low-resource educational settings worldwide. These include large class sizes, limited institutional resources and support, and sociocultural influences. Understanding how teachers develop LAL and assessor identity under such constraints offers insights that may be relevant to similar EFL contexts internationally. The present study investigates the impact of a PD program on developing three teachers’ LAL by answering these research questions:
  • How did teachers perceive their LAL before and after participating in the PD program?
  • How did participation in the LAL PD program impact teachers’ assessment practice?
  • What challenges did teachers face in practicing their LAL after the PD?
  • Which features of the PD program did teachers perceive as the most effective in supporting their LAL development?

Theoretical and Conceptual Framework of the Study

The conceptualization of language assessment literacy (LAL) has evolved over the past three decades, building on a foundation established in general assessment literacy. The term assessment literacy was first introduced by Stiggins early in the nineties (Stiggins, 1991). He defined assessment literacy as “having a basic understanding of the meaning of high- and low-quality assessment and being able to apply that knowledge to various measures of student achievement” (Stiggins, 1991, p. 535). Brindley (2001) later extended assessment literacy into the domain of language education and proposed five dimensions: understanding the social context of assessment, defining proficiency, constructing and evaluating tests, integrating assessments into the language curriculum, and implementing assessments in practice. Davies (2008) further conceptualized LAL in terms of knowledge, skills, and principles while Inbar-Lourie (2008) emphasized its sociocultural embeddedness, arguing that assessment practices are shaped by contextual values, institutional norms, and classroom relationships. Fulcher (2012) broadened the scope of LAL to include social, cultural, historical, political, and ethical conditions, reinforcing the view that assessment is not merely a technical activity but a socially embedded professional practice. These early conceptualizations illustrate the expanding scope and increasing complexity of the LAL construct, moving from technical knowledge and skills toward a more contextualized, socially situated, and role-sensitive understanding and attitude, thereby creating the need for interrelated frameworks that systematically account for these multiple dimensions.
In 2013, Taylor conceptualized LAL and proposed a multidimensional framework, which integrates eight interrelated dimensions: knowledge of theory, technical skills, principles and concepts, language pedagogy, sociocultural values, local practices, personal beliefs/attitudes, and scores and decision making. Taylor’s model presents LAL as five levels of expertise, defined by varying degrees of assessment proficiency across these dimensions, with higher levels indicating greater depth and breadth of assessment proficiency required for particular assessment roles. The model also provides a practical structure for planning LAL programs by aligning training content with stakeholders’ assessment roles. Since its publication, the framework has been widely validated and contextualized in several contexts including Yemen (Al-Akbari et al., 2025; Baker & Riches, 2018; Bøhn & Tsagari, 2021; Kremmel & Harding, 2020). Al-Akbari et al. (2025) reconceptualized Taylor’s LAL framework for the Yemeni context. They expanded it into a ten-dimensional LAL framework, including assessment administration and use, scores and decision-making, personal beliefs/attitudes, language pedagogy, assessment techniques, principles and concepts, sociocultural values and local practices, assessment construction and evaluation, knowledge of language and methodology, and knowledge of assessment.
While these multidimensional frameworks, such as Taylor’s (2013) model and its Yemeni adaptation (Al-Akbari et al., 2025), identify what teachers need to know about assessment, they do not explain how teachers develop and enact LAL in practice. For this reason, this study adopted Xu and Brown’s (2016) Teacher Assessment Literacy in Practice (TALiP) model to frame the LAL development because it conceptualizes LAL as a developmental hierarchical process ranging from the conceptual knowledge base of assessment foundations to the construction of assessor identity. TALiP comprises six components across three mastery levels: a foundational knowledge base, teachers’ conceptions of assessment, institutional and sociocultural context, assessment literacy in practice, teacher learning, and the (re)construction of assessor identity. Central to this model is the distinction between two theoretically distinct but interconnected strands of LAL development. The first is a cognitive-emotional strand, encompassing teachers’ beliefs, values, confidence, and identity as assessors, while the second is a behavioral strand, reflected in their enacted classroom assessment practices. This distinction is particularly relevant for professional development, as gains in assessment knowledge do not automatically translate into changes in practice or assessor identity. Xu and Brown’s (2016) TALiP model guided the design, implementation, and evaluation of the PD program by focusing on teachers’ LAL knowledge and conceptions, identity development, and classroom practice. This model provides a coherent theoretical foundation for examining both teachers’ conceptual development and their enacted assessment practice after participation in the PD program.

2. Materials and Methods

The study adopted a qualitative multiple-case study design (Yin, 2003, p. 13) to explore the impact of professional development on teachers’ LAL. A case study approach was chosen because it allows for an in-depth examination of teachers’ knowledge, perceptions, and practices before and after the intervention (Creswell, 2012; Yin, 2003). A multiple-case design was employed for two reasons. The first reason was to reduce the risk of misinterpretation by avoiding unexpected case-specific factors (e.g., teachers’ sense of agency, motivation, prior experience, variations in school resources, class sizes, etc.) which could make cases unique in unforeseen ways (Duff, 2018, p. 113). The second reason was to ensure that the withdrawal of participants would not affect the study, as including multiple cases allows the research and its findings to remain valid and robust even if one or more participants drop out.

2.1. Participants and Setting

The study was conducted in three public schools in Al-Mahrah Governorate in the Republic of Yemen, where English is taught as a foreign language. Three in-service EFL teachers were selected through purposeful sampling based on their availability, sample characteristics, and willingness to engage in the study. Recruitment was coordinated with the Office of Education, which nominated teachers according to the sample criteria of the study. Although the study had a small sample size of three teachers and focused on one area in Yemen, it allowed for an in-depth, detailed exploration of each participant’s experiences, perceptions and practices. Additionally, small, focused samples are appropriate in qualitative case studies, as they provide rich contextually grounded data that contribute to understanding complex processes such as LAL development (Creswell, 2012; Yin, 2003).
These teachers shared the following characteristics: all taught in Basic Education (ISCED 1 and 2), were female, and held university degrees. They had six to seven years of teaching experience and represented different EFL teaching contexts, including both urban and rural schools. None of them had received any LAL PD training before this study. For confidentiality, they were referred to by pseudonyms: Saba, Bilqis, and Awsan. All were native speakers of Arabic, and they used English as the primary language of instruction. Although there was no official measurement of their English proficiency (e.g., CEFR-aligned test), their fluency and communicative competence were evident through classroom observations and their ability to engage with program content. Initially, the sample included four teachers. Unfortunately, one participant withdrew from the study, reducing the sample to three. Table 1 summarizes participants’ information, including their name, academic qualifications, years of teaching experience, school location (urban or rural), grade level they taught, and the number of students in each class. In the Yemen education system, grades 7–9 (ISCED 2) correspond to the lower secondary classes.
Table 1. Participant information.
These teachers taught in public schools, where students received five English lessons per week, each lasting 45 min (Mohdar & Pawar, 2020). The textbook, Crescent English Course for Yemen (CECY), is prescribed by the Ministry of Education in collaboration with the British Council (Mohdar & Pawar, 2020). This textbook includes a pupil’s book, a workbook, a teacher’s guide, and audio recordings, but it does not provide formal tests. The teacher’s guide provides general teaching guidelines and suggested lesson plans. However, teachers are still required to prepare their daily lesson plans. While teachers are expected to follow the prescribed textbook, they have the freedom to adapt lessons and supplement them with additional materials, activities, and assessment tasks to meet their students’ needs. They are also in charge of administering the tests, grading, and reporting results.

2.2. Fieldwork and Intervention

The study included fieldwork that lasted for approximately four weeks, from 8 January to 6 February 2023 (see Table 2). The fieldwork involved classroom observations, a professional development (PD) program as an intervention, and individual interviews. Pre- and post-program observations were conducted at the participating teachers’ schools to document assessment practices before and after the intervention. Individual interviews were carried out with teachers following the post-program observations.
Table 2. Detailed fieldwork and data collection timeline.
The intervention consisted of a one-week PD program in language assessment literacy, attended exclusively by the three participating teachers. It was facilitated by the first author (hereafter referred to as the tutor) under the supervision of the Training Office of the Ministry of Education in the governorate. The Training Office provided administrative oversight and logistical support, including venue arrangements and coordination with the participating schools. They also reviewed the program materials and ensured alignment with national educational priorities. Additionally, they appointed a representative, an English teacher educator in the governorate, to contribute to evaluating the program’s effectiveness. This teacher educator attended both the pre-program and post-program observations and, during the final observation, recorded notes and reflections on any changes in teachers’ assessment practices. His role was to provide an external perspective on the impact of the PD program on teachers’ classroom assessment practices.
The PD program comprised 30 h, from 14 to 19 January 2023. It was held at a centralized public school provided by the Office of Education. During this period, the teachers were released from their regular teaching duties, and temporary classroom coverage was arranged by the school administration to ensure continuity of instruction. Transportation to and from the PD venue was also provided.
The content of the PD program aimed at meeting teachers’ LAL needs to enhance their LAL levels and practices. It was informed by a recent needs analysis of Yemeni EFL teachers’ LAL (Al-Akbari et al., 2025), which identified two main areas of need: (1) knowledge of assessment and (2) assessment construction and evaluation. Specific topics were selected from each of these two general areas. This helped keep the program well-structured and aligned with the objectives and within the timeframe. Another reason was to avoid overwhelming participants with the broad scope of each dimension. The selected topics included an introduction to LAL and assessment, different types of assessment, and specific modules on speaking, writing, reading, and listening assessment (Appendix A).
The six-day program began with an orientation and an introduction to Language Assessment Literacy (LAL), followed by a session on different types of assessment, including formative, summative, diagnostic, and alternative assessments. The subsequent days were organized around skill-specific assessment modules, with each day dedicated to a specific language skill: listening, speaking, reading, and writing. Each skill-based module included input sessions (e.g., assessing speaking or assessing writing), discussions, and practical tasks tailored to the teachers’ local teaching context. For example, during the speaking and writing modules, participants worked collaboratively with the prescribed textbooks to develop assessment tasks and rubrics, exchanged feedback on task design, and practiced scoring. In the reading module, teachers evaluated and revised assessment tasks previously used in their own classrooms, while the listening module involved developing listening tasks that were reviewed by peers. These activities aimed to promote technical skills in assessment construction and evaluation. The final day of the program was dedicated to the presentation of an individual LAL development plan.
The PD program emphasized active learning through collaborative tasks, reflection, peer feedback, and task-based practice. Teachers also engaged in hands-on demonstration activities, during which they designed and implemented assessment tasks for specific language skills in simulated classroom situations. After each demonstration, teachers first completed a self-reflection form to reflect on their assessment practice (see Appendix B). This was then followed by structured feedback from the tutor and peers, focusing on task design and administration, alignment with assessment constructs and the targeted language skill, and appropriateness for learners. Additionally, participants engaged in group discussions to address common assessment challenges, such as grading consistency and task appropriateness, and to collaboratively explore practical solutions. The three teachers showed enthusiasm in group discussions about what they had learned as a team and in sharing their teaching experience.
The participants were experienced teachers, so they had substantial insights to share and reflect on. They offered suggestions, sought clarification, and brought up real-life classroom challenges. Awsan was very excited and active. She appeared highly focused and engaged, frequently asking questions and providing feedback. Bilqis showed willingness to take part in the PD program, often asking questions and participating actively in discussions. Saba seemed a bit reserved and quieter during the first day, although she expressed her excitement about receiving the PD program after a long period of teaching without any professional development opportunities. She had a keen ear and began contributing more as the program progressed. Overall, the three teachers approached the PD program with excitement and curiosity as it was their first participation in a program about assessment.

2.3. Data Collection

Data were collected through pre- and post-program classroom observations conducted by the first author, semi-structured interviews with the participating teachers, and supplementary observation notes and reflections from the teacher educator.
Unstructured observations were carried out twice during the fieldwork to examine changes in teachers’ assessment practices resulting from the LAL PD program. Observations focused on teachers’ assessment strategies, task design, and feedback practices in the classroom. The teacher educator observed alongside the first author in both pre- and post-program sessions and, during the last observations, provided written notes reflecting on whether any improvements were evident in teachers’ assessment practice following the PD program. Data were initially captured as handwritten notes in observation logs based on a template designed by Creswell (2012, p. 216) (see Appendix C). Notes were then typed, expanded, and reflected on shortly after each session to preserve contextual details and ensure completeness (Cohen et al., 2018, p. 387).
Following the final observation, individual semi-structured interviews were conducted for further investigation of the perceived impact of the PD program (see Appendix D). The interviews were conducted in both English and Arabic to accommodate the participants’ convenience and to facilitate communication. Each interview lasted approximately 60 min. The interviews were not audio-recorded, as none of the participants consented to being recorded. Instead, the first author took detailed notes during each interview and revised them immediately afterward for accuracy and completeness. As the interviews were conducted in both Arabic and English, the Arabic portions were translated into English during transcription by the first author, with attention to preserving meaning. For member checking, we shared the transcripts with the participants to confirm the accuracy of their statements and to obtain their validation and approval (Cohen et al., 2018, p. 248).
Participants were informed of the study’s purpose before data collection. While awareness of the study’s goals may have influenced their responses, steps were taken to minimize socially desirable responses by fostering a supportive, non-judgmental environment and assuring participants that all responses would remain confidential. This process helped enhance the trustworthiness and credibility of the data by ensuring that the participants’ perspectives were accurately represented. We also used methodological triangulation by combining observation notes, interview data, and the teacher educator’s notes to cross-verify findings (Cohen et al., 2018, p. 265). Finally, all data were reviewed and prepared for thematic analysis.

2.4. Data Analysis

A thematic analysis was conducted using cross-case synthesis techniques (Yin, 2003). First, we prepared the data by checking and transcribing them. Then, we read all transcriptions (interviews, observations, and teacher educator’s notes) to ensure familiarity with the overall dataset. During this step, we made initial notes about recurring ideas or statements to capture early impressions. The first stage of analysis involved coding each teacher’s data separately (within-case analysis). Meaningful units of data were identified and assigned interpretive codes. These codes were then grouped into sub-themes and themes that addressed the research questions. Next, the codes and sub-themes from each case were compared side by side to identify patterns of similarities and differences across the three teachers. This process allowed the integration of cases while preserving their unique characteristics. This deductive coding was the primary source for the categories, themes, and sub-themes (Braun & Clarke, 2006). After that, we refined these initial codes and categorizations of themes through comparison across data sources. Through the iterative process, codes were revised, merged, or refined as new insights emerged. This process of refining and validating themes was repeated several times until full agreement was reached, after which the themes for each research question were finalized. The resulting codes, sub-themes, and broader themes captured patterns within and across datasets (see Appendix E). Finally, the themes were used to address the research questions and to identify the underlying meanings, relationships, and patterns that explained how and why teachers’ LAL beliefs, knowledge, skills, and practices developed.
Themes were further interpreted in light of the conceptual framework and relevant literature to explore the role of training in shaping teachers’ LAL conceptions, background and practical skills, and to identify challenges teachers faced when practicing their assessment literacy.

2.5. Ethical Considerations

Ethical approval for the study was obtained from the Doctoral School of Education, University of Szeged. After that, we obtained permission from the Office of Education in Al-Mahrah Governorate. All participants provided their written informed consent. As recommended by Duff (2018, pp. 146–147), the consent form included information about the study topic, procedures and methods, the PD program period, etc. The participants were informed that their participation was voluntary and that they had the right to withdraw from the study at any point without negative repercussions (Duff, 2018, p. 147). Pseudonyms were used to protect their identities, and data were securely stored in compliance with institutional and ethical guidelines. We maintained the dataset confidentiality during the whole research process.

3. Results

This section presents the findings of the study based on interviews and classroom observations of the three participating teachers. The results are organized around the main themes explored in the research, including teachers’ LAL perceptions before and after the PD program, the impact of the PD program on teachers’ assessment practices, challenges in implementing LAL, and factors influencing the PD program’s effectiveness. Each subsection provides case-specific evidence illustrating individual trajectories and cross-case patterns in teachers’ knowledge, assessment practice, and evolving assessor identities following the PD program.

3.1. Teachers’ LAL Perceptions Before and After the PD Program

This subsection reports on teachers’ LAL perceptions before and after the PD program, based on both self-reported experiences and classroom evidence.

3.1.1. Limited Knowledge and Skills Before PD Program

During the interviews, participants stated that they had narrow and limited LAL conceptions before the development program. For example, Saba stated that “my information about assessment was very limited.” She added that she saw assessment primarily as a grading tool and did not recognize its value in the ongoing learning process. Similarly, Bilqis acknowledged that before the PD program, she equated assessment almost exclusively with examination, noting that “before the PD, I thought assessment could be done through examination only like the final or monthly tests”. Teachers described assessment primarily as grading rather than as a tool to inform instruction.
Assessment in the daily classroom was a marginal practice. Teachers also reported that they perceived assessment as an optional procedure rather than an integral part of instruction. Awsan admitted that “I used to assess my students only through tests or by questioning them at the end of the lesson”. Saba echoed the idea, saying that “I used assessment to gauge my students at the end of my class if I had time to do so,” and added, “I also assessed my students using summative assessments through exams.” Additionally, Awsan and Bilqis pointed out that their LAL skills were not adequate to design their own classroom assessment tasks, so they relied on the teacher’s guide (we will elaborate on this point later in answering the second question) and the exercises in the textbooks. They reported low confidence in designing assessments on their own. For example, Awsan stated that “for years, I used only the assessment tasks that were available in the books or suggested in the teacher’s guide.”

3.1.2. Broadened Teachers’ LAL Knowledge

After the PD program, teachers reported that the program expanded their assessment background knowledge and enhanced their assessment conceptions as assessors, as stated by Saba: “I can say now that my knowledge of assessment has become broader”. The three teachers updated their knowledge about different assessment types, their purposes, and their appropriate use. Awsan reported that she had gained knowledge about various assessment methods and when to use them. Saba shared the same idea when describing her LAL knowledge after the program: “I know what different kinds of assessments are available to me as a teacher, why I may use them, and when I can use them.” In the same vein, Bilqis explained that the PD program helped her reconceptualize assessment as an ongoing classroom process rather than a final product, stating that “I used to believe assessment was only about exams, but now I understand it is part of each class and each lesson from the beginning to the end”. Moreover, Bilqis illustrated her extended LAL knowledge and conceptual change by reflecting on her evolving understanding of assessment and improved practice. She described moving from traditional assessment approaches to incorporating formative and alternative assessment methods. Bilqis also emphasized using multiple assessment strategies to gauge the achievement of learning objectives, provide constructive feedback, and identify and address learning difficulties.
Teachers reported increased awareness of their strengths and areas for improvement in assessment following the PD program. They became conscious of their LAL skills and what they needed to improve. For instance, Saba became cognizant that she was gradually developing her LAL after the PD program through each of her LAL practices. This point was traced in her comment, “I feel that the concept [LAL] that I had no idea about in the beginning has been developing, sometimes slowly, but in the right direction.” Additionally, Awsan talked about her LAL strengths and weaknesses as she mentioned that “my ability to assess reading and writing has improved,” and added “I use many speaking activities and exercises in my classroom, but I think I have not yet reached a satisfactory level.”

3.2. The Impact of the Program on Teachers’ Assessment Practice

This subsection outlines the change in teachers’ LAL following the PD program. The analysis integrates post-program interview insights and classroom observations to illustrate how teachers expanded their assessment knowledge, adopted new assessment practices, and reshaped their identities as assessors.

3.2.1. Considering the Construct

The LAL PD program helped teachers develop skills in designing assessments based on the construct, which is the knowledge, skill, ability, or attribute that teachers intend to measure with the assessment task (Pellegrino et al., 2001). The teachers reported that after the training, they considered the construct being assessed when designing tasks, reflecting a deeper understanding of how to align assessments with learning objectives. For example, Awsan provided a clear articulation of the construct, goals, and skills alignment, stating “in each lesson, I identify which skills need to be assessed and the appropriate way to assess them”. She further explicitly declared her gained competence to use the construct to develop assessments and shared that “I am able to develop my classroom-based assessment… based on the construct and the learning goals.” In the same vein, Saba reported that the PD program developed her ability to design assessments that aligned with the intended construct. This newly adopted practice enabled her to move beyond using assessment tasks mechanically and make more principled decisions about what aspects of language ability were assessed. This expanded the teachers’ competency in developing a wider range of language assessment tasks, including adapting ready-made assessments. Teachers reported that they became careful to align assessment tasks with intended learning outcomes and the specific skills or knowledge being assessed. Bilqis’s reflections further support this development. She indicated that following the PD, she considered “what assessment task better assesses the construct,” which enabled her to adapt and design assessment tasks that aligned with targeted learning outcomes.

3.2.2. Shift in Assessment Practice

The three teachers changed their assessment practice from traditional summative methods to more formative-oriented approaches after the PD program. Before the PD program, classroom observations showed limited use of formative assessment, minimal feedback, and few opportunities for students to demonstrate understanding beyond memorized responses. Post-program observations revealed shifts toward formative assessment as an integral part of instruction, which involved increased student participation and the use of feedback. For example, Saba moved from a traditional grammar-translation style with minimal formative assessment to interactive lessons with ongoing formative assessments. She also used contextualized and scaffolded tasks with feedback. Similarly, Awsan replaced recall-focused questioning with multiple assessment practices (e.g., skimming and scanning for reading assessment) that elicited evidence of comprehension and supported learners through guided prompts and timely feedback. Bilqis previously depended on traditional and summative approaches such as reading aloud and drills, rote questions, and translation tasks. In her pre-PD reading class, instruction was separated from the assessment of students’ learning. However, after the PD, she integrated assessment with instruction by using multiple formative tasks throughout the lesson.
The interviews confirmed the integration of formative assessment into instruction after the PD program. All teachers reported relying on tests and summative assessment tasks before the program and acknowledged limited awareness of formative assessment. For example, Saba mentioned, “before the training, I thought it was enough to assess my students through tests such as monthly and final exams.” After the training, they expressed increased understanding of its importance and more systematic integration of assessment throughout lessons. For instance, Awsan described using assessment at the beginning, during, and at the end of lessons to monitor learning continuously, using activities, such as games, performance tasks, role-plays, and storytelling. Similarly, Bilqis shared that “I use formative assessment in my classroom more than before.” She added, “The program helped me change my way of assessing my students. I adapted my assessment methods to assess my students’ learning accurately and on an ongoing basis.”

3.2.3. Assessing Language Skills Post-PD Program

The LAL program encouraged teachers to assess the four language skills (listening, speaking, reading, and writing) rather than just content knowledge. Observations by both the researcher and the teacher educator showed that teachers moved from assessing only reading aloud, vocabulary, and grammar to assessing skills such as speaking and reading comprehension and to using integrated skill assessment tasks. The interview data supported this finding. For instance, Saba indicated that she began diversifying her assessment methods and integrating skill-based assessment, focusing on assessing language skills through a range of tasks. She added that “all language skills should be assessed.” Similarly, Bilqis pointed to a broadening of her assessment focus, explaining that while her assessment had previously concentrated mainly on vocabulary and reading, after the PD program she assessed all four language skills and paid attention to students’ communicative ability. Awsan reported that her assessment practice changed from emphasizing only grammar, vocabulary, and pronunciation to assessing the four skills and other language elements. She also reported that she gained confidence in assessing listening skills. Given that listening had previously been absent from her assessment repertoire, this change represented a notable expansion of her assessment focus and practice.

3.2.4. Tailoring Assessments to Student Needs

Teachers in this study reported gaining the skills to develop classroom-based assessments tailored to their students’ language levels and needs. This development was consistently confirmed by the three teachers. For example, Awsan explained that she now employs “assessment that suits the lesson objective and the students’ levels and needs,” adding, “I am able to develop my classroom-based assessments that are appropriate for my students’ language level and needs.” Similarly, Bilqis emphasized that the PD program increased her awareness of the importance of considering learner differences in assessment, stating, “I also learned I should consider students’ different language levels.” They developed their capacity to design more personalized and effective assessment tasks and became aware of classroom diversity. Saba reported that the PD program helped her design assessment tasks that suited her students’ level, age, and background. The teachers also chose carefully when using and adapting ready-made assessment tasks so that the tasks aligned with students’ language proficiency. For example, Bilqis talked about her newly gained ability to customize textbook exercises to better suit her students’ language proficiency.

3.2.5. Confidence and Autonomy in Assessment Design

Teachers’ confidence and autonomy in creating assessments were boosted by the program. Awsan and Bilqis relied primarily on textbooks and teaching manuals prior to the program. This was evident in the initial observations: they held and consulted their textbooks throughout the whole lesson while teaching, but after the training, they became less reliant on the books. The teachers reported higher confidence in developing and implementing assessment tasks independently after the PD program. Awsan said, “I prepare my own activities and exercises to assess my students’ learning.” She further stated that she had the confidence to assess her students’ language skills. In a similar vein, Saba and Bilqis stated that they became able to develop assessments without depending on ready-made tasks, reflecting their improved skills and confidence.
One of the notable outcomes of the program was the reduction in teachers’ reliance on teachers’ guides as they became autonomous assessment designers instead of only implementing prescribed materials. Teachers reported that they minimized relying on manuals for planning their instruction and assessment. Bilqis indicated that she gained the ability to plan her lessons and the assessment-related activities without using guidance materials as she used to do before the PD program. Additionally, Awsan admitted that “I was too dependent on the teacher’s guide; now, I plan the lesson, and I plan how to assess it.” LAL equipped teachers with the skills and knowledge to plan the appropriate assessment for each lesson as Awsan reported that “I am able to make informed decisions about what to teach or include from the coursebook because knowing what assessment is, why we assess, and how, helps me focus on achieving the learning objectives.” This development provides concrete evidence of increased assessment autonomy and practical application of assessment tasks that align with the intended learning outcomes. Teachers acquired the ability to design classroom assessment tasks independently in a way that served the intended learning outcomes. Bilqis indicated that “I plan my own lesson with the assessment-related activities without using the teacher’s guide,” demonstrating increased confidence and autonomy in assessment construction. Bilqis explained that:
For classroom assessment, I used to rely on the exercises in the textbook only. However, now I am able to adopt and adapt exercises and tasks that suit my students’ language level and needs while also helping me to achieve the learning objective. This is because I know how to assess students much better and what kind of assessment better assesses the construct.
In practice, observations supported the teachers’ claim of reduced dependence on the textbook to assess their students’ skills. For example, after the PD program Awsan used a reading text from the textbook that was not accompanied by reading tasks such as scanning, so she created a task with three questions and wrote them on the board. Then, she asked the students to answer these questions by referring to the text in their books. Confidence and autonomy in assessment construction reflected the teachers’ improved assessment identity and how, after the training, they saw themselves as assessment designers.

3.3. Challenges Teachers Faced in Practicing Their LAL

After the PD program, the participants encountered several challenges while practicing their LAL, including class sizes, time constraints, resource availability, lack of support, diverse language proficiency, and local practices/sociocultural factors. These challenges were observed in the classroom and reported during interviews.

3.3.1. Class Size

All three participating teachers had large classes. However, observations showed that teachers with larger class sizes faced greater challenges in practicing LAL compared to those with fewer students. Awsan struggled the most with implementing assessment tasks and classroom management. It was also difficult for Bilqis to assess each student individually or in groups because she had 58 students in her class. As a result, she tended to elicit answers from the students instead of involving them in more comprehensive assessment activities. For Saba, whose class was smaller, this problem was less severe, but it nevertheless affected her ability to completely apply the intended LAL practices. The crowded classroom made it infeasible to implement interactive and personalized assessment tasks effectively. Assessing students individually was time-consuming, and grouping students was too difficult to manage or monitor. Similarly, the data analysis from the interviews revealed that the large number of students posed challenges for assessment. Saba indicated, “The large number of students makes it difficult to assess the four language skills for all the students.”

3.3.2. Time Constraints

Saba and Awsan reported limited class time as one of the barriers to assessing students effectively across different language skills. Saba added that the large class sizes and limited number of English classes per week exacerbated this problem. The lack of sufficient time to implement comprehensive assessments was noticeable during the observations, and neither Awsan nor Saba could carry out all the assessment tasks they had planned. The limited time available for EFL teachers to cover the entire syllabus created a further challenge. As stated by Saba, “there is no smooth transition in the syllabus between grades 7, 8, and 9”. Because teachers usually cannot complete the syllabus, it becomes difficult to teach new or disconnected topics and language skills.

3.3.3. Resource Availability

The lack of resources hindered teachers’ ability to fully implement the acquired LAL. This included the absence of necessary teaching aids for listening exercises, which constrained the teachers’ ability to assess certain skills effectively. Bilqis pointed out that the lack of resources, such as listening materials, as well as the difficulty of downloading listening materials available online, posed obstacles for assessing listening skills. Saba shared the same concern as she stated that the school administration had failed to provide the necessary facilities and materials to assess students.

3.3.4. Local Practices and Sociocultural Factors

During the interviews, teachers discussed some negative influences of local practices and sociocultural factors on their assessment practices. Awsan mentioned the educational system and pressure to complete the syllabus as examples of local practices that hindered her assessment work. Additionally, Bilqis noted a sociocultural influence on learning English in her community: there was little interest in language learning within the society. Therefore, people wanted the tests to be easy for their children so that they could pass from one grade to the next. She added that many parents were not concerned about whether their children actually acquired the language.

3.3.5. Diverse Language Proficiency

Although teachers gained skills to assess students’ different needs and proficiency levels, variation in language proficiencies made it challenging to design and administer assessments that were appropriate for all students. As mentioned by Awsan, there were several barriers to practicing what she learned in the training, including the wide range of language proficiency levels in the same class. Diverse language abilities were noted during observations among the students of both Saba and Awsan.

3.4. Factors or Components Boosting the Effectiveness of the PD Program

To address this question, we analyzed the answers from the interviews. Teachers attributed the effectiveness of the LAL PD program to several factors, including the content, hands-on practice during the training, role of the learning community, self-reflection techniques, and exposure to assessment samples.

3.4.1. Relevant Content

The relevance of the content was highlighted by all three teachers as a crucial element in enhancing their assessment practices. The content was tailored to meet the specific LAL needs of teachers in Yemen, as it was designed according to the results of previous research. As Awsan mentioned, it focused on areas where teachers needed improvement.

3.4.2. Self-Reflection

Teachers in this study agreed that self-reflection was another important factor in the LAL PD program’s efficacy. Saba indicated that self-reflection helped her evaluate her assessment practices, identify strengths and weaknesses, and make informed decisions for further improvement. She related the effectiveness of the self-reflection technique to the use of the written reflection which allowed her to review her thoughts and make decisions about her future actions. Bilqis believed that self-reflection made her more aware of her LAL knowledge and skills as well as the areas that needed improvement. This suggests that the integration of self-reflection enhances their understanding of effective assessment practices and acts as a valuable tool for their professional growth.

3.4.3. Learning Community

The third factor that teachers believed had a positive influence on their learning was the learning community. Awsan expressed that she learned from the ideas shared by the two other teachers, which included other assessment activities and problem-solving ideas. According to Saba, the learning community created during the PD program provided a platform for teachers to share knowledge, discuss assessment challenges, and find solutions together. Bilqis stated that the learning community “helped me discuss assessment difficulties and find possible solutions by sharing knowledge and experience with my colleagues.” During the PD program, teachers also discussed how to implement the newly acquired knowledge or skills in their classrooms, the possible challenges they might face, and the strategies to overcome them. As Saba stated, these discussion sessions helped bridge the gap between classroom realities and what they were learning in the PD program.

3.4.4. Demonstration and Feedback

The participants emphasized that the demonstration and feedback sessions played a crucial role in developing their LAL. Hands-on demonstrations allowed teachers to consolidate their learning, test their understanding, and enhance their practical skills in LAL. Saba said that “I had the chance to practice what I learned and got feedback to develop my assessment practice”. Teachers were able to apply the assessment concepts and strategies in their classroom during the PD program itself. Saba and Bilqis agreed that the practice during the PD program was beneficial in developing their LAL skills. According to Bilqis, the benefits of practice were found in the opportunity to demonstrate one’s abilities, receive feedback, and take responsibility for planning and implementing activities effectively. The feedback received after each demonstration helped teachers recognize areas for improvement, refine their assessment methods, and gain confidence in planning and implementing tasks effectively. Bilqis reflected on the benefits, stating, “When you think you understood something and then you have the chance to implement it to show others what you could do. That moment to realize what you really could do or not.”

3.4.5. Assessment Samples

Another effective factor of this PD program, according to Saba, was the exposure to assessment samples for designing appropriate assessments. These samples provided the participating teachers with concrete examples of how to design and implement assessments that were appropriate for their students’ levels and needs. This showed the importance of providing assessment samples so that trainees can see, for example, how to align a task with the intended learning outcomes and which methods can be used to assess each skill. Saba found these samples effective in developing her LAL, as the following excerpt shows.
Samples of the tests and assessment tasks gave me an idea of how to assess my students. The analysis of these samples and tasks also gave me an idea of how to design the assessments to assess the construct.

4. Discussion

The study found that a short-term professional development program in LAL improved teachers’ knowledge, conceptions, and assessment practices, which corresponds with previous studies (Baker & Riches, 2018; González, 2021; Levi & Inbar-Lourie, 2020; Saputra et al., 2020). They moved beyond the traditional exam-oriented assessment focus that emphasized written accuracy and grammatical knowledge over oral and communicative language skills. Participants’ ability to create, evaluate and implement effective assessment developed, in line with Levi and Inbar-Lourie (2020). These findings suggest that the teachers’ LAL improved as a result of participating in the PD program. The PD program not only expanded teachers’ LAL knowledge and skills but also transformed their assessment conceptions. Teachers’ ideas about assessment shifted from viewing assessment as summative and grade-based to understanding that assessment is also a formative, integral process that can be implemented using various methods to achieve learning objectives. This change in teachers’ assessment background and conceptions through the PD program supported the development of their assessment literacy. Furthermore, this PD program also contributed to shaping teachers’ identities as assessors. The findings indicate increased confidence and autonomy after the PD program, as it empowered teachers’ identity as assessors. This finding is consistent with previous research (Levi & Inbar-Lourie, 2020; Xu, 2019).
Improvement among the three teachers was uneven: Bilqis showed the most progress, followed closely by Saba, while Awsan demonstrated the least improvement. These differences cannot be attributed to teaching experience, as all participating teachers had similar experience levels (six to seven years), but they may be related to the participants’ educational backgrounds (Muhammad & Bardakçı, 2019; Weng & Shen, 2022). Bilqis and Saba, both graduates of faculties of education, had a foundation in educational assessment, whereas Awsan, whose degree was in literature and arts, had limited exposure to assessment theories. Consequently, Awsan’s classroom assessment practice showed less improvement compared to the other two teachers. These variations reflect the role of teachers’ prior educational background in facilitating their training and emphasize the role of formal education in shaping teachers’ LAL identity as assessors. Therefore, teachers’ background knowledge should be considered when designing and implementing assessment literacy PD programs.
Despite observable improvements in their LAL understanding and practice in the classrooms, discrepancies emerged between teachers’ reported practices and observed classroom behaviors. For example, Saba reported designing assessment tasks appropriate for students at different proficiency levels, yet observation indicated that she did not use tasks and techniques that assessed all the students. Similarly, Awsan expressed high confidence in her ability to create and implement assessments, but observation data revealed that she struggled with task implementation. This indicated a gap between her perceived and enacted competence. In contrast, Bilqis demonstrated better alignment between her reported gains and observed classroom practices. Although she adopted newly learned assessment methods to a greater extent, full alignment between her perceived LAL and actual classroom enactment had not yet been achieved. These discrepancies are expected and reveal how LAL development is gradual, mediated, and bound to context. These findings align with previous research (Lan & Fan, 2019; Levi & Inbar-Lourie, 2020; Saputra et al., 2020; Tsagari & Armostis, 2025; Vogt & Tsagari, 2014; Yan & Fan, 2021), highlighting the mismatch between teachers’ assessment competence and their classroom practice.
This gap between knowledge and practice reflects the complex and gradual nature of LAL development. Teachers may internalize conceptual knowledge before fully mastering and translating it into practice. Previous research emphasized that assessment training alone is insufficient without ongoing support and time for reflection and experimentation (González, 2021). Although exam-oriented cultures resist change (Yeşilçınar & Kartal, 2020), research has shown that exam-dominated contexts, similar to the Yemeni system, can change gradually when supported by sustained PD programs (Davison, 2023). The tensions between perceived and enacted competence are not indicative of failure but rather reflect the gradual and mediated nature of LAL development, particularly in contexts characterized by large classes, time constraints, limited resources, and strong exam-oriented pressure. These findings also suggest that while PD programs can initiate conceptual and practical change, teachers often require extended time and continued support to integrate new assessment practices into daily classroom routines.
The findings also highlighted the complex and diverse challenges teachers faced in practicing LAL, with variations in how these challenges manifested across the three teachers’ practices. Participants identified large class sizes, limited instructional time, and lack of resources as obstacles to implementing their LAL effectively. These findings are consistent with earlier research that pointed to local constraints such as limited time, overcrowded classes, lack of teaching aids, and absence of audio resources as barriers to English teaching and assessment (Ahmed & Qasem, 2019; Al-Sohbani, 2013). These findings also resonate with research from other low-resource and exam-oriented contexts, including Iraq, Turkey, China, and Indonesia (Davison, 2023; Muhammad & Bardakçı, 2019; Yeşilçınar & Kartal, 2020; Zulaiha et al., 2020).
Additionally, teachers reported the pressure to complete the full syllabus as another constraint, echoing international studies showing that curriculum demands often limit teachers’ ability to adapt assessment to students’ needs, thus impeding the translation of LAL into practice (Tsagari, 2016; Xu, 2019). Previous local studies also pointed out that teachers felt compelled to complete the textbooks prescribed by the Ministry of Education regardless of students’ proficiency level (Ahmed & Qasem, 2019). This highlights the influence of local practices in low-resource contexts on LAL practices and development. The dominance of these local practices and institutional factors suggests that while bottom-up PD initiatives can enhance teachers’ knowledge and agency, sustainable LAL development requires complementary top-down support. Without institutional alignment such as curriculum flexibility, assessment reform, administrative support, and reduced exam pressure, teachers’ ability to translate LAL into practice remains limited. Therefore, institutional support is needed to enable teachers to implement and maintain assessment innovations effectively (Zhang et al., 2025).
Sociocultural factors further constrained teachers’ LAL development. Parents prioritized passing tests over developing language skills and students were motivated primarily by the need to achieve passing grades due to limited perceived future benefits from learning English in the Yemeni economic context (Ahmed & Qasem, 2019; Al-Sohbani, 2013). These findings align with international research showing that societal expectations, economic constraints, and parental attitudes influence teachers’ assessment practices and professional identity as assessors (Liu & Li, 2020; Yeşilçınar & Kartal, 2020). Similarly, comparative research has shown that educational policies and local assessment cultures play a central role in shaping how teachers understand and practice LAL (Tsagari & Armostis, 2025). Therefore, both sociocultural factors and the local practices shaped teachers’ conceptions of assessment and constrained their identity development as assessors. The results indicate that LAL PD programs should be context-sensitive, culturally responsive, and aligned with the existing beliefs, local constraints, and cultural expectations.
The LAL PD program’s effectiveness was driven by a combination of relevant content, hands-on practice, collaborative learning in the learning community, self-reflection, and exposure to assessment samples. Teachers named different aspects of the PD program as impactful, reflecting its well-designed structure. Hands-on practice, peer feedback, and reflection allowed teachers to refine their existing skills and build new LAL knowledge. The content was based on the needs analysis of Yemeni EFL teachers, ensuring alignment with their context. This approach aligns with previous studies emphasizing that LAL training should address teachers’ specific needs rather than adopting a one-size-fits-all model (Tsagari & Armostis, 2025; Vogt & Tsagari, 2014). These findings indicate that engaging teachers in hands-on activities and reflective practice enables them to develop practical knowledge. This supports the claim by Yan and Fan (2021), who argued that experiential learning and reflective practice help teachers to construct practical knowledge of what works in their context through reflection upon these assessment experiences. The importance of learning communities and peer collaboration during LAL PD programs has been highlighted in previous research (Berry et al., 2019; Saputra et al., 2020; Yamtim & Wongwanich, 2014), and self-reflection has been shown to enhance professional growth (Berry et al., 2019; Xu, 2019). PD programs that encourage collaboration, self-reflection, and peer support can strengthen teachers’ assessment agency and contribute to sustainable LAL development.

5. Conclusions

Language assessment literacy development is increasingly recognized as essential for effective language teaching, particularly in contexts where teachers face large classes, limited resources, traditional exam-oriented systems, and local practices, and strong sociocultural constraints. Many EFL teachers in such settings have limited access to sustained and contextually relevant LAL training. Responding to this need, the present study examined the impact of a short-term LAL professional development program on three in-service EFL teachers’ LAL perceptions before and after the program, their classroom assessment practices, the factors influencing the PD program’s effectiveness, and the challenges they encountered when enacting LAL in classroom practice.
The findings indicate that the PD program successfully enhanced the participants’ assessment knowledge, skills, and conceptions. It contributed to shifting the teachers’ practices and perceptions of assessment from a traditional summative practice to a formative process integrated with instruction. Beyond technical gains, the PD program also supported the development of teachers’ professional identities as assessors, fostering greater confidence, autonomy, and a sense of responsibility for students’ learning. At the same time, the findings revealed variation in the extent of improvement in the three teachers, highlighting the influence of prior educational background and contextual constraints. The findings also revealed discrepancies between teachers’ reported assessment competence and their enacted classroom practices. This suggests that LAL development involves not only acquiring assessment knowledge, and skills but also cognitively internalizing them and negotiating contextual constraints, professional identity and classroom realities, indicating that LAL development is gradual and mediated. Moreover, teachers faced persistent challenges in translating their LAL knowledge into practice due to time constraints, large class sizes, limited resources, curriculum pressures, sociocultural constraints, and expectations that prioritize examination results over language learning. These findings suggest that factors such as individual and contextual factors play a prominent role in LAL development and implementing changes in assessment practices in EFL classrooms.
This study contributes to the growing international literature on LAL by providing empirical evidence from a low-resource, culturally constrained EFL context. It highlights the value of PD programs that integrate relevant content, hands-on practice, collaborative learning communities, relevant feedback, and structured self-reflection. Furthermore, it illustrates how such programs can strengthen teachers’ assessment literacy, autonomy, and professional identity. At the same time, the study emphasizes that short-term PD programs may be insufficient without ongoing support to help teachers bridge the gap between knowledge and classroom practice. Overall, the objectives of the study were achieved, and the research questions were addressed by documenting both notable changes in assessment knowledge and practice and deeper conceptual and identity-related development in the case of the three teachers.
Several limitations should be acknowledged. The small sample of three teachers is not representative of EFL teachers in Yemen, and the focus on a single geographic area in Yemen further limits the generalizability of the findings. Therefore, these outcomes may not be applicable to teachers in other sociocultural or educational settings, even within Yemen. The short duration of the study was another limitation: time constraints made it difficult to capture the long-term impacts of the PD program on teachers’ assessment practices. Lastly, the reliance on self-reported interview data may introduce bias, because teachers were aware of the study’s purpose prior to data collection and might have provided socially desirable responses. Efforts were made to encourage honest and reflective answers, but the potential for social desirability or response bias cannot be fully ruled out. Triangulation of data was used to counterbalance some of these challenges. Future research should explore how LAL PD programs affect a broader range of teachers, including those from various geographical areas and educational settings and with different academic qualifications. Further research should involve a larger sample to explore the applicability of our findings and to enhance generalizability. Researchers should conduct longitudinal studies tracking the long-term impact on assessment practices and student outcomes. Studies could also compare how various LAL PD programs contribute to teachers’ professional development.
Based on the findings, several concrete recommendations can guide the design of effective LAL professional development programs. First, PD programs should incorporate a structured cycle of hands-on practice, in which teachers design, use, reflect on, and revise assessment tasks with support from facilitators. Second, peer learning and collaborative communities of practice should be embedded into the PD program design to enable teachers to share experiences, co-construct knowledge, and collectively troubleshoot challenges. Third, PD programs should include guided reflection activities, such as reflective journals, post-observation debriefings, or model analyses, to help teachers internalize assessment principles and connect them to their daily instructional decisions. Fourth, given the contextual constraints identified in this study, PD programs should adopt a context-sensitive approach, offering examples and strategies that are feasible in large, resource-limited classrooms. Finally, the findings highlight the importance of ongoing follow-up support, such as coaching, mentoring, or short refresher workshops, to help teachers sustain change and to allow sufficient time and an appropriate pace for gradual learning, so that the gap between theoretical understanding and enacted practice can be bridged. These practical implications can inform the development of more effective and contextually responsive LAL PD programs.
The findings also highlight the inevitable influences of individual, institutional, and sociocultural factors on the development of LAL and the adoption of positive LAL-related changes in the classroom. Although this study employed a bottom-up approach to professional development that aimed at strengthening teachers’ agency and identity as assessors, institutional factors appear to play a particularly powerful role in shaping teachers’ assessment practices, often mediating how assessment knowledge is interpreted and enacted in classroom contexts (Zhang et al., 2025). Hence, institutional top-down support is necessary to sustain LAL-related change and to improve both teachers’ assessment literacy and students’ learning outcomes. Accordingly, the findings of this study imply an urgent need for reform in the assessment system in Yemen. Such reforms are also relevant to other similarly constrained contexts, where governmental bodies or educational authorities seek to improve or contribute to the systematic development of teachers’ LAL.

Author Contributions

Conceptualization, S.A.-A., M.N. and Á.H.; methodology, S.A.-A., M.N. and Á.H.; validation, S.A.-A., M.N. and Á.H.; formal analysis, S.A.-A., M.N. and Á.H.; investigation, S.A.-A.; resources, S.A.-A.; data curation, S.A.-A., M.N. and Á.H.; writing—original draft preparation, S.A.-A.; writing—review and editing, S.A.-A., M.N. and Á.H.; supervision, M.N. and Á.H.; project administration, S.A.-A.; funding acquisition, S.A.-A. All authors have read and agreed to the published version of the manuscript.

Funding

The first author is a recipient of the Stipendium Hungaricum scholarship, awarded by the Hungarian Government to international doctoral students. The PD program was funded and supported by the Local Authority and the Office of Education in Al-Mahrah Governorate, Yemen. The APC was funded by the University of Szeged Open Access Fund, grant number [7086].

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board (IRB) of the Doctoral School of Education, University of Szeged (Reference number: 24/2021; 30 December 2021).

Data Availability Statement

Data is unavailable due to privacy or ethical restrictions.

Acknowledgments

The authors would like to express their sincere gratitude to all those who contributed by facilitating data collection or devoting their time to participate in the study, making this research possible. The authors have reviewed and edited the output and taken full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
LAL: Language assessment literacy
EFL: English as a foreign language
ISCED: The International Standard Classification of Education
CECY: Crescent English Course for Yemen
IRB: Institutional Review Board
PD: Professional development

Appendix A

LAL training program schedule
Day 1 (Saturday, 14 Jan.)
8:30–9:30 Orientation
9:30–10:30 Introduction to LAL and assessment
Coffee break
11:00–13:00 Input: Assessment types
Homework: Collect samples of different assessments you used to assess your students. Reflect on these assessment samples.

Day 2 (Sunday, 15 Jan.)
8:30–9:30 Speaking assessment module. Reflection on the module.
9:30–10:30 Input: Assessing speaking. Then reflection and discussion on your speaking assessment practice.
11:00–13:00 Discussion: Speaking assessment difficulties and possible solutions. Design speaking assessment tasks in pairs. Demonstration. Group work: Speaking feedback.
Homework: Choose a lesson or unit from the textbook and develop a speaking test or a classroom assessment activity individually.

Day 3 (Monday, 16 Jan.)
8:30–9:30 Writing assessment module. Reflection on the module.
9:30–11:30 Input: Assessing writing. Then reflection and discussion on your writing assessment practice. Develop a writing task.
12:00–13:00 Group work: Demonstrate speaking assessment tasks and receive peer feedback.
Homework: Choose a lesson or unit from the textbook and develop another writing assessment task with a rubric.

Day 4 (Tuesday, 17 Jan.)
8:30–9:30 Reading assessment module. Reflection on the module.
9:30–11:30 Input: Assessing reading. Then reflection and discussion on your reading assessment practice.
11:00–12:00 Develop reading tests in pairs.
12:00–13:00 Demonstration: Exchange writing tasks to do them. Then, return the tasks to the developers for scoring. Reflection on writing tasks, rubrics, and scoring.
Homework: Choose a reading task from your previous assessment practice to evaluate it. Be prepared to share and discuss this task evaluation with your peers.

Day 5 (Wednesday, 18 Jan.)
8:30–9:30 Listening assessment module. Reflection on the module.
9:30–12:00 Input: Assessing listening. Then reflection and discussion on your listening assessment practice. Develop listening tests in pairs. Share them for discussion and reflection.
12:30–13:00 Group work: Demonstration task. Conference on evaluating their reading assessment tasks.
Homework: A plan for how to use what you have learned to improve your assessment practice. A plan for LAL professional development.

Day 6 (Thursday, 19 Jan.)
8:30–9:30 Present your learning development plan on language assessment
9:30–10:00 Closing activity

Appendix B

Self-reflection form
Name: ______________________________________________ Date: ____________
Lesson focus: ________________________________________ Class: ____________
Lesson objective: ______________________________________________________
Please fill this in AFTER your class, BEFORE your peers’ feedback.
How I feel about my assessment practice:



What were the objectives of the lesson? How well did my assessment practice help in achieving this objective? How well did my assessment practice help in measuring the achievement of this objective?





What worked well?



What were the challenges in implementing my planned assessment tasks?



Questions I want to ask my observers about my assessment practice:




Fill this in AFTER feedback:
The key ideas I am taking away from today’s feedback and/or what I’ve learned about assessment:

Appendix C

Interview Questions
How does the LAL teacher training program affect or influence your LAL?
What are the impacts of the LAL development program on your assessment practice?
Why do you think this LAL training program will impact your LAL?
How did the learning work community help you develop your LAL?
How did self-reflection help you develop your LAL?
What barriers, if any, prevent you from implementing what you have learned in the training in your classroom?

Appendix D

Observation Form
Fieldnotes From Classroom Observations
Observational Fieldnotes—Teachers’ LAL in the EFL classroom
Observer: ____________
Teacher’s code: ____________
Grade: ____________
Number of students: ____________
Time & Date: _______________________
Length of Observation: ____________
Lesson title and objectives: _____________________________________________________________________________________________________________________.
Description of teacher’s assessment practice | Reflective notes (insights, hunches, themes)









Note. This form was adapted from the sample designed by Creswell (2012, p. 216).

Appendix E

Coding
Codes | Sub-Themes | Themes
Teachers’ LAL perceptions before and after the PD program | Limited knowledge and skills before PD program | Limited assessment background
| | Narrow conceptions of assessment
| | Limited assessment practices
| | Inadequate skills to design assessment tasks
| | Overdependence on ready-made assessment materials (mainly teacher’s guide and textbooks)
| | Low confidence in assessment construction
| | Low self-efficacy as assessors
| | Limited understanding of teacher’s role as assessors
| Broadened teachers’ LAL after the PD program | Expanded LAL knowledge base
| | Awareness of personal LAL level
| | Evolving assessor identity
| | Recognition of ongoing development
The impacts of the program on teachers’ assessment practice | Considering the construct | Construct-based assessment design
| | Improving in selecting and adapting ready-made tasks
| | Conceptual understanding of construct
| Shift in assessment practice | Moved from traditional to formative assessment
| | Integration of assessment with instruction
| | Various types of feedback
| | Scaffolded tasks
| | Student engagement in assessment
| Assessing language skills after the program | Comprehensive language skill assessment
| | Integrated skill assessment tasks
| | Diversification of language assessment methods
| | Tailoring assessments to student learning needs
| | Assessment differentiation based on proficiency
| | Develop personalized assessments
| | Responsiveness to learner learning needs
| Confidence and autonomy in assessment design | Reduced reliance on textbook/teacher’s guide
| | Increased assessment self-efficacy as assessors
| | Identity as assessment designer
| | Informed decision making
Challenges teachers faced in practicing their LAL | Class sizes | Class size challenges
| | Limited individual attention
| | Reduced assessment variety
| | Time-consuming assessment
| | Classroom management difficulties
| Time constraints | Insufficient class time
| | Incomplete syllabus coverage
| Resource availability | Insufficient resources
| | Lack of listening materials
| | Limited access to technological resources
| | Institutional support issues
| Local practices and sociocultural factors | Educational policies
| | Cultural attitudes toward language learning
| | Parental expectations
| Diverse language proficiency | Mixed proficiency level
| | Assessment design challenge
| | Implementation barriers
Factors or components boosting the effectiveness of the PD program | Relevant content | Content tailored to teacher needs
| | Context-specific
| | Alignment with recent needs analysis
| Self-reflection | Involve teachers in evaluating their assessment practices
| | Identify strengths and weaknesses
| | Increase self-awareness of LAL knowledge and skills
| | Help teachers make improvement decisions
| | Promote professional growth
| Learning community | Sharing knowledge and experience
| | Collaborative problem-solving
| | Peer support and feedback
| | Discuss assessment challenges and solutions
| | Bridging theory and classroom practice
| Demonstration and feedback | Hands-on practice
| | Feedback from tutor and peers
| | Practice new skills
| | Apply learning in context
| | Improve assessment methods
| | Opportunity to apply new knowledge
| | Guidance on how to improve their assessment methods and ensure effective learning.
| | Practice during the PD program was beneficial in developing their LAL skills.
| | Take responsibility for planning and implementing activities effectively.
| Assessment samples | Exposure to model assessment tasks
| | Learning through analysis of samples

References

  1. Ahmed, S. T. S., & Qasem, B. T. A. (2019). Problems of teaching and learning English as a foreign language in South Yemen: A case study of Lahj Governorate. ELS Journal on Interdisciplinary Studies in Humanities, 2(4), 485–492.
  2. Al-Akbari, S., Nikolov, M., & Hódi, Á. (2025). EFL teachers’ language assessment literacy training needs. Social Sciences & Humanities Open, 11, 101254.
  3. Al-Jaro, M., Asmawi, A., & Abdul-Ghafour, A. Q. K. (2020). Supervisory support received by EFL student teachers during practicum: The missing link. International Journal of Language and Literary Studies, 2(4), 22–41.
  4. Al-Sohbani, Y. A. (2013). An exploration of English language teaching pedagogy in secondary Yemeni education: A case study. International Journal of English Language & Translation Studies, 1(3), 41–57. Available online: https://www.researchgate.net/profile/Yehia-Al-Sohbani/publication/285884016_An_Exploration_of_English_Language_Teaching_Pedagogy_in_Secondary_Yemeni_Education_A_Case_Study/links/56641dd408ae418a786d338b/An-Exploration-of-English-Language-Teaching-Pedagogy-in-Secondary-Yemeni-Education-A-Case-Study.pdf (accessed on 30 May 2025).
  5. Baker, B. A., & Riches, C. (2018). The development of EFL examinations in Haiti: Collaboration and language assessment literacy development. Language Testing, 35(4), 557–581.
  6. Berry, V., Sheehan, S., & Munro, S. (2019). What does language assessment literacy mean to teachers? ELT Journal, 73(2), 113–123.
  7. Bøhn, H., & Tsagari, D. (2021). Teacher educators’ conceptions of language assessment literacy in Norway. Journal of Language Teaching and Research, 12(2), 222–233.
  8. Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101.
  9. Brindley, G. (2001). Language assessment and professional development. In C. Elder, A. Brown, N. Iwashita, E. Grove, K. Hill, T. Lumley, T. McNamara, & K. O’Loughlin (Eds.), Experimenting with uncertainty: Essays in honour of Alan Davies (pp. 126–136). Cambridge.
  10. Cohen, L., Manion, L., & Morrison, K. (2018). Research methods in education. Routledge.
  11. Creswell, J. W. (2012). Educational research: Planning, conducting, and evaluating quantitative and qualitative research (4th ed.). Pearson.
  12. Davies, A. (2008). Textbook trends in teaching language testing. Language Testing, 25(3), 327–347.
  13. Davison, C. (2023). Assessment literacy: Changing cultures, enculturing change in Hong Kong. Chinese Journal of Applied Linguistics, 46(2), 180–197.
  14. Duff, P. (2018). Case study research in applied linguistics. Routledge.
  15. Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment Quarterly, 9(2), 113–132.
  16. Gan, L., & Lam, R. (2020). Understanding university English instructors’ assessment training needs in the Chinese context. Language Testing in Asia, 10(1), 11.
  17. González, E. F. (2021). The impact of assessment training on EFL writing classroom assessment. Voices of Mexican university teachers. Profile: Issues in Teachers’ Professional Development, 23(1), 107–124.
  18. Inbar-Lourie, O. (2008). Constructing a language assessment knowledge base: A focus on language assessment courses. Language Testing, 25(3), 385–402.
  19. Kremmel, B., & Harding, L. (2020). Towards a comprehensive, empirical model of language assessment literacy across stakeholder groups: Developing the language assessment literacy survey. Language Assessment Quarterly, 17(1), 100–120.
  20. Lan, C., & Fan, S. (2019). Developing classroom-based language assessment literacy for in-service EFL teachers: The gaps. Studies in Educational Evaluation, 61, 112–122.
  21. Levi, T., & Inbar-Lourie, O. (2020). Assessment literacy or language assessment literacy: Learning from the teachers. Language Assessment Quarterly, 17(2), 168–182.
  22. Liu, J., & Li, X. (2020). Assessing young English learners: Language assessment literacy of Chinese primary school English teachers. International Journal of TESOL Studies, 2(4), 36–50. Available online: https://www.tesolunion.org/attachments/files/0YMIWDY2E0DY2NK7YTC30MJBK1ZDJJ0ZDE22NDAWCZJEWCMJVI1OGNM0MDA39NGZH2NTVMBZTQ57NJC0DOWEW3MDY53LJC04MZGX1NDGX5LJYZ.pdf (accessed on 1 July 2025).
  23. López, A., & Bernal, R. (2009). Language testing in Colombia: A call for more teacher education and teacher training in language assessment. Profile: Issues in Teachers’ Professional Development, 11(2), 55–70.
  24. Mohdar, H. S. A., & Pawar, T. M. (2020). History of education system and teaching English language in Yemen. Literary Endeavour, XI(1), 28–32.
  25. Muhammad, F. H. N., & Bardakçı, M. (2019). Iraqi EFL teachers’ assessment literacy: Perceptions and practices. Arab World English Journal, 10(2), 431–442.
  26. Pellegrino, J. W., Chudowsky, N., & Glaser, R. (2001). Knowing what students know: The science and design of educational assessment. National Academies Press.
  27. Republic of Yemen, Ministry of Education. (2008, November 25–28). The development of education in the Republic of Yemen: National report. 48th Session of the International Conference on Education, Geneva, Switzerland. Available online: https://planipolis.iiep.unesco.org/2008/yemen-national-report-48th-session-international-conference-education-ice-inclusive-education (accessed on 21 March 2025).
  28. Saputra, E., Abdul Hamied, F., & Suherdi, D. (2020). The development of beliefs and practices of language assessment literacy: Does a professional learning community help? Journal of Education for Teaching, 46(3), 414–416.
  29. Stiggins, R. J. (1991). Assessment literacy. Phi Delta Kappan, 72(7), 534–539.
  30. Taylor, L. (2013). Communicating the theory, practice and principles of language testing to test stakeholders: Some reflections. Language Testing, 30(3), 403–412.
  31. Tsagari, D. (2016). Assessment orientations of state primary EFL teachers in two mediterranean countries. Center for Educational Policy Studies Journal, 6(1), 9–30.
  32. Tsagari, D., & Armostis, S. (2025). Contextualizing language assessment literacy: A comparative study of teacher beliefs, practices, and training needs in Norway and Cyprus. Education Sciences, 15(7), 927.
  33. Tsagari, D., & Vogt, K. (2017). Assessment literacy of foreign language teachers around Europe: Research, challenges and future prospects. Papers in Language Testing and Assessment, 6(1), 41–63.
  34. Vogt, K., & Tsagari, D. (2014). Assessment literacy of foreign language teachers: Findings of a European study. Language Assessment Quarterly, 11(4), 374–402.
  35. Weng, F., & Shen, B. (2022). Language assessment literacy of teachers. Frontiers in Psychology, 13, 864582.
  36. Xu, Y. (2019). English language teacher assessment literacy in practice. In X. Gao (Ed.), Second handbook of English language teaching (pp. 517–539). Springer.
  37. Xu, Y., & Brown, G. T. (2016). Teacher assessment literacy in practice: A reconceptualization. Teaching and Teacher Education, 58, 149–162.
  38. Yamtim, V., & Wongwanich, S. (2014). A study of classroom assessment literacy of primary school teachers. Procedia–Social and Behavioral Sciences, 116, 2998–3004.
  39. Yan, X., & Fan, J. (2021). “Am I qualified to be a language tester?”: Understanding the development of language assessment literacy across three stakeholder groups. Language Testing, 38(2), 219–246.
  40. Yeşilçınar, S., & Kartal, G. (2020). EFL teachers’ assessment literacy of young learners: Findings from a small-scale study. Journal of Theoretical Educational Science, 13(3), 548–563. Available online: https://dergipark.org.tr/en/pub/akukeg/article/639234 (accessed on 1 July 2025).
  41. Yin, R. K. (2003). Case study research and applications. Sage.
  42. Zhang, J., Yu, G., & Browne, W. (2025). Teachers’ language assessment literacy: Exploring its construct and contextual factors. Studies in Educational Evaluation, 87, 101525.
  43. Zulaiha, S., Mulyono, H., & Ambarsari, L. (2020). An investigation into EFL Teachers’ assessment literacy: Indonesian teachers’ perceptions and classroom practice. European Journal of Contemporary Education, 9(1), 189–201.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
