A Toolkit for the Investigation of Greek EFL Teachers’ Assessment Literacy

The role of assessment in a learner-centred environment is considered significant for both learners and teachers. Most of the time, however, assessment is carried out in traditional ways that ignore learners’ individual needs. Based on the results of a survey conducted in 2019, in which a questionnaire was administered to one hundred and twenty EFL teachers, the present study aims to investigate Greek EFL teachers’ responses to communicative testing techniques and their awareness of assessment methods and principles. The survey revealed that the majority of EFL teachers in the Greek educational context use traditional tests to assess their students and, although they are aware of alternative assessment methods and the benefits they offer, they fail to employ them. Thus, a 106-item tool was created to help teachers design, develop, and critically evaluate tests, as well as reflect on their assessment techniques, in order to promote the use of alternative assessment and supplement the teachers’ theoretical knowledge and experience. Ninety-three EFL teachers evaluated themselves and rated their practices through the toolkit to find out what type of assessor they are. The findings revealed that many of the participants are aware of the key principles of assessment and try, to a great extent, to assess the four skills in a communicative and authentic way, but most of them are mainstream assessors. The findings can be used to help design samples of authentic tasks for all skills and assessment-related teacher training material.


Introduction
Fundamentally, assessment is an integral part of the learning process. It is interwoven with teaching and learning, and involves making judgments about learners (Nunan 1990) and monitoring their development (Hedge 2000) in order to assess their needs and tailor instruction to optimize learning. As McNamara and Roever (2006) assert, language testing in education dictates what is to be taught, what is to be valued in instruction, and what becomes the focus of activity (Swan et al. 2006). This shows that assessment plays an important role in many people's lives (McNamara 2000), and teachers therefore need to be "competent in the principles and practice of language assessment" (Harding and Kremmel 2016).
Traditional summative assessment or standardized testing, which aims to measure students' ability and knowledge (Brown 2003) using product-oriented techniques, seems inadequate for measuring ongoing student development. In many cases, "teaching to the test" (Bowers 1989) in order to allocate a mark overlooks other parameters, such as the learner profile and individual needs and preferences (Tsagari 2004), and lacks authenticity and contextualization, which has a negative washback effect on learning. Moreover, it has a negative washforward effect, since product-oriented teaching does not prepare learners for real-life situations (Widdowson 1976). In Greece in particular, more and more EFL learners are interested in acquiring a language certificate (Papageorgiou 2009), which leads to exam-oriented teaching based on the final product rather than the process of learning.
Considering this, assessment in public and private language schools in Greece seems problematic, and action needs to be taken to promote teachers' professional development in alternative assessment and communicative testing. Dissatisfaction with the limitations of traditional testing has paved the way for alternative assessments that encourage metacognition, reflection, and self-directed learning, and which can be integrated with instruction, emphasizing both the product and the process of language learning (Chirimbu 2013). Unlike traditional practices, alternative assessment and communicative testing can be used as a means of reflection and of portraying progress or the lack of it (Baker 2016). Communicative, process-oriented curricula, in conjunction with alternative methods for collecting information and student-centered ways of assessment, emphasize the importance of integrating assessment with instruction. There is a variety of alternative methods to assess continuous student progress and address the problems of standardized tests (Griva and Kofou 2017). Alternative/formative assessments, together with self- and peer-assessment (Bøhn and Tsagari 2021), can be used as aids in the learning process or as springboards for decisions about the steps that follow instruction, to describe ongoing student-related information, and to make evaluative decisions (Brown 2003; Ioannou-Georgiou and Pavlou 2003; Tsagari 2011).
"Assessment literacy", a relatively new term coined by Stiggins (1991), refers to how literate teachers are in regarding what, why, and how they assess in order to generate "good examples of student performance" (p. 240). Alternatively, Popham (2018) describes this concept as "an individual's understanding of the fundamental assessment concepts and procedures deemed likely to influence educational decisions" (p. 2). Clearly, assessment literacy can empower teachers (Grabowski and Dakin 2014) who need to be aware of the assessment purpose and tools they use, the testing conditions, and the utility of the learners' results, as well as the importance of their decision making (Inbar-Lourie 2008). The research conducted by López and Bernal (2009) indicates that trained language teachers use different practices of assessment to improve teaching and learning, whereas teachers with no training in language assessment used assessment as a way to solely obtain grades. Thus, teachers need to be literate in assessment and understand their critical role in the assessment process. Added to this, similar studies that have been conducted to find out the participants' training in assessment (Vogt and Tsagari 2014) point out that teachers are not equipped with sufficient knowledge on testing and assessment and commonly regard assessment as an activity separate from teaching, equal to allocating a grade or score. As Herrera and Macías (2015) claim, teachers need to "have a working knowledge of all aspects of assessment to support their instruction and to effectively respond to the needs and expectations of students, parents, and the school community" (p. 303). 
Therefore, appropriate pre-service and in-service training needs to take place to offer sufficient education in language assessment that will help teachers employ more effective means of assessment, taking into account the fact that alternative assessment methods have been included in the evaluation process of public schools (Government Gazette 140 2021).
With this in mind, and based on the results of a survey (Gkogkou 2019) conducted in 2019 as part of one of the authors' master's degree at the Hellenic Open University (https://apothesis.eap.gr/bitstream/repo/42987/1/103390_GKOGKOU_EIRINI.pdf, accessed on 22 October 2021), in which a questionnaire was administered to one hundred and twenty EFL teachers in conjunction with structured interviews, the present study aims to investigate Greek EFL teachers' responses to communicative testing techniques and their awareness of assessment methods and principles. Specifically, the survey revealed that the majority of EFL teachers in the Greek educational context use traditional tests to assess their students and, despite being aware of alternative assessment methods and the benefits they offer, fail to employ them (Gkogkou 2019). It also showed that most teachers resort to discrete-point test items, which test language in a rather fragmentary way and focus on language competence and usage rather than use, instead of authentic real-life tasks, which require a full, authentic task environment and promote integrative language use (Gkogkou and Kofou 2020). Thus, a tool was created to help these teachers design, develop, and critically evaluate tests, as well as reflect on their assessment techniques, in order to promote assessment literacy and supplement their theoretical knowledge and experience. The purpose of this study is to implement the tool and measure the teachers' awareness of language assessment. The toolkit is constructed on the basis of specific criteria drawn from the relevant literature (Alderson and Banerjee 2001; Bachman and Palmer 1996; West 2004) and aims to urge teachers to reflect on their assessment methods.
Moreover, the toolkit serves as a guide to promote the use of alternative assessment and authenticity in teaching, learning, and assessing in a foreign language classroom, since "educational communities lack empirical evidence about the value of many influential assessment instruments" (Alderson and Banerjee 2001). The tool can be used by pre-service and in-service teachers as a guide and as a self-evaluation instrument to encourage teachers to rethink their roles and develop professionally.

Materials and Methods
When the respondents to the aforementioned survey (Gkogkou 2019) were asked whether they would be open to using an assessment instrument with specific criteria for evaluating and designing tests, more than 80% responded positively, and more than 90% were willing to apply alternative forms of assessment if they were supplied with appropriate materials and guidance. Thus, the researchers' subsequent aim was to develop an assessment tool to supplement the teachers' theoretical knowledge and experience and enhance the self-assessment procedure. The toolkit was based on the research conducted in 2019, in which a questionnaire was administered to one hundred and twenty EFL teachers, in conjunction with structured interviews, to investigate the beliefs, perceptions, and practices of EFL teachers regarding assessment, as mentioned previously. The present paper implements the previously constructed toolkit in order to explore the beliefs and practices of EFL teachers regarding assessment, focusing on the purpose, forms, and processes used; that is, whether the teachers' assessment methods actually involve authentic and alternative forms of assessment, or whether more traditional practices are preferred. Greek EFL teachers' responses to communicative testing techniques and their awareness of assessment methods and principles are also investigated. At the same time, the toolkit serves as a measurement tool and a research instrument designed to measure the language assessment knowledge of the EFL teachers who participated in the survey. It aspires to help teachers design, develop, and critically evaluate tests, as well as reflect on their assessment techniques. Consequently, it can be used by pre-service and in-service EFL teachers to raise their awareness of language assessment and can support them in making informed decisions when assessing students' learning.
It can also be regarded as a teachers' guide and self-evaluation tool, which can be used several times during the school year in order to observe and reflect on their development and growth.
The checklist was developed in three stages. First, we collected an initial set of items based on West's (2004) existing recommendations for constructing and administering tests. Second, based on these recommendations, we developed the assessment criteria, which were then grouped into categories and divided into two parts. Finally, clear language descriptions were used to provide objective evaluations of the three different types of assessors.
The toolkit (Appendix A) is divided into two parts and can be regarded as a form of self-assessment for the teachers. Determining what teachers do or do not know with regard to language assessment was the starting point of the study, with the aim of encouraging them to become more assessment literate. Therefore, the first part of the toolkit aims to provide both novice and experienced teachers with input that concerns the basic principles of language assessment in order to help them understand how tests are constructed. Teachers can review or familiarize themselves with fundamental principles of language testing for describing, categorizing, and evaluating published tests, as well as designing their own tests. Specifically, the first part consists of 130 randomly ordered criteria, which are provided in the form of a checklist and can be used to evaluate and rate the teachers' practices. Furthermore, key assessment principles, such as authenticity, reliability, validity, practicality, washback and washforward effects, feedback, and reflection, are included and investigated in this part, based on the relevant literature, as mentioned above. For example, according to Bachman and Palmer (1996, p. 18), a model of test usefulness should include qualities such as reliability, construct validity, authenticity, interactiveness, impact, and practicality, which can also apply to alternative assessment techniques.
The second part deals with testing each of the four communicative skills (writing, speaking, reading, and listening), either on its own or in combination with others (OECD 2021), using a checklist that shows teachers what to take into account when assessing students' receptive and productive skills in a communicative and integrated way (Fulcher 2000, 2012). It is divided into four sections, one for each skill, and includes specific criteria that can help language teachers evaluate tests. The distinction between the four skills that is usually drawn in large-scale standardized testing and textbooks, however, does not invalidate the integration of skills that is desirable in both classroom and testing settings, as is also evident from the fact that many of these criteria may overlap. These criteria refer to text and task authenticity, the types of tasks and processes, and rating. Teachers can take these criteria into consideration when exploring the testing of all four skills to understand how tests are constructed. In this way, teachers gain useful insights into test evaluation and design that will help them construct their own tests for assessing learners in more authentic and communicative ways. To that end, the toolkit (Appendix A) was completed by ninety-three of the one hundred and twenty EFL teachers who participated in the original survey, i.e., 77.5% of them, which is close to the proportion (more than 80%) that had stated their willingness to use the instrument. Random sampling was used to ensure the generalizability of the findings (Marshall 1996) by minimizing the possibility of bias and increasing the credibility of the results (Patton 2002).
The sample thus included English language teachers working in either the private or the public sector who hold either a university degree or a C2-level language certificate (93.5% female, aged 21–50, 60% with a master's degree, 20% with a bachelor's degree). The aim was to understand the criteria that should be applied to testing in order for teachers to become better prepared for assessing learners; the present study therefore focuses on their attitude towards the integration of this assessment toolkit. It aims to identify the methods, techniques, and types of traditional and alternative assessment that teachers use and to outline the criteria they take into account when assessing learners. The criteria are intended to reveal teachers' beliefs about assessment, concerning the purposes, reasons, and types of assessment they employ, as well as the feedback they provide to students. Using the toolkit, with no psychometrics involved, the participants evaluated themselves on a three-point scale (yes/to some extent/no), reflected on their assessment methods, and rated their practices to find out what type of assessor they are. The answers were then analyzed with a simple formula:

$$P = \frac{f}{n} \times 100\%$$

where P is the percentage, f the frequency, and n the number of questions. The final aim was to encourage and invite teachers to use the specific criteria in their future assessment practices.
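To make the percentage formula concrete, the following is a minimal Python sketch that applies P = f/n × 100 to a single teacher's checklist answers, with f the number of times a response option was chosen and n the number of questions answered. The function names and the sample answers are hypothetical illustrations, not part of the toolkit itself.

```python
from collections import Counter

# The three response options of the checklist, as described in the text.
RESPONSES = ("yes", "to some extent", "no")


def percentage(f: int, n: int) -> float:
    """P = f / n x 100, with f the frequency and n the number of questions."""
    if n <= 0:
        raise ValueError("n (number of questions) must be positive")
    # Multiply before dividing so that whole-number results stay exact.
    return f * 100 / n


def response_percentages(answers: list[str]) -> dict[str, float]:
    """Return P for each response option over one teacher's answers."""
    counts = Counter(answers)
    unexpected = set(counts) - set(RESPONSES)
    if unexpected:
        raise ValueError(f"unexpected responses: {sorted(unexpected)}")
    n = len(answers)
    return {option: percentage(counts[option], n) for option in RESPONSES}


# Hypothetical answers to a 20-question checklist (not data from the study).
answers = ["yes"] * 12 + ["to some extent"] * 6 + ["no"] * 2
print(response_percentages(answers))  # {'yes': 60.0, 'to some extent': 30.0, 'no': 10.0}
```

Counting per option, rather than only the "yes" answers, keeps the three shares visible side by side, which is how the results below report them.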

Results
The composition of the population of both the initial survey and the present study was important for the reliability and validity of the research results and for drawing robust conclusions. For this reason, the group of participants was carefully defined. We used purposeful sampling so that we could identify and select "information-rich cases" in line with the research purpose (Patton 2002). On the one hand, the participants were knowledgeable and experienced in the subject of interest; on the other hand, random sampling was used to ensure the generalizability of the findings (Marshall 1996).
In the present study, the assessment toolkit was administered to ninety-three EFL teachers working in either the private or the public sector. The data collected as they evaluated themselves and rated their practices through the toolkit offered insights into the type of assessor that the majority of teachers are. The findings revealed that many of the participants are aware of the key principles of assessment and try, to a great extent, to assess the four skills in a communicative and authentic way. However, the majority of them are mainstream assessors who feel that, while they should be using authentic assessment more often, there are reasons that hold them back.
The findings can be used to help design samples of authentic tasks for all skills, but also for teacher training in assessment modes and teacher professional development programmes related to language assessment.

Principles of Testing
The first part of the questionnaire dealt with the principles of testing applied to the respondents' tests. In particular, the majority of the participants (85%) believe that the testing content is similar to the teaching content. Some teachers believe that their testing promotes autonomous and self-directed learning and learner-centred assessment with a clear purpose, including tasks that suit the learners' abilities. However, skills are not highly integrated, and students' motivation and involvement are not triggered. It also seems that assessment is not regarded as a shared responsibility: students cannot express their opinion on how they will be assessed, nor are they given a choice of assessment tasks (Table 1). Significantly, in Greek public schools, the introduction of the item bank will likely lead to rather predictable, closed-ended assessment tasks (Government Gazette 111 2020). Regarding authenticity (Table 2), almost all of the participants, to some or a great extent, put emphasis on communicative language, believing that the language in the tests is natural, that the test items are contextualized and emphasize the communicative view of language, and that the topics and situations are interesting, enjoyable, humorous, and relevant to the learners' age and level. About four out of ten participating teachers think that the students' ability to apply knowledge to real-life problems is tested and that real-world situations and processes useful for everyday life are replicated. The task rubrics are contextualized to offer a more realistic and communicative view of the language, but genuine materials (found in the real world and not produced for testing purposes) or a thematic outline are not used to a great extent.
It is evident, therefore, that assessment practices do not follow communicative teaching practices to the great extent that EFL teachers claim, probably because testing is seen as summative assessment related to assigning a grade rather than to spotting learners' weaknesses and needs. Regarding validity (Table 3), more than 90% of the participants believe that the test guidelines are clear, the timing of the test is appropriate, students are notified in advance, and appropriate review and preparation for the test are offered. About eight out of ten teachers think that beneficial strategies are frequently suggested before the test, that the learning objectives are identified in the assessment practices and appropriately represented in the test, and that the test's difficulty level is appropriately pitched. According to two-thirds of the participants, assessment is closely tied to curricular practices and supports the goals and objectives of the syllabus, using methods that measure what needs to be measured, with assessment types being explained and no surprises. Two-thirds (66.7%) believe that the structure of the test is challenging enough to match students' performance and therefore motivate them. Just over half of the respondents (53.8%) state that test specifications (e.g., time allocated to each skill) were considered, whereas 57% report that no advice was offered by a colleague on improving the test, with a further 19.4% answering 'to some extent'. Thus, piloting a test or sharing it with a colleague would probably increase its validity. This practice is followed for the item bank tasks (Government Gazette 111 2020), which are reviewed by two assessors before they are uploaded onto the platform (http://www.iep.edu.gr/el/anazitisi-thematon, accessed on 22 October 2021).
Reliability (Table 4) seems to be taken into account, since all of the participants make sure that every student has a cleanly photocopied test sheet (100%), and almost all ensure that sound amplification is clearly audible to everyone in the room (95.7%) and use objective scoring procedures (90.3%). However, only about half of the respondents (51.6%) use writing and speaking rating scales and assessment criteria to reduce subjectivity, although teachers seem to be quite well trained and competent (see also Vogt and Tsagari 2014) and many of them should be acquainted with rating scales from preparing learners for certification. Since rating scales and rubrics are also part of alternative and descriptive assessment (Griva and Kofou 2017), they could be a focus of teacher training seminars and workshops, as they are important for increasing reliability.
Practicality in testing (Table 5) also seems to be taken into consideration by the majority of the participating teachers. Specifically, more than 90% state that they know how the test will be marked, that the materials and equipment are ready in advance, that the cost of the test is within budgeted limits, and that students can complete the test within the set time frame. Furthermore, roughly three-quarters or more say that they know how the results will be reported (87.1%), that students are informed about task marking (73.1%) and assessment criteria (76.3%), that the scoring/evaluation system of the test is feasible within the teacher's time frame (84.9%), and that any administrative details are established before the test (74.2%).
Regarding the washback effect (Table 6), 86% of the participants believe that the test tasks are related to teaching and learning, but the results are less promising for the washforward effect. In particular, 65.6% of the teachers ask their students to use the test results as a guide for setting goals for their future effort, about half (49.5%) believe that learners acquire strategies and necessary lifelong learning skills in tasks that emphasize communication, and only one-third (33.3%) believe that the test is forward-looking and satisfies the learners' communicative needs with tasks that have real-world applications. This suggests that the test is mainly used for allocating a grade rather than for preparing students for real-life problem-solving settings, depriving them of skills that are in high demand in the 21st century. As far as feedback is concerned (Table 7), nine out of ten participants (90.3%) encourage students to improve their learning processes and give them guidance and assistance in their learning, and almost as many discuss the assignments with them to help them understand the content better (87.1%). About seven to eight out of ten teachers inform students about their strong points concerning learning (80.6%), discuss with them how to use their strengths to move forward (73.1%), discuss the progress they have made (81.7%), and comment on the students' test performances (69.9%). They also make a list of the weak points (71%) and discuss them in a class conference and, after the assessment, inform students of their weak points and consider together how to improve them (73.1%).
However, fewer teachers (68.8%) give students a chance to report on their own feedback, seek clarification of any unclear issues, and set new and appropriate goals for themselves, and fewer than half give more than a number, grade, or phrase as feedback when returning students' tests (49.5%) or discuss the answers with each student (45.2%). Teachers' competence in providing feedback seems rather high, at least at class level, although in Vogt and Tsagari's (2014) study, training in this field did not appear to be as extensive and there was an expressed demand for more training. The last principle of testing considered is reflection (Table 8), which is rather undervalued. About two-thirds of the participants ensure that students know what they can learn from their assessment (67.7%), encourage students to reflect on their learning processes and how they can improve their performance (67.7%), or ask students to indicate what went well and what went badly in their assessment (63.4%). Only 41.9% ask students how they think they are doing while working on their assignments. These data suggest that implementing alternative assessment practices, especially the use of a diary, could develop learners' ability to reflect (Kofou 2017, p. 357).

Assessing Skills
The second part of the questionnaire concerned the assessment of skills. Regarding reading (Table 9), the focus seems to be on examining students' reading skills integrated with grammar and vocabulary use (71%), on integrating higher- and lower-order skills, taking into account the interactive nature of reading (62%), and on incorporating objective-integrative techniques that pay attention to reading processes such as inference, completion, and construction (61.3%). Only about half of the respondents state that they use third-generation communicative tasks, such as information transfer, multiple matching, and modified cloze tasks (55.9%); simulations, real-life-based activities, and problem-solving activities (50.5%), which test global comprehension and are authentic in purpose; or learner-friendly and learner-centred activities (50.5%) that involve the reader in a re-encoding process. About half use tasks that focus on the process and the 'how' rather than the final product and the 'what' (49.5%), and just over four out of ten report text authenticity (44.1%), i.e., real-life texts written for a real-world purpose, with the source and text type identified to the reader. Thus, in order to promote authenticity in testing and prepare learners for real-life tasks, teachers need training, support through communities of practice, and shared materials that could be used for testing. Regarding authenticity in listening test items (Table 10), about half of the respondents use simulations, real-life-based activities, and problem-solving activities (55.9%), as well as top-down and bottom-up processing (52.7%) so that students understand both overall and specific meanings, or use an authentic source (48.4%) to avoid inauthentic, contrived language.
Just over half include productive tasks in which students are tested objectively using reliable testing techniques (51.6%), and slightly fewer use tasks that focus on the process and the 'how' (47.3%) rather than on the final product and the 'what'. Third-generation testing tasks, such as information transfer, multiple matching, and modified listening cloze tasks, which are learner-friendly and learner-centred and involve the listener in a re-encoding process, are used by 45.2%, while hard-focus, extended listening activities, which require selective listening to gather specific information and listening with a purpose in mind, are included by only 30.1%, and tasks that entail interpretation rather than asking students to identify points and extract specific information are used by only 20.4%. Thus, authenticity in listening tests is present only to some extent and needs to be enhanced so that learners can cope with a variety of authentic communicative situations and feel more confident when encountering different types of listening tasks. By contrast, writing (Table 11) is tested in a more authentic way, since the writer is aware of the writing purpose, the register to be used, and the audience being addressed (90.3%). About 65% to 75% of the participants state that, when assessing writing, the context is predefined (73.1%), the task is communicative, involving the learner in meaningful, forward-looking communicative situations (75.3%), and the learner needs to exhibit useful language skills that may be needed in a real-world context (69.9%). Guidance is also provided (65.6%), with notes given to the learner to guide the content and lexical elements of the language, and a full task environment is given (66.7%).
About 60% of teachers stated that the writer is involved in a purposeful situation (60.2%), adopting a realistic role with a real-life outcome and a realistic output text (which gives the product authenticity), so that the tasks are authentic in purpose. In contrast, a lower percentage of teachers (47.3%) used global and analytic rating scales that avoid impressionistic scoring of essays (which has low reliability), reduce subjectivity, and increase reliability. A similar number used tasks that focus on the process and the 'how', rather than on the final product and the 'what' (45.2%), while fewer ensured text authenticity through real-world sources and genuine input texts not written for language teaching (41.9%). Fewer still included mediation, i.e., relaying information from an authentic Greek text into English (33.3%). It seems that writing is tested in a more authentic way than the other skills, but the need to train teachers in using rating scales also emerges.
Finally, speaking (Table 12) seems to be tested in a partially authentic way. About seven out of ten teachers (73.1%) state that the learner is involved in a purposeful conversation within a given context. Interactive, guided tasks that give weight to communication are used by 73.1%, and the stated aim of the tasks is to elicit authentic language (73.1%) that can be used in non-test situations and real-world tasks. Moreover, during the task, the speakers exchange information and communicate ideas for normal purposes, using spontaneous and unplanned language to negotiate meaning (68.8%), and simulations, real-life-based activities, and problem-solving activities, as well as information-gap techniques, are included (68.8%), with the interactive nature and unpredictability of spoken language being ensured (59.1%). About half believe that the tasks they use focus on the process and the 'how', rather than on the final product and the 'what' (52.7%). Real-world sources and authentic visual input are used by 50.5%, guidance in the form of notes is provided by 49.5%, and global and analytic performance scales are employed to reduce subjectivity and increase reliability by 53.8%. With regard to text authenticity, only 43% say that the input and prompts are not simplified, so that the tasks are authentic in purpose and context.

Categorization of Teachers
The application of the toolkit (see Appendix A) gave the participating teachers the opportunity to find out how authentic an assessor each of them is. In particular, they evaluated themselves on a three-point scale (2 points for each YES, 1 point for each TO SOME EXTENT, 0 points for each NO) and then converted their score to a percentage (Part I: score/130 × 100; Part II: score/86 × 100). The categorization broadly followed the patterns of EFL exam certification, in which 60% is the passing score and 90% is a distinction.
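To make the arithmetic concrete, the scoring rule described above can be sketched in a few lines of Python. This is a hypothetical illustration, not part of the published toolkit: the names `raw_score`, `to_percentage`, `POINTS`, `PART_I_MAX` and `PART_II_MAX` are assumptions introduced here, while the point values (2/1/0) and the maximum scores (130 for Part I, 86 for Part II) are taken from the text.

```python
# Hypothetical sketch of the toolkit's scoring rule (names are illustrative).
# Point values and maximum scores are those stated in the article.

POINTS = {"YES": 2, "TO SOME EXTENT": 1, "NO": 0}
PART_I_MAX = 130   # maximum raw score for Part I
PART_II_MAX = 86   # maximum raw score for Part II


def raw_score(responses: list[str]) -> int:
    """Sum the points for a list of checklist responses."""
    return sum(POINTS[answer] for answer in responses)


def to_percentage(score: int, max_score: int) -> float:
    """Convert a raw score to a percentage of the maximum (score / max * 100)."""
    if not 0 <= score <= max_score:
        raise ValueError("score must lie between 0 and max_score")
    # Multiply before dividing so that exact cases such as 65/130 give 50.0.
    return score * 100 / max_score


if __name__ == "__main__":
    answers = ["YES", "YES", "TO SOME EXTENT", "NO"]  # a made-up sample
    total = raw_score(answers)  # 2 + 2 + 1 + 0 = 5
    print(total, round(to_percentage(total, PART_I_MAX), 1))
```

A teacher who answered TO SOME EXTENT to every Part I criterion would, under this sketch, obtain 65 out of 130, that is, 50%.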
The three profiles of assessors were based on the assessment activities and methods that the teachers preferred. The teachers' assessment processes, actions, and activities determine the kind of assessor that each of them is; in other words, the characteristics they share and the differences between them establish the different profiles. In addition, the teachers' beliefs about assessment and how these relate to the practical side of assessing are what led the researchers to distinguish the three models of assessors, ranging from alternative to non-enthusiastic.
In Part I, those who score 91% and above are considered to be Alternative Assessors (TEACHER A). Most teachers of this profile use different types of assessment. Teachers in this category devise their own tasks and tests to help students develop their independence, and they use assessment as a tool for learning and as guidance for the next steps. They may also feel it is very important to use alternative, authentic, and third-generation assessment as much as possible while adhering to the key principles of assessment. They probably prefer language assessment that offers a more realistic and communicative view of the language, for example by replicating real-world processes with genuine, unaltered materials. This is often the kind of assessment they carry out in class, and they may be able to increase its benefits through continuous reflection and by promoting active student involvement.
Those who score 61-90% are considered to be Mainstream Assessors (TEACHER B). Teachers whose scores are close to the average belong to this profile. They may find that they do not fall exactly into either of the other categories (alternative and non-enthusiastic), which makes them a mixture: they tend to combine different approaches at different times, depending on the situation and what they are doing. As a result, they use both test-based and alternative assessments for different purposes. They may sometimes feel, however, that they should be using more authentic forms of assessment, but there are probably reasons that hold them back. Mainstream Assessors need to find more time to learn and to be more self-critical. If they become more aware of the reasons why they avoid using alternative, authentic, and third-generation assessment more often, they may find it easier to address them. Following the principles of testing will also help them increase the benefits of their assessment.
Finally, those who score up to 60% are Non-enthusiastic Assessors (TEACHER C). Teachers in this category use assessment in a less diverse way. They mainly use the information they get from assessment to grade students' performance with more traditional methods, which amounts to collecting information and providing feedback through grading. Assessment for Non-enthusiastic Assessors relies heavily on reproduction and memorization, with a focus on form and accuracy. Perhaps they should be less preoccupied with assessing student knowledge and focus more on learning as well. This score does not mean that they are not good language assessors, but there is always room for improvement; perhaps this is the first time they have thought about the way they assess learners. A good starting point would be to adopt the principles of language assessment and offer a choice of assessment methods. Knowing more about these principles and receiving adequate training can help them become more effective language assessors.
In Part II, those who score 71% and above are considered to be Alternative Assessors (TEACHER A). Teachers of this profile try to involve learners in communicative, real-world, guided tasks or simulated authentic situations that reflect genuine communication.
Teachers of this type take into account the interactive nature of learning by combining authentic, third-generation techniques that focus on both the product and the process and that aim to help learners exhibit language skills useful in a real-world context. As a result, they seek to involve learners in meaningful, forward-looking communicative situations that extend to real-life language use.
Those who score 41-70% are considered to be Mainstream Assessors (TEACHER B). Teachers in this category move beyond traditionally constructed tests by trying to adopt a communicative approach that extends to real-life language use, using a combination of third-generation testing techniques. Mainstream Assessors may find that they use a mixture of testing techniques drawn from all three generations of testing. They might feel, however, that they should be using more authentic, third-generation tasks to assess the students' skills. Taking into account the criteria for assessing the receptive and productive skills will make it easier for them to create authentic tasks that trigger motivation and increase the students' involvement: the more the communicative criteria are met, the better the learners' needs are satisfied.
Finally, those who score up to 40% are Non-enthusiastic Assessors (TEACHER C). Teachers in this category tend to test skills objectively, using non-authentic, disembodied techniques that are not related to the students' real-world needs. These teachers should try modifying tasks to achieve authenticity. The first step is to incorporate third-generation tasks and real-world activities that use authentic, unaltered materials. The tasks could also be significantly improved by providing an authentic context that engages students in meaningful activities of the kind actually needed in the real world. In this way, priority is given to authenticity, with the aim of boosting the students' communicative competence and satisfying their needs.
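The banding for both parts can likewise be expressed as a short Python sketch. It is a hypothetical illustration rather than a procedure described by the authors: the band limits are those given above (Part I: 91% and above, 61-90%, up to 60%; Part II: 71% and above, 41-70%, up to 40%), but the rule of rounding the percentage to the nearest whole number, with halves rounded up, before comparing it with the bands is an assumption, because the published ranges leave fractional values such as 90.5% unassigned. The names `BANDS`, `PROFILES`, `rounded_percentage` and `classify` are illustrative.

```python
# Hypothetical classifier for the assessor profiles (names are illustrative).
# ASSUMPTION: percentages are rounded half up to a whole number before banding,
# since the published bands (e.g. "61-90%", "91% and above") leave gaps.

# part -> (maximum raw score, minimum % for TEACHER A, minimum % for TEACHER B)
BANDS = {
    "I": (130, 91, 61),
    "II": (86, 71, 41),
}

PROFILES = {
    "A": "Alternative Assessor",
    "B": "Mainstream Assessor",
    "C": "Non-enthusiastic Assessor",
}


def rounded_percentage(score: int, max_score: int) -> int:
    """Percentage rounded half up, computed exactly with integer arithmetic."""
    # floor(100 * score / max_score + 0.5) without floating-point error
    return (200 * score + max_score) // (2 * max_score)


def classify(score: int, part: str) -> str:
    """Return the profile letter 'A', 'B' or 'C' for a raw score on Part I or II."""
    max_score, a_min, b_min = BANDS[part]
    if not 0 <= score <= max_score:
        raise ValueError(f"Part {part} scores range from 0 to {max_score}")
    pct = rounded_percentage(score, max_score)
    if pct >= a_min:
        return "A"
    if pct >= b_min:
        return "B"
    return "C"


if __name__ == "__main__":
    for part, score in [("I", 118), ("I", 100), ("II", 34)]:
        letter = classify(score, part)
        print(f"Part {part}, score {score}: TEACHER {letter} ({PROFILES[letter]})")
```

Under this assumption, a Part I score of 118 out of 130 (about 90.77%) rounds to 91% and falls in TEACHER A, whereas 117 out of 130 (exactly 90%) stays in TEACHER B. A different tie-breaking rule would move only the few scores that land between two bands.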

Discussion
The purpose of this research was two-fold. Our aims were, firstly, to investigate Greek EFL teachers' awareness of communicative testing techniques and alternative assessment methods and principles, and, secondly, to promote assessment literacy based on the use of the aforementioned assessment tool.
As for the type of assessor, the research indicated that the majority of teachers are Mainstream Assessors, since alternative methods of assessment are not employed on a regular basis and third-generation assessment techniques are rarely used owing to lack of familiarity. Teachers seem to be positively disposed towards the benefits of alternative assessment, but at the same time they are indecisive because of the practicality that traditional assessment offers (Zarali and Kofou 2020, p. 187). Given this, there is a strong need for teachers to achieve solid assessment literacy. Assessment literacy and training need to be promoted to help teachers modify their assessment practices for the benefit of teaching and learning. This highlights the importance of continuing professional development and training in assessment practices, which is also evident in Vogt and Tsagari's (2014) study across Europe, in which the participating teachers "seemed to perceive a need for training across the whole spectrum" (p. 376) of language assessment literacy. Changes in educational policies that help teachers employ new methods can prompt them to experiment and to make greater use of alternative sources, so as to keep up with developments in the field of assessment.
All in all, this study offered teachers the chance to understand the criteria that should be applied to testing, so that they are better prepared to assess learners and can move away from traditional testing, that is, written tests made up of second-generation activities such as multiple-choice and true/false questions, and summative assessment usually conducted at the end of each unit or semester. The findings may have various implications for the development of assessment literacy, as they may help teachers, policymakers, stakeholders, and researchers proceed with their work in the field. More specifically, as Giraldo (2018) asserts, and as the present study reinforces, to be considered assessment literate, language teachers need the necessary knowledge (for example, theoretical considerations regarding validity and reliability), instructional skills that they can apply to assessment practices, and the ability to design test items for the four language skills and to apply language assessment principles. In this context, the data suggest that appropriate pre-service and in-service training should take place to help teachers employ more effective means of assessment that can lead to student empowerment. Teachers, in turn, should reflect more on the concept of third-generation communicative testing, along with alternative methods, to achieve an effective, more communicative, and lifelike assessment of the students' performance.
Thus, the main contribution of the toolkit presented lies in the fact that it supports and guides teachers on how to use assessments in order to have a positive effect on student learning. It also points out the inextricable relationship between language assessment and teaching, as well as the importance of reflection on the practices and methods used by the teachers. This toolkit can remain an open-access tool that will be available for reflective practice or used as a guide to draw teachers' attention to ways that they could improve their assessment practices. Ideally, it can contribute to the shaping of language assessment literacy in general. To that end, together with the results of the study, it can be the basis of a language assessment training programme.

Conclusions
In summary, assessment literacy and training can encourage teachers to modify their assessment practices for the benefit of teaching and learning, which highlights the importance of giving teachers professional development opportunities. Promoting changes in educational policies to help teachers employ new methods would give 'traditional' teachers the opportunity to incorporate these methods when assessing. As Scarino (2013) claims, assessment-literate teachers can explore and evaluate their preconceptions and become aware of their own framework of knowledge and practices. The ideal result would be to motivate EFL teachers to redefine their roles and to equip them with techniques that enrich their assessment practice. To that end, the Tale project (Tsagari et al. 2018) can further help them reflect on their assessment practices and raise their awareness and level of language assessment literacy (LAL) through an online, self-study training course. Shaping teacher assessment literacy can improve the quality of language education by preparing autonomous learners as well as competent teachers who understand the nature of assessment.
Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Conflicts of Interest: The authors declare no conflict of interest.

LANGUAGE ASSESSMENT FOR TEACHERS
TEACHER SELF-EVALUATION CHECKLIST ON ASSESSMENT
A TOOLKIT FOR TEACHER DEVELOPMENT
This toolkit is based on the results of a research project which investigated the beliefs, perceptions, and practices of EFL teachers regarding assessment. It aims to help teachers design, develop, and critically evaluate tests, as well as to help them reflect on their assessment techniques. It can be used by pre- and in-service EFL teachers to raise their awareness of language assessment and to support them in making informed decisions when assessing students' learning.
The toolkit is divided into two parts. The first part outlines the principles of language assessment, while the second part deals with the assessment of the four skills. Teachers are provided with criteria in the form of checklists with which they evaluate and rate their practices. It can also be regarded as a guide and self-evaluation tool for teachers that can be used several times during the school year to observe and reflect on their development and growth. Thus, teachers will be equipped with useful insights into test evaluation and design so as to be able to design their own tests for assessing learners.
PART I
The first part aims to provide both novice and experienced teachers with input on the basic assessment principles, to help them understand how tests are constructed. Teachers can review or familiarize themselves with fundamental principles of language testing for describing, categorizing, and evaluating published tests, as well as for designing their own tests.

Teacher Self-Evaluation Checklist on Assessment
Directions: Indicate the degree to which you incorporate each criterion below into your current practices.

PRINCIPLES OF LANGUAGE ASSESSMENT
CRITERIA (rate each item 2, 1 or 0)
Authenticity
1. Tasks emphasize the communicative view of the language.
2. The language in the test is as natural as possible.
3. The test items are as contextualized as possible rather than isolated.
4. The rubrics used for the exercises have been contextualized to offer a more realistic and communicative view of the language.
5. Genuine unaltered materials that can actually be found in the real world, not written for a language teaching purpose, have been used.
6. The test topics and situations are interesting, enjoyable, and humorous for the learners, as well as relevant to their age and level.
7. Some thematic organization is provided in the test, such as through a storyline or episode.
8. Assessment tasks represent, or closely approximate, real-world processes with effective tasks which are relevant to real-life situations.
9. The assessment tests the student's ability to apply knowledge to real-life problems.
10. The assessment tasks replicate real-world situations which are useful for everyday life.

1. The testing content is similar to the teaching content.
2. All skills are integrated.
3. Promotes autonomous and self-directed learning.
4. Promotes learner-centred assessment with a clear purpose.
5. Triggers students' motivation and engages and involves them in the activities.
6. Students can express their opinion on how they will be assessed.
7. Assessment is regarded as a shared responsibility.
8. Students are given a choice of assessment tasks.
9. Students are given assessment tasks that suit their abilities.

TOTAL SCORE: ______
Convert your score to a percentage: (score/130) × 100 = ______%

RESULTS
SCORE 91% and above: TEACHER A (Alternative Assessors)
Most teachers in this profile use different types of assessment. If you belong to this category, you devise your own tasks and tests to help students develop their independence. You use assessment as a tool for learning and as guidance for the next steps. You may also feel that it is very important to use alternative, authentic, and third-generation assessments as much as possible while adhering to the key principles of assessment. You probably prefer language assessment that offers a more realistic and communicative view of the language, for example, by replicating real-world processes with genuine unaltered materials. This is often the kind of assessment you carry out in class, and you may be able to increase its benefits through continuous reflection and by promoting active student involvement.

SCORE 61-90%: TEACHER B (Mainstream Assessors)
Teachers whose scores are close to the average belong to this profile. You may find that you do not fall exactly into either the alternative or the non-enthusiastic category, which makes you a mixture, as you combine different approaches at different times depending on the situation and what you are doing. As a result, you use both test-based and alternative assessments for different purposes. You may sometimes feel, however, that you should be using more authentic forms of assessment, but there are probably reasons that hold you back. Try to find more time to learn and to be more self-critical. If you become more aware of the reasons why you avoid using alternative, authentic, and third-generation assessments more often, you may find it easier to address them. Following the principles of testing will also help you increase the benefits of your assessment.
SCORE up to 60%: TEACHER C (Non-enthusiastic Assessors)
Teachers in this category use assessment in a less diverse way. You may find that you mainly use the information you get from assessment to grade your students' performance with more traditional methods, which amounts to collecting information and providing feedback through grading. Assessment for you relies heavily on reproduction and memorization, with a focus on form and accuracy. Perhaps you should be less preoccupied with assessing only student knowledge and focus more on learning as well. This score does not mean that you are not a good language assessor, but there is always room for improvement. Perhaps this is the first time that you have thought about the way you assess learners. A good starting point would be to adopt the principles of language assessment and to offer a choice of assessment methods. Knowing more about these principles and receiving adequate training can help you become a more effective language assessor.
PART II
The second part deals with the assessment of the four skills. The following checklist shows teachers what to take into account when assessing students' receptive and productive skills. It is divided into four sections, one for each skill, and includes specific criteria that can be used to evaluate tests and to help teachers design their own assessment tasks. Teachers can take these criteria into consideration when exploring the testing of all four skills.

Teacher Self-Evaluation Checklist on Assessing Skills
Directions: Indicate the degree to which you incorporate each criterion below into your current practices.
Scale: 2 = YES, 1 = TO SOME EXTENT, 0 = NO
ASSESSING SKILLS
Reading
CRITERIA (rate each item 2, 1 or 0)
1. Text authenticity: a real-life text, written for a real-world purpose, with the source and text type identified to the reader.
2. The tasks are authentic in purpose.
3. Simulations, real-life-based activities, and problem-solving activities are included.
4. Tasks focus on the process and the 'how', rather than on a final product and the 'what' of the tasks.
5. The tasks test global comprehension.
6. The questions try to integrate higher- and lower-order skills, taking into account the interactive nature of reading.
7. The focus is on examining the student's reading skills, integrated with grammar and vocabulary use.
8. Use of third-generation activities, such as information transfer activities, multiple matching activities, and modified cloze tasks.
9. Use of learner-friendly and learner-centred activities that involve the reader in a re-encoding process.
10. Incorporation of objective integrative techniques that pay attention to processes of reading, such as inference, completion, and construction.
Scoring: 2 points for each YES, 1 point for each TO SOME EXTENT, 0 points for each NO
TOTAL SCORE: ______
Convert your score to a percentage: (score/86) × 100 = ______%

RESULTS
SCORE 71% and above: TEACHER A (Alternative Assessors)
Teachers in this profile try to involve learners in communicative, real-world, guided tasks or simulated authentic situations that reflect genuine communication. If you belong to this type of assessor, you take into account the interactive nature of learning by combining authentic, third-generation techniques that focus on both the product and the process and that aim to help learners exhibit language skills useful in a real-world context. As a result, you seek to involve learners in meaningful, forward-looking communicative situations that extend to real-life language use.

SCORE 41-70%: TEACHER B (Mainstream Assessors)
Teachers in this category move beyond traditionally constructed tests by trying to adopt a communicative approach that extends to real-life language use, using a combination of third-generation testing techniques. You may find that you use a mixture of techniques drawn from all three generations of testing. You might feel, however, that you should be using more authentic, third-generation tasks to assess the students' skills. Taking into account the criteria for assessing the receptive and productive skills will make it easier for you to create authentic tasks that trigger motivation and increase the students' involvement. The more the communicative criteria are met, the better the learners' needs are satisfied.

SCORE up to 40%: TEACHER C (Non-enthusiastic Assessors)
Teachers in this category tend to test skills objectively, using non-authentic, disembodied techniques that are not related to the students' real-world needs. If you belong to this category, try modifying tasks to achieve authenticity. The first step is to incorporate third-generation tasks and real-world activities that use authentic, unaltered materials.
Similarly, the tasks could be significantly improved by providing an authentic context to engage students in meaningful activities that are actually needed in the real world. As a result, priority will be given to the issue of authenticity with the aim of boosting the students' communicative competence and satisfying their needs.