A Corpus-Based Word Classification Method for Detecting Difficulty Level of English Proficiency Tests

Abstract: Many education systems globally adopt an English proficiency test (EPT) as an effective mechanism to evaluate English as a Foreign Language (EFL) speakers’ comprehension levels. Similarly, Taiwan’s military academy developed the Military Online English Proficiency Test (MOEPT) to assess EFL cadets’ English comprehension levels. However, the difficulty level of MOEPT has not been detected to help facilitate future updates of its test banks and improve EFL pedagogy and learning. Moreover, it is almost impossible to carry out any investigation effectively using previous corpus-based approaches. Hence, based on the lexical threshold theory, this research adopts a corpus-based approach to detect the difficulty level of MOEPT. The function word list and the Taiwan College Entrance Examination Center (TCEEC) word list (which includes Common European Framework of Reference for Language (CEFR) A2 and B1 level word lists) are adopted as the word classification criteria to classify the lexical items. The results show that the difficulty level of MOEPT is mainly the English for General Purposes (EGP) type of CEFR A2 level (lexical coverage = 74.46%). The findings presented in this paper offer implications for the academy management or faculty to regulate the difficulty and contents of MOEPT in the future, to effectively develop suitable EFL curriculums and learning materials, and to conduct remedial teaching for cadets who cannot pass MOEPT. By doing so, the overall English comprehension level of EFL cadets is expected to improve.


Introduction
Many education systems globally have adopted the English Proficiency Test (EPT) as an effective mechanism to evaluate English as a Foreign Language (EFL) speakers' comprehension levels [1,2]. It is also used to determine the success or failure of pedagogies and English learning performances [3,4]. Hence, any exploration of EPT can help stimulate EFL practitioners and learners' further interest [5,6].
With the advancement of Information and Communications Technology (ICT), EPT has gradually developed towards digitalization. For example, internationally well-known English certification tests, such as the Test of English as a Foreign Language (TOEFL) and the International English Language Testing System (IELTS), are implemented online [7]. Because the Online English Proficiency Test (OEPT) naturally offers many benefits, including random selection of test items, easy-to-update-and-expand test banks, and the fact that the evaluation process of EFL learners' academic performance is more flexible and ubiquitous (e.g., [8,9]), many educational institutes have developed OEPT for facilitating EFL students' English learning performance (e.g., [10,11]). Since OEPT is used to test EFL students' English proficiency, in order to effectively evaluate their learning performance, the literature has brought forth various derived issues, including whether the test is objective, its reliability and validity, its difficulty tendency, and so on (e.g., [9,12]).
When it comes to tests, most people in education think of the item response theory (IRT), along with test response patterns and theories, test reliability and validity, test construction, item banking, computer adaptive test (CAT), and so on [13][14][15]. IRT embraces two basic concepts: (1) test-takers' performances in test items can be predicted or explained by a single (or a group of) factor(s), which are called "latent traits" or "abilities"; and (2) test-takers' performances and their latent traits can be represented by a continuous increasing mathematical function, which is called the item characteristic curve (ICC) [16]. IRT has achieved remarkable results in the development of test banks and CAT systems (e.g., [13,14,17]). Its advantages include the ability to effectively define the reliability and validity of test items, classify the difficulty of test items, and conduct an integrated analysis of students' ability and test questions [18].
IRT tools or CAT systems rely on statistical experts or in-service teachers who have the relevant statistical expertise (e.g., [13-15,17,18]), so they may not be easily implemented by Teaching English to Speakers of Other Languages (TESOL) faculty. Moreover, establishing the CAT system of OEPT requires considerable resources and funds, which may not be acceptable to most schools. Thus, seeking a low-cost and efficient approach to explore the composition of EPT or even OEPT and defining its lexical difficulty tendency would undoubtedly be a boon for English language learning and teaching.
Language ability is multidimensional and includes three different concepts: (1) standards describe language performance in specific contexts at different proficiency levels; (2) language competence includes lexical and grammatical competence; and (3) language skills comprise listening, reading, speaking, and writing [12]. EPT is used as the standard to assess whether EFL learners can understand English language knowledge (i.e., lexicon and grammar) when it appears in real-world English contexts [19,20]. In addition, EPT is fundamentally composed of a set of lexical items (i.e., vocabularies).
From the perspective of corpus linguistics, the difficulty level and classification of vocabularies have been processed by many linguists. Furthermore, many word lists have been developed for corpus-based analysis to date. For example, West [21] compiled the general service list (GSL), Coxhead [22] compiled the academic word list (AWL), Browne et al. [23] compiled the new general service list (NGSL) as an expanded version of GSL, Chen et al. [24] compiled the function word list as a standard to optimize the word list results of the corpus software. Domestically, the Taiwan College Entrance Examination Center (TCEEC) also compiled a word list that includes two different levels: Common European Framework of Reference for Language (CEFR) A2 and B1 word lists, which serve as the lexical threshold for the English curriculum of K12 education in Taiwan [25].
Vocabulary is an essential element for composing meaningful linguistic patterns of English [11,15,26]. Thus, Laufer [27] noted that the lexical threshold refers to an EFL reader's vocabulary knowledge that is required for understanding a genuine text. Vocabulary knowledge is also considered as a good predictor of English proficiency. Based on a corpus-based approach, Nation [28] found 3000 high-frequency word families that covered 89-95% of lexical items and 5000 high-frequency word families covered 92-98% of lexical items in written texts. Laufer [27] explained that once EFL learners acquire 6000-8000 vocabularies, they will reach 98% lexical coverage-namely, they may understand 98% of the words in an English text.
Based on the prior studies, it can be inferred that high lexical coverage brings a better comprehension level in reading English contexts, and the same holds for listening comprehension and speaking ability [20,26,29-31]. Lexical coverage determines EFL learners' comprehension level of English contexts [32], while EPT is used to test EFL learners' comprehension level. Thus, this research hypothesizes that the lexical difficulty tendency of OEPT can be explored by a corpus-based approach that conducts vocabulary classification and composition analysis as well as vocabulary coverage calculation.
In the present study, a Taiwan military academy has established the Military Online English Proficiency Test (MOEPT) for evaluating EFL cadets' English comprehension levels. The military also emphasizes English acquisition because, in their future careers, cadets will need to operate many U.S.-made weapon systems [33]. Additionally, frequent military exchanges between Taiwan and the U.S. will rely on officers who have adequate English communication skills. Thus, MOEPT is used not only to evaluate cadets but, more importantly, as a referential indicator for English pedagogical goals.
In developing the MOEPT system, the academy hopes that MOEPT can be a test system that has a moderate difficulty level and meets the needs of English for military purposes. Because there is not enough professional manpower or funding to build a CAT system for MOEPT, this study detects the difficulty level of MOEPT with a corpus-based word classification method. In addition, the function word list [24] and TCEEC word list [25] are adopted as standardized measurement tools to classify the lexical items. The results offer important implications for future updates of MOEPT and the development of English pedagogy and learning strategies.

Lexical Threshold
Vocabulary is an essential element in English language communication and is widely considered to be one of the crucial indicators of successful reading comprehension [27,34]. In other words, when EFL speakers acquire extensive vocabulary knowledge, they will find it easier to perform communication-related tasks (i.e., listening, speaking, reading, writing, or even translating). Conversely, weak vocabulary knowledge will decrease English comprehension [20,26,29-31].
Nation [35] stated the following. (1) Vocabulary coverage indicates the proportion of running words that readers are able to understand in English contexts. Using vocabulary coverage to define the percentage of comprehensive vocabularies in texts is based on the theory that there exists a threshold of language knowledge that draws a line to segregate those with or without sufficient lexical knowledge to achieve reading comprehension. Identifying lexical coverage is a prerequisite to understanding the text because different text genres have different customized vocabularies in composition (i.e., high-frequency words used in certain domains); (2) vocabulary size helps with the necessity of understanding diverse texts and the degree of requirement. Word families, in this case, are the critical consideration that must be taken into account when identifying the vocabulary size. Hence, coverage and size indicate whether EFL readers possess certain English capabilities to handle daily reading and communication; (3) Vocabulary level identification is based on different criteria, such as frequency, range, and dispersion, to categorize tokens in textual data. After confirming the vocabulary level of certain texts, the mechanisms, such as wordlist creation, genre analysis, and customary linguistic usage identification, will be triggered.
As Hsu [36] mentioned, the lexical threshold explores the connection between vocabulary and handling of communication-related tasks, analyzing vocabulary from three aspects: vocabulary coverage, size, and level. Hsu [36] compiled corpora of business textbooks (embracing 7,200,000 running words) and business research papers (embracing 7,620,000 running words) as the basis of linguistic analysis. The study subsequently used the RANGE program [37] and British National Corpus (BNC) to calculate the proportions of lexical coverage and to define the corresponding difficulty levels of the words. The results show that if learners possess 5000-8000 word-level comprehension capability in BNC, then they will be able to handle 98% language coverage to facilitate their understanding in learning English business textbooks and research papers.
Durrant [38] used corpus-based approaches to calculate lexical coverage of the academic vocabulary list (AVL) [39] in the British academic written English (BAWE) corpus. BAWE was considered an outstanding corpus of student writing, created and collected by four British colleges. Lexical coverage calculation is based on Durrant's approach [38], which combines many variations (e.g., level, discipline, text type, and item) to process the corpus in detail. The results showed that there were core lexical items that were frequently used by 90% of the disciplines, indicating that once the students can acquire core lexical items, they would be able to achieve greater success in their academic writing.
To summarize, apart from the cognition of semantic rules, vocabulary is undoubtedly the most basic and important cognition unit for EFL and native speakers [20,26,27,29-31,34,35]. Hence, the lexical threshold illustrates that a certain degree of English vocabulary comprehension is required to reach the cognitive needs of English in specific domains and certain education levels. Lexical threshold identification can be calculated by the proportions of word types and running words, and it can be based on certain criteria to identify lexical coverage, size, and difficulty level. Once lexical coverage, size, and difficulty level are successfully verified, the results can be adopted to facilitate pedagogical applications and language acquisition (e.g., [35,36,38]).

TCEEC Word List
TCEEC word list is an important referential basis for the college entrance exam of the English subject and the English curriculum developments of junior and senior high schools in Taiwan. It was released in 2002 and embraced junior high school level (JHSL) and senior high school level (SHSL) word lists, which respectively represent CEFR A2 and B1 difficulty levels [25]. The word list contains 6480 American-style words and mainly includes words' lexemes without their word family. According to Cobuild English Dictionary's statistics, the TCEEC word list includes the 2000 most primary words that occur at over 75% frequency in Anglophone contexts.
The word list is divided into function words and content words, and each group is separated by part of speech (POS). This helps clarify English words' linguistic behaviors in grammar and pronunciation [25]. The TCEEC word list took two years to develop, and its development was based on more than 35 kinds of foreign and domestic English books and the most powerful and useful English word lists. To be included in the word list, a word had to meet the following four criteria: (1) it is used with high frequency; (2) it occurs in Taiwanese or American culture; (3) it can be expanded into its word family or lemma; and (4) it reflects the life experience of Taiwan's K12 students [25]. The goal of developing the TCEEC word list is to benefit the entrance exams, English curriculums, and the editing of English textbooks [25].

The Function Word List
The advantage of Chen et al.'s [24] function word list is that it embraces the most frequent function words in any corpus data, but there are only 228 word types of function words in their research. In addition to excluding the function words, the meaningless tokens and letters should also be removed for enhancing analytical efficiency. Hence, this study expands the function word list [24], which eventually covers 256 word types (see Figure 1). The revised function word list is adopted for measuring the target corpus.

CEFR A2 Level Word List
The compilation of the CEFR A2 level word list is based on the JHSL word list declared by TCEEC in 2002 [25]. The JHSL word list serves as a lexical threshold that has been utilized in Taiwan primary and junior high schools for defining the difficulty, progress, and scope in English curricula. Given that the JHSL word list in Taiwan's English education system is generally considered as a CEFR A2 level word list to define a Taiwanese's English proficiency level, the JHSL word list is thus set as the CEFR A2 criterion in this study.
Approximately 2000 word types exist on the original JHSL word list, but the word list covers only the words' lexemes (i.e., the base form of verbs and the singular nouns). To accurately determine the difficulty level of a vocabulary item, the researcher expanded each word into its word family (e.g., accept, accepts, accepting, accepted) and turned the JHSL word list into a CEFR A2 level word list with 3811 word types (see Figure 2). By doing so, actor and actors, for example, can both be considered at the CEFR A2 level, rather than only actor being defined at the CEFR A2 level while actors becomes an off-list word.

CEFR B1 Level Word List
The compilation of the CEFR B1 level word list is based on the SHSL word list also declared by TCEEC [25]. The SHSL word list serves as the lexical threshold that has been utilized in Taiwan senior high schools for defining the difficulty, progress, and scope in English curricula. As the SHSL word list in Taiwan's English education system is generally considered a CEFR B1 level word list to define a Taiwanese's English proficiency level, the SHSL word list is thus set as the CEFR B1 criterion in this study.
Approximately 4480 word types exist on the original SHSL word list, and just like the JHSL word list, SHSL covers only the words' lexemes. Thus, the reason to expand SHSL words' word family (e.g., bully, bullies, bullying, bullied) is the same as with the principle of the CEFR A2 level word list compilation. After the SHSL word list is expanded as the CEFR B1 level word list, it embraces 5981 word types (see Figure 3).
Appl. Sci. 2023, 13, x FOR PEER REVIEW 6 of 17

Data Processing and Analysis
This study uses AntConc 3.5.9 [40], a popular corpus software widely used in many corpus-based studies, to conduct data processing and analysis. The procedure can be divided into three major steps: (1) remove the overlapping parts between word classification criteria; (2) explore the composition of the off-list words; and (3) calculate word types proportion and lexical coverage of each word classification criteria. Detailed descriptions are given as follows.
(1) Remove the overlapping parts between word classification criteria
After expanding and compiling the function word list [24] and CEFR A2 and B1 level word lists, the researcher discovered overlapping words between the word lists. If the overlapping words are not removed, they will be classified repeatedly, causing errors and inaccuracy in detecting the difficulty level of the target corpus. Before the process began, it was assumed that if a word exists in two word lists of different difficulty levels, the word should be classified into the lower (i.e., easier) difficulty level word list. For example, if "the" exists on the function word list and CEFR A2 level word list simultaneously, then "the" should belong to the function word list.
Under the above condition, the researcher input the CEFR A2 level word list into AntConc 3.5.9 and then input the function word list and set it as a "stoplist" to remove the function words that occurred in the CEFR A2 level word list. Through this process, the CEFR A2 level word list removed 118 overlapping words and decreased its word types to 3692. Similarly, the researcher input the CEFR B1 level word list into the corpus software and then input the word list that integrated the function word list and CEFR A2 level word list and set it as a "stoplist" to remove the function words and CEFR A2 level words that occurred in the CEFR B1 level word list. From this process, the CEFR B1 level word list removed 116 overlapping words and decreased its word types to 5865. The purpose of this series of filtering processes is to assign each word to exactly one difficulty level word list and thereby avoid bias in the analytical results.
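The filtering rule above (an overlapping word stays on the easier list) can be sketched outside AntConc as a set difference applied from the easiest list to the hardest. This is only an illustrative sketch: the function name `remove_overlaps` and the tiny word lists are hypothetical and are not the study's actual lists or tooling.

```python
def remove_overlaps(tiers):
    """Assign every word to the easiest list it appears on.

    tiers: list of (name, words) pairs ordered from easiest to hardest.
    Returns a dict mapping each list name to its de-duplicated word set.
    """
    seen = set()      # words already claimed by an easier list
    cleaned = {}
    for name, words in tiers:
        unique = {w.lower() for w in words} - seen
        cleaned[name] = unique
        seen |= unique
    return cleaned


# Toy lists for illustration only (not the real 256 / 3811 / 5981 type lists).
tiers = [
    ("function", ["the", "of", "and"]),
    ("A2", ["the", "actor", "accept", "and"]),
    ("B1", ["actor", "bully", "of"]),
]
cleaned = remove_overlaps(tiers)
print(sorted(cleaned["A2"]))  # ['accept', 'actor']
print(sorted(cleaned["B1"]))  # ['bully']
```

Processing the lists in order of difficulty plays the same role as the cumulative stoplist described in the text: each harder list is filtered against everything already assigned to an easier one.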
(2) Explore the composition of the off-list words
After the function words, CEFR A2 level words, and CEFR B1 level words are excluded, the remaining words are classified as off-list words. The function words can be recognized as grammar words, and CEFR A2 and B1 level words can be recognized as English for general purpose (EGP) words (i.e., everyday English) with highly frequent usage in Anglophone countries [25]. The off-list words should be fully investigated because they may include EGP-oriented words with a higher difficulty level or words that are for military purposes. Because MOEPT is designed for testing cadets' English competency in military contexts, the researcher hypothesizes that the off-list words should cover many military-purpose words (e.g., terminologies, technical words, and acronyms).
In order to address the off-list words, the researcher followed prior studies (e.g., [33,41]) and used the words' meanings and usages to conduct the vocabulary taxonomy qualitatively. Three raters (including the researcher) conducted classification tasks based on each off-list word's literal meanings and functions in the test items. After the classifications of the off-list words are confirmed, the statistical data of each classification can also be calculated.
(3) Calculate word types proportion and lexical coverage of each word classification criterion
In order to detect the difficulty level, the proportion of each word classification criterion has to be calculated from two aspects: word types proportion and lexical coverage. Word types proportion represents the lexical composition of the target corpus; lexical coverage, as Nation [35] mentioned, indicates the proportion of tokens (i.e., running words). If a certain word has a high lexical coverage in a context, a reader is more likely to encounter that word in the future. Accordingly, identifying the lexical coverage of different word classification criteria helps reveal the overall difficulty level of the target corpus. Furthermore, identifying lexical coverage is also a prerequisite to effectively analyzing the text because different text genres have different customized vocabularies in composition. The calculation of word types proportion and lexical coverage can be carried out by Equation (1).

Definition 1 ([35]). If $L_i$ represents a word classification criterion's word types proportion or lexical coverage, $T_i$ is that criterion's number of word types or tokens in the target corpus, and $T_\sigma$ is the total number of word types or tokens of the target corpus, then

$$L_i = \frac{T_i}{T_\sigma} \times 100\% \quad (1)$$
Taking the function word list as an example, the researcher input the target corpus, then input the function word list and set it as "specific words", and then re-generated the word list on AntConc 3.5.9, which shows the number of word types and tokens of the function words that exist in the target corpus. Next, the researcher used Equation (1) to obtain the word types proportion and lexical coverage values.
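As a rough illustration of this step without AntConc, the sketch below counts how many word types and tokens of a toy corpus fall on a given word list and applies Equation (1), read here as the ratio of the criterion's count to the corpus total, expressed as a percentage. The helper names `proportion` and `profile`, the whitespace tokenization, and the sample sentence are assumptions made for the example.

```python
from collections import Counter


def proportion(part, total):
    """Equation (1) as assumed here: L_i = T_i / T_sigma * 100 (percent)."""
    return 100.0 * part / total


def profile(tokens, word_list):
    """Return (word types proportion, lexical coverage) of word_list in tokens."""
    counts = Counter(t.lower() for t in tokens)
    listed = {w.lower() for w in word_list}
    types_in_list = sum(1 for w in counts if w in listed)
    tokens_in_list = sum(c for w, c in counts.items() if w in listed)
    return (
        proportion(types_in_list, len(counts)),
        proportion(tokens_in_list, sum(counts.values())),
    )


# Toy corpus: 9 tokens, 6 word types; the list matches 2 types and 4 tokens.
tokens = "the cadet reads the manual and the cadet writes".split()
type_share, coverage = profile(tokens, ["the", "and"])
print(round(type_share, 2), round(coverage, 2))  # 33.33 44.44
```

The same function can be called once per word list (function words, CEFR A2, CEFR B1), provided the lists have already been de-duplicated so that no token is counted under two criteria.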

Overview of the MOEPT at Taiwan Military Academy
MOEPT is an online version of EPT developed by the Taiwan military academy located in Kaohsiung to evaluate cadets' English proficiency levels (see Figure 4). Its question types and forms are based on the English comprehension level (ECL) test issued by the U.S. Defense Language Institute English Language Center (DLIELC). The ECL test is an instrument used to test international participants' English language reading and listening proficiency in certain U.S.-sponsored military exercises. It is also adopted as a criterion for recruiting EFL military personnel or determining their eligibility for commissioning, attending training courses, or being assigned certain jobs (https://www.dlielc.edu/testing/ecl_test.php, accessed on 1 November 2022).

Because MOEPT has not yet reached a mature stage of development and its test bank is still being expanded and updated, the reliability and validity of MOEPT have not been tested yet. Hence, in this study, it is expected that the evaluation and analysis of lexical difficulty tendency can effectively be implemented by the corpus-based approach. The composition of question types currently includes 1337 listening comprehension questions, 748 grammar/vocabulary questions, and 97 short reading comprehension passages, which are similar to the U.S. DLIELC-issued ECL test.
Textual data of MOEPT were adopted as the target corpus. Overall, MOEPT included 13 simulation tests and 2182 test items. The compiled target corpus included 5465 word types and 79,526 tokens, and its lexical diversity indicator, the type/token ratio (TTR), was 0.07.
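The reported TTR can be reproduced directly from the two corpus counts given above. The short sketch below is only a worked check; the function name `type_token_ratio` is introduced for illustration.

```python
def type_token_ratio(n_types, n_tokens):
    """Type/token ratio: distinct word types divided by running words."""
    return n_types / n_tokens


# Counts reported for the MOEPT target corpus.
ttr = type_token_ratio(5465, 79526)
print(round(ttr, 2))  # 0.07
```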

Composition of the Off-List Words
When the word types and tokens of the function words, CEFR A2 level words, and CEFR B1 level words were removed from the target corpus, there remained 734 word types and 1833 tokens that did not belong to the aforementioned three word classification criteria and were defined as the off-list words. The word types of the off-list words account for 13.4% of word types of the target corpus, and the tokens of the off-list words account for 2.3% of tokens of the target corpus.
In order to explore the composition of the off-list words, based on prior studies (e.g., [33,41]), the researcher implemented vocabulary taxonomy. Based on each off-list word's literal meanings and functions in the test items, three raters conducted classification tasks. Because the off-list words (N = 734) were not in large quantity, during the process of classification, the raters, through cross-check and discussion methods, distinguished the vocabulary into five major categories: (1) other EGP words (including nouns, verbs, adjectives, and adverbs), (2) Names, (3) Countries/Locations, (4) Military, and (5) Medical.
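As a consistency check, the off-list counts can be recovered by subtracting the counts of the three word lists (as reported in the per-criterion results of this paper) from the corpus totals. The dictionary layout and variable names below are assumptions made for the sketch; the figures are the ones reported in the text.

```python
# Corpus totals and per-list counts as reported in this paper.
corpus = {"types": 5465, "tokens": 79526}
listed = {
    "function": {"types": 237, "tokens": 49190},
    "A2": {"types": 2557, "tokens": 22589},
    "B1": {"types": 1937, "tokens": 5914},
}

# Whatever is not covered by the three lists is off-list.
off_types = corpus["types"] - sum(v["types"] for v in listed.values())
off_tokens = corpus["tokens"] - sum(v["tokens"] for v in listed.values())

off_type_share = round(100 * off_types / corpus["types"], 1)
off_coverage = round(100 * off_tokens / corpus["tokens"], 1)
print(off_types, off_tokens)         # 734 1833
print(off_type_share, off_coverage)  # 13.4 2.3
```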
The definitions, functions, and example sentences were retrieved from the MOEPT database (i.e., the target corpus) through the Key Word in Context (KWIC) interface on AntConc 3.5.9 [40]. This information is described in (1) to (5) as follows.
(2) Names: The function of a name is to give the test questions a situational effect so that test-takers can understand the conditions and timing for properly using vocabularies (2-1, 2-2). Additionally, some test questions also adopted celebrities' biographies or short introductions as contents of short reading comprehension passages (2-3).

Word Types Proportion and Lexical Coverage of Each Word Classification Criteria
(1) The function words
After limiting the word list range by setting the function words as the "specific words" on AntConc 3.5.9 (see Figure 5), it generated 237 word types and 49,190 tokens of the function words that exist on the target corpus (see Figure 6). After calculating their proportions on the target corpus, it was discovered that the word types of function words account for approximately 4% of word types of the target corpus, while their lexical coverage is 62%, indicating that function words are indispensable elements for forming meaningful sentences, and the more function words that are extracted, the more simple sentences there are on the target corpus (e.g., [24,42]).

(2) CEFR A2 level words
After limiting the word list range by setting CEFR A2 level words as the "specific words" on AntConc 3.5.9 (see Figure 7), it generated 2557 word types and 22,589 tokens of CEFR A2 level words that existed on the target corpus (see Figure 8). After calculating their proportions on the target corpus, it was discovered that the word types of CEFR A2 level words account for approximately 47% of word types of the target corpus, while their lexical coverage is 28%.

(3) CEFR B1 level words
After limiting the word list range by setting CEFR B1 level words as the "specific word" in AntConc 3.5.9 (see Figure 9), the software generated 1937 word types and 5914 tokens of CEFR B1 level words in the target corpus (see Figure 10). CEFR B1 level words account for approximately 35% of the word types in the target corpus, while their lexical coverage is 7%.

To summarize, after the composition of the target corpus was comprehensively investigated with the proposed method, the results indicated that, from the perspective of word types, CEFR A2 level words account for the largest proportion (46.79%) of the target corpus, while Medical has the lowest proportion (0.13%). From the perspective of lexical coverage, the function words have the largest lexical coverage (61.85%), while Medical again has the lowest (0.02%) (see Table 1).
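The two measures reported above, the proportion of word types and the lexical coverage (share of tokens), can be illustrated with a minimal Python sketch. It is not the AntConc 3.5.9 procedure used in this study: the simple tokenizer, the `classify_corpus` helper, and the toy word lists and sentence are assumptions made only for illustration.

```python
import re
from collections import Counter

def classify_corpus(text, word_lists):
    """Return {category: (type_share_%, token_share_%)} for a text.

    word_lists maps a category name to a set of lowercase words and is
    checked in insertion order; unmatched words are labeled 'off-list'.
    """
    # Assumed tokenizer: lowercase alphabetic strings, optional apostrophe part.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    freq = Counter(tokens)          # word type -> number of tokens
    type_counts = Counter()
    token_counts = Counter()
    for word, n in freq.items():
        category = next(
            (name for name, words in word_lists.items() if word in words),
            "off-list",
        )
        type_counts[category] += 1  # one more distinct word type
        token_counts[category] += n # all occurrences of that type
    total_types, total_tokens = len(freq), len(tokens)
    return {
        cat: (100 * type_counts[cat] / total_types,
              100 * token_counts[cat] / total_tokens)
        for cat in type_counts
    }

# Toy word lists and sentence (hypothetical, not taken from MOEPT or TCEEC).
toy_lists = {
    "function": {"the", "is", "in", "and"},
    "A2": {"happy"},
    "B1": {"army"},
}
toy_text = "The cadet is in the army and the cadet is happy"
shares = classify_corpus(toy_text, toy_lists)
for cat, (type_pct, token_pct) in shares.items():
    print(f"{cat}: {type_pct:.2f}% of types, {token_pct:.2f}% of tokens")
```

In this toy sentence the four function words make up 4 of 7 word types but 7 of 11 tokens, which mirrors the pattern above: a small set of word types can still have large lexical coverage.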

Overall Difficulty Level of the Target Corpus
Because function words are grammatical words used for composing meaningful sentences, they usually carry little meaning of their own but have large lexical coverage, which may distort the overall difficulty tendency of the target corpus. Thus, to effectively detect the overall difficulty level of the target corpus, the researcher first excludes the function words from the target corpus and then analyzes its composition from two perspectives, word types and tokens, for deeper interpretation.
From the perspective of word types, the difficulty tendency of the target corpus is mainly toward the CEFR A2 and CEFR B1 levels, because these two word lists account for about 85.96% of the word types in the target corpus (see Table 2). This indicates that the settings of MOEPT still mainly belong to the EGP-type test bank. From the perspective of lexical coverage, the difficulty tendency of the target corpus is mainly toward the CEFR A2 level (74.46%), indicating that many test items are composed of CEFR A2 level words. CEFR B1 level words have the second largest lexical coverage (19.49%), while the off-list word classifications occur infrequently (6.04%) in the target corpus (see Table 2).
Lexical coverage reflects the total frequency of word types in the target corpus. In addition, frequency affects the probability of a vocabulary item appearing [11,20,26–29,35]. In this case, when test-takers take MOEPT, they are more likely to encounter CEFR A2 level words. After a series of filtering, computing, and analyzing procedures, this study anatomizes the composition of MOEPT in detail and successfully detects its lexical difficulty level by means of the word classification criteria, thus defining the difficulty level of MOEPT as mainly EGP-type at the CEFR A2 level.
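The effect of excluding function words can be cross-checked with simple arithmetic on the reported figures. The sketch below is an illustrative consistency check, not part of the original analysis. It backs out the implied number of non-function tokens from the CEFR A2 token count (22,589) and its reported coverage (74.46%), then recomputes the CEFR B1 coverage from its token count (5914). The implied total is only an estimate derived from a rounded percentage, and the variable names are hypothetical.

```python
# Token counts and coverage reported in this section.
a2_tokens = 22_589
b1_tokens = 5_914
a2_coverage_pct = 74.46  # coverage after function words are excluded

# Implied size of the corpus without function words (an estimate,
# because the reported percentage is rounded to two decimals).
implied_total = round(a2_tokens / (a2_coverage_pct / 100))

# Recompute the B1 coverage against that implied total.
b1_coverage_pct = round(100 * b1_tokens / implied_total, 2)

print(implied_total)    # 30337
print(b1_coverage_pct)  # 19.49
```

The recomputed CEFR B1 coverage agrees with the reported 19.49%, which suggests that both percentages share the same denominator of roughly 30,300 non-function tokens.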

Discussion
This study used the corpus-based word classification method to detect the difficulty level of MOEPT at a Taiwanese military academy. The analytical results indicated that MOEPT is mainly EGP-type at the CEFR A2 level. This is completely inconsistent with what the academy expected when the MOEPT system was first built. The military academy belongs to the academic system of universities. Under the policy of the Ministry of Education, undergraduate students in Taiwan are expected to have CEFR B2 level English proficiency, because TCEEC has already set the JHSL and SHSL word lists as the lexical thresholds for English curricula in junior and senior high school, respectively. In other words, students who complete senior high school (K12) are expected to have acquired the TCEEC word list (N = 6480) and to be able to learn more academically oriented or CEFR B2 level words in their future higher education [43]. However, the researcher discovered that the current MOEPT includes only 405 word types (760 tokens) that may be categorized as higher CEFR B2 level words (i.e., other EGP words). Moreover, these words account for only 7.75% of the word types and 2.51% of the tokens of the target corpus (see Table 2). This information serves as an important warning for the future update and development of the MOEPT test bank. The lexical difficulty tendency should not remain at the CEFR A2 level; it should be raised to the CEFR B2 level. Once the lexical difficulty level is raised, the difficulty of MOEPT will also rise, as prior research studies have reported that vocabulary knowledge is a good predictor of EPT performance [11,15,20,26,29–31].
From the perspective of English for specific purposes (ESP), there is a growing emphasis on including ESP in curricula to improve English performance in each professional field in EFL countries (e.g., [36,44]). Nguyen [45] mentioned that ESP is a pedagogical approach that provides EFL learners with language skills for specific professional and academic purposes. The cadets will face military exchanges between Taiwan and the U.S. after graduating from the academy, and their English proficiency and communication capabilities will involve English for military purposes. As in other ESP cases, military English embraces uncommon terminology, slang, and lexical bundle usages [33]. The academy's English curricula must therefore also play an important role in teaching military English. However, the results show that there were even fewer military-oriented words than words in other ESP categories. Only 27 word types (70 tokens) can be categorized as military-purpose words, and these words accounted for only 0.52% of the word types and 0.23% of the tokens of the target corpus (see Table 2). To enhance MOEPT's practicality in future real-world military English contexts, the researcher suggests embedding military-oriented words into MOEPT by adding military-purpose dialogues and military-related articles, so that cadets better understand real-world situations and MOEPT is more closely aligned with cadets' future careers.
In terms of cadets' academic performance and English curriculum development, the academy should not consider passing MOEPT the major pedagogical goal; otherwise, it will fall into the predicament of the test dominating the English curriculum. Moreover, passing MOEPT will not substantially improve cadets' English competency, because the overall lexical tendency is toward the CEFR A2 level, which in Taiwan's education system should already be reached after finishing the English curriculum in junior high school. For cadets who cannot pass MOEPT (i.e., whose score is less than 60 points), it is suggested that the academy relocate them to more basic English classes for remedial teaching, because they may have a problem with the basic construction of vocabulary knowledge. Hence, it is necessary to implement English competence-based class grouping (e.g., [46,47]).
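The word-type shares cited in this discussion can be cross-checked against one another with simple arithmetic. The sketch below is an illustrative consistency check, not part of the original analysis. It estimates the number of word types remaining after function words are excluded from the combined CEFR A2 and B1 type counts (2557 and 1937) and their reported joint share (85.96%), then recomputes the shares of the other EGP words (405 types) and the military-purpose words (27 types). The implied total is an estimate based on a rounded percentage, and the variable names are hypothetical.

```python
# Word-type counts and shares reported in the Results and Discussion.
a2_types, b1_types = 2557, 1937
a2_b1_share_pct = 85.96   # joint share of types, function words excluded
other_egp_types = 405
military_types = 27

# Implied number of word types once function words are excluded.
implied_types = round((a2_types + b1_types) / (a2_b1_share_pct / 100))

other_egp_pct = round(100 * other_egp_types / implied_types, 2)
military_pct = round(100 * military_types / implied_types, 2)

print(implied_types)  # 5228
print(other_egp_pct)  # 7.75
print(military_pct)   # 0.52
```

Both recomputed shares match the reported 7.75% and 0.52%, so the reported type proportions are mutually consistent with a base of about 5228 non-function word types.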

Conclusions
Prior research studies have supported that a significantly positive correlation exists between vocabulary knowledge and EPT performance [11,15,26,48,49]. EFL learners' vocabulary size is crucial for comprehending English contexts; that is, better vocabulary knowledge brings higher lexical coverage and a better understanding of English contexts [20,26,29–32]. Based on the lexical threshold theory, this study adopted a corpus-based approach to explore the lexical difficulty tendency of MOEPT, as developed by a Taiwanese military academy. The findings presented have important implications for the future development and updating of MOEPT. Moreover, in terms of English pedagogy, the findings are vital indicators for setting vocabulary goals, selecting materials and lexical tasks, designing lexical syllabi, and even monitoring cadets' vocabulary learning progress. This paper makes the following contributions. First, the word classification process successfully separated the word types and tokens of the target corpus into function words, CEFR A2 level words, CEFR B1 level words, and off-list words in a machine-based way. Furthermore, the off-list words were further divided into sub-classifications, including other EGP words, Name, Country/Location, Military, and Medical, through the qualitative vocabulary taxonomy conducted by the three raters, which allowed the composition of the target corpus to be anatomized more clearly. Second, the corpus-based word classification method used for detecting difficulty level can be replicated and used to explore any EPT corpus data (e.g., entrance exams, TOEFL, and IELTS). Third, the TCEEC word list, including the CEFR A2 and B1 level word lists, is well known within Taiwan's education system [43]. Thus, this difficulty level analysis will allow Taiwanese teachers and students to immediately understand the trend of the EPT that they want to analyze.
To summarize, although MOEPT may not be a representative EPT sample, the proposed corpus-based word classification method is still capable of unveiling the mystery of the difficulty level of any EPT and can be utilized widely.
This paper still has some limitations. For example, unlike IRT, this corpus-based analysis did not take EFL learners' English competency into consideration or analyze the correlation between cadets' English competency and the MOEPT difficulty tendency. Moreover, the reliability and validity of MOEPT were not explored or discussed in this study. Future researchers can build on the present study to develop word lists of different difficulty levels as measurement tools, to conduct more extensive corpus analysis, and to conduct in-depth evaluations of the reliability and validity of EPT by integrating information on linguistic aspects. Other factors that may affect test analysis, including average text length, percentage of complex words, percentage of parts of speech, and readability indices, can also be discussed in the future (e.g., [50,51]). In addition, exploring the relationships between tests, pedagogical design and practice, learning strategy, the washback effect, and an external monitoring mechanism can help ensure that the curriculum does not deviate toward test preparation (e.g., [52–56]).