Factor Structure and Validation of the 12-Item Korean Version of the General Health Questionnaire in a Sample of Early Childhood Teachers

The 12-item General Health Questionnaire (GHQ-12) is designed to detect a diagnosable psychiatric disorder and has demonstrated positive psychometric properties in adult populations. Despite these findings, the psychometric properties of the GHQ-12 have hardly been examined with regard to early childhood teachers. This study aimed to examine the factor structure of the GHQ-12 and to assess its psychometric properties vis-à-vis a sample of Korean early childhood teachers. An aggregate of 252 participants completed the Korean version of the GHQ-12 in tandem with other psychiatric measures, including the Patient Health Questionnaire-9 (PHQ-9) and the Beck Depression Inventory (BDI). The resulting data were subjected to confirmatory factor analyses to compare the goodness-of-fit of the previously proposed models of the GHQ-12. The three-factor model comprising anhedonia/sleep disturbance, social performance and loss of confidence was found by the goodness-of-fit indices to fit our study sample excellently. The average variance extracted and all factor loadings exceeded the recommended threshold of 0.50; hence, convergent validity was established. The criterion posited by Fornell and Larcker verified the discriminant validity. The instrument's reliability was supported by adequate internal consistency and composite reliability. This evidence allows the assertion that the GHQ-12 may be deployed as a screening tool for the evaluation of general symptoms of psychiatric disorders in Korean early childhood teachers.


Introduction
Recently conducted studies have consistently demonstrated the importance of the mental health of early childhood teachers [1,2]. It is critical to attend to the mental health of early childhood teachers for several reasons. Mental health problems such as depression and anxiety have become prominent national concerns in South Korea (hereafter Korea) [3,4]. This paper focuses on teachers in early childhood education settings, a professional group that experiences one of the highest levels of job-related stress. Teachers thus represent a vulnerable cohort that is at high risk of developing mental disorders [5,6]. Several researchers have repeatedly identified the common stressors of early childhood practitioners: work overload, time pressure, difficulties with administration or management and the need to manage behavioral problems in children [6][7][8]. Studies have also cited the challenges of dealing with parents who treat preschools as child-minding services and the performance of other non-teaching tasks as additional stressors for this group [9,10]. Further, the Korean public does not fully recognize the professional stature of early childhood teachers: the prevailing perception of this group as low in status is combined with meager remunerations [11]. These elements also contribute to the poor mental health of early childhood teachers. It is widely acknowledged that the mental wellbeing of early childhood teachers significantly influences instructional effectiveness. It also affects the personal growth, emotional development and academic performance of the children in their charge [12,13]. Thus, concerns about the mental health of early childhood teachers assume immense individual and social importance. The increasing scholarly interest in the mental health of early childhood teachers has created a greater demand for valid and reliable research instruments that can appropriately measure the psychological distress of this group. 
Such evaluations are crucial before apt interventions to promote the mental health of this professional category can be planned, implemented and appraised. Our study tested the validity of the General Health Questionnaire-12 (GHQ-12) [14], a widely used instrument, as a measure of the mental health aspects pertaining specifically to a sample of early childhood educators.

Application of the GHQ-12
Several extant instruments measure symptoms indicative of psychological distress or psychiatric disorders. The General Health Questionnaire (GHQ) devised by Goldberg is one of the most widely applied assessments of the severity of symptoms associated with psychological distress [14]. The GHQ-12 index was also originally intended to screen for general (non-psychotic) psychiatric morbidity. The original GHQ comprised 60 items, but abridged versions have been developed and modified (e.g., GHQ-30, GHQ-28, GHQ-20 and GHQ-12). The GHQ-12 is one of the most used of such shorter adaptations. The popularity of the GHQ-12 in comparison to longer versions is attributable largely to its ease of use, brevity, self-reporting format and reliability in generating robust results [15,16]. In fact, the GHQ-12 was adopted by a World Health Organization study screening for psychological disorders in primary care because it was considered the most valid among similar vetting tools [17,18].
Empirical evidence suggests that the GHQ-12 evinces adequate internal consistency and superior sensitivity and specificity [19,20]. The GHQ-12 has been widely applied in multiple settings since its development. It has been utilized in both clinical and nonclinical samples, in different cultures and for different age groups [14,18]. To date, the GHQ-12 has been translated into 38 languages, making it accessible to practitioners and researchers across many parts of the world. Some studies have used the GHQ-12 and attended to its applicability in Korea, and the reliability and validity of the instrument have been demonstrated [21,22]. However, the research subjects of such studies have mostly been members of the general population or university students. The Korean version of the GHQ-12 has not yet been tested with occupational groups such as early childhood teachers. This significant gap in the literature must be addressed for prospective research initiatives. The Korean version of the GHQ-12 would facilitate the research process and allow direct comparison of studies focusing on the mental health of teachers in Korean settings vis-à-vis investigations conducted across different cultures or settings.

Existing Factor Structures of the GHQ-12
A large number of studies have found the GHQ-12 to have favorable psychometric properties among various populations in different countries, including adults in the general population [19,20,22], older adults [23], primary care patients [18], out-patients with psychological disorders [24], pregnant women [25] and adolescents [15].
Despite the encouraging psychometric evaluation of the GHQ-12, several studies have applied exploratory and/or confirmatory factor analyses (EFA, CFA) to query whether the GHQ-12 is unidimensional or multidimensional and subsequently debated the validity of its underlying structure. Goldberg initially developed the GHQ-12 as a unidimensional construct; however, only a few scholars have supported the one-factor latent structure in subsequent empirical studies [26,27]. Conversely, different factor models emerged when researchers investigated the dimensionality of the GHQ, suggesting that the instrument is multidimensional and that it contains two or three clinically meaningful factors. Several alternative multidimensional models have been proposed since then, mainly with two or three factors. The three-factor model proposed by Graetz [28] has received the most empirical support in this context and has later been replicated in confirmatory analyses [29][30][31]. This model comprises the factors of anxiety, social dysfunction and loss of confidence. Notably, one factor encompasses all six positively worded items, and the six negatively worded items are divided into two separate factors. Other studies have also evidenced that the GHQ-12 comprises three dimensions but have termed the factors differently from Graetz. For example, Worsley and Gribbin's [32] EFA study produced three dimensions (anhedonia/sleep disturbance, social performance and loss of confidence) with several cross-loadings. Martin [33] used CFA and found support for a three-factor solution, labeling the dimensions as self-esteem, stress and successful coping.
Nonetheless, the instrument's two-factor model, which groups the six negatively worded and six positively worded items into two factors, has also been sustained by studies based on EFA [34][35][36]. However, one of the problems with both the two- and three-factor models involves the separation of negatively and positively worded items into separate factors. As such, the question remained whether these factors represented substantive meaning or whether they only denoted artifacts of a response style associated with the positive and negative wording of the items, the so-called method factor. Responding to this challenge, later studies have attempted to model wording effects for the negatively worded items in confirmatory factor models. Hankins' [37] pioneering work conducted on an English sample found that the unidimensional model, with correlated errors on the negatively worded items, was more apt than both the two-factor (positively and negatively worded items) and three-factor models. The studies conducted by Li [38] and Aguado et al. [39] also discovered that the unidimensional model, including its wording effects, was a better fit than Graetz's three-factor model. Nevertheless, no consensus has been achieved about the validity and utility of these multidimensional models, primarily Graetz's. These models have been questioned because of the high degree of correlation between factors.
Some studies in Korea have performed EFA and CFA on the Korean version of the GHQ-12 to examine the psychometric properties of this measure. The following results have been reported: (1) the GHQ-12 demonstrated adequate internal consistency [21,22]; (2) the EFA revealed a two-factor structure [22]; (3) comparisons of single-factor, two-factor and three-factor models using CFA have found that the three-factor model fit the structure of the scale [21]. However, no evidence currently exists to posit that GHQ-12 is suitable for use with early childhood teachers in Korea. Investigations conducted by Park et al. [22] and Lee et al. [21] evinced the good psychometric properties of the Korean version of the GHQ-12; however, at least two limitations currently prevent its use in the context of scholarship. First, Park et al.'s study included the general adult population, whereas Lee et al.'s examination encompassed university students; our teacher sample may differ in important ways from these two distinct samples. Second, the mean age in the university student sample studied by Lee et al. was 20.2 years (age range 18-28 years), a span that is much more limited than the age range represented by the teachers participating in our study. Notably, it would be erroneous to translate the psychometric findings attained from the general adult population and from university students to specific teacher populations that teach and nurture young children. These deficiencies signify that the extant studies do not sufficiently validate the applicability of the GHQ-12 to early childhood teachers. Psychometric properties of measures must be examined in new populations to ensure that they function in manners similar to the original instrument.

Significance of the Study
Several gaps in the existing scholarship must be addressed when the available research on factor structures of the GHQ-12 is considered. The existing studies assessing the factor structure of the GHQ-12 have yielded inconclusive results. Hence, it is still necessary to verify the factor structure of the GHQ-12. It is true that validated mental health measurements are required to screen and investigate the effects of interventions on early childhood teachers; however, the selection of the most appropriate measure for a specific application depends on several factors. Facets to consider could include study sample characteristics, practical issues such as respondent burden, mode of administration, the need for validated language translation and the psychometric properties of the instrument. Psychometric properties of instruments such as the GHQ-12 can vary among different populations and cultural groups [16]. Hence, a systematic assessment of the instrument's psychometric properties is mandated before the instrument is employed and widely used on a specific population. Further, it is important in practical terms to identify whether the psychometric properties of the Korean version of the GHQ-12 are apt for use with early childhood teachers for whom the identification of efficient measures of mental health is especially important.
As concerns increase about early childhood teachers' job-related stress, the need for brief instruments that efficiently evaluate symptoms of mental health disorders also increases. The GHQ-12, then, might be a particularly useful measure with this population. Distinguishing the psychometric properties of the GHQ-12 could inform health professionals with respect to the appropriate design of prevention programs pertaining to mental disorders that target early childhood teachers who potentially suffer, or are already suffering, from psychological distress. In Korea, no information is available on the GHQ-12's psychometric properties with early childhood teachers. Thus, the current study aims to examine the psychometric properties of the GHQ-12 among early childhood teachers through an evaluation of its measurement model validity by CFA and to demonstrate preliminary evidence of convergent and discriminant validity of the GHQ-12. To date, only one psychometric study has assessed the convergent validity of the GHQ-12 [40] and evaluated its unique correlations with similar psychiatric instruments such as the Patient Health Questionnaire-9 (PHQ-9) [41] and the Beck Depression Inventory (BDI) [42]. The current study also used the PHQ-9 and BDI to confirm whether the instruments' measurements of symptoms of psychiatric disorders were complementary rather than duplicative.

Participants and Procedures
Ethical approval to conduct this study was obtained from the Institutional Review Board of Woosong University in Korea (Protocol Code: 1041549-201006-SB-103). Our study employed non-random purposive sampling. Thus, our participants comprised early childhood teachers charged with the nurture and instruction of children aged 0-5, who were recruited from daycare centers. Elements of this population were selected arbitrarily and in accordance with certain characteristics; thus, non-random sampling did not allow the estimation of sampling errors. There is no statistical method of assessing the validity of the results obtained from non-random samples.
After receiving ethical approval, the pilot study was conducted with four teachers from two childcare centers who agreed to participate in the study. First, the principal investigator visited the childcare centers in person to explain the purpose of the study, construction of the study scale and method of response. The responses were recorded using a survey tool provided by Google. Questions were added that concerned the time required to complete the survey, whether any questions were difficult to understand or ambiguous in meaning and whether there was any inconvenience in using the system. It was revealed that both understanding and recognizing the questions were not difficult and that the time required to complete the survey was approximately 20 min.
After the pilot study, the main survey was conducted with teachers working at childcare centers. Data were collected using a free web-based survey tool provided by Google. In more detail, participants were recruited through online postings on the Korean national early childhood teachers' community website, which only certified early childhood teachers can access. Postings described the study's purpose and directed those interested to an online-survey link to complete the questionnaire. The participants were informed that participation was voluntary, and return of the completed questionnaire was considered informed consent. The completed questionnaires were automatically submitted to the researcher. Owing to the possibility of duplicate respondents or a reduced survey response rate, which is possible when the survey period is too long or too short, the survey period was set to 10 days from April 1, 2020 to April 10, 2020. A message that requested respondents to complete the survey sincerely and emphasized the advantages of anonymity and flexibility of response time was also sent. In total, 230 copies of the questionnaire were collected, of which 225 were utilized for the final analysis; five with unreliable responses were excluded.

Measures
The General Health Questionnaire (GHQ-12) [14] is a self-report measure for detecting psychiatric disorders in the general population within community and non-psychiatric clinical settings. The questionnaire contains 12 items, each scored on a four-point Likert scale from 0 to 3. Thus, the total score ranges from 0 to 36, with higher scores indicating worse conditions. The Korean version, translated and validated by Park et al. [22], was used in this study. Cronbach's α coefficient was 0.86 for the overall GHQ-12 in this study.
The Patient Health Questionnaire (PHQ-9) [41] was used as a measure of depressive symptomatology. The PHQ-9 consists of nine items scored on a four-point Likert scale from 0 to 3, resulting in a total score from 0 to 27, with a higher score reflecting more severe symptoms of depression. The Korean version of the PHQ-9 has been validated and demonstrated to exhibit excellent psychometric properties in Korean adults [43]. In this study, the PHQ's Korean version demonstrated a Cronbach's α coefficient of 0.81.
The Beck Depression Inventory (BDI) [42] was used to assess depressive symptomatology's presence and severity based on the past 2 weeks. It comprises 21 items scored on a four-point Likert scale ranging from 0 to 3. Items are summed to provide a total score ranging from 0 to 63, with higher scores indicating more severe depressive symptoms. This study used the BDI's Korean version, verified and validated by Lee and Song [44]. In the current study, its Cronbach's α coefficient was 0.80.
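The scoring rules and the internal consistency coefficient reported for the three instruments can be illustrated with a minimal sketch. It assumes the standard sample-variance form of Cronbach's α; the function names and the toy responses are hypothetical and are not taken from the study data or from the software used in the study.

```python
from statistics import variance


def total_score(responses, n_items, max_item=3):
    """Sum Likert responses after checking their count and range (0..max_item)."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    if any(r < 0 or r > max_item for r in responses):
        raise ValueError("responses must lie between 0 and max_item")
    return sum(responses)


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses (not study data): four respondents, three items.
toy_rows = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 2, 3]]
ghq_max = total_score([3] * 12, n_items=12)  # upper bound of the GHQ-12 range
alpha = cronbach_alpha(toy_rows)
```

The same `total_score` helper would apply to the PHQ-9 (`n_items=9`) and the BDI (`n_items=21`), since all three instruments use items scored from 0 to 3.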

Statistical Analyses
Collected data were analyzed using IBM SPSS Statistics for Windows (version 23) and AMOS (version 20) (IBM Corp., Armonk, NY, USA). CFA was conducted through structural equation modeling, using robust maximum likelihood estimation to assess varied latent structure models of the GHQ-12 because Mardia's test indicated that our data violated the multivariate normality assumption (Mardia's kurtosis = 104.70, p < 0.001). Models examined were based on results from previous research on the GHQ-12's factor structures, specifically, six competing models. Model 1 is the original one-factor structure hypothesized by Goldberg [14], with all 12 items loaded onto a single factor. Proposed by Andrich and Van Schaubroeck [34], Model 2 is a correlated two-factor structure with six negatively worded items loaded onto one factor and six positively worded items loaded onto another. Model 3 is a unidimensional model with a method factor specifically for the negative items suggested by Hankins [37]. Suggested by Graetz [28], Model 4 is a correlated three-factor model consisting of anxiety and depression (4 items), anhedonia and social dysfunction (6 items) and loss of confidence (2 items). Postulated by Martin [33], Model 5 is also a correlated three-factor model in which three latent variables are represented by cope (4 items), stress (3 items) and depression (5 items). Finally, Model 6 was reported by Worsley and Gribbin [32], who also proposed three factors: anhedonia and sleep disturbance (2 items), social performance (6 items) and loss of confidence (4 items).
To determine whether models differed significantly, chi-square difference tests were used. To evaluate convergent validity, Pearson's r was used to test the associations between the GHQ-12 and criterion instruments (i.e., PHQ-9 and BDI). Convergent validity was also assessed through an assessment of item factor loadings and their statistical significance, followed by an assessment of factor-related average variance extracted (AVE). Convergent validity was indicated by an item factor loading and AVE equal to or greater than 0.50 [49]. Discriminant validity was assessed by adhering to the procedures suggested by Fornell and Larcker [50]. Discriminant validity is assured if the square root of the AVE of each construct is greater than its correlations with any other construct in the assessed model. The internal consistency was computed using Cronbach's α and composite reliability (CR) scores for each of the suggested factors of the model. Cronbach's α values of 0.70 and above are generally considered acceptable [51]. CR values between 0.60 and 0.70 are deemed satisfactory; however, the value must be higher than 0.70 at more advanced stages [50].

Table 1 displays the GHQ-12's overall and individual item scores. A mean score of 21.05 (SD = 5.03), higher than the cutoff point of 12, was obtained. Items with the highest mean scores (more than 2.30) were 1 and 5. Item 5 was notable for the highest score, indicating that the majority of respondents felt they were under strain. Moreover, separate mean scores for males and females were not calculated because the sample included only nine males.

Table 2 shows the competing models' goodness-of-fit indices. The overall fit of the six models was examined across the entire sample using a variety of fit indices. The results revealed that all models except Goldberg's single-factor model and Martin's three-factor model showed acceptable fit.
However, the evaluation accomplished using the stated model fit indices disclosed that Worsley and Gribbin's three-factor model achieved the best fit, demonstrating highly satisfactory suitability across all model fit indices (Figure 1).

The AIC statistics further confirm the superior fit of Worsley and Gribbin's three-factor model (Model 6), as the AIC is 100.04, which is lower than the rest of the models tested in the study. Moreover, χ² difference tests revealed that Model 6 had significantly better fit to data than Model 1 (χ²(22)

Overall, results demonstrated that all two- and three-factor models, as well as a model with a method factor, excepting those of Goldberg and Martin, have acceptable fit. Nevertheless, evaluation of indices revealed Worsley and Gribbin's three-factor model as the best because it demonstrated highly acceptable fit across all indices.

Convergent Validity
The relationship between the three GHQ-12 subscales and total GHQ-12, PHQ-9 and BDI scores was obtained through Pearson's correlations. Table 3 displays moderate correlations among total scores of the GHQ-12, PHQ-9 and BDI. GHQ-12 subscales and total scores of the GHQ-12, PHQ-9 and BDI were also moderately correlated. Lastly, the GHQ-12 total score correlated moderately with subscales of anhedonia and sleep disturbance, social performance and loss of confidence. All correlation p values were less than 0.001. Further, convergent validity was satisfactory, with all factor loadings exceeding 0.50. The factor loadings of all items were significant and ranged from 0.56 to 0.82. The AVE of all constructs also surpassed 0.50, indicating sufficient convergent validity (Table 4).

Table 3. Correlations between the GHQ-12, its subscales, PHQ-9 and BDI.

Table 5 exhibits the square roots of the AVE for all three subscales. Our results confirm that discriminant validity was achieved, as all square roots (diagonal values in bold) were higher than the off-diagonal values representing the correlations among constructs. Hence, the results support the discriminant validity of the instrument.
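The AVE threshold and the Fornell and Larcker criterion described above can be sketched as follows. This is a minimal illustration that assumes standardized loadings; the factor labels, loadings and inter-factor correlations are hypothetical placeholders, not the values reported in Tables 4 and 5.

```python
from math import sqrt


def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings of one factor."""
    return sum(x * x for x in loadings) / len(loadings)


def fornell_larcker_holds(ave_by_factor, correlations):
    """True if sqrt(AVE) of both factors exceeds |r| for every factor pair."""
    for (a, b), r in correlations.items():
        if min(sqrt(ave_by_factor[a]), sqrt(ave_by_factor[b])) <= abs(r):
            return False
    return True


# Hypothetical standardized loadings and factor correlations (not study values).
loadings = {"F1": [0.8, 0.7], "F2": [0.6, 0.8, 0.7], "F3": [0.75, 0.75]}
correlations = {("F1", "F2"): 0.45, ("F1", "F3"): 0.30, ("F2", "F3"): 0.50}

ave = {name: average_variance_extracted(vals) for name, vals in loadings.items()}
discriminant_ok = fornell_larcker_holds(ave, correlations)
```

In this sketch the criterion fails as soon as any inter-factor correlation reaches the smaller of the two square-rooted AVE values, which mirrors the comparison of diagonal and off-diagonal entries in a table such as Table 5.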

Internal Consistency
Cronbach's α ranged between 0.42 (for anhedonia/sleep disturbance) and 0.85 (for social performance). The CR values exceeded the recommended threshold of 0.60 for all three subscales (Table 4).
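For readers unfamiliar with composite reliability, a minimal sketch of one common formulation is given here. It assumes standardized loadings and uncorrelated error terms, so that each item's error variance is one minus its squared loading; the function name and the two loadings are hypothetical and do not reproduce the study's estimates.

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    # With standardized loadings, each error variance is 1 - loading^2.
    error_variance = sum(1 - x * x for x in loadings)
    return squared_sum / (squared_sum + error_variance)


# Hypothetical two-item factor (not study values).
cr_two_items = composite_reliability([0.8, 0.7])
```

Because CR weights items by their estimated loadings rather than treating them as equally related to the factor, it can differ from Cronbach's α for the same subscale, particularly when a factor has very few items.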

Discussion
This study examined the psychometric properties of the Korean version of the GHQ-12. To the authors' best knowledge, this is the first study in Korea that has attempted to examine the GHQ-12's factor structure using CFA and its psychometric properties with a sample of early childhood teachers, an occupational group particularly vulnerable to mental health problems. Thus, the current research contributes to past research by examining the structure of the GHQ in another vulnerable occupational group.
Our CFA findings suggested that all models exhibited an RMSEA of less than 0.08. However, our overall results revealed that Andrich and Van Schaubroeck's [34] two-factor model, Hankins' [37] model including an artifactual factor that encompassed all the negative items, Graetz's [28] three-factor model and another three-factor model proposed by Worsley and Gribbin [32] were the only models to evidence a good fit. These models also revealed acceptable model fit across other indices. Although the GHQ-12 was originally developed as a unidimensional structure, numerous other two- and three-factor structures have been identified and, thus, researchers have reached no consensus regarding its dimensionality or factor structure [15]. In this study, Worsley and Gribbin's [32] three-factor model, which was initially described in a cross-sectional community sample of Australian adults, provided the best fit to data with three factors of anhedonia and sleep disturbance, social performance and loss of confidence. Aloba et al. [15] confirmed this finding with Nigerian adolescents. We also found low to moderate correlations among the three factors, reflecting a low amount of covariance, thus further supporting this three-factor model as best explaining psychological distress in our sample.
As for convergent validity, the Korean version of the GHQ-12 total score showed moderate correlations with the three subscales of anhedonia and sleep disturbance, social performance and loss of confidence, consistent with results from the current CFA and implying that its total score can measure general distress in this population. Notably, moderate positive correlations of the three subscales with the PHQ-9 and BDI corroborate these subscales' associations with mental health problems. Although the correlational strength of the total GHQ-12 with its three subscales and the other similar measures was moderate, the directions were all as expected. Therefore, the convergent validity between the GHQ-12, PHQ-9 and BDI was moderate, consistent with the GHQ-12's design as a measure of symptoms of mental distress and minor psychiatric morbidity. Similar findings have been reported by Martin et al. [40], who also found moderate to strong associations between the total GHQ-12, PHQ-9 and BDI in community-based samples in Germany. The convergent validity of the GHQ-12 was also indicated by adequate factor loadings and acceptable AVE values. For discriminant validity, AVE values for the subscales were higher than the corresponding r² values. Hence, the discriminant validity for each subscale was confirmed.
Cronbach's α coefficient values for this study indicated adequate total internal consistency for the GHQ-12 and for two of the subscales. Other studies involving non-clinical and clinical adult samples in Germany, China, India and Iran have similarly reported Cronbach's α coefficients ranging from good to excellent [18][19][20]23]. In sum, these findings suggest that the GHQ-12 demonstrates satisfactory internal consistency across varied populations and languages. However, the anhedonia/sleep disturbance subscale alone showed an internal consistency lower than the recommended value. This finding is aligned with Aloba et al. [15], who also replicated the three-factor model developed by Worsley and Gribbin. In Aloba et al.'s study, Cronbach's alpha values ranged from 0.60 to 0.69, a level that is deemed unsatisfactory. Another study that discovered a three-factor model, however, did not calculate the Cronbach's alpha of the different subscales because factors 2 and 3 each comprised only three items [29]. The low internal consistency of this subscale probably resulted from the fact that the anhedonia and sleep disturbance factor comprises only two items. This possibility was tested by computing composite reliability indexes, an action that has been recommended for the generation of better estimates of true reliability in testing subscales than is possible through the coefficient alpha [52]. The estimates of true reliability obtained in our study through CR were, on average, better (larger) than corresponding coefficient alpha values for all the subscales.
Our study has an important implication for assessment and diagnosis of mental health problems among early childhood teachers in Korea. Effective preventive and promotional measures are essential to minimize mental disorders' impact on the individual. Therefore, a valid instrument such as the GHQ-12 can enable clinicians to identify those at increased risk of mental health issues, with early intervention appropriately planned, implemented and evaluated.
Limitations of this study need to be acknowledged. First, the sample size was somewhat small for CFA, and female teachers outnumbered male teachers. Gender imbalance in the early childhood workforce is a longstanding global phenomenon [53]. Extensive research conducted in this domain has discovered that male early childhood educators represented only 1-3% of the aggregate of early childhood practitioners in most Western and non-Western countries [53,54]. Korea is no different, with males denoting only 1% of early childhood teachers [55]. This gender imbalance may be attributed to the fact that early childcare and education have historically been seen as women's work. The widespread cultural belief that women are more nurturing and caring than men hinders men who may want to make a career in early childhood education [56]. Future evaluations of the GHQ-12 would benefit from larger samples with more male respondents even though women are considerably overrepresented in the teaching profession, especially in early childhood education settings.
Next, early childhood teachers belong to a highly stressed occupational group. Indeed, our participants scored above the cutoff threshold, indicating poor mental health, so these results might not be applicable to the general population or other occupational groups. Importantly, our findings regarding the CFA need to be interpreted with caution. It should be noted that the alpha was too low for the "anhedonia and sleep" factor as it comprised only two items. The elimination of specific factors with low factor loadings or low alpha is a controversial issue because it implies the contemplation of both the positive and negative aspects of reducing the number of items of an established questionnaire [21]. In fact, the removal of items or factors could guarantee that a measure would become more robust and reliable. However, such exclusions could also mean that the newly validated scale cannot be compared to other previously published and currently used versions. Notably, the original 12-item GHQ is the most widely used across different studies and samples despite the potential weaknesses that could arise from the retention of all items of the scale. In addition, it seems apt and relevant for comparative purposes to sustain the same 12-item version. Moreover, the deletion of a specific factor may not represent the correct solution in the case at hand because the three-factor model provided the best fit to the observed data, all standardized factor loadings were significant and exceeded 0.50, and CR values were excellent for all GHQ-12 scale scores. We believe that there is no appropriate indication for item or factor removal.

Conclusions
The current study addressed a gap in the literature by employing CFA to examine the factorial structure of the Korean version of the GHQ-12 with respect to early childhood teachers. The findings of this study confirm that the GHQ-12 is best conceived as a multi-dimensional tool that can assess several distinct aspects of distress rather than a unidimensional or a unitary screening measure. The results of CFA revealed that Worsley and Gribbin's three-factor model offered the best fit to the data. Further, the outcomes of the study confirm the reliability and validity of the Korean version of the GHQ-12 as a tool that can effectively be employed for the assessment of general symptoms of psychological distress in early childhood teachers. It is also worth mentioning the strengths of this study. The current study employed both classical (e.g., based on Cronbach's alpha) and modern (e.g., based on structural equation modeling and CR, an alternative preferred to Cronbach's alpha as a test of convergent validity in a reflective model [57]) methods to evaluate the psychometric properties of the Korean version of the GHQ-12. The convergent and discriminant validity of the GHQ-12 signifies that this scale accurately measured perceived psychological distress in the sample of early childhood teachers. Further, to the best of our knowledge, this study is the first to demonstrate that a three-factor model provided a conceptually acceptable fit to the data in a sample of early childhood teachers in Korea.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Conflicts of Interest:
The authors declare no conflict of interest.