Validating a Motivational Self-Guide Scale for Language Learners

Abstract: The aim of the current study was to develop and validate a new instrument that taps into learners' self-images as a means of exploring language motivation, which plays a pivotal role in sustaining language learners' efforts. A critical review of the literature revealed that current second language (L2) self-guide instruments in language learning motivation research suffer from either under-representation of the ought-to L2 self others or weak validity of the ideal L2 self own. As multilingualism has become more salient in foreign language education, a measure was needed that better reflects self-imagery that is both plausible and relevant in foreign language contexts. This study used four scales in total to tap into the targeted latent constructs: ideal L2 self own, ideal L2 self others, ought-to L2 self own, and ought-to L2 self others. Two independent samples of Taiwanese college students were recruited. After an item pool was developed through interviews and piloting, each subscale comprised four items, for a total of 16 items for formal model testing. The formal model testing involved three phases. Phase I conducted an exploratory factor analysis on the first sample to explore the possible dimensions. Phase II conducted a series of confirmatory factor analyses (CFA) of the eight hypothesized models on the second sample. Phase III, also based on the second sample, further examined item fit using the multidimensional Rasch model. The results of formal model testing confirmed the validity and reliability of a four-factor correlated model, as well as the fit of the finalized scale items, and thus lent strong empirical support to Higgins's theory regarding the inner structure of future self-guides.
It is suggested that the new L2 self-guide scale can be adopted in future motivational research on L2 learning, including research on languages other than English.


Introduction
Language learning motivation has been the target of a large body of research because of the significant role it plays in the process of language learning. A variety of L2 motivational theories have been proposed over the years, since learning an L2 involves not only mastering a repertoire of linguistic codes but also appreciating and identifying with L2-related elements such as culture, history, and community. Over the past four decades, L2 motivation theory has evolved from an emphasis on the sociocultural milieu to a focus on the dynamic, cognitively driven, and context-situated process of constructing a second language self [1][2][3]. This paradigm shift reflects a recognition of the identity-shaping process in L2 learning, marked by an awareness of learning at a cultural and collective level as well as of L2 self-individuality. Perhaps the most seminal breakthrough in L2 motivation studies has been Dörnyei's [3] theory of the L2 Motivational Self System (L2MSS). The L2MSS is made up of three main components, the ideal L2 self, the ought-to L2 self, and the L2 learning experience, which jointly explain learners' intended learning effort and motivated behavior. The ideal L2 self refers to the image one ideally hopes to attain in the future (wishes or aspirations). It includes the desire to close the distance between one's actual self and ideal self, which serves as a strong impetus to action. The ought-to L2 self is the representation of attributes that one believes one ought to possess (obligations or duties) in order to meet external expectations or to avoid potentially negative consequences. The L2 learning experience concerns motives related to the immediate learning environment and experience (e.g., the teacher, the group, and past learning experiences).
A principal premise in the L2MSS states that a perceived disparity between a learner's present state and future self-guide acts as a catalyst to reduce the perceived divide and reach the ideal end-state.
Unlike Gardner's [1] emphasis on integrative attitudes (i.e., integrativeness), recent research has shifted its focus to the emerging and evolving nature of identity, an increasingly relevant construct in a globalized world where "ownership of English does not necessarily rest with a specific community of speakers" ([4], p. 2). Indeed, with the rise of English as a Lingua Franca (ELF), native speakers are no longer the sole reference group for learners aspiring to linguistic proficiency. Instead, learners who can contribute to a wider, and therefore more inclusive, linguistic community are cast in a more favorable light as legitimate English users. This updated interpretation of L2 learners' motivation is in keeping with Kanno and Norton [5], who advocate an extended notion of learner community that includes groups of people who may not have easy access to, or direct contact with, the wider community. Through imagination, such learners can gain greater access to one another [6]. Kanno and Norton [5] contend that such an imagined community is "no less real than those in which the learners have daily engagement and might even have a stronger impact on their current actions and investment" (p. 242).
Over the past decade, a large body of research has empirically validated the tripartite construct (i.e., ideal L2 self, ought-to L2 self, and the L2 learning experience) as an L2 motivational self system across a variety of educational contexts in different countries [7][8][9][10]. This line of research established future self-guides as powerful and valuable constructs for representing the motivational make-up of L2 learners, among Hungarian learners [11] and later among Asian EFL learners in China, Japan and Iran [12], Japan [13], Korea [14] and Taiwan [15,16]. Most studies agree that the ideal L2 self is a salient predictor of learners' motivated behavior [3,9,10,12,17,18] and is associated with outcomes such as a higher willingness to communicate [19], stronger motivational intensity [20], lower anxiety [9], stronger endorsement of international posture [13], higher self-efficacy [21], and better use of self-regulatory strategies [22,23].
By contrast, the power of the ought-to L2 self to explain learners' intended effort has been equivocal across the literature, with many findings showing little or no meaningful effect on learners' intended effort in language learning (e.g., [7,9,12,14,24-26]). Other researchers argue that the ought-to L2 self can encourage motivation and investment in learning because it is highly sensitive and responsive to the sociocultural context in which learning takes place [15,16,20,27]. For instance, in Asian settings where learners are heavily influenced by Confucianism, which underscores the value of diligence, learners might feel strongly obliged to outperform others and excel academically.
These inconclusive findings may stem from factors related to the applicability of the model across groups or contexts, or from the diverse criterion measures (i.e., intended effort or achievement) used in different studies [28]. The inconsistent predictive power of the ought-to L2 self may also be partly attributed to the generic nature of the original measurement items. Dörnyei [29] and Oyserman and Markus [30] contend that the imagined self needs to be specific and elaborate. It is therefore necessary to develop a measure that better reflects a self-concept that is both plausible and relevant to language learners. Another reason for these inconsistent findings may be the inadequate operationalization of the construct; indeed, many studies report unsatisfactory reliability for the ought-to L2 self scale [11,24,25]. According to Higgins's [31] self-discrepancy theory, both the domains of the self (ideal; ought) and the standpoints on the self (own; other) are important when conceptualizing different types of discrepancies between self-state representations. However, Teimouri [32] contends that in the original item development process, Dörnyei [3] addressed only two aspects of the self, ideal L2 self own and ought-to L2 self others; the other two dimensions of self-guides (i.e., ideal L2 self others and ought-to L2 self own) were not dealt with. Teimouri [32] sought to create and test an improved model based on Higgins's [31] self-discrepancy theory. The factor analysis yielded only three factors: ideal L2 self, ought-to L2 self own, and ought-to L2 self others. Surprisingly, the differentiation between own and other standpoints for the ideal L2 self did not gain empirical support. Teimouri [32] attributed this finding to the idea that the ideal L2 self tends to be highly internalized and, as a consequence, is not easily divided into components that are one's own as opposed to those derived from significant others.
Another attempt at revising the self-guides was made by Papi et al. [23], who tested a modified version of the self-guides specified in the L2 motivational self system [3]. The self-guides were reconstructed based on self-discrepancy theory [31] by taking into account the two central dimensions underlying the different types of self-state representations: domains and standpoints. CFA was used to compare the fit of the two-factor model based on Dörnyei's [3] L2MSS (i.e., ideal L2 self and ought-to L2 self) with Teimouri's [32] three- and four-factor models. The findings suggest that the four-factor model fits better than the alternative models. Importantly, multiple regression results showed that ought-to L2 self own was the strongest predictor of motivated behavior, followed by ideal L2 self own, ought-to L2 self others, and ideal L2 self others. This finding is important because it underscores the value of delineating and fine-tuning a more sophisticated and rigorous model that addresses different types of discrepancies between self-state representations in the L2MSS.
Similarly, according to Csizér [33], there are possible sub-dimensions that are fundamentally distinct owing to different sources of expectations. In other words, although the current conceptualization of the ought-to L2 self can capture some of the motivation that propels learners to meet others' expectations or to avoid possible negative consequences, she argued for the merits of separating the ought-to L2 self into two aspects that properly reflect "the differences regarding the locus of pressures and perceived obligations" (p. 77). Papi et al. [23] also pointed out that little attention has been paid to the potential regulatory differences within the L2MSS. These researchers went a step further, arguing that some of the reservations about the validity of the ought-to L2 self may be attributed to the failure to distinguish own and other standpoints in conceptualizing the L2 self construct. A further argument for a four-dimensional self-based model that incorporates both own and other perspectives is that it offers a way to accommodate self-determination theory [34], which focuses on the motivational capacity of different types of motivation according to the extent to which they have been internalized or remain regulated by external forces.
Indeed, given the problematic findings regarding the validity of the L2 self construct, it is surprising that only a few studies have tested its dimensionality [23,32]. As the review thus far shows, the L2 self-guide construct needs revision to better capture the underlying tenets of the ought-to L2 self, particularly the ought-to L2 self others. Because few empirical studies have validated the factorial structure of self-guides based on Higgins's [31] theory, there is a pressing need for research in this area. To address this gap, the current investigation adopts both CFA and multidimensional Rasch modeling to develop and validate a four-component self-guide model and to gain more insight into the internal factorial structure of the L2 self construct.

Motivational Self System
Drawing on possible selves theory from social psychology [31,35], Dörnyei [3] proposed the L2MSS, which comprises three main dimensions: the ideal L2 self, the ought-to L2 self, and the L2 learning experience. The ideal L2 self refers to the desirable qualities one aspires to embody in the future. The ought-to L2 self is one's projection of the attributes that others believe one should possess to avoid unfavorable outcomes. The L2 learning experience concerns the immediate learning environment (e.g., the effect of teachers, peers, and the course). The L2MSS was proposed as an integrative framework that combines self theory rooted in social psychology with classic motivational notions long embraced in the field of second language acquisition.
In the past decade, a growing number of studies have systematically and empirically validated this tripartite construct across different learner groups in a wide array of learning contexts [7][8][9][10]. This line of research has provided empirical evidence that future self-guides are relevant and robust constructs that can be successfully transferred to L2 learning contexts, not only in the Western settings where the theoretical framework originated [11] but also in Asian contexts such as China and Japan [12,13]. While most studies have found the ideal L2 self and the L2 learning experience to be predictors of learners' intended effort toward learning English, the ought-to L2 self has been shown to play a less salient role in explaining motivated learning behavior (e.g., [7,9,12,25]).

The Problems of Current Measures of L2 Self
Although possible selves are claimed to have the potential to initiate, sustain, and regulate motivated behavior, such an imagined self needs to be specific and vividly elaborated [3,30]. That is, the more precise and well defined one's L2 self is, the more motivational force it is likely to carry. According to both Markus and Nurius's [35] Possible Selves Theory and Higgins's [31] self-discrepancy theory, individuals with more elaborate, specific, and plausible plans for what they would like to become are more likely to invest effort in striving towards their goals. Similarly, social cognition research holds that desired possible future selves do not necessarily carry motivational power; certain conditions must be met before they lead to motivated behavior. For instance, it is not enough for learners to have a vivid image of their desired future self; that self also needs to be part of the individual's self-concept and reasonably attainable [36]. In other words, when developing measures to elicit students' projected self-guides, it is necessary to tap into their here-and-now self and examine the extent to which the projected self is prominent in their working self-concept. To this end, the current work takes learners' everyday reality into account when designing scale items. In this respect, Dörnyei and Chan's [37] revised items proved instructive: they developed statements such as "I can imagine myself writing e-mails in these languages fluently" and "I can imagine myself participating in a debate in these languages" to reflect the language tasks learners may engage in within their learning context. Nonetheless, it is fair to argue that existing measures of the L2 self have failed to address self-concept in a distinct and realistic manner.
For instance, a review of the instruments used to measure the L2 self reveals that questionnaires targeting the ideal L2 self typically ask questions such as "The things I want to do in the future require me to speak English", "If my dreams come true, I will speak English fluently in the future", or "Whenever I think about my future, it is important that I use English" [12]. These items arguably capture, at best, the future-oriented nature of self-guides; their vagueness may not adequately mirror the contextual and educational reality that contemporary language learners face on a day-to-day basis. In other words, most of the items do not reflect the tasks that EFL learners need to tackle every day. Dörnyei [29] likewise suggests that certain conditions must be in place for learners to conceive and nurture self-images; in particular, future self-images should be reinvigorated on a regular basis. Despite this theoretical guidance, most of the item statements used to measure the ideal L2 self in the literature (e.g., [12,32,35]) arguably suffer from validity weaknesses: they measure EFL learners' self-perceived competence in creating self-images (e.g., "I can imagine myself . . . " [32]) rather than their proclivity toward becoming the imagined ideal or ought selves. That an individual can imagine something does not necessarily mean that he or she will act proactively toward becoming that ideal self-image. The statement "I can imagine" is nearly isomorphic with "I can do something", yet most previous measures of the L2 self use the wording "I can imagine" [12,23,32].
This phrasing can be misleading: if a learner "can imagine" himself conversing with foreigners someday, the statement may also imply that he believes he has the ability to do so, which overlaps conceptually with self-efficacy. As Al-Hoorie [28] put it, "If a learner cannot imagine herself mastering English someday, this could additionally mean that she does not believe she can do that (self-efficacy), that she does not want to do that (value of the activity), that she experiences a complete absence of motivation (amotivation), that she does not need to do that (e.g., she has already mastered English), or any other interpretations different learners might conjure up" (p. 736). Because this phrasing is open to multiple plausible and divergent interpretations, there is an urgent need for a refined measure that better reflects the discrepancies between current and future self-conceptions, which are the fundamental underpinning of the L2MSS.
Furthermore, the items that measure the ought-to L2 self may lack sufficient content validity because the conceptual interpretation of the ought-to L2 self has not been borne out in the operationalization of the scale. For instance, an item such as "Studying English is important to me because an educated person is supposed to be able to speak English" [12] concerns the presence or absence of positive outcomes (i.e., education) and is therefore more pertinent to the ideal L2 self than to the ought-to L2 self. Lamb [24] argues that the wording used in questionnaires aiming to capture the ought-to L2 self may lead to conceptual ambiguity. For example, an item such as "It will have a negative impact on my life if I don't learn English", used in Taguchi et al.'s [12] study, does not explicitly specify the standpoint of others. In a similar vein, an examination of the psychometric properties of existing L2 self scales reveals inconsistent reliability for the scales measuring the ought-to L2 self, with some studies reporting low Cronbach's alpha values [11,24,25]. This inconsistency may arise from a mismatch between the items developed and the theories on which they are based. The validity evidence for the ought-to L2 self therefore appears inconsistent. For this reason, the current work argues that it is imperative to develop a new measure of EFL learners' L2 self-images for both the ideal and ought-to self constructs.
Besides the content validity concerns over item phrasing, concerns have also been raised about construct validity, that is, the overall factorial structure of the L2 self construct. Higgins's [31] self-discrepancy theory distinguishes four types of future self-guides: (1) ideal self own, the attributes one aspires to possess; (2) ideal self others, the attributes one believes others would like one to possess; (3) ought self own, the attributes one believes one ought to possess; and (4) ought self others, the attributes one ought to possess because significant others expect one to do so. In the L2 motivational self system, Dörnyei [3] deals only with ideal L2 self own and ought-to L2 self others, while the other two future self-guides, namely ideal L2 self others and ought-to L2 self own, do not enter the equation. Arguably, learners might confuse their ought-to L2 self with the ideal L2 self that others have imposed on them. In addition, it may be challenging to distinguish between a learner's internalized ought-to self and his or her ideal L2 self. Indeed, Kim's [38] study shows that the ought-to L2 self may need to be internalized to a certain extent to play a major role in determining motivated behavior. In other words, L2 obligations may have different motivational impacts depending on how deeply individual learners have internalized them.
Recently, recognizing the inadequacy of present measures of the L2 self, Dörnyei and Chan [37] argued succinctly that " . . . in light of the ambiguities that have surfaced with regard to the ought-to L2 self, it would have been better to apply more elaborate scales targeting different types of external pressures separately - as has been done, for example, in the instrument used by [12] - instead of using a single ought-to self scale" (p. 456). As this implies, a dichotomous L2 self model is incomplete in capturing the full scope of one's future self-guides. To the authors' knowledge, only two studies have been designed to examine the dimensionality of the L2 self construct ([23,32]). This is somewhat unexpected given the controversy surrounding the validity of the construct. Teimouri [32] put forward a revised model of the L2 self through principal component analysis, and the results suggested that the L2 selves consist of three components: a unitary ideal L2 self, ought-to L2 self own, and ought-to L2 self others. That is, the ideal L2 self emerged as a single factor instead of splitting into two substrates (i.e., ideal L2 self own and ideal L2 self others) as theoretically presumed.
Unconvinced by Teimouri's [32] trichotomous model, Papi et al. [23] tested a modified version of the self-guides based on the L2 motivational self system, in which each of the two dimensions (i.e., ideal L2 self and ought-to L2 self) was split by two contrasting standpoints (i.e., own and others' perspectives), following the theorizing of Higgins [31]. Although the CFA supported the overall fit of the proposed 2 × 2 model (i.e., ideal L2 self own, ideal L2 self others, ought-to L2 self own, and ought-to L2 self others), closer scrutiny reveals several instrumental deficits in their study. First, although ideal L2 self own, ideal L2 self others, and ought-to L2 self own each had four items, ought-to L2 self others had only two. Doubts may therefore be cast on the representativeness of the ought-to L2 self others construct. Second, the overall scale may lack the convergent and discriminant validity required to support four distinct yet correlated substrates. For instance, ought-to L2 self others correlated more strongly with its counterpart, ideal L2 self others (r = 0.72), than with its twin construct, ought-to L2 self own (r = 0.62). This indicates a lack of discriminant validity between ideal L2 self others and ought-to L2 self others. Third, in contrast to its high correlation with ideal L2 self others, ought-to L2 self others correlated only marginally with ideal L2 self own (r = 0.10). Likewise, the correlation between ought-to L2 self own and ideal L2 self own was very marginal (r = 0.11). That is, the variance shared between ideal L2 self own and each of the two ought-to L2 self constructs was nearly zero (r² ≈ 0.01), suggesting that ideal L2 self own should not be included as a substrate of the overall operationalized L2 self construct.
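The shared-variance argument above is simple arithmetic: the proportion of variance two constructs share is their squared correlation. The short Python sketch below applies this to the four correlations quoted from Papi et al. [23]; the pair labels are our own shorthand, and squaring a point estimate of a latent correlation is a simplification that ignores its standard error.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance shared by two constructs with correlation r."""
    return r ** 2


# Correlations quoted above from Papi et al. [23]; the pair labels are shorthand
reported = {
    ("ought-to others", "ideal others"): 0.72,
    ("ought-to others", "ought-to own"): 0.62,
    ("ought-to others", "ideal own"): 0.10,
    ("ought-to own", "ideal own"): 0.11,
}

for (a, b), r in reported.items():
    # Format r^2 as a percentage of shared variance
    print(f"{a} vs {b}: r = {r:.2f}, shared variance = {shared_variance(r):.1%}")
```

Squaring makes the contrast stark: roughly 52% and 38% shared variance for the first two pairs, but only about 1% for the two pairs involving ideal L2 self own.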
In fact, this contradicts the theory proposed by Higgins [31] and implies the viability of a three-factor model of the L2 self that excludes the ideal L2 self own substrate. This counter-theoretical outcome may be attributed to the 'can do' phrasing applied uniquely to the ideal L2 self own construct, as argued above.
Clearly, a critical review of the related literature suggests that current L2 self instruments suffer from either the under-representation of ought-to L2 self others or the weak representativeness of ideal L2 self own. In particular, the extent to which the quadro-component construct, comprising ideal L2 self own, ideal L2 self others, ought-to L2 self own, and ought-to L2 self others, can be upheld in a foreign language context remains largely unknown and awaits empirical verification. Hence, the current research project aims to examine the extent to which a quadro-component measurement model of the L2 self can be deemed valid through a methodological cross-check using confirmatory factor analysis and multidimensional Rasch modeling.

Sustainability 2020, 12, 6468 7 of 20

Participants
In total, two independent samples were collected for the current investigation: one for Phase I of the study and the other for Phases II and III. The first sample comprised 362 Taiwanese college EFL learners, and the second sample included 528 participants with the same educational background. They were recruited from intact classes and, at the time of the survey, were taking a required freshman English course that placed more emphasis on reading and listening comprehension than on speaking and writing. All participants were 18 to 20 years old and had learned English throughout junior and senior high school. None of them were English majors. Before entering university, they had learned English in school as a general subject (like math or Chinese), primarily because English ability is tested on almost all university entrance examinations in Taiwan.

Phase I: Exploratory Factor Analysis (EFA) and Reliability Analysis
In total, this study involved four scales to tap into the targeted latent constructs: ideal L2 self own, ideal L2 self others, ought-to L2 self own, and ought-to L2 self others. The items for the four dimensions were generated from two distinct sources: (1) the typical item phrasings adopted in prior empirical studies, and (2) a focus group interview with seven college learners who had more than 10 years of English learning experience. From the first source, we adopted four item phrasing formats for the corresponding L2 self constructs: (1) "I imagine a day . . . ", (2) "I imagine myself . . . ", (3) "I think I should be able to . . . ", and (4) "People around me think that . . . " [12]. For the second source, the focus group participants were asked the following four questions: (1) "In what ways do you think that learning English can change your view of the world?", (2) "In what ways do you think that learning English can help you connect to the world?", (3) "Have you experienced some sort of pressure or expectation from your friends, teachers or family members regarding English learning? If yes, would you please share some stories with us?", and (4) "Now, let's talk about yourself. Have you set up any personal goals or aims for yourself in English learning? If yes, what are they?" Sample items were then written on the basis of genuine verbatim quotes extracted from the interview data to strengthen the content validity of the L2 self scale. That is, the items bore a direct link to the language tasks that learners may perform in general language learning settings and thus pointed to the practical and functional utility associated with authentic language use.
The pilot item pool comprised 22 items in total, categorized into the four-component factor structure theorized by Higgins [31]: ideal L2 self own (5 items), ideal L2 self others (6 items), ought-to L2 self own (5 items), and ought-to L2 self others (6 items). However, the dimensional space underlying the 22 items still needed to be explored and confirmed empirically. To investigate the possible dimensions behind the 22 items, the items were subjected to Principal Axis Factoring (PAF) with Varimax rotation, using the R package psych. An inspection of the locations and distributions of the rotated factor loadings showed that the items loaded on four latent factors with eigenvalues greater than 1. Six items were removed because of low factor loadings (< 0.3), leaving 16 items.
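The extraction-and-pruning step can be sketched in Python with NumPy. This is an illustrative re-implementation of iterated principal axis factoring followed by a raw varimax rotation, not the psych package's code; implementations such as R's `stats::varimax` apply Kaiser normalization by default, so loadings from this sketch may differ slightly. The simulated data and the `weak_items` helper (which applies the 0.3 cutoff) are hypothetical.

```python
import numpy as np


def principal_axis(R, n_factors, max_iter=500, tol=1e-6):
    """Iterated principal axis factoring of a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    # Start from squared multiple correlations as communality estimates
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)          # communalities replace the 1s
        vals, vecs = np.linalg.eigh(reduced)   # eigenvalues in ascending order
        top = np.argsort(vals)[::-1][:n_factors]
        loadings = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        new_h2 = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings


def varimax(L, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation (raw, without Kaiser normalization)."""
    p, k = L.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        Lr = L @ rotation
        target = Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt                      # stays orthogonal
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return L @ rotation


def weak_items(rotated, cutoff=0.3):
    """Indices of items whose largest absolute loading is below the cutoff."""
    return np.where(np.abs(rotated).max(axis=1) < cutoff)[0]


# Demo on hypothetical simulated data: 8 items on 2 factors (loadings 0.7)
# plus a ninth item with no common variance at all
rng = np.random.default_rng(0)
n = 2000
pattern = np.zeros((9, 2))
pattern[:4, 0] = 0.7
pattern[4:8, 1] = 0.7
noise_sd = np.sqrt(1.0 - (pattern ** 2).sum(axis=1))
X = rng.standard_normal((n, 2)) @ pattern.T + rng.standard_normal((n, 9)) * noise_sd
rotated = varimax(principal_axis(np.corrcoef(X, rowvar=False), n_factors=2))
print(weak_items(rotated))  # the unrelated ninth item (index 8) should be flagged
```

Because varimax is an orthogonal rotation, each item's communality (row sum of squared loadings) is unchanged by rotation; only the distribution of loadings across factors shifts toward simple structure.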
To verify the results of the EFA, parallel analysis was also conducted. In Figure 1, the solid line shows the eigenvalues from the actual data, the dotted line shows the eigenvalues from simulated data generated in R, and the dashed line shows the eigenvalues from resampled data. The dotted and dashed lines overlap completely, indicating that the simulated and resampled data point to the same result. Four factors (denoted by triangles) lie above both the dotted and dashed lines. In sum, the results of both PAF and parallel analysis suggested a four-component factorial structure.
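Parallel analysis, as described above, retains the factors whose observed eigenvalues exceed those obtained from random data of the same size. The NumPy sketch below is an illustration under stated assumptions rather than psych's `fa.parallel`: it uses eigenvalues of the full correlation matrix (the principal-component variant of Horn's method), generates simulated data from independent standard normals, builds resampled data by permuting each column independently, and compares against mean random eigenvalues. The demo data are hypothetical.

```python
import numpy as np


def eigenvalues(data):
    """Eigenvalues of the item correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def parallel_analysis(data, n_iter=100, seed=0):
    """Horn's parallel analysis (component-eigenvalue variant).

    Returns the number of leading observed eigenvalues that exceed the mean
    eigenvalues of both simulated and resampled data, plus all three curves.
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = eigenvalues(data)
    simulated = np.empty((n_iter, p))
    resampled = np.empty((n_iter, p))
    for i in range(n_iter):
        # Simulated: independent standard normal variables of the same shape
        simulated[i] = eigenvalues(rng.standard_normal((n, p)))
        # Resampled: permute each column separately, keeping each item's
        # distribution while destroying the correlations between items
        shuffled = np.column_stack([rng.permutation(data[:, j]) for j in range(p)])
        resampled[i] = eigenvalues(shuffled)
    sim_mean, res_mean = simulated.mean(axis=0), resampled.mean(axis=0)
    n_keep = 0
    for obs, sim, res in zip(observed, sim_mean, res_mean):
        if obs <= max(sim, res):
            break                  # stop at the first eigenvalue that fails
        n_keep += 1
    return n_keep, observed, sim_mean, res_mean


# Demo on hypothetical simulated data: 8 items, two factors, loadings of 0.7
rng = np.random.default_rng(1)
pattern = np.zeros((8, 2))
pattern[:4, 0] = 0.7
pattern[4:, 1] = 0.7
X = rng.standard_normal((500, 2)) @ pattern.T + rng.standard_normal((500, 8)) * np.sqrt(1 - 0.49)
n_keep, *_ = parallel_analysis(X, n_iter=50)
print(n_keep)  # 2
```

Plotting `observed`, `sim_mean`, and `res_mean` against the eigenvalue index reproduces the kind of solid, dotted, and dashed curves described for Figure 1; when simulated and resampled curves nearly coincide, both criteria give the same count.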
Ideal L2 self own consisted of four items (α = 0.82; sample item: "I imagine a day I blog in English to share my thoughts with friends abroad."); ideal L2 self others also comprised four items (α = 0.83; sample item: "I imagine a day that friends around me will admire me because I converse in English with international friends online."); and four items each were written for ought-to L2 self own (α = 0.86; sample item: "I think that I should be able to easily converse with others in English.") and ought-to L2 self others (α = 0.87; sample item: "People around me think that I should blog in English to share my thoughts with international community online."). Please refer to Appendix A for the specific items of each L2 self measure.
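The α values above are Cronbach's alpha coefficients, α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₓ), where k is the number of items, σ²ᵢ is the variance of item i, and σ²ₓ is the variance of the summed scale score. A minimal Python sketch, assuming a respondents × items array with no missing values (the toy data are hypothetical, not the study's):

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # sample variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed score
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)


# Toy example (hypothetical): three respondents answering two items
print(round(cronbach_alpha([[1, 1], [2, 3], [3, 2]]), 3))  # 0.667
```

For a four-item subscale such as those above, the input would be an n × 4 array of Likert responses; alpha approaches 1 as the items covary more strongly relative to their individual variances.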

Phase II: Confirmatory Factor Analysis
To validate the hypothesized quadro-component CFA model, confirmatory factor analysis was performed in LISREL 9.0 to test the extent to which the implied model fit the empirical data. Several criteria were considered for evaluating model fit: the chi-square difference test, the Comparative Fit Index (CFI), the Normed Fit Index (NFI), the Non-Normed Fit Index (NNFI), and the root mean square error of approximation (RMSEA).
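For readers less familiar with these indices, the sketch below computes NFI, NNFI (also known as the Tucker-Lewis index), CFI, and RMSEA from the chi-square of the tested model and of the baseline (independence) model, using their common textbook definitions. It is an illustration, not LISREL's implementation: software may differ in details (for example, RMSEA computed with N rather than N − 1), and the function name `fit_indices` and the input values are hypothetical.

```python
import math

def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """Common fit indices from the model (m) and baseline/independence (b) chi-squares.

    n is the sample size; RMSEA here uses n - 1 in the denominator.
    """
    nfi = (chi2_b - chi2_m) / chi2_b
    nnfi = (chi2_b / df_b - chi2_m / df_m) / (chi2_b / df_b - 1)
    d_m = max(chi2_m - df_m, 0.0)          # model noncentrality estimate
    d_b = max(chi2_b - df_b, d_m)          # baseline noncentrality, never below the model's
    cfi = 1.0 - d_m / d_b if d_b > 0 else 1.0
    rmsea = math.sqrt(d_m / (df_m * (n - 1)))
    return {"NFI": nfi, "NNFI": nnfi, "CFI": cfi, "RMSEA": rmsea}

# Hypothetical values, not taken from the study
results = fit_indices(chi2_m=100.0, df_m=50, chi2_b=1000.0, df_b=60, n=201)
print({name: round(value, 3) for name, value in results.items()})
```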
To validate the viability of the quadro-component CFA model, seven competing models were also specified. The quadro-component CFA model was operationalized as Model g, whereas Model a, Model b, Model c, Model d, Model e, Model f, and Model h served as its competing models. They are described in detail as follows: (a) Model a = 1-factor model, in which all 16 items were assumed to load on the same factor. (b) Model b = 2-factor model Uncorr., in which the items of the two types of ideal L2 self loaded on one factor and the items of the two types of ought-to L2 self loaded on another, with no correlation assumed between the two factors. (c) Model c = 2-factor model Corr., in which the modeling strategy was the same as that of Model b except that the correlation between the two factors was freely estimated. The formulations of Model b and Model c essentially operationalized Dörnyei's [3] dichotomous classification of L2 self-images. (d) Model d = 3-factor model Uncorr., in which the items of ideal L2 self were assumed to load on a single factor, whereas the items of ought-to L2 self were assumed to load on two different factors (i.e., ought-to L2 self own and ought-to L2 self others); the three factors were also assumed to be mutually independent. (e) Model e = 3-factor model Corr., in which the modeling strategy was the same as that of Model d except that the correlations among the three factors were freely estimated. The formulations of Model d and Model e were based on the empirical findings of Teimouri [32]. (f) Model f = 4-factor model Uncorr., in which the four types of L2 self constructs were modeled as independent factors, each with four items loading on it. (g) Model g = 4-factor model Corr., in which the correlations among the four L2 self-factors were freely estimated.
The formulations of Model f and Model g were based on the theorizing of Higgins [31].
(h) Model h = 2nd-order 4-factor model Corr., in which the modeling strategy was the same as that of Model g except that the four first-order factors were modeled as indicators of two superordinate factors, and the correlation between the two superordinate factors was freed to be estimated.

Table 1 reports the means and standard deviations of the model variables as well as their inter-correlations. Overall, the participants were more inclined to develop ideal L2 self own (M = 18.77, SD = 4.74) and ideal L2 self others (M = 18.23, SD = 5.03) images than ought-to L2 self own (M = 15.68, SD = 4.76) and ought-to L2 self others (M = 14.64, SD = 4.74) images. All the correlation coefficients among the four types of L2 self-images reached statistical significance. The strongest relationship was detected between ideal L2 self own and ideal L2 self others (r = 0.78, p < 0.01). Similarly, ought-to L2 self own and ought-to L2 self others were also strongly correlated (r = 0.72, p < 0.01). Moderate relationships of varying strength were further detected among the four types of L2 self-measures. Notably, ideal L2 self others bore a stronger relationship with ought-to L2 self others (r = 0.58, p < 0.01) than with ought-to L2 self own (r = 0.48, p < 0.01), whereas ideal L2 self own was more strongly associated with ought-to L2 self own (r = 0.52, p < 0.01) than with ought-to L2 self others (r = 0.31, p < 0.01).

However, although Model e fit the empirical data better than its counterpart (i.e., Model d), this did not necessarily mean that a 3-factor construct could ideally represent the inner factorial structure of L2 self. Testing Model f and Model g showed that Model g (4-factor correlated model) had a much better model fit performance (χ² = 364.02, df = 98, p < 0.001, NFI = 0.97, NNFI = 0. …) and the best model fit performance among the first seven competing models (Model a → Model b → Model c → Model d → Model e → Model f → Model g). Thus, a multidimensional, mutually-correlated 4-factor structure of L2 self could be empirically supported.
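As a consistency check on the model specifications, the degrees of freedom reported for Model g (df = 98) and Model h (df = 99) can be reproduced by counting free parameters against the 16 × 17 / 2 = 136 observed variances and covariances. The sketch assumes one marker loading fixed per first-order factor, no correlated errors, and, for Model h, that each first-order factor loads on exactly one of two superordinate factors whose variances are fixed to 1. These identification choices and the helper names are assumptions for illustration, not details stated in the study.

```python
def n_moments(p):
    """Number of distinct variances and covariances among p observed items."""
    return p * (p + 1) // 2

def first_order_df(factor_sizes, correlated):
    """df of a simple-structure CFA with one marker loading fixed per factor."""
    p, m = sum(factor_sizes), len(factor_sizes)
    free = p - m                   # loadings not fixed to 1
    free += p                      # item error variances
    free += m                      # factor variances
    if correlated:
        free += m * (m - 1) // 2   # factor covariances
    return n_moments(p) - free

def second_order_df(factor_sizes, n_superordinate, correlated=True):
    """df when each first-order factor loads on one superordinate factor (variance fixed to 1)."""
    p, m = sum(factor_sizes), len(factor_sizes)
    free = (p - m) + p             # first-order loadings and item error variances
    free += m                      # first-order disturbance variances
    free += m                      # second-order loadings
    if correlated:
        free += n_superordinate * (n_superordinate - 1) // 2
    return n_moments(p) - free

models = {
    "a": first_order_df([16], False),
    "b": first_order_df([8, 8], False),
    "c": first_order_df([8, 8], True),
    "d": first_order_df([8, 4, 4], False),
    "e": first_order_df([8, 4, 4], True),
    "f": first_order_df([4, 4, 4, 4], False),
    "g": first_order_df([4, 4, 4, 4], True),
    "h": second_order_df([4, 4, 4, 4], 2),
}
print(models)
```

Under these assumptions the counts for Models g and h match the reported values, which is consistent with (though not proof of) the specifications described above.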
Finally, on the basis of Model g, we implemented Model h to check whether a higher-order structure of L2 self could be empirically supported. The model fit performance of Model h (χ² = 370.45, df = 99, p < 0.001; NFI = 0.97, NNFI = 0.97, CFI = 0.98, RMSEA = 0.081, [90% CI: 0.071, 0.090]) was nearly identical to that of Model g except for its slightly higher χ² value. Nevertheless, a χ² difference test comparing the two nested models showed that Model h, although more parsimonious (one additional degree of freedom), fit significantly worse than Model g (χ² diff. = 6.43, df diff. = 1, p < 0.05). This suggests that the additional constraints imposed by the 2nd-order structure were not supported and that the 1st-order, 4-factor correlated structure should be retained.
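The χ² difference test used to compare the nested Models g and h can be sketched as follows. Under the usual assumptions (nested models, maximum-likelihood χ²), Δχ² is referred to a χ² distribution with Δdf degrees of freedom. The upper-tail probability is computed here with a plain series expansion of the regularized incomplete gamma function to stay dependency-free; `chi2_sf` and `chi2_difference_test` are hypothetical helper names. With the values reported in the text (Δχ² = 6.43, Δdf = 1), the p-value comes out at roughly 0.011, which is below .05 but slightly above .01.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the power series of the regularized lower incomplete gamma function,
    adequate for the moderate values typical of chi-square difference tests.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a                 # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

def chi2_difference_test(chi2_restricted, df_restricted, chi2_full, df_full):
    """Compare nested models; the restricted model has more degrees of freedom."""
    d_chi2 = chi2_restricted - chi2_full
    d_df = df_restricted - df_full
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)

# Model h (2nd-order, restricted) vs. Model g (1st-order correlated), values from the text
d_chi2, d_df, p = chi2_difference_test(370.45, 99, 364.02, 98)
print(round(d_chi2, 2), d_df, round(p, 4))
```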
As shown in the outcome of Model g (4-factor model Corr.), after the measurement errors of the items were modeled, the strongest relationship was again found between ideal L2 self own and ideal L2 self others (r = 0.86, p < 0.001). Likewise, ideal L2 self own correlated with ought-to L2 self own and ought-to L2 self others to an identical degree (r = 0.53, p < 0.001). However, ideal L2 self others correlated more strongly with ought-to L2 self others (r = 0.58, p < 0.001) than with ought-to L2 self own (r = 0.50, p < 0.001). Furthermore, the relationship between ought-to L2 self own and ought-to L2 self others in the CFA (r = 0.72, p < 0.001) was the same as that in the raw data analysis reported in Table 1. In addition to supporting construct validity, the 4-factor model Corr. thus demonstrated both a strong degree of convergent validity and a clear level of discriminant validity among the four types of L2 self-measures. Finally, the factor loadings of all 16 items ranged from 0.70 to 0.90, suggesting that the items developed for the four types of L2 self-measures were strongly indicative of the specific L2 self construct each represented. The outcome model (Model g, 4-factor model Corr.) is illustrated in Figure 2, whereas the other rival models are shown in Appendix B.

Phase III: Multidimensional Rasch Analysis
After the inner factorial structure was empirically confirmed, we examined item fit on the basis of the 4-factor correlated model. To this end, item fit statistics were calculated using the multidimensional Rasch model, with the weighted mean square (MNSQ) values serving as item fit indices for evaluating the adequacy of the multidimensional model. ConQuest 3.0 was used to generate the fit indices for all 16 items.
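To make the weighted MNSQ (infit) statistic concrete: for each person, the observed item score is compared with its model-expected value, and the summed squared residuals are divided by the summed model variances, so values near 1 indicate fit. The sketch assumes a unidimensional partial credit model with known person locations and item step parameters; ConQuest instead estimates these from the data, and its multidimensional model is not reproduced here. All names, parameters, and the simulated data are hypothetical, and the 0.7 to 1.3 band used in this study serves as the flagging rule.

```python
import math
import random

def category_probs(theta, deltas):
    """Partial credit model: probabilities of categories 0..len(deltas) at trait level theta."""
    # Exponent for category k is the sum over j <= k of (theta - delta_j); 0 for category 0
    exps = [0.0]
    for d in deltas:
        exps.append(exps[-1] + theta - d)
    top = max(exps)  # subtract the maximum for numerical stability
    weights = [math.exp(e - top) for e in exps]
    total = sum(weights)
    return [w / total for w in weights]

def infit_mnsq(responses, thetas, deltas):
    """Weighted (infit) mean square for one item: summed squared residuals / summed model variances."""
    num = den = 0.0
    for x, theta in zip(responses, thetas):
        probs = category_probs(theta, deltas)
        expected = sum(k * p for k, p in enumerate(probs))
        variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
        num += (x - expected) ** 2
        den += variance
    return num / den

# Hypothetical item with six categories (scored 0..5) and known person locations
random.seed(42)
deltas = [-2.0, -1.0, 0.0, 1.0, 2.0]
thetas = [random.gauss(0.0, 1.5) for _ in range(3000)]
model_resp = [random.choices(range(6), weights=category_probs(t, deltas))[0] for t in thetas]
noise_resp = [random.randrange(6) for _ in thetas]

fit_model = infit_mnsq(model_resp, thetas, deltas)
fit_noise = infit_mnsq(noise_resp, thetas, deltas)
for label, fit in (("model-consistent", fit_model), ("random", fit_noise)):
    print(label, round(fit, 2), "within 0.7-1.3" if 0.7 <= fit <= 1.3 else "flagged")
```

Responses generated from the model itself should yield an infit close to 1, whereas random responses inflate the residuals relative to the model variance and are flagged.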
As the previous literature [39,40] has suggested, item fit was assessed by checking the mean square fit statistics, which ideally range between 0.7 and 1.3. As reported in Table 2, the weighted MNSQ values of all 16 final items fell within this acceptable range, indicating that all items showed good fit. Furthermore, we used category characteristic curves to examine whether the items fit the multidimensional Rasch measurement model. Category characteristic curves illustrate the expected probability of a person endorsing each response category, with higher latent trait levels entailing higher probabilities of endorsing higher response categories. Since every scale item had six response categories (Absolutely Not True of Me, Not True of Me, Slightly Not True of Me, Slightly True of Me, True of Me, Very True of Me), every item could ideally have five ordered thresholds, signified by the Delta points along the horizontal axis of latent trait logits. The category characteristic curves of Item 6 serve as an example, with five ordered thresholds and six arc-shaped curves (see Figure 3). In Figure 3, the curve of response category 1 shows that a person with an ideal L2 self others trait level of −5.0 logits had a probability of about 0.97 of selecting the 'absolutely not true of me' category, whereas a person with a trait level of 0.0 logits had a much lower probability (0.03) of selecting the same category in Item 6. As for the curve of response category 4, a person with an ideal L2 self others trait level of about −2.7 logits had a near-zero probability of selecting the 'slightly true of me' category, in contrast with a person at a trait level of about 0.5 logits, who had a probability of approximately 0.4 of selecting the same category.
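Under a partial credit parameterization (assumed here; the exact item model used in ConQuest is not stated in this section), the curves of adjacent categories k − 1 and k intersect exactly at the step parameter δₖ. Ordered thresholds therefore imply that each of the six categories is the most probable response over some interval of the trait, which is the pattern Figure 3 displays. The sketch checks this for hypothetical step values, not the estimates for Item 6.

```python
import math

def category_probs(theta, deltas):
    """Partial credit model: probabilities of categories 0..len(deltas) at trait level theta."""
    exps = [0.0]
    for d in deltas:
        exps.append(exps[-1] + theta - d)
    top = max(exps)
    weights = [math.exp(e - top) for e in exps]
    total = sum(weights)
    return [w / total for w in weights]

def modal_category(theta, deltas):
    """Index of the most probable response category at trait level theta."""
    probs = category_probs(theta, deltas)
    return max(range(len(probs)), key=probs.__getitem__)

deltas = [-3.0, -1.5, 0.0, 1.2, 2.5]  # hypothetical step parameters, NOT Item 6's estimates
ordered = all(a < b for a, b in zip(deltas, deltas[1:]))
grid = [-6.0 + 0.05 * i for i in range(241)]  # trait levels from -6 to +6 logits
modes = [modal_category(t, deltas) for t in grid]

# Adjacent categories k-1 and k are equally likely exactly at theta = delta_k
for k, d in enumerate(deltas, start=1):
    p = category_probs(d, deltas)
    print(f"delta_{k} = {d:+.1f}: P(cat {k - 1}) = {p[k - 1]:.3f}, P(cat {k}) = {p[k]:.3f}")
print("thresholds ordered:", ordered, "| modal categories seen:", sorted(set(modes)))
```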
Furthermore, the curve of response category 6 shows that a person with an ideal L2 self others trait level of −0.6 logits had a near-zero probability of selecting the 'very true of me' category, while a person with a trait level of 2.0 logits had a probability of approximately 0.58 of selecting the same category. In sum, Figure 3 illustrates that the respondents could justifiably and consistently differentiate the response categories as specified by the scale. Figure 4 further displays the item information curves based on the 4-factor multidimensional Rasch analysis. All the curves peaked above 1, suggesting that all 16 items were informative about the dimensions to which they belong. This result can be triangulated with the high factor loadings revealed in the 4-factor correlated CFA model.
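For Rasch-family polytomous items, the item information at trait level θ equals the conditional variance of the item score, so curves like those in Figure 4 can be computed directly from the category probabilities. A minimal sketch assuming a unidimensional partial credit item with hypothetical step parameters and a hypothetical `item_information` helper; the peak height depends on how the thresholds are spaced, so the numbers are illustrative only and are not the study's results.

```python
import math

def category_probs(theta, deltas):
    """Partial credit model: probabilities of categories 0..len(deltas) at trait level theta."""
    exps = [0.0]
    for d in deltas:
        exps.append(exps[-1] + theta - d)
    top = max(exps)
    weights = [math.exp(e - top) for e in exps]
    total = sum(weights)
    return [w / total for w in weights]

def item_information(theta, deltas):
    """Fisher information of a partial credit item: conditional variance of its score."""
    probs = category_probs(theta, deltas)
    mean = sum(k * p for k, p in enumerate(probs))
    return sum((k - mean) ** 2 * p for k, p in enumerate(probs))

deltas = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical step parameters
grid = [-6.0 + 0.1 * i for i in range(121)]
info = [item_information(t, deltas) for t in grid]
peak_theta, peak_info = max(zip(grid, info), key=lambda pair: pair[1])
print(f"peak information {peak_info:.3f} at theta = {peak_theta:.1f}")
```

Summing such item curves within a dimension gives that dimension's test information.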
On the basis of the four-dimension analysis, Figure 5 illustrates that, at every latent trait level, ideal L2 self own was the most likely to be endorsed by the participants of the study, followed by ideal L2 self others, then by ought-to L2 self own, and least by ought-to L2 self others. This finding suggests that the two ideal L2 self-guides (blue and red curves) may carry more weight than the two ought-to self-guides (green and purple curves) in leading L2 learners to exhibit a higher degree of intended effort in L2 learning. Furthermore, the two own-driven self-guides (blue and green curves) were more likely to be endorsed than their counterparts driven by significant others (red and purple curves).

Discussion
Following Higgins's [31] self-discrepancy theory, which proposes four types of future self-guides (ideal self own, ideal self others, ought self own, and ought self others), the current study develops and validates a new instrument with robust psychometric properties for measuring learners' L2 self-guides. Empirically, the outcome of the study complements the L2 motivational self system proposed by Dörnyei [3], which covers only two of these self-guides, ideal self own and ought-to self others. Although the results show that Teimouri's [32] 3-factor correlated model (Model e) fits the empirical data well and outperforms the 2-factor correlated model (Model c), Model e is theoretically less complete and balanced than a 4-factor correlated model [23]. If Model e is adopted, ideal L2 self own has no counterpart (ideal L2 self others), whereas ought-to L2 self own does (ought-to L2 self others). Despite the promising fit of the 3-factor correlated model, the 4-factor correlated model still proved superior when compared via a χ² diff. test, which suggests that ideal L2 self others is a valid component of the L2 self construct. Specifically, the present study contributes to the literature by reconceptualizing the self-guides in the L2MSS on the basis of Higgins's [31] theory of self-discrepancy. The bifurcation into two distinct standpoints (i.e., own and other) within both the ideal L2 self and the ought-to L2 self has been empirically validated as viable and meaningful.
Taken together, the results of the study confirm the validity and reliability of a 4-factor correlated model through a series of rigorous comparisons with seven competing models, and thus lend strong empirical support to Higgins's [31] theorizing regarding the inner structure of future self-guides. As shown in the outcome of Model g, both convergent and discriminant validity were obtained: the correlation between the two ideal L2 self-guides and that between the two ought-to L2 self-guides are high, and both are higher than the cross-relationships between the ideal and the ought-to L2 self-guides. This attainment of convergent and discriminant validity is a significant psychometric improvement over Papi et al.'s [23] study, in which the ought-to L2 self others and ideal L2 self own measures were only loosely operationalized. Similarly, because the factor loadings of all 16 items were 0.70 or above, there is clear evidence of item validity, that is, of the representativeness of the items conceptualized for each of the four types of L2 self measures. In particular, the item validity evidenced by the high factor loadings is triangulated by the good item fit performance verified by the multidimensional Rasch model.
Beyond its psychometric properties, the newly conceived L2 self-measure can be useful for understanding learners' self profiles in relation to language learning motivation. Specifically, because the factorial structure of the four-dimension L2 self-measure is more theoretically balanced and comprehensive than that of prior instruments, it can be used to examine the extent to which learners' language proficiency and learning effort can be explained by the different motivational self profiles identified by the four-dimension instrument. Similarly, the new instrument can be adopted for cross-cultural comparisons of groups from distinct sociocultural backgrounds. For instance, Asian EFL learners heavily influenced by Confucian ideology may exhibit different self-image configurations compared with their Western counterparts.
In addition, delineating a self construct with two domains and two standpoints has the practical benefit of helping relate the self to emotion [41]. For example, in evaluating one's ought-to L2 self, if one's actual attributes do not match the state that one believes significant others expect one to attain, one may fear being punished and feel a sense of impending threat. As a result, he or she may feel shame at being penalized by others for breaking the rules imposed by social norms. On the other hand, if the actual self does not match the state that one believes he or she should attain, he or she may experience a sense of guilt that could turn into self-hatred. In other words, distinct emotional states, namely shame and guilt, become salient depending on which of these two types of self-discrepancy is involved. This distinction between the two configurations of the ought-to L2 self is crucial, as shame-proneness negatively predicts learners' language achievement, whereas guilt-proneness does not lead to a decline in language performance [41].
Furthermore, because prior research findings based on two- or three-dimension self instruments may not be as informative or valid as those obtained with the four-dimension instrument, future research is urged to consider adopting the newly conceived and empirically validated quadro-component L2 self instrument. Replications or new studies could include ideal L2 self others and ought-to L2 self own to examine how the four types of future L2 self-guides interact with one another, or with other individual difference variables such as anxiety, beliefs, or willingness to communicate. For instance, in relation to the findings of Papi et al.'s [23] study, which suggests that different types of self-discrepancies may lead to diverse regulatory orientations, researchers and practitioners may be in a better position to equip each type of learner with a specific set of self-regulatory skills and strategies aligned with their motivational self-guides to maximize learning. As cognitive, affective, and motivational factors often jointly affect language learning outcomes in L2 acquisition, more empirical research is anticipated on the effects of different types of self-discrepancies on the emotional states experienced by individual learners and on the regulatory inclinations they endorse.
Admittedly, although the 2 × 2 model developed and validated in the current study has the potential to break new ground in L2 motivation research, it should be acknowledged that many facets of L2 selves, and their conceptual and pragmatic implications, have not been addressed here. For instance, another key self-domain in Higgins's [31] self-discrepancy theory is the actual self, the representation of the attributes that a person believes he or she currently possesses. It is considered important because, when one perceives an incongruence between the actual self and the ideal end-state, one may self-regulate in ways that achieve a harmonious end-state. Moreover, another relevant self-concept, proposed by Markus and Nurius [35] in their possible selves theory, is the feared self, a set of qualities that a person would not like to possess but is worried about possibly possessing. They argue that a representation of a feared self is likely to drive individuals to behave in ways that avoid the materialization of that feared self. It has also been argued that an ideal possible self balanced by a feared-self counterpart is likely to produce a more powerful regulatory force conducive to motivated learning behavior [36]. Future studies in this vein should incorporate these self constructs into a more comprehensive theoretical framework so as to yield a more holistic model with greater explanatory power over motivated learning behavior.

Conclusions
The current work developed and validated a new instrument that taps into L2 learners' self-imagery in foreign language learning. A four-dimension model comprising ideal L2 self own, ideal L2 self others, ought-to L2 self own, and ought-to L2 self others was confirmed to be empirically adequate via both CFA and multidimensional Rasch analysis, with robust construct validity as well as convergent and discriminant validity. Good fit was achieved at both the factor and item levels. It is suggested that the new L2 self-guide scale can be adopted and applied in future L2 motivation research. The study, however, is not without limitations. As the validation study was conducted in a comparatively monolingual context, the validity evidence obtained has yet to be cross-checked in multilingual settings. Additionally, the two independent groups of participants recruited in the study were mainly college learners. Further validation studies need to be undertaken among secondary school learners, for whom English is typically learned as a school subject and languages other than English are normally learned as electives.