Article

Assessing the Reading the Mind in the Eyes Test with Spanish Adolescents

Department of Psychology, Faculty of Education, Psychology and Social Work, University of Lleida, 25003 Lleida, Spain
* Authors to whom correspondence should be addressed.
Psychol. Int. 2025, 7(2), 41; https://doi.org/10.3390/psycholint7020041
Submission received: 19 March 2025 / Revised: 11 May 2025 / Accepted: 14 May 2025 / Published: 21 May 2025
(This article belongs to the Section Psychometrics and Educational Measurement)

Abstract

The Reading the Mind in the Eyes Test (RMET) is widely regarded as the primary instrument for measuring the Theory of Mind (ToM). However, its reliability, validity, and administration procedures, particularly across the lifespan and within adolescence, have been controversial. This study addresses these concerns using the Spanish version of the RMET with a sample of 162 Spanish late adolescents (93 women). The main findings indicate low reliability and questionable validity, casting doubt on the RMET’s suitability for assessing ToM during adolescence. The study discusses the administration of the instrument as a potential factor contributing to its psychometric deficiencies. Furthermore, it posits that the assumption of ToM as a singular, unitary construct may compromise the validity of the instrument.

1. Introduction

The Theory of Mind (ToM) encompasses the capacity to attribute mental states to oneself and others. It is conceptualized as a “theory” because these states are not directly observable, yet they allow predictions about others’ behaviors (Premack & Woodruff, 1978). The ToM is considered integral to successful social interaction and comprises two distinct components: the affective component entails the ability to empathize with others’ mental states, while the cognitive component involves making rational inferences about them. Both components are believed to undergo significant development during preadolescence and adolescence (Baron-Cohen et al., 1995; Kalbe et al., 2007). Adolescents often show lower ToM performance than adults (Altgassen et al., 2014), although statistically non-significant differences have also been reported (Sebastian et al., 2012). These performance discrepancies may be explained according to the neurobiological model of the ToM (Abu-Akel & Shamay-Tsoory, 2011): connectivity between the prefrontal, temporal, and temporoparietal brain regions involved in the ToM changes markedly during early, middle, and late adolescence (Blakemore, 2008). Because adolescence is a crucial life stage for cognitive maturation, and the evidence on ToM development at this age remains inconclusive and insufficient, the construct needs to be assessed adequately in this period.
Since the inception of the Theory of Mind (ToM), several instruments have been devised to measure this construct. Among the pioneering tools are the Maxi task (Wimmer & Perner, 1983) and the Sally and Ann task (Baron-Cohen et al., 1985), which focus on evaluating the affective ToM component in children. These instruments assess belief understanding, specifically false beliefs, requiring children to attribute beliefs to others. Other instruments tapping ToM skills, such as the Theory of Mind task battery, show convergent validity with empathy measures (Nader-Grosbois & Simon, 2023).
The Reading the Mind in the Eyes Test (RMET) stands out as one of the most widely used instruments for assessing the Theory of Mind. Diverging from tools based on mental state reasoning without visual stimuli, the RMET evaluates mental state decoding from facial features, aiming to measure the ability to recognize others’ mental states (Bora et al., 2006). The test comprises 36 images of the eye region of diverse individuals displaying different mental states; participants are tasked with selecting, from among four options, the word that best describes the mental state inferred from each picture. A glossary offers brief definitions for each word. Figure 1 depicts two sample items from the RMET (items 18 and 36).
The design of the RMET is founded on the notion that individuals possess a strong ability to accurately interpret emotions solely by observing the eyes’ region (Baron-Cohen et al., 1997). Additionally, studies indicate that individuals are equally adept at decoding complex mental states from the eye region as they are from the entire face (Baron-Cohen et al., 1997). The RMET aims to differentiate between individuals with healthy versus impaired ToM skills, particularly in clinical conditions, such as autistic spectrum disorders or schizophrenia (Adolphs et al., 2002; Baron-Cohen et al., 1999; Frith et al., 1991; Gavilán-Ibáñez & García-Albea, 2013; Johnson et al., 2022), and also in educational settings with adolescents (Laghi et al., 2016).
The original version of the RMET comprises 25 items featuring same-size photographs (15 × 10 cm) of the eye region taken from magazines (Baron-Cohen et al., 1997). These black-and-white photographs capture the facial area from the eyebrow to the middle of the nose. Each photograph is displayed for 3 s, after which the participant selects between two words defining the mental state. A correct answer scores 1 point, while an incorrect response scores zero points. Validity was established by comparing clinical and non-clinical samples (autism, n = 16; Tourette syndrome, n = 10; normal, n = 50) and through control tasks, such as a gender recognition task and a basic emotion recognition task. Reliability was not assessed in this version.
Subsequent work expanded the RMET to 36 items, offering four answer options to reduce range restriction, enhance accuracy, and provide a more balanced representation of both sexes (Baron-Cohen et al., 2001). The revised scoring method retains only items whose target word is answered correctly by at least 50% of the subjects and for which no incorrect answer is selected more than 25% of the time. Validity was assessed by comparing various groups (autism, n = 15; normal, n = 225; IQ-matched controls, n = 14). Additionally, there is a 28-item version designed for children between 6 and 16 years old (Baron-Cohen et al., 2001).
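The Baron-Cohen et al. (2001) retention rule can be illustrated with a short sketch. The item names and response proportions below are invented for the example, not taken from RMET data.

```python
# Hypothetical illustration of the item-retention rule: keep an item only if
# the target word is chosen by at least 50% of respondents AND no single
# distractor is chosen more than 25% of the time. All figures are invented.

def keep_item(target_rate, distractor_rates):
    """Return True if the item passes both retention criteria."""
    return target_rate >= 0.50 and all(r <= 0.25 for r in distractor_rates)

# Each item: proportion choosing the target, then the three distractors.
items = {
    "item_A": (0.72, [0.10, 0.08, 0.10]),   # passes both rules
    "item_B": (0.45, [0.20, 0.20, 0.15]),   # fails: target below 50%
    "item_C": (0.55, [0.30, 0.10, 0.05]),   # fails: a distractor above 25%
}

retained = [name for name, (t, d) in items.items() if keep_item(t, d)]
print(retained)  # ['item_A']
```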
The RMET has been adapted into languages such as Turkish (Girli, 2014), Italian (Vellante et al., 2013), French (Prevost et al., 2014), and Spanish (Fernández-Abascal et al., 2013; Huerta-Ramos et al., 2021). Across different language adaptations, the test–retest reliability of the RMET is generally deemed acceptable, with values ranging from 0.63 to 0.83. The internal consistency reliability, however, tends to fall below commonly accepted standards, with alpha reliabilities typically ranging from 0.58 to 0.63 (Harkness et al., 2005; Redondo & Herrero-Fernández, 2018; Voracek & Dressler, 2006).
In terms of construct validity, there is a discrepancy in proposals regarding the dimensionality of the RMET. The original authors suggested a single-factor structure (Baron-Cohen et al., 2001), while other researchers have proposed two factors: namely positive affect and negative affect (Konrath et al., 2014), or a three-factor model encompassing positive, negative, and neutral affect (Harkness et al., 2005). The RMET may prove unsuitable for accurately measuring the ToM in higher-functioning individuals (Huerta-Ramos et al., 2021). Additionally, abbreviated forms of the instrument tend to exhibit poor model fit, despite the application of statistical tools aimed at maximizing reliability and construct validity (Olderbak et al., 2015). Construct validity may also be influenced by significant individual differences in the ToM, which are markedly pronounced during adolescence, coinciding with substantial changes in brain and psychosocial dynamics and increased ToM-related cognitive demands (Moriguchi et al., 2007).
Furthermore, the administration of the instrument varies across studies. Some studies administer the test individually in laboratory settings (Holt et al., 2014) or in home environments (Baron-Cohen et al., 2001). Others conduct collective administrations using projectors (Laghi et al., 2016; Redondo & Herrero-Fernández, 2018), pen-and-paper formats, or computer-based applications (Fernández-Abascal et al., 2013; Huerta-Ramos et al., 2021). For instance, Huerta-Ramos et al. (2021) administered the RMET online to 211 healthy participants (134 female) between 19 and 70 years old. Other collective applications with projectors have been undertaken in large classrooms with high school (Laghi et al., 2016) or university students (Redondo & Herrero-Fernández, 2018).
In clinical contexts, the individual administration of psychological tests is more frequent and advisable. In educational and occupational settings, however, collective administration is more frequently employed for initial screening. Indeed, collective administration requires less stringent preparation protocols, allows for the evaluation of a high number of subjects more cost-effectively, tends to be standardized more effectively, and can be implemented in a more natural context for the subjects. Nonetheless, the collective administration of psychometric tests implies several limitations as well, since the examiner must remain vigilant to prevent cheating, talking, or unnecessary noise—and has fewer opportunities to observe the examinee’s behavior during the administration, as well as to establish interpersonal interaction (Aiken, 2003).

The Present Study

Following the COVID-19 pandemic, the use of the RMET has expanded significantly, accompanied by a surge in criticisms of its psychometric properties (Huerta-Ramos et al., 2021; Olderbak et al., 2015; Pavlova & Sokolov, 2022; Vellante et al., 2013). Some researchers have suggested that the RMET functions as a measure of emotion perception rather than of the ToM (Kittel et al., 2022). Furthermore, there is notable concern about the RMET’s inability to discriminate between individuals with low and high ToM (Black, 2019). A recent systematic review concluded that the validity of RMET scores is largely unsubstantiated and inappropriate, advising against the use of the RMET as a measure of social cognitive ability (Higgins et al., 2024). However, most of the reviewed studies involved adult samples; only 36 studies, with a total of 11,066 non-pathological adolescents, examined this age group, and none of them were conducted in Spain.
In this regard, the purpose of the current study is to assess the psychometric properties of the Reading the Mind in the Eyes Test (RMET) regarding its suitability for adolescent cohorts and its appropriateness for collective application (Redondo & Herrero-Fernández, 2018). To enhance understanding of the construct validity and reliability of the RMET, our study evaluates the factorial structure of the test and the efficacy of several models and previously suggested methodologies aimed at maximizing the instrument’s psychometric properties. Additionally, we aim to examine the appropriateness of collective application in an educational context and to provide new insights on RMET performance in the Spanish language.

2. Method

2.1. Participants

The study sample consisted of 162 subjects (93 of whom were female), with ages ranging from 15 to 19 years (Mean = 16.71, SD = 0.71). Statistical analysis revealed no significant age differences between sexes (t = −0.318, p = 0.751; effect size d = 0.05, 95% Confidence Interval [−0.26, 0.36]). All participants were students at a secondary education institution located in northeastern Spain. All the participants were informed about the study aims and signed an informed consent form. The study was approved by the ethics committee of our university.

2.2. Instruments

The Reading the Mind in the Eyes Test (RMET; Baron-Cohen et al., 2001) is a measure designed to assess the individual capacity to discern the mental states of others. It consists of 36 distinct items, each presenting four potential adjectives as answer choices. The items are close-up photographs of individual eyes purportedly conveying a specific emotion. Of the four adjectives provided, only one accurately describes the emotion depicted in the photograph (see Figure 1). Each correct answer scores one point and each incorrect answer zero; the total score is the sum of correct responses. The effectiveness of three abbreviated versions of the RMET was scrutinized using alternative theoretical models.
The Maximizing Main Loadings (MML) model, as formulated by Olderbak et al. (2015), was constructed around the principle of maximizing the number of items that strongly correlate with a single underlying factor in confirmatory factor analysis (CFA), while ensuring an adequate model fit according to Chi-square (χ2), the Root Mean Square Error of Approximation (RMSEA), the Comparative Fit Index (CFI), and the Tucker–Lewis Index (TLI). The method involved conducting a CFA and successively eliminating items with the lowest factor loadings, resulting in a refined model comprising 7 of the original items.
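The iterative logic behind the MML approach can be sketched as follows. This is not the authors’ actual CFA procedure: the corrected item-total correlation stands in for the CFA factor loading, and the dichotomous data are simulated.

```python
import numpy as np

# Rough sketch of the Maximizing Main Loadings idea from Olderbak et al.
# (2015): refit a one-factor model, drop the weakest item, repeat. Here the
# "loading" is approximated by the corrected item-total correlation, which
# is NOT the CFA estimate the authors used -- it only illustrates the loop.

rng = np.random.default_rng(0)
n_persons, n_items = 200, 12
data = (rng.random((n_persons, n_items)) < 0.6).astype(float)  # fake 0/1 data

def proxy_loadings(x):
    """Corrected item-total correlation for each column of x."""
    total = x.sum(axis=1)
    return np.array([np.corrcoef(x[:, j], total - x[:, j])[0, 1]
                     for j in range(x.shape[1])])

keep = list(range(n_items))
while len(keep) > 7:                      # stop at a 7-item short form
    loads = proxy_loadings(data[:, keep])
    keep.pop(int(np.argmin(loads)))       # drop the weakest item

print(len(keep))  # 7
```

In the published procedure, each step would instead refit the one-factor CFA and check χ², RMSEA, CFI, and TLI before accepting the deletion.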
The Ant Colony Optimization (ACO) model represents another shortened version of the RMET devised by Olderbak et al. (2015). Employing the ACO heuristic algorithm, which seeks the optimal abbreviated form of a questionnaire based on empirical data (Leite et al., 2008), this model aimed to determine the most concise form of the test that could still produce a reliable omega estimate of at least 0.70 and maintain robust model fit indices, achieving a CFI of 0.95 and an RMSEA of 0.02. The final model included 10 items.
The Harkness Model (Harkness et al., 2005) postulates a tripartite structure reflecting the emotional valence portrayed in the test items. It segments the RMET into three distinct subtests: Positive Affect, Negative Affect, and Neutral Affect, encompassing all 36 items of the original instrument.

2.3. Procedure

The instrument was administered in a group setting, in 40-square-meter classrooms, across seven sessions of 20 to 25 students each. Initially, participants were briefed on the task using the sample item provided by the instrument. Any uncertainties regarding the task were addressed using the glossary from the Spanish version of the instrument. Subsequently, each photograph, together with its four response options, was displayed for 20 s with a projector on a 2 × 2 m screen positioned at least 3 m from the participants. Participants were then instructed to select one of the four options and record their choice on the answer sheet provided.

2.4. Statistical Analysis

Initially, we undertook both descriptive and reliability analyses for the test and its modified versions (MML, ACO, Harkness), employing the KR-20 and split-half reliability coefficients, alongside the inter-item tetrachoric correlations. Additionally, we evaluated the test’s reliability after omitting items with a correct response rate below 50% or an incorrect response rate above 25% for any distractor, according to the criteria set out by Baron-Cohen et al. (2001). The reliability of the items was also assessed separately for each sex, estimating the models for the male and female groups independently.
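For reference, the two internal-consistency coefficients can be sketched on invented dichotomous data. This is a minimal illustration, not the code used in the study; the sample size and response probability are arbitrary.

```python
import numpy as np

# Minimal sketch of the two reliability coefficients used here, on invented
# dichotomous data (rows = respondents, columns = items). KR-20 is the
# Kuder-Richardson formula 20; the split-half value uses an odd/even split
# with the Spearman-Brown correction.

rng = np.random.default_rng(1)
x = (rng.random((150, 36)) < 0.6).astype(float)  # fake 0/1 responses

def kr20(x):
    k = x.shape[1]
    p = x.mean(axis=0)                       # proportion correct per item
    var_total = x.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / var_total)

def split_half(x):
    odd, even = x[:, 0::2].sum(axis=1), x[:, 1::2].sum(axis=1)
    r = np.corrcoef(odd, even)[0, 1]
    return 2 * r / (1 + r)                   # Spearman-Brown correction

print(round(kr20(x), 2), round(split_half(x), 2))
```

Because the simulated items are independent, both coefficients come out near zero, which is the same pattern the real data produced.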
Second, we conducted a parallel analysis to determine the number of latent factors, and we tested EFA solutions with one to three factors both in the full test version and in the version with items dropped.
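Horn’s parallel analysis, the criterion used to suggest the number of factors, can be sketched as follows. The example uses Pearson rather than tetrachoric correlations and simulated data with a built-in two-factor structure, so it only illustrates the procedure.

```python
import numpy as np

# Sketch of Horn's parallel analysis: retain factors whose observed
# eigenvalues exceed the mean eigenvalues obtained from random data of the
# same size. Pearson correlations stand in for the tetrachoric correlations
# used in the paper; the data below are simulated with two latent factors.

rng = np.random.default_rng(2)

def parallel_analysis(data, n_sims=100):
    n, k = data.shape
    obs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    sims = np.empty((n_sims, k))
    for i in range(n_sims):
        random = rng.standard_normal((n, k))
        sims[i] = np.linalg.eigvalsh(np.corrcoef(random, rowvar=False))[::-1]
    # Naive count: positions where the observed eigenvalue beats the
    # average random eigenvalue.
    return int((obs > sims.mean(axis=0)).sum())

# Two correlated blocks of five items each -> two factors should survive.
base = rng.standard_normal((162, 2))
data = np.repeat(base, 5, axis=1) + 0.8 * rng.standard_normal((162, 10))
n_factors = parallel_analysis(data)
print(n_factors)
```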
Third, confirmatory factor analysis (CFA) was applied to the models sourced from the extant literature to verify the reliability of the identified latent factors. The Weighted Least Squares Mean and Variance Adjusted (WLSMV) estimation method was implemented, given its suitability for handling dichotomous data (Flora & Curran, 2004). All statistical analyses were performed using the R software (R Core Team, 2022). The raw dataset supporting these analyses is accessible in the Supplementary Materials.

3. Results

3.1. Descriptive Data, Reliability, and Exploratory Factor Analysis

The mean score for the full-scale RMET was 22.49 (SD = 3.14), and there were no discernible differences between sexes (t = 1.174, p = 0.2423; d = 0.19, 95% CI [−0.13, 0.50]). However, the assessment of score distribution yielded mixed results. While skewness and kurtosis fell within the normal range (−0.21 and 0.66, respectively), the Shapiro–Wilk normality test suggested a departure from a normal distribution (W = 0.9818, p = 0.03154), as depicted in Figure 2. Figure 3 illustrates the inter-item tetrachoric correlation, with a mean value of 0.041, highlighting a virtually non-existent correlation between items. Notably, despite the test’s aims to measure a singular factor, some items exhibited negative correlations.
Table 1 presents the reliability estimates for the full test and the different solutions. Overall, the reported values fell notably below the commonly accepted thresholds of 0.70 to 0.80. The most favorable reliability emerged from the version of the test crafted according to the item deletion criteria outlined by Baron-Cohen et al. (2001), with a KR-20 coefficient of 0.36. Similarly, the subscale consisting of positive emotion items within the Harkness model demonstrated a KR-20 coefficient of 0.37. Both the ACO and MML abbreviated forms exhibited an internal consistency approaching negligible levels.
Figure 4 shows the parallel analysis conducted on the full test and the deleted-items version; the graphical results are inconclusive, but the numerical criterion suggests extracting three factors in both models. Accordingly, we conducted three additional EFAs, moving from the three-factor solution toward more parsimonious factorial solutions. The results of the EFAs, shown in Tables S1 and S2 of the Supplementary Material, were inconsistent.

3.2. CFA

Table 2 provides the standardized beta weights of the CFA models derived from previous studies. Notably, in the ACO model, only 3 of the 10 items loaded significantly on the single factor, while in the MML model, only 3 of the 7 items did. Within the Harkness model, all items apart from item 16 showed significant loadings on the positive latent factor. Conversely, on the negative latent factor, only three items (34, 35, and 36) exhibited significant loadings, two of them negative. Finally, on the neutral latent factor, only four items displayed significant negative loadings (10, 13, 18, and 28). Due to our limited sample size, the goodness-of-fit indices presented abnormal values for the ACO and MML models (TLI > 1, RMSEA = 0). Additionally, the Harkness model failed to converge due to the large number of free parameters. Collectively, these findings underscore the poor fit of this test.

4. Discussion

The Reading the Mind in the Eyes Test (RMET) was designed to evaluate the Theory of Mind (ToM) by interpreting others’ mental states from facial expressions (Baron-Cohen et al., 1997; Baron-Cohen et al., 2001). The objective of the present study was to evaluate the reliability and construct validity of the RMET when administered collectively to a group of late adolescents via a projector.
The findings revealed several psychometric challenges and limitations when employing this test in a collective setting with adolescents. Adolescence is a turbulent life stage characterized by cognitive and affective changes, including ToM-related skills involved in social relationships (Moriguchi et al., 2007). The current results highlight shortcomings in the RMET’s reliability and validity for measuring advanced ToM in Spanish late adolescents.

4.1. Reliability

The internal consistency, as measured by KR-20, alpha, and omega, was notably low even across the psychometrically derived models, with no substantial improvement observed after removing “low-quality” items. The calculated consistency values ranged from 0.02 to 0.36, indicating very poor reliability. Similarly, the split-half assessment produced a maximum value of 0.49, following the same trend. Furthermore, the findings indicate that dropping items with less than 50% correct answers or with any incorrect answer chosen more than 25% of the time, as proposed by Baron-Cohen et al. (2001), is not a helpful or reliably valid method for identifying low-quality items. In essence, eliminating such items does not substantially enhance test reliability. Overall, the results indicate poor reliability of the test and suggest that item-dropping is insufficient to improve its performance in the current sample of adolescents.

4.2. Factor Structure and Validity

The study revealed a notable absence of a clear factorial structure in terms of construct validity. Parallel analysis, together with exploratory factor analysis and confirmatory factor analyses of the alternative models proposed by Olderbak et al. (2015), indicated an unfeasible factor structure. These results support the findings of a recent systematic review highlighting the uncertainty about the factorial structure of the instrument’s scores: the frequent modifications made to the RMET, or the use of modified scoring methods without carefully examining their impact, might impinge on its psychometric properties (Higgins et al., 2024). In terms of the instrument’s validity, it is posited that the RMET taps the participant’s ability to comprehend second-order beliefs (cognitive ToM) while disregarding their empathetic understanding of others’ mental states (affective ToM).
Recent studies indicate that the RMET functions as a measure of emotion perception rather than a measure of the Theory of Mind (ToM). This view challenges the interpretation of earlier RMET results while underscoring the need to develop measures that assess the ToM from distinct constructs (Kittel et al., 2022).

4.3. RMET Administration

The methodology employed for administering the RMET raises notable concerns. While the original and Spanish versions of the RMET do not explicitly outline a specific data collection approach, instances of collective administration of the instrument have been documented (Redondo & Herrero-Fernández, 2018).
However, the practice of administering the collective questionnaire via projection onto a screen may contribute to the low reliability observed in the results. For instance, conducting the test collectively may lead to individuals losing focus and attention. Furthermore, potential vision impairments and heightened difficulties in comprehending the task may further compound the challenges associated with collective administration.

4.4. Implications for Counseling Practice and Research

The current study is subject to limitations, primarily the constrained sample size, which might compromise the statistical power of some Confirmatory Factor Analysis models, leading to abnormal goodness-of-fit values and non-convergence. Additionally, cultural disparities and variations in linguistic familiarity could affect item difficulty and responses, especially as the original instrument was designed for the English-speaking population. The adaptation to the Spanish language might have influenced the participants’ responses, particularly regarding the ten items representing complex mental states.
Follow-up research should investigate whether the observed low reliability and construct validity stem from limitations of the instrument or are attributable to the developmental stage of adolescence. For example, future studies may evaluate the reliability and validity of the instrument across various phases of adolescence and early adulthood. Furthermore, an individual application of this instrument should be considered for counseling purposes, instead of a collective one. Further research contrasting the psychometric behavior of the test under varying administration conditions would help clarify this issue.

5. Conclusions

The RMET is a widely used instrument for assessing the Theory of Mind (ToM). The current findings align with a large body of previous literature pointing to several limitations of the RMET. The internal consistency, reliability, and factorial structure of the RMET with the current data are less than ideal. The collective administration of the test might compromise its reliability; hence, individualized methods of administering this instrument may be more prudent than collective application. Moreover, further reinterpretation and theoretical investigation are required to determine whether the ToM is a singular construct or comprises multiple orthogonal dimensions. Research efforts aimed at studying the ToM during adolescence should strive to apply measurement approaches alternative to the RMET.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/psycholint7020041/s1, Table S1: Fit of the exploratory analyses for 1, 2 and 3 factor models for the full test version (top half) and deleted items version (bottom-half); Table S2: Factor loadings for the 3 factors model suggested by the parallel analysis.

Author Contributions

A.M.: Data curation, Writing—Original draft preparation, Writing—Reviewing and Editing, Methodology, Formal analysis, Software. A.R.: Conceptualization, Writing—Original draft preparation, Writing—Reviewing and Editing, Project administration, Data curation. O.M.: Writing—Reviewing and Editing, Formal analysis, Visualization. A.B.: Conceptualization, Writing—Original draft preparation, Software, Writing—Reviewing and Editing. All authors have read and agreed to the published version of the manuscript.

Funding

No funding was received for conducting this study.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Comité de Ética de Investigación con Medicamentos del Hospital Universitario Arnau de Vilanova de la Gerencia Territorial de Lleida (protocol code CEIC-2454, approved on 3 May 2021).

Informed Consent Statement

The ethics committee of our university approved the study. Informed consent was obtained from all participants.

Data Availability Statement

The data and code are available from the corresponding author on request.

Acknowledgments

This research was performed within the Catalonian Consolidated Research Group 2021 SGR 01423.

Conflicts of Interest

There were no conflicts of interest/competing interests.

References

  1. Abu-Akel, A., & Shamay-Tsoory, S. (2011). Neuroanatomical and neurochemical bases of theory of mind. Neuropsychologia, 49(11), 2971–2984. [Google Scholar] [CrossRef] [PubMed]
  2. Adolphs, R., Baron-Cohen, S., & Tranel, D. (2002). Impaired recognition of social emotions following amygdala damage. Journal of Cognitive Neuroscience, 14(8), 1264–1274. [Google Scholar] [CrossRef]
  3. Aiken, L. R. (2003). Tests psicológicos y evaluación. Pearson Educación. [Google Scholar]
  4. Altgassen, M., Vetter, N. C., Phillips, L. H., Akgun, C., & Kliegel, M. (2014). Theory of mind and switching predict prospective memory performance in adolescents. Journal of Experimental Child Psychology, 127, 163–175. [Google Scholar] [CrossRef] [PubMed]
  5. Baron-Cohen, S., Campbell, R., Karmiloff-Smith, A., Grant, J., & Walker, J. (1995). Are children with autism blind to the mentalistic significance of the eyes? British Journal of Developmental Psychology, 13(4), 379–398. [Google Scholar] [CrossRef]
  6. Baron-Cohen, S., Jolliffe, T., Mortimore, C., & Robertson, M. (1997). Another advanced test of theory of mind: Evidence from very high functioning adults with autism or Asperger syndrome. The Journal of Child Psychology and Psychiatry and Allied Disciplines, 38(7), 813–822. [Google Scholar] [CrossRef]
  7. Baron-Cohen, S., Leslie, A. M., & Frith, U. (1985). Does the autistic child have a “theory of mind”? Cognition, 21(1), 37–46. [Google Scholar] [CrossRef] [PubMed]
  8. Baron-Cohen, S., O’Riordan, M., Stone, V., Jones, R., & Plaisted, K. (1999). Recognition of faux pas by normally developing children and children with Asperger syndrome or high-functioning autism. Journal of Autism and Developmental Disorders, 29, 407–418. [Google Scholar] [CrossRef]
  9. Baron-Cohen, S., Wheelwright, S., Spong, A., Scahill, V., & Lawson, J. (2001). Are intuitive physics and intuitive psychology independent? A test with children with Asperger Syndrome. Journal of Developmental and Learning Disorders, 5(1), 47–78. [Google Scholar]
  10. Black, J. E. (2019). An IRT analysis of the reading the mind in the eyes test. Journal of Personality Assessment, 101(4), 425–433. [Google Scholar] [CrossRef]
  11. Blakemore, S. J. (2008). The social brain in adolescence. Nature Reviews Neuroscience, 9(4), 267–277. [Google Scholar] [CrossRef]
  12. Bora, E., Eryavuz, A., Kayahan, B., Sungu, G., & Veznedaroglu, B. (2006). Social functioning, theory of mind and neurocognition in outpatients with schizophrenia; mental state decoding may be a better predictor of social functioning than mental state reasoning. Psychiatry Research, 145(2–3), 95–103. [Google Scholar] [CrossRef] [PubMed]
  13. Fernández-Abascal, E. G., Cabello, R., Fernández-Berrocal, P., & Baron-Cohen, S. (2013). Test-retest reliability of the ‘Reading the Mind in the Eyes’ test: A one-year follow-up study. Molecular Autism, 4, 33. [Google Scholar] [CrossRef] [PubMed]
14. Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychological Methods, 9(4), 466–491.
15. Frith, U., Morton, J., & Leslie, A. M. (1991). The cognitive basis of a biological disorder: Autism. Trends in Neurosciences, 14(10), 433–438.
16. Gavilán-Ibáñez, J. M., & García-Albea, J. E. (2013). Theory of mind and language comprehension in schizophrenia. Psicothema, 25(4), 440–445.
17. Girli, A. (2014). Psychometric properties of the Turkish child and adult form of “Reading the Mind in the Eyes Test”. Psychology, 5(11), 1321–1337.
18. Harkness, K., Sabbagh, M., Jacobson, J., Chowdrey, N., & Chen, T. (2005). Enhanced accuracy of mental state decoding in dysphoric college students. Cognition and Emotion, 19, 999–1025.
19. Higgins, W. C., Kaplan, D. M., Deschrijver, E., & Ross, R. M. (2024). Construct validity evidence reporting practices for the Reading the Mind in the Eyes Test: A systematic scoping review. Clinical Psychology Review, 108, 102378.
20. Holt, R. J., Chura, L. R., Lai, M. C., Suckling, J., Von Dem Hagen, E., Calder, A. J., Bullmore, E. T., Baron-Cohen, S., & Spencer, M. D. (2014). ‘Reading the Mind in the Eyes’: An fMRI study of adolescents with autism and their siblings. Psychological Medicine, 44(15), 3215–3227.
21. Huerta-Ramos, E., Ferrer-Quintero, M., Gómez-Benito, J., González-Higueras, F., Cuadras, D., del Rey-Mejías, Á. L., Usall, J., & Ochoa, S. (2021). Traducción y validación del test de caras de Baron Cohen en población española [Translation and validation of the Baron-Cohen faces test in a Spanish population]. Actas Españolas de Psiquiatría, 49(3), 106.
22. Johnson, B. N., Kivity, Y., Rosenstein, L. K., LeBreton, J. M., & Levy, K. N. (2022). The association between mentalizing and psychopathology: A meta-analysis of the Reading the Mind in the Eyes task across psychiatric disorders. Clinical Psychology: Science and Practice, 29(4), 423.
23. Kalbe, E., Grabenhorst, F., Brand, M., Kessler, J., Hilker, R., & Markowitsch, H. J. (2007). Elevated emotional reactivity in affective but not cognitive components of theory of mind: A psychophysiological study. Journal of Neuropsychology, 1(1), 27–38.
24. Kittel, A. F. D., Olderbak, S., & Wilhelm, O. (2022). Sty in the mind’s eye: A meta-analytic investigation of the nomological network and internal consistency of the “Reading the Mind in the Eyes” test. Assessment, 29(5), 872–895.
25. Konrath, S., Corneille, O., Bushman, B. J., & Luminet, O. (2014). The relationship between narcissistic exploitativeness, dispositional empathy, and emotion recognition abilities. Journal of Nonverbal Behavior, 38, 129–143.
26. Laghi, F., Lonigro, A., Levanto, S., Ferraro, M., Baumgartner, E., & Baiocco, R. (2016). The role of nice and nasty theory of mind in teacher-selected peer models for adolescents with autism spectrum disorders. Measurement and Evaluation in Counseling and Development, 49(3), 207–216.
27. Leite, W. L., Huang, I. C., & Marcoulides, G. A. (2008). Item selection for the development of short forms of scales using an ant colony optimization algorithm. Multivariate Behavioral Research, 43, 411–431.
28. Moriguchi, Y., Ohnishi, T., Mori, T., Matsuda, H., & Komaki, G. (2007). Changes of brain activity in the neural substrates for theory of mind during childhood and adolescence. Psychiatry and Clinical Neurosciences, 61, 355–363.
29. Nader-Grosbois, N., & Simon, P. (2023). Adaptation and validation of a French version of the Griffith Empathy Measure. Journal of Psychopathology and Behavioral Assessment, 45, 993–1009.
30. Olderbak, S., Wilhelm, O., Olaru, G., Geiger, M., Brenneman, M. W., & Roberts, R. D. (2015). A psychometric analysis of the Reading the Mind in the Eyes Test: Toward a brief form for research and applied settings. Frontiers in Psychology, 6, 1503.
31. Pavlova, M. A., & Sokolov, A. A. (2022). Reading language of the eyes. Neuroscience & Biobehavioral Reviews, 140, 104755.
32. Premack, D., & Woodruff, G. (1978). Does the chimpanzee have a theory of mind? Behavioral and Brain Sciences, 1(4), 515–526.
33. Prevost, M., Carrier, M. E., Chowne, G., Zelkowitz, P., Joseph, L., & Gold, I. (2014). The Reading the Mind in the Eyes Test: Validation of a French version and exploration of cultural variations in a multi-ethnic city. Cognitive Neuropsychiatry, 19(3), 189–204.
34. R Core Team. (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing. Available online: https://www.R-project.org/ (accessed on 11 September 2022).
35. Redondo, I., & Herrero-Fernández, D. (2018). Validation of the Reading the Mind in the Eyes Test in a healthy Spanish sample and women with anorexia nervosa. Cognitive Neuropsychiatry, 23(4), 201–217.
36. Sebastian, C. L., Fontaine, N. M., Bird, G., Blakemore, S. J., De Brito, S. A., McCrory, E. J., & Viding, E. (2012). Neural processing associated with cognitive and affective theory of mind in adolescents and adults. Social Cognitive and Affective Neuroscience, 7(1), 53–63.
37. Vellante, M., Baron-Cohen, S., Melis, M., Marrone, M., Petretto, D. R., Masala, C., & Preti, A. (2013). The “Reading the Mind in the Eyes” test: Systematic review of psychometric properties and a validation study in Italy. Cognitive Neuropsychiatry, 18(4), 326–354.
38. Voracek, M., & Dressler, S. G. (2006). Lack of correlation between digit ratio (2D:4D) and Baron-Cohen’s “Reading the Mind in the Eyes” test, empathy, systemising, and autism-spectrum quotients in a general population sample. Personality and Individual Differences, 41, 1481–1491.
39. Wimmer, H., & Perner, J. (1983). Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children’s understanding of deception. Cognition, 13(1), 103–128.
Figure 1. Sample items of the RMET (18 and 36, respectively) for a female (A) and a male (B) model. The correct answers are shown in boldface.
Figure 2. Histogram of the full test scores distribution.
Figure 3. Distribution of the tetrachoric correlations of the items. The mean of the correlations (M = 0.041) is represented by the central black line.
Figure 4. Parallel and Principal Component Analysis of the full test (Left) and the version with deleted items (Right).
Table 1. Reliability of the different versions and solutions.

| Model | KR 20 | Split Half | CFA Alpha | CFA Omega |
|---|---|---|---|---|
| Full test | 0.28 | 0.49 | --- | --- |
| Delete as Baron-Cohen et al. b | 0.36 | 0.46 | --- | --- |
| Sex of the model: Male | 0.07 | 0.43 | --- | --- |
| Sex of the model: Female | 0.20 | 0.36 | --- | --- |
| ACO | 0.10 | 0.25 | 0.09 | 0.11 |
| MML | 0.02 | 0.27 | 0.02 | 0.06 |
| HARKNESS: NEG | −0.01 | 0.28 | --- | --- |
| HARKNESS: POS | 0.37 | 0.41 | --- | --- |
| HARKNESS: NEU | 0.27 | 0.42 | --- | --- |

b Items dropped: 3, 9, 10, 16, 19, 25, 27, 33, 35, and 36.
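Table 1 reports two internal-consistency indices for each scoring variant: KR-20 and Spearman-Brown-corrected split-half reliability. As an illustration of how these standard indices are defined on a binary item matrix (a minimal Python sketch, not the authors' code — the study was run in R — with hypothetical function names `kr20` and `split_half`):

```python
import numpy as np

def kr20(items: np.ndarray) -> float:
    """Kuder-Richardson 20 for a persons x items matrix of 0/1 scores."""
    k = items.shape[1]
    p = items.mean(axis=0)                      # proportion passing each item
    q = 1.0 - p
    total_var = items.sum(axis=1).var(ddof=0)   # population variance of total scores
    return (k / (k - 1)) * (1.0 - (p * q).sum() / total_var)

def split_half(items: np.ndarray) -> float:
    """Odd-even split-half reliability, stepped up with Spearman-Brown."""
    odd = items[:, 0::2].sum(axis=1)
    even = items[:, 1::2].sum(axis=1)
    r = np.corrcoef(odd, even)[0, 1]            # correlation between half-test scores
    return 2.0 * r / (1.0 + r)

# A perfectly consistent response pattern yields a reliability of 1.0
perfect = np.array([[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]])
print(round(kr20(perfect), 3), round(split_half(perfect), 3))  # 1.0 1.0
```

KR-20 is the binary-item special case of Cronbach's alpha, so the low values in Table 1 indicate that correct/incorrect responses to the different RMET items barely covary.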
Table 2. Beta standardized weights and reliability of the CFA models in Olderbak et al. (2015).

| Item | Correct Answer ENG | Correct Answer ESP | Model Sex | MML | ACO | HARKNESS POSITIVE | HARKNESS NEGATIVE | HARKNESS NEUTRAL |
|---|---|---|---|---|---|---|---|---|
| 1 | Playful | Juguetón | H | --- | --- | 0.265 * | --- | --- |
| 2 | Upset | Molesto | H | --- | --- | --- | 0.045 | --- |
| 3 | Desire | Deseo | M | --- | --- | --- | --- | 0.102 |
| 4 | Insisting | Insistente | H | --- | --- | --- | --- | −0.229 * |
| 5 | Worried | Preocupado | H | --- | --- | --- | 0.219 | --- |
| 6 | Fantasizing | Fantasiosa | M | --- | --- | 0.332 ** | --- | --- |
| 7 | Uneasy | Intranquilo | H | --- | --- | --- | --- | −0.08 |
| 8 | Despondent | Abatido | H | −0.01 | 0.267 * | --- | --- | −0.163 |
| 9 | Preoccupied | Angustiada | M | --- | 0.332 ** | --- | --- | −0.142 |
| 10 | Cautious | Prudente | H | −0.195 | --- | --- | --- | −0.334 *** |
| 11 | Regretful | Arrepentido | H | --- | --- | --- | 0.098 | --- |
| 12 | Skeptical | Escéptico | H | --- | 0.511 ** | --- | --- | −0.135 |
| 13 | Anticipating | Expectante | H | --- | --- | --- | --- | −0.240 * |
| 14 | Accusing | Acusante | H | --- | 0.226 | --- | 0.004 | --- |
| 15 | Contemplative | Abstraída | M | 0.277 * | −0.073 | --- | --- | 0.024 |
| 16 | Thoughtful | Considerado | H | --- | --- | 0.042 | --- | --- |
| 17 | Doubtful | Insegura | M | --- | --- | --- | 0.053 | --- |
| 18 | Decisive | Decidida | M | --- | --- | --- | --- | −0.490 * |
| 19 | Tentative | Vacilante | M | 0.052 | −0.116 | --- | --- | 0.054 |
| 20 | Friendly | Amistoso | H | --- | --- | 0.395 *** | --- | --- |
| 21 | Fantasizing | Fantasiosa | M | --- | --- | 0.350 ** | --- | --- |
| 22 | Preoccupied | Angustiada | M | --- | −0.027 | --- | −0.036 | --- |
| 23 | Defiant | Desafiante | H | --- | --- | --- | −0.038 | --- |
| 24 | Pensive | Abstraído | H | 0.048 | −0.001 | --- | --- | −0.141 |
| 25 | Interested | Interesada | M | --- | --- | 0.290 ** | --- | --- |
| 26 | Hostile | Hostil | H | --- | --- | --- | −0.024 | --- |
| 27 | Cautious | Prudente | M | --- | --- | --- | 0.140 | --- |
| 28 | Interested | Interesada | M | --- | --- | --- | --- | −0.514 *** |
| 29 | Reflective | Reflexiva | M | --- | --- | --- | --- | −0.187 |
| 30 | Flirtatious | Seductora | M | --- | --- | 0.257 * | --- | --- |
| 31 | Confident | Segura | M | --- | --- | 0.249 * | --- | --- |
| 32 | Serious | Serio | H | --- | 0.121 | --- | --- | −0.105 |
| 33 | Concerned | Preocupado | H | --- | --- | --- | --- | 0.219 |
| 34 | Distrustful | Recelosa | M | −0.285 * | --- | --- | 0.430 *** | --- |
| 35 | Nervous | Nerviosa | M | 0.764 ** | --- | --- | −0.276 * | --- |
| 36 | Suspicious | Desconfiado | H | --- | −0.185 | --- | −0.443 *** | --- |

Model fit, χ2 (df): MML = 7.569 (14); ACO = 30.735 (35); HARKNESS = 641.062 (591).
Note. * p < 0.05, ** p < 0.01, *** p < 0.001.

MDPI and ACS Style

Martínez, A.; Romero, A.; Malas, O.; Blanch, A. Assessing the Reading the Mind in the Eyes Test with Spanish Adolescents. Psychol. Int. 2025, 7, 41. https://doi.org/10.3390/psycholint7020041