Persistent racial/ethnic gaps in intelligence/cognitive ability have been documented for decades almost everywhere in the world (see [1
] for a global review). However, at present there is little agreement on the causes of these gaps [2
]. Surveys show that with regard to the Black–White gap in the United States, experts believe it is caused roughly equally by genetic and environmental factors [4
]. The matter is extremely complex as it involves data from a large variety of scientific fields including psychology, psychometrics, behavioral genetics, population genetics, genomics, archaeology, economics, and history. Because of this complexity, most literature reviews only cite a small selection of the total evidence. One piece of evidence (the Eyferth study [6
]) is a German study of biracial children fathered by U.S. servicemen stationed in Germany after World War 2. Most of the fathers were White, but some were African-Americans (Black). The resulting offspring from these couplings were generally raised by their mothers in Germany without contact with the fathers. Thus, the parental environment was entirely German in origin, while half of the genetic contribution was foreign. As such, the design can be thought of as a natural experiment in which genetic models would suggest that there would be an IQ gap between the Black-German and White-German children. However, only a very small gap (0.7 IQ) is seen in the data. Unsurprisingly, because of the near-null results, the study has been widely cited by researchers arguing for a small or zero genetic contribution to the Black–White IQ gap in the US [8
]. Despite the attention paid to this study, to our knowledge no one has attempted to replicate its finding with similar data from other countries that hosted U.S. troops. The northeast Asian countries of Japan and Korea are a natural starting point, given the U.S. occupation of Japan following World War 2 (1945–1952) and the more than 60 years of continued U.S. presence in (South) Korea following the Korean War (1950–1953), amid ongoing hostilities between communist, China-supported North Korea and democratic, U.S.-supported South Korea (1953–present). U.S. troop presence in China has been comparatively minor, restricted to a handful of post-World War 2 operations. The purpose of the present study was to search for and report data from children of biracial couplings between U.S. servicemen and indigenous women in these northeast Asian countries.
Japanese results have been reported in Japanese-language studies [15
]. However, these studies use either the same or a strongly overlapping sample of children recruited from more than five foster homes near Tokyo. The first study, conducted in 1950, reported the following. A total of 267 children were examined by the research team. All children were fathered by White or Black Americans, and all mothers were Japanese. Of these children, 25% had been abandoned by their mothers. The mothers explained the circumstances of their pregnancies as follows: 30% worked for American military personnel; 23% worked in general hospitality; 27% worked as prostitutes; 2.5% reported that they were raped; and the remainder did not report their circumstances. As is apparent from these descriptions, many of these children were raised in suboptimal environments.
The first study was conducted with 28 children, with 59 pure Japanese peers as controls. The researchers used the classic Goodenough Draw-a-Man test, translated and standardized by Kirihara in 1950 [18].
The follow-up study in 1967 was conducted at St. Stephen’s school, in Oiso city, Kanagawa prefecture. This time, children took a Japanese nonverbal IQ test called the Tanaka B-type IQ test which was devised by Kan-ichi Tanaka [19
]. We converted these results to the standard IQ norm (mean = 100, standard deviation = 15). Table 1 shows the results.
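The conversion to the standard IQ metric is a linear rescaling. A minimal sketch follows, assuming the original scores are reported against a known norm mean and standard deviation. The function name `to_standard_iq` and the example norm values (mean 50, standard deviation 10) are hypothetical and are not taken from the Goodenough or Tanaka B-type manuals.

```python
def to_standard_iq(score, norm_mean, norm_sd, target_mean=100.0, target_sd=15.0):
    """Linearly rescale a score from its original norm to the IQ metric.

    norm_mean and norm_sd describe the scale the score was reported on;
    they are inputs the analyst must supply, not values from the paper.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (score - norm_mean) / norm_sd      # standardize against the source norm
    return target_mean + target_sd * z     # re-express on mean 100, SD 15


# Hypothetical example: a score half a standard deviation below the norm mean.
example_iq = to_standard_iq(45.0, norm_mean=50.0, norm_sd=10.0)
```

Because the transformation is linear, differences between group means are multiplied by the same factor (here 15 divided by the source standard deviation), so the rank order of groups is unchanged.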
The data reveal that the mean IQs of the two biracial categories were similar, with only trivial gaps (0.6 IQ points on Goodenough Draw-a-Man and 0.7 IQ points on Tanaka B-type). On this point, Ishihara, the research team leader for the study and a professor of medical anthropology, noted “[i]n our study, the IQ of Black children were not inferior to that of White Children” [17
] and “[w]e do not observe any significant differences in IQ or achievement between White and Black children” [15
]. Furthermore, both groups had below-average IQs compared to their Japanese peers in the same orphanages, suggesting negative selection for inclusion.
There was a substantial Flynn effect observed in Japan both before and after World War 2 [20
]. Given that the Tanaka B-type test was standardized in 1953, we should also expect a large Flynn effect by 1967, when the test was administered to our samples. This does not seem to have affected the scores of the half-White or half-Black children (88.9/88.2, respectively).
We found near-zero gaps between biracial children of Black–Japanese and White–Japanese origin, replicating the findings of the Eyferth study in a novel setting. The most straightforward interpretation is that this finding accords with environmental models of group differences, under which one might expect near-zero gaps, depending on the prevalence of relevant environmental factors. Unfortunately, the interpretation of the study is limited for several reasons. First, the sample sizes were very small and so provide only uncertain estimates. As such, they may not be sufficiently informative as standalone samples, but they can contribute to future meta-analyses. Second, the children were unlikely to be a representative sample of biracial offspring, since they were living in foster homes mainly for abandoned children. As can be inferred from the low IQs relative to Japanese norms, mothers choosing to abandon their children probably possessed below-average IQs themselves. Such selection tends to diminish group differences among the selected persons, bringing differences nearer to zero depending on the intensity of the selection [22
]. Third, there is practically no information about the fathers, from rank to educational background or ancestry. It is known that U.S. servicemen had above-average IQs because of the Army’s selection process, whereby individuals who scored too low on entrance tests were barred from service. The resulting truncation effect was stronger for African-Americans than for European-Americans because the former group’s mean IQ was closer to the threshold. The effect of this was to diminish the expected IQ gap among the biracial children, but without background information, one cannot make a quantitative estimate of the expected size of such a gap from a hereditarian model. See previous discussions of the Eyferth study in [8].
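The truncation argument can be illustrated with the mean of a normal distribution restricted to values above a cutoff. The sketch below is only a statistical illustration. The two group means, the common standard deviation, and the cutoff are hypothetical placeholders, not estimates of any real population or of the Army's actual entrance threshold, and the sketch does not model how much of a gap among fathers would carry over to their children.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the pdf and cdf


def mean_above_cutoff(mu, sigma, cutoff):
    """Mean of a Normal(mu, sigma) variable, given that it exceeds cutoff.

    Uses the truncated-normal result mu + sigma * pdf(a) / (1 - cdf(a)),
    where a = (cutoff - mu) / sigma.
    """
    a = (cutoff - mu) / sigma
    tail = 1.0 - STD_NORMAL.cdf(a)          # share of the group above the cutoff
    return mu + sigma * STD_NORMAL.pdf(a) / tail


# Hypothetical illustration values (assumptions, not data from the study).
MU_A, MU_B, SIGMA, CUTOFF = 100.0, 90.0, 15.0, 85.0

gap_before = MU_A - MU_B
gap_after = (mean_above_cutoff(MU_A, SIGMA, CUTOFF)
             - mean_above_cutoff(MU_B, SIGMA, CUTOFF))
```

With these placeholder values the gap between the selected groups is smaller than the gap between the unselected groups, because the group whose mean lies closer to the cutoff is shifted upward more by the selection.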
Beyond the present study of children fathered by U.S. servicemen, similar natural experiments are currently ongoing in other countries. One example is the children born to sex tourists and indigenous women working as sex workers in countries such as the Philippines, where it is estimated that thousands of children are fathered each year by foreign men and are subsequently reared by local women [26
]. Of interest to scientific research is that in some cities where sex work is common, there are genetic testing companies whose business is to ascertain paternity in disputed cases (generally to convince the fathers to provide some form of financial support). It might be possible to obtain genetic data from these companies alongside detailed information about the subjects. This would enable a genetic admixture study like that previously published in this issue [27
], but with little paternal environment effect, thus allowing for a reasonable separation of genetic and environmental causation. Similarly, commercial direct-to-consumer ancestry testing companies such as 23andMe possess data for millions of people [28
]. Among these are surely many adoptees or people who have never met their fathers, whose genetic ancestry results are already known. Data from these sources could be used to clarify the relative roles of genetic and environmental variables in racial gaps.
A second alternative is to study within-country transracial adoption and cross-national (international) adoption, which is often transracial in nature. A recent review of international adoption was published by Thomas, writing from an environmentalist perspective [29
]. As the author points out, previous analyses of transracial adoption studies suffer from a few shortcomings, most importantly the lack of adjustment for Flynn effect-related IQ gains. Thomas showed that for several small studies of internationally adopted East Asians (primarily Koreans) the observed IQ advantage compared to Europeans could be explained by Flynn effect gains. As for the low scores of non-Korean adoptees, Thomas ascribes these to a variety of confounders including age of adoption, early malnutrition/poor environment, and worse environment among adoptive families (as in the case of Blacks in the Minnesota Transracial Adoption Study [30
]). The general problem with these ideas is that Thomas does not consider genetic confounding, the fade-out of environmental effects, or the psychometric properties of environmental effects. To begin with genetic confounding, one issue with the interpretation of adoption studies is that the parents (usually single mothers) who elect to give up their child are unlikely to be representative of their population in terms of genotypic IQ. We may assume they are somewhat below-average, being unable to take care of their own child. These women are presumably more likely to have children born out of wedlock. Hence, we expect a slight decrease in IQ relative to the average genotypic IQ of the origin population. Thus, when Korean children are given up for adoption, one might expect their genotypic IQ to be perhaps 3–7 IQ points below the mean of the Korean population (104.6 in Lynn’s 2012 dataset [33
], 102.4 in Becker’s version 1.3.2 [34
]). This would lead one to expect the children to attain IQs of 98–102. Furthermore, studies of environmental variation in adoptive homes and childhood IQ tests have generally come up with weak or null findings [30
], including a large Swedish study of international adoptees (this study was not cited by Thomas but was based on an overlapping dataset with some of the studies cited by Thomas) [36
]. For the adopted Koreans, there was no relationship between age of adoption and IQ scores at age 18 in the Swedish draft test. However, this pattern was clearly present for non-Koreans, which makes the interpretation unclear. Results from the NAEP National Report Card and an uncited analysis of Korean adoptees by Linda A. Gildea show similar results [37
]. Furthermore, there was no relationship between the educational attainment of adoptive parents and the IQs of their adopted children for either Koreans or non-Koreans in Sweden. However, a somewhat weak relationship was found between parental socioeconomic status (SES) and the adoptees’ educational attainment and income in a sample of adopted Koreans in the United States [38
]. The main limitation of the large Swedish study is that the authors lumped every non-Korean sending country together, presumably trying not to single out the low IQ of any particular origin country, and in the process preventing an analysis based on the national IQs of the other source countries except that of Korea. Additionally, one must be attentive to the age of testing since heritability is lower at younger ages and shared environmental effects are larger [39
]. Only the Swedish study and the Minnesota study tested subjects nearing adulthood.
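The expected range for Korean adoptees quoted above follows from simple subtraction. The sketch below works it out for both national IQ figures cited in the text, taking the assumed 3–7 point selection deficit at face value. The variable and function names are illustrative only.

```python
# National IQ figures for Korea as quoted in the text.
KOREAN_MEAN_IQ = {"Lynn 2012": 104.6, "Becker v1.3.2": 102.4}

# Assumed selection deficit for children given up for adoption (from the text).
SELECTION_DEFICIT = (3.0, 7.0)


def expected_range(population_mean, deficit=SELECTION_DEFICIT):
    """Return (low, high) expected IQ after subtracting the assumed deficit."""
    smallest, largest = deficit
    low = round(population_mean - largest, 1)
    high = round(population_mean - smallest, 1)
    return low, high


expected = {source: expected_range(mean) for source, mean in KOREAN_MEAN_IQ.items()}
```

Under these assumptions, the Lynn figure gives a range of 97.6 to 101.6, which rounds to the 98–102 quoted above, while the Becker figure gives 95.4 to 99.4.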
Second, the pattern of subtest differences is not addressed with proper psychometric methodology. This is important because racial differences are not confined to global composites (such as full-scale IQ); instead, they mainly relate to the g factor of cognitive ability tests. A meta-analysis by te Nijenhuis et al. confirmed the result of an earlier study by Jensen, finding that adoption IQ gains showed negative correlations to g-loadings [40
]. Third, effects on IQ from malnutrition fade with age. This was shown in the case of the Dutch famine of 1944 caused by the Nazi occupying forces who limited the supply of food in the Netherlands, causing widespread starvation [42
]. This fade-out resembles that of other early-life interventions, which also boost IQ only for a limited time and do not show a g-loaded pattern of subtest gains [46
]. Overall, we conclude that the existing transracial adoption data leave much to be desired, but that the largest studies (i.e., the Minnesota Transracial Adoption Study and Swedish international adoptee studies) tend to support genetic models over environmental ones. It appears more common for studies to support an environmental hypothesis of group differences when the data are insufficient to draw firm conclusions (e.g., the tiny adoption studies of Moore (1986) [48
] and Tizard (1974–1975) [49
]). This pattern is troubling because there is evidence from previous meta-analyses that findings incongruent with egalitarian ideology tend not to be published [51].
Finally, we note that there are long-existing datasets which contain cognitive data from transracial adoptees which have thus far not been explored by researchers (e.g., the High School Longitudinal Study of 2009). We urge researchers to investigate these datasets with more advanced psychometric methods (e.g., Jensen’s method, multiple-group confirmatory factor analysis, local structural equation models, and item response theory analyses of various types).