We probed the mechanisms underlying contextual effects in face lightness perception. While several groups have shown that context, in the form of race-specifying information, influences perceptual estimations of face reflectance [
25,
32], it is unknown whether these contextual effects are governed by the same classical mechanisms that drive face perception. We exploited well-known expertise-related effects in face recognition (i.e., the other-race effect; [
42,
43]) and asked if comparable mechanisms underlie contextual (race-based) modulations of lightness judgments involving face stimuli. We posited that experience would diminish matching errors made with respect to own-race faces versus other-race faces. Our results, however, were surprising. The data indicated a rather homogeneous pattern of distortions between our two expert groups. In particular, both groups had the smallest matching errors in trials where the standard and variable stimuli were of the same race. Errors under these conditions did not vary across the three contextual (race) categories. For trials where the standard and variable stimuli were of different races, however, observers made the largest (negative) errors when the standard face was Black, as compared to Chinese and White standards, for which matching errors trended slightly positive. Critically, this pattern was similar for both White and Chinese observers. We consider the broader contextual effects first, before turning to the implications of the apparent lack of expertise-related effects.
4.1. The Effect of Race Context on Face Lightness Judgments
Congruent with previous work [
23,
24,
25], observers in our tasks (and in particular in different variable/standard race trials) tended to judge faces to be darker when presented with a Black standard stimulus as compared to Chinese/White standard stimuli. These findings suggest that the race context of the stimulus has a significant influence on the estimated reflectance (lightness) of the face. That observers are both able to categorize race, and are affected by such information in their lightness estimations, with all pigmentation information neutralized, also emphasizes the importance of morphology in race perception [
33,
34,
35,
36]. What is perhaps a bit puzzling about the current data, however, is the fact that context effects were not evident in trials involving variable and standard stimuli of the same race (
Figure 2). In those trials, matching errors were much smaller, and did not systematically deviate according to the race context of the stimuli (nor between the two groups). In fact, errors for these conditions were very comparable to those obtained for non-face stimuli (i.e., for the patch matching task,
Figure 3). At first glance, this may seem to be incongruent with the findings of Levin and Banaji [
25]. However, we note that in their trials involving standard and variable stimuli of the same race, errors were also much smaller than in their different-race matching trials. If we consider only the conditions that were present both here and in the study of Levin and Banaji [
25], similar trends are evident. In both studies, for Black standard stimuli (and different standard/variable race trials), there was a tendency for observers to undershoot (make darker estimations of the variable stimulus relative to the luminance of the standard stimulus); correspondingly, for White standard stimuli, there was a tendency for observers to overshoot (make slightly lighter estimations).
What might explain the apparent difference in matching errors between same-race and different-race standard/variable trials observed here and in previous studies? We speculate that discrepant racial features might have acted as additional stimulus noise in trials where the races of the standard and variable stimuli were not the same, although of course all stimuli regardless of condition were equated in terms of their contrast profile. That is, we speculate that when matching between faces of the same race, as the racial features were similar between faces, the relevance of the assumptions about social category may have been down-weighted and participants may have been able to rely on more rudimentary mechanisms for reflectance estimations in order to complete the matching tasks. Such rudimentary mechanisms may reflect the same early (Hering/Mach-type, centre-surround, receptive-field) and mid-level (Gestalt-based) visual mechanisms traditionally implicated in lightness perception [
21]. By contrast, when matching between faces of different races, the discrepant racial features may have introduced additional stimulus noise, making accurate matches harder to achieve. In this case, rudimentary mechanisms alone may prove insufficient to produce accurate estimations, and social category (race) information may be weighted more heavily to resolve the task. An alternative explanation is that race context, in these trials, does in fact produce a bias that affects the visual system’s processing of lightness; yet, this bias is applied equivalently to the computations for both the variable and the standard stimulus. Lightness estimations that are equivalently biased (in the same direction) for both stimuli would then lead to relatively more accurate matches of lightness in these ‘same’ variable/standard trials versus in the ‘different’ trials, for which the degree (and direction) of bias would be discrepant.
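The alternative (equal-bias) account can be written out as a simple additive sketch. The symbols below are illustrative assumptions, not quantities estimated in this study: $L_S$ and $L_V$ denote the physical luminances of the standard and variable stimuli, $b_S$ and $b_V$ are hypothetical race-dependent biases added to the perceived lightness of each, and $e$ is the matching error.

```latex
% Assumed additive model (illustrative only):
% perceived lightness = physical luminance + race-dependent bias
\begin{gather*}
\hat{L}_S = L_S + b_S, \qquad \hat{L}_V = L_V + b_V \\
% the observer adjusts L_V until the two percepts are equal
\hat{L}_V = \hat{L}_S \;\Longrightarrow\; e \equiv L_V - L_S = b_S - b_V \\
% same-race trials: identical biases cancel
b_S = b_V \;\Longrightarrow\; e = 0
\end{gather*}
```

Under these assumptions, a darkening bias attached to a Black standard ($b_S < b_V$) would yield negative errors in different-race trials, whereas same-race matches would remain near zero even if the bias itself were non-zero.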
4.2. The (Lack of) Expertise-Related Modulations in Face Lightness Estimations
We originally hypothesized that, to the extent that contextual race effects on face lightness perception are governed by the same mechanisms that produce race-based modulations in the classical face recognition literature [
40,
41,
51], we should observe additional modulations of face lightness judgments based on expertise (i.e., the observer’s own race). To our knowledge, the only hints as to expertise-dependent consequences in lightness judgments come from the work of Levin and Banaji [
25], and Hill [
52]. Levin and Banaji [
25] separately analysed data from a subgroup of seven Black participants. They found that, like their remaining (predominantly White) observers, most of these participants still produced darker estimations for Black standard and variable stimuli. While it is unclear whether matching distortions for their subgroup were significantly different from those of the main group, their data appear to suggest that expertise effects, if any, should still not abolish the base contextual modulations of stimulus race. Our data are congruent with these prior observations. In fact, while we observed contextual race effects on lightness judgments (as reviewed earlier), matching errors were very comparable for our two groups of observers, suggesting no additional moderation of these effects based on observers’ degree of expertise with the social category. Admittedly, interpretations of our effects would have been strengthened had we been able to test a third group of Afro-Caribbean observers, particularly in light of the fact that skin pigmentation for this group would differ most markedly from that of the other two. Our ability to recruit such observers, however, was limited by the geography of our laboratory.
Still, our findings appear to be somewhat incongruous with those of Hill [
52]. In that work, the author tested whether the interviewers’ own race (Black or White) affected skin tone classification (dark/medium/light) for survey respondents who were themselves Black or White. Hill found that Black interviewers tended to categorize the skin tones of White respondents as lighter than White interviewers did. Similarly, White interviewers tended to categorize the skin tones of Black respondents as darker than Black interviewers did. We caution, however, that there are key methodological differences between that study and ours. Specifically, the interviewers in Hill [
52] were not provided with any reference(s) as to what constituted a dark/medium/light tone. The consequence of this approach is that judgments of lightness are (likely by design) encouraged to be subjective and subject-specific. For example, interviewers could adopt an egocentric reference frame, where darkness or lightness is judged relative to their own skin tone. We conjecture that the difference between the expertise-related conclusions drawn from our study and those drawn from Hill [
52] could be due to such key methodological differences, which may translate into different reference frames being used by observers in the two studies. That is, unlike in the setup by Hill, there is no obvious need for observers in the current study to reference their own skin tone in their matches.
Alternatively, it is also important to consider again the fact that we were unable to test a third group of Afro-Caribbean observers, who have skin pigmentation that differs most markedly from that of both the Caucasian and Chinese groups. It is possible, then, that the lack of group-level differences in the present study is simply due to the fact that the two groups we tested have skin pigmentations that are proximate to one another, and hence are more expert with one another’s pigmentation than they would be with that of Afro-Caribbean observers. That is, if expertise effects exist between these two particular groups (Chinese, Caucasians), they may be too subtle to be detected by the current paradigm due to the closeness of the two pigmentations (and hence, the closeness of their levels of expertise).
One last possibility is that the pattern of contextual effects (i.e., that errors were largest for the Black standard stimuli) was due to participants using specific areas of the face when performing their matches. If this is the case (and due to our manner of contrast normalization—histogram normalization), then errors would be exaggerated for matches involving the two most discrepant stimuli for the particular area(s) considered. This would most likely affect trials involving either a Black standard or variable stimulus. Still, to produce consistent effects, all (or the majority) of observers would have needed to perform matches by using the same restricted regions, which we deem rather unlikely. Moreover, observers were explicitly instructed to consider the stimuli in their entirety when performing their matches.
Still, the homogeneous results across the two groups tested here are striking. We consider it unlikely for recent perceptual experience with Chinese faces in the White expert group to have diminished group-based effects. Regardless, as many of our White observers were exchange students nearing the close of their single or dual-term study, we were able to compare matching errors between those who were roughly subdivided into half- (
n = 11) or greater-than-half-year (
n = 12) exposed groups. Comparisons after subdividing this group in such a manner revealed no significant difference in terms of matching errors between the two subgroups [2 (subgroup) × 3 (standard stimulus’ race) × 2 (same/different races of the standard/variable stimuli) ANOVA; F(1, 21) = 0.14,
p = 0.72, and no interactions involving subgroup]. Previous work has revealed that early experience with other-race faces, rather than the amount of contact, seems to have a stronger influence on the development of other-race effects (e.g., [
53,
54,
55]). For instance, Heron-Delaney et al. [
55] reported that infants’ ability to discriminate faces of their own versus other races is gradually lost (along with any other-race advantages) if they are not exposed to other races by the age of 9 months. In the present study, as we classified our experts based on their ethnicity, and only ensured the respective groups spent their formative years in predominantly White/Asian settings, we assumed that they all should be well positioned as White/Chinese experts. We do not know, however, the extent to which these individuals were exposed to other-race faces during these early years, and indeed it is very difficult to quantify such exposure reliably. Nonetheless, it is important to note that the apparent lack of expertise-dependent effects in our data cannot be attributed to the inability of observers to extract race information from some or all of our stimuli. New observers tested on an explicit race categorization task (control task) using the same stimuli could distinguish the relevant categories with 100% accuracy.
One pattern that is particularly salient in our data is the fact that matching errors (in both groups) were the smallest for the Chinese/White standard faces as compared to the Black standard faces. What could account for the apparent advantage for matching to these particular standard race categories? One possibility, as noted earlier, is that a role of expertise may be masked in our data by the proximity between Caucasian and Chinese skin pigmentation, and hence by the potentially enhanced expertise of these observers with these two racial categories of stimuli. As an alternative, our data might instead relate to the computational need to integrate and estimate reflectance information across space. Similar to Levin and Banaji [
25]’s prototype faces, even though all of our faces were matched for mean contrast and luminance (histogram-normalized), facial features of faces belonging to different race categories may inevitably lead to significant variations in the spatial distributions of luminance information [
56]. Indeed, early research has demonstrated that the spatial histogram for faces is unique to a particular individual and even individuals of the same race category can produce different distributions under the same pose [
57]. As a result, it is possible that the contextual influence of race category in face lightness perception emerges during the process in which observers need to compute an overall estimate of mean lightness across the spatial extent of the stimulus. As noted by Firestone and Scholl [
31] as well as Zeimbekis and Raftopoulos [
56], despite efforts to normalize mean luminance and contrast, there remain low-level (spatial) differences between the stimuli of different race categories. Black faces appear to have more illuminated brows and cheekbones as compared to White and Chinese faces, and the faces of the different races perhaps differ most markedly in the bottom half. While our procedures of luminance and contrast normalization should have acted to reduce the saliency of these differences somewhat, they are nevertheless still retained to some degree. Still, we note that in their reply to Firestone and Scholl [
31], Baker and Levin [
32] used photonegative (luminance-inverted) versions of blurred stimuli (which preserved such spatial heterogeneities across races, albeit inverted), and found that observers no longer judged one face to be lighter than the other (which was the case for the veridical blurred stimuli). Thus, it seems unlikely that low-level differences alone, and specifically with respect to differences in spatial distribution, could explain the data observed here (and elsewhere; [
25,
32]).
We instead speculate that contextual influences on lightness judgments set in when race knowledge, following categorization, interferes with the subsequent computations of mean lightness estimations. In other words, having identified the race of a particular standard stimulus, participants may have used existing assumptions as to reflectance distributions typical of that race to guide their estimations. In this manner, assumptions regarding the comparatively more homogeneous distribution of White and Chinese faces may have been a better match to the face stimuli as presented than assumptions regarding the Black face. Correspondingly, varying degrees of accuracy in estimating luminance values of the standard stimuli across the three race categories used here would then give rise to larger/smaller matching errors. Verification of this speculation, of course, would require a solid means to quantify observers’ priors, in conjunction with systematic variations of how well stimuli conform with/deviate from these expectations.
While we do not wish to engage in a discussion about whether such an interaction between race assumptions (in terms of the expected surface reflectance patterns) and traditional mechanisms for reflectance computations would constitute a ‘top-down’ contextual influence, we note that such an interaction would be congruent with Baker and Levin [
32]’s observations that lightness judgments relate well to observers’ assignments of race categories. Critically, however, if we draw upon speculations underlying other-race effects in the traditional face literature, that is, that such effects manifest in greater efficiency in encoding own-race faces [
40,
44,
45,
46], then it follows that we should have observed additional expertise-dependent modulations in lightness judgments should the two phenomena be related. This was not the case. We therefore conclude that context (race) effects in face lightness perception are not related to the classical other-race effects in face perception.