Champ versus Chump: Viewing an Opponent’s Face Engages Attention but Not Reward Systems

When we play competitive games, the opponents that we face act as predictors of the outcome of the game. For instance, if you are an average chess player and you face a Grandmaster, you anticipate a loss. Framed in a reinforcement learning perspective, our opponents can be thought of as predictors of rewards and punishments. The present study investigates whether facing an opponent would be processed as a reward or punishment depending on the level of difficulty the opponent poses. Participants played Rock, Paper, Scissors against three computer opponents while electroencephalographic (EEG) data were recorded. In a key manipulation, one opponent (HARD) was programmed to win most often, another (EASY) was made to lose most often, and the third (AVERAGE) had equiprobable outcomes of wins, losses, and ties. Through practice, participants learned to anticipate the relative challenge of a game based on the opponent they were facing that round. An analysis of our EEG data revealed that winning outcomes elicited a reward positivity relative to losing outcomes. Interestingly, our analysis of the predictive cues (i.e., the opponents' faces) demonstrated that attentional engagement (P3a) was contextually sensitive to anticipated game difficulty. As such, our results for the predictive cue are contrary to what one might expect for a reinforcement model associated with predicted reward, but rather demonstrate that the neural response to the predictive cue was encoding the level of engagement with the opponent as opposed to value relative to the anticipated outcome.


Introduction
In head-to-head competition, knowing one's opponent can offer strategic advantage. For instance, in the Netflix series "The Queen's Gambit", after learning from previous losses, Beth Harmon uses her knowledge of her opponent on her way to victory against Chess Master Vasily Borgov. Knowing one's opponent can also indicate the likelihood of a favorable (or unfavorable) outcome. Consider the expectations of a tennis player about to face 23-time Grand Slam Champion Serena Williams, versus a match against a random opponent from a neighboring tennis club. Here, we sought to use electroencephalography (EEG) to examine the cognitive processes underlying competitive games. Given some of the limiting methodological factors of EEG, we decided to explore real-world competitive contexts by using simple games, which are not as skill-based as the aforementioned examples.
Indeed, simple games are fruitful tools for our efforts to understand the mind in a competitive setting. Games such as Rock, Paper, Scissors (RPS) or Tic-Tac-Toe are widely known, and thus are easy to explain to experimental participants. They are often enjoyable to play, leading to increased engagement in the experimental task relative to prototypical [...] the case. Although originally associated with infrequent novel stimuli, the P3a has more recently been linked with attentional processes in general [18]. Thus, while novel or rare items can grab our attention, eliciting an enhanced P3a, other types of stimuli can as well: speech sounds [19], emotional stimuli [20], and television advertisements [21]. Faces are also known to elicit a P3a component. Specifically, it has been demonstrated that activity in this time range is greater for emotional faces compared to neutral faces [22,23] and for normal compared to distorted or inverted faces [24,25]. (Here we include studies on both the P3a and the P250, as these are thought to be the same component [26,27].) Our exploratory analysis asked whether the P3a component is also sensitive to opponent ability.

Participants
Participants were recruited from the subject pool at Dalhousie University. Twenty-one people took part in the study over the course of a single 2.5 h session. They were compensated with course credit (0.5/30 min) for their time. All participants provided informed consent approved by the Health Sciences Research Ethics Board at Dalhousie University.

Stimuli & Procedure
Participants were seated 75 cm in front of a 22-inch LCD monitor (75 Hz, 2 ms response rate, 1680 by 1050 pixels, LG W2242TQ-GF, Seoul, Korea). Visual stimuli were presented using the Psychophysics Toolbox Extension [28,29] for MATLAB (Version 8.2, Mathworks, Natick, MA, USA). Participants were given both verbal and written instructions in which they were asked to minimize head and eye movements.
Participants played 150 blocks of RPS against three virtual opponents. The opponents were well-known celebrities of the same gender as the participant. Each block consisted of three rounds (one for each opponent), and the order of opponents within each block was randomized. Rounds began with the presentation of a central fixation cross for 600-1000 ms. The opponent's face then appeared above the fixation cross for 1500-2000 ms, followed by the appearance of three hand shapes indicating the possible choices (rock, paper, or scissors). This was the participant's cue to choose, which they did by clicking on the appropriate hand shape using a computer mouse. The participant's choice then appeared in the lower center of the screen for 600-1000 ms. Finally, feedback (the opponent's choice) was displayed for 1200-1500 ms. See Figure 1 for a sample round. All jittered intervals were drawn from random uniform distributions. See Supplementary Material for the script that was read to participants.

Unbeknownst to participants, we varied opponent difficulty by controlling the number of wins, losses, and ties. The "hard" opponent won approximately 60% of the time, lost 20% of the time, and tied 20% of the time. The "easy" opponent was the reverse: they lost 60% of the time, won 20% of the time, and tied 20% of the time. The "average" opponent won, lost, and tied with equal probability. These outcome frequencies were achieved by sampling from a random uniform distribution prior to the start of the experiment in order to generate the predetermined outcome sequence. Opponent choice was then determined at the time of participant choice according to the predetermined outcome, e.g., if the predetermined outcome was "win" and the participant chose "scissors" then the opponent choice was "paper". To gauge their perception of opponent difficulty, participants completed an "opponent ability" survey prior to the study, at two points during the study, and at the end of the study. See Supplementary Figures S1 and S2 for the opponent ability surveys.
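The outcome manipulation described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: the function names, the dictionary layout, and the reading of outcomes from the participant's perspective (so that a "hard" opponent winning 60% of rounds means the participant loses 60% of rounds) are assumptions made for illustration.

```python
import random

# Outcome probabilities from the PARTICIPANT's perspective (assumed mapping):
# the "hard" opponent wins 60% of rounds, so the participant loses 60%.
OUTCOME_PROBS = {
    "hard": {"win": 0.2, "loss": 0.6, "tie": 0.2},
    "easy": {"win": 0.6, "loss": 0.2, "tie": 0.2},
    "average": {"win": 1 / 3, "loss": 1 / 3, "tie": 1 / 3},
}

# BEATS[x] is the shape that x defeats.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}
# LOSES_TO[x] is the shape that defeats x.
LOSES_TO = {loser: winner for winner, loser in BEATS.items()}


def make_outcome_sequence(opponent, n_rounds, rng):
    """Pre-generate outcomes by sampling a uniform random number per round."""
    probs = OUTCOME_PROBS[opponent]
    sequence = []
    for _ in range(n_rounds):
        u = rng.random()
        if u < probs["win"]:
            sequence.append("win")
        elif u < probs["win"] + probs["loss"]:
            sequence.append("loss")
        else:
            sequence.append("tie")
    return sequence


def opponent_choice(participant_choice, outcome):
    """Pick the opponent's shape that produces the predetermined outcome."""
    if outcome == "win":    # participant wins: opponent shows the beaten shape
        return BEATS[participant_choice]
    if outcome == "loss":   # participant loses: opponent shows the beating shape
        return LOSES_TO[participant_choice]
    return participant_choice  # tie: opponent mirrors the participant
```

For example, `opponent_choice("scissors", "win")` returns `"paper"`, matching the example in the text. Because outcomes are sampled independently per round, the realized frequencies only approximate the programmed rates.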

Data Collection
Our experimental software recorded the identity of each opponent (hard, easy, average), the participant choice (rock, paper, scissors), response time, and trial outcome (tie, loss, win). The "opponent ability" scores for each opponent (hard, easy, mid) and time point (pre-study, break 1, break 2, post-study) were recorded on paper and later transcribed.
EEG was recorded from 64 electrode locations using Brain Vision PyCorder software (Version 1.0.4, Brain Products GmbH, Munich, Germany). The electrodes were mounted in a fitted cap with a standard 10-20 layout and were recorded with respect to a virtual ground built into the amplifier. Electrode impedances were below 20 kΩ when the recording began, and the EEG was sampled at 500 Hz and amplified (actiCHamp, Brain Products GmbH, Munich, Germany).

Behavioral
The data file for one participant was lost. For the remaining 20 participants, we computed the mean number of each outcome type and mean response time for each opponent. Questionnaires for four participants also went missing.

EEG
The EEG was analyzed using the EEGLAB library for MATLAB [30]. The EEG was first downsampled to 250 Hz, then filtered through a 0.1-30 Hz bandpass filter. Next, we applied a 60 Hz notch filter to reduce line noise power. The data were then re-referenced to the average of the two mastoid channels, which were removed from subsequent analysis. Next, noisy channels were removed from the dataset. On average, we removed 0.67 channels, 95% CI [0.25, 1.08]. No more than three channels were removed for any participant.
We then used independent component analysis (ICA) to identify and correct ocular artifacts. First, the ICA was trained on three-second epochs starting at the presentation of the opponent's face. Epochs with large artifacts (voltage change exceeding 500 µV) were excluded from the ICA training. We then used the iclabel function to identify components that were more likely to be eye-related than brain-related, which we removed from the data. Finally, any electrodes that were previously removed due to noise were interpolated.
Feedback-locked and face-locked ERPs were constructed by first creating epochs from 200 ms before to 600 ms after the event of interest (feedback onset or face onset). Epochs were excluded from further analysis according to the following artifact rejection criteria: a voltage exceeding ±100 µV, a voltage difference exceeding 100 µV, a sample-to-sample difference of more than 40 µV, or all voltages below 0.1 µV. On average, we removed 1.50% of feedback-locked epochs, 95% CI [0.87, 2.12], and 2.27% of face-locked epochs, 95% CI [1.35, 3.20].
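As a rough illustration of these rejection criteria, the following Python sketch flags a single-channel epoch. It is not the authors' EEGLAB pipeline: the function name is hypothetical, "voltage difference" is assumed to mean the peak-to-peak range within the epoch, and the last criterion is assumed to detect flat signals via absolute voltage.

```python
def is_artifact(epoch_uv, abs_limit=100.0, range_limit=100.0,
                step_limit=40.0, flat_limit=0.1):
    """Return True if a single-channel epoch (list of samples in µV) is rejected."""
    # Criterion 1: any sample beyond +/- abs_limit
    exceeds_abs = any(abs(v) > abs_limit for v in epoch_uv)
    # Criterion 2 (assumed reading): peak-to-peak range beyond range_limit
    exceeds_range = (max(epoch_uv) - min(epoch_uv)) > range_limit
    # Criterion 3: any jump between consecutive samples beyond step_limit
    exceeds_step = any(abs(b - a) > step_limit
                       for a, b in zip(epoch_uv, epoch_uv[1:]))
    # Criterion 4 (assumed reading): the whole epoch is essentially flat
    is_flat = all(abs(v) < flat_limit for v in epoch_uv)
    return exceeds_abs or exceeds_range or exceeds_step or is_flat
```

In a multi-channel setting, one plausible design is to reject an epoch if any channel of interest is flagged; the text does not state which rule was used.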
Wins, Losses, and Ties. Feedback-locked epochs were then averaged for each participant and outcome condition (win, loss, tie). To define a reward positivity score, we used the method of difference waves, subtracting the loss waveform from the win waveform for each participant. To capture the peak of this difference wave, we focused on electrode FCz, a known scalp location of the reward positivity [8].
We then identified the time points at which 75% of the maximum voltage was reached in the grand average difference waveform: 284-365 ms, which is compatible with previous studies [31]. Finally, we computed the mean voltage in this time window for each participant and condition (win, loss, tie).
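The window-selection step can be sketched in Python as follows. This is an illustrative reading rather than the authors' code: the function names are hypothetical, the difference wave is assumed to have a single positive peak, and the window is taken as the contiguous run of samples around that peak that stay at or above 75% of its height.

```python
def window_at_fraction_of_peak(times_ms, diff_wave_uv, fraction=0.75):
    """Return (start_ms, end_ms) of the contiguous window around the peak
    in which the wave stays at or above `fraction` of the peak voltage."""
    peak_index = max(range(len(diff_wave_uv)), key=lambda i: diff_wave_uv[i])
    threshold = fraction * diff_wave_uv[peak_index]
    start = peak_index
    while start > 0 and diff_wave_uv[start - 1] >= threshold:
        start -= 1
    end = peak_index
    while end < len(diff_wave_uv) - 1 and diff_wave_uv[end + 1] >= threshold:
        end += 1
    return times_ms[start], times_ms[end]


def mean_in_window(times_ms, wave_uv, start_ms, end_ms):
    """Mean voltage of one waveform over the inclusive time window."""
    values = [v for t, v in zip(times_ms, wave_uv) if start_ms <= t <= end_ms]
    return sum(values) / len(values)
```

The window would be computed once on the grand average difference wave and then passed to `mean_in_window` for each participant and condition.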
Outcome Expectancy. To examine the effect of expectancy on feedback processing, we also created average "win" and "loss" waveforms for each opponent type (hard, easy, mid). Win and loss waveforms were then subtracted in such a way that feedback expectancy was matched. For example, we compared easy opponent losses to hard opponent wins; in both cases, the outcomes were rare. Matching outcomes in this way is important because expectancy is a known confound of the reward positivity [8]. Difference waves were created for each expectancy condition (low, medium, high) and a reward positivity score was computed by averaging at the same scalp location and over the same time window as before, that is, the time window identified from collapsing across all conditions, a method that is unbiased towards any of the expectancy conditions [32].
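One way to make the expectancy matching concrete is the Python sketch below. It rests on assumptions: outcomes are read from the participant's perspective (so "easy opponent losses" are the participant's rare losses against the easy opponent), and the probability cut-offs and all names are hypothetical choices for illustration.

```python
# Programmed probability of each outcome from the participant's perspective.
PARTICIPANT_OUTCOME_PROB = {
    ("hard", "win"): 0.2, ("hard", "loss"): 0.6,
    ("easy", "win"): 0.6, ("easy", "loss"): 0.2,
    ("average", "win"): 1 / 3, ("average", "loss"): 1 / 3,
}


def expectancy_label(opponent, outcome):
    """Classify an outcome as low, medium, or high expectancy.
    The 0.3 and 0.5 cut-offs are illustrative, not taken from the paper."""
    p = PARTICIPANT_OUTCOME_PROB[(opponent, outcome)]
    if p < 0.3:
        return "low"
    if p > 0.5:
        return "high"
    return "medium"


def matched_pairs():
    """For each expectancy level, list which opponent supplies the win
    waveform and which supplies the loss waveform."""
    pairs = {}
    for opponent, outcome in PARTICIPANT_OUTCOME_PROB:
        level = expectancy_label(opponent, outcome)
        pairs.setdefault(level, {})[outcome] = opponent
    return pairs
```

Under these assumptions, the low-expectancy difference wave pairs wins against the hard opponent with losses against the easy opponent, and the high-expectancy difference wave pairs wins against the easy opponent with losses against the hard opponent.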
Opponent Faces. Finally, we examined the face-locked response by averaging over each opponent type (hard, easy, average) for each participant. Epochs were excluded from the average using the same artifact detection procedure as before. Upon examining the face-locked waveforms, we noted that the pattern of deflections did not match that of a typical reward positivity: there was no prominent negative deflection in any of the conditions [7,8]. Rather, we observed a prominent positive deflection in each condition, which appeared to scale to opponent difficulty. The effect appeared to be in the P3a time range, and an exploratory analysis was conducted. To isolate the effect, we constructed a difference wave by subtracting the "easy" face from the "hard" face. We then calculated the mean voltage within a time window (220-256 ms) and electrode (Cz) where the difference was greatest.

Inferential Statistics
Response times were analyzed using a one-way repeated-measures ANOVA. Of the three response time conditions, two failed the Shapiro-Wilk test of normality (easy and mid). However, no corrections were made as the one-way ANOVA is robust to violations of normality. Opponent ratings were analyzed using a 3 (opponent: hard, easy, mid) × 4 (time: pre-study, break 1, break 2, post-study) repeated-measures ANOVA. Participant response choice was analyzed using a 3 (response: rock, paper, scissors) × 3 (opponent: easy, mid, hard) repeated-measures ANOVA.
The assumption of sphericity was tested using Mauchly's test for all repeated-measures ANOVAs and was not violated; thus, no corrections were applied. For each EEG analysis (outcome type, outcome expectancy, opponent) we analyzed the resulting scores using a one-way repeated-measures ANOVA after verifying the assumption of normality using the Shapiro-Wilk test. For the one-way ANOVAs, we computed two effect sizes: partial eta squared ($\eta_p^2$) and generalized eta squared ($\eta_g^2$).
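For readers unfamiliar with the two effect sizes, the Python sketch below computes them for a one-way repeated-measures design from the standard sums-of-squares decomposition. It is an illustration under assumptions, not the authors' analysis code: the function name and data layout are hypothetical, and generalized eta squared is computed with the common formula for a single manipulated within-subject factor (effect variance over effect, subject, and error variance).

```python
def rm_anova_effect_sizes(scores):
    """scores[s][c] is the score of subject s in condition c."""
    n_subjects = len(scores)
    n_conditions = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n_subjects * n_conditions)
    condition_means = [sum(row[c] for row in scores) / n_subjects
                       for c in range(n_conditions)]
    subject_means = [sum(row) / n_conditions for row in scores]

    # Partition total variability into effect, subject, and error terms.
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_effect = n_subjects * sum((m - grand_mean) ** 2 for m in condition_means)
    ss_subjects = n_conditions * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_error = ss_total - ss_effect - ss_subjects

    df_effect = n_conditions - 1
    df_error = (n_subjects - 1) * (n_conditions - 1)
    f_value = (ss_effect / df_effect) / (ss_error / df_error)
    return {
        "F": f_value,
        "df": (df_effect, df_error),
        "partial_eta_sq": ss_effect / (ss_effect + ss_error),
        "generalized_eta_sq": ss_effect / (ss_effect + ss_subjects + ss_error),
    }
```

Because generalized eta squared keeps between-subject variability in the denominator, it is never larger than partial eta squared in this design, which is why the two values can differ substantially for the same effect.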
There was no opponent by response interaction, F(4,80) = 1.15, p = 0.34, $\eta_p^2$ = 0.05, $\eta_g^2$ = 0.05. In other words, the distribution of responses was not influenced by opponent. We collapsed across opponent type and conducted three post-hoc tests against a Bonferroni-adjusted alpha value of 0.017 (0.05/3). The tests showed that participants chose their first-ranked response more often than their second-ranked response, t(20) [...]
[...] (Table 1 and Figure 4). Three post-hoc tests against a Bonferroni-adjusted alpha value of 0.017 (0.05/3) were done to compare final opponent ratings. The tests showed a difference between the final "hard" rating and the final "easy" rating, t(16) = 5.51, p < 0.001, Cohen's d = 1.34, between the final "hard" rating and final "average" rating, t(16) = 4.41, p < 0.001, Cohen's d = 1.07, and between the final "average" rating and the final "easy" rating, t(16) = 2.99, p = 0.009, Cohen's d = 0.73.
Games 2021, 12, 62
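The reported effect sizes for the opponent ratings appear consistent with computing Cohen's d for paired comparisons as t divided by the square root of the number of pairs. The Python sketch below checks this arithmetic; the formula choice and n = 17 (degrees of freedom plus one) are assumptions inferred from the reported values, not something the text states, and the function names are hypothetical.

```python
import math


def cohens_d_from_paired_t(t_value, n_pairs):
    """Cohen's d for a paired comparison, computed as t / sqrt(n)."""
    return t_value / math.sqrt(n_pairs)


def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha after a Bonferroni correction."""
    return alpha / n_tests


n_pairs = 16 + 1  # assumed: degrees of freedom plus one
for t_value in (5.51, 4.41, 2.99):
    print(t_value, round(cohens_d_from_paired_t(t_value, n_pairs), 2))
print(round(bonferroni_alpha(0.05, 3), 3))
```

Under these assumptions the three t values give d = 1.34, 1.07, and 0.73, and the adjusted alpha rounds to 0.017, in line with the values reported above.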

Face Processing
An analysis of the average waveforms locked to the onset of the face of each opponent type (hard, easy, average) revealed an effect of opponent type on the average voltage from 220-256 ms at electrode Cz, F(2,40) = 12.18, p < 0.001, $\eta_p^2$ = 0.39, $\eta_g^2$ = 0.04 (Figure 5).

Discussion
In the present experiment we had participants play RPS against three virtual opponents: one that was "hard" and won most of the time, one that was "average" and won, lost, and tied with equal frequency, and one that was "easy" and lost to participants most of the time. In terms of our behavioral results, we found what was expected: participants lost more against the hard opponent, won more against the easy opponent, and had equivalent outcomes against the average opponent. Interestingly, we did not find a difference in response time for participants' moves against any of the three opponents; thus, a speed-accuracy tradeoff was not observed [33,34]. Further, we did not find any difference in response selection in relation to opponent ability (see Figure 3). This is important, as it suggests that participants were engaged and trying to "outwit" their opponents. Moreover, it suggests that when competitive contexts are constantly changing, participants are likely to defer to a stable gameplay strategy regardless of the learned difficulty of their immediate opponent. It would be interesting to explore whether this stability in response selection strategy persists under conditions wherein competitive contexts are less dynamic (i.e., if one faces the same opponent for multiple sequential trials).
In terms of our ERP data, we observed a clear reward positivity when comparing wins and losses. A similar difference was seen when we compared wins to ties (Figure 4). This finding is in line with a wide range of work showing that feedback indicating the outcome of a choice elicits a reward positivity [6-8,11,35]. Further, this finding also suggests that, to some extent, a reinforcement learning system in the brain was engaged during gameplay. However, contrary to our hypothesis and contrary to previous work [9,10], we did not find the reward positivity to be modulated by expectancy. What about the neural responses to the opponents themselves? Again, contrary to our hypothesis, and contrary to previous literature on the evoked response to predictive cues [13-15], the faces of the opponents did not elicit a reward positivity. One potential reason for this is that we used faces instead of the simple stimuli that were used in previous experiments, like colored shapes. With that said, this is not a likely explanation because faces have been shown to elicit a reward positivity when playing the ultimatum game [13-15]. At this time, we are uncertain why the faces in our experiment did not elicit a reward positivity, and further work is needed.
However, and interestingly, we did observe a clear P3a ERP response that differentiated opponents' faces (Figure 5). As discussed previously, there are theoretical reasons why the P3a, indicative of attentional orienting [18], may differentiate between opponent types. Our results suggest a scaling of attentional orienting to opponent ability: the harder they are to beat, the more attention is engaged at the onset of the opponent's face. We can imagine two reasons for this. The first is that the P3a indicates a marshalling of resources in preparation for the upcoming response. Supporting this hypothesis, the P3a has been linked to enhanced cognitive control [36,37]. For instance, previous work has associated the P3a with task-set uncertainty [38]. It has also been postulated to reflect stimulus entropy, or the amount of information associated with a stimulus over and above stimulus-response-outcome probabilities, and may represent an aspect of the central bottleneck of attentional control [39].
A second explanation for our ERP results for faces relates to the meaning of the faces, not as low-level stimuli, but as opponents in a game. Under this view, the P3a could reflect the allocation of attention to facilitate learning of opponents' strategies, a task known to activate a "mentalizing" network in the brain [40]. This view harkens back to the context updating theory of the P300, which states that this component is related to revising an internal model of the world [41]. Unlike simulated players in the ultimatum game, who tend to be consistently fair or unfair, we presented participants with three opponents, each generating a distribution of three possible responses. Of relevance in RPS, of course, is the likelihood of each response for each opponent. Under this view, participants learned about opponents by updating an internal model of opponent strategies, not through a low-level association with rewards/punishments. While it is beyond the scope of this paper, this is an issue that is pertinent to model-based versus model-free reinforcement learning [42,43].
We note a potential issue with our choice of fixed outcome rates as opposed to variable outcome rates. Although participants "played" RPS, outcome likelihoods were fixed throughout the experiment. Thus, participants' actions did not influence outcomes, unlike in real-life RPS. The use of fixed outcome rates has a long precedent in reward positivity research [6,9,10,12,44-54]. In some studies, fixed outcome rates are a convenient way to investigate the effect of expectancy on the reward positivity (e.g., [9,10]). In others, they provide an important means of controlling "frequency confounds" across experimental conditions [8]. However, fixing outcome rates, and thereby untethering actions from outcomes, may come at the cost of ecological validity [4]. For example, it is yet unknown whether our results would replicate while playing actual RPS against real opponents. While acknowledging this issue, we argue that the use of simple games in neuroscience (and the benefits they afford) sometimes involves methodological compromise [8].
We also note limitations related to our choice of game. Here we relied on RPS, a game that requires little skill, to explore interactions with an opponent. Potentially, future work could address this, as EEG studies have been done with games such as Blackjack [50] and Chess [55]. Another potential limitation of the present work is that our experimental design did not allow us to discern whether participants thought opponents were actually "better" or whether they were just lucky. Further, a fundamental assumption of the present study was that all participants were familiar with RPS, something that we did not check but should have. Additionally, our game was structured (for methodological reasons) so that the participant always went first. It would be interesting to probe the neural response to trials in which the opponent goes first; however, this is beyond the scope of the present experiment.

Conclusions
Here we used EEG, and specifically ERPs, to examine how the face of an opponent was processed while playing RPS. Our principal finding was that viewing an opponent's face, as a predictor of their ability, did not engage reward processing systems within the brain. Instead, we found that viewing an opponent's face activates the brain's attentional system, with the harder opponent drawing more attention than the average or easy opponents. We interpret this result as an indicator of strategy learning that occurs not via a reinforcement learning process but rather via the updating of memory for the opponent.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/g12030062/s1, Figure S1: Opponent ability survey for female participants/opponents, Figure S2: Opponent ability survey for male participants/opponents, Figure S3 [...]
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data for this study can be found by contacting the corresponding author.