1. Introduction
Many employees face competition within the workplace. Job promotions, awards such as employee of the month or incentive pay for top performers are a few of the common tournament structures employed in many workplaces.
1 Additionally, employee interaction outside of the workplace is increasingly common.
A study recently published by Millennial Branding found that individuals between the ages of 18 and 29 are friends with an average of 16 co-workers on Facebook.
2 Garcia et al. (2013) [
3] cite personal history as a potential influence on competitive behavior. The authors note that situational factors can come into play where comparison and competitiveness are found within what they label “Social Category Fault Lines.”
I use a controlled laboratory setting to enhance the understanding of how emotions towards other individuals affect tournament performance, as well as how the behavioral findings may differ across genders. Behavioral responses to emotions, tournament performance and gender differences have been studied separately, but there is little research that combines all three of these variables. My study adds to the body of research in that it aims to assess emotion, tournament performance and gender differences simultaneously.
While individual emotions can be significantly dynamic and complex outside the laboratory, the anonymity of the laboratory allows me to observe the specific impact of laboratory induced emotions on individual behavior in the presence of competition and how the impact on behavior differs across genders. Specifically, I test how negative interpersonal experiences affect performance in competitive situations, while controlling for the level of negativity and gender. My study provides a setting in which women are found to respond positively to the presence of a tournament while men are unaffected.
I conducted a two-stage laboratory experiment. In the first stage, a four-person public goods game was used to generate a range of emotions. Subjects observed the individual contribution decisions of all group members. I then asked them to state their feeling about each group member on a scale of one to five, where one indicated strongly negative feelings and five reflected strongly positive feelings. The second stage consisted of a one-on-one tournament using a real-effort task where the assigned opponent is a group member from the public goods game. The opponent remained the same for all five rounds of the tournament, and subjects were informed about the selected opponent’s identity prior to the start of the tournament. Between tournament rounds, subjects were informed whether their number of completed tasks exceeded the number of completed tasks of their opponent, but not how many tasks the opponent completed.
By comparing individual performance in the tournament across the reported feelings, I determined if and how emotions affect tournament performance. Specifically, the experiment investigated if an individual who was matched with an opponent he rated as strongly negative responded with increased performance, compared to an individual matched with a less negatively rated opponent. Additionally, because gender may affect tournament behavior, the study tested how behavior differs between men and women, particularly while controlling for the level of emotion. I found that strongly negative emotions positively affect performance for women but not for men. This observed difference is not due to a difference in the intensity of the generated emotion across genders. As with all laboratory experiments, follow-up experiments and replications are important for confirming and validating the conclusions drawn.
In previous studies, Fehr and Gächter (2000a, 2002) [
4,
5] show that strong negative emotions bring about a desire to punish free-riding in public goods games and, if given the opportunity, subjects choose to undertake costly punishment a significant portion of the time, even when there is no direct monetary benefit to the subject.
In a survey on observed reciprocity in the literature, Fehr and Gächter (2000b) [
6] mention numerous other studies also suggesting that negative reciprocity arises out of a desire to punish “hostile intentions” (e.g., Rabin, 1993 [
7]; Blount, 1995 [
8]; Dufwenberg and Kirchsteiger, 2004 [
9]; and Falk and Fischbacher, 2006 [
10]). In my study, punishment takes the form of increased effort in a real-effort task so as to increase the probability of winning the tournament and, therefore, the probability that the other person loses it.
While emotions can influence individual behavior, the effect of this influence may differ between men and women. Bettencourt and Miller (1996) [
11] show through a meta-analysis that while unprovoked males are more aggressive than unprovoked females, this difference is significantly reduced when both genders are provoked. In a review of the literature on gender differences, Croson and Gneezy (2009) [
12] note many observed differences between men and women that are consistent across the data; however, the data on gender differences in public good contributions is mixed. They state that some studies show that women are more pro-social in a public goods game than men (e.g., Seguino, Stevens, and Lutz, 1996 [
13]), whereas other studies find the opposite (e.g., Brown-Kruse and Hummels, 1993 [
14]; Sell and Wilson, 1991 [
15]; and Solow and Kirkwood, 2002 [
16]). The authors note that psychological research indicates that women are more sensitive to social cues and therefore may respond differently depending on the experimental design. In my study, I look at how emotions affect behavioral responses to a tournament environment, specifically when these emotions are connected to the other individuals in the tournament.
Combining gender differences and tournament performance, Gneezy, Niederle, and Rustichini (2003) [
17] demonstrate that males respond more strongly than females to a tournament environment by increasing effort while female effort remains unchanged or decreases across piece-rate versus tournament environments. Similarly, in a study of elementary school children, Gneezy and Rustichini (2004) [
18] assess the running speed of boys and girls in both competitive and noncompetitive environments. They find no difference in running speed between boys and girls in the noncompetitive environment, but do find a significant increase in the running speed of the boys when presented as a competition. The running speed of the girls did not change significantly, thus creating a significant gender gap in competition performance. Additionally, in a study assessing gender differences in tournament entry decisions, Niederle and Vesterlund (2007) [
19] show that, when given the option of compensation from either a piece-rate scheme or a tournament, males choose the tournament environment significantly more than females. The authors show that even when the tournament may be beneficial for high performing females, women tend to avoid competitive tasks. In contrast, a recent study by Cassar, Wordofa, and Zhang (2016) [
20] finds that the gender gap in tournament selection is erased when incentives benefit one’s child, such as workplace daycare, rather than being monetary. This recent study provides evidence that there may not be an actual difference in preference for competition between males and females but rather that, in the right environment and with properly aligned incentives, women can be enticed to compete as vigorously as men. Additionally, a difference in beliefs about future performance, rather than a difference in preference for competition, may be responsible for the observed difference in tournament entry rates across genders. Using a female-stereotyped task, I find no difference in tournament entry rates across genders, while I do find a difference using a male-stereotyped task (Halladay, 2016) [
21]. This suggests that a difference in beliefs about future performance is the channel through which gender differences in tournament entry rates arise.
To the best of my knowledge, Gneezy and Imas (2014) [
22] is the only experiment combining emotion and tournament performance. The authors find that with a strength-based task, anger improves performance. However, their all-male subject pool does not allow for the analysis of gender differences.
The paper proceeds as follows. 
Section 2 outlines my experimental design and my hypotheses and predictions. I present my results in 
Section 3 and discuss the implications of these results in 
Section 4. 
Section 5 concludes.
2. Materials and Methods
The experiment takes place in two stages. The first stage presents a situation that can trigger feelings in the laboratory, both positive and negative, while the second stage is a tournament.
3 Participants do not learn about the nature of the second stage until after the conclusion of the first stage.
In the first stage, experimental subjects are randomly placed into groups of four to play a one-shot public goods game with voluntary contributions. Prior to the allocation stage, subjects learn how to calculate payoffs through a series of examples. Furthermore, I require that subjects successfully calculate payoffs for two hypothetical scenarios on their own before the first stage begins. Subjects start with 
$7.00, total contributions are multiplied by 1.6, and then distributed equally to all group members. Additionally, subjects are told that all contribution decisions will be revealed to the group with identification by subject ID numbers. After allocation choices, subjects learn their own payoffs and the contributions and payoffs of the other three group members. Subsequently, subjects provide feedback about their feelings regarding the other three group members using a five-point scale. A rating of one indicates strongly negative feelings, and a rating of five indicates strongly positive feelings. A rating of three denotes neutral feelings, neither positive nor negative. I include the neutral feelings rating option in the event that a subject does not feel they can rate a group member. A screen shot of this zTree screen can be found in the 
Appendix.
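The payoff rule for the first stage can be sketched as follows. This is a minimal illustration, assuming whole-dollar contributions between 0 and 7 (the restriction noted in the results section); the function and constant names are hypothetical and are not taken from the z-Tree program. With a multiplier of 1.6 and four group members, each dollar contributed returns $0.40 to every member.

```python
from typing import List

ENDOWMENT = 7.00   # dollars each subject starts with
MULTIPLIER = 1.6   # factor applied to total group contributions
GROUP_SIZE = 4     # subjects per group


def public_goods_payoffs(contributions: List[int]) -> List[float]:
    """Return each member's first-stage payoff in dollars."""
    if len(contributions) != GROUP_SIZE:
        raise ValueError("expected one contribution per group member")
    if any(c < 0 or c > ENDOWMENT for c in contributions):
        raise ValueError("contributions must lie between 0 and the endowment")
    # Every member receives an equal share of the multiplied pot.
    share = MULTIPLIER * sum(contributions) / GROUP_SIZE
    # Payoff = what the subject kept + the equal share of the pot.
    return [round(ENDOWMENT - c + share, 2) for c in contributions]


# Hypothetical group: one free rider and three partial contributors.
print(public_goods_payoffs([0, 3, 3, 2]))
```

In this hypothetical group the free rider earns the most, which is the tension the first stage relies on to generate negative feelings.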
For the second stage, I match each participant with one other participant from their original group of four for five rounds of a slider task using a tournament payment scheme (Gill and Prowse, 2013) [
23]. Participants are informed about the subject ID (e.g., Person 1) of the selected individual on the instructions screen for the second stage.
I followed essentially the same procedures as Charness and Villeval (2009) [
24]. This design is also similar to many studies that use a matching groups protocol (e.g., Charness, 2000 [
25]; Greiner and Vittoria Levati, 2005 [
26]; Charness, Fréchette, and Qin, 2007 [
27]).
After reading the instructions, but before beginning the tournament, I remind subjects of the results of the first stage and the ratings they assigned to each group member. In each of the five rounds, participants have 90 seconds to complete as many slider tasks as possible. A slider bar is complete if the subject slides the marker exactly to the halfway position (50). Initially, subjects see a screen displaying 48 slider bars. If the subject completes all 48 slider bars within the 90 s, I give subjects an additional 48 slider bars to complete to ensure performance is not constrained. As the round progresses, the screen displays how many sliders the subject has successfully completed. A screen shot depicting this zTree screen can be found in the 
Appendix. A participant wins a tournament round if he completes more slider tasks than his opponent. At the end of each round, each participant learns whether he won the round but not how many tasks the matched person completed. Participants were told that one of the five tournament rounds would be randomly selected for payment in addition to the public goods game payoff. If the participant won the selected round, he received 
$2.50. Total payment consisted of the payoff from the first stage and the result of the tournament. Subjects earned 
$8.36 on average from the public goods game.
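The second-stage payment rule can be sketched as below. This is an illustration only: the names are hypothetical, and treating a tie as a non-win is an assumption, since the text states only that a round is won by completing more tasks than the opponent.

```python
import random
from typing import List, Optional

PRIZE = 2.50  # dollars paid for winning the selected round


def tournament_payment(own: List[int], opponent: List[int],
                       rng: Optional[random.Random] = None) -> float:
    """Pay the prize if the subject won one randomly selected round."""
    if len(own) != len(opponent) or not own:
        raise ValueError("need the same, nonzero number of rounds for both")
    rng = rng or random.Random()
    selected = rng.randrange(len(own))  # each round is equally likely
    # A round is won only by completing strictly more sliders than the
    # opponent; paying nothing on a tie is an assumption of this sketch.
    return PRIZE if own[selected] > opponent[selected] else 0.0
```

Total earnings are then the first-stage payoff plus this amount. Because only one round is paid, every round carries the same expected stakes.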
I use a between-subjects design. I conducted three treatments: positive feelings (PF), negative feelings (NF) and median feelings (MF). In the PF treatment, individuals were matched with the group member they rated most positively, while in the NF treatment, individuals were matched with the group member they rated most negatively. In the MF treatment, individuals were matched with the group member they rated intermediately compared to the other two group members.
4 Treatments did not vary within sessions.
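Viewed from a single subject's perspective, the treatment rule amounts to picking the lowest, middle, or highest of the three ratings he assigned. The sketch below illustrates only that idea. The function name, the ID strings, and the tie-breaking rule are hypothetical, and the sketch does not attempt to reproduce how pairs were formed across the whole group.

```python
from typing import Dict


def pick_opponent(ratings: Dict[str, int], treatment: str) -> str:
    """Choose an opponent ID from a subject's three ratings (1 to 5)."""
    if len(ratings) != 3:
        raise ValueError("expected ratings for exactly three group members")
    position = {"NF": 0, "MF": 1, "PF": 2}
    if treatment not in position:
        raise ValueError("treatment must be 'NF', 'MF' or 'PF'")
    # Sort from most negative to most positive rating. Ties are broken by
    # ID purely for determinism; this is an assumption of the sketch.
    ordered = sorted(ratings, key=lambda pid: (ratings[pid], pid))
    return ordered[position[treatment]]


# Hypothetical ratings assigned by one subject.
example = {"Person 1": 1, "Person 2": 4, "Person 3": 3}
print(pick_opponent(example, "NF"))
```

Note that a subject who rates all three group members identically faces the same opponent rating in every treatment, which is one reason the treatment can be a noisy proxy for the reported feeling.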
This experiment was programmed in z-Tree (Fischbacher, 2007 [
28]). All sessions took place in the Experimental and Behavioral Economics Laboratory at the University of California, Santa Barbara. I used the University’s ORSEE system to recruit subjects, and all were current students. A total of 180 students participated. I ran a total of 12 sessions with each session having either 12 or 16 subjects (three or four groups). 64 subjects participated in the PF treatment, 72 participated in the NF treatment, and 44 participated in the MF treatment. No subject participated in more than one session or more than one treatment. Average earnings were 
$9.84 and each session lasted approximately 45 min. A set of instructions can be found in the 
Appendix.
2.1. Hypotheses
This experiment addresses two main hypotheses about how individuals behave in a competitive environment with someone with whom they have some recent experience. Rational economic theory suggests that effort in the competition should be unaffected by the results of the first stage unless first stage contributions to the public goods game are informative about the opponent’s second stage performance/effort choice. If, for example, individuals who are viewed as strongly negative are also more likely to exert higher levels of effort, opponents might respond to the higher expected effort levels by also increasing their own effort.
Will the opportunity to compete with an individual who has generated negative feelings for a subject lead to increased performance? This kind of behavior would be evidence of negative reciprocity. Previous studies show that in public goods games, individuals will punish free riders if given the chance, even when punishment is costly (Fehr and Gächter, 2000a [
4]). Though this experiment does not allow for direct punishment, subjects can increase performance as a means of punishing a non-cooperator. While previous work has focused on punishment through reducing one’s own payoff, another form of punishment is increasing one’s probability of winning a tournament and thus reducing the payoff of the other individual through increased real effort. In another experiment allowing for sanctions, Fehr and Fischbacher (2004) [
29] find that negative emotions drive sanctioning decisions that promote more pro-social behavior. Additionally, Kahneman, Knetsch, and Thaler (1986) [
30] find evidence of punishment through indirect reciprocity when subjects chose to forgo a larger payoff for themselves to punish an individual who had previously acted unfairly. In an experiment that allows subjects to retaliate against each other, Bolle et al. (2010) [
31] find that when subjects can vent their frustration and anger, social inefficiencies are reduced. As seen in Garcia et al. (2013) [
3], the history between two parties can significantly affect competitive behavior. The first stage of this experiment may indirectly draw “social category fault lines” dividing cooperators and free-riders. It is plausible that an individual’s negative feelings ‘light a fire’, lead him to ‘let off steam’, or create a desire to reestablish dominance, all resulting in increased performance. However, I will be unable to distinguish whether this observed increase in performance arises from having previously been hurt financially by someone or from being matched with someone who violated a social norm.
Previous research also demonstrates that it is much easier to find evidence of negative reciprocity than evidence of positive reciprocity. The lack of evidence of positive reciprocity in their data led Charness and Rabin (2002) [
32] to not even include positive reciprocity as an explanation for behavior in their model. In an experimental labor market, Charness (2004) [
33] finds strong evidence of negative reciprocity when employer-assigned wages are low, but no significant evidence of positive reciprocity when employer-assigned wages are high. Additionally, Offerman (2002) [
34] provides evidence that individuals respond more strongly to negative intentions as opposed to positive intentions as the result of a self-serving bias. Individuals tend to view positive outcomes as a positive reflection of themselves while they tend to view negative outcomes as a negative reflection of others. Fehr and Gächter (2000a) [
4] demonstrate through punishment in a public goods game that “...there is a large drop in punishments if an individual’s contribution is close to the average...Thus, the more an individual’s contribution falls short of the average the more she gets punished.” If low contributions are viewed more negatively, consistent with Fehr and Gächter, I hypothesize that I should only observe the increase in performance for individuals competing with someone toward whom they have strongly negative feelings. Succinctly:
Hypothesis 1. The number of completed slider tasks will be decreasing in one’s rating of his opponent.
 Gneezy and Rustichini (2004) [
18] find no gender differences in speed when children run alone but do find that boys outperform girls when running in mixed-gender pairs, providing further evidence that males tend to be more responsive to competitive environments. Additionally, Buser and Dreber (2014) [
35] show that under a piece-rate payment scheme, men significantly outperform women in the slider task. Therefore, given the combination of a male-dominant task and competition-driven male performance, I hypothesize that men will outperform women independent of emotion. To summarize:
Hypothesis 2. The number of completed slider tasks will be higher for men than for women, holding reported emotion constant.
 Lastly, previous research yields inconsistent evidence regarding which gender will be most affected by the emotions. Eckel and Grossman (2005) [
36] find that women are more likely to punish unfair behavior than men. In contrast, Christensen et al. (1983) [
37] find evidence that in romantic relationships, males exhibit a stronger self-serving bias, and they may be more apt to engage in negative reciprocity. Hence, there is no clear prediction about behavior in this respect.
3. Results
My subject pool was 43.64% male. Subjects contributed an average of 
$2.25 (32% of the endowment) during the public goods game. The average rating of feelings towards the matched opponent over all treatments was 2.9 and the average assigned rating across all group members was 2.89.
5 Table 1 presents the breakdown of opponent ratings by treatment, while 
Table 2 presents the breakdown of opponent ratings by gender. Average tournament performance was 25.07 tasks across all five rounds.
Finding 1: Higher (lower) contributions do lead to higher (lower) ratings:
My data confirm that individuals view low contributions negatively. There is a clear positive relationship between contribution and average rating assigned as shown in 
Figure 1. A test of correlation between contribution and rating assignment affirms the relationship that subjects negatively view low contributions (
, 
p < 0.0001). Using 
Figure 2, I check for average contribution given a rating across genders to determine if there is a gender difference when assigning ratings based on contribution levels. Confirmed by pair-wise 
t-tests, there is no evidence that women and men have different thresholds for assigning a specific rating. 
Figure 3 suggests that contributions that fall below the group average are viewed negatively, while contributions matching or exceeding the group average are viewed positively, further evidence that lower contributions lead to lower ratings.
Figure 4 illustrates that there is also no gender difference in contributions to the first stage public goods game. A Kolmogorov-Smirnov test used to detect a difference in the distribution of public goods game contributions for men and women is not significant (
p = 0.147), and therefore there is no evidence of a gender difference in the distribution of contributions. Furthermore, the null hypothesis of equal mean public goods game contributions also cannot be rejected (
p = 0.170). I find no evidence that average contributions differ across genders.
 Because subjects receive feedback between tournament rounds, the forthcoming analysis will proceed using only data from the first period in order to avoid learning and feedback effects. Dechenaux et al. (2015) [
38] note that research on feedback in tournaments is complicated and varied. There is no set conclusion on the effect of feedback across all experimental environments. In fact, the effect of feedback is quite sensitive to the specific environment including incentive structure, timing of feedback, structure of feedback, number of rounds, etc. For example, Gneezy and Rustichini (2004) [
18] demonstrate that in the presence of relative performance feedback, the performance of boys increased while the performance of girls was unaffected. Using International Tennis Federation data, Wozniak (2012) [
39] finds that while males are influenced by performance over many periods and their behavior seems to reflect a belief in a “hot hand”, females are influenced by their most recent tournament performance. Additionally, previous work in psychology has shown that women view their success as the result of good luck while men view their success as the result of their own ability. Further, research has shown that emotional responses tend to die out over time, and therefore the all-period analysis may fail to capture the behavioral responses tied to the reported emotions. Grimm and Mengel (2011) [
40] find that low ultimatum game offers are accepted 60 to 80% of the time when subjects are given a ten-minute delay prior to the rejection decision, whereas these low offers are accepted only 20% of the time without a delay. An additional concern with the all-period analysis is that it is impossible to disentangle the behavioral responses to the emotions induced in the first stage from the behavioral responses to the emotions induced by feedback. Not only will the emotions from the first stage diminish, but it will be unclear when and to what extent the feedback emotions take over.
6 While the results using only period one and all five periods are relatively consistent, the results from all five rounds are potentially confounded by the between-round feedback and the time elapsed.
7 As illustrated in 
Figure 5 and 
Figure 6, men significantly outperform women in every round when aggregating all subjects regardless of opponent rating. For subjects with an opponent rated strongly negative, there is no statistically significant gender gap in performance in any round. For the subjects without an opponent rated strongly negative, men outperform women in every round. Because of the between-round feedback, interpreting the evolution of performance across rounds can be very complex, as one needs to control for the history of feedback. For this specific study, I do not have a sufficient number of observations to fully separate individuals by the entire feedback history and then provide meaningful analysis. Instead, I will focus on the behavioral responses in Round 2 after subjects receive their first between-round feedback.
8 Figure 7 depicts the results for Round 2 and illustrates the varying responses to feedback across genders. Similar to the finding of Gneezy and Rustichini (2004) [
18], men increase performance in Round 2 after the feedback significantly more than women (
p = 0.002). When paired with a liked individual, men and women respond similarly regardless of relative performance feedback (
p = 0.610 and 
p = 0.452). When paired with a disliked individual, women respond similarly regardless of the relative performance feedback (
p = 0.767). When paired with a disliked individual, men who won Round 1 significantly increase effort in the next round compared to men who lost Round 1 (
p = 0.038). These results further emphasize the contamination concerns of the all-period analysis. The remaining analysis using all five rounds is available in the 
Appendix.
While 
Table 3 illustrates there is an effect when running the regression analysis using the randomly assigned treatment, the effects are weaker because, as shown in 
Table 1, there are still subjects matched in the tournament with an individual assigned the strongly negative rating despite being in the MF or PF treatments. Therefore, the treatment variable is a much noisier, though still significant, signal of reported feelings.
9 The average opponent rating in the NF treatment was 2.06, 3.12 in the MF treatment, and 3.73 in the PF treatment. All three of these pair-wise tests of the equality of means are significant (
p ≤ 0.008). It does appear that the treatment captured a difference in opponent ratings but is clearly a noisier signal. The remaining analysis will group data into bins by assigned opponent rating as opposed to treatment.
10 Finding 2: Performance is higher for individuals competing with someone rated as strongly negative:
Not controlling for gender, average performance is higher for individuals whose opponents received ratings of one (strongly negative feelings) compared to ratings two through five.
11 For the individuals matched with an opponent rated one, average performance was 15.91. For opponents rated two through five, average performance was 12.79, 10.79, 11.00 and 12.54, respectively. This sharp increase in performance is illustrated in 
Figure 8, and a two-sided 
t-test comparing the difference in performance for subjects matched with an opponent who received a rating of one versus grouping subjects matched with opponents who received ratings greater than one (
Figure 9) is significant (
p = 0.0169).
12 I have evidence that strongly negative personal history significantly increases competitive behavior.
These results are in line with Fehr and Gächter (2000a) [
4] where punishment and negative emotions both intensify the larger the negative deviation from the group average. In 
Figure 3, I demonstrated that ratings of one and two pertained to below-average contributions while ratings three through five reflected above-average contributions. Keeping with Fehr and Gächter (2000a) [
4], I would expect to see punishment when the contributions are below average, but not necessarily when the contribution is close to the group average. The average group deviation for opponents rated strongly negative (a rating of one) was −
$1.80 and the deviation for opponents rated somewhat negative (a rating of two) was −
$1.08. Both of these values are negative and significantly different from zero. However, because subjects were restricted to contributing whole numbers, a deviation of roughly one from the group average does not seem to indicate a substantial deviation. Therefore, similar to Fehr and Gächter (2000a) [
4], I would not expect to see an increase in effort for individuals matched with subjects rated somewhat negative.
13 However, it seems plausible that because the range of potential contributions was only [0,7] inclusive, a deviation of roughly two from the group average would be viewed much more significantly. As with Fehr and Gächter (2002) [
5], this is where I would expect to see the increase in punishment. These results support Hypothesis 1 that performance will increase when competing with someone about whom you feel strongly negative.
Finding 3: Performance only increases significantly when the behavior of the opponent is particularly flagrant:
Using a Kruskal-Wallis test for a difference in median performance among individuals competing with subjects rated two through five, I find no evidence of a difference in effort among these four groups (p = 0.4168). Additionally, a Kruskal-Wallis test for a difference in median performance across all ratings is marginally significant, suggesting at least one of the medians differs (p = 0.0813). Taken together, these tests suggest that the difference lies in the median performance of individuals competing with an opponent viewed strongly negatively. Pairwise t-tests produce relatively consistent results: the mean performance of individuals competing with a subject rated one differs from that of individuals competing with subjects rated three (p = 0.0229) and four (p = 0.0554), although the comparison with subjects rated five does not reach conventional significance (p = 0.1781). The p-value on the two-sided t-test comparing average performance of individuals with opponents rated one and opponents rated two is insignificant at 0.2062. This may very well be due to the small sample sizes in each of the rating bins. None of the t-tests comparing opponents of ratings two through five could reject the null hypothesis of no difference in average performance. Behavior appears to differ when the emotions involved are strongly negative, supporting Hypothesis 1.
Finding 4: Male performance is higher than female performance:
Looking at average effort by gender, my data support the hypothesis that men have higher performance in the tournament. Average male performance is 14.85 tasks while average female performance is 10.52 tasks. This difference is significant using a two-sided 
t-test (
p = 0.0030). The creators of the slider task, Gill and Prowse, provide evidence that within their subject pool, male and female behavior was not significantly different. By the final round, men completed 25.75 tasks on average while women completed 26.83 tasks on average. 
Figure 10 corroborates this finding, illustrating that male performance is above the performance of females for every reported emotion other than strongly negative emotions.
Finding 5: Women respond more strongly to negative emotions:
Figure 11 allows a comparison in performance for opponent ratings of one versus ratings above one by gender. The 
t-tests in 
Table 4 illustrate five results. All 
p-values presented in 
Table 4 are two-sided unless otherwise specified. Male performance rises by 0.98 tasks (
p = 0.7430) on average when the emotions are strongly negative while female performance rises by 6.96 tasks (
p = 0.0012) on average in response to the strongly negative emotions. In the absence of strongly negative emotions, males perform 5.45 tasks (
p = 0.0004) more than females; however, with strongly negative emotions, this difference falls to 0.53 tasks (
p = 0.8850). All of these results also hold using Fisher’s exact test on the difference in medians. The gender gap in competition performance is eliminated in the presence of this strongly negative emotional stimulus. Women seem to respond to the emotion more strongly than men. This notion is supported by the difference-in-differences estimate provided in the regression results of Column 4 of 
Table 5 (
p = 0.098). The observed increase in performance when competing with a negatively viewed opponent appears to be purely driven by the female response as male performance is unaffected. Though men are more competitive across the board, negative emotions appear to evoke a “competitive fire” in women. A similar analysis comparing performance across subjects who were “badly wronged” in the public goods game can be found in the 
Appendix.
 Lastly, it is worth noting that aggregate performance is higher for pairs in which one member is viewed strongly negatively. In pairs with an opponent assigned a rating of one, aggregate performance is 4.4 tasks higher than in pairs without. This difference is marginally significant using a two-sided t-test (p = 0.0846).
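The raw differences quoted above already imply a difference-in-differences of about six tasks. The short sketch below only redoes that arithmetic from the numbers in the text; the variable names are illustrative and are not taken from Table 4.

```python
# Differences quoted in the text for Table 4 (tasks completed).
female_rise = 6.96   # change in female performance when the opponent is rated one
male_rise = 0.98     # change in male performance when the opponent is rated one
gap_without = 5.45   # male minus female, opponent rated above one

# Raw difference-in-differences: extra response of women relative to men.
did = female_rise - male_rise

# Implied male-minus-female gap when the opponent is rated one.
gap_with = gap_without - did

print(f"difference-in-differences: {did:.2f} tasks")
print(f"implied male-minus-female gap with a rating of one: {gap_with:.2f} tasks")
```

The implied gap matches the reported 0.53 tasks in magnitude. Its negative sign would mean that women are slightly ahead in that cell, which appears consistent with the description of Figure 10.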
Regressions in 
Table 5 support the above results. All regressions included session dummy variables and demographic controls, including whether a subject had previously participated in an economics experiment and whether the subject was an economics/accounting major. Standard errors were clustered at the session level. By including session dummy variables and clustering at the session level, I am essentially producing the first stage of the Donald and Lang (2007) [
45] two-step correction for a small number of clusters. I do not need the second stage because it yields estimates only for coefficients of interest that are session specific, of which I have none.
In Model (1), I created a dummy variable for opponents rated one or two (strongly negative and somewhat negative, respectively), and another dummy variable for opponents rated four or five (somewhat positive and strongly positive, respectively). I left those with an opponent rated three (neither positive nor negative feelings) as the reference group. Neither of these coefficients was significant, which is not surprising given the graphical evidence above showing that the performance increase is found only for those individuals competing with someone rated strongly negative.
I used Model (2) to include dummy variables for all the rating categories, leaving the neither positive nor negative category as the reference group. The coefficient on the strongly negative category was the only one close to significance (p = 0.103) when compared to the reference group using a two-sided hypothesis test, and it is significant at the 10% level using a one-sided test. Subjects competing with a strongly disliked (rating of one) individual completed roughly four additional tasks compared to subjects competing with an individual with a rating of three.
This confirms the pattern in 
Figure 8, where the only spike in performance was evident with individuals whose opponents were viewed as strongly negative. The positive coefficient for strongly negative feelings is marginally insignificant using a two-sided test (
p = 0.117); however, with a one-sided hypothesis test like the one of interest, the coefficient is significant at the 10% level. It appears that something is masking the observed increase in performance, such as a gender difference driven by an increase in female performance. A simple difference-in-differences specification should clarify this.
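The rating dummies in Models (1) and (2), and the step from a two-sided to a one-sided p-value, can be sketched as follows. The function and key names are hypothetical, and the conversion assumes a symmetric test statistic (such as a t statistic) whose estimate has the hypothesized sign.

```python
def rating_dummies(rating: int) -> dict:
    """Indicator variables for an opponent rating of 1 to 5.

    A rating of 3 (neither positive nor negative) is the omitted reference
    group, so every indicator is zero for it.
    """
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError("rating must be an integer from 1 to 5")
    return {
        # Model (1): pooled negative and positive categories
        "negative": int(rating in (1, 2)),
        "positive": int(rating in (4, 5)),
        # Model (2): one indicator per non-reference category
        "strongly_negative": int(rating == 1),
        "somewhat_negative": int(rating == 2),
        "somewhat_positive": int(rating == 4),
        "strongly_positive": int(rating == 5),
    }


def one_sided_p(two_sided_p: float, sign_as_hypothesized: bool = True) -> float:
    """One-sided p-value implied by a two-sided p-value (symmetric statistic)."""
    half = two_sided_p / 2
    return half if sign_as_hypothesized else 1 - half


# Two-sided p-values quoted in the text for the strongly negative coefficient.
for p in (0.103, 0.117):
    print(f"two-sided p = {p:.3f} -> one-sided p = {one_sided_p(p):.4f}")
```

Both one-sided values fall below 0.10, which is the sense in which the coefficient is significant at the 10% level under a one-sided hypothesis.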
I included an interaction term between the dummy variable for male and the dummy variable for strongly negative feelings to test whether the reaction to the emotional stimulus differed between men and women. The negative and significant coefficient on this interaction provides evidence that, when competing with someone viewed as strongly negative, women increase performance by roughly six tasks more than men. This result corroborates 
Figure 11 and the 
t-tests presented in 
Table 4, which indicate that females are driving the increase in performance and are much more responsive to the negative emotions. Interestingly, as also shown in 
Figure 11, male performance is unaffected by the negative emotional stimulus. This can be demonstrated by summing the coefficients on the strongly negative dummy variable and the new interaction variable. These two coefficients essentially cancel each other out, illustrating that male performance is not altered by the presence of these negative emotions. Using a Wald test on the sum of these two coefficients, I cannot reject the hypothesis that the coefficients on the strongly negative dummy and on its interaction with the gender dummy cancel each other out (
p = 0.9802).
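For reference, the interaction specification and the Wald test described above can be written out as follows. The notation is introduced here for illustration only and is an assumption, not taken from Table 5: $y_{is}$ is the number of tasks completed by subject $i$ in session $s$, $SN_i$ indicates an opponent rated one, $Male_i$ is the gender dummy, $X_i$ collects the demographic controls, and $\delta_s$ are the session dummies.

```latex
\begin{align}
y_{is} &= \beta_0 + \beta_1 SN_i + \beta_2 Male_i
          + \beta_3 \,(SN_i \times Male_i) + X_i'\gamma + \delta_s + \varepsilon_{is} \\
\text{female response to a strongly negative opponent} &= \beta_1 \\
\text{male response to a strongly negative opponent} &= \beta_1 + \beta_3 \\
\text{difference-in-differences (male minus female)} &= \beta_3 \\
H_0 &: \beta_1 + \beta_3 = 0, \qquad
W = \frac{(\hat\beta_1 + \hat\beta_3)^2}
         {\widehat{\operatorname{Var}}(\hat\beta_1)
          + \widehat{\operatorname{Var}}(\hat\beta_3)
          + 2\,\widehat{\operatorname{Cov}}(\hat\beta_1, \hat\beta_3)}
\end{align}
```

Under this reading, a negative $\hat\beta_3$ of roughly six tasks offsets a positive $\hat\beta_1$ of similar size, so failing to reject $H_0$ corresponds to the statement that male performance is unaffected by the negative emotional stimulus.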
4. Discussion
My data provide evidence that individual performance increases substantially when competing with a person toward whom one has strongly negative emotions. Specifically, an increase in female performance drives the observed overall performance increase. This suggests that individuals use the tournament environment as an opportunity to retaliate against matched individuals with whom they previously had a negative experience. Individuals are willing to undertake costly effort in order to increase their own performance as a means of increasing the probability of a win, suggesting the presence of negative reciprocity.
One possible explanation for the increase in performance is a rational response by individuals who expect negatively rated opponents to compete more vigorously. Average opponent effort by ratings one through five was 12.23, 11.83, 10.54, 14.9 and 12.92, respectively. A Kruskal-Wallis test finds no difference in the distribution of opponent performance levels by opponent rating (p = 0.6507), so there does not seem to be evidence that the increase in effort is driven by a rational response to work harder against a poorly rated individual. Additionally, using an ANOVA, I cannot reject the null hypothesis of no difference in average opponent performance across opponent ratings (p = 0.4715). While the figure of interest is the performance of individuals rated one, both tests show no evidence that the effort of opponents rated one (strongly negative) is significantly higher than that of opponents with any other rating (two through five). Because opponent effort levels do not differ by opponent rating, the emotional channel appears to be driving the increase in effort.
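To make the rank-based comparison concrete, the following is a minimal standard-library sketch of a Kruskal-Wallis test across rating groups. The effort values are hypothetical placeholders, not the experimental data, and the p-value helper uses the closed-form chi-square tail that is valid only for even degrees of freedom (five rating groups give four).

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def chi2_sf_even_df(x, df):
    """Upper tail of a chi-square distribution; closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch only supports positive, even df")
    if x <= 0:
        return 1.0
    half = x / 2
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def kruskal_wallis(groups):
    """Return (H, df, p) for the tie-corrected Kruskal-Wallis test."""
    if any(len(g) == 0 for g in groups):
        raise ValueError("every group needs at least one observation")
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    df = len(groups) - 1
    return h, df, chi2_sf_even_df(h, df)


# Hypothetical opponent effort by opponent rating (1 through 5).
effort_by_rating = {
    1: [14, 9, 12, 15],
    2: [11, 13, 10, 12],
    3: [8, 12, 11, 10],
    4: [16, 13, 15, 14],
    5: [12, 14, 13, 11],
}
h, df, p = kruskal_wallis(list(effort_by_rating.values()))
print(f"H = {h:.3f}, df = {df}, p = {p:.4f}")
```

A large p-value from such a test means the hypothesis that opponent effort has the same distribution across ratings cannot be rejected, which is the pattern reported above.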
On the other hand, the increase in performance could be driven by something unique about subjects willing to rate another individual as strongly negative. While 39% of subjects assigned at least one other participant a “strongly negative” rating, not all of these subjects were matched with the specific individual assigned the low rating. Because I observe these subjects across all opponent ratings but do not observe the increase in performance across all opponent ratings, my results do not appear to be due to something special about subjects who give ratings of strongly negative. Additionally, the correlation between an indicator for whether a subject assigned at least one strongly negative rating and the subject’s own effort is significant (p = 0.0385). However, when I restrict this test to subjects who never competed with the individual assigned the strongly negative rating, the significance disappears (p = 0.4370). If my results were driven by something unique about individuals willing to assign low ratings, this correlation would have remained significant even for individuals not competing with the subject given the low rating.
Another possible explanation is that individuals with low profit from the first stage may increase effort in order to increase the probability of winning the tournament, thus increasing total payoff. There is no difference in average first-stage profit between individuals competing with a strongly negative opponent (rating = 1) and those competing with a somewhat negative opponent (rating = 2) (p = 0.9570). Consequently, if effort were driven by a concern for profit, one would expect to observe an increase in effort in both of these groups. However, as shown above, the increase in effort was only evident for the individuals competing with a strongly negative opponent. Therefore, it does not appear that the increase in effort is driven by a concern for increasing one’s profit, again providing evidence that emotions are driving the result.
My difference-in-differences estimate is robust to multiple specifications. Including opponent’s public goods game contribution as a control does little to my difference-in-differences estimate, which remains significant (p = 0.093) and even increases slightly in significance. Controlling for opponent’s public goods game contribution allows me to say that the reported emotions have predictive power over and above proxying for opponent contribution. Replacing subject contribution with subject first-stage profit, while still including opponent’s public goods game contribution as a control, again does little to my difference-in-differences estimate and again slightly increases its significance (p = 0.087). By including subject’s first-stage profit, I can be more confident that income effects are not driving my results.
Regardless of the behavioral motivation, my results suggest that the motivation is largely extrinsic, given the significant increase in performance when competing with an individual rated strongly negative. Though there may be some component of effort that is intrinsic in nature, if the motivation were purely intrinsic, for example because individuals viewed the slider task as fun, I would not expect to observe a significant difference in behavior among individuals or a significant treatment effect.
One might argue that a four-person public goods game makes it difficult to establish feelings based on direct intentions. For example, a player may have contributed zero, but the other group members rate this person positively because the choice was seen as the “smart” thing to do. However, given my results and the correlation between contribution and rating, that possibility does not seem to have affected my results. Alternatively, the other group members may rate this person negatively because they are “envious” at not having made the same decision rather than “angry”. I cannot differentiate between “envious” and “angry” individuals beyond noting that both emotions have a negative connotation. Eliciting intensities of specific emotions with both positive and negative affect would be an interesting extension.