
*Games* **2011**, *2*(2), 200–208; doi:10.3390/g2020200


## Abstract

We submitted three models to the competition, all based on the I-SAW model. Together, the models introduced four new assumptions. In the first model, we introduced an adjustment process in the exploration stage through which the tendency to explore is higher at the beginning of a game and decreases over time. The first model also added surprise as a factor that influences the weight of a trial in the sampling procedure. In the second model, we added the possibility of excluding unreliable experiences gained in the early trials of a game and the possibility of revising a reasonable alternative that was responsible for a very bad outcome in the previous trial. The third model combined three of the four added assumptions. Because each of our models contains at least two new assumptions, we estimated the relative effect of each assumption on the estimation and prediction scores and carried out a test of robustness. In this way, we were able to clarify the usefulness of each added assumption.

## 1. Introduction

We submitted three models to the 2010 market entry prediction competition. All three models are based on the inertia, sampling and weighting (I-SAW) model, which is explained in Section 2. In Section 3 we describe the four additional assumptions examined across our models; the three models themselves are presented in Section 4. In Section 5 we discuss the relative effect of each added assumption. Lastly, in Section 6 we summarize the results of the analysis and the theoretical conclusions.

## 2. Description of the Inertia, Sampling and Weighting (I-SAW) Model

Both the estimation experiment and the competition experiment are modeled as a series of $M = 40$ market entry games played by artificial agents. A market entry game $G_m$ is characterized by different random values of its five parameters $(k, H, p_H, L, S)$. For each market entry game $G_m$, the I-SAW model [1] generates a group of $N = 4$ agents that play the game repeatedly for $R = 50$ trials. Each agent $i$ is characterized by five traits whose values differ between agents and are drawn uniformly: $\varepsilon_i \sim U[0, 0.24]$, $\pi_i \sim U[0, 6]$, $\omega_i \sim U[0, 0.8]$, $\rho_i \sim U[0, 0.2]$, and $\mu_i \sim U\{1, 2, 3\}$. All agents have the same action space $A = \{\text{enter}, \text{not enter}\}$, and in each round $t \in T = \{1, \dots, R\}$ each agent $i$ has to choose an action $a_{i,t} \in A$ without knowing how the other agents will decide.
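As a minimal sketch of the agent population, the snippet below draws the five traits of one group of agents from the uniform distributions listed above. The class and function names (`Traits`, `draw_traits`, `draw_group`) are hypothetical, and the game parameters $(k, H, p_H, L, S)$ are not modeled here.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Traits:
    epsilon: float  # exploration trait, U[0, 0.24]
    pi: float       # inertia trait, U[0, 6]
    omega: float    # weight of the all-cases average in the ESV, U[0, 0.8]
    rho: float      # recency trait, U[0, 0.2]
    mu: int         # number of sample cases, uniform on {1, 2, 3}


def draw_traits(rng: random.Random) -> Traits:
    """Draw one agent's traits from the stated uniform distributions."""
    return Traits(
        epsilon=rng.uniform(0.0, 0.24),
        pi=rng.uniform(0.0, 6.0),
        omega=rng.uniform(0.0, 0.8),
        rho=rng.uniform(0.0, 0.2),
        mu=rng.choice([1, 2, 3]),
    )


def draw_group(n_agents: int = 4, seed: int = 0) -> list[Traits]:
    """Create the group of agents (N = 4 by default) that plays one game."""
    rng = random.Random(seed)
    return [draw_traits(rng) for _ in range(n_agents)]


for agent in draw_group():
    print(agent)
```

Seeding each group separately keeps a simulated game reproducible, which helps when comparing model variants on the same draws.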

The decision process of each agent $i$ is divided into three stages: exploration, inertia, and exploitation. Exploration means entering the market with probability $p^{\text{enter}} = 0.66$ and otherwise staying out. The probability for an agent to explore is given by

If an agent does not explore, she enters the second stage. Inertia means repeating the last action, $a_{i,t} = a_{i,t-1}$, with probability

Agents that have neither entered the exploration stage nor decided in the inertia stage to repeat their last action make their decision in the exploitation stage. In this stage each agent chooses the action $a_{i,t} \in A$ with the highest estimated subjective value (ESV).
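The three-stage decision can be sketched as follows. Because the exploration and inertia probabilities are given by formulas not reproduced in this section, the hypothetical function `choose_action` takes them as inputs. Skipping inertia when there is no previous action, and breaking ESV ties in favor of the first listed action, are assumptions of this sketch rather than part of the model description.

```python
import random

ENTER, STAY_OUT = "enter", "not enter"
P_ENTER_WHEN_EXPLORING = 0.66  # p^enter from the text


def choose_action(rng, p_explore, p_inertia, last_action, esv):
    """One decision of one agent in one trial (hypothetical helper).

    p_explore, p_inertia: stage probabilities computed elsewhere.
    last_action: the agent's action in trial t - 1, or None in trial 1.
    esv: dict mapping each action to its estimated subjective value.
    """
    # Stage 1, exploration: enter with probability 0.66, otherwise stay out.
    if rng.random() < p_explore:
        return ENTER if rng.random() < P_ENTER_WHEN_EXPLORING else STAY_OUT
    # Stage 2, inertia: repeat the previous action (assumed skipped in trial 1).
    if last_action is not None and rng.random() < p_inertia:
        return last_action
    # Stage 3, exploitation: the action with the highest ESV
    # (max keeps the first of tied actions; tie-breaking is an assumption).
    return max(esv, key=esv.get)


rng = random.Random(1)
print(choose_action(rng, 0.1, 0.3, STAY_OUT, {ENTER: 2.0, STAY_OUT: 1.0}))
```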

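Given the set of payoffs of action $a$ in all past cases, $X(a) = \{x_1(a), \dots, x_{t-1}(a)\}$, where $x_j(a)$ denotes the payoff of action $a$ in past trial $j$, and the number of sample experiences or sample cases $\mu_i \sim U\{1, 2, 3\}$, the ESV of action $a$ for agent $i$ in trial $t$ is the sum of two terms: the average payoff over all past cases, weighted by $\omega_i \sim U[0, 0.8]$, and the average payoff over the set of sample cases $\{\text{sample case}^1, \dots, \text{sample case}^{\mu_i}\}$, weighted by $(1 - \omega_i)$:

$$\mathrm{ESV}_{i,t}(a) = \omega_i \, \frac{1}{t-1} \sum_{j=1}^{t-1} x_j(a) + (1 - \omega_i) \, \frac{1}{\mu_i} \sum_{l=1}^{\mu_i} x_{\text{sample case}^{l}}(a),$$

where $\text{sample case}^l = t - 1$ with probability $\rho_i \sim U[0, 0.2]$ and otherwise $\text{sample case}^l \sim U\{1, \dots, t - 1\}$.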
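The ESV computation described above can be sketched for a single action as follows. The sketch assumes that the payoff of each action is known for every past trial (obtained or forgone payoff); the function names `draw_sample_case` and `esv` are hypothetical, while the weights and the recency rule follow the description.

```python
import random


def draw_sample_case(rng, t, rho):
    """1-based index of one sample case for trial t > 1.

    With probability rho the most recent trial t - 1 is drawn; otherwise a
    trial is drawn uniformly from {1, ..., t - 1}.
    """
    if rng.random() < rho:
        return t - 1
    return rng.randint(1, t - 1)


def esv(rng, payoffs, omega, mu, rho):
    """Estimated subjective value of one action in trial t = len(payoffs) + 1.

    payoffs[j - 1] is the payoff the action yielded (or would have yielded)
    in past trial j.
    """
    t = len(payoffs) + 1
    all_cases_mean = sum(payoffs) / len(payoffs)
    sample = [payoffs[draw_sample_case(rng, t, rho) - 1] for _ in range(mu)]
    sample_mean = sum(sample) / mu
    return omega * all_cases_mean + (1 - omega) * sample_mean


rng = random.Random(3)
print(esv(rng, [1.0, -2.0, 3.0, 0.5], omega=0.4, mu=2, rho=0.1))
```

Because the sample mean is random, two agents with identical histories and traits can still reach different ESVs, which is one source of choice variability in the model.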

## 3. Description of the Four Additional Assumptions and the Three Models

#### 3.1. Additional Assumption 1: The Adjustment of Exploration over Time

In the I-SAW model, the probability to explore, $p_i^{\text{explore}}$, equals $\varepsilon_i$ if $t > 1$. The trait $\varepsilon_i$ differs between people but is constant within a person throughout all trials of a game. However, it seems reasonable to assume that subjects facing an unfamiliar environment explore more at the beginning than after gaining some experience. As in machine learning models, the change of exploration can be linear [2-4] or discontinuous, involving a switching point [5]. Moreover, research on repeated choice shows that people repeat their choices, i.e., develop routines, when they make similar decisions repeatedly [6]. A routine is described as a preference for a specific solution to a known problem. Thus, we introduced a higher exploration level at the beginning of the game and a decrease of exploration with increasing numbers of trials. The decrease is modeled in four steps:

Thus, the individual tendency to explore, $p_i^{\text{explore}}$, is a function not only of the trait $\varepsilon_i$ of agent $i$ but also of the level of experience. It therefore additionally captures the process of adjusting to a new environment.
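The four-step schedule itself is not reproduced in this section, so the sketch below only illustrates the shape of a stepwise decrease in exploration. The breakpoints and multipliers in `steps` are hypothetical placeholders, not the values used in our models, and returning to $\varepsilon_i$ after the last step is an assumption.

```python
def p_explore(epsilon, t, steps=((5, 4.0), (10, 3.0), (15, 2.0), (20, 1.5))):
    """Exploration probability that decreases in four steps over trials.

    HYPOTHETICAL schedule: each (last_trial, multiplier) pair scales epsilon
    up to and including that trial. After the last step the agent explores
    with probability epsilon, as in I-SAW for t > 1. The numbers are
    placeholders, not fitted values.
    """
    for last_trial, multiplier in steps:
        if t <= last_trial:
            return min(1.0, multiplier * epsilon)  # keep it a probability
    return epsilon


for t in (2, 8, 12, 18, 30):
    print(t, round(p_explore(0.1, t), 3))
```

A step function rather than a smooth decay mirrors the "four steps" in the text; swapping in a linear decay would only require changing `p_explore`.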

#### 3.2. Additional Assumption 2: The Recalling of Surprising Experiences

In the I-SAW model, when past experiences are sampled, the most recent trial has a higher probability of being included in the sample due to the recency effect, and all other past trials have the same probability of being sampled. However, studies of the von Restorff effect [9] suggest that not all past experiences are equally likely to be included in the sample of experiences: stimulus items that are distinct from the general item pool are more apt to be recalled [7-9]. Furthermore, early research on animal learning and the disruptive effect of surprising events on memory recall found that surprising events lead to a lower rate of recall of the events that follow them [10]. Therefore, we propose that surprise influences the sampling process in the exploitation stage. If the surprise term of a given trial, $\text{Surprise}^{t-1}$, exceeds a threshold of 0.85 (a value fitted to the data), the probability of sampling this trial for the calculation of the ESV is increased. To take into account the underweighting of rare events in decisions from experience [11,12], we limited this property to the last very surprising trial. Since the recency effect is assumed to vary across individuals, as reflected in the $\rho_i$ parameter of the I-SAW model, we chose to use this parameter to represent the effect of surprise on the sampling process. Therefore, the last very surprising trial has a higher probability of being sampled, and this probability depends on the individual tendency to recall the most recent trial, $\rho_i \sim U[0, 0.2]$.
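A minimal sketch of surprise-weighted sampling follows. The threshold of 0.85 and the reuse of $\rho_i$ come from the text; the exact mechanism is an assumption: after the ordinary recency draw fails, the last very surprising trial gets its own draw with probability $\rho_i$. The surprise values are taken as given, since their computation is not reproduced here, and all function names are hypothetical.

```python
import random

SURPRISE_THRESHOLD = 0.85  # threshold reported in the text


def last_very_surprising(surprise):
    """1-based index of the last trial whose surprise exceeds the threshold.

    surprise[j - 1] is the surprise term of trial j. Returns None if no
    trial qualifies.
    """
    for j in range(len(surprise), 0, -1):
        if surprise[j - 1] > SURPRISE_THRESHOLD:
            return j
    return None


def draw_sample_case(rng, t, rho, surprise):
    """One sample case for trial t, favoring recent and surprising trials.

    ASSUMPTION: the last very surprising trial gets its own draw with
    probability rho, checked after the ordinary recency draw.
    """
    if rng.random() < rho:
        return t - 1
    special = last_very_surprising(surprise[: t - 1])
    if special is not None and rng.random() < rho:
        return special
    return rng.randint(1, t - 1)


rng = random.Random(5)
print([draw_sample_case(rng, 6, 0.2, [0.1, 0.9, 0.2, 0.95, 0.3]) for _ in range(10)])
```

Only one surprising trial is favored at a time, which follows the restriction to the last very surprising trial.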

#### 3.3. Additional Assumption 3: The Possibility of an Exclusion of Very Early Trials from the Sample of Experiences

As previously noted, apart from the most recent trial, the sampling procedure of the I-SAW model assigns the same recall probability to all past trials. However, in the first trials of a new game, strategic uncertainty and uncertainty about the payoff rule are likely to be higher, so early choices are more prone to randomness. This led us to assume that later in the game, participants are more likely to question the reliability of the information gained in the very early trials. To include this “doubt about experiences in very early trials,” we introduced the following modification: early experiences or cases are revised and can be excluded from the sample even if they are initially drawn during the sampling process. Revision means that the agent repeats the sampling procedure for a given sample experience or sample case $l$ once if $\text{sample case}^l < 9$, a second time if $\text{sample case}^l < 7$, a third time if $\text{sample case}^l < 5$, and a fourth time if $\text{sample case}^l < 3$. This stepwise revision of the sampling decisions means that an earlier sample case is more likely to be excluded from the set of sample cases.
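The stepwise revision can be sketched as follows. The thresholds 9, 7, 5 and 3 are from the text; the assumptions are that each threshold is checked against the current (possibly already redrawn) case and that each redraw uses the same recency rule as the first draw. The function names are hypothetical.

```python
import random

REVISION_THRESHOLDS = (9, 7, 5, 3)  # thresholds stated in the text


def base_draw(rng, t, rho):
    """Original I-SAW draw: trial t - 1 with probability rho, else uniform."""
    if rng.random() < rho:
        return t - 1
    return rng.randint(1, t - 1)


def revised_draw(rng, t, rho):
    """Draw one sample case and redraw it stepwise if it is a very early trial.

    ASSUMPTION: each threshold is checked against the current draw, and each
    redraw uses the same base rule as the first draw.
    """
    case = base_draw(rng, t, rho)
    for threshold in REVISION_THRESHOLDS:
        if case < threshold:
            case = base_draw(rng, t, rho)
    return case


rng = random.Random(2)
print(sorted(revised_draw(rng, 41, 0.1) for _ in range(20)))
```

Under this reading, trial 1 can survive only if it is drawn on the final redraw, so very early trials almost never enter the sample late in a game.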

#### 3.4. Additional Assumption 4: The Influence of a Very Bad Experience in the Previous Trial

Imagine that action $a_{i,t} = \text{not enter}$ has the higher ESV in trial $t$, but in the previous trial this choice led to a very bad experience. In the I-SAW model the agent would simply choose the action with the higher ESV, which is “not enter”; the affective reaction caused by negative experiences is not captured. But decisions are influenced not only by probabilities but also by affective information [14-16]. Thus, we introduced the assumption that the agent revises his/her choice, even though it has the higher ESV, if he/she had a very bad experience with it in the previous trial. This means that agent $i$ revises his/her action if one of the two following sets of conditions is true:

Revision means choosing $a_{i,t} = \text{enter}$ with probability $\lambda_i \sim U[0, 0.5]$ (a trait) and otherwise the action with the higher ESV, $a_{i,t} = \text{not enter}$. The revision process is analogous if action $a_{i,t} = \text{enter}$ has the higher ESV in trial $t$.
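The revision rule can be sketched as follows. Because the two sets of conditions that define a "very bad experience" are not reproduced in this section, the hypothetical function `revise_after_bad_experience` receives their outcome as a boolean; only the switching probability $\lambda_i$ and its distribution come from the text.

```python
import random


def revise_after_bad_experience(rng, esv, bad_experience, lam):
    """Exploitation-stage choice with the affective revision rule.

    esv maps "enter" / "not enter" to estimated subjective values.
    bad_experience is True when the revision conditions hold for the action
    with the higher ESV (evaluated by the caller; HYPOTHETICAL interface).
    lam is the agent's revision trait, lambda_i ~ U[0, 0.5].
    """
    best = max(esv, key=esv.get)
    if not bad_experience:
        return best
    other = "enter" if best == "not enter" else "not enter"
    # Switch to the other action with probability lambda_i.
    return other if rng.random() < lam else best


def draw_lambda(rng):
    """Revision trait lambda_i ~ U[0, 0.5]."""
    return rng.uniform(0.0, 0.5)


rng = random.Random(4)
lam = draw_lambda(rng)
print(revise_after_bad_experience(rng, {"enter": 0.5, "not enter": 1.0}, True, lam))
```

Note that, unlike the first model, this rule adds a new trait ($\lambda_i$) to the I-SAW model.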

## 4. Description of Our Models and Their Performance in the Competition

#### 4.1. Teodorescu et al. (2010)

The model of Teodorescu, Hariskos and Leder (2010) introduces two changes to the I-SAW model. First, the tendency to explore is higher at the beginning and decreases over time in the exploration stage (3.1). Second, the last surprising trial is included with higher probability in the sampling of past cases in the exploitation stage (3.2). One of the main advantages of these changes is that, although they take into account the change of exploration over time and the effect of surprise on memory processes, they do not add any traits beyond those estimated by the original I-SAW model.

#### 4.2. Hariskos et al. (2010)

The model of Hariskos, Leder and Teodorescu (2010) introduces two changes to the exploitation stage of the I-SAW model. First, very early trials are excluded with higher probability from the sample of experiences (3.3). Second, the model addresses the affective reaction caused by negative experiences (3.4).

#### 4.3. Leder et al. (2010)

After simulating the first two models, we created a third model that integrates the decreasing tendency to explore with increasing numbers of trials (additional assumption 3.1), the doubt about the reliability of experiences in very early trials (additional assumption 3.3), and the revision of a reasonable alternative given an associated very bad experience in the previous trial (additional assumption 3.4). We kept all parameters unchanged except for a slight change in the function determining the tendency to explore, as depicted below:

#### 4.4. The Models' Performance

Table 1 summarizes the performance of our three models relative to the I-SAW model, both for the estimation set and for the competition set. As in the competition, we used the Mean Squared Distance (MSD) criterion as the performance measure: MSD is the average squared distance between the predicted and the observed choice proportions (lower is better).
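The MSD criterion can be sketched in a few lines. The entry rates below are hypothetical values for three games, not data from the paper, and any rescaling used for the reported scores is not reproduced here.

```python
def msd(predicted, observed):
    """Mean squared distance between predicted and observed choice proportions."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


# Hypothetical entry rates for three games (not data from the paper).
predicted = [0.60, 0.45, 0.70]
observed = [0.55, 0.50, 0.72]
print(msd(predicted, observed))
```

Squaring the distances penalizes a few large misses more than many small ones, so a model can lose on MSD because of a handful of poorly predicted games.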

All three models fit the data from the estimation set better than the I-SAW model. The fit of the first model (4.1) was slightly better than that of the I-SAW model, and the fits of the other two models (4.2 and 4.3) were much better. However, only the first model predicted the competition data set better than the I-SAW model. The following section focuses on this issue.

## 5. The Predictive Power of Each Additional Assumption

Because each of our models added more than one assumption to the I-SAW model, the relative effect of each individual assumption cannot be read from the models' scores. For this reason, after the competition we calculated the MSD scores of models that add only one assumption to the I-SAW model (10,000 simulations) and summarized the relative effect of each assumption. The relative effects for the estimation and competition scores are shown in Table 2.
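Assuming the relative effect is the percentage change in MSD with respect to the I-SAW baseline, the following sketch reproduces the percentages in Table 2 from the MSD scores listed there. The variable and function names are illustrative.

```python
ISAW = {"estimation": 1.38, "competition": 1.1749}


def relative_effect(model_msd, baseline_msd):
    """Percentage change of a model's MSD relative to the I-SAW baseline.

    Negative values mean a lower (better) MSD than I-SAW.
    """
    return 100.0 * (model_msd - baseline_msd) / baseline_msd


# MSD scores of the single-assumption variants, taken from Table 2.
variants = {
    "3.1": (1.3485, 1.1738),
    "3.2": (1.3496, 1.1617),
    "3.3": (1.2791, 1.1375),
    "3.4": (1.2312, 1.2486),
}
for name, (est, comp) in variants.items():
    print(name,
          f"{relative_effect(est, ISAW['estimation']):+.2f}%",
          f"{relative_effect(comp, ISAW['competition']):+.2f}%")
```

The sign flip for 3.4 (better on estimation, worse on competition) is the over-fitting pattern discussed below.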

As shown, each of our additional assumptions improved the estimation score, and the first three assumptions (3.1, 3.2, and 3.3) also improved the competition score. The fourth assumption (3.4), by contrast, led to the largest improvement for the estimation set but impaired the competition score, which clearly indicates over-fitting. Thus, we conclude that the fourth additional assumption is responsible for the poor predictive performance of our second and third models.

To examine whether the very small improvement resulting from the first assumption (3.1) could have been obtained by chance, we conducted an additional analysis. One simple prediction of the decreasing exploration assumption is that in problems in which the best reply is relatively stable across trials, best reply behavior should become more common as time advances. A constant exploration rate, as assumed by the original I-SAW model, instead predicts that in these cases the frequency of best reply behavior remains constant over all trials. Problems 3 and 8 satisfy the requirement of a relatively stable best reply, since in these problems about 95% of the experiences yielded better payoffs for entering than for staying out (obtained payoffs greater than forgone payoffs for entering, and vice versa for staying out). Table 3 shows the percentages of best reply behavior to the previous trial for the first 12 trials.
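A percentage like those in Table 3 can be computed as sketched below, assuming "best reply to the previous trial" means choosing in trial $t$ the action that paid more in trial $t-1$ (comparing obtained and forgone payoffs). The data layout, the function name `best_reply_share`, and the example choices are hypothetical.

```python
ENTER, STAY_OUT = "enter", "not enter"


def best_reply_share(choices, best_actions, t):
    """Share of players whose trial-t choice is the best reply to trial t - 1.

    choices[p][j - 1]: action chosen by player p in trial j.
    best_actions[p][j - 1]: action that paid more for player p in trial j,
    comparing obtained and forgone payoffs (hypothetical data layout).
    """
    if t < 2:
        raise ValueError("best reply to the previous trial needs t >= 2")
    hits = sum(1 for c, b in zip(choices, best_actions) if c[t - 1] == b[t - 2])
    return hits / len(choices)


# Hypothetical data: four players, three trials; entering paid more every time.
choices = [
    [ENTER, ENTER, ENTER],
    [STAY_OUT, ENTER, ENTER],
    [STAY_OUT, STAY_OUT, ENTER],
    [ENTER, STAY_OUT, STAY_OUT],
]
best_actions = [[ENTER, ENTER, ENTER] for _ in choices]
for t in (2, 3):
    print(t, f"{100 * best_reply_share(choices, best_actions, t):.1f}%")
```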

Table 3 shows that the frequency of best reply behavior generally increases with the number of trials, a result that cannot be explained by the constant exploration assumption of the original I-SAW model. Rather, these results can be captured by the assumption that the tendency to explore is higher in the first trials and decreases over the trials. Further support for the robustness of the decreasing exploration assumption can be found in the results of the following problem presented by Hochman and Erev (2007) [17]. In an experiment using the clicking paradigm, subjects were asked to choose repeatedly between unlabeled keys on a computer screen. Pressing one of the keys always resulted in a payoff of eight points, and pressing the other always resulted in a payoff of nine. As in the market entry game, after each trial subjects received information about the forgone payoff in addition to their obtained payoff. The surprising result was that the proportion of choices of the clearly better option increased gradually during the first 10 trials before reaching 90%–100% in later trials (see Figure 4 in [17]). Therefore, decreasing exploration over time seems to be a robust phenomenon, even when actively collecting information is unnecessary and counterproductive.

## 6. Summary and Conclusions

In this paper, we examined four additional assumptions to the I-SAW model [1]. The first assumption states that the tendency to explore is higher at the beginning and decreases over time in the exploration stage. Although it improved the predictions only slightly, we showed that this assumption appears to be robust, even beyond market entry games. The second assumption suggests that the last surprising trial should be included with higher probability in the sampling of past cases in the exploitation stage. This minor change consistently improved the predictions slightly and is in line with the von Restorff effect [7-9] as well as with animal research on the disruptive effect of surprising events on memory recall [10]. In the third additional assumption, we proposed that very early trials are excluded with higher probability from the sample of experiences. We suggested that this may result from “doubt about experiences in very early trials,” though one can argue that it might also result from memory limitations. It is important to note that this additional assumption yields a large relative effect in both the competition and the estimation set; thus, we believe that future research should address its importance and its underlying processes. The fourth assumption implies the revision of a reasonable alternative given an associated very bad experience in the previous trial. However, we did not find evidence to support this assumption; therefore, we concluded that the large improvement of the predictions for the estimation data set was the result of over-fitting. We believe that the first three assumptions presented here address robust learning processes and are not specific to market entry games. Future research is needed to determine the robustness and limitations of these additional assumptions.

**Table 1.** MSD scores of our three models and their relative effect compared with the I-SAW model.

| Model | Estimation MSD Score | Relative Effect | Competition MSD Score | Relative Effect |
|---|---|---|---|---|
| I-SAW Model (2) | 1.38 | | 1.1749 | |
| Teodorescu et al. (4.1) | 1.3507 | −2.12% | 1.16 | −1.27% |
| Hariskos et al. (4.2) | 1.1546 | −16.33% | 1.2197 | 3.81% |
| Leder et al. (4.3) | 1.1546 | −16.07% | 1.1932 | 1.56% |

**Table 2.** MSD scores and relative effects when only one additional assumption is added to the I-SAW model.

| Model | Estimation MSD Score | Relative Effect | Competition MSD Score | Relative Effect |
|---|---|---|---|---|
| I-SAW Model (2) | 1.38 | | 1.1749 | |
| Exploration Over Time (3.1) | 1.3485 | −2.28% | 1.1738 | −0.09% |
| Surprising Experiences (3.2) | 1.3496 | −2.20% | 1.1617 | −1.12% |
| Very Early Trials (3.3) | 1.2791 | −7.31% | 1.1375 | −3.18% |
| Bad Experience in the Previous Trial (3.4) | 1.2312 | −10.78% | 1.2486 | 6.27% |

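**Table 3.** Percentage of best reply behavior to the previous trial in Problems 3 and 8.

| Trial | Percentage of best reply behavior to previous trials |
|---|---|
| 2 | 75.0% |
| 4 | 73.3% |
| 6 | 86.7% |
| 8 | 86.7% |
| 10 | 91.7% |
| 12 | 90.0% |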

## References and Notes

1. Nevo, I.; Erev, I. On surprise, change, and the effect of recent outcomes; Technion: Haifa, Israel, 2010; (unpublished work).
2. Crook, P.; Hayes, G. Learning in a state of confusion: Perceptual aliasing in grid world navigation. In Proceedings of the 4th British Conference on (Mobile) Robotics: Towards Intelligent Mobile Robots, UWE, Bristol, UK, 2003.
3. De Croon, G.; van Dartel, M.F.; Postma, E.O. Evolutionary learning outperforms reinforcement learning on non-Markovian tasks. In Proceedings of the 8th European Conference on Artificial Life, Workshop on Memory and Learning Mechanisms in Autonomous Robots, Canterbury, UK, 2005.
4. Loch, J.; Singh, S.P. Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes. In Proceedings of the 15th International Conference on Machine Learning (ICML-98), Madison, WI, USA, 1998; pp. 323–331.
5. Lee, M.D.; Zhang, S.; Munro, M.N.; Steyvers, M. Psychological models of human and optimal performance on bandit problems. Cogn. Syst. Res. (in press).
6. Betsch, T.; Haberstroh, S.; Glöckner, A.; Haar, T.; Fiedler, K. The effects of routine strengths on adaption and information search in recurrent decision making. Organ. Behav. Hum. Decision Proc. **2001**, 84, 23–53.
7. Green, R.T. Surprise as a factor in the von Restorff effect. J. Exp. Psychol. **1956**, 52, 340–344.
8. Hunt, R.R.; Lamb, C.A. What causes the isolation effect? J. Exp. Psychol. Learn. Mem. Cogn. **2001**, 27, 1359–1366.
9. Von Restorff, H. Über die Wirkung von Bereichsbildungen im Spurenfeld (The effects of field formation in the trace field). Psychologische Forschung **1933**, 18, 299–342.
10. Tulving, E. Retrograde amnesia in free recall. Science **1969**, 164, 88–90.
11. Barron, G.; Erev, I. Small feedback-based decisions and their limited correspondence to description-based decisions. J. Behav. Decis. Making **2003**, 16, 215–233.
12. Hertwig, R.; Barron, G.; Weber, E.U.; Erev, I. Decisions from experience and the effect of rare events in risky choices. Psychol. Sci. **2004**, 15, 534–539.
13. Hochman, G.; Ayal, S.; Glöckner, A. Physiological arousal in processing recognition information: Ignoring or integrating cognitive cues? Judgment Decis. Making **2010**, 5, 285–299.
14. Loewenstein, G.F.; Weber, E.U.; Hsee, C.; Welch, N. Risk as feelings. Psychol. Bull. **2001**, 127, 267–286.
15. Rottenstreich, Y.; Hsee, C.K. Money, kisses, and electric shocks: On the affective psychology of risk. Psychol. Sci. **2001**, 12, 185–190.
16. Glöckner, A.; Hochman, G. The interplay of experience-based affective and probabilistic cues in decision making. Exp. Psychol. **2011**, 58, 132–141.
17. Erev, I.; Haruvy, E. Learning and the economics of small decisions. In The Handbook of Experimental Economics; Kagel, J.H., Roth, A.E., Eds.; Princeton University Press: Princeton, NJ, USA, 2009. Available online: http://www.unitn.it/files/download/11452/learningchapter.pdf (accessed in June 2009).

© 2011 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).