Results of Beer Game Trials Played by Natural Resource Managers versus Students: Does Age Influence Ordering Decisions?

from a long-term Beer Game dataset played by natural resource managers: reinforcing systems education across disciplines. In Proceedings of the 35th International Conference of the System Dynamics Society, Cambridge, USA, 16–20 July 2017”.
Abstract: Systems involving agriculture and natural resources (AGNR) management, which integrate biologic, geologic, socio-economic, and climatic characteristics, are incredibly complex. AGNR managers purport to use a systems-oriented mental model, while many observed management and policy strategies remain linear or symptom-driven. To improve AGNR professionals’ systems thinking abilities, two programs, the King Ranch® Institute for Ranch Management at Texas A&M University-Kingsville (KRIRM) and the Honors College at South Dakota State University (SDSUHC), implemented the famous Production Distribution Simulation Game (a.k.a. the Beer Game) into their programs beginning in 2003 and 2011, respectively. A Beer Game database consisting of 10 years of trials, or over 270 individual players, was compared to seminal work in the literature, and the two groups were compared to one another. We found that AGNR managers and students performed worse than players in a seminal Beer Game study. More interestingly, we found that younger players adapted more readily to inventory surpluses, reducing their order rates and effective inventories significantly when compared to older players (p < 0.10 for retailers and distributors, and p < 0.05 for wholesalers and factories). We substantiated our results against those in more recent studies of age-related decision-making and in the context of common learning disabilities. Lastly, we discuss some implications of such decision-making for 21st century AGNR problems and encourage AGNR disciplines to better integrate system dynamics-based education and collaboration in order to better prepare for such complex issues.


Introduction
The nature of agricultural and natural resource (AGNR) systems is inherently complex due to biologic, geologic, economic, social, policy, and climatic characteristics and delays, which are powerful. From its genesis, system dynamics (SD) has taken an interest in such problems (e.g., [24]). Many AGNR systems offer few incentives to adopt systems-oriented mental models that facilitate addressing root causes of issues. Most management efforts have relied on easily accepted (reductionist) methods promoted from within disciplinary silos [25] that often expose learning disabilities in complex systems (described below). As a result, many AGNR problems have gone unaddressed [26]. For example, scientists from a variety of disciplines have criticized the reductionist approaches most commonly used in AGNR systems:
"An approach is required which takes account of ecological, economic and social aspects of change and that is able to interpret and synthesize information, generated from a range of sources in a manner which is policy relevant." [27]
"I doubt if we stand a good chance of achieving understanding of the components, and of the interactions among them, as long as we insist on maintaining the comfort of our specialist or discipline zone. All indications to me are that we need more integrations of our disciplinary efforts both within and among beef cattle problem areas if we are to make the greatest contribution to developing technology for maximizing the amount of edible beef of a given quality per unit of resource use." [28]
To improve the outcomes of AGNR educational programs and better prepare managers for 21st century challenges, in 2003 the King Ranch® Institute for Ranch Management (KRIRM) at Texas A&M University-Kingsville established an innovative curriculum grounded in system dynamics. The KRIRM mission is to educate professionals who will improve resource management across agricultural systems throughout the world. To achieve this, KRIRM offers an intensive two-year graduate program, a less intensive certificate program, and a distance education leadership program for professionals in industry who cannot attend full-time. A week-long lectureship course in systems thinking serves as the core and foundation of each of these programs. Noticing similar needs in undergraduate education, in 2010 the South Dakota State University Honors College (SDSUHC) implemented a systems thinking workshop to enhance students' learning capabilities prior to completing undergraduate research experiences.
With no "go-to" agricultural educational tool capable of providing insights as powerful as those of the Beer Game, KRIRM and SDSUHC have used the game as the students' first exposure to decision making in complex systems. The Beer Game, played for decades in corporate workshops as well as in management, business, and systems modeling classes, introduces important lessons about complex, dynamic systems. Designed as a simplified supply chain, the game encourages students to adopt a systems-oriented (i.e., nonlinear system dynamics) perspective. AGNR managers often purport to hold a systems-oriented mental model, yet the strategies they implement remain linear or symptom-driven. These linear mental models have been reinforced by the fragmented, siloed nature of AGNR education, a pattern also observed in other fields. Since the inception of the KRIRM and SDSUHC programs, Beer Game performance records have been collected in an integrated database used for debriefing new students each year.
The purpose of this paper is to examine how the decisions of participants with differing domain experience differed between older (KRIRM) and younger (SDSUHC) players. In many industries, experience and wisdom can differentiate effective and successful managers from the pack. It is thought that these managers think more systemically than others, since their experiences over time have enhanced their ability to make dynamic inferences that are less obvious to novices. However, results from dynamic decision-making research have shown that accumulated experience does not improve performance [29,30]. We aimed to explore this hypothesis further.
First, we review the literature on the Beer Game, mental models, and dynamic decision-making. We then describe the participants represented in our database and how performance data were screened using a Beer Game simulation model prior to analysis of individual decisions. Results are presented and discussed, including: (a) how well our participants performed compared to seminal Beer Game studies [31,32]; (b) some significant performance gaps observed between the two groups (KRIRM and SDSUHC); and (c) how overcoming the common learning disabilities seen in complex systems will require effective education regarding the dynamics of complex systems. We conclude with a challenge for all practitioners in systems science and AGNR to increase collaboration and support a wider array of educational and research opportunities, given the important role AGNR managers will have in addressing complex 21st century challenges.

Rules of the Beer Game
The Beer Game portrays a production-distribution system characterized by an inventory stock for each of four players: factory (or brewer), distributor, wholesaler, and retailer, arranged in a linear distribution chain [31,33]. Pennies or plastic chips represent cases of beer. Flows influencing each of the stocks include the information flow of orders from the retailer toward the brewer and the physical transport of goods from one stock to the next through the supply chain, including a two-week delay. Exogenous to the system are consumer orders (delivered through a deck of cards at the retailer end of the chain) and the inputs to the brewer's inventory production (i.e., the supply of chips the brewer has access to). In each week of the game, customers purchase from the retailer, who then orders from the wholesaler, who then orders from the distributor, who then orders from the factory, who then produces additional cases of beer to meet the anticipated demand. The objective of the game is to minimize total costs throughout the supply chain, where each player incurs a holding cost of $0.50 per case per week for inventory and a penalty of $1.00 per case per week for backlog (representing lost revenues and the discontent of customers when stockouts persist). Each player manages their inventory by forecasting demand, which is informed by the orders of their customer one step down the supply chain, and by placing orders with their supplier one step up the supply chain. However, only the retailer can see the actual demand from customer orders week by week. This information limitation is compounded because players are not allowed to communicate with one another, so coordination is impossible [31,33]. Although simplified from real-world systems, the game has revealed important characteristics and limitations in human decision-making (described below) and the principles of systems (Section 4).
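The cost rule above can be illustrated with a short sketch. It assumes a hypothetical representation in which each week is summarized by one signed number per player (inventory on hand minus backlog, so negative values denote backlog); the function and constant names are illustrative and are not taken from the game materials.

```python
# Cost rates stated in the rules of the game (dollars per case per week).
HOLDING_COST = 0.50   # charged on cases held in inventory
BACKLOG_COST = 1.00   # charged on cases owed to the customer (backlog)


def weekly_cost(effective_inventory):
    """Cost for one player in one week.

    effective_inventory is an assumed convention: inventory minus backlog,
    so a negative value means the player is in backlog.
    """
    if effective_inventory >= 0:
        return HOLDING_COST * effective_inventory
    return BACKLOG_COST * -effective_inventory


def total_cost(effective_inventories):
    """Sum the weekly costs over a sequence of weekly effective inventories."""
    return sum(weekly_cost(x) for x in effective_inventories)


if __name__ == "__main__":
    # Hypothetical five-week record: three weeks of stock, two of backlog.
    print(total_cost([12, 4, 0, -3, -6]))
```

Because backlog is charged at twice the holding rate, a player who runs short by a given number of cases pays twice as much as one who overstocks by the same amount.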

Research on Decision-Making in Relation to the Structures of Complex Systems
Research into human decision-making and performance in complex systems has been aided by management flight simulators which have allowed system dynamicists to study an individual's decision-making within simple system structures, similar to that of the Beer Game (Table 1). Rouwette et al. [34], Aramburo et al. [35], and Mohaghegh and Furlan [36] have summarized the key findings of the major works in this area based on characteristics of the system or task complexity: delays, strength and types of feedback, exogenous change, use of heuristics or pattern matching mental models, systems thinking skills, and domain experience. In general, the presence of time delays as well as the length of delays tends to reduce performance [32,37], stronger feedback loops tend to reduce performance [32,37–42], while exogenous changes have shown either reduced [40–42], improved [43], or no effect on performance [44]. Additionally, research has found consistent use of several decision-making heuristics, particularly ones that employ pattern matching reasoning [30,36,45–54], while others have found evidence of poor systems thinking skills to be pervasive throughout most individuals [55–58].

Misperceptions of feedback
• Individuals are generally unable to account for delays and feedback effects because of highly simplified mental models and poor ability to infer correct behavior of simple feedback systems [37].
• Strategies were insufficiently adjusted to account for strength of feedback between information and material flows [38].
• Misperceptions of feedback have been attributed to poor recognition of delays and preference to maintain static decision rules [32,40,59,60].
• Misperceptions of feedback not only result in suboptimal performance but often lead to strategy and decision making that is in opposite direction of optimal changes in decision making [31,32,39,41].
Types, transparency, and scale of models; types of feedback
• Transparency of model (including interface design) and prior knowledge of structural information can improve task performance in dynamic decision-making contexts [61–63].
• Given an increasing time scale, participants are not likely to consistently or steadily improve performance [64].

Environmental and Contextual Characteristics
• Coordination risk contributes to bullwhip effects since behaviors of other individuals are not known with certainty [68].
• Stressors such as larger orders, backlogs, or late deliveries trigger hoarding and phantom ordering even though such behaviors are known to be irrational [69].
Whenever the context of the decision situation changes (e.g., changing the decision interval in order to speed up or slow down experience generation; increasing the transparency and therefore knowledge of the model or task being managed; or the nature of the decision information such as interface design and types of feedback provided to the user), participants tend to do somewhat better [61–64,66,67,70,71], or at least no worse than their prior performance without the added transparency [38,43,46,65,70,72–74]. Lastly, personal characteristics that likely impact performance (e.g., stated goals; mental model or cognitive styles; the number of players in the simulator) have also shown mixed results, with improved performance resulting from whole system goals and a higher degree of similarity in participant mental model with the structure of the simulation [74–78]. Other studies have shown no effect on performance due to personal characteristics [42,67,72–74].
More recent advances in decision making research using simulations of the Beer Game task have shown that coordination risk (the risk that individuals' decisions contribute to a collective outcome but the decision rules followed by each individual are not certain) which contributes to the bullwhip effect can be mitigated with coordination stock (holding additional on-hand inventory) but that the behavioral causes of the supply chain instability are robust [68,79–81]. Sterman and Dogan [69] show that because of this persistence of instability, individuals are likely to seek larger safety stocks (hoarding) or order more than what is demanded of them (phantom ordering). These irrational responses were shown to be triggered by environmental stressors, which overwhelmed individuals' rational decision-making abilities or when individuals inappropriately applied decision heuristics incompatible with effective performance in the game. Emotional, psychiatric, and neuroanatomical factors are also discussed in Sterman and Dogan [69,82,83]. Finally, task performance studies that test domain experience, which is often used to understand expertise and where it is assumed that individuals with greater experience employ more powerful heuristics than novices [84,85], have shown that domain experience does not influence task performance [29,42]. This has been found for tests between participant groups (experienced versus inexperienced [29]) or where participants were allowed multiple attempts, given training or an intervention, or with varying feedback protocols [30,49,51,55,85–89].

Materials and Methods: Natural Resource Managers and Students Play the Beer Game
Two notes from the literature are important to make before introducing this study. First, Rouwette et al. [34] found that participants in dynamic decision-making task experiments have primarily been sampled from university student populations, with professionals (e.g., university staff) being only a small subset of the participants. This is important because (a) students, however knowledgeable, generally lack the real-world experience and accumulated wisdom that seasoned managers possess, and (b) simulated games are likely to represent industries or management situations that differ from the students' own, making it more difficult for students to comprehend the task or motivate themselves to think systemically about their performance than it is for participants who begin with more accumulated experience and can therefore relate their task performance to real-world experiences. Second, previous research has shown that experience does not improve performance on dynamic decision tasks (e.g., [29,30,49,51,55]), but several limitations of those studies warrant reconsideration of that conclusion. For example, Brunstein et al. [29] concluded that domain experience was not a strong indicator for overcoming failures in stock-flow tasks; however, the experienced and inexperienced groups differed by only 1.5 years, and each group made decisions in a separate environment (one on paper and one online). Additionally, although some studies have used both undergraduate students and graduate students with at least several years of management experience, they did not compare results between groups on the same task (i.e., some experiments used only undergraduates and some used only graduate students [30]).
In this study, we overcome both of these limitations since all participants were subject to the same experimental conditions: all played the traditional Beer Game for about two hours, for up to 35 simulated weeks, and all trials were facilitated by the same Beer Game instructor. We examined the domain experience hypothesis by analyzing a Beer Game database created from over 10 years of classes at KRIRM and SDSUHC. The process of the study may be summarized as follows (Figure 1), with details in the sections below. Each group's decisions were organized into a single database in which team costs were calculated and ranked based on participant entries on the Beer Game record sheets. The decisions were then used as inputs to a Beer Game model to generate model-predicted team costs and ranks. The costs and ranks from the participant record sheets were compared with the model predictions to screen for teams that used improper accounting, and those teams were removed from the final analysis of participant decisions.

Participant Profiles
Since the KRIRM programs target agriculture industry leaders, natural resource conservation professionals, and up-and-coming farm and ranch managers, the majority of participants already possess four-year degrees (B.S. or B.A.), in some cases graduate degrees (M.S. or Ph.D.), and arrive with at least 15 years (ranging from 10–40 years) of professional experience in production agriculture or resources conservation (mean age = 45.58 years, ranging from 23 to 65). Generally, the systems these participants operate within are laden with time delays (e.g., crop and livestock production begin with producers making genetic selections for the types of production they wish to market, which take years to decades to pay off; growing seasons generally allow only one harvest per year; and livestock replacement and maintenance efforts require two years or longer). Since participants come from agricultural businesses embedded within the agricultural food system, they have intimate knowledge of supply chain dynamics, since in the real world they sit at the producer, distributor, wholesaler, or retailer positions. We were highly interested in analyzing the decisions made in the Beer Game by these groups, since similarities between participant mental models and dynamic decision contexts have been shown to improve performance. Similarly, conservation professionals operate in organizations whose goals are inherently holistic and long-term, since ecosystem restoration or conservation operates at decadal to century time scales. Graduate students in AGNR disciplines without significant professional experience have also participated; however, they have generally made up no more than 20% of the participants. Few, if any, undergraduates have ever participated in the KRIRM classes.
By contrast, attendees of the SDSUHC course have been predominantly Honors College undergraduate students (>80%; many from family farms or ranches but without managerial experience) who are preparing for undergraduate research experiences. The SDSUHC class participants (mean age = 23.89 years, ranging from 19 to 55) have also included faculty members from throughout the College of Agriculture and Biological Sciences (<20%; e.g., animal scientists, wildlife and fisheries scientists, and agricultural economists). In total, there were 55 KRIRM teams and nine SDSUHC teams. Although age does not always equate to professional experience, in this study mean participant age provides the best proxy for experience: the professionals attending KRIRM would not have done so if not for the experience and managerial responsibility they hold in their organizations, whereas the students attending SDSUHC are full-time undergraduates with time devoted to research and much less reliance on work experience. Although there are a few outliers in each case (graduate students at KRIRM, faculty at SDSUHC), the influence these individuals had on overall performance was controlled for by placing no more than one graduate student or faculty member on any individual team.

Database Description
The data were compiled in Microsoft Excel™ beginning in 2004 (Figure 2), the first year the Beer Game was played at KRIRM. The SDSUHC games were added for the years 2012 through 2014. Orders for retailers, wholesalers, distributors, and factories were kept on one sheet ('Orders' tab), with weeks 1 through 35 repeated for each position down the spreadsheet. A similar convention was used to record inventory or backlog ('Inventory' tab). The entries for each team stop at 35 weeks to reduce leveling effects (see [31]). In total, the raw data included 64 teams. Data for orders and inventories were obtained directly from each team's record sheets. For reference, each team's total cost was entered at the top of each tab (beginning in cell B5); however, costs at each player position were not recorded. Average costs of each year's teams were also calculated (e.g., cell C5). The remaining tabs of the database included graphs of trends over time in participant performance (e.g., average, best, and worst performances), used as visual aids in the debriefing sessions of the KRIRM and SDSUHC classes (excluding the current year's teams, which used the record sheets from their own performances).
The database includes two kinds of uncertainty that must be recognized. The first, common to all Beer Game results, arises from human mistakes made while playing the game (e.g., getting ahead of or behind the weekly schedule; incorrectly recording inventory or backlog; incorrectly calculating costs). The second, specific to our database, arises from the transfer of information from the record sheets into the Excel file. Since the games were played at the beginning of a week-long systems thinking class, results were generally entered by graduate students the following week, or about one week after the completion of the game itself. This delay precluded participants from clarifying results, inaccuracies, or illegible entries, and it came in addition to the human error involved in the actual transfer of data.
Before proceeding to the analysis, these errors had to be reconciled in the database, or the affected teams removed from the analysis where errors in effective inventory and costs were too large. Sterman [31] identified that the Beer Game teams with the highest costs were the most prone to accounting errors and therefore reduced his sample from 48 to 11 teams, which were generally the best performing. However, successful teams can be just as susceptible to human accounting errors, since mistakes (or variance from optimum decision levels) made around a given average order quantity will not affect the overall rank of teams that have lower average orders and therefore lower inventories. Besides addressing these errors, we also had to reconstruct the costs of each player position, since only the total costs per team were entered in the database (which was critical if comparisons were to be made to previous results presented in the literature). Rather than discarding the poorer-performing teams and analyzing only the most successful, but not necessarily less flawed, teams, we developed a Beer Game model to compare the observed team performances with the performance expected under equal accounting standards. The model (described below) helped identify the teams with the greatest accounting errors, which were discarded from the final analyses, and captured the costs of each player position, allowing us to compare a more representative range of teams, rather than only the most successful, with previous research results.

Beer Game Model
The Beer Game model was developed in Vensim™ (Ventana Systems, Inc., Harvard, MA, USA) in the same tabletop configuration as the Beer Game, with physical flows for inventories and information flows for orders (Figure 3; player positions are abbreviated Retailer = r, Wholesaler = w, Distributor = d, and Factory = F, and are described in aggregate by the italicized names that represent the formulation for each position). Equations were developed from previous Beer Game models [31,90] and are provided in Appendix A. Stocks of inventory (inventory [position]) are controlled by flows of cases (in[position] and sold[position]). Backlog stocks account for unmet demand at each position and are used to calculate effective inventory (eff inv [position]). ORDer represents customer orders, which begin at four cases and step to eight at the fifth week. Rather than using the ordering decision algorithms from either [31] or [90] (i.e., a smooth function that applies first-order exponential smoothing to represent an averaging process for placing orders from each sector), observed orders made by our participants were input into the model (import [position] placed orders). This allowed the least error-prone data in the database, orders placed (for which no calculations are required in record keeping), to be used to evaluate teams under equal accounting standards.
Figure 3. Beer Game model structure, after [31,90]. The structure is identical except that the algorithm used to compute placed orders in the Kirkwood [90] model formulation was replaced with the observed placed orders of participants (indicated by the red italicized import variables), taken from the record sheets inside the Beer Game database. Equations to replicate the model are provided in the references cited above as well as in Appendix A.
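The idea of replaying observed orders through a fixed stock-and-flow structure can be sketched in a few lines of Python. This is not the Vensim model or the Appendix A equations. It is a simplified illustration that assumes a starting inventory of 12 cases, four cases in each slot of a two-week shipping pipeline, and orders that reach the supplier in the same week they are placed. The function names (`replay`, `customer_orders`) and the data layout are hypothetical.

```python
from collections import deque

POSITIONS = ["r", "w", "d", "F"]  # retailer, wholesaler, distributor, factory


def customer_orders(week):
    """ORDer: four cases, stepping to eight at the fifth week."""
    return 4 if week < 5 else 8


def replay(placed_orders, weeks, start_inventory=12, delay=2, pipeline_fill=4):
    """Replay observed orders and return (effective inventories, costs).

    placed_orders maps each position to its list of weekly orders.
    start_inventory, delay, and pipeline_fill are assumed initial conditions.
    """
    inventory = {p: start_inventory for p in POSITIONS}
    backlog = {p: 0 for p in POSITIONS}
    in_transit = {p: deque([pipeline_fill] * delay) for p in POSITIONS}
    eff_inv = {p: [] for p in POSITIONS}

    for t in range(weeks):
        # Demand seen by each position: customers for the retailer,
        # the downstream player's placed order for everyone else.
        demand = {"r": customer_orders(t + 1)}
        for downstream, upstream in zip(POSITIONS, POSITIONS[1:]):
            demand[upstream] = placed_orders[downstream][t]

        shipped = {}
        for p in POSITIONS:
            inventory[p] += in_transit[p].popleft()   # in[position]
            owed = backlog[p] + demand[p]
            shipped[p] = min(inventory[p], owed)      # sold[position]
            inventory[p] -= shipped[p]
            backlog[p] = owed - shipped[p]
            eff_inv[p].append(inventory[p] - backlog[p])

        # Shipments enter the downstream pipeline; the factory's own
        # placed order is treated as its production start.
        for downstream, upstream in zip(POSITIONS, POSITIONS[1:]):
            in_transit[downstream].append(shipped[upstream])
        in_transit["F"].append(placed_orders["F"][t])

    # Same accounting for every team: $0.50 holding, $1.00 backlog.
    costs = {p: sum(0.5 * x if x >= 0 else -1.0 * x for x in eff_inv[p])
             for p in POSITIONS}
    return eff_inv, costs


if __name__ == "__main__":
    # Hypothetical team that keeps ordering four cases after demand steps up.
    orders = {p: [4] * 6 for p in POSITIONS}
    eff_inv, costs = replay(orders, weeks=6)
    print(eff_inv["r"], costs["r"], sum(costs.values()))
```

Because only the placed orders are taken from the record sheets, every team's inventories, backlogs, and costs are recomputed under the same rules, which is the property the screening step relies on.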

Data Analyses
Initial data analyses consisted of screening the database entries for teams with the greatest errors by comparing observed total team costs and team ranks to the expected total costs and ranks given equal accounting standards. This was achieved via the Beer Game model. Outlier teams were identified and removed from the dataset. For the teams remaining, actual player orders and the modeled team and position costs were used for the analyses. Modeled costs were used to ensure that comparisons between teams were fair given the likelihood of unknown errors in the dataset, which we attempted to minimize through the screening process used to identify and discard teams. The remaining teams were then used to conduct two different analyses: (1) Due to the similar background and interests of all participants in natural resources management, we examined the performance of all teams. Since no controls or treatments were applied in our Beer Game trials, we simply compared the participant performances in the database with the results of experiments presented in the SD literature (H0: Database_tc = Reported_tc). (2) Due to the unique participant profiles at the two locations (mostly experienced professionals at KRIRM, mostly undergraduates at SDSUHC), we compared team performances between the more experienced and less experienced groups (H0: KRIRM_tc = SDSUHC_tc).

Model Comparison of Team Performances
The database included both KRIRM (n1 = 55) and SDSUHC (n2 = 9) teams since 2004 (total n = 64). Initial screening of the team performances revealed strong fit between observed and modeled total costs (Figure 4a) and team rank (Figure 4b). Overall, the expected costs and ranks of teams fit fairly well, with r² values of 0.90 and 0.89, respectively (Table 2). Despite the overall strong correlations, we identified 20 teams that clearly did not fit the pattern between modeled and observed costs shown by the majority of teams; removing them left a total n of 44 (n1 = 38; n2 = 6). Importantly, the discarded teams were fairly normally distributed throughout the database, with three teams removed from the top quartile, six from the third quartile, eight from the second quartile, and three from the bottom quartile.

Removing these teams significantly improved the match between observed and expected team performances and ranks, with r² values of 0.97 and 0.98 (Table 2; Figure 4c,d). Average errors (in terms of total $ costs, $/week, or cases/week) decreased $75 and $2, respectively. Removing the teams with inconsistent costs relative to the remaining teams created a significantly improved fit in team ranks (e.g., exact rank matches rose from six to 11, or from nine to 25 percent, and only five teams had rank discrepancies greater than three positions, down from 55 to 11 percent). Utilizing the Beer Game model in this way allowed us to screen the database for the teams that most likely had the greatest accounting errors and gave added confidence that the remaining teams, although not perfect in their accounting, were accurate enough to allow comparison across the dataset. The proportion of teams discarded due to likely accounting errors (31% of the original database) was therefore much smaller than in [31], which discarded 75% of that database due to errors, indicating that players may do a better job of accounting than was previously expected.
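The screening logic can be sketched in a few lines of Python. This is only an illustration: the numeric cutoff below (`tolerance`) is a hypothetical stand-in, since teams were identified here by their visible departure from the pattern in Figure 4 rather than by a fixed threshold, and the cost values are made-up placeholders, not database entries.

```python
def r_squared(observed, modeled):
    """Squared Pearson correlation between observed and modeled values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_m = sum(modeled) / n
    s_om = sum((o - mean_o) * (m - mean_m) for o, m in zip(observed, modeled))
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_mm = sum((m - mean_m) ** 2 for m in modeled)
    return s_om ** 2 / (s_oo * s_mm)


def flag_outliers(observed, modeled, tolerance=0.25):
    """Indices of teams whose recorded cost differs from the modeled cost
    by more than `tolerance` (a hypothetical cutoff) times the modeled cost."""
    return [i for i, (o, m) in enumerate(zip(observed, modeled))
            if abs(o - m) > tolerance * m]


# Placeholder costs for four imaginary teams (not data from the database).
observed_costs = [100.0, 210.0, 300.0, 900.0]
modeled_costs = [100.0, 200.0, 300.0, 400.0]

flagged = flag_outliers(observed_costs, modeled_costs)
kept = [i for i in range(len(observed_costs)) if i not in flagged]
fit_before = r_squared(observed_costs, modeled_costs)
fit_after = r_squared([observed_costs[i] for i in kept],
                      [modeled_costs[i] for i in kept])
```

With the placeholder values, only the fourth team is flagged, and the fit on the remaining teams is higher than on the full set, which mirrors the direction of the change reported in Table 2.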

Participants' Performance across the Database
The team average total costs relative to the benchmark costs (identified in [31]) are shown in Table 3. The average team cost was over 23 times the benchmark and twice the average reported in [31] (Table 3; Figure 5), although that study only reported scores of the best performing teams. The wholesaler, distributor, and factory ratios of actual to benchmark costs were as high as 30 times optimal cost levels; however, the retailers in our group performed similarly to those in other studies (Table 3). The differences between the benchmark costs and both the total costs and the costs of each sector were all highly significant, and compared to Sterman [31], the differences were significant for all sectors except the retailer. To identify how well the best performing teams in our database performed relative to previous studies, total team and individual position costs were summarized into quartiles (Table 3). The team average ($2028) and position average costs (retailer $383, wholesaler $635, distributor $630, factory $380) of the top performing teams in [31] fell between our third and fourth quartiles of team performance, indicating similar performance between the above average teams. These results held across positions (Table 3).

Figure 5. Distribution of team total costs from best (1) to worst (44) teams across the database relative to benchmark costs, with ranked teams 10, 20, 30, and 40 shown to illustrate the change in costs with progressively worse performances.

Similar oscillations, amplifications, and phase lags were observed between our team performances and common Beer Game results (Table 4; Figure 6). Orders and inventories expressed large fluctuations, with average inventory recovery of 25.5 weeks. Backlogs of inventories migrated from the retailer to the factory, similar to typical Beer Game results (Figure 6), with the peak order rate at the factory being over three times the peak order rate of the retailer. Closed loop gains (∆[factory orders]/∆[customer orders]) averaged nearly 1400%, or double that reported by Sterman [31]. Maximum backlogs averaged 35 cases and occurred between 34 and 35 weeks (Table 4). As expected, inventories overshot initial levels, peaking at week 35. Phase lags were more evenly distributed than in typical Beer Game runs; however, this was likely due to the larger sample size smoothing out the week of peak order rates. Participants' anticipated minimum inventory (date of minimum inventory minus date of week order rate) was generally delayed by one or two weeks, indicating reactive strategies that did not account for orders in the supply line (orders placed but not yet received) and that perpetuated extreme inventory levels later in the game.
Although the overall scores were poorer than team performances reported in the literature, the top 10 teams in our database (or ≈25%) performed better than the top 25% reported in Sterman [31], and this held across all game positions except for the factory (Table 5). Retailer, wholesaler, and distributor costs were all significantly lower (which contributed to an overall significantly lower team total cost), while the factory costs were significantly higher. Periodicity and phase lags were noticeably shorter and amplification lower than for the Sterman [31] teams. Among the top 10 teams of our database, the SDSUHC groups were disproportionately represented. Eight of the top 10 teams came from the KRIRM participants (≈21% of the KRIRM sample) while two teams came from the SDSUHC participants (≈33% of the SDSUHC sample).

Figure 6. Illustration of effective inventories for the best and worst teams (R-retailer; W-wholesaler; D-distributor; F-factory) in the database.

Table 5. Evaluation of results from the top 10 performing teams in the database used for comparison to teams reported in the literature. * Closed-loop gain is measured as change in output relative to that of the input, e.g., ∆Factory orders/∆Customer orders = (21.12 − 4)/(8 − 4) = 4.28.
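The closed-loop gain defined in the note to Table 5 can be reproduced with a short calculation. The Python sketch below assumes, as the worked example in that note suggests, that the change in factory orders is measured from the initial order rate of four cases to the peak order rate; the function name is illustrative.

```python
def closed_loop_gain(peak_factory_order, initial_order=4.0, stepped_demand=8.0):
    """Change in factory orders divided by the change in customer orders.

    Assumes both changes are measured from the initial rate of four cases.
    """
    return (peak_factory_order - initial_order) / (stepped_demand - initial_order)


gain = closed_loop_gain(21.12)   # (21.12 - 4) / (8 - 4)
gain_percent = 100.0 * gain      # the same gain expressed as a percentage
```

Expressed as a percentage, the example value corresponds to a gain of roughly 428%, which is the scale on which the database-wide average of nearly 1400% is reported.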

Comparison of Performances from More and Less Experienced Participants
We hypothesized that the older, more experienced group (KRIRM) would perform better on the Beer Game task than the less experienced players, primarily undergraduate students (SDSUHC). We found no evidence to support this (Table 6), as neither the team total costs nor any of the player position costs were significantly different. This corroborates previous conclusions that management experience may not mitigate misperceptions of feedback [38]. However, qualitative analyses of the trends in effective inventory and order rates tell a more interesting story (Figure 7). The SDSUHC teams appeared to achieve maximum inventory earlier than the KRIRM groups and by week 35 were reducing their overall inventory levels back toward the 'anchored' inventory level of 12. This was achieved through overall lower average order rates (Table 6; Figure 7). Although retailer orders were similar, wholesaler, distributor, and factory average order rates differed from as low as one to as high as six cases per week. After initial inventory recovery, discrepancies in order rates were even larger (up to eight cases at the factory level) and were all statistically significant across positions (Table 6). Based on the change in slope of order rates and effective inventories after week 29 for the SDSUHC teams, it appears the younger players began accounting for cases in delivery much sooner than the KRIRM groups, whose maximum effective inventory levels continued to rise. It is possible that several interesting features are at work that created the divergence in trends of effective inventory between players. 
First, the older KRIRM participants could have continued to order more cases after the initial inventory recovery as a way to accumulate "coordination stock" to hedge against the risk that customer orders will significantly change in the future (based on their perception of customer orders as well as experience in the real world) or in case the other players deviate from the near-equilibrium (but suboptimal) position that the game reaches by week 30 (i.e., to compensate for obvious weaknesses in their teammates) [68]. Relying on real-world experience requires participants to determine strategy via comparison of the game to previous experience by analogy; however, decision makers who reason by analogy in complex dynamic situations have not performed as well as those who do not [71]. Second, the older participants were likely less inclined to lower their order rates after inventory recovery, since the initial strategy (increase the order rate to get out of backlog) eventually paid off. In other words, so long as they achieved zero backlog, they were not as heavily anchored to the initial inventory level as the younger players.
It has been shown that experience with a particular set of behaviors improves performance, but that as the opportunity costs of trying new strategies rise, individuals will experiment with fewer decisions and are less likely to identify methods superior to their status quo [91,92]. It is likely that the opportunity costs of changing strategies appeared too high to the older players.
Third, the younger, less experienced players in the SDSUHC teams significantly lowered their order rates after inventory recovery compared to the KRIRM group (Table 6). Although inventories are affected by the choices of the other players, participants are forced to discretely place new orders based on each new inventory level, and new order rates represent desired change in the stock of the individual player. Therefore, each choice in order is aimed at closing the gap between desired and actual states of inventory (albeit with the necessary receiving and shipping delays).
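The gap-closing logic described above is commonly formalized as an anchoring-and-adjustment ordering heuristic of the kind associated with Sterman [31]. The Python sketch below is an illustration in that spirit only: the function name, the default parameter values, and the example numbers are assumptions for demonstration, not estimates from our database.

```python
def order_decision(expected_demand, effective_inventory, supply_line,
                   desired_stock=12.0, alpha=0.5, beta=1.0):
    """Anchoring-and-adjustment order rule (illustrative parameter values).

    expected_demand     : the anchor, i.e., the orders the player expects to fill
    effective_inventory : on-hand inventory minus backlog
    supply_line         : cases ordered but not yet received
    desired_stock       : the stock level the player tries to hold
    alpha               : fraction of the stock gap corrected each week
    beta                : weight given to the supply line (0 = ignored,
                          1 = fully accounted for)
    """
    gap = desired_stock - effective_inventory - beta * supply_line
    return max(0.0, expected_demand + alpha * gap)  # orders cannot be negative


# A player in backlog (effective inventory of -4) with 16 cases already on order:
ignoring_supply_line = order_decision(8, -4, 16, beta=0.0)   # orders 16 cases
counting_supply_line = order_decision(8, -4, 16, beta=1.0)   # orders 8 cases
```

In this example the player who ignores the supply line orders twice as much as the player who counts it, even though both face the same inventory and the same demand. Sustained over several weeks, that difference is what produces the overshoot in effective inventory.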
Our older players increased order rates to get out of backlog, and rather than decreasing order rates once effective inventories recovered, continued to order at relatively high rates (i.e., they were heavily anchored to the choices that worked to get themselves out of backlog), while our younger participants made a more abrupt shift to lower order rates upon inventory recovery and escalation. Younger players in our sample were more heavily anchored to the initial inventory level and were therefore more responsive to escalating inventory levels (and therefore costs) by lowering their order rates significantly (Table 6).
Recent psychology research strengthens these conclusions. For example, research on dynamic decision making choices of younger versus older adults has shown that older adults (age 60-84) perform better on choice-dependent tasks, which require learning how previous choices influence current performance and making a new decision based on that knowledge [93,94]. Older players in our sample were more heavily anchored to their previous strategy that worked (order more cases to get out of backlog), and because of that success continued to do so. Research on younger decision makers (age 18-23) has shown that they perform better on choice-independent tasks (where learning requires exploiting the options that give the highest reward on each new trial [93,94]) and students have best learned dynamic decision making in systems by 'doing' and 'failure' rather than 'knowing' or relying on experience [70]. Older adults have also been shown to base their decisions on changes in states, compared to younger adults, who are more apt to change decisions based on comparison of expected values of new trials [95].
Several cognitive mechanisms or learning impairments may underlie these patterns. For example, work has shown that age-related impairments in learning may result from declines in phasic dopaminergic signals in older versus younger adults [96], likely contributing to the deficits in feedback-driven reinforcement learning in older adults [97]. In two exploratory choice task experiments to understand how younger and older adults differ in their exploratory choices, Blanco et al. found that strategies by the two groups were qualitatively different (with older adults performing worse), in part due to older adults applying a strategy shaped by their wealth of real-world decision-making experience that may be ill-suited in some decision environments due to increased working memory loads [98]. Worthy et al. suggested that older adults' departures from state-based decision strategies in favor of immediate reward strategies were due to age-related declines in the neural structures needed for more computationally demanding (e.g., goal oriented) decision making [99]. This cognitive burden on working memory load likely leads participants to focus more on immediate versus delayed consequences of decisions [100]. Lastly, Kurnianingsih et al. found that older adults (aged 61-80) were significantly more uncertainty averse for both risky and ambiguous choices and exhibited strategies with decreased use of maximizing information [101], which likely contributes to learning deficits observed in healthy older adults driven by a diminished capacity to represent and use uncertainty to guide learning [102]. This corroborates others who have shown that younger adults more willingly explore task structures when unexpected rewards or costs indicate a need for a shift in decision strategy, compared to older adults, who show perseverative behavior and have deficits in updating expected values of alternative decisions [103].
Our results coincide with those observed in age-related studies [93–103], which likely explain the discrepancy in order rates between groups (Table 6; Figure 8). Our results are also strengthened by the conclusions of Rouwette et al. [34], who found that: (1) there exist few to no fundamental differences between system dynamics-oriented tasks and performance task games from other social science disciplines, and (2) simulation players have primarily been sampled from university student populations. The psychology literature supports a difference in task performance by age, and we have overcome the weakness of relying on university student populations by including a majority of teams composed of working professionals in AGNR fields.

Implications for Agricultural and Natural Resource Management
There are a number of key lessons from the Beer Game in general and from this study in particular that are of interest for AGNR management. The boom-bust nature of the Beer Game occurs due to the inherent ordering and shipping delays coupled with the overwhelming tendency of players to ignore their supply line. Natural resource managers embedded in real-world systems with extremely long time delays (e.g., year to decadal scales) performed just as badly as, if not worse than, managers from corporate contexts at identifying and managing the delayed-inventory management task in the Beer Game. Results closely corresponded to typical results of other Beer Game trials, indicating that our participants, despite intimate knowledge of AGNR systems, had adopted a decision rule similar to that identified by Sterman [31], where participants anchor their initial expectations to the starting inventory level, which inevitably produces extremely poor results. This is due to the misperception of delayed feedback between placing and receiving orders and to not fully accounting for cases in the supply line, both of which lead to over-ordering and instability in even the best performing teams (Figure 5). Even those who recognize and manage systems with many time delays, often varying from months to years in length, still commit the same errors as those without such experience with delays.
What are some examples from AGNR systems of failures to account for such delays and supply on order, and what implications might there be for AGNR management in the 21st century? Unfortunately, numerous AGNR cases can be found. First, it is important to recognize how supply lines are adjusted in AGNR systems. Producers typically have two leverage mechanisms: adjusting the number of units in production (e.g., total land under cultivation, total animal inventory) or adjusting the production per unit (e.g., production per unit of cultivated area in cropping systems, yield per head in livestock systems). Employing either of these options poses interesting trade-offs in the ability to adequately adjust the supply line. Increasing the number of units in production subjects producers to delays on the order of two to four years, while reducing units in production can occur quite rapidly (within a year). On the other hand, increasing the production per unit (through selective plant or animal breeding to enhance production potential) shortens the delay in increasing the supply line, but the genetic enhancement of the overall population makes reduction in per unit productivity extremely difficult if not impossible. To illustrate the importance of these two mechanisms to AGNR systems, consider two recent examples from the United States: the corn market boom and bust and the contraction of the dairy industry.
For decades, U.S. corn market prices oscillated between $2 and $4 per bushel, and producers' land use decisions remained relatively stable around 78 million acres (Figure 9). In response to a step change in demand in the mid-2000s arising from renewed energy policies incentivizing ethanol production (similar to the step change observed in the Beer Game), prices rose to a peak of $6 to $7 per bushel between 2011 and 2013 due to the inventory shortages resulting from the surge in capacity utilization to fill the increased demand. Producers, aiming to capitalize on the rising prices, began expanding the planted area of corn by 20%, much of it onto land that had previously been retired from cultivation. Inevitably, there was a delay in productivity (which continues to increase with investment in crop production potential) as these areas came out of retirement. Because producers failed to account for the supply line (i.e., newly converted land that had not yet reached its full production potential), total production overshot the increased demand, resulting in a collapse of corn prices back to historical levels by 2014. As of 2020, no significant land use correction has occurred (Figure 9).

The corn market example is a conspicuous case. A more subtle but just as powerful example may be found in the U.S. dairy industry. Dairy production is highly seasonal, peaking in late spring and bottoming in winter. Likewise, dairy product consumption is seasonal, peaking in the late fall during the holiday season. Because of the mismatch in peak supply and demand periods, managing inventory is critical for a stable market environment. As a result, prior to 1960, the U.S. dairy industry experienced cycles of expansion and contraction similar to many other livestock industries as a result of its commodity cycle (Figure 10). Farm policy interventions in the U.S. began managing these dynamics by purchasing and storing large volumes of milk inventory to buffer seasonal variations in supply and by establishing minimum price supports that helped minimize price volatility. Under these conditions, dairy herds were able to consolidate, with 50% fewer head in 1980 compared to 1960. Simultaneously, investments in animal potential yielded a 200% increase in per head productivity. In the late 1980s, U.S. farm policy lowered support prices, and government inventories (or coordination stocks) ceased to function as a buffer against seasonal supply and demand imbalances. This increased the price volatility (which has weakened farm business planning, debt repayments, and dairy farm solvency) and the importance of private inventory holdings [104,105].

Why the increasing price volatility (amplitude) despite a stable dairy herd level? In part, seasonality of milk production inevitably creates oscillations in inventory and therefore price. However, the amplitude has significantly increased, with greater gaps between seasonal highs and lows, indicating large shifts in inventory (booms and busts similar to the Beer Game). Booms in supply (which drive price declines) have resulted not from increasing animal units, but from increasing production per head (up 400% compared to 1960), and the industry has not counteracted this productivity by reducing total animal units. Instead, inventory corrections have been made through dumping (119 million pounds in 2016, 170 million pounds in 2017, over 145 million pounds in 2018; a greater dumping rate is expected in 2020 due to the coronavirus pandemic; [106–108]). Clearly, as indicated in farm gate milk prices, this is a low leverage strategy that only temporarily corrects inventory and prices. It prolongs the stress on remaining dairy producers, since volatility rises with the continual rise in incoming inventory (which necessitates increased dumping). That rise will not soon change, because investments over time in herd productivity (i.e., permanent gains in genetic potential that have raised milk yield per head) have either already accrued or have not yet been realized due to delays in the system.
What are the implications for the future of AGNR systems management? Without accounting for the supply line on order in AGNR supply chains, AGNR managers will continue to respond in ways that perpetuate the problems stemming from inherent oscillations and will continue to look for external causes to blame (e.g., environmental variability, government policy change, consumer behavior) for internal industry dilemmas [104]. System structure can be defined by the basic interrelationships that influence, regulate, or control behavior (including external constraints), but, more importantly, structure is the endogenous decision-making rules, operating policies, goals, and modus operandi, many of which are unwritten and embedded in the culture of industries and organizations. For example, given the productivity-driven goals and mental models of the dairy industry, the order rate (i.e., investment in per head productivity) has not slowed, despite the recognition that the market is over-supplied. Failure to recognize how our decisions interact with the system as a whole hinders our ability to find and effectively apply leverage to systemic problems (leverage often comes from new ways of thinking [109]).
AGNR professionals must overcome the same common learning disabilities that are seen in humans across cultures and contexts [60,95] and the barriers that impede our learning about complex systems [33]. Almost regardless of history or experience, when we are inserted into a given position within a system or organization, the structure incentivizes us to "become our position." In AGNR, managers often view their position as "producers" or those who "feed the world," reinforcing tendencies to view success based on their own productivity rather than how effectively they have met consumer expectations or balanced socio-economic and environmental concerns (e.g., the soil and water externalities cited above). Since many externalities are never felt by those who made the decisions that created the problems, and because AGNR delays are particularly lengthy, our 'knee-jerk' reaction is to assign blame to others around us, and we fail to learn effectively from experience and the collective wisdom of others in the system: "To oscillate, the time delay must be (at least partially) ignored. The manager must continue to initiate corrective actions in response to the perceived gap between the desired and actual state of the system even after sufficient corrections to close the gap are in the pipeline . . . Learning to recognize and account for time delays goes hand in hand with learning to be patient, to defer gratification, and to trade short-run sacrifice for long-term reward. These abilities do not develop automatically. They are part of a slow process of maturation. The longer the time delays and the greater the uncertainty over how long it will take to see the results of your corrective actions, the harder it is to account for the supply line." [110] Similar learning disabilities and the consequences they exert on decision making have been observed in other natural resource management studies [111][112][113].
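The mechanism described in the quotation can be illustrated with a minimal simulation. The sketch below is not the model used in this study; it assumes a single position with a four-week delivery delay, a one-time step in demand from 4 to 8 cases, and an anchor-and-adjust ordering heuristic in which the hypothetical parameter `beta` sets how much weight the player gives to orders already in the pipeline (0 ignores the supply line, 1 accounts for it fully). All names and parameter values are illustrative assumptions.

```python
from collections import deque


def simulate(beta, weeks=60, delay=4, alpha=0.5, desired_inventory=12.0):
    """Simulate one position; return weekly effective inventory (negative = backlog)."""
    demand_before, demand_after, step_week = 4.0, 8.0, 5  # assumed demand pattern
    # Orders placed but not yet received, oldest first (starts in equilibrium).
    pipeline = deque([demand_before] * delay)
    inventory = desired_inventory
    history = []
    for week in range(1, weeks + 1):
        demand = demand_before if week < step_week else demand_after
        # Receive the oldest outstanding order, then fill this week's demand.
        inventory += pipeline.popleft() - demand
        supply_line = sum(pipeline)                 # still on order
        desired_supply_line = demand * (delay - 1)  # enough to cover the delay
        # Anchor on demand, adjust for the inventory gap and (weighted by beta)
        # for the supply line gap; orders cannot be negative.
        order = max(
            0.0,
            demand
            + alpha * (desired_inventory - inventory)
            + alpha * beta * (desired_supply_line - supply_line),
        )
        pipeline.append(order)
        history.append(inventory)
    return history


ignoring = simulate(beta=0.0)    # supply line ignored
accounting = simulate(beta=1.0)  # supply line fully accounted for
print("peak inventory, supply line ignored:  ", max(ignoring))
print("peak inventory, supply line accounted:", max(accounting))
```

Under these assumptions, the player who ignores the supply line keeps ordering while earlier corrections are still in transit and overshoots the desired inventory, whereas the player who accounts for it returns to the target without overshooting. The exact amplitudes depend entirely on the assumed parameter values.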
To overcome these disabilities and barriers, the SD profession has prioritized and advocated for systems-based education from K-12th grade levels up to university graduate programs (see the Creative Learning Exchange at clexchange.org, as well as works of Forrester [114][115][116][117][118][119] and others [50,55,120,121]). Given the results of our Beer Game database and our experience in the AGNR professions, systems education in these disciplines is as desperately needed as ever if effective change is to be expected and gaps in the 21st century challenge are to begin closing sustainably.
AGNR professionals with systems education could likely achieve significantly different results compared to professionals without systems-oriented education. For example, thinking in systems forces us to recognize the interconnectedness and dynamic complexity of the problem at hand, the physical stocks and flows central to the issue, and time delays between decisions and results. Systems thinking and system dynamics modeling also encourage us to maintain an unwavering commitment to the highest standards and rigor of the scientific method by recognizing and correcting our hidden biases and documenting and testing our assumptions about the problem. By doing so, we can explore a wider decision space for new or previously unrecognized leverage points to achieve our goals [59]. Achieving the 21st century agriculture challenge requires input and collaboration across disciplines and cultures. System dynamics can provide a common unifying language to facilitate such collaboration.

Study Limitations
There are a number of limitations of the study as presented. First, for general research purposes, our use of the traditional board, pencil, and paper based Beer Game is unconventional, and it has been noted that this use of the Beer Game is no longer acceptable because of the high rate of clerical and data recording errors. All modern Beer Game studies are expected to use a computer version of the game to prevent such errors and to offer tight controls on information availability among the players. Due to the structure of the lectureships where our participants played the traditional Beer Game and the available computer resources at the time, computer applications have been generally unfeasible. We attempted to limit clerical and data errors by screening the data with an application of a Beer Game model. Second, the traditional incentive scheme in the board game version ($1 entry fee and winner-take-all) is not consistent with current standards for experimental studies in economics, in which people are paid in proportion to their performance. Lastly, previous Beer Game studies have estimated an ordering decision rule to test the misperceptions of feedback hypothesis; we have omitted that analysis here but plan to conduct it in future work.
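The screening model is not described further here. As a rough illustration of the kind of check a Beer Game model makes possible, the sketch below flags weeks in a hand-recorded sheet that violate the game's stock balance: effective inventory (inventory minus backlog) should equal the previous week's effective inventory plus shipments received minus incoming orders. The record layout, field names, and the function `flag_recording_errors` are hypothetical and chosen only for this example.

```python
def flag_recording_errors(weeks, start_effective_inventory):
    """Return the week numbers whose recorded values break the stock balance."""
    flagged = []
    previous = start_effective_inventory
    for number, week in enumerate(weeks, start=1):
        recorded = week["inventory"] - week["backlog"]
        expected = previous + week["received"] - week["incoming_order"]
        # A position cannot hold stock and owe a backlog in the same week.
        both_positive = week["inventory"] > 0 and week["backlog"] > 0
        if recorded != expected or both_positive:
            flagged.append(number)
        # Continue from the recorded value so a single slip is flagged once.
        previous = recorded
    return flagged


# Hypothetical record sheet: week 3 should show 4 cases, not 5.
sheet = [
    {"received": 4, "incoming_order": 4, "inventory": 12, "backlog": 0},
    {"received": 4, "incoming_order": 8, "inventory": 8, "backlog": 0},
    {"received": 4, "incoming_order": 8, "inventory": 5, "backlog": 0},
    {"received": 4, "incoming_order": 8, "inventory": 1, "backlog": 0},
]
print(flag_recording_errors(sheet, start_effective_inventory=12))
```

A check of this kind can only locate arithmetic inconsistencies; it cannot tell which of the recorded quantities in a flagged week was the one entered incorrectly.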

Conclusions
System dynamics facilitates investigation of the dynamic consequences of choices made by decision-makers in complex, feedback-driven systems. Played by thousands over many decades, the well-known Beer Game has become a fundamental learning tool that reveals critical learning disabilities and illustrates how system structure creates behaviors over time (e.g., oscillatory inventory/backlog and exuberant rising costs). Here, a Beer Game database generated by over 270 AGNR managers (87%) and students (13%) was analyzed. A distinct facet of our study is that the majority of the participants in our database had deep AGNR management experience and at least a B.S. degree (many with M.S. or Ph.D. degrees). The performance of these managers was poorer than that of managers in a seminal Beer Game study. More interestingly, we found evidence that younger players (in this case undergraduate AGNR students) were willing to change their decision strategies sooner and with greater magnitudes in response to pressures of the game compared with their older counterparts. In light of the many 21st century AGNR problems (e.g., food and agriculture production, natural resource capacity and environmental quality, pollution mitigation, etc.), being able to identify and communicate the dynamic complexity of problems and overcome common learning disabilities would greatly benefit AGNR managers. System dynamics provides such a framework and has been integrated into the two AGNR programs described here, and we encourage other AGNR programs to likewise adopt a system dynamics approach. Future work may explore emotional and psychological factors underpinning dynamic decision-making.