# Reinforcement Learning in a New Keynesian Model

## Abstract

## 1. Introduction

## 2. The Standard Behavioural NK Model

#### 2.1. The Workhorse NK Model

#### 2.2. The Brock–Hommes Behavioural NK Model

## 3. The Non-Linear NK Model

#### 3.1. Households

#### 3.2. Firms, Government Expenditures and Monetary Policy

#### 3.3. Recovering the NK Workhorse Model

## 4. AU Learning and Market-Consistent Information

## 5. Heterogeneous Expectations across Agents

#### 5.1. Exogenous Proportions of RE and BR Agents

#### 5.2. Endogenous Proportions of RE and BR Agents with Reinforcement Learning

#### 5.3. The Possibility of Bifurcation and Chaotic Dynamics

## 6. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## Appendix A. Stability Analysis

**Figure A1.** Comparison of stability properties of RE, EL and AU models in $({\rho}_{r},{\theta}_{\pi})$ space; ${\rho}_{r}>0$, ${\lambda}_{x}=1$; red: determinacy; black: indeterminacy; green: instability. (**a**) RE: ${\theta}_{y}=0.3$; (**b**) EL: ${\theta}_{y}=0.3$; (**c**) AU: ${\theta}_{y}=0.3$.
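The three regions named in the caption of Figure A1 (determinacy, indeterminacy, instability) can be illustrated with a Blanchard–Kahn eigenvalue count. The sketch below is only an illustration under loud assumptions: it uses a textbook two-equation NK model with a contemporaneous Taylor rule and no interest-rate smoothing (so ${\rho}_{r}$ and ${\lambda}_{x}$ do not appear), the values of `beta`, `kappa` and `sigma` are hypothetical, and it covers the RE case only. It is not the model or the procedure behind the figure.

```python
import numpy as np

def transition_matrix(theta_pi, theta_y, beta=0.99, kappa=0.1, sigma=1.0):
    """Matrix A in E_t[z_{t+1}] = A z_t, with z = (output gap x, inflation pi).

    Assumed textbook model (not taken from the paper):
      x_t  = E_t x_{t+1} - (1/sigma) * (i_t - E_t pi_{t+1})
      pi_t = beta * E_t pi_{t+1} + kappa * x_t
      i_t  = theta_pi * pi_t + theta_y * x_t
    """
    return np.array([
        [1.0 + theta_y / sigma + kappa / (sigma * beta),
         (theta_pi - 1.0 / beta) / sigma],
        [-kappa / beta, 1.0 / beta],
    ])

def classify(A, n_jump):
    """Compare the number of eigenvalues outside the unit circle
    with the number of non-predetermined (jump) variables."""
    n_unstable = int(np.sum(np.abs(np.linalg.eigvals(A)) > 1.0))
    if n_unstable == n_jump:
        return "determinacy"    # unique non-explosive solution
    if n_unstable < n_jump:
        return "indeterminacy"  # multiple non-explosive solutions
    return "instability"        # no non-explosive solution

# Both x and pi are jump variables in this assumed model.
for theta_pi in (0.5, 1.5):
    print(theta_pi, classify(transition_matrix(theta_pi, theta_y=0.3), n_jump=2))
```

With these assumed parameter values, the rule with ${\theta}_{\pi}=1.5$ is classified as determinate and the rule with ${\theta}_{\pi}=0.5$ as indeterminate, in line with the Taylor principle. The instability branch can only arise once predetermined variables such as a lagged interest rate are added to the state.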

**Figure A2.** Comparison of stability properties of EL and AU models in $({\theta}_{y},{\theta}_{\pi})$ space; ${\rho}_{r}=1$, ${\lambda}_{x}=1$; red: determinacy; black: indeterminacy; green: instability. (**a**) EL: ${\rho}_{r}=1$; (**b**) AU: ${\rho}_{r}=1$.

## Appendix B. Summary of Composite RE-BR Model

**RE Households:**

**BR Households:**

**Wholesale Firms:**

**RE Retail Firms:**

**BR Retail Firms:**

**One-Period Ahead Adaptive Expectations:**

**Wealth Distribution:**

**Closure of Model:**

**Endogenous Proportions of RE and BR Agents:**

**Welfare and Consumption Equivalence:**

## Appendix C. Balanced Growth Steady State

## Appendix D. Lemma

**Lemma A1.**

**Proof.**

## Appendix E

**Proof of Equation (28).**

## Appendix F. Additional Simulated IRFs for RE-BR Composite Models

**Figure A3.** RE versus RE-BR composite expectations with ${n}_{h}={n}_{f}=0.5$; ${\lambda}_{x}=0.25,1.0$; Taylor rule with ${\rho}_{r}=0.7$, ${\theta}_{\pi}=1.5$ and ${\theta}_{y}=0.3$; technology shock.

**Figure A4.** RE versus RE-BR composite expectations with ${n}_{h}={n}_{f}=0.5$; ${\lambda}_{x}=0.25,1.0$; Taylor rule with ${\rho}_{r}=0.7$, ${\theta}_{\pi}=1.5$ and ${\theta}_{y}=0.3$; mark-up shock.

## Appendix G. Robustness

**Table A1.** Third-order solution of the estimated NK RE-BR model; ${\mu}_{h}^{RE}={\mu}_{h}^{BR}={\mu}_{f}^{RE}={\mu}_{f}^{BR}=0.5$; $\gamma =1,100,1000$.

| Variable | Stochastic Mean | Standard Deviation (%) | Skewness | Kurtosis |
|---|---|---|---|---|
| $\frac{{C}_{t}}{C}$ | 0.999544 | 0.042057 | 0.323304 | 0.093034 |
| $\frac{{H}_{t}}{H}$ | 1.000273 | 0.005111 | 0.038002 | −0.020743 |
| $\frac{{W}_{t}}{W}$ | 0.999810 | 0.038145 | 0.318586 | 0.073488 |
| $\frac{{\Pi}_{t}}{\Pi}$ | 0.999898 | 0.004235 | −0.045800 | 0.030136 |
| $\frac{{R}_{n,t}}{{R}_{n}}$ | 0.999887 | 0.004440 | −0.046254 | 0.044145 |
| ${\Phi}_{h,t}^{RE}-{C}_{h}$ | −0.000443 | 0.000257 | −1.504159 | 3.793195 |
| ${\Phi}_{h,t}^{AE}$ | −0.000526 | 0.000303 | −1.592581 | 4.581412 |
| ${\Phi}_{f,t}^{RE}-{C}_{f}$ | −0.000199 | 0.000116 | −1.672777 | 5.558457 |
| ${\Phi}_{f,t}^{AE}$ | −0.000349 | 0.000226 | −1.897335 | 7.457836 |
| ${n}_{h,t}\;(\gamma =1;\sigma =1)$ | 0.100008 | 0.000013 | 0.488774 | 3.275592 |
| ${n}_{f,t}\;(\gamma =1;\sigma =1)$ | 0.100014 | 0.000016 | 1.680492 | 6.480563 |
| ${n}_{h,t}\;(\gamma =100;\sigma =1)$ | 0.100750 | 0.001295 | 0.488774 | 3.275592 |
| ${n}_{f,t}\;(\gamma =100;\sigma =1)$ | 0.101352 | 0.001568 | 1.680492 | 6.480563 |
| ${n}_{h,t}\;(\gamma =1000;\sigma =1)$ | 0.107502 | 0.012952 | 0.488774 | 3.275592 |
| ${n}_{f,t}\;(\gamma =1000;\sigma =1)$ | 0.113519 | 0.015679 | 1.680492 | 6.480563 |
| ${n}_{h,t}\;(\gamma =1000;\sigma =2)$ | 0.130010 | 0.052873 | 0.535046 | 3.638229 |
| ${n}_{f,t}\;(\gamma =1000;\sigma =2)$ | 0.154185 | 0.063624 | 1.779321 | 7.399916 |
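In Table A1, the ${n}_{h,t}$ and ${n}_{f,t}$ rows with $\sigma =1$ report the same skewness and kurtosis for $\gamma =1,100,1000$, while the standard deviation rises roughly in proportion to $\gamma$. That pattern is what one would expect if the deviation of the proportion from its baseline scaled linearly with $\gamma$, because skewness and kurtosis are unchanged by a positive affine rescaling. The sketch below shows this on synthetic data. The series, the baseline `0.1`, the scale `1e-5` and the function name `summary_moments` are all hypothetical, and the code says nothing about how the authors computed their moments. The kurtosis is computed as excess kurtosis, which is an assumption suggested by the negative entry for $\frac{{H}_{t}}{H}$, since raw kurtosis cannot fall below 1.

```python
import numpy as np

def summary_moments(x):
    """Mean, standard deviation, skewness and excess kurtosis of a series."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    std = np.sqrt(np.mean(dev ** 2))
    return {
        "mean": x.mean(),
        "std": std,
        "skewness": np.mean(dev ** 3) / std ** 3,
        "excess_kurtosis": np.mean(dev ** 4) / std ** 4 - 3.0,
    }

# Synthetic right-skewed series standing in for a simulated fitness gap.
rng = np.random.default_rng(0)
shock = rng.gamma(shape=2.0, scale=1.0, size=50_000)

# Hypothetical affine map: baseline 0.1 plus a deviation proportional to gamma.
results = {g: summary_moments(0.1 + 1e-5 * g * shock) for g in (1, 100, 1000)}

for g, m in results.items():
    print(g, {k: round(float(v), 6) for k, v in m.items()})
```

In this sketch the standard deviation scales with the multiplier while skewness and excess kurtosis stay fixed, which mirrors the $\sigma =1$ rows of the table. The $\sigma =2$ rows change all four moments, so they are not covered by this affine argument.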

**Table A2.** Third-order solution of the estimated NK RE-BR model; ${\mu}_{h}^{RE}={\mu}_{h}^{BR}={\mu}_{f}^{RE}={\mu}_{f}^{BR}=0.75$; $\gamma =1,100,1000$.

| Variable | Stochastic Mean | Standard Deviation (%) | Skewness | Kurtosis |
|---|---|---|---|---|
| $\frac{{C}_{t}}{C}$ | 0.999544 | 0.042057 | 0.323304 | 0.093034 |
| $\frac{{H}_{t}}{H}$ | 1.000273 | 0.005111 | 0.038002 | −0.020743 |
| $\frac{{W}_{t}}{W}$ | 0.999810 | 0.038145 | 0.318586 | 0.073488 |
| $\frac{{\Pi}_{t}}{\Pi}$ | 0.999898 | 0.004235 | −0.045797 | 0.030137 |
| $\frac{{R}_{n,t}}{{R}_{n}}$ | 0.999887 | 0.004440 | −0.046251 | 0.044145 |
| ${\Phi}_{h,t}^{RE}-{C}_{h}$ | −0.000443 | 0.000170 | −0.978598 | 1.538134 |
| ${\Phi}_{h,t}^{AE}$ | −0.000526 | 0.000204 | −1.088202 | 2.164231 |
| ${\Phi}_{f,t}^{RE}-{C}_{f}$ | −0.000199 | 0.000077 | −1.063312 | 2.243911 |
| ${\Phi}_{f,t}^{AE}$ | −0.000349 | 0.000159 | −1.414287 | 4.290569 |
| ${n}_{h,t}\;(\gamma =1;\sigma =1)$ | 0.100008 | 0.000008 | 0.350716 | 2.281134 |
| ${n}_{f,t}\;(\gamma =1;\sigma =1)$ | 0.100014 | 0.000011 | 1.385635 | 4.243151 |
| ${n}_{h,t}\;(\gamma =100;\sigma =1)$ | 0.100750 | 0.000821 | 0.350716 | 2.281134 |
| ${n}_{f,t}\;(\gamma =100;\sigma =1)$ | 0.101352 | 0.001081 | 1.385635 | 4.243151 |
| ${n}_{h,t}\;(\gamma =1000;\sigma =1)$ | 0.107503 | 0.008211 | 0.350716 | 2.281134 |
| ${n}_{f,t}\;(\gamma =1000;\sigma =1)$ | 0.113521 | 0.010812 | 1.385635 | 4.243151 |
| ${n}_{h,t}\;(\gamma =1000;\sigma =2)$ | 0.130012 | 0.033699 | 0.406619 | 2.557592 |
| ${n}_{f,t}\;(\gamma =1000;\sigma =2)$ | 0.154191 | 0.044060 | 1.491071 | 4.993491 |

**Figure 1.** RE versus RE-BR composite expectations with ${n}_{h}={n}_{f}=0.5$, ${\lambda}_{x}=0.25,1.0$; Taylor rule with ${\rho}_{r}=0.7$, ${\theta}_{\pi}=1.5$ and ${\theta}_{y}=0.3$, ${\theta}_{dy}=0$; monetary policy shock.

**Table 1.** Third-order solution of the estimated NK RE-BR model; ${\mu}_{h}^{RE}={\mu}_{h}^{BR}={\mu}_{f}^{RE}={\mu}_{f}^{BR}=0$; $\gamma =1,100,1000$.

| Variable | Stochastic Mean | Standard Deviation (%) | Skewness | Kurtosis |
|---|---|---|---|---|
| $\frac{{C}_{t}}{C}$ | 0.999544 | 0.042057 | 0.323304 | 0.093034 |
| $\frac{{H}_{t}}{H}$ | 1.000273 | 0.005111 | 0.038002 | −0.020743 |
| $\frac{{W}_{t}}{W}$ | 0.999810 | 0.038145 | 0.318586 | 0.073488 |
| $\frac{{\Pi}_{t}}{\Pi}$ | 0.999898 | 0.004235 | −0.045800 | 0.030136 |
| $\frac{{R}_{n,t}}{{R}_{n}}$ | 0.999887 | 0.004440 | −0.046254 | 0.044145 |
| ${\Phi}_{h,t}^{RE}-{C}_{h}$ | −0.000443 | 0.000446 | −2.078809 | 6.635580 |
| ${\Phi}_{h,t}^{AE}$ | −0.000526 | 0.000516 | −2.168947 | 8.000489 |
| ${\Phi}_{f,t}^{RE}-{C}_{f}$ | −0.000199 | 0.000203 | −2.279557 | 9.082031 |
| ${\Phi}_{f,t}^{AE}$ | −0.000349 | 0.000342 | −2.269953 | 9.937975 |
| ${n}_{h,t}\;(\gamma =1;\sigma =1)$ | 0.100008 | 0.000023 | 0.857638 | 4.454288 |
| ${n}_{f,t}\;(\gamma =1;\sigma =1)$ | 0.100014 | 0.000025 | 1.586194 | 6.015115 |
| ${n}_{h,t}\;(\gamma =100;\sigma =1)$ | 0.100750 | 0.002297 | 0.857638 | 4.454288 |
| ${n}_{f,t}\;(\gamma =100;\sigma =1)$ | 0.101352 | 0.002479 | 1.586194 | 6.015115 |
| ${n}_{h,t}\;(\gamma =1000;\sigma =1)$ | 0.107501 | 0.022973 | 0.857638 | 4.454288 |
| ${n}_{f,t}\;(\gamma =1000;\sigma =1)$ | 0.113518 | 0.024787 | 1.586194 | 6.015115 |
| ${n}_{h,t}\;(\gamma =1000;\sigma =2)$ | 0.130007 | 0.093482 | 0.888592 | 4.857691 |
| ${n}_{f,t}\;(\gamma =1000;\sigma =2)$ | 0.154182 | 0.100265 | 1.683430 | 6.867599 |

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Deák, S.; Levine, P.; Pearlman, J.; Yang, B.
Reinforcement Learning in a New Keynesian Model. *Algorithms* **2023**, *16*, 280.
https://doi.org/10.3390/a16060280
