Reinforcement Learning in Repeated Portfolio Decisions
Linan Diao Jörg Rieskamp*
Abstract: How do people make investment decisions when they receive outcome feedback? We examined how well the standard mean-variance model and two reinforcement models predict people’s portfolio decisions. The basic reinforcement model predicts a learning process that relies solely on the portfolio’s overall return, whereas the proposed extended reinforcement model also takes the risk and covariance of the investments into account. The experimental results illustrate that people reacted sensitively to different correlation structures of the investment alternatives, which was best predicted by the extended reinforcement model. The results illustrate that simple reinforcement learning is sufficient to detect correlation between investments.
Keywords: Repeated Portfolio Decisions; Reinforcement Learning Model; Correlation
1 Introduction
Economic theory states that financial investments should depend on the expected returns, the perceived risks, and the correlation of available assets. Formally this is specified by the mean-variance (MV) model of finance (Markowitz, 1952, 1959; Tobin, 1958). We examined experimentally whether people are sensitive to the core characteristics of investment alternatives when they make investment decisions repeatedly and are provided with feedback that allows for learning. To model the observed learning process, we tested a basic learning model that relies only on the portfolios’ returns against an extended learning model that also takes the perceived risks and the correlation between investment assets into account.
* Linan Diao, Business School of Jilin University; Jörg Rieskamp, University of Basel, Switzerland. We would like to thank Werner Güth, David Hugh-Jones, Oliver Kirchkamp, Rene Levinsky, Ondrej Rydval, Eva-Maria Steiger, and Tobias Uske for their useful comments and suggestions, and the audiences at the ESI and IMPRS brown bag seminars, the 2010 Experimental Finance Conference in Gothenburg, and the 5th Nordic Behavioral and Experimental Economics conference in Helsinki. We would also like to thank all the student assistants at the Max Planck Institute of Economics who helped us conduct the experiments.
Past research has already examined whether people react sensitively to the correlation of investment alternatives. Kroll, Levy, and Rapoport (1988a, 1988b) showed that participants were not very sensitive to correlations between stocks and deviated from the MV model and the separation theorem by frequently switching between available stocks and by making investments depending on the alternatives’ returns in the preceding investment periods. According to the separation theorem, if borrowing and lending are not constrained and the rates of borrowing and lending are the same, the efficient risky frontier is reduced to a single optimal portfolio of risky stocks. Thus, the proportions invested in the risky stocks should be fixed. However, the experimental results violated the separation theorem’s prediction. Canner, Mankiw, and Weil (1997) showed that even financial advisors’ recommendations can deviate from the separation theorem. Lipe (1998) and Kallir and Sonsino (2009) showed that participants could perceive different levels of covariance, but their allocations were not significantly affected by changing the correlations. Finally, Hedesstroem, Svedsaeter, and Gaerling (2006) also showed that novice investors neglect covariation when diversifying across investment alternatives. However, most of the work illustrating that people do not react sensitively to the correlation between investment alternatives gave participants little opportunity to learn. Where a learning opportunity was provided, the research often did not offer a learning theory to explain the observed learning process.
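To make concrete why the MV model implies that correlation should matter for allocations, the following minimal Python sketch computes the mean and standard deviation of a two-asset portfolio and the weight that minimizes its variance. It is an illustration under stated assumptions, not the procedure used in the experiments: the function names and all numerical values are hypothetical, short sales are allowed, and the sketch covers the minimum-variance portfolio rather than the optimal risky portfolio of the separation theorem.

```python
import math


def portfolio_stats(w, mu_a, mu_b, sd_a, sd_b, rho):
    """Return (mean, standard deviation) of a portfolio with weight w on asset A
    and weight 1 - w on asset B, given the correlation rho of their returns."""
    mean = w * mu_a + (1 - w) * mu_b
    var = ((w * sd_a) ** 2 + ((1 - w) * sd_b) ** 2
           + 2 * w * (1 - w) * rho * sd_a * sd_b)
    # Guard against tiny negative values caused by floating-point rounding.
    return mean, math.sqrt(max(var, 0.0))


def min_variance_weight(sd_a, sd_b, rho):
    """Weight on asset A that minimizes portfolio variance (short sales allowed)."""
    cov = rho * sd_a * sd_b
    denom = sd_a ** 2 + sd_b ** 2 - 2 * cov
    if denom == 0:
        raise ValueError("assets are perfectly correlated with equal risk")
    return (sd_b ** 2 - cov) / denom


if __name__ == "__main__":
    # Hypothetical assets: A is riskier than B; only the correlation varies.
    for rho in (-0.5, 0.0, 0.5):
        w = min_variance_weight(sd_a=0.2, sd_b=0.1, rho=rho)
        mean, sd = portfolio_stats(w, 0.10, 0.05, 0.2, 0.1, rho)
        print(f"rho={rho:+.1f}  weight on A={w:.3f}  mean={mean:.4f}  sd={sd:.4f}")
```

With these hypothetical values, the variance-minimizing weight on the riskier asset shrinks as the correlation rises, which is the kind of sensitivity to correlation that the MV model prescribes.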
Research on decision making in general has illustrated that learning can have a strong impact on people’s behavior. For instance, Erev and Roth (1998) showed that experience can often lead to quick convergence to equilibrium predictions of economic games. Likewise, Bossaerts and Plott (2002, 2004) showed in experimental repeated asset markets that investors’ portfolios steadily converged toward the prediction of the capital asset pricing model. To explain the effects we need to interpret the observed behavior by reference to learning models. Learning models can provide an explanation for how learning changes behavior and under which conditions learning will or will not lead to convergence with economic theory.
Reinforcement learning models have been successfully used to predict and explain repeated decisions in many situations, including financial ones. Research has shown that reinforcement learning models can describe and predict people’s behavior better than the equilibrium prediction. For instance, Erev and Roth (1998) demonstrated that even very simple reinforcement learning models can explain behavior in experimental games with a unique equilibrium in mixed strategies better than the equilibrium prediction. Camerer and Ho (1999) suggested an experience-weighted attraction (EWA) learning model which, in addition to reinforcement learning (i.e., learning from experienced outcomes), assumes belief learning (i.e., learning from other players’ behavior). The EWA model accurately described people’s decisions in various tasks such as constant-sum games with unique mixed-strategy equilibria, “median-action” coordination with multiple Pareto-ranked equilibria, and a dominance-solvable “p-beauty contest” game with a unique equilibrium. Feltovich (2000) compared reinforcement and belief-based models against each other and found that both models predicted behavior better than Nash equilibrium, and the reinforcement model predicted the observed learning process best. Hopkins (2002) also tested reinforcement learning and belief learning models against each other and pointed out that despite their conceptual differences they often make very similar predictions. Erev and Barron (2005) developed a higher-order learning model assuming that the objects of reinforcement are cognitive strategies people apply to make choices in risky gambles.
In the finance area, Rieskamp, Busemeyer, and Laine (2003) examined the learning process when participants had to allocate resources to financial assets. They found that a learning model that assumes people only slightly modify their previous allocations based on feedback described investment decisions better than a learning model that assumes people try out a large variety of allocations. Rieskamp (2006) tested participants’ hypothetical retirement savings decisions in two experiments. The results showed that learning models that incorporate recency effects described the observed decisions best. Kaustia and Knuepfer (2008) tested reinforcement learning in the Finnish financial market, and Choi, Laibson, Madrian, and Metrick (2009) used a naïve reinforcement learning model to explain their findings: investors who experienced particularly rewarding outcomes from 401(k) savings increased their 401(k) savings rate more than investors who had less rewarding experiences. Shimokawa, Suzuki, Misawa, and Okano (2009) developed a modified temporal-difference reinforcement learning model to describe the decision-making process for financial investments.
In sum, the reviewed studies show that people often change their investment decisions when they are provided with feedback about the decision outcomes. Furthermore, reinforcement models often provide a good description of the observed learning process and their predictions are often better than standard economic equilibrium predictions. Therefore, we tested two learning models against each other to see which would be better at predicting how people make repeated portfolio decisions when they receive feedback about the portfolios’ returns.
What are the core assumptions of learning models? Most of the reinforcement learning models described above assume that the probability with which an alternative is chosen is an increasing function of the previously received reinforcements specified by an investment alternative’s return and a decreasing function of the reinforcement for other alternatives. Surprisingly, none of the suggested learning models described above take the correlation between the alternatives’ returns explicitly into account. This is surprising when considering that the correlation between the alternatives’ returns is a core component of portfolio theory. Learning models that ignore the correlation should not be able to explain people’s repeated portfolio decisions if the decisions are affected by the correlation.
We examined to what extent correlation between alternatives influences people’s repeated investment decisions. If people react sensitively to the correlation between investment alternatives, this will require learning models that incorporate a mechanism for the correlation between outcomes. Therefore, we conceived of a new reinforcement learning model that takes the risk of investment alternatives and the correlations of the alternatives’ returns explicitly into account. We tested this new model against a standard reinforcement learning model to describe people’s repeated investments.
To test the two models against each other we conducted two experiments. In the experiments, participants had to make portfolio decisions and were able to change their portfolios on the basis of received feedback. First we examined whether people’s portfolio decisions were qualitatively in line with the predictions of the MV model and whether the investments were sensitive to the correlations between the investment alternatives. Second, we tested which of the two learning models predicts the observed investments better. In the first experiment, the participants were given detailed information about the mean and variance of the distributions from which the returns of the alternatives (i.e., stocks) were randomly drawn. However, no information about the correlation between the alternatives was provided. In the second experiment, participants were only informed about the distributions from which the stocks’ returns were drawn without being given any information about the mean or variance of the distribution. Again no information about the correlation was provided. Thus, whereas in the first experiment the participants could use the feedback to learn about the correlation, in the second experiment they also had to learn the alternatives’ average returns, the involved risks, and the correlation of alternatives’ returns. The results show that participants’ portfolios differed for three conditions with different correlations of the investment alternatives. Even in the second experiment, where the participants had to learn the various characteristics of the investment alternatives, the sensitivity to the correlation was observed.
The rest of this article is organized as follows: Section 2 describes the two reinforcement learning models we used. Section 3 describes the first experiment and its results in detail. Section 4 describes the second experiment and its results in detail. We conclude with the general discussion in Section 5.
2 The Reinforcement Learning Models
The basic idea of reinforcement learning models is that decisions are a function of the learned expectancies of alternatives. The expectancies are updated by the feedback a decision maker receives. We examined to what extent experience changes people’s investment decisions by testing two reinforcement models.
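The general idea can be illustrated with a generic Python sketch in which expectancies are updated from feedback and then mapped to choice probabilities. This is not the specification of either model tested here: the delta-rule update, the softmax choice rule, the function names, and the parameters `learning_rate` and `sensitivity` are all assumptions chosen for illustration only.

```python
import math
import random


def update_expectancy(expectancy, reward, learning_rate):
    """Delta-rule update (an assumption): move the expectancy toward the reward."""
    return expectancy + learning_rate * (reward - expectancy)


def choice_probabilities(expectancies, sensitivity):
    """Softmax choice rule (an assumption): higher expectancy, higher probability."""
    top = max(expectancies)  # subtract the maximum for numerical stability
    weights = [math.exp(sensitivity * (q - top)) for q in expectancies]
    total = sum(weights)
    return [w / total for w in weights]


def simulate(reward_fn, n_options, n_trials,
             learning_rate=0.2, sensitivity=5.0, seed=0):
    """Run a hypothetical learner that updates only the chosen option's expectancy."""
    rng = random.Random(seed)
    expectancies = [0.0] * n_options
    for _ in range(n_trials):
        probs = choice_probabilities(expectancies, sensitivity)
        choice = rng.choices(range(n_options), weights=probs)[0]
        reward = reward_fn(choice, rng)
        expectancies[choice] = update_expectancy(
            expectancies[choice], reward, learning_rate)
    return expectancies, choice_probabilities(expectancies, sensitivity)


if __name__ == "__main__":
    # Hypothetical environment: option 0 pays more on average than option 1.
    def noisy_returns(choice, rng):
        return rng.gauss(0.10 if choice == 0 else 0.05, 0.02)

    q, p = simulate(noisy_returns, n_options=2, n_trials=200)
    print("expectancies:", q)
    print("choice probabilities:", p)
```

In this sketch, the probability of choosing an option increases with its own expectancy and decreases with the expectancies of the other options, and nothing in the update refers to the risk of an option or to the correlation between options.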
2.1 Basic Reinforcement Learning Model
Past work by, for instance, Lipe (1998) indicates that people might base their investment decisions only on the observed returns of the available investment alternatives, thereby