The Four Elements of Reinforcement Learning
Reinforcement learning has four elements: a policy, a reward signal, a value function, and a model of the environment.
1. Policy
A policy defines the learning agent's way of behaving at a given time: given the situation it perceives in the environment, the policy determines which action the agent takes.
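To make the idea concrete, a policy can be sketched as a mapping from situations to actions. The state names, action names, and probabilities below are hypothetical and chosen only for illustration; this is a minimal sketch, not a prescribed implementation.

```python
import random

# Hypothetical actions and states, invented for this illustration.
ACTIONS = ["left", "right"]

# A deterministic policy: a lookup table from state to action.
deterministic_policy = {"start": "right", "middle": "right", "goal": "left"}


def act(state):
    """Return the action the deterministic policy picks in `state`."""
    return deterministic_policy[state]


# A stochastic policy: a probability distribution over actions for each state.
stochastic_policy = {
    "start": {"left": 0.1, "right": 0.9},
    "middle": {"left": 0.2, "right": 0.8},
    "goal": {"left": 0.5, "right": 0.5},
}


def sample_action(state, rng):
    """Draw one action according to the stochastic policy's probabilities."""
    probs = stochastic_policy[state]
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


if __name__ == "__main__":
    print(act("start"))
    print(sample_action("start", random.Random(0)))
```

Both forms answer the same question, "what does the agent do here?"; the stochastic form simply allows the answer to vary from one visit to the next.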
2. Reward Signal
A reward signal defines the goal in a reinforcement learning problem. The agent's objective is to maximize the total reward it receives.
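The quantity being maximized can be sketched as a simple sum over the rewards collected in one episode. The `discounted_return` helper and its `gamma` parameter are assumptions added here for illustration (discounting is a common variant, but the text above only speaks of total reward); with `gamma=1.0` it reduces to the plain total.

```python
def total_reward(rewards):
    """Sum of all rewards received in one episode."""
    return sum(rewards)


def discounted_return(rewards, gamma=1.0):
    """Total reward with later rewards weighted by gamma**t.

    `gamma` is an assumed discount factor; gamma=1.0 gives the plain total.
    """
    g = 0.0
    # Work backwards so each step is: reward now + gamma * (return afterwards).
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    episode = [1.0, 0.0, 2.0]  # hypothetical rewards from three time steps
    print(total_reward(episode))
    print(discounted_return(episode, gamma=0.5))
```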
3. Value Function
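A value function specifies what is good in the long run, whereas the reward signal indicates what is good immediately. For example, the agent may choose an action whose immediate reward is low because the total reward over the period that follows is larger. A value function can be viewed as an estimate of future reward, and it is a core part of reinforcement learning algorithms.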
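The low-immediate-reward example can be sketched with two hypothetical choices, each followed by a fixed sequence of rewards. The action names and numbers are invented for illustration, and `update_estimate` shows one common way (an assumption here, not something the text prescribes) to keep a running estimate of value from observed returns.

```python
# Hypothetical reward sequences that follow each first action.
REWARD_SEQUENCES = {
    "greedy": [5.0, 0.0, 0.0],   # high reward now, nothing afterwards
    "patient": [1.0, 4.0, 4.0],  # low reward now, more in total
}


def immediate_reward(action):
    """Reward received right after taking `action`."""
    return REWARD_SEQUENCES[action][0]


def value(action):
    """Long-run value: total reward that follows `action`."""
    return sum(REWARD_SEQUENCES[action])


def update_estimate(old_estimate, observed_return, step_size):
    """Move a value estimate a fraction of the way toward an observed return."""
    return old_estimate + step_size * (observed_return - old_estimate)


best_by_reward = max(REWARD_SEQUENCES, key=immediate_reward)
best_by_value = max(REWARD_SEQUENCES, key=value)

if __name__ == "__main__":
    print(best_by_reward, best_by_value)
```

Choosing by immediate reward and choosing by value disagree in this toy case, which is exactly the situation the paragraph above describes.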
4. Model of the Environment
A model of the environment describes how the environment changes in response to the agent's actions. It is something that mimics the behavior of the environment or, more generally, that allows inferences to be made about how the environment will behave.
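A model in this sense can be sketched as a function that predicts the next state and reward for a given state and action, so the agent can reason about outcomes before acting. The corridor world, the `GOAL` cell, and the function names below are hypothetical and serve only as a minimal illustration.

```python
# Hypothetical 1-D corridor with cells 0..4; reaching cell 4 yields a reward.
GOAL = 4
ACTIONS = ["left", "right"]


def model(state, action):
    """Predict (next_state, reward) without touching the real environment."""
    step = 1 if action == "right" else -1
    # Clamp to the corridor so the agent cannot walk past either end.
    next_state = min(max(state + step, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def plan_one_step(state):
    """Use the model to pick the action with the highest predicted reward."""
    return max(ACTIONS, key=lambda a: model(state, a)[1])


if __name__ == "__main__":
    print(model(3, "right"))
    print(plan_one_step(3))
```

Here the model is queried instead of the real environment, which is what lets the agent infer how the environment will behave before committing to an action.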