Reinforcement Learning in Practice (1): the TensorLayer Pong tutorial
Running the Pong example
In the second part of this tutorial, we run a deep reinforcement learning example that is introduced in Karpathy's blog post Deep Reinforcement Learning: Pong from Pixels.
python tutorial_atari_pong.py
Before running the tutorial code, you need to install the OpenAI Gym environment, which provides a large number of game environments commonly used in reinforcement learning. If everything runs correctly, you will get output like the following:
[2016-07-12 09:31:59,760] Making new env: Pong-v0
[TL] InputLayer input_layer (?, 6400)
[TL] DenseLayer relu1: 200, relu
[TL] DenseLayer output_layer: 3, identity
param 0: (6400, 200) (mean: -0.000009 median: -0.000018 std: 0.017393)
param 1: (200,) (mean: 0.000000 median: 0.000000 std: 0.000000)
param 2: (200, 3) (mean: 0.002239 median: 0.003122 std: 0.096611)
param 3: (3,) (mean: 0.000000 median: 0.000000 std: 0.000000)
num of params: 1280803
layer 0: Tensor("Relu:0", shape=(?, 200), dtype=float32)
layer 1: Tensor("add_1:0", shape=(?, 3), dtype=float32)
episode 0: game 0 took 0.17381s, reward: -1.000000
episode 0: game 1 took 0.12629s, reward: 1.000000
episode 0: game 2 took 0.17082s, reward: -1.000000
episode 0: game 3 took 0.08944s, reward: -1.000000
episode 0: game 4 took 0.09446s, reward: -1.000000
episode 0: game 5 took 0.09440s, reward: -1.000000
episode 0: game 6 took 0.32798s, reward: -1.000000
episode 0: game 7 took 0.74437s, reward: -1.000000
episode 0: game 8 took 0.43013s, reward: -1.000000
episode 0: game 9 took 0.42496s, reward: -1.000000
episode 0: game 10 took 0.37128s, reward: -1.000000
episode 0: game 11 took 0.08979s, reward: -1.000000
episode 0: game 12 took 0.09138s, reward: -1.000000
episode 0: game 13 took 0.09142s, reward: -1.000000
episode 0: game 14 took 0.09639s, reward: -1.000000
episode 0: game 15 took 0.09852s, reward: -1.000000
episode 0: game 16 took 0.09984s, reward: -1.000000
episode 0: game 17 took 0.09575s, reward: -1.000000
episode 0: game 18 took 0.09416s, reward: -1.000000
episode 0: game 19 took 0.08674s, reward: -1.000000
episode 0: game 20 took 0.09628s, reward: -1.000000
running mean: -20.000000
episode 1: game 0 took 0.09910s, reward: -1.000000
episode 1: game 1 took 0.17056s, reward: -1.000000
episode 1: game 2 took 0.09306s, reward: -1.000000
episode 1: game 3 took 0.09556s, reward: -1.000000
episode 1: game 4 took 0.12520s, reward: 1.000000
episode 1: game 5 took 0.17348s, reward: -1.000000
episode 1: game 6 took 0.09415s, reward: -1.000000
This example lets a neural network learn to play Pong from the raw game frames, the way a human player would. The network plays game after game against the built-in hard-coded computer opponent and eventually learns to beat it. After training for 15,000 episodes, the network can win 20% of the games; after 20,000 episodes it wins 35% of them. The computer learns faster and faster because it has more and more winning data to train on. After 30,000 episodes, the network no longer loses.
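For reference, the layer shapes printed in the log above (a 6400-dimensional input, i.e. an 80x80 preprocessed frame difference, a 200-unit ReLU hidden layer, and a 3-unit output) can be sketched roughly as follows. This is a minimal illustration against the TensorLayer 1.x / TensorFlow 1.x API rather than the full tutorial_atari_pong.py script; the preprocessing follows Karpathy's blog post, and the variable names are illustrative.

import numpy as np
import tensorflow as tf
import tensorlayer as tl

def prepro(frame):
    # Preprocess a 210x160x3 Pong frame into a 6400 (80x80) float vector,
    # following Karpathy's "Pong from Pixels" post.
    frame = frame[35:195]        # crop the playing field
    frame = frame[::2, ::2, 0]   # downsample by a factor of 2, keep one colour channel
    frame[frame == 144] = 0      # erase background (type 1)
    frame[frame == 109] = 0      # erase background (type 2)
    frame[frame != 0] = 1        # paddles and ball become 1
    return frame.astype(np.float32).ravel()

# Policy network matching the shapes in the log:
# InputLayer (?, 6400) -> DenseLayer 200, relu -> DenseLayer 3, identity
states_batch_pl = tf.placeholder(tf.float32, shape=[None, 6400])
network = tl.layers.InputLayer(states_batch_pl, name='input_layer')
network = tl.layers.DenseLayer(network, n_units=200, act=tf.nn.relu, name='relu1')
network = tl.layers.DenseLayer(network, n_units=3, act=tf.identity, name='output_layer')
probs = tf.nn.softmax(network.outputs)   # probabilities over the 3 candidate actions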
render = False    # set to True to display the game environment
resume = False    # set to True to load an existing model and continue training from it
If you want to display the game while it is being played, set render to True. When you run the code again, you can set resume to True; the script will then load the existing model and continue training on top of it.
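Under the hood, resume simply means the network parameters are periodically written to disk and loaded back at the start of the next run. Below is a minimal sketch of such save/load logic using TensorLayer 1.x helpers; the file name model_pong.npz is an illustrative assumption, and `network` refers to the policy network from the sketch above, so this is not necessarily the exact code of the tutorial.

import tensorflow as tf
import tensorlayer as tl

model_file_name = 'model_pong.npz'   # illustrative file name

sess = tf.InteractiveSession()
tl.layers.initialize_global_variables(sess)

if resume:
    # Load previously saved parameters into `network` and continue training from them.
    tl.files.load_and_assign_npz(sess=sess, name=model_file_name, network=network)

# ... training loop ...

# Periodically save the parameters so that a later run with resume = True can pick them up.
tl.files.save_npz(network.all_params, name=model_file_name, sess=sess)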
Next, let's go through installing and running the demo tutorial.
pip install gym
OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms.
Reinforcement learning is about making good decisions, whereas supervised and unsupervised learning are mainly about making predictions.
Reinforcement learning involves two basic concepts: the environment (the outside world) and the agent (the algorithm you are writing). The agent sends actions to the environment, and the environment replies with observations and rewards (i.e. scores).
OpenAI Gym consists of two parts:
The open-source library: a collection of test problems, called environments, that you can use to develop your own reinforcement learning algorithms. The environments share a common interface, which lets you write general-purpose algorithms.
The Gym service: a website and API that let users compare the performance of their trained algorithms.
Let's run a simple example first: move the cart so that the pole on it does not fall over (CartPole).
import gym
from gym.wrappers import Monitor

env = gym.make('CartPole-v0')
env = Monitor(env, directory='D:/其他技术文献/强化_深度学习/gym/cartpole-experiment-1',
              video_callable=False, write_upon_reset=True)

for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()   # pick a random action
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t + 1))
            break
Now on to the TensorLayer Pong tutorial itself:
python tutorial_atari_pong.py
This first failed with the error: No module named 'atari_py'. The missing Atari dependency can typically be installed with pip install atari-py (or pip install gym[atari]).
Run python tutorial_atari_pong.py again.
Success!
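As a quick sanity check that the Atari environments are now importable (a small verification snippet, not part of the original tutorial):

import gym

# This line raises the "No module named 'atari_py'" error if the Atari dependency is still missing.
env = gym.make('Pong-v0')
print(env.action_space)        # expected: Discrete(6)
print(env.observation_space)   # expected: Box(210, 160, 3), raw RGB frames
env.close()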