Adversarial Reinforcement Learning
William Uther and Manuela Veloso
January 2003
CMU-CS-03-107
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
This manuscript was originally submitted for publication in April 1997. Corrections were never completed, and so the paper was not published. However, a copy was placed on the web and a number of people referenced the work from there. It is now being published as a technical report for ease of reference. With the exception of the addition of this title page, the work is unmodified from the 1997 original.
Keywords: Reinforcement Learning, Markov Games, Adversarial Reinforcement Learning
Abstract
Reinforcement Learning has been used for a number of years in single agent environments. This article reports on our investigation of Reinforcement Learning techniques in a multi-agent and adversarial environment with continuous observable state information. We introduce a new framework, two-player hexagonal grid soccer, in which to evaluate algorithms. We then compare the performance of several single-agent Reinforcement Learning techniques in that environment. These are further compared to a previously developed adversarial Reinforcement Learning algorithm designed for Markov games. Building upon these efforts, we introduce new algorithms to handle the multi-agent, the adversarial, and the continuous-valued aspects of the domain. We introduce a technique for modelling the opponent in an adversarial game. We introduce an extension to Prioritized Sweeping that allows generalization of learnt knowledge over neighboring states in the domain; and we introduce an extension to the U Tree generalizing algorithm that allows the handling of continuous state spaces. Extensive empirical evaluation is conducted in the grid soccer domain.
Adversarial Reinforcement Learning
William Uther and Manuela Veloso
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213
{uther,veloso}@cs.cmu.edu
April 24, 1997
Abstract
Reinforcement Learning has been used for a number of years in single agent environments. This article reports on our investigation of Reinforcement Learning techniques in a multi-agent and adversarial environment with continuous observable state information. We introduce a new framework, two-player hexagonal grid soccer, in which to evaluate algorithms. We then compare the performance of several single-agent Reinforcement Learning techniques in that environment. These are further compared to a previously developed adversarial Reinforcement Learning algorithm designed for Markov games. Building upon these efforts, we introduce new algorithms to handle the multi-agent, the adversarial, and the continuous-valued aspects of the domain. We introduce a technique for modelling the opponent in an adversarial game. We introduce an extension to Prioritized Sweeping that allows generalization of learnt knowledge over neighboring states in the domain; and we introduce an extension to the U Tree generalizing algorithm that allows the handling of continuous state spaces. Extensive empirical evaluation is conducted in the grid soccer domain.
1 Introduction
Multi-agent adversarial environments have traditionally been addressed as game playing situations. Indeed, one of the first areas to be studied in Artificial Intelligence was game playing. For example, the pioneering checkers playing algorithm by [Samuel, 1959] used both search and machine learning strategies. Interestingly, his approach is similar to modern Reinforcement Learning techniques [Kaelbling et al., 1996]. An evaluation function that guides the selection of moves is represented as a parameterized weighted sum of game features. Parameters are incrementally refined as a function of the game playing performance. This is a similar method to classical Reinforcement Learning, which also provides for incremental update of an evaluation function, although in this case it is represented as a table of values.
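This weighted-sum evaluation can be written compactly; the notation below is ours, and the update is only a sketch of the general incremental scheme rather than Samuel's exact procedure. With features f_i(s) of a game position s, weights w_i, a step size \alpha, and a prediction error \delta (the gap between the current evaluation and a backed-up target), the evaluation and its refinement are

\[ V(s) = \sum_i w_i f_i(s), \qquad w_i \leftarrow w_i + \alpha\,\delta\,f_i(s). \]

The tabular representation of classical Reinforcement Learning is the special case in which each state has its own indicator feature, so each update touches exactly one table entry.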
Since Samuel's work, however, Reinforcement Learning techniques were not used again in an adversarial setting until quite recently. [Tesauro, 1995, Thrun, 1995] have both used neural nets in a Reinforcement Learning paradigm. [Tesauro, 1995]'s work in the game of backgammon was successful, but required hand tuned features being fed to the algorithm for high quality play. [Thrun, 1995] was moderately successful in using similar techniques in chess, but the techniques were not as successful as they had been in the backgammon domain. This work has been repeated in other domains, but again, without the same success as in the backgammon domain (in [Kaelbling et al., 1996]).
[Littman, 1994] took standard Q Learning, [Watkins and Dayan, 1992], and modified it to work with Markov games. He replaced the simple update used in standard Q Learning with a mixed strategy (probabilistic) update. He then evaluated this by playing against both standard Q Learning and random players in a simple game. The game used in [Littman, 1994] is a small two player grid soccer game designed to be able to be solved quickly by traditional Q Learning techniques. He trained 4 different players for his game. Two players used his algorithm, two used normal Q Learning. One of each was trained against a random opponent, the other against an opponent of the same type. Littman then froze those four players and trained ‘challengers’ against them. His results showed that his algorithm, which learned a probabilistic strategy, performed better under these conditions than Q Learning, which learned a deterministic strategy, or his hand coded, but again deterministic, strategy.
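Concretely, Littman's mixed-strategy backup replaces the usual maximization over the agent's own actions with the value of a one-shot matrix game. Writing Q(s, a, o) for the value of taking action a while the opponent takes action o in state s, with learning rate \alpha and discount factor \gamma (notation adapted here for brevity), the update is

\[ Q(s,a,o) \leftarrow (1-\alpha)\,Q(s,a,o) + \alpha\bigl(r + \gamma V(s')\bigr), \qquad V(s) = \max_{\pi \in PD(A)} \min_{o \in O} \sum_{a \in A} \pi_a\, Q(s,a,o), \]

where PD(A) is the set of probability distributions over the agent's actions; the maximizing \pi is the mixed strategy the player actually follows.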
We use a similar environment to that used by [Littman, 1994] to investigate Markov games. Our environment is larger, both in number of states and number of actions per state, to more effectively test the generalization capabilities of our algorithms. We conduct tests where both players are learning as they play. This allows learning to take the place of a mixed, or probabilistic, strategy. We look at a number of standard Reinforcement Learning algorithms and compare them in a simple game. None of the algorithms we test perform any internal search or lookahead when deciding actions; they all use just the current state and their learnt evaluation for that state. While search would improve performance, we considered it orthogonal, and a future step, to learning the evaluation function.
In the Reinforcement Learning paradigm an agent is placed in a situation without knowledge of any goals or other information about the environment. As the agent acts in the environment it is given feedback: a reinforcement value or reward that defines the utility of being in the current state. Over time the agent is supposed to customize its actions to the environment so as to maximize the sum of this reward. By only giving the agent reward when a goal is reached, the agent learns to achieve its goals.
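In symbols (the notation here is ours, not something defined elsewhere in this paper), the agent seeks to maximize the expected, typically discounted, sum of rewards, and the tabular Q Learning of [Watkins and Dayan, 1992] cited above does so with the one-step update

\[ \max \; E\Bigl[\textstyle\sum_{t=0}^{\infty} \gamma^{t} r_{t}\Bigr], \qquad Q(s,a) \leftarrow Q(s,a) + \alpha\bigl(r + \gamma \max_{a'} Q(s',a') - Q(s,a)\bigr), \]

where \gamma \in [0,1) is a discount factor and \alpha a learning rate; the discounting is the standard formulation and is assumed here, since the text above speaks only of the sum of reward.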
In an adversarial setting there are multiple (at least two) agents in the world. In particular, in a game with two players, when an agent wins a game it is given a positive reinforcement and its opponent is given negative reinforcement. Maximizing reward corresponds directly to winning games. Over time the agent is learning to act so that it wins the game.
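If, purely for illustration, a win is rewarded with +1 and a loss with -1 (the magnitudes are an assumption, not something fixed by this section), then the two players' rewards satisfy

\[ r_1 = -r_2, \qquad V_1(s) = -V_2(s), \]

i.e. the game is zero-sum and one player's value function is the negation of the other's, which is exactly the structure the minimax backup above exploits.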
In this paper we investigate the performance of some previously published algorithms in an adversarial environment: Q Learning, Minimax Q Learning, and Prioritized Sweeping. We also introduce a new algorithm, Opponent Modelling Q Learning, to try to improve upon these algorithms. All of these techniques rely on a table of values and actions and do not generalize between similar or equivalent states. The learned tables are “state-specific.” We introduce Fitted Prioritized Sweeping and a modification of the U Tree algorithm [McCallum, 1995], Continuous U Tree, as examples of algorithms that generalize over multiple states. Finally, we look at what can be learned by looking at the world from your opponent's point of view.
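As a concrete illustration of what such a state-specific table looks like once an opponent model is added, the following sketch keeps a tabular Q(s, a, o) alongside empirical counts of the opponent's actions in each state and acts greedily against the resulting empirical mixed strategy. It is a minimal sketch of the general opponent-modelling idea under our own assumptions; the class name, the epsilon-greedy exploration, and the parameters alpha, gamma and epsilon are illustrative choices, and the precise algorithm is specified later in the paper.

import random
from collections import defaultdict

class OpponentModelQLearner:
    """Illustrative sketch only: a state-specific Q(s, a, o) table plus
    empirical counts of the opponent's actions in each state."""

    def __init__(self, actions, opp_actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions            # our action set
        self.opp_actions = opp_actions    # opponent's action set
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)                           # (s, a, o) -> value
        self.counts = defaultdict(lambda: defaultdict(int))   # s -> {o: count}

    def _expected_q(self, state, action):
        """Expected value of an action against the empirical opponent strategy."""
        total = sum(self.counts[state].values())
        if total == 0:
            # No observations yet in this state: assume a uniform opponent.
            return sum(self.q[(state, action, o)] for o in self.opp_actions) / len(self.opp_actions)
        return sum(self.counts[state][o] / total * self.q[(state, action, o)]
                   for o in self.opp_actions)

    def value(self, state):
        """State value: best expected action value under the opponent model."""
        return max(self._expected_q(state, a) for a in self.actions)

    def choose_action(self, state):
        """Epsilon-greedy choice against the modelled opponent."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self._expected_q(state, a))

    def update(self, state, action, opp_action, reward, next_state, done=False):
        """Record the observed opponent action and do a one-step backup."""
        self.counts[state][opp_action] += 1
        target = reward if done else reward + self.gamma * self.value(next_state)
        key = (state, action, opp_action)
        self.q[key] += self.alpha * (target - self.q[key])

Because every (state, action, opponent-action) triple has its own entry, nothing learned in one state transfers to a neighbouring one; that lack of generalization is precisely what Fitted Prioritized Sweeping and Continuous U Tree, described below, are designed to address.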