Using Self-Organizing Distinctive State Abstraction to Navigate a Maze World


Connie Li
Phil Katz
May 14, 2006
Abstract
This paper presents a developmental robotics experiment implementing self-organizing distinctive state abstraction and growing neural gas. The robot brain takes sensory input from eight sonars, a light sensor, and a stall sensor, and has motor outputs of translation and rotation. The robot is placed in a maze world with a goal signified by a light source. We test whether the robot can learn to traverse the maze from increasingly distant starting points, using reinforcement learning. Although the robot does not learn to traverse the maze, it seems likely that an adaptation of our algorithm could complete this task.
1 Introduction
One of the primary goals in the field of developmental robotics is to create a robot brain that stores its own representations of its sensorimotor states; that is, to develop state abstraction. It is easy to provide a robotic brain with pre-defined distinct states; however, this inserts anthropomorphic bias, as the manner in which a human experimenter divides sensorimotor space into states is most likely not the manner in which the robot would find it most useful to be divided. The challenge, of course, is to tell a robot brain how to generate states without specifying which states to generate. Robots are becoming proficient at dividing discrete sensorimotor spaces into discrete states; dividing a continuous sensorimotor space into discrete states proves to be even more of a challenge for robot brains. One way to develop discrete state abstraction of a continuous world is to use self-organizing distinctive state abstraction (Provost et al. 2006).
The idea of self-organizing distinctive state abstraction (SODA) is that each navigational state is composed of high-level action abstractions that allow it to approximate one sensory state. The sensory states are generated through growing neural gas (Fritzke 1995). Growing neural gas (GNG) was developed so that important topological features of a given environment could be distinguished. This algorithm is ideal for developmental robotics because it is free from anthropomorphic bias and requires few parameters in order to run. Other dimensionality reduction techniques, such as competitive Hebbian learning, require a predefined network size and other parameters. The growing neural gas algorithm allows the network to set these parameters dynamically to appropriate values. The implementation of growing neural gas that we used in this experiment was modified from code written by Dan Amato.
The SODA algorithm uses the GNG units to help determine its high-level actions. A high-level SODA action is composed of two steps, trajectory following and hill climbing. For trajectory following, the algorithm finds the closest GNG unit to the current sensory input, and uses Q-learning to determine the best action. After trajectory following has led the robot to a new sensory GNG state, it executes hill climbing to fine-tune the movement executed by the trajectory following. The Q-learning algorithm uses reinforcement learning to back-propagate reward from successful trials. Because our world is continuous, we used the GNG units and the low-level actions to form the Q-table.
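To make the structure of this Q-table concrete, here is a minimal sketch: a table keyed by GNG unit index with one Q-score per low-level action, updated with a one-step temporal-difference rule. The names (q_values, choose_action, td_update) and the learning-rate and discount constants are our own illustrative choices, not the actual Pyrobot implementation; the 70/30 action-selection split is described in Section 3.

```python
import random

ALPHA = 0.2     # learning rate (assumed value)
GAMMA = 0.9     # discount factor (assumed value)
EPSILON = 0.3   # fraction of random exploratory actions (see Section 3)
NUM_ACTIONS = 16

q_table = {}    # maps a GNG unit index to a list of Q-scores, one per low-level action

def q_values(unit):
    """Return (lazily creating) the Q-scores for a GNG unit."""
    return q_table.setdefault(unit, [0.0] * NUM_ACTIONS)

def choose_action(unit):
    """Pick the highest-scoring action 70 percent of the time, a random one otherwise."""
    if random.random() < EPSILON:
        return random.randrange(NUM_ACTIONS)
    scores = q_values(unit)
    return scores.index(max(scores))

def td_update(unit, action, reward, next_unit):
    """One-step temporal-difference update of a single Q-score."""
    best_next = max(q_values(next_unit))
    old = q_values(unit)[action]
    q_values(unit)[action] = old + ALPHA * (reward + GAMMA * best_next - old)
```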
2 Our Environment
Our experiment placed a single simulated robot, named Katie, in a simulated world. The simulator that we used, implementing a simulated Pioneer robot, was the award-winning Pyrobot software. Katie was equipped with eight sonar sensors, arranged across her front half, a stall sensor, and a light sensor. The light sensor value was actually taken as the max of two light sensors, one on her front left and one on her front right. If the light value was greater than 0.2, it was discretized to 10, so that there was a clear contrast for Katie between seeing the light and not seeing the light. The ten values were combined to create her sensory state vector at each timestep. Pioneer robots (and therefore Katie) are equipped with two independently-controlled wheels that allow free movement through a continuous world. For the SODA algorithm, it is necessary to generate a list of low-level actions for the robot to use in navigation. We gave Katie a list of sixteen actions with the intention of allowing her complete freedom of movement; nine of the actions are generally forward-moving, four are turns, and three move Katie backwards.
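As a rough sketch of how this ten-element sensory vector might be assembled (the parameter names sonar_values, light_left, light_right, and stalled are placeholders for the raw readings, not Pyrobot's actual API):

```python
LIGHT_THRESHOLD = 0.2
LIGHT_SEEN = 10.0   # discretized value when the light is visible

def sensory_state(sonar_values, light_left, light_right, stalled):
    """Build the 10-element sensory vector: eight sonars, one light value, one stall flag."""
    light = max(light_left, light_right)
    if light > LIGHT_THRESHOLD:
        light = LIGHT_SEEN   # sharp contrast between seeing and not seeing the light
    return list(sonar_values) + [light, float(stalled)]
```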
Our world was designed as a relatively simple maze for Katie to navigate. There is a light source in the upper-left corner that serves as the goal. At the start of the experiment, Katie is placed at the point labelled '1' on the map, very close to the goal. Every six trials, she is moved to a starting point farther and farther away from the goal. We implemented this incremental increase in difficulty so that Katie would be able to learn sections of the maze at a time, and then build on that knowledge to learn larger and larger sections.
3 Our Algorithm
Figure 1: Katie's World. The numbers signify the sequence of starting locations that Katie uses over the course of the experiment.

The GNG variant we used works by generating a new GNG unit, comprised of a representative vector and a list of neighboring units, every set number of timesteps. At every timestep, the GNG is given the current sensory state of the robot, and finds the closest (in Euclidean distance) unit in the network. It then slightly adjusts that unit's representative vector, sets the age of that unit to zero, and ages each neighbor of the current unit. After units reach a certain age, they are removed from the network.
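A simplified sketch of this update step is given below; unit insertion and neighbor creation are omitted, and the class and constant names are illustrative rather than taken from Dan Amato's code.

```python
import math

LEARNING_RATE = 0.05   # assumed rate at which the winning unit moves toward the input
MAX_AGE = 50           # assumed age after which a unit is removed

class GNGUnit:
    def __init__(self, vector):
        self.vector = list(vector)   # representative sensory vector
        self.neighbors = []          # neighboring GNGUnit objects
        self.age = 0

def distance(a, b):
    """Euclidean distance between two sensory vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def gng_step(units, sensory_input):
    """Adapt the closest unit, reset its age, age its neighbors, prune old units."""
    winner = min(units, key=lambda u: distance(u.vector, sensory_input))
    winner.vector = [v + LEARNING_RATE * (s - v)
                     for v, s in zip(winner.vector, sensory_input)]
    winner.age = 0
    for neighbor in winner.neighbors:
        neighbor.age += 1
    # Remove units that have grown too old (neighbor-list cleanup omitted for brevity).
    units[:] = [u for u in units if u.age <= MAX_AGE]
    return winner
```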
Our implementation of the SODA algorithm used the low-level actions described previously. The SODA algorithm goes through a simple loop (a rough code sketch of the whole loop follows the list):
1. Find the closest GNG unit (as measured by Euclidean distance to the representative vector).
2. Using the Q-table, choose a low-level movement and continue to execute that movement until the closest GNG unit is no longer the same.
   • 70 percent of the time, choose the action with the highest Q-score. The other 30 percent of the time, choose a random action.
   • The Q-scores are back-propagated through the Q-table when the goal is reached, using a temporal differencing algorithm (implemented in Pyrobot by Robert Casey).

Figure 2: Set A of 20 GNG units. These were generated using the full (non-log) sensor values. The black semicircle represents the light sensor value.
   • The reward for the Q-score is based on what we called a "cookie score". Katie dropped "cookies" as she travelled through the world. The cookie score was calculated by summing the amount of time that Katie spent in close proximity to a cookie. Cookies had a small decay factor so that they disappeared after a certain amount of time. The reward at the end of a trial was based on the inverse of this cookie score and on the maximum number of timesteps, so that the less time she spent visiting places she had already been, the higher her reward.
   • As mentioned, the Q-table used GNG units as states. Each GNG unit that contained a light value of approximately 10 was classified as a goal state.
3. Using the same GNG unit that was used for trajectory following, commence hill climbing:
   • Execute and then reverse each low-level action to see if it would bring the current sensory vector closer to the representative vector of the GNG unit you are hill climbing toward.
   • Choose the action which will minimize the distance to the representative vector. If no action will decrease the distance, choose a new GNG unit and commence trajectory following.
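The loop above can be sketched roughly as follows, reusing distance, choose_action, and NUM_ACTIONS from the earlier sketches. The callables read_state, execute, and reverse_of are hypothetical placeholders supplied by the caller, standing in for reading the ten-element sensory vector, issuing a low-level motor command, and issuing the command that undoes it.

```python
def closest_unit(units, state):
    """Return the GNG unit whose representative vector is nearest the current state."""
    return min(units, key=lambda u: distance(u.vector, state))

def soda_step(units, read_state, execute, reverse_of):
    """One high-level SODA action: trajectory following, then hill climbing."""
    # Steps 1-2: trajectory following. Pick one low-level action via the
    # Q-table and repeat it until the closest GNG unit changes.
    unit = closest_unit(units, read_state())
    action = choose_action(units.index(unit))     # 70% greedy / 30% random
    while closest_unit(units, read_state()) is unit:
        execute(action)

    # Step 3: hill climbing toward the representative vector of the new unit.
    target = closest_unit(units, read_state())
    while True:
        best_action, best_dist = None, distance(target.vector, read_state())
        for a in range(NUM_ACTIONS):
            execute(a)                         # try the action...
            d = distance(target.vector, read_state())
            execute(reverse_of(a))             # ...then undo it
            if d < best_dist:
                best_action, best_dist = a, d
        if best_action is None:
            break        # no action reduces the distance: resume trajectory following
        execute(best_action)
    return target
```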
4 Our Experiment
Our experiment was divided into two parts; in the first, Katie wandered randomly about the world, growing neural gas. In the second part, she ran a series of trials, during which she was rewarded for reaching the goal.
Figure 3: A subset of set B of GNG units. The full set contains 28 GNG units. These were generated by taking the log of the sensor values. The black semicircle represents the light sensor value. The images are magnified 200 percent.
Generating an appropriately sized set of GNG units required many trials and minor adjustments to the GNG algorithm. We did not want a set of GNG units with more than thirty units, because it would slow down the algorithm (particularly the hill-climbing step). We also did not want a set of GNG units with fewer than ten units, because such a set could not encompass the variety of sensory states provided by our world. We generated two sets of GNG units that fit these parameters: one using the direct sensor data, and one taking the log of the sensor values to emphasize the differences between small values.
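The log preprocessing used for set B can be sketched in a couple of lines; the small offset added to avoid taking the log of zero is our own assumption.

```python
import math

def log_scaled(sensory_vector, offset=1e-6):
    """Compress large readings so that differences between small values stand out."""
    return [math.log(v + offset) for v in sensory_vector]
```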
For the second half of our experiment, Katie was placed in the world and given a certain number of timesteps (between 1000 and 3500, depending on the distance from the starting location to the goal) to attempt to find the goal. If she reached the goal (by reaching a GNG state with a light value of 10), the reward was back-propagated along her path. If she reached the timestep limit and had not seen the light, she was reset to the starting position with no reward. She was placed at each of the six start positions six times before moving on to the next one.
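The per-trial control flow can be summarized as below. All four callables are hypothetical stand-ins for the experiment code, not functions from the paper or from Pyrobot.

```python
def run_trial(max_timesteps, run_one_timestep, goal_reached, back_propagate_reward, reset):
    """Run a single trial: succeed by seeing the light, or time out with no reward.

    run_one_timestep() advances the simulation one step, goal_reached() tests
    for a GNG state with a light value of 10, back_propagate_reward() pushes
    reward back along the path, and reset() returns Katie to the start.
    """
    for _ in range(max_timesteps):
        run_one_timestep()
        if goal_reached():
            back_propagate_reward()
            return True          # successful trial
    reset()                      # timestep limit hit without seeing the light
    return False
```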
5 Results
The results that we were hoping to see were that the cookie score and the number of timesteps needed to find the goal would go down during each phase of the experiment. Because we normalized the cookie score by dividing it by the maximum number of timesteps allowed, which increased with the distance between the start and the goal, we also expected to see a general decrease in the cookie score over the whole experiment. We also hoped to see, at worst, a consistent percentage of "failed" trials: trials in which Katie did not find the goal.
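The paper does not give the exact reward formula, but based on the description (inverse of the cookie score, normalized by the maximum number of timesteps allowed), one plausible sketch is:

```python
def trial_reward(cookie_score, max_timesteps):
    """Higher reward the less time Katie lingered near her own cookies (exact formula assumed)."""
    normalized = cookie_score / float(max_timesteps)
    return 1.0 / (1.0 + normalized)   # inverse relationship; the precise form is a guess
```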
When we ran the experiment with set A of GNG units, we did not find a consistent percentage of "failed" trials; in fact, from starting point #4 onwards, Katie never found the goal.
Start Point    Average Steps
1              581
2              1246
3              1846
4              2500
5              3000
6              3500
