* Running Maze2D Demos [#uf2c18ce]

This is a simple demonstration of a maze task.

** Overview [#sde05b79]

The robot is an omniwheel mobile robot that can move in any direction on a 2-dimensional plane ([-1,1]x[-1,1]).

#ref(map-ce2-rlem080907-Nt-ex10.png,center,zoom,300x0)

This task is performed in simulation.
The state of the robot is its global position which is expressed as

>>x = (x1, x2) ,

and its control input is the displacement during a time step dt=0.01, which is expressed as

>>u = (Dx1, Dx2) .

In this environment, there is '''wind''' that pushes the robot in the direction of the arrows shown in the figure above.
There are also '''walls''' through which the robot cannot pass.
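As a rough illustration of the dynamics described above, the following is a minimal sketch (not the simulator's actual code) of one state transition, assuming additive wind and hard clipping at the plane boundary; the wind field `wind(x)` and the constants in it are hypothetical.

```python
import numpy as np

DT = 0.01  # time step from the task definition

def wind(x):
    # Hypothetical wind field: a constant drift, for illustration only.
    return np.array([0.02, 0.0])

def step(x, u):
    """Apply control u = (Dx1, Dx2) plus wind, keeping x inside [-1,1]^2."""
    x_next = x + u + DT * wind(x)
    # Hard clipping stands in for the walls/boundary here; the real
    # simulator's wall handling may differ.
    return np.clip(x_next, -1.0, 1.0)

x = np.array([0.0, 0.0])
x = step(x, np.array([0.05, 0.0]))
```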

The objective of the navigation task is to acquire a path from the start to the goal.
Accordingly, the reward function is designed as follows:
a reward of 1 for reaching the goal, a small cost at each step, and a penalty for going out of the plane.
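The reward design above can be sketched as follows; the goal position, goal tolerance, step cost, and out-of-plane penalty values are all assumptions for illustration, not the benchmark's actual constants.

```python
import numpy as np

GOAL = np.array([0.8, 0.8])   # hypothetical goal position
GOAL_RADIUS = 0.05            # assumed goal tolerance
STEP_COST = -0.001            # small per-step cost (assumed value)
OUT_PENALTY = -1.0            # penalty for leaving [-1,1]^2 (assumed value)

def reward(x):
    """Reward of 1 at the goal, a small step cost, and an out-of-plane penalty."""
    if np.any(np.abs(x) > 1.0):
        return OUT_PENALTY
    if np.linalg.norm(x - GOAL) < GOAL_RADIUS:
        return 1.0
    return STEP_COST
```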

** Build the Demo Program [#p0208717]
Execute:

  $ cd benchmarks/maze2d
  $ make

See [[Documentation/Installation Guide]] for details.

** Running Command [#b27b6a5c]
Execute as follows:
  $ ./maze2d.out -path ../cmn,m -agent AGENT_FILE -outdir OUT_DIR
Here, AGENT_FILE is an agent script in which a reinforcement learning method and other conditions are selected.
Available agent scripts are listed below.
OUT_DIR is a result directory into which the program stores some data.
You need to create OUT_DIR before running; if a non-existent directory is specified, no results are stored.

For example, execute the following:
  $ mkdir -p result/rl1
  $ ./maze2d.out -path ../cmn,m -agent ql_da1 -outdir result/rl1
You will see output like:
  random seed = 1306343562
  episode 0...
  episode 1...
  episode 2...
  episode 3...
  episode 4...
  ...
  episode 997...
  episode 998...
  episode 999...
In OUT_DIR (result/rl1), the following files are stored:
:cmdline| Command line of the execution.
:before.agent, after.agent| Whole agent scripts generated by the program (before the execution and after the execution, respectively).
:ext_sto| External storage directory (maybe not used in this case).
:included| A copy of every included agent file.
:log-eps-ret.dat| Log file of (episode number, return in the episode).
:log-action-res.dat| Log file of each action.

For example, plot the learning curve with gnuplot:
  $ gnuplot
  gnuplot> plot 'result/rl1/log-eps-ret.dat' w l

#ref(rl1-eps-ret.png,center,zoom,300x0)

** Agent Script [#o59362f0]
The following files can be specified as AGENT_FILE.

:ql_da1  | Q(λ)-learning, linear action value function (NGnet).
:fqi_da1 | Fitted Q iteration (updated every 10 episodes), linear action value function (NGnet).
:lspi_da1| LSPI (updated every 5 episodes), linear action value function (NGnet).
:hrl_da1 | Cohen's hierarchical RL.
:ql_gwf1 | Wire-fitting updated by Q(λ)-learning.
:qlfqi_da1  | Q(λ)-learning + Fitted Q iteration (updated every 10 episodes), linear action value function (NGnet).
:qlfqi_gwf1 | Q(λ)-learning + Fitted Q iteration (updated every 10 episodes) for Wire-fitting.
:dyna_da1   | Dyna (using McMahan-and-Gordon's prioritized sweeping), linear action value function (NGnet).
:ql_dcob_q1 | Q(λ)-learning, DCOB (action space), linear action value function (NGnet).
:ql_wfdcob1 | Q(λ)-learning, WF-DCOB.

:chacts | A variant in which the available action set changes with the state (situation).


** Miscellaneous [#heb72746]

To specify the random seed, append an additional agent file as follows:
  $ ./maze2d.out -path ../cmn,m -agent ql_da1,seed0 -outdir result/rl1
Here, seed0 is m/seed0.agent; in this file, the random seed is set to zero.
By specifying the random seed, you can obtain the same result in every run.
