maze2d
* Running Maze2D Demos [#uf2c18ce]
These are simple demonstrations of a maze task.
** Overview [#sde05b79]
The robot is an omniwheel mobile robot. The robot can mo...
#ref(map-ce2-rlem080907-Nt-ex10.png,center,zoom,300x0)
This task is performed in simulation.
The state of the robot is its global position which is ex...
>>x = (x1, x2) ,
and its control input is the state transition in a time s...
>>u = (Dx1, Dx2) .
In this environment, there is some '''wind''' that change...
There are also '''walls''' which the robot cannot pass t...
The objective of the navigation task is to acquire a path...
According to this objective, the reward function is desig...
1 for goal, a small step cost, and a penalty for going ou...
Each episode begins with the start state, and ends if the...
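The task description above can be sketched as a single simulation step. This is only an illustrative sketch, not the demo's actual implementation: the class name, the map bounds, the goal region, the constant wind model, the numeric reward values, and the rule that leaving the map ends the episode are all assumptions made for this example, and walls are left out.

```python
from dataclasses import dataclass


@dataclass
class Maze2DSketch:
    # Every value below is an assumption chosen for illustration only.
    lo: float = -1.0            # lower bound of the square map (assumed)
    hi: float = 1.0             # upper bound of the square map (assumed)
    goal: tuple = (0.8, 0.8)    # goal center (assumed)
    goal_radius: float = 0.15   # radius of the goal region (assumed)
    step_cost: float = -0.01    # small step cost (value assumed)
    out_penalty: float = -0.5   # penalty for leaving the map (value assumed)
    wind: tuple = (0.0, 0.0)    # constant wind displacement (assumed model)

    def step(self, x, u):
        """Apply control u = (Dx1, Dx2) to state x = (x1, x2).

        Returns (next_state, reward, done).
        """
        nx = (x[0] + u[0] + self.wind[0], x[1] + u[1] + self.wind[1])
        # Leaving the map: penalty, and (by assumption) the episode ends.
        if not all(self.lo <= c <= self.hi for c in nx):
            return x, self.out_penalty, True
        # Reaching the goal region: reward 1 and the episode ends.
        dist2 = (nx[0] - self.goal[0]) ** 2 + (nx[1] - self.goal[1]) ** 2
        if dist2 <= self.goal_radius ** 2:
            return nx, 1.0, True
        # Otherwise only the step cost is paid.
        return nx, self.step_cost, False
```

For instance, Maze2DSketch().step((0.0, 0.0), (0.1, 0.0)) moves the robot one step and returns only the step cost, since the new position is inside the map and outside the assumed goal region.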
** Running Command [#i98caea7]
Please read [[Common Usage>../Common Usage]] in advance.
The running command is:
$ ./DEMO_PRG -path PATH_LIST -agent AGENT_FILE -outdir ...
The demo-specific elements are:
:DEMO_DIR| maze2d (benchmarks/maze2d).
:DEMO_PRG| maze2d.out
:PATH_LIST| ../cmn,m
:AGENT_FILE| Available agent scripts are listed below.
:OUT_DIR| Any directory can be used.
For example, execute the following:
$ mkdir -p result/rl1
$ ./maze2d.out -path ../cmn,m -agent ql_da1 -outdir res...
The output looks like the following (debug lines are omitted):
random seed = 1306343562
episode 0...
episode 1...
episode 2...
episode 3...
episode 4...
...
episode 997...
episode 998...
episode 999...
To exit the program, press Ctrl+C in the terminal.
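When several agent scripts are to be compared, the argument list can be assembled programmatically. The helper below is a hypothetical sketch (the function name and its defaults are not part of the demo). It uses only the -path, -agent, and -outdir options shown above and assumes that -outdir takes OUT_DIR; the command line on this page is truncated, so any further arguments are not covered.

```python
def maze2d_command(agent, outdir, program="./maze2d.out", path_list="../cmn,m"):
    """Build the argument list for one run of the maze2d demo.

    Only the options shown on this page are used; the helper itself is
    a hypothetical convenience and not part of the demo.
    """
    return [program, "-path", path_list, "-agent", agent, "-outdir", outdir]


# Possible usage (not executed here): run two agents one after another.
#   import os, subprocess
#   for agent in ("ql_da1", "fqi_da1"):
#       outdir = "result/" + agent
#       os.makedirs(outdir, exist_ok=True)
#       subprocess.run(maze2d_command(agent, outdir), check=True)
```

Keeping one output directory per agent avoids mixing the result files of different runs.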
The result files are stored in OUT_DIR (result/rl1).
For example, plot the learning curve with gnuplot:
$ gnuplot
gnuplot> plot 'result/rl1/log-eps-ret.dat' w l
#ref(rl1-eps-ret.png,center,zoom,300x0)
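Per-episode returns are usually noisy, so a smoothed curve can be easier to read. The following is a minimal sketch under the assumption that log-eps-ret.dat is a whitespace-separated text file with one line per episode. The file format is not documented on this page, so the loader simply mirrors the column gnuplot plots by default (the second column if there are at least two, otherwise the only column). The function names are made up for this example.

```python
def load_episode_values(path):
    """Read one value per data line, skipping blank lines and '#' comments.

    Mirrors gnuplot's default column choice: the second column when a
    line has two or more columns, otherwise the only column.
    """
    values = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            cols = line.split()
            values.append(float(cols[1] if len(cols) > 1 else cols[0]))
    return values


def moving_average(values, window):
    """Trailing moving average; early entries average the samples seen so far."""
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the sample that left the window
        out.append(total / min(i + 1, window))
    return out
```

For example, moving_average(load_episode_values('result/rl1/log-eps-ret.dat'), 50) would give a 50-episode trailing average, which can be written to a file and plotted next to the raw curve.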
** Agent Script [#o59362f0]
The following files can be specified as AGENT_FILE.
:ql_da1 | Q(λ)-learning, linear action value func...
:fqi_da1 | Fitted Q iteration (updated every 10 episod...
:hrl_da1 | Cohen's hierarchical RL.
:ql_gwf1 | Wire-fitting updated by Q(λ)-learning.
:qlfqi_da1 | Q(λ)-learning + Fitted Q iteration (...
:qlfqi_gwf1 | Q(λ)-learning + Fitted Q iteration (...
:dyna_da1 | Dyna (using McMahan-and-Gordon's priori...
:ql_dcob_q1 | Q(λ)-learning, DCOB (action space), ...
:ql_wfdcob1 | Q(λ)-learning, WF-DCOB.
:chacts | In this case, the available action set changes with...
''Testing:''
:lspi_da1| LSPI (updated every 5 episodes), linear acti...
Note: the LSPI module is under testing; it does not conve...
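For readers unfamiliar with Q(λ)-learning with a linear action value function, as named in the agent list above, the update can be sketched as follows. This is a generic textbook-style sketch and not the code behind the agent scripts. It assumes Watkins's variant with accumulating eligibility traces (this page does not say which variant is used), a discrete action set, and a user-supplied feature vector phi. The class name and all hyperparameter values are hypothetical.

```python
class LinearQLambda:
    """Watkins-style Q(lambda) with Q(s, a) = theta[a] . phi(s) (sketch)."""

    def __init__(self, n_features, n_actions, alpha=0.1, gamma=0.9, lam=0.8):
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.theta = [[0.0] * n_features for _ in range(n_actions)]
        self.trace = [[0.0] * n_features for _ in range(n_actions)]

    def q(self, phi, a):
        return sum(w * f for w, f in zip(self.theta[a], phi))

    def greedy(self, phi):
        return max(range(len(self.theta)), key=lambda a: self.q(phi, a))

    def reset_traces(self):
        for row in self.trace:
            for i in range(len(row)):
                row[i] = 0.0

    def update(self, phi, a, r, phi_next, done, next_action_greedy=True):
        """One update for the transition (phi, a, r, phi_next); returns the TD error."""
        target = r
        if not done:
            target += self.gamma * max(
                self.q(phi_next, b) for b in range(len(self.theta)))
        delta = target - self.q(phi, a)
        # Decay all traces, then accumulate the features of the taken action.
        decay = self.gamma * self.lam
        for row in self.trace:
            for i in range(len(row)):
                row[i] *= decay
        for i, f in enumerate(phi):
            self.trace[a][i] += f
        # Move every weight along its trace, scaled by the TD error.
        for b, row in enumerate(self.trace):
            for i, e in enumerate(row):
                self.theta[b][i] += self.alpha * delta * e
        # Watkins's variant cuts traces after an exploratory action.
        if done or not next_action_greedy:
            self.reset_traces()
        return delta
```

The caller is expected to pass next_action_greedy=False whenever the next action was chosen by exploration, so that credit is not propagated across non-greedy steps.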