* Running Maze2D Demos [#uf2c18ce]

These are simple demonstrations of a maze task.

** Overview [#sde05b79]

The robot is an omniwheel mobile robot that can move in any direction on a two-dimensional plane ([-1,1]x[-1,1]).

#ref(map-ce2-rlem080907-Nt-ex10.png,center,zoom,300x0)

This task is performed in simulation. The state of the robot is its global position,

>>x = (x1, x2),

and its control input is the state transition in a time step dt=0.01,

>>u = (Dx1, Dx2).

In this environment there is a '''wind''' that perturbs the motion of the robot in the direction of the arrows shown in the figure above. There are also '''walls''' that the robot cannot pass through.

The objective of the navigation task is to acquire a path from the start to the goal. Accordingly, the reward function is designed as follows: a reward of 1 for reaching the goal, a small cost per step, and a penalty for going outside the plane. Each episode begins at the start state and ends when the robot reaches the goal, goes outside the plane, or when t>12[s].
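The following Python sketch makes the task definition above concrete. It is not the SkyAI implementation: the wind field, the start and goal positions, the goal radius, the step cost, and the out-of-plane penalty are illustrative assumptions (walls are omitted entirely); only dt=0.01, the [-1,1]x[-1,1] plane, the 12[s] limit, and the overall reward structure come from the description above.

 import numpy as np
 
 DT = 0.01     # time step [s] (from the task description)
 T_MAX = 12.0  # episode time limit [s]
 
 # Illustrative values; the actual SkyAI setup defines its own map.
 START = np.array([-0.9, -0.9])
 GOAL = np.array([0.9, 0.9])
 GOAL_RADIUS = 0.1
 STEP_COST = -0.001
 OUT_PENALTY = -1.0
 
 def wind(x):
     """Hypothetical constant wind; the real map defines its own arrows."""
     return np.array([0.05, 0.0]) * DT
 
 def step(x, u):
     """One transition x' = x + u + wind(x), with reward and termination.
     Walls are omitted here; the real environment blocks such moves."""
     x_new = x + u + wind(x)
     if np.linalg.norm(x_new - GOAL) < GOAL_RADIUS:
         return x_new, 1.0, True            # reached the goal
     if np.any(np.abs(x_new) > 1.0):
         return x_new, OUT_PENALTY, True    # left the [-1,1]x[-1,1] plane
     return x_new, STEP_COST, False         # ordinary step
 
 # One episode with a random policy:
 x, t, done = START.copy(), 0.0, False
 while not done and t <= T_MAX:
     u = np.random.uniform(-0.01, 0.01, size=2)  # (Dx1, Dx2)
     x, r, done = step(x, u)
     t += DT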
** Running Command [#i98caea7]

Please read [[Common Usage>../Common Usage]] in advance. The running command is:

 $ ./DEMO_PRG -path PATH_LIST -agent AGENT_FILE -outdir OUT_DIR

The demo-specific elements are:

:DEMO_DIR| maze2d (benchmarks/maze2d).
:DEMO_PRG| maze2d.out
:PATH_LIST| ../cmn,m
:AGENT_FILE| Available agent scripts are listed below.
:OUT_DIR| Any directory may be specified.

For example, execute the following:

 $ mkdir -p result/rl1
 $ ./maze2d.out -path ../cmn,m -agent ql_da1 -outdir result/rl1

The output will look like the following (debug lines omitted):

 random seed = 1306343562
 episode 0...
 episode 1...
 episode 2...
 episode 3...
 episode 4...
 ...
 episode 997...
 episode 998...
 episode 999...

To exit the program, press Ctrl+c in the terminal. The result files are stored in OUT_DIR (result/rl1). For example, use gnuplot to plot the learning curve:

 $ gnuplot
 gnuplot> plot 'result/rl1/log-eps-ret.dat' w l

#ref(rl1-eps-ret.png,center,zoom,300x0)
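If gnuplot is unavailable, the same curve can be drawn with Python. This sketch assumes log-eps-ret.dat is a whitespace-separated file with the episode number in the first column and the return in the second, matching what the gnuplot command above plots; check the actual file layout in your OUT_DIR.

 import numpy as np
 import matplotlib.pyplot as plt
 
 # Assumed layout: column 0 = episode, column 1 = return per episode.
 data = np.loadtxt('result/rl1/log-eps-ret.dat')
 plt.plot(data[:, 0], data[:, 1])
 plt.xlabel('episode')
 plt.ylabel('return')
 plt.title('maze2d learning curve (ql_da1)')
 plt.savefig('rl1-eps-ret.png')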
** Agent Script [#o59362f0]

The following files can be specified as AGENT_FILE (a sketch for running them all in a batch follows the list):

:ql_da1| Q(λ)-learning with a linear action value function (NGnet).
:fqi_da1| Fitted Q iteration (updated every 10 episodes) with a linear action value function (NGnet).
:hrl_da1| Cohen's hierarchical RL.
:ql_gwf1| Wire-fitting updated by Q(λ)-learning.
:qlfqi_da1| Q(λ)-learning + Fitted Q iteration (updated every 10 episodes) with a linear action value function (NGnet).
:qlfqi_gwf1| Q(λ)-learning + Fitted Q iteration (updated every 10 episodes) for Wire-fitting.
:dyna_da1| Dyna (using McMahan and Gordon's prioritized sweeping) with a linear action value function (NGnet).
:ql_dcob_q1| Q(λ)-learning with the DCOB action space and a linear action value function (NGnet).
:ql_wfdcob1| Q(λ)-learning with WF-DCOB.
:chacts| The available action set changes with the state (situation).

''Testing:''

:lspi_da1| LSPI (updated every 5 episodes) with a linear action value function (NGnet). Note: the LSPI module is under testing; it does not converge in this task.
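The following Python sketch runs each agent script above into its own output directory, so the resulting learning curves can be compared. It simply wraps the command shown in the Running Command section; run it from the demo directory (benchmarks/maze2d).

 import subprocess
 from pathlib import Path
 
 AGENTS = ['ql_da1', 'fqi_da1', 'hrl_da1', 'ql_gwf1', 'qlfqi_da1',
           'qlfqi_gwf1', 'dyna_da1', 'ql_dcob_q1', 'ql_wfdcob1', 'chacts']
 
 for agent in AGENTS:
     outdir = Path('result') / agent
     outdir.mkdir(parents=True, exist_ok=True)
     # Same invocation as the example above, one run per agent script.
     subprocess.run(['./maze2d.out', '-path', '../cmn,m',
                     '-agent', agent, '-outdir', str(outdir)], check=True)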