Running Humanoid01 Demos

These are demonstrations of motion learning tasks.

Overview

The robot is a small humanoid with 17 degrees of freedom (DoF).

Experiments are performed in simulation using ODE (Open Dynamics Engine), a dynamics simulator. The simulation advances with a time step of 0.2 ms.
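
To get a feel for what this time step implies, the short sketch below counts simulation steps. The 20 s figure is the episode time limit given in the crawling task description below; the variable names are illustrative. Whether the learning agent acts at every simulation step is not stated here, so this only bounds the number of physics steps.

```shell
#!/bin/sh
# Simulation step count implied by a 0.2 ms time step.
DT_US=200            # time step in microseconds (0.2 ms)
EPISODE_LIMIT_S=20   # episode time limit in seconds (crawling task)

STEPS_PER_SEC=$((1000000 / DT_US))
MAX_STEPS=$((EPISODE_LIMIT_S * STEPS_PER_SEC))

echo "steps per simulated second: $STEPS_PER_SEC"
echo "steps in a 20 s episode:    $MAX_STEPS"
```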

Two tasks are performed with this robot: crawling and turning.

The objective of the crawling task is to move forward along the x-axis as far as possible. The reward function is designed accordingly, with three terms: a reward proportional to the forward velocity, a small penalty for torque usage, and a penalty for falling down. Each episode begins with the robot standing up and stationary, and ends when t > 20 s or when the amount of reward drops below -40 (i.e. becomes too small). By default, a 5-DoF constraint is used in the crawling task.
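
As a reading aid, the reward described above can be written as a sum of three terms. This is only a sketch of the structure: the weights k_v, k_tau and k_f, the torque cost c(.), and the symbols themselves are placeholders introduced here, not values or names taken from the demo.

```latex
r_t = k_v \, \dot{x}_t \;-\; k_\tau \, c(\tau_t) \;-\; k_f \, \mathbf{1}[\text{robot has fallen at } t],
\qquad k_v,\, k_\tau,\, k_f > 0, \quad k_\tau \text{ small}
```

Here \dot{x}_t stands for the forward velocity along the x-axis and \tau_t for the joint torques at time t.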

The objective of the turning task is to turn around the z-axis as fast as possible, so the dominant element of the reward is the rotational velocity. The rest of the setup is the same as in the crawling task, except that the robot lies down in the initial state.

Build the Demo Program

Execute:

 $ cd benchmarks/humanoid01
 $ make

See Documentation/Installation Guide for details.

Running Command

Please read Common Usage in advance. The running command is:

 $ ./DEMO_PRG -path PATH_LIST -agent AGENT_FILE -outdir OUT_DIR

The demo-specific elements are:

DEMO_DIR: humanoid01 (benchmarks/humanoid01).
DEMO_PRG: humanoid01.out
PATH_LIST: ../cmn,m,m/cr for the crawling task; ../cmn,m,m/tn for the turning task.
AGENT_FILE: one of the agent scripts listed below.
OUT_DIR: any directory.

For example, execute the following:

 $ mkdir -p result/rl1
 $ ./humanoid01.out -path ../cmn,m,m/cr -agent ql_dcob1 -outdir result/rl1

A window showing the simulation opens, and the terminal prints output like the following (debug lines are omitted):

 random seed = 1306390058
 simulation is initialized.
 start simulation..
 episode 0...
 simulation is initialized.
 
 Simulation test environment v0.02
   Ctrl-P : pause / unpause (or say `-pause' on command line).
   Ctrl-O : single step when paused.
   Ctrl-T : toggle textures (or say `-notex' on command line).
   Ctrl-S : toggle shadows (or say `-noshadow' on command line).
   Ctrl-V : print current viewpoint coordinates (x,y,z,h,p,r).
   Ctrl-W : write frames to ppm files: frame/frameNNN.ppm
   Ctrl-X : exit.
 
 Change the camera position by clicking + dragging in the window.
   Left button - pan and tilt.
   Right button - forward and sideways.
   Left + Right button (or middle button) - sideways and up.
 
 episode 1...
 simulation is initialized.
 episode 2...
 simulation is initialized.
 episode 3...
 simulation is initialized.
 episode 4...
 simulation is initialized.
 episode 5...
 simulation is initialized.
 episode 6...
 simulation is initialized.

To exit the program, press Ctrl+X in the window.

The result files are stored in OUT_DIR (here, result/rl1). For example, plot the learning curve with gnuplot:

 $ gnuplot
 gnuplot> plot 'result/rl1/log-eps-ret.dat' w l
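
For a quick numeric summary without plotting, the sketch below averages the return over the last N episodes. It assumes that log-eps-ret.dat holds whitespace-separated columns with the episode index in column 1 and the return in column 2, and that comment lines start with '#'; that layout is an assumption, so check the file before relying on it. The function name mean_last_n is illustrative.

```shell
#!/bin/sh
# Mean of column 2 over the last N data rows of a log file.
# ASSUMPTION: column 1 = episode index, column 2 = return,
# whitespace-separated; blank lines and '#' comment lines are skipped.
mean_last_n() {
  # $1: log file, $2: number of trailing episodes to average
  awk 'NF >= 2 && $1 !~ /^#/ { print $2 }' "$1" | tail -n "$2" |
    awk '{ s += $1 } END { if (NR > 0) printf "%.3f\n", s / NR }'
}

LOG=result/rl1/log-eps-ret.dat
if [ -f "$LOG" ]; then
  mean_last_n "$LOG" 50
fi
```

Comparing this value between runs gives a rough indication of which agent script learned better over the same number of episodes.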

Agent Script

The following files can be specified as AGENT_FILE.

Crawling Task

ql_grid1: Q(lambda)-learning, Grid action space, linear action value function (NGnet).
ql_dcob1: Q(lambda)-learning, DCOB (action space), linear action value function (NGnet).
ql_dcob_q1: Q(lambda)-learning, joint angle version of DCOB, linear action value function (NGnet).
ql_gwf1: Wire-fitting (grid init) updated by Q(lambda)-learning.
ql_wfdcob1: WF-DCOB, Q(lambda)-learning.

fqi_grid1: Fitted Q iteration (updated every 10 episodes), Grid action space, linear action value function (NGnet).
qlfqi_grid1: Q(lambda)-learning + Fitted Q iteration (updated every 10 episodes), Grid action space, linear action value function (NGnet).
qlfqi_dcob1: Q(lambda)-learning + Fitted Q iteration (updated every 10 episodes), DCOB (action space), linear action value function (NGnet).
qlfqi_gwf1: Q(lambda)-learning + Fitted Q iteration (updated every 10 episodes) for Wire-fitting (grid init).
qlfqi_wfdcob1: Q(lambda)-learning + Fitted Q iteration (updated every 10 episodes) for WF-DCOB.

Testing:

lspi_grid1: LSPI (updated every 5 episodes), Grid action space, linear action value function (NGnet).

Turning Task

ql_dcob2: Q(lambda)-learning, DCOB, linear action value function (NGnet).
ql_dcob_q2: Q(lambda)-learning, joint angle version of DCOB, linear action value function (NGnet).

Miscellaneous

Start with Pause

Add the following option on the command line:

 -pause 1

Capture the Frames

Press Ctrl+W in the window. Frames are written as PPM files named frame/frameNNN.ppm.

Execute in Console Mode

Add the following option on the command line:

 -console true
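
Console mode lends itself to running several agent scripts one after another. The sketch below only prints the commands so they can be reviewed first. The helper name demo_cmd and the result/<agent> directory layout are hypothetical choices; the program name, options, paths, and agent script names are the ones given on this page.

```shell
#!/bin/sh
# Print (do not run) one console-mode command per agent script.
# ASSUMPTION: result/<agent> as the output directory is an illustrative layout.
demo_cmd() {
  # $1: task path suffix (cr or tn), $2: agent script, $3: output directory
  printf './humanoid01.out -path ../cmn,m,m/%s -agent %s -outdir %s -console true\n' \
    "$1" "$2" "$3"
}

for agent in ql_grid1 ql_dcob1 ql_dcob_q1; do
  printf 'mkdir -p result/%s\n' "$agent"
  demo_cmd cr "$agent" "result/$agent"
done
```

Run the script from benchmarks/humanoid01 and pipe its output to sh once the commands look right. How a console-mode run terminates is not described here, so check that before chaining runs unattended.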

Change the DoF (Degree of Freedom)

Append an agent script that defines a DoF configuration to the -agent argument, separated by a comma. For example:

 $ ./humanoid01.out -path ../cmn,m,m/cr -agent ql_dcob1,dof6 -outdir result/rl2

The provided DoF configurations are:

(default): 5-DoF. Some joints are coupled, which gives a bilateral symmetry.
dof4asym: 4-DoF. Some joints are coupled, which gives a single DoF for each leg.
dof6: 6-DoF. Some joints are coupled, which gives a bilateral symmetry.
dof7: 7-DoF. Some joints are coupled, which gives a bilateral symmetry.
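
To compare the configurations above with the same learning method, the -agent value can be built in a loop. This is a sketch that prints the commands rather than running them. The helper name agent_list, the label "default" for the built-in 5-DoF setup, and the result/<dof> output directories are hypothetical.

```shell
#!/bin/sh
# Build the -agent value from a base agent script and an optional DoF script.
# ASSUMPTION: "default" is only a label used here for "no extra DoF script".
agent_list() {
  # $1: base agent script, $2: DoF script name, or "default" for none
  if [ "$2" = "default" ]; then
    printf '%s\n' "$1"
  else
    printf '%s,%s\n' "$1" "$2"
  fi
}

for dof in default dof4asym dof6 dof7; do
  printf './humanoid01.out -path ../cmn,m,m/cr -agent %s -outdir result/%s\n' \
    "$(agent_list ql_dcob1 "$dof")" "$dof"
done
```

Create each output directory with mkdir -p before running the printed commands, as in the earlier example.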