Running Humanoid01 Demos

These are demonstrations of motion learning tasks.

Overview

The robot is a small humanoid with 17 degrees of freedom (DoF).

humanoid-manoi01-6-full.png

Experiments are performed in simulation using the dynamics simulator ODE (Open Dynamics Engine). The dynamics are computed with a time step of 0.2 ms.
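As a rough check on the simulation cost, the number of dynamics steps per episode follows from the time step and the 20 s episode limit given in the task description below. A minimal sketch in Python; the variable names are illustrative and not part of the demo:

```python
# Integer microseconds avoid floating-point rounding in the division.
TIME_STEP_US = 200              # 0.2 ms simulation time step
EPISODE_LIMIT_US = 20_000_000   # 20 s episode time limit

steps_per_episode = EPISODE_LIMIT_US // TIME_STEP_US
print(steps_per_episode)  # 100000
```

This counts steps only for an episode that runs to the full time limit; episodes that end early on the reward condition take fewer.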

Two tasks are performed with this robot: crawling and turning.

The objective of the crawling task is to move forward along the x-axis as far as possible. The reward function is designed accordingly: a movement reward proportional to the forward velocity, a small penalty for torque usage, and a penalty for falling down. Each episode begins from an initial state in which the robot is standing up and stationary, and ends if t > 20 s or the amount of reward falls below -40 (i.e. it is too small). By default, a 5-DoF constraint is used in the crawling task.
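The structure of this reward and the episode termination test can be sketched as follows. This is only an illustration in Python, not the demo's actual implementation: the weights, the quadratic form of the torque cost, the function names, and the reading of the -40 bound as a running total are all assumptions.

```python
from typing import Sequence

# Hypothetical weights: the values actually used by the demo are not given here.
W_VELOCITY = 1.0      # gain on forward velocity along the x-axis
W_TORQUE = 1e-3       # small penalty on torque usage
FALL_PENALTY = 4.0    # penalty applied when the robot has fallen down

TIME_LIMIT_S = 20.0   # episode ends when t > 20 s
MIN_REWARD = -40.0    # episode ends when the reward total drops below -40


def crawl_reward(forward_velocity: float, torques: Sequence[float], fallen: bool) -> float:
    """Per-step reward: velocity term minus torque and falling penalties."""
    torque_cost = sum(u * u for u in torques)  # assumed quadratic torque cost
    reward = W_VELOCITY * forward_velocity - W_TORQUE * torque_cost
    if fallen:
        reward -= FALL_PENALTY
    return reward


def episode_done(t: float, total_reward: float) -> bool:
    """Termination test, assuming the -40 bound applies to the running total."""
    return t > TIME_LIMIT_S or total_reward < MIN_REWARD
```

For the turning task, the same shape would apply with the rotational velocity around the z-axis in place of the forward velocity.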

The objective of the turning task is to turn around the z-axis as fast as possible, so the dominant element of the reward is the rotational velocity. The rest of the setup is the same as for the crawling task, except for the initial state, in which the robot is lying down.

Build the Demo Program

Execute:

 $ cd benchmarks/humanoid01
 $ make

See Documentation/Installation Guide for details.

Running Command

Please read Common Usage in advance. The running command is:

 $ ./DEMO_PRG -path PATH_LIST -agent AGENT_FILE -outdir OUT_DIR

The demo-specific elements are:

DEMO_DIR
humanoid01 (benchmarks/humanoid01).
DEMO_PRG
humanoid01.out
PATH_LIST
../cmn,m,m/cr for the crawling task; ../cmn,m,m/tn for the turning task.
AGENT_FILE
Available agent scripts are listed below.
OUT_DIR
Any directory can be used.

For example, execute the following:

 $ mkdir -p result/rl1
 $ ./humanoid01.out -path ../cmn,m,m/cr -agent ql_dcob1 -outdir result/rl1
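When scripting several runs, the command line can be assembled from the elements listed above. The helper below is a hypothetical sketch in Python (the names build_command and TASK_PATHS are not part of the demo); it only builds the argument list and uses no options beyond -path, -agent, and -outdir.

```python
from typing import List

# Values taken from the list of demo-specific elements.
DEMO_PRG = "./humanoid01.out"
TASK_PATHS = {
    "crawling": "../cmn,m,m/cr",
    "turning": "../cmn,m,m/tn",
}


def build_command(task: str, agent: str, outdir: str) -> List[str]:
    """Return the argv list for one run; raises KeyError for an unknown task."""
    return [DEMO_PRG, "-path", TASK_PATHS[task], "-agent", agent, "-outdir", outdir]


print(" ".join(build_command("crawling", "ql_dcob1", "result/rl1")))
```

The returned list can be passed to subprocess.run with cwd set to the benchmarks/humanoid01 directory, after creating the output directory as in the example above.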

A simulation window opens:

sim-window.jpg

and the terminal shows output like the following (debug lines are omitted):

 random seed = 1306390058
 simulation is initialized.
 start simulation..
 episode 0...
 simulation is initialized.
 
 Simulation test environment v0.02
   Ctrl-P : pause / unpause (or say `-pause' on command line).
   Ctrl-O : single step when paused.
   Ctrl-T : toggle textures (or say `-notex' on command line).
   Ctrl-S : toggle shadows (or say `-noshadow' on command line).
   Ctrl-V : print current viewpoint coordinates (x,y,z,h,p,r).
   Ctrl-W : write frames to ppm files: frame/frameNNN.ppm
   Ctrl-X : exit.
 
 Change the camera position by clicking + dragging in the window.
   Left button - pan and tilt.
   Right button - forward and sideways.
   Left + Right button (or middle button) - sideways and up.
 
 episode 1...
 simulation is initialized.
 episode 2...
 simulation is initialized.
 episode 3...
 simulation is initialized.
 episode 4...
 simulation is initialized.
 episode 5...
 simulation is initialized.
 episode 6...
 simulation is initialized.

To exit the program, press Ctrl+X in the window.

The result files are stored in OUT_DIR (result/rl1). For example, plot the learning curve with gnuplot:

 $ gnuplot
 gnuplot> plot 'result/rl1/log-eps-ret.dat' w l
rl1-eps-ret.png
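Learning curves of this kind are often noisy, so a moving average can make the trend easier to read. The sketch below is in Python and rests on an assumption about the file format: that each data line of log-eps-ret.dat holds whitespace-separated columns with the episode index first and the return second. The function names and the sample data are made up for illustration.

```python
from typing import List


def parse_returns(text: str) -> List[float]:
    """Extract the second column (assumed: return per episode) from log text."""
    returns = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        returns.append(float(line.split()[1]))
    return returns


def moving_average(values: List[float], window: int) -> List[float]:
    """Simple moving average over full windows only."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Made-up sample data for illustration, not real results.
sample = "0 -40.0\n1 -20.0\n2 0.0\n3 10.0\n"
print(moving_average(parse_returns(sample), 2))  # [-30.0, -10.0, 5.0]
```

To apply it to a real run, read the file contents first, for example parse_returns(open("result/rl1/log-eps-ret.dat").read()), and adjust the column index if the actual layout differs.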

Agent Script

The following files can be specified as AGENT_FILE.

Crawling Task

ql_grid1
Q(lambda)-learning, Grid action space, linear action value function (NGnet).
ql_dcob1
Q(lambda)-learning, DCOB (action space), linear action value function (NGnet).
ql_dcob_q1
Q(lambda)-learning, joint angle version of DCOB, linear action value function (NGnet).
ql_gwf1
Wire-fitting (grid init) updated by Q(lambda)-learning.
ql_wfdcob1
WF-DCOB, Q(lambda)-learning.
fqi_grid1
Fitted Q iteration (updated every 10 episodes), Grid action space, linear action value function (NGnet).
qlfqi_grid1
Q(lambda)-learning + Fitted Q iteration (updated every 10 episodes), Grid action space, linear action value function (NGnet).
qlfqi_dcob1
Q(lambda)-learning + Fitted Q iteration (updated every 10 episodes), DCOB (action space), linear action value function (NGnet).
qlfqi_gwf1
Q(lambda)-learning + Fitted Q iteration (updated every 10 episodes) for Wire-fitting (grid init).
qlfqi_wfdcob1
Q(lambda)-learning + Fitted Q iteration (updated every 10 episodes) for WF-DCOB.

Testing:

lspi_grid1
LSPI (updated every 5 episodes), Grid action space, linear action value function (NGnet).

Turning Task

ql_dcob2
Q(lambda)-learning, DCOB, linear action value function (NGnet).
ql_dcob_q2
Q(lambda)-learning, joint angle version of DCOB, linear action value function (NGnet).

Miscellaneous

Start with Pause

Add the following option to the command line:

 -pause 1

Capture the Frames

Press Ctrl+W in the window. The frames are written to ppm files (frame/frameNNN.ppm).

Execute in Console Mode

Add the following option to the command line:

 -console true

Change the DoF (Degree of Freedom)

Append an agent script that defines a DoF configuration to the -agent argument, separated by a comma. For example:

 $ ./humanoid01.out -path ../cmn,m,m/cr -agent ql_dcob1,dof6 -outdir result/rl2

The provided DoF configurations are:

(default)
5-DoF. Some joints are coupled, which gives a bilateral symmetry.
dof4asym
4-DoF. Some joints are coupled, which gives a single DoF for each leg.
dof6
6-DoF. Some joints are coupled, which gives a bilateral symmetry.
dof7
7-DoF. Some joints are coupled, which gives a bilateral symmetry.
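To compare the DoF configurations listed above in a batch, the -agent value can be generated for each one. This is a hypothetical Python helper (agent_argument and DOF_CONFIGS are illustrative names, not part of the demo); it relies only on the comma-separated form shown in the example command.

```python
from typing import Optional

# DoF scripts from the list above; None selects the default 5-DoF setup.
DOF_CONFIGS = ("dof4asym", "dof6", "dof7")


def agent_argument(agent: str, dof: Optional[str] = None) -> str:
    """Build the -agent value, appending a DoF script when one is given."""
    if dof is None:
        return agent
    if dof not in DOF_CONFIGS:
        raise ValueError(f"unknown DoF configuration: {dof}")
    return f"{agent},{dof}"


for dof in (None,) + DOF_CONFIGS:
    print(agent_argument("ql_dcob1", dof))
```

Using a separate output directory per configuration, as result/rl1 and result/rl2 do in the examples, keeps the result files of different runs apart.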

 

 


