Here, we introduce how to implement a simple maze task with SkyAI. The maze task has a discrete state set and a discrete action set, and is implemented as a SkyAI module. As a reinforcement learning algorithm, Peng's Q(lambda)-learning is applied to the maze task; of course, we use the predefined modules.
The procedure is as follows: implement the maze task (environment + task) module, implement a random-action module for testing, write the main function and the makefile, test the maze task with random actions, and finally apply the Q(lambda)-learning module.
The sample code works on a console; no extra libraries are required. Let's start!
Please refer to ../Tutorial - Making Module.
First, define a configuration structure TMazeTaskConfigurations whose members are the parameters of the maze task:

int     NumEpisodes;     // number of episodes
int     MaxSteps;        // max number of action steps per episode
int     StartX, StartY;  // start position
double  GoalReward;      // reward given at the goal
double  StepCost;        // cost for each action step
int     SleepUTime;      // sleep duration used when displaying
std::vector<std::vector<int> >  Map;  // Map[y][x], 0:free space, 1:wall, 2:goal; every row should have the same size
The constructor sets the default values and registers the members:

TMazeTaskConfigurations (var_space::TVariableMap &mmap)
  :
    NumEpisodes (1000),
    MaxSteps    (1000),
    StartX      (1),
    StartY      (1),
    GoalReward  (1.0),
    StepCost    (-0.01),
    SleepUTime  (1000)
  {
    Register(mmap);
  }
where Register adds each member with the ADD macro (registering all of the parameters, including MaxSteps and StepCost, so that they can be set from an agent script):

ADD( NumEpisodes );
ADD( MaxSteps );
ADD( StartX );
ADD( StartY );
ADD( GoalReward );
ADD( StepCost );
ADD( SleepUTime );
ADD( Map );
The following include is needed so that the variable space can store the Map type:

#include <lora/variable_space_impl.h>  // to store std::vector<TIntVector>
Next, define the module class MMazeTaskModule. Its skeleton is:

//===========================================================================================
//!\brief Maze task (environment+task) module
class MMazeTaskModule : public TModuleInterface
//===========================================================================================
{
public:
  typedef TModuleInterface  TParent;
  typedef MMazeTaskModule   TThis;
  SKYAI_MODULE_NAMES(MMazeTaskModule)

  MMazeTaskModule (const std::string &v_instance_name)
    : TParent (v_instance_name),
      conf_   (TParent::param_box_config_map())
    {
    }

protected:
  TMazeTaskConfigurations  conf_;

};  // end of MMazeTaskModule
//-------------------------------------------------------------------------------------------
The module has the following ports:

MAKE_SLOT_PORT(slot_start, void, (void), (), TThis);                     // starts the task (runs all the episodes)
MAKE_SLOT_PORT(slot_execute_action, void, (const TInt &a), (a), TThis);  // receives an action index from another module

MAKE_SIGNAL_PORT(signal_initialization, void (void), TThis);             // emitted once, before the first episode
MAKE_SIGNAL_PORT(signal_start_of_episode, void (void), TThis);
MAKE_SIGNAL_PORT(signal_finish_episode, void (void), TThis);             // emitted when the goal is reached or MaxSteps is exceeded
MAKE_SIGNAL_PORT(signal_end_of_episode, void (void), TThis);
MAKE_SIGNAL_PORT(signal_start_of_step, void (void), TThis);
MAKE_SIGNAL_PORT(signal_end_of_step, void (void), TThis);
MAKE_SIGNAL_PORT(signal_reward, void (const TSingleReward &), TThis);    // emits the reward of each step

MAKE_OUT_PORT(out_state_set_size, const TInt&, (void), (), TThis);       // number of discrete states
MAKE_OUT_PORT(out_action_set_size, const TInt&, (void), (), TThis);      // number of discrete actions
MAKE_OUT_PORT(out_state, const TInt&, (void), (), TThis);                // current (serialized) state
MAKE_OUT_PORT(out_time, const TReal&, (void), (), TThis);                // current step count
Each port is initialized in the constructor's initializer list:

MMazeTaskModule (const std::string &v_instance_name)
  : ...
    slot_start              (*this),
    slot_execute_action     (*this),
    signal_initialization   (*this),
    signal_start_of_episode (*this),
    signal_finish_episode   (*this),
    signal_end_of_episode   (*this),
    signal_start_of_step    (*this),
    signal_end_of_step      (*this),
    signal_reward           (*this),
    out_state_set_size      (*this),
    out_action_set_size     (*this),
    out_state               (*this),
    out_time                (*this)
and registered in the constructor body:

add_slot_port   (slot_start             );
add_slot_port   (slot_execute_action    );
add_signal_port (signal_initialization  );
add_signal_port (signal_start_of_episode);
add_signal_port (signal_finish_episode  );
add_signal_port (signal_end_of_episode  );
add_signal_port (signal_start_of_step   );
add_signal_port (signal_end_of_step     );
add_signal_port (signal_reward          );
add_out_port    (out_state_set_size     );
add_out_port    (out_action_set_size    );
add_out_port    (out_state              );
add_out_port    (out_time               );
The module also has the following member variables in the protected section:

mutable int  state_set_size_;   // cached result of out_state_set_size
const int    action_set_size_;  // number of actions (4)
int          current_action_;   // action received via slot_execute_action
int          pos_x_, pos_y_;    // current robot position on the map
mutable int  tmp_state_;        // buffer for out_state
TReal        current_time_;     // step counter of the current episode
TInt         num_episode_;      // episode counter
Some of them are initialized in the constructor:

state_set_size_  (0),
action_set_size_ (4),
current_action_  (0),
Next, declare the exec function of slot_start:

virtual void slot_start_exec (void);

Then, define it outside the class:
/*virtual*/void MMazeTaskModule::slot_start_exec (void)
{
  init_environment();
  signal_initialization.ExecAll();

  for(num_episode_=0; num_episode_<conf_.NumEpisodes; ++num_episode_)
  {
    init_environment();
    signal_start_of_episode.ExecAll();

    bool running(true);
    while(running)
    {
      signal_start_of_step.ExecAll();

      running= step_environment();
      show_environment();
      usleep(conf_.SleepUTime);

      if(current_time_>=conf_.MaxSteps)
      {
        signal_finish_episode.ExecAll();
        running= false;
      }

      signal_end_of_step.ExecAll();
    }

    signal_end_of_episode.ExecAll();
  }
}

Here, three member functions are used: init_environment, step_environment, and show_environment. They are declared in the protected section:
void init_environment (void);
bool step_environment (void);
void show_environment (void);

and defined outside the class:
void MMazeTaskModule::init_environment (void)
{
  pos_x_= conf_.StartX;
  pos_y_= conf_.StartY;
  current_time_= 0.0l;
}
bool MMazeTaskModule::step_environment (void)
{
  int next_x(pos_x_), next_y(pos_y_);
  switch(current_action_)
  {
    case 0: ++next_x; break;  // right
    case 1: --next_y; break;  // up
    case 2: --next_x; break;  // left
    case 3: ++next_y; break;  // down
    default: LERROR("invalid action:"<<current_action_);
  }
  ++current_time_;
  signal_reward.ExecAll(conf_.StepCost);

  switch(conf_.Map[next_y][next_x])
  {
    case 0:  // free space
      pos_x_= next_x;  pos_y_= next_y;
      break;
    case 1:  // wall
      break;
    case 2:  // goal
      pos_x_= next_x;  pos_y_= next_y;
      signal_reward.ExecAll(conf_.GoalReward);
      signal_finish_episode.ExecAll();
      return false;
    default:
      LERROR("invalid map element: "<<conf_.Map[next_y][next_x]);
  }
  return true;
}
void MMazeTaskModule::show_environment (void)
{
  int x(0), y(0);
  std::cout<<"("<<pos_x_<<","<<pos_y_<<") "<<current_time_<<"/"<<num_episode_<<std::endl;
  for(std::vector<std::vector<int> >::const_iterator yitr(conf_.Map.begin()),ylast(conf_.Map.end()); yitr!=ylast; ++yitr,++y)
  {
    x= 0;
    for(std::vector<int>::const_iterator xitr(yitr->begin()),xlast(yitr->end()); xitr!=xlast; ++xitr,++x)
    {
      std::cout<<" ";
      if(x==pos_x_ && y==pos_y_)                   std::cout<<"R";
      else if(x==conf_.StartX && y==conf_.StartY)  std::cout<<"S";
      else switch(*xitr)
      {
        case 0:  std::cout<<" "; break;
        case 1:  std::cout<<"#"; break;
        case 2:  std::cout<<"G"; break;
        default: std::cout<<"?"; break;
      }
    }
    std::cout<<" "<<std::endl;
  }
  std::cout<<std::endl;
}
The exec/get functions of the remaining ports are implemented as follows:

virtual void slot_execute_action_exec (const TInt &a)
  {
    current_action_= a;
  }
virtual const TInt& out_state_set_size_get (void) const
  {
    state_set_size_= conf_.Map[0].size() * conf_.Map.size();
    return state_set_size_;
  }
virtual const TInt& out_action_set_size_get (void) const
  {
    return action_set_size_;
  }
virtual const TInt& out_state_get (void) const
  {
    return tmp_state_= serialize(pos_x_,pos_y_);
  }
virtual const TReal& out_time_get (void) const
  {
    return current_time_;
  }

where serialize is a protected member function defined as follows:
int serialize (int x, int y) const
  {
    return y * conf_.Map[0].size() + x;
  }
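For example, with the 10x8 map used later in this tutorial, the start position (x,y)= (1,3) is serialized into state 3*10+1 = 31, and out_state_set_size returns 10*8 = 80 states.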
Finally, register the module on SkyAI with the SKYAI_ADD_MODULE macro:

SKYAI_ADD_MODULE(MMazeTaskModule)
That's it.
Next, in order to test the MMazeTaskModule module, we make a module named MRandomActionModule that emits a random action at each step. MRandomActionModule has two ports: a slot port slot_step, which is called at each step, and a signal port signal_action, which emits a randomly selected action.
Thus, its implementation is very simple:
//===========================================================================================
//!\brief Random action module
class MRandomActionModule : public TModuleInterface
//===========================================================================================
{
public:
  typedef TModuleInterface     TParent;
  typedef MRandomActionModule  TThis;
  SKYAI_MODULE_NAMES(MRandomActionModule)

  MRandomActionModule (const std::string &v_instance_name)
    : TParent       (v_instance_name),
      slot_step     (*this),
      signal_action (*this)
    {
      add_slot_port   (slot_step    );
      add_signal_port (signal_action);
    }

protected:
  MAKE_SLOT_PORT(slot_step, void, (void), (), TThis);
  MAKE_SIGNAL_PORT(signal_action, void (const TInt &), TThis);

  virtual void slot_step_exec (void)
    {
      signal_action.ExecAll(rand() % 4);
    }

};  // end of MRandomActionModule
//-------------------------------------------------------------------------------------------
Then, use the SKYAI_ADD_MODULE macro to register the module on SkyAI:
SKYAI_ADD_MODULE(MRandomActionModule)
Refer to ../Tutorial - Making Executable.
Our main function is as follows:
using namespace std;
using namespace loco_rabbits;

int main(int argc, char**argv)
{
  TOptionParser option(argc,argv);

  TAgent agent;
  if (!ParseCmdLineOption (agent, option))  return 0;

  MMazeTaskModule *p_maze_task = dynamic_cast<MMazeTaskModule*>(agent.SearchModule("maze_task"));
  if(p_maze_task==NULL)
    {LERROR("module `maze_task' is not defined as an instance of MMazeTaskModule"); return 1;}

  agent.SaveToFile (agent.GetDataFileName("before.agent"),"before-");

  p_maze_task->Start();

  agent.SaveToFile (agent.GetDataFileName("after.agent"),"after-");

  return 0;
}
This main function consists of the following parts: parsing the command-line options, searching the agent for the module instance named maze_task (which must be an instance of MMazeTaskModule), saving the agent to before.agent, executing the maze task with p_maze_task->Start(), and saving the agent to after.agent.
First, write a makefile as follows:
BASE_REL_DIR:=../..
include $(BASE_REL_DIR)/Makefile_preconf
EXEC := maze.out
OBJS := maze.o
USING_SKYAI_ODE:=true
MAKING_SKYAI:=true
include $(BASE_REL_DIR)/Makefile_body
Then, execute the make command:
make
An executable named maze.out is generated.
Now, let's test MMazeTaskModule using MRandomActionModule. Write an agent script file (e.g. random_act.agent; its base name is given to the -agent option below) that instantiates the two modules:
module MMazeTaskModule maze_task
module MRandomActionModule rand_action
connect maze_task.signal_start_of_step , rand_action.slot_step
connect rand_action.signal_action , maze_task.slot_execute_action
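With these two connections, maze_task emits signal_start_of_step at the beginning of every step, which executes rand_action.slot_step; that slot in turn emits signal_action with a random action index, which is delivered to maze_task.slot_execute_action and stored as the current action.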
Then, configure maze_task:

maze_task.config={
    Map= {
        []= (1,1,1,1,1,1,1,1,1,1)
        []= (1,0,0,0,1,0,0,0,2,1)
        []= (1,0,1,0,1,0,0,0,0,1)
        []= (1,0,1,0,1,1,0,0,0,1)
        []= (1,0,1,0,0,1,0,1,1,1)
        []= (1,0,0,0,0,1,0,0,0,1)
        []= (1,0,0,0,0,0,0,0,0,1)
        []= (1,1,1,1,1,1,1,1,1,1)
      }
    StartX= 1
    StartY= 3
  }
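Other parameters registered in TMazeTaskConfigurations, such as NumEpisodes or SleepUTime, can be set in the same config block with the same name = value syntax.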
That's it. Let's test!
Launch the executable as follows:
./maze.out -agent random_act
You will see a maze like the following, where the robot (R) moves randomly; the header line shows the robot's position (x,y), followed by the current step count and the episode number.
(1,5) 77/4
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #       #
 #   #     #   # # #
 # R       #       #
 #                 #
 # # # # # # # # # #
Once you have confirmed that MMazeTaskModule works correctly, let's apply a Q-learning module. Write another agent script (e.g. ql.agent, whose base name is given to the -agent option below) as follows:
include_once "ql_dsda"
module MMazeTaskModule maze_task
module MTDDiscStateAct behavior
/// initialization process:
connect maze_task.signal_initialization   , behavior.slot_initialize
/// start of episode process:
connect maze_task.signal_start_of_episode , behavior.slot_start_episode
/// learning signals:
connect behavior.signal_execute_action    , maze_task.slot_execute_action
connect maze_task.signal_end_of_step      , behavior.slot_finish_action
connect maze_task.signal_reward           , behavior.slot_add_to_reward
connect maze_task.signal_finish_episode   , behavior.slot_finish_episode_immediately
/// I/O:
connect maze_task.out_action_set_size     , behavior.in_action_set_size
connect maze_task.out_state_set_size      , behavior.in_state_set_size
connect maze_task.out_state               , behavior.in_state
connect maze_task.out_time                , behavior.in_cont_time
maze_task.config={
    Map= {
        []= (1,1,1,1,1,1,1,1,1,1)
        []= (1,0,0,0,1,0,0,0,2,1)
        []= (1,0,1,0,1,0,0,0,0,1)
        []= (1,0,1,0,1,1,0,0,0,1)
        []= (1,0,1,0,0,1,0,1,1,1)
        []= (1,0,0,0,0,1,0,0,0,1)
        []= (1,0,0,0,0,0,0,0,0,1)
        []= (1,1,1,1,1,1,1,1,1,1)
      }
    StartX= 1
    StartY= 3
  }
behavior.config={
    UsingEligibilityTrace = true
    UsingReplacingTrace   = true
    Lambda                = 0.9
    GradientMax           = 1.0e+100
    ActionSelection       = "asBoltzman"
    PolicyImprovement     = "piExpReduction"
    Tau                   = 1
    TauDecreasingFactor   = 0.05
    TraceMax              = 1.0
    Gamma                 = 0.9
    Alpha                 = 0.3
    AlphaDecreasingFactor = 0.002
    AlphaMin              = 0.05
  }
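In this configuration, Gamma is the discount factor, Alpha the learning rate (judging from AlphaDecreasingFactor and AlphaMin, it is decreased towards AlphaMin as learning proceeds), and Lambda the decay rate of the eligibility traces. ActionSelection= "asBoltzman" means that the action is drawn from a Boltzmann (softmax) distribution over the current action values, with Tau as the temperature. The following stand-alone C++ sketch only illustrates that selection rule; it is not the code used inside MTDDiscStateAct, and the function name is ours.

// Illustrative sketch only (not taken from the SkyAI sources):
// Boltzmann (softmax) selection over a table of action values q[a] with temperature tau.
// A large tau makes the choice nearly uniform (exploration); a small tau makes it nearly greedy.
#include <vector>
#include <cmath>
#include <cstdlib>

int BoltzmannSelect (const std::vector<double> &q, double tau)
{
  // NOTE: for numerical stability, a real implementation would subtract max(q) before exp()
  std::vector<double> weight(q.size());
  double sum(0.0);
  for (std::size_t a(0); a<q.size(); ++a)
  {
    weight[a]= std::exp(q[a]/tau);
    sum+= weight[a];
  }
  double r= (static_cast<double>(rand())/RAND_MAX) * sum;  // draw a point in [0,sum]
  for (std::size_t a(0); a<q.size(); ++a)
  {
    if (r<=weight[a])  return static_cast<int>(a);
    r-= weight[a];
  }
  return static_cast<int>(q.size())-1;  // fallback for rounding errors
}

Roughly speaking, PolicyImprovement= "piExpReduction" together with TauDecreasingFactor reduces the temperature as learning proceeds, shifting the behavior from exploration toward exploitation; see the documentation of MTDDiscStateAct for the exact meaning of each parameter.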
Launch the executable as follows:
./maze.out -path ../../benchmarks/cmn -agent ql -outdir result/rl1
where ../../benchmarks/cmn is the relative path to the benchmarks/cmn directory; modify it for your environment.
After several tens of episodes, the policy will converge to a path to the goal:
(1,4) 1/520
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #       #
 # R #     #   # # #
 #         #       #
 #                 #
 # # # # # # # # # #

(3,6) 5/520
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #       #
 #   #     #   # # #
 #         #       #
 #     R           #
 # # # # # # # # # #

(6,6) 8/520
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #       #
 #   #     #   # # #
 #         #       #
 #           R     #
 # # # # # # # # # #

(7,3) 12/520
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #   R   #
 #   #     #   # # #
 #         #       #
 #                 #
 # # # # # # # # # #

(8,1) 15/520
 # # # # # # # # # #
 #       #       R #
 #   #   #         #
 # S #   # #       #
 #   #     #   # # #
 #         #       #
 #                 #
 # # # # # # # # # #
In order to store the learning logs, create the directory result/rl1, which is specified with the -outdir option, before launching. Plotting log-eps-ret.dat, you will obtain a learning curve.
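For example, if gnuplot is available, the curve can be displayed with the following command (assuming, as the file name suggests, that log-eps-ret.dat contains one line per episode with the episode number and the return; check the actual format in your output directory):

gnuplot -persist -e "plot 'result/rl1/log-eps-ret.dat' with lines"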