''Table of Contents''
#contents
* Overview [#xfb9bf26]
Here, we introduce how to implement a simple maze task with SkyAI.
The maze task has a discrete state and a discrete action, which will be implemented as a module of SkyAI.
As a reinforcement learning algorithm, Peng's Q(lambda)-learning is applied to the maze task; for this we use SkyAI's predefined modules.
The following is the procedure:
+ Implement a maze task module.
+ Implement a random action module for testing the task module.
+ Implement a main function.
+ Compile.
+ Write an agent script for the random action test.
+ Write an agent script to apply Q(lambda)-learning.
The sample code works on a console; no extra libraries are required.
Let's start!
* Maze Task Module [#g1d7df27]
Please refer to [[../Tutorial - Making Module]].
+ Make a C++ source file from the template materials/templates/apps/main_tmpl.cpp contained in the SkyAI directory.
-- You can modify the file information (file name, brief, author, date, copyright, license info, etc.)
-- Replace every NAME_SPACE by loco_rabbits.
-- Write the following code inside the namespace loco_rabbits.
+ Make a configure class using the template TXxConfigurations written in [[../Tutorial - Making Module]].
-- Replace every TXxConfigurations by TMazeTaskConfigurations.
-- Remove the TestC parameter and add the following parameters:
#codeh(cpp){{
int NumEpisodes; // number of episodes
int MaxSteps; // number of max action steps per episode
int StartX, StartY; // start position
double GoalReward; // goal reward
double StepCost; // cost for each action step
int SleepUTime; // duration for display
std::vector<std::vector<int> > Map; // Map[y][x], 0:free space, 1:wall, 2:goal, every element should have the same size
}}
-- Initialize them in the constructor as follows:
#codeh(cpp){{
TMazeTaskConfigurations (var_space::TVariableMap &mmap) :
NumEpisodes (1000),
MaxSteps (1000),
StartX (1),
StartY (1),
GoalReward (1.0),
StepCost (-0.01),
SleepUTime (1000)
{
Register(mmap);
}
}}
-- In the member function Register, insert them:
#codeh(cpp){{
ADD( NumEpisodes );
ADD( MaxSteps );
ADD( StartX );
ADD( StartY );
ADD( GoalReward );
ADD( StepCost );
ADD( SleepUTime );
ADD( Map );
}}
-- Add lora/variable_space_impl.h to the include list:
#codeh(cpp){{
#include <lora/variable_space_impl.h> // to store std::vector<TIntVector>
}}
-- You can add your own parameters, such as noise.
+ Make the base of the module using the template MXxModule written in [[../Tutorial - Making Module]].
-- The simple template is sufficient.
-- Replace every MXxModule by MMazeTaskModule.
-- Replace every MParentModule by TModuleInterface.
-- Replace TXxConfigurations by TMazeTaskConfigurations.
-- Remove the definition of mem_ (TXxMemory mem_;).
#codeh(cpp){{
//===========================================================================================
//!\brief Maze task (environment+task) module
class MMazeTaskModule
: public TModuleInterface
//===========================================================================================
{
public:
typedef TModuleInterface TParent;
typedef MMazeTaskModule TThis;
SKYAI_MODULE_NAMES(MMazeTaskModule)
MMazeTaskModule (const std::string &v_instance_name)
: TParent (v_instance_name),
conf_ (TParent::param_box_config_map())
{
}
protected:
TMazeTaskConfigurations conf_;
}; // end of MMazeTaskModule
//-------------------------------------------------------------------------------------------
}}
+ Add the following ports to MMazeTaskModule.
-- (port type), (port name), (return type), (parameter list), (purpose)
-- slot, slot_start, void, (void), called at the beginning of the execution.
-- slot, slot_execute_action, void, (const TInt &a), called by an RL agent module to execute action.
-- signal, signal_initialization, void (void), emit when the module is initialized.
-- signal, signal_start_of_episode, void (void), emit when each episode starts.
-- signal, signal_finish_episode, void (void), emit when the end-of-episode condition is satisfied.
-- signal, signal_end_of_episode, void (void), emit when each episode is terminated.
-- signal, signal_start_of_step, void (void), emit at the start of each step.
-- signal, signal_end_of_step, void (void), emit at the end of each step.
-- signal, signal_reward, void (const TSingleReward &), emit when a reward is given.
-- out, out_state_set_size, const TInt&, (void), output the number of elements in the state set.
-- out, out_action_set_size, const TInt&, (void), output the number of elements in the action set.
-- out, out_state, const TInt&, (void), output the current state (x,y are serialized).
-- out, out_time, const TReal&, (void), output the current time.
-- Note: some signal ports are not used in this tutorial; they are defined for later use.
-- In order to add the ports, follow the steps:
++ Add declarations:
#codeh(cpp){{
MAKE_SLOT_PORT(slot_start, void, (void), (), TThis);
MAKE_SLOT_PORT(slot_execute_action, void, (const TInt &a), (a), TThis);
MAKE_SIGNAL_PORT(signal_initialization, void (void), TThis);
MAKE_SIGNAL_PORT(signal_start_of_episode, void (void), TThis);
MAKE_SIGNAL_PORT(signal_finish_episode, void (void), TThis);
MAKE_SIGNAL_PORT(signal_end_of_episode, void (void), TThis);
MAKE_SIGNAL_PORT(signal_start_of_step, void (void), TThis);
MAKE_SIGNAL_PORT(signal_end_of_step, void (void), TThis);
MAKE_SIGNAL_PORT(signal_reward, void (const TSingleReward &), TThis);
MAKE_OUT_PORT(out_state_set_size, const TInt&, (void), (), TThis);
MAKE_OUT_PORT(out_action_set_size, const TInt&, (void), (), TThis);
MAKE_OUT_PORT(out_state, const TInt&, (void), (), TThis);
MAKE_OUT_PORT(out_time, const TReal&, (void), (), TThis);
}}
++ Add initializers to the constructor's initializer list:
#codeh(cpp){{
MMazeTaskModule (const std::string &v_instance_name)
: ...
slot_start (*this),
slot_execute_action (*this),
signal_initialization (*this),
signal_start_of_episode (*this),
signal_finish_episode (*this),
signal_end_of_episode (*this),
signal_start_of_step (*this),
signal_end_of_step (*this),
signal_reward (*this),
out_state_set_size (*this),
out_action_set_size (*this),
out_state (*this),
out_time (*this)
}}
++ Register the ports in the constructor body:
#codeh(cpp){{
add_slot_port (slot_start );
add_slot_port (slot_execute_action );
add_signal_port (signal_initialization );
add_signal_port (signal_start_of_episode );
add_signal_port (signal_finish_episode );
add_signal_port (signal_end_of_episode );
add_signal_port (signal_start_of_step );
add_signal_port (signal_end_of_step );
add_signal_port (signal_reward );
add_out_port (out_state_set_size );
add_out_port (out_action_set_size );
add_out_port (out_state );
add_out_port (out_time );
}}
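To make the role of the ports more concrete, the following is a toy, stand-alone sketch of the signal/slot idea. It is not SkyAI's implementation: ToySignal and Connect are hypothetical names used only for illustration, and ExecAll merely mirrors the name used in this tutorial. It shows a signal that keeps a list of callbacks and calls each of them when it is emitted.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy stand-in for a signal port (hypothetical; not SkyAI's implementation).
template <typename... Args>
class ToySignal
{
public:
  typedef std::function<void(Args...)> Slot;

  // Roughly what a "connect signal , slot" line in an agent script arranges.
  void Connect(Slot slot)
  {
    assert(static_cast<bool>(slot));  // an empty callback would throw when invoked
    slots_.push_back(slot);
  }

  // Call every connected slot, in connection order.
  void ExecAll(Args... args) const
  {
    for (const Slot &slot : slots_) slot(args...);
  }

private:
  std::vector<Slot> slots_;
};
```

For example, after connecting two lambdas to a ToySignal<int>, a single ExecAll(3) call passes 3 to both of them.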
+ Next, we implement the slot port callbacks and the output functions. This procedure is slightly complicated, so follow the steps one by one.
++ Add member variables in the protected section:
#codeh(cpp){{
mutable int state_set_size_;
const int action_set_size_;
int current_action_;
int pos_x_, pos_y_;
mutable int tmp_state_;
TReal current_time_;
TInt num_episode_;
}}
++ Add their initializers:
#codeh(cpp){{
state_set_size_ (0),
action_set_size_ (4),
current_action_ (0),
}}
++ Implement slot_start_exec. This function is long, so write only its declaration in the protected section:
#codeh(cpp){{
virtual void slot_start_exec (void);
}}
Then, define it outside the class:
#codeh(cpp){{
/*virtual*/void MMazeTaskModule::slot_start_exec (void)
{
init_environment();
signal_initialization.ExecAll();
for(num_episode_=0; num_episode_<conf_.NumEpisodes; ++num_episode_)
{
init_environment();
signal_start_of_episode.ExecAll();
bool running(true);
while(running)
{
signal_start_of_step.ExecAll();
running= step_environment();
show_environment();
usleep(conf_.SleepUTime);
if(running && current_time_>=conf_.MaxSteps) // skip if reaching the goal already finished the episode
{
signal_finish_episode.ExecAll();
running= false;
}
signal_end_of_step.ExecAll();
}
signal_end_of_episode.ExecAll();
}
}
}}
Here we used three helper member functions. Declare them in the protected section:
#codeh(cpp){{
void init_environment (void);
bool step_environment (void);
void show_environment (void);
}}
and define them outside the class:
#codeh(cpp){{
void MMazeTaskModule::init_environment (void)
{
pos_x_= conf_.StartX;
pos_y_= conf_.StartY;
current_time_= 0.0l;
}
}}
#codeh(cpp){{
bool MMazeTaskModule::step_environment (void)
{
int next_x(pos_x_), next_y(pos_y_);
switch(current_action_)
{
case 0: ++next_x; break; // right
case 1: --next_y; break; // up
case 2: --next_x; break; // left
case 3: ++next_y; break; // down
default: LERROR("invalid action:"<<current_action_);
}
++current_time_;
signal_reward.ExecAll(conf_.StepCost);
switch(conf_.Map[next_y][next_x])
{
case 0: // free space
pos_x_=next_x;
pos_y_=next_y;
break;
case 1: // wall
break;
case 2: // goal
pos_x_=next_x;
pos_y_=next_y;
signal_reward.ExecAll(conf_.GoalReward);
signal_finish_episode.ExecAll();
return false;
default: LERROR("invalid map element: "<<conf_.Map[next_y][next_x]);
}
return true;
}
}}
#codeh(cpp){{
void MMazeTaskModule::show_environment (void)
{
int x(0),y(0);
std::cout<<"("<<pos_x_<<","<<pos_y_<<") "<<current_time_<<"/"<<num_episode_<<std::endl;
for(std::vector<std::vector<int> >::const_iterator yitr(conf_.Map.begin()),ylast(conf_.Map.end());yitr!=ylast;++yitr,++y)
{
x=0;
for(std::vector<int>::const_iterator xitr(yitr->begin()),xlast(yitr->end());xitr!=xlast;++xitr,++x)
{
std::cout<<" ";
if(x==pos_x_ && y==pos_y_)
std::cout<<"R";
else if(x==conf_.StartX && y==conf_.StartY)
std::cout<<"S";
else
switch(*xitr)
{
case 0: std::cout<<" "; break;
case 1: std::cout<<"#"; break;
case 2: std::cout<<"G"; break;
default: std::cout<<"?"; break;
}
}
std::cout<<" "<<std::endl;
}
std::cout<<std::endl;
}
}}
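Note that step_environment above indexes conf_.Map[next_y][next_x] directly, which is safe only when the map is fully enclosed by walls, as in the maps used in this tutorial. As a minimal sketch (not part of the tutorial code), the same transition rule can be written as a free function that treats cells outside the map like walls. TMap, TStepResult and StepOnMap are hypothetical names introduced here.

```cpp
#include <cassert>
#include <vector>

typedef std::vector<std::vector<int> > TMap;  // Map[y][x]; 0: free, 1: wall, 2: goal

struct TStepResult
{
  int x, y;      // position after the step
  bool at_goal;  // true if the step reached a goal cell
};

// Hypothetical helper: apply one action (0: right, 1: up, 2: left, 3: down).
// Cells outside the map are treated like walls.
inline TStepResult StepOnMap(const TMap &map, int x, int y, int action)
{
  static const int dx[4] = {+1, 0, -1, 0};
  static const int dy[4] = {0, -1, 0, +1};
  assert(action >= 0 && action < 4);

  const int nx = x + dx[action], ny = y + dy[action];
  TStepResult res = {x, y, false};  // default: stay in place
  if (ny < 0 || ny >= static_cast<int>(map.size())) return res;
  if (nx < 0 || nx >= static_cast<int>(map[ny].size())) return res;

  const int cell = map[ny][nx];
  if (cell != 0 && cell != 2) return res;  // wall or unknown element: stay in place
  res.x = nx;
  res.y = ny;
  res.at_goal = (cell == 2);
  return res;
}
```

Keeping the transition rule free of signals and member state like this also makes it easy to test without launching an agent.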
++ Implement the other slot port callbacks and output functions. These are short, so you can define them inside the class in the protected section.
#codeh(cpp){{
virtual void slot_execute_action_exec (const TInt &a)
{
current_action_= a;
}
virtual const TInt& out_state_set_size_get (void) const
{
state_set_size_= conf_.Map[0].size() * conf_.Map.size();
return state_set_size_;
}
virtual const TInt& out_action_set_size_get (void) const
{
return action_set_size_;
}
virtual const TInt& out_state_get (void) const
{
return tmp_state_=serialize(pos_x_,pos_y_);
}
virtual const TReal& out_time_get (void) const
{
return current_time_;
}
}}
where serialize is a protected member function defined as follows:
#codeh(cpp){{
int serialize (int x, int y) const
{
return y * conf_.Map[0].size() + x;
}
}}
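The serialization above is a row-major index: with a map that is 10 cells wide, the position (1,3) becomes state 3*10+1 = 31. The stand-alone sketch below restates it with an explicit width argument and adds an inverse. Serialize (as a free function) and Deserialize are hypothetical helpers, not part of SkyAI.

```cpp
#include <cassert>

// Row-major index of cell (x,y) in a map that is `width` cells wide.
inline int Serialize(int x, int y, int width)
{
  assert(width > 0 && x >= 0 && x < width && y >= 0);
  return y * width + x;
}

// Hypothetical inverse, handy when inspecting a learned table by position.
inline void Deserialize(int state, int width, int &x, int &y)
{
  assert(width > 0 && state >= 0);
  x = state % width;
  y = state / width;
}
```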
+ Finally, use the SKYAI_ADD_MODULE macro to register the module with SkyAI:
#codeh(cpp){{
SKYAI_ADD_MODULE(MMazeTaskModule)
}}
That's it.
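The module assumes that every row of Map has the same size, and out_state_set_size_get and serialize read Map[0] without checking it. As an optional sketch, a check such as the following could be run before the first episode. CheckMap and its messages are hypothetical, and how to report the error inside SkyAI is left open.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::vector<int> > TMap;  // Map[y][x]; 0: free, 1: wall, 2: goal

// Hypothetical check: returns an empty string if the map looks usable,
// otherwise a description of the first problem found.
inline std::string CheckMap(const TMap &map, int start_x, int start_y)
{
  if (map.empty() || map[0].empty()) return "map is empty";
  const std::size_t width = map[0].size();
  bool has_goal = false;
  for (std::size_t y = 0; y < map.size(); ++y)
  {
    if (map[y].size() != width) return "rows have different sizes";
    for (std::size_t x = 0; x < width; ++x)
    {
      if (map[y][x] < 0 || map[y][x] > 2) return "unknown map element";
      if (map[y][x] == 2) has_goal = true;
    }
  }
  if (!has_goal) return "map has no goal";
  if (start_y < 0 || start_y >= static_cast<int>(map.size()) ||
      start_x < 0 || start_x >= static_cast<int>(width))
    return "start is outside the map";
  if (map[start_y][start_x] != 0) return "start is not a free cell";
  return "";
}
```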
* Random Action Module [#x0a9632f]
Next, in order to test the MMazeTaskModule module, we make a module named MRandomActionModule that emits a random action at each step.
MRandomActionModule has two ports:
- (port type), (port name), (return type), (parameter list), (purpose)
- slot, slot_step, void, (void), called at each step where a random action is emitted through the signal_action port.
- signal, signal_action, void (const TInt &), emit at each step.
Thus, its implementation is very simple:
#codeh(cpp){{
//===========================================================================================
//!\brief Random action module
class MRandomActionModule
: public TModuleInterface
//===========================================================================================
{
public:
typedef TModuleInterface TParent;
typedef MRandomActionModule TThis;
SKYAI_MODULE_NAMES(MRandomActionModule)
MRandomActionModule (const std::string &v_instance_name)
: TParent (v_instance_name),
slot_step (*this),
signal_action (*this)
{
add_slot_port (slot_step );
add_signal_port (signal_action);
}
protected:
MAKE_SLOT_PORT(slot_step, void, (void), (), TThis);
MAKE_SIGNAL_PORT(signal_action, void (const TInt &), TThis);
virtual void slot_step_exec (void)
{
signal_action.ExecAll(rand() % 4);
}
}; // end of MRandomActionModule
//-------------------------------------------------------------------------------------------
}}
Then, use the SKYAI_ADD_MODULE macro to register the module with SkyAI:
#codeh(cpp){{
SKYAI_ADD_MODULE(MRandomActionModule)
}}
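Since rand() is never seeded here, the module produces the same action sequence on every run, which is convenient for a first test. If you want a different sequence per run, or a reproducible one with a chosen seed, one option is the standard <random> header (C++11). The following is a sketch; TUniformActionSampler is a hypothetical helper, not a SkyAI class.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: draws uniformly distributed actions in [0, num_actions).
class TUniformActionSampler
{
public:
  explicit TUniformActionSampler(int num_actions,
                                 unsigned int seed = std::random_device()())
    : engine_(seed), dist_(0, CheckedMax(num_actions))
  {
  }

  int Sample() { return dist_(engine_); }

private:
  // Largest valid action index; rejects an empty action set.
  static int CheckedMax(int num_actions)
  {
    assert(num_actions > 0);
    return num_actions - 1;
  }

  std::mt19937 engine_;
  std::uniform_int_distribution<int> dist_;
};
```

With such a helper held as a member variable, slot_step_exec could emit the result of Sample() instead of rand() % 4.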
* Main Function [#ddf2c0fe]
Refer to [[../Tutorial - Making Executable]].
Our main function is as follows:
#codeh(cpp){{
using namespace std;
using namespace loco_rabbits;
int main(int argc, char**argv)
{
TOptionParser option(argc,argv);
TAgent agent;
if (!ParseCmdLineOption (agent, option)) return 0;
MMazeTaskModule *p_maze_task = dynamic_cast<MMazeTaskModule*>(agent.SearchModule("maze_task"));
if(p_maze_task==NULL) {LERROR("module `maze_task' is not defined as an instance of MMazeTaskModule"); return 1;}
agent.SaveToFile (agent.GetDataFileName("before.agent"),"before-");
p_maze_task->Start();
agent.SaveToFile (agent.GetDataFileName("after.agent"),"after-");
return 0;
}
}}
This main function consists of the following parts:
+ Create an instance of the TAgent class.
+ Parse the command line option and load an agent script.
+ Get a module named maze_task which is an instance of MMazeTaskModule.
+ Save the agent status into a file named before.agent.
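+ Execute the maze_task's Start function. Note that the class definition shown earlier does not include Start; this step assumes a public member function of MMazeTaskModule that runs the slot_start callback (for example, by calling slot_start_exec()).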
+ Save the agent status into a file named after.agent.
* Compile [#ga57fd4c]
First, write a makefile as follows:
#codeh(makefile){{
BASE_REL_DIR:=../..
include $(BASE_REL_DIR)/Makefile_preconf
EXEC := maze.out
OBJS := maze.o
USING_SKYAI_ODE:=true
MAKING_SKYAI:=true
include $(BASE_REL_DIR)/Makefile_body
}}
- BASE_REL_DIR : relative path to the base directory of SkyAI.
Then, execute the make command:
#codeh(sh){{
make
}}
An executable named maze.out will be generated.
* Agent Script for Random Action Test [#vdba6662]
Now, let's test MMazeTaskModule using MRandomActionModule.
+ Create a blank file named random_act.agent and open it.
+ Instantiate each module; the MMazeTaskModule's instance should have the name maze_task:
#codeh(cpp){{
module MMazeTaskModule maze_task
module MRandomActionModule rand_action
}}
+ Connect the following port pairs:
-- maze_task.signal_start_of_step --> rand_action.slot_step
-- rand_action.signal_action --> maze_task.slot_execute_action
#codeh(cpp){{
connect maze_task.signal_start_of_step , rand_action.slot_step
connect rand_action.signal_action , maze_task.slot_execute_action
}}
+ Assign the maze information to the configuration parameters of maze_task:
#codeh(cpp){{
maze_task.config={
Map={
[]= (1,1,1,1,1,1,1,1,1,1)
[]= (1,0,0,0,1,0,0,0,2,1)
[]= (1,0,1,0,1,0,0,0,0,1)
[]= (1,0,1,0,1,1,0,0,0,1)
[]= (1,0,1,0,0,1,0,1,1,1)
[]= (1,0,0,0,0,1,0,0,0,1)
[]= (1,0,0,0,0,0,0,0,0,1)
[]= (1,1,1,1,1,1,1,1,1,1)
}
StartX= 1
StartY= 3
}
}}
That's it. Let's test!
Launch the executable as follows:
#codeh(sh){{
./maze.out -agent random_act
}}
You will see the maze displayed as follows, with the robot (R) moving randomly:
 (1,5) 77/4
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #       #
 #   #     #   # # #
 # R       #       #
 #                 #
 # # # # # # # # # #
* Agent Script for Q(lambda)-learning [#ef68204c]
Once you have confirmed that MMazeTaskModule works correctly, let's apply a Q(lambda)-learning module.
+ Create a blank file named ql.agent and open it.
+ Include ql_dsda where a composite Q-learning module is defined:
#codeh(cpp){{
include_once "ql_dsda"
}}
+ Instantiate the modules; the MMazeTaskModule's instance should have the name maze_task:
#codeh(cpp){{
module MMazeTaskModule maze_task
module MTDDiscStateAct behavior
}}
+ Connect the port pairs:
#codeh(cpp){{
/// initialization process:
connect maze_task.signal_initialization , behavior.slot_initialize
/// start of episode process:
connect maze_task.signal_start_of_episode , behavior.slot_start_episode
/// learning signals:
connect behavior.signal_execute_action , maze_task.slot_execute_action
connect maze_task.signal_end_of_step , behavior.slot_finish_action
connect maze_task.signal_reward , behavior.slot_add_to_reward
connect maze_task.signal_finish_episode , behavior.slot_finish_episode_immediately
/// I/O:
connect maze_task.out_action_set_size , behavior.in_action_set_size
connect maze_task.out_state_set_size , behavior.in_state_set_size
connect maze_task.out_state , behavior.in_state
connect maze_task.out_time , behavior.in_cont_time
}}
+ Assign the maze information to the configuration parameters of maze_task:
#codeh(cpp){{
maze_task.config={
Map={
[]= (1,1,1,1,1,1,1,1,1,1)
[]= (1,0,0,0,1,0,0,0,2,1)
[]= (1,0,1,0,1,0,0,0,0,1)
[]= (1,0,1,0,1,1,0,0,0,1)
[]= (1,0,1,0,0,1,0,1,1,1)
[]= (1,0,0,0,0,1,0,0,0,1)
[]= (1,0,0,0,0,0,0,0,0,1)
[]= (1,1,1,1,1,1,1,1,1,1)
}
StartX= 1
StartY= 3
}
}}
+ Assign the learning configuration to the parameters of behavior:
#codeh(cpp){{
behavior.config={
UsingEligibilityTrace = true
UsingReplacingTrace = true
Lambda = 0.9
GradientMax = 1.0e+100
ActionSelection = "asBoltzman"
PolicyImprovement = "piExpReduction"
Tau = 1
TauDecreasingFactor = 0.05
TraceMax = 1.0
Gamma = 0.9
Alpha = 0.3
AlphaDecreasingFactor = 0.002
AlphaMin = 0.05
}
}}
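The ActionSelection and Tau parameters above refer to Boltzmann (softmax) action selection. How the predefined module implements it, and how TauDecreasingFactor lowers the temperature over time, is not shown in this tutorial. The following is only a textbook sketch of the selection probabilities, p(a) proportional to exp(Q(a)/Tau), with BoltzmannProbabilities as a hypothetical function name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax (Boltzmann) probabilities over action values q at temperature tau.
inline std::vector<double> BoltzmannProbabilities(const std::vector<double> &q, double tau)
{
  assert(!q.empty() && tau > 0.0);
  const double q_max = *std::max_element(q.begin(), q.end());
  std::vector<double> p(q.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < q.size(); ++i)
  {
    p[i] = std::exp((q[i] - q_max) / tau);  // subtract the maximum for numerical stability
    sum += p[i];
  }
  for (std::size_t i = 0; i < p.size(); ++i) p[i] /= sum;
  return p;
}
```

A large temperature gives nearly uniform probabilities (more exploration), while a small one concentrates the probability on the action with the highest value.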
Launch the executable as follows:
#codeh(sh){{
./maze.out -path ../../benchmarks/cmn -agent ql -outdir result/rl1
}}
where ../../benchmarks/cmn is a relative path to the benchmarks/cmn directory; modify it for your environment.
After several tens of episodes, the policy will converge to a path such as the following (snapshots taken from a single episode):
 (1,4) 1/520
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #       #
 # R #     #   # # #
 #         #       #
 #                 #
 # # # # # # # # # #
 (3,6) 5/520
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #       #
 #   #     #   # # #
 #         #       #
 #     R           #
 # # # # # # # # # #
 (6,6) 8/520
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #       #
 #   #     #   # # #
 #         #       #
 #           R     #
 # # # # # # # # # #
 (7,3) 12/520
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #   R   #
 #   #     #   # # #
 #         #       #
 #                 #
 # # # # # # # # # #
 (8,1) 15/520
 # # # # # # # # # #
 #       #       R #
 #   #   #         #
 # S #   # #       #
 #   #     #   # # #
 #         #       #
 #                 #
 # # # # # # # # # #
To store the learning logs, create the directory result/rl1 (the one specified with the -outdir option) before launching the executable.
By plotting log-eps-ret.dat, you will obtain a learning curve:
#ref(./out-maze.png,zoom,center,600x0)
CENTER:''Example of a learning curve.''