Tutorial - Example - Maze
''Table of Contents''
#contents
* Overview [#xfb9bf26]
Here, we introduce how to implement a simple maze task wi...
The maze task has a discrete state and a discrete action,...
As a reinforcement learning algorithm, Peng's Q(lambda)-...
The following is the procedure:
+ Implement a maze task module.
+ Implement a random action module for testing the task m...
+ Implement a main function.
+ Compile.
+ Write an agent script for the random action test.
+ Write an agent script to apply Q(lambda)-learning.
The sample code works on a console; no extra libraries ar...
Let's start!
* Task Setup [#c0a22683]
The maze has the size W x H, consisting of the start (S),...
The robot cannot go through the walls.
Its objective is to move from the start to the goal in th...
This is an example of the maze environment:
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #       #
 #   #     #   # # #
 #         #       #
 #                 #
 # # # # # # # # # #
The ''state'' is a 1-dimensional discrete value where the...
The ''action'' is a discrete action consisting of {up,dow...
A ''reward'' of +1 is given when the robot arrives at the ...
Each episode starts by placing the robot at the start,...
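As a worked example of this reward design, the following sketch sums the rewards of one episode. It assumes the default values of the configuration class introduced in the next section (StepCost = -0.01, GoalReward = 1.0), that the step cost is also charged on the step that reaches the goal, and that the return is the plain undiscounted sum. The function name EpisodeReturn is hypothetical and is not part of SkyAI.

```cpp
#include <cassert>

// Hypothetical helper: undiscounted return of one episode.
//   steps:        number of actions taken in the episode
//   reached_goal: true if the last action moved the robot onto the goal
double EpisodeReturn(int steps, bool reached_goal,
                     double step_cost = -0.01, double goal_reward = 1.0)
{
  assert(steps >= 0);
  double ret = steps * step_cost;        // every action is charged the step cost
  if (reached_goal) ret += goal_reward;  // the goal reward is added once
  return ret;
}
```

For instance, a 15-step episode that ends at the goal yields 15 * (-0.01) + 1 = 0.85, while an episode that runs for 1000 steps without reaching the goal yields -10.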
* Maze Task Module [#g1d7df27]
Please refer to [[../Tutorial - Making Module]].
+ Make a C++ source file named maze.cpp using a template ...
-- You can modify the file information (file name, brief,...
-- Replace every NAME_SPACE by loco_rabbits.
-- Write the following code inside the namespace loco_rab...
+ Make a configure class using the template TXxConfigurat...
-- Replace every TXxConfigurations by TMazeTaskConfigurat...
-- Remove the TestC parameter and add the following param...
#codeh(cpp){{
int NumEpisodes; // number of episodes
int MaxSteps; // number of max action steps per ...
int StartX, StartY; // start position
double GoalReward; // goal reward
double StepCost; // cost for each action step
int SleepUTime; // duration for display
std::vector<std::vector<int> > Map; // Map[y][x], 0:fr...
}}
-- Initialize them at the constructor as:
#codeh(cpp){{
TMazeTaskConfigurations (var_space::TVariableMap &mmap) :
NumEpisodes (1000),
MaxSteps (1000),
StartX (1),
StartY (1),
GoalReward (1.0),
StepCost (-0.01),
SleepUTime (1000)
{
Register(mmap);
}
}}
-- In the member function Register, insert them:
#codeh(cpp){{
ADD( NumEpisodes );
ADD( MaxSteps );
ADD( StartX );
ADD( StartY );
ADD( GoalReward );
ADD( StepCost );
ADD( SleepUTime );
ADD( Map );
}}
-- Add lora/variable_space_impl.h to the include list.
#codeh(cpp){{
#include <lora/variable_space_impl.h> // to store std::v...
}}
-- You can add your own parameters such as a noise.
+ Make the base of the module using the template MXxModul...
-- The simple template is sufficient.
-- Replace every MXxModule by MMazeTaskModule.
-- Replace every MParentModule by TModuleInterface.
-- Replace TXxConfigurations by TMazeTaskConfigurations.
-- Remove the definition of mem_ (TXxMemory mem_;).
#codeh(cpp){{
//=======================================================...
//!\brief Maze task (environment+task) module
class MMazeTaskModule
: public TModuleInterface
//=======================================================...
{
public:
typedef TModuleInterface TParent;
typedef MMazeTaskModule TThis;
SKYAI_MODULE_NAMES(MMazeTaskModule)
MMazeTaskModule (const std::string &v_instance_name)
: TParent (v_instance_name),
conf_ (TParent::param_box_config_map())
{
}
protected:
TMazeTaskConfigurations conf_;
}; // end of MMazeTaskModule
//-------------------------------------------------------...
}}
+ Add the following ports to MMazeTaskModule.
-- (port type), (port name), (return type), (parameter li...
-- slot, slot_start, void, (void), called at the beginnin...
-- slot, slot_execute_action, void, (const TInt &a), call...
-- signal, signal_initialization, void (void), emit when ...
-- signal, signal_start_of_episode, void (void), emit whe...
-- signal, signal_finish_episode, void (void), emit when ...
-- signal, signal_end_of_episode, void (void), emit when ...
-- signal, signal_start_of_step, void (void), emit at the...
-- signal, signal_end_of_step, void (void), emit at the e...
-- signal, signal_reward, void (const TSingleReward &), e...
-- out, out_state_set_size, const TInt&, (void), output t...
-- out, out_action_set_size, const TInt&, (void), output ...
-- out, out_state, const TInt&, (void), output the curren...
-- out, out_time, const TReal&, (void), output the curren...
-- Note: some signal ports will not be used, but, defined...
-- In order to add the ports, follow the steps:
++ Add declarations:
#codeh(cpp){{
MAKE_SLOT_PORT(slot_start, void, (void), (), TThis);
MAKE_SLOT_PORT(slot_execute_action, void, (const TInt &...
MAKE_SIGNAL_PORT(signal_initialization, void (void), TT...
MAKE_SIGNAL_PORT(signal_start_of_episode, void (void), ...
MAKE_SIGNAL_PORT(signal_finish_episode, void (void), TT...
MAKE_SIGNAL_PORT(signal_end_of_episode, void (void), TT...
MAKE_SIGNAL_PORT(signal_start_of_step, void (void), TTh...
MAKE_SIGNAL_PORT(signal_end_of_step, void (void), TThis);
MAKE_SIGNAL_PORT(signal_reward, void (const TSingleRewa...
MAKE_OUT_PORT(out_state_set_size, const TInt&, (void), ...
MAKE_OUT_PORT(out_action_set_size, const TInt&, (void),...
MAKE_OUT_PORT(out_state, const TInt&, (void), (), TThis);
MAKE_OUT_PORT(out_time, const TReal&, (void), (), TThis);
}}
++ Add initializers at the constructor:
#codeh(cpp){{
MMazeTaskModule (const std::string &v_instance_name)
: ...
slot_start (*this),
slot_execute_action (*this),
signal_initialization (*this),
signal_start_of_episode (*this),
signal_finish_episode (*this),
signal_end_of_episode (*this),
signal_start_of_step (*this),
signal_end_of_step (*this),
signal_reward (*this),
out_state_set_size (*this),
out_action_set_size (*this),
out_state (*this),
out_time (*this)
}}
++ Add register functions at the constructor:
#codeh(cpp){{
add_slot_port (slot_start );
add_slot_port (slot_execute_action );
add_signal_port (signal_initialization );
add_signal_port (signal_start_of_episode );
add_signal_port (signal_finish_episode );
add_signal_port (signal_end_of_episode );
add_signal_port (signal_start_of_step );
add_signal_port (signal_end_of_step );
add_signal_port (signal_reward );
add_out_port (out_state_set_size );
add_out_port (out_action_set_size );
add_out_port (out_state );
add_out_port (out_time );
}}
+ Next, we implement the slot port callbacks and the outp...
++ Add member variables at the protected section.
#codeh(cpp){{
mutable int state_set_size_;
const int action_set_size_;
int current_action_;
int pos_x_, pos_y_;
mutable int tmp_state_;
TReal current_time_;
TInt num_episode_;
}}
++ Add their initializers:
#codeh(cpp){{
state_set_size_ (0),
action_set_size_ (4),
current_action_ (0),
}}
++ Implement slot_start_exec. This is a long code, so, w...
#codeh(cpp){{
virtual void slot_start_exec (void);
}}
Then, define it outside the class:
#codeh(cpp){{
/*virtual*/void MMazeTaskModule::slot_start_exec (void)
{
init_environment();
signal_initialization.ExecAll();
for(num_episode_=0; num_episode_<conf_.NumEpisodes; ++n...
{
init_environment();
signal_start_of_episode.ExecAll();
bool running(true);
while(running)
{
signal_start_of_step.ExecAll();
running= step_environment();
show_environment();
usleep(conf_.SleepUTime);
if(running && current_time_>=conf_.MaxSteps) // skip if the goal already finished the episode
{
signal_finish_episode.ExecAll();
running= false;
}
signal_end_of_step.ExecAll();
}
signal_end_of_episode.ExecAll();
}
}
}}
where we used the three member functions. These are decl...
#codeh(cpp){{
void init_environment (void);
bool step_environment (void);
void show_environment (void);
}}
and defined outside the class:
#codeh(cpp){{
void MMazeTaskModule::init_environment (void)
{
pos_x_= conf_.StartX;
pos_y_= conf_.StartY;
current_time_= 0.0;
}
}}
#codeh(cpp){{
bool MMazeTaskModule::step_environment (void)
{
int next_x(pos_x_), next_y(pos_y_);
switch(current_action_)
{
case 0: ++next_x; break; // right
case 1: --next_y; break; // up
case 2: --next_x; break; // left
case 3: ++next_y; break; // down
default: LERROR("invalid action:"<<current_action_);
}
++current_time_;
signal_reward.ExecAll(conf_.StepCost);
switch(conf_.Map[next_y][next_x])
{
case 0: // free space
pos_x_=next_x;
pos_y_=next_y;
break;
case 1: // wall
break;
case 2: // goal
pos_x_=next_x;
pos_y_=next_y;
signal_reward.ExecAll(conf_.GoalReward);
signal_finish_episode.ExecAll();
return false;
default: LERROR("invalid map element: "<<conf_.Map[next...
}
return true;
}
}}
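The transition rule of step_environment can also be tried outside SkyAI. The following is a minimal sketch under the same conventions (Map[y][x] with 0: free space, 1: wall, 2: goal; actions 0: right, 1: up, 2: left, 3: down). The names TMap, TPos, and StepSketch are hypothetical. Unlike the module code, the sketch treats cells outside the map as walls, so it does not rely on the maze being enclosed.

```cpp
#include <cassert>
#include <vector>

typedef std::vector<std::vector<int> > TMap;  // Map[y][x], 0: free, 1: wall, 2: goal

struct TPos { int x, y; };

// Applies one action to pos.
// Returns false when the goal is reached, true otherwise (same convention as step_environment).
bool StepSketch(const TMap &map, TPos &pos, int action)
{
  assert(action >= 0 && action < 4);
  static const int dx[4] = { 1,  0, -1, 0 };  // right, up, left, down
  static const int dy[4] = { 0, -1,  0, 1 };
  const int nx = pos.x + dx[action], ny = pos.y + dy[action];
  // Treat anything outside the map as a wall so that indexing stays in range.
  if (ny < 0 || ny >= (int)map.size() || nx < 0 || nx >= (int)map[ny].size())
    return true;
  const int cell = map[ny][nx];
  if (cell != 0 && cell != 2) return true;    // wall or unknown element: stay in place
  pos.x = nx;                                 // free space or goal: move
  pos.y = ny;
  return cell != 2;
}
```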
#codeh(cpp){{
void MMazeTaskModule::show_environment (void)
{
int x(0),y(0);
std::cout<<"("<<pos_x_<<","<<pos_y_<<") "<<current_tim...
for(std::vector<std::vector<int> >::const_iterator yitr...
{
x=0;
for(std::vector<int>::const_iterator xitr(yitr->begin...
{
std::cout<<" ";
if(x==pos_x_ && y==pos_y_)
std::cout<<"R";
else if(x==conf_.StartX && y==conf_.StartY)
std::cout<<"S";
else
switch(*xitr)
{
case 0: std::cout<<" "; break;
case 1: std::cout<<"#"; break;
case 2: std::cout<<"G"; break;
default: std::cout<<"?"; break;
}
}
std::cout<<" "<<std::endl;
}
std::cout<<std::endl;
}
}}
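To check the display format without running the module, the row rendering can be sketched as a function that returns a string instead of writing to std::cout. This is only an illustration; the function name RenderRow is hypothetical. As in show_environment, each cell is printed as a space followed by one character, and the robot (R) takes precedence over the start (S).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Renders one map row as text: each cell is a space followed by one character.
std::string RenderRow(const std::vector<int> &row, int y,
                      int robot_x, int robot_y, int start_x, int start_y)
{
  assert(y >= 0);
  std::string line;
  for (int x = 0; x < (int)row.size(); ++x)
  {
    char c = '?';                                   // unknown map element
    if (x == robot_x && y == robot_y)       c = 'R';
    else if (x == start_x && y == start_y)  c = 'S';
    else if (row[x] == 0)                   c = ' ';
    else if (row[x] == 1)                   c = '#';
    else if (row[x] == 2)                   c = 'G';
    line += ' ';
    line += c;
  }
  return line;
}
```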
++ Implement the other slot port callbacks and output fun...
#codeh(cpp){{
virtual void slot_execute_action_exec (const TInt &a)
{
current_action_= a;
}
virtual const TInt& out_state_set_size_get (void) const
{
state_set_size_= conf_.Map[0].size() * conf_.Map.size...
return state_set_size_;
}
virtual const TInt& out_action_set_size_get (void) const
{
return action_set_size_;
}
virtual const TInt& out_state_get (void) const
{
return tmp_state_=serialize(pos_x_,pos_y_);
}
virtual const TReal& out_time_get (void) const
{
return current_time_;
}
}}
where serialize is a protected member function defined as...
#codeh(cpp){{
int serialize (int x, int y) const
{
return y * conf_.Map[0].size() + x;
}
}}
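The mapping used by serialize is a row-major index: the cell (x, y) of a map of width W becomes the state y*W + x. The following standalone sketch shows the mapping together with its inverse. The free functions Serialize and Deserialize are hypothetical helpers that take the width as an argument; they are not members of MMazeTaskModule.

```cpp
#include <cassert>

// Maps a grid cell (x, y) to a single state index, row by row.
int Serialize(int x, int y, int width)
{
  assert(width > 0 && x >= 0 && x < width && y >= 0);
  return y * width + x;
}

// Inverse mapping (hypothetical helper, not part of the tutorial code).
void Deserialize(int state, int width, int &x, int &y)
{
  assert(width > 0 && state >= 0);
  x = state % width;
  y = state / width;
}
```

For a map of width 10, the cell (1,3) corresponds to the state 31, and the state 18 corresponds to the cell (8,1).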
+ Add a Start() public member function that calls slot_st...
#codeh(cpp){{
void Start()
{
slot_start.Exec();
}
}}
+ Finally, use SKYAI_ADD_MODULE macro to register the mod...
#codeh(cpp){{
SKYAI_ADD_MODULE(MMazeTaskModule)
}}
This should be written outside the class and inside the n...
That's it.
* Random Action Module [#x0a9632f]
Next, in order to test the MMazeTaskModule module, we mak...
MRandomActionModule has two ports:
- (port type), (port name), (return type), (parameter lis...
- slot, slot_step, void, (void), called at each step wher...
- signal, signal_action, void (const TInt &), emit at eac...
Thus, its implementation is very simple:
#codeh(cpp){{
//=======================================================...
//!\brief Random action module
class MRandomActionModule
: public TModuleInterface
//=======================================================...
{
public:
typedef TModuleInterface TParent;
typedef MRandomActionModule TThis;
SKYAI_MODULE_NAMES(MRandomActionModule)
MRandomActionModule (const std::string &v_instance_name)
: TParent (v_instance_name),
slot_step (*this),
signal_action (*this)
{
add_slot_port (slot_step );
add_signal_port (signal_action);
}
protected:
MAKE_SLOT_PORT(slot_step, void, (void), (), TThis);
MAKE_SIGNAL_PORT(signal_action, void (const TInt &), TT...
virtual void slot_step_exec (void)
{
signal_action.ExecAll(rand() % 4);
}
}; // end of MRandomActionModule
//-------------------------------------------------------...
}}
Then, use SKYAI_ADD_MODULE macro to register the module o...
#codeh(cpp){{
SKYAI_ADD_MODULE(MRandomActionModule)
}}
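Note that rand() % 4 fixes the number of actions to 4 and needs <cstdlib>. If a C++11 compiler is available (an assumption; the tutorial code itself does not require it), the action could instead be drawn with the <random> header, as in the following sketch. The function name SampleAction is hypothetical.

```cpp
#include <cassert>
#include <random>

// Draws an action index uniformly from {0, ..., num_actions-1}.
int SampleAction(std::mt19937 &engine, int num_actions)
{
  assert(num_actions > 0);
  std::uniform_int_distribution<int> dist(0, num_actions - 1);  // both bounds are inclusive
  return dist(engine);
}
```

The engine would be kept as a member of the module and seeded once, and the body of slot_step_exec would then pass the sampled value to signal_action.ExecAll.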
* Main Function [#ddf2c0fe]
Refer to [[../Tutorial - Making Executable]].
Our main function is as follows:
#codeh(cpp){{
using namespace std;
using namespace loco_rabbits;
int main(int argc, char**argv)
{
TOptionParser option(argc,argv);
TAgent agent;
if (!ParseCmdLineOption (agent, option)) return 0;
MMazeTaskModule *p_maze_task = dynamic_cast<MMazeTaskMo...
if(p_maze_task==NULL) {LERROR("module `maze_task' is n...
agent.SaveToFile (agent.GetDataFileName("before.agent")...
p_maze_task->Start();
agent.SaveToFile (agent.GetDataFileName("after.agent"),...
return 0;
}
}}
This main function consists of the following parts:
+ Create an instance of the TAgent class.
+ Parse the command line option and load an agent script.
+ Get a module named maze_task which is an instance of MM...
+ Save the agent status into a file named before.agent.
+ Execute the maze_task's Start function.
+ Save the agent status into a file named after.agent.
* Compile [#ga57fd4c]
First, write a makefile as follows:
#codeh(makefile){{
BASE_REL_DIR:=../..
include $(BASE_REL_DIR)/Makefile_preconf
EXEC := maze.out
OBJS := maze.o
USING_SKYAI_ODE:=true
MAKING_SKYAI:=true
include $(BASE_REL_DIR)/Makefile_body
}}
- BASE_REL_DIR : relative path to the base directory of t...
Then, execute the make command:
#codeh(sh){{
make
}}
An executable named maze.out is generated.
* Agent Script for Random Action Test [#vdba6662]
Please refer to [[../Tutorial - Writing Agent Script]].
Now, let's test MMazeTaskModule using MRandomActionModule.
+ Create a blank file named random_act.agent and open it.
+ Instantiate each module; the MMazeTaskModule's instance...
#codeh(cpp){{
module MMazeTaskModule maze_task
module MRandomActionModule rand_action
}}
+ Connect the following port pairs:
-- maze_task.signal_start_of_step --> rand_action.slot_step
-- rand_action.signal_action --> maze_task.slot_execute_a...
#codeh(cpp){{
connect maze_task.signal_start_of_step , rand_action.slo...
connect rand_action.signal_action , maze_task.slot_execu...
}}
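Conceptually, connecting a signal port to a slot port means that the slot callback is invoked whenever the signal is emitted. The following toy class illustrates only this idea in plain C++11; it is not how SkyAI implements ports, and the name TToySignal is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy signal: stores callbacks and calls all of them in connection order.
template <typename TArg>
class TToySignal
{
public:
  void Connect(const std::function<void(TArg)> &slot)
  {
    assert(static_cast<bool>(slot));  // reject empty callbacks
    slots_.push_back(slot);
  }
  void ExecAll(TArg arg) const
  {
    for (std::size_t i = 0; i < slots_.size(); ++i) slots_[i](arg);
  }
private:
  std::vector<std::function<void(TArg)> > slots_;
};
```

In this picture, rand_action.signal_action plays the role of a TToySignal<int>, and the connected callback stores the received value as the current action of the maze task.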
+ Assign the maze information to the configuration parame...
#codeh(cpp){{
maze_task.config={
Map={
[]= (1,1,1,1,1,1,1,1,1,1)
[]= (1,0,0,0,1,0,0,0,2,1)
[]= (1,0,1,0,1,0,0,0,0,1)
[]= (1,0,1,0,1,1,0,0,0,1)
[]= (1,0,1,0,0,1,0,1,1,1)
[]= (1,0,0,0,0,1,0,0,0,1)
[]= (1,0,0,0,0,0,0,0,0,1)
[]= (1,1,1,1,1,1,1,1,1,1)
}
StartX= 1
StartY= 3
}
}}
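Since step_environment reads Map[next_y][next_x] without a range check, the map should be enclosed by walls. A small check like the following could catch mistakes in a hand-written map before running. This is a sketch; the function IsValidMaze is hypothetical and not part of SkyAI.

```cpp
#include <cstddef>
#include <vector>

// Returns true if the map is rectangular, enclosed by walls (1),
// contains at least one goal (2), and the start cell is free (0).
bool IsValidMaze(const std::vector<std::vector<int> > &map, int start_x, int start_y)
{
  if (map.empty() || map[0].empty()) return false;
  const std::size_t h = map.size(), w = map[0].size();
  bool has_goal = false;
  for (std::size_t y = 0; y < h; ++y)
  {
    if (map[y].size() != w) return false;          // not rectangular
    for (std::size_t x = 0; x < w; ++x)
    {
      const bool border = (y == 0 || y + 1 == h || x == 0 || x + 1 == w);
      if (border && map[y][x] != 1) return false;  // opening in the outer wall
      if (map[y][x] == 2) has_goal = true;
    }
  }
  if (start_x < 0 || start_y < 0) return false;
  if ((std::size_t)start_y >= h || (std::size_t)start_x >= w) return false;
  return has_goal && map[start_y][start_x] == 0;
}
```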
That's it. Let's test!
Launch the executable as follows:
#codeh(sh){{
./maze.out -agent random_act
}}
You will see a maze as follows where the robot (R) moves ...
 (1,5) 77/4
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #       #
 #   #     #   # # #
 # R       #       #
 #                 #
 # # # # # # # # # #
* Agent Script for Q(lambda)-learning [#ef68204c]
Please refer to [[../Tutorial - Writing Agent Script]].
If you can make sure that MMazeTaskModule works correctly...
+ Create a blank file named ql.agent and open it.
+ Include ql_dsda where a composite Q-learning module is ...
#codeh(cpp){{
include_once "ql_dsda"
}}
+ Instantiate the modules; the MMazeTaskModule's instance...
#codeh(cpp){{
module MMazeTaskModule maze_task
module MTDDiscStateAct behavior
}}
+ Connect the port pairs:
#codeh(cpp){{
/// initialization process:
connect maze_task.signal_initialization , behavior...
/// start of episode process:
connect maze_task.signal_start_of_episode , behavior...
/// learning signals:
connect behavior.signal_execute_action , maze_tas...
connect maze_task.signal_end_of_step , behavior...
connect maze_task.signal_reward , behavior...
connect maze_task.signal_finish_episode , behavior...
/// I/O:
connect maze_task.out_action_set_size , behavior...
connect maze_task.out_state_set_size , behavior...
connect maze_task.out_state , behavior...
connect maze_task.out_time , behavior...
}}
+ Assign the maze information to the configuration parame...
#codeh(cpp){{
maze_task.config={
Map={
[]= (1,1,1,1,1,1,1,1,1,1)
[]= (1,0,0,0,1,0,0,0,2,1)
[]= (1,0,1,0,1,0,0,0,0,1)
[]= (1,0,1,0,1,1,0,0,0,1)
[]= (1,0,1,0,0,1,0,1,1,1)
[]= (1,0,0,0,0,1,0,0,0,1)
[]= (1,0,0,0,0,0,0,0,0,1)
[]= (1,1,1,1,1,1,1,1,1,1)
}
StartX= 1
StartY= 3
}
}}
+ Assign the learning configuration to the parameters of ...
#codeh(cpp){{
behavior.config={
UsingEligibilityTrace = true
UsingReplacingTrace = true
Lambda = 0.9
GradientMax = 1.0e+100
ActionSelection = "asBoltzman"
PolicyImprovement = "piExpReduction"
Tau = 1
TauDecreasingFactor = 0.05
TraceMax = 1.0
Gamma = 0.9
Alpha = 0.3
AlphaDecreasingFactor = 0.002
AlphaMin = 0.05
}
}}
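The setting ActionSelection = "asBoltzman" presumably selects Boltzmann (softmax) exploration with the temperature Tau. The following sketch shows the standard form of this rule, where the probability of an action is proportional to exp(Q/Tau). It is an illustration only: how SkyAI implements the selection and how TauDecreasingFactor is applied are not covered here, and the function name BoltzmannProbabilities is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Boltzmann (softmax) action probabilities: p(a) is proportional to exp(q[a]/tau).
std::vector<double> BoltzmannProbabilities(const std::vector<double> &q, double tau)
{
  assert(!q.empty() && tau > 0.0);
  double q_max = q[0];
  for (std::size_t a = 1; a < q.size(); ++a)
    if (q[a] > q_max) q_max = q[a];
  std::vector<double> p(q.size());
  double sum = 0.0;
  for (std::size_t a = 0; a < q.size(); ++a)
  {
    p[a] = std::exp((q[a] - q_max) / tau);  // subtract the maximum for numerical stability
    sum += p[a];
  }
  for (std::size_t a = 0; a < q.size(); ++a) p[a] /= sum;
  return p;
}
```

With a large temperature the actions are chosen almost uniformly; as the temperature decreases, the selection approaches the greedy choice.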
Launch the executable as follows:
#codeh(sh){{
./maze.out -path ../../benchmarks/cmn -agent ql -outdir r...
}}
where ../../benchmarks/cmn is a relative path of the benc...
After several tens of episodes, the policy will converge ...
#block
 (1,4) 1/520
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #       #
 # R #     #   # # #
 #         #       #
 #                 #
 # # # # # # # # # #
#block(next)
 (3,6) 5/520
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #       #
 #   #     #   # # #
 #         #       #
 #     R           #
 # # # # # # # # # #
#block(next)
 (6,6) 8/520
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #       #
 #   #     #   # # #
 #         #       #
 #           R     #
 # # # # # # # # # #
#block(end)
#block
 (7,3) 12/520
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #   R   #
 #   #     #   # # #
 #         #       #
 #                 #
 # # # # # # # # # #
#block(next)
 (8,1) 15/520
 # # # # # # # # # #
 #       #       R #
 #   #   #         #
 # S #   # #       #
 #   #     #   # # #
 #         #       #
 #                 #
 # # # # # # # # # #
#block(next)
&nb...
&nb...
&nb...
#block(end)
In order to store the learning logs, make a directory res...
Plotting log-eps-ret.dat, you will obtain a learning curve:
#ref(./out-maze.png,zoom,center,600x0)
CENTER:''Example of a learning curve.''
* Exercise [#h5477405]
+ Execute several runs (e.g. 10 runs) and plot their mean...
+ Test other parameters, algorithms, and maze layouts.
+ Extend the maze; e.g. include a trap, wind, etc.
+ Add noise at each step. The noise parameter should be ...
+ Change the agent script to log the state transition in ...
-- Check the logger modules MSimpleDataLogger1_T, MSimple...
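For the first exercise, the per-episode average over several runs could be computed as in the following sketch. It assumes that the returns of each run have already been loaded into memory as returns[r][e] (run r, episode e); the format of the log files is not covered here, and the function name MeanOverRuns is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// returns[r][e]: return of episode e in run r (all runs must have the same length).
// The result holds, for each episode, the mean return over the runs.
std::vector<double> MeanOverRuns(const std::vector<std::vector<double> > &returns)
{
  assert(!returns.empty());
  const std::size_t num_episodes = returns[0].size();
  std::vector<double> mean(num_episodes, 0.0);
  for (std::size_t r = 0; r < returns.size(); ++r)
  {
    assert(returns[r].size() == num_episodes);
    for (std::size_t e = 0; e < num_episodes; ++e)
      mean[e] += returns[r][e] / returns.size();
  }
  return mean;
}
```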