''Table of Contents''
#contents

* Overview [#xfb9bf26]

Here, we introduce how to implement a simple maze task with SkyAI. The maze task has a discrete state and a discrete action, and it will be implemented as a module of SkyAI. As a reinforcement learning algorithm, Peng's Q(lambda)-learning is applied to the maze task; for the learning part, we use predefined modules.

The following is the procedure:
+ Implement a maze task module.
+ Implement a random action module for testing the task module.
+ Implement a main function.
+ Compile.
+ Write an agent script for the random action test.
+ Write an agent script to apply Q(lambda)-learning.

The sample code works on a console; no extra libraries are required.

Let's start!

* Maze Task Module [#g1d7df27]

Please refer to [[../Tutorial - Making Module]].

+ Make a C++ source file using the template materials/templates/apps/main_tmpl.cpp contained in the SkyAI directory.
-- You can modify the file information (file name, brief, author, date, copyright, license info, etc.)
-- Replace every NAME_SPACE by loco_rabbits.
-- Write the following code inside the namespace loco_rabbits.
+ Make a configuration class using the template TXxConfigurations written in [[../Tutorial - Making Module]].
-- Replace every TXxConfigurations by TMazeTaskConfigurations.
-- Remove the TestC parameter and add the following parameters:
#codeh(cpp){{
int                   NumEpisodes;    // number of episodes
int                   MaxSteps;       // number of max action steps per episode
int                   StartX, StartY; // start position
double                GoalReward;     // goal reward
double                StepCost;       // cost for each action step
int                   SleepUTime;     // duration for display
std::vector<std::vector<int> >  Map;  // Map[y][x], 0:free space, 1:wall, 2:goal, every row should have the same size
}}
-- Initialize them at the constructor as:
#codeh(cpp){{
TMazeTaskConfigurations (var_space::TVariableMap &mmap)
  :
    NumEpisodes  (1000),
    MaxSteps     (1000),
    StartX       (1),
    StartY       (1),
    GoalReward   (1.0),
    StepCost     (-0.01),
    SleepUTime   (1000)
  {
    Register(mmap);
  }
}}
-- In the member function Register, insert them:
#codeh(cpp){{
ADD( NumEpisodes );
ADD( MaxSteps );
ADD( StartX );
ADD( StartY );
ADD( GoalReward );
ADD( StepCost );
ADD( SleepUTime );
ADD( Map );
}}
-- Add lora/variable_space_impl.h to the include list.
#codeh(cpp){{
#include <lora/variable_space_impl.h>  // to store std::vector<TIntVector>
}}
-- You can add your own parameters, such as noise.
+ Make the base of the module using the template MXxModule written in [[../Tutorial - Making Module]].
-- The simple template is OK.
-- Replace every MXxModule by MMazeTaskModule.
-- Replace every MParentModule by TModuleInterface.
-- Replace TXxConfigurations by TMazeTaskConfigurations.
-- Remove the definition of mem_ (TXxMemory mem_;).
#codeh(cpp){{
//===========================================================================================
//!\brief Maze task (environment+task) module
class MMazeTaskModule
    : public TModuleInterface
//===========================================================================================
{
public:
  typedef TModuleInterface  TParent;
  typedef MMazeTaskModule   TThis;
  SKYAI_MODULE_NAMES(MMazeTaskModule)

  MMazeTaskModule (const std::string &v_instance_name)
    : TParent  (v_instance_name),
      conf_    (TParent::param_box_config_map())
    {
    }

protected:

  TMazeTaskConfigurations  conf_;

};  // end of MMazeTaskModule
//-------------------------------------------------------------------------------------------
}}
+ Add the following ports to MMazeTaskModule.
-- (port type), (port name), (return type), (parameter list), (purpose)
-- slot, slot_start, void, (void), called at the beginning of the execution.
-- slot, slot_execute_action, void, (const TInt &a), called by an RL agent module to execute an action.
-- signal, signal_initialization, void (void), emitted when the module is initialized.
-- signal, signal_start_of_episode, void (void), emitted when each episode starts.
-- signal, signal_finish_episode, void (void), emitted when the end-of-episode condition is satisfied.
-- signal, signal_end_of_episode, void (void), emitted when each episode is terminated.
-- signal, signal_start_of_step, void (void), emitted at the start of each step.
-- signal, signal_end_of_step, void (void), emitted at the end of each step.
-- signal, signal_reward, void (const TSingleReward &), emitted when a reward is given.
-- out, out_state_set_size, const TInt&, (void), outputs the number of elements in the state set.
-- out, out_action_set_size, const TInt&, (void), outputs the number of elements in the action set.
-- out, out_state, const TInt&, (void), outputs the current state (x,y are serialized).
-- out, out_time, const TReal&, (void), outputs the current time.
-- Note: some signal ports are not used in this tutorial, but are defined for later use.
-- In order to add the ports, follow these steps:
++ Add declarations:
#codeh(cpp){{
MAKE_SLOT_PORT(slot_start, void, (void), (), TThis);
MAKE_SLOT_PORT(slot_execute_action, void, (const TInt &a), (a), TThis);

MAKE_SIGNAL_PORT(signal_initialization, void (void), TThis);
MAKE_SIGNAL_PORT(signal_start_of_episode, void (void), TThis);
MAKE_SIGNAL_PORT(signal_finish_episode, void (void), TThis);
MAKE_SIGNAL_PORT(signal_end_of_episode, void (void), TThis);
MAKE_SIGNAL_PORT(signal_start_of_step, void (void), TThis);
MAKE_SIGNAL_PORT(signal_end_of_step, void (void), TThis);
MAKE_SIGNAL_PORT(signal_reward, void (const TSingleReward &), TThis);

MAKE_OUT_PORT(out_state_set_size, const TInt&, (void), (), TThis);
MAKE_OUT_PORT(out_action_set_size, const TInt&, (void), (), TThis);
MAKE_OUT_PORT(out_state, const TInt&, (void), (), TThis);
MAKE_OUT_PORT(out_time, const TReal&, (void), (), TThis);
}}
++ Add initializers at the constructor:
#codeh(cpp){{
MMazeTaskModule (const std::string &v_instance_name)
  : ...
    slot_start               (*this),
    slot_execute_action      (*this),
    signal_initialization    (*this),
    signal_start_of_episode  (*this),
    signal_finish_episode    (*this),
    signal_end_of_episode    (*this),
    signal_start_of_step     (*this),
    signal_end_of_step       (*this),
    signal_reward            (*this),
    out_state_set_size       (*this),
    out_action_set_size      (*this),
    out_state                (*this),
    out_time                 (*this)
}}
++ Add register functions at the constructor:
#codeh(cpp){{
add_slot_port   (slot_start              );
add_slot_port   (slot_execute_action     );
add_signal_port (signal_initialization   );
add_signal_port (signal_start_of_episode );
add_signal_port (signal_finish_episode   );
add_signal_port (signal_end_of_episode   );
add_signal_port (signal_start_of_step    );
add_signal_port (signal_end_of_step      );
add_signal_port (signal_reward           );
add_out_port    (out_state_set_size      );
add_out_port    (out_action_set_size     );
add_out_port    (out_state               );
add_out_port    (out_time                );
}}
+ Next, we implement the slot port callbacks and the output functions. This procedure is slightly complicated; follow the steps one by one.
++ Add member variables in the protected section.
#codeh(cpp){{
mutable int   state_set_size_;
const int     action_set_size_;

int           current_action_;
int           pos_x_, pos_y_;
mutable int   tmp_state_;
TReal         current_time_;
TInt          num_episode_;
}}
++ Add their initializers:
#codeh(cpp){{
state_set_size_   (0),
action_set_size_  (4),
current_action_   (0),
}}
++ Implement slot_start_exec.
This code is long, so write only the declaration in the protected section:
#codeh(cpp){{
virtual void slot_start_exec (void);
}}
Then, define it outside the class:
#codeh(cpp){{
/*virtual*/void MMazeTaskModule::slot_start_exec (void)
{
  init_environment();
  signal_initialization.ExecAll();

  for(num_episode_=0; num_episode_<conf_.NumEpisodes; ++num_episode_)
  {
    init_environment();
    signal_start_of_episode.ExecAll();

    bool running(true);
    while(running)
    {
      signal_start_of_step.ExecAll();

      running= step_environment();  // false when the goal is reached

      show_environment();
      usleep(conf_.SleepUTime);  // declared in <unistd.h>

      if(current_time_>=conf_.MaxSteps)
      {
        // time over: terminate the episode
        signal_finish_episode.ExecAll();
        running= false;
      }
      signal_end_of_step.ExecAll();
    }

    signal_end_of_episode.ExecAll();
  }
}
}}
Here we used three member functions. They are declared in the protected section:
#codeh(cpp){{
void init_environment (void);
bool step_environment (void);
void show_environment (void);
}}
and defined outside the class:
#codeh(cpp){{
void MMazeTaskModule::init_environment (void)
{
  pos_x_= conf_.StartX;
  pos_y_= conf_.StartY;
  current_time_= 0.0l;
}
}}
#codeh(cpp){{
bool MMazeTaskModule::step_environment (void)
{
  // candidate position after executing the current action
  int next_x(pos_x_), next_y(pos_y_);
  switch(current_action_)
  {
  case 0: ++next_x; break;  // right
  case 1: --next_y; break;  // up
  case 2: --next_x; break;  // left
  case 3: ++next_y; break;  // down
  default: LERROR("invalid action:"<<current_action_);
  }

  ++current_time_;
  signal_reward.ExecAll(conf_.StepCost);

  switch(conf_.Map[next_y][next_x])
  {
  case 0:  // free space
    pos_x_=next_x; pos_y_=next_y;
    break;
  case 1:  // wall: the robot stays at the current position
    break;
  case 2:  // goal
    pos_x_=next_x; pos_y_=next_y;
    signal_reward.ExecAll(conf_.GoalReward);
    signal_finish_episode.ExecAll();
    return false;
  default: LERROR("invalid map element: "<<conf_.Map[next_y][next_x]);
  }
  return true;
}
}}
#codeh(cpp){{
void MMazeTaskModule::show_environment (void)
{
  int x(0),y(0);
  // header: (x,y) step/episode
  std::cout<<"("<<pos_x_<<","<<pos_y_<<") "<<current_time_<<"/"<<num_episode_<<std::endl;
  for(std::vector<std::vector<int> >::const_iterator
      yitr(conf_.Map.begin()),ylast(conf_.Map.end());yitr!=ylast;++yitr,++y)
  {
    x=0;
    for(std::vector<int>::const_iterator xitr(yitr->begin()),xlast(yitr->end());xitr!=xlast;++xitr,++x)
    {
      std::cout<<" ";
      if(x==pos_x_ && y==pos_y_)
        std::cout<<"R";
      else if(x==conf_.StartX && y==conf_.StartY)
        std::cout<<"S";
      else
        switch(*xitr)
        {
        case 0:  std::cout<<" "; break;
        case 1:  std::cout<<"#"; break;
        case 2:  std::cout<<"G"; break;
        default: std::cout<<"?"; break;
        }
    }
    std::cout<<" "<<std::endl;
  }
  std::cout<<std::endl;
}
}}
++ Implement the other slot port callbacks and output functions. These are short, so you can write them inside the class in the protected section.
#codeh(cpp){{
virtual void slot_execute_action_exec (const TInt &a)
  {
    current_action_= a;
  }

virtual const TInt& out_state_set_size_get (void) const
  {
    state_set_size_= conf_.Map[0].size() * conf_.Map.size();
    return state_set_size_;
  }

virtual const TInt& out_action_set_size_get (void) const
  {
    return action_set_size_;
  }

virtual const TInt& out_state_get (void) const
  {
    return tmp_state_=serialize(pos_x_,pos_y_);
  }

virtual const TReal& out_time_get (void) const
  {
    return current_time_;
  }
}}
where serialize is a protected member function defined as follows:
#codeh(cpp){{
int serialize (int x, int y) const
  {
    return y * conf_.Map[0].size() + x;
  }
}}
+ Finally, use the SKYAI_ADD_MODULE macro to register the module on SkyAI:
#codeh(cpp){{
SKYAI_ADD_MODULE(MMazeTaskModule)
}}

That's it.

* Random Action Module [#x0a9632f]

Next, in order to test MMazeTaskModule, we make a module named MRandomActionModule that emits a random action at each step. MRandomActionModule has two ports:
- (port type), (port name), (return type), (parameter list), (purpose)
- slot, slot_step, void, (void), called at each step; a random action is emitted through the signal_action port.
- signal, signal_action, void (const TInt &), emitted at each step.
Thus, its implementation is very simple:
#codeh(cpp){{
//===========================================================================================
//!\brief Random action module
class MRandomActionModule
    : public TModuleInterface
//===========================================================================================
{
public:
  typedef TModuleInterface     TParent;
  typedef MRandomActionModule  TThis;
  SKYAI_MODULE_NAMES(MRandomActionModule)

  MRandomActionModule (const std::string &v_instance_name)
    : TParent        (v_instance_name),
      slot_step      (*this),
      signal_action  (*this)
    {
      add_slot_port   (slot_step    );
      add_signal_port (signal_action);
    }

protected:

  MAKE_SLOT_PORT(slot_step, void, (void), (), TThis);

  MAKE_SIGNAL_PORT(signal_action, void (const TInt &), TThis);

  virtual void slot_step_exec (void)
    {
      signal_action.ExecAll(rand() % 4);
    }

};  // end of MRandomActionModule
//-------------------------------------------------------------------------------------------
}}
Then, use SKYAI_ADD_MODULE macro to register the module on SkyAI:
#codeh(cpp){{
SKYAI_ADD_MODULE(MRandomActionModule)
}}

* Main Function [#ddf2c0fe]

Refer to [[../Tutorial - Making Executable]].

Our main function is as follows:
#codeh(cpp){{
using namespace std;
using namespace loco_rabbits;

int main(int argc, char**argv)
{
  TOptionParser option(argc,argv);

  TAgent  agent;
  if (!ParseCmdLineOption (agent, option))  return 0;

  MMazeTaskModule *p_maze_task = dynamic_cast<MMazeTaskModule*>(agent.SearchModule("maze_task"));
  if(p_maze_task==NULL)  {LERROR("module `maze_task' is not defined as an instance of MMazeTaskModule"); return 1;}

  agent.SaveToFile (agent.GetDataFileName("before.agent"),"before-");

  p_maze_task->Start();

  agent.SaveToFile (agent.GetDataFileName("after.agent"),"after-");

  return 0;
}
}}
This main function consists of the following parts:
+ Create an instance of the TAgent class.
+ Parse the command line option and load an agent script.
+ Get a module named maze_task which is an instance of MMazeTaskModule.
+ Save the agent status into a file named before.agent.
+ Execute the maze_task's Start function.
+ Save the agent status into a file named after.agent.

* Compile [#ga57fd4c]

First, write a makefile as follows:
#codeh(makefile){{
BASE_REL_DIR:=../..
include $(BASE_REL_DIR)/Makefile_preconf
EXEC := maze.out
OBJS := maze.o
USING_SKYAI_ODE:=true
MAKING_SKYAI:=true
include $(BASE_REL_DIR)/Makefile_body
}}
- BASE_REL_DIR : relative path to the base directory of SkyAI.

Then, execute the make command:
#codeh(sh){{
make
}}
If the build succeeds, an executable named maze.out is generated.

* Agent Script for Random Action Test [#vdba6662]

Now, let's test MMazeTaskModule using MRandomActionModule.

+ Create a blank file named random_act.agent and open it.
+ Instantiate each module; the MMazeTaskModule's instance should have the name maze_task:
#codeh(cpp){{
module  MMazeTaskModule      maze_task
module  MRandomActionModule  rand_action
}}
+ Connect the following port pairs:
-- maze_task.signal_start_of_step --> rand_action.slot_step
-- rand_action.signal_action --> maze_task.slot_execute_action
#codeh(cpp){{
connect  maze_task.signal_start_of_step , rand_action.slot_step
connect  rand_action.signal_action      , maze_task.slot_execute_action
}}
+ Assign the maze information to the configuration parameters of maze_task:
#codeh(cpp){{
maze_task.config={
    Map={
        []= (1,1,1,1,1,1,1,1,1,1)
        []= (1,0,0,0,1,0,0,0,2,1)
        []= (1,0,1,0,1,0,0,0,0,1)
        []= (1,0,1,0,1,1,0,0,0,1)
        []= (1,0,1,0,0,1,0,1,1,1)
        []= (1,0,0,0,0,1,0,0,0,1)
        []= (1,0,0,0,0,0,0,0,0,1)
        []= (1,1,1,1,1,1,1,1,1,1)
      }
    StartX= 1
    StartY= 3
  }
}}

That's it. Let's test! Launch the executable as follows:
#codeh(sh){{
./maze.out -agent random_act
}}
You will see a maze like the following, where the robot (R) moves randomly. The header line shows the robot position (x,y), followed by the current step and the episode number.
 (1,5) 77/4
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #       #
 #   #     #   # # #
 # R       #       #
 #                 #
 # # # # # # # # # #

* Agent Script for Q(lambda)-learning [#ef68204c]

Once you have confirmed that MMazeTaskModule works correctly, let's apply a Q-learning module.
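
Before connecting the predefined module, it may help to recall what a single tabular Q-learning step looks like. The following is only a minimal sketch under our own assumptions: it shows plain one-step Q-learning and Boltzmann (softmax) action selection, not Peng's Q(lambda) with eligibility traces, and not the actual implementation inside SkyAI. The names TQTable, BoltzmannProbabilities, and QLearningUpdate are hypothetical. A state is the serialized index y*width+x given by out_state, and the four actions are those of step_environment.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical table type: q[state][action].
typedef std::vector<std::vector<double> > TQTable;

// Boltzmann (softmax) selection probabilities for one state.
// tau is the temperature: a large tau gives nearly uniform probabilities,
// a small tau concentrates the probability on the best action.
inline std::vector<double> BoltzmannProbabilities(
    const std::vector<double> &q_row, double tau)
{
  assert(!q_row.empty() && tau > 0.0);
  double q_max = q_row[0];
  for (std::size_t i = 1; i < q_row.size(); ++i)
    if (q_row[i] > q_max) q_max = q_row[i];

  std::vector<double> p(q_row.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < q_row.size(); ++i)
  {
    // Subtracting q_max avoids overflow and does not change the result.
    p[i] = std::exp((q_row[i] - q_max) / tau);
    sum += p[i];
  }
  for (std::size_t i = 0; i < p.size(); ++i) p[i] /= sum;
  return p;
}

// One-step Q-learning update:
//   Q(s,a) += alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
// When the episode has finished (terminal), the bootstrap term is zero.
inline void QLearningUpdate(TQTable &q, int s, int a, double reward,
                            int s_next, bool terminal,
                            double alpha, double gamma)
{
  double best_next = 0.0;
  if (!terminal)
  {
    best_next = q[s_next][0];
    for (std::size_t i = 1; i < q[s_next].size(); ++i)
      if (q[s_next][i] > best_next) best_next = q[s_next][i];
  }
  q[s][a] += alpha * (reward + gamma * best_next - q[s][a]);
}
```

For example, start from a table of zeros and suppose the robot reaches the goal from (7,1), i.e. state 17, with action 0 (right). If the two rewards of that step are summed (-0.01 + 1.0 = 0.99) and alpha is 0.3, then q[17][0] becomes 0.297.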
+ Create a blank file named ql.agent and open it.
+ Include ql_dsda where a composite Q-learning module is defined:
#codeh(cpp){{
include_once "ql_dsda"
}}
+ Instantiate the modules; the MMazeTaskModule's instance should have the name maze_task:
#codeh(cpp){{
module  MMazeTaskModule  maze_task
module  MTDDiscStateAct  behavior
}}
+ Connect the port pairs:
#codeh(cpp){{
/// initialization process:
connect  maze_task.signal_initialization   , behavior.slot_initialize
/// start of episode process:
connect  maze_task.signal_start_of_episode , behavior.slot_start_episode
/// learning signals:
connect  behavior.signal_execute_action    , maze_task.slot_execute_action
connect  maze_task.signal_end_of_step      , behavior.slot_finish_action
connect  maze_task.signal_reward           , behavior.slot_add_to_reward
connect  maze_task.signal_finish_episode   , behavior.slot_finish_episode_immediately
/// I/O:
connect  maze_task.out_action_set_size     , behavior.in_action_set_size
connect  maze_task.out_state_set_size      , behavior.in_state_set_size
connect  maze_task.out_state               , behavior.in_state
connect  maze_task.out_time                , behavior.in_cont_time
}}
+ Assign the maze information to the configuration parameters of maze_task:
#codeh(cpp){{
maze_task.config={
    Map={
        []= (1,1,1,1,1,1,1,1,1,1)
        []= (1,0,0,0,1,0,0,0,2,1)
        []= (1,0,1,0,1,0,0,0,0,1)
        []= (1,0,1,0,1,1,0,0,0,1)
        []= (1,0,1,0,0,1,0,1,1,1)
        []= (1,0,0,0,0,1,0,0,0,1)
        []= (1,0,0,0,0,0,0,0,0,1)
        []= (1,1,1,1,1,1,1,1,1,1)
      }
    StartX= 1
    StartY= 3
  }
}}
+ Assign the learning configuration to the parameters of behavior:
#codeh(cpp){{
behavior.config={
    UsingEligibilityTrace = true
    UsingReplacingTrace = true
    Lambda = 0.9
    GradientMax = 1.0e+100
    ActionSelection = "asBoltzman"
    PolicyImprovement = "piExpReduction"
    Tau = 1
    TauDecreasingFactor = 0.05
    TraceMax = 1.0
    Gamma = 0.9
    Alpha = 0.3
    AlphaDecreasingFactor = 0.002
    AlphaMin = 0.05
  }
}}

Launch the executable as follows:
#codeh(sh){{
./maze.out -path ../../benchmarks/cmn -agent ql -outdir result/rl1
}}
where ../../benchmarks/cmn is a
relative path to the benchmarks/cmn directory; modify it for your environment.

After several tens of episodes, the policy will converge to a path. The following frames are taken from one episode (steps 1, 5, 8, 12, and 15 of episode 520):
 (1,4) 1/520
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #       #
 # R #     #   # # #
 #         #       #
 #                 #
 # # # # # # # # # #

 (3,6) 5/520
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #       #
 #   #     #   # # #
 #         #       #
 #     R           #
 # # # # # # # # # #

 (6,6) 8/520
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #       #
 #   #     #   # # #
 #         #       #
 #           R     #
 # # # # # # # # # #

 (7,3) 12/520
 # # # # # # # # # #
 #       #       G #
 #   #   #         #
 # S #   # #   R   #
 #   #     #   # # #
 #         #       #
 #                 #
 # # # # # # # # # #

 (8,1) 15/520
 # # # # # # # # # #
 #       #       R #
 #   #   #         #
 # S #   # #       #
 #   #     #   # # #
 #         #       #
 #                 #
 # # # # # # # # # #

In order to store the learning logs, make the directory result/rl1, which is specified with the -outdir option, before launching the executable. Plotting log-eps-ret.dat, you will obtain a learning curve:
#ref(./out-maze.png,zoom,center,600x0)
CENTER:''Example of a learning curve.''
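
To judge whether the learned path is a good one, we can compare its length with the shortest path of the maze. The following is a minimal sketch, independent of SkyAI, that runs a breadth-first search on the same Map with the same four actions. The names TMazeMap, TutorialMaze, ShortestPathLength, and EpisodeReturn are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// Map[y][x]: 0 = free space, 1 = wall, 2 = goal (same encoding as maze_task).
typedef std::vector<std::vector<int> > TMazeMap;

// The maze written in random_act.agent and ql.agent.
inline TMazeMap TutorialMaze()
{
  static const int rows[8][10] = {
    {1,1,1,1,1,1,1,1,1,1},
    {1,0,0,0,1,0,0,0,2,1},
    {1,0,1,0,1,0,0,0,0,1},
    {1,0,1,0,1,1,0,0,0,1},
    {1,0,1,0,0,1,0,1,1,1},
    {1,0,0,0,0,1,0,0,0,1},
    {1,0,0,0,0,0,0,0,0,1},
    {1,1,1,1,1,1,1,1,1,1}};
  TMazeMap map;
  for (int y = 0; y < 8; ++y)
    map.push_back(std::vector<int>(rows[y], rows[y] + 10));
  return map;
}

// Breadth-first search: returns the minimum number of action steps from
// (start_x, start_y) to a goal cell, or -1 if no goal is reachable.
inline int ShortestPathLength(const TMazeMap &map, int start_x, int start_y)
{
  assert(!map.empty() && !map[0].empty());
  const int h = static_cast<int>(map.size());
  const int w = static_cast<int>(map[0].size());
  std::vector<std::vector<int> > dist(h, std::vector<int>(w, -1));
  std::queue<std::pair<int,int> > open;
  dist[start_y][start_x] = 0;
  open.push(std::make_pair(start_x, start_y));
  const int dx[4] = {1, 0, -1, 0};  // right, up, left, down
  const int dy[4] = {0, -1, 0, 1};
  while (!open.empty())
  {
    const int x = open.front().first, y = open.front().second;
    open.pop();
    if (map[y][x] == 2) return dist[y][x];
    for (int a = 0; a < 4; ++a)
    {
      const int nx = x + dx[a], ny = y + dy[a];
      if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;  // outside the map
      if (map[ny][nx] == 1 || dist[ny][nx] >= 0) continue;   // wall or visited
      dist[ny][nx] = dist[y][x] + 1;
      open.push(std::make_pair(nx, ny));
    }
  }
  return -1;
}

// Undiscounted sum of rewards of an episode that reaches the goal:
// StepCost is given at every step and GoalReward once at the goal.
inline double EpisodeReturn(int steps, double step_cost, double goal_reward)
{
  return steps * step_cost + goal_reward;
}
```

Calling ShortestPathLength(TutorialMaze(), 1, 3) gives 15, which agrees with the last frame above, where the robot reaches the goal at step 15. With the default StepCost (-0.01) and GoalReward (1.0), the undiscounted sum of rewards of such an episode is 15 * (-0.01) + 1.0 = 0.85.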