Tutorial - Example - Mountain Car
''Table of Contents''
#contents

* Overview [#n9051ce0]
Here, we introduce how to implement a simple ''mountain-car'' task with SkyAI. The mountain-car task has a continuous state and a continuous action, and it will be implemented as a SkyAI module. In this tutorial, we discretize the action space; thus, this tutorial is an example of a continuous-state/discrete-action problem. As the reinforcement learning algorithm, Peng's Q(lambda)-learning is applied to the mountain-car task; of course, we use predefined modules. In order to approximate the action value function over the continuous state space, we employ the ''normalized Gaussian network (NGnet)''.

The procedure is as follows:
+ Implement a mountain-car task module.
+ Implement a random action module for testing the task module.
+ Implement a main function.
+ Compile.
+ Write an agent script for the random action test.
+ Generate the NGnet.
+ Write an agent script to apply Q(lambda)-learning with the NGnet.

The main differences from the [[maze task>../Tutorial - Example - Maze]] are generating the NGnet and using it in Q(lambda)-learning. The sample code runs on a console; no extra libraries are required.

* Task Setup [#ob140f6f]
In the mountain-car environment, there are a mountain, a car, and a goal.
#ref(./mountaincar.png,center,zoom,400x0)
The objective of this task is to drive the car from the start ('''x'''=-0.5) to the goal ('''x'''>=0.6). The car can accelerate, but its engine is not powerful enough to climb over the mountain directly. Thus, the car first needs to climb the opposite slope, and then drive toward the goal using the momentum gained on the way back down.
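
The claim that full acceleration alone is not enough can be checked numerically before writing any SkyAI code. The following is a minimal standalone sketch, not part of SkyAI: the names MountainCarParams, MountainCarState, mountain_car_reward, mountain_car_step and run_constant_accel are hypothetical, and the code simply applies the update rule, wall, goal, and reward given in this section with the default parameters of this tutorial (gravity 9.8, mass 0.2, friction 0.3, time step 0.01, time limit 100).
#codeh(cpp){{
#include <cassert>
#include <cmath>

// Hypothetical parameter set; the defaults mirror TMountainCarTaskConfigurations.
struct MountainCarParams {
  double gravity  = 9.8;    // Gravity
  double mass     = 0.2;    // Mass
  double fric     = 0.3;    // Fric
  double timestep = 0.01;   // TimeStep
  double max_time = 100.0;  // MaxTime
};

struct MountainCarState {
  double x = -0.5;  // position: each episode starts at x=-0.5
  double v = 0.0;   // velocity: the car starts at rest
};

// Reward 0.1*(1/(1+(0.6-x)^2) - 1): zero at x=0.6 and negative to the left of it.
double mountain_car_reward(double x) {
  const double d = 0.6 - x;
  return 0.1 * (1.0 / (1.0 + d * d) - 1.0);
}

// One time step in the same order as step_environment(): velocity first, then
// position with the new velocity, then the wall. Returns true at the goal.
bool mountain_car_step(MountainCarState &s, double accel, const MountainCarParams &p) {
  s.v += (-p.gravity * p.mass * std::cos(3.0 * s.x) + accel / p.mass - p.fric * s.v) * p.timestep;
  s.x += s.v * p.timestep;
  if (s.x <= -1.2) {  // wall at x=-1.2: the car stops there
    s.x = -1.2;
    s.v = 0.0;
  }
  return s.x >= 0.6;
}

// Runs one episode with a constant acceleration. Returns the time at which the
// goal is reached, or -1 if the time limit expires first.
double run_constant_accel(double accel, const MountainCarParams &p = MountainCarParams()) {
  assert(p.timestep > 0.0);
  MountainCarState s;
  const long max_steps = std::lround(p.max_time / p.timestep);
  for (long i = 1; i <= max_steps; ++i) {
    if (mountain_car_step(s, accel, p)) return static_cast<double>(i) * p.timestep;
  }
  return -1.0;
}
}}
For example, run_constant_accel(0.2) returns -1: with constant full forward acceleration, the car only oscillates on the left side of the mountain and never reaches x>=0.6 within the time limit, which is why the car must first climb the opposite slope.
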
The dynamics of the mountain car are given as follows:
> &mimetex( \dot{x}_{t+1} = \dot{x}_{t} + \bigl(-9.8m\cos(3x_{t}) + \frac{a_t}{m} - k\dot{x}_{t}\bigr)\Delta{}t );,
> &mimetex( x_{t+1} = x_{t} + \dot{x}_{t+1} \Delta{}t );,

where '''m''' denotes the mass of the car (0.2), '''k''' denotes the friction factor (0.3), &mimetex( \Delta{}t ); denotes the time step (0.01), and '''a''' denotes the acceleration of the car. There is a wall at '''x'''=-1.2, so the car cannot go into the region '''x'''<=-1.2; when it hits the wall, it stops there. At the beginning of each episode, the car is stationary at '''x'''=-0.5. Each episode ends when the car reaches the goal ('''x'''>=0.6) or when the elapsed time exceeds a threshold (100).

The ''state'' is a 2-dimensional vector consisting of the position and the velocity, &mimetex( x, \dot{x} );. The ''action'' is an acceleration '''a''' chosen from the discrete set {-0.2, 0, 0.2}, which is not enough to go over the mountain directly. The ''reward'' is given by:
> &mimetex( 0.1 \bigl(\frac{1}{1 + (0.6-x)^2} - 1\bigr) );.

* MountainCar Task Module [#tf858bc6]
Please refer to [[../Tutorial - Making Module]].
+ Make a C++ source file named mountain_car.cpp from the template materials/templates/apps/main_tmpl.cpp contained in the SkyAI directory.
-- You can modify the file information (file name, brief, author, date, copyright, license info, etc.).
-- Replace every NAME_SPACE by loco_rabbits.
-- Write the following code inside the namespace loco_rabbits.
+ Make a configuration class using the template TXxConfigurations written in [[../Tutorial - Making Module]].
-- Replace every TXxConfigurations by TMountainCarTaskConfigurations.
-- Remove the TestC parameter and add the following parameters:
#codeh(cpp){{
int    NumEpisodes;  // number of episodes
double TimeStep;     // time step
double MaxTime;      // maximum time per episode (the episode is terminated after this)
double Gravity;      // gravity of the environment
double Mass;         // mass of the car
double Fric;         // friction factor
int    DispWidth;    // width for displaying the environment on the console
int    DispHeight;   // height for displaying the environment on the console
int    SleepUTime;   // sleep duration after each display [microseconds]
}}
-- Initialize them in the constructor as follows:
#codeh(cpp){{
TMountainCarTaskConfigurations (var_space::TVariableMap &mmap)
  :
    NumEpisodes (200),
    TimeStep    (0.01),
    MaxTime     (100.0),
    Gravity     (9.8),
    Mass        (0.2),
    Fric        (0.3),
    DispWidth   (40),
    DispHeight  (15),
    SleepUTime  (1000)
  {
    Register(mmap);
  }
}}
-- In the member function Register, add them:
#codeh(cpp){{
ADD( NumEpisodes );
ADD( TimeStep    );
ADD( MaxTime     );
ADD( Gravity     );
ADD( Mass        );
ADD( Fric        );
ADD( DispWidth   );
ADD( DispHeight  );
ADD( SleepUTime  );
}}
-- You can add your own parameters, such as noise.
+ Make the base of the module using the template MXxModule written in [[../Tutorial - Making Module]].
-- The simple template is sufficient.
-- Replace every MXxModule by MMountainCarTaskModule.
-- Replace every MParentModule by TModuleInterface.
-- Replace TXxConfigurations by TMountainCarTaskConfigurations.
-- Remove the definition of mem_ (TXxMemory mem_;).
#codeh(cpp){{
//===========================================================================================
//!\brief Mountain Car task (environment+task) module
class MMountainCarTaskModule
    : public TModuleInterface
//===========================================================================================
{
public:
  typedef TModuleInterface       TParent;
  typedef MMountainCarTaskModule TThis;
  SKYAI_MODULE_NAMES(MMountainCarTaskModule)

  MMountainCarTaskModule (const std::string &v_instance_name)
    : TParent (v_instance_name),
      conf_   (TParent::param_box_config_map())
    {
    }

protected:

  TMountainCarTaskConfigurations conf_;

};  // end of MMountainCarTaskModule
//-------------------------------------------------------------------------------------------
}}
+ Add the following ports to MMountainCarTaskModule.
-- (port type), (port name), (return type), (parameter list), (purpose)
-- slot, slot_start, void, (void), called at the beginning of the execution.
-- slot, slot_execute_action, void, (const TRealVector &a), called by an RL agent module to execute an action (1-dimensional vector).
-- signal, signal_initialization, void (void), emitted when the module is initialized.
-- signal, signal_start_of_episode, void (void), emitted when each episode starts.
-- signal, signal_finish_episode, void (void), emitted when the end-of-episode condition is satisfied.
-- signal, signal_end_of_episode, void (void), emitted when each episode is terminated.
-- signal, signal_start_of_timestep, void (const TReal &dt), emitted at the start of each time step.
-- signal, signal_end_of_timestep, void (const TReal &dt), emitted at the end of each time step.
-- signal, signal_reward, void (const TSingleReward &), emitted when a reward is given.
-- out, out_state, const TRealVector&, (void), outputs the current state (2-dimensional vector).
-- out, out_cont_time, const TReal&, (void), outputs the current (continuous) time.
-- Note: some signal ports are not used in this tutorial, but are defined for later use.
-- The differences from the [[maze task>../Tutorial - Example - Maze]] are: the action type of slot_execute_action is changed; signal_start_of_step and signal_end_of_step are replaced by signal_start_of_timestep and signal_end_of_timestep, respectively, because this is a continuous-time system; out_state_set_size and out_action_set_size are removed; and the state type of out_state is changed.
-- Note: this module receives a continuous action (i.e. an acceleration) at each time step rather than a discrete action. The discretized action set is assumed to be defined by another module.
-- To add the ports, follow these steps:
++ Add the declarations.
++ Add the initializers in the constructor.
++ Add the register functions in the constructor.
+ Next, we implement the slot port callbacks and the output functions. This procedure is slightly complicated; follow it step by step.
++ Add member variables in the protected section.
#codeh(cpp){{
TRealVector  accel_;        //!< 1-dim acceleration
TRealVector  state_;        //!< position, velocity
TReal        time_;         //!< elapsed time in the current episode
TInt         num_episode_;  //!< index of the current episode
}}
++ Implement slot_start_exec. Since this code is long, write only its declaration in the protected section:
#codeh(cpp){{
virtual void slot_start_exec (void);
}}
Then, define it outside the class:
#codeh(cpp){{
/*virtual*/void MMountainCarTaskModule::slot_start_exec (void)
{
  init_environment();
  signal_initialization.ExecAll();

  for(num_episode_=0; num_episode_<conf_.NumEpisodes; ++num_episode_)
  {
    init_environment();
    signal_start_of_episode.ExecAll();
    bool running(true);
    while(running)
    {
      signal_start_of_timestep.ExecAll(conf_.TimeStep);

      running= step_environment();  // false when the goal is reached
      show_environment();
      usleep(conf_.SleepUTime);     // declared in <unistd.h>

      // time limit (checked only if the goal was not reached in this step,
      // so that signal_finish_episode is not emitted twice)
      if(running && time_>=conf_.MaxTime)
      {
        signal_finish_episode.ExecAll();
        running= false;
      }

      signal_end_of_timestep.ExecAll(conf_.TimeStep);
    }
    signal_end_of_episode.ExecAll();
  }
}
}}
where we used three member functions.
These are declared in the protected section:
#codeh(cpp){{
void init_environment (void);
bool step_environment (void);
void show_environment (void);
}}
and defined outside the class:
#codeh(cpp){{
void MMountainCarTaskModule::init_environment (void)
{
  state_.resize(2);
  state_(0)= -0.5;  // position: the car starts at x=-0.5
  state_(1)= 0.0;   // velocity: the car is stationary
  accel_.resize(1);
  accel_(0)= 0.0;
  time_= 0.0l;
}
}}
#codeh(cpp){{
bool MMountainCarTaskModule::step_environment (void)
{
  // velocity update: gravity term + acceleration/mass - friction
  state_(1)= state_(1) + (-conf_.Gravity*conf_.Mass*std::cos(3.0*state_(0))
                          + accel_(0)/conf_.Mass - conf_.Fric*state_(1))*conf_.TimeStep;
  // position update with the new velocity
  state_(0)= state_(0) + state_(1)*conf_.TimeStep;
  time_+= conf_.TimeStep;

  TReal reward= 0.1l*(1.0l / (1.0l + Square(0.6l-state_(0))) - 1.0l);
  signal_reward.ExecAll(reward);

  if(state_(0)<=-1.2)  // the wall: the car stops there
  {
    state_(0)= -1.2;
    state_(1)= 0.0;
  }
  if(state_(0)>=0.6)  // the goal
  {
    signal_finish_episode.ExecAll();
    return false;
  }
  return true;
}
}}
#codeh(cpp){{
void MMountainCarTaskModule::show_environment (void)
{
  std::cout<<"("<<state_(0)<<","<<state_(1)<<"), "<<accel_(0)<<", "<<time_<<"/"<<num_episode_<<std::endl;
  // curve[x]: row index of the ground surface at column x
  std::vector<int> curve(conf_.DispWidth);
  for(int x(0);x<conf_.DispWidth;++x)
  {
    double rx= (0.6+1.2)*x/static_cast<TReal>(conf_.DispWidth)-1.2;
    curve[x]= static_cast<int>(static_cast<TReal>(conf_.DispHeight-1)*0.5*(1.0-std::sin(3.0*rx)))+1;
    std::cout<<"-";
  }
  std::cout<<std::endl;
  // column of the car
  int pos= static_cast<int>(static_cast<TReal>(conf_.DispWidth)*(state_(0)+1.2)/(0.6+1.2));
  for(int y(0);y<conf_.DispHeight;++y)
  {
    for(int x(0);x<conf_.DispWidth;++x)
    {
      if(x==pos && y==curve[x]-1)                    std::cout<<"#";  // car
      else if(x==conf_.DispWidth-1 && y==curve[x]-1) std::cout<<"G";  // goal
      else if(y>=curve[x] || x==0)                   std::cout<<"^";  // ground or wall
      else                                           std::cout<<" ";
    }
    std::cout<<std::endl;
  }
  for(int x(0);x<conf_.DispWidth;++x)  std::cout<<"-";
  std::cout<<std::endl<<std::endl;
}
}}
++ Implement the other slot port callbacks and output functions. Since these are short, you can write them inside the class in the protected section.
#codeh(cpp){{
virtual void slot_execute_action_exec (const TRealVector &a)
  {
    accel_= a;
  }
virtual const TRealVector& out_state_get (void) const
  {
    return state_;
  }
virtual const TReal& out_cont_time_get (void) const
  {
    return time_;
  }
}}
+ Add a public member function Start() that calls slot_start:
#codeh(cpp){{
void Start()
  {
    slot_start.Exec();
  }
}}
+ Finally, use the SKYAI_ADD_MODULE macro to register the module with SkyAI:
#codeh(cpp){{
SKYAI_ADD_MODULE(MMountainCarTaskModule)
}}
This should be written outside the class and inside the namespace loco_rabbits.

That's it.

* Random Action Module [#x48e5ef1]
Next, in order to test the MMountainCarTaskModule module, we make a module named MRandomActionModule that emits a random action at each time step. MRandomActionModule has two ports:
- (port type), (port name), (return type), (parameter list), (purpose)
- slot, slot_timestep, void, (const TReal &dt), called at each time step; a new random action is chosen every 50 time steps, and the current action is emitted through the signal_action port.
- signal, signal_action, void (const TRealVector &), emitted at each time step.
Thus, its implementation is very simple:
#codeh(cpp){{
//===========================================================================================
//!\brief Random action module
class MRandomActionModule
    : public TModuleInterface
//===========================================================================================
{
public:
  typedef TModuleInterface    TParent;
  typedef MRandomActionModule TThis;
  SKYAI_MODULE_NAMES(MRandomActionModule)

  MRandomActionModule (const std::string &v_instance_name)
    : TParent       (v_instance_name),
      slot_timestep (*this),
      signal_action (*this)
    {
      add_slot_port   (slot_timestep);
      add_signal_port (signal_action);
    }

protected:

  MAKE_SLOT_PORT(slot_timestep, void, (const TReal &dt), (dt), TThis);

  MAKE_SIGNAL_PORT(signal_action, void (const TRealVector &), TThis);

  virtual void slot_timestep_exec (const TReal &dt)
    {
      static int step(0);       // number of time steps so far
      static TRealVector a(1);  // current action (1-dim acceleration)
      if(step%50==0)  // choose a new random action every 50 steps (rand() is declared in <cstdlib>)
        switch(rand() % 3)
        {
          case 0:  a(0)= 0.0;  break;
          case 1:  a(0)=+0.2;  break;
          case 2:  a(0)=-0.2;  break;
        }
      signal_action.ExecAll(a);
      ++step;
    }

};  // end of MRandomActionModule
//-------------------------------------------------------------------------------------------
}}
Then, use the SKYAI_ADD_MODULE macro to register the module with SkyAI:
#codeh(cpp){{
SKYAI_ADD_MODULE(MRandomActionModule)
}}

* Main Function [#y8e5e9ed]
Refer to [[../Tutorial - Making Executable]]. The main function for the mountain-car task is almost the same as that of the [[maze task>../Tutorial - Example - Maze]]; the difference is the name of the module type.
Here is an example:
#codeh(cpp){{
int main(int argc, char**argv)
{
  TOptionParser option(argc,argv);
  TAgent  agent;
  if (!ParseCmdLineOption (agent, option))  return 0;

  MMountainCarTaskModule *p_mountaincar_task
      = dynamic_cast<MMountainCarTaskModule*>(agent.SearchModule("mountaincar_task"));
  if(p_mountaincar_task==NULL)
    {LERROR("module `mountaincar_task' is not defined as an instance of MMountainCarTaskModule"); return 1;}

  agent.SaveToFile (agent.GetDataFileName("before.agent"),"before-");

  p_mountaincar_task->Start();

  agent.SaveToFile (agent.GetDataFileName("after.agent"),"after-");
  return 0;
}
}}

* Compile [#q8b2db95]
First, write a makefile, which is almost the same as that of the [[maze task>../Tutorial - Example - Maze]]; the difference is the executable's name. Then, execute the make command. An executable named mountain_car.out should be generated.

* Agent Script for Random Action Test [#e998a6d3]
Please refer to [[../Tutorial - Writing Agent Script]].

Now, let's test MMountainCarTaskModule using MRandomActionModule.
+ Create a blank file named random_act.agent and open it.
+ Instantiate each module; the instance of MMountainCarTaskModule must be named mountaincar_task, since the main function searches for this name:
#codeh(cpp){{
module  MMountainCarTaskModule  mountaincar_task
module  MRandomActionModule     rand_action
}}
+ Connect the port pairs:
#codeh(cpp){{
connect  mountaincar_task.signal_start_of_timestep , rand_action.slot_timestep
connect  rand_action.signal_action                 , mountaincar_task.slot_execute_action
}}
+ Set the configuration parameters of mountaincar_task:
#codeh(cpp){{
mountaincar_task.config={
    SleepUTime= 1000
  }
}}

That's it. Let's test! Launch the executable as follows:
#codeh(sh){{
./mountain_car.out -agent random_act
}}
You will see a mountain like the following, where the car (#) moves randomly.
 (-0.242451,0.875342), 0, 35.61/1
 ----------------------------------------
 ^                                      G
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                             ^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^                   ^^^^^^^^^^^^^^^
 ^^^^^^^                 ^^^^^^^^^^^^^^^^
 ^^^^^^^^             # ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^        ^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------

* Normalized Gaussian Network (NGnet) [#pc1927a1]
We use an NGnet to approximate the action value function. To use the NGnet, we follow this process:
+ Generate a set of basis functions and save them into a file. SkyAI provides a tool to do this.
+ Specify the file path in the configuration of the NGnet module in an agent script.
+ Use the NGnet with an RL module.

The basis functions are allocated over the state space; they should cover all possible states. In this section, we describe how to generate the basis functions using the generating tool.

The generating tools are stored in the tools/ngnet-generator directory; the executables may already be compiled. Otherwise, execute the ''make'' command in tools/ngnet-generator.

The basis functions of the NGnet are generated as follows:
#codeh(sh){{
./gen-grid.out -out OUT_FILENAME -unit_grid DIV_VEC -xmin MIN_VEC -xmax MAX_VEC -invSigma INVSIGMA_VEC
}}
Its options are as follows (N: the dimensionality of the state):
- OUT_FILENAME : Output file name.
- DIV_VEC : Vector whose elements are the numbers of divisions in each dimension. This should be a vector of size N.
- MIN_VEC : Vector whose elements are the lower bounds of the centers of the basis functions in each dimension. This should be a vector of size N.
- MAX_VEC : Vector whose elements are the upper bounds of the centers of the basis functions in each dimension. This should be a vector of size N.
- INVSIGMA_VEC : Vector of the diagonal elements of the inverse covariance matrix. This should be a vector of size N. If "auto" is given, the matrix is computed automatically.
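
To make the roles of DIV_VEC, MIN_VEC, MAX_VEC and INVSIGMA_VEC more concrete, here is a minimal sketch of a grid of Gaussian basis functions and the normalized outputs an NGnet computes from them, y_i(x) = G_i(x) / sum_j G_j(x). This is not the source of gen-grid.out and it neither reads nor writes its file format: GaussianBasis, make_grid and ngnet_features are hypothetical names, the even spacing of the centers (both bounds included) is an assumption, and the width rule (standard deviation equal to the grid spacing) merely stands in for "auto", whose actual rule may differ.
#codeh(cpp){{
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One Gaussian basis function with a diagonal covariance (hypothetical type).
struct GaussianBasis {
  std::vector<double> center;     // mean of the Gaussian
  std::vector<double> inv_sigma;  // diagonal of the inverse covariance matrix
};

// Places div[d] centers per dimension, evenly spaced from xmin[d] to xmax[d]
// (assumption). The inverse covariance uses a standard deviation equal to the
// grid spacing (assumption, not necessarily the "auto" rule of gen-grid.out).
std::vector<GaussianBasis> make_grid(const std::vector<int> &div,
                                     const std::vector<double> &xmin,
                                     const std::vector<double> &xmax) {
  const std::size_t n = div.size();
  assert(xmin.size() == n && xmax.size() == n);
  std::vector<double> step(n), inv_sigma(n);
  std::size_t total = 1;
  for (std::size_t d = 0; d < n; ++d) {
    assert(div[d] >= 2 && xmax[d] > xmin[d]);
    step[d] = (xmax[d] - xmin[d]) / (div[d] - 1);
    inv_sigma[d] = 1.0 / (step[d] * step[d]);
    total *= static_cast<std::size_t>(div[d]);
  }
  std::vector<GaussianBasis> bases(total);
  for (std::size_t k = 0; k < total; ++k) {
    bases[k].inv_sigma = inv_sigma;
    std::size_t rest = k;  // decode k into one grid index per dimension
    for (std::size_t d = 0; d < n; ++d) {
      const std::size_t idx = rest % static_cast<std::size_t>(div[d]);
      rest /= static_cast<std::size_t>(div[d]);
      bases[k].center.push_back(xmin[d] + step[d] * static_cast<double>(idx));
    }
  }
  return bases;
}

// Normalized Gaussian outputs y_i(x) = G_i(x) / sum_j G_j(x); they sum to 1.
std::vector<double> ngnet_features(const std::vector<GaussianBasis> &bases,
                                   const std::vector<double> &x) {
  assert(!bases.empty() && x.size() == bases[0].center.size());
  std::vector<double> q(bases.size());  // squared Mahalanobis distances
  double q_min = 0.0;
  for (std::size_t i = 0; i < bases.size(); ++i) {
    q[i] = 0.0;
    for (std::size_t d = 0; d < x.size(); ++d) {
      const double diff = x[d] - bases[i].center[d];
      q[i] += bases[i].inv_sigma[d] * diff * diff;
    }
    if (i == 0 || q[i] < q_min) q_min = q[i];
  }
  std::vector<double> y(bases.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < bases.size(); ++i) {
    y[i] = std::exp(-0.5 * (q[i] - q_min));  // the shift by q_min cancels in y
    sum += y[i];
  }
  for (double &v : y) v /= sum;
  return y;
}
}}
With make_grid({5, 5}, {-1.2, -1.5}, {0.6, 1.5}), the sketch yields 25 basis functions whose outputs sum to 1 for any state. Subtracting the smallest exponent before calling std::exp does not change the normalized values, but it avoids a division by zero when the state lies far from every center.
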
Of course, N is 2 in the mountain-car task. You can investigate the upper and lower bounds of the state in the random action test. In this task, let us use 5x5 basis functions. Thus, we generate the basis functions of the NGnet as follows:
#codeh(sh){{
../../tools/ngnet-generator/gen-grid.out -out ngnet_mc5x5.dat -unit_grid "5 5" -xmin "-1.2 -1.5" -xmax "0.6 1.5" -invSigma "auto"
}}
where ../../ denotes the relative path to the SkyAI base directory. The file ngnet_mc5x5.dat is generated; since it is in a text format, you can view its contents.
#ref(./ngnet.png,center,zoom,300x0)
This figure illustrates the locations of the basis functions. Each ellipse shows the center of a Gaussian basis function and its 1-standard-deviation contour.

* Agent Script for Q(lambda)-learning with NGnet [#i71875a1]
Please refer to [[../Tutorial - Writing Agent Script]].

Let's apply a Q(lambda)-learning module to MMountainCarTaskModule.
+ Create a blank file named ql.agent and open it.
+ Include ql_da, where a composite Q-learning module is defined:
#codeh(cpp){{
include_once "ql_da"
}}
+ Instantiate the following modules; the instance of MMountainCarTaskModule must be named mountaincar_task:
#codeh(cpp){{
module  MMountainCarTaskModule  mountaincar_task
module  MTDDiscAct              behavior
module  MLCHolder_TRealVector   direct_action
module  MDiscretizer            action_discretizer
module  MBasisFunctionsNGnet    ngnet
}}
- MTDDiscAct : TD(lambda)-learning module.
- MDiscretizer : Module that defines a discrete action set.
- MLCHolder_TRealVector : Module that holds a control command for a fixed interval (here, 0.2 s).
- MBasisFunctionsNGnet : NGnet function approximator.
+ Connect the port pairs:
#codeh(cpp){{
/// initialization process:
connect  mountaincar_task.signal_initialization      , ngnet.slot_initialize
connect  ngnet.slot_initialize_finished              , action_discretizer.slot_initialize
connect  action_discretizer.slot_initialize_finished , behavior.slot_initialize

/// start of episode process:
connect  mountaincar_task.signal_start_of_episode    , behavior.slot_start_episode

/// start of time step process:
connect  mountaincar_task.signal_start_of_timestep   , direct_action.slot_start_time_step

/// end of time step process:
connect  mountaincar_task.signal_end_of_timestep     , direct_action.slot_finish_time_step

/// learning signals:
connect  behavior.signal_execute_action              , action_discretizer.slot_in
connect  action_discretizer.signal_out               , direct_action.slot_execute_action
connect  direct_action.signal_execute_command        , mountaincar_task.slot_execute_action
connect  direct_action.signal_end_of_action          , behavior.slot_finish_action
connect  mountaincar_task.signal_reward              , behavior.slot_add_to_reward
connect  mountaincar_task.signal_finish_episode      , behavior.slot_finish_episode_immediately

/// I/O:
connect  action_discretizer.out_set_size             , behavior.in_action_set_size
connect  mountaincar_task.out_state                  , ngnet.in_x
connect  ngnet.out_y                                 , behavior.in_feature
connect  mountaincar_task.out_cont_time              , behavior.in_cont_time
}}
+ Set up the task module:
#codeh(cpp){{
mountaincar_task.config={
    SleepUTime= 1000
  }
}}
+ Set the NGnet file path:
#codeh(cpp){{
ngnet.config ={
    NGnetFileName = "ngnet_mc5x5.dat"
  }
}}
+ Configure the discrete action set and the control-command holder:
#codeh(cpp){{
action_discretizer.config ={
    Min = (-0.2, -0.2)
    Max = ( 0.2,  0.2)
    Division = (3, 3)
  }
direct_action.config ={Interval = 0.2;}
}}
+ Set the learning configuration:
#codeh(cpp){{
behavior.config={
    UsingEligibilityTrace = true
    UsingReplacingTrace = true
    Lambda = 0.9
    GradientMax = 1.0e+100
    ActionSelection = "asBoltzman"
    PolicyImprovement = "piExpReduction"
    Tau = 1
    TauDecreasingFactor = 0.05
    TraceMax = 1.0
    Gamma = 0.9
    Alpha = 0.3
    AlphaDecreasingFactor = 0.002
    AlphaMin = 0.05
  }
}}

Launch the executable as follows:
#codeh(sh){{
./mountain_car.out -path ../../benchmarks/cmn -agent ql -outdir result/rl1
}}
where ../../benchmarks/cmn is the relative path to the benchmarks/cmn directory; modify it for your environment.

After several tens of episodes, the policy will converge to a path like the following:
#block
 (-0.499914,0.00861355), 0.2, 0.01/200
 ----------------------------------------
 ^                                      G
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                             ^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^                   ^^^^^^^^^^^^^^^
 ^^^^^^^                 ^^^^^^^^^^^^^^^^
 ^^^^^^^^               ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^   #    ^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------
#block(next)
 (-0.450656,0.253621), 0.2, 0.35/0
 ----------------------------------------
 ^                                      G
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                             ^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^                   ^^^^^^^^^^^^^^^
 ^^^^^^^                 ^^^^^^^^^^^^^^^^
 ^^^^^^^^               ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^    #   ^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------
#block(end)
#block
 (-0.317402,0.311478), 0.2, 0.78/0
 ----------------------------------------
 ^                                      G
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                             ^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^                   ^^^^^^^^^^^^^^^
 ^^^^^^^                 ^^^^^^^^^^^^^^^^
 ^^^^^^^^               ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^       #^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------
#block(next)
 (-0.62904,-0.879678), -0.2, 1.54/0
 ----------------------------------------
 ^                                      G
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                             ^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^                   ^^^^^^^^^^^^^^^
 ^^^^^^^                 ^^^^^^^^^^^^^^^^
 ^^^^^^^^               ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^#       ^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------
#block(end)
#block
 (-0.915373,-0.0505839), 0.2, 2.06/0
 ----------------------------------------
 ^                                      G
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                             ^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^#                  ^^^^^^^^^^^^^^^
 ^^^^^^^                 ^^^^^^^^^^^^^^^^
 ^^^^^^^^               ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^        ^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------
#block(next)
 (-0.638749,1.06824), 0.2, 2.54/0
 ----------------------------------------
 ^                                      G
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                             ^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^                   ^^^^^^^^^^^^^^^
 ^^^^^^^                 ^^^^^^^^^^^^^^^^
 ^^^^^^^^               ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^#       ^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------
#block(end)
#block
 (-0.162464,1.08153), 0.2, 2.95/0
 ----------------------------------------
 ^                                      G
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                             ^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^                   ^^^^^^^^^^^^^^^
 ^^^^^^^                #^^^^^^^^^^^^^^^^
 ^^^^^^^^               ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^        ^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------
#block(next)
 (0.149024,0.667015), 0.2, 3.31/0
 ----------------------------------------
 ^                                      G
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                            #^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^                   ^^^^^^^^^^^^^^^
 ^^^^^^^                 ^^^^^^^^^^^^^^^^
 ^^^^^^^^               ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^        ^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------
#block(end)
#block
 (0.595877,0.685196), 0.2, 4.16/0
 ----------------------------------------
 ^                                      #
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                             ^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^                   ^^^^^^^^^^^^^^^
 ^^^^^^^                 ^^^^^^^^^^^^^^^^
 ^^^^^^^^               ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^        ^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------
#block(next)
#block(end)

The learning logs are stored in the directory result/rl1 specified with the -outdir option; create this directory before launching the executable. By plotting log-eps-ret.dat, you will obtain a learning curve like the following:
#ref(./out-mountaincar.png,zoom,center,600x0)
CENTER:''Example of a learning curve.''
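
The connection from ngnet.out_y to behavior.in_feature means that the learner sees the state only through the normalized Gaussian outputs. A common way to use such features is a linear action-value function, Q(s,a) = w_a . y(s). The following standalone sketch illustrates this idea under stated assumptions: the class LinearQLambda is hypothetical, it uses Boltzmann action selection (the configuration above selects ActionSelection = "asBoltzman") and a naive Q(lambda) update with replacing-style traces, and it is neither Peng's Q(lambda) nor the implementation of MTDDiscAct. Its alpha, gamma, lambda and tau only play roles similar to Alpha, Gamma, Lambda and Tau in behavior.config; the decreasing schedules (AlphaDecreasingFactor, TauDecreasingFactor) are not modeled.
#codeh(cpp){{
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Linear action-value function Q(s,a) = w_a . y(s) over feature vectors y(s),
// with Boltzmann action selection and a naive Q(lambda) update (hypothetical class).
class LinearQLambda {
 public:
  LinearQLambda(std::size_t num_actions, std::size_t num_features,
                double alpha, double gamma, double lambda)
      : w_(num_actions, std::vector<double>(num_features, 0.0)),
        e_(num_actions, std::vector<double>(num_features, 0.0)),
        alpha_(alpha), gamma_(gamma), lambda_(lambda) {
    assert(num_actions > 0 && num_features > 0);
  }

  double q(const std::vector<double> &phi, std::size_t a) const {
    assert(a < w_.size() && phi.size() == w_[a].size());
    double sum = 0.0;
    for (std::size_t i = 0; i < phi.size(); ++i) sum += w_[a][i] * phi[i];
    return sum;
  }

  double max_q(const std::vector<double> &phi) const {
    double best = q(phi, 0);
    for (std::size_t a = 1; a < w_.size(); ++a) best = std::fmax(best, q(phi, a));
    return best;
  }

  // Softmax over Q/tau; u is a uniform random number in [0,1) supplied by the caller.
  std::size_t select_boltzmann(const std::vector<double> &phi, double tau, double u) const {
    assert(tau > 0.0);
    const double q_max = max_q(phi);
    std::vector<double> p(w_.size());
    double sum = 0.0;
    for (std::size_t a = 0; a < w_.size(); ++a) {
      p[a] = std::exp((q(phi, a) - q_max) / tau);  // shifted by q_max to avoid overflow
      sum += p[a];
    }
    double acc = 0.0;
    for (std::size_t a = 0; a < w_.size(); ++a) {
      acc += p[a] / sum;
      if (u < acc) return a;
    }
    return w_.size() - 1;  // only reached through rounding when u is close to 1
  }

  // Update after taking action a with features phi, receiving reward r, and
  // observing features phi_next (ignored when terminal is true).
  void update(const std::vector<double> &phi, std::size_t a, double r,
              const std::vector<double> &phi_next, bool terminal) {
    const double target = terminal ? r : r + gamma_ * max_q(phi_next);
    const double delta = target - q(phi, a);
    for (std::size_t b = 0; b < e_.size(); ++b) {
      for (std::size_t i = 0; i < e_[b].size(); ++i) {
        e_[b][i] *= gamma_ * lambda_;                        // decay every trace
        if (b == a) e_[b][i] = std::fmax(e_[b][i], phi[i]);  // replacing-style trace
        w_[b][i] += alpha_ * delta * e_[b][i];
      }
    }
    if (terminal) reset_traces();  // traces do not carry over to the next episode
  }

  void reset_traces() {
    for (std::vector<double> &row : e_)
      for (double &v : row) v = 0.0;
  }

 private:
  std::vector<std::vector<double>> w_;  // one weight vector per action
  std::vector<std::vector<double>> e_;  // eligibility traces, same shape as w_
  double alpha_, gamma_, lambda_;
};
}}
In an episode, one would compute the NGnet outputs y(s), pick an action index with select_boltzmann, apply the corresponding discrete acceleration for the holding interval, and then call update with the next NGnet outputs. Because the normalized outputs lie in [0,1], the replacing-style traces in this sketch never exceed 1.
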
Do not change timestamp
''Table of Contents'' #contents * Overview [#n9051ce0] Here, we introduce how to implement a simple ''mountain-car'' task with SkyAI. The mountain-car task has a continuous state and a continuous action, which will be implemented as a module of SkyAI. In this tutorial, we discretize the action space; thus, this tutorial is an example of continuous state/discrete action problem. As an reinforcement learning algorithm, Peng's Q(lambda)-learning is applied to the mountain-car task; of course, we use predefined modules. In order to approximate the action value function over the continuous state space, we employ the ''normalized Gaussian network (NGnet)''. The following is the procedure: + Implement a mountain-car task module. + Implement a random action module for testing the task module. + Implement a main function. + Compile. + Write an agent script for the random action test. + Generate NGnet. + Write an agent script to apply Q(lambda)-learning with NGnet. The remarkable differences from the [[maze task>../Tutorial - Example - Maze]] are generating NGnet and using it in Q(lambda)-learning. The sample code works on a console; no extra libraries are required. * Task Setup [#ob140f6f] In the mountain-car environment, there is a mountain, a car, and a goal. #ref(./mountaincar.png,center,zoom,400x0) The objective of this task is to go from the start ('''x'''=-0.5) to the goal ('''x'''>=0.6). The car can accelerate, but does not have enough power to go beyond the mountain. Thus, the car needs to climb the opposite side, then climb toward the goal by using the kickback. 
The dynamics of the mountain is given as follows: > &mimetex( \dot{x}_{t+1} = \dot{x}_{t} + \bigl(-9.8m\cos(3x_{t}) \frac{a_t}{m} - k\dot{x}_{t}\bigr)\Delta{}t );, > &mimetex( x_{t+1} = x_{t} + \dot{x}_{t+1} \Delta{}t );, where '''m''' denotes the mass of the car (0.2), '''k''' denotes the friction factor (0.3), &mimetex( \Delta{}t ); denotes the time step (0.01), and '''a''' denotes the acceleration of the car. The robot cannot go into '''x'''<=-1.2 where is a wall. In the beginning of each episode, the car is stationary at '''x'''=-0.5. Each episode ends when the car reaches the goal ('''x'''>=0.6) or the amount of time becomes greater than a threshold (100). The ''state'' is a 2-dimensional vector &mimetex( x, \dot{x} );. The ''action'' is an acceleration '''a''' chosen from a discrete set {-0.2, 0, 0.2} which is not enough to go beyond the mountain. The ''reward'' is given by: > &mimetex( 0.1 \bigl(\frac{1}{1 + (0.6-x)^2} - 1\bigr) );. * MountainCar Task Module [#tf858bc6] Please refer to [[../Tutorial - Making Module]]. + Make a C++ source file named mountain_car.cpp using a template materials/templates/apps/main_tmpl.cpp contained in the SkyAI directory. -- You can modify the file information (file name, brief, author, date, copyright, license info, etc.) -- Replace every NAME_SPACE by loco_rabbits. -- Write the following code inside the namespace loco_rabbits. + Make a configure class using the template TXxConfigurations written in [[../Tutorial - Making Module]]. -- Replace every TXxConfigurations by TMountainCarTaskConfigurations. 
-- Remove the TestC parameter and add the following parameters: #codeh(cpp){{ int NumEpisodes; // number of episodes double TimeStep; // time-step double MaxTime; // max time per episode (task is terminated after this) double Gravity; // gravity of the environment double Mass; // mass of the car double Fric; // friction factor int DispWidth; // width for displaying the environment on the console int DispHeight; // height for displaying the environment on the console int SleepUTime; // duration for display }} -- Initialize them at the constructor as: #codeh(cpp){{ TMountainCarTaskConfigurations (var_space::TVariableMap &mmap) : NumEpisodes (200), TimeStep (0.01), MaxTime (100.0), Gravity (9.8), Mass (0.2), Fric (0.3), DispWidth (40), DispHeight (15), SleepUTime (1000) { Register(mmap); } }} -- In the member function Register, insert them: #codeh(cpp){{ ADD( NumEpisodes ); ADD( TimeStep ); ADD( MaxTime ); ADD( Gravity ); ADD( Mass ); ADD( Fric ); ADD( DispWidth ); ADD( DispHeight ); ADD( SleepUTime ); }} -- You can add your own parameters such as a noise. + Make the base of the module using the template MXxModule written in [[../Tutorial - Making Module]]. -- Simple template is OK. -- Replace every MXxModule by MMountainCarTaskModule. -- Replace every MParentModule by TModuleInterface. -- Replace TXxConfigurations by TMountainCarTaskConfigurations. -- Remove the definition of mem_ (TXxMemory mem_;). 
#codeh(cpp){{ //=========================================================================================== //!\brief Mountain Car task (environment+task) module class MMountainCarTaskModule : public TModuleInterface //=========================================================================================== { public: typedef TModuleInterface TParent; typedef MMountainCarTaskModule TThis; SKYAI_MODULE_NAMES(MMountainCarTaskModule) MMountainCarTaskModule (const std::string &v_instance_name) : TParent (v_instance_name), conf_ (TParent::param_box_config_map()) { } protected: TMountainCarTaskConfigurations conf_; }; // end of MMountainCarTaskModule //------------------------------------------------------------------------------------------- }} + Add following ports into MMountainCarTaskModule. -- (port type), (port name), (return type), (parameter list), (purpose) -- slot, slot_start, void, (void), called at the beginning of the execution. -- slot, slot_execute_action, void, (const TRealVector &a), called by an RL agent module to execute action (1-dimensional vector). -- signal, signal_initialization, void (void), emit when the module is initialized. -- signal, signal_start_of_episode, void (void), emit when each episode starts. -- signal, signal_finish_episode, void (void), emit when the end-of-episode condition is satisfied. -- signal, signal_end_of_episode, void (void), emit when each episode is terminated. -- signal, signal_start_of_timestep, void (const TReal &dt), emit at the start of each time step. -- signal, signal_end_of_timestep, void (const TReal &dt), emit at the end of each time step. -- signal, signal_reward, void (const TSingleReward &), emit when a reward is given. -- out, out_state, const TRealVector&, (void), output the current state (2-dimensional vector). -- out, out_time, const TReal&, (void), output the current time. -- Note: some signal ports will not be used, but, defined for later use. 
-- The differences from the [[maze task>../Tutorial - Example - Maze]] are:
--- the action type of slot_execute_action is changed;
--- signal_start_of_step and signal_end_of_step are replaced by signal_start_of_timestep and signal_end_of_timestep respectively, because this is a continuous-time system;
--- out_state_set_size and out_action_set_size are removed;
--- the state type of out_state is changed.
-- Note: this module receives a continuous action (i.e., an acceleration) at each time step rather than a discrete action. The discretized action set is assumed to be defined by another module.
-- To add the ports, follow these steps:
++ Add the declarations.
++ Add the initializers in the constructor.
++ Add the register functions in the constructor.
+ Next, we implement the slot port callbacks and the output functions. This procedure is slightly complicated; follow the steps one by one.
++ Add member variables in the protected section.
#codeh(cpp){{
TRealVector  accel_;  //!< 1-dim acceleration
TRealVector  state_;  //!< position, velocity
TReal        time_;
TInt         num_episode_;
}}
++ Implement slot_start_exec. Since this code is long, write the declaration in the protected section:
#codeh(cpp){{
virtual void slot_start_exec (void);
}}
Then, define it outside the class:
#codeh(cpp){{
/*virtual*/void MMountainCarTaskModule::slot_start_exec (void)
{
  init_environment();
  signal_initialization.ExecAll();

  for(num_episode_=0; num_episode_<conf_.NumEpisodes; ++num_episode_)
  {
    init_environment();
    signal_start_of_episode.ExecAll();
    bool running(true);
    while(running)
    {
      signal_start_of_timestep.ExecAll(conf_.TimeStep);
      running= step_environment();
      show_environment();
      usleep(conf_.SleepUTime);  // usleep is declared in <unistd.h>
      // time limit; skipped if the goal was just reached (step_environment
      // has already emitted signal_finish_episode in that case)
      if(running && time_>=conf_.MaxTime)
      {
        signal_finish_episode.ExecAll();
        running= false;
      }
      signal_end_of_timestep.ExecAll(conf_.TimeStep);
    }
    signal_end_of_episode.ExecAll();
  }
}
}}
Here, we use three member functions: init_environment, step_environment, and show_environment.
Declare them in the protected section:
#codeh(cpp){{
void init_environment (void);
bool step_environment (void);
void show_environment (void);
}}
and define them outside the class:
#codeh(cpp){{
void MMountainCarTaskModule::init_environment (void)
{
  state_.resize(2);
  state_(0)= -0.5;  // start position
  state_(1)= 0.0;   // velocity
  accel_.resize(1);
  accel_(0)= 0.0;
  time_= 0.0l;
}
}}
#codeh(cpp){{
bool MMountainCarTaskModule::step_environment (void)
{
  // Euler integration of the car dynamics
  state_(1)= state_(1) + (-conf_.Gravity*conf_.Mass*std::cos(3.0*state_(0))+accel_(0)/conf_.Mass-conf_.Fric*state_(1))*conf_.TimeStep;
  state_(0)= state_(0) + state_(1)*conf_.TimeStep;
  time_+= conf_.TimeStep;

  // reward: 0 at the goal (x=0.6), negative elsewhere
  TReal reward= 0.1l*(1.0l / (1.0l + Square(0.6l-state_(0))) - 1.0l);
  signal_reward.ExecAll(reward);

  // left wall: the car stops
  if(state_(0)<=-1.2)
  {
    state_(0)=-1.2;
    state_(1)=0.0;
  }
  // goal reached
  if(state_(0)>=0.6)
  {
    signal_finish_episode.ExecAll();
    return false;
  }
  return true;
}
}}
#codeh(cpp){{
void MMountainCarTaskModule::show_environment (void)
{
  std::cout<<"("<<state_(0)<<","<<state_(1)<<"), "<<accel_(0)<<", "<<time_<<"/"<<num_episode_<<std::endl;
  // height of the mountain at each column, and the top border
  std::vector<int> curve(conf_.DispWidth);
  for(int x(0);x<conf_.DispWidth;++x)
  {
    double rx= (0.6+1.2)*x/static_cast<TReal>(conf_.DispWidth)-1.2;
    curve[x]= static_cast<int>(static_cast<TReal>(conf_.DispHeight-1)*0.5*(1.0-std::sin(3.0*rx))+1);
    std::cout<<"-";
  }
  std::cout<<std::endl;
  // column of the car
  int pos= static_cast<int>(static_cast<TReal>(conf_.DispWidth)*(state_(0)+1.2)/(0.6+1.2));
  for(int y(0);y<conf_.DispHeight;++y)
  {
    for(int x(0);x<conf_.DispWidth;++x)
    {
      if(x==pos && y==curve[x]-1)  std::cout<<"#";                       // car
      else if(x==conf_.DispWidth-1 && y==curve[x]-1)  std::cout<<"G";    // goal
      else if(y>=curve[x] || x==0)  std::cout<<"^";                      // mountain
      else  std::cout<<" ";
    }
    std::cout<<std::endl;
  }
  for(int x(0);x<conf_.DispWidth;++x)  std::cout<<"-";
  std::cout<<std::endl<<std::endl;
}
}}
++ Implement the other slot port callbacks and output functions. These are short, so you can write them directly in the protected section of the class.
#codeh(cpp){{
virtual void slot_execute_action_exec (const TRealVector &a)
  {
    accel_= a;
  }

virtual const TRealVector& out_state_get (void) const
  {
    return state_;
  }

virtual const TReal& out_cont_time_get (void) const
  {
    return time_;
  }
}}
+ Add a public member function Start() that calls slot_start:
#codeh(cpp){{
void Start()
  {
    slot_start.Exec();
  }
}}
+ Finally, use the SKYAI_ADD_MODULE macro to register the module on SkyAI:
#codeh(cpp){{
SKYAI_ADD_MODULE(MMountainCarTaskModule)
}}
This should be written outside the class and inside the namespace loco_rabbits.

That's it.

* Random Action Module [#x48e5ef1]
Next, to test the MMountainCarTaskModule module, we make a module named MRandomActionModule that emits an action at each time step; the action is changed randomly every 50 time steps.

MRandomActionModule has two ports:
- (port type), (port name), (return type), (parameter list), (purpose)
- slot, slot_timestep, void, (const TReal &dt), called at each time step; an action is emitted through the signal_action port.
- signal, signal_action, void, (const TRealVector &), emitted at each time step.
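Before wiring the two modules into SkyAI, it can help to check the task dynamics on their own. The following standalone C++ sketch is not SkyAI code; the names McParams, McState, mc_reward, mc_step, and run_random_episode are hypothetical. It follows the update order of step_environment (velocity, position, reward, left wall, goal check) with the default configuration values, and drives it with a random policy that, like the random action module described here, picks one of 0, +0.2, and -0.2 every 50 time steps and holds it.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Hypothetical standalone sketch (not SkyAI code) of the mountain-car
// dynamics and reward, with the default configuration values.
struct McParams {
  double time_step = 0.01;  // TimeStep
  double gravity = 9.8;     // Gravity
  double mass = 0.2;        // Mass
  double fric = 0.3;        // Fric
};

struct McState {
  double x = -0.5;  // position (start)
  double v = 0.0;   // velocity
};

// Reward of the tutorial: 0.1*(1/(1+(0.6-x)^2) - 1); 0 at x = 0.6, negative elsewhere.
double mc_reward(double x) {
  const double d = 0.6 - x;
  return 0.1 * (1.0 / (1.0 + d * d) - 1.0);
}

// One Euler step in the same order as step_environment: update the velocity,
// update the position, compute the reward, then apply the left wall.
// Returns false when the goal (x >= 0.6) is reached.
bool mc_step(McState &s, double accel, const McParams &p, double *reward) {
  assert(p.mass > 0.0 && reward != nullptr);
  s.v += (-p.gravity * p.mass * std::cos(3.0 * s.x) + accel / p.mass - p.fric * s.v) * p.time_step;
  s.x += s.v * p.time_step;
  *reward = mc_reward(s.x);
  if (s.x <= -1.2) {  // left wall: the car stops
    s.x = -1.2;
    s.v = 0.0;
  }
  return s.x < 0.6;
}

// Random policy: every 50 steps pick one of {0, +0.2, -0.2} and hold it.
// Runs at most max_steps steps (MaxTime / TimeStep = 10000 by default),
// returns the number of executed steps, and stores the sum of rewards in *ret.
int run_random_episode(McState &s, int max_steps, unsigned seed, double *ret) {
  assert(ret != nullptr);
  McParams p;
  std::srand(seed);
  double a = 0.0;
  *ret = 0.0;
  for (int t = 0; t < max_steps; ++t) {
    if (t % 50 == 0) {
      switch (std::rand() % 3) {
        case 0: a = 0.0; break;
        case 1: a = +0.2; break;
        default: a = -0.2; break;
      }
    }
    double r = 0.0;
    const bool running = mc_step(s, a, p, &r);
    *ret += r;
    if (!running) return t + 1;
  }
  return max_steps;
}
```

Because the physics has no SkyAI dependency, a sketch like this can be used to sanity-check parameter changes (e.g., Mass or Fric) before running the full agent.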
The implementation of MRandomActionModule in SkyAI is very simple:
#codeh(cpp){{
//===========================================================================================
//!\brief Random action module
class MRandomActionModule
    : public TModuleInterface
//===========================================================================================
{
public:
  typedef TModuleInterface     TParent;
  typedef MRandomActionModule  TThis;
  SKYAI_MODULE_NAMES(MRandomActionModule)

  MRandomActionModule (const std::string &v_instance_name)
    : TParent        (v_instance_name),
      slot_timestep  (*this),
      signal_action  (*this)
    {
      add_slot_port   (slot_timestep);
      add_signal_port (signal_action);
    }

protected:

  MAKE_SLOT_PORT(slot_timestep, void, (const TReal &dt), (dt), TThis);

  MAKE_SIGNAL_PORT(signal_action, void (const TRealVector &), TThis);

  virtual void slot_timestep_exec (const TReal &dt)
    {
      static int time(0);
      static TRealVector a(1);
      // choose a new random action every 50 time steps (rand() is declared in <cstdlib>)
      if(time%50==0)
        switch(rand() % 3)
        {
          case 0: a(0)=0.0; break;
          case 1: a(0)=+0.2; break;
          case 2: a(0)=-0.2; break;
        }
      signal_action.ExecAll(a);
      ++time;
    }

};  // end of MRandomActionModule
//-------------------------------------------------------------------------------------------
}}
Then, use the SKYAI_ADD_MODULE macro to register the module on SkyAI:
#codeh(cpp){{
SKYAI_ADD_MODULE(MRandomActionModule)
}}

* Main Function [#y8e5e9ed]
Refer to [[../Tutorial - Making Executable]].

The main function for the mountain-car task is almost the same as that of the [[maze task>../Tutorial - Example - Maze]]; the main difference is the name of the module type.
Here is an example:
#codeh(cpp){{
int main(int argc, char**argv)
{
  TOptionParser option(argc,argv);
  TAgent  agent;
  if (!ParseCmdLineOption (agent, option))  return 0;

  MMountainCarTaskModule *p_mountaincar_task = dynamic_cast<MMountainCarTaskModule*>(agent.SearchModule("mountaincar_task"));
  if(p_mountaincar_task==NULL)
    {LERROR("module `mountaincar_task' is not defined as an instance of MMountainCarTaskModule"); return 1;}

  agent.SaveToFile (agent.GetDataFileName("before.agent"),"before-");

  p_mountaincar_task->Start();

  agent.SaveToFile (agent.GetDataFileName("after.agent"),"after-");
  return 0;
}
}}

* Compile [#q8b2db95]
First, write a makefile, which is almost the same as that of the [[maze task>../Tutorial - Example - Maze]]; the only difference is the name of the executable. Then, execute the make command. An executable named mountain_car.out should be generated.

* Agent Script for Random Action Test [#e998a6d3]
Please refer to [[../Tutorial - Writing Agent Script]].

Now, let's test MMountainCarTaskModule using MRandomActionModule.
+ Create a blank file named random_act.agent and open it.
+ Instantiate each module; the instance of MMountainCarTaskModule must be named mountaincar_task (the main function searches for this name):
#codeh(cpp){{
module MMountainCarTaskModule mountaincar_task
module MRandomActionModule rand_action
}}
+ Connect the port pairs:
#codeh(cpp){{
connect mountaincar_task.signal_start_of_timestep , rand_action.slot_timestep
connect rand_action.signal_action , mountaincar_task.slot_execute_action
}}
+ Set the configuration parameters of mountaincar_task:
#codeh(cpp){{
mountaincar_task.config={
    SleepUTime= 1000
  }
}}

That's it. Let's test! Launch the executable as follows:
#codeh(sh){{
./mountain_car.out -agent random_act
}}
You will see a mountain like the following, where the car (#) moves randomly.
 (-0.242451,0.875342), 0, 35.61/1
 ----------------------------------------
 ^                                      G
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                             ^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^                   ^^^^^^^^^^^^^^^
 ^^^^^^^                 ^^^^^^^^^^^^^^^^
 ^^^^^^^^             # ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^        ^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------

* Normalized Gaussian Network (NGnet) [#pc1927a1]
We use an NGnet to approximate the action value function. To use the NGnet, follow this process:
+ Generate a set of basis functions and save them into a file. SkyAI provides a tool to do this.
+ Specify the file path as a parameter of the NGnet module in an agent script.
+ Use the NGnet with an RL module.

The basis functions are allocated over the state space; they should cover all possible states. In this section, we describe how to generate the basis functions using the generator tool.

The generator tools are stored in the tools/ngnet-generator directory; the executables may already be compiled. Otherwise, execute the ''make'' command in tools/ngnet-generator.

The basis functions of NGnet are generated as follows:
#codeh(sh){{
./gen-grid.out -out OUT_FILENAME -unit_grid DIV_VEC -xmin MIN_VEC -xmax MAX_VEC -invSigma INVSIGMA_VEC
}}
Its options are (N: the dimensionality of the state):
- OUT_FILENAME : Output file name.
- DIV_VEC : Vector whose elements are the numbers of divisions in each dimension. This should be a vector of size N.
- MIN_VEC : Vector whose elements are the lower bounds of the basis-function centers in each dimension. This should be a vector of size N.
- MAX_VEC : Vector whose elements are the upper bounds of the basis-function centers in each dimension. This should be a vector of size N.
- INVSIGMA_VEC : Vector of the diagonal elements of the inverse covariance matrix. This should be a vector of size N. If "auto" is given, the matrix is computed automatically.
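To make these options concrete, the following C++ sketch builds a grid of centers and evaluates the normalized Gaussian activations phi_i(x) = G_i(x) / sum_j G_j(x). It is not the source of gen-grid.out and does not implement "auto". It assumes that DIV_VEC is the number of centers per dimension, evenly spaced from MIN_VEC to MAX_VEC with both bounds included, and that every basis function uses the same diagonal inverse covariance given by INVSIGMA_VEC. The function names grid_centers and ngnet_features are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not the source of gen-grid.out) of a grid of NGnet
// basis functions and of the normalized Gaussian activations.
// Assumptions: div[d] centers per dimension, evenly spaced from xmin[d] to
// xmax[d] (both included); one shared diagonal inverse covariance inv_sigma.
using Vec = std::vector<double>;

std::vector<Vec> grid_centers(const std::vector<int> &div, const Vec &xmin, const Vec &xmax) {
  assert(div.size() == xmin.size() && div.size() == xmax.size());
  std::vector<Vec> centers(1);  // one empty prefix to start the Cartesian product
  for (std::size_t d = 0; d < div.size(); ++d) {
    assert(div[d] >= 1);
    std::vector<Vec> next;
    for (const Vec &prefix : centers) {
      for (int k = 0; k < div[d]; ++k) {
        // a single center is placed in the middle of the range (assumption)
        const double t = (div[d] > 1) ? static_cast<double>(k) / (div[d] - 1) : 0.5;
        Vec c = prefix;
        c.push_back(xmin[d] + t * (xmax[d] - xmin[d]));
        next.push_back(c);
      }
    }
    centers.swap(next);
  }
  return centers;
}

// phi_i(x) = G_i(x) / sum_j G_j(x),
// G_i(x) = exp(-0.5 * sum_d inv_sigma[d] * (x[d] - c_i[d])^2).
// The smallest exponent is subtracted first; it cancels in the ratio and
// avoids dividing by an underflowed sum far away from all centers.
Vec ngnet_features(const Vec &x, const std::vector<Vec> &centers, const Vec &inv_sigma) {
  assert(!centers.empty() && x.size() == inv_sigma.size());
  Vec q(centers.size(), 0.0);
  double q_min = 0.0;
  for (std::size_t i = 0; i < centers.size(); ++i) {
    for (std::size_t d = 0; d < x.size(); ++d) {
      const double diff = x[d] - centers[i][d];
      q[i] += inv_sigma[d] * diff * diff;
    }
    if (i == 0 || q[i] < q_min) q_min = q[i];
  }
  double sum = 0.0;
  Vec phi(centers.size());
  for (std::size_t i = 0; i < centers.size(); ++i) {
    phi[i] = std::exp(-0.5 * (q[i] - q_min));
    sum += phi[i];
  }
  for (double &v : phi) v /= sum;
  return phi;
}
```

Under these assumptions, the 5x5 grid used below has a spacing of 0.45 in position and 0.75 in velocity. Choosing inv_sigma as the inverse squared spacing (an illustrative choice, not necessarily what "auto" computes) makes neighboring basis functions overlap, so the normalized outputs vary smoothly over the state space and always sum to 1.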
Of course, N is 2 in the mountain-car task. You can check the upper and lower bounds of the state (position and velocity) in the random action test. In this task, let us use 5x5 basis functions. Thus, we generate the basis functions of NGnet as follows:
#codeh(sh){{
../../tools/ngnet-generator/gen-grid.out -out ngnet_mc5x5.dat -unit_grid "5 5" -xmin "-1.2 -1.5" -xmax "0.6 1.5" -invSigma "auto"
}}
where ../../ denotes the relative path to the SkyAI base directory. The file ngnet_mc5x5.dat is generated; it is a text file, so you can inspect its contents.

#ref(./ngnet.png,center,zoom,300x0)

This figure illustrates the locations of the basis functions. Each ellipse shows the center of a Gaussian basis function and its 1-standard-deviation contour.

* Agent Script for Q(lambda)-learning with NGnet [#i71875a1]
Please refer to [[../Tutorial - Writing Agent Script]].

Let's apply a Q-learning module to MMountainCarTaskModule.
+ Create a blank file named ql.agent and open it.
+ Include ql_da, where a composite Q-learning module is defined:
#codeh(cpp){{
include_once "ql_da"
}}
+ Instantiate the following modules; the instance of MMountainCarTaskModule must be named mountaincar_task:
#codeh(cpp){{
module MMountainCarTaskModule mountaincar_task
module MTDDiscAct behavior
module MLCHolder_TRealVector direct_action
module MDiscretizer action_discretizer
module MBasisFunctionsNGnet ngnet
}}
-- MTDDiscAct : TD(lambda)-learning module.
-- MDiscretizer : Module to define a discrete action set.
-- MLCHolder_TRealVector : Module to hold a control signal for a fixed interval (here, configured to 0.2 sec).
-- MBasisFunctionsNGnet : Function approximator NGnet.
+ Connect the port pairs:
#codeh(cpp){{
/// initialization process:
connect mountaincar_task.signal_initialization , ngnet.slot_initialize
connect ngnet.slot_initialize_finished , action_discretizer.slot_initialize
connect action_discretizer.slot_initialize_finished , behavior.slot_initialize

/// start of episode process:
connect mountaincar_task.signal_start_of_episode , behavior.slot_start_episode

/// start of time step process:
connect mountaincar_task.signal_start_of_timestep , direct_action.slot_start_time_step

/// end of time step process:
connect mountaincar_task.signal_end_of_timestep , direct_action.slot_finish_time_step

/// learning signals:
connect behavior.signal_execute_action , action_discretizer.slot_in
connect action_discretizer.signal_out , direct_action.slot_execute_action
connect direct_action.signal_execute_command , mountaincar_task.slot_execute_action
connect direct_action.signal_end_of_action , behavior.slot_finish_action
connect mountaincar_task.signal_reward , behavior.slot_add_to_reward
connect mountaincar_task.signal_finish_episode , behavior.slot_finish_episode_immediately

/// I/O:
connect action_discretizer.out_set_size , behavior.in_action_set_size
connect mountaincar_task.out_state , ngnet.in_x
connect ngnet.out_y , behavior.in_feature
connect mountaincar_task.out_cont_time , behavior.in_cont_time
}}
+ Task module setup:
#codeh(cpp){{
mountaincar_task.config={
    SleepUTime= 1000
  }
}}
+ NGnet file path:
#codeh(cpp){{
ngnet.config ={
    NGnetFileName = "ngnet_mc5x5.dat"
  }
}}
+ Discrete action set and the configuration of the control-command holder:
#codeh(cpp){{
action_discretizer.config ={
    Min = (-0.2, -0.2)
    Max = ( 0.2,  0.2)
    Division = (3, 3)
  }
direct_action.config ={Interval = 0.2;}
}}
+ Learning configuration:
#codeh(cpp){{
behavior.config={
    UsingEligibilityTrace = true
    UsingReplacingTrace = true
    Lambda = 0.9
    GradientMax = 1.0e+100
    ActionSelection = "asBoltzman"
    PolicyImprovement = "piExpReduction"
    Tau = 1
    TauDecreasingFactor = 0.05
    TraceMax = 1.0
    Gamma = 0.9
    Alpha = 0.3
    AlphaDecreasingFactor = 0.002
    AlphaMin = 0.05
  }
}}

Launch the executable as follows:
#codeh(sh){{
./mountain_car.out -path ../../benchmarks/cmn -agent ql -outdir result/rl1
}}
where ../../benchmarks/cmn is the relative path to the benchmarks/cmn directory; modify it for your environment.

After several tens of episodes, the policy converges to a path like the following:
#block
 (-0.499914,0.00861355), 0.2, 0.01/200
 ----------------------------------------
 ^                                      G
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                             ^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^                   ^^^^^^^^^^^^^^^
 ^^^^^^^                 ^^^^^^^^^^^^^^^^
 ^^^^^^^^               ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^   #    ^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------
#block(next)
 (-0.450656,0.253621), 0.2, 0.35/0
 ----------------------------------------
 ^                                      G
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                             ^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^                   ^^^^^^^^^^^^^^^
 ^^^^^^^                 ^^^^^^^^^^^^^^^^
 ^^^^^^^^               ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^    #   ^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------
#block(end)

#block
 (-0.317402,0.311478), 0.2, 0.78/0
 ----------------------------------------
 ^                                      G
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                             ^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^                   ^^^^^^^^^^^^^^^
 ^^^^^^^                 ^^^^^^^^^^^^^^^^
 ^^^^^^^^               ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^       #^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------
#block(next)
 (-0.62904,-0.879678), -0.2, 1.54/0
 ----------------------------------------
 ^                                      G
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                             ^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^                   ^^^^^^^^^^^^^^^
 ^^^^^^^                 ^^^^^^^^^^^^^^^^
 ^^^^^^^^               ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^#       ^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------
#block(end)

#block
 (-0.915373,-0.0505839), 0.2, 2.06/0
 ----------------------------------------
 ^                                      G
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                             ^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^#                  ^^^^^^^^^^^^^^^
 ^^^^^^^                 ^^^^^^^^^^^^^^^^
 ^^^^^^^^               ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^        ^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------
#block(next)
 (-0.638749,1.06824), 0.2, 2.54/0
 ----------------------------------------
 ^                                      G
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                             ^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^                   ^^^^^^^^^^^^^^^
 ^^^^^^^                 ^^^^^^^^^^^^^^^^
 ^^^^^^^^               ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^#       ^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------
#block(end)

#block
 (-0.162464,1.08153), 0.2, 2.95/0
 ----------------------------------------
 ^                                      G
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                             ^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^                   ^^^^^^^^^^^^^^^
 ^^^^^^^                #^^^^^^^^^^^^^^^^
 ^^^^^^^^               ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^        ^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------
#block(next)
 (0.149024,0.667015), 0.2, 3.31/0
 ----------------------------------------
 ^                                      G
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                            #^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^                   ^^^^^^^^^^^^^^^
 ^^^^^^^                 ^^^^^^^^^^^^^^^^
 ^^^^^^^^               ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^        ^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------
#block(end)

#block
 (0.595877,0.685196), 0.2, 4.16/0
 ----------------------------------------
 ^                                      #
 ^                                  ^^^^^
 ^                                ^^^^^^^
 ^                               ^^^^^^^^
 ^                             ^^^^^^^^^^
 ^^                           ^^^^^^^^^^^
 ^^^                         ^^^^^^^^^^^^
 ^^^^                       ^^^^^^^^^^^^^
 ^^^^^                     ^^^^^^^^^^^^^^
 ^^^^^^                   ^^^^^^^^^^^^^^^
 ^^^^^^^                 ^^^^^^^^^^^^^^^^
 ^^^^^^^^               ^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^        ^^^^^^^^^^^^^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ----------------------------------------
#block(next)
#block(end)

To store the learning logs, create the directory result/rl1 (specified with the -outdir option) before launching the executable. By plotting log-eps-ret.dat, you obtain a learning curve such as the following:

#ref(./out-mountaincar.png,zoom,center,600x0)
CENTER:''Example of a learning curve.''
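The learning loop configured above can be summarized by a simplified C++ sketch. It is not the implementation of MTDDiscAct, and the names discretize and LinearQ are hypothetical. It uses a naive Q(lambda) in which traces are never cut, which is simpler than Peng's Q(lambda). The action value is linear in the NGnet outputs, Q(x,a) = w_a^T phi(x); actions are chosen by Boltzmann selection at temperature Tau; and the replacing trace is adapted to real-valued features as e_a = max(gamma*lambda*e_a, phi) for the taken action a. It also assumes that the discretizer spaces actions evenly with both ends included. The decreasing schedules (TauDecreasingFactor, AlphaDecreasingFactor, AlphaMin) and TraceMax are omitted because their exact formulas are not given here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch (not MTDDiscAct) of linear Q(lambda)-learning over
// NGnet features phi(x): Q(x,a) = w[a] . phi(x). One update is made per
// held action, using the reward accumulated over that interval as r.
// Naive Q(lambda): traces are not cut after exploratory actions.
using Vec = std::vector<double>;

// Evenly spaced 1-D action set with both ends included (assumption).
Vec discretize(double amin, double amax, int div) {
  assert(div >= 2);
  Vec actions(div);
  for (int k = 0; k < div; ++k) actions[k] = amin + k * (amax - amin) / (div - 1);
  return actions;
}

struct LinearQ {
  std::vector<Vec> w, e;  // weights and eligibility traces, one row per action

  LinearQ(std::size_t n_actions, std::size_t n_features)
      : w(n_actions, Vec(n_features, 0.0)), e(n_actions, Vec(n_features, 0.0)) {}

  double q(const Vec &phi, std::size_t a) const {
    double sum = 0.0;
    for (std::size_t i = 0; i < phi.size(); ++i) sum += w[a][i] * phi[i];
    return sum;
  }

  double max_q(const Vec &phi) const {
    double best = q(phi, 0);
    for (std::size_t a = 1; a < w.size(); ++a) best = std::fmax(best, q(phi, a));
    return best;
  }

  // Boltzmann (softmax) action probabilities with temperature tau > 0.
  Vec boltzmann(const Vec &phi, double tau) const {
    assert(tau > 0.0);
    const double qmax = max_q(phi);  // subtracted for numerical stability
    Vec p(w.size());
    double sum = 0.0;
    for (std::size_t a = 0; a < w.size(); ++a) {
      p[a] = std::exp((q(phi, a) - qmax) / tau);
      sum += p[a];
    }
    for (double &v : p) v /= sum;
    return p;
  }

  std::size_t select(const Vec &phi, double tau, std::mt19937 &rng) const {
    const Vec p = boltzmann(phi, tau);
    std::discrete_distribution<int> dist(p.begin(), p.end());
    return static_cast<std::size_t>(dist(rng));
  }

  // TD update after taking action a at phi, receiving r, and reaching phi_next.
  void update(const Vec &phi, std::size_t a, double r, const Vec &phi_next, bool terminal,
              double alpha, double gamma, double lambda) {
    const double target = terminal ? r : r + gamma * max_q(phi_next);
    const double delta = target - q(phi, a);
    for (Vec &row : e)
      for (double &v : row) v *= gamma * lambda;  // decay all traces
    for (std::size_t i = 0; i < phi.size(); ++i) e[a][i] = std::fmax(e[a][i], phi[i]);
    for (std::size_t b = 0; b < w.size(); ++b)
      for (std::size_t i = 0; i < phi.size(); ++i) w[b][i] += alpha * delta * e[b][i];
    if (terminal)
      for (Vec &row : e) row.assign(row.size(), 0.0);  // clear traces between episodes
  }
};
```

For a single dimension with the range [-0.2, 0.2] and 3 divisions, discretize returns -0.2, 0, and 0.2; the accelerations 0.2 and -0.2 appear in the console output above. With Alpha = 0.3, Gamma = 0.9, and Lambda = 0.9, a first update from all-zero weights with reward -1 moves only the weights of the taken action, in proportion to its active features.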