Tutorial - Example - Mountain Car
''Table of Contents''
#contents
* Overview [#n9051ce0]
Here, we introduce how to implement a simple ''mountain-c...
The mountain-car task has a continuous state and a contin...
In this tutorial, we discretize the action space; thus, t...
As a reinforcement learning algorithm, Peng's Q(lambda)-...
In order to approximate the action value function over th...
The following is the procedure:
+ Implement a mountain-car task module.
+ Implement a random action module for testing the task m...
+ Implement a main function.
+ Compile.
+ Write an agent script for the random action test.
+ Generate NGnet.
+ Write an agent script to apply Q(lambda)-learning with ...
The notable differences from the [[maze task>../Tutori...
The sample code works on a console; no extra libraries ar...
* Task Setup [#ob140f6f]
In the mountain-car environment, there is a mountain, a c...
#ref(./mountaincar.png,center,zoom,400x0)
The objective of this task is to go from the start ('''x'...
The car can accelerate, but does not have enough power to...
Thus, the car needs to climb the opposite side, then clim...
The dynamics of the mountain car are given as follows:
> &mimetex( \dot{x}_{t+1} = \dot{x}_{t} + \bigl(-9.8m\cos...
> &mimetex( x_{t+1} = x_{t} + \dot{x}_{t+1} \Delta{}t );,
where '''m''' denotes the mass of the car (0.2), '''k''' ...
The car cannot go beyond '''x'''<=-1.2, where there is a wall.
In the beginning of each episode, the car is stationary a...
Each episode ends when the car reaches the goal ('''x'''>...
The ''state'' is a 2-dimensional vector &mimetex( x, \dot...
The ''action'' is an acceleration '''a''' chosen from a d...
The ''reward'' is given by:
> &mimetex( 0.1 \bigl(\frac{1}{1 + (0.6-x)^2} - 1\bigr) ...
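Note: if you want to check these equations quickly, the following standalone C++ sketch simulates the dynamics and accumulates the reward under a constant acceleration. This is only an illustration, independent of the SkyAI module built below; the -9.8m cos(3x) term and the reward are the formulas above, while the remaining a/m - k*(velocity) terms of the velocity update are an assumption inferred from the step_environment implementation shown later in this page.
#codeh(cpp){{
// Standalone sketch of the task dynamics and reward (independent of SkyAI).
// Assumption: the velocity update is  v += (-9.8*m*cos(3x) + a/m - k*v)*dt .
#include <cmath>
#include <cstdio>

int main()
{
  const double m(0.2), k(0.3), dt(0.01);   // mass, friction factor, time step
  double x(-0.5), v(0.0), ret(0.0);        // start stationary at the valley; ret: sum of rewards
  const double a(0.2);                     // a constant acceleration (one of the discrete actions)
  for(double t(0.0); t<10.0 && x<0.6; t+=dt)
  {
    v+= (-9.8*m*std::cos(3.0*x) + a/m - k*v) * dt;
    x+= v*dt;
    if(x<=-1.2)  {x=-1.2; v=0.0;}          // wall on the left side
    ret+= 0.1*(1.0/(1.0+(0.6-x)*(0.6-x)) - 1.0);
  }
  std::printf("final x=%f, v=%f, accumulated reward=%f\n", x, v, ret);
  return 0;
}
}}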
* MountainCar Task Module [#tf858bc6]
Please refer to [[../Tutorial - Making Module]].
+ Make a C++ source file named mountain_car.cpp using a t...
-- You can modify the file information (file name, brief,...
-- Replace every NAME_SPACE by loco_rabbits.
-- Write the following code inside the namespace loco_rab...
+ Make a configure class using the template TXxConfigurat...
-- Replace every TXxConfigurations by TMountainCarTaskCon...
-- Remove the TestC parameter and add the following param...
#codeh(cpp){{
int NumEpisodes; // number of episodes
double TimeStep; // time-step
double MaxTime; // max time per episode (task is term...
double Gravity; // gravity of the environment
double Mass; // mass of the car
double Fric; // friction factor
int DispWidth; // width for displaying the environme...
int DispHeight; // height for displaying the environm...
int SleepUTime; // duration for display
}}
-- Initialize them at the constructor as:
#codeh(cpp){{
TMountainCarTaskConfigurations (var_space::TVariableMap &...
NumEpisodes (200),
TimeStep (0.01),
MaxTime (100.0),
Gravity (9.8),
Mass (0.2),
Fric (0.3),
DispWidth (40),
DispHeight (15),
SleepUTime (1000)
{
Register(mmap);
}
}}
-- In the member function Register, add the following lines:
#codeh(cpp){{
ADD( NumEpisodes );
ADD( TimeStep );
ADD( MaxTime );
ADD( Gravity );
ADD( Mass );
ADD( Fric );
ADD( DispWidth );
ADD( DispHeight );
ADD( SleepUTime );
}}
-- You can add your own parameters, such as noise.
+ Make the base of the module using the template MXxModul...
-- The simple template is fine.
-- Replace every MXxModule by MMountainCarTaskModule.
-- Replace every MParentModule by TModuleInterface.
-- Replace TXxConfigurations by TMountainCarTaskConfigura...
-- Remove the definition of mem_ (TXxMemory mem_;).
#codeh(cpp){{
//=======================================================...
//!\brief Mountain Car task (environment+task) module
class MMountainCarTaskModule
: public TModuleInterface
//=======================================================...
{
public:
typedef TModuleInterface TParent;
typedef MMountainCarTaskModule TThis;
SKYAI_MODULE_NAMES(MMountainCarTaskModule)
MMountainCarTaskModule (const std::string &v_instance_n...
: TParent (v_instance_name),
conf_ (TParent::param_box_config_map())
{
}
protected:
TMountainCarTaskConfigurations conf_;
}; // end of MMountainCarTaskModule
//-------------------------------------------------------...
}}
+ Add the following ports to MMountainCarTaskModule.
-- (port type), (port name), (return type), (parameter li...
-- slot, slot_start, void, (void), called at the beginnin...
-- slot, slot_execute_action, void, (const TRealVector &a...
-- signal, signal_initialization, void (void), emit when ...
-- signal, signal_start_of_episode, void (void), emit whe...
-- signal, signal_finish_episode, void (void), emit when ...
-- signal, signal_end_of_episode, void (void), emit when ...
-- signal, signal_start_of_timestep, void (const TReal &d...
-- signal, signal_end_of_timestep, void (const TReal &dt)...
-- signal, signal_reward, void (const TSingleReward &), e...
-- out, out_state, const TRealVector&, (void), output the...
-- out, out_time, const TReal&, (void), output the curren...
-- Note: some signal ports will not be used here, but are defined...
-- The differences from the [[maze task>../Tutorial - Exa...
-- Note: this module receives a continuous action (i.e. a...
-- To add the ports, follow these steps (a partial sketch is given after the list):
++ Add the declarations.
++ Add initializers to the constructor's initializer list.
++ Add registration calls in the constructor body.
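As a reference, here is a partial sketch of these three steps for a few of the ports. The port types are those listed above, and the MAKE_SLOT_PORT/MAKE_SIGNAL_PORT forms follow the MRandomActionModule example later in this page; the out-port macro name (MAKE_OUT_PORT), the registration call add_out_port, and the trailing macro arguments are assumptions based on the module template, so adjust them if your template differs.
#codeh(cpp){{
// Partial sketch (only three of the ports; the others are added in the same way).
// (1) Declarations in the protected section.  The trailing macro arguments and
//     the out-port macro name are assumptions taken from the module template:
MAKE_SLOT_PORT(slot_execute_action, void, (const TRealVector &a), (a), TThis);
MAKE_SIGNAL_PORT(signal_reward, void (const TSingleReward &), TThis);
MAKE_OUT_PORT(out_state, const TRealVector&, (void), (), TThis);

// (2) Initializers in the constructor's initializer list:
//       slot_execute_action (*this),
//       signal_reward       (*this),
//       out_state           (*this)

// (3) Registration calls in the constructor body (add_out_port is assumed):
//       add_slot_port   (slot_execute_action);
//       add_signal_port (signal_reward);
//       add_out_port    (out_state);
}}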
+ Next, we implement the slot port callbacks and the outp...
++ Add member variables at the protected section.
#codeh(cpp){{
TRealVector accel_; //!< 1-dim acceleration
TRealVector state_; //!< position, velocity
TReal time_;
TInt num_episode_;
}}
++ Implement slot_start_exec. This function is long, so w...
#codeh(cpp){{
virtual void slot_start_exec (void);
}}
Then, define it outside the class:
#codeh(cpp){{
/*virtual*/void MMountainCarTaskModule::slot_start_exec (...
{
init_environment();
signal_initialization.ExecAll();
for(num_episode_=0; num_episode_<conf_.NumEpisodes; ++n...
{
init_environment();
signal_start_of_episode.ExecAll();
bool running(true);
while(running)
{
signal_start_of_timestep.ExecAll(conf_.TimeStep);
running= step_environment();
show_environment();
usleep(conf_.SleepUTime);
if(time_>=conf_.MaxTime)
{
signal_finish_episode.ExecAll();
running= false;
}
signal_end_of_timestep.ExecAll(conf_.TimeStep);
}
signal_end_of_episode.ExecAll();
}
}
}}
Here, we use three helper member functions. They are decl...
#codeh(cpp){{
void init_environment (void);
bool step_environment (void);
void show_environment (void);
}}
and defined outside the class:
#codeh(cpp){{
void MMountainCarTaskModule::init_environment (void)
{
state_.resize(2);
state_(0)= -0.5;
state_(1)= 0.0;
accel_.resize(1);
accel_(0)= 0.0;
time_= 0.0l;
}
}}
#codeh(cpp){{
bool MMountainCarTaskModule::step_environment (void)
{
state_(1)= state_(1) + (-conf_.Gravity*conf_.Mass*std::...
state_(0)= state_(0) + state_(1)*conf_.TimeStep;
time_+= conf_.TimeStep;
TReal reward= 0.1l*(1.0l / (1.0l + Square(0.6l-state_(0...
signal_reward.ExecAll(reward);
if(state_(0)<=-1.2)
{
state_(0)=-1.2;
state_(1)=0.0;
}
if(state_(0)>=0.6)
{
signal_finish_episode.ExecAll();
return false;
}
return true;
}
}}
#codeh(cpp){{
void MMountainCarTaskModule::show_environment (void)
{
std::cout<<"("<<state_(0)<<","<<state_(1)<<"), "<<accel...
std::vector<int> curve(conf_.DispWidth);
for(int x(0);x<conf_.DispWidth;++x)
{
double rx= (0.6+1.2)*x/static_cast<TReal>(conf_.DispW...
curve[x]= static_cast<TReal>(conf_.DispHeight-1)*0.5*...
std::cout<<"-";
}
std::cout<<std::endl;
int pos= static_cast<TReal>(conf_.DispWidth)*(state_(0)...
for(int y(0);y<conf_.DispHeight;++y)
{
for(int x(0);x<conf_.DispWidth;++x)
{
if(x==pos && y==curve[x]-1) std::cout<<"#";
else if(x==conf_.DispWidth-1 && y==curve[x]-1) std...
else if(y>=curve[x] || x==0) std::cout<<"^";
else std::cout<<" ";
}
std::cout<<std::endl;
}
for(int x(0);x<conf_.DispWidth;++x) std::cout<<"-";
std::cout<<std::endl<<std::endl;
}
}}
++ Implement the other slot port callbacks and output fun...
#codeh(cpp){{
virtual void slot_execute_action_exec (const TRealVector ...
{
accel_= a;
}
virtual const TRealVector& out_state_get (void) const
{
return state_;
}
virtual const TReal& out_cont_time_get (void) const
{
return time_;
}
}}
+ Add a Start() public member function that calls slot_st...
#codeh(cpp){{
void Start()
{
slot_start.Exec();
}
}}
+ Finally, use the SKYAI_ADD_MODULE macro to register the mod...
#codeh(cpp){{
SKYAI_ADD_MODULE(MMountainCarTaskModule)
}}
This should be written outside the class and inside the n...
That's it.
* Random Action Module [#x48e5ef1]
Next, in order to test MMountainCarTaskModule,...
MRandomActionModule has two ports:
- (port type), (port name), (return type), (parameter lis...
- slot, slot_timestep, void, (const TReal &dt), called at...
- signal, signal_action, void (const TRealVector &), emit...
Thus, its implementation is very simple:
#codeh(cpp){{
//=======================================================...
//!\brief Random action module
class MRandomActionModule
: public TModuleInterface
//=======================================================...
{
public:
typedef TModuleInterface TParent;
typedef MRandomActionModule TThis;
SKYAI_MODULE_NAMES(MRandomActionModule)
MRandomActionModule (const std::string &v_instance_name)
: TParent (v_instance_name),
slot_timestep (*this),
signal_action (*this)
{
add_slot_port (slot_timestep);
add_signal_port (signal_action);
}
protected:
MAKE_SLOT_PORT(slot_timestep, void, (const TReal &dt), ...
MAKE_SIGNAL_PORT(signal_action, void (const TRealVector...
virtual void slot_timestep_exec (const TReal &dt)
{
static int time(0);
static TRealVector a(1);
if(time%50==0)
switch(rand() % 3)
{
case 0: a(0)=0.0; break;
case 1: a(0)=+0.2; break;
case 2: a(0)=-0.2; break;
}
signal_action.ExecAll(a);
++time;
}
}; // end of MRandomActionModule
//-------------------------------------------------------...
}}
Then, use the SKYAI_ADD_MODULE macro to register the module o...
#codeh(cpp){{
SKYAI_ADD_MODULE(MRandomActionModule)
}}
* Main Function [#y8e5e9ed]
Refer to [[../Tutorial - Making Executable]].
The main function for the mountain-car task is almost the...
Here is an example:
#codeh(cpp){{
int main(int argc, char**argv)
{
TOptionParser option(argc,argv);
TAgent agent;
if (!ParseCmdLineOption (agent, option)) return 0;
MMountainCarTaskModule *p_mountaincar_task = dynamic_ca...
if(p_mountaincar_task==NULL) {LERROR("module `mountain...
agent.SaveToFile (agent.GetDataFileName("before.agent")...
p_mountaincar_task->Start();
agent.SaveToFile (agent.GetDataFileName("after.agent"),...
return 0;
}
}}
* Compile [#q8b2db95]
First, write a makefile which is almost the same as that ...
Then, execute the make command.
An executable named mountain_car.out will be generated.
* Agent Script for Random Action Test [#e998a6d3]
Please refer to [[../Tutorial - Writing Agent Script]].
Now, let's test MMountainCarTaskModule using MRandomActio...
+ Create a blank file named random_act.agent and open it.
+ Instantiate each module; the MMountainCarTaskModule's i...
#codeh(cpp){{
module MMountainCarTaskModule mountaincar_task
module MRandomActionModule rand_action
}}
+ Connect the port pairs:
#codeh(cpp){{
connect mountaincar_task.signal_start_of_timestep , rand...
connect rand_action.signal_action , mountaincar_task.slo...
}}
+ Assign values to the configuration parameters of mountaincar_t...
#codeh(cpp){{
mountaincar_task.config={
SleepUTime= 1000
}
}}
That's it. Let's test!
Launch the executable as follows:
#codeh(sh){{
./mountain_car.out -agent random_act
}}
You will see a mountain like the following, where the car (#) move...
(-0.242451,0.875342), 0, 35.61/1
----------------------------------------
^ G
^ ^^^^^
^ ^^^^^^^
^ ^^^^^^^^
^ ^^^^^^^^^^
^^ ^^^^^^^^^^^
^^^ ^^^^^^^^^^^^
^^^^ ^^^^^^^^^^^^^
^^^^^ ^^^^^^^^^^^^^^
^^^^^^ ^^^^^^^^^^^^^^^
^^^^^^^ ^^^^^^^^^^^^^^^^
^^^^^^^^ # ^^^^^^^^^^^^^^^^^
^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
----------------------------------------
* Normalized Gaussian Network (NGnet) [#pc1927a1]
We use an NGnet to approximate the action value function.
In order to use the NGnet, we need to follow the process:
+ Generate a set of basis functions and save them into a ...
+ Specify the file path of the parameter of the NGnet mod...
+ Use NGnet with an RL module.
The basis functions are allocated over the state space; t...
In this section, we describe how to generate the basis fu...
The generating tools are stored in the tools/ngnet-genera...
The basis functions of NGnet are generated as follows:
#codeh(sh){{
./gen-grid.out -out OUT_FILENAME -unit_grid DIV_VEC -xmin...
}}
Its options are (N: the dimensionality of state):
- OUT_FILENAME : Output file name.
- DIV_VEC : Vector whose element is the number of divisio...
- MIN_VEC : Vector whose element is lower bound of the ce...
- MAX_VEC : Vector whose element is upper bound of the ce...
- INVSIGMA_VEC : Vector of the diagonal elements of the i...
Of course, N is 2 in the mountain-car task.
You can investigate the upper and the lower bound in the ...
In this task, let us use 5x5 basis functions.
Thus, we generate the basis functions of NGnet as follows:
#codeh(sh){{
../../tools/ngnet-generator/gen-grid.out -out ngnet_mc5x5...
}}
where ../../ denotes the relative path to the SkyAI base ...
The file ngnet_mc5x5.dat is generated, which is a text fo...
#ref(./ngnet.png,center,zoom,300x0)
This figure illustrates the locations of the basis functi...
Each ellipse shows the center of a Gaussian basis functio...
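Before wiring the NGnet into the learning agent, it may help to see what it computes. The sketch below (independent of SkyAI and of the generated ngnet_mc5x5.dat file) evaluates a few Gaussian basis functions on a 2-dimensional state and normalizes them so that the activations sum to one; the centers and inverse covariance values are illustrative only, not the ones generated above.
#codeh(cpp){{
// Minimal NGnet activation sketch for a 2-D state (illustrative values only).
#include <cmath>
#include <cstdio>

int main()
{
  const int N= 4;                              // number of basis functions
  const double centers[N][2]= {{-1.0,0.0},{-0.5,0.0},{0.0,0.0},{0.5,0.0}};
  const double invsigma[2]= {10.0, 1.0};       // diagonal inverse covariance (illustrative)
  const double state[2]= {-0.4, 0.3};          // (position, velocity)

  double phi[N], sum(0.0);
  for(int i(0); i<N; ++i)
  {
    double d0= state[0]-centers[i][0], d1= state[1]-centers[i][1];
    phi[i]= std::exp(-0.5*(invsigma[0]*d0*d0 + invsigma[1]*d1*d1));  // unnormalized Gaussian
    sum+= phi[i];
  }
  for(int i(0); i<N; ++i)  phi[i]/= sum;       // normalize so the activations sum to 1
  for(int i(0); i<N; ++i)  std::printf("phi[%d]= %f\n", i, phi[i]);
  return 0;
}
}}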
* Agent Script for Q(lambda)-learning with NGnet [#i71875...
Please refer to [[../Tutorial - Writing Agent Script]].
Let's apply a Q-learning module to MMountainCarTaskModule.
+ Create a blank file named ql.agent and open it.
+ Include ql_da where a composite Q-learning module is de...
#codeh(cpp){{
include_once "ql_da"
}}
+ Instantiate the following modules; the MMountainCarTask...
#codeh(cpp){{
module MMountainCarTaskModule mountaincar_task
module MTDDiscAct behavior
module MLCHolder_TRealVector direct_action
module MDiscretizer action_discretizer
module MBasisFunctionsNGnet ngnet
}}
- MTDDiscAct : TD(lambda)-learning module.
- MDiscretizer : Module to define a discrete action set.
- MLCHolder_TRealVector : Module to hold a control signal...
- MBasisFunctionsNGnet : NGnet function approximator.
+ Connect the port pairs:
#codeh(cpp){{
/// initialization process:
connect mountaincar_task.signal_initialization , ng...
connect ngnet.slot_initialize_finished , ac...
connect action_discretizer.slot_initialize_finished , be...
/// start of episode process:
connect mountaincar_task.signal_start_of_episode , be...
/// start of time step process:
connect mountaincar_task.signal_start_of_timestep , di...
/// end of time step process:
connect mountaincar_task.signal_end_of_timestep , di...
/// learning signals:
connect behavior.signal_execute_action , ac...
connect action_discretizer.signal_out , di...
connect direct_action.signal_execute_command , mo...
connect direct_action.signal_end_of_action , be...
connect mountaincar_task.signal_reward , be...
connect mountaincar_task.signal_finish_episode , be...
/// I/O:
connect action_discretizer.out_set_size , be...
connect mountaincar_task.out_state , ng...
connect ngnet.out_y , be...
connect mountaincar_task.out_cont_time , be...
}}
+ Task module setup:
#codeh(cpp){{
mountaincar_task.config={
SleepUTime= 1000
}
}}
+ NGnet file path:
#codeh(cpp){{
ngnet.config ={
NGnetFileName = "ngnet_mc5x5.dat"
}
}}
+ Discrete action set with a control-command holder confi...
#codeh(cpp){{
action_discretizer.config ={
Min = (-0.2, -0.2)
Max = ( 0.2, 0.2)
Division = (3, 3)
}
direct_action.config ={Interval = 0.2;}
}}
+ Learning configuration:
#codeh(cpp){{
behavior.config={
UsingEligibilityTrace = true
UsingReplacingTrace = true
Lambda = 0.9
GradientMax = 1.0e+100
ActionSelection = "asBoltzman"
PolicyImprovement = "piExpReduction"
Tau = 1
TauDecreasingFactor = 0.05
TraceMax = 1.0
Gamma = 0.9
Alpha = 0.3
AlphaDecreasingFactor = 0.002
AlphaMin = 0.05
}
}}
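Some background on the parameters above (these are general Q(lambda)/softmax facts, not SkyAI-specific documentation): Gamma is the discount factor, Alpha the learning rate, Lambda the eligibility-trace decay, and Tau the temperature of Boltzmann (softmax) action selection, in which each discrete action is chosen with probability proportional to exp(Q(s,a)/Tau); piExpReduction presumably reduces Tau over episodes via TauDecreasingFactor, so the policy becomes greedier as learning proceeds. The following standalone sketch illustrates the Boltzmann selection rule only; it is not SkyAI's implementation.
#codeh(cpp){{
// Boltzmann (softmax) action selection over action values (illustration only).
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>

int select_boltzmann(const std::vector<double> &q, double tau)
{
  // p_i is proportional to exp(q_i/tau); subtract max(q) for numerical stability.
  double qmax= q[0];
  for(std::size_t i(1); i<q.size(); ++i)  if(q[i]>qmax) qmax= q[i];
  std::vector<double> p(q.size());
  double sum(0.0);
  for(std::size_t i(0); i<q.size(); ++i)  {p[i]= std::exp((q[i]-qmax)/tau); sum+= p[i];}
  // Sample an action index from the resulting distribution:
  double r= sum*(std::rand()/(RAND_MAX+1.0));
  for(std::size_t i(0); i<q.size(); ++i)  {r-= p[i]; if(r<0.0) return static_cast<int>(i);}
  return static_cast<int>(q.size())-1;
}

int main()
{
  std::vector<double> q(3);  q[0]=0.1; q[1]=0.5; q[2]=0.3;  // example action values
  int counts[3]= {0,0,0};
  for(int n(0); n<10000; ++n)  ++counts[select_boltzmann(q, /*tau=*/0.2)];
  std::printf("selection counts: %d %d %d\n", counts[0], counts[1], counts[2]);
  return 0;
}
}}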
Launch the executable as follows:
#codeh(sh){{
./mountain_car.out -path ../../benchmarks/cmn -agent ql -...
}}
where ../../benchmarks/cmn is a relative path of the benc...
After several tens of episodes, the policy will converge ...
#block
(-0.499914,0.00861355), 0.2, 0.01/200
----------------------------------------
^ G
^ ^^^^^
^ ^^^^^^^
^ ^^^^^^^^
^ ^^^^^^^^^^
^^ ^^^^^^^^^^^
^^^ ^^^^^^^^^^^^
^^^^ ^^^^^^^^^^^^^
^^^^^ ^^^^^^^^^^^^^^
^^^^^^ ^^^^^^^^^^^^^^^
^^^^^^^ ^^^^^^^^^^^^^^^^
^^^^^^^^ ^^^^^^^^^^^^^^^^^
^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^ # ^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
----------------------------------------
#block(next)
(-0.450656,0.253621), 0.2, 0.35/0
----------------------------------------
^ G
^ ^^^^^
^ ^^^^^^^
^ ^^^^^^^^
^ ^^^^^^^^^^
^^ ^^^^^^^^^^^
^^^ ^^^^^^^^^^^^
^^^^ ^^^^^^^^^^^^^
^^^^^ ^^^^^^^^^^^^^^
^^^^^^ ^^^^^^^^^^^^^^^
^^^^^^^ ^^^^^^^^^^^^^^^^
^^^^^^^^ ^^^^^^^^^^^^^^^^^
^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^ # ^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
----------------------------------------
#block(end)
#block
(-0.317402,0.311478), 0.2, 0.78/0
----------------------------------------
^ G
^ ^^^^^
^ ^^^^^^^
^ ^^^^^^^^
^ ^^^^^^^^^^
^^ ^^^^^^^^^^^
^^^ ^^^^^^^^^^^^
^^^^ ^^^^^^^^^^^^^
^^^^^ ^^^^^^^^^^^^^^
^^^^^^ ^^^^^^^^^^^^^^^
^^^^^^^ ^^^^^^^^^^^^^^^^
^^^^^^^^ ^^^^^^^^^^^^^^^^^
^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^ #^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
----------------------------------------
#block(next)
(-0.62904,-0.879678), -0.2, 1.54/0
----------------------------------------
^ G
^ ^^^^^
^ ^^^^^^^
^ ^^^^^^^^
^ ^^^^^^^^^^
^^ ^^^^^^^^^^^
^^^ ^^^^^^^^^^^^
^^^^ ^^^^^^^^^^^^^
^^^^^ ^^^^^^^^^^^^^^
^^^^^^ ^^^^^^^^^^^^^^^
^^^^^^^ ^^^^^^^^^^^^^^^^
^^^^^^^^ ^^^^^^^^^^^^^^^^^
^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^# ^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
----------------------------------------
#block(end)
#block
(-0.915373,-0.0505839), 0.2, 2.06/0
----------------------------------------
^ G
^ ^^^^^
^ ^^^^^^^
^ ^^^^^^^^
^ ^^^^^^^^^^
^^ ^^^^^^^^^^^
^^^ ^^^^^^^^^^^^
^^^^ ^^^^^^^^^^^^^
^^^^^ ^^^^^^^^^^^^^^
^^^^^^# ^^^^^^^^^^^^^^^
^^^^^^^ ^^^^^^^^^^^^^^^^
^^^^^^^^ ^^^^^^^^^^^^^^^^^
^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
----------------------------------------
#block(next)
(-0.638749,1.06824), 0.2, 2.54/0
----------------------------------------
^ G
^ ^^^^^
^ ^^^^^^^
^ ^^^^^^^^
^ ^^^^^^^^^^
^^ ^^^^^^^^^^^
^^^ ^^^^^^^^^^^^
^^^^ ^^^^^^^^^^^^^
^^^^^ ^^^^^^^^^^^^^^
^^^^^^ ^^^^^^^^^^^^^^^
^^^^^^^ ^^^^^^^^^^^^^^^^
^^^^^^^^ ^^^^^^^^^^^^^^^^^
^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^# ^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
----------------------------------------
#block(end)
#block
(-0.162464,1.08153), 0.2, 2.95/0
----------------------------------------
^ G
^ ^^^^^
^ ^^^^^^^
^ ^^^^^^^^
^ ^^^^^^^^^^
^^ ^^^^^^^^^^^
^^^ ^^^^^^^^^^^^
^^^^ ^^^^^^^^^^^^^
^^^^^ ^^^^^^^^^^^^^^
^^^^^^ ^^^^^^^^^^^^^^^
^^^^^^^ #^^^^^^^^^^^^^^^^
^^^^^^^^ ^^^^^^^^^^^^^^^^^
^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
----------------------------------------
#block(next)
(0.149024,0.667015), 0.2, 3.31/0
----------------------------------------
^ G
^ ^^^^^
^ ^^^^^^^
^ ^^^^^^^^
^ #^^^^^^^^^^
^^ ^^^^^^^^^^^
^^^ ^^^^^^^^^^^^
^^^^ ^^^^^^^^^^^^^
^^^^^ ^^^^^^^^^^^^^^
^^^^^^ ^^^^^^^^^^^^^^^
^^^^^^^ ^^^^^^^^^^^^^^^^
^^^^^^^^ ^^^^^^^^^^^^^^^^^
^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
----------------------------------------
#block(end)
#block
(0.595877,0.685196), 0.2, 4.16/0
----------------------------------------
^ #
^ ^^^^^
^ ^^^^^^^
^ ^^^^^^^^
^ ^^^^^^^^^^
^^ ^^^^^^^^^^^
^^^ ^^^^^^^^^^^^
^^^^ ^^^^^^^^^^^^^
^^^^^ ^^^^^^^^^^^^^^
^^^^^^ ^^^^^^^^^^^^^^^
^^^^^^^ ^^^^^^^^^^^^^^^^
^^^^^^^^ ^^^^^^^^^^^^^^^^^
^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
----------------------------------------
#block(end)
In order to store the learning logs, make a directory res...
Plotting log-eps-ret.dat, you will obtain a learning curve:
#ref(./out-mountaincar.png,zoom,center,600x0)
CENTER:''Example of a learning curve.''