Documentation/Introduction

SkyAI is a highly modular reinforcement learning (RL) library with which real and simulated robots can learn behaviors. Our ultimate goal is to develop an artificial intelligence (AI) program with which robots can learn to behave as their users wish.

Introduction to SkyAI

Being able to design a behavior by specifying only its objective is essential for future robots, because it lets end-users tell a robot what they want in an easy way. Reinforcement learning (RL) is one such technology, so RL applications in robotics are of great interest.

RL algorithms play an important role in SkyAI. Thus, this document introduces RL first.

This introduction is written from a robotics viewpoint, but please keep in mind that RL algorithms and SkyAI are applicable to more general situations.

What is RL?

Reinforcement learning is a very general approach to enable a robot (agent) to acquire a behavior.

Usual Robot Development

In general, the developer of a robot has to specify the following:

state (sensors) definition,
action (actuators) definition,
relations among the sensors, and between the sensors and the actuators (the so-called kinematics and dynamics models of the robot),
programs with which the robot behaves.

These may be difficult for the end-users of the robot to provide.

Using RL

RL algorithms relax these requirements: the robot tries to find out the models and the programs by itself. The developer now only has to specify:

state (sensors) definition,
action (actuators) definition,
reward function, which indicates the objective of the robot's behavior.

The state and action definitions can be pre-defined by the developer. Thus, the end-users only need to specify the reward function.

Reward-based Behavior Design

The reward function is the key. It is a scalar function that is evaluated and given to the robot at the end of every action. The RL algorithm tries to move the robot so that it obtains as much reward as possible.

For instance, if we want the robot to walk forward, we can define the reward as the speed of the robot. If we want the robot to jump, we can define the reward as the vertical position of the robot.
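
The two examples above can be sketched in C++ as follows. This is only an illustration: the `RobotState` structure, its fields, and the function names are hypothetical and are not part of SkyAI.

```cpp
// Hypothetical snapshot of what the robot senses after one action.
// The field names are illustrative assumptions, not SkyAI types.
struct RobotState {
  double forward_speed;  // speed along the walking direction [m/s]
  double height;         // vertical position of the body [m]
};

// Reward for "walk forward": the faster the robot moves forward,
// the larger the reward.
double WalkReward(const RobotState& s) { return s.forward_speed; }

// Reward for "jump": the higher the body is, the larger the reward.
double JumpReward(const RobotState& s) { return s.height; }
```

Note that only the reward function changes between the two behaviors; the state definition stays the same.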

After the reward function is specified, the robot begins practicing (learning). In the early stage of learning, it moves its body randomly. Through learning, the robot finds a better policy (how to behave), and the behavior is refined little by little. Finally, the robot obtains the optimal policy.
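
As a rough illustration of this process, the sketch below runs tabular Q-learning with epsilon-greedy exploration on a toy problem. Everything in it is an assumption made for the example: the "robot" is a point on a line of five positions, the reward is 1 when it reaches the rightmost position, and the learning rate, discount factor, and exploration schedule are arbitrary. It is not the algorithm implementation shipped with SkyAI.

```cpp
#include <algorithm>
#include <array>
#include <random>

constexpr int kNumStates = 5;   // positions 0..4 on a line; 4 is the goal
constexpr int kNumActions = 2;  // 0: step left, 1: step right

using QTable = std::array<std::array<double, kNumActions>, kNumStates>;

// Greedy action for a state (ties are resolved in favor of action 0).
int GreedyAction(const QTable& q, int s) { return q[s][1] > q[s][0] ? 1 : 0; }

// Tabular Q-learning with epsilon-greedy exploration on the toy corridor.
QTable LearnCorridor(int episodes, unsigned seed) {
  QTable q{};  // all action values start at zero
  std::mt19937 rng(seed);
  std::uniform_real_distribution<double> unit(0.0, 1.0);
  std::uniform_int_distribution<int> any_action(0, kNumActions - 1);
  const double alpha = 0.5;  // learning rate (arbitrary choice)
  const double gamma = 0.9;  // discount factor (arbitrary choice)

  for (int ep = 0; ep < episodes; ++ep) {
    // Move almost randomly at first, then rely more on what was learned.
    const double epsilon =
        std::max(0.05, 1.0 - static_cast<double>(ep) / episodes);
    int s = 0;
    for (int step = 0; step < 100 && s != kNumStates - 1; ++step) {
      const int a = unit(rng) < epsilon ? any_action(rng) : GreedyAction(q, s);
      // Walls at both ends: the position is kept inside 0..4.
      const int next =
          std::min(std::max(s + (a == 1 ? 1 : -1), 0), kNumStates - 1);
      const bool done = (next == kNumStates - 1);
      const double reward = done ? 1.0 : 0.0;
      const double best_next = done ? 0.0 : std::max(q[next][0], q[next][1]);
      // Q-learning update toward the one-step lookahead target.
      q[s][a] += alpha * (reward + gamma * best_next - q[s][a]);
      s = next;
    }
  }
  return q;
}
```

Calling `LearnCorridor(500, 42)` and then `GreedyAction` for each non-goal position gives the learned policy; with enough episodes, it moves right everywhere.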

Adaptable to Unknown Environment

Remarkably, we do not need to specify the kinematics and dynamics models. The robot learns them implicitly (or explicitly, in some cases). This means that RL methods can be applied to problems whose kinematics or dynamics models are difficult to identify. This feature is well suited to real-world tasks, since building precise models of the real world, including humans, is often difficult.

Many Applications

Thus, a lot of research on RL applications has been carried out. You can find some videos of RL applications on the following websites:

MPI Robot Learning Lab (see "research")
Video Highlights - Robot Locomotion Group (MIT)
researches - A. Yamaguchi (Robotics Lab, NAIST)

Issue of RL and Solutions

However, RL methods have a big issue: they require a large learning cost. Here, the learning cost means both the computational cost and the sampling cost (e.g. the robot falling down while it tries actions). The latter is crucial for a real robot. Worse, the learning cost increases exponentially with the complexity of the task and the robot (e.g. the number of degrees of freedom). As long as this problem is not solved, applying RL to realistic tasks is considered difficult.
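
A small calculation gives a feeling for this exponential growth. The sketch below assumes, purely for illustration, that every degree of freedom is discretized into the same number of levels and that the learner keeps one table cell per combination; the function name is hypothetical.

```cpp
#include <cstdint>

// Number of cells in a lookup table when each of `dof` variables is
// discretized into `bins` levels, i.e. bins^dof.
// Note: the result overflows for very large inputs.
std::uint64_t TableSize(std::uint64_t bins, unsigned dof) {
  std::uint64_t size = 1;
  for (unsigned i = 0; i < dof; ++i) size *= bins;
  return size;
}
```

With 10 levels per variable, 2 degrees of freedom give 100 cells, 6 give one million, and 12 give one trillion, each of which would have to be visited by trial and error.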

Many researchers are tackling the learning cost issue. The following strategies for improving RL methods have been proposed so far:

dimension reduction,
model utilization,
hierarchical structure,
imitation of the others,
reusing already learned knowledge.

Concepts of SkyAI

Solving the RL Issues, Creating a Robot's Intelligence

The aim of SkyAI is to integrate these ideas so that many developers in robotics (and in other fields) can use sophisticated RL methods.

Highly Modularized Architecture

The approach of SkyAI is to modularize the methods. A modular architecture enables the following features:

High scalability: A modular architecture makes it easy to inherit from an existing module and create a new one. That is, the library is highly scalable.
High reusability of implementation: A modular architecture confines the task-specific (problem-specific) implementation to a few modules. Typical examples are reward modules and low-level robot controllers. The remaining modules can then be made generic, i.e. highly reusable.
High reusability of learned knowledge: A modular architecture also enhances the reusability of learned knowledge, such as a policy learned by an RL algorithm, a dynamics model, and a reward model.

Once Compiled, Reconfigurable Infinitely

SkyAI must run at high speed and should also be highly flexible.

Both features are very important for applying SkyAI to a real robot system. A real robot system requires high-speed execution. On the other hand, we need flexibility like that of scripting languages.

Thus, SkyAI is developed so that once software using SkyAI has been compiled, its modular structure can be reconfigured any number of times without recompiling.

Developer Friendly

A highly modularized architecture has many benefits, as mentioned above. However, it sometimes makes development difficult. SkyAI therefore pursues a developer-friendly implementation.

Implementation

Based on these concepts, SkyAI is implemented as follows.

Written in C++

A program written in C++ is compiled to native executable code, whose execution speed is among the highest available. This is well suited to a real robot system.

Core Modular Architecture

A module is implemented as a class in C++.

In general, communication between classes is performed through member functions. In SkyAI, in order to achieve reconfigurability, a member function is encapsulated as a port object. Each module can have any number of ports. Ports can be connected and disconnected flexibly, which makes it possible to reconfigure the modular structure.
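
The idea of wrapping a member function in a connectable port can be sketched with `std::function`. This is a minimal illustration under our own assumptions, not SkyAI's actual port classes: `SignalPort`, `Behavior`, and `Robot` are hypothetical names, and the action is simplified to a `double`.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical signal port: holds the callbacks (slots) connected to it.
template <typename... Args>
class SignalPort {
 public:
  using Slot = std::function<void(Args...)>;

  // Connect a slot; the returned index can be used to disconnect it later.
  std::size_t Connect(Slot slot) {
    slots_.push_back(std::move(slot));
    return slots_.size() - 1;
  }

  // Disconnect by clearing the slot so that later Emit calls skip it.
  void Disconnect(std::size_t id) {
    if (id < slots_.size()) slots_[id] = nullptr;
  }

  // Call every connected slot with the given arguments.
  void Emit(Args... args) const {
    for (const Slot& slot : slots_) {
      if (slot) slot(args...);
    }
  }

 private:
  std::vector<Slot> slots_;
};

// Toy module that emits an action through a port instead of calling
// another class directly.
struct Behavior {
  SignalPort<double> signal_execute_action;
  void Act(double action) { signal_execute_action.Emit(action); }
};

// Toy module whose member function is the receiving end.
struct Robot {
  double total_command = 0.0;
  void ExecuteAction(double action) { total_command += action; }
};
```

A connection is then made at run time, for example `behavior.signal_execute_action.Connect([&robot](double a) { robot.ExecuteAction(a); })`, so `Behavior` never needs to know the type of the module it drives.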

To provide some basic functions, such as the script interface, every module inherits from a basic module class. Alternatively, a module can inherit from an interface module, which provides some default port definitions. When a module class is inherited, its port objects are inherited as well. Using interface modules increases reusability and exchangeability.

Script Interface

A script language is defined for SkyAI. A script can instantiate modules, connect modules, and set parameters of the modules (e.g. a learning rate).
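
One common way to let a script instantiate compiled C++ classes by name is a factory that maps type names to creator functions. The sketch below shows that general technique only; it is an assumption about how such a mechanism can look, not a description of SkyAI's internals, and all class and function names in it are hypothetical.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical common base class of all modules.
struct Module {
  virtual ~Module() = default;
  virtual std::string TypeName() const = 0;
};

struct RandomPolicy : Module {
  std::string TypeName() const override { return "RandomPolicy"; }
};

struct GreedyPolicy : Module {
  std::string TypeName() const override { return "GreedyPolicy"; }
};

// Maps a type name found in a script to a function that creates the module.
class ModuleFactory {
 public:
  using Creator = std::function<std::unique_ptr<Module>()>;

  void Register(const std::string& type, Creator creator) {
    creators_[type] = std::move(creator);
  }

  // Returns a null pointer when the type name is unknown.
  std::unique_ptr<Module> Create(const std::string& type) const {
    const auto it = creators_.find(type);
    if (it == creators_.end()) return nullptr;
    return it->second();
  }

 private:
  std::map<std::string, Creator> creators_;
};

// Registration happens once in compiled code; afterwards any combination
// of the registered modules can be created from text at run time.
ModuleFactory MakeDefaultFactory() {
  ModuleFactory factory;
  factory.Register("RandomPolicy", []() -> std::unique_ptr<Module> {
    return std::unique_ptr<Module>(new RandomPolicy);
  });
  factory.Register("GreedyPolicy", []() -> std::unique_ptr<Module> {
    return std::unique_ptr<Module>(new GreedyPolicy);
  });
  return factory;
}
```

With such a table, changing which modules are created only requires editing the script text, not recompiling the program.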

Bridge between C++ Code and Script

Some macros are provided so that ports can be made easily. The ports made with them become available in the script language. For example:

class MSomeModule : public TModuleInterface
{
  ...
protected:
  MAKE_SIGNAL_PORT(signal_execute_action, void(const TAction&), TThis);
  ...
};

Thus, making ports does not require a code generator.

SkyAI also provides a way to handle the parameters of a module in the script language. For example, if you want to handle the following structure type in the script language,

struct TSomeConfigurations
{
  int                              X;
  double                           Y;
  std::vector<double>              VecData;
  std::vector<std::list<double> >  CmpData;
};

just define a "Register" member function and a constructor as:

TSomeConfigurations (TVariableMap &mmap)
  {
    Register(mmap);
  }
void Register (TVariableMap &mmap)
  {
    AddToVarMap(mmap, "X"      , X);
    AddToVarMap(mmap, "Y"      , Y);
    AddToVarMap(mmap, "VecData", VecData);
    AddToVarMap(mmap, "CmpData", CmpData);
  }

Then, you can assign the values in the script language as follows:

X = 12
Y = -2.1e+4
VecData = (1, 2.236, 3)

CmpData = {
    [] = (0.1, 0.2, 0.3)  // push back a list
    [] = (0.4, 0.5, 0.6)  // ditto
  }

Thus, a developer who is familiar with C++ can make modules easily.
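
To make the registration idea more concrete, here is a minimal sketch of how a name-to-variable map could work. It is an assumption for illustration only and is not how `TVariableMap` or `AddToVarMap` are actually implemented: `SketchVarMap`, `Bind`, `Assign`, and `SketchConfig` are hypothetical names, and only scalar members are handled (no containers such as `VecData`).

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for a variable map:
// name -> "parse this text and store it in the bound variable".
using SketchVarMap =
    std::map<std::string, std::function<bool(const std::string&)>>;

// Bind a name to a variable. The variable must outlive the map.
template <typename T>
void Bind(SketchVarMap& vmap, const std::string& name, T& variable) {
  vmap[name] = [&variable](const std::string& text) {
    std::istringstream in(text);
    T value;
    if (!(in >> value)) return false;  // leave the variable untouched
    variable = value;
    return true;
  };
}

// Assign by name, as an interpreter might do for a line such as "X = 12".
// Returns false for unknown names or text that cannot be parsed.
bool Assign(SketchVarMap& vmap, const std::string& name,
            const std::string& text) {
  const auto it = vmap.find(name);
  return it != vmap.end() && it->second(text);
}

// Configuration structure that registers its own members by name.
struct SketchConfig {
  int X = 0;
  double Y = 0.0;

  void Register(SketchVarMap& vmap) {
    Bind(vmap, "X", X);
    Bind(vmap, "Y", Y);
  }
};
```

After `config.Register(vmap)`, a call such as `Assign(vmap, "Y", "-2.1e+4")` updates `config.Y` without the interpreter knowing the structure's type.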

Applying SkyAI

In order to apply SkyAI to a specific problem (a robot system), the developer needs to make

a robot module (or an environment module), and
a task module.

These modules should be compatible with the modules provided by SkyAI. Thus, the developer should make them by inheriting from the basic module or from other interface modules. As described above, this is not difficult.

A developer may also need to make some additional modules, such as a TCP communication module. If such modules are implemented as generic ones, they can be published so that other developers can reuse them.

Benchmarks

Some benchmark programs are also provided with SkyAI. They define some simulated robots and some environments, such as a maze. In addition, a controller module for Bioloid (made by ROBOTIS) is provided as an off-the-shelf robot module.
