
4. SIMULATOR AND METHODS

4.3 Using the Simulator

This section describes the software tools that were used to train the controllers with simulators. Our aim is to have a controller that can predict actions in the real environment by observing states similar to those of the simulated environment. The simulation program is CoppeliaSim [41], previously known as V-REP. We implemented the DDPG algorithm in a Python script, and this script communicates with the simulator through a toolkit called PyRep [42].

4.3.1 CoppeliaSim

CoppeliaSim is a versatile and scalable simulation framework [41]. It is currently a popular simulator in industry and research, and the wide range of available objects and robots makes it very useful. The features of the simulator [41] that are important for our work are:

• Physics Engine

• Remote API

• Collision Detection

• Minimum Distance Calculation

Physics Engine. CoppeliaSim has four different physics engines available: Bullet, ODE, Newton, and Vortex Dynamics. These physics engines enable the simulator to reproduce real-world dynamics and interactions between objects. This is also important for works in which manipulators need to grasp and place objects, but we will not deal with such tasks. The Bullet physics engine [43] is used in our work. The simulation time step is set to 50 ms, which is the time needed for the main script to be executed again.
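The time step and the scene can be configured when the simulation is launched from Python through PyRep (Section 4.3.2). The following sketch illustrates this; the scene file name is a placeholder, not the actual scene of this thesis.

from pyrep import PyRep

# Minimal sketch: launch a CoppeliaSim scene from Python and set the physics
# time step to 50 ms. The scene file name is a placeholder.
pr = PyRep()
pr.launch('manipulator_scene.ttt', headless=True)
pr.set_simulation_timestep(0.05)  # 50 ms per simulation step
pr.start()

# ... training or testing loop runs here, calling pr.step() once per time step ...

pr.stop()
pr.shutdown()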

Each scene has one main script which contains the code necessary to run the simulation.

For the dynamic movement of the obstacle in our scene, a threaded child script written in the Lua language is run for that specific object. Inside that script, a random target location is determined for the obstacle in the light gray area in Figure 4.1. Then, the obstacle moves to that location with a random velocity in the range [0, 0.04] m/s. Note that the position of the obstacle is not reset at the end of an episode; it continues this random movement independently of our episodes. The manipulator is located at the origin of the scene, i.e., the origin of our Cartesian coordinate system is the base of the manipulator.
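The obstacle behaviour itself is implemented as a threaded Lua child script inside the scene, but its logic can be illustrated with the following Python sketch; the object name, workspace bounds, and helper function are assumptions made only for the illustration.

import random
from pyrep.objects.shape import Shape

# Illustration of the obstacle logic; in the thesis this runs as a Lua child script.
obstacle = Shape('Obstacle')                        # object name is an assumption
X_RANGE, Y_RANGE, Z = (-0.4, 0.4), (0.2, 0.6), 0.3  # assumed bounds of the light gray area
DT = 0.05                                           # 50 ms simulation step

def new_target():
    return [random.uniform(*X_RANGE), random.uniform(*Y_RANGE), Z]

target = new_target()
speed = random.uniform(0.0, 0.04)                   # random velocity in [0, 0.04] m/s

def move_obstacle_one_step():
    # Move the obstacle towards the current random target by speed * DT.
    global target, speed
    pos = obstacle.get_position()
    direction = [t - p for t, p in zip(target, pos)]
    dist = sum(d * d for d in direction) ** 0.5
    step = speed * DT
    if dist <= step:                                # target reached: draw a new target and velocity
        obstacle.set_position(target)
        target, speed = new_target(), random.uniform(0.0, 0.04)
    else:
        obstacle.set_position([p + d / dist * step for p, d in zip(pos, direction)])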

Remote API. The remote API enables the simulator to communicate with external applications such as MATLAB, Python, and C++. Our simulator communicates with a Python script, but we use PyRep (explained in Section 4.3.2) instead of the remote API.

Collision Detection Module. The Collision Detection Module is used to detect collisions between objects. We use this module to detect collisions between the manipulator and the obstacle, and the result is used as a flag in our training algorithm. An example of this detection is shown in Figure 4.5, where the manipulator turns red after a collision.

Figure 4.5. Example of collision detection.
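With PyRep, this collision flag can be queried directly from Python; the sketch below assumes the manipulator is loaded as a PyRep Panda arm model and that the obstacle object is named 'Obstacle', both of which are placeholders for the actual scene objects.

from pyrep.objects.shape import Shape
from pyrep.robots.arms.panda import Panda   # arm model is an assumption

arm = Panda()                               # the manipulator in the scene
obstacle = Shape('Obstacle')                # object name is an assumption

# True if any part of the arm collides with the obstacle; used as a flag
# in the training algorithm and for the test statistics.
collided = arm.check_arm_collision(obstacle)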

Distance Calculation Module. The Distance Calculation Module is used to measure the minimum Euclidean distance between any parts of two objects. We use this module to measure the minimum distance between the manipulator and the obstacle, and between the end-effector and the target object. An example of this measurement is shown in Figure 4.6, where the distance vectors are shown as thick black lines with the distance values in meters next to them.
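PyRep exposes an equivalent measurement through its check_distance call; a minimal sketch, with the object and dummy names assumed for illustration:

from pyrep.objects.shape import Shape
from pyrep.objects.dummy import Dummy

obstacle = Shape('Obstacle')        # object names are assumptions
target_object = Shape('Target')
tip = Dummy('end_effector_tip')     # e.g. the dummy returned by the arm's get_tip()

# Minimum Euclidean distance between two scene objects, here for the
# end-effector / target pair; the manipulator / obstacle pair is measured analogously.
d_ee_target = tip.check_distance(target_object)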

4.3.2 PyRep

PyRep is a toolkit built on top of CoppeliaSim which enables us to launch the simulation from a Python script and execute Python code simultaneously with the simulation [42].

Remote API calls are not fast enough to run our RL algorithms, so we use this toolkit, which is faster than the remote API. At every step, PyRep retrieves from the simulator the information about the environment that is needed for the state vector and reward calculation during DDPG training and testing. These information items are listed below; a sketch of how they can be queried through PyRep follows the list:

• Target object Cartesian position

• Manipulator joint velocities and intrinsic positions

Figure 4.6. Example of minimum distance measurement.

• End-effector Cartesian position

• Minimum distance between manipulator and obstacle

• Minimum distance between end-effector and target object

• Collision detection between manipulator and obstacle
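A sketch of how these items can be queried through PyRep is given below; the arm model, the object names, and the exact state layout are assumptions made for illustration (for example, the manipulator-obstacle distance is simplified to the end-effector tip here).

import numpy as np
from pyrep.objects.shape import Shape
from pyrep.robots.arms.panda import Panda   # arm model is an assumption

arm = Panda()
obstacle = Shape('Obstacle')                 # object names are assumptions
target_object = Shape('Target')
tip = arm.get_tip()

def get_observation():
    # Query the information items listed above (illustrative layout).
    joint_pos = arm.get_joint_positions()         # intrinsic joint positions
    joint_vel = arm.get_joint_velocities()        # joint velocities
    target_pos = target_object.get_position()     # target object Cartesian position
    tip_pos = tip.get_position()                  # end-effector Cartesian position
    d_obstacle = tip.check_distance(obstacle)     # simplification of the manipulator-obstacle distance
    d_target = tip.check_distance(target_object)  # end-effector to target distance
    collided = arm.check_arm_collision(obstacle)  # collision flag
    state = np.concatenate([joint_pos, joint_vel, target_pos, tip_pos,
                            [d_obstacle, d_target]])
    return state, collided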

Our Python script constructs the state vector from this information and then predicts the action values (joint velocities) using the state vector. PyRep then sends the joint velocities predicted by the DDPG actor to the simulator and commands it to step the physics simulation with those joint velocities. After the step has been executed, PyRep retrieves the new information from the simulator to compute the next state vector and the reward for the executed action. Note that the movement of the dynamic obstacle runs in a child script independent of the PyRep commands. At the end of each episode, PyRep resets the environment by randomly placing the target object in the dark gray area and resetting the positions and orientations of the joints of the manipulator (Figure 4.1).
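This interaction can be condensed into a single step function; the sketch below reuses the get_observation helper from the previous listing and assumes a DDPG actor with a predict-style call and a reward_function implementing the reward of this chapter, all of which are placeholders.

def environment_step(pr, arm, actor, state):
    # One interaction step: predict joint velocities, advance the physics,
    # and read back the next state and reward.
    action = actor.predict(state)              # joint velocities from the actor network
    arm.set_joint_target_velocities(action)    # send the velocities to the simulator
    pr.step()                                  # advance the physics by one 50 ms step
    next_state, collided = get_observation()   # new information from the simulator
    reward = reward_function(next_state, collided)
    return next_state, reward, collided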

This whole iteration is summarized in Algorithm 3.

By using Algorithm 3, we can train our controller by implementing DDPG in a Python script and communicating with the simulator through PyRep. After training, we can save the parameters of the DDPG networks, i.e., save our controller behaviour.

Later, we can restore those parameters to test our controller: we simply predict actions for the states obtained through PyRep again. Algorithm 4 shows the testing iteration.
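Saving and restoring the controller then amounts to saving and loading the network parameters. Assuming the DDPG networks are PyTorch modules (the framework is not fixed by this section), this could look like the following sketch; the file name is a placeholder.

import torch

def save_controller(actor, critic, path='ddpg_controller.pt'):
    # Save the actor and critic parameters, i.e. the controller behaviour.
    torch.save({'actor': actor.state_dict(), 'critic': critic.state_dict()}, path)

def load_controller(actor, critic, path='ddpg_controller.pt'):
    # Restore the parameters to test the controller (Algorithm 4).
    checkpoint = torch.load(path)
    actor.load_state_dict(checkpoint['actor'])
    critic.load_state_dict(checkpoint['critic'])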

Algorithm 3: Training Algorithm with PyRep

Initialize the DDPG networks;
Launch the simulation scene by PyRep and initialize objects;
for episode = 1, M do
    Reset the environment by randomly placing the target object and setting the manipulator to the initial configuration through PyRep;
    Get information from the simulator through PyRep and obtain the initial state s_1;
    for t = 1, T do
        Select action a_t by using the actor network;
        Step the physics simulation by setting the joint velocities of the manipulator given in a_t through PyRep;
        Get information from the simulator through PyRep, obtain the next state s_{t+1} and reward r_t;
        Store the experience (s_t, a_t, r_t, s_{t+1}) in the experience replay;
        if experience replay is full then
            Update the networks in DDPG;
        end
        Set s_t = s_{t+1};
    end
end
Save the parameters of the networks;

Algorithm 4: Testing Performance with PyRep

Restore the DDPG networks of the controller;
Launch the simulation scene by PyRep and initialize objects;
for episode = 1, M do
    Reset the environment by randomly placing the target object and setting the manipulator to the initial configuration through PyRep;
    Get information from the simulator through PyRep and obtain the initial state s_1;
    for t = 1, T do
        Select action a_t by using the actor network;
        Step the physics simulation by setting the joint velocities of the manipulator given in a_t through PyRep;
        Get information from the simulator through PyRep, obtain the next state s_{t+1};
        Obtain the information about successful action and collision;
        if the action is successful for 30 consecutive time steps then
            End the episode with success;
        end
    end
end
Save the statistics of the test;

In Algorithm 4, a successful action is determined with the same condition as R_s in Equation 4.4. We decided to wait for a number of consecutive successful actions to ensure that the manipulator stays in the target region without collision for some time. Note that an episode can finish with both a success and a collision, i.e., the manipulator can collide in the earlier steps and still finish the episode with success. At the end, the statistics showing the success rate and the collision rate are obtained. Those results are shown in Section 6.2.
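The 30-consecutive-step condition can be tracked with a simple counter inside the test loop of Algorithm 4; a minimal sketch, where step_fn performs one simulation step and is_successful stands for the R_s condition of Equation 4.4 (both placeholders):

def run_test_episode(step_fn, is_successful, T, success_steps_required=30):
    # Run one test episode; returns (success, collided_at_any_step).
    consecutive, collided_any = 0, False
    for _ in range(T):
        next_state, collided = step_fn()                  # one simulation step
        collided_any = collided_any or collided
        consecutive = consecutive + 1 if is_successful(next_state) else 0
        if consecutive >= success_steps_required:         # stayed at the target long enough
            return True, collided_any
    return False, collided_any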

Note that the action selection is different for the Hybrid Controller, which is summarized in Algorithm 2. Algorithm 4 summarizes how a trained DDPG model, such as the Monolithic Controller, the Reaching Sub-Controller or the Avoiding Sub-Controller, is tested.