
Training methods for Machine Learning

Grasping is most suitable for supervised learning methods. The biggest question is how to produce the data and labelling of the training set. Usually an image of graspable objects has such a large number of grasping candidates that it is not feasible to have the robot try all of the choices to produce training material.

2.12.1 Human imitation

One possible method of teaching is to have a human demonstrate the grasps, which can provide very good positive examples and is very intuitive for the teacher to perform.

There are two main ways in which teaching by example can be done. The simplest and most direct to implement is to manually operate the robot into the correct grasp, which is then saved [33]. This is also called human teleoperation.

The other alternative is to use some kind of sensor to record the grasp as made by a human hand, and map it to the robot hand. This has the benefit of being very easy for the human trainer to understand, and the robot can attempt to imitate the human grasps.
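As a toy illustration of such a mapping, the sketch below collapses recorded human finger flexion angles into the single degree of freedom of a parallel gripper. All names, ranges, and the linear mapping itself are invented for illustration; a real system would use the sensor's actual joint model.

```python
# Toy mapping from a recorded human hand pose to a simpler robot gripper.
# A data glove might report many joint angles; a parallel gripper has one
# degree of freedom, so the mapping collapses the finger flexions into a
# single opening width. All names and ranges here are hypothetical.

def human_to_gripper(finger_flexions_deg, max_opening_m=0.10):
    """Average finger flexion (0 = open hand, 90 = closed fist) -> opening in metres."""
    mean_flex = sum(finger_flexions_deg) / len(finger_flexions_deg)
    # Clamp so out-of-range sensor readings stay within the gripper's limits.
    closed_fraction = min(max(mean_flex / 90.0, 0.0), 1.0)
    return max_opening_m * (1.0 - closed_fraction)

# A half-closed human grasp maps to a half-open gripper.
print(human_to_gripper([45, 45, 45, 45]))  # → 0.05
```

Even this crude mapping shows why imitation is lossy: many distinct human hand poses collapse onto the same gripper command.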

Both recording human grasps and robot teleoperation are examined as teaching methods by DeGranville et al., who use them for learning probability density functions for grasps. They also examine learning the grasp actions available on objects, in other words grasp affordances. [19] [34]

Naturally, there are also problems with a robot imitating a human. The largest issue is the difference in grasp types: humans can perform a wide variety of different grasps, whereas robot grasping usually focuses on one specific type, e.g. force closure.

One current research direction is to use force control to allow the robot to be easily manipulated into desired positions. One application of force control is using force and torque sensors in the robot arm so that the robot can be directly manipulated by e.g. pushing it [35].

This lets the human operator teach the robot by applying a small guiding force to the arm, so the robot can be moved into grasp positions more quickly, which speeds up teaching.
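The core of this force-control idea can be sketched as a one-axis admittance loop: a sensed external force is mapped to a commanded velocity, so pushing the arm moves it in the direction of the push. The gains, deadband, and function names below are hypothetical illustration, not a real robot API.

```python
# Minimal admittance-control sketch: a measured external force is mapped
# to a commanded velocity along one axis, so a human push moves the arm.
# All names and gains are invented for illustration.

def admittance_step(force_reading, position, gain=0.05, deadband=1.0, dt=0.01):
    """Map a sensed force (N) to an updated position along one axis."""
    # Ignore small forces so sensor noise does not make the arm drift.
    if abs(force_reading) < deadband:
        return position
    velocity = gain * force_reading  # admittance law: v = K * F
    return position + velocity * dt

# Example: a constant 10 N push moves the joint steadily in that direction.
pos = 0.0
for _ in range(100):
    pos = admittance_step(10.0, pos)
print(round(pos, 4))  # → 0.5
```

The deadband is what makes the arm hold still when nobody is touching it; the gain trades off how "light" the arm feels against how precisely it can be positioned.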

The biggest problem with teaching via mimicry is the low volume of examples produced, as each demonstrated grasp provides only a single labelled training sample. Negative examples are also hard to produce. This is why alternative methods have been actively researched.

2.12.2 Reinforcement learning

Reinforcement learning is a general machine learning method where the learner performs actions and tries to optimize a reward function associated with those actions [36]. It is an exploratory learning method where the intent is to try different approaches and constantly integrate feedback from those trials to improve future attempts.

Reinforcement learning has been tried for having the robot learn by trying grasps itself. Usually some basis still needs to be provided by other methods, from which the robot can then improve by trying varying grasps. Grasping is such a complex task that purely random exploration will not produce results. For reinforcement learning, starting with a "tabula rasa", a blank slate with nothing learned, is usually not efficient, as arriving at even a single successful grasp with no initial training is unlikely. One of the other grasping methods can be used to seed the learning, after which reinforcement learning can improve the success rate of the result. As an example of reinforcement learning, Hsiao [37] has used Partially Observable Markov Decision Processes (POMDPs) to significantly improve control over grasping.
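The seeded explore-and-improve loop described above can be sketched with a simple epsilon-greedy learner over a handful of discrete grasp approach angles. The hidden success probabilities are invented for illustration, and the optimistic seed stands in for the initial policy that would come from another grasping method; a real system would obtain the reward by physically attempting the grasp.

```python
import random

# Epsilon-greedy sketch of reinforcement learning over grasp angles.
# The "environment" below is a made-up table of success probabilities;
# a real robot would get the reward from trial grasps.

random.seed(0)
true_success = {0: 0.1, 45: 0.3, 90: 0.8, 135: 0.2}  # hidden from the learner
counts = {a: 1 for a in true_success}                # seeded: not a blank slate
wins = {a: 1 for a in true_success}                  # optimistic initial prior

def choose(epsilon=0.1):
    """Mostly exploit the best current estimate, sometimes explore at random."""
    if random.random() < epsilon:
        return random.choice(list(true_success))
    return max(counts, key=lambda a: wins[a] / counts[a])

for _ in range(2000):
    angle = choose()
    reward = 1 if random.random() < true_success[angle] else 0
    counts[angle] += 1
    wins[angle] += reward

best = max(counts, key=lambda a: wins[a] / counts[a])
print(best)
```

The optimistic prior makes the learner try every angle at least once before settling, which is a toy version of why seeding matters: without any initial signal, random grasp attempts would almost never succeed.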

2.12.3 Simulated training

Simulation of grasps and training data significantly speeds up the process of implementing and improving grasping. Such tools are especially important for a topic as complex as grasping novel objects.

GraspIt! is one such simulator framework that contains many tools for simulating grasps.

GraspIt! includes models of several different types of grippers, simulation of grasping with visualization of weak points, tools for offline analysis of grasp quality, and several other features that help with designing and implementing robotic grasping. It can also integrate with an actual robot and serve as the control framework for grasping [32].

Simulation of the training material has been tried in several ways. Saxena uses ray-traced sample objects with labelled regions to generate simulated training and test data sets [7].

This method has the benefit of producing a very comprehensive training set with relatively little effort. The ray-traced images are also high-resolution and accurate, allowing the learning algorithm to focus on the significant features. The simulated learning results still correlate well with real-world performance.
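To illustrate why synthetic generation is so cheap, the sketch below produces labelled samples procedurally. It is a deliberately minimal stand-in for a renderer, not Saxena's actual pipeline: each "image" is just a one-dimensional depth profile of a box, with the box edges labelled as graspable.

```python
import random

# Toy synthetic-training-set generator: each sample is a 1-D depth profile
# of a randomly placed box, and the two box edges are labelled graspable.
# A real pipeline would render labelled 2-D/3-D scenes instead.

random.seed(1)

def synth_sample(width=32):
    """Return (depth_profile, labels): label 1 marks a graspable edge cell."""
    left = random.randint(4, width // 2)
    right = random.randint(left + 4, width - 4)
    depth = [1.0 if left <= i <= right else 0.0 for i in range(width)]
    labels = [1 if i in (left, right) else 0 for i in range(width)]
    return depth, labels

# Generating thousands of perfectly labelled samples takes milliseconds,
# compared to hand-labelling real sensor images.
dataset = [synth_sample() for _ in range(1000)]
depth, labels = dataset[0]
print(sum(labels))  # → 2 (exactly two graspable edges per sample)
```

The labels here are exact by construction, which is precisely the property that makes simulated data attractive, and also what the next paragraph's caveat is about: real sensor data never comes this clean.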

The big question is how well the simulated training maps to the real-world task. Simulated training material is not perfectly comparable to objects in real-world environments, and does not include the noise that real 2D images and reconstructed 3D models have. It is also not possible to physically try the grasps, so the labelling relies completely on the intuition of the creator of the training material.

2.12.4 Human Labelling

A training set with human-placed labels is the simplest and also the most robust method, as it is hard to do better than the classification accuracy of a human. Unfortunately it also takes the most time, which is why there is a lot of research into avoiding it. The work can be alleviated a little by having good tools for visualizing the grasp candidates being labelled. Using actual pictures from the sensors and labelling the grasp candidates produces the most accurate results. However, it is also very time-consuming and painstaking work, as the human operator must evaluate each grasp candidate and decide whether it is suitable for grasping.

With this method, it is possible to run the robot in unclear cases to verify the suitability of a grasp candidate for actual grasping. This requires that the scene can be restored to its previous state afterwards, as otherwise further samples cannot be tried if the object moves.

This can be achieved by carefully marking the object positions.
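A minimal sketch of that bookkeeping, assuming each object's pose is recorded as an (x, y, theta) tuple from its marked position: snapshot the scene before the trial grasp, then check afterwards that every object is back within a tolerance. All names and numbers are illustrative.

```python
import math

# Sketch of scene-restoration checking after a trial grasp. Poses are
# (x, y, theta) tuples; positions in metres, orientation in radians.
# Names and tolerances are invented for illustration.

def record_scene(objects):
    """Snapshot object poses, e.g. read off marked positions on the table."""
    return dict(objects)

def scene_restored(before, after, tol=0.005):
    """True if every object is back within `tol` of its recorded pose."""
    return all(
        math.dist(before[name][:2], after[name][:2]) <= tol
        and abs(before[name][2] - after[name][2]) <= tol
        for name in before
    )

snapshot = record_scene({"mug": (0.40, 0.10, 0.0)})
print(scene_restored(snapshot, {"mug": (0.401, 0.10, 0.0)}))  # → True
print(scene_restored(snapshot, {"mug": (0.45, 0.10, 0.0)}))   # → False
```

Only when the check passes can the next candidate grasp on the same scene be tried; otherwise the earlier labels would no longer correspond to the current object poses.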