Several categories of features were used to create the feature vector for the classifier: angle features, discontinuity features, distance features and sphere features.

The angle features capture the stability of the grasp by calculating the angle of the edge at each contact point and the difference between that angle and the angle of the line connecting the contact points. Discontinuity features try to ensure that a single object is grasped, from the portion closest to the camera. Distance features ensure that the distance between the contact points is within graspable limits, and allow preferring an optimal grasp width.

Finally, sphere features count the number of point cloud points blocking the robot hand and gripper, to make sure there is nothing in the way that the robot hand could collide with. Sphere features also conveniently remove grasps that are detected too close to the table, as the table surface is in the way.

The following sections explain these features in detail.

3.4.1 Angle Features

The most important new feature of the method in [3] is the angle feature. This feature is calculated by taking a histogram over the gradients of a template patch (10x10 pixels) near the contact point, and selecting the two most significant angles present in the gradient. The gradient field is calculated with multiple edge detection algorithms: Sobel, Prewitt and Roberts. The feature is taken for both the intensity and depth images separately. The gradient angles are divided into 36 bins of 10° width for the histogram.
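
As a rough sketch of this step (not the implementation from [3]), the following Python snippet builds a magnitude-weighted orientation histogram for a patch and picks the two strongest bins. The Sobel filter from scipy.ndimage stands in for the full Sobel/Prewitt/Roberts combination, and the function name dominant_angles is an assumption for illustration.

import numpy as np
from scipy import ndimage

def dominant_angles(patch, n_bins=36):
    """Return the two most significant gradient angles (degrees) in a patch."""
    patch = patch.astype(float)
    gy = ndimage.sobel(patch, axis=0)                  # vertical gradient component
    gx = ndimage.sobel(patch, axis=1)                  # horizontal gradient component
    angles = np.degrees(np.arctan2(gy, gx)) % 360.0    # gradient direction per pixel
    magnitude = np.hypot(gx, gy)
    # 36 bins of 10 degree width, weighted by gradient magnitude
    hist, edges = np.histogram(angles, bins=n_bins, range=(0.0, 360.0),
                               weights=magnitude)
    strongest = np.argsort(hist)[-2:][::-1]            # indices of the two largest bins
    centers = (edges[:-1] + edges[1:]) / 2.0
    return centers[strongest[0]], centers[strongest[1]]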

These detected angles are then normalized by taking the difference between each angle and the angle of the line connecting the two contact points of the grasping pair. This provides a strong representation of the stability of the grasp. If the angles at both ends of the grasp pair are aligned with the line connecting the contact points, the grasp points lie on surfaces perpendicular to that line, which inherently produces stable grasps.
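
A minimal sketch of the normalization, assuming the contact points p1 and p2 are given as (x, y) image coordinates; folding the difference into the range 0–90° is one possible convention, not necessarily the one used in [3].

import numpy as np

def normalized_angle(angle_deg, p1, p2):
    """Difference between a gradient angle and the contact-line angle,
    folded into [0, 90] degrees so only relative orientation matters."""
    line_angle = np.degrees(np.arctan2(p2[1] - p1[1], p2[0] - p1[0]))
    diff = abs(angle_deg - line_angle) % 180.0
    return min(diff, 180.0 - diff)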

Figure 10 shows the angle feature visualized. The blue square shows the 10x10 template at the end of the purple line connecting the contact points. The green lines are the most significant angles in the histogram of the gradient, and the yellow lines are the second most significant. As can be seen, the green lines are nearly parallel with the line connecting the contact points, detecting the fact that the contact points lie on two sides parallel to each other, which makes this a good candidate for a stable grasp. The second most significant angle at both contact points detects that the grasp is near the top of the object, so this grasp is not quite perfect, as it is more likely to slip. A better grasp would be lower down the object.

Figure 10. Visualization of the angle features.

As an example, Figure 11 shows the gradient for the template of the right contact point from Figure 10. The gradient is calculated with the Roberts edge filter. The dominant gradient angles point towards the right and are tilted slightly upwards. The histogram is shown in Figure 12, clearly showing the two dominant peaks.

Figure 11. Gradient of the angle template.

Figure 12. Histogram of gradient angles in the template.

3.4.2 Line Discontinuity Feature

The line discontinuity feature distinguishes grasps that are not on the same object from ones that are. The intensity or depth values on the line between the contact points are taken, and some statistics on the values are calculated.

First, the standard deviation of the intensity and depth values is taken. Second, the difference between the minimum and maximum values is computed. Third, any discontinuities larger than a set threshold are counted. For strong grasps, the number of discontinuities should be zero, and the depth should deviate as little as possible. This ensures that the grasp is on a single object and is the cleanest grasp available.
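
A sketch of these statistics, assuming values holds the intensity or depth samples along the line between the contact points; the threshold default is a placeholder, not a value from the actual method.

import numpy as np

def discontinuity_features(values, threshold=0.02):
    """Standard deviation, min-max range, and count of jumps above a threshold."""
    values = np.asarray(values, dtype=float)
    jumps = np.abs(np.diff(values))    # change between consecutive samples
    return {
        "std": values.std(),
        "range": values.max() - values.min(),
        "discontinuities": int(np.count_nonzero(jumps > threshold)),
    }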

The discontinuity feature distinguishes grasps where the contact points are on two separate objects, grasps where there is another object in front of the object containing the contact points, and grasps where the points are not on the object at all.

Imagine a grasp where the two grasp points are on the table, off the left and right side of the object, so that the connecting line passes over the object. Figure 13 shows the depth plot for such a grasp. The object can be clearly seen as two significant discontinuities. A solid grasp on a simple object usually has no discontinuity to speak of. If there were multiple objects, all their edges would be visible.

Figure 13. Plot of depth values on the line between grasp points.

One potential improvement would be to evaluate the magnitude of each discontinuity as a continuous value rather than only counting them. A second possible improvement would be to distinguish between positive and negative discontinuities.

3.4.3 Distance Features

The distance features are calculated by taking the 2D and 3D Euclidean distance between the contact points, that is, the second norm, or simply the "straight line" distance between the points. Additionally, the 2D and 3D Manhattan distance between the contact points is taken. Manhattan distance is the distance between the points measured along the coordinate axes, also known as the taxicab distance.
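
For illustration, both measures can be computed as follows for a pair of contact points; the coordinates are made up, and the same expressions work in 2D and 3D.

import numpy as np

p1 = np.array([0.10, 0.05, 0.30])    # example 3D coordinates in metres (assumed)
p2 = np.array([0.16, 0.05, 0.31])

euclidean = np.linalg.norm(p2 - p1)  # second norm: "straight line" distance
manhattan = np.abs(p2 - p1).sum()    # taxicab distance along the coordinate axes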

Furthermore, the Euclidean distance feature is normalized so that zero corresponds to the best grasp width of the robot. In the case of the MELFA robot with the Weiss Robotics PG-70 based gripper, the optimal grasp width is around 60 mm, with viable grasps between 10 mm and 72 mm. The formula for the normalized distance is

d = |d_pair − d_optimal| (2)

where d is the distance feature, d_pair is the Euclidean distance between the contact points and d_optimal = 60 mm is the optimal gripper width. [3]
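
Expressed in code, equation (2) is simply the absolute deviation from the optimal opening; the 60 mm default is the PG-70 value quoted above.

def normalized_distance(d_pair_mm, d_optimal_mm=60.0):
    """Distance feature of equation (2): deviation from the optimal grasp width."""
    return abs(d_pair_mm - d_optimal_mm)

# e.g. a 45 mm wide grasp pair gives a feature value of 15 mm
assert normalized_distance(45.0) == 15.0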

Figure 14 shows a contact point pair and how the 2D Euclidean distance and the 2D Manhattan distance are measured. The 3D distances are not shown; however, they are similar, simply extended to the third dimension.

Figure 14. Example of Euclidean and Manhattan distance for a point pair.

In addition to the distance between the contact points, the distances from the contact points to the robot base and to the camera are calculated. These features make it possible to prefer grasps that lie in the area the robot can reach best, as well as objects that are closer to the camera.

Le [3] discards the height from the table to make the method more universal; however, in our configuration the table height is always known due to the required calibration between the table-mounted robot and the stereo camera. For this reason the distance from the table surface is used to distinguish grasps that are too close to the table.

3.4.4 Sphere Features

The sphere features are calculated by making a point cloud from the depth map and counting the number of points within a sphere. The method defines a sphere feature for each contact point, taken just outside the contact point (moving along the 3-D line connecting the contact points). This filters out contact points that are blocked by something else, and also removes contact points that lie on the surface of the table, as points from the table surface fall within the sphere.
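
A possible sketch of this check, assuming cloud is an (N, 3) array of point coordinates and the contact points are 3-D vectors; the offset and radius defaults mirror the values given at the end of this section.

import numpy as np

def sphere_point_count(cloud, contact, other_contact, offset=0.020, radius=0.015):
    """Count point cloud points inside a sphere placed just outside a contact point."""
    direction = contact - other_contact
    direction = direction / np.linalg.norm(direction)
    center = contact + offset * direction              # just outside the contact point
    distances = np.linalg.norm(cloud - center, axis=1)
    return int(np.count_nonzero(distances < radius))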

In addition, this work adds a third sphere feature to cover the bottom of the gripper. This improves results because the camera is located to the side and the robot grasps from above the table, which could lead to grasps being detected so low on the object that in practice the robot could not grasp it. To keep things simple, this feature is calculated by taking the midpoint between the contact points and moving upward along the Z axis by the gripper depth.
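
A minimal sketch of where this third sphere would be centred; the gripper depth is a placeholder value, not a measurement from the thesis setup.

import numpy as np

GRIPPER_DEPTH = 0.08    # metres; assumed value, depends on the actual gripper

def gripper_bottom_sphere_center(p1, p2):
    """Midpoint of the contact points, shifted upward along the Z axis."""
    midpoint = (p1 + p2) / 2.0
    return midpoint + np.array([0.0, 0.0, GRIPPER_DEPTH])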

The location and size of the sphere features were chosen with some experimentation. The sphere's center is placed 20 mm away from the contact point along the line connecting the contact points, and the radius of the sphere was set to 15 mm so that it does not contain points from the object surface at the contact point.