

3 THE PROPOSED APPROACH

This research proposes a target detection approach based on an improved R-CNN algorithm for radar images of inland river crossing areas. Following the principle of the R-CNN algorithm, a binary classification (dichotomy) model needs to be trained first. The sample set used to train this model is created from expert experience and divided into a positive sample set and a negative sample set. Once the model reaches a stable state after training and testing, the continuously updated radar images are used as input. These images are pre-processed first. Then, region proposals are obtained with the breadth-first search algorithm, which is the improvement to the R-CNN algorithm made in this study. After all region proposals are warped to a fixed size, the trained CNN is used to extract features from them, and the category of each region proposal is judged by the SVM. Finally, regression is used to refine the position of each region proposal and the result is output. The flowchart of this approach is illustrated in Figure 1. Notably, only real-time radar images are used as the data source.

Figure 1: The flowchart of the proposed approach
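To make the workflow of Figure 1 concrete, the following Python sketch outlines the pipeline under the assumption that the individual stages (mask-based pre-processing, BFS region proposal, CNN feature extraction, SVM classification and box regression) are supplied as callables; the function names and signatures are illustrative only and are not the paper's implementation.

```python
def detect_vessels(radar_frame, preprocess, propose, extract, classify, refine):
    """Illustrative pipeline; the stage callables are hypothetical stand-ins
    for the components detailed in Sections 3.1-3.3."""
    channel = preprocess(radar_frame)            # mask everything outside the channel (Section 3.2)
    detections = []
    for box in propose(channel):                 # BFS-based region proposals (Section 3.3)
        feat = extract(channel, box)             # CNN features of the fixed-size proposal (Section 3.1)
        if classify(feat):                       # binary SVM: vessel vs. background
            detections.append(refine(box, feat)) # regression refines the proposal position
    return detections
```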

3.1 Feature extraction and SVM training

For general optical images, the targets to be detected are highly complex in contour, color, texture, size, spatial overlap and other aspects. Therefore, in order to extract target features well, the convolutional neural network is usually designed with a relatively complex structure. AlexNet [14] has eight weighted layers; its structure is shown in Figure 2. The first five layers are convolution layers, and the remaining three are fully connected layers. The output of the last fully connected layer is fed to a 1000-way softmax, which produces a distribution over 1000 categories. In addition, target detection in optical images may involve multiple target categories, such as people, cars, plants, animals, etc. Therefore, the number of classes of the trained or downloaded classification model is set according to the actual situation. In the R-CNN algorithm, the number of classes after fine-tuning was 21: 20 object categories plus one background class.

Figure 2: The structure of AlexNet
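As a hedged illustration of the fine-tuning described above, the snippet below loads torchvision's AlexNet variant (five convolution layers followed by three fully connected layers ending in a 1000-way classifier; its filter counts differ slightly from the original paper) and replaces the last fully connected layer with a 21-way output (20 categories plus background). This is a sketch of the general R-CNN recipe, not the exact training code of this study.

```python
import torch.nn as nn
from torchvision import models

# torchvision's AlexNet: 5 conv layers + 3 fully connected layers (8 weighted layers in total)
alexnet = models.alexnet(weights=None)

# R-CNN-style fine-tuning: swap the 1000-way output for 21 classes (20 + background)
alexnet.classifier[6] = nn.Linear(4096, 21)
```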

However, for marine radar images, the targets to be detected are the vessels in the crossing area. A radar image can be divided into foreground and background: the spots formed by radar echoes are considered suspected targets (foreground), and the empty areas are considered background. Compared with optical images, radar images differ greatly both in imaging form and in the complexity of contour details. The size of a target spot in the radar image is related to the actual size of the target. The contour of a radar spot is irregular, and the contour of the same target varies across consecutively output radar images. However, a vessel spot in a radar image is basically fusiform with a certain aspect ratio, while spots of other objects are completely irregular. Therefore, given sufficient positive and negative sample sets, the structure of the convolutional neural network can be simplified for feature extraction of vessel spots. In this research, the simplified feature extraction network consists of two convolution layers, two pooling layers and one ReLU layer. The process of feature extraction and SVM training is shown in Figure 3.

Figure 3: CNN feature extraction
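A minimal PyTorch sketch of the simplified feature extraction network described above (two convolution layers, two pooling layers and one ReLU layer) is given below; the channel counts, kernel sizes and single-channel input are assumptions, since they are not specified in this section.

```python
import torch.nn as nn

class VesselFeatureNet(nn.Module):
    """Two convolution layers, two pooling layers and one ReLU layer."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5),   # radar image assumed single-channel
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )

    def forward(self, x):
        # The flattened output is the feature vector handed to the SVM classifier.
        return self.features(x).flatten(1)
```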

As a binary (dichotomy) classifier, the main idea of the SVM is to find a hyperplane in the feature space that separates all data samples while maximizing the margin, i.e., the minimum distance from the samples of the two categories to this hyperplane is made as large as possible. The principle is shown in Figure 4.

Figure 4: Schematic diagram of SVM

The hyperplane is shown in equation (1)

w𝑇π‘₯ + 𝑏 = 0 (1)

Suppose $P(x_1, x_2, \dots, x_n)$ is a sample point, where $x_i$ denotes the $i$-th feature variable; then the distance $d$ from this point to the hyperplane can be calculated by equation (2).

$d = \dfrac{|w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b|}{\sqrt{w_1^2 + w_2^2 + \cdots + w_n^2}} = \dfrac{|W^T X + b|}{\|W\|}$  (2)

where $\|W\|$ is the norm of the weight vector of the hyperplane, and the constant $b$ is analogous to the intercept of a linear equation.
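The sketch below illustrates equations (1) and (2) with scikit-learn: a linear SVM is fitted on placeholder feature vectors, its weights $W$ and intercept $b$ define the hyperplane of equation (1), and the distance of a sample to that hyperplane is computed as in equation (2). The feature dimension and the random data are assumptions used only for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 128))          # placeholder CNN feature vectors (128-D assumed)
y = np.r_[np.ones(100), np.zeros(100)]   # 1 = vessel (positive), 0 = background (negative)

svm = LinearSVC(C=1.0).fit(X, y)         # learns the hyperplane W^T x + b = 0 of equation (1)
W, b = svm.coef_[0], svm.intercept_[0]

x = X[0]
d = abs(W @ x + b) / np.linalg.norm(W)   # distance of equation (2)
```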

3.2 Radar image preprocessing

Marine radar is widely used on all types of ships and in coastal surveillance as an important navigational aid. In general, the detection range of a marine radar can be adjusted manually, so the extent of the detection area differs according to the actual situation. The output images of two types of shore-based monitoring radar are shown in Figure 5 (a) and (b). It can be seen from the figures that the center of the image is the location of the radar. Besides the various targets, a radar image also contains a lot of noise; some noise points have been marked in the figures, and the channel has been marked as well. The channel portion of the radar image is exactly the focus area for regulators. When target detection is carried out on the whole radar image, the areas outside the channel generate a large amount of additional computation, which seriously affects the processing speed. Therefore, in order to minimize the computation, the original radar image must be pre-processed.

General image pre-processing methods include binarization, dilation, erosion, etc., but these methods are not suitable here. Although such operations can reduce the noise in the image to a certain extent, the noise points in a radar image are relatively small and do not affect target detection, so they can essentially be ignored. Moreover, dilation and erosion smooth the contours of the spots in the radar image, which is not conducive to feature extraction. Therefore, in this study, the pre-processing of the radar image only applies a mask that filters out everything outside the channel, so that only the signal inside the channel remains. This reduces the amount of data and improves the computing speed.
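A minimal sketch of this masking step with OpenCV is shown below; the file name and the polygon vertices describing the channel are placeholders, and in practice the mask would be drawn once from operator knowledge of the channel boundaries.

```python
import cv2
import numpy as np

# Placeholder input frame; in deployment this would be the live radar image.
frame = cv2.imread("radar_frame.png", cv2.IMREAD_GRAYSCALE)

# Hypothetical polygon outlining the channel area.
channel_polygon = np.array([[120, 80], [520, 60], [560, 400], [100, 420]],
                           dtype=np.int32)
mask = np.zeros_like(frame)
cv2.fillPoly(mask, [channel_polygon], 255)               # 255 inside the channel, 0 outside

channel_only = cv2.bitwise_and(frame, frame, mask=mask)  # signal outside the channel is removed
```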


Figure 5: The output images of shore-based monitoring radar

3.3 Region proposal

Traditional target detection algorithms are inefficient, mainly for two reasons. On the one hand, the sliding-window region selection strategy is not targeted and suffers from high time complexity and window redundancy. On the other hand, manually designed features are not robust to the diversity of target variations. With the advent of deep-learning-based target detection algorithms, however, target detection has made a great breakthrough. The breakthrough of the R-CNN detection algorithm lies in replacing the traditional sliding window + hand-crafted features with region proposals + CNN. In this way, target detection is decomposed into first extracting region proposals and then performing feature extraction and classification on the image within each region proposal.

The R-CNN algorithm uses selective search to extract, from the bottom up, about 2000 region proposals that may contain objects. The input is a color image, and the output is a set of object location hypotheses. The selective search algorithm gives priority to merging regions according to four rules: similar color, similar texture, small combined area, and a combination that fills a large proportion of its bounding box (BBOX). These four rules only involve the color histogram, texture histogram, area and location of the regions, and the features of a merged region can be computed directly from the features of its sub-regions. The similarity measures of the selective search algorithm are given in equations (3)–(7); a code sketch of these measures follows the equations.

π‘ π‘π‘œπ‘™π‘œπ‘Ÿ(π‘Ÿπ‘–, π‘Ÿπ‘—) = βˆ‘ min(π‘π‘–π‘˜, π‘π‘—π‘˜)

𝑛 π‘˜=1

(3)

𝑠𝑑𝑒π‘₯π‘‘π‘’π‘Ÿπ‘’(π‘Ÿπ‘–, π‘Ÿπ‘—) = βˆ‘ min(π‘‘π‘–π‘˜, π‘‘π‘—π‘˜)

𝑛 π‘˜=1

(4)

𝑠𝑠𝑖𝑧𝑒(π‘Ÿπ‘–, π‘Ÿπ‘—) = 1 βˆ’π‘ π‘–π‘§π‘’(π‘Ÿπ‘–) + 𝑠𝑖𝑧𝑒(π‘Ÿπ‘—)

𝑠𝑖𝑧𝑒 (π‘–π‘š) (5)

fill(π‘Ÿπ‘–, π‘Ÿπ‘—) = 1 βˆ’π‘ π‘–π‘§π‘’(𝐡𝐡𝑖𝑗) βˆ’ 𝑠𝑖𝑧𝑒(π‘Ÿπ‘–) βˆ’ 𝑠𝑖𝑧𝑒(π‘Ÿπ‘—) 𝑠𝑖𝑧𝑒 (π‘–π‘š)

(6)

𝑠(π‘Ÿπ‘–, π‘Ÿπ‘—) = π‘Ž1π‘ π‘π‘œπ‘™π‘œπ‘Ÿ(π‘Ÿπ‘–, π‘Ÿπ‘—) + π‘Ž2𝑠𝑑𝑒π‘₯π‘‘π‘’π‘Ÿπ‘’(π‘Ÿπ‘–, π‘Ÿπ‘—) + π‘Ž3𝑠𝑠𝑖𝑧𝑒(π‘Ÿπ‘–, π‘Ÿπ‘—) + π‘Ž4fill(π‘Ÿπ‘–, π‘Ÿπ‘—) (7)

Although the selective search algorithm is no longer as exhaustive as traditional target detection algorithms, it still extracts about 2000 region proposals, and every suspected target contained in them requires CNN feature extraction and SVM classification. The whole process therefore requires a large amount of computation, which is the main reason for the slow detection speed of the R-CNN algorithm.

However, as mentioned above, there are clear distinctions between radar images and optical images. The small number of color channels in a radar image is the most important one; in addition, the regional division in a radar image is obvious. Based on these characteristics, an algorithm suited to searching for color blocks in the radar image can be considered for determining region proposals. In this research, the breadth-first search (BFS) algorithm [15] is proposed to replace the selective search algorithm; this is the improvement made to R-CNN.

Breadth-first search is a graph search algorithm. At run time, the algorithm starts at the root node, traverses the nodes of the graph level by level along its breadth, and terminates if the target is found.

Breadth-first search uses no knowledge of where the target might be, so it can be understood as a blind (uninformed) search method. Its purpose is to systematically expand and examine all the nodes in the graph. In other words, the BFS algorithm exhaustively searches the entire graph until a result is found, regardless of the possible location of that result.

The search process of the breadth-first algorithm is similar to the level-order traversal of a tree. Starting from a vertex in the graph, the algorithm first visits that vertex, then all of its adjacent vertices, and then, from each of these adjacent vertices, visits their adjacent vertices in turn. Following this process, the algorithm does not end until the adjacent vertices of every visited vertex in the graph have been visited.
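The following sketch implements this level-by-level traversal with a queue on a small example graph; the adjacency list is an assumption chosen to match the traversal of Figure 6 described in the next paragraph.

```python
from collections import deque

def bfs(graph, start):
    """Visit vertices level by level; returns the visiting order."""
    visited, order, queue = {start}, [], deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for u in graph[v]:
            if u not in visited:       # skip vertices that have already been reached
                visited.add(u)
                queue.append(u)
    return order

# Example graph (assumed adjacency, mirroring Figure 6).
graph = {1: [2, 3], 2: [1, 4, 5], 3: [1, 6, 7], 4: [2, 8],
         5: [2], 6: [3], 7: [3], 8: [4]}
print(bfs(graph, 1))                   # [1, 2, 3, 4, 5, 6, 7, 8]
```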

The search process of the breadth-first search algorithm is shown in Figure 6. In the figure, take vertex 1 as the starting point and visit all of its adjacent vertices, 2 and 3. Starting from 2, visit its adjacent vertices 4 and 5; starting from 3, visit its adjacent vertices 6 and 7; starting from 4, visit its adjacent vertex 8. When 5 is reached, all of its adjacent vertices have already been visited, so it is skipped; vertices 6 and 7 are treated in the same way as vertex 5.

Figure 6: The search process of BFS
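To show how this traversal can be applied to the pre-processed radar image, the sketch below runs a BFS from every unvisited foreground pixel of a binarized frame, so that each connected spot yields one region proposal (its bounding box). The 4-connectivity, the binarization itself and the minimum spot size are assumptions for illustration, not the exact settings of this study.

```python
from collections import deque
import numpy as np

def bfs_region_proposals(binary_image, min_pixels=5):
    """Return bounding boxes (x1, y1, x2, y2) of connected foreground spots."""
    h, w = binary_image.shape
    visited = np.zeros((h, w), dtype=bool)
    proposals = []
    for sy in range(h):
        for sx in range(w):
            if binary_image[sy, sx] == 0 or visited[sy, sx]:
                continue
            # BFS from this root pixel expands the spot level by level.
            queue = deque([(sy, sx)])
            visited[sy, sx] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and \
                       binary_image[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_pixels:          # drop isolated noise points
                ys, xs = zip(*pixels)
                proposals.append((min(xs), min(ys), max(xs), max(ys)))
    return proposals
```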