
In the re-identification of Saimaa ringed seals, the pelage pattern on the fur has commonly been used as the differentiating factor, since it uniquely identifies each seal individual (see Figure 17). It is worth noting that seals have a non-rigid shape, which results in a large number of possible poses; this needs to be taken into account when a matching algorithm for re-identification is constructed.

Figure 17. Example of Saimaa ringed seals. [5]

Nepovinnykh et al. [5] proposed a full framework for Saimaa ringed seal re-identification.

The data was collected from camera traps, which introduces a problem: images of the same seal individual are often taken with the same pose, background, and illumination. This might distract the model from learning the identifying pattern, leading it to learn, for example, the background of the image instead. This is how the authors justified the need for the segmentation step.

DeepLab [24] is a CNN-based model that provides semantic segmentation of objects in an image by assigning a class to each pixel. The model consists of two parts: an encoder, which encodes multi-scale contextual information using atrous convolution at multiple scales, and a decoder, which produces accurate segmentation boundaries; an overview of the DeepLab architecture is shown in Figure 18 [24]. The use of Atrous Spatial Pyramid Pooling (ASPP) allows the model to robustly segment objects at multiple scales, and the localization of object boundaries was improved significantly by combining methods from deep CNNs and probabilistic graphical models. Nepovinnykh et al. [5] used the DeepLab model to equip the framework with a robust and precise segmentation of seals of different sizes.

Figure 18. DeepLab model architecture. [24]
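To make the role of atrous convolution concrete, the following minimal PyTorch sketch builds the parallel dilated branches of an ASPP-style module; the channel count and the dilation rates (6, 12, 18) are illustrative defaults, not necessarily the exact configuration in [24].

```python
import torch
import torch.nn as nn

# Parallel 3x3 convolutions with increasing dilation rates: each branch
# sees the feature map at a different scale without adding parameters.
# Channel count and rates are illustrative defaults only.
aspp_branches = nn.ModuleList([
    nn.Conv2d(256, 256, kernel_size=3, padding=rate, dilation=rate)
    for rate in (6, 12, 18)
])

features = torch.randn(1, 256, 64, 64)                # dummy encoder output
multi_scale = [branch(features) for branch in aspp_branches]
# Each output keeps the 64x64 spatial size; in ASPP the branches are
# concatenated and fused with a 1x1 convolution.
```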

Following the segmentation, two postprocessing steps were introduced: closing the holes in the pattern using a sliding-window convex hull, and smoothing the borders using a Gaussian filter with thresholding. These two steps ensure that the pelage pattern is fully covered by the seal segment and hence enhance the identification result.
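As an illustration, the two postprocessing steps could look roughly as follows; the window size, Gaussian sigma, and threshold are hypothetical values, since they are not fixed here in [5].

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.morphology import convex_hull_image

def postprocess_mask(mask, win=64, sigma=3.0, thresh=0.5):
    """Sketch of the two postprocessing steps; win, sigma, and thresh
    are hypothetical values, not taken from [5]."""
    # Hole closing: convex hull applied in windows, so local holes are
    # filled without convexifying the whole seal shape.
    filled = mask.astype(bool)
    for r in range(0, mask.shape[0], win):
        for c in range(0, mask.shape[1], win):
            window = filled[r:r + win, c:c + win]
            if window.any():
                filled[r:r + win, c:c + win] |= convex_hull_image(window)
    # Border smoothing: Gaussian filter followed by thresholding.
    return gaussian_filter(filled.astype(float), sigma=sigma) > thresh
```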

After segmenting the seal, the pelage pattern is extracted in order to reduce the amount of data the model needs to learn and to remove noise which might result from different weather conditions or variations in illumination. The pattern extraction algorithm was based on the Sato tubeness filter [29], which detects the continuous ridges of the pelage pattern. The pattern extraction was carried out in the following steps [5]; a code sketch follows the list:

1. Detect the continuous ridges of the pelage pattern using the Sato tubeness filter [29].

2. Sharpen the pattern using an unsharp mask.

3. Remove the segmentation border which was falsely detected as part of the pattern by the filter.

4. Apply morphological opening using a disk to remove small artifacts from the image.

5. Brighten the image using adaptive histogram equalization.

6. Highlight the pattern using Otsu’s thresholding, zeroing the pixels below the threshold.

7. Apply morphological opening again using a disk to remove small artifacts which might have resulted from thresholding.

8. Apply a weaker unsharp mask to keep the pattern well defined.
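The list above maps quite directly to scikit-image; the sketch below follows the eight steps with hypothetical filter parameters (radii, disk sizes, erosion width), which are not the values used in [5].

```python
import numpy as np
from skimage import exposure, filters, morphology

def extract_pattern(seal_gray, seal_mask):
    """Sketch of steps 1-8; all filter parameters are hypothetical."""
    # 1. Detect continuous ridges with the Sato tubeness filter.
    ridges = filters.sato(seal_gray, black_ridges=True)
    # 2. Sharpen the pattern with an unsharp mask.
    pattern = filters.unsharp_mask(ridges, radius=5, amount=2)
    # 3. Remove the segmentation border falsely detected as pattern:
    #    keep only pixels well inside an eroded version of the mask.
    inner = morphology.binary_erosion(seal_mask, morphology.disk(10))
    pattern[~inner] = 0
    # 4. Morphological opening with a disk removes small artifacts.
    pattern = morphology.opening(pattern, morphology.disk(2))
    # 5. Brighten using adaptive histogram equalization.
    pattern = exposure.equalize_adapthist(np.clip(pattern, 0, 1))
    # 6. Otsu's threshold; zero the pixels below it.
    pattern[pattern < filters.threshold_otsu(pattern)] = 0
    # 7. Opening again removes artifacts introduced by thresholding.
    pattern = morphology.opening(pattern, morphology.disk(2))
    # 8. A weaker unsharp mask keeps the pattern well defined.
    return filters.unsharp_mask(pattern, radius=3, amount=1)
```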

Figure 19 shows the pattern extraction steps, starting from the segmented seal as input and ending with a grayscale image containing the extracted pattern.

Figure 19. Visualization of the pattern extraction result. First row: Steps 1–4 of the algorithm (from left to right). Second row: Steps 5–9 of the algorithm (from left to right). Third row: the source image (left) and the end result of pattern extraction (right). [5]

Triplet Neural Network (TNN) [33] is an extension of the Siamese Neural Network used to learn a similarity metric using a triplet loss function. The TNN architecture consists of two parts: a convolutional part, which learns the features of the sample, and a fully-connected part, which encodes the feature vector. The TNN takes three samples as input: an anchor, a positive, and a negative. The anchor is the base sample, the positive is a sample of the same individual as the anchor, and the negative is a sample from a different individual than the anchor. The goal of the network is to learn an embedding such that the distance between the encoding vectors of samples from the same individual is shorter than the distance between samples of different individuals. Assuming the first encoding was the worst case, where the encoding vector of the negative sample was closer to the anchor than that of the positive sample, the network keeps iterating until it learns to encode in such a way that the positive encoding is closer, as shown in Figure 20.

Figure 20. Triplet Neural Network iterations. [34]

Mathematically, the TNN reaches the right encoding by minimizing the triplet loss function, which calculates the distance between each pair of encoding vectors using the $L_2$ metric. The model converges when the distance from the negative to the anchor is greater than the distance from the positive to the anchor by at least a margin $m$:

$$L_{\text{triplet}}(x_a, x_p, x_n) = \max\big(0,\; m + \lVert f(x_a) - f(x_p) \rVert_2^2 - \lVert f(x_a) - f(x_n) \rVert_2^2\big) \tag{1}$$
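Equation (1) translates directly into a few lines of PyTorch; a minimal sketch, where the margin value and batch handling are assumptions:

```python
import torch
import torch.nn.functional as F

def triplet_loss(f_a, f_p, f_n, margin=0.2):
    """Triplet loss of Eq. (1) over batches of embedding vectors."""
    # Squared L2 distances for anchor-positive and anchor-negative pairs.
    d_pos = (f_a - f_p).pow(2).sum(dim=1)
    d_neg = (f_a - f_n).pow(2).sum(dim=1)
    # Hinge on the margin: zero loss once the negative is farther by m.
    return F.relu(margin + d_pos - d_neg).mean()
```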

Nepovinnykh et al. [5] use a TNN to find the corresponding matches of pattern patches from the dataset of known individuals. After extracting the pattern, the image is cropped and rescaled to a common size, ensuring that the scale is relative to the size of the seal. The pattern is then divided into multiple overlapping patches with a fixed size of 160×160 pixels.
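Dividing the pattern into overlapping fixed-size patches can be sketched as follows; the stride is an assumption, since only the 160×160 patch size is specified:

```python
import numpy as np

def extract_patches(pattern, size=160, stride=80):
    """Overlapping fixed-size patches; stride=80 (50% overlap) is assumed."""
    h, w = pattern.shape[:2]
    patches = [pattern[r:r + size, c:c + size]
               for r in range(0, h - size + 1, stride)
               for c in range(0, w - size + 1, stride)]
    return np.stack(patches)
```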

To train the TNN, three patches are passed to the network: the anchor as a base, a positive patch from the same individual as the anchor, and a negative patch from a different individual. To account for the variation caused by the large number of possible poses the seal might be in when the image is captured, seven rotated versions of each patch are passed to the convolutional part, and the results are summed before being sent to the fully-connected part. The network encodes the features of each patch into a vector of fixed size 512, producing three encoding vectors corresponding to the three inputs. The distance between the two pairs is calculated, and the network keeps iterating until the distance between the anchor and the positive is shorter, by a pre-defined margin, than the distance between the anchor and the negative [5]. The process is illustrated in Figure 21.

Figure 21. Triplet Neural Network training. [5]
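A sketch of the rotation-summed embedding branch described above; the backbone, its feature dimension, and the exact rotation angles are assumptions rather than the choices made in [5]:

```python
import torch.nn as nn
import torchvision.transforms.functional as TF

class RotationPooledEmbedder(nn.Module):
    """Embedding branch: convolutional part + fully-connected part,
    with features summed over rotated copies of each patch."""
    def __init__(self, backbone, feat_dim, emb_dim=512, n_rot=7):
        super().__init__()
        self.backbone = backbone                 # convolutional part
        self.fc = nn.Linear(feat_dim, emb_dim)   # fully-connected part
        self.angles = [i * 360.0 / n_rot for i in range(n_rot)]

    def forward(self, patch):
        # Summing over rotations makes the embedding less sensitive
        # to the seal's pose in the captured image.
        feats = sum(self.backbone(TF.rotate(patch, a)).flatten(1)
                    for a in self.angles)
        return self.fc(feats)
```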

The final step is the re-identification of the seal individual. Two methods based on patch comparison were used. In the first method, a simple K-Nearest Neighbour (KNN) classifier was used, where each patch votes for the individual it belongs to and the individual is decided by the sum of the weighted votes. This method was able to correctly identify 82.5% of individuals when it provided the top-5 matches.
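A weighted-vote KNN over patch embeddings could be sketched as follows; the inverse-distance weighting is an assumed scheme, not necessarily the one used in [5]:

```python
import numpy as np

def knn_vote(query_embs, gallery_embs, gallery_ids, k=5, top=5):
    """query_embs: (Q, 512); gallery_embs: (G, 512); gallery_ids:
    length-G list of individual IDs. Returns the top-ranked IDs."""
    scores = {}
    for q in query_embs:
        dists = np.linalg.norm(gallery_embs - q, axis=1)  # L2 distances
        for idx in np.argsort(dists)[:k]:
            # Each patch votes for its neighbour's individual, weighted
            # so that closer matches count more (assumed weighting).
            w = 1.0 / (dists[idx] + 1e-8)
            scores[gallery_ids[idx]] = scores.get(gallery_ids[idx], 0.0) + w
    return sorted(scores, key=scores.get, reverse=True)[:top]
```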

The second method is a heatmap-based, topologically-aware patch matching algorithm [5]. The method can be divided into three steps as follows:

1. Patch-similarity heatmap generation to select candidates of corresponding patches from the gallery image.

2. Filtering candidates using an angle-based method to preserve the topological consistency.

3. Individual ranking.

In order to generate the heatmaps, the trained network is used to compute an encoding vector for each patch, and the $L_2$ metric is used to find the distance between the patch in the query image and patches from the dataset. Local minima of the heatmap, which indicate patches with high similarity, are identified and marked as candidates [5]. In Figure 22, examples of the generated heatmaps are shown. Upon identifying the candidates from the patches, the variation from the expected location is quantified by measuring the angle, and the patch with the smallest variation is chosen as the corresponding patch. By calculating the variations of all the patches in an individual’s pattern, the rank of this individual can be computed by averaging the variation of the filtered patches. Individuals with the smallest average variation are ranked higher.
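The heatmap generation step can be illustrated as follows: an $L_2$ distance map between one query-patch embedding and a grid of gallery-patch embeddings, with local minima taken as candidates (the neighbourhood size is an assumption):

```python
import numpy as np
from scipy.ndimage import minimum_filter

def similarity_heatmap(query_emb, gallery_grid):
    """query_emb: (512,) embedding of one query patch.
    gallery_grid: (rows, cols, 512) embeddings of gallery patches."""
    # L2 distance between the query patch and every gallery patch.
    heatmap = np.linalg.norm(gallery_grid - query_emb, axis=-1)
    # Local minima of the distance map mark the most similar regions,
    # which become the candidate corresponding patches.
    candidates = np.argwhere(heatmap == minimum_filter(heatmap, size=3))
    return heatmap, candidates
```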

Figure 22. Examples of the region similarity heatmaps: the query image (left) and gallery image (right). Heatmaps for a query image highlight a single region that is being compared to the entire gallery image. Heatmaps for the gallery image show regions which are most similar to the highlighted region from the query image. [5]

The topology-aware algorithm was able to correctly identify 88.6% of individuals when it provided the top-5 matches. In Figure 23, a visualization of the full framework is shown.

While the level of accuracy is not enough for the framework to be used alone for reliable re-identification, the authors believe that providing the top-5 candidates will help the experts reach an accurate decision about the seal identity faster.

Chelak et al. [35] proposed a novel feature pooling approach that creates an embedding vector of the image by aggregating the local pattern features while taking into account their spatial distribution. The generated embeddings can then be used to re-identify different Saimaa ringed seal individuals. The proposed method was able to achieve an accuracy of up to 86.54% when matching pattern patches.

Figure 23. Saimaa ringed seal re-identification framework: original image, segmentation, bounding box and grayscale conversion, pattern extraction, patch extraction, triplet CNN, and identification results. [5]

4 SPECIES AGNOSTIC ANIMAL RE-IDENTIFICATION FRAMEWORK

4.1 Pipeline

An animal re-identification model is built with the purpose of finding a distinct identifier for each individual based on an image of the animal. The images are captured by different camera traps, meaning that the position and angle at which the images were taken cannot be assumed to be similar all the time. This issue needs to be taken into consideration when building the dataset, by collecting images of the animal taken from different positions to cover all the possible patterns that might appear in one image, ensuring more reliable results [1].

In order to re-identify an animal individual based on an image, three main steps need to be followed. The first step is to detect the animal in the image and segment it from the background. The second step is feature extraction, which is done by extracting the pattern from the segmented animal’s skin, fur, or feathers. The third and final step is the search for a matching individual, where the extracted pattern is compared with the patterns of the known individuals to find the most accurate match [5]. In Figure 24, an illustration of the re-identification steps is shown.
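As a high-level sketch, the three steps form a simple pipeline; the function names here are purely illustrative placeholders:

```python
from typing import Callable, Sequence

def re_identify(image, segment: Callable, extract_pattern: Callable,
                match: Callable, gallery: Sequence):
    """Three-step pipeline; the callables are illustrative placeholders
    for a segmentation model, a pattern extractor, and a matcher."""
    animal = segment(image)            # 1. detect and segment the animal
    pattern = extract_pattern(animal)  # 2. extract the identifying pattern
    return match(pattern, gallery)     # 3. compare against known individuals
```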
