Structure of the thesis - Finding interesting images from large quantities of camera trap data

Section 2 covers animal biometrics and the re-identification of animals in general and of Saimaa ringed seals in particular. Section 3 focuses on automatic image dataset cleaning, describes three possible methods for evaluating whether a single image belongs to a specific dataset, and introduces convolutional neural networks. Section 4 proposes a method for solving this problem, and Section 5 describes the image dataset and the experiments, including their results. Section 6 discusses the findings, and Section 7 presents the conclusions.

2 ANIMAL BIOMETRICS

2.1 Animal re-identification

2.1.1 Pipeline

Re-identifying an animal from an image usually consists of the following steps (Figure 2): detecting the presence of the animal, cropping the animal, possibly detecting the orientation of the animal, detecting the uniquely identifiable part of the animal, cropping that part, and finally attempting to re-identify the animal based on its identifiable part, usually by comparing the animal’s attributes with a predefined dataset (Li et al. 2019; Nepovinnykh, Eerola, and Kalviainen 2020; Yousif et al. 2019). Image processing steps, such as reducing colours or increasing contrast, may also be applied between the aforementioned steps. Several methods exist for each step, and they perform differently with different kinds of animals and environments.

Figure 2. General animal re-identification pipeline (Larregui et al. 2019).
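The steps of the pipeline can be sketched as a chain of interchangeable stages. The following is a minimal illustration only: the function names `crop` and `reidentify`, the callables passed in as stages, and the box convention (top, left, bottom, right) are assumptions made for this sketch and do not come from the cited works. The optional orientation step is omitted for brevity.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]        # grey-scale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def crop(image: Image, box: Box) -> Image:
    """Return the part of the image inside the box."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def reidentify(
    image: Image,
    detect_animal: Callable[[Image], Optional[Box]],
    detect_pattern: Callable[[Image], Optional[Box]],
    match: Callable[[Image], str],
    preprocess: Optional[Callable[[Image], Image]] = None,
) -> Optional[str]:
    """Run the stages in order; stop early if a detection stage finds nothing."""
    animal_box = detect_animal(image)
    if animal_box is None:
        return None  # no animal in the image
    animal = crop(image, animal_box)
    if preprocess is not None:
        animal = preprocess(animal)  # e.g. contrast enhancement
    pattern_box = detect_pattern(animal)
    if pattern_box is None:
        return None  # identifiable part not visible
    return match(crop(animal, pattern_box))
```

Because every stage is passed in as a function, a detector or matcher suited to a particular species or environment can be swapped in without changing the rest of the chain.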

2.1.2 Detection of interesting images

Animals, as well as other objects, can be detected with different methods depending on the data available. Video data enables the use of movement tracking based on the object’s previous location and velocity (Lind et al. 2005; Wang et al. 2014). It also supplies multiple images that are certain to show the same animal, which could help with the re-identification.

For example, if one frame of the video is blurred or obstructed, the adjacent frames could be used to complete the image of the individual animal and its identifiable part, such as pelage patterns, body proportions or facial proportions. However, if the animals to be re-identified live in herds, there is no certainty that the animal in the same position in adjacent video frames is the same individual, as the animals might move unpredictably. This also depends on the frame rate of the camera and the velocity of the animals.
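Movement tracking based on previous location and velocity can be illustrated with a constant-velocity model. The sketch below is a simplified example and not the method of Lind et al. (2005) or Wang et al. (2014); the function names `predict_position` and `associate_detection` and the gating parameter `max_distance` are assumptions introduced here.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def predict_position(position: Point, velocity: Point, dt: float = 1.0) -> Point:
    """Extrapolate the position one time step ahead at constant velocity."""
    return (position[0] + velocity[0] * dt, position[1] + velocity[1] * dt)


def associate_detection(
    position: Point,
    velocity: Point,
    detections: Sequence[Point],
    max_distance: float,
    dt: float = 1.0,
) -> Optional[int]:
    """Return the index of the detection nearest to the predicted position,
    or None if no detection lies within max_distance of it."""
    if not detections:
        return None
    px, py = predict_position(position, velocity, dt)
    distances = [math.hypot(x - px, y - py) for x, y in detections]
    best = min(range(len(detections)), key=distances.__getitem__)
    return best if distances[best] <= max_distance else None
```

With a low frame rate or fast animals, `dt` grows and the prediction becomes less reliable, which is one way to see why nearest-position association can confuse individuals in a herd.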

Image classification with machine learning is a modern way of detecting animals in images. Parham et al. (2018) and Hung Nguyen et al. (2017) are two groups that have taken the machine learning approach in their papers. If the machine learning method can successfully build a model which describes the properties by which the animal can be detected, the model might be able to automatically classify images into those which contain the animal and those which do not. Usually the challenge lies in building the model. The animal in the image can appear in many different orientations and poses, and the camera always captures a two-dimensional projection of the animal, which adds complexity to the detection. In addition, the model should not classify any kind of vegetation or background as an animal.

2.1.3 Segmentation of the animal

Background subtraction methods are applicable for segmenting the animal from the image when the camera stays still and the background does not have too much noise or variance. This is rarely the case in animal biometrics, as the images are usually taken in the wild, where the time of day has a large impact on how the background looks. The time of day can make the background very dark or very light, and the colour of the sky can change too. As the Sun moves throughout the day, shadows are cast in different directions and at different angles, which can make the background look very different, especially in forests, under cliffs, and in other places with noticeable altitude variations. Even the wind alone can make a large difference if the background has trees with long branches and large leaves, or a body of water whose surface may be flat or covered with waves.
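The basic idea of background subtraction can be shown with a per-pixel median model. This is a minimal sketch under the assumption of a static camera and grey-scale frames stored as nested lists; the names `background_model` and `foreground_mask` and the fixed `threshold` are illustrative choices, not part of any cited method.

```python
from statistics import median
from typing import List, Sequence

Image = List[List[float]]


def background_model(frames: Sequence[Image]) -> Image:
    """Estimate the static background as the per-pixel median over frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


def foreground_mask(frame: Image, background: Image, threshold: float) -> List[List[bool]]:
    """Mark pixels whose intensity differs from the background by more than threshold."""
    return [
        [abs(pixel - bg) > threshold for pixel, bg in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]
```

A single global threshold makes the weakness described above visible: a change in lighting or moving vegetation shifts many pixels past the threshold at once, and they are all marked as foreground.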

Another way to segment the animal from the image is a neural network approach. Nepovinnykh, Eerola, and Kalviainen (2020) used an image segmentation tool called DeepLab (Chen et al. 2018), which is based on a deep CNN model improved by other techniques such as atrous convolution and atrous spatial pyramid pooling. As postprocessing of the resulting segmentation, they closed the holes left by the segmentation and applied a simple Gaussian filter. As they were also segmenting Saimaa ringed seals, closing the holes means ensuring that the outline of the segmentation result is smooth and without sudden disconnections. The Gaussian filter was used to turn the segmentation result into a binary one: each pixel either fully belongs to the seal or does not belong to it at all. The process is explained in detail in Section 2.2.
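Atrous (dilated) convolution, mentioned above, samples the input with gaps between the kernel taps, so the same kernel covers a wider area without additional weights. The one-dimensional sketch below follows the definition y[i] = Σ_k x[i + r·k] w[k] with rate r. It is an illustration of the operation only, not DeepLab's implementation, and the function name `atrous_conv1d` is an assumption.

```python
from typing import List, Sequence


def atrous_conv1d(signal: Sequence[float], kernel: Sequence[float], rate: int = 1) -> List[float]:
    """1-D atrous convolution without padding.

    Kernel taps are spaced `rate` samples apart; rate=1 gives an ordinary
    (cross-correlation style) convolution.
    """
    span = rate * (len(kernel) - 1)  # distance between first and last tap
    return [
        sum(signal[i + rate * k] * w for k, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]
```

With a three-tap kernel, rate 1 covers 3 input samples per output and rate 2 covers 5, although the number of weights stays the same.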

2.1.4 Animal individual identification

When the animal is segmented from the image, the next step is to identify the individual. Many animal species have a body part which is unique to each individual. Some of these are visible from far away, such as the pelage patterns of zebras, giraffes, and leopards. These are excellent species for image-based re-identification, as the animals can be photographed from a distance and the unique feature can be captured relatively precisely, as with zebras in Crall et al. (2013). Some other unique parts, such as blood vessel patterns and bat wing patterns, can only be examined if the animal is caught (Amelon et al. 2017).

Humans have traditionally been re-identified by their fingerprints (Clarke 1994). While generally practical, this solution has the problem that the fingerprint can change due to trauma to the finger. Another unique part of the human body is the iris pattern of the eye, which has been used in more modern re-identification implementations (Hallinan 1991). A third way is to measure the proportions of the human face (Samal et al. 1992). This is done by recognizing the key elements, which often include the lips, nose, cheeks, eyes and eyebrows. The relative sizes, positions and shapes of these elements are then examined and compared to previously known faces. None of these methods is perfect, though. Some iris scanners can be fooled with accurate printed images, and facial re-identification might fail if the camera angle differs too much.

If the animal’s pose can be predicted, or the identifiable part of the animal is located in a particular spot on the animal’s skin, it could be useful to estimate the animal’s pose. Li et al. (2019) adopted this solution in their proposed method to re-identify Amur tigers. After predicting the tiger’s pose, they could be certain that the stripe patterns being compared came from the same spot on the skins of the tigers in different images.

The re-identification of the animal can then be done by comparing the extracted unique parts. A popular tool in automatic image processing is the convolutional neural network (Section 3.5). It can detect shapes and patterns in images, and when combined with other computational methods, it can be used to predict whether the skin patterns in different images belong to the same individual, as described in Section 2.2.
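Comparing extracted parts against a dataset of known individuals can be sketched as a nearest-neighbour search over feature vectors. The example assumes that each identifiable part has already been converted to a numeric feature vector, for instance by a convolutional neural network; the names `cosine_similarity` and `identify`, the dictionary layout of the database, and the `min_similarity` cut-off are hypothetical and do not describe the method of Section 2.2.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    min_similarity: float = 0.8,
) -> Optional[str]:
    """Return the ID of the most similar known individual, or None when no
    entry reaches min_similarity (a possibly unknown individual)."""
    best_id, best_sim = None, min_similarity
    for individual_id, features in database.items():
        similarity = cosine_similarity(query, features)
        if similarity >= best_sim:
            best_id, best_sim = individual_id, similarity
    return best_id
```

The cut-off matters in practice: without it, every query would be assigned to some known individual, even when the animal in the image has never been seen before.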
