

Degree Program in Computational Engineering and Analytics
Bachelor's thesis

Juho Kauppala

Finding interesting images from large quantities of camera trap data

Supervisor: Docent, D.Sc. Tuomas Eerola


Lappeenranta-Lahti University of Technology LUT
School of Engineering Science
Degree Program in Computational Engineering and Analytics

Juho Kauppala

Finding interesting images from large quantities of camera trap data

Bachelor’s thesis 2020

42 pages, 19 figures, 2 tables

Supervisor: Docent, D.Sc. Tuomas Eerola

Keywords: Camera trap; Convolutional neural network; Computer vision; Image classification; Saimaa ringed seal

To protect the endangered Saimaa ringed seal population, the ringed seals are monitored with automatic camera traps in order to gather knowledge about them, including population size, territory, age, health and breeding data. To achieve this, the monitoring system has to be able to tell apart individual ringed seals in the automatically gathered images. For such a system to work efficiently, it has to be able to find the images containing an animal for further analysis.

Methods to re-identify Saimaa ringed seals, as well as other animals, from images have already been proposed. A review of animal re-identification and automatic image dataset cleaning is made, and the basic principle of the convolutional neural network is presented. A dataset which contains images of Saimaa ringed seals as well as empty images is also introduced.

A model for cleaning the Saimaa ringed seal dataset, that is, removing images that do not contain a ringed seal, is proposed. The model consists of splitting the image into small patches, predicting the probability of each patch containing a Saimaa ringed seal using a convolutional neural network, and then deciding whether the image contains a Saimaa ringed seal based on a connected-component labeling algorithm. The model performs well, achieving an accuracy of more than 90%.


Lappeenranta-Lahti University of Technology LUT
School of Engineering Science
Degree Program in Computational Engineering and Analytics

Juho Kauppala

Finding interesting images from large quantities of camera trap data

Bachelor's thesis 2020

42 pages, 19 figures, 2 tables

Supervisor: Docent, D.Sc. Tuomas Eerola

Keywords: Convolutional neural network; Image classification; Camera trap; Saimaa ringed seal; Computer vision

To protect the endangered Saimaa ringed seal population, the seals are monitored with automatic camera traps. The purpose is to gather information especially on the size of the population and on the seals' territories, lifespans, health, and breeding. For this to succeed, the monitoring system has to be able to distinguish individual seals in the photographs. To that end, the system must first be able to recognize whether a photograph contains a seal at all.

Methods for automatically identifying Saimaa ringed seals and other animal species from images have already been proposed. This thesis presents the basics of animal identification and of automated filtering of image sets, and explains the operating principle of the convolutional neural network. A dataset containing images of Saimaa ringed seals as well as empty images is also introduced.

A model is proposed for filtering empty images out of the image set. The image under evaluation is first split into small patches, the probability of each patch containing a Saimaa ringed seal is estimated with a convolutional neural network, and these probabilities are combined into a decision on whether the image contains a seal or not. The decision is made with an algorithm that groups adjacent positively classified patches into regions and examines the size of the largest region found. The model achieves an accuracy of over 90%, which can be considered a good result.


Symbols and abbreviations

1 INTRODUCTION
1.1 Background
1.2 Objectives and delimitations
1.3 Structure of the thesis

2 ANIMAL BIOMETRICS
2.1 Animal re-identification
2.1.1 Pipeline
2.1.2 Detection of interesting images
2.1.3 Segmentation of the animal
2.1.4 Animal individual identification
2.2 Saimaa ringed seal re-identification

3 AUTOMATIC IMAGE DATASET CLEANING
3.1 Background
3.2 Cleaning by detection
3.3 Background subtraction
3.4 Image classification
3.5 Convolutional neural network
3.5.1 Neural network
3.5.2 Convolutional layers
3.5.3 Typical convolutional neural network architecture
3.5.4 Training neural networks

4 PROPOSED METHOD
4.1 Saimaa ringed seal data cleaning
4.2 Patch cropping
4.3 Patch classification using CNN
4.4 Decision rules

5 EXPERIMENTS
5.1 Data
5.1.1 Initial image dataset
5.1.2 Preprocessing
5.2 Experimental arrangements
5.3 Evaluation criteria
5.4 Results

6 DISCUSSION

7 CONCLUSIONS

REFERENCES


A          Area of the image to be cropped
F1         F1 score
f          Neural network activation function
lx         Width of the image to be cropped
ly         Height of the image to be cropped
M          Area where the center points of the patches lie
Mx0, Mx1   x-coordinates of the area's borders (left, right)
My0, My1   y-coordinates of the area's borders (top, bottom)
mx         Number of patches created in the horizontal dimension
my         Number of patches created in the vertical dimension
n-         Number of false negative results
P          Precision
p+         Number of true positive results
p-         Number of false positive results
R          Recall
s          Patch side length
w0         Neuron bias
wn         Neuron input weights
xn         Neuron input values
y          Neuron output value

CCL        Connected-component labeling
CNN        Convolutional neural network
ReLU       Rectified Linear Unit
RGB        Red, Green and Blue


1 INTRODUCTION

1.1 Background

In the days of accelerating global warming (Smith et al. 2015) and growth of the human population, an increasing number of animal species find their natural habitat endangered. A particular cause for concern are endemic species, which can be affected more radically than others, as their niche is already narrow (Malcolm et al. 2006). Several species are already extinct, and actions should be taken to preserve as many species as possible, for damage to biodiversity can cause unexpected consequences in sensitive ecosystems and indirectly impact the human way of living too. The Saimaa ringed seal and the Ladoga ringed seal are two such endemic species, existing only in Lake Saimaa and Lake Ladoga, respectively. The International Union for Conservation of Nature and Natural Resources has classified the Saimaa ringed seal as Endangered (EN) and the Ladoga ringed seal as Vulnerable (VU), on a scale from Least Concern (LC) to Extinct (EX) (Sipilä 2016a; Sipilä 2016b). The Saimaa ringed seal is also the symbol of Finnish nature conservation, and its extinction would therefore also be a remarkable symbolic loss for the field.

This work is related to the CoExist project (Project ID: KS1549) funded by the European Union, the Russian Federation and the Republic of Finland via the South-East Finland - Russia CBC 2014-2020 programme. CoExist is a collaborative project between LUT University, the Department of Environmental and Biological Sciences of the University of Eastern Finland (lead partner), Biologists for nature conservation (St. Petersburg, Russia) and the South Karelia Division of The Finnish Association for Nature Conservation. The project aims to automate the monitoring of ringed seals in such a manner that no device needs to be attached to the seal, as well as to strengthen Russian-Finnish cross-border cooperation in conservation biology. To monitor the seals, camera traps have been set in the wild in Finland and Russia, and they have provided a large number of automatically taken images. Nepovinnykh, Eerola, and Kalviainen (2020) proposed a way to automatically re-identify seals by cropping them from the picture and comparing their pelage pattern with a dataset. However, as this only applies to images with a ringed seal in them, a way to remove empty images is necessary to save time and computer resources. This thesis reviews the key techniques related to animal re-identification and image dataset cleaning and proposes a method to implement a dataset cleaning model.


Figure 1. A Saimaa ringed seal in a camera trap image.

1.2 Objectives and delimitations

The goal of this thesis is to find the most effective and efficient way to tell whether an image should be discarded or processed further to recognize a ringed seal in it. This consists of the following parts:

1. Review possible approaches for recognizing whether an image contains a Saimaa ringed seal.

2. Implement an adequate method based on the review.

3. Prepare a dataset for training and testing the method.

4. Evaluate the fitness of the implemented method.

This thesis is delimited to recognizing only images of Saimaa ringed seals, as that is the species which the conservation project targets. The method will also be trained only with images taken with camera traps, so images that differ greatly from those might not get a correct result and should not be evaluated with the programme. The images are also ordinary color images; therefore, images taken with infrared cameras and the like will not be evaluated correctly. Images should also be roughly the same size as those used in the training process.


1.3 Structure of the thesis

Section 2 is about animal biometrics and how to re-identify animals in general, as well as Saimaa ringed seals in particular. Section 3 focuses on automatic image dataset cleaning, describing three possible methods to evaluate whether single images should belong to a specific dataset or not, and introduces convolutional neural networks. Section 4 proposes a method to solve this issue, and Section 5 describes the image dataset and the experiments made, including their results. After that come the discussion in Section 6 and the conclusions in Section 7.


2 ANIMAL BIOMETRICS

2.1 Animal re-identification

2.1.1 Pipeline

Re-identifying an animal from an image usually consists of the following steps (Figure 2): detecting the existence of the animal, cropping the animal, possibly detecting the orientation of the animal, detecting the unique identifiable part of the animal, cropping it, and finally attempting to re-identify the animal based on its identifiable part, usually by comparing the animal's attributes with a predefined dataset (Li et al. 2019; Nepovinnykh, Eerola, and Kalviainen 2020; Yousif et al. 2019). There may also be image processing steps, such as reducing colours or increasing contrast, in between the aforementioned steps. However, there are different methods to use for each step, which perform differently with different kinds of animals and environments.

Figure 2. General animal re-identification pipeline (Larregui et al. 2019).

2.1.2 Detection of interesting images

Animals, as well as other objects, can be detected with different methods depending on the data available. Video data enables the use of movement tracking based on the object's previous location and velocity (Lind et al. 2005; Wang et al. 2014). It also supplies multiple images which are certainly of the same animal, which could help with the re-identification. For example, if one frame of the video is blurred or obstructed, the adjacent frames could be used to complete the image of the individual animal and its identifiable part, such as pelage patterns, body proportions or facial proportions. However, if the animals to be re-identified live in herds, there is no certainty that the animal in the same position in adjacent video frames is the same individual, as the animals might move unpredictably. Of course, this depends on the frame rate of the camera and the velocity of the animals, too.

Image classification with machine learning is a modern way of detecting animals in images. Parham et al. (2018) and Hung Nguyen et al. (2017), for example, take the machine learning approach in their papers. If the machine learning method can successfully build a model which describes the properties by which the animal can be detected, the model might be able to automatically classify images into those which contain the animal and those which do not. Usually the challenge lies in building the model. The animal in the image can appear in many different orientations and poses, and the camera always catches a two-dimensional projection of the animal, which adds complexity to the detection. In addition, the model should not classify any kind of vegetation or background as an animal.

2.1.3 Segmentation of the animal

Background subtraction methods are applicable for segmenting the animal from the image when the camera stays still and the background does not have too much noise or variance. This is rarely the case in animal biometrics, as the images are usually taken in the wild, where the time of day has a huge impact on how the background looks. The time of day can make the background very dark or very light, and the colour of the sky can change too. As the Sun moves throughout the day, the shadows are cast in different directions and at different angles, which can make the background look very different, especially in forests, under cliffs, and in other places with noticeable altitude variations. Even the wind alone can make a huge difference if the background has trees with long branches and large leaves, or a body of water which might have a flat surface or waves.

Another way to segment the animal from the image is a neural network approach. Nepovinnykh, Eerola, and Kalviainen (2020) used an image segmentation tool called DeepLab (Chen et al. 2018), which is based on a deep CNN model improved by other techniques such as atrous convolution and atrous spatial pyramid pooling. As postprocessing of the resulting segmentation, they closed the holes left by the segmentation and applied a simple Gaussian filter. Closing the holes means ensuring that the outline of the segmentation result is smooth and without sudden disconnections. The Gaussian filter was used to turn the segmentation result into a binary one: each pixel either fully belongs to the seal or does not belong at all. The process is explained in detail in Section 2.2.

2.1.4 Animal individual identification

When the animal is segmented from the image, the next step is to identify the individual. Many animal species have a part of their body which is unique to each individual. Some of these are visible from far away, such as the pelage patterns of zebras, giraffes, and leopards. They are excellent species for image-based re-identification, as the animals can be photographed from a distance and the unique feature can be captured relatively precisely, as with zebras in Crall et al. (2013). Some other unique parts, such as blood vessel patterns and bat wing patterns, can only be examined if the animal is caught (Amelon et al. 2017).

Humans have traditionally been re-identified by their fingerprints (Clarke 1994). While generally practical, this solution has the problem that the fingerprint can change due to a trauma to the finger. Another unique part of the human body is the iris pattern in the eye, which has been used in more modern re-identification implementations (Hallinan 1991). A third way is to measure the proportions of the human face (Samal et al. 1992). This is done by recognizing the key elements, which often include the lips, nose, cheeks, eyes and eyebrows. The relative sizes, positions and shapes of these elements are then examined and compared to previously known faces. None of these methods are perfect, though. Some iris scanners can be fooled with accurate printed images, and facial re-identification might fail because of a different enough camera angle.

If the identifiable part of the animal is located in a particular spot on the animal's skin, it can be useful to try to predict the animal's pose. Li et al. (2019) ended up with this solution in their proposed method to re-identify Amur tigers. After they predicted the tiger's pose, they could tell for sure that certain stripe patterns were from the same spot on the skins of the tigers in different images.

The re-identification of the animal can then be done by comparing the extracted unique parts. A popular tool in automatic image processing is the convolutional neural network (Section 3.5). It can detect shapes and patterns in images, and when combined with other computational methods, it can be used to predict whether the animal skins in different images belong to the same individual or not, as described in Section 2.2.

2.2 Saimaa ringed seal re-identification

Saimaa ringed seals have a pelage pattern that can be used in re-identification, as it is unique for each individual seal. When the animal's pose cannot be predicted, which is the case with Saimaa ringed seals, solutions other than pose prediction need to be applied to take into account that different images might not show the same part of the animal. After the seal is segmented from the image (see e.g. Zhelezniakov et al. 2015), Nepovinnykh, Eerola, Kälviäinen, et al. (2018) proposed to identify seals using solutions based on a convolutional neural network and a support vector machine. Chehrsimin et al. (2018) proposed to re-identify seals using the existing Wild-ID (Halloran et al. 2015) and HotSpotter (Crall et al. 2013) methods. The latest method was proposed by Nepovinnykh, Eerola, and Kalviainen (2020), who decided to crop the Saimaa ringed seal's pelage pattern into several small patches, which were then compared to known patches in the database (Figure 3). Because the seal can lie in different poses, the patches were also rotated from -30 to 30 degrees to adapt to the changes in the seal's pose. They proposed a method to extract the pelage pattern with eight steps:

1. A Sato tubeness filter (Sato et al. 1998) detects continuous ridges.

2. Sharpen the image using unsharp masking.

3. Remove seal borderline which the Sato filter detects as a tube.

4. Remove small artifacts with morphological opening.

5. Brighten the image with adaptive histogram normalization (Vishwakarma et al. 2009).

6. Clarify pattern edges by applying maximum likelihood thresholding (Kurita et al. 1992) and zeroing out pixels below the threshold.

7. Remove small artifacts with morphological opening.

8. Sharpen the image using unsharp masking.

Figure 3. Saimaa ringed seal re-identification pipeline by Nepovinnykh, Eerola, and Kalviainen (2020).

These steps produce a grayscale image of the pelage pattern of the Saimaa ringed seal, with high contrast and sharp, clearly visible circles and ridges. This image is then split into patches, which each contain a small part of the pelage pattern. Each patch is then compared with all other patches stored in the database, resulting in a decision on whether the patch matches a seal in the database or not.

Nepovinnykh, Eerola, and Kalviainen (2020) compared patches using a Triplet Neural Network (Hoffer et al. 2015). The basic principle is that for each patch, a convolutional section first calculates the features of the image, then a fully connected layer encodes the features into a vector. Then the distances between the vectors are compared.

The network is called a Triplet Neural Network because of how it operates: it uses three patches at a time for training. One of the patches is called the anchor, and the other two are from different seal individuals: the one called the positive is from the same seal as the anchor, and the one called the negative is from another seal. Each of the three images produces an encoding vector, and the distances from one vector to another can be calculated. The goal of the training is a network that produces vectors such that the distance between the anchor and the positive is shorter than the distance between the anchor and the negative.


3 AUTOMATIC IMAGE DATASET CLEANING

3.1 Background

Three commonly used techniques in image dataset cleaning are:

1. Detecting the animal using an object detection or segmentation algorithm.

2. Detecting the animal using background subtraction techniques.

3. Classifying the images using for example a convolutional neural network.

An object detection or segmentation algorithm maps the area which contains the desired object, such as an animal, and if it recognizes the object correctly, it can be used to filter images based on whether it finds an object or not. Background subtraction is based on the idea of comparing a new image with an image of the empty background: if the new image differs enough from the empty one, it must contain something else. Convolutional neural networks learn the appearance of the animal and predict whether an image contains one.

3.2 Cleaning by detection

Segmentation algorithms to separate a Saimaa ringed seal from the background have been developed (e.g. Zhelezniakov et al. 2015). The goal of such an algorithm is to find the animal of interest in the image. It labels each pixel as animal or background, forming a binary map. The resulting binary map thus has the same resolution as the original image.

Although encoded in different manners, image files are, after all, two-dimensional arrays of pixel color values, which makes them easily readable and operable for the computer. By analyzing the resulting images of the segmentation algorithm, it could be possible to decide whether the image contains an animal or not. This could be done, for example, by setting a threshold on how many pixels labeled as animal must be found or how big the percentage of such pixels should be.

The problem with this solution is that even though the segmentation algorithm can separate the animal from the background, it might work unexpectedly if there is no animal in the image: it might try to segment another object which resembles the desired animal in appearance.


3.3 Background subtraction

Background subtraction methods function by telling apart new components in images by comparing them with empty background images (Piccardi 2004). It is a common technique for tracking objects in video, as the differences between subsequent frames are usually minimal: a video camera normally takes dozens of images per second. This makes it practical to analyze images by subtracting their common background, which leaves the moving object conveniently separated in the former and the latter image, as done by Barnich et al. (2010). There are different implementations of background subtraction. Sobral et al. (2014) divide the methods into five categories, as shown in Table 1.

To be able to use background subtraction in animal detection, each camera and setting needs its own model of the empty background, against which a new image is compared. However, nature images tend to have a lot of variation on their own. Different seasons bring many changes to the view, like the sunlight level and angle, possible ice on the lake surface and the amount of vegetation. The purpose of the pelage pattern of various animals is to camouflage, which does not make the task any easier. On top of that, even if the background could be modeled well enough, not every foreign object in the image is an animal of interest. There are other animals as well as humans and other moving objects in the image set. These foreign objects would need to go through another check of whether they are the animal of interest or not, which brings us back to the start. Background subtraction might be able to recognize whether the image has a foreign object in it or not, but it cannot alone recognize the type of the foreign object.
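To make the idea concrete, a pixel-wise mean-and-variance background model of the kind described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not part of the thesis implementation; the threshold parameters k and min_fraction are hypothetical.

```python
import numpy as np

def build_background_model(empty_frames):
    """Pixel-wise mean and standard deviation over a stack of empty
    background images (a 'basic method' in the taxonomy of Sobral et al. 2014)."""
    stack = np.stack(empty_frames).astype(np.float32)
    return stack.mean(axis=0), stack.std(axis=0)

def contains_foreign_object(frame, bg_mean, bg_std, k=3.0, min_fraction=0.01):
    """Flag the frame if enough pixels deviate from the background model
    by more than k standard deviations."""
    diff = np.abs(frame.astype(np.float32) - bg_mean)
    foreground = diff > (k * bg_std + 1e-6)  # epsilon guards zero-variance pixels
    return foreground.mean() > min_fraction
```

As the text above notes, such a check can only tell that something foreign is present; it cannot by itself tell whether that something is a ringed seal.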

3.4 Image classification

Another way to try to clean a dataset is to use image classification. The basic idea is to have two classes: positive and negative. Images which are classified as negative can be discarded as empty or unwanted, and the images classified as positive form the cleaned-up dataset. Image classification is usually done with machine learning. The method of choice depends on the animal in question, but since this is a classification task, a convolutional neural network is a solid all-round solution.

A convolutional neural network, which is reviewed in Section 3.5, or any image classification algorithm usually cannot classify full images as they come from the camera; some kind of image preprocessing is usually needed. The results of the classification of the preprocessed image might need to be postprocessed or analyzed too in order to classify the entire image.


Table 1. Background subtraction methods (Sobral et al. 2014).

Basic methods: Form a model of the background, usually by calculating pixel-wise means and variances, and compare new images with this model.

Statistical methods: Form background color variation variables for each pixel, then calculate Gaussian probabilities that a pixel belongs to the background.

Fuzzy methods: Extract the foreground with a Choquet integral or a fuzzy function and reduce noise with low-pass filtering.

Neural and neuro-fuzzy methods: Probabilistic neural networks decide for each pixel whether it belongs to the background or not. The method becomes neuro-fuzzy when a fuzzy function is added to the decision-making.

Other methods: Includes eigenvalue and eigenvector calculation and VuMeter (Goyat et al. 2006), which calculates separately for each of the three RGB colours whether the pixel belongs to the background or not.

This leads to the suggestion that a single application will never fit all needs, but different solutions will be needed for different animals in different cases.

3.5 Convolutional neural network

3.5.1 Neural network

A neural network is a machine learning model in which an output is produced from a multi-dimensional input. The basic principle of the neural network resembles the structure of the brain, in which the output is formed by chaining multiple simple operations. The basic component of the neural network is called the neuron, as its task in the neural network is similar to that of a neuron in the brain. A neural network consists of layers, which in turn consist of neurons. Usually every neuron on the previous layer is connected to every neuron on the following layer, which makes so-called fully connected layers. Layers can also be partially connected, for example for performance reasons if the neural network is large.

A neuron is the smallest single component in the neural network, yet there are multiple types of neurons, and they have parameters which affect their operation. A very basic neuron has the following parameters:

• Weights for every input

• Bias, a neuron-specific constant

• Activation function

Let $x_n$ be the neuron's inputs, $w_n$ their weights, $w_0$ the bias, and $f$ the activation function. The neuron's output value $y$ can then be described as follows:

$$y = f\left(w_0 + \sum_{i=1}^{n} w_i x_i\right) \qquad (1)$$

The activation function of the neuron can serve different purposes. The final layer of a classifying neural network usually has a decisive activation function, which scales the values to a designated range. Examples of such are the logistic functions, like the sigmoid function shown in Figure 4, which scales all values between 0 and 1.

In the middle of the neural network it might not be beneficial to scale the values in such a manner. Instead, it might be useful to only remove negative values, which can be done with a Rectified Linear Unit (ReLU) function, shown in Figure 5. There are different variants of the ReLU function which can be used if the ordinary version does not suffice (Xu et al. 2015). If the activation function is not required to perform any modification to the value, a linear function f(x) = x can be used.

Figure 4. The sigmoid function scales all values to the interval ]0, 1[.

Figure 5. The ReLU function eliminates negative values.
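As a concrete illustration of Equation (1) and the activation functions above, a single neuron can be written out in NumPy. This is an illustrative sketch, not thesis code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def neuron_output(x, w, w0, f=sigmoid):
    """Equation (1): y = f(w0 + sum_i w_i * x_i)."""
    return f(w0 + np.dot(w, x))

# Example: three inputs, arbitrary weights and bias.
x = np.array([0.5, -1.2, 2.0])
w = np.array([0.8, 0.1, -0.3])
print(neuron_output(x, w, w0=0.05))          # sigmoid activation, in ]0, 1[
print(neuron_output(x, w, w0=0.05, f=relu))  # ReLU activation, never negative
```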

3.5.2 Convolutional layers

Convolutional layers are specialized for image processing. If a neural network has at least one convolutional layer, it can be called a convolutional neural network. In a convolutional layer, a value is calculated for every pixel based on its own and its neighbouring pixels' values. By setting a weight for each pixel around the pixel in question, convolutional filters can be created (Figure 6). Different filters can detect different kinds of shapes and patterns in the image. On one convolutional layer every neuron has its own filter, which means that there can be hundreds of different filters on a single convolutional layer. The decisive layer can then give high weights to the inputs coming from the useful filters and small weights to those detecting unnecessary patterns.
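The effect of a single filter is easy to demonstrate outside any network. The toy sketch below (an illustration with SciPy, not thesis code) convolves a small image with a vertical-line filter in the spirit of Figure 6:

```python
import numpy as np
from scipy.signal import convolve2d

# A 3x3 filter that responds strongly to vertical lines.
vertical_line = np.array([[-1, 2, -1],
                          [-1, 2, -1],
                          [-1, 2, -1]], dtype=np.float32)

# A toy grayscale image with a bright vertical line in the middle column.
image = np.zeros((5, 5), dtype=np.float32)
image[:, 2] = 1.0

response = convolve2d(image, vertical_line, mode="same")
print(response)  # the largest responses line up with the vertical line
```

In a convolutional layer the filter weights are not hand-crafted like this but learned during training.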

Convolutional layers are often paired with pooling layers. Pooling layers decrease the resolution of the input by combining a square of pixels, often 2×2, into a single pixel. The value of the single pixel can be, for example, the average, maximum or minimum value of the pixels being combined, depending on what is needed. This helps the subsequent layers detect shapes and patterns which are larger and could not be detected from the original image with filters that only span a few pixels.

As the convolutional layers use 2-dimensional image data and ordinary artificial neural network layers use 1-dimensional data, a flattening layer is often used after the last convolutional layer. It simply transforms the 2-dimensional array of neurons representing the image into a 1-dimensional array, where the neurons lose the information about their location in the image, which is why there are no more convolutional layers after the flattening layer.


Figure 6. A convolutional filter which detects a vertical line.

3.5.3 Typical convolutional neural network architecture

Convolutional neural networks usually consist of two parts. First the convolutional part extracts patterns and shapes from the image, then the classification part classifies the image based on the results of the convolutions. As Alom et al. (2018) describe, the overall architecture of a CNN includes an input layer, multiple alternating convolution and max-pooling layers, one fully connected layer and one classification layer (Figure 7). This can of course be modified to fit specific needs. Two popular CNN architectures, AlexNet and VGG16, are constructed with the same basic principle but with more convolutional and fully connected layers (Figure 8).
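A minimal Keras sketch of the layout in Figure 7 could look as follows; the layer counts and sizes are illustrative placeholders, not taken from Alom et al. (2018) or from the thesis implementation:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(64, 64, 3)),           # input layer (RGB image)
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),               # alternating conv + max-pooling
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                          # 2-D feature maps -> 1-D vector
    layers.Dense(128, activation="relu"),      # fully connected layer
    layers.Dense(2, activation="softmax"),     # classification layer
])
model.summary()
```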

3.5.4 Training neural networks

During training, a neural network calculates an output for each input. The output is then compared with the correct answer given in the training set using a designated loss function, which could be, for example, the sum or squared sum of errors. Based on the difference between the output and the correct answer, a backpropagation algorithm goes through the whole network, modifying the neurons' parameters in the direction which would have resulted in an output closer to the correct answer. How much the parameters are modified depends on a hyperparameter of the network called the learning rate.

There are different algorithms to handle the parameter tuning in the backpropagation. One of the most popular is the stochastic gradient descent algorithm (Bottou et al. 2008). Learning rate, momentum, and other hyperparameters such as Nesterov’s momentum (Nesterov 1983) can be used to adjust the functioning of the algorithm.
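In Keras, for example, these hyperparameters are set directly on the optimizer object; the values below are placeholders, not the settings used in the experiments of Section 5:

```python
from tensorflow.keras.optimizers import SGD

# Stochastic gradient descent with a momentum term;
# nesterov=True switches to Nesterov's accelerated variant.
optimizer = SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
```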

The training process usually continues until the model achieves a certain accuracy. A common problem is that the model might overfit the training data, because the training will not stop until the accuracy is high enough. Overfitting means that the model learns properties or connections which happen to exist in the training data but are not generally associated with the task in question. Overfitting can be detected by using validation data during the training: if the accuracy on the validation data starts to decrease, the model is overfitting to the training data. Overfitting can be avoided by using a larger training dataset or by using dropout layers (Srivastava et al. 2014) in the neural network.

Figure 7. Typical convolutional neural network architecture (Alom et al. 2018).

Figure 8. Architectures of AlexNet (top) and VGG16 (bottom) (Yu et al. 2016).

Another possible problem in the training is bias in the data. If there are significantly different numbers of elements of different classes in the training set, the network might learn that one class is by default more probable than another. This kind of behaviour can be useful in certain circumstances, but usually the network should be neutral towards its output classes.


4 PROPOSED METHOD

4.1 Saimaa ringed seal data cleaning

Image data collected using automatic camera traps contains large amounts of empty images and also images from cases when something other than a Saimaa ringed seal triggered the camera trap. An automated method to tell the images with a Saimaa ringed seal apart from those without one enables the use of advanced Saimaa ringed seal re-identification techniques on the image dataset.

Convolutional neural networks provide a promising tool for the task of detecting the images with a Saimaa ringed seal. The proposed method is as follows:

1. Divide images into small patches of a common size.

2. Classify each patch separately with a CNN.

3. Find the largest connected component of positive patches.

4. Classify the image based on the size of the largest connected component.

A model built by following these steps should be able to classify an image as positive or negative for containing a Saimaa ringed seal, based on how many neighbouring patches look like the fur of the animal to the convolutional neural network. The CNN does not need to recognize a whole Saimaa ringed seal; the point of splitting the images is that the neural network detects the appearance of the animal's fur instead. The method is visualised in Figure 9.
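In pseudocode-like Python, the whole pipeline amounts to roughly the following sketch. The patch-cropping helper and the trained CNN are assumed to exist (crop_patches is a hypothetical name; patch cropping is specified in Section 4.2), and scipy.ndimage.label stands in for the connected-component labeling of Section 4.4:

```python
import numpy as np
from scipy.ndimage import label

def image_contains_seal(image, cnn, patch_side, prob_threshold, min_component):
    patches, grid_shape = crop_patches(image, patch_side)  # step 1 (Section 4.2)
    # Step 2: probability of 'seal' for every patch; the CNN is assumed
    # to output two softmax values (negative, positive) per patch.
    probs = cnn.predict(np.stack(patches))[:, 1]
    binary_map = (probs >= prob_threshold).reshape(grid_shape)
    components, n = label(binary_map)                      # step 3: CCL
    largest = max((np.sum(components == i) for i in range(1, n + 1)), default=0)
    return largest >= min_component                        # step 4
```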

4.2 Patch cropping

When the images are cropped into small patches, there are a few things to take into consideration. Firstly, the size of the patches should be small enough that one patch usually represents only one thing, whether that is the fur of the Saimaa ringed seal, the surface of the lake, or something else. On the other hand, the patches should be large enough to be able to contain the pelage pattern. Too small patches of a high-resolution image contain too little information for either a CNN or a human to be able to classify them.


Figure 9. Proposed model.

Secondly, the patches should overlap to some extent, so that the location of the potential Saimaa ringed seal in the image has minimal impact on the result. If the patches did not overlap at all, different results would be produced by an image where the seal lies in the middle of one patch and one where the seal lies at the corner point of four adjacent patches. However, too much overlap produces very many patches, which might take a lot of storage space and computational resources while training the CNN.

The following algorithm is proposed for patch cropping:

1. Crop out the region from the image bottom containing the timestamp and the camera tag.

2. Count how many patches can be created vertically with less than 50% overlap. Let $m$ be the number of patches, $s$ the patch side length and $l$ the image side length:

$$m_y = \left\lceil \frac{2 l_y}{s} \right\rceil - 1 \qquad (2)$$

Count how many patches can be created horizontally with less than 50% overlap. With the previously introduced symbols:

$$m_x = \left\lceil \frac{2 l_x}{s} \right\rceil - 1 \qquad (3)$$


3. Get the area where the patches' center points lie by subtracting from each side a margin of half of the patch side length. Let $M$ be the area of the center points and $A$ the area of the image, as shown in Figure 10.

Figure 10. Area of the center points of the patches.

4. Calculate the center points inside the center point area. Let the area's borders be

• top: $M_{y0}$

• bottom: $M_{y1}$

• left: $M_{x0}$

• right: $M_{x1}$

The $x$-coordinates of the center points can be presented as:

$$x_n = M_{x0} + n \cdot \frac{M_{x1} - M_{x0}}{m_x - 1} \qquad (4)$$

The $y$-coordinates can be presented respectively:

$$y_n = M_{y0} + n \cdot \frac{M_{y1} - M_{y0}}{m_y - 1} \qquad (5)$$

Now the center points are:

$$\begin{matrix} (x_0, y_0) & (x_1, y_0) & \cdots & (x_{m_x-1}, y_0) \\ (x_0, y_1) & (x_1, y_1) & \cdots & (x_{m_x-1}, y_1) \\ \vdots & \vdots & \ddots & \vdots \\ (x_0, y_{m_y-1}) & (x_1, y_{m_y-1}) & \cdots & (x_{m_x-1}, y_{m_y-1}) \end{matrix}$$

5. Calculate the boundaries of the patches from the known center points and patch side length, and crop the patches from the image.
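Steps 2-4 of the algorithm reduce to a few lines of arithmetic. The sketch below (illustrative, not thesis code) computes the center point grid of Equations (2)-(5); the timestamp strip of step 1 is assumed to be cropped away already:

```python
import math

def patch_center_points(lx, ly, s):
    """Patch counts from Equations (2)-(3), center points from (4)-(5)."""
    mx = math.ceil(2 * lx / s) - 1
    my = math.ceil(2 * ly / s) - 1
    # Margins of half a patch side define the center point area M.
    Mx0, Mx1 = s / 2, lx - s / 2
    My0, My1 = s / 2, ly - s / 2
    xs = [Mx0 + n * (Mx1 - Mx0) / (mx - 1) for n in range(mx)]
    ys = [My0 + n * (My1 - My0) / (my - 1) for n in range(my)]
    return [(x, y) for y in ys for x in xs]

# Example: a 2560x1920 image with 300-pixel patches (the sizes used in
# Section 5) yields a 17x12 grid, i.e. 204 overlapping patches.
centers = patch_center_points(2560, 1920, 300)
```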


4.3 Patch classification using CNN

A convolutional neural network predicts the probability that a patch contains a Saimaa ringed seal. It is proposed to implement at least two different architectures to be able to compare them with each other, although a functioning model needs only one. For example, ResNet50 (He et al. 2016) and MobileNetV2 (Sandler et al. 2018) are two popular and lightweight architectures for a convolutional neural network, lightweight meaning that they have a relatively small number of parameters and layers, which makes both training and prediction relatively fast. The architecture of ResNet50 is presented in Figure 11 and that of MobileNetV2 in Figure 12.

Figure 11. The architecture of ResNet50 by Rezende et al. (2017).

Figure 12. The architecture of MobileNetV2 by Hoanh Nguyen (2020). On the left is the structure of a single depthwise separable convolution block.

4.4 Decision rules

After the convolutional neural network has evaluated the patches, the results need to be interpreted and combined into the final decision on whether the image contains a Saimaa ringed seal or not. For that, a connected-component labeling algorithm is utilized.


Connected-component labeling (CCL) is a common image processing method. The basic principle is that the input is a binary map, and the algorithm must group the positive points which are connected to each other via 4-connectivity or 8-connectivity. 4-connectivity counts points as connected if they are adjacent and exactly above, below, or next to each other; 8-connectivity also counts diagonally adjacent points as connected. Figure 13 illustrates the result of an 8-connectivity CCL algorithm as presented by Bolelli et al. (2017).

Figure 13. Connected-component labeling visualized by Bolelli et al. (2017).

Figure 14. Patch probabilities of Figure 1 predicted by a CNN.

Figure 15. Probabilities of Figure 14 rounded.

For the array of probabilities predicted by the neural network, a 4-connectivity CCL algorithm is proposed. It needs two parameters: the minimum probability threshold for counting a point as positive in the binary map, and the number of points the largest connected component of positive points must include for the algorithm to classify the whole image as positive for including a Saimaa ringed seal. To clarify, the array of the probability values of the patches (Figure 14) is converted into a binary map (Figure 15) for the CCL algorithm, where one patch becomes one point.


An intuitive minimum probability threshold is 0.5, which means that positive values are given to the points whose probability of containing a Saimaa ringed seal is at least as great as the probability of not containing one. Unless a reason to modify the value occurs, the minimum probability threshold should be kept at 0.5. The number of points in the largest connected component, on the other hand, cannot be predicted beforehand, for as the patches are of constant size, the sizes of the connected components depend on the sizes and resolutions of the original images. It is proposed to measure the results at different values of the smallest acceptable connected component and decide the final value based on the results.
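For illustration, the whole decision rule fits in a short function; scipy.ndimage.label is used here as one possible CCL implementation (its default structuring element gives exactly the 4-connectivity proposed above), and min_component=10 is only a placeholder for the value chosen experimentally in Section 5.4:

```python
import numpy as np
from scipy.ndimage import label

def classify_image(patch_probs, prob_threshold=0.5, min_component=10):
    """patch_probs: 2-D array of per-patch probabilities (cf. Figure 14)."""
    binary = patch_probs >= prob_threshold    # threshold (cf. Figure 15)
    labeled, n = label(binary)                # 4-connectivity by default
    if n == 0:
        return False
    sizes = np.bincount(labeled.ravel())[1:]  # component sizes, skipping background
    return sizes.max() >= min_component
```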


5 EXPERIMENTS

5.1 Data

5.1.1 Initial image dataset

The Saimaa ringed seal image dataset used in this thesis contained 13,604 images, mostly taken with camera traps. 3,861 images also had an associated binary map created by the segmentation algorithm (Nepovinnykh, Eerola, and Kalviainen 2020).

Figure 16. An image of a Saimaa ringed seal and its segmentation result.

Figure 17. A challenging image and an empty image.

A possible problem with the dataset lies with the date tags and camera name tags which are found at the bottom of most images (Figures 16-17). As the images will be split into smaller patches for neural network training (Section 5.1.2), bottom corner patches will often have a relatively large area of black text on a white background. The camera traps are also positioned so that their images would capture the whole seal. Therefore the corners will most likely be empty anyway, because the potential seal usually lies in the middle of the image. Thus, the neural network would be expected to associate the white label with a negative result.

5.1.2 Preprocessing

Image preprocessing steps were taken to prepare the dataset for neural network usage. The first was to pair the binary maps with their corresponding images. The binary maps were delivered separately from the other data. They were named after the image they corresponded with, which made the pairing easy.

An attempt to use full images to train the neural network was made, but the network did not learn to tell whether there was a ringed seal in the image or not. There was a need to split the training images into small patches. This was done for the images which had an associated binary map, by following the steps proposed in Section 4.2. A typical resolution for a camera trap image seemed to be 2560×1920 pixels, although not every image was of this size. Based on this, a patch side length of 300 pixels was used, because the resulting patches satisfied the patch size conditions introduced in Section 4.2.

The information about which cropped patches contain pixels of a Saimaa ringed seal is essential in order to train a CNN with them. Hence, the coordinates of each patch's bounding box were extracted. The same patch was then cropped from the corresponding binary map, and its pixels were examined. The percentage of white pixels in the binary map, rounded to the nearest integer, was then appended to the information of the original patch. This integer tells how many percent of the area of the patch belongs to a Saimaa ringed seal.
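A sketch of this labeling step, under the assumption that the binary map is loaded as a NumPy array where nonzero means seal (variable names are illustrative):

```python
import numpy as np

def seal_percentage(binary_map, x0, y0, side):
    """Percentage of seal pixels inside the patch whose bounding box
    starts at (x0, y0), rounded to the nearest integer."""
    patch = binary_map[y0:y0 + side, x0:x0 + side]
    return int(round(100.0 * np.count_nonzero(patch) / patch.size))
```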

The training set for the convolutional neural network was formed from these 300×300 pixel patches. The images taken with a randomly chosen single camera trap were excluded for validation purposes, leaving 730,617 patch images for the training set. The testing set included 340 handpicked images, 170 images with and 170 images without a Saimaa ringed seal.

5.2 Experimental arrangements

The convolutional neural network was implemented in Python 3.6.1, using the Keras library (version 2.3.1), which operates on top of the TensorFlow library (version 2.2.0). The implementation was done as follows:

1. The convolutional layers of a network created by the original authors and already trained with ImageNet were used. ImageNet (Deng et al. 2009) is a diverse set of images, and CNNs trained with it should detect common shapes generally well.

2. An ordinary fully connected layer was added on top of them to do the classification. It had two neurons for the two output classes (positive and negative) and a softmax activation function to scale the probabilities.

3. The parameters of the classifying layer were trained while freezing the weights of the rest of the network. Stochastic gradient descent was used as the optimization algorithm with a learning rate of 0.01. This is how the classifying layer caught up with the previously trained convolutional layers.

4. The whole network was trained. Stochastic gradient descent was used as the optimization algorithm with a learning rate of 0.001. The smaller learning rate should prevent the network from overfitting while still fine-tuning it to the Saimaa ringed seal dataset.
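A sketch of the described two-stage procedure in Keras, using ResNet50 as the example base network; the exact thesis code is not reproduced here, and details such as the pooling choice are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

base = keras.applications.ResNet50(weights="imagenet", include_top=False,
                                   pooling="avg", input_shape=(300, 300, 3))
outputs = layers.Dense(2, activation="softmax")(base.output)  # step 2
model = keras.Model(base.input, outputs)

# Step 3: train only the new classification layer.
base.trainable = False
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_patches, train_labels, batch_size=16, epochs=1)

# Step 4: unfreeze and fine-tune the whole network with a smaller rate.
base.trainable = True
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_patches, train_labels, batch_size=16, epochs=1)
```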

As the training set contained images ranging from empty to their full area being part of a Saimaa ringed seal, a threshold was needed to decide which images were labeled as positive or negative. After an experiment with a simple threshold produced a poorly performing model, two thresholds were set. The final partition into positive and negative training sets followed these rules:

• Images with at least 50 percent of their area belonging to a Saimaa ringed seal formed the positive images of the training set.

• Images with exactly 0 percent of their area belonging to a Saimaa ringed seal formed the negative images of the training set.

• Images with 1 to 49 percent of their area belonging to a Saimaa ringed seal were not used in the training of the convolutional neural network.

Finally, to prevent the CNN from developing a bias, the filtered training set was balanced. That was done by counting the images in both the positive and negative training sets and taking a random sample of the larger set, equal in size to the smaller set. 8,996 images out of the original 730,617 made it to the balanced training set, forming one epoch of the convolutional neural network training.
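The balancing step is plain random undersampling; a minimal sketch (illustrative names, lists of patch file paths assumed):

```python
import random

def balance(positives, negatives, seed=0):
    """Shrink the larger class to the size of the smaller one."""
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    return rng.sample(positives, k), rng.sample(negatives, k)
```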


In these experiments, both training stages lasted one epoch regardless of the model accuracy after the epoch. The batch size was set to 16 images per batch.

After a model was trained, it was tested. As described in Section 4.4, an image is split into multiple patches during prediction. To be able to efficiently test the impact of the minimum size of the largest connected component, the probability of containing a Saimaa ringed seal predicted for each patch of each image in the testing set was saved. Then, the CCL algorithm was executed on the saved probabilities with the minimum size of the largest connected component ranging from 1 to 40. The threshold to accept a patch as positive was not varied.

Two models with different architectures for the convolutional layers were evaluated for comparison. One used the convolutional layers of MobileNetV2, the other those of ResNet50. The two architectures were chosen because they are commonly used and light in terms of the number of parameters.

5.3 Evaluation criteria

The implemented models, which include the convolutional neural network and the CCL-based decision algorithm, were evaluated using three established measures: precision, recall and the F1 score. Precision represents what fraction of the images predicted as positive actually were positive, and recall represents what fraction of all the positive images were predicted as positive. To clarify, an image is positive if it contains a Saimaa ringed seal.

Formally, let $p^+$ be the number of true positives, $p^-$ the number of false positives, and $n^-$ the number of false negatives, and let $P$ stand for precision and $R$ for recall. The equations can then be represented as:

$$P = \frac{p^+}{p^+ + p^-} \qquad (6)$$

$$R = \frac{p^+}{p^+ + n^-} \qquad (7)$$

The F1 score is the harmonic mean of precision and recall. When precision and recall are equally important measures, the F1 score gives an overall rating of the results. The F1 score is calculated as follows:

$$F_1 = 2 \cdot \frac{P \cdot R}{P + R} \qquad (8)$$

The F1 score is used as the primary value for comparison when evaluating the models, because of the way it combines precision and recall. As there is no reason to value one of them over the other, their harmonic mean should do well as an overall measure of quality. In Section 5.4 the different models are sorted by their F1 score.
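Equations (6)-(8) translate directly into code. As a sanity check, feeding in counts consistent with the test set of Section 5.1.2 and the ResNet50 row of Table 2 (168 of the 170 positive images found, 19 false positives) reproduces its reported scores; these counts are inferred for illustration, not taken from the thesis:

```python
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)                           # Equation (6)
    recall = tp / (tp + fn)                              # Equation (7)
    f1 = 2 * precision * recall / (precision + recall)   # Equation (8)
    return precision, recall, f1

print(precision_recall_f1(168, 19, 2))  # ~ (0.898, 0.988, 0.941)
```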

5.4 Results

Both the MobileNetV2 and ResNet50 based models were evaluated using the criteria introduced in Section 5.3, with different values for the minimum size of the largest connected component. Based on the F1 scores, the best result of each model is shown in Table 2.

Table 2. Best results of both models by F1 score.

Pretrained network   Min. size of the largest connected component   F1 score   Precision   Recall
ResNet50             10                                             0.941      0.898       0.988
MobileNetV2          11                                             0.858      0.776       0.959

Figure 18. F1 scores as a function of the largest connected component size threshold.

Table 2 and Figure 18 show that the ResNet50 based model performs better at every level than the one based on MobileNetV2. The best F1 scores for both models are achieved with a minimum largest connected component size between 9 and 12.

These results with the ResNet50 based model can be considered promising. When cleaning a dataset, it is obvious that no image with a Saimaa ringed seal should be discarded. To emphasize this, a model with high recall should be chosen. On the other hand, every image without a seal left in the dataset wastes computational resources, might need human labor to go through, and in the worst case could produce false results. To avoid that, a model with high precision should be chosen.

Figure 19. Precisions and recalls as functions of the largest connected component size threshold.

As one could expect, the results contained some models with precision or recall very close to or exactly 1 (Figure 19). The models with a very small number (1-4) as the minimum size of the largest connected component had extremely good recall values, but low precision: they managed to classify nearly every positive image as positive, but also produced plenty of false positives. On the other hand, models with high largest connected component thresholds had good precision values, but many false negatives. Precision values never reach 1 whereas recall values do. This means that no matter how high the largest connected component threshold, some false positives will always appear, which suggests that the CNN is producing false positive results too, not only the CCL.


6 DISCUSSION

The objective of the thesis was to review existing methods for animal biometrics and automatic image dataset cleaning, and to create a tool for cleaning a Saimaa ringed seal image dataset. A tool was implemented using a convolutional neural network and a connected-component labeling algorithm. The tool was evaluated by calculating the precision, recall and F1 score of each implemented model. The best scores for each characteristic were over 90%, which can be considered promising. However, there is still room for improvement.

Because the decision on whether an image contains a Saimaa ringed seal is based on the number of points in the largest connected component, the model is not very stable with images of different sizes and resolutions. If the images in the training set are of a size where a seal spans more than 9-12 patches, the same images in a lower resolution would produce negative results, because the size of the patches remains constant and there would be significantly fewer patches for the CCL algorithm to evaluate. The same applies if the seal is too far away in the image: only a few patches would produce a positive result and the largest connected component would be too small.

The training process of the models could be enhanced by implementing a model accuracy check at the end of each epoch, which controls whether another epoch of training should be taken. With this, it could also be beneficial to measure model accuracy at the end of each epoch with a validation dataset of images from a camera which does not occur in the training dataset. This way there would be more control over the training process, and the decision to finish training would be made based on a measurement instead of the dataset size.

The evaluation criteria could be adjusted if there was knowledge about how many of the delivered camera trap images actually contain a Saimaa ringed seal. The proposed best model has an excellent recall value, but approximately one out of ten empty images is classified as positive, which might become a problem if most of the delivered images are empty. In that case, another model with higher precision should be considered.

The results produced by the connected-component labeling algorithm were not analyzed. To be able to examine which patches get positive values and where the connected components are, the results of the connected-component labeling algorithm should be reviewed, for example by exporting them as a binary map in the same way the segmentation algorithm introduced in Section 3.2 maps the pixels which belong to a Saimaa ringed seal. That way it could be verified that the convolutional neural networks actually detect the Saimaa ringed seal, and not, for example, just a dark brown color.


Lastly, different architectures for the convolutional neural network could be experimented with. Networks that are deeper and heavier in parameter count could be compared to the current lighter solutions, especially because ResNet50, which performed better, has more parameters and depth than MobileNetV2. It could also be possible to design a custom convolutional neural network without any previously trained layers, but training it would take significantly more time than with networks already trained to recognize ImageNet shapes and patterns.


7 CONCLUSIONS

In this thesis, image classification methods were used to clean a Saimaa ringed seal image dataset of empty images, leaving only those with a Saimaa ringed seal in them. The proposed method includes splitting the image into multiple small patches, using a convolutional neural network to predict for each patch independently whether it contains a Saimaa ringed seal or not, finding the largest group of neighbouring patches which were all predicted as positive, counting the size of that largest group, and, based on the size, classifying the image as positive or negative for containing a Saimaa ringed seal.

The image dataset was processed to match the needs of the proposed model. This means that the dataset was divided into a training set and a testing set. The images in the training set were then split into 300×300 patches for the convolutional neural network, and the testing set was left intact, for it was intended for testing the whole model, not only the neural network. Two slightly different convolutional neural networks were then trained with the training set, and the other components of the proposed model were implemented. The whole model was tested with different parameters and different convolutional neural networks.

The results were promising. The best model that was found had an F1 score of 0.941, a precision of 0.898 and a recall of 0.988. This means that it was able to recognize a seal with a probability of roughly 99%, and an image that was recognized as containing a seal by this model actually had a seal with a probability of roughly 90%. As one of the first iterations on this subject, the results within the scope of this thesis can be considered good, but to be able to utilize the model on a large scale, it needs to become even more accurate. A few suggestions for improvement were discussed.

REFERENCES

Alom, Md Zahangir, Taha, Tarek M, Yakopcic, Christopher, Westberg, Stefan, Sidike, Paheding, Nasrin, Mst Shamima, Van Esesn, Brian C, Awwal, Abdul A S, and Asari, Vijayan K (2018). "The history began from AlexNet: A comprehensive survey on deep learning approaches". In: arXiv preprint arXiv:1803.01164.

Amelon, Sybill K, Hooper, Sarah E, and Womack, Kathryn M (2017). "Bat wing biometrics: using collagen–elastin bundles in bat wings as a unique individual identifier". In: Journal of Mammalogy 98.3, pp. 744–751.

Barnich, Olivier and Van Droogenbroeck, Marc (2010). "ViBe: A universal background subtraction algorithm for video sequences". In: IEEE Transactions on Image Processing 20.6, pp. 1709–1724.

Bolelli, Federico, Cancilla, Michele, and Grana, Costantino (2017). "Two more strategies to speed up connected components labeling algorithms". In: International Conference on Image Analysis and Processing. Springer, pp. 48–58.

Bottou, Léon and Bousquet, Olivier (2008). "The tradeoffs of large scale learning". In: Advances in Neural Information Processing Systems, pp. 161–168.

Chehrsimin, T., Eerola, T., Koivuniemi, M., Auttila, M., Levänen, R., Niemi, M., Kunnasranta, M., and Kälviäinen, H. (2018). "Automatic individual identification of Saimaa ringed seals". In: IET Computer Vision 12.2, pp. 146–152.

Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. (2018). "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 40.4, pp. 834–848.

Clarke, Roger (1994). "Human identification in information systems". In: Information Technology & People.

Crall, Jonathan P, Stewart, Charles V, Berger-Wolf, Tanya Y, Rubenstein, Daniel I, and Sundaresan, Siva R (2013). "Hotspotter—patterned species instance recognition". In: 2013 IEEE Workshop on Applications of Computer Vision (WACV). IEEE, pp. 230–237.

Deng, J., Dong, W., Socher, R., Li, L., Kai Li, and Li Fei-Fei (2009). "ImageNet: A large-scale hierarchical image database". In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255.

(39)

trajectories evaluation by static video sensors”. In:

Systems Conference. IEEE, pp. 864–869.

Hallinan, Peter W (1991). “Recognizing human eyes”. In:Geometric Methods in Computer Vision. Vol. 1570. International Society for Optics and Photonics, pp. 214–226.

Halloran, Kelly M, Murdoch, James D, and Becker, Matthew S (2015). “Applying computer- aided photo-identification to messy datasets: a case study of T hornicroft’s giraffe (G iraffa camelopardalis thornicrofti)”. In:African Journal of Ecology53.2, pp. 147–155.

He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, and Sun, Jian (2016). “Deep residual learning for image recognition”. In:Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778.

Hoffer, Elad and Ailon, Nir (2015). “Deep Metric Learning Using Triplet Network”. In:

Similarity-Based Pattern Recognition. Ed. by Aasa Feragen, Marcello Pelillo, and Marco Loog. Cham: Springer International Publishing, pp. 84–92.ISBN: 978-3-319-24261-3.

Kurita, Takio, Otsu, Nobuyuki, and Abdelmalek, N (1992). “Maximum likelihood thresh- olding based on population mixture models”. In: Pattern recognition 25.10, pp. 1231–

1240.

Larregui, Juan I, Cazzato, Dario, and Castro, Silvia M (2019). “An image processing pipeline to segment iris for unconstrained cow identification system”. In:Open Computer Science 9.1, pp. 145–159.

Li, Shuyuan, Li, Jianguo, Lin, Weiyao, and Tang, Hanlin (2019). “Amur tiger re-identification in the wild”. In:arXiv preprint arXiv:1906.05586.

Lind, Nanna M, Vinther, Michael, Hemmingsen, Ralf P, and Hansen, Axel K (2005). “Val- idation of a digital video tracking system for recording pig locomotor behaviour”. In:

Journal of neuroscience methods143.2, pp. 123–132.

Malcolm, J. R., Liu, C., Neilson, R. P., Hansen, L., and Hannah, L. (2006). “Global Warm- ing and Extinctions of Endemic Species from Biodiversity Hotspots”. In:Conservation Biology20.2, pp. 538–548. DOI:10.1111/j.1523-1739.2006.00364.x.

Nepovinnykh, Ekaterina, Eerola, Tuomas, and Kalviainen, Heikki (Mar. 2020). “Siamese Network Based Pelage Pattern Matching for Ringed Seal Re-identification”. In: The IEEE Winter Conference on Applications of Computer Vision (WACV) Workshops.

(40)

“Identification of Saimaa Ringed Seal Individuals Using Transfer Learning”. In:

vanced Concepts for Intelligent Vision Systems. Ed. by Jacques Blanc-Talon, David Hel- bert, Wilfried Philips, Dan Popescu, and Paul Scheunders. Cham: Springer International Publishing, pp. 211–222.ISBN: 978-3-030-01449-0.

Nesterov, Yurii (1983). “A method of solving a convex programming problem with conver- gence rateO(1/k2).” In:Soviet Mathematics Doklady. Vol. 27. 2.

Nguyen, Hoanh (2020). “Fast object detection framework based on MobileNetV2 architec- ture and enhanced feature pyramid”. In:Journal of Theoretical and Applied Information Technology98.05.

Nguyen, Hung, Maclagan, Sarah J, Nguyen, Tu Dinh, Nguyen, Thin, Flemons, Paul, An- drews, Kylie, Ritchie, Euan G, and Phung, Dinh (2017). “Animal recognition and identi- fication with deep convolutional neural networks for automated wildlife monitoring”. In:

2017 IEEE international conference on data science and advanced Analytics (DSAA).

IEEE, pp. 40–49.

Parham, Jason, Stewart, Charles, Crall, Jonathan, Rubenstein, Daniel, Holmberg, Jason, and Berger-Wolf, Tanya (2018). “An animal detection pipeline for identification”. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, pp. 1075–

1083.

Piccardi, Massimo (2004). “Background subtraction techniques: a review”. In:2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No. 04CH37583).

Vol. 4. IEEE, pp. 3099–3104.

Rezende, Edmar, Ruppert, Guilherme, Carvalho, Tiago, Ramos, Fabio, and De Geus, Paulo (2017). “Malicious software classification using transfer learning of resnet-50 deep neu- ral network”. In: 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, pp. 1011–1014.

Samal, Ashok and Iyengar, Prasana A (1992). “Automatic recognition and analysis of human faces and facial expressions: A survey”. In:Pattern recognition25.1, pp. 65–77.

Sandler, Mark, Howard, Andrew, Zhu, Menglong, Zhmoginov, Andrey, and Chen, Liang- Chieh (June 2018). “MobileNetV2: Inverted Residuals and Linear Bottlenecks”. In:The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

(41)

Koller, Thomas, Gerig, Guido, and Kikinis, Ron (1998). “Three-dimensional multi-scale line filter for segmentation and visualization of curvilinear structures in medical images”.

In:Medical image analysis2.2, pp. 143–168.

Sipil¨a, T. (2016a). “Pusa hispida ssp. ladogensis”. In: The IUCN Red List of Threatened Species e.T41674A66991648. URL: https : / / dx . doi . org / 10 . 2305 / IUCN . UK.2016-1.RLTS.T41674A66991648.en(visited on 02/18/2020).

— (2016b). “Pusa hispida ssp. saimensis”. In: The IUCN Red List of Threatened Species e.T41675A66991678.URL:https://dx.doi.org/10.2305/IUCN.UK.2016- 1.RLTS.T41675A66991678.en(visited on 02/18/2020).

Smith, Steven J, Edmonds, James, Hartin, Corinne A, Mundra, Anupriya, and Calvin, Kather- ine (2015). “Near-term acceleration in the rate of temperature change”. In: Nature Cli- mate Change5.4, pp. 333–336.

Sobral, Andrews and Vacavant, Antoine (2014). “A comprehensive review of background subtraction algorithms evaluated with synthetic and real videos”. In: Computer Vision and Image Understanding 122, pp. 4–21. ISSN: 1077-3142. DOI: https : / / doi . org/10.1016/j.cviu.2013.12.005.

Srivastava, Nitish, Hinton, Geoffrey, Krizhevsky, Alex, Sutskever, Ilya, and Salakhutdinov, Ruslan (2014). “Dropout: a simple way to prevent neural networks from overfitting”. In:

The journal of machine learning research15.1, pp. 1929–1958.

Vishwakarma, Virendra P, Pandey, Sujata, and Gupta, MN (2009). “Adaptive histogram equalization and logarithm transform with rescaled low frequency DCT coefficients for illumination normalization”. In:International Journal of Recent Trends in Engineering 1.1, p. 318.

Wang, Taiqing, Gong, Shaogang, Zhu, Xiatian, and Wang, Shengjin (2014). “Person re- identification by video ranking”. In:European conference on computer vision. Springer, pp. 688–703.

Xu, Bing, Wang, Naiyan, Chen, Tianqi, and Li, Mu (2015). “Empirical Evaluation of Rec- tified Activations in Convolutional Network”. In:CoRRabs/1505.00853. arXiv:1505.

00853.URL:http://arxiv.org/abs/1505.00853.

(42)

ware for classifying humans, animals, and empty frames in camera trap images”. In:

Ecology and Evolution9.4, pp. 1578–1589.DOI:10.1002/ece3.4747.

Yu, Wei, Yang, Kuiyuan, Bai, Yalong, Xiao, Tianjun, Yao, Hongxun, and Rui, Yong (2016).

“Visualizing and comparing AlexNet and VGG using deconvolutional layers”. In: Pro- ceedings of the 33 rd International Conference on Machine Learning.

Zhelezniakov, Artem, Eerola, Tuomas, Koivuniemi, Meeri, Auttila, Miina, Lev¨anen, Riikka, Niemi, Marja, Kunnasranta, Mervi, and K¨alvi¨ainen, Heikki (2015). “Segmentation of Saimaa Ringed Seals for Identification Purposes”. In: Advances in Visual Computing.

Ed. by George Bebis et al. Cham: Springer International Publishing, pp. 227–236.ISBN: 978-3-319-27863-6.
