
Both the MobileNetV2 and ResNet50 based models were evaluated using the criteria introduced in Section 5.3, with different values for the minimum size of the largest connected component.

Based on the F1 scores, the best result of each model is shown in Table 2.

Table 2. Best results of both models by F1 score.

Pretrained network   Minimum size of the largest connected component   F1 score   Precision   Recall
ResNet50             10                                                0.941      0.898       0.988
MobileNetV2          11                                                0.858      0.776       0.959
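
Since the F1 score is the harmonic mean of precision and recall, the tabulated values can be checked directly; for the ResNet50 row:

F1 = 2PR / (P + R) = (2 · 0.898 · 0.988) / (0.898 + 0.988) ≈ 0.941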

Figure 18. F1 scores as a function of the largest connected component size threshold.

Table 2 and Figure 18 show that the ResNet50 based model performs better than the MobileNetV2 based model at every threshold level. The best F1 scores for both models are achieved with a minimum largest connected component size between 9 and 12.
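
The sweep shown in Figure 18 can be reproduced with a short script. The following is a minimal sketch in Python, assuming each test image has already been reduced to the size of its largest connected component of positive patches; the input names (largest_cc_sizes, labels) and the threshold range are illustrative, not taken from the thesis.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 from binary ground truth and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def sweep_thresholds(largest_cc_sizes, labels, thresholds=range(1, 31)):
    """Score the classifier at every minimum-component-size threshold:
    an image counts as positive when its largest connected component of
    patches predicted as seal reaches the threshold."""
    return {t: precision_recall_f1(labels, [int(s >= t) for s in largest_cc_sizes])
            for t in thresholds}
```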

These results with the ResNet50 based model can be considered promising. When cleaning a dataset, clearly no image containing a Saimaa ringed seal should be discarded.

Figure 19. Precision and recall as functions of the largest connected component size threshold.

To emphasize this, a model with high recall should be chosen. On the other hand, every image without a seal left in the dataset wastes computational resources, may require human labor to review, and in the worst case could produce false results. To avoid that, a model with high precision should be chosen.

As one could expect, the results contained some models with precision or recall very close to or exactly 1 (Figure 19). The models with a very small minimum size of the largest connected component (1-4) had extremely good recall but low precision.

They classified nearly every positive image as positive, but also produced plenty of false positives. On the other hand, models with high largest connected component thresholds had good precision but many false negatives. Recall values reach 1, whereas precision values never do: no matter how high the largest connected component threshold is set, some false positives always remain. This suggests that the CNN itself produces false positives, not only the CCL step.

6 DISCUSSION

The objective of the thesis was to review existing methods for animal biometrics and automatic image dataset cleaning, and to create a tool for cleaning a Saimaa ringed seal image dataset. The tool was implemented using a convolutional neural network and a connected-component labeling algorithm, and it was evaluated by calculating precision, recall and F1 score for each implemented model. The best score for each metric was over 90%, which can be considered promising. However, there is still room for improvement.

Because the decision on whether an image contains a Saimaa ringed seal is based on the number of positive patches in the largest connected component, the model is not very stable across images of different sizes and resolutions. If the images in the training set are of a size where a seal covers more than the 9-12 patch threshold, the same images at a lower resolution could produce negative results, because the patch size remains constant and there would be significantly fewer patches for the CCL algorithm to evaluate. The same applies if the seal is too far away in the image: only a few patches would produce a positive result and the largest connected component would be too small.
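
A back-of-the-envelope illustration of the effect, with made-up resolutions:

```python
def patch_grid(width, height, patch=300):
    """Number of whole 300x300 patches the model sees at a given resolution."""
    return (width // patch) * (height // patch)

# Hypothetical example: the same scene at full and at half resolution.
print(patch_grid(3000, 2000))  # 60 patches; a seal might span 9-12 of them
print(patch_grid(1500, 1000))  # 15 patches; the same seal now covers only a
                               # few, so its largest component can fall below
                               # the 9-12 threshold and the image is rejected
```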

The training process of the models could be enhanced by implementing a model accuracy check at the end of each epoch, which determines whether another epoch of training is needed.

With this, it could also be beneficial to use a validation dataset to measure model accuracy at the end of each epoch with images from a camera that does not appear in the training dataset. This way there would be more control over the training process, and the decision to finish training would be based on a measurement instead of the dataset size.
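
In Keras terms this could look roughly as follows (a sketch only: the thesis does not state that tf.keras was used, and the data arrays here are random stand-ins for the real patch datasets, with the validation patches meant to come from a camera absent from the training set):

```python
import numpy as np
import tensorflow as tf

# ImageNet-pretrained backbone with a binary seal/no-seal head,
# mirroring the transfer learning setup described in the thesis.
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      pooling="avg", input_shape=(300, 300, 3))
model = tf.keras.Sequential([base, tf.keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy",     # checked at the end of every epoch
    patience=3,                 # stop after 3 epochs without improvement
    restore_best_weights=True)  # roll back to the best epoch seen

# Random stand-in data; in practice these would be the 300x300 patches.
x_train = np.random.rand(8, 300, 300, 3).astype("float32")
y_train = np.random.randint(0, 2, 8)
x_val = np.random.rand(4, 300, 300, 3).astype("float32")
y_val = np.random.randint(0, 2, 4)

model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=50,            # upper bound; EarlyStopping usually ends sooner
          callbacks=[early_stop])
```

With such a callback, the number of epochs becomes an upper bound rather than a fixed choice, and training ends when the validation measurement stops improving.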

The evaluation criteria could be adjusted if it were known how many of the delivered camera trap images actually contain a Saimaa ringed seal. The proposed best model has an excellent recall value, but approximately one out of ten images classified as positive is actually empty, which might become a problem if most of the delivered images are empty. In that case, another model with higher precision should be considered.

The results produced by the connected-component labeling algorithm were not analyzed. To be able to examine which patches receive positive values and where the connected components are, the results of the connected-component labeling algorithm should be reviewed, for example by exporting them as a binary map, the same way the segmentation algorithm introduced in Section 3.2 maps the pixels which belong to a Saimaa ringed seal. That way it could be verified that the convolutional neural network actually detects the Saimaa ringed seal and not, for example, just a dark brown color.
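
A sketch of such an export, using scipy's labeling as a stand-in for the CCL implementation of the thesis (the function and its arguments are illustrative):

```python
import numpy as np
from scipy import ndimage
from PIL import Image

def export_largest_component_map(patch_predictions, out_path, patch=300):
    """Save the largest connected component of positive patches as a binary
    image, upscaled so it overlays the original photo like the segmentation
    maps of Section 3.2. patch_predictions is a 2D 0/1 array with one entry
    per 300x300 patch."""
    grid = np.asarray(patch_predictions, dtype=np.uint8)
    labeled, n = ndimage.label(grid)           # 4-connected components
    if n > 0:
        sizes = ndimage.sum(grid, labeled, index=range(1, n + 1))
        keep = np.argmax(sizes) + 1            # label of the largest component
        grid = (labeled == keep).astype(np.uint8)
    # Blow each patch cell back up to its pixel footprint and save.
    img = Image.fromarray(grid * 255, mode="L").resize(
        (grid.shape[1] * patch, grid.shape[0] * patch), Image.NEAREST)
    img.save(out_path)
```

Overlaying such maps on the original images would show directly whether the positive patches fall on the seal or merely on dark brown regions.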

Lastly, different architectures for the convolutional neural network could be experimented with. Deeper networks with more parameters could be compared to the current lighter solutions, especially because ResNet50, which performed better, has more parameters and depth than MobileNetV2. It could also be possible to design a custom convolutional neural network without any previously trained layers, but training it would take significantly more time than with networks already trained to recognize ImageNet shapes and patterns.

7 CONCLUSIONS

In this thesis, image classification methods were used to clean a Saimaa ringed seal image dataset of empty images, leaving only those with a Saimaa ringed seal in them. The proposed method splits the image into multiple small patches, uses a convolutional neural network to predict independently for each patch whether it contains a Saimaa ringed seal, finds the largest group of neighbouring patches which were all predicted as positive, counts the size of that group, and, based on the size, classifies the image as positive or negative for containing a Saimaa ringed seal.
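
A minimal, self-contained sketch of this decision logic is shown below; cnn is assumed to be any callable mapping a batch of 300×300×3 patches to seal probabilities, and the simple flood fill stands in for the connected-component labeling algorithm actually used:

```python
import numpy as np

def largest_component_size(grid):
    """Size of the largest 4-connected group of positive patches
    (a plain flood fill standing in for the thesis's CCL algorithm)."""
    grid = np.array(grid, dtype=bool)
    best = 0
    for r in range(grid.shape[0]):
        for c in range(grid.shape[1]):
            if grid[r, c]:
                stack, size = [(r, c)], 0
                grid[r, c] = False            # mark visited
                while stack:
                    y, x = stack.pop()
                    size += 1
                    for ny, nx in ((y+1, x), (y-1, x), (y, x+1), (y, x-1)):
                        if (0 <= ny < grid.shape[0] and 0 <= nx < grid.shape[1]
                                and grid[ny, nx]):
                            grid[ny, nx] = False
                            stack.append((ny, nx))
                best = max(best, size)
    return best

def classify_image(image, cnn, patch=300, min_component=10):
    """Patch the image, let the CNN vote per patch, then accept the image
    only if the largest connected positive region is large enough."""
    rows, cols = image.shape[0] // patch, image.shape[1] // patch
    patches = np.stack([image[r*patch:(r+1)*patch, c*patch:(c+1)*patch]
                        for r in range(rows) for c in range(cols)])
    votes = np.asarray(cnn(patches)).reshape(rows, cols) > 0.5
    return largest_component_size(votes) >= min_component
```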

The image dataset was processed to match the needs of the proposed model: it was divided into a training set and a testing set. The images in the training set were then split into 300×300 patches for the convolutional neural network, while the testing set was left intact, as it is intended for testing the whole model, not only the neural network. Two slightly different convolutional neural networks were then trained with the training set, and the other components of the proposed model were implemented. The whole model was tested with different parameters and different convolutional neural networks.
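
A hedged sketch of that preparation step (directory layout, file extension and split ratio are assumptions, not taken from the thesis):

```python
import random
from pathlib import Path
from PIL import Image

def prepare_dataset(image_dir, patch_dir, test_fraction=0.2, patch=300):
    """Hold whole images out for end-to-end testing and cut the remaining
    training images into 300x300 patches for the CNN."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.shuffle(images)
    n_test = int(len(images) * test_fraction)
    test_set, train_set = images[:n_test], images[n_test:]
    Path(patch_dir).mkdir(parents=True, exist_ok=True)
    for path in train_set:
        img = Image.open(path)
        w, h = img.size
        for r in range(h // patch):
            for c in range(w // patch):
                box = (c * patch, r * patch, (c + 1) * patch, (r + 1) * patch)
                img.crop(box).save(Path(patch_dir) / f"{path.stem}_{r}_{c}.png")
    return train_set, test_set   # the test images stay intact
```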

The results were promising. The best model found had an F1 score of 0.941, a precision of 0.898 and a recall of 0.989. This means that it was able to recognize a seal with a probability of roughly 99%, and an image recognized as containing a seal by this model actually had a seal with a probability of roughly 90%. As one of the first iterations on this subject, the results within the scope of this thesis can be considered good, but to be able to utilize the model at a large scale, it needs to become even more accurate. A few suggestions for improvement were discussed.

REFERENCES

Alom, Md Zahangir, Taha, Tarek M, Yakopcic, Christopher, Westberg, Stefan, Sidike, Paheding, Nasrin, Mst Shamima, Van Esesn, Brian C, Awwal, Abdul A S, and Asari, Vijayan K (2018). “The history began from AlexNet: A comprehensive survey on deep learning approaches”. In: arXiv preprint arXiv:1803.01164.

Amelon, Sybill K, Hooper, Sarah E, and Womack, Kathryn M (2017). “Bat wing biometrics: using collagen–elastin bundles in bat wings as a unique individual identifier”. In: Journal of Mammalogy 98.3, pp. 744–751.

Barnich, Olivier and Van Droogenbroeck, Marc (2010). “ViBe: A universal background subtraction algorithm for video sequences”. In: IEEE Transactions on Image Processing 20.6, pp. 1709–1724.

Bolelli, Federico, Cancilla, Michele, and Grana, Costantino (2017). “Two more strategies to speed up connected components labeling algorithms”. In: International Conference on Image Analysis and Processing. Springer, pp. 48–58.

Bottou, Léon and Bousquet, Olivier (2008). “The tradeoffs of large scale learning”. In: Advances in Neural Information Processing Systems, pp. 161–168.

Chehrsimin, T., Eerola, T., Koivuniemi, M., Auttila, M., Levänen, R., Niemi, M., Kunnasranta, M., and Kälviäinen, H. (2018). “Automatic individual identification of Saimaa ringed seals”. In: IET Computer Vision 12.2, pp. 146–152.

Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. (2018). “DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 40.4, pp. 834–848.

Clarke, Roger (1994). “Human identification in information systems”. In: Information Technology & People.

Crall, Jonathan P, Stewart, Charles V, Berger-Wolf, Tanya Y, Rubenstein, Daniel I, and Sundaresan, Siva R (2013). “Hotspotter—patterned species instance recognition”. In: 2013 IEEE Workshop on Applications of Computer Vision (WACV). IEEE, pp. 230–237.

Deng, J., Dong, W., Socher, R., Li, L., Kai Li, and Li Fei-Fei (2009). “ImageNet: A large-scale hierarchical image database”. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255.

“trajectories evaluation by static video sensors”. In: Systems Conference. IEEE, pp. 864–869.

Hallinan, Peter W (1991). “Recognizing human eyes”. In: Geometric Methods in Computer Vision. Vol. 1570. International Society for Optics and Photonics, pp. 214–226.

Halloran, Kelly M, Murdoch, James D, and Becker, Matthew S (2015). “Applying computer-aided photo-identification to messy datasets: a case study of Thornicroft’s giraffe (Giraffa camelopardalis thornicrofti)”. In: African Journal of Ecology 53.2, pp. 147–155.

He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, and Sun, Jian (2016). “Deep residual learning for image recognition”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.

Hoffer, Elad and Ailon, Nir (2015). “Deep Metric Learning Using Triplet Network”. In: Similarity-Based Pattern Recognition. Ed. by Aasa Feragen, Marcello Pelillo, and Marco Loog. Cham: Springer International Publishing, pp. 84–92. ISBN: 978-3-319-24261-3.

Kurita, Takio, Otsu, Nobuyuki, and Abdelmalek, N (1992). “Maximum likelihood thresholding based on population mixture models”. In: Pattern Recognition 25.10, pp. 1231–1240.

Larregui, Juan I, Cazzato, Dario, and Castro, Silvia M (2019). “An image processing pipeline to segment iris for unconstrained cow identification system”. In: Open Computer Science 9.1, pp. 145–159.

Li, Shuyuan, Li, Jianguo, Lin, Weiyao, and Tang, Hanlin (2019). “Amur tiger re-identification in the wild”. In: arXiv preprint arXiv:1906.05586.

Lind, Nanna M, Vinther, Michael, Hemmingsen, Ralf P, and Hansen, Axel K (2005). “Validation of a digital video tracking system for recording pig locomotor behaviour”. In: Journal of Neuroscience Methods 143.2, pp. 123–132.

Malcolm, J. R., Liu, C., Neilson, R. P., Hansen, L., and Hannah, L. (2006). “Global Warming and Extinctions of Endemic Species from Biodiversity Hotspots”. In: Conservation Biology 20.2, pp. 538–548. DOI: 10.1111/j.1523-1739.2006.00364.x.

Nepovinnykh, Ekaterina, Eerola, Tuomas, and Kälviäinen, Heikki (Mar. 2020). “Siamese Network Based Pelage Pattern Matching for Ringed Seal Re-identification”. In: The IEEE Winter Conference on Applications of Computer Vision (WACV) Workshops.

Nepovinnykh, Ekaterina, Eerola, Tuomas, Kälviäinen, Heikki, and Radchenko, Gleb (2018). “Identification of Saimaa Ringed Seal Individuals Using Transfer Learning”. In: Advanced Concepts for Intelligent Vision Systems. Ed. by Jacques Blanc-Talon, David Helbert, Wilfried Philips, Dan Popescu, and Paul Scheunders. Cham: Springer International Publishing, pp. 211–222. ISBN: 978-3-030-01449-0.

Nesterov, Yurii (1983). “A method of solving a convex programming problem with convergence rate O(1/k^2)”. In: Soviet Mathematics Doklady. Vol. 27. 2.

Nguyen, Hoanh (2020). “Fast object detection framework based on MobileNetV2 architecture and enhanced feature pyramid”. In: Journal of Theoretical and Applied Information Technology 98.05.

Nguyen, Hung, Maclagan, Sarah J, Nguyen, Tu Dinh, Nguyen, Thin, Flemons, Paul, Andrews, Kylie, Ritchie, Euan G, and Phung, Dinh (2017). “Animal recognition and identification with deep convolutional neural networks for automated wildlife monitoring”. In: 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE, pp. 40–49.

Parham, Jason, Stewart, Charles, Crall, Jonathan, Rubenstein, Daniel, Holmberg, Jason, and Berger-Wolf, Tanya (2018). “An animal detection pipeline for identification”. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, pp. 1075–1083.

Piccardi, Massimo (2004). “Background subtraction techniques: a review”. In: 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No. 04CH37583). Vol. 4. IEEE, pp. 3099–3104.

Rezende, Edmar, Ruppert, Guilherme, Carvalho, Tiago, Ramos, Fabio, and De Geus, Paulo (2017). “Malicious software classification using transfer learning of ResNet-50 deep neural network”. In: 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, pp. 1011–1014.

Samal, Ashok and Iyengar, Prasana A (1992). “Automatic recognition and analysis of human faces and facial expressions: A survey”. In: Pattern Recognition 25.1, pp. 65–77.

Sandler, Mark, Howard, Andrew, Zhu, Menglong, Zhmoginov, Andrey, and Chen, Liang-Chieh (June 2018). “MobileNetV2: Inverted Residuals and Linear Bottlenecks”. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Sato, Yoshinobu, Nakajima, Shin, Shiraga, Nobuyuki, Atsumi, Hideki, Yoshida, Shigeyuki, Koller, Thomas, Gerig, Guido, and Kikinis, Ron (1998). “Three-dimensional multi-scale line filter for segmentation and visualization of curvilinear structures in medical images”. In: Medical Image Analysis 2.2, pp. 143–168.

Sipilä, T. (2016a). “Pusa hispida ssp. ladogensis”. In: The IUCN Red List of Threatened Species e.T41674A66991648. URL: https://dx.doi.org/10.2305/IUCN.UK.2016-1.RLTS.T41674A66991648.en (visited on 02/18/2020).

— (2016b). “Pusa hispida ssp. saimensis”. In: The IUCN Red List of Threatened Species e.T41675A66991678. URL: https://dx.doi.org/10.2305/IUCN.UK.2016-1.RLTS.T41675A66991678.en (visited on 02/18/2020).

Smith, Steven J, Edmonds, James, Hartin, Corinne A, Mundra, Anupriya, and Calvin, Katherine (2015). “Near-term acceleration in the rate of temperature change”. In: Nature Climate Change 5.4, pp. 333–336.

Sobral, Andrews and Vacavant, Antoine (2014). “A comprehensive review of background subtraction algorithms evaluated with synthetic and real videos”. In: Computer Vision and Image Understanding 122, pp. 4–21. ISSN: 1077-3142. DOI: https://doi.org/10.1016/j.cviu.2013.12.005.

Srivastava, Nitish, Hinton, Geoffrey, Krizhevsky, Alex, Sutskever, Ilya, and Salakhutdinov, Ruslan (2014). “Dropout: a simple way to prevent neural networks from overfitting”. In: The Journal of Machine Learning Research 15.1, pp. 1929–1958.

Vishwakarma, Virendra P, Pandey, Sujata, and Gupta, MN (2009). “Adaptive histogram equalization and logarithm transform with rescaled low frequency DCT coefficients for illumination normalization”. In: International Journal of Recent Trends in Engineering 1.1, p. 318.

Wang, Taiqing, Gong, Shaogang, Zhu, Xiatian, and Wang, Shengjin (2014). “Person re-identification by video ranking”. In: European Conference on Computer Vision. Springer, pp. 688–703.

Xu, Bing, Wang, Naiyan, Chen, Tianqi, and Li, Mu (2015). “Empirical Evaluation of Rectified Activations in Convolutional Network”. In: CoRR abs/1505.00853. arXiv: 1505.00853. URL: http://arxiv.org/abs/1505.00853.

Yousif, Hayder, Yuan, Jianhe, Kays, Roland, and He, Zhihai (2019). “Animal Scanner: Software for classifying humans, animals, and empty frames in camera trap images”. In: Ecology and Evolution 9.4, pp. 1578–1589. DOI: 10.1002/ece3.4747.

Yu, Wei, Yang, Kuiyuan, Bai, Yalong, Xiao, Tianjun, Yao, Hongxun, and Rui, Yong (2016). “Visualizing and comparing AlexNet and VGG using deconvolutional layers”. In: Proceedings of the 33rd International Conference on Machine Learning.

Zhelezniakov, Artem, Eerola, Tuomas, Koivuniemi, Meeri, Auttila, Miina, Levänen, Riikka, Niemi, Marja, Kunnasranta, Mervi, and Kälviäinen, Heikki (2015). “Segmentation of Saimaa Ringed Seals for Identification Purposes”. In: Advances in Visual Computing. Ed. by George Bebis et al. Cham: Springer International Publishing, pp. 227–236. ISBN: 978-3-319-27863-6.