Image clustering for unsupervised analysis of plankton data


Academic year: 2022



Computational Engineering and Technical Physics
Computer Vision and Pattern Recognition

Mark Ibrahim

IMAGE CLUSTERING FOR UNSUPERVISED ANALYSIS OF PLANKTON DATA

Master’s Thesis

Examiners: Prof. Heikki Kälviäinen
           Assoc. Prof. Vyacheslav V. Potekhin
Supervisors: D.Sc. Tuomas Eerola
             Professor Lasse Lensu
             Professor Heikki Kälviäinen


Lappeenranta-Lahti University of Technology LUT
School of Engineering Science
Computational Engineering and Technical Physics
Computer Vision and Pattern Recognition

Mark Ibrahim

Image clustering for unsupervised analysis of plankton data

Master’s Thesis 2020

47 pages, 19 figures, 6 tables, 3 appendices.

Examiners: Prof. Heikki Kälviäinen

Assoc. Prof. Vyacheslav V. Potekhin

Keywords: computer vision, image processing, pattern recognition, convolutional neural network, clustering, feature extraction, plankton imaging

Advancements in automated imaging have made it possible to enhance the data both in terms of quantity and quality. This has prompted the development of plankton imaging systems for acquiring species-level information about plankton communities. However, screening the huge amount of data has been a challenge for both humans and computers.

This Master’s thesis project focused on visual clustering of plankton image data by implementing and applying image clustering methods to plankton image data sets. These data sets were collected from the Baltic Sea using an imaging flow cytometer. To form the clusters, the features were first extracted using AlexNet and ResNet-18. The extracted features of each CNN were clustered using the hierarchical and k-means algorithms and evaluated by measuring the purity of the clustering result. The results showed that ResNet-18 is better in feature extraction, and that for a small number of classes the k-means method achieves the highest purity, although the hierarchical method yields higher purity when the number of clusters is low. When the number of classes is large, the hierarchical method shows better purity.


First of all, I would like to thank Lappeenranta-Lahti University of Technology and Peter the Great St. Petersburg Polytechnic University for giving me the opportunity to participate in this Double Degree program. This program was such a great experience that I believe it will have an impact on my whole life.

I would also like to express my sincere gratitude to Prof. Heikki Kälviäinen, Prof. Lasse Lensu, and D.Sc. Tuomas Eerola for their endless support and constructive supervision throughout this study.

Finally, I want to thank my parents and sister for their boundless encouragement and love; without them I would not be here.

Lappeenranta, June 22, 2020

Mark Ibrahim


CONTENTS

1 INTRODUCTION
  1.1 Background
  1.2 Objectives and delimitations
  1.3 Structure of the thesis
2 PHYTOPLANKTON IMAGE ANALYSIS
  2.1 Phytoplankton taxonomy
  2.2 Automatic plankton recognition
3 IMAGE CLUSTERING
  3.1 Image clustering methods
  3.2 Convolutional neural networks
    3.2.1 Convolution Layer
    3.2.2 Pooling Layer
    3.2.3 Fully-connected Layer
  3.3 Deep image clustering
4 PLANKTON IMAGE CLUSTERING
  4.1 Clustering pipeline
    4.1.1 Preprocessing
    4.1.2 Feature Extraction
    4.1.3 Clustering
  4.2 CNN-based feature extraction
    4.2.1 AlexNet
    4.2.2 ResNet-18
  4.3 Clustering
    4.3.1 Hierarchical clustering
    4.3.2 K-means clustering
5 EXPERIMENTS AND RESULTS
  5.1 Data
  5.2 Evaluation criteria
  5.3 Description of experiments
  5.4 Results
    5.4.1 Experiment 1
    5.4.2 Experiment 2
    5.4.3 Experiment 3
6 DISCUSSION
  6.1 Current study
  6.2 Future work
7 CONCLUSION
REFERENCES

APPENDICES

Appendix 1: The purity results of clustering 5 classes with 64 samples starting from 3 clusters to 11 clusters.

Appendix 2: The purity results of clustering 6 classes with 73 samples starting from 3 clusters to 13 clusters.

Appendix 3: The purity results of clustering 26 classes with 100 samples starting from 26 clusters to 100 clusters.


LIST OF ABBREVIATIONS

AE    Auto Encoder
ADIAC Automatic Diatom Identification And Classification
ANN   Artificial Neural Network
CNN   Convolutional Neural Network
CCNN  Clustering Convolutional Neural Network
DNN   Deep Neural Network
fcv   Fold Cross Validation
GAN   Generative Adversarial Network
GLCM  Gray-Level Co-occurrence Matrix
HAC   Hierarchical Agglomerative Clustering
IFCB  Imaging FlowCytobot
LRN   Local Response Normalization
PCA   Principal Component Analysis
RGB   Red Green Blue
SIFT  Scale-Invariant Feature Transform
SVM   Support Vector Machine
UCNN  Unsupervised Convolutional Neural Network
VAE   Variational Autoencoder


1 INTRODUCTION

1.1 Background

Plankton are the huge biomass of very small organisms that dominate aquatic systems. They are among the marine organisms supported by the pelagic environment (pelagic means ’open sea’). These organisms are known to be powerless to control their own location or even to resist the flow of sea currents, and are thus calmly transported in the direction of the current [1]. As a result, the word ’plankton’ today refers to all drifting organisms, which can appear in the form of plants, animals or even bacteria.

Moreover, plankton are the main providers of the ecosystem services delivered through aquatic systems. Aquatic systems are dominated by plankton in the form of phytoplankton (the producers), zooplankton (the consumers) and bacterioplankton (the decomposers) [2].

Phytoplankton, as producers, are a critical part of the food chain, providing food for larger aquatic life forms such as fish, seabirds, penguins, seals and sharks. In addition, phytoplankton affect environmental conditions such as water quality and the oxygen contribution to the atmosphere; they cycle nutrients, process many of the pollutants that humans dispose of through waterways, and help to remove carbon dioxide from the atmosphere [3]. Despite all these roles, plankton are strongly influenced by water quality, because they cannot isolate themselves from it, which may affect the plankton population.

Research on planktonic systems is hampered by the bottleneck of acquiring species-level information about communities consisting of hundreds of microorganism species with generation times on the order of hours. Concealed plankton community dynamics reflect changes in the environmental forcing, the growth traits of competing species and multiple food web interactions. Recent technological advances have led to the emergence of automated imaging instruments with improving resolution and output rates of up to tens of thousands of images per hour. By such means, it is possible to capture enough images of plankton populations to produce real-time big data on plankton communities. This opens new horizons for testing core hypotheses in planktonic systems derived from macroscopic realms: in community ecology, biodiversity research, and ecosystem functioning. However, it is absolutely infeasible for humans to screen through the millions of images collected in order to classify the plankton species as in Figure 1. Big data calls for computer vision and machine learning approaches producing interoperable data across platforms and systems [2].

Research on phytoplankton community structures mainly relies on researchers manually identifying and counting specimens through microscopes, most of which belongs to the non-in-situ category of observation methods. These traditional methods are time-consuming, labour-intensive and require a high level of professional knowledge. On the other hand, computer vision has been one of the leading technologies of the past few years, capable of carrying out human visual identification tasks automatically [4]. Therefore, a computer vision application is appropriate for automating the image classification of plankton species.

Computer vision uses machine learning to classify the input data into classes, where the classifier should figure out how to assign input samples to the correct output classes. In fact, there are four stages that any computer vision system needs in order to be able to classify images: acquire images, preprocess images, extract features and make decisions. For the decision-making step there are several types of classification algorithms that can be used, for example linear classifiers, neural networks, decision trees, support vector machines, nearest neighbor, and random forests. Based on a few features, these classifiers should be able to extract the required information from the input data and classify it into output classes [2].

Figure 1. Example phytoplankton images of different classes [2].
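The four stages named above (acquire, preprocess, extract features, make decisions) can be sketched as a minimal chain. Everything in this sketch is an illustrative assumption: the tiny hand-written "images", the single mean-intensity feature, and the nearest-mean decision rule stand in for a real camera, preprocessing pipeline and classifier.

```python
# Minimal sketch of the four classical computer vision stages:
# acquire -> preprocess -> extract features -> decide.
# All names and the toy nearest-mean "classifier" are illustrative.

def acquire_images():
    # Stand-in for a camera or cytometer: two tiny grayscale "images".
    return [[[0, 0], [0, 0]], [[9, 9], [9, 9]]]

def preprocess(image):
    # Normalize pixel values to the range [0, 1] (max value 9 here).
    return [[p / 9 for p in row] for row in image]

def extract_features(image):
    # A single hand-crafted feature: mean intensity of the image.
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels)]

def decide(features, class_means):
    # Nearest-mean decision rule over the single feature.
    return min(class_means, key=lambda c: abs(class_means[c] - features[0]))

class_means = {"dark": 0.1, "bright": 0.9}
labels = [decide(extract_features(preprocess(img)), class_means)
          for img in acquire_images()]
print(labels)  # -> ['dark', 'bright']
```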


Despite the fact that there are different types of classifiers, as mentioned above, that could be tested in order to get the best results, all these classifiers need supervised data. In other words, all of the classifiers mentioned before depend on the input data being labeled. Labeling the data may be feasible for a few applications, but labeling such a huge amount of data covering thousands of plankton species is laborious and often expensive. As a result, a classifier for detecting and recognizing information from such images should be able to deal with unsupervised data, using unlabeled images as input and clustering them into different classes as output according to the features discovered [5].

Image clustering is the process of identifying groups or clusters in the input data based on similarity measurements or features. There are several algorithms for clustering, for example k-means, agglomerative hierarchical clustering and mean-shift. These are known as traditional clustering methods, since they rely upon predefined separation measurements which are difficult to define for image datasets. Hence, research has been directed towards methods where images can be grouped without any predefined data [6]. This calls for unsupervised learning, and in particular neural networks, more specifically convolutional neural networks (CNNs). Unsupervised learning is a learning technique that uses unlabeled data: the model processes this data to learn an internal representation, or important features, that can be used to discover unknown relationships. Supervised learning, on the other hand, uses labeled data: the model is trained on the labeled data and then predicts the correct output for new samples [7]. A CNN will be used here because it represents the state of the art and because of its capability to extract features from images before clustering: CNNs have proven their ability to produce highly discriminative visual representations of images when trained on a large dataset consisting of millions of images (ImageNet) [7]. In the unsupervised learning case, the extracted features help identify the clusters in the unlabeled data, where clusters are formed based on the common relationships between the features [8].

1.2 Objectives and delimitations

The main objectives of this thesis are the following:

1. To implement and to apply image clustering methods on plankton image data sets.

2. To analyze the performance of the different clustering methods used based on the CNN used for feature extraction.

This thesis deals with phytoplankton, but the employed machine learning approaches are also applicable to other trophic groups of plankton. The classification task is limited to microscopic images collected only from the Baltic Sea.

1.3 Structure of the thesis

Chapter 2 contains an introduction to phytoplankton taxonomy and an explanation of how plankton can be recognized automatically. Chapter 3 introduces deep image clustering and contains an in-depth description of Convolutional Neural Network (CNN) components and how the networks can be trained and used in image clustering. Chapter 4 contains the description of the implemented methods for plankton recognition. In Chapter 5, the experiments and results are presented together with the data used and how the process worked. In Chapter 6 the results are discussed and possible future work is described. Finally, Chapter 7 contains the conclusions of this thesis.


2 PHYTOPLANKTON IMAGE ANALYSIS

2.1 Phytoplankton taxonomy

Taxonomy is the science of identifying and classifying biological organisms according to common features. A group of organisms sharing the same features is called a taxon, which can then be divided into smaller groups or branches according to more specific shared features. The taxonomic idea is based on a grouping process that divides between animals and plants. The grouping process includes domain, kingdom, division, class, order, family, genus and species. So, for example, plankton as a kingdom is split into divisions, each division is split into classes, these classes are divided into smaller branches called orders, and finally each order is split into leaves called species. Species are the lowest hierarchical stage of the taxonomic ranks, which means that these classes cannot be differentiated any further [2]. Phytoplankton are by their characteristics categorized as photosynthetic microorganisms, and they can be found in lakes, rivers, and open seas [9]. It is not totally clear whether the species of phytoplankton are precisely defined, but sometimes they are categorized into more advanced taxonomic levels. Classification can also happen in taxa depending on the phylogeny of species and their behavior in the ecosystem. Typical attributes used to identify plankton taxa include form structure, dimensions and other traits. The taxonomy of the phytoplankton from the Baltic Sea used in [2], illustrating the taxonomic relations of the different species or classes, is shown in Figures 2 and 3.
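The ranks described above form a tree that can be illustrated as a nested mapping from each rank to its subgroups, with species as the leaves. This is only a toy illustration: the rank chain follows the text, but the taxa names ("division_a" and so on) are made-up placeholders, not real plankton taxonomy.

```python
# Toy illustration of taxonomic ranks as a nested mapping
# (kingdom -> division -> class -> order -> species).
# The taxa names are made-up placeholders.

taxonomy = {
    "kingdom": {
        "division_a": {
            "class_a1": {
                "order_a1x": ["species_1", "species_2"],  # leaves of the tree
            },
        },
    },
}

def list_species(node):
    # Recursively collect the leaf lists (species) from any rank downwards.
    if isinstance(node, list):
        return node
    species = []
    for child in node.values():
        species.extend(list_species(child))
    return species

print(list_species(taxonomy))  # -> ['species_1', 'species_2']
```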

2.2 Automatic plankton recognition

Plankton are an essential component of marine ecosystem dynamics and the primary base of the food chain in the ocean. Studying how ecological and physical systems alter the plankton population over scales that can vary from centimeters to kilometers is a basic way to study how marine ecosystems are affected by human activities and climate change. Such studies require a huge, high-resolution and large-scale dataset in order to map plankton abundance, taxonomy and size composition [10]. Furthermore, the implementation of the most common indicators usually needs a specific level of classification that requires time and expert training.

In the literature, [11] reports several works on automatically identifying diatoms. Diatoms are invertebrate organisms that are commonly used to analyse river quality. However, all the reported work on automatic classification of diatoms was based on general state-of-the-art features, which are constrained and maybe not adequate to address this issue. For example, the number of species analyzed was limited by the model capabilities, and the results were relatively poor, which leads to the understanding that as the number of analyzed species increases, the performance decreases. In the ADIAC project (Automatic Diatom Identification And Classification) [12, 13], automated diatom classification was performed on a composed database ranging from 37 to 55 classes of diatom taxa, and 171 features were used for classification. These features were meant to characterize the shape, geometry, diatom symmetry and texture by means of different descriptors such as the Gray-Level Co-occurrence Matrix (GLCM), rectangularity, circularity, shape of poles, the Fourier Transform, the Scale-Invariant Feature Transform (SIFT) and the length-width ratio. The preferable classifiers according to performance were random forest and bagging decision trees, all tested with 10-fold cross validation (fcv). The accuracy obtained was up to 97.7% for 38 classes using the Fourier Transform and SIFT as texture descriptors with a random forest. The classification accuracy decreased to 96.17% as the number of classes increased to 55 using the same descriptors and classifier [14].

Subsequent trials have applied new techniques based on convolutional neural networks. In the tests in [15], using a database consisting of 30000 images belonging to 121 classes, the result was not satisfactory, with a maximum performance of 73.90%. For a database of 30000 images belonging to 33 classes, the accuracy obtained in [16] was 96.3%. Furthermore, one of the best results, presented in [14], showed an accuracy of 98.1% for classifying 80 species with 24000 samples, which is considered a huge dataset. In another study [17] on a dataset consisting of 55 species with 1093 samples, the accuracy was 96.17%, but in a further trial with a smaller number of samples (837) the accuracy increased to 97.97%.

In [11], deep learning CNN techniques were used with a significant database composed of 160,000 samples, of which 11,000 diatom samples were labeled by an expert. There were 69,350 samples available after the first iteration of data augmentation and after the image processing workflow. For the rest of the data, further iterations of the augmentation process were performed using rotations of 2, in order to obtain the total number of samples. The accuracy in one of the experiments reported in [11] ranged between 95.62% and 98.81%, the variation being due to the different numbers of samples used in the experiments.

To summarize, Table 1 presents several different classification techniques using different features for classifying diatom types, ranging between 14 and 80 species, with databases no larger than 30,000 samples; the highest accuracy obtained was 98.1%. In [11], the proposed method applied AlexNet to different kinds of preprocessed datasets, experimenting with the original, segmented, normalized, and combined original and normalized datasets with different numbers of samples per class (300, 700, 1000) for each experiment. The highest accuracy obtained was 99.51%, using the original and normalized dataset with 2000 samples per class.

Table 1. Summary of accuracy results for different numbers of samples, different numbers of species and different classifiers [11].

Year [Ref]  Species  Samples  Num. of Features and Type                                  Classifier                      Accuracy (%)
2002 [12]   37       781      321 geometrical, textural, morphological and frequency     Bagging tree                    96.9
2003 [18]   1        66       10 morphological                                           Multiple Discriminant Analysis  80.3
2012 [17]   38       837      30 morphological and 200 texture                           Random Forest                   97.97
2012 [17]   48       1019     30 morphological and 200 texture                           Random Forest                   97.15
2012 [17]   55       1098     30 morphological and 200 texture                           Random Forest                   97.17
2016 [19]   14       10,000   4 geometrical, 7 moments and 33 morphological              SVM                             94.7
2017 [14]   80       24,000   273 morphological, statistical, textural, space-frequency  Bagging Tree                    98.1
2017 [11]   80       24,000   CNN-AlexNet                                                Softmax                         95.62
2017 [11]   80       160,000  CNN-AlexNet                                                Softmax                         99.51


Figure 2. Taxonomy of the classes in the dataset [2].


Figure 3. Continuation of Figure 2 [2].


3 IMAGE CLUSTERING

3.1 Image clustering methods

Image clustering is a fundamental topic in image processing and computer vision, which attempts to form groups of unlabeled images based on their features. It has been used to solve the problem of organizing huge numbers of images [20]. Nowadays, image clustering is also an important process in pattern recognition and machine learning. Early research on image clustering focused on feature selection, but research later shifted to using extracted features, which requires a CNN pretrained on ImageNet [8]. The CNN's role is to extract the features from the introduced dataset, which are then clustered.

There are several methods to do the clustering for example: k-means clustering and ag- glomerative hierarchical clustering.

K-means is one of the most commonly used clustering algorithms; it operates by identifying groups within the presented data. To find these clusters, it first identifies K centroids from the given dataset, either chosen randomly or extracted from a priori information. It then iteratively assigns each pattern in the dataset to the nearest cluster. The recalculation of the centroids according to the associated patterns is repeated until convergence is reached [21].

The pros of this approach are that it is known as one of the simplest methods to implement and that it is fast. On the other hand, there are cons: the algorithm is data dependent, so the user has to determine the number of clusters beforehand. Furthermore, if the data is strongly inconsistent, the algorithm does not produce the best results. The algorithm also always depends on the initial conditions, which may lead it to converge to suboptimal solutions [22].
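The k-means procedure described above (choose K centroids, assign each pattern to the nearest one, recompute, repeat until convergence) can be sketched on one-dimensional toy data. This is a minimal pure-Python illustration, not the implementation used in the thesis.

```python
import random

# Minimal Lloyd's k-means on 1-D points: pick K initial centroids,
# assign each point to the nearest one, recompute centroids from the
# assigned points, and repeat until the assignments stop changing.

def kmeans(points, k, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    assignment = None
    while True:
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [min(range(k), key=lambda c: abs(p - centroids[c]))
                          for p in points]
        if new_assignment == assignment:  # convergence: no point moved
            return centroids, assignment
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)

points = [1.0, 1.2, 0.8, 10.0, 10.2, 9.8]
centroids, assignment = kmeans(points, k=2)
print(sorted(round(c, 1) for c in centroids))  # -> [1.0, 10.0]
```

Note the dependence on initialization mentioned above: on well-separated data like this the result is stable, but on harder data different seeds can converge to different, suboptimal solutions.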

The hierarchical clustering algorithm is a method that creates a dendrogram, or cluster tree, and it has two known strategies: top-down and bottom-up. The bottom-up strategy, also called hierarchical agglomerative clustering (HAC), generates the cluster tree by initially treating each document as a singleton cluster and then combining pairs of clusters successively until a cluster tree is created [23]. The top-down strategy, also called divisive clustering, initially assigns all patterns to one cluster and then proceeds by dividing this cluster into more clusters at each stage until individual patterns are obtained.


For agglomerative hierarchical clustering there are two popular algorithms. The first is the single link algorithm, which merges the clusters with the smallest distance between their closest patterns. The second is the complete link algorithm, deemed more efficient, which merges the clusters with the smallest distance between their most distant patterns. There is also a less popular agglomerative hierarchical algorithm called the centroid method, which merges the clusters with the smallest distance between their centroids [22].

The problems of the HAC algorithm are that it is computationally expensive, that it is static in the sense that the algorithm can never undo what has been done (i.e. assigned patterns cannot be moved to another cluster), and that it may not be efficient at separating overlapping clusters. On the other hand, the advantage of this algorithm is that there is no need to specify the number of clusters a priori [22].
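The bottom-up single-link strategy can likewise be sketched on toy one-dimensional data: start from singleton clusters and repeatedly merge the pair whose closest members are nearest. Again, this is an illustrative pure-Python sketch, not the thesis implementation (which, per the description, would stop at a chosen cut of the dendrogram).

```python
# Minimal agglomerative (bottom-up) clustering with single linkage:
# every point starts as its own cluster; at each step the two clusters
# whose closest members are nearest are merged, until the requested
# number of clusters remains.

def single_link(points, n_clusters):
    clusters = [[p] for p in points]  # singleton clusters at the outset
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single-link distance: closest pair across the two clusters.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters

clusters = single_link([0.0, 0.5, 0.4, 8.0, 8.3], n_clusters=2)
print(sorted(sorted(c) for c in clusters))  # -> [[0.0, 0.4, 0.5], [8.0, 8.3]]
```

The nested pairwise loop also makes the quadratic cost noted above visible: every merge step compares all remaining cluster pairs.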

3.2 Convolutional neural networks

CNNs are a class of deep learning neural networks that have brought huge improvements in visual recognition and classification. The architecture of CNNs is motivated by the human vision system, and the various sorts of layers can be seen as simplified models of different cells found in the visual cortex. CNNs play a huge role in pattern recognition. The default architecture of a CNN usually consists of consecutive layers whose purpose is to represent the steps that the human vision system (the brain) takes to analyze an image. The architecture consists of an input layer, hidden layers and finally the output layer. The hidden layers can consist of convolutional layers, pooling layers, activation layers and fully connected layers. Usually the hidden layers of a CNN begin with a convolutional layer, where feature extraction from the input image takes place, and the last layer is a fully connected layer [2]. In general, the architecture of the hidden layers may differ from one application to another, depending on the needs of feature extraction from the input. For example, a CNN structure for MNIST classification is shown in Figure 4.

The input layer receives the pixels of each image as input. The output of the convolutional layer then depends on the scalar product of a local region of the input image and the convolution filter [24]. Several filters are applied in the convolutional process; these filters can perform different operations such as sharpening and edge detection. As a result, neurons, which are interconnected computational nodes [24], can extract features such as oriented edges, end-points and corners. The output of the convolutional layer constitutes a feature map [25]. Accordingly, the pooling layer downsamples the output from the convolutional layer. After that, the fully connected layer classifies the given output.

Figure 4. Overall architecture of a CNN, including an input layer, multiple alternating convolution and max pooling layers, one fully-connected layer and a classification layer [7].

3.2.1 Convolution Layer

Convolution layers are one of the main building blocks used in CNNs. A convolution layer operates on the output from the previous layer using learnable kernels. These kernels usually have a smaller dimensionality than the input, and each slides along the input image from the top left to the bottom right as in Figure 6 [7]. Each kernel or filter has its own weights; the scalar product of the kernel and the covered part of the input image is calculated, and this is how the feature maps are created. They can be visualized as in Figure 5 [24].
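The sliding scalar product described above can be sketched directly: a 2x2 kernel moves over a 3x3 input with stride 1 and no padding, and each position yields one entry of a 2x2 feature map. The input values and the diagonal-difference kernel are illustrative choices.

```python
# Sketch of the sliding-window operation of a convolution layer: the
# kernel moves from top left to bottom right, and the scalar product of
# the kernel and the covered patch forms one feature-map entry.

def convolve2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1      # stride 1, no padding
    out_w = len(image[0]) - kw + 1
    feature_map = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Scalar product of the kernel and the covered image patch.
            feature_map[i][j] = sum(
                kernel[u][v] * image[i + u][j + v]
                for u in range(kh) for v in range(kw))
    return feature_map

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]  # a simple diagonal-difference filter (illustrative)
print(convolve2d(image, kernel))  # -> [[-4, -4], [-4, -4]]
```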

Furthermore, convolution layers have the ability to optimize the output in order to reduce the complexity of the model using three hyperparameters. The first hyperparameter is the depth: the depth of the output volume generated by the convolution layer influences the number of neurons in a layer that connect to the same input region. Being able to control the depth is an advantage, but reducing it minimizes the number of neurons and reduces the efficiency of the model in pattern recognition.

Figure 5. Visualization of the result of the first convolution layer, showing how successfully it picked up the features of each digit [24].

Figure 6. Example of how the convolution layer operates when the kernel is placed on the input vector at the top left corner [24].

The second hyperparameter is the stride, which sets how the receptive field is placed along the spatial dimensions. The higher the value of the stride, the less overlapping and the lower the spatial dimensionality of the output. On the other hand, if the stride value is, for example, one, the filter moves one pixel at a time, resulting in incredibly large activations produced by strongly overlapping receptive fields. The third hyperparameter is zero-padding, which is the basic method of padding the input border and controlling the spatial dimensionality of the output volume [24].

The spatial dimensionality of the convolution layer output can be computed as [24]

(V − R + 2Z) / S + 1,   (1)

where V represents the input volume size, R the receptive field size, Z the amount of zero padding and S the stride. If the result of this equation is not an integer, the stride is wrong, as the neurons cannot be tiled symmetrically to match the given input.
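Equation (1) is easy to check numerically. The sketch below evaluates it and rejects configurations whose result is not an integer; the worked example uses the standard first-layer configuration of AlexNet (227x227 input, 11x11 kernels, stride 4, no padding).

```python
# Equation (1) as a function: output spatial size of a convolution layer
# with input size V, receptive field R, zero padding Z and stride S.

def conv_output_size(V, R, Z, S):
    size = (V - R + 2 * Z) / S + 1
    # A non-integer result means the stride does not tile the input.
    if size != int(size):
        raise ValueError("neurons cannot be tiled symmetrically")
    return int(size)

# AlexNet conv1: 227x227 input, 11x11 kernels, no padding, stride 4.
print(conv_output_size(V=227, R=11, Z=0, S=4))  # -> 55
```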


3.2.2 Pooling Layer

The purpose of the pooling layer is to trim the dimensionality of the feature maps (the representation) and to reduce the amount of parameters and computation in the network. The pooling layer usually operates on the output of the activation function layer using the “MAX” function in order to create a new feature map with a lower dimension. Usually, the filter size of the pooling operation is 2x2, applied with a stride of 2 along the feature map. This means that the pooling layer reduces the feature map down to 25% of its original size. For example, if the feature map is 6x6, i.e. 36 pixels, then applying the pooling filter scales it down to 9 pixels, i.e. 3x3 [24].

There are two general methods of max-pooling. The first is as mentioned above, where the stride is 2 and the filter is 2x2. The second method is called overlapping pooling, where the stride is set to 2 but the filter size is 3x3. Furthermore, a general pooling layer may be included in the CNN depending on the architecture. In a general pooling layer there are a couple of operations that can be performed, such as L1/L2 normalization and average pooling, which calculates the average value of each patch on the feature map [24].
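The non-overlapping 2x2, stride-2 max pooling described above can be sketched as follows; each 2x2 patch of a 4x4 toy feature map is replaced by its maximum, shrinking the map to 2x2 (25% of the original number of values). The numbers are illustrative.

```python
# 2x2 max pooling with stride 2: each non-overlapping 2x2 patch of the
# feature map is replaced by its maximum value.

def max_pool(feature_map, size=2, stride=2):
    out = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            patch = [feature_map[i + u][j + v]
                     for u in range(size) for v in range(size)]
            row.append(max(patch))
        out.append(row)
    return out

fmap = [[1, 3, 2, 0],
        [4, 6, 1, 2],
        [0, 1, 9, 8],
        [2, 3, 7, 5]]
print(max_pool(fmap))  # -> [[6, 2], [3, 9]]
```

Setting `size=3` with `stride=2` would give the overlapping-pooling variant mentioned above.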

3.2.3 Fully-connected Layer

A fully connected layer is known as a feedforward network: the neurons in the fully connected layer are directly connected to the two adjacent layers, without being connected to any other layer between them. This arrangement of the neurons is the classical form of Artificial Neural Networks (ANNs), as in Figure 7 [24]. In a CNN, the input of the fully connected layer is the output from the final pooling or convolutional layer, as in Figure 4.

3.3 Deep image clustering

Deep clustering is clustering with deep learning; the concept behind it was introduced in [26]. The main idea of clustering is separating the data according to shared data features, considering that the performance of clustering methods is highly dependent on the input.

Usually, the clustering is applied to different dataset accordingly the similarity measures and separation techniques required are different. Therefore, representation learning and dimensionality reduction have been used parallelly with clustering in order to create a

(21)

Figure 7.Feedforward neural network [24].

feature space where is would be easier to separate the input. Thanks for the deep neural network (DNN) made it easier to learn non-linear mappings without manually extracting features [27]. Deep clustering consists of DNN, network loss and clustering loss where these components aim to learn low dimensional non-linear data representations from the input. The loss function of deep clustering algorithms is a combination of unsupervised representation learning loss referred to asLnand clustering loss referred to asLc. They are formatted as

L = λLn + (1 − λ)Lc, (2)

where λ is a hyperparameter between 0 and 1 that balances the impact of the two loss functions. The network loss is used to initialize the deep neural network. It can be the reconstruction loss of an autoencoder (AE), one of the most powerful methods for training a mapping function that ensures the lowest reconstruction error between the data layer and the code layer. It can also be the variational loss of a variational autoencoder (VAE) or the adversarial loss of a generative adversarial network (GAN) [28]. Usually, the clustering loss is introduced by changing the λ hyperparameter after several epochs. Clustering loss functions are of two kinds: principal clustering loss (cluster assignments) and auxiliary clustering loss (cluster regularization). A principal clustering loss provides cluster assignments and cluster centroids directly from the data, such as the k-means loss, cluster assignment hardening loss, and agglomerative clustering loss. An auxiliary clustering loss instead forces the network to preserve suitable discriminant information from the data for clustering, such as the locality-preserving loss and group sparsity loss [28].
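The combined objective in Equation 2 can be sketched in a few lines of Python. Here the network loss Ln and clustering loss Lc are placeholder values, and the λ schedule, introducing the clustering loss only after several warm-up epochs, is a hypothetical illustration rather than the exact schedule of any particular method.

```python
def deep_clustering_loss(l_net, l_clust, lam):
    """Combined loss L = lambda * Ln + (1 - lambda) * Lc (Equation 2)."""
    assert 0.0 <= lam <= 1.0, "lambda must balance the two losses"
    return lam * l_net + (1.0 - lam) * l_clust

# Hypothetical schedule: only the network (e.g. autoencoder reconstruction)
# loss counts for the first epochs, then the clustering loss is phased in.
def lambda_schedule(epoch, warmup_epochs=10):
    return 1.0 if epoch < warmup_epochs else 0.5

print(deep_clustering_loss(0.8, 0.3, lambda_schedule(0)))   # 0.8
print(deep_clustering_loss(0.8, 0.3, lambda_schedule(20)))  # 0.55
```

With λ = 1 the objective reduces to pure representation learning; lowering λ lets the clustering loss reshape the learned feature space.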


4 PLANKTON IMAGE CLUSTERING

4.1 Clustering pipeline

A general flowchart of the pipeline for the evaluation of image clustering of the plankton dataset based on CNNs is presented in Figure 8. First, the images are preprocessed. Second, a CNN is applied to the preprocessed images to extract features from the data. Third, the extracted features are clustered.

Figure 8. Flowchart of the pipeline for the evaluation of image clustering of the plankton dataset based on CNNs.

The output of this pipeline is then analysed by studying the samples inside each cluster and comparing the results of the different clustering methods, based on the features extracted by different CNNs, with the original dataset.

4.1.1 Preprocessing

As mentioned in the pipeline of this work, in order to use any CNN, either for classification or for feature extraction, the dataset must first be preprocessed according to the required input size of the CNN to be used. Looking through the plankton dataset, there were two obstacles from the CNN point of view. First, all the images are one-channel, while most CNNs require three-channel (RGB) images. Second, all the images need to be resized to the appropriate input size of the CNN to be applied, which differs from one CNN to another: for example, the AlexNet input size is 227x227x3 and the ResNet-18 input size is 224x224x3.
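The two preprocessing steps can be sketched with numpy alone: replicating the single grayscale channel into three identical channels, and a simple nearest-neighbour resize to the network's input size. In practice a library resize (e.g. from an image-processing toolbox) would be used; this sketch only illustrates the shape transformations, and the raw image size is a stand-in.

```python
import numpy as np

def gray_to_rgb(img):
    """Replicate a (H, W) grayscale image into a (H, W, 3) RGB image."""
    return np.stack([img, img, img], axis=-1)

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a (H, W, C) image."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return img[rows][:, cols]

# A raw IFCB image is one channel and of arbitrary size (here 120x300);
# AlexNet expects 227x227x3 and ResNet-18 expects 224x224x3.
raw = np.random.randint(0, 256, size=(120, 300), dtype=np.uint8)
for size in (227, 224):
    prepared = resize_nearest(gray_to_rgb(raw), size, size)
    print(prepared.shape)  # (227, 227, 3) then (224, 224, 3)
```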

4.1.2 Feature Extraction

Feature extraction involves effectively decreasing the amount of information that must be processed while still precisely and completely describing the original dataset. In deep neural networks, the features are learned automatically and represented hierarchically at multiple levels. In this thesis, AlexNet [7] and ResNet-18 [29] are applied to the plankton dataset to extract features from the plankton images after resizing the images to the CNN's input size.
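As an illustration of how a fixed-length feature vector can arise from convolutional activations, the sketch below applies global average pooling, the operation performed by ResNet-18's final pooling layer, to a stack of feature maps. The maps here are random stand-ins, not real network activations, and the 512x7x7 shape assumes ResNet-18's last convolutional stage for a 224x224 input.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Collapse (C, H, W) feature maps into a C-dimensional feature vector
    by averaging each map over its spatial dimensions."""
    return feature_maps.mean(axis=(1, 2))

# Stand-in activations: 512 feature maps of size 7x7.
maps = np.random.rand(512, 7, 7)
features = global_average_pool(maps)
print(features.shape)  # (512,)
```

The resulting vector is what is passed on to the clustering stage in place of the raw image.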

4.1.3 Clustering

There are several methods that can be used for clustering the extracted features. In this work, the hierarchical algorithm was used, as it creates a cluster tree by forming small clusters and merging them into larger ones. After studying and evaluating these clusters, the resulting tree can be compared with the taxonomy to check the performance of the method. The k-means method is also used, as it identifies groups within the presented data, and its results are used to measure the performance of the hierarchical method on the extracted features [22].

4.2 CNN-based feature extraction

There are several well-known CNNs for classification and feature extraction. Generally speaking, a CNN architecture consists of two main parts: a feature extractor and a classifier, as shown in Figure 4. The nodes of a convolution layer extract features from the input data through convolution operations on the input nodes. In this thesis, the CNNs used for feature extraction are AlexNet [7] and ResNet-18 [29].


4.2.1 AlexNet

AlexNet was inspired by LeNet [30] and can be considered an update of LeNet, as it has more filters per layer and stacked convolution layers. AlexNet is a deep neural network model that consists of 8 layers: the first five are convolutional layers and the remaining three are fully-connected layers, as illustrated in Figure 9.

Figure 9. Architecture of AlexNet [7].

According to the structure of this network, the input layer accepts only 227x227x3 images, where 3 is the number of color channels. The output of this layer is the input of the first convolutional layer, which performs convolution and max-pooling with Local Response Normalization (LRN), using 96 receptive filters of size 11x11 [31]. The filters used in the max-pooling operation of the first convolutional layer are 3x3 in size with a stride of 2; the same pooling filter size is also used after the fifth convolutional layer. The third, fourth and fifth convolutional layers have 384, 384 and 256 feature maps, respectively.

In the second convolutional layer the same sequence of operations takes place as in the first layer, but with convolution filters of size 5x5. Next are two fully-connected layers, where the feature representation of the data happens, followed by a soft-max layer as the last layer [7].

In order to make sure that all the data is accepted by the CNN, the whole dataset is augmented. Augmentation is the simplest and most common way to resize, rotate and reflect the data using label-preserving transformations [31].


4.2.2 Resnet-18

ResNet is a deep neural network model that won first place in the ILSVRC 2015 classification competition with a top-5 error rate of 3.57% [29] and became one of the most popular neural networks for computer vision tasks. The goal of the residual network was to create an ultra-deep network that can overcome the vanishing gradient problem of its predecessors by skipping over layers. It was developed by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun [29] in different variants, where each variant has a different number of layers; for example, there are ResNet-18, ResNet-50 and ResNet-101. In this thesis, ResNet-18 was used for feature extraction from the image dataset. It consists of 18 layers, the input size is 224x224x3, and the total number of parameters of this model is 11.7M [7].

ResNet is a feedforward network with residual layers. In a regular neural network, each layer feeds only into the next layer. With residual blocks, on the other hand, each layer feeds into the next layer and also into layers two to three hops away, as shown in Figures 10 and 11. This feature is known as an "identity shortcut" and is applied directly when the input and output dimensions are the same. The final output y of the residual block can be represented mathematically [29] as in

y = F(x) + x, (3)

where F(x) represents the output of the residual branch, in which a couple of operations can be performed, for example a convolution followed by batch normalization and an activation function, applied to x, which represents the output of the previous layer. Note that these operations differ from one network to another.

Figure 10. Residual block [29].


Figure 11. Architecture of ResNet-18 [32].

However, in case the dimensions of F and x are not the same, the shortcut still operates, with a linear projection Ws applied to x to match the dimensions [29], as in

y = F(x) + Ws x. (4)
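Equations 3 and 4 can be illustrated numerically. In this toy sketch, F(x) is represented by two small fully-connected transformations with a ReLU, rather than the convolution and batch-normalization operations used in the real network, so that only the skip connection itself is shown.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def residual_block(x, w1, w2, ws=None):
    """y = F(x) + x (Eq. 3); with a projection, y = F(x) + Ws x (Eq. 4).
    F is a toy two-layer transformation standing in for conv + BN + ReLU."""
    fx = w2 @ relu(w1 @ x)
    shortcut = x if ws is None else ws @ x   # identity or projection shortcut
    return fx + shortcut

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w1 = rng.standard_normal((8, 8))
w2 = rng.standard_normal((8, 8))

# With zero weights F(x) = 0, so the block reduces to the identity mapping:
assert np.allclose(residual_block(x, np.zeros((8, 8)), np.zeros((8, 8))), x)
print(residual_block(x, w1, w2).shape)  # (8,)
```

The identity check above is exactly why residual learning eases optimization: the block only has to learn the deviation F(x) from the identity.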

4.3 Clustering

4.3.1 Hierarchical clustering

Agglomerative hierarchical clustering is commonly referred to simply as hierarchical clustering. This algorithm initially considers each data point as an individual cluster; then, at each iteration, the two nearest clusters are merged until the required number of clusters is formed, as shown in Figure 12. In other words, the hierarchical clustering algorithm starts with many small clusters and keeps merging them until the required number of clusters is reached [5].
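The merge loop described above can be sketched in plain Python. This toy version uses single-linkage distances on one-dimensional points and stops when the requested number of clusters remains; in practice a library implementation of the linkage computation would be used.

```python
def agglomerative(points, k):
    """Merge the two nearest clusters until k clusters remain (single linkage)."""
    clusters = [[p] for p in points]          # each point starts as its own cluster
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single-linkage distance: closest pair between the two clusters
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the nearest pair
        del clusters[j]
    return clusters

# Two well-separated groups of 1-D points collapse into two clusters.
print(sorted(sorted(c) for c in agglomerative([1.0, 1.2, 1.1, 8.0, 8.3], 2)))
```

Recording the sequence of merges, instead of stopping at k clusters, yields the cluster tree (dendrogram) that can later be compared with the taxonomy.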

4.3.2 K-means clustering

K-means clustering is referred to as centroid-based clustering [20]. It aims to find groups in the data, with the number of groups specified by the variable K. The algorithm randomly chooses k initial cluster centroids from the input, then assigns each sample to its closest cluster centroid, and finally recalculates the centroids; the last two steps are repeated until convergence is achieved, as shown in Figure 13. In simpler words, k-means minimizes the sum of squared errors between the data points and their nearest cluster centers [5].
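The three steps above (random initialization, assignment to the nearest centroid, centroid update until convergence) can be sketched with numpy as follows. This is a bare illustration of Lloyd's algorithm, not the exact implementation used in the experiments.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal k-means: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial centroids
    for _ in range(iters):
        # assign each sample to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each centroid as the mean of its assigned samples
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):   # convergence reached
            break
        centroids = new
    return labels, centroids

# Two obvious 2-D groups are separated into two clusters.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels, centroids = kmeans(X, 2)
print(labels)
```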


Figure 12. Agglomerative hierarchical clustering algorithm (bottom-to-top), which starts with many small clusters and keeps merging them, forming fewer and larger clusters [33].

Figure 13. Changes in cluster representative locations (indicated by 'x' signs) and data assignments (indicated by color) during an execution of the k-means algorithm [34], where (a) is the original dataset, (b) shows random initial cluster centroids, and (c-f) show two iterations of k-means, in each of which every training example is assigned to the nearest cluster centroid.


5 EXPERIMENTS AND RESULTS

5.1 Data

The data used for the experiments has been captured using Imaging FlowCytobot (IFCB).

The Imaging FlowCytobot is an advanced version of the imaging flow cytometer, specially developed to be able to sample plankton in the size range of 10 to 100 µm.

The IFCB gathers data using a combination of video and flow cytometric technology; using this combination, it can identify organisms from the captured images and measure the chlorophyll fluorescence associated with each image. In addition, the IFCB has photomultiplier tubes that are used to decide whether a particle will be imaged or not, as in Figure 14, based on the light scatter and fluorescence observed from the particles hit by the device's laser [2].

Figure 14. Phytoplankton images captured based on the light scatter and fluorescence detected [2].

The dataset used was collected by Kraft et al. from the Marine Research Centre of the Finnish Environment Institute and is related to the FastVision project [35]. The data were collected at different times: the first collection took place during autumn 2016 from the Algaline ferrybox systems of M/S Finnmaid and Silja Serenade.

The second data collection happened from spring 2017 until summer 2017 at the Utö Atmospheric and Marine Research Station. The data were collected as grayscale images, and the sizes of the images vary: the vertical axis of an image can range from 21 to 770 pixels and the horizontal axis from 52 to 1359 pixels. The number of images captured per class is not uniform, so there are classes with a thousand samples and classes with just one sample. The class names and the number of samples per class are given in Table 2.

5.2 Evaluation criteria

The main objective of the evaluation criteria is to find the best clustered dataset. There are several methods that can be used to compare the results of a clustering method; the idea involves comparing the results with reference data. So, in order to carry out the evaluation process, there should be reference results (for example, expert-annotated ground truth) or the results of another method to which the clustering method can be compared. In this thesis, the reference that the results are compared with is the biological taxonomy of the plankton defined by experts.

The performance of clustering is measured by purity. Purity is one of the simplest and most transparent evaluation measures. The idea of the purity method is to assign each cluster wk to the class cj that occupies most of the samples inside that cluster. The accuracy is then measured by taking, for each cluster, the number of samples of the class with the highest presence inside the cluster, and dividing the total number of properly assigned samples by N as

purity(Ω, C) = (1/N) Σk maxj |wk ∩ cj|, (5)

where N represents the total number of samples, k is the cluster index and j is the class index.

For example, in Figure 15 there are three clusters and three classes, where Class 1 is denoted by 'x', Class 2 by 'o' and Class 3 by '/.'. The first cluster contains five samples of Class 1 and one sample of Class 2, which means that Cluster 1 is assigned to Class 1. The second cluster has four samples of Class 2, one sample of Class 1 and one sample of Class 3, which means Cluster 2 is assigned to Class 2. The third cluster has two samples of Class 1 and three samples of Class 3, which means


Table 2. Classes and the number of samples per class in the dataset.

class samples class samples

Akinete 14 Heterocapsa rotundata 164

Amoeba 1 Heterocapsa triquetra 1757

Apedinella radians 3 Heterocyte 162

Aphanizomenon flosaquae 585 Katablepharis remigera 83

Aphanizomenon sp 3 Melosira arctica 24

Aphanothece paralleliformis 29 Merismopedia sp 141

Beads 99 Mesodinium rubrum 64

Binuclearia lauterbornii 14 Mesodinium rubrum shrunken 73

Centrales sp 111 Microsporales 159

Ceratoneis closterium 34 Monoraphidium contortum 19

Chaetoceros danicus 1 Nanoplankton 1657

Chaetoceros resting stage 8 Nitzschia paleacea 4

Chaetoceros similis 9 Nodularia spumigena 79

Chaetoceros sp 819 Nodularia spumigena heterocyte 4

Chaetoceros subtilis 2 Nostocales 3

Chaetoceros throndsenii 12 Gymnodinium sp 2

Chlorococcales 73 Oocystis sp 474

Chroococcales 106 Oscillatoriales 15

Chroococcus small 1474 Pauliella taeniata 13

Ciliata 26 Pauliella taeniata resting stage 7

Ciliata strawberry 23 Pennales 1

Cluster A 72 Pennales sp basic 5

Coscinodiscus granii 2 Pennales sp boxy 87

Cryptomonadales 303 Pennales sp curvy 22

Cyclotella choctawhatcheeana 57 Peridiniella catenata chain 37

Cyst like 145 Peridiniella catenata single 13

Dino small funny shaped 978 Prorocentrum cordatum 231

Dinophyceae over 20 2 Pseudopedinella sp 89

Dinophyceae under 20 142 Pyramimonas sp 393

Dinophysis acuminata 18 Rotifera 2

Dinophysis sp 2 Scenedesmus sp 13

Dolichospermum Anabaenopsis 444 Skeletonema marinoi 810

Eutreptiella sp 41 Snowella Woronichinia sp dense 2385

Flagellates 205 Snowella Woronichinia sp loose 1360

Gymnodiniales 25 Teleaulax sp 122

Gymnodinium like 33 Thalassiosira baltica 141

Cluster 3 will be assigned to Class 3. Summing up all the samples gives N = 17. Then, for each cluster wk, the maximum intersection with the set of classes C is taken. Since in this example Cluster 1 was assigned to Class 1, maxj |wk ∩ cj| for Cluster 1 is 5; Cluster 2 was assigned to Class 2, so maxj |wk ∩ cj| there is 4; and Cluster 3 was assigned to Class 3, which means that


Figure 15. Purity example to illustrate cluster quality [23].

maxj |wk ∩ cj| here is 3. Summing 5 + 4 + 3 and dividing by 17 gives a purity of 0.71.

The purity value is always between 0 and 1: values close to 0 indicate poor clustering, and a purity of 1 means that the clustering is perfect [23].
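Equation 5 and the worked example of Figure 15 can be reproduced directly in Python: each cluster contributes the count of its most frequent class, and the sum is divided by the total number of samples.

```python
from collections import Counter

def purity(cluster_labels, class_labels):
    """purity(Omega, C) = (1/N) * sum_k max_j |w_k intersect c_j| (Equation 5)."""
    n = len(class_labels)
    clusters = {}
    for w, c in zip(cluster_labels, class_labels):
        clusters.setdefault(w, []).append(c)
    # each cluster is credited with its majority class only
    return sum(max(Counter(members).values()) for members in clusters.values()) / n

# The example of Figure 15: cluster 1 = 5x + 1o, cluster 2 = 1x + 4o + 1/,
# cluster 3 = 2x + 3/ gives (5 + 4 + 3) / 17 = 0.71.
cluster_ids = [1] * 6 + [2] * 6 + [3] * 5
class_ids = ["x"] * 5 + ["o"] + ["x"] + ["o"] * 4 + ["/"] + ["x"] * 2 + ["/"] * 3
print(round(purity(cluster_ids, class_ids), 2))  # 0.71
```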

5.3 Description of experiments

The dataset used consists of 16561 plankton images, and the aim is to implement and apply image clustering methods on this dataset and then analyze the clusters according to the biological taxonomy of the plankton. Hence, the following steps were carried out. First, the images are converted from grayscale (1 channel) to RGB (3 channels), as all samples captured using the IFCB are one-channel images. Second, the images are resized using augmentation according to the appropriate input size of the CNN used: for AlexNet, the required input size is 227x227x3, and for ResNet-18 it is 224x224x3. Third, the feature representation of the dataset is obtained using the "activations" function on a fully-connected layer for AlexNet or on the global pooling layer for ResNet-18. Fourth, the clustering is performed; the two clustering methods used, hierarchical and k-means, cluster the extracted features into the assigned number of clusters. The final step is the evaluation using the purity method, which measures the level of purity on a scale from 0 to 1, where 1 means perfectly pure clustering and 0 is the lowest purity level.

These evaluation results are used to compare the behavior of the different clustering methods and the different neural networks used.

Following the above steps, three experiments were carried out. Experiment 1 used AlexNet and ResNet-18 as the CNNs for feature extraction, and the hierarchical and k-means methods for clustering. Firstly, 5 classes with 64 random samples per class were selected, and AlexNet was applied to the selected dataset for feature extraction; the output was clustered using the hierarchical method first and then the k-means method. Secondly, the same data was used for feature extraction with ResNet-18, for a fair comparison, and the output features were again clustered with the hierarchical and k-means methods. The experiment was repeated several times for each clustering method under each CNN, changing the number of clusters every time, starting with 3 clusters and going up to 11 clusters. The purity results for every number of clusters, for each algorithm and each CNN, were then visualized in a plot.

In Experiment 2, the number of classes was increased to 6 classes, 73 samples were chosen randomly from every class, and the range of clusters was changed to 3 to 13. The same steps were followed for extracting the features using AlexNet and ResNet-18; the output of both networks was clustered using both the hierarchical and k-means algorithms. The results of both clustering methods were evaluated with the purity method.

In Experiment 3, the number of classes was increased to 26 classes and 100 samples were chosen randomly per class. As the number of samples increased, the clustering range was also increased, starting at 26 clusters and going up to 100 clusters. The same steps were followed for extracting the features using AlexNet and ResNet-18; the output of both networks was clustered using both the hierarchical and k-means algorithms, and the results of both clustering methods were evaluated.

5.4 Results

5.4.1 Experiment 1

In Experiment 1, the clustering results were visualized in a 3D plot after reducing the dimensionality of the extracted features with PCA, as shown in Figure 16; in order to visualize the results, the clusters were given different colors and the classes different symbols.
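The dimensionality reduction used for the 3-D visualization can be sketched as a plain PCA via numpy's SVD: the high-dimensional CNN features are projected onto the first three principal components before plotting. The feature dimension and the random data below are stand-ins for the actual extracted features.

```python
import numpy as np

def pca_project(X, n_components=3):
    """Project samples (rows of X) onto the first n principal components."""
    Xc = X - X.mean(axis=0)                 # center the features
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T        # coordinates in the reduced space

# Stand-in for CNN features: 320 samples with 512-dimensional features,
# matching the 5 classes x 64 samples setting of Experiment 1.
rng = np.random.default_rng(1)
features = rng.standard_normal((320, 512))
coords3d = pca_project(features)
print(coords3d.shape)  # (320, 3)
```

Each row of `coords3d` is one sample's position in the 3-D plot; the cluster label then determines its color and the class label its symbol.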

The clusters formed were then studied by counting the number of samples of each class inside each cluster and calculating the percentage of each class inside each cluster with respect to the total number of samples in the cluster, as shown in Tables 3, 4, 5 and 6.

(a) (b)

(c) (d)

Figure 16. (a) 3D plot of the clustering result using AlexNet and the hierarchical method. (b) 3D plot of the clustering result using AlexNet and the k-means method. (c) 3D plot of the clustering result using ResNet-18 and the hierarchical method. (d) 3D plot of the clustering result using ResNet-18 and the k-means method.

The final step was calculating the purity level of each clustering method based on the features extracted by the two CNNs. The clustering for this step was repeated for different numbers of clusters, within the range of 3 to 11 clusters, to compare the purity level with the number of clusters, as illustrated in Appendix 1. The results of this step are visualized in Figure 17, which shows that the k-means method has the highest purity level based on the features extracted by the ResNet-18 CNN, followed by the hierarchical method, which showed a better purity level up to 6 clusters; as the number of clusters increased, the purity level of the hierarchical method dropped and k-means took over.

5.4.2 Experiment 2

The purity results are given in Appendix 2 and visualized in Figure 18 for checking the performance of each method. Comparing the purity results of this experiment with the previous one, it is noticeable that the purity increased as


Table 3. Table shows the number samples of each class inside each cluster and the percentage of each class inside each cluster in accordance to the total amount of samples per the clusters for hierarchical method based on features extracted from AlexNet.

Cluster1 Cluster2 Cluster3 Cluster4 Cluster5 # of samples
Katablepharis remigera 0 (0%) 0 (0%) 0 (0%) 64 (34.972%) 0 (0%) 64
Mesodinium rubrum 0 (0%) 0 (0%) 0 (0%) 38 (20.76%) 26 (78.78%) 64
Nodularia spumigena 7 (100%) 37 (100%) 0 (0%) 13 (7.1%) 7 (21.212%) 64
Pennales sp boxy 0 (0%) 0 (0%) 60 (100%) 4 (2.185%) 0 (0%) 64
Pseudopedinella sp 0 (0%) 0 (0%) 0 (0%) 64 (34.97%) 0 (0%) 64

# of samples/cluster 7 37 60 183 33 320

Table 4. Table shows the number samples of each class inside each cluster and the percentage of each class inside each cluster in accordance to the total amount of samples per the clusters for k-means method based on features extracted from AlexNet.

Cluster1 Cluster2 Cluster3 Cluster4 Cluster5 # of samples
Katablepharis remigera 64 (37.86%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 64
Mesodinium rubrum 36 (21.3%) 0 (0%) 0 (0%) 28 (80%) 0 (0%) 64
Nodularia spumigena 5 (2.9%) 9 (27.27%) 36 (100%) 7 (20%) 7 (14.89%) 64
Pennales sp boxy 0 (0%) 24 (72.72%) 0 (0%) 0 (0%) 40 (85.11%) 64
Pseudopedinella sp 64 (37.87%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 64

# of samples/cluster 169 33 36 35 47 320

the number of clusters increased. Also, k-means based on the features extracted by the ResNet-18 CNN still shows better purity compared with the hierarchical method based on ResNet-18.

5.4.3 Experiment 3

In this experiment, due to the increased number of samples and number of clusters, as illustrated in Appendix 3, the purity level decreased: previously the highest purity level was around 0.86, whereas here the highest purity level is 0.63. Also, as shown in Figure 19, the hierarchical method based on ResNet-18 took over k-means.


Table 5. Table shows the number samples of each class inside each cluster and the percentage of each class inside each cluster in accordance to the total amount of samples per the clusters for hierarchical method based on features extracted from ResNet-18.

Cluster1 Cluster2 Cluster3 Cluster4 Cluster5 # of samples
Katablepharis remigera 0 (0%) 0 (0%) 1 (1.61%) 63 (51.22%) 0 (0%) 64
Mesodinium rubrum 0 (0%) 0 (0%) 0 (0%) 2 (1.63%) 62 (81.57%) 64
Nodularia spumigena 15 (100%) 37 (84.1%) 0 (0%) 0 (0%) 12 (15.79%) 64
Pennales sp boxy 0 (0%) 7 (15.9%) 56 (90.32%) 1 (0.81%) 0 (0%) 64
Pseudopedinella sp 0 (0%) 0 (0%) 5 (8.06%) 57 (46.34%) 2 (2.63%) 64

# of samples/cluster 15 44 63 123 75 320

Table 6. Table shows the number samples of each class inside each cluster and the percentage of each class inside each cluster in accordance to the total amount of samples per the clusters for k-means method based on features extracted from ResNet-18.

Cluster1 Cluster2 Cluster3 Cluster4 Cluster5 # of samples
Katablepharis remigera 0 (0%) 0 (0%) 64 (46.71%) 0 (0%) 0 (0%) 64
Mesodinium rubrum 13 (86.66%) 0 (0%) 8 (5.84%) 43 (100%) 0 (0%) 64
Nodularia spumigena 2 (13.33%) 34 (100%) 0 (0%) 0 (0%) 28 (30.77%) 64
Pennales sp boxy 0 (0%) 0 (0%) 1 (0.73%) 0 (0%) 63 (69.2%) 64
Pseudopedinella sp 0 (0%) 0 (0%) 64 (46.71%) 0 (0%) 0 (0%) 64

# of samples/cluster 15 34 137 43 91 320

Figure 17. Purity results for 5 classes, 64 samples for each class clustered using hierarchical and k-means method within the range (3-11) based on AlexNet and ResNet-18.


Figure 18. Purity results for 6 classes, 73 samples for each class clustered using hierarchical and k-means method within the range (3-11) based on AlexNet and ResNet-18.

Figure 19. Purity evaluation results for 26 classes, 100 samples for each class, clustered using the hierarchical and k-means methods within the range 26-100, based on the features extracted by AlexNet and ResNet-18.


6 DISCUSSION

6.1 Current study

Despite the advanced level of technology in capturing images and collecting datasets with high resolution in a short time, the technology has been hampered in identifying or grouping the collected data according to their biological species. In the present study, two different clustering methods were briefly described and used to form clusters from the plankton image dataset, and their performance was analyzed. The data collected by the Imaging FlowCytobot was unlabeled, and these methods can deal with such unsupervised data. First, after augmenting the images, the deep neural networks AlexNet and ResNet-18 were applied to the data to extract features. The extracted features of these CNNs were then passed to the mentioned clustering methods in order to form clusters.

The samples inside each cluster formed were then studied by calculating purity metrics.

The purity level of the results for each method was calculated. Based on the results of the experiments carried out, ResNet-18 is better than AlexNet at extracting features from the data: the clusters formed from the features extracted by ResNet-18 obtained a higher level of purity than those formed from the features extracted by AlexNet. The CNNs themselves were not much affected by the number of classes or the number of samples, but the purity of the clusters formed was. When the number of classes is small, which also means a low number of samples, the purity level of k-means clustering is higher than that of hierarchical clustering. On the other hand, when the number of classes, and thereby the number of samples, increases, the hierarchical method has a higher purity level. Note that as the number of samples increases, the level of purity decreases: in Experiment 2 the purity level was within the range of 0.75 to 0.86, but when the number of samples grew, the purity level was within the range of 0.45 to 0.56. However, the number of clusters in Experiment 3 was five times larger than the number of clusters used in Experiment 2.

6.2 Future work

The study of image clustering for unsupervised plankton data should be extended by trying different CNNs for feature extraction with the same clustering methods, to observe the effect of the CNN; here, for example, ResNet-18 showed better results than AlexNet. As an extra step, another clustering method could be added, such as divisive clustering, which is the inverse of agglomerative hierarchical clustering: it starts with one big cluster and then keeps breaking it down into smaller clusters; its results could then be compared with those of the methods used here. Additionally, due to time limitations, analyzing the clusters formed with respect to the biological taxonomy was not achieved in this thesis work. It would be a valuable way of comparing the results to prove the accuracy of the model used, and it would be the most reliable way of evaluating the accuracy of the clusters formed.


7 CONCLUSION

The goal of this thesis was to test a model based on image clustering methods in order to perform unsupervised analysis of plankton datasets. Nowadays, given the advanced level of computer vision, there is equipment capable of capturing thousands of plankton images in a short time and with high resolution, but not yet capable of identifying and grouping the species and families. After choosing AlexNet and ResNet-18 as the two deep neural networks for feature extraction, the whole dataset was resized according to the required input size of each network. The output was then clustered using two different clustering methods, hierarchical and k-means, and the resulting clusters were evaluated according to their purity. According to the experiments carried out, the k-means method forms better clusters using the features extracted by ResNet-18 in the case of a small number of classes, samples and clusters. On the other hand, the hierarchical method takes over the k-means method in creating purer clusters based on the features extracted by ResNet-18 when there is a large number of classes, samples and clusters.


REFERENCES

[1] Carol Lalli and Timothy R Parsons. Biological oceanography: an introduction. Elsevier, 1997.

[2] Osku Grönberg. Plankton recognition from imaging flow cytometer data using con- volutional neural networks. Master’s thesis, Lappeenranta University of Technology, Finland, 2018.

[3] Iain Suthers, David Rissik, and Anthony Richardson. Plankton: A guide to their ecology and monitoring for water quality. CSIRO publishing, 2019.

[4] Qiong Li, Xin Sun, Junyu Dong, Shuqun Song, Tongtong Zhang, Dan Liu, Han Zhang, and Shuai Han. Developing a microscopic image dataset in support of intelligent phytoplankton detection using deep learning. ICES Journal of Marine Science, 2019.

[5] Jianwei Yang, Devi Parikh, and Dhruv Batra. Joint unsupervised learning of deep representations and image clusters. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 5147–5156, 2016.

[6] Jianlong Chang, Lingfeng Wang, Gaofeng Meng, Shiming Xiang, and Chunhong Pan. Deep adaptive image clustering. In Proceedings of IEEE International Conference on Computer Vision, pages 5879–5887, 2017.

[7] Md Zahangir Alom, Tarek M Taha, Christopher Yakopcic, Stefan Westberg, Paheding Sidike, Mst Shamima Nasrin, Brian C Van Esesn, Abdul A S Awwal, and Vijayan K Asari. The history began from alexnet: A comprehensive survey on deep learning approaches. arXiv preprint arXiv:1803.01164, 2018.

[8] Joris Guérin and Byron Boots. Improving image clustering with multiple pretrained cnn feature extractors. arXiv preprint arXiv:1807.07760, 2018.

[9] Colin S Reynolds. The ecology of phytoplankton. Cambridge University Press, 2006.

[10] Xiaoou Tang, W Kenneth Stewart, He Huang, Scott M Gallager, Cabell S Davis, Luc Vincent, and Marty Marra. Automatic plankton image recognition. Artificial Intelligence Review, 12:177–199, 1998.

[11] Anibal Pedraza, Gloria Bueno, Oscar Deniz, Gabriel Cristóbal, Saúl Blanco, and María Borrego-Ramos. Automated diatom classification (part b): a deep learning approach. Applied Sciences, 7, 2017.


[12] Hans Du Buf. Automatic diatom identification, volume 51. World Scientific, 2002.

[13] Hans Du Buf, Micha Bayer, Stephen Droop, Ritchie Head, Steve Juggins, Stefan Fischer, Horst Bunke, Michael Wilkinson, Jos Roerdink, José Pech-Pacheco, et al. Diatom identification: a double challenge called ADIAC. In Proceedings 10th International Conference on Image Analysis and Processing, pages 734–739. IEEE, 1999.

[14] Gloria Bueno, Oscar Deniz, Anibal Pedraza, Jesús Ruiz-Santaquiteria, Jesús Salido, Gabriel Cristóbal, María Borrego-Ramos, and Saúl Blanco. Automated diatom clas- sification (part a): handcrafted feature approaches. Applied Sciences, 7, 2017.

[15] Yuming Kuang. Deep neural network for deep sea plankton classification. Technical report, 2015.

[16] Jialun Dai, Zhibin Yu, Haiyong Zheng, Bing Zheng, and Nan Wang. A hybrid convolutional neural network for plankton classification. In Asian Conference on Computer Vision, pages 102–114. Springer, 2016.

[17] Ivica Dimitrovski, Dragi Kocev, Suzana Loskovska, and Sašo Džeroski. Hierarchical classification of diatom images using ensembles of predictive clustering trees. Ecological Informatics, 7(1):19–29, 2012.

[18] Janice L Pappas and Eugene F Stoermer. Legendre shape descriptors and shape group determination of specimens in the cymbella cistula species complex. Phy- cologia, 42(1):90–97, 2003.

[19] Queenie TK Lai, Kelvin CM Lee, Anson HL Tang, Kenneth KY Wong, Hayden KH So, and Kevin K Tsia. High-throughput time-stretch imaging flow cytometry for multi-class classification of phytoplankton. Optics Express, 24(25):28170–28184, 2016.

[20] Chih-Chung Hsu and Chia-Wen Lin. Cnn-based joint clustering and representation learning with feature drift compensation for large-scale image data. IEEE Transac- tions on Multimedia, 20(2):421–429, 2017.

[21] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning: data mining, inference, and prediction. Springer Science & Business Media, 2009.

[22] Mahamed GH Omran, Andries P Engelbrecht, and Ayed Salman. An overview of clustering methods. Intelligent Data Analysis, 11(6):583–605, 2007.


[23] Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to information retrieval. Natural Language Engineering, 16:192–195, 2010.

[24] Keiron O’Shea and Ryan Nash. An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458, 2015.

[25] Yann LeCun and Yoshua Bengio. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, 1995.

[26] John R Hershey, Zhuo Chen, Jonathan Le Roux, and Shinji Watanabe. Deep clustering: Discriminative embeddings for segmentation and separation. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 31–35. IEEE, 2016.

[27] Elie Aljalbout, Vladimir Golkov, Yawar Siddiqui, Maximilian Strobel, and Daniel Cremers. Clustering with deep learning: Taxonomy and new methods. arXiv preprint arXiv:1801.07648, 2018.

[28] Erxue Min, Xifeng Guo, Qiang Liu, Gen Zhang, Jianjing Cui, and Jun Long. A survey of clustering with deep learning: From the perspective of network architecture. IEEE Access, 6:39501–39514, 2018.

[29] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[30] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[31] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[32] MATLAB. ResNet-18 convolutional neural network. https://www.mathworks.com/help/deeplearning/ref/resnet18.html, 2020. Online; accessed 16 June 2020.

[33] GeeksforGeeks. Hierarchical clustering (agglomerative and divisive clustering). https://www.geeksforgeeks.org/ml-hierarchical-clustering-agglomerative-and-divisive-clustering/, 2020. Online; accessed 16 June 2020.


[34] Chris Piech. K means (based on a handout by Andrew Ng). https://stanford.edu/~cpiech/cs221/handouts/kmeans.html, 2020. Online; accessed 16 June 2020.

[35] FastVision project, Lappeenranta-Lahti University of Technology (LUT), Finland. http://www2.it.lut.fi/project/fastvision/index.shtml, 2020. Online; accessed 11 June 2020.


Purity results of clustering 5 classes with 64 samples per class, for 3 to 11 clusters.

No. of clusters  AlexNet/Hierarchical  AlexNet/k-means  ResNet-18/Hierarchical  ResNet-18/k-means
 3               0.3875                0.400            0.5844                  0.5719
 4               0.5531                0.4969           0.7406                  0.6281
 5               0.5781                0.600            0.7406                  0.7219
 6               0.6156                0.675            0.7406                  0.7312
 7               0.6156                0.6813           0.7406                  0.7813
 8               0.6156                0.725            0.7719                  0.8063
 9               0.7312                0.7438           0.8406                  0.8625
10               0.7312                0.7656           0.8406                  0.8719
11               0.7312                0.7844           0.8406                  0.8844
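The purity values above follow the standard definition: each cluster is assigned its majority ground-truth class, and purity is the fraction of all samples that belong to the majority class of their cluster. A minimal sketch of this computation (the function name and the example labels are illustrative, not the thesis's actual code; NumPy is assumed available):

```python
import numpy as np

def purity(cluster_labels, class_labels):
    """Purity: assign each cluster its majority ground-truth class,
    then count the fraction of samples matching that class."""
    cluster_labels = np.asarray(cluster_labels)
    class_labels = np.asarray(class_labels)
    total_majority = 0
    for c in np.unique(cluster_labels):
        # Ground-truth classes of the samples assigned to cluster c
        members = class_labels[cluster_labels == c]
        _, counts = np.unique(members, return_counts=True)
        total_majority += counts.max()  # size of the majority class
    return total_majority / len(class_labels)

# Toy example: 3 clusters over 8 samples, 6 of which sit in a
# cluster dominated by their own class.
print(purity([0, 0, 1, 1, 1, 2, 2, 2],
             [0, 0, 1, 0, 1, 2, 2, 1]))  # 0.75
```

Note that purity increases monotonically with the number of clusters (singleton clusters are always pure), which is why the tables report it across a range of cluster counts rather than at a single value.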
