Open-set plankton recognition

Subhashree Rautray

OPEN-SET PLANKTON RECOGNITION

Master’s Thesis

Examiners: Assoc. Prof. Tuomas Eerola
Prof. Galina Fedorovna Malykhina

Supervisors: Assoc. Prof. Tuomas Eerola
Prof. Lasse Lensu
Prof. Heikki Kälviäinen

Lappeenranta-Lahti University of Technology LUT School of Engineering Science

Computational Engineering and Technical Physics Computer Vision and Pattern Recognition

Subhashree Rautray

Open-set plankton recognition

Master’s Thesis 2020

74 pages, 31 figures, 9 tables, 2 appendices.

Examiners: Assoc. Prof. Tuomas Eerola
Prof. Galina Fedorovna Malykhina

Keywords: phytoplankton, open-set recognition, convolutional neural network, pattern recognition

Research on the plankton community often involves acquiring species-level information on individual micro-organisms. Manually analyzing the massive collections of plankton image data generated by modern imaging devices is inefficient. However, this process can be automated with computer vision and machine learning algorithms to provide results more efficiently. In a practical situation, the model needs to be able to handle inputs from unknown classes or non-plankton particles that were not present during the training process. This research work focuses on the open-set classification of plankton image data, which allows deep networks to classify plankton into their respective classes while rejecting samples of unknown classes. Two open-set methods, OpenMax and Deep Open Classification (DOC), were implemented with Convolutional Neural Networks (CNNs) to address this classification task. Based on multiple experiments, DOC outperformed OpenMax for open-set plankton recognition.

deeply indebted to my sister and brother, Lopa, Madhu, Swaraj, for their constant support, motivation, and great confidence in me all these years.

Subhashree Rautray

LIST OF ABBREVIATIONS

ANN Artificial Neural Network
CNN Convolutional Neural Network
DNN Deep Neural Network
DOC Deep Open Classification
GPU Graphics Processing Unit
IFCB Imaging FlowCytoBot
ISIIS In Situ Ichthyoplankton Imaging System
KKC Known Known Class
KUC Known Unknown Class
MAV Mean Activation Vector
SGD Stochastic Gradient Descent
SVM Support Vector Machine
UUC Unknown Unknown Class
WHOI Woods Hole Oceanographic Institution

tides and currents. Phytoplankton are mostly single-celled organisms that perform photosynthesis, using chlorophyll and sunlight to generate the energy that sustains them. Like plants, they take up carbon dioxide and release oxygen, which makes them one of the most important producers of oxygen. They are also known as primary producers, as they produce the first form of food. Zooplankton are mostly microscopic animals (worms, krill, etc.) that eat phytoplankton and are in turn eaten by larger animals. They are known as primary consumers.

Plankton [2] are essential to a healthy marine ecosystem, as the aquatic food chain depends heavily on them. Plankton are very sensitive to their environment: climate change and rising sea temperatures may pose serious risks to them. In past years, researchers have been following the lives of plankton, examining their distribution in the ocean, their behavior, and any impact on their ecosystems.

Research [3] [4] on the plankton population is vital for understanding species-level information, in particular the community composition in the marine environment. The population can be harmed by external factors such as global warming, increasing carbon dioxide, and decreasing oxygen concentration, which in turn might disturb the ocean ecology. An excessive increase in the plankton population can also disrupt the whole ecosystem, as some of the organisms have harmful effects.

To study their ecology and ecosystem functioning [5], modern imaging devices can be used to obtain image data of plankton. Because of the large number of images available, recognizing plankton manually from images is not efficient. There is therefore a need for automatic classification of plankton species from image data within a short time frame and with minimal supervision. Some of the captured images might show non-plankton particles or species for which experts do not have any background information; these can be regarded as belonging to an unknown class or group.

This thesis is a part of the FastVision project [6], a joint project with the Finnish Environment Institute (SYKE) [7]. Gathering species-level information on plankton is very important in the field of environmental research. The project aims to automate plankton recognition tasks by implementing computer vision and machine learning methods.

Owing to recent technological advances, underwater images of plankton are generated continuously. They need to be recognized at the class level for further research.

Compared with the traditional plankton sampling process, a real-time automated recognition system can save a lot of time. This thesis focuses on computer vision and machine learning techniques to recognize plankton and to handle images that are unknown, meaning that they do not belong to any known class (species), as shown in Fig. 1. The methods aim to replicate human vision in understanding and processing a large amount of data so as to automate plankton classification.

Figure 1. Example of plankton images with different known and unknown classes.

The use of the deep neural networks for the plankton classification tasks is considered to have created a huge breakthrough in detecting and classifying the images. Deep learn-

Traditional classifiers built on CNNs follow closed-set recognition, which assumes that the categories of images in the test set are the same as those in the training set. A deep network with open-set methods enables open-set recognition, meaning that the classifier can handle images of an unknown class during the testing phase. It should correctly classify the known-class images and reject the images of classes unseen during the training phase instead of wrongly classifying them as one of the known classes.
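As a rough illustration of the difference between the two settings, the sketch below rejects an input when the highest softmax probability falls under a threshold. This is a naive baseline chosen only for illustration; it is neither OpenMax nor DOC. The function name `predict_open_set`, the threshold of 0.5, and the label -1 for rejected samples are all assumptions.

```python
import math

UNKNOWN = -1  # hypothetical label for rejected samples


def softmax(scores):
    """Numerically stable softmax of a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_open_set(scores, threshold=0.5):
    """Return the most probable known class, or UNKNOWN if confidence is low."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else UNKNOWN


confident = predict_open_set([4.0, 0.5, 0.2])  # one class dominates
ambiguous = predict_open_set([1.0, 1.1, 0.9])  # no class stands out
```

A closed-set classifier corresponds to `threshold=0.0`, where every input is assigned to one of the known classes regardless of confidence.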

1.2 Objectives and delimitations

This thesis work focuses on the open-set recognition of plankton data. The aim is to develop a method that can classify the known-class images while rejecting images of unknown classes.

The main objectives of this thesis work are as follows:

1. To survey existing CNN-based open-set classification methods.

2. To implement selected methods.

3. To quantitatively evaluate the performance of the methods on phytoplankton image data.

In this thesis work, only phytoplankton image data are considered for open-set classification, but the methodology is applicable to other similar data.

1.3 Structure of the thesis

Chapter 2 introduces convolutional neural networks and some of their architectures. Chapter 3 describes plankton recognition using handcrafted features and Deep Neural Networks (DNNs). Chapter 4 describes the open-set methods that use CNNs. Chapter 5 presents the phytoplankton dataset and the experiments with their results. Chapter 6 discusses open-set plankton recognition with the open-set methods and their results; possible future work is outlined in Section 6.2. Chapter 7 concludes the thesis.

future events. An ANN is constructed from input, hidden, and output layers. Each layer consists of processing units, called neurons, which are connected to neurons in other layers.

2.1.1 Neuron

Neurons [12] are the fundamental components of a neural network. They are inspired by the neurons in the human brain, which are interconnected and work together to produce a certain response. Such a neuron can be expressed mathematically, in which case it is called an artificial neuron.

These neurons are the nodes in the neural network model through which the information and computations flow. They calculate the weighted sum of their inputs and pass the sum through an activation function.
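The computation of a single artificial neuron can be sketched as follows. The sigmoid activation, the bias term, and the numeric values are illustrative assumptions rather than choices taken from this thesis.

```python
import math


def sigmoid(z):
    """Logistic activation, used here only as an example non-linearity."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


# The weighted sum is 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0.
output = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
```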

2.1.2 Feedforward neural network

Feed-forward networks [11] are a specific type of neural network consisting of an input layer, hidden layers, and an output layer. A simple feed-forward neural network is shown in Fig. 2 [13]. Information propagates through the network in one direction, from the input layer to the hidden layers and from the hidden layers to the output layer. Each incoming connection to a neuron is associated with a weight. The weights are the most important parameters of a neural network model, and the whole training process aims to adjust them. Once a neuron receives its inputs from the preceding layer, it sums the incoming signals multiplied by their respective weights and passes the sum to a non-linear activation function. The output value computed by the activation function is passed on to the next layer of the network. The computation continues forward until the computed value finally arrives at the output layer. During the training process, the weights are repeatedly updated until the network produces the desired output. Feed-forward networks are universal approximators [14]: a continuous function can be approximated by the network to any degree of accuracy. However, this does not say anything about the optimality of the network.
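The one-directional flow of information described above can be sketched as a loop over layers. The 2-3-1 layout, the weights, and the choice of tanh for the hidden layer are arbitrary values assumed for illustration.

```python
import math


def dense(inputs, weights, biases, activation):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """Propagate `inputs` through the layers in order, input to output."""
    signal = inputs
    for weights, biases, activation in layers:
        signal = dense(signal, weights, biases, activation)
    return signal


def identity(z):
    return z


# Hypothetical network: 2 inputs, 3 hidden neurons, 1 output neuron.
layers = [
    ([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [0.0, 0.0, 0.0], math.tanh),
    ([[1.0, -1.0, 0.5]], [0.1], identity),
]
prediction = forward([1.0, 2.0], layers)
```

Each layer only consumes the output of the layer before it, which is what distinguishes a feed-forward network from architectures with feedback connections.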

Figure 2. A feedforward neural network. [13]

2.1.3 Network training

After the neural network is designed, the model can be trained. During training, the network learns to produce the desired output. The most common learning method is supervised learning [11], where the data given to the network contain inputs (features) paired with their labels (target outputs). These data are called the training data. The network is initialized with random weights, and the training process then starts using the training data. The goal of the training is to match the target output values. How well the network is doing is measured by an error or loss function, and the network is trained to reduce the loss and thereby improve accuracy.

The gradient descent algorithm [15] [11] is the most commonly used optimization algorithm in machine learning and deep learning. It is used to find a local minimum of a function. It has three variants: batch gradient descent, stochastic gradient descent, and mini-batch gradient descent. To minimize the loss function, gradient descent is applied iteratively, updating the model parameters step by step. The parameters are the weights and biases of the network. Stochastic gradient descent (SGD) is the most widely used variant. It is very efficient for training neural networks, as it updates the network parameters for each training example [15]:
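In standard notation, which is an assumption here since the symbols are not defined elsewhere in this text, the per-example update can be written as

$$\theta \leftarrow \theta - \eta \, \nabla_{\theta} J\left(\theta; x^{(i)}, y^{(i)}\right)$$

where $\theta$ denotes the network parameters (weights and biases), $\eta$ is the learning rate, and $J$ is the loss evaluated on the single training example $(x^{(i)}, y^{(i)})$.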

Gradient descent performs the learning process for the neural network using the gradient of the error function. As part of the optimization process, these gradients are computed using the back-propagation algorithm. The gradient denotes the desired change in the weights and biases of the network layer. The total error of the neural network is formulated as [16]:

$$E = \frac{1}{2} \sum_{c} \sum_{j} \left(y_{j,c} - d_{j,c}\right)^{2} \tag{2}$$

where c is the index over the input-output pairs, j is the index over the output units, y is the actual output, and d is the desired output. When training the neural network, back-propagation propagates the error backward through the network by calculating the partial derivative of E with respect to each parameter using the chain rule of calculus.
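The error of Eq. 2 and a stochastic gradient step can be sketched for the smallest possible case, a single linear unit y = w x + b. The toy data, the learning rate of 0.05, and the number of passes are assumptions made for illustration; a real network applies the same chain rule layer by layer.

```python
def total_error(outputs, targets):
    """E = 1/2 * sum over cases c and output units j of (y - d)^2."""
    return 0.5 * sum((y - d) ** 2
                     for ys, ds in zip(outputs, targets)
                     for y, d in zip(ys, ds))


def predict(w, b, x):
    return w * x + b


def sgd_step(w, b, x, d, learning_rate):
    """One stochastic update on a single example (x, d)."""
    y = predict(w, b, x)
    dE_dy = y - d            # derivative of 1/2 * (y - d)^2 with respect to y
    grad_w = dE_dy * x       # chain rule: dE/dw = dE/dy * dy/dw
    grad_b = dE_dy           # chain rule: dE/db = dE/dy * dy/db
    return w - learning_rate * grad_w, b - learning_rate * grad_b


def dataset_error(w, b, data):
    return total_error([[predict(w, b, x)] for x, _ in data],
                       [[d] for _, d in data])


data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # samples of d = 2x + 1
w, b = 0.0, 0.0
error_before = dataset_error(w, b, data)
for _ in range(2000):                        # passes over the training data
    for x, d in data:
        w, b = sgd_step(w, b, x, d, learning_rate=0.05)
error_after = dataset_error(w, b, data)
```

Because the parameters change after every single example rather than after a full pass, this is the stochastic variant of gradient descent.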

2.2 Convolutional neural networks

CNNs [17] [9] [18] are a type of DNN that has shown great success in image processing and classification tasks. CNNs [11] specialize in processing data that have a grid-like topology, such as images, which are two-dimensional grids of pixels. Their most important benefit over traditional ANNs is weight sharing, which reduces the number of parameters to be learned while still detecting meaningful features in a given input image. This is achieved by using a kernel, or filter, that is smaller than the input. In CNNs, a kernel is a set of weights that moves over the input image and computes a dot product with each subregion of the image. The kernel extracts important features, such as edges, that occupy far fewer pixels than the whole input, allowing the network to work with fewer parameters. Weight sharing refers to using the same kernel weights at all positions of the input image, so there is no need to learn a new set of parameters for every location.
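The sliding dot product and the effect of weight sharing can be sketched in a few lines. The 4×4 image and the 2×2 edge kernel are hypothetical; as in most deep learning libraries, the operation shown is a cross-correlation with stride 1 and no padding.

```python
def conv2d(image, kernel):
    """Valid cross-correlation of a 2-D image with a 2-D kernel (stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


# 4x4 image: dark left half, bright right half.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
# Hypothetical 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1],
          [-1, 1]]
feature_map = conv2d(image, kernel)
```

The same four kernel weights are reused at all nine positions, so the 3×3 feature map is produced with four parameters instead of one weight per input-output connection.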

2.2.1 Building blocks

CNNs are composed of several types of layers, including convolution layers, pooling layers, and fully connected layers. A simple CNN architecture is illustrated in Fig. 3 [19].

Figure 3. Overall architecture of a CNN. [19]

The building blocks of CNNs are as follows:

• Convolution Layer

The convolution operation [18] [13] produces feature maps from the input image given to the network. The convolution is performed between a convolution filter and an input image. A filter has a set of weights that is slid across the input horizontally and vertically to extract various features. This process is repeated with N filters to produce N distinct feature maps.

• Pooling Layer

The pooling layer [13] is used to reduce the size of the feature maps, thus saving computation time and memory, while retaining the important features captured during the convolution operation. The pooling operation is applied with a window of selected size to each feature map independently. There are two common types of pooling operation: max pooling and average pooling.

Fig. 4 [18] shows the max pooling operation using a 2×2 window on a 4×4 feature map; the resulting output is 2×2, which is reduced in dimension.

• Fully Connected Layer

A fully connected layer [18] is similar to a traditional neural network layer, where each neuron in one layer is connected to every neuron in the next layer. The output of the last pooling layer is given as input to the fully connected layer in the form of a single input vector. A CNN can contain more than one fully connected layer.

The output is computed from the dot product between the input vector and the weights in order to generate the final classification result.

• Softmax Layer

For multi-class classification, the softmax layer follows the fully connected layer.

It uses the softmax activation function to transform the output of the fully connected layer into a probability distribution over n different classes [11]:

$$\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)} \tag{3}$$

where x is the input vector and n is the number of classes. For the classification task, the i-th element of the vector produced by the softmax activation function in Eq. 3 corresponds to the probability that the input of the network belongs to the i-th class.
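The pooling operation from the list above can be sketched for the case of a 2×2 window on a 4×4 feature map. The feature-map values are hypothetical, and the window is assumed not to overlap (stride equal to the window size).

```python
def pool2d(feature_map, size=2, reduce=max):
    """Non-overlapping pooling with a size x size window (stride = size)."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [[reduce(feature_map[r + i][c + j]
                    for i in range(size) for j in range(size))
             for c in range(0, cols - size + 1, size)]
            for r in range(0, rows - size + 1, size)]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


feature_map = [[1, 3, 2, 1],
               [4, 6, 5, 0],
               [7, 2, 9, 8],
               [1, 0, 3, 4]]
max_pooled = pool2d(feature_map, reduce=max)   # keeps the strongest response
avg_pooled = pool2d(feature_map, reduce=mean)  # keeps the average response
```

Both variants shrink the 4×4 map to 2×2; they differ only in how the four values inside each window are summarized.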

2.2.2 Architecture

Many variations of CNN architectures have been developed in past years. The first successful CNN architecture was LeNet [20]. Common architectures are as follows:

• LeNet [20] was developed by Yann LeCun in 1989 and used for image recognition tasks. The most recent version of this architecture is LeNet-5, which consists of a total of seven layers, including three convolutional layers, two pooling (sub-sampling) layers, and two fully connected layers. The architecture of LeNet-5 is shown in Fig. 5 [20].

Figure 5. Architecture of LeNet-5. [20]

In Fig. 5, the first convolutional layer C1 takes a 32×32 grayscale image as input and generates six 28×28 feature maps using six convolution filters. S2 is a sub-sampling layer, also called a pooling layer. After the pooling operation, the dimension of the feature maps decreases to 14×14. C3 is the second convolutional layer, which generates 16 feature maps of size 10×10. Another sub-sampling layer, S4, then reduces the 16 feature maps to size 5×5. C5 is a convolutional layer with 120 feature maps of size 1×1. Layer F6 is fully connected to layer C5 and has 84 units. Finally, the last layer is a fully connected layer with 10 nodes, since there are 10 categories for classification.

• AlexNet [21] was trained for a large-scale visual recognition task to classify images. The architecture has a total of eight layers: five convolution layers and three fully connected layers. Max-pooling layers follow the first two convolutional layers. The second convolution layer uses 256 filters of dimension 5×5×48 to transform its input from the first convolutional layer. The third, fourth, and fifth convolution layers are directly connected to each other. The output of the fifth convolution layer, followed by a max-pooling layer, goes into a series of two fully connected layers. The last fully connected layer produces a distribution over 1000 categories of labels. The model used multiple Graphics Processing Units (GPUs) to speed up the convolution operations.

The architecture of AlexNet is illustrated in Fig. 6 [21].

Figure 6. Overall architecture of AlexNet. [21]

• Residual neural networks (ResNets) [22] are extremely successful at training very deep networks without suffering from problems such as vanishing gradients. The vanishing gradient problem is a kind of unstable behavior that makes a neural network harder to train. To mitigate this problem, the model uses a residual network structure. The residual network uses shortcut connections, which skip some layers and connect directly to the output. ResNet also addresses the problem known as degradation, in which the accuracy of the network model decreases as the network gets deeper. Residual networks are made up of residual blocks, as shown in Fig. 7 [22]. Here x is the input, and H(x) = F(x) + x is the expected output of the parameterized network. Residual learning is dedicated to learning the difference between the output and the input, H(x) − x, using multiple parameterized network layers. x represents the direct identity mapping, and H(x) − x represents the residual between input and output that the network layers learn. When the identity mapping is the desired output, training only needs to drive the residual towards 0.

Figure 7. Basic unit of residual learning. [22]

• GoogLeNet [23], or Inception-V1, is a 27-layer-deep architecture, including the pooling layers. The architecture is constructed from nine inception modules stacked linearly. After the last inception module, a global average pooling layer, which takes the average of every feature map, is used instead of fully connected layers. Fig. 8 [23] shows a naive inception module and an inception module with dimension reductions. A naive inception module consists of three filters of different sizes, 1×1, 3×3, and 5×5, which operate on the same level. The outputs of these filters and of a parallel pooling operation are concatenated and sent to the next inception module. Using several filter sizes rather than a single large filter makes the convolution operations efficient. The inception module with dimension reduction introduces 1×1 convolutions before the 3×3 and 5×5 convolutions to limit the number of input channels. This helps to reduce the computational cost of deep networks.

Figure 8. Inception module: (a) A naive inception module; (b) Inception module with dimension reductions. [23]

• Inception-V2 [24] is an improvement on the previous version, Inception-V1. Inception networks are computationally efficient in terms of the number of parameters and memory consumption. In general, convolution operations are computationally expensive for large filter sizes such as 5×5 or 7×7. Inception-V2 introduces the idea of factorization into smaller convolutions and into asymmetric convolutions. Factorization into smaller convolutions splits a higher-dimension convolution into lower-dimension convolutions. Fig. 9 [24] shows a basic inception module. Fig. 10 [24] shows that the layer with a 5×5 filter in Fig. 9 is replaced by two layers of 3×3 filters. A 5×5 filter has 25 parameters, whereas two 3×3 filters have only 18 parameters, which reduces the number of parameters by 28%.

Figure 9. An inception module. [24]

Figure 10. Modified inception module with two 3×3 convolutions replacing the 5×5 convolution. [24]

Factorization into asymmetric convolutions means that a layer with an n×n filter can be replaced by an n×1 filter followed by a 1×n filter, as shown in Fig. 11 [24].

Figure 11. Modified inception module after asymmetric convolutions. [24]

For example, a 3×3 filter with nine parameters can be replaced by a 3×1 filter and a 1×3 filter, which reduces the number of parameters by 33%. Inception-V2 promotes higher-dimensional representations, as shown in Fig. 12 [24], extracting a larger number of feature maps with the factorized filters. The increase in features helps the network to learn faster. Inception-V2 uses an auxiliary classifier, which is a small network inserted between the layers. It improves the convergence speed of the deep network. Table 2 [24] gives detailed information about the Inception-V2 architecture. The input size of the Inception-V2 network is 299×299×3. Three inception modules are used, taking an input of size 35×35×288. The size of the input is then reduced to 17×17×768 by the grid reduction technique. Five inception modules are then used to perform the factorized convolutions, followed by two inception modules giving an output of size 8×8×2048.

The grid size reduction is realized by pooling after the convolution operations to efficiently reduce the size of the feature maps. It decreases the computational complexity of the network.

Figure 12. Inception module with higher dimensional representation network. [24]

Table 2. Outline of the Inception-V2 architecture. [24]

type          patch size/stride or remarks   input size
conv          3×3/2                          299×299×3
conv          3×3/1                          149×149×32
conv padded   3×3/1                          147×147×32
pool          3×3/2                          147×147×64
conv          3×3/1                          73×73×64
conv          3×3/2                          71×71×80
conv          3×3/1                          35×35×192
3×Inception   As in Fig. 10                  35×35×288
pool          8×8                            8×8×2048
linear        logits                         1×1×2048
softmax       classifier                     1×1×1000

• Inception-V3 [24] is a variant of Inception-V2 and uses the same architecture. All the features of Inception-V2 are present in Inception-V3, together with additional functionality. The network factorizes 7×7 convolutions, replacing them with a sequence of 3×3 convolutions. Inception-V3 adopts label smoothing as a regularization technique to avoid overfitting and to prevent the network from becoming too confident in predicting a specific category.
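The feature-map sizes quoted for LeNet-5 and the parameter savings quoted for the factorized convolutions can be checked with simple arithmetic. The sketch assumes the 5×5 convolutions and 2×2 sub-sampling windows of the original LeNet-5, which are not stated in the text above, and it counts weights per input/output channel pair without biases.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial size of a convolution or pooling output along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# LeNet-5 feature-map sizes (assumed 5x5 convolutions, 2x2 sub-sampling).
c1 = conv_output_size(32, 5)             # first convolution
s2 = conv_output_size(c1, 2, stride=2)   # first sub-sampling
c3 = conv_output_size(s2, 5)             # second convolution
s4 = conv_output_size(c3, 2, stride=2)   # second sub-sampling
c5 = conv_output_size(s4, 5)             # third convolution


def reduction(before, after):
    """Relative saving in the number of weights, as a fraction."""
    return (before - after) / before


# One 5x5 filter (25 weights) versus two stacked 3x3 filters (18 weights).
two_3x3_vs_5x5 = reduction(5 * 5, 2 * 3 * 3)
# One 3x3 filter (9 weights) versus a 3x1 filter followed by a 1x3 filter (6).
asymmetric_vs_3x3 = reduction(3 * 3, 3 * 1 + 1 * 3)
```

The two fractions correspond to the 28% and 33% reductions mentioned for Inception-V2.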

2.2.3 Regularization

To prevent a neural network from overfitting, regularization methods are used. Overfitting [25] is a situation in which the model learns the training data set so well that its performance on new data suffers. Common regularization methods used to address overfitting are as follows:

• L1 and L2 regularization [11]. L1 regularization adds to the loss function an extra penalty term, which is the sum of the absolute values of the weight parameters w:

$$\mathrm{LossFunction} = \mathrm{Loss} + \lambda \sum \lVert w \rVert \tag{4}$$

where λ is a hyperparameter that needs to be chosen. For L2 regularization, the penalty term added to the loss function is the sum of the squared values of the weights:

$$\mathrm{LossFunction} = \mathrm{Loss} + \lambda \sum \lVert w \rVert^{2} \tag{5}$$

• Dropout [25] is a regularization method in which the outputs of some units in the network are ignored, or dropped out, during the training process in order to reduce overfitting. Applying dropout to a neural network yields a thinned network, as shown in Fig. 13 [25]. Units in the hidden layers are randomly selected and dropped during the training phase, and the gradient updates are applied to the units that remain. Dropout improves the performance of the network but increases the training time.

Figure 13. Neural network before and after dropout: (a) A Neural Network; (b) Structure of the neural network after the dropout. [25]

• Data Augmentation [11] is a process based on developing more number of image

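The penalty terms of Eqs. 4 and 5 and the dropout operation described above can be sketched as follows. The weights, the data loss of 0.8, and λ = 0.1 are hypothetical values. The dropout shown is the "inverted" variant, which rescales the surviving units during training; this is an assumption made for the sketch and differs from the original formulation, which rescales at test time.

```python
import random


def l1_penalty(weights, lam):
    """lambda times the sum of absolute weight values (Eq. 4 penalty term)."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """lambda times the sum of squared weight values (Eq. 5 penalty term)."""
    return lam * sum(w * w for w in weights)


def dropout(activations, drop_prob, rng):
    """Inverted dropout: zero units with probability drop_prob, rescale the rest."""
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


weights = [0.5, -1.5, 2.0]
loss = 0.8                                      # hypothetical data loss
l1_loss = loss + l1_penalty(weights, lam=0.1)   # 0.8 + 0.1 * 4.0
l2_loss = loss + l2_penalty(weights, lam=0.1)   # 0.8 + 0.1 * 6.5

rng = random.Random(0)                          # fixed seed for repeatability
thinned = dropout([1.0] * 10, drop_prob=0.5, rng=rng)
```

Every entry of `thinned` is either 0.0 (a dropped unit) or 2.0 (a kept unit rescaled by 1 / 0.5), which corresponds to the thinned network of Fig. 13.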

3 PLANKTON RECOGNITION

3.1 Plankton imaging

The study of plankton images [5] provides information such as estimates of species sizes, their distribution, and the identification and categorization of species into classes. However, this takes a lot of time if done manually. Several plankton imaging and recognition systems have been developed to automate this process.

Plankton imaging systems [5] [26] are required to detect and analyze plankton organisms. Several techniques already exist for plankton imaging. In the early days, plankton samples were collected using Niskin bottles, towed nets, and pumps. The traditional methods adopted by researchers required sampling seawater and examining the plankton in a laboratory, which is labour-intensive and time-consuming. In addition, delicate species lose their original form during sampling and transportation. Efficient plankton imaging systems aim to address these issues. They help to generate information about species abundance and to count and classify individual plankton into their corresponding groups.

The first in situ silhouette camera system [27] was proposed in 1981. It provided underwater images of zooplankton using a camera placed in the plankton net. The system efficiently recognized most of the zooplankton genera. Later, a video plankton recorder (VPR) [28] was developed. It analyses plankton of sizes from 0.1 to 10 mm in real time and quantifies their abundances. An extension of the VPR, the VPRII [29], studies plankton in the size range 100 µm–1 cm. The system takes high-resolution images of plankton and measures their sizes. It is also integrated with software called Visual Plankton, which is used to identify plankton to major taxa. Fig. 14 shows various plankton images, with their class names, captured using the VPRII device.

Figure 14. Plankton images captured by VPRII: (A) pteropods; (B) radiolarian; (C) marine snow; (D) cladocera; (E) rotifers; (F) polychaete; (G) echinoderm larvae. [29]

Shadowed Image Particle Profiling and Evaluation Recorder (SIPPER) [26] is an in situ imaging system that collects a vast number of plankton images. The instrument is capable of imaging plankton with sizes from the micrometer to the centimeter scale. The system supports accurate counting, sizing, and identification of plankton.

The electronic holographic camera [30] was developed to study plankton dynamics and composition. It allows fast video recording of all species of sizes from 50 µm to 10 mm. The system captures high-resolution images of plankton without damaging the species.

Another effective imaging device is the Flow Cytometer and Microscope (FlowCam) [31], which provides detailed analysis of the plankton community. It can perform flow cytometry as well as image and microscopic analysis. Flow cytometry is a process used to identify and count cells. FlowCam generates and stores digital images of plankton, which are used to record their shapes and sizes. The system avoids the shrinkage and destruction of plankton shapes that occur during storage and preservation.

The Imaging FlowCytoBot (IFCB) [32] is a submersible flow cytometer used to study plankton of size 10 to 150 µm. To detect plankton, it uses the light scattering and fluorescence signals generated by a laser beam. The IFCB produces high-resolution images that are used to classify plankton into groups. These images are also used for estimating the chlorophyll fluorescence present in them. Chlorophyll fluorescence analysis is done to understand plankton patterns by analyzing the cells associated with them. Fig. 15 shows phytoplankton images captured by the IFCB device.

Figure 15. Phytoplankton images taken by IFCB.

3.2 Automatic plankton recognition

Plankton imaging devices generate a large number of plankton images. Manual annotation of plankton species is very time-consuming, so automatic plankton classification is required. Different approaches have been developed to solve this classification task, such as handcrafted feature extraction and deep learning techniques with CNNs.

Hu and Davis [34] proposed a classification method that uses a support vector machine (SVM) as the classifier with co-occurrence matrices as image features. Each element of a co-occurrence matrix counts the pairs of pixels with gray levels i and j that are separated by a spatial distance d at an angle θ in an image. A support vector machine [11] is a supervised learning algorithm that performs binary, that is, two-class, classification. Training an SVM means finding the best separating hyperplane that distinctly classifies the data samples in a higher-dimensional space. The method was evaluated on a dataset of 20000 plankton images from 7 different classes and achieved an overall classification accuracy of 72%.
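A gray-level co-occurrence matrix of the kind described above can be computed with a short loop. The sketch is a generic illustration, not the feature pipeline of [34]; the 4×4 image, the three gray levels, and the function name `cooccurrence_matrix` are hypothetical. The offset (dr, dc) = (0, 1) corresponds to distance d = 1 at angle θ = 0°, and the counts are directed (not symmetrized).

```python
def cooccurrence_matrix(image, levels, dr, dc):
    """Count pixel pairs (i, j) where level j lies at offset (dr, dc) from level i."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


# Hypothetical 4x4 image quantized to 3 gray levels (0, 1, 2).
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 1, 1]]
# Distance d = 1 at angle 0 degrees: the neighbour one pixel to the right.
glcm = cooccurrence_matrix(image, levels=3, dr=0, dc=1)
```

Texture statistics derived from such matrices, computed for several distances and angles, can then serve as the feature vector passed to a classifier such as an SVM.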

Tang et al. [35] presented a method that used more than 3000 images of 7 classes for binary plankton image classification. The method uses normalized multilevel dominant eigenvector estimation (NMDEE), which performed better at feature extraction than conventional methods.

Mosleh et al. [36] developed a method using an ANN to automate the recognition and classification of some common algae that differ in shape and size. The architecture of the whole system is shown in Fig. 16 [36]. Before classification, image segmentation and feature extraction are performed on the images. After noise is removed from the images, image segmentation is performed to detect the edges of the algae. Feature extraction then yields characteristics of the algae, such as area and shape. Based on these features, classification is performed using an artificial neural network. The method achieved an accuracy of 93% when tested on selected freshwater algae data.

Figure 16. Flowchart of overall classification system. [36]

Zheng et al. [5] proposed a system for automatic plankton classification. It uses multiple-view features with multiple kernel learning. The multiple kernel learning process combines different kernel matrices computed from the multiple-view features and feeds them to multiple classifiers. To best describe the features of plankton, it combines features such as Inner-Distance Shape Context with general features. The system was tested on three data sets, of which the Woods Hole Oceanographic Institution (WHOI) dataset gave the best results.

3.2.2 Automatic plankton recognition using deep neural network

Li et al. [37] introduced deep residual networks for the classification of plankton data. Three residual networks with depths of 19, 32, and 50 layers were trained on the dataset. A total of 30336 gray-scale plankton images were used for training. Data augmentation techniques were used to expand the size of each class in order to prevent overfitting. The 32-layer residual network gave the best accuracy of 95.8% and took 0.1 s for the testing process. Fig. 17 shows the overall architecture of the classification model. The images were resized to 256×256 before training, and the output is a probability distribution over the 121 classes.

Figure 17. The residual network architecture. [37]

Lee et al. [38] proposed a plankton classification model that uses a dataset with more than 3 million images in over 100 classes. However, the dataset suffers from an imbalanced distribution of class samples. To overcome this issue, the method adopts transfer learning: a CIFAR10 CNN classifier is first trained on class-normalized data, and the pre-trained model is then fine-tuned with the original data. The model achieved better classification accuracy even with the uneven class sizes.

Orenstein and Beijbom [39] introduced a method to classify plankton using labeled data from the In Situ Ichthyoplankton Imaging System (ISIIS) and the IFCB. The method uses a CNN-based architecture for classification on different plankton datasets. It also includes fine-tuning experiments to test the similarity between the datasets. The method focuses on transfer learning with out-of-domain data. CNNs with the AlexNet architecture are trained from scratch on the IFCB and ISIIS image datasets. The network performs well with respect to classification accuracy when fine-tuned from ImageNet.

Cui et al. [40] introduced a plankton classification system based on a hybrid CNN that uses Gaussian filtering for feature extraction. The hybrid network is based on the AlexNet architecture with a separate concat layer added before the first convolution layer. This layer combines different inputs to the network in order to achieve input feature fusion. The whole model extracts features efficiently and improves classification accuracy.

Lumini and Nanni [41] presented plankton classification models using single and combined CNN architectures trained with different strategies. The fused networks showed higher performance than the single networks. Various experiments were performed using feature-based transfer learning and fine-tuning approaches. The models were evaluated on three different plankton datasets, and DenseNet was found to be the best model for the plankton classification problem.

Schröder et al. [42] introduced a software tool called MorphoCluster for efficient annotation of plankton images. The software implements an unsupervised learning method that analyzes patterns in the given data and groups related images into clusters. The clustering depends on features extracted from the images using a DNN, which are grouped into homogeneous clusters. Cluster formation proceeds iteratively through a series of steps, from cluster approval to cluster growing, as illustrated in Fig. 18 [42], until the desired clusters are found. The figure shows how clusters are validated, grown, and rejected. After the clusters have been arranged in hierarchical order, they are presented to the annotator for class naming. Because images with similar features are grouped in clusters, this technique makes it possible to annotate images without looking at each one individually.

Figure 18. Process of MorphoCluster. [42]

Eerola et al. [43] introduced phytoplankton classification using an AlexNet [21] based architecture. The classification uses a compact CNN architecture that aims to require fewer computational resources and to train faster. Overall, it performed better than the other evaluated networks.


4 OPEN-SET CLASSIFICATION

4.1 Background and basic concepts

Deep neural networks have had a great impact on visual recognition applications such as object recognition and multi-class classification. These applications are commonly based on the closed-set assumption, which means that all categories in the test data are known from training and the model is not expected to handle unseen categories [45]. In a real-world scenario such as plankton recognition, there is a high chance of encountering unknown categories of data. Consider a multi-class plankton classifier as an example. Its objective is to classify certain types of plankton, but it may encounter a class of plankton that was not present during the training phase, and the accuracy of the classifier suffers when this happens.

Traditional classifiers mostly work with Known Known Classes (KKCs), that is, classes whose samples have class labels, and do not consider Unknown Unknown Classes (UUCs) [46]. Open-set classification focuses on UUCs. It handles classes that are unseen until testing time, given that knowledge of the data is incomplete at the training phase. It enables the classifier to classify the known classes accurately while handling unknown data by rejecting or ignoring it instead of misclassifying it as a known class. Depending on the requirements, various open-set classification methods have been proposed based on traditional machine learning and deep neural networks.

4.2 Traditional methods

The 1-vs-Set Machine [47] uses a Support Vector Machine (SVM) for open-set recognition in a supervised learning setting. The objective of an SVM [48] is to find the separating hyperplane in the feature space that best distinguishes positive from negative samples. The 1-vs-Set Machine adds another hyperplane parallel to the decision boundary generated by the SVM. Any sample that falls between these two hyperplanes is taken as belonging to the target class. The method introduced the concept of minimizing the open-set risk, which means minimizing the risk of the unknown while balancing it against the training error (empirical risk). Fig. 19 [47] shows the two planes, A and Ω, where the class ‘dog’ is the target class and frog and birds are negative class samples. The other categories, raccoon and ‘unknown?’, are from classes which are not known. Oval images are from


Figure 19. Two hyperplanes, A and Ω, generated by the SVM, separating negative and unknown class samples. [47]

The nearest neighbor distance ratio method [49] is a multi-class open-set classifier that can recognize samples from unknown classes that were not present during training. The classifier applies a threshold to the ratio of the distances to the two most similar classes. This ratio, called the Nearest Neighbor Distance Ratio (NNDR) [49], is defined as:

$$L = \frac{d(s, t)}{d(s, u)} \tag{6}$$

where s is the test sample, and t and u are the nearest neighbours of s that come from different classes; d(s, t) is the Euclidean distance between s and t, and d(s, u) is the Euclidean distance between s and u in the feature space. If the value of L is less than or equal to a selected threshold, the sample s is assigned to the same class as t; otherwise, it is classified as an unknown sample.
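The decision rule of Eq. 6 can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of [49]: the function names, the brute-force neighbour search, and the default threshold of 0.8 are assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nndr_classify(sample, train_features, train_labels, threshold=0.8):
    """Return the label of the nearest neighbour t, or "unknown".

    The ratio L = d(s, t) / d(s, u) uses the nearest neighbour t and the
    nearest neighbour u whose class differs from that of t.
    The threshold value is a hypothetical choice for illustration.
    """
    ranked = sorted(
        ((euclidean(sample, f), label)
         for f, label in zip(train_features, train_labels)),
        key=lambda pair: pair[0],
    )
    d_t, label_t = ranked[0]
    d_u = next((d for d, label in ranked if label != label_t), None)
    if d_u is None:
        return label_t  # only one class in the training data
    ratio = d_t / d_u if d_u > 0 else float("inf")
    return label_t if ratio <= threshold else "unknown"


features = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["a", "a", "b", "b"]
print(nndr_classify((0.1, 0.2), features, labels))
print(nndr_classify((2.5, 2.8), features, labels))
```

A sample close to one class gives a small ratio and is accepted, while a sample roughly halfway between two classes gives a ratio near 1 and is rejected.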

4.3 Open-set classification with convolutional neural networks

DNN-based methods for open-set classification have shown better performance than traditional machine learning techniques, especially those built on CNNs, which have powerful feature extraction capabilities. For this thesis work, we consider CNNs as our base architecture and augment them with open-set methods for the rejection of UUCs.

4.3.1 OpenMax method

Bendale and Boult [50] proposed a method that adds an OpenMax layer to convolutional neural networks. The architecture of the method is shown in Fig. 20.

Figure 20. Architecture of the OpenMax layer with CNNs

Typically, the softmax function is used as the activation function for multi-class classification. However, it is not suitable for open-set classification. The OpenMax layer is an extension of the softmax layer that can recognize unknown class data. For an input, the OpenMax layer estimates the probability of the input being unknown, which allows an image to be rejected as coming from an unseen class. The method outputs the recalibrated activations together with a score for the unknown class. To estimate this score, it adapts the idea of Meta-Recognition to deep neural networks. The scores from the penultimate layer, also called the activation vector, are used to estimate whether the input is far from the training data. The method computes the Mean Activation Vector (MAV) and distances for each class, considering only the correctly classified training samples. The distance is computed between each training sample’s activation vector and the MAV of its class to obtain a class-specific distance distribution. Different distance measures can be used to calculate these distances, such as the Euclidean-cosine distance, the cosine distance, and the Euclidean distance.


added pseudo activation score for the UUCs. The goal of the method is not to determine the class of an input but rather to determine whether the input comes from an unknown class so that it can be rejected.
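The computation of the MAVs and the class-specific distances can be sketched as follows. This is a simplified illustration that uses only the Euclidean distance and plain Python lists; the function name and the toy data are assumptions for the example and are not taken from [50].

```python
import math


def mean_activation_vectors(activations, labels, predictions):
    """Compute per-class MAVs and distances to them.

    Only correctly classified samples (label == prediction) are used.
    Returns two dictionaries keyed by class: the MAV and the list of
    Euclidean distances of the class samples to that MAV.
    """
    by_class = {}
    for vec, y, y_hat in zip(activations, labels, predictions):
        if y == y_hat:
            by_class.setdefault(y, []).append(vec)

    mavs, distances = {}, {}
    for cls, vecs in by_class.items():
        mav = [sum(column) / len(vecs) for column in zip(*vecs)]
        mavs[cls] = mav
        distances[cls] = [math.dist(v, mav) for v in vecs]
    return mavs, distances


# Toy activation vectors; the last sample is misclassified and ignored.
acts = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [9.0, 9.0]]
mavs, dists = mean_activation_vectors(acts, [0, 0, 1, 1], [0, 0, 1, 0])
print(mavs)
print(dists)
```

The resulting distance lists are the inputs that a later fitting step would use to model the distance distribution of each class.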

4.3.2 Deep open classification

Shu et al. [51] built a multi-class classifier for text classification called Deep Open Classification (DOC), which uses a CNN with a 1-vs-rest final layer of sigmoids instead of softmax. The classifier reduces the open space risk by using Gaussian fitting, which allows the rejection of unknown class samples. The network layers of the classifier are illustrated in Fig. 21 [51].

Figure 21. Network structure of DOC. [51]

The DOC [51] network architecture has an embedding layer that takes the input and converts it into dense vectors. The second layer performs convolution over the dense vectors using filters of different sizes. A max-pooling layer is then used to generate the feature vector h. The feature vector is reduced to a vector d of dimension m, where m is the number of known classes. The 1-vs-rest layer serves as the output layer, which handles the rejection of unknown class samples. It contains m sigmoid functions that correspond to the m known classes. For the k-th sigmoid function, corresponding to class c_k, the method takes all examples with y = c_k as positive examples and all examples with y ≠ c_k as negative examples. The model is trained with the sum of the binary cross-entropy losses of the m sigmoid functions [51] as follows:

$$\mathrm{Loss} = \sum_{k=1}^{m} \sum_{j=1}^{n} -\mathbb{I}(y_j = c_k) \log p(y_j = c_k) - \mathbb{I}(y_j \neq c_k) \log\bigl(1 - p(y_j = c_k)\bigr), \tag{7}$$

where I is the indicator function, n is the number of training examples, and p(y_j = c_k) = Sigmoid(d_{j,k}) is the probability output of the k-th sigmoid function for the j-th example.
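As an illustration of Eq. 7, the following sketch evaluates the 1-vs-rest loss for a small batch. It is a plain Python version for readability, not the training code of [51]; the function names and the small constant eps, added to avoid log(0), are assumptions for the example.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def doc_loss(logits, labels, eps=1e-12):
    """Sum of m binary cross-entropy losses over n samples (Eq. 7).

    logits[j][k] is d_{j,k}, the input of the k-th sigmoid for sample j.
    labels[j] is the index of the true class of sample j.
    """
    total = 0.0
    for d_j, y_j in zip(logits, labels):
        for k, d_jk in enumerate(d_j):
            p = sigmoid(d_jk)
            if y_j == k:
                total -= math.log(p + eps)        # positive example for c_k
            else:
                total -= math.log(1.0 - p + eps)  # negative example for c_k
    return total


print(doc_loss([[0.0, 0.0]], [0]))
print(doc_loss([[5.0, -5.0]], [0]))
```

With both logits at zero, every sigmoid outputs 0.5 and the loss is 2·ln 2; a confident correct prediction gives a loss close to zero.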

The DOC method [52] can also be used for image classification with rejection of unknown class images. For each seen class, a Gaussian model is fitted using the correctly classified training samples to determine its mean and standard deviation. After prediction for a test image, the method finds the maximum of the predicted sigmoid probabilities. This value is compared against a threshold that is calculated separately for each class. If the predicted probability is less than the threshold of the predicted class, the image is assigned to the UUCs; otherwise, it is assigned to the predicted class.
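The rejection step can be sketched as follows. The threshold rule used here, max(0.5, 1 − α·σ) with the probabilities mirrored around 1 before estimating σ, follows the description in Shu et al. [51] as far as we are aware; the value α = 3, the function names, and the toy probabilities are assumptions for the example.

```python
import math


def gaussian_thresholds(train_probs, alpha=3.0, floor=0.5):
    """Class-wise thresholds from sigmoid outputs of training samples.

    train_probs[c] holds the class-c sigmoid outputs for correctly
    classified training samples of class c. Each value p is mirrored
    to 2 - p so that the fitted Gaussian has mean 1.
    """
    thresholds = {}
    for cls, probs in train_probs.items():
        mirrored = list(probs) + [2.0 - p for p in probs]
        std = math.sqrt(sum((p - 1.0) ** 2 for p in mirrored) / len(mirrored))
        thresholds[cls] = max(floor, 1.0 - alpha * std)
    return thresholds


def doc_predict(sigmoid_probs, thresholds, unknown="unknown"):
    """Return the predicted class index, or `unknown` if it is rejected."""
    best = max(range(len(sigmoid_probs)), key=lambda k: sigmoid_probs[k])
    if sigmoid_probs[best] < thresholds[best]:
        return unknown
    return best


th = gaussian_thresholds({0: [0.98, 0.99, 0.97], 1: [0.6, 0.9, 0.8]})
print(th)
print(doc_predict([0.95, 0.10], th))
print(doc_predict([0.90, 0.20], th))
```

A class whose training probabilities are tightly concentrated near 1 receives a strict threshold, whereas a class with widely spread probabilities falls back to the floor of 0.5. A single fixed threshold corresponds to passing the same value for every class.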

4.3.3 Entropic open-set loss and objectosphere loss

Dhamija et al. [53] proposed a method to improve the classification accuracy of known classes while decreasing the effect of unseen input data. The method uses KKCs together with available Known Unknown Classes (KUCs) as background classes to deal with open-set classification. The KUCs are samples with no meaningful class information, which can be taken as negative samples. The method contributes the Entropic Open-Set and Objectosphere loss functions for handling unknown data samples. When training on known class samples together with KUC samples, the method does not form any softmax output for the KUCs. The entropic open-set loss is designed to maximize the entropy at the softmax layer for background class samples. The Objectosphere loss further improves the performance of the model by reducing the magnitude of the features of training samples that come from the background classes, which lowers the softmax responses for unknown samples. By thresholding the softmax response scores, the method can then effectively ignore unknown samples.


tion error [54]:

$$\text{II-loss} = \frac{1}{N} \sum_{j=1}^{K} \sum_{i=1}^{|C_j|} \lVert \vec{\mu}_j - \vec{z}_i \rVert_2^2 \;-\; \min_{\substack{1 \le m \le K \\ m+1 \le n \le K}} \lVert \vec{\mu}_m - \vec{\mu}_n \rVert_2^2 \tag{8}$$

where N is the number of training samples, K is the number of known classes, |C_j| is the number of training samples in class C_j, $\vec{\mu}_j$ is the mean of class C_j, and $\vec{z}_i$ is the output vector of the last linear layer of the network. Minimizing the II-loss increases the distance between samples of different classes and decreases the distance of each sample from its class mean. The cross-entropy loss is applied to the output of the softmax layer. The network, as illustrated in Fig. 22 [54], is first trained to minimize the II-loss and then, in the next step, to minimize the cross-entropy loss. The method detects unknown class samples during the testing phase using an outlier score, which is the minimum distance of a test sample to the known class means. A sample is identified as unknown by comparing its outlier score with a selected threshold.
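The two terms of Eq. 8 and the outlier score can be illustrated with a small sketch. It evaluates the loss on fixed vectors instead of training a network, assumes at least two classes, and uses the Euclidean distance for the outlier score; the function names and toy data are assumptions for the example.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_means(outputs, labels):
    """Mean output vector of each class."""
    groups = {}
    for z, y in zip(outputs, labels):
        groups.setdefault(y, []).append(z)
    return {c: [sum(col) / len(zs) for col in zip(*zs)]
            for c, zs in groups.items()}


def ii_loss(outputs, labels):
    """Intra-class spread minus the smallest inter-class separation."""
    means = class_means(outputs, labels)
    intra = sum(squared_distance(means[y], z)
                for z, y in zip(outputs, labels)) / len(outputs)
    classes = sorted(means)
    inter = min(squared_distance(means[a], means[b])
                for i, a in enumerate(classes) for b in classes[i + 1:])
    return intra - inter


def outlier_score(z, means):
    """Minimum distance of a sample to the known class means."""
    return min(math.dist(mu, z) for mu in means.values())


outputs = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]]
labels = [0, 0, 1, 1]
means = class_means(outputs, labels)
print(ii_loss(outputs, labels))
print(outlier_score([6.0, 0.0], means))
```

In this toy case the intra-class term is 1 and the inter-class term is 100, so the loss is −99; a sample halfway between the two class means has an outlier score of 5 and would be rejected by any threshold below that value.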

Figure 22. Architecture of the network with II-loss and cross-entropy loss. [54]


5 EXPERIMENTS

In this chapter, the dataset and the preprocessing steps needed to develop a model for open-set classification are discussed. The evaluation metrics used for analyzing model performance are presented. The experiments with the open-set methods and their results are then described.

5.1 Data

For this thesis work, phytoplankton data are used for training the CNNs and for evaluating the open-set methods. The data were provided by the Finnish Environment Institute (SYKE) [7]. The plankton images were captured with an IFCB [32] in the Baltic Sea. The IFCB is a submersible imaging flow cytometer that generates images of particles from the aquatic environment. The device takes images of underwater particles ranging in size from 10 to 150 µm, with an image resolution of about 3.4 pixels per µm. The data contain 50 different classes of phytoplankton, which were manually labeled by experts. The dataset consists of 63074 images, and the number of samples per class is highly imbalanced. For example, the class Dolichospermum-Anabaenopsis has 12280 images, while some classes have fewer than 100 images. Table 3 shows the phytoplankton classes with their respective numbers of sample images.

The available images are in grayscale and have various dimensions. Example images of different classes are shown in Fig. 23. Before training the CNN model, the data samples were preprocessed to match the input size required by the network. For testing the open-set classification, the data are divided into balanced and imbalanced test sets. The balanced test set contains an equal number of samples for each class, whereas the imbalanced test set contains an unequal number of samples per class.


Table 3. Phytoplankton classes with their respective numbers of sample images.

| Phytoplankton | Samples | Phytoplankton | Samples |
|---|---|---|---|
| Amylax triacantha | 19 | Heterocapsa Rotundata | 614 |
| Aphanizomenon flosaquae | 6989 | Heterocapsa Triquetra | 3276 |
| Aphanothece paralleliformis | 29 | Heterocyte | 263 |
| Beads | 125 | Katablepharis Remigera | 54 |
| Centrales Sp | 480 | Licmophora Sp | 74 |
| Ceratoneis Closterium | 45 | Melosira Arctica | 43 |
| Chaetoceros sp | 1382 | Merismopedia Sp | 98 |
| Chaetoceros Sp single | 213 | Mesodinium Rubrum | 1132 |
| Chlorococcales | 95 | Monoraphidium Contortum | 327 |
| Chroococcales | 142 | Nitzschia Paleacea | 65 |
| Chroococcus Small | 827 | Nodularia Spumigena | 169 |
| Ciliata | 243 | Oocystis Sp | 842 |
| Cryptomonadales | 713 | Oscillatoriales | 4440 |
| Cryptophyceae Teleaulax | 6830 | Pauliella Taeniata | 119 |
| Cyclotella choctawhatcheeana | 102 | Pennales Sp Thick | 210 |
| Cymbomonas Tetramitiformis | 199 | Pennales Sp Thin | 781 |
| Dinophyceae | 1433 | Peridiniella Catenata Chain | 193 |
| Dinophysis Acuminata | 217 | Peridiniella Catenata Single | 899 |
| Dolichospermum Anabaenopsis | 12280 | Prorocentrum Cordatum | 276 |
| Dolichospermum Anabaenopsis Coiled | 2504 | Pseudopedinella Sp | 379 |
| Euglenophyceae | 102 | Pyramimonas Sp | 1224 |
| Eutreptiella Sp | 2247 | Skeletonema Marinoi | 4128 |
| Gonyaulax Verior | 22 | Snowella Woronichinia | 2950 |
| Gymnodiniales | 69 | Thalassiosira Levanderi | 2537 |
| Gymnodinium Like | 158 | Uroglenopsis Sp | 516 |


Figure 23. Example phytoplankton images from the dataset

5.2 Data preprocessing

Because of the unequal sample distribution between phytoplankton classes, the model can perform poorly, especially for classes with few samples. Data augmentation was therefore used to bring each class to the required number of images, since a balanced dataset helps the model generalize well. The augmentation was performed with the ImageDataGenerator class of the Keras deep learning library. The generator can produce an arbitrary number of images for each class, but the process was limited to around 2000 images per class. The generated images are modified versions of the images present in each class, produced by transformations such as shift, rotation, flip, shear, and zoom. After augmentation, all images were resized to 299×299, the input dimension required by the CNN. Resizing was done with the Python Imaging Library while maintaining the aspect ratio. The images were padded to make their width and height equal, using the most frequent color in the original image as the background color. The resized phytoplankton images are shown in Fig. 24.
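The geometry of the resizing step can be sketched without any imaging library. The helper below computes the scaled size that preserves the aspect ratio and the offset at which the scaled image would be pasted onto a 299×299 canvas; centering the image on the canvas is an assumption, as are the function names. The actual pixel operations would be delegated to an imaging library.

```python
from collections import Counter


def letterbox_geometry(width, height, target=299):
    """Scaled size and paste offset for an aspect-preserving resize.

    The longer side is scaled to `target`; the shorter side is padded.
    Centering the image on the canvas is an assumption of this sketch.
    """
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    offset = ((target - new_w) // 2, (target - new_h) // 2)
    return (new_w, new_h), offset


def most_frequent_value(pixels):
    """Most common gray value, used as the padding background color."""
    return Counter(pixels).most_common(1)[0][0]


print(letterbox_geometry(400, 100))
print(letterbox_geometry(50, 200))
print(most_frequent_value([255, 255, 10, 255, 0]))
```

For a 400×100 image the scaled size is 299×75, so 112 rows of background are added above the image and the remaining rows below it.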


Figure 24. Resized phytoplankton images

5.3 Evaluation criteria

To evaluate the performance of the open-set classifier, three evaluation indicators were calculated: precision, recall, and F1-score. Precision is the proportion of correct positive classifications (true positives) among the cases predicted as positive. Recall is the proportion of true positives among the actually positive cases. To evaluate the performance of the model comprehensively, the macro F1-score is used, which is obtained by computing the F1-score for each class and then averaging over the classes. The F1-score characterizes the performance better than the accuracy metric and is defined as [55]:

$$\text{F1-score} = \frac{2}{1/\text{precision} + 1/\text{recall}} \tag{9}$$

where

$$\text{Precision} = \frac{TP}{TP + FP} \tag{10}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{11}$$


where TP is the number of true positives, that is, samples of the positive class that the model correctly predicts as positive; FP is the number of false positives, that is, samples of a negative class that the model incorrectly predicts as positive; and FN is the number of false negatives, that is, samples of the positive class that the model incorrectly predicts as negative.
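Eqs. 9 to 11 and the macro average can be written out directly. The sketch below treats each class in turn as the positive class; the function names and the toy labels are assumptions for the example, and the form 2PR/(P + R) is algebraically the same as Eq. 9.

```python
def per_class_scores(y_true, y_pred, label):
    """Precision, recall and F1-score with `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    classes = sorted(set(y_true))
    return sum(per_class_scores(y_true, y_pred, c)[2] for c in classes) / len(classes)


y_true = ["a", "a", "b", "b", "unknown", "unknown"]
y_pred = ["a", "a", "b", "a", "unknown", "b"]
print(per_class_scores(y_true, y_pred, "unknown"))
print(macro_f1(y_true, y_pred))
```

Because every class contributes equally to the macro average, a poorly recognized unknown class lowers the score even when it contains few samples.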

Confusion matrices are computed for each classification to compare the actual and predicted classes. A confusion matrix describes the classifier’s performance on a set of test data for which the ground truth classes are known. The matrix has two dimensions: one is indexed by the actual class of the test samples and the other by the class predicted by the model. The experiments are repeated using k-fold cross-validation. The F1-scores of all open-set experiments are summarized by their mean and standard deviation. Histograms are also shown for each classification result, comparing the number of correctly predicted samples of each class with the number of actual samples.
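A confusion matrix and the per-class counts behind such histograms can be computed as in the following sketch. The row/column convention (rows for actual classes, columns for predicted classes) and the function names are assumptions for the example.

```python
def confusion_matrix(y_true, y_pred, labels):
    """matrix[i][j] counts samples of actual class i predicted as class j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def histogram_counts(matrix):
    """Pairs of (correctly predicted, actual) sample counts per class."""
    return [(row[i], sum(row)) for i, row in enumerate(matrix)]


classes = ["a", "b", "unknown"]
actual = ["a", "a", "b", "b", "unknown", "unknown"]
predicted = ["a", "a", "b", "a", "unknown", "b"]
cm = confusion_matrix(actual, predicted, classes)
print(cm)
print(histogram_counts(cm))
```

The diagonal holds the correctly predicted samples, and each row sum gives the number of ground truth samples of that class.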

5.4 Description of experiments

For open-set classification, the OpenMax [50] and Deep Open Classification [51] methods were considered. Both methods use a CNN as the backbone. Every experiment was done with a different test and training set. For training, transfer learning was used with the pre-trained Inception-V3 network [24] available in the Keras library [56]. Out of the 50 phytoplankton classes, 40 were taken as seen classes and the remaining 10 as unseen classes. The image samples were normalized using the preprocess_input function provided for Inception-V3 in Keras. For training, the layers of the pre-trained model were first frozen to preserve their learned weights, and a dense layer and an activation layer were added on top. The model was first trained for 30 epochs on the phytoplankton dataset. The network was then trained for another 30 epochs with a low learning rate after unfreezing some of the last layers. After the classifier had been built, it was combined with the OpenMax and DOC methods to enable the rejection of UUCs. For the OpenMax method, the classifier was used to compute the MAVs and distances using only correctly classified samples. These distance values were saved in separate files so that they could be used to fit the Weibull models. For the DOC method, the classifier was used to predict the training samples, and the mean and standard deviation for each seen class were obtained by Gaussian fitting. These values were saved in a file and used when predicting the test data. To evaluate the performance of the open-set classifier, each experiment used a different set of unclassified samples chosen from the phytoplankton dataset. This was done in a K-Folds Cross-Validation


classification using the methods OpenMax and DOC. Each subexperiment used a different collection of unknown classes taken from the dataset. For example, in Subexperiment 1, the first ten phytoplankton classes were taken as unknown classes and the rest as known classes for training. Table 4 shows the phytoplankton classes taken as unknown in each subexperiment. In the testing set, the samples from the unknown classes were placed in one folder as a single unclassified class alongside the known classes. These testing sets were evaluated using the open-set methods.

Table 4. Unknown classes used in the different subexperiments.

| Subexperiment | Unknown classes |
|---|---|
| Subexperiment 1 | Amylax triacantha, Aphanizomenon flosaquae, Aphanothece paralleliformis, Beads, Centrales sp, Ceratoneis closterium, Chaetoceros sp, Chaetoceros sp single, Chlorococcales, Chroococcales |
| Subexperiment 2 | Chroococcus small, Ciliata, Cryptomonadales, Cryptophyceae Teleaulax, Cyclotella choctawhatcheeana, Cymbomonas tetramitiformis, Dinophyceae, Dinophysis acuminata, Dolichospermum Anabaenopsis, Dolichospermum Anabaenopsis coiled |
| Subexperiment 3 | Euglenophyceae, Eutreptiella sp, Gonyaulax verior, Gymnodiniales, Gymnodinium like, Heterocapsa rotundata, Heterocapsa triquetra, Heterocyte, Katablepharis remigera, Licmophora sp |
| Subexperiment 4 | Melosira arctica, Merismopedia sp, Mesodinium rubrum, Monoraphidium contortum, Nitzschia paleacea, Nodularia spumigena, Oocystis sp, Oscillatoriales, Pauliella taeniata, Pennales sp thick |
| Subexperiment 5 | Pennales sp thin, Peridiniella catenata chain, Peridiniella catenata single, Prorocentrum cordatum, Pseudopedinella sp, Pyramimonas sp, Skeletonema marinoi, Snowella Woronichinia, Thalassiosira levanderi, Uroglenopsis sp |
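The split into known and unknown classes shown in Table 4 amounts to taking consecutive blocks of ten classes from the ordered class list. A minimal sketch, with placeholder class names and a hypothetical function name, is given below.

```python
def unknown_class_splits(class_names, unknown_per_split=10):
    """Split an ordered class list into (known, unknown) pairs.

    Each split holds out one consecutive block of classes as unknown
    and keeps all remaining classes as known classes for training.
    """
    splits = []
    for start in range(0, len(class_names), unknown_per_split):
        stop = start + unknown_per_split
        unknown = class_names[start:stop]
        known = class_names[:start] + class_names[stop:]
        splits.append((known, unknown))
    return splits


# Placeholder names standing in for the 50 phytoplankton classes.
names = [f"class_{i:02d}" for i in range(50)]
splits = unknown_class_splits(names)
for number, (known, unknown) in enumerate(splits, start=1):
    print(f"Subexperiment {number}: {len(known)} known, {len(unknown)} unknown")
```

With 50 classes this yields five subexperiments, each with 40 known and 10 unknown classes, and every class is held out exactly once.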


5.4.2 Experiments with imbalanced test set

For the experiments with the imbalanced test set, the CNN classifier was trained using all the available classes of the phytoplankton dataset. The testing dataset included unclassified samples from random classes which had not been annotated by an expert. To ensure proper open-set evaluation, the number of samples taken differs between the categories.

5.5 Results

Table 5 shows the macro F1-scores of open-set classification. Appendix 1 contains the confusion matrices of all subexperiments using the balanced test set for the OpenMax and DOC methods.

Table 5. Macro F1-scores of open-set classification using different methods.

| Subexperiment | Softmax | OpenMax | DOC |
|---|---|---|---|
| Subexperiment 1 | 0.949 | 0.952 | 0.961 |
| Subexperiment 2 | 0.954 | 0.958 | 0.972 |
| Subexperiment 3 | 0.949 | 0.951 | 0.964 |
| Subexperiment 4 | 0.947 | 0.950 | 0.962 |
| Subexperiment 5 | 0.953 | 0.959 | 0.966 |
| Average | 0.950 | 0.954 | 0.965 |
| Standard deviation | 0.003 | 0.004 | 0.004 |
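The summary rows of Table 5 can be reproduced from the five subexperiment scores. The sketch below uses the sample standard deviation from the Python standard library; whether the sample or the population formula was used for the table is not stated, but both give the same values after rounding to three decimals.

```python
from statistics import mean, stdev

# Macro F1-scores of Subexperiments 1 to 5, copied from Table 5.
scores = {
    "Softmax": [0.949, 0.954, 0.949, 0.947, 0.953],
    "OpenMax": [0.952, 0.958, 0.951, 0.950, 0.959],
    "DOC": [0.961, 0.972, 0.964, 0.962, 0.966],
}

summary = {
    method: (round(mean(values), 3), round(stdev(values), 3))
    for method, values in scores.items()
}

for method, (average, deviation) in summary.items():
    print(f"{method}: mean={average:.3f}, std={deviation:.3f}")
```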

All the subexperiments were carried out with hyperparameters tuned separately for each open-set method. For the OpenMax method, the hyperparameters were alpha and the Weibull tail size. Alpha is the number of top-ranked classes whose activations are revised when estimating the probability of the UUCs. The tail size is used in the Weibull fitting to estimate the distribution parameters. Both alpha and the tail size were tuned to obtain high F1-scores. Fig. 25 shows the F1-scores for different alpha rank values with a tail size of 20 for Subexperiment 1. The figure shows that the F1-score is highest when the alpha rank value is 8. The same procedure was followed in every experiment to find the optimal parameter values. Table 6 shows the parameter values used in each subexperiment.


Figure 25. Different alpha rank values with the F1-score of Subexperiment 1

Table 6. Subexperiments with the hyperparameter values.

| Subexperiment | Alpha | Tail size |
|---|---|---|
| Subexperiment 1 | 8 | 20 |
| Subexperiment 2 | 4 | 60 |
| Subexperiment 3 | 8 | 10 |
| Subexperiment 4 | 4 | 20 |
| Subexperiment 5 | 8 | 40 |
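The role of alpha can be illustrated with a simplified sketch of the OpenMax recalibration step. It assumes that the Weibull models (where the tail size enters) have already been fitted and that their cumulative distribution values for the current input are given as `outlier_probs`; this input, the function name, and the rank weighting (alpha − rank)/alpha with zero-based ranks are assumptions based on common implementations, not a transcription of [50].

```python
import math


def openmax_probabilities(activations, outlier_probs, alpha):
    """Recalibrate activations and return [P(unknown), P(class 0), ...].

    activations: penultimate-layer scores of the known classes.
    outlier_probs: per-class Weibull CDF value of the distance between
        the input and the class MAV (assumed to be precomputed).
    alpha: number of top-ranked classes whose activations are revised.
    """
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i], reverse=True)
    weights = [1.0] * len(activations)
    for rank, cls in enumerate(ranked[:alpha]):
        weights[cls] = 1.0 - ((alpha - rank) / alpha) * outlier_probs[cls]

    revised = [v * w for v, w in zip(activations, weights)]
    # Activation mass removed from the known classes goes to "unknown".
    unknown = sum(v * (1.0 - w) for v, w in zip(activations, weights))

    scores = [unknown] + revised
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


typical = openmax_probabilities([5.0, 1.0, 0.5], [0.0, 0.0, 0.0], alpha=2)
far_away = openmax_probabilities([5.0, 1.0, 0.5], [0.9, 0.0, 0.0], alpha=2)
print(typical)
print(far_away)
```

When the input lies far from the MAV of its top-ranked class, most of that class's activation is moved to the unknown class, which then receives the highest probability.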

Table 7 shows the precision, recall, and F1-score for the unclassified test samples in the experiments using the OpenMax method. Figs. 26 and 27 show class histograms for the OpenMax method and the softmax layer, respectively, for Subexperiment 1. The histograms compare the number of ground truth samples in the phytoplankton test set with the number of correct predictions. Fig. 26 shows that, out of a total of 120 test samples of the unclassified class, fewer than 10 images were correctly classified, and the rest were misclassified. For the seen classes, however, the prediction results are almost the same as the softmax output, as can be seen from Fig. 27.

Table 8 shows the classification report for the experiments using the DOC method. The results are shown only for the 120 unclassified samples of the test set. For testing, the method compares the probability output of the sigmoid function with a threshold. The default value of the threshold is 0.5, but it can be increased to improve the classification results. For all the subexperiments with DOC, the threshold was set to 0.8. More


Figure 26. OpenMax and ground truth prediction histograms (bars per class for open-set predictions and ground truth; axis: number of samples).


Figure 27. Softmax and ground truth prediction histograms (bars per class for softmax predictions and ground truth).


Table 7. Classification report for the unclassified test set using the OpenMax method.

| Subexperiment | Precision | Recall | F1-score |
|---|---|---|---|
| Subexperiment 1 | 0.50 | 0.05 | 0.09 |
| Average | 0.458 | 0.106 | 0.158 |

details about the number of predictions can be seen in the confusion matrices for the DOC method in Appendix 1.

Table 8. Classification report for unclassified samples using the DOC method.

| Subexperiment | Precision | Recall | F1-score |
|---|---|---|---|
| Average | 0.51 | 0.316 | 0.386 |

Figs. 28 and 29 show the number of predicted samples for the seen and unseen classes using the DOC method for Subexperiment 1 and Subexperiment 2, respectively.

For the imbalanced test dataset, the number of samples differed between classes. The unclassified set consists of samples that were not labeled by an expert and might belong to some of the known classes present in the dataset. Table 9 shows the macro F1-scores of the OpenMax and DOC methods on the imbalanced dataset. For the OpenMax method, alpha was set to 4 and the tail size to 80. For the DOC method, the threshold value was 0.8.

The number of correctly predicted samples for each class is shown for the OpenMax and DOC methods in the histograms in Fig. 30 and Fig. 31. The main idea of this experiment is to examine how the unclassified samples are distributed across the known classes after classification. The number of test samples misclassified into each class can be checked from the confusion matrix in Appendix 2. The confusion matrix shows that 55 unclassified samples were classified as the Chaetoceros sp phytoplankton class, which suggests that these samples from the unclassified class might


Figure 28. DOC and ground truth prediction histograms for Subexperiment 1 (bars per class for DOC predictions and ground truth).


Figure 29. DOC and ground truth prediction histograms for Subexperiment 2 (bars per class for DOC predictions and ground truth; axis: number of samples).


Figure 30. OpenMax and ground truth prediction histograms using the imbalanced phytoplankton dataset (bars per class for open-set predictions and ground truth; axis: number of samples).


Figure 31. DOC and ground truth prediction histograms using the imbalanced phytoplankton dataset. (One pair of bars per phytoplankton class, including Unclassified.)


6 DISCUSSION

6.1 Current study

This thesis focused on the open-set classification of phytoplankton images. The classification aims to identify both known and unknown class samples, where the unknown class represents samples that were not part of the training process. The recognition of unknown samples was achieved using open-set methods. Thresholding the softmax probabilities of a deep network can detect some unknown samples. However, this approach fails when unknown images produce a high activation score for one of the known classes.
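The softmax thresholding baseline and its limitation can be illustrated with a minimal sketch. The function names, the logits, and the threshold value 0.5 are assumptions chosen for illustration; they are not taken from the experiments of this thesis.

```python
import math

UNKNOWN = -1  # hypothetical label used here for rejected samples


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_rejection(logits, threshold=0.5):
    """Return (label, confidence); label is UNKNOWN if confidence < threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return UNKNOWN, probs[best]
    return best, probs[best]


# A confident known sample is accepted.
confident = predict_with_rejection([4.0, 0.5, 0.2])
# A sample with no dominant class is rejected as unknown.
ambiguous = predict_with_rejection([1.0, 0.9, 1.1])
# An unknown image that happens to activate class 0 strongly is accepted
# anyway: the threshold cannot separate it from a genuine class 0 sample.
overconfident_unknown = predict_with_rejection([6.0, 0.1, 0.0])
```

In the third call the sample is assigned to class 0 with a probability close to one, which is the failure case that motivates the open-set methods discussed next.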

All the experiments were carried out using the OpenMax and DOC open-set methods. Both methods use the Inception-V3 network as the backbone CNN architecture. Data augmentation was used to address the class imbalance problem. The Inception-V3 network requires inputs of uniform dimensions, so the images were resized to 299×299. In each experiment, a different set of phytoplankton classes was treated as unclassified samples for testing purposes.
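One simple way to use augmentation against class imbalance is to generate additional samples for the smaller classes until every class reaches a common size. The sketch below only computes how many augmented images each class would need. The balancing target (the size of the largest class), the function name, and the sample counts are assumptions for illustration and are not the values used in this thesis.

```python
def augmentation_plan(class_counts, target=None):
    """Number of augmented images to generate per class.

    class_counts: dict mapping class name to number of original images.
    target: desired images per class; defaults to the largest class size
            (an assumed balancing strategy).
    """
    if target is None:
        target = max(class_counts.values())
    return {name: max(0, target - n) for name, n in class_counts.items()}


# Illustrative counts only, not taken from the dataset.
counts = {
    "Nodularia Spumigena": 120,
    "Licmophora Sp": 30,
    "Amylax Triacantha": 15,
}
plan = augmentation_plan(counts)
```

With these invented counts, the largest class needs no augmentation while the two smaller classes need 90 and 105 additional images, respectively.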

With the OpenMax layer, the model identifies the unclassified class samples. It can reject unknown images that produce a high activation for a known class even though they are visually different from the known classes. The OpenMax layer operates on the activation vectors, whose distribution indicates how an input relates to each class. For each input, it measures the distance between the input's activation vector and the mean activation vector of a class, recalibrates the scores, and estimates the probability of the input being unknown. However, with the OpenMax method, it is difficult to correctly handle unknown classes that share related features with the known classes, because their logits are likely to be similar. The results show that, using the OpenMax method, the open-set classifier classifies the seen classes well and rejects the unseen class samples. The OpenMax layer performs better than the softmax layer.
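The recalibration step can be sketched as follows. This is a simplified illustration and not the implementation used in the experiments: the Weibull parameters are assumed to be given rather than fitted to training data, the Euclidean distance is assumed as the distance measure, and all names and numeric values are hypothetical.

```python
import math


def weibull_cdf(x, shape, scale):
    """Weibull CDF; here the probability that a distance x is an outlier."""
    return 1.0 - math.exp(-((x / scale) ** shape))


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def openmax_probs(av, mean_avs, weibull_params, alpha=2):
    """Return probabilities [unknown, class 0, class 1, ...].

    av: activation vector (logits) of the input.
    mean_avs: mean activation vector of each known class.
    weibull_params: assumed (shape, scale) per class, not fitted here.
    alpha: number of top-ranked classes whose scores are recalibrated.
    """
    n = len(av)
    ranked = sorted(range(n), key=lambda i: av[i], reverse=True)
    weights = [1.0] * n
    for rank, i in enumerate(ranked[:alpha], start=1):
        shape, scale = weibull_params[i]
        dist = euclidean(av, mean_avs[i])
        weights[i] = 1.0 - ((alpha - rank + 1) / alpha) * weibull_cdf(dist, shape, scale)
    revised = [v * w for v, w in zip(av, weights)]
    # Activation mass removed from the known classes goes to the unknown class.
    unknown = sum(v * (1.0 - w) for v, w in zip(av, weights))
    scores = [unknown] + revised
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


mean_avs = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
params = [(2.0, 2.0)] * 3  # hypothetical Weibull (shape, scale) per class

# Close to the mean activation vector of class 0: stays class 0.
probs_near = openmax_probs([5.2, 0.1, -0.1], mean_avs, params)
# High activation for class 0 but far from its mean vector: rejected.
probs_far = openmax_probs([9.0, 4.0, 0.0], mean_avs, params)
```

Index 0 of the returned list is the unknown class. In the second call the largest logit belongs to class 0, yet the large distance to the class mean moves most of the score to the unknown class, which plain softmax thresholding would not do.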

The DOC method improved the classification results for unclassified samples. DOC uses a 1-vs-rest layer of sigmoids as the final output layer. The sigmoid functions provide the rejection capability, and the decision boundaries are tightened with Gaussian fitting. The output layer contains one sigmoid function per seen class. Each function takes the training samples of its corresponding class as positive examples and the rest of the samples as negative examples. During testing, if the predicted values of a sample from the sigmoid functions are all below the chosen threshold, the sample is classified as unknown. The DOC method outperformed the OpenMax method in all experiments. Especially in Subexperiment 2,
