
Species identification of wooden material using convolutional neural networks


Academic year: 2022


Master’s Programme in Computational Engineering and Technical Physics Intelligent Computing Major

Master’s Thesis

Dmitrii Shustrov

SPECIES IDENTIFICATION OF WOODEN MATERIAL USING CONVOLUTIONAL NEURAL NETWORKS

Examiners: Professor Heikki Kälviäinen, Professor Lasse Lensu

Supervisors: D.Sc. Tuomas Eerola, Prof. Lasse Lensu, Prof. Heikki Haario, Prof. Heikki Kälviäinen


ABSTRACT

Lappeenranta University of Technology School of Engineering Science

Master’s Programme in Computational Engineering and Technical Physics Intelligent Computing Major

Dmitrii Shustrov

Species Identification of Wooden Material Using Convolutional Neural Networks

Master’s Thesis 2018

49 pages, 30 figures, 8 tables.

Examiners: Professor Heikki Kälviäinen Professor Lasse Lensu

Keywords: computer vision, machine vision, image processing, wood species identification, deep learning, convolutional neural networks, sawmill industry

Wood species identification is in demand in the sawmill industry. Identification systems can be used to control manufacturing from the raw material to the final products, which increases the resource efficiency of the entire production. The main goal of this thesis is to develop a computer vision system capable of identifying the species of wooden material using convolutional neural networks and to determine the parameters necessary for its functioning. A method for accurate identification among three wood species is proposed. The method consists of the extraction of image patches from board images, the wood species identification of the patches using the selected convolutional neural network, and the wood species identification of each board by decision rules based on the class-labelled patches. The proposed method was tested with the following convolutional neural networks: AlexNet, GoogLeNet, VGG-16, and ResNet-50. The best results were obtained with the GoogLeNet architecture, with which the method correctly classified 99.4% of the boards of fir, pine, and spruce.


PREFACE

I would like to thank my supervisors Heikki Kälviäinen, Lasse Lensu, Tuomas Eerola, and Heikki Haario for accurate coordination during the work on this thesis. I’m grateful to my family in St. Petersburg for inspiration and support.

Lappeenranta, May 25, 2018

Dmitrii Shustrov


CONTENTS

1 INTRODUCTION
1.1 Background
1.2 Objectives
1.3 Structure of the thesis

2 WOOD SPECIES IDENTIFICATION
2.1 Wood characterization
2.1.1 Hardwoods
2.1.2 Softwoods
2.2 Wood species identification
2.3 Automatic methods of identification
2.3.1 Computer vision based methods
2.3.2 Other approaches
2.3.3 Drawbacks of existing approaches

3 CONVOLUTIONAL NEURAL NETWORKS FOR IMAGE CLASSIFICATION
3.1 Structure
3.1.1 Input layer
3.1.2 Convolutional layer
3.1.3 Sub-sampling layer
3.1.4 Fully-connected layer
3.2 Learning algorithms
3.3 Regularization of CNN
3.3.1 Dropout
3.4 Transfer learning
3.5 Existing architectures of convolutional neural networks
3.5.1 AlexNet
3.5.2 VGG
3.5.3 GoogLeNet
3.5.4 Deep Residual Network

4 PROPOSED METHOD
4.1 Image patch extraction
4.2 Decision rule
4.2.1 Majority vote rule
4.2.2 Mean probability rule
4.2.3 Max rule

5 EXPERIMENTS AND RESULTS
5.1 Data
5.1.1 Data preparation
5.2 Implementation
5.3 Evaluation Criteria
5.4 Single image patches identification performance
5.4.1 High quality image patches
5.4.2 All image patches
5.5 Board identification performance
5.5.1 High quality image patches
5.5.2 All image patches
5.6 Influence of number of image patches on board identification accuracy

6 DISCUSSION
6.1 Future Work

7 CONCLUSION

REFERENCES


LIST OF ABBREVIATIONS

ACC Accuracy
CCD Charge-Coupled Device
CIFAR Canadian Institute for Advanced Research
CNN Convolutional Neural Network
FN False Negative
FP False Positive
GLCM Gray-Level Co-Occurrence Matrix
GPU Graphics Processing Unit
ILSVRC ImageNet Large Scale Visual Recognition Competition
MLP Multilayer Perceptron
NIR Near Infrared
PPV Positive Predictive Value
PRE Precision
ResNet Residual Network
TP True Positive
VGG Visual Geometry Group


1 INTRODUCTION

1.1 Background

Wood processing includes a number of intermediate processes such as debarking, sawing, sorting, stowage, drying, and adjustment. Proper and accurate wood processing, for example drying or storage, often depends directly on the identified wood species, because different species require a specific technological approach to the drying process or to the optimization of the sawing process. Another main reason for wood species identification is to separate unwanted wood species from the desired ones. Traditionally, a human expert identifies the wood species, which makes the process difficult and slow. Considering all the above factors, there is a demand for an automated identification system. [1] [2] [3]

This master’s thesis is a part of the DigiSaw research project [4]. The DigiSaw research project aims at improving the efficiency of the sawmill production process using digital technologies. The use of technologies such as computer vision should increase the efficiency of resource use, which improves forest sustainability and allows the modernization of the sawmill industry, increasing its competitiveness.

This thesis considers the automatic image-based identification of wood species from board images. An example of the input data and the identification method used in this study is shown in Figure 1. A window extracted from the input image is passed through the layers of the Convolutional Neural Network (CNN), and the output indicates to which wood species the window belongs. The CNN is a special architecture of artificial neural networks aimed at effective image recognition.

1.2 Objectives

The main goal of this master’s thesis project is to study, test, and develop a convolutional neural network for implementing a computer vision system capable of identifying the wood species of sawn timber. Identification should be performed among three available wood species: fir, pine, and spruce.

The objectives are formulated as follows:

• Apply the convolutional neural network approach to the wood species identification task and determine which CNN parameters and architecture should be used.

• Evaluate the proposed method experimentally using the provided data.

• Determine the minimum requirements for the amount of data, e.g., the number of images needed for accurate wood species identification.

Figure 1. Example of the input data and method for identification.

1.3 Structure of the thesis

The rest of the thesis is organized as follows: Chapter 2 presents approaches to the wood species identification task and the main characteristics of different wood species. Chapter 3 presents CNNs for image classification. Chapter 4 introduces the proposed method. Chapter 5 contains the experimental part of the work. The results of the experiments are discussed in Chapter 6. The conclusion is given in Chapter 7.


2 WOOD SPECIES IDENTIFICATION

Identification of the species of wood material is an important task. For example, in the construction industry it is important to use certain types of wood because the quality characteristics and properties of wood vary depending on the growth conditions and, most importantly, on the species. In order to choose the right material with suitable properties, it is necessary to identify the wood species. Given that this task usually requires a human expert, an automated information system can improve efficiency and speed up the process [5]. Usually wood species are identified from the external characteristics of the tree, for example, the shape of the leaves or the trunk, as well as the smell of the bark. However, once a tree has been processed into boards or timber, such identification becomes impossible. In this case, one of the remaining useful characteristics that is unique to each species is the surface texture. [3]

2.1 Wood characterization

In general, wood can be divided into two groups: softwoods and hardwoods. Identification of wood species and non-wood fibers in pulps uses characteristic morphological features that are typical of the species or genus. For softwoods, the shape and arrangement of pits at the contact area of rays and tracheids are characteristic. For hardwoods, the size, shape, and pitting of vessel elements are characteristic. The morphological features of fibers are emphasized with direct staining methods such as Congo red [2]. The method is most suitable for pure cellulose, but it stains all types of papermaking fibers. When mixed with water, the stain forms colloidal particles that penetrate into the crystalline cellulose fibril structure. [2]

Macroscopic characteristics of wood include physical features such as color or gloss. Anatomical features are visible with a magnifying glass or with the naked eye. They include porosity, vessel arrangement, ray size, axial parenchyma abundance or arrangement, relative vessel diameter, and the presence of a storied structure. Other characterization parameters are deposits in vessels and their color. Differences in vessel diameter and density are also used. Vessels that are more than 100 micrometers in diameter are easily seen with a magnifying glass. In addition, vessels of more than 200 micrometers are easily seen with the unaided eye. [6]

Macroscopic features such as texture (coarse / fine; even / uneven), grain (straight, spiral, interlocked, wavy), figure (many types), and lustre have poorly defined categories and require experience before they can be effectively applied in wood identification. However, they can be very helpful as complementary diagnostic features. Although not strictly a macroscopic feature, the physical property of specific gravity or density is used in many cases. [6]

Table 1 presents a comparison of the macroscopic characteristics of the major cell types in hardwoods and softwoods, including the orientation and dimensions of the cells [2]. A clear difference between softwoods and hardwoods at the gross structural level is the presence in hardwoods of vessels that are oriented in the longitudinal direction of the stem.

Table 1. Macroscopic characteristics of major cell types in softwood and hardwood, where V is a vertical orientation and H is a horizontal orientation. [2]

Cell type                  Orientation   Length, mm    Width, µm

Hardwood
Fibers                     V             0.4-1.6       10-40
Vessel element             V             0.2-0.6       10-300
Longitudinal parenchyma    V             <0.1          <30
Ray parenchyma             H             <0.1          <30

Softwood
Tracheid                   V             1.4-6.0       20-50
Ray tracheid               H             0.01-0.016    2-50
Ray parenchyma             H             0.01-0.016    2-50
Epithelial parenchyma      V, H          0.01-0.016    2-50

2.1.1 Hardwoods

Depending on the structure, hardwood species are categorized as follows [1] [2]:

• Ring-vascular: oak, elm, ash, poplar, karagach.

• Scattered vascular: birch, linden, aspen, beech, maple, alder.

These wood species have a complex microscopic structure. The reason is the strong development of vessels that displace adjacent cells, which disturbs the regularity and uniformity of the structure. Hardwoods are composed of core rays, vessels, tracheids, libriform fibers, and parenchyma cells. [1] [2] An example of the hardwood structure is shown in Figure 2.

Figure 2. Microscopic structure of oak wood: 1) annual layer; 2) early zone; 3) late zone; 4) large vessels of the early zones; 5) small vessels of the late zone; 6) wide core ray; 7) narrow core ray; 8) vessels; 9) libriform. [1]

2.1.2 Softwoods

Softwood species include spruce, larch, pine, cedar, and fir. Softwoods differ from hardwoods in having a simpler structure, which consists of tracheids, core rays, parenchymal cells, and resin canals, as shown in Figure 3. [1] [2]

Figure 3. Microscopic structure of pine wood: 1) annual layer; 2) early zone; 3) late zone; 4) early tracheids; 5) late tracheids; 6) bordered pores; 7) core rays; 8) vertical pitching; 9) horizontal resin pitch. [1]

2.2 Wood species identification

The most common and historically used method of identification relies on a human expert working with the naked eye or a magnifying glass [7]. The characteristics that indicate a certain species are then macroscopic: color, texture, weight, and moisture content. Figure 4 shows example photographs of the microscopic features of six wood species from Malaysia. For imaging, Khalid et al. used a charge-coupled device (CCD) Picolo camera with 10x magnification and a pixel size of 8.60 µm × 8.30 µm [8].

Figure 4. Examples of Malaysian wood microscopic images: (a) bintangor; (b) nyatoh; (c) sesendok; (d) ramin; (e) mersawa; (f) jelutong. [8]

As shown in Figure 4, the images look different, and the classification of these species is a relatively easy task. However, if high magnification is not available and the wood species are from the same family, as can be seen in Figure 5, the difference between the macroscopic images of the three species is not so obvious. This indicates that an automated system for solving the classification task could be a useful tool.

2.3 Automatic methods of identification

2.3.1 Computer vision based methods

With the development of visual inspection systems in the sawmill industry and sawmill production, automated systems have been actively used to identify species or defects in wood material. Many works have been published that propose automatic recognition systems using computer vision technologies in the sawmill industry. Such methods typically aim to classify the quality of wood or the presence of defects. To collect input features, many different technologies have been used: microwaves, magnetic resonance systems, laser ranging, X-rays, spectroscopes, or ultrasound. [9]

Figure 5. Examples of macroscopic images of the Nordic wood species fir, pine, and spruce.

Tou et al. [10] [11] [12] proposed the recognition and classification of five wood species using macroscopic images. A gray-level co-occurrence matrix (GLCM) was used as a feature extractor. The GLCM is a texture feature extraction technique that provides information about the relationship between the gray-level pixel values of the image. The texture and colour features were used to train a Multilayer Perceptron (MLP). The wood species used in this work were Campnosperma auriculatum, Dyera costulata, Durio lowianus, Kokoona littoralis, and Anisoptera costata. The accuracy obtained for the classification of the five wood species ranged from 60 to 72%.

Khalid et al. [8] presented a classification system capable of recognizing 20 different species of wood. The system used images of wood samples that had been prepared by boiling and cutting into thin sections using a microtome. An MLP was used for the classification, and the feature extractor was GLCM. The database consisted of 1753 training images and 196 test images. The recognition accuracy obtained was 95%. As noted in [13], the main drawback of this approach was the complexity and high cost of preparing the wood samples, which makes this work inapplicable to real applications.

Hoffemann et al. [14] demonstrated the capability of CNNs to solve the wood species identification task. Using the database from [15], the recognition accuracy varied from 95 to 97%. Another study [16] used only the hardwood species from the same database [15]. The accuracy of that system with 25 species was 97%.

2.3.2 Other approaches

To implement the automatic identification of wood species, there are also approaches that are not image-based. One such approach is the analysis of the spectral radiation reflected from the surface of the wood when it is illuminated by a special radiation source. Such methods are based on vibrational spectroscopy. For example, the use of Near Infrared (NIR) [17] or Mid-IR [18] [19] spectroscopy has been proposed. Another example is the use of fluorescence spectroscopy [20]. The tools used in such systems are a spectrometer, a laser source, and an optical filter.

In [21], a system for wood species recognition was developed in which ultrasonic signals were used as input features. Different wood types exhibit different elastic responses, which are due to their cellular structure. The signal that has passed through the radial, tangential, and longitudinal surfaces of the wood is used as the input to the classification system. An artificial neural network, specifically an MLP, was used for classification. However, in the study the system could only identify four species: alder, oak, maple, and pine. The resulting classification accuracy was 97%, using 20 samples for training and 10 for testing. [21]

2.3.3 Drawbacks of existing approaches

The drawbacks of the approaches proposed in [17] [18] [19] include the special preparation of equipment or a laboratory installation for extracting the wood characteristics. The approach proposed in [8] also requires the preparation of wood samples, which increases the cost and complexity of the identification process. Such approaches are quite expensive and require adjustment and alignment of the equipment, which is often difficult to implement in an industrial environment. The methods proposed in [10] [11] [12] do not provide accuracy high enough for an industrial environment. Also, the approach was based on microscopic images, so the image acquisition step may need preparation and calibration. The work [14], using the database proposed in [15] and [13], achieves high accuracy on both macroscopic and microscopic images of wood. However, the database indicates that the microscopic and macroscopic images of different wood species do not differ significantly in appearance. The macroscopic examples are shown in Figure 6 and the microscopic examples are shown in Figure 7.

All these factors motivate the development of an approach using computer vision. In recent years, CNNs have become increasingly widespread. For example, in the study [22] the Canadian Institute for Advanced Research (CIFAR) dataset was used to train a CNN for the image classification of 10 different object classes. An example of texture classification is given in [23], which showed excellent results.

Figure 6. Example of the macroscopic images from the wood species database [15].

Figure 7. Example of the microscopic images from the wood species database [15].

3 CONVOLUTIONAL NEURAL NETWORKS FOR IMAGE CLASSIFICATION

Convolutional neural networks are a special architecture of artificial neural networks that imitates features of the visual cortex of the brain [24]. They consist of several multidimensional layers and are designed for the effective recognition of complex images. Such a model was first proposed by Yann LeCun in 1989 and was intended for the recognition of handwritten symbols [25]. The novelty of the model proposed by LeCun was the introduction of convolution layers into the multilayer perceptron architecture, where each neuron is connected only to a small area of neurons of the previous layer. This makes it possible to detect primitive features of the original image and, in the subsequent layers of the network, to obtain more complex features and thus detect complex objects in images. [24]

3.1 Structure

The convolutional neural networks contain multidimensional layers as shown in Figure 8.

The typical layers of CNN are as follows [26]:

• Input layer: the input image.

• Convolution layer: unlike in a perceptron, each neuron of the layer is connected only to a part of the neurons of the previous layer.

• Sub-sampling layer (pooling): selects the most significant features of the previous layer and significantly reduces the dimension of the subsequent layers of the network.

• Fully-connected layer: a hidden layer of the perceptron type.

After a convolutional layer or a fully-connected layer, a neuron activation function can be applied, which converts the neuron signal and generates an output signal. [26]

Figure 8. Example of the typical CNN architecture. Specifically, this is LeNet-5 CNN designed for digits recognition. [27]

3.1.1 Input layer

For object recognition in images, the input layer is most often represented as a three-dimensional grid whose size is given by [26]:

$$I = W \cdot H \cdot D \tag{1}$$

where $I$ is the size of the input layer, $W$ is the width of the input image, $H$ is the height of the input image, and $D$ is the depth, i.e. the number of image channels.

3.1.2 Convolutional layer

The convolutional layer is one of the main layers of the network and is designed to extract and transform the features of the image. Deeper layers are then used to obtain more complex features and, finally, to determine the class of the recognized object. Layers of this type consist of so-called multidimensional filters, which are matrices of connection weights between the neurons of the previous layer and the neurons of the convolutional layer. They are called filters because the convolution operation is very similar to image filtering [26]. Each neuron signal of the previous layer that lies in the area covered by the filter kernel is multiplied by the corresponding value of the kernel, and the products are summed to form the signal of the convolutional neuron. The values of the filter kernel are called the connection weights of the convolutional layer.
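The multiply-and-sum operation described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the thesis: it assumes a single-channel image, stride 1, and no padding, and the function name `conv2d_valid` and the example values are hypothetical. As in most CNN libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of rows), stride 1, no padding."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Weighted sum of the area of the previous layer under the kernel.
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 4x4 input and a 2x2 kernel that responds to diagonal differences.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [1, 0, 1, 3],
]
kernel = [
    [1, 0],
    [0, -1],
]
feature_map = conv2d_valid(image, kernel)
# feature_map == [[-4.0, -4.0, 2.0], [-4.0, -4.0, 4.0], [7.0, 7.0, 6.0]]
```

A 4x4 input and a 2x2 kernel give a 3x3 feature map, which shows how each output neuron depends only on a small area of the previous layer.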


3.1.3 Sub-sampling layer

This layer of the CNN is used to reduce the dimensionality of the data in order to reduce the probability of rapid overfitting, as well as to reduce computing costs and memory consumption [26]. Usually this layer is applied after the convolution operation and converts the convolutional layer signals, keeping the most significant ones according to certain criteria. The layer uses a window of a certain size to select the area of neurons of the previous layer that is associated with one neuron of the sub-sampling layer. Then, according to certain rules, one signal of the sub-sampling neuron is selected or calculated. The most typical ways of forming the signal are either to take the maximum value among the signals of the previous layer within the sub-sampling window (max) or to calculate their mean value [26] [28]. An example of a sub-sampling layer is shown in Figure 9.

Figure 9. Example of the sub-sampling layer.
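The two typical sub-sampling rules can be sketched as follows. This is an illustrative example under simple assumptions (square, non-overlapping windows on a single-channel feature map); the function name `pool2d` and the values are hypothetical.

```python
def pool2d(feature_map, size, mode="max"):
    """Non-overlapping pooling with a square window of side `size`."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            # Signals of the previous layer that fall inside the window.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                out_row.append(max(window))
            elif mode == "mean":
                out_row.append(sum(window) / len(window))
            else:
                raise ValueError("mode must be 'max' or 'mean'")
        pooled.append(out_row)
    return pooled


fm = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
]
max_pooled = pool2d(fm, 2, "max")    # [[7, 8], [9, 6]]
mean_pooled = pool2d(fm, 2, "mean")  # [[4.0, 5.0], [4.5, 3.0]]
```

With a 2x2 window, the 4x4 feature map shrinks to 2x2, a fourfold reduction in the number of signals passed to the next layer.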

3.1.4 Fully-connected layer

This layer is one-dimensional, and each of its neurons is connected to every neuron of the previous layer, across all depth levels if the previous layer has a depth parameter. The main purpose of this layer is to convert the signals received from the convolutional levels of the network to a one-dimensional form [26]. This layer can also be used as the last (output) layer of the CNN, whose output is the probability that an input image belongs to a certain class. [26] The outline of this layer is shown in Figure 8.

The types of layers described in this subsection are the basic building blocks of a CNN architecture. However, they can be arranged in many ways, and their parameter values can vary widely. To form an approach to designing an optimal network architecture, it is necessary to analyze existing effective architectures and the approaches to their construction.

3.2 Learning algorithms

The training of an artificial neural network is the process of adjusting the values of the connection weights between the network neurons. In CNNs, supervised learning is used: the network output for a training sample is compared with the reference value of that sample, the error is calculated, and the connection weights are adjusted in order to reduce this error. Gradient descent with the backpropagation algorithm, or a modification of it, is usually used [29]. For fully-connected and convolutional layers, the error value of the output signal of the network is as follows:

$$E(w) = \sum_{i,k} (f_{ik} - y_{ik})^2 \tag{2}$$

where $E(w)$ is the error function of the network, $f_{ik}$ is the value of the output signal of the $k$th neuron of the network on submission of the $i$th sample from the training set, and $y_{ik}$ is the expected value of the output signal of the $k$th neuron of the network on submission of the $i$th sample from the training set. [27]
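As a small numeric illustration of Equation (2), the sketch below sums the squared differences over all samples $i$ and output neurons $k$. The function name `squared_error` and the example outputs are hypothetical and are not taken from the thesis experiments.

```python
def squared_error(outputs, targets):
    """E(w) = sum over samples i and output neurons k of (f_ik - y_ik)^2."""
    total = 0.0
    for f_i, y_i in zip(outputs, targets):
        for f_ik, y_ik in zip(f_i, y_i):
            total += (f_ik - y_ik) ** 2
    return total


# Two training samples, three output neurons (one per class).
outputs = [[0.5, 0.25, 0.25], [0.0, 1.0, 0.0]]   # network outputs f_ik
targets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]     # expected outputs y_ik
error = squared_error(outputs, targets)          # 0.25 + 0.0625 + 0.0625 = 0.375
```

The second sample is predicted exactly and contributes nothing, so the whole error comes from the first sample.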

For the sub-sampling layers, the calculation of the error is not performed because these layers do not participate in network training. When the maximum function is used in the sub-sampling layer (maximum value selection), error backpropagation proceeds as follows: the error from the layer located after the sub-sampling layer is passed directly to the convolutional layer located before the sub-sampling layer. When the mean function is used, the error from the network layer located after the sub-sampling layer is divided by the number of elements of the sub-sampling window and is passed to all the corresponding values of the convolutional layer. [29]
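The two backpropagation cases can be sketched for a single sub-sampling window. The text does not state which element receives the error in the maximum case; the sketch assumes the common convention of routing it to the position of the maximum. The function name `backprop_pool_window` is hypothetical.

```python
def backprop_pool_window(window, upstream_error, mode="max"):
    """Distribute the error of one pooled output over its input window.

    `window` is the flat list of inputs that produced the pooled value.
    """
    n = len(window)
    if mode == "max":
        # Assumption: only the element that was selected as the maximum
        # receives the error (first maximum if there is a tie).
        winner = window.index(max(window))
        return [upstream_error if i == winner else 0.0 for i in range(n)]
    if mode == "mean":
        # Every element contributed equally, so the error is shared equally.
        return [upstream_error / n] * n
    raise ValueError("mode must be 'max' or 'mean'")


window = [1.0, 7.0, 5.0, 3.0]
max_errors = backprop_pool_window(window, 1.0, "max")    # [0.0, 1.0, 0.0, 0.0]
mean_errors = backprop_pool_window(window, 1.0, "mean")  # [0.25, 0.25, 0.25, 0.25]
```

In both cases the errors passed back sum to the upstream error, and no weights are updated in the sub-sampling layer itself.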

3.3 Regularization of CNN

Overfitting is one of the problems of CNNs. It means that the model accurately classifies only the samples from the training set and fails to classify samples that did not participate in the training, i.e. it loses the ability to generalize. In recent years, many solutions to the problem of overfitting have been suggested, but one of them has become especially popular due to its simplicity and excellent practical results. This solution is the dropout method [30].

3.3.1 Dropout

The main idea of dropout is, instead of training one deep neural network, to train an ensemble of several networks and then average the results obtained. The networks for training are obtained by dropping out neurons with probability $p$, so the probability that a neuron remains in the network is

$$q = 1 - p \tag{3}$$

Dropping out a neuron means that for any input data or parameters it returns 0. Excluded neurons do not contribute to the training process at any stage of the backpropagation algorithm, so the exclusion of at least one of the neurons is equivalent to training a new neural network. [30] [31]
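A minimal sketch of the dropout mask is given below. It assumes that each neuron is dropped independently, and it omits the rescaling of the remaining activations that practical implementations usually apply; the function name `dropout` and the chosen values are hypothetical.

```python
import random


def dropout(activations, p, rng):
    """Set each activation to 0 independently with probability p."""
    return [0.0 if rng.random() < p else a for a in activations]


rng = random.Random(0)   # fixed seed so the sketch is repeatable
p = 0.5                  # dropout probability
q = 1 - p                # probability that a neuron remains, Equation (3)

activations = [1.0] * 10000
dropped = dropout(activations, p, rng)

# The fraction of surviving neurons should be close to q.
kept_fraction = sum(1 for a in dropped if a != 0.0) / len(dropped)
```

Each call with a new random mask selects a different sub-network, which is why training with dropout can be viewed as training an ensemble.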

3.4 Transfer learning

Modern CNNs contain a massive number of layers and parameters. These factors affect the training time: training a CNN from scratch takes a lot of time even with computational resources such as graphics processing units (GPU). This problem can be addressed with the concept of transfer learning [32]. The general idea of the concept is to retrain a model for another task while preserving the knowledge obtained from the initial training set. As applied to classifiers, the classification layer is replaced and the model is retrained to classify the new classes.

Transfer learning is applicable for convolutional neural networks since in the lower levels, the network identifies basic textural features common to most images of real objects.

Therefore, it is possible to take the model of a convolutional neural network, trained on a sufficiently large representative set of images and retrain it to recognize a new set of object classes. This approach gives a great advantage in the speed of training, compared to training from scratch on a large dataset.


At the moment, transfer learning is most often used for tasks in which it is difficult to collect a representative set large enough to train a network with a large number of parameters. For example, [33] considers the application of the concept to training a recognition model for object classification in X-ray images of luggage. The challenge is the insufficient size of the available training set, which does not allow a convolutional network to be trained well enough to learn discriminative features. The authors first train the model on a large training set of general images and then retrain it for the task of recognizing objects in the X-ray spectrum.

Yosinski et al. [34] illustrated that each hidden layer in a convolutional neural network has its own feature representation characteristics: the lower layers provide general feature extraction capabilities, whilst the higher layers contain information that is increasingly specific to the original classification task.

3.5 Existing architectures of convolutional neural networks

As mentioned above, the first proposed convolutional neural network was Yann LeCun's LeNet model, whose scheme is shown in Figure 8. It was the first model that contained two pairs of alternating convolution and sub-sampling layers followed by three fully-connected layers. In the following subsections, modern and state-of-the-art CNN architectures are listed and described.

3.5.1 AlexNet

As depicted in Figure 10, the network consists of eight layers: five convolutional layers and three fully-connected ones [26]. The output of the last fully-connected layer goes to a softmax layer (1000-way in the original network [26], three-way for the three wood species considered in this thesis) that produces a distribution over the classes. The network maximizes the multinomial logistic regression objective, which is equivalent to maximizing the average over the training samples of the log-probability of the correct label under the predicted distribution.
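The softmax output and the averaged log-probability objective can be sketched as follows. This is an illustrative example with three classes and hypothetical scores; the function names `softmax` and `mean_log_prob` are not taken from the thesis.

```python
import math


def softmax(logits):
    """Turn raw class scores into a probability distribution."""
    m = max(logits)                      # subtract the maximum for stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def mean_log_prob(batch_logits, labels):
    """Average log-probability of the correct label over a batch."""
    total = 0.0
    for logits, label in zip(batch_logits, labels):
        total += math.log(softmax(logits)[label])
    return total / len(labels)


# Three classes, e.g. fir, pine, and spruce; the scores are made up.
uniform = softmax([0.0, 0.0, 0.0])                   # each class gets 1/3
confident = softmax([5.0, 0.0, 0.0])                 # class 0 dominates
objective = mean_log_prob([[5.0, 0.0, 0.0]], [0])    # close to 0 from below
```

Training raises this objective towards 0, which corresponds to assigning probability close to 1 to the correct class of every training sample.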

In 2012, at the ImageNet Large Scale Visual Recognition Competition (ILSVRC), AlexNet reached a top-5 error of 15.31% in image classification [35]. The top-5 error rate is the fraction of test samples for which the correct label is not among the five labels considered most probable by the model. For comparison, a method that does not use convolutional neural networks achieved an error of 26.1%.
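The definition of the top-5 error can be made concrete with a short sketch. The function name `top_k_error` and the class scores are hypothetical; seven classes are used only to keep the example small.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is not among the k best scores."""
    misses = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted from the most to the least probable.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


scores = [
    [0.05, 0.30, 0.10, 0.20, 0.15, 0.12, 0.08],  # true class 1: ranked 1st
    [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02],  # true class 6: ranked 7th
    [0.10, 0.40, 0.20, 0.10, 0.10, 0.05, 0.05],  # true class 2: ranked 2nd
    [0.02, 0.05, 0.08, 0.10, 0.20, 0.25, 0.30],  # true class 3: ranked 4th
]
labels = [1, 6, 2, 3]

top5 = top_k_error(scores, labels, k=5)  # 0.25: only the second sample misses
top1 = top_k_error(scores, labels, k=1)  # 0.75: only the first sample is a hit
```

The top-5 error is never larger than the top-1 error, because a correct first guess is always also among the five best guesses.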

Figure 10. AlexNet CNN architecture. [26]

3.5.2 VGG

VGG (Visual Geometry Group) is a CNN model proposed in [36] in 2014. This network does not use filters larger than 3x3. The authors showed that a layer with a 7x7 filter has the same receptive field as three stacked layers with 3x3 filters. Similarly, a layer with a 5x5 filter can be replaced by two layers with 3x3 filters, which saves 28% of the weights (18 instead of 25 per pair of input and output channels), as shown in Figure 11. The full architecture of the VGG-16 CNN is shown in Figure 12.

Figure 11. 5x5 kernel decomposition [36].
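The kernel decomposition can be checked with a short count. The sketch assumes stride-1 convolutions with the same number of input and output channels and ignores biases; the function names and the channel count of 64 are illustrative.

```python
def conv_weights(kernel_size, channels, layers=1):
    """Weights in `layers` stacked k x k convolutions (no biases),
    each with `channels` input and output channels."""
    return layers * kernel_size * kernel_size * channels * channels


def receptive_field(kernel_size, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return 1 + layers * (kernel_size - 1)


C = 64  # illustrative channel count

single_5x5 = conv_weights(5, C)              # 25 * C * C
two_3x3 = conv_weights(3, C, layers=2)       # 18 * C * C
saving_5 = 1 - two_3x3 / single_5x5          # 1 - 18/25 = 0.28

single_7x7 = conv_weights(7, C)              # 49 * C * C
three_3x3 = conv_weights(3, C, layers=3)     # 27 * C * C
saving_7 = 1 - three_3x3 / single_7x7        # 1 - 27/49, about 0.45
```

Under these assumptions, two 3x3 layers cover the same 5x5 area with 18 instead of 25 weights per channel pair, and three 3x3 layers cover a 7x7 area with 27 instead of 49.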

At the ILSVRC 2014 competition, an ensemble of two VGG networks (VGG-16 and VGG-19) received a top-5 error of 7.3%. Although these models did not win the competition, they are commonly used for object detection [37] [38], semantic segmentation [39], or masking of objects [40] due to their simplicity.

3.5.3 GoogLeNet

The creators of the GoogLeNet CNN, led by Christian Szegedy, started from the observation that after each layer of the network a choice has to be made whether the next layer is a convolution with a 3x3, 5x5, or 1x1 filter or a sub-sampling layer [41]. Each of these layers is useful: a 1x1 filter reveals the correlation between channels, larger filters respond to more global features, and the sub-sampling layer makes it possible to reduce the dimension without large losses of information. Instead of choosing which layer should come next, it is proposed to use all of them at once, in parallel, and then to combine the results into one output. To avoid growth in the number of parameters, a 1x1 convolution is added before each of the larger convolutions. Such a block of layers is called the inception module. It is shown in Figure 13. The architecture of this network is shown in Figure 14.

Figure 12. VGG-16 CNN architecture [36].

Figure 13. GoogLeNet inception module [41].
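The effect of the 1x1 convolutions on the number of parameters can be illustrated by counting weights in one module. The sketch below is only an illustration under stated assumptions: biases are ignored, the 1x1 reductions are placed before the 3x3 and 5x5 convolutions and a 1x1 projection follows the pooling branch, and the channel counts are example values chosen for this sketch. All function names are hypothetical.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Weights of one convolution layer (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def naive_module(c_in, n1, n3, n5):
    """Parallel 1x1, 3x3 and 5x5 convolutions applied directly to the input."""
    return (conv_weights(1, c_in, n1)
            + conv_weights(3, c_in, n3)
            + conv_weights(5, c_in, n5))


def reduced_module(c_in, n1, r3, n3, r5, n5, pool_proj):
    """Same branches with 1x1 reductions before the 3x3 and 5x5 convolutions
    and a 1x1 projection after the pooling branch."""
    return (conv_weights(1, c_in, n1)
            + conv_weights(1, c_in, r3) + conv_weights(3, r3, n3)
            + conv_weights(1, c_in, r5) + conv_weights(5, r5, n5)
            + conv_weights(1, c_in, pool_proj))


# Example channel counts (assumed for illustration only).
naive = naive_module(c_in=192, n1=64, n3=128, n5=32)
reduced = reduced_module(c_in=192, n1=64, r3=96, n3=128,
                         r5=16, n5=32, pool_proj=32)
print(naive, reduced)
```

With these example values the module with reductions needs fewer than half of the weights of the naive module, even though it contains more layers.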

Figure 14. GoogLeNet CNN architecture [41].

3.5.4 Deep Residual Network

The winner of ILSVRC 2015, with a top-5 error of 3.57%, was an ensemble of six networks including the Residual network (ResNet), developed at Microsoft Research [42]. The authors of ResNet noticed that as the number of layers increases, the neural network begins to degrade: its accuracy decreases not only on the validation set but also on the training set. It can therefore be concluded that the problem is not overfitting of the network. It was suggested that if the convolutional neural network has reached its accuracy limit at some layer, then all the following layers should degenerate into identity mappings, but because of the difficulty of training deep networks this does not occur. In order to "help" the network, it was proposed to introduce shortcut connections, shown in Figure 15. The analysis in [43] [44] showed that a deep residual neural network can be considered as an ensemble of smaller residual neural networks whose effective depth increases during training. Figure 16 presents the architecture of a 34-layer residual network.

Figure 15. Shortcut connection [42].
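The idea of a shortcut connection can be sketched in a few lines. This is a simplified illustration, assuming fully connected layers instead of convolutions and no batch normalization; the function `residual_block` and the weight shapes are hypothetical and do not reproduce the architecture in [42].

```python
import numpy as np


def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F(x) = w2 @ relu(w1 @ x)."""
    residual = w2 @ relu(w1 @ x)
    return relu(residual + x)


rng = np.random.default_rng(0)
x = relu(rng.normal(size=8))   # non-negative input, as after a previous ReLU
w_zero = np.zeros((8, 8))

# With zero weights the residual branch vanishes, so the block passes
# its input through unchanged.
y = residual_block(x, w_zero, w_zero)
```

The example shows why the shortcut helps: to behave as an identity mapping, the block only has to drive the weights of the residual branch towards zero instead of learning the identity through a stack of nonlinear layers.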

Figure 16. ResNet architecture [42].

4 PROPOSED METHOD

The following method for species identification is proposed in this study. Image patches are extracted from the images of the upper and lower sides of the board. A single image patch is a square image extracted from the board, excluding the background.

The individual image patches are then identified using a CNN. The decision rule combines the decisions about the identified image patches into the final decision about the wood species of the whole board. All steps of the method are illustrated in Figure 17.

Figure 17. Proposed method for wood species identification.

4.1 Image patch extraction

The goal of the process is to form a set of square images (patches) that can be classified using a CNN. The resulting images should include only the part in which the wood is depicted. As illustrated in Figure 18, the process includes the following steps: 1) the image is converted to grayscale; 2) the grayscale image is converted to a binary image in which regions of interest are searched for. The largest region represents the board. The rotation angle of the board is then compensated, and the image of the board is extracted according to the bounding box of the found region of interest; 3) the image patches are extracted, with the background parts removed.

Figure 18. Patch extraction process.
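The extraction steps can be sketched as follows. This is a simplified illustration and not the implementation used in this work: it assumes a single bright board on a dark background, uses a fixed threshold, takes the bounding box of all foreground pixels instead of the largest connected region, and omits the rotation compensation. The function name, the patch size, and the threshold value are hypothetical.

```python
import numpy as np


def extract_patches(image, patch_size=64, threshold=0.2):
    """Return square patches that contain no background pixels.

    image: float array of shape (H, W, 3) with values in [0, 1].
    """
    gray = image.mean(axis=2)                 # 1) gray-scaling
    mask = gray > threshold                   # 2) binary image (board = True)
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return []
    top, bottom = rows[0], rows[-1] + 1       # bounding box of the foreground
    left, right = cols[0], cols[-1] + 1

    patches = []
    for r in range(top, bottom - patch_size + 1, patch_size):
        for c in range(left, right - patch_size + 1, patch_size):
            # 3) keep a patch only if every pixel belongs to the board
            if mask[r:r + patch_size, c:c + patch_size].all():
                patches.append(image[r:r + patch_size, c:c + patch_size])
    return patches


# Synthetic example: dark background with a bright 128x320 "board".
demo = np.zeros((200, 400, 3))
demo[30:158, 40:360] = 0.8
demo_patches = extract_patches(demo)
```

Rejecting every patch that touches the background is the part of the sketch that corresponds to the requirement that a patch shows only wood; knots, defects, or spots on the background would need a more robust segmentation than a fixed threshold.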

4.2 Decision rule

Image patches are classified using a CNN, which produces multiple identification results for each board. A decision rule is used to combine the patch identification results into one final decision about the board. In this part, several combination rules are considered. The whole board is represented as a pattern $Z$ of patches to be assigned to one of the possible classes $w_1, w_2, \ldots, w_m$. Here $R$ is the number of image patches from one board, $x_i$ is the $i$-th image patch, $m$ is the total number of classes, $P(w_k)$ is the a priori probability of class $w_k$, $P(w_k|x_i)$ is the a posteriori probability, and $\delta_{ki}$ is a binary-valued function used when combining decision outcomes rather than a posteriori probabilities.

4.2.1 Majority vote rule

The rule assigns a pattern to the class that receives the largest number of votes from the individual patches [45]. Assign $Z \to w_j$ if

$$\sum_{i=1}^{R} \delta_{ji} = \max_{k=1}^{m} \sum_{i=1}^{R} \delta_{ki} \tag{4}$$

The hardening of the a posteriori probabilities $P(w_k|x_i)$ to produce the binary-valued functions $\delta_{ki}$ is defined as

$$\delta_{ki} = \begin{cases} 1 & \text{if } P(w_k|x_i) = \max_{j=1}^{m} P(w_j|x_i) \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

For each class $w_k$, the sum in Eq. 4 simply counts the votes from the classifier. The class which gets the largest number of votes is then selected as the final decision [45].

4.2.2 Mean probability rule

The rule assigns a pattern to the class for which the mean of the a posteriori probabilities is maximum [45]. Assign $Z \to w_j$ if

$$\frac{1}{R} \sum_{i=1}^{R} P(w_j|x_i) = \max_{k=1}^{m} \frac{1}{R} \sum_{i=1}^{R} P(w_k|x_i) \tag{6}$$

4.2.3 Max rule

This decision rule approximates the sum of the a posteriori probabilities by their maximum [45]. Assign $Z \to w_j$ if

$$(1-R)P(w_j) + R \max_{i=1}^{R} P(w_j|x_i) = \max_{k=1}^{m} \left[ (1-R)P(w_k) + R \max_{i=1}^{R} P(w_k|x_i) \right] \tag{7}$$

which, under the assumption of equal priors, simplifies to: assign $Z \to w_j$ if

$$\max_{i=1}^{R} P(w_j|x_i) = \max_{k=1}^{m} \max_{i=1}^{R} P(w_k|x_i) \tag{8}$$
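The three decision rules can be sketched on a matrix of patch posteriors. This is a minimal illustration assuming equal priors and a NumPy array `probs` of shape (R, m) whose row i holds the a posteriori probabilities of patch i; the function names and the example values are hypothetical. Ties are resolved in favour of the class with the lowest index, which is a choice of this sketch and not of [45].

```python
import numpy as np


def majority_vote(probs):
    """Eq. 4-5: harden each patch posterior to a vote and count the votes."""
    votes = np.bincount(probs.argmax(axis=1), minlength=probs.shape[1])
    return int(votes.argmax())


def mean_probability(probs):
    """Eq. 6: class with the largest mean posterior over the patches."""
    return int(probs.mean(axis=0).argmax())


def max_rule(probs):
    """Eq. 8: class with the largest single-patch posterior (equal priors)."""
    return int(probs.max(axis=0).argmax())


# Hypothetical posteriors for R = 4 patches and m = 3 classes.
probs = np.array([
    [0.50, 0.40, 0.10],
    [0.45, 0.40, 0.15],
    [0.40, 0.35, 0.25],
    [0.02, 0.97, 0.01],
])
print(majority_vote(probs), mean_probability(probs), max_rule(probs))
```

In this example three patches vote weakly for class 0 while one patch is very confident about class 1, so the majority vote selects class 0 and the other two rules select class 1. The rules only disagree in such borderline cases; when most patches are classified confidently, they return the same class.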


5 EXPERIMENTS AND RESULTS

5.1 Data

The data used in this work consist of images of the upper and lower sides of the boards. Each side of a board was imaged with cameras located along the board, so the whole board is covered by several such images. Figure 19 shows an example of an original image of the lower side of a board from one of the cameras.

Figure 19. Example of the original image.

5.1.1 Data preparation

The first preparation step is to extract the patches from the original images. A high quality image patch is a properly extracted square image that does not contain any part of the background. Figure 20 shows examples of such image patches extracted from the image shown in Figure 19. However, image patches are not always accurately extracted. The reasons for this are defects or knots on the boards as well as the presence of spots on the background. Therefore, two datasets were prepared: all image patches and only the high quality ones. Figure 21 shows examples of inaccurately extracted image patches.

Figure 20. Example of high quality extracted image patches.

Figure 21. Example of inaccurately extracted image patches.

Table 2 shows the numbers of boards and extracted image patches for each wood species. The total number of image patches extracted from all boards was 255724, including 242938 high quality patches. On average, approximately 250 image patches were extracted from each board.

Table 2. Number of boards and extracted image patches for each wood species.

Wood species   Number of boards   High quality image patches   All image patches
Fir            455                99136                        104353
Pine           256                55777                        58713
Spruce         404                88024                        92656
Total          1115               242938                       255724

Table 3 shows the numbers of boards used in the training, validation, and test sets for each wood species. The total number of boards is 1115. The boards are split into training, validation, and test sets in the proportion 70/15/15, respectively.

Table 3. Number of boards for the training, validation, and test sets.

Wood species   Training set   Validation set   Test set
Fir            318            65               72
Pine           178            35               43
Spruce         282            59               62
Total          778            159              177

5.2 Implementation

All experiments were performed using Python 2.7 [46] with the deep learning framework Caffe [47], which uses CUDA 9.0 and the cuDNN 7.1 deep learning library [48]. MATLAB [49] was used for data preparation and for splitting the data into training, validation, and test sets. All experiments were performed on a server with the following computational resources: two NVIDIA GeForce GTX TITAN Black GPUs, an Intel Xeon E5-2680 CPU, and 128 gigabytes of random access memory.

Transfer learning with ImageNet-pretrained models was used to fine-tune the following convolutional neural networks: VGG-16, GoogLeNet, and ResNet-50. The AlexNet CNN was trained from scratch. A dropout rate of 0.5 was used for CNN regularization to avoid overfitting.

5.3 Evaluation Criteria

In order to estimate and compare the performance of the models with different architec- tures, the following metrics and characteristics were used:

• Accuracy

The accuracy is the basic metric of the model performance, defined as

$$ACC = \frac{T}{N} \tag{9}$$

where $T$ is the number of correctly identified samples and $N$ is the total number of samples.

• Confusion Matrix

A confusion matrix shows the number of correct and incorrect predictions of species made by the classification model compared to the real values in the data.

• F1 Score

F1 Score is a joint assessment of Positive Predictive Value (PPV) and True Positive Rate (TPR). This metric is calculated by the following formula:

$$F_1 = 2 \cdot \frac{PPV \cdot TPR}{PPV + TPR} \tag{10}$$

where PPV and TPR are defined as follows:

$$PPV = \frac{TP}{TP + FP} \tag{11}$$

$$TPR = \frac{TP}{TP + FN} \tag{12}$$

where, for a given wood species, True Positive (TP) is the number of image patches correctly assigned to that species, False Positive (FP) is the number of image patches incorrectly assigned to that species, and False Negative (FN) is the number of image patches of that species that were missed, that is, assigned to another species.

• The number of parameters of CNN

The total number of parameters $N$ in a specific model. This is an important characteristic because it affects the computational speed.

• The training time

The training time $\tau_T$ is the total amount of time in seconds spent on the model training.

• The inference time

The inference time τI is the amount of time in seconds spent to classify one test sample.

• The inference time for one board

The inference time $\tau_B$ is the amount of time in seconds needed to process one board.
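The accuracy and the F1 score defined above can both be computed from a confusion matrix. The sketch below is a minimal illustration, assuming a matrix whose rows are true classes and whose columns are predicted classes; the function names and the example matrix are hypothetical and are not results of this work. How the per-class F1 scores are aggregated into a single number is not specified here, so the sketch returns one value per class and does not handle classes that are never predicted.

```python
import numpy as np


def accuracy(cm):
    """ACC = T / N from a confusion matrix (rows: true, columns: predicted)."""
    return np.trace(cm) / cm.sum()


def f1_per_class(cm):
    """F1 score of each class computed from PPV and TPR (Eq. 10-12)."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp      # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp      # belong to the class, but missed
    ppv = tp / (tp + fp)
    tpr = tp / (tp + fn)
    return 2 * ppv * tpr / (ppv + tpr)


# Hypothetical confusion matrix for fir, pine and spruce.
cm = np.array([
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
])
print(accuracy(cm), f1_per_class(cm))
```

The per-class view is useful here because the accuracy alone does not show which species are confused with each other.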

5.4 Single image patches identification performance

5.4.1 High quality image patches

Table 4 compares the classification performance of the architectures in the identification of single image patches. Figure 22 shows the confusion matrices of all models used in the single image patch wood species identification experiments.

Table 4. Performance comparison of different architectures in case of high quality image patches.

Architecture   ACC      F1 Score   N           τT (s)   τI (s)
AlexNet        0.923    0.92       56870944    16218    0.02
VGG-16         0.819    0.681      134260416   47987    0.04
GoogLeNet      0.961    0.95       5969344     25933    0.02
ResNet-50      0.7022   0.675      23514176    50663    0.05

Figure 22. Confusion matrices of classification models in case of single high quality image patches identification.

As can be seen from the confusion matrices, the identification performance varies between architectures. For VGG-16, there are many misclassifications between pine and spruce when the true label is pine. Fir and spruce are also confused when the true label is fir. The spruce class is predicted more accurately. ResNet-50 shows the opposite: its misclassifications are concentrated in the spruce class, which is confused with the pine class. AlexNet and GoogLeNet give more accurate predictions for all the classes without strong confusion.

5.4.2 All image patches

The same experiment was performed with the dataset including all patches. Table 5 shows the comparison between the architectures classification performance in case of the single image patches identification. Figure 23 includes the confusion matrices of the models.

The experiment shows that using all the image patches affects VGG-16 negatively: the accuracy of the model is reduced. For AlexNet and GoogLeNet, the decrease in performance is much smaller. The confusions of all the models are largely the same as in the experiment with high quality image patches.

Table 5. Performance comparison of different architectures in case of all image patches.

Architecture   ACC     F1 Score
AlexNet        0.915   0.915
VGG-16         0.687   0.681
GoogLeNet      0.947   0.94
ResNet-50      0.715   0.685

Figure 23. Confusion matrices of classification models in case of single image patch identification using all image patches.

Figures 24, 25, 26, and 27 illustrate the image patch classification performance. In all the figures, the correct class is spruce for all the image patches.

Figure 24. Examples of predicted classes by AlexNet of the example board’s patches in case of spruce wood species.

Figure 25. Examples of predicted classes by VGG-16 of the example board’s patches in case of spruce wood species.


Figure 26. Examples of predicted classes by GoogLeNet of the example board’s patches in case of spruce wood species.

Figure 27. Examples of predicted classes by ResNet-50 of the example board’s patches in case of spruce wood species.


5.5 Board identification performance

5.5.1 High quality image patches

Table 6 shows the performance of wood species identification for whole boards using different decision rules. It includes the accuracy obtained with each decision rule and the average time $\tau_B$ needed to process one board. Figure 28 shows the confusion matrices of the board identification experiments.

Table 6. Performance comparison of different architectures in case of board wood species identification. The rows in bold are the architectures with the best performance.

Architecture   ACC Majority Vote   ACC Mean Probability   ACC Max Rule   τB (s)
AlexNet        0.988               0.988                  0.988          2.64
VGG-16         0.70                0.706                  0.706          7.18
GoogLeNet      0.994               0.994                  0.994          4.77
ResNet-50      0.761               0.711                  0.711          9.49

Figure 28. Confusion matrices of classification models in case of board identification.

5.5.2 All image patches

Table 7 shows the performance of wood species identification for whole boards with the dataset that includes all image patches. Figure 29 shows the confusion matrices of the models used in the experiment.

Table 7. Performance comparison of different architectures in case of board wood species identification using all image patches. The rows in bold are the architectures with the best performance.

Architecture   ACC Majority Vote   ACC Mean Probability   ACC Max Rule
AlexNet        0.989               0.989                  0.989
VGG-16         0.721               0.721                  0.721
GoogLeNet      0.994               0.994                  0.994
ResNet-50      0.783               0.764                  0.764

Figure 29. Confusion matrices of classification models in case of board identification.

The board identification experiments show high accuracy with AlexNet and GoogLeNet. The confusions of the models are largely the same as in the single image patch identification experiments. The use of all image patches increases the accuracy of the VGG-16 and ResNet-50 models and has no notable effect on the performance of the AlexNet and GoogLeNet models.

5.6 Influence of number of image patches on board identification accuracy

The processing of one board takes a long time, even with the architectures that have fewer parameters. A comparison was made to determine how the number of patches used per board affects the accuracy of the identification of the entire board. The experiment was conducted on the same test set as in the previous experiments, consisting of 177 boards. Figure 30 shows the mean accuracy along the y axis and the number of used image patches along the x axis. The total number of image patches extracted from one board is 250 on average. The experiment was performed for 50 values of the number of used patches. For each value, the experiment was repeated 100 times and the mean of the obtained accuracies was computed.

The experiment was performed with the dataset containing all image patches. It showed that, for each architecture, using only 25 image patches per board is sufficient to obtain approximately the same accuracy as with all patches. Table 8 shows the accuracy and the inference time of board identification when only 25 image patches per board are used.

Figure 30. Influence of the number of used image patches on the board identification accuracy. The figure indicates the mean accuracy that can be obtained. The total number of image patches extracted from one board is 250 on average.
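The procedure of this experiment can be sketched as follows. This is an illustrative sketch and not the code used in this work: it assumes that the patch posteriors of every board are already available as NumPy arrays, that patches are drawn uniformly at random without replacement, and that the majority vote rule is used for the board decision. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def board_label(probs):
    """Majority vote over the patches of one board."""
    votes = np.bincount(probs.argmax(axis=1), minlength=probs.shape[1])
    return int(votes.argmax())


def mean_accuracy(boards, labels, num_patches, repeats=100, seed=0):
    """Mean board accuracy when only num_patches random patches are used."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(repeats):
        correct = 0
        for probs, label in zip(boards, labels):
            k = min(num_patches, len(probs))
            idx = rng.choice(len(probs), size=k, replace=False)
            correct += board_label(probs[idx]) == label
        accuracies.append(correct / len(boards))
    return float(np.mean(accuracies))


# Synthetic example: two boards with 250 patches each, all classified
# confidently and correctly.
boards = [np.tile([0.8, 0.1, 0.1], (250, 1)),
          np.tile([0.1, 0.8, 0.1], (250, 1))]
labels = [0, 1]
acc_25 = mean_accuracy(boards, labels, num_patches=25)
```

Repeating the random draw and averaging, as in the sketch, removes the dependence of the result on which particular patches happen to be selected.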

Table 8. The accuracy and the inference time when only 25 image patches per board are used.

Architecture   ACC     τB (s)
AlexNet        0.986   0.26
VGG-16         0.721   0.72
GoogLeNet      0.994   0.47
ResNet-50      0.780   0.95


6 DISCUSSION

Wood species identification is an important part of the sawmilling process. Accurate identification of the species of wooden material makes it possible to carry out the technological process of the sawmill properly. According to the literature review, manual determination of wood species is a laborious and long procedure that requires an experienced specialist. This is a strong motivation for the development of an automated approach. The review also examined modern approaches based on computer vision, including those using convolutional neural networks.

In this work, a method for wood species identification was proposed. The method includes the following steps: extraction of image patches, wood species identification of each image patch, and combination of the identification results into the final decision about the wood species of the whole board.

Based on the results of the work, it can be concluded that, despite the large number of different parameters in a CNN architecture, architecture templates can be selected and their parameters differentiated in order to highlight the most significant ones. The choice of model should therefore be guided by these highlighted features of the CNN architecture, and the most accurate model that is robust to changes in the processed data should be chosen for classification.

Based on the experimental results, the best performance is shown by the simpler networks with a smaller number of layers or parameters, such as AlexNet and GoogLeNet. The accuracies of these models were 92.3% and 96.1%, respectively, in the identification of single high quality image patches. The highest accuracy in board-level wood species identification was obtained by the same CNN architectures, AlexNet and GoogLeNet, with accuracies of 98.8% and 99.4%, respectively. In this particular type of task, a simple architecture gives more accurate results and can distinguish between images of different wood species. Simpler networks such as AlexNet and GoogLeNet therefore appear to be more suitable for the wood species identification task. A presumable reason is that a CNN with a small number of parameters needs less training time to learn the significant differences between the wood species classes, which in turn helps to avoid overfitting.

Also, the experiments show that the inference time for one board can be shortened by decreasing the number of patches used for identification without affecting the identification accuracy. An accuracy of 99.4% can be obtained using GoogLeNet with only 25 image patches. This means that only 10% of the total number of image patches is needed to identify one board. In that case, the inference time for one board was 0.47 seconds.

6.1 Future Work

Despite the accurate identification performance, only three wood species were used in this work. In future work, it makes sense to evaluate the model in the presence of other wood species. The inference speed is also important for application in an industrial environment. It is worth considering the possibility of accelerating the system up to real-time recognition, for example, by parallelizing the computations. Another main factor is the quality of the input images. It should be studied how the resolution of the image patches affects the identification accuracy. Such measurements can show which camera parameters and lighting conditions are more suitable for accurate wood species identification.

7 CONCLUSION

In this thesis, a method for wood species identification was proposed. The proposed method includes the extraction of patches, their subsequent identification, and the application of a decision rule to make a decision about the wood species of the whole board. In the experimental part of the work, the following convolutional neural network architectures were implemented and tested: AlexNet, VGG-16, GoogLeNet, and ResNet-50. The experiments were carried out on two datasets: all image patches and only the high quality ones. The experiments showed that selecting only high quality patches has little impact on single patch identification.

In the identification of individual patches, the highest accuracies were obtained by the AlexNet and GoogLeNet architectures, 92.3% and 96.1%, respectively. In board wood species identification, the highest accuracy was 99.4%, achieved by GoogLeNet. The experiments also showed that the inference time for board identification with an accuracy of 99.4% can be shortened to 0.47 seconds by using only 10% of the total number of image patches of each board.

REFERENCES

[1] Gullichsen J. and Paulapuro H. Forest Products Chemistry. Fapet Oy, 2000.

[2] Gullichsen J. and Paulapuro H. Pulp and Paper Testing. Fapet Oy, 1999.

[3] Ibrahim I., Khairuddin A. S. M., Talip M. S. A., Arof H., and Yusof R. Tree species recognition system based on macroscopic image analysis. Wood Science and Technology, 51(2):431–444, 2017.

[4] Digisaw project. http://www2.it.lut.fi/project/digisaw/index.shtml. Accessed: 2018-05-24.

[5] Timar M.C., Gurau L., and Porojan M. Wood species identification, a challenge of scientific conservation. International Journal of Conservation Science, 3(1):11–22, 2012.

[6] Elisabeth A. Wheeler and Pieter Baas. Wood identification - a review. The International Association of Wood Anatomists, 19(3):241–264, 1998.

[7] Menon P.K.B. Structure and Identification of Malayan Woods. Forest Research Institute, Malaysia, 1993.

[8] Khalid M., Lee E.L.Y., Yusof R., and Nadaraj M. Design of an intelligent wood species recognition system. International Journal of Simulation Systems, Science and Technology, 9(3), 2008.

[9] Conners R. W., Kline D. E., Araman P. A., and Drayer T. H. Machine vision technology for the forest product industry. Computer, 30(7):43–48, 1997.

[10] Tou J. Y., Lau P. Y., and Tay Y. H. Computer vision-based wood recognition system. In International Workshop On Advanced Image Technology, 2007.

[11] Tou J.Y., Tay Y.H., and Lau P.Y. One-dimensional grey-level cooccurrence matrices for texture classification. In International Symposium On Information Technology, 2008.

[12] Tou J.Y., Tay Y.H., and Lau P.Y. A comparative study for texture classification techniques on wood species recognition problem. In Fifth International Conference on Natural Computation, 2009.

[13] Paula Filho P.L., Oliveira L.S., Nisgoski S., and Britto Jr. A.S. Forest species recognition using macroscopic image. Machine Vision and Applications, 25:1019–1031, 2014.

[14] Hafemann L.G., Oliveira L.S., and Cavalin. Forest species recognition using deep convolutional neural networks. In International Conference on Pattern Recognition, 2014.

[15] Martins J.G., Oliveira L.S., Nisgoski S., and Sabourin R. A database for automatic classification of forest species. Machine Vision and Applications, 24(3):567–578, 2013.

[16] Yadav A.R., Dewal M.L., Anand R.S., and Gupta S. Classification of hardwood species using ANN classifier. In Fourth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, 2013.

[17] Tsuchikawa S., Hirashima Y., Sasaki Y., and Ando K. Near-infrared spectroscopic study of the physical and mechanical properties of wood with meso- and micro-scale anatomical observation. Applied Spectroscopy, 59(1):86–93, 2005.

[18] Nuopponen M.H., Birch G.M., Sykes R.J., Lee S.J., and Stewart D.J. Estimation of wood density and chemical composition by means of diffuse reflectance mid-infrared Fourier transform (DRIFT-MIR) spectroscopy. Journal of Agricultural and Food Chemistry, 54(1):34–40, 2006.

[19] Orton C.R., Parkinson D.Y., and Evans P.D. Fourier transform infrared studies of heterogeneity, photodegradation, and lignin/hemicellulose ratios within hardwoods and softwoods. Applied Spectroscopy, 583(1):1265–1271, 2004.

[20] Piuri V. and Scotti F. Design of an automatic wood types classification system by using fluorescence spectra. IEEE Transactions on Systems, Man, and Cybernetics, 40(3):358–366, 2010.

[21] Jordan R., Feeney F., Nesbitt N., and Evertsen J.A. Classification of wood species by neural network analysis of ultrasonic signals. Ultrasonics, 36(1-5):219–222, 1998.

[22] Krizhevsky A. Learning multiple layers of features from tiny images. Master's thesis, Department of Computer Science, University of Toronto, 2009.

[23] F. H. C. Tivive and A. Bouzerdoum. Texture classification using convolutional neural networks. In Proceedings of TENCON 2006 - 2006 IEEE Region 10 Conference, pages 1–4, Nov 2006.

[24] Yann LeCun and Yoshua Bengio. Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks, 3361(10), 1995.

[25] Y. LeCun. Generalization and network design strategies. A technical report, 1989.

[26] Krizhevsky A., Sutskever I., and Hinton G. E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 52(128):873–879, 2009.

[27] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[28] Aravindh Mahendran and Andrea Vedaldi. Visualizing deep convolutional neural networks using natural pre-images. International Journal of Computer Vision (IJCV), 120(3):233–255, 2016.

[29] Thibaut Durand, Nicolas Thome, and Matthieu Cord. Weakly supervised learning of deep convolutional neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, pages 4743–4752, 2016.

[30] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580, 2012.

[31] Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level convolutional networks for text classification. In Proceedings of the 28th International Conference on Neural Information Processing Systems, volume 1, pages 649–657, Cambridge, MA, USA, 2015. MIT Press.

[32] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, October 2010.

[33] S. Akçay, M. E. Kundegorski, M. Devereux, and T. P. Breckon. Transfer learning using convolutional neural networks for object classification within x-ray baggage security imagery. In Proceedings of IEEE International Conference on Image Processing (ICIP), pages 1057–1061, Sept 2016.

[34] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In Proceedings of the 27th International Conference on Neural Information Processing Systems, volume 2, pages 3320–3328, Cambridge, MA, USA, 2014. MIT Press.

[35] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.

[36] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.

[37] R. Girshick. Fast R-CNN. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 1440–1448, Dec 2015.

[38] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6):1137–1149, June 2017.

[39] H. Noh, S. Hong, and B. Han. Learning deconvolution network for semantic segmentation. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 1520–1528, Dec 2015.

[40] Pedro H. O. Pinheiro, Ronan Collobert, and Piotr Dollár. Learning to segment object candidates. CoRR, abs/1506.06204, 2015.

[41] Szegedy C., Liu W., Jia Y., Sermanet P., Reed S., Anguelov D., Erhan D., Vanhoucke V., and Rabinovich A. Going deeper with convolutions. In IEEE Conference Computer Vision and Pattern Recognition, 2015.

[42] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, June 2016.

[43] Andreas Veit, Michael J. Wilber, and Serge J. Belongie. Residual networks are exponential ensembles of relatively shallow networks. CoRR, abs/1605.06431, 2016.

[44] Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. Google's neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144, 2016.

[45] Josef Kittler, Mohamad Hatef, Robert P. W. Duin, and Jiri Matas. On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3):226–239, March 1998.

[46] Guido van Rossum and Python Development Team. Python 2.7.10 Language Reference. Samurai Media Limited, United Kingdom, 2015.

[47] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia, pages 675–678, New York, NY, USA, 2014. ACM.

[48] Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. cuDNN: Efficient primitives for deep learning. CoRR, abs/1410.0759, 2014.

[49] MATLAB. version 9.3.0 (R2017b). The MathWorks Inc., Natick, Massachusetts, 2017.
