
4.2 Training Models


$y_i = \gamma \hat{w}_i + \beta$. (4.13)

Batch normalization stabilizes the distribution of layer inputs, which makes the optimization of the model faster than usual. Hence, the model saves training time where resources are limited due to practical constraints.
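As a minimal illustration of the scale-and-shift step in Equation 4.13, the following NumPy sketch normalizes a batch of activations and then applies the learnable parameters γ and β. The variable names and the epsilon value are assumptions, not taken from the thesis.

    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        """Normalize a batch of activations, then scale by gamma and
        shift by beta (Equation 4.13). x has shape (batch, features)."""
        mean = x.mean(axis=0)                    # per-feature batch mean
        var = x.var(axis=0)                      # per-feature batch variance
        w_hat = (x - mean) / np.sqrt(var + eps)  # normalized activations
        return gamma * w_hat + beta              # y_i = gamma * w_hat_i + beta

    # Example: a batch of 4 samples with 3 features each
    x = np.random.randn(4, 3)
    y = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))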

4.2.2 Model Definitions

In this section, the experimental models are defined. There are nine models in total: three architectures (CNN, RNN, and LSTM), each applied to a different representation of the data. The RNN and LSTM models are relatively simple compared to the CNN models. All models are optimized with Adam, and the loss function is cross-entropy.
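All nine models therefore share the same training setup. A minimal sketch of that setup is given below, assuming the Keras API; the thesis does not state its framework, so the library choice and the learning rate are assumptions.

    from tensorflow import keras

    def compile_model(model: keras.Model) -> keras.Model:
        """Shared training setup: Adam optimizer with a cross-entropy loss.
        The learning rate is an assumed default, not a value from the thesis."""
        model.compile(
            optimizer=keras.optimizers.Adam(learning_rate=1e-3),
            loss="categorical_crossentropy",  # cross-entropy over the 3 classes
            metrics=["accuracy"],
        )
        return model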

CNN

Model 1 is loosely based on a 2015 study on the classification of environmental sounds [63]. The model uses the MFCC representation of the data.

The components of the model are defined in Figure 4.6.

Figure 4.6: CNN model for MFCC representation of data.

Model 2 is based on the spectrogram representation of the data. The components of the model are defined in Figure 4.7.

Figure 4.7: CNN model for spectrogram representation of data.

Model 3 is based on the Mel spectrogram representation of the data. The components of the model are defined in Figure 4.8.

Figure 4.8: CNN model for Mel spectrogram representation of data.
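The layer-by-layer details live in Figures 4.6 through 4.8 and are not reproduced here, so the following is only a hypothetical sketch of what a CNN classifier of this kind might look like in Keras. The input shape, filter counts, and kernel sizes are all assumptions, not the architecture of the figures.

    from tensorflow import keras
    from tensorflow.keras import layers

    # Hypothetical CNN on MFCC input; every dimension below is an assumption.
    model = keras.Sequential([
        layers.Input(shape=(40, 173, 1)),        # e.g. 40 MFCCs x 173 frames
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(3, activation="softmax"),   # 3 classes
    ])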

RNN

Models 4, 5, and 6 are similar to one another. They are based on the MFCC, spectrogram, and Mel spectrogram representations of the data, respectively.

The components of the models are defined in Figure 4.9. Inside the RNN cell, the tanh activation function is used; a code sketch of such a model follows the figure.

(a) RNN model for MFCC representation of data.

(b) RNN model for spectrogram representation of data.

(c) RNN model for Mel spectrogram representation of data.

Figure 4.9: RNN models for different representations of data.
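As context for Figure 4.9, the following is a minimal sketch of a two-layer RNN classifier (Table 4.2 lists two layers per RNN model), assuming Keras; the input shape and hidden size are assumptions. Keras' SimpleRNN uses tanh by default, matching the text.

    from tensorflow import keras
    from tensorflow.keras import layers

    # Hypothetical two-layer RNN; hidden size and input shape are assumptions.
    model = keras.Sequential([
        layers.Input(shape=(173, 40)),            # e.g. 173 frames x 40 features
        layers.SimpleRNN(64, activation="tanh"),  # tanh inside the RNN cell
        layers.Dense(3, activation="softmax"),    # 3 classes
    ])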

LSTM

Models 7, 8, and 9 are similar to one another. They are based on the MFCC, spectrogram, and Mel spectrogram representations of the data, respectively.

The components of the models are defined in Figure 4.10. Inside the LSTM cell, the sigmoid activation function is used; a code sketch of such a model follows the figure.

(a) LSTM model for MFCC representation of data.

(b) LSTM model for spectrogram representation of data.

(c) LSTM model for Mel spectrogram representation of data.

Figure 4.10: LSTM models for different representations of data.
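Analogously to the RNN sketch above, a minimal two-layer LSTM classifier might look as follows in Keras; again, the shapes and sizes are assumptions. In Keras, the sigmoid mentioned in the text corresponds to the recurrent (gate) activation of the LSTM cell.

    from tensorflow import keras
    from tensorflow.keras import layers

    # Hypothetical two-layer LSTM; shapes and sizes are assumptions.
    model = keras.Sequential([
        layers.Input(shape=(173, 40)),
        layers.LSTM(64, recurrent_activation="sigmoid"),  # sigmoid on the gates
        layers.Dense(3, activation="softmax"),
    ])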

Table 4.2 summarizes the learning models given above with their methods, number of layers, and data representations.

Model    Method  Number of Layers  Data Representation
Model 1  CNN     10                MFCC
Model 2  CNN     6                 Spectrogram
Model 3  CNN     6                 Mel Spectrogram
Model 4  RNN     2                 MFCC
Model 5  RNN     2                 Spectrogram
Model 6  RNN     2                 Mel Spectrogram
Model 7  LSTM    3                 MFCC
Model 8  LSTM    2                 Spectrogram
Model 9  LSTM    2                 Mel Spectrogram

Table 4.2: Summary of the learning models and their properties.

5. Results and Discussion

The experimental results are below expectations. This is due to the representation of the data, the limited time and computational resources, and the difficulty of determining the correct hyperparameters. Table 5.1 lists the training and test accuracies and the number of fitting steps (epochs) for each model.

Model    Training Data Accuracy  Test Data Accuracy  Epochs
Model 1  97.70%                  71.37%              5
Model 2  100%                    29.81%              30
Model 3  100%                    62.88%              30
Model 4  52.63%                  40.37%              47
Model 5  33.52%                  33.33%              10
Model 6  32.30%                  33.37%              30
Model 7  97.30%                  39.89%              200
Model 8  98.48%                  34.30%              30
Model 9  99.74%                  35.70%              140

Table 5.1: The rates of success for each model, and their epoch counts.

At this point, it can be observed that the epoch counts of the models differ. Training was stopped for a model when its loss value began to increase, and the weights were rolled back to the previous epoch. For example, in the case of Model 1, the loss increased after the 5th epoch, so training was stopped early. The only exception is Model 7, which reached the exact epoch limit and did not stop early.
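This stop-and-roll-back scheme corresponds to early stopping with weight restoration. A minimal sketch, assuming Keras (the thesis does not name its framework, and the patience value here is an assumption):

    from tensorflow import keras

    # Stop as soon as the validation loss increases, and roll the weights
    # back to the best (previous) epoch.
    early_stop = keras.callbacks.EarlyStopping(
        monitor="val_loss",
        patience=0,                 # stop at the first increase
        restore_best_weights=True,  # roll back to the previous epoch's weights
    )

    # Hypothetical usage; model, x_train, and y_train are assumed to exist.
    # model.fit(x_train, y_train, validation_split=0.1,
    #           epochs=200, callbacks=[early_stop])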

Models 2, 7, 8, and 9 suffered from overfitting, while Models 4, 5, and 6 suffered from underfitting. The only relatively satisfactory results were obtained from Models 1 and 3.

5.1 Working as a Binary Classification Problem

The classification results of the nine models are investigated in detail through their confusion matrices. A confusion matrix is a comparison table of the actual observations and the predictions of a given model.

Table 5.2: Ternary and binary confusion matrices.

Statistical metrics other than accuracy are also investigated. To achieve this, the models are transformed into binary classification problems: two of the three classes, Problematic and Not Working, are merged, since they represent similar conditions in practical environments. Moreover, a problematic system will eventually lead to a halt; in other words, the two conditions are causally related.

The binary confusion matrix (shown in Table 5.2) consists of four different values:

1. True positive (TP), where an actual Working instance is classified as a Working instance.

2. True negative (TN), where an actual Problematic or Not Working instance is classified as a Problematic or Not Working instance.

3. False positive (FP), where an actual Problematic or Not Working instance is classified as a Working instance.

4. False negative (FN), where an actual Working instance is classified as a Problematic or Not Working instance.

Converting the three-class problem into a binary classification problem makes it possible to compute statistical metrics other than accuracy. In the binary classification analysis of this work, the defined metrics are the following; a short sketch computing them is given after the list.

1. Recall, which is the ratio of true positives to the sum of true positives and false negatives, denoted as $R = \frac{TP}{TP + FN}$,

2. Precision, which is the ratio of true positives to the sum of true positives and false positives, denoted as $P = \frac{TP}{TP + FP}$,

3. Specificity, which is the ratio of true negatives to the sum of true negatives and false positives, denoted as $S = \frac{TN}{TN + FP}$.
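As a check on these definitions, the following sketch merges the Problematic and Not Working classes into a single negative class and computes recall, precision, and specificity from the resulting binary counts. The label encoding (0 = Working, 1 = Problematic, 2 = Not Working) is an assumption, not taken from the thesis.

    from typing import Sequence

    # Assumed encoding: 0 = Working (positive), 1 = Problematic,
    # 2 = Not Working (both merged into the negative class).
    def binary_metrics(actual: Sequence[int], predicted: Sequence[int]):
        tp = fn = fp = tn = 0
        for a, p in zip(actual, predicted):
            a_pos, p_pos = (a == 0), (p == 0)  # merge classes 1 and 2
            if a_pos and p_pos:
                tp += 1  # Working classified as Working
            elif a_pos and not p_pos:
                fn += 1  # Working missed
            elif not a_pos and p_pos:
                fp += 1  # non-Working flagged as Working
            else:
                tn += 1  # non-Working correctly rejected
        recall = tp / (tp + fn)       # R = TP / (TP + FN)
        precision = tp / (tp + fp)    # P = TP / (TP + FP)
        specificity = tn / (tn + fp)  # S = TN / (TN + FP)
        return recall, precision, specificity

    print(binary_metrics([0, 0, 1, 2, 2], [0, 1, 1, 0, 2]))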