4.2 Training Models
y_i = γŵ_i + β. (4.13)
Batch normalization stabilizes the distribution of layer inputs, which makes the optimization part of the model converge faster than usual. Hence, the model saves training time, which matters when computational resources are limited.
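The transformation in Eq. 4.13 can be sketched in pure Python; the normalized input ŵ_i is computed from the batch mean and variance, and γ and β are the learnable scale and shift (the parameter values below are illustrative, not taken from the trained models):

```python
import math

def batch_norm(batch, gamma, beta, eps=1e-5):
    """Batch normalization over a 1-D batch (Eq. 4.13): y_i = gamma * w_hat_i + beta,
    where w_hat_i = (w_i - mean) / sqrt(var + eps)."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((w - mean) ** 2 for w in batch) / n
    return [gamma * (w - mean) / math.sqrt(var + eps) + beta for w in batch]

# With gamma = 1 and beta = 0, the output has zero mean and (near) unit variance.
ys = batch_norm([1.0, 2.0, 3.0, 4.0], gamma=1.0, beta=0.0)
```

During training, γ and β are updated by the optimizer like any other weight, so the network can undo the normalization if that helps the loss.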
4.2.2 Model Definitions
In this section, the experimental models are defined. There are nine models in total: three model types (CNN, RNN, and LSTM), each applied to three different representations of the data. The RNN and LSTM models are relatively simple compared to the CNN models. All models are optimized with Adam, and the loss function is cross-entropy.
CNN
Model 1 is loosely based on a 2015 study on the classification of environmental sounds [63]. The model is based on the MFCC representation of the data. The components of the model are defined in Figure 4.6.
Figure 4.6: CNN model for MFCC representation of data.
Model 2 is based on the spectrogram representation of the data. The components of the model are defined in Figure 4.7.
Figure 4.7: CNN model for spectrogram representation of data.
Model 3 is based on the Mel spectrogram representation of the data. The components of the model are defined in Figure 4.8.
Figure 4.8: CNN model for Mel spectrogram representation of data.
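The exact layer configurations are given in the figures above. As a general aid for reading them, the spatial size of a feature map after a convolution or pooling layer follows the standard output-size formula; the input size and kernel sizes below are illustrative examples, not values from the actual models:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Standard convolution/pooling output-size formula:
    floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# Example: a 64-wide input through a 3x3 convolution (stride 1, no padding),
# then a 2x2 max pool with stride 2.
after_conv = conv_out(64, kernel=3)              # 62
after_pool = conv_out(after_conv, kernel=2, stride=2)  # 31
```

Chaining this formula over the layers in Figures 4.6-4.8 reproduces the feature-map sizes at each stage of the CNN models.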
RNN
Models 4, 5, and 6 are similar to each other. They are based on the MFCC, spectrogram, and Mel spectrogram representations of the data, respectively. The components of the models are defined in Figure 4.9. Inside the RNN cell, the tanh activation function is used.
(a) RNN model for MFCC representation of data.
(b) RNN model for spectrogram representation of data.
(c) RNN model for Mel spectrogram representation of data.
Figure 4.9: RNN models for different representations of data.
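The recurrence inside a vanilla RNN cell with the tanh activation can be sketched as follows; for readability the sketch uses scalar weights and inputs (the actual models operate on vectors, and the weight values here are illustrative):

```python
import math

def rnn_forward(seq, w_x=0.5, w_h=0.8, b=0.0):
    """Scalar sketch of a vanilla RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).
    Returns the final hidden state after consuming the whole sequence."""
    h = 0.0  # initial hidden state
    for x in seq:
        h = math.tanh(w_x * x + w_h * h + b)
    return h

h_final = rnn_forward([1.0, -1.0, 0.5])
```

Because tanh squashes its input into (-1, 1), the hidden state stays bounded, which is why tanh is the conventional choice inside recurrent cells.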
LSTM
Models 7, 8, and 9 are similar to each other. They are based on the MFCC, spectrogram, and Mel spectrogram representations of the data, respectively. The components of the models are defined in Figure 4.10. Inside the LSTM cell, the sigmoid activation function is used.
(a) LSTM model for MFCC representation of data.
(b) LSTM model for spectrogram representation of data.
(c) LSTM model for Mel spectrogram representation of data.
Figure 4.10: LSTM models for different representations of data.
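A single LSTM-cell step can be sketched in the same scalar style as the RNN above; the sigmoid activation governs the forget, input, and output gates, while the candidate cell state uses tanh (the weight values are illustrative, not from the trained models):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """Scalar sketch of one LSTM step; p is a dict of scalar weights."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate state
    c = f * c_prev + i * g        # new cell state
    h = o * math.tanh(c)          # new hidden state
    return h, c

params = {k: 0.5 for k in
          ("wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wg", "ug", "bg")}
h, c = lstm_step(1.0, 0.0, 0.0, params)
```

The sigmoid gates output values in (0, 1), so they act as soft switches deciding how much of the past cell state to keep and how much new information to admit.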
Table 4.2 summarizes the learning models given above with their method, number of layers, and data representation.
Model     Method   Number of Layers   Data Representation
Model 1   CNN      10                 MFCC
Model 2   CNN      6                  Spectrogram
Model 3   CNN      6                  Mel Spectrogram
Model 4   RNN      2                  MFCC
Model 5   RNN      2                  Spectrogram
Model 6   RNN      2                  Mel Spectrogram
Model 7   LSTM     3                  MFCC
Model 8   LSTM     2                  Spectrogram
Model 9   LSTM     2                  Mel Spectrogram
Table 4.2: Summary of the learning models and their properties.
5. Results and Discussion
The experimental results fall below expectations. This is attributed to the representation of the data, the limitations of time and computational resources, and the difficulty of determining the correct hyperparameters. Table 5.1 lists the training and test accuracies and the number of fitting steps (epochs) for each model.
Model     Training Data Accuracy   Test Data Accuracy   Epochs
Model 1   97.70%                   71.37%               5
Model 2   100%                     29.81%               30
Model 3   100%                     62.88%               30
Model 4   52.63%                   40.37%               47
Model 5   33.52%                   33.33%               10
Model 6   32.30%                   33.37%               30
Model 7   97.30%                   39.89%               200
Model 8   98.48%                   34.30%               30
Model 9   99.74%                   35.70%               140
Table 5.1: The rates of success for each model, and their epoch count.
At this step, it can be observed that the epoch counts differ between the models. Training was stopped for a model when its loss value began to increase, and the model was rolled back to the previous epoch. For example, in the case of Model 1, the loss increased after the 5th epoch, so training stopped early. The only exception is Model 7, which reached the preset epoch limit without stopping early.
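The stopping rule described above can be sketched as a simple loop over per-epoch losses; the loss values below are hypothetical and only illustrate the mechanism, not the actual training curves:

```python
def train_with_rollback(loss_per_epoch, max_epochs=200):
    """Stop as soon as the loss increases and roll back to the previous epoch;
    otherwise run until max_epochs (as happened with Model 7)."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(loss_per_epoch[:max_epochs], start=1):
        if loss > best_loss:
            return best_epoch, best_loss   # roll back to the previous epoch
        best_epoch, best_loss = epoch, loss
    return best_epoch, best_loss           # epoch limit reached

# The loss rises after the 5th epoch, so training stops there (as for Model 1).
epochs_run, final_loss = train_with_rollback([0.9, 0.6, 0.4, 0.3, 0.25, 0.31, 0.5])
```

This is a patience-0 form of early stopping: a single increase in the loss ends training, which explains the widely varying epoch counts in Table 5.1.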
Models 2, 7, 8, and 9 suffered from overfitting, while Models 4, 5, and 6 suffered from underfitting. The only relatively satisfactory results were obtained from Models 1 and 3.
5.1 Working as a Binary Classification Problem
The classification results of the nine models are investigated in detail through confusion matrices. A confusion matrix is a comparison table of the actual observations and the predictions of a given model.
Table 5.2: Ternary and binary confusion matrices.
Statistical metrics other than accuracy are also investigated. To achieve this, the models are transformed into binary classification problems: two of the three classes, Problematic and Not Working, are merged, since they are similar conditions in practical environments. Moreover, a problematic system will eventually lead to halting; in other words, the two states are in a causal relation.
The binary confusion matrix (denoted in Table 5.2) consists of four different values:
1. True positive (TP), where an actual Working instance is classified as a Working instance.
2. True negative (TN), where an actual Problematic or Not Working instance is classified as a Problematic or Not Working instance.
3. False positive (FP), where an actual Problematic or Not Working instance is classified as a Working instance.
4. False negative (FN), where an actual Working instance is classified as a Problematic or Not Working instance.
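The class merge and the four counts above can be sketched as follows; the label strings and the example sequences are illustrative:

```python
def binary_confusion(actual, predicted, positive="Working"):
    """Count TP/TN/FP/FN after merging 'Problematic' and 'Not Working' into a
    single negative class, with 'Working' as the positive class."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        a_pos, p_pos = (a == positive), (p == positive)
        if a_pos and p_pos:
            tp += 1
        elif not a_pos and not p_pos:
            tn += 1   # any confusion among the merged classes still counts as TN
        elif not a_pos and p_pos:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

actual    = ["Working", "Problematic", "Not Working", "Working"]
predicted = ["Working", "Working",     "Not Working", "Problematic"]
tp, tn, fp, fn = binary_confusion(actual, predicted)
```

Note that a Problematic instance predicted as Not Working (or vice versa) counts as a true negative here, which is exactly the leniency gained by merging the two classes.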
Converting the three-class problem into a binary classification problem makes it possible to compute statistical metrics other than accuracy. In the binary classification analysis of this work, the defined metrics are
1. Recall, which is the ratio of true positives to the sum of true positives and false negatives, denoted as R = TP / (TP + FN),
2. Precision, which is the ratio of true positives to the sum of true positives and false positives, denoted as P = TP / (TP + FP),
3. Specificity, which is the ratio of true negatives to the sum of true negatives and false positives, denoted as S = TN / (TN + FP).
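The three metrics follow directly from the confusion-matrix counts; the counts in the example are illustrative, not taken from Table 5.1:

```python
def recall(tp, fn):
    # R = TP / (TP + FN): fraction of actual Working instances found
    return tp / (tp + fn)

def precision(tp, fp):
    # P = TP / (TP + FP): fraction of Working predictions that are correct
    return tp / (tp + fp)

def specificity(tn, fp):
    # S = TN / (TN + FP): fraction of actual negatives correctly rejected
    return tn / (tn + fp)

# Illustrative counts: TP = 8, FN = 2, FP = 4, TN = 6.
r, p, s = recall(8, 2), precision(8, 4), specificity(6, 4)
```

Together these three ratios give a fuller picture than accuracy alone, which is especially important when the merged negative class outnumbers the positive one.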