
To analyze the new implementation more thoroughly against the old one, the two next-best classifier models were also saved so that their performance could be compared with the old binary classifier model. The best classifiers were selected for high average precision and low variance, and compared with the classifier model currently used by the company, referred to as model_rf. For the analysis, the classifier models were used to run inference for different patients. Inference was run for patients 118, 224, 34, 40, 59, and 81, producing JSON files with the predicted events. Only patients 34 and 40 had been used in training, but since they present a high number of seizures they were included to check performance. These patients were selected because they have the clearest annotations and their signals had already been processed by the pipeline. These JSON files, denominated findings files, were then compared against the ground-truth JSON annotation files with the tools previously mentioned in Sections 3.1 and 3.4.4.

First, to analyze the general performance of the models, we ran the script mentioned in Section 2.3.7 for patients 81 and 59. Neither of these patients was used in training. Patient 81 presents a high number of clonic epileptic seizures. With the script mentioned in Section 3.1, the performance of the new binary classifier models can be compared against the current model used by the company.

The ROC curve is obtained by plotting the true-positive rate against the false-positive rate at different threshold values.
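As a minimal sketch of how such a curve is computed, the following function sweeps every distinct score as a threshold and records the resulting rates. The scores and labels are illustrative stand-ins, not the actual per-event classifier outputs:

```python
# Minimal ROC computation: for each threshold, count events scored at or
# above it and convert to true-positive / false-positive rates.

def roc_points(scores, labels):
    """Return (FPR, TPR) pairs for every distinct decision threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# Toy example: 3 seizure events (label 1) and 3 non-seizure events (label 0).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(roc_points(scores, labels))
```

In practice the same curve is produced by scikit-learn's `roc_curve`; the explicit loop above only makes the threshold sweep visible.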

In the graph generated for patient 81 (Figure 4.1), it can be seen that the new binary classifier models, model_0_3, model_0_8, and model_0_10, reach fewer false positives and a higher true-positive rate at a smaller threshold, while the old model_rf_clonic only reaches them at a high threshold and with a high false-positive rate.

Figure 4.1. ROC curve for patient 81.

For patient 59, similar results are obtained as for patient 81. The true-positive rate reaches its maximum at a small false-positive rate and remains constant over several threshold values, as can be seen in Figure 4.2.

Figure 4.2. ROC curve for patient 59.

To understand the accuracy of the models in detecting epileptic seizures, the following graph is generated. In contrast to the graphs above, it presents the absolute number of true-positive samples against the number of false-positive samples. This representation makes it easier to understand the accuracy of the binary classifier models in detecting epileptic seizures.
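The absolute counts are just the rates of the previous graphs scaled by the total numbers of positive and negative samples. A small sketch of that conversion, with purely illustrative totals (not the actual patient numbers):

```python
# Convert a (TPR, FPR) point into absolute event counts. The totals below
# are assumed for illustration only.
TOTAL_SEIZURE_SAMPLES = 120       # assumed number of ground-truth positives
TOTAL_NON_SEIZURE_SAMPLES = 8000  # assumed number of negatives

def absolute_counts(tpr, fpr):
    """Turn a (TPR, FPR) point into (true positives, false positives)."""
    return (round(tpr * TOTAL_SEIZURE_SAMPLES),
            round(fpr * TOTAL_NON_SEIZURE_SAMPLES))

print(absolute_counts(0.95, 0.10))  # → (114, 800)
```

This makes the practical cost of a model visible: even a small false-positive rate can translate into hundreds of events to review when the negative class is large.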

For patient 81, it can be seen that the number of false-positive samples generated by the new binary classifier models has been reduced in comparison to the original classifier: the curves in Figure 4.3 reach their peak of true-positive samples at a smaller number of false-positive samples.

Figure 4.3. Total number of true-positive seizure samples against false-positive seizure samples for patient 81.

The same can be seen from the curve for patient 59 in Figure 4.4: model_rf only reaches the maximum number of true-positive samples with a huge number of false-positive samples, which makes the work of the nurse heavier. In the worst case, a nurse would have to watch 5000 events in order to catch all seizures. Although these are short events, the total time to watch them all would be about 27 hours. With the new binary classifier, however, the high number of true-positive seizure samples is reached with fewer than 1000 false positives, reducing the review time to about 5 hours.
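The review-time figures above (5000 events in about 27 hours) imply an average of roughly 20 seconds per event; a quick arithmetic check under that assumption:

```python
# Back-of-the-envelope check of the review-time claim. The average event
# length is inferred from the 5000-event / 27-hour figure, not measured.
seconds_per_event = 27 * 3600 / 5000            # ≈ 19.4 s per short event
hours_for_1000 = 1000 * seconds_per_event / 3600
print(round(seconds_per_event, 1), round(hours_for_1000, 1))  # → 19.4 5.4
```

So cutting the false positives from 5000 to under 1000 brings the review workload down from about 27 hours to roughly 5 hours, consistent with the figures quoted in the text.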

Figure 4.4. Total number of true-positive seizure samples against false-positive seizure samples for patient 59.

Finally, the other graphical representation obtained from the script is a graph showing sensitivity against precision. These graphs show considerable inconsistency between sensitivity and precision. As can be seen in the graph for patient 59, the best model presents some peak values for precision; these peaks show that a higher sensitivity does not necessarily mean a higher precision. Nevertheless, from the graphs in Figure 4.5 it can be seen that the values increase slightly in comparison to the original binary classifier model_rf.

(a) Patient 59. (b) Patient 81.

Figure 4.5. Sensitivity vs precision curve.

Having this graphical representation of classifier performance at different thresholds, it can be established that the new binary classifier model performs better than the current binary classifier model used by the company. However, as can be seen in Figure 4.5(a), the performance of the new classifier is not as high as expected. Because of this, an extra evaluation method is needed to compare the models.

The script described in Section 3.4.4 was run with the inference files of patients 118, 224, 34, 40, 59, and 81, obtaining from each the values of precision, sensitivity, and false-positive rate. From these, the optimal threshold that maximized the sensitivity value of each patient was obtained.
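The threshold selection described above can be sketched as a simple scan over the per-threshold metrics. The data layout here is an assumption for illustration, not the script's actual format:

```python
# Pick the "optimal" threshold for one patient: the one that maximizes
# sensitivity among the evaluated candidates.

def optimal_threshold(metrics):
    """metrics: list of (threshold, sensitivity) pairs for one patient."""
    return max(metrics, key=lambda m: m[1])[0]

# Hypothetical per-patient sweep results.
per_patient = [(0.2, 0.90), (0.5, 0.95), (0.8, 0.85)]
print(optimal_threshold(per_patient))  # → 0.5
```

A tie on sensitivity would here resolve to the first candidate; a real script might instead prefer the higher threshold, since that also suppresses more false positives.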

First, we present the results obtained by the current binary classifier model used by the company. To understand the behaviour of each of the binary classifiers, it is important to highlight how they were trained and their characteristics. model_rf corresponds to an XGBClassifier with a learning rate of 0.1 and a gbtree booster, trained with 300 features and with annotations containing the seizure descriptors convulsive movement and clonic oscillation. The values obtained with this classifier model can be seen in Table 4.1:

Table 4.1. Results of model_rf.

Currently, in order to reach a sensitivity higher than 0.83 for at least one patient, the optimal threshold has to be as low as 0.26; however, such a low threshold increases the false-positive rate, making a zero false-positive rate possible for only two patients.

With the new implementation, the best classifier model generated corresponds to model_0_3, an Extra Trees Classifier with the entropy criterion and 500 estimators. This classifier was trained with 550 features and with the subfolder structure providing the selected JSON files for positive and irrelevant samples, while negative samples were generated randomly. The results obtained from the inference findings files were the following. With an optimal threshold of 0.82, a sensitivity above 0.8 is achieved for three patients, reducing the false-positive rate for all patients and increasing the precision for patients 118 and 81. Looking at the false-positive rate results and the value of the optimal threshold, we can point out that the new binary classifier presents a more stable solution: raising the optimal threshold reduces the number of positive samples, as can be seen in Table 4.2.
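The random-negative-sample generation described above can be sketched as drawing window start times that avoid all annotated intervals. The interval layout and all names here are assumptions, not the company's actual data format:

```python
# Hypothetical sketch: positives and irrelevant samples come from selected
# JSON annotations; negatives are sampled at random from unannotated time.
import random

def random_negatives(recording_len, annotated, n, window, seed=0):
    """Pick n window start times (seconds) that overlap no annotated interval."""
    rng = random.Random(seed)
    negatives = []
    while len(negatives) < n:
        start = rng.uniform(0, recording_len - window)
        # Keep the window only if it is fully outside every annotated span.
        if all(start + window <= a or start >= b for a, b in annotated):
            negatives.append(start)
    return negatives

# One hour of video with two annotated seizure intervals (in seconds).
neg = random_negatives(3600, annotated=[(100, 160), (2000, 2100)], n=5, window=10)
print(neg)
```

Rejection sampling like this is cheap when annotated time is a small fraction of the recording, which is the usual case for seizure video.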

Table 4.2. Results of best binary classifier model.

The second-best classifier generated corresponds to the Gradient Boosting Classifier referred to as model_0_8. It was trained with the Friedman mean-square-error criterion, a learning rate of 0.01, a deviance loss, 500 estimators, and 550 features. The values obtained from the inferences using this model are shown in Table 4.3:

Table 4.3. Results of second best classifier model.

For the binary classifier model_0_8, the optimal threshold corresponds to 0.639731, which is already better than the threshold of model_rf, presenting a higher sensitivity for most of the patients and also an increase in precision. Compared with model_rf it presents better false-positive rate values, but not as good as those of model_0_3, the best model generated automatically by the training system.

Finally, the third-best classifier corresponds to an XGBClassifier referred to as model_0_10.

This binary classifier was trained with a gbtree booster and a learning rate of 0.1, fed with 550 features. The analysis of the inferences obtained with this model gives an optimal threshold of 0.6969, higher than that of model_rf. It presents better false-positive rate results than model_rf, but not as good as those of model_0_3, as presented in Table 4.4.
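For reference, the three configurations described above can be reconstructed as a hedged sketch in scikit-learn / XGBoost terms. Only the parameters stated in the text are set; everything else is left at the library defaults, and the exact settings of the company's models may differ:

```python
# Hedged reconstruction of the model configurations described in the text.
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier
from xgboost import XGBClassifier  # assumed available, as in the pipeline

# model_0_3: Extra Trees, entropy criterion, 500 estimators (550 features).
model_0_3 = ExtraTreesClassifier(n_estimators=500, criterion="entropy")

# model_0_8: Gradient Boosting, Friedman MSE split criterion, deviance loss
# (named "log_loss" in recent scikit-learn), learning rate 0.01, 500 trees.
model_0_8 = GradientBoostingClassifier(
    criterion="friedman_mse", loss="log_loss",
    learning_rate=0.01, n_estimators=500,
)

# model_0_10 (and the old model_rf): XGBoost with a gbtree booster and a
# learning rate of 0.1; model_rf used 300 features, model_0_10 used 550.
model_0_10 = XGBClassifier(booster="gbtree", learning_rate=0.1)
```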

Table 4.4. Results of third best classifier model.

5 CONCLUSIONS

The aim of the thesis was to provide Neuro Event Labs with a solution that allows them to train a binary classifier model faster than the current implementation, in an online service, and to decrease the number of false-positive samples so as to reduce the effort the nurses spend watching the videos. This offers the company a system capable of training a classifier model with different parameters and with a larger amount of data. The solution presented to the company consists of two parts. First, the current alternatives for running machine learning tools in cloud services are presented. As mentioned before, some of the companies offering these machine learning tools are Google Cloud Platform and Amazon Web Services.

Since the company stores its information in Amazon Web Services (AWS), the best solution to their problem is to use the machine learning tools provided by AWS. After selecting the best option for the company, the second part of the solution took place: comparing the current training pipeline implementation with the requirements of the machine learning tools of AWS. Through this comparison, the weak points of the old training implementation were identified; they needed to be changed so that the system could be packaged into a Docker image deployable to the ECS containers of AWS, where the different training tasks can be activated. As presented in the results, the main weak points of the old training pipeline were the processing of the data and the trigger for running a new binary classifier training, which took several intermediate steps, making the process of training a model slower and incapable of adapting to new types of data, such as a new annotation format or a new signal format. The solution presented to the company facilitates the adoption of new data formats and makes training with different parameters easier. By making the implementation more flexible, it also gives the user the opportunity to create different types of classifiers without modifying the inner scripts of the implementation, just by modifying the parameters passed to the runner script. This new runner script makes it possible to pack the system into a Docker image that can run on any server without installing other dependencies. This alternative makes the generation of classifier models easier, requiring only the location of the paths in S3.

The main changes in the training system consist of simplifying the data processing and creating the previously mentioned runner script. Once these changes had taken place, the system was tested by running it for patients with clonic seizures, in order to compare the new classifier model obtained against the current clonic model used by the company.

As presented in the previous section, the new implementation runs faster, accomplishing the goal of the thesis. In addition, the ability of the new implementation to read positive, negative, and irrelevant samples gives the company the option of using different kinds of annotations, whether old-style references or newer ones with descriptors. The classifier model generated with the new implementation presents a more stable, higher threshold at which the number of false-positive events detected as epileptic seizures is reduced. Producing fewer false-positive events optimizes the work of the nurses, who have to watch less recorded video material.

Finally, the new implementation reaches the objective of the thesis, providing the company with a solution where it is possible to run the container image in ECS and generate a new binary classifier model to create inference for new patients. The new implementation generates more stable models that reduce the number of false-positive events and allows different data formats to be selected. This better adaptability keeps the system functional even when the annotation format changes.
