
A SCOPING REVIEW ON THE DIFFERENT DATA MINING METHODS USED IN EPILEPSY SEIZURE DETECTION TO IMPROVE CLINICAL DECISION SUPPORT

Temilola Ajayi
Master's thesis

University of Eastern Finland
Department of Health and Social Management

November 2018


UNIVERSITY OF EASTERN FINLAND, Faculty of Social Sciences and Business Studies [Abstract]

Department of Health and Social Management, health management sciences, health economics, social management sciences, health and human services informatics

Ajayi, Temilola: A scoping review on the different data mining methods used in epilepsy seizure detection to improve clinical decision support.

Master's thesis, 80 pages, 1 appendix (7 pages)
Thesis supervisor: Senior Lecturer Virpi Jylhä

December 2018

Keywords: data mining, epilepsy, clinical decision support, seizure detection

Epilepsy is a common neurological condition that affects individuals of all ages. It is characterized by epileptic seizures, which result from irregular brain activity. These seizures occur unexpectedly and therefore make life difficult for patients with epilepsy.

The aim of this scoping review is to explore the use of different data mining methods for epilepsy seizure detection and improved clinical decision support.

Electroencephalography (EEG) is considered the most reliable method of epilepsy diagnosis, as it provides information on brain activity and signals. However, neurophysiologists encounter challenges when reading EEG signals by visual interpretation, including the time consumed in analyzing a long-term recording and a lack of agreement about what constitutes a spike. This makes the manual analysis of EEG signals tedious. For this reason, there is a need for automated methods that can detect seizures and provide information useful for physicians to improve clinical decision support.

The PubMed and Scopus databases were searched; 347 articles were identified and 15 were selected, featuring different data mining methods used in epilepsy seizure analysis. My main interest was to identify the data mining methods used in the analysis of data on epilepsy, the data mining methods used in the detection of epilepsy, and the role of seizure detection in improved clinical decision support.

We found that data mining methods are used in the analysis of epilepsy data in two main ways: in seizure detection and in the performance comparison of different seizure detection methods. The data mining methods were either novel or existing methods, and four studies provided clinical information on the location of the seizure and on clinical diagnosis that could help improve decision support.

Data mining has had an impact on epilepsy seizure detection and shows promise as an important part of seizure detection in the near future, because it can provide new information about seizure detection that could ultimately help in epilepsy seizure management, clinical decision support, and improved health outcomes for patients with epilepsy.


TABLE OF CONTENTS

1 Introduction ... 5

2 Theoretical background ... 10

2.1 Theoretical models ... 10

2.2 Knowledge discovery and data mining ... 11

2.3 The knowledge discovery database process ... 13

2.4 Data mining methods ... 15

2.4.1 Classification Methods ... 17

2.4.2 Regression ... 23

2.4.3 Descriptive methods ... 23

3 Data mining and epilepsy ... 27

3.1 Data mining and machine learning ... 29

3.2 Classification of seizure types ... 30

3.3 Detection of Epilepsy Using Data Mining Methods ... 32

4 Aims of the study ... 35

5 Methodology ... 36

5.1 Study methods ... 36

5.2 Search strategy ... 36

5.3 Data extraction and analysis ... 38

6 Results ... 39

6.1 Study data ... 39

6.2 Use of data mining methods in the analysis of data on epilepsy ... 41

6.3 The data mining methods used in the detection of epilepsy. ... 42

6.4 The role of seizure detection in improved clinical decision support. ... 47

7 Discussion ... 50

7.1 Main findings ... 50

7.2 Limitation of the study. ... 52

7.3 Conclusion ... 53

8 References ... 54

Appendices ... 67

Figures
Figure 1: The revised data-to-wisdom continuum (Nelson & Staggers 2016) ... 11

Figure 2: An overview of the KDD steps (Fayyad et al. 1996) ... 15


Figure 3: Data mining paradigms (Maimon and Rokach, 2005). ... 16

Figure 4: NN architecture (Aguiar-Pulido et al., 2013) ... 19

Figure 5: Decision tree example (Yoo et al., 2012) ... 21

Figure 6: Example of SVM. Safe vs Risky ... 22

Figure 7: ILAE classification of Epilepsy (Fisher et al., 2017) ... 31

Figure 8: Flow chart of the study inclusion process ... 39

Tables
Table 1. Use of data mining methods in the analysis of data on epilepsy ... 41

Table 2. Data mining methods used in the detection of epilepsy ... 43

Table 3. The role of seizure detection in improved clinical decision support ... 48


1 INTRODUCTION

In recent times, there has been a steady rise in the amount of data generated by the scientific community and businesses, brought about by the rapid growth of IT data referred to as "big data". A large share of this data needs to be analyzed at the pace at which it is acquired in order to extract useful information about its attributes (Ahmed et al. 2016).

Big data has three main properties: volume, velocity, and variety (Zikopoulos & Eaton 2011). The variety and velocity of big data pose the challenge of real-time analysis, which cannot be handled by conventional methods. Examples of the challenges include the size of the data and the time required for an online processing model (Bifet 2013).

Big data in healthcare is related to the digitalization of health services and encompasses the storage of digitally collected data within health care organizations. Many other industries, in areas such as politics, retail, internet search, and astronomy, have developed advanced techniques to handle data, and these techniques can convert data into knowledge. To achieve the goals and purpose of digitalization, health care needs to follow in the footsteps of these other industries (Kruse et al. 2016).

Over the years, data have mostly been stored in physical form, but the recent trend is toward digitization of these large data sets. This shift is driven by the perceived benefits of digitizing healthcare data, which include improved health care delivery, reduced costs, and improved health outcomes. These healthcare data, known as 'big data', also offer future benefits in healthcare, including improved clinical decision support, disease monitoring, and population health management (Burghard 2012, Dembosky 2012, Feldman et al. 2012, Fernandes et al. 2012, Raghupathi & Raghupathi 2014).

Big data in healthcare describes large amounts of electronically generated health data that are too complex to analyze with regular software or hardware and that cannot be handled with the usual data management tools and techniques (Frost 2012). The complexity of big data in healthcare arises not just from its size but from the speed at which it needs to be analyzed and the diversity of the data involved (Frost 2012). Health care data are produced by different health professionals, and nowadays a patient's medical data may come from laboratory, prescription, insurance, medical imaging, pharmacy, and administrative sources. They also include machine- and sensor-generated patient data from vital signs monitoring, data from social media, and other sources such as emergency care records and articles found in journals, and all of these data make up big data (Bian et al. 2012, Raghupathi & Raghupathi 2014).

Data mining is a technique used to extract vital data from a large database, and these mined data can provide information that can be used in health care to assist physicians in making clinical decisions. The ability to mine data from databases is a relatively recent interdisciplinary area of computer science; it involves the automatic extraction and generation of predictive data from a database and is the act of finding previously unknown information in large databases (Han et al. 2006). Data mining involves the use of several analysis tools to find relationships between data in a large database. It draws on several techniques, including database systems, statistics, and machine learning. The main aim of data mining is to extract from databases beneficial information that can be understood by humans. Data mining can therefore be said to be essential in clinical decision making (Sharma et al. 2013).

It is also a rapidly expanding field in the computer industry. It started as an area of interest in computer science and statistics but has emerged over the years as a discipline of its own. With its natural ability to mine data from large datasets, its target markets include data warehousing, data marts, and decision support, bringing in professionals from sectors such as insurance, healthcare, and telecommunications. Within the business world, data mining can be used to identify trends in the purchasing of services, detect fraud, and devise investment strategies (Kantardzic 2011).

The knowledge discovered from data mining can also be used to enhance marketing decisions, and the knowledge obtained can be used to address the specific needs of customers. The techniques can also be used in business process re-engineering, which involves reorganizing how work is done to become more efficient in the provision of services. Law enforcement agencies have also successfully utilized data mining to discover fraudulent activities and crime trends (Kantardzic 2011).

Additionally, data mining is important in the modern society because we live in a world driven by data. These data are generated all around us and the information in these data need to be analyzed, extracted and converted into knowledge which could help in decision making. In the present digitalized society, large amounts of data are collected and stored

(7)

7

in data warehouses, this enables analysts to make use of data mining techniques to sieve through data extensively. The quantity and sources of these data are vast that it covers areas such as financial, scientific, commercial and industrial activities (Kantardzic 2011).

The ability to obtain new information from a large data set using data mining has generated interest in discovering different methods for gaining knowledge from raw data. This has been made possible by the low cost of computers, databases, communication, and sensors, coupled with the availability of highly skilled computer professionals (Kantardzic 2011).

Furthermore, data mining is a relatively recent development that gained popularity in the mid-1990s as a method to analyze data and discover knowledge. The first conference on data mining was held in 1995 in the United States, and by 2009 the phrase 'data mining' was listed for the first time under the Medical Subject Headings (MeSH). Data mining originated as a multidisciplinary field from earlier work in machine learning and statistics, but over time it has progressed to encompass artificial intelligence, visualization, database design, and pattern recognition (Yoo et al. 2012).

At the onset, data mining projects were carried out in various ways, with each data analyst taking a different approach to a problem, mostly through trial and error. However, as the process of data mining evolved, there was a need to standardize it in order to commercialize it. This resulted in the establishment of the Cross-Industry Standard Process for Data Mining (CRISP-DM) (Gamberger et al. 2012).

The term data mining can be defined in different ways, but one widely used definition describes it as “the analysis of (often large) observational data sets used to find unsuspected relationships and to summarize data in novel ways that are both understandable and useful to the data owner” (Hand et al. 2001).

The digitalization of health care and patient records using health information systems has led to a rise in the production of huge datasets in healthcare. Even though these generated data could help change healthcare management and delivery, appropriate tools and expertise are required to analyze and utilize them (Tan et al. 2015). When applied correctly, data mining can lead to knowledge discovery and new biomedical knowledge that can be utilized effectively in clinical decision making and healthcare management (Sierra & Larranaga 1998; Silver et al. 2001; Adam et al. 2002; Petricoin et al. 2002; Harper 2005; Yu et al. 2005; Bellazzi & Zupan 2008; Eastwood et al. 2008; Stel et al. 2008). Overall, knowledge from mined data can help improve healthcare provision from provider to patient.

Data mining can also serve as a method to reduce errors in clinical decision making and as a useful way to obtain valuable information from a huge data set. The various models used are built to mirror strategies used in clinical decision making, and some of these data mining methods are already in use and assist in the decision-making process within the clinical field (Chen & Fawcett 2016).

According to the WHO (2018) fact sheets, it is estimated that around 50 million people worldwide currently live with epilepsy and that, at a given time, between 4 and 10 per 1,000 individuals in the general population have active epilepsy. Different studies, however, suggest these figures are much higher in low- and middle-income countries, at between 7 and 14 per 1,000 individuals. It is estimated that about 80% of people with epilepsy live in low- and middle-income countries.

Epilepsy ranks among the most prevalent neurological disorders and is responsible for morbidity and death as a result of seizures and the types of drugs available. About 50 million people suffer from epilepsy globally, and a further 5% of the world's population will suffer at least one seizure in their lifetime, excluding febrile seizures (Sander & Shorvon 1996; Brodie et al. 1997; Banerjee et al. 2009; Wahab 2010). The prevalence of epilepsy is said to be between 5 and 10 cases per 1,000, and its yearly incidence ranges from 50 to 70 cases per 100,000 in developed countries, while in less developed countries it is about 190 per 100,000 (Bell & Sander 2001).

About 80% of people living with epilepsy are in developing countries (Meinardi et al. 2001; Scott 2001; Mbuba et al. 2008; Radhakrishnan 2009; Wahab 2010). This high incidence results from poor post-natal care and higher risks of intracranial infections and head injuries (Senanayake & Román 1993; Wahab 2010). Furthermore, patients in developing countries have poor access to treatment.

This limited access to treatment is attributed to inequality in the healthcare system, treatment cost, beliefs and lack of antiepileptic drugs (Wahab 2010).


Epilepsy poses a difficult challenge due to the low quality of life of patients and the medical expenses associated with it. Much of the presently available epilepsy treatment is of no benefit to some patients, which has sparked the need for new ways of treating epilepsy. Seizure detection is seen as a safe alternative approach in epilepsy treatment, as it does not involve removing brain tissue or the side effects of anti-epileptic drugs (Chaovalitwongse et al. 2007).

This review focuses on epilepsy because it is still widely prevalent in both industrialized and less industrialized countries. Additionally, the manual process of analyzing EEG data is a main problem area in the diagnosis of epilepsy, and this also affects the treatment provision process for epilepsy patients. Despite how prevalent epilepsy is in society, there are limited computational tools for the automatic diagnosis of epilepsy; some of the issues involved include the variation that exists among individuals and the overlap between seizure and non-seizure states (Echauz et al. 2008; Martinez-del-Rincon et al. 2017).

The main aim of data mining in clinical decision making is to identify sets of patterns and associated attributes within the clinical setting, to predict outcomes, and to guide clinicians during the decision-making process (Chen & Fawcett 2016). Based on the introduction above, the aim of this scoping review is to identify the various data mining methods used in epilepsy seizure detection for improved clinical decision support.


2 THEORETICAL BACKGROUND

2.1 From data to wisdom

Due to recent advances in technology, generating and collecting data has become an easier task. However, the gathered data need to be transformed from raw data into information and knowledge before they can be utilized (Kantardzic 2011). Some theoretical models of information provide insight into the conversion of data into knowledge; common examples are the Shannon-Weaver information-communication model and the Nelson data-information-knowledge-wisdom model, which was derived from the earlier works of Blum and Graves (Nelson & Staggers 2016).

In the model by Claude Shannon, known as "The Mathematical Theory of Communication", the information source or message originator is the sender, while the contents of the message are converted into a code by the transmitter, which serves as the encoder. These codes can take different forms, such as symbols, words, music, letters, or computer code. The Nelson model, in turn, is an extension of the works by Graves, Blum, and Corcoran and extends the data-to-knowledge continuum through the inclusion of wisdom. In this model, wisdom is described as the efficient utilization of knowledge in providing solutions to human problems; it also involves knowing the right time to manage and solve a patient's problems. When used effectively to manage patient problems, it requires experience, values, and knowledge. The concepts of data, information, knowledge, and wisdom overlap, which is shown by the circles and arrows in the model illustrating how they are interrelated (Nelson & Staggers 2016).


FIGURE 1: The revised data-to-wisdom continuum (Nelson & Staggers 2016)

The figure shows the relationship between data, information, knowledge and wisdom and how they overlap with each other.

2.2 Knowledge discovery and data mining

Knowledge discovery and data mining are two terms often used interchangeably, but researchers have identified data mining as a core part of knowledge discovery from databases (KDD). Maimon and Rokach (2010) have defined knowledge discovery as "the organized process of identifying valid, novel, useful, and understandable patterns from large and complex data sets". Knowledge discovery and data mining can be used to identify and categorize patterns (Clancy & Gelinas 2016). According to Fayyad and others (1996), KDD is the process of discovering novel knowledge from data, and data mining is seen as an important step in this process. Specific algorithms are used to discover patterns in the data mining step. The extra steps in the KDD process ensure that only vital and useful knowledge is obtained from the data. However, the wrong application of data mining methods, referred to as data dredging, can be misleading and could lead to the discovery of useless and invalid patterns.


KDD is a combination of diverse fields, including databases, pattern recognition, machine learning, statistics, high-performance computing, artificial intelligence, data visualization, and knowledge acquisition for expert systems. Its main purpose is the extraction of vital knowledge from usually low-level data in the setting of huge data sets. The data mining step in the KDD process uses methods drawn from machine learning, statistics, and pattern recognition (Fayyad et al. 1996).

Statistics is a key aspect of knowledge discovery from data. It supplies the framework and language used to analyze the uncertainty involved when inferring patterns from a sample of a general population. KDD can be viewed as taking a broader view of modelling than statistics alone. One of the main aims of KDD is to provide tools that automate the process of data analysis and the art of hypothesis selection performed by statisticians (Fayyad et al. 1996).

Another related field, which has evolved from databases, is data warehousing. It deals with gathering and cleaning transactional data so they can be used for online querying and decision support. Data warehousing helps prepare the way for KDD in two main ways: (1) data cleaning and (2) data access (Fayyad et al. 1996).

Regarding data cleaning, in most situations data usually have problems. In some cases mistakes are made during data collection, such as unfilled fields or data entry errors. Because of this, the KDD process cannot run smoothly without cleaning up the data. It is essential to perform data cleaning, since uncleaned data lead to poor data quality; at the same time, what seems like an anomaly to be removed might turn out to be a crucial pointer to an interesting domain phenomenon (Brachman & Anand 1994).

Data access refers to uniform methods for accessing data; providing access to data paths used to be an obstacle. The need for KDD arises once the issues of storage and data access have been resolved. A well-known method for analyzing data warehouses is online analytical processing (OLAP), named after a set of standards proposed by Codd (1993). OLAP tools focus on providing multidimensional data analysis and go beyond SQL in computing summaries and breakdowns along numerous dimensions. These tools are aimed at simplifying and supporting interactive data analysis, whereas in KDD the tools are meant to automate much of the process, which is why KDD is a step ahead of current database systems (Fayyad et al. 1996).

KDD can be differentiated from pattern recognition and machine learning, even though some of the mining methods used in the data mining step of the KDD process come from them. KDD encompasses the whole knowledge process from data, including how data are accessed and stored, how algorithms are scaled to large data sets while still functioning properly, how results are interpreted, and how the human-machine interaction can be properly modelled. The KDD process is a multidisciplinary activity that utilizes techniques covering a wide range of disciplines such as machine learning. The emphasis in KDD is on the discovery of useful patterns that can provide beneficial knowledge (Fayyad et al. 1996).

2.3 The knowledge discovery database process

The KDD process is an iterative and interactive process that consists of nine steps, and most of the decisions are made by the user. Each step is iterative and may require moving back to a previous step. The process is said to be artistic, and a specific formula for making the right choices at each step cannot be given. For this reason, it is important to understand the process and the possibilities within each step. The process begins with identifying the KDD goals and ends with the implementation of the discovered knowledge, after which the loop is closed (Fayyad et al. 1996; Maimon & Rokach 2005).

The knowledge discovery in databases process is divided into nine steps:

1. Developing an understanding of the application domain: This preliminary step establishes an understanding of the specific actions to be taken regarding representation, algorithms, and related choices. The person making the decisions in the KDD project needs to know and articulate the goals of the end user and the setting in which the knowledge discovery process will be used. This also includes relevant prior knowledge, and as the KDD process progresses it may be appropriate to revisit this step (Fayyad et al. 1996; Maimon & Rokach 2005).


2. Selecting and creating a data set on which discovery will be performed: After the aims of the KDD process have been determined, the data to be used in the discovery process must also be decided on. This involves identifying the type of data, obtaining other necessary data, and merging all data for the knowledge discovery into one dataset. This is a vital part of the process because the data mining step gathers knowledge only from the available data. It forms the evidence base for constructing models, and if some important attributes are missing, the entire study could fail (Maimon & Rokach 2005).

3. Cleansing and preprocessing: This step involves basic actions that enhance data reliability, such as removing outliers or noise, making strategic decisions on how to handle missing data fields, and accounting for known changes and time-sequence information (Fayyad et al. 1996; Maimon & Rokach 2005).

4. Data transformation: In this step, better data are generated and prepared for the data mining step. The methods used include attribute transformation and dimension reduction. This is an important step that largely determines how successful the entire KDD process will be, and it is also project specific (Maimon & Rokach 2005).

5. Choosing the data mining task: This involves deciding on the type of data mining method to use, which usually depends on the KDD goals and on the previous steps. Examples of data mining tasks include regression, clustering, summarization, and classification (Fayyad et al. 1996).

6. Choosing the data mining algorithm: This step involves choosing the particular method that will be used to search for patterns, including deciding which models and parameters are appropriate (Fayyad et al. 1996).

7. Data mining: In this step, the data mining algorithm is applied. This is an iterative process and may need to be repeated several times until a satisfactory result is obtained, for example by adjusting the algorithm's control parameters, such as the minimum number of instances in a single leaf of a decision tree (Maimon & Rokach 2005).

8. Interpretation and evaluation: In this step, the mined patterns are interpreted, with the possibility of returning to any of the previous steps for further iteration if needed. This step also involves visualization of the extracted patterns and visualization of the data in light of the extracted models (Fayyad et al. 1996).

9. Acting on the discovered knowledge: The knowledge obtained can be incorporated into another system for further action, presented to interested parties, or documented. This step also involves checking for and resolving potential conflicts with existing knowledge (Fayyad et al. 1996).

FIGURE 2: An overview of the KDD steps (Fayyad et al. 1996)

The various steps in the KDD process are repetitive and in some cases involve loops, which can occur between any two steps. In past studies, the focus of the KDD process has always been on the data mining step. However, for the effective application of KDD in practical situations, the other steps should not be neglected (Fayyad et al. 1996).
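To make the nine steps above more concrete, the following is a minimal sketch of a KDD-style pipeline in Python with scikit-learn. The synthetic table, feature names, and parameter choices are illustrative assumptions rather than material from the reviewed studies; the sketch only maps the selection, cleansing, transformation, mining, and evaluation steps onto code.

# Minimal, illustrative KDD-style pipeline on assumed synthetic data (not EEG).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Step 2: select/create the data set (a synthetic table standing in for clinical records).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "feature_a": rng.normal(size=200),
    "feature_b": rng.normal(size=200),
    "label": rng.integers(0, 2, size=200),
})
data.loc[rng.choice(200, size=10, replace=False), "feature_a"] = np.nan  # simulated entry errors

# Step 3: cleansing and preprocessing (drop records with missing fields).
data = data.dropna()

# Step 4: transformation (scale the attributes).
X = StandardScaler().fit_transform(data[["feature_a", "feature_b"]])
y = data["label"].to_numpy()

# Steps 5-7: choose the task (classification), the algorithm (decision tree), and mine.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Step 8: interpretation and evaluation.
print(classification_report(y_test, model.predict(X_test)))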

2.4 Data mining methods

The data mining process is an iterative one in which knowledge is discovered through automatic or manual methods. The exploratory nature of data mining makes it effective in situations where no predetermined outcome is envisaged (Kantardzic 2011). It requires the joint effort of humans and computers to yield good results, striking the right balance between the expertise and knowledge of humans in outlining problems and goals and the search competence of computers (Kantardzic 2011).

FIGURE 3: Data mining paradigms (Maimon & Rokach, 2005).

There are numerous data mining methods, which serve different aims and goals. Data mining methods can be divided into two main types: verification oriented and discovery oriented. Discovery methods identify patterns in the data autonomously and are further divided into prediction methods and description methods (Maimon & Rokach 2005; Yoo et al. 2012).

Verification methods deal with evaluating a hypothesis proposed by an expert or an external source. They include the most familiar traditional statistical methods, such as analysis of variance (ANOVA), hypothesis tests, and goodness-of-fit tests. These statistical methods are less aligned with the objectives of data mining, which aims to discover hypotheses rather than to test a known hypothesis (Maimon & Rokach 2005; Yoo et al. 2012).

The classification of data mining methods basically depends on the type of data being mined, the knowledge intended to be discovered, and the kind of method used (Joudaki et al. 2014). Before applying data mining methods to medical data, research analysts need to know what types of data mining methods exist and how they operate. The two main goals of data mining are prediction and description. Prediction typically involves using variables in a data set to forecast previously unknown values of other variables of interest, while description aims to discover patterns that describe the data and can be interpreted by humans (Kantardzic 2011; Yoo et al. 2012).

Predictive (supervised) learning aims to identify relationships between input variables and an output variable. Supervised methods are useful for classification and prediction purposes and include techniques such as neural networks, regression analysis, Bayesian networks, discriminant analysis, and support vector machines (SVM) (Joudaki et al. 2014; Yoo et al. 2012).

Descriptive data mining, on the other hand, aims to characterize data through their underlying relationships. In descriptive (unsupervised) learning, the model is defined by observing the data and identifying patterns without any predefined tag or class description. These methods are useful for summarizing and identifying relationships in the analyzed data or its attributes (Aguiar-Pulido et al. 2013). Clustering, association, summarization, and sequence discovery are examples of descriptive data mining (Yoo et al. 2012).

2.4.1 Classification Methods

In classification, data are assigned to one of several predefined, categorical class labels. The word 'class' in classification refers to a characteristic of a data set that is of interest to the user; in statistics it is known as the dependent variable. To classify data, the classification method generates a classification model consisting of characterization rules (Yoo et al. 2012). For example, in financial institutions classification methods can be used as a knowledge discovery tool to identify trends in the market (Fayyad et al. 1996). In healthcare, these methods can help characterize ailments and the observed symptoms in terms of diagnosis and prognosis (Yoo et al. 2012).

Classification involves a two-step process: the first step is training and the second is testing. In the training step, a classification model made up of classifying rules is built by analyzing training data containing class labels. Some classifiers use mathematical formulas instead of IF-THEN rules to improve accuracy. In the testing step, the classifier is examined for accuracy, or for its capability to categorize unknown items (records) for prediction. The testing step is an easy and computationally cheap process in comparison to the training step (Yoo et al. 2012).

The naïve Bayesian classifier is a simple and efficient classification method. The term 'naïve' indicates the assumption that data attributes are independent; this assumption is known as conditional independence (Wang et al. 2007). The classifier is built on Bayes' theorem and is a probabilistic statistical classifier. One of the benefits of the naïve Bayesian classifier is its ease of use, as it is the simplest of the classification algorithms; as a result, it can easily handle a data set with many different features (Yoo et al. 2012). "This classifier only requires a small amount of training data to develop accurate parameter estimations because it requires only the calculation of frequencies of attributes and attribute outcome pairs in the training data set" (Yoo et al. 2012).

The main disadvantage of this algorithm is its basic assumption that all attributes are independent of each other. This fundamental assumption of the classifier is unrealistic in many cases (Yoo et al. 2012).

A typical example is in the medical field, where several health measures are related to one another, for example body mass index and blood pressure. This may cause some anomalies in the generated classification. Overall, the Bayesian classifier provides accurate results when used for classification and is a very popular method in medical data mining (Yoo et al. 2012).
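As an illustration of the two-step training and testing process with a naïve Bayesian classifier, the sketch below uses scikit-learn's GaussianNB on synthetic data; the data set and split are assumptions made for the example only, not the data of the reviewed studies.

# Hedged sketch: Gaussian naive Bayes trained and then tested on held-out records.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a labeled clinical data set.
X, y = make_classification(n_samples=300, n_features=5, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1 (training): estimate per-class feature distributions from the labeled data.
nb = GaussianNB().fit(X_train, y_train)

# Step 2 (testing): measure accuracy on records the model has not seen.
print("test accuracy:", accuracy_score(y_test, nb.predict(X_test)))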

Neural networks emerged during the 20th century and were considered the most effective classification algorithm before other algorithms such as decision trees and support vector machines (SVM) were introduced. As one of the earliest classification algorithms in use, they have been widely applied in fields such as healthcare and biomedicine; for example, they have been the algorithm of choice for diagnosing diseases such as cancer and have also been used in the prediction of outcomes (Yoo et al. 2012).

Neural networks are computer programs built to mirror the neurological activity of the brain. They are made up of computational nodes that mimic how the neurons in the brain function. These neurons, or nodes, are linked to other nodes through links with adjustable weights. When the neural network is learning, or being trained, the link weights are adjusted. The nodes in a neural network can be organized into two layers, input and output, or in some cases into three layers: input, hidden, and output (Yoo et al. 2012).

For example, a network could be designed that links a set of observations to a set of diagnoses. Each input node would be assigned to a different datum and each output node to a corresponding diagnosis. When the observed findings are presented to the network, the output node that is most stimulated by the input data fires preferentially and thus produces a diagnosis (Coiera 2015).

The knowledge derived from the observations and diagnoses is stored within the connections of the network. The main idea behind neural networks is inspired by the way neurons react once a certain level of activation has been reached: a node in the network fires when the sum of its inputs is greater than a predetermined threshold.

“These inputs are determined by the number of input connections that have been fired and the weights upon those connections. Thus when a network is presented with a pattern on its input nodes, it will output a recognition pattern determined by the weights on the connections between layers” (Coiera, 2015).

The most frequently used neural network is the multi-layer perceptron with back-propagation, which is available in Weka and is said to perform better than the other neural algorithms (Yoo et al. 2012).

FIGURE 4: NN architecture (Aguiar-Pulido et al. 2013)

Although neural networks have been widely used, they still have several limitations. Firstly, the learning or training process is usually slow, because it takes time to find suitable parameters among the many possible combinations, and it is computationally expensive. Secondly, they are limited in their ability to explain the conclusions reached; for this reason health professionals cannot see how a network arrives at its classification decisions, unlike algorithms such as decision trees. Thirdly, they need a large number of parameters, and performance is sensitive to the parameters chosen. Lastly, their classification accuracy is lower than that of more recently developed classification algorithms such as support vector machines and decision trees (Yoo et al. 2012).
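The multi-layer perceptron mentioned above can also be trained outside Weka, for example with scikit-learn; the network size, iteration limit, and synthetic data below are illustrative assumptions only.

# Hedged sketch: a small multi-layer perceptron trained with back-propagation.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=1)
X = StandardScaler().fit_transform(X)  # neural networks train more reliably on scaled inputs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

# One hidden layer of 20 nodes; link weights are adjusted by back-propagation during fit().
mlp = MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, random_state=1)
mlp.fit(X_train, y_train)
print("test accuracy:", mlp.score(X_test, y_test))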

Decision trees are among the most popular classification algorithms in data mining and machine learning because of their ease of understanding. They are useful in modelling when the aim is to comprehend the fundamental processes of the environment, and they are also useful when the data do not meet the assumptions required by more traditional methods (Czajkowski et al. 2014).

Decision tree algorithms were introduced by Ross Quinlan in 1979. The most widely used decision tree algorithm is C4.5, which replaced the Iterative Dichotomiser 3 (ID3) (Yoo et al. 2012).

Decision trees consist of decision nodes joined together by branches that extend in a top-down manner from the root node before terminating at the leaf nodes. The process starts at the root node, which is commonly placed at the top of the decision tree diagram; each attribute is tested at a decision node, and each outcome forms a branch. The newly formed branch then leads to another decision node or to a terminating leaf node (Larose & Larose 2015).

For example, the diagram in Figure 5 shows a decision tree for lung cancer in which family history is the root node. When a decision tree contains IF-THEN rules, it is a classification model, which means that the construction of a decision tree can be regarded as an important part of the training process (Yoo et al. 2012).


One of the benefits of the decision tree is the ability to visualize data in a class-focused way. This allows the user to comprehend the data structure easily and to readily observe which attribute affects the class. A limitation of the decision tree arises when there are too many attributes and the tree becomes difficult to understand. Such a complex decision tree can be handled with tree pruning methods, which use statistical techniques to remove the least important branches; this helps the user to focus on and work with the more important attributes (Yoo et al. 2012).

FIGURE 5: Decision tree example (Yoo et al. 2012)
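A tree like the one in Figure 5 can be grown and its IF-THEN-style rules printed as follows; the attributes (family history, smoking, age) and the synthetic labels are hypothetical examples invented for this sketch, not the thesis data.

# Hedged sketch: fit a small decision tree and print its rules (hypothetical attributes).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
# Hypothetical attributes: family_history (0/1), smoker (0/1), age (years).
X = np.column_stack([
    rng.integers(0, 2, 300),
    rng.integers(0, 2, 300),
    rng.integers(20, 80, 300),
])
# Hypothetical label loosely tied to the attributes, only to give the tree some structure.
y = ((X[:, 0] + X[:, 1] >= 2) | (X[:, 2] > 70)).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["family_history", "smoker", "age"]))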

The support vector machine (SVM) came into existence in the early 1990s, and the extended support vector machine in the mid-1990s; it was developed at AT&T Bell Laboratories by Vladimir Vapnik and his co-workers, with foundational work carried out by Vapnik and colleagues in 1963. The SVM is based on statistical learning theory, and its design is focused on solving a two-class classification problem (Yoo et al. 2012).

An example is classifying cases as safe versus risky (see Figure 6). The early and later versions of the SVM differed somewhat: the former provided only a linear kernel function, while the latter supplied non-linear kernel functions, e.g. polynomial and radial basis functions, which helped strengthen classification accuracy (Yoo et al. 2012).

“The strategic basis of the SVM is when a dataset is represented in a high dimensional feature space, it searches for the ideal separating hyperplane where the margin amidst two unique objects is maximal. Hyperplanes are decision limits between two unique set of objects. To discover the hyperplane that has the maximal margin, the SVM makes use of support vectors and the margin can be identified by the use of two support vectors” (Yoo et al. 2012).

One of the main benefits of the SVM is its classification accuracy, although it should be noted that it is not the preferred or best technique for every dataset. Among its limitations, multiple kernel functions are provided and no single kernel performs equally well for every data set. Secondly, the SVM is designed to solve a two-class classification problem; a multiclass classification problem is handled by reducing the multiclass problem to multiple binary problems (Yoo et al. 2012).

FIGURE 6: Example of SVM. Safe vs Risky (Yoo et al. 2012)
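The difference between the linear kernel and a non-linear (radial basis function) kernel described above can be illustrated on data that are not linearly separable; the data set and parameters are assumptions for this sketch.

# Hedged sketch: linear vs. RBF-kernel SVM on two concentric classes.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A linear hyperplane cannot separate concentric rings well; an RBF kernel can.
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X_train, y_train)
    print(kernel, "test accuracy:", round(clf.score(X_test, y_test), 3))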

The ensemble approach is a data mining technique that falls under classification. Its logic is based on the fact that multiple classifiers working together can achieve better classification accuracy than a single classifier. For example, if three classifiers A, B, and C predict that a patient who is difficult to classify has lung cancer and two other classifiers D and E predict that the patient does not, then by a voting strategy the patient is identified as having lung cancer (Yoo et al. 2012).


When the ensemble approach is used, researchers can be more confident that the prediction results obtained are reliable. There are three kinds of ensemble approach: bagging, boosting, and random subspace. Further research has shown that classification performance can be optimized by using meta-ensemble approaches (Yoo et al. 2012).

AdaBoost, or adaptive boosting, invented by Schapire and Freund around 1997, is a boosting ensemble method. The basic idea behind AdaBoost was published in an abstract in 1995. In recent times it has gained popularity due to its ability to provide high-quality classification performance; it has been able to outperform other classification techniques such as SVM, which is why it is the most used ensemble method (Yoo et al. 2012).

A major attribute that allows it to achieve superior classification is weighted majority voting: classifiers that give better classification results during the iterative training process are given a greater voting weight than the rest in the final classification decisions (Yoo et al. 2012).
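A boosting ensemble with weighted majority voting of the kind described above can be sketched with scikit-learn's AdaBoostClassifier; the number of estimators and the synthetic data are illustrative assumptions, and the default weak learner (a depth-1 decision tree) is used.

# Hedged sketch: AdaBoost over decision stumps compared with a single stump.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=15, n_informative=5, random_state=2)

single_stump = DecisionTreeClassifier(max_depth=1, random_state=2)
boosted = AdaBoostClassifier(n_estimators=100, random_state=2)  # weighted votes of many weak learners

# Cross-validated accuracy: the boosted ensemble usually beats the single weak learner.
print("single stump:", round(cross_val_score(single_stump, X, y, cv=5).mean(), 3))
print("AdaBoost:", round(cross_val_score(boosted, X, y, cv=5).mean(), 3))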

2.4.2 Regression

Regression is a common and versatile statistical technique used for studying relationships among variables. It establishes a relationship between a variable called the criterion and a set of variables called the predictors (Aguiar-Pulido et al. 2013).

Regression has found applications in predicting the amount of biomass available in a forest from remotely sensed microwave measurements, in estimating the probability that a patient will survive based on diagnostic test results, and in predicting consumer demand for a new product as a function of advertising expenditure (Fayyad et al. 1996).
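As a small sketch of relating a criterion to a set of predictors, the following fits an ordinary least-squares regression on synthetic data; the predictors, coefficients, and noise level are hypothetical and serve only to show the mechanics.

# Hedged sketch: ordinary least-squares regression of a criterion on two predictors.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
# Hypothetical predictors (e.g. two measurements) and a noisy criterion built from them.
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

model = LinearRegression().fit(X, y)
print("estimated coefficients:", model.coef_)   # should be close to 3.0 and -1.5
print("intercept:", model.intercept_)
print("R^2 on the training data:", model.score(X, y))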

2.4.3 Descriptive methods

Clustering is a descriptive task that deals with the observation of independent variables only. The main difference between clustering and classification is the absence of a 'class'. As a result, clustering is especially well suited to exploratory studies and to studies that deal with a huge amount of data about which little is known (Yoo et al. 2012).


The clustering technique involves grouping a set of objects into clusters so that objects in the same cluster are more similar to each other than to those in other clusters. The similarity between objects can be measured by their attribute values (Yoo et al. 2012).

Clustering methods are very useful when exploring how samples are interrelated, in an effort to make a preliminary judgement of the sample structure (Kantardzic 2011).

Clustering has found application in biology, where similar plant and animal species were clustered using their characteristics in order to create a taxonomy. Several clustering algorithms have been introduced over the years; they are mainly categorized into two types, hierarchical and partitional (Yoo et al. 2012).

Hierarchical clustering algorithms work by repeatedly merging the two groups that are most similar, based on the pairwise distance between the two groups of objects, until a termination condition is met; in this way the objects are grouped hierarchically. Hierarchical algorithms can therefore be categorized by the method used to compute the similarity or distance between two groups of objects, and they are of three types: complete-link, single-link, and average-link (Yoo et al. 2012).

One of the benefits of hierarchical clustering techniques is the visualization capability they offer, as they easily show how similar the objects in the data set are to one another. With the use of a dendrogram it is possible for researchers to estimate the number of clusters. This attribute sets hierarchical algorithms apart from the other clustering algorithms, which do not have this particular feature, and it is especially useful when the data do not contain extra information. However, when the data set is too big to be presented using a dendrogram, the visualization capability becomes very poor (Yoo et al. 2012).

A way of overcoming this challenge is to sample the data randomly, enabling users to comprehend the overall grouping of the data through a dendrogram generated from the sampled data. A major limitation of these algorithms is that they scale poorly to large data sets; for this reason they are slower than partitional clustering algorithms. Additionally, they require a large amount of system memory to calculate distances between objects, and lastly, the accuracy of the resulting clusters is lower than that obtained from partitional clustering algorithms (Yoo et al. 2012).
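A small hierarchical-clustering sketch with SciPy shows the average-link variant and the dendrogram discussed above; the blob data and the choice of three flat clusters are assumptions made for illustration.

# Hedged sketch: average-link hierarchical clustering with a dendrogram (SciPy).
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=30, centers=3, random_state=4)

# Repeatedly merge the two most similar groups, using the average pairwise distance.
Z = linkage(X, method="average")

# Cut the tree into three flat clusters and draw the dendrogram to inspect the grouping.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
dendrogram(Z)
plt.show()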


Partitional clustering algorithms differ from hierarchical clustering algorithms in that they need a value K (the number of clusters) to be input by the user. They are generally categorized based on how they pick a cluster centroid among the items in an incomplete cluster, how they relocate objects, and how similarities are measured between the cluster centroids and the objects. For example, the most widely used partitional algorithm, K-means, first selects the K centroids randomly and then decomposes the objects into K disjoint groups by repeatedly moving objects based on the similarities between the objects and the centroids. In K-means, the cluster centroid is the mean value of the objects in the cluster (Yoo et al. 2012).

Partitional clustering algorithms have some advantages over the other methods: they achieve better clustering accuracy than hierarchical clustering algorithms, which is said to be the outcome of their global optimization strategy. Additionally, partitional algorithms can manage large data sets more effectively than hierarchical algorithms and can generate clusters faster (Yoo et al. 2012).

One limitation of partitional algorithms is that the clustering results depend to some extent on the initial cluster centroids, which are randomly selected; this means the clustering results differ slightly each time the partitional algorithm is run (Yoo et al. 2012).
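For comparison, the K-means partitional algorithm can be run as below; because the initial centroids are chosen at random, fixing a seed and using several restarts (n_init) is what keeps the results repeatable. The data and the value of K are assumptions for the sketch.

# Hedged sketch: K-means with K=3; n_init restarts reduce sensitivity to the random
# initial centroids discussed above.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=5)

km = KMeans(n_clusters=3, n_init=10, random_state=5).fit(X)
print("cluster sizes:", [int((km.labels_ == k).sum()) for k in range(3)])
print("centroids (the mean of each cluster's objects):")
print(km.cluster_centers_)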

Summarization is a technique for finding a compact description of a subset of data. A typical example is the tabulation of the mean and standard deviation for all fields. More sophisticated methods involve the derivation of summary rules, multivariate visualization techniques, and the identification of functional relationships between variables. These techniques are often used in interactive exploratory data analysis and in generating automated reports (Fayyad et al. 1996).

Association, or association rules, can be said to be one of the important techniques in data mining; it is the most popular form of local pattern discovery in unsupervised learning. This technique mirrors how people think when trying to understand the data mining process, because it involves mining for knowledge across diverse databases, much as one would mine for gold. In this scenario, the gold would be an interesting rule that gives information about the database which is not known beforehand and is comprehensible (Kantardzic 2011).

This method searches for all outcomes and patterns in a database, and the advantage of the technique is its thoroughness. On the other hand, this can also be regarded as a disadvantage, as the researcher can easily be swamped with so much new information that analyzing its usefulness becomes complex and time consuming (Kantardzic 2011).

A novel association rule algorithm named Apriori was introduced at the IBM Almaden Research Center by Agrawal and his colleagues. This algorithm has received a lot of attention because it solves the issues associated with an earlier association rule algorithm called KID3 by using the 'Apriori property', which enables association mining to be applied to large databases to extract association rules (Yoo et al. 2012).

Market basket analysis, or association rule mining, is usually used to discover relationships or patterns in customers' purchases. For example, when a consumer purchases bread with butter, it can be assumed that the consumer is likely to buy milk as well; by obtaining this information, a grocery store can boost its sales. In healthcare, this method can be used to identify patterns and relationships among symptoms, diseases, and health conditions. From this knowledge, researchers can provide evidence-based information about symptoms and health issues that are risk factors for a disease (Yoo et al. 2012).

The Apriori algorithm generally requires two user inputs: support and confidence. This is because most users are interested in association rules that occur often in the database and have high accuracy. Support and confidence can be used to filter out uninteresting association rules. The Apriori property is the backbone of the algorithm: if an itemset is not frequent, none of its supersets is frequent, and no association rules are generated from it. This feature greatly reduces the search for frequently occurring itemsets and increases the efficiency of the algorithm (Yoo et al. 2012).
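The support and confidence measures that Apriori filters on can be computed directly for a toy transaction set; this hand-rolled sketch illustrates only the definitions, not the full algorithm, and the transactions are invented for the example.

# Hedged sketch: support and confidence of the rule {bread, butter} -> {milk}
# over an invented list of market-basket transactions.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk", "eggs"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

antecedent, consequent = {"bread", "butter"}, {"milk"}
rule_support = support(antecedent | consequent, transactions)
confidence = rule_support / support(antecedent, transactions)

print(f"support = {rule_support:.2f}")      # 2 of 5 transactions contain all three items
print(f"confidence = {confidence:.2f}")     # share of bread-and-butter baskets that also have milk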

Compared to clustering and classification, accuracy is not evaluated in association mining because each association algorithm mines all association rules. Thus, efficiency is the main evaluation factor and the main aim of association mining algorithms (Yoo et al. 2012).


3 DATA MINING AND EPILEPSY

3.1 Epilepsy health outcome using data mining

The use of data mining in healthcare research has risen due to its ability to recognize healthcare patterns and its use in generating disease hypotheses (Shah & Lipscombe 2015). The incidence of chronic diseases such as heart disease, diabetes, and kidney disorders is a major cause for concern. Different tests and control measures are usually involved, which in turn lead to increased healthcare costs for individuals with chronic diseases (Shah & Lipscombe 2015).

According to Yildrim and others (2014), there is a significant level of unpredictability in diagnosing many diseases, because physicians use pattern recognition and experience to generate hypotheses based solely on a patient's medical history, symptoms, physical examination, and some other tests. In reality, "clinical decisions are prone to error", especially in situations with a lot of information. Gupta and others (2016) suggest that plans should be made to reduce the overall cost of chronic disease diagnoses and tests. In most cases, these chronic diseases lead over time to various other diseases.

Another option for epilepsy treatment is surgery, but for this to be carried out it is important to identify the epileptogenic zone, the region in which the seizures originate. This region can be identified through pre-surgical analysis, and the epileptogenic tissue can be removed to prevent or reduce the frequency of seizures without causing damage or a significant neurophysiological deficit (Chaovalitwongse et al. 2007).

However, resective surgery cannot be performed on about half of pre-surgical patients, because the epileptogenic zone lies in functional brain tissue or a single epileptogenic zone cannot be identified. Moreover, the percentage of patients who no longer experience seizures after surgery is about 60-80% (Chaovalitwongse et al. 2007).

Sudden unexpected death in epilepsy (SUDEP) is the most common cause of mortality in epilepsy patients (Johnston & Smith 2007; Arida et al. 2008; Tomson et al. 2008; Wahab 2010). It is said to occur unexpectedly, with or without any evidence of a seizure (Arida et al. 2008; Wahab 2010). It causes about 1-3 deaths per 1100 epileptic patients in a year. Male epilepsy patients between 20 and 24 years of age have a higher risk of SUDEP, especially if they are pharmacoresistant and have generalized seizures (Wahab 2010).

Epileptic seizures arise from momentary electrical malfunction in the brain, and in some scenarios the seizures can occur without notice. These seizures can also be confused with other ailments such as stroke, because they show similar symptoms, such as falling and the presence of migraines. It is estimated that at least one in 100 persons will experience a seizure at some point in their life. Sadly, epileptic seizures are unpredictable, and very little is understood about their mechanism and process (Iasemidis et al. 2003).

The main feature of epilepsy is epileptic seizures, and it affects all age groups from children to the elderly. Seizures result from alterations in brain neurons leading to excessive synchronous neuronal discharges in the brain. The term 'epileptic seizure' is used to distinguish seizures caused by excessive electrical discharges in the brain from physiological events (Huang et al. 2015; Stafstrom & Carmant 2015).

The causes of epilepsy vary, but they all reflect a fundamental malfunction of the brain with specific seizure types and recognizable syndromes. As a neurological disorder, epilepsy is quite complex to treat and assess due to its association with factors such as learning disorders, neurological issues, and psychological and psychiatric problems, especially among the elderly population (Huang et al. 2015; Stafstrom & Carmant 2015).

New cases of epilepsy arise at a rate of approximately 50 per 100,000 population, which makes it one of the most prevalent neurological conditions. About 1% of the population is affected by epilepsy, while refractory epilepsy accounts for one-third of patients living with epilepsy. Developing brains are more susceptible to epilepsy, as a large percentage of epilepsy begins in childhood (Stafstrom & Carmant 2015).

The sudden occurrence of seizures without prior notice and the shift from a normal brain state to seizure onset is typically regarded as an unexpected phenomenon. The unforeseen nature of these seizures causes stress for patients, exposes them to the risk of serious injury, and has a major influence on their daily life. Therefore, the ability to predict when seizures will occur would go a long way toward improving patients' quality of life and would provide some insight into the treatment of epilepsy (Iasemidis et al. 2003; Wang et al. 2014).

The detection of epileptic seizures is an area of interest to many researchers in the field of neurosciences not just because of its potential in clinical decision support and therapeutic devices but because it could pave the way to further understanding the hidden mechanisms involved in epilepsy (Lehnertz & Litt 2005).

3.1 Data mining and machine learning

The term data mining was coined in the 1980s, while the phrase machine learning has been in use since the 1960s. At present, the term 'data mining' is used more widely than machine learning, which may be why researchers refer to their work as data mining rather than machine learning (Buczak & Guven 2016).

Data mining and machine intelligence are currently gaining considerable attention as research areas and draw on related fields such as artificial intelligence, databases and statistics, which are used to find new information and patterns in big data. Data mining methods involve the use of machine learning techniques and scientific computation (Guruvayur & Suchithra, 2017).

A major way in which data mining distinguishes itself from some statistical methodologies and data investigation is that it makes use of logical systems, which include pattern matching, numerical investigation and artificial intelligence techniques such as machine learning, genetic algorithms and neural systems (Guruvayur & Suchithra, 2017).

Data mining and machine learning are two overlapping research areas that are occasionally confused because they often use the same methods. Arthur Samuel, a pioneer of machine learning, defined it as a "field of study that gives computers the ability to learn without being explicitly programmed." The focus of machine learning is on classification and prediction, based on properties learned from the training data (Buczak & Guven, 2016).

The algorithms used in machine learning need a goal (problem formulation) from the domain (e.g. a dependent variable to predict). Data mining, on the other hand, focuses on the discovery of information that was previously unknown and does not require specific


goals from the domain but dwells on discovering new knowledge (Buczak & Guven, 2016).

Machine learning typically involves two phases, training and testing. The following steps are usually followed in machine learning:

• Identifying the class attributes and classes from the training data

• Identifying a subset of the attributes necessary for classification

• Learning the model using the training data

• Using the trained model to classify unknown data

Machine learning methods should in fact include three phases, not two: training, validation and testing. Data mining and machine learning models usually have parameters, such as the number of layers and nodes in an artificial neural network (ANN). When training is complete, several candidate models exist. To obtain an accurate estimate of the error a model will achieve on a test set, the model that performs best on the validation data should be selected, and it should not be fine-tuned based on its accuracy on the test data set. Otherwise, the reported accuracy is optimistic and does not necessarily reflect the accuracy that would be obtained on a different test set similar to, but slightly different from, the existing one (Buczak & Guven, 2016).
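To make this training, validation and testing workflow concrete, the following minimal sketch selects among several candidate models by their validation accuracy and only then reports test-set accuracy. It is an illustration only: it assumes the scikit-learn library and uses synthetic placeholder data rather than any data set discussed in this thesis.

# A minimal sketch of the training / validation / testing workflow described above.
# Assumes scikit-learn; the feature matrix X and labels y are synthetic placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))            # hypothetical feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # hypothetical binary labels

# Split once into training, validation and test sets (60 / 20 / 20).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Train several candidate models (different ANN sizes) on the training data only.
candidates = [MLPClassifier(hidden_layer_sizes=(n,), max_iter=1000, random_state=0)
              for n in (5, 10, 20)]
for model in candidates:
    model.fit(X_train, y_train)

# Select the model by validation accuracy, never by test accuracy.
best = max(candidates, key=lambda m: accuracy_score(y_val, m.predict(X_val)))

# Report the unbiased error estimate on the untouched test set.
print("test accuracy:", accuracy_score(y_test, best.predict(X_test)))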

Machine learning algorithms are usually divided into supervised and unsupervised learning. In supervised learning, the algorithms are trained on examples called labeled cases, in which each input is paired with its already known intended output (Guruvayur & Suchithra, 2017).

Unsupervised learning involves inducing a function that describes hidden structure in unlabeled data. Because only unlabeled examples are given to the learner, there is no direct assessment of the correctness of the structure produced by the algorithm, which is one way of distinguishing unsupervised learning from the other learning methods (Guruvayur & Suchithra, 2017).
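The difference between the two settings can be illustrated with a short sketch. It assumes scikit-learn and uses synthetic two-dimensional data; both the data and the choice of logistic regression and k-means are illustrative assumptions, not methods taken from the reviewed studies.

# Illustrative contrast between supervised and unsupervised learning (synthetic data).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Two synthetic groups of points; the labels exist but are only shown to the supervised learner.
X = np.vstack([rng.normal(0, 1, size=(100, 2)), rng.normal(4, 1, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Supervised learning: inputs are paired with their known labels during training.
supervised_model = LogisticRegression().fit(X, y)

# Unsupervised learning: only the unlabeled inputs are given, and the algorithm
# must discover structure (here, two clusters) on its own.
cluster_assignments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)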

3.2 Classification of seizure types

Epilepsy can be classified in different ways, depending on the part of the brain in which the seizure originates, the cause of the seizure, observation of the patient during the episode and the reason behind the first seizure. Seizures can be


classified in numerous ways but they are mostly categorized as generalized, partial or unknown (Rijo et al. 2014).

The International League Against Epilepsy (ILAE) released an updated classification of seizure types in 2017. The update was motivated by several factors; for example, tonic seizures or epileptic spasms can have either a generalized or a focal onset. In addition, poor knowledge about certain seizures made them difficult to classify, some terms used in seizure classification were ambiguous, and some seizure types were not included in the old classification (Fisher et al. 2017).

The first step in the classification of seizures is the examination of the signs and symptoms usually associated with epilepsy. It is unrealistic to match a specific sign or symptom to a particular seizure type, because these symptoms occur in more than one seizure type (Fisher et al. 2017).

On the other hand, a single seizure type can be associated with several symptoms. Generally, seizures can be distinguished by identifying a sequence of attributes and certain clinical features. For example, an absence seizure shows a more rapid recovery of function than a focal impaired awareness seizure. In other cases, information obtained from electroencephalography (EEG), imaging or laboratory studies is used in the classification of seizures (Fisher et al. 2017).

FIGURE 7: ILAE classification of Epilepsy (Fisher et al. 2017)


3.3 Detection of Epilepsy Using Data Mining Methods

In the USA, epilepsy is considered the second most common neurological condition and is known to affect individuals over the age of 65 most, with over one percent of Americans experiencing epilepsy every year. The ability to detect epilepsy from EEG signals is vital in treating epilepsy and in the prediction of seizures. The different seizure types usually produce distinct spikes in the EEG signals recorded from the brain (Sanei & Chambers 2007). A lot of effort has gone into detecting epileptic spikes in EEG signals (Haydari et al. 2011; Schuyler et al. 2007).

EEG signals were first reported by the German neuropsychiatrist Hans Berger in the early 1920s. These signals record the electrical activity of the brain, and they have become a good alternative for diagnosing neurological disorders. Through the analysis of EEG recordings, vital information about the physiological state of the brain can be obtained, and this can prove important in the detection of epilepsy, because the occurrence of seizures produces obvious abnormalities in the EEG signals. An impending seizure could therefore be managed properly, avoiding inherent danger to the individual, by initiating a warning signal (Zainuddin et al. 2013).

The automated detection of seizures remains a challenge because false positives are generated, and for this reason the accuracy of automated EEG analysis is not trusted in clinical settings. Time series models are well suited to the waveform data obtained from EEG. Because the data are presented as continuous waveforms, specialists can analyze them, detect seizures and identify other brain activity from the EEG readings; the data recorded by the machine consist of numerous discrete electrical readings measured in millivolts (mV) (Turner et al. 2014).

The diagnosis of epilepsy is made when an individual suffers an epileptic seizure and has a condition that puts them at risk of encountering another one. The electrical activity generated by the cerebral cortex of the brain is usually recorded by an electroencephalogram (EEG), and these EEG signals are regarded as among the vital physiological signals that can be used in the detection of epilepsy (Sadati et al. 2006). However, the visual inspection of EEG recordings for epilepsy-related features is tedious and time consuming, and bio-signals may vary and show disagreements regarding the same condition (Sadati et al. 2006; Mohamed et al. 2012).


Electroencephalography (EEG) is a popular technique used in the detection of epileptic seizures. It works by recording the electrical impulses generated by brain neurons through simple electrodes attached to the scalp (Wang & Zhang 2009). By placing these electrodes on the scalp, brain activity is recorded, with emphasis on abnormal differences in the voltage impulses produced by the brain (Niedermeyer et al. 2005). EEG can provide valuable insight into problems associated with the brain and details about brain activity. The seizure-free periods in epileptic patients have also been considered a vital part of the prediction and diagnosis of epilepsy (Alkan et al. 2005; Adeli et al. 2009; Oğulata et al. 2009; Abualsaud et al. 2014).

The analysis of epileptic seizures is based on the visual identification of patterns in EEG signals and is performed by experienced neurologists who look for patterns of interest such as spikes or spike-wave discharges. This process is time consuming, requires expertise and often causes disagreements between neuroscientists because the signal analysis is subjective (Mohseni et al. 2006; Mporas et al. 2014).

Epileptic seizures can affect either a part of the brain (partial seizures), which is seen in a few channels of the EEG recording, or the entire brain, which is seen in all channels of the EEG recording (Gotman 1999). It is the daily practice of neurologists to examine short recordings of interictal periods. Isolated spikes, sharp waves and spike-and-wave complexes are the most common interictal events. They are seen in most people with epilepsy, which makes the detection of interictal events important in diagnosing epilepsy, although during an isolated spike the brain is not in clinical seizure. During the ictal period, different EEG patterns are seen, consisting of waveforms of different frequencies. Although interictal findings provide evidence of epilepsy, the diagnosis of epilepsy is based on the observation of epileptic seizures (Gotman 1999; McGrogan 1999; Tzallas et al. 2009).

According to Siddiqui & Islam (2016), electrocorticogram (ECoG) signals are used in the monitoring of brain activity; ECoG refers to the invasive procedure in which the electrodes are placed directly on the surface of the brain rather than on the scalp. The signals obtained are time series in nature, and they are of importance in seizure detection and intervention. These time series data are observations made in a sequence over time. The main feature of


these time series data is their high dimensionality and size, which is measured with respect to time (Fu 2011; Siddiqui & Islam 2016).
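A common first step in handling such high-dimensional, time-indexed recordings is to segment the signal into fixed-length windows before any feature extraction. The following is a minimal sketch of that step; the sampling rate, window length and synthetic signal are illustrative assumptions, not values taken from the studies cited above.

# Segmenting a time series signal into fixed-length windows (illustrative values only).
import numpy as np

fs = 256                   # assumed sampling rate in Hz
window_sec = 2             # assumed window length in seconds
signal = np.random.default_rng(2).normal(size=fs * 60)   # one minute of synthetic data

samples_per_window = fs * window_sec
n_windows = len(signal) // samples_per_window

# Each row is one 2-second window; rows are ordered in time.
windows = signal[:n_windows * samples_per_window].reshape(n_windows, samples_per_window)
print(windows.shape)       # (30, 512)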

The presence of interictal spikes is generally accepted as a sign of epilepsy, although the reasons for the presence of interictal activity in the brain are unknown (Staley & Dudek 2006). Apart from the usual activity recorded in the brain during epileptic seizures, the EEG of epileptic patients shows 'spikes' at certain locations in the brain. These spikes provide information for the localization of epilepsy and for its diagnosis (Valenti et al. 2006).

When preparing an epileptic patient for epilepsy surgery, long periods of video/EEG recording covering both the interictal and ictal periods are analyzed to determine the localization of the epileptogenic zone. Methods for automatically detecting interictal spikes have been used for many years to support the visual analysis of large amounts of data. Different attempts have been made to establish a proper spike detection method, but they have all fallen short due to the lack of characterization of the events to be detected (Valenti et al. 2006).
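To illustrate, in the simplest possible terms, what an automated spike detector has to do, the sketch below flags samples whose amplitude exceeds a fixed multiple of the signal's standard deviation. This is purely an illustrative assumption; the detectors discussed in the literature cited here are considerably more sophisticated.

# A deliberately simple, illustrative spike detector; not a clinical method.
import numpy as np

def detect_spikes(signal: np.ndarray, k: float = 5.0) -> np.ndarray:
    """Return indices of samples whose absolute amplitude exceeds k standard deviations."""
    threshold = k * np.std(signal)
    return np.flatnonzero(np.abs(signal) > threshold)

rng = np.random.default_rng(3)
eeg = rng.normal(0, 1, size=5000)    # synthetic background activity
eeg[[1200, 3400]] += 15              # two artificial spike-like deflections
print(detect_spikes(eeg))            # expected to report indices 1200 and 3400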

Due to the recent interest in and promise shown by data mining methods, data mining computational models can be used to extract previously unknown and useful information from large databases. The main rationale behind these methods is the discovery of patterns in large databases that are hidden at first glance because of the large amount of data stored (Mitchell 1997; Flexer 2000; Valenti et al. 2006).

The two important steps in the automated detection of epileptic EEG are feature extraction and classification. The features used in the automatic detection of EEG fall into four main categories: time domain, frequency domain, time-frequency domain and nonlinear domain (Thakor et al. 2004; Wang et al. 2014).
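The sketch below illustrates these two steps on synthetic data: a few simple time-domain and frequency-domain features are computed for each window and passed to a generic classifier. It assumes scikit-learn, and the chosen features, classifier and signals are illustrative assumptions that do not reproduce any specific method from the reviewed studies.

# Illustrative feature extraction (time and frequency domain) followed by classification.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

fs = 256                                    # assumed sampling rate in Hz
rng = np.random.default_rng(4)

def extract_features(window: np.ndarray) -> list:
    """Compute simple time-domain and frequency-domain features for one window."""
    spectrum = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(len(window), d=1 / fs)
    return [
        window.mean(),                                    # time domain
        window.std(),                                     # time domain
        np.abs(np.diff(window)).mean(),                   # time domain (line-length style)
        spectrum[(freqs >= 0.5) & (freqs < 12)].sum(),    # low-frequency power
        spectrum[(freqs >= 12) & (freqs < 30)].sum(),     # higher-frequency power
    ]

# Synthetic "background" windows (noise) and "seizure-like" windows (noise plus a rhythm).
t = np.arange(2 * fs) / fs
background = [rng.normal(size=2 * fs) for _ in range(50)]
seizure_like = [rng.normal(size=2 * fs) + 3 * np.sin(2 * np.pi * 5 * t) for _ in range(50)]

X = np.array([extract_features(w) for w in background + seizure_like])
y = np.array([0] * 50 + [1] * 50)

# Classification step: a generic classifier evaluated with cross-validation.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())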


4 AIMS OF THE STUDY

The aim of this study is to explore the use of different data mining methods for epilepsy seizure detection and improved clinical decision support.

The objectives of this scoping review are:

1) To identify the use of data mining methods in the analysis of data on epilepsy.

2) To highlight the data mining methods used in the detection of epilepsy.

3) To understand the role of seizure detection in improved clinical decision support.
