Detection of medications associated with Alzheimer's disease using ensemble methods and cooperative game theory


UEF//eRepository

DSpace https://erepo.uef.fi

Self-archived publications, Faculty of Health Sciences

2020


Braithwaite, B

Elsevier BV

Scientific journal articles

© 2020 The Authors

CC BY-NC-ND https://creativecommons.org/licenses/by-nc-nd/4.0/

http://dx.doi.org/10.1016/j.ijmedinf.2020.104142

https://erepo.uef.fi/handle/123456789/8191

Downloaded from University of Eastern Finland's eRepository



International Journal of Medical Informatics

journal homepage: www.elsevier.com/locate/ijmedinf

Detection of medications associated with Alzheimer's disease using ensemble methods and cooperative game theory

B. Braithwaite^a,*, J. Paananen^b, H. Taipale^a,c,d,e, A. Tanskanen^d,e, J. Tiihonen^d,e, S. Hartikainen^a,c, A-M. Tolppanen^a

^a School of Pharmacy, Faculty of Health Sciences, University of Eastern Finland, Kuopio, Finland

^b Institute of Biomedicine, University of Eastern Finland, Finland

^c Kuopio Research Centre of Geriatric Care, University of Eastern Finland, Kuopio, Finland

^d Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden

^e Department of Forensic Psychiatry, Niuvanniemi Hospital, University of Eastern Finland, Kuopio, Finland

ARTICLE INFO

Keywords: Pattern recognition; Knowledge organization; Feature attribution; Cooperative game theory; Shapley value; Gradient boosting; Alzheimer's disease; Nested case-control study; Pharmacoepidemiology

ABSTRACT

Objective: To study the feasibility of evaluating feature importance with Shapley values and ensemble methods in the context of pharmacoepidemiology and medication safety.

Methods: We detected medications associated with Alzheimer's disease (AD) by examining additive feature attribution with a combined approach of Gradient Boosting and Shapley values in the Medication use and Alzheimer's disease (MEDALZ) study, a nested case-control study of 70,719 verified AD cases in Finland. Our methodological approach is to perform binary classification with Gradient Boosting (an ensemble of weak classifiers) in a supervised learning manner. We then apply Shapley values (from cooperative game theory) to analyze how feature combinations affect the classification result. Medication use within a time window from five years to one year before AD diagnosis was ascertained from the Prescription register.

Results: Antipsychotics at low or medium doses, antidepressants at medium to high doses, and cardiovascular medications at medium to high doses were identified as contributing features for separating AD cases from controls. A medium to high amount of irregularity in the purchase pattern was also an indicative feature for separating AD cases from controls. The similarity of medication purchases between AD cases and controls made the feature evaluation challenging.

Conclusions: The combined approach of Gradient Boosting and feature evaluation with Shapley values identified features that were consistent with findings from previous hypothesis-driven studies. Additionally, the results from the additive feature attribution identified new candidates for future studies on AD risk factors. Our approach also shows promise for studies based on observational data, where feature identification and interactions in populations are of interest, and demonstrates the applicability of Shapley values for evaluating feature relevance in pattern recognition tasks.

1. Introduction

Observational studies on the harms and benefits of treatments have mainly utilized traditional frequentist methods, which are ad hoc with respect to the tested hypothesis. In this work, we propose a data-driven pattern recognition approach to hypothesis testing or hypothesis generation in the context of identifying risk-modifying treatments.

In pattern recognition (PR), the goal is to build a discriminant or generative model for recognizing patterns using appropriate features [1]. Appropriateness in PR means that the features should offer discriminatory power between patterns [2]. Using inappropriate features leads to poor PR models and thus to poor classification performance. This matters not only for building an accurate and robust model but also from a computational standpoint: the number of features directly translates into the number of parameters the model has to evaluate.

Combining appropriate features for a model is called feature selection. There are three main approaches to (automated) feature selection, in which features are combined in different ways to improve the model's learning performance: wrapper, filter and embedded [3]. The filter approach prunes features, so that only a subset of the features is passed to the learning algorithm and the rest are ignored. The wrapper approach, on the other hand, searches for features by applying the learning model to feature subsets that are generated and evaluated during learning. Filters and wrappers can be combined into an embedded approach, in which the search for optimal feature subsets is built into the learning algorithm, which then searches over both the feature subsets and the model hypothesis.

However, these approaches have the drawback that the features are only evaluated in some fixed order to form the discriminatory rule. Perhaps a more severe drawback is that these approaches mainly aim to make the model more computationally tractable, which might not correspond to real-world events. Recently, methods have been developed for explaining model performance by inspecting which features contribute to the model output. These methods are called feature attribution. There are three prominent approaches to feature attribution: evaluating local influences of features [4], recursively evaluating activation functions (neural networks only) [5], and a cooperative game theoretic approach [6]. In this work, we evaluate feature importance using the framework proposed by Lundberg and Lee [6].

Received 28 June 2019; Received in revised form 22 November 2019; Accepted 5 April 2020

⁎ Corresponding author.

International Journal of Medical Informatics 141 (2020) 104142

1.1. Aim of study

We investigate the feasibility of applying feature attribution to pharmacoepidemiology with the following binary classification problem: given a person's medication use history, can we predict whether the person is diagnosed with AD within the near future? Specifically, the features were medications used within a one-to-five year time window before AD diagnosis, because previous research has shown that there is an increase in the use of several medications within this timeframe, such as increased initiation of psychotropics [7–9]. This provides us an opportunity to validate our results against previous findings. We propose an exploratory data analysis approach using PR models and methods for extracting explanations from the classification results. We use Gradient Boosting as our supervised learning model, and evaluate feature attributions by using Shapley values from cooperative game theory. With this approach, we extract and compare differences in medication patterns between persons with and without AD.

2. Materials and methods

2.1. Material: MEDALZ cohort study

The Medication use and Alzheimer's disease (MEDALZ) [10] study consists of all residents of Finland who received a clinically verified AD diagnosis during 2005–2011, a total of 70,719 participants. These persons were identified from the Special Reimbursement register, maintained by the Social Insurance Institution of Finland. All AD cases were confirmed by medical examination, including a CT/MRI scan, exclusion of alternative diagnoses, and confirmation of AD by a geriatrician or a neurologist. This study utilised data from the Prescription register (1995–2011). A more detailed description of the register data can be found in Tolppanen et al. [10]. Data on purchased medication were extracted from the Prescription register and transformed into medication use periods using the validated PRE2DUP method [11–13]. PRE2DUP constructs exposure periods for purchased medication and estimates the average dosage during each exposure period by considering the purchases in defined daily doses (DDD) recorded in the Prescription register. This gives a chronological use history of the medication.

2.2. Methods

2.2.1. Preprocessing material

We took a 1:1 ratio of AD and non-AD persons (that is, controls (CTRL)) for our classification and feature evaluation tasks in order to reduce the bias caused by class over-representation [14]. The controls were matched for age (± one year), sex and region of residence. After extracting the closest matching AD and CTRL persons from the register data, we extracted all data on medication use periods within a specified time-window. Given an AD diagnosis date m and a number of years n before the diagnosis date, the time-window is [m − n, m − 1]. We used a five-year period (n = 5) for extracting data on medication use but excluded the year preceding the AD diagnosis, because an ongoing diagnostic process leads to increased contact with doctors, which affects medication prescription. We are therefore using a four-year time-window in analyzing feature attributions for AD classification. After extracting the use periods, we applied the PRE2DUP modelling for the given time-window.

Fig. 1. Measuring the 10-fold cross-validated GB model using AUC metric.

Fig. 2. Feature attribution scores of irregularity (σDDD), average DDD (μDDD), exposure period (Exp) and ATC classification features, according to their SHAP values. Densities in the plots correspond to each observation in the data set. The plot illustrates the order of feature relevance and combined feature attributions to the model's predictive performance. Stacked points in the plot mean that multiple equally relevant feature observations have been made.

Fig. 3. Higher order interactions between each medication from categories A–H of the ATC-classification and its exposure period (Exp) in days, with respect to their computed SHAP value. The higher order interactions are determined by the difference between the model's output and its expected output (categories J–V in Fig. 4).
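As a minimal sketch of the time-window rule in Section 2.2.1, the following Python code computes the closed window [m − n, m − 1] for a diagnosis date m and keeps only the use periods that overlap it. The tuple layout `(atc_code, start, end)`, the function names, the example dates and the clipping of periods to the window bounds are illustrative assumptions, not the authors' implementation.

```python
from datetime import date


def shift_years(day, years):
    """Return the same calendar day `years` years earlier."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        return day.replace(year=day.year - years, day=28)


def time_window(diagnosis_date, n_years=5):
    """Closed window [m - n, m - 1] (in years) for diagnosis date m."""
    return shift_years(diagnosis_date, n_years), shift_years(diagnosis_date, 1)


def periods_in_window(periods, diagnosis_date, n_years=5):
    """Keep use periods overlapping the window, clipped to its bounds.

    `periods` is a list of (atc_code, start, end) tuples. Both the layout
    and the clipping step are assumptions made for illustration.
    """
    window_start, window_end = time_window(diagnosis_date, n_years)
    kept = []
    for atc_code, start, end in periods:
        if start <= window_end and end >= window_start:
            kept.append((atc_code, max(start, window_start), min(end, window_end)))
    return kept


diagnosis = date(2010, 6, 15)  # hypothetical diagnosis date m
use_periods = [
    ("N06A", date(2004, 1, 1), date(2006, 3, 1)),  # starts before the window
    ("C10", date(2009, 9, 1), date(2010, 1, 1)),   # falls in the excluded year
    ("N05A", date(2003, 1, 1), date(2004, 1, 1)),  # ends before the window
]
selected = periods_in_window(use_periods, diagnosis)
```

With n = 5 the window spans four years, matching the four-year window described in the text; only the first hypothetical period survives the filter.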

We approach the problem of identifying medication use associated with AD as a binary classification problem. We apply a supervised learning scheme using Gradient Boosting (GB) [15,16], an ensemble method in which a collection of weak tree classifiers is combined to create a decision boundary between classes.
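To make the idea of combining weak tree classifiers concrete, the sketch below implements a deliberately small gradient boosting classifier in plain Python. It uses depth-1 trees (stumps) fitted to the negative gradient of the logistic loss. It is a toy illustration under stated assumptions, not the XGBoost model used in the study; the two features (exposure days and mean DDD), the data values and the hyperparameters are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_stump(X, residuals):
    """Fit a depth-1 regression tree (stump) to the residuals by least squares."""
    mean = sum(residuals) / len(residuals)
    best_error = sum((r - mean) ** 2 for r in residuals)
    best = (0, math.inf, mean, mean)  # fallback: no split, predict the mean
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if error < best_error:
                best_error, best = error, (j, threshold, left_mean, right_mean)
    return best


def stump_predict(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gradient_boosting(X, y, n_estimators=20, learning_rate=0.5):
    """Boost stumps on the negative gradient of the binary logistic loss."""
    p = sum(y) / len(y)            # assumes both classes are present
    base = math.log(p / (1 - p))   # initial score: log-odds of the positive class
    scores = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return base, learning_rate, stumps


def predict_proba(model, row):
    base, learning_rate, stumps = model
    return sigmoid(base + learning_rate * sum(stump_predict(s, row) for s in stumps))


# Hypothetical toy data: [exposure in days, mean DDD]; label 1 = AD, 0 = CTRL.
X = [[30, 0.2], [400, 1.1], [20, 0.3], [600, 0.9],
     [50, 0.1], [700, 1.4], [10, 0.4], [500, 1.0]]
y = [0, 1, 0, 1, 0, 1, 0, 1]
model = fit_gradient_boosting(X, y)
```

Each round adds one weak learner that corrects the errors of the current ensemble, which is the mechanism the text refers to; production libraries add regularization, deeper trees and second-order information on top of this.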

Fig. 4. Higher order interactions between each medication from categories J–V of the ATC-classification and its exposure period (Exp) in days, with respect to their computed SHAP value. The higher order interactions are determined by the difference between the model's output and its expected output.

2.2.2. SHAP values

To explain the model behaviour given a set of features, Lundberg and Lee [6] presented an approach using Shapley values [17] and local explanations, denoted as SHapley Additive exPlanation (SHAP) values [18] (see Appendix A). Feature attributions are evaluated through the conditional expectation of the original model's output with respect to the simplified input features. Lundberg and Lee [6] show that evaluating the conditional expectation of a model and its inputs using Shapley values is the only consistent approach for evaluating feature attribution. Lundberg et al. [18] propose a recursive program for computing the SHAP values in O(TLD²) time for tree ensembles, where T is the number of trees, L is the maximum number of leaves in any tree, and D is the maximum depth of any tree.

Fig. 5. Higher order interactions between each medication from categories A–H of the ATC-classification and its average daily dose μDDD with respect to their computed SHAP value. The higher order interactions are determined by the difference between the model's output and its expected output (categories J–V in Fig. 6).

Fig. 6. Higher order interactions between each medication from categories J–V of the ATC-classification and its average daily dose μDDD with respect to their computed SHAP value. The higher order interactions are determined by the difference between the model's output and its expected output.

2.3. Data analysis

We approached evaluating medication use in persons with AD with a two-step approach:

1. First, we selected the following medication use-based features for investigating which features are most relevant to the GB model according to their feature attribution value: the exposure (in days) of a given medication, the average daily dose (μDDD) of a given medication, the regularity of the medication (σDDD) within the person's purchase history, and the occurrence of the medication itself, where the medication is given by its Anatomical Therapeutic Chemical (ATC) classification. The regularity σDDD is the dose-wise temporal, relative variation of μDDD if the medication appears frequently enough in a time-window (see Tanskanen et al. [11] for details). The medications were grouped with respect to the first three characters of the ATC code, except for medications for the nervous system, which were grouped with respect to the first four characters (see Appendix B).

2. After identifying which features contributed the most to the GB model, we further investigated the feature attributions of individual medications to detect medication patterns.

Fig. 7. Higher order interactions between each medication from categories A–H of the ATC-classification and its purchase regularity σDDD with respect to their computed SHAP value. The higher order interactions are determined by the difference between the model's output and its expected output (categories J–V in Fig. 8).

Fig. 8. Higher order interactions between each medication from categories J–V of the ATC-classification and its purchase regularity σDDD with respect to their computed SHAP value. The higher order interactions are determined by the difference between the model's output and its expected output.
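The ATC grouping rule from step 1 of Section 2.3 (first three characters, or first four for nervous system medications, whose codes start with N) can be sketched as below. The function name and the example codes are illustrative assumptions.

```python
def atc_group(atc_code):
    """Truncate a full ATC code to the grouping level described in the text.

    Nervous system codes (ATC main group N) keep four characters;
    all other codes keep three.
    """
    code = atc_code.strip().upper()
    width = 4 if code.startswith("N") else 3
    return code[:width]


# Example full ATC codes, used here only for illustration.
groups = [atc_group(code) for code in ["N06AB04", "C10AA01", "n05ah03", "A02BC01"]]
```

Keeping four characters for group N separates, for example, antipsychotics (N05A) from anxiolytics (N05B), which would otherwise collapse into a single N05 feature.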

2.3.1. Analysis – step 1

For the first part of the analysis we selected the exposure periods, μDDD, σDDD and the occurrence of the medication within our specified time-window as features. The occurrence of the medication is categorical, so each medication was mapped to a numerical value in [1 … M], where M denotes the total number of unique medications occurring within the time-window. We split the data such that 85% was used as training data and 15% as test data. The hyperparameters, such as the learning rate, the number of estimators and the maximal depth of each tree classifier, were selected by randomized hyperparameter optimization with 10-fold cross-validation. We used a binary logistic objective function. After the hyperparameter optimization, we trained the model, again using 10-fold cross-validation. We then applied the trained model to the test set and retrieved the most relevant features using the SHAP values.
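The data preparation in step 1 can be illustrated with a small standard-library sketch: mapping medication codes to integers 1 … M, splitting samples 85%/15%, and partitioning the training part into 10 folds. The function names, the sorted ordering of the integer codes, the random seed and the sample size are assumptions made for illustration; the study itself used Scikit-learn and XGBoost for these steps.

```python
import random


def encode_medications(codes):
    """Map each unique medication code to an integer in 1..M.

    Assigning integers in sorted order is an assumption; the text only
    states that each medication is mapped to a value in [1 ... M].
    """
    return {code: index for index, code in enumerate(sorted(set(codes)), start=1)}


def split_indices(n_samples, test_fraction=0.15, seed=0):
    """Shuffle sample indices and split them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_folds(indices, k=10):
    """Partition the given indices into k folds for cross-validation."""
    return [indices[i::k] for i in range(k)]


mapping = encode_medications(["N06A", "C10", "N05A", "C10"])
train, test = split_indices(200)   # hypothetical sample size
folds = k_folds(train)             # each fold serves once as the validation set
```

In each of the 10 rounds, one fold is held out for validation and the remaining nine are used for fitting, so every training sample is validated exactly once.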

2.3.2. Analysis – step 2

We constructed a binary feature matrix using one-hot encoding of all medications appearing in our specified time-window. With this approach, we obtain the SHAP values for each individual medication. With individual feature information we can, for example, investigate how a particular medication interacts with age. The model was constructed and trained using the same approach as in Step 1.

Fig. 9. Feature attribution according to their SHAP values. Densities in the plots correspond to each data point in the sample. The plot illustrates the combined feature attributions to the model's predictive performance. Blue value indicates the absence of the medication and red indicates the presence of the medication. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

For the experiments, we used the Scikit-learn [19], Shap¹ and XGBoost² modules for the Python language.
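The binary feature matrix of step 2 can be sketched without external libraries as follows. Each row is a person and each column a medication group, with 1 marking that the medication occurs in the person's time-window. The function name, the input layout (one set of ATC groups per person) and the example persons are illustrative assumptions.

```python
def one_hot_matrix(person_medications, vocabulary=None):
    """Build a binary person-by-medication matrix.

    `person_medications` is a list with one set of ATC groups per person.
    Returns the column vocabulary and the matrix as a list of rows.
    """
    if vocabulary is None:
        vocabulary = sorted({m for meds in person_medications for m in meds})
    column = {code: j for j, code in enumerate(vocabulary)}
    matrix = []
    for meds in person_medications:
        row = [0] * len(vocabulary)
        for code in meds:
            if code in column:  # codes outside the vocabulary are ignored
                row[column[code]] = 1
        matrix.append(row)
    return vocabulary, matrix


# Three hypothetical persons; the last one purchased nothing in the window.
persons = [{"N06A", "C10"}, {"C10"}, set()]
columns, features = one_hot_matrix(persons)
```

Because every medication has its own 0/1 column, a feature attribution method assigns one value per medication and person, which is what allows the per-medication summaries and the age interaction plots.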

3. Results

3.1. Results from step 1

Fig. 1 shows our model's performance in classifying AD cases from controls, measured with the Area Under the Curve (AUC) metric. Explanations of feature importance, from the most impactful to the least impactful, are illustrated in Fig. 2. Each density point shows the SHAP value of a corresponding (single) observation, and the colors correspond to the respective feature value. Positive (negative) SHAP values correspond to pushing the model's expected output towards AD (CTRL) classification. Irregularity (σDDD) and average DDD (μDDD) are shown to be the important features for discriminating AD cases from controls.

Each individual medication's associations with other features and their impact on the model's performance are depicted in Figs. 3 and 4 (exposure), 5 and 6 (μDDD), and 7 and 8 (σDDD). These figures illustrate the higher-order interactions of the features' SHAP values, where the interaction is determined by the difference between the model's output and the model's expected output. That is, they show how the combinations of features are affected across the data set. For example, Figs. 3 and 4 show that most observed medication use with short exposures does not contribute much to the model's classification accuracy. On the other hand, medication use with longer exposures contributes either positively or negatively to the classification accuracy.
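For readers unfamiliar with the AUC metric reported in Fig. 1, the following sketch computes it directly from its rank interpretation: the probability that a randomly chosen AD case receives a higher score than a randomly chosen control, with ties counted as one half. The function name and the example scores are illustrative assumptions; in practice a library routine would be used.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities for two AD cases (1) and two controls (0).
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 0.5 corresponds to a model that cannot separate the classes, which is the relevant baseline when cases and controls have similar medication histories.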

3.2. Results from step 2

Fig. 9 shows a generalized summary of the 20 most used features with which the GB model was able to discriminate AD cases from controls. Antidepressants (N06A) are ranked as the most significant feature, showing that the increased use of antidepressants is a predictive feature for AD. Other relevant features are the medications used for treating cardiovascular disease (ATC C-class medications) and antipsychotics (N05A). These features are highlighted to illustrate the validity of our approach. For example, Koponen et al. [8] observed that the use of antipsychotics started to increase two to three years before AD diagnosis, while Taipale et al. [7] observed that the majority of persons with AD were using low or medium doses of antipsychotics. Additionally, Puranen et al. [9] showed that the incidence of antidepressant use is higher among persons with AD than among persons without AD already before the AD diagnosis. All these findings are based on the same data as this study. As for cardiovascular disease medication, Deckers et al. [20] suggest an association between coronary heart disease (and its medication) and the risk of cognitive impairment or dementia, and studies by Wolters et al. [21] and Bruijn and Ikram [22] report cardiovascular risk factors for dementia and AD. We have also demonstrated that the use of several cardiovascular drugs (including β-blockers, statins, diuretics and nitrates, which were identified in this work) was more common in persons with AD before AD diagnosis [23].

Figs. 10–13 show the interaction between age and antidepressants (N06A), antipsychotics (N05A), calcium channel blockers (C08) and lipid-modifying agents (C10), respectively, with respect to their SHAP values. For antidepressants (Fig. 10) and antipsychotics (Fig. 11) we can see how the majority of the medication users push the model to classify a case as AD, while those who do not use these medications do not affect the model's behaviour. The same applies to calcium channel blockers (Fig. 12). As for the lipid-modifying agents (Fig. 13), there is more variation for the control group when compared to Figs. 10–12. From a classification point of view, this means that it is difficult to “decide” to which class an observation belongs when there is no clear distinction between observed classes.

4. Discussion and conclusion

Our results demonstrate the applicability of PR and game theory in pharmacoepidemiological research. To our knowledge, PR methods have been utilized very little in this context, which gives our work some novelty value for organizing knowledge from a data set and doing exploratory analysis. Our two-phased approach consists of a PR model and a method for extracting explainability from the model. This general approach for exploratory data analysis can easily be modified by switching the specific methods or models applied in each phase. The challenge of the proposed approach is that the choice of the models and methods in each phase is crucial (and dependent on the analysis problem). The same applies to deciding which features to use in the PR model and to setting the hyperparameters of the selected PR model. However, these common analytical concerns are not unique to our (specific) approach, and solving them is always, to some extent, a matter of trial and error.

Fig. 10. The interaction between antidepressants and age with respect to their SHAP values. The plot illustrates how age is associated with the use of antidepressants across the data set. Blue value indicates the absence of the medication and red indicates the presence of the medication. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 11. The interaction between antipsychotics and age with respect to their SHAP values. The plot illustrates how age is associated with the use of antipsychotics across the data set. Blue value indicates the absence of the medication and red indicates the presence of the medication. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

¹ https://github.com/slundberg/shap.

² https://github.com/dmlc/xgboost.

PR has, however, been applied to predicting AD from different (functional) imaging modalities, such as single photon emission computed tomography and positron emission tomography [24], electroencephalogram [25], and magnetic resonance imaging [26,27]. There are also examples where PR and machine learning have been used for, for example, extracting potential adverse drug events from medical case reports [28], clinical records [29] and social media [30].

Because of the similar medication history of the AD and CTRL groups, our model was not able to draw a clear distinction between the two groups. This can be observed, for example, in the relative exposure, average daily dose (μDDD) and irregularity (σDDD) of the medication use (Appendix C). It is also illustrated by the summary of medication purchased within the specified time-window by the AD and CTRL groups (Appendix D). Although we were not able to build an accurate model, for the purpose of knowledge organization we did identify individual features and feature combinations related to a higher likelihood of being diagnosed with AD (within the time-window used). These findings were consistent with the previous literature, which demonstrates the validity of our approach. Thus, our proposed approach using PR and evaluating feature effects and relations with game theory offers a feasible way of exploring a data set and searching for candidates for (as an example) future pharmacoepidemiological studies with more detailed exposure assessment and confounder control.

Authors' contributions

Problem formulation was done by B. Braithwaite, A.M. Tolppanen and S. Hartikainen. PRE2DUP modelling was done by A. Tanskanen. Data analysis was done by B. Braithwaite. B. Braithwaite drafted the first version of the manuscript. All authors contributed to the interpretation of results and revised the manuscript for intellectual content.

Statement of conflict of interest

B. Braithwaite, A.M. Tolppanen and S. Hartikainen have no conflicting interests. H. Taipale, J. Tiihonen and A. Tanskanen have participated in research projects funded by Janssen and Eli Lilly with grants paid to the institution where they were employed. J. Tiihonen reports personal fees from the Finnish Medicines Agency (Fimea), European Medicines Agency (EMA), Eli Lilly, Janssen-Cilag, Lundbeck, and Otsuka; and has received grants from the Stanley Foundation and Sigrid Jusélius Foundation. J. Paananen is a shareholder and employee in Afekta Technologies Ltd and Genomi Solutions Ltd, and employee of Blueprint Genetics Ltd.

Summary points

What is known?

• Hypothesis-testing and hypothesis-generation in observational studies are based on ad hoc and frequentist methods.

• Persons with AD use antidepressants, antipsychotics and cardiovascular medications more commonly before AD diagnosis compared to persons without AD.

What has the study added?

• A data-driven approach to exploratory data analysis, using pattern recognition and cooperative game theory, can provide additional insights in observational studies.

• A strategy for organizing knowledge in observational studies for conducting either hypothesis-testing or hypothesis-generation.

• Irregularity of medication use was predictive of AD.

Acknowledgements

This work was funded by the Academy of Finland (grant numbers 307232 and 295334 to A.M. Tolppanen, which also paid B. Braithwaite's salary) and by strategic funding from the University of Eastern Finland (to AMT). The funders had no role in study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the paper for publication.

Fig. 12. The interaction between calcium channel blockers and age with respect to their SHAP values. The plot illustrates how age is associated with the use of calcium channel blockers across the data set. Blue value indicates the absence of the medication and red indicates the presence of the medication. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 13. The interaction between lipid modifying agents and age with respect to their SHAP values. The plot illustrates how age is associated with the use of lipid modifying agents across the data set. Blue value indicates the absence of the medication and red indicates the presence of the medication. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Appendix A

Interpreting model behaviour falls into a class called additive feature attribution methods, which aim to explain a model's output through real values attributed to each feature. Lundberg et al. [6,18] recently proposed an approach in which feature attributions are evaluated individually and in combination using Shapley values [17], a concept from cooperative game theory. Cooperative game theory is a branch of game theory that analyzes how players in a game form cooperative teams (or coalitions), and how the actions of these teams result in some expected payoff (or utility). The expected payoff is measured using a characteristic function (that is, a metric). In our work, the Shapley value is used to evaluate how feature combinations (that is, coalitions) affect the predicted outcome of our GB model. That is, we treat our GB model as the game setting, the model's features as our players, the Area Under the Curve (AUC) as our characteristic function, and the predicted output of our GB model as the common interest. The common interest here is the classification of AD or CTRL from a series of medication use observations. Features are evaluated in permutations by evaluating their expected output, and the best coalition of features, which produces the highest contribution to the common interest, is selected.

A.1 Evaluating feature attribution

The additive feature attribution method can be briefly summarized as follows. Given a linear model g with binary variables z ∈ {0, 1}^M, the attribution of features is defined as

$$g(z) = \phi_0 + \sum_{i=1}^{M} \phi_i z_i, \tag{1}$$

where M is the number of features and ϕ_i is the feature attribution value. Each entry z_i denotes either the absence (0) or presence (1) of a feature.

Lundberg and Lee [6] described that an important property of additive feature attribution is that there exists a (single) unique solution with the following properties:

1. Local accuracy: when approximating a model f (which is the characteristic function) for a specific input x, the explanation must (at least) match the output of f for the simplified input x′ (“interpretable input”) corresponding to the original input. More formally,

$$f(x) = g(x') = \phi_0 + \sum_{i=1}^{M} \phi_i x'_i. \tag{2}$$

The explanation g(x′) matches f(x) when x′ is correctly converted and projected back to the original input space of x, denoted as x = h_x(x′). Here ϕ₀ = f(h_x(0)) represents the output of f when all entries of x′ are missing, where 0 is the vector with 0 entries.

2. Missing data: if a feature is missing in the simplified input x′, then it has no effect on the model, that is, its attribution is zero:

$$x'_i = 0 \implies \phi_i = 0. \tag{3}$$

3. Consistency: if a model changes so that a simplified input's contribution increases or does not change, regardless of the other inputs, then that input's attribution should not decrease. More formally, if f_x(x′) = f(h_x(x′)) and x′ with x′_i = 0 is denoted as x′₀, then for two models f and k we have the feature attribution relation ϕ_i(k, x) ≥ ϕ_i(f, x) ∀ i = 1 … M, if

$$k_x(x') - k_x(x'_0) \ge f_x(x') - f_x(x'_0), \quad \forall x' \in \{0, 1\}^M. \tag{4}$$

In order for a model to satisfy the local accuracy, missing data and consistency properties, Lundberg and Lee [6] propose the following additive feature attribution measure:

$$\phi_i(f, x) = \sum_{z \subseteq x'} \frac{|z|!\,(M - |z| - 1)!}{M!} \left[ f_x(z) - f_x(z_0) \right], \tag{5}$$

where |·| denotes the number of non-zero entries, z₀ denotes z with z_i = 0, and z ⊆ x′ denotes all z vectors whose non-zero entries are a subset of the non-zero entries in x′. Eq. (5) is equivalent to the Shapley values [31,17] and is used to compute the SHAP values (Section 2.2.2).
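To illustrate Eq. (5), the sketch below computes exact Shapley values by enumerating all coalitions, using the classical form in which the sum runs over coalitions S that exclude feature i and each term is the marginal contribution of adding i to S. The toy value function, its coefficients and the three feature names are hypothetical; exhaustive enumeration is exponential in M and is shown only for clarity, whereas the study used the tree-specific algorithm of Lundberg et al. [18].

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values for a set function `value(frozenset_of_indices)`."""
    phis = []
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        phi = 0.0
        for size in range(len(others) + 1):
            # Weight |S|! (M - |S| - 1)! / M! for coalitions of this size.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                phi += weight * (value(coalition | {i}) - value(coalition))
        phis.append(phi)
    return phis


def toy_model(present):
    """Hypothetical model output given the set of present features.

    Index 0 and 1 stand for two medications with an interaction term;
    index 2 is a feature that never changes the output.
    """
    output = 0.1                      # output when every feature is missing
    if 0 in present:
        output += 0.3
    if 1 in present:
        output += 0.2
    if 0 in present and 1 in present:
        output += 0.4                 # interaction shared between features 0 and 1
    return output


phi = shapley_values(toy_model, 3)
phi_0 = toy_model(frozenset())        # base value
full_output = toy_model(frozenset({0, 1, 2}))
```

In this toy case the interaction term is split evenly between the two interacting features, the inert feature receives an attribution of zero (the missing data property), and the base value plus all attributions equals the full model output (local accuracy).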

Appendix B

Table 1

Anatomical Therapeutic Chemical (ATC) classification and names of medications from A to H.

ATC classification

A01 Stomatological preparations

A02 Drugs for acid related disorders

A03 Drugs for functional gastrointestinal disorders

A04 Antiemetics and antinauseants

A05 Bile and liver therapy

A06 Drugs for constipation

A07 Antidiarrheals, intenstinal anti-inflammatory/anti-infective agents A08 Antiobesity preparations, excl. diet products

(14)

Appendix C

Table 2

Anatomical Therapeutic Chemical (ATC) classification and names of medications from J J to V.

ATC classification

J01 Antibacterials for systemic use

J02 Antimycotics for systemic use

J04 Antimycobacterials

J05 Antivirals for systemic use

L02 Endocrine therapy

L03 Immunostimulants

L04 Immunosuppressants

M01 Anti-inflammatory and antirheumatic products

M03 Muscle relaxants

M04 Antigout preparations

M05 Drugs for treatment of bone disease

M09 Other drugs for disorders of the musculo-skeletal system

N02A Opioids

N02B Other analgesics and antipyretics

N02C Antimigraine preparations

N03A Antiepileptics

N04A Anticholinergic agents

N04B Dopaminergic agents

N05A Antipsychotics

N05B Anioxolytics

N05C Hypnotics and sedatives

N06A Antidepressants

N06B Psychostimulants, agents used for ADHD and nootropics N06C Psycholeptics and psychoanaleptics in combination

N07A Parasympathomimetics

N07B Drugs used in addictive disorders

N07C Antivertigo preparations

N07X Other nervous system drugs

P01 Antiprotozoals

R01 Nasal preparations

R03 Drugs for obstructive airway diseases

R05 Cough and cold preparations

R06 Antihistamines for systemic use

S01 Ophthalmologicals

V03 All other therapeutic products

Table 1 (continued)

ATC classification

A09 Digestives, incl. enzymes

A10 Insulins and analogues

A11 Vitamins

A12 Calcium

A14 Anabolic agents for systemic use

B01 Antithrombotic agents

B02 Antihemorrhagics

B03 Antianemic preparations

C01 Cardiac therapy

C02 Antihypertensives

C03 Diuretics

C04 Peripheral vasodilators

C05 Vasoprotectives

C07 Beta blocking agents

C08 Calcium channel blockers

C09 Agents acting on the renin-angiotensin system

C10 Lipid modifying agents

G01 Gynecological anti-infectives and antiseptics

G02 Other gynecologicals

G03 Sex hormones and modulators of the genital system

G04 Urologicals

H01 Pituitary and hypothalamic hormones and analogues

H02 Corticosteroids for systemic use

H03 Thyroid therapy

H04 Pancreatic hormones

H05 Calcium homeostasis


Fig. 14. Cumulative distribution of exposure periods of the top 20 medications detected by GB.


Fig. 15. Cumulative distribution of the average daily dose of the top 20 medications detected by GB.


Appendix D

Fig. 16. Cumulative distribution of medication purchase irregularity of the top 20 medications detected by GB.

Table 3

A summary of medication purchases within the specified time window for persons with AD and without AD (CTRL).

Medication use summary between AD and CTRL groups

ATC     AD #Persons   AD #Persons(%)   CTRL #Persons   CTRL #Persons(%)
N06A    16,642        62               10,394          38
C09     28,543        49               29,498          51
J01     40,093        50               39,889          50
M01     38,610        50               38,240          50
A02     23,619        50               23,884          50
C10     27,840        51               26,357          49
C03     24,556        49               24,710          51
C01     27,673        52               25,502          48
C07     36,124        51               35,200          49
B01     19,628        53               17,655          47
G04     14,711        55               12,308          45



References

[1] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, 1990.

[2] A.L. Blum, P. Langley, Selection of relevant features and examples in machine learning, Artif. Intell. 97 (1997) 245–271.

[3] Y. Saeys, I. Inza, P. Larrañaga, A review of feature selection techniques in bioinformatics, Bioinformatics 23 (2007) 2507–2517.

[4] M.T. Ribeiro, S. Singh, C. Guestrin, “Why Should I Trust You?”: explaining the predictions of any classifier, Proceedings of the 22nd International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD) (2016) 1135–1144.

[5] A. Shrikumar, P. Greenside, A. Kundaje, Learning important features through propagating activation differences, Proceedings of the 34th International Conference on Machine Learning (PMLR) (2017) 3145–3153.

[6] S.M. Lundberg, S.-I. Lee, A unified approach to interpreting model predictions, Adv. Neural Inf. Process. Syst. (2017) 4765–4774.

[7] H. Taipale, M. Koponen, A. Tanskanen, A.M. Tolppanen, J. Tiihonen, S. Hartikainen, Antipsychotic doses among community-dwelling persons with Alzheimer disease in Finland, J. Clin. Psychopharmacol. 34 (2014) 435–440.

[8] M. Koponen, A.M. Tolppanen, H. Taipale, A. Tanskanen, J. Tiihonen, K. Johnell, J. Fastbom, R. Ahonen, S. Hartikainen, Incidence of antipsychotic use in relation to diagnosis of Alzheimer's disease among community-dwelling persons, Br. J. Psychiatry 207 (2015) 444–449.

[9] A. Puranen, H. Taipale, M. Koponen, A. Tanskanen, A.M. Tolppanen, J. Tiihonen, S. Hartikainen, Incidence of antidepressant use in community-dwelling persons with and without Alzheimer's disease: 13-year follow-up, Int. J. Geriatr. Psychiatry 32 (2017) 94–101.

[10] A.M. Tolppanen, H. Taipale, M. Koponen, P. Lavikainen, A. Tanskanen, J. Tiihonen, S. Hartikainen, Cohort profile: the Finnish Medication and Alzheimer's disease (MEDALZ) study, BMJ Open 6 (2016).

[11] A. Tanskanen, H. Taipale, M. Koponen, A.M. Tolppanen, S. Hartikainen, R. Ahonen, J. Tiihonen, From prescription drug purchases to drug use periods - a second generation method (PRE2DUP), BMC Med. Inform. Decis. Mak. 15 (2015).

[12] A. Tanskanen, H. Taipale, M. Koponen, A.M. Tolppanen, S. Hartikainen, R. Ahonen, J. Tiihonen, Drug exposure in register-based research - An expert-opinion based evaluation of methods, PLOS ONE 12 (2017).

[13] H. Taipale, A. Tanskanen, M. Koponen, A.M. Tolppanen, J. Tiihonen, S. Hartikainen, Agreement between PRE2DUP register data modeling method and comprehensive drug use interview among older persons, Clin. Epidemiol. 8 (2016) 363–371.

[14] N.V. Chawla, N. Japkowicz, A. Kotcz, Special issue on learning from imbalanced data sets, ACM Sigkdd Explor. Newsl. 6 (2004) 1–6.

[15] L. Breiman, Arcing the Edge, Technical Report 486, Statistics Department University of California at Berkeley, 1997.

[16] L. Mason, J. Baxter, P.L. Bartlett, M. Frean, Boosting algorithms as gradient descent, Adv. Neural Inf. Process. Syst. (2000) 512–518.

[17] L.S. Shapley, A value for n-person games, Contrib. Theory Games 2 (1953) 307–317.

[18] S.M. Lundberg, G.G. Erion, S.-I. Lee, Consistent Individualized Feature Attribution for Tree Ensembles, (2018) (arXiv preprint), arXiv:1802.03888.

[19] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, J. Vanderplas, Scikit-learn: machine learning in Python, J. Mach. Learn. Res. 12 (2011) 2825–2830.

[20] K. Deckers, S.H.J. Schievink, M.M.F. Rodriquez, R.J. van Oostenbrugge, M.P.J. van Boxtel, F.R.J. Verhey, S. Köhler, Coronary heart disease and risk for cognitive impairment or dementia: systematic review and meta-analysis, PLoS ONE 12 (2017).

[21] F.J. Wolters, R.A. Segufa, S.K.L. Darweesh, D. Bos, M. Arfan Ikram, B. Sabayan, A. Hofman, S. Sedaghat, Coronary heart disease, heart failure, and the risk of dementia: a systematic review and meta-analysis, Alzheimer's Dement. 14 (2018) 1493–1504.

[22] R.F.A.G. de Bruijn, M. Arfan Ikram, Cardiovascular risk factors and future risk of Alzheimer's disease, BMC Med. 12 (2014).

[23] M. Vu, M. Koponen, H. Taipale, A. Tanskanen, J. Tiihonen, R. Kettunen, S. Hartikainen, A.M. Tolppanen, Prevalence of cardiovascular drugs use before and after diagnosis of Alzheimer's disease, Int. J. Cardiol. 300 (2020) 221–225.

[24] J. Ramírez, J.M. Górriz, D. Salas-Gonzalez, A. Romero, M. López, I. Álvarez, M. Gómez-Río, Computer-aided diagnosis of Alzheimer's type dementia combining support vector machines and discriminant set of features, Inf. Sci. 237 (2013) 59–72.

[25] R. Polikar, A. Topalis, D. Parikh, D. Green, J. Frymiare, J. Kounios, C.M. Clark, An ensemble based data fusion approach for early diagnosis of Alzheimer's disease, Inf. Fusion 9 (2008) 83–95.

[26] C. Davatzikos, Y. Fan, X. Wu, D. Shen, S.M. Resnick, Detection of prodromal Alzheimer's disease via pattern classification of magnetic resonance imaging, Neurobiol. Aging 29 (2008) 514–523.

[27] S. Klöppel, C.M. Stonnington, C. Chu, B. Draganski, R.I. Scahill, J.D. Rohrer, N.C. Fox, C.R. Jack Jr., J. Ashburner, R.S.J. Frackowiak, Automatic classification of MR scans in Alzheimer's disease, Brain 131 (2008) 681–689.

[28] H. Gurulingappa, A. Mateen-Rajput, L. Toldo, Extraction of potential adverse drug events from medical case reports, J. Biomed. Semant. 3 (2012).

[29] E. Aramaki, Y. Miura, M. Tonoike, T. Ohkuma, H. Masuichi, K. Waki, K. Ohe, Extraction of adverse drug effect from clinical records, Stud. Health Technol. Inform. 160 (2010) 739–743.

[30] A. Nikfarjam, A. Sarker, K. O’Connor, R. Ginn, G. Gonzalez, Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features, J. Am. Med. Inform. Assoc. 22 (2015) 671–681.

[31] E. Štrumbelj, I. Kononenko, Explaining prediction models and individual predictions with feature contributions, Knowl. Inf. Syst. 41 (2014) 647–665.

Table 3 (continued)

Medication use summary between AD and CTRL groups

ATC     AD #Persons   AD #Persons(%)   CTRL #Persons   CTRL #Persons(%)
C08     19,423        49               20,017          51
N05C    19,740        52               18,233          48
N02B    15,569        54               13,508          46
A10     11,852        54               10,135          46
A12     13,730        54               11,841          46
R03     11,022        48               12,181          52
G03     12,719        49               12,992          51
N05A    5,966         63               3,450           37
N02A    9,512         49               9,903           51
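
The percentage columns in Table 3 are not defined in the caption. A reading that agrees with the reported counts is that each percentage is a group's share of all users of that ATC class (AD plus CTRL). The short Python sketch below checks this assumed reading for a few rows copied from the table; the helper name `group_shares` is hypothetical.

```python
# Person counts copied from Table 3: ATC code -> (AD, CTRL)
counts = {
    "N06A": (16642, 10394),
    "C08": (19423, 20017),
    "N05A": (5966, 3450),
    "N02A": (9512, 9903),
}


def group_shares(ad, ctrl):
    """Share of AD and CTRL users among all users of one ATC class, in whole percent."""
    total = ad + ctrl
    return round(100 * ad / total), round(100 * ctrl / total)


# Assumed interpretation: percentages are within-class shares across both groups
shares = {atc: group_shares(ad, ctrl) for atc, (ad, ctrl) in counts.items()}
for atc, (ad_pct, ctrl_pct) in shares.items():
    print(f"{atc}: AD {ad_pct}%, CTRL {ctrl_pct}%")
```

Under this reading, a value near 50% indicates that a medication class was used about equally often in both groups, while classes such as N06A and N05A were purchased by clearly more persons with AD than controls.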