UEF//eRepository
DSpace https://erepo.uef.fi
Self-archived publications, Faculty of Health Sciences
2020
Detection of medications associated with Alzheimer's disease using
ensemble methods and cooperative game theory
Braithwaite, B
Elsevier BV
Scientific journal articles
© 2020 The Authors
CC BY-NC-ND https://creativecommons.org/licenses/by-nc-nd/4.0/
http://dx.doi.org/10.1016/j.ijmedinf.2020.104142
https://erepo.uef.fi/handle/123456789/8191
Downloaded from University of Eastern Finland's eRepository
International Journal of Medical Informatics
Detection of medications associated with Alzheimer's disease using ensemble methods and cooperative game theory
B. Braithwaite a,*, J. Paananen b, H. Taipale a,c,d,e, A. Tanskanen d,e, J. Tiihonen d,e, S. Hartikainen a,c, A-M. Tolppanen a
a School of Pharmacy, Faculty of Health Sciences, University of Eastern Finland, Kuopio, Finland
b Institute of Biomedicine, University of Eastern Finland, Finland
c Kuopio Research Centre of Geriatric Care, University of Eastern Finland, Kuopio, Finland
d Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
e Department of Forensic Psychiatry, Niuvanniemi Hospital, University of Eastern Finland, Kuopio, Finland
ARTICLE INFO
Keywords: Pattern recognition; Knowledge organization; Feature attribution; Cooperative game theory; Shapley value; Gradient boosting; Alzheimer's disease; Nested case-control study; Pharmacoepidemiology
ABSTRACT
Objective: To study the feasibility of evaluating feature importance with Shapley Values and ensemble methods in the context of pharmacoepidemiology and medication safety.
Methods: We detected medications associated with Alzheimer's disease (AD) by examining additive feature attribution with a combined approach of Gradient Boosting and Shapley Values in the Medication use and Alzheimer's disease (MEDALZ) study, a nested case-control study of 70,719 verified AD cases in Finland. Our methodological approach is to do binary classification using Gradient Boosting (an ensemble of weak classifiers) in a supervised learning manner. We then apply Shapley Values (from cooperative game theory) to analyze how feature combinations affect the classification result. Medication use within a time-window of five to one years before AD diagnosis was ascertained from the Prescription register.
Results: Antipsychotics with low or medium dose, antidepressants with medium to high dose, and cardiovascular medications with medium to high dose were identified as the contributing features for separating cases with AD from controls. A medium to high amount of irregularity in the purchase pattern was also an indicating feature for separating AD cases from controls. The similarity of medication purchases between AD cases and controls made the feature evaluation challenging.
Conclusions: The combined approach of Gradient Boosting and feature evaluation with Shapley Values identified features that were consistent with findings from previous hypothesis-driven studies. Additionally, the results from the additive feature attribution identified new candidates for future studies on AD risk factors. Our approach also shows promise for studies based on observational data, where feature identification and interactions in populations are of interest, and demonstrates the applicability of Shapley Values for evaluating feature relevance in pattern recognition tasks.
1. Introduction
Observational studies on harms and benefits of treatments have mainly utilized traditional frequentist methods which are ad hoc with respect to the tested hypothesis. In this work, we propose a data-driven approach to do hypothesis-testing or generation in the context of identifying risk-modifying treatments using a pattern recognition approach.
In pattern recognition (PR), the goal is to build a discriminant or generative model for recognizing patterns using appropriate features [1]. Appropriateness in PR means that the features should offer discriminatory power between patterns [2]. Using inappropriate features leads to poor PR models and thus to poor classification performance. This matters not only for building an accurate and robust model, but also from a computational standpoint: the number of features directly translates into the number of parameters the model has to evaluate.
Combining appropriate features for a model is called feature selection. There are three main approaches for (automated) feature selection, where features are combined in different ways to improve the model's learning performance: wrapper, filter and embedded [3]. The filter approach prunes features, such that only a subset of features is used by the learning algorithm and the rest are ignored. The wrapper approach, on the other hand, searches features by embedding the learning model onto feature subsets, which are generated and evaluated during the learning algorithm. Filters and wrappers can be combined into an embedded approach, which searches for optimal feature subsets as a built-in part of the learning algorithm, and thus searches the combined space of feature subsets and model hypotheses.
https://doi.org/10.1016/j.ijmedinf.2020.104142
Received 28 June 2019; Received in revised form 22 November 2019; Accepted 5 April 2020
⁎ Corresponding author.
International Journal of Medical Informatics 141 (2020) 104142
1386-5056/ © 2020 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/BY-NC-ND/4.0/).
However, these approaches have the drawback that the features are only evaluated in some fixed order to form the discriminatory rule. Perhaps a more severe drawback is that these approaches mainly aim to make the model more computationally tractable, which might not correspond to real world events. Recently, methods for explaining model performance by inspecting which features contribute to the model output have been developed. These methods are called feature attribution. There are three prominent approaches to feature attribution: evaluating local influences of features [4], recursively evaluating activation functions (neural networks only) [5], and a cooperative game theoretic approach [6]. In this work, we evaluate feature importance using the framework proposed by Lundberg and Lee [6].
1.1. Aim of study
We investigate the feasibility of applying feature attribution to pharmacoepidemiology with the following binary classification problem: given a person's medication use history, can we predict whether the person is diagnosed with AD within the near future? Specifically, the features were medications used within a one-to-five year time window before AD diagnosis, because previous research has shown that the use of several medications increases within this timeframe, for example through increased initiation of psychotropics [7–9]. This provides us an opportunity to validate our results against previous findings. We propose an exploratory data analysis approach using PR models and methods for extracting explanations from the classification results. We use Gradient Boosting as our supervised learning model, and evaluate feature attributions by using Shapley values from cooperative game theory. With this approach, we extract and compare differences in medication patterns between persons with and without AD.
2. Materials and methods
2.1. Material: MEDALZ cohort study
The Medication use and Alzheimer's disease (MEDALZ) [10] study consists of all residents in Finland who received a clinically verified AD diagnosis during 2005–2011, a sample of 70,719 participants. These persons were identified from the Special Reimbursement register, maintained by the Social Insurance Institution of Finland. All AD cases were confirmed by medical examination, including a CT/MRI scan, exclusion of alternative diagnoses, and confirmation of the AD diagnosis by a geriatrician or a neurologist. This study utilised data from the Prescription register (1995–2011). A more detailed description of the register data can be found in Tolppanen et al. [10]. Data on purchased medication was extracted from the Prescription register and transformed into medication use periods using the validated PRE2DUP method [11–13]. PRE2DUP constructs exposure periods for purchased medication and estimates the average dosage during each exposure period by considering the number of purchases in defined daily doses (DDD), which were recorded in the Prescription register. This gives a chronological use history of the used medication.
2.2. Methods
2.2.1. Preprocessing material
We took a 1:1 ratio of persons with AD and persons without AD (that is, controls (CTRL)) for our classification and feature evaluation tasks in order to reduce the bias caused by class over-representation [14]. The controls were matched for age (± one year), sex and region of residence. After extracting the closest matching AD and CTRL persons from the register data, we extracted all data on medication use periods within a specified time-window. Given an AD diagnosis date m and n years prior to the AD diagnosis date, the time-window is [m − n, m − 1]. We used a five-year period for extracting all data on medication use but excluded the year preceding the AD diagnosis, because an ongoing diagnostic process leads to increased contact with doctors, which affects medication prescription. We are therefore using a four-year time-window in analyzing feature attributions for AD classification. After extracting the use periods, we applied the PRE2DUP modelling to the given time-window.
Fig. 1. Measuring the 10-fold cross-validated GB model using AUC metric.
Fig. 2. Feature attribution scores of irregularity (σDDD), average DDD (μDDD), exposure period (Exp) and ATC classification features, according to their SHAP values. Densities in the plots correspond to each observation in the data set. The plot illustrates the order of feature relevance and combined feature attributions to the model's predictive performance. Stacked points in the plot mean that multiple equally relevant feature observations have been made.
Fig. 3. Higher order interactions between each medication from categories A–H of the ATC-classification and its exposure period (Exp) in days, with respect to their computed SHAP value. The higher order interactions are determined by the difference between the model's output and its expected output. (categories J–V in Fig. 4).
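The time-window [m − n, m − 1] described in Section 2.2.1 can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the window is treated as half-open (start included, end excluded), and the handling of 29 February is our own choice rather than something specified in the study.

```python
from datetime import date


def shift_years(d: date, years: int) -> date:
    """Shift a date by whole years (assumption: 29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def medication_window(diagnosis: date, n_years: int = 5, excluded_years: int = 1):
    """Return (start, end) of the window [m - n, m - 1] for diagnosis date m."""
    return shift_years(diagnosis, -n_years), shift_years(diagnosis, -excluded_years)


def in_window(purchase: date, diagnosis: date) -> bool:
    """True if a purchase date falls inside the four-year analysis window."""
    start, end = medication_window(diagnosis)
    return start <= purchase < end
```

With the default arguments, a purchase made during the year immediately preceding the diagnosis is excluded, which mirrors the reasoning about the ongoing diagnostic process.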
We approach the problem of identifying medication use associated with AD as a binary classification problem. We apply a supervised learning scheme, using Gradient boosting (GB) [15,16], which is an ensemble method where a collection of weak tree classifiers are combined to create a decision boundary between classes.
Fig. 4. Higher order interactions between each medication from categories J–V of the ATC-classification and its exposure period (Exp) in days, with respect to their computed SHAP value. The higher order interactions are determined by the difference between the model's output and its expected output.
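To make the idea of combining weak tree classifiers concrete, the following is a minimal sketch of gradient boosting with depth-one trees (stumps) and a logistic loss. It is not the XGBoost implementation used in this study; all function names and default values are hypothetical, and the sketch assumes that both classes are present in the training labels.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:] if best else None


def fit_gb(X, y, n_rounds=50, learning_rate=0.3):
    """Each round fits a stump to the negative gradient of the logistic loss."""
    p = sum(y) / len(y)
    base = math.log(p / (1 - p))  # initial log-odds
    scores = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no valid split exists
            break
        j, t, lm, rm = stump
        stumps.append(stump)
        scores = [s + learning_rate * (lm if row[j] <= t else rm)
                  for s, row in zip(scores, X)]
    return base, learning_rate, stumps


def predict_proba(model, row):
    """Sum the scaled stump outputs and map the log-odds to a probability."""
    base, lr, stumps = model
    score = base + sum(lr * (lm if row[j] <= t else rm) for j, t, lm, rm in stumps)
    return sigmoid(score)
```

Each stump on its own is a weak classifier; the decision boundary emerges from the sum of many small corrections.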
2.2.2. SHAP values
To explain the model behaviour given a set of features, Lundberg and Lee [6] presented an approach using Shapley values [17] and local explanations, denoted as SHapley Additive exPlanation (SHAP) values [18] (see Appendix A). Feature attributions are evaluated through the conditional expectation of the original model's output given the simplified input features. Lundberg and Lee [6] show that evaluating the conditional expectation of a model and its inputs using Shapley values is the only consistent approach for evaluating feature attribution. Lundberg et al. [18] propose a recursive program for computing the SHAP values in $O(TLD^2)$ time in tree ensembles, where T is the number of trees, L is the maximum number of leaves in any tree, and D is the maximum depth of any tree.
Fig. 5. Higher order interactions between each medication from categories A–H of the ATC-classification and its average daily dose μDDD with respect to their computed SHAP value. The higher order interactions are determined by the difference between the model's output and its expected output. (categories J–V in Fig. 6).
Fig. 6. Higher order interactions between each medication from categories J–V of the ATC-classification and its average daily dose μDDD with respect to their computed SHAP value. The higher order interactions are determined by the difference between the model's output and its expected output.
2.3. Data analysis
We approached evaluating medication use in persons with AD with a two-step approach:
1. First, we selected the following medication use-based features for investigating which features are most relevant to the GB model according to their feature attribution value: the exposure (in days) of a given medication, the average daily dose (μDDD) of a given medication, the regularity of the medication (σDDD) within the person's purchase history, and the occurrence of the medication itself, where the medication is given by its Anatomical Therapeutic Chemical (ATC) classification. The regularity σDDD is the dose-wise temporal, relative variation of μDDD if the medication appears frequently enough in a time-window. (See Tanskanen et al. [11] for details.) The medications were grouped with respect to the first three characters in the ATC code, except for medications for the nervous system, which were grouped with respect to the first four characters (see Appendix B).
2. After identifying which features contributed the most to the GB model, we further investigated the feature attributions of individual medications to detect medication patterns.
Fig. 7. Higher order interactions between each medication from categories A–H of the ATC-classification and its purchase regularity σDDD with respect to their computed SHAP value. The higher order interactions are determined by the difference between the model's output and its expected output. (categories J–V in Fig. 8).
Fig. 8. Higher order interactions between each medication from categories J–V of the ATC-classification and its purchase regularity σDDD with respect to their computed SHAP value. The higher order interactions are determined by the difference between the model's output and its expected output.
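The ATC grouping rule in step 1 (first three characters, or first four for nervous system medications, whose codes start with N) can be sketched as a small helper. The function name is hypothetical and the normalisation of whitespace and letter case is our own assumption.

```python
def atc_group(atc_code: str) -> str:
    """Map a full ATC code to the grouping level described in step 1."""
    code = atc_code.strip().upper()
    # Nervous system medications (ATC main group N) keep four characters.
    width = 4 if code.startswith("N") else 3
    return code[:width]
```

For example, a code beginning with N06A stays in the N06A (antidepressants) group, while a code beginning with C10 is reduced to C10 (lipid modifying agents).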
2.3.1. Analysis – step 1
For the first part of the analysis we selected the exposure periods, μDDD, σDDD and the occurrence of the medication within our specified time-window as features. The occurrence of the medication is categorical, so each medication is mapped to an integer in [1, …, M], where M denotes the total number of unique medications occurring within the time-window. We split the data such that 85% is used as training data and 15% as test data. The hyperparameters, such as the learning rate, the number of estimators and the maximal depth of each tree classifier, were selected by a randomized hyperparameter optimization with 10-fold cross-validation. We used a binary logistic objective function. After the hyperparameter optimization, we trained the model, also using 10-fold cross-validation. We then applied the trained model to the test set and retrieved the most relevant features using the SHAP values.
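The data split and the randomized search described above can be sketched with the standard library alone. This is a simplified illustration, not the Scikit-learn or XGBoost routines used in the study: the function names, the random seed and especially the sampling ranges of the hyperparameters are hypothetical, since the original ranges are not reported here.

```python
import random


def train_test_split_indices(n, test_fraction=0.15, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, test)."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold(indices, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def sample_hyperparameters(rng):
    """Draw one random candidate (ranges are illustrative assumptions)."""
    return {
        "learning_rate": 10 ** rng.uniform(-3, 0),
        "n_estimators": rng.randint(50, 500),
        "max_depth": rng.randint(2, 8),
    }
```

In a randomized search, each sampled candidate would be scored by its mean validation performance over the ten folds, and the best candidate retained.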
2.3.2. Analysis – step 2
We construct a binary feature matrix using one-hot encoding of all medications appearing in our specified time-window. With this approach, we obtain the SHAP values for each individual medication. By obtaining individual feature information, we can, for example, investigate how a particular medication interacts with age. The model was constructed and trained using the same approach as in Step 1.
Fig. 9. Feature attribution according to their SHAP values. Densities in the plots correspond to each data point in the sample. The plot illustrates the combined feature attributions to the model's predictive performance. Blue value indicates the absence of the medication and red indicates the presence of the medication. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
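The binary feature matrix of Step 2 can be illustrated with a short sketch. The input layout (a mapping from a person identifier to the medication groups observed in the time-window) and the function name are assumptions made for this example only.

```python
def one_hot_medications(purchases):
    """Build a binary person-by-medication matrix.

    purchases: dict mapping a person id to an iterable of medication groups.
    Returns (columns, matrix), where matrix maps each person id to a 0/1 row.
    """
    columns = sorted({m for meds in purchases.values() for m in meds})
    position = {m: i for i, m in enumerate(columns)}
    matrix = {}
    for person, meds in purchases.items():
        row = [0] * len(columns)
        for m in meds:
            row[position[m]] = 1  # 1 marks presence of the medication
        matrix[person] = row
    return columns, matrix
```

Because every medication gets its own column, a feature attribution method returns one value per medication and person, which is what allows the per-medication plots.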
For the experiments, we used the Scikit-learn [19], Shap¹ and XGBoost² modules for the Python language.
3. Results
3.1. Results from step 1
Fig. 1 shows our model's performance in classifying AD cases from controls, measured with the Area Under the Curve (AUC) metric. Explanations of feature importance, from the most impactful to the least impactful, are illustrated in Fig. 2. Each density point shows the SHAP value of a corresponding (single) observation and the colors correspond to the respective feature value. Positive (negative) SHAP values correspond to pushing the model's expected output towards AD (CTRL) classification. Irregularity (σDDD) and average DDD (μDDD) are shown to be the most important features for discriminating AD cases from controls.
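How positive and negative SHAP values push the expected output towards one class can be illustrated numerically. The sketch below assumes that the explained model output is on the log-odds scale, which is typical for a binary logistic objective but is an assumption here; the function names and the numbers in the usage note are hypothetical and not taken from our results.

```python
import math


def explained_log_odds(expected_value, shap_values):
    """Local accuracy: the expected output plus all attributions."""
    return expected_value + sum(shap_values)


def to_probability(log_odds):
    """Map a log-odds score to a probability of the AD class."""
    return 1.0 / (1.0 + math.exp(-log_odds))
```

For instance, with an expected output of 0.0 and attributions of 0.4 and −0.1, the net log-odds are positive, so the prediction moves above a probability of 0.5 towards the AD class.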
Each individual medication's associations with other features and their impact on the model's performance are depicted in Figs. 3 and 4 (exposure), 5 and 6 (μDDD), and 7 and 8 (σDDD). These figures illustrate the higher-order interactions of the features' SHAP values, where the interaction is determined by the difference between the model's output and the model's expected output; that is, they show how combinations of features affect the output across the data set. For example, Figs. 3 and 4 show that most observed medication use with short exposures does not contribute much to the model's classification accuracy. On the other hand, medication use with longer exposures contributes either positively or negatively to the classification accuracy.
3.2. Results from step 2
Fig. 9 shows a generalized summary of the 20 most used features, with which the GB model was able to discriminate AD cases from controls. Antidepressants (N06A) is ranked as the most significant feature, showing that increased use of antidepressants is a predictive feature for AD. Other relevant features are the medications used for treating cardiovascular disease (ATC C-class medications) and antipsychotics (N05A). These features are highlighted to illustrate the validity of our approach. For example, Koponen et al. [8] observed that the use of antipsychotics started to increase two to three years before AD diagnosis, while Taipale et al. [7] observed that the majority of persons with AD were using a low or medium dose of antipsychotics. Additionally, Puranen et al. [9] showed that antidepressant use is more common among persons with AD than among persons without AD already before the AD diagnosis. All these findings are based on the same data as this study. As for the cardiovascular disease medication, Deckers et al. [20] suggest an association between coronary heart disease (and its medication) and risk for cognitive impairment or dementia, and studies from Wolters et al. [21] and Bruijn and Ikram [22] report cardiovascular risk factors for dementia and AD. We have also demonstrated that the use of several cardiovascular drugs (including β-blockers, statins, diuretics and nitrates, which were identified in this work) was more common in persons with AD before AD diagnosis [23].
Figs. 10–13 show the interaction between age and antidepressants (N06A), antipsychotics (N05A), calcium channel blockers (C08) and lipid-modifying agents (C10), respectively, with respect to their SHAP values. For antidepressants (Fig. 10) and antipsychotics (Fig. 11) we can see how the majority of the medication users push the model to classify a case as AD, while those who do not use these medications do not affect the model's behaviour. The same applies to calcium channel blockers (Fig. 12). As for the lipid-modifying agents (Fig. 13), there is more variation for the control group when compared to Figs. 10–12. From a classification point of view, this means that it is difficult to "decide" to which class an observation belongs when there is no clear distinction between observed classes.
4. Discussion and conclusion
Our results demonstrate the applicability of PR and game theory in pharmacoepidemiological research. To our knowledge, PR methods have been utilized very little in this context, which gives our work some novelty value for organizing knowledge from a data set and doing exploratory analysis. Our two-phased approach consists of a PR model and a method for extracting explainability from the model. This general approach for exploratory data analysis can easily be modified by switching the specific methods or models applied in each phase. The challenge of the proposed approach is that the choice of the models and methods in each phase is crucial (and dependent on the analysis problem). The same applies to deciding which features to use in the PR model, and to setting the hyperparameters of the selected PR model. However, these common analytical concerns are not unique to our (specific) approach, and solving them is always, to some extent, a matter of trial and error.
Fig. 10. The interaction between antidepressants and age with respect to their SHAP values. The plot illustrates how age is associated with the use of antidepressants across the data set. Blue value indicates the absence of the medication and red indicates the presence of the medication. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 11. The interaction between antipsychotics and age with respect to their SHAP values. The plot illustrates how age is associated with the use of antipsychotics across the data set. Blue value indicates the absence of the medication and red indicates the presence of the medication. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
¹ https://github.com/slundberg/shap.
² https://github.com/dmlc/xgboost.
There have, however, been applications of PR to predicting AD based on data from different (functional) imaging modalities, such as single photon emission computed tomography and positron emission tomography [24], electroencephalography [25], and magnetic resonance imaging [26,27]. There are also examples where PR and machine learning have been used for, for example, extracting potential adverse drug events from medical case reports [28], clinical records [29] and social media [30].
Because of the similar medication history between the AD and CTRL groups, our model was not able to draw a clear distinction between the two groups. This could be observed, for example, from the relative exposure, average daily dose (μDDD) and irregularity (σDDD) of the medication use (Appendix C). This is also illustrated by the summary of purchased medication from the specified time-window in the AD and CTRL groups (Appendix D). Although we were not able to build an accurate model, for the purpose of knowledge organization, we did identify individual features and feature combinations related to a higher likelihood of being diagnosed with AD (within the used time-window). These findings were consistent with the previous literature, which demonstrates the validity of our approach. Thus, our proposed approach using PR and evaluating feature effects and relations with game theory offers a feasible way of exploring a data set and searching for candidates for (as an example) future pharmacoepidemiological studies with more detailed exposure assessment and confounder control.
Authors' contributions
Problem formulation was done by B. Braithwaite, A.M. Tolppanen and S. Hartikainen. PRE2DUP modelling was done by A. Tanskanen. Data analysis was done by B. Braithwaite. B. Braithwaite drafted the first version of the manuscript. All authors contributed to interpretation of results and revised the manuscript for intellectual content.
Statement of conflict of interest
B. Braithwaite, A.M. Tolppanen and S. Hartikainen have no conflicting interests. H. Taipale, J. Tiihonen and A. Tanskanen have participated in research projects funded by Janssen and Eli Lilly with grants paid to the institution where they were employed. J. Tiihonen reports personal fees from the Finnish Medicines Agency (Fimea), European Medicines Agency (EMA), Eli Lilly, Janssen-Cilag, Lundbeck, and Otsuka; and has received grants from the Stanley Foundation and Sigrid Jusélius Foundation. J. Paananen is a shareholder and employee in Afekta Technologies Ltd and Genomi Solutions Ltd, and employee of Blueprint Genetics Ltd.
Summary points
What is known?
• Hypothesis-testing and hypothesis-generation in observational studies are based on ad hoc and frequentist methods.
• Persons with AD use antidepressants, antipsychotics and cardiovascular medications more commonly before AD diagnosis compared to persons without AD.
What the study has added?
• A data-driven approach to exploratory data analysis, using pattern recognition and cooperative game theory, can provide additional insights in observational studies.
• A strategy for organizing knowledge in observational studies for conducting either hypothesis-testing or hypothesis-generation.
• Irregularity of medication use was predictive of AD.
Acknowledgements
This work was funded by the Academy of Finland (grant numbers 307232 and 295334 to A.M. Tolppanen, which also paid B. Braithwaite's salary) and by strategic funding from the University of Eastern Finland (to AMT). The funders had no role in study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the paper for publication.
Fig. 12. The interaction between calcium channel blockers and age with respect to their SHAP values. The plot illustrates how age is associated with the use of calcium channel blockers across the data set. Blue value indicates the absence of the medication and red indicates the presence of the medication. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 13. The interaction between lipid modifying agents and age with respect to their SHAP values. The plot illustrates how age is associated with the use of lipid modifying agents across the data set. Blue value indicates the absence of the medication and red indicates the presence of the medication. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Appendix A
Interpreting model behaviour falls into a class called additive feature attribution methods, which aim to explain models with a real-valued output attributed to each feature. Lundberg et al. [6,18] recently proposed an approach where feature attributions are evaluated individually and in combination using Shapley values [17], a concept from cooperative game theory. Cooperative game theory is a branch of game theory that analyzes how players in a game form cooperative teams (or coalitions), and how the actions of these teams result in some expected payoff (or utility). The expected payoff is measured using a characteristic function (that is, a metric). In our work, the Shapley value is used to evaluate how feature combinations (that is, coalitions) affect the predicted outcome of our GB model. That is, we treat our GB model as the game setting, the model's features as our players, Area Under the Curve (AUC) as our characteristic function, and the predicted output of our GB model as the common interest. The common interest here is the classification of AD or CTRL from a series of medication use observations. Features are evaluated in permutations by evaluating their expected output, and the best coalition of features, which produces the highest contribution to the common interest, is selected.
A.1 Evaluating feature attribution
The additive feature attribution method can be briefly summarized as follows. Given a linear model g with binary variables z ∈ {0, 1}^M, the attribution of features is defined as
$$g(z) = \phi_0 + \sum_{i=1}^{M} \phi_i z_i, \tag{1}$$
where M is the number of features and ϕ_i is the feature attribution value. The variable z_i denotes either the absence or presence of a feature.
Lundberg and Lee [6] described that an important property of additive feature attribution is that there exists a single unique solution with the following properties:
(2) Local accuracy, that is, when a model f (which is the characteristic function) is approximated for a specific input x, the explanation model g must (at least) match the output of f for a simplified input x′ ("interpretable input") corresponding to the original input. More formally,
$$f(x) = g(x') = \phi_0 + \sum_{i=1}^{M} \phi_i x'_i. \tag{2}$$
The explanation g(x′) matches f(x) when x′ is correctly converted and projected back to the original input space of x, denoted as x = h_x(x′). Here ϕ_0 = f(h_x(0)) represents the output of f where all entries of x′ are missing, where 0 is the vector with 0 entries.
(3) Missing data, that is, if a feature is missing in a simplified input x′, then that feature has no effect on the model:
$$x'_i = 0 \implies \phi_i = 0. \tag{3}$$
(4) Consistency, that is, if a simplified input's contribution to the model increases or does not change, regardless of other inputs, then the attribution of that input should not decrease. More formally, if f_x(x′) = f(h_x(x′)) and x′ = 0 is denoted as x_0, then for two models f and k we have the feature attribution relation ϕ_i(k, x) ≥ ϕ_i(f, x) ∀ i = 1, …, M, if
$$k_x(x') - k_x(x_0) \geq f_x(x') - f_x(x_0), \quad \forall x' \in \{0, 1\}^M. \tag{4}$$
In order for a model to satisfy the local accuracy, missing data and consistency properties, Lundberg and Lee [6] propose the following additive feature attribution measure:
$$\phi_i(f, x) = \sum_{z \subseteq x'} \frac{|z|!\,(M - |z| - 1)!}{M!} \left[ f_x(z) - f_x(z_0) \right], \tag{5}$$
where |·| denotes the number of non-zero entries, z_0 = 0 where x′ = 0, and z ⊆ x′ denotes all z vectors whose non-zero entries are a subset of the non-zero entries in x′. Eq. (5) is equivalent to the Shapley values [31,17] and is used to compute the SHAP values (Section 2.2.2).
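As an illustration of the Shapley values to which Eq. (5) is stated to be equivalent, the sketch below computes classic Shapley values by brute force for a small game. It uses the textbook formulation, in which the marginal contribution compares a coalition with and without player i; this formulation, the function names and the toy payoff are our assumptions for the example, and the exhaustive enumeration is only practical for a handful of players (it is not the tree algorithm of Lundberg et al. [18]).

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_players):
    """Exact Shapley values for a characteristic function on frozensets."""
    players = range(n_players)
    phi = [0.0] * n_players
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Weight |S|! (M - |S| - 1)! / M! for coalitions S of this size.
            weight = (factorial(size) * factorial(n_players - size - 1)
                      / factorial(n_players))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_value(coalition):
    """Hypothetical payoff: players 0 and 1 contribute, player 2 does not."""
    total = 0.0
    if 0 in coalition:
        total += 2.0
    if 1 in coalition:
        total += 1.0
    if 0 in coalition and 1 in coalition:
        total += 1.0  # interaction shared between players 0 and 1
    return total
```

In this toy game the attributions sum to the payoff of the full coalition minus that of the empty coalition (local accuracy), and the player that never changes the payoff receives zero attribution (missing data).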
Appendix B
Table 1
Anatomical Therapeutic Chemical (ATC) classification and names of medications from A to H.
ATC classification
A01 Stomatological preparations
A02 Drugs for acid related disorders
A03 Drugs for functional gastrointestinal disorders
A04 Antiemetics and antinauseants
A05 Bile and liver therapy
A06 Drugs for constipation
A07 Antidiarrheals, intestinal anti-inflammatory/anti-infective agents
A08 Antiobesity preparations, excl. diet products
Appendix C
Table 2
Anatomical Therapeutic Chemical (ATC) classification and names of medications from J to V.
ATC classification
J01 Antibacterials for systemic use
J02 Antimycotics for systemic use
J04 Antimycobacterials
J05 Antivirals for systemic use
L02 Endocrine therapy
L03 Immunostimulants
L04 Immunosuppressants
M01 Anti-inflammatory and antirheumatic products
M03 Muscle relaxants
M04 Antigout preparations
M05 Drugs for treatment of bone disease
M09 Other drugs for disorders of the musculo-skeletal system
N02A Opioids
N02B Other analgesics and antipyretics
N02C Antimigraine preparations
N03A Antiepileptics
N04A Anticholinergic agents
N04B Dopaminergic agents
N05A Antipsychotics
N05B Anxiolytics
N05C Hypnotics and sedatives
N06A Antidepressants
N06B Psychostimulants, agents used for ADHD and nootropics
N06C Psycholeptics and psychoanaleptics in combination
N07A Parasympathomimetics
N07B Drugs used in addictive disorders
N07C Antivertigo preparations
N07X Other nervous system drugs
P01 Antiprotozoals
R01 Nasal preparations
R03 Drugs for obstructive airway diseases
R05 Cough and cold preparations
R06 Antihistamines for systemic use
S01 Ophthalmologicals
V03 All other therapeutic products
Table 1 (continued)
ATC classification
A09 Digestives, incl. enzymes
A10 Insulins and analogues
A11 Vitamins
A12 Calcium
A14 Anabolic agents for systemic use
B01 Antithrombotic agents
B02 Antihemorrhagics
B03 Antianemic preparations
C01 Cardiac therapy
C02 Antihypertensives
C03 Diuretics
C04 Peripheral vasodilators
C05 Vasoprotectives
C07 Beta blocking agents
C08 Calcium channel blockers
C09 Agents acting on the renin-angiotensin system
C10 Lipid modifying agents
G01 Gynecological anti-infectives and antiseptics
G02 Other gynecologicals
G03 Sex hormones and modulators of the genital system
G04 Urologicals
H01 Pituitary and hypothalamic hormones and analogues
H02 Corticosteroids for systemic use
H03 Thyroid therapy
H04 Pancreatic hormones
H05 Calcium homeostasis
Fig. 14. A cumulative distribution of the exposure periods of the top 20 medications detected by GB.
Fig. 15. A cumulative distribution of the average daily dose of the top 20 medications detected by GB.
Fig. 16. A cumulative distribution of the purchase irregularity of the top 20 medications detected by GB.
Appendix D
Table 3
A summary of medication purchases within the specified time window for persons with AD and without AD (CTRL).
Medication use summary between AD and CTRL groups
ATC    AD #Persons   AD #Persons (%)   CTRL #Persons   CTRL #Persons (%)
N06A   16,642        62                10,394          38
C09    28,543        49                29,498          51
J01    40,093        50                39,889          50
M01    38,610        50                38,240          50
A02    23,619        50                23,884          50
C10    27,840        51                26,357          49
C03    24,556        49                24,710          51
C01    27,673        52                25,502          48
C07    36,124        51                35,200          49
B01    19,628        53                17,655          47
G04    14,711        55                12,308          45
References
[1] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, 1990.
[2] A.L. Blum, P. Langley, Selection of relevant features and examples in machine learning, Artif. Intell. 97 (1997) 245–271.
[3] Y. Saeys, I. Inza, P. Larrañaga, A review of feature selection techniques in bioinformatics, Bioinformatics 23 (2007) 2507–2517.
[4] M.T. Ribeiro, S. Singh, C. Guestrin, “Why Should I Trust You?”: explaining the predictions of any classifier, Proceedings of the 22nd International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD) (2016) 1135–1144.
[5] A. Shrikumar, P. Greenside, A. Kundaje, Learning important features through propagating activation differences, Proceedings of the 34th International Conference on Machine Learning (PMLR) (2017) 3145–3153.
[6] S.M. Lundberg, S. Lee, A unified approach to interpreting model predictions, Adv. Neural Inf. Process. Syst. (2017) 4765–4774.
[7] H. Taipale, M. Koponen, A. Tanskanen, A.M. Tolppanen, J. Tiihonen, S. Hartikainen, Antipsychotic doses among community-dwelling persons with Alzheimer disease in Finland, J. Clin. Psychopharmacol. 34 (2014) 435–440.
[8] M. Koponen, A.M. Tolppanen, H. Taipale, A. Tanskanen, J. Tiihonen, K. Johnell, J. Fastbom, R. Ahonen, S. Hartikainen, Incidence of antipsychotic use in relation to diagnosis of Alzheimer's disease among community-dwelling persons, Br. J. Psychiatry 207 (2015) 444–449.
[9] A. Puranen, H. Taipale, M. Koponen, A. Tanskanen, A.M. Tolppanen, J. Tiihonen, S. Hartikainen, Incidence of antidepressant use in community-dwelling persons with and without Alzheimer's disease: 13-year follow-up, Int. J. Geriatr. Psychiatry 32 (2017) 94–101.
[10] A.M. Tolppanen, H. Taipale, M. Koponen, P. Lavikainen, A. Tanskanen, J. Tiihonen, S. Hartikainen, Cohort profile: the Finnish medication and Alzheimer's disease (MEDALZ) study, BMJ Open 6 (2016).
[11] A. Tanskanen, H. Taipale, M. Koponen, A.M. Tolppanen, S. Hartikainen, R. Ahonen, J. Tiihonen, From prescription drug purchases to drug use periods - a second generation method (PRE2DUP), BMC Med. Inform. Decis. Mak. 15 (2015).
[12] A. Tanskanen, H. Taipale, M. Koponen, A.M. Tolppanen, S. Hartikainen, R. Ahonen, J. Tiihonen, Drug exposure in register-based research - An expert-opinion based evaluation of methods, PLOS ONE 12 (2017).
[13] H. Taipale, A. Tanskanen, M. Koponen, A.M. Tolppanen, J. Tiihonen, S. Hartikainen, Agreement between PRE2DUP register data modeling method and comprehensive drug use interview among older persons, Clin. Epidemiol. 8 (2016) 363–371.
[14] N.V. Chawla, N. Japkowicz, A. Kotcz, Special issue on learning from imbalanced data sets, ACM Sigkdd Explor. Newsl. 6 (2004) 1–6.
[15] L. Breiman, Arcing the Edge, Technical Report 486, Statistics Department University of California at Berkeley, 1997.
[16] L. Mason, J. Baxter, P.L. Bartlett, M. Frean, Boosting algorithms as gradient descent, Adv. Neural Inf. Process. Syst. (2000) 512–518.
[17] L.S. Shapley, A value for n-person games, Contrib. Theory Games 2 (1953) 307–317.
[18] S.M. Lundberg, G.G. Erion, S. Lee, Consistent Individualized Feature Attribution for Tree Ensembles, (2018) (arXiv preprint), arXiv:1802.03888.
[19] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, J. Vanderplas, Scikit-learn: machine learning in Python, J. Mach. Learn. Res. 12 (2011) 2825–2830.
[20] K. Deckers, S.H.J. Schievink, M.M.F. Rodriquez, R.J. van Oostenbrugge, M.P.J. van Boxtel, F.R.J. Verhey, S. Köhler, Coronary heart disease and risk for cognitive impairment or dementia: systematic review and meta-analysis, PLoS ONE 12 (2017).
[21] F.J. Wolters, R.A. Segufa, S.K.L. Darweesh, D. Bos, M. Arfan Ikram, B. Sabayan, A. Hofman, S. Sedaghat, Coronary heart disease, heart failure, and the risk of dementia: a systematic review and meta-analysis, Alzheimer's Dement. 14 (2018) 1493–1504.
[22] R.F.A.G. de Bruijn, M. Arfan Ikram, Cardiovascular risk factors and future risk of Alzheimer's disease, BMC Med. 12 (2014).
[23] M. Vu, M. Koponen, H. Taipale, A. Tanskanen, J. Tiihonen, R. Kettunen, S. Hartikainen, A.M. Tolppanen, Prevalence of cardiovascular drugs use before and after diagnosis of Alzheimer's disease, Int. J. Cardiol. 300 (2020) 221–225.
[24] J. Ramirez, J.M. Górriz, D. Salas-Gonzalez, A. Romero, M. López, I. Álvarez, M. Gómez-Rio, Computer-aided diagnosis of Alzheimer's type dementia combining support vector machines and discriminant set of features, Inf. Sci. 237 (2013) 59–72.
[25] R. Polikar, A. Topalis, D. Parikh, D. Green, J. Frymiare, J. Kounios, C.M. Clark, An ensemble based data fusion approach for early diagnosis of Alzheimer's disease, Inf. Fusion 9 (2008) 83–95.
[26] C. Davatzikos, Y. Fan, X. Wu, D. Shen, S.M. Resnick, Detection of prodromal Alzheimer's disease via pattern classification of magnetic resonance imaging, Neurobiol. Aging 29 (2008) 514–523.
[27] S. Klöppel, C.M. Stonnington, C. Chu, B. Draganski, R.I. Scahill, J.D. Rohrer, N.C. Fox, C.R. Jack Jr., J. Ashburner, R.S.J. Frackowiak, Automatic classification of MR scans in Alzheimer's disease, Brain 131 (2008) 681–689.
[28] H. Gurulingappa, A. Mateen-Rajput, L. Toldo, Extraction of potential adverse drug events from medical case reports, J. Biomed. Semant. 3 (2012).
[29] E. Aramaki, Y. Miura, M. Tonoike, T. Ohkuma, H. Masuichi, K. Waki, K. Ohe, Extraction of adverse drug effect from clinical records, Stud. Health Technol. Inform. 160 (2010) 739–743.
[30] A. Nikfarjam, A. Sarker, K. O’Connor, R. Ginn, G. Gonzalez, Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features, J. Am. Med. Inform. Assoc. 22 (2015) 671–681.
[31] E. Štrumbelj, I. Kononenko, Explaining prediction models and individual predictions with feature contributions, Knowl. Inf. Syst. 41 (2014) 647–665.
Table 3 (continued)
Medication use summary between AD and CTRL groups
ATC    AD #Persons   AD #Persons (%)   CTRL #Persons   CTRL #Persons (%)
C08    19,423        49                20,017          51
N05C   19,740        52                18,233          48
N02B   15,569        54                13,508          46
A10    11,852        54                10,135          46
A12    13,730        54                11,841          46
R03    11,022        48                12,181          52
G03    12,719        49                12,992          51
N05A   5,966         63                3,450           37
N02A   9,512         49                9,903           51
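The percentage columns of Table 3 are not defined in the table itself. One reading, stated here as an assumption, is that each percentage is the share of a group among all persons (AD plus CTRL) who purchased the given ATC class, rounded to a whole number. The sketch below uses a hypothetical helper, `group_shares`, and checks this reading against the N06A and N05A rows, which it reproduces.

```python
def group_shares(n_ad, n_ctrl):
    """Rounded percentage of AD and CTRL persons among all users of one ATC class.

    Assumption: the table's percentages are row-wise shares, not shares of the
    whole AD or CTRL cohort.
    """
    total = n_ad + n_ctrl
    pct_ad = round(100 * n_ad / total)
    return pct_ad, 100 - pct_ad


n06a = group_shares(16642, 10394)  # antidepressants row
n05a = group_shares(5966, 3450)    # antipsychotics row
print(n06a, n05a)  # (62, 38) (63, 37)
```

Under this reading, a value near 50 means the class was purchased about equally often in both groups, whereas N06A and N05A stand out with roughly 62 to 63 percent of purchasers in the AD group.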