
4.4.1 Epidemiological Analysis

We studied the association of sleep duration with blood lipid levels (II). The association was tested in the Finrisk07 (N=6,092) and Health 2000 (N=5,948) samples using the general linear model (GLM). The analyses were performed using PASW Statistics 18, and all analyses were adjusted for relevant covariates (age, gender, lipid medication, hypnotics, BMI and cohort).

4.4.2 Association Analysis and Haplotype Analysis

Sleep duration was normally distributed at the population level in the Health 2000 and Finrisk07 samples. For the GWA study of sleep duration in GenMets and for the analyses in the follow-up samples YF, DILGOM, Finnish Twins and the rest of the Health 2000 cohort (I), the phenotype (sleep duration) was standardized in each sample by using the standard-beta option provided by PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/) (Purcell et al., 2007). Linear regression was used for the association analysis, and the analysis was adjusted for age, gender and BMI as well as for genomic principal components (GenMets (Kristiansson et al., 2012), DILGOM, YF and the Finnish Twin sample) and metabolic disorder status (GenMets sample). Adjusting genetic association analyses for genomic principal components is a method used to correct for population stratification. Fixed-effects meta-analysis was performed for calculating P-values using PLINK. Gene-environment interaction was analyzed using the gxe option provided by PLINK (I, III).
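As an illustration of how per-sample regression results can be combined, the following minimal sketch implements the standard inverse-variance weighted fixed-effects estimator. It is not PLINK's code; the function name and the example effect sizes are hypothetical and chosen only for illustration.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Combine per-sample effect estimates with inverse-variance weights.

    Returns the pooled beta, its standard error, the Z statistic and a
    two-sided P-value from the standard normal distribution.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical example: the same SNP tested in two samples.
beta, se, z, p = fixed_effects_meta([0.10, 0.10], [0.05, 0.05])
```

With equal standard errors the pooled effect equals the common estimate, while the pooled standard error shrinks by a factor of the square root of the number of samples.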

Haplotype analysis (I) was performed using Haploview (Barrett et al., 2005) and phasing was performed using Beagle (Browning and Browning, 2011). Beagle performs phasing based on a hidden Markov model for haplotypes.

Pathway analysis for SNPs was performed using the SNP ratio test (O'Dushlaine et al., 2009). This program calculates empirical P-values for individual pathways based on permutation tests. In short, the phenotype is permuted one hundred times and the association analysis is repeated with each permuted phenotype. The original association results are then annotated into KEGG pathways and compared to the permuted results, which are also annotated into KEGG pathways.
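The permutation logic can be sketched as follows. This is a simplified illustration and not the SNP ratio test program itself: the nominal threshold of 0.05, the (b + 1) / (m + 1) convention for the empirical P-value and all function names are assumptions made for the example.

```python
def snp_ratio(p_values, threshold=0.05):
    """Fraction of a pathway's SNPs with a nominal P-value below threshold."""
    significant = sum(1 for p in p_values if p < threshold)
    return significant / len(p_values)


def empirical_p(observed_ratio, permuted_ratios):
    """Empirical P-value: share of permutations with a ratio at least as
    large as the observed one, using the (b + 1) / (m + 1) convention
    (an assumption of this sketch)."""
    exceed = sum(1 for r in permuted_ratios if r >= observed_ratio)
    return (exceed + 1) / (len(permuted_ratios) + 1)


# Hypothetical pathway with four SNPs and four permuted data sets.
observed = snp_ratio([0.01, 0.20, 0.03, 0.50])
p_emp = empirical_p(observed, [0.10, 0.60, 0.20, 0.30])
```

In the actual analysis each permuted ratio would come from rerunning the association analysis on a permuted phenotype and annotating the results into the same KEGG pathway.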

In order to study the power of the GenMets sample in the GWA study of sleep duration (I), we performed a power calculation using the Genetic Power Calculator (Purcell et al., 2003). The power was calculated for an additive model assuming a quantitative trait locus (QTL) variance of 1.5%, a minor allele frequency of 5% and perfect linkage disequilibrium between the QTL and the marker.
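For intuition, the power of a 1-df association test for a quantitative trait can be approximated as below. This is a textbook approximation, not the Genetic Power Calculator's own routine: it assumes unrelated individuals, a non-centrality parameter of N * q / (1 - q), where q is the proportion of trait variance explained by the QTL, and perfect linkage disequilibrium. The sample size and significance level in the example are hypothetical.

```python
import math
from statistics import NormalDist


def qtl_power(n, qtl_variance, alpha):
    """Approximate power of a 1-df test for an additive QTL.

    n            -- number of unrelated individuals (hypothetical input)
    qtl_variance -- proportion of trait variance explained by the QTL
    alpha        -- two-sided significance level
    """
    normal = NormalDist()
    # Non-centrality parameter of the 1-df chi-square test (assumed form).
    ncp = n * qtl_variance / (1.0 - qtl_variance)
    shift = math.sqrt(ncp)
    z_crit = -normal.inv_cdf(alpha / 2.0)
    # A 1-df non-central chi-square is the square of a shifted normal,
    # so the rejection probability is the sum of two normal tails.
    return (1.0 - normal.cdf(z_crit - shift)) + normal.cdf(-z_crit - shift)


# Hypothetical example: QTL variance of 1.5% as in the text, made-up n and alpha.
power = qtl_power(n=1000, qtl_variance=0.015, alpha=0.05)
```

The same function can be evaluated at a genome-wide significance level to see how sharply the power drops when the threshold is made stricter.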

For the candidate gene studies II and III in the population data, linear or logistic regression analysis with an additive model and relevant covariates (age, gender and BMI) or a chi-square test (III) was performed using the PLINK software (http://pngu.mgh.harvard.edu/~purcell/plink/) (Purcell et al., 2007). Sleep duration was normally distributed (II). Other quantitative traits (TG, TC, HDL-C, LDL-C and Epworth sleepiness scale) were squared in order to obtain a normal distribution (II).

For the case-control analysis of the dichotomous traits (depression and sleep disturbances), no normalization was done (III). Bonferroni correction was used in order to adjust for multiple testing (II, III).

For the family-based analysis of the candidate SNPs (IV), we used FBAT for dichotomous phenotypes (www.biostat.harvard.edu/~fbat) (Horvath et al., 2004) and QTDT for quantitative phenotypes (Abecasis et al., 2000). To complement the QTDT test, PLINK was used to analyze the direction of association for quantitative traits using the qfam-total option. Age, gender and disease status were used as covariates in the analyses of the quantitative traits (IV). No normalization or correction for multiple testing was performed in study IV.

The software packages summarized in Table 4 were used to analyze the genetic markers and RNA expression.

4.4.2.1 Analysis of RNA Expression

Significance at the single-transcript level was calculated using 2-way ANOVA, with group (control and case) as one axis and time point (baseline, deprivation and recovery) as the other, using R/Bioconductor open software packages (http://www.r-project.org) (II). In addition, changes from baseline to deprivation were calculated with Student's t-test within the case groups.

In order to analyze enriched pathways at the RNA expression level (unpublished results), an in-house developed pathway analysis program, CIGA, was used (Pietilainen et al., 2008). Individual transcripts that belong to the same biological pathway often do not reach statistical significance alone. We therefore used this in-house pathway analysis program for RNA expression, which calculates a cumulative P-value for the transcripts that belong to the same GO, and permuted the significant pathways 1,000 times in order to adjust for multiple testing. The 2-way ANOVA was calculated using control and case groups as one axis and time point (baseline, deprivation and recovery) as the other axis. The genes were ranked for the pathway analysis program by sorting them according to their significance in the 2-way ANOVA. The pathway analysis program calculates the P-value by answering the question: "how likely is it to see this many genes (k) that belong to the studied pathway this high up in the ranked list of genes (j), when there are altogether t genes that belong to the pathway amongst n genes in the experiment".
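One natural reading of the quoted question is a hypergeometric upper-tail probability: the chance of finding at least k pathway genes among the top j genes of the ranked list when t of the n genes belong to the pathway. The sketch below implements that reading only. Whether CIGA computes exactly this quantity, and how it chooses the cut-off j, is not specified here, so the function is an assumption for illustration.

```python
from math import comb


def pathway_enrichment_p(k, j, t, n):
    """Hypergeometric upper tail P(X >= k).

    k -- pathway genes observed among the top-ranked genes
    j -- number of top-ranked genes considered
    t -- genes in the experiment that belong to the pathway
    n -- total number of genes in the experiment
    """
    upper = min(j, t)
    total = comb(n, j)
    # Sum the probabilities of seeing k, k + 1, ..., upper pathway genes.
    tail = sum(comb(t, i) * comb(n - t, j - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical example: all 3 pathway genes rank in the top 3 of 10 genes.
p = pathway_enrichment_p(k=3, j=3, t=3, n=10)
```

A cumulative P-value in the sense of the text could be obtained by evaluating such a probability at successive positions j in the ranked list, with permutation used afterwards to account for the repeated testing.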

Pathway analysis programs rely on the correct annotation of genes into pathways. Thus, different analysis methods and annotations often find different pathways for a given data set. In order to control for annotation problems, we verified the detected pathways with two independent programs, IPA (Ingenuity Systems®, www.ingenuity.com) and Anduril (Ovaska et al., 2010).

For the analysis of RNA expression levels in the DILGOM population sample, linear regression was used with relevant covariates (age, gender and principal components correcting for population stratification). The analysis was performed using R version 2.14.2 (http://www.R-project.org/).

4.4.2.2 Correction for Multiple Testing

Genetic studies involve a large number of tests, which produce false positive results if only nominal P-values are considered. In the genome-wide setting (I) we used P < 10⁻⁸ as the significance threshold and assigned the empirical P-value by permutation.

In studies II and III, Bonferroni correction was used in order to adjust for multiple testing (Dunn, 1961). Study IV was performed as a follow-up for previous GWA studies and association analyses (Baum et al., 2008a, Baum et al., 2008b, Ferreira et al., 2008, Sklar et al., 2008, WTCCC, 2007). No correction for multiple testing was done in study IV.
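The Bonferroni procedure itself is simple enough to state in a few lines. The sketch below is a generic illustration with hypothetical P-values and a hypothetical family-wise level of 0.05, not the script used in studies II and III.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction for m tests.

    Returns the adjusted P-values (nominal P times m, capped at 1) and a
    list of flags telling which tests pass the threshold alpha / m.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p < alpha / m for p in p_values]
    return adjusted, significant


# Hypothetical example with three tests.
adjusted, significant = bonferroni([0.01, 0.04, 0.50])
```

Comparing each nominal P-value to alpha / m is equivalent to comparing the adjusted P-value to alpha, which is why only the first of the three hypothetical tests remains significant.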

Table 4. Summary of the software used in the analysis of the data

| Analysis | Software | Reference | Study |
|---|---|---|---|
| Association analysis | PLINK | Purcell et al., 2007 | I-IV |
| | QTDT | Abecasis et al., 2000 | IV |
| | FBAT | Horvath et al., 2004 | IV |
| Haplotype analysis | Haploview | Barrett et al., 2005 | I |
| Phasing | Beagle | Browning and Browning, 2011 | I |
| Pathway analysis for SNPs | SNP ratio test | O'Dushlaine et al., 2009 | I |
| Power calculations | Genetic Power Calculator | Purcell et al., 2003 | I |
| Pathway analysis for RNA expression | CIGA | Pietilainen et al., 2008 | I |
| Epidemiological analyses (general linear model) | SPSS | IBM | II |

5 Results and Discussion