
Human genetics studies variation within the human genome and its consequences for human traits. It encompasses, but is not limited to, several overlapping disciplines, such as genomics and medical genetics, as well as population and developmental genetics. These disciplines discuss the role of genes differently, but share the objective of understanding human life. The following overview is written mostly from the viewpoint of molecular genetics.

2.1.1 Overview of the structure and function of the human genome

The euchromatic portion of the human genome sequence was deciphered by the Human Genome Project that concluded its work in 2003.17 The human genome encodes only 20 000 to 25 000 protein-coding genes17 that are sparsely distributed.18 The remainder of the genome is noncoding, with ample highly repetitive sequences of incompletely known function.18

The human genome consists of nuclear deoxyribonucleic acid (DNA) within the nucleus and mitochondrial DNA within the mitochondria. In the nucleus, the DNA molecules form 23 pairs of chromosomes in human somatic cells.19 The two chromosomes of a pair carry the same genes at the same locations (loci), but the alleles at each locus may differ. An individual's set of alleles is the genotype. An individual is homozygous at a given locus if the two alleles are the same and heterozygous if they differ. This holds true for the 22 pairs of autosomes, but the sex-determining allosomes come in two forms, X and Y, that are not complementary.
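As a small illustration of these terms, zygosity can be read directly off a genotype. The sketch below assumes a genotype is stored as a pair of allele strings per locus; the locus names and alleles are hypothetical and serve only as an example.

```python
def zygosity(alleles: tuple[str, str]) -> str:
    """Classify a diploid genotype at one autosomal locus."""
    first, second = alleles
    return "homozygous" if first == second else "heterozygous"


# Hypothetical genotype: locus name -> (allele on one homolog, allele on the other).
genotype = {"locus_1": ("A", "A"), "locus_2": ("C", "T")}

for locus, alleles in genotype.items():
    print(locus, zygosity(alleles))
```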

The DNA molecule is a large polymer with a sugar-phosphate backbone and a variable base attached to each sugar, thus forming a strand of nucleotides.18 The bases of DNA are adenine (A), guanine (G), cytosine (C), and thymine (T). Base pairing creates the DNA duplex, in which the two DNA strands wind around one another to form a double helix with the paired bases on the inside. The bases on opposing strands are linked by hydrogen bonds according to the standard base pairs: A pairs with T, and G with C.18
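The pairing rule can be sketched in a few lines of code. The example below is a minimal sketch, assuming sequences are plain strings of the letters A, C, G and T; it also relies on the fact, not stated above, that the two strands run in opposite directions, so the opposing strand is the reverse complement. The function name is illustrative.

```python
# Watson-Crick pairing: A pairs with T, and G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the opposing strand, read in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


print(reverse_complement("ATGC"))  # GCAT
```

Applying the function twice returns the original strand, which is a quick sanity check of the pairing table.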

The DNA nucleotide sequence is an instruction to generate a complementary ribonucleic acid (RNA) sequence in a process called transcription. A gene is a set of such instructions, with exon segments that are expressed and intron segments that are not expressed. In addition, the intergenic DNA, the vast majority of which remains unexpressed, contains promoter regions that contribute to the regulation of gene expression. Even though each human cell contains every gene, gene expression needs to be regulated to match the temporal and tissue-specific needs.18

The final product of transcription is modified by, for example, splicing and, once mature, serves either as a noncoding operator or as a messenger template for making a polypeptide. In a process called translation, the sequence delivered by the messenger is deciphered and an amino acid chain is created according to nucleotide triplets. Not every triplet corresponds to a unique amino acid, and hence not all variation leads to a change in the expressed polypeptide. With further modification, a protein is generated, an end product of the central dogma of molecular biology.20
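The flow from DNA to polypeptide, and the redundancy of the triplet code, can be sketched as follows. This is a simplified illustration, not a model of the cellular machinery: it assumes the input is the coding strand of an already spliced sequence, uses the standard genetic code, and ignores all post-translational modification. The function names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA that matches the coding strand (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read triplets from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Number of codons per amino acid: most amino acids have more than one codon.
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(translate(transcribe("ATGGCCTAA")))  # MA
print(translate(transcribe("ATGGCTTAA")))  # MA (GCC and GCT both encode alanine)
print(codons_per_amino_acid["L"], codons_per_amino_acid["M"])  # 6 1
```

The second call shows why a nucleotide change does not always change the polypeptide: the altered triplet still encodes the same amino acid.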

2.1.2 Variation in the human genome

Variation in the genome arises from the inheritance of one allele from each parent and, additionally, from genomic mutations.21 Common DNA variants with a population frequency of more than 1% are traditionally called polymorphisms.22 However, a more feasible classification is: common variants with a frequency of more than 5%; low-frequency variants between 0.5% and 5%; and rare variants with a frequency below 0.5%.22
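The three-tier classification translates directly into a threshold check. The sketch below assumes the frequency is given as a fraction between 0 and 1 and that the boundary values 0.5% and 5% themselves fall into the low-frequency class, which follows from the wording "more than 5%" and "below 0.5%"; the function name is illustrative.

```python
def classify_frequency(frequency: float) -> str:
    """Classify a variant by its population allele frequency (a fraction, 0 to 1)."""
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be a fraction between 0 and 1")
    if frequency > 0.05:
        return "common"
    if frequency >= 0.005:
        return "low-frequency"
    return "rare"


for value in (0.20, 0.02, 0.001):
    print(value, classify_frequency(value))
```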

In meiosis, the homologous chromosomes exchange genetic material at corresponding loci through recombination, before either chromosome of a pair is randomly assigned into a gamete. This increases the inherited variation. However, genetic loci in close proximity are inherited together more often than is expected by chance, since crossing over does not usually separate them. The alleles at such loci are said to be in linkage disequilibrium (LD). LD is, however, also affected by factors other than the physical distance between the alleles. Stable blocks of alleles in LD were first described in the HapMap Project.23 Such a region harbors several adjacent alleles that are known to descend as a unit called a "haplotype".

Figure 1 illustrates an example of a haploblock in an LD plot.
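The strength of LD between two biallelic loci is commonly summarized by the normalized coefficient D'. The sketch below uses the standard definition, D = f(AB) - p(A)p(B) scaled by its maximum attainable magnitude, and assumes the four haplotype frequencies are already known; estimating them from unphased genotypes is a separate problem that is not covered here. The function and argument names are illustrative, and the frequencies in the examples are made up.

```python
import math


def d_prime(f_AB: float, f_Ab: float, f_aB: float, f_ab: float) -> float:
    """Normalized LD coefficient D' (0 to 1) from four haplotype frequencies."""
    if not math.isclose(f_AB + f_Ab + f_aB + f_ab, 1.0, abs_tol=1e-9):
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = f_AB + f_Ab  # frequency of allele A at the first locus
    p_B = f_AB + f_aB  # frequency of allele B at the second locus
    d = f_AB - p_A * p_B  # deviation from independent assortment
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    if d_max == 0:
        # A locus with only one allele: D' is undefined, reported as 0 in this sketch.
        return 0.0
    return abs(d) / d_max


print(d_prime(0.5, 0.0, 0.0, 0.5))      # complete LD
print(d_prime(0.25, 0.25, 0.25, 0.25))  # linkage equilibrium
```

Multiplying the result by 100 gives the 0 to 100 scale used in LD plots such as the one in Figure 1.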


Figure 1. A notional illustration of a linkage disequilibrium (LD) plot with a haploblock of variants 1 to 5. Style adapted from that used in Haploview software.24 The red color indicates a high level of linkage. Numbers inside squares indicate the normalized coefficient of linkage disequilibrium (D'), achieving values between 0 and 100 (not shown). Size of the haploblock is stated in kilobases (kb).

Mutations can occur spontaneously or due to an external trigger. Spontaneous endogenous mutations, such as those caused by errors in cellular mechanisms or chemical damage to DNA, outnumber the mutations caused by environmental chemicals and radiation.22 The majority of DNA changes are small in scale and rarely have an obvious effect on the phenotype.22 Structural variation is less frequent, but usually has large consequences.22 The borderline between these, however, is arbitrary. Table 1 lists some common types of genetic variation according to the nomenclature of the most recent recommendation.25

[Figure 1: LD plot of Variant 1 to Variant 6, with Block 1 (N kb) marked.]

2.1.3 Impact of genetic variation

Knowing only the position of a genetic variant provides little understanding of its impact. Several databases contain information about the predicted functional consequences of genetic variants (ClinVar26, Ensembl27). Nevertheless, biochemical and cellular assays are the most reliable proof of the function of a genetic variant.

Table 1. Common types of genetic variation.

Type of variation        Description: "A change where, in a specific sequence compared to the reference sequence, ..."
Inversion                More than one nucleotide replaces the original sequence, being in the reverse-complement.
Translocation            A range of nucleotides from a homologous chromosome replaces the original sequence.
Copy Number Variation    A variable number of tandem repeats is inserted.
Substitution             One nucleotide is replaced by another nucleotide.
Deletion                 One or more nucleotides are deleted.
Insertion                One or more nucleotides are inserted in the sequence, and they are not a copy of the sequence immediately 5'.
Duplication              A copy of one or more nucleotides is inserted 3' of the original copy.
Conversion               A range of nucleotides from another locus replaces the original sequence (a form of indel).
Deletion-Insertion       One or more nucleotides are replaced by one or more other nucleotides.
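Several of the variation types in Table 1 can be illustrated as simple string operations on a reference sequence. This is a sketch for intuition only: it assumes 0-based, end-exclusive coordinates rather than the 1-based numbering of the nomenclature, the reference sequence is made up, and the function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def substitution(seq: str, pos: int, alt: str) -> str:
    """Replace the single nucleotide at pos with alt."""
    return seq[:pos] + alt + seq[pos + 1:]


def deletion(seq: str, start: int, end: int) -> str:
    """Remove the nucleotides in [start, end)."""
    return seq[:start] + seq[end:]


def insertion(seq: str, pos: int, inserted: str) -> str:
    """Insert new nucleotides before position pos."""
    return seq[:pos] + inserted + seq[pos:]


def duplication(seq: str, start: int, end: int) -> str:
    """Insert a copy of [start, end) directly 3' of the original copy."""
    return seq[:end] + seq[start:end] + seq[end:]


def inversion(seq: str, start: int, end: int) -> str:
    """Replace [start, end) with its reverse complement."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def deletion_insertion(seq: str, start: int, end: int, inserted: str) -> str:
    """Replace [start, end) with other nucleotides (indel)."""
    return seq[:start] + inserted + seq[end:]


reference = "ATGCCGTA"  # made-up reference sequence
print(substitution(reference, 2, "A"))             # ATACCGTA
print(deletion(reference, 3, 5))                   # ATGGTA
print(insertion(reference, 3, "TT"))               # ATGTTCCGTA
print(duplication(reference, 2, 4))                # ATGCGCCGTA
print(inversion(reference, 1, 5))                  # AGGCAGTA
print(deletion_insertion(reference, 3, 5, "T"))    # ATGTGTA
```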

A variation may cause no difference in the amino acid sequence if the change is in a noncoding region. Variation in a noncoding area may, however, participate in the regulation of transcription even at remote loci. This is the case with the lactase-persistence polymorphic variant LCT -13910T, which is located in intron 13 of the MCM6 gene, more than 13 000 base pairs from the initiation site of the LCT gene that encodes lactase.28 A synonymous variation resides in an exon of a gene but does not alter the amino acid sequence. A nonsynonymous variation alters the amino acid sequence and, hence, changes the encoded protein.21 In addition, a variation may be described according to its potential to cause a loss of function or a gain of function. Table 2 lists the most common consequences of a given variation.29

Table 2. Possible types of consequences of variation.

Type of variation                                        Description
Deletion or a duplication of a gene or an exon           Often pathogenic; duplication changes the amount of protein product; marked variation between genes.
Variation in the promoter area (cis-acting effects)      Possible effect on transcription activity.
Variation that affects splicing                          Changes in the existing splice sites or creation of cryptic ones; change in the protein product. Possible loss of function.
Frameshift variation                                     Insertion or deletion that alters the triplet reading frame; no protein is produced. Loss of function.
Nonsense variation                                       Premature interruption of translation; no protein is produced. Loss of function.
Missense variation                                       One amino acid is replaced by another; change in the protein product. Possible loss or gain of function.
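The coding consequences in Table 2 can be told apart by comparing the translated reference and variant sequences. The sketch below is a deliberately simplified classifier: it assumes both inputs are complete coding sequences starting at the first codon, uses the standard genetic code, and does not handle splicing, promoter effects, or loss of the stop codon. The sequences and function names are illustrative, not real variants.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a coding DNA sequence up to the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def classify_coding_change(ref_cds: str, alt_cds: str) -> str:
    """Classify a variant coding sequence relative to the reference."""
    if len(ref_cds) != len(alt_cds):
        # A length change that is not a multiple of three shifts the reading frame.
        return "frameshift" if (len(alt_cds) - len(ref_cds)) % 3 else "in-frame indel"
    ref_protein, alt_protein = translate(ref_cds), translate(alt_cds)
    if ref_protein == alt_protein:
        return "synonymous"
    if len(alt_protein) < len(ref_protein):
        return "nonsense"  # a new stop codon truncates the polypeptide
    return "missense"


reference_cds = "ATGGCCTGGTAA"  # made-up sequence encoding Met-Ala-Trp
print(classify_coding_change(reference_cds, "ATGGCTTGGTAA"))  # synonymous
print(classify_coding_change(reference_cds, "ATGGACTGGTAA"))  # missense
print(classify_coding_change(reference_cds, "ATGGCCTGATAA"))  # nonsense
print(classify_coding_change(reference_cds, "ATGGCTGGTAA"))   # frameshift
```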

2.2 IDENTIFYING SUSCEPTIBILITY VARIANTS FOR COMPLEX