
The pathogenic Alpers clusters defined by Euro et al. (2011; see 4.4.2) were extended to include all available POLG syndrome patient cases in study I. This brought the total number of unique pathogenic mutations included in the clustering model from 57 to 136. The pathogenic clusters were further divided into subclusters (1: A-G; 2: A-D; 3: A-D; 4: A; 5: A-B), and their ranges were redefined to accommodate the new patient data.

The number of unique mutations from publicly available patient case reports was further increased to 176 in study III. The residue ranges of the clusters were also refined due to new patient data that had become available since study I was published. The ranges of four subclusters were extended slightly: 2B, 496–517 (previously 497–517); 2D, 752–769 (previously 752–767); 1F, 1098–1138 (previously 1104–1138); and 3C, 795–807 (previously 804–807). The full pathogenic cluster residue ranges are shown in Table 4.

Table 4. Residue ranges of the five pathogenic clusters divided into subclusters.

Cluster                                    Subcluster   Residue range
1. Polymerase active site and environs     1A           83-88
                                           1B           136-143
                                           1C           417-433
                                           1D           848-895
                                           1E           914-966
                                           1F           1098-1138
                                           1G           1157-1196
2. DNA binding channel                     2A           463-468
                                           2B           496-517
                                           2C           561-617
                                           2D           752-769
3. Partitioning loop                       3A           268-277
                                           3B           303-319
                                           3C           795-807
                                           3D           1047-1096
4. Distal accessory subunit interface      4A           224-244
5. Putative protein-protein interactions   5A           623-648
                                           5B           737-749

The holoenzyme structure (Szymanski et al., 2015) was solved after studies I and II were completed, and it was used for analysis and visualization of the extended cluster in study III.

Regions of the spacer AID and IP subdomains that are unstructured in the holoenzyme structure but were solved in the apoenzyme structure (Y. Lee et al., 2009) were modelled onto the holoenzyme structure. The extended clusters and the sequence schematic of the clusters are shown in Figure 15.

Figure 15. The extended clusters mapped onto the Pol γ holoenzyme tertiary structure (PDBID:4ZTU). Pol γ-β subunits are shown in light grey (proximal subunit) and dark grey (distal subunit). The DNA primer template is shown in orange, and the pathogenic clusters are shown with matching colors in the sequence schematic and the tertiary structure. Unstructured regions in the spacer IP and AID domains have been modelled onto the structure based on the apoenzyme structure. Figure reproduced from study III.

The extended clustering model provides a powerful tool for classifying both known and novel POLG mutations and their heterozygous combinations. Just as importantly, mutations that map outside of the clusters are much more likely to be SNPs. In a clinical setting, being able to rule out mutations as the cause of a patient's condition is very important for reaching a diagnosis and determining the right course of treatment.
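As an illustration of how the cluster ranges could be used for a first-pass classification, the sketch below looks up which subcluster, if any, contains a given residue. The ranges are copied from Table 4; the function names and the simple "letter, number, letter" mutation format are assumptions made for this example, and the lookup is not the implementation used in studies I-III.

```python
# Subcluster residue ranges copied from Table 4 (inclusive at both ends).
SUBCLUSTERS = {
    "1A": (83, 88), "1B": (136, 143), "1C": (417, 433), "1D": (848, 895),
    "1E": (914, 966), "1F": (1098, 1138), "1G": (1157, 1196),
    "2A": (463, 468), "2B": (496, 517), "2C": (561, 617), "2D": (752, 769),
    "3A": (268, 277), "3B": (303, 319), "3C": (795, 807), "3D": (1047, 1096),
    "4A": (224, 244),
    "5A": (623, 648), "5B": (737, 749),
}


def find_subcluster(residue):
    """Return the label of the subcluster containing the residue, or None."""
    for label, (start, end) in SUBCLUSTERS.items():
        if start <= residue <= end:
            return label
    return None


def residue_from_mutation(mutation):
    """Extract the residue number from a label such as 'G517V'.

    Assumes a single-letter wild-type code, the residue number, and a
    single-letter mutant code (hypothetical helper for this example).
    """
    return int(mutation[1:-1])


if __name__ == "__main__":
    for mutation in ("G517V",):
        position = residue_from_mutation(mutation)
        print(mutation, "->", find_subcluster(position))
```

A result of None only means that the residue lies outside the ranges in Table 4; as discussed in the limitations below, neither outcome is conclusive on its own.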

The role of cluster 3 (the partitioning loop) in the Pol γ structure remains unclear. In the apoenzyme structure (PDBID:3IKM), the partitioning loop extends inwards from the surface of the polymerase domain into the polymerase active site. The deposited apoenzyme crystal structure has unstructured regions (residues 1072-1090 and 319-344) that are missing from the structural data. The constant temperature factors assigned for this region of the structure also indicate that solving this part of the structure has not been straightforward. The missing regions could be IDRs and serve multiple possible functions, such as sites for mitochondrial helicase interaction (Qian et al., 2015), mtSSB interaction, regulation of primer switching between the polymerase and exonuclease active sites (Euro et al., 2011), or a DNA clamping mechanism of the kind found in many DNA polymerases. In the more recent holoenzyme structure (PDBID:4ZTU), the partitioning loop (residues 1039-1074) adopts a very different conformation (Figure 16).

Figure 16. Panels A and B show the differences between the apoenzyme (Panel A, PDBID:3IKM) and holoenzyme (Panel B, PDBID:4ZTU) structures. The region encompassed by pathogenic cluster 3 has a very different conformation in the two structures (shown in red). This area of the structure also has two unstructured regions (gaps, possible IDRs) whose function has not been elucidated. Figure reproduced from study III.

The ranges of the clusters may still need to be extended further as new patient data becomes available. The fingers subdomain of Pol γ-α has regions outside of the clusters that could very well affect the functionality of the polymerase domain active site (unpublished data). The currently available data can only take us so far, and efforts such as MSeqDR (Falk et al., 2016) are making headway in bringing the data from the clinics to the use of researchers. An analysis of novel POLG mutations based on the pathogenic clustering model was carried out in study II, and the results are presented in chapter 6.6.

6.2.1 Limitations of the pathogenic clustering model

While the clusters define high-risk areas for pathogenic mutations, not every residue within a cluster is equally important for the functionality of the Pol γ enzyme. Consecutive amino acid residues in a polypeptide chain tend to have their side chains pointing in different directions. This happens naturally as the chain settles into an energetically favorable conformation (Chellgren & Creamer, 2006). The effect is most pronounced for residues with large, bulky side chains that take up a lot of space around the backbone of the polypeptide chain. Therefore, at critical locations such as the DNA binding channel, roughly every other residue has its side chain extending into the channel and the next one away from it. The reality is of course not this simple, but at the level of the whole protein this tendency has a significant effect.

POLG mutation population frequencies obtained from the ExAC database (Lek et al., 2016) (Figure 17) show that some relatively frequent mutations, which are therefore more likely to be benign, lie within the pathogenic clusters.

Figure 17. Population frequency data for the POLG gene obtained from the ExAC database (Lek et al., 2016) show that some relatively frequent mutations lie inside the defined pathogenic clusters. Detected allele frequency is shown on the vertical axis and residue numbering on the horizontal axis (unpublished data). The pathogenic clusters are shown with a colored background and indicated below the chart.

The 17 mutations within the pathogenic clusters that have a frequency of ≥0.002 are listed in Table 5. If these 17 mutations were in fact pathogenic in compound heterozygous form, they would be likely to appear in patient case reports, given the number of carriers that exist in populations around the world. Mutation G517V has been reported in 12 separate patient cases, and its pathogenicity has been debated in the literature. It has also been characterized biochemically and shown to exhibit 80-90% of wild-type DNA polymerase activity (Table 3; Kasiviswanathan & Copeland, 2011). Currently, it is considered a benign mutation.
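The selection described above, cluster-mapping variants with a population frequency of at least 0.002, can be sketched as a simple filter. The cluster ranges are taken from Table 4 and the threshold from the text; the variant records, their names and their frequencies are invented placeholders for this example and are not ExAC values.

```python
# Inclusive residue ranges of all subclusters, copied from Table 4.
CLUSTER_RANGES = [
    (83, 88), (136, 143), (417, 433), (848, 895), (914, 966),
    (1098, 1138), (1157, 1196),                      # cluster 1
    (463, 468), (496, 517), (561, 617), (752, 769),  # cluster 2
    (268, 277), (303, 319), (795, 807), (1047, 1096),  # cluster 3
    (224, 244),                                      # cluster 4
    (623, 648), (737, 749),                          # cluster 5
]

FREQUENCY_THRESHOLD = 0.002  # cutoff used in the text


def in_any_cluster(residue):
    """True if the residue falls inside any pathogenic subcluster."""
    return any(start <= residue <= end for start, end in CLUSTER_RANGES)


def frequent_cluster_variants(variants, threshold=FREQUENCY_THRESHOLD):
    """Keep variants that map inside a cluster and meet the threshold."""
    return [
        v for v in variants
        if in_any_cluster(v["residue"]) and v["frequency"] >= threshold
    ]


# Hypothetical records: names, positions and frequencies are made up.
EXAMPLE_VARIANTS = [
    {"name": "hypothetical_1", "residue": 500, "frequency": 0.004},
    {"name": "hypothetical_2", "residue": 900, "frequency": 0.0001},
    {"name": "hypothetical_3", "residue": 700, "frequency": 0.01},
    {"name": "hypothetical_4", "residue": 1100, "frequency": 0.002},
]

if __name__ == "__main__":
    for variant in frequent_cluster_variants(EXAMPLE_VARIANTS):
        print(variant["name"], variant["residue"], variant["frequency"])
```

Variants that pass this filter are candidates for being benign despite their location, which is the reasoning applied to the 17 mutations in Table 5.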

Table 5. In the ExAC project population frequency data, 17 cluster-mapping mutations appear with a frequency of ≥0.002. If these mutations were pathogenic in compound heterozygous form, they would be likely to show up in the patient data. The pathogenicity of G517V has been debated in the literature, but it is currently considered benign (Kasiviswanathan & Copeland, 2011). The details of the patient cases are available on the PolG pathogenicity server (III).

The clustering model does not account for the severity of the amino acid changes at the primary structure level. For example, mutations from leucine to isoleucine or methionine, or from phenylalanine to tyrosine, all have positive BLOSUM matrix substitution scores (Table 2), and they are considered neutral to moderate changes at the primary structure level. Moderate changes are less likely to cause major perturbations in the tertiary structure and biological functionality. On the other hand, dramatic changes outside of the clusters, even though not at locations critical for the biological functionality of the Pol γ enzyme, can be architecturally important and affect the folding of the enzyme so that its intrinsic tertiary conformation is never achieved.
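To make the substitution-score argument concrete, the sketch below checks whether a given change has a positive score. The text does not state which BLOSUM matrix Table 2 is based on; BLOSUM62 is assumed here, and only a four-entry excerpt is hardcoded. The leucine to aspartate entry is added purely as a contrasting, strongly negative example and does not come from the text. The function names are hypothetical.

```python
# Excerpt of BLOSUM62 scores (assumption: the source only says "BLOSUM
# matrix"). Keys are unordered residue pairs because the matrix is symmetric.
BLOSUM62_EXCERPT = {
    frozenset("LI"): 2,   # leucine <-> isoleucine
    frozenset("LM"): 2,   # leucine <-> methionine
    frozenset("FY"): 3,   # phenylalanine <-> tyrosine
    frozenset("LD"): -4,  # leucine <-> aspartate (contrast example)
}


def substitution_score(wild_type, mutant):
    """Look up the score for a substitution in the excerpt above.

    Raises KeyError for pairs that are not in the excerpt.
    """
    return BLOSUM62_EXCERPT[frozenset((wild_type, mutant))]


def is_moderate_change(wild_type, mutant):
    """Treat a positive score as a neutral-to-moderate change."""
    return substitution_score(wild_type, mutant) > 0


if __name__ == "__main__":
    for wt, mut in (("L", "I"), ("L", "M"), ("F", "Y"), ("L", "D")):
        print(wt, "->", mut, substitution_score(wt, mut),
              is_moderate_change(wt, mut))
```

A check like this could be combined with the cluster lookup so that a mutation inside a cluster is weighted by how drastic the change is, which the clustering model by itself does not do.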