• Ei tuloksia

4.1 Databases (I–V)

Protein 3D crystal structures (I–V) were acquired from the Protein Data Bank (Berman et al. 2000) (www.rcsb.org/pdb) and amino acid sequences (I, III) from the UniProt database (The UniProt Consortium 2019). Ligand and decoy molecules for VS benchmarking (V) were acquired from the DUD (Huang et al.

2006) and DUD-E (Mysinger et al. 2012) libraries.

4.2 Ligand preparation (I–V)

For docking, small-molecule 3D structures, protonization, partial charges and tautomers were generated (I–V). Further low-energy conformer ensembles were generated for the rigid NIB methodology (III, V). These steps were performed using mainly LigPrep and ConfGen (Watts et al. 2010) in the Schrödinger modelling environment (Schrödinger, LLC, New York, NY). For the evaluation of the NIB methodology (V), a set of alternative ligand preparation workflows was applied utilizing Open Babel (O’Boyle et al. 2011), RDKit, Molconvert in Marvin (ChemAxon), CXCALC in Instant JChem (ChemAxon) and MayaChemTools (Sud 2016). The PLANTS docking results for NIB rescoring (V) were taken from a previous study (Kurkinen et al. 2018). For parametrization for MD simulations, ligands were subjected to quantum-mechanical geometry optimization and calculation of electrostatic potential at the HF/6-31+G* level using the polarizable continuum model in Gaussian (Gaussian Inc. Wallingford CT), and the atomic point charges were derived with the RESP method (Bayly et al. 1993) (I, II, IV).

32

4.3 Protein comparison and preparation (I–V)

For purposes of protein sequence comparison and homology modelling, protein sequences were aligned with structure-based matrix using Malign (Johnson and Overington 1993) in the Bodil modelling environment (Lehtonen et al. 2004) (I, III). Protein 3D structure superimpositions were done with Vertaa in Bodil (I–V).

When the protein crystal structure was not available or the structures required refinement such as filling of missing loops homology modelling was utilized using Modeller (Šali and Blundell 1993) (I) or Nest (Petrey et al. 2003) (III). Protein structure protonation was performed using Reduce (Word et al. 1999) (I–V).

4.4 Molecular docking and virtual screening (I–V)

Molecular docking was performed with PLANTS docking algorithm and ChemPLP scoring function (Korb et al. 2007, 2009) (I, II, IV, V). The scoring function, fCHEMPLP, is empirical and uses the following formula:

fCHEMPLP = fplp + fchem-hb + ftors-lig + fclash-lig + 0.3 · fscore-prot − 20.0

where fplp considers the steric and fchem-hb the hydrogen bonding and metal-acceptor interactions between the ligand and the protein. The intramolecular ligand interactions are accounted by torsional potential and clash terms ftors-lig and fclash-lig, and the intramolecular protein interactions by fscore-prot.

Panther (Niinivehmas et al. 2015) and ShaEP (Vainio et al. 2009) were used for NIB docking (III), VS, and rescoring of flexible docking results (Kurkinen et al. 2018, 2019) (V). The scoring function in ShaEP produces a similarity score for the shape and electrostatic similarity of two small molecules or other atomic structures, the latter of which the Panther NIB models represent.

Rocker (Lätti et al. 2016) was used for receiver operating characteristic (ROC) curves and to calculate EF values and area under curve (AUC) metrics for VS benchmarking (V). The EF values were calculated as true positive rates when 1 or 5% (EFd 1% or EFd 5%) of the decoys were found.

4.5 Molecular dynamics simulations (I, II, IV)

4.5.1 Preparation

Tleap of AMBER (Case et al. 2005) was used to prepare ligand-protein structures for MD simulations. The ligand-protein complexes were solvated with rectangular boxes filled with transferable intermolecular potential three-point (TIP3P) waters (Jorgensen et al. 1983). The protein was applied with the ff14SB

force field (Maier et al. 2015), atomic parameters for six-coordinated iron were used for the CYP heme (Giammona et al., Giammona 1984), and general Amber force field (GAFF) parameters (Wang et al. 2004) with the RESP-derived partial charges (Bayly et al. 1993) were used for ligands. Hydrogens were added. When applicable, the systems were neutralized with appropriate Na+ or Cl- counter ions.

4.5.2 Simulation

The MD simulations were perfomed in NAMD (Phillips et al. 2005). The atoms of the simulated molecules move according to the Newtonian equations of motion, where the movement is defined by the mass, position and the total potential energy Upotential of the atom that depends on all atomic positions in the simulated system. The potential energy of an atom is defined by a common potential energy function:

Upotential = Ubond + Uangle + Udihedral + UvdW + UCoulomb

where the Ubond, Uangle and Udihedral account for bonded interactions and UvdW and UCoulomb for non-bonded van der Waals and electrostatic interactions.

The MD simulations were performed in four steps (III), or with an additional minimization step after the production run (I, II). The four-step protocol and setup was based on priorly published MD simulations (Ylilauri and Pentikäinen 2013). The systems were minimized using the default conjugate gradient algorithm, first with the protein alpha carbon atoms restricted (5 kcal/mol) for 15,000 steps and second without restrictions for 15,000 steps. A 180,000 step (I, II) or 1,200,000 step (IV) equilibration MD simulation in NVT ensemble was run with the alpha carbon atoms restricted (5 kcal/mol). Finally, a production simulation in NPT ensemble without restrictions was performed using either 2.4 ns (I, II) or 24.0 ns (IV) time frame. Alternatively, the production run snapshots were further minimized in 15,000 steps without constraints (I, II).

The MD simulations were conducted using periodic boundary conditions and the particle-mesh Ewald method (Darden et al. 1993) for long-range electrostatic interactions. Time step of 2.0 fs was applied as the hydrogens were restrained using the SHAKE algorithm (Ryckaert et al. 1977). Cutoff and switching distances of 12.0 Å and 10.0 Å, respectively, were applied for van der Waals and short-range electrostatic interactions. Non-bonded interactions were calculated every time step and the full electrostatic interactions every third step. The temperature was adjusted to 300 K with Langevin dynamics, using 5 ps-1 damping coefficient for all non-hydrogen atoms. The pressure was adjusted to 1 atm using the Langevin piston Nosé-Hoover method with oscillation time scale of 200 fs and damping scale of 100 fs.

34

4.5.3 Analysis

Binding free energy estimations for the ligand-protein complexes were calculated using the molecular mechanics generalized Born surface area (MMGBSA) (Genheden and Ryde 2015) or Nwat-MMGBSA method (Maffucci and Contini 2013, 2016, Maffucci et al. 2018). The calculations were carried out using MMPBSA.py (Miller et al. 2012) and the available GB models igb1 (Tsui and Case 2000), igb2 (I, II) and igb5 (I, II, IV) (Onufriev et al. 2004). The general formula for calculating the binding energy of ligand-protein complexes with MMGBSA is the following:

ΔGbind = Gcomp – Gprot – Glig

where ΔGbind is the binding free energy, Gcomp, Gprot and Glig are the free energies of the complex, protein and ligand, respectively. The free energies are calculated with the equation:

G = Ebnd + Eel + EvdW + Gpol + Gnp – TS

where the first three terms are molecular mechanics terms of bonded, electrostatic and van der Waals interactions in vacuum. Gpol and Gnp are the polar and non-polar contributions to the solvation free energy. The polar component is calculated according to the GB model. The last terms are the temperature T multiplied by the solute entropy S. The Nwat-MMGBSA approach uses the same framework, but a specified number N of explicit water molecules closest to the ligand molecule in the MD trajectory are considered as part of the protein.

Cpptraj in AMBER (Case et al. 2005) was used for calculating atomic distances, root mean square deviation (RMSD) values (I, II, IV) and the counts of immediate ligand-water interactions (IV). RMSD values of MD snapshots to relevant crystal structures (II) were calculated with the rmsd.py script in the Schrödinger platform (Schrödinger, LLC, New York, NY).

4.6 Visualization and figure preparation (I–V)

Visual analyses were performed in the Bodil modelling environment (Lehtonen et al. 2004). Molecular figures were prepared using MOLSCRIPT (Kraulis 1991), Raster3D (Merritt and Murphy 1994) and VMD (Humphrey et al. 1996). Rocker (Lätti et al. 2016) was used for ROC curves.

5.1 Finding substrate binding mode and site of metabolism (I, II) The prediction of small-molecule binding mode and SOM is a widely considered part of CYP metabolism prediction. Enzyme structure-based methods provide insight into the atomic interactions of ligand-protein complexes. Here, MD simulations and binding free energy calculations using MMGBSA were utilized for binding mode and SOM prediction of seven cases of ligand-CYP enzyme complexes (Table 4). First, on the simpler approach, only the MMGBSA binding energy was used to select the most favoured binding pose. Atomic distances of the ligands to the heme iron were used to predict the SOM. As shortly discussed, the MD/MMGBSA method was also tested on two other protein target cases to evaluate the method on the recognition of crystal structure-like ligand poses and on the correlation of MMGBSA with experimentally determined IC50 values. On the second approach, the binding energy-based binding mode and SOM prediction of CYP substrates were augmented with enhanced stability and atomic distance analysis, and the method was experimented with two CYP substrates.

Four molecular docking poses of each small molecule were subjected to short 2.4 ns MD simulations. Twenty snapshots of each simulation were analyzed as is (II) and/or further minimized (I, II). For each pose, the binding energy was calculated. For the CYP ligands, the atomic distances of the ligands to the catalyzing heme iron were calculated. The stabilities of the poses were analyzed visually (I) or, in addition, using ligand and heme-based RMSDLH and the standard deviation (STD) of the atomic ligand-iron distances (II).

Binding mode prediction of CYP ligands based on MMGBSA values was successful in four out of seven cases (Table 4). The second approach, with the incorporation of the enhanced stability and accessibility analysis, could predict the binding mode and SOM correctly for both applied cases, one of which failed in the binding energy-based prediction (Table 4). Thus, the dynamic stability and accessibility of potential SOMs were crucial to consider in binding mode and SOM prediction in addition to the binding energy. Although the simulations

36

were relatively short, it was found that a single binding mode can be obtained from multiple different docking poses in the MD simulations. Consequently, the correct predictions required also manual analysis. A post-minimization of the simulation snapshots did not enhance the prediction.

TABLE 4 Binding mode and SOM prediction based on MD metrics.

Ligand CYP

isoform Experimentally determined

SOMs or metabolites Predicted SOMs

Top 1 (Top 2) Utilized

1*: Binding mode prediction based on MMGBSA binding energy, SOM prediction based on atomic distances of substrate positions to heme iron. In addition, binding mode stability evaluated visually.

2*: Binding mode and SOM prediction based on MMGBSA binding energy, RMSDLH, atomic distances of substrate positions to heme iron and the atomic distance stability (STD).

5.1.1 Binding mode prediction based on MMGBSA (I, II)

The binding modes of 6MC, 7MC and 7FC in complex with CYP2A6 and 2A5 (I), and coumarin in complex with CYP2A6 (II), were attempted to be predicted based on calculated MMGBSA binding energy from 2.4 ns MD simulations of four initial docking poses of each ligand. The computational prediction was compared to experimental results.

The metabolites of 6MC were determined experimentally only for 7-hydroxylation, and the 7MC metabolites are tentative as no standards were available for the acquired mass spectrometry data. 6MC is converted to fluorescent 7-hydroxy-6-methylcoumarin by both CYP2A6 and CYP2A5 with similar rate to coumarin 7-hydroxylation by CYP2A6. 7MC was found to be a mechanism-based

inactivator for CYP2A6 but not for CYP2A5. CYP2A6 converts 7MC to the tautomer of the molecule and hydroxylates the molecule at positions 3 and/or 4.

CYP2A5 hydroxylates the molecule at 7-methyl and at position(s) 3 and/or 4. 7FC was determined to be a mechanism-based inactivator for both CYP2A6 and 2A5, but possible metabolites were not determined experimentally. (I)

The MMGBSA binding energy-based prediction of a single binding mode of 6MC and 7MC in complex with CYP2A6 and CYP2A5 was in accordance with experimental results (Table 4). Other SOM(s) than the experimentally confirmed ones were suggested by the MD simulations of 6MC and 7MC in complex with CYP2A6. The MMGBSA binding energy of the energetically preferred ligand pose was often within error limits of at least one other pose. In addition, it was common that a ligand could acquire an identical binding pose in the simulations from multiple differing docking poses. The binding mode of 7FC in CYP2A6 and 2A5 could not be predicted as the binding energy values of the poses were very close and the SOM(s) or metabolite(s) of 7FC were not determined experimentally.

5.1.1.1 Successful primary predictions

Selection of a single 6MC binding pose based on MMGBSA binding energy was in accordance with the experimental result of 6MC 7-hydroxylation by both CYP2A6 and 2A5. In the MD simulations of 6MC in complex with both enzymes, the energetically preferred pose had the position 7 and 6-methyl closest to the heme iron (I Fig. 2 B). The poses were also the most stable ones based on visual analysis. In addition, in the CYP2A5 simulations, the position 7 is closest to the heme iron in two of the most preferred binding poses (I Fig. 2 B). Both poses are stable by visual analysis and the binding energies are within error limits of each other.

The simulations of 6MC in complex with CYP2A6 suggested that other than the experimentally determined 7-hydroxylated metabolite could be possible. The MMGBSA values of two poses are within error limits of the most favored pose.

In one of these two poses, the position 5 stays close to the heme iron and is thus available for oxidation. However, the experimental studies tested 6MC only for 7-hydroxylation and thus a metabolite catalyzed at the position 5 has not been experimentally confirmed.

The top binding pose from the 7MC simulations in complex with CYP2A6 and 2A5 agrees with experimentally determined metabolites. In addition, the comparison of the CYP2A6 and 2A5 simulations indicated a difference in the compound’s binding to these enzymes, which could explain the mechanism-based inactivation of CYP2A6 by 7MC. The pose in which the 7-methyl is the closest position to the heme iron is favored energetically and by stability in both CYP2A6 and 2A5. The same pose was acquired in three out of four docking poses in the CYP2A5 simulations, and in all there is space available for oxygen to bind the heme iron (I Fig. 2 D, F). Accordingly, a metabolite hydroxylated at the 7-methyl was indicated by the experimental results. In contrast to the acquired 7MC-CYP2A5 complexes, the 7-methyl group lies at a very close proximity to the CYP2A6 heme iron and leaves no space for oxygen to bind the iron (I Fig. 2 C, F).

38

This could explain the mechanism-based inactivation of CYP2A6 by 7MC. The reaction mechanism is outside the scope of MD simulations, but a hypothesis was be derived based on the coordinates of the favored binding mode. The low distance between the methyl and the heme iron could lead to formation of a 7-methylene-7H- chromen-2-ol tautomer, which the experimental data suggests for CYP2A6 but not for CYP2A5. The formed tautomer could bind tightly with the iron by the tautomer’s pi-electrons or react irreversibly with the binding site environment in an unidentified way.

The other simulated poses of 7MC in complex with CYP2A6 and 2A5 suggested also other metabolites for the compound. In CYP2A6, other SOMs for 7MC could be the positions 4 and 5, but the experimental results suggested a 3- and/or 4-hydroxylated product. In CYP2A5, the single simulation where the 7-methyl was not closest to the heme, positions 3 and 4 lie close to the heme.

Although the pose was energetically clearly worst, it showed great stability and could thus be considered a secondary binding mode of 7MC in CYP2A5. This is in line with the experimental indication of a 3- and/or 4-hydroxylated product of 7MC by CYP2A5.

5.1.1.2 Wrong or ambiguous predictions

The prediction of coumarin binding mode, based on MMGBSA binding energies, did not agree with experimentally determined 7-hydroxylation by CYP2A6. The molecular docking yielded coumarin poses, where the closest sites to the heme iron are (1) 2-carbonyl, (2) position 7, (3) 2-carbonyl and position 3 and (4) positions 5 and 6 (II Fig. 1 B). The calculated MMGBSA values are within error limits of each other in the poses (II Table 1). The pose 4 is the most preferred, followed by pose 2, 3 and 1. Accordingly, the binding energy estimations and the atomic distances suggest that position 5 would be the most preferred SOM, although the correct pose 2 is also within error limits of the top pose. From four differing docking poses, two general coumarin orientations were acquired in the simulations where either the position 7 or 5 is closest to the heme iron (II Table 1). The mechanism-based inactivation of CYP2A6 and 2A5 by 7FC could not be deduced from the MD simulations. Based on the binding energy, the preferred binding mode of 7FC in complex with CYP2A6 and 2A5 is not clear as the MMGBSA values of the poses are close and steadily within error limits of each other. However, the 7-formyl of 7FC orientates to very close proximity of the CYP2A6 heme iron at the end of the simulation of the pose 4 (I Fig. 2 H). This could indicate that the 7-formyl group plays a role in the mechanism-based inactivation, which was backed up by literature of commonly observed mechanism-based inactivation of CYP enzymes by aldehydes.

5.1.2 Enhanced stability and distance analysis (II)

Since the MMGBSA binding energies of different CYP substrate poses were in many of the above cases within error limits of each other, it was reasonable to take a closer look into other variables of the simulated poses. In order to

investigate some of the options to analyze the energetics and stability of substrate poses, the already discussed MD simulations of four initial docking poses of coumarin and 6MC in complex in CYP2A6 were examined. Both coumarin and 6MC are 7-hydroxylated by CYP2A6 with similar rate (I). Other metabolites of 6MC were not determined (I) and the literature does not suggest other metabolites for coumarin.

A set of simulation-derived metrics was suggested for predicting the preferred binding mode and SOM of a CYP substrate. The binding energy comparison of the simulated substrate poses was incorporated with a simple calculation of the binding pose stability and the atomic distances the substrate to the heme iron. Namely, for each simulation, the RMSD of the substrate and the CYP2A6 heme were calculated together (RMSDLH) in order to consider the stability of the substrate pose in relation to the first frame of the simulation. On one hand, the exclusion of the CYP amino acid residues from the RMSD analysis and superimposition removes the effect of the protein macromovement to the substrate superimposition. This allows a more delicate analysis of the substrate stability. On the other hand, when the heme is included, the RMSDLH indicates of the stability and orientation of the simulated pose in relation to the binding site. Atomic distances and stabilities, the latter indicated by the STD of each distance, of the substrate positions to the CYP heme iron were calculated for each simulation. The closer substrate positions to the heme iron are more likely to be oxidized. The closest substrate position should also be well stabilized at the supposed reaction coordinates, which is indicated by the STD of the mean distance. The distance should be low enough for the position to be available for oxidation, but leave space for the reactive oxygen.

In both coumarin and 6MC simulations, the combination of MMGBSA and the above metrics of ligand stability and atomic distances resulted in correct binding mode and SOM prediction. However, especially in the case of coumarin, this required a manual analysis of the ligand trajectories in the individual poses.

Similar or close to identical binding poses were acquired from several simulations although the simulations of both compounds were initiated from four notably differentiating docking poses.

5.1.2.1 Coumarin

Combining the information from MMGBSA binding energy, RMSDLH and the atomic distances together, the coumarin position 7 is preferred at the proximity of the CYP2A6 heme iron, thus in agreement with experimentally determined 7-hydroxylation. The prediction relied partly on manual analysis of the simulations as the orientation of coumarin changed during the individual trajectories. The pose is energetically the second-preferred, but it is the most stable and the position 7 is closer to the heme iron than position 5 in the other binding poses.

Combining the information from MMGBSA binding energy, RMSDLH and the atomic distances together, the coumarin position 7 is preferred at the proximity of the CYP2A6 heme iron, thus in agreement with experimentally determined 7-hydroxylation. The prediction relied partly on manual analysis of the simulations as the orientation of coumarin changed during the individual trajectories. The pose is energetically the second-preferred, but it is the most stable and the position 7 is closer to the heme iron than position 5 in the other binding poses.