2 REVIEW OF THE LITERATURE

2.3 Virtual screening

Virtual screening (VS) is used to find compounds that are bioactive against a target protein from large virtual molecular databases. It is widely used in drug discovery because it can enrich the most potent molecules in the top fraction of the results. Thus, the burden of experimentally testing a vast number of compounds to find the most potent hit compounds can be eased by first filtering molecular databases with VS (Fig. 4) (Gimeno et al. 2019). As is the case with computational prediction of CYP mediated metabolism, the number of VS tools is vast and the methods are diverse, comprising both protein structure-based and ligand-based approaches, many of which overlap. The computational demands of the tools also vary. Accordingly, methods with different computational demands can be streamlined into a filtering workflow, in which the fastest methods are used for the initial filtering of the database and more demanding methods are used to cherry-pick the most potent compounds from the pre-filtered library (Homeyer et al. 2014, Gimeno et al. 2019).

FIGURE 4 General scheme for virtual screening. Adapted from literature (Fig. 1 of Gimeno et al. 2019).
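The filtering workflow described above can be illustrated with a minimal sketch. It assumes that each stage is described by a scoring function and the fraction of compounds it passes on, and that a higher score is better. The function `run_funnel`, the stage names and the toy scores are hypothetical and are not taken from any of the cited tools.

```python
from typing import Callable, List, Tuple

# (stage name, scoring function, fraction of compounds kept); all hypothetical
Stage = Tuple[str, Callable[[str], float], float]


def run_funnel(compounds: List[str], stages: List[Stage]) -> List[str]:
    """Apply the stages in order; each stage keeps its top-scoring fraction."""
    survivors = list(compounds)
    for _name, score, keep_fraction in stages:
        # Assumption: higher score means a more promising compound.
        ranked = sorted(survivors, key=score, reverse=True)
        n_keep = max(1, int(len(ranked) * keep_fraction))
        survivors = ranked[:n_keep]
    return survivors


# Toy scores standing in for a fast method and a slow, more demanding method.
fast_scores = {"c1": 0.9, "c2": 0.1, "c3": 0.7, "c4": 0.4}
slow_scores = {"c1": 0.2, "c2": 0.5, "c3": 0.8, "c4": 0.6}

stages = [
    ("fast filter", fast_scores.get, 0.5),     # cheap method sees every compound
    ("slow rescoring", slow_scores.get, 0.5),  # costly method sees only survivors
]
hits = run_funnel(list(fast_scores), stages)
print(hits)
```

The expensive scoring function is only ever called on the compounds that survived the cheaper stage, which is the point of ordering the methods by computational cost.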

Ligand-based VS methods utilize existing structure-activity data and the physicochemical properties of known ligands to identify the features that are crucial for activity against a protein target. Structure-based methods, as discussed earlier, evaluate the interactions of candidate molecules with the protein.

Ligand-based methods are generally computationally highly efficient, but their ability to find truly novel ligand structures is somewhat limited because they depend on what is already known about existing ligands. Ligand-based methods include approaches such as 2D fingerprint matching (Duan et al. 2010), similarity search (Kumar and Zhang 2018), 2D and 3D pharmacophore models (Qing et al. 2014, Schaller et al. 2020) and 3D quantitative structure-activity relationship (3D-QSAR) methods (Verma et al. 2010). Molecular docking is one of the most widely used methods for structure-based VS (Maia et al. 2020). Other structure-based methods include structure-based pharmacophores (Qing et al. 2014, Schaller et al. 2020), negative image-based (NIB) screening (Lee et al. 2009, Virtanen and Pentikäinen 2010, Niinivehmas et al. 2011, 2015, Lee and Zhang 2012) and NIB rescoring (R-NIB) (Kurkinen et al. 2018, 2019). In lead optimization and the final phases of VS, computationally more demanding methods such as binding free energy calculations can be used to cherry-pick the most potent molecules (Homeyer et al. 2014). The choice of the VS method is case-specific, and a comparison of different approaches may be useful (Warren et al. 2006, McGaughey et al. 2007, Cross et al. 2009, Duan et al. 2010, Homeyer et al. 2014, Niinivehmas et al. 2016).
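As an illustration of the 2D fingerprint matching mentioned above, the sketch below ranks a small library against a query ligand. It assumes that each fingerprint is available as a set of on-bit indices and that the Tanimoto coefficient is used as the similarity measure, which is a common choice but is not prescribed by the cited works. The bit indices and molecule names are invented for the example.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(
    query: Set[int], library: Dict[str, Set[int]]
) -> List[Tuple[str, float]]:
    """Return library molecules sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints of a known ligand and three database molecules.
query_fp = {1, 4, 7, 9}
library_fps = {
    "mol_a": {1, 4, 7, 8},
    "mol_b": {2, 3, 5},
    "mol_c": {1, 4, 7, 9},
}
ranked = rank_by_similarity(query_fp, library_fps)
for name, similarity in ranked:
    print(name, round(similarity, 2))
```

Because the ranking depends only on the query fingerprint, a molecule that shares no bits with any known ligand cannot score well, which reflects the limited ability of ligand-based methods to find truly novel structures.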

The computational validation process of VS methods is generally performed by utilizing libraries of active ligand molecules and decoy compounds (Gimeno et al. 2019). These can be derived from general chemical bioactivity databases such as ChEMBL (Mendez et al. 2019) or from specialized libraries such as the Database of Useful Decoys (DUD) (Huang et al. 2006) and DUD-Enhanced (DUD-E) (Mysinger et al. 2012). Ligands are compounds that have been experimentally confirmed to be highly active towards the target protein (Gimeno et al. 2019). Decoys resemble the ligands by structure and physicochemical properties but have not been determined to possess activity towards the target.

A number of metrics can be used for the validation (Gimeno et al. 2019, Maia et al. 2020). For example, the enrichment factor (EF) compares the proportion of active molecules among the compounds that pass a certain filter, for example a certain percentage of top-scored molecules (EF X%), with the proportion of active molecules in the whole database. In other words, the EF describes how well the method could, in theory, enrich bioactive molecules from a virtual database for experimental testing within a certain percentage of top-scored molecules. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve describes the overall performance of the VS method. The ROC curve visualizes the performance of a VS method by plotting the true positive rate of ligands against the false positive rate of decoy molecules, and the AUC value gives the probability that a randomly chosen ligand is scored better than a randomly chosen decoy compound. During the development of a VS tool, it is also beneficial to evaluate the performance of the method with a diverse set of both protein targets and small molecules (Mysinger et al. 2012). The topology and physicochemical nature of ligands vary from target to target, and thus utilizing a structurally and functionally diverse set of target proteins helps to avoid bias towards a specific category of chemicals. The same applies to the chemical diversity of the ligand and decoy molecules. For example, the DUD-E benchmarking sets have been built for 102 different protein targets, including nuclear receptors, ion channels, kinases, proteases, and a wide array of other enzymes (Mysinger et al. 2012).
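The two metrics can be made concrete with a short sketch. It assumes that a higher score means a better-ranked compound and that actives are labelled 1 and decoys 0. The EF is computed as the fraction of actives in the top-scored subset divided by the fraction of actives in the whole set, and the AUC is computed directly from its pairwise interpretation (ties counted as one half). The function names and the toy scores are illustrative only.

```python
from typing import List


def enrichment_factor(scores: List[float], labels: List[int], top_fraction: float) -> float:
    """EF = (actives in top subset / subset size) / (all actives / all compounds)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(round(len(ranked) * top_fraction)))
    actives_in_top = sum(label for _, label in ranked[:n_top])
    return (actives_in_top / n_top) / (sum(labels) / len(ranked))


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """Probability that a random active outscores a random decoy (ties count 0.5)."""
    actives = [s for s, label in zip(scores, labels) if label == 1]
    decoys = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


# Toy screening result: 3 actives (1) and 7 decoys (0); higher score is better.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

ef_20 = enrichment_factor(scores, labels, top_fraction=0.2)
auc = roc_auc(scores, labels)
print(round(ef_20, 2), round(auc, 2))  # 3.33 and 0.95 for this toy data
```

In the toy data, both compounds in the top 20% are actives, whereas actives make up 30% of the whole set, so EF 20% is 1.0 / 0.3. Random ranking would give an EF of about 1 and an AUC of about 0.5.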

2.3.1 Negative image-based screening with Panther

NIB screening builds on the concept that the target protein’s ligand binding site is represented as a negative image of the cavity, which can then be compared directly with small molecules in terms of similarity (Lee et al. 2009, Virtanen and Pentikäinen 2010, Niinivehmas et al. 2011, 2015, Lee and Zhang 2012). The approach combines the high computational efficiency of ligand-based methods with the structural information of the protein target. NIB methods define the protein binding cavity and its complementarity to a ligand more loosely than traditional docking (Lee and Zhang 2012, Niinivehmas et al. 2015). Instead of an atom-wise evaluation of the ligand-protein interactions, they are based on the global similarity of the electrostatics (Niinivehmas et al. 2015) or chemical features (Lee and Zhang 2012) and the shape of the ligand and the binding cavity. In a comparison between AutoDock molecular docking (Morris and Huey 2009) and BSP-SLIM (Binding Site Prediction with Shape-based LIgand Matching with binding pocket) NIB docking, the difference in the level of strictness was considered to result in better performance of the NIB method on low-resolution protein structures (Lee and Zhang 2012).
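The idea of comparing a negative image with a ligand by global shape similarity, rather than by atom-wise interactions, can be sketched as follows. This is a toy illustration only: it assumes that both the negative image and the ligand are given as lists of 3D points that are already superimposed, places an identical Gaussian on each point, and reports a Tanimoto-like overlap score. It does not reproduce the algorithms of Panther, ShaEP or BSP-SLIM, it ignores electrostatics, and all coordinates are invented.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def gaussian_overlap(a: List[Point], b: List[Point], width: float = 1.0) -> float:
    """Sum of pairwise Gaussian overlaps between two point sets."""
    total = 0.0
    for xa, ya, za in a:
        for xb, yb, zb in b:
            d2 = (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
            total += math.exp(-d2 / (2.0 * width ** 2))
    return total


def shape_similarity(model: List[Point], ligand: List[Point]) -> float:
    """Tanimoto-like shape score: 1.0 for identical shapes, near 0 for no overlap."""
    o_ab = gaussian_overlap(model, ligand)
    o_aa = gaussian_overlap(model, model)
    o_bb = gaussian_overlap(ligand, ligand)
    return o_ab / (o_aa + o_bb - o_ab)


# Hypothetical negative image of a cavity and three pre-aligned ligand poses.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
ligand_fit = list(model)                                   # fills the cavity exactly
ligand_shifted = [(x + 0.5, y, z) for x, y, z in model]    # slightly displaced
ligand_far = [(x, y + 10.0, z) for x, y, z in model]       # outside the cavity

sim_fit = shape_similarity(model, ligand_fit)
sim_shifted = shape_similarity(model, ligand_shifted)
sim_far = shape_similarity(model, ligand_far)
print(sim_fit, sim_shifted, sim_far)
```

A real NIB workflow would also optimize the superposition of each ligand conformer onto the model before scoring. The sketch only shows why a loose, global comparison tolerates small coordinate errors better than a strict atom-wise evaluation: a small displacement lowers the score gradually instead of causing a clash.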

The previously developed NIB VS method, based on atomic Panther NIB models, is a fast structure-based approach for rigid docking and VS (Niinivehmas et al. 2015). In this approach, the 3D NIB model is created using Panther (Niinivehmas et al. 2015), and the 3D shape and electrostatics comparison and scoring are performed with ShaEP (Vainio et al. 2009). The NIB method can be applied to VS as such, with excellent early enrichment performance (Niinivehmas et al. 2015). As with other docking methods, using MD simulations to generate the protein structures for the NIB model creation may be useful (Virtanen and Pentikäinen 2010). Owing to the simple atomic representation of the Panther NIB model, molecular fragments can be incorporated into the model in order to define the desired properties of the screened molecules more precisely (F-NIB) (Jokinen et al. 2019). Alternatively, the NIB methodology can be utilized with great success for rescoring or consensus scoring of flexible molecular docking results (R-NIB) (Kurkinen et al. 2018, 2019). Although the R-NIB approach is slower than the basic Panther NIB VS because it relies on flexible docking, the resulting binding poses may be more realistic, especially for protein targets with a small binding cavity.
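The idea of consensus scoring of docking results with a NIB similarity score can be illustrated with a generic rank-averaging sketch. It assumes that each molecule has one docking score (more negative is better) and one NIB similarity score (higher is better), and it simply averages the two rank positions. This is a common, generic consensus scheme used here for illustration; it is not the published R-NIB protocol, and the molecule names and scores are invented.

```python
from typing import Dict, List


def ranks(scores: Dict[str, float], higher_is_better: bool) -> Dict[str, int]:
    """Map each molecule to its rank position (1 = best) under one score."""
    ordered = sorted(scores, key=lambda name: scores[name], reverse=higher_is_better)
    return {name: position + 1 for position, name in enumerate(ordered)}


def consensus_order(docking: Dict[str, float], nib: Dict[str, float]) -> List[str]:
    """Order molecules by the mean of their docking rank and NIB similarity rank."""
    docking_rank = ranks(docking, higher_is_better=False)  # assumption: lower energy is better
    nib_rank = ranks(nib, higher_is_better=True)           # assumption: higher similarity is better
    mean_rank = {name: (docking_rank[name] + nib_rank[name]) / 2 for name in docking}
    # Ties in mean rank are broken alphabetically to keep the output deterministic.
    return sorted(mean_rank, key=lambda name: (mean_rank[name], name))


# Hypothetical docking energies and NIB similarity scores for four docked molecules.
docking_scores = {"m1": -9.0, "m2": -7.5, "m3": -8.2, "m4": -6.0}
nib_scores = {"m1": 0.50, "m2": 0.45, "m3": 0.70, "m4": 0.65}

final_order = consensus_order(docking_scores, nib_scores)
print(final_order)
```

In this toy case the molecule that docks best ("m1") is overtaken by "m3", which scores reasonably in docking and fits the negative image best. Pure rescoring would instead rank the docked poses by the NIB similarity alone.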

In this thesis, the different steps of Panther NIB VS are discussed, and a detailed workflow is provided together with an updated release of the Panther software, to be utilized both by experts in the field and by users new to structure-based VS. The demonstrative workflow was performed with the cyclooxygenase-2 (COX-2) benchmarking set from the DUD and DUD-E databases (Huang et al. 2006, Mysinger et al. 2012). COX-2 is an established drug target for the treatment of inflammation and pain, and the marketed drugs include many non-steroidal anti-inflammatory drugs such as aspirin, ibuprofen and naproxen (Pasero and McCaffery 2001, Kasturi et al. 2019).

In this study, protein structure-based methods were used for computational analysis and prediction of CYP mediated metabolism and VS. Molecular modelling, docking and MD simulations were used to evaluate the mechanisms of isoform-selectivity of CYP ligands, ligand binding modes and substrate SOMs.

The aims were to develop an MD-based protocol for CYP ligand binding mode and substrate SOM prediction, to develop novel profluorescent tool molecules with good CYP isoform-selectivity, and to predict and evaluate the isoform-selectivity of the reactions. In addition, Panther NIB docking, VS and rescoring were explored, and a detailed description and workflow for the methodology have been provided. The aim was to provide a discussion and tutorial of the method to be utilized both by experts and by users new to structure-based NIB VS.
