Using PlayMolecule® BindScope to participate in the SAMPL7 challenge [TUTORIAL]

Source: Deep Learning on Medium

In this short tutorial we will look at how to use BindScope. As an example, we will try to generate a submission for the first stage of the SAMPL7 challenge, in which participants are asked to discriminate between binders and non-binders in a given list of fragments screened against a protein.

Two cents about the algorithm

The algorithm behind BindScope is a 3D convolutional neural network designed as a binder/non-binder classifier. It has been trained on the DUD-E database (which contains ligands and decoys). Given a ligand docked to a protein, the algorithm extracts three-dimensional structural features in the style of DeepSite and predicts whether the ligand is likely to be a binder. If you want to learn more about how the protein-ligand complex is featurized, please check one of our old tutorials:

As usual, for all the technical aspects please refer to the scientific publication.

The SAMPL7 challenge

The first stage of the SAMPL7 challenge consists of, given a list of 799 fragment SMILES and a protein structure (PHIP in this case), finding which of the fragments are actual binders. The organizers of the challenge have, of course, screened the whole library and know which fragments are hits and which are not. You can find all the available data in the challenge GitHub:

Applicability domain

One problem we must highlight at this point is that BindScope may not perform well here, because the training database (DUD-E) contains few fragment-sized ligands, as shown in the plot below. This suggests that we might be outside the applicability domain of BindScope, since the training dataset differs substantially from the test dataset (the stage-1 SAMPL7 fragments). For the purpose of this tutorial we will check whether it works anyway.

Distribution of heavy atoms per ligand for the DUD-E database (training dataset for BindScope) and the SAMPL7 fragments (our test set). Additionally, we show the distribution for GDD (our GPCR-centric model). The SAMPL7 distribution is clearly shifted towards the left (fewer heavy atoms), which suggests we may be outside the applicability domain.

Preparing the inputs

BindScope expects three-dimensional protein-ligand complexes, so the first step is to generate binding poses for the list of fragment SMILES. For this we are going to use open source software: we will generate starting 3D ligand conformations with RDKit and use SMINA (which is based on AutoDock Vina) to generate docking poses. We will also need Open Babel along the way.

First, we are going to generate a single SDF with a starting conformation for each of the fragments:
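
The step above could be sketched roughly as follows. This is a minimal sketch, not the code used for the tutorial: it assumes the fragment list is available as plain `name,SMILES` lines (a hypothetical layout, the challenge files may differ), and the function names `read_fragments` and `write_starting_conformers` are made up for illustration. It uses RDKit's ETKDG-based embedding followed by a quick MMFF relaxation.

```python
def read_fragments(text):
    """Parse 'name,SMILES' lines (an assumed layout) into (name, smiles) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, smiles = line.split(",", 1)
        pairs.append((name.strip(), smiles.strip()))
    return pairs


def write_starting_conformers(pairs, sdf_path, seed=42):
    """Embed one 3D conformer per fragment and write them all to a single SDF.

    Returns the names of fragments that could not be parsed or embedded.
    """
    # Imported lazily so that read_fragments() works without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    writer = Chem.SDWriter(str(sdf_path))
    failed = []
    for name, smiles in pairs:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:  # RDKit could not parse the SMILES
            failed.append(name)
            continue
        mol = Chem.AddHs(mol)  # explicit hydrogens give better 3D geometry
        if AllChem.EmbedMolecule(mol, randomSeed=seed) != 0:
            failed.append(name)  # embedding failed
            continue
        # Quick force-field relaxation; the return code is ignored in this sketch.
        AllChem.MMFFOptimizeMolecule(mol)
        mol.SetProp("_Name", name)  # becomes the SDF title line
        writer.write(mol)
    writer.close()
    return failed
```

Calling `write_starting_conformers(read_fragments(open("fragments.csv").read()), "fragments.sdf")` would then produce the `fragments.sdf` file used in the docking step; `fragments.csv` is a placeholder name.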

Second, we are going to run SMINA. For this we will have to:

1. Convert the original SAMPL7 protein PDB into the PDBQT format.
obabel PHIPA_C2_Apo.pdb -xr -O PHIPA_C2_Apo.pdbqt

2. Find out where the center of the pocket is. You can use your favorite molecular visualizer, such as VMD or PyMOL. For this challenge we can simply inspect the file called “PHIPA_C2_apo_sites.pdb”, where the binding sites reported by the organizers are listed as the last 4 heteroatoms:

HETATM 1290 HE S1 A1501 -19.150 12.842 24.700 0.44 10.58 HE
HETATM 1291 NE S2 A1601 -22.697 21.498 11.130 0.31 11.52 NE
HETATM 1289 AR S3 A1501 -6.393 4.395 15.539 0.52 26.08 AR
HETATM 1292 KR S4 A1501 -27.336 3.939 22.899 0.49 22.37 KR

We will use only the first one for the purpose of this tutorial.
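
If you prefer not to copy the coordinates by hand, they can be read from the site line programmatically. The sketch below is only an illustration: `site_center` is a hypothetical helper, and it assumes whitespace-separated fields as in the lines shown above, with the last three fields being occupancy, B-factor and element. Strictly speaking, PDB is a fixed-column format, so a dedicated parser is safer for arbitrary files.

```python
def site_center(hetatm_line):
    """Return the (x, y, z) coordinates from a whitespace-separated HETATM line."""
    fields = hetatm_line.split()
    if not fields or fields[0] != "HETATM":
        raise ValueError("not a HETATM record")
    # Counting from the end: element, B-factor, occupancy, then z, y, x.
    x, y, z = (float(value) for value in fields[-6:-3])
    return x, y, z


first_site = "HETATM 1290 HE S1 A1501 -19.150 12.842 24.700 0.44 10.58 HE"
center = site_center(first_site)
```

Here `center` holds the coordinates of site S1, the one used in the rest of the tutorial.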

Finally, with this information we run SMINA. We will ask it to generate 5 binding modes within a search box of 15 Angstrom per side, centered on the selected site:

smina.static --receptor PHIPA_C2_Apo.pdbqt --ligand fragments.sdf --center_x -19.150 --center_y 12.842 --center_z 24.700 --size_x 15 --size_y 15 --size_z 15 --num_modes 5 --out docking.sdf
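
To repeat the docking for another site, it can help to assemble the same command line from the site coordinates. The following is a small sketch under that assumption: `smina_command` is a hypothetical helper, and it only uses the flags that appear in the command above. The resulting list can be passed to `subprocess.run`.

```python
import shlex


def smina_command(receptor, ligands, center, box=15, num_modes=5,
                  out="docking.sdf", exe="smina.static"):
    """Build the SMINA argument list for a cubic search box around `center`."""
    x, y, z = center
    return [
        exe,
        "--receptor", receptor,
        "--ligand", ligands,
        "--center_x", f"{x:.3f}",
        "--center_y", f"{y:.3f}",
        "--center_z", f"{z:.3f}",
        "--size_x", str(box),
        "--size_y", str(box),
        "--size_z", str(box),
        "--num_modes", str(num_modes),
        "--out", out,
    ]


cmd = smina_command("PHIPA_C2_Apo.pdbqt", "fragments.sdf",
                    (-19.150, 12.842, 24.700))
print(shlex.join(cmd))  # shell-ready form of the command
```

Passing the arguments as a list (for example `subprocess.run(cmd, check=True)`) avoids shell quoting issues with negative coordinates.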

The result is a set of 3045 docked poses, which is fewer than the 3995 requested (799 input fragments × 5 binding modes). This is because in some cases 5 modes couldn’t be generated and fewer were provided.
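
A quick way to check these numbers is to count the records in the docking output. The sketch below is an illustration only: `poses_per_ligand` is a hypothetical helper, and it assumes that every record ends with a `$$$$` line and that each pose keeps the name of its input fragment in the SDF title line.

```python
from collections import Counter


def poses_per_ligand(sdf_text):
    """Count SDF records per title line (the first line of each record)."""
    counts = Counter()
    expect_title = True
    for line in sdf_text.rstrip().splitlines():
        if expect_title:
            counts[line.strip()] += 1  # first line of a record is its title
            expect_title = False
        elif line.strip() == "$$$$":
            expect_title = True  # record delimiter: next line starts a new record
    return counts


def requested_poses(n_ligands, num_modes):
    """Upper bound on the number of poses: one full set of modes per ligand."""
    return n_ligands * num_modes
```

With the real output, `sum(poses_per_ligand(open("docking.sdf").read()).values())` gives the total number of poses, and the per-fragment counts show which fragments received fewer than 5 modes. For this tutorial, `requested_poses(799, 5)` is the 3995 mentioned above.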

Running BindScope

With the previous docking SDF and the original PHIPA_C2_Apo.pdb we can now go to the BindScope application present in the PlayMolecule web platform:

The input in this case is quite intuitive: simply upload the PDB in the “Protein PDB” field and the docked ligands in the “Ligands” field. We will leave the remaining options at their defaults and click Submit.