steeveslab-blog

Molecules in the IPython notebook with RDKit

One of my primary motivations for setting up this blog was to have a centralized place for sharing how I am using the combination of the RDKit and the IPython notebook in teaching and research. There are many great resources for learning how to use the RDKit Python bindings, including the indispensable “Getting Started with the RDKit in Python”, Greg Landrum’s RDKit blog, and a particularly nice example from the ChEMBL-og.

Our first step is to import the module that we will be using:

In [1]:
from rdkit import Chem

Special imports for drawing molecules:

In [2]:
from rdkit.Chem.Draw import IPythonConsole #Needed to show molecules
from rdkit.Chem.Draw.MolDrawing import MolDrawing, DrawingOptions #Only needed if modifying defaults

Modify default molecule drawing settings:

In [3]:
DrawingOptions.bondLineWidth=1.8

Create a molecule object from a SMILES string:

In [4]:
ibu = Chem.MolFromSmiles('CC(C)Cc1ccc(cc1)C(C)C(=O)O')
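
One thing worth knowing about this step: when a SMILES string cannot be parsed, MolFromSmiles returns None rather than raising an exception. The sketch below wraps the call in a small check. The helper name mol_from_smiles_checked is my own invention for illustration and is not part of the RDKit.

```python
from rdkit import Chem


def mol_from_smiles_checked(smiles):
    """Parse a SMILES string, raising ValueError if the RDKit rejects it.

    Hypothetical convenience wrapper; not an RDKit function.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit signals a parse failure by returning None
        raise ValueError("could not parse SMILES: %r" % smiles)
    return mol


ibu_checked = mol_from_smiles_checked('CC(C)Cc1ccc(cc1)C(C)C(=O)O')

# An unclosed ring (missing the closing "1") is rejected.
try:
    mol_from_smiles_checked('c1ccccc')
    parse_failed = False
except ValueError:
    parse_failed = True
```

Failing loudly at this point is a design choice: a None that slips through tends to surface later as a confusing AttributeError.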

Display the molecule in the notebook:

In [5]:
ibu
Out[5]:

The molecule object can now be queried to get various properties:

In [6]:
ibu.GetNumAtoms()
Out[6]:
15

Anything seem wrong here? Let's print out a text representation of the molecule and take a closer look:

In [7]:
print(Chem.MolToMolBlock(ibu))

     RDKit          

 15 15  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  2  4  1  0
  4  5  1  0
  5  6  2  0
  6  7  1  0
  7  8  2  0
  8  9  1  0
  9 10  2  0
  8 11  1  0
 11 12  1  0
 11 13  1  0
 13 14  2  0
 13 15  1  0
 10  5  1  0
M  END
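
The line ending in V2000 in the output above is the counts line of the mol block: in the V2000 format, its first three characters give the number of atoms and the next three give the number of bonds. As a minimal sketch, here is a plain-Python reader for that line. The function name parse_counts_line is hypothetical, and for real work the RDKit's own parsers are the better choice.

```python
def parse_counts_line(line):
    """Return (n_atoms, n_bonds) from a V2000 counts line.

    Illustrative helper only; it reads the two fixed-width,
    three-character fields at the start of the line.
    """
    if "V2000" not in line:
        raise ValueError("not a V2000 counts line")
    n_atoms = int(line[0:3])
    n_bonds = int(line[3:6])
    return n_atoms, n_bonds


counts = parse_counts_line(" 15 15  0  0  0  0  0  0  0  0999 V2000")
print(counts)
```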


We see that the hydrogen atoms are implicit in this representation of the molecule. They were not included in the SMILES string and are still not present in the connection table.
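
The implicit hydrogens can still be counted without adding them to the molecule. The sketch below rebuilds the same ibuprofen molecule (under the variable name mol, so that it runs on its own) and compares the default atom count with the count that includes implicit hydrogens.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CC(C)Cc1ccc(cc1)C(C)C(=O)O')

# Default behaviour: only atoms that are explicit in the graph are counted.
heavy = mol.GetNumAtoms()

# Ask the RDKit to include the implicit hydrogens in the count as well.
total = mol.GetNumAtoms(onlyExplicit=False)

# Sum the hydrogens attached to each heavy atom.
implicit_h = sum(atom.GetTotalNumHs() for atom in mol.GetAtoms())

print("%d heavy atoms + %d hydrogens = %d atoms" % (heavy, implicit_h, total))
```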

In [8]:
DrawingOptions.includeAtomNumbers=True
In [9]:
ibu
Out[9]:

If we are interested in doing anything that requires atomic coordinates, we probably want an all-atom representation of the molecule. We can make the hydrogen atoms explicit by adding them, which creates a new molecule and leaves the original unchanged. AddHs is also available in the base Chem module, but AllChem re-exports it along with expanded functionality (such as coordinate generation), so we import AllChem here.

In [10]:
from rdkit.Chem import AllChem
In [11]:
ibuH = AllChem.AddHs(ibu)
In [12]:
DrawingOptions.includeAtomNumbers=False
ibuH
Out[12]:
In [13]:
print(Chem.MolToMolBlock(ibuH))

     RDKit          

 33 33  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  2  4  1  0
  4  5  1  0
  5  6  2  0
  6  7  1  0
  7  8  2  0
  8  9  1  0
  9 10  2  0
  8 11  1  0
 11 12  1  0
 11 13  1  0
 13 14  2  0
 13 15  1  0
 10  5  1  0
  1 16  1  0
  1 17  1  0
  1 18  1  0
  2 19  1  0
  3 20  1  0
  3 21  1  0
  3 22  1  0
  4 23  1  0
  4 24  1  0
  6 25  1  0
  7 26  1  0
  9 27  1  0
 10 28  1  0
 11 29  1  0
 12 30  1  0
 12 31  1  0
 12 32  1  0
 15 33  1  0
M  END


In [14]:
ibuH.GetNumAtoms()
Out[14]:
33
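
Because AddHs returns a new molecule, the original is left untouched, and RemoveHs takes us back the other way. A small self-contained sketch of the round trip, using fresh variable names (mol, mol_h, mol_back) so it does not depend on the cells above:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CC(C)Cc1ccc(cc1)C(C)C(=O)O')

mol_h = Chem.AddHs(mol)         # new molecule with explicit hydrogens
mol_back = Chem.RemoveHs(mol_h)  # new molecule with them made implicit again

print("%d -> %d -> %d atoms" % (mol.GetNumAtoms(),
                                mol_h.GetNumAtoms(),
                                mol_back.GetNumAtoms()))
```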

Our all-atom representation is now suitable for many purposes, such as calculating properties that depend only upon atom types and connectivity. One of these is the topological polar surface area (TPSA):

In [15]:
Chem.rdMolDescriptors.CalcTPSA(ibuH)
Out[15]:
37.3
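
Since TPSA depends only on atom types and connectivity, it should not matter whether the hydrogens are implicit or explicit. The sketch below checks that for ibuprofen, importing rdMolDescriptors directly so that it runs on its own.

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

mol = Chem.MolFromSmiles('CC(C)Cc1ccc(cc1)C(C)C(=O)O')
mol_h = Chem.AddHs(mol)

tpsa_implicit = rdMolDescriptors.CalcTPSA(mol)    # hydrogens implicit
tpsa_explicit = rdMolDescriptors.CalcTPSA(mol_h)  # hydrogens explicit

print("implicit H: %.2f, explicit H: %.2f" % (tpsa_implicit, tpsa_explicit))
```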

For other properties, we will need more information about our molecule.

We can save the molecule for later use with Python's pickle serialization format:

In [16]:
import pickle  # on Python 2, "import cPickle as pickle" is the faster equivalent
In [17]:
with open('ibuH.pkl', 'wb') as f:  # the with block closes the file for us
    pickle.dump(ibuH, f)
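
To convince ourselves that the pickle preserves the molecule, here is a minimal round-trip sketch. It pickles to an in-memory byte string with pickle.dumps instead of a file, which is my own simplification so that the example leaves nothing on disk.

```python
import pickle

from rdkit import Chem

mol_h = Chem.AddHs(Chem.MolFromSmiles('CC(C)Cc1ccc(cc1)C(C)C(=O)O'))

# Serialize to bytes and read the molecule straight back.
restored = pickle.loads(pickle.dumps(mol_h))

print("%d atoms before, %d atoms after" % (mol_h.GetNumAtoms(),
                                           restored.GetNumAtoms()))
```

Reading from a file works the same way, with pickle.load on a file opened in 'rb' mode.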