Contactos

Last updated on 2008-01-10 by mhuhtala abo fi.

Contactos is a program for calculating similarities between docked poses of protein ligands. The similarity matrices produced by Contactos can be used to cluster the docking results using an external clustering tool, such as MCL. Contactos itself does not include clustering.

Contactos was conceived and the initial version written by Mikko Huhtala. An efficient grid search algorithm and other optimizations were added by Santeri Puranen.

Contactos has been used successfully in combination with MCL to cluster docking results from Schrödinger InducedFit and Gold runs.

Features

Requirements

Contactos is written in Python, so a Python runtime environment is required. Most Linux distributions provide this. Contactos has been developed and tested on Python 2.5 running on Linux, but it may work on other operating systems and versions of Python.

MCL is not required to run Contactos, but it is likely the easiest clustering program to use with it. MCL was written by Stijn van Dongen and is distributed under the GPL. MCL can be downloaded at micans.org/mcl.

License

Contactos is free software, licensed under the GNU General Public License, version 3 (GPLv3).

Download

contactos-1.1.tgz: The archive contains the Python-language file contactos.py, a change log / version history and a copy of this web page.

Usage examples

contactos.py -h
Print the help message that lists all available options.
contactos.py *mol2
Calculate similarities for all *mol2 files in the current directory. By default, Contactos assumes that each file contains one or more receptor conformations, that the ligand residue is named UNK in each, and that the ligand molecule is the same in each docked solution.
mcl contactos_out.mcl_pairs --abc -o mcl_output
Run MCL with the default settings on the Contactos output from the previous command and write the output in mcl_output. MCL must be installed and the executable mcl must be found in the search path for this to work.
contactos.py -t -o this_set -m *mol2
Calculate similarities for all *mol2 files in the current directory. Use the descriptor based on atom types only (-t, allows different ligand molecules, see next section), prefix the output file names with this_set (-o), and run MCL and post-process its output (-m). The executable mcl must be installed and found in the search path, otherwise running MCL will fail.
contactos.py -r receptor7.mol2 -t -c 3.5 docked/*mol2
Read one single receptor conformation from receptor7.mol2 and the docked ligands separately from docked/*mol2. Use the descriptor based on atom types only (-t, allows different ligand molecules, see next section). Set the distance cut-off for contact calculation to 3.5 Å.

Concept

Docked poses of one ligand molecule

Contactos generates a descriptor of the protein-ligand contacts for each docked pose. The program goes through each ligand atom / protein atom pair. If the two atoms are within a cut-off distance of each other, the pair is defined as a 'contact'. The default cut-off is 3.0 Å. For each pair, one bit is added to the descriptor: a value of 1 denotes a contact and a value of 0 an atom pair without contact. The completed descriptor is a vector of bits describing which atom pairs are in contact. This algorithm requires that all ligand poses have the same atoms listed in the same order in the input coordinate files, i.e. it can only be used when the input consists of different docked poses of one molecule.
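The descriptor construction described above can be sketched in a few lines of Python. This is an illustration only, not the code in contactos.py: it uses a brute-force double loop rather than the grid search, it assumes Python 3.8 or later (for math.dist), and the function name contact_bits, the coordinates and the choice of "less than or equal to" for the cut-off test are all assumptions made for the example.

```python
import math

def contact_bits(ligand_xyz, protein_xyz, cutoff=3.0):
    """Return one bit per ligand atom / protein atom pair.

    The bit is 1 if the two atoms lie within `cutoff` of each other
    and 0 otherwise. Pairs are visited ligand atom by ligand atom, so
    the atom order must be identical in every pose being compared.
    """
    bits = []
    for lig in ligand_xyz:
        for prot in protein_xyz:
            bits.append(1 if math.dist(lig, prot) <= cutoff else 0)
    return bits

# Hypothetical coordinates in angstroms: two ligand atoms, three protein atoms.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
protein = [(0.0, 2.5, 0.0), (6.0, 0.0, 0.0), (1.5, 0.0, 2.9)]

print(contact_bits(ligand, protein))  # [1, 0, 0, 1, 0, 1]
```

With two ligand atoms and three protein atoms the descriptor has six bits. Three of the pairs fall within 3.0 Å, so three bits are on.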

The similarity between two descriptors is calculated as c / ( a + b + c ), where c is the number of bits that are on in both descriptors, a is the number of bits that are on in the first but not the second descriptor and b the number of bits that are on in the second but not the first descriptor. In other words, the similarity index is the number of shared atom-atom contacts divided by the number of contacts present in at least one of the two poses, which is the Tanimoto coefficient of the two bit vectors. The similarity index of a descriptor with itself is always 1.0.
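As a worked example of the formula c / ( a + b + c ), the following minimal Python sketch counts a, b and c for two bit descriptors. The function name bit_similarity and the example vectors are made up for illustration, and returning 0.0 when neither descriptor has any contact is an assumption; this page does not say how Contactos handles that case.

```python
def bit_similarity(d1, d2):
    """Similarity c / (a + b + c) of two equal-length bit descriptors."""
    if len(d1) != len(d2):
        raise ValueError("descriptors must have the same length")
    c = sum(1 for x, y in zip(d1, d2) if x and y)      # on in both
    a = sum(1 for x, y in zip(d1, d2) if x and not y)  # on in the first only
    b = sum(1 for x, y in zip(d1, d2) if y and not x)  # on in the second only
    total = a + b + c
    # Assumption: two descriptors without any contacts get similarity 0.0.
    return c / total if total else 0.0

pose_a = [1, 0, 0, 1, 0, 1]
pose_b = [1, 1, 0, 0, 0, 1]

print(bit_similarity(pose_a, pose_b))  # 0.5
print(bit_similarity(pose_a, pose_a))  # 1.0
```

Here c = 2, a = 1 and b = 1, so the similarity is 2 / 4 = 0.5.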

Once the descriptors are generated for each docked pose in the input, Contactos calculates an all-against-all matrix of similarities and writes it out in different formats.
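The all-against-all step can be sketched as follows. The sketch builds a symmetric similarity matrix from a dictionary of bit descriptors and formats each pair as a "label label weight" line, which is the kind of input MCL reads with --abc. The function names, the pose labels and the exact line layout are assumptions for illustration; the files actually written by Contactos may differ in detail.

```python
from itertools import combinations

def tanimoto(d1, d2):
    """Shared 'on' bits divided by bits that are on in either descriptor."""
    shared = sum(1 for x, y in zip(d1, d2) if x and y)
    either = sum(1 for x, y in zip(d1, d2) if x or y)
    return shared / either if either else 0.0

def similarity_matrix(descriptors):
    """Return sorted pose names and the full all-against-all matrix."""
    names = sorted(descriptors)
    matrix = [[tanimoto(descriptors[p], descriptors[q]) for q in names]
              for p in names]
    return names, matrix

def abc_lines(names, matrix):
    """One tab-separated 'label label weight' line per unordered pair."""
    return ["%s\t%s\t%.4f" % (names[i], names[j], matrix[i][j])
            for i, j in combinations(range(len(names)), 2)]

# Hypothetical descriptors for three docked poses of one ligand.
poses = {
    "pose1": [1, 0, 0, 1, 0, 1],
    "pose2": [1, 1, 0, 0, 0, 1],
    "pose3": [0, 0, 1, 0, 0, 0],
}

names, matrix = similarity_matrix(poses)
for line in abc_lines(names, matrix):
    print(line)
```

Because the similarity is symmetric, only one line per unordered pair is needed in the pair list, while the matrix itself stores both halves and a diagonal of 1.0.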

Different ligand molecules

Contactos implements a second algorithm for inputs that include different ligand structures (option -t or --types). The alternative algorithm works as follows: for each protein atom, check whether there are ligand atoms within the cut-off distance. If such atoms are found, record their atom types for that protein atom. The complete descriptor is the list of sets of ligand atom types that are in contact with each protein atom. The descriptor can be thought of as the footprint of the ligand on the receptor. This descriptor is much less accurate than the default one, since it only considers atom types and not actual atom identities within a specific molecule.
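A minimal sketch of the type-based footprint, under the same caveats as any illustration here: it is not taken from contactos.py, it assumes Python 3.8 or later, and the function name type_footprint, the mol2-style type strings and the coordinates are hypothetical.

```python
import math

def type_footprint(ligand_atoms, protein_xyz, cutoff=3.0):
    """For each protein atom, collect the set of ligand atom types in contact.

    `ligand_atoms` is a list of (atom_type, (x, y, z)) tuples. The result
    has one set per protein atom, in protein atom order, so footprints of
    different ligands docked to the same receptor line up position by position.
    """
    footprint = []
    for prot in protein_xyz:
        types = set()
        for atom_type, xyz in ligand_atoms:
            if math.dist(xyz, prot) <= cutoff:
                types.add(atom_type)
        footprint.append(types)
    return footprint

# Hypothetical ligand with mol2-style atom types, coordinates in angstroms.
ligand_atoms = [
    ("C.3", (0.0, 0.0, 0.0)),
    ("O.2", (1.5, 0.0, 0.0)),
    ("H", (0.0, -1.0, 0.0)),
]
protein = [(0.0, 2.5, 0.0), (6.0, 0.0, 0.0), (1.5, 0.0, 2.9)]

print(type_footprint(ligand_atoms, protein))
```

The first protein atom is in contact with the carbon and the oxygen, the second with nothing, and the third with the oxygen only. The ligand atom order no longer matters, which is what allows different molecules to be compared.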

The similarity between two such descriptors is calculated in a manner analogous to the default descriptors, but instead of bits, a score based on the atom types found in both descriptors at corresponding positions is used. 1.0 is added to the score for any type of carbon atom, 0.5 for hydrogen and 3.0 for other atom types (all non-carbon heavy atoms). The similarity index is the total score of shared atom types divided by the score of the union of atom type sets in both descriptors. Again, the similarity index of a descriptor compared with itself is always 1.0.
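The weighted score can be illustrated with a short Python sketch. The weights are the ones given above; everything else is an assumption made for the example, including the function names, the mol2-style type strings, the way the element is read from the type string, and the choice to score each atom type once per protein atom position.

```python
def type_weight(atom_type):
    """Weight of one atom type: carbon 1.0, hydrogen 0.5, anything else 3.0."""
    element = atom_type.split(".")[0]  # "C.ar" -> "C", "Cl" stays "Cl"
    if element == "C":
        return 1.0
    if element == "H":
        return 0.5
    return 3.0

def footprint_similarity(f1, f2):
    """Weighted score of shared types divided by the score of the union."""
    shared = 0.0
    union = 0.0
    for types1, types2 in zip(f1, f2):  # one set per protein atom
        shared += sum(type_weight(t) for t in types1 & types2)
        union += sum(type_weight(t) for t in types1 | types2)
    # Assumption: two empty footprints get similarity 0.0.
    return shared / union if union else 0.0

# Hypothetical footprints of two different ligands on three protein atoms.
ligand_1 = [{"C.3", "O.2"}, set(), {"O.2"}]
ligand_2 = [{"C.3"}, {"H"}, {"O.2", "N.3"}]

print(footprint_similarity(ligand_1, ligand_2))
print(footprint_similarity(ligand_1, ligand_1))  # 1.0
```

In this example the shared score is 1.0 + 3.0 = 4.0 and the union score is 4.0 + 0.5 + 6.0 = 10.5, giving a similarity of about 0.38.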

The atom type weighting scheme is currently hard-coded in the Contactos program and cannot be changed at runtime. However, editing the Python-language source code is easy.

Symmetric molecules

Contactos does not detect symmetry in the docked molecules. The default descriptor algorithm will give two different descriptors for two different poses of a symmetric molecule, even if the poses are related to each other by a symmetry operation and are chemically identical. If the user wishes to have these poses clustered together, the only way to do that is to use the descriptor algorithm based on atom types only (option -t or --types), and suffer the loss in the quality of clustering.
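The effect can be reproduced with a toy example: a hypothetical symmetric ligand with two equivalent oxygen atoms, docked in two poses that differ only by swapping those atoms. The sketch below is self-contained and illustrative; the helper names, coordinates and atom types are invented, Python 3.8 or later is assumed, and the type-based comparison is unweighted because both atoms have the same type.

```python
import math

CUTOFF = 3.0
protein = [(0.0, 2.0, 0.0), (5.0, 2.0, 0.0)]

# Two poses of the same symmetric ligand; atom order is (O1, O2) in both.
pose_a = [("O.co2", (0.0, 0.0, 0.0)), ("O.co2", (5.0, 0.0, 0.0))]
pose_b = [("O.co2", (5.0, 0.0, 0.0)), ("O.co2", (0.0, 0.0, 0.0))]

def bit_descriptor(pose):
    """One bit per ligand atom / protein atom pair (default algorithm)."""
    return [1 if math.dist(xyz, prot) <= CUTOFF else 0
            for _, xyz in pose for prot in protein]

def type_descriptor(pose):
    """One set of contacting ligand atom types per protein atom (-t)."""
    return [{t for t, xyz in pose if math.dist(xyz, prot) <= CUTOFF}
            for prot in protein]

def tanimoto(d1, d2):
    """Shared 'on' bits divided by bits that are on in either descriptor."""
    shared = sum(1 for x, y in zip(d1, d2) if x and y)
    either = sum(1 for x, y in zip(d1, d2) if x or y)
    return shared / either if either else 0.0

print(bit_descriptor(pose_a))  # [1, 0, 0, 1]
print(bit_descriptor(pose_b))  # [0, 1, 1, 0]
print(tanimoto(bit_descriptor(pose_a), bit_descriptor(pose_b)))  # 0.0
print(type_descriptor(pose_a) == type_descriptor(pose_b))        # True
```

The two poses place chemically identical atoms at the same positions, yet the bit descriptors share no contacts, while the type-based descriptors are identical.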