Contactos is a program for calculating similarities between docked poses of protein ligands. The similarity matrices produced by Contactos can be used to cluster the docking results using an external clustering tool, such as MCL. Contactos itself does not include clustering.
Contactos was conceived, and its initial version written, by Mikko Huhtala. Santeri Puranen added an efficient grid search algorithm and other optimizations.
Contactos has been used successfully in combination with MCL to cluster docking results from Schrödinger InducedFit and Gold runs.
Contactos is written in Python, so a Python runtime environment is required. Most Linux distributions provide this. Contactos has been developed and tested on Python 2.5 running on Linux, but it may work on other operating systems and versions of Python.
MCL is not required to run Contactos, but it is likely the easiest clustering program to use in conjunction with Contactos. MCL was written by Stijn van Dongen and is distributed under the GPL. It can be downloaded from micans.org/mcl.
Contactos is free software, licensed under the GNU General Public License, version 3 (GPLv3).
contactos-1.1.tgz: the archive contains the Python source file contactos.py, a change log (version history) and a copy of this web page.
Contactos generates a descriptor of the protein – ligand contacts for each docked pose. The program goes through every ligand atom – protein atom pair and defines the two atoms as being in 'contact' if they are within a cut-off distance of each other; the default cut-off is 3.0 Å. Each pair contributes one bit to the descriptor: 1 for a contact and 0 for an atom pair without contact. The completed descriptor is therefore a bit vector describing which atom pairs are in contact. Because each bit position is tied to a specific pair of atoms, the algorithm requires all ligand poses to have the same atoms listed in the same order in the input coordinate files, i.e. it can only be used when the input consists of the various docked poses of one molecule.
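The pairwise contact test can be sketched in a few lines of Python 3. This is a brute-force illustration, not the code in contactos.py (which uses a grid search). The function name contact_bits, the representation of coordinates as (x, y, z) tuples, the bit order (ligand atom outer, protein atom inner) and counting a pair at exactly the cut-off as a contact are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_bits(ligand: Sequence[Coord], protein: Sequence[Coord],
                 cutoff: float = 3.0) -> List[int]:
    """Hypothetical helper: one bit per (ligand atom, protein atom) pair."""
    cutoff_sq = cutoff * cutoff  # compare squared distances, avoiding sqrt
    bits = []
    for lx, ly, lz in ligand:          # assumed order: ligand atoms outer...
        for px, py, pz in protein:     # ...protein atoms inner
            d_sq = (lx - px) ** 2 + (ly - py) ** 2 + (lz - pz) ** 2
            bits.append(1 if d_sq <= cutoff_sq else 0)  # 1 = contact
    return bits
```

For example, with protein atoms at (0, 0, 0) and (10, 0, 0) and ligand atoms at (1, 0, 0) and (5, 0, 0), contact_bits returns [1, 0, 0, 0]: only the first ligand atom lies within 3.0 Å of the first protein atom. The double loop performs one distance check per atom pair, which is the cost a grid search avoids for large receptors.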
The similarity between two descriptors is calculated as c / ( a + b + c ), where c is the number of bits that are on in both descriptors, a is the number of bits that are on in the first but not the second descriptor, and b is the number of bits that are on in the second but not the first. In other words, the similarity index is the number of shared atom – atom contacts divided by the number of contacts present in either descriptor; for bit vectors this is the Tanimoto coefficient. The similarity index of a descriptor with itself is always 1.0.
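A minimal Python 3 sketch of the c / ( a + b + c ) formula, assuming both descriptors are lists of 0/1 values of equal length. The name contact_similarity is hypothetical, and returning 1.0 when neither descriptor contains any contact (where the formula would divide by zero) is an assumed convention, not documented Contactos behaviour.

```python
from typing import Sequence


def contact_similarity(d1: Sequence[int], d2: Sequence[int]) -> float:
    """Tanimoto similarity c / (a + b + c) of two equal-length bit vectors."""
    if len(d1) != len(d2):
        raise ValueError("descriptors must have the same length")
    a = b = c = 0
    for x, y in zip(d1, d2):
        if x and y:
            c += 1      # contact in both poses
        elif x:
            a += 1      # contact only in the first pose
        elif y:
            b += 1      # contact only in the second pose
    if a + b + c == 0:
        return 1.0      # assumed convention: two contact-free poses count as identical
    return c / (a + b + c)
```

For [1, 1, 0, 0] and [1, 0, 1, 0], c = 1, a = 1 and b = 1, so the similarity is 1/3. Pairs that are out of contact in both poses do not enter the formula at all, which is why two poses are not judged similar merely because both miss most of the receptor.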
Once the descriptors are generated for each docked pose in the input, Contactos calculates an all-against-all matrix of similarities and writes it out in different formats.
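The all-against-all step can be sketched in Python 3 as below. Because the similarity is symmetric, only the upper triangle is computed and then mirrored. The output formats Contactos actually writes are not described here; purely as an illustration, the sketch also emits MCL's label input format (read by mcl with its --abc option): one tab-separated line of label, label and weight per pair. The helper names and pose labels are hypothetical.

```python
from typing import List, Sequence


def tanimoto(d1: Sequence[int], d2: Sequence[int]) -> float:
    """c / (a + b + c) for two equal-length bit vectors."""
    both = sum(1 for x, y in zip(d1, d2) if x and y)     # c
    either = sum(1 for x, y in zip(d1, d2) if x or y)    # a + b + c
    return both / either if either else 1.0  # assumed convention for empty descriptors


def similarity_matrix(descriptors: Sequence[Sequence[int]]) -> List[List[float]]:
    """All-against-all matrix; each unordered pair is computed once and mirrored."""
    n = len(descriptors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = tanimoto(descriptors[i], descriptors[i])
        for j in range(i + 1, n):
            s = tanimoto(descriptors[i], descriptors[j])
            matrix[i][j] = matrix[j][i] = s
    return matrix


def to_abc_lines(names: Sequence[str], matrix: Sequence[Sequence[float]]) -> List[str]:
    """Hypothetical export: 'label<TAB>label<TAB>weight' for each nonzero pair."""
    lines = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):   # skip self-pairs and duplicates
            if matrix[i][j] > 0.0:           # omit pairs with no shared contacts
                lines.append(f"{names[i]}\t{names[j]}\t{matrix[i][j]:.4f}")
    return lines


poses = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
m = similarity_matrix(poses)
print("\n".join(to_abc_lines(["pose1", "pose2", "pose3"], m)))
# prints one line: pose1<TAB>pose2<TAB>0.5000
```

Dropping zero-similarity pairs keeps the exported graph sparse, which suits a graph clustering tool such as MCL.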
Contactos implements a second algorithm for inputs that include different ligand structures (option -t or --types). The alternative algorithm works as follows: for each protein atom, find the ligand atoms within the cut-off distance and record their atom types. The complete descriptor is a list containing, for each protein atom, the set of ligand atom types in contact with it. The descriptor can be thought of as the footprint of the ligand on the receptor. This descriptor is necessarily much less specific than the default one, since it considers only atom types and not the identities of individual atoms within a specific molecule.
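The footprint can be sketched in Python 3 as one set of atom types per protein atom. The function name type_footprint and the Sybyl-style type labels in the example ("C.3", "O.2") are assumptions; this section does not say which atom-type names Contactos reads from its input files.

```python
from typing import FrozenSet, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def type_footprint(ligand_coords: Sequence[Coord], ligand_types: Sequence[str],
                   protein_coords: Sequence[Coord],
                   cutoff: float = 3.0) -> List[FrozenSet[str]]:
    """Hypothetical helper: for each protein atom, the ligand atom types within the cut-off."""
    cutoff_sq = cutoff * cutoff
    footprint = []
    for px, py, pz in protein_coords:
        types = set()
        for (lx, ly, lz), atom_type in zip(ligand_coords, ligand_types):
            if (lx - px) ** 2 + (ly - py) ** 2 + (lz - pz) ** 2 <= cutoff_sq:
                types.add(atom_type)
        footprint.append(frozenset(types))  # empty set: protein atom not in contact
    return footprint


fp = type_footprint([(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (9.0, 0.0, 0.0)],
                    ["C.3", "O.2", "C.3"],
                    [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)])
# fp == [frozenset({"C.3", "O.2"}), frozenset({"C.3"})]
```

The descriptor has one entry per protein atom, so its length depends only on the receptor and not on the ligand's atom count or atom order. That is what allows poses of different molecules to be compared.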
The similarity between two such descriptors is calculated in a manner analogous to the default descriptors, but instead of bits, a score based on the atom types found in both descriptors at corresponding positions is used. 1.0 is added to the score for any type of carbon atom, 0.5 for hydrogen and 3.0 for all other atom types (all non-carbon heavy atoms). The similarity index is the total score of shared atom types divided by the score of the union of the atom type sets in both descriptors. Again, the similarity index of a descriptor compared with itself is always 1.0.
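A Python 3 sketch of the weighted score, assuming footprints are lists of sets of atom-type strings with one entry per protein atom. Deriving the element from an "element.subtype" type name (for example "C.ar" gives "C") is an assumption about the input typing, and both function names are hypothetical.

```python
from typing import FrozenSet, Sequence


def type_weight(atom_type: str) -> float:
    """Carbon 1.0, hydrogen 0.5, any other element 3.0."""
    element = atom_type.split(".")[0]  # assumed 'element.subtype' naming, e.g. 'C.ar'
    if element == "C":
        return 1.0
    if element == "H":
        return 0.5
    return 3.0


def footprint_similarity(fp1: Sequence[FrozenSet[str]],
                         fp2: Sequence[FrozenSet[str]]) -> float:
    """Total weight of shared types divided by total weight of the union, over all protein atoms."""
    if len(fp1) != len(fp2):
        raise ValueError("footprints must cover the same protein atoms")
    shared = 0.0
    union = 0.0
    for s1, s2 in zip(fp1, fp2):
        shared += sum(type_weight(t) for t in s1 & s2)
        union += sum(type_weight(t) for t in s1 | s2)
    return shared / union if union else 1.0  # assumed convention: no contacts at all
```

With the footprints [{C.3, O.2}, {C.3}] and [{C.3}, {C.3, H}], the shared score is 1.0 + 1.0 = 2.0 and the union score is (1.0 + 3.0) + (1.0 + 0.5) = 5.5, giving a similarity of about 0.364. Splitting on the dot rather than testing the first character keeps chlorine ("Cl") from being weighted as carbon.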
The atom type weighting scheme is hard-coded in the Contactos program and cannot be changed at runtime. However, it is easy to change by editing the Python source code.
Contactos does not detect symmetry in the docked molecules. The default descriptor algorithm gives two different descriptors for two poses of a symmetric molecule, even if the poses are related to each other by a symmetry operation and are chemically identical. If the user wishes to have such poses clustered together, the only option is to use the descriptor algorithm based on atom types (option -t or --types) and accept the resulting loss in clustering quality.
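The effect can be demonstrated with a toy Python 3 example: a ligand with two chemically equivalent oxygens, docked once and then with the two oxygens swapped. The coordinates, the "O.co2" type label and the helper names are hypothetical, chosen only to make the symmetry visible; the helpers are simplified brute-force versions of the two descriptors described above.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def within(a: Coord, b: Coord, cutoff: float) -> bool:
    return sum((x - y) ** 2 for x, y in zip(a, b)) <= cutoff ** 2


def contact_bits(ligand: Sequence[Coord], protein: Sequence[Coord], cutoff: float = 3.0) -> List[int]:
    # Default descriptor: one bit per (ligand atom, protein atom) pair.
    return [1 if within(lig, prot, cutoff) else 0 for lig in ligand for prot in protein]


def type_footprint(ligand, types, protein, cutoff=3.0):
    # Type-based descriptor: set of contacting ligand atom types per protein atom.
    return [frozenset(t for lig, t in zip(ligand, types) if within(lig, prot, cutoff))
            for prot in protein]


def tanimoto(d1, d2):
    both = sum(1 for x, y in zip(d1, d2) if x and y)
    either = sum(1 for x, y in zip(d1, d2) if x or y)
    return both / either if either else 1.0


protein = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
types = ["O.co2", "O.co2"]                    # two equivalent carboxylate oxygens
pose_a = [(1.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
pose_b = [(9.0, 0.0, 0.0), (1.0, 0.0, 0.0)]   # same pose, oxygens swapped

bits_a = contact_bits(pose_a, protein)
bits_b = contact_bits(pose_b, protein)
print(bits_a, bits_b, tanimoto(bits_a, bits_b))  # [1, 0, 0, 1] [0, 1, 1, 0] 0.0
print(type_footprint(pose_a, types, protein) == type_footprint(pose_b, types, protein))  # True
```

The bit descriptors share no contacts, so the default similarity is 0.0, while the type footprints are identical, so any weighting of shared types gives 1.0.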