ShaEP performs rigid-body superimposition of 3D molecular models. The user can specify several query structures on which to superimpose. Each input structure is overlaid on each query structure in turn, and the superimposition with the highest similarity index is reported. ShaEP treats successive input structures with the same name as conformations of the same molecule, and reports the highest similarity index over the set of conformers. A numbering suffix consisting of '_' or '-' followed by a number is allowed, e.g., 'mol', 'mol-1', and 'mol_2' are all understood as incarnations of 'mol'.
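The naming rule above can be illustrated with a short Python sketch. It is not ShaEP's own code: the regular expression is one assumed reading of "a numbering suffix '_' or '-' followed by a number", and the function names and scores are hypothetical.

```python
import re
from itertools import groupby

# Assumed reading of the rule: one trailing '_' or '-' followed by digits.
_SUFFIX = re.compile(r"[_-]\d+$")

def base_name(name):
    """Strip a trailing '_<number>' or '-<number>' suffix, if present."""
    return _SUFFIX.sub("", name)

def best_per_molecule(scored):
    """scored: iterable of (name, similarity) pairs in input order.

    Only successive entries sharing a base name are merged, so the same
    name appearing again later starts a new group.
    """
    result = []
    for base, group in groupby(scored, key=lambda item: base_name(item[0])):
        result.append((base, max(sim for _, sim in group)))
    return result

# Made-up similarity values, for illustration only.
scores = [("mol", 0.61), ("mol-1", 0.74), ("mol_2", 0.58),
          ("other", 0.40), ("mol_3", 0.90)]
print(best_per_molecule(scores))
```

Note that in this sketch 'mol_3' is not merged with the first three entries because 'other' comes in between; conformers therefore need to be stored consecutively in the input.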
The output file is a regular ASCII text file that can be imported into a spreadsheet program. In addition, at the end of a run a hitlist file is produced that contains the names of at most 'maxhits' compounds in order of increasing similarity index (i.e., the best are last). Optionally, a minimum similarity index ('minHitSimilarity') can be set that a molecule must reach to be included in the hitlist. The superimposed structures are written to an SDF file if the 'structures' option is given. The default is to omit the structures.
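The hitlist logic described above can be sketched in Python as follows. This is an illustration only, not ShaEP's implementation: the function name is hypothetical, and treating the minimum similarity as an inclusive threshold is an assumption. A 'maxhits' of zero is taken to mean "no limit", as stated in the FAQ.

```python
import heapq

def build_hitlist(scored, maxhits, min_hit_similarity=None):
    """Return up to maxhits (name, similarity) pairs, best last.

    maxhits == 0 means no size limit. The threshold is applied
    inclusively, which is an assumption of this sketch.
    """
    if min_hit_similarity is not None:
        scored = [(n, s) for n, s in scored if s >= min_hit_similarity]
    if maxhits == 0:
        return sorted(scored, key=lambda item: item[1])
    # nlargest returns the best first; reverse so that the best is last.
    top = heapq.nlargest(maxhits, scored, key=lambda item: item[1])
    return top[::-1]

# Made-up similarity values, for illustration only.
data = [("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 0.7)]
print(build_hitlist(data, 2))
print(build_hitlist(data, 0, min_hit_similarity=0.5))
```

With a non-zero limit only the best 'maxhits' entries need to be kept, which is why a limit bounds the memory needed for the hitlist in a large screen.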
The superimposition is achieved by first finding similarities in the molecular electrostatic potential (MEP) field, which results in a few candidate overlays. Starting from these initial superimpositions, the volume overlap of the structures and the overlap of the MEPs are maximized. The overlap is calculated using a set of spherical Gaussian functions. The optimization algorithm is adapted from the TNPACK code of Schlick and co-workers (Schlick T and Fogelson A (1992) ACM Trans. Math. Softw. 18, 46-70 & 71-111; Xie D and Schlick T (1999) SIAM J. Opt. 10, 132-154; Xie D and Schlick T (1999) ACM Trans. Math. Softw. 25, 108-122).
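To make the Gaussian overlap idea concrete, here is a minimal Python sketch of the analytic overlap integral of two spherical Gaussians and a simple pairwise sum over two sets of atoms. It is a generic textbook formulation, not ShaEP's actual scoring function; the function names, the amplitude p and the width parameter alpha = 0.8 are arbitrary assumptions for illustration.

```python
import math

def gaussian_pair_overlap(center_a, center_b, alpha_a, alpha_b, p_a=1.0, p_b=1.0):
    """Overlap integral of two spherical Gaussians p * exp(-alpha * |r - center|^2)."""
    d2 = sum((x - y) ** 2 for x, y in zip(center_a, center_b))
    s = alpha_a + alpha_b
    return p_a * p_b * math.exp(-alpha_a * alpha_b * d2 / s) * (math.pi / s) ** 1.5

def total_overlap(mol_a, mol_b):
    """Sum of pairwise overlaps; each molecule is a list of (center, alpha) tuples."""
    return sum(gaussian_pair_overlap(ca, cb, aa, ab)
               for ca, aa in mol_a for cb, ab in mol_b)

# Two hypothetical diatomic "molecules"; alpha = 0.8 is an arbitrary width.
mol_a = [((0.0, 0.0, 0.0), 0.8), ((1.5, 0.0, 0.0), 0.8)]
mol_b = [((0.2, 0.0, 0.0), 0.8), ((1.7, 0.0, 0.0), 0.8)]
print(total_overlap(mol_a, mol_b))
```

Because the pairwise overlap is a smooth function of the interatomic distances, it has well-defined derivatives with respect to a rigid-body rotation and translation, which is what makes it suitable for a gradient-based optimizer.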
ShaEP lists available options when called with '-h' or '--help'.
Superimpose A on B
Here B is the reference (query) molecule, which does not move. Partial atomic charges should be present in the file. The resulting structures will be written to 'out.sdf' (in SD format) and the similarity index to 'similarity.txt'.
shaep -q B.mol2 A.mol2 -s out.sdf similarity.txt
Now the same, but using only volume overlap optimization, which does not require the partial atomic charges:
shaep --onlyshape -q B.mol2 A.mol2 -s out.sdf shapesimilarity.txt
Screen a virtual library against A
'A.mol2' can contain several structures, e.g., conformers of a known active molecule. Collect a hitlist of the 300 structures with the highest similarity index. Superimposed structures are not written.
shaep --maxhits 300 -q A.mol2 --output-file similarity.txt libfile1.mol2 libfile2.mol2 ...
Now the same as above, but using only shape-density overlap, which is more approximate but faster:
shaep -q A.mol2 --onlyshape lib1.mol2 lib2.mol2 ... similarity.txt
Frequently asked questions
How to install ShaEP?
The ShaEP program binary is delivered in a compressed archive. Uncompress the archive to some suitable directory and you're done.
The average similarity is 'nan' for some compound in the output file. Is this a bug?
No, it's not a bug. One or more of the input structures have overlapping atoms, which makes an interatomic distance zero. This leads to a division by zero in the computation of the similarity value, which results in 'nan' (not-a-number). Negative similarity values are also sometimes observed because of zero interatomic distances.
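One way to find such structures before a run is to check the coordinates for coincident atoms. The following Python sketch is a hypothetical helper, not part of ShaEP; the tolerance of 1e-6 is an arbitrary assumption, and reading the coordinates from a structure file is left out.

```python
import math
from itertools import combinations

def coincident_atom_pairs(coords, tol=1e-6):
    """Return index pairs (i, j) of atoms whose distance is at most tol."""
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(coords), 2)
            if math.dist(a, b) <= tol]

# Made-up coordinates: atoms 0 and 2 sit at the same position.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(coincident_atom_pairs(coords))
```

The check is quadratic in the number of atoms, which is usually acceptable for small molecules. It requires Python 3.8 or later for math.dist.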
ShaEP crashes while screening a large dataset with an error message "terminate called after throwing an instance of 'std::bad_alloc'". Is this a bug?
The process ran out of memory, most probably because the hitlist had no size limit. Please set the option 'maxhits' to a value other than zero (zero means no limit).