Continued from part 1.
Statistically significant differences can be identified between the properties of molecules that show affinity for major target families based solely on the MQN representations of the molecules.
Because MQN-based assessment of large data sets has already been shown to be practical, analysis for this study was limited to a smaller set. A data set was constructed from small molecules that have shown affinity for one or more targets drawn from several target families. Ligands and decoys were downloaded from the Database of Useful Decoys (DUD); decoys are compounds that resemble known ligands in physical properties such as molecular weight and calculated logP but are topologically dissimilar, and are therefore presumed not to bind the targets. The targets were chosen from those shown in prior literature to cluster in MQN space, and are summarized in this table:
RDKit is an open source cheminformatics toolkit that provides programming interfaces for the in silico modeling and manipulation of molecules and for the calculation of their properties. Because RDKit's support for the .mol2 file format used by ZINC and DUD is limited, the Open Babel toolkit was used to convert the .mol2 files to SMILES (.smi) format. RDKit was then used to calculate the MQNs and other properties of the molecules chosen for analysis.
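A minimal sketch of this step, assuming RDKit is installed and that each .smi line holds a SMILES string followed by an optional name, which is the layout Open Babel writes for a command such as `obabel actives.mol2 -O actives.smi` (file names hypothetical). `rdMolDescriptors.MQNs_` is RDKit's MQN calculator and returns the 42 counts as a list; `Descriptors.MolWt` stands in for the "other properties". The helper names `mqn_records` and `mqn_records_from_file` are illustrative only, not the code used in this study.

```python
# Sketch: compute RDKit's 42 MQN counts (plus molecular weight) per molecule.
# Assumes each .smi line is "SMILES [name]", as written by Open Babel.
from rdkit import Chem
from rdkit.Chem import Descriptors, rdMolDescriptors


def mqn_records(lines):
    """Return a list of (name, mqn_counts, mol_wt) for each parseable SMILES.

    A list (not a dict) is used so records that share a name are all kept.
    """
    records = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # blank line
        smiles = fields[0]
        # Hypothetical fallback name when the record has no title field.
        name = fields[1] if len(fields) > 1 else f"mol_{line_no}"
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            continue  # RDKit could not parse this SMILES; skip it
        mqns = list(rdMolDescriptors.MQNs_(mol))  # 42 integer counts
        records.append((name, mqns, Descriptors.MolWt(mol)))
    return records


def mqn_records_from_file(path):
    """Read a .smi file (e.g. one produced by obabel) and compute its records."""
    with open(path) as handle:
        return mqn_records(handle)
```

Unparseable SMILES are skipped silently here; in a real pipeline it may be worth logging them so that ligand counts per target can be checked.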
The goal of the statistical testing is to determine whether the distributions of the ligands' properties differ significantly between target groups. Such differences provide human-accessible descriptions of the characteristics of compounds that tend to confer affinity for one target family over another.
Because no particular underlying distribution could be assumed for the data, only nonparametric tests were used. The Kruskal-Wallis one-way analysis of variance (ANOVA) on ranks was used to establish whether at least one group differs significantly from the others; it does not, however, identify which group differs or where. When the Kruskal-Wallis test confirmed a statistically significant difference for an MQN property, pairwise comparisons of the target groups were made with the Wilcoxon rank-sum (Mann-Whitney U) test to establish which groups differed. In all cases, a significance level (alpha) of 0.05 was used, so a null hypothesis was rejected when the p-value was below 0.05.
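The two-stage procedure can be sketched with SciPy's `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu`. This assumes the pairwise test is a two-sided Wilcoxon rank-sum (Mann-Whitney U) test, the usual choice for two independent groups of different sizes; `compare_groups` and its input layout are hypothetical. The same 0.05 threshold is applied to every test.

```python
# Sketch: Kruskal-Wallis omnibus test, then pairwise two-sided Wilcoxon
# rank-sum (Mann-Whitney U) tests only when the omnibus test is significant.
from itertools import combinations

from scipy.stats import kruskal, mannwhitneyu

ALPHA = 0.05


def compare_groups(groups, alpha=ALPHA):
    """groups: {target_name: [one property's values for that target's ligands]}.

    Returns (kruskal_p, {(target_a, target_b): pairwise_p}). The pairwise
    dict is empty when the Kruskal-Wallis test is not significant.
    """
    samples = list(groups.values())
    # kruskal() raises ValueError if every value is identical (e.g. a ring
    # size that no ligand contains), so treat that case as "no difference".
    if len({v for s in samples for v in s}) <= 1:
        return 1.0, {}
    kw_p = float(kruskal(*samples).pvalue)
    if kw_p >= alpha:
        return kw_p, {}
    pairwise = {}
    for a, b in combinations(groups, 2):
        if len(set(groups[a]) | set(groups[b])) <= 1:
            pairwise[(a, b)] = 1.0  # both groups constant and equal
            continue
        result = mannwhitneyu(groups[a], groups[b], alternative="two-sided")
        pairwise[(a, b)] = float(result.pvalue)
    return kw_p, pairwise
```

Calling `compare_groups` once per MQN property, with one list of values per target, yields the p-values needed for the pairwise comparisons.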
Ligands for each target were compared to the ligands for every other target, resulting in 6 choose 2 = 15 comparisons. Statistical tests comparing each combination of targets for each property yielded numerous significant differences. These differences are summarized in this figure:
Gray cells indicate that any measured difference between the two ligand groups was not statistically significant for that property. Other colors indicate a statistically significant difference (p < 0.05). Dark blue indicates that the mean value was substantially higher for the first of the two compared groups; dark red indicates that it was substantially higher for the second.
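The 15 target pairs and the cell coloring rule can be expressed as a small encoding step. This is a sketch, assuming each cell stores the signed difference in group means (positive for blue, negative for red, with magnitude driving color intensity) and `None` for gray; the exact color scaling used for the figure is not known. The target list comes from this study, while `heatmap_cell` and `heatmap_row` are hypothetical helpers.

```python
# Sketch: encode heatmap cells, one per (first, second) target pair.
from itertools import combinations
from statistics import mean

TARGETS = ["AmpC", "COX-2", "DHFR", "GART", "GPB", "SAHH"]
PAIRS = list(combinations(TARGETS, 2))  # 6 choose 2 = 15 pairs, fixed order


def heatmap_cell(values_first, values_second, p_value, alpha=0.05):
    """None -> gray (not significant); otherwise the signed mean difference.

    Positive values would be drawn in blue (first group higher) and negative
    values in red (second group higher), darker for larger magnitudes.
    """
    if p_value >= alpha:
        return None
    return mean(values_first) - mean(values_second)


def heatmap_row(groups, pairwise_p, alpha=0.05):
    """One heatmap row for a single property, in PAIRS order.

    groups: {target: [values]}; pairwise_p: {(a, b): p}. Pairs without a
    p-value (e.g. the omnibus test was not significant) are treated as gray.
    """
    return [
        heatmap_cell(groups[a], groups[b], pairwise_p.get((a, b), 1.0), alpha)
        for a, b in PAIRS
    ]
```

Stacking one row per MQN property gives a properties-by-pairs matrix that can be passed to any heatmap plotting routine with a diverging blue-red colormap and a separate gray for `None`.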
There are numerous notable differences that can be identified from the heatmap; when necessary, such differences can be further visualized using histograms or boxplots. Some interesting distinctions between the ligand groups include:
- AmpC and COX-2 ligands tend to include more acyclic tetravalent nodes than ligands for the other targets.
- COX-2 ligands are much more likely to include fluorine atoms, to have fewer hydrogen bond donors, and to include more cyclic divalent nodes than ligands for the other targets.
- DHFR ligands have fewer acyclic oxygen atoms on average in comparison to the other ligand groups; this is further explored with histograms below.
- GART and COX-2 ligands tend to have higher heavy atom counts, which is generally associated with higher molecular weights. This is further detailed in the boxplot below.
- GPB ligands are much more likely to include cyclic oxygen atoms than ligands for the other targets.
- SAHH ligands are less likely to contain rings with 5 members than ligands for the other targets.
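Some of these observations concern how often a feature appears at all ("much more likely to include fluorine atoms"), while others concern the average count ("fewer acyclic oxygen atoms on average"). A small sketch that reports both for one MQN feature; the function name and input layout are assumptions, and any counts passed to it here are made up for illustration.

```python
# Sketch: mean count vs. fraction of ligands containing a feature at all.
from statistics import mean


def feature_summary(groups):
    """groups: {target: [per-ligand counts of one MQN feature]}.

    Returns {target: {"mean": ..., "fraction_present": ...}}, where
    fraction_present is the share of ligands with a count above zero.
    """
    summary = {}
    for target, counts in groups.items():
        present = sum(1 for c in counts if c > 0)
        summary[target] = {
            "mean": mean(counts),
            "fraction_present": present / len(counts),
        }
    return summary
```

The two numbers can disagree: a group may have a high mean driven by a few ligands with many fluorines while most of its ligands contain none.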
Histograms are useful for exploring further the intuitions gained from the analysis of discriminating features. As shown in the figure on the left, the ligands for GART have a dramatic peak at a count of 6 acyclic oxygen atoms, while most DHFR ligands have 2 or fewer. DHFR is the only group in which a large proportion of ligands contain no acyclic oxygens.
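Comparing histograms across groups of different sizes is easier when each group's counts are normalized to fractions. A stdlib-only sketch, with hypothetical helper names, that computes the fraction of ligands at each count and the fraction at or below a threshold, such as "2 or fewer" acyclic oxygens.

```python
# Sketch: normalized count distribution for one MQN feature in one group.
from collections import Counter


def count_distribution(counts):
    """Fraction of ligands at each count value, sorted by count.

    Normalizing by group size lets groups of different sizes share one plot.
    """
    total = len(counts)
    tally = Counter(counts)
    return {value: tally[value] / total for value in sorted(tally)}


def fraction_at_most(counts, threshold):
    """Fraction of ligands whose count is <= threshold (e.g. '2 or fewer')."""
    return sum(1 for c in counts if c <= threshold) / len(counts)
```

The modal count, such as a peak at 6, is the key with the largest fraction, e.g. `max(dist, key=dist.get)`; the dictionaries can be fed directly to a bar plot.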
Acyclic oxygens that are part of amide, ketone, or sulfone moieties are notably prone to acting as hydrogen bond acceptors, whereas acyclic ethers are substantially weaker acceptors than cyclic ethers. The presence of acyclic oxygens may therefore suggest H-bond acceptor activity, but because the MQN system counts every acyclic oxygen alike, it cannot distinguish definitively among these moieties.
The plot on the right clearly shows that GART ligands have the highest mean number of heavy atoms, though COX-2 and DHFR ligands vary widely in size and can rival GART ligands. AmpC, GPB, and SAHH ligands almost always contain fewer heavy atoms than GART ligands.
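The components of such a boxplot can be computed directly. A sketch assuming Tukey's convention, in which whiskers reach the most extreme values within 1.5 times the interquartile range (IQR) of the quartiles, which is also matplotlib's default; quartiles use `statistics.quantiles(..., method="inclusive")`, intended to agree with NumPy's default linear interpolation. `box_stats` is a hypothetical helper.

```python
# Sketch: five-number summary plus Tukey whiskers and outliers for one group.
from statistics import quantiles


def box_stats(values):
    """Boxplot statistics for one group's values (needs at least two values)."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": med,
        "q3": q3,
        "max": data[-1],
        "whisker_low": min(inside),   # most extreme value within the fences
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
    }
```

Comparing `q1`, `median`, and `q3` across groups makes statements such as "almost always contain fewer heavy atoms" easy to check numerically.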
Naturally, the results of this form of analysis depend on the correctness and completeness of the data set. If the analyzed ligands are not sufficiently representative of all known ligands, or if there are sizable numbers of unidentified ligands with different properties, this approach may not give an accurate picture of the archetypal ligand for a particular target. It does, however, provide a human-accessible way to assess whether a novel compound fits the known general criteria for affinity to a target. The availability of such distinguishing characteristics also suggests that building QSAR models to predict biological activity may be fruitful.
- Huang N, Shoichet BK, and Irwin JJ. 2006. Benchmarking sets for molecular docking. J Med Chem 49(23): 6789-6801.
- Nguyen KT, Blum LC, van Deursen R, and Reymond JL. 2009. Classification of organic molecules by molecular quantum numbers. ChemMedChem 4(11): 1803-1805.
- O’Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, and Hutchison GR. 2011. Open Babel: an open chemical toolbox. J Cheminform 3: 33.
- Bissantz C, Kuhn B, and Stahl M. 2010. A medicinal chemist’s guide to molecular interactions. J Med Chem 53(14): 5061-5084.