This is the first of a series of posts based on a project that I devised and completed for a computational drug discovery course at the Johns Hopkins University. The original submission was written in the style of an academic paper; the content has been modified, abridged, and rearranged to be more approachable as blog entries. There are also some things that, in retrospect, I would have done differently if I were to revisit the project; the text is modified to note these new considerations.
The original title of the project was:
Assessment of the Utility of MQN Property Spaces for Drug Discovery
The exploration and characterization of chemical space is an important enabling technology for drug discovery programs. A system of Molecular Quantum Numbers (MQNs) has been proposed in which individual organic molecules are represented by a series of 42 integers, each of which is a count of a structural feature in the molecule. This allows for the representation of compounds as points in a 42-dimensional property space. While the clustering of compounds with similar structural and physicochemical properties has been observed in dimensionally-reduced MQN space, the ability to use this space for tasks of interest to drug discovery programs has not been well-characterized. In this study, the predictive capabilities of the MQN representation system are assessed, as well as the ability of the system to represent true structural and physicochemical diversity in sets of compounds, which is useful for the development of focused libraries for compound or fragment screening.
Introduction and Objectives
Molecular Quantum Numbers (MQNs) provide the ability to represent molecules in a 42-dimensional property space; the 42 MQNs are described in the table below. Within this space, distance metrics can easily be used to quantify the similarity of compounds. Dimensionality reduction methods are commonly employed to permit human visualization and interpretation of high-dimensional data sets; principal component analysis (PCA) is one such method that has been extensively applied to MQN data sets for this purpose [1-4]. These dimensionally-reduced property space maps can exhibit groupings of compounds with similar physicochemical and structural properties, and can sometimes even group compounds with similar bioactivities.
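To make the distance idea concrete, here is a minimal Python sketch. It assumes each molecule has already been reduced to a vector of integer counts. The three-element vectors and the molecule names are hypothetical stand-ins for full 42-value MQN vectors, and city-block (Manhattan) distance is used as one simple choice of metric, not necessarily the one used in any particular study.

```python
from typing import Dict, List, Sequence


def city_block_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences between two equal-length count vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_similarity(
    query: Sequence[int], library: Dict[str, Sequence[int]]
) -> List[str]:
    """Return library names ordered from most to least similar to the query."""
    return sorted(
        library, key=lambda name: city_block_distance(query, library[name])
    )


# Hypothetical toy vectors; real MQN vectors would have 42 entries each.
toy_library = {
    "mol_a": [6, 1, 0],
    "mol_b": [6, 0, 1],
    "mol_c": [12, 2, 2],
}
query = [6, 1, 1]
ranking = rank_by_similarity(query, toy_library)
print(ranking)
```

Because every descriptor is an integer count, the distance is also an integer, and a smaller value means the two molecules have more similar counts of each feature.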
As these properties generally reflect molecular composition without consideration of topology or geometry, they fall into the category of constitutional descriptors in quantitative structure-activity relationship (QSAR) analysis. Notably missing from the MQN system are topological, steric, electrostatic, and other such useful and relevant properties of compounds; these properties include molecular volumes, topological polar surface areas, molar refractivities, partition coefficients, and many others. Such properties are omitted partly because they may not be easily quantizable, but more importantly because a single set of MQN values can represent multiple molecules that differ in these characteristics; molecules with the same set of MQNs are referred to as MQN isomers. While this may be viewed as a representational limitation of the MQN system, the authors of the system indicate that MQN isomers generally exhibit similar bioactivities. It remains possible that one MQN isomer could bind effectively to a drug target while another has no such effect, though no such examples were found in a review of the literature.
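The notion of MQN isomers can be illustrated with a short Python sketch: molecules whose descriptor vectors are identical collapse to the same point in property space and cannot be told apart by any distance metric. The function name, molecule names, and two-element vectors below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_isomers(vectors: Dict[str, Sequence[int]]) -> List[List[str]]:
    """Group molecule names whose descriptor vectors are identical."""
    groups: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for name, vec in vectors.items():
        # Tuples are hashable, so an identical vector maps to the same key.
        groups[tuple(vec)].append(name)
    # Keep only vectors shared by more than one molecule.
    return [names for names in groups.values() if len(names) > 1]


# Hypothetical toy data: mol_x and mol_y share every count, mol_z does not.
toy_vectors = {
    "mol_x": [4, 1],
    "mol_y": [4, 1],
    "mol_z": [5, 1],
}
isomer_groups = group_isomers(toy_vectors)
print(isomer_groups)
```

Any property that differs between members of such a group, for example shape or polarity, is by construction invisible to a model that sees only these vectors.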
An example of a molecule with its associated MQNs is presented in the figure below. A major advantage of the MQN system is its simplicity: MQN properties can easily be established manually with basic knowledge of organic chemistry, maps of the chemical space are easier for human observers to grasp, and large chemical spaces can be quickly assessed and classified because the representation is computationally tractable. While islands of molecules with affinities for certain targets have been observed in MQN property space and simple diversity measurements have been described based on distances to the mathematical center of gravity of a set of compounds, the capabilities of the MQN approach for characterization of the chemical space in a manner useful for drug discovery are largely unproven.
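A centroid-based diversity measurement of the kind mentioned above might look like the following Python sketch. It is an illustration under assumptions rather than the published method: it uses Euclidean distance, averages the distances into a single score, and runs on hypothetical two-element vectors instead of full MQN vectors.

```python
import math
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean (center of gravity) of a set of vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def mean_distance_to_centroid(vectors: Sequence[Sequence[float]]) -> float:
    """Average Euclidean distance from each vector to the set's centroid."""
    center = centroid(vectors)
    return sum(math.dist(vec, center) for vec in vectors) / len(vectors)


# Hypothetical toy sets: one tightly clustered, one more spread out.
tight_set = [[1, 1], [1, 1]]
spread_set = [[0, 0], [2, 0]]

print(mean_distance_to_centroid(tight_set))
print(mean_distance_to_centroid(spread_set))
```

A set whose members sit close to their common center scores near zero, while a more spread-out set scores higher, which gives a single number for comparing candidate libraries.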
I undertook a study to establish whether the representational power of the MQN system is sufficient for cheminformatics tasks that are relevant to drug discovery programs. This objective is characterized by three hypotheses:
- Statistically significant differences can be identified between the properties of molecules that show affinity for major target families based solely on the MQN representations of the molecules.
- Effective models can be constructed to predict possible target families for novel small molecules based solely on MQN representations.
- The calculation of diversity metrics from MQN properties provides diversity assessments that are comparable to those calculated from more complex (computationally intensive) sets of properties.
The following three posts will each cover one of the objectives in further detail; conclusions will then be discussed in an additional post.
References
- Nguyen KT, Blum LC, van Deursen R, and Reymond JL. 2009. Classification of organic molecules by molecular quantum numbers. ChemMedChem 4(11): 1803-1805.
- Reymond JL, Blum LC, and van Deursen R. 2011. Exploring the chemical space of known and unknown organic small molecules at http://www.gdb.unibe.ch. Chimia (Aarau) 65(11): 863-867.
- van Deursen R, Blum LC, and Reymond JL. 2011. Visualization of the chemical space of fragments, lead-like and drug-like molecules in PubChem. J Comput Aided Mol Des 25: 649-662.
- Reymond JL, van Deursen R, Blum LC, and Ruddigkeit L. 2010. Chemical space as a source for new drugs. Med Chem Commun 1: 30-38.