Diversity Series, Part 2
PART 2: What are the parameters for evaluating diversity?
This is the second in a three-part series discussing chemical diversity. In Part 1, we introduced the concept of diversity in the context of fragment screening libraries. In this part, we discuss common measures of diversity and the pros and cons of each in fragment library design. We view diversity as a measure of both the efficiency of the library (hits obtained per compound screened) and its ability to provide a broad range of starting points for synthesis. A diverse set of starting points mitigates risks that may arise during optimization from scaffold-associated toxicity or pharmacokinetic issues, or from unforeseen patent conflicts.
Chemical property diversity
Chemical property range is a common measure of fragment library diversity and has its historical context in Lipinski’s original Rule of Five and the more recent “Rule of Three” refinement for fragments. This measure describes how a library is distributed across properties such as molecular weight, clogP, and the number of hydrogen bond donors and acceptors.
Pros: Chemical properties are closely tied with predicted success in the clinic.
Cons: Chemical properties are not predictive of biological activity and therefore are not predictive of the hit rate against a range of target classes.
Conclusion: The distribution of chemical properties should be within a range consistent with marketed drugs.
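As a rough illustration of a property-range check, the sketch below applies the commonly quoted Rule of Three thresholds (molecular weight under 300 Da; clogP, hydrogen bond donors and hydrogen bond acceptors each at most 3) to precomputed property values. The fragment names and property values are hypothetical, and in practice the properties would come from a cheminformatics toolkit rather than being typed in by hand.

```python
from dataclasses import dataclass


@dataclass
class FragmentProps:
    """Precomputed properties for one fragment (values here are hypothetical)."""
    name: str
    mol_weight: float  # in Da
    clogp: float
    h_donors: int
    h_acceptors: int


def passes_rule_of_three(f: FragmentProps) -> bool:
    """Check the commonly quoted Rule of Three thresholds for fragments."""
    return (
        f.mol_weight < 300
        and f.clogp <= 3
        and f.h_donors <= 3
        and f.h_acceptors <= 3
    )


# Hypothetical example library, for illustration only.
library = [
    FragmentProps("frag_a", 152.2, 1.1, 1, 2),
    FragmentProps("frag_b", 342.4, 2.4, 2, 4),  # too heavy, too many acceptors
    FragmentProps("frag_c", 210.3, 3.6, 0, 1),  # clogP too high
]

passing = [f.name for f in library if passes_rule_of_three(f)]
print(passing)
```

A filter like this only says whether each compound falls inside the range; judging diversity still requires looking at how the library is distributed within that range.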
Chemical fingerprint diversity
Chemical fingerprints are typically binary representations of the 2D structure of a molecule. Comparing fingerprints with the Tanimoto similarity index is a common method for analyzing the diversity of a chemical library. Furthermore, clustering based upon the Tanimoto score can aid library design by grouping similar compounds into "bins" and then computationally selecting compounds from different bins to increase diversity.
Pros: Highly efficient automated method for analyzing chemical diversity based upon 2D chemical structure. Fingerprinting methods are computationally efficient and thus they can be useful for analysis of very large compound libraries.
Cons: Like chemical properties, 2D fingerprints are not predictive of biological activity. For higher molecular weight fragment libraries, such an analysis can be misleading. High diversity based upon fingerprint analysis can be due to diversity of the cores with low functional group diversity or diversity of functional groups with low core diversity.
Conclusion: Clustering based upon molecular fingerprints and Tanimoto similarity is a useful first-pass analysis of fragment library diversity.
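The binning idea described above can be sketched in a few lines. The example below represents each fingerprint as the set of its "on" bit positions, computes the Tanimoto index (shared bits divided by total distinct bits), and groups compounds with a simple leader-style clustering. The fingerprints, the 0.5 threshold and the clustering scheme are illustrative assumptions, not the method used for the libraries discussed here.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto index of two fingerprints given as sets of on-bit positions."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are treated as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def leader_cluster(fps: dict, threshold: float = 0.5) -> dict:
    """Assign each compound to the first cluster whose leader is similar enough.

    Returns a mapping of leader name -> list of member names. The result
    depends on input order, which is a known limitation of this simple scheme.
    """
    clusters = {}
    for name, bits in fps.items():
        for leader in clusters:
            if tanimoto(bits, fps[leader]) >= threshold:
                clusters[leader].append(name)
                break
        else:
            clusters[name] = [name]  # no similar leader found: start a new bin
    return clusters


# Hypothetical fingerprints (on-bit positions), for illustration only.
fingerprints = {
    "frag_a": frozenset({1, 4, 7, 9}),
    "frag_b": frozenset({1, 4, 7, 12}),
    "frag_c": frozenset({2, 5, 8}),
    "frag_d": frozenset({2, 5, 8, 11}),
    "frag_e": frozenset({3, 6}),
}

clusters = leader_cluster(fingerprints)
representatives = list(clusters)  # one compound per bin as a diverse subset
print(clusters)
print(representatives)
```

Picking one representative per bin gives a quick diverse subset, but as noted in the cons above, it cannot tell whether the diversity comes from the cores or from the functional groups.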
Compound shape diversity
Chemical diversity based upon compound shape is a relatively new approach in fragment library design. This has arisen out of a focus on incorporating 3-dimensional fragments into screening libraries.
Pros: 3D compound shape samples diversity in a manner consistent with a diverse array of shapes of target molecular binding pockets.
Cons: Shape diversity does not necessarily result in a diversity of functionality, and rigidification of scaffolds into 3D shapes can actually reduce the probability of finding lead compounds.
Conclusion: Compound shape is another useful parameter in the analysis of fragment libraries. Our library design aims to balance rigid and flexible high-sp3 fragments with more chemistry-friendly aromatic scaffolds.
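One widely used way to put a number on 3D shape is the normalized principal moment of inertia (PMI) ratio pair, which places rod-like molecules near (0, 1), disc-like molecules near (0.5, 0.5) and sphere-like molecules near (1, 1). The sketch below is an assumption about how shape could be quantified, not a description of how the libraries here were analyzed; it uses unit atom masses and idealized coordinates rather than real conformers.

```python
import math


def _sym3_eigenvalues(a11, a22, a33, a12, a13, a23):
    """Eigenvalues of a symmetric 3x3 matrix (closed-form trigonometric method)."""
    p1 = a12 * a12 + a13 * a13 + a23 * a23
    if p1 == 0.0:
        return [a11, a22, a33]  # already diagonal
    q = (a11 + a22 + a33) / 3.0
    p2 = (a11 - q) ** 2 + (a22 - q) ** 2 + (a33 - q) ** 2 + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b11, b22, b33 = (a11 - q) / p, (a22 - q) / p, (a33 - q) / p
    b12, b13, b23 = a12 / p, a13 / p, a23 / p
    det = (b11 * (b22 * b33 - b23 * b23)
           - b12 * (b12 * b33 - b23 * b13)
           + b13 * (b12 * b23 - b22 * b13))
    r = max(-1.0, min(1.0, det / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [e1, 3.0 * q - e1 - e3, e3]


def normalized_pmi_ratios(coords):
    """Return (I1/I3, I2/I3) for 3D points with unit masses, I1 <= I2 <= I3."""
    n = len(coords)
    com = [sum(c[i] for c in coords) / n for i in range(3)]
    pts = [[c[i] - com[i] for i in range(3)] for c in coords]
    # Inertia tensor about the centre of mass.
    ixx = sum(y * y + z * z for x, y, z in pts)
    iyy = sum(x * x + z * z for x, y, z in pts)
    izz = sum(x * x + y * y for x, y, z in pts)
    ixy = -sum(x * y for x, y, z in pts)
    ixz = -sum(x * z for x, y, z in pts)
    iyz = -sum(y * z for x, y, z in pts)
    i1, i2, i3 = sorted(_sym3_eigenvalues(ixx, iyy, izz, ixy, ixz, iyz))
    if i3 == 0.0:
        raise ValueError("need at least two distinct points")
    return i1 / i3, i2 / i3


# Idealized test shapes (not real molecules): a three-point rod and a flat hexagon.
rod = [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
disc = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3), 0.0) for k in range(6)]

print(normalized_pmi_ratios(rod))
print(normalized_pmi_ratios(disc))
```

Plotting the two ratios for every fragment in a library shows at a glance whether the collection is dominated by flat, rod-like compounds or also samples more three-dimensional shapes.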
Distribution of chemical properties for our libraries is summarized below:
Scatter plots for Zen-Library 1, Zen-Library 2, Zen-Opti and Zen-Flex: Each plot is at the same scale. The x-axis is the sp3 character (sp3 atoms/total atoms) and the y-axis is VABC (fragment volume) as calculated by the CDK (Chemistry Development Kit) node of KNIME. The size of each circle represents the number of aromatic atoms in the molecule, and its color represents the molecular weight.
The chart above shows that:
The number of compounds with zero aromatic atoms increases across our libraries (Zen-Flex > Zen-Library 2 > Zen-Opti > Zen-Library 1)
Zen-Library 1 contains few compounds with zero aromatic atoms (shown as dots rather than circles) and has lower overall sp3 character; it is composed primarily of common cores found in drugs.
Zen-Flex has high sp3 character without aromatic rings (more flexible); Zen-Library 2 has high sp3 character with aromatic rings (more rigid).
Size is relatively uniform across the collection, although Zen-Opti and Zen-Library 2 contain larger molecules and Zen-Flex is composed of smaller molecules.
To recap: The parameters for evaluating diversity are as diverse as the measurement itself! We will conclude with Part 3, where we discuss our strategy for using these measures of diversity, along with measures developed internally, in the design of our screening libraries.