RECCR: Rensselaer Exploratory Center for Cheminformatics Research




Homology Modeling


Co-PI: Steven Cramer




Much of the work carried out to date has employed generic molecular descriptors that represent common physicochemical properties of the molecules. Accordingly, the same descriptors were used for both small-molecule and protein datasets. However, the generality of these descriptors created challenges during model interpretation. While many of the MOE molecular descriptors were readily interpretable for small molecules, their meaning was not always clear for proteins. Furthermore, most of the electron-density-derived TAE/RECON descriptors could be interpreted only by using correlation plots to relate them to other, more easily interpreted features.

We will develop new descriptor sets that include electrostatic descriptors based on both charge and electrostatic potential distributions, and hydrophobic descriptors based on pH-dependent hydrophobicity scales of the amino acids. The properties of the molecule will be calculated at the salt concentration and pH of the mobile phase employed in the experiments. Models generated with these new descriptors are expected to provide unambiguous insights into the physicochemical properties of the proteins that influence their isotherm parameters.
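To illustrate what a pH-dependent charge descriptor might look like, here is a minimal sketch that estimates the net charge of a protein from its sequence using the Henderson-Hasselbalch relation. It is not the descriptor set proposed above: the function name `net_charge` is hypothetical, the pKa values are approximate textbook numbers chosen for illustration, and the sketch ignores salt effects, 3D structure, and electrostatic potential distributions.

```python
# Approximate pKa values (illustrative assumptions only; real values shift
# with the local environment and the ionic strength of the mobile phase).
PKA_BASIC = {"K": 10.5, "R": 12.5, "H": 6.0}             # carry +1 when protonated
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}   # carry -1 when deprotonated
PKA_N_TERMINUS = 9.0
PKA_C_TERMINUS = 2.0


def net_charge(sequence: str, ph: float) -> float:
    """Estimate the net charge of a peptide or protein sequence at a given pH."""
    seq = sequence.upper()

    # Terminal groups: one basic amino group and one acidic carboxyl group.
    charge = 1.0 / (1.0 + 10.0 ** (ph - PKA_N_TERMINUS))
    charge -= 1.0 / (1.0 + 10.0 ** (PKA_C_TERMINUS - ph))

    for residue in seq:
        if residue in PKA_BASIC:
            # Fraction of the basic side chain that is protonated (+1).
            charge += 1.0 / (1.0 + 10.0 ** (ph - PKA_BASIC[residue]))
        elif residue in PKA_ACIDIC:
            # Fraction of the acidic side chain that is deprotonated (-1).
            charge -= 1.0 / (1.0 + 10.0 ** (PKA_ACIDIC[residue] - ph))
    return charge


if __name__ == "__main__":
    example = "MKDEHRYCA"  # made-up sequence, for demonstration only
    for ph in (4.0, 7.0, 10.0):
        print(f"pH {ph:4.1f}: net charge {net_charge(example, ph):+.2f}")
```

Evaluating such a function at the pH of the mobile phase yields one scalar per protein. A structure-based descriptor would go further and distribute these charges over the protein surface.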

As indicated above, we have successfully demonstrated a priori prediction of chromatographic column separations directly from protein crystal structure data. Applying this approach to chromatographic process design and optimization relies on the availability of crystal structure data for the biomolecule of interest as well as for all impurities (or at least the key impurities) in a given feed mixture. However, crystal structure information is often not available for molecules of industrial relevance, and the possibility of procuring three-dimensional structures of the impurities in these biological feed streams is even more remote. Thus, there is a clear need to refine the present multiscale modeling strategy so that it can succeed as a methods development tool for the biotech industry.

One possible solution to this problem is to generate predictive QSPR models using topological 2D descriptors, which are computed from the primary sequence of the molecule and do not require 3D structure information. The MOE package computes a large number of 2D descriptors based on the connection table representation of a molecule (e.g., elements, formal charges and bonds, but not atomic coordinates). These include physical properties of the molecule (such as molecular weight, log P, molar refractivity, and partial charge), subdivided van der Waals surface area of atoms associated with specific bin ranges of these physical properties, various atom and bond counts, and some pharmacophore feature descriptors. While this approach may be very useful in some systems, it could significantly degrade model performance in systems where molecular size and shape factors are important.
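As a rough illustration of descriptors that need only the primary sequence, the sketch below computes a few composition-based values. It does not use MOE and does not reproduce any of its 2D descriptors: the function name `sequence_descriptors` and the choice of features are assumptions made for this example. The hydropathy values are those of the Kyte-Doolittle scale.

```python
from collections import Counter

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_descriptors(sequence: str) -> dict:
    """Compute simple composition descriptors from a one-letter sequence."""
    seq = sequence.upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if not seq or unknown:
        raise ValueError(f"empty sequence or unsupported residues: {sorted(unknown)}")

    counts = Counter(seq)
    n = len(seq)
    return {
        "length": n,
        "frac_acidic": (counts["D"] + counts["E"]) / n,
        "frac_basic": (counts["K"] + counts["R"] + counts["H"]) / n,
        "frac_aromatic": (counts["F"] + counts["W"] + counts["Y"]) / n,
        # Average hydropathy over all residues (often called GRAVY).
        "mean_hydropathy": sum(KYTE_DOOLITTLE[r] for r in seq) / n,
    }


if __name__ == "__main__":
    # Made-up sequence, for demonstration only.
    for name, value in sequence_descriptors("MKDEHRYCAFW").items():
        print(f"{name}: {value}")
```

Two proteins with the same composition but different folds receive identical values here, which is exactly the size and shape limitation noted above.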



Copyright ©2005 Rensselaer Polytechnic Institute
All rights reserved worldwide.