Rensselaer Exploratory Center for Cheminformatics Research (RECCR)












RS-WebPredictor - Release Notes 1.0

Usage

Regioselectivity-WebPredictor (RS-WebPredictor) lets the user submit candidate molecules in one of two ways: 1) a single structure may be either drawn or copied from a MOL file, or 2) a batch file of any number of compounds may be submitted in SDF or SMILES format. All fields originally contained within the SDF will be returned, as will the PRIMARY, SECONDARY, and TERTIARY predicted sites of metabolism from each requested isozyme model. If the user did not supply an ID field, one will be created and the name of each molecule placed into it. If no molecule name was provided in the original file, the name will be 'molecule_x', where x is the numerical position of the molecule within the SDF. If the user-supplied SDF already contains an ID field, that field will be returned as an Original_ID field. Any special characters within the original name, such as *, &, ^, ', or %, will be replaced with an _ in the ID field.

It is recommended that the input SDF contain only heavy atoms, though the server should work with molecules having attached hydrogen atoms. If the server does not work for a given SDF file, please mail the file, along with any other problem reports, to brenec(at)rpi.edu. This example file may be used as a demonstration of server output for a variety of different input formats.

When applied to a large batch (> 100 molecules), execution time is approximately 2 seconds per substrate for descriptor generation and 1 second per substrate per model. On small batches, per-molecule execution time can increase substantially. Applying the server to a single substrate takes 4.3 s, 6.3 s, and 13 s for molecules with 3, 21, and 88 heavy atoms, respectively.
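
The ID rules and the large-batch timing figures described above can be illustrated with a minimal Python sketch. This is not the server's code: the function names are hypothetical, the character set is limited to the five characters the notes list, the position is assumed to count from 1, and the ID field is assumed to hold the sanitized molecule name even when an Original_ID is kept.

```python
import re

# Assumption: only the characters named in the notes are replaced; the
# server's full list of special characters is not documented here.
SPECIAL_CHARS = re.compile(r"[*&^'%]")


def assign_id(name, position):
    """Build the ID for one record (position assumed to be 1-based)."""
    name = (name or "").strip()
    if not name:
        return f"molecule_{position}"
    return SPECIAL_CHARS.sub("_", name)


def prepare_fields(fields, name, position):
    """Return a copy of a record's SDF data fields with the ID rules applied."""
    out = dict(fields)
    if "ID" in out:
        # A user-supplied ID is kept, renamed to Original_ID.
        out["Original_ID"] = out.pop("ID")
    out["ID"] = assign_id(name, position)
    return out


def estimate_batch_seconds(n_substrates, n_models):
    """Rough run time for a large batch (> 100 substrates), using the quoted rates."""
    descriptor_time = 2.0 * n_substrates          # about 2 s per substrate
    model_time = 1.0 * n_substrates * n_models    # about 1 s per substrate per model
    return descriptor_time + model_time
```

Under these assumptions, a batch of 200 substrates run against 3 isozyme models would take roughly 200 × 2 + 200 × 3 × 1 = 1000 seconds. The estimate should not be applied to small batches, where the per-molecule cost is higher.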

RS-WebPredictor Technical Details

The initial server page is a standard HTML file. A ChemDoodle utility is included that lets users construct a candidate substrate. A Perl .cgi script parses the user-selected options, uploads the submitted SDF or SMILES substrates, and passes them on to a .sh bash script. That script calls Open Babel to convert a set of SMILES into an SDF. Next, MOE scripts are called to verify that the set of submitted molecules is not empty and, if it is not, to quantify the substrates with descriptors. Then, for each isozyme selected by the user, individual MATLAB calls are made to apply 100 previously trained models to rank the putative sites of metabolism of each submitted substrate. Next, a MOE script is called that imports the MATLAB-generated predictions and determines a consensus ranking of sites for each molecule. These predictions are then exported into both an SDF and a tabular results file containing the molecule name and the atom IDs of the primary, secondary, and tertiary predicted SOMs. Next, JavaScript is used to generate graphics of each submitted molecule and build an HTML page around them. Finally, bash scripting is used to send the results page and the generated prediction files to the user's email address, if one was provided.
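
As an illustration of the consensus step, the sketch below combines the site rankings from several models into one ordering by mean rank position. The notes do not state which aggregation rule the MOE script uses, so mean rank is an assumption chosen for simplicity, and the function name and input layout are hypothetical.

```python
from collections import defaultdict


def consensus_ranking(model_rankings):
    """Aggregate per-model site rankings by mean rank position.

    model_rankings: list of lists; each inner list orders the same atom IDs
    from most to least likely site of metabolism.
    Assumption: mean rank is used here as a stand-in; the server's actual
    aggregation rule is not described in these notes.
    """
    rank_sums = defaultdict(int)
    for ranking in model_rankings:
        for position, atom_id in enumerate(ranking, start=1):
            rank_sums[atom_id] += position
    n_models = len(model_rankings)
    # Lower mean rank is better; ties are broken by atom ID so the
    # output is deterministic.
    return sorted(rank_sums, key=lambda atom: (rank_sums[atom] / n_models, atom))


# Three toy models ranking atoms 1, 2, and 3 of one substrate.
rankings = [[3, 1, 2], [1, 3, 2], [1, 2, 3]]
primary, secondary, tertiary = consensus_ranking(rankings)[:3]
```

With the toy input, atom 1 has the lowest mean rank and becomes the primary predicted site, followed by atoms 3 and 2. On the server the same idea would be applied across the 100 models of each selected isozyme.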

The RS-Predictor Algorithm

RS-Predictor is a tool for generating pathway-independent, isozyme-specific P450 regioselectivity QSARs from any set of known substrates and metabolites. Details of the RS-Predictor algorithm are given in:

Zaretzki, J., Bergeron, C., Rydberg, P., Huang, T., Bennett, K., Breneman, C.
RS-Predictor: A New Tool for Predicting Sites of Cytochrome P450-Mediated Metabolism Applied to CYP 3A4
Journal of Chemical Information and Modeling, 51, 7, 1667-1689

The RS-Predictor models of this server were trained using a combination of topological descriptors and SMARTCyp reactivities applied to substrate sets of CYPs 1A2, 2A6, 2B6, 2C8, 2C9, 2C19, 2D6, 2E1, and 3A4, as well as a Combined set representing every curated reaction, regardless of isozyme. A large proportion of the metabolites in each set were identified within the top two predicted rank-positions. Listed as CYP isozyme (number of substrates, accuracy): 1A2 (271, 83.0%), 2A6 (105, 85.7%), 2B6 (151, 82.1%), 2C8 (142, 83.8%), 2C9 (226, 84.5%), 2C19 (218, 86.2%), 2D6 (270, 85.9%), 2E1 (145, 82.8%), 3A4 (475, 82.3%), Combined (680, 86.0%). Predictions were made using 10 iterations of 10-fold cross-validation, rank-aggregating the predicted SOMs of each substrate into a single consensus rank-ordering of sites. Full details of these models and results may be found in:

Zaretzki, J., Rydberg, P., Bergeron, C., Bennett, K., Olsen, L., Breneman, C.
RS-Predictor Models Augmented with SMARTCyp Reactivities: Robust Metabolic Regioselectivity Predictions for Nine CYP Isozymes
Journal of Chemical Information and Modeling, 52, 6, 1637-1659
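
The top-two accuracy figures quoted above can be read as the fraction of substrates for which an observed site of metabolism appears among the two highest-ranked sites. The sketch below computes that metric under this reading; the function name and the dictionary layout are hypothetical, and the exact scoring protocol is the one given in the cited paper, not this snippet.

```python
def top_k_accuracy(predictions, observed, k=2):
    """Fraction of substrates with at least one observed SOM in the top k.

    predictions: dict mapping molecule ID -> atom IDs ordered best first.
    observed:    dict mapping molecule ID -> set of experimentally observed SOM atom IDs.
    Assumption: a substrate counts as correct if any observed site is in the top k.
    """
    hits = sum(
        1
        for mol_id, ranked in predictions.items()
        if set(ranked[:k]) & observed[mol_id]
    )
    return hits / len(predictions)


# Toy data: substrates "a" and "c" have an observed site in their top two, "b" does not.
predictions = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}
observed = {"a": {2}, "b": {6}, "c": {7, 9}}
accuracy = top_k_accuracy(predictions, observed)
```

For the toy data, two of the three substrates are counted as correct.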

SDFs of the collated substrates and metabolites for each isozyme may be found here. Server predictions for substrates contained within this set of 680 substrates should be considered overtrained, since the models were trained on them. For further details, and to cite the server, see:

Zaretzki, J., Bergeron, C., Rydberg, P., Breneman, C.
RS-WebPredictor: A Server for Predicting Sites of P450-Mediated Metabolism
Bioinformatics, Under Review

Browser Requirements

Visual output of RS-WebPredictor is only available on current versions of Mozilla Firefox, Google Chrome, Internet Explorer, and Safari. Internet Explorer 8 and earlier do not support visualization unless the Google Chrome Frame plugin is installed. Firefox 3.6 and later do support visualization.

Grants

National Institutes of Health, Grant: 1P20 HG003899 and Office of Naval Research, Grant: N00014-06-1-0014

Contact

Professor Curt Breneman

Jed Zaretzki, Ph.D.



Copyright 2010 Rensselaer Polytechnic Institute
All rights reserved worldwide.