Software
Modeling and Mining
-
ROMS
The RECCR Online Modeling System (ROMS) is a general web-based machine
learning system. Users upload a data set through the web client, build
a model with one of the available learning methods, and visualize its
performance. The three learning methods provided are Partial
Least Squares (PLS), Kernel-PLS and Support Vector Machine (SVM). In
addition to basic modeling functionality, cross validation methods such
as Leave-One-Out (LOO) and Monte Carlo Cross Validation (MCCV) are
provided for model parameter selection.
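
As a rough illustration of how LOO and MCCV can drive parameter selection, the Python sketch below generates both kinds of splits and picks the candidate parameter with the lowest held-out error. It is not ROMS code: all function names are hypothetical, and a one-variable ridge regression stands in for PLS, Kernel-PLS or SVM only to keep the example short.

```python
import random

def loo_splits(n):
    """Yield (train, test) index lists, holding out one sample at a time."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]

def mccv_splits(n, n_rounds, test_fraction, seed=0):
    """Yield (train, test) index lists from independent random splits."""
    rng = random.Random(seed)
    n_test = max(1, int(round(n * test_fraction)))
    for _ in range(n_rounds):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]

def fit_ridge_1d(xs, ys, lam):
    """Stand-in model (assumption): y ~ w * x with ridge penalty lam."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def cv_error(xs, ys, lam, splits):
    """Mean squared prediction error over all held-out samples."""
    sq_err, count = 0.0, 0
    for train, test in splits:
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        for i in test:
            sq_err += (ys[i] - w * xs[i]) ** 2
            count += 1
    return sq_err / count

def select_lambda(xs, ys, candidates, make_splits):
    """Return the candidate penalty with the lowest cross-validated error."""
    return min(candidates, key=lambda lam: cv_error(xs, ys, lam, make_splits()))

# Toy data lying exactly on y = 2x, so no penalty is needed.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
best_loo = select_lambda(xs, ys, [0, 10, 100], lambda: loo_splits(len(xs)))
best_mccv = select_lambda(xs, ys, [0, 10, 100],
                          lambda: mccv_splits(len(xs), 20, 0.2, seed=1))
```

LOO uses every sample as a test case exactly once, while MCCV trades that exhaustiveness for a tunable number of random splits, which is usually cheaper on large data sets.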
-
DMTL
The Data Mining Template Library (DMTL) supports the mining of
increasingly complex and informative pattern types in structured and
unstructured datasets, including Itemsets, Sequences, Trees and Graphs
(See Fig. 1). DMTL is a C++ library consisting of highly efficient
algorithms and data structures, utilizing a generic data mining
approach, where all aspects of mining are controlled via a set of
properties. Another novel feature of DMTL is that it provides
transparent persistency and indexing support for effective computation
over massive datasets. We have successfully mined datasets in the
60-100GB range using a desktop PC!
DMTL has been publicly released as open-source software on
SourceForge, and it has already been downloaded by over 2000
researchers from all over the world.
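
To make the simplest of these pattern types concrete, the sketch below mines frequent itemsets level by level (an Apriori-style search). It is written in Python for brevity and does not use or mirror the DMTL C++ API; the function name and the toy transactions are assumptions made for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets; keep a candidate only if every
        # (k-1)-subset is itself frequent (the Apriori pruning rule).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count the support of each surviving candidate.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result

transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
patterns = frequent_itemsets(transactions, min_support=3)
```

Sequences, trees and graphs follow the same generate-and-count idea but need more elaborate candidate generation and containment tests than the subset check used here.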
Small Molecules
-
RECON
RECON is an algorithm for the rapid reconstruction of molecular
charge densities and charge density-based electronic properties of
molecules, using atomic charge density fragments pre-computed from ab
initio wavefunctions. These are known as Transferable Atom Equivalents,
or "TAEs". The method is based on Bader's quantum theory of Atoms in
Molecules.
-
PEST
PEST Shape/Property hybrid descriptor technology, developed in DDASSL,
allows better representation of the kinds of intermolecular interactions
that are dependent on molecular shape. The inclusion of PEST descriptors
has been found to significantly improve QSPR models where intermolecular
interactions play an important role in the chemical effects being modeled.
PEST descriptors are generated using TAE molecular surface representations
to define property-encoded boundaries similar to the Zauhar "Shape
Signature" ray-tracing approach to shape/property convolution.
DNA & RNA
-
DIXEL
DIXEL is a web-based descriptor generator that provides a TAE-based
representation of the electronic properties of the major or minor grooves
of DNA. DIXEL represents electron density features such as electrostatic
potential (EP) and local average ionization potential (PIP) on the
accessible surfaces of the major or minor groove on a grid of rectangles
-- the "Dixel" coordinate system. These features can be displayed
graphically and/or employed as input to data mining algorithms.
-
MFOLD
The objective of the Mfold web server for nucleic acid folding and hybridization prediction is to provide easy access to RNA and DNA folding and hybridization software to the scientific community at large. By making use of universally available web GUIs (Graphical User Interfaces), the server circumvents the problem of portability of this software.
Detailed output, in the form of structure plots with or without reliability information, single strand frequency plots and 'energy dot plots', is available for the folding of single sequences.
Proteins
-
PROTEIN RECON
Protein Recon is a version of the RECON/TAE program optimized for use
with proteins, allowing users to rapidly produce a set of descriptors that
can characterize protein behavior. It is an algorithm for the
rapid reconstruction of molecular charge density-based electronic
properties of proteins, using peptide fragments precomputed from ab
initio wavefunctions. These properties can be displayed graphically
and/or employed as input to data mining algorithms.
-
WebPDB
WebPDB is a web-based workflow system that is flexible and
capable of semi-automatic protein structure cleaning activities. The
protein data may be provided by the user, but can also be directly
downloaded from the PDB archive as part of the automated workflow. In its
next generation, WebPDB will produce pH-sensitive protein surface
descriptors that take into account appropriate protonation states and
fractional protonation/deprotonation of basic and acidic side chain groups.
WebPDB prepares proteins for use in virtual screening and predictive modeling.
It removes gaps (through self-homology with FASTA information), heteroatoms
and ligands (for re-use).
Coupled with other modeling tools, WebPDB can be useful in probe
development and the interpretation of secondary screening results through
docking and scoring computations.
-
HMMSTR
The Monte Carlo fragment insertion method for protein tertiary structure prediction (ROSETTA) of Baker and others has been merged with the I-SITES library of sequence structure motifs and the HMMSTR model for local structure in proteins to form a new public server for the ab initio prediction of protein structure. The server performs several tasks in addition to tertiary structure prediction, including a database search, amino acid profile generation, fragment structure prediction, and backbone angle and secondary structure prediction.
-
SCALI
Proteins of the same class often share a secondary structure
packing arrangement but differ in how the secondary structure
units are ordered in the sequence. We find that proteins that share
a common core also share local sequence-structure similarities, and
these can be exploited to align structures with different topologies.
In this study, segments from a library of local sequence-structure
alignments were assembled hierarchically, enforcing the compactness
and conserved inter-residue contacts but not sequential ordering.
Previous structure-based alignment methods often ignore sequence
similarity, local structural equivalence and compactness.
SCALI (Structural Core ALIgnment) can
efficiently find conserved packing arrangements, even if they are non-sequentially
ordered in space. SCALI alignments conserve remote
sequence similarity and contain fewer alignment errors. Clustering of
our pairwise non-sequential alignments shows that recurrent packing
arrangements exist in topologically different structures.
-
MASKER contacts & MASKER voids
A fast algorithm for computing the solvent accessible molecular surface area (SAS) using Boolean masks (Le Grand, S. M. & Merz, K. M. J. (1993). J. Comp. Chem. 14, 349-52.) has been modified to estimate the solvent excluded molecular surface area (SES), including contact, toroidal and reentrant surface
components. Numerical estimates of arc lengths of intersecting atomic SAS are used to estimate the toroidal surface, and
intersections between those arcs are used to estimate the reentrant surface area. The new method is compared to an exact
analytical method. Boolean molecular surface areas are continuous and pairwise differentiable, and should be useful for
molecular dynamics simulations, especially as the basis for an implicit solvent model.
MASKER contacts finds the surface area burial by residue in a protein while
MASKER voids finds the locations of empty cavities in proteins (or any molecule).
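
For readers unfamiliar with numerical surface estimates, the sketch below computes per-atom solvent accessible surface area by counting exposed dots on each probe-expanded atomic sphere (a Shrake-Rupley style calculation). It is an illustrative stand-in, not the MASKER implementation: it tests every dot against every other atom instead of using precomputed Boolean masks, and it does not estimate the toroidal or reentrant components of the SES. The function names, the 1.4 probe radius default and the dot count are assumptions.

```python
import math

def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral lattice)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        phi = golden * i
        pts.append((r * math.cos(phi), r * math.sin(phi), z))
    return pts

def sas_areas(atoms, probe=1.4, n_points=500):
    """atoms: list of (x, y, z, radius). Returns one SAS estimate per atom."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # radius of the probe-expanded sphere
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                rj_exp = rj + probe
                # A dot inside any other expanded sphere is not accessible.
                if (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < rj_exp ** 2:
                    buried = True
                    break
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas

# One isolated atom, then two overlapping atoms of the same radius.
single = sas_areas([(0.0, 0.0, 0.0, 1.6)])
pair = sas_areas([(0.0, 0.0, 0.0, 1.6), (0.0, 0.0, 3.0, 1.6)])
```

Summing the per-atom values over the atoms of a residue, with and without its neighbors present, gives a simple estimate of how much surface that residue buries.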