RECCR Rensselaer Exploratory Center for Cheminformatics Research
Mining Complex Patterns

GPMT Frameworks

GPMT is composed of two main underlying frameworks working in unison:

  • Data Mining Template Library (DMTL): The C++ Standard Template Library (STL) provides efficient, generic implementations of widely used algorithms and data structures, which greatly simplify writing effective programs. Like the STL, DMTL is a collection of generic data mining algorithms and data structures. In addition, DMTL provides persistent data and index structures for efficiently mining any type of model or pattern of interest. Users can mine custom pattern types simply by defining the new types; there is no need to implement a new algorithm, because any generic DMTL algorithm can mine them. Because the models and patterns are persistent and indexed, mining can be done efficiently over massive databases, and mined results can be retrieved later from the persistent store.
  • Extensible Data Mining Server (EDMS): EDMS is the back-end server that provides persistence and indexing support for both the mining results and the database. EDMS supports DMTL by seamlessly handling memory management, data layout, and high-performance I/O, and by integrating tightly with a DBMS. It supports multiple back-end storage schemes, including flat files and embedded, relational, or object-relational databases.
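
To make the "define a new pattern type, reuse the generic algorithm" idea from the DMTL bullet concrete, here is a minimal C++ sketch in the STL style. It is not DMTL's actual interface: the `occurs_in` member, the `support` and `frequent` templates, and the `Itemset` and `Sequence` types are hypothetical names chosen for illustration. The generic code assumes only that a pattern type can test whether it occurs in one database record; persistence and indexing are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration of the DMTL idea (not the DMTL API).
// Assumed pattern concept for a pattern type P over records of type R:
//   bool P::occurs_in(const R& record) const;

// Generic support counting: number of database records containing the pattern.
template <typename Pattern, typename Record>
std::size_t support(const Pattern& p, const std::vector<Record>& db) {
    return static_cast<std::size_t>(
        std::count_if(db.begin(), db.end(),
                      [&](const Record& r) { return p.occurs_in(r); }));
}

// Generic filter: keep candidates whose support is at least min_support.
template <typename Pattern, typename Record>
std::vector<Pattern> frequent(const std::vector<Pattern>& candidates,
                              const std::vector<Record>& db,
                              std::size_t min_support) {
    assert(min_support >= 1);  // a zero threshold would accept every candidate
    std::vector<Pattern> out;
    for (const auto& c : candidates)
        if (support(c, db) >= min_support) out.push_back(c);
    return out;
}

// User-defined pattern type 1: an itemset over sorted integer transactions.
struct Itemset {
    std::vector<int> items;  // assumed sorted, as std::includes requires
    bool occurs_in(const std::vector<int>& record) const {
        return std::includes(record.begin(), record.end(),
                             items.begin(), items.end());
    }
};

// User-defined pattern type 2: a subsequence (gaps allowed) over strings.
struct Sequence {
    std::string symbols;
    bool occurs_in(const std::string& record) const {
        std::size_t i = 0;  // number of pattern symbols matched so far
        for (char c : record)
            if (i < symbols.size() && c == symbols[i]) ++i;
        return i == symbols.size();
    }
};
```

For example, over the sorted transactions `{1,2,3}`, `{1,3}`, `{2,3}`, `{1,2,3,4}`, `support(Itemset{{1, 3}}, txns)` counts 3 records, and the same `frequent` template filters both itemsets and sequences without modification. As in the STL, the algorithm depends on a small compile-time contract rather than on a concrete pattern class.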
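
The EDMS bullet describes one mining interface layered over several storage schemes. The C++ sketch below is a hypothetical illustration of that layering, not the EDMS API: `PatternStore`, `MemoryStore`, `FlatFileStore`, and `save_results` are invented names, mined results are assumed to be simple (pattern key, support) pairs, and pattern keys are assumed not to contain tabs or newlines. A relational back-end would implement the same two methods against a DBMS.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical sketch (not the EDMS interface): one abstract store for
// (pattern key -> support) records, so mining code is back-end independent.
class PatternStore {
public:
    virtual ~PatternStore() = default;
    virtual void put(const std::string& pattern, std::size_t support) = 0;
    virtual std::optional<std::size_t> get(const std::string& pattern) = 0;
};

// Back-end 1: an in-memory ordered index keyed by pattern.
class MemoryStore : public PatternStore {
    std::map<std::string, std::size_t> index_;
public:
    void put(const std::string& p, std::size_t s) override { index_[p] = s; }
    std::optional<std::size_t> get(const std::string& p) override {
        auto it = index_.find(p);
        if (it == index_.end()) return std::nullopt;
        return it->second;
    }
};

// Back-end 2: a flat file of "pattern<TAB>support" lines, appended on write
// and scanned on read. Any std::iostream works (std::fstream for a real file).
class FlatFileStore : public PatternStore {
    std::iostream& file_;
public:
    explicit FlatFileStore(std::iostream& f) : file_(f) {}
    void put(const std::string& p, std::size_t s) override {
        file_.clear();                  // reset eof/fail left by a prior scan
        file_.seekp(0, std::ios::end);  // append
        file_ << p << '\t' << s << '\n';
    }
    std::optional<std::size_t> get(const std::string& p) override {
        file_.clear();
        file_.seekg(0);
        std::optional<std::size_t> found;
        std::string line;
        while (std::getline(file_, line)) {
            auto tab = line.find('\t');
            if (tab != std::string::npos && line.substr(0, tab) == p)
                found = static_cast<std::size_t>(std::stoul(line.substr(tab + 1)));
        }
        return found;  // the last record for a key wins
    }
};

// Mining code written against the abstract store works with either back-end.
void save_results(PatternStore& store,
                  const std::map<std::string, std::size_t>& mined) {
    for (const auto& [pattern, support] : mined) store.put(pattern, support);
}
```

The flat-file back-end trades lookup speed for simplicity (every `get` is a linear scan), while the in-memory map gives logarithmic lookups but no persistence; hiding both behind `PatternStore` is what lets the choice be made per deployment.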

The DMTL/EDMS system will offer an alternative data analysis approach. Its effectiveness will be evaluated against the SVM and KPLS statistical learning methods on chemistry datasets ranging in size from very small (24 proteins) to medium-sized (54,000 molecules from the WDI dataset of drugs and drug candidates, with a variety of bioresponses).



Copyright ©2005 Rensselaer Polytechnic Institute
All rights reserved worldwide.