RECCR Rensselaer Exploratory Center for Cheminformatics Research
Mining Complex Patterns

The GPMT toolkit

The GPMT toolkit is well suited to cheminformatics applications, where it supports exploratory analysis of complex datasets that may contain intricate and subtle relationships. The mined patterns and relationships can be used to synthesize high-level, actionable hypotheses for scientific purposes, to build more global classification or clustering models of the data, or to detect abnormal or rare high-value patterns embedded in a mass of “normal” data.

GPMT currently supports the mining of increasingly complex and informative pattern types in structured and unstructured datasets, such as the patterns shown in the Figure (right): itemsets or co-occurrences (Zaki, 2000), sequences (Zaki, 2001), tree patterns (Zaki, 2002; Zaki, 2005), and graph patterns.
In a generic sense, a pattern denotes links or relationships between several objects of interest. The objects are represented as nodes and the links as edges. Both nodes and edges can carry multiple labels that denote various attributes. The main features of GPMT are as follows:
  • Generic data structures to store patterns and collections of patterns, and generic data mining algorithms for pattern mining. One of the main attractions of a generic paradigm is that the algorithms (e.g., for isomorphism and frequency checking) can work for any pattern type.
  • Persistent/out-of-core structures that support efficient computation of pattern frequencies and statistics, using an approach tightly coupled to a database management system (DBMS).
  • Native support for different (vertical and horizontal) database formats for highly efficient data mining. We use a fully fragmented vertical database for fast mining and retrieval.
  • Support for pre-processing steps such as data mapping, discretization of continuous attributes, and creation of taxonomies, as well as support for visualization of mined patterns.
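
To make the distinction between horizontal and vertical database formats concrete, the following is a minimal Python sketch, not GPMT code. It assumes that a horizontal database maps each transaction id to its items, and that a vertical database maps each item to the set of transaction ids containing it, so that the frequency of an itemset can be obtained by intersecting those sets. The function names `to_vertical` and `support` and the toy data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def to_vertical(horizontal):
    """Convert a horizontal database {tid: items} into a vertical one
    {item: set of tids}. Hypothetical helper, not part of GPMT."""
    vertical = defaultdict(set)
    for tid, items in horizontal.items():
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


def support(vertical, itemset):
    """Count the transactions that contain every item of `itemset`
    by intersecting the per-item tid sets."""
    tidsets = [vertical.get(item, set()) for item in itemset]
    if not tidsets:
        # Simplification in this sketch: an empty itemset counts as 0.
        return 0
    return len(set.intersection(*tidsets))


# Toy horizontal database: transaction id -> items (illustrative data only).
horizontal = {
    1: {"A", "B", "C"},
    2: {"A", "C"},
    3: {"B", "C"},
    4: {"A", "B", "C"},
}

vertical = to_vertical(horizontal)
print(support(vertical, {"A", "C"}))  # transactions 1, 2 and 4 -> 3
print(support(vertical, {"A", "B"}))  # transactions 1 and 4 -> 2
```

In this layout a frequency check touches only the id sets of the items involved rather than scanning every transaction, which is one plausible reason a vertical format is attractive for fast mining.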


Copyright ©2005 Rensselaer Polytechnic Institute
All rights reserved worldwide.