Alternative Model Fusion
Co-Investigator: Mark J. Embrechts
Associate Professor, Department of Decision Sciences & Engineering Systems
Data Fusion - Integration of data from multiple sources
Background
Data fusion was first introduced in the radar sensing community. It
refers to the process of combining multi-sensor data from different
sources such that the resulting information or model is in some sense
"better" than would be possible if these sources were used
individually. We have extended the idea of data fusion to molecular
property analysis and prediction, where rather than using different
sensor sources, we use different descriptor fields for a set of
molecules and apply data fusion techniques to improve the predictive
performance of quantitative structure-activity relationship (QSAR)
models for unknown cases. In this situation we use the term
"auto-fusion" rather than data fusion, because the same molecules,
and in certain cases the same descriptors, are used, but different
preprocessing techniques, such as principal component analysis (PCA)
and independent component analysis (ICA), extract different features
from the data. Kernel partial least squares (K-PLS) models in
auto-fusion mode have been shown to give a significant boost in
performance compared to traditional K-PLS models. Note that this
approach is distinct from the more familiar methods of consensus or
bagged modeling, and it performs better in prediction than those
methods.