Targeted Task Models for Cheminformatics Process Development
Ranking
Current QSPR models for ion-exchange chromatography predict protein retention time, but what matters for bioseparations is the relative order in which proteins are displaced. The statistical learning theory underlying SVM suggests that we can get better results by directly modeling the ranking of protein displacement order than by trying to solve the harder problem of accurately modeling retention times (Vapnik, 1998). Highly nonlinear ranking methods have been developed by simply replacing the loss function used in SVM with a loss function appropriate for ranking (Joachims, 2002). In the past, PLS and KPLS could not be readily adapted to other loss functions; as the name implies, PLS was created for least squares regression. Recently we have developed a novel dimensionality reduction method called Boosted Latent Factors (BLF) (Momma and Bennett, 2005). For any given loss function, BLF creates latent variables, or principal components, similar to those produced by PLS and PCA. We have extended BLF to ranking loss functions with great success. BLF can use the kernel approach of SVM and KPLS to construct highly nonlinear ranking functions. For the least squares loss, BLF reduces to PLS, but we can now rapidly create learning methods for any convex loss function while maintaining the many benefits of PLS. For example, all of the feature selection and causal methodologies discussed in the Causal Chemometrics Modeling Module can be readily adapted to BLF. The 1-norm SVM feature selection and model interpretation methods developed for cheminformatics and chromatography can also be adapted to the BLF selection framework (Breneman et al., 2003).
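To make the idea of swapping the loss function concrete, the following is a minimal sketch of a linear pairwise ranker trained with a hinge loss on protein pairs, in the spirit of the ranking SVM formulation cited above. It is not the BLF method and not the authors' implementation. The descriptor matrix, the retention times, the function names (`pairwise_differences`, `fit_linear_ranker`, `pairwise_accuracy`), and all hyperparameters are illustrative assumptions, and the data are synthetic.

```python
import numpy as np

def pairwise_differences(X, y):
    """Return one difference vector per pair of samples with distinct targets,
    oriented so that the sample with the larger target comes first."""
    i, j = np.triu_indices(len(y), k=1)
    keep = y[i] != y[j]
    i, j = i[keep], j[keep]
    sign = np.sign(y[i] - y[j])[:, None]
    return sign * (X[i] - X[j])

def fit_linear_ranker(X, y, lam=1e-3, lr=0.1, n_iter=1000):
    """Minimize mean pairwise hinge loss plus an L2 penalty by subgradient descent.
    lam, lr and n_iter are arbitrary illustrative values, not tuned settings."""
    D = pairwise_differences(X, y)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        active = D @ w < 1.0  # pairs that violate the unit margin
        grad = lam * w - D[active].sum(axis=0) / len(D)
        w -= lr * grad
    return w

def pairwise_accuracy(w, X, y):
    """Fraction of pairs whose predicted order matches the true order."""
    D = pairwise_differences(X, y)
    return float(np.mean(D @ w > 0))

# Synthetic stand-in for protein descriptors and retention times (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
w_true = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
y = X @ w_true

w = fit_linear_ranker(X, y)
acc = pairwise_accuracy(w, X, y)
print(f"pairwise ordering accuracy: {acc:.3f}")
```

The model is scored only on whether each pair is ordered correctly, so it never has to reproduce the retention times themselves. A kernel or latent-factor variant would change how the scoring function is represented while keeping the same pairwise loss.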
