Publications:A novel approach to estimate proximity in a random forest : An exploratory study

From ISLAB/CAISR

Title A novel approach to estimate proximity in a random forest : An exploratory study
Author Cristofer Englund and Antanas Verikas
Year 2012
PublicationType Journal Paper
Journal Expert Systems with Applications
HostPublication
Conference
DOI http://dx.doi.org/10.1016/j.eswa.2012.05.094
Diva url http://hh.diva-portal.org/smash/record.jsf?searchId=1&pid=diva2:548335
Abstract A data proximity matrix is an important information source in random forest (RF)-based data mining, including data clustering, visualization, outlier detection, substitution of missing values, and finding mislabeled data samples. A novel approach to estimate proximity is proposed in this work. The approach is based on measuring the distance between two terminal nodes in a decision tree. To assess the consistency (quality) of the data proximity estimate, we suggest using the proximity matrix as a kernel matrix in a support vector machine (SVM), under the assumption that a matrix of higher quality leads to higher classification accuracy. It is experimentally shown that the proposed approach improves the proximity estimate, especially when the RF is made of a small number of trees. It is also demonstrated that, for some tasks, an SVM exploiting the suggested proximity-matrix-based kernel outperforms an SVM based on a standard radial basis function kernel and one based on the standard proximity-matrix-based kernel. © 2012 Elsevier Ltd. All rights reserved.
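The abstract does not give the exact formula for the proposed measure, so the following is only a minimal sketch contrasting the standard RF proximity (the fraction of trees in which two samples reach the same terminal node) with one hypothetical distance-based variant. It assumes each tree is stored as a parent array (root marked by -1) and weights every tree by exp(-lam * d), where d is the number of edges between the two samples' terminal nodes. The function names, the parent-array representation, the parameter `lam`, and the exponential weighting are illustrative assumptions, not the authors' definition.

```python
from math import exp


def _ancestors(parent, node):
    """Path from `node` up to the root (inclusive), as a list of node ids."""
    path = [node]
    while parent[node] != -1:
        node = parent[node]
        path.append(node)
    return path


def leaf_distance(parent, a, b):
    """Number of edges between nodes a and b of a tree given as a parent array."""
    up_a = _ancestors(parent, a)
    steps_from_a = {n: k for k, n in enumerate(up_a)}
    # Walking up from b, the first node that is also an ancestor of a is the
    # lowest common ancestor; the path length is the sum of both climbs.
    for k, n in enumerate(_ancestors(parent, b)):
        if n in steps_from_a:
            return steps_from_a[n] + k
    raise ValueError("nodes are not in the same tree")


def standard_proximity(leaves):
    """leaves[t][i] = terminal node of sample i in tree t.
    Returns the fraction of trees in which samples i and j share a terminal node."""
    T, n = len(leaves), len(leaves[0])
    P = [[0.0] * n for _ in range(n)]
    for tree_leaves in leaves:
        for i in range(n):
            for j in range(n):
                if tree_leaves[i] == tree_leaves[j]:
                    P[i][j] += 1.0 / T
    return P


def distance_proximity(parents, leaves, lam=1.0):
    """Hypothetical variant: average over trees of exp(-lam * d), where d is the
    edge count between the terminal nodes reached by samples i and j."""
    T, n = len(leaves), len(leaves[0])
    P = [[0.0] * n for _ in range(n)]
    for parent, tree_leaves in zip(parents, leaves):
        for i in range(n):
            for j in range(n):
                d = leaf_distance(parent, tree_leaves[i], tree_leaves[j])
                P[i][j] += exp(-lam * d) / T
    return P


# Toy forest (hypothetical): two trees given as parent arrays, root = -1.
parents = [[-1, 0, 0, 1, 1, 2, 2], [-1, 0, 0]]
# leaves[t][i]: terminal node reached by sample i in tree t.
leaves = [[3, 4, 5], [1, 1, 2]]
P = standard_proximity(leaves)
Q = distance_proximity(parents, leaves, lam=1.0)
print(P[0][2], round(Q[0][2], 4))  # 0.0 0.0768
```

In this toy forest, samples 0 and 2 never share a terminal node, so their standard proximity is 0, whereas the distance-based variant still assigns them a small positive value. This illustrates how the standard estimate can be coarse when only a few trees are available. As `lam` grows, each exp(-lam * d) term tends to 1 when d = 0 and to 0 otherwise, so the variant approaches the standard proximity. A matrix of either kind can be passed to an SVM as a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`, which is the kind of evaluation the abstract describes.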