Open Conference Systems, ITC 2016 Conference

PAPER: Comparison of Variable Importance Measures from Tree-Based Models
Chansoon Lee, Wei-Yin Loh

Building: Pinnacle
Room: 3F-Port of Hong Kong
Date: 2016-07-03 03:30 PM – 05:00 PM
Last modified: 2016-05-21

Abstract


Because they account for interactions between predictors and reduce the order effects of recursive variable selection in a single tree, variable importance measures (VIMs) based on ensembles of trees are often used as an exploratory screening tool in bioinformatics and related areas where the number of predictor variables exceeds the sample size. Several VIM algorithms have been proposed and compared with one another in previous research. However, no study has examined a wide variety of VIMs together. The current study fills this gap by comparing several VIMs on simulated and real data to better understand their performance under different conditions.

Using classification or regression tree models, a series of simulation studies is conducted with four different sets of data: 1) continuous and categorical predictors containing different numbers of categories and different degrees of association with an outcome variable; 2) an unbalanced outcome variable; 3) three perfectly correlated predictors in classification trees; and 4) highly correlated predictors with different degrees of association with an outcome variable in regression trees. With these simulated data and real data from an international large-scale assessment, this research compares eight different VIMs, including four permutation measures, two node-impurity measures, and inclusion proportions under ensembles of trees as well as importance scores based on chi-square statistics under a single tree.
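
For readers unfamiliar with permutation measures, one of the VIM families compared here, the following is a minimal sketch in Python. It is not the authors' implementation and does not reproduce any of the eight measures in the study. It assumes a toy bagged ensemble of one-split regression trees (stumps) and simulated data with one informative predictor and two noise predictors; every function name and data setting below is hypothetical.

```python
import random


def fit_stump(X, y):
    """Fit a one-split regression tree by minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    _, j, t, ml, mr = best
    return lambda row: ml if row[j] <= t else mr


def fit_bagged_stumps(X, y, n_trees=5, seed=0):
    """Average stumps fitted on bootstrap resamples (a toy tree ensemble)."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: sum(s(row) for s in stumps) / len(stumps)


def mse(model, X, y):
    return sum((model(row) - yi) ** 2 for row, yi in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE after shuffling one predictor column at a time."""
    rng = random.Random(seed)
    base = mse(model, X, y)
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            increases.append(mse(model, X_perm, y) - base)
        scores.append(sum(increases) / n_repeats)
    return scores


# Simulated data (assumed settings): x0 drives the outcome, x1 and x2 are noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(120)]
y = [(2.0 if row[0] > 0.5 else 0.0) + data_rng.gauss(0, 0.1) for row in X]

model = fit_bagged_stumps(X, y)
scores = permutation_importance(model, X, y)
for name, score in zip(["x0 (informative)", "x1 (noise)", "x2 (noise)"], scores):
    print(f"{name}: {score:.3f}")
```

In this toy setup the noise predictors are never chosen for a split, so shuffling them leaves the predictions unchanged and their score is zero, while shuffling the informative predictor raises the error. Real ensembles with deeper trees and correlated predictors behave less cleanly, which is the kind of condition the simulation designs above are meant to probe.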

This study finds that the VIMs generally and reliably identify predictors that are highly associated with an outcome variable as important. Moreover, the VIMs are not affected by unbalanced data, with the exception of two permutation measures. However, when the data contain highly correlated predictors, the VIMs mostly give different results from one another. The findings of this research can offer practitioners and scientists practical guidelines for successfully implementing VIMs to identify a set of important variables.

