Random Forest (page 4 of 5)

A few of the advantages of random forest are:

1) It is one of the most accurate learning algorithms available.
2) It runs efficiently on large databases.
3) It can handle thousands of input variables without variable deletion.
4) It gives estimates of which variables are important in the classification.
5) It generates an internal, unbiased estimate of the generalization error as the forest building progresses.
6) It has an effective method for estimating missing data and maintains accuracy when a large proportion of the data are missing.
7) It has methods for balancing error in data sets with unbalanced class populations.
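Two of the advantages above, the internal out-of-bag (OOB) generalization estimate and the built-in variable importance scores, can be seen directly in code. The following is a minimal sketch assuming scikit-learn is available; the data set is synthetic and the parameter values are illustrative, not prescriptive.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 10 input variables, of which 5 actually carry signal.
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, random_state=0)

# oob_score=True evaluates each tree on the samples left out of its
# bootstrap sample, yielding an internal estimate of generalization
# accuracy with no separate validation set.
forest = RandomForestClassifier(n_estimators=200, oob_score=True,
                                random_state=0)
forest.fit(X, y)

print(f"OOB accuracy: {forest.oob_score_:.3f}")

# feature_importances_ ranks the input variables by their mean decrease
# in impurity across all trees (the scores sum to 1).
print("Variable importances:", forest.feature_importances_.round(3))
```

Because the OOB estimate comes for free during training, it is often used in place of cross-validation when tuning the number of trees.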

A few of the disadvantages of random forests are:

1) Random forests have been observed to overfit on some datasets with noisy classification or regression tasks.
2) For data that include categorical variables with different numbers of levels, random forests may be biased in favor of the attributes with more levels. The variable importance scores from random forest are therefore not reliable for this type of data.
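The bias toward many-level attributes in disadvantage 2) can be illustrated with a small experiment. The sketch below assumes scikit-learn and uses entirely synthetic data: a binary feature that actually drives the label, and a 100-level categorical code that is pure noise (scikit-learn treats the integer codes as numeric, but the bias still appears because the many-valued column offers many candidate split points). Permutation importance on held-out data is one common, more reliable alternative to the impurity-based scores.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
informative = rng.integers(0, 2, n)   # binary feature that drives the label
high_card = rng.integers(0, 100, n)   # 100-level categorical code, pure noise

# 20% label noise gives the trees something to overfit via the noise column.
flip = rng.random(n) < 0.2
y = np.where(flip, 1 - informative, informative)
X = np.column_stack([informative, high_card])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_tr, y_tr)

# Impurity-based scores can credit the many-level noise feature, because
# its many split points let trees carve up the label noise.
print("Impurity importances:   ", forest.feature_importances_.round(3))

# Permutation importance on held-out data measures the real predictive
# contribution of each column and is not inflated by cardinality.
perm = permutation_importance(forest, X_te, y_te,
                              n_repeats=10, random_state=0)
print("Permutation importances:", perm.importances_mean.round(3))
```

On held-out data, shuffling the informative column costs real accuracy while shuffling the noise column costs essentially nothing, so the permutation scores recover the correct ranking even when the impurity-based scores do not.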