Random Forest (page 5 of 5)
Scikit-learn gives us many hyper-parameters that control the trees in the forest. We can set rules for how nodes are split and how deep each tree may grow. These constraints can reduce the risk of over-fitting while still preserving the "random" nature of the random forest. However, pruning or constraining the trees in a forest may introduce bias into the trained model.
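A minimal sketch of constraining tree growth with scikit-learn's RandomForestClassifier. The parameter names are scikit-learn's; the specific values and the synthetic data are illustrative assumptions, not the course's example.

```python
# Illustrative sketch: limiting tree depth and split rules to curb over-fitting.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real dataset (assumption for this sketch)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rf = RandomForestClassifier(
    max_depth=5,            # cap the depth of each tree
    min_samples_split=10,   # require at least 10 samples to split a node
    min_samples_leaf=5,     # require at least 5 samples in each leaf
    max_features="sqrt",    # keep the random feature subsetting at each split
    random_state=42,
)
rf.fit(X_train, y_train)
print("Test accuracy:", rf.score(X_test, y_test))
```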
In general, the more trees in the forest, the more robust the overall model, so the size of the forest (n_estimators) is one of the first hyper-parameters to tune. This is a powerful lever because a large number of relatively uncorrelated trees operating as a committee (an ensemble) will outperform any of the individual constituent models.
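One way to compare forest sizes is the out-of-bag (OOB) score, which scikit-learn exposes via oob_score=True. This is a sketch under the same assumed synthetic data as above; the grid of sizes is arbitrary.

```python
# Illustrative sketch: compare OOB accuracy as n_estimators grows.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

for n in [25, 50, 100, 300]:
    rf = RandomForestClassifier(
        n_estimators=n,
        oob_score=True,   # score each sample using only trees that did not train on it
        random_state=42,
    )
    rf.fit(X, y)
    print(f"n_estimators={n:>3}  OOB accuracy={rf.oob_score_:.3f}")
```

Gains typically flatten out past a few hundred trees, so the main cost of a larger forest is training and prediction time rather than accuracy.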
InClass Example: MBAs Employed At Graduation
Combined Decision Tree & Random Forest Example: Customer Loyalty Groups