In this chapter, we describe tree-based methods for regression and classification. Tree-based methods are simple and useful for interpretation. However, they typically are not competitive with the best supervised learning approaches in terms of prediction accuracy. Hence in this chapter we also introduce bagging, random forests, and boosting. Each of these approaches involves producing multiple trees which are then combined to yield a single consensus prediction. We will see that combining a large number of trees can often result in dramatic improvements in prediction accuracy, at the expense of some loss in interpretation.
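To make the "combine many trees into one consensus prediction" idea concrete, here is a minimal, self-contained sketch of bagging for 1-D regression: each tree is just a depth-1 stump fit to a bootstrap resample, and the ensemble prediction is the average over all stumps. This is an illustrative toy (the data and function names are invented for the example), not the full random forest or boosting algorithm.

```python
import random

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: pick the split point on x that
    minimizes squared error, predicting the mean response on each side."""
    best = None
    for s in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= s]
        right = [y for x, y in zip(xs, ys) if x > s]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - ml) ** 2 for y in left)
               + sum((y - mr) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, s, ml, mr)
    if best is None:  # degenerate bootstrap sample: fall back to the mean
        m = sum(ys) / len(ys)
        return lambda x: m
    _, s, ml, mr = best
    return lambda x: ml if x <= s else mr

def bagged_predict(xs, ys, x_new, n_trees=100, seed=0):
    """Bagging: fit one stump per bootstrap resample, then average
    the individual predictions into a single consensus prediction."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(stump(x_new))
    return sum(preds) / n_trees

# Toy data with a jump between x = 3 and x = 4
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.1, 0.9, 3.0, 3.1, 2.9]
print(bagged_predict(xs, ys, 5))  # averaged prediction near the upper level
```

A random forest adds one more ingredient on top of this: each split considers only a random subset of the predictors, which decorrelates the trees and usually improves the averaged prediction further.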
Now you know how the Random Forest (RF) method works. One often reads claims such as: Random Forests "work well without tuning", there is "no need to scale or recode predictors", they "work well on high-dimensional data", they "cannot overfit", and so on.
In this section, you will find an excellent talk and slides examining these common claims about Random Forests, and asking whether RF really is the first-choice method for every data analysis.