“Ensemble methods avoid overfitting by utilizing components that read irrelevant data differently (canceling out noise) but read relevant inputs similarly (enhancing underlying signals).”

“Overfitting” has two aspects: (a) bias and (b) variance. The logit regression has more bias but less variance; the forest has less bias but more variance. Typically generalization error is measured as the expected squared error between prediction and actual on out-of-sample data, and it decomposes into three components: (1) squared bias, (2) variance, and (3) irreducible error inherent in the data set. It is possible, for instance, that the data is so noisy that no single model, and no ensemble of models, will perform well. In such cases the irreducible error dwarfs the other two components.
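The decomposition above can be estimated by simulation. This is a minimal sketch, not tied to any model from the discussion: it assumes a made-up true signal sin(x) with Gaussian noise, and uses polynomial fits of two degrees as stand-ins for a high-bias model (degree 1, like the logit regression) and a high-variance model (degree 8, like the forest). Averaging many refits estimates the squared bias and variance at each test point; the noise variance is the irreducible error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup (illustrative only): true signal f(x) = sin(x), noise std sigma.
f = np.sin
sigma = 0.3
x_test = np.linspace(0, 3, 50)

def fit_predict(degree, n_train=30):
    """Fit a polynomial of the given degree on a fresh noisy training sample."""
    x = rng.uniform(0, 3, n_train)
    y = f(x) + rng.normal(0, sigma, n_train)
    coefs = np.polyfit(x, y, degree)
    return np.polyval(coefs, x_test)

def decompose(degree, n_models=500):
    """Estimate squared bias, variance, and irreducible error by refitting many times."""
    preds = np.array([fit_predict(degree) for _ in range(n_models)])
    bias_sq = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance, sigma ** 2  # sigma^2 is the irreducible error

# A rigid model (degree 1) shows high bias, low variance;
# a flexible one (degree 8) shows low bias, high variance.
for d in (1, 8):
    b2, var, noise = decompose(d)
    print(f"degree {d}: bias^2={b2:.3f}  variance={var:.3f}  noise={noise:.3f}")
```

The expected squared error of either model is approximately the sum of its three printed components; no amount of model averaging touches the third one.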

An extreme case of high variance and zero bias is memorizing the training set: given an input, return the output associated with the nearest matching input in the training set. To the degree the training set perfectly predicts the out-of-sample data, the bias will also be small. But clearly, if the data has any noise, this will not work well. On the other hand, if the training outputs were the smoothed output of another model, it might.
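A 1-nearest-neighbor lookup makes this concrete. The sketch below assumes the same kind of toy setup as before (a hypothetical sin(x) signal, not anything from the original discussion): memorizing noisy labels reproduces the noise verbatim out of sample, while memorizing denoised labels, as if another model had already smoothed them, generalizes well.

```python
import numpy as np

rng = np.random.default_rng(1)

def nearest_neighbor_predict(x_train, y_train, x_query):
    """Memorize the training set: return the output of the closest training input."""
    idx = np.abs(x_train[:, None] - x_query[None, :]).argmin(axis=0)
    return y_train[idx]

# Hypothetical 1-D regression task: true signal sin(x), noisy observed labels.
x_train = rng.uniform(0, 3, 200)
sigma = 0.5
y_noisy = np.sin(x_train) + rng.normal(0, sigma, 200)

x_test = rng.uniform(0, 3, 1000)
y_test = np.sin(x_test)

# Memorizing noisy labels: the label noise carries straight into the predictions,
# so out-of-sample squared error stays near sigma^2.
err_noisy = np.mean((nearest_neighbor_predict(x_train, y_noisy, x_test) - y_test) ** 2)

# Memorizing smoothed labels (as if another model had denoised them first):
# the lookup now interpolates a clean function and does far better.
err_smooth = np.mean((nearest_neighbor_predict(x_train, np.sin(x_train), x_test) - y_test) ** 2)

print(f"memorize noisy: {err_noisy:.3f}   memorize smoothed: {err_smooth:.5f}")
```

Both lookups have zero error on the training points themselves; only the smoothed version survives contact with fresh data.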

These and similar discussions are nicely covered in: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/AlexZhao/Overfitting_0416_updated.pdf
