Gradient boosting, residuals, and prediction range
Short answer
Gradient boosting adds trees fitted to the negative gradient of the loss, not to the raw targets. Because a prediction is a sum of many such updates rather than an average of targets, a boosted regressor can predict values outside the range of the training targets.
Full breakdown
Gradient boosting builds an additive model. It starts from an initial prediction, then repeatedly fits a weak learner to the negative gradient of the loss with respect to the current predictions. For MSE, the negative gradient is proportional to the residual y - y_hat (and exactly equal to it for the loss 1/2 (y - y_hat)^2), so the intuition of fitting residuals is correct for that loss.
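The loop described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: it assumes 1-D inputs, depth-1 trees (stumps), squared-error loss, and the mean target as the initial prediction. The names `fit_stump`, `fit_boosting`, and the toy data are hypothetical.

```python
from statistics import mean

def fit_stump(xs, targets):
    """Fit a depth-1 regression tree on 1-D inputs by minimising squared error."""
    best = None
    cuts = sorted(set(xs))
    for lo, hi in zip(cuts, cuts[1:]):
        threshold = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = (sum((t - left_value) ** 2 for t in left)
               + sum((t - right_value) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]  # (threshold, left_value, right_value)

def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def fit_boosting(xs, ys, n_trees=20, learning_rate=0.3):
    init = mean(ys)                      # initial prediction
    preds = [init] * len(xs)
    stumps = []
    for _ in range(n_trees):
        # Negative gradient of 1/2 * (y - pred)^2 is the residual y - pred.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)  # the tree is fitted to residuals, not to y
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return init, learning_rate, stumps

def predict(model, x):
    init, learning_rate, stumps = model
    return init + learning_rate * sum(stump_predict(s, x) for s in stumps)

def mse(ys, preds):
    return mean((y - p) ** 2 for y, p in zip(ys, preds))

# Toy data, invented for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.5, 3.0, 3.5, 6.0, 6.5, 9.0, 9.5]

model = fit_boosting(xs, ys)
baseline_mse = mse(ys, [mean(ys)] * len(ys))
boosted_mse = mse(ys, [predict(model, x) for x in xs])
print(baseline_mse, boosted_mse)
```

Each round lowers the training error relative to the constant baseline, because every stump moves the predictions a fraction of the way toward the current residuals.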
The leaf values in later trees are not averages of the original y values. They are fitted updates: typically mean negative gradients or Newton-style leaf estimates, depending on the implementation and objective. The final prediction is the initial value plus the learning-rate-scaled contributions from all trees.
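As a worked example of that sum, the sketch below decomposes one prediction into its parts. The ensemble is hypothetical: the thresholds, leaf values, initial value, and learning rate are invented for illustration and do not come from any fitted model.

```python
# Hypothetical fitted ensemble (all numbers invented for illustration).
# Each stump is (threshold, left_update, right_update).
init = 5.0            # initial prediction, e.g. the mean target
learning_rate = 0.1
stumps = [
    (2.5, -3.0, 1.8),
    (4.5, -0.9, 2.1),
    (6.5, -0.4, 2.6),
]

def contributions(x):
    """Learning-rate-scaled update that each tree adds for input x."""
    return [learning_rate * (left if x <= threshold else right)
            for threshold, left, right in stumps]

def predict(x):
    # Final prediction = initial value + sum of scaled tree outputs.
    return init + sum(contributions(x))

print(contributions(7.0))
print(predict(7.0))
print(predict(1.0))
```

For x = 7.0 the three trees add roughly 0.18, 0.21, and 0.26 to the initial 5.0, giving about 5.65. Note that some leaf values are negative: they are corrections to the current prediction, which is why they cannot be read as averages of the targets.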
A random forest regressor stores averages of target values in its leaves and then averages across trees, so with standard settings its predictions stay within the range of the training targets. Gradient boosting is a sum of updates: it can overshoot and predict outside the observed target range, especially with many trees, a high learning rate, or objectives that allow such updates.
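The contrast can be reproduced on a four-point toy problem. The sketch below is an assumption-laden illustration: the data (y = OR(a, b)) is invented, trees are stumps on binary features, and a plain average of two stumps fitted to y stands in for a random forest (no bootstrapping or feature sampling).

```python
from statistics import mean

# Toy data: y = OR(a, b), so all training targets lie in [0, 1].
X = [(0, 0), (1, 0), (0, 1), (1, 1)]
y = [0.0, 1.0, 1.0, 1.0]

def fit_stump(targets, feature):
    """Depth-1 tree on a binary feature; each leaf stores the mean of `targets`."""
    left = mean(t for row, t in zip(X, targets) if row[feature] == 0)
    right = mean(t for row, t in zip(X, targets) if row[feature] == 1)
    return feature, left, right

def stump_predict(stump, row):
    feature, left, right = stump
    return left if row[feature] == 0 else right

def sse(stump, targets):
    return sum((t - stump_predict(stump, row)) ** 2 for row, t in zip(X, targets))

# Averaging ensemble (simplified stand-in for a random forest):
# leaves hold averages of y, and the trees are averaged.
averaging_stumps = [fit_stump(y, feature) for feature in (0, 1)]
averaged = [mean(stump_predict(s, row) for s in averaging_stumps) for row in X]

# Boosting with squared error: each stump is fitted to the current residuals
# and its scaled output is added to the running prediction.
learning_rate = 0.5
boosted = [mean(y)] * len(X)
for _ in range(20):
    residuals = [t - p for t, p in zip(y, boosted)]
    stump = min((fit_stump(residuals, f) for f in (0, 1)),
                key=lambda s: sse(s, residuals))
    boosted = [p + learning_rate * stump_predict(stump, row)
               for p, row in zip(boosted, X)]

print("averaged:", averaged)
print("boosted: ", boosted)
print("target range:", min(y), max(y))
```

The averaged predictions are convex combinations of targets and stay in [0, 1]. The boosted sum of stumps behaves like an additive model in a and b, and its prediction for (1, 1) ends up above the largest training target.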
Theory
The word "gradient" matters: boosting performs gradient descent on the loss in function space, whereas bagging averages target values.
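To make the role of the loss concrete, the sketch below computes the targets that the next tree would be fitted to under two losses. The function names and numbers are hypothetical; the formulas assume the losses 1/2 (y - pred)^2 and |y - pred|.

```python
def negative_gradient_squared_error(y, pred):
    # L = 1/2 * (y - pred)^2  ->  -dL/dpred = y - pred (the residual)
    return [yi - pi for yi, pi in zip(y, pred)]

def negative_gradient_absolute_error(y, pred):
    # L = |y - pred|  ->  -dL/dpred = sign(y - pred)
    return [(yi > pi) - (yi < pi) for yi, pi in zip(y, pred)]

# Invented targets and current predictions; the second point is an outlier.
y = [3.0, 10.0, -2.0]
pred = [2.5, 4.0, 0.0]

squared_targets = negative_gradient_squared_error(y, pred)
absolute_targets = negative_gradient_absolute_error(y, pred)
print(squared_targets)
print(absolute_targets)
```

Under squared error the outlier produces a fitting target of 6.0, while under absolute error it contributes only a sign. The next tree therefore sees different targets depending on the loss, which is the sense in which it fits gradients rather than raw y values.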
Common mistakes
- Saying that each boosting leaf stores only averaged y values.
- Explaining gradient boosting exactly like a random forest.
- Forgetting the learning rate in the additive prediction.
- Assuming tree-based regressors can never predict outside the target range.
How to answer in an interview
- For MSE, derive the residual as the negative gradient.
- Contrast boosting with random forest leaf averaging.
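The derivation in the first point fits in a few lines. The notation is an assumption made for this sketch: $\hat{y}_{m-1}(x)$ is the ensemble prediction after $m-1$ trees, $h_m$ is the $m$-th tree, and $\nu$ is the learning rate.

```latex
\begin{align*}
L(y, \hat{y}) &= \tfrac{1}{2}\,(y - \hat{y})^2 \\
\frac{\partial L}{\partial \hat{y}} &= -(y - \hat{y}) \\
r_{i,m} = -\left.\frac{\partial L(y_i, \hat{y})}{\partial \hat{y}}\right|_{\hat{y} = \hat{y}_{m-1}(x_i)}
  &= y_i - \hat{y}_{m-1}(x_i) \\
\hat{y}_m(x) &= \hat{y}_{m-1}(x) + \nu\, h_m(x), \qquad h_m \text{ fitted to } \{(x_i, r_{i,m})\}
\end{align*}
```

Without the factor $\tfrac{1}{2}$ the negative gradient is $2\,(y_i - \hat{y}_{m-1}(x_i))$, which is still proportional to the residual.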