When to try boosting for LTV prediction
Short answer
Try boosting when the important effects are nonlinear, thresholded, interaction-heavy, or driven by categorical features. Keep the linear baseline and promote boosting only if both validation and business metrics improve.
Full breakdown
Gradient boosting often helps tabular LTV problems because it can model nonlinear effects, thresholds and feature interactions without manually specifying every transformation. It is especially strong with categorical features, mixed numeric/categorical data, missing values and business rules such as country-channel-plan interactions.
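To make the threshold point concrete, here is a minimal sketch in pure Python. It is not a production implementation: the data is synthetic, the "LTV jumps once a user passes 7 sessions" rule is an assumption made up for illustration, and the booster is a toy built from one-split stumps on squared error. It compares a one-feature least-squares line against that toy booster on held-out data.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for a single feature: y ~ a + b * x."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    a = my - b * mx
    return lambda x: a + b * x

def fit_stump(xs, residuals):
    """Find the single split on x that minimises squared error of the residuals."""
    best = None
    for t in sorted(set(xs))[1:]:  # skip the minimum so the left side is never empty
        left = [r for x, r in zip(xs, residuals) if x < t]
        right = [r for x, r in zip(xs, residuals) if x >= t]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

def fit_boosting(xs, ys, n_rounds=20, lr=0.5):
    """Toy gradient boosting for squared loss: each stump fits the current residuals."""
    base = mean(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

def mse(model, xs, ys):
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

# Synthetic data (assumption): LTV is about 5 below 7 sessions and about 40 at or above.
rng = random.Random(0)

def make_data(n):
    xs = [rng.randint(0, 20) for _ in range(n)]
    ys = [(40.0 if x >= 7 else 5.0) + rng.gauss(0, 2) for x in xs]
    return xs, ys

train_x, train_y = make_data(300)
test_x, test_y = make_data(300)

linear_mse = mse(fit_line(train_x, train_y), test_x, test_y)
boosted_mse = mse(fit_boosting(train_x, train_y), test_x, test_y)
print(f"linear MSE: {linear_mse:.2f}, boosted MSE: {boosted_mse:.2f}")
```

The line has to spread one slope across a step, so it misses on both sides of the threshold, while the stumps can place a split directly at it. On real data the gap is usually smaller and has to be confirmed on held-out cohorts.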
It is less compelling if the dataset is tiny, the signal is mostly linear, latency or explainability requirements are strict, or the labels are too noisy to justify a more flexible model. A more flexible model can also overfit recent campaign artifacts.
The decision should be measured. Keep linear regression as the baseline, validate on later time periods or cohorts rather than a random split, and compare MSE/MAE alongside conservative business metrics. Inspect calibration and slice performance, then run an online or shadow evaluation if the offline result is meaningful. Model complexity is justified by stable lift, not by the fact that boosting is fashionable.
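The offline part of that procedure could be sketched roughly as follows. Everything here is hypothetical: the row fields (`cohort`, `ltv`, `pred_linear`, `pred_boost`), the toy numbers, and the 5% minimum relative gain are illustrative assumptions, and the predictions are assumed to come from models fit only on the training cohorts.

```python
from datetime import date
from statistics import mean

def time_split(rows, cutoff):
    """Train on cohorts acquired before the cutoff, validate on later ones."""
    train = [r for r in rows if r["cohort"] < cutoff]
    valid = [r for r in rows if r["cohort"] >= cutoff]
    return train, valid

def metrics(rows, pred_key):
    """MAE and MSE of one model's predictions against realised LTV."""
    errors = [r[pred_key] - r["ltv"] for r in rows]
    return {
        "mae": mean(abs(e) for e in errors),
        "mse": mean(e * e for e in errors),
    }

def should_promote(baseline, candidate, min_rel_gain=0.05):
    """Promote only if every tracked metric improves by at least min_rel_gain."""
    return all(
        candidate[name] <= baseline[name] * (1 - min_rel_gain)
        for name in baseline
    )

# Toy rows (assumption): predictions were produced by models trained on pre-cutoff cohorts.
rows = [
    {"cohort": date(2024, 1, 1), "ltv": 10.0, "pred_linear": 12.0, "pred_boost": 10.5},
    {"cohort": date(2024, 2, 1), "ltv": 30.0, "pred_linear": 27.0, "pred_boost": 29.0},
    {"cohort": date(2024, 3, 1), "ltv": 20.0, "pred_linear": 26.0, "pred_boost": 22.0},
    {"cohort": date(2024, 3, 1), "ltv": 5.0, "pred_linear": 9.0, "pred_boost": 6.0},
    {"cohort": date(2024, 4, 1), "ltv": 50.0, "pred_linear": 38.0, "pred_boost": 46.0},
    {"cohort": date(2024, 4, 1), "ltv": 8.0, "pred_linear": 12.0, "pred_boost": 7.0},
]

train, valid = time_split(rows, cutoff=date(2024, 3, 1))
baseline = metrics(valid, "pred_linear")
candidate = metrics(valid, "pred_boost")
promote = should_promote(baseline, candidate)
print(baseline, candidate, promote)
```

Requiring every metric to clear a margin, instead of any single one, is a deliberately conservative choice. A pass here is only a reason to move on to a shadow or online evaluation, not a reason to ship.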
Theory
Boosting is a strong default for tabular data, but its value comes from nonlinear and interaction signal that survives validation.
Common mistakes
- Switching to boosting without a baseline.
- Validating on a random split and accidentally leaking future cohort behavior.
- Ignoring calibration and performance at business thresholds.
- Assuming a more complex model is automatically better for production.
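One way to avoid the calibration mistake above is to bucket users by predicted LTV and compare mean predicted against mean realised value in each bucket. The sketch below is a minimal illustration under assumptions: the function name, the bucket count, and the toy numbers are hypothetical, and equal-size buckets are just one reasonable choice.

```python
from statistics import mean

def calibration_table(y_true, y_pred, n_buckets=4):
    """Sort by prediction, cut into equal-size buckets, compare means per bucket."""
    pairs = sorted(zip(y_pred, y_true))
    n = len(pairs)
    table = []
    for b in range(n_buckets):
        chunk = pairs[b * n // n_buckets:(b + 1) * n // n_buckets]
        if not chunk:
            continue  # more buckets than rows
        mean_pred = mean(p for p, _ in chunk)
        mean_true = mean(t for _, t in chunk)
        table.append({
            "bucket": b,
            "mean_pred": mean_pred,
            "mean_true": mean_true,
            # Ratio below 1 means the model under-predicts this bucket.
            "ratio": mean_pred / mean_true if mean_true else None,
        })
    return table

# Toy validation data (assumption): the highest-value users are under-predicted.
y_pred = [1.0, 2.0, 3.0, 4.0, 10.0, 12.0, 30.0, 40.0]
y_true = [1.0, 3.0, 2.0, 4.0, 8.0, 10.0, 45.0, 55.0]

table = calibration_table(y_true, y_pred)
for row in table:
    print(row)
```

A model can have acceptable overall MAE and still be badly off in the top bucket, which is often where bidding or budget thresholds sit. Running the same table per slice (for example per country or channel) covers the slice check as well.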
How to answer in an interview
- Name nonlinearities and interactions as the reason.
- Say how you would validate the model before rollout.