How to detect overfitting and how to regularize
Answer it yourself
First state your answer as you would in an interview, then open the full breakdown and grade yourself.
Short answer
Overfitting shows up as train quality that keeps improving while validation quality stalls or gets worse. Reduce it with more or better data, regularization, dropout, augmentation, early stopping, simpler models and ensembles.
Full breakdown
The canonical symptom is a growing gap between train and validation metrics: train loss keeps improving while validation loss or the business metric gets worse. You also look at cross-validation variance, segment-level metrics, and whether performance collapses on newer or shifted data.
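The gap check described above can be sketched in a few lines. This is a minimal illustration, not a standard API: the function name `overfitting_report` and the loss values are made up for the example, and the rule "train kept improving after the best validation epoch while validation got worse" is one simple heuristic among many.

```python
def overfitting_report(train_loss, val_loss):
    """Summarize the train/validation gap from per-epoch loss curves.

    Hypothetical helper: flags overfitting when, after the epoch with the
    best validation loss, train loss went down while validation loss went up.
    """
    if not train_loss or len(train_loss) != len(val_loss):
        raise ValueError("curves must be non-empty and of equal length")

    # Epoch with the lowest validation loss (first one on ties).
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    last = len(val_loss) - 1

    train_still_improving = train_loss[last] < train_loss[best_epoch]
    val_got_worse = val_loss[last] > val_loss[best_epoch]

    return {
        "best_epoch": best_epoch,
        "gap_at_best": val_loss[best_epoch] - train_loss[best_epoch],
        "gap_at_end": val_loss[last] - train_loss[last],
        "overfitting_suspected": train_still_improving and val_got_worse,
    }


# Illustrative curves: validation loss bottoms out at epoch 3, then rises.
train = [0.90, 0.60, 0.40, 0.25, 0.15, 0.08]
val = [0.95, 0.70, 0.55, 0.50, 0.56, 0.65]
report = overfitting_report(train, val)
print(report)
```

In practice the same comparison would be repeated per segment and per time slice, since an aggregate curve can hide a collapse on one part of the data.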
Mitigations are data-side and model-side. Data-side: collect more representative data, clean labels, add augmentation or synthetic data where it is valid for the task, and use a better validation split. Model-side: L1/L2 regularization, dropout, early stopping, reducing model capacity, pruning features and ensembling. Calibration does not reduce overfitting itself, but it corrects the overconfident probabilities that often come with it.
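To make the effect of L2 regularization concrete, here is a small sketch for the simplest possible case: a one-feature linear model without an intercept. Minimizing sum((y - w*x)^2) + lam * w^2 has the closed-form solution w = sum(x*y) / (sum(x*x) + lam), so a larger penalty pulls the weight toward zero. The function name `ridge_slope` and the data points are invented for the illustration.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form slope of a no-intercept linear fit with an L2 penalty.

    Minimizes sum((y - w * x) ** 2) + lam * w ** 2 over w.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Illustrative data lying close to y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# lam = 0 is ordinary least squares; larger lam shrinks the weight.
slopes = {lam: ridge_slope(xs, ys, lam) for lam in (0.0, 1.0, 10.0, 100.0)}
for lam, w in slopes.items():
    print(f"lam={lam:>5}: w={w:.3f}")
```

The penalty strength is a hyperparameter: it would normally be chosen on validation data, not set by hand as it is here.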
For neural networks, dropout, weight decay, augmentation and early stopping are common. For tabular/classical models, constraints such as tree depth, minimum leaf size, regularized linear models and robust validation are often more important.
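Early stopping is the easiest of these to show without a framework. The sketch below assumes a patience-based rule (stop once validation loss has failed to improve for `patience` consecutive epochs); the class `EarlyStopping` here is a hand-written illustration, not the callback of any particular library, and the loss values are made up.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def update(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # New best: remember it and reset the counter.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Stand-in for a training loop: one validation loss per epoch.
val_losses = [0.95, 0.70, 0.55, 0.50, 0.56, 0.65, 0.70, 0.80]
stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.update(epoch, loss):
        stopped_at = epoch
        break

print(f"stopped at epoch {stopped_at}, best epoch was {stopper.best_epoch}")
```

In a real loop the model weights from `best_epoch` would be checkpointed and restored after stopping, which this sketch leaves out.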
Theory
Overfitting is memorization of training-specific patterns that do not generalize to unseen data.
Common mistakes
- Comparing train and test accuracy only once instead of tracking both over training.
- Switching to a larger model when validation quality is already falling.
- Using a random split when the real production split is temporal or segment-based.
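The last point is worth a small sketch. Assuming each record carries a timestamp, a temporal split trains on the past and validates on the most recent slice, which mirrors how the model will be used in production. The function name `temporal_split`, the `day` field and the toy records are hypothetical.

```python
from datetime import date


def temporal_split(rows, time_key, valid_fraction=0.2):
    """Split rows so that every validation row is later than every train row.

    Hypothetical helper: sorts by `time_key` and holds out the most recent
    `valid_fraction` of rows (at least one) for validation.
    """
    if not 0 < valid_fraction < 1:
        raise ValueError("valid_fraction must be between 0 and 1")
    if len(rows) < 2:
        raise ValueError("need at least two rows to split")

    ordered = sorted(rows, key=time_key)
    n_valid = max(1, round(len(ordered) * valid_fraction))
    cut = len(ordered) - n_valid
    return ordered[:cut], ordered[cut:]


# Toy records deliberately stored out of chronological order.
rows = [{"day": date(2024, 1, d), "y": d % 2} for d in range(10, 0, -1)]
train_rows, valid_rows = temporal_split(rows, time_key=lambda r: r["day"])

print(len(train_rows), len(valid_rows))
print(max(r["day"] for r in train_rows), min(r["day"] for r in valid_rows))
```

With duplicate timestamps the cut can fall inside a single time step, so a stricter version would cut on a date boundary rather than on a row count.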
How to answer in an interview
- Say how you would notice overfitting from curves.
- Match mitigation to data type and model class.