Question · Medium · ml-fundamentals · Technical interview · QIC

How to detect overfitting and how to regularize against it

Short answer

Overfitting shows up as train quality that keeps improving while validation quality stalls or gets worse. Reduce it with more or better data, regularization, dropout, augmentation, early stopping, simpler models and ensembles.
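The symptom above can be checked mechanically from logged loss histories. The following is a minimal sketch, not a standard API: the function name `find_overfitting_onset` and the sample loss values are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def find_overfitting_onset(
    train_loss: Sequence[float], val_loss: Sequence[float]
) -> Optional[int]:
    """Return the epoch index with the best validation loss if, after it,
    train loss kept improving while validation loss got worse.
    Return None when no such divergence is visible."""
    if len(train_loss) != len(val_loss) or not val_loss:
        raise ValueError("loss histories must be non-empty and of equal length")
    # Epoch with the lowest validation loss (first one on ties).
    best = min(range(len(val_loss)), key=val_loss.__getitem__)
    if best == len(val_loss) - 1:
        return None  # validation is still at its best on the last epoch
    train_improved = train_loss[-1] < train_loss[best]
    val_worsened = val_loss[-1] > val_loss[best]
    return best if (train_improved and val_worsened) else None


# Illustrative numbers only, not real measurements.
train = [0.90, 0.60, 0.40, 0.30, 0.22, 0.15]
val = [0.95, 0.70, 0.55, 0.52, 0.58, 0.66]
onset = find_overfitting_onset(train, val)  # index of the best validation epoch
```

In practice the curves are noisy, so a smoothed validation curve or a tolerance on the gap would likely be needed before trusting a single index.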

Full breakdown

The canonical symptom is a growing gap between train and validation metrics: train loss keeps improving while validation loss, or the business metric, gets worse. Also check cross-validation variance, segment-level metrics, and whether performance collapses on newer or shifted data.

Mitigations are data-side and model-side.

Data-side: collect more representative data, clean labels, add augmentation or synthetic data when valid, and use a validation split that mirrors how the model will be used.

Model-side: L1/L2 regularization, dropout, early stopping, reducing model capacity, pruning features and ensembling, plus calibration where predicted probabilities matter.

For neural networks, dropout, weight decay, augmentation and early stopping are common. For tabular/classical models, limits on tree depth and minimum leaf size, regularized linear models and robust validation are often more important.
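Of the model-side mitigations, early stopping is the easiest to show in isolation. The sketch below assumes validation loss is evaluated once per epoch; the function name `early_stopping_epoch` and the `patience` and `min_delta` parameters are illustrative choices, not taken from any particular framework.

```python
from typing import Iterable, Tuple


def early_stopping_epoch(
    val_losses: Iterable[float], patience: int = 2, min_delta: float = 0.0
) -> Tuple[int, float]:
    """Consume validation losses epoch by epoch and stop once `patience`
    consecutive epochs fail to improve on the best loss by more than
    `min_delta`. Returns (best_epoch, best_loss)."""
    best_epoch, best_loss, waited = -1, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best: remember it and reset the patience counter.
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # later epochs are never evaluated
    return best_epoch, best_loss


# Illustrative numbers only.
history = [0.90, 0.70, 0.60, 0.62, 0.61, 0.50]
stopped = early_stopping_epoch(history, patience=2)
```

With `patience=2` the loop stops before the last epoch is seen, which shows the trade-off: a small patience saves compute but can miss a late improvement. A real training loop would also restore the weights saved at the best epoch.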

Theory

Overfitting is memorization of training-specific patterns, including noise, that do not generalize to unseen data.

Common mistakes

Comparing train and test accuracy only once instead of tracking both throughout training.
Switching to a larger model when validation quality is already falling.
Using a random split when the real production split is temporal or segment-based.
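The last mistake is avoided by splitting on time rather than at random. A minimal sketch, assuming each row has a numeric timestamp; `temporal_split` and its parameters are hypothetical names used only for illustration.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def temporal_split(
    rows: Sequence[T], timestamps: Sequence[float], val_fraction: float = 0.2
) -> Tuple[List[T], List[T]]:
    """Split rows so that every validation row is at least as recent as
    every training row, mimicking 'train on the past, predict the future'."""
    if len(rows) != len(timestamps):
        raise ValueError("rows and timestamps must have the same length")
    if len(rows) < 2 or not 0.0 < val_fraction < 1.0:
        raise ValueError("need at least 2 rows and 0 < val_fraction < 1")
    # Indices ordered from oldest to newest.
    order = sorted(range(len(rows)), key=lambda i: timestamps[i])
    # Keep at least one row on each side of the cut.
    n_val = min(len(rows) - 1, max(1, round(len(rows) * val_fraction)))
    cut = len(rows) - n_val
    train = [rows[i] for i in order[:cut]]
    val = [rows[i] for i in order[cut:]]
    return train, val


# Illustrative data: row labels with out-of-order timestamps.
rows = ["a", "b", "c", "d", "e"]
timestamps = [5, 1, 4, 2, 3]
train_rows, val_rows = temporal_split(rows, timestamps, val_fraction=0.4)
```

A segment-based production split would follow the same idea with a group key (for example, customer or region) in place of the timestamp, holding out whole groups rather than the most recent rows.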

How to answer in an interview

Say how you would notice overfitting from curves.
Match mitigation to data type and model class.