Question · Easy · deep-learning-regularization · Technical interview · Navio

Question

Explain how dropout behaves during training and inference. Why does the implementation need scaling, and what is inverted dropout?

Short answer

During training dropout randomly zeroes activations; during inference it is disabled. Scaling keeps the expected activation magnitude consistent between train and inference.

Full explanation

Dropout is a training-time regularizer. On each forward pass it zeroes every activation independently with probability p, so the model cannot rely on one fixed set of neurons and behaves more like an ensemble of subnetworks. At inference time dropout is turned off, because predictions should be deterministic and should use the full network.

Without scaling, the expected activation magnitude would differ between the two modes: during training only a fraction 1 - p of the units contribute on average, so the next layer sees inputs about 1 - p times as large as it would at inference. In inverted dropout, which is common in frameworks such as PyTorch, the kept activations are divided by 1 - p during training, so inference needs no extra multiplication. The alternative convention is to leave training activations unscaled and multiply by 1 - p at inference. The key interview point is expectation matching, not the exact convention.
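The inverted convention described above can be sketched in plain Python. This is a minimal illustration, not any framework's actual implementation: the function name `inverted_dropout`, its `training` flag, and the optional `rng` argument are assumptions made for this example, and it works on a flat list of floats rather than tensors.

```python
import random


def inverted_dropout(activations, p, training, rng=None):
    """Hypothetical helper: inverted dropout on a list of floats.

    p is the DROP probability, so the keep probability is 1 - p.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        # Inference: dropout is disabled and no rescaling is needed.
        return list(activations)
    if rng is None:
        rng = random.Random()
    keep_prob = 1.0 - p
    # Keep each unit with probability 1 - p and scale the kept units by
    # 1 / (1 - p), so the expected output equals the input.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0] * 100_000
    y = inverted_dropout(x, p=0.3, training=True, rng=rng)
    # The training-time mean should be close to the input mean of 1.0.
    print(sum(y) / len(y))
    # In inference mode the input passes through unchanged.
    print(inverted_dropout([1.0, 2.0], p=0.3, training=False))
```

Under the alternative convention, the training branch would return the masked values without dividing, and the inference branch would multiply every activation by 1 - p instead.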

Theory

Dropout changes the train-time computation graph; scaling aligns expected activations across train and eval modes.
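The expectation-matching argument can be written out for a single activation. This is a sketch assuming an independent Bernoulli keep mask; the symbols m (mask) and \tilde{x} (training-time output) are introduced here for illustration, while p is the drop probability as in the rest of the answer.

```latex
% Inverted dropout: scale at training time, identity at inference.
m \sim \mathrm{Bernoulli}(1 - p), \qquad
\tilde{x}_{\text{train}} = \frac{m \, x}{1 - p}
\;\Longrightarrow\;
\mathbb{E}\big[\tilde{x}_{\text{train}}\big] = \frac{(1 - p)\, x}{1 - p} = x = x_{\text{eval}}

% Alternative convention: no scaling at training time, scale at inference.
\tilde{x}_{\text{train}} = m \, x
\;\Longrightarrow\;
\mathbb{E}\big[\tilde{x}_{\text{train}}\big] = (1 - p)\, x,
\qquad x_{\text{eval}} = (1 - p)\, x
```

In both conventions the inference-time value equals the training-time expectation; they differ only in where the factor 1 - p is applied.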

Common mistakes

Saying that dropout also randomly zeroes neurons during normal inference.
Forgetting the scaling convention.
Confusing the drop probability p with the keep probability 1 - p.

How to answer in an interview

State the training and inference behavior separately.
Mention inverted dropout if the interviewer asks about framework defaults.