Question · Medium · deep-learning · Technical interview · Tochka

Gradient stability, activations, skip connections and initialization

Short answer

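Gradients vanish or explode because backpropagation multiplies them by one per-layer derivative (Jacobian) after another: factors consistently below one shrink the gradient geometrically, and factors consistently above one blow it up. The remedies are non-saturating activations, normalization, residual paths, careful initialization, architecture choices that preserve signal, and gradient clipping, which helps only with explosions.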

Full breakdown

In backpropagation, the gradient reaching an early layer is a product of per-layer Jacobians. If typical derivatives or Jacobian norms are much smaller than one, gradients vanish. If they are much larger than one, gradients explode. Saturating activations such as sigmoid, whose derivative never exceeds 0.25 and approaches zero for large inputs, produce near-zero factors, while unscaled dot products or unstable recurrent dynamics can inflate norms.

Activation choices help. ReLU has a derivative of exactly one for positive inputs, so it does not saturate on that side, but a unit whose pre-activation stays negative receives zero gradient and stops learning (a "dead ReLU"). Leaky ReLU, GELU and related activations keep smoother or nonzero gradients in more regions.

Other controls matter as much. Residual or skip connections give gradients shorter paths around each block, normalization stabilizes activation distributions, Xavier or He-style initialization preserves variance early in training, and gradient clipping limits explosions.

In sequence models, LSTM-style gates historically helped preserve long-term signal. Transformers rely heavily on residual paths, normalization and scaled attention, where dividing the dot products by the square root of the key dimension keeps the logits from growing with model width.
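To make the multiplication argument concrete, here is a minimal sketch using a scalar toy network, not a real model. The function name `chain_gradient`, the input value 0.5, the weight 1.0 and the depth of 30 are all illustrative assumptions. It tracks the product of per-layer derivatives for a plain sigmoid chain and for the same chain with an identity skip path.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def chain_gradient(depth, weight, residual):
    """Return |d h_depth / d h_0| for a scalar toy network (hypothetical helper).

    Plain layer:    h_next = sigmoid(weight * h)
    Residual layer: h_next = h + sigmoid(weight * h)
    """
    h = 0.5     # arbitrary input value (assumption)
    grad = 1.0  # running product of per-layer derivatives
    for _ in range(depth):
        s = sigmoid(weight * h)
        local = weight * s * (1.0 - s)  # d/dh of sigmoid(weight * h)
        if residual:
            grad *= 1.0 + local  # the identity path adds 1 to the derivative
            h = h + s
        else:
            grad *= local        # each factor is at most 0.25 * |weight|
            h = s
    return abs(grad)


plain = chain_gradient(depth=30, weight=1.0, residual=False)
skip = chain_gradient(depth=30, weight=1.0, residual=True)
print(f"plain sigmoid chain: {plain:.3e}")
print(f"residual chain:      {skip:.3e}")
```

With a weight of 1.0, every factor in the plain chain is at most 0.25, so the product collapses geometrically with depth, while every factor in the residual chain is at least one. The toy omits normalization, so its activations grow with depth; it illustrates only the gradient path, not a trainable architecture.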

Theory

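Stable training means keeping both forward activations and backward gradients at a roughly constant scale from layer to layer, so that neither collapses toward zero nor grows without bound as depth increases.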
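One way to write this down, as a sketch under assumed notation that does not appear in the original answer ($h_l$ for the output of layer $l$, $J_l$ for its Jacobian, $F_l$ for a residual branch, $\mathcal{L}$ for the loss):

```latex
\begin{aligned}
h_l &= f_l(h_{l-1}), \qquad J_l = \frac{\partial h_l}{\partial h_{l-1}}, \qquad l = 1, \dots, L \\
\nabla_{h_0} \mathcal{L} &= J_1^{\top} J_2^{\top} \cdots J_L^{\top} \, \nabla_{h_L} \mathcal{L} \\
\lVert \nabla_{h_0} \mathcal{L} \rVert &\le \Bigl( \prod_{l=1}^{L} \lVert J_l \rVert \Bigr) \, \lVert \nabla_{h_L} \mathcal{L} \rVert \\
\text{residual layer: } h_l &= h_{l-1} + F_l(h_{l-1}) \;\Longrightarrow\; J_l = I + \frac{\partial F_l}{\partial h_{l-1}}
\end{aligned}
```

If every $\lVert J_l \rVert \le \gamma < 1$, the bound decays like $\gamma^L$, which is the vanishing case. If the Jacobians typically stretch vectors by more than one, the product can grow geometrically instead. With residual layers, expanding the product of $(I + \partial F_l / \partial h_{l-1})$ terms always leaves an identity term, so the gradient has a direct path whose factor is exactly one.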

Common mistakes

Calling gradient clipping a fix for vanishing gradients; clipping only caps large norms.
Saying ReLU has no gradient problems; dead units receive zero gradient.
Ignoring initialization and normalization.
Explaining exploding gradients only as numeric overflow, not as unstable optimization.
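The first mistake is easy to check with a minimal sketch of clipping by global L2 norm. The helper name `clip_by_global_norm` and the example gradient values are assumptions made for illustration; deep learning frameworks ship their own clipping utilities, which this does not reproduce.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their L2 norm is at most max_norm; never scale up.

    Returns the (possibly rescaled) gradients and the original norm.
    """
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads), total  # small gradients pass through unchanged
    scale = max_norm / total
    return [g * scale for g in grads], total


exploding = [30.0, -40.0]  # L2 norm 50
vanishing = [3e-9, -4e-9]  # tiny L2 norm

clipped_big, norm_big = clip_by_global_norm(exploding, max_norm=1.0)
clipped_small, norm_small = clip_by_global_norm(vanishing, max_norm=1.0)

print(clipped_big, norm_big)
print(clipped_small, norm_small)
```

The large gradient is shrunk to the threshold while keeping its direction, and the tiny gradient comes back untouched. Clipping therefore bounds the step size during explosions but does nothing to restore a signal that has already vanished.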

How to answer in the interview

Separate the remedies for vanishing gradients from those for exploding gradients.
Name residual connections as the main answer for very deep networks.