Model compression and catastrophic forgetting
Answer it yourself
First formulate your answer as you would in an interview, then read the breakdown and grade yourself.
Short answer
Compression options include quantization, pruning and distillation; adaptation can use LoRA/adapters. Forgetting is detected on held-out general benchmarks and reduced with replay data, regularization toward the base model, and parameter-efficient fine-tuning.
Full breakdown
Compression and adaptation are different levers. Quantization reduces numeric precision, pruning removes weights or whole structures, distillation trains a smaller student from a stronger teacher, and LoRA/adapters train a small number of added parameters while the base weights stay frozen.
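To make two of these levers concrete, here is a minimal sketch in plain Python. It assumes symmetric per-tensor int8 quantization (one of several possible schemes) and a single weight matrix adapted with a rank-r LoRA update. The function names `quantize_int8`, `dequantize` and `lora_param_count`, the toy weights and the 4096 x 4096 layer size are illustrative assumptions, not taken from any specific library.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integers and the shared scale."""
    return [v * scale for v in q]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters of a rank-r update B (d_out x r) @ A (r x d_in)."""
    return rank * (d_in + d_out)


# Toy weights (hypothetical values) to show the rounding error of quantization.
weights = [0.5, -1.27, 0.031, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Hypothetical 4096 x 4096 projection: full fine-tuning vs. a rank-8 LoRA update.
full_params = 4096 * 4096
lora_params = lora_param_count(4096, 4096, rank=8)
```

The rounding error per weight is bounded by half the scale, which is the precision that quantization gives up. The LoRA count shows why adaptation is cheap: the rank-8 update trains a fraction of a percent of the parameters in the full matrix.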
Catastrophic forgetting shows up when fine-tuning improves the new domain but degrades general capabilities or old-domain benchmarks. You need an evaluation suite with both new-domain and old-domain tasks, plus slices for safety-critical or business-critical behavior.
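A regression check over such a suite can be sketched as follows. This is a minimal illustration, assuming every task reports a higher-is-better score in [0, 1]. The task names, the scores and the `max_drop` threshold of 0.02 are hypothetical placeholders, not measured results.

```python
def find_regressions(base_scores, tuned_scores, max_drop=0.02):
    """Return tasks whose score fell by more than max_drop after fine-tuning."""
    regressions = {}
    for task, base in base_scores.items():
        # A task missing from tuned_scores raises KeyError on purpose:
        # an old-domain check that was silently skipped should not pass.
        drop = base - tuned_scores[task]
        if drop > max_drop:
            regressions[task] = round(drop, 4)
    return regressions


# Hypothetical scores before and after fine-tuning on the new domain.
base_scores = {"new_domain_qa": 0.55, "general_qa": 0.71, "safety_refusals": 0.98}
tuned_scores = {"new_domain_qa": 0.80, "general_qa": 0.64, "safety_refusals": 0.97}

regressions = find_regressions(base_scores, tuned_scores)
print(regressions)
```

In this toy data the new-domain score improves while `general_qa` drops by 0.07, which is the pattern described above. For safety-critical slices a stricter per-task threshold may be more appropriate than one global `max_drop`.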
Common mitigations are replaying a controlled mixture of old-domain data, using teacher logits or reference answers, regularizing the new model toward the base model, lowering learning rates, early stopping, and using LoRA or adapters instead of full fine-tuning. The trade-off should be measured: do not preserve old capabilities so aggressively that the model fails the target domain.
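Two of these mitigations, replay mixing and regularization toward the base model, can be sketched in plain Python. This is an illustration under stated assumptions: examples are arbitrary Python objects, weights are flat lists of floats, and the penalty is a simple squared L2 distance. The names `mix_with_replay` and `l2_to_base_penalty`, the 20% replay fraction and the penalty strength are hypothetical choices to be tuned against the evaluation suite.

```python
import random


def mix_with_replay(new_examples, old_examples, replay_fraction, seed=0):
    """Build a training set in which about replay_fraction of items are old-domain."""
    if not 0 <= replay_fraction < 1:
        raise ValueError("replay_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_replay / (n_new + n_replay) = replay_fraction for n_replay.
    n_replay = round(len(new_examples) * replay_fraction / (1 - replay_fraction))
    # Cap at the available old-domain data; sampling is without replacement.
    n_replay = min(n_replay, len(old_examples))
    mixed = list(new_examples) + rng.sample(list(old_examples), n_replay)
    rng.shuffle(mixed)
    return mixed


def l2_to_base_penalty(weights, base_weights, strength):
    """Extra loss term: strength * sum((w - w_base)^2), pulling weights toward the base."""
    return strength * sum((w - b) ** 2 for w, b in zip(weights, base_weights))


# Hypothetical data: 80 new-domain examples, 100 old-domain examples, 20% replay.
new_examples = [("new", i) for i in range(80)]
old_examples = [("old", i) for i in range(100)]
mixed = mix_with_replay(new_examples, old_examples, replay_fraction=0.2)
n_old = sum(1 for domain, _ in mixed if domain == "old")

# The penalty would be added to the task loss during fine-tuning.
penalty = l2_to_base_penalty([1.0, 2.0], [1.0, 1.5], strength=0.1)
```

Both knobs express the trade-off named above: a larger `replay_fraction` or `strength` protects old capabilities more but leaves less room to fit the target domain, so they should be chosen by looking at new-domain and old-domain metrics together.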
Theory
Forgetting is an evaluation problem first: without old-domain checks, fine-tuning regressions are invisible.
Common mistakes
- Only evaluate on the new domain.
- Treat LoRA as a guarantee against all forgetting.
- Distill without checking that the teacher is reliable on the target data.
How to answer in an interview
- Say “new-domain metrics plus old-domain regression suite”.
- Name replay data and parameter-efficient tuning as mitigations.