Question · Medium · llm-finetuning · Interview materials · Apriori

How LoRA fine-tuning works

Short answer

LoRA freezes the base model and learns pairs of small low-rank matrices whose product is added to the weights of selected linear layers. Only the adapter weights receive gradients, so gradient and optimizer-state memory is much smaller than in full fine-tuning.

Full breakdown

A dense linear layer has a weight matrix W. Full fine-tuning updates W directly, which is expensive for large LLMs because a gradient and optimizer state must be stored for every updated parameter.
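
To make the cost concrete, here is a rough back-of-the-envelope sketch for a single layer. It assumes fp32 values and an Adam-style optimizer that keeps two moment estimates per parameter; the 4096 x 4096 layer size and the function name `full_finetune_bytes` are illustrative assumptions, and activation memory is ignored.

```python
def full_finetune_bytes(d_out: int, d_in: int, bytes_per_value: int = 4) -> dict:
    """Rough training-memory estimate for one dense layer.

    Assumes fp32 values and an Adam-style optimizer with two moment
    estimates per parameter. Activations are not counted.
    """
    n_params = d_out * d_in
    return {
        "weights": n_params * bytes_per_value,
        "gradients": n_params * bytes_per_value,
        "adam_moments": 2 * n_params * bytes_per_value,
    }


# Hypothetical 4096 x 4096 projection layer.
costs = full_finetune_bytes(4096, 4096)
total_mib = sum(costs.values()) / 2**20
print(f"{total_mib:.0f} MiB for a single layer")  # prints "256 MiB for a single layer"
```

Under these assumptions, three quarters of that total is gradients and optimizer state, which exist only because W is being trained.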

LoRA instead freezes W and learns a low-rank update Delta W = B A, where B is d_out x r, A is r x d_in, and the rank r is much smaller than either dimension of W. During the forward pass, the layer computes W x plus the adapter contribution B A x, usually scaled by a factor such as alpha / r, where alpha is the LoRA alpha hyperparameter. Common target layers are the attention projections and sometimes the MLP projections.
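
The forward pass described above can be sketched in NumPy. This is a minimal illustration, not a training implementation: the layer sizes, the `lora_forward` helper, and the alpha / r scaling convention are assumptions chosen for the example. B starts at zero and A at small random values, a common initialisation that makes the adapted layer match the base layer before any training step.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r, alpha = 64, 32, 4, 8  # illustrative sizes, not from the text

W = rng.normal(size=(d_out, d_in))          # frozen base weight
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-initialised


def lora_forward(x, W, A, B, alpha, r):
    """Return W x + (alpha / r) * B (A x) for a single input vector x."""
    base = W @ x
    # Apply A first, then B, so the d_out x d_in product B @ A is never materialised.
    delta = B @ (A @ x)
    return base + (alpha / r) * delta


x = rng.normal(size=d_in)
y = lora_forward(x, W, A, B, alpha, r)

# With B at zero the adapter contributes nothing, so the output matches the base layer.
assert np.allclose(y, W @ x)
```

Computing B (A x) instead of (B A) x keeps the extra work proportional to r rather than to the full size of W.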

Because only A and B are trainable, memory and compute for optimizer state are much lower. At deployment time, adapters can be kept separate and swapped per task/tenant, or merged into the base weights for simpler inference. The trade-off is that LoRA capacity depends on rank, target modules and data quality; it is not a replacement for all full fine-tuning cases.
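
The two deployment options can be compared in a small NumPy sketch. The tenant names, the `adapters` dictionary, and the helper functions are hypothetical and exist only for illustration. The adapters here are random rather than trained, and the alpha / r scaling is an assumed convention.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 32, 4, 8  # illustrative sizes
scale = alpha / r

W = rng.normal(size=(d_out, d_in))  # frozen base weight shared by all tenants

# Hypothetical per-tenant adapters: each is just a small (A, B) pair.
adapters = {
    name: (rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r)))
    for name in ("tenant_a", "tenant_b")
}


def forward_separate(x, tenant):
    """Keep the adapter separate: pick (A, B) per request."""
    A, B = adapters[tenant]
    return W @ x + scale * (B @ (A @ x))


def merge(tenant):
    """Fold one adapter into a copy of the base weight."""
    A, B = adapters[tenant]
    return W + scale * (B @ A)


x = rng.normal(size=d_in)
W_merged = merge("tenant_a")

# Merged and separate paths agree up to floating-point error.
assert np.allclose(W_merged @ x, forward_separate(x, "tenant_a"))

full_params = d_out * d_in        # trainable values in full fine-tuning
lora_params = r * (d_in + d_out)  # trainable values in A and B
print(f"trainable: {lora_params} vs {full_params}")  # prints "trainable: 384 vs 2048"
```

Keeping adapters separate lets one copy of W serve every tenant at the price of an extra low-rank product per request. Merging removes that extra product but yields one full-size weight matrix per adapter.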

Theory

LoRA is parameter-efficient fine-tuning through a learned low-rank delta on top of frozen base weights.

Common mistakes

  • Saying that LoRA trains a separate small model unrelated to the base model.
  • Forgetting that the base weights are usually frozen.
  • Assuming LoRA never has an inference cost: unmerged adapters add extra computation in each adapted layer, and serving many separate adapters adds operational complexity.

How to answer in an interview

  • Use the formula Delta W = B A to make the answer concrete.
  • Mention adapter swapping or merging if the role involves multi-tenant model serving.