Required

LLM Scaling and Architecture

Decoder-only dense Transformers, MoE, long context, KV-cache implications, scaling laws, data/scale trade-offs and inference-aware architecture decisions, grounded in public technical reports.

Study time: 30 min

What the candidate should be able to do

  • Explain dense vs MoE trade-offs without assuming MoE is always cheaper or better.
  • Relate context length, KV-cache size, model size and serving memory (see the sizing sketch after this list).
  • Treat activated vs total parameters as an operational distinction (see the second sketch below).
  • Read LLM technical reports critically: architecture, training data, post-training, evals and serving implications.
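
A quick way to make the KV-cache point concrete is to size it. A minimal sketch, assuming an fp16 cache and a Llama-3-70B-like config (80 layers, 8 KV heads via GQA, head dim 128, figures from the public model config); illustrative only, not a serving plan:

    def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                       seq_len: int, batch: int = 1,
                       bytes_per_elem: int = 2) -> int:
        # 2x for keys and values, stored per layer, per KV head, per position.
        return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

    # Llama-3-70B-style config: ~320 KB of cache per token, so at a
    # 128K context a single sequence costs ~42 GB on top of the weights.
    gb = kv_cache_bytes(80, 8, 128, 128_000) / 1e9
    print(f"{gb:.1f} GB per sequence")

This is also why GQA/MQA and quantized caches matter: they attack the n_kv_heads and bytes_per_elem factors directly.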
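
The activated-vs-total distinction can likewise be pinned down with two numbers. A sketch using DeepSeek-V3's publicly reported split (671B total, 37B activated per token); the 2-FLOPs-per-parameter rule of thumb is a standard forward-pass approximation:

    # Per-token compute tracks activated parameters; weight memory (and
    # hence the serving footprint) tracks total parameters.
    total_params = 671e9       # DeepSeek-V3, as reported
    activated_params = 37e9    # experts actually routed per token

    flops_per_token = 2 * activated_params   # ~7.4e10: dense-37B-like compute
    weight_gb_fp8 = total_params * 1 / 1e9   # ~671 GB even at 1 byte/param

    print(f"forward FLOPs/token ≈ {flops_per_token:.1e}")
    print(f"weights at fp8 ≈ {weight_gb_fp8:.0f} GB, regardless of routing")

An MoE thus buys dense-small compute per token while paying dense-huge memory, interconnect and routing costs, which is the core of the "not always cheaper" point in the first bullet.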

What interviewers ask

  • Why does one team choose a dense Transformer while another chooses MoE?
  • What do activated parameters per token change operationally?
  • How does long context affect latency and memory? (A prefill-vs-decode sketch follows this list.)
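
For the latency half of the last question, a useful split is prefill vs decode: prefill attention FLOPs grow roughly quadratically with context length, while decode cost per generated token grows linearly (each step attends over the whole cache). A rough sketch that ignores MLP cost and kernel efficiency; the layer count and d_model below are illustrative:

    def prefill_attn_flops(seq_len: int, n_layers: int, d_model: int) -> int:
        # QK^T scores plus the attention-weighted value matmul:
        # ~2 matmuls of cost 2 * L^2 * d per layer.
        return n_layers * 4 * seq_len**2 * d_model

    # Going from 8K to 128K context is 16x the tokens but ~256x the
    # attention prefill FLOPs.
    ratio = prefill_attn_flops(128_000, 80, 8192) / prefill_attn_flops(8_000, 80, 8192)
    print(f"ratio ≈ {ratio:.0f}x")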

Practical task

Compare Llama 3, DeepSeek-V3 and Qwen2.5 using their public reports: architecture, data scale, context length, post-training and serving implications (a starter skeleton follows).
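
A possible starting skeleton for the exercise, seeded with headline figures as publicly reported (Llama 3 paper, DeepSeek-V3 technical report, Qwen2.5 report); verify every number against the sources, and fill in the post-training and serving columns yourself:

    # Figures below are from memory of the public reports and may drift
    # across model revisions; treat them as placeholders to check.
    models = {
        "Llama 3.1 405B": {"arch": "dense decoder-only", "params": "405B",
                           "pretrain_tokens": "~15.6T", "context": "128K"},
        "DeepSeek-V3":    {"arch": "MoE with MLA", "params": "671B total / 37B activated",
                           "pretrain_tokens": "~14.8T", "context": "128K"},
        "Qwen2.5 72B":    {"arch": "dense decoder-only", "params": "72B",
                           "pretrain_tokens": "~18T", "context": "128K"},
    }
    for name, spec in models.items():
        print(name, spec)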

Source-grounded rule

Use public reports as illustrative examples, not as full, reproducible recipes; many reported training details are incomplete or workload-specific.