LLM Scaling and Architecture
Dense Transformers, MoE, long context, data/scale trade-offs, and inference-aware architecture decisions, drawn from public technical reports.
What the candidate should be able to do
- Explain dense vs MoE trade-offs without assuming MoE is always cheaper or better.
- Relate context length, KV cache, model size, and serving memory (see the sketch after this list).
- Understand activated parameters vs total parameters as an operational distinction.
- Read LLM technical reports critically: architecture, training data, post-training, evaluation, and serving implications.
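The KV-cache point above is easiest to internalize with numbers. Below is a minimal sizing sketch; the layer count, KV-head count, and head dimension match the publicly reported Llama 3 70B configuration, while batch size and dtype are assumptions chosen for illustration.

```python
# Minimal KV-cache sizing sketch. Layer count, KV heads, and head dim
# below match the publicly reported Llama 3 70B config; batch size and
# dtype are illustrative assumptions.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   context_len: int, batch_size: int,
                   bytes_per_elem: int = 2) -> int:  # 2 = fp16/bf16
    # Each layer stores one K and one V vector per KV head per token.
    per_token = num_layers * num_kv_heads * head_dim * 2 * bytes_per_elem
    return per_token * context_len * batch_size

# Llama 3 70B shape (80 layers, 8 GQA KV heads, head dim 128), 128K context:
gib = kv_cache_bytes(80, 8, 128, 128_000, batch_size=1) / 2**30
print(f"KV cache: ~{gib:.1f} GiB per sequence")  # ~39 GiB at full context
```

Note what the arithmetic shows: KV-cache memory grows linearly in context length and batch size, is independent of total parameter count, and under GQA scales with KV heads rather than attention heads. Long-context serving pressure is therefore an architecture decision, not just a deployment detail.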
What interviewers ask
- Why does one team choose a dense Transformer and another MoE?
- What do activated parameters per token change operationally? (See the sketch after this list.)
- How does long context affect latency and memory?
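For the activated-parameters question, a back-of-the-envelope sketch with a hypothetical top-k MoE FFN; all shapes here are invented for illustration and do not come from any specific report.

```python
# Hypothetical top-k MoE FFN layer; all shapes are invented for
# illustration and do not come from any specific report.
d_model, d_ff = 4096, 14336
num_experts, top_k = 64, 2

# Two projection matrices per expert (ignoring a gated/SwiGLU third
# matrix and the router, which are small by comparison).
params_per_expert = 2 * d_model * d_ff
total_ffn = num_experts * params_per_expert  # must live in serving memory
activated_ffn = top_k * params_per_expert    # drives per-token FLOPs

print(f"total FFN params per layer:     {total_ffn / 1e9:.2f}B")
print(f"activated FFN params per token: {activated_ffn / 1e9:.2f}B")
```

The operational answer: per-token FLOPs track activated parameters, while serving memory, checkpoint size, and expert-parallel communication track total parameters, so MoE is not automatically cheaper end to end.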
Practical task
Compare Llama 3, DeepSeek-V3, and Qwen2.5 using their public reports: architecture, data scale, context length, post-training, and serving implications. A starting scaffold is sketched below.
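A possible scaffold for the exercise, a sketch rather than an answer key. The example values are figures commonly cited from the respective public reports (e.g., DeepSeek-V3's 37B activated of 671B total); verify each number against the papers, and treat the one-line post_training summaries as heavy simplifications.

```python
from dataclasses import dataclass

@dataclass
class ReportSummary:
    name: str
    architecture: str          # "dense" or "MoE"
    total_params_b: float
    activated_params_b: float  # equals total for dense models
    pretrain_tokens_t: float
    max_context: int
    post_training: str

# Values below are commonly cited from the public reports; verify
# against the papers before relying on them.
models = [
    ReportSummary("Llama 3.1 405B", "dense", 405, 405, 15.6, 128_000,
                  "SFT + rejection sampling + DPO"),
    ReportSummary("DeepSeek-V3", "MoE", 671, 37, 14.8, 128_000,
                  "SFT + RL (GRPO)"),
    ReportSummary("Qwen2.5-72B", "dense", 72, 72, 18.0, 128_000,
                  "SFT + DPO + GRPO"),
]

for m in models:
    print(f"{m.name}: {m.architecture}, {m.activated_params_b}B activated "
          f"of {m.total_params_b}B total, {m.pretrain_tokens_t}T tokens, "
          f"{m.max_context // 1000}K context")
```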
Source-grounded rule
Use public reports as illustrative examples, not as fully reproducible recipes; many training details remain undisclosed, incomplete, or workload-specific.