Sber / GigaChat: LLM deep dive, inference, and distributed training

A dense theoretical LLM interview: BERT vs GPT, sentence embeddings, tokenization, positional embeddings, attention, GQA/SWA, KV cache, long context, and DDP all-reduce.

Audio and materials

Interview audio

Duration: 1:06:54

Takeaways and how to prepare

LLM theory interviews often move quickly across architecture, training and inference systems.
Good answers connect formulas to cost: sequence length, KV cache memory, attention complexity and distributed communication.
Retrieval embeddings should be evaluated with retrieval metrics (for example Recall@k, MRR, or nDCG), not generic classification accuracy.
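
To make the "formulas to cost" takeaway concrete, here is a minimal back-of-the-envelope sketch of two quantities named above: KV cache memory and the size of the attention score matrix. The function names and the model configuration (32 layers, 32 query heads, head dimension 128, fp16 storage) are hypothetical and chosen only for illustration; they do not describe GigaChat or any specific model.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size, bytes_per_elem=2):
    """Bytes needed to cache keys and values for every layer.

    The leading factor 2 accounts for the two cached tensors (K and V).
    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


def attention_score_elems(seq_len, num_heads, batch_size):
    """Entries in the full attention score matrices of one layer.

    Grows quadratically with seq_len when scores are materialized.
    """
    return batch_size * num_heads * seq_len * seq_len


# Hypothetical configuration, for illustration only.
cfg = dict(num_layers=32, head_dim=128, seq_len=4096, batch_size=1)

mha_bytes = kv_cache_bytes(num_kv_heads=32, **cfg)  # one KV head per query head
gqa_bytes = kv_cache_bytes(num_kv_heads=8, **cfg)   # GQA: 8 shared KV heads

print(f"MHA KV cache: {mha_bytes / 1024**3:.2f} GiB")
print(f"GQA KV cache: {gqa_bytes / 1024**3:.2f} GiB")
print("Score entries at 4096 tokens:", attention_score_elems(4096, 32, 1))
```

Under these assumptions the cache scales linearly with sequence length and with the number of KV heads, which is why GQA/MQA reduce inference memory, while materialized attention scores scale with the square of the sequence length, which is the usual starting point for the long-context discussion.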

Topics covered in the interview:

BERT vs GPT: what the architectural difference is

How to train sentence embeddings

How many forward passes GPT needs per training batch

Why a BPE tokenizer has almost no unknown tokens

RoPE and positional embeddings in GPT

Attention complexity, GQA/MQA, and Sliding Window Attention

Why the KV cache is needed for LLM inference

Long-context training: why attention does not fit in memory

Where GPU memory goes when training an LLM

DDP and all-reduce overlap in distributed training
