Full Program

The detailed program is now shown as a roadmap: work through the topics stage by stage, open the materials, and mark your progress.

Advanced ML Engineering

An advanced roadmap for MLE/Research Engineer roles: distributed training, LLM engineering, GenAI/multimodal systems, inference optimization, and foundation-model data pipelines.

Progress

0 of 21 topics
Click a circle ○ to mark a topic complete · 0%

🧩 Distributed Training

0/4

How to train large models on multi-GPU and multi-node infrastructure without magical thinking.

Required

Distributed Training Foundations

DDP, gradient synchronization, effective batch size, communication cost, NCCL and basic multi-node failure modes.

3 resources
Read →
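The effective-batch-size and communication-cost points from this card can be put in numbers. A minimal back-of-the-envelope sketch: the 7B model, fp16 gradients, and 8-GPU setup are made-up example values, and the traffic formula is the standard ring all-reduce estimate of 2 * (N - 1) / N of the gradient buffer per rank.

```python
# Back-of-the-envelope DDP math: effective batch size and
# per-step ring all-reduce traffic for gradient synchronization.

def effective_batch_size(per_gpu_batch: int, grad_accum_steps: int, world_size: int) -> int:
    """Samples contributing to one optimizer step across all ranks."""
    return per_gpu_batch * grad_accum_steps * world_size

def ring_allreduce_bytes(param_count: int, bytes_per_grad: int, world_size: int) -> float:
    """Bytes each rank sends during one ring all-reduce of the gradients.

    Ring all-reduce moves 2 * (N - 1) / N of the full gradient buffer
    per rank (reduce-scatter followed by all-gather).
    """
    grad_bytes = param_count * bytes_per_grad
    return 2 * (world_size - 1) / world_size * grad_bytes

# Hypothetical 7B-parameter model, fp16 gradients, 8 GPUs.
sent = ring_allreduce_bytes(7_000_000_000, 2, 8)
print(f"effective batch: {effective_batch_size(4, 8, 8)}")   # 4 * 8 * 8 = 256
print(f"per-rank all-reduce traffic: {sent / 1e9:.1f} GB")   # 24.5 GB per step
```

Dividing that per-step traffic by your interconnect bandwidth gives a first-order lower bound on synchronization time, which is exactly the number that decides whether multi-node scaling pays off.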
Required

FSDP, DeepSpeed ZeRO and Sharding

Why optimizer states dominate memory, how FSDP/ZeRO shard params, gradients and optimizer state, and when sharding pays off.

3 resources
Read →
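To see why optimizer states dominate memory, here is a rough per-parameter calculator under the common 16-bytes-per-parameter accounting for mixed-precision Adam (2 B fp16 params + 2 B fp16 grads + 4 B fp32 master copy + 4 B + 4 B fp32 Adam moments). The 7B model and 8-way sharding are illustrative assumptions; activations and fragmentation are ignored.

```python
# Per-parameter memory for mixed-precision Adam under the ZeRO stages.

def bytes_per_param(stage: int, world_size: int) -> float:
    params, grads, optim = 2.0, 2.0, 12.0  # fp16 / fp16 / fp32 master + m + v
    if stage >= 1:
        optim /= world_size          # ZeRO-1: shard optimizer states
    if stage >= 2:
        grads /= world_size          # ZeRO-2: also shard gradients
    if stage >= 3:
        params /= world_size         # ZeRO-3 / FSDP full shard: also shard params
    return params + grads + optim

N = 7_000_000_000  # hypothetical 7B model
for stage in (0, 1, 2, 3):
    gb = bytes_per_param(stage, world_size=8) * N / 2**30
    print(f"ZeRO-{stage}: {gb:,.0f} GiB per GPU (states only, no activations)")
```

The unsharded 16 B/param is why a 7B model that fits on one GPU for inference (14 GB in fp16) does not come close to fitting for Adam training, and why sharding the 12 B/param of optimizer state (ZeRO-1) already recovers most of the win.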
Required

Parallelism and Memory Engineering

Tensor, pipeline and sequence parallelism; activation checkpointing; gradient accumulation; BF16/FP16; memory budget accounting.

5 resources
Read →
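Gradient accumulation from this card can be sketched without any framework: accumulate normalized gradients over micro-batches and apply one optimizer step per window. The 1-parameter model, data, and learning rate are toy assumptions.

```python
# Gradient accumulation: reach a large effective batch with small
# micro-batches. Toy model y = w * x with squared-error loss,
# so dL/dw = 2 * (w*x - y) * x.

def grad(w, x, y):
    return 2 * (w * x - y) * x

def step_with_accumulation(w, data, accum_steps, lr=0.01):
    acc = 0.0
    for i, (x, y) in enumerate(data, start=1):
        acc += grad(w, x, y) / accum_steps  # normalize so loss scale matches
        if i % accum_steps == 0:
            w -= lr * acc                   # one optimizer step per window
            acc = 0.0
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
# Accumulating all 4 micro-batches equals one step on the mean gradient.
w_accum = step_with_accumulation(0.0, data, accum_steps=4)
mean_g = sum(grad(0.0, x, y) for x, y in data) / 4
assert abs(w_accum - (0.0 - 0.01 * mean_g)) < 1e-12
print(w_accum)  # 0.3
```

The same identity is what makes accumulation a pure memory/time trade: activations for only one micro-batch live at a time, while the optimizer sees the full effective batch.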
Required

Training Stability and Checkpointing

NaNs, loss spikes, mixed precision instability, sharded checkpoints, resume semantics and reproducibility for long training runs.

3 resources
Read →

🧠 LLM Engineering

0/4

Engineering layer for large language models: scaling, post-training, serving, KV-cache, evaluation and cost.

Required

LLM Scaling and Architecture

Decoder-only transformers, MoE, long context, KV-cache implications, scaling laws and practical architecture trade-offs.

3 resources
Read →
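The scaling-law side of this card can be made concrete with the widely used C ≈ 6 * N * D estimate for training FLOPs and the Chinchilla-style D ≈ 20 * N compute-optimal token count. The 7B model and the 300 TFLOP/s sustained per-GPU throughput below are illustrative numbers, not measurements.

```python
# Scaling-law arithmetic: training compute from the common
# C ≈ 6 * N * D rule (N params, D training tokens).

def train_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

N = 7e9
D = 20 * N                       # Chinchilla-style compute-optimal tokens
flops = train_flops(N, D)
print(f"tokens: {D:.1e}, FLOPs: {flops:.2e}")

# On hypothetical GPUs sustaining 300 TFLOP/s each:
gpu_seconds = flops / 300e12
print(f"~{gpu_seconds / 86400:.0f} GPU-days")
```

Dividing GPU-days by your cluster size gives wall-clock time, which is the practical lens for the architecture trade-offs the card mentions.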
Required

LLM Fine-tuning and Post-training

LoRA, QLoRA, PEFT, SFT, preference optimization and practical risk management for domain adaptation.

3 resources
Read →
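The core LoRA idea is small enough to sketch directly: freeze the base weight W, learn a low-rank update (alpha / r) * A @ B, and merge it after training. The shapes, rank, and alpha below are arbitrary illustrative choices, not values from any particular recipe.

```python
import numpy as np

# LoRA sketch: frozen base weight plus a trainable low-rank delta.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 8, 8, 2, 16

W = rng.normal(size=(d_in, d_out))     # frozen pretrained weight
A = rng.normal(size=(d_in, r)) * 0.01  # trainable down-projection
B = np.zeros((r, d_out))               # trainable up-projection, zero-init
                                       # so the update starts as a no-op

def lora_forward(x, W, A, B, alpha, r):
    return x @ W + (alpha / r) * (x @ A @ B)

x = rng.normal(size=(4, d_in))
# Zero-init B => LoRA output matches the frozen base model exactly.
assert np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W)

# After "training", the adapter merges into W for zero-overhead inference.
B = rng.normal(size=(r, d_out))
W_merged = W + (alpha / r) * (A @ B)
assert np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W_merged)
print("trainable params:", A.size + B.size, "vs frozen:", W.size)
```

The parameter count at the end is the whole point: the trainable footprint scales with r * (d_in + d_out) instead of d_in * d_out, which is what makes QLoRA-style single-GPU fine-tuning feasible.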
Required

LLM Serving: KV-cache and Batching

How an LLM generates token by token: prefill/decode, the KV-cache, continuous batching, latency metrics, and choosing between vLLM/SGLang/TensorRT-LLM/TGI.

8 resources
Read →
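Prefill/decode with a KV-cache can be demonstrated with toy single-head attention. The weights and dimensions are random illustrative values; the sanity check at the end confirms that cached decode produces exactly the same output as recomputing attention over the full sequence.

```python
import numpy as np

# Prefill computes K, V once for the prompt; each decode step appends
# one new K, V row instead of re-encoding the whole sequence.
rng = np.random.default_rng(0)
d = 4
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attend(q, K, V):
    return softmax(q @ K.T / np.sqrt(d)) @ V

prompt = rng.normal(size=(5, d))                  # 5 prompt tokens
K_cache, V_cache = prompt @ Wk, prompt @ Wv       # prefill: one pass

new_tok = rng.normal(size=(1, d))                 # decode: one token
K_cache = np.vstack([K_cache, new_tok @ Wk])      # append, don't recompute
V_cache = np.vstack([V_cache, new_tok @ Wv])
out_cached = attend(new_tok @ Wq, K_cache, V_cache)

# Sanity check: cached decode == attention over the full sequence.
full = np.vstack([prompt, new_tok])
out_full = attend(new_tok @ Wq, full @ Wk, full @ Wv)
assert np.allclose(out_cached, out_full)
print("cache length:", K_cache.shape[0])  # grows by one per decoded token
```

The linear growth of the cache is also why serving memory is dominated by KV state rather than weights at long context, and why continuous batching schedulers account for it per sequence.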
Required

LLM Evaluation, Latency and Cost

Offline evals, human preference, LLM-as-judge limits, hallucination checks, token economics and latency/cost trade-offs.

3 resources
Read →

🎨 Generative and Multimodal Models

0/5

Diffusion, flow matching, image/video/audio generation, conditioning and evaluation for modern GenAI systems.

Required

Generative Modeling Foundations

GAN/VAE context, diffusion fundamentals, latent spaces, conditioning and why production GenAI is not just sampling pretty images.

3 resources
Read →
Required

Diffusion, Flow Matching and DiT

DDPM/DDIM/SDE vocabulary, rectified flow, flow matching, Diffusion Transformers and few-step sampling quality/latency trade-offs.

3 resources
Read →
Required

Multimodal Conditioning

Text, image, video, pose, depth, segmentation, audio and reference conditioning via cross-attention, adapters and ControlNet-style branches.

3 resources
Read →
Required

Video and Audio Generation

Text-to-video, image-to-video, temporal modeling, identity preservation, audio generation and modality-specific failure modes.

5 resources
Read →
Required

GenAI Evaluation

FID, FVD, CLIPScore, VBench, temporal consistency, identity preservation, human preference and safety regression suites.

3 resources
Read →

⚡ Inference Optimization and High-Load Serving

0/4

How to make large ML systems fast, reliable and economically survivable in production.

Required

Inference Optimization Foundations

Latency, throughput, memory, cost, profiling, bottleneck attribution, batching trade-offs and hardware-aware thinking.

3 resources
Read →
Required

Runtime Optimization Stack

ONNX Runtime, TensorRT, Triton, torch.compile, quantization and when each layer of the stack is worth the complexity.

3 resources
Read →
Required

High-Load Serving Patterns

Async APIs, queues, streaming, cancellation, continuous batching, GPU scheduling, autoscaling and graceful degradation.

5 resources
Read →
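The benefit of continuous batching over static batching can be shown with a toy simulation that ignores prefill cost and capacity limits; the request lengths are invented. One decode step is the time unit.

```python
# Static batching: a short request waits for the longest one in its
# batch. Continuous batching: a slot is freed (and refilled) as soon
# as its sequence finishes.

def static_batch_finish_times(lengths):
    # The whole batch runs until the longest sequence is done.
    return [max(lengths)] * len(lengths)

def continuous_batch_finish_times(lengths):
    # Each sequence occupies a slot only for its own decode steps
    # (prefill cost and batch-size capacity ignored in this sketch).
    return list(lengths)

lengths = [10, 10, 200, 10]  # output tokens per request
static = static_batch_finish_times(lengths)
cont = continuous_batch_finish_times(lengths)
print("static  mean latency:", sum(static) / len(static))   # 200.0
print("contin. mean latency:", sum(cont) / len(cont))       # 57.5
```

The gap grows with length variance, which is why production serving stacks treat continuous batching as table stakes rather than an optimization.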
Required

Latency, Cost and Observability

p50/p95/p99, queue depth, GPU utilization, cost per request, model regressions and product-facing reliability metrics.

3 resources
Read →
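A minimal nearest-rank percentile sketch shows why p95/p99 matter more than the mean; the latency samples below are synthetic.

```python
import math

# Nearest-rank percentiles: sort, then take the value at
# index ceil(p/100 * n) - 1.

def percentile(samples, p):
    s = sorted(samples)
    k = math.ceil(p / 100 * len(s)) - 1
    return s[k]

latencies_ms = [20] * 90 + [80] * 9 + [900]  # 100 requests, one straggler
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p)} ms")
print("mean:", sum(latencies_ms) / len(latencies_ms), "ms")
```

Here p50 stays at 20 ms while the straggler drags the mean to 34.2 ms without showing up in p99 at all, which is exactly why dashboards track several percentiles plus max, not a single average.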

🗄️ Foundation Model Data Pipelines

0/4

Data curation, filtering, deduplication, sharding and streaming loaders for large-scale foundation-model training.

Required

Foundation Model Data Pipelines

Collection, licensing, preprocessing, metadata, captioning, filtering and reproducible dataset versions for large models.

3 resources
Read →
Required

Data Curation, Deduplication and Filtering

Near-duplicate search, quality scoring, unsafe content filtering, caption quality and why data quality can dominate architecture changes.

5 resources
Read →
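Near-duplicate search at scale is often done with MinHash over shingles; here is a toy sketch. The 64 hash slots, 5-character shingles, and CRC32 as the hash family are all arbitrary illustrative choices (production systems typically add LSH banding on top to avoid all-pairs comparison).

```python
import zlib

# MinHash: the fraction of matching signature slots estimates the
# Jaccard similarity of the two shingle sets.

def shingles(text, k=5):
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def minhash(text, num_hashes=64):
    sig = []
    for seed in range(num_hashes):
        # zlib.crc32 is deterministic across runs (unlike builtin hash()).
        sig.append(min(zlib.crc32(f"{seed}:{s}".encode()) for s in shingles(text)))
    return sig

def similarity(a, b):
    sa, sb = minhash(a), minhash(b)
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)

doc = "the quick brown fox jumps over the lazy dog near the river bank"
near_dup = "the quick brown fox jumps over the lazy dog near the river"
unrelated = "completely different caption about training large diffusion models"

print(f"near-dup:  {similarity(doc, near_dup):.2f}")   # high
print(f"unrelated: {similarity(doc, unrelated):.2f}")  # low
```

Signatures are tiny and composable, so dedup runs as a map-reduce over shards instead of comparing every pair of documents.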
Required

Streaming DataLoaders and Storage

WebDataset, object storage, tar shards, shuffle quality, DALI/NVDEC, prefetching and avoiding GPU starvation.

3 resources
Read →
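The WebDataset idea can be mimicked with nothing but the standard library: a sample is a group of adjacent files in a tar archive sharing a key ("000001.img" + "000001.json"), so a loader streams samples sequentially from object storage with no random access. File names and payloads below are invented.

```python
import io
import json
import tarfile

def write_shard(samples):
    """Pack (key, payload, metadata) triples into an in-memory tar shard."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for key, payload, meta in samples:
            for name, data in ((f"{key}.img", payload),
                               (f"{key}.json", json.dumps(meta).encode())):
                info = tarfile.TarInfo(name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf

def iter_samples(shard):
    """Stream the shard sequentially, grouping files by their shared key."""
    with tarfile.open(fileobj=shard, mode="r") as tar:
        sample = {}
        for member in tar:
            key, ext = member.name.rsplit(".", 1)
            if sample and sample.get("__key__") != key:
                yield sample
                sample = {}
            sample["__key__"] = key
            sample[ext] = tar.extractfile(member).read()
        if sample:
            yield sample

shard = write_shard([("000001", b"fake-bytes-1", {"caption": "a cat"}),
                     ("000002", b"fake-bytes-2", {"caption": "a dog"})])
for s in iter_samples(shard):
    print(s["__key__"], json.loads(s["json"])["caption"])
```

Sequential tar reads are what make shards cheap to stream from S3-style storage; shuffle quality then comes from shuffling shard order plus a small in-memory sample buffer, not from random seeks.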
Required

Multimodal Data Governance

Video/audio/image/text governance: consent, PII, likeness abuse, synthetic media provenance and safety filters.

3 resources
Read →
ML Mentor — From Zero to an ML Offer