Required

Runtime Optimization Stack

ONNX Runtime, TensorRT, Triton, torch.compile, quantization and when each layer of the stack is worth the complexity.

Study time: 32 min


ONNX Runtime, TensorRT, Triton, TensorRT-LLM, torch.compile and quantization as a layered optimization toolkit.

What the candidate should be able to do

  • Know when ONNX export is worth it and where dynamic models (data-dependent control flow, variable shapes) make it hard; see the export sketch after this list.
  • Explain TensorRT engine constraints, INT8 calibration and hardware specificity.
  • Understand Triton as a serving layer, not a magic model optimizer.
  • Compare compile/export/quantization paths by engineering cost and rollout risk.
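
A minimal sketch of the ONNX export path on a hypothetical toy model (nothing here is from the source): dynamic_axes is what keeps the batch dimension variable, while data-dependent Python branching inside forward is the typical thing that gets traced away and makes "dynamic" models hard to export correctly.

    # Hypothetical toy model; illustrates the export + sanity-check loop.
    import numpy as np
    import torch
    import torch.nn as nn
    import onnxruntime as ort

    class TinyClassifier(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

        def forward(self, x):
            # Data-dependent Python control flow here would be baked in as a
            # constant during tracing, a common source of broken exports.
            return self.net(x)

    model = TinyClassifier().eval()
    example = torch.randn(1, 128)

    torch.onnx.export(
        model,
        example,
        "tiny_classifier.onnx",
        input_names=["input"],
        output_names=["logits"],
        dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},  # variable batch
        opset_version=17,
    )

    # Sanity check with ONNX Runtime: a different batch size should still work.
    sess = ort.InferenceSession("tiny_classifier.onnx", providers=["CPUExecutionProvider"])
    out = sess.run(["logits"], {"input": np.random.randn(4, 128).astype(np.float32)})[0]
    print(out.shape)  # (4, 10)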

What interviewers ask

  • When would TensorRT beat plain PyTorch?
  • Why can TensorRT export fail?
  • What does Triton solve and what does it not solve? (See the client sketch after this list.)
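
A minimal sketch of calling a running Triton Inference Server, assuming a locally served model named "resnet50" with tensors named "input" and "output" (all names are illustrative, not from the source). The point it makes: the client only ships tensors to whichever backend hosts the model, so Triton organizes serving (batching, multiple backends, concurrent instances) but does not itself make the model faster.

    # Hypothetical Triton client call; model and tensor names are assumptions.
    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    # Triton just routes tensors to whatever backend hosts the model
    # (PyTorch, ONNX Runtime, TensorRT, ...); optimization happens elsewhere.
    batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
    inp = httpclient.InferInput("input", list(batch.shape), "FP32")
    inp.set_data_from_numpy(batch)

    result = client.infer(model_name="resnet50", inputs=[inp])
    print(result.as_numpy("output").shape)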

Practical task

Take one PyTorch model and document your optimization attempts: torch.compile, ONNX Runtime, TensorRT/Triton feasibility, and any blockers you hit. A starting harness is sketched below.
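
A minimal harness for that write-up, assuming an arbitrary eager-mode model and comparing it against torch.compile on the same inputs; ONNX Runtime and TensorRT rows would be added the same way once those exports succeed. The model and input shapes are illustrative.

    # Hypothetical benchmarking harness for documenting optimization attempts.
    import time
    import torch
    import torch.nn as nn

    def latency_ms(fn, x, warmup=10, iters=50):
        """Median wall-clock latency of fn(x) in milliseconds."""
        for _ in range(warmup):
            fn(x)
        times = []
        for _ in range(iters):
            start = time.perf_counter()
            fn(x)
            times.append((time.perf_counter() - start) * 1000)
        return sorted(times)[len(times) // 2]

    model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512)).eval()
    compiled = torch.compile(model)  # compilation actually happens on the first call
    x = torch.randn(8, 512)

    with torch.inference_mode():
        results = {
            "eager": latency_ms(model, x),
            "torch.compile": latency_ms(compiled, x),
            # Add "onnxruntime", "tensorrt", ... entries here as each path works,
            # and record the blockers for the ones that do not.
        }

    for name, ms in results.items():
        print(f"{name:>14}: {ms:.2f} ms")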

Source-grounded rule

Do not promise fixed speedups; optimization gains depend heavily on model graph, kernels, precision and hardware.