Required

Latency, Cost and Observability

p50/p95/p99 latency, queue depth, GPU utilization, cost per request, model regressions and product-facing reliability metrics.

Study time: 26 min

Production metrics: queue depth, GPU utilization, p95/p99 latency, error modes, cost per request, regression detection and rollout monitoring.

What the candidate should be able to do

  • Define SLOs for ML systems beyond average latency.
  • Monitor GPU utilization, queue depth, tokens/sec, cost/request and model-quality regressions.
  • Design canary/shadow rollout for ML serving changes.
  • Connect technical metrics with product and business constraints.
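To illustrate why an SLO should go beyond average latency, here is a minimal sketch that computes p50/p95/p99 with the nearest-rank method and checks a tail-latency target. The function names, the sample data, and the 500 ms p99 target are assumptions made for illustration, not values from any particular system.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample such that at least
    q percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def latency_report(latencies_ms, slo_p99_ms):
    """Summarize latency and check a (hypothetical) p99 SLO target."""
    report = {f"p{q}": percentile(latencies_ms, q) for q in (50, 95, 99)}
    report["mean"] = sum(latencies_ms) / len(latencies_ms)
    report["slo_met"] = report["p99"] <= slo_p99_ms
    return report


# Illustrative data: 98 fast requests and 2 very slow ones.
latencies = [100.0] * 98 + [3000.0] * 2
report = latency_report(latencies, slo_p99_ms=500.0)
print(report)
```

In this made-up sample the mean is 158 ms and p95 is 100 ms, yet p99 is 3000 ms, so the tail target is missed even though the average looks healthy. In a real service these values would normally come from histogram metrics in the monitoring stack rather than from raw in-memory lists.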

What interviewers ask

  • What metrics prove optimization worked?
  • How would you catch silent quality regressions?
  • How do online experiments differ for ranking vs generation?

Practical task

Build an observability spec for an ML inference service: metrics, alerts, dashboards, rollout gates and cost attribution.
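One possible starting point for the rollout-gate and cost-attribution parts of such a spec is sketched below. Everything in it is an assumption for illustration: the `WindowStats` fields, the `quality_score` metric, the thresholds, and the GPU price are hypothetical and would need to be adapted to the task and traffic shape.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Aggregates for one deployment over one comparison window."""
    requests: int
    errors: int
    p99_ms: float
    gpu_seconds: float
    quality_score: float  # hypothetical quality metric, higher is better


def cost_per_request(stats, gpu_cost_per_hour):
    """Attribute GPU spend in the window evenly across its requests."""
    if stats.requests == 0:
        raise ValueError("no requests in window")
    return stats.gpu_seconds / 3600 * gpu_cost_per_hour / stats.requests


def canary_gate(baseline, canary, gpu_cost_per_hour,
                max_p99_ratio=1.10, max_error_rate_delta=0.001,
                max_quality_drop=0.01, max_cost_ratio=1.05):
    """Return the names of failed checks; an empty list means 'promote'."""
    failures = []
    if canary.p99_ms > baseline.p99_ms * max_p99_ratio:
        failures.append("p99_latency")
    error_delta = canary.errors / canary.requests - baseline.errors / baseline.requests
    if error_delta > max_error_rate_delta:
        failures.append("error_rate")
    if baseline.quality_score - canary.quality_score > max_quality_drop:
        failures.append("quality")
    if (cost_per_request(canary, gpu_cost_per_hour)
            > cost_per_request(baseline, gpu_cost_per_hour) * max_cost_ratio):
        failures.append("cost_per_request")
    return failures


# Illustrative numbers: the canary is faster and cheaper but lower quality.
baseline = WindowStats(requests=10000, errors=10, p99_ms=400.0,
                       gpu_seconds=3600.0, quality_score=0.82)
canary = WindowStats(requests=1000, errors=1, p99_ms=380.0,
                     gpu_seconds=300.0, quality_score=0.78)
print(canary_gate(baseline, canary, gpu_cost_per_hour=2.0))
```

With these made-up numbers the canary passes the latency, error-rate, and cost checks but fails the quality check, which is the kind of silent regression a gate built only on latency and cost would miss.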

Source-grounded rule

Use industry posts as patterns, not universal blueprints; adapt monitoring to task and traffic shape.