Required

LLM Evaluation, Latency, Reliability and Cost

Offline evals, human preference, LLM-as-judge limits, hallucination checks, evaluation harnesses, task-specific and regression evals, token economics, benchmark caveats, production monitoring and latency/cost trade-offs.

Study time: 26 min

What the candidate should be able to do

  • Separate leaderboard quality from product utility and reliability.
  • Choose evals for capability, safety, latency and cost.
  • Estimate serving cost from token mix, GPU price, utilization and cache behavior.
  • Design rollout gates: offline evals, human review, canary and monitoring.
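The serving-cost estimate in the list above can be sketched as a small function. All numeric inputs below (GPU hourly price, decode throughput, utilization, cache hit rate) are illustrative assumptions, not current market data; per the source-grounded rule, recalculate them from real measurements.

```python
# Sketch: estimate the $ cost of serving one request on self-hosted GPUs
# from token mix, GPU price, utilization and prefix-cache behavior.

def cost_per_request(
    prompt_tokens: int,
    completion_tokens: int,
    gpu_hourly_usd: float,         # assumed on-demand price for one GPU
    tokens_per_gpu_second: float,  # measured throughput at the target batch size
    utilization: float,            # fraction of GPU time doing useful work
    prefix_cache_hit: float,       # fraction of prompt tokens served from KV cache
) -> float:
    """Rough dollar cost of serving a single request."""
    # Cached prefix tokens skip prefill compute; completion tokens always decode.
    billable_tokens = prompt_tokens * (1 - prefix_cache_hit) + completion_tokens
    effective_tps = tokens_per_gpu_second * utilization
    gpu_seconds = billable_tokens / effective_tps
    return gpu_seconds * gpu_hourly_usd / 3600

# Illustrative numbers only -- replace with current prices and measured traffic:
c = cost_per_request(
    prompt_tokens=1500, completion_tokens=300,
    gpu_hourly_usd=2.0, tokens_per_gpu_second=2500,
    utilization=0.6, prefix_cache_hit=0.5,
)
print(f"${c:.5f} per request")  # ~$0.00039 under these assumptions
```

Keeping every assumption as a named parameter makes the estimate easy to recompute when GPU prices or traffic change, which is exactly what the source-grounded rule below demands.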

What interviewers ask

  • Why are leaderboards insufficient?
  • How do you evaluate a model change before rollout?
  • What metrics decide whether serving optimization worked?
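A concise answer to the last question can be expressed as a gate: an optimization "worked" only if latency and cost improve while quality stays within a regression budget. This is a minimal sketch; the metric names and thresholds are assumptions, not a standard API.

```python
# Sketch: decide whether a serving optimization passed, comparing
# before/after snapshots of latency, cost and offline eval quality.

def optimization_passed(before: dict, after: dict,
                        max_quality_drop: float = 0.005) -> bool:
    return (
        after["p95_latency_ms"] <= before["p95_latency_ms"]
        and after["cost_per_1k_tokens"] <= before["cost_per_1k_tokens"]
        and before["eval_score"] - after["eval_score"] <= max_quality_drop
    )

# Illustrative snapshots (placeholder numbers):
before = {"p95_latency_ms": 900, "cost_per_1k_tokens": 0.012, "eval_score": 0.810}
after  = {"p95_latency_ms": 640, "cost_per_1k_tokens": 0.009, "eval_score": 0.808}
print(optimization_passed(before, after))  # True: faster, cheaper, quality within budget
```

The key point for the interview is that no single metric decides: quality, tail latency and cost-per-token are checked jointly, so a speedup that silently degrades eval scores fails the gate.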

Practical task

Create an eval-and-cost scorecard with quality evals, latency SLOs, a throughput target, a token budget, GPU assumptions and go/no-go criteria.
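One possible shape for such a scorecard is sketched below. Every threshold, eval name and GPU assumption here is a placeholder to be filled from real measurements; the structure, not the numbers, is the point.

```python
# Sketch: eval-and-cost scorecard with explicit gates and a go/no-go check.

SCORECARD = {
    "quality_evals": {            # offline eval -> passing bound (placeholders)
        "task_accuracy_min": 0.80,
        "hallucination_rate_max": 0.02,
    },
    "latency_slo_ms": {"p50": 400, "p95": 1200},
    "throughput_target_rps": 50,
    "token_budget": {"prompt_max": 4000, "completion_max": 800},
    "gpu_assumptions": {          # assumptions to recompute per deployment
        "type": "A100-80GB (assumed)",
        "hourly_usd": 2.0,
        "utilization": 0.6,
    },
}

def go_no_go(measured: dict) -> bool:
    """Go only if every gate in the scorecard is met."""
    q = SCORECARD["quality_evals"]
    slo = SCORECARD["latency_slo_ms"]
    return (
        measured["task_accuracy"] >= q["task_accuracy_min"]
        and measured["hallucination_rate"] <= q["hallucination_rate_max"]
        and measured["p95_ms"] <= slo["p95"]
        and measured["rps"] >= SCORECARD["throughput_target_rps"]
    )

# Illustrative canary measurements:
print(go_no_go({"task_accuracy": 0.83, "hallucination_rate": 0.015,
                "p95_ms": 1100, "rps": 62}))  # True
```

In a real rollout this check would run after the offline evals and during the canary stage, with the same gates feeding production monitoring alerts.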

Source-grounded rule

Cost examples must be recalculated from current GPU prices and traffic assumptions; do not hard-code stale numbers.