Required

GenAI Evaluation

FID, FVD, CLIPScore, VBench, temporal consistency, identity preservation, human preference and safety regression suites.

Study time: 30 min

FID/FVD, VBench, GenEval, human preference, prompt adherence, temporal consistency, safety and metric failure modes.

What the candidate should be able to do

  • Choose metrics for image, video, audio and controllable generation.
  • Explain why FID/FVD/VBench are useful but not universal truth.
  • Design eval sets for prompt following, temporal consistency and controllability.
  • Combine automatic metrics with human review and regression prompts.
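
To make the "useful but not universal truth" point concrete, here is a minimal sketch of the Fréchet distance that underlies FID and FVD. It assumes features have already been extracted by some embedding network (Inception features for FID and I3D features for FVD in the usual setups); the function name `frechet_distance` and the NumPy-only eigenvalue shortcut are choices made for this illustration, not a reference implementation.

```python
import numpy as np


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two feature sets.

    Both inputs have shape (n_samples, dim) with dim >= 2.
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)

    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative for
    # positive semi-definite inputs (up to numerical noise, hence the clip).
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)
```

The score depends only on the mean and covariance of the features. It says nothing about any individual sample or about whether the output follows its prompt, which is one reason it can disagree with user preference.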

What interviewers ask

  • Why can FID/FVD disagree with user preference?
  • What does VBench decompose in video quality?
  • How would you evaluate a ControlNet feature?
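
For the ControlNet question, one common approach is to re-extract the control signal (edges, depth, pose) from the generated image and compare it with the control input. The sketch below assumes both maps are already binarized and that the re-extraction step (for example, an edge detector) runs elsewhere; `control_iou` is a hypothetical helper name, and IoU is only one possible agreement measure.

```python
import numpy as np


def control_iou(control_mask: np.ndarray, recovered_mask: np.ndarray) -> float:
    """IoU between the binary control map given to the model and the
    map re-extracted from the generated output (same shape)."""
    a = control_mask.astype(bool)
    b = recovered_mask.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Both maps are empty: treat this as perfect agreement.
        return 1.0
    return float(np.logical_and(a, b).sum() / union)
```

A high IoU shows the output respects the control input. It does not show that the image looks good or matches the text prompt, so it is best paired with a quality metric and human review.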

Practical task

Create an evaluation matrix for 20 prompts across image, video and control tasks. For each prompt, specify automatic metrics where feasible, a human rubric, and a failure taxonomy.
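
One possible starting skeleton for such a matrix is sketched below. All metric names, rubric axes and failure tags here are illustrative assumptions, not a prescribed taxonomy. Set-level metrics such as FID and FVD are deliberately left out of the per-prompt rows, because 20 prompts are too few for a stable distribution estimate.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical per-prompt metrics, rubric axes and failure tags (illustration only).
AUTO_METRICS = {
    "image": ["CLIPScore"],
    "video": ["CLIPScore", "temporal_consistency"],
    "control": ["control_iou"],
}
RUBRIC_AXES = ["prompt_adherence", "visual_quality", "safety"]
FAILURE_TAGS = ["missing_object", "wrong_attribute", "flicker",
                "identity_drift", "ignored_control"]


@dataclass
class EvalRow:
    prompt_id: str
    task: str
    auto_metrics: list
    # Reviewers fill each rubric axis with a 1-5 score; None means "not rated yet".
    rubric_scores: dict = field(default_factory=dict)
    # Reviewers attach zero or more tags from FAILURE_TAGS.
    failure_tags: list = field(default_factory=list)


def build_matrix(prompts: dict) -> list:
    """Build one empty row per prompt; `prompts` maps prompt_id -> task name."""
    rows = []
    for prompt_id, task in prompts.items():
        empty_scores: dict = {axis: None for axis in RUBRIC_AXES}
        rows.append(EvalRow(prompt_id, task, list(AUTO_METRICS[task]), empty_scores))
    return rows
```

A call such as `build_matrix({"p01": "image", "p02": "video", "p03": "control"})` yields three rows that reviewers can fill in and that can be re-run as a regression set.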

Source-grounded rule

Automatic GenAI metrics are proxies; publish limitations and complement with human preference or task-specific rubric.
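
One way to quantify how good a proxy is: collect pairwise human preferences and measure how often the metric picks the same winner. The sketch below assumes a per-sample metric where higher is better (for example, CLIPScore); the function name and input layout are hypothetical.

```python
def pairwise_agreement(pairs):
    """Fraction of pairs where the metric and the human pick the same winner.

    `pairs` is an iterable of (score_a, score_b, human_prefers_a) tuples.
    Pairs with tied metric scores are skipped.
    """
    decided = 0
    agree = 0
    for score_a, score_b, human_prefers_a in pairs:
        if score_a == score_b:
            continue  # the metric expresses no preference for this pair
        decided += 1
        if (score_a > score_b) == bool(human_prefers_a):
            agree += 1
    if decided == 0:
        raise ValueError("no pairs with a metric preference")
    return agree / decided
```

An agreement rate near 0.5 means the metric is close to a coin flip against human judgment on that prompt set, which is worth stating alongside any published score.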

Materials