GenAI Evaluation
Covers FID/FVD, VBench, GenEval, human preference, prompt adherence, temporal consistency, safety, and metric failure modes.
What the candidate should be able to do
- Choose metrics for image, video, audio and controllable generation.
- Explain why FID, FVD and VBench are useful proxies but not a universal measure of quality.
- Design eval sets for prompt following, temporal consistency and controllability.
- Combine automatic metrics with human review and regression prompts.
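One way to see why FID-style scores are not universal truth is to look at what they compare: summary statistics of two feature distributions, not individual samples against their prompts. The sketch below is a simplified illustration, not the standard FID implementation. It assumes diagonal covariances (real FID uses full covariance matrices of Inception features and a matrix square root), and the function names and toy feature vectors are hypothetical.

```python
from math import sqrt
from statistics import fmean, pvariance


def feature_stats(features):
    """Per-dimension mean and population variance of a list of feature vectors."""
    columns = list(zip(*features))
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means, variances


def frechet_distance_diag(real_features, generated_features):
    """Squared Frechet distance between two Gaussians with diagonal covariance.

    Simplifying assumption: off-diagonal covariance terms are ignored, so the
    trace term reduces to a sum of (sqrt(var_r) - sqrt(var_g))^2 per dimension.
    """
    mu_r, var_r = feature_stats(real_features)
    mu_g, var_g = feature_stats(generated_features)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    cov_term = sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(var_r, var_g))
    return mean_term + cov_term


# Toy features: index i is the feature vector of the image for prompt i.
real = [[0.0, 1.0], [2.0, 3.0]]
# Same images assigned to the wrong prompts: set-level statistics are unchanged.
shuffled = [[2.0, 3.0], [0.0, 1.0]]
# A set whose first feature dimension is shifted by 1.
shifted = [[1.0, 1.0], [3.0, 3.0]]

print(frechet_distance_diag(real, shuffled))  # 0.0
print(frechet_distance_diag(real, shifted))   # 1.0
```

The shuffled set scores a perfect 0.0 even though every image is paired with the wrong prompt, which is why a distribution-level metric has to be complemented with prompt-adherence checks.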
What interviewers ask
- Why can FID/FVD disagree with user preference?
- Which dimensions does VBench decompose video quality into?
- How would you evaluate a ControlNet feature?
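For the ControlNet question, one common angle is control fidelity: re-extract the control signal (edges, depth, pose) from the generated image with the same detector and compare it with the input condition. The sketch below assumes the edge maps have already been extracted and binarized; `edge_iou` and the toy masks are hypothetical names and data for illustration, and a real pipeline would also need image-quality and prompt-adherence checks alongside it.

```python
def edge_iou(control_edges, output_edges):
    """Intersection-over-union of two binary edge maps given as nested lists.

    Both maps are assumed to have the same height and width.
    Returns 1.0 when both maps are empty (nothing to violate).
    """
    intersection = 0
    union = 0
    for control_row, output_row in zip(control_edges, output_edges):
        for c, o in zip(control_row, output_row):
            if c and o:
                intersection += 1
            if c or o:
                union += 1
    return intersection / union if union else 1.0


# Toy 2x3 masks: the input condition and edges re-extracted from the output.
control = [[1, 1, 0],
           [0, 0, 0]]
output = [[1, 0, 0],
          [0, 1, 0]]

# One overlapping edge pixel out of three in the union.
print(round(edge_iou(control, output), 3))
```

A low score flags samples where the model ignored the control signal; the threshold for "ignored" is a judgment call that should be calibrated against human review.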
Practical task
Create an evaluation matrix for 20 prompts spanning image, video and control tasks. For each prompt, specify automatic metrics where feasible, a human rating rubric and a failure taxonomy.
Source-grounded rule
Automatic GenAI metrics are proxies: publish their limitations and complement them with human preference studies or a task-specific rubric.
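A possible skeleton for such a matrix is sketched below. Everything in it is an assumption made for illustration: the 8/6/6 split across tasks, the metric assignments, the rubric dimensions, the failure tags, and the `EvalRow` and `build_matrix` names. The point is only the shape: one row per prompt, with slots for automatic metrics, human rubric scores and failure labels.

```python
from dataclasses import dataclass, field

# Hypothetical assignment of automatic metrics per task type.
AUTOMATIC_METRICS = {
    "image": ["GenEval-style object/attribute checks"],
    "video": ["VBench dimension scores"],
    "control": ["control-signal agreement on re-extracted edges/depth/pose"],
}

# Hypothetical human rubric dimensions, each rated on a 1-5 scale.
HUMAN_RUBRIC = {
    "image": ["prompt adherence", "visual quality", "safety"],
    "video": ["prompt adherence", "visual quality", "temporal consistency", "safety"],
    "control": ["prompt adherence", "control adherence", "visual quality", "safety"],
}

# Hypothetical failure taxonomy that reviewers tag from.
FAILURE_TAXONOMY = [
    "missing object", "wrong attribute", "flicker",
    "identity drift", "control ignored", "unsafe content",
]


@dataclass
class EvalRow:
    prompt_id: str
    task: str
    automatic_metrics: list
    rubric_dimensions: list
    rubric_scores: dict = field(default_factory=dict)  # filled in by reviewers
    failure_tags: list = field(default_factory=list)   # subset of FAILURE_TAXONOMY


def build_matrix(task_counts):
    """Create one empty evaluation row per prompt slot."""
    rows = []
    for task, count in task_counts.items():
        for i in range(count):
            rows.append(EvalRow(
                prompt_id=f"{task}-{i + 1:02d}",
                task=task,
                automatic_metrics=list(AUTOMATIC_METRICS[task]),
                rubric_dimensions=list(HUMAN_RUBRIC[task]),
            ))
    return rows


matrix = build_matrix({"image": 8, "video": 6, "control": 6})
print(len(matrix), matrix[0].prompt_id, matrix[-1].prompt_id)
```

Set-level metrics such as FID or FVD do not fit into a per-prompt row; they would be reported once per task over a larger sample, next to the matrix rather than inside it.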
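One way to make the "proxy" claim measurable is to check how well an automatic metric ranks systems or samples compared with human ratings. The sketch below computes Spearman rank correlation with the no-ties formula. The scores are made-up illustration data, and the function names are hypothetical; with tied values an average-rank variant would be needed.

```python
def ranks(values):
    """Rank positions (1 = smallest). Assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_no_ties(xs, ys):
    """Spearman rank correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Made-up data: an automatic score (higher is better) and mean human ratings
# for the same four model checkpoints.
automatic_scores = [0.9, 0.7, 0.5, 0.3]
human_ratings = [4.5, 3.0, 3.5, 1.0]

print(round(spearman_no_ties(automatic_scores, human_ratings), 3))  # 0.8
```

For metrics where lower is better, such as FID or FVD, negate the scores before ranking. A weak or unstable correlation on the regression prompt set is a limitation worth publishing with the metric.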