ML System Design

You collected months of human-reviewer decisions for task outputs. How could you use this data to improve the automatic checker?


First formulate your answer as you would in an interview, then open the breakdown and assess yourself.


Turn each reviewed item (task, spec, worker output, reviewer-marked errors) into a supervised example. Clean and deduplicate the labels, split by time, task and customer, then train classifiers or rerankers, or fine-tune an LLM for structured error detection.
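A minimal sketch of the data-preparation step, assuming a hypothetical `ReviewRecord` schema and a hypothetical `TAXONOMY` mapping from raw label names to normalized error types. Deduplication here hashes case- and whitespace-normalized text, so it only catches trivial variants; genuinely near-identical tasks would need a fuzzier method such as MinHash or embedding similarity.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class ReviewRecord:
    # Hypothetical schema for one human-reviewer decision.
    task_id: str
    customer: str
    spec: str
    output: str
    decision: str           # "accept" or "reject"
    errors: tuple           # raw reviewer error labels
    reviewed_on: date
    disputed: bool = False  # e.g. reviewers disagreed or the label was overturned

# Hypothetical mapping from legacy label names to one normalized taxonomy.
TAXONOMY = {"fmt": "formatting", "format": "formatting",
            "wrong_fact": "factual", "hallucination": "factual"}

def _norm(text: str) -> str:
    # Lowercase and collapse whitespace so trivial variants hash identically.
    return re.sub(r"\s+", " ", text.strip().lower())

def dedup_key(rec: ReviewRecord) -> str:
    payload = _norm(rec.spec) + "\x00" + _norm(rec.output)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_examples(records):
    """Drop disputed labels, deduplicate, and normalize error types."""
    seen, examples = set(), []
    for rec in sorted(records, key=lambda r: r.reviewed_on):
        if rec.disputed:
            continue
        key = dedup_key(rec)
        if key in seen:  # keep only the earliest copy of a duplicate
            continue
        seen.add(key)
        examples.append({
            "input": {"spec": rec.spec, "output": rec.output},
            "target": {"decision": rec.decision,
                       "errors": sorted({TAXONOMY.get(e, e) for e in rec.errors})},
            "customer": rec.customer,
            "reviewed_on": rec.reviewed_on,
        })
    return examples

def time_split(examples, cutoff: date):
    # Train on the past, evaluate on the future.
    train = [e for e in examples if e["reviewed_on"] < cutoff]
    test = [e for e in examples if e["reviewed_on"] >= cutoff]
    return train, test

def customer_holdout_split(examples, holdout: set):
    # Evaluate on customers never seen in training.
    train = [e for e in examples if e["customer"] not in holdout]
    test = [e for e in examples if e["customer"] in holdout]
    return train, test
```

Keeping the earliest duplicate is a simplification; if duplicates carry conflicting decisions, treating them as disputed is usually safer. The two split functions can also be combined, for example a time cutoff plus a set of held-out customers.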

Full breakdown

Reviewer data can become supervised training data: the input is the task spec, the worker output and any supporting evidence; the target is the reviewer decision plus a structured list of errors. Before training, normalize the error taxonomy (labels collected over months may use inconsistent names), remove low-quality or disputed labels, deduplicate near-identical tasks, and mask or remove customer-sensitive fields.

Start with simpler models where possible: error-type classifiers, risk-scoring models, or retrieval of similar past failures. For LLMs, supervised fine-tuning can teach the desired output schema and error wording, while preference data (pairs of better and worse checker responses) can teach the model to prefer clearer explanations and to penalize false accepts.

Validation must be time-aware and category-aware. Split by time, customer or task type so the model is evaluated on unseen work rather than rewarded for memorizing recurring templates. Track the false accept rate, false reject rate, manual-review load and per-category degradation. Keep humans in the loop by routing uncertain or high-risk outputs to reviewers.
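The routing and validation metrics can be sketched as follows, assuming the checker emits a hypothetical `p_reject` score and that the reviewer decision is ground truth. The thresholds, the category names and the exact metric definitions are illustrative choices: here the false accept rate is the share of reviewer-rejected outputs that the checker auto-accepted, the false reject rate is the share of reviewer-accepted outputs it auto-rejected, and outputs sent to a human count toward manual-review load rather than as errors.

```python
from collections import defaultdict

def route(p_reject: float, accept_below: float = 0.2, reject_above: float = 0.8) -> str:
    # Hypothetical thresholds: confident scores are handled automatically,
    # the uncertain middle band goes to a human reviewer.
    if p_reject < accept_below:
        return "accept"
    if p_reject > reject_above:
        return "reject"
    return "human"

def _rate(num: int, den: int):
    # None means "undefined" (e.g. no reviewer-rejected outputs in a category).
    return num / den if den else None

def checker_metrics(rows, **thresholds):
    """rows: iterable of (category, p_reject, reviewer_decision) tuples."""
    counts = defaultdict(lambda: {"n": 0, "bad": 0, "good": 0, "fa": 0, "fr": 0, "human": 0})
    for category, p_reject, truth in rows:
        action = route(p_reject, **thresholds)
        # Accumulate per category and for the whole dataset.
        for key in (category, "overall"):
            c = counts[key]
            c["n"] += 1
            c["human"] += int(action == "human")
            if truth == "reject":
                c["bad"] += 1
                c["fa"] += int(action == "accept")  # bad output slipped through
            else:
                c["good"] += 1
                c["fr"] += int(action == "reject")  # good output blocked
    return {
        key: {
            "false_accept_rate": _rate(c["fa"], c["bad"]),
            "false_reject_rate": _rate(c["fr"], c["good"]),
            "manual_review_load": _rate(c["human"], c["n"]),
        }
        for key, c in counts.items()
    }
```

Running the same metrics for the current checker and a candidate on the same time-split test set makes per-category degradation visible even when the overall numbers improve. Widening the human band lowers both error rates at the cost of a higher manual-review load.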