ML System Design

What should the output schema of an automatic task checker look like if humans also produce lists of found errors?

Use a structured list of error objects with type, severity, location/evidence, explanation and suggested action, plus an overall decision. This makes human/model comparison and downstream operations possible.

Full breakdown

A checker should produce structured evidence, not just free-form text. A useful schema has three top-level fields: overall_decision, confidence, and errors[]. Each error carries a type, severity, affected artifact or location, an evidence quote or pointer, an explanation and a suggested fix.

The taxonomy of error types should be stable enough to compute metrics over time. Examples: missing file, inaccessible link, format violation, factual mismatch, hallucination, instruction mismatch, fraud/spam and low-quality output. Free text can remain in the explanation, but type and location should be machine-readable.

Because human reviewers also produce lists of found errors, the same schema supports evaluation against them. You can compare sets of error objects by type and location, count false accepts and false rejects on the overall decision, inspect disagreements and route specific error types to specialized follow-up checks.
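One possible shape for such a schema, together with a checker-versus-human comparison, is sketched below in Python. All class, field and function names here (ErrorType, FoundError, CheckResult, compare) are hypothetical, the three-level severity scale and the "accept"/"reject" decision values are assumptions, and matching errors by exact (type, location) equality is a deliberate simplification; a real system would likely need fuzzier location matching.

```python
from dataclasses import dataclass, field
from enum import Enum


class ErrorType(str, Enum):
    # Stable taxonomy taken from the examples in the text.
    MISSING_FILE = "missing_file"
    INACCESSIBLE_LINK = "inaccessible_link"
    FORMAT_VIOLATION = "format_violation"
    FACTUAL_MISMATCH = "factual_mismatch"
    HALLUCINATION = "hallucination"
    INSTRUCTION_MISMATCH = "instruction_mismatch"
    FRAUD_SPAM = "fraud_spam"
    LOW_QUALITY = "low_quality"


@dataclass(frozen=True)
class FoundError:
    type: ErrorType
    severity: str        # assumed scale: "low" | "medium" | "high"
    location: str        # pointer to the affected artifact, e.g. "report.md#L12"
    evidence: str        # quote or pointer backing the finding
    explanation: str     # free text, for humans
    suggested_fix: str = ""

    def key(self):
        # Machine-readable identity used for matching (assumption: exact match).
        return (self.type, self.location)


@dataclass
class CheckResult:
    overall_decision: str  # assumed values: "accept" | "reject"
    confidence: float
    errors: list[FoundError] = field(default_factory=list)


def compare(checker: CheckResult, human: CheckResult) -> dict:
    """Compare checker output to a human review, treating the human as reference."""
    c = {e.key() for e in checker.errors}
    h = {e.key() for e in human.errors}
    return {
        "matched": c & h,
        "missed_by_checker": h - c,
        "extra_from_checker": c - h,
        "false_accept": checker.overall_decision == "accept"
        and human.overall_decision == "reject",
        "false_reject": checker.overall_decision == "reject"
        and human.overall_decision == "accept",
    }


# Illustrative data only; file names and findings are made up.
human = CheckResult("reject", 1.0, [
    FoundError(ErrorType.MISSING_FILE, "high", "submission/data.csv",
               "", "Required file is absent"),
    FoundError(ErrorType.FORMAT_VIOLATION, "low", "report.md#L12",
               "## results", "Heading does not follow the template"),
])
checker = CheckResult("accept", 0.62, [
    FoundError(ErrorType.FORMAT_VIOLATION, "low", "report.md#L12",
               "## results", "Heading style differs from template"),
])

diff = compare(checker, human)
```

In this example the two reviews agree on the format violation even though their explanations differ, because matching ignores free text. The missing file shows up under missed_by_checker, and the pair counts as a false accept, which is the kind of disagreement worth routing to a specialized follow-up check.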