Metrics question

A retail video analytics model should flag suspicious behavior, but humans do not fully agree on what “suspicious” means. How would you define success and evaluate whether the system is doing a good job?

Short answer

First turn “suspicious” into operational categories and severity levels, measure human agreement, and build a labeled review set with adjudication. Then tune precision and recall with thresholds set by severity, and track downstream business outcomes.

Full breakdown

If humans disagree, the first task is not model selection; it is label design. Define the categories of suspicious behavior, their severity, the action each one requires, and the non-goals. Measure inter-annotator agreement (for example with Cohen's or Fleiss' kappa) and route hard examples to adjudication by a third party or a senior reviewer.
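As an illustration of the agreement measurement, here is a minimal sketch of Cohen's kappa for two annotators who labeled the same clips. The category names (`concealment`, `loitering`, `normal`) and the sample labels are invented for the example; a real project would use its own label taxonomy and, with more than two annotators, a statistic such as Fleiss' kappa.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same clips."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of clips on which the two annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six clips.
annotator_1 = ["concealment", "normal", "normal", "loitering", "normal", "concealment"]
annotator_2 = ["concealment", "normal", "loitering", "loitering", "normal", "normal"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")  # kappa = 0.478
```

Raw agreement here is 4 of 6 clips, but kappa is lower because part of that agreement would occur by chance. Clips on which annotators disagree are natural candidates for the adjudication queue.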

Evaluation should mix ML metrics and product metrics. Offline, use a stratified video set across stores, camera positions, time of day and traffic level. Track precision, recall, false alarms per hour, missed high-severity incidents, calibration and performance by subgroup or environment. If labels remain ambiguous, report soft labels or agreement-weighted metrics rather than pretending there is one perfect ground truth.
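One possible way to compute agreement-weighted metrics is sketched below. It assumes each clip carries a soft label equal to the fraction of annotators who marked it suspicious, and it counts an alert on that clip as a partial true positive and a partial false alarm. This weighting scheme, the function name `soft_metrics`, and the sample data are assumptions for illustration, not a standard definition.

```python
def soft_metrics(alerts, agreement, hours_of_video):
    """Agreement-weighted precision, recall and false alarms per hour.

    alerts: list of bool, True if the model raised an alert on the clip.
    agreement: list of float in [0, 1], fraction of annotators who
        labeled the clip suspicious (the soft label).
    hours_of_video: total duration of the evaluated footage.
    """
    if len(alerts) != len(agreement):
        raise ValueError("alerts and agreement must have the same length")
    pairs = list(zip(alerts, agreement))
    tp = sum(p for fired, p in pairs if fired)        # alerted, weighted by agreement
    fp = sum(1 - p for fired, p in pairs if fired)    # alerted, weighted by disagreement
    fn = sum(p for fired, p in pairs if not fired)    # missed, weighted by agreement
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_alarms_per_hour": fp / hours_of_video,
    }


# Hypothetical evaluation set: five clips from two hours of footage.
alerts = [True, True, False, True, False]
agreement = [1.0, 0.5, 0.5, 0.0, 0.0]
metrics = soft_metrics(alerts, agreement, hours_of_video=2.0)
print(metrics)
# {'precision': 0.5, 'recall': 0.75, 'false_alarms_per_hour': 0.75}
```

Running the same function separately per store, camera position, or time-of-day stratum gives the subgroup breakdown, and a clip with 50% agreement no longer counts as a full error in either direction.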

Online, measure analyst workload, alert acceptance rate, time to incident review, customer and store outcomes, and appeal rate. Thresholds should be risk-based: high-confidence severe events can alert immediately; uncertain low-severity events can go to passive review or sampling. Monitoring must watch for drift caused by store layout, seasonality, camera changes and policy changes.
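The risk-based routing described above could look roughly like the following sketch. The severity names, the threshold values, and the three destinations (`immediate_alert`, `passive_review`, `sampling_pool`) are all hypothetical; real thresholds would be derived from validation data and store policy.

```python
# Hypothetical thresholds per severity; real values would come from validation data.
THRESHOLDS = {
    "high": {"alert": 0.80, "review": 0.40},
    "medium": {"alert": 0.90, "review": 0.60},
    "low": {"alert": None, "review": 0.75},  # low severity never interrupts staff
}


def route(severity: str, score: float) -> str:
    """Decide where an event goes based on its severity and model confidence."""
    limits = THRESHOLDS[severity]
    if limits["alert"] is not None and score >= limits["alert"]:
        return "immediate_alert"   # confident and severe enough to notify now
    if score >= limits["review"]:
        return "passive_review"    # queued for an analyst, no interruption
    return "sampling_pool"         # only a random sample gets audited


print(route("high", 0.85))  # immediate_alert
print(route("high", 0.50))  # passive_review
print(route("low", 0.99))   # passive_review
print(route("low", 0.50))   # sampling_pool
```

Keeping the thresholds in a table keyed by severity makes it straightforward to add a per-store override later without touching the routing logic.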

Theory

Ambiguous-label systems need operational definitions, agreement measurement and risk-calibrated decision thresholds before model metrics become meaningful.

Common mistakes

  • Optimize accuracy on a noisy label without defining the action.
  • Ignore annotator disagreement.
  • Use one global threshold for all severities and stores.
  • Forget false alarms per hour and reviewer workload.
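To make the last two mistakes concrete, the sketch below picks a score threshold from validation data so that false alarms per hour stay within a budget, and does so separately for each budget rather than once globally. The function name `pick_threshold`, the budgets, and the validation data are assumptions for illustration; in practice each severity and store would have its own validation set.

```python
def pick_threshold(scores, is_incident, hours_of_video, max_false_alarms_per_hour):
    """Lowest score threshold whose false-alarm rate stays within the budget.

    Returns None if even the strictest candidate threshold exceeds the budget.
    """
    if len(scores) != len(is_incident):
        raise ValueError("scores and is_incident must have the same length")
    # Candidates in ascending order: false alarms can only drop as the
    # threshold rises, so the first one within budget keeps the most recall.
    for threshold in sorted(set(scores)):
        false_alarms = sum(
            1 for s, y in zip(scores, is_incident) if s >= threshold and not y
        )
        if false_alarms / hours_of_video <= max_false_alarms_per_hour:
            return threshold
    return None


# Hypothetical validation clips from two hours of footage.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40]
is_incident = [True, False, True, False, True, False]

# A looser budget where misses are costly, a stricter one where they are not.
high_severity = pick_threshold(scores, is_incident, 2.0, max_false_alarms_per_hour=1.0)
low_severity = pick_threshold(scores, is_incident, 2.0, max_false_alarms_per_hour=0.0)
print(high_severity, low_severity)  # 0.6 0.95
```

The budget itself is a product decision: it should reflect how many alerts a reviewer can realistically handle per hour.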

How to answer in an interview

  • Start by defining the label and action.
  • Bring up inter-annotator agreement and adjudication.