Metrics for a fraud classifier with asymmetric error costs
Short answer
Use ranking metrics such as PR-AUC/ROC-AUC for model comparison, but choose the operating threshold by business cost, capacity and required precision/recall. With rare fraud, PR-AUC and precision at review capacity are often more informative than accuracy.
Full explanation
Fraud is usually imbalanced, so accuracy is a weak metric. A model can look accurate by predicting "not fraud" almost everywhere. For model comparison, use ROC-AUC if class balance is moderate, and PR-AUC, precision@k, recall@k or lift when positives are rare.
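To make the accuracy trap concrete, here is a minimal sketch in plain Python. The toy labels, scores, and the helper names `accuracy`, `recall`, and `average_precision` are illustrative assumptions, not part of any library. `average_precision` uses the common step-wise definition (mean of the precision at the rank of each positive), which is one way to summarize the PR curve.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Share of true fraud cases that were flagged."""
    positives = sum(y_true)
    caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return caught / positives if positives else 0.0


def average_precision(y_true, scores):
    """Step-wise average precision: mean precision at each positive's rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


# Hypothetical data: 1000 transactions, 10 of them fraud (1%).
y_imbalanced = [1] * 10 + [0] * 990
always_not_fraud = [0] * 1000

acc = accuracy(y_imbalanced, always_not_fraud)  # 0.99, looks excellent
rec = recall(y_imbalanced, always_not_fraud)    # 0.0, catches no fraud

# Small ranking example: positives sit at ranks 1 and 3.
y_small = [1, 0, 1, 0, 0]
s_small = [0.9, 0.8, 0.7, 0.6, 0.5]
ap = average_precision(y_small, s_small)        # (1/1 + 2/3) / 2 = 5/6
```

The "always not fraud" baseline scores 99% accuracy while catching nothing, which is why ranking metrics that focus on the positive class are the more useful comparison here.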
The production threshold should be chosen from the business trade-off. A false positive may block or review a good user; a false negative may let fraud through. If the approximate costs are known, choose the threshold that minimizes expected cost:
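expected cost(t) = FP(t) * cost_fp + FN(t) * cost_fn,
where FP(t) and FN(t) are the false-positive and false-negative counts at threshold t on a validation set, and cost_fp and cost_fn are the per-error costs.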
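The expected-cost rule above can be sketched as a simple threshold sweep. This is a minimal illustration under stated assumptions: the labels, scores, and cost values are made up, a transaction is flagged when its score is at or above the threshold, and `expected_cost` and `best_threshold` are hypothetical helper names.

```python
def expected_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Total cost of errors when flagging every score >= threshold."""
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    return fp * cost_fp + fn * cost_fn


def best_threshold(y_true, scores, cost_fp, cost_fn):
    """Try each observed score (plus 'flag nothing') and keep the cheapest."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: expected_cost(y_true, scores, t, cost_fp, cost_fn),
    )


# Hypothetical validation data: 1 = fraud, 0 = legitimate.
y_val = [0, 0, 1, 0, 1, 1]
s_val = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]

# Missing fraud is 10x worse than a false alarm: the threshold drops.
t_fraud_expensive = best_threshold(y_val, s_val, cost_fp=1, cost_fn=10)

# Blocking a good user is 10x worse than missing fraud: the threshold rises.
t_block_expensive = best_threshold(y_val, s_val, cost_fp=10, cost_fn=1)
```

On this toy data the chosen threshold moves from 0.35 to 0.7 when the cost ratio is inverted, even though the model and its scores are unchanged. In practice the costs are rarely constant per transaction, so a per-amount cost would be a natural extension.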
If there is a manual review team, use capacity constraints such as top-k alerts per day and track precision at that capacity. If regulation or user experience imposes limits, add guardrails such as maximum false-positive rate for trusted users.
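For the fixed-capacity case, a minimal sketch of precision and recall at k follows. It assumes the review team handles the k highest-scored alerts per period; the function name `precision_recall_at_k` and the toy data are illustrative assumptions.

```python
def precision_recall_at_k(y_true, scores, k):
    """Precision and recall when only the k highest-scored items are reviewed."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    reviewed = order[:k]
    hits = sum(y_true[i] for i in reviewed)
    total_fraud = sum(y_true)
    precision = hits / len(reviewed) if reviewed else 0.0
    recall = hits / total_fraud if total_fraud else 0.0
    return precision, recall


# Hypothetical day of traffic: 8 transactions, 3 of them fraud.
y_day = [1, 0, 1, 0, 0, 1, 0, 0]
s_day = [0.95, 0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.05]

# The team can review 3 alerts per day.
p_at_3, r_at_3 = precision_recall_at_k(y_day, s_day, k=3)
```

Here two of the three reviewed alerts are fraud, so precision at capacity is 2/3, and two of the three fraud cases are caught, so recall is also 2/3. Tracking this pair at the real capacity answers the question the review team actually cares about: how much of their time is spent on true fraud.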
Always validate on a split that matches deployment time and population. Fraud patterns shift, so threshold and calibration should be monitored after launch.
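One simple way to match deployment time is an out-of-time split: train on everything before a cutoff date and validate on everything after it. The sketch below assumes each record is a dict with a hypothetical `ts` field; the field name, the records, and the `time_split` helper are illustrative.

```python
from datetime import date


def time_split(rows, cutoff):
    """Split records into (before cutoff, on or after cutoff) by timestamp."""
    train = [r for r in rows if r["ts"] < cutoff]
    holdout = [r for r in rows if r["ts"] >= cutoff]
    return train, holdout


# Hypothetical transaction log.
rows = [
    {"ts": date(2024, 1, 5), "is_fraud": 0},
    {"ts": date(2024, 2, 11), "is_fraud": 1},
    {"ts": date(2024, 3, 2), "is_fraud": 0},
    {"ts": date(2024, 3, 20), "is_fraud": 1},
]

train_rows, holdout_rows = time_split(rows, cutoff=date(2024, 3, 1))
```

Unlike a random split, this keeps future fraud patterns out of training, so the measured metrics and the chosen threshold are closer to what the model will see after launch.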
Theory
The metric should match the decision: ranking quality, fixed-capacity review, or cost-sensitive automatic blocking are different objectives.
Common mistakes
- Optimizing accuracy on an imbalanced fraud dataset.
- Reporting only ROC-AUC and never discussing the operating threshold.
- Ignoring the cost difference between blocking good users and missing fraud.
How to answer in an interview
- Ask what action follows the score: manual review, block, or soft friction.
- Mention PR-AUC or precision@k when fraud is rare.