Quality metrics for punctuation and capitalization restoration
Short answer
Report token-level accuracy together with per-class precision, recall and F1 for both punctuation and capitalization, then add sentence-level readability checks and evaluation on domain slices.
Full breakdown
At the token level, punctuation restoration is a multiclass classification problem (which mark, if any, follows each token), and capitalization is usually a binary or small multiclass problem (for example lowercase, initial capital, all caps). Report accuracy, macro and micro F1, and per-class precision and recall, because comma, period and question-mark errors differ in both frequency and cost.
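The metrics named above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the label names (`O` for "no punctuation", `COMMA`, `PERIOD`, `QUESTION`), the function name `punctuation_report` and the toy sequences are all assumptions made for the example. It also assumes that gold and predicted labels are already aligned token by token, and it excludes the `O` class from the F1 averages, which is one common convention rather than the only one.

```python
from collections import Counter

def punctuation_report(gold, pred, none_label="O"):
    """Token accuracy plus per-class precision/recall/F1 for aligned labels."""
    assert len(gold) == len(pred), "sequences must be aligned token by token"
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # class p was predicted where it does not belong
            fn[g] += 1  # the true class g was missed
    # Score only real punctuation classes; the "no punctuation" label is excluded.
    classes = sorted((set(gold) | set(pred)) - {none_label})
    per_class = {}
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    # Macro F1: unweighted mean over classes, so rare classes count equally.
    macro_f1 = (sum(m["f1"] for m in per_class.values()) / len(per_class)
                if per_class else 0.0)
    # Micro F1: pool true positives and errors over all punctuation classes.
    tp_sum = sum(tp[c] for c in classes)
    err_sum = sum(fp[c] + fn[c] for c in classes)
    micro_f1 = 2 * tp_sum / (2 * tp_sum + err_sum) if tp_sum + err_sum else 0.0
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold) if gold else 0.0
    return {"accuracy": accuracy, "macro_f1": macro_f1,
            "micro_f1": micro_f1, "per_class": per_class}

# Hypothetical toy data: one label per token position.
gold = ["O", "COMMA", "O", "O", "PERIOD", "O", "O", "QUESTION"]
pred = ["O", "O", "O", "O", "PERIOD", "O", "COMMA", "PERIOD"]
report = punctuation_report(gold, pred)
```

On this toy input the accuracy is 0.625 while macro F1 is roughly 0.22, because the comma and the question mark are never predicted correctly. The same function can be reused for capitalization by passing capitalization labels and the label that means "leave lowercase" as `none_label`.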
Token accuracy alone can be misleading when most positions carry no punctuation, because a model that rarely predicts any mark can still score high. For rare punctuation classes, per-class recall matters most. For user-facing quality, also evaluate sentence-level readability, the over-punctuation rate, missing sentence breaks, and whether named entities and acronyms are capitalized correctly.
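Two of the readability-oriented checks above can be approximated directly from token labels. The definitions below are assumptions chosen for illustration, not standard ones: the over-punctuation rate is taken as the share of gold "no punctuation" positions where the model inserted a mark, and the missed-sentence-break rate as the share of gold sentence-final marks where the model predicted anything other than a sentence-final mark. Label names and the helper `readability_error_rates` are hypothetical.

```python
SENTENCE_END = {"PERIOD", "QUESTION"}  # assumed set of sentence-final labels

def readability_error_rates(gold, pred, none_label="O"):
    """Over-punctuation and missed-sentence-break rates for aligned labels."""
    # Predictions at positions where the reference has no punctuation.
    at_none = [p for g, p in zip(gold, pred) if g == none_label]
    # Predictions at positions where the reference ends a sentence.
    at_break = [p for g, p in zip(gold, pred) if g in SENTENCE_END]
    over = (sum(p != none_label for p in at_none) / len(at_none)
            if at_none else 0.0)
    missed = (sum(p not in SENTENCE_END for p in at_break) / len(at_break)
              if at_break else 0.0)
    return {"over_punctuation_rate": over,
            "missed_sentence_break_rate": missed}

# Hypothetical toy data: one spurious comma, one missed period.
gold = ["O", "O", "PERIOD", "O", "COMMA", "O", "QUESTION"]
pred = ["COMMA", "O", "O", "O", "COMMA", "O", "PERIOD"]
rates = readability_error_rates(gold, pred)
```

Here a question mark replaced by a period still counts as a preserved sentence break, which reflects the idea that the two errors cost the reader differently. Such label-based proxies complement human readability judgments rather than replace them.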
Slice the evaluation by language, domain, speaker style, ASR confidence, text length and noisy terms. If users complain online while offline metrics look good, the issue may be domain shift or a metric that does not capture readability.
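Slicing amounts to grouping examples by a metadata field and computing the same metric per group. The sketch below assumes each example is a dict with aligned `gold` and `pred` label lists plus metadata such as `domain`. These field names, the helper names and the data are hypothetical, and micro F1 over punctuation classes stands in for whatever metric the team actually tracks.

```python
from collections import defaultdict

def micro_f1(gold, pred, none_label="O"):
    """Micro F1 over punctuation classes for aligned label sequences."""
    pairs = list(zip(gold, pred))
    tp = sum(g == p and g != none_label for g, p in pairs)
    fp = sum(p != none_label and g != p for g, p in pairs)
    fn = sum(g != none_label and g != p for g, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def f1_by_slice(examples, key):
    """Pool labels within each value of a metadata field, then score each pool."""
    buckets = defaultdict(lambda: ([], []))
    for ex in examples:
        gold_pool, pred_pool = buckets[ex[key]]
        gold_pool.extend(ex["gold"])
        pred_pool.extend(ex["pred"])
    return {name: micro_f1(g, p) for name, (g, p) in buckets.items()}

# Hypothetical evaluation set with a "domain" metadata field.
examples = [
    {"domain": "news", "gold": ["O", "COMMA", "O", "PERIOD"],
     "pred": ["O", "COMMA", "O", "PERIOD"]},
    {"domain": "calls", "gold": ["O", "O", "QUESTION"],
     "pred": ["O", "COMMA", "PERIOD"]},
    {"domain": "calls", "gold": ["O", "PERIOD"],
     "pred": ["O", "PERIOD"]},
]
scores = f1_by_slice(examples, "domain")
```

In this toy set the `news` slice scores 1.0 and the `calls` slice 0.4, a gap that a single pooled number would blur. The same grouping works for any bucketed field, for example ASR confidence bins or text-length bins.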
Theory
Class imbalance makes aggregate accuracy weak for punctuation; per-class and slice metrics reveal real errors.
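A quick arithmetic check shows why. The label distribution below is invented purely for illustration (90 positions without punctuation out of 100), and the "model" is a trivial baseline that never predicts any mark.

```python
def accuracy(gold, pred):
    """Share of positions where the predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

# Invented distribution: 90% of positions carry no punctuation.
gold = ["O"] * 90 + ["COMMA"] * 6 + ["PERIOD"] * 3 + ["QUESTION"] * 1
always_none = ["O"] * len(gold)  # baseline that never inserts a mark

baseline_accuracy = accuracy(gold, always_none)

# Recall over all real punctuation positions for the same baseline.
punct_positions = [(g, p) for g, p in zip(gold, always_none) if g != "O"]
punct_recall = sum(g == p for g, p in punct_positions) / len(punct_positions)
```

The baseline reaches 0.9 accuracy while its recall on every punctuation class is 0.0, so the aggregate number says almost nothing about whether punctuation is restored.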
Common mistakes
- Reporting only overall accuracy when most labels are “no punctuation”.
- Treating comma and period errors as equally important without checking product impact.
- Ignoring ASR confidence and domain slices.
- Skipping human readability evaluation.
How to answer in an interview
- Call out class imbalance immediately.
- Add a human/readability metric beyond token labels.