ASR+LLM baseline for extracting a customer booking from a call

Short answer

A pragmatic baseline runs ASR to produce a transcript, then passes it to an extraction model or LLM that returns accepted/rejected, a normalized branch and a normalized datetime. The current date and the branch catalog are supplied in the context.

Full breakdown

Start from the output contract. For every call, return whether the customer accepted the offer. If accepted, also return the branch id, the normalized date, the one-hour slot, a confidence score and the evidence span in the transcript. If the call is rejected or ambiguous, do not create a booking automatically.
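The output contract above can be sketched as a small typed record plus a gate for automatic booking. This is a minimal illustration, not a fixed schema: the field names, the `can_auto_book` helper and the 0.8 confidence threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class CallExtraction:
    """Hypothetical per-call output; field names are illustrative."""
    accepted: Optional[bool]                 # None means the call was ambiguous
    branch_id: Optional[str] = None          # id from the branch catalog
    booking_date: Optional[date] = None      # normalized absolute date
    slot_start_hour: Optional[int] = None    # one-hour slot: 14 means 14:00-15:00
    confidence: float = 0.0
    evidence_span: Optional[Tuple[int, int]] = None  # character offsets in the transcript


def can_auto_book(result: CallExtraction, min_confidence: float = 0.8) -> bool:
    """Allow automatic booking only for accepted, complete, confident results."""
    if result.accepted is not True:
        return False  # rejected or ambiguous: never book automatically
    complete = (
        result.branch_id is not None
        and result.booking_date is not None
        and result.slot_start_hour is not None
        and 0 <= result.slot_start_hour <= 23
        and result.evidence_span is not None
    )
    return complete and result.confidence >= min_confidence
```

Keeping `accepted` three-valued (true, false, unknown) makes the "ambiguous" case explicit instead of forcing it into a rejection.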

The baseline pipeline is VAD or call segmentation, then ASR, then transcript cleanup, then information extraction. For a quick first version, an LLM can read the transcript together with the current date and the branch catalog and return structured JSON. It should resolve relative dates and time phrases to absolute values and map address mentions to branch ids from the catalog.
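A minimal sketch of the extraction step, assuming the transcript is already available from ASR. The `llm` argument is a hypothetical callable that takes a prompt string and returns the model's text reply; the prompt wording, the JSON keys and the `needs_review` flag are illustrative assumptions, not a specific vendor API.

```python
import json
from datetime import date
from typing import Callable, Dict

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def build_prompt(transcript: str, call_date: date, branches: Dict[str, str]) -> str:
    """Put the call date and the branch catalog into the LLM context."""
    catalog = "\n".join(f"- {bid}: {addr}" for bid, addr in sorted(branches.items()))
    weekday = WEEKDAYS[call_date.weekday()]
    return (
        "Extract the booking decision from the call transcript.\n"
        f"Call date: {call_date.isoformat()} ({weekday})\n"
        f"Branch catalog:\n{catalog}\n"
        "Reply with JSON only, using the keys: accepted (true, false or null), "
        "branch_id, date (YYYY-MM-DD), slot_start_hour (0-23), "
        "confidence (0-1), evidence.\n"
        f"Transcript:\n{transcript}"
    )


def extract(transcript: str, call_date: date, branches: Dict[str, str],
            llm: Callable[[str], str]) -> dict:
    """Call the (hypothetical) LLM and validate its reply against the catalog."""
    raw = llm(build_prompt(transcript, call_date, branches))
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"accepted": None, "needs_review": True}
    if not isinstance(data, dict):
        return {"accepted": None, "needs_review": True}
    # An accepted booking must point at a branch that exists in the catalog.
    unknown_branch = data.get("accepted") is True and data.get("branch_id") not in branches
    data["needs_review"] = unknown_branch
    return data
```

Validating the reply against the catalog after the call catches hallucinated branch ids, which is cheaper than trying to prevent them through prompt wording alone.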

The main risks are ambiguous dates, similar branch addresses, customers changing their mind, multiple proposed times, ASR errors and slot conflicts. Use the booking table that operators fill in manually as ground-truth labels. Evaluate accepted/rejected classification separately from branch accuracy and datetime accuracy, and route low-confidence cases to human review.
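The separate evaluation and the review routing could look like the sketch below. The dictionary keys, the choice to score branch and datetime only on calls the operator marked as accepted, and the 0.8 threshold are assumptions for illustration.

```python
from typing import Dict, List, Optional


def evaluate(preds: List[dict], labels: List[dict]) -> Dict[str, Optional[float]]:
    """Score the decision, the branch and the datetime as three separate metrics."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    decision_hits = branch_hits = datetime_hits = booked = 0
    for pred, gold in zip(preds, labels):
        decision_hits += pred.get("accepted") == gold["accepted"]
        if gold["accepted"]:
            # Branch and datetime are only defined for calls with a real booking.
            booked += 1
            branch_hits += pred.get("branch_id") == gold["branch_id"]
            datetime_hits += (
                (pred.get("date"), pred.get("slot_start_hour"))
                == (gold["date"], gold["slot_start_hour"])
            )
    total = len(labels)
    return {
        "decision_accuracy": decision_hits / total if total else None,
        "branch_accuracy": branch_hits / booked if booked else None,
        "datetime_accuracy": datetime_hits / booked if booked else None,
    }


def needs_human_review(pred: dict, threshold: float = 0.8) -> bool:
    """Send ambiguous or low-confidence predictions to an operator."""
    return pred.get("accepted") is None or pred.get("confidence", 0.0) < threshold
```

Reporting the three numbers side by side shows whether errors come from the accept/reject decision, from branch matching or from date and time normalization, which a single end-to-end accuracy would hide.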

Theory

In speech ML system design (MLSD), the system should turn audio into a typed business decision with confidence and evidence, not just a transcript.

Common mistakes

  • Treating branch and slot as a flat classification problem with too many dynamic classes.
  • Forgetting the call date when resolving relative phrases like "next Friday".
  • Creating bookings for ambiguous or rejected calls.
  • Evaluating only transcript WER and ignoring booking accuracy.
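To illustrate why the call date matters for relative phrases, here is a small sketch that resolves a weekday mention against the date of the call. It assumes "next Friday" means the first Friday strictly after the call date; real callers may mean the Friday of the following week, so this interpretation is an assumption and such cases are good candidates for human review. The `next_weekday` helper is hypothetical.

```python
from datetime import date, timedelta

WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")


def next_weekday(call_date: date, weekday_name: str) -> date:
    """Return the first given weekday strictly after call_date."""
    target = WEEKDAYS.index(weekday_name.lower())
    # days_ahead is in 1..7, so a Friday call maps to the Friday one week later.
    days_ahead = (target - call_date.weekday() - 1) % 7 + 1
    return call_date + timedelta(days=days_ahead)


# The same phrase resolves to different dates depending on when the call happened.
monday_call = next_weekday(date(2024, 5, 13), "Friday")
friday_call = next_weekday(date(2024, 5, 17), "Friday")
```

Without the call date, "next Friday" has no fixed value at all, which is why processing a recording days after the call with today's date silently shifts the booking.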

How to answer in an interview

  • Define the JSON output contract early.
  • Mention branch catalog and current date in the LLM context.