Question · Medium · llm-applications · RAG question for a technical interview · Chinor Chinor

LLM JSON extraction, branch context, and quality evaluation

Short answer

Prompt the model with the transcript, the current date, and the candidate branches; require schema-valid JSON; measure exact-match accuracy per field; and improve with prompt changes, retrieval, larger models, or LoRA fine-tuning.

Full breakdown

The LLM input should include the speaker-labeled transcript, the current date and timezone, the allowed branch ids (or a retrieved set of candidate branches), and explicit instructions to return schema-valid JSON. The output should include an accepted flag, a branch id, a normalized date, a normalized hour slot, a confidence score, and evidence.

If all 500 branches fit in context, include them directly. If not, retrieve candidates using address/name matching or embeddings, then let the LLM choose among the smaller set. Always validate the returned JSON against the allowed branch ids and slot availability.

Evaluate on operator-confirmed bookings. Track exact match for accepted/rejected, branch, date, and hour, and for the full booking tuple. If a baseline model is not good enough, inspect the errors, adjust prompts, compare models, add retrieval, or fine-tune with LoRA on transcripts paired with target JSON.

External APIs are acceptable only if privacy, data-residency, and customer constraints allow them.
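
The validation step described above could look roughly like the following. This is a minimal sketch, not a definitive implementation: the field names (`accepted`, `branch_id`, `date`, `hour`, `confidence`, `evidence`), the function `validate_booking`, and the shape of `free_slots` are all assumptions made for illustration.

```python
import json
from datetime import date

# Hypothetical field names; the real schema depends on the booking system.
REQUIRED = {"accepted", "branch_id", "date", "hour", "confidence", "evidence"}


def validate_booking(raw, allowed_branch_ids, free_slots):
    """Return (booking, errors).

    free_slots is assumed to be a set of (branch_id, "YYYY-MM-DD", hour) tuples.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value is not an object"]

    missing = REQUIRED - data.keys()
    if missing:
        return None, [f"missing fields: {sorted(missing)}"]

    errors = []
    if not isinstance(data["accepted"], bool):
        errors.append("accepted must be a boolean")
    elif data["accepted"]:
        # Booking details only need to be valid when the booking was accepted.
        branch = data["branch_id"]
        if not isinstance(branch, str) or branch not in allowed_branch_ids:
            errors.append("branch_id is not in the allowed set")
        try:
            date.fromisoformat(data["date"])
        except (TypeError, ValueError):
            errors.append("date is not a valid ISO date")
        hour = data["hour"]
        if type(hour) is not int or not 0 <= hour <= 23:
            errors.append("hour must be an integer 0-23")
        # Check availability only once the individual fields are well-formed.
        if not errors and (branch, data["date"], hour) not in free_slots:
            errors.append("slot is not available")
    return (data if not errors else None), errors
```

A rejected output can be retried or routed to an operator; the error list says which constraint failed.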

Theory

LLM extraction is reliable only when it has a constrained schema, a constrained candidate set, and field-level evaluation.
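
Field-level evaluation can be sketched as exact-match accuracy per field plus accuracy on the full booking tuple. The function `exact_match_report` and the field names below are hypothetical, and the references are assumed to come from operator-confirmed bookings.

```python
# Hypothetical field names for the booking tuple.
FIELDS = ("accepted", "branch_id", "date", "hour")


def exact_match_report(predictions, references):
    """Per-field and full-tuple exact-match accuracy over paired records."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need equally sized, non-empty lists")
    n = len(references)
    pairs = list(zip(predictions, references))
    # Share of records where each individual field matches exactly.
    report = {
        field: sum(p.get(field) == r.get(field) for p, r in pairs) / n
        for field in FIELDS
    }
    # Share of records where every field matches at once.
    report["full_tuple"] = sum(
        all(p.get(field) == r.get(field) for field in FIELDS) for p, r in pairs
    ) / n
    return report
```

The full-tuple number is the one that matters for the booking itself; the per-field numbers show where the errors come from.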

Common mistakes

Ask for free-form text instead of schema-valid JSON.
Give the model no branch catalog and expect exact branch ids.
Measure only subjective “looks good” outputs.
Use an external API without checking sensitive-call policy.
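
The first two mistakes are avoided when the prompt itself carries the branch catalog and the expected JSON shape. A minimal sketch follows; `build_prompt`, the branch fields `id`, `name`, and `address`, and the output field names are all assumptions for illustration, and the exact wording of the instructions would need tuning against real transcripts.

```python
import json
from datetime import datetime, timezone


def build_prompt(transcript, now, branches):
    """Assemble one extraction prompt from transcript, current time and branch catalog."""
    catalog = "\n".join(
        f"- {b['id']}: {b['name']}, {b['address']}" for b in branches
    )
    # Illustrative description of the expected output, not a formal JSON Schema.
    schema = {
        "accepted": "boolean",
        "branch_id": "one of the branch ids listed above, or null",
        "date": "YYYY-MM-DD or null",
        "hour": "integer 0-23 or null",
        "confidence": "number between 0 and 1",
        "evidence": "short quote from the transcript",
    }
    return (
        f"Current date and time: {now.isoformat()}\n"
        f"Candidate branches:\n{catalog}\n\n"
        f"Transcript:\n{transcript}\n\n"
        "Return only a JSON object with exactly these fields:\n"
        f"{json.dumps(schema, indent=2)}"
    )


prompt = build_prompt(
    "Operator: Which branch suits you?\nCaller: The one on Main Street, tomorrow at two.",
    datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc),
    [{"id": "b1", "name": "Central", "address": "1 Main Street"}],
)
```

Passing a timezone-aware `now` puts the UTC offset into the prompt, which relative phrases such as "tomorrow" need in order to be resolved.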

How to answer in an interview

Say “schema validation” and “exact full tuple accuracy”.
Mention retrieval if the branch catalog is too large for context.
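
Candidate retrieval for a large catalog can be sketched with plain fuzzy string matching. Here `difflib` from the standard library stands in for whatever matcher or embedding index a real system would use; `top_branches` and the branch fields `name` and `address` are hypothetical.

```python
from difflib import SequenceMatcher


def top_branches(mention, branches, k=5):
    """Rank branches by fuzzy similarity between the mention and name + address."""

    def score(branch):
        text = f"{branch['name']} {branch['address']}".lower()
        return SequenceMatcher(None, mention.lower(), text).ratio()

    # Only the k best candidates are passed on to the LLM prompt.
    return sorted(branches, key=score, reverse=True)[:k]
```

The LLM then chooses among the returned candidates, and its chosen id is still validated against the allowed set.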