Pass the interview: Chinor: ML System Design

Case 1, 12 min

ASR+LLM baseline for extracting a customer booking from a call

Answer without hints

First say your answer aloud or sketch it as bullet points.

Write a draft

Formulas, solution plan, risks and examples.

Compare with the walkthrough

Open the walkthrough only after your own attempt.

Short answer

A pragmatic baseline: run ASR to get a transcript, then use an extraction model or LLM that returns an accepted/rejected flag, a normalized branch and a normalized datetime, with the current date and the branch catalog in context.

Detailed walkthrough

Start from the output contract. For every call, return whether the customer accepted the offer. If accepted, return branch id, normalized date, one-hour slot, confidence and evidence span. If rejected or ambiguous, do not create a booking automatically.
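The output contract above can be sketched as a small typed record plus a gate for automatic booking. This is a minimal sketch, not a fixed schema: the names `BookingResult`, `can_auto_book`, `slot_start_hour` and the 0.8 confidence threshold are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BookingResult:
    """Hypothetical output contract for one call."""
    accepted: Optional[bool]                # None means ambiguous
    branch_id: Optional[str] = None
    booking_date: Optional[date] = None
    slot_start_hour: Optional[int] = None   # 14 means the 14:00-15:00 slot
    confidence: float = 0.0
    evidence: str = ""                      # transcript span supporting the decision


def can_auto_book(result: BookingResult, min_confidence: float = 0.8) -> bool:
    """Return True only for accepted, fully specified, confident results."""
    if result.accepted is not True:
        return False  # rejected or ambiguous: never book automatically
    if result.branch_id is None or result.booking_date is None:
        return False
    if result.slot_start_hour is None or not 0 <= result.slot_start_hour <= 23:
        return False
    return result.confidence >= min_confidence
```

Anything that fails the gate goes to human review instead of the calendar.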

The baseline pipeline is VAD or call segmentation, ASR, transcript cleanup, then information extraction. For a quick first version, an LLM can read the transcript plus current date and branch catalog and return structured JSON. It should normalize relative dates, time phrases and address mentions.
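Relative date normalization is worth checking deterministically even when an LLM does the extraction. The sketch below assumes English phrases and one particular reading of "next Friday" (the first Friday strictly after the call date); both the function name `resolve_relative_date` and that interpretation are assumptions, and a real system should agree on the convention with the business.

```python
from datetime import date, timedelta
from typing import Optional

WEEKDAYS = {
    "monday": 0, "tuesday": 1, "wednesday": 2, "thursday": 3,
    "friday": 4, "saturday": 5, "sunday": 6,
}


def resolve_relative_date(phrase: str, call_date: date) -> Optional[date]:
    """Map a simple relative phrase to a calendar date, or None if unsupported."""
    words = phrase.strip().lower().split()
    if words == ["today"]:
        return call_date
    if words == ["tomorrow"]:
        return call_date + timedelta(days=1)
    # Accept "<weekday>" or "next <weekday>" only.
    if len(words) == 2 and words[0] == "next":
        words = words[1:]
    if len(words) != 1 or words[0] not in WEEKDAYS:
        return None
    # First matching weekday strictly after the call date (1 to 7 days ahead).
    days_ahead = (WEEKDAYS[words[0]] - call_date.weekday() - 1) % 7 + 1
    return call_date + timedelta(days=days_ahead)
```

Without `call_date` the phrase cannot be resolved at all, which is why the call date belongs in the extraction context.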

The main risks are ambiguous dates, similar branch addresses, customers changing their mind, multiple proposed times, ASR errors and slot conflicts. Use a manually filled operator booking table as labels, evaluate accepted/rejected classification separately from branch accuracy and datetime accuracy, and route low-confidence cases to human review.

Common mistakes

Treat branch and slot as a flat classification problem with too many dynamic classes.
Forget the current call date for relative phrases like next Friday.
Create bookings for ambiguous or rejected calls.
Evaluate only transcript WER and ignore booking accuracy.

How to say it in the interview

Define the JSON output contract early.
Mention branch catalog and current date in the LLM context.

Case 2, 10 min

Call routing, rejection filtering and metrics

Short answer

Use a cheap accept/reject detector before expensive extraction, and track precision/recall for accepted calls, branch accuracy, slot accuracy and end-to-end booking correctness.

Detailed walkthrough

A high-volume pipeline should not send every call to the most expensive model. First run cheap filters: duration thresholds, VAD, ASR with a small model where possible, and a lightweight accepted/rejected classifier. Rejected or very short calls can bypass branch/time extraction, unless the classifier's confidence is low.
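One way to express the early exit is a small routing function. This is a sketch under assumptions: `CallSummary`, `route_call` and all thresholds are hypothetical, and `reject_probability` is assumed to come from the lightweight classifier. Short calls are skipped on weaker evidence than long ones; uncertain calls always continue to full extraction.

```python
from dataclasses import dataclass


@dataclass
class CallSummary:
    speech_seconds: float       # total speech duration reported by VAD
    reject_probability: float   # output of a hypothetical lightweight classifier


def route_call(
    call: CallSummary,
    min_speech_seconds: float = 5.0,
    reject_threshold: float = 0.9,
    short_call_threshold: float = 0.6,
) -> str:
    """Return "skip_extraction" or "full_extraction" for one call."""
    if call.reject_probability >= reject_threshold:
        return "skip_extraction"          # confident rejection at any length
    is_short = call.speech_seconds < min_speech_seconds
    if is_short and call.reject_probability >= short_call_threshold:
        return "skip_extraction"          # short call that also looks like a rejection
    return "full_extraction"              # low confidence: pay for the expensive stage
```

The thresholds should be tuned on labeled calls, since every wrongly skipped call is a missed booking.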

Metrics should match business damage. For accepted/rejected, use precision and recall with a clear preference: missing a real booking loses revenue, while booking a rejected customer creates an operational error. For extracted fields, track exact branch accuracy, exact slot accuracy, date accuracy and full booking accuracy.
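The split between accept/reject metrics and field metrics can be sketched as follows. The row layout (dicts with `accepted`, `branch_id`, `date`, `hour`) and the function names are assumptions for illustration; field accuracy is computed only on calls the operator table marks as accepted.

```python
from typing import Dict, List, Sequence, Tuple

FIELDS = ("branch_id", "date", "hour")


def precision_recall(gold: Sequence[bool], pred: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall for the positive class "accepted"."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)   # false bookings
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)   # missed bookings
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def field_accuracies(gold_rows: List[dict], pred_rows: List[dict]) -> Dict[str, float]:
    """Exact-match accuracy per field and for the full booking tuple."""
    pairs = [(g, p) for g, p in zip(gold_rows, pred_rows) if g["accepted"]]
    if not pairs:
        return {}
    scores = {
        field: sum(g[field] == p.get(field) for g, p in pairs) / len(pairs)
        for field in FIELDS
    }
    scores["full_booking"] = sum(
        all(g[f] == p.get(f) for f in FIELDS) for g, p in pairs
    ) / len(pairs)
    return scores
```

Low precision means false bookings and low recall means lost revenue, so the two numbers should be reported separately rather than averaged.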

Also track coverage. If the system returns “needs review” too often, it may be accurate but not useful. Segment metrics by operator, branch, language, call duration and ASR quality because failures will cluster.
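Coverage and segmented accuracy are simple to compute once every call has a decision and a segment label. A minimal sketch, assuming decisions are strings where `"needs_review"` marks abstention and each evaluated row carries a boolean `correct` flag; these names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def coverage(decisions: List[str]) -> float:
    """Share of calls the system decided without human review."""
    if not decisions:
        return 0.0
    automated = sum(1 for d in decisions if d != "needs_review")
    return automated / len(decisions)


def accuracy_by_segment(rows: List[dict], segment_key: str) -> Dict[str, float]:
    """Accuracy per segment value, e.g. per operator, branch or language."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for row in rows:
        segment = row[segment_key]
        totals[segment] += 1
        correct[segment] += int(bool(row["correct"]))
    return {segment: correct[segment] / totals[segment] for segment in totals}
```

A segment with much lower accuracy than the aggregate is usually the first place to look for clustered failures.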

Common mistakes

Run the LLM on all calls without cheap rejection filtering.
Report one aggregate accuracy for different failure modes.
Ignore abstention or human-review rate.
Forget that false bookings and missed bookings have different costs.

How to say it in the interview

Separate accept/reject metrics from field extraction metrics.
Mention early exits for short rejections.

Case 3, 12 min

ASR for a low-resource language when Whisper falls short

Short answer

Collect in-domain transcriptions, fine-tune ASR on the target language/domain, preserve timestamps, and optionally train downstream extraction directly from audio if transcript quality remains too low.

Detailed walkthrough

If the generic ASR fails on the target language, the first useful investment is in-domain data. Sample real calls across operators, branches, noise conditions and call outcomes. Ask native-speaking annotators to transcribe speech and, if needed, mark intervals where branch/time decisions are made.

Fine-tune an ASR model on these transcripts rather than trying to solve everything with a downstream LLM. The extraction model depends on dates, addresses and confirmations, so systematic ASR errors in those entities will dominate.
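To see why entity errors matter more than the headline number, compare word error rate with a crude entity recall on the same hypothesis. This is an illustrative sketch: `entity_recall` uses plain substring matching, which is an assumption and would need proper normalization in practice.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref) if ref else 0.0


def entity_recall(hypothesis: str, entities: List[str]) -> float:
    """Share of reference entities found verbatim in the hypothesis."""
    if not entities:
        return 1.0
    text = hypothesis.lower()
    return sum(1 for e in entities if e.lower() in text) / len(entities)


reference = "please book me on friday at two pm at the main street branch"
hypothesis = "please book me on friday at ten pm at the main street branch"
wer = word_error_rate(reference, hypothesis)            # one substitution in 13 words
recall = entity_recall(hypothesis, ["friday", "two pm", "main street"])
```

Here a single substituted word gives a WER under 8%, yet the booking time is lost, so one of three entities is wrong.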

If full transcription is expensive, use a staged annotation strategy: first label outcome and final booking fields from the operator table, then add transcript or timestamp labels for confusing cases. Keep the business labels and ASR transcript labels separate, because they train different stages.

Common mistakes

Assume multilingual Whisper is good enough without measuring entity errors.
Label only final booking fields and expect ASR to improve.
Ask operators to relisten to every call as part of normal workflow.
Ignore language and noise stratification in sampling.

How to say it in the interview

Say you would fine-tune ASR on in-domain calls.
Separate ASR labels from final booking labels.

Case 4, 10 min

Noisy ASR annotations and transcript aggregation

Short answer

Use strict annotation guidelines, normalize numbers and addresses, double-label hard samples, align transcripts, resolve disagreements by confidence or review, and measure annotator quality.

Detailed walkthrough

Start with guidelines. Define how to write times, dates, numbers, branch names, hesitations and unclear speech. Provide examples and a normalization dictionary for branches and common address variants.

For aggregation, align transcripts at word or character level, using edit distance or sequence alignment. Tokens agreed by most annotators can be accepted automatically. Disagreements around key entities such as time, date and address should be escalated to expert review or resolved using the operator booking table when it is trustworthy.
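Word-level alignment followed by a majority vote can be sketched with the standard-library `difflib.SequenceMatcher`. The pivot-based scheme below is a simplification chosen for brevity, not a full multiple-sequence alignment: every other transcript is aligned to the first annotator's tokens, and words that exist only in the other transcripts are ignored. The function name `majority_vote_tokens` is hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import List, Tuple


def majority_vote_tokens(
    pivot: List[str], others: List[List[str]]
) -> List[Tuple[str, bool]]:
    """For each pivot position return (winning token, has strict majority)."""
    votes = [Counter({token: 1}) for token in pivot]
    for other in others:
        matcher = SequenceMatcher(a=pivot, b=other, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            same_length = (i2 - i1) == (j2 - j1)
            # Count votes for matches and for one-to-one substitutions only.
            if tag == "equal" or (tag == "replace" and same_length):
                for offset in range(i2 - i1):
                    votes[i1 + offset][other[j1 + offset]] += 1
    annotators = len(others) + 1
    result = []
    for counter in votes:
        token, count = counter.most_common(1)[0]
        result.append((token, count * 2 > annotators))
    return result
```

Positions without a strict majority are the ones to escalate, especially when they fall on a time, date or address.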

Track annotator quality with overlap samples, disagreement rate, entity-specific error rate and adjudication outcomes. The goal is not a pretty transcript; it is a transcript and entity labels that improve ASR and downstream booking extraction.

Common mistakes

Use raw annotator text without normalization.
Aggregate transcripts with naive string equality.
Ignore entity disagreements because WER looks acceptable.
Fail to measure annotator-level quality.

How to say it in the interview

Mention sequence alignment before majority decisions.
Focus review on dates, times and addresses.

Case 5, 8 min

VAD and speaker diarization in call-processing pipelines

Short answer

VAD removes silence and splits speech regions; diarization separates operator and client speakers so extraction can focus on customer acceptance and final negotiated slot.

Detailed walkthrough

Voice Activity Detection identifies regions containing speech. It reduces compute, removes silence and makes ASR chunks shorter and more stable. In a call pipeline, VAD can also support early routing because very short speech patterns often correspond to quick rejections.

Diarization assigns speaker labels to speech segments. For appointment extraction, it matters because the operator may propose several branches or times, while the customer acceptance is the decisive signal. Separating operator and client turns helps the extractor distinguish proposals from confirmations.

Use VAD before ASR to segment audio, then ASR with timestamps, then diarization or speaker-aware ASR depending on the available model. The downstream prompt or extraction model should preserve speaker labels and timestamps as evidence.
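When ASR and diarization run as separate models, their outputs have to be merged by time. A minimal sketch, assuming ASR yields `(start, end, text)` segments and diarization yields `(start, end, speaker)` turns in seconds; the tuple layouts and the name `label_speakers` are assumptions.

```python
from typing import List, Tuple

AsrSegment = Tuple[float, float, str]       # (start_s, end_s, text)
SpeakerTurn = Tuple[float, float, str]      # (start_s, end_s, speaker)
LabeledSegment = Tuple[float, float, str, str]


def label_speakers(
    asr_segments: List[AsrSegment], speaker_turns: List[SpeakerTurn]
) -> List[LabeledSegment]:
    """Give each ASR segment the speaker whose turn overlaps it the most."""
    labeled = []
    for start, end, text in asr_segments:
        best_speaker, best_overlap = "unknown", 0.0
        for turn_start, turn_end, speaker in speaker_turns:
            overlap = min(end, turn_end) - max(start, turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        # Timestamps stay attached so the extractor can cite evidence.
        labeled.append((start, end, best_speaker, text))
    return labeled
```

With speaker labels in place, "two works for me" from the client can be told apart from "we have Friday at two or three" from the operator.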

Common mistakes

Transcribe full audio including long silence with one heavy model call.
Ignore who said the accepted time or branch.
Assume diarization is always solved by the ASR model.
Lose timestamps before extraction and debugging.

How to say it in the interview

Define VAD and diarization separately.
Tie diarization to operator proposal versus customer confirmation.

Case 6, 10 min

LLM JSON extraction, branch context and quality evaluation

Short answer

Prompt the model with transcript, current date and candidate branches, require schema-valid JSON, measure exact field accuracy, and improve with prompt changes, retrieval, larger models or LoRA fine-tuning.

Detailed walkthrough

The LLM input should include the speaker-labeled transcript, current date, timezone, allowed branch ids or retrieved candidate branches, and explicit instructions to return schema-valid JSON. The output should include accepted flag, branch id, normalized date, normalized hour slot, confidence and evidence.

If 500 branches fit in context, include them directly. If not, retrieve candidates using address/name matching or embeddings, then let the LLM choose among a smaller set. Always validate the JSON against allowed branch ids and slot availability.
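Candidate retrieval and output validation can both be sketched with the standard library. The example uses fuzzy string matching from `difflib` as a stand-in for address matching or embeddings; the catalog layout, the JSON keys (`accepted`, `branch_id`, `date`, `hour`), the 0.4 cutoff and both function names are illustrative assumptions.

```python
import json
from difflib import get_close_matches
from typing import Dict, List, Optional, Set, Tuple


def retrieve_candidates(mention: str, catalog: Dict[str, str], limit: int = 5) -> List[str]:
    """Return branch ids whose address is closest to the mention, best first."""
    by_address = {address.lower(): branch_id for branch_id, address in catalog.items()}
    matches = get_close_matches(mention.lower(), list(by_address), n=limit, cutoff=0.4)
    return [by_address[address] for address in matches]


def validate_booking(
    raw_json: str,
    allowed_branch_ids: Set[str],
    open_slots: Set[Tuple[str, str, int]],
) -> Optional[dict]:
    """Return the parsed booking only if it is well formed and bookable."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or data.get("accepted") is not True:
        return None
    branch_id, day, hour = data.get("branch_id"), data.get("date"), data.get("hour")
    if not (isinstance(branch_id, str) and isinstance(day, str) and isinstance(hour, int)):
        return None
    if branch_id not in allowed_branch_ids:
        return None                      # hallucinated or out-of-catalog branch
    if (branch_id, day, hour) not in open_slots:
        return None                      # slot is taken or does not exist
    return data
```

A `None` result should route the call to review rather than trigger a retry loop that hides the failure.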

Evaluate on operator-confirmed bookings. Track exact match for accepted/rejected, branch, date, hour and full booking tuple. If a baseline model is not good enough, inspect errors, adjust prompts, compare models, add retrieval, or fine-tune with LoRA on transcripts and target JSON. External APIs are acceptable only if privacy, data residency and customer constraints allow it.

Common mistakes

Ask for free-form text instead of schema-valid JSON.
Give the model no branch catalog and expect exact branch ids.
Measure only subjective “looks good” outputs.
Use an external API without checking sensitive-call policy.

How to say it in the interview

Say “schema validation” and “exact full tuple accuracy”.
Mention retrieval if the branch catalog is too large for context.

Case 7, 10 min

Production architecture for automatic booking from calls

Short answer

Put completed calls into a queue, process audio once, pass lightweight artifacts between services, validate extracted bookings, then write with idempotency and transactional slot reservation.

Detailed walkthrough

After a call ends, store the audio in object storage and send a job id to a queue. Workers run VAD, ASR, extraction and validation. Avoid moving large audio through many services; pass references, transcripts and JSON artifacts instead.

Before writing a booking, validate that the branch exists, the slot is open, the confidence is high and the customer accepted. The write path should be idempotent by call id, and slot reservation should be transactional or protected by a uniqueness constraint on branch and time slot.
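The idempotent write and the uniqueness constraint can be illustrated with SQLite from the standard library. This is a sketch under assumptions: the table layout and the names `create_schema` and `write_booking` are hypothetical, and a production system would use its own database and also distinguish a concurrent duplicate of the same call from a genuine slot conflict.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS bookings (
            call_id    TEXT PRIMARY KEY,         -- one booking per call
            branch_id  TEXT NOT NULL,
            slot_start TEXT NOT NULL,
            UNIQUE (branch_id, slot_start)       -- one booking per branch slot
        )
        """
    )


def write_booking(conn: sqlite3.Connection, call_id: str, branch_id: str, slot_start: str) -> str:
    """Return "booked", "already_written" or "slot_conflict"."""
    try:
        with conn:  # commits on success, rolls back on an exception
            seen = conn.execute(
                "SELECT 1 FROM bookings WHERE call_id = ?", (call_id,)
            ).fetchone()
            if seen is not None:
                return "already_written"         # retry of the same job: no-op
            conn.execute(
                "INSERT INTO bookings (call_id, branch_id, slot_start) VALUES (?, ?, ?)",
                (call_id, branch_id, slot_start),
            )
        return "booked"
    except sqlite3.IntegrityError:
        # The database, not the application, rejects the double booking.
        return "slot_conflict"
```

Because the constraint lives in the database, two workers racing for the same slot cannot both succeed; the loser gets `"slot_conflict"` and goes to human review.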

If the model is slower than the operator workflow, design the product flow carefully. Either the operator books during the call and the model audits/fills missing fields, or the model writes only when there is no conflict. Add human review for conflicts, low confidence and changed slots.

Common mistakes

Let the model write directly to the calendar with no validation.
Pass large audio blobs through every service.
Ignore double-booking and idempotency.
Assume slow asynchronous extraction cannot affect operations.

How to say it in the interview

Mention a queue plus an object-storage reference.
Use a transactional slot-reservation rule to prevent double-booking.

Case 8, 8 min

Cost optimization of ASR and LLM inference for calls

Short answer

Profile the pipeline, skip heavy stages for obvious rejections, batch where possible, quantize ASR/LLM models, trim silence with VAD, cache reusable context and use smaller models for easy cases.

Detailed walkthrough

Optimize from measurements, not guesses. Break down cost by audio transfer, VAD, ASR, diarization, LLM extraction, validation and storage. The largest component determines the first intervention.
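A per-stage breakdown does not need special tooling to start with. The sketch below times each stage with a context manager and reports its share of the total; `StageProfiler` is a hypothetical helper, and wall-clock time is used here as a stand-in for cost, which is an assumption (GPU seconds or API spend may be the better unit).

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Optional


class StageProfiler:
    """Accumulates wall-clock seconds per named pipeline stage."""

    def __init__(self) -> None:
        self.seconds: Dict[str, float] = defaultdict(float)

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the stage raised an exception.
            self.seconds[name] += time.perf_counter() - start

    def shares(self) -> Dict[str, float]:
        total = sum(self.seconds.values())
        if not total:
            return {}
        return {name: spent / total for name, spent in self.seconds.items()}

    def most_expensive(self) -> Optional[str]:
        if not self.seconds:
            return None
        return max(self.seconds, key=self.seconds.get)
```

Usage is `with profiler.stage("asr"): ...` around each stage call; the stage returned by `most_expensive()` is the first candidate for optimization.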

Common wins include VAD silence trimming, early rejection classifiers, batching ASR/LLM calls, quantization, faster runtimes such as ONNX Runtime where appropriate, smaller specialized models for accept/reject, and selective use of the largest LLM only on hard calls. Also reduce prompt size by retrieving only the relevant branch candidates rather than passing the full catalog as it grows.

Track cost per call, latency percentiles, queue depth and quality regressions. A cheaper pipeline that silently drops hard accepted calls is not acceptable; every optimization should be checked against field-level booking metrics.

Common mistakes

Quantize everything before profiling.
Optimize latency while ignoring quality regression.
Send short rejection calls through the full pipeline.
Keep growing prompts with static context that could be retrieved.

How to say it in the interview

Start with profiling and per-stage cost.
Name early exits and batching as practical first wins.