Interview practice | Screening | LLM role in Dubai, company not confirmed | 2025-10-17

LLM role in Dubai, company not confirmed: Technical interview

Work from top to bottom: try each step yourself first, then open the breakdown. If a step involves code, write your solution right here and run the checks on the page.

Steps: 2
Questions: 2
Tasks: 0
Question 1 (12 min)

Production ML question

You have a multi-GPU server and want to host one or more open-source LLMs. What software stack and design choices would you use?

Answer without a hint

First say your answer out loud or jot it down as bullet points.

Write a draft

Formulas, solution plan, risks and examples.

Compare with the breakdown

Open the breakdown only after your own attempt.


Short answer

Use a serving runtime such as vLLM or TensorRT-LLM, choose model size/quantization to fit weights plus KV cache, expose an API layer, monitor latency/GPU memory, and route tasks to suitable models.

Detailed breakdown

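Start by sizing the model. GPU memory must cover the weights, the KV cache and runtime overhead; the KV cache grows with context length and with the number of concurrent sequences. With several GPUs, decide whether to shard one large model across them (for example with tensor parallelism) or host several smaller models, each on its own GPU or GPU group. Quantization reduces memory use, but check its effect on output quality for your tasks.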
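The sizing step can be sketched as back-of-the-envelope arithmetic. This is a rough estimate under stated assumptions, not a measurement: the function names are illustrative, the model shape (8B parameters, 32 layers, 8 KV heads, head dimension 128) is a hypothetical example, and runtime overhead is left out entirely.

```python
def weights_gib(n_params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights, in GiB."""
    return n_params_billion * 1e9 * bytes_per_param / 2**30


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache memory, in GiB.

    Each token stores one K and one V vector per layer and per KV head,
    hence the leading factor of 2.
    """
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return bytes_per_token * seq_len * batch_size / 2**30


# Hypothetical 8B-parameter model served in FP16 (2 bytes per value).
fp16_weights = weights_gib(8, 2)        # about 14.9 GiB
int4_weights = weights_gib(8, 0.5)      # about 3.7 GiB with 4-bit weights
kv = kv_cache_gib(n_layers=32, n_kv_heads=8, head_dim=128,
                  seq_len=8192, batch_size=16)  # 16 concurrent 8k-token sequences

print(f"FP16 weights: {fp16_weights:.1f} GiB")
print(f"INT4 weights: {int4_weights:.1f} GiB")
print(f"KV cache:     {kv:.1f} GiB")
```

In this example the KV cache for 16 concurrent 8k-token sequences is about as large as the FP16 weights themselves, which is why concurrency and context length belong in the sizing decision alongside parameter count.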

Use an inference runtime built for LLM serving, such as vLLM, TensorRT-LLM or Text Generation Inference (TGI), or llama.cpp for smaller CPU/GPU deployments. Continuous batching and paged attention, as implemented in vLLM, improve throughput and KV-cache utilization compared with a naive Transformers generation loop.
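To see why continuous batching helps, here is a toy simulation in plain Python. It is not how vLLM or any other runtime is implemented: it assumes every request needs at least one decode step, ignores prefill and memory limits, and only counts how many decode steps the GPU spends under each scheduling policy. The function names are illustrative.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Each batch runs until its longest request finishes."""
    total = 0
    for i in range(0, len(lengths), batch_size):
        total += max(lengths[i:i + batch_size])
    return total


def continuous_batching_steps(lengths, batch_size):
    """A finished request frees its slot for the next queued request."""
    queue = deque(lengths)
    active = []  # remaining decode steps for each running request
    steps = 0
    while queue or active:
        # Fill any free slots before the next decode step.
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
        steps += 1
        # Every active request emits one token; drop the finished ones.
        active = [remaining - 1 for remaining in active if remaining > 1]
    return steps


output_lengths = [2, 10, 2, 2]  # tokens to generate per request
print(static_batching_steps(output_lengths, batch_size=2))      # 12
print(continuous_batching_steps(output_lengths, batch_size=2))  # 10
```

With static batching the short request in the first batch waits for its 10-token neighbour; with continuous batching the two queued requests reuse the freed slot while the long one is still decoding.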

Wrap the runtime with an API layer, authentication, request limits, prompt/token budgets, logging and monitoring. Track time to first token, tokens/sec, queue time, GPU memory, error rate and per-route cost. If product tasks differ, route extraction, chat and long-context jobs to different model sizes or configs.
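A minimal sketch of the routing and metrics ideas, in plain Python. Everything named here is an assumption made for illustration: the route table, the model names, the token budgets and the metric definitions (time to first token is measured from enqueue, and tokens per second covers the whole generation after the request starts running).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str              # hypothetical deployment name
    max_prompt_tokens: int
    max_output_tokens: int


# Hypothetical route table: task type -> model and token budgets.
ROUTES = {
    "extraction": Route("small-model", 4096, 512),
    "chat": Route("medium-model", 8192, 1024),
    "long_context": Route("long-context-model", 65536, 2048),
}


def pick_route(task: str, prompt_tokens: int) -> Route:
    """Choose a route by task, escalating oversized prompts to the long-context model."""
    route = ROUTES.get(task, ROUTES["chat"])
    if prompt_tokens > route.max_prompt_tokens:
        route = ROUTES["long_context"]
    if prompt_tokens > route.max_prompt_tokens:
        raise ValueError("prompt exceeds the largest configured budget")
    return route


def request_metrics(enqueued_at: float, started_at: float,
                    first_token_at: float, finished_at: float,
                    output_tokens: int) -> dict:
    """Per-request latency metrics from four timestamps (in seconds)."""
    run_time = finished_at - started_at
    return {
        "queue_time_s": started_at - enqueued_at,
        "ttft_s": first_token_at - enqueued_at,
        "tokens_per_s": output_tokens / run_time if run_time > 0 else 0.0,
    }


print(pick_route("chat", 20_000).model)  # long-context-model
print(request_metrics(0.0, 0.5, 1.0, 3.0, 100))
```

Keeping queue time separate from time to first token makes it easier to tell an overloaded scheduler apart from a slow prefill.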

Question 2 (10 min)

Production ML question

For a FastAPI-backed LLM product, when would you use Postgres, ClickHouse and Redis?


Short answer

Use Postgres for transactional product state, ClickHouse for analytical event/log queries, and Redis for low-latency cache, sessions, rate limits or queues.

Detailed breakdown

Postgres is the default for durable transactional state: users, projects, permissions, prompt templates, job metadata and relational business objects. It gives constraints, migrations, transactions and mature operational tooling.
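The value of constraints and transactions can be shown in a few lines. To keep the sketch runnable without a database server, it uses Python's built-in sqlite3 module as a stand-in for Postgres; the table and column names are hypothetical, and a real deployment would use a Postgres driver and proper migrations.

```python
import sqlite3

# sqlite3 stands in for Postgres here so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE jobs (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    status  TEXT NOT NULL
            CHECK (status IN ('queued', 'running', 'done', 'failed'))
);
""")


def create_user_with_job(conn, email, status):
    """Insert a user and their first job atomically: both rows or neither."""
    try:
        with conn:  # commits on success, rolls back on exception
            cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
            conn.execute(
                "INSERT INTO jobs (user_id, status) VALUES (?, ?)",
                (cur.lastrowid, status),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = create_user_with_job(conn, "a@example.com", "queued")
rejected = create_user_with_job(conn, "b@example.com", "bogus")  # violates CHECK
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(ok, rejected, user_count)  # True False 1
```

The second call fails on the job's CHECK constraint, and the rollback also removes the user row inserted a moment earlier, so no half-created state is left behind.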

ClickHouse is useful when the product emits high-volume analytical events: LLM request logs, token usage, latency, feedback, evaluation traces and aggregate dashboards. It is optimized for append-heavy columnar scans, not transactional updates.
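The kind of query this workload produces can be illustrated in plain Python over a handful of event rows. The event fields and the table name in the comment are hypothetical; the point is the shape of the aggregation (group by model, count, sum, percentile), which a columnar store runs over very large row counts. The percentile below uses the exact nearest-rank method, so it may not match a database's approximate quantile functions digit for digit.

```python
import math
from collections import defaultdict

# Roughly equivalent ClickHouse query, assuming a hypothetical llm_requests table:
#   SELECT model, count(), sum(prompt_tokens + completion_tokens),
#          quantile(0.95)(latency_ms)
#   FROM llm_requests GROUP BY model


def nearest_rank_percentile(values, pct):
    """Exact nearest-rank percentile for a non-empty list (pct in 1..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(events):
    """Aggregate request events per model."""
    by_model = defaultdict(list)
    for event in events:
        by_model[event["model"]].append(event)
    return {
        model: {
            "requests": len(rows),
            "total_tokens": sum(
                r["prompt_tokens"] + r["completion_tokens"] for r in rows
            ),
            "p95_latency_ms": nearest_rank_percentile(
                [r["latency_ms"] for r in rows], 95
            ),
        }
        for model, rows in by_model.items()
    }


events = [
    {"model": "small", "prompt_tokens": 100, "completion_tokens": 20, "latency_ms": 100},
    {"model": "small", "prompt_tokens": 200, "completion_tokens": 30, "latency_ms": 400},
    {"model": "large", "prompt_tokens": 900, "completion_tokens": 300, "latency_ms": 2500},
]
print(summarize(events))
```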

Redis is for low-latency ephemeral state: caching model/config lookups, rate limits, sessions, short-lived queues, locks or streaming coordination. Do not use Redis as the only durable source for data that must survive restarts unless you deliberately configure persistence and accept the trade-offs.
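As one example of ephemeral state, here is a sketch of a fixed-window rate limiter built on the INCR and EXPIRE commands. To stay runnable without a server it talks to a small in-memory stand-in (FakeRedis, a hypothetical class written for this example) that exposes only incr and expire; a redis-py client offers methods with the same names. The key format and limits are assumptions. Note that INCR followed by EXPIRE is not atomic, so a production version would typically wrap the two in a transaction or a Lua script.

```python
import time


class FakeRedis:
    """In-memory stand-in exposing only incr/expire, for running without a server."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> [value, expires_at or None]

    def _purge(self, key):
        entry = self._data.get(key)
        if entry and entry[1] is not None and self._clock() >= entry[1]:
            del self._data[key]

    def incr(self, key):
        self._purge(key)
        entry = self._data.setdefault(key, [0, None])
        entry[0] += 1
        return entry[0]

    def expire(self, key, seconds):
        self._purge(key)
        if key not in self._data:
            return False
        self._data[key][1] = self._clock() + seconds
        return True


def allow_request(client, user_id, limit, window_s):
    """Fixed-window limiter: at most `limit` requests per `window_s` seconds."""
    key = f"ratelimit:{user_id}"  # hypothetical key format
    count = client.incr(key)
    if count == 1:
        # First request in this window: start the countdown.
        client.expire(key, window_s)
    return count <= limit


now = [0.0]  # controllable clock so the example is deterministic
client = FakeRedis(clock=lambda: now[0])
first_three = [allow_request(client, "u1", limit=2, window_s=60) for _ in range(3)]
now[0] = 61.0  # the window has expired, so the counter starts over
after_window = allow_request(client, "u1", limit=2, window_s=60)
print(first_three, after_window)  # [True, True, False] True
```

Losing these counters on a restart only resets the limits, which is the kind of trade-off that makes Redis a good fit here and a poor fit as the sole store for durable data.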