Interview practice: Constructor: Technical interview

1. Question (10 min)

Building and refreshing an HNSW/Qdrant vector-search pipeline

Answer without hints

First, say your answer out loud or as bullet points.

Write a draft

Formulas, solution plan, risks and examples.

Compare with the walkthrough

Open the walkthrough only after your own attempt.

Short answer

Choose ANN parameters by the recall/latency/memory trade-off, define how embeddings are written and refreshed, and monitor freshness, index build success, candidate quality and serving/query cost.

Detailed walkthrough

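Moving from brute-force search to HNSW trades exactness for speed, which changes both result quality and operations. Compute exact (brute-force) nearest neighbors for a sample of queries as ground truth. Then compare ANN recall@K, p95 query latency, memory, build time and downstream recommendation metrics against it. Tune the HNSW parameters for the target recall/latency point:

- M: maximum links per graph node.
- efConstruction: candidate list size at build time.
- efSearch: candidate list size at query time.

Larger M and efConstruction raise recall at the cost of memory and build time. Larger efSearch raises recall at the cost of query latency.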
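A minimal pure-Python sketch of that offline evaluation loop. It assumes cosine similarity, and it assumes you have already measured mean recall@K and p95 latency for a few efSearch values. The helper names and the (ef, recall, p95) tuple format are illustrative, not a Qdrant API. In Qdrant the three knobs appear as `m` and `ef_construct` in the collection's HNSW config and `hnsw_ef` in search params; check the docs for your version.

```python
import math
from typing import Optional, Sequence

Vector = Sequence[float]


def exact_top_k(query: Vector, corpus: dict[str, Vector], k: int) -> list[str]:
    """Brute-force ground truth: ids of the k corpus vectors with the highest cosine similarity."""
    def cosine(a: Vector, b: Vector) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    ranked = sorted(corpus, key=lambda item_id: cosine(query, corpus[item_id]), reverse=True)
    return ranked[:k]


def recall_at_k(ann_ids: list[str], exact_ids: list[str], k: int) -> float:
    """Share of the exact top-k neighbors that the ANN index also returned in its top-k."""
    truth = set(exact_ids[:k])
    return len(set(ann_ids[:k]) & truth) / len(truth)


def choose_ef_search(
    measurements: list[tuple[int, float, float]],  # (ef_search, mean recall@K, p95 latency in ms)
    min_recall: float,
    max_p95_ms: float,
) -> Optional[int]:
    """Smallest efSearch that meets both targets, or None if no measured setting does."""
    feasible = [ef for ef, recall, p95 in measurements if recall >= min_recall and p95 <= max_p95_ms]
    return min(feasible) if feasible else None
```

Picking the smallest feasible efSearch keeps latency headroom. If nothing is feasible, that signals the index needs a larger M or efConstruction rather than a bigger query-time budget.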

The pipeline needs an artifact contract:

1. Compute embeddings.
2. Validate counts and dimensions.
3. Upsert into Qdrant with stable ids and payload metadata.
4. Build or refresh the index.
5. Only then mark the version as ready.

For batch systems, decide whether a full rebuild every few days gives acceptable freshness or whether incremental upserts are needed.
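A sketch of the "validate, then upsert" step, assuming the pipeline L2-normalizes vectors so that dot product and cosine agree. Qdrant accepts unsigned integers or UUIDs as point ids. Deriving a UUIDv5 from the item key means re-upserting an item overwrites it instead of creating a duplicate. The function names and thresholds are hypothetical.

```python
import math
import uuid
from typing import Sequence


def stable_point_id(item_key: str) -> str:
    """Deterministic UUID for an item key: the same item always maps to the same point id."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, item_key))


def validate_batch(
    embeddings: dict[str, Sequence[float]],
    expected_count: int,
    expected_dim: int,
    max_missing_ratio: float = 0.01,  # hypothetical tolerance for items without an embedding
    norm_tolerance: float = 1e-3,
) -> list[str]:
    """Return contract violations; an empty list means the batch may be upserted."""
    problems = []
    missing = expected_count - len(embeddings)
    if expected_count and missing / expected_count > max_missing_ratio:
        problems.append(f"missing {missing} of {expected_count} embeddings")
    for key, vector in embeddings.items():
        if len(vector) != expected_dim:
            problems.append(f"{key}: dim {len(vector)} != {expected_dim}")
            continue
        if not all(math.isfinite(x) for x in vector):
            problems.append(f"{key}: non-finite values")
            continue
        norm = math.sqrt(sum(x * x for x in vector))
        if abs(norm - 1.0) > norm_tolerance:
            problems.append(f"{key}: norm {norm:.4f}, expected unit length")
    return problems
```

Only a batch with no problems should proceed to the Qdrant upsert. Its point ids come from `stable_point_id` and its payload carries the metadata downstream ranking needs.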

Operational checks should cover missing embeddings, stale index versions, failed upserts, schema changes, vector normalization, payload filters and rollback to the previous index.
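One way to get a rollback path is blue-green publishing. Each build goes into a new versioned index, and traffic moves only after validation passes. In Qdrant, collection aliases can play the "live pointer" role; see the Qdrant docs for the exact calls. The sketch below models only the bookkeeping in plain Python, and `IndexRegistry` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IndexRegistry:
    """Hypothetical registry: `live` points at one ready version; older ready versions are kept for rollback."""
    ready_versions: list[str] = field(default_factory=list)
    live: Optional[str] = None

    def publish(self, version: str, problems: list[str]) -> bool:
        # Never switch traffic to a version that failed validation.
        if problems:
            return False
        self.ready_versions.append(version)
        self.live = version
        return True

    def rollback(self) -> Optional[str]:
        # Point back at the previous ready version; with only one version, stay where we are.
        if len(self.ready_versions) < 2:
            return self.live
        self.ready_versions.pop()
        self.live = self.ready_versions[-1]
        return self.live
```

A failed build never becomes live, and the previous version stays available until it is explicitly garbage-collected.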

Common mistakes

Compare ANN methods without exact-search ground truth.
Forget vector normalization and metric compatibility.
Overwrite the live index without a rollback path.
Ignore payload filters and metadata needed by downstream ranking.

What to say in the interview

Mention an exact-search evaluation sample.
Talk about full rebuild versus incremental upsert.

2. Task (Medium)

RandomizedSet in O(1)

Problem statement

Implement RandomizedSet with insert, remove and get_random, each in average O(1) time. get_random must return a uniformly random element of the current set.

Hints

Use an array for random selection
If you pick a random index in an array, every element is equally likely.
Delete via swap
Move the last element into the slot of the element being removed and update its index in the hash map.

Solution idea

get_random() needs an array: a random index into the array yields a uniformly random element.

insert() needs a dictionary, position, that checks whether a value is already present and stores its index in the array.

The difficulty is remove(). Deleting an element from the middle of a list cannot be done in O(1), because the tail would have to shift. Instead, move the last element into the removed element's slot, update its index in the dictionary, and then drop the last element with pop().

If the removed element is already the last one, the swap is unnecessary, but the same algorithm stays correct as long as the updates are ordered carefully. Write position[last] = index first and delete position[val] last. In the opposite order, removing the last element would re-insert val into the dictionary.
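To see why the order of updates matters when the removed element is the last one, compare two standalone versions of the removal step. They operate on a plain list and dict; the function names are illustrative only.

```python
def remove_wrong_order(values: list[int], position: dict[int, int], val: int) -> None:
    """Buggy variant: deletes val's map entry BEFORE writing the moved element's index."""
    index = position[val]
    last = values[-1]
    del position[val]
    values[index] = last
    position[last] = index  # if val was the last element, this re-inserts val
    values.pop()


def remove_right_order(values: list[int], position: dict[int, int], val: int) -> None:
    """Order used in the reference code: update the moved element first, delete val's entry last."""
    index = position[val]
    last = values[-1]
    values[index] = last
    position[last] = index
    values.pop()
    del position[val]
```

Removing 30 from [10, 20, 30] with the wrong order leaves a stale entry 30 -> 2 in position. A later insert(30) would then return False even though 30 is no longer in values.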

Reference code

import random


class RandomizedSet:
    def __init__(self):
        self.values: list[int] = []
        self.position: dict[int, int] = {}

    def insert(self, val: int) -> bool:
        if val in self.position:
            return False

        self.position[val] = len(self.values)
        self.values.append(val)
        return True

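    def remove(self, val: int) -> bool:
        if val not in self.position:
            return False

        index = self.position[val]
        last = self.values[-1]

        # Move the last element into the freed slot and record its new index.
        # If val is itself the last element, this is a harmless self-assignment.
        self.values[index] = last
        self.position[last] = index

        # Drop the duplicated tail, then forget val. Deleting val only after the
        # update above keeps the map correct when val was the last element.
        self.values.pop()
        del self.position[val]
        return True

    def get_random(self) -> int:
        # Uniform over the current elements; raises IndexError on an empty set.
        return random.choice(self.values)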


def randomized_set_operations(operations: list[str], arguments: list[list[int]]) -> list[object]:
    randomized_set = None
    result: list[object] = []

    for operation, args in zip(operations, arguments):
        if operation == 'RandomizedSet':
            randomized_set = RandomizedSet()
            result.append(None)
        elif operation == 'insert':
            result.append(randomized_set.insert(args[0]))
        elif operation == 'remove':
            result.append(randomized_set.remove(args[0]))
        elif operation == 'get_random':
            result.append(randomized_set.get_random())
        else:
            raise ValueError(f'unknown operation: {operation}')

    return result

Complexity

Time: O(1) average per operation. Memory: O(n).

The hash map stores the index of each value in the array. Removal uses swap-with-last plus pop, so no array elements need to shift.
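A rough empirical check of the uniformity claim for get_random: sample many times and compare observed frequencies with 1/n. The helper name and its 10% relative tolerance are illustrative assumptions. A chi-square goodness-of-fit test would be the more rigorous version.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable


def looks_uniform(
    sample: Callable[[], Hashable],
    support: Iterable[Hashable],
    trials: int = 30_000,
    tolerance: float = 0.1,
) -> bool:
    """True if every value in `support` appears within `tolerance` (relative)
    of trials / len(support) and no value outside `support` is ever returned."""
    allowed = set(support)
    if not allowed:
        return False
    counts = Counter(sample() for _ in range(trials))
    if set(counts) - allowed:
        return False
    expected = trials / len(allowed)
    return all(abs(counts[v] - expected) <= tolerance * expected for v in allowed)
```

For a RandomizedSet `rs` holding a handful of elements after some inserts and removes, `looks_uniform(rs.get_random, rs.values)` should return True with overwhelming probability. A removal bug that leaves stale values in the array would make it fail.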

3. Case (14 min)

Designing a URL and text summarization service

Short answer

First route URL versus raw text, fetch URLs and extract their text, and validate content. Then choose a summarization path by length and language, chunk long inputs, and monitor cost, latency and quality.

Detailed walkthrough

Start with input routing. Detect whether the input is a URL or raw text, validate that it has meaningful content, and reject or ask for clarification on empty, spammy or unsupported inputs. For URLs, fetch the page and extract text before summarization.
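A minimal router sketch using only `urllib.parse`. It assumes URLs arrive as a single token with an explicit http(s) scheme, so bare domains such as "example.com" would fall through to the text path. The `min_words` threshold and the route labels are hypothetical.

```python
from urllib.parse import urlparse


def route_input(raw: str, min_words: int = 5) -> tuple[str, str]:
    """Return ('url', url), ('text', text), ('passthrough', text) or ('reject', reason)."""
    text = raw.strip()
    if not text:
        return ("reject", "empty input")
    if len(text.split()) == 1:
        parsed = urlparse(text)
        if parsed.scheme and parsed.netloc:
            if parsed.scheme in ("http", "https"):
                return ("url", text)
            return ("reject", f"unsupported URL scheme: {parsed.scheme}")
    if len(text.split()) < min_words:
        # Too short to be worth summarizing: return it as is (or ask for more input).
        return ("passthrough", text)
    return ("text", text)
```

Spam and adversarial-input checks would sit after this step, on the fetched or raw text.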

Use different paths by input size and constraints. Short text may not need summarization or can use a cheap model. Medium text can go through a standard LLM prompt. Long documents need chunking, map-reduce summarization, hierarchical summaries or retrieval of salient sections. Multilingual content may need a model with strong language coverage or translation as a fallback.
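A sketch of length-based routing plus map-reduce over overlapping chunks. Word counts stand in for token counts, the thresholds are placeholders, and `summarize` is any callable you supply, for example a wrapper around your LLM client. The overlap carries some context across chunk borders, and the reduce step sees all partial summaries together.

```python
from typing import Callable


def choose_path(n_words: int, short_limit: int = 150, medium_limit: int = 3000) -> str:
    """Hypothetical routing by length; tune the limits per model, context window and budget."""
    if n_words <= short_limit:
        return "short"          # cheap model, or no summary at all
    if n_words <= medium_limit:
        return "single_prompt"  # one standard LLM call
    return "map_reduce"


def chunk_words(words: list[str], max_words: int, overlap: int) -> list[list[str]]:
    """Windows of up to max_words words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("need 0 <= overlap < max_words")
    step = max_words - overlap
    return [words[i:i + max_words] for i in range(0, max(len(words) - overlap, 1), step)]


def map_reduce_summary(
    text: str,
    summarize: Callable[[str], str],
    max_words: int = 2000,
    overlap: int = 200,
) -> str:
    """Summarize each chunk (map), then summarize the joined partial summaries (reduce)."""
    chunks = chunk_words(text.split(), max_words, overlap)
    partials = [summarize(" ".join(chunk)) for chunk in chunks]
    return summarize("\n".join(partials))
```

If the joined partial summaries are still too long for one call, the reduce step can be applied recursively, which gives the hierarchical variant.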

Cost and latency controls matter: batching, caching fetched pages, caching summaries for repeated URLs, token limits, model routing, timeouts and fallback summaries. The product should define output length, style, citation/source behavior and hallucination constraints.
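A sketch of summary caching for repeated URLs. The cache key covers the normalized URL plus everything that changes the output (model and prompt version), so changing either naturally invalidates old entries. The in-process TTL cache is a stand-in; a shared store would replace it across replicas. The model and prompt-version names in the usage are hypothetical.

```python
import hashlib
import time
from typing import Callable, Optional
from urllib.parse import urlsplit, urlunsplit


def summary_cache_key(url: str, model: str, prompt_version: str) -> str:
    """Lowercase scheme/host, drop the #fragment, default an empty path to '/', then hash."""
    parts = urlsplit(url.strip())
    normalized = urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, ""))
    return hashlib.sha256(f"{normalized}|{model}|{prompt_version}".encode()).hexdigest()


class TTLCache:
    """Minimal in-process cache whose entries expire ttl_seconds after they were stored."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl:
            del self._data[key]  # expired: the page may have changed since
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self._data[key] = (self.clock(), value)
```

The same pattern applies one level earlier to fetched page HTML, usually with a shorter TTL.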

Common mistakes

Send every input to the same expensive model.
Forget to fetch and clean URL content before summarizing.
Chunk long documents without preserving global context.
Ignore empty or adversarial inputs.

What to say in the interview

Draw the router before the model.
Mention caching and model selection as cost controls.

4. Case (10 min)

Extracting useful page content before summarization

Short answer

Fetch the page, parse HTML, extract candidate text blocks with metadata, remove boilerplate using rules or a block classifier, then send only useful content to the summarizer.

Detailed walkthrough

The first version can use standard web extraction libraries and heuristics: fetch HTML, parse DOM, remove scripts/styles/nav/footer, keep article-like headings and paragraphs, and preserve URL/title/source metadata. This is often enough for clean article pages.
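A standard-library sketch of that first heuristic version, using `html.parser.HTMLParser`. It assumes reasonably well-formed HTML, and the tag lists are illustrative. A dedicated extraction library would usually be the first thing to try in practice.

```python
from html.parser import HTMLParser
from typing import Optional

SKIP_TAGS = {"script", "style", "nav", "footer", "header", "aside", "form", "noscript"}
KEEP_TAGS = {"title", "h1", "h2", "h3", "p", "li"}


class MainTextExtractor(HTMLParser):
    """Keep heading/paragraph text in document order; ignore anything inside boilerplate containers."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0                      # > 0 while inside script/style/nav/...
        self.current_tag: Optional[str] = None   # block currently being collected
        self.buffer: list[str] = []
        self.blocks: list[tuple[str, str]] = []  # (tag, text); the title is kept as metadata

    def flush(self) -> None:
        text = " ".join("".join(self.buffer).split())  # collapse whitespace
        if self.current_tag and text:
            self.blocks.append((self.current_tag, text))
        self.current_tag, self.buffer = None, []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in KEEP_TAGS and self.skip_depth == 0:
            self.flush()
            self.current_tag = tag

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == self.current_tag:
            self.flush()

    def handle_data(self, data):
        if self.skip_depth == 0 and self.current_tag:
            self.buffer.append(data)


def extract_main_text(html: str) -> list[tuple[str, str]]:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    return parser.blocks
```

Inline links such as `<a>` inside a paragraph keep their text, while the navigation, footer and script content is dropped entirely.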

For noisy pages, treat each DOM block as a candidate and classify whether it belongs to the main content. Features can include tag type, text length, link density, position, heading proximity, boilerplate patterns and embeddings. LLM-generated labels can bootstrap the training set for a small supervised classifier, with human-reviewed labels reserved for evaluation.
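A sketch of per-block features and a hand-tuned scoring rule that stands in for the trained classifier. The `Block` fields, the boilerplate regex and all weights are hypothetical. In practice the feature dict would feed a small learned model such as logistic regression, trained on the labeled blocks.

```python
import re
from dataclasses import dataclass


@dataclass
class Block:
    tag: str
    text: str
    link_text: str   # concatenated anchor text inside the block
    position: float  # 0.0 = top of page, 1.0 = bottom


BOILERPLATE = re.compile(r"cookie|subscribe|all rights reserved|sign in|privacy policy", re.IGNORECASE)


def block_features(block: Block) -> dict[str, float]:
    n_chars = len(block.text)
    return {
        "n_words": float(len(block.text.split())),
        "link_density": len(block.link_text) / n_chars if n_chars else 1.0,
        "is_heading": float(block.tag in {"h1", "h2", "h3"}),
        "position": block.position,  # kept for a learned model; the hand rule below ignores it
        "boilerplate_hit": float(bool(BOILERPLATE.search(block.text))),
    }


def rule_score(f: dict[str, float]) -> float:
    """Probability-like score that a block is main content (illustrative weights)."""
    score = 0.5
    score += 0.3 if f["n_words"] >= 25 else -0.2
    score -= 0.4 * f["link_density"]
    score -= 0.4 * f["boilerplate_hit"]
    score += 0.1 * f["is_heading"]
    return min(max(score, 0.0), 1.0)
```

Link density does most of the work on menus and related-article lists: their text is almost entirely anchor text.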

Do not dump the entire DOM into the LLM by default. It wastes tokens and increases hallucination risk. Keep a fallback path for pages where extraction confidence is low: ask the user, show partial extraction, or use a more expensive extraction model.
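A sketch of the low-confidence fallback decision. It assumes page-level confidence is defined as the mean score of the blocks that were kept; that definition, the thresholds and the action names are all hypothetical choices to calibrate on labeled pages.

```python
from typing import Sequence


def choose_action(
    block_scores: Sequence[float],
    kept_words: int,
    keep_threshold: float = 0.5,
    min_confidence: float = 0.7,
    min_words: int = 80,
) -> str:
    """Decide what to do with a page after block classification."""
    kept = [s for s in block_scores if s >= keep_threshold]
    confidence = sum(kept) / len(kept) if kept else 0.0
    if confidence >= min_confidence and kept_words >= min_words:
        return "summarize"                       # send only the kept blocks to the summarizer
    if kept:
        return "summarize_partial_with_warning"  # show the user that extraction was partial
    return "escalate"                            # ask the user, or retry with a costlier extractor
```

Only the kept blocks are ever sent to the LLM, so the token budget stays bounded even on very noisy pages.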

Common mistakes

Pass raw HTML or all visible text directly to the summarizer.
Ignore boilerplate, cookie banners and navigation text.
Train a block classifier without page-level evaluation.
Drop headings and metadata that help preserve structure.

What to say in the interview

Mention link density and DOM-block classification.
Describe a low-confidence fallback.

5. Case (10 min)

Metrics question

How would you evaluate and improve a summarization service if user feedback is sparse or unavailable?

Short answer

Use a layered eval: reference-based metrics where labels exist, human or assessor rubrics, LLM-as-judge with source-grounding checks, slice metrics and production proxies such as edits, copies and retention.

Detailed walkthrough

If reference summaries exist, ROUGE/BERTScore-like metrics can provide a quick signal, but they do not fully capture faithfulness or usefulness. For open-ended summarization, create a human rubric: factuality, coverage of main points, concision, readability, harmful omissions and hallucinations.
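To make the limitation concrete, here is a simplified ROUGE-1: unigram overlap with clipped counts, lowercase whitespace tokens and no stemming. Reference implementations add proper tokenization and optional stemming; this is only an illustration.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict[str, float]:
    """Unigram precision, recall and F1 between a candidate summary and a reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # each word counts at most min(cand, ref) times
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, `rouge1("the cat did not sit", "the cat sat")` gives an F1 of 0.5 even though the candidate contradicts the reference. This is exactly why overlap metrics need a faithfulness check next to them.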

LLM-as-judge can scale evaluation if used carefully. Give the judge the source and summary, ask it to score factual consistency, missing key points and verbosity, and calibrate it against human judgments. Use a stronger or different model than the production summarizer where possible.
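Calibration against humans can be as simple as measuring chance-corrected agreement on a shared sample. Each summary gets a label such as "faithful"/"unfaithful" from both the judge and a human. Cohen's kappa is shown here; for numeric 1-5 scores a rank correlation would be the analogous check. The label values are illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(judge: Sequence[Hashable], human: Sequence[Hashable]) -> float:
    """(observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(judge) != len(human) or not judge:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(judge)
    observed = sum(j == h for j, h in zip(judge, human)) / n
    judge_freq, human_freq = Counter(judge), Counter(human)
    expected = sum(judge_freq[label] * human_freq[label] for label in judge_freq) / (n * n)
    if expected == 1.0:
        return float("nan")  # both raters used a single identical label: kappa is undefined
    return (observed - expected) / (1 - expected)
```

Raw accuracy can look high simply because most summaries are faithful. Kappa discounts that base rate, so it is a stricter test of whether the judge can replace human review on a given slice.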

Segment evaluation by URL versus raw text, language, content length, domain, extraction confidence and user intent. If explicit feedback is sparse, collect implicit signals, such as copy/share actions, edits after the summary, regenerations and dwell time. Add explicit thumbs-up prompts on a sample of responses and track support complaints.
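A sketch of per-slice reporting. It groups evaluation rows by one segment key, reports (count, mean) per slice, and flags slices that have enough samples to trust and fall below a quality bar. The row schema, field names and thresholds are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def slice_report(rows: Iterable[dict], slice_key: str, metric: str) -> dict[str, tuple[int, float]]:
    """(count, mean metric) per slice, e.g. per language or per input type."""
    groups: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        groups[str(row[slice_key])].append(float(row[metric]))
    return {name: (len(vals), sum(vals) / len(vals)) for name, vals in groups.items()}


def flag_slices(report: dict[str, tuple[int, float]], threshold: float, min_count: int = 30) -> list[str]:
    """Slices with at least min_count rows whose mean is below threshold, sorted by name."""
    return sorted(name for name, (count, mean) in report.items() if count >= min_count and mean < threshold)
```

A healthy global average can hide a weak slice: if most traffic is clean raw text, a drop on noisy web pages barely moves the overall number but shows up clearly here.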

Common mistakes

Use ROUGE as the only success metric.
Let an LLM judge without source text or calibration.
Ignore hallucination and factual consistency.
Aggregate across short text, long pages and noisy web pages.

What to say in the interview

Use “faithfulness, coverage, concision” as rubric anchors.
Mention calibration of LLM-as-judge against humans.