Назад к подготовке

ВопросСредняяrag-productionRAG-вопрос из материалов интервью · Apriori

RAG-вопрос

Design the end-to-end сценарий for a RAG system: data preparation, vector index ingestion and serving-time retrieval.

Ответить самому

Сначала сформулируйте ответ как на собеседовании, затем откройте разбор и оцените себя.

Загрузка

Короткий ответ

Prepare and chunk data, compute embeddings with metadata, ingest them into an ANN/vector store, retrieve top candidates at query time, rerank/filter them, assemble context and monitor answer quality.

Полный разбор

Data preparation starts with collecting documents or multimodal items, cleaning them, splitting long text into chunks and attaching metadata such as source, timestamp, access controls and product identifiers. Chunking matters because embedding models have context limits and because retrieval should return useful evidence units rather than huge documents. Ingestion computes embeddings for chunks or items and writes vectors plus metadata into a vector store such as Qdrant, OpenSearch vector search, FAISS-backed service or another ANN index. HNSW-style indexes are common for approximate nearest-neighbor search. The pipeline needs refresh logic, deletes/updates, versioning and monitoring for failed embeddings. At serving time, the user query is normalized and embedded, optionally expanded or classified, and used to retrieve candidates with metadata filters and access controls. A reranker can improve relevance. The final context is assembled with citations or source markers and sent to the LLM. Production systems track latency, recall/relevance judgments, hallucination reports and index freshness.

Теория

RAG is a retrieval and data pipeline around the LLM, not only adding a vector database to a prompt.

Типичные ошибки

Skip chunking and metadata, especially permissions and source tracking.
Assume approximate vector top-k is enough without reranking or filtering.
Forget update/delete paths and index freshness.

Как отвечать на собеседовании

Structure the answer in the exact three buckets from the prompt: data, ingestion, serving.
Mention one concrete vector index and one quality metric or evaluation method.