Production ML question

When would you choose a columnar database over Redis, MongoDB or a row-oriented relational database for ML/data pipelines?

Answer it yourself

First formulate your answer as you would in an interview, then open the breakdown and grade yourself.

Short answer

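Columnar storage is best for analytical scans over structured tables, where queries filter and aggregate a few columns across many rows. Redis fits low-latency key-value access, MongoDB fits flexible document-shaped records, and a row-oriented relational database fits transactional updates and point lookups of whole records.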

Full breakdown

Choose a columnar database such as ClickHouse when data is structured, append-heavy and queried analytically: aggregations, time-range filters, metrics, logs, events or feature tables. Because each column is stored contiguously, a query reads only the columns it needs, and values of the same type stored side by side compress well, so large scans are efficient.
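Both properties (reading only the needed columns, and compressing similar values) can be sketched in plain Python. This is a toy model under stated assumptions, not how ClickHouse is implemented: the event table, its field names and the `values_read` counter are hypothetical, and run-length encoding stands in for the many codecs real columnar engines use.

```python
# Hypothetical event table with 4 fields, shown in both layouts.
rows = [
    {"ts": 1, "user_id": 10, "event": "click", "latency_ms": 120},
    {"ts": 2, "user_id": 11, "event": "click", "latency_ms": 95},
    {"ts": 3, "user_id": 10, "event": "view", "latency_ms": 40},
    {"ts": 4, "user_id": 12, "event": "click", "latency_ms": 210},
]

# Columnar layout: one contiguous list per column instead of one dict per record.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def avg_latency_row_store(rows, event):
    """Models a row store: whole records are stored (and read) together."""
    values_read = 0
    matched = []
    for r in rows:
        values_read += len(r)  # count every field of the record as read
        if r["event"] == event:
            matched.append(r["latency_ms"])
    return sum(matched) / len(matched), values_read


def avg_latency_column_store(columns, event):
    """Models a column store: only the two columns the query uses are read."""
    ev, lat = columns["event"], columns["latency_ms"]
    values_read = len(ev) + len(lat)
    matched = [l for e, l in zip(ev, lat) if e == event]
    return sum(matched) / len(matched), values_read


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return [(v, n) for v, n in encoded]


print(avg_latency_row_store(rows, "click"))       # same average, 16 values read
print(avg_latency_column_store(columns, "click")) # same average, 8 values read
print(run_length_encode(sorted(columns["event"])))  # [('click', 3), ('view', 1)]
```

Both functions return the same average; the column version reads half as many values here, and the gap grows with table width. Sorting a low-cardinality column before encoding lengthens the runs, which is one reason columnar engines benefit from a sort key.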

Redis is a poor replacement for that workload because it is an in-memory key-value/cache system optimized for low-latency lookup, counters, queues and transient state. MongoDB or other document stores make sense when records have flexible nested structure and the access pattern is document-centric, but they are usually not the best first choice for wide analytical scans.
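To make the document-versus-analytics contrast concrete, here is a toy sketch in which plain Python dicts stand in for documents. No MongoDB API is used, and the order schema and helper names are hypothetical. Fetching one order is a single lookup, but an analytical question such as revenue per country has to walk every document and its nested array; one common approach is to flatten such data into a columnar table first.

```python
from collections import defaultdict

# Hypothetical orders with flexible, nested structure (optional fields vary).
docs = [
    {"_id": 1, "user": {"id": 10, "country": "DE"}, "items": [{"sku": "a", "price": 5.0}]},
    {"_id": 2, "user": {"id": 11}, "coupon": "X1",
     "items": [{"sku": "b", "price": 3.0}, {"sku": "c", "price": 7.0}]},
]
docs_by_id = {d["_id"]: d for d in docs}


def get_document(docs_by_id, doc_id):
    """Document-centric access: one lookup returns the whole nested record."""
    return docs_by_id[doc_id]


def flatten_line_items(docs):
    """Explode nested 'items' into one row per line item, stored column-wise."""
    cols = {"order_id": [], "country": [], "sku": [], "price": []}
    for d in docs:
        country = d.get("user", {}).get("country")  # absent field becomes None
        for item in d.get("items", []):
            cols["order_id"].append(d["_id"])
            cols["country"].append(country)
            cols["sku"].append(item["sku"])
            cols["price"].append(item["price"])
    return cols


def revenue_by_country(cols):
    """Aggregate over just two flattened columns."""
    totals = defaultdict(float)
    for country, price in zip(cols["country"], cols["price"]):
        totals[country] += price
    return dict(totals)


print(get_document(docs_by_id, 2)["coupon"])
print(revenue_by_country(flatten_line_items(docs)))
```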

A row-oriented SQL database is still the better fit for transactional updates, constraints and point lookups of full records. In interviews, anchor the choice in the access pattern: scanning and aggregating many rows over a few columns points to columnar; reading or updating one object with strong consistency points to a row, document or key-value store, depending on the shape of the data.
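The rule of thumb above can be written as a small decision function. It is a hypothetical simplification for interview practice with made-up flag names; real choices also weigh data volume, operational cost and the existing stack.

```python
def suggest_store(*, scans_many_rows: bool, reads_few_columns: bool,
                  updates_in_place: bool, needs_relational_constraints: bool,
                  nested_schema: bool, key_lookup_only: bool) -> str:
    """Hypothetical heuristic mapping an access pattern to a storage family."""
    # Analytical shape: large scans/aggregations over a few columns, mostly appends.
    if scans_many_rows and reads_few_columns and not updates_in_place:
        return "columnar"
    # Object-at-a-time access: choose by constraints and data shape.
    if needs_relational_constraints:
        return "row-oriented relational"
    if key_lookup_only:
        return "key-value"
    if nested_schema:
        return "document"
    return "row-oriented relational"


# Offline feature aggregation over event logs.
print(suggest_store(scans_many_rows=True, reads_few_columns=True,
                    updates_in_place=False, needs_relational_constraints=False,
                    nested_schema=False, key_lookup_only=False))
```

The order of the checks encodes the priority: the analytical shape is tested first because it is the one case where row, document and key-value stores all tend to scan poorly.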