Вопрос про production ML
When are SQL window functions useful, how are they different from GROUP BY, and what ClickHouse MergeTree details matter when writing analytical queries?
Ответить самому
Сначала сформулируйте ответ как на собеседовании, затем откройте разбор и оцените себя.
Короткий ответ
Window functions compute row-level analytics over a partition without collapsing rows. In ClickHouse MergeTree, partitioning and ORDER BY keys strongly affect data pruning and scan efficiency.
Полный разбор
GROUP BY collapses many rows into one row per group. Window functions keep the original row granularity while computing values over a related set of rows, such as row number, rank, lag, lead, running sum, rolling average or per-user cumulative revenue.
For example, to rank events inside each user history, use . To compute cumulative revenue, use a window ordered by payment timestamp. These are hard to express cleanly with plain GROUP BY because the output still needs each row.
In ClickHouse, MergeTree tables are stored in sorted parts. The PARTITION BY expression controls coarse partition pruning, while ORDER BY defines the primary sorting key and sparse index. Queries are much faster when filters use partition keys and leading ORDER BY columns. For ReplacingMergeTree or similar engines, understand that background merges are asynchronous and FINAL can be expensive. Good analytical SQL should match the table's sort order instead of forcing wide full scans.
Теория
Window functions preserve row detail; MergeTree performance depends on how the query aligns with physical sorting and partitioning.
Типичные ошибки
- Use GROUP BY when row-level output is still needed.
- Forget PARTITION BY in a window and compute global ranks by accident.
- Filter ClickHouse on columns unrelated to partition/order keys and expect pruning.
- Use FINAL casually on large tables.
Как отвечать на собеседовании
- Give one practical window example such as row number, lag or cumulative sum.
- For ClickHouse, say partition pruning and ORDER BY leading columns.