Production ML question

How would you choose between SQL and NoSQL storage, and what would you add so the data is not lost?


First formulate your answer as you would in an interview, then open the breakdown and assess yourself.


Use SQL when the schema, relations and transactions matter. Use NoSQL (document or key-value) stores when data is read as whole documents or looked up by key with low latency. Data safety needs replication, backups, versioning, tested restores and clear ownership.
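The rule of thumb can be written as a small decision function, plus the analytical case covered in the full breakdown. This is a minimal sketch: the `Workload` fields and `choose_storage` are hypothetical names, and the check order (correctness requirements before access shape) is one reasonable priority, not a fixed rule.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of how a pipeline reads and writes its data."""

    needs_transactions: bool = False  # multi-row atomic updates, constraints
    needs_joins: bool = False         # relational queries across entities
    large_aggregations: bool = False  # scans and GROUP BY over many rows
    key_lookups_only: bool = False    # get/set by a single key, low latency
    nested_documents: bool = False    # flexible JSON-like records read whole


def choose_storage(w: Workload) -> str:
    """Return a storage family for a workload (a rule of thumb, not a verdict)."""
    # Correctness requirements come first: they are the hardest to retrofit.
    if w.needs_transactions or w.needs_joins:
        return "relational (SQL)"
    # Then the dominant access shape.
    if w.large_aggregations:
        return "columnar (OLAP)"
    if w.key_lookups_only:
        return "key-value"
    if w.nested_documents:
        return "document"
    # Structured data with no special needs: SQL is the default.
    return "relational (SQL)"


print(choose_storage(Workload(key_lookups_only=True)))  # key-value
```

In an interview the point is the ordering. First ask what must stay correct, then ask how the data is read.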

Full breakdown

A SQL database is a good default for structured data with relational constraints, joins, transactions and clear schema evolution. It comes with mature tooling for consistency, indexes and migrations. NoSQL is not one thing. Document stores fit flexible nested documents. Key-value stores fit cache, session and lookup workloads. Wide-column stores fit high write throughput with access by partition key. Columnar (OLAP) stores fit analytical scans and aggregations. The storage choice should follow the access pattern and the correctness requirements. If the pipeline needs analytical aggregation over many rows, choose a columnar database. If it needs low-latency cache state, use a key-value store such as Redis. If it stores flexible JSON-like metadata that is read as whole documents, a document store can be reasonable.

Reliability is separate from the choice of database engine. Add replication, point-in-time backups, object-store versioning, lifecycle policies, disaster-recovery docs and periodic restore drills. Replication protects against a failed node, but it copies a bad DELETE just as faithfully, so it does not replace backups. Also restrict destructive permissions and keep audit logs so that human mistakes are recoverable and attributable.
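A restore drill means proving that a backup can actually be restored and contains the expected data. The sketch below uses Python's built-in `sqlite3` module and its online backup API (`Connection.backup`) only so that it runs anywhere. A real drill would restore from the production backup location into an isolated environment. `table_checksum`, `take_backup` and `restore_drill` are hypothetical helper names. The sketch covers snapshot backup and verification, not point-in-time recovery.

```python
import hashlib
import sqlite3


def table_checksum(conn: sqlite3.Connection, table: str) -> str:
    """Hash all rows in a stable order so two copies can be compared."""
    digest = hashlib.sha256()
    # The table name is trusted here; never interpolate user input into SQL.
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY rowid"):
        digest.update(repr(row).encode())
    return digest.hexdigest()


def take_backup(live: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the live database into a separate backup database."""
    snapshot = sqlite3.connect(":memory:")
    live.backup(snapshot)  # sqlite3's online backup API (Python 3.7+)
    return snapshot


def restore_drill(snapshot: sqlite3.Connection, table: str, expected: str) -> bool:
    """Restore the snapshot into a fresh database and verify its contents."""
    restored = sqlite3.connect(":memory:")
    try:
        snapshot.backup(restored)
        return table_checksum(restored, table) == expected
    finally:
        restored.close()


live = sqlite3.connect(":memory:")
with live:  # commits the INSERTs on success, rolls back on an exception
    live.execute("CREATE TABLE features (id INTEGER PRIMARY KEY, name TEXT)")
    live.executemany("INSERT INTO features (name) VALUES (?)", [("age",), ("income",)])

expected = table_checksum(live, "features")
snapshot = take_backup(live)
print(restore_drill(snapshot, "features", expected))  # True
```

Running a check like this on a schedule and alerting on `False` turns "we have backups" into "we have restores". The checksum compares content, which catches both corrupted and silently empty backups. A row count alone would not catch changed values.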

Common mistakes:
- Treating SQL and NoSQL as a reliability hierarchy rather than as choices driven by access pattern.
- Having backups but never testing a restore.
- Giving every pipeline job broad delete permissions.
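The remedy for broad delete permissions is default-deny, least-privilege access with an audit trail. This is a minimal sketch under stated assumptions: `JobPolicy`, `authorize` and the `DESTRUCTIVE` action set are hypothetical. In practice the enforcement belongs in the database (SQL `GRANT`) or the cloud IAM layer, not only in application code.

```python
import logging
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s %(message)s")
audit = logging.getLogger("audit")

# Hypothetical set of actions treated as destructive.
DESTRUCTIVE = {"delete", "drop", "truncate"}


@dataclass
class JobPolicy:
    """Hypothetical per-job grant list (stand-in for IAM roles or SQL GRANTs)."""

    job: str
    allowed: set = field(default_factory=set)


def authorize(policy: JobPolicy, action: str, resource: str) -> bool:
    """Default-deny check: allow only explicitly granted actions, log every decision."""
    granted = action in policy.allowed
    # Destructive requests are logged at WARNING so they stand out in the audit trail.
    level = logging.WARNING if action in DESTRUCTIVE else logging.INFO
    audit.log(level, "job=%s action=%s resource=%s granted=%s",
              policy.job, action, resource, granted)
    return granted


etl = JobPolicy("feature-etl", allowed={"read", "write"})
if not authorize(etl, "delete", "features"):
    print("refused")  # the job must escalate instead of deleting
```

Because the check denies by default, a new job starts with no rights until someone grants them. The log line records who asked for what, which keeps mistakes attributable.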