Вопрос про production ML
What mechanisms would you add so important ML datasets do not disappear because of human error or operational mistakes?
Ответить самому
Сначала сформулируйте ответ как на собеседовании, затем откройте разбор и оцените себя.
Короткий ответ
Use versioned immutable storage, backups, replication, least-privilege access, deletion protection, audit logs and tested restore procedures. A backup that is never restored is only a hope.
Полный разбор
Protect data at several layers. At the storage layer, enable versioning or immutable snapshots, replication across zones/buckets and lifecycle policies that keep cold backups. For object stores, object versioning plus delete markers can save you from accidental deletes.
At the access layer, use least privilege. Most users and jobs should not have hard-delete permissions on production datasets. Separate write, promote and delete roles; require review or break-glass access for destructive operations. Add audit logs so you can identify what happened.
At the process layer, make backups operational: scheduled snapshots, documented recovery objectives, restore drills and monitoring for missing partitions or unusual deletion volume. Dataset manifests and checksums help detect silent corruption or partial uploads.