Postgres replicas are not a backup.
The exact moment that taught me the difference, and the boring weekly process that replaced my anxiety.
The moment I learned, definitively, that streaming replicas are not backups was 02:11 on a wednesday. The corrupt write that had landed on the primary at 02:09 was, with admirable consistency, already on the replica. We had two copies of a broken database.
What I do now
A boring weekly process that I will defend to anyone:
- One full
pg_basebackup, weekly, encrypted, off-cluster. - WAL archives shipped continuously to object storage with retention.
- A monthly restore drill, into a throwaway environment, that proves the backup actually works.
The monthly drill is the part everyone skips and the part that matters most. A backup you have not restored is a hope.
The conversation I now have early
The first time I onboard a junior engineer who manages stateful things, I ask them this: “what is the worst case where your replicas don’t save you?” If they can’t name three, we have a conversation. Then we set up the drill. The drill is the conversation made real.