PostgreSQL Data Checksums

October 04, 2016

If you use SQL Server, you’re used to the database doing page verification for you as the sensible default. If you want SQL Server to not verify data, you have to do a bit of extra work. Naturally, I would’ve assumed that this was the case with other databases since, after all, having good data on disk is important. [caption id=“attachment_1065” align=“alignright” width=“300”] Not quite a check sum, but delicious enough.[/caption]

Turning on PostgreSQL Checksums

Data checksums were added to PostgreSQL 9.3. This is great, but there’s a catch - the data checksum has to be turned on during server set up - specifically when running initdb. Checksums can’t be enabled after a database is created either. To turn on checksums, during initialization an administrator needs to supply either --data-checksums or the -k flag, e.g. initdb --data-checksums databas. If you haven’t enabled the checksums, you’ll have to move the data into a new PostgreSQL installation through one of the usual means - some kind of export or logical replication. Have fun!

Automatic Repair

If you’ve turned on checksums, PostgreSQL still won’t fix data problems for you. It will, however, throw an error when bad data is retrieved from disk. This is a start, and your application should be set up to handle this possibility. But what if you’re lazy? I found out about checksums in PostgreSQL through an announcement about pg_healer. The idea behind pg_healer is that it sits in the background and attempts to correct different data corruption problems as they arise. It’s still early days for pg_healer, but the author admits that they want it to repair data as queries are happening as well as in the background, much like SQL Server’s DBCC CHECKDB command. It’s still early days for database repair in PostgreSQL, but we should all be setting up our PostgreSQL installations so that we at least know that corruption is happening.

“Chex Mix” by Steve Johnson is licensed with CC BY 2.0