Tuesday, January 11, 2011

Postgres Recovery

Postgres crashed during a disk failure, and when I later tried to start it up I got messages along the lines of:

database system was interrupted; last known up at ...
database system was not properly shut down; automatic recovery in progress
redo starts at 309/3BA1EB48
record with zero length at 309/3C9F8ED8
redo done at 309/3C9F8EA8
last completed transaction was at log time ....
could not fdatasync log file 777, segment 60: Input/output error
startup process (PID 23142) was terminated by signal 6: Aborted
aborting startup due to startup process failure

On further investigation it sounds like the transaction log was corrupted. This can be fixed with pg_resetxlog, which clears the write-ahead log. That may result in some data loss or loss of integrity, but when nothing else works it's a lifesaver. The documentation describes some follow-up steps to ensure the integrity of the data once Postgres is starting properly again.
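
As I read the pg_resetxlog documentation, the follow-up boils down to: dump your data right away, run initdb, and reload. Here is a rough sketch of that sequence, not a tested procedure. DUMP_FILE and NEW_PGDATA are names I made up for illustration, PGBIN just matches the paths used in this post, and the `run` helper only prints each command unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch of the post-reset follow-up: dump, initdb, reload.
# DUMP_FILE and NEW_PGDATA are hypothetical locations; adjust them for your system.
PGBIN="${PGBIN:-/usr/lib/postgresql/8.4/bin}"
DUMP_FILE="${DUMP_FILE:-/tmp/pg_rescue_dump.sql}"
NEW_PGDATA="${NEW_PGDATA:-/var/lib/postgresql/8.4/main.new}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. With the old cluster running again, dump every database.
run sudo -u postgres "$PGBIN/pg_dumpall" -f "$DUMP_FILE"

# 2. Create a fresh, empty cluster in a new directory.
run sudo -u postgres "$PGBIN/initdb" -D "$NEW_PGDATA"

# 3. Assumes a server has since been started on the new cluster: reload the dump.
run sudo -u postgres "$PGBIN/psql" -f "$DUMP_FILE" postgres
```

Starting the server on the new cluster between steps 2 and 3 depends on how your distribution manages clusters, so I've left that part out. After the reload, check the data for inconsistencies by hand.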

You can do a dry run:
sudo -u postgres /usr/lib/postgresql/8.4/bin/pg_resetxlog -n /var/lib/postgresql/8.4/main
and if the output indicates a new segment and there's no other option, then you might as well reset with:
sudo -u postgres /usr/lib/postgresql/8.4/bin/pg_resetxlog /var/lib/postgresql/8.4/main
or, if it refuses to proceed, force it with -f:
sudo -u postgres /usr/lib/postgresql/8.4/bin/pg_resetxlog -f /var/lib/postgresql/8.4/main
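
Since the reset can't be undone, it seems sensible to copy the data directory somewhere safe before running either of the reset commands above, preferably onto a different disk. A minimal sketch follows. The function names and BACKUP_DIR are my own inventions, and the postmaster.pid check is only a rough guard: a crashed server can leave a stale lock file behind.

```shell
#!/bin/sh
# Sketch: copy the data directory aside before resetting the WAL.
# BACKUP_DIR is a hypothetical location on a healthy disk.
PGDATA="${PGDATA:-/var/lib/postgresql/8.4/main}"
BACKUP_DIR="${BACKUP_DIR:-/mnt/other-disk/pg-main-backup}"

# A running server keeps a postmaster.pid lock file in its data directory.
server_looks_stopped() {
    [ ! -e "$1/postmaster.pid" ]
}

# Copy data directory $1 to $2, preserving ownership, modes and timestamps.
backup_data_dir() {
    src=$1
    dest=$2
    if ! server_looks_stopped "$src"; then
        echo "postmaster.pid found in $src: stop the server or remove a stale lock file" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "$dest already exists, refusing to overwrite it" >&2
        return 1
    fi
    cp -a "$src" "$dest"
}

# Usage (run as a user that can read the data directory, e.g. root):
#   backup_data_dir "$PGDATA" "$BACKUP_DIR"
```

With the copy in place you can always go back to the pre-reset state if the result turns out worse than expected.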
