The reason is the following: when an I/O controller starts to die, it usually does not fail immediately. The controller dies slowly, producing more and more corrupt data. If you just write data without ever checking or reading it back, it can take days or even weeks until you discover the problem.
The nasty thing is that even your backups become infected with the corrupted data. In the worst case the corruption started long before your oldest still-existing backup was made.
Fortunately, database management systems are quite sensitive to data corruption and start to complain pretty early. So please take warnings or error messages about data corruption seriously, and try to find and solve the problem immediately!
What can we do to keep data corruption from spreading?
- Monitor logs (syslog, database error log, application log, etc.).
- Ideally do physical AND logical backups. If your database is too big for a logical restore, you can redirect your logical backup to /dev/null. This way you at least verify that the data can be read, and mysqldump should complain when it hits data corruption (this does not work for index corruption!); see the first sketch after this list.
- Think about a backup retention policy.
- Think about 2 independent paths to recover your data (keep at least 2 good backups + all the binary logs).
- Run a check on your backed-up files (myisamchk, innochecksum); see the second sketch after this list.
- Test your backup frequently with a restore (ideally on a daily basis)!
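To illustrate the read check mentioned above, here is a minimal sketch of a nightly job. It assumes credentials come from ~/.my.cnf; the options are examples and should be adapted to your setup:

  #!/bin/bash
  # Stream a full logical backup to /dev/null. mysqldump has to read
  # every row, so corrupt data should make it exit non-zero.
  # (Index corruption is NOT detected this way.)
  if ! mysqldump --all-databases --single-transaction > /dev/null; then
      echo "mysqldump read check FAILED - possible data corruption!" >&2
      exit 1
  fi
  echo "read check passed"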
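And a sketch of checking the copied files of a physical backup. The paths are placeholders, and the checked files must be an offline copy, not in use by a running mysqld:

  #!/bin/bash
  BACKUP_DIR=/backup/mysql/datadir    # placeholder: path to the copied datadir

  # Check the MyISAM index files of the backup copy.
  myisamchk --check "$BACKUP_DIR"/*/*.MYI || exit 1

  # Verify the page checksums of the copied InnoDB tablespace files.
  for ibd in "$BACKUP_DIR"/*/*.ibd; do
      innochecksum "$ibd" || exit 1
  done

  echo "backup file checks passed"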
Have you EVER tested a restore of your backup? Please do! Especially if your data size is significantly bigger than your amount of RAM!
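A minimal sketch of such a restore test, assuming a compressed mysqldump backup and a dedicated test instance (all names and paths below are placeholders):

  #!/bin/bash
  BACKUP=/backup/mysql/full_dump.sql.gz    # placeholder: latest logical backup
  TEST_HOST=mysql-restore-test             # placeholder: separate test instance

  # Restore the dump into the test instance.
  gunzip -c "$BACKUP" | mysql --host="$TEST_HOST" || exit 1

  # Simple plausibility check: count the restored user tables.
  mysql --host="$TEST_HOST" -e "
    SELECT COUNT(*) FROM information_schema.tables
     WHERE table_schema NOT IN
       ('mysql', 'information_schema', 'performance_schema');"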
In a later article I will show you a concept for doing backups and testing restores regularly, including some positive side effects on your development process.
If you need more information or help with backup concepts, emergency restores, or data recovery, please consider our consulting services.