Reliable data storage
This page lists a few best practices for reliable data storage.
This list is based on my personal experience, not on statistical analysis.
- Human error
- I have lost most of my data through plain old stupidity: accidentally deleting a file.
- Missing backup
- This is really a failure of procedures. Most of my files are backed up properly, but sometimes the backup fails, and I only notice when it is too late. The most common cause is that I often synchronize my data instead of backing it up. With synchronization, when I delete a file, that same file is also removed from the copy. Another cause is the lack of proper backups for data that is not stored as plain files, such as email or my address book.
- Failing disk
- It happens. The expected lifetime of a hard disk is about 15 years. For solid state disks it is about 10 years. For self-burned DVDs it is only 5 years or so.
Other potential risks
- Undetected data rot
- "If you don't often restore your data, you don't have a backup." Yet, I don't have (or take) the time to check if I can actually read all files in my backup. Not all file systems have built-in checksums, and a bad sector in my backups may go undetected until I actually need that file.
- Outdated file formats
- I've found myself in a situation where it is hard to read the original data. I still have some images and text files created in applications I no longer have access to. For example, files created with a drawing application that was discontinued in the pre-Mac OS X era. Nowadays I try to store the original file and also export a copy in a common file format (such as exporting or printing to PDF).
- Fire and theft
- Even with multiple copies of data at the same location, a fire would destroy all disks. If the fire itself does not destroy them, the tiny smoke particles that pollute all electronics will. The same risk applies to physical theft: a burglar may take all computers and all disks.
- Online data may be compromised, even if it is on your local computer. Ransomware is a trojan horse designed to encrypt your data, and only decrypt it if you pay a ransom fee to criminals, typically 100 to 1000 euro. This is becoming a substantial risk.
- Identity theft
- The purpose of backups is to keep your data accessible to you, and perhaps to your family in case you pass away. However, you don't want to give criminals access to your files.
- Online access
- If you store your data in the cloud, be aware that you may lose that access. A hacker may steal (or guess) your password and delete your data (either intentionally or as collateral damage). A cloud provider may go bankrupt or suddenly stop its service. Be aware that in most countries, you have very limited legal rights: data is not considered an asset that you can claim, like you can with physical assets.
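For file systems without built-in checksums, undetected data rot can be caught by keeping your own list of checksums and re-checking it now and then. The following is a minimal sketch in Python, not a finished tool; the function names `sha256_of`, `build_manifest` and `verify` are made up for this illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file below root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or no longer match."""
    problems = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(name)
    return problems
```

The idea would be to call `build_manifest` right after making a backup, store the result somewhere other than the backup disk itself, and run `verify` against the backup periodically. A non-empty result means a file was lost or its content changed since the manifest was made.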
I try to follow these recommendations to ensure my data won't be lost.
- Store data on multiple disks, for example with a (RAID, ZFS, or Btrfs) mirror.
- Use disks of different brands. Disks of the same brand and type are more likely to fail around the same time.
- Use hard disk drives (HDDs) instead of solid state drives (SSDs). Hard disk technology is simply much more mature, which is a more important factor than the lack of moving parts.
- Make an offsite backup. For example, store a disk at a family member's home, or make a backup in the cloud.
- Use different providers for cloud backups.
- Read data every so often to ensure the media is still readable. For online disks, schedule a daily short SMART self-test and a weekly long SMART self-test. For offline disks, test them each year.
- Buy decent quality hardware, or test your disks before putting them into production.
- Use a backup with snapshots, for example ZFS snapshots or Time Machine on OS X. In all cases, you keep an archive of older data, even when it is erased on the source. On Unix-based OSes, you can use hard links, so the same file in different snapshots is only stored once.
- Use standard file formats for future accessibility. Keep the source file, and also export a copy to PDF.
- Give a copy of your important passwords to a family member or friend (e.g. in a sealed envelope), in case you want them to have access to your data after your death.
- Assume that failures are bound to happen, and be prepared.
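The hard-link approach to snapshots mentioned above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how Time Machine or ZFS work internally: the `snapshot` function is a made-up name, all snapshots must live on the same file system (hard links cannot cross file systems), and symbolic links and permissions are not handled.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def snapshot(source: Path, dest: Path, previous: Optional[Path] = None) -> None:
    """Copy source into dest, hard-linking files unchanged since previous."""
    dest.mkdir(parents=True, exist_ok=True)
    for src_file in sorted(source.rglob("*")):
        rel = src_file.relative_to(source)
        target = dest / rel
        if src_file.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous is not None else None
        if old is not None and old.is_file() and filecmp.cmp(src_file, old, shallow=False):
            # Unchanged: share the stored data with the previous snapshot.
            os.link(old, target)
        else:
            # New or changed: store a fresh copy.
            shutil.copy2(src_file, target)
```

Each call produces a complete directory tree, so a file deleted from the source simply does not appear in the new snapshot but remains in the older ones. Because hard-linked files share their content, snapshots should be treated as read-only: editing a file in one snapshot would change it in every snapshot that links to it.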
Credits go to my colleagues in the SURFsara data services group for their advice on HDD vs SSD, and using different brands of disks.
See also: Handling Disk Failures