Handling Disk Failures

The best way to deal with disk failures is to assume they will happen and be prepared. See Reliable data storage for best practices.

The course of action is to:
 * 1) Run a SMART self-test (if that hasn't been done yet) to find which sector of the disk is failing.
 * 2) Write directly to that sector with dd. This forces the drive to relocate the sector to one of its spare sectors.
 * 3) Scrub the file system on the disk. If you have a mirrored copy, everything should be restored to normal.
 * 4) If you don't have a mirror, use your backup disk to restore the data.
 * 5) Decide whether you want to replace the disk to prevent future failures.

SMART self-tests
A bad block may be reported as follows:

Device: /dev/ada0, 1 Currently unreadable (pending) sectors
Device: /dev/ada0, Self-Test Log error count increased from 0 to 1

Bad blocks are best detected by running a SMART self-test, assuming that the disk has S.M.A.R.T. support.

To examine the status (including an indication of whether a test is still running):

smartctl -c /dev/ada0

which will include one of these results:

Self-test execution status:     (   0) The previous self-test routine completed without error or no self-test has ever been run.

or

Self-test execution status:     ( 249) Self-test routine in progress...
                                        90% of test remaining.
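The number in parentheses can be decoded by hand. A small sketch, assuming the usual ATA convention (worth verifying against your drive's documentation): the high nibble of the status byte is a status code (15 means a test is in progress) and the low nibble is the remaining time in units of 10%.

```shell
# Decode the self-test execution status byte from the example above.
# Assumption: high nibble = status code (15 = in progress),
#             low nibble  = remaining time in tenths of the test.
status=249
high=$((status / 16))
low=$((status % 16))
echo "status code: $high, remaining: $((low * 10))%"
```

For the value 249 this yields status code 15 with 90% remaining, matching the smartctl text above.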

To get the log of the last completed test, run:

smartctl -l selftest /dev/ada0

(or use smartctl -a to get all information about the disk, including the last test results).

To start a test:

smartctl -t long /dev/ada0

or

smartctl -t short /dev/ada0

To abort a test:

smartctl -X /dev/ada0

Example Output

smartctl -a /dev/ada0

=== START OF INFORMATION SECTION ===
...
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
...
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   155   150   021    Pre-fail  Always       -       9225
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       91
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       22013
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       89
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       74
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       914156
194 Temperature_Celsius     0x0022   108   107   000    Old_age   Always       -       44
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1   < this ought to be 0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       5   < this is worrying

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     22010         -
# 2  Extended offline    Completed without error       00%     21999         -
# 3  Short offline       Completed without error       00%     21966         -
# 4  Short offline       Completed without error       00%     21965         -
# 5  Short offline       Completed without error       00%     21964         -
# 6  Short offline       Completed without error       00%     21963         -
# 7  Short offline       Completed without error       00%     21962         -
# 8  Short offline       Completed without error       00%     21961         -
# 9  Short offline       Completed without error       00%     21960         -
#10  Short offline       Completed without error       00%     21959         -
#11  Short offline       Completed without error       00%     21958         -
#12  Short offline       Completed: read failure       90%     21957         310949139
#13  Short offline       Completed without error       00%     21956         -
#14  Short offline       Completed without error       00%     21955         -
#15  Short offline       Completed without error       00%     21954         -
#16  Short offline       Completed without error       00%     21953         -
#17  Short offline       Completed without error       00%     21952         -
#18  Short offline       Completed without error       00%     21951         -
#19  Short offline       Completed without error       00%     21950         -
#20  Short offline       Completed without error       00%     21949         -
#21  Short offline       Completed without error       00%     21948         -
1 of 1 failed self-tests are outdated by newer successful extended offline self-test # 1

A few things to notice:
 * The disk has a sector size of 512 bytes logical, 4096 bytes physical. We need this info later.
 * SMART support is enabled. Good.
 * SMART still passes. So the disk is still usable. As long as it lasts, that is.
 * There is 1 Current_Pending_Sector, i.e. one "bad block". This can be fixed, though for some people this alone is reason enough to replace the disk. Usually, if this number is low (less than 5 or 10), the disk can still be used without problems.
 * The disk has 5 Multi_Zone_Error_Rate. Anything larger than 0 is worrying, and an indication that the disk is starting to fail. For me, this would be a trigger to buy a replacement disk.
 * The first sector that is failing is 310949139.
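If you want to watch the pending-sector count from a script, it can be extracted from smartctl output with awk. A minimal sketch, using the attribute line from the example as canned input; against a live disk you would pipe smartctl -A /dev/ada0 into the same filter:

```shell
# Extract the Current_Pending_Sector raw value (the last field) from a
# smartctl attribute line. The sample line is copied from the output above.
line='197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1'
pending=$(printf '%s\n' "$line" | awk '$2 == "Current_Pending_Sector" { print $NF }')
echo "pending sectors: $pending"
```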

Note that the logical block is reported. Logical (512 byte) block #310949139 is failing, which lies in (4096 byte) physical block #38868642 (310949139 divided by 8, rounded down), which spans logical blocks 310949136-310949143.
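The arithmetic above can be checked with shell integer math, using the failing sector from the example:

```shell
# Logical-to-physical block arithmetic for a 512e disk
# (512-byte logical sectors, 4096-byte physical sectors).
lba=310949139
perphys=$((4096 / 512))          # 8 logical blocks per physical block
phys=$((lba / perphys))          # integer division rounds down
first=$((phys * perphys))        # first logical block of that physical block
last=$((first + perphys - 1))    # last logical block of that physical block
echo "physical block $phys spans logical blocks $first-$last"
```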

Relocate a bad sector
Modern disks will automatically relocate a bad sector to one of the spare sectors when it is written to.

To write directly to a disk block, use:

sysctl kern.geom.debugflags=16
dd if=/dev/ada0 of=/dev/ada0 bs=512 count=1 iseek=310949139 oseek=310949139 conv=noerror,sync
sysctl kern.geom.debugflags=0

That's all. However, be careful not to make mistakes; it is easy to ruin your disks.

The kern.geom.debugflags sysctl is a protection against accidental (or malicious) alterations to a disk. It must be set to 16 to allow direct writes to the disk with dd. The default is 0, which prevents this.

Make sure to specify the correct disk. The command is written so that it reads from and writes back to the same sector; because the data is written back unchanged, even accidentally specifying the wrong disk or sector does little harm.

The conv=noerror,sync option is required to ensure that dd continues even when the read fails; sync pads the failed read with zeros so the sector is still written.

The iseek (and oseek) parameters specify the bad sector as reported by smartctl, for the input and output side respectively.
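To reduce the chance of a typo in the sector number, the command can be parameterized. A small sketch that only prints the dd command instead of running it; the disk and sector are the values from the example, adjust them to your case:

```shell
# Build, but do not run, the dd command for a given bad sector.
disk=/dev/ada0          # the failing disk from the example
sector=310949139        # bad sector as reported by smartctl
cmd="dd if=$disk of=$disk bs=512 count=1 iseek=$sector oseek=$sector conv=noerror,sync"
echo "$cmd"
```

Review the printed command before pasting it into a root shell.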

ZFS scrubbing
Even if the bad block has been relocated, its contents may still be lost without the file system being aware of it. Scrubbing a file system verifies the checksum of each block. This allows the file system to mark the affected file as bad and to repair it (if a mirror disk is present).

To start a scrub on ZFS:

zpool scrub poolname

To check the status of a scrub (progress and result):

zpool status -v poolname
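The scrub result can also be picked out of the status output from a script. A minimal sketch using canned, hypothetical output (the exact wording of the scan line varies between ZFS versions, so treat the pattern as an assumption to adapt):

```shell
# Pull the "scan:" line out of zpool status output.
# Canned sample with hypothetical values; on a live system you would pipe
# `zpool status poolname` into the same sed filter.
status='  pool: poolname
 state: ONLINE
  scan: scrub repaired 0 in 2h31m with 0 errors on Sun Sep 10 03:12:01 2017'
scan=$(printf '%s\n' "$status" | sed -n 's/^ *scan: *//p')
echo "$scan"
```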

Pre-production testing
Before taking a disk into production, some people suggest testing it for speed and reliability. As a consumer, I don't take these steps.

jgreco on the FreeNAS forum recommends the following steps:


 * 1) Start with a SMART conveyance test.
 * 2) Then you move on to a SMART extended test.
 * 3) Then move on to what we call burn-in, prior to making any filesystems or anything. Read all the data off each disk with dd.  Write zeros to each entire disk with dd.  Re-read all that data off each disk with dd.
 * 4) Do each set of tests in parallel and watch to see if any of the disks are unnaturally slower than the others, a warning flag.
 * 5) Then you make your filesystem(s).
 * 6) Then you run iozone in a seek-heavy manner.  Then you keep that running a few weeks (no, seriously, weeks is on the short end).
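The dd burn-in passes in step 3 might look like the following sketch. It only prints the commands, since actually running the write pass destroys all data on the disk; /dev/ada0 is a placeholder.

```shell
# Sketch of the three dd burn-in passes. The commands are only printed:
# running the write pass would destroy all data on $disk.
# bs=1m is FreeBSD dd syntax; GNU dd spells it bs=1M.
disk=/dev/ada0                              # placeholder: the disk under test
pass1="dd if=$disk of=/dev/null bs=1m"      # pass 1: read every sector
pass2="dd if=/dev/zero of=$disk bs=1m"      # pass 2: write zeros to the whole disk
pass3="dd if=$disk of=/dev/null bs=1m"      # pass 3: re-read every sector
printf '%s\n' "$pass1" "$pass2" "$pass3"
```

Run the passes on all disks in parallel, as suggested above, and compare their run times.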

If your system is healthy at the end of that, you've probably done as much as you can to ensure that the hardware is good.

Replacing a Disk with ZFS
Replace the faulty disk with a new one, and use the following commands to ensure you know which disk it is.


camcontrol devlist
glabel status
gpart show -l

Assuming that you are now convinced the new disk is located at ada0, ensure there is indeed no data on the disk:

gpart show -l ada0
gpart: No such geom: ada0.

If there is no data, create a new partition scheme. In this example: a swap partition and a ZFS file system whose size matches that of a different disk.

gpart create -s gpt ada0
ada0 created
gpart add -t freebsd-swap -a 4k -s 2G ada0
ada0p1 added
gpart add -t freebsd-zfs -a 4k -s 5856338696 ada0
ada0p2 added

gpart show ada0
=>         40  5860533088  ada0  GPT  (2.7T)
           40     4194304     1  freebsd-swap  (2.0G)
      4194344  5856338696     2  freebsd-zfs  (2.7T)
   5860533040          88        - free -  (44K)
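The numbers in the gpart output can be sanity-checked with shell arithmetic: a 2 GiB swap partition occupies 2*1024^3 / 512 logical sectors, and the ZFS partition starts right after it.

```shell
# Check the partition layout arithmetic from the gpart show output above.
swap_sectors=$((2 * 1024 * 1024 * 1024 / 512))   # size of the 2 GiB swap in sectors
zfs_start=$((40 + swap_sectors))                 # first sector after swap (GPT data starts at 40)
echo "swap: $swap_sectors sectors, zfs starts at sector $zfs_start"
```

This matches the 4194304-sector swap partition and the ZFS partition starting at sector 4194344 shown above.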

Then, determine the GPTID of the partition:

glabel status
                                      Name  Status  Components
gptid/b2fa344b-91bc-11e7-b516-bc5ff40dd410     N/A  ada0p1
gptid/ba96c94b-91bc-11e7-b516-bc5ff40dd410     N/A  ada0p2

Finally, replace the old disk with the new file system using zpool replace:

zpool status freenas-data
  pool: freenas-data
 state: DEGRADED
[...]
config:

        NAME                                            STATE     READ WRITE CKSUM
        freenas-data                                    DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            60860516858591446                           UNAVAIL      0     0     0  was /dev/gptid/a38a0556-b182-11e5-894a-bc5ff40dd410
            gptid/a4fb997e-b182-11e5-894a-bc5ff40dd410  ONLINE       0     0     0

zpool replace freenas-data /dev/gptid/a38a0556-b182-11e5-894a-bc5ff40dd410 gptid/ba96c94b-91bc-11e7-b516-bc5ff40dd410

zpool status freenas-data
  pool: freenas-data
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Sep  5 00:03:36 2017
        5.60G scanned out of 1008G at 19.0M/s, 14h58m to go
        5.59G resilvered, 0.56% done
config:

        NAME                                              STATE     READ WRITE CKSUM
        freenas-data                                      DEGRADED     0     0     0
          mirror-0                                        DEGRADED     0     0     0
            replacing-0                                   UNAVAIL      0     0     0
              60860516858591446                           UNAVAIL      0     0     0  was /dev/gptid/a38a0556-b182-11e5-894a-bc5ff40dd410
              gptid/ba96c94b-91bc-11e7-b516-bc5ff40dd410  ONLINE       0     0     0  (resilvering)
            gptid/a4fb997e-b182-11e5-894a-bc5ff40dd410    ONLINE       0     0     0