Question: I want to use badblocks to check my HDDs and would appreciate clarification of its operation.

Can someone please explain the best options to use with -b and -c? I have included their definitions from the man page below, but am not sure whether larger sizes would be beneficial for modern disks with 64 MB of cache and 4k sectors.

-b block-size
    Specify the size of blocks in bytes. The default is 1024.

-c number of blocks
    is the number of blocks which are tested at a time. The default is 64.

Secondly, I would like to know whether the write-mode test is any more thorough than the non-destructive read-write mode.

Lastly, how many SMART sector re-allocations are acceptable, and should drives with non-zero reallocation counts be replaced immediately?

Answer: Question 1:

With regard to the -b option: this depends on your disk. Modern, large disks have 4KB blocks, in which case you should set -b 4096. You can get the block size from the operating system, and it’s also usually obtainable by reading the disk’s information off of the label or by googling the model number of the disk. If -b is set to something larger than your block size, the integrity of badblocks’ results can be compromised (i.e. you can get false negatives: no bad blocks found when they may still exist). If -b is set to something smaller than your drive’s block size, the speed of the badblocks run can be compromised. I’m not sure, but there may be other problems with setting -b smaller than your block size as well: since badblocks isn’t then verifying the integrity of an entire block at once, it might still be possible to get false negatives if it’s set too small.
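If you want to check what the kernel reports before picking -b, one quick sketch on Linux (assuming your drive is /dev/sdX, which is just a placeholder here):

    # /dev/sdX is a placeholder; substitute your actual drive.
    # Logical and physical sector sizes as seen by the kernel:
    lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sdX

    # The same via blockdev (--getss = logical, --getpbsz = physical sector size):
    blockdev --getss --getpbsz /dev/sdX

    # Or straight from sysfs:
    cat /sys/block/sdX/queue/physical_block_size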

The -c option corresponds to how many blocks should be checked at once: batch reading/writing, basically. This option does not affect the integrity of your results, but it does affect the speed at which badblocks runs. badblocks will (optionally) write, then read, buffer, and check, repeating for every N blocks as specified by -c. If -c is set too low, your badblocks runs will take much longer than they otherwise would, as queueing and processing a separate IO request incurs overhead, and the disk might also impose additional overhead per request. If -c is set too high, badblocks might run out of memory; if this happens, badblocks will fail fairly quickly after it starts. Additional considerations here include parallel badblocks runs: if you’re running badblocks against multiple partitions on the same disk (a bad idea), or against multiple disks over the same IO channel, you’ll probably want to tune -c to something sensibly high given the memory available to badblocks, so that the parallel runs don’t fight for IO bandwidth and can parallelize in a sane way.
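To make the tuning concrete, here is a rough sketch of a run that sets both options; the device name and the particular numbers are assumptions for illustration, not recommendations:

    # Non-destructive read-write test (-n), showing progress (-s), verbose (-v),
    # on a drive with 4 KiB sectors, testing 16384 blocks (64 MiB of data)
    # per batch. /dev/sdX is a placeholder and must not be mounted.
    badblocks -nsv -b 4096 -c 16384 /dev/sdX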

Question 2:

Contrary to what other answers indicate, the -w write-mode test is not more or less reliable than the non-destructive read-write test, but it is twice as fast, at the cost of being destructive to all of your data. I’ll explain why:

In non-destructive mode, badblocks does the following:

  1. Read the existing data, checksum it (read it again if necessary), and store it in memory.
  2. Write a predetermined pattern (overridable with the -t option, though usually not necessary) to the block.
  3. Read the block back, verifying that the data read matches the pattern.
  4. Write the original data back to the disk.
     • I’m not sure about this, but it probably also re-reads the block and verifies that the original data was written successfully and still checksums to the same thing.

In destructive (-w) mode, badblocks only does steps 2 and 3 above. This means that the number of read/write operations needed to verify data integrity is cut in half. If a block is bad, the data will be erroneous in either mode. Of course, if you care about the data that is stored on your drive, you should use non-destructive mode, as -w will obliterate all of your data and leave badblocks’ patterns written to the disk instead.
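To make the contrast concrete, a quick sketch of the two invocations (again, /dev/sdX is a placeholder, and the -w run will destroy everything on it):

    # Non-destructive read-write test: existing data is preserved.
    badblocks -nsv /dev/sdX

    # Destructive write-mode test: overwrites the whole device with test
    # patterns and reads them back. Only use on a drive whose contents
    # you no longer need.
    badblocks -wsv /dev/sdX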

Caveat: if a block is going bad but isn’t completely gone yet, some read/write verification pairs may work and some may not. In this case, non-destructive mode may give you a more reliable indication of the “mushiness” of a block, since it does two sets of read/write verification (maybe; see the bullet under step 4). Even if non-destructive mode is more reliable in that way, it’s only more reliable by coincidence. The correct way to check for blocks that aren’t fully bad but can’t sustain multiple read/write operations is to run badblocks over the same data multiple times, using the -p option.
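A minimal sketch of that multi-pass approach (the pass count and device name are just placeholder assumptions):

    # Keep scanning in non-destructive mode until 2 consecutive passes
    # discover no new bad blocks. /dev/sdX is a placeholder.
    badblocks -nsv -p 2 /dev/sdX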

Question 3:

If SMART is reallocating sectors, you should probably consider replacing the drive ASAP. Drives that lose a few sectors don’t always keep losing them, but the cause is usually a heavily-used drive getting magnetically mushy, or failing heads/motors resulting in inaccurate or failed reads/writes. The final decision is up to you, of course: based on the value of the data on the drive and the reliability you need from the systems you run on it, you might decide to keep it in service. I have some drives with known bad blocks that have been spinning with SMART warnings for years in my fileserver, but they’re backed up on a schedule such that I could handle a total failure without much pain.
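If you want to keep an eye on those counters, a quick sketch using smartctl from smartmontools (/dev/sdX is again a placeholder):

    # Print the drive's SMART attribute table and pick out the reallocation
    # and pending-sector counters.
    smartctl -A /dev/sdX | grep -i -E 'realloc|pending'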