(I) Scrub: going through the raid, checking parity and correcting things. There are 2 types: check and repair.
*check just reads everything and doesn’t trigger any writes unless a read error is detected, in which case the normal read-error handling kicks in. So it can be useful on a read-only array.
*repair does the same, but when it finds an inconsistency it corrects it by writing something. If any raid personality had not been taught to specifically understand check, then a check run would effect a repair. I think 2.6.17 will have all personalities doing the right thing.
*check doesn’t keep a record of problems, just a count. repair will reprocess the whole array.
* NOTE: the ReadyNAS runs a repair when a scrub is run.
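Since check only keeps a count, you can read it back afterwards from the mismatch_cnt file in the array’s md sysfs directory. A small sketch (the function takes the md directory as an argument, so the md126 path below is just an example):

```shell
# Print the mismatch count recorded by the last check/repair run on an array.
# Takes the md sysfs directory as an argument; on a real system you would
# pass something like /sys/block/md126/md.
print_mismatch_cnt() {
    md_dir=$1
    if [ -r "$md_dir/mismatch_cnt" ]; then
        cat "$md_dir/mismatch_cnt"
    else
        echo "unknown"   # no such array, or no md sysfs entry
    fi
}

# Example (real system):
# print_mismatch_cnt /sys/block/md126/md
```

A nonzero count after a check means repair would have rewritten something; zero means the parity was consistent.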
(II) Resync: means a drive fell out of the array and you’re putting it back in (or you put in a new drive), and it’s being resynced back into the array.
Here is how to start a scrub:
echo repair > /sys/block/md126/md/sync_action
echo repair > /sys/block/md127/md/sync_action
This will start a scrub on each array.
NOTE: you can also run “echo check > /sys/block/md126/md/sync_action”. For more info on the difference between check and repair: https://raid.wiki.kernel.org/index.php/RAID_Administration
From “mdadm -D” and “cat /proc/mdstat” it will look like a resync, even though it’s doing a repair/check/scrub. To find out what kind of operation is being done, just run “cat /sys/block/md126/md/sync_action” and it will say repair (meaning scrub) (if you’re running a check, it will say check).
In other words: a resync entry that shows up in “mdadm -D /dev/md126” or “cat /proc/mdstat” could mean that you’re adding back in a drive that failed, but it can also mean a scrub. To find out which, it’s best to cat sync_action.
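A minimal sketch of that check as a shell function (the md sysfs directory is passed in as an argument; on a real system you’d pass something like /sys/block/md126/md — the array name is just an example):

```shell
# Report what kind of sync operation an array is actually running, since
# /proc/mdstat and "mdadm -D" label scrubs and real resyncs both as "resync".
# Takes the md sysfs directory as an argument (e.g. /sys/block/md126/md).
what_sync() {
    action=$(cat "$1/sync_action")
    case "$action" in
        check)  echo "scrub (check: read-only, counts mismatches)" ;;
        repair) echo "scrub (repair: rewrites inconsistencies)" ;;
        resync) echo "real resync (a drive is being synced back in)" ;;
        idle)   echo "nothing running" ;;
        *)      echo "other: $action" ;;   # e.g. recover, reshape
    esac
}

# Example (real system):
# what_sync /sys/block/md126/md
```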
md126 : active raid5 sdb5 sdd5 sdc5 sda5 sdf5 sde5
4883104640 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
[>………………..] resync = 0.0% (12688/976620928) finish=1281.2min speed=12688K/sec <- it’s not running a resync, even though it says it is (it’s running a scrub / check)
md127 : active raid5 sda3 sdf3 sde3 sdc3 sdd3 sdb3
4859553920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
resync=DELAYED <- this means when md126 is done or cancelled this will start
Once md126 is done (or cancelled with echo idle), then md127 scrub begins.
So if, mid-scrub, I cancel md126 with
echo idle > /sys/block/md126/md/sync_action
it will start scrubbing md127. If I want all scrubbing to end, I would then also have to run:
echo idle > /sys/block/md127/md/sync_action
NOTE: it doesn’t hurt to stop a scrub (repair, check) mid-operation. However, stopping a drive resync mid-operation leaves that drive not fully added into the array, so that’s not advisable.
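Given that, a cautious way to stop things is to look at sync_action first and only write idle if a scrub (check/repair) is running — a sketch, with the md sysfs directory passed in as an argument (on a real system, something like /sys/block/md127/md):

```shell
# Stop a scrub (check/repair) but refuse to interrupt a real resync,
# since interrupting a resync leaves the drive not fully added.
# Takes the md sysfs directory as an argument (e.g. /sys/block/md127/md).
stop_scrub_only() {
    md_dir=$1
    action=$(cat "$md_dir/sync_action")
    case "$action" in
        check|repair)
            echo idle > "$md_dir/sync_action"
            echo "stopped $action" ;;
        resync)
            echo "refusing: a real resync is running" ;;
        *)
            echo "nothing to stop ($action)" ;;
    esac
}
```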
Here is what the mdadm -D output looks like (this is when it was scrubbing md127):
mdadm -D /dev/md127
Version : 1.2
Creation Time : Fri Aug 8 18:37:41 2014
Raid Level : raid5
Array Size : 4859553920 (4634.43 GiB 4976.18 GB)
Used Dev Size : 971910784 (926.89 GiB 995.24 GB)
Raid Devices : 6
Total Devices : 6
Persistence : Superblock is persistent
Update Time : Mon Feb 1 00:26:42 2016
State : clean, resyncing
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Resync Status : 2% complete <- I’m trying to illustrate that this isn’t a resync, it’s a scrub (of the repair type)
Name : 33eaacf1:data-0 (local to host 33eaacf1)
UUID : 4baed9cb:dd933d79:d2a1fcc8:9d4f7407
Events : 416901
Number Major Minor RaidDevice State
8 8 3 0 active sync /dev/sda3
6 8 19 1 active sync /dev/sdb3
10 8 51 2 active sync /dev/sdd3
9 8 35 3 active sync /dev/sdc3
11 8 67 4 active sync /dev/sde3
7 8 83 5 active sync /dev/sdf3
NOTE: mdadm only runs one resync/scrub/repair/check (or reshape) at a time, so as not to hurt performance. When it’s done with one, either because it finished or because it was cancelled (with an echo idle), it moves on to the next operation.
SIDENOTE: To start a resync, you would first have to have a failed & removed drive, and then add a drive:
mdadm --fail /dev/md126 /dev/sda3
mdadm --remove /dev/md126 /dev/sda3
<replace drive sda and put in another drive; it shows up as sdz. Partition it to have the same size partition as sda3 was. This can be done by copying the partition table from sda, or any other drive in the array that has the same partition table as sda, to the new sdz, so our new sdz has an sdz3.>
mdadm --add /dev/md126 /dev/sdz3
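The sequence above can be sketched as a dry run that only prints the commands (the device names md126/sda/sdb/sdz are just examples; sfdisk -d dumps a partition table, and piping it into sfdisk writes it, which is one way to do the partition-copy step):

```shell
# Print the drive-replacement command sequence as a dry run; nothing is
# executed, so you can review it before running the commands for real.
# Arguments: array, failed drive, a surviving drive with the same partition
# layout, and the new drive (all device names here are examples).
replace_drive_plan() {
    array=$1; failed=$2; survivor=$3; new=$4
    echo "mdadm --fail $array /dev/${failed}3"
    echo "mdadm --remove $array /dev/${failed}3"
    # Copy the partition table from a surviving drive onto the new drive,
    # so the new drive ends up with a matching partition 3:
    echo "sfdisk -d /dev/$survivor | sfdisk /dev/$new"
    echo "mdadm --add $array /dev/${new}3"
}

replace_drive_plan /dev/md126 sda sdb sdz
```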
To summarize: output from “cat /proc/mdstat” and “mdadm -D /dev/mdX” mentions resync when you could either be repairing your raid with a scrub or repairing your raid by adding a failed drive back in (2 very different operations). So “Resync” could mean a Scrub (I) or a Repair (II). There are 2 forms of scrub: repair & check.
I guess another way to look at it is this:
resync is an umbrella term that can mean a drive being synced in, and also a scrub (where there are 2 types of scrub: check and repair). And an idle command stops them all (I’m not sure if idle will stop a drive being added in; I’m pretty sure it will, but you don’t want to stop a drive being synced in anyhow. Well, I can think of a couple of reasons you might want to stop a drive from being synced in. Example: imagine a raid made up of sda, sdb, sdc, sdd. sdc failed and fell out of the array, so now you have sda, sdb, missing, sdd. You added in a drive [sdh] because a drive [sdc] failed and was removed from the raid. The resync started, but you quickly realized you have another bad drive [sdd], and the resync process will cause the system to generate errors because of that other bad drive [sdd] which is still in the array — resyncs are very read-heavy on all of the present drives [sda, sdb, sdd] and write-heavy on the new drive [sdh]. In this case I’d stop the drive [sdh] being synced in, clone the bad drive [sdd] to a new drive [sdx, a clone of bad drive sdd], and put it [sdx] in place of [sdd]. Once that bad-drive [sdd] situation is fixed [the raid should look like sda, sdb, missing, sdx], I’d resync the original drive [sdh] in place of the failed one [sdc]. When all is done you will have sda, sdb, sdh, sdx.)