When parse_cluster_confirm_arg returns 0, it means the
arg was parsed successfully, so change !rv to rv.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
If someone has an IMSM array, and disables RAID in the BIOS
and uses the devices for some other purpose, then they really don't
want mdadm to start syncing the array.
So don't assemble if OROM doesn't confirm it is OK.
There can still be problems for crash-dump not being able to find
the OROM. Some explicit work-around might be needed for that
rather than a more general workaround that can corrupt data.
Signed-off-by: NeilBrown <neilb@suse.com>
The final completion of a recovery can be delayed, so use
sync_completed to check whether it has finished but just not been reaped.
Signed-off-by: NeilBrown <neilb@suse.com>
Sometimes the removed device is re-added before the writes
get all the way to the md device - so the array doesn't need
any recovery and the test fails.
So flush first to be safe.
Signed-off-by: NeilBrown <neilb@suse.com>
Newer versions of mkfs.extX ask before creating a filesystem
on a device which appears to already have a filesystem.
We don't want that, so add the -F flag.
Also be explicit about fs type as one shouldn't depend on defaults.
Signed-off-by: NeilBrown <neilb@suse.com>
If the name in the array has a home-host, then
require that it matches, or is "any", or requested
homehost is "any".
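
The rule above can be sketched as a small predicate. This is only an
illustration, not mdadm's code: the helper name homehost_acceptable is
hypothetical, and it assumes the home-host is stored as a "host:" prefix
of the array name.

```c
#include <string.h>

/*
 * Hypothetical helper (not taken from mdadm): decide whether an array
 * name is acceptable for the requested homehost.
 * Assumption: a home-host, if present, is the part before the first ':'.
 * Returns 1 if acceptable, 0 otherwise.
 */
int homehost_acceptable(const char *array_name, const char *homehost)
{
	const char *colon = strchr(array_name, ':');
	size_t len;

	if (colon == NULL)
		return 1;	/* the name carries no home-host at all */
	if (homehost == NULL)
		return 0;	/* assumption: nothing to compare against */
	if (strcmp(homehost, "any") == 0)
		return 1;	/* requested homehost is "any" */

	len = (size_t)(colon - array_name);
	if (len == 3 && strncmp(array_name, "any", 3) == 0)
		return 1;	/* home-host in the name is "any" */

	/* otherwise the two must match exactly */
	return strlen(homehost) == len &&
	       strncmp(array_name, homehost, len) == 0;
}
```

With this sketch, "alpha:0" is accepted for homehost "alpha" or "any",
and rejected for homehost "beta".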
Signed-off-by: NeilBrown <neilb@suse.com>
... rather than relying on the caller getting them in the
correct order.
This is better engineering and fixes a bug, but because the
failed_slotX numbers are used later with assumption that
they weren't swapped
Signed-off-by: NeilBrown <neilb@suse.de>
- document meaning of various arrays. In particular:
    stripes[]
    blocks[]
    blocks_page[]
    block_index_for_slot[]
  It needs to be clear if these are indexed by raid_disk
  number or syndrome number.
- changed meaning of block_index_for_slot[]. It didn't seem
  to be used consistently. It also made use of the block numbers
  in array data ordering, which is not directly relevant for syndrome
  calculations.
- reduced number of args to autorepair and manual_repair.
  They don't need both stripes[] and blocks[]. And they don't need
  diskP or diskQ.
  blocks[-1] is the P chunk, blocks[-2] is the Q chunk.
  block_index_for_slot[] can be used to find the target device for
  a particular syndrome block.
- remove stripe locking from within manual_repair, and instead
  use the global stripe locking used for check and autorepair.
- this necessitated changes to raid6_datap_recov and raid6_2data_recov
  so the P and Q blocks could be before or after the data blocks.
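
One way to make blocks[-1] and blocks[-2] valid is to allocate two
extra slots and hand out a pointer offset by two. The sketch below only
illustrates that indexing trick; alloc_blocks and free_blocks are
hypothetical names, and placing the data chunks at index 0 and up is an
assumption.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: allocate a table of data_disks + 2 chunk
 * pointers and return a pointer to the third slot, so that
 *   blocks[-2] can hold the Q chunk,
 *   blocks[-1] can hold the P chunk,
 *   blocks[0 .. data_disks-1] can hold the data chunks (assumed).
 * All slots start out as NULL. Returns NULL on allocation failure.
 */
char **alloc_blocks(int data_disks)
{
	char **storage = calloc((size_t)data_disks + 2, sizeof(*storage));

	if (storage == NULL)
		return NULL;
	return storage + 2;
}

/* Undo the offset before handing the table back to free(). */
void free_blocks(char **blocks)
{
	if (blocks != NULL)
		free(blocks - 2);
}
```

A function given only blocks can then reach P and Q without separate
diskP or diskQ arguments.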
Signed-off-by: NeilBrown <neilb@suse.de>
Earlier patch:
56fcbcbb6f
calculated the proper chunk size - but didn't use it.
Let's actually use it this time.
Signed-off-by: NeilBrown <neilb@suse.com>
The order of devices used for the syndrome calculation is not
the same as the order of data in the array.
The D block immediately after Q is first, then they continue
cyclically in raid-disk order, skipping over the P disk if it is seen.
This gets the 'check' right for all layouts other than DDF, which is
quite different.
I haven't confirmed that this doesn't break repair.
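
The ordering described above (non-DDF layouts only) can be sketched as
follows. The function name syndrome_order and its arguments are
hypothetical; this is not the raid6check code itself.

```c
/*
 * Hypothetical helper: fill order[] with the raid-disk numbers of the
 * data blocks in syndrome order for one stripe.
 * Start with the disk immediately after Q, continue cyclically in
 * raid-disk order, and skip P when it is seen. Q itself is never
 * visited because the walk stops one step before wrapping back to it.
 * order[] must have room for raid_disks - 2 entries.
 * Returns the number of entries written.
 */
int syndrome_order(int raid_disks, int p_disk, int q_disk, int *order)
{
	int count = 0;
	int i;

	for (i = 1; i < raid_disks; i++) {
		int disk = (q_disk + i) % raid_disks;

		if (disk == p_disk)
			continue;	/* P is not a data block */
		order[count++] = disk;
	}
	return count;
}
```

For 5 devices with P on disk 4 and Q on disk 2, this yields the order
3, 0, 1.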
Signed-off-by: NeilBrown <neilb@suse.de>
revert-inplace would sometimes find that the original reshape had
finished.
So slow down the reshaping during --stop (which needs to be a little
bit fast so that stop doesn't timeout waiting) and don't wait quite
so long before stopping.
Signed-off-by: NeilBrown <neilb@suse.de>
This checks that raid6check finds no errors in a newly created array
with each of the different layouts.
(it doesn't...)
Signed-off-by: NeilBrown <neilb@suse.de>
If --save-logs is given we already save all logs to --logdir.
If not, we should still save erroneous logs to --logdir.
Signed-off-by: NeilBrown <neilb@suse.com>
Some actions only appear in /proc/mdstat after a little delay,
so check in sync_action as well.
This applies when checking for recovery etc, and when waiting for idle.
Signed-off-by: NeilBrown <neilb@suse.de>
If the array is reshaping to more devices, then stopping
during that initial critical section is a bad idea.
So check for it and wait a bit.
Should probably handle final critical section of a reduction
too.
Same-size reshape should be handled correctly already.
Signed-off-by: NeilBrown <neilb@suse.de>
A race can allow 'completed' to read as 2^63-1, which takes
a long time to count up to.
So guard against that possibility.
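
A guard of this kind might look like the sketch below. The name
completed_is_plausible is hypothetical, and comparing against the total
is an assumption about how to detect the bogus value; the actual check
in the patch may differ.

```c
/*
 * Hypothetical guard: accept a 'completed' reading only if it does not
 * exceed the total amount being synced. A bogus reading of 2^63-1
 * fails this test, so the caller can re-read instead of waiting for
 * progress to "count up" to it.
 * Returns 1 if the reading is plausible, 0 otherwise.
 */
int completed_is_plausible(unsigned long long completed,
			   unsigned long long total)
{
	return completed <= total;
}
```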
Signed-off-by: NeilBrown <neilb@suse.com>
If Wait() finds the array resync is 'frozen', then wait
a little while to avoid races, but don't wait forever.
Signed-off-by: NeilBrown <neilb@suse.com>
If a read fills the whole buffer, then we possibly
missed something off the end, and we definitely shouldn't
put a '\0' beyond the end, so just return an error.
This should never happen anyway.
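
A minimal sketch of the check, assuming a plain read() into a
caller-supplied buffer. The helper name read_text is hypothetical and
is not the function changed by this patch.

```c
#include <unistd.h>

/*
 * Hypothetical helper: read text from fd into buf (capacity len) and
 * NUL-terminate it.
 * If the read fails, or fills the whole buffer, return -1: in the
 * second case there is no room for the '\0' and data may have been
 * missed off the end.
 * Otherwise return the number of bytes read.
 */
int read_text(int fd, char *buf, size_t len)
{
	ssize_t n = read(fd, buf, len);

	if (n < 0 || (size_t)n >= len)
		return -1;
	buf[n] = '\0';
	return (int)n;
}
```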
Signed-off-by: NeilBrown <neilb@suse.com>
A 'devnm' never starts with '/', so this test is pointless.
The code should use the passed-in devname unless it is clearly
not usable. So fix it to do that.
Signed-off-by: NeilBrown <neilb@suse.de>
These both have the same value, and have done since the
'devnm' concept was introduced.
So discard the pointless duplicate.
Signed-off-by: NeilBrown <neilb@suse.de>
We can use the newly added calc_bitmap_size function to remove some
redundant lines.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This extends the nodes option to assemble mode, so that the number of
cluster nodes can be changed by the user.
Before changing it, it is necessary to ensure there is enough space
for those nodes, so calc_bitmap_size is introduced to calculate
the bitmap size of each node.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
To support changing the cluster name, this commit does the following:
1. extend the original write_bitmap function for the new scenario.
2. add the scenario to handle modification of the cluster's name
in write_bitmap1.
3. let the cluster name also show in examine_super1 and detail_super1.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
We want the clustered devices to be started exclusively by a cluster
resource-agent. So, avoid starting using the incremental option.
This also skips a clustered md from starting during boot in inactive mode.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This adds the ability to convert a regular md without bitmap
(--bitmap=none) to a clustered device (--bitmap=clustered).
To convert a device with --bitmap=internal or --bitmap=external,
you have to convert to --bitmap=none and then re-execute the
command with --bitmap=clustered.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
A clustered disk is added by the traditional --add sequence.
However, other nodes need to acknowledge that they can "see"
the device. This is done by --cluster-confirm:
--cluster-confirm SLOTNUM:/dev/whatever (if disk is found)
or
--cluster-confirm SLOTNUM:missing (if disk is not found)
The node initiating the --add has the disk state tagged with
MD_DISK_CLUSTER_ADD, and the one confirming tags the disk with
MD_DISK_CANDIDATE.
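
Splitting a SLOTNUM:device argument could be sketched like this. The
name parse_confirm_arg and its signature are hypothetical and are not
copied from mdadm; the sketch follows the convention that 0 means the
argument was parsed successfully.

```c
#include <stdlib.h>

/*
 * Hypothetical parser for "SLOTNUM:/dev/whatever" or "SLOTNUM:missing".
 * On success, store the slot number in *slot, point *devname at the
 * text after the ':' (inside the input string), and return 0.
 * Return -1 if there is no leading number, no ':' after it, or the
 * number is negative.
 */
int parse_confirm_arg(char *input, int *slot, char **devname)
{
	char *end;
	long value = strtol(input, &end, 10);

	if (end == input || *end != ':' || value < 0)
		return -1;
	*slot = (int)value;
	*devname = end + 1;
	return 0;
}
```

A caller following this convention treats any non-zero return value as
a parse failure.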
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This adds the capability of examining bitmaps corresponding to all
nodes/slots on the device.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
The home-cluster is stored in the bitmap super block of the
array. The device can be assembled on a cluster whose
cluster name is the same as the one recorded in the bitmap.
If home-cluster is not specified, it is auto-detected by using
dlopen on the corosync cmap library.
neilb: allow code to compile when corosync-devel is not installed.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Specifies the maximum number of nodes in the cluster that may use
this device simultaneously. This is equivalent to the number of
bitmaps created in the internal superblock (patches to follow).
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
For a clustered MD, create as many bitmaps as there are nodes so
each node has an independent bitmap.
Only the first bitmap has the bits set so that the first node
that assembles the device also performs the sync.
The bitmaps are aligned to 4k boundaries.
On-disk format:
0                     4k                    8k                    12k
-------------------------------------------------------------------
| idle                | md super            | bm super[0] + bits  |
| bm bits[0, contd]   | bm super[1] + bits  | bm bits[1, contd]   |
| bm super[2] + bits  | bm bits[2, contd]   | bm super[3] + bits  |
| bm bits[3, contd]   |                     |                     |
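
The placement of each node's bitmap can be sketched as below. This is
an illustration under stated assumptions, not the code in this patch:
the function names are hypothetical, the bitmap superblock is assumed
to be 256 bytes, and the on-disk bitmap is assumed to use one bit per
bitmap chunk. Only the 4k alignment is taken from the text above.

```c
#include <stdint.h>

#define BITMAP_ALIGN       4096ULL	/* alignment stated above */
#define BITMAP_SUPER_BYTES 256ULL	/* assumed superblock size */

/*
 * Bytes needed for one node's bitmap (superblock plus bits), rounded
 * up to the 4k boundary. chunk_sectors must be non-zero.
 */
uint64_t node_bitmap_bytes(uint64_t sync_sectors, uint64_t chunk_sectors)
{
	uint64_t bits = (sync_sectors + chunk_sectors - 1) / chunk_sectors;
	uint64_t bytes = BITMAP_SUPER_BYTES + (bits + 7) / 8;

	return (bytes + BITMAP_ALIGN - 1) / BITMAP_ALIGN * BITMAP_ALIGN;
}

/*
 * Byte offset of a node's bitmap superblock, measured from the start
 * of the first bitmap (8k in the diagram above).
 */
uint64_t node_bitmap_offset(unsigned int node, uint64_t sync_sectors,
			    uint64_t chunk_sectors)
{
	return (uint64_t)node * node_bitmap_bytes(sync_sectors, chunk_sectors);
}
```

With 40000 bitmap chunks, each node needs 256 + 5000 bytes, which
rounds up to 8k, so node 1 starts 8k after node 0 as in the diagram.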
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>