IMSM does more than comparing metadata and errors reported directly
from compare_super_imsm can be useful.
Add verbose flag to compare_super method and make all not critical
error printing configurable.
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
With kernel commit 480523feae58 ("md: only call set_in_sync() when it
is expected to succeed."), mddev->in_sync in clustered array is always
zero. It makes metadata resync_offset to always zero.
When assembling a clusterd array with "-U no-bitmap" option, kernel
md layer "mddev->resync_offset == 0" and "mddev->bitmap == NULL" will
trigger raid1 do sync on every bitmap chunk. the sync action is useless,
we should avoid it.
Related kernel flow:
```
md_do_sync
mddev->pers->sync_request
raid1_sync_request
md_bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, 1)
__bitmap_start_sync(bitmap, offset,&blocks1, degraded)
if (bitmap == NULL) {/* FIXME or bitmap set as 'failed' */
*blocks = 1024;
return 1; /* always resync if no bitmap */
}
```
Reprodusible steps:
```
node1 # mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sd{a,b}
node1 # mdadm -Ss
(in another shell, executing & watching: watch -n 1 'cat /proc/mdstat')
node1 # mdadm -A -U no-bitmap /dev/md0 /dev/sd{a,b}
```
Signed-off-by: Zhao Heming <heming.zhao@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
I've seen a case where the dev_roles list of a linear array
was corrupt. ->max_dev was > 128 and > raid_disks, and the
extra slots were '0', not 0xFFFE or 0xFFFF.
This caused problems when a 128th device was added.
So:
1/ make Grow_Add_device more robust so that if numbers
look wrong, it fails-safe.
2/ make examine_super1() report details if the dev_roles
array is corrupt.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
Now it only adds bitmap offset based on cluster nodes. It's not right. It needs to
add per node bitmap space to find next node bitmap position.
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
Once the RAID0 layout has been set, the RAID0 array cannot be assembled
on an older kernel which doesn't understand layouts.
This is an intentional safety feature, but sometimes people need the
ability to roll-back to a previously working configuration.
So add "--update=layout-unspecified" to remove RAID0 layout information
from the superblock.
Running "--assemble --update=layout-unspecified" will cause the assembly
the fail when run on a newer kernel, but will allow it to work on
an older kernel.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
For 1.x metadata, when the user requested creation of an array on
component devices that were too small even to hold the superblock,
an undetected integer wraparound (underflow) resulted in an enormous
computed size which resulted in various follow-on errors such as
floating-point exception.
This patch detects this condition, prints a reasonable diagnostic
message, and refuses to continue.
Signed-off-by: David Favro <dfavro@meta-dynamic.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
If you have a RAID0 array with varying sized devices
on a kernel before 5.4, you cannot assembling it on
5.4 or later without explicitly setting the layout.
This is now possible with
--update=layout-original (For 3.13 and earlier kernels)
or
--update=layout-alternate (for 3.14 and later kernels)
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
Since Linux 5.4 a layout is needed for RAID0 arrays with
varying device sizes.
This patch makes the layout of an array visible (via --examine)
and sets the layout on newly created arrays.
--layout=dangerous
can be used to avoid setting a layout so that they array
can be used on older kernels.
Tested-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
Fixes the side effect of the patch b6180160f ("imsm: save current_vol number")
- wrong UUID is printed in detail for each volume.
New parameter "subarray" is added to determine what info should be extracted
from metadata (subarray or container).
The parameter affects only IMSM metadata.
Signed-off-by: Blazej Kucman <blazej.kucman@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
Within the output of "mdadm --examine", there are three sizes reported
on adjacent lines. For example:
$ sudo mdadm --examine /dev/md3
[...]
Avail Dev Size : 17580545024 (8383.06 GiB 9001.24 GB)
Array Size : 17580417024 (16765.99 GiB 18002.35 GB)
Used Dev Size : 11720278016 (5588.66 GiB 6000.78 GB)
[...]
This can be confusing, since the first and third line are in 512-byte
sectors, and the second is in KiB.
Add units to avoid ambiguity.
(I don't particularly like the "KiB" notation, but it is at least
unambiguous.)
Signed-off-by: Corey Hickey <bugfood-c@fatooh.org>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
Find the string length, copy it, and zero out the rest, instead of
relying on strncpy cleaning up for us.
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
Previously, the dlm locking only protects several
functions which writes to superblock (update_super,
add_to_super and store_super), and we missed other
funcs such as add_internal_bitmap. We also need to
call the funcs which read superblock under the
locking protection to avoid consistent issue.
So let's remove the dlm stuffs from super1.c, and
provide the locking mechanism to the main() except
assemble mode which will be handled in next commit.
And since we can identify it is a clustered raid or
not based on check the different conditions of each
mode, so the change should not have effect on native
array.
And we improve the existed locking stuffs as follows:
1. replace ls_unlock with ls_unlock_wait since we
should return when unlock operation is complete.
2. inspired by lvm, let's also try to use the existed
lockspace first before creat a lockspace blindly if
the lockspace not released for some reason.
3. try more times before quit if EAGAIN happened for
locking.
Note: for MANAGE mode, we do not need to get lock if
node just want to confirm device change, otherwise we
can't add a disk to cluster since all nodes are compete
for the lock.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
PPL area should be cleared before creation/force assemble.
If the drive was used in other RAID array, it might contains PPL from it.
There is a risk that mdadm recognizes those PPLs and
refuses to assemble the RAID due to PPL conflict with created
array.
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
Add support for super1 with multiple ppls. Extend ppl area size to 1MB.
Use 1MB as default during creation. Always start array as single ppl -
if kernel is capable of multiple ppls and there is enough space reserved -
it will switch the policy during first metadata update.
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
If no bitmap is present, then the test
if (__le32_to_cpu(bsb->nodes) > 1)
accesses uninitialised memory. So move that test inside
a test for a bitmap being present.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
Currently if metadata requires more then 1M,
data offset will be rounded down to closest MB.
This is not correct, since less then required space is reserved.
Always round data offset up to multiple of 1M.
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
The value of sb->max_dev will always be increased by 1 when adding
a new disk in linear array. It causes an inconsistence between each
disk in the array and the "Array State" value of "mdadm --examine DISK"
is wrong. For example, when adding the first new disk into linear array
it will be:
Array State : RAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
('A' == active, '.' == missing, 'R' == replacing)
Adding the second disk into linear array it will be
Array State : .AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
('A' == active, '.' == missing, 'R' == replacing)
Signed-off-by: Lidong Zhong <lzhong@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
Code is 80 characters wide, so lets try to respect that. In addition, we
should never have one-line 'if () action()' statements. Fixup various
whitespace abuse.
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
mdassemble doesn't handle container based arrays, no support for sysfs,
etc. It has not been actively maintained for years, so time to send it
off to retirement.
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
Some hard-coded values for disk status are replaced
with bit definitions.
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
This can be used with --assemble for super1 and with --update-subarray
for imsm to enable or disable PPL in the metadata.
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Enable creating and assembling raid5 arrays with PPL for 1.x metadata.
When creating, reserve enough space for PPL and store its size and
location in the superblock and set MD_FEATURE_PPL bit. Write an initial
empty header in the PPL area on each device. PPL is stored in the
metadata region reserved for internal write-intent bitmap, so don't
allow using bitmap and PPL together.
While at it, fix two endianness issues in write_empty_r5l_meta_block()
and write_init_super1().
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Show the currently enabled consistency policy in the output from
--detail. Add 3 spaces to all existing items in Detail output to align
with "Consistency Policy : ".
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Add a new parameter to mdadm: --consistency-policy=. It determines how
the array maintains consistency in case of unexpected shutdown. This
maps to the md sysfs attribute 'consistency_policy'. It can be used to
create a raid5 array using PPL. Add the necessary plumbing to pass this
option to metadata handlers. The write journal and bitmap
functionalities are treated as different policies, which are implicitly
selected when using --write-journal or --bitmap options.
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
There is corner case for setting device role,
if new device has failfast flag.
The failfast flag should be ignored.
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
It doesn't make sense to write_bitmap with less than 2 nodes,
in order to avoid 'write_bitmap' received invalid nodes number,
it would be better to do checking nodes in getopt operations.
Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Allow per-device "failfast" flag to be set when creating an
array or adding devices to an array.
When re-adding a device which had the failfast flag, it can be removed
using --nofailfast.
failfast status is printed in --detail and --examine output.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
This patch introduces the function for getting sector size of
given device (fd).
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Commit f79bbf4f69 ("super1: don't put the bblog at the end of the free
space.") changed the location of the bad block log to be after the
write-intent bitmap, but a fixed offset was used and it can make bbl
overlap with the bitmap, especially when using a small bitmap chunk.
This patch changes it to use the actual offset and size of the bitmap.
It also joins the cases for v1.1 and v1.2 superblock because the code
was very similar.
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Determining internal bitmap size is performed using two different
functions (bitmap_sectors() and calc_bitmap_size()) and in
getinfo_super1() it is calculated in yet another way. Each of these
methods give slightly different results. The most accurate is
calc_bitmap_size() but it also has a rounding issue. So:
- fix the rounding issue in calc_bitmap_size() using bitmap_bits()
- replace usages of bitmap_sectors() and open-coded calculations with
calc_bitmap_size()
- remove bitmap_sectors()
- move bitmap_bits() to mdadm.h as inline - otherwise mdassemble won't
compile (it does not use bitmap.c)
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
For older mdadm version, v1.x metadata has different bitmap_offset,
we can't ensure all the bitmaps are on a 4K boundary since writing
4K for bitmap could corrupt the superblock, and Anthony reported
the bug about it at below link.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=837964
So let's check about the alignment for bitmap_offset before set
the boundary to 4096 unconditionally. Thanks for Neil's detailed
explanation.
Reported-by: Anthony DeRobertis <anthony@derobert.net>
Fixes: 95a05b37e8 ("Create n bitmaps for clustered mode")
Cc: Neil Brown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
struct mdp_superblock_1.set_name is 32B long, but struct mdinfo.name
is 33B long. So we need strncpy instead strcpy to avoid buffer
overflow.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
This gets rid of 5 nearly identical copies of the same code, and
reduces the binary size of mdadm by over 700 bytes on x86_64.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
There are some cases which didn't need to check the space
is enough or not for NodeNumUpdate option.
1. for array which does not have clustered bitmap.
2. "--nodes" parameter is 0 (eg, add a disk to clustered raid).
3. if "--nodes" parameter is set to a smaller num than
current bms->nodes.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
add_internal_bitmap() returned 1 on success and 0 on error which is
inconsistent. This changes it to return 0 on success and use more
reasonable error codes on error.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
We missed to check the version is BITMAP_MAJOR_CLUSTERED
or not, otherwise mdadm can't create array with other 1.x
metadatas (1.0 and 1.1).
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
We at least need two nodes for cluster raid so make the
check before update node nums.
Reported-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
load_super1() did not clear memory allocated for the superblock +
bitmap. This causes issues if the superblock does not contain a bitmap
as later checks of bitmap features would rely on the bits being
cleared.
This bug has been around for a long time, but was only exposed in
mdadm-3.4 with the introduction of the clustering code.
Reported-by: Jan Stodola <jstodola@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Actually, we need to use NodeNumUpdate here to
ensure there are enough spaces for those nodes.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
cmap_get_string() used to retrieve cluster_name does not restrict it's
size. To prevent buffer overflows use the size of the destination
buffer, not strlen() of the source, and null terminate the copied
string.
Fixes: 0aa2f15b ("mdadm: add the ability to change cluster name)"
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
For cluster raid, we need to displays bitmap related
contents from different bitmaps which are based on node
num. So bitmap_file_open and locate_bitmap are changed a
little bit for the purpose.
Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Fixes: b98043a2f8 ("Show all bitmaps while examining bitmap")
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
bblog_size is 16bit so using le32_to_cpu on it is not wise
and leads to errors on big-endian machines.
Change all such calls to use le16.
Bug was introduced in mdadm-3.3
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
A simple revert doesn't work here because the reshape_position is
in the critical section.
The best approach is to let the reshape progress a bit and then
go backwards.
If that isn't possible, assembling with --update=revert-reshape and
--invalid-backup should work.
Reported-by-tested-by: George Rapp <george.rapp@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
This code was meant to update 'earliest' but clearly never doesn't.
This bug would only affect an array with a very large bitmap so it is unlikely
to be significant.
Signed-off-by: NeilBrown <neilb@suse.com>
This forcibly removed the bad-block log. There can be situations where it is hard to
remove bad blocks by writing to them - partiularly on RAID5.
Signed-off-by: NeilBrown <neilb@suse.com>
This commit does the following jobs:
1. rename is_clustered to dlm_funs_ready since it match the
function better.
2. st->cluster_name can't be use to identify the raid is a
clustered or not, we should check the bitmap's version to
perform the identification.
3. for cluster_get_dlmlock/cluster_release_dlmlock funcs, both
of them just need the lockid as parameter since the cluster
name can get by get_cluster_name().
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>