Wait(): stat2devnm() returns NULL for non block devices. Check the
pointer is valid derefencing it. This can happen when using --wait,
such as the 'f' and 'd' file type, causing a core dump.
such as: ./mdadm --wait /dev/md/
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
mdmon.c: abort_reshape() has implemented in Grow.c,
this function doesn't make a lot of sense here.
Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
In Detail.c the buffer path in function Detail is defined as path[200],
in fact the max lenth of content which needs to write to the buffer is
287. Because the length of dname of struct dirent is 255.
During building it reports error:
error: ‘%s’ directive writing up to 255 bytes into a region of size 189
[-Werror=format-overflow=]
In function examine_super0 there is a buffer nb with length 5.
But it need to show a int type argument. The lenght of max
number of int is 10. So the buffer length should be 11.
In human_size function the length of buf is 30. During building
there is a error:
output between 20 and 47 bytes into a destination of size 30.
Change the length to 47.
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
There are many errors like 'error: this statement may fall through'.
But the logic is right. So add the flag Wimplicit-fallthrough=0
to disable the error messages. The method I use is from
https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
#index-Wimplicit-fallthrough-375
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
mdadm:Both clustered and internal array don't need
to specify --bitmap when assembling array.
Signed-off-by: Zhilong Liu <zlliu@suse.com>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
In build and create mode:
--symlinks
Auto creation of symlinks in /dev to /dev/md, option --symlinks
must be 'no' or 'yes' and work with --create and --build.
In assemble mode:
--symlinks
See this option under Create and Build options.
Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Michael Shigorin reports that the 'lcc' compiler isn't able
to deduce that 'st' must be initialized in
if (c->SparcAdjust)
st->ss->update_super(st, NULL, "sparc2.2",
just because the only times it isn't initialised, 'err' is set non-zero.
This results in a 'possibly uninitialised' warning.
While there is no bug in the code, this does suggest that maybe
the code could be made more obviously correct.
So this patch:
1/ moves the "err" variable inside the for loop, so an error in
one device doesn't stop the other devices from being processed
2/ calls 'continue' early if the device cannot be opened, so that
a level of indent can be removed, and so that it is clear that
'st' is always initialised before being used
3/ frees 'st' if an error occured in load_super or load_container.
Reported-by: Michael Shigorin <mike@altlinux.org>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
It doesn't make sense to write_bitmap with less than 2 nodes,
in order to avoid 'write_bitmap' received invalid nodes number,
it would be better to do checking nodes in getopt operations.
Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
mdadm assumed that a pathname started with a "/", while an array
name didn't. This alters the logic so that if the first character
is not a "/" it tries to open an array, and if that fails it drops
through to the pathname code rather than terminating immediately
with an error.
Signed-off-by: Wol <anthony@youngman.org.uk>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
If user tries to migrate from raid0 to raid5 and there is no spare
drive to perform it - mdadm will exit with errorcode, but
no error message is printed.
Print error instead of debug message when this condition occurs,
so user is informed why requested migration is not started.
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Updated the static version in the release, but forgot to fix the
Makefile generated version when extracting from git
Reported-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Number of blocks used to calculate array size is based on 512 block size
so the size displayed is incorrect for arrays with 4k disks.
Signed-off-by: Maksymilian Kunt <maksymilian.kunt@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
If it can't connect monitor, now the error message is just
Error waiting for xxx to be clean. Add detail error message
in connect_monitor.
Suggested-by: Oleg Samarin <osamarin68@gmail.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
OROM defines maximum number of arrays supported. On array creation mdadm
checks if number of arrays doesn't exceed that limit, however it is not
calculated correctly for VMD now.
The current code performs a lookup of HBA using the id. VMD HBAs have
the same id so each lookup returns the same structure (first
encountered). Take a different approach for VMD HBAs. As id is not
unique and cannot be used for lookups, iterate over all VMD HBAs and
compare both id and HBA path.
Signed-off-by: Alexey Obitotskiy <aleksey.obitotskiy@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
When VMD is enabled but no drive is attached to the PCIe port, mdadm
crashes trying to parse the path. Skip entry if valid path has not been
returned. Do it early to avoid unnecessary memory allocation.
Signed-off-by: Alexey Obitotskiy <aleksey.obitotskiy@intel.com>
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Prior to this patch there was an error during compiling
on 32-bit arch. This patch fixes this issue.
Reported-by: Thomas Backlund <tmb@mageia.org>
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Enable bad block support for imsm metadata as commit e522751d605d
("seq_file: reset iterator to first record for zero offset") has been
accepted in upstream kernel. Prior to that patch mdmon had not been able
to read bad blocks sysfs file.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
This patch prevents mdadm from updating metadata if migration is
not possible. The same check is done in analyse_change(),
but in that place - metadata is already modified.
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
get_component_size() still assumes that all array are
/sys/block/md%d or /sys/block/md_d%d
and so doesn't work with e.g. /sys/block/md_foo.
This cause "mdadm --detail" to report
Used Dev Size : unknown
and causes problems when added spares and in other circumstances.
So change it to use stat2devnm() which does the right thing with all
types of array names.
Reported-and-tested-by: Robert LeBlanc <robert@leblancnet.us>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
This test cases checks data integrity of raid5 write back cache
under various scenarios:
degraded mode, non-overwrite, raid-5/6.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
For 4K disks 'endofpart' is an index of the last 4K sector used by partition.
mdadm is using number of 512-byte sectors, so value returned by
get_last_partition_end must be multiplied by 8 for devices with 4K sectors.
Also, unused 'ret' variable has been removed.
Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
mdadm is using invalid byte-offset while reading GPT header to get
partition info (size, first sector, last sector etc.). Now this offset
is hardcoded to 512 bytes and it is not valid for disks with sector
size different than 512 bytes because MBR and GPT headers are aligned
to LBA, so valid offset for 4k drives is 4096 bytes.
Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
IMSM doesn't set 'events' field with generation number, so sometimes mdadm
tries to re-assembly container using metadata which isn't most recent (e. g.
from spare disk).
Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
This patch adds checking if platform (preOS) supports
non-Intel NVMe drives under VMD domain,
and - if so - allow creating IMSM Raid Volume
with those drives.
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
There is no need to request write access when opening
the md device, as we never write to it, and none of the
ioctls we use require write access.
If we do open with write access, then when we close, udev notices that
the device was closed after being open for write access, and it
generates a CHANGE event.
This is generally unwanted, and particularly problematic when mdadm is
trying to --stop the array, as the CHANGE event can cause the array to
be re-opened before it completely closed, which results in a new mddev
being allocated.
So just use O_RDONLY instead of O_RDWR.
Reported-by: Marc Smith <marc.smith@mcc.edu>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Implement "--examine-badblocks" command to provide list of bad blocks in
metadata for a disk.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Provide list of bad blocks using memory allocated in advance so it's
safe to call it from monitor.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
If a disk fails or goes missing, clear the bad blocks associated with it
from metadata. If necessary, update disk ordinals.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Check for a duplicate first or try to merge it with existing bad block.
If block range exceeds BBM_LOG_MAX_LBA_ENTRY_VAL (256) blocks, it must
be split into multiple ranges. Fail if maximum number of bad blocks has
been already reached.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
On create set bad block support flag for each drive. On assmble also
provide a list of known bad blocks. Bad blocks are stored in metadata
per disk so they have to be checked against volume boundaries
beforehand.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Pre-allocate memory for largest possible bad block section when monitor
is being opened to avoid a need for memory allocation on metadata sync.
If memory for a structure has been allocated in mpb buffer but it hasn't
been used yet, it will be taken by next buffer grow, leading to
insufficient memory on metadata flush. Start tracking such memory and
take it into calculation when growing a buffer. Also assert has been
added to debug mode to warn when more metadata has been written than
memory allocated.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Always allocate memory for all log entries to avoid a need for memory
allocation when monitor requests to record a bad block.
Also some extra checks added to make static code analyzer happy.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
We currently use '1' to indicate that a flag (writemostly or failfast)
needs to be set, and '2' to indicate that it needs to be cleared.
Using magic number like this is not a best-practice.
So replaced them with values from a enum.
No functional change.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
If an update of acknowledged bad blocks file is notified, read entire
bad block list from sysfs file and compare it against local list of bad
blocks. If any obsolete entries are found, remove them from metadata.
As mdmon cannot perform any memory allocation, new superswitch method
get_bad_blocks is expected to return a list of bad blocks in metadata
without allocating memory. It's up to metadata handler to allocate all
required memory in advance.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
If md has changed the state to 'blocked' and metadata handler supports
bad blocks, try process them first. If metadata handler has successfully
stored bad block, acknowledge it to md via 'badblocks' sysfs file. If
metadata handler has failed to store the new bad block (ie. lack of
space), remove bad block support for a disk by writing "-external_bbl"
to state sysfs file. If all bad blocks have been acknowledged, request
to unblock the array.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Acked-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Open 'badblocks' and 'unacknowledged_bad_blocks' sysfs files for each
disk in the array. Add them to the list of files observed by monitor.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
If metadata handler provides support for bad blocks, tell md by writing
'external_bbl' to rdev state file (both on create and assemble),
followed by a list of known bad blocks written via sysfs 'bad_blocks'
file.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
This patch adds updataing num_data_stripes during reshape.
Previously this field once set during creation was never updated.
Also, num_data_strips value multipied by chunk_size is used
for set proper component size for RAID5.
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Maksymilian Kunt <maksymilian.kunt@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Allow per-device "failfast" flag to be set when creating an
array or adding devices to an array.
When re-adding a device which had the failfast flag, it can be removed
using --nofailfast.
failfast status is printed in --detail and --examine output.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Bad block support has incremented sysfs disk state reported by kernel
("external_bbl") so it became longer than 20 bytes. It causes reshape to
fail as it reads truncated entry from sysfs.
Increase buffer so it can accommodate the string including all state
values currently implemented in kernel at the same time.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
'unacknowledged_bad_blocks' is a long name for sysfs property and it
makes sysfs path over 50 characters long. Increase buffer to the double
length of the longest path available in sysfs at the moment.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Convert general migration record for 4Kn drives prior to write and post
read. Calculate record location based on sector size, don't just assume
it's 512. Assure buffer address is aligned to 4096 so write operation
avoids caching.
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
This patch adds support for drives with 4Kn sector size
for IMSM metadata. Mixing member drives with 4kn and 512
is not allowed. Some offsets were aligned with sector size.
Internal metadata representation and all calculations
are still based on 512-byte sector sizes. This
implementation converts only sector based values
when reading/writing to drive, because they needs to be
stored in metadata according to accual member drive sector size.
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
This patch adds retriving device sector size at startup
and set it in intel_super, so it can be used in other places.
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>