Every place where the path for mdadm or mdmon is explicit,
it should use the BINDIR setting, not "/sbin/".
Reported-by: member graysky <graysky@archlinux.us> (https://bugs.archlinux.org/task/37330)
Signed-off-by: NeilBrown <neilb@suse.de>
Having RAMFS_MAGIC defined as 0x858458f6 causes problems when trying
to compare it directly against statfs.f_type being cast from long to
unsigned long.
This hack is extremely ugly, but it should at least do the right thing
for every situation.
Thanks to Arnd Bergmann for suggesting the fix.
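As an illustration only (not necessarily the code in this commit), a
sign-extended f_type can be compared safely by looking at the low 32 bits.
The helper name fs_magic_matches is hypothetical:

```c
#include <stdint.h>

#define RAMFS_MAGIC 0x858458f6

/* Hypothetical helper: compare only the low 32 bits of f_type, so a
 * value that was sign-extended into a wider long still matches. */
static int fs_magic_matches(long f_type, uint32_t magic)
{
	return ((unsigned long)f_type & 0xffffffffUL) == (unsigned long)magic;
}
```

A caller would pass the f_type field filled in by statfs().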
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
As soon as the array is assembled, udev or systemd might run
fsck and mount it. So we need to drop O_EXCL promptly.
Signed-off-by: NeilBrown <neilb@suse.de>
1/ enough_fd doesn't use avail_disks any more, so discard it.
2/ Manage_Add increments 'found' at the wrong place, so it can
waste time before calling enough().
Signed-off-by: NeilBrown <neilb@suse.de>
commit b31df43682
changed load_super_imsm to not insist on finding a partition if
ignore_hw_compat was set.
Unfortunately this is set for '--assemble' so arrays could get
assembled badly.
The comment says this was to allow e.g. --examine of image files.
A better fix for this is to change test_partitions to not report
a regular file as being a partition.
The errors from the BLKPG ioctl are:
ENOTTY : not a block device.
EINVAL : not a whole device (probably a partition)
ENXIO : partition doesn't exist (so not a partition)
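A minimal sketch of how the errno list above could drive the decision.
The function name is hypothetical, and treating any other error as
"partition" is an assumption made for the example:

```c
#include <errno.h>

/* Given the errno left by a failed BLKPG ioctl, decide whether the
 * descriptor should be treated as a partition (1) or not (0). */
static int blkpg_errno_is_partition(int err)
{
	switch (err) {
	case ENOTTY:	/* not a block device, e.g. a regular file */
	case ENXIO:	/* the probed partition does not exist */
		return 0;
	case EINVAL:	/* not a whole device, probably a partition */
	default:	/* assumption: be conservative for anything else */
		return 1;
	}
}
```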
Reported-by: "David F." <df7729@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
The Anaconda installer (via its "loader" program) will try to kill
many processes at shutdown, but not "mdmon".
However when mdadm runs mdmon in the Anaconda environment, mdmon
sets argv[0][0] to '@' resulting in "@dmon" which confuses
"loader".
So change mdadm to set argv[0] to a path so that mdmon becomes e.g.
"@usr/sbin/mdmon"
which "loader" will recognise as being "mdmon".
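The effect on the two forms of argv[0] can be sketched as follows.
mark_argv0 is a hypothetical name used only for this example:

```c
#include <stddef.h>

/* Overwrite the first character of an argv[0] string with '@'.
 * "mdmon" becomes "@dmon"; "/usr/sbin/mdmon" becomes "@usr/sbin/mdmon",
 * which still ends in the full program name. */
static void mark_argv0(char *argv0)
{
	if (argv0 != NULL && argv0[0] != '\0')
		argv0[0] = '@';
}
```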
Reported-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
On some systems, this code caused a "comparison between signed
and unsigned" error.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
An ext[234] filesystem larger than 2TB was being reported with
a negative size - which looks odd.
So fix it to use suitably large and unsigned values.
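A sketch of the arithmetic, assuming the size is derived from a 32-bit
block count and a block size of 1K shifted left by a small amount (as in
the ext2 family superblock). fs_size_kib is a hypothetical helper:

```c
#include <stdint.h>

/* Size in K of a filesystem with 'blocks' blocks of (1K << shift).
 * Widening before the shift keeps the result from wrapping. */
static unsigned long long fs_size_kib(uint32_t blocks, unsigned int shift)
{
	return (unsigned long long)blocks << shift;
}
```

The result would then be printed with an unsigned conversion such as %llu
rather than %d.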
Reported-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: NeilBrown <neilb@suse.de>
When testing we want to run mdmon directly, not use
systemctl to get systemd to run it.
So allow an environment variable to make that choice.
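One possible shape for such a check. The helper name and the convention
that the variable must be set to "1" are assumptions for this sketch; the
variable name is whatever the caller passes in:

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 when the named environment variable is set to exactly "1". */
static int env_flag_set(const char *name)
{
	const char *val = getenv(name);

	return val != NULL && strcmp(val, "1") == 0;
}
```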
Signed-off-by: NeilBrown <neilb@suse.de>
Now that mdmon responds fairly well to SIGTERM, stop lying to
systemd about being started on the initrd.
Note that if mdmon is rerun (--takeover) for some reason, and systemd
chooses to kill processes before remounting / readonly, then the
unmount will hang.
If systemd ever lets us tell it that we don't want to be killed until
root is readonly, then we should do that.
Signed-off-by: NeilBrown <neilb@suse.de>
Currently the extra space to leave before the data in the array
is calculated in two separate places, and they can be inconsistent.
Instead, do it all in validate_geometry. This records the
'data_offset' chosen which all other devices then use.
'write_init_super' now just uses the value rather than doing all the
calculations again.
This results in more consistent numbers.
Also, load_super sets st->data_offset so that it is used by "--add",
so the new device has a data offset matching a pre-existing device.
Signed-off-by: NeilBrown <neilb@suse.de>
There is only one caller of find_free_devnum and it is in mdopen.c.
This removes a dependency between util.c and config.c which allows
us to now drop config.o from mdmon.
Signed-off-by: NeilBrown <neilb@suse.de>
When we create or assemble an array, we wait for udev to create the
device file in /dev so that as soon as mdadm completes, the device can
be used.
This waiting is performed in multiples of 200ms, which can sometimes
be too long to wait.
So change to an exponential backoff. Wait 1, then 2, then 4 msec etc.
Once we get to 256msec, stop backing off and continue waiting 256ms at
a time until we reach the limit which is now 4.608sec rather than 5sec
which it was before.
Ditto for open_dev_excl.
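The backoff schedule described above can be sketched like this. Both
function names, the 'ready' callback, and the max_tries parameter are
hypothetical; the real loop tests for the device node directly:

```c
#include <time.h>

/* Next sleep interval in msec: double until the 256 msec cap. */
static unsigned int next_delay_ms(unsigned int delay)
{
	return delay < 256 ? delay * 2 : 256;
}

/* Poll 'ready' up to max_tries times, sleeping 1, 2, 4, ... msec
 * between attempts. Returns 0 once ready, -1 if we gave up. */
static int wait_with_backoff(int (*ready)(void *), void *arg, int max_tries)
{
	unsigned int delay = 1;
	int i;

	for (i = 0; i < max_tries; i++) {
		struct timespec ts;

		if (ready(arg))
			return 0;
		ts.tv_sec = 0;
		ts.tv_nsec = (long)delay * 1000000L;
		nanosleep(&ts, NULL);
		delay = next_delay_ms(delay);
	}
	return -1;
}
```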
Signed-off-by: NeilBrown <neilb@suse.de>
After a recent git pull 'make raid6check' did not work anymore, as
sysfs_read() was called with a wrong argument and as check_env()
was used by use_udev(), but not defined.
Replace sysfs_read(..., -1, ...) by sysfs_read(..., NULL, ...)
Move check_env() from util.c to lib.c
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Some people want to create truly enormous arrays.
As we sometimes need to hold one file descriptor for each
device, this can hit the NOFILE limit.
So raise the limit if it ever looks like it might be a problem.
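A minimal sketch of raising the limit with getrlimit/setrlimit. The
function name and the slack of 20 extra descriptors are assumptions for
this example:

```c
#include <sys/resource.h>

/* Try to make room for one descriptor per device plus some slack.
 * Returns 0 if the soft limit is (now) high enough, -1 otherwise. */
static int ensure_fd_limit(unsigned int devices)
{
	struct rlimit lim;
	rlim_t want = (rlim_t)devices + 20;	/* slack is an assumption */

	if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
		return -1;
	if (lim.rlim_cur >= want)
		return 0;			/* already enough */
	if (lim.rlim_max < want)
		lim.rlim_max = want;		/* needs privilege to succeed */
	lim.rlim_cur = want;
	return setrlimit(RLIMIT_NOFILE, &lim);
}
```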
Signed-off-by: NeilBrown <neilb@suse.de>
We call systemctl to see if systemd will run mdmon for us.
If it cannot, we run mdmon directly, so we aren't interested
in the error message.
So redirect stderr to /dev/null.
Signed-off-by: NeilBrown <neilb@suse.de>
We widely use a "devnum" which is 0 or +ve for md%d devices
and -ve for md_d%d devices.
But I want to be able to use md_%s device names.
So get rid of devnum (a number) and use devnm (a 32char string).
eg.
md0
md_d2
md_home
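For illustration, mapping an old-style devnum to a devnm string might look
like the following. The function name is hypothetical, and the encoding of
negative numbers (md_d<N> stored as -1-N) is an assumption of this sketch:

```c
#include <stdio.h>

/* Build a device name from an old-style devnum.
 * Assumed encoding: n >= 0 means md<n>; n < 0 means md_d<-1-n>. */
static void devnum_to_devnm(int devnum, char devnm[32])
{
	if (devnum >= 0)
		snprintf(devnm, 32, "md%d", devnum);
	else
		snprintf(devnm, 32, "md_d%d", -1 - devnum);
}
```

Names such as md_home have no numeric equivalent, which is why a string is
needed.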
Signed-off-by: NeilBrown <neilb@suse.de>
If launching mdmon via systemctl fails, we fall back to the old method
of fork/exec. This allows for having mdmon launched via systemctl
which avoids problems with it getting killed by systemd due to it
ending up in the parent's cgroup (udev).
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
We still allow --offroot to be given - for compatibility with scripts
- but ignore it.
The whole point of --offroot is to get systemd to not auto-kill mdmon,
and we always want that.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
map_dev can be slow, and doesn't really provide a better result
than just creating a temporary device.
So discard it and use mknod/open/unlink to open a major:minor device.
Signed-off-by: NeilBrown <neilb@suse.de>
map_dev can be slow so it is best to not call it when
not necessary.
The final test in "find_free_devnum" is not relevant when
udev is being used, so remove the test in that case.
Signed-off-by: NeilBrown <neilb@suse.de>
It is important to check for compatibility with 'platform' or
Option ROM when creating or changing an array. However there is no
real need when simply assembling the array.
On some systems there are situations where the platform information is
not available. e.g. on some UEFI systems, UEFI is not available
during 'kdump' handling. This makes it impossible to assemble
an IMSM array to receive the dump.
So remove the requirement that the platform be visible to assemble
an IMSM array.
Signed-off-by: NeilBrown <neilb@suse.de>
open_container should open a container which contains the device,
but sometimes it would open another volume which contains the
device. Be more careful in 'holder' selection.
Signed-off-by: NeilBrown <neilb@suse.de>
mdadm --create /dev/md0 .... /dev/sda1:1024 /dev/sdb1:2048 ...
The size is in K unless a suffix: K M G is given.
The suffix 's' means sectors.
Signed-off-by: NeilBrown <neilb@suse.de>
We will shortly introduce --data-offset= which is allowed to
be zero. We will want to use parse_size() so it needs to be
able to return '0' without it being an error.
So define INVALID_SECTORS to be an impossible value (currently '1')
and return and test for it consistently.
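A sketch of a parser with these properties, not the actual parse_size().
It assumes 512-byte sectors (so 1K is 2 sectors) and accepts the suffixes
K, M, G and s; the name parse_size_sketch is hypothetical:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define INVALID_SECTORS 1ULL	/* impossible value, as described above */

/* Parse a size and return it in sectors, or INVALID_SECTORS on error.
 * Zero is a valid result. */
static unsigned long long parse_size_sketch(const char *text)
{
	char *end;
	long long n, mult;

	errno = 0;
	n = strtoll(text, &end, 10);
	if (end == text || errno != 0 || n < 0)
		return INVALID_SECTORS;

	switch (*end) {
	case 's':  mult = 1; end++; break;		/* sectors */
	case 'K':  mult = 2; end++; break;
	case '\0': mult = 2; break;			/* no suffix: K */
	case 'M':  mult = 2 * 1024LL; end++; break;
	case 'G':  mult = 2 * 1024LL * 1024LL; end++; break;
	default:
		return INVALID_SECTORS;
	}
	/* Reject trailing junk and values that would overflow. */
	if (*end != '\0' || n > LLONG_MAX / mult)
		return INVALID_SECTORS;
	return (unsigned long long)(n * mult);
}
```

Callers compare the result against INVALID_SECTORS instead of testing for
zero.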
Signed-off-by: NeilBrown <neilb@suse.de>
The 'enough' function is written to work with 'near' arrays only
in that it implicitly assumes that the offset from one 'group' of
devices to the next is the same as the number of copies.
In reality it is the number of 'near' copies.
So change it to make this number explicit.
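A sketch of the check with the step made explicit. The function name and
the separate near_copies/far_copies parameters are assumptions for this
example; 'avail' holds one flag per device slot:

```c
/* Return 1 if every group of 'copies' consecutive slots has at least
 * one available device. Groups start every 'near_copies' slots. */
static int raid10_enough(const char *avail, int raid_disks,
			 int near_copies, int far_copies)
{
	int copies = near_copies * far_copies;
	int first = 0;

	if (raid_disks <= 0 || near_copies <= 0 || far_copies <= 0)
		return 0;
	do {
		int n, cnt = 0, slot = first;

		for (n = 0; n < copies; n++) {
			if (avail[slot])
				cnt++;
			slot = (slot + 1) % raid_disks;
		}
		if (cnt == 0)
			return 0;	/* a whole group is missing */
		/* Step by the number of 'near' copies, not all copies. */
		first = (first + near_copies) % raid_disks;
	} while (first != 0);
	return 1;
}
```

With 4 devices and 2 'far' copies, stepping by 2 would never examine the
groups starting at slots 1 and 3, so a lost group there would go unnoticed.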
Reported-by: Jakub Husák <jakub@gooseman.cz>
Signed-off-by: NeilBrown <neilb@suse.de>
When using human_size_brief, only IEC prefixes were supported. Now
it's possible to specify which format we want to see - either IEC
(kibi, mebi, gibi) or JEDEC (kilo, mega, giga).
Signed-off-by: Maciej Naruszewicz <maciej.naruszewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
It would be better if two size-calculating methods had the same
calculating algorithm. The human_size way of calculation seems
more readable, so let's use it for both methods.
Signed-off-by: Maciej Naruszewicz <maciej.naruszewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
An error in parse_size() should be reported by 0, not -1,
because -1 is changed to the max value of unsigned long long
during calculations of size (e.g. at mdadm.c:412).
A negative value of size should be reported as error
(e.g. size equal -1 has been changed to the max value of
unsigned long long so far).
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This avoids code duplication for utilities that do not link to
util.c and everything that comes with it, such as test_restripe and
raid6check.
Signed-off-by: NeilBrown <neilb@suse.de>
high-number names like "/dev/md126" shouldn't be in /etc/mdadm.conf,
but if they are they should be ignored when choosing an
unused number.
Signed-off-by: NeilBrown <neilb@suse.de>
malloc should never fail, and if it does it is unlikely
that anything else useful can be done. Best approach is to
abort and let some super-daemon restart.
So define xmalloc, xcalloc, xrealloc, xstrdup which don't
fail but just print a message and exit. Then use those
removing all the tests for failure.
Also replace all "malloc;memset" sequences with 'xcalloc'.
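A minimal sketch of two of these wrappers. The message text and exit
status are assumptions, and alloc_failed is a hypothetical helper:

```c
#include <stdio.h>
#include <stdlib.h>

/* Report the failure and exit; callers never see NULL. */
static void alloc_failed(void)
{
	fputs("memory allocation failure - aborting\n", stderr);
	exit(EXIT_FAILURE);
}

static void *xmalloc(size_t len)
{
	void *p = malloc(len);

	if (p == NULL)
		alloc_failed();
	return p;
}

/* Replaces a "malloc;memset" pair: the memory comes back zeroed. */
static void *xcalloc(size_t num, size_t size)
{
	void *p = calloc(num, size);

	if (p == NULL)
		alloc_failed();
	return p;
}
```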
Signed-off-by: NeilBrown <neilb@suse.de>
When we scan for devices using GET_DISK_INFO we currently
limit to 1024. But some arrays can have more than this.
So raise it to 4096 and make the constant a #define.
Signed-off-by: NeilBrown <neilb@suse.de>
It isn't sufficient to use '0' for 'error' as we will
later have fields that can validly be '0'.
So return "-1" on error.
Also fix parsing of --bitmap-chunk so that '0' is treated
as an error: we don't support 512B anyway.
Signed-off-by: NeilBrown <neilb@suse.de>
There is no point calling info_to_blocks_per_member when it just returns size*2 for level==1;
calc_array_size can be used for all levels.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When result from strchr() is NULL and it is assigned to subarray,
NULL pointer can be passed to strdup() function and coredump file
is generated.
Subarray is checked for NULL pointer, so it is assumed that it can
be NULL at this moment.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
It can easily be calculated from 'avail' and 'raid_disks', and we
will soon have a case where we don't have it easily available to pass
in.
Signed-off-by: NeilBrown <neilb@suse.de>