The purpose od devid2devnm is to return a kernel name of an
md device, whether that device is a whole device or a partition,
we want the whole device. md4, never md4p2.
In one place I was using devid2devnm where I really wanted the
partition if there was one ... and wasn't really interested in it
being an md device.
So introduce a new 'devid2kname' for that case.
Signed-off-by: NeilBrown <neilb@suse.de>
To be able to revert-reshape of raid4/5/6 which is changing
the number of devices, the reshape must has been stopped on a multiple
of the old and new stripe sizes.
The kernel only enforces the new stripe size multiple.
So we enforce the old-stripe-size multiple by careful use of
"sync_max" and monitoring "reshape_position".
Signed-off-by: NeilBrown <neilb@suse.de>
We have several places that wait for activity on a sysfs
file. Combine most of these into a single 'sysfs_wait' function.
Signed-off-by: NeilBrown <neilb@suse.de>
The moment we change a RAID0 to a RAID5 it will try to recovery. This
will abort quite quickly as there are not spare devices, but it could
confuse the attempt to freeze the array.
So allow 'freeze' to work even on a recovering array.
Signed-off-by: NeilBrown <neilb@suse.de>
When we read devices from sysfs (../md/dev-*), store them in the same
order that they appear. That makes more sense when exposed to a
human (as the next patch will).
Signed-off-by: NeilBrown <neilb@suse.de>
Array can be inactive when e.g. -I is in the process of assembling them.
This change allows --detail to report limited information about
these arrays.
Signed-off-by: NeilBrown <neilb@suse.de>
We widely use a "devnum" which is 0 or +ve for md%d devices
and -ve for md_d%d devices.
But I want to be able to use md_%s device names.
So get rid of devnum (a number) and use devnm (a 32char string).
eg.
md0
md_d2
md_home
Signed-off-by: NeilBrown <neilb@suse.de>
Commit 5da9ab9874
Grow_reshape re-factor
in mdadm-3.2 broke conversion from RAID5 and RAID1 - and we
never noticed.
This fixes it.
Signed-off-by: NeilBrown <neilb@suse.de>
We must only remove from a container if the device isn't a
member of any member array.
To check we look at the 'holders' directory in sysfs.
We currently skip that check if ->devname is "detached", however
that can never be true since the change that introduced
add_detached().
Also sysfs_unique_holder returns status in 'errno' which isn't
entirely safe as e.g. closedir() is probably allowed to clear it.
So make sysfs_unique_holder return an unambigious value, and us
it to decide what to report.
Signed-off-by: NeilBrown <neilb@suse.de>
malloc should never fail, and if it does it is unlikely
that anything else useful can be done. Best approach is to
abort and let some super-daemon restart.
So define xmalloc, xcalloc, xrealloc, xstrdup which don't
fail but just print a message and exit. Then use those
removing all the tests for failure.
Also replace all "malloc;memset" sequences with 'xcalloc'.
Signed-off-by: NeilBrown <neilb@suse.de>
mdinfo->bitmap_offset is a signed long and needs to be treated as
such when passed to the kernel.
This resolves the problem with adding internal bitmaps to a 1.0 array.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
As the bitmap can be before the superblock, bitmap_offset is signed.
But some of the code didn't honour that :-(
Signed-off-by: NeilBrown <neilb@suse.de>
If the array is already frozen when Manage_subdevs is called we don't
want it to unfreeze the array.
This is because Grow calls Manage_subdevs to add devices to an array
being reshaped, and the array must stay frozen over this call.
So if sysfs_freeze_array find the array to be frozen it returns '0',
meaning that it didn't and cannot freeze it. Then the caller will not
try to unfreeze, which is good.
Signed-off-by: NeilBrown <neilb@suse.de>
Adding a bitmap via ioctl can only add it at a fixed location.
That location is not suitable for 4K-block devices.
So allow setting the bitmap location via sysfs if kernel supports it
and aim to always use 4K alignments.
Signed-off-by: NeilBrown <neilb@suse.de>
readlink() returns the number of bytes in the buffer.
If we do something like
len = readlink(path, buf, sizeof(buf));
buf[len] = '\0';
we might write one byte past the end of the buffer.
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When sysfs_set_array() function is called, it tests if array
can be configured using sysfs. Setting metadata_version entry
can accidentally unblock mdmon when array is under reshape.
To avoid this, blocking character '-' is checked and if is is set,
it is used for array test.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Not sure how this ever worked, but now we just try to parse a directory
name that looks like <host>:<bus>:<target>:<lun>.
Array creation segfaults on Fedora 14 without this.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
During migration raid_disks field contains new disks number now.
It should be set old disks number first and then new disks number
to allow md to calculate e.g. delta_disks parameter.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Now that the external metadata handler must provide an md-compatible
old/new geometry, sys_set_array can do all of the array set-up for
an array that is undergoing reshape.
That leave less for reshape_array to do.
Also clean up how reshape_array tells if the reshape has started or
not.
Don't use ->reshape_active as that doesn't tell us anything consistent
at this stage, only use the 'restart' flag passed in.
Signed-off-by: NeilBrown <neilb@suse.de>
Problem: sysfs_disk_to_scsi_id() not returns correct scsi_id value.
Reason: sysfs format has been changed
This patch adapt sysfs_disk_to_scsi_id() to new sysfs format.
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
- when reshaping a container, ->reshape_active is already set
even though it isn't really active yet, so we need to set
the new geometry even when reshape_active is set. This is safe.
- When restarting a reshape, make sure the reshape_position is set
appropriately when external metadata is used.
Signed-off-by: NeilBrown <neilb@suse.de>
Sometimes we want to convert a devnum to a devname without allocating
memory. So provide function to do the formatting without allocation.
Signed-off-by: NeilBrown <neilb@suse.de>
When disk is added, set valid slot numbers (positive) only.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
In order to support reshape and atomic removal of spares from containers
we need to prevent mdmon from activating spares. In the reshape case we
additionally need to freeze sync_action while the reshape transaction is
initiated with the kernel and recorded in the metadata.
When reshaping a raid0 array we need to freeze the array *before* it is
transitioned to a redundant raid level. Since sync_action does not exist
at this point we extend the '-' prefix of a subarray string to flag
mdmon not to activate spares.
Mdadm needs to be reasonably certain that the version of mdmon in the
system honors this 'freeze' indication. If mdmon is not already active
then we assume the version that gets started is the same as the mdadm
version. Otherwise, we check the version of mdmon as returned by the
extended ping_monitor() operation. This is to catch cases where mdadm
is upgraded in the filesystem, but mdmon started in the initramfs is
from a previous release.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
...i.e. GET_DEVS == (GET_DEVS|SKIP_GONE_DEVS)
A null pointer dereference in Incremental.c can be triggered by
replugging a disk while the old name is in use. When mdadm -I is called
on the new disk we fail the call to sysfs_read(). I audited all the
locations that use GET_DEVS and it appears they can tolerate missing a
drive. So just make SKIP_GONE_DEVS the default behaviour.
Also fix up remaining unchecked usages of the sysfs_read() return value.
Reported-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that mdmon records periodic checkpoints, and checkpoints every
->set_array_state() event we no longer need to 'idle' sync_action from
--wait-clean.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Minimal changes needed to permit reassembling partially recovered
external metadata arrays. The biggest logical change is that
->container_content() can now surface partially rebuilt members rather
than omitting them from the disk list.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The patch increases the capacity of buffers used to store
sysfs path names. Originally the buffers were too small to
hold the canonical representation of sysfs path (in case
of a SAS device, especially a device installed behind an
expander).
Signed-off-by: Artur Wojcik <artur.wojcik@intel.com>
Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Connect to the monitor in the old namespace and use that connection for
WaitClean requests when stopping the victim mdmon instance. This allows
ping_monitor() to work post chroot().
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When building container members with -IR, we need to ensure that
devices added to an active array preserve the 'in_sync' status so they
don't needlessly get rebuilt.
So allow sysfs_add_disk to do this (only works in kernels since
2.6.30) and pass the relevant flag down.
Signed-off-by: NeilBrown <neilb@suse.de>