If the restarted reshape needs a backup file and we don't have one,
that should be reported before we try to start the array.
Also we shouldn't say the "Cannot grow" but "cannot complete".
Signed-off-by: NeilBrown <neilb@suse.de>
To be able to revert-reshape of raid4/5/6 which is changing
the number of devices, the reshape must has been stopped on a multiple
of the old and new stripe sizes.
The kernel only enforces the new stripe size multiple.
So we enforce the old-stripe-size multiple by careful use of
"sync_max" and monitoring "reshape_position".
Signed-off-by: NeilBrown <neilb@suse.de>
We have several places that wait for activity on a sysfs
file. Combine most of these into a single 'sysfs_wait' function.
Signed-off-by: NeilBrown <neilb@suse.de>
For RAID10, we must have head/tail space for reshape.
For RAID4/5/6 we can use a spare or a backup file.
So make that distinction.
Signed-off-by: NeilBrown <neilb@suse.de>
When changing the chunksize of an array, the new chunksize must
divide the device size.
If it doesn't we report a very brief message.
Make this message a bit longer and suggest a way forward be reducing
the size of the array.
Reported-by: Mark Knecht <markknecht@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
There are now 3 places which change level.
And they all do it slightly differently with different
messages etc.
Make a single function for this and use it.
Signed-off-by: NeilBrown <neilb@suse.de>
When converting to RAID0, all spares and non-data drives
need to be removed first.
It is possible that the first HOT_REMOVE_DISK will fail because the
personality hasn't let go of it yet, so retry a few times.
Signed-off-by: NeilBrown <neilb@suse.de>
After changing the level, the meaning of layout numbers changes,
so we will keeping a new_layout value around can cause later confusion.
Signed-off-by: NeilBrown <neilb@suse.de>
This means it will be set for a "--data-offset" only reshape so that
case doesn't complain that the array is getting smaller.
Signed-off-by: NeilBrown <neilb@suse.de>
1/ ignore failed devices - obviously
2/ We need to tell the kernel which direction the reshape should
progress even if we didn't choose the particular data_offset
to use.
Signed-off-by: NeilBrown <neilb@suse.de>
Setting new_offset can fail if the v1.x "data_size" is too small.
So if that happens, try increasing it first by writing "0".
That can fail on spare devices due to a kernel bug, so if it doesn't
try writing the correct number of sectors.
Signed-off-by: NeilBrown <neilb@suse.de>
Some people want to create truely enormous arrays.
As we sometimes need to hold one file descriptor for each
device, this can hit the NOFILE limit.
So raise the limit if it ever looks like it might be a problem.
Signed-off-by: NeilBrown <neilb@suse.de>
It is possible that the devices in an array have different sizes, and
different data_offsets. So the 'before_space' and 'after_space' may
be different from drive to drive.
Any decisions about how much to change the data_offset must work on
all devices, so must be based on the minimum available space on
any devices.
So find this minimum first, then do the calculation.
Signed-off-by: NeilBrown <neilb@suse.de>
If space_after and space_before are zero (the default) then assume that
metadata doesn't support changing data_offset.
Signed-off-by: NeilBrown <neilb@suse.de>
If we can modify the data_offset, we can avoid doing any backups at all.
If we can't fall back on old approach - but not if --data-offset
was requested.
Signed-off-by: NeilBrown <neilb@suse.de>
raid10 currently uses the 'backup_blocks' field to store something
else: a minimum offset change.
This is bad practice, we will shortly need to have both for RAID5/6,
so make a separate field.
Signed-off-by: NeilBrown <neilb@suse.de>
For RAID5, not being able to set new_data_offset because of
old kernel is not a problem. So make this fatal on for RAID10.
Also remove an unused assignment to 'rv'.
Signed-off-by: NeilBrown <neilb@suse.de>
We widely use a "devnum" which is 0 or +ve for md%d devices
and -ve for md_d%d devices.
But I want to be able to use md_%s device names.
So get rid of devnum (a number) and use devnm (a 32char string).
eg.
md0
md_d2
md_home
Signed-off-by: NeilBrown <neilb@suse.de>
As 'layout' doesn't map neatly from RAID4 to RAID5, we need to
set it correctly for RAID4.
Also, when no reshape is needed we should set re->level to the final
desired level.
Signed-off-by: NeilBrown <neilb@suse.de>
commit 1f9b0e2845
Grow - be careful about 'delayed' reshapes.
Introduced a bug where a list of devices longer than 1
would cause an infinite loop. Oops.
Signed-off-by: NeilBrown <neilb@suse.de>
It fixes the following uninitialized variables compilation-time error:
WARN - Grow.c: In function ‘reshape_array’:
WARN - Grow.c:2413:21: error: ‘min_space_after’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
WARN - Grow.c:2376:39: note: ‘min_space_after’ was declared here
WARN - Grow.c:2414:22: error: ‘min_space_before’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
WARN - Grow.c:2376:21: note: ‘min_space_before’ was declared here
WARN - cc1: all warnings being treated as errors
WARN - make: *** [Grow.o] Error 1
It occurs during compilation of mdadm on Fedora 17.
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Commit 5da9ab9874
Grow_reshape re-factor
in mdadm-3.2 broke conversion from RAID5 and RAID1 - and we
never noticed.
This fixes it.
Signed-off-by: NeilBrown <neilb@suse.de>
As it was the code would crash due to "mdstat" being NULL.
Code is now more sane, but hasn't been tested on an array that
needs to grow.
Signed-off-by: NeilBrown <neilb@suse.de>
When reducing the number of devices in a RAID10, we increase the
data offset to avoid the need for backup area.
If there is no room at the end of the device to allow this, we need
to first reduce the component size of each device. However if there
is room, we don't want to insist on that, otherwise growing then
shrinking the array would not be idempotent.
So find the min before/after space before analysing a RAID10 for
reshape, and if the after space is insufficient, reduce the total size
of the array and the component size accordingly.
Signed-off-by: NeilBrown <neilb@suse.de>
RAID10 reshape requires that data_offset be changed.
So we only allow it if the new_data_offset attribute is available,
and we compute a suitable change in data offset.
Signed-off-by: NeilBrown <neilb@suse.de>
This can be used to over-ride the automatic assignment of
data offset.
For --create, it is useful to re-create old arrays where different
defaults applied.
For --grow it may be able to force a reshape in the reverse direction.
Signed-off-by: NeilBrown <neilb@suse.de>
If multiple reshapes are activated on the same devices (different
partitions) then one might be forced to wait for the other to
complete.
As reshaping suspends access to small sections of the array
at time, this cause a region to be suspended for a long time,
which isn't good.
To try to detect this and don't start suspending until
the reshape is actually happening.
This is only effective on 3.7 and later as prior kernels
don't report when the delayed reshape can progress. For
the earlier kernels, just give a warning.
Signed-off-by; NeilBrown <neilb@suse.de>
The 'size' has been changed to be unsigned recently.
Analogous changes should be made to reshape_super().
'0' should be used in case of 'no change' now.
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
commit d04f65f48c
Change the values for "max size" from -1 to 1.
Messed up 's->size' - leaving it as '1' (MAX_SIZE) in some cases and
causing the array reshape to fail.
Signed-off-by: NeilBrown <neilb@suse.de>
Both are impossible, and '1' allows size to be unsigned,
which is neater.
Also #define MAX_SIZE to be '1' to make it all more explicit.
Signed-off-by: NeilBrown <neilb@suse.de>
If we change some functions to accept 'verbose', where <0 means to be
quiet, in place of 'quiet', then we will be able to merge
'quiet' and 'verbose' together for simplicity.
Signed-off-by: NeilBrown <neilb@suse.de>
malloc should never fail, and if it does it is unlikely
that anything else useful can be done. Best approach is to
abort and let some super-daemon restart.
So define xmalloc, xcalloc, xrealloc, xstrdup which don't
fail but just print a message and exit. Then use those
removing all the tests for failure.
Also replace all "malloc;memset" sequences with 'xcalloc'.
Signed-off-by: NeilBrown <neilb@suse.de>
This is most likely to happen if the array has been stopped,
in which case the error is pointless.
Reported-by: Patrik Horník <patrik@dsl.sk>
Signed-off-by: NeilBrown <neilb@suse.de>
I think there was some confusion about what --layout=preserve
actually means, but in any case it wasn't doing what the man
page says it should.
So add some case analysis and make sure it does the right thing,
or complains if it cannot.
Reported-by: Patrik Horník <patrik@dsl.sk>
Signed-off-by: NeilBrown <neilb@suse.de>
mdinfo->bitmap_offset is a signed long and needs to be treated as
such when passed to the kernel.
This resolves the problem with adding internal bitmaps to a 1.0 array.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Setting "sync_action" to "idle" while extending size of raid0 array
is racy and sometimes fails.
"sync_action" should be set to "frozen" instead.
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Size change is possible as standalone change only. To make sure size change
is not requested pass '-1' as size parameter.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Some setting size error cases were not detected.
When error occurs, stop setting new size action and rollback metadata
changes.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When reshape_super() updates metadata with new size, due to some metadata
limitations saved value can be different than requested value by user.
Update size (read it from metadata) for setting it in md.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
For raid0, takeover operation is required for size change.
Add takeover to degraded raid4 before size change and back to raid0 after.
Array information has to be read again from md after takeover.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Function reshape_super() guards metadata changes.
It is used to apply changes rollback in error case also.
As change (apply and rollback) can be not bi-directional reshape_super()
has to know if current action is metadata change that should be guarded
using metadata restrictions, or this is metadata rollback change
executed due to error occurrence.
In second case change has to be unconditional.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
For external metatdata ioctl doesn't set new size. Set new size using sysfs.
Put code for size change in to function to re-use the same code as during
On-line Capacity Expansion
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
RAID1 can only be converted to RAID0 or RAID5 if the size is
a multiple of 4K as we cannot have chunks smaller than 4K.
If this might happen, report a useful error message.
Signed-off-by: NeilBrown <neilb@suse.de>
If a RAID6 array is in a state which doesn't have a
RAID5 equivalent, the code currently dereferences a NULL.
If it does have an equivalent - use that.
If it doesn't but it already in the RAID5-compatible layout
with the Q block last, handle that case,
else require the new layout to be explicitly requested.
Signed-off-by: NeilBrown <neilb@suse.de>
Reading sysfs entry that is '0' long should cause an error.
Reshape position cannot be empty.
Absence of reshape position should be ignored. It is possible
that we are about raid0 reshape continuation and it is before takeover.
This means that according metadata (changed by mdmon) it should be reshaped
but md knows nothing about it at this moment. Reshape continuation
in reshape_array() will change it to raid4 and reshape position appears
in sysfs.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When one of arrays is inactive, do not try to continue reshape
on this array. Just skip it.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When reshape is restarted from '0', very begin of array
it is possible that for external metadata reshape and array
configuration doesn't happen.
Check if md has the same opinion, and reshape is restarted
from 0. If so, this is regular reshape start after reshape
switch in metadata to next array only.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When reshape is broken, it can occur that metadata is not saved properly.
This can cause that reshape process is farther in md than metadata states.
On reshape restart use md position as start position, if it is farther than
position specified in metadata. Opposite situation treat as error.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Using takeover operation for grow purposes, mdadm has to be sure
that mdmon processes all updates, and if necessary it will be closed
at takeover to raid0 operation. If mdmon is late, next array in container
is processed and due to race condition mdmon closes itself instead to monitor
next reshape operation.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When raid0 reshape is executed mdmon can dissappear due to raid level
takeover operation. If this happen before mdmon check, mdadm would treat
it as error condition. It is not true for this case.
Remove mdmon check from reshape_container() function.
Error condition check will remain using reshape_array() reentry test
for the same array (line 2577).
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
External metadata sometimes is not updated.
It can be observed during 2 raid0 arrays Capacity Expansion.
New array size is not set, because metadata is not updated and on the reshape
end mdadm doesn't read new array size from metadata.
This happens when mdmon finishes his work (due to takeover to raid0),
before all metadata updates are processed.
Make sure that all updates are flushed to disk before executing takeover.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Adding a bitmap via ioctl can only add it at a fixed location.
That location is not suitable for 4K-block devices.
So allow setting the bitmap location via sysfs if kernel supports it
and aim to always use 4K alignments.
Signed-off-by: NeilBrown <neilb@suse.de>
The value in info->array.raid_disks is the total number of
devices, which is the 'after' number when the number is increasing,
and the 'before' number when the number is decreasing.
The code currently assumes it is always the 'after' number - so fix
that.
Signed-off-by: NeilBrown <neilb@suse.de>
When an array is being reshaped to fewer data devices the relationship
between sync_max and reshape_progress is different to when the number
of devices increases - we need to allow for that when setting
sync_max/sync_min.
Signed-off-by: NeilBrown <neilb@suse.de>
Add proper error message for container reshape when device cannot be opened.
fd variable operation is moved down to display information what particular
device cannot be opened.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
It can happen during reshape restart that reshape_array() can exit without
error (e.g. Grow.c:1915) and reshape is not moved to next array.
reshape_array() is called again for the same device.
Do not allow for such execution and check if last reshaped array is not
the current one.
This patch can be treat not as solution, but it allows for such errors
detection.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When mdmon is absent metadata is not updated, and container_reshape()
can fall in to endless loop. This can cause user data corruption.
In case when mdmon is absent do not continue container reshape process.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
check_reshape should not try to parse the subarray string - only
metadata handlers are allowed to do that.
The common code and only interpret a subarray string by passing it to
"container_content" which will then return only the member for that
subarray.
So remove check_reshape and place similar logic explicitly at the two
call-sites. They are different enough that it is probably clearer to
have explicit code.
Signed-off-by: NeilBrown <neilb@suse.de>
container_content retrieves volume information from disks in the
container. For unsupported volumes the function was not returning
mdinfo. When all volumes were unsupported the function was returning
NULL pointer to block actions on the volumes. Therefore, such volumes
were not activated in Incremental and Assembly. As side effect they
also could not be deleted using kill-subarray since "kill" function
requires to obtain a valid mdinfo from container_content.
This patch fixes the kill-subarray problem by allowing to obtain
mdinfo of all volumes types including unsupported and introducing new
array.status flags.
There are following changes:
1. Added MD_SB_BLOCK_VOLUME for blocking an array, other arrays in the
container can be activated.
2. Added MD_SB_BLOCK_CONTAINER_RESHAPE block container wide reshapes
(like changing disk numbers in arrays).
3. IMSM container_content handler is to load mdinfo for all volumes
and set both blocking flags in array.state field in mdinfo of
unsupported volumes. In case of some errors, all volumes can be
affected. Only blocked array is not activated (also reshaped as
result). The container wide reshapes are also blocked since by
metadata definition they require modifications of both arrays.
4. Incremental_container and Assemble functions check array.state and
do not activate volumes with blocking bits set.
5. assemble_container_content is changed to check container wide reshapes
before activating reshapes of assembled containers.
6. Grow_reshape and Grow_continue_command checks blocking bits
before starting reshapes or continueing (-G --continue) reshapes.
7. kill-subarray ignores array.state info and can remove requested array.
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>