We have several places that wait for activity on a sysfs
file. Combine most of these into a single 'sysfs_wait' function.
Signed-off-by: NeilBrown <neilb@suse.de>
We can only revert a reshape if the reshape_position aligns
properly for the old geometry.
If it doesn't we just fail for now.
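The message does not spell out what "aligns properly" means. As a
rough sketch, assuming it means that reshape_position falls on a
full-stripe boundary of the old geometry, the check could look like
this. The function name and parameters are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical check: reshape_position (in sectors) must be a multiple
 * of one full stripe of the old geometry, assumed here to span
 * old_chunk_sectors * old_data_disks sectors.
 * Returns 1 if aligned, 0 otherwise.
 */
int reshape_position_aligned(uint64_t reshape_position,
			     uint32_t old_chunk_sectors,
			     uint32_t old_data_disks)
{
	uint64_t stripe = (uint64_t)old_chunk_sectors * old_data_disks;

	if (stripe == 0)
		return 0;
	return reshape_position % stripe == 0;
}
```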
Also fix a +/- error with updating raid_disks for super1.c
Signed-off-by: NeilBrown <neilb@suse.de>
For RAID10, we must have head/tail space for reshape.
For RAID4/5/6 we can use a spare or a backup file.
So make that distinction.
Signed-off-by: NeilBrown <neilb@suse.de>
When changing the chunksize of an array, the new chunksize must
divide the device size.
If it doesn't we report a very brief message.
Make this message a bit longer and suggest a way forward by reducing
the size of the array.
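The arithmetic behind the suggestion can be sketched as follows. Both
function names are illustrative, and both values are assumed to be in
the same units.

```c
#include <stdint.h>

/* Returns 1 if new_chunk is nonzero and divides dev_size exactly. */
int chunk_divides_size(uint64_t dev_size, uint64_t new_chunk)
{
	return new_chunk != 0 && dev_size % new_chunk == 0;
}

/*
 * Largest size not above dev_size that is a multiple of new_chunk.
 * This is the size the array would need to be reduced to first.
 * A zero new_chunk leaves the size unchanged.
 */
uint64_t size_rounded_to_chunk(uint64_t dev_size, uint64_t new_chunk)
{
	if (new_chunk == 0)
		return dev_size;
	return dev_size - dev_size % new_chunk;
}
```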
Reported-by: Mark Knecht <markknecht@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If being built from a git tree, use the version and date
information from the top commit rather than the hard-coded
values.
Signed-off-by: NeilBrown <neilb@suse.de>
When we create or assemble an array, we wait for udev to create the
device file in /dev so that as soon as mdadm completes, the device can
be used.
This waiting is performed in multiples of 200ms, which can sometimes
be too long to wait.
So change to an exponential backoff. Wait 1, then 2, then 4 msec etc.
Once we get to 256msec, stop backing off and continue waiting 256ms at
a time until we reach the limit which is now 4.608sec rather than 5sec
which it was before.
Ditto for open_dev_excl.
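The delay schedule can be sketched roughly as below. This is an
illustration, not the code in this patch: the names, the callback
interface and the max_tries parameter are all assumptions.

```c
#include <unistd.h>

#define BACKOFF_CAP_MS 256

/* Next delay in the schedule: double, but never exceed the cap. */
unsigned int next_delay_ms(unsigned int delay_ms)
{
	unsigned int next = delay_ms * 2;

	return next > BACKOFF_CAP_MS ? BACKOFF_CAP_MS : next;
}

/*
 * Poll ready(arg) up to max_tries times, sleeping between attempts
 * with delays of 1, 2, 4, ... 256, 256, ... msec.
 * Returns 1 as soon as ready() reports success, 0 if tries run out.
 */
int wait_with_backoff(int (*ready)(void *), void *arg, int max_tries)
{
	unsigned int delay_ms = 1;
	int i;

	for (i = 0; i < max_tries; i++) {
		if (ready(arg))
			return 1;
		usleep(delay_ms * 1000);
		delay_ms = next_delay_ms(delay_ms);
	}
	return 0;
}
```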
Signed-off-by: NeilBrown <neilb@suse.de>
The moment we change a RAID0 to a RAID5 it will try to recover. This
will abort quite quickly as there are no spare devices, but it could
confuse the attempt to freeze the array.
So allow 'freeze' to work even on a recovering array.
Signed-off-by: NeilBrown <neilb@suse.de>
As the Makefile encourages users to set CXFLAGS for extra flags,
we should only conditionally set it.
That way it can be overridden in the environment as well as on
the command line.
Suggested-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Have mdadm --Detail --brief --verbose print the list of devices in
alphabetical order.
This is useful for debugging purposes. E.g. the test script
10ddf-create compares the output of two mdadm -Dbv calls which
may be different if the order is not deterministic.
(I confess: I use a modified "test" script that always runs
"mdadm --verbose" rather than "mdadm --quiet", otherwise this
wouldn't happen in 10ddf-create).
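One simple way to get a deterministic order is to sort the names
before printing. The sketch below assumes the device names are held
in an array of strings, which may not match how the patch stores them;
sort_devnames and cmp_devname are illustrative names.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of 'char *' device names. */
static int cmp_devname(const void *a, const void *b)
{
	const char *const *sa = a;
	const char *const *sb = b;

	return strcmp(*sa, *sb);
}

/* Sort 'count' device names in place, in strcmp() byte order. */
void sort_devnames(char **names, size_t count)
{
	qsort(names, count, sizeof(names[0]), cmp_devname);
}
```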
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
This is useful for reshaping a RAID0 to a higher level.
The recovery will happen at the same time as the reshape.
Signed-off-by: NeilBrown <neilb@suse.de>
There are now 3 places which change level.
And they all do it slightly differently with different
messages etc.
Make a single function for this and use it.
Signed-off-by: NeilBrown <neilb@suse.de>
When converting to RAID0, all spares and non-data drives
need to be removed first.
It is possible that the first HOT_REMOVE_DISK will fail because the
personality hasn't let go of it yet, so retry a few times.
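The retry can be sketched generically as below. The helper names, the
callback interface and the pause between attempts are assumptions
made for illustration. fake_remove is only a stand-in for the real
HOT_REMOVE_DISK call so that the example is self-contained.

```c
#include <unistd.h>

/*
 * Hypothetical retry wrapper: call attempt(arg) until it returns 0 or
 * 'tries' attempts have been made, pausing 'pause_us' microseconds
 * after each failure except the last.
 * Returns 0 on success, otherwise the result of the last attempt.
 */
int retry_few_times(int (*attempt)(void *), void *arg,
		    int tries, unsigned int pause_us)
{
	int rv = -1;
	int i;

	for (i = 0; i < tries; i++) {
		rv = attempt(arg);
		if (rv == 0)
			break;
		if (i + 1 < tries)
			usleep(pause_us);
	}
	return rv;
}

/* Stand-in for the removal: fails until the counter reaches zero. */
int fake_remove(void *arg)
{
	int *busy = arg;

	if (*busy > 0) {
		(*busy)--;
		return -1;
	}
	return 0;
}
```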
Signed-off-by: NeilBrown <neilb@suse.de>
After changing the level, the meaning of layout numbers changes,
so keeping a new_layout value around can cause later confusion.
Signed-off-by: NeilBrown <neilb@suse.de>
This means it will be set for a "--data-offset" only reshape so that
case doesn't complain that the array is getting smaller.
Signed-off-by: NeilBrown <neilb@suse.de>
1/ ignore failed devices - obviously
2/ We need to tell the kernel which direction the reshape should
progress even if we didn't choose the particular data_offset
to use.
Signed-off-by: NeilBrown <neilb@suse.de>
Setting new_offset can fail if the v1.x "data_size" is too small.
So if that happens, try increasing it first by writing "0".
That can fail on spare devices due to a kernel bug, so if it does fail,
try writing the correct number of sectors.
Signed-off-by: NeilBrown <neilb@suse.de>
When an active/degraded RAID6 array is force-started we clear the
'active' flag, but it is still possible that some parity is
not in sync. This is because there are two parity blocks.
It would be nice to be able to tell the kernel "P is OK, Q maybe not".
But that is not possible.
So when we force-assemble such an array, trigger a 'repair' to fix up
any errant Q blocks.
This is not ideal, as a repair interrupted by a restart will not be
continued afterwards, but it is the best we can do without kernel help.
Signed-off-by: NeilBrown <neilb@suse.de>
When we read devices from sysfs (../md/dev-*), store them in the same
order that they appear. That makes more sense when exposed to a
human (as the next patch will).
Signed-off-by: NeilBrown <neilb@suse.de>
If lseek64() failed it was still writing to the disks, which would introduce
data corruption.
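The pattern is to check the seek result and refuse to write when it
failed. A minimal sketch, using plain lseek() with off_t instead of
lseek64(); the helper name is hypothetical and not taken from the
patch.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Seek to 'offset' and write 'len' bytes, but only if the seek
 * succeeded. Nothing is written when the seek fails.
 * Returns 0 on success, -1 on a failed seek or a short/failed write.
 */
int write_at_checked(int fd, off_t offset, const void *buf, size_t len)
{
	if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
		return -1;
	if (write(fd, buf, len) != (ssize_t)len)
		return -1;
	return 0;
}
```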
Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
Signed-off-by: NeilBrown <neilb@suse.de>
Using hard coded numbers is error prone and hard to read by humans.
Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
Signed-off-by: NeilBrown <neilb@suse.de>
==2389947== 24 bytes in 1 blocks are definitely lost in loss record 1 of 10
==2389947== at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2389947== by 0x408067: xmalloc (xmalloc.c:36)
==2389947== by 0x401B19: check_stripes (raid6check.c:151)
==2389947== by 0x4030C6: main (raid6check.c:521)
==2389947==
==2389947== 24 bytes in 1 blocks are definitely lost in loss record 2 of 10
==2389947== at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2389947== by 0x408067: xmalloc (xmalloc.c:36)
==2389947== by 0x401B67: check_stripes (raid6check.c:155)
==2389947== by 0x4030C6: main (raid6check.c:521)
==2389947==
Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
Signed-off-by: NeilBrown <neilb@suse.de>
After a recent git pull, 'make raid6check' no longer worked, as
sysfs_read() was called with a wrong argument and as check_env()
was used by use_udev(), but not defined.
Replace sysfs_read(..., -1, ...) by sysfs_read(..., NULL, ...)
Move check_env() from util.c to lib.c
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Some failure scenarios can leave a spare with a higher event count
than an in-sync device. Assembling an array like this will confuse
the kernel.
So detect spares with event counts higher than the best non-spare
event count and exclude them from the array.
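The selection rule can be sketched as follows. struct dev_info and
exclude_newer_spares are illustrative and do not correspond to the
structures used in the patch.

```c
#include <stddef.h>
#include <stdint.h>

struct dev_info {
	uint64_t events;	/* event count from the superblock */
	int is_spare;		/* nonzero if the device is a spare */
	int excluded;		/* set by exclude_newer_spares() */
};

/*
 * Mark every spare whose event count exceeds the highest event count
 * found among non-spare devices. If there are no non-spare devices,
 * nothing is excluded. Returns the number of spares excluded.
 */
size_t exclude_newer_spares(struct dev_info *devs, size_t n)
{
	uint64_t best = 0;
	int have_best = 0;
	size_t i, excluded = 0;

	for (i = 0; i < n; i++)
		if (!devs[i].is_spare &&
		    (!have_best || devs[i].events > best)) {
			best = devs[i].events;
			have_best = 1;
		}
	if (!have_best)
		return 0;

	for (i = 0; i < n; i++)
		if (devs[i].is_spare && devs[i].events > best) {
			devs[i].excluded = 1;
			excluded++;
		}
	return excluded;
}
```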
Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Some people want to create truly enormous arrays.
As we sometimes need to hold one file descriptor for each
device, this can hit the NOFILE limit.
So raise the limit if it ever looks like it might be a problem.
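A rough sketch of raising the limit with getrlimit()/setrlimit(). The
function name and the headroom of 20 extra descriptors are assumptions
made for illustration.

```c
#include <sys/resource.h>

/*
 * Make sure at least 'devices' descriptors plus some headroom can be
 * open at once. The headroom of 20 is an arbitrary margin.
 * Returns 0 if the limit is (now) sufficient, -1 otherwise.
 */
int ensure_fd_limit(unsigned int devices)
{
	struct rlimit lim;
	rlim_t wanted = (rlim_t)devices + 20;

	if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
		return -1;
	if (lim.rlim_cur >= wanted)
		return 0;	/* already enough */

	/* Raising the hard limit only works with sufficient privilege. */
	if (lim.rlim_max < wanted)
		lim.rlim_max = wanted;
	lim.rlim_cur = wanted;
	return setrlimit(RLIMIT_NOFILE, &lim) == 0 ? 0 : -1;
}
```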
Signed-off-by: NeilBrown <neilb@suse.de>