Commit Graph

2035 Commits

Author SHA1 Message Date
Adam Kwolek cd0430a17c FIX: Always report new raid_disks during migration
To behave in the similar way as native metadata during migration,
new raid disks number has to be reported by metadata handler.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-18 10:11:33 +10:00
Adam Kwolek 77f8d358b5 FIX: Use successfully loaded metadata only
Values greater than 0, means error. We exit from loop on error
with empty super-block pointer when sd pointer is valid.
This cannot be detected by check condition as error.
For sure we shouldn't go forward with error condition.
It leads to throwing exception with core file when metadata handler
wants to access non existing super-block.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-14 17:50:17 +10:00
Piergiorgio Sartor 2cf3112111 RAID-6 check standalone fix component list parsing
Fix the parsing of the component list, i.e. skipping the "spare" one.

I also added a check in case the array is degraded.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-14 17:28:31 +10:00
Jonathan Liu 4019ad0701 Monitor: avoid NULL dereference with 0.90 metadata
0.90 array do not report the metadata type in /proc/mdstat, so
we cannot assume that mse->metadata_version is non-NULL.

So add an appropriate check.

This adds an additional check missed by commit
eb28e119b0.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-13 08:25:45 +10:00
Adam Kwolek b357ef43f9 FIX: Raid0 expansion cannot be restarted
When raid0 expansion is restarted, mdadm refuses to correctly assemble
array because critical section cannot be restored from backup file.
mdadm exits with information:
	mdadm: Failed to restore critical section for reshape - sorry.

For raid0 new level is 0, current array level is 4.
Function Grow_restart() doesn't allow for level change.

Grow_restart really shouldn't be checking for level changes.
As they are always instantaneous they should never appear
in the metadata so it doesn't mean anything to check for them.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-11 15:00:13 +10:00
Mike Frysinger 8ccca44dde mdadm/mdmon: use CFLAGS when linking
People often put flags that control ABI options into CFLAGS (like -mcpu)
and don't duplicate them in LDFLAGS because most build systems nowadays
(like autotools) use both when linking.  So make that work with mdadm's
custom build system too.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-11 14:54:42 +10:00
Mike Frysinger b1bac75b26 mdadm: respect --syslog in monitor mode
A few places don't accept syslog as a monitor mode, so fix that.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-11 14:54:27 +10:00
Mike Frysinger 237d87904e mdadm: add missing --syslog option to monitor help
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-11 14:54:18 +10:00
Mike Frysinger 9e69baf3d4 move .man targets from "all" to "man" - and "everything"
These .man files are never installed, nor generally used, so don't force
people who generally want to build/install mdadm to build them up.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-11 14:54:16 +10:00
Adam Kwolek 139dae1137 imsm: fix: report aligned component size value
OROM can create array with chunk size not aligned.
To resolve this problem in mdadm, metadata handler has to report
component size aligned value for mdadm operations
while metadata value stays unchanged.

Do not correct alignment for raid1 and in error case.

Correction allows check in analyse_change() (Grow.c:905) to pass.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-06 12:40:31 +10:00
Adam Kwolek 2a4a08e7d3 imsm: FIX: Check array alignment before expansion
It can occur that OROM creates array not aligned properly.
Expansion cannot be run in such cases. It is detected in analyse_change().
It is too late. This causes that metadata is in migration state already,
when expansion cannot be started.
This problem has to be detected before metadata is updated,
in all arrays in reshaped container.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-06 12:40:04 +10:00
Adam Kwolek 6dc0be309d imsm: Warn user about reboot risk
Current check-pointing implementation doesn't allow for interrupting reshape of boot arrays
due to checkpoint restore has to be done before system start.
There is problem with passing backup file name to array automatically mounted during boot time,
especially when scan mode is used.

Until IMSM check-pointing implementation will be introduced, warning about reboot risk
should be placed in mdadm.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-06 12:38:50 +10:00
NeilBrown d47a29257a restripe: make sure zero buffer is always large enough.
If restripe is called to restore stripes of one size and then
save stripes with a larger chunk size, the 'zero' buffer will not
be large enough and a double-degraded RAID6 will over-run the buffer.

So record the current size of the zero buffer and use it when deciding
if we need to allocate a new buffer.

Reported-by: Brad Campbell <lists2009@fnarfbargle.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-05 21:43:52 +10:00
Czarnowska, Anna 64385908bb Create: fix size after setting default chunk
When -e option is given then the first validate_geometry
sets default chunk. Size must be rounded there and do_default_chunk
needs to be set to 0 so that we don't repeat the message below.

If we start without st then what we find on the the first disk determines
the st and sets chunk. So after running
validate_geometry on the first disk we need to fix the size too.
At this point chunk should always be set but it is safer to keep the check.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-05 09:29:45 +10:00
Czarnowska, Anna db975ab5c3 Create: check for UnSet when looking at chunk
A default chunk size of 0 gets modified to UnSet, so any location that
checks for !chunk really needs to check for !(chunk || chunk == UnSet).

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-05 09:27:40 +10:00
Adam Kwolek 84f3857fec FIX: After discarding array give chance monitor to remove it
When raid0 expansion occurs, takeover operation is used.
After backward takeover monitor remains in memory.

This happens due to remaining just removed active array in mdmon structures.
If there is no other monitored arrays, mdmon has to finish his work.

Problem was introduced in patch (2011.03.22):
    mdmon: Stop keeping track of RAID0 (and LINEAR) arrays.
Prior to this patch mdmon kicking occurs via replace_array() where
wakeup_monitor() was called.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-05 09:24:17 +10:00
NeilBrown eb28e119b0 Monitor: avoid NULL dereference with 0.90 metadata
0.90 array do not report the metadata type in /proc/mdstat, so
we cannot assume that mse->metadata_version is non-NULL.

So add an appropriate check.

Reported-by: Eugene <hdejin@yahoo.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-05 09:16:57 +10:00
Piergiorgio Sartor af3c375034 RAID-6 check standalone code cleanup
Major change is code cleanup and simplification.
Furthermore, a better error handling and a couple
of bug fixes.
Last but not least, the command line parameters are
changed from "bytes" to "stripes", which is more
convenient, I guess.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-05 09:16:55 +10:00
Piergiorgio Sartor a9c2c6c697 RAID-6 check standalone md device
Allow RAID-6 check to be passed only the
MD device, start and length.
The three parameters are mandatory.

All necessary information is collected using
the "sysfs_read()" call.
Furthermore, if "length" is "0", then the check
is performed until the end of the array.

Some checks are done, for example if the md device
is really a RAID-6. Nevertheless I guess it is not
bullet proof...

Next patch will include the "suspend" action.
My idea is to do it "per stripe", please let me
know if you've some better options.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-05 08:56:41 +10:00
NeilBrown 78c0a3b17f Split some of util.c into a new lib.c
Some of util.c is dependent on lots of other code, some of it
is stand-alone.
Move some of the stand-alone stuff into a new lib.c so it can be used
by smaller utilities.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-05 08:44:54 +10:00
NeilBrown 32367cb558 split name/number maps into separate file.
This reduced some interdependencies between files.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-05 08:40:49 +10:00
NeilBrown 679eb882fc Move WaitClean from sysfs to Monitor.c
It might not really belong in Monitor, but it really doesn't
belong in sysfs.c, and fits well with Wait()

Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-05 08:21:03 +10:00
NeilBrown 7b0bbd0f71 Release 3.2.1
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-28 13:30:29 +11:00
NeilBrown 76ae482075 test: Don't use dev6 and dev7 together in a non-multipath test
dev6 and dev7 refer to the same storage and are used for
multipath testing.  So using them both in any other test will
be confusing.  So change 11spare-migration test 5 to use
dev10 rather than dev7

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-28 13:24:04 +11:00
Hawrylewicz Czarnowski, Przemyslaw aae4c11166 imsm: reading of UEFI variables needs an update
Content of EFI variable is stored in "data" file. Moreover size of data
provided by given variable can be initially validated by reading value of
"size" file.
Function read_efi_variable() has been introduced to simplify the code.

Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-28 10:42:07 +11:00
Hawrylewicz Czarnowski, Przemyslaw 96687b797f imsm: remove OEM table from detection of OROM and EFI.
OEM table does not suit our needs so it cannot be used.
This patch removes feature added in commit 8a0bf4f378.

Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-28 10:41:35 +11:00
NeilBrown ab65d72387 tests: Make sure config file is empty when required.
We need to have no config at all for this test so
make sure it is empty.

Reported-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-28 10:41:09 +11:00
Czarnowska, Anna ed02d9cc04 tests: use $config to store test config path
We also need to tell Monitor where to look for Policy in 11spare-migration tests

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-28 10:34:17 +11:00
NeilBrown 7187750e8d open_dev_excl: allow device to be read-only.
For many operations we don't need a writable device.  So if
opening O_RDWR fails in open_dev_excl, then try again O_RDONLY.

If we really needed write, a subsequent operation will failed.  But
if we didn't, we succeed when otherwise we wouldn't have.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-24 14:21:58 +11:00
NeilBrown 972728bb1b tests: use /tmp/mdadm.conf rather than /etc/mdadm.conf.
Modifying /etc/mdadm.conf for testing is just wrong.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-24 12:45:23 +11:00
NeilBrown 51d9a2ce33 Merge branch 'master' into devel-3.2
Conflicts:
	Incremental.c
	Manage.c
	ReadMe.c
	inventory
	mdadm.8.in
	mdadm.spec
	mdassemble.8
	mdmon.8
2011-03-24 12:00:55 +11:00
Krzysztof Wojcik b4add146a0 FIX: imsm: Do not change serial if disk failed
This patch rollback one change connected with mdadm-OROM
compatibility:
adding ':0' at the end of disk serial number if disk is
detected as failed.
Current mdadm's implementation does not distinguish two
cases when disk is marked as failed:
1. If disk is really failed- disconnected, broken
2. Just marked as failed by mdadm- using "-f" option

Second case is not yet fully handled and compatible with
IMSM standard.
Changing serial number of existing, operational disk causes
problems in "thunderdome" and "load_super" functions that use
serial numbers to disks comparisons and searching.
The change must be recalled until full support will be
developed.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-24 10:15:01 +11:00
Krzysztof Wojcik e53d022c72 FIX: Tests: raid0->raid10 without degradation
raid0->raid10 transition needs at least 2 spare devices.
After level changing to raid10 recovery is triggered on
failed (missing) disks. At the end of recovery process
we have fully operational (not degraded) raid10 array.

Initialy there was possibility to migrate raid0->raid10
without recovery triggering (it results degraded raid10).
Now it is not possible.
This patch adapt tests to new mdadm's behavior.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-24 10:11:58 +11:00
Krzysztof Wojcik 3a5d04735b FIX: imsm: Rebuild does not start on second failed disk
Problem:
If we have an array with two failed disks and the array is in degraded
state (now it is possible only for raid10 with 2 degraded mirrors) and
we have two spare devices in the container, recovery process should be
triggered on booth failed disks. It does not.
Recovery is triggered only for first failed disk.
Second failed disk remains unchanged although the spare drive exists
in the container and is ready to recovery.

Root cause:
mdmon does not check if the array is degraded after recovery of first
drive is completed.

Resolution:
Check if current number of disks in the array equals target number of disks.
If not, trigger degradation check and then recovery process.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-24 10:10:56 +11:00
NeilBrown 945f0fcd01 Release mdadm-3.1.5
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 15:50:18 +11:00
NeilBrown ebdc4d1253 Incr: don't exclude 'active' devices from auto inclusion in a container.
For containers, it is always appropriate to include a device in the
container.
Whether it should then be included in an array is a separate question.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 15:42:35 +11:00
NeilBrown fb0d4b9ca2 --stop: separate 'is busy' test for 'did it stop properly'.
Stopping an md array requires that there is no other user of it.
However with udev and udisks and such there can be transient other
users of md devices which can interfere with stopping the array.

If there is a transient users, we really want "mdadm --stop" to wait a
little while and retry.
However if the array is genuinely in-use (e.g. mounted), then we
don't want to wait at all - we want to fail immediately.

So before trying to stop, re-open device with O_EXCL.  If this fails
then the device is probably in use, so give up.

If it succeeds, but a subsequent STOP_ARRAY fails, then it is possibly
a transient failure, so try again for a few seconds.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 15:42:24 +11:00
NeilBrown a28232b83f Assemble: improve efficacy of -Af in assembling degraded dirty arrays.
If a degraded dirty array has some superblocks which are clean and
others that are dirty, and the dirty ones are newer by precisely '1'
in the event count, then the current code to force the array to be
clean will not work.
We need to make sure to find a superblock with most recent event count
and force that one to be 'clean'.

Reported-by: A J Wyborny <ajwyborny@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 12:10:31 +11:00
Labun, Marcin ea2bc72b00 super-intel: enable loading metadata from non-IMSM compliant disks
Honor ignore_hw_compat to load metadata from disk attached to non-IMSM
controller or when there are no IMSM OROM/EFI capabilities.
Used only for guessing and examining metadata format.

Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 12:05:53 +11:00
Labun, Marcin df3346e675 examine: allows to examine a disk metadata on non-metadata compliant systems
Allow for loading metadata from disk attached to non-metadata compliant
system. Affects mdadm --examine and guess_super.

Added ignore_hw_compat in supertype to pass information to load_super
handler. If ignore_hw_compat is set the handler should load metadata
also from disks that do not comply with metadata requirements (i.e. disk is not
attached to native controller, etc).

Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 12:04:46 +11:00
Adam Kwolek 246cebdb76 man mdadm: Add note about auto-assembly during array reshape
Add note to man that auto-assembly cannot be used for reshaped arrays.

Revisions: NeilBrown

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 12:02:28 +11:00
Adam Kwolek ca24ddb08d man mdadm: add information for MDADM_EXPERIMENTAL flag
Update man for MDADM_EXPERIMENTAL flag.

Minor revisions by Mathias Burén <mathias.buren@gmail.com> and Neil Brown.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 11:45:03 +11:00
NeilBrown ce7a187b9a Monitor: handle v.quick removal of devices better.
If a device fails and then is removed before Monitor sees
the failure, GET_DISK_INFO returns nothing so Monitor relies
on mdstat info where '_' is incorrectly interpreted as 'a spare'.

We should treat '_' as 'removed' - that is safer.

Without this, a v.quick fail+remove gets reported as 'Failed' then
'SpareActive'.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 11:09:48 +11:00
NeilBrown 7c19b781d5 ddf: fix up detection of failed/missing devices.
If a device hasn't been found yet we can still tell if it is
expected to be working, and we must to do to make sure
'working_disks' is correct.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 11:09:38 +11:00
Piergiorgio Sartor b306624e2b restripe: allow test code to have an offset on each device.
If device name ends :number, e.g.
   /dev/sda0:1234

then assume the RAID data starts that many sectors from start of
device.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 11:09:29 +11:00
NeilBrown 89ced23b58 Assemble: improve efficacy of -Af in assembling degraded dirty arrays.
If a degraded dirty array has some superblocks which are clean and
others that are dirty, and the dirty ones are newer by precisely '1'
in the event count, then the current code to force the array to be
clean will not work.
We need to make sure to find a superblock with most recent event count
and force that one to be 'clean'.

Reported-by: A J Wyborny <ajwyborny@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 11:07:27 +11:00
NeilBrown 7023e0b8ae mdmon: Stop keeping track of RAID0 (and LINEAR) arrays.
Tracking RAID0 arrays doesn't really work.  There is no need,
and there are some sysfs files which won't exist when the array
appears and then won't be opened when the level is changed.

So simply ignore RAID0 and LINEAR arrays - don't add them when they
appear and if an array we are monitoring turns into one of these,
discard it promptly.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-22 17:23:17 +11:00
NeilBrown d998b738f5 mdmon: don't wait for O_EXCL when shutting down.
If mdmon is shutting down because there are no devices
left to look at, then don't wait 5 seconds for an O_EXCL open,
and that can block progress of --grow.

Only wait for O_EXCL if we received a signal.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-22 16:10:22 +11:00
NeilBrown 4e2c1a9a32 mdmon: allow manage_member to cope with ->container becoming NULL.
As monitor() can set ->container to NULL, we need to be careful
about dereferencing it.
So take a copy in manage_member, return if it is NULL, and only
use the copy.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-22 14:52:37 +11:00
NeilBrown 0d5ac3c6ef Grow: increase raid_disks before adding specific spares.
When we add spared that have been targeted at a specific slot,
we need raid_disks to be bigger than the slot number.
But currently we don't increase raid_disks until after we add
these spares.

So introduce an early increase of raid_disks to allow the spares
to be added.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-22 14:52:36 +11:00