raid10 currently uses the 'backup_blocks' field to store something
else: a minimum offset change.
This is bad practice, we will shortly need to have both for RAID5/6,
so make a separate field.
Signed-off-by: NeilBrown <neilb@suse.de>
This allows the metadata on a device to be saved and later restored.
This can be useful before experimenting on an array that is misbehaving.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
With the 'devnm' infrastructure fixed, it is quite easy to support
names like "md_home" for md arrays.
The currently defaults to "off" and can be enabled in mdadm.conf with
CREATE names=yes
This is incase other tools get confused by the new names.
Signed-off-by: NeilBrown <neilb@suse.de>
We widely use a "devnum" which is 0 or +ve for md%d devices
and -ve for md_d%d devices.
But I want to be able to use md_%s device names.
So get rid of devnum (a number) and use devnm (a 32char string).
eg.
md0
md_d2
md_home
Signed-off-by: NeilBrown <neilb@suse.de>
the code that was exposed on anything else than dietlibc and klibc
is entirely glibc specific and broke the build on musl libc.
Signed-off-by: John Spencer <maillist-mdadm@barfooze.de>
Signed-off-by: NeilBrown <neilb@suse.de>
We still allow --offroot to be given - for compatibility with scripts
- but ignore it.
The whole point of --offroot is to get systemd to not auto-kill mdmon,
and we always want that.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
map_dev can be slow so it is best to not call it when
not necessary.
The final test in "find_free_devnum" is not relevant when
udev is being used, so remove the test in that case.
Signed-off-by: NeilBrown <neilb@suse.de>
--replace can be used to replace a device without completely failing
it. Once the replacement completes the device will be failed.
--with can indicate which of several spares to use.
Signed-off-by: NeilBrown <neilb@suse.de>
mdadm --create /dev/md0 .... /dev/sda1:1024 /dev/sdb1:2048 ...
The size is in K unless a suffix: K M G is given.
The suffix 's' means sectors.
Signed-off-by: NeilBrown <neilb@suse.de>
Some arrays (raid10) never need a backup file, so during assembly
we can avoid the whole Grow_continue check in that case.
Achieve this using a flag set by the metadata handler.
Also get "mdadm -I" to fail if a backup process would be
needed. It currently does fail as the kernel rejects things,
but it is nicer to have this explicit.
Signed-off-by: NeilBrown <neilb@suse.de>
This can be used to over-ride the automatic assignment of
data offset.
For --create, it is useful to re-create old arrays where different
defaults applied.
For --grow it may be able to force a reshape in the reverse direction.
Signed-off-by: NeilBrown <neilb@suse.de>
This is currently only useful for 1.x metadata and will allow an
explicit --data-offset request on command line.
Signed-off-by: NeilBrown <neilb@suse.de>
We will shortly introduce --data-offset= which is allowed to
be zero. We will want to use parse_size() so it needs to be
able to return '0' without it being an error.
So define INVALID_SECTORS to be an impossible value (currently '1')
and return and test for it consistently.
Signed-off-by: NeilBrown <neilb@suse.de>
1/ When printing the "name=" entry for --brief output,
enclose name in quotes if it contains spaces etc.
Quotes are already supported for reading mdadm.conf
2/ When a name is used as a device name, translate spaces
and tabs to '_', as well as the current translation of
'/' to '-'.
Signed-off-by: NeilBrown <neilb@suse.de>
When using human_size_brief, only IEC prefixes were supported. Now
it's possible to specify which format we want to see - either IEC
(kibi, mibi, gibi) or JEDEC (kilo, mega, giga).
Signed-off-by: Maciej Naruszewicz <maciej.naruszewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
An error in parse_size() should be reported by 0, not -1,
because -1 is changed to the max value of unsigned long long
during calculations of size (e.g. at mdadm.c:412).
A negative value of size should be reported as error
(e.g. size equal -1 has been changed to the max value of
unsigned long long so far).
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Both are impossible, and '1' allows size to be unsigned,
which is neater.
Also #define MAX_SIZE to be '1' to make it all more explicit.
Signed-off-by: NeilBrown <neilb@suse.de>
The value of 'verbose' is sometimes mixed into 'brief', particularly
for Examine.
This is messy and confusing. So keep them separate.
'brief' still gets assumed when 'scan' is set, unless we are very
verbose.
Signed-off-by: NeilBrown <neilb@suse.de>
If we change some functions to accept 'verbose', where <0 means to be
quiet, in place of 'quiet', then we will be able to merge
'quiet' and 'verbose' together for simplicity.
Signed-off-by: NeilBrown <neilb@suse.de>
Rather than passing a long list of little flags etc to various
functions we will soon pass a small collection of structures.
This first step combines a collection of variables local to
'main' into a single structure.
Signed-off-by: NeilBrown <neilb@suse.de>
malloc should never fail, and if it does it is unlikely
that anything else useful can be done. Best approach is to
abort and let some super-daemon restart.
So define xmalloc, xcalloc, xrealloc, xstrdup which don't
fail but just print a message and exit. Then use those
removing all the tests for failure.
Also replace all "malloc;memset" sequences with 'xcalloc'.
Signed-off-by: NeilBrown <neilb@suse.de>
As we don't allow '-K' for '--zero-super' there is no point
using it internally. Just define a 'KillOpt' like with
other options.
Signed-off-by: NeilBrown <neilb@suse.de>
->percent sometimes stores negative values recording states
like 'pending' or 'delayed'.
The value '-2' means both 'delayed' and in Monitor, 'unknown'.
Also, '-1' has a meaning but not #define.
So change the #defines to be prefixed with "RESYNC_", instead
of "PROCESS_", add new "_NONE" and "_UNKNOWN", and use correct
value in each location.
Signed-off-by: NeilBrown <neilb@suse.de>
Now that /run seems to be a good standard, make that
the default for storing various run-time files, rather than
/var/run or /dev/.mdadm.
Signed-off-by: NeilBrown <neilb@suse.de>
mdinfo->bitmap_offset is a signed long and needs to be treated as
such when passed to the kernel.
This resolves the problem with adding internal bitmaps to a 1.0 array.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Both --detail and --monitor can report the names of member
devices on an array, and do so by searching /dev and finding
the shortest name that matches.
If
--prefer=foo
is given, they will instead prefer a name that contain /foo/.
So
mdadm --detail /dev/md0 --prefer=by-path
will list the component devices via their /dev/disk/by-path/xxx
names.
Signed-off-by: NeilBrown <neilb@suse.de>
When we can for devices using GET_DISK_INFO we currently
limit to 1024. But some arrays can have more than this.
So raise it to 4096 and make the constant a #define.
Signed-off-by: NeilBrown <neilb@suse.de>
Function reshape_super() guards metadata changes.
It is used to apply changes rollback in error case also.
As change (apply and rollback) can be not bi-directional reshape_super()
has to know if current action is metadata change that should be guarded
using metadata restrictions, or this is metadata rollback change
executed due to error occurrence.
In second case change has to be unconditional.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
As the bitmap can be before the superblock, bitmap_offset is signed.
But some of the code didn't honour that :-(
Signed-off-by: NeilBrown <neilb@suse.de>
It can easily be calculated from 'avail' and 'raid_disks', and we
will soon have a case where we don't have it easily available to pass
in.
Signed-off-by: NeilBrown <neilb@suse.de>
When --offroot is specified, mdadm will change the first character of
argv[0] to '@'. This is used to signal to systemd that mdadm was
launched from initramfs and should not be shut down before returning
to the initramfs.
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If a 'remove' fails there is no certainty that another event will
happen soon, so make sure we retry soon anyway.
Reported-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Adding a bitmap via ioctl can only add it at a fixed location.
That location is not suitable for 4K-block devices.
So allow setting the bitmap location via sysfs if kernel supports it
and aim to always use 4K alignments.
Signed-off-by: NeilBrown <neilb@suse.de>
This fields doesn't work any more as ->getinfo_super clears the info
structure at an awkward time. So get rid of it and do it differently.
The issue is that the metadata handler cannot tell if the uuid it has
was randomly generated or explicitly requested, except on the first
call.
And we don't want to accept explicit requests for IMSM.
So when it was auto-generated, make it look distinctive by having the
same int copied in all 4 positions. If someone requests a uuid like
that, I guess they get away with it.
Reported-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
mdadm allowes to assemble 2 volumes with the same names based on the
config file. The issue is fixed by iterating over the list of md device
identifiers and comparing the names of md devices against each other,
detecting identical names and blocking the assembly should the same names
be found.
Now having detected duplicate names, mdadm terminates without assembling
the container, displaying appropriate prompt.
Signed-off-by: Lukasz Orlowski <lukasz.orlowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
search_mdstat and conf_match are almost identical.
Put all the functionality in conf_match, and remove search_mdstat.
Reported-by: Jes.Sorensen@redhat.com
Signed-off-by: NeilBrown <neilb@suse.de>
container_content retrieves volume information from disks in the
container. For unsupported volumes the function was not returning
mdinfo. When all volumes were unsupported the function was returning
NULL pointer to block actions on the volumes. Therefore, such volumes
were not activated in Incremental and Assembly. As side effect they
also could not be deleted using kill-subarray since "kill" function
requires to obtain a valid mdinfo from container_content.
This patch fixes the kill-subarray problem by allowing to obtain
mdinfo of all volumes types including unsupported and introducing new
array.status flags.
There are following changes:
1. Added MD_SB_BLOCK_VOLUME for blocking an array, other arrays in the
container can be activated.
2. Added MD_SB_BLOCK_CONTAINER_RESHAPE block container wide reshapes
(like changing disk numbers in arrays).
3. IMSM container_content handler is to load mdinfo for all volumes
and set both blocking flags in array.state field in mdinfo of
unsupported volumes. In case of some errors, all volumes can be
affected. Only blocked array is not activated (also reshaped as
result). The container wide reshapes are also blocked since by
metadata definition they require modifications of both arrays.
4. Incremental_container and Assemble functions check array.state and
do not activate volumes with blocking bits set.
5. assemble_container_content is changed to check container wide reshapes
before activating reshapes of assembled containers.
6. Grow_reshape and Grow_continue_command checks blocking bits
before starting reshapes or continueing (-G --continue) reshapes.
7. kill-subarray ignores array.state info and can remove requested array.
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When container is assembled while reshape is active on one of its member
whole container can be required to be blocked from monitoring.
For such purpose field recovery blocked is added to mdinfo structure.
When metadata handler finds active reshape in container it should set
recovery_blocked field to disable whole container monitoring during
reshape.
For arrays that doesn't use containers, recovery_blocked field
has the same value as reshape_active field e.g. super0/1.
In fact,recovery is blocked during reshape for such arrays.
For ddf, metadata handler doesn't set reshape_active field,
so recovery_blocked is not set also.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
To allow for reshape continuation '--continue' option is added
to grow command.
Function that will be executed in grow-continue case doesn't require
information about reshape geometry. All required information are read
from metadata.
For external metadata reshape can be run for monitored array/container
only. In case when array/container is not monitored run mdmon for it.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
During initrd phase continuing reshape will cause file system context
lost. This blocks ability to control reshape using checkpoints.
To avoid this, during initrd phase assemble has to be executed with
'--freeze-reshape' option. This causes that mdadm restores reshape
critical section only.
Reshape can be continued later after system full boot.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
1. Three missing map_unlock() calls were added.
2. Map file must be unlocked on fork, else child will hold lock.
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When validate_geometry finds that we haven't committed to
a metadata yet and that the subdev is a member of 'our'
container, it needs to report any errors it finds as Create()
cannot report them effectively.
So make a slight change to the semantics of the 'verbose' flag
and allow validate_geometry to report if it printed any error
messages.
Signed-off-by: NeilBrown <neilb@suse.de>
Reshape backup should be able to be restored during reshape continuation
also. To reuse already existing code it is moved to function.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If you create a two drive raid1 array with one device writemostly, then
fail the readwrite drive, when you add a new device, it will get the
writemostly bit copied out of the remaining device's superblock into
it's own. You can then remove the new drive and readd it as readwrite,
which will work for the readd, but it leaves the stale WriteMostly1 bit
in devflags resulting in the device going back to writemostly on the
next assembly.
The fix is to make sure that A) when we readd a device and we might have
filled the st->sb info from a running device instead of the device being
readded, then clear/set the WriteMostly1 bit in the super1 struct in
addition to setting the disk state (ditto for super0, but slightly
different mechanism) and B) when adding a clean device to an array (when
we most certainly did copy the superblock info from an existing device),
then clear any writemostly bits.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
0.90 arrays can only use up to 4TB per device. So when a larger
device is added, complain a bit. Still allow it if --force is given
as there could be a valid use.
Signed-off-by: NeilBrown <neilb@suse.de>
Initially there is no proper translation mdstat's DELAYED/PENDING processes
to "--detail" output.
For example, if we have recover=DELAYED in mdstat, "--detail"
shows "State: recovering" and "Rebuild Status = 0%".
It was incorrect in case of process waiting on checkpoint different
than 0%. In fact rebuild status is differnt than 0% and user is misled.
The patch fix the problem. Current "--detail" command shows
in the exampe: "State: recovering (DELAYED)" and no information
about precentage.
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Patch introduces support for reshape process restart for external metadata
using metadata specific data handling methods.
It introduces recover_backup() function that restores array to stable state
It is equivalent to Grow_restart() functionality for native metadata.
Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Before reshape is started, mdadm should check again if there is only one
array (in container) under reshape. Then function "divides" array in to
"migration units" that can fits migration copy area and enters main loop.
It checks if current "migration unit" requires to be backed up.
If necessary mdadm saves it to copy area and updates migration record.
Then MD-driver is directed to perform reshape step (by "migration unit" size)
and checkpoint is moved forward. In this way reshape is executed until
array ends.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
For external metadata backup location and saving methods depends
on metadata specific implementation details. Currently restore_stripes()
function is able to restore data only from the given backup file handles
and it is used only for assembling partially reshaped arrays.
As this function will be very helpful for external metadata backup
mechanism, add the support for restoring data from the given source buffer.
Add possibility for save_stripes() to work without designation targets.
Save_stripes() can now prepare data for restore_stripes() only.
Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Some code currently clears 'info' before calling getinfo_super,
some code doesn't.
To be consistent, change it so no caller ever clears 'info',
but ever getinfo_super function must clear it.
Note that ->raid_disk may be meaningful if that 'map' is passed
non-NULL. In that case it is copied out before the structure
is zeroed.
Signed-off-by: NeilBrown <neilb@suse.de>
When an array is resized to have larger members, --assume-clean will
disable any resync if the kernel supports it (2.6.40 and later).
Signed-off-by: NeilBrown <neilb@suse.de>
Allow for loading metadata from disk attached to non-metadata compliant
system. Affects mdadm --examine and guess_super.
Added ignore_hw_compat in supertype to pass information to load_super
handler. If ignore_hw_compat is set the handler should load metadata
also from disks that do not comply with metadata requirements (i.e. disk is not
attached to native controller, etc).
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If mdmon is shutting down because there are no devices
left to look at, then don't wait 5 seconds for an O_EXCL open,
and that can block progress of --grow.
Only wait for O_EXCL if we received a signal.
Signed-off-by: NeilBrown <neilb@suse.de>
If single-disk RAID0 or RAID1 array is created, user may preserve data on
disk. If array given size covers all partitions on disk, all data will be
available on created array. If array size is too small (not covers
all partitions), data will be not accessible.
This patch introduces warning message during array creation if given size
is too small. User may interrupt creation process to avoid data loss.
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This is done via conversion to RAID4 and back.
To grow the array, extra devices will be needed which cannot
already be present as spares - so allow a list of new devices
to be included in grow request which changed the number of devices.
Signed-off-by: NeilBrown <neilb@suse.de>
As containers can now grow, we need to use both Grow_restart (to
replay any backup-file) and Grow_continue when assembling the content
of a container.
Note that we don't pass a backup-file when doing incremental assembly.
If such is needed in that case, the assembly will fail.
To restart such arrays, explicit assembly is required.
Signed-off-by: NeilBrown <neilb@suse.de>
When chunk size is not set from command line we need to guess it
depending on metadata given on command line or found on listed devices.
Validate_geometry sets the default for it's metadata if chunk is not set.
For external metadata chunk is set only when creating in a container.
For imsm validate_geometry_imsm_orom is responsible for finding default
chunk depending on container metadata loaded. Container will already know
which controller it is attached to, and have this controllers orom
available.
do_default_chunk indicates that we need to find default chunk and
if validate_geometry fails for some metadata it tells us to reset chunk
that may have been set.
Current solution would set default chunk correctly for imsm only if
container device was given on command line. With the list of devices
chunk was always set to 512.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Delta_disk can be set to UnSet value.
This can a cause to pass wrong parameter to reshape_super().
To avoid such situations raid_disks and delta_disks parameters
have to be passed to reshape_super() separately.
It will be up to reshape_super() function validation
and usage of this parameters to avoid not valid values.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
For containers, it is always appropriate to include a device in the
container.
Whether it should then be included in an array is a separate question.
Signed-off-by: NeilBrown <neilb@suse.de>
Arrays on partitions are not supported for external metadata
so do not take such spare from native array.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Neil,
Please consider this patch that once was discussed and I think agreed with in general direction. It was sent a while ago
but somehow did not merged into your devel3-2. This patch enables hot-plug of so called bare devices (as understand by domain policies rules in mdadm.conf).
Without this patch we do NOT serve hot-plug of bare devices at all.
Thanks,
Marcin Labun
Subject was: FW: Autorebuild, new dynamic udev rules for hot-plugs
>>From c0aecd4dd96691e8bfa6f2dc187261ec8bb2c5a2 Mon Sep 17 00:00:00 2001
From: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Date: Thu, 23 Dec 2010 16:35:01 +0100
Subject: [PATCH] Dynamic hot-plug udev rules for policies
Cc: linux-raid@vger.kernel.org, Williams, Dan J <dan.j.williams@intel.com>, Ciechanowski, Ed <ed.ciechanowski@intel.com>
When introducing policies, new hot-plug rules were added to support
bare disks. Mdadm was started for each hot plugged block device
to determine if it could be used as spare or as a replacement member for
degraded array.
This patch introduces limitation of range of devices that are handled
by mdadm.
It limits them to the ones specified in domains associated with
the actions: spare-same-port, spare and spare-force.
In order to enable hot-plug for bare disks one must update udev rules
with command
mdadm --activate-domains[=filename]
Above command writes udev rule configuration to stdout. If 'filename'
is given output is written to the file provided as parameter. It is up
to system administrator what should be done later. To make such rule
permanent (i.e. remain after reboot) rule should be writen to
/lib/udev/rules.d directory. Other cases will just need to write it to
/dev/.udev/rules.d directory where temporary rules lies. One should be
aware of the meaning of names/priorities of the udev rules.
After mdadm.conf is changed one is obliged to re-run
"mdadm --activate-domains" command in order to bring the system
configuration up to date.
All hot-plugged disks containing metadata are still handled by existing
rules.
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When we restart an array in the middle of a reshape, we reuse a lot of
the code for starting the reshape, but it needs to know that
circumstances are slightly different.
So add a 'restart' arg which is used:
- skip checking and adding spares
- activate the array (rather than start reshape)
- allow the backup file to already exist
Signed-off-by: NeilBrown <neilb@suse.de>
Child_monitor was design to perform 'manage_reshape' for native
arrays. So change the signature for ->manage_reshape to match
child_monitor and move the all to the same place that child_monitor
is called from.
Also give super-intel a manage_reshape handler which simple calls
child_monitor.
Signed-off-by: NeilBrown <neilb@suse.de>
container_chose_spares in Monitor.c and
get_spares_for_grow in super-intel.c
do the same thing: search for spares in a container.
Another version will also be needed for Incremental
so a more general solution is presented here and
applied in two previous contexts.
Normally domlist==NULL would lead an empty list but
this is typically checked earlier so here it is interpreted
as "do not test domains".
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
uuid_match_any is replaced by uuid_zero for imsm spares.
Function fixup_container_spare_uuid not needed as it gives
unwanted uuid to spares.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Sometimes one metadata update will require allocating several
larger data structures. As 'monitor' cannot allocate, 'manager'
must, so it must be able to attach a list of allocates to the
update, and importantly it must be able to easily free them.
So add a 'space_list' element to metadata updates where each
element on the list starts with a pointer to the next.
Signed-off-by: NeilBrown <neilb@suse.de>
During Online Capacity Expansion metadata has to be updated to show
array changes and allow for future assembly of array. To do this
mdadm prepares and sends reshape_update metadata update to mdmon.
The update contains the old and new number of raid disks, and the
indices of the spare disks that will be used to fill the spaces.
This works as follows:
1. reshape_super() prepares metadata update.
2. mdadm discovers the spares and adds them to the array
3. mdadm sends the update to mdmon
4. managemon in prepare_update() allocates required memory for bigger
device object
5. monitor in process_update() updates the metadata to record the
new sizes and the newly assigned devices.
6. mdadm initiates the reshape
Based on code From: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Sometimes we want to convert a devnum to a devname without allocating
memory. So provide function to do the formatting without allocation.
Signed-off-by: NeilBrown <neilb@suse.de>
Manager thread shall pass the information to monitor thread (mdmon)
that some devices are removed from container. Otherwise, monitor
(mdmon) might use such devices (spares) to rebuild the array that has
gone degraded.
This problem happens for imsm containers, since a list of the
container disks is maintained in intel_super structure. When array
goes degraded, the list is searched to find a spare disks to start
rebuild. Without this fix the rebuild could be stared on the spare
device that was a member of the container, but has been removed from
it.
New super type function handler has been introduced to prepare
metadata format specific information about removed devices.
int (*remove_from_super)(struct supertype *st, mdu_disk_info_t *dinfo)
The message prepared in remove_from_super is later processed by
process_update handler in monitor thread.
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This is useful with 1.1 and 1.2 metadata to update the metadata if
the device size has changed.
The same functionality can be achieved by writing to the device size
in sysfs after re-adding normally, but in some cases this might be
easier.
Signed-off-by: NeilBrown <neilb@suse.de>
Growing an array when there aren't enough spares can make the array
degraded. This works but might not be what is wanted.
So warn the user in this case and require a --force to go ahead
with the reshape.
Signed-off-by: NeilBrown <neilb@suse.de>
Move opening backup file to the function for future reuse during
container reshape.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Though not having the proper backup file can cause data corruption, it
is not enough to justify not being able to start the array at all.
So allow "--invalid-backup" to be specified which says "just continue
even if a backup cannot be restored".
Signed-off-by: NeilBrown <neilb@suse.de>
As chunk_size in mdstat_ent is never set, we shouldn't copy
it into a->info.array.
In fact, it is safest to get rid of the field altogether.
Reported-by: "Kwolek, Adam" <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
number of backup blocks evaluation is put in to function for code reuse.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
fd handles table creation is put in to function for code reuse.
In manage_reshape(), child_grow() function from Grow.c will be reused.
To prepare parameters for this function, code from Grow.c can be
reused also.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Due to fact that IMSM Windows compatibility was not tested yet,
feature has to be treated as experimental until compatibility
verification will be performed.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Also not that the leading '-' on the metadata names now
simply means that mdmon must not reconfiure the array.
Signed-off-by: NeilBrown <neilb@suse.de>
For consistency with makedev().
int is not sufficient.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If a devices - typically in a mirrored set - is assembled
independently of the other devices, and then attempted to be brought
back into the set, it could contain inconsistent data. It should not
be included.
So detect this situation by ensuring that the 'most recent' device is
believed to be active by every other device. If a device is wayward,
it will only consider fellow wayward devices to be active and will
think all others are failed or missing.
This patches fixes --incremental, --assemble was done in an earlier
patch.
Signed-off-by: NeilBrown <neilb@suse.de>
In several cases, two different long options map to the same short
option. So e.g. you could give '--brief' and it would be interpreted
as '--bitmap'. That isn't really good.
So for every shared short option, define an option number and return
that for the long option instead. Then always check for both the
short and long options.
Also give some bugs like " mode == 'G'" which should be '== GROW'.
Signed-off-by: NeilBrown <neilb@suse.de>
In the native metadata case Grow_reshape() and the kernel validate what
reshapes are possible / supported and the kernel handles all the metadata
updates. In the external case the metadata format may have specific
constraints above this baseline. External formats also introduce the
constraint of only permitting some reshapes at container scope versus subarray
scope. For exmaple imsm changes to 'raiddisks' must be applied to all arrays
in the container.
This operation assumes that its 'st' parameter has been obtained from
super_by_fd() (such that st->subarray is up to date), and that a snapshot of
the metadata has been loaded from the container.
Why a new method, versus extending an existing one?
->validate_geometry: this routine assumes it is being called from Create(),
adding reshape complicates the cases that this routine needs to handle. Where
we find that checks can be shared between the two cases those routines
refactored into common code internal to the metadata handler, i.e. no need to
provide a unified external interface. ->validate_geometry() also does not
expect to update the metadata.
->update_super: this is meant to update single fields at Assembly() and only at
the container scope. Reshape potentially wants to update multiple fields at
either container or subarray scope.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Support metadata specific level, layout and chunksize defaults. Kill an
uneeded superswitch methods ahead of adding more for the reshape case.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
In order to support reshape and atomic removal of spares from containers
we need to prevent mdmon from activating spares. In the reshape case we
additionally need to freeze sync_action while the reshape transaction is
initiated with the kernel and recorded in the metadata.
When reshaping a raid0 array we need to freeze the array *before* it is
transitioned to a redundant raid level. Since sync_action does not exist
at this point we extend the '-' prefix of a subarray string to flag
mdmon not to activate spares.
Mdadm needs to be reasonably certain that the version of mdmon in the
system honors this 'freeze' indication. If mdmon is not already active
then we assume the version that gets started is the same as the mdadm
version. Otherwise, we check the version of mdmon as returned by the
extended ping_monitor() operation. This is to catch cases where mdadm
is upgraded in the filesystem, but mdmon started in the initramfs is
from a previous release.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
...before introducing another open coded instace of this conversion.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Platform (metadata) domain let the metadata handlers differentiate
disk domains based on controllers that the disk belongs to.
Platform domain is sub-domain inside user specified domain
in mdadm.conf configuration files inheriting all parameters from it.
The metadata domain name is used disk domain matching functions.
The disk with the same metadata domain name belong to the same metadata
domain.
New metadata handler is added that retrieves platform domain string based
on disk path:
const char *(*get_disk_controller_domain)(const char *path);
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Rather than only migrating between arrays with the same spare_group,
we now migrate based on domains set in the policy.
In order for spare_group to continue to work, we treat it as a domain
of the destination array, and a domain of any device we might remove
from a source array.
Signed-off-by: NeilBrown <neilb@suse.de>
If getinfo_super is called on a container supertype we only get information
on first disk. As a parameter it uses reference to preallocated
mdinfo structure. Amending getinfo_super to return full list of disks
would require ammending all previous calls and subsequently freeing memory
allocated for mdinfo list.
Function container_content that returns a mdinfo list is written
specifically for assembly, performing actions not needed to just fill
mdinfo. It also does not include spares so is unsuitable.
As an alternative a new function getinfo_super_disks is created
to obtain information about all disks states in array.
Existing function sysfs_free is used to free memory
allocated by getinfo_super_disks.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
--no-sharing option disables moving spares between arrays/containers.
Without the option spares are moved if needed according to config rules.
We only allow one process moving spares started with --scan option.
If there is such process running and another instance of Monitor
is starting without --scan, then we issue a warning but allow it
to continue.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When "mdadm -I" is given a device with no metadata, mdadm tries to add
it as a 'spare' somewhere based on policy.
This patch changes the behaviour in two ways:
1/ If the device is at a 'path' where a previous device was removed
from an array or container, then we preferentially add the spare to
that array or container.
2/ Previously only 'bare' devices were considered for adding as
spares. Now if action=spare-same-slot is active, we will add
non-bare devices, but *only* if the path was previously in use
for some array, and the device will only be added to that array.
Based on code
From: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If the disk is taken out from its port this port information is
lost. Only udev rule can provide us with this information, and then we
have to store it somehow. This patch adds writing 'cookie' file in
/dev/.mdadm/failed-slots directory in form of file named with value of
f<path-id> containing the metadata type and uuid of the array (or
container) that the device was a member of. The uuid is in exactly
the same format as in the mapfile.
FAILED_SLOTS_DIR constant has been added to hold the location of
cookie files.
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
<path-id> allows to identify the port to which given device is plugged
in. In case of hot-removal, udev can pass this information for future
use (eg. write this name as 'cookie' allowing to detect the fact of
reinserting device to the same port).
--path <path-id> parameter has been added to device removal handle
(and char *path has been added to IncrementalRemove() to pass this
value) in order to pass path-id to this handler.
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Now that the next_member loop is much smaller it is easy to
just use 'content' rather than stashing it in 'tmpdev->content'.
So we can remove the 'content' field from 'struct mddev_dev'.
Signed-off-by: NeilBrown <neilb@suse.de>
Remove the _t pointer typedef and remove the _s suffix for the
structure,
These things do not help readability.
Signed-off-by: NeilBrown <neilb@suse.de>
This handles the 'container' part of 'load_super', so we can
soon make them completely separate - it is just confusing to
overload these two.
Signed-off-by: NeilBrown <neilb@suse.de>
This allows the info for a single array to be extracted,
so we don't have to write it into st->subarray.
For consistency, implement container_content for super0 and super1,
to just return the mdinfo for the single array.
Signed-off-by: NeilBrown <neilb@suse.de>
Rather than hiding this in the 'st', return it explicitly.
In the one case we still need it, copy it into st where needed.
This will disappear in a future patch.
Signed-off-by: NeilBrown <neilb@suse.de>
Rather than hiding this arg in the 'st' structure, pass it explicitly.
This is a first step to getting rid of 'subarray' from 'supertype'.
The strcpy in open_subarray should have better error checking, but it
will disappear soon so there is little point.
Signed-off-by: NeilBrown <neilb@suse.de.
To accurately detect when an array has been split and is now being
recombined, we need to track which other devices each thinks is
working.
We should never include a device in an array if it thinks that the
primary device has failed.
This patch just allows get_info_super to return a list of devices
and whether they are thought to be working or not.
Signed-off-by: NeilBrown <neilb@suse.de>
Detail: report reshape and check as well as resync and recovery
Wait: if the resync is pending or delayed, wait for that too.
Signed-off-by: NeilBrown <neilb@suse.de>
If an --add is requested and a re-add looks promising but fails or
cannot possibly succeed, then don't try the add. This avoids
inadvertently turning devices into spares when an array is failed but
the devices seem to actually work.
Signed-off-by: NeilBrown <neilb@suse.de>
Allow disk-policy to be computed given the path and
disk type explicitly. This can be used when hunting through
/dev/disk/by-path for something interesting.
Signed-off-by: NeilBrown <neilb@suse.de>
To support incorpating a new bare device into a collection of arrays -
one partition each - mdadm needs a modest understanding of partition
tables.
The main needs to be able to recognise a partition table on one device
and copy it onto another.
This will be done using pseudo metadata types 'mbr' and 'gpt'.
Signed-off-by: NeilBrown <neilb@suse.de>
A device can be in a number of domains.
The domains of an array is the union of the domains of all devices.
A device is allowed to join an array when its set of domains is a
subset of the array's domains.
Signed-off-by: NeilBrown <neilb@suse.de>
Policy can be stated as lines in mdadm.conf like:
POLICY type=disk path=pci-0000:00:1f.2-* action=ignore domain=onboard
This defines two distinct policies which apply to any disk (but not
partition) device reached through the pci device 0000:00:1f.2.
The policies are "action=ignore" which means certain actions will
ignore the device, and "domain=onboard" which means all such devices
as treated as being united under the name 'onboard'.
This patch just adds data structures and code to read and
manipulate them. Future patches will actually use them.
Signed-off-by: NeilBrown <neilb@suse.de>