[RFC PATCH 2/3] discover: Add support for device-mapper snapshots
Samuel Mendoza-Jonas
sam.mj at au1.ibm.com
Wed Apr 29 10:16:44 AEST 2015
On 29/04/15 02:25, Cyril Bur wrote:
> On Tue, 2015-04-28 at 10:51 +1000, Samuel Mendoza-Jonas wrote:
>> On 28/04/15 02:09, Cyril Bur wrote:
>>> On Mon, 2015-04-27 at 15:52 +1000, Samuel Mendoza-Jonas wrote:
>>>> Device Mapper allows the creation of CoW snapshots backed by ramdisks.
>>>> This will allow Petitboot to perform potentially destructive actions
>>>> such as filesystem recovery without affecting the backing disk.
>>>> This commit adds several functions to create and destroy these snapshots
>>>> and to keep track of available ramdisks.
>>>>
>>> Hey Sam,
>>>
>>> Been waiting to see how you did this, looks nice!
>>>
>>> Got a few comments.
>>>
>>> Firstly, is there a real advantage to the in_use flag? It looks like you
>>> create and destroy the DM device regardless, all you're saving is a few
>>> allocations? Could you do it in such a way that you could save the
>>> DM_DEVICE_CREATE/DM_DEVICE_REMOVE actions, I feel like they're the most
>>> costly.
>>
>> Not strictly required, more an artifact of some older checking. I'm not sure I
>> follow you on avoiding the DM_DEVICE_* actions - there are two things we
>> essentially keep track of:
>
> Going to admit right off the bat that I don't really know how any of the
> DM really works.
>
>> - Available /dev/ramX ramdisks - these are not the snapshots themselves just
>> the backing memory device. Petitboot boots up with 16 of these (IIRC) so we
>> use these first before making new nodes.
>
> If you've got 16 existing /dev/ramX that you use as backing devices,
> shouldn't there be code somewhere which inits handler->ramdisks with 16
> entries?
More accurately, the kernel that Petitboot ships with in the current FSP firmware(s)
happens to initialise 16 ramdisks at boot - Petitboot itself doesn't actually
know this. What happens instead is that Petitboot receives a udev event for each
one (which leads to device_handler_add_ramdisk()).
>
>> - A snapshot for every eligible disk: We DM_DEVICE_CREATE a snapshot when a
>> new valid disk is recognised, and DM_DEVICE_REMOVE it when the disk is unmounted.
>> There's not a way to 'avoid' the actions since the snapshot exists for the life
>> of the disk as far as Petitboot is concerned.
>>
> Right, what I was getting at is that in the event of someone repeatedly
> mounting and unmounting you're going to create and remove
> the /dev/mapper/%s a bunch of times, can you optimise this to keep
> the /dev/mapper/%s around?
>
> If not (which may be entirely possible... I'll take your word for it),
> it looks like ramdisk.in_use corresponds to ramdisk.snapshot being NULL
> or not... (provided you do ramdisk->snapshot = NULL before the out: in
> device_handler_create_snapshot() ).
Strictly speaking, mounting/unmounting the snapshot doesn't require us to destroy
the snapshot (indeed we keep them around when remounting for read-write).
However if we assume Petitboot is the only actor unmounting things the only other
two cases are:
- When the device/snapshot is unmountable (no fs) and we throw it away.
- When we drop all discover_devices and reinit.
I'm open to suggestion on the second idea, but the problem is that when we reinit
we free all our discover_devices, and any knowledge we had of the snapshots goes
out the window. We *could* check that a snapshot already exists when trying to
make one, but this gets a little messy, and arguably reinits are rare enough
that this won't be a performance hit.
>
>> However I don't think I've read your question right so let me know :)
>>
> I believe I have a reputation for bad expression, probably my bad.
>>>
>>>> No functionality change in this patch.
>>>>
>>>> Signed-off-by: Samuel Mendoza-Jonas <sam.mj at au1.ibm.com>
>>>> ---
>>>> discover/Makefile.am | 4 ++
>>>> discover/device-handler.c | 176 ++++++++++++++++++++++++++++++++++++++++++++++
>>>> discover/device-handler.h | 6 ++
>>>> 3 files changed, 186 insertions(+)
>>>>
>>>> diff --git a/discover/Makefile.am b/discover/Makefile.am
>>>> index 7808110..c6bc8af 100644
>>>> --- a/discover/Makefile.am
>>>> +++ b/discover/Makefile.am
>>>> @@ -58,6 +58,10 @@ discover_pb_discover_LDADD = \
>>>> $(core_lib) \
>>>> $(UDEV_LIBS)
>>>>
>>>> +discover_pb_discover_LDFLAGS = \
>>>> + $(AM_LDFLAGS) \
>>>> + -ldevmapper
>>>> +
>>>> discover_pb_discover_CPPFLAGS = \
>>>> $(AM_CPPFLAGS) \
>>>> -DLOCAL_STATE_DIR='"$(localstatedir)"' \
>>>> diff --git a/discover/device-handler.c b/discover/device-handler.c
>>>> index f411d9f..69cd699 100644
>>>> --- a/discover/device-handler.c
>>>> +++ b/discover/device-handler.c
>>>> @@ -10,6 +10,8 @@
>>>> #include <sys/wait.h>
>>>> #include <sys/mount.h>
>>>>
>>>> +#include <libdevmapper.h>
>>>> +
>>>> #include <talloc/talloc.h>
>>>> #include <list/list.h>
>>>> #include <log/log.h>
>>>> @@ -46,6 +48,12 @@ enum default_priority {
>>>> DEFAULT_PRIORITY_DISABLED = 0xff,
>>>> };
>>>>
>>>> +struct ramdisk_device {
>>>> + const char *path;
>>>> + bool in_use;
>>>> + char *snapshot;
>>>> +};
>>>> +
>>>> struct device_handler {
>>>> struct discover_server *server;
>>>> int dry_run;
>>>> @@ -57,6 +65,9 @@ struct device_handler {
>>>> struct discover_device **devices;
>>>> unsigned int n_devices;
>>>>
>>>> + struct ramdisk_device **ramdisks;
>>>> + unsigned int n_ramdisks;
>>>> +
>>>> struct waitset *waitset;
>>>> struct waiter *timeout_waiter;
>>>> bool autoboot_enabled;
>>>> @@ -747,6 +758,171 @@ void device_handler_add_device(struct device_handler *handler,
>>>> network_register_device(handler->network, device);
>>>> }
>>>>
>>>> +void device_handler_add_ramdisk(struct device_handler *handler,
>>>> + const char *path)
>>>> +{
>>>> + struct ramdisk_device *dev;
>>>> + unsigned int i;
>>>> +
>>>> + if (!path)
>>>> + return;
>>>> +
>>>> + for (i = 0; i < handler->n_ramdisks; i++)
>>>> + if (!strcmp(handler->ramdisks[i]->path, path))
>>>> + return;
>>>> +
>>>> + dev = talloc_zero(handler, struct ramdisk_device);
>>>> + dev->path = talloc_strdup(handler, path);
>>>
>>> You've not checked for NULL when using many of the talloc_* functions?
>>> Is there a good reason for this? I feel like you might be opening
>>> yourself up to mystery segfault.
>>> Of course on the other hand, I'll probably end up agreeing that there
>>> really isn't much that can be done if they ever fail.
>>
>> I choose to plead RFC :P But as you say this one in particular is tricky -
>> if we fail we probably can't make a snapshot, so we *could* mount the
>> disk itself - but then we throw away any read-only guarantee.
>
> Is throwing away the read-only guarantee (without explicit user consent)
> ever a good idea? Could petitboot prompt? Or have a "I really know what
> I'm doing here" mode.
No, it's a terrible idea :)
Something like a pre-boot popup that says "Want to save changes to disk?" for
GRUB updates etc, could work, but
a) Autoboot is a problem
b) Someone will undoubtedly have an aneurysm when they first see that pop-up
and I'll get a bug report about bootloaders never touching disks
>
>> Maybe something like this (later on in the mounting code)
>> if (snapshot available)
>> mount(snapshot)
>> else
>> mount(disk) with -norecovery
>>>
>>>> +
>>>> + i = handler->n_ramdisks++;
>>>> +
>>>> + handler->ramdisks = talloc_realloc(handler, handler->ramdisks,
>>>> + struct ramdisk_device *, handler->n_ramdisks);
>>>> + handler->ramdisks[i] = dev;
>>>
>>> So could be just me, these three lines caused me a minifreakout, it took
>>> me a second to see that it's correct, any reason you're incrementing
>>> n_ramdisks before you resize the allocation? What if the allocation
>>
>> Faith in my memory allocator? :)
>>
> I'm going to go and take (almost) all the ram out of a Tuleta, then
> we'll see :p.
>
>
>>> fails? Why even use i? Might I suggest:
>>>
>>> handler->ramdisks = talloc_realloc(handler, handler->ramdisks,
>>> struct ramdisk_device *, handler->n_ramdisks + 1);
>>> if (!hander->ramdisks) {
>>> /* I still think you should freak out */
>>> }
>>>
>>> handler->ramdisks[handler->n_ramdisks++] = dev;
>>
>> Yep, recovers better.
>>
>>>
>>>> +}
>>>> +
>>>> +void device_handler_create_snapshot(struct device_handler *handler,
>>>> + struct discover_device *device)
>>>> +{
>>>> + unsigned int sectors, i;
>>>> + struct dm_task *task;
>>>> + char *ttype, *params;
>>>> + uint32_t cookie = 0;
>>>> + uint16_t flags = 0;
>>>> + const char *tmp;
>>>> +
>>>> + tmp = discover_device_get_param(device, "ID_PART_ENTRY_SIZE");
>>>> + if (!tmp) {
>>>> + pb_log("Could not retrieve sector size for %s\n",
>>>> + device->device_path);
>>>> + return;
>>>> + }
>>>> +
>>>> + sectors = strtoul(tmp, NULL, 0);
>>>> + if (!sectors) {
>>>> + pb_log("Error reading sector count for %s\n",
>>>> + device->device_path);
>>>> + return;
>>>> + }
>>>> +
>>>> + pb_log("Creating a snapshot for %s, %u sectors, named %s\n",
>>>> + device->device_path, sectors, device->device->id);
>>>> +
>>>> + /* Check if free ramdisk exists */
>>>> + for (i = 0; i < handler->n_ramdisks; i++)
>>>> + if (!handler->ramdisks[i]->in_use)
>>>> + break;
>>>> +
>>>> + /* Otherwise create a new one */
>>>> + if (i == handler->n_ramdisks) {
>>>> + char *name = talloc_asprintf(handler, "/dev/ram%d",
>>>> + handler->n_ramdisks);
>>>> + dev_t id = makedev(1, handler->n_ramdisks);
>>>> + if (mknod(name, S_IFBLK, id)) {
>>>> + if (errno == EEXIST) {
>>>> + /* We haven't yet received updates for existing
>>>> + * ramdisks - add and use this one */
>>>> + pb_debug("Using untracked ramdisk %s\n", name);
>>>> + } else {
>>>> + pb_log("Failed to create new ramdisk %s: %s\n",
>>>> + name, strerror(errno));
>>>> + return;
>>>> + }
>>>> + }
>>>> + device_handler_add_ramdisk(handler, name);
>>>> + talloc_free(name);
>>>> + }
>>>> +
>>> Perhaps have the above code be something along the lines of
>>> get_backing_ramdisk(...)
>>
>> Could do although this doesn't get called anywhere else - snapshot-merges
>> may change that however.
>>
> I just think device_handler_create_snapshot() would flow better, having
> said that, its not bad as is... but if you do envisage a scenario where
> such a new function would get called from two places...
>
And I suspect it probably will, I'll try to minimise any code duplication when
bringing in snapshot-merge - most likely I'll split the devmapper stuff out
into a separate file.
>
> Cyril.
>
>>>
>>>
>>>> + ttype = talloc_asprintf(handler, "snapshot");
>>>> + params = talloc_asprintf(handler, "%s %s N 1",
>>>> + device->device_path, handler->ramdisks[i]->path);
>>>> +
>>>> + task = dm_task_create(DM_DEVICE_CREATE);
>>>> + if (!task) {
>>>> + pb_log("Error creating new dm-task\n");
>>>> + goto err1;
>>>> + }
>>>> +
>>>> + if (!dm_task_set_name(task, device->device->id))
>>>> + goto err2;
>>>> +
>>>> + /* Set the table for this dm-device */
>>>> + if (!dm_task_add_target(task, 0, sectors, ttype, params))
>>>> + goto err2;
>>>> +
>>>> + if (!dm_task_set_add_node(task, DM_ADD_NODE_ON_CREATE))
>>>> + goto err2;
>>>> +
>>>> + /* Petitboot's libdm isn't compiled with --enable-udev_sync, so we set
>>>> + * empty cookie and flags */
>>>> + if (!dm_task_set_cookie(task, &cookie, flags))
>>>> + goto err2;
>>>> +
>>>> + if (!dm_task_run(task)) {
>>>> + pb_log("Error executing dm-task\n");
>>>> + goto err2;
>>>> + }
>>>> +
>>>> + /* This is the magic incantation to get /dev/mapper/sdaX appearing */
>>>> + dm_udev_wait(cookie);
>>>> +
>>>> + device->ramdisk = handler->ramdisks[i];
>>>> + handler->ramdisks[i]->in_use = true;
>>>> + handler->ramdisks[i]->snapshot = talloc_asprintf(handler, "/dev/mapper/%s",
>>>> + device->device->id);
>>>> +
>>>> + pb_log("Snapshot created for device %s\n", device->device->id);
>>>> +
>>>> +err2:
>>>> + dm_task_destroy(task);
>>>> +err1:
>>>> + talloc_free(params);
>>>> + talloc_free(ttype);
>>>> +}
>>>> +
>>>> +int discover_device_destroy_snapshot(struct discover_device *device)
>>>> +{
>>>> + /* Assume we've already unmounted the snapshot */
>>>> + struct dm_task *task;
>>>> + uint32_t cookie = 0;
>>>> + uint16_t flags = 0;
>>>> + int rc = -1;
>>>> +
>>>> + if (!device->ramdisk)
>>>> + return 0;
>>>> +
>>>> + task = dm_task_create(DM_DEVICE_REMOVE);
>>>> + if (!task) {
>>>> + pb_log("Could not create dm_task DM_DEVICE_REMOVE\n");
>>>> + return -1;
>>>> + }
>>>> +
>>>> + if (!dm_task_set_name(task, device->device->id)) {
>>>> + pb_log("No dm-snapshot named '%s'\n", device->device->id);
>>>> + goto out;
>>>> + }
>>>> +
>>>> + /* Petitboot's libdm isn't compiled with --enable-udev_sync, so we set
>>>> + * empty cookie and flags */
>>>> + if (!dm_task_set_cookie(task, &cookie, flags))
>>>> + goto out;
>>>> +
>>>> + if (!dm_task_run(task)) {
>>>> + pb_log("Unable to remove snapshot '%s'\n", device->device->id);
>>>> + goto out;
>>>> + }
>>>> +
>>>> + /* Wait for /dev/mapper/ entries to be removed */
>>>> + dm_udev_wait(cookie);
>>>> +
>>>> + device->ramdisk->in_use = false;
>>>> + device->ramdisk = NULL;
>>>> + rc = 0;
>>>> +out:
>>>> + dm_task_destroy(task);
>>>> + return rc;
>>>> +}
>>>> +
>>>> /* Start discovery on a hotplugged device. The device will be in our devices
>>>> * array, but has only just been initialised by the hotplug source.
>>>> */
>>>> diff --git a/discover/device-handler.h b/discover/device-handler.h
>>>> index b592c46..2425b7b 100644
>>>> --- a/discover/device-handler.h
>>>> +++ b/discover/device-handler.h
>>>> @@ -26,6 +26,7 @@ struct discover_device {
>>>>
>>>> char *mount_path;
>>>> const char *device_path;
>>>> + struct ramdisk_device *ramdisk;
>>>> bool mounted;
>>>> bool mounted_rw;
>>>> bool unmount;
>>>> @@ -72,6 +73,11 @@ struct discover_device *discover_device_create(struct device_handler *handler,
>>>> const char *id);
>>>> void device_handler_add_device(struct device_handler *handler,
>>>> struct discover_device *device);
>>>> +void device_handler_add_ramdisk(struct device_handler *handler,
>>>> + const char *path);
>>>> +void device_handler_create_snapshot(struct device_handler *handler,
>>>> + struct discover_device *device);
>>>> +int discover_device_destroy_snapshot(struct discover_device *device);
>>>> int device_handler_discover(struct device_handler *handler,
>>>> struct discover_device *dev);
>>>> int device_handler_dhcp(struct device_handler *handler,
>>>
>>
>>
>
--
-----------
LTC Ozlabs
IBM
More information about the Petitboot
mailing list