Shallow Copy-On-Write VM Clones with LibVirt

Using Python to create a virtual machine linked clone

The LibVirt utility virt-clone, by default, makes a full copy of all qcow2 disks of the original. In this post, I show how to use Python to make new qcow2 disks using the originals as the “backing store” and linking these new images to the LibVirt clone. The resulting virtual machine takes seconds to create and takes up hardly any additional space.

Tip

The --reflink option of the virt-clone utility makes a filesystem-level linked clone only when the storage is on btrfs. For ext4 or any other filesystem, this option is ignored. If your libvirt storage is on btrfs, I would recommend using virt-clone directly instead of the solution presented here.

Note

TL;DR: The virt-linked-clone utility described below is installable as a console application with Python’s package manager pip:

> pip install virt-linked-clone

Use the -h option to get the full usage and help:

> virt-linked-clone -h
usage: virt-linked-clone [-h] [--zsh-completion] [--version] [-c CONNECTION]
                         source target

positional arguments:
  source                Virtual machine from which to create a clone where all
                        writable qcow2-backed drives are linked using copy-on-write.
                        It must be defined with libvirt and accessible via virsh
                        commands.
  target                Name of the new virtual machine to define. Most of the
                        settings of the source image will be copied into the new
                        libvirt domain. Defaults to adding "-clone" to the source
                        domain name.

options:
  -h, --help            show this help message and exit
  --zsh-completion      Print out the zsh autocompletion code for this utility and
                        exit.
  --version             show program's version number and exit
  -c CONNECTION, --connection CONNECTION
                        LibVirt URI to use for connecting to the domain controller.
                        Will honor the value of the VIRSH_DEFAULT_CONNECT_URI
                        environment variable. (default: qemu:///session)

In a GitLab-based CI system I helped set up, we use LibVirt virtual machines for the runners. For this to work, I had to create a GitLab custom executor which prepares and starts the VM, runs the CI job script within it, and finally shuts down and destroys the VM. Originally, I tried to use virt-clone but found that it always makes a full copy of the disks of the original. What I wanted was a linked clone where all the copy-on-write disks (read: qcow2 files) used by the original serve as backing files for the disks of the clone.

The virt-clone utility makes a full copy by default

While there are a few scripts on the web which do this, I wanted a more complete and robust solution. Here I present my Python script, which is installable with pip as an executable console application named virt-linked-clone. Note that I will use the terms “domain” and “virtual machine” interchangeably in this article.

We want the clone to have overlay copies of the original disks

Preliminary Work

Because we will overlay a new disk image file onto the originals of the virtual machine (more on this later), we need to make sure the original “source” domain is shut down and its disks are set to read-only. This Python script uses the libvirt-python module, which is a very thin wrapper around the libvirt library.

I typically run all my virtual machines under LibVirt’s QEMU/KVM user session qemu:///session, which has limited networking options but works fine for all my use cases. It is not the default, which is qemu:///system and requires root-level privileges, but it is easily accessible via the “Add Connection” dialog in virt-manager.

QEMU user session connection in virt-manager

Here is my context manager for the libvirt connection:

import contextlib

import libvirt  # python package: libvirt-python

@contextlib.contextmanager
def libvirt_connection(name='qemu:///session'):
    """Libvirt connection context."""
    # libvirt-host: virConnectOpen(name)
    conn = libvirt.open(name)
    try:
        yield conn
    finally:
        # libvirt-host: virConnectClose()
        conn.close()

Once connected, we need to get a handle to the domains. The default behavior of getting a domain by name with libvirt-python raises an exception if it’s not found, but here I have a function that returns the domain handle or None to simplify code later on:

def get_domain(conn, name):
    """Return libvirt domain object or None if not defined."""
    try:
        # libvirt-domain: virDomainLookupByName(name)
        # (finds running domains too; listDefinedDomains() lists only inactive ones)
        return conn.lookupByName(name)
    except libvirt.libvirtError as err:
        if err.get_error_code() == libvirt.VIR_ERR_NO_DOMAIN:
            return None
        raise

In case the source domain needs to be shut down, we have a simple function that tries for 3 minutes and gives up with an exception if it can’t shut it down. The user will then have to diagnose the problem outside of this script (likely by using virt-manager or virsh).

import time

def shutdown_domain(domain):
    """Shutdown the domain, trying several times before giving up."""
    # libvirt-domain: virDomainShutdown(domain)
    domain.shutdown()
    start = time.time()
    timeout = 3 * 60  # 3 minutes
    while (time.time() - start) < timeout:
        # libvirt-domain: virDomainGetState(domain)
        state, reason = domain.state()
        if state == libvirt.VIR_DOMAIN_SHUTOFF:
            break
        else:
            time.sleep(1)
    if state != libvirt.VIR_DOMAIN_SHUTOFF:
        raise RuntimeError(f'shutdown of {domain} unsuccessful, currently: {state}')
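
The wait loop in shutdown_domain() is an instance of a general poll-until-timeout pattern. As a rough sketch, it could be factored into a reusable helper. The names `wait_until` and `FakeDomain` are hypothetical and exist only for this illustration; the fake class stands in for a libvirt domain so the example runs without libvirt installed.

```python
import time


def wait_until(predicate, timeout=180.0, interval=1.0):
    """Poll predicate() until it returns True or timeout seconds elapse.

    Returns True if the predicate succeeded, False on timeout.
    """
    # time.monotonic() is unaffected by system clock adjustments
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeDomain:
    """Stand-in for a libvirt domain (hypothetical, for illustration only)."""

    SHUTOFF = 5  # same value as libvirt.VIR_DOMAIN_SHUTOFF

    def __init__(self, polls_until_off):
        self._remaining = polls_until_off

    def state(self):
        # mimic virDomain.state(), which returns (state, reason)
        self._remaining -= 1
        return (self.SHUTOFF if self._remaining <= 0 else 1, 0)


domain = FakeDomain(polls_until_off=3)
stopped = wait_until(
    lambda: domain.state()[0] == FakeDomain.SHUTOFF, timeout=2.0, interval=0.01
)
print(stopped)
```

Using a monotonic clock for the deadline keeps the timeout correct even if the wall clock is adjusted while waiting.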

To round out the preliminary utility functions, I include a function to ensure a domain is shut down so we can clone it:

def ensure_shutdown(domain, shutdown=True):
    """Raise exception if domain is not or can not be shutdown."""
    # libvirt-domain: virDomainGetState(domain)
    state, reason = domain.state()
    if state == libvirt.VIR_DOMAIN_RUNNING:
        if shutdown:
            shutdown_domain(domain)
        else:
            raise RuntimeError(f'domain {domain.name()} must be shut down')
    # libvirt-domain: virDomainGetState(domain)
    state, reason = domain.state()
    if state != libvirt.VIR_DOMAIN_SHUTOFF:
        msg = f'domain {domain.name()} must be shut down, current state: {state}'
        raise RuntimeError(msg)
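
The state reported in these error messages is a bare integer. A small lookup table can make it readable. The sketch below hard-codes the values of libvirt's virDomainState enum so that it runs without libvirt installed; `describe_state` is a hypothetical helper name and the wording of each description is my own.

```python
# Values of the virDomainState enum from libvirt's public API, copied here so
# the sketch runs without libvirt installed.
DOMAIN_STATE_NAMES = {
    0: 'no state',
    1: 'running',
    2: 'blocked',
    3: 'paused',
    4: 'shutting down',
    5: 'shut off',
    6: 'crashed',
    7: 'suspended by guest power management',
}


def describe_state(state):
    """Return a human-readable name for a libvirt domain state code."""
    return DOMAIN_STATE_NAMES.get(state, f'unknown ({state})')


print(describe_state(1))
print(describe_state(5))
```

In the real script, the keys would more naturally be the libvirt.VIR_DOMAIN_* constants rather than literal numbers.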

Getting a List of Disks in the Virtual Machine

LibVirt domains, which in this case are QEMU/KVM virtual machines, will have one or more disks attached. These are typically in the raw or qcow2 format. For the qcow2 images, we can create a copy-on-write overlay file, make the original file read-only and use this new overlay as the disk for the clone we are to create.

Here is an example LibVirt domain with a qcow2 disk image shown in the virt-manager interface:

Disk details in virt-manager

The XML definition of the disks is accessible through the “XML” tab:

Disk details XML in virt-manager

To get a list of disks for a virtual machine we can inspect the XML of the domain obtained from LibVirt. The candidate disks which may be used as backing files for qcow2 overlay images are of type “file” and device “disk” and the driver for the disk must be “qemu” with type “qcow2”. The target device name is usually something like “vda” on my system but I’ve seen tutorials and help pages name them “sda” or similar - it doesn’t matter too much in this context, we just need to save it off to refer to it later when making the initial clone.

import pathlib
import xml.etree.ElementTree as xml

def list_cow_disks(domain):
    """Return a list of copy-on-write disks (qcow2) used by this domain."""
    result = []
    # libvirt-domain: virDomainGetXMLDesc(domain, flags)
    domain_xml = xml.fromstring(domain.XMLDesc(0))
    for disk in domain_xml.findall('devices/disk'):
        if disk.get('type') == 'file' and disk.get('device') == 'disk':
            driver = disk.find('driver')
            if driver.get('name') == 'qemu' and driver.get('type') == 'qcow2':
                source_file = pathlib.Path(disk.find('source').get('file'))
                target_dev = disk.find('target').get('dev')
                result.append((source_file, target_dev, disk))
    return result
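
To see which disks this filter selects, the same logic can be run against a hand-written domain definition instead of a live libvirt domain. The XML below is made up for illustration, and `cow_disks_from_xml` is a hypothetical variant that takes the XML string directly and skips disks with missing child elements rather than failing on them.

```python
import pathlib
import xml.etree.ElementTree as ET

# Illustrative domain XML (made up for this example), trimmed to the disks.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>example</name>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/example.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/data.img'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='sda' bus='sata'/>
      <readonly/>
    </disk>
  </devices>
</domain>
"""


def cow_disks_from_xml(domain_xml_str):
    """Return (source_file, target_dev) pairs for file-backed qcow2 disks."""
    result = []
    root = ET.fromstring(domain_xml_str)
    for disk in root.findall('devices/disk'):
        if disk.get('type') != 'file' or disk.get('device') != 'disk':
            continue
        driver = disk.find('driver')
        source = disk.find('source')
        target = disk.find('target')
        # skip disks with missing elements instead of failing on None
        if driver is None or source is None or target is None:
            continue
        if driver.get('name') == 'qemu' and driver.get('type') == 'qcow2':
            result.append((pathlib.Path(source.get('file')), target.get('dev')))
    return result


disks = cow_disks_from_xml(SAMPLE_DOMAIN_XML)
print(disks)
```

Only the first disk qualifies: the second is in raw format and the third is a CD-ROM device.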

Creating an Initial Clone Domain

The initial clone we create will have disks that use the same underlying files as the source domain. This is a temporary state and the disks will be replaced quickly thereafter. First, we need a way to set (and unset) the “readonly” attribute of a disk defined in a domain:

def set_disk_readonly(domain, disk_xml, value=True):
    """Set/unset disk readonly attribute in the given domain."""
    readonly_tags = disk_xml.findall('readonly')
    if value and not readonly_tags:
        disk_xml.append(xml.Element('readonly'))
    elif not value and readonly_tags:
        for readonly_tag in readonly_tags:
            disk_xml.remove(readonly_tag)
    else:
        # no changes necessary
        return
    disk_xml_str = xml.tostring(disk_xml, encoding='unicode')
    # libvirt-domain: virDomainUpdateDeviceFlags(domain, xml, flags)
    domain.updateDeviceFlags(disk_xml_str, 0)
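
The XML side of this function can be tried in isolation. The sketch below applies the same add/remove logic to a hand-written disk element and reports whether anything changed; `with_readonly` is a hypothetical name and no libvirt call is involved.

```python
import xml.etree.ElementTree as ET


def with_readonly(disk_xml, value=True):
    """Add or remove the <readonly/> child; return True if anything changed."""
    readonly_tags = disk_xml.findall('readonly')
    if value and not readonly_tags:
        disk_xml.append(ET.Element('readonly'))
        return True
    if not value and readonly_tags:
        for tag in readonly_tags:
            disk_xml.remove(tag)
        return True
    return False


# Hand-written disk element (illustrative values)
disk = ET.fromstring(
    "<disk type='file' device='disk'>"
    "<driver name='qemu' type='qcow2'/>"
    "<source file='/tmp/example.qcow2'/>"
    "<target dev='vda' bus='virtio'/>"
    "</disk>"
)
changed = with_readonly(disk, True)
print(changed, ET.tostring(disk, encoding='unicode'))
```

Calling it a second time with the same value changes nothing, which mirrors the early return in set_disk_readonly() that avoids an unnecessary device update.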

Using virt-clone, the initial VM is created. Again, the resulting image will be using the same files as the source domain for all disks.

import subprocess

def create_clone(source, target, skip_copy_devices):
    """Clone source to target, reusing the disks as-is (no copies)."""
    cmd = ['virt-clone', '--preserve-data', '--auto-clone']
    cmd += ['--original', source]
    cmd += ['--name', target]
    for disk_device in skip_copy_devices:
        cmd += ['--skip-copy', disk_device]
    subprocess.run(cmd, check=True)
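
Note that create_clone() does not pass a connection URI, so virt-clone falls back to its own default. A possible variation, sketched here as an assumption rather than as part of the original script, builds the argument list separately so it can be printed for a dry run, and forwards the URI through virt-clone's --connect option. The name `virt_clone_command` and the domain names are hypothetical.

```python
import shlex


def virt_clone_command(source, target, skip_copy_devices, connection=None):
    """Build the virt-clone argument list without running it."""
    cmd = ['virt-clone', '--preserve-data', '--auto-clone']
    if connection is not None:
        # --connect selects the libvirt URI virt-clone talks to
        cmd += ['--connect', connection]
    cmd += ['--original', source, '--name', target]
    for disk_device in skip_copy_devices:
        cmd += ['--skip-copy', disk_device]
    return cmd


cmd = virt_clone_command('base', 'base-clone', ['vda', 'vdb'], 'qemu:///session')
# shlex.join() renders the list as a copy-pasteable shell command
print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) exactly as in the function above.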

For each qcow2 disk, this is how we’ll create the overlay image using qemu-img create:

def qemu_img_create(new_file, backing_file):
    """Create an overlay disk image based on another qcow2 image."""
    cmd = ['qemu-img', 'create', '-q', '-f', 'qcow2', '-F', 'qcow2']
    cmd += ['-o', f'backing_file={backing_file}']
    cmd += [new_file]
    subprocess.run(cmd, check=True)
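
After creating an overlay, it can be useful to confirm that it really points at the intended backing file. One way, sketched here under the assumption that qemu-img is on the PATH, is to parse the output of `qemu-img info --backing-chain --output=json`. The helper names are hypothetical and the sample JSON is hand-written in the shape qemu-img prints, with made-up values.

```python
import json
import subprocess


def backing_chain(image_info_json):
    """Return the filenames in a backing chain, top overlay first.

    Expects the JSON text printed by
    `qemu-img info --backing-chain --output=json <image>`.
    """
    return [entry['filename'] for entry in json.loads(image_info_json)]


def read_backing_chain(image_file):
    """Run qemu-img on image_file and return its backing chain."""
    cmd = ['qemu-img', 'info', '--backing-chain', '--output=json', str(image_file)]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return backing_chain(out.stdout)


# Hand-written sample in the shape qemu-img prints (values are made up).
SAMPLE = """
[
  {"filename": "clone-vda.qcow2", "format": "qcow2",
   "backing-filename": "base.qcow2"},
  {"filename": "base.qcow2", "format": "qcow2"}
]
"""
print(backing_chain(SAMPLE))  # ['clone-vda.qcow2', 'base.qcow2']
```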

Here, we update the domain to use these new overlay image files. Note that we go a step further by adding the “backingStore” XML tag to the domain definition. This helps libvirt manage these VMs more effectively.

import copy

def create_overlay_disks(domain, cow_disks):
    """Make existing disk in domain an overlay qcow2 image on the original."""
    # libvirt-domain: virDomainGetName(domain)
    domain_name = domain.name()
    for disk_file, disk_device, disk_xml in cow_disks:
        # make linked copy-on-write clone of the disk image file
        new_file = disk_file.parent / f'{domain_name}-{disk_device}.qcow2'
        qemu_img_create(new_file, backing_file=disk_file)

        # ensure the disk is marked read/write
        set_disk_readonly(domain, disk_xml, value=False)

        # set the new disk as the source file in the target domain
        # set the source file as the backing store, and append
        # source's backing store to the chain
        disk_source = disk_xml.find('source')
        source_file = disk_source.get('file')

        disk_source.set('file', str(new_file))
        backing_store = xml.Element('backingStore', {'type': 'file'})
        backing_store.append(xml.Element('format', {'type': 'qcow2'}))
        backing_store.append(xml.Element('source', {'file': source_file}))
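        source_chain = disk_xml.find('backingStore')
        if source_chain is not None:
            # test against None: an Element with no children is falsy
            disk_xml.remove(source_chain)
            if len(source_chain):
                backing_store.append(copy.copy(source_chain))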
        disk_xml.append(backing_store)

        disk_xml_str = xml.tostring(disk_xml, encoding='unicode')
        # libvirt-domain: virDomainUpdateDeviceFlags(domain, xml, flags)
        domain.updateDeviceFlags(disk_xml_str, 0)
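
To make the XML rewrite easier to follow, here is a sketch that applies it to a hand-written disk element with no libvirt involved. The function name `point_disk_at_overlay` and the file paths are hypothetical. The example input deliberately contains an empty backingStore element to show how that case is handled.

```python
import xml.etree.ElementTree as ET


def point_disk_at_overlay(disk_xml, overlay_file):
    """Rewrite a <disk> element so overlay_file sits on top of its old source."""
    source = disk_xml.find('source')
    original_file = source.get('file')
    source.set('file', str(overlay_file))

    backing_store = ET.Element('backingStore', {'type': 'file'})
    backing_store.append(ET.Element('format', {'type': 'qcow2'}))
    backing_store.append(ET.Element('source', {'file': original_file}))

    # Compare against None: truth-testing an Element reflects its child
    # count and is deprecated in recent Python versions.
    old_chain = disk_xml.find('backingStore')
    if old_chain is not None:
        disk_xml.remove(old_chain)
        if len(old_chain):
            # keep an existing, non-empty chain nested below the new entry
            backing_store.append(old_chain)
    disk_xml.append(backing_store)
    return disk_xml


disk = ET.fromstring(
    "<disk type='file' device='disk'>"
    "<driver name='qemu' type='qcow2'/>"
    "<source file='/images/base.qcow2'/>"
    "<backingStore/>"
    "<target dev='vda' bus='virtio'/>"
    "</disk>"
)
point_disk_at_overlay(disk, '/images/clone-vda.qcow2')
print(ET.tostring(disk, encoding='unicode'))
```

The disk's own source now names the overlay, while the original image appears once, inside the backingStore element.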

Putting it All Together

Finally, I present the create_linked_clone() function, which brings all the functions above together in a single place. It does the following:

  1. connect to the libvirt endpoint (qemu:///session in my case)
  2. ensure the source domain exists
  3. ensure the target domain does not exist
  4. ensure the source domain is shut down and the qcow2 disks are set to read-only
  5. create the initial clone
  6. create the overlay qcow2 images and update the clone definition

def create_linked_clone(
    source, target, connection='qemu:///session', shutdown_source=True
):
    """Clone a libvirt domain, creating overlay images for all qcow2 disks."""
    with libvirt_connection(connection) as conn:
        source_domain = get_domain(conn, source)
        if source_domain is None:
            raise ValueError(f'source libvirt domain "{source}" not found')

        if get_domain(conn, target) is not None:
            raise ValueError(f'target libvirt domain "{target}" already exists')

        cow_disks = list_cow_disks(source_domain)
        if not cow_disks:
            msg = f'source libvirt domain "{source}" has no copy-on-write disks'
            raise ValueError(msg)

        ensure_shutdown(source_domain, shutdown_source)

        for _, _, disk_xml in cow_disks:
            set_disk_readonly(source_domain, disk_xml, value=True)

        cow_disks_dev = [dev for _, dev, _ in cow_disks]
        create_clone(source, target, cow_disks_dev)

        target_domain = get_domain(conn, target)
        try:
            create_overlay_disks(target_domain, cow_disks)
        except:
            # libvirt-domain: virDomainUndefine(domain)
            target_domain.undefine()
            raise
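
A command-line front end for this function might look roughly like the sketch below. It is not the source of the published virt-linked-clone utility, only an approximation based on the help text shown earlier. Treating the target as optional with a "-clone" default and reading VIRSH_DEFAULT_CONNECT_URI for the default connection are assumptions drawn from that help text.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse command-line arguments similar to the help text shown earlier."""
    default_uri = os.environ.get('VIRSH_DEFAULT_CONNECT_URI', 'qemu:///session')
    parser = argparse.ArgumentParser(prog='virt-linked-clone')
    parser.add_argument('source', help='domain to clone')
    parser.add_argument(
        'target', nargs='?', help='name of the clone (default: SOURCE-clone)'
    )
    parser.add_argument(
        '-c', '--connection', default=default_uri, help='libvirt URI'
    )
    args = parser.parse_args(argv)
    if args.target is None:
        args.target = f'{args.source}-clone'
    return args


args = parse_args(['base'])
print(args.source, args.target)
# with create_linked_clone() from above in scope, the call would be:
# create_linked_clone(args.source, args.target, args.connection)
```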

The resulting virtual machine can be inspected using virt-manager, where we see that the primary disk image is an overlay with a backing store:

Cloned disk details XML in virt-manager

Tip

When making changes to a LibVirt domain using virsh or the libvirt library, I noticed that virt-manager does not see or reflect these changes. The changes will appear if you disconnect and then reconnect to the LibVirt session.

Disconnect and reconnect to see changes in a VM