# Shallow Copy-On-Write VM Clones with LibVirt

## Using Python to create a virtual machine linked clone

The LibVirt utility virt-clone, by default, makes a full copy of all qcow2 disks of the original. In this post, I show how to use Python to make new qcow2 disks using the originals as the “backing store” and linking these new images to the LibVirt clone. The resulting virtual machine takes seconds to create and takes up hardly any additional space.

Tip

The --reflink option of the virt-clone utility makes a filesystem-level linked clone only when the storage is on btrfs. For ext4 or any other filesystem, this option is ignored. If your libvirt storage is on btrfs, I would recommend using virt-clone directly instead of the solution presented here.

Note

TL;DR: The virt-linked-clone utility described below is installable as a console application with Python’s package manager pip:

> pip install virt-linked-clone


Use the -h option to get the full usage and help:

> virt-linked-clone -h
usage: virt-linked-clone [-h] [--zsh-completion] [--version] [-c CONNECTION]
                         source target

positional arguments:
  source                Virtual machine from which to create a clone where all
                        writable qcow2-backed drives are linked using copy-on-write.
                        It must be defined with libvirt and accessible via virsh
                        commands.
  target                Name of the new virtual machine to define. Most of the
                        settings of the source image will be copied into the new
                        libvirt domain. Defaults to adding "-clone" to the source
                        domain name.

options:
  -h, --help            show this help message and exit
  --zsh-completion      Print out the zsh autocompletion code for this utility and
                        exit.
  --version             show program's version number and exit
  -c CONNECTION, --connection CONNECTION
                        LibVirt URI to use for connecting to the domain controller.
                        Will honor the value of the VIRSH_DEFAULT_CONNECT_URI
                        environment variable. (default: qemu:///session)


In a GitLab-based CI system I helped set up, we use LibVirt virtual machines for the runners. For this to work, I had to create a GitLab custom executor which prepares and starts the VM, runs the CI job script within it, and finally shuts down and destroys the VM. Originally, I tried to use virt-clone but found that it always makes a full copy of the disks of the original. What I wanted was a linked clone where all the copy-on-write disks (read: qcow2 files) used by the original serve as backing files for the disks on the clone.

While there are a few scripts on the web which do this, I wanted a more complete and robust solution. Here I present my Python script, which is installable with pip as an executable console application named virt-linked-clone. Note that I will use the terms “domain” and “virtual machine” interchangeably in this article.

# Preliminary Work

Because we will overlay a new disk image file onto the originals of the virtual machine (more on this later), we need to make sure the original “source” domain is shut down and its disks are set to read-only. This Python script uses the libvirt-python module, which is a very thin wrapper around the libvirt library.

I typically run all my virtual machines under LibVirt’s QEMU/KVM user session, qemu:///session, which has limited networking options but works fine for all my use cases. It is not the default (that is qemu:///system, which requires root-level privileges), but it is easily accessible via the “Add Connection” dialog in virt-manager. Here is my context manager for the libvirt connection:

import contextlib

import libvirt  # python package: libvirt-python

@contextlib.contextmanager
def libvirt_connection(name='qemu:///session'):
    """Libvirt connection context."""
    # libvirt-host: virConnectOpen(name)
    conn = libvirt.open(name)
    try:
        yield conn
    finally:
        # libvirt-host: virConnectClose()
        conn.close()
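
The command-line help shown earlier says the connection option honors the VIRSH_DEFAULT_CONNECT_URI environment variable and otherwise falls back to qemu:///session. A minimal sketch of that lookup follows; the helper name `default_connection_uri` and its `environ` parameter are hypothetical and are not claimed to match the published utility:

```python
import os

FALLBACK_URI = 'qemu:///session'


def default_connection_uri(environ=None):
    """Return the libvirt URI to use when none is given explicitly."""
    # hypothetical helper: the environ parameter only exists to ease testing
    environ = os.environ if environ is None else environ
    # an empty value is treated the same as an unset variable
    return environ.get('VIRSH_DEFAULT_CONNECT_URI') or FALLBACK_URI
```

The result can be passed straight to the context manager, for example `with libvirt_connection(default_connection_uri()) as conn:`.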


Once connected, we need to get a handle to the domains. The default behavior of getting a domain by name with libvirt-python raises an exception if it’s not found, but here I have a function that returns the domain handle or None to simplify code later on:

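def get_domain(conn, name):
    """Return libvirt domain object or None if not defined."""
    # listDefinedDomains() only reports inactive domains, so a running
    # source would be missed; listAllDomains() covers both cases
    # libvirt-domain: virConnectListAllDomains(connection, flags)
    for domain in conn.listAllDomains():
        # libvirt-domain: virDomainGetName(domain)
        if domain.name() == name:
            return domain
    return None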


In case the source domain needs to be shut down, we have a simple function that tries for 3 minutes and gives up with an exception if it can’t shut it down. The user will then have to diagnose the problem outside of this script (likely by using virt-manager or virsh).

import time

def shutdown_domain(domain):
    """Shutdown the domain, trying several times before giving up."""
    # libvirt-domain: virDomainShutdown(domain)
    domain.shutdown()
    start = time.time()
    timeout = 3 * 60  # 3 minutes
    while (time.time() - start) < timeout:
        # libvirt-domain: virDomainGetState(domain)
        state, reason = domain.state()
        if state == libvirt.VIR_DOMAIN_SHUTOFF:
            break
        else:
            time.sleep(1)
    if state != libvirt.VIR_DOMAIN_SHUTOFF:
        raise RuntimeError(f'shutdown of {domain} unsuccessful, currently: {state}')
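
The polling pattern used above can be tried out without a hypervisor. The following is only a sketch: `wait_for_state` and `FakeDomain` are hypothetical names, the two constants are stand-ins for the corresponding libvirt values, and `time.monotonic()` is swapped in for `time.time()` so that wall-clock adjustments do not affect the timeout.

```python
import time

RUNNING = 1  # stand-in for libvirt.VIR_DOMAIN_RUNNING
SHUTOFF = 5  # stand-in for libvirt.VIR_DOMAIN_SHUTOFF


def wait_for_state(get_state, wanted, timeout=180.0, interval=1.0):
    """Poll get_state() until it returns wanted; return True on success."""
    deadline = time.monotonic() + timeout
    while True:
        if get_state() == wanted:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeDomain:
    """Hypothetical stand-in that reports SHUTOFF after a few polls."""

    def __init__(self, polls_until_off):
        self.polls_left = polls_until_off

    def state(self):
        # mimic the (state, reason) pair returned by virDomain.state()
        if self.polls_left > 0:
            self.polls_left -= 1
            return RUNNING, 0
        return SHUTOFF, 0


domain = FakeDomain(polls_until_off=3)
reached = wait_for_state(
    lambda: domain.state()[0], SHUTOFF, timeout=5, interval=0.01
)
```

Returning a boolean instead of raising leaves the choice of error message to the caller, which is one possible design; the function in this article raises directly.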


To round out the preliminary utility functions, I include a function to ensure a domain is shut down so we can clone it:

def ensure_shutdown(domain, shutdown=True):
    """Raise exception if domain is not or can not be shutdown."""
    # libvirt-domain: virDomainGetName(domain)
    name = domain.name()
    # libvirt-domain: virDomainGetState(domain)
    state, reason = domain.state()
    if state == libvirt.VIR_DOMAIN_RUNNING:
        if shutdown:
            shutdown_domain(domain)
        else:
            raise RuntimeError(f'domain {name} must be shut down')
    # libvirt-domain: virDomainGetState(domain)
    state, reason = domain.state()
    if state != libvirt.VIR_DOMAIN_SHUTOFF:
        msg = f'domain {name} must be shut down, current state: {state}'
        raise RuntimeError(msg)


# Getting a List of Disks in the Virtual Machine

LibVirt domains, which in this case are QEMU/KVM virtual machines, will have one or more disks attached. These are typically in the raw or qcow2 format. For the qcow2 images, we can create a copy-on-write overlay file, make the original file read-only and use this new overlay as the disk for the clone we are to create.

In the virt-manager interface, a LibVirt domain with a qcow2 disk image shows the disk in its hardware details, and the XML definition of the disks is accessible through the “XML” tab.

To get a list of disks for a virtual machine we can inspect the XML of the domain obtained from LibVirt. The candidate disks which may be used as backing files for qcow2 overlay images are of type “file” and device “disk” and the driver for the disk must be “qemu” with type “qcow2”. The target device name is usually something like “vda” on my system but I’ve seen tutorials and help pages name them “sda” or similar - it doesn’t matter too much in this context, we just need to save it off to refer to it later when making the initial clone.

import pathlib
import xml.etree.ElementTree as xml

def list_cow_disks(domain):
    """Return a list of copy-on-write disks (qcow2) used by this domain."""
    result = []
    # libvirt-domain: virDomainGetXMLDesc(domain, flags)
    domain_xml = xml.fromstring(domain.XMLDesc(0))
    for disk in domain_xml.findall('devices/disk'):
        if disk.get('type') == 'file' and disk.get('device') == 'disk':
            driver = disk.find('driver')
            if driver is None:
                # no driver element, so the format is unknown: skip it
                continue
            if driver.get('name') == 'qemu' and driver.get('type') == 'qcow2':
                source_file = pathlib.Path(disk.find('source').get('file'))
                target_dev = disk.find('target').get('dev')
                result.append((source_file, target_dev, disk))
    return result
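
To see the filter in action without a libvirt connection, the same selection rules can be applied to a hand-written XML string. This is a sketch only: the domain definition below is a trimmed, made-up sample (real definitions carry many more elements), and `qcow2_disks` is a hypothetical helper that returns plain strings rather than the tuples used in this article.

```python
import xml.etree.ElementTree as ET

# made-up sample: one qcow2 disk, one raw disk and an empty cdrom drive
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>example</name>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/tmp/images/example.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/tmp/images/scratch.img'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='sda' bus='sata'/>
      <readonly/>
    </disk>
  </devices>
</domain>
"""


def qcow2_disks(domain_xml_text):
    """Return (source file, target device) pairs for file-backed qcow2 disks."""
    root = ET.fromstring(domain_xml_text)
    found = []
    for disk in root.findall('devices/disk'):
        if disk.get('type') != 'file' or disk.get('device') != 'disk':
            continue
        driver = disk.find('driver')
        if driver is None:
            continue
        if driver.get('name') != 'qemu' or driver.get('type') != 'qcow2':
            continue
        source_file = disk.find('source').get('file')
        target_dev = disk.find('target').get('dev')
        found.append((source_file, target_dev))
    return found


disks = qcow2_disks(SAMPLE_DOMAIN_XML)
print(disks)
```

Only the first disk passes all three checks; the raw disk fails the driver type test and the cdrom fails the device test.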


# Creating an Initial Clone Domain

The initial clone we create will have disks that use the same underlying files as the source domain. This is a temporary state and the disks will be replaced quickly thereafter. First, we need a way to set (and unset) the “readonly” attribute of a disk defined in a domain:

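def set_disk_readonly(domain, disk_xml, value=True):
    """Set/unset disk readonly attribute in the given domain."""
    # libvirt marks a read-only disk with an empty <readonly/> child element
    readonly_tag = disk_xml.find('readonly')
    if value and readonly_tag is None:
        disk_xml.append(xml.Element('readonly'))
    elif not value and readonly_tag is not None:
        disk_xml.remove(readonly_tag)
    else:
        # no changes necessary
        return
    disk_xml_str = xml.tostring(disk_xml, encoding='unicode')
    # libvirt-domain: virDomainUpdateDeviceFlags(domain, xml, flags)
    domain.updateDeviceFlags(disk_xml_str, 0)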
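
In libvirt's domain XML, a read-only disk is marked by an empty `<readonly/>` child of the `<disk>` element. The XML side of the toggle can be sketched on its own, without touching a real domain; `toggle_readonly` is a hypothetical helper and the disk definition is a made-up sample:

```python
import xml.etree.ElementTree as ET


def toggle_readonly(disk, value):
    """Add or remove the <readonly/> child; return True if the XML changed."""
    tag = disk.find('readonly')
    if value and tag is None:
        disk.append(ET.Element('readonly'))
        return True
    if not value and tag is not None:
        disk.remove(tag)
        return True
    return False


disk = ET.fromstring(
    "<disk type='file' device='disk'>"
    "<driver name='qemu' type='qcow2'/>"
    "<source file='/tmp/images/example.qcow2'/>"
    "<target dev='vda' bus='virtio'/>"
    "</disk>"
)
first = toggle_readonly(disk, True)    # adds <readonly/>
second = toggle_readonly(disk, True)   # already present, nothing to do
marked = ET.tostring(disk, encoding='unicode')
third = toggle_readonly(disk, False)   # removes it again
```

The returned flag tells the caller whether the domain needs updating at all, which mirrors the early return in the function above.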


Using virt-clone, the initial VM is created. Again, the resulting image will be using the same files as the source domain for all disks.

import subprocess

def create_clone(source, target, skip_copy_devices):
    """Clone source to target, reusing the disks as-is (no copies)."""
    cmd = ['virt-clone', '--preserve-data', '--auto-clone']
    cmd += ['--original', source]
    cmd += ['--name', target]
    for disk_device in skip_copy_devices:
        cmd += ['--skip-copy', disk_device]
    subprocess.run(cmd, check=True)


For each qcow2 disk, this is how we’ll create the overlay image using qemu-img create:

def qemu_img_create(new_file, backing_file):
    """Create an overlay disk image based on another qcow2 image."""
    cmd = ['qemu-img', 'create', '-q', '-f', 'qcow2', '-F', 'qcow2']
    cmd += ['-o', f'backing_file={backing_file}']
    cmd += [new_file]
    subprocess.run(cmd, check=True)
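
One way to check that an overlay really points at the expected base image is to read the JSON report from `qemu-img info --output=json`, which lists the backing file under the `backing-filename` key. The sketch below assumes qemu-img is on the PATH for the second function; both helper names are hypothetical and the sample report is hand-written and trimmed.

```python
import json
import subprocess


def backing_file_from_info(info_json):
    """Return the backing file recorded in a qemu-img info report, or None."""
    info = json.loads(info_json)
    return info.get('backing-filename')


def backing_file_of(image):
    """Ask qemu-img which backing file an image points at (needs qemu-img)."""
    cmd = ['qemu-img', 'info', '--output=json', str(image)]
    done = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return backing_file_from_info(done.stdout)


# trimmed, hand-written sample of a report for an overlay image
SAMPLE_INFO = '''
{
    "filename": "/tmp/images/example-clone-vda.qcow2",
    "format": "qcow2",
    "backing-filename": "/tmp/images/example.qcow2",
    "backing-filename-format": "qcow2"
}
'''

print(backing_file_from_info(SAMPLE_INFO))
```

Keeping the parsing separate from the subprocess call makes the check easy to test on machines where qemu-img is not installed.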


Here, we update the domain to use these new overlay image files. Note that we go a step further by adding the “backingStore” XML tag to the domain definition. This helps libvirt manage these VMs more effectively.

import copy

def create_overlay_disks(domain, cow_disks):
    """Make existing disk in domain an overlay qcow2 image on the original."""
    # libvirt-domain: virDomainGetName(domain)
    domain_name = domain.name()
    for disk_file, disk_device, disk_xml in cow_disks:
        # make linked copy-on-write clone of the disk image file
        new_file = disk_file.parent / f'{domain_name}-{disk_device}.qcow2'
        qemu_img_create(new_file, backing_file=disk_file)

        # ensure the disk is marked read/write
        set_disk_readonly(domain, disk_xml, value=False)

        # set the new disk as the source file in the target domain
        # set the source file as the backing store, and append
        # source's backing store to the chain
        disk_source = disk_xml.find('source')
        source_file = disk_source.get('file')

        disk_source.set('file', str(new_file))
        backing_store = xml.Element('backingStore', {'type': 'file'})
        backing_store.append(xml.Element('format', {'type': 'qcow2'}))
        backing_store.append(xml.Element('source', {'file': source_file}))
        # compare against None: an element without children is falsy
        if (source_chain := disk_xml.find('backingStore')) is not None:
            backing_store.append(copy.copy(source_chain))
            disk_xml.remove(source_chain)
        disk_xml.append(backing_store)

        disk_xml_str = xml.tostring(disk_xml, encoding='unicode')
        # libvirt-domain: virDomainUpdateDeviceFlags(domain, xml, flags)
        domain.updateDeviceFlags(disk_xml_str, 0)
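
The XML rewrite is easier to follow when run against a small disk definition in isolation. The sketch below repeats only the element manipulation, with no libvirt calls; `point_disk_at_overlay` is a hypothetical name and the file paths are made up.

```python
import xml.etree.ElementTree as ET


def point_disk_at_overlay(disk, overlay_file):
    """Make overlay_file the disk source and chain the old source behind it."""
    source = disk.find('source')
    original_file = source.get('file')
    source.set('file', overlay_file)

    backing = ET.Element('backingStore', {'type': 'file'})
    backing.append(ET.Element('format', {'type': 'qcow2'}))
    backing.append(ET.Element('source', {'file': original_file}))
    # keep any chain the original disk already had, nested one level deeper
    previous = disk.find('backingStore')
    if previous is not None:
        disk.remove(previous)
        backing.append(previous)
    disk.append(backing)
    return disk


disk = ET.fromstring(
    "<disk type='file' device='disk'>"
    "<driver name='qemu' type='qcow2'/>"
    "<source file='/tmp/images/example.qcow2'/>"
    "<target dev='vda' bus='virtio'/>"
    "</disk>"
)
point_disk_at_overlay(disk, '/tmp/images/example-clone-vda.qcow2')
print(ET.tostring(disk, encoding='unicode'))
```

After the call, the disk's direct `source` child names the overlay, while the original image appears as the `source` nested inside `backingStore`.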


# Putting it All Together

Finally, I present the create_linked_clone() function, which brings all the functions above together in a single place. It does the following:

1. connect to the libvirt endpoint (qemu:///session in my case)
2. ensure the source domain exists
3. ensure the target domain does not exist
4. ensure the source domain is shut down and the qcow2 disks are set to read-only
5. create the initial clone
6. create the overlay qcow2 images and update the clone definition

def create_linked_clone(
    source, target, connection='qemu:///session', shutdown_source=True
):
    """Clone a libvirt domain, creating overlay images for all qcow2 disks."""
    with libvirt_connection(connection) as conn:
        source_domain = get_domain(conn, source)
        if source_domain is None:
            raise ValueError(f'source libvirt domain "{source}" not found')

        if get_domain(conn, target) is not None:
            raise ValueError(f'target libvirt domain "{target}" already exists')

        cow_disks = list_cow_disks(source_domain)
        if not cow_disks:
            msg = f'source libvirt domain "{source}" has no copy-on-write disks'
            raise ValueError(msg)

        ensure_shutdown(source_domain, shutdown_source)

        # the originals become backing files, so they must not change anymore
        for _, _, disk_xml in cow_disks:
            set_disk_readonly(source_domain, disk_xml, value=True)

        cow_disks_dev = [dev for _, dev, _ in cow_disks]
        create_clone(source, target, cow_disks_dev)

        target_domain = get_domain(conn, target)
        try:
            create_overlay_disks(target_domain, cow_disks)
        except:
            # libvirt-domain: virDomainUndefine(domain)
            target_domain.undefine()
            raise
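
To turn the function into a console command, a thin argument parser is enough. The sketch below is modelled on the help output shown at the top of this article and is not the published utility's actual code: the `build_parser` and `parse_args` names are hypothetical, the --zsh-completion and --version options are left out, and the target is treated as optional because the help text says it defaults to the source name plus "-clone".

```python
import argparse
import os


def build_parser():
    """Build a parser resembling the virt-linked-clone help output."""
    parser = argparse.ArgumentParser(prog='virt-linked-clone')
    parser.add_argument('source', help='virtual machine to clone')
    parser.add_argument(
        'target', nargs='?', default=None,
        help='name of the new virtual machine (default: SOURCE-clone)',
    )
    parser.add_argument(
        '-c', '--connection',
        default=os.environ.get('VIRSH_DEFAULT_CONNECT_URI', 'qemu:///session'),
        help='libvirt URI to connect to',
    )
    return parser


def parse_args(argv=None):
    """Parse arguments and fill in the default target name."""
    args = build_parser().parse_args(argv)
    if args.target is None:
        args.target = f'{args.source}-clone'
    return args


args = parse_args(['base-vm'])
print(args.source, args.target)
```

The parsed values map directly onto the function's parameters, as in `create_linked_clone(args.source, args.target, args.connection)`.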


The resulting virtual machine can be inspected using virt-manager, where we see that the primary disk image is an overlay with a backing store.

Tip

When making changes to a LibVirt domain using virsh or the libvirt library, I noticed that virt-manager does not see or reflect these changes. The changes will appear if you disconnect and then reconnect to the LibVirt session.