Using GlusterFS volumes to host VM images and data was sub-optimal due to the FUSE overhead involved in accessing gluster volumes via the GlusterFS native client. However, this has changed with two specific enhancements:
- A new library called libgfapi is now available as part of GlusterFS that provides POSIX-like C APIs for accessing gluster volumes. libgfapi support is available from the GlusterFS-3.4 release.
- QEMU (starting from QEMU-1.3) has a GlusterFS block driver that uses libgfapi, so there is no FUSE overhead any longer when QEMU works with VM images on gluster volumes.
GlusterFS with its pluggable translator model can serve as a flexible storage backend for QEMU. QEMU just has to talk to GlusterFS, and GlusterFS will hide the different file systems and storage types underneath. Various GlusterFS storage features like replication and striping will automatically be available to QEMU. Efforts are also underway to add a block device backend in Gluster via the Block Device (BD) translator, which will expose underlying block devices as files to QEMU. This allows GlusterFS to be a single storage backend for both file and block based storage types.
GlusterFS specification in QEMU
A VM image residing on a gluster volume can be specified on the QEMU command line using the following URI format:
gluster[+transport]://[server[:port]]/volname/image[?socket=...]
- gluster is the protocol.
- transport specifies the transport type used to connect to the gluster management daemon (glusterd). Valid transport types are tcp, unix and rdma. If a transport type isn't specified, then tcp is assumed.
- server specifies the server where the volume file specification for the given volume resides. This can be either a hostname, an IPv4 address or an IPv6 address. An IPv6 address needs to be within square brackets [ ]. If the transport type is unix, then the server field should not be specified. Instead, the socket field needs to be populated with the path to the unix domain socket.
- port is the port number on which glusterd is listening. This is optional, and if not specified, QEMU will send 0, which makes gluster use the default port. If the transport type is unix, then the port should not be specified.
- volname is the name of the gluster volume which contains the VM image.
- image is the path to the actual VM image that resides on the gluster volume.
Examples:
gluster://1.2.3.4/testvol/a.img
gluster+tcp://1.2.3.4/testvol/a.img
gluster+tcp://1.2.3.4:24007/testvol/dir/a.img
gluster+tcp://[1:2:3:4:5:6:7:8]/testvol/dir/a.img
gluster+tcp://[1:2:3:4:5:6:7:8]:24007/testvol/dir/a.img
gluster+tcp://server.domain.com:24007/testvol/dir/a.img
gluster+unix:///testvol/dir/a.img?socket=/tmp/glusterd.socket
gluster+rdma://1.2.3.4:24007/testvol/a.img
NOTE: The GlusterFS URI description and the above examples are taken from the QEMU documentation.
Configuring QEMU with GlusterFS backend
While building QEMU from source, in addition to the normal configuration options, ensure that the --enable-glusterfs option is passed explicitly to the ./configure script to get glusterfs support in qemu.
Starting with QEMU-1.6, pkg-config is used to configure the GlusterFS backend in QEMU. If you are using GlusterFS compiled and installed from sources, then the GlusterFS package config file (glusterfs-api.pc) might not be present at the standard path and you will have to explicitly add the path by executing this command before running the QEMU configure script:
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig/
Without this, the GlusterFS driver will not be compiled into QEMU even when GlusterFS is present on the system.
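Putting the two together, a minimal configure-and-build sequence for a QEMU built against a source install of GlusterFS might look like this (the pkg-config path shown is the common default for source installs and may differ on your system):

# Point pkg-config at glusterfs-api.pc from a source install of GlusterFS
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig/
# Configure and build QEMU with GlusterFS support
./configure --enable-glusterfs
make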
- Creating a VM image on GlusterFS backend
The qemu-img command can be used to create VM images on the gluster backend. The general syntax for image creation is:
qemu-img create gluster://server/volname/path/to/image size
How to set up the environment:
This use case (using a glusterfs backend as a VM disk store) is known as the 'Virt-Store' use case. The steps for the entire procedure can be split into:
- Steps to be done on gluster volume side
- Steps to be done on Hypervisor side
Steps to be done on the gluster side
These are the steps that need to be done on the gluster side. Precisely, this involves:
Creating "Trusted Storage Pool"
Creating a volume
Tuning the volume for virt-store
Tuning glusterd to accept requests from QEMU
Tuning glusterfsd to accept requests from QEMU
Setting ownership on the volume
Starting the volume
- Creating "Trusted Storage Pool"
Install the glusterfs RPMs on the node. You can create a volume with a single node. You can also scale up the cluster, known as a Trusted Storage Pool, by adding more nodes to the cluster:
gluster peer probe <hostname>
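For example, to add a hypothetical second node to the pool, run this from the first node (the hostname node2.example.com is illustrative):

# Add node2 to the trusted storage pool
gluster peer probe node2.example.com
# Verify that the peer shows up as connected
gluster peer status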
- Creating a volume
It is highly recommended to use a replicate or distribute-replicate volume for the virt-store use case, as it adds high availability and fault tolerance. A plain distribute volume works equally well, but without those guarantees.
gluster volume create <volname> replica 2 <brick1> .. <brickN>
where <brick1> is <hostname>:/<path-of-dir>
Note: It is recommended to create sub-directories inside the brick mountpoint and use those while creating a volume. For example, say /home/brick1 is the mountpoint of an XFS filesystem; you can then create a sub-directory inside it, /home/brick1/b1, and use that while creating the volume. You can also use space available in the root filesystem for bricks; the Gluster CLI, by default, throws a warning in that case. You can override it by using the force option:
gluster volume create <volname> replica 2 <brick1> .. <brickN> force
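As a sketch, a two-node replica 2 volume named testvol (hostnames, brick paths and the volume name are illustrative) could be created like this:

# Bricks are sub-directories under the XFS mountpoints, as recommended above
gluster volume create testvol replica 2 node1.example.com:/home/brick1/b1 node2.example.com:/home/brick2/b1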
If you are new to GlusterFS, you can take a look at the QuickStart guide (http://www.gluster.org/community/documentation/index.php/QuickStart).
- Tuning the volume for virt-store
There are recommended settings available for virt-store. These provide good performance characteristics when enabled on the volume used for virt-store.
Refer to http://www.gluster.org/community/documentation/index.php/Virt-store-usecase#Tunables for the recommended tunables, and to http://www.gluster.org/community/documentation/index.php/Virt-store-usecase#Applying_the_Tunables_on_the_volume for applying them on the volume.
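As a rough sketch, the tunables commonly recommended for virt-store around this time were along the following lines; treat the page linked above as the authoritative list:

# Commonly cited virt-store tunables; verify against the Virt-store-usecase page
gluster volume set <volname> performance.quick-read off
gluster volume set <volname> performance.read-ahead off
gluster volume set <volname> performance.io-cache off
gluster volume set <volname> performance.stat-prefetch off
gluster volume set <volname> cluster.eager-lock enable
gluster volume set <volname> network.remote-dio enable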
- Tuning glusterd to accept requests from QEMU
glusterd accepts requests only from applications that run with a port number less than 1024 and blocks requests from others. QEMU uses port numbers greater than 1024; to make glusterd accept requests from QEMU, edit the glusterd vol file, /etc/glusterfs/glusterd.vol, and add the following:
option rpc-auth-allow-insecure on
Note: If you have installed glusterfs from source, you can find the glusterd vol file at /usr/local/etc/glusterfs/glusterd.vol
Restart glusterd after adding that option to the glusterd vol file:
service glusterd restart
- Tuning glusterfsd to accept requests from QEMU
Enable the option allow-insecure on the particular volume:
gluster volume set <volname> server.allow-insecure on
IMPORTANT: As of now (April 2, 2014) there is a bug where allow-insecure is not dynamically set on a volume. You need to restart the volume for the change to take effect.
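Until that bug is fixed, restart the volume after setting the option (note that stopping a volume disconnects all of its clients):

gluster volume stop <volname>
gluster volume start <volname>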
- Setting ownership on the volume
Set the ownership of qemu:qemu on the volume:
gluster volume set <vol-name> storage.owner-uid 107
gluster volume set <vol-name> storage.owner-gid 107
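Here 107 is the uid/gid of the qemu user on many RPM-based distributions, but it can differ between systems; check the actual values on your hypervisor before setting them:

# Prints the uid and gid of the qemu user on this host
id qemu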
- Starting the volume
Start the volume
gluster volume start <vol-name>
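You can then confirm that the volume is up and its bricks are online:

gluster volume status <vol-name>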
Steps to be done on the Hypervisor side:
To create a raw image:
qemu-img create gluster://1.2.3.4/testvol/dir/a.img 5G
To create a qcow2 image:
qemu-img create -f qcow2 gluster://server.domain.com:24007/testvol/a.img 5G
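As a quick check, qemu-img info accepts the same URI syntax and can be used to verify the image that was just created:

qemu-img info gluster://1.2.3.4/testvol/dir/a.img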
Booting VM image from GlusterFS backend
A VM image 'a.img' residing on gluster volume testvol can be booted using QEMU like this:
qemu-system-x86_64 -drive file=gluster://1.2.3.4/testvol/a.img,if=virtio
In addition to VM images, gluster drives can also be used as data drives:
qemu-system-x86_64 -drive file=gluster://1.2.3.4/testvol/a.img,if=virtio -drive file=gluster://1.2.3.4/datavol/a-data.img,if=virtio
Here 'a-data.img' from the datavol gluster volume appears as the second drive for the guest.
It is also possible to make use of libvirt to define a disk and use it with qemu:
Create a libvirt XML file to define the Virtual Machine
virt-install is a python wrapper mostly used to create a VM from a set of parameters. However, virt-install doesn't support network filesystems [https://bugzilla.redhat.com/show_bug.cgi?id=1017308].
Create a libvirt VM XML file (see http://libvirt.org/formatdomain.html) where the disk section is formatted so that the qemu driver for glusterfs is used. This can be seen in the following example XML description:
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source protocol='gluster' name='distrepvol/vm3.img'>
    <host name='10.70.37.106' port='24007'/>
  </source>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
- Define the VM from the XML file that was created earlier
virsh define <path-to-vm-xml-file>
- Verify that the VM is created successfully
virsh list --all
- Start the VM
virsh start <VM-name>
Verification
You can verify the disk image file that is being used by the VM:
virsh domblklist <VM-Domain-Name/ID>
The above should show the volume name and image name. Here is an example:
[root@test ~]# virsh domblklist vm-test2
Target Source
------------------------------------------------
vda distrepvol/test.img
hdc -
Reference:
For more details on this feature implementation and its advantages, please refer to:
http://raobharata.wordpress.com/2012/10/29/qemu-glusterfs-native-integration/
http://www.gluster.org/community/documentation/index.php/Libgfapi_with_qemu_libvirt