libvirt, by Red Hat, is a management package that provides a common interface to several virtualization technologies under Linux. While it works very well for single workstations and servers, we have run into limitations when it is used in a high availability environment where multiple hypervisors are in play and iSCSI servers provide the back end block devices.

Since libvirt uses the terms 'node' (for hypervisor) and 'domain' (for virtual), we use those terms here. We also use the term 'cluster' to mean a group of nodes (hypervisors) responsible for managing multiple domains (virtuals).

Limitations of libvirt include
* no central repository of domain definitions to provide consistency across multiple nodes. If a domain definition is modified on one node, it is not synchronized to the other nodes in the cluster.
* while it is possible to use "foreign" block device providers, doing so is intricate, requiring additional record keeping and commands to be executed on each node of the cluster to add/remove block devices
* no safeguards to keep a domain from running on multiple nodes at once (which can result in block device corruption)

havirt is a set of scripts to overcome these limitations. The first step is record keeping: knowing which domain is running on which node without having to log into each node manually to check, which is exactly what havirt does today.

Our setup:
* an NFS share is mounted on each node, preferably at a consistent location. In the following examples, it is mounted at /media/shared. This contains the scripts and files used by havirt. In our case, havirt is under a separate directory in the NFS share, with other subdirectories used for things like ISOs and images.
* every node in the cluster can make a passwordless ssh connection to any other node in the cluster using public key authentication. INSECURE: if any node is compromised, all other nodes can be reached from it trivially. Be sure to limit access to all nodes with firewalls and strong authentication.
* each node has a /root/.ssh/config file allowing us to reach the other nodes by a short alias.

Installation is simple, assuming you have a shared storage area:

svn co http://svn.dailydata.net/svn/sysadmin_scripts/trunk/virtuals /media/shared/virtuals
ln -s /media/shared/virtuals/havirt /usr/local/bin/havirt

The directory chosen is self contained; scripts, configuration files and database files are all stored in that tree. The file /media/shared/virtuals/havirt.conf can be used to override some of these locations if desired, but the files must be readable and writable by all nodes in the cluster.

===

Currently (2024-03-17), record keeping is implemented. The following commands currently exist:

havirt node update [node] [node]... # update a given node (or ALL)
havirt node list # display tab delimited list of node specs
havirt node scan # find domains on all nodes
havirt domain update ALL|RUNNING|[domain] [domain]... # update domains

havirt node update
Gets the resources available on the node(s) passed in. Issues 'virsh nodeinfo' on each node, parses the result and populates the definition in var/node.yaml, adding a new entry if one does not exist.

havirt node list
Generates tab delimited output of information about all nodes in the cluster.

havirt node scan
Scans each node in the cluster to determine which domains are currently running on it and stores the information in var/node_population.yaml. This should be run regularly to keep the database up to date; we run it from a cron job every 5 minutes.
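As an illustration only (the path assumes the /usr/local/bin/havirt symlink created during installation above; adjust path and schedule to your environment), a crontab entry along these lines keeps the population database current:

# /etc/cron.d/havirt -- example only
*/5 * * * * root /usr/local/bin/havirt node scan > /dev/null 2>&1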
havirt domain update
* Parses the config file for the domain (conf/domainname.xml) for useful information such as the VNC port, number of vCPUs and amount of memory, updating those values in var/domain.yaml.
* If the config file for a domain does not exist, gets a copy by running 'virsh dumpxml' on the appropriate node.
* If domain is set to ALL, does this for all domains already in var/domain.yaml. If domain is set to RUNNING, scans all nodes for running domains and acts on those.
NOTE: this does not refresh an existing config file. I intend to add a 'force' flag later, but for now, remove conf/domainname.xml if you want the file refreshed.

havirt domain list
Dumps the definition of one or more domains to STDOUT as a tab delimited list of values.

===

Additional functionality is planned in the near future.
NOTE: By default, havirt will simply display a list of commands to be run from the shell, though this can be overridden by a config file change or a command line flag.

havirt node maintenanceon nodename
Flags nodename as undergoing maintenance and removes it from the pool, then migrates all domains off of that node to other nodes in the cluster.

havirt node maintenanceoff nodename
Toggles the maintenance flag off for nodename, allowing it to accept migration/running of domains. Generally followed by havirt cluster balance.

havirt cluster balance
Checks the amount of resources used on each node and determines a way to even the resources (memory, vCPUs) out by migrating domains to different nodes.

havirt cluster validate
Checks all nodes in the cluster to ensure A) the same vnets exist, B) the same iSCSI targets are mounted, C) /root/.ssh/authorized_keys contains the keys of all other nodes, and D) /root/.ssh/config is the same on every node.

havirt node iscsiscan
Rescans iSCSI on the given node(s), adding/removing targets. Generally used after changes are made to an iSCSI target.

havirt domain start domainname [nodename]
Starts domain domainname on nodename (or the local node) using the config file conf/domainname.xml. Validates that the domain is not running on any node before executing 'virsh create' with that config file.

havirt domain stop domainname
Locates the node domainname is running on and issues a shutdown command. Upon success, sets the domain to 'manual' (to override 'keepalive').

havirt domain migrate domainname nodename
Migrates domainname to nodename after verifying enough resources exist on nodename.
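Since havirt by default only prints the commands it would run, the planned domain operations are expected to reduce to standard virsh invocations. The sketch below is illustrative only; 'webserver' and the node alias 'vm2' are placeholder names, and the config path is relative to the havirt tree:

virsh create conf/webserver.xml                       # start a transient domain from the stored config
virsh shutdown webserver                              # clean shutdown on the node it is running on
virsh migrate --live webserver qemu+ssh://vm2/system  # live migrate the running domain to node vm2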