1.5 Troubleshooting General VM Management Issues

The following sections provide solutions to the problems you might encounter while working with general VM management operations:

Volume Tools Hang While Scanning a Suspended Device

Source: Scanned device.
Explanation: When a mapped device is in a suspended state, volume tools such as vgscan, lvscan, and pvscan hang. If the vmprep job is run on such a device, it throws an error such as the following to alert you to the condition:
vmquery: /var/adm/mount/vmprep.df8fd49401e44b64867f1d83767f62f5: Failed to
mount vm image "/mnt/nfs_share/vms/rhel4tmpl2/disk0": Mapped device
/dev/mapper/loop7p2 appears to be suspended. This might cause scanning for
volume groups (e.g. vgscan) to hang.
WARNING! You may need to manually resume or remove this mapped device (e.g.
dmsetup remove /dev/mapper/loop7p2)!
Action: Because of this behavior, we recommend against using LVM and similar volume tools on a virtual machine managed by Orchestration Services.
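
If you encounter this condition, you can check the state of the mapped device and then resume or remove it manually with dmsetup, as the warning suggests. The following commands use the device name from the sample message above; substitute the mapped device reported in your own environment:

    # Check whether the mapped device is suspended (look at the State field)
    dmsetup info /dev/mapper/loop7p2

    # If the device is suspended, either resume it...
    dmsetup resume /dev/mapper/loop7p2

    # ...or remove the mapping if it is no longer needed
    dmsetup remove /dev/mapper/loop7p2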

SUSE Linux VMs Might Attempt To Partition a Read-only Device

Source: YaST Partitioner.
Explanation: When you build a SUSE Linux VM and specify a read-only virtual device for that VM, in some instances the YaST partitioner might propose a re-partitioning of the read-only virtual device.
Possible Cause: Although Xen normally attempts to notify the guest OS kernel about the mode (ro or rw) of the virtual device, under certain circumstances the YaST partitioner proposes a re-partitioning of the virtual device that has the most available disk space without considering the other device attributes. For example, if a specified CD-ROM device happens to be larger than the specified hard disk device, YaST attempts to partition the CD-ROM device, which causes the VM installation to fail.
Action: To work around this issue, connect a VNC console to the VM being built during the first stage of the VM install, then verify the partition proposal before you continue with the installation. If the partition proposal has selected an incorrect device, manually change the selected device before you continue with the installation of the VM.

RHEL 5 VMs Running the Kudzu Service Do Not Retain Network Interface Changes

Source: Kudzu service.
Explanation: Anytime you modify the hardware configuration (for example, changing the MAC address or adding a network interface card) of a RHEL 5 VM that is running the Kudzu hardware probing library, the VM does not retain the existing network interface configuration.
Possible Cause: When you start a RHEL 5 VM, the Kudzu service recognizes the hardware changes at boot time and moves the existing configuration for that network interface to a backup file. The service then rewrites the network interface configuration to use DHCP instead.
Action: To work around this problem, disable the Kudzu service within the RHEL VM by using the chkconfig --del kudzu command.
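
For example, run the following commands as root inside the RHEL 5 VM. Stopping the running service first is optional, but it prevents Kudzu from rewriting the network configuration again before the next reboot:

    # Stop the running Kudzu service
    service kudzu stop

    # Remove Kudzu from the startup configuration
    chkconfig --del kudzu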

Policies Applied to VM Resources Are Deleted

Source: VM clones awaiting provisioning.
Explanation: Provisioning code requires that VMs and VM clones be standalone (that is, they are removed from a template dependency and are no longer considered to be “linked clones”).
Possible Cause: VMs in PlateSpin Orchestrate 2.5 and later must be made standalone to receive and retain associated policies.
Action: Apply a conditional policy to the parent template that can be applied to the clones while they are running. Depending upon the facts set on the clone, the inherited VM host constraint can be conditionally applied to the clone.

The following is an example of a conditional policy that you could apply to the VM template to restrict vmhost based on resource attributes (group membership, etc.).

<policy>
    <constraint type="vmhost">
        <if>
            <contains fact="resource.groups" value="exclude_me"
                      reason="Only apply this vmhost constraint to resources NOT in exclude_me resource group" >
            </contains>
            <else>
                <if>
                    <defined fact="resource.some_boolean_fact" />
                    <eq fact="resource.some_boolean_fact" value="true" />
                    <then>
                        <contains fact="vmhost.resource.groups" value="first_vmhost_group"
                                reason="When a resource is not in the exclude_me group, when some_boolean_fact is true,
                                        provision to a vmhost in the first_vmhost_group"/>
                    </then>
                    <else>
                        <if>
                            <defined fact="resource.some_other_boolean_fact" />
                            <eq fact="resource.some_other_boolean_fact" value="true" />
                            <not>
                                <and>
                                    <eq fact="resource.id" value="never_use_this_resource"
                                      reason="Specifically exclude this resource from consideration." />
                                    <or>
                                        <eq fact="vmhost.cluster"
                                            factvalue="resource.provision.vmhost.cluster" />
                                        <eq fact="vmhost.cluster"
                                            factvalue="resource.provision.vmhost" />
                                    </or>
                                </and>
                            </not>
                            <then>
                                <contains fact="vmhost.resource.groups" value="another_vmhost_group"
                                        reason="When a resource is not in the exclude_me group, when some_boolean_fact is false,
                                                and some_other_boolean_fact is true (but certain other conditions are not met),
                                                provision to a vmhost in another_vmhost_group"/>
                            </then>
                        </if>
                    </else>
                </if>
            </else>
        </if>
    </constraint>
</policy>

VMs Provisioned from a VM Template Are Not Restarted When a VM Host Crashes

Source: VM host with VMs provisioned from a template.
Explanation: If a VM host crashes, VMs that were provisioned from a template on that host are not restarted on another active VM host. Instead, the Orchestration Server provisions another VM cloned from the original template, on the next available host. The disk files of the original clone are not destroyed (that is, “cleaned up”) after the crash, but the original VM template files are destroyed.

If a Discover Repository action is issued before the cloned VM is deleted from the crashed host, the Orchestration Server creates a new VM object with the zombie_ string prepended to the VM object name.

Possible Cause: While hosting a provisioned clone, the VM host crashed or the Orchestration Agent on that host went offline.
Action: To work around this issue, you can either remove the VM from the file system before the Orchestration Server rediscovers it, or you can issue a Destroy action on the discovered “zombie” VM.
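
For example, if you choose to remove the orphaned clone from the file system, delete its disk files from the repository before the next discovery runs. The path shown below is only an illustration; use the actual repository location and clone directory configured in your environment:

    # Hypothetical repository path; substitute the location of the orphaned clone's disk files
    rm -rf /var/lib/xen/images/<clone_name>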

Admin Password on Windows 2003/2008 Workloads Cannot Be Set by Users

Source: Windows 2003/2008 workloads in the Cloud Manager Web Console accessed by users.
Explanation: In order for a user to set the Administrator password when configuring a Windows 2003/2008 workload, the VM template (from which the workload is created) must not have an Administrator password set.
Action: To leave the Administrator password unset on a VM template, you must turn off the password complexity setting in the template's password policy.
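
For example, on a Windows 2003/2008 template you can disable the complexity requirement in the Local Security Policy console (Account Policies > Password Policy > "Password must meet complexity requirements"), or from an administrative command prompt with secedit. The following sketch assumes you export the local policy, set PasswordComplexity to 0 in the exported file, and then reapply it; verify the exact steps against your Windows version:

    rem Export the current local security policy
    secedit /export /cfg C:\secpol.cfg

    rem Edit C:\secpol.cfg and change the line to: PasswordComplexity = 0

    rem Reapply the modified policy
    secedit /configure /db %windir%\security\local.sdb /cfg C:\secpol.cfg /areas SECURITYPOLICY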

Unable to Provision a VM to Another Cluster Node Due to Reason “VM Networks Are Not Available”

Source: The Orchestration Console.
Explanation: A prerequisite for clustering is that all nodes in a cluster be symmetric; that is, every node must have visibility to all of the networks and storage provided by the cluster. In this case, because a VM host cluster must be able to place the VM on any node in the cluster, the networks shown as being available to that cluster are the intersection of the networks available on the VM host nodes that are members of the cluster (see the vmhost.networks fact on the cluster object).
Action: Reconfigure each of the cluster nodes to provide the networks required by the VM host cluster and re-run the Discover Hosts action.

Alternatively, you can reconfigure the VM to use another network that is available to all cluster nodes. After you choose a new network configuration for the VM, make sure you run the Save Config action to commit the changes to the VM configuration.

When you reconfigure the networks on a VM, one network option, all, is always available. This option designates that any network is suitable for VM placement. Choosing it allows the network constraint to pass, and the provisioning adapter then becomes responsible for configuring a network as it sees fit.

If Multiple Workloads Are Cloned Simultaneously, They Are Not Load Balanced Across Repositories

Source: The Orchestration Server.
Explanation: When multiple workloads are being cloned at the same time, the cloning process looks at the current state of the storage repositories to determine which repository should be used.
Possible Cause: With multiple asynchronous cloning processes running concurrently, the reported utilization of a repository does not reflect the space that the other running clone processes will consume when they complete. As a result, one repository can be repeatedly identified as the preferred repository until the earlier cloning jobs finish.
Action: Perform a single cloning operation at a time in order to achieve true load balancing, or be aware that multiple cloning operations can result in workload distributions between repositories that are not truly load balanced.

Block Disks Show Up as Regular Vdisks in Orchestrate

Source: The Orchestration Client.
Explanation: The first time discovery is run after adding a new block device to a VM, the block device is marked as a regular vdisk in the repository.
Possible Cause: The VM discovery took place before the pdisk was discovered, so the Orchestration Server has not yet matched them up.
Action: Run the VM discovery process a second time.