vSphere

This document is for users of the VMware vSphere provider. It describes the basic configuration that vSphere needs in order to run SkyPilot tasks.

Prepare Category & Tag

The vSphere provider requires a category and a tag. Follow the steps below to create them.

Create the Category:

  1. Open a web browser and log in to your vSphere Client.

  2. Navigate to Menu -> Tags & Custom Attributes.

  3. Click Tags -> CATEGORIES -> NEW.

  4. Fill in the Create Category form with the following values:

    Category Name: skypilot
    Tags Per Object: Many tags
    Associable Object Types: Datastore, Content Library, Library Item, VirtualMachine

  5. Click CREATE to finish.

Create the Tag:

  1. Open a web browser and log in to your vSphere Client.

  2. Navigate to Menu -> Tags & Custom Attributes.

  3. Click Tags -> TAGS -> NEW.

  4. Fill in the Create Tag form with the following values:

    Name: skypilot
    Category: skypilot

  5. Click CREATE to finish.
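Administrators who prefer a command line may be able to create the same category and tag with govc, the CLI from the govmomi project. The sketch below is not part of the procedure above. It assumes govc is installed and that GOVC_URL and credentials are exported. The two com.vmware.content.* type identifiers are assumptions about how vSphere names the "Content Library" and "Library Item" object types, so verify them against your vCenter before relying on them. The helper names run and create_skypilot_category_and_tag are hypothetical.

```shell
# Run a command, or only print it when DRY_RUN is 1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

create_skypilot_category_and_tag() {
    # -m allows many tags per object; each -t adds an associable object type.
    # Assumption: the com.vmware.content.* names match your vCenter version.
    run govc tags.category.create -m \
        -t Datastore -t VirtualMachine \
        -t com.vmware.content.Library -t com.vmware.content.library.Item \
        skypilot
    # Create the tag named skypilot inside the skypilot category.
    run govc tags.create -c skypilot skypilot
}

# Usage:
#   create_skypilot_category_and_tag              # print the commands only
#   DRY_RUN=0 create_skypilot_category_and_tag    # actually run them
```

Printing the commands by default makes it easy to review them before anything changes on the vCenter.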

Create VM Storage Policies

The vSphere provider relies on VM storage policies to decide where to place each VM. A shared datastore is recommended.

  1. Open a web browser and log in to your vSphere Client.

  2. Navigate to Menu -> Inventory.

  3. Click the Datastore icon.

    Select the datastores eligible for VM creation and assign the previously created skypilot tag to them.

  4. Navigate to Menu -> Policies and Profiles.

  5. Select VM Storage Policies and click the CREATE button.

  6. In the Name and description step, specify the name as skypilot_policy.

  7. In the Policy structure step, select Enable tag based placement rules.

  8. In the Tag based placement step, select the tag skypilot.

  9. In the Storage compatibility step, review the datastores.

  10. Review and click FINISH to create the policy.

Prepare VM image

The VM must be Linux-based. This document uses Ubuntu 20.04 as the base OS; other Linux distributions may also work but are not guaranteed.

Step 1. Prepare a Linux-based VM.

Open a web browser, log in to your vSphere Client, and create a Linux-based virtual machine.

If you are unfamiliar with creating a virtual machine, refer to the guide Deploying Virtual Machines.

Step 2. Create a user and enable passwordless sudo.

Log in to the prepared VM, create a user named ubuntu, and add it to the sudo group.

sudo adduser ubuntu
sudo usermod -aG sudo ubuntu

Edit the sudoers file to enable passwordless sudo.

sudo visudo

Add the following line:

ubuntu   ALL=(ALL:ALL) NOPASSWD: ALL
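As an alternative to editing the main sudoers file interactively, the same rule can be placed in a drop-in file under /etc/sudoers.d and syntax-checked with visudo. This is a minimal sketch, assuming a stock Ubuntu sudo setup that includes /etc/sudoers.d; the helper name nopasswd_rule and the file name 90-skypilot-ubuntu are hypothetical choices.

```shell
# Print the passwordless-sudo rule for the given user (defaults to ubuntu).
nopasswd_rule() {
    printf '%s ALL=(ALL:ALL) NOPASSWD: ALL\n' "${1:-ubuntu}"
}

# Usage on the VM:
#   nopasswd_rule ubuntu | sudo tee /etc/sudoers.d/90-skypilot-ubuntu >/dev/null
#   sudo chmod 0440 /etc/sudoers.d/90-skypilot-ubuntu
#   sudo visudo -cf /etc/sudoers.d/90-skypilot-ubuntu   # syntax check only
```

Running the syntax check before logging out helps avoid locking yourself out of sudo with a malformed rule.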

Step 3. Check the VMware Tools version.

Execute the following command to check the VMware Tools version. Version 10.1.0 or later is required.

vmware-toolbox-cmd -v
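The comparison against 10.1.0 can be scripted. The sketch below assumes that the command prints the version as the first whitespace-separated token (for example 12.1.5.39265 followed by a build identifier) and that GNU sort with version ordering (-V) is available, as it is on Ubuntu. The helper names version_ge and tools_version are hypothetical.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Uses GNU sort -V: if $2 sorts first (or both are equal), then $1 >= $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Keep only the leading version token of the text printed by
# `vmware-toolbox-cmd -v` (assumed format: "<version> (build-<id>)").
tools_version() {
    printf '%s\n' "$1" | awk '{print $1}'
}

# Usage on the VM:
#   installed="$(tools_version "$(vmware-toolbox-cmd -v)")"
#   if version_ge "$installed" 10.1.0; then echo "version OK"; else echo "too old"; fi
```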

Step 4. Enable custom scripts in VMware Tools.

Execute the following command to check if the enable-custom-scripts option is enabled:

sudo vmware-toolbox-cmd config get deployPkg enable-custom-scripts

If the enable-custom-scripts option is disabled, enable it by executing:

sudo vmware-toolbox-cmd config set deployPkg enable-custom-scripts true
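The check and the fix can be combined so that the option is set only when needed. This sketch assumes that the get command prints a line ending in "= true" when the option is enabled, and something else (such as "= false" or UNSET) otherwise; confirm the exact output on your VM. The function names are hypothetical.

```shell
# Succeed when the text printed by `config get` reports the option as true.
# Assumed enabled output: "[deployPkg] enable-custom-scripts = true"
custom_scripts_enabled() {
    case "$1" in
        *"= true") return 0 ;;
        *) return 1 ;;
    esac
}

# Enable the option only if it is not already enabled.
ensure_custom_scripts() {
    current="$(sudo vmware-toolbox-cmd config get deployPkg enable-custom-scripts)"
    if custom_scripts_enabled "$current"; then
        echo "enable-custom-scripts is already enabled"
    else
        sudo vmware-toolbox-cmd config set deployPkg enable-custom-scripts true
    fi
}

# Usage on the VM:
#   ensure_custom_scripts
```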

Step 5. Install the GPU driver.

Different GPUs require different drivers; choose the correct driver based on your GPU model and OS version. For example, on Ubuntu with an Nvidia GPU, the following driver works in most cases:

sudo apt install nvidia-headless-535-server --no-install-recommends

Step 6. Convert the VM to a template.

  1. Power off the VM.

  2. Select the VM, right-click, and choose Clone -> Clone as Template to Library.

  3. Select Template type: OVF, provide the template name, and click Next.

  4. Select a Content Library location and click Next.

    If you have not set up a local content library yet, create one first; refer to Create a Library.

  5. Review and click Finish.

Step 7. Tag the VM template.

Follow the steps below to tag the VM template:

  1. Navigate to Menu -> Content Libraries.

  2. Review the content libraries and click the one that contains your target VM template.

  3. Review the content library items and click your target VM template.

  4. Assign the relevant tags to the VM template.

    If you are not sure how to assign the tags, refer to Rules for tagging VM template.

References

Rules for tagging VM template

The vSphere driver for SkyPilot uses tags to identify which VM image to use for each GPU. The VI admin needs to tag the VM templates in the Content Library accordingly. The rules for tagging templates are:

  • All workloads that do not need an accelerator use the VM image tagged with skypilot-cpu. If multiple VM templates (content library items) are tagged with skypilot-cpu, only the first one is used, so the best practice is to tag only one image with skypilot-cpu.

  • The VI admin should create a default VM image for each accelerator vendor and tag it in the format skypilot-vendorname, e.g., skypilot-nvidia. This image serves as the fallback: if the user requests a specific Nvidia GPU (e.g., K1200) and the vSphere driver cannot find an image tagged skypilot-K1200, it uses the VM image tagged skypilot-nvidia.

  • All the tags must belong to the tag category skypilot.
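The lookup order described by these rules can be illustrated with a small sketch. It is not the vSphere driver's actual implementation; the function names are hypothetical, and the case-insensitive comparison is an assumption made so that a request for k1200 matches a tag written as skypilot-K1200.

```shell
# Lower-case a string so tags can be compared case-insensitively (assumption).
lower() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Print the tag of the image to use, or fail if no tagged image matches.
#   $1: requested GPU model (empty for CPU-only workloads)
#   $2: accelerator vendor, e.g. nvidia
#   remaining arguments: tags found on templates in the Content Library
select_image_tag() {
    model="$1"; vendor="$2"; shift 2
    if [ -z "$model" ]; then
        candidates="skypilot-cpu"
    else
        # Model-specific image first, then the vendor default as fallback.
        candidates="skypilot-$model skypilot-$vendor"
    fi
    for candidate in $candidates; do
        for tag in "$@"; do
            if [ "$(lower "$tag")" = "$(lower "$candidate")" ]; then
                printf '%s\n' "$tag"
                return 0
            fi
        done
    done
    return 1
}

# Usage:
#   select_image_tag k1200 nvidia skypilot-cpu skypilot-nvidia
```

With only skypilot-cpu and skypilot-nvidia available, a request for k1200 falls back to the vendor default, which matches the fallback rule above.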

Support new GPU

The vSphere cloud provider ships with a default list of supported GPUs. To see it, first run the sky check command; the GPU list can then be found in the ~/.sky/catalogs/v5/vsphere/accelerators.csv file. To support a new GPU, say the Nvidia K1200, add one more line at the end of the file:

Model,Type,MemoryMB,vCPUs,fullNames
...
K1200,GPU,4096,4,['GM107GL [Quadro K1200]']
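Appending the row can be scripted so that running it twice does not create a duplicate entry. This is a minimal sketch; the function name add_accelerator_row is hypothetical, and it assumes the catalog file already exists and ends with a newline.

```shell
# Append a catalog row unless a row for the same model already exists.
#   $1: path to accelerators.csv
#   $2: full CSV row (the model name is the first field)
add_accelerator_row() {
    catalog="$1"; row="$2"
    model="${row%%,*}"   # text before the first comma
    # awk exits 0 only when some row has this exact model in column 1.
    if awk -F, -v m="$model" '$1 == m { found = 1 } END { exit !found }' "$catalog"; then
        echo "$model is already listed in $catalog"
    else
        printf '%s\n' "$row" >> "$catalog"
    fi
}

# Usage:
#   add_accelerator_row ~/.sky/catalogs/v5/vsphere/accelerators.csv \
#       "K1200,GPU,4096,4,['GM107GL [Quadro K1200]']"
```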

If the GPU needs a particular driver, ask the VI admin to create a new VM template, upload it to the Content Library, and add a tag to it, e.g., GPU-k1200. If the driver in the vendor's default VM template (for example, the template tagged skypilot-nvidia) already supports the new GPU, you do not need to create a new VM template.