Kubernetes Cluster Setup


This is a guide for cluster administrators on how to set up Kubernetes clusters for use with SkyPilot.

If you are a SkyPilot user and your cluster administrator has already set up a cluster and shared a kubeconfig file with you, see Submitting tasks to Kubernetes for how to submit tasks to your cluster.

SkyPilot’s Kubernetes support is designed to work with most Kubernetes distributions and deployment environments.

To connect to a Kubernetes cluster, SkyPilot needs:

  • An existing Kubernetes cluster running Kubernetes v1.20 or later.

  • A kubeconfig file containing the access credentials and the namespace to use.
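
The version requirement above can be checked before handing a cluster to users. The following is a minimal sketch, not part of SkyPilot: it assumes you have captured the output of `kubectl version -o json`, and the helper names `server_version` and `meets_minimum` are hypothetical.

```python
import json
import re

# Minimum Kubernetes version stated in the requirements above.
MIN_VERSION = (1, 20)


def server_version(version_json):
    """Return (major, minor) from `kubectl version -o json` output."""
    info = json.loads(version_json)["serverVersion"]
    # Some distributions report a minor version with a suffix, e.g. "27+",
    # so keep only the leading digits.
    major = int(re.match(r"\d+", info["major"]).group())
    minor = int(re.match(r"\d+", info["minor"]).group())
    return major, minor


def meets_minimum(version_json):
    """True if the server version is at least MIN_VERSION."""
    return server_version(version_json) >= MIN_VERSION


# Hypothetical sample, trimmed to the fields used here.
sample = '{"serverVersion": {"major": "1", "minor": "27+", "gitVersion": "v1.27.3"}}'
print(meets_minimum(sample))
```

In practice, the sample string would be replaced by the real command output, for example by reading it from standard input.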

Deployment Guides

Below are minimal examples of setting up a new Kubernetes cluster in different environments, including hosted cloud services, and of generating kubeconfig files that SkyPilot can use.

Deploying locally on your laptop

To try out SkyPilot on Kubernetes on your laptop or run SkyPilot tasks locally without requiring any cloud access, we provide the sky local up CLI to create a 1-node Kubernetes cluster locally.

Under the hood, sky local up uses kind, a tool for creating a Kubernetes cluster on your local machine. It runs the Kubernetes cluster inside a container, so no setup beyond installing Docker and kind is required.

  1. Install Docker and kind.

  2. Run sky local up to launch a Kubernetes cluster and automatically configure your kubeconfig file:

    $ sky local up

  3. Run sky check and verify that Kubernetes is enabled in SkyPilot. You can now run SkyPilot tasks on this locally hosted Kubernetes cluster using sky launch.

  4. After you are done using the cluster, you can remove it with sky local down. This terminates the kind container and switches your kubeconfig back to its original context:

    $ sky local down


We recommend allocating at least 4 CPUs to your Docker runtime to ensure kind has enough resources. See instructions here.


kind does not support multiple nodes or GPUs, and it is not recommended for production use. If you want to run a private on-prem cluster, see the section on on-prem deployment.

Deploying on Google Cloud GKE

  1. Create a GKE standard cluster with at least 1 node. We recommend creating nodes with at least 4 vCPUs.

  2. Get the kubeconfig for your cluster. The following command will automatically update ~/.kube/config with a new kubecontext for the GKE cluster:

    $ gcloud container clusters get-credentials <cluster-name> --region <region>
    # Example:
    # gcloud container clusters get-credentials testcluster --region us-central1
    # For a zonal cluster, pass --zone <zone> (for example, us-central1-c) instead of --region.

  3. [If using GPUs] If your GKE nodes have GPUs, you may need to manually install NVIDIA drivers. You can do so by deploying the daemonset that matches the OS of your nodes:

    # For Container Optimized OS (COS) based nodes:
    $ kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
    # For Ubuntu based nodes:
    $ kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/ubuntu/daemonset-preloaded.yaml

    To verify that the GPU drivers are set up, run kubectl describe nodes and check that nvidia.com/gpu is listed under the Capacity section.

  4. Verify your kubeconfig (and GPU support, if available) is correctly set up by running sky check:

    $ sky check


GKE Autopilot clusters are not currently supported. Only GKE Standard clusters are supported.
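
The driver check in step 3 can also be scripted rather than read by eye. The sketch below is an illustration only: it assumes the node list has been exported with `kubectl get nodes -o json`, and the helper `gpu_capacity` and the sample node names are hypothetical.

```python
import json

GPU_RESOURCE = "nvidia.com/gpu"


def gpu_capacity(nodes_json):
    """Map node name -> GPU count from `kubectl get nodes -o json` output."""
    result = {}
    for node in json.loads(nodes_json)["items"]:
        capacity = node["status"].get("capacity", {})
        # Nodes without working drivers do not list the resource at all.
        result[node["metadata"]["name"]] = int(capacity.get(GPU_RESOURCE, "0"))
    return result


# Hypothetical sample, trimmed to the fields used here.
sample = json.dumps({"items": [
    {"metadata": {"name": "gpu-node"},
     "status": {"capacity": {"cpu": "8", "nvidia.com/gpu": "1"}}},
    {"metadata": {"name": "cpu-node"},
     "status": {"capacity": {"cpu": "4"}}},
]})

for name, count in gpu_capacity(sample).items():
    print(f"{name}: {count} GPU(s)")
```

A GPU node that reports zero here most likely still needs the driver daemonset from step 3.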

Deploying on Amazon EKS

  1. Create an EKS cluster with at least 1 node. We recommend creating nodes with at least 4 vCPUs.

  2. Get the kubeconfig for your cluster. The following command will automatically update ~/.kube/config with a new kubecontext for the EKS cluster:

    $ aws eks update-kubeconfig --name <cluster-name> --region <region>
    # Example:
    # aws eks update-kubeconfig --name testcluster --region us-west-2
  3. [If using GPUs] EKS clusters already come with Nvidia drivers set up. However, you will need to label the nodes with the GPU type. Use the SkyPilot node labelling tool to do so:

    $ python -m sky.utils.kubernetes.gpu_labeler

    This will create a job on each node to read the GPU type from nvidia-smi and assign a skypilot.co/accelerator label to the node. You can check the status of these jobs by running:

    $ kubectl get jobs -n kube-system

  4. Verify your kubeconfig (and GPU support, if available) is correctly set up by running sky check:

    $ sky check
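
Both the GKE and EKS commands above add a new context to ~/.kube/config, so it can help to confirm which context and namespace are active before running sky check. This is a rough sketch under the assumption that the kubeconfig has been dumped with `kubectl config view -o json`; the helper `current_context` and the sample names are hypothetical.

```python
import json


def current_context(config_json):
    """Return the active context's details from `kubectl config view -o json`."""
    config = json.loads(config_json)
    name = config.get("current-context", "")
    # "contexts" may be null when the kubeconfig is empty.
    for entry in config.get("contexts") or []:
        if entry["name"] == name:
            ctx = entry["context"]
            return {
                "name": name,
                "cluster": ctx["cluster"],
                "user": ctx["user"],
                # kubectl falls back to "default" when no namespace is set.
                "namespace": ctx.get("namespace", "default"),
            }
    raise ValueError(f"current context {name!r} not found in kubeconfig")


# Hypothetical sample, trimmed to the fields used here.
sample = json.dumps({
    "current-context": "testcluster",
    "contexts": [
        {"name": "old", "context": {"cluster": "c0", "user": "u0", "namespace": "dev"}},
        {"name": "testcluster", "context": {"cluster": "c1", "user": "u1"}},
    ],
})
print(current_context(sample))
```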

Deploying on on-prem clusters

You can also deploy Kubernetes on your on-prem clusters using off-the-shelf tools, such as kubeadm, k3s or Rancher. Please follow their respective guides to deploy your Kubernetes cluster.

Setting up GPU support

If your Kubernetes cluster has Nvidia GPUs, make sure the Nvidia device plugin is installed (i.e., the nvidia.com/gpu resource is available on each GPU node). Additionally, you will need to label each GPU node in your cluster with its GPU type. For example, a node with V100 GPUs must have the label skypilot.co/accelerators: v100.

We provide a convenience script that automatically detects GPU types and labels each node. You can run it with:

$ python -m sky.utils.kubernetes.gpu_labeler

Created GPU labeler job for node ip-192-168-54-76.us-west-2.compute.internal
Created GPU labeler job for node ip-192-168-93-215.us-west-2.compute.internal
GPU labeling started - this may take a few minutes to complete.
To check the status of GPU labeling jobs, run `kubectl get jobs --namespace=kube-system -l job=sky-gpu-labeler`
You can check if nodes have been labeled by running `kubectl describe nodes` and looking for labels of the format `skypilot.co/accelerators: <gpu_name>`.


GPU labelling is not required on GKE clusters: SkyPilot automatically uses the GKE-provided labels. However, you will still need to install drivers.


If the GPU labelling process fails, you can run python -m sky.utils.kubernetes.gpu_labeler --cleanup to clean up the failed jobs.
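
Once the labeling jobs finish, you may want to confirm that no GPU node was missed. The sketch below is one possible check, not a SkyPilot utility: it assumes `kubectl get nodes -o json` output, takes the label key as a parameter (the value used here is the one printed in the labeler output above), and the helper `unlabeled_gpu_nodes` and the sample node names are hypothetical.

```python
import json


def unlabeled_gpu_nodes(nodes_json, label_key):
    """List GPU nodes that do not carry the given accelerator label."""
    missing = []
    for node in json.loads(nodes_json)["items"]:
        capacity = node["status"].get("capacity", {})
        labels = node["metadata"].get("labels", {})
        has_gpu = int(capacity.get("nvidia.com/gpu", "0")) > 0
        if has_gpu and label_key not in labels:
            missing.append(node["metadata"]["name"])
    return missing


# Hypothetical sample, trimmed to the fields used here.
sample = json.dumps({"items": [
    {"metadata": {"name": "node-a",
                  "labels": {"skypilot.co/accelerators": "v100"}},
     "status": {"capacity": {"nvidia.com/gpu": "4"}}},
    {"metadata": {"name": "node-b", "labels": {}},
     "status": {"capacity": {"nvidia.com/gpu": "1"}}},
    {"metadata": {"name": "node-c", "labels": {}},
     "status": {"capacity": {"cpu": "4"}}},
]})
print(unlabeled_gpu_nodes(sample, "skypilot.co/accelerators"))
```

CPU-only nodes are skipped on purpose, since only GPU nodes need the label.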

Once the cluster is deployed and you have placed your kubeconfig at ~/.kube/config, verify your setup by running sky check:

$ sky check

Observability for Administrators

All SkyPilot tasks run in pods inside the Kubernetes cluster. As a cluster administrator, you can inspect running pods (e.g., with kubectl get pods -n <namespace>) to check which tasks are running and what resources they are using on the cluster.
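
For a quick overview without a dashboard, the pod listing can be summarized in a few lines. This is a minimal sketch that assumes the output of `kubectl get pods -n <namespace> -o json`; the helper `summarize_pods` and the sample pods are hypothetical, and only GPU limits are tallied to keep the example short.

```python
import json
from collections import Counter


def summarize_pods(pods_json):
    """Count pods per phase and total GPUs set in container limits."""
    phases = Counter()
    gpus = 0
    for pod in json.loads(pods_json)["items"]:
        phases[pod["status"].get("phase", "Unknown")] += 1
        for container in pod["spec"]["containers"]:
            limits = container.get("resources", {}).get("limits", {})
            gpus += int(limits.get("nvidia.com/gpu", 0))
    return {"phases": dict(phases), "gpus": gpus}


# Hypothetical sample, trimmed to the fields used here.
sample = json.dumps({"items": [
    {"status": {"phase": "Running"},
     "spec": {"containers": [{"resources": {"limits": {"nvidia.com/gpu": "2"}}}]}},
    {"status": {"phase": "Running"},
     "spec": {"containers": [{"resources": {}}]}},
    {"status": {"phase": "Pending"},
     "spec": {"containers": [{"resources": {"limits": {"nvidia.com/gpu": "1"}}}]}},
]})
print(summarize_pods(sample))
```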

You can also deploy tools such as the Kubernetes dashboard to view and manage SkyPilot tasks running on your cluster.

Kubernetes Dashboard

As a demo, we provide a sample Kubernetes dashboard deployment manifest that you can deploy with:

$ kubectl apply -f https://raw.githubusercontent.com/skypilot-org/skypilot/master/tests/kubernetes/scripts/dashboard.yaml

To access the dashboard, run:

$ kubectl proxy

In a browser, open http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/ and click on Skip when prompted for credentials.

Note that this dashboard can only be accessed from the machine where the kubectl proxy command is executed.


The demo dashboard is not secure and should not be used in production. Please refer to the Kubernetes documentation for more information on how to set up access control for the dashboard.