Iteratively Developing a Project
This page shows a typical workflow for iteratively developing and running a project on SkyPilot.
Getting an interactive node
Interactive nodes are easy-to-spin-up VMs that enable fast development and interactive debugging.
To provision a GPU interactive node named dev:
$ # Provisions/reuses an interactive node with a single K80 GPU.
$ sky gpunode -c dev --gpus K80
See the CLI reference for all flags such as changing the GPU type and count.
To run a command or a script on the cluster, use sky exec:
$ # If the user has written a task.yaml, this directly
$ # executes the `run` section in the task YAML:
$ sky exec dev task.yaml
$ # Run a script inside the workdir.
$ # Workdir contents are synced to the cluster (~/sky_workdir/).
$ sky exec dev -- python train_cpu.py
$ sky exec dev --gpus=V100:1 -- python train_gpu.py
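For reference, a task.yaml of the kind passed to sky exec might look like the sketch below. It is written with a shell heredoc so the example stays copy-pastable. The accelerator choice and the script name are illustrative assumptions taken from the commands above, not a required layout.

```shell
# Write a minimal, illustrative task.yaml into the current directory.
cat > task.yaml <<'EOF'
# Hypothetical example task; adjust the values to your project.
resources:
  accelerators: K80:1

# Local directory whose contents are synced to ~/sky_workdir/ on the cluster.
workdir: .

# The section that `sky exec dev task.yaml` executes.
run: |
  python train_gpu.py
EOF

# Then, against a running cluster named dev:
#   sky exec dev task.yaml
```

Keeping the run section short and pointing it at a script in the workdir makes it easy to edit the script locally and re-run sky exec on each iteration.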
Alternatively, the user can directly ssh into the cluster’s nodes and run commands:
$ # SSH into head node
$ ssh dev
$ # SSH into worker nodes
$ ssh dev-worker1
$ ssh dev-worker2
SkyPilot provides easy password-less SSH access by automatically creating an entry for each cluster in the local SSH configuration.
Referring to clusters by name also allows for seamless integration with common tools such as rsync and Visual Studio Code Remote.
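As a rough sketch of what name-based access enables, the snippet below runs a one-off command over ssh and pulls a directory back with rsync, both addressed by the cluster name. The run helper, the DRY_RUN variable, and the outputs/ directory are hypothetical and exist only for illustration. By default the commands are printed rather than executed, so the snippet is safe to try without a live cluster.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

CLUSTER=dev  # cluster name used in the examples above

# Run a one-off command on the head node (nvidia-smi is just an example).
run ssh "$CLUSTER" nvidia-smi

# Download a hypothetical outputs/ directory from the synced workdir.
# The tilde is left unexpanded locally so the remote side resolves it.
run rsync -avz "$CLUSTER:~/sky_workdir/outputs/" ./outputs/
```

Setting DRY_RUN=0 executes the same commands for real once the cluster is up.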
Refer to Syncing Code and Artifacts for more details on how to upload code and download outputs from the cluster.
Ending a development session
To end a development session:
$ # Stop at the end of the work day:
$ sky stop dev
$ # Or, to terminate:
$ sky down dev
To restart a stopped cluster:
$ # Restart it the next morning:
$ sky start dev
Stopping a cluster preserves the data on its attached disks. Billing for the instances stops, but the disks continue to be charged. The disks are reattached when the cluster is restarted.
Terminating a cluster will delete all associated resources (all billing stops), and any data on the attached disks will be lost. Terminated clusters cannot be restarted.
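The choice between stopping and terminating can be captured in a small wrapper, sketched below. The end_session function and the SKY_BIN override are hypothetical names introduced for illustration. Only the sky stop and sky down commands shown earlier are actually invoked.

```shell
# end_session keep|discard [cluster]
#   keep    -> sky stop: disks are preserved (and still billed); restartable.
#   discard -> sky down: all resources are deleted; cannot be restarted.
end_session() {
  mode="$1"
  cluster="${2:-dev}"
  sky_bin="${SKY_BIN:-sky}"  # hypothetical override, e.g. for dry runs
  case "$mode" in
    keep)    "$sky_bin" stop "$cluster" ;;
    discard) "$sky_bin" down "$cluster" ;;
    *)
      echo "usage: end_session keep|discard [cluster]" >&2
      return 2
      ;;
  esac
}

# Example usage (requires SkyPilot and an existing cluster):
#   end_session keep dev      # end of the work day
#   end_session discard dev   # project finished, data no longer needed
```

Naming the modes after what happens to the data, rather than after the subcommands, makes the irreversible option harder to pick by accident.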