Wine and Llama-2 Finetuning with SkyPilot
This tutorial shows you how to run SkyPilot jobs that finetune the wine and Llama-2 models on DKubeX.
Prerequisites
Make sure SkyPilot is configured properly on your DKubeX setup. For reference, see Prerequisites and Configuring SkyPilot on DKubeX.
Place the files required to run the SkyPilot job in your workspace. You can upload them with the filebrowser application in DKubeX, or get them directly from a repository using the DKubeX CLI.
The examples in this guide use the dkubex-examples repository, which contains the files for training the wine and Llama-2 models.
Clone the repository and access the files using the following commands:
git clone -b apps-v2 https://github.com/dkubeio/dkubex-examples.git
cd dkubex-examples/sky
Running SkyPilot Jobs
Two SkyPilot job examples are provided. To run one, follow the corresponding link in the table below.
| Example | Description | Link |
|---|---|---|
| Wine Model Finetuning | This example demonstrates how to finetune a wine model using SkyPilot. | |
| Llama-2 Finetuning | This example demonstrates how to finetune a Llama-2 model using SkyPilot. | |
Additional Commands
Cost Reporting
Use the following command to view the cost and duration of your model's runs on different accelerators. This helps you identify the most suitable resource for running the model:
d3x sky cost-report
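Once you have cost and duration figures for each accelerator, picking the best resource comes down to simple arithmetic: the estimated total cost is the hourly price multiplied by the run duration. The Python sketch below ranks accelerators on that basis. It does not parse the output of d3x sky cost-report; the RunEstimate class, the rank_by_cost helper, and the numbers in the example are hypothetical placeholders that you would replace with figures from your own report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunEstimate:
    """Cost/duration figures for one accelerator (hypothetical structure, values supplied by you)."""
    accelerator: str
    hourly_price_usd: float  # price per hour of a cluster using this accelerator
    duration_hours: float    # wall-clock time the job took (or is estimated to take)

    @property
    def total_cost_usd(self) -> float:
        # Total cost is the price per hour multiplied by the hours used.
        return self.hourly_price_usd * self.duration_hours


def rank_by_cost(estimates: list[RunEstimate]) -> list[RunEstimate]:
    """Sort cheapest first; when two runs cost the same, prefer the shorter one."""
    return sorted(estimates, key=lambda e: (e.total_cost_usd, e.duration_hours))


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real measurements.
    # Substitute the figures from your own cost or benchmark report.
    runs = [
        RunEstimate("A10", hourly_price_usd=1.0, duration_hours=2.0),
        RunEstimate("V100", hourly_price_usd=3.0, duration_hours=1.0),
        RunEstimate("T4", hourly_price_usd=0.5, duration_hours=5.0),
    ]
    for run in rank_by_cost(runs):
        print(f"{run.accelerator}: ${run.total_cost_usd:.2f} over {run.duration_hours:.1f} h")
```

Ranking purely by cost assumes you have no deadline; if turnaround time matters, you could instead sort by duration_hours and accept a higher total cost.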
Benchmarking
Use the following commands to launch benchmark clusters on different accelerators. Replace <benchmark_name> with the name you want to give your benchmark.
For the wine model finetuning example, use the following command:
d3x sky bench launch -y -n wine wine-benchmark.yaml --benchmark <benchmark_name>
For the Llama-2 finetuning example, use the following command:
d3x sky bench launch -y -n llama2 llama2-benchmark.yaml --benchmark <benchmark_name>
Note
To run this benchmark successfully, the cloud you have configured must provide A10, V100, and T4 accelerators.
Use this command to display the benchmark report:
d3x sky bench show <benchmark_name>
Use this command to list the benchmark history:
d3x sky bench ls
Checkpointing
To recover from preemptions in managed spot jobs, the application code can periodically checkpoint its progress to a cloud bucket mounted with SkyPilot Storage. When the job is restarted after a preemption, the program can then reload the latest checkpoint and resume from where it stopped.
For the wine model finetuning example, use the following command:
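The checkpoint-and-resume pattern described above can be sketched in Python as follows. This is a minimal sketch, not the code used in the dkubex-examples training scripts: it assumes the task YAML mounts a SkyPilot Storage bucket at /checkpoints (a hypothetical path), reads a hypothetical CHECKPOINT_DIR environment variable to override it, stores state as JSON instead of real model weights, and replaces the training step with a placeholder computation.

```python
import json
import os
from pathlib import Path

# Hypothetical mount point: assumes the task YAML mounts a SkyPilot Storage
# bucket here. The CHECKPOINT_DIR variable is also a hypothetical override.
CHECKPOINT_DIR = Path(os.environ.get("CHECKPOINT_DIR", "/checkpoints"))


def save_checkpoint(ckpt_dir: Path, step: int, state: dict) -> Path:
    """Write the state for `step` via a temp file, then rename it into place."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    final_path = ckpt_dir / f"step_{step:08d}.json"  # zero-padded so names sort by step
    tmp_path = final_path.with_name(final_path.name + ".tmp")
    tmp_path.write_text(json.dumps({"step": step, "state": state}))
    # Atomic on POSIX filesystems; a FUSE-mounted bucket may not guarantee this.
    os.replace(tmp_path, final_path)
    return final_path


def load_latest_checkpoint(ckpt_dir: Path):
    """Return (step, state) from the newest checkpoint, or (0, None) if there is none."""
    candidates = sorted(ckpt_dir.glob("step_*.json"))  # ignores leftover *.json.tmp files
    if not candidates:
        return 0, None
    payload = json.loads(candidates[-1].read_text())
    return payload["step"], payload["state"]


def train(ckpt_dir: Path, total_steps: int, save_every: int) -> dict:
    # Resume where the previous (possibly preempted) run stopped.
    step, state = load_latest_checkpoint(ckpt_dir)
    if state is None:
        state = {"loss_sum": 0.0}
    while step < total_steps:
        step += 1
        state["loss_sum"] += 1.0 / step  # placeholder for one real training step
        if step % save_every == 0 or step == total_steps:
            save_checkpoint(ckpt_dir, step, state)
    return state


if __name__ == "__main__":
    import tempfile

    # Local demo in a temporary directory: the first call stands in for a run
    # that stopped at step 50; the second call resumes from that checkpoint.
    with tempfile.TemporaryDirectory() as tmp:
        demo_dir = Path(tmp)
        train(demo_dir, total_steps=50, save_every=10)
        step, _ = load_latest_checkpoint(demo_dir)
        print("resuming from step", step)
        train(demo_dir, total_steps=100, save_every=10)
```

In a real training script you would call train(CHECKPOINT_DIR, ...) so that checkpoints land in the mounted bucket and survive the preempted instance. Writing to a temporary file first keeps a half-written checkpoint from being picked up as "latest" when the instance is reclaimed mid-save.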
d3x sky launch spot -y --env MLFLOW_TRACKING_TOKEN=$APIKEY -n wine wine.yaml
For the Llama-2 finetuning example, use the following command:
d3x sky launch spot -y --env MLFLOW_TRACKING_TOKEN=$APIKEY -n llama2 llama2.yaml
Use the following command to list the storage buckets where artifacts are saved at each checkpoint:
d3x sky storage ls