Finetuning Wine model using SkyPilot Job
This example shows how to run a SkyPilot job to finetune a Wine model. For the prerequisites, see Prerequisites.
Follow the steps below to finetune the Wine model using SkyPilot.
Edit the .yaml file in the sky folder in the cloned repo:
Replace the value of MLFLOW_TRACKING_URL with the user’s DKubeX URL.
The experiment name can be anything the user chooses.
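Before launching, it can help to confirm that the edits above are in place. The following is a minimal sketch, not part of d3x or SkyPilot: `preflight` is a hypothetical helper that only checks that the given .yaml file exists, that it mentions MLFLOW_TRACKING_URL, and that the APIKEY variable used by the launch commands is set. It does not validate the URL itself.

```shell
#!/bin/sh
# preflight: hypothetical helper for this guide, not a d3x or SkyPilot command.
# Usage: preflight <path-to-yaml>
preflight() {
    yaml_file="$1"

    # The task file edited in the step above must exist.
    if [ ! -f "$yaml_file" ]; then
        echo "missing file: $yaml_file" >&2
        return 1
    fi

    # The file is expected to reference MLFLOW_TRACKING_URL.
    if ! grep -q "MLFLOW_TRACKING_URL" "$yaml_file"; then
        echo "MLFLOW_TRACKING_URL not found in $yaml_file" >&2
        return 1
    fi

    # The launch commands pass $APIKEY as MLFLOW_TRACKING_TOKEN.
    if [ -z "${APIKEY:-}" ]; then
        echo "APIKEY is not set" >&2
        return 1
    fi

    echo "ok: $yaml_file"
}
```

For example, running `preflight wine.yaml` from the folder that contains the file prints `ok: wine.yaml` when all three checks pass.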
The user can use one of three options to launch the resources:
On-demand instance. For the wine model finetuning example, use the following command:
d3x sky launch -y --env MLFLOW_TRACKING_TOKEN=$APIKEY -n wine-1 wine.yaml
This option finds the most cost-effective on-demand computing resource available and starts the task on it. The VM stays reserved after the task finishes, so the user must terminate it manually.
Spot instance. For the wine model finetuning example, use the following command:
d3x sky launch -y --env MLFLOW_TRACKING_TOKEN=$APIKEY --use-spot -n wine-1 wine.yaml
This option scans for the cheapest spot instance across all available zones in a region and launches the job on it. However, spot instances may be reclaimed by the cloud provider, in which case the user’s task is terminated.
Managed spot job. For the wine model finetuning example, use the following command:
d3x sky spot launch -y --env MLFLOW_TRACKING_TOKEN=$APIKEY -n wine-1 wine.yaml
With managed spot jobs, SkyPilot automatically restarts the task on a different VM if the instance is preempted. Periodically saved checkpoints allow the job to resume from where it stopped.
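The three launch options can be summarized in a small sketch. The helper name `launch_cmd` and the mode labels `on-demand`, `spot`, and `managed-spot` are assumptions made for this example. The commands it prints are the ones shown above. The function only prints the command, so it can be inspected before anything is launched.

```shell
#!/bin/sh
# launch_cmd: hypothetical helper that prints the launch command for one option.
# Usage: launch_cmd <mode> [name] [yaml]
#   mode: on-demand | spot | managed-spot (labels chosen for this example)
#   name: value passed to -n (defaults to wine-1, as in this guide)
#   yaml: task file (defaults to wine.yaml, as in this guide)
launch_cmd() {
    mode="$1"
    name="${2:-wine-1}"
    yaml="${3:-wine.yaml}"

    # \$APIKEY is escaped so the printed command still refers to the variable.
    case "$mode" in
        on-demand)
            echo "d3x sky launch -y --env MLFLOW_TRACKING_TOKEN=\$APIKEY -n $name $yaml" ;;
        spot)
            echo "d3x sky launch -y --env MLFLOW_TRACKING_TOKEN=\$APIKEY --use-spot -n $name $yaml" ;;
        managed-spot)
            echo "d3x sky spot launch -y --env MLFLOW_TRACKING_TOKEN=\$APIKEY -n $name $yaml" ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1 ;;
    esac
}

# Print the command for the spot option.
launch_cmd spot
```

To run the printed command instead of only displaying it, it can be passed back to the shell, for example with `eval "$(launch_cmd spot)"`, after APIKEY has been set.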
To monitor the status of the clusters, run the command:
d3x sky status
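Once the training is completed, bring the cluster down by running the following command, replacing <cluster_name> with the cluster name shown by d3x sky status:
d3x sky down -y <cluster_name>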
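As a sketch of the teardown step, the helper below prints the command for a given cluster and refuses an empty name, which guards against running the command with an unset variable. `teardown_cmd` and the cluster name `my-cluster` are hypothetical. The real name should be taken from the output of `d3x sky status`.

```shell
#!/bin/sh
# teardown_cmd: hypothetical helper that prints the teardown command.
# Usage: teardown_cmd <cluster_name>
teardown_cmd() {
    cluster_name="$1"

    # Refuse to build a command without a cluster name.
    if [ -z "$cluster_name" ]; then
        echo "cluster name is required" >&2
        return 1
    fi

    echo "d3x sky down -y $cluster_name"
}

# "my-cluster" is a placeholder; use the name reported by `d3x sky status`.
teardown_cmd my-cluster
```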
Note
The user can add the -i 10 --down arguments to the launch command, which brings down the SkyPilot cluster after 10 minutes of inactivity. The time can be increased if the cluster needs to stay up longer between experiments, and the arguments can be removed or edited as required.
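The note above can be illustrated with a short sketch. It assumes the arguments are added to the on-demand `d3x sky launch` command from this guide and placed directly after `-y`. The helper name `autostop_launch_cmd` is hypothetical, and the function only prints the resulting command.

```shell
#!/bin/sh
# autostop_launch_cmd: hypothetical helper that prints a launch command with
# the -i <minutes> --down arguments from the note above.
# Usage: autostop_launch_cmd [idle_minutes]   (defaults to 10)
autostop_launch_cmd() {
    idle_minutes="${1:-10}"

    # Accept only non-negative whole numbers of minutes.
    case "$idle_minutes" in
        ''|*[!0-9]*)
            echo "idle minutes must be a non-negative integer" >&2
            return 1 ;;
    esac

    # \$APIKEY is escaped so the printed command still refers to the variable.
    echo "d3x sky launch -y -i $idle_minutes --down --env MLFLOW_TRACKING_TOKEN=\$APIKEY -n wine-1 wine.yaml"
}

# Keep the cluster up for 30 idle minutes instead of 10.
autostop_launch_cmd 30
```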
For more commands for running and managing SkyPilot jobs, see Additional Commands.