SkyPilot: Setting up reverse SSH tunneling to run Sky jobs
A reverse SSH tunnel is a technique used to securely connect to a client machine (usually behind a firewall or NAT) from a server. It establishes a connection from the client to the server, allowing the server to initiate communication with the client.
If the Sky cluster cannot reach the DKubeX cluster, SkyPilot cannot run jobs such as ingestion or LLM evaluation as is. In these scenarios, setting up a reverse SSH tunnel between the two clusters becomes necessary.
For example, suppose you run a data ingestion job with this command:
d3x dataset ingest -d demodataset --sky-cluster=ingestion-demodataset --dkubex-url https://123.45.67.89:32443/ --dkubex-apikey ${DKUBEX_APIKEY} --config /home/data/ingest.yaml --remote-sky
and you get a Weaviate error. This happens because the Sky cluster cannot reach back to your DKubeX cluster. In this case, you need to set up a reverse SSH tunnel.
Follow the steps provided below to set up a reverse SSH tunnel between the DKubeX and the Sky cluster.
Run the following command and note down the Head IP for the Sky cluster. This will be necessary later.
d3x sky status -ra
On your local terminal, log in (via SSH) to your DKubeX installer node/DKubeX backend CLI.
Exec into the Sky-Controller pod in the DKubeX backend by running the following command:
kubectl exec -it -n d3x sky-controller-v2-0 -- bash
Get the SSH private key for the Sky cluster and save it as a file named key-ssh-sky by running the following command:
cat /root/.ssh/sky-key > key-ssh-sky
Set owner-only read permission on the key-ssh-sky file by running the following command:
chmod 400 key-ssh-sky
Run cat key-ssh-sky to display the contents of the file, and copy the entire contents.
Run exit to leave the Sky-Controller pod. Back on the installer node, run vim key-ssh-sky to create the same file that we created in the Sky-Controller pod, paste the contents copied in the previous step, and save the file. If SSH later rejects the key because its permissions are too open, run chmod 400 key-ssh-sky here as well.
Start the reverse SSH tunnel by running the following command. Replace <sky-cluster-ip> with the Head IP of the Sky cluster noted earlier. This command keeps running; do not stop it or close this terminal.
ssh -NT -R 32443:localhost:32443 ubuntu@<sky-cluster-ip> -i key-ssh-sky
For example:
ssh -NT -R 32443:localhost:32443 ubuntu@98.76.54.32 -i key-ssh-sky
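The tunnel command above can be read option by option, and it is often combined with OpenSSH keep-alive settings so that a dropped connection is noticed instead of hanging silently. The sketch below is an illustration rather than part of the DKubeX procedure: build_tunnel_cmd is a hypothetical helper that only prints the command, and the keep-alive values (60 seconds, 3 probes) are assumptions you may want to tune.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) the reverse-tunnel command
# for a given Sky head IP, key file, and port.
build_tunnel_cmd() {
  local head_ip="$1"
  local key_file="${2:-key-ssh-sky}"
  local port="${3:-32443}"
  # -N : do not run a remote command, only forward ports
  # -T : do not allocate a terminal
  # -R : listen on <port> on the Sky head node and forward connections
  #      back to localhost:<port> on the machine where ssh is started
  # ExitOnForwardFailure : exit if the remote port cannot be bound
  # ServerAlive*         : detect a dead connection (assumed values)
  printf 'ssh -NT -R %s:localhost:%s -o ExitOnForwardFailure=yes -o ServerAliveInterval=60 -o ServerAliveCountMax=3 -i %s ubuntu@%s\n' \
    "$port" "$port" "$key_file" "$head_ip"
}

build_tunnel_cmd 98.76.54.32
```

Printing the command first makes it easy to review before running it in the terminal that will hold the tunnel open.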
Open another local terminal and log in (via SSH) to your DKubeX installer node/DKubeX backend CLI.
Log in (via SSH) to your Sky cluster using the key-ssh-sky file created earlier by running the following command. Replace <sky-cluster-ip> with the Head IP of the Sky cluster noted earlier.
ssh ubuntu@<sky-cluster-ip> -i key-ssh-sky
For example:
ssh ubuntu@98.76.54.32 -i key-ssh-sky
Verify the reverse tunnel and port mapping by running the following command on the Sky cluster:
curl -k https://localhost:32443/mlflow
If the response is similar to the following, the reverse tunnel is working.
<html> <head><title>302 Found</title></head> <body> <center><h1>302 Found</h1></center> <hr><center>nginx</center> </body> </html>
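The same check can be scripted by looking only at the HTTP status code instead of reading the HTML. This is a minimal sketch: tunnel_status_ok is a hypothetical helper, and treating any 2xx or 3xx code as success is an assumption based on the 302 response shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the status code suggests that the
# request reached DKubeX through the tunnel (2xx or 3xx, e.g. 302).
tunnel_status_ok() {
  case "$1" in
    2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# curl reports 000 as the status code when it cannot connect at all.
# -k skips certificate verification, as in the command above.
code="$(curl -k -s -o /dev/null --max-time 10 -w '%{http_code}' https://localhost:32443/mlflow || true)"
if tunnel_status_ok "$code"; then
  echo "tunnel OK (HTTP $code)"
else
  echo "tunnel not reachable (HTTP $code)"
fi
```

Run this on the Sky cluster, in the same session used for the curl check.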
Go back to your DKubeX workspace CLI and run the same command you used to run the job earlier with one change:
For the --dkubex-url flag, pass https://localhost:32443 instead of your DKubeX setup URL.
For example,
d3x dataset ingest -d demodataset --sky-cluster=ingestion-demodataset --dkubex-url https://localhost:32443 --dkubex-apikey ${DKUBEX_APIKEY} --config /home/data/ingest.yaml --remote-sky
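If the original job command is saved in a script or in shell history, the one-flag change can be applied mechanically. The sketch below assumes the flag is written either as "--dkubex-url <value>" or "--dkubex-url=<value>"; use_tunnel_url is a hypothetical helper, not a d3x feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a job command on stdin and prints it with the
# value of --dkubex-url replaced by the tunnel endpoint.
use_tunnel_url() {
  sed -E 's#(--dkubex-url[ =])[^ ]+#\1https://localhost:32443#'
}

# Single quotes keep ${DKUBEX_APIKEY} unexpanded in this demonstration.
echo 'd3x dataset ingest -d demodataset --sky-cluster=ingestion-demodataset --dkubex-url https://123.45.67.89:32443/ --dkubex-apikey ${DKUBEX_APIKEY} --config /home/data/ingest.yaml --remote-sky' | use_tunnel_url
```

The helper only prints the rewritten command, so it can be reviewed before it is run.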
The Sky job should now be able to reach your DKubeX cluster through the tunnel and run.
Attention
You need to set up the reverse SSH tunnel again in the following scenarios:
If the Sky cluster has been taken down, either manually or by autodown.
If you are creating a new Sky cluster and want to run jobs on it.