Managing DKube

DKube can be managed from the $HOME/.dkube directory on the master node.

Upgrading DKube

DKube can be upgraded through Helm. The flow is:

  • Get the current values from the installation

  • Upgrade with the same values, but with a new DKube version

The commands are:

helm get values <Release Name> --all | sudo bash -c 'cat - > values-upgrade.yaml'
helm upgrade -f values-upgrade.yaml <Release Name> dkube-helm/dkube-deployer --set version=<Upgrade DKube Version> --timeout 1500s

Note

The Release Name is the same name that was used during installation. The complete list of Helm installation Release Names can be obtained using the helm list -a command.
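
The upgrade flow can also be wrapped in a small script. The following is a minimal sketch under stated assumptions, not part of the DKube tooling: the function name upgrade_dkube and the DRY_RUN variable are hypothetical, and the values file is written by the current user rather than through sudo. The helm commands themselves are the ones listed in this section.

```shell
# Sketch only: upgrade_dkube and DRY_RUN are hypothetical names, not DKube tooling.
# Usage: upgrade_dkube <Release Name> <Upgrade DKube Version>
upgrade_dkube() {
  local release="$1"
  local version="$2"
  local values="values-upgrade.yaml"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the commands instead of running them.
    echo "helm get values $release --all > $values"
    echo "helm upgrade -f $values $release dkube-helm/dkube-deployer --set version=$version --timeout 1500s"
    return 0
  fi

  # Step 1: save the values of the current installation.
  helm get values "$release" --all > "$values" || return 1

  # Step 2: upgrade with the same values and the new DKube version.
  helm upgrade -f "$values" "$release" dkube-helm/dkube-deployer \
    --set "version=$version" --timeout 1500s
}
```

Calling the function with DRY_RUN=1 set prints both commands without contacting the cluster, which makes it possible to review the release name and version before running the real upgrade.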


Uninstalling DKube

DKube can be uninstalled using the following command:

helm uninstall <Release Name> --timeout 900s

Note

The Release Name is the same name that was used during installation. The complete list of Helm installation Release Names can be obtained using the helm list -a command.

After the uninstall completes, the nodes should be cleaned up. The following command should be run from the $HOME/.dkube folder.

sudo ./dkubeadm node cleanup

Reinstalling DKube


Backup and Restore

The DKube database can be backed up and restored from the installation/management node. This function is performed from the $HOME/.dkube directory. Both functions rely on a backup.ini file for configuration, and use the same k8s.ini and dkube.ini files that were edited as part of the installation.

Important

There are some portions of the DKube database that are not backed up. These are explained in the following section.

Backup Exclusions

The following items are not backed up through the DKube backup feature:

  • Any System packages installed from a command line

  • Any files added to folders outside the workdir

Important

To preserve the complete image, including these items, push it to a Docker registry.

Editing the backup.ini File

The backup.ini file provides the target for the backup, the backup options, and the credentials where necessary.

_images/backup_ini_R2x.png

Important

The input fields must be enclosed in quotes, as shown in the figure.

The “local” backup option will save the snapshot to a folder on the local cluster. Only the following sections should be filled in.

Field        Value
PROVIDER     local
BACKUP_DIR   Target folder on local cluster
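
Because every value in backup.ini must be enclosed in quotes, a quick check before starting a backup may catch mistakes early. The helper below is a hypothetical sketch and not part of DKube. It assumes the file uses KEY="value" lines, with any section headers in square brackets; that layout should be confirmed against the backup.ini shipped with the installation.

```shell
# find_unquoted_values FILE
# Print (with line numbers) every KEY=value line whose value is not wrapped in
# double quotes. Hypothetical helper; the KEY="value" layout is an assumption.
# Exit status is 0 when at least one unquoted value is found, non-zero otherwise.
find_unquoted_values() {
  grep -nE '^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*[[:space:]]*=' "$1" \
    | grep -vE '=[[:space:]]*"[^"]*"[[:space:]]*$'
}
```

Running find_unquoted_values backup.ini from the $HOME/.dkube directory lists the lines that need attention; no output means every assignment it recognized is quoted.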


Backup

DKube backup takes a snapshot of the database and saves it on the target provider. The backup snapshots can be listed, and restored to a DKube cluster.

Important

There are some items that are not backed up by this process. These are explained in the section Backup Exclusions.

Note

For DKube version R2.x only, DKube must be stopped before initiating the backup. The following command is used to stop DKube.

sudo ./dkubeadm dkube shutdown

The backup is initiated by the following command. A backup name must be provided.

sudo ./dkubeadm backup start <name>
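
Since a backup name is required, repeated backups need a distinct name each time. One option is to derive the name from the current time. The helper below is a hypothetical sketch: the dkube prefix and the timestamp format are assumptions, and any restrictions DKube places on backup names have not been verified here.

```shell
# backup_name [PREFIX]
# Print a name such as dkube-20240131-235959 (prefix defaults to "dkube").
# Hypothetical helper; the naming scheme is an assumption, not a DKube rule.
backup_name() {
  printf '%s-%s\n' "${1:-dkube}" "$(date +%Y%m%d-%H%M%S)"
}

# Usage, from the $HOME/.dkube directory:
#   sudo ./dkubeadm backup start "$(backup_name)"
```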

List of Snapshots

To see a list of backup snapshots on the current provider, and to identify the ID for each one, the following command is used.

sudo ./dkubeadm backup list

An output similar to the figure below will be shown.

_images/Backup_List.png

Delete Backup Snapshot

A backup snapshot can be removed from the current storage location. This is accomplished by using the ID listed in the backup list.

sudo ./dkubeadm backup delete <ID>

Restore

The DKube restore function will take a DKube database and install it on the cluster.

Important

Restoring the database will overwrite the current database on that cluster.

The restore function uses a backup ID to identify which backup to choose. The restore function is initiated by the following command, using the ID from the backup list. It will copy the database to the cluster and start DKube.

sudo ./dkubeadm restore start <ID>

Restore Failure

If the restore fails, the user will be given the option of reverting to the DKube database that was on the cluster prior to the restore.


DKube Migration

The DKube database (Runs, Notebooks, Inferences, Models, Pipelines, etc.) can be migrated from one cluster to another. This function is performed from the $HOME/.dkube/cli directory.

Important

The entities must all be in the Stopped state to initiate migration.

Editing the migration.ini File

In order to provide the source and destination information, the migration.ini file must be edited. An example ini file is provided below.

_images/migration_ini.png

Important

The fields in the migration.ini file must be enclosed in quotes, as shown in the example.

Field      Value
Name       User-generated name to use for migration tracking
JWT        Authorization token as described below
dkubeurl   IP address of the DKube instance
jobs       List of entities to migrate

Getting the Access Token

The JWT token in the migration.ini file is available from the DKube Operator screen.

_images/Operator_Right_Menu_Developer.png




_images/Operator_Developer_Settings_Popup.png

Executing the Migration

After the migration.ini file has been filled in, the migration is initiated with the following command.

sudo ./dkubectl migration start --config migration.ini

The migration will begin, and when the status shows 100% complete the app exits.

_images/Migration_Status.png

If the user wants to execute the migration more than once, one of the following steps must be taken:

  • Edit the migration.ini file to use another name

  • Delete the existing name using the following command.

sudo ./dkubectl migration delete --name <migration name> --config migration.ini

Stop and Start DKube

If DKube needs to be stopped and restarted, the following commands can be used.

Stopping DKube

The following command will stop all activity in DKube:

sudo ./dkubeadm dkube shutdown

Starting DKube

The following command will restart DKube after the cluster is brought back up:

sudo ./dkubeadm dkube start

Restarting DKube After Cluster Restart

When the cluster is restarted after a shutdown, DKube will restart automatically. In order to determine if DKube is operational again, the following command will provide the current status.

sudo ./dkube-infra-reconciler.sh

Docker Status

If DKube does not respond after a few minutes, it is possible that Docker has not restarted properly after the cluster was brought back up. The following command will provide the current status of Docker:

systemctl status docker

If Docker is not active, the following command will restart it:

systemctl restart docker
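
The check and the restart can be combined so that Docker is restarted only when it is not active. This is a minimal sketch: the function name ensure_docker_active is hypothetical, and it assumes the shell has sufficient privileges to restart the service. It relies on systemctl is-active, which returns a zero exit status only when the unit is active.

```shell
# ensure_docker_active
# Restart Docker only if systemd reports the unit as not active.
# Hypothetical helper; run it with privileges sufficient to restart services.
ensure_docker_active() {
  if systemctl is-active --quiet docker; then
    echo "docker is active"
  else
    echo "docker is not active, restarting"
    systemctl restart docker
  fi
}
```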

Managing the Kubernetes Cluster

Adding A100 GPUs to the Cluster

Nodes that include NVIDIA A100 GPUs require additional setup before DKube can use the GPUs correctly in MIG mode. MIG mode is described in detail at Introduction to MIG.

Enable MIG Devices

In order to enable MIG mode for the A100 GPUs, run the following command:

sudo nvidia-smi -mig 1

Note

If the above command asks for a reboot, reboot the system using the “sudo reboot” command, and rerun the command when the system has restarted.
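
To confirm that MIG mode took effect, the current mode can be queried for each GPU. The sketch below is an illustration rather than a DKube procedure: the function name all_mig_enabled and the NVIDIA_SMI override variable are hypothetical, and it assumes that nvidia-smi reports the mig.mode.current query field as Enabled or Disabled, one line per GPU.

```shell
# all_mig_enabled
# Succeed only if every GPU reports MIG mode "Enabled".
# Hypothetical helper. NVIDIA_SMI can point to another binary or wrapper.
# Returns 0 if all GPUs are enabled, 1 if any is not, 2 if the query fails.
all_mig_enabled() {
  local smi="${NVIDIA_SMI:-nvidia-smi}"
  local modes
  modes="$($smi --query-gpu=mig.mode.current --format=csv,noheader)" || return 2
  [ -n "$modes" ] || return 2
  if printf '%s\n' "$modes" | grep -qv '^Enabled$'; then
    return 1
  fi
  return 0
}
```

GPUs that do not support MIG are expected to report a value other than Enabled, so on a mixed node the check fails and the per-GPU output should be inspected directly.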

List GPU Instance Profiles

Get a list of the GPU instance profiles by running the following command:

sudo nvidia-smi mig -lgip

This will bring up the information in a form similar to the image below.

_images/A100_GPU_Profiles.png

Create MIG Devices

MIG devices are created using a command of the form:

sudo nvidia-smi mig -cgi <ID>,<ID>,<ID>, ... -C

For example, the following command will create:

  • 3 instances with the MIG profile 1g.5gb, and

  • 2 instances with the MIG profile 2g.10gb

sudo nvidia-smi mig -cgi 19,19,19,14,14 -C
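
Typing long ID lists by hand is error-prone, so the list can be generated from profile ID and count pairs. The helper below is a hypothetical sketch, not an NVIDIA or DKube tool; it only builds the comma-separated string and leaves the nvidia-smi call to the user. The IDs 19 and 14 are taken from the example above.

```shell
# mig_profile_list ID COUNT [ID COUNT ...]
# Print each profile ID repeated COUNT times, comma-separated, in a form that
# can be passed to "nvidia-smi mig -cgi". Hypothetical helper.
mig_profile_list() {
  local list=""
  local i
  while [ "$#" -ge 2 ]; do
    i=0
    while [ "$i" -lt "$2" ]; do
      list="${list:+$list,}$1"
      i=$((i + 1))
    done
    shift 2
  done
  printf '%s\n' "$list"
}

# Usage: 3 instances of profile ID 19 and 2 instances of profile ID 14
#   sudo nvidia-smi mig -cgi "$(mig_profile_list 19 3 14 2)" -C
```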

Delete MIG Instances

MIG instances can be deleted with the following commands, which destroy the compute instances first and then the GPU instances:

sudo nvidia-smi mig -dci ; sudo nvidia-smi mig -dgi

Changing Nodes or GPUs

Changing nodes or GPUs on the cluster is done through the specific managed k8s platform. The only requirement is that the necessary software drivers be installed on the new node if the platform requires it. This is accomplished based on the k8s platform used.

  • For changes in nodes

    • Edit the $HOME/.dkube/k8s.ini file to add the new nodes as described in this section

    • The other fields should not be changed

    • Ensure that the new node is accessible passwordlessly from the installation node as described in Cluster Access from the Installation Node

  • For either node or GPU changes

    • Run the node setup command as described in Node Setup to prepare the new nodes

DKube will automatically recognize the new nodes after the setup command.