Managing DKube¶

DKube can be managed from the $HOME/.dkube directory on the master node.

Upgrading DKube¶

DKube can be upgraded through Helm. The flow is:

Get the current values from the installation
Upgrade with the same values, but with a new DKube version

The commands are:

helm get values <Release Name> --all | sudo bash -c 'cat - > values-upgrade.yaml'
helm upgrade -f values-upgrade.yaml <Release Name> dkube-helm/dkube-deployer --set version=<Upgrade DKube Version> --timeout 1500s

Note

The Release Name is the same name that was used during installation. The complete list of Helm installation Release Names can be obtained using the helm list -a command.
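
The two upgrade commands can be wrapped in a small function so that the release name and target version are supplied once. The following is a minimal sketch, not something shipped with DKube: the function name `dkube_upgrade` and the `DRY_RUN` switch are hypothetical, and it writes values-upgrade.yaml to the current directory without sudo, assuming that directory is writable by the current user.

```shell
# Hypothetical helper (not part of DKube): upgrade a Helm release to a new
# DKube version while reusing the values of the current installation.
# Set DRY_RUN=1 to print the helm commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

dkube_upgrade() {
    release="$1"
    version="$2"
    if [ -z "$release" ] || [ -z "$version" ]; then
        echo "usage: dkube_upgrade <release-name> <dkube-version>" >&2
        return 1
    fi
    # Step 1: save the values used by the current installation.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "helm get values $release --all > values-upgrade.yaml"
    else
        helm get values "$release" --all > values-upgrade.yaml || return 1
    fi
    # Step 2: upgrade with the same values and the new DKube version.
    run helm upgrade -f values-upgrade.yaml "$release" \
        dkube-helm/dkube-deployer --set version="$version" --timeout 1500s
}
```

Running it first with `DRY_RUN=1` shows the exact commands that would be issued, which makes it easy to confirm the release name and version before the real upgrade.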

Uninstalling DKube¶

DKube can be uninstalled using the following command:

helm uninstall <Release Name> --timeout 900s

Note

The Release Name is the same name that was used during installation. The complete list of Helm installation Release Names can be obtained using the helm list -a command.

After the uninstall completes, the nodes must be cleaned up. The following command should be run from the $HOME/.dkube folder.

sudo ./dkubeadm node cleanup
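
Because the uninstall depends on the exact release name, it can help to confirm that the name exists before removing anything. The sketch below is an illustration under stated assumptions: `release_exists` is a hypothetical helper, and it relies on `helm list -a -q` printing one release name per line for the current namespace.

```shell
# Hypothetical guard: succeed only if the given release name appears in the
# list of release names read from standard input (one name per line).
release_exists() {
    grep -Fxq -- "$1"
}

# Usage sketch (commented out so the block can be sourced safely):
#   if helm list -a -q | release_exists "my-release"; then
#       helm uninstall "my-release" --timeout 900s
#       (cd "$HOME/.dkube" && sudo ./dkubeadm node cleanup)
#   fi
```

The match is on the whole line, so a release called `dkube` is not confused with one called `dkube-test`.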

Reinstalling DKube¶

To reinstall DKube, perform the following steps in order:

Uninstall DKube, including the node cleanup (Uninstalling DKube)
Set up the nodes in the cluster (Node Setup)
Perform a new installation (Helm-Based DKube Installation)

Backup and Restore¶

The DKube database can be backed up and restored from the installation/management node. This function is performed from the $HOME/.dkube directory. Both functions rely on a backup.ini file for configuration, and use the same k8s.ini and dkube.ini files that were edited as part of the installation.

Important

There are some portions of the DKube database that are not backed up. These are explained in the following section.

Backup Exclusions¶

The following items are not backed up through the DKube backup feature:

Any System packages installed from a command line
Any files added to folders outside the workdir

Important

To preserve the complete image, including the items listed above, push it to a Docker registry.

Editing the backup.ini File¶

The backup.ini file provides the target for the backup, the backup options, and the credentials where necessary.

Important

The input fields must be enclosed in quotes, as shown in the figure

The “local” backup option will save the snapshot to a folder on the local cluster. Only the following sections should be filled in.

Field	Value
PROVIDER	local
BACKUP_DIR	Target folder on local cluster

The AWS backup option will save the snapshot to an AWS-S3 cloud. Only the following sections should be filled in.

Field	Value
PROVIDER	aws-s3
BUCKET	Storage bucket in cloud
AWS_ACCESS_KEY_ID	https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html
AWS_SECRET_ACCESS_KEY	https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html

The MinIO backup option will save the snapshot to a MinIO server. Only the following sections should be filled in.

Field	Value
PROVIDER	minio
BUCKET	Storage bucket in cloud
MINIO_ACCESS_KEY_ID	https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html
MINIO_SECRET_ACCESS_KEY	https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html
MINIO_ACCESS_IP	URL of the MinIO server
MINIO_ACCESS_PORT	Port number of the MinIO server

The GCS backup option will save the snapshot to Google Cloud Storage (GCS). Only the following sections should be filled in.

Field	Value
PROVIDER	gcs
BUCKET	Storage bucket in cloud
GOOGLE_APPLICATION_CREDENTIALS	JSON file containing the GCP private key
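
The per-provider tables above can be summarized as a lookup, which is handy when checking that a backup.ini has only the expected fields filled in. This is a minimal sketch: the function name `backup_fields` is hypothetical, and it only restates the field names from the tables; it does not read or validate the actual file.

```shell
# Hypothetical helper: print the backup.ini fields that the tables above
# list for each PROVIDER value, one field per line.
backup_fields() {
    case "$1" in
        local)
            printf '%s\n' PROVIDER BACKUP_DIR ;;
        aws-s3)
            printf '%s\n' PROVIDER BUCKET AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY ;;
        minio)
            printf '%s\n' PROVIDER BUCKET MINIO_ACCESS_KEY_ID \
                MINIO_SECRET_ACCESS_KEY MINIO_ACCESS_IP MINIO_ACCESS_PORT ;;
        gcs)
            printf '%s\n' PROVIDER BUCKET GOOGLE_APPLICATION_CREDENTIALS ;;
        *)
            echo "unknown provider: $1" >&2
            return 1 ;;
    esac
}
```

For example, `backup_fields minio` lists the six fields required for a MinIO target, and an unrecognized provider name returns a non-zero status.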

Backup¶

DKube backup takes a snapshot of the database and saves it on the target provider. The backup snapshots can be listed, and restored to a DKube cluster.

Important

There are some items that are not backed up by this process. These are explained in the section Backup Exclusions.

Note

For DKube version R2.x only, DKube must be stopped before initiating the backup. The following command is used to stop DKube.

sudo ./dkubeadm dkube shutdown

The backup is initiated by the following command. A backup name must be provided.

sudo ./dkubeadm backup start <name>
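
Since every backup needs a name, one option is to derive it from the current time so that repeated backups do not collide. The helper below is a sketch only: `backup_name` is a hypothetical function, and it assumes that dkubeadm accepts names made of letters, digits, and hyphens.

```shell
# Hypothetical helper: build a backup name from a prefix and the current
# UTC time, in the form <prefix>-YYYYMMDD-HHMMSS.
backup_name() {
    printf '%s-%s\n' "${1:-dkube}" "$(date -u +%Y%m%d-%H%M%S)"
}

# Usage sketch, run from $HOME/.dkube (commented out so the block can be
# sourced safely):
#   sudo ./dkubeadm backup start "$(backup_name nightly)"
```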

List of Snapshots¶

To see a list of backup snapshots on the current provider, and to identify the ID for each one, the following command is used.

sudo ./dkubeadm backup list

An output similar to the figure below will be shown.

Delete Backup Snapshot¶

A backup snapshot can be removed from the current storage location. This is accomplished by using the ID listed in the backup list.

sudo ./dkubeadm backup delete <ID>

Restore¶

The DKube restore function takes a backed-up DKube database and installs it on the cluster.

Important

Restoring the database will overwrite the current database on that cluster

The restore function uses a backup ID to identify which backup to choose. The restore function is initiated by the following command, using the ID from the backup list. It will copy the database to the cluster and start DKube.

sudo ./dkubeadm restore start <ID>
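
Because a restore overwrites the current database, a confirmation step in front of the command can prevent accidents. The wrapper below is a minimal sketch under stated assumptions: `dkube_restore` and the `DRY_RUN` switch are hypothetical, and the function is meant to be run from the $HOME/.dkube directory.

```shell
# Hypothetical wrapper: ask for confirmation before restoring, since the
# restore overwrites the current database on the cluster.
# Set DRY_RUN=1 to print the dkubeadm command instead of running it.
dkube_restore() {
    id="$1"
    if [ -z "$id" ]; then
        echo "usage: dkube_restore <ID>" >&2
        return 1
    fi
    printf 'Restore backup %s and overwrite the current database? [yes/no] ' "$id" >&2
    read -r answer
    if [ "$answer" != "yes" ]; then
        echo "restore cancelled" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "sudo ./dkubeadm restore start $id"
    else
        sudo ./dkubeadm restore start "$id"
    fi
}
```

Only the literal answer `yes` proceeds; anything else cancels the restore and returns a non-zero status.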

Restore Failure¶

If the restore fails, the user will be given the option of reverting to the DKube database that was on the cluster prior to the restore.

DKube Migration¶

The DKube database (Runs, Notebooks, Inferences, Models, Pipelines, etc) can be migrated from one cluster to another. This function is performed from the $HOME/.dkube/cli directory.

Important

The entities must all be in the Stopped state to initiate migration

Editing the migration.ini File¶

In order to provide the source and destination information, the migration.ini file must be edited. An example ini file is provided below.

Important

The fields in the migration.ini files must be enclosed in quotes, as shown in the example

Field	Value
Name	User-generated name to use for migration tracking
JWT	Authorization token as described below
dkubeurl	IP address of the DKube instance
jobs	List of entities to migrate

Getting the Access Token¶

The JWT token in the migration.ini file is available from the DKube Operator screen.

[Figure: _images/Operator_Right_Menu_Developer.png]

Executing the Migration¶

After the migration.ini file has been filled in, the migration is initiated with the following command.

sudo ./dkubectl migration start --config migration.ini

The migration will begin, and when the status shows 100% complete the app exits.

If the user wants to execute the migration more than once, one of the following steps must be taken:

Edit the migration.ini file to use another name
Delete the existing name using the following command.

sudo ./dkubectl migration delete --name <migration name> --config migration.ini
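
When the same migration name is to be reused, the delete and start commands can be chained. The following is a sketch, not a DKube utility: `dkube_migration_rerun` and `DRY_RUN` are hypothetical, the name is assumed to be passed as a separate argument to `--name`, and it is assumed to match the Name field in the ini file. It is meant to be run from the $HOME/.dkube/cli directory.

```shell
# Hypothetical helper: delete an existing migration name, then start the
# migration again with the same ini file.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

dkube_migration_rerun() {
    name="$1"
    config="${2:-migration.ini}"
    if [ -z "$name" ]; then
        echo "usage: dkube_migration_rerun <migration-name> [ini-file]" >&2
        return 1
    fi
    # Remove the previous migration record so the name can be reused.
    run sudo ./dkubectl migration delete --name "$name" --config "$config" || return 1
    # Start the migration again.
    run sudo ./dkubectl migration start --config "$config"
}
```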

Stop and Start DKube¶

If DKube needs to be stopped and restarted, the following commands can be used.

Stopping DKube¶

The following command will stop all activity in DKube:

sudo ./dkubeadm dkube shutdown

Starting DKube¶

The following command will restart DKube after the cluster is brought back up:

sudo ./dkubeadm dkube start

Restarting DKube After Cluster Restart¶

When the cluster is restarted after a shutdown, DKube will restart automatically. In order to determine if DKube is operational again, the following command will provide the current status.

sudo ./dkube-infra-reconciler.sh

Docker Status¶

If DKube does not respond after a few minutes, it is possible that Docker has not restarted properly after the cluster was brought back up. The following command will provide the current status of Docker:

systemctl status docker

If Docker is not active, the following command will restart it:

sudo systemctl restart docker
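
The status check and the restart can be combined so that Docker is restarted only when it is not active. This is a minimal sketch: `ensure_docker_active` and the `SYSTEMCTL` override are hypothetical names introduced here, and the function assumes Docker is managed by systemd under the unit name `docker`.

```shell
# Hypothetical helper: restart Docker only if systemd reports it inactive.
# SYSTEMCTL can be overridden (for example with a stub) when testing.
ensure_docker_active() {
    systemctl_cmd="${SYSTEMCTL:-systemctl}"
    if "$systemctl_cmd" is-active --quiet docker; then
        echo "docker is active"
    else
        echo "docker is not active, restarting"
        sudo "$systemctl_cmd" restart docker
    fi
}
```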

Managing the Kubernetes Cluster¶

Adding A100 GPUs to the Cluster¶

Nodes with NVIDIA A100 GPUs require additional setup before DKube can use the GPUs in MIG mode. MIG mode is described in detail at Introduction to MIG.

Enable MIG Devices¶

In order to enable MIG mode for the A100 GPUs, run the following command:

sudo nvidia-smi -mig 1

Note

If the above command asks for a reboot, then reboot the system using the “sudo reboot” command, and rerun the command when the system has restarted
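
To tell whether MIG mode is already active or is still waiting for a reboot, the current and pending modes reported by nvidia-smi can be compared. The sketch below is an illustration under stated assumptions: `mig_state` is a hypothetical helper, and it expects one line in the form printed by the `mig.mode.current,mig.mode.pending` query with `--format=csv,noheader`, such as `Disabled, Enabled`.

```shell
# Hypothetical helper: classify one "current, pending" MIG mode line.
# Prints "enabled", "pending" (enabled after a reboot or GPU reset),
# or "disabled".
mig_state() {
    current=${1%%,*}      # text before the first comma
    pending=${1#*,}       # text after the first comma
    pending=${pending# }  # drop the space that follows the comma
    if [ "$current" = "Enabled" ]; then
        echo "enabled"
    elif [ "$pending" = "Enabled" ]; then
        echo "pending"
    else
        echo "disabled"
    fi
}

# Usage sketch for GPU 0 (commented out so the block can be sourced safely):
#   mig_state "$(nvidia-smi -i 0 \
#       --query-gpu=mig.mode.current,mig.mode.pending --format=csv,noheader)"
```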

List GPU Instance Profiles¶

Get a list of the GPU instance profiles by running the following command:

sudo nvidia-smi mig -lgip

This will bring up the information in a form similar to the image below.

Create MIG Devices¶

MIG devices are created using a command of the form:

sudo nvidia-smi mig -cgi <ID>,<ID>,<ID>, ... -C

For example, the following command will create:

3 instances with the MIG profile 1g.5gb, and
2 instances with the MIG profile 2g.10gb

sudo nvidia-smi mig -cgi 19,19,19,14,14 -C
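
Writing out repeated profile IDs by hand is error-prone, so the list can be generated from ID and count pairs. This is a minimal sketch: `mig_profile_list` is a hypothetical helper that only builds the comma-separated argument; it does not call nvidia-smi itself.

```shell
# Hypothetical helper: expand "<profile-id>:<count>" pairs into the
# comma-separated ID list passed to "nvidia-smi mig -cgi".
mig_profile_list() {
    list=""
    for pair in "$@"; do
        id=${pair%%:*}      # profile ID before the colon
        count=${pair##*:}   # number of instances after the colon
        i=0
        while [ "$i" -lt "$count" ]; do
            list="${list:+$list,}$id"
            i=$((i + 1))
        done
    done
    printf '%s\n' "$list"
}

# Usage sketch (commented out so the block can be sourced safely):
#   sudo nvidia-smi mig -cgi "$(mig_profile_list 19:3 14:2)" -C
```

With the arguments `19:3 14:2`, the helper prints `19,19,19,14,14`, which matches the example command above.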

Delete MIG Instances¶

MIG instances can be deleted with the following commands, which destroy the compute instances first and then the GPU instances:

sudo nvidia-smi mig -dci ; sudo nvidia-smi mig -dgi

Changing Nodes or GPUs¶

Changing nodes or GPUs on the cluster is done through the specific managed k8s platform. The only requirement is that the necessary software drivers be installed on the new node if the platform requires it. This is accomplished based on the k8s platform used.

For changes in nodes:
- Edit the $HOME/.dkube/k8s.ini file to add the new nodes as described in this section
- The other fields should not be changed
- Ensure that the new node is accessible without a password from the installation node, as described in Cluster Access from the Installation Node
For either node or GPU changes:
- Run the node setup command as described in Node Setup to prepare the new nodes

DKube will automatically recognize the new nodes after the setup command.