Installing DKubeX on EKS Cluster using Terraform

The terraform scripts available in the dkubex-tf GitHub repository can be used to bring up an AWS EKS cluster and install DKubeX.

Prerequisites

  • Terraform must be installed on your system. If it is not, follow the steps in the link below to install it.

    Install Terraform

  • Make sure the latest versions of the AWS CLI and kubectl, and Helm >= v3.5.0, are installed.

    Note

    For more information regarding AWS CLI installation, please visit Install or update the latest version of the AWS CLI. For more information regarding Helm installation, go to Helm installation.

  • Use the following command in the terminal to configure your AWS CLI. When prompted, provide your ‘Access Key’ and ‘Secret Key’ from your AWS account. When asked about ‘Region’, provide the AWS region in which the setup is going to be installed.

    aws configure
    
  • Create an S3 bucket in your AWS account in the region in which the setup is going to be installed. Under the Object Ownership section, select the following options:

    1. ACLs enabled

    2. Object writer

  • Create a DynamoDB table in your AWS account in the region in which the setup is going to be installed. Under the ‘Partition key’ section, enter the string ‘LockID’. Create the table with default settings.
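
    If you prefer the AWS CLI to the console, the bucket and lock table can be created with commands along these lines (the names below are placeholders, and the region must match your setup):

    ```shell
    # Placeholders: replace the names and region with your own values.
    REGION="us-west-2"
    BUCKET="my-dkubex-tf-state"
    TABLE="my-dkubex-tf-lock"

    # S3 bucket for the Terraform state; ObjectWriter ownership corresponds to
    # the "ACLs enabled" / "Object writer" console options.
    aws s3api create-bucket --bucket "$BUCKET" --region "$REGION" \
        --create-bucket-configuration LocationConstraint="$REGION"
    aws s3api put-bucket-ownership-controls --bucket "$BUCKET" \
        --ownership-controls 'Rules=[{ObjectOwnership=ObjectWriter}]'

    # DynamoDB lock table keyed on a LockID partition key (string).
    aws dynamodb create-table --table-name "$TABLE" \
        --attribute-definitions AttributeName=LockID,AttributeType=S \
        --key-schema AttributeName=LockID,KeyType=HASH \
        --billing-mode PAY_PER_REQUEST
    ```

    Note that these commands require a configured AWS CLI with permissions to create S3 and DynamoDB resources.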

  • Run the following command in the terminal to clone the repo:

    git clone https://github.com/dkubeio/dkubex-tf.git
    
  • On your terminal, go into the cloned repository folder. Make the following changes in the backend-dev-<aws-region>.tfvars file, where <aws-region> is the region in which you are creating your setup:

    • In the field bucket, put the name of the bucket that you created earlier.

    • In the field dynamodb_table, put the name of the dynamoDB table you created earlier.
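
    For illustration, after editing, those two fields in backend-dev-<aws-region>.tfvars would look something like this (the names are placeholders for the bucket and table you created; leave the file's other fields unchanged):

    ```hcl
    # Placeholder names; substitute the S3 bucket and DynamoDB table you created.
    bucket         = "my-dkubex-tf-state"
    dynamodb_table = "my-dkubex-tf-lock"
    ```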

Attention

  • The S3 bucket in AWS is used to store the Terraform state. It is extremely important to create the S3 bucket yourself and replace the bucket name in the backend-dev-<aws-region>.tfvars file: the terraform scripts do not bring up this bucket, and Terraform expects the backend S3 bucket to exist before terraform init is run.

  • Similarly, it is important to create the DynamoDB table and replace its name in the backend-dev-<aws-region>.tfvars file: the terraform scripts do not bring up this table, and Terraform expects it to exist before terraform init so it can lock the S3 state.

  • Make sure you have access to the DKubeX DockerHub registry token. This token is needed to pull the container images used during DKubeX installation.

Note

For more details on installing kubectl on Linux, see Install kubectl on Linux (https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/).

Initialize terraform

  • In your terminal, go into the dkubex-tf repository folder that was cloned in the Prerequisites section.

  • Initialize terraform backend by running the following command. Replace the <aws-region> part in the command with the region you are going to create your setup in.

    terraform init -backend-config=backend-dev-<aws-region>.tfvars
    

Create a new Terraform Workspace

  • This step can be skipped for the creation of the first cluster, as terraform by default brings up your infrastructure in the ‘default’ workspace. You can verify the current workspace by running the following command:

    terraform workspace show
    

    If you see ‘default’, it means you are in the default workspace, and Terraform will deploy your resources there. However, if you prefer to use a custom workspace name for the first cluster (instead of the default workspace), you can create one manually, by following the same process as for the subsequent clusters, given below.

  • For each cluster, after initializing Terraform, you should create a new workspace to bring up an isolated environment. Each workspace will maintain a separate state file, enabling isolated deployments. Replace <workspace-name> with the desired name (For convenience, name the workspace the same as your EKS cluster to keep things organized and easily identifiable).

    terraform workspace new <workspace-name>
    

Verify and select the workspace

  • Ensure that you are in the correct workspace. You can list all existing workspaces and verify your current one using the following commands:

    terraform workspace list
    terraform workspace select <workspace-name>
    

Installing DKubeX

Configuring Installation

  • Edit dkubex.tf and make the following changes:

    • eks_cluster_name: Name of the EKS cluster to be created

    • eks_desired_instance_count: Desired number of worker (CPU) nodes to be created

    • eks_max_instance_count: Maximum number of worker (CPU) nodes that can be scaled up to

    • eks_min_instance_count: Minimum number of worker (CPU) nodes that can be scaled down to

    • eks_controlplane_instance_types: Instance type of the control plane nodes to be created

    • eks_worker_instance_types: Instance type of the worker nodes to be created

    • eks_max_gpu_count: Maximum number of GPU nodes that can be scaled up to

    • eks_min_gpu_count: Minimum number of GPU nodes that can be scaled down to

    • eks_worker_ng_gpu_count: Type and number of GPU nodes to be created

    • installer_action: Set as install

    • helmchart_version: Helm chart version to be used for DKubeX installation, in this case 0.1.41

    • release: Version of DKubeX to be installed, in this case 0.9

    • flyte_enabled: Set as true if you want to install Flyte along with DKubeX

    • mlflow_multi_user_enabled: Set as true to enable user separation and multi-user support in MLflow. If set to false, all users on the same DKubeX setup will be able to access all experiments.

    • mlflow_node_enabled: Set as true if you want a dedicated node for MLflow

    • mlflow_replica_count: Minimum number of replicas the MLflow server will have

    • mlflow_workers: Maximum number of worker nodes for the MLflow server

    • control_plane_node_enabled: Set as true if all control plane processes, such as controllers, need to run on a dedicated node

    • fm_enabled: Set as true if you want to enable FM-Controller for DKubeX

    • kubeflow_enabled: Set as true if you want to install Kubeflow along with DKubeX

    • enable_ldap_server: Set as true if you want to enable the internal LDAP server
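
    As an illustration only (the exact layout of dkubex.tf in the repository may differ), the values above might be filled in along these lines; every value shown is a placeholder for your environment:

    ```hcl
    # Illustrative values; adjust counts, instance types and names to your needs.
    eks_cluster_name           = "my-dkubex-cluster"
    eks_desired_instance_count = 2
    eks_min_instance_count     = 1
    eks_max_instance_count     = 4
    installer_action           = "install"
    helmchart_version          = "0.1.41"
    release                    = "0.9"
    flyte_enabled              = true
    mlflow_multi_user_enabled  = true
    ```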

  • Edit securellm.tfvars and make the following changes:

    • securellm_action: Set as install

    • securellm_version: Version of SecureLLM to be installed

    • securellm_password: Set the password for SecureLLM
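
    For illustration, a filled-in securellm.tfvars might look like this (version and password are placeholders):

    ```hcl
    # Illustrative values only.
    securellm_action   = "install"
    securellm_version  = "1.2.3"       # placeholder; use the actual SecureLLM version
    securellm_password = "change-me"   # placeholder; choose a strong password
    ```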

Create terraform plan

  • Generate a Terraform plan, which summarises the resources the scripts will create in AWS, by running the following command. Replace the <aws-region> part in the command with the region you are going to create your setup in.

    terraform plan -var-file=variables-<aws-region>.tfvars -var-file=securellm.tfvars
    
  • Provide the following necessary details when asked:

    • var.apply_method should be direct

    • var.registry_password should be the DKubeX dockerhub registry password.
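
    If you prefer to avoid the interactive prompts, Terraform's -var flag can supply the same values on the command line (the region and token below are placeholders):

    ```shell
    # Placeholders: substitute your region and your DKubeX DockerHub registry token.
    terraform plan \
        -var-file=variables-us-west-2.tfvars \
        -var-file=securellm.tfvars \
        -var="apply_method=direct" \
        -var="registry_password=<registry-token>"
    ```

    Note that secrets passed with -var can end up in your shell history; exporting them via the TF_VAR_registry_password environment variable is a common alternative.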

Run the scripts

  • Apply terraform scripts which will bring up an EKS setup and install DKubeX. Replace the <aws-region> part in the command with the region you are going to create your setup in.

    terraform apply -var-file=variables-<aws-region>.tfvars -var-file=securellm.tfvars
    
  • Provide the following necessary details when asked:

    • var.apply_method should be direct

    • var.registry_password should be the DKubeX dockerhub registry password.

Access DKubeX setup

  • You can access the installed DKubeX setup using the load balancer ingress. Get the homepage URL by running the following command:

    kubectl get svc -n d3x ingress-nginx-controller -o=go-template --template='{{(index .status.loadBalancer.ingress 0).hostname}}'
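
    As a small sketch, the hostname can also be captured into a variable to build the homepage URL (this falls back to a placeholder hostname when the cluster is unreachable):

    ```shell
    # Read the load balancer hostname from the ingress service; fall back to a
    # placeholder when kubectl is missing or the cluster is unreachable.
    HOST=$(kubectl get svc -n d3x ingress-nginx-controller \
        -o=go-template --template='{{(index .status.loadBalancer.ingress 0).hostname}}' \
        2>/dev/null || echo "placeholder.elb.us-west-2.amazonaws.com")
    HOMEPAGE_URL="https://${HOST}"
    echo "$HOMEPAGE_URL"
    ```
    
    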
    

Setting up Authentication

Set up authentication for your DKubeX setup by following the steps below.

  • You need to have a pre-created OAuth application.

    Note

    Currently DKubeX supports OAuth App by ADFS, Azure, GitHub, Google, Keycloak and Okta OAuth providers.

  • On the OAuth app, provide the following details.

    • Homepage URL: https://$homepage-url$
      (for example, https://1182e8440d4c4d138415baf0d11b362-62d15cf634758ebd.elb.us-west-2.amazonaws.com)

    • Authorization callback URL: https://$homepage-url$/oauth2/callback
      (for example, https://1182e8440d4c4d138415baf0d11b362-62d15cf634758ebd.elb.us-west-2.amazonaws.com/oauth2/callback)

  • Open the admin page of your DKubeX setup by going to the following URL in your browser. Replace the $homepage-url$ part with the homepage URL of your setup obtained earlier.

    https://$homepage-url$/admin
    

    Note

    For more information regarding the admin page, refer to Admin Guide.

  • On the Auth tab of the admin page, go to the tab corresponding to your OAuth provider and provide the details of the OAuth application you have created. Once done, click on the Save button.

    • Client ID: The client ID of the OAuth application.

    • Client Secret: The client secret of the OAuth application.

Upgrading DKubeX

Configuring Upgrade

  • Edit dkubex.tf and make the following changes:

    • installer_action: Set as upgrade

    • installer_version: Provide a new installer version if the DKubeX release version is not changing during the upgrade

    • release: Version of DKubeX to be upgraded to

  • Edit securellm.tfvars and make the following changes:

    • securellm_action: Set as reinstall

    • securellm_version: Version of SecureLLM to be upgraded to

Create terraform plan

  • Generate a Terraform plan, which summarises the changes the scripts will make in AWS, by running the following command. Replace the <aws-region> part in the command with the region in which your setup was created.

    terraform plan -var-file=variables-<aws-region>.tfvars -var-file=securellm.tfvars
    
  • Provide the following necessary details when asked:

    • var.apply_method should be direct

    • var.registry_password should be the DKubeX dockerhub registry password.

Run the scripts

  • Apply terraform scripts which will upgrade DKubeX and SecureLLM. Replace the <aws-region> part in the command with the region you are going to create your setup in.

    terraform apply -var-file=variables-<aws-region>.tfvars -var-file=securellm.tfvars
    
  • Provide the following necessary details when asked:

    • var.apply_method should be direct

    • var.registry_password should be the DKubeX dockerhub registry password.

Uninstalling DKubeX

Configuring Uninstall

  • Edit dkubex.tf and make the following changes:

    • installer_action: Set as uninstall

  • Edit securellm.tfvars and make the following changes:

    • securellm_action: Set as uninstall

Create terraform plan

  • Generate a Terraform plan, which summarises the changes the scripts will make in AWS, by running the following command. Replace the <aws-region> part in the command with the region in which your setup was created.

    terraform plan -var-file=variables-<aws-region>.tfvars -var-file=securellm.tfvars
    
  • Provide the following necessary details when asked:

    • var.apply_method should be direct

    • var.registry_password should be the DKubeX dockerhub registry password.

Run the scripts

  • Apply terraform scripts which will uninstall DKubeX and SecureLLM. Replace the <aws-region> part in the command with the region you are going to create your setup in.

    terraform apply -var-file=variables-<aws-region>.tfvars -var-file=securellm.tfvars
    
  • Provide the following necessary details when asked:

    • var.apply_method should be direct

    • var.registry_password should be the DKubeX dockerhub registry password.

Destroying the EKS Setup

Caution

This step will destroy the entire EKS setup along with all the resources created in it. Make sure that DKubeX and SecureLLM have been uninstalled, and that all changes made manually on the setup have been reverted, before running this step.

After uninstalling DKubeX and SecureLLM, you can destroy the EKS setup by running the following command. Replace the <aws-region> part in the command with the region in which your setup was created.

    terraform destroy -var-file=variables-<aws-region>.tfvars

  • Provide the following necessary details when asked:

    • var.apply_method should be direct

    • var.registry_password should be the DKubeX dockerhub registry password.