Installing DKubeX on EKS Cluster using Terraform¶
The Terraform scripts in the dkubex-tf GitHub repository bring up an AWS EKS cluster and install DKubeX on it.
Prerequisites¶
Terraform must be installed on your system. If it is not, follow the steps in the Terraform installation guide linked below.
Make sure the latest versions of the AWS CLI and kubectl, and Helm >= v3.5.0, are installed.
Note
For more information regarding AWS CLI installation, please visit Install or update the latest version of the AWS CLI. For more information regarding Helm installation, go to Helm installation.
Use the following command in the terminal to configure your AWS CLI. When prompted, provide your ‘Access Key’ and ‘Secret Key’ from your AWS account. When asked about ‘Region’, provide the AWS region in which the setup is going to be installed.
aws configure

Create an S3 bucket in your AWS account in the region in which the setup is going to be installed. Under the Object Ownership section, select the following options:
ACLs enabled
Object writer
Create a DynamoDB table in your AWS account in the region in which the setup is going to be installed. Under the 'Partition key' section, enter the string 'LockID'. Create the table with default settings.
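The bucket and table creation steps above can also be scripted with the AWS CLI; the sketch below assumes example resource names and the us-west-2 region (all three values are placeholders, replace them with your own):

```shell
# Example names and region -- placeholders, replace with your own
BUCKET_NAME="dkubex-tf-state"
TABLE_NAME="dkubex-tf-lock"
AWS_REGION="us-west-2"

# S3 bucket with ACLs enabled and object-writer ownership
# (omit --create-bucket-configuration when the region is us-east-1)
aws s3api create-bucket \
  --bucket "${BUCKET_NAME}" \
  --region "${AWS_REGION}" \
  --create-bucket-configuration "LocationConstraint=${AWS_REGION}" \
  --object-ownership ObjectWriter

# DynamoDB table with a string partition key named LockID
aws dynamodb create-table \
  --table-name "${TABLE_NAME}" \
  --region "${AWS_REGION}" \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
```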
Run the following command in the terminal to clone the repo:
git clone https://github.com/dkubeio/dkubex-tf.git

On your terminal, go into the repository folder cloned above. Make the following changes in the backend-dev-<aws-region>.tfvars file, where <aws-region> is the region in which you are creating your setup:
In the field bucket, put the name of the bucket that you created earlier.
In the field dynamodb_table, put the name of the DynamoDB table you created earlier.
Attention
The S3 bucket in AWS is used to store the Terraform state. It is extremely important to create the S3 bucket and replace the bucket name in the backend-dev-<aws-region>.tfvars file, as the Terraform scripts do not bring up this bucket; Terraform expects the backend S3 bucket to exist before terraform init is run so it can store the state there.
Similarly, it is important to create the DynamoDB table and replace its name in the backend-dev-<aws-region>.tfvars file, as the Terraform scripts do not bring up this table; Terraform expects it to exist before terraform init is run so it can lock the state stored in S3.
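After these edits, the backend file might look like the sketch below (the bucket and table names are illustrative placeholders; use the names you created earlier):

```hcl
# Example backend-dev-us-west-2.tfvars -- names are placeholders
bucket         = "dkubex-tf-state"   # S3 bucket created earlier for the Terraform state
dynamodb_table = "dkubex-tf-lock"    # DynamoDB table created earlier for state locking
```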
Make sure you have access to the DKubeX Dockerhub registry token. This token is needed to pull various container images which are created as part of DKubeX installation.
Note
For more details on installing kubectl on Linux, see Install and Set Up kubectl on Linux (https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/).
Initialize terraform¶
In your terminal, go into the dkubex-tf repository folder that was cloned in the Prerequisites section.
Initialize terraform backend by running the following command. Replace the <aws-region> part in the command with the region you are going to create your setup in.
terraform init -backend-config=backend-dev-<aws-region>.tfvars
Create a new Terraform Workspace¶
This step can be skipped for the creation of the first cluster, as terraform by default brings up your infrastructure in the ‘default’ workspace. You can verify the current workspace by running the following command:
terraform workspace show

If you see 'default', you are in the default workspace and Terraform will deploy your resources there. If you prefer a custom workspace name for the first cluster instead, you can create one manually by following the same process as for subsequent clusters, given below.
For each cluster, after initializing Terraform, you should create a new workspace to bring up an isolated environment. Each workspace will maintain a separate state file, enabling isolated deployments. Replace <workspace-name> with the desired name (For convenience, name the workspace the same as your EKS cluster to keep things organized and easily identifiable).
terraform workspace new <workspace-name>
Verify and select the workspace¶
Ensure that you are in the correct workspace. You can list all existing workspaces and verify your current one using the following commands:
terraform workspace list
terraform workspace select <workspace-name>
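The initialization and workspace steps above can be sketched end to end as follows (the region and workspace names are example values, replace them with your own):

```shell
# Example values -- replace with your own
AWS_REGION="us-west-2"
WORKSPACE="dkubex-cluster-1"

# Initialize the S3 backend for this region
terraform init -backend-config="backend-dev-${AWS_REGION}.tfvars"

# Create and select an isolated workspace for this cluster
terraform workspace new "${WORKSPACE}"
terraform workspace list
terraform workspace select "${WORKSPACE}"
terraform workspace show   # prints the currently selected workspace name
```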
Installing DKubeX¶
Configuring Installation¶
Edit dkubex.tf and make the following changes:
eks_cluster_name
    Name of the EKS cluster to be created.
eks_desired_instance_count
    Desired number of worker (CPU) nodes to be created.
eks_max_instance_count
    Maximum number of worker (CPU) nodes the cluster can scale up to.
eks_min_instance_count
    Minimum number of worker (CPU) nodes the cluster can scale down to.
eks_controlplane_instance_types
    Instance type of the control plane nodes to be created.
eks_worker_instance_types
    Instance type of the worker nodes to be created.
eks_max_gpu_count
    Maximum number of GPU nodes the cluster can scale up to.
eks_min_gpu_count
    Minimum number of GPU nodes the cluster can scale down to.
eks_worker_ng_gpu_count
    Type and number of GPU nodes to be created.
installer_action
    Set as install.
helmchart_version
    Helm chart version to be used for the DKubeX installation; in this case, 0.1.41.
release
    Version of DKubeX to be installed; in this case, 0.9.
flyte_enabled
    Set as true if you want to install Flyte along with DKubeX.
mlflow_multi_user_enabled
    Set as true to enable user separation and multi-user support in MLflow. If set to false, all users on the same DKubeX setup will be able to access all experiments.
mlflow_node_enabled
    Set as true if you want a dedicated node for MLflow.
mlflow_replica_count
    Minimum number of replicas for the MLflow server.
mlflow_workers
    Maximum number of workers for the MLflow server.
control_plane_node_enabled
    Set as true if all control plane processes, such as controllers, need to run on a dedicated node.
fm_enabled
    Set as true if you want to enable FM-Controller for DKubeX.
kubeflow_enabled
    Set as true if you want to install Kubeflow along with DKubeX.
enable_ldap_server
    Set as true if you want to enable the internal LDAP server.
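As a sketch, the fields above might be filled in as shown below. All values other than installer_action, helmchart_version, and release are illustrative, and the exact assignment syntax depends on how dkubex.tf declares these variables:

```hcl
# Example values for dkubex.tf -- illustrative only
eks_cluster_name           = "dkubex-eks"
eks_desired_instance_count = 2
eks_max_instance_count     = 4
eks_min_instance_count     = 1
installer_action           = "install"
helmchart_version          = "0.1.41"
release                    = "0.9"
flyte_enabled              = true
mlflow_multi_user_enabled  = true
```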
Edit securellm.tfvars and make the following changes:
securellm_action
    Set as install.
securellm_version
    Version of SecureLLM to be installed.
securellm_password
    Password for SecureLLM.
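A securellm.tfvars file after these edits might look like the fragment below (the version and password are placeholders):

```hcl
# Example securellm.tfvars -- version and password are placeholders
securellm_action   = "install"
securellm_version  = "<securellm-version>"
securellm_password = "<your-password>"
```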
Create terraform plan¶
Generate a Terraform plan, which summarizes the AWS resources the scripts will create, by running the following command. Replace the <aws-region> part in the command with the region you are going to create your setup in.

terraform plan -var-file=variables-<aws-region>.tfvars -var-file=securellm.tfvars

Provide the following details when prompted:
var.apply_method should be direct
var.registry_password should be the DKubeX dockerhub registry password.
Run the scripts¶
Apply the Terraform scripts, which will bring up the EKS setup and install DKubeX, by running the following command. Replace the <aws-region> part in the command with the region you are going to create your setup in.

terraform apply -var-file=variables-<aws-region>.tfvars -var-file=securellm.tfvars

Provide the following details when prompted:
var.apply_method should be direct
var.registry_password should be the DKubeX dockerhub registry password.
Access DKubeX setup¶
You can access the installed DKubeX setup through the load balancer ingress. Get the homepage URL by running the following command:

kubectl get svc -n d3x ingress-nginx-controller -o=go-template --template='{{(index .status.loadBalancer.ingress 0).hostname}}'
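The hostname returned above can be captured into a variable to build the homepage URL; a minimal sketch, assuming kubectl is already configured against the new cluster:

```shell
# Capture the load-balancer hostname of the d3x ingress controller
LB_HOST=$(kubectl get svc -n d3x ingress-nginx-controller \
  -o=go-template --template='{{(index .status.loadBalancer.ingress 0).hostname}}')

# The DKubeX homepage is served over HTTPS at that hostname
echo "https://${LB_HOST}"
```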
Setting up Authentication¶
Set up authentication for your DKubeX setup by following the steps below.
You need to have a pre-created OAuth application.
Note
Currently DKubeX supports OAuth App by ADFS, Azure, GitHub, Google, Keycloak and Okta OAuth providers.
On the OAuth app, provide the following details.

Homepage URL
    https://$homepage-url$
    Example: https://1182e8440d4c4d138415baf0d11b362-62d15cf634758ebd.elb.us-west-2.amazonaws.com

Authorization callback URL
    https://$homepage-url$/oauth2/callback
    Example: https://1182e8440d4c4d138415baf0d11b362-62d15cf634758ebd.elb.us-west-2.amazonaws.com/oauth2/callback

Open the admin page of your DKubeX setup by going to the following URL in your browser. Replace the $homepage-url$ part with the homepage URL of your setup.

https://$homepage-url$/admin

Example: https://1182e8440d4c4d138415baf0d11b362-62d15cf634758ebd.elb.us-west-2.amazonaws.com/admin

Note
For more information regarding the admin page, refer to Admin Guide.
On the Auth tab of the admin page, go to the tab corresponding to your OAuth provider and provide the details regarding the OAuth application you have created. Once done, click on the Save button.
Client ID
    The client ID of the OAuth application.
Client Secret
    The client secret of the OAuth application.
Upgrading DKubeX¶
Configuring Upgrade¶
Edit dkubex.tf and make the following changes:
installer_action
    Set as upgrade.
installer_version
    Provide a new installer version if the DKubeX release version is not changing during the upgrade.
release
    Version of DKubeX to be upgraded to.
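The upgrade settings above might look like the fragment below (the version values are placeholders):

```hcl
# Example upgrade settings in dkubex.tf -- versions are placeholders
installer_action  = "upgrade"
installer_version = "<new-installer-version>"  # only if the release is unchanged
release           = "<new-dkubex-version>"
```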
Edit securellm.tfvars and make the following changes:
securellm_action
    Set as reinstall.
securellm_version
    Version of SecureLLM to be upgraded to.
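The corresponding securellm.tfvars fragment might look like this (the version is a placeholder):

```hcl
# Example upgrade settings in securellm.tfvars -- version is a placeholder
securellm_action  = "reinstall"
securellm_version = "<new-securellm-version>"
```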
Create terraform plan¶
Generate a Terraform plan, which summarizes the changes the scripts will make in AWS, by running the following command. Replace the <aws-region> part in the command with the region your setup was created in.

terraform plan -var-file=variables-<aws-region>.tfvars -var-file=securellm.tfvars

Provide the following details when prompted:
var.apply_method should be direct
var.registry_password should be the DKubeX dockerhub registry password.
Run the scripts¶
Apply the Terraform scripts, which will upgrade DKubeX and SecureLLM, by running the following command. Replace the <aws-region> part in the command with the region your setup was created in.

terraform apply -var-file=variables-<aws-region>.tfvars -var-file=securellm.tfvars

Provide the following details when prompted:
var.apply_method should be direct
var.registry_password should be the DKubeX dockerhub registry password.
Uninstalling DKubeX¶
Configuring Uninstall¶
Edit dkubex.tf and make the following changes:
installer_action
    Set as uninstall.
Edit securellm.tfvars and make the following changes:
securellm_action
    Set as uninstall.
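Taken together, the uninstall configuration amounts to two one-line changes:

```hcl
# In dkubex.tf
installer_action = "uninstall"

# In securellm.tfvars
securellm_action = "uninstall"
```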
Create terraform plan¶
Generate a Terraform plan, which summarizes the changes the scripts will make in AWS, by running the following command. Replace the <aws-region> part in the command with the region your setup was created in.

terraform plan -var-file=variables-<aws-region>.tfvars -var-file=securellm.tfvars

Provide the following details when prompted:
var.apply_method should be direct
var.registry_password should be the DKubeX dockerhub registry password.
Run the scripts¶
Apply the Terraform scripts, which will uninstall DKubeX and SecureLLM, by running the following command. Replace the <aws-region> part in the command with the region your setup was created in.

terraform apply -var-file=variables-<aws-region>.tfvars -var-file=securellm.tfvars

Provide the following details when prompted:
var.apply_method should be direct
var.registry_password should be the DKubeX dockerhub registry password.
Destroying the EKS Setup¶
Caution
This step will destroy the entire EKS setup along with all the resources created in it. Make sure that DKubeX and SecureLLM have been uninstalled, and that all changes made manually on the setup have been reverted, before running this step.
After uninstalling DKubeX and SecureLLM, you can destroy the EKS setup by running the following command. Replace the <aws-region> part in the command with the region your setup was created in.
terraform destroy -var-file=variables-<aws-region>.tfvars
Provide the following necessary details when asked:
var.apply_method should be direct
var.registry_password should be the DKubeX dockerhub registry password.