DKube Developer’s Guide

This section provides instructions on how to develop code that will integrate with the DKube platform.

File Paths

For IDE & Run jobs, DKube provides a method to access the files in code, data, and model repositories without needing to know the exact folder within the DKube storage hierarchy. The repos are available in the following paths:

Repo Type and path:

  • Code: Fixed path /mnt/dkube/workspace

  • Dataset: Mount path as described at Mount Path

  • Model: Mount path as described at Mount Path

The Dataset & Model repos are available at the following paths in addition to the user-configured mount paths:

Repo Type and path:

  • Dataset: /mnt/dkube/datasets/<user name>/<dataset name>

  • Model: /mnt/dkube/models/<user name>/<model name>

In the case of Amazon S3 and Amazon Redshift the mount paths also include the metadata files with the endpoint configuration.
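As a brief sketch, code can read dataset files directly from these fixed paths. The user and dataset names below ("oneconv" and "mnist") are hypothetical placeholders, not names from this guide:

```python
import os

# Sketch: list a dataset's files via the fixed path described above.
# "oneconv" and "mnist" are hypothetical user/dataset names.
dataset_dir = "/mnt/dkube/datasets/oneconv/mnist"

def list_dataset_files(path):
    """Return dataset file names, or an empty list when run outside DKube."""
    return sorted(os.listdir(path)) if os.path.isdir(path) else []

files = list_dataset_files(dataset_dir)
```

The guard on os.path.isdir keeps the same code runnable during local development, where the DKube mount does not exist.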

Configuration File

A general-purpose configuration file can be uploaded into DKube when a Job is created, as described at Configuration Screen.

The configuration file can be accessed from the code at the following location:

/mnt/dkube/config/<config file name>
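For illustration, a minimal sketch of reading such a file from code. "myconfig.json" is a hypothetical file name, and treating the contents as JSON is an assumption (the file is general purpose and may hold any format):

```python
import json
import os

# Sketch: load the uploaded configuration file from the path above.
# "myconfig.json" is a hypothetical name; JSON content is an assumption.
CONFIG_PATH = "/mnt/dkube/config/myconfig.json"

def load_config(path, default=None):
    """Return the parsed config, or a default when running outside DKube."""
    if not os.path.exists(path):
        return default if default is not None else {}
    with open(path) as f:
        return json.load(f)

params = load_config(CONFIG_PATH)
```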

Home Directory

DKube maintains a home directory for each user, at the location:

/home/<user name>

Files for all user-owned resources are created in this area, including metadata for Runs, IDEs, & Inferences. These can be accessed by an IDE.

The following folders are created within the home directory:


  • Contains folders for each Code Repo owned by the user. These can be updated from a source git repo, edited, and committed back to the git repo.

  • Contains folders for each Dataset Repo owned by the user. Each Dataset folder contains subdirectories for each version, holding the dataset files for that version.

  • Contains folders for each Model Repo owned by the user. Each Model folder contains subdirectories for each version, holding the model files for that version.

  • Contains metadata for user IDE instances

  • Contains metadata for user Training Run instances

  • Contains metadata for user Preprocessing Run instances

  • Contains metadata for user Inference instances

Amazon S3

DKube has native support for Amazon S3. In order to use this within DKube, a Repo must first be created, as described at Add a Dataset.

This section describes how to access the data and integrate it into your program. The mount path for the S3 Dataset repo contains the config.json & credentials files.


config.json:

{
  "Bucket": "<bucket name>",
  "Prefix": "<prefix>",
  "Endpoint": "<endpoint>"
}

credentials:

aws_access_key_id = xxxxxxx
aws_secret_access_key = xxxxxx

In addition, the path /etc/dkube/.aws contains the metadata and credentials for all of the S3 Datasets owned by the user.


config:

bucket = <bucket name 1>
prefix = <prefix 1>

bucket = <bucket name 2>
prefix = <prefix 2>

bucket = <bucket name 3>
prefix = <prefix 3>

credentials:

aws_access_key_id = xxxxxxx
aws_secret_access_key = xxxxxxxxx

aws_access_key_id = xxxxxxx
aws_secret_access_key = xxxxxxxxx

aws_access_key_id = xxxxxxx
aws_secret_access_key = xxxxxxxxx
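A standard-library sketch of parsing the two mounted files into one settings dictionary. The sample strings below are hypothetical stand-ins for the config.json and credentials files, using the field names shown above:

```python
import json

# Hypothetical stand-ins for <mount path>/config.json and credentials.
CONFIG_JSON = '{"Bucket": "my-bucket", "Prefix": "data/", "Endpoint": ""}'
CREDENTIALS = "aws_access_key_id = AKIAEXAMPLE\naws_secret_access_key = secretexample"

def parse_s3_settings(config_text, creds_text):
    """Combine the endpoint config and credentials into one dictionary."""
    cfg = json.loads(config_text)
    creds = dict(
        line.split(" = ", 1) for line in creds_text.splitlines() if " = " in line
    )
    return {
        "bucket": cfg["Bucket"],
        "prefix": cfg["Prefix"],
        "endpoint": cfg["Endpoint"],
        "access_key": creds["aws_access_key_id"],
        "secret_key": creds["aws_secret_access_key"],
    }

settings = parse_s3_settings(CONFIG_JSON, CREDENTIALS)
```

The resulting dictionary can then be handed to whatever S3 client the program uses (for example, a boto3 client constructed from these values).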

Amazon Redshift

DKube has native support for Amazon Redshift. In order to use Redshift within DKube, a Repo must first be created, as described at Add a Dataset.

This section describes how to access the data and integrate it into your program. Redshift-specific environment variables are listed at Redshift Variables. Redshift can be accessed with or without an API server.

Redshift Access Configuration

Redshift Access with an API Server

In order to configure the API server to fetch the metadata, a kubernetes config map is configured with the following information:

echo "
apiVersion: v1
kind: ConfigMap
metadata:
  name: redshift-access-info
  namespace: dkube
data:
  token: $TOKEN
  endpoint: $ENDPOINT
" | kubectl apply -f -




  • token: Security token for the API server

  • endpoint: URL for the API server

DKube fetches the list of databases available and their associated configuration information, such as endpoints and availability region. Additionally, DKube fetches the schemas of the databases from the API server.

Redshift Access without an API Server

By default, DKube uses the following query to fetch the Redshift schemas and show them as versions in the DKube UI when creating a Dataset.

select * from PG_NAMESPACE;

Accessing the Redshift Data from the Program

Redshift data can be accessed from any Notebook or Run.

The metadata to access the Redshift data for the current job is provided from the Mount Path specified when the Job is created.


{
  "rs_name": "<name>",
  "rs_endpoint": "<endpoint>",
  "rs_database": "<database-name>",
  "rs_db_schema": "<schema-name>",
  "rs_user": "<user-name>"
}

Metadata for all of the selected Redshift datasets for the User is available at /etc/dkube/redshift.json for the Job.

[
  {
    "rs_name": "<name 1>",
    "rs_endpoint": "<endpoint 1>",
    "rs_database": "<database-name 1>",
    "rs_db_schema": "<schema-name 1>",
    "rs_user": "<user 1>"
  },
  {
    "rs_name": "<name 2>",
    "rs_endpoint": "<endpoint 2>",
    "rs_database": "<database-name 2>",
    "rs_db_schema": "<schema-name 2>",
    "rs_user": "<user 2>"
  }
]
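As a sketch, the per-dataset entries can be selected by name with the standard library. SAMPLE below is a hypothetical stand-in for the contents of /etc/dkube/redshift.json:

```python
import json

# Hypothetical stand-in for /etc/dkube/redshift.json, using the
# field names shown above.
SAMPLE = json.loads("""
[
  {"rs_name": "sales", "rs_endpoint": "",
   "rs_database": "dkube", "rs_db_schema": "public", "rs_user": "analyst"}
]
""")

def redshift_info(datasets, name):
    """Return the metadata entry for one named Redshift dataset."""
    return next((ds for ds in datasets if ds["rs_name"] == name), None)

info = redshift_info(SAMPLE, "sales")
```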

Redshift Password

The password for the Redshift data is stored encrypted within DKube. The code segment below can be used to retrieve the password in unencrypted form.

import os, requests, json

def rs_fetch_datasets():
    user = os.environ.get("DKUBE_USER_LOGIN_NAME")
    url =
    headers = {"authorization": "Bearer " + os.environ.get("DKUBE_USER_ACCESS_TOKEN")}
    datasets = []
    for ds in json.load(open('/etc/dkube/redshift.json')):
        if ds.get('rs_owner', '') != user:
            continue
        resp = requests.get(url % (user, ds.get('rs_name')), headers=headers).json()
        ds['rs_password'] = resp['data']['datum']['redshift']['password']
        datasets.append(ds)
    return datasets

This will return the datasets in the following format:

[
  {
    "rs_name": "name1",
    "rs_endpoint": "",
    "rs_database": "dkube",
    "rs_db_schema": "pg_catalog",
    "rs_user": "user",
    "rs_owner": "owner",
    "rs_password": "*****"
  },
  ....
]

Mount Path

The mount path provides a way for the code to access the repositories. This section describes the steps needed to enable this access.

Before accessing a dataset, featureset, or model from the code, it needs to be created within DKube, as described at Add a Dataset and Add a Model

This will enable DKube to access the resource. The following image shows a Dataset detail screen for a GitHub dataset that has been uploaded to the DKube storage. It shows the actual folder where the dataset resides.


DKube allows the code to access the Dataset, FeatureSet, or Model without needing to know the exact folder structure through the mount path. When creating an IDE or Run, the mount path field should be filled in to correspond to the code.


Environment Variables

This section describes the environment variables that allow the program code to access DKube-specific information. These are accessed from the program code through calls such as:

EPOCHS = int(os.environ.get('EPOCHS', 5))


The variables and mount paths are available in the file /etc/dkube/config.json
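A hedged sketch of one way to read such settings, preferring the environment and falling back to /etc/dkube/config.json when it exists. That the JSON lookup key matches the variable name is an assumption:

```python
import json
import os

def dkube_setting(name, default=None, config_path="/etc/dkube/config.json"):
    """Look up a setting from the environment, then the DKube config file."""
    if name in os.environ:
        return os.environ[name]
    try:
        with open(config_path) as f:
            return json.load(f).get(name, default)
    except OSError:
        # Outside DKube the config file does not exist; use the default.
        return default

EPOCHS = int(dkube_setting("EPOCHS", 5))
```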

General Variables




  • API Server REST endpoint

  • Login user name

  • JWT token for DKube access

  • Configuration file specified at Job creation (Configuration Screen)

  • Mount path for user-owned resources

  • Mount path for resources configured for an IDE/Run

  • JupyterLab command-line arguments containing the auth token, base URL, and home directory; used in the JupyterLab entrypoint

  • REST API endpoint for pipelines to authenticate pipeline requests; if not set, pipelines are created without authentication

  • Type of Job (training, preprocessing, custom, notebook, rstudio, inference, tensorboard)

  • Unique Job ID

  • Unique Job UUID

  • Folder for TensorBoard event files

Variables Passed to Jobs

The user can provide program variables when creating an IDE or Run, as described at Configuration Screen

These variables are available to the program based on the variable name. Some examples of these are shown here.




  • Number of training steps

  • Batch size for training

  • Number of training epochs

Repo Variables





  • Storage bucket

  • URL of server

  • Verify SSL in S3 Bucket

  • Request timeout for the TensorFlow-to-storage connection, in milliseconds

  • Connection timeout for the TensorFlow-to-storage connection, in milliseconds

  • Use https (1) or http (0)

  • Access key

  • Secret key

Redshift Variables




  • Redshift dataset metadata for user-owned Redshift datasets

  • Dataset URL

  • Database name

  • Dataset name

  • User name

  • SSL Certificate

Hyperparameter Tuning Variables




  • Configuration file specified when creating a Run

  • Unique identifier (uuid)

  • Objective metric

  • Count of trial runs


One Convergence provides an SDK to allow direct access to DKube actions. In order to make use of this, the SDK needs to be called at the start of the code. An SDK guide is available at DKube SDK.

Writing Code for Metric Logging

Metric & artifact logging within DKube is handled through MLFlow APIs. The supported APIs are defined in the MLFlow documentation.

The following steps are required to save a model and its associated metrics and artifacts:

  • Create and/or set an MLFlow Experiment

  • Start an MLFlow Run

  • Perform the Model training within the MLFlow Run

  • Log the pertinent metrics & artifacts from within that Run

This section provides some code segments that show how to log metrics using Python and Tensorflow/Keras. Other training frameworks, such as scikit-learn, will differ in the details.

Import MLFlow Module

import mlflow

Create & Set MLFlow Experiment

MLFlow Runs execute in an Experiment. The Experiment first needs to be created, then the Experiment needs to be set as the current one. In addition, the output folder for the MLFlow data needs to be created.

# Create and set MLFlow Experiment
mlflow.create_experiment(MLFLOW_EXPERIMENT_NAME)
mlflow.set_experiment(experiment_name=MLFLOW_EXPERIMENT_NAME)

# Output directory for MLFlow
OUTPUT_MODEL_DIR = os.getcwd() + "/model_mlflow"
os.makedirs(OUTPUT_MODEL_DIR, exist_ok=True)

Define Callback Function

The metric logging happens as part of the training process by using a callback function that will log the metric after each epoch.

# MLFlow metric logging function
class loggingCallback(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        mlflow.log_metric("train_loss", logs["loss"], step=epoch)


The callback is required for Tensorflow/Keras. Scikit-learn does not have this requirement.

Train Model & Save Model Data

The model, metrics, and artifacts are saved as part of the MLFlow Run.

  • loggingCallback is called for each epoch to log the metrics

  • After the training is complete, the model is saved using the Mount Path /model

    • The Mount Path is set during IDE or Run creation as described at Mount Path

  • Standard artifacts are provided for the Run in the Runs menu screen by selecting the Run name and the Artifacts tab. Additional artifacts can be saved by using the mlflow.log_artifacts function.

# Train model and save metrics & artifacts
with mlflow.start_run(run_name="name") as run:, y, epochs=epochs, verbose=False, callbacks=[loggingCallback()])

    # Exporting model & metrics"/model/1")
    mlflow.log_artifacts(OUTPUT_MODEL_DIR)
    mlflow.keras.log_model(keras_model=model, artifact_path=None)


The callback is required for Tensorflow/Keras. With scikit-learn you can use the mlflow.log_metric function directly in the training run.

Writing Code for Katib

Katib is a Kubeflow framework that executes hyperparameter optimization during training. More details on this can be found at Introduction to Katib

A description of how to use Katib within DKube is available at Hyperparameters

In order to use Katib, the code must be written to accept the tuning file and to output the metrics in the right format. In this example:

  • Epochs & Learning Rate will be varied

  • Training Loss will be minimized

Read the Katib Tuning Input

The Katib tuning file describes the tuning objective and the hyperparameters that will be varied to determine which combination best achieves that objective. This information is saved by parsing the input arguments. The argparse module is used for this function.

  • The expected hyperparameters are read into arg variables

# Set up parsing for the Katib inputs
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--epochs', type=int, default=5, help='The number of epochs for training')
parser.add_argument('--learning_rate', type=float, default=0.01, help="learning rate for optimizer")
args = parser.parse_args()

# Set up input hyperparameters
NUM_EPOCHS = int(os.environ.get("EPOCHS", args.epochs))
LEARNING_RATE = args.learning_rate

Log the Output Metrics

Katib runs the number of trials set by the tuning file, and after each trial it analyzes the output metric. The metric is read from stdout. The MLFlow logging code was shown in a previous section. In order to enable Katib, the metric also needs to be output after each completed trial.

# MLFlow metric logging function
class loggingCallback(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        mlflow.log_metric("train_loss", logs["loss"], step=epoch)
        # Output metric for Katib to collect from stdout
        print(f"loss={round(logs['loss'],2)}")


During the call, the option verbose=False must be set in order to get clean metric output that Katib can read

Writing Code for TensorBoard

In order to make use of TensorBoard within DKube, the code needs to be instrumented so that the event files are written to the right folder.

TensorBoard in a Notebook

When running your code within a Notebook, the TensorBoard UI expects the event logs to be in the folder defined by the environment variable DKUBE_TENSORBOARD_DIR.

Write the Event Files

The TensorBoard event files are written using a callback in the training function. An example of the code to log the event files would look like this:

# Set the folder for Tensorboard event logs
DKUBE_TENSORBOARD_DIR = os.environ.get('DKUBE_TENSORBOARD_DIR')

# Train model with TensorBoard event logs
with mlflow.start_run(run_name="name") as run:, y, epochs=epochs, verbose=False,
              callbacks=[tf.keras.callbacks.TensorBoard(log_dir=DKUBE_TENSORBOARD_DIR)])

TensorBoard during a Training Run

For a Training Run, the TensorBoard event logs can be in one of 2 places:

  • Folder identified by DKUBE_TENSORBOARD_DIR

  • Folder defined within the Model directory when the Run is created

Write the Event Files

The TensorBoard event files are written using a callback in the training function. An example of the code to log the event files would look like this:

# Set the folder for Tensorboard event logs
DKUBE_TENSORBOARD_DIR = "/model/tensorboard"

# Train model with TensorBoard event logs
with mlflow.start_run(run_name="name") as run:, y, epochs=epochs, verbose=False,
              callbacks=[tf.keras.callbacks.TensorBoard(log_dir=DKUBE_TENSORBOARD_DIR)])

Kubeflow Pipelines Template

Kubeflow Pipelines provide a powerful mechanism to automate your workflow. DKube supports this capability natively, as described at Kubeflow Pipelines

One Convergence offers templates and examples to make pipeline creation convenient.


One Convergence provides a set of component definition files for the necessary functions needed to create a pipeline within DKube (item 1). The files include:

  • A description of the component

  • A list of inputs and outputs that the component accepts

  • Metadata that allow the component to be run within DKube as a pod

They are located in the folder /mnt/dkube/pipeline/components from within a JupyterLab notebook.

They can also be accessed from the GitHub location DKube Pipeline Components

These components are called by the DSL pipeline description (item 2), and allow the developer to focus on the specific inputs and outputs required by the Job rather than the details of how those fields get translated at the lower levels. The DSL compiler will convert the DSL into a pipeline YAML file, which can be passed to Kubeflow to run.

An example of using the templates to create a pipeline is found at DKube Training

The file pipeline.ipynb uses the template to create a pipeline within DKube.

Custom Container Images


DKube Jobs run within container images that contain the framework and preloaded packages. The image is selected when the Job is created, and can come from several sources:

  • DKube provides standard images based on the framework, version, and environment

  • An image can be manually created, as explained in this section, and stored within an Image Catalog, described at Images

  • An image can be used from a repo, either directly or after being stored in the Image Catalog

If the standard DKube Docker image does not provide the packages that are necessary for your code to execute, you can create a custom Docker image and use this for IDEs and Runs. There are several different ways that DKube enables the creation and use of custom images.

Manual Image Creation

This section describes the process to build a custom Docker image manually.

Getting the Base Image

In order to create a custom image for DKube, you can start with the standard DKube image for the framework and version, and add the packages that you need. The standard images are available from the Image dropdown field during IDE & Run creation.

Adding Your Packages

In order to add your packages to the standard DKube image, you create a Dockerfile with the packages included. The Dockerfile commands are:

FROM <Base Image>
RUN pip install <Package>

Building the Docker Image

The new image can be built with the following command:

docker build -t <username>/<image:version> -f <Dockerfile Name> .

Pushing the Image to Docker Hub

In order to push the image, login to Docker Hub and run the following command:

docker push <username>/<image:version>

Using the Custom Image within DKube

When starting a Run or IDE, select a Custom Container and use the name of the image that was saved in the previous step. The form of the image will be:<username>/<image:version>

JupyterLab Custom Images

When creating a custom image for use in a JupyterLab notebook within DKube, you must include the steps that provide the jovyan user sudo permissions. This allows that user to install system packages within the notebook.

FROM jupyter/base-notebook:latest

USER root
RUN echo "$NB_USER ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/notebook
USER jovyan
CMD ["sh", "-c", "jupyter lab --ip= --port=8888 --allow-root $DKUBE_NB_ARGS"]


DKube provides an automated method to:

  • Build and push images to a Docker registry based on a code change

  • Execute an automated set of steps through DKube

Basic Setup for CI/CD

In order to make use of the CI/CD feature, certain files need to be created in the Code repo to define the necessary actions.

Setting up the Repository

In order for the CI/CD system to operate, the repository needs to be set up with the files that provide the action instructions. The directory structure should be as follows:

Repository Root
     |--- .dkube-ci.yml

The other folders and files described in this section can be in any folder, since the .dkube-ci.yml file will identify them by their path.

Placement of .dkube-ci.yml

There can be a .dkube-ci.yml file in one, more, or all branches. When a code change is made on a branch, the .dkube-ci.yml for that branch will be used to execute the actions defined for that branch. For CI/CD triggering from the DKube UI, the branch will be specified during the submission.

CI/CD Actions

The CI/CD can be triggered in 2 different ways:

  • From the DKube Builds screen, as described at Images

  • From a GitHub Webhook trigger, described in this section

In both cases, the file .dkube-ci.yml is used by the CI/CD system to find the other necessary files to execute the commands. The general format of the .dkube-ci.yml file is as follows:

<Declaration>: <Declaration-specific Instructions>

The following types of actions are supported by the CI/CD mechanism.




  • Build and push a Docker image using a Dockerfile

  • Build and push a Docker image using the Conda environment

  • Register existing Docker images with DKube

  • Build other Docker images

  • Add a DKube Jobs template or run Jobs

  • Build a Kubeflow component

  • Compile, deploy, and run a Kubeflow pipeline

Folder Path

The path: declaration can have a hierarchical designation. For example, if the file is in the hierarchy folder1/folder2, referenced from the base of the repository, the path: declaration would carry that full hierarchy (path: folder1/folder2).

Combining Declarations

The declarations can be combined in any order.


The actions from the declarations are run in parallel, except for the Pipeline step, which waits for the components to be built. For others, such as the Jobs: declaration, the image must already have been built and be ready for use.

More details on the syntax of the actions are available at CI/CD Examples

Automated Execution Through GitHub Webhook

The CI/CD actions can be triggered automatically through a GitHub repo commit. The actions described above will be performed based on the .dkube-ci.yml file.

The Webhook is set up through the procedure in this section.


The Webhook is set up from the root level of the repository, within the branch that will be used for commits. Select the Settings tab.


Select the Webhook menu item on the left.


Select the Add webhook button on the top right.


The Webhook fields should be filled out as follows:

  • Payload URL: URL used to access DKube, with /cicd/webhook at the end

  • Content type

  • Which events would you like to trigger this webhook?: Just the push event

  • Active: Check this when ready to enable the trigger


When the Active checkbox is enabled, every commit to the repo will trigger the CI/CD. Leave this unchecked until you are ready to enable the CI/CD actions.

CI/CD Example

This section provides a basic example to demonstrate how to set up and use the CI/CD capability. This example creates and builds a Docker image. The repository used for this example is in the following GitHub repo, within the training branch:

Follow the readme instructions to execute the example.

Inference Deployment Requirements

  • Once the training is complete for a DKube model, it can be deployed on a test or production inference server, as described at Model Deployment Workflow

  • The model can be deployed with or without a Model Serving Transformer

  • A trained model can be deployed with the default DKube image, or with a custom image that the user can provide as described at Images

Optional Transformer

As described in the section referenced above, a transformer can optionally be included. If the model is deployed with a transformer, the file needs to be written with the following prerequisites:

  • A class should be defined with preprocess and postprocess as member functions

  • The class will take the kfserving.KFModel as an argument, and initialize the predictor host.

An example is provided here:

class Transformer(kfserving.KFModel):
    def __init__(self, name: str, predictor_host: str):
        super().__init__(name)
        self.predictor_host = predictor_host

  • The preprocess function will accept a dictionary containing the data to be processed and return the payload

  • The postprocess function will also accept a dictionary containing the output for the model prediction in a dictionary data structure. The function will return the processed model output.
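To illustrate the shape of those member functions, here is a standalone sketch (the kfserving base class is omitted so it runs anywhere). The "instances" payload key and the 1/255.0 scaling step are hypothetical examples, not part of the DKube requirements above:

```python
# Standalone sketch of preprocess/postprocess members as described above.
# The "instances" key and the 1/255.0 scaling are hypothetical examples.
class TransformerSketch:
    def preprocess(self, inputs: dict) -> dict:
        # Convert the raw input dictionary into the payload for the predictor
        return {"instances": [[v / 255.0 for v in row] for row in inputs["instances"]]}

    def postprocess(self, inputs: dict) -> dict:
        # Return the processed model output
        return {"predictions": inputs.get("predictions", [])}

t = TransformerSketch()
payload = t.preprocess({"instances": [[0, 255]]})
```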

A main function is required to start the transformer server. An example of this is:

if __name__ == "__main__":
    transformer = Transformer(args.model_name, predictor_host=args.predictor_host)
    kfserver = kfserving.KFServer()
    kfserver.start(models=[transformer])

For more details refer to the sample file at Example


Currently a transformer component can only be written in Python

Accessing DKube MinIO Server

DKube includes an integrated MinIO server that can be used to supply data to Job executions. For example, the DKube Monitoring examples create synthesized datasets for live data and ground truth and serve them from the DKube MinIO server.


The MinIO server is only available in a full DKube installation

In order to make use of the MinIO server, the access IP and credentials need to be added to a DKube Dataset.

Getting MinIO credentials

The credentials can be obtained from the DKube API server. This is accessed from the DKube access URL in the form:

https://<DKube Access IP>:32222/#/api

This will bring up the DKube API screen.


From within that screen, search for logstore

  • Expand the logstore entry

  • Select Try it Out

  • Execute


This will execute the curl command and provide a response that includes the AccessKey and AccessKeyId

Creating Dataset Repo

The credentials will be used to create a Dataset repo within DKube. Create a Dataset repo by selecting the Datasets menu on the left and selecting + Dataset

Fill in the fields as follows, then Add Dataset





  • Name: User-selected name for the Dataset

  • Source: Dataset source

  • Endpoint: DKube access URL of form http://<DKube IP Address>:32221

  • Access Key ID: AccessKeyId from previous step

  • Secret Access Key: AccessKey from previous step




The Endpoint field needs to be http (not https), and the port is 32221

TensorFlow Deployment

  • DKube’s TensorFlow serving image uses TensorFlow Serving to serve models trained using the Tensorflow framework

  • A TensorFlow trained model should be saved in the protobuf (SavedModel) format; other file formats are not supported by TensorFlow Serving. TensorFlow's model-save function (for example, can be used to write the trained model in this format.

  • The model should be saved under a version folder (such as <mount_path>/1). The save path follows a convention used by TensorFlow Serving where the last path component (/1 in the example) is a version number for your model. This allows tools like TensorFlow Serving to determine the relative freshness of the model.

  • Refer to TensorFlow Saved Model for more details about saving the model
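The version-folder convention above can be sketched with the standard library: integer-named subfolders under the model path, with the highest number treated as the freshest version. The temporary directory below is a stand-in for the model mount path:

```python
import os
import tempfile

# Sketch of the TensorFlow Serving version-folder convention described above:
# each saved model lives under an integer-named subfolder of the model path.
model_root = tempfile.mkdtemp()  # stands in for the model mount path
for version in ("1", "2", "10"):
    os.makedirs(os.path.join(model_root, version))

def latest_version(root):
    """Pick the highest integer-named subfolder, as the serving layer does."""
    versions = [d for d in os.listdir(root) if d.isdigit()]
    return max(versions, key=int) if versions else None

newest = latest_version(model_root)  # "10"
```

Note that the comparison is numeric, not lexicographic: version 10 is fresher than version 2 even though "10" sorts before "2" as a string.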

PyTorch Deployment

  • The DKube PyTorch serving image uses the standard torch.load and predict methods to load and serve models

  • A PyTorch trained model should be saved to a single file using the function, which stores the model in the .pt format. The model directory should contain only that one model file.

  • A Python file defining the model class is also required within the same model directory. An example of this can be found at Example

  • The model class file should define a class named Net, and there should not be any other .py file in the model save directory. If there are other .py files, the serving execution will raise an exception.

  • Refer to Saving and Loading Models for Inference for more details about saving the model

SKLearn Deployment

  • The DKube SKLearn serving image uses joblib to load the model and predict.

  • A scikit-learn model should be saved in joblib format and the file name should be model.joblib. Other formats are currently not supported.

  • Refer to Model Persistence for more details about saving the model

Custom Deployment

  • In a custom deployment the model can be saved in any user-specific format

  • A custom deployment will also require a custom serving image which would be user-defined