List of LLMs in DKubeX LLM Catalog


The following list contains the base LLMs currently registered with the DKubeX LLM Catalog as of DKubeX v0.8.6.3. These LLMs can be deployed on the DKubeX platform without any config files, provided the resource and permission requirements are met. You can deploy them on DKubeX using local resources, or on cloud resources via SkyPilot.
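For cloud deployments, the Accelerator Type and Number of GPUs columns in the table below translate directly into a SkyPilot resource specification. The sketch below is a rough illustration only: it is a generic SkyPilot example, not the DKubeX integration, and the serve command and cluster name are hypothetical. The actual DKubeX workflow is covered in the SkyPilot deployment page linked at the end of this section.

```python
import sky

# Hypothetical illustration: the catalog row for
# meta-llama/Meta-Llama-3-8B-Instruct calls for 1x A10.
task = sky.Task(
    setup="pip install vllm",  # assumption: a vLLM-based serving environment
    run="vllm serve meta-llama/Meta-Llama-3-8B-Instruct --max-model-len 4096",
)
task.set_resources(sky.Resources(accelerators="A10:1"))

# Provision a cloud VM with the requested GPU and start serving.
sky.launch(task, cluster_name="llama3-8b-serve")  # cluster name is illustrative
```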

LLMs registered in the DKubeX LLM Catalog

| LLM Name | Accelerator Type | Provider | Deployment Config | Number of GPUs needed |
| --- | --- | --- | --- | --- |
| casperhansen/llama-3-70b-instruct-awq | A10 | dkubex | cpu-offload-gb: 25, gpu-memory-utilization: 0.9, max-model-len: 4096, max-num-batched-tokens: 8192, max-num-seqs: 64, max_total_tokens: 4096 | 1 |
| meta/llama-3.1-405b-instruct | A100 | nim | — | 16 |
| meta/llama-3.1-70b-instruct | A100 | nim | — | 4 |
| meta/llama-3.1-8b-base | A10G | nim | — | 4 |
| meta/llama-3.1-8b-instruct | A10G | nim | — | 4 |
| meta/llama3-70b-instruct | A100 | nim | — | 4 |
| meta/llama3-8b-instruct | A10G | nim | — | 1 |
| meta-llama/Meta-Llama-3-8B-Instruct | A10 | dkubex | gpu-memory-utilization: 0.9, max-model-len: 4096, max-num-batched-tokens: 8192, max-num-seqs: 64, max_total_tokens: 4096 | 1 |
| meta-llama/Meta-Llama-3.1-8B-Instruct | A10 | dkubex | gpu-memory-utilization: 0.9, max-model-len: 4096, max-num-batched-tokens: 8192, max-num-seqs: 64, swap-space: 10, max_total_tokens: 8192 | 1 |
| microsoft/Phi-3-small-8k-instruct | A10 | dkubex | gpu-memory-utilization: 0.85, max-model-len: 4096, max-num-batched-tokens: 8192, max-num-seqs: 64, max_total_tokens: 2048 | 1 |
| mistralai/mistral-7b-instruct-v03 | A100 | nim | — | 1 |
| mistralai/Mistral-7B-Instruct-v0.2 | A10 | dkubex | gpu-memory-utilization: 0.9, max-model-len: 4096, max-num-batched-tokens: 8192, max-num-seqs: 64, swap-space: 10, max_total_tokens: 8192 | 1 |
| mistralai/mixtral-8x22b-instruct-v01 | A100 | nim | — | 8 |
| mistralai/mixtral-8x7b-instruct-v01 | A100 | nim | — | 4 |
| mistralai/Pixtral-12B-2409 | A10 | dkubex | config-format: mistral, gpu-memory-utilization: 0.9, load-format: mistral, max-model-len: 4096, tensor-parallel-size: 4, tokenizer-mode: mistral, max_total_tokens: 4096 | 1 |
| nvidia/nemotron-4-340b-instruct | A100 | nim | — | 16 |
| tokyotech-llm/llama-3-swallow-70b-instruct-v0.1 | A10G | nim | — | 8 |
| yentinglin/llama-3-taiwan-70b-instruct | A10G | nim | — | 8 |
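The Deployment Config keys for dkubex-provider models mirror vLLM engine arguments (gpu-memory-utilization, max-model-len, max-num-seqs, and so on), which suggests the dkubex provider wraps a vLLM engine. Assuming that mapping holds (an assumption, not a documented guarantee), the Meta-Llama-3-8B-Instruct row above corresponds roughly to the following standalone vLLM configuration; max_total_tokens has no vLLM counterpart and is presumably consumed by DKubeX itself.

```python
from vllm import LLM, SamplingParams

# Rough standalone equivalent of the catalog's dkubex config for
# meta-llama/Meta-Llama-3-8B-Instruct (assumes the dkubex provider wraps vLLM).
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    gpu_memory_utilization=0.9,   # fraction of GPU memory vLLM may claim
    max_model_len=4096,           # maximum context length
    max_num_batched_tokens=8192,  # token budget per scheduler step
    max_num_seqs=64,              # maximum concurrent sequences per batch
)

# Quick smoke test of the engine.
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```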

For more information on deploying LLMs, go to the appropriate page below.

- Deploying LLMs using Local Resources: ./serving/llm_deploy.html
- Deploying LLMs using SkyPilot: ./skypilot/llm-deployment-with-skypilot.html
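However a model is deployed, both runtimes behind this catalog expose OpenAI-compatible HTTP APIs (vLLM for the dkubex provider, NVIDIA NIM for the nim provider), so a deployed endpoint can typically be queried with any OpenAI client. A minimal sketch follows; the base URL and API key are placeholders for whatever your DKubeX deployment reports, and the exact URL shape is an assumption, not documented DKubeX behavior.

```python
from openai import OpenAI

# Placeholder values: substitute the endpoint URL and key shown for your
# deployment in DKubeX; the model name matches the deployed catalog entry.
client = OpenAI(
    base_url="https://<your-dkubex-host>/<your-deployment>/v1",  # hypothetical
    api_key="<your-api-key>",
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Summarize what DKubeX does."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Because both runtimes follow the OpenAI wire format under this assumption, the same client code works regardless of which provider backs the deployment.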