LiteLLM Kubernetes Production Deploy and Config

2026.02.27

Part 2 of 4. In Part 1 we covered the architecture and prerequisites. Here we deploy LiteLLM step by step. Continue to Part 3: Configuring AI Tools and Providers.

LiteLLM is an open-source AI proxy that exposes a single OpenAI-compatible API for 100+ providers. It handles authentication, request translation, and routing so your applications talk to one endpoint instead of managing multiple provider integrations. If you’re new to LiteLLM or Kubernetes AI infrastructure, check out the deploy-ollama-kubernetes guide for related reading.

Step 1: Prepare the LiteLLM Configuration

LiteLLM relies on a proxy_config.yaml file to define which models it serves and how to reach each provider. I store this file on my NAS, mounted at /k3s_storage/ across every Kubernetes node. That keeps the configuration persistent and accessible even when pods recycle. For a homelab, HostPath volumes work fine; in production you’ll want a proper PersistentVolume with ReadWriteMany access.

Create this file on your K8s node at /k3s_storage/litellm/proxy_config.yaml:

/k3s_storage/litellm/proxy_config.yaml
model_list:
  - model_name: kimi-code
    litellm_params:
      model: openai/kimi-for-coding
      api_base: "https://api.kimi.com/coding/v1"
      api_key: "os.environ/KIMI_API_KEY"
      headers:
        User-Agent: "claude-code/0.1.0"
        X-Kimi-Client: "Kimi-Code"
  - model_name: openrouter-free
    litellm_params:
      model: openai/openrouter/free
      api_key: "os.environ/OPENROUTER_API_KEY"
      api_base: "https://openrouter.ai/api/v1"
      headers:
        HTTP-Referer: "https://your-domain.com"
        X-Title: "LiteLLM-Automation"
  - model_name: openrouter-free-trending
    litellm_params:
      model: openrouter/*
      api_key: "os.environ/OPENROUTER_API_KEY"
      api_base: "https://openrouter.ai/api/v1"
      headers:
        HTTP-Referer: "https://your-domain.com"
        X-Title: "LiteLLM-Automation"
  - model_name: nvidia-llama
    litellm_params:
      model: nvidia_nim/meta/llama-4-maverick-17b-128e-instruct
      api_key: "os.environ/NVIDIA_NIM_API_KEY"

Important: Replace https://your-domain.com with your actual domain or GitHub profile URL. OpenRouter uses this for ranking and attribution.

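Make sure the litellm directory exists on your node before starting: mkdir -p /k3s_storage/litellm. The manifest in Step 3 mounts it as a hostPath volume with type: Directory, so if the directory is missing the mount fails and the pod never starts. kubectl get pods typically shows it stuck in ContainerCreating (pod phase Pending) with a FailedMount event.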
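A quick pre-flight check on the node can catch a missing directory or config file before the pod is ever scheduled. The sketch below is a hypothetical helper (check_config_dir is my own name, not part of LiteLLM or kubectl). It only tests that the directory exists and that proxy_config.yaml inside it is readable; it does not validate the YAML itself.

```shell
# Hypothetical pre-flight helper: not part of LiteLLM or kubectl.
# Returns 0 only if the directory exists and proxy_config.yaml is readable.
check_config_dir() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi
  if [ ! -r "$dir/proxy_config.yaml" ]; then
    echo "missing or unreadable file: $dir/proxy_config.yaml" >&2
    return 1
  fi
  echo "ok: $dir/proxy_config.yaml"
}

# Usage on the node:
#   check_config_dir /k3s_storage/litellm
```

If the storage is not shared across nodes, the check has to pass on whichever node the pod lands on.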

Why These Headers Matter

The Kimi Code configuration requires two custom headers. Without them, you’ll hit authentication or compatibility errors even with a valid API key. It’s a classic “tribal knowledge” gotcha that the docs don’t spell out.

  • User-Agent: "claude-code/0.1.0": Kimi’s API demands this specific string to accept the connection. It’s a compatibility shim that signals Kimi’s backend to process the request as if it arrived from a known client.
  • X-Kimi-Client: "Kimi-Code": Identifies the client type for internal routing inside Kimi’s infrastructure. Omit this header and you’ll chase vague authentication errors down a deep debugging rabbit hole.

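For OpenRouter, the HTTP-Referer and X-Title headers feed their ranking and attribution pipeline. OpenRouter’s API documentation lists both as optional, so a request without them should still be accepted, but it won’t be attributed to your app. If you do see a 400 Bad Request, check the model name and request body before blaming the headers.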

Step 2: Create Kubernetes Secrets for API Keys

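Never embed API keys directly in your manifests. Store provider credentials and the LiteLLM master key in Kubernetes Secrets instead. This keeps sensitive values out of your deployment YAML and out of version control. Keep in mind that Secrets are only base64-encoded in the cluster datastore by default; encryption at rest has to be enabled separately on the cluster.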

Terminal window
kubectl create secret generic litellm-provider-keys \
  --namespace litellm \
  --from-literal=KIMI_API_KEY='sk-your-kimi-key' \
  --from-literal=OPENROUTER_API_KEY='sk-or-v1-your-openrouter-key' \
  --from-literal=NVIDIA_NIM_API_KEY='nvapi-your-nvidia-key' \
  --from-literal=LITELLM_MASTER_KEY='sk-your-master-key'

Make sure the litellm namespace exists first; create it with kubectl create namespace litellm if needed. You’ll reference these secrets in the Deployment using secretKeyRef instead of hardcoded values.
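The sk-your-master-key placeholder has to be replaced with a real, hard-to-guess value. One way to produce one is sketched below. It assumes the node has the usual head, od, and tr utilities and a /dev/urandom device; generate_master_key is a hypothetical helper name. The sk- prefix follows the placeholder above, which matches the form LiteLLM’s documentation asks for.

```shell
# Hypothetical helper: prints "sk-" followed by 64 hex characters
# (32 random bytes read from /dev/urandom).
generate_master_key() {
  # od prints the bytes as hex pairs; tr keeps only the hex digits.
  random_hex="$(head -c 32 /dev/urandom | od -An -tx1 | tr -dc '0-9a-f')"
  printf 'sk-%s\n' "$random_hex"
}

# Usage: generate once, keep a copy, then pass it to kubectl
# in place of the hand-typed literal:
#   MASTER_KEY="$(generate_master_key)"
#   --from-literal=LITELLM_MASTER_KEY="$MASTER_KEY"
```

Keep the generated value somewhere safe, such as a password manager, because clients need it to authenticate against the proxy.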

Step 3: Deploy LiteLLM to Kubernetes

Apply this manifest to deploy LiteLLM in the litellm namespace:

litellm-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm-deployment
  namespace: litellm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: litellm
  template:
    metadata:
      labels:
        app: litellm
    spec:
      containers:
        - name: litellm-container
          image: ghcr.io/berriai/litellm:main-latest
          imagePullPolicy: Always
          env:
            - name: DATABASE_URL
              value: "postgresql://litellm:<PASSWORD>@<POSTGRES_HOST>:<PORT>/litellm"
            - name: STORE_MODEL_IN_DB
              value: "True"
            - name: LITELLM_MASTER_KEY
              valueFrom:
                secretKeyRef:
                  name: litellm-provider-keys
                  key: LITELLM_MASTER_KEY
            - name: KIMI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: litellm-provider-keys
                  key: KIMI_API_KEY
            - name: OPENROUTER_API_KEY
              valueFrom:
                secretKeyRef:
                  name: litellm-provider-keys
                  key: OPENROUTER_API_KEY
            - name: NVIDIA_NIM_API_KEY
              valueFrom:
                secretKeyRef:
                  name: litellm-provider-keys
                  key: NVIDIA_NIM_API_KEY
          args:
            - "--config"
            - "/k3s_storage/litellm/proxy_config.yaml"
            - "--port"
            - "4000"
          ports:
            - containerPort: 4000
          volumeMounts:
            - name: physical-storage
              mountPath: /k3s_storage/litellm
          livenessProbe:
            httpGet:
              path: /health/liveliness
              port: 4000
            initialDelaySeconds: 40
            periodSeconds: 15
      volumes:
        - name: physical-storage
          hostPath:
            path: /k3s_storage/litellm
            type: Directory
---
apiVersion: v1
kind: Service
metadata:
  name: litellm-service
  namespace: litellm
spec:
  type: NodePort
  selector:
    app: litellm
  ports:
    - protocol: TCP
      port: 4000
      targetPort: 4000
      nodePort: 30080

Deploy it:

Terminal window
kubectl apply -f litellm-deployment.yaml

Verify the pod is running:

Terminal window
kubectl get pods -n litellm -l app=litellm
# NAME                                 READY   STATUS    RESTARTS   AGE
# litellm-deployment-7c9f4b8d5-x2k9m   1/1     Running   0          45s
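
A Running pod only tells you the container started, so it helps to send a request through the NodePort as well. The sketch below assumes curl is installed and that one of your nodes is reachable on port 30080 (the nodePort from the Service above); litellm_smoke_test is a hypothetical helper name. It calls /health/liveliness, the same path the liveness probe polls, and then /v1/models, the OpenAI-compatible model listing, which needs the master key.

```shell
# Hypothetical smoke test: not part of LiteLLM.
# $1 = base URL of the proxy, $2 = LiteLLM master key.
litellm_smoke_test() {
  base_url="$1"
  master_key="$2"

  # Liveness endpoint: the same path the Deployment's livenessProbe uses.
  curl --fail --silent --show-error "$base_url/health/liveliness" || return 1
  echo

  # Model listing: requires a valid key in the Authorization header.
  curl --fail --silent --show-error \
    -H "Authorization: Bearer $master_key" \
    "$base_url/v1/models" || return 1
  echo
}

# Usage:
#   litellm_smoke_test "http://<NODE_IP>:30080" "sk-your-master-key"
```

The --fail flag makes curl exit non-zero on HTTP errors, so a rejected key shows up as a failed check rather than as a JSON error body.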

The Postgres Database

LiteLLM needs Postgres for user management, rate limiting, and request logging. The proxy will start without a database, but spend tracking, key-based access control, and multi-user support won’t function, so treat Postgres as mandatory for this setup. I run Postgres on a separate NAS to decouple it from the cluster lifecycle, which means database maintenance doesn’t require downtime on the gateway. Point LiteLLM to <YOUR_POSTGRES_IP>:<YOUR_POSTGRES_PORT> via the DATABASE_URL environment variable.

If you don’t have Postgres yet, deploy it quickly for testing:

Terminal window
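# Create a single-replica Postgres deployment (test use only).
kubectl create deployment postgres \
  --image=postgres:16-alpine \
  --namespace litellm
# Set the credentials. The first pod may crash until POSTGRES_PASSWORD
# is set; this command triggers a fresh rollout with the variables applied.
kubectl set env deployment postgres \
  POSTGRES_USER=litellm \
  POSTGRES_PASSWORD=your-secure-password \
  POSTGRES_DB=litellm \
  --namespace litellm
# Expose it inside the cluster as a Service named "postgres" on port 5432.
kubectl expose deployment postgres \
  --port=5432 \
  --namespace litellm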

Production tip: Use a managed Postgres instance or persistent volumes with solid backups. The ephemeral deployment above loses all data the moment the pod restarts. For the full production hardening guide, see Part 4.
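
If the database password contains characters such as @, : or /, it has to be percent-encoded before it goes into DATABASE_URL, or the URL won’t parse the way you expect. The sketch below assembles the URL in plain POSIX shell. urlencode and build_database_url are hypothetical helper names, and the encoder assumes ASCII input. For the test deployment above, the host would be the in-cluster Service name postgres.litellm.svc.cluster.local on port 5432.

```shell
# Percent-encode everything except RFC 3986 unreserved characters.
# Assumption: the input is plain ASCII.
urlencode() {
  s="$1"
  out=""
  while [ -n "$s" ]; do
    rest="${s#?}"        # the string without its first character
    c="${s%"$rest"}"     # the first character itself
    case "$c" in
      [A-Za-z0-9._~-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s="$rest"
  done
  printf '%s\n' "$out"
}

# Assemble a Postgres URL: user, password, host, port, database.
build_database_url() {
  user="$1"; password="$2"; host="$3"; port="$4"; database="$5"
  printf 'postgresql://%s:%s@%s:%s/%s\n' \
    "$(urlencode "$user")" "$(urlencode "$password")" \
    "$host" "$port" "$database"
}

# Usage:
#   build_database_url litellm 'p@ss:w/rd' postgres.litellm.svc.cluster.local 5432 litellm
#   prints postgresql://litellm:p%40ss%3Aw%2Frd@postgres.litellm.svc.cluster.local:5432/litellm
```

The resulting string can be stored in the litellm-provider-keys Secret and referenced with secretKeyRef like the other keys, which keeps the database password out of the manifest.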

Frequently Asked Questions

Do I need Postgres for LiteLLM?

Yes. Postgres is required for rate limiting, spend tracking, and multi-user support. LiteLLM starts without it, but those features won’t work.

Can I update the config without redeploying?

Yes, as long as a pod restart is acceptable. Update proxy_config.yaml on the host path, then restart the deployment so LiteLLM rereads the file: kubectl rollout restart deployment/litellm-deployment -n litellm.
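
A restart only helps if the new pod actually comes up, so it’s worth waiting for the rollout to finish before testing. A minimal sketch follows, with restart_litellm as a hypothetical wrapper name around two standard kubectl subcommands. The 120-second timeout is an arbitrary choice.

```shell
# Hypothetical wrapper: restart the deployment and wait for it to settle.
# $1 = namespace (default: litellm), $2 = deployment name (default: litellm-deployment).
restart_litellm() {
  namespace="${1:-litellm}"
  deployment="${2:-litellm-deployment}"

  # Trigger a rolling restart so the new pod rereads proxy_config.yaml.
  kubectl rollout restart "deployment/$deployment" -n "$namespace" || return 1

  # Block until the rollout completes, or fail after 120 seconds.
  kubectl rollout status "deployment/$deployment" -n "$namespace" --timeout=120s
}

# Usage:
#   restart_litellm
```

A non-zero exit status means the new pod did not become ready in time, which usually points to a config error worth checking in the pod logs.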

How do I add a new provider?

Add an entry to model_list in proxy_config.yaml, store the API key as a Kubernetes Secret, and reference it via secretKeyRef.
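
As the provider list grows, it’s easy to reference a variable in proxy_config.yaml that never made it into the Secret or the Deployment’s env list. As a rough check, the sketch below extracts every os.environ/NAME reference from the config so you can compare the result against the keys shown by kubectl describe secret litellm-provider-keys -n litellm. required_env_vars is a hypothetical helper name, and it assumes a grep that supports -o (GNU, BSD, and BusyBox grep all do).

```shell
# Hypothetical helper: list the environment variables a LiteLLM config expects,
# based on its os.environ/NAME references. $1 = path to proxy_config.yaml.
required_env_vars() {
  config_path="$1"
  grep -o 'os\.environ/[A-Za-z_][A-Za-z0-9_]*' "$config_path" \
    | sed 's|^os\.environ/||' \
    | sort -u
}

# Usage:
#   required_env_vars /k3s_storage/litellm/proxy_config.yaml
# For the config in Step 1 this prints:
#   KIMI_API_KEY
#   NVIDIA_NIM_API_KEY
#   OPENROUTER_API_KEY
```

Every name in that list needs both a key in the Secret and a matching env entry with secretKeyRef in the Deployment.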

What if my pod stays in Pending?

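Verify the HostPath directory exists on the node: mkdir -p /k3s_storage/litellm. With type: Directory, Kubernetes refuses to mount a host path that doesn’t exist. kubectl describe pod <pod-name> -n litellm shows the exact mount or scheduling error under Events.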


Your LiteLLM proxy is now running on Kubernetes. Continue to Part 3: Configuring AI Tools and Providers to connect your coding agents.

