LiteLLM Kubernetes Production Deploy and Config
Table of Contents
Part 2 of 4. In Part 1 we covered the architecture and prerequisites. Here we deploy LiteLLM step by step. Continue to Part 3: Configuring AI Tools and Providers.
LiteLLM is an open-source AI proxy that exposes a single OpenAI-compatible API for 100+ providers. It handles authentication, request translation, and routing so your applications talk to one endpoint instead of managing multiple provider integrations. If you’re new to LiteLLM or Kubernetes AI infrastructure, check out the deploy-ollama-kubernetes guide for related reading.
Step 1: Prepare the LiteLLM Configuration
LiteLLM relies on a proxy_config.yaml file to define which models it serves and how to reach each provider. I store this file on my NAS, mounted at /k3s_storage/ across every Kubernetes node. That keeps the configuration persistent and accessible even when pods recycle. For a homelab, HostPath volumes work fine; in production you’ll want a proper PersistentVolume with ReadWriteMany access.
Create this file on your K8s node at /k3s_storage/litellm/proxy_config.yaml:
model_list: - model_name: kimi-code litellm_params: model: openai/kimi-for-coding api_base: "https://api.kimi.com/coding/v1" api_key: "os.environ/KIMI_API_KEY" headers: User-Agent: "claude-code/0.1.0" X-Kimi-Client: "Kimi-Code"
- model_name: openrouter-free litellm_params: model: openai/openrouter/free api_key: "os.environ/OPENROUTER_API_KEY" api_base: "https://openrouter.ai/api/v1" headers: HTTP-Referer: "https://your-domain.com" X-Title: "LiteLLM-Automation"
- model_name: openrouter-free-trending litellm_params: model: openrouter/* api_key: "os.environ/OPENROUTER_API_KEY" api_base: "https://openrouter.ai/api/v1" headers: HTTP-Referer: "https://your-domain.com" X-Title: "LiteLLM-Automation"
- model_name: nvidia-llama litellm_params: model: nvidia_nim/meta/llama-4-maverick-17b-128e-instruct api_key: "os.environ/NVIDIA_NIM_API_KEY"Important: Replace
https://your-domain.comwith your actual domain or GitHub profile URL. OpenRouter uses this for ranking and attribution.
Make sure the litellm directory exists on your node before starting: mkdir -p /k3s_storage/litellm. If the directory is missing, the pod will stay stuck in Pending state because Kubernetes can’t mount a non-existent HostPath volume.
Why These Headers Matter
The Kimi Code configuration requires two custom headers. Without them, you’ll hit authentication or compatibility errors even with a valid API key; a classic “tribal knowledge” gotcha buried outside the docs.
User-Agent: "claude-code/0.1.0": Kimi’s API demands this specific string to accept the connection. It’s a compatibility shim that signals Kimi’s backend to process the request as if it arrived from a known client.X-Kimi-Client: "Kimi-Code": Identifies the client type for internal routing inside Kimi’s infrastructure. Omit this header and you’ll chase vague authentication errors down a deep debugging rabbit hole.
For OpenRouter, the HTTP-Referer header is non-negotiable for their ranking and attribution pipeline. Skip it, and requests return a 400 Bad Request immediately.
Step 2: Create Kubernetes Secrets for API Keys
Never embed API keys directly in your manifests. Store provider credentials and the LiteLLM master key in Kubernetes Secrets instead. This keeps sensitive values encrypted at rest and decoupled from your deployment YAML:
kubectl create secret generic litellm-provider-keys \ --namespace litellm \ --from-literal=KIMI_API_KEY='sk-your-kimi-key' \ --from-literal=OPENROUTER_API_KEY='sk-or-v1-your-openrouter-key' \ --from-literal=NVIDIA_NIM_API_KEY='nvapi-your-nvidia-key' \ --from-literal=LITELLM_MASTER_KEY='sk-your-master-key'Make sure the litellm namespace exists first; create it with kubectl create namespace litellm if needed. You’ll reference these secrets in the Deployment using secretKeyRef instead of hardcoded values.
Step 3: Deploy LiteLLM to Kubernetes
Apply this manifest to deploy LiteLLM in the litellm namespace:
apiVersion: apps/v1kind: Deploymentmetadata: name: litellm-deployment namespace: litellmspec: replicas: 1 selector: matchLabels: app: litellm template: metadata: labels: app: litellm spec: containers: - name: litellm-container image: ghcr.io/berriai/litellm:main-latest imagePullPolicy: Always env: - name: DATABASE_URL value: "postgresql://litellm:<PASSWORD>@<POSTGRES_HOST>:<PORT>/litellm" - name: STORE_MODEL_IN_DB value: "True" - name: LITELLM_MASTER_KEY valueFrom: secretKeyRef: name: litellm-provider-keys key: LITELLM_MASTER_KEY - name: KIMI_API_KEY valueFrom: secretKeyRef: name: litellm-provider-keys key: KIMI_API_KEY - name: OPENROUTER_API_KEY valueFrom: secretKeyRef: name: litellm-provider-keys key: OPENROUTER_API_KEY - name: NVIDIA_NIM_API_KEY valueFrom: secretKeyRef: name: litellm-provider-keys key: NVIDIA_NIM_API_KEY args: - "--config" - "/k3s_storage/litellm/proxy_config.yaml" - "--port" - "4000" ports: - containerPort: 4000 volumeMounts: - name: physical-storage mountPath: /k3s_storage/litellm livenessProbe: httpGet: path: /health/liveliness port: 4000 initialDelaySeconds: 40 periodSeconds: 15 volumes: - name: physical-storage hostPath: path: /k3s_storage/litellm type: Directory---apiVersion: v1kind: Servicemetadata: name: litellm-service namespace: litellmspec: type: NodePort selector: app: litellm ports: - protocol: TCP port: 4000 targetPort: 4000 nodePort: 30080Deploy it:
kubectl apply -f litellm-deployment.yamlVerify the pod is running:
kubectl get pods -n litellm -l app=litellm# NAME READY STATUS RESTARTS AGE# litellm-deployment-7c9f4b8d5-x2k9m 1/1 Running 0 45sThe Postgres Database
LiteLLM requires Postgres for user management, rate limiting, and request logging. This is non-negotiable. Skip it and features like spend tracking, key-based access control, and multi-user support simply won’t function. I run Postgres on a separate NAS to decouple it from the cluster lifecycle, which means database maintenance doesn’t require downtime on the gateway. Point LiteLLM to <YOUR_POSTGRES_IP>:<YOUR_POSTGRES_PORT> via the DATABASE_URL environment variable. Without a valid Postgres connection, LiteLLM starts but critical features remain broken.
If you don’t have Postgres yet, deploy it quickly for testing:
kubectl create deployment postgres \ --image=postgres:16-alpine \ --namespace litellm
kubectl set env deployment postgres \ POSTGRES_USER=litellm \ POSTGRES_PASSWORD=your-secure-password \ POSTGRES_DB=litellm \ --namespace litellm
kubectl expose deployment postgres \ --port=5432 \ --namespace litellmProduction tip: Use a managed Postgres instance or persistent volumes with solid backups. The ephemeral deployment above loses all data the moment the pod restarts. For the full production hardening guide, see Part 4.
Frequently Asked Questions
Do I need Postgres for LiteLLM?
Yes; Postgres is required for rate limiting, spend tracking, and multi-user support. LiteLLM starts without it but critical features remain broken.
Can I update the config without redeploying?
Update proxy_config.yaml on the host path and restart: kubectl rollout restart deployment/litellm-deployment -n litellm.
How do I add a new provider?
Add an entry to model_list in proxy_config.yaml, store the API key as a Kubernetes Secret, and reference it via secretKeyRef.
What if my pod stays in Pending?
Verify the HostPath directory exists: mkdir -p /k3s_storage/litellm. Kubernetes can’t mount a non-existent host path.
Your LiteLLM proxy is now running on Kubernetes. Continue to Part 3: Configuring AI Tools and Providers to connect your coding agents.
If you enjoyed this guide, meet the engineer behind these builds.