LiteLLM Kubernetes Production Deploy and Config

Part 2 of 4. In Part 1 we covered the architecture and prerequisites. Here we deploy LiteLLM step by step. Continue to Part 3: Configuring AI Tools and Providers.

LiteLLM is an open-source AI proxy that exposes a single OpenAI-compatible API for 100+ providers. It handles authentication, request translation, and routing so your applications talk to one endpoint instead of managing multiple provider integrations. If you’re new to LiteLLM or Kubernetes AI infrastructure, check out the deploy-ollama-kubernetes guide for related reading.

Step 1: Prepare the LiteLLM Configuration

LiteLLM relies on a proxy_config.yaml file to define which models it serves and how to reach each provider. I store this file on my NAS, mounted at /k3s_storage/ across every Kubernetes node. That keeps the configuration persistent and accessible even when pods recycle. For a homelab, HostPath volumes work fine; in production you’ll want a proper PersistentVolume with ReadWriteMany access.

Create this file on your K8s node at /k3s_storage/litellm/proxy_config.yaml:

model_list:
  - model_name: kimi-code
    litellm_params:
      model: openai/kimi-for-coding
      api_base: "https://api.kimi.com/coding/v1"
      api_key: "os.environ/KIMI_API_KEY"
      headers:
        User-Agent: "claude-code/0.1.0"
        X-Kimi-Client: "Kimi-Code"

  - model_name: openrouter-free
    litellm_params:
      model: openai/openrouter/free
      api_key: "os.environ/OPENROUTER_API_KEY"
      api_base: "https://openrouter.ai/api/v1"
      headers:
        HTTP-Referer: "https://your-domain.com"
        X-Title: "LiteLLM-Automation"

  - model_name: openrouter-free-trending
    litellm_params:
      model: openrouter/*
      api_key: "os.environ/OPENROUTER_API_KEY"
      api_base: "https://openrouter.ai/api/v1"
      headers:
        HTTP-Referer: "https://your-domain.com"
        X-Title: "LiteLLM-Automation"

  - model_name: nvidia-llama
    litellm_params:
      model: nvidia_nim/meta/llama-4-maverick-17b-128e-instruct
      api_key: "os.environ/NVIDIA_NIM_API_KEY"

Important: Replace https://your-domain.com with your actual domain or GitHub profile URL. OpenRouter uses this for ranking and attribution.

Make sure the litellm directory exists on your node before starting: mkdir -p /k3s_storage/litellm. If the directory is missing, the pod will stay stuck in Pending state because Kubernetes can’t mount a non-existent HostPath volume.

Why These Headers Matter

The Kimi Code configuration requires two custom headers. Without them, you’ll hit authentication or compatibility errors even with a valid API key; a classic “tribal knowledge” gotcha buried outside the docs.

User-Agent: "claude-code/0.1.0": Kimi’s API demands this specific string to accept the connection. It’s a compatibility shim that signals Kimi’s backend to process the request as if it arrived from a known client.
X-Kimi-Client: "Kimi-Code": Identifies the client type for internal routing inside Kimi’s infrastructure. Omit this header and you’ll chase vague authentication errors down a deep debugging rabbit hole.

For OpenRouter, the HTTP-Referer header is non-negotiable for their ranking and attribution pipeline. Skip it, and requests return a 400 Bad Request immediately.

Step 2: Create Kubernetes Secrets for API Keys

Never embed API keys directly in your manifests. Store provider credentials and the LiteLLM master key in Kubernetes Secrets instead. This keeps sensitive values encrypted at rest and decoupled from your deployment YAML:

kubectl create secret generic litellm-provider-keys \
  --namespace litellm \
  --from-literal=KIMI_API_KEY='sk-your-kimi-key' \
  --from-literal=OPENROUTER_API_KEY='sk-or-v1-your-openrouter-key' \
  --from-literal=NVIDIA_NIM_API_KEY='nvapi-your-nvidia-key' \
  --from-literal=LITELLM_MASTER_KEY='sk-your-master-key'

Make sure the litellm namespace exists first; create it with kubectl create namespace litellm if needed. You’ll reference these secrets in the Deployment using secretKeyRef instead of hardcoded values.

Step 3: Deploy LiteLLM to Kubernetes

Apply this manifest to deploy LiteLLM in the litellm namespace:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm-deployment
  namespace: litellm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: litellm
  template:
    metadata:
      labels:
        app: litellm
    spec:
      containers:
        - name: litellm-container
          image: ghcr.io/berriai/litellm:main-latest
          imagePullPolicy: Always
          env:
            - name: DATABASE_URL
              value: "postgresql://litellm:<PASSWORD>@<POSTGRES_HOST>:<PORT>/litellm"
            - name: STORE_MODEL_IN_DB
              value: "True"
            - name: LITELLM_MASTER_KEY
              valueFrom:
                secretKeyRef:
                  name: litellm-provider-keys
                  key: LITELLM_MASTER_KEY
            - name: KIMI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: litellm-provider-keys
                  key: KIMI_API_KEY
            - name: OPENROUTER_API_KEY
              valueFrom:
                secretKeyRef:
                  name: litellm-provider-keys
                  key: OPENROUTER_API_KEY
            - name: NVIDIA_NIM_API_KEY
              valueFrom:
                secretKeyRef:
                  name: litellm-provider-keys
                  key: NVIDIA_NIM_API_KEY
          args:
            - "--config"
            - "/k3s_storage/litellm/proxy_config.yaml"
            - "--port"
            - "4000"
          ports:
            - containerPort: 4000
          volumeMounts:
            - name: physical-storage
              mountPath: /k3s_storage/litellm
          livenessProbe:
            httpGet:
              path: /health/liveliness
              port: 4000
            initialDelaySeconds: 40
            periodSeconds: 15
      volumes:
        - name: physical-storage
          hostPath:
            path: /k3s_storage/litellm
            type: Directory
---
apiVersion: v1
kind: Service
metadata:
  name: litellm-service
  namespace: litellm
spec:
  type: NodePort
  selector:
    app: litellm
  ports:
    - protocol: TCP
      port: 4000
      targetPort: 4000
      nodePort: 30080

Deploy it:

kubectl apply -f litellm-deployment.yaml

Verify the pod is running:

kubectl get pods -n litellm -l app=litellm
# NAME                                  READY   STATUS    RESTARTS   AGE
# litellm-deployment-7c9f4b8d5-x2k9m   1/1     Running   0          45s

The Postgres Database

LiteLLM requires Postgres for user management, rate limiting, and request logging. This is non-negotiable. Skip it and features like spend tracking, key-based access control, and multi-user support simply won’t function. I run Postgres on a separate NAS to decouple it from the cluster lifecycle, which means database maintenance doesn’t require downtime on the gateway. Point LiteLLM to <YOUR_POSTGRES_IP>:<YOUR_POSTGRES_PORT> via the DATABASE_URL environment variable. Without a valid Postgres connection, LiteLLM starts but critical features remain broken.

If you don’t have Postgres yet, deploy it quickly for testing:

kubectl create deployment postgres \
  --image=postgres:16-alpine \
  --namespace litellm

kubectl set env deployment postgres \
  POSTGRES_USER=litellm \
  POSTGRES_PASSWORD=your-secure-password \
  POSTGRES_DB=litellm \
  --namespace litellm

kubectl expose deployment postgres \
  --port=5432 \
  --namespace litellm

Production tip: Use a managed Postgres instance or persistent volumes with solid backups. The ephemeral deployment above loses all data the moment the pod restarts. For the full production hardening guide, see Part 4.

Frequently Asked Questions

Do I need Postgres for LiteLLM?

Yes; Postgres is required for rate limiting, spend tracking, and multi-user support. LiteLLM starts without it but critical features remain broken.

Can I update the config without redeploying?

Update proxy_config.yaml on the host path and restart: kubectl rollout restart deployment/litellm-deployment -n litellm.

How do I add a new provider?

Add an entry to model_list in proxy_config.yaml, store the API key as a Kubernetes Secret, and reference it via secretKeyRef.

What if my pod stays in Pending?

Verify the HostPath directory exists: mkdir -p /k3s_storage/litellm. Kubernetes can’t mount a non-existent host path.

Your LiteLLM proxy is now running on Kubernetes. Continue to Part 3: Configuring AI Tools and Providers to connect your coding agents.

If you enjoyed this guide, meet the engineer behind these builds.