LiteLLM on Kubernetes: Configure AI Tools and Providers

Part 3 of 4. In Part 2 we deployed LiteLLM to Kubernetes. Now we connect AI tools and explore provider configuration. Continue to Part 4: Production Hardening and Troubleshooting.

LiteLLM is an open-source AI proxy that normalizes 100+ provider APIs behind a single OpenAI-compatible endpoint. Your CLI tools, IDEs, and agents all speak the same protocol. LiteLLM translates to each provider’s native format. For a primer on why this matters, revisit Part 1: Architecture Overview.

Step 4: Configure Your AI Tools

Here’s where the architecture pays off. Instead of stuffing API keys into every tool, you point everything at LiteLLM.

OpenCode Configuration

Here’s my OpenCode settings.json that connects to LiteLLM:

{
  "provider": {
    "litellm": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "LiteLLM",
      "options": {
        "baseURL": "http://litellm.your-domain.com/v1",
        "apiKey": "sk-your-litellm-master-key"
      },
      "models": {
        "kimi-code": {
          "name": "Kimi Code"
        },
        "nvidia-llama": {
          "name": "NVIDIA Llama"
        },
        "openrouter-free": {
          "name": "OpenRouter Free"
        },
        "openrouter-free-trending": {
          "name": "OpenRouter Free Trending"
        }
      }
    }
  }
}

Note: OpenCode doesn’t autoload models from LiteLLM yet. I define them manually in the models section. If you crack that, tell me how!

Cursor, Continue.dev, or Any OpenAI-Compatible Tool

Any tool supporting custom OpenAI-compatible endpoints works with LiteLLM. The pattern stays identical: set the base URL to your LiteLLM endpoint and use your master key as the API key. For more Kubernetes AI patterns, see deploy-ollama-kubernetes.

Tool	Setting	Value
Cursor	OpenAI API Key	`sk-your-litellm-master-key`
Cursor	OpenAI Base URL	`http://<your-node>:<node-port>/v1`
Continue.dev	`apiBase`	`http://<your-node>:<node-port>/v1`
CLI tools	`OPENAI_API_KEY`	`sk-your-litellm-master-key`
CLI tools	`OPENAI_BASE_URL`	`http://<your-node>:<node-port>/v1`
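
For a script rather than an editor, the same two settings are all a client needs. Here is a minimal sketch, assuming Python 3 and only the standard library. The `build_chat_request` and `send` helpers and the `localhost:4000` fallback URL are illustrative placeholders, not part of LiteLLM:

```python
import json
import os
import urllib.request


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request aimed at LiteLLM."""
    # Placeholder defaults; replace them with your own endpoint and key.
    base_url = os.environ.get("OPENAI_BASE_URL", "http://localhost:4000/v1")
    api_key = os.environ.get("OPENAI_API_KEY", "sk-your-litellm-master-key")
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `send(build_chat_request("kimi-code", "Hello!"))` should return the same JSON shape as the curl tests in the verification section.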

Accessing via Tailscale

I expose LiteLLM through Tailscale for secure access from anywhere:

http://litellm.your-domain.com/v1

This means my laptop, phone, and cloud VMs can all reach the same AI gateway without opening public firewall ports or running a separate VPN server.

Verification & Testing

Let’s validate the setup.

1. Test LiteLLM Health

curl http://<your-node-ip>:<node-port>/health/liveliness
# Expected: "I'm alive!"

2. List Available Models

curl http://<your-node-ip>:<node-port>/v1/models \
  -H "Authorization: Bearer sk-your-litellm-master-key"

You should see your configured models: kimi-code, openrouter-free, nvidia-llama, and openrouter-free-trending.
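
If you would rather check the list programmatically than eyeball it, here is a small sketch, assuming the response follows the OpenAI list format (a `data` array of objects with an `id` field). The `missing_models` helper and the sample body are hypothetical:

```python
import json

# Model names taken from the OpenCode config earlier in this post.
EXPECTED_MODELS = frozenset(
    {"kimi-code", "nvidia-llama", "openrouter-free", "openrouter-free-trending"}
)


def missing_models(models_json: str, expected=EXPECTED_MODELS) -> set:
    """Return expected model names absent from a /v1/models response body."""
    payload = json.loads(models_json)
    served = {entry["id"] for entry in payload.get("data", [])}
    return set(expected) - served


# Hypothetical response body, trimmed to the one field this check reads.
sample = '{"object": "list", "data": [{"id": "kimi-code"}, {"id": "nvidia-llama"}]}'
print(sorted(missing_models(sample)))
# prints ['openrouter-free', 'openrouter-free-trending']
```

An empty result means every model you configured is being served.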

3. Send a Test Chat Completion

curl http://<your-node-ip>:<node-port>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-litellm-master-key" \
  -d '{
    "model": "kimi-code",
    "messages": [{"role": "user", "content": "Write a Python function to reverse a string"}]
  }'

If you get a valid response, your gateway is working.

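4. Test Provider Switching

Send the same request to openrouter-free. This is a manual switch rather than automatic fallback, but it confirms a second provider answers through the same endpoint: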

curl http://<your-node-ip>:<node-port>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-litellm-master-key" \
  -d '{
    "model": "openrouter-free",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

No API key changes. No config updates. Just swap the model name. That’s the entire point of LiteLLM: clients stay frozen while you control routing server-side.
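
To run that swap across every model in one go, something like the following works as a rough smoke test. It is a sketch, not a LiteLLM feature: `ask`, `smoke_test`, and the `BASE_URL` and `API_KEY` values are placeholders I made up for illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4000/v1"   # placeholder: your LiteLLM endpoint
API_KEY = "sk-your-litellm-master-key"  # placeholder: your master key


def ask(model: str, prompt: str) -> str:
    """POST one chat completion to LiteLLM and return the reply text."""
    request = urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(
            {"model": model, "messages": [{"role": "user", "content": prompt}]}
        ).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]


def smoke_test(models, ask_fn=ask) -> dict:
    """Send the same prompt to every model; map each name to reply or error."""
    results = {}
    for model in models:
        try:
            results[model] = ask_fn(model, "Hello!")
        except Exception as error:  # keep going so one bad provider is visible
            results[model] = f"ERROR: {error}"
    return results
```

Only the model name changes between iterations; the URL, key, and request shape stay fixed, which is the property the curl tests demonstrate by hand.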

Provider Configuration Deep Dive

Kimi Code

Kimi Code is Moonshot AI’s coding-specialized model. Per Kimi’s API documentation, it requires specific client headers for compatibility.

Parameter	Value	Why
`model`	`openai/kimi-for-coding`	The `openai/` prefix tells LiteLLM to treat the endpoint as OpenAI-compatible
`api_base`	`https://api.kimi.com/coding/v1`	Kimi’s coding API endpoint
`User-Agent`	`claude-code/0.1.0`	Required for API compatibility
`X-Kimi-Client`	`Kimi-Code`	Identifies client type

Skip the custom headers and you’ll face vague authentication errors. Took me a few minutes to debug: Kimi’s API is strict about client identification. The User-Agent spoof is a compatibility workaround with sparse documentation.
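
To show how the table maps onto a model_list entry, here is a sketch that assembles one as a Python dict and prints it. Two things here are my assumptions rather than anything the table states: that the headers are attached through an `extra_headers` key under `litellm_params`, and that the key is stored in an environment variable named `KIMI_API_KEY`. Check both against your own proxy_config.yaml from Part 2.

```python
import json


def kimi_model_entry() -> dict:
    """Build a model_list entry for Kimi Code from the table's values."""
    return {
        "model_name": "kimi-code",  # the name clients request
        "litellm_params": {
            "model": "openai/kimi-for-coding",
            "api_base": "https://api.kimi.com/coding/v1",
            # Assumption: the key lives in an env var with this name.
            "api_key": "os.environ/KIMI_API_KEY",
            # Assumption: extra_headers is how the headers are attached.
            "extra_headers": {
                "User-Agent": "claude-code/0.1.0",
                "X-Kimi-Client": "Kimi-Code",
            },
        },
    }


# Printed as JSON, which YAML parsers generally accept as-is.
print(json.dumps({"model_list": [kimi_model_entry()]}, indent=2))
```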

OpenRouter

OpenRouter aggregates free and paid models from dozens of providers. Their free tier unlocks DeepSeek, Qwen, Mistral and more, all behind reasonable rate limits.

Parameter	Value	Why
`model`	`openai/openrouter/free`	Routes to free-tier models
`HTTP-Referer`	Your domain	Required for OpenRouter ranking
`X-Title`	Your app name	Shows up in OpenRouter analytics

Treat the HTTP-Referer header as mandatory in practice. OpenRouter’s API documents it as optional, but it is what attributes requests to your app. Supply your real domain or GitHub profile. The X-Title header lets you identify traffic in OpenRouter’s dashboard.

NVIDIA NIM

NVIDIA’s NIM (NVIDIA Inference Microservices) delivers optimized inference for Llama models. Their free tier is generous enough for serious experimentation.

Parameter	Value	Why
`model`	`nvidia_nim/meta/llama-4-maverick-17b-128e-instruct`	Specific NIM model
`api_key`	`os.environ/NVIDIA_NIM_API_KEY`	Pulled from environment

NVIDIA NIM doesn’t require custom headers: just a valid API key from build.nvidia.com. It’s the simplest provider to configure.

Frequently Asked Questions

Can I use any OpenAI-compatible tool with LiteLLM?

Yes. Set the base URL to your LiteLLM endpoint and the API key to your master key. Cursor, Continue.dev, OpenCode, and CLI tools all work.

Why does Kimi Code need custom headers?

Kimi’s API requires a specific User-Agent and X-Kimi-Client header for compatibility. Without them, you’ll get authentication errors.

How do I add a new model?

Add an entry to model_list in your proxy_config.yaml, define the provider routing, and restart the deployment.

What if a provider is down?

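You can configure fallback routing in proxy_config.yaml. Once fallbacks are defined, LiteLLM retries a failed request against the next model in the list. Without them, the error goes straight back to the client.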
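
A typo in a fallback target is easy to miss until the primary provider actually fails, so it is worth checking the names up front. The sketch below assumes fallbacks are written as a list of `{primary: [backups]}` mappings, which is how I understand LiteLLM's routing settings (placed under `litellm_settings` or `router_settings` depending on version, so check yours). The `unknown_fallback_targets` helper and the example rules are hypothetical.

```python
def unknown_fallback_targets(model_names, fallbacks) -> set:
    """Return names used in fallbacks that are not defined in model_list."""
    known = set(model_names)
    referenced = set()
    for rule in fallbacks:
        for primary, backups in rule.items():
            referenced.add(primary)
            referenced.update(backups)
    return referenced - known


MODEL_NAMES = ["kimi-code", "nvidia-llama", "openrouter-free", "openrouter-free-trending"]

# Assumed shape: a list of {primary: [backups, tried in order]} mappings.
FALLBACKS = [
    {"kimi-code": ["openrouter-free", "nvidia-llama"]},
    {"nvidia-llama": ["openrouter-free-trending"]},
]

print(unknown_fallback_targets(MODEL_NAMES, FALLBACKS))  # prints set()
```

Any name in the result is a fallback that points at a model the proxy does not serve.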

All your tools are now connected through LiteLLM. Continue to Part 4: Production Hardening and Troubleshooting to secure and monitor the setup.
