LiteLLM Kubernetes: Configure AI Tools and Providers
Part 3 of 4. In Part 2 we deployed LiteLLM to Kubernetes. Now we connect AI tools and explore provider configuration. Continue to Part 4: Production Hardening and Troubleshooting.
LiteLLM is an open-source AI proxy that normalizes 100+ provider APIs behind a single OpenAI-compatible endpoint. Your CLI tools, IDEs, and agents all speak the same protocol. LiteLLM translates to each provider’s native format. For a primer on why this matters, revisit Part 1: Architecture Overview.
Step 4: Configure Your AI Tools
Here’s where the architecture pays off. Instead of stuffing API keys into every tool, you point everything at LiteLLM.
OpenCode Configuration
Here’s my OpenCode config (opencode.json) that connects to LiteLLM:
```json
{
  "provider": {
    "litellm": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "LiteLLM",
      "options": {
        "baseURL": "http://litellm.your-domain.com/v1",
        "apiKey": "sk-your-litellm-master-key"
      },
      "models": {
        "kimi-code": { "name": "Kimi Code" },
        "nvidia-llama": { "name": "NVIDIA Llama" },
        "openrouter-free": { "name": "OpenRouter Free" },
        "openrouter-free-trending": { "name": "OpenRouter Free Trending" }
      }
    }
  }
}
```
Note: OpenCode doesn’t autoload models from LiteLLM yet. I define them manually in the models section. If you crack that, tell me how!
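Until OpenCode can autoload models, a small helper can at least generate the models object from LiteLLM’s /v1/models endpoint so you can paste it in. This doesn’t make OpenCode autoload anything; it just saves typing. It’s a sketch: LITELLM_URL, LITELLM_KEY, and both function names are placeholders I’m introducing, the JSON is parsed naively with grep/sed/awk rather than a real JSON parser (swap in jq if you have it), and each model ID doubles as its display name.

```shell
#!/usr/bin/env bash
# Sketch: turn LiteLLM's /v1/models response into an OpenCode "models" object.
# LITELLM_URL / LITELLM_KEY are placeholder names; set them to your endpoint and master key.
LITELLM_URL="${LITELLM_URL:-http://litellm.your-domain.com}"
LITELLM_KEY="${LITELLM_KEY:-sk-your-litellm-master-key}"

# Reads a /v1/models JSON body on stdin and prints {"id": { "name": "id" }, ...}.
# Naive text parsing: assumes each "id" key and its value sit on the same line.
models_to_opencode() {
  grep -o '"id" *: *"[^"]*"' |
    sed 's/.*"\([^"]*\)"$/\1/' |
    awk 'BEGIN { printf "{" }
         { printf "%s\"%s\": { \"name\": \"%s\" }", (NR == 1 ? "" : ", "), $0, $0 }
         END { print "}" }'
}

# Fetches the live model list from LiteLLM and converts it.
fetch_opencode_models() {
  curl -fsS "$LITELLM_URL/v1/models" \
    -H "Authorization: Bearer $LITELLM_KEY" | models_to_opencode
}
```

Run fetch_opencode_models, paste the output into the models section, then give each entry a friendlier name by hand.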
Cursor, Continue.dev, or Any OpenAI-Compatible Tool
Any tool that supports custom OpenAI-compatible endpoints works with LiteLLM. The pattern is always the same: set the base URL to your LiteLLM endpoint and use your master key as the API key. For more Kubernetes AI patterns, see the deploy-ollama-kubernetes post.
| Tool | Setting | Value |
|---|---|---|
| Cursor | OpenAI API Key | sk-your-litellm-master-key |
| Cursor | OpenAI Base URL | http://<your-node>:<node-port>/v1 |
| Continue.dev | apiBase | http://<your-node>:<node-port>/v1 |
| CLI tools | OPENAI_API_KEY | sk-your-litellm-master-key |
| CLI tools | OPENAI_BASE_URL | http://<your-node>:<node-port>/v1 |
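For CLI tools, the last two table rows become a pair of exports in your shell profile. A sketch, with the same placeholder values as the table: whether a particular CLI honors OPENAI_BASE_URL depends on the tool (the official OpenAI SDKs read it), so check each tool’s docs.

```shell
# Point OpenAI-compatible CLIs at LiteLLM instead of api.openai.com.
# Placeholder values: replace with your node address, NodePort, and master key.
export OPENAI_BASE_URL="http://<your-node>:<node-port>/v1"
export OPENAI_API_KEY="sk-your-litellm-master-key"

# Confirm the base URL is set in this shell (avoids echoing the key).
printenv OPENAI_BASE_URL
```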
Accessing via Tailscale
I expose LiteLLM through Tailscale for secure access from anywhere:
```
http://litellm.your-domain.com/v1
```
This means my laptop, phone, and cloud VMs can all reach the same AI gateway without opening firewall ports or maintaining a VPN server.
Verification & Testing
Let’s validate the setup.
1. Test LiteLLM Health
```shell
curl http://<your-node-ip>:<node-port>/health/liveliness
# Expected: HTTP 200 (the exact response body varies by LiteLLM version)
```
2. List Available Models
```shell
curl http://<your-node-ip>:<node-port>/v1/models \
  -H "Authorization: Bearer sk-your-litellm-master-key"
```
You should see your configured models: kimi-code, openrouter-free, nvidia-llama, and openrouter-free-trending.
3. Send a Test Chat Completion
```shell
curl http://<your-node-ip>:<node-port>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-litellm-master-key" \
  -d '{
    "model": "kimi-code",
    "messages": [{"role": "user", "content": "Write a Python function to reverse a string"}]
  }'
```
If you get a valid response, your gateway is working.
4. Test Provider Switching
Try the same request with openrouter-free:
```shell
curl http://<your-node-ip>:<node-port>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-litellm-master-key" \
  -d '{
    "model": "openrouter-free",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
No API key changes. No config updates. Just swap the model name. That’s the entire point of LiteLLM: clients stay frozen while you control routing server-side.
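The checks above can be rolled into one reusable shell function, handy to rerun after every config change. A sketch: LITELLM_URL, LITELLM_KEY, and litellm_smoke_test are names I’m introducing here, and the model name is interpolated into the JSON without escaping, so stick to simple aliases like the ones in this post.

```shell
#!/usr/bin/env bash
# Sketch: rerun the health, model-list, and chat checks against any model.
# LITELLM_URL / LITELLM_KEY are placeholder names; export real values first.
LITELLM_URL="${LITELLM_URL:-http://<your-node-ip>:<node-port>}"
LITELLM_KEY="${LITELLM_KEY:-sk-your-litellm-master-key}"

litellm_smoke_test() {
  local model="${1:-kimi-code}"

  # 1. Liveness: -f makes curl return non-zero on HTTP 4xx/5xx.
  curl -fsS "$LITELLM_URL/health/liveliness" >/dev/null \
    || { echo "health check failed" >&2; return 1; }

  # 2. Model list: confirms the master key is accepted.
  curl -fsS "$LITELLM_URL/v1/models" \
    -H "Authorization: Bearer $LITELLM_KEY" >/dev/null \
    || { echo "model list failed" >&2; return 1; }

  # 3. Chat completion against the requested model; prints the raw JSON reply.
  curl -fsS "$LITELLM_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $LITELLM_KEY" \
    -d "{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello!\"}]}" \
    || { echo "completion with $model failed" >&2; return 1; }
  echo
}
```

Usage: litellm_smoke_test openrouter-free runs all three checks against that model and stops at the first failure.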
Provider Configuration Deep Dive
Kimi Code
Kimi Code is Moonshot AI’s coding-specialized model. Per Kimi’s API documentation, it requires specific client headers for compatibility.
| Parameter | Value | Why |
|---|---|---|
| model | openai/kimi-for-coding | LiteLLM’s provider mapping |
| api_base | https://api.kimi.com/coding/v1 | Kimi’s coding API endpoint |
| User-Agent | claude-code/0.1.0 | Required for API compatibility |
| X-Kimi-Client | Kimi-Code | Identifies client type |
Skip the custom headers and you’ll face vague authentication errors. It took me a few minutes to debug this: Kimi’s API is strict about client identification, and the User-Agent spoof is a sparsely documented compatibility workaround.
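In proxy_config.yaml, those parameters collapse into a single model_list entry. The heredoc below writes a sketch of that entry to a scratch file you can copy from. Assumptions: kimi-code.yaml and KIMI_API_KEY are hypothetical names, and the headers go through LiteLLM’s extra_headers parameter, so confirm your LiteLLM release forwards it from litellm_params.

```shell
# Sketch: write the kimi-code entry to a scratch file (hypothetical name),
# then merge the list item under model_list in proxy_config.yaml.
cat > kimi-code.yaml <<'EOF'
model_list:
  - model_name: kimi-code                    # alias clients send as "model"
    litellm_params:
      model: openai/kimi-for-coding          # openai/ = generic OpenAI-compatible route
      api_base: https://api.kimi.com/coding/v1
      api_key: os.environ/KIMI_API_KEY       # hypothetical env var name
      extra_headers:
        User-Agent: claude-code/0.1.0        # compatibility workaround
        X-Kimi-Client: Kimi-Code             # identifies the client type
EOF
```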
OpenRouter
OpenRouter aggregates free and paid models from dozens of providers. Their free tier unlocks DeepSeek, Qwen, Mistral and more, all behind reasonable rate limits.
| Parameter | Value | Why |
|---|---|---|
| model | openai/openrouter/free | Routes to free-tier models |
| HTTP-Referer | Your domain | Required for OpenRouter ranking |
| X-Title | Your app name | Shows up in OpenRouter analytics |
OpenRouter uses the HTTP-Referer header to attribute requests to your app (its docs list both headers as optional, but they’re worth setting). Supply your real domain or GitHub profile. The X-Title header lets you identify your traffic in OpenRouter’s dashboard.
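The same pattern applies to OpenRouter. A sketch under stated assumptions: the api_base https://openrouter.ai/api/v1 is my addition (the openai/ prefix needs an explicit OpenAI-compatible endpoint), OPENROUTER_API_KEY and openrouter-free.yaml are placeholder names, and the Referer and Title values stand in for your own domain and app name.

```shell
# Sketch: OpenRouter entry with attribution headers, written to a scratch file.
cat > openrouter-free.yaml <<'EOF'
model_list:
  - model_name: openrouter-free
    litellm_params:
      model: openai/openrouter/free              # sent upstream as "openrouter/free"
      api_base: https://openrouter.ai/api/v1     # assumed OpenAI-compatible endpoint
      api_key: os.environ/OPENROUTER_API_KEY     # placeholder env var name
      extra_headers:
        HTTP-Referer: https://your-domain.com    # your real domain or GitHub profile
        X-Title: LiteLLM Gateway                 # placeholder app name
EOF
```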
NVIDIA NIM
NVIDIA’s NIM (NVIDIA Inference Microservices) delivers optimized inference for Llama models. Their free tier is generous enough for serious experimentation.
| Parameter | Value | Why |
|---|---|---|
| model | nvidia_nim/meta/llama-4-maverick-17b-128e-instruct | Specific NIM model |
| api_key | os.environ/NVIDIA_NIM_API_KEY | Pulled from environment |
NVIDIA NIM doesn’t require custom headers: just a valid API key from build.nvidia.com. It’s the simplest provider to configure.
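With no headers to add, the NIM entry is just the two parameters from the table. As with the other providers, this is a sketch written to a scratch file (nvidia-llama.yaml is a hypothetical name), assuming NVIDIA_NIM_API_KEY is injected into the LiteLLM pod’s environment.

```shell
# Sketch: NVIDIA NIM entry; no extra_headers needed, only the API key.
cat > nvidia-llama.yaml <<'EOF'
model_list:
  - model_name: nvidia-llama
    litellm_params:
      model: nvidia_nim/meta/llama-4-maverick-17b-128e-instruct
      api_key: os.environ/NVIDIA_NIM_API_KEY   # key from build.nvidia.com
EOF
```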
Frequently Asked Questions
Can I use any OpenAI-compatible tool with LiteLLM?
Yes. Set the base URL to your LiteLLM endpoint and the API key to your master key. Cursor, Continue.dev, OpenCode, and CLI tools all work.
Why does Kimi Code need custom headers?
Kimi’s API requires a specific User-Agent and X-Kimi-Client header for compatibility. Without them, you’ll get authentication errors.
How do I add a new model?
Add an entry to model_list in your proxy_config.yaml, define the provider routing, and restart the deployment.
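If proxy_config.yaml is mounted from a ConfigMap, the update-and-restart step might look like the function below. A sketch: the namespace, ConfigMap, and Deployment defaults are hypothetical (use the names from your Part 2 manifests), and it assumes LiteLLM only reads its config file at startup, which is why it rolls the pods.

```shell
#!/usr/bin/env bash
# Sketch: push an edited proxy_config.yaml into the cluster and restart LiteLLM.
# Defaults are hypothetical; pass your own namespace, ConfigMap, and Deployment names.
reload_litellm_config() {
  local ns="${1:-litellm}" configmap="${2:-litellm-config}" deploy="${3:-litellm}"

  # Regenerate the ConfigMap from the local file and apply it in place.
  kubectl create configmap "$configmap" --from-file=proxy_config.yaml \
    --namespace "$ns" --dry-run=client -o yaml | kubectl apply -f - || return 1

  # Roll the pods so they pick up the new config, then wait for the rollout.
  kubectl rollout restart "deployment/$deploy" --namespace "$ns" || return 1
  kubectl rollout status "deployment/$deploy" --namespace "$ns" --timeout=120s
}
```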
What if a provider is down?
LiteLLM doesn’t switch providers on its own: you define fallback routing in proxy_config.yaml, and then it automatically retries a failed request on the next model in the chain.
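A fallback chain maps a model alias to the aliases to try next. A sketch, assuming the three aliases from this post and the litellm_settings placement shown in LiteLLM’s reliability docs (some releases also accept fallbacks under router_settings, so check yours); fallbacks.yaml is a scratch file name I made up.

```shell
# Sketch: fallback chain for kimi-code, written to a scratch file;
# merge litellm_settings into proxy_config.yaml and reload the deployment.
cat > fallbacks.yaml <<'EOF'
litellm_settings:
  # If kimi-code fails, try openrouter-free, then nvidia-llama.
  fallbacks: [{"kimi-code": ["openrouter-free", "nvidia-llama"]}]
EOF
```

To exercise the chain without waiting for a real outage, LiteLLM’s docs describe adding "mock_testing_fallbacks": true to a request body; confirm your version supports it before relying on it.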
All your tools are now connected through LiteLLM. Continue to Part 4: Production Hardening and Troubleshooting to secure and monitor the setup.