AI SRE Agent Production Deployment and Monitoring
This is Part 3 of a 3-part series. Part 1 covered MCP and the 4-layer architecture. Part 2 walked through server code with safety guardrails. Here we deploy to production and explore advanced workflows.
Multi-Tool Workflows
MCP’s real power surfaces when Claude chains multiple tools together. Here is what that looks like in an actual conversation:
User: “The api-gateway deployment in production looks unhealthy. Can you investigate?”
Claude (internal reasoning):
- Call list_pods in production to find pods for api-gateway.
- Call describe_pod on the unhealthy pod.
- Call get_pod_logs with tail_lines=500 to see recent errors.
- Synthesize: “The pod api-gateway-7d9f4b2c-x1z9 is in CrashLoopBackOff. The logs show a connection timeout to the auth-service at 10.0.4.15:8080. The last successful health check was 12 minutes ago. Recommended next step: check if auth-service is reachable from the api-gateway pod.”
The entire diagnostic sequence completes in under five seconds. The human operator receives a precise summary and a recommendation, not a wall of raw log output. This chaining works because MCP gives Claude structured tool definitions with typed parameters that compose naturally.
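To make the chaining concrete, here is a minimal sketch of the same three-step sequence written as a fixed Python script. In the real setup Claude decides which tool to call next; the in-memory `FAKE_PODS` data and the stub tool bodies below are hypothetical stand-ins so the example runs without a cluster.

```python
# Hypothetical in-memory data standing in for the Kubernetes API.
FAKE_PODS = {
    "api-gateway-7d9f4b2c-x1z9": {
        "status": "CrashLoopBackOff",
        "logs": [
            "GET /healthz 200",
            "connection timeout to auth-service at 10.0.4.15:8080",
        ],
    },
    "api-gateway-7d9f4b2c-k2m4": {
        "status": "Running",
        "logs": ["GET /healthz 200"],
    },
}


def list_pods(namespace: str, name_prefix: str) -> list[str]:
    """Stub tool: return pod names that start with the prefix (namespace is ignored here)."""
    return sorted(name for name in FAKE_PODS if name.startswith(name_prefix))


def describe_pod(namespace: str, pod_name: str) -> dict:
    """Stub tool: return a small status summary for one pod."""
    return {"name": pod_name, "status": FAKE_PODS[pod_name]["status"]}


def get_pod_logs(namespace: str, pod_name: str, tail_lines: int = 500) -> list[str]:
    """Stub tool: return at most the last tail_lines log lines."""
    return FAKE_PODS[pod_name]["logs"][-tail_lines:]


def investigate(namespace: str, deployment: str) -> str:
    """Chain the three tools and condense the result into a one-line summary."""
    pods = list_pods(namespace, deployment)
    if not pods:
        return f"No pods found for {deployment} in {namespace}."
    described = [describe_pod(namespace, name) for name in pods]
    unhealthy = [pod for pod in described if pod["status"] != "Running"]
    if not unhealthy:
        return f"All {deployment} pods in {namespace} are Running."
    pod = unhealthy[0]
    logs = get_pod_logs(namespace, pod["name"], tail_lines=500)
    errors = [line for line in logs if "timeout" in line.lower() or "error" in line.lower()]
    detail = errors[-1] if errors else "no obvious error lines"
    return f"Pod {pod['name']} is in {pod['status']}. Last error line: {detail}"


if __name__ == "__main__":
    print(investigate("production", "api-gateway"))
```

Each step consumes the typed output of the previous one, which is the property that lets the model compose the calls without parsing free-form shell output.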
Production Deployment
For production, I package the MCP server as a container and deploy it as a sidecar or standalone service in Kubernetes.
    # Dockerfile
    FROM python:3.12-slim

    WORKDIR /app
    COPY pyproject.toml uv.lock ./
    RUN pip install uv && uv sync --no-dev

    COPY server.py approval_queue.py ./

    ENV K8S_MCP_MODE=readonly
    ENV K8S_MCP_DRY_RUN=false
    ENV PYTHONUNBUFFERED=1

    USER 1000
    CMD ["uv", "run", "python", "server.py"]

Deploy with a ServiceAccount that has explicitly scoped RBAC:
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: k8s-mcp-server
      namespace: monitoring
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: k8s-mcp-readonly
      namespace: default
    rules:
    - apiGroups: [""]
      resources: ["pods", "pods/log", "events"]
      verbs: ["get", "list"]
    - apiGroups: ["apps"]
      resources: ["deployments"]
      verbs: ["get", "list"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: k8s-mcp-readonly
      namespace: default
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: k8s-mcp-readonly
    subjects:
    - kind: ServiceAccount
      name: k8s-mcp-server
      namespace: monitoring

Need the restart capability? Create a separate Role with patch on deployments and bind it only after approval workflows are in place.
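As a rough sketch of what that separate Role could look like, the Python snippet below builds the Role and RoleBinding as plain dictionaries and prints them as JSON, which kubectl accepts alongside YAML. The name `k8s-mcp-restart` is a hypothetical choice, and including `get` next to `patch` is my assumption about what a rolling restart needs; adjust both to your setup.

```python
import json


def restart_role_manifests(namespace: str = "default") -> list[dict]:
    """Build a write-scoped Role and its RoleBinding for the MCP ServiceAccount."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        # "k8s-mcp-restart" is a hypothetical name, not one from the series.
        "metadata": {"name": "k8s-mcp-restart", "namespace": namespace},
        "rules": [
            {
                "apiGroups": ["apps"],
                "resources": ["deployments"],
                # Assumption: a restart patches the pod template, so get + patch.
                "verbs": ["get", "patch"],
            }
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "k8s-mcp-restart", "namespace": namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": "k8s-mcp-restart",
        },
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": "k8s-mcp-server",
                "namespace": "monitoring",
            }
        ],
    }
    return [role, binding]


if __name__ == "__main__":
    # Wrap both objects in a v1 List so they can be reviewed as one document.
    manifest = {"apiVersion": "v1", "kind": "List", "items": restart_role_manifests()}
    print(json.dumps(manifest, indent=2))
```

Keeping the write permission in its own Role means it can be bound or unbound without touching the read-only grant.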
Troubleshooting
| Symptom | Cause | Solution |
|---|---|---|
| kube_config.load_kube_config fails | No kubeconfig in environment | Mount a kubeconfig Secret or use in-cluster config |
| Claude shows “No tools available” | MCP server not starting | Check Claude Desktop logs; verify the JSON config path |
| PermissionError on read ops | Namespace allowlist mismatch | Verify K8S_MCP_NAMESPACES includes the target namespace |
| Logs truncated at 1000 lines | tail_lines hard limit | Use kubectl logs directly for full historical logs |
| Approval queue not working | Directory permissions | Ensure the approval directory is writable by the server process |
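Two of the rows above, the namespace allowlist and the log line limit, are easier to debug with the checks in front of you. The sketch below is not the Part 2 server code; it assumes K8S_MCP_NAMESPACES is a comma-separated list, and the helper names are hypothetical.

```python
import os

MAX_TAIL_LINES = 1000  # hard limit from the troubleshooting table


def allowed_namespaces(env=None) -> set[str]:
    """Parse K8S_MCP_NAMESPACES (assumed comma-separated) into a set."""
    env = os.environ if env is None else env
    raw = env.get("K8S_MCP_NAMESPACES", "")
    return {ns.strip() for ns in raw.split(",") if ns.strip()}


def check_namespace(namespace: str, env=None) -> None:
    """Raise PermissionError when the namespace is not on the allowlist."""
    allowed = allowed_namespaces(env)
    if namespace not in allowed:
        raise PermissionError(
            f"namespace {namespace!r} is not in K8S_MCP_NAMESPACES {sorted(allowed)}"
        )


def clamp_tail_lines(requested: int) -> int:
    """Keep the requested line count between 1 and MAX_TAIL_LINES."""
    return max(1, min(requested, MAX_TAIL_LINES))


if __name__ == "__main__":
    env = {"K8S_MCP_NAMESPACES": "default, production"}
    check_namespace("production", env)  # passes silently
    print(clamp_tail_lines(5000))
```

An error message that prints the parsed allowlist makes a mismatch (a typo, stray whitespace, or an unset variable) visible right away.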
FAQ
What is an MCP server, and why do I need one?
An MCP server is a lightweight service that exposes tools to an AI assistant through the Model Context Protocol. You need one because it gives the AI structured, validated access to your infrastructure instead of asking it to generate and execute raw shell commands.
Is FastMCP the only way to build MCP servers?
Not at all. Anthropic ships an official TypeScript SDK, and community implementations exist for Go and Rust. I use FastMCP (Python) because the Kubernetes Python client is mature, and most SRE tooling already lives in Python.
Can this agent actually break my cluster?
Only if you configure it that way. The default mode is readonly, namespace-restricted, and every write operation can be wrapped in an approval queue. Treat it like any other automation: start with least privilege and expand deliberately.
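A minimal sketch of that gating, assuming the K8S_MCP_MODE and K8S_MCP_DRY_RUN variables from the Dockerfile: the `readwrite` value, the `write_tool` decorator, and the `restart_deployment` stub are hypothetical names for illustration, not the guardrail code from Part 2.

```python
import functools
import os


class ReadOnlyModeError(PermissionError):
    """Raised when a write tool is called while writes are not enabled."""


def write_tool(func):
    """Block the wrapped tool unless K8S_MCP_MODE is explicitly 'readwrite'."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Anything other than the (hypothetical) 'readwrite' value is treated as readonly.
        mode = os.environ.get("K8S_MCP_MODE", "readonly")
        if mode != "readwrite":
            raise ReadOnlyModeError(f"{func.__name__} is blocked in mode {mode!r}")
        # In dry-run mode, report what would happen instead of doing it.
        if os.environ.get("K8S_MCP_DRY_RUN", "false").lower() == "true":
            return {"dry_run": True, "tool": func.__name__, "args": args, "kwargs": kwargs}
        return func(*args, **kwargs)

    return wrapper


@write_tool
def restart_deployment(namespace: str, name: str) -> dict:
    """Stub write operation; a real tool would patch the deployment here."""
    return {"restarted": f"{namespace}/{name}"}
```

Failing closed on an unset or unrecognized mode keeps a misconfigured container from silently gaining write access.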
How does this compare to kubectl-ai or k8sgpt?
kubectl-ai generates kubectl commands from natural language but still needs you to execute them. k8sgpt diagnoses clusters using static analysis and AI summarization, but never executes commands. This MCP approach fuses both: a natural-language interface with validated tool execution and explicit safety gates.
What if Claude hallucinates a pod name?
The MCP server receives the exact pod name from Claude and queries the Kubernetes API. If the pod does not exist, the API returns a 404, which the server surfaces back to Claude. The agent can then ask the user for clarification. The tool layer validates reality; the AI does not need to hallucinate correctly.
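Here is a small sketch of how a tool handler might surface that 404. The `NotFoundError` class and the `KNOWN_PODS` lookup are hypothetical stand-ins so the example runs offline; with the Kubernetes Python client you would typically catch its `ApiException` and inspect the `status` attribute instead.

```python
class NotFoundError(Exception):
    """Hypothetical stand-in for the API client's 404 error."""

    status = 404


# Hypothetical cluster state keyed by (namespace, pod name).
KNOWN_PODS = {("production", "api-gateway-7d9f4b2c-x1z9"): "CrashLoopBackOff"}


def read_pod(namespace: str, name: str) -> dict:
    """Simulate an API lookup that fails for pods that do not exist."""
    try:
        return {"name": name, "status": KNOWN_PODS[(namespace, name)]}
    except KeyError:
        raise NotFoundError(f'pods "{name}" not found') from None


def describe_pod(namespace: str, name: str) -> dict:
    """Tool handler: return the pod, or a structured error the model can act on."""
    try:
        return {"ok": True, "pod": read_pod(namespace, name)}
    except NotFoundError as exc:
        return {
            "ok": False,
            "status": exc.status,
            "error": str(exc),
            "hint": "Call list_pods to get valid pod names, or ask the user.",
        }
```

Returning the failure as data rather than crashing gives the model something concrete to recover from.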
Can I use this with other AI clients besides Claude?
MCP is an open protocol, and client support keeps expanding. As of mid-2025, Claude Desktop has the most mature MCP implementation. OpenAI’s Agents SDK has announced MCP compatibility, and several open-source clients are adding support.
Should I run the MCP server locally or in the cluster?
For personal use, running locally works. For team production, run it inside the cluster with a ServiceAccount and RBAC. This removes kubeconfig management and keeps network traffic internal.
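One way to let the same server run in both places is to detect the environment at startup. The sketch below uses only the standard library and checks the two signals an in-cluster pod normally has: the KUBERNETES_SERVICE_HOST variable and the mounted ServiceAccount token. The function name is hypothetical; the comments note which Kubernetes Python client call would follow each branch.

```python
import os

# Default mount path for a pod's ServiceAccount token.
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def detect_config_mode(env=None, token_path: str = SA_TOKEN_PATH) -> str:
    """Return 'in-cluster' when running inside a pod, otherwise 'kubeconfig'."""
    env = os.environ if env is None else env
    has_service_host = bool(env.get("KUBERNETES_SERVICE_HOST"))
    has_token = os.path.isfile(token_path)
    if has_service_host and has_token:
        # Next step with the Kubernetes client: config.load_incluster_config()
        return "in-cluster"
    # Next step with the Kubernetes client: config.load_kube_config()
    return "kubeconfig"


if __name__ == "__main__":
    print(detect_config_mode())
```

Logging the detected mode at startup also helps with the "load_kube_config fails" row in the troubleshooting table, since it shows which path the server took.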
Next Steps
You now have a foundational AI SRE agent that diagnoses Kubernetes issues through natural language. Here is where to go next:
- Explore autonomous remediation workflows where we extend this pattern to safe, self-healing infrastructure.
- Connect your MCP server to a self-hosted Ollama deployment if you want the reasoning layer to stay entirely on-premise.
- Read the Agentic DevOps introduction to understand how agentic systems fit into the broader DevOps transformation.
If you build something interesting with this, share your workflow. I am especially curious about creative spins on the approval queue pattern.