AI SRE Agent: Production Deployment and Monitoring

This is Part 3 of a 3-part series. Part 1 covered MCP and the 4-layer architecture. Part 2 walked through server code with safety guardrails. Here we deploy to production and explore advanced workflows.

Multi-Tool Workflows

MCP’s real power surfaces when Claude chains multiple tools together. Here is what that looks like in an actual conversation:

User: “The api-gateway deployment in production looks unhealthy. Can you investigate?”

Claude (internal reasoning):

Call list_pods in production to find pods for api-gateway.
Call describe_pod on the unhealthy pod.
Call get_pod_logs with tail_lines=500 to see recent errors.
Synthesize: “The pod api-gateway-7d9f4b2c-x1z9 is in CrashLoopBackOff. The logs show a connection timeout to the auth-service at 10.0.4.15:8080. The last successful health check was 12 minutes ago. Recommended next step: check if auth-service is reachable from the api-gateway pod.”

The entire diagnostic sequence completes in under five seconds. The human operator receives a precise summary and a recommendation, not a wall of raw log output. This chaining works because MCP gives Claude structured tool definitions with typed parameters, so the output of one call (a pod name) feeds directly into the input of the next.
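To make the data flow of that chain visible, here is a minimal sketch in plain Python. The three functions `list_pods`, `describe_pod` and `get_pod_logs` are hypothetical stand-ins that return canned data rather than calling the Kubernetes API. In the real workflow Claude decides the order of calls; this sketch hardcodes the order so you can see how each step's output selects the next step's input.

```python
"""Sketch of the list_pods -> describe_pod -> get_pod_logs chain.

All tool functions below are hypothetical stand-ins with canned data;
a real MCP server would query the Kubernetes API instead.
"""
from typing import Dict, List

# Hypothetical cluster state standing in for the Kubernetes API.
FAKE_PODS: List[Dict[str, str]] = [
    {"name": "api-gateway-7d9f4b2c-x1z9", "status": "CrashLoopBackOff"},
    {"name": "api-gateway-7d9f4b2c-a2b3", "status": "Running"},
    {"name": "billing-5c6d7e8f-q4r5", "status": "Running"},
]
FAKE_LOGS: Dict[str, List[str]] = {
    "api-gateway-7d9f4b2c-x1z9": [
        "INFO starting server",
        "ERROR connection timeout to auth-service at 10.0.4.15:8080",
    ],
}


def list_pods(namespace: str) -> List[Dict[str, str]]:
    # The canned stand-in ignores the namespace argument.
    return list(FAKE_PODS)


def describe_pod(name: str, namespace: str) -> Dict[str, str]:
    return next(p for p in FAKE_PODS if p["name"] == name)


def get_pod_logs(name: str, namespace: str, tail_lines: int = 500) -> List[str]:
    # Return at most the last `tail_lines` lines, like a tail on the log.
    return FAKE_LOGS.get(name, [])[-tail_lines:]


def investigate(deployment: str, namespace: str) -> List[str]:
    """Chain the tools: pod names from step 1 drive steps 2 and 3."""
    findings = []
    pods = [p for p in list_pods(namespace) if p["name"].startswith(deployment + "-")]
    for pod in pods:
        detail = describe_pod(pod["name"], namespace)
        if detail["status"] == "Running":
            continue  # healthy pods need no further tool calls
        errors = [line for line in get_pod_logs(pod["name"], namespace, tail_lines=500)
                  if "ERROR" in line]
        last_error = errors[-1] if errors else "none found"
        findings.append(f"{pod['name']} is {detail['status']}; last error: {last_error}")
    return findings


if __name__ == "__main__":
    for finding in investigate("api-gateway", "production"):
        print(finding)
```

Only the unhealthy pod triggers the more expensive log call. That is the same economy the model applies when it reasons over real tool output.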

Production Deployment

For production, I package the MCP server as a container and deploy it as a sidecar or standalone service in Kubernetes.

# Dockerfile
FROM python:3.12-slim

WORKDIR /app
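COPY pyproject.toml uv.lock ./
# Install the locked dependencies only; the server code is copied in the next layer
RUN pip install --no-cache-dir uv && uv sync --frozen --no-dev --no-install-project

COPY server.py approval_queue.py ./

ENV K8S_MCP_MODE=readonly
ENV K8S_MCP_DRY_RUN=false
ENV PYTHONUNBUFFERED=1
# Put the project virtualenv first on PATH so the non-root user runs it without re-syncing
ENV PATH="/app/.venv/bin:$PATH"

USER 1000
CMD ["python", "server.py"]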
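The Dockerfile configures the server entirely through environment variables. Below is a minimal sketch of how a server might read them, assuming the following. `readwrite` is a hypothetical name for the write-enabled mode, since only `readonly` appears above. `K8S_MCP_NAMESPACES` (mentioned in the troubleshooting table) is assumed to be a comma-separated list that defaults to `default`. `load_settings` and `Settings` are illustrative names, not the Part 2 server's API.

```python
"""Hypothetical settings loader for the K8S_MCP_* variables set in the Dockerfile."""
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

# Assumption: "readwrite" is a hypothetical name for the write-enabled mode.
VALID_MODES = {"readonly", "readwrite"}
TRUTHY = {"1", "true", "yes"}


@dataclass(frozen=True)
class Settings:
    mode: str
    dry_run: bool
    namespaces: Tuple[str, ...]

    @property
    def read_only(self) -> bool:
        return self.mode == "readonly"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    env = os.environ if env is None else env
    mode = env.get("K8S_MCP_MODE", "readonly").strip().lower()
    if mode not in VALID_MODES:
        # Fail closed: an unknown mode is a configuration error, not a write grant.
        raise ValueError(f"unsupported K8S_MCP_MODE: {mode!r}")
    dry_run = env.get("K8S_MCP_DRY_RUN", "false").strip().lower() in TRUTHY
    # Assumption: the namespace allowlist is comma-separated.
    raw = env.get("K8S_MCP_NAMESPACES", "default")
    namespaces = tuple(ns.strip() for ns in raw.split(",") if ns.strip())
    return Settings(mode=mode, dry_run=dry_run, namespaces=namespaces)


if __name__ == "__main__":
    print(load_settings())
```

Rejecting unknown modes, rather than treating them as write-enabled, keeps a typo in a manifest from silently widening the agent's permissions.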

Deploy with a ServiceAccount that has explicitly scoped RBAC:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: k8s-mcp-server
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: k8s-mcp-readonly
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log", "events"]
  verbs: ["get", "list"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: k8s-mcp-readonly
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: k8s-mcp-readonly
subjects:
- kind: ServiceAccount
  name: k8s-mcp-server
  namespace: monitoring
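Before deploying, it can help to check that the Role actually grants what each tool needs. The following minimal sketch transcribes the Role's rules by hand and compares them with a hypothetical tool-to-permission map. The assumption is that `describe_pod` lists events, as `kubectl describe` does. The sketch does not handle wildcard (`*`) rules.

```python
"""Check a Role's rules against the permissions each tool needs."""
from typing import Dict, List, Set, Tuple

Permission = Tuple[str, str, str]  # (apiGroup, resource, verb)

# The rules of the k8s-mcp-readonly Role above, transcribed by hand.
ROLE_RULES: List[Dict[str, List[str]]] = [
    {"apiGroups": [""], "resources": ["pods", "pods/log", "events"], "verbs": ["get", "list"]},
    {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["get", "list"]},
]

# Hypothetical mapping from MCP tool name to the permissions it uses.
TOOL_NEEDS: Dict[str, Set[Permission]] = {
    "list_pods": {("", "pods", "list")},
    "describe_pod": {("", "pods", "get"), ("", "events", "list")},
    "get_pod_logs": {("", "pods/log", "get")},
    "restart_deployment": {("apps", "deployments", "patch")},
}


def allowed(rules: List[Dict[str, List[str]]], group: str, resource: str, verb: str) -> bool:
    # Exact-match only; wildcard "*" entries are not handled in this sketch.
    return any(
        group in r["apiGroups"] and resource in r["resources"] and verb in r["verbs"]
        for r in rules
    )


def missing_permissions(rules: List[Dict[str, List[str]]],
                        needs: Dict[str, Set[Permission]]) -> Dict[str, Set[Permission]]:
    """Return, per tool, the permissions the rules do not grant."""
    gaps: Dict[str, Set[Permission]] = {}
    for tool, perms in needs.items():
        lacking = {p for p in perms if not allowed(rules, *p)}
        if lacking:
            gaps[tool] = lacking
    return gaps


if __name__ == "__main__":
    for tool, perms in missing_permissions(ROLE_RULES, TOOL_NEEDS).items():
        print(tool, sorted(perms))
```

With the read-only Role, the only gap reported is the write tool. That is the intended outcome, and it is exactly what the separate, approval-gated Role below is for. For a live check, `kubectl auth can-i` against the ServiceAccount gives the authoritative answer.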

Need the restart capability? Create a separate Role with patch on deployments and bind it only after approval workflows are in place.
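Restarting needs `patch` rather than a dedicated verb. A rolling restart is just a change to the pod template: `kubectl rollout restart` sets the `kubectl.kubernetes.io/restartedAt` annotation, and the Deployment controller rolls out new pods. The minimal sketch below builds that patch body and applies it with the Kubernetes Python client's `AppsV1Api.patch_namespaced_deployment`. It assumes in-cluster credentials. `build_restart_patch` and `restart_deployment` are illustrative names, not the Part 2 server's code.

```python
"""Sketch of a rolling restart expressed as a Deployment patch."""
from datetime import datetime, timezone
from typing import Any, Dict, Optional

RESTART_ANNOTATION = "kubectl.kubernetes.io/restartedAt"


def build_restart_patch(now: Optional[datetime] = None) -> Dict[str, Any]:
    """Return a patch body that changes a pod-template annotation.

    Any change to spec.template makes the Deployment controller roll out
    new pods; other fields of the Deployment are left untouched.
    """
    now = now or datetime.now(timezone.utc)
    return {
        "spec": {
            "template": {
                "metadata": {"annotations": {RESTART_ANNOTATION: now.isoformat()}}
            }
        }
    }


def restart_deployment(name: str, namespace: str) -> None:
    """Apply the patch; needs a Role granting 'patch' on apps/deployments."""
    # Imported lazily so the patch builder works without the client installed.
    from kubernetes import client, config

    # Assumption: running in a pod with a ServiceAccount. A real server
    # would load config once at startup, not on every call.
    config.load_incluster_config()
    client.AppsV1Api().patch_namespaced_deployment(
        name=name, namespace=namespace, body=build_restart_patch()
    )
```

Because the write is a single, well-defined patch, it is easy to show the exact body to a human in the approval queue before it is applied.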

Troubleshooting

| Symptom | Cause | Solution |
| --- | --- | --- |
| `kube_config.load_kube_config` fails | No kubeconfig in the environment | Mount a kubeconfig Secret, or call `config.load_incluster_config()` when running inside the cluster |
| Claude shows “No tools available” | MCP server not starting | Check the Claude Desktop logs; verify the server command and path in `claude_desktop_config.json` |
| `PermissionError` on read operations | Namespace allowlist mismatch | Verify that `K8S_MCP_NAMESPACES` includes the target namespace |
| Logs truncated at 1000 lines | `tail_lines` hard limit | Use `kubectl logs` directly for output beyond that limit |
| Approval queue not working | Directory permissions | Ensure the approval directory is writable by the server process (UID 1000 in the container image above) |
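The first row is usually avoided by trying in-cluster credentials first and falling back to a kubeconfig. Here is a minimal sketch using the Kubernetes Python client's `config.load_incluster_config`, `config.load_kube_config` and `config.ConfigException`. The function name `load_cluster_config` and its string return value are illustrative choices.

```python
"""Prefer in-cluster credentials, fall back to a kubeconfig file."""


def load_cluster_config() -> str:
    """Load Kubernetes client config and report which source was used."""
    from kubernetes import config

    try:
        # Inside a pod: uses the mounted ServiceAccount token and the
        # KUBERNETES_SERVICE_HOST / KUBERNETES_SERVICE_PORT variables.
        config.load_incluster_config()
        return "in-cluster"
    except config.ConfigException:
        # Outside the cluster: honours $KUBECONFIG, else ~/.kube/config.
        config.load_kube_config()
        return "kubeconfig"
```

The same server binary then works on a laptop during development and inside the cluster in production, with no code changes.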

FAQ

What is an MCP server, and why do I need one?

An MCP server is a lightweight service that exposes tools to an AI assistant through the Model Context Protocol. You need one because it gives the AI structured, validated access to your infrastructure instead of asking it to generate and execute raw shell commands.

Is FastMCP the only way to build MCP servers?

Not at all. Anthropic ships an official TypeScript SDK, and SDKs also exist for other languages, including Go and Rust. I use FastMCP (Python) because the Kubernetes Python client is mature, and most SRE tooling already lives in Python.

Can this agent actually break my cluster?

Only if you configure it that way. By default the server runs in read-only mode, restricted to an allowlist of namespaces, and every write operation can be wrapped in an approval queue. Treat it like any other automation: start with least privilege and expand deliberately.
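Those layers can be expressed as a wrapper around each write tool. The minimal sketch below checks read-only mode, then the namespace allowlist, then dry-run, before anything executes. The decorator name `guarded_write` and its parameters are hypothetical. The dry-run behaviour (describe the action instead of performing it) is an assumption about what `K8S_MCP_DRY_RUN` means, and the approval-queue step is only marked with a comment.

```python
"""Sketch of a guard that gates write tools behind mode, namespace, and dry-run checks."""
import functools
from typing import Any, Callable, Tuple


def guarded_write(read_only: bool, dry_run: bool, namespaces: Tuple[str, ...]):
    """Hypothetical decorator factory; settings would come from K8S_MCP_* vars."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(name: str, namespace: str, **kwargs: Any) -> Any:
            if read_only:
                raise PermissionError(f"{fn.__name__} is disabled in readonly mode")
            if namespace not in namespaces:
                raise PermissionError(f"namespace {namespace!r} is not in the allowlist")
            if dry_run:
                # Assumed dry-run semantics: report the action, do not perform it.
                return f"DRY RUN: would call {fn.__name__}({name!r}, {namespace!r})"
            # A real server would hand the request to its approval queue here.
            return fn(name, namespace, **kwargs)
        return wrapper
    return decorator


@guarded_write(read_only=False, dry_run=True, namespaces=("default",))
def restart_deployment(name: str, namespace: str) -> str:
    # Placeholder body; the actual Kubernetes patch call is omitted.
    return f"restarted {namespace}/{name}"


if __name__ == "__main__":
    print(restart_deployment("api-gateway", "default"))
```

The checks run in order of strictness. A read-only server rejects every write before it even looks at the namespace, which keeps the default configuration the safest one.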

How does this compare to kubectl-ai or k8sgpt?

kubectl-ai generates kubectl commands from natural language but still needs you to execute them. k8sgpt diagnoses clusters using static analysis and AI summarization, but never executes commands. This MCP approach fuses both: a natural-language interface with validated tool execution and explicit safety gates.

What if Claude hallucinates a pod name?

The MCP server receives the exact pod name from Claude and queries the Kubernetes API. If the pod does not exist, the API returns a 404, which the server surfaces back to Claude. The agent can then ask the user for clarification. The tool layer checks every name against the live cluster, so a hallucinated name produces an error rather than an action on the wrong resource.
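Here is a minimal sketch of surfacing that 404, assuming the Kubernetes Python client's `ApiException` (which carries the HTTP status in `.status`). The pod reader is passed in as a callable. In a real server that callable would be something like `CoreV1Api().read_namespaced_pod`. The wrapper name and the shape of the returned dict are illustrative assumptions. A small stand-in exception lets the sketch run without the client installed.

```python
"""Turn a Kubernetes 404 into a message the model can act on."""
from typing import Any, Callable, Dict

try:
    from kubernetes.client.exceptions import ApiException
except ImportError:  # minimal stand-in so the sketch runs without the client
    class ApiException(Exception):  # type: ignore[no-redef]
        def __init__(self, status=None, reason=None):
            super().__init__(reason)
            self.status = status
            self.reason = reason


def describe_pod(name: str, namespace: str,
                 read_pod: Callable[[str, str], Any]) -> Dict[str, Any]:
    """Call read_pod(name, namespace) and map a 404 to a structured error."""
    try:
        return {"ok": True, "pod": read_pod(name, namespace)}
    except ApiException as exc:
        if exc.status == 404:
            # Returned to the model as data, so it can re-list pods or ask the user.
            return {
                "ok": False,
                "error": f"Pod {name!r} not found in namespace {namespace!r}. "
                         "List pods first or ask the user to confirm the name.",
            }
        raise  # other API errors (403, 500, ...) are not guesses to correct
```

Returning the 404 as a structured result, rather than raising, gives the model something concrete to reason about on its next turn.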

Can I use this with other AI clients besides Claude?

Yes. MCP is an open protocol, and client support keeps expanding. As of mid-2025, Claude Desktop has the most mature MCP implementation, OpenAI’s Agents SDK supports connecting to MCP servers, and several open-source clients are adding support.

Should I run the MCP server locally or in the cluster?

For personal use, running locally works. For team production, run it inside the cluster with a ServiceAccount and RBAC. This removes kubeconfig management and keeps network traffic internal.

Next Steps

You now have a foundational AI SRE agent that diagnoses Kubernetes issues through natural language. Here is where to go next:

Explore autonomous remediation workflows where we extend this pattern to safe, self-healing infrastructure.
Connect your MCP server to a self-hosted Ollama deployment if you want the reasoning layer to stay entirely on-premise.
Read the Agentic DevOps introduction to understand how agentic systems fit into the broader DevOps transformation.

If you build something interesting with this, share your workflow. I am especially curious about creative spins on the approval queue pattern.