AI SRE Agent Production Deployment and Monitoring

2026.01.10

This is Part 3 of a 3-part series. Part 1 covered MCP and the 4-layer architecture. Part 2 walked through server code with safety guardrails. Here we deploy to production and explore advanced workflows.

Multi-Tool Workflows

MCP’s real power surfaces when Claude chains multiple tools together. Here is what that looks like in an actual conversation:

User: “The api-gateway deployment in production looks unhealthy. Can you investigate?”

Claude (internal reasoning):

  1. Call list_pods in production to find pods for api-gateway.
  2. Call describe_pod on the unhealthy pod.
  3. Call get_pod_logs with tail_lines=500 to see recent errors.
  4. Synthesize: “The pod api-gateway-7d9f4b2c-x1z9 is in CrashLoopBackOff. The logs show a connection timeout to the auth-service at 10.0.4.15:8080. The last successful health check was 12 minutes ago. Recommended next step: check if auth-service is reachable from the api-gateway pod.”

The entire diagnostic sequence completes in under five seconds. The human operator receives a precise summary and a recommendation, not a wall of raw log output. This chaining works because MCP gives Claude structured tool definitions with typed parameters, so the output of one call (a pod name, for example) feeds directly into the next.
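To make the chain concrete, here is a minimal, self-contained simulation. The three tools are replaced by hypothetical in-memory stubs: `list_pods`, `describe_pod`, and `get_pod_logs` mirror the tool names above, but their return shapes here are assumptions, not the Part 2 server's actual output. The point is only to show how the result of one call becomes the typed input of the next.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    phase: str
    reason: str  # container waiting reason, e.g. "CrashLoopBackOff"; empty if healthy


# Hypothetical in-memory cluster state standing in for the Kubernetes API.
_FAKE_PODS = {
    "production": [
        Pod("api-gateway-7d9f4b2c-a8k2q", "Running", ""),
        Pod("api-gateway-7d9f4b2c-x1z9", "Running", "CrashLoopBackOff"),
        Pod("billing-5c6d7e8f-q2w3e", "Running", ""),
    ]
}
_FAKE_LOGS = {
    "api-gateway-7d9f4b2c-x1z9": [
        "INFO starting server",
        "ERROR connection timeout to auth-service at 10.0.4.15:8080",
    ]
}


def list_pods(namespace: str) -> list[Pod]:
    return _FAKE_PODS.get(namespace, [])


def describe_pod(namespace: str, name: str) -> Pod:
    for pod in list_pods(namespace):
        if pod.name == name:
            return pod
    raise LookupError(f"pod {name!r} not found in {namespace!r}")


def get_pod_logs(namespace: str, name: str, tail_lines: int = 100) -> list[str]:
    return _FAKE_LOGS.get(name, [])[-tail_lines:]


def diagnose(namespace: str, app: str) -> str:
    # Step 1: narrow the pod list to this deployment's pods.
    pods = [p for p in list_pods(namespace) if p.name.startswith(app + "-")]
    # Step 2: describe the first pod that reports a waiting reason.
    unhealthy = next((describe_pod(namespace, p.name) for p in pods if p.reason), None)
    if unhealthy is None:
        return f"All {len(pods)} {app} pods look healthy."
    # Step 3: pull recent logs and keep only the error lines.
    logs = get_pod_logs(namespace, unhealthy.name, tail_lines=500)
    errors = [line for line in logs if "ERROR" in line]
    # Step 4: synthesize a short summary instead of returning raw logs.
    latest = errors[-1] if errors else "no ERROR lines in the last 500 log lines"
    return f"{unhealthy.name} is in {unhealthy.reason}. Latest error: {latest}"


print(diagnose("production", "api-gateway"))
```

In the real setup Claude, not a hard-coded function, decides which call comes next; the stubs only make the data flow between steps visible.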

Production Deployment

For production, I package the MCP server as a container and deploy it as a sidecar or standalone service in Kubernetes.

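# Dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY pyproject.toml uv.lock ./
# Install only the locked runtime dependencies into /app/.venv (cached as its own layer)
RUN pip install --no-cache-dir uv && uv sync --frozen --no-dev --no-install-project
COPY server.py approval_queue.py ./
# Put the virtualenv first on PATH so "python" resolves to it without invoking uv at runtime
ENV PATH="/app/.venv/bin:$PATH"
ENV K8S_MCP_MODE=readonly
ENV K8S_MCP_DRY_RUN=false
ENV PYTHONUNBUFFERED=1
# Run unprivileged; "uv run" would try to re-sync the root-owned .venv and uv cache as this user
USER 1000
CMD ["python", "server.py"]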
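The Dockerfile configures the server entirely through environment variables. Below is a minimal sketch of how a server might read them at startup. `K8S_MCP_MODE` and `K8S_MCP_DRY_RUN` come from the Dockerfile; the `Settings` class, the accepted mode names (`readonly`, `readwrite`), and the boolean parsing rules are hypothetical, not the Part 2 code.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical set of allowed modes; the real server may use different names.
VALID_MODES = {"readonly", "readwrite"}
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


@dataclass(frozen=True)
class Settings:
    mode: str
    dry_run: bool


def _parse_bool(name: str, raw: str) -> bool:
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name} must be a boolean, got {raw!r}")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Missing variable: fall back to the safest mode.
    mode = env.get("K8S_MCP_MODE", "readonly").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"K8S_MCP_MODE must be one of {sorted(VALID_MODES)}, got {mode!r}")
    dry_run = _parse_bool("K8S_MCP_DRY_RUN", env.get("K8S_MCP_DRY_RUN", "false"))
    return Settings(mode=mode, dry_run=dry_run)


print(load_settings({"K8S_MCP_MODE": "readonly", "K8S_MCP_DRY_RUN": "false"}))
```

Missing variables fall back to the safest values, while malformed ones stop the server at startup instead of being guessed at.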

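Deploy with a ServiceAccount that has explicitly scoped RBAC. A Role grants access only inside its own namespace, so the Role and RoleBinding below cover default alone; repeat them for each namespace the agent should read (production, in the example above):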

# rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: k8s-mcp-server
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: k8s-mcp-readonly
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "events"]
    verbs: ["get", "list"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: k8s-mcp-readonly
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: k8s-mcp-readonly
subjects:
  - kind: ServiceAccount
    name: k8s-mcp-server
    namespace: monitoring

Need the restart capability? Create a separate Role with patch on deployments and bind it only after approval workflows are in place.
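To see why the read-only Role cannot restart anything, here is a deliberately simplified model of how RBAC matches a request against the rules in rbac.yaml. The real API server authorizer also handles resource names, non-resource URLs, and role aggregation, which this sketch ignores. A rollout restart is implemented as a patch to the Deployment's pod template, so it needs the `patch` verb on `deployments` in the `apps` group.

```python
# Rules copied from the k8s-mcp-readonly Role in rbac.yaml.
READONLY_RULES = [
    {"apiGroups": [""], "resources": ["pods", "pods/log", "events"], "verbs": ["get", "list"]},
    {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["get", "list"]},
]


def _matches(values: list[str], wanted: str) -> bool:
    # RBAC treats "*" as a wildcard; otherwise the value must match exactly.
    return "*" in values or wanted in values


def is_allowed(rules: list[dict], api_group: str, resource: str, verb: str) -> bool:
    """Return True if any rule grants verb on resource in api_group (RBAC has no deny rules)."""
    return any(
        _matches(rule["apiGroups"], api_group)
        and _matches(rule["resources"], resource)
        and _matches(rule["verbs"], verb)
        for rule in rules
    )


checks = [
    ("", "pods", "list"),
    ("", "pods/log", "get"),
    ("apps", "deployments", "get"),
    ("apps", "deployments", "patch"),  # what a rollout restart needs
    ("", "pods", "delete"),
]
for group, resource, verb in checks:
    allowed = is_allowed(READONLY_RULES, group, resource, verb)
    print(f"{verb:>6} {group or 'core'}/{resource}: {allowed}")
```

Binding a second Role that adds `patch` on `apps/deployments` flips the fourth check to `True` without touching the read-only Role, because a subject's permissions are the union of all its bindings.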

Troubleshooting

| Symptom | Cause | Solution |
| --- | --- | --- |
| kube_config.load_kube_config fails | No kubeconfig in environment | Mount a kubeconfig Secret or use in-cluster config |
| Claude shows “No tools available” | MCP server not starting | Check Claude Desktop logs; verify the JSON config path |
| PermissionError on read operations | Namespace allowlist mismatch | Verify K8S_MCP_NAMESPACES includes the target namespace |
| Logs truncated at 1000 lines | tail_lines hard limit | Use kubectl logs directly for the full container log |
| Approval queue not working | Directory permissions | Ensure the approval directory is writable by the server process |
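The last row is easy to catch at startup rather than at the first write request. The sketch below assumes a file-based queue in which each pending action is a JSON file in an approval directory and a human approves it by changing its `status` field. The directory layout, file format, and the functions `ensure_writable`, `submit`, and `is_approved` are hypothetical, not taken from the Part 2 `approval_queue.py`.

```python
import json
import os
import tempfile
import uuid
from pathlib import Path


def ensure_writable(directory: Path) -> None:
    """Fail fast if the server process cannot create files in the approval directory."""
    # mkdir itself raises PermissionError if the parent directory is not writable.
    directory.mkdir(parents=True, exist_ok=True)
    if not os.access(directory, os.W_OK | os.X_OK):
        raise PermissionError(f"approval directory {directory} is not writable by this process")


def submit(directory: Path, action: str, target: str) -> Path:
    """Record a pending write action and return the request file path."""
    request = {"id": uuid.uuid4().hex, "action": action, "target": target, "status": "pending"}
    path = directory / f"{request['id']}.json"
    # Write to a temp file, then rename, so reviewers never see a half-written request.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(request))
    tmp.replace(path)
    return path


def is_approved(path: Path) -> bool:
    return json.loads(path.read_text()).get("status") == "approved"


with tempfile.TemporaryDirectory() as tmp_dir:
    queue_dir = Path(tmp_dir) / "approvals"
    ensure_writable(queue_dir)
    req = submit(queue_dir, "rollout_restart", "production/api-gateway")
    print(req.name, is_approved(req))
```

Calling `ensure_writable` once during server startup turns a silent queue failure into an immediate, readable error in the pod logs.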

FAQ

What is an MCP server, and why do I need one?

An MCP server is a lightweight service that exposes tools to an AI assistant through the Model Context Protocol. You need one because it gives the AI structured, validated access to your infrastructure instead of asking it to generate and execute raw shell commands.
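Concretely, an MCP server advertises each tool with a name, a description, and a JSON Schema for its arguments (the `inputSchema` field in the protocol's tool listing), which lets it reject a malformed call before anything touches the cluster. The sketch hand-writes such a definition for a `get_pod_logs` tool and checks arguments with a tiny validator that understands only `required`, `additionalProperties`, and basic `type` keywords. The parameter names are assumptions based on the workflow earlier in this post; an SDK such as FastMCP typically derives the schema from the function's type hints, and a production server would use a full JSON Schema validator.

```python
# Hypothetical tool definition in the shape MCP uses for tool listings.
GET_POD_LOGS_TOOL = {
    "name": "get_pod_logs",
    "description": "Return the last N log lines of a pod.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "namespace": {"type": "string"},
            "pod_name": {"type": "string"},
            "tail_lines": {"type": "integer"},
        },
        "required": ["namespace", "pod_name"],
        "additionalProperties": False,
    },
}

# Map the JSON Schema primitive types used above to Python types.
_JSON_TYPES = {"string": str, "integer": int, "object": dict}


def validate_arguments(tool: dict, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the call may proceed."""
    schema = tool["inputSchema"]
    problems = [f"missing required argument {name!r}" for name in schema["required"] if name not in arguments]
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            if schema.get("additionalProperties", True) is False:
                problems.append(f"unexpected argument {name!r}")
            continue
        expected = _JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            problems.append(f"{name!r} must be {spec['type']}, got {type(value).__name__}")
    return problems


print(validate_arguments(GET_POD_LOGS_TOOL, {"namespace": "production", "pod_name": "api-gateway-7d9f4b2c-x1z9", "tail_lines": 500}))
print(validate_arguments(GET_POD_LOGS_TOOL, {"pod_name": "api-gateway", "tail_lines": "500"}))
```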

Is FastMCP the only way to build MCP servers?

Not at all. The Model Context Protocol project publishes official SDKs for TypeScript, Python, and several other languages, and implementations exist for Go and Rust as well. I use FastMCP (Python) because the Kubernetes Python client is mature, and most SRE tooling already lives in Python.

Can this agent actually break my cluster?

Only if you configure it that way. The default mode is read-only and namespace-restricted, and every write operation can be wrapped in an approval queue. Treat it like any other automation: start with least privilege and expand deliberately.
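Here is a minimal sketch of those two defaults as a single check that runs before every tool call. `K8S_MCP_NAMESPACES` and `K8S_MCP_MODE` are the variables named elsewhere in this post; treating the allowlist as a comma-separated string, the `readwrite` mode name, the `restart_deployment` tool name, and the `guard` function itself are hypothetical.

```python
import os
from typing import Mapping

# Hypothetical name for the one tool that modifies cluster state.
WRITE_TOOLS = {"restart_deployment"}


def allowed_namespaces(env: Mapping[str, str]) -> set[str]:
    # Assumed format: comma-separated, e.g. "default,production". Unset or empty allows nothing.
    return {ns.strip() for ns in env.get("K8S_MCP_NAMESPACES", "").split(",") if ns.strip()}


def guard(tool: str, namespace: str, env: Mapping[str, str] = os.environ) -> None:
    """Raise PermissionError unless both the namespace allowlist and the mode permit the call."""
    if namespace not in allowed_namespaces(env):
        raise PermissionError(f"namespace {namespace!r} is not in K8S_MCP_NAMESPACES")
    # Fail closed: writes need an explicit opt-in mode, so a typo never enables them.
    if tool in WRITE_TOOLS and env.get("K8S_MCP_MODE", "readonly") != "readwrite":
        raise PermissionError(f"{tool} is a write operation and K8S_MCP_MODE does not permit writes")


env = {"K8S_MCP_NAMESPACES": "default, production", "K8S_MCP_MODE": "readonly"}
guard("list_pods", "production", env)  # permitted, so nothing is raised
for tool, ns in [("list_pods", "kube-system"), ("restart_deployment", "production")]:
    try:
        guard(tool, ns, env)
    except PermissionError as exc:
        print(exc)
```

Raising `PermissionError` here matches the symptom the troubleshooting table lists for an allowlist mismatch, and the check runs before any Kubernetes API call, so RBAC remains a second, independent layer.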

How does this compare to kubectl-ai or k8sgpt?

kubectl-ai generates kubectl commands from natural language but still needs you to execute them. k8sgpt diagnoses clusters using static analysis and AI summarization, but never executes commands. This MCP approach fuses both: a natural-language interface with validated tool execution and explicit safety gates.

What if Claude hallucinates a pod name?

The MCP server receives the exact pod name from Claude and queries the Kubernetes API. If the pod does not exist, the API returns a 404, which the server surfaces back to Claude. The agent can then ask the user for clarification. The tool layer checks every name against the real cluster, so the AI never has to guess correctly.
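That round trip can be sketched as a wrapper that turns a 404 from the API into a structured message the model can act on. The sketch relies only on the exception carrying an HTTP `status` attribute, which is how the official Kubernetes Python client's `ApiException` behaves; `describe_pod_safely`, `FakeApiException`, `fake_fetch`, and the message format are hypothetical stand-ins so the example runs without a cluster.

```python
from typing import Callable


class FakeApiException(Exception):
    """Stand-in for the Kubernetes client's ApiException, which also exposes .status."""

    def __init__(self, status: int, reason: str):
        super().__init__(f"({status}) {reason}")
        self.status = status
        self.reason = reason


def describe_pod_safely(fetch_pod: Callable[[str, str], dict], namespace: str, name: str) -> dict:
    """Call the API and convert 'not found' into data the model can reason about."""
    try:
        return {"ok": True, "pod": fetch_pod(namespace, name)}
    except Exception as exc:
        if getattr(exc, "status", None) == 404:
            return {
                "ok": False,
                "error": f"Pod {name!r} does not exist in namespace {namespace!r}. "
                "Call list_pods to see valid names or ask the user to confirm.",
            }
        raise  # re-raise other failures (403, 500, network errors) unchanged


def fake_fetch(namespace: str, name: str) -> dict:
    if name == "api-gateway-7d9f4b2c-x1z9":
        return {"name": name, "phase": "Running"}
    raise FakeApiException(404, "Not Found")


print(describe_pod_safely(fake_fetch, "production", "api-gateway-7d9f4b2c-x1z9"))
print(describe_pod_safely(fake_fetch, "production", "api-gateway-hallucinated"))
```

Only the 404 is translated; a 403 or 500 still propagates, because those point at RBAC or cluster problems rather than a wrong name.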

Can I use this with other AI clients besides Claude?

The Model Context Protocol is an open standard, and client support keeps expanding. As of mid-2025, Claude Desktop has the most mature MCP implementation. OpenAI’s Agents SDK supports MCP servers, and several open-source clients are adding support.

Should I run the MCP server locally or in the cluster?

For personal use, running locally works. For team production, run it inside the cluster with a ServiceAccount and RBAC. This removes kubeconfig management and keeps network traffic internal.
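In code, supporting both setups usually comes down to trying in-cluster configuration first and falling back to a kubeconfig. `load_incluster_config()` and `load_kube_config()` are real functions in the official `kubernetes` Python client; the `load_cluster_config` wrapper, and the option of injecting loaders so it runs without the library installed, are illustrative assumptions.

```python
from typing import Callable, Optional, Sequence, Tuple

Loader = Tuple[str, Callable[[], None]]


def load_cluster_config(loaders: Optional[Sequence[Loader]] = None) -> str:
    """Try each config loader in order and return the name of the one that worked."""
    if loaders is None:
        # Imported lazily so the function can be exercised without the kubernetes package.
        from kubernetes import config

        loaders = [
            ("in-cluster", config.load_incluster_config),  # ServiceAccount token mounted in the pod
            ("kubeconfig", config.load_kube_config),  # $KUBECONFIG or ~/.kube/config
        ]
    errors = []
    for name, load in loaders:
        try:
            load()
            return name
        except Exception as exc:  # the client raises ConfigException when no config is found
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no Kubernetes configuration found; " + "; ".join(errors))


def not_in_cluster() -> None:
    raise RuntimeError("KUBERNETES_SERVICE_HOST is not set")


print(load_cluster_config([("in-cluster", not_in_cluster), ("kubeconfig", lambda: None)]))
```

Inside the cluster the first loader succeeds and no kubeconfig Secret is needed; on a laptop the same code falls through to the local kubeconfig.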

Next Steps

You now have a foundational AI SRE agent that diagnoses Kubernetes issues through natural language. Here is where to go next:

If you build something interesting with this, share your workflow. I am especially curious about creative spins on the approval queue pattern.
