Building an AI SRE Agent with MCP: Architecture Setup

This is Part 1 of a 3-part series on building an AI SRE agent with MCP. Part 2 covers server code and safety guardrails. Part 3 covers production deployment.

At 2:00 AM last quarter, a memory leak in staging spiraled into a cascading failure across three namespaces. The on-call engineer spent forty minutes running the same diagnostic commands I had taught the team the week before: kubectl get pods, kubectl logs, kubectl describe deployment. By the time they found the root cause, the blast radius had already swallowed a production-adjacent service.

That incident forced a hard question: what if an AI agent could run those diagnostics in seconds, surface a precise summary, and escalate only when human judgment actually matters? Not a chatbot that guesses. An agent with tool use: the ability to execute real commands against real infrastructure and return structured results to a reasoning model.
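Here is what "structured results" can look like in practice. Rather than pasting raw terminal output into the conversation, a diagnostic tool can reduce `kubectl get pods -o json` to just the pods worth looking at. This is a minimal sketch, assuming the standard Pod list JSON shape (`items[].metadata.name`, `status.phase`, `status.containerStatuses[].restartCount`); the `summarize_pods` function, its restart threshold, and the sample pods are hypothetical.

```python
import json
from typing import Any


def summarize_pods(pod_list: dict[str, Any], restart_threshold: int = 3) -> list[dict[str, Any]]:
    """Return only pods that look unhealthy, in a compact shape a model can reason over.

    `pod_list` is the parsed output of `kubectl get pods -o json`.
    `restart_threshold` is a hypothetical cutoff, not a Kubernetes default.
    """
    findings = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        # Sum restarts across every container in the pod.
        restarts = sum(cs.get("restartCount", 0) for cs in status.get("containerStatuses", []))
        if phase not in ("Running", "Succeeded") or restarts >= restart_threshold:
            findings.append({"pod": name, "phase": phase, "restarts": restarts})
    return findings


# Illustrative input, shaped like a trimmed-down `kubectl get pods -o json` response.
sample = json.loads("""{"items": [
  {"metadata": {"name": "api-7f9c"},
   "status": {"phase": "Running", "containerStatuses": [{"restartCount": 0}]}},
  {"metadata": {"name": "worker-2b1d"},
   "status": {"phase": "Running", "containerStatuses": [{"restartCount": 7}]}}
]}""")

print(summarize_pods(sample))
# → [{'pod': 'worker-2b1d', 'phase': 'Running', 'restarts': 7}]
```

The model receives one short record per suspicious pod instead of a wall of text, which leaves more of its context for reasoning about the cause.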

What Is MCP?

MCP stands for Model Context Protocol, an open protocol that standardizes how AI applications connect to external data sources and tools, as defined in the MCP specification. Think of it as a USB-C port for AI applications: one standardized interface, any number of peripherals.

An MCP server is a lightweight process that exposes tools (functions the AI can call), resources (data the AI can read), and prompts (reusable templates). The AI client (in our case, Claude Desktop) discovers these capabilities dynamically and invokes whichever tools the conversation requires.
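To illustrate dynamic discovery, here is a toy, in-memory registry that mimics how a server advertises its three primitive types. It is not the FastMCP API: `ToyServer`, its decorators, and the `cluster://contexts` URI are hypothetical. Only the method names `tools/list`, `resources/list`, and `prompts/list` come from the MCP specification, and the result shapes are trimmed down (real entries carry more fields, such as a tool's `inputSchema`).

```python
from typing import Any, Callable


class ToyServer:
    """Hypothetical stand-in for an MCP server; real servers speak JSON-RPC over stdio or HTTP."""

    def __init__(self) -> None:
        self.tools: dict[str, Callable[..., Any]] = {}
        self.resources: dict[str, Callable[[], str]] = {}
        self.prompts: dict[str, Callable[..., str]] = {}

    def tool(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        self.tools[fn.__name__] = fn
        return fn

    def resource(self, uri: str) -> Callable[[Callable[[], str]], Callable[[], str]]:
        def register(fn: Callable[[], str]) -> Callable[[], str]:
            self.resources[uri] = fn
            return fn
        return register

    def prompt(self, fn: Callable[..., str]) -> Callable[..., str]:
        self.prompts[fn.__name__] = fn
        return fn

    def handle(self, method: str) -> dict[str, Any]:
        # Answer the discovery requests a client sends after connecting.
        if method == "tools/list":
            return {"tools": [{"name": n, "description": f.__doc__ or ""} for n, f in self.tools.items()]}
        if method == "resources/list":
            return {"resources": [{"uri": u} for u in self.resources]}
        if method == "prompts/list":
            return {"prompts": [{"name": n} for n in self.prompts]}
        raise ValueError(f"unsupported method: {method}")


server = ToyServer()


@server.tool
def get_pods(namespace: str) -> str:
    """List pods in a namespace."""
    return f"(pods in {namespace})"


@server.resource("cluster://contexts")
def contexts() -> str:
    return "staging"


@server.prompt
def triage(service: str) -> str:
    return f"Diagnose why {service} is unhealthy."


print(server.handle("tools/list"))
# → {'tools': [{'name': 'get_pods', 'description': 'List pods in a namespace.'}]}
```

The client never hard-codes `get_pods`; it learns the name and description from the list response, which is what lets you add tools to the server without touching the client.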

A tool in MCP terms is a typed function with a name, description, and JSON Schema parameters. When the AI identifies a relevant tool, it dispatches a structured request to the MCP server, which executes the logic and returns the result.
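Here is the shape of that exchange. The descriptor uses the `name`, `description`, and `inputSchema` fields, and the request uses the JSON-RPC `tools/call` method from the MCP specification. The `get_pod_logs` tool, its stubbed output, and the deliberately tiny schema checker (required keys and basic types only, not full JSON Schema) are hypothetical illustrations.

```python
from typing import Any

# Tool descriptor as a client would see it in a tools/list response.
GET_POD_LOGS = {
    "name": "get_pod_logs",
    "description": "Fetch the last N log lines for a pod.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "namespace": {"type": "string"},
            "pod": {"type": "string"},
            "tail_lines": {"type": "integer"},
        },
        "required": ["namespace", "pod"],
    },
}

# Minimal subset of JSON Schema types; a real server would use a full validator.
_TYPES = {"string": str, "integer": int, "object": dict}


def validate(args: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    errors = [f"missing required argument: {k}" for k in schema.get("required", []) if k not in args]
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            errors.append(f"unexpected argument: {key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif not isinstance(value, _TYPES[spec["type"]]) or isinstance(value, bool):
            errors.append(f"{key} must be {spec['type']}")
    return errors


def handle_tools_call(request: dict[str, Any]) -> dict[str, Any]:
    params = request["params"]
    if params["name"] != GET_POD_LOGS["name"]:
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32602, "message": f"Unknown tool: {params['name']}"}}
    args = params.get("arguments", {})
    errors = validate(args, GET_POD_LOGS["inputSchema"])
    if errors:
        # Report bad arguments inside the result so the model can correct itself and retry.
        result = {"content": [{"type": "text", "text": "; ".join(errors)}], "isError": True}
    else:
        text = f"(stub) last {args.get('tail_lines', 50)} lines of {args['namespace']}/{args['pod']}"
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


request = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
           "params": {"name": "get_pod_logs", "arguments": {"namespace": "staging", "pod": "api-7f9c"}}}
print(handle_tools_call(request)["result"]["content"][0]["text"])
# → (stub) last 50 lines of staging/api-7f9c
```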

What You Will Build

By the end of this series, you’ll have:

A Python FastMCP server exposing Kubernetes diagnostic tools
Safety layers: read-only defaults, dry-run mode, and an approval queue
Claude Desktop integration for natural-language cluster diagnostics
A 4-layer architecture blueprint: observability → MCP → agent runtime → governance

Prerequisites

Requirement	Minimum	Recommended	Verify Command
Python	3.10	3.12	`python3 --version`
kubectl	1.28	1.30+	`kubectl version --client`
Kubernetes access	Read-only	Read-write (for remediation)	`kubectl auth can-i get pods`
Claude Desktop	Latest	Latest	Check app version
uv (optional)	0.4	0.5+	`uv --version`
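If you want to run the verify commands from the table in one go, a small preflight script does the job. This is a sketch, assuming the binaries are on your `PATH`; the structure and names are my own, and `kubectl auth can-i get pods` only answers meaningfully with a working kubeconfig.

```python
import shutil
import subprocess
import sys
from typing import Optional, Tuple

MIN_PYTHON = (3, 10)

# Verify commands from the prerequisites table (uv is optional).
CLI_CHECKS = {
    "kubectl": ["kubectl", "version", "--client"],
    "cluster read access": ["kubectl", "auth", "can-i", "get", "pods"],
    "uv": ["uv", "--version"],
}


def python_ok(version: Optional[Tuple[int, int]] = None) -> bool:
    """True if the given (or running) Python version meets the table's minimum."""
    current = version or (sys.version_info.major, sys.version_info.minor)
    return current >= MIN_PYTHON


def run_check(argv: list[str], timeout: float = 15.0) -> str:
    """Run one verify command and return the first line of its output."""
    if shutil.which(argv[0]) is None:
        return "missing"
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return f"timed out after {timeout}s"
    lines = (proc.stdout or proc.stderr).strip().splitlines()
    return lines[0] if lines else f"exit code {proc.returncode}"


if __name__ == "__main__":
    print(f"python {sys.version.split()[0]}: {'ok' if python_ok() else 'too old'}")
    for label, argv in CLI_CHECKS.items():
        print(f"{label}: {run_check(argv)}")
```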

I reach for uv for fast Python environment management, but pip and venv work just as well.

The 4-Layer Architecture

Before we write code, let me walk through the architecture I use for agentic SRE systems. I learned the hard way that bolting an AI onto kubectl without boundaries spells disaster.

┌─────────────────────────────────────────────────────────────┐
│  Layer 4: Governance & Human Oversight                      │
│  → Approval queues, audit logs, policy enforcement          │
├─────────────────────────────────────────────────────────────┤
│  Layer 3: Agent Runtime (Claude Desktop / API)              │
│  → Reasoning, planning, tool selection, response synthesis  │
├─────────────────────────────────────────────────────────────┤
│  Layer 2: MCP Server (FastMCP + kubernetes client)          │
│  → Tool definitions, input validation, safety checks        │
├─────────────────────────────────────────────────────────────┤
│  Layer 1: Observability & Infrastructure                    │
│  → Kubernetes API, metrics, logs, traces                    │
└─────────────────────────────────────────────────────────────┘

Layer 1 is your cluster and its telemetry (metrics, logs, traces). The MCP server talks to the Kubernetes API through the official Kubernetes Python client. Layer 2 is where we validate inputs, enforce read-only defaults, and wrap every command in safety logic. Layer 3 is the AI itself (Claude, in this case), which reasons about what the user wants and selects the right tools. Layer 4 is governance: audit trails, approval gates for destructive actions, and organizational policy.
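To make Layer 2 concrete, here is one way to check a namespace argument before it reaches the Kubernetes API: validate it against the DNS-1123 label rules Kubernetes uses for namespace names, then against an allowlist. The allowlist contents, `list_pods_tool`, and the injected `fetch` callable are hypothetical; in a real server, `fetch` might wrap `CoreV1Api().list_namespaced_pod` from the official client.

```python
import re
from typing import Callable

# Namespace names must be DNS-1123 labels: lowercase alphanumerics or '-',
# starting and ending with an alphanumeric, at most 63 characters.
DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
ALLOWED_NAMESPACES = {"staging", "payments-dev"}  # hypothetical allowlist


class ValidationError(ValueError):
    pass


def validate_namespace(namespace: str) -> str:
    if len(namespace) > 63 or not DNS1123_LABEL.fullmatch(namespace):
        raise ValidationError(f"invalid namespace name: {namespace!r}")
    if namespace not in ALLOWED_NAMESPACES:
        raise ValidationError(f"namespace not allowlisted: {namespace!r}")
    return namespace


def list_pods_tool(namespace: str, fetch: Callable[[str], list[str]]) -> list[str]:
    """Layer 2 wrapper: validate first, then delegate the read to Layer 1."""
    return fetch(validate_namespace(namespace))


def fake_fetch(ns: str) -> list[str]:
    # Stand-in for a real Kubernetes API call.
    return [f"{ns}/api-7f9c"]


print(list_pods_tool("staging", fake_fetch))  # → ['staging/api-7f9c']
try:
    list_pods_tool("kube-system", fake_fetch)
except ValidationError as exc:
    print(exc)  # → namespace not allowlisted: 'kube-system'
```

Injecting the backend keeps the validation logic testable without a cluster, and nothing the model sends can reach the API until it has passed both checks.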

Never skip Layer 4. I have watched demos where an AI agent gleefully deletes namespaces because a prompt was ambiguous. This is why agentic systems demand guardrails, not just capabilities.
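A minimal sketch of those guardrails combines the read-only default, a dry-run flag, and an approval queue that records everything in an audit trail. The verb list, the `Gate` class, and the in-memory queue are hypothetical simplifications; a production system would persist the audit log and route approvals through your paging or chat tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

READ_ONLY_VERBS = {"get", "list", "describe", "logs"}


@dataclass
class Gate:
    allow_writes: bool = False  # read-only by default
    dry_run: bool = True        # approved writes are only simulated unless this is disabled
    pending: list[dict[str, Any]] = field(default_factory=list)
    audit_log: list[dict[str, Any]] = field(default_factory=list)

    def _audit(self, action: dict[str, Any], outcome: str) -> None:
        self.audit_log.append({"at": datetime.now(timezone.utc).isoformat(), **action, "outcome": outcome})

    def submit(self, verb: str, target: str) -> str:
        action = {"verb": verb, "target": target}
        if verb in READ_ONLY_VERBS:
            self._audit(action, "executed")
            return "executed"
        if not self.allow_writes:
            self._audit(action, "rejected: read-only mode")
            return "rejected"
        # Mutating actions wait for a human decision.
        self.pending.append(action)
        self._audit(action, "queued for approval")
        return "queued"

    def approve(self, index: int, approver: str) -> str:
        action = self.pending.pop(index)
        outcome = "dry-run only" if self.dry_run else "executed"
        self._audit({**action, "approver": approver}, outcome)
        return outcome


gate = Gate(allow_writes=True)
print(gate.submit("get", "pods/staging"))              # → executed
print(gate.submit("delete", "namespace/staging"))      # → queued
print(gate.approve(0, approver="oncall@example.com"))  # → dry-run only
print(len(gate.audit_log))                             # → 3
```

An ambiguous prompt that resolves to `delete namespace` now lands in a queue with an audit entry instead of executing.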

FAQ

What is an MCP tool and how is it different from a regular API call?

An MCP tool is a typed function that the AI client discovers dynamically at runtime. With a regular API call, your code decides what to request and when; with MCP, the model decides which tool to invoke and when, based on the conversation. The tool definition also serves as documentation that the AI reads to understand available capabilities.

Is the 4-layer architecture secure enough for production?

It can be, provided you implement all four layers. Layer 4 provides audit trails, approval queues, and policy enforcement. Combined with the read-only defaults and namespace allowlisting covered in Part 2, this design follows the principle of least privilege. Never skip governance.

Can I use this agent with other AI clients besides Claude?

MCP is an open protocol. While Claude Desktop has the most mature MCP support, OpenAI’s Agents SDK and several open-source clients are adding compatibility. The server you build will work with any MCP-compatible client.

What happens if the Kubernetes cluster is unreachable?

The MCP server returns the API error to Claude, which presents it to the user. The agent degrades gracefully by reporting the connection failure rather than attempting unsafe fallback behavior.
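One way to implement that graceful degradation is to catch connection failures around the backend call and return them as a tool result flagged with `isError`, which the MCP specification uses for tool execution errors, rather than crashing the server or trying something riskier. The `call_backend` helper and the stub backends are hypothetical, and a real Kubernetes client may raise its own exception types, so treat the `except` clause as a placeholder to adjust.

```python
from typing import Any, Callable


def call_backend(fetch: Callable[[], str]) -> dict[str, Any]:
    """Run a read-only backend call and wrap the outcome as an MCP-style tool result."""
    try:
        text = fetch()
    except OSError as exc:  # includes ConnectionError and TimeoutError
        # Report the failure to the model; never fall back to anything that mutates state.
        return {
            "content": [{"type": "text", "text": f"Kubernetes API unreachable: {exc}"}],
            "isError": True,
        }
    return {"content": [{"type": "text", "text": text}], "isError": False}


def unreachable() -> str:
    # Stub that simulates a cluster that refuses connections.
    raise ConnectionRefusedError("connection refused")


print(call_backend(unreachable)["content"][0]["text"])
# → Kubernetes API unreachable: connection refused
```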

Continue to Part 2 where we write the server code, implement safety guardrails, and connect everything to Claude Desktop.