AI Pipeline Error Handling and Handoff Patterns

Part 3 of 3: Part 1 | Part 2

Error Handling

No pipeline is immune to failures. LLM APIs have rate limits, downtime, and transient errors. I use three layers of error handling for every event-driven AI pipeline, plus two monitoring layers:

Dead Letter Queues (DLQ): Store events that fail all retries for manual review. I set max DLQ size to 10k events and alert when it exceeds 1k.
Exponential Backoff: Retry with increasing delays (1s, 2s, 4s, 8s) for up to 5 attempts in total. Add jitter (±30%) so clients don’t retry in lockstep and cause a thundering herd.
Circuit Breakers: Stop calling an LLM API while it keeps returning 429 (Too Many Requests). My breaker opens after 5 consecutive 429s and closes again after 60 seconds.
Alerting: Alert on DLQ size, high error rates, and LLM API latency.
Logging: Log all errors, retries, and DLQ events to Elasticsearch for debugging.
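
As a minimal sketch of the backoff schedule described above, the function below computes the wait before each retry. It assumes the limit of 5 counts total attempts (so there are four waits: roughly 1s, 2s, 4s, 8s) and that jitter is applied as a multiplier in the ±30% range. The function name and parameters are mine, not part of any library.

```python
import random


def backoff_delays(max_attempts=5, base_s=1.0, factor=2.0, jitter=0.3, rng=random):
    """Return the delay in seconds to wait before each retry.

    Assumption: max_attempts counts the first try, so there are
    max_attempts - 1 retries and therefore max_attempts - 1 delays.
    """
    delays = []
    for retry in range(max_attempts - 1):
        nominal = base_s * factor ** retry  # 1, 2, 4, 8, ...
        # Scale by a random factor in [1 - jitter, 1 + jitter] to spread clients out.
        delays.append(nominal * rng.uniform(1 - jitter, 1 + jitter))
    return delays
```

Calling backoff_delays(jitter=0.0) gives the bare schedule with no randomness, which is handy for checking the numbers before turning jitter back on.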

Here’s the DLQ and retry configuration I use with Redis Streams:

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-dlq-config
  namespace: ai-pipelines
data:
  dlq-max-len: "10000"
  retry-max-attempts: "5"
  retry-backoff-base: "1000"
  retry-backoff-jitter: "0.3"
  circuit-breaker-threshold: "5"
  circuit-breaker-timeout: "60000"
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ai-pipeline-alerts
  namespace: ai-pipelines
spec:
  groups:
  - name: dlq
    rules:
    - alert: DLQSizeHigh
      expr: redis_stream_length{stream="dlq"} > 1000
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "DLQ size exceeds 1000 events"
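
The circuit-breaker-threshold and circuit-breaker-timeout values in the ConfigMap above could drive a breaker along these lines. This is a simplified sketch, not the plugin or library I run in production: the class, its method names, and the injectable clock are all illustrative, and it closes outright after the timeout rather than going through a half-open probe state.

```python
import time


class CircuitBreaker:
    """Opens after `threshold` consecutive 429s; allows calls again after `timeout_s`."""

    def __init__(self, threshold=5, timeout_s=60.0, clock=time.monotonic):
        self.threshold = threshold
        self.timeout_s = timeout_s
        self.clock = clock          # injectable so the breaker can be tested without sleeping
        self.consecutive_429s = 0
        self.opened_at = None       # None means the breaker is closed

    def allow_request(self):
        """Return True if the caller may hit the LLM API right now."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.timeout_s:
            # Timeout elapsed: close the breaker and start counting afresh.
            self.opened_at = None
            self.consecutive_429s = 0
            return True
        return False

    def record(self, status_code):
        """Feed the HTTP status of the latest LLM API response into the breaker."""
        if status_code == 429:
            self.consecutive_429s += 1
            if self.consecutive_429s >= self.threshold:
                self.opened_at = self.clock()
        else:
            # Any non-429 response breaks the streak.
            self.consecutive_429s = 0
```

The caller checks allow_request() before each API call and passes the response status to record() afterwards.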

n8n → Temporal Handoff Pattern

The handoff between n8n and Temporal is where most pipelines break. n8n is good for ingesting and transforming webhooks, but it’s not built for long-running workflow orchestration, complex retry logic, or state management. Temporal handles that. Because n8n’s HTTP Request node speaks HTTP rather than gRPC, the handoff goes through an HTTP endpoint in front of Temporal’s gRPC API, passing the event payload and dedup key.

I add three safeguards:

Retry Logic: n8n retries 3 times if the Temporal API is unavailable.
Fallback Queue: If Temporal is down for more than 5 minutes, send events to a fallback Redis queue.
Authentication: Use mTLS between n8n and Temporal for production environments.

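Here’s the n8n HTTP Request node config to trigger a Temporal workflow, in simplified form. The fallback block stands for a separate error branch in the n8n workflow; the HTTP Request node has no built-in fallback option: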

{
  "method": "POST",
  "url": "https://temporal-api.ai-pipelines.svc.cluster.local:7233/api/v1/namespaces/default/workflows",
  "headers": {
    "Content-Type": "application/json",
    "Authorization": "Bearer {{$env.TEMPORAL_API_KEY}}"
  },
  "body": {
    "workflowType": "processLLMWorkflow",
    "taskQueue": "ai-pipelines",
    "input": "{{$json.event}}",
    "workflowId": "llm-{{$json.dedupKey}}",
    "retryPolicy": {
      "maximumAttempts": 5,
      "initialInterval": "1s",
      "backoffCoefficient": 2
    }
  },
  "timeout": 10000,
  "retryOnFail": true,
  "maxRetries": 3,
  "fallback": {
    "action": "sendToQueue",
    "queue": "fallback-queue"
  }
}
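
To make the retry and fallback rules concrete, here is a rough Python sketch of the same decision logic outside n8n. It assumes the 3 in the config means three tries per event and that "down for more than 5 minutes" is measured from the first failed handoff. The class name and the start_workflow and enqueue_fallback callables are hypothetical stand-ins for the Temporal call and the Redis fallback queue.

```python
import time


class TemporalHandoff:
    """Try to start a workflow; divert to a fallback queue during long outages."""

    def __init__(self, start_workflow, enqueue_fallback,
                 max_tries=3, outage_limit_s=300.0, clock=time.monotonic):
        self.start_workflow = start_workflow      # callable(event); raises ConnectionError when Temporal is unreachable
        self.enqueue_fallback = enqueue_fallback  # callable(event); e.g. a push to the fallback queue
        self.max_tries = max_tries
        self.outage_limit_s = outage_limit_s
        self.clock = clock
        self.down_since = None                    # time of the first failed handoff in the current outage

    def send(self, event):
        """Return "started", "retry_later", or "fallback"."""
        for _ in range(self.max_tries):
            try:
                self.start_workflow(event)
            except ConnectionError:
                continue
            self.down_since = None                # Temporal answered, so the outage is over
            return "started"
        now = self.clock()
        if self.down_since is None:
            self.down_since = now
        if now - self.down_since > self.outage_limit_s:
            self.enqueue_fallback(event)
            return "fallback"
        return "retry_later"
```

Every send() still probes Temporal first, so the handoff recovers on its own once the service is back. The cost is up to three failed calls per event during an outage.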

FAQ

What’s the difference between at-least-once and at-most-once processing?

At-least-once guarantees every event is processed one or more times, so duplicates are possible. At-most-once guarantees every event is processed zero or one times, so events may be lost. For AI pipelines, at-least-once with idempotency is the safe choice. Losing critical events like customer support tickets or payment notifications is worse than processing them twice.
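
A minimal sketch of what "at-least-once with idempotency" means in code: duplicates may arrive, but the handler's side effect runs once per dedup key. The in-memory set is an assumption for illustration only; a real pipeline would keep seen keys in a shared store such as Redis so that all workers agree.

```python
def make_idempotent(handler):
    """Wrap `handler` so each dedup key is processed at most once."""
    seen = set()  # illustration only: use a shared, persistent store in production

    def handle(event):
        key = event["dedupKey"]
        if key in seen:
            return False      # duplicate delivery: skip the side effect
        handler(event)
        seen.add(key)         # record only after success so a failed attempt can be retried
        return True

    return handle
```

Recording the key after the handler succeeds keeps the at-least-once guarantee intact: if the handler raises, the next delivery of the same event is processed rather than dropped.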

Why use Redis Streams over a simple Redis list?

Redis Streams support consumer groups, which let multiple workers share the load, track pending messages, and recover from crashes. Simple lists don’t have these features. With a plain list, a worker that pops a message and then dies mid-processing takes that message with it. I’ve seen this happen in production; it’s not pretty.
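
The consumer-group behavior above can be sketched as a single read-process-acknowledge pass. This assumes `client` is a redis-py `redis.Redis` instance (or anything with the same `xreadgroup` and `xack` methods) using the default RESP2 reply shape, and that the group already exists. The function name and its defaults are illustrative.

```python
def consume_once(client, stream, group, consumer, handler, count=10, block_ms=5000):
    """Read new messages for this consumer, process them, and acknowledge each one.

    Assumption: `client` behaves like redis.Redis from redis-py.
    """
    processed = 0
    # ">" asks for messages never delivered to any consumer in the group.
    batches = client.xreadgroup(group, consumer, {stream: ">"}, count=count, block=block_ms)
    for _stream_name, messages in batches or []:
        for message_id, fields in messages:
            handler(fields)
            # Ack only after the handler succeeds. If the worker dies first,
            # the message stays in the group's pending list instead of vanishing.
            client.xack(stream, group, message_id)
            processed += 1
    return processed
```

Messages left pending by a crashed worker can be picked up by another consumer with XAUTOCLAIM (Redis 6.2+), which is the recovery path a plain list lacks.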

How do I handle LLM API rate limits?

Implement exponential backoff with jitter in your Temporal workflows and use a circuit breaker to stop calling the API if it returns repeated 429 errors. I use the temporal-circuit-breaker plugin for this in production. Set the circuit breaker to open after 5 consecutive 429s and close after 60 seconds for recovery.

What’s a dead letter queue and when should I use it?

A DLQ is a queue for events that still fail after all retries are exhausted. Use one whenever dropping a failed event is unacceptable. Review DLQ events daily to fix bugs, and alert when DLQ length exceeds a threshold; I use 1k events. If the DLQ grows faster than you can process it, you have a systemic issue that needs immediate attention.
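
As a rough sketch of the routing rule, the function below parks an event in the DLQ only after every attempt has failed. A plain Python list stands in for the real queue, the wait between attempts is left out for brevity, and the function name and record fields are my own.

```python
def process_with_dlq(event, handler, dlq, max_attempts=5):
    """Try `handler` up to `max_attempts` times; park the event in `dlq` if all fail."""
    last_error = None
    for _ in range(max_attempts):
        try:
            handler(event)
            return True
        except Exception as exc:  # a real pipeline would catch only retryable errors
            last_error = exc
    # Keep the error alongside the event so the daily review has something to go on.
    dlq.append({"event": event, "attempts": max_attempts, "error": repr(last_error)})
    return False
```

Storing the last error with the event is a design choice, not a requirement, but it makes manual review much faster than replaying the event to find out why it failed.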

Can I use n8n for the entire pipeline without Temporal?

For small volumes under 100 events per day, yes. For production workloads, n8n lacks native support for long-running workflows, complex retry logic, and stateful orchestration. I tried running n8n-only for a client pipeline processing 50k events/day. Within a week, we had event loss, timeout failures, and no way to recover mid-workflow state. Temporal solved all of that.

How do I monitor backpressure in my pipeline?

Track queue length, consumer lag (for Redis Streams, XINFO GROUPS reports a lag value per consumer group on Redis 7.0 and later), and LLM API error rates. I use Prometheus and Grafana dashboards for this on every cluster I manage. When consumer lag exceeds 10k events, I know it’s time to scale workers.
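
A small sketch of the lag check, assuming the input is the list of dicts that redis-py returns from `client.xinfo_groups(stream)`. The `lag` field only exists on Redis 7.0+, and Redis can report it as nil when it cannot be computed. The function name and the way missing lag is treated as zero are my assumptions.

```python
def groups_needing_scale(groups, lag_threshold=10_000):
    """Return the names of consumer groups whose lag exceeds the threshold.

    `groups` is assumed to look like redis-py's xinfo_groups() output:
    a list of dicts with at least "name" and, on Redis 7.0+, "lag".
    A missing or None lag is treated as 0 here.
    """
    return [g["name"] for g in groups if (g.get("lag") or 0) > lag_threshold]
```

In practice I would export the lag as a metric and alert from Prometheus rather than poll in application code, but the comparison is the same.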

How do I scale this architecture?

Add more Redis Streams shards, add more Temporal workers (horizontal pod autoscaling works well), and load-balance n8n ingress. I’ve scaled this architecture to 1.2M events/day by adding 5 Temporal workers and 3 Redis shards. The architecture is horizontally scalable by design.
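
One common way to spread events across stream shards is to hash a stable key. The sketch below is an assumption about how sharding could be done, not a description of the deployments above: the function name, the "events" prefix, and the choice of SHA-256 are all illustrative.

```python
import hashlib


def shard_stream(dedup_key, num_shards=3, prefix="events"):
    """Map a dedup key to one of `num_shards` stream names, stable across processes."""
    # hashlib gives the same result in every process, unlike the built-in hash(),
    # which is randomized per interpreter run for strings.
    digest = hashlib.sha256(dedup_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % num_shards
    return f"{prefix}:{index}"
```

Note that changing num_shards remaps most keys, so events already in flight should be drained before resharding.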

Next Steps

Once your pipeline is built, run this checklist:

Webhook validation enabled with HMAC + IP allowlisting
Redis Streams deployed with persistence and monitoring
Idempotency keys implemented for all events
DLQ configured with alerts
Temporal workflows tested with retry logic
Load tested to 2x expected peak traffic

I’ve used this exact architecture for 6 production deployments. It’s never let me down. If you hit issues, check the Temporal and Redis docs first; they have good production tuning guides.
