Temporal AI Workflows: Server Workers and Setup

2026.03.11
Technology

Part 2 of 3. Read Part 1 for Temporal’s architecture and prerequisites, then Part 3 for production patterns and CI/CD.

Architecture refresher from Part 1: Temporal runs four internal services (Frontend, History, Matching, and Worker) coordinated via task queues. Workers poll queues, execute activities, and report results. Temporal persists state in PostgreSQL and indexes workflow visibility data in Elasticsearch. I have deployed this stack across production clusters and the architecture below survived real-world load testing. See Meet the Engineer.

Step 1: Create Secrets and ConfigMaps

Before any Temporal component touches your cluster, create the Kubernetes resources for secrets and configuration. Secrets hold everything you would never commit to git: PostgreSQL credentials, your OpenAI API key, and a base64-encoded encryption key for workflow data at rest. ConfigMaps handle environment-specific settings: advanced visibility in on-prem mode (Elasticsearch), 30-day history retention, and tuning for concurrent requests, persistence rate limiting, history cache sizing, and poll expiration. See the Kubernetes documentation for secret management best practices.

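Apply these resources first. Every downstream component (database, server, and worker) needs them at boot time. Both manifests target the temporal namespace, so create that namespace (see Namespace under Step 2) before running kubectl apply.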

# temporal-secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: temporal-secrets
  namespace: temporal
type: Opaque
stringData:
  postgres-user: temporal
  postgres-password: <your-secure-password>
  openai-api-key: <your-openai-key>
  temporal-encryption-key: <32-byte-base64-key>
---
# temporal-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: temporal-dynamic-config
  namespace: temporal
data:
  dynamic_config.yaml: |
    system.advancedVisibilityWritingMode: "on-prem"
    history.retentionInDays: 30
    frontend.maxConcurrentLongRequests: 1000
    worker.persistenceRateLimit: 1000
    history.defaultCacheSize: 2048
    matching.longPollExpirationInterval: "1m"

kubectl apply -f temporal-secrets.yaml
kubectl apply -f temporal-config.yaml
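
The temporal-encryption-key placeholder in the Secret expects 32 random bytes, base64-encoded. One way to produce such a value is sketched below using only the Python standard library. The function name generate_encryption_key is hypothetical and is not part of Temporal or Kubernetes tooling.

```python
import base64
import secrets


def generate_encryption_key(num_bytes: int = 32) -> str:
    """Return num_bytes of cryptographically secure randomness, base64-encoded."""
    raw = secrets.token_bytes(num_bytes)
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Paste the printed value into the temporal-encryption-key field.
    print(generate_encryption_key())
```

Because the manifest uses stringData, Kubernetes stores the pasted string as-is, so the consumer of the key has to base64-decode it to recover the 32 raw bytes.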

Step 2: Deploy Dependencies

Create a dedicated namespace, then deploy PostgreSQL for persistence and Elasticsearch for visibility.

Namespace

Create the temporal namespace with a matching label. All Temporal resources live here for clean organizational boundaries and simpler policy enforcement.

apiVersion: v1
kind: Namespace
metadata:
  name: temporal
  labels:
    name: temporal

PostgreSQL Deployment

PostgreSQL powers Temporal’s persistence layer: workflow state, task queues, event history, timer data. Every execution, signal, and activity write lands here. For production, configure a persistent volume claim with 50Gi so data survives pod restarts. Refer to the PostgreSQL documentation for production tuning. Set resource requests at 500m CPU and 1Gi memory, limits at 1 CPU and 2Gi memory. The container runs as user 999 and uses pg_isready for liveness and readiness checks.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  namespace: temporal
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      securityContext:
        runAsUser: 999
        runAsGroup: 999
        # Lets the non-root user write to the mounted volume.
        fsGroup: 999
      containers:
        - name: postgres
          # Debian-based image: its postgres user is UID 999
          # (the alpine variant uses UID 70).
          image: postgres:16
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: temporal-secrets
                  key: postgres-user
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: temporal-secrets
                  key: postgres-password
            - name: POSTGRES_DB
              value: temporal
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 1
              memory: 2Gi
          livenessProbe:
            exec:
              command: ["pg_isready", "-U", "temporal"]
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "temporal"]
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: postgres-data
          persistentVolumeClaim:
            claimName: postgres-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  namespace: temporal
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: standard
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: temporal
spec:
  selector:
    app: postgres
  ports:
    - port: 5432
      targetPort: 5432
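
Since the Temporal server depends on PostgreSQL being reachable at boot, it can help to wait for the Service port before moving on. The sketch below is a plain TCP check written with the standard library. It is weaker than pg_isready, which speaks the PostgreSQL protocol, and the helper name wait_for_port is hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if nothing accepts the connection.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

From a pod inside the temporal namespace, wait_for_port("postgres", 5432) would target the Service defined above. A True result only shows that something accepts connections on the port, not that the database is ready for queries.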

Elasticsearch Deployment

Elasticsearch drives Temporal’s visibility store: the searchable index of workflow executions. Without advanced visibility, list queries are limited to a small set of built-in filters. With it, you filter by status, type, custom search attributes, and time ranges. Deploy a single-node instance with 100Gi persistent storage and a 4Gi container memory limit; the JVM heap itself is pinned to 1 GiB through ES_JAVA_OPTS. Set xpack.security.enabled to false for now; enable it with TLS in production. Consult the Elasticsearch guide for production configuration.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch
  namespace: temporal
spec:
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: elasticsearch:8.14.0
          env:
            - name: discovery.type
              value: single-node
            - name: xpack.security.enabled
              value: "false"
            - name: ES_JAVA_OPTS
              value: "-Xms1g -Xmx1g"
            - name: cluster.name
              value: temporal-visibility
          ports:
            - containerPort: 9200
              name: http
            - containerPort: 9300
              name: transport
          volumeMounts:
            - name: es-data
              mountPath: /usr/share/elasticsearch/data
          resources:
            requests:
              cpu: 500m
              memory: 2Gi
            limits:
              cpu: 1
              memory: 4Gi
          livenessProbe:
            httpGet:
              path: /_cluster/health
              port: 9200
            initialDelaySeconds: 60
            periodSeconds: 10
      volumes:
        - name: es-data
          persistentVolumeClaim:
            claimName: es-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: es-pvc
  namespace: temporal
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: standard
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: temporal
spec:
  selector:
    app: elasticsearch
  ports:
    - port: 9200
      targetPort: 9200
      name: http
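
The HTTP liveness probe only confirms that /_cluster/health answers; the JSON body carries a status field (green, yellow, or red) that says more. A minimal sketch of interpreting that body for a single-node setup follows. The function name cluster_is_usable and the single_node flag are hypothetical.

```python
import json


def cluster_is_usable(health_body: str, single_node: bool = True) -> bool:
    """Interpret a /_cluster/health JSON body.

    green  -> all primary and replica shards are assigned
    yellow -> all primaries assigned, some replicas unassigned
    red    -> at least one primary shard is unassigned
    """
    status = json.loads(health_body).get("status")
    if status == "green":
        return True
    # A single node cannot host replicas of its own primaries, so yellow is
    # the expected state once an index with replicas exists.
    return single_node and status == "yellow"
```

For the single-node deployment above, treating yellow as acceptable avoids false alarms; in a multi-node cluster you would likely pass single_node=False and require green.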

Step 3: Deploy Temporal Server

The Temporal server is the system’s brain. It bundles four internal services into one process: Frontend (gRPC on 7233), History (state machine on 7234), Matching (task routing on 7235), and the internal Worker (7239). Environment variables connect it to PostgreSQL and Elasticsearch. Dynamic configuration loads from the ConfigMap mounted at /etc/temporal/dynamic_config.yaml. Resource limits sit at 1 CPU and 2Gi memory, with a liveness probe on the frontend port, 7233. Refer to the Temporal documentation for server configuration options.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: temporal-server
  namespace: temporal
spec:
  replicas: 1
  selector:
    matchLabels:
      app: temporal-server
  template:
    metadata:
      labels:
        app: temporal-server
    spec:
      containers:
        - name: temporal
          image: temporalio/server:1.25.0
          env:
            - name: DB
              value: postgresql
            - name: DB_PORT
              value: "5432"
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: temporal-secrets
                  key: postgres-user
            - name: POSTGRES_PWD
              valueFrom:
                secretKeyRef:
                  name: temporal-secrets
                  key: postgres-password
            - name: POSTGRES_SEEDS
              value: postgres
            - name: POSTGRES_DB
              value: temporal
            - name: VISIBILITY_STORE
              value: elasticsearch
            - name: ELASTICSEARCH_SEEDS
              value: elasticsearch
            - name: ELASTICSEARCH_PORT
              value: "9200"
            - name: DYNAMIC_CONFIG_FILE_PATH
              value: /etc/temporal/dynamic_config.yaml
            - name: TEMPORAL_ENCRYPTION_KEY
              valueFrom:
                secretKeyRef:
                  name: temporal-secrets
                  key: temporal-encryption-key
          ports:
            - containerPort: 7233
              name: frontend
            - containerPort: 7234
              name: history
            - containerPort: 7235
              name: matching
            - containerPort: 7239
              name: worker
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 1
              memory: 2Gi
          volumeMounts:
            # subPath mounts only the one file, so the image's own files
            # under /etc/temporal are not hidden by the ConfigMap volume.
            - name: dynamic-config
              mountPath: /etc/temporal/dynamic_config.yaml
              subPath: dynamic_config.yaml
          livenessProbe:
            # 7233 serves gRPC, not HTTP, so probe the socket
            # instead of an HTTP path.
            tcpSocket:
              port: 7233
            initialDelaySeconds: 30
            periodSeconds: 10
      volumes:
        - name: dynamic-config
          configMap:
            name: temporal-dynamic-config
---
apiVersion: v1
kind: Service
metadata:
  name: temporal-server
  namespace: temporal
spec:
  selector:
    app: temporal-server
  ports:
    - name: frontend
      port: 7233
      targetPort: 7233
    - name: history
      port: 7234
      targetPort: 7234
    - name: matching
      port: 7235
      targetPort: 7235
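
Clients and workers reach the frontend through the Service's in-cluster DNS name. The sketch below shows how that address is composed, assuming the default cluster domain cluster.local (clusters can be configured with a different one). The helper service_address and the TEMPORAL_PORTS mapping are hypothetical names; the port numbers are the ones exposed by the Service above.

```python
# Ports exposed by the temporal-server Service in this guide.
TEMPORAL_PORTS = {"frontend": 7233, "history": 7234, "matching": 7235}


def service_address(service: str, namespace: str, port: int,
                    cluster_domain: str = "cluster.local") -> str:
    """Build the fully qualified in-cluster address of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}:{port}"


if __name__ == "__main__":
    print(service_address("temporal-server", "temporal", TEMPORAL_PORTS["frontend"]))
```

Pods in the same namespace can also use the short name (temporal-server:7233); the fully qualified form works from any namespace.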

Step 4: Deploy Python Worker

Workers run your business logic. They poll Temporal’s task queues, pull activities, execute them, and report results. Below is a Python worker for AI pipelines with retries, compensation on failure, logging, and Prometheus metrics.

Worker Requirements

Install the Temporal Python SDK, Kubernetes client, OpenAI SDK, and Prometheus client.

temporalio==1.7.0
kubernetes==28.1.0
openai==1.30.0
prometheus-client==0.20.0

Worker Code

The worker defines four activities for an AI content pipeline. generate_text calls OpenAI’s GPT-4o with the user prompt. summarize_text condenses the output (here a placeholder that truncates to 200 characters). validate_output checks a minimum length threshold. compensate_generate handles cleanup, such as deleting partial data, if summarization fails. The AIProcessingWorkflow class orchestrates them in sequence: generate → validate → summarize, with a try-except block that fires the compensation handler on summarization failure. The generate, validate, and summarize activities retry with exponential backoff: the first retry waits 1 second, each later wait doubles up to a 10-second cap, and an activity gets at most 3 attempts. Prometheus counters track completed workflows and per-activity executions. For patterns on integrating Temporal with event-driven architectures, see Event-Driven AI Pipelines.

import asyncio
import logging
import os
from datetime import timedelta

import openai
from prometheus_client import Counter, start_http_server
from temporalio import activity, workflow
from temporalio.client import Client
from temporalio.common import RetryPolicy
from temporalio.exceptions import ActivityError
from temporalio.worker import Worker

WORKFLOW_COUNTER = Counter('ai_workflows_completed', 'Completed AI workflows')
ACTIVITY_COUNTER = Counter('ai_activities_executed', 'Executed activities', ['activity'])

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


@activity.defn
async def generate_text(prompt: str) -> str:
    # AsyncOpenAI is needed because the completion call is awaited.
    client = openai.AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    logger.info(f"Generating text for prompt: {prompt[:50]}...")
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    result = response.choices[0].message.content
    ACTIVITY_COUNTER.labels(activity="generate_text").inc()
    return result


@activity.defn
async def summarize_text(text: str) -> str:
    logger.info(f"Summarizing text of length {len(text)}...")
    # Placeholder summary: truncates instead of calling a model.
    result = f"Summary: {text[:200]}..."
    ACTIVITY_COUNTER.labels(activity="summarize_text").inc()
    return result


@activity.defn
async def validate_output(text: str) -> bool:
    if len(text) < 10:
        raise ValueError("Generated text too short")
    return True


@activity.defn
async def compensate_generate(text: str) -> None:
    logger.info(f"Compensating: Cleaning up generated text {text[:50]}...")
    ACTIVITY_COUNTER.labels(activity="compensate_generate").inc()


# sandboxed=False lets the workflow share the module-level Prometheus counter
# with the rest of the process; keep the workflow body deterministic.
@workflow.defn(sandboxed=False)
class AIProcessingWorkflow:
    @workflow.run
    async def run(self, prompt: str) -> str:
        # Intervals and timeouts must be timedelta values, not bare numbers.
        retry_policy = RetryPolicy(
            maximum_attempts=3,
            initial_interval=timedelta(seconds=1),
            maximum_interval=timedelta(seconds=10),
            backoff_coefficient=2.0,
        )
        generated = await workflow.execute_activity(
            generate_text, prompt,
            retry_policy=retry_policy,
            task_queue="ai-tasks",
            start_to_close_timeout=timedelta(seconds=60),
        )
        # Every activity call needs a timeout; 10 seconds is ample for a length check.
        await workflow.execute_activity(
            validate_output, generated,
            retry_policy=retry_policy,
            task_queue="ai-tasks",
            start_to_close_timeout=timedelta(seconds=10),
        )
        try:
            summarized = await workflow.execute_activity(
                summarize_text, generated,
                retry_policy=retry_policy,
                task_queue="ai-tasks",
                start_to_close_timeout=timedelta(seconds=30),
            )
        except ActivityError:
            # Summarization exhausted its retries: run the compensation, then fail.
            await workflow.execute_activity(
                compensate_generate, generated,
                task_queue="ai-tasks",
                start_to_close_timeout=timedelta(seconds=30),
            )
            raise
        # Skip the increment during history replay to avoid double counting.
        if not workflow.unsafe.is_replaying():
            WORKFLOW_COUNTER.inc()
        return summarized


async def main():
    # Expose Prometheus metrics on :8000.
    start_http_server(8000)
    # TEMPORAL_HOST is set in the worker Deployment; fall back to the Service DNS name.
    target = os.getenv("TEMPORAL_HOST", "temporal-server.temporal.svc.cluster.local:7233")
    client = await Client.connect(target)
    worker = Worker(
        client,
        task_queue="ai-tasks",
        workflows=[AIProcessingWorkflow],
        activities=[generate_text, summarize_text, validate_output, compensate_generate],
    )
    logger.info("Worker started on ai-tasks queue, listening on :8000 for metrics...")
    await worker.run()


if __name__ == "__main__":
    asyncio.run(main())
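
To see what the retry settings mean in practice, the wait before each retry can be computed directly. This is a minimal sketch of the exponential backoff arithmetic, not Temporal's implementation. The function retry_delays is a hypothetical helper, and it ignores any jitter the server may apply.

```python
from typing import List


def retry_delays(initial: float, coefficient: float, maximum: float,
                 max_attempts: int) -> List[float]:
    """Seconds to wait before each retry; max_attempts includes the first try."""
    delays = []
    for retry in range(max_attempts - 1):
        # Each wait grows by the coefficient and is capped at the maximum.
        delays.append(min(initial * coefficient ** retry, float(maximum)))
    return delays


if __name__ == "__main__":
    print(retry_delays(1, 2.0, 10, 3))  # the policy used by the workflow
    print(retry_delays(1, 2.0, 10, 6))  # a longer policy, to show the cap
```

With 3 attempts there are only two retries, after 1 and 2 seconds, so the 10-second cap never comes into play. It starts to matter from the fifth retry onward, which would otherwise wait 16 seconds.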

Worker Deployment with HPA

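Deploy the worker with a HorizontalPodAutoscaler that scales between 3 and 10 replicas. The HPA uses two metrics: average CPU utilization with a 70% target as the primary signal, and temporal_task_queue_depth with a target of 10 per pod for burst scaling. This handles both sustained load and sudden spikes. The Pods metric only works if a custom metrics adapter (for example, Prometheus Adapter) exposes temporal_task_queue_depth through the Kubernetes custom metrics API; the worker code above does not emit that metric itself.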

apiVersion: apps/v1
kind: Deployment
metadata:
  name: temporal-ai-worker
  namespace: temporal
spec:
  replicas: 3
  selector:
    matchLabels:
      app: temporal-ai-worker
  template:
    metadata:
      labels:
        app: temporal-ai-worker
    spec:
      containers:
        - name: worker
          image: your-registry/temporal-ai-worker:latest
          env:
            - name: TEMPORAL_HOST
              value: "temporal-server.temporal.svc.cluster.local:7233"
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: temporal-secrets
                  key: openai-api-key
          ports:
            - containerPort: 8000
              name: metrics
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: temporal-ai-worker
  namespace: temporal
spec:
  selector:
    app: temporal-ai-worker
  ports:
    - port: 8000
      targetPort: 8000
      name: metrics
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: temporal-ai-worker-hpa
  namespace: temporal
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: temporal-ai-worker
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: temporal_task_queue_depth
        target:
          type: AverageValue
          averageValue: "10"
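
When an HPA has several metrics, Kubernetes computes a desired replica count for each one as ceil(currentReplicas × currentValue / targetValue) and uses the largest. The sketch below is a simplified model of that rule. The function desired_replicas is a hypothetical helper; it assumes the default 10% tolerance and ignores stabilization windows, pod readiness, and missing metrics.

```python
import math
from typing import List, Tuple


def desired_replicas(current_replicas: int,
                     metrics: List[Tuple[float, float]],
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """metrics holds (current_value, target_value) pairs, averaged per pod."""
    proposals = []
    for current_value, target_value in metrics:
        ratio = current_value / target_value
        if abs(ratio - 1.0) <= tolerance:
            # Close enough to the target: this metric asks for no change.
            proposals.append(current_replicas)
        else:
            proposals.append(math.ceil(current_replicas * ratio))
    # The largest proposal wins, clamped to the configured bounds.
    return max(min_replicas, min(max_replicas, max(proposals)))


if __name__ == "__main__":
    # 3 pods, CPU at 35% (target 70), queue depth 25 per pod (target 10).
    print(desired_replicas(3, [(35, 70), (25, 10)], 3, 10))
```

In that example CPU alone would suggest scaling down, but the queue-depth metric proposes 8 replicas, so the backlog drives the decision. This is why the second metric helps with bursts of I/O-bound work that barely move CPU.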

Frequently Asked Questions

Why use raw manifests instead of Helm? Raw manifests give full control over resource definitions. Helm charts often abstract configurations you need to tune: PV sizes, security contexts, resource limits. For production, raw manifests are more transparent.

How do I secure PostgreSQL in production? Use strong passwords, enable TLS, restrict network access with NetworkPolicies. Never expose the database outside the cluster. See Securing AI Automation for details.

What happens if the Temporal server goes down? Workers cache state and reconnect on recovery. Workflow executions pause and resume automatically. State is not lost because Temporal persists everything to PostgreSQL.

Continue to Part 3 for persistence configuration, saga patterns, security hardening, monitoring with Prometheus, the n8n hybrid pattern, and CI/CD pipelines.
