📊 StatefulSets Overview
StatefulSets manage stateful applications that require stable network identities, persistent storage, and ordered deployment/scaling.
Stable Identity
Each pod gets a persistent hostname that survives rescheduling
Ordered Operations
Pods are created, scaled, and deleted in a predictable order
Persistent Storage
Each pod can have its own persistent volume that survives pod restarts
When to Use StatefulSets
| Use Case | Example | Key Requirement |
|---|---|---|
| Databases | MySQL, PostgreSQL, MongoDB | Data persistence, ordered startup |
| Message Queues | Kafka, RabbitMQ | Stable network identity for brokers |
| Distributed Systems | Elasticsearch, Cassandra | Cluster coordination, data sharding |
| Stateful Services | ZooKeeper, etcd | Leader election, consensus |
Creating a StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-db
spec:
  serviceName: postgres-service
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:14
        ports:
        - containerPort: 5432
        env:
        - name: POSTGRES_DB
          value: myapp
        - name: POSTGRES_USER
          value: admin
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        volumeMounts:
        - name: postgres-storage
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: postgres-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 10Gi
The serviceName must reference a headless Service (clusterIP: None). Each pod then gets a stable DNS entry of the form <pod-name>.<service-name>.<namespace>.svc.cluster.local
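For the StatefulSet above, the per-pod DNS names can be constructed mechanically; a quick shell sketch (the default namespace is an assumption):

```shell
# Build the stable DNS name for each replica of the postgres-db StatefulSet.
# Pod names follow <statefulset-name>-<ordinal>.
statefulset="postgres-db"
service="postgres-service"
namespace="default"
for ordinal in 0 1 2; do
  echo "${statefulset}-${ordinal}.${service}.${namespace}.svc.cluster.local"
done
```

These names stay constant even when a pod is rescheduled to a different node, which is what lets database replicas find each other reliably.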
📝 Ordered Deployment & Scaling
Pods are created sequentially (postgres-0 → postgres-1 → postgres-2) and terminated in reverse order when scaling down.
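The ordering guarantee can be visualized with a small shell sketch (pod names follow the postgres example above):

```shell
# With the default OrderedReady policy, scale-up creates pods lowest ordinal
# first (each waiting for the previous pod to be Ready), and scale-down
# removes them highest ordinal first.
replicas=3
create_order=""
i=0
while [ "$i" -lt "$replicas" ]; do
  create_order="$create_order postgres-$i"
  i=$((i + 1))
done
delete_order=""
i=$((replicas - 1))
while [ "$i" -ge 0 ]; do
  delete_order="$delete_order postgres-$i"
  i=$((i - 1))
done
echo "create order:$create_order"
echo "delete order:$delete_order"
```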
Deployment Strategies
spec:
  podManagementPolicy: OrderedReady  # Default: sequential
  # OR
  podManagementPolicy: Parallel      # All pods start simultaneously
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2  # Only update pods with ordinal >= 2
🗄️ Database Workloads
MySQL StatefulSet Example
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
spec:
  clusterIP: None
  selector:
    app: mysql
  ports:
  - port: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql-headless
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      initContainers:
      - name: init-mysql
        image: mysql:8.0
        command:
        - bash
        - "-c"
        - |
          set -ex
          # Generate mysql server-id from pod ordinal
          [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          echo [mysqld] > /mnt/conf.d/server-id.cnf
          echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
          # Copy config files
          cp /mnt/config-map/master.cnf /mnt/conf.d/
          cp /mnt/config-map/slave.cnf /mnt/conf.d/
        volumeMounts:
        - name: conf
          mountPath: /mnt/conf.d
        - name: config-map
          mountPath: /mnt/config-map
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: password
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        livenessProbe:
          exec:
            command: ["mysqladmin", "ping"]
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command: ["mysql", "-h", "127.0.0.1", "-e", "SELECT 1"]
          initialDelaySeconds: 5
          periodSeconds: 2
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi
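The init container's server-id derivation can be exercised outside the cluster; a POSIX sketch of the same arithmetic (the hostname value is assumed for illustration):

```shell
# A StatefulSet pod's hostname ends in its ordinal, e.g. mysql-2.
hostname="mysql-2"
ordinal=${hostname##*-}        # strip everything through the last dash
server_id=$((100 + ordinal))   # offset keeps ids unique and nonzero
echo "server-id=$server_id"
```

Because pod hostnames are stable, each replica computes the same server-id on every restart, which MySQL replication requires.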
Database Best Practices
- Expose the StatefulSet through a headless Service so each replica gets a stable DNS name
- Store credentials in Secrets, never inline in the manifest
- Configure liveness and readiness probes so traffic only reaches healthy replicas
- Size volumeClaimTemplates generously up front; PVCs can be expanded later but not shrunk
🔧 DaemonSets Overview
DaemonSets ensure that all (or some) nodes run a copy of a pod. Perfect for node-level services like log collectors, monitoring agents, and network plugins.
Node Coverage
Automatically deploys to all nodes
Auto-Scaling
Adds pods when new nodes join
Node Selection
Target specific nodes with selectors
Common DaemonSet Use Cases
Log Collection
Fluentd, Logstash, Filebeat running on every node to collect logs
Monitoring
Node exporters, Datadog agents, New Relic agents for metrics
Network
CNI plugins like Calico, Weave, Flannel for pod networking
Storage
Storage drivers and CSI plugins for volume provisioning
Creating a DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      tolerations:
      # Allow this pod to be scheduled on control plane nodes
      # (the taint was renamed from "master" in Kubernetes 1.24)
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd-elasticsearch
        image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch.logging"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
🚀 Advanced DaemonSet Features
Node Selection
# Deploy only to nodes with SSD storage
spec:
  template:
    spec:
      nodeSelector:
        disktype: ssd
      # OR use node affinity for more complex rules
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node.kubernetes.io/instance-type
                operator: In
                values:
                - m5.large
                - m5.xlarge
Update Strategy
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # Update one node at a time
      maxSurge: 0        # Don't create extra pods during update

# For immediate updates (not recommended for production)
spec:
  updateStrategy:
    type: OnDelete  # Pods updated only when manually deleted
Priority and Preemption
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-daemonset
value: 1000000
globalDefault: false
description: "High priority class for critical DaemonSets"
---
# Reference it from the DaemonSet pod template:
spec:
  template:
    spec:
      priorityClassName: high-priority-daemonset
📦 Jobs Overview
Jobs create one or more pods and ensure they run to successful completion. Perfect for batch processing, data migrations, and one-time tasks.
Single Task
Run once and complete
Parallel Processing
Multiple pods working together
Completion Tracking
Guaranteed execution to success
Job Patterns
| Pattern | Completions | Parallelism | Use Case |
|---|---|---|---|
| Single Job | 1 | 1 | Database migration |
| Fixed Completion Count | N | 1 to N | Process N items |
| Work Queue | unset | N | Process until queue empty |
| Indexed Job | N | N | Parallel array processing |
Basic Job Example
apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  template:
    spec:
      containers:
      - name: migrate
        image: myapp:latest
        command: ["python", "migrate.py"]
        env:
        - name: SOURCE_DB
          value: "postgresql://old-db:5432/myapp"
        - name: TARGET_DB
          value: "postgresql://new-db:5432/myapp"
      restartPolicy: Never
  backoffLimit: 4                 # Retry 4 times before marking as failed
  activeDeadlineSeconds: 600      # Timeout after 10 minutes
  ttlSecondsAfterFinished: 86400  # Clean up after 24 hours
Parallel Processing Job
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-processing
spec:
  parallelism: 5           # Run 5 pods in parallel
  completions: 20          # Complete 20 successful runs total
  completionMode: Indexed  # Each pod gets a unique index (0-19)
  template:
    spec:
      containers:
      - name: worker
        image: batch-processor:latest
        env:
        - name: JOB_COMPLETION_INDEX
          valueFrom:
            fieldRef:
              fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']
        command:
        - sh
        - -c
        - |
          echo "Processing batch $JOB_COMPLETION_INDEX"
          # Process data partition based on index
          python process.py --partition=$JOB_COMPLETION_INDEX --total=20
      restartPolicy: Never
Use completionMode: Indexed for embarrassingly parallel workloads where each pod processes a different subset of the data.
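How an indexed worker might pick its share of the data, sketched in shell (the item count and modulo scheme are illustrative, not part of the Job API):

```shell
index=3    # the value JOB_COMPLETION_INDEX would hold inside one pod
total=5    # number of partitions (the Job's completions)
picked=""
item=0
while [ "$item" -lt 10 ]; do
  # each worker takes the items whose id is congruent to its index
  if [ $((item % total)) -eq "$index" ]; then
    picked="$picked $item"
  fi
  item=$((item + 1))
done
echo "worker $index handles:$picked"
```

Because indices are assigned deterministically, every item is processed exactly once even if individual pods are retried.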
🔄 Job Lifecycle Management
Handling Failures
spec:
  backoffLimit: 6  # Maximum retries (default: 6)
  # Retries use exponential backoff:
  # 10s, 20s, 40s, 80s, 160s, 320s (capped)
  # Pod failure policies (Kubernetes 1.25+)
  podFailurePolicy:
    rules:
    - action: Ignore            # Don't count toward backoffLimit
      onExitCodes:
        operator: In
        values: [1, 2, 3]       # Ignore these exit codes
    - action: FailJob           # Immediately fail the job
      onExitCodes:
        operator: In
        values: [42]            # Fatal error code
    - action: Count             # Normal counting (default)
      onPodConditions:
      - type: DisruptionTarget  # Pod was evicted
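The delay sequence noted above is just repeated doubling from 10 seconds; reproduced in shell:

```shell
# Job retries back off exponentially: 10s, 20s, 40s, ...
# (Kubernetes caps the delay rather than doubling forever)
delay=10
schedule=""
attempt=1
while [ "$attempt" -le 6 ]; do
  schedule="$schedule ${delay}s"
  delay=$((delay * 2))
  attempt=$((attempt + 1))
done
echo "retry delays:$schedule"
```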
Clean Up Policies
# Automatic cleanup after completion
spec:
  ttlSecondsAfterFinished: 3600  # Delete after 1 hour
# Note: successfulJobsHistoryLimit and failedJobsHistoryLimit are CronJob
# fields that control how many finished Jobs a CronJob keeps:
#   successfulJobsHistoryLimit: 3
#   failedJobsHistoryLimit: 1
⏰ CronJobs Overview
CronJobs create Jobs on a schedule using cron syntax. Perfect for backups, reports, maintenance tasks, and periodic data processing.
┌───────────── minute (0-59)
│ ┌───────────── hour (0-23)
│ │ ┌───────────── day of month (1-31)
│ │ │ ┌───────────── month (1-12)
│ │ │ │ ┌───────────── day of week (0-6, Sunday = 0)
│ │ │ │ │
* * * * *
Common Cron Patterns
| Pattern | Meaning |
|---|---|
| 0 * * * * | Every hour at minute 0 |
| */15 * * * * | Every 15 minutes |
| 0 2 * * * | Daily at 2:00 AM |
| 0 0 * * 0 | Weekly on Sunday at midnight |
| 0 0 1 * * | Monthly on the 1st at midnight |
Creating a CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: database-backup
spec:
  schedule: "0 2 * * *"         # Daily at 2 AM
  timeZone: "America/New_York"  # Kubernetes 1.24+
  concurrencyPolicy: Forbid     # Don't run if previous job still running
  startingDeadlineSeconds: 300  # Skip if can't start within 5 minutes
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: postgres:14
            command:
            - /bin/bash
            - -c
            - |
              DATE=$(date +%Y%m%d_%H%M%S)
              pg_dump $DATABASE_URL > /backup/db_$DATE.sql
              aws s3 cp /backup/db_$DATE.sql s3://my-backups/postgres/
            env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: aws-credentials
                  key: access-key
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: aws-credentials
                  key: secret-key
          restartPolicy: OnFailure
🎯 Advanced CronJob Configuration
Concurrency Policies
Allow (default)
Multiple jobs can run concurrently
Forbid
Skip new job if previous is still running
Replace
Cancel current job and start new one
Report Generation Example
apiVersion: batch/v1
kind: CronJob
metadata:
  name: weekly-report
spec:
  schedule: "0 9 * * 1"  # Every Monday at 9 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: report-generator
            image: reporting-app:latest
            command: ["python", "generate_report.py"]
            args: ["--type=weekly", "--format=pdf"]
            volumeMounts:
            - name: reports
              mountPath: /reports
          - name: email-sender
            image: email-service:latest
            command: ["sh", "-c"]
            args:
            - |
              while [ ! -f /reports/weekly_report.pdf ]; do
                sleep 5
              done
              python send_email.py \
                --to=team@example.com \
                --subject="Weekly Report" \
                --attachment=/reports/weekly_report.pdf
            volumeMounts:
            - name: reports
              mountPath: /reports
          volumes:
          - name: reports
            emptyDir: {}
          restartPolicy: OnFailure
🚀 CronJob Best Practices
Optimization Tips
- Set concurrencyPolicy explicitly; the Allow default can let overlapping runs pile up
- Use startingDeadlineSeconds so missed schedules are skipped instead of firing late
- Keep successfulJobsHistoryLimit and failedJobsHistoryLimit low to avoid clutter
- Pin timeZone (Kubernetes 1.24+) so the schedule doesn't depend on the controller manager's local time zone
Monitoring CronJobs
# List all cronjobs
kubectl get cronjobs
# View cronjob details
kubectl describe cronjob database-backup
# View job history
kubectl get jobs --selector=cronjob-name=database-backup
# Check last schedule time
kubectl get cronjob database-backup -o jsonpath='{.status.lastScheduleTime}'
# Manually trigger a cronjob
kubectl create job --from=cronjob/database-backup manual-backup-$(date +%s)
🎨 Workload Patterns & Best Practices
Choosing the Right Workload Type
| Workload Type | Use When | Don't Use When | Example |
|---|---|---|---|
| Deployment | Stateless apps, web servers, APIs | Need stable network identity or storage | Nginx, Node.js app |
| StatefulSet | Databases, distributed systems | Stateless applications | MongoDB, Kafka |
| DaemonSet | Node-level services | Application workloads | Log collectors, monitoring |
| Job | One-time tasks, batch processing | Long-running services | Data migration, backup |
| CronJob | Scheduled recurring tasks | Event-driven tasks | Reports, cleanup |
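The decision table condenses to a tiny lookup; a shell sketch with a hypothetical helper name:

```shell
# pick_workload maps a requirement keyword to the workload type from the
# table above (the keyword vocabulary here is illustrative).
pick_workload() {
  case "$1" in
    stateless) echo "Deployment" ;;
    stateful)  echo "StatefulSet" ;;
    per-node)  echo "DaemonSet" ;;
    one-shot)  echo "Job" ;;
    scheduled) echo "CronJob" ;;
    *)         echo "unknown" ;;
  esac
}
pick_workload stateful
```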
Combined Patterns
Pattern: Backup System
Combine StatefulSet (database) + CronJob (backups) + Job (restore)
- StatefulSet runs PostgreSQL with persistent storage
- CronJob performs daily backups to S3
- Job restores from backup when needed
Pattern: Log Pipeline
Combine DaemonSet (collection) + Deployment (processing) + StatefulSet (storage)
- DaemonSet runs Fluentd on all nodes
- Deployment runs Logstash for processing
- StatefulSet runs Elasticsearch cluster
Migration Patterns
# Pattern: Blue-Green Database Migration
---
# Step 1: Deploy new database version as StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-v2
spec:
  # ... new version configuration
---
# Step 2: Run migration job
apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  template:
    spec:
      initContainers:
      - name: wait-for-new-db
        image: busybox
        command: ['sh', '-c', 'until nc -z postgres-v2-0 5432; do sleep 1; done']
      containers:
      - name: migrate
        image: migrate-tool:latest
        command: ["./migrate.sh"]
      restartPolicy: Never  # Jobs require Never or OnFailure
---
# Step 3: Switch service to new version
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  selector:
    app: postgres
    version: v2  # Update selector
Resource Management
- StatefulSets: Reserve enough resources for all replicas
- DaemonSets: Account for one pod per node in resource planning
- Jobs: Set resource limits to prevent runaway consumption
- CronJobs: Consider overlap when setting resource requests
Monitoring & Observability
# Add Prometheus annotations for metrics
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: "/metrics"

# Common metrics to track:
# - StatefulSet: Ready replicas, persistent volume usage
# - DaemonSet: Node coverage, resource usage per node
# - Job: Success/failure rate, duration
# - CronJob: Schedule adherence, missed runs
🔧 Troubleshooting Guide
Common Issues and Solutions
StatefulSet Stuck
Symptom: Pods not creating in order
Solution: Check PVC binding, previous pod health
DaemonSet Not Scheduling
Symptom: Pods missing on some nodes
Solution: Check taints, tolerations, node selectors
Job Failing Repeatedly
Symptom: Backoff limit exceeded
Solution: Check logs, increase backoffLimit, fix script
CronJob Not Running
Symptom: Missed schedules
Solution: Check startingDeadlineSeconds, timezone
Debugging Commands
kubectl rollout status statefulset/mysql
Check StatefulSet rollout status
kubectl logs -l name=fluentd-elasticsearch --all-containers
View DaemonSet logs from all nodes
kubectl describe job data-migration
Check Job events and status
kubectl get events --sort-by='.lastTimestamp'
View recent cluster events