Volumes & Storage

Master Kubernetes storage concepts including volumes, persistent volumes, storage classes, and StatefulSets for managing stateful applications at scale.

Storage Architecture in Kubernetes

Storage Abstraction Layers

Pod (consumes storage) → Volume (ephemeral or persistent) → PVC (storage request) → PV (physical storage)

Storage Type | Lifecycle | Use Case | Example
Ephemeral Volume | Tied to Pod lifecycle | Temporary data, caches | emptyDir, configMap, secret
Persistent Volume | Independent of Pod | Databases, file storage | hostPath, NFS, AWS EBS
Dynamic Storage | Created on-demand | Cloud-native apps | StorageClass + PVC
StatefulSet Storage | Stable, unique per replica | Distributed databases | volumeClaimTemplates

Key Concepts

  • Volume: Directory accessible to containers in a pod
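  • PersistentVolume (PV): Cluster-scoped storage resource, provisioned statically by an administrator or dynamically through a StorageClass
  • PersistentVolumeClaim (PVC): Namespaced request for storage by a user, bound to a matching PV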
  • StorageClass: Template for dynamic PV provisioning
  • CSI: Container Storage Interface for storage plugins

Working with Volumes

Volume Types

emptyDir

Temporary directory that shares a pod's lifetime

  • Created when Pod is assigned to Node
  • Deleted when Pod is removed
  • Good for scratch space, caches

hostPath

Mounts file or directory from host node

  • Access to host filesystem
  • Security risks - use carefully
  • Node-specific data

NFS

Network File System mount

  • Shared across multiple pods
  • Persistent across pod restarts
  • Good for shared data
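
As a minimal sketch of the NFS case, a Pod can mount an export directly through the nfs volume type. The Pod name and mount path are illustrative assumptions, and the server and export path are placeholders rather than real endpoints.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nfs-pod                        # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: shared-files
      mountPath: /usr/share/nginx/html
      readOnly: true
  volumes:
  - name: shared-files
    nfs:
      server: nfs-server.example.com   # placeholder NFS server
      path: /exported/path             # placeholder export
      readOnly: true
```

Several Pods on different nodes can reference the same export this way, which is what makes NFS suitable for shared data.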

Cloud Volumes

Provider-specific storage

  • AWS EBS, Azure Disk
  • GCE Persistent Disk
  • Managed by cloud provider

ConfigMap/Secret

Configuration as volumes

  • Mount configs as files
  • Dynamic updates possible
  • Read-only access

Downward API

Pod/container fields as files

  • Expose pod metadata
  • Resource limits/requests
  • Annotations and labels

Volume Examples

emptyDir Volume
apiVersion: v1
kind: Pod
metadata:
  name: cache-pod
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: cache-volume
      mountPath: /cache
    
  - name: cache-warmer
    image: busybox
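    # Stay running after writing the marker; a container that exits would be restarted in a loop
    command: ['sh', '-c', 'echo "Cache warmed" > /cache/ready && tail -f /dev/null']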
    volumeMounts:
    - name: cache-volume
      mountPath: /cache
      
  volumes:
  - name: cache-volume
    emptyDir:
      sizeLimit: 1Gi  # Optional size limit
      medium: Memory  # Optional: RAM-backed tmpfs; usage counts against container memory limits
hostPath Volume
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-pod
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: hostpath-volume
      mountPath: /usr/share/nginx/html
      
  volumes:
  - name: hostpath-volume
    hostPath:
      path: /data/nginx-html
      type: DirectoryOrCreate  # Create if doesn't exist
      # Other types: Directory, FileOrCreate, File, Socket, CharDevice, BlockDevice
Multi-Volume Pod
apiVersion: v1
kind: Pod
metadata:
  name: multi-volume-pod
spec:
  containers:
  - name: app
    image: myapp:latest
    volumeMounts:
    - name: config
      mountPath: /etc/config
      readOnly: true
    - name: secrets
      mountPath: /etc/secrets
      readOnly: true
    - name: data
      mountPath: /data
    - name: cache
      mountPath: /cache
    - name: podinfo
      mountPath: /etc/podinfo
      
  volumes:
  - name: config
    configMap:
      name: app-config
      
  - name: secrets
    secret:
      secretName: app-secrets
      defaultMode: 0400
      
  - name: data
    persistentVolumeClaim:
      claimName: data-pvc
      
  - name: cache
    emptyDir:
      sizeLimit: 2Gi
      
  - name: podinfo
    downwardAPI:
      items:
      - path: "labels"
        fieldRef:
          fieldPath: metadata.labels
      - path: "annotations"
        fieldRef:
          fieldPath: metadata.annotations
      - path: "cpu_limit"
        resourceFieldRef:
          containerName: app
          resource: limits.cpu
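
The multi-volume Pod refers to a ConfigMap named app-config and a Secret named app-secrets without defining them. A minimal sketch of what those objects might look like follows; the keys and values are assumptions for illustration only.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  app.properties: |        # hypothetical key; appears as /etc/config/app.properties
    log.level=info
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
stringData:
  api-key: replace-me      # hypothetical key; appears as /etc/secrets/api-key
```

Each key becomes one file under the mount path, so the application reads its settings as ordinary files.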

Volume Considerations

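  • emptyDir data survives container restarts but is lost when the pod is removed from its node
  • hostPath poses security risks - avoid in production
  • Cloud volumes may have zone restrictions: a zonal disk can only attach to nodes in the same zone
  • A pod's containers do not start until all of its volumes are mounted; if one mount fails, the pod stays in ContainerCreating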

Persistent Volumes & Claims

PV/PVC Lifecycle

Provisioning → Binding → Using → Reclaiming

Creating Persistent Volumes

PersistentVolume Definition
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs
  labels:
    type: nfs
    environment: production
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany  # RWX - many nodes can mount for read/write
    # - ReadWriteOnce  # RWO - single node can mount for read/write
    # - ReadOnlyMany   # ROX - many nodes can mount for read-only
  persistentVolumeReclaimPolicy: Retain  # What happens when PVC is deleted
    # Retain - manual reclamation
    # Recycle - basic scrub (rm -rf /volume/*); deprecated in favor of dynamic provisioning
    # Delete - delete volume (AWS EBS, GCE PD, Azure Disk)
  storageClassName: nfs-storage
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    server: nfs-server.example.com
    path: /exported/path
PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteMany  # Must be a mode the PV offers; pv-nfs above offers only RWX
  resources:
    requests:
      storage: 5Gi
  storageClassName: nfs-storage
  selector:  # Optional: select specific PV
    matchLabels:
      environment: production
    matchExpressions:
      - key: type
        operator: In
        values: [nfs, local]
Using PVC in Pod
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-with-storage
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:latest
        volumeMounts:
        - name: data-volume
          mountPath: /var/lib/myapp
        - name: shared-data
          mountPath: /shared
          
      volumes:
      - name: data-volume
        persistentVolumeClaim:
          claimName: data-pvc
          
      - name: shared-data
        persistentVolumeClaim:
          claimName: shared-pvc
          readOnly: true  # Mount as read-only
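
The Deployment above also mounts a claim named shared-pvc, which is not defined in this guide. One possible definition is sketched below. The size and class are assumptions, and because a PV binds to exactly one claim, it presumes a second RWX-capable PV exists in the nfs-storage class in addition to pv-nfs.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteMany            # all three replicas mount the same volume
  resources:
    requests:
      storage: 5Gi             # assumed size
  storageClassName: nfs-storage
```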

Volume Expansion

Expanding PVC
# StorageClass must have allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-storage
provisioner: kubernetes.io/aws-ebs  # In-tree plugin; newer clusters use the CSI driver (ebs.csi.aws.com)
allowVolumeExpansion: true
parameters:
  type: gp2
---
# Edit PVC to request more storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: expandable-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi  # Increased from 50Gi
  storageClassName: expandable-storage

PV/PVC Best Practices

  • Use dynamic provisioning with StorageClasses when possible
  • Set appropriate reclaim policies based on data sensitivity
  • Use labels and selectors for PV/PVC matching
  • Monitor PV usage and set up alerts for capacity
  • Test backup and restore procedures regularly
  • Consider using CSI drivers for better portability

Storage Classes & Dynamic Provisioning

Storage Class Overview

StorageClasses enable dynamic provisioning of PersistentVolumes, eliminating the need to pre-create PVs manually.
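
With dynamic provisioning, the only object a user writes is the claim. The sketch below assumes a StorageClass named fast-ssd (as defined in this section) is installed; the claim name and size are illustrative.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-claim          # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd   # the provisioner behind this class creates the PV
  resources:
    requests:
      storage: 20Gi
```

No PersistentVolume manifest is needed; the provisioner creates one sized to the request and binds it to the claim.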

AWS EBS StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp3
  iopsPerGB: "10"
  fsType: ext4
  encrypted: "true"
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer  # Delay binding until Pod creation
mountOptions:
  - debug
  - noatime
GKE StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd-regional
provisioner: kubernetes.io/gce-pd  # In-tree plugin; newer clusters use the CSI driver (pd.csi.storage.gke.io)
parameters:
  type: pd-ssd
  replication-type: regional-pd
  zones: us-central1-a,us-central1-b  # Deprecated parameter; allowedTopologies is the replacement
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
Local Storage Class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
# Local PV must be created manually
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node-1

CSI Drivers

CSI Driver Example (AWS EFS)
# Install AWS EFS CSI Driver
# kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=master"

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-92107410
  directoryPerms: "700"
  gidRangeStart: "1000"
  gidRangeEnd: "2000"
  basePath: "/dynamic_provisioning"
---
# PVC using EFS
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
spec:
  accessModes:
    - ReadWriteMany  # EFS supports RWX
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi  # Required by the API; EFS is elastic and does not enforce this value
Storage Type | Access Mode | Performance | Use Case
AWS EBS | RWO | High IOPS | Databases, single-node apps
AWS EFS | RWX | Variable | Shared storage, CMS
Azure Disk | RWO | Premium SSD | High-performance workloads
Azure Files | RWX | Standard | File shares, legacy apps
GCE PD | RWO/ROX | SSD/Standard | General purpose
Local SSD | RWO | Ultra-high | Caching, temp processing

Volume Binding Modes

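  • Immediate: The volume is provisioned and bound as soon as the PVC is created, before any Pod is scheduled
  • WaitForFirstConsumer: Provisioning and binding are delayed until a Pod using the PVC is scheduled

WaitForFirstConsumer is recommended for topology-constrained storage (zones, regions), because the volume is then created where the Pod can actually run.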
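
Delayed binding can be combined with allowedTopologies to restrict where volumes may be provisioned. The following is a sketch under assumptions: the class name is hypothetical, and the topology key shown is the generic Kubernetes zone label, which may differ for a specific CSI driver.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd-two-zones                  # hypothetical name
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone   # driver-specific keys may apply instead
    values:
    - us-central1-a
    - us-central1-b
```

The scheduler picks a node first, and the volume is then provisioned in that node's zone, provided the zone is in the allowed list.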

StatefulSets & Stateful Applications

  • Stable Identity: Ordered, unique Pod names
  • Stable Storage: Persistent volumes per replica
  • Stable Network: Predictable DNS names
  • Ordered Operations: Sequential deployment/scaling

StatefulSet Example

MySQL StatefulSet
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
spec:
  clusterIP: None  # Headless service for StatefulSet
  selector:
    app: mysql
  ports:
  - port: 3306
    name: mysql
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql-headless
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      initContainers:
      - name: init-mysql
        image: mysql:8.0
        command:
        - bash
        - "-c"
        - |
          set -ex
          # Generate mysql server-id from pod ordinal index
          [[ $(hostname) =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          echo [mysqld] > /mnt/conf.d/server-id.cnf
          echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
          # Copy appropriate conf.d files from config-map to emptyDir
          if [[ $ordinal -eq 0 ]]; then
            cp /mnt/config-map/primary.cnf /mnt/conf.d/
          else
            cp /mnt/config-map/replica.cnf /mnt/conf.d/
          fi
        volumeMounts:
        - name: conf
          mountPath: /mnt/conf.d
        - name: config-map
          mountPath: /mnt/config-map
      
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: password
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        livenessProbe:
          exec:
            command: ["mysqladmin", "ping"]
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
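            # Authenticate with the root password set above; an anonymous query would be rejected
            command: ["bash", "-c", "mysql -h 127.0.0.1 -uroot -p\"$MYSQL_ROOT_PASSWORD\" -e 'SELECT 1'"]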
          initialDelaySeconds: 5
          periodSeconds: 2
      
      volumes:
      - name: conf
        emptyDir: {}
      - name: config-map
        configMap:
          name: mysql-config
  
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 100Gi
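
The StatefulSet above expects a ConfigMap named mysql-config containing primary.cnf and replica.cnf, and a Secret named mysql-secret with a password key. A minimal sketch of both follows; the configuration contents and the password are placeholder assumptions, and these files alone do not set up replication between the Pods.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-config
data:
  primary.cnf: |
    # Copied only to the Pod with ordinal 0
    [mysqld]
    log-bin
  replica.cnf: |
    # Copied to every other Pod
    [mysqld]
    super-read-only
---
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret
type: Opaque
stringData:
  password: change-me        # placeholder; supply a real value out of band
```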
MongoDB ReplicaSet with StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
spec:
  serviceName: mongodb-service
  replicas: 3
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: mongodb
        image: mongo:5.0
        command:  # Overrides the image entrypoint, so the MONGO_INITDB_* variables below are not processed
        - mongod
        - "--replSet"
        - rs0
        - "--bind_ip"
        - "0.0.0.0"
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: mongo-data
          mountPath: /data/db
        env:
        - name: MONGO_INITDB_ROOT_USERNAME
          value: admin
        - name: MONGO_INITDB_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mongodb-secret
              key: password
      
      # Sidecar container for replica set configuration
      - name: mongo-sidecar
        image: cvallance/mongo-k8s-sidecar
        env:
        - name: MONGO_SIDECAR_POD_LABELS
          value: "app=mongodb"
        - name: KUBERNETES_MONGO_SERVICE_NAME
          value: "mongodb-service"
        - name: MONGODB_USERNAME
          value: admin
        - name: MONGODB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mongodb-secret
              key: password
        - name: MONGODB_DATABASE
          value: admin
  
  volumeClaimTemplates:
  - metadata:
      name: mongo-data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 50Gi
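
The MongoDB StatefulSet names mongodb-service as its governing service but does not define it. A sketch of a matching headless Service, modeled on the MySQL one earlier in this section, might look like this; the port name is an assumption.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mongodb-service
spec:
  clusterIP: None            # headless: gives each Pod its own DNS record
  selector:
    app: mongodb
  ports:
  - port: 27017
    name: mongodb            # assumed port name
```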

StatefulSet Operations

Managing StatefulSets
# Scale StatefulSet
kubectl scale statefulset mysql --replicas=5

# Rolling update
kubectl set image statefulset/mysql mysql=mysql:8.0.30

# Delete StatefulSet but leave its Pods running (PVCs are retained either way)
kubectl delete statefulset mysql --cascade=orphan

# Delete PVCs
kubectl delete pvc data-mysql-0 data-mysql-1 data-mysql-2

# Get pod names (predictable)
kubectl get pods -l app=mysql
# mysql-0, mysql-1, mysql-2

# Access specific pod
kubectl exec -it mysql-1 -- mysql -u root -p

# DNS names for pods
# ...svc.cluster.local
# mysql-0.mysql-headless.default.svc.cluster.local

StatefulSet Best Practices

  • Always use a headless service for network identity
  • Use init containers for initialization logic
  • Implement proper readiness/liveness probes
  • Use podAntiAffinity for high availability
  • Plan for backup and disaster recovery
  • Test scaling operations thoroughly
  • Monitor persistent volume usage
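
The podAntiAffinity recommendation can be sketched as follows. This is a trimmed variant of the MySQL StatefulSet with only the scheduling rule filled in; the choice of a hard (required) rule and of the hostname topology key are assumptions, and a preferred rule may suit small clusters better.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql-headless
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      affinity:
        podAntiAffinity:
          # Never place two mysql Pods on the same node
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: mysql
            topologyKey: kubernetes.io/hostname
      containers:
      - name: mysql
        image: mysql:8.0
        # env, probes, and volumeMounts as in the MySQL StatefulSet example
  # volumeClaimTemplates as in the MySQL StatefulSet example
```

With a required rule, a replica stays Pending if no node without a mysql Pod is available, so the cluster needs at least as many schedulable nodes as replicas.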

Hands-On Practice

Exercise 1: WordPress with MySQL

Deploy WordPress with MySQL using persistent storage.

Requirements:

  1. Create a StorageClass for dynamic provisioning
  2. Deploy MySQL with persistent storage
  3. Deploy WordPress connected to MySQL
  4. Both should use PVCs for data persistence
  5. Test data persistence across pod restarts
# storage-class.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
allowVolumeExpansion: true
---
# mysql-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: standard
---
# wordpress-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wordpress-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard
---
# mysql-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  strategy:
    type: Recreate  # RWO volume: stop the old Pod before its replacement starts
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: rootpass
        - name: MYSQL_DATABASE
          value: wordpress
        - name: MYSQL_USER
          value: wordpress
        - name: MYSQL_PASSWORD
          value: wordpresspass
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: mysql-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-storage
        persistentVolumeClaim:
          claimName: mysql-pvc
---
# mysql-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  selector:
    app: mysql
  ports:
  - port: 3306
  clusterIP: None
---
# wordpress-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
spec:
  strategy:
    type: Recreate  # RWO volume: stop the old Pod before its replacement starts
  selector:
    matchLabels:
      app: wordpress
  template:
    metadata:
      labels:
        app: wordpress
    spec:
      containers:
      - name: wordpress
        image: wordpress:latest
        env:
        - name: WORDPRESS_DB_HOST
          value: mysql
        - name: WORDPRESS_DB_USER
          value: wordpress
        - name: WORDPRESS_DB_PASSWORD
          value: wordpresspass
        - name: WORDPRESS_DB_NAME
          value: wordpress
        ports:
        - containerPort: 80
        volumeMounts:
        - name: wordpress-storage
          mountPath: /var/www/html
      volumes:
      - name: wordpress-storage
        persistentVolumeClaim:
          claimName: wordpress-pvc
---
# wordpress-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: wordpress
spec:
  type: LoadBalancer
  selector:
    app: wordpress
  ports:
  - port: 80
    targetPort: 80
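
The manifests above keep database passwords as plain-text env values to stay short. One way to tighten this, sketched here with a hypothetical Secret name and key names, is to move the values into a Secret and reference them from both Deployments.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: wordpress-db-secret    # hypothetical name
type: Opaque
stringData:
  root-password: rootpass
  user-password: wordpresspass
---
# In mysql-deployment.yaml, the plain-text entry would then become:
#   - name: MYSQL_ROOT_PASSWORD
#     valueFrom:
#       secretKeyRef:
#         name: wordpress-db-secret
#         key: root-password
# MYSQL_PASSWORD and WORDPRESS_DB_PASSWORD would reference user-password the same way.
```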

Exercise 2: Elasticsearch Cluster

Deploy a 3-node Elasticsearch cluster using StatefulSet.

Tasks:

  1. Create a headless service for cluster discovery
  2. Deploy Elasticsearch as a StatefulSet
  3. Configure persistent storage for each node
  4. Set up proper cluster formation
  5. Verify cluster health
# elasticsearch-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
spec:
  clusterIP: None
  selector:
    app: elasticsearch
  ports:
  - name: rest
    port: 9200
  - name: transport
    port: 9300
---
# elasticsearch-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      initContainers:
      - name: fix-permissions
        image: busybox
        command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
      - name: increase-vm-max-map
        image: busybox
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.15.0
        env:
        - name: cluster.name
          value: es-cluster
        - name: node.name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: discovery.seed_hosts
          value: "elasticsearch-0.elasticsearch,elasticsearch-1.elasticsearch,elasticsearch-2.elasticsearch"
        - name: cluster.initial_master_nodes
          value: "elasticsearch-0,elasticsearch-1,elasticsearch-2"
        - name: ES_JAVA_OPTS
          value: "-Xms512m -Xmx512m"
        ports:
        - containerPort: 9200
          name: rest
        - containerPort: 9300
          name: transport
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
        readinessProbe:
          httpGet:
            path: /_cluster/health
            port: 9200
          initialDelaySeconds: 30
          periodSeconds: 10
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 30Gi

Challenge: Multi-Tier Application

Design and deploy a complete multi-tier application with proper storage architecture.

Requirements:

  1. Frontend: React app with nginx (ephemeral storage for cache)
  2. Backend: Node.js API (ConfigMap for config, Secret for API keys)
  3. Database: PostgreSQL with replication (StatefulSet with PVCs)
  4. Cache: Redis cluster (StatefulSet)
  5. File Storage: Shared NFS for uploaded files
  6. Implement backup strategy for database
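
As a starting point for requirement 6, one possible approach is a CronJob that runs pg_dump on a schedule and writes to a dedicated volume. Everything named here is an assumption for illustration: the host postgres-0.postgres, the database appdb, the Secret postgres-secret, the claim backup-pvc, and the image tag. Volume snapshots or an operator-managed backup are equally valid designs.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup              # hypothetical name
spec:
  schedule: "0 2 * * *"              # daily at 02:00
  concurrencyPolicy: Forbid          # skip a run if the previous one is still going
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: postgres:16       # assumed; should match the server's major version
            command:
            - sh
            - -c
            - |
              pg_dump --format=custom --file="/backup/appdb-`date +%Y%m%d-%H%M%S`.dump"
            env:                     # pg_dump reads these standard libpq variables
            - name: PGHOST
              value: postgres-0.postgres
            - name: PGUSER
              value: postgres
            - name: PGDATABASE
              value: appdb
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: backup-pvc
```

A backup is only useful if it can be restored, so the exercise should also include restoring one of these dumps into a scratch database.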