A database without a backup is not much better than no database at all. You never know when your server might fail or when you accidentally push schema changes that delete all existing data (ask me how I know 😅). This article explores a simple yet robust solution for automating PostgreSQL backups using Kubernetes CronJobs: dump the complete Postgres instance and upload the dump to Backblaze B2 cloud storage.

The same approach works with other storage options like AWS S3 or MinIO (if you want to stay all OSS). B2 happens to be quite cheap, and object storage in general is a great fit for this kind of backup.

CronJobs

Kubernetes CronJobs provide a flexible way to schedule tasks at fixed intervals, making them ideal for automated processes like backups. They build upon Kubernetes Jobs and add the ability to run them at regular intervals. For scheduling, CronJobs use the beloved Cron format to define when the job should run. This format is so great that I always have to refer to crontab.guru to understand what I’m doing…
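
For quick reference, the five fields of the schedule read like this:

# * * * * *
# | | | | |
# | | | | +-- day of week (0-6, Sunday = 0)
# | | | +---- month (1-12)
# | | +------ day of month (1-31)
# | +-------- hour (0-23)
# +---------- minute (0-59)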

Anyway, let’s look at a simple CronJob manifest.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: test-cron-job
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: test
            image: busybox:latest
            imagePullPolicy: IfNotPresent
            command: ["/bin/sh", "-c"]
            args:
              - | 
                date
                echo "Hello, World!"
          restartPolicy: OnFailure

This CronJob runs every minute and prints the current date and “Hello, World!” to the logs. If you’re already familiar with ’normal’ Jobs, you’ll notice that the CronJob only adds the schedule field and the jobTemplate field (which in turn contains the Job spec).
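
To try it out, save the manifest to a file (I’m assuming test-cron-job.yaml here), apply it, and have a look at the Jobs it spawns:

kubectl apply -f test-cron-job.yaml

# a new Job (and Pod) appears roughly every minute
kubectl get cronjob test-cron-job
kubectl get jobs

# print the output of one of the spawned Jobs (the name suffix will differ)
kubectl logs job/test-cron-job-29012345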

CronJob for Postgres Backups

As you’ll shortly see, the CronJob for PostgreSQL backups is a bit more complex than the simple example above. To make digesting it easier I’ll go over the most important parts before showing the full manifest.

  1. Schedule: The CronJob is scheduled to run at 2 AM every day. That’s often enough, and at a time when my database usually doesn’t experience any load. You might want to adjust this to your needs.

    schedule: "0 2 * * *"
    
  2. Dumping Postgres: This container uses the latest postgres image (which contains all the PostgreSQL tools we need) to dump the database. Setting the environment variable PGPASSWORD allows our default user postgres to connect to the database.

    In this simple example, I’ve hardcoded the user and the name of the database service in our cluster. You might want to extract those values into a secret or configmap; the password is already sourced from a secret (see the example after this list for creating the required secrets). If you look closely, you’ll notice that this only dumps the databases, runs the dump through gzip, and stores the result in a volume mounted at /mnt/backup. No uploading to B2 yet.

    containers:
      - name: postgres-backup
        image: postgres:latest
        env:
          - name: PGPASSWORD
            valueFrom:
              secretKeyRef:
                name: postgres-secret
                key: POSTGRES_PASSWORD
        command: ["/bin/sh", "-c"]
        args:
          - |
            BACKUP_FILE="pg_backup_$(date +%Y%m%d_%H%M%S).sql.gz"
            echo "Creating backup $BACKUP_FILE..."
            pg_dumpall -h postgres.default.svc.cluster.local -U postgres | gzip > /mnt/backup/$BACKUP_FILE        
        volumeMounts:
          - name: backup-storage
            mountPath: /mnt/backup
    
  3. Uploading the dump: Since uploading to B2 is easiest with the b2 (or b2v4) CLI tool, which is not present in the postgres image, I added a second container that takes the dump and uploads it. This container is configured with an application key and its ID, as well as the bucket name to upload to. All these values are sourced from a secret. The actual script is quite simple: it takes the latest dump from the shared backup volume and pushes it to B2 using the CLI.

    containers:
      ... 
      - name: b2-uploader
        image: backblazeit/b2:latest
        env:
          - name: B2_APPLICATION_KEY_ID
            valueFrom:
              secretKeyRef:
                name: db-backup-secret
                key: access_key_id
          - name: B2_APPLICATION_KEY
            valueFrom:
              secretKeyRef:
                name: db-backup-secret
                key: application_key
          - name: B2_BUCKET_NAME
            valueFrom:
              secretKeyRef:
                name: db-backup-secret
                key: bucket_name
        command: ["/bin/sh", "-c"]
        args:
          - |
            BACKUP_FILE=$(ls /mnt/backup/pg_backup_*.sql.gz | sort -r | head -n1)
            b2v4 file upload $B2_BUCKET_NAME "$BACKUP_FILE" "$(basename $BACKUP_FILE)"        
        volumeMounts:
          - name: backup-storage
            mountPath: /mnt/backup
    
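The snippets above reference two secrets: postgres-secret (holding the database password under the key POSTGRES_PASSWORD) and db-backup-secret (holding the B2 credentials and bucket name). As a minimal sketch, assuming you create them imperatively with placeholder values, that could look like this:

kubectl create secret generic postgres-secret \
  --from-literal=POSTGRES_PASSWORD='<your-db-password>'

kubectl create secret generic db-backup-secret \
  --from-literal=access_key_id='<b2-application-key-id>' \
  --from-literal=application_key='<b2-application-key>' \
  --from-literal=bucket_name='<b2-bucket-name>'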

Putting it all together

One crucial piece is still missing to make this work: we need to coordinate the two containers so that the uploader only starts once the dump is finished. The simplest solution I could think of was to write a marker file to the shared volume once the dump is completed and have the uploader wait for this file to appear using a simple while loop.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: postgres-backup
              image: postgres:latest
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgres-secret
                      key: POSTGRES_PASSWORD
              command: ["/bin/sh", "-c"]
              args:
                - |
                  BACKUP_FILE="pg_backup_$(date +%Y%m%d_%H%M%S).sql.gz"
                  echo "Creating backup $BACKUP_FILE..."
                  pg_dumpall -h postgres.default.svc.cluster.local -U postgres | gzip > /mnt/backup/$BACKUP_FILE
                  echo "Backup completed."
                  # signal the next container that the dump is completed
                  touch /mnt/backup/.done                  
              volumeMounts:
                - name: backup-storage
                  mountPath: /mnt/backup
            - name: b2-uploader
              image: backblazeit/b2:latest
              env:
                - name: B2_APPLICATION_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: db-backup-secret
                      key: access_key_id
                - name: B2_APPLICATION_KEY
                  valueFrom:
                    secretKeyRef:
                      name: db-backup-secret
                      key: application_key
                - name: B2_BUCKET_NAME
                  valueFrom:
                    secretKeyRef:
                      name: db-backup-secret
                      key: bucket_name
              command: ["/bin/sh", "-c"]
              args:
                - |
                  # Check every 5 seconds if the dump is completed
                  while [ ! -f /mnt/backup/.done ]; do
                    sleep 5  
                  done
                  BACKUP_FILE=$(ls /mnt/backup/pg_backup_*.sql.gz | sort -r | head -n1)
                  b2v4 file upload $B2_BUCKET_NAME "$BACKUP_FILE" "$(basename $BACKUP_FILE)"                  
              volumeMounts:
                - name: backup-storage
                  mountPath: /mnt/backup
          restartPolicy: OnFailure
          volumes:
            - name: backup-storage
              emptyDir: {}
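
Assuming the manifest is saved as postgres-backup.yaml, you can deploy it and trigger a one-off run to test the whole pipeline without waiting for 2 AM:

kubectl apply -f postgres-backup.yaml

# create a manual Job from the CronJob template for testing
kubectl create job --from=cronjob/postgres-backup postgres-backup-manual

# follow both containers
kubectl logs -f job/postgres-backup-manual -c postgres-backup
kubectl logs -f job/postgres-backup-manual -c b2-uploader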

Conclusion

Implementing an automated backup solution using CronJobs as shown above helps increase resilience and allows for quick recovery in case of data loss. Building upon this foundation, it’s not rocket science to add an init script to your database that checks for backup availability and restores the latest dump if necessary. You could also move restoring to an init container (which is what I’d recommend, since you can again use the b2 image to download the dump). Or you could add features like only keeping the last n backups or sending notifications if backups fail.
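
To give an idea of what such a restore could look like, here’s a rough sketch that downloads a dump and feeds it into psql. The b2v4 download syntax is an assumption based on the v4 CLI and the file name is just a placeholder, so verify both against your setup:

# download a specific dump from B2 (syntax assumed for the v4 CLI)
b2v4 file download "b2://$B2_BUCKET_NAME/pg_backup_20240101_020000.sql.gz" /tmp/restore.sql.gz

# pg_dumpall produces plain SQL, so it can be replayed with psql
# (assumes PGPASSWORD is set, just like in the backup container)
gunzip -c /tmp/restore.sql.gz | psql -h postgres.default.svc.cluster.local -U postgres postgres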