
Seed Data


To be useful, staging environments almost always require seed data to be injected into the staging database.

PullPreview will always preserve the state of your staging environment between deployments, so the only question left is how to initially load your data into the environment.

To do this, you can take one of two approaches:

Seeding using a predefined set of objects

If you can test all your features from a limited set of predefined objects, you can take advantage of the seeding capabilities offered by your application framework (e.g. Rails seed data). In that case, you can simply add the seeding process as a one-off service in your Docker Compose configuration.

Example:

# docker-compose.yml
version: '3'
networks:
  backend:
  frontend:
services:
  db:
    image: postgres
    networks:
      - backend
  web:
    build: .
    command: bundle exec rails s
    ...
    networks:
      - backend
      - frontend
    depends_on:
      - db
      - seeder
  seeder:
    build: . # build the same application image as the web service
    command: bundle exec rails db:seed
    restart: on-failure
    networks:
      - backend
    depends_on:
      - db

Notice the restart: on-failure policy on the seeder service: the seed command is retried until it succeeds (for instance once the database is ready to accept connections), and the container is not restarted again once it has exited successfully.
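
If you later need to re-run the seeds by hand (for example after resetting the database), you can SSH into the instance and launch the one-off service yourself. A minimal sketch, run from the directory on the server that contains the Compose file:

# on the server, from the directory containing docker-compose.yml
docker-compose run --rm seeder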

Seeding using a dump from a production database

If your staging environment requires real data to be useful, then you have two solutions:

  1. Connect to the staging server and restore a dump after the first deployment:

Since you can have admin users allowed to SSH into the preview instances, one of them can upload a database dump on the server and then run a restore command after the environment is set up the first time.

For instance, assuming your Compose file has a db postgres service, you could do:

# from your machine: upload the dump to the server
scp my-dump.gz ec2-user@SERVER_IP:/tmp/
# then, on the server (over SSH), restore it into the db container
zcat /tmp/my-dump.gz | docker-compose exec -T -u postgres db pg_restore -d DBNAME

Note that instead of uploading the dump file, you could also fetch it directly on the server, e.g. from S3 (the aws CLI is preinstalled on all servers).
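
A minimal sketch, run on the server over SSH; the bucket and object names are placeholders, and the aws CLI needs credentials available (for example exported in the shell):

# on the server
aws s3 cp s3://my-backup-bucket/latest-dump.gz /tmp/my-dump.gz
# then restore as shown above
zcat /tmp/my-dump.gz | docker-compose exec -T -u postgres db pg_restore -d DBNAME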

Which leads us to the second solution:

  2. Automatically fetch a database dump from S3 using your AWS credentials, and have a seeder service restore it:

To do this you would need to modify your PullPreview workflow file to do something like the following:

# .github/workflows/pullpreview.yml
name: PullPreview
on:
  push:
  pull_request:
    types: [labeled, unlabeled, closed]

jobs:
  deploy:
    name: Deploy
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
    - uses: actions/checkout@v2
    - name: Fetch dump
      env:
        AWS_ACCESS_KEY_ID: "${{ secrets.AWS_ACCESS_KEY_ID }}"
        AWS_SECRET_ACCESS_KEY: "${{ secrets.AWS_SECRET_ACCESS_KEY }}"
      run: |
        mkdir -p dumps/
        aws s3 cp s3://my-backup-bucket/latest-dump.gz dumps/
    - uses: pullpreview/action@v3
      env:
        AWS_ACCESS_KEY_ID: "${{ secrets.AWS_ACCESS_KEY_ID }}"
        AWS_SECRET_ACCESS_KEY: "${{ secrets.AWS_SECRET_ACCESS_KEY }}"

Then define your seeder service like this:

# docker-compose.yml
...
services:
  ...
  seeder:
    image: postgres
    # decompress the dump and restore it into the db service; adjust the
    # user/password options (-U, PGPASSWORD) to match your db configuration
    command: sh -c "zcat /dumps/latest-dump.gz | pg_restore -h db -U postgres -d DBNAME"
    restart: on-failure
    volumes:
      - ./dumps:/dumps
    networks:
      - backend
    depends_on:
      - db

Note that starting with the v4 pullpreview action, you have access to an environment variable named PULLPREVIEW_FIRST_RUN, which is set to true on the first deployment run. This way, you can easily skip the seeding on subsequent runs.
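
For instance, a minimal sketch of a seeder that only restores on the first run; it assumes the variable is available for substitution when Docker Compose runs on the server, and passes it into the container:

# docker-compose.yml
...
  seeder:
    image: postgres
    environment:
      PULLPREVIEW_FIRST_RUN: "${PULLPREVIEW_FIRST_RUN:-false}"
    # only perform the restore on the very first deployment of this environment
    command: sh -c 'if [ "$$PULLPREVIEW_FIRST_RUN" = "true" ]; then zcat /dumps/latest-dump.gz | pg_restore -h db -U postgres -d DBNAME; fi'
    ...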