Fallbacks and backups

As a hurricane is barreling towards our machine room in Memphis, we are checking our fallbacks and backups for GeneNetwork. For years we have been making backups on Amazon, both to S3 and to a running virtual machine. The latter was expensive, so I replaced it with a bare metal server which pays for itself (or would, if it hadn't been down for months, but that is a different story).

Tags

  • type: enhancement
  • assigned: pjotrp
  • keywords: systems, fallback, backup, deploy
  • status: in progress
  • priority: critical

Tasks

  • [.] backup ratspub, r/shiny, bnw, covid19, hegp, pluto services
  • [X] /etc /home/shepherd backups for Octopus
  • [X] /etc /home/shepherd backups for P2
  • [X] Get backups running again on fallback
  • [ ] fix redis queue for P2 - needs to be on rabbit
  • [ ] fix bacchus large backups
  • [ ] backup octopus01:/lizardfs/backup-pangenome on bacchus

Backup and restore

We use borg for backing up data. Borg is excellent at deduplication and compression and is pretty fast too. Incremental copies are made with rsync, so those are fast as well. Restoring the full MariaDB database from a local borg repo takes about 17 minutes:

wrk@epysode:/export/restore_tux01$ time borg extract -v /export2/backup/tux01/borg-tux01::BORG-TUX01-MARIADB-20210829-04:20-Sun
real    17m32.498s
user    8m49.877s
sys     4m25.934s

That is far quicker than restoring 300GB from Amazon S3.

Next, restore the GN2 home dir:

root@epysode:/# borg extract export2/backup/tux01/borg-genenetwork::TUX01_BORG_GN2_HOME-20210830-04:00-Mon
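
The two restores above can be wrapped in a small helper. This is only a sketch, not the script we actually run: the function name `restore_archive` and the `BORG` override variable are made up for illustration. It relies on two borg behaviours: `borg extract` writes into the current working directory, and `--dry-run` walks the archive without writing anything.

```shell
#!/usr/bin/env bash
# Sketch: restore one borg archive into a scratch directory.
# BORG is a hypothetical override; set BORG=echo to preview the
# commands without touching any repository.
BORG="${BORG:-borg}"

restore_archive() {
    local repo="$1" archive="$2" dest="$3"
    mkdir -p "$dest" || return 1
    # borg extract writes into the current directory, so change into
    # the destination inside a subshell and leave the caller's cwd alone.
    (
        cd "$dest" || exit 1
        # First pass: read the whole archive without writing files.
        "$BORG" extract --dry-run "${repo}::${archive}" || exit 1
        # Second pass: the real extraction.
        "$BORG" extract -v "${repo}::${archive}"
    )
}
```

For example, `restore_archive /export2/backup/tux01/borg-tux01 BORG-TUX01-MARIADB-20210829-04:20-Sun /export/restore_tux01` would mirror the first command above. Because the helper changes directory before calling borg, pass the repo as an absolute path.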

Get backups running on fallback

Recently epysode was reinstated after a hardware failure. I took the opportunity to reinstall the machine. The backups are described in the repo (genenetwork org members have access).

As epysode was one of the main sheepdog messaging servers, I need to reinstate:

  • [X] scripts for sheepdog
  • [X] enable trim
  • [X] reinstate monitoring web services
  • [X] reinstate daily backup from penguin2
  • [X] CRON
  • [X] make sure messaging works through redis
  • [X] fix and propagate GN1 backup
  • [X] fix and propagate IPFS and gitea backups
  • [X] add GN1 backup
  • [X] add IPFS backup
  • [X] other backups
  • [ ] email on fail

Tux01 is backed up now. We need to make sure it propagates to:

  • [X] P2
  • [X] epysode
  • [X] rabbit
  • [X] Tux02
  • [ ] bacchus
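
One way to check propagation on each target is to look at the date stamp in the newest archive name. The sketch below assumes archive names end in `-YYYYMMDD-HH:MM-Day`, as the two archives restored earlier do. The helper names `archive_date` and `is_fresh` are hypothetical and not part of borg or sheepdog.

```shell
#!/usr/bin/env bash
# Print the YYYYMMDD stamp embedded in an archive name.
# Fails if the name does not end in -YYYYMMDD-HH:MM-Day.
archive_date() {
    local name="$1"
    local re='-([0-9]{8})-[0-9]{2}:[0-9]{2}-[A-Za-z]{3}$'
    [[ $name =~ $re ]] || return 1
    printf '%s\n' "${BASH_REMATCH[1]}"
}

# Succeed when the archive is dated on or after the cutoff (YYYYMMDD).
is_fresh() {
    local stamp
    stamp=$(archive_date "$1") || return 1
    [ "$stamp" -ge "$2" ]
}
```

On a target host the newest archive name can be fetched with `borg list --short --last 1 REPO` and passed to `is_fresh` together with a cutoff such as yesterday's date. A non-zero exit status then means the copy on that host is stale.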