Orchestration and fallbacks
After the Penguin2 crash in Aug. 2022 it has become increasingly clear how hard it is to deploy GeneNetwork. GNU Guix helps a great deal with dependencies, but it does not handle orchestration between machines and services well. We also need to look to the future.
What is GN today in terms of services
- [X] Main GN2 server (Python, 20+ processes, 3+ instances: depends on all below)
- [X] Matching GN3 server and REST endpoint (Python: fewer dependencies)
- [X] Mariadb
- [X] redis
- [X] virtuoso (@aruni)
- [X] GN-proxy (Racket, authentication handler: redis, mariadb)
- [X] Alias proxy (Racket, gene aliases from Wikidata)
- [X] opar server
- [+] Jupyter, R-shiny and Julia notebooks, nb-hub server
- [X] BNW server (@efraimf)
- [+] UCSC browser (@efraimf)
- [X] GN1 instances (older python, 12 instances in principle, 2 running today)
- [ ] Access to HPC for GEMMA (coming)
- [+] Backup services (sheepdog, rsync, borg)
- [+] monitoring services (incl. systemd, gunicorn, shepherd, sheepdog); see the poller sketch below
- [ ] mail server
- [X] https certificates
- [X] http(s) proxy (nginx)
- [X] CI/CD services (with GitHub webhooks)
- [+] git server (gitea or cgit)
- [X] file server (formerly IPFS)
- [ ] SPARQL endpoint
Somewhat decoupled services:
- [X] genecup
- [X] R/shiny power service (Dave)
- [ ] biohackrxiv
- [ ] hegp
- [ ] covid19
- [ ] guix publish server (runs on penguin2, needs tux02 @efraimf)
I am still missing a few! All run by a man and his diligent dog.
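Most of these services answer over HTTP, so a first line of monitoring can be a small poller in the spirit of sheepdog that simply asks every endpoint whether it is alive. A minimal sketch, using placeholder localhost URLs and ports rather than the real instance addresses:

```python
#!/usr/bin/env python3
"""Minimal poller in the spirit of sheepdog: ask each HTTP endpoint
whether it is alive and report the result.  The URLs and paths are
placeholders, not the real GN instance addresses."""
import urllib.request
import urllib.error

SERVICES = {
    "GN2":      "http://localhost:8082/",            # placeholder port
    "GN3 REST": "http://localhost:8083/api/version", # placeholder path
    "GN-proxy": "http://localhost:8080/",
    "nginx":    "http://localhost:80/",
}

def check(name, url, timeout=5):
    """Return True if the endpoint answers with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            up = resp.status < 400
    except (urllib.error.URLError, OSError):
        up = False
    print(("UP  " if up else "DOWN"), name, url)
    return up

if __name__ == "__main__":
    results = [check(name, url) for name, url in SERVICES.items()]
    # a non-zero exit code lets cron/shepherd/sheepdog raise an alert
    raise SystemExit(0 if all(results) else 1)
```

A wrapper such as cron, shepherd or sheepdog can then turn the non-zero exit status into an alert.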
For the future, the orchestration needs to be more robust and resilient. This means:
- A fallback for every service on a separate machine (see the sketch after this list)
- Improved privacy protection for (future) human data
- Separate servers serving different data sources
- Partial synchronization between data sources
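To make the first point concrete: a fallback does not need heavy tooling to begin with. A health check that prefers the primary machine and switches to a standby when it stops answering already covers a lot. A minimal sketch, with placeholder host names and port rather than the actual machines:

```python
"""Sketch of a per-service failover check: use the primary instance,
and switch to a standby on a separate machine when it stops answering.
Host names, port and health path are placeholders."""
import urllib.request
import urllib.error

CANDIDATES = [
    "http://primary.example.org:8082",   # main machine
    "http://standby.example.org:8082",   # fallback on a separate machine
]

def healthy(base_url, timeout=5):
    """A backend counts as healthy if its root URL answers at all."""
    try:
        urllib.request.urlopen(base_url + "/", timeout=timeout)
        return True
    except (urllib.error.URLError, OSError):
        return False

def pick_backend():
    """Return the first healthy backend, or None when all are down."""
    for url in CANDIDATES:
        if healthy(url):
            return url
    return None

if __name__ == "__main__":
    print(pick_backend() or "no backend available")
```

In production nginx (already in the list above) can be configured to do this kind of failover at the proxy level; the sketch only shows the selection logic.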
The only way we *can* scale is by adding machines, but the system is not yet ready for that. Getting rid of monolithic primary databases in favor of files also helps synchronization.
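Once the primary data lives in plain files rather than inside a running database, partial synchronization between machines reduces to copying the files whose checksums differ. A hedged sketch of that idea (directory paths are illustrative, not the real layout):

```python
"""Sketch of file-based partial synchronization: only copy the files
whose checksums differ between a source and a mirror directory.
Directory paths are illustrative, not the real layout."""
import hashlib
import shutil
from pathlib import Path

def sha256(path, chunk=1 << 20):
    """Checksum a file in chunks so large data files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while block := handle.read(chunk):
            digest.update(block)
    return digest.hexdigest()

def sync(source: Path, mirror: Path):
    """Copy files from source to mirror when missing or changed."""
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        dst = mirror / src.relative_to(source)
        if not dst.exists() or sha256(src) != sha256(dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            print("synced", src.relative_to(source))

if __name__ == "__main__":
    sync(Path("/data/genotypes"), Path("/mirror/genotypes"))
```

In practice rsync and borg, already part of the backup stack, do this far better; the point is that file-level data can be mirrored piecemeal, which is much harder with a live monolithic database.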