- Genotype database
document created on Jun 03 2022 by Arun Isaac, last updated on Apr 27 2024 by Pjotr Prins
Database layout
genodb is an immutable functional database built on the LMDB key-value store. An immutable database may sound like an oxymoron, but is indeed possible and practical. More precisely...
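A toy model can show why immutability is practical. Values are stored under the hash of their content and are never overwritten, and each update only appends a new root to a list of versions. This is a minimal in-memory sketch that assumes nothing about genodb's actual on-disk layout. A Python dict stands in for LMDB, and the class name `ImmutableStore` is hypothetical.

```python
import hashlib
import json


class ImmutableStore:
    """Hypothetical content-addressed, append-only store (a dict stands in for LMDB)."""

    def __init__(self):
        self._blobs = {}     # content hash -> serialized value; entries are never replaced
        self._versions = []  # ordered history of root hashes

    def put(self, value):
        """Store a JSON-serializable value and return its content hash."""
        data = json.dumps(value, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(data).hexdigest()
        # Identical content always hashes to the same key, so re-inserting is a no-op.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key):
        return json.loads(self._blobs[key])

    def commit(self, value):
        """Record a new version. Earlier versions remain readable."""
        key = self.put(value)
        self._versions.append(key)
        return key

    def current(self):
        return self.get(self._versions[-1])

    def version(self, n):
        return self.get(self._versions[n])
```

An "update" is just a new commit. Nothing already stored changes, so readers holding an old root hash keep seeing a consistent snapshot.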
- Setting up Local Development Database
document created on Aug 19 2022 by Frederick Muriuki Muriithi, last updated on Aug 01 2024 by Arun Isaac
...--protocol tcp -u root
```
Create a database db_webqtl_s
```
MariaDB [mysql]> CREATE DATABASE db_webqtl_s;
```
Load the small database dump into the database. You may find the small database either...
- MariaDB
document created on May 10 2025 by Pjotr Prins, last updated on May 14 2025 by Frederick Muriuki Muriithi
...of the database running on production. We do this by restoring backups of the production database into MariaDB database directory. Here's how.
Backups are managed using Borg as the ibackup user. First...
- Invoking SQLite3: CLI
document created on Mar 30 2024 by Frederick Muriuki Muriithi
...modes, do:
```
$ sqlite3
SQLite version 3.40.0 2022-11-16 12:10:08
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
- Setting Up or Migrating Production Across Machines
document created on Apr 17 2025 by Frederick Muriuki Muriithi, last updated on Oct 07 2025 by Frederick Muriuki Muriithi
...the database.
### /var/lib/genenetwork/sqlite/genenetwork3
This binding must be READWRITE within the container.
This stores various SQLite databases in use with GN3. These are:
* Database for the GNQA...
- Installation
document created on Jun 18 2023 by Pjotr Prins, last updated on Feb 24 2025 by Pjotr Prins
...no
Restart=on-abort
RestartSec=15s
UMask=007
PrivateTmp=false
```
## Load the small database in MySQL
Currently we have two databases for deployment,
'db_webqtl_s' is the small testing database...
- Virtuoso
document created on Aug 27 2021 by Pjotr Prins, last updated on Aug 20 2025 by Pjotr Prins
...-database" repository
=> https://github.com/genenetwork/dump-genenetwork-database/ Dump Genenetwork Database
See the README for instructions.
For the public GN endpoint visit
=> https://sparql...
- Update production checklist
document created on May 10 2025 by Pjotr Prins, last updated on Sep 13 2025 by Pjotr Prins
...people have a minimal profile.
# Check database
* [X] Install mariadb
* [X] Recover database
* [X] Test permissions
* [X] Mariadb update my.cnf
Basically, recovering the database from a backup is the best...
- LMDB Phenotype/Genotype Store
document created on Jun 22 2023 by Munyoki Kilyungi, last updated on Jun 30 2023 by Alexander_Kabui
...bindings for lmdb for important stuff
* [B] Using hashes to track updates on database---proposal
Alex:
* Fetching all the phenotype data from the database using sql and genotype file
* Using LMDB...
- Developer links for
document created on May 06 2022 by Arun Isaac, last updated on Oct 16 2022 by Pjotr Prins
.../archive/dump-genenetwork-database/latest/sql.svg Continuous SQL schema visualization
=> https://ci.genenetwork.org/archive/dump-genenetwork-database/latest/rdf.svg Continuous RDF schema visualization
- Deploying gn-auth
document created on Mar 04 2024 by Frederick Muriuki Muriithi, last updated on Mar 12 2024 by Frederick Muriuki Muriithi
...yoyo apply --database sqlite:////home/fredm/auth-run-migrations.db ./migrations/auth/
[20221103_01_js9ub-initialise-the-auth-entic-oris-ation-database]
Shall I apply this migration? [Ynvdaqjk?]: Y...
- Synchronising the Different Environments
document created on Mar 12 2025 by Frederick Muriuki Muriithi
...Automate? Will probably need some checks for data sanity.
### Authorisation Database
* [ ] TODO: Describe process
* Copy backup from production
* Update/replace GN2 client configs in database
* What...
- Fire up system container for GN
document created on Mar 02 2024 by Pjotr Prins, last updated on Apr 05 2024 by Pjotr Prins
...see there is no GN database yet.
```
/gnu/store/xj4bfqch8zs3sfzvj65ykbvnpprwaj7f-mariadb-10.10.2/bin/mysql -e 'show databases'
```
MariaDB initialized a new database in /var/lib/mysql. We need to stop...
- CLI Utility Scripts
document created on May 29 2023 by Frederick Muriuki Muriithi, last updated on May 30 2023 by Frederick Muriuki Muriithi
...auth(entic|oris)ation database and the MariaDB database.
You could also run the script directly with:
```sh
python3 -m scripts.migrate_existing_data AUTHDBPATH MYSQLDBURI
```
where `AUTHDBPATH` and...
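The exact format of `MYSQLDBURI` is not spelled out in this excerpt. Assuming it is a URI of the form `mysql://user:password@host:port/dbname`, a script could split it into connection parameters with the standard library as shown below. `parse_mysql_uri` is a hypothetical helper, not part of the migration script.

```python
from urllib.parse import unquote, urlparse


def parse_mysql_uri(uri):
    """Split an (assumed) mysql://user:password@host:port/dbname URI into parts."""
    parsed = urlparse(uri)
    if parsed.scheme != "mysql":
        raise ValueError(f"expected a mysql:// URI, got {uri!r}")
    return {
        # urlparse does not percent-decode credentials, so decode them here.
        "user": unquote(parsed.username or ""),
        "password": unquote(parsed.password or ""),
        "host": parsed.hostname or "localhost",
        "port": parsed.port or 3306,  # MariaDB/MySQL default port
        "database": parsed.path.lstrip("/"),
    }
```

Percent-decoding matters when the password contains characters such as `@` or `:`, which must be escaped inside the URI.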
- GeneNetwork Uploader Requirements
document created on Feb 22 2024 by Frederick Muriuki Muriithi, last updated on Feb 22 2024 by Frederick Muriuki Muriithi
...database. This implies use of a data staging area, or even a separate testing database to hold the data. There might need to be a GeneNetwork system with access to the staging area or testing database...
- GeneNetwork SQL Database to RDF
document created on Mar 24 2023 by Pjotr Prins, last updated on Apr 04 2023 by Arun Isaac
GeneNetwork SQL Database to RDF
We use RDF in virtuoso to handle metadata for GN using
=> https://github.com/genenetwork/dump-genenetwork-database
See also
=> ../systems/virtuoso
- Improving RIF+WIKI Search
document created on Jul 04 2024 by Munyoki Kilyungi, last updated on Jul 05 2024 by Munyoki Kilyungi
...a SATA drive, compacting 3 different databases which had been compacted from 50 different databases was significantly faster than compacting one database at once from 150 different databases. The...
- Types of Data in GeneNetwork
document created on Jul 02 2024 by Frederick Muriuki Muriithi, last updated on Jul 10 2024 by Frederick Muriuki Muriithi
...`ProbeSet*` database tables (and other closely related tables like the `Tissue*` tables - fred added this: verify).
These could be saved in the database in a log-transformed form - verify.
How do you...
- Xapian indexing
document created on Oct 30 2022 by Arun Isaac, last updated on Sep 20 2024 by Munyoki Kilyungi
.../database/setting-up-local-development-database
and load up the backup file using:
> mariadb gn2 < /path/to/backup/file.sql
A backup file can be generated using:
> mysqldump -u mysqluser -pmysqlpasswd...
- GeneNetwork Hacking Documentation
document created on Mar 11 2022 by Frederick Muriuki Muriithi, last updated on Jul 18 2022 by Frederick Muriuki Muriithi
...the database.
* What are they?
Groups are linked to the Species.
### Studies
Stored in the *ProbeFreeze* table in the database.
Linked to platforms (ChipId), groups (InbredSetId) and tissues (TissueId).
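The following minimal sketch shows how a study resolves to its platform, group and tissue through those ID columns. It uses an in-memory SQLite database, cuts each table down to `Id` and `Name` columns, and uses made-up rows. The real MariaDB schema has more columns, and with a MySQL driver the query placeholder would be `%s` rather than `?`. `build_example_db` and `study_links` are hypothetical names.

```python
import sqlite3

# Simplified, assumed subset of the GN tables named above.
SCHEMA = """
CREATE TABLE InbredSet (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE GeneChip (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Tissue (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ProbeFreeze (
    Id INTEGER PRIMARY KEY,
    Name TEXT,
    ChipId INTEGER REFERENCES GeneChip(Id),
    InbredSetId INTEGER REFERENCES InbredSet(Id),
    TissueId INTEGER REFERENCES Tissue(Id)
);
INSERT INTO InbredSet VALUES (1, 'BXD');
INSERT INTO GeneChip VALUES (1, 'ExampleChip');
INSERT INTO Tissue VALUES (1, 'Hippocampus');
INSERT INTO ProbeFreeze VALUES (1, 'Example Study', 1, 1, 1);
"""

STUDY_QUERY = """
SELECT pf.Name, gc.Name, iset.Name, t.Name
FROM ProbeFreeze AS pf
JOIN GeneChip AS gc ON gc.Id = pf.ChipId          -- platform
JOIN InbredSet AS iset ON iset.Id = pf.InbredSetId -- group
JOIN Tissue AS t ON t.Id = pf.TissueId             -- tissue
WHERE pf.Name = ?
"""


def build_example_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def study_links(conn, study_name):
    """Return (study, platform, group, tissue) rows for the named study."""
    return conn.execute(STUDY_QUERY, (study_name,)).fetchall()
```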
- [gn-transform-databases/ADR-001] Remodel GeneRIF_BASIC (NCBI RIFs) Metadata Using predicateObject Lists
document created on Sep 07 2024 by Munyoki Kilyungi, last updated on Sep 20 2024 by Munyoki Kilyungi
...-databases/000-remodel-rif-transform-with-predicateobject-lists [ADR/gn-transform-databases] Remodel GeneRIF Metadata Using predicateObject Lists
However, currently for NCBI RIFs we represent comments...
- My Software Development Journey so far,
document created on Dec 04 2023 by fetche-lab, last updated on Jan 04 2024 by Lisso_
...database. This presents itself as a window of opportunity to improve the functionality of the uploader, where a user can directly update the names when discovering them to be missing in the database.
- Restore backup
document created on Feb 07 2023 by Pjotr Prins, last updated on Oct 18 2024 by Frederick Muriuki Muriithi
.../export2/mysql/borg-backup-mariadb-20211024-03:09-Sun
```
We typically run the database on an nvme partition. Check if there is enough space(!). It may be you need to remove the old database after...
- MariaDB Database Architecture
document created on Jun 29 2024 by Pjotr Prins, last updated on Feb 24 2025 by Pjotr Prins
...their
analogs):
- =StrainName=
- =OrderId=
- =StrainId=: from the database
- =InbredSetId=: from the database
- =Symbol=: This could be named =Strain=
- =GeneChipId=: from the database
- =EnsemblId...
- Add Metadata To The Trait Page (RDF)
document created on Sep 28 2022 by Munyoki Kilyungi, last updated on Dec 03 2023 by Pjotr Prins
...Trait Page (RDF)
Fri 30 Sep 2022 11:48:41 EAT
## Introduction
We are migrating the GN2 relational database to a plain text and RDF database. Matrix-like data (e.g. fetching sample data for a given...
- Precompute steps
document created on May 07 2024 by Pjotr Prins, last updated on Jul 10 2025 by Pjotr Prins
...database and can be loaded into a SQL database on demand. This is all to be able to distribute data and make sure we only compute once.
At this point we can write
```
{"2":9.40338,"3":10.196,"4"...
- Authentication/authorisation design
document created on Oct 17 2022 by Pjotr Prins, last updated on Dec 04 2023 by Frederick Muriuki Muriithi
...manageable. Any changes to the privileges shall require a system redeployment.
## Other Implementation Concerns
* Local database should be independent from other services and copied as a file (SQLite...
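If the local database is an SQLite file that gets copied around, one option is SQLite's online backup API. Python exposes it as `sqlite3.Connection.backup`, and it produces a consistent copy even while other connections use the source file, which a plain file copy does not guarantee. The function name and paths below are hypothetical.

```python
import sqlite3


def copy_sqlite_database(src_path, dest_path):
    """Copy an SQLite database file using SQLite's online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Copies all pages in one step (pages=-1 is the default).
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```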
- How to upgrade slurm on octopus
document created on Aug 29 2024 by Arun Isaac, last updated on Oct 18 2024 by Arun Isaac
...slurmdbd MySQL database. Enter the password when prompted. The password is specified in StoragePass of /etc/slurm/slurmdbd.conf.
```
$ mysqldump -u slurm -p --databases slurm_acct_db > /somewhere/safe/...
- Improving Metadata Audit
document created on Jul 25 2023 by Frederick Muriuki Muriithi, last updated on Aug 11 2023 by Frederick Muriuki Muriithi
...e.g. showing when a trait was last edited and by whom.
## Notes
### Saving Diffs in Database
It turns out, we only store diffs in the database that have been approved. The diffs awaiting processing...
- Partial Correlation
document created on Oct 15 2021 by Frederick Muriuki Muriithi, last updated on May 16 2022 by Frederick Muriuki Muriithi
...all and Add in search results.
Pick 3 and hit 'Partial'
Put one each in X, Y and Z columns
And compute against database (lower half).
That gives you a list of hits.
## Members
* fredm
* pjotp
* alex...
- Borg backups
document created on Feb 09 2025 by Pjotr Prins, last updated on Sep 13 2025 by Pjotr Prins
...fi
rundir=$(dirname "$0")
# ---- for sheepdog
source $rundir/sheepdog_env.sh
cd $rundir
sheepdog_borg.rb -t borg-tux04-sql --group ibackup -v -b /export/backup/borg/genenetwork /export/mysql/database/...
- Precompute mapping input data
document created on Apr 23 2025 by Pjotr Prins, last updated on Jul 29 2025 by Pjotr Prins
...about how locations are stored. We don't actually
> database locations in the ProbeSetXRef table - we only database the
> peak Locus marker name. This is then cross-referenced against the Geno...
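The cross-reference described above can be sketched as a join from the stored marker name to the marker's position. Columns are trimmed to an assumed minimal subset (`ProbeSetXRef.Locus`, `Geno.Name`, `Geno.Chr`, `Geno.Mb`). Filtering `Geno` by species is also an assumption, made on the basis that marker names need not be unique across species. `peak_location` is hypothetical.

```python
import sqlite3

# Assumed, simplified subset of the real tables.
SCHEMA = """
CREATE TABLE Geno (Id INTEGER PRIMARY KEY, SpeciesId INTEGER, Name TEXT, Chr TEXT, Mb REAL);
CREATE TABLE ProbeSetXRef (ProbeSetId INTEGER, Locus TEXT, LRS REAL);
"""

# Resolve the stored peak marker name to a chromosome and position.
PEAK_QUERY = """
SELECT x.ProbeSetId, x.Locus, g.Chr, g.Mb, x.LRS
FROM ProbeSetXRef AS x
JOIN Geno AS g ON g.Name = x.Locus AND g.SpeciesId = ?
WHERE x.ProbeSetId = ?
"""


def peak_location(conn, probeset_id, species_id):
    """Return (probeset, marker, chr, Mb, LRS) or None if nothing matches."""
    return conn.execute(PEAK_QUERY, (species_id, probeset_id)).fetchone()
```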
- Adding Species
document created on Apr 24 2024 by Frederick Muriuki Muriithi, last updated on Apr 24 2024 by Frederick Muriuki Muriithi
...the Taxonomy ID value:
=> https://www.ncbi.nlm.nih.gov/ Go to NCBI
* In the "All Databases" drop-down, select "Taxonomy"
* In the search box, enter the species' FullName, e.g. Caenorhabditis elegans...
- AI Community Symposium at iHub
document created on Jan 26 2023 by Brian Muhia, last updated on Jan 30 2023 by Pjotr Prins
...entails converting a traditional SQL database, comprising over 80 tables, to RDF, the language of the semantic web. The goal of this conversion is to leverage the benefits of RDF databases, which are...
- [gn-transform-databases/ADR-002] Remodel GeneRIF_BASIC (NCBI RIFs) Metadata To Be More Compact
document created on Sep 20 2024 by Munyoki Kilyungi
...-transform-databases/ADR-002] Remodel GeneRIF_BASIC (NCBI RIFs) Metadata To Be More Compact
* author: bonfacem
* status: proposal
* reviewed-by: pjotr, jnduli
## Context
Currently, we represent NCBI...
- [gn-transform-databases/ADR-000] Remodel GeneRIF Metadata Using predicateObject Lists
document created on Sep 07 2024 by Munyoki Kilyungi, last updated on Sep 20 2024 by Munyoki Kilyungi
...-databases/ADR-000] Remodel GeneRIF Metadata Using predicateObject Lists
* author: bonfacem
* status: rejected
* reviewed-by: pjotr, jnduli
## Context
In RDF 1.1 Turtle, you have to use a Qname...
- Guix system containers and how we use them
document created on Mar 31 2023 by Arun Isaac, last updated on Mar 25 2024 by Pjotr Prins
...to retain some state. Think a database server needing to persist its database directory, a web server needing to persist its logs, etc. To allow this persistence, we expose (read-only) or share (read-...
- Data Upload Process
document created on Apr 30 2024 by Munyoki Kilyungi, last updated on Apr 30 2024 by Munyoki Kilyungi
...Study for Breast Cancer Dataset".
# Challenges Faced and Solutions
During the data upload process, I encountered several challenges that required solutions. One challenge was identified as a database...
- Update production
document created on Dec 07 2022 by Pjotr Prins, last updated on Dec 09 2022 by Pjotr Prins
...database in a file 'data.ttl' we can test it for correctness with:
```
tux01:~$ rapper --input turtle --count dump.ttl
rapper: Parsing URI file:///home/wrk/dump.ttl with parser turtle
rapper: Parsing...
- Profiling Python code
document created on Dec 03 2023 by Pjotr Prins
...define how to connect to the database.
* `the-script.py` is the name of the python script to be run under the profiler
The output can be redirected, e.g.
* env [various-env-vars] python3 -m cProfile...
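The same standard-library profiler can also be driven from inside Python, which is handy for profiling a single function such as one database query rather than a whole script. This is a minimal sketch using `cProfile.Profile.runcall` and `pstats`. `profile_call` is a hypothetical helper.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep only the `top` most expensive entries.
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return result, stream.getvalue()
```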
- Using autossh to Keep SSH Tunnels Alive
document created on Apr 23 2025 by Alexander_Kabui, last updated on May 14 2025 by Frederick Muriuki Muriithi
...[-M monitor_port[:echo_port]] [-f] [SSH_OPTIONS]
```
## Examples
### Keep a database tunnel alive with autossh
Forward a remote MySQL port to your local machine:
**Using plain SSH:**
```
ssh -L 5000...
- This document has some useful SQL tricks
document created on Aug 08 2023 by Munyoki Kilyungi, last updated on Aug 11 2023 by Munyoki Kilyungi
- Queries for fetching/editing metadata
document created on Jul 22 2023 by Munyoki Kilyungi
- R/qtl2
document created on Oct 07 2024 by Pjotr Prins
...database, gemma, reaper, rqtl2
# Description
R/qtl2 handles multi-parent populations, such as DO, HS rat and the collaborative cross (CC). It also comes with an LMM implementation. Here we describe...
- Modifying dump macros
document created on Jul 01 2023 by Munyoki Kilyungi
- GN-AUTH FAQ
document created on Aug 19 2024 by John Nduli, last updated on Aug 19 2024 by John Nduli
...sqlite3 auth database.
## Errors related to unsupported clients/redirect URIs for client
Rerun
```
FLASK_DEBUG=1 AUTHLIB_INSECURE_TRANSPORT=1 OAUTHLIB_INSECURE_TRANSPORT=1 \
GN_AUTH_CONF=/absolute/...
- Running postgres in a Guix container
document created on Jan 17 2023 by Pjotr Prins
...this
```
. ~/opt/postgresql14/etc/profile
psql test
\dt
etc etc
```
## More
=> https://fluca1978.github.io/2021/09/30/GNU_GUIX_PostgreSQL.html
=> https://guix.gnu.org/cookbook/en/html_node/A-Database-...
- Xapian search
document created on May 02 2023 by Arun Isaac, last updated on Dec 03 2023 by Pjotr Prins
...retrieves data using several SQL queries and indexes them to build the index. Due to the enormous size of the GeneNetwork database, this is quite an expensive operation and relies on various tricks to...
- Genotypes, Assemblies, Markers and GeneNetwork
document created on Sep 23 2024 by Frederick Muriuki Muriithi
...assembly, markers, data, database, genenetwork, uploader
## Markers
```
The marker is the SNP…
— Rob (Paraphrased)
```
SNPs (Single Nucleotide Polymorphisms) are specific locations of interest...
- Working with Virtuoso for Local Development
document created on Sep 20 2024 by Munyoki Kilyungi, last updated on Sep 20 2024 by Munyoki Kilyungi
.../database/folder
cp $HOME/.guix-profile/var/lib/virtuoso/db/virtuoso.ini ./virtuoso.ini
# modify the virtuoso.ini file to save files to the folder you'd prefer
virtuoso-t +foreground +wait +debug...
- Utility Scripts
document created on Jun 05 2023 by Frederick Muriuki Muriithi, last updated on Dec 03 2023 by Pjotr Prins
...certain things that do not lend themselves to automation very well.
This is especially relevant for any script that might need to interact with the SQLite database.
This document notes some gotchas...
- Ontologies
document created on Jul 31 2023 by Munyoki Kilyungi, last updated on Oct 11 2023 by Munyoki Kilyungi
- Orchestration and fallbacks
document created on Sep 02 2022 by Pjotr Prins, last updated on Oct 25 2022 by Pjotr Prins
...Partial synchronization between data sources
The only way we *can* scale is by adding machines. But the system is not yet ready for that. Also getting rid of monolithic primary databases in favor...
- GNSoC 2023
document created on Jun 21 2023 by Pjotr Prins, last updated on Aug 24 2023 by Pjotr Prins
...Nextgen databases
lmdb+RDF
* lead: Bonface
* team: Fred, Alex
* contact: Pjotr
git repo genenetwork3
=> ../../topics/next-gen-databases/design-doc Design doc
### Week 1
* RDF dumps
* Parsing S-exp...
- Adding Quantitative Tracks Using BigWig Files
document created on Nov 14 2023 by cel7t, last updated on Dec 03 2023 by Pjotr Prins
...database
Use the fetchChromSizes binary to create chrom.sizes files for the existing wig files
=> http://hgdownload.soe.ucsc.edu/admin/exe/ fetchChromSizes binary location
### Converting wig files...
- Phenotype Naming Conventions
document created on Nov 23 2022 by Munyoki Kilyungi, last updated on Dec 03 2023 by Pjotr Prins
...for worse, we are apparently one of the major curators for formats for phenotype abbreviations. Perhaps we need to formalize this with the Phenome Database team.
Given the above concerns, the real way...
- Understanding GN's Classification Scheme
document created on Aug 29 2023 by Munyoki Kilyungi, last updated on Aug 31 2023 by Munyoki Kilyungi
- Queries and Prepared Statements in Python
document created on Sep 27 2022 by Frederick Muriuki Muriithi, last updated on Dec 03 2023 by Pjotr Prins
- Fire up system container for GN-QA System
document created on May 13 2024 by Munyoki Kilyungi, last updated on May 13 2024 by Munyoki Kilyungi
..."/tmp"))
(environment-variable
(name "AUTHLIB_INSECURE_TRANSPORT")
(value "true"))))
(mappings (list database-mapping...
- Meeting Notes
document created on Jun 11 2024 by John Nduli, last updated on Jan 10 2025 by Munyoki Kilyungi
...-auth
* @jgart enabling acme service in genecup and rshiny containers.
* @jnduli and @bmunyoki to attempt to get familiar with R2R
Nice to have:
* @bmunyoki fix CI job for GN transformer database i.e.
- Reasons There is HTML and CSS in GN3
document created on Jul 26 2023 by Frederick Muriuki Muriithi
...to the database, but you would still need the user to authenticate themselves (to prevent randos from registering clients willy-nilly).
## Footnotes
=> https://oauth.net/2/grant-types/ fn:grant-types...
- Epochs
document created on Nov 04 2025 by Pjotr Prins, last updated 4 weeks ago by Pjotr Prins
...Tracking the epochs is happening in spreadsheet. According to track changes only one item was changed in two years - BXD10 was marked as extinct.
In the GN SQL database Epoch with its RRID is stored...
- PanGEMMA Genotype Format
document created 5 weeks ago by Pjotr Prins, last updated 5 weeks ago by Pjotr Prins
...database will therefore always be the *first* version. These records make it possible to roll forward on changes and present an updated genotype matrix. Used genotypes are retained. This, naturally, can...
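The roll-forward idea can be sketched as follows. The first version of the matrix is kept unchanged, and ordered change records are replayed on a copy to produce the updated matrix. The record shape (`marker`, `sample`, `value`) and the function names are hypothetical and do not describe the actual PanGEMMA format. The genotype letters are example values only.

```python
from copy import deepcopy
from typing import NamedTuple


class GenotypeChange(NamedTuple):
    """Hypothetical change record: set one genotype call."""
    marker: str
    sample: str
    value: str


def roll_forward(first_version, changes):
    """Replay change records in order on a copy of the first version.

    first_version maps marker -> {sample: genotype}. It is never modified,
    so any earlier state can be rebuilt by replaying a prefix of `changes`.
    """
    matrix = deepcopy(first_version)
    for change in changes:
        matrix[change.marker][change.sample] = change.value
    return matrix
```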
- Automated Testing
document created on Oct 12 2022 by Munyoki Kilyungi, last updated on Dec 03 2023 by Pjotr Prins
...given data
* Database-querying functions used in the system respond within specified amount of time
etc.
This is relevant since GN3 is behind Nginx which defines a timeout.
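A response-time check of this kind might look like the test below. The time budget is a hypothetical placeholder that should stay below the Nginx timeout actually configured for GN3, and `example_query` is a stand-in for a real database-querying function.

```python
import time
import unittest

# Hypothetical budget; keep it below the Nginx timeout configured for GN3.
QUERY_TIME_BUDGET_SECONDS = 2.0


def example_query():
    """Stand-in for a real database-querying function."""
    time.sleep(0.01)
    return ["row"]


class QueryResponseTimeTest(unittest.TestCase):
    def test_query_within_budget(self):
        start = time.perf_counter()
        result = example_query()
        elapsed = time.perf_counter() - start
        self.assertTrue(result)
        self.assertLess(elapsed, QUERY_TIME_BUDGET_SECONDS)
```

Saved as a test module, this runs with `python3 -m unittest`. Against a real database, timings vary with load, so a generous margin avoids flaky failures.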
### Regression Tests
Checks...
- OAuth2
document created on May 29 2023 by Frederick Muriuki Muriithi, last updated on Jun 07 2023 by Frederick Muriuki Muriithi
...data and computational requests.
=> https://gitlab.com/fredmanglis/gnqc_py QC and Data Upload
Provides a means to upload new data into the Genenetwork database. It does perform some quality-control...
- Editing Data
document created on Nov 23 2021 by BonfaceKilz, last updated on Jul 25 2023 by Frederick Muriuki Muriithi
...-u webqtlout db_webqtl < metadata_audit.sql
```
And check this works
```
select * FROM information_schema.COLUMNS WHERE table_schema=DATABASE() AND TABLE_NAME='metadata_audit';
```
For everything...
- Developing against GeneNetwork
document created on Jun 18 2023 by Pjotr Prins, last updated on Dec 03 2023 by Pjotr Prins
...org/api/v_pre1/gen_dropdown
```
check the logs. If there is ERROR 1054 (42S22): Unknown column
'InbredSet.Family' in 'field list' it may be you are trying the small
database.
### Run Scripts
As part...
- Backup Drops
document created on Sep 22 2025 by Pjotr Prins, last updated on Sep 24 2025 by Johannes Medagbe
...database
# Info
## Borg backups
Despite our precautions it is advised to use a backup password and *not* store that on the remote.
## Running sheepdog on rabbit
=> https://github.com/pjotrp/deploy...
- Precompute PublishData
document created on Oct 29 2025 by Pjotr Prins, last updated 5 weeks ago by Pjotr Prins
...-guile server and fire up a batch script that pulls the data from the database and runs gemma for every step.
To get precompute going we need a server set up with a recent database. I don't want to use...
- Debugging and developing code
document created on Mar 19 2024 by Pjotr Prins, last updated on Aug 20 2025 by Pjotr Prins
...special branch for now.
Databases and files will simply be shared on default paths - /export/guix-containers/gndev/...
And if you need different combinations it should be relatively easy to compose a...
- Migrate GN1 Clustering
document created on Jul 18 2021 by Pjotr Prins, last updated on Mar 22 2022 by Frederick Muriuki Muriithi
...'category': 'C57BL/6J +'}, ..., {'value': 1.3089193078506003, 'category': 'C57BL/6J +'}]]
```
but that did work as expected.
Paused on heatmap generation to first test out the database access code.