- Genotype database
document created on Jun 03 2022 by Arun Isaac, last updated on Apr 27 2024 by Pjotr Prins
Database layout
genodb is an immutable functional database built on the LMDB key-value store. An immutable database may sound like an oxymoron, but it is indeed possible and practical. More precisely...
- Setting up Local Development Database
document created on Aug 19 2022 by Frederick Muriuki Muriithi, last updated on Aug 01 2024 by Arun Isaac
...--protocol tcp -u root
```
Create a database db_webqtl_s
```
MariaDB [mysql]> CREATE DATABASE db_webqtl_s;
```
Load the small database dump into the database. You may find the small database either...
- Invoking SQLite3: CLI
document created on Mar 30 2024 by Frederick Muriuki Muriithi
...modes, do:
```
$ sqlite3
SQLite version 3.40.0 2022-11-16 12:10:08
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
- MariaDB
document created on Oct 26 2021 by Pjotr Prins, last updated 7 days ago by Pjotr Prins
...of the database running on production. We do this by restoring backups of the production database into the MariaDB database directory. Here's how.
Backups are managed using Borg as the ibackup user. First...
- Virtuoso
document created on Aug 27 2021 by Pjotr Prins, last updated on Sep 20 2024 by Munyoki Kilyungi
...MySQL database
See also
=> ../RDF/genenetwork-sql-database-to-rdf
To dump data into a ttl file, first make sure that you are in the guix environment in the "dump-genenetwork-database" repository...
- Installation
document created on Jun 18 2023 by Pjotr Prins, last updated 2 weeks ago by Pjotr Prins
...no
Restart=on-abort
RestartSec=15s
UMask=007
PrivateTmp=false
```
## Load the small database in MySQL
Currently we have two databases for deployment,
'db_webqtl_s' is the small testing database...
- Developer links for
document created on May 06 2022 by Arun Isaac, last updated on Oct 16 2022 by Pjotr Prins
.../archive/dump-genenetwork-database/latest/sql.svg Continuous SQL schema visualization
=> https://ci.genenetwork.org/archive/dump-genenetwork-database/latest/rdf.svg Continuous RDF schema visualization
- LMDB Phenotype/Genotype Store
document created on Jun 22 2023 by Munyoki Kilyungi, last updated on Jun 30 2023 by Alexander_Kabui
...and completeness in a data store?
* [C] guile bindings for lmdb for important stuff
* [B] Using hashes to track updates on database---proposal
Alex:
* Fetching all the phenotype data from the database...
- Deploying gn-auth
document created on Mar 04 2024 by Frederick Muriuki Muriithi, last updated on Mar 12 2024 by Frederick Muriuki Muriithi
...yoyo apply --database sqlite:////home/fredm/auth-run-migrations.db ./migrations/auth/
[20221103_01_js9ub-initialise-the-auth-entic-oris-ation-database]
Shall I apply this migration? [Ynvdaqjk?]: Y...
- Synchronising the Different Environments
document created 14 hours ago by Frederick Muriuki Muriithi
...Automate? Will probably need some checks for data sanity.
### Authorisation Database
* [ ] TODO: Describe process
* Copy backup from production
* Update/replace GN2 client configs in database
* What...
- GeneNetwork SQL Database to RDF
document created on Mar 24 2023 by Pjotr Prins, last updated on Apr 04 2023 by Arun Isaac
GeneNetwork SQL Database to RDF
We use RDF in virtuoso to handle metadata for GN using
=> https://github.com/genenetwork/dump-genenetwork-database
See also
=> ../systems/virtuoso
- CLI Utility Scripts
document created on May 29 2023 by Frederick Muriuki Muriithi, last updated on May 30 2023 by Frederick Muriuki Muriithi
...auth(entic|oris)ation database and the MariaDB database.
You could also run the script directly with:
```sh
python3 -m scripts.migrate_existing_data AUTHDBPATH MYSQLDBURI
```
where `AUTHDBPATH` and...
- Fire up system container for GN
document created on Mar 02 2024 by Pjotr Prins, last updated on Apr 05 2024 by Pjotr Prins
...see there is no GN database yet.
```
/gnu/store/xj4bfqch8zs3sfzvj65ykbvnpprwaj7f-mariadb-10.10.2/bin/mysql -e 'show databases'
```
MariaDB initialized a new database in /var/lib/mysql. We need to stop...
- GeneNetwork Uploader Requirements
document created on Feb 22 2024 by Frederick Muriuki Muriithi, last updated on Feb 22 2024 by Frederick Muriuki Muriithi
...database. This implies use of a data staging area, or even a separate testing database to hold the data. There might need to be a GeneNetwork system with access to the staging area or testing database...
- Types of Data in GeneNetwork
document created on Jul 02 2024 by Frederick Muriuki Muriithi, last updated on Jul 10 2024 by Frederick Muriuki Muriithi
...`ProbeSet*` database tables (and other closely related tables like the `Tissue*` tables - fred added this: verify).
These could be saved in the database in a log-transformed form - verify.
How do you...
- Improving RIF+WIKI Search
document created on Jul 04 2024 by Munyoki Kilyungi, last updated on Jul 05 2024 by Munyoki Kilyungi
...3 different databases which had been compacted from 50 different databases was significantly faster than compacting one database at once from 150 different databases. The conclusion we could draw...
- Xapian indexing
document created on Oct 30 2022 by Arun Isaac, last updated on Sep 20 2024 by Munyoki Kilyungi
.../database/setting-up-local-development-database
and load up the backup file using:
> mariadb gn2 < /path/to/backup/file.sql
A backup file can be generated using:
> mysqldump -u mysqluser -pmysqlpasswd...
- [gn-transform-databases/ADR-001] Remodel GeneRIF_BASIC (NCBI RIFs) Metadata Using predicateObject Lists
document created on Sep 07 2024 by Munyoki Kilyungi, last updated on Sep 20 2024 by Munyoki Kilyungi
...-databases/000-remodel-rif-transform-with-predicateobject-lists [ADR/gn-transform-databases] Remodel GeneRIF Metadata Using predicateObject Lists
However, currently for NCBI RIFs we represent comments...
- GeneNetwork Hacking Documentation
document created on Mar 11 2022 by Frederick Muriuki Muriithi, last updated on Jul 18 2022 by Frederick Muriuki Muriithi
...the database.
* What are they?
Groups are linked to the Species.
### Studies
Stored in the *ProbeFreeze* table in the database.
Linked to platforms (ChipId), groups (InbredSetId) and tissues (TissueId).
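The linkage described above can be pictured with a small, self-contained sketch. Only the table name ProbeFreeze and the three linking columns (ChipId, InbredSetId, TissueId) come from the text; the Id and Name columns, the sample rows, the helper names and the use of an in-memory SQLite database are assumptions made so that the example runs on its own.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for the ProbeFreeze table (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ProbeFreeze ("
        "Id INTEGER PRIMARY KEY, Name TEXT, "
        "ChipId INTEGER, InbredSetId INTEGER, TissueId INTEGER)"
    )
    # Made-up rows: two studies in group 1, one study in group 2.
    conn.executemany(
        "INSERT INTO ProbeFreeze (Name, ChipId, InbredSetId, TissueId) "
        "VALUES (?, ?, ?, ?)",
        [("study-a", 10, 1, 5), ("study-b", 11, 1, 6), ("study-c", 10, 2, 5)],
    )
    conn.commit()
    return conn


def studies_for_group(conn, inbred_set_id):
    """Return (Name, ChipId, TissueId) for every study linked to a group."""
    cursor = conn.execute(
        "SELECT Name, ChipId, TissueId FROM ProbeFreeze "
        "WHERE InbredSetId = ? ORDER BY Id",
        (inbred_set_id,),
    )
    return cursor.fetchall()
```

Calling `studies_for_group(make_demo_db(), 1)` lists the two demo studies of group 1 together with their platform and tissue identifiers.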
- My Software Development Journey so far,
document created on Dec 04 2023 by fetche-lab, last updated on Jan 04 2024 by Lisso_
...database. This presents itself as a window of opportunity to improve the functionality of the uploader, where a user can directly update the names when discovering them to be missing in the database.
- Restore backup
document created on Feb 07 2023 by Pjotr Prins, last updated on Oct 18 2024 by Frederick Muriuki Muriithi
.../borg-backup-mariadb-20211024-03:09-Sun
```
We typically run the database on an nvme partition. Check that there is enough space(!). You may need to remove the old database after making a backup...
- Add Metadata To The Trait Page (RDF)
document created on Sep 28 2022 by Munyoki Kilyungi, last updated on Dec 03 2023 by Pjotr Prins
...Trait Page (RDF)
Fri 30 Sep 2022 11:48:41 EAT
## Introduction
We are migrating the GN2 relational database to a plain text and RDF database. Matrix-like data (e.g. fetching sample data for a given...
- MariaDB Database Architecture
document created on Jun 29 2024 by Pjotr Prins, last updated 2 weeks ago by Pjotr Prins
...their
analogs):
- =StrainName=
- =OrderId=
- =StrainId=: from the database
- =InbredSetId=: from the database
- =Symbol=: This could be named =Strain=
- =GeneChipId=: from the database
- =EnsemblId...
- Borg backups
document created 5 weeks ago by Pjotr Prins, last updated 7 days ago by Pjotr Prins
...fi
rundir=$(dirname "$0")
# ---- for sheepdog
source $rundir/sheepdog_env.sh
cd $rundir
sheepdog_borg.rb -t borg-tux04-sql --group ibackup -v -b /export/backup/borg/genenetwork /export/mysql/database/...
- Update production checklist
document created 7 days ago by Pjotr Prins, last updated 7 days ago by Pjotr Prins
...trim in CRON
# Check database
=> topics/systems/mariadb/mariadb.gmi
# Check sending E-mails
The swaks package is quite useful to test for a valid receive host:
```
swaks --to testing-my-server@gmail...
- Precompute steps
document created on May 07 2024 by Pjotr Prins, last updated on Aug 16 2024 by Pjotr Prins
...database and can be loaded into a SQL database on demand. This is all to be able to distribute data and make sure we only compute once.
At this point we can write
```
{"2":9.40338,"3":10.196,"4"...
- Authentication/authorisation design
document created on Oct 17 2022 by Pjotr Prins, last updated on Dec 04 2023 by Frederick Muriuki Muriithi
...manageable. Any changes to the privileges shall require a system redeployment.
## Other Implementation Concerns
* Local database should be independent from other services and copied as a file (SQLite...
- How to upgrade slurm on octopus
document created on Aug 29 2024 by Arun Isaac, last updated on Oct 18 2024 by Arun Isaac
...slurmdbd MySQL database. Enter the password when prompted. The password is specified in StoragePass of /etc/slurm/slurmdbd.conf.
```
$ mysqldump -u slurm -p --databases slurm_acct_db > /somewhere/safe/...
- Improving Metadata Audit
document created on Jul 25 2023 by Frederick Muriuki Muriithi, last updated on Aug 11 2023 by Frederick Muriuki Muriithi
...e.g. showing when a trait was last edited and by whom.
## Notes
### Saving Diffs in Database
It turns out that we only store approved diffs in the database. The diffs awaiting processing...
- Partial Correlation
document created on Oct 15 2021 by Frederick Muriuki Muriithi, last updated on May 16 2022 by Frederick Muriuki Muriithi
...all and Add in search results.
Pick 3 and hit 'Partial'
Put one each in X, Y and Z columns
And compute against database (lower half).
That gives you a list of hits.
## Members
* fredm
* pjotp
* alex...
- Precompute mapping input data
document created on Mar 20 2023 by Pjotr Prins, last updated on Jun 03 2024 by Pjotr Prins
...about how locations are stored. We don't actually
> database locations in the ProbeSetXRef table - we only database the
> peak Locus marker name. This is then cross-referenced against the Geno...
- Adding Species
document created on Apr 24 2024 by Frederick Muriuki Muriithi, last updated on Apr 24 2024 by Frederick Muriuki Muriithi
...the Taxonomy ID value:
=> https://www.ncbi.nlm.nih.gov/ Go to NCBI
* In the "All Databases" drop-down, select "Taxonomy"
* In the search box, enter the species' FullName, e.g. Caenorhabditis elegans...
- AI Community Symposium at iHub
document created on Jan 26 2023 by Brian Muhia, last updated on Jan 30 2023 by Pjotr Prins
...entails converting a traditional SQL database, comprising over 80 tables, to RDF, the language of the semantic web. The goal of this conversion is to leverage the benefits of RDF databases, which are...
- [gn-transform-databases/ADR-002] Remodel GeneRIF_BASIC (NCBI RIFs) Metadata To Be More Compact
document created on Sep 20 2024 by Munyoki Kilyungi
...-transform-databases/ADR-002] Remodel GeneRIF_BASIC (NCBI RIFs) Metadata To Be More Compact
* author: bonfacem
* status: proposal
* reviewed-by: pjotr, jnduli
## Context
Currently, we represent NCBI...
- [gn-transform-databases/ADR-000] Remodel GeneRIF Metadata Using predicateObject Lists
document created on Sep 07 2024 by Munyoki Kilyungi, last updated on Sep 20 2024 by Munyoki Kilyungi
...-databases/ADR-000] Remodel GeneRIF Metadata Using predicateObject Lists
* author: bonfacem
* status: rejected
* reviewed-by: pjotr, jnduli
## Context
In RDF 1.1 Turtle, you have to use a Qname...
- Guix system containers and how we use them
document created on Mar 31 2023 by Arun Isaac, last updated on Mar 25 2024 by Pjotr Prins
...to retain some state. Think of a database server needing to persist its database directory, a web server needing to persist its logs, etc. To allow this persistence, we expose (read-only) or share (read-...
- Update production
document created on Dec 07 2022 by Pjotr Prins, last updated on Dec 09 2022 by Pjotr Prins
...database in a file 'dump.ttl' we can test it for correctness with:
```
tux01:~$ rapper --input turtle --count dump.ttl
rapper: Parsing URI file:///home/wrk/dump.ttl with parser turtle
rapper: Parsing...
- Profiling Python code
document created on Dec 03 2023 by Pjotr Prins
...define how to connect to the database.
* `the-script.py` is the name of the python script to be run under the profiler
The output can be redirected, e.g.
* env [various-env-vars] python3 -m cProfile...
- This document has some useful SQL tricks
document created on Aug 08 2023 by Munyoki Kilyungi, last updated on Aug 11 2023 by Munyoki Kilyungi
- Data Upload Process
document created on Apr 30 2024 by Munyoki Kilyungi, last updated on Apr 30 2024 by Munyoki Kilyungi
...Study for Breast Cancer Dataset".
# Challenges Faced and Solutions
During the data upload process, I encountered several challenges that required solutions. One challenge was identified as a database...
- R/qtl2
document created on Oct 07 2024 by Pjotr Prins
...database, gemma, reaper, rqtl2
# Description
R/qtl2 handles multi-parent populations, such as DO, HS rat and the collaborative cross (CC). It also comes with an LMM implementation. Here we describe...
- Queries for fetching/editing metadata
document created on Jul 22 2023 by Munyoki Kilyungi
- Modifying dump macros
document created on Jul 01 2023 by Munyoki Kilyungi
- GN-AUTH FAQ
document created on Aug 19 2024 by John Nduli, last updated on Aug 19 2024 by John Nduli
...sqlite3 auth database.
## Errors related to unsupported clients/redirect URIs for client
Rerun
```
FLASK_DEBUG=1 AUTHLIB_INSECURE_TRANSPORT=1 OAUTHLIB_INSECURE_TRANSPORT=1 \
GN_AUTH_CONF=/absolute/...
- Running postgres in a Guix container
document created on Jan 17 2023 by Pjotr Prins
...this
```
. ~/opt/postgresql14/etc/profile
psql test
\dt
etc etc
```
## More
=> https://fluca1978.github.io/2021/09/30/GNU_GUIX_PostgreSQL.html
=> https://guix.gnu.org/cookbook/en/html_node/A-Database-...
- Genotypes, Assemblies, Markers and GeneNetwork
document created on Sep 23 2024 by Frederick Muriuki Muriithi
...assembly, markers, data, database, genenetwork, uploader
## Markers
```
The marker is the SNP…
— Rob (Paraphrased)
```
SNPs (Single Nucleotide Polymorphisms) are specific locations of interest...
- Working with Virtuoso for Local Development
document created on Sep 20 2024 by Munyoki Kilyungi, last updated on Sep 20 2024 by Munyoki Kilyungi
.../database/folder
cp $HOME/.guix-profile/var/lib/virtuoso/db/virtuoso.ini ./virtuoso.ini
# modify the virtuoso.ini file to save files to the folder you'd prefer
virtuoso-t +foreground +wait +debug...
- Utility Scripts
document created on Jun 05 2023 by Frederick Muriuki Muriithi, last updated on Dec 03 2023 by Pjotr Prins
...certain things that do not lend themselves to automation very well.
This is especially relevant for any script that might need to interact with the SQLite database.
This document notes some gotchas...
- Ontologies
document created on Jul 31 2023 by Munyoki Kilyungi, last updated on Oct 11 2023 by Munyoki Kilyungi
- Xapian search
document created on May 02 2023 by Arun Isaac, last updated on Dec 03 2023 by Pjotr Prins
...retrieves data using several SQL queries and indexes them to build the index. Due to the enormous size of the GeneNetwork database, this is quite an expensive operation and relies on various tricks to...
- Orchestration and fallbacks
document created on Sep 02 2022 by Pjotr Prins, last updated on Oct 25 2022 by Pjotr Prins
...Partial synchronization between data sources
The only way we *can* scale is by adding machines. But the system is not yet ready for that. Also getting rid of monolithic primary databases in favor...
- GNSoC 2023
document created on Jun 21 2023 by Pjotr Prins, last updated on Aug 24 2023 by Pjotr Prins
...Nextgen databases
lmdb+RDF
* lead: Bonface
* team: Fred, Alex
* contact: Pjotr
git repo genenetwork3
=> ../../topics/next-gen-databases/design-doc Design doc
### Week 1
* RDF dumps
* Parsing S-exp...
- Adding Quantitative Tracks Using BigWig Files
document created on Nov 14 2023 by cel7t, last updated on Dec 03 2023 by Pjotr Prins
...database
Use the fetchChromSizes binary to create chrom.sizes files for the existing wig files
=> http://hgdownload.soe.ucsc.edu/admin/exe/ fetchChromSizes binary location
### Converting wig files...
- Phenotype Naming Conventions
document created on Nov 23 2022 by Munyoki Kilyungi, last updated on Dec 03 2023 by Pjotr Prins
...for worse, we are apparently one of the major curators for formats for phenotype abbreviations. Perhaps we need to formalize this with the Phenome Database team.
Given the above concerns, the real way...
- Understanding GN's Classification Scheme
document created on Aug 29 2023 by Munyoki Kilyungi, last updated on Aug 31 2023 by Munyoki Kilyungi
- Queries and Prepared Statements in Python
document created on Sep 27 2022 by Frederick Muriuki Muriithi, last updated on Dec 03 2023 by Pjotr Prins
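As a minimal sketch of the idea named in this title, the snippet below passes user input as a bound parameter instead of formatting it into the SQL string. It uses Python's standard sqlite3 module with an in-memory database purely so that it runs anywhere; the Species table, its columns and the function names are illustrative assumptions, not the GeneNetwork schema. Note that MySQL/MariaDB drivers such as MySQLdb use `%s` rather than `?` as the placeholder.

```python
import sqlite3


def make_demo_connection():
    """Create an in-memory database with a tiny, made-up Species table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Species (Id INTEGER PRIMARY KEY, Name TEXT)")
    conn.executemany(
        "INSERT INTO Species (Name) VALUES (?)", [("mouse",), ("rat",)]
    )
    conn.commit()
    return conn


def fetch_species_by_name(conn, name):
    """Return (Id, Name) rows whose Name equals `name`."""
    cursor = conn.cursor()
    # The `?` placeholder leaves quoting to the driver; `name` is never
    # spliced into the SQL text, so injection attempts match nothing.
    cursor.execute("SELECT Id, Name FROM Species WHERE Name = ?", (name,))
    return cursor.fetchall()
```

With the demo data, `fetch_species_by_name(conn, "rat")` returns the single matching row, while an input such as `"rat' OR '1'='1"` is treated as a literal name and returns no rows.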
- Fire up system container for GN-QA System
document created on May 13 2024 by Munyoki Kilyungi, last updated on May 13 2024 by Munyoki Kilyungi
..."/tmp"))
(environment-variable
(name "AUTHLIB_INSECURE_TRANSPORT")
(value "true"))))
(mappings (list database-mapping...
- Reasons There is HTML and CSS in GN3
document created on Jul 26 2023 by Frederick Muriuki Muriithi
...to the database, but you would still need the user to authenticate themselves (to prevent randos from registering clients willy-nilly).
## Footnotes
=> https://oauth.net/2/grant-types/ fn:grant-types...
- Meeting Notes
document created on Jun 11 2024 by John Nduli, last updated on Jan 10 2025 by Munyoki Kilyungi
...-auth
* @jgart enabling acme service in genecup and rshiny containers.
* @jnduli and @bmunyoki to attempt to get familiar with R2R
Nice to have:
* @bmunyoki fix CI job for GN transformer database i.e.
- Backup Drops
document created on Oct 27 2022 by Pjotr Prins, last updated 4 weeks ago by Pjotr Prins
...database
# Info
## Borg backups
Despite our precautions it is advised to use a backup password and *not* store that on the remote.
## Running sheepdog on rabbit
=> https://github.com/pjotrp/deploy...
- Automated Testing
document created on Oct 12 2022 by Munyoki Kilyungi, last updated on Dec 03 2023 by Pjotr Prins
...given data
* Database-querying functions used in the system respond within a specified amount of time
etc.
This is relevant since GN3 is behind Nginx which defines a timeout.
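A response-time check of the kind listed above could be sketched as follows. The helper `assert_responds_within`, the stand-in `fake_query` function and the time budgets are all hypothetical; a real test would call the actual database-querying function and use a budget derived from the Nginx timeout.

```python
import time


def assert_responds_within(func, max_seconds, *args, **kwargs):
    """Call `func` and raise AssertionError if it exceeds `max_seconds`."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    assert elapsed <= max_seconds, (
        f"{func.__name__} took {elapsed:.3f}s, budget was {max_seconds}s"
    )
    return result


def fake_query(delay):
    """Stand-in for a database-querying function (hypothetical)."""
    time.sleep(delay)
    return "ok"
```

The helper returns the function's result when it finishes within the budget, so it can wrap an existing call inside a test without changing what that test asserts about the data.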
### Regression Tests
Checks...
- OAuth2
document created on May 29 2023 by Frederick Muriuki Muriithi, last updated on Jun 07 2023 by Frederick Muriuki Muriithi
...data and computational requests.
=> https://gitlab.com/fredmanglis/gnqc_py QC and Data Upload
Provides a means to upload new data into the GeneNetwork database. It does perform some quality-control...
- Editing Data
document created on Nov 23 2021 by BonfaceKilz, last updated on Jul 25 2023 by Frederick Muriuki Muriithi
...-u webqtlout db_webqtl < metadata_audit.sql
```
And check this works
```
select * FROM information_schema.COLUMNS WHERE table_schema=DATABASE() AND TABLE_NAME='metadata_audit';
```
For everything...
- Developing against GeneNetwork
document created on Jun 18 2023 by Pjotr Prins, last updated on Dec 03 2023 by Pjotr Prins
...org/api/v_pre1/gen_dropdown
```
Check the logs. If you see ERROR 1054 (42S22): Unknown column
'InbredSet.Family' in 'field list', you may be using the small
database.
### Run Scripts
As part...
- Migrate GN1 Clustering
document created on Jul 18 2021 by Pjotr Prins, last updated on Mar 22 2022 by Frederick Muriuki Muriithi
...1.3089193078506003, 'category': 'C57BL/6J +'}]]
```
but that did work as expected.
Paused on heatmap generation to first test out the database access code.
Added tests and fixed issues with older db-...