This document describes how to dump ProbeSet datasets from the GeneNetwork MariaDB database to LMDB.

**Requirements**
Run with all dependencies:
guix shell python-wrapper python-mysqlclient python-numpy python-lmdb python-click -- \
python scripts/dump_probesets_lmdb.py list-datasets \
"mysql://user:password@localhost/db_webqtl"
Show all ProbeSet datasets in the database:
python scripts/dump_probesets_lmdb.py list-datasets \
"mysql://user:password@localhost/db_webqtl"
Expected results: a table with the columns ID, Name, Short Name, Created date, Public status, and Full Name.
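Every database command here takes the connection details as a single `mysql://` URL. As a rough illustration of what that URL carries, the sketch below splits it into its parts using only the Python standard library. The function name `parse_sql_uri`, the dictionary keys, and the default port 3306 are assumptions made for this example; the script may parse the URL differently.

```python
from urllib.parse import unquote, urlparse


def parse_sql_uri(sql_uri: str) -> dict:
    """Split a mysql:// URL into its connection parts (hypothetical helper)."""
    parsed = urlparse(sql_uri)
    if parsed.scheme != "mysql":
        raise ValueError(f"expected a mysql:// URL, got {parsed.scheme!r}")
    return {
        "host": parsed.hostname or "localhost",
        # Assumption: fall back to the usual MySQL/MariaDB port when none is given.
        "port": parsed.port or 3306,
        # Percent-encoded characters in credentials are decoded here.
        "user": unquote(parsed.username or ""),
        "password": unquote(parsed.password or ""),
        # The URL path is "/db_webqtl"; drop the leading slash.
        "database": parsed.path.lstrip("/"),
    }


config = parse_sql_uri("mysql://user:password@localhost/db_webqtl")
print(config["host"], config["database"])
```

If a password contains characters such as `@` or `/`, it would need to be percent-encoded in the URL for this kind of parsing to work.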
Export a specific dataset to LMDB format:
python scripts/dump_probesets_lmdb.py dump-dataset \
"mysql://user:password@localhost/db_webqtl" \
/path/to/output \
206 \
--batch-size 5000 \
--workers 4
Parameters:
Expected results: creates `/path/to/output/<dataset_name>/` containing:
Export all public ProbeSet datasets:
python scripts/dump_probesets_lmdb.py dump-all-datasets \
"mysql://user:password@localhost/db_webqtl" \
~/lmdb_data/ \
--batch-size 5000 \
--workers 4 \
--skip-existing
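The `--skip-existing` flag in the command above suggests that datasets already present in the output directory are not dumped again. The sketch below shows one way such a check could work. It is not taken from the script: the helper names `already_dumped` and `datasets_to_dump` are hypothetical, and treating "a `data.mdb` file exists in the dataset directory" as the test is an assumption based on LMDB's default directory layout.

```python
import tempfile
from pathlib import Path


def already_dumped(output_dir: Path, dataset_name: str) -> bool:
    """Return True if output_dir/dataset_name looks like an LMDB environment."""
    # Assumption: the dump uses LMDB's default directory layout,
    # which keeps the data in a file named data.mdb.
    return (output_dir / dataset_name / "data.mdb").is_file()


def datasets_to_dump(output_dir: Path, dataset_names: list[str]) -> list[str]:
    """Keep only the datasets that have no dump in output_dir yet."""
    return [name for name in dataset_names if not already_dumped(output_dir, name)]


# Small demonstration in a temporary directory with one fake, empty dump.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "HC_M2_0606_P").mkdir()
    (root / "HC_M2_0606_P" / "data.mdb").touch()
    # "NEW_DATASET" is a made-up name used only for this demonstration.
    print(datasets_to_dump(root, ["HC_M2_0606_P", "NEW_DATASET"]))
```

A check like this cannot tell a complete dump from an interrupted one, so a dataset whose dump was interrupted may need to be removed by hand before rerunning.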
Options:
Show metadata for a dumped dataset:
python scripts/dump_probesets_lmdb.py show-metadata \
/path/to/lmdb/HC_M2_0606_P
Expected results:
Show all trait names in a dataset:
python scripts/dump_probesets_lmdb.py list-traits \
/path/to/lmdb/HC_M2_0606_P
Get expression values for a single trait:
# Plain text
python scripts/dump_probesets_lmdb.py fetch-trait \
/path/to/lmdb/HC_M2_0606_P "100244_at"
# JSON format
python scripts/dump_probesets_lmdb.py fetch-trait \
/path/to/lmdb/HC_M2_0606_P "100244_at" --json
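The `--json` form is convenient when another program needs the values. The following sketch wraps the `fetch-trait` command shown above in a small Python helper. It assumes the script prints a single JSON document to standard output and exits with a non-zero status on failure. The shape of that JSON is not documented here, so the helper returns whatever `json.loads` produces. The names `fetch_trait_command` and `fetch_trait` are hypothetical.

```python
import json
import subprocess
import sys

SCRIPT = "scripts/dump_probesets_lmdb.py"


def fetch_trait_command(lmdb_path: str, trait_name: str) -> list[str]:
    """Build the argument list for `fetch-trait ... --json`."""
    return [sys.executable, SCRIPT, "fetch-trait", lmdb_path, trait_name, "--json"]


def fetch_trait(lmdb_path: str, trait_name: str):
    """Run the script and decode its output.

    Assumption: the JSON is the only thing written to stdout.
    """
    result = subprocess.run(
        fetch_trait_command(lmdb_path, trait_name),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(result.stdout)


# Show the command that would be run, without executing it.
print(" ".join(fetch_trait_command("/path/to/lmdb/HC_M2_0606_P", "100244_at")))
```

Passing the arguments as a list avoids shell quoting problems with trait names that contain unusual characters.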
Display the full expression matrix (useful for debugging):
python scripts/dump_probesets_lmdb.py print-matrix \
/path/to/lmdb/HC_M2_0606_P
Each dumped dataset contains:
Metadata includes:
python scripts/dump_probesets_lmdb.py --help
python scripts/dump_probesets_lmdb.py dump-dataset --help