Editing Data

Tags

assigned: bonfacem
keywords: metadata
status: unclear
priority: medium

Introduction

At the moment, you can edit metadata related to a published phenotype and a probeset. When an edit is done, the diff data is stored in a table, `metadata_audit` in json format that looks something like:

{"Probeset": {"id_": {"old": "108845", "new": "108845"}, "description": {"old": "EST BG068973 (olfactory receptor related sequence).", "new": "EST BG068973 (olfactory receptor related sequence). Test"}}, "probeset_name": "1446112_at", "author": "Bonface Munyoki K.", "timestamp": "2021-07-10 08:31:18"}

{"Probeset": {"id_": {"old": "108845", "new": "108845"}, "description": {"old": "EST BG068973 (olfactory receptor related sequence). Test", "new": "EST BG068973 (olfactory receptor related sequence). Testing"}}, "probeset_name": "1446112_at", "author": "Bonface Munyoki K.", "timestamp": "2021-07-10 08:31:36"}

{"Probeset": {"id_": {"old": "108845", "new": "108845"}, "description": {"old": "EST BG068973 (olfactory receptor related sequence). Testing", "new": "EST BG068973 (olfactory receptor related sequence)."}}, "probeset_name": "1446112_at", "author": "Bonface Munyoki K.", "timestamp": "2021-07-10 10:15:37"}

{"Phenotype": {"pre_pub_description": {"old": "Testing the description", "new": "Testing the description"}, "post_pub_description": {"old": "Central nervous system, morphology: Internal granule layer (IGL) of the cerebellum volume, adjusted for sex, age, body and brain weight [mm3]", "new": "Central nervous system, morphology: Internal granule layer (IGL) of the cerebellum volume, adjusted for sex, age, body and brain weight [mm3]"}, "original_description": {"old": "Original post publication description: Central nervous system, morphology: Internal granule layer (IGL) of the cerebellum volume, adjusted for sex, age, body and brain weight [mm3]", "new": "Original post publication description: Central nervous system, morphology: Internal granule layer (IGL) of the cerebellum volume, adjusted for sex, age, body and brain weight [mm3]"}, "units": {"old": "mm3", "new": "mm3"}, "pre_pub_abbreviation": {"old": "", "new": ""}, "post_pub_abbreviation": {"old": "AdjIGLVol", "new": "AdjIGLVol"}, "lab_code": {"old": "", "new": ""}, "submitter": {"old": "robwilliams", "new": "robwilliams"}, "owner": {"old": "", "new": ""}, "authorized_users": {"old": "robwilliams", "new": "robwilliams"}}, "dataset_id": "10007", "author": "Bonface Munyoki K.", "timestamp": "2021-07-16 16:35:56"}

{"Phenotype": {"pre_pub_description": {"old": "Testing the description", "new": "Testing the descriptiony(test)"}}, "dataset_id": "10007", "author": "Bonface Munyoki K.", "timestamp": "2021-07-16 18:25:15"}

Note how we use one table to store different diffs from different data types. This is important, since if the need ever arises to use a different schema or db altogether, we only use one column.

Setup

First, run this sql script to add the "metadata<sub>audit</sub>" table:

Sql script to run

Example

mysql -u webqtlout db_webqtl < metadata_audit.sql

And check this works

select * FROM information_schema.COLUMNS WHERE table_schema=DATABASE() AND TABLE_NAME='metadata_audit';

For everything to run, use the latest commit on the main branch from gn3. One way to make sure that you are running the latest changes, make sure that you have genenetwork3 cloned somewhere in your path; and keep it updated. Thereafter set, a PATH in "bin/genenetwork2" which should look something similar to `export PYTHONPATH=/home/bonface/projects/genenetwork3:$PYTHON_GN_PATH:$GN2_BASE_DIR/wqflask:$PYTHONPATH`.

Alternatively, install gn3 in it's own profile. This can be sometimes rinconvenient since the latest changes from gn3 can be ahead of the the package definition in our channel.

How to edit a trait Editing a trait/ probeset

These are the steps you can do to edit a phenotype/ probeset:

- Log in to a trait page or a probeset (Make sure that your are logged in). From the "Trait Data and Analysis page" for the probeset/ trait, there's an "Edit" button under "Details and Links". Use this to edit the metadata.

- An example of editing a published phenotype is: "*trait/10007/edit/inbredset-id/10007*"

After editing traits, you could see changes in a diff format at the very top of the update page.

Notes

- Atm, creds are hard-coded. See:

https://github.com/genenetwork/genenetwork2/blob/testing/wqflask/wqflask/decorators.py

That needs to be improved.

- For Probeset data, atm some tables aren't yet updated(unfortunately those tables aren't used anywhere). I need to figure that out.

#### Tue 23 Nov 2021

- Fixed excel issue described here:

https://github.com/genenetwork/genenetwork2/compare/testing...BonfaceKilz:bug/fix-excel-adding-new-line?expand=1

Wed 22 Dec 2021

Consider the case where a user deletes, updates, or modifies data from Genenetwork using the csv download they got from the data-upload page. In the event they do not approve the data, and they upload that csv file again, we have a case in gn2 where data needs to be deleted, inserted, or modified twice. In the case of deletions, or insertions, an error will be generated because you cannot remove or insert that same data twice. Also, the count that's displayed as a flash message will not be correct (visually) since-- at the time of this writing-- we count the number of edits from a CSV file. A working patch will be sent later today(or tomorrow).

Also, the GN2 build is not working. With that in mind below I outline how I've plugged in pudb, inspired by jgarte's demo:

https://git.genenetwork.org/jgart/qc_crlf_demo

- First, in *wqflask/runserver*, I removed Flask DebugToolbar(this somehow made Flask crash, and I haven't investigated it). Here is how the diff looks:

modified   wqflask/runserver.py
@@ -23,9 +23,7 @@ app_config()
 werkzeug_logger = logging.getLogger('werkzeug')
 
 if WEBSERVER_MODE == 'DEBUG':
-    from flask_debugtoolbar import DebugToolbarExtension
     app.debug = True
-    toolbar = DebugToolbarExtension(app)
     app.run(host='0.0.0.0',
             port=SERVER_PORT,
             debug=True,

- As a work-around for the failing gn2 build, install pudb in it's own profile

guix install pudb -p ~/opt/pudb

- Add pudb to your python path in *bin/genenetwork2* as show in this diff:

-# We may change this one:
-export PYTHONPATH=$PYTHON_GN_PATH:$GN2_BASE_DIR/wqflask:$GN3_PYTHONPATH:$PYTHONPATH
+# We may change this one: 
+export PYTHONPATH=$GN2_BASE_DIR/wqflask:/home/bonface/opt/pudb/lib/python3.9/site-packages/:/home/bonface/projects/genenetwork3:$PYTHONPATH

- To use pudb, just stick `import pudb; pu.db` wherever you fancy. Happy debugging!

Wed 26 Jan 2022

Fixed precision point errors. During file upload, excel appended extra values; so instead of `BXD1,18.8,x,x`, you'd have `BXD1,18.8000001,20,x`. There are 2 parts to fixing this:

- During generating the json file(when creating the diff), ignore modifications where |ε| > 0.0001.

- When listing diffs, ignore modification entries where |ε| > 0.0001.

For this fix, the second bit was ignored, because it'll lead to unnecessarily complicated code, given that only one person has really been testing the system.