Edit this page | Blame

Improving Metadata Audit


  • assigned: fredm
  • priority: high
  • keywords: editing, metadata, metadata audit
  • type: documentation
  • status: open


There is an existing metadata audit system that tracks the diffs that are added to the system.

The existing system is missing a few things:

  • No way to tell the status of the diffs
  • Inconsistent way of tracking the user that did the changes
  • No way to tell what the dataset type is (Phenotype, Genotype, ProbeSet, ...)
  • ...

This document proposes that we do improvements to the existing system to mitigate the list of shortcomings above. Once that is done, it will open up the system for some interesting opportunities, e.g. showing when a trait was last edited and by whom.


Saving Diffs in Database

It turns out, we only store diffs in the database that have been approved. The diffs awaiting processing, or those that have been rejected are not stored.

This makes it a little easier to handle some of the newer features requested.

Identifying Trait Diffs

  • Phenotype diffs can be identified with the `trait_name` and `phenotype_id` keys in the `json_diff_data` field
  • Probesets diffs can be identified with the `probeset_name` key in the `json_diff_data` field
  • Currently there are no genotype diffs, but following the theme, we should get a `genotype*` key in the `json_diff_data` field

This still means that we cannot simply query data from the table for a specific trait - instead, we will have to query for the dataset_id, and then filter by `(trait/probeset/genotype)_name` as is appropriate for phenotypes, probesets and genotypes respectively.


For auth purposes, we will need to fetch the name(s) of the datasets. The authorisation privileges are necessary for filtering out the diffs that a user can act on.

(made with skribilo)