Edit this page | Blame

Add Metadata To The Trait Page (RDF)

Fri 30 Sep 2022 11:48:41 EAT

Introduction

We are migrating the GN2 relational database to a plain text and RDF database. Matrix-like data (E.g. fetching sample data for a given data) will be stored inside GN.

So far, we are able to convert the sql data to rdf using "dump.scm" defined in:

What are we trying to solve?

Data stored in genenetwork resembles a tree. As an example: we have several species; each of these species belong to a group; each group belongs to a "data type"; and each data type belongs to a particular dataset. The first step: capturing - albeit requiring more refinement - this data in RDF has been achieved using the aformentioned scheme script.

The overall goal is to be ablet to:

  • Incrementally replace MySQL queries with RDF.
  • Annotating existing data with metadata that does not yet exist in GN2.

Goals

In the Trait Analysis page, for example:

and the corresponding GN1 link:

which on further inspection presents metadata on that specific dataset group here:

We notice that there's metadata in GN1 - which we have in RDF - that we can add to the GN2 traits page. As such, this design doc will be limited to using RDF to:

  • Append metadata about the tissue
  • Append relevant metadata about the dataset group, in particular: about the data values and it's processing; about the array platform; experiment type; and contributors.

Beyond querying metadata, this design doc also proposes the creation of a monadic rdf-fetch similar to what happens in:

Non-goals

  • Refactoring base classes/sql to solely use RDF.
  • Using federated queries - they are slow.
  • Writing a script in Guile to fetch and append extra metadata from wikidata and insert them into RDF as extra nodes. This should be tackled as a separate issue.

Actual Design

  • Rewrite the existing way of fetching RDF using pymonads.
  • React to the change-amplification - should any exist - caused by the above change and add tests where feasible.
  • Create endpoints to add extra annotations for Tissue, Dataset Group, Dataset Values and Processing; array platform; experiment type; and contributors.
  • Add metadata as links, tooltips, or html <summary> tag to the relevant html section(s).

Resources

(made with skribilo)