
Capture Data on the BXDs in RDF

Tags

  • assigned: bonfacem, pjotrp, zsloan
  • priority: high
  • status: unclear
  • type: feature-request, enhancement
  • keywords: RDF, BXD, REST, from github, high priority

Description

Is your feature request related to a problem? Please describe.

We aim to capture metadata on the BXDs in RDF which will make querying metadata fairly trivial. Next to a pangenome, which is a graph on DNA, we should present metadata in a structured way (a graph on everything else).

Metadata and pangenomes ought to go hand in hand.

Describe the solution you'd like

For the BXD, let's take the GN metadata forms as a starting point and explain those in RDF terms. Next, we gradually add whatever we think is sensible. When you have metadata, what do you want to get out of it? For example:

  • What experiments are in GN related to BXD and addiction?
  • What experiments are in GN on addiction related genes?
  • What publications are in GN related to addiction and published after 2010?

We want to make search easy and to disambiguate terms.
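The example questions above can be read as graph pattern matches over subject/predicate/object triples. The following is only a minimal sketch in plain Python to illustrate the idea; a real setup would presumably use an RDF store queried with SPARQL. All identifiers here (`pub:1`, `gn:keyword`, `gn:year`, `gn:strainSet`, and the helper functions) are hypothetical and do not reflect GN's actual schema or data.

```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Made-up sample metadata as (subject, predicate, object) triples.
# None of these names or values come from GN itself.
TRIPLES: List[Triple] = [
    ("pub:1", "rdf:type", "gn:Publication"),
    ("pub:1", "gn:keyword", "addiction"),
    ("pub:1", "gn:year", "2008"),
    ("pub:2", "rdf:type", "gn:Publication"),
    ("pub:2", "gn:keyword", "addiction"),
    ("pub:2", "gn:year", "2015"),
    ("pub:3", "rdf:type", "gn:Publication"),
    ("pub:3", "gn:keyword", "obesity"),
    ("pub:3", "gn:year", "2019"),
    ("exp:1", "rdf:type", "gn:Experiment"),
    ("exp:1", "gn:strainSet", "BXD"),
    ("exp:1", "gn:keyword", "addiction"),
]


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple matching the pattern; None acts as a wildcard."""
    for triple in TRIPLES:
        if ((s is None or triple[0] == s)
                and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple


def subjects(p: str, o: str) -> Set[str]:
    """Return all subjects that have predicate p with object o."""
    return {s for s, _, _ in match(p=p, o=o)}


def publications_on(keyword: str, after_year: int) -> List[str]:
    """Publications tagged with keyword and published after after_year."""
    candidates = subjects("rdf:type", "gn:Publication") & subjects("gn:keyword", keyword)
    hits = []
    for s in candidates:
        years = [int(o) for _, _, o in match(s=s, p="gn:year")]
        if any(year > after_year for year in years):
            hits.append(s)
    return sorted(hits)


def experiments_on(strain_set: str, keyword: str) -> List[str]:
    """Experiments on a given strain set that are tagged with keyword."""
    return sorted(
        subjects("rdf:type", "gn:Experiment")
        & subjects("gn:strainSet", strain_set)
        & subjects("gn:keyword", keyword)
    )


print(publications_on("addiction", 2010))
print(experiments_on("BXD", "addiction"))
```

Each question becomes an intersection of simple patterns, which is the property that makes this kind of metadata querying easy once everything is expressed as triples.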

Describe alternatives you've considered

Modelling everything in MariaDB. The problem is that this leads to redundant data, and complexity increases over time.

Additional context

View existing metadata here: http://gn1.genenetwork.org/webqtl/main.py?FormID=sharinginfo&GN_AccessionId=112

We should use Wikidata because it's based on RDF and is here to stay. For example this mouse gene

relates to mouse

You can find some models

To add the BXD, we would start by editing Wikidata, which has the benefit that the data gets presented in Wikipedia (Wikidata is the backend).

Look at existing ontologies. A few are mentioned in:

When an ontology exists and looks sensible, we should reuse it. If none exists, we create our own. Obviously we can't get away from having free-form text fields. What we want to lift out into structured terms is whatever we want to be able to search on.
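The "reuse first, create our own otherwise" rule could be sketched as a lookup with a fallback. This is only an illustration: the Dublin Core Terms URIs are real, but the choice of Dublin Core, the `GN_LOCAL` namespace, and the `predicate_for` helper are assumptions made for the example, not decisions taken in this issue.

```python
# Dublin Core Terms is an existing, widely used vocabulary.
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical namespace for terms we would have to mint ourselves.
GN_LOCAL = "http://example.org/gn/term/"

# Metadata form fields that already have a sensible existing term.
EXISTING_TERMS = {
    "title": DCTERMS + "title",
    "creator": DCTERMS + "creator",
    "description": DCTERMS + "description",
}


def predicate_for(field: str) -> str:
    """Return an existing ontology term if one is known, else a local one."""
    key = field.strip().lower()
    if key in EXISTING_TERMS:
        return EXISTING_TERMS[key]
    # No known existing term: fall back to our own namespace.
    return GN_LOCAL + key.replace(" ", "_")


print(predicate_for("Title"))
print(predicate_for("Tissue type"))
```

Only fields that need to be searchable would go through such a mapping; free-form text can stay attached as a plain literal.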

Resolution

Relevant topic:

This is currently being worked on in:

Work on dumping RDF has already been done in:

Also, vector/matrix data should be put in LMDB; that is a separate issue of its own.

  • closed