We aim to capture metadata on the BXDs in RDF, which should make querying that metadata straightforward. Next to a pangenome, which is a graph on DNA, we should present metadata in a structured way (a graph on everything else).
Metadata and pangenomes ought to go hand in hand
For the BXD, let's take the GN metadata forms as a starting point and explain those in RDF terms. Next, we slowly add what we think is sensible. When you have metadata, what do you want to get out of it? For example:
We want to make search easy and to disambiguate terms.
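To make the "graph on everything else" idea concrete, here is a minimal sketch in plain Python of metadata held as subject-predicate-object triples and searched by pattern matching. The `gn:` names and all sample values are hypothetical and only illustrate the shape of the data. A real setup would use an RDF store and SPARQL rather than a Python list.

```python
# Minimal sketch: metadata as (subject, predicate, object) triples.
# Every identifier and value below is hypothetical, for illustration only.
TRIPLES = [
    ("gn:BXD1", "rdf:type", "gn:Strain"),
    ("gn:BXD1", "gn:belongsToGroup", "gn:BXD"),
    ("gn:BXD2", "rdf:type", "gn:Strain"),
    ("gn:BXD2", "gn:belongsToGroup", "gn:BXD"),
    ("gn:dataset_example", "gn:usesGroup", "gn:BXD"),
    ("gn:dataset_example", "gn:tissue", "gn:Hippocampus"),
    # A label is attached to an identifier, so the free-text word
    # "hippocampus" resolves to exactly one term.
    ("gn:Hippocampus", "rdfs:label", "hippocampus"),
]


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def strains_in_group(triples, group):
    """List the subjects that belong to the given group, sorted by name."""
    return sorted(s for s, _, _ in match(triples, p="gn:belongsToGroup", o=group))


def term_for_label(triples, label):
    """Disambiguate a search word by resolving it to term identifiers."""
    return [s for s, _, _ in match(triples, p="rdfs:label", o=label)]
```

Searching then amounts to filling in part of a triple and leaving the rest open, for example `strains_in_group(TRIPLES, "gn:BXD")`. Disambiguation comes from matching on identifiers rather than on strings.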
Modelling everything in MariaDB is problematic: we get redundant data, and complexity increases with time.
View existing metadata here: http://gn1.genenetwork.org/webqtl/main.py?FormID=sharinginfo&GN_AccessionId=112
We should use Wikidata because it is based on RDF and is here to stay. For example, this mouse gene
relates to mouse
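As a rough illustration of how such a gene-to-species relation can be queried, the sketch below builds a request URL for the public Wikidata SPARQL endpoint using only the standard library. The identifiers used here are assumptions that should be checked against Wikidata before relying on them: P31 (instance of), Q7187 (gene), P703 (found in taxon) and Q83310 (house mouse). The function only builds the URL and does not contact the network.

```python
from urllib.parse import urlencode

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

# Assumed identifiers (verify on Wikidata before use):
#   P31 = instance of, Q7187 = gene,
#   P703 = found in taxon, Q83310 = house mouse
MOUSE_GENE_QUERY = """
SELECT ?gene ?geneLabel WHERE {
  ?gene wdt:P31 wd:Q7187 ;
        wdt:P703 wd:Q83310 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
""".strip()


def build_query_url(query, endpoint=WIKIDATA_SPARQL):
    """Return a GET URL asking the endpoint for JSON results of the query."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})
```

Fetching the resulting URL with any HTTP client should return JSON bindings for `?gene` and `?geneLabel`. BXD entries added to Wikidata could be queried in the same way.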
You can find some models
To add the BXD, we would start by editing Wikidata, which has the benefit that the data gets presented in Wikipedia (Wikidata is the backend).
Look at existing ontologies. A few are mentioned in:
When an ontology exists and looks sensible, we should reuse it. If none exists, we create our own. Obviously we can't get away from adding free-form text fields. What we lift out into structured fields is what we want to be able to search on.
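One way to picture "lifting out" searchable fields is the sketch below. It turns a metadata form (a plain dict) into N-Triples lines, giving each searchable field its own predicate and keeping everything else as free text under a generic description predicate. The `http://example.org/gn/` namespace, the field names and the predicate names are all placeholders. In practice the predicates would come from an existing ontology where one fits.

```python
# Placeholder namespace and field-to-predicate mapping (hypothetical).
BASE = "http://example.org/gn/"
SEARCHABLE = {
    "species": BASE + "species",
    "group": BASE + "group",
    "tissue": BASE + "tissue",
}
FREE_TEXT = BASE + "description"


def escape_literal(text):
    """Escape a string for use inside an N-Triples double-quoted literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def form_to_ntriples(subject_id, form):
    """Emit one N-Triples line per form field.

    Searchable fields get a dedicated predicate. Any other field is kept
    as free text, prefixed with its field name so nothing is lost.
    """
    subject = f"<{BASE}{subject_id}>"
    lines = []
    for field, value in form.items():
        if field in SEARCHABLE:
            predicate, literal = SEARCHABLE[field], value
        else:
            predicate, literal = FREE_TEXT, f"{field}: {value}"
        lines.append(f'{subject} <{predicate}> "{escape_literal(literal)}" .')
    return lines
```

The design choice is that only fields in `SEARCHABLE` need an agreed meaning. The free-text remainder can stay loosely structured until someone needs to search on it, at which point the field is promoted into the mapping.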
This is currently being worked on in:
Work on dumping RDF has already been done in:
Also, vector/matrix data should be put in LMDB, which is a separate issue in its own right.