Example:
Related tasks:
We need fan-out to GN-specific pages. Related to task listed below in ⁰
Dataset metadata for a given phenotype.
SPARQL
PREFIX gn: <http://rdf.genenetwork.org/v1/id/>
PREFIX gnc: <http://rdf.genenetwork.org/v1/category/>
PREFIX gnt: <http://rdf.genenetwork.org/v1/term/>
SELECT ?phenotype_dataset
(GROUP_CONCAT(DISTINCT CONCAT(STR(?p), " = ", STR(?o)); separator=" | ") AS ?metadata)
FROM <http://rdf.genenetwork.org/v1>
WHERE {
?set gnt:has_phenotype_data ?phenotype_dataset .
?phenotype_dataset gnt:has_strain ?set ;
?p ?o .
}
GROUP BY ?phenotype_dataset
LIMIT 10;
Example query over curl:
curl "http://localhost:5000/api/v1/search?q=Does%20diabetes%20occur%20naturally%20in%20rats%3F"
curl "http://localhost:5000/api/v1/search?q=point%20me%20to%20useful%20phenotypes%20that%20cause%20ADHD%3F"
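The percent-encoded query strings in these calls can also be produced programmatically. The sketch below is a minimal illustration, assuming the same local endpoint as in the curl calls; the helper name `build_search_url` is hypothetical and not part of the GNAIS code.

```python
from urllib.parse import quote

# Base URL taken from the curl examples; adjust for other deployments.
SEARCH_ENDPOINT = "http://localhost:5000/api/v1/search"


def build_search_url(question: str, endpoint: str = SEARCH_ENDPOINT) -> str:
    """Return the search URL with the question percent-encoded in `q`."""
    # safe="" encodes every reserved character, including "/" and "?".
    return f"{endpoint}?q={quote(question, safe='')}"


if __name__ == "__main__":
    print(build_search_url("Does diabetes occur naturally in rats?"))
```

The resulting string can be passed to curl or to any HTTP client.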
This script uses ttl files extracted from the SPARQL endpoint. As such, prefixes were replaced with full namespaces in the results.
Output: keys -> subjects; values -> list of predicates. Removes redundant objects. Subjects and objects are linked to form English-like sentences:
docs = []
for key in tqdm(collection):
    concat = ""
    for value in collection[key]:
        text = f"{key} is/has {value}. "
        concat += text
    docs.append(concat)
Documents look like:
gnc:set is/has skos:member gn:set_B6MRLF2_D2MRLF2 . gnc:set is/has skos:member gn:set_MAGIC_Lines . gnt:family is/has a owl:ObjectProperty . gnt:family is/has rdfs:domain gnc:species . gnt:family is/has skos:definition This resource belongs to this family . gnt:family is/has rdfs:domain gnc:set . ", "gnt:short_name is/has a owl:ObjectProperty .
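The grouping step that produces `collection` is not shown above. As a rough sketch, assuming the triples are already available as (subject, predicate, object) tuples of strings (parsing the ttl files is left out), the collection and the documents could be built as follows. The function names and the sample triples are illustrative only, not the actual implementation.

```python
from collections import defaultdict


def build_collection(triples):
    """Group triples by subject, dropping repeated predicate/object pairs."""
    collection = defaultdict(list)
    for subject, predicate, obj in triples:
        value = f"{predicate} {obj}"
        if value not in collection[subject]:  # remove redundant entries
            collection[subject].append(value)
    return dict(collection)


def build_docs(collection):
    """Turn each subject and its values into one English-like document."""
    docs = []
    for key, values in collection.items():
        docs.append("".join(f"{key} is/has {value}. " for value in values))
    return docs


if __name__ == "__main__":
    triples = [
        ("gnc:set", "skos:member", "gn:set_MAGIC_Lines"),
        ("gnc:set", "skos:member", "gn:set_MAGIC_Lines"),  # duplicate
        ("gnt:family", "rdfs:domain", "gnc:species"),
        ("gnt:family", "rdfs:domain", "gnc:set"),
    ]
    for doc in build_docs(build_collection(triples)):
        print(doc)
```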
You can inspect full implementation details at:
You are an expert in biology and genomics. You excel at leveraging the data or context you have been given to address any user query. Give an accurate and elaborate response to the query below. In addition, provide links that the users can visit to verify information or dig deeper. To build link you must replace RDF prefixes by namespaces. Below is the mapping of prefixes and namespaces: gn => http://rdf.genenetwork.org/v1/id gnc => http://rdf.genenetwork.org/v1/category owl => http://www.w3.org/2002/07/owl gnt => http://rdf.genenetwork.org/v1/term skos = http://www.w3.org/2004/02/skos/core xkos => http://rdf-vocabulary.ddialliance.org/xkos rdf => http://www.w3.org/1999/02/22-rdf-syntax-ns rdfs => http://www.w3.org/2000/01/rdf-schema taxon => http://purl.uniprot.org/taxonomy dcat => http://www.w3.org/ns/dcat dct => http://purl.org/dc/terms xsd => http://www.w3.org/2001/XMLSchema sdmx-measure => http://purl.org/linked-data/sdmx/2009/measure qb => http://purl.org/linked-data/cube pubmed => http://rdf.ncbi.nlm.nih.gov/pubmed v => http://www.w3.org/2006/vcard/ns foaf => http://xmlns.com/foaf/0.1 geoSeries => http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc Do not make any mistakes.
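The prefix replacement that the prompt asks the model to perform can also be written down explicitly, which is handy for checking the links the model returns. This is a sketch only: the helper `expand_prefix` is hypothetical, the mapping is a subset of the one in the prompt, and joining namespace and local name with "/" is an assumption that matches the GeneNetwork namespaces but not necessarily vocabularies that use "#".

```python
# Subset of the prefix mapping given in the prompt.
NAMESPACES = {
    "gn": "http://rdf.genenetwork.org/v1/id",
    "gnc": "http://rdf.genenetwork.org/v1/category",
    "gnt": "http://rdf.genenetwork.org/v1/term",
}


def expand_prefix(term: str, namespaces: dict = NAMESPACES) -> str:
    """Replace a known RDF prefix with its namespace; return other terms unchanged."""
    prefix, sep, local_name = term.partition(":")
    if sep and prefix in namespaces:
        # Assumption: namespace and local name are joined with "/".
        return f"{namespaces[prefix]}/{local_name}"
    return term


if __name__ == "__main__":
    print(expand_prefix("gn:trait_BXDPublish_16339"))
```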
It is useful to control the system by defining an output format. This should also help parse output to other tools when the time comes.
Reviewing the options:
Went with Option (c).
from typing import List

import dspy
from pydantic import BaseModel, Field


class Information(BaseModel):
    """Extract relevant information for query"""
    answer: str = Field(description="Specific point addressing the query from the context")
    links: List[str] = Field(description="All links associated to RDF entities related to the point")


class ListInformation(BaseModel):
    """Address recursively a query"""
    detailed_answers: List[Information] = Field(description="List of answers to the query")
    final_answer: str = Field(description="Synthesized and comprehensive answer using detailed answers")


class Generate(dspy.Signature):
    """Wrap generation interface"""
    context: list = dspy.InputField(desc="Background information")
    input_text: str = dspy.InputField(desc="Query and instructions")
    feedback: ListInformation = dspy.OutputField(desc="System response to the query")
See the workings at:
You excel at addressing search query using the context you have. You do not make mistakes. Extract answers to the query from the context and provide links associated with each RDF entity. To build links you must replace RDF prefixes by namespaces. Here is the mapping of prefixes and namespaces: gn => http://rdf.genenetwork.org/v1/id gnc => http://rdf.genenetwork.org/v1/category owl => http://www.w3.org/2002/07/owl gnt => http://rdf.genenetwork.org/v1/term skos = http://www.w3.org/2004/02/skos/core xkos => http://rdf-vocabulary.ddialliance.org/xkos rdf => http://www.w3.org/1999/02/22-rdf-syntax-ns rdfs => http://www.w3.org/2000/01/rdf-schema taxon => http://purl.uniprot.org/taxonomy dcat => http://www.w3.org/ns/dcat dct => http://purl.org/dc/terms xsd => http://www.w3.org/2001/XMLSchema sdmx-measure => http://purl.org/linked-data/sdmx/2009/measure qb => http://purl.org/linked-data/cube pubmed => http://rdf.ncbi.nlm.nih.gov/pubmed v => http://www.w3.org/2006/vcard/ns foaf => http://xmlns.com/foaf/0.1 geoSeries => http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc \n
Trait Id and dataset can be linked to a result page. See this URL:
Trait id: 10027; dataset name: BXDPublish. In RDF this is:
That trait has an alias encoded as "owl:equivalentClass BXDPublish_10027."
To build a result page from RDF, we need a trait's unique identifier, which can be queried from RDF.
You excel at addressing search query using the context you have. You do not make mistakes. Extract answers to the query from the context and provide links associated with each RDF entity. To build links you must replace RDF prefixes by namespaces. Here is the mapping of prefixes and namespaces: gn => http://rdf.genenetwork.org/v1/id gnc => http://rdf.genenetwork.org/v1/category owl => http://www.w3.org/2002/07/owl gnt => http://rdf.genenetwork.org/v1/term skos = http://www.w3.org/2004/02/skos/core xkos => http://rdf-vocabulary.ddialliance.org/xkos rdf => http://www.w3.org/1999/02/22-rdf-syntax-ns rdfs => http://www.w3.org/2000/01/rdf-schema taxon => http://purl.uniprot.org/taxonomy dcat => http://www.w3.org/ns/dcat dct => http://purl.org/dc/terms xsd => http://www.w3.org/2001/XMLSchema sdmx-measure => http://purl.org/linked-data/sdmx/2009/measure qb => http://purl.org/linked-data/cube pubmed => http://rdf.ncbi.nlm.nih.gov/pubmed v => http://www.w3.org/2006/vcard/ns foaf => http://xmlns.com/foaf/0.1 geoSeries => http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc Link pointing to specific trait should be translated to CD links using the trait id and the dataset name. Original trait link: https://rdf.genenetwork.org/v1/id/trait_BXDPublish_16339 Trait id: 16339 Dataset name: BXDPublish New trait link: https://cd.genenetwork.org/show_trait?trait_id=16339&dataset=BXDPublish \n
Above was enough to get the system to return valid CD links with Claude models :)
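The translation rule stated in the prompt can also be applied deterministically outside the model, for example to verify or repair returned links. The sketch below is an illustration under assumptions: the function `to_cd_link` is hypothetical, and it assumes the last underscore in the trait name separates the dataset name from the trait id, as in the `trait_BXDPublish_16339` example.

```python
import re
from urllib.parse import urlencode

CD_SHOW_TRAIT = "https://cd.genenetwork.org/show_trait"

# Matches ".../id/trait_<dataset>_<trait id>", e.g. trait_BXDPublish_16339.
# Assumption: the last underscore separates dataset name and trait id.
TRAIT_LINK = re.compile(r"/id/trait_(?P<dataset>.+)_(?P<trait_id>[^_/]+)$")


def to_cd_link(rdf_link: str) -> str:
    """Translate an RDF trait link to a CD link; return other links unchanged."""
    match = TRAIT_LINK.search(rdf_link)
    if match is None:
        return rdf_link
    query = urlencode({"trait_id": match["trait_id"], "dataset": match["dataset"]})
    return f"{CD_SHOW_TRAIT}?{query}"


if __name__ == "__main__":
    print(to_cd_link("https://rdf.genenetwork.org/v1/id/trait_BXDPublish_16339"))
```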
Another example:
Query: What are the traits related to the BXD?
System feedback:
{
"detailed_answers": [
{
"answer": "BXD_16337 is a trait related to BXDPublish dataset. It measures Intraocular pressure (IOP) in 1 to 2 months old BXD mice of both sexes, with an average of left and right eyes. The trait has a mean value of 15.27684536576271 mmHg.",
"links": [
"https://cd.genenetwork.org/show_trait?trait_id=16337&dataset=BXDPublish"
]
},
{
"answer": "BXD_16338 is a trait related to BXDPublish dataset. It measures Intraocular pressure (IOP) in 2.1 to 5 months old BXD mice of both sexes, with an average of left and right eyes. The trait has a mean value of 16.091910903133563 mmHg.",
"links": [
"https://cd.genenetwork.org/show_trait?trait_id=16338&dataset=BXDPublish"
]
},
{
"answer": "BXD_16339 is a trait related to BXDPublish dataset. It measures Intraocular pressure (IOP) in 5.1 to 9 months old BXD mice of both sexes, with an average of left and right eyes. The trait has a mean value of 16.563036555975255 mmHg.",
"links": [
"https://cd.genenetwork.org/show_trait?trait_id=16339&dataset=BXDPublish"
]
},
{
"answer": "BXD_16340 is a trait related to BXDPublish dataset. It measures Intraocular pressure (IOP) in 9.1 to 13 months old BXD mice of both sexes, with an average of left and right eyes. The trait has a mean value of 15.659858761411725 mmHg.",
"links": [
"https://cd.genenetwork.org/show_trait?trait_id=16340&dataset=BXDPublish"
]
},
{
"answer": "BXD_16342 is a trait related to BXDPublish dataset. It measures Intraocular pressure (IOP) across all ages (1 to 30 months old) in BXD mice of both sexes, with an average of left and right eyes. The trait has a mean value of 15.797181273159916 mmHg.",
"links": [
"https://cd.genenetwork.org/show_trait?trait_id=16342&dataset=BXDPublish"
]
}
],
"final_answer": "The BXDPublish dataset contains multiple traits related to Intraocular pressure (IOP) measurements in BXD mice. These include: BXD_16337 (IOP in 1-2 months old mice, mean 15.28 mmHg), BXD_16338 (IOP in 2.1-5 months old mice, mean 16.09 mmHg), BXD_16339 (IOP in 5.1-9 months old mice, mean 16.56 mmHg), BXD_16340 (IOP in 9.1-13 months old mice, mean 15.66 mmHg), and BXD_16342 (IOP across all ages 1-30 months, mean 15.80 mmHg). All measurements are from both sexes and represent averages of left and right eyes."
}
Note: Local models, e.g. meta-llama/Llama-3.1-8B-Instruct, have a high probability of returning broken JSON.
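Because of this, it can help to check model output before passing it on to other tools. The following sketch uses only the standard library and mirrors the field names of the output schema (`detailed_answers`, `answer`, `links`, `final_answer`); the function `parse_feedback` is hypothetical, and returning `None` on failure is only one possible policy.

```python
import json
from typing import Optional


def parse_feedback(raw: str) -> Optional[dict]:
    """Return the parsed response, or None if it is broken or incomplete."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not valid JSON at all
    if not isinstance(data, dict):
        return None
    answers = data.get("detailed_answers")
    if not isinstance(answers, list) or not isinstance(data.get("final_answer"), str):
        return None
    for item in answers:
        # Each detailed answer needs a string answer and a list of links.
        if not (
            isinstance(item, dict)
            and isinstance(item.get("answer"), str)
            and isinstance(item.get("links"), list)
        ):
            return None
    return data
```

A caller could retry the generation or fall back to another model when `None` is returned.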
The next step is packaging. The previous setup mixed logic and execution code. I cleaned that up by moving all execution code to `main.py`. Check it out at:
AI search (GNAIS) can be loaded as a module in any GeneNetwork code and used, provided that parameters for the search are defined.