
Inspect Discrepancies Between XAPIAN and SQL Search.

Description

When doing a XAPIAN search, we miss some data that is available from the SQL search. The searches we tested:

For the above search, we get 31 results.

For the above search, we get 26 results.

We miss the following entries from the XAPIAN search:

15	1423803_s_at	Gltscr2	glioma tumor suppressor candidate region gene 2
16	1451121_a_at	Gltscr2	glioma tumor suppressor candidate region 2; exons 8 and 9
17	1452409_at	Gltscr2	glioma tumor suppressor candidate region gene 2
25	1416556_at	Sas	sarcoma amplified sequence
26	1430029_a_at	Sas	sarcoma amplified sequence
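The comparison that produces a list like the one above can be sketched as a simple difference keyed on the entry name. This is only an illustration: the function name `missing_from_xapian` and the `(name, symbol)` tuple layout are assumptions, and the Xapian side is left empty because the full result lists are not reproduced here.

```python
def missing_from_xapian(sql_rows, xapian_names):
    """Return the SQL rows whose name is absent from the Xapian results.

    Names are compared case-insensitively; the order of sql_rows is kept.
    """
    seen = {name.lower() for name in xapian_names}
    return [row for row in sql_rows if row[0].lower() not in seen]


# (name, symbol) pairs copied from the list of missing entries.
sql_rows = [
    ("1423803_s_at", "Gltscr2"),
    ("1451121_a_at", "Gltscr2"),
    ("1452409_at", "Gltscr2"),
    ("1416556_at", "Sas"),
    ("1430029_a_at", "Sas"),
]

# Placeholder: a real run would pass the names returned by the Xapian search.
xapian_names = []

missing = missing_from_xapian(sql_rows, xapian_names)
for name, symbol in missing:
    print(name, symbol)
```

Keying on the name rather than the symbol matters here, because several entries share the same symbol.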

We want to work out why the XAPIAN search misses the entries above for the given trait. To do that, we first use quest to look up one of the missing entries by its identifier and get the exact doc-id:

quest --msize=2 -s en --boolean-prefix="iden:Qgene:" \
"iden:"1423803_s_at:hc_m2_0606_p"" --db=/export/data/genenetwork-xapian/

Parsed Query: Query(0 * Qgene:1423803_s_at:hc_m2_0606_p)
Exactly 1 matches
MSet:
9665867: [0]
{"name": "1423803_s_at", "symbol": "Gltscr2", "description": "glioma tumor suppressor candidate region gene 2", "chr": "1", "mb": 4.687986, "dataset": "HC_M2_0606_P", "dataset_fullname": "Hippocampus Consortium M430v2 (Jun06) PDNN", "species": "mouse", "group": "BXD", "tissue": "Hippocampus mRNA", "mean": 11.749030303030299, "lrs": 11.3847971289981, "additive": -0.0650828877005346, "geno_chr": "5", "geno_mb": 137.010795}
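When several entries need checking, the doc-id can be pulled out of the quest output programmatically. The sketch below assumes each match is printed as `<doc-id>: [<weight>]`, as in the output above; the helper name `parse_doc_ids` is hypothetical, and the sample text is abbreviated to the lines that matter.

```python
import re

# Matches "<doc-id>: [<weight>]" when preceded by whitespace or start of text.
MATCH_LINE = re.compile(r"(?<!\S)(\d+): \[([0-9.eE+-]+)\]")


def parse_doc_ids(quest_output):
    """Return the doc-ids found in quest's MSet listing, in order."""
    return [int(m.group(1)) for m in MATCH_LINE.finditer(quest_output)]


# Abbreviated copy of the quest output shown above (JSON record omitted).
QUEST_OUTPUT = """Parsed Query: Query(0 * Qgene:1423803_s_at:hc_m2_0606_p)
Exactly 1 matches
MSet:
9665867: [0]
"""

doc_ids = parse_doc_ids(QUEST_OUTPUT)
print(doc_ids)
```

Each returned doc-id can then be passed to `xapian-delve -r`.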

We then inspect that doc-id with xapian-delve to see the stored data and the indexed terms:

bonfacem@tux02 /export5/xapian-test/xapian-07-04 $ xapian-delve -r 9665867 -d /export/data/genenetwork-xapian/

Data for record #9665867:
{"name": "1423803_s_at", "symbol": "Gltscr2", "description": "glioma tumor suppressor candidate region gene 2", "chr": "1", "mb": 4.687986, "dataset": "HC_M2_0606_P", "dataset_fullname": "Hippocampus Consortium M430v2 (Jun06) PDNN", "species": "mouse", "group": "BXD", "tissue": "Hippocampus mRNA", "mean": 11.749030303030299, "lrs": 11.3847971289981, "additive": -0.0650828877005346, "geno_chr": "5", "geno_mb": 137.010795}
Term List for record #9665867: 1423803_s_at 2 5330430h08rik 9430097c02rik Qgene:1423803_s_at:hc_m2_0606_p XC1 XDShc_m2_0606_p XGbxd XIhippocampus XImrna XPC5 XSmouse XTgene XYgltscr2 ZXDShc_m2_0606_p ZXGbxd ZXIhippocampus ZXImrna ZXSmous ZXYgltscr2 Zbc017637 Zbxd Zcandid Zgene Zglioma Zgltscr2 Zhc_m2_0606_p Zhippocampus Zmous Zmrna Zregion Zsuppressor Ztumor bc017637 bxd candidate gene glioma gltscr2 hc_m2_0606_p hippocampus mouse mrna region suppressor tumor 
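To see which fields a query can actually match, it helps to group the term list by prefix. The sketch below is a minimal illustration, not part of the indexer. It relies on two Xapian conventions: a leading `Z` marks a stemmed term, and `Q` is the usual prefix for a unique identifier. Reading the other prefixes as fields (for example `XY` as symbol, `XDS` as dataset, `XG` as group, `XI` as tissue, `XS` as species) is an inference from the record's JSON data, not something confirmed here. The helper names are hypothetical.

```python
import re
from collections import defaultdict

# Term list copied from the xapian-delve output above.
TERM_LIST = (
    "1423803_s_at 2 5330430h08rik 9430097c02rik "
    "Qgene:1423803_s_at:hc_m2_0606_p XC1 XDShc_m2_0606_p XGbxd "
    "XIhippocampus XImrna XPC5 XSmouse XTgene XYgltscr2 ZXDShc_m2_0606_p "
    "ZXGbxd ZXIhippocampus ZXImrna ZXSmous ZXYgltscr2 Zbc017637 Zbxd "
    "Zcandid Zgene Zglioma Zgltscr2 Zhc_m2_0606_p Zhippocampus Zmous Zmrna "
    "Zregion Zsuppressor Ztumor bc017637 bxd candidate gene glioma gltscr2 "
    "hc_m2_0606_p hippocampus mouse mrna region suppressor tumor"
)


def split_term(term):
    """Split a term into (stemmed, prefix, value).

    The prefix is the leading run of uppercase letters; a leading "Z"
    is stripped from it and reported as the stemmed flag.
    """
    prefix = re.match(r"[A-Z]*", term).group(0)
    value = term[len(prefix):]
    stemmed = prefix.startswith("Z")
    if stemmed:
        prefix = prefix[1:]
    return stemmed, prefix, value


def group_terms(term_list):
    """Map each prefix ("" for unprefixed) to its unstemmed values."""
    groups = defaultdict(list)
    for term in term_list.split():
        stemmed, prefix, value = split_term(term)
        if not stemmed:
            groups[prefix].append(value)
    return dict(groups)


groups = group_terms(TERM_LIST)
for prefix in sorted(groups):
    print(repr(prefix), groups[prefix])
```

Comparing these groups with the terms in a failing query's `Parsed Query` line shows which query terms have no counterpart in the document.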