We have:
We want to make that entire site is easily navigatable; and make all terms self-explanatory. Beyond that get rid of all the blank nodes. Think of navigation akin to menu navigation from a website.
Useful reference for the right queries from GN1:
From this root node, one should be able to navigate to species, inbred-set groups, and datasets (TODO).
Terms:
Note: In the main GN page, we don't list IndbredSet groups that don't have a family:
mysql> select count(*) FROM InbredSet where Family IS NULL; +----------+ | count(*) | +----------+ | 33 | +----------+ 1 row in set (0.01 sec)
Checking for trait and co-factors:
SELECT
s.Name AS species_name,
i.Id AS inbredset_id,
i.Name AS inbredset_name,
'Traits and Cofactors' AS dataset_type
FROM Species s
JOIN InbredSet i
ON i.SpeciesId = s.Id
JOIN PublishFreeze p
ON p.InbredSetId = i.Id
WHERE p.Name = CONCAT(i.Name, 'Publish');
Checking for DNA Markers and SNPs:
SELECT
s.Name AS species_name,
i.Id AS inbredset_id,
i.Name AS inbredset_name,
'DNA Markers and SNPs' AS dataset_type
FROM Species s
JOIN InbredSet i
ON i.SpeciesId = s.Id
JOIN GenoFreeze g
ON g.InbredSetId = i.Id
WHERE g.Name = CONCAT(i.Name, 'Geno');
Checking for Molecular Traits:
SELECT DISTINCT
s.Name AS species_name,
i.Name AS inbredset_name,
t.Name AS dataset_type,
psf.FullName as dataset_full_name
FROM Species s
JOIN InbredSet i
ON i.SpeciesId = s.Id
JOIN ProbeFreeze pf
ON pf.InbredSetId = i.Id
JOIN ProbeSetFreeze psf
ON psf.ProbeFreezeId = pf.Id
JOIN Tissue t
ON pf.TissueId = t.Id
WHERE s.Name = 'rat' AND i.Name = 'HXBBXH' AND
psf.public > 0 AND t.Name = 'Adipose mRNA' GROUP BY s.Name, i.Name
ORDER BY
s.Name, i.Name, t.Name, psf.FullName;
Markers that belong to more than one species:
SELECT
Geno.*,
COUNT(*) AS cnt
FROM Geno
GROUP BY
Name,
Marker_Name
HAVING COUNT(*) > 1;
From above results we can confirm:
SELECT Geno.* FROM Geno WHERE Marker_Name IN ("D11Mit2", "D11Mit2", "D12Mit1", "D3Mit17") ORDER BY Marker_Name\G
We see that the markers we have markers that can belong to more than one species.
Counting markers per public GenoFreeze:
SELECT
gf.Name AS GenoFreezeName,
COUNT(DISTINCT g.Marker_Name) AS MarkerCount
FROM GenoFreeze gf
INNER JOIN InbredSet i
ON i.Id = gf.InbredSetId
INNER JOIN Species s
ON s.Id = i.SpeciesId
INNER JOIN Geno g
ON g.SpeciesId = s.Id
WHERE
gf.public > 0
AND g.Marker_Name IS NOT NULL
GROUP BY
gf.Name
ORDER BY
MarkerCount DESC;
Results:
+-----------------------------+-------------+ | GenoFreezeName | MarkerCount | +-----------------------------+-------------+ | AD-cases-controls-MyersGeno | 367403 | | BDF2-2005Geno | 120531 | | BXD-MicturitionGeno | 120531 | | CTB6F2Geno | 120531 | | Linsenbardt-BoehmGeno | 120531 | | AXBXAGeno | 120531 | | B6MRLF2-D2MRLF2Geno | 120531 | | BXD-JAX-ADGeno | 120531 | | CCGeno | 120531 | | SOTNOT-OHSUGeno | 120531 | | B6D2F2-PSUGeno | 120531 | | BHHBF2Geno | 120531 | | BXDGeno | 120531 | | DOD-BXD-GWIGeno | 120531 | | BDF2-1999Geno | 120531 | | BXD-MBD-UTHSCGeno | 120531 | | UTHSC-CannabinoidGeno | 120531 | … | HET3-ITPGeno | 120531 | | MDPGeno | 120531 | | HSNIH-PalmerGeno | 29518 | | HRDP_HXB-BXH-BPGeno | 29518 | | NWU_WKYxF344_F2Geno | 29518 | | HXBBXHGeno | 29518 | | MAGIC_LinesGeno | 8933 | | J12XJ58F11Geno | 4938 | | SXMGeno | 792 | | ColXCviGeno | 133 | | BayXShaGeno | 133 | | ColXBurGeno | 133 | +-----------------------------+-------------+
Same markers. Not feasible to have a fan out from genotypes -> markers; too much repetition.
We only have 532,248 markers:
> SELECT COUNT(*) FROM Geno; +----------+ | count(*) | +----------+ | 532248 | +----------+
Instead, present the number of markers. Link the snps/dna-markers to species. Show how to access them.
Genotypes and markers are different but related. Different Species can have different markers