In some cases, correlation returns fewer results than requested.
For example, correlating the dataset below against the BXD Phenotype dataset returns 477 results when Top 500 is selected.
Probable causes:
The samplelist issue does not appear to be causing the shortfall in results, since the shortfall persists after the fix. There also seems to be an additional (or related) issue where correlation either returns wrong results, does not return the actual top results, or both.
Using the same example above, after the change the first result has a sample(r) of 0.265. This is not the top result when the same correlation is run in GN1. There also appears to be a mismatch between the r shown in the results table and the r shown in the scatterplot (opened by clicking the sample(r) link); the two should be roughly the same.
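One possible cause of a table/scatterplot mismatch (an assumption, not a confirmed diagnosis) is that the two computations use different sample sets, or pair values by position instead of by sample name. The minimal sketch below aligns two traits by sample name and drops missing values before computing Pearson r. The input format (a dict of sample name to value) and the function names are hypothetical, not GeneNetwork's actual code.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical input format: sample name -> value (None when missing).
Trait = Dict[str, Optional[float]]


def pearson_r(xs: List[float], ys: List[float]) -> Optional[float]:
    """Plain Pearson correlation; None if too few points or zero variance."""
    n = len(xs)
    if n < 3:
        return None
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def correlate_shared_samples(a: Trait, b: Trait) -> Tuple[Optional[float], int]:
    """Pair values by sample name, keeping only samples with a value in both traits."""
    shared = sorted(
        name
        for name in a.keys() & b.keys()
        if a[name] is not None and b[name] is not None
    )
    xs = [a[name] for name in shared]
    ys = [b[name] for name in shared]
    return pearson_r(xs, ys), len(shared)
```

If both the results table and the scatterplot computed r through the same sample-alignment step, the two values would agree by construction.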
Beni reported an additional error about a NoneType being passed into string formatting, which suggests that some results are None. Steps to reproduce:
- Open https://genenetwork.org/show_trait?trait_id=ENSMUST00000031535&dataset=UTHSC-BXD-Harv_Liv-1019
- Correlate against the default dataset (the same one as the trait)
The issue with handling non-float values during parsing is addressed in this PR.
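A minimal sketch of defensive parsing, assuming raw values can arrive as numeric strings, None, or non-numeric placeholders such as "x". It converts what it can and maps everything else to None, then formats None explicitly rather than passing it to a float format spec (in Python, `"%.3f" % None` raises a TypeError). `parse_float` and `format_r` are hypothetical helpers, not functions from the GeneNetwork codebase.

```python
import math
from typing import Any, Optional


def parse_float(raw: Any) -> Optional[float]:
    """Convert a raw value to float, returning None for anything non-numeric."""
    if raw is None:
        return None
    try:
        value = float(raw)
    except (TypeError, ValueError):
        # e.g. "x", "", or other non-numeric placeholders
        return None
    # float() accepts "nan" and "inf"; treat them as missing as well
    return value if math.isfinite(value) else None


def format_r(r: Optional[float]) -> str:
    """Format a correlation for display without raising on None."""
    return "N/A" if r is None else f"{r:.3f}"
```

Keeping parsing and display separate means a bad value shows up as "N/A" in the table instead of aborting the whole results page.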
With Top 500 selected, I got the following:
I think the issue here is the order of operations: the system takes the top 500 results and then applies the given filters, rather than applying the filters first and then selecting the top 500 of the filtered results.
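To illustrate the ordering problem, here is a toy sketch in which each result carries a correlation `r` and a hypothetical filter field `n` (number of shared samples, with a hypothetical minimum of 10). Ranking by absolute r is also an assumption; the actual sort key and filters in GeneNetwork may differ. Taking the top N first and filtering afterwards can return fewer than N results, which matches the 477-of-500 symptom, while filtering first returns a full N whenever at least N results pass.

```python
import heapq
from typing import Any, Callable, Dict, List

# Hypothetical result record, e.g. {"trait": "T0", "r": 0.9, "n": 20}
Result = Dict[str, Any]


def top_n_filter_last(
    results: List[Result], n: int, keep: Callable[[Result], bool]
) -> List[Result]:
    """Problematic order: take the top n by |r|, then filter (may return fewer than n)."""
    ranked = heapq.nlargest(n, results, key=lambda rec: abs(rec["r"]))
    return [rec for rec in ranked if keep(rec)]


def top_n_filter_first(
    results: List[Result], n: int, keep: Callable[[Result], bool]
) -> List[Result]:
    """Fixed order: filter first, then take the top n by |r| from what remains."""
    filtered = [rec for rec in results if keep(rec)]
    return heapq.nlargest(n, filtered, key=lambda rec: abs(rec["r"]))
```

With 10 toy results where every other one fails the filter, `top_n_filter_last(results, 4, keep)` returns 2 records, while `top_n_filter_first(results, 4, keep)` returns 4.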
Pull requests to fix this issue:
An error was reported involving a ProbeSet vs. ProbeSet correlation (which is notable, since I thought the issue only occurred when non-ProbeSet datasets were involved).
Correlating against the default (same) dataset gives an "IndexError: list index out of range" error.