Edit this page | Blame

Correlations: Returning Fewer Results

Tags

  • assigned: alexm
  • priority: high
  • status: ongoing
  • keywords: correlations,fewer results

Description

in some cases correlaton return fewer number of results than are required

an example of such a case is computing the below dataset against the against BXD Phenotype, gets 477 results when you select Top 500

Notes

Updates

The samplelist issue doesn't appear to be causing the issue with fewer results, since it still exists after the fix. There also seems to be an additional - or related - issue where it's either returning wrong results or not returning the actual top results (or both)

Using the sample example above, after the change, the first result has a sample(r) of 0.265. This isn't the top result when run in GN1. There also appears to be a mismatch between the result displayed in the table and the r displayed in the scatterplot (what you see if you click the sample(r) link); those should be roughly the same.

An additional error has been reported by Beni where there's an error about NoneType being passed into string formatting (so I think it's returning None for some results). Steps to reproduce are below: - https://genenetwork.org/show_trait?trait_id=ENSMUST00000031535&dataset=UTHSC-BXD-Harv_Liv-1019 - Correlate against the default dataset (same one as the trait)

Notes 13/11/22

issue on handling non float values while parsing addressed on this Pr

Notes 2022-11-23

with a selection of top 500 results I got the following:

  • Without changing any of the filters, I get 500 results
  • Location: Chr: 5, Mb: 105 to 107 => I get 18 results
  • Location: Chr: 5, Mb: 70 to 150 => I get 327 results

I think the issue here is the sequence of events - the system takes the top 500 results, and then applies the given filters, rather than applying the filters first, then selecting the top 500 of the filtered results.

Pull requests to fix this issue:

Notes 2022-11-25

An error was reported involving a ProbeSet vs. ProbeSet correlation (which is notable since I thought the issue was only when non-ProbeSet datasets were involved).

with correlation against the default (same) dataset, gives a "IndexError: list index out of range" error

(made with skribilo)