
Simplify `dataset.py` in GeneNetwork2


  • assigned:
  • priority: medium
  • type: bug
  • status: pending
  • keywords: technical debt


The entire `dataset.py` file is a mess, and we need to break it up into smaller pieces of logic.

As part of this, the idea is to begin with the

and split it into various chunks that

  • compute the `self.sample_list`
  • retrieve `sample_ids` values from the database using the `self.sample_list` values computed above
  • retrieve `trait_sample_data` from the `sample_ids` retrieved above. This step can have a number of helper functions to compute the appropriate queries for each of the `dataset_type` values ("Publish", "Geno", "ProbeSet", "Temp")
  • compute the `self.trait_data` from the `trait_sample_data` above
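
One possible shape for the per-`dataset_type` query helpers is a small dispatch table. This is only a sketch: every function name below is hypothetical, and the query text is a placeholder rather than the actual GeneNetwork2 SQL.

```python
# Hypothetical sketch: none of these names exist in GeneNetwork2, and the
# query strings are placeholders standing in for the real SQL in dataset.py.


def _placeholders(sample_ids):
    """Return one '%s' parameter placeholder per sample id."""
    return ", ".join(["%s"] * len(sample_ids))


def publish_query(sample_ids):
    return f"<Publish query> WHERE sample_id IN ({_placeholders(sample_ids)})"


def geno_query(sample_ids):
    return f"<Geno query> WHERE sample_id IN ({_placeholders(sample_ids)})"


def probeset_query(sample_ids):
    return f"<ProbeSet query> WHERE sample_id IN ({_placeholders(sample_ids)})"


def temp_query(sample_ids):
    return f"<Temp query> WHERE sample_id IN ({_placeholders(sample_ids)})"


# One builder per dataset type, looked up by name instead of an if/elif chain.
QUERY_BUILDERS = {
    "Publish": publish_query,
    "Geno": geno_query,
    "ProbeSet": probeset_query,
    "Temp": temp_query,
}


def build_trait_sample_data_query(dataset_type, sample_ids):
    """Return (query, params) for the given dataset type."""
    try:
        builder = QUERY_BUILDERS[dataset_type]
    except KeyError:
        raise ValueError(f"Unknown dataset type: {dataset_type!r}") from None
    return builder(sample_ids), tuple(sample_ids)
```

Keeping each builder as a plain function means it can be tested without a database connection or an instance of the dataset class.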

We can split each of the steps above into one or more methods.

To help with moving away from using classes, we can ensure that each of these methods returns the values computed/retrieved (in addition to setting the class member variables).
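
As a rough illustration of methods that both return their result and set the member variable, the sketch below walks through the four steps. The class name, method names, injected fetch functions, and row layout are all assumptions made for the example; they are not taken from the existing code.

```python
# Hypothetical sketch: the class, its methods, and the row layout
# (trait name followed by one value per sample) are assumptions.


class DataSetSketch:
    def __init__(self, samples, fetch_sample_ids, fetch_trait_sample_data):
        self._samples = samples
        # The database access is injected so each step can be tested alone.
        self._fetch_sample_ids = fetch_sample_ids
        self._fetch_trait_sample_data = fetch_trait_sample_data
        self.sample_list = None
        self.trait_data = None

    def compute_sample_list(self):
        # Assumption: the sample list is simply a copy of the input samples.
        self.sample_list = list(self._samples)
        return self.sample_list

    def retrieve_sample_ids(self, sample_list):
        return self._fetch_sample_ids(sample_list)

    def retrieve_trait_sample_data(self, sample_ids):
        return self._fetch_trait_sample_data(sample_ids)

    def compute_trait_data(self, trait_sample_data):
        # Sets the member variable and also returns the value.
        self.trait_data = {row[0]: list(row[1:]) for row in trait_sample_data}
        return self.trait_data

    def get_trait_data(self):
        sample_list = self.compute_sample_list()
        sample_ids = self.retrieve_sample_ids(sample_list)
        rows = self.retrieve_trait_sample_data(sample_ids)
        return self.compute_trait_data(rows)


# Usage with made-up data in place of the database.
fake_ids = {"S1": 1, "S2": 2}
dataset = DataSetSketch(
    ["S1", "S2"],
    fetch_sample_ids=lambda names: [fake_ids[name] for name in names],
    fetch_trait_sample_data=lambda ids: [("trait_a", *[i * 1.5 for i in ids])],
)
trait_data = dataset.get_trait_data()
```

Because every step takes its inputs as arguments and returns its output, the methods could later be lifted out of the class into plain functions with little change.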
