
Speed Up QC on R/qtl2 Bundles

Tags

Description

The default format for the CSV files in an R/qtl2 bundle is:

matrix of individuals × (markers/phenotypes/covariates/phenotype covariates/etc.)

One or more files in the R/qtl2 bundle could, however, be transposed, which means the system needs to "un-transpose" the file(s) before processing.

Currently, the system does this by reading all the files of a particular type and then "un-transposing" the entire set at once, which makes processing very slow.
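For comparison, un-transposing a single file in isolation could look roughly like the sketch below. The function names are hypothetical, and the sketch assumes a rectangular CSV with no comment lines; it is not the system's current implementation.

```python
import csv
import io


def untranspose_rows(rows):
    """Swap rows and columns of a rectangular sequence of CSV rows."""
    rows = list(rows)
    # zip() would silently truncate ragged input, so reject it up front.
    if len({len(row) for row in rows}) > 1:
        raise ValueError("cannot un-transpose: rows have differing lengths")
    return [list(column) for column in zip(*rows)]


def untranspose_csv(text):
    """Hypothetical helper: un-transpose the CSV content of one file."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(untranspose_rows(reader))
    return out.getvalue()
```

Because each call only needs the content of one file, nothing here depends on the other files of the same type having been read.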

This issue proposes running the quality control/assurance (QC) processing on each file in isolation, where possible. This will allow parallelisation/multiprocessing of the QC checks.
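A minimal sketch of per-file QC with Python's standard `concurrent.futures` follows. The check itself (`check_rectangular`) and the `run_qc` helper are hypothetical stand-ins; the real QC checks are not specified in this issue.

```python
import csv
from concurrent.futures import ProcessPoolExecutor


def check_rectangular(path):
    """Hypothetical QC check: every row has as many fields as the header."""
    errors = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:
            return str(path), ["file is empty"]
        for row in reader:
            if len(row) != len(header):
                errors.append(
                    f"line {reader.line_num}: expected {len(header)} "
                    f"fields, got {len(row)}"
                )
    return str(path), errors


def run_qc(paths, check=check_rectangular, executor_cls=ProcessPoolExecutor):
    """Run `check` on each file independently; return {path: [errors]}.

    With a process pool, `check` must be a picklable, module-level function.
    """
    with executor_cls() as pool:
        return dict(pool.map(check, paths))
```

Since each check touches exactly one file, the work can be spread over worker processes without any shared state. On platforms that start workers with "spawn", the call to `run_qc` should sit under an `if __name__ == "__main__":` guard.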

The main considerations that need to be handled are as follows:

  • Do QC on (founder) genotype files (when present) before any of the other files
  • Genetic and physical maps (if present) can have QC run on them after the genotype files
  • Do QC on phenotype files (when present) after genotype files but before any other files
  • Covariate and phenotype covariate files come after the phenotype files
  • Cross information files … ?
  • Sex information files … ?
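One reading of the ordering above can be written as a dependency graph and turned into batches, where the file types within a batch have no ordering constraint between them. The file-type names below are borrowed from R/qtl2 control-file keys as an assumption, `qc_batches` is a hypothetical helper, and cross information and sex information files are deliberately left out because their position is still undecided.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# File type -> file types whose QC must finish first. Dependencies are
# listed transitively so the order still holds when a type is absent.
# Assumption: cross info and sex files are omitted (position undecided).
QC_DEPENDENCIES = {
    "geno": set(),
    "founder_geno": set(),
    "gmap": {"geno", "founder_geno"},
    "pmap": {"geno", "founder_geno"},
    "pheno": {"geno", "founder_geno"},
    "covar": {"geno", "founder_geno", "pheno"},
    "phenocovar": {"geno", "founder_geno", "pheno"},
}


def qc_batches(present):
    """Group the file types present in a bundle into ordered QC batches."""
    present = set(present)
    graph = {
        ftype: deps & present
        for ftype, deps in QC_DEPENDENCIES.items()
        if ftype in present
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

For a bundle with genotype, genetic map, phenotype and covariate files, this yields three batches: genotypes first, then the map and phenotype files, then the covariates.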

We should probably detail the types of QC checks done for each type of file.
