Speed Up QC on R/qtl2 Bundles

Description

The default format for the CSV files in an R/qtl2 bundle is:

matrix of individuals × (markers/phenotypes/covariates/phenotype covariates/etc.)

One or more files in the R/qtl2 bundle could, however, be transposed, which means the system needs to "un-transpose" the file(s) before processing.
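
To make "un-transposing" concrete, here is a minimal sketch of swapping rows and columns for a single CSV file. The function name `untranspose_csv` is hypothetical and not taken from the existing code. The sketch assumes a comma-separated file small enough to hold in memory, and it ignores any comment lines or alternative separators a bundle might configure.

```python
import csv
from pathlib import Path


def untranspose_csv(source: Path, destination: Path) -> None:
    """Write a copy of `source` with rows and columns swapped.

    Hypothetical helper for illustration only.
    """
    with source.open(newline="") as infile:
        rows = list(csv.reader(infile))

    # zip(*rows) would silently truncate ragged rows, so reject them first.
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"{source}: rows have differing field counts")

    with destination.open("w", newline="") as outfile:
        # zip(*rows) turns the i-th column of the input into the i-th row.
        csv.writer(outfile).writerows(zip(*rows))
```

Because the function only touches one file, it can be applied to each transposed file separately rather than to all files of a type at once.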

Currently, the system does this by reading all the files of a particular type and then "un-transposing" the combined data in one go. This makes the QC step very slow.

This issue proposes doing the quality control/assurance (QC) processing on each file in isolation, where possible. This would allow the QC checks to be parallelised across multiple processes.
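
As a rough illustration of per-file QC, the sketch below runs one check per file and hands the files to a pool of workers. Both `check_file` and `run_checks` are hypothetical names, and the check itself (comparing each row's field count with the header's) is only a stand-in for whatever checks the system actually performs.

```python
import csv
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def check_file(path: Path) -> list[str]:
    """Stand-in check: report rows whose field count differs from the header's."""
    errors = []
    with path.open(newline="") as infile:
        reader = csv.reader(infile)
        header = next(reader, None)
        if header is None:
            return [f"{path.name}: file is empty"]
        # Row 1 is the header, so data rows are numbered from 2.
        for row_number, row in enumerate(reader, start=2):
            if len(row) != len(header):
                errors.append(
                    f"{path.name}: row {row_number}: expected "
                    f"{len(header)} fields, found {len(row)}"
                )
    return errors


def run_checks(paths, executor_factory=ProcessPoolExecutor):
    """Check each file independently and return a {path: errors} mapping."""
    paths = list(paths)
    with executor_factory() as executor:
        # map() yields results in input order, so they line up with `paths`.
        return dict(zip(paths, executor.map(check_file, paths)))
```

The `executor_factory` parameter is a design choice for the sketch: it defaults to a process pool, but a `ThreadPoolExecutor` can be passed in for testing or for I/O-bound checks. When using the process pool from a script, the call to `run_checks` should sit under an `if __name__ == "__main__":` guard.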

The main considerations that need to be handled are as follows:

Do QC on (founder) genotype files (when present) before any of the other files
Genetic and physical maps (if present) can have QC run on them after the genotype files
Do QC on phenotype files (when present) after genotype files but before any other files
Covariate and phenotype covariate files come after the phenotype files
Cross information files … ?
Sex information files … ?
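
One possible way to encode the ordering above is as a dependency map, with the file types grouped into stages that can each be checked in parallel. This is only a sketch under assumptions: the labels (`geno`, `founder_geno`, `gmap`, `pmap`, `pheno`, `covar`, `phenocovar`) are used here purely as identifiers, the maps are assumed to depend only on the genotype files, and cross information and sex information files are left out because their position is still undecided.

```python
from graphlib import TopologicalSorter

# File type -> file types whose QC must finish first.
# This is an assumed reading of the considerations listed above.
# Cross information and sex information files are deliberately absent:
# where they belong is still an open question.
QC_DEPENDENCIES = {
    "geno": set(),
    "founder_geno": set(),
    "gmap": {"geno", "founder_geno"},
    "pmap": {"geno", "founder_geno"},
    "pheno": {"geno", "founder_geno"},
    "covar": {"pheno"},
    "phenocovar": {"pheno"},
}


def qc_stages(present):
    """Group the file types in `present` into ordered stages.

    Types within one stage do not depend on each other, so their files can
    be checked in parallel. Types not listed in QC_DEPENDENCIES are ignored.
    """
    present = set(present)
    graph = {
        ftype: deps & present  # drop dependencies on absent file types
        for ftype, deps in QC_DEPENDENCIES.items()
        if ftype in present
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages
```

For a bundle containing every listed type, this yields three stages: the genotype files, then maps and phenotypes, then the covariate files. Once the questions about cross and sex information files are settled, they could be added as further entries in `QC_DEPENDENCIES`.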

We should probably detail the types of QC checks done for each type of file.
