
Permutations

Currently we use gemma-wrapper to compute the significance level by shuffling the phenotype vector 1000x. As this is a lengthy procedure we have not incorporated it into the GN web service. The new bulklmm may work in certain cases (genotypes have to be complete, for one).
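
In outline, the procedure looks something like this (a minimal sketch, not gemma-wrapper's actual code; `run_gwa` stands in for a full GEMMA association scan returning p-values for all markers):

# Minimal sketch of the permutation test gemma-wrapper automates; run_gwa
# is a stand-in for a full GEMMA scan returning p-values for all markers.
import random

def significance_threshold(run_gwa, phenotypes, n=1000):
    minima = []
    for _ in range(n):
        shuffled = random.sample(phenotypes, len(phenotypes))
        minima.append(min(run_gwa(shuffled)))  # best hit = smallest p-value
    minima.sort()
    return minima[int(0.05 * n)]               # p-value at the 95% level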

Because of many changes, gemma-wrapper is currently not working for permutations. I have a few steps to take care of:

R/qtl2 and GEMMA formats

See

One-offs

Phenotypes

For a study Dave handed me phenotype and covariate files for the BXD. Phenotypes look like:

Record ID,21526,21527,21528,21529,21530,21531,21532,21537,24398,24401,24402,24403,24404,24405,24406,24407,24408,24412,27513,27514,27515,27516,27517
BXD1,18.5,161.5,6.5,1919.450806,3307.318848,0.8655,1.752,23.07,0.5,161.5,18.5,6.5,1919.450806,3307.318848,0.8655,1.752,0.5,32,1.5,1.75,2.25,1.25,50
BXD100,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x
BXD101,20.6,176.199997,4.4,2546.293945,4574.802734,1.729,3.245,25.172001,0.6,176.199997,20.6,4.4,2546.294189,4574.802734,1.7286,3.2446,0.6,32,1.875,2.375,2.75,1.75,38
BXD102,18.785,159.582993,6.167,1745.671997,4241.505859,0.771,2.216,22.796667,0.25,159.583328,18.785,6.166667,1745.672485,4241.506348,0.770667,2.216242,0.25,28.08333,1.5,2,2.875,1.5,28.5
...

which is close to the R/qtl2 format. GEMMA, meanwhile, expects a tab-delimited file where x=NA. You can pass in the phenotype column number with the -n switch. One thing GEMMA lacks is a leading ID column that has to align with the genotype file. The BIMBAM geno format, again, does not contain the IDs. See

What we need to do is create and use R/qtl2-format files, because they can be error-checked on IDs, and then convert those, again, to BIMBAM for use by GEMMA. In the past I wrote Python converters for gemma2lib:

I kinda abandoned the project, but you can see a lot of functionality, e.g.

We also have bioruby-table as a generic command line tool

which is amazingly flexible and can probably do the same. I kinda abandoned that project too. You know, bioinformatics is a graveyard of projects :/

OK, let's try. The first step is to convert the phenotype file to something GEMMA can use. We have to make sure that the individuals align with the genotype file(!). So, because we work with GN's GEMMA files, the steps are (a sketch follows the list):

  • [X] Read the JSON layout file - 'sample_list' is essentially the header of the BIMBAM geno file
  • [X] Use the R/qtl2-style phenotype file to write a correct GEMMA pheno file (multi column)
  • [X] Compare results with GN pheno output
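
A minimal sketch of that conversion (the file names are assumptions; the logic follows the steps above):

# Sketch: align Dave's R/qtl2-style CSV on the 'sample_list' of the
# genotype JSON and write a GEMMA pheno file (assumed file names).
import csv, json

# the JSON layout file: 'sample_list' is the header of the BIMBAM geno file
samples = json.load(open("BXD.geno.json"))["sample_list"]

# the R/qtl2-style CSV: first column is the sample ID, 'x' means missing
rows = {}
with open("BXD_pheno_Dave.csv") as f:
    reader = csv.reader(f)
    header = next(reader)
    for row in reader:
        rows[row[0]] = ["NA" if v == "x" else v for v in row[1:]]

# write one tab-delimited line per genotype column, in genotype-file order;
# the sample name goes in the first column (GEMMA tolerates it, see below)
with open("BXD_pheno_Dave-GEMMA.txt", "w") as out:
    for s in samples:
        out.write("\t".join([s] + rows.get(s, ["NA"] * (len(header) - 1))) + "\n")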

Running GEMMA by hand, it complained:

## number of total individuals = 235
## number of analyzed individuals = 26
## number of covariates = 1
## number of phenotypes = 1
## number of total SNPs/var        =    21056
## number of analyzed SNPs         =    21056
Calculating Relatedness Matrix ...
rsm10000000001, X, Y, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0.5, 0, 1, 0, 1, 0.5, 0, 1, 0, 0, 0, 1, 1, 0, 0.5, 1, 1, 0.5, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0.5, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0.5, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0.5, 0, 0, 0.5, 0, 1, 0, 1, 0, 0, 1, 0.5, 0, 1, 0, 0.5, 1, 1, 1, 1, 0.5, 0, 0, 0.5, 1, 0.5, 0.5, 0.5, 1, 0.5, 1, 0.5, 0.5, 0, 0, 0, 0.5, 1, 0.5, 0, 0, 0.5, 0, 0, 1, 0, 0.5, 1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5
237 != 235
WARNING: Columns in geno file do not match # individuals in phenotypes
ERROR: Enforce failed for not enough genotype fields for marker in src/gemma_io.cpp at line 1470 in BimbamKin

GEMMA on production is fine. So, I counted BXDs. For comparison, GN's pheno output has 241 BXDs. Dave's pheno file has 241 BXDs (good). But when using my script we get 235 BXDs. Ah, apparently they differ from what we use on GN because GN does not use the parents and the F1s for GEMMA. So, my script should complain when a match is not made. It turns out the JSON file only contains 235 'mappable' BXDs and refers to BXD.8, which is from Apr 26, 2023. The header says `BXD_experimental_DGA_7_Dec_2021` and GN says WGS March 2022. So which one is it? I'll just go with the latest, but genotype naming is problematic and the headers are not updated.

MOTTO: Always complain when there are problems!

Luckily GEMMA complained, but the script should have complained too. The JSON file with 235 genometypes does not represent the actual 237 genometypes. We'll work on that in the next section.

Meanwhile let's add this code to gemma-wrapper. The code can be found here:

Genotypes

The pheno script now errors with

ERROR: sets differ {'BXD065xBXD102F1', 'C57BL/6J', 'DBA/2J', 'BXD077xBXD065F1', 'D2B6F1', 'B6D2F1'}

Since these are parents and F1s, and are all NAs in Dave's phenotypes, they are easy to remove. So, now we have 235 samples in the phenotype file and 237 genometypes in the genotype file (according to GEMMA). A quick check shows that BXD.geno has 236 genometypes. Same for the BIMBAM file on production. We now have three values: 235, 236 and 237. The question is why these counts do not match.
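
The check behind that error is essentially a set comparison; a standalone sketch (assumed file names):

# Sketch: compare pheno samples against the genotype JSON; parents and
# F1s show up as the set difference (assumed file names).
import csv, json

geno_samples = set(json.load(open("BXD.geno.json"))["sample_list"])
with open("BXD_pheno_Dave.csv") as f:
    pheno_samples = {row[0] for row in csv.reader(f)}
pheno_samples.discard("Record ID")

extra = pheno_samples - geno_samples
if extra:
    raise SystemExit(f"ERROR: sets differ {extra}")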

Genotype probabilities for GEMMA

Another problem on production is that we are not using the standard GEMMA values. So GEMMA complains with

WARNING: The maximum genotype value is not 2.0 - this is not the BIMBAM standard and will skew l_lme and effect sizes

This explains why we divide the effect size by 2 in the GN production code. Maybe it is a better idea to fix the geno files!

  • [X] Generate BIMBAM file from GENO .geno files (via R/qtl2)
  • [X] Check bimbam files on production

So we need to convert the .geno files, as they are the current source of genotypes in GN and contain the sample names that we need to align with the pheno files. For this we'll output two files: a JSON file with metadata and sample names, and the actual BIMBAM file GEMMA requires. I notice that I never actually had the need to parse a .geno file! Zach wrote a tool `gn2/maintenance/convert_geno_to_bimbam.py` that also writes the GN JSON file and I'll take some ideas from that. We'll also need to convert to R/qtl2, as that is what Dave can use, and from there on to BIMBAM. So, let's add that code to gemma-wrapper again.
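
A converter along those lines might look like this (a sketch, not Zach's tool; the @metadata keys, tab separation and B/H/D/U codes follow the GN .geno layout, and the dummy X/Y alleles match the BIMBAM row GEMMA printed above; I use the standard 0/1/2 coding here to avoid the GEMMA warning above):

# Sketch of a .geno -> BIMBAM + JSON converter (assumed names and coding).
import json

CODE = {"B": "0", "H": "1", "D": "2", "U": "NA"}   # standard 0/1/2 coding

meta, samples, markers = {}, [], []
with open("BXD.geno") as f:
    for line in f:
        line = line.rstrip("\n")
        if line.startswith("@"):                    # e.g. @mat:B, @pat:D
            key, _, val = line[1:].partition(":")
            meta[key] = val
        elif line.startswith("#") or not line:
            continue
        elif line.startswith("Chr"):                # header has sample names
            samples = line.split("\t")[4:]
        else:
            chrom, locus, cm, mb, *genos = line.split("\t")
            markers.append((locus, genos))

# BIMBAM mean genotype file with the dummy X/Y alleles seen above
with open("BXD_geno.txt", "w") as out:
    for locus, genos in markers:
        out.write(", ".join([locus, "X", "Y"] +
                            [CODE.get(g, "NA") for g in genos]) + "\n")

# the JSON metadata file with the sample names for alignment
with open("BXD.geno.json", "w") as out:
    json.dump({"sample_list": samples, **meta}, out)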

This is another tool at

where the generated JSON file helps create the pheno file. We ended up with 237 genometypes/samples to match the genotype file, and all of Dave's samples matched. Also, now I was able to run GEMMA successfully, passing in the pheno column number with

gemma -gk -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5
gemma -lmm 9 -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -k output/result.cXX.txt -n 5

The pheno file can include the sample names, as long as there are no spaces in them. For marker rs3718618 we get the values `-9 0 X Y 0.317 7.930689e+02 1.779940e+02 1.000000e+05 7.532662e-05`. The last value translates to

-Math.log10(7.532662e-05) => 4.123051519468808

and that matches GN's run of GEMMA without LOCO.

The next step is to make the -n switch run with LOCO on gemma-wrapper.

./bin/gemma-wrapper --loco --json --  -gk -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 -a BXD.8_snps.txt > K.json
./bin/gemma-wrapper --keep --force --json --loco --input K.json -- -lmm 9 -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 -a BXD.8_snps.txt > GWA.json

Checking the output we get

-Math.log10(3.191755e-05) => 4.495970452606926

and that matches Dave's output for LOCO and marker rs3718618. All good, so far. Next step permute.

Permute

Now that we have gemma-wrapper working, we need to fix it to work with the latest type of files.

  • [X] Randomize phenotypes using the -n switch
  • [X] Permute gemma and collect results
  • [X] Unseed randomizer or make it an option
  • [X] Fix tmpdir
  • [X] Show final score
  • [ ] Compare small and large BXD set

For the first one, the --permute-phenotype switch takes the input pheno file. Because we pick the trait column with GEMMA's -n switch, we can simply shuffle whole input lines. So, in the above example, we shuffle BXD_pheno_Dave-GEMMA.txt. Interestingly, it looks like we are already shuffling by line in gemma-wrapper.
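
Shuffling by line is trivial (a sketch; the output file name is made up). Because GEMMA aligns pheno rows positionally with the genotype columns, shuffling whole lines randomizes the genotype/phenotype pairing:

# Sketch: permute a GEMMA pheno file by shuffling whole lines.
import random

with open("BXD_pheno_Dave-GEMMA.txt") as f:
    lines = f.readlines()
random.shuffle(lines)
with open("phenotypes-shuffled.txt", "w") as out:  # hypothetical name
    out.writelines(lines)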

The good news is that it runs, but the outcome is wrong:

["95 percentile (significant) ", 1000.0, -3.0]
["67 percentile (suggestive)  ", 1000.0, -3.0]

Inspecting the phenotype files shows they are indeed shuffled, e.g.

BXD073xBXD065F1 NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
BXD49 NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
BXD86 NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
BXD161  15.623  142.908997  4.0 2350.637939 3294.824951 1.452 2.08  20.416365 0.363636  142.909088  15.622727 4.0 2350.638672 3294.825928 1.451636  2.079909  0.363636  33.545448 2.125 2.0 2.375 1.25  44.5
BXD154  20.143  195.5 4.75  1533.689941 4568.76416  0.727 2.213748  27.9275 0.75  195.5 20.142857 4.75  1533.690796 4568.76416  0.72675 2.213748  0.75  54.5  0.75  1.75  3.0 1.5 33.0

which brings out an interesting point. Most BXDs in the genotype file are missing from this experiment. We are computing LOD scores as if we have a full BXD population. So, what we are saying here is: if we have all BXD genotypes and we randomly assign phenotypes against a subset, what is the chance we get a hit at random? I don't think this is a bad assumption, but it is not exactly what Gary Churchill had in mind in his 1994 paper:

The idea is to shuffle phenotypes against genotypes: if there is a true correlation we get a hit, and permuting breaks that correlation. That should work for both the large and the small BXD set. Scoring the best 'random' result out of 1000 permutations and taking, say, the 95th percentile sets the significance level. With our new precompute we should be able to show the difference. Anyway, that is one problem; the other is that the stats somehow do not add up to the final result. Score min is set at

The next line says 'if false'. Alright, that explains part of it at least, as the next block was disabled for slurm and is never run. I should rip the slurm stuff out, actually, as Arun has come up with a much better solution. But that is for later.

After disabling that, the permutation run stopped with

Add parallel job: time -v /bin/gemma -loco X -k 02fe8482913a998e6e9559ff5e3f1b89e904d59d.X.cXX.txt.cXX.txt -o 55b49eb774f638d16fd267313d8b4d1d6d2a0a25.X.assoc.txt -p phenotypes-1 -lmm 9 -g BXD-test.txt -n 5 -a BXD.8_snps.txt -outdir /tmp/d20240823-4481-xfrnp6
DEBUG: Reading 55b49eb774f638d16fd267313d8b4d1d6d2a0a25.X.assoc.txt.1.assoc.txt
./bin/gemma-wrapper:672:in `foreach': No such file or directory @ rb_sysopen - 55b49eb774f638d16fd267313d8b4d1d6d2a0a25.X.assoc.txt.1.assoc.txt (Errno::ENOENT)

so it created a file, but can't find it because outdir is not shared. Now tmpdir is in the outer block, so the file should still exist. For troubleshooting, the first step is to seed the randomizer so we get the same run every time. It turns out there are a number of problems. First of all, the permutation output files were numbered and the result was not found. Fixing that gave a first result without the --parallel switch:

[0.0008489742, 0.03214928, 0.03426648, 0.0351207, 0.0405179, 0.04688354, 0.0692488, 0.1217158, 0.1270747, 0.1880325]
["95 percentile (significant) ", 0.0008489742, 3.1]
["67 percentile (suggestive)  ", 0.0351207, 1.5]

That is pleasing and it suggests that we have a significant result for the trait of interest: `volume of the first tumor that developed`. Running LOCO without parallel is slow (how did we survive in the past!).

The 100-permutation run shows

[0.0001626146, 0.0001993085, 0.000652191, 0.0007356249, 0.0008489742, 0.0009828207, 0.00102203, 0.001091924, 0.00117823, 0.001282312, 0.001471041, 0.001663572, 0.001898194, 0.003467039, 0.004655921, 0.005284387, 0.005628393, 0.006319995, 0.006767502, 0.007752473, 0.008757406, 0.008826192, 0.009018125, 0.009735282, 0.01034488, 0.01039465, 0.0122644, 0.01231366, 0.01265093, 0.01317425, 0.01348443, 0.013548, 0.01399461, 0.01442383, 0.01534904, 0.01579931, 0.01668551, 0.01696015, 0.01770371, 0.01838937, 0.01883068, 0.02011034, 0.02234977, 0.02362105, 0.0242342, 0.02520063, 0.02536663, 0.0266905, 0.02932001, 0.03116032, 0.03139836, 0.03176087, 0.03214928, 0.03348359, 0.03426648, 0.0351207, 0.03538503, 0.0354338, 0.03609931, 0.0371134, 0.03739827, 0.03787489, 0.04022586, 0.0405179, 0.04056273, 0.04076034, 0.04545012, 0.04588635, 0.04688354, 0.04790254, 0.05871501, 0.05903692, 0.05904868, 0.05978341, 0.06103624, 0.06396175, 0.06628317, 0.06640048, 0.06676557, 0.06848021, 0.0692488, 0.07122914, 0.07166011, 0.0749728, 0.08174019, 0.08188341, 0.08647539, 0.0955264, 0.1019648, 0.1032776, 0.1169525, 0.1182405, 0.1217158, 0.1270747, 0.1316735, 0.1316905, 0.1392859, 0.1576149, 0.1685975, 0.1880325]
["95 percentile (significant) ", 0.0009828207, 3.0]
["67 percentile (suggestive)  ", 0.01442383, 1.8]

Not too far off!

The command was

./bin/gemma-wrapper --debug --no-parallel --keep --force --json --loco --input K.json --permutate 100 --permute-phenotype BXD_pheno_Dave-GEMMA.txt -- -lmm 9 -g BXD-test.txt -n 5 -a BXD.8_snps.txt

It is fun to see that when I did a second run the

[100, ["95 percentile (significant) ", 0.0002998286, 3.5], ["67 percentile (suggestive)  ", 0.01167864, 1.9]]

significance value was 3.5. Still, our hit is a whopper.
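
For what it is worth, the reported thresholds are consistent with sorting the smallest p-value of every permuted run and indexing at floor(0.05*n) and floor(0.33*n); checking against the ten-run list above:

# Quick check of how the thresholds fall out of the sorted per-run minimum
# p-values (indexing rule inferred from the printed lists above).
import math

minima = sorted([0.0008489742, 0.03214928, 0.03426648, 0.0351207,
                 0.0405179, 0.04688354, 0.0692488, 0.1217158,
                 0.1270747, 0.1880325])
n = len(minima)
significant = minima[int(0.05 * n)]   # 0.0008489742 -> LOD 3.1
suggestive  = minima[int(0.33 * n)]   # 0.0351207    -> LOD 1.5
print(round(-math.log10(significant), 1), round(-math.log10(suggestive), 1))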

Run permutations in parallel

Next I introduced and fixed parallel support for permutations; now we can run GEMMA LOCO at decent speed, about one permutation per 3s! That is one trait per hour on my machine.

Now that we can run 1000 permutations in an hour, rerunning the above we get

["95 percentile (significant) ", 0.0006983356, 3.2]
["67 percentile (suggestive)  ", 0.01200505, 1.9]

which shows that 100 permutations are not enough. It is a bit crazy to think that 5% of randomized phenotypes will get a LOD score of 3.2 or higher!

Down the line I can use Arun's CWL implementation to fire this on a cluster. Coming...

Reduce genotypes for permutations

In the next phase we need to check whether shuffling the full set of BXDs makes sense for computing permutations. Since I wrote a script to transform BIMBAM genotypes for this exercise, I can reuse that:

If we check the sample names we can write a reduced genotype matrix. Use that to compute the GRM. Next permute with the smaller BXD sample set and genotypes.

Instead of modifying the above script I decided to add another one:

bimbam-filter.py --json BXD.geno.json --sample-file BXD_pheno_Dave-GEMMA-samples.txt BXD_geno.txt > BXD_geno-samples.txt

which takes as inputs the JSON file from gn-geno-to-gemma and the GEMMA genotype file. This avoids mixing targets and keeps the code simple.
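
The filter itself is little more than column selection (a sketch of the logic, not the actual script):

# Sketch of bimbam-filter.py's logic: keep only the genotype columns whose
# sample names, taken from the gn-geno-to-gemma JSON, appear in the sample file.
import json

samples = json.load(open("BXD.geno.json"))["sample_list"]
with open("BXD_pheno_Dave-GEMMA-samples.txt") as f:
    keep = {line.split()[0] for line in f if line.strip()}
cols = [i for i, s in enumerate(samples) if s in keep]

# BIMBAM mean genotype format: marker, allele1, allele0, value per sample
for line in open("BXD_geno.txt"):
    fields = [v.strip() for v in line.split(",")]
    meta, values = fields[:3], fields[3:]
    print(", ".join(meta + [values[i] for i in cols]))

Now create the GRM with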

./bin/gemma-wrapper --loco --json --  -gk -g BXD_geno-samples.txt -p BXD_pheno_Dave-GEMMA-samples.txt -n 5 -a BXD.8_snps.txt > K-samples.json
./bin/gemma-wrapper --keep --force --json --loco --input K-samples.json -- -lmm 9 -g BXD_geno-samples.txt -p BXD_pheno_Dave-GEMMA-samples.txt -n 5 -a BXD.8_snps.txt > GWA-samples.json

Now the hit got reduced:

-Math.log10(1.111411e-04)
=> 3.9541253091741235

and with 1000 permutations

./bin/gemma-wrapper --debug --parallel --keep --force --json --loco --input K-samples.json --permutate 1000 --permute-phenotype BXD_pheno_Dave-GEMMA-samples.txt -- -lmm 9 -g BXD_geno-samples.txt -n 5 -a BXD.8_snps.txt
["95 percentile (significant) ", 0.0004184217, 3.4]
["67 percentile (suggestive)  ", 0.006213012, 2.2]

we are still significant, though the question now is why the results differ so much compared to using the full BXD genotypes.

Why do we have a difference with the full BXD genotypes?

GEMMA strips the missing phenotypes from the list; only the actual phenotypes are used. We need to check how the GRM is used and which genotypes GEMMA uses. For the GRM, the small genotype file compares against the large one:

Samples           small    large
BXD1  <->  BXD1   0.248    0.253
BXD24 <->  BXD24  0.255    0.248
BXD1  <->  BXD24 -0.040   -0.045
BXD1  <->  BXD29  0.010    0.009

You can see there is a small difference in the computation of K, even though the values are pretty close. This is logical: with the full BXD set all genotypes are used, while with the smaller BXD set only those genotypes are used. We expect a difference in values, but not much of a difference in magnitude (a shift). The only way to prove that K impacts the outcome is to take the larger matrix and reduce it to the smaller one using those values. I feel another script coming ;)
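
That script boils down to selecting rows and columns from the full kinship matrix; a sketch (the sample selection is an assumption, the output names match the full-bxd runs below):

# Sketch: reduce the full-BXD GRM to the rows/columns of the phenotyped
# samples, preserving the full-population kinship values.
import json
import numpy as np

samples = json.load(open("BXD.geno.json"))["sample_list"]
with open("BXD_pheno_Dave-GEMMA-samples.txt") as f:
    keep = {line.split()[0] for line in f if line.strip()}
idx = [i for i, s in enumerate(samples) if s in keep]

K = np.loadtxt("output/full-bxd.cXX.txt")      # GRM from gemma -gk
np.savetxt("output/full-reduced-bxd.cXX.txt",
           K[np.ix_(idx, idx)], fmt="%.6f", delimiter="\t")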

The above numbers are without LOCO. With LOCO on CHR18:

Samples            small    large
BXD1  <->  BXD1    0.254    0.248
BXD1  <->  BXD24  -0.037    -0.042

again a small shift. OK, let's try computing with a reduced matrix and compare results for rs3718618. Example:

gemma -gk -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 -a BXD.8_snps.txt -o full-bxd
gemma -lmm 9 -k output/full-bxd.cXX.txt -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 -a BXD.8_snps.txt -o full-bxd

we get three outcomes where full-bxd is the full set,

output/full-bxd.assoc.txt:18              rs3718618 7.532662e-05
output/full-reduced-bxd.assoc.txt:18      rs3718618 2.336439e-04
output/small-bxd.assoc.txt:18             rs3718618 2.338226e-04

even without LOCO you can see a huge jump for the full BXD kinship matrix, just looking at our hit rs3718618:

-Math.log10(7.532662e-05)
=> 4.123051519468808
-Math.log10(2.338226e-04)
=> 3.631113514641496

With LOCO the difference may be even greater.

So, which one to use? The truth is that the GRM is a blunt instrument. Essentially every combination of two samples/strains/genometypes gets compressed into a single number that gives a distance between the genomes. This number represents a hierarchy of relationships computed from differences in DNA (haplotypes) between those individuals. The more DNA variation is represented in the calculation, the more 'fine-tuned' the GRM becomes. Intuitively, the larger matrix, i.e. the full BXD population, is a better estimate of the distance between individuals than one based on just a subset of the DNA.

So, I still stand by using the full BXD set for computing the GRM. For running GEMMA, I have just shown we can use the reduced GRM, which will be quite a bit faster too, as the results are the same. For permutations we *should* use the reduced form of the full BXD GRM, as it does not make sense to shuffle phenotypes against BXDs we don't use. So I need to recompute that.

Recomputing significance with the reduced GRM matrix

  • [ ] Recompute significance with reduced GRM

I can reuse the script I wrote for the previous section.

So, the idea is to rerun permutations with the small set, but with the reduced GRM from the full BXD population. That ought to be straightforward by using the new matrix as an input for GWA. The only problem is that LOCO generates a GRM for every chromosome, so we need to make gemma-wrapper aware of the matrix reduction. As the reduction is fast we can do it on every run of gemma-wrapper and destroy the result automatically with tmpdir. So:

  • [X] Compute the full GRM for every LOCO (if not cached) - already part of gemma-wrapper
  • [ ] Run through GRMs and reduce them in tmpdir (see the sketch after this list)
  • [ ] Plug new GRM name into computations - which really updates the JSON file that is input for GWA
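
The reduction step might look like this (a sketch only: the layout of K.json, in particular the per-chromosome GRM mapping under "grm", is an assumption):

# Sketch: reduce every LOCO GRM listed in K.json in the tmpdir and rewrite
# the control JSON so GWA picks up the reduced matrices.
import json
import numpy as np

control = json.load(open("K.json"))

# indices of the reduced sample set, computed as in the previous script
samples = json.load(open("BXD.geno.json"))["sample_list"]
with open("BXD_pheno_Dave-GEMMA-samples.txt") as f:
    keep = {line.split()[0] for line in f if line.strip()}
idx = [i for i, s in enumerate(samples) if s in keep]

for chrom, grm_file in control["grm"].items():  # hypothetical JSON layout
    K = np.loadtxt(grm_file)
    reduced = grm_file + ".reduced.txt"
    np.savetxt(reduced, K[np.ix_(idx, idx)], fmt="%.6f", delimiter="\t")
    control["grm"][chrom] = reduced             # plug new GRM into the JSON

json.dump(control, open("K-reduced.json", "w"))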

The interesting bit is that GEMMA requires input of phenotypes, but does not use them to compute the GRM.

After giving it some thought we want GRM reduction to work in production GN because of the speed benefit. That means modifying gemma-wrapper to take a list of samples/genometypes as input - and we'll output that with GN. It is a good idea anyhow because it can give us some improved error feedback down the line.

We'll use the --input switch to gemma-wrapper, providing the full list of genometypes that is used to compute the GRM and the 'reduced' list of genometypes that is used to reduce the GRM and compute GWA after. So the first step is to create this JSON input file. We already created the "gn-geno-to-gemma" output that has a full list of samples as parsed from the GN .geno file. Now we need a script to generate the reduced samples JSON and merge that into "gn-geno-to-gemma-reduced" by adding a "samples-reduced" vector.

The rqtl2-pheno-to-gemma.py script I wrote above already takes the "gn-geno-to-gemma" JSON. It now adds to the JSON:

  "samples-column": 2,
  "samples-reduced": {
    "BXD1": 18.5,
    "BXD24": 27.510204,
    "BXD29": 17.204,
    "BXD43": 21.825397,
    "BXD44": 23.454,
    "BXD60": 22.604,
    "BXD63": 19.171,
    "BXD65": 21.607,
    "BXD66": 17.056999,
    "BXD70": 17.962999,
    "BXD73b": 20.231001,
    "BXD75": 19.952999,
    "BXD78": 19.514,
    "BXD83": 18.031,
    "BXD87": 18.258715,
    "BXD89": 18.365,
    "BXD90": 20.489796,
    "BXD101": 20.6,
    "BXD102": 18.785,
    "BXD113": 24.52,
    "BXD124": 21.762142,
    "BXD128a": 18.952,
    "BXD154": 20.143,
    "BXD161": 15.623,
    "BXD210": 23.771999,
    "BXD214": 19.533117
  },
  "numsamples-reduced": 26

which is kinda cool because now I can reduce and write the pheno file in one go. Implementation:

OK, we are going to input the resulting JSON file into gemma-wrapper. At the GRM stage we ignore the reduction but we need to add these details to the outgoing JSON. So the following commands can run:

./bin/gemma-wrapper --loco --json --input BXD_pheno_Dave-GEMMA.txt.json -- -gk -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 -a BXD.8_snps.txt > K.json

where K.json has a json["input"] which is essentially the above structure.

./bin/gemma-wrapper --keep --force --json --loco --input K.json -- -lmm 9 -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 -a BXD.8_snps.txt > GWA.json

Now I have to deal with phenotype files as they are rewritten. We should still cater for `-p` for GEMMA. We already have `--permute-phenotype file` for gemma-wrapper. Now we are adding `--phenotypes` to gemma-wrapper, which replaces both! Note that we can use -p if --phenotypes is NOT defined. The problem is that we have a few paths now:

  • [X] Check phenotypes are directly passed into GEMMA with -p switch
  • [ ] Check phenotypes are passed in as a file with --phenotypes switch
  • [ ] Check phenotypes are coming in using the JSON file

Fixed the first one with

though that does not do caching (yet). Next, while doing LOCO, I noticed xz was phenomenally slow. It turned out not to be xz: when using `tar -C` we switch into the path and somehow xz kept growing its output.

Dealing with epoch

Rob pointed out that the GRM does not necessarily represent epoch, and that may influence the significance level, i.e. we should check for that. I agree that the GRM distances are not precise enough (a blunt instrument) to capture the few variants that appeared in a new epoch of mice: the mice from the 90s may differ from the mice of today in a few DNA variants that won't be reflected in the GRM.

  • [ ] Deal with epoch

We have two or more possible solutions to deal with hierarchy in the population.

Covariates

  • [ ] Try Dave's covariates

Later

  • [ ] Fix running of -p switch when assoc cache exists (bug)