Edit this page | Blame

Partial Correlation

Description

Go to

GN1 on Lily

Search for traits 'synap*'

Select all and Add in search results.

Pick 3 and hit 'Partial'

Put one each in X, Y and Z columns

And compute against database (lower half).

That gives you a list of hits.

Members

fredm
pjotp
alex

Notes

2021-10-15: fredm

#### UI Exploration

The example in the description above says to pick 3 traits from the search results to use for the partial correlation.

If you, however, find need to pick more than 3 traits, the following things are of note:

There can only ever be a single **Primary** trait: if the user selects more than one, indicate an error
At least one control trait MUST be selected. If there is no selected control trait, indicate an error
There can only ever be a maximum of 3 **Control** traits: if the user selects more than three, indicate an error
For **Pearson's r** and **Spearman's rho** computations (the buttons immediately below the table), there must be at least one Target trait. If the user does not provide any, indicate an error

In the lower half, the user can select the database against which the partial correlations are to be run. I still need to figure what this is about - probably read the code and ask a lot of questions as necessary.

The user can also select what to calculate, from the following list:

Genetic Correlation, Pearson's r
Genetic Correlation, Spearman's rho
SGO Literature Correlation
Tissue Correlation, Pearson's r
Tissue Correlation, Spearman's rho

There is no indication,on the UI, of the difference between the **Pearson's r** and **Spearman's rho** with the buttons, and those in the list in the lower half of the page.

The user can also select the total number of results to return

##### Possible Correlation Errors

Tissue Correlation Error
Literature Correlation Error

Maybe there is even a **Genetic Correlation Error**, but I am yet to run into it - I am simply extrapolating from my exploration with the existing system.

#### Code Exploration

These are some of the (possibly) relevant files in GN1 unearthed so far:

/web/webqtl/main.py: Entry-point -> goes to SearchResultPage then the ShowTraitPage/Collections page
/web/webqtl/correlation/PartialCorrInputPage.py: from collections on clicking "Partial"
/web/webqtl/cmdLine/cmdPartialCorrelationPage.py: from PartialCorrInputPage on clicking "Calculate" in the form in the second half of the page
/web/webqtl/correlation/PartialCorrTraitPage.py: from PartialCorrInputPage on clicking "Pearson's r" or "Spearman's rho" buttons in the first half of the page
/web/webqtl/correlation/PartialCorrDBPage.py: from cmdPartialCorrelationPage: form data is saved to file and used via CLI to compute the correlations - This seems to be the results page

The relevant javascript files used on GN1 are:

webqtl.js: see `databaseFunc` and `validateTrait` functions
jqueryFunction.js: see `validateTraitNumber` function

2021-10-18: fredm

#### UI Explorations

Some extra notes on UI that were not noted down:

A trait can ONLY EVER BE in one category, i.e either in **Primary**, **Control**, **Target** or **Ignored** but not in more that one of those categories.

#### Code Exploration

**RISet** values should be interpreted as **group** **strain** values should be interpreted as **sample**

There's some UI setup code after selection of the traits to use for the partial correlation. The UI setup code will probably be migrated to GN2, leaving the heavy-lifting "Partial Correlations" code, that I assume exists, for migration to GN3.

For GN3, it seems like we can simply reuse some of the trait-retrieval code migrated over when working on the clustered heatmaps.

2021-12-22: fredm: Analysis of profiling information

The greatest amount of time is spent in the database accessing code. The two biggest culprits are:

~gn3.db.traits.retrieve_trait_info~
~gn3.db.datasets.retrieve_trait_dataset~

Possible optimisation ideas are as follows:

Most traits share common datasets. For all traits sharing a common dataset, it does not make sense to fetch the dataset multiple times. Instead, we should be able to fetch it once and share among all the traits that are from that dataset.
We can also try fetching the traits in a single call, instead of looping through them, fetching one at a time.