Edit this page | Blame

Phenotype Correlation Error

Tags

  • type: bug
  • priority: high
  • status: closed
  • assigned: fredm, zachs
  • keywords: correlation

Fixed: Correlation against phenotypes fails (for at least some datasets)

Example: Run a correlation against BXD Published Phenotypes (at the top of the drop-down menu) from here -

The bug appears to occur in the rust correlation tool, so I'm not sure how to debug it myself. The last few linnes of the stack trace are as follows:

  File "/export2/local/home/zas1024/gn2-zach/gene/wqflask/wqflask/correlation/rust_correlation.py", line 262, in __compute_sample_corr__
    return run_correlation(
  File "/usr/local/guix-profiles/gn-latest-20220820/lib/python3.9/site-packages/gn3/computations/rust_correlation.py", line 58, in run_correlation
    subprocess.run(command_list, check=True)
  File "/gnu/store/qar3sks5fwzm91bl3d3ngyrvxs7ipj5z-python-3.9.9/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/local/guix-profiles/gn-latest-20220820/bin/correlation_rust', '/home/zas1024/gn2-zach/tmp/gn2/correlation/IoaglmTgDJ.json', '/home/zas1024/gn2-zach/tmp/gn2']' died with <Signals.SIGSEGV: 11>.

Fixed: Processing for Output too Early

After fixing the issues with the interactions with the rust correlations code, I am now running into the following error when I run a correlation against the "Hippocampus Consortium M430v2 (Jun06) PDNN" dataset with the same trait from the URI above:

Traceback (most recent call last):
  File "/home/frederick/opt/gn_profiles/gn2_latest/lib/python3.9/site-packages/flask/app.py", line 1523, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/frederick/opt/gn_profiles/gn2_latest/lib/python3.9/site-packages/flask/app.py", line 1509, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/views.py", line 820, in corr_compute_page
    correlation_results = set_template_vars(request.form, correlation_results)
  File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/correlation/show_corr_results.py", line 54, in set_template_vars
    table_json = correlation_json_for_table(correlation_data,
  File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/correlation/show_corr_results.py", line 104, in correlation_json_for_table
    target_trait_ob = create_trait(dataset=target_dataset_ob,
  File "/home/frederick/genenetwork/genenetwork2/wqflask/base/trait.py", line 44, in create_trait
    the_trait = retrieve_trait_info(
  File "/home/frederick/genenetwork/genenetwork2/wqflask/base/trait.py", line 599, in retrieve_trait_info
    raise KeyError(repr(trait.name)
KeyError: "'1' information is not found in the database."

The error above was caused by processing the data for output way too early. This has been fixed.

Tissue Correlation: Probeset Trait Against Publish/Genotype Dataset

Running "Tissue" correlations on

against the "BXD Published Phenotypes" database fails with the error:

This also fails if you run it against the "BXD Genotypes" dataset.

Traceback (most recent call last):
  File "/usr/local/guix-profiles/gn-latest-20220820/lib/python3.9/site-packages/flask/app.py", line 1523, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/guix-profiles/gn-latest-20220820/lib/python3.9/site-packages/flask/app.py", line 1509, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/gn2/production/gene/wqflask/wqflask/views.py", line 820, in corr_compute_page
    correlation_results = set_template_vars(request.form, correlation_results)
  File "/home/gn2/production/gene/wqflask/wqflask/correlation/show_corr_results.py", line 54, in set_template_vars
    table_json = correlation_json_for_table(correlation_data,
  File "/home/gn2/production/gene/wqflask/wqflask/correlation/show_corr_results.py", line 104, in correlation_json_for_table
    target_trait_ob = create_trait(dataset=target_dataset_ob,
  File "/home/gn2/production/gene/wqflask/base/trait.py", line 44, in create_trait
    the_trait = retrieve_trait_info(
  File "/home/gn2/production/gene/wqflask/base/trait.py", line 599, in retrieve_trait_info
    raise KeyError(repr(trait.name)
KeyError: "'1422223_at' information is not found in the database."

so far, triangulated the issue to possibly being the fact that the "target_dataset" value is not used

Literature Correlation: Probeset Trait Against Publish/Genotype Dataset

Run literature correlation for

against the "BXD Published Phenotype" database and observe the following exception:

This also fails if you run it against the "BXD Genotypes" dataset.

ERROR:wqflask:http://localhost:5033/corr_compute ( 4:26AM UTC Oct 03, 2022)
Traceback (most recent call last):
  File "/home/frederick/opt/gn_profiles/gn2_latest/lib/python3.9/site-packages/flask/app.py", line 1523, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/frederick/opt/gn_profiles/gn2_latest/lib/python3.9/site-packages/flask/app.py", line 1509, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/views.py", line 818, in corr_compute_page
    correlation_results = compute_correlation(request.form, compute_all=True)
  File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/correlation/correlation_gn3_api.py", line 199, in compute_correlation
    return compute_correlation_rust(
  File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/correlation/rust_correlation.py", line 326, in compute_correlation_rust
    results = corr_type_fns[corr_type](
  File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/correlation/rust_correlation.py", line 299, in __compute_lit_corr__
    (this_trait_geneid, geneid_dict, species) = do_lit_correlation(
  File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/correlation/correlation_gn3_api.py", line 237, in do_lit_correlation
    geneid_dict = this_dataset.retrieve_genes("GeneId")
AttributeError: 'PhenotypeDataSet' object has no attribute 'retrieve_genes'

The literature correlations computation calls the `retrieve_genes` method, that is only present in the `base.data_set.mrnaassaydataset.MrnaAssayDataSet` class, which handles traits of type "ProbeSet".

The code seems to imply that we should not run literature correlations against any dataset that is not of type "ProbeSet".

Some Reflections

The `target_dataset` is not used in the

In my (fredm) work on partial correlations, before doing the computations,

that were run.

Should these be present for the full correlations too?

The failures above with the Publish/Genotype datasets implies one of two things:

  • The code is not general enough, or
  • We need to handle the exceptions, and present the selection errors as appropriate.

Better yet, we should probably not present invalid data to the user, i.e. do not present user with a dataset which would lead to errors if a correlation of a particular type is run against it with the given trait.

Trial Against GN1

@zsloan @alexm: Running the failing tissue and literature correlations above with the same trait against the "BXD Published Phenotypes" and the "BXD Genotypes" on

I got the error

Wrong correlation type

    Sorry! Error occurred while processing your request.

    The nature of the error generated is as follows:

    Correlation Type Error :

        It is not possible to compute the Tissue Correlation (Pearson's r) between your trait and data in this BXDGeno database. Please try again after selecting another type of correlation.

for the tissue correlations and

Wrong correlation type

    Sorry! Error occurred while processing your request.

    The nature of the error generated is as follows:

    Correlation Type Error :

        It is not possible to compute the SGO Literature Correlation between your trait and data in this BXDPublish database. Please try again after selecting another type of correlation.

for the literature correlations.

My initial hunch was correct. We should not be running the tissue and literature correlations in the way we were in the cases above.

We now need to check for these combinations and display an error for the user, as is done in GN1

The error reported above

raise KeyError(
KeyError: "'1' information is not found in the database for dataset 'HC_M2_0606_P' with id '112'."

causes the correlation below to fail for maintainability and to fix current bugs this code that does preprocessing of data needs to be modified thats is :-

  • Tissue correlation data
  • top n sample correlation data
  • top n tissue correlation data

Tags

  • assigned: alexm, fredm, zsloan
  • type: bug
  • keywords: correlations
  • status: closed, completed
  • priority: high

Notes

  • 2022-09-29: Successfully reproduced on production
  • 2022-09-29: Fix file format issues
  • 2022-09-30: Fix issues in rust correlation code
  • 2022-10-03: Fix: avoid processing for output early
(made with skribilo)