Example: Run a correlation against BXD Published Phenotypes (at the top of the drop-down menu) from here -
The bug appears to occur in the Rust correlation tool, so I'm not sure how to debug it myself. The last few lines of the stack trace are as follows:
  File "/export2/local/home/zas1024/gn2-zach/gene/wqflask/wqflask/correlation/rust_correlation.py", line 262, in __compute_sample_corr__
    return run_correlation(
  File "/usr/local/guix-profiles/gn-latest-20220820/lib/python3.9/site-packages/gn3/computations/rust_correlation.py", line 58, in run_correlation
    subprocess.run(command_list, check=True)
  File "/gnu/store/qar3sks5fwzm91bl3d3ngyrvxs7ipj5z-python-3.9.9/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/local/guix-profiles/gn-latest-20220820/bin/correlation_rust', '/home/zas1024/gn2-zach/tmp/gn2/correlation/IoaglmTgDJ.json', '/home/zas1024/gn2-zach/tmp/gn2']' died with <Signals.SIGSEGV: 11>.
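For anyone debugging from the Python side: the trace above shows `subprocess.run(command_list, check=True)` raising `CalledProcessError` when the child dies. On POSIX, a child killed by a signal is reported with a negative `returncode`, so the wrapper can at least tell a crash apart from an ordinary non-zero exit. The following is only a sketch; `describe_failure` and `run_checked` are hypothetical helper names, not functions that exist in gn3.

```python
import signal
import subprocess


def describe_failure(error: subprocess.CalledProcessError) -> str:
    """Summarise why a child process failed (hypothetical helper)."""
    code = error.returncode
    if code < 0:
        # On POSIX, a negative return code means the child was killed by
        # signal number -code (for example -11 for SIGSEGV).
        return f"killed by {signal.Signals(-code).name}"
    return f"exited with status {code}"


def run_checked(command_list):
    """Run a command; return None on success, else a failure summary."""
    try:
        subprocess.run(command_list, check=True)
    except subprocess.CalledProcessError as error:
        return describe_failure(error)
    return None
```

With something like this in place, a segfault in the external binary could be logged as "killed by SIGSEGV" together with the path of the JSON input file, which would make the failing input easier to reproduce outside the web application.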
After fixing the issues with the interactions with the rust correlations code, I am now running into the following error when I run a correlation against the "Hippocampus Consortium M430v2 (Jun06) PDNN" dataset with the same trait from the URI above:
Traceback (most recent call last):
  File "/home/frederick/opt/gn_profiles/gn2_latest/lib/python3.9/site-packages/flask/app.py", line 1523, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/frederick/opt/gn_profiles/gn2_latest/lib/python3.9/site-packages/flask/app.py", line 1509, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/views.py", line 820, in corr_compute_page
    correlation_results = set_template_vars(request.form, correlation_results)
  File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/correlation/show_corr_results.py", line 54, in set_template_vars
    table_json = correlation_json_for_table(correlation_data,
  File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/correlation/show_corr_results.py", line 104, in correlation_json_for_table
    target_trait_ob = create_trait(dataset=target_dataset_ob,
  File "/home/frederick/genenetwork/genenetwork2/wqflask/base/trait.py", line 44, in create_trait
    the_trait = retrieve_trait_info(
  File "/home/frederick/genenetwork/genenetwork2/wqflask/base/trait.py", line 599, in retrieve_trait_info
    raise KeyError(repr(trait.name)
KeyError: "'1' information is not found in the database."
The error above was caused by processing the data for output way too early. This has been fixed.
Running "Tissue" correlations on
against the "BXD Published Phenotypes" database fails with the error:
This also fails if you run it against the "BXD Genotypes" dataset.
Traceback (most recent call last):
  File "/usr/local/guix-profiles/gn-latest-20220820/lib/python3.9/site-packages/flask/app.py", line 1523, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/guix-profiles/gn-latest-20220820/lib/python3.9/site-packages/flask/app.py", line 1509, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/gn2/production/gene/wqflask/wqflask/views.py", line 820, in corr_compute_page
    correlation_results = set_template_vars(request.form, correlation_results)
  File "/home/gn2/production/gene/wqflask/wqflask/correlation/show_corr_results.py", line 54, in set_template_vars
    table_json = correlation_json_for_table(correlation_data,
  File "/home/gn2/production/gene/wqflask/wqflask/correlation/show_corr_results.py", line 104, in correlation_json_for_table
    target_trait_ob = create_trait(dataset=target_dataset_ob,
  File "/home/gn2/production/gene/wqflask/base/trait.py", line 44, in create_trait
    the_trait = retrieve_trait_info(
  File "/home/gn2/production/gene/wqflask/base/trait.py", line 599, in retrieve_trait_info
    raise KeyError(repr(trait.name)
KeyError: "'1422223_at' information is not found in the database."
So far, I have narrowed the issue down to the possibility that the `target_dataset` value is not used.
Run literature correlation for
against the "BXD Published Phenotypes" database and observe the following exception:
This also fails if you run it against the "BXD Genotypes" dataset.
ERROR:wqflask:http://localhost:5033/corr_compute ( 4:26AM UTC Oct 03, 2022)
Traceback (most recent call last):
  File "/home/frederick/opt/gn_profiles/gn2_latest/lib/python3.9/site-packages/flask/app.py", line 1523, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/frederick/opt/gn_profiles/gn2_latest/lib/python3.9/site-packages/flask/app.py", line 1509, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/views.py", line 818, in corr_compute_page
    correlation_results = compute_correlation(request.form, compute_all=True)
  File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/correlation/correlation_gn3_api.py", line 199, in compute_correlation
    return compute_correlation_rust(
  File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/correlation/rust_correlation.py", line 326, in compute_correlation_rust
    results = corr_type_fns[corr_type](
  File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/correlation/rust_correlation.py", line 299, in __compute_lit_corr__
    (this_trait_geneid, geneid_dict, species) = do_lit_correlation(
  File "/home/frederick/genenetwork/genenetwork2/wqflask/wqflask/correlation/correlation_gn3_api.py", line 237, in do_lit_correlation
    geneid_dict = this_dataset.retrieve_genes("GeneId")
AttributeError: 'PhenotypeDataSet' object has no attribute 'retrieve_genes'
The literature correlations computation calls the `retrieve_genes` method, which is only present in the `base.data_set.mrnaassaydataset.MrnaAssayDataSet` class, the class that handles traits of type "ProbeSet".
The code seems to imply that we should not run literature correlations against any dataset that is not of type "ProbeSet".
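To make the failure mode concrete, here is a minimal sketch of a guard that turns the `AttributeError` into an explicit, readable error. The two classes below are simplified stand-ins that only borrow the real class names; the body of `retrieve_genes`, its placeholder return value, and the helper `genes_for_lit_corr` are all assumptions for illustration and do not reflect the actual GN2 code.

```python
class MrnaAssayDataSet:
    """Stand-in for the "ProbeSet" dataset class (illustration only)."""

    def retrieve_genes(self, column_name):
        # Placeholder result; the real method queries the database.
        return {"example_trait": "example_gene_id"}


class PhenotypeDataSet:
    """Stand-in for a dataset class that has no retrieve_genes method."""


def genes_for_lit_corr(dataset, column_name="GeneId"):
    """Fetch gene IDs, rejecting datasets that cannot provide them."""
    retrieve = getattr(dataset, "retrieve_genes", None)
    if retrieve is None:
        # Fail with a clear message instead of an AttributeError
        # surfacing deep inside the computation.
        raise ValueError(
            "Literature correlations are not supported for datasets of "
            f"class {type(dataset).__name__}."
        )
    return retrieve(column_name)
```

Checking for the capability, rather than for a specific class, keeps the guard valid if another dataset class ever gains a `retrieve_genes` method; checking the dataset type explicitly would be an equally reasonable choice.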
The `target_dataset` is not used in the
In my (fredm) work on partial correlations, before doing the computations,
that were run.
Should these be present for the full correlations too?
The failures above with the Publish/Genotype datasets imply one of two things:
Better yet, we should probably not present invalid choices to the user at all, i.e. do not offer the user a dataset that would lead to an error if a correlation of a particular type is run against it with the given trait.
@zsloan @alexm: Running the failing tissue and literature correlations above with the same trait against the "BXD Published Phenotypes" and the "BXD Genotypes" on
I got the error
Wrong correlation type Sorry! Error occurred while processing your request. The nature of the error generated is as follows: Correlation Type Error : It is not possible to compute the Tissue Correlation (Pearson's r) between your trait and data in this BXDGeno database. Please try again after selecting another type of correlation.
for the tissue correlations and
Wrong correlation type Sorry! Error occurred while processing your request. The nature of the error generated is as follows: Correlation Type Error : It is not possible to compute the SGO Literature Correlation between your trait and data in this BXDPublish database. Please try again after selecting another type of correlation.
for the literature correlations.
My initial hunch was correct. We should not be running the tissue and literature correlations in the way we were in the cases above.
We now need to check for these combinations and display an error to the user, as is done in GN1.
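One possible shape for that check is a small lookup table that is consulted before any computation starts. This is a sketch only: the correlation-type keys, the compatibility table, and the names `CorrelationTypeError` and `check_correlation_type` are assumptions. In particular, the table merely encodes what was observed in this issue (tissue and literature correlations failing against the Publish and Geno datasets) and would need to be confirmed against GN1's behaviour.

```python
# Assumed compatibility table: correlation type -> dataset types it accepts.
SUPPORTED_DATASET_TYPES = {
    "sample": {"ProbeSet", "Publish", "Geno"},
    "tissue": {"ProbeSet"},
    "lit": {"ProbeSet"},
}


class CorrelationTypeError(Exception):
    """Raised when a correlation type cannot be run against a dataset."""


def check_correlation_type(corr_type, dataset_type, dataset_name):
    """Raise CorrelationTypeError for an unsupported combination."""
    allowed = SUPPORTED_DATASET_TYPES.get(corr_type)
    if allowed is None:
        raise CorrelationTypeError(f"Unknown correlation type {corr_type!r}.")
    if dataset_type not in allowed:
        # Wording loosely modelled on the GN1 message quoted above.
        raise CorrelationTypeError(
            f"It is not possible to compute the {corr_type} correlation "
            f"between your trait and data in the {dataset_name} database. "
            "Please try again after selecting another type of correlation."
        )
```

The same table could also be used to filter the dataset drop-down, so that invalid combinations are never offered in the first place.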
The error reported above
    raise KeyError(
KeyError: "'1' information is not found in the database for dataset 'HC_M2_0606_P' with id '112'."
causes the correlation below to fail. For maintainability, and to fix the current bugs, the code that preprocesses the data needs to be modified, that is: