Case-Attributes are essentially the metadata for the samples. In the GN2 system, they are the extra columns in the table in the "Review and Edit Data" accordion tab, besides the value and its error margin.
To quote @zachs:
"Case Attributes" are basically just sample metadata. So stuff like the sex, age, etc of the various individuals (and exist separately from "normal" traits mainly because they're non-numeric)
They are the metadata for the various samples in a trait. The case attributes are determined at the group level:
Since they're metadata (or "attributes" in this case) for samples, they're group-level so for BXD, case attributes would apply at the level of each sample, across all BXD data
Also, from email:
Every strain has a unique attribute and it's fixed, not variable.
We need to differentiate two things:
As currently implemented (before 2023-08-31), both the labels and values are set at group level.
A look at
is a good starting point to help with understanding how case-attributes were implemented and how they worked.
Code for editing case-attributes already existed, but it had a critical bug: the data for existing attributes would be deleted or replaced seemingly at random whenever one made a change. This led to a pause in this effort.
The chosen course of action will, however, not make use of this existing code. Instead, we will reimplement the feature with code in GN3, exposing the data and its editing via API endpoints.
The existing database tables of concern to us are:
We can fetch case-attribute data from the database with:
SELECT caxrn.*,
       ca.Name AS CaseAttributeName,
       ca.Description AS CaseAttributeDescription,
       iset.InbredSetId AS OrigInbredSetId
FROM CaseAttribute AS ca
INNER JOIN CaseAttributeXRefNew AS caxrn ON ca.Id=caxrn.CaseAttributeId
INNER JOIN StrainXRef AS sxr ON caxrn.StrainId=sxr.StrainId
INNER JOIN InbredSet AS iset ON sxr.InbredSetId=iset.InbredSetId
WHERE caxrn.value != 'x' AND caxrn.value IS NOT NULL;
which gives us all the information we need to rework the database schema.
Since the Case-Attributes are group-level, we need to move the `InbredSetId` to the `CaseAttribute` table from the `CaseAttributeXRefNew` table.
To declare the relationship more concretely, the `CaseAttributeXRefNew` table can have its primary key composed of the `InbredSetId`, `StrainId` and `CaseAttributeId`. That has the added advantage that we can index the table on `InbredSetId` and `StrainId`.
That leaves the `CaseAttribute` table with the following columns:
while the `CaseAttributeXRefNew` table ends up with the following columns:
No `NULL` values will be allowed in any column of either table. If a strain has no value, we simply delete the corresponding record from the `CaseAttributeXRefNew` table.
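The reworked schema described above can be sketched roughly as follows. This is only an illustration under stated assumptions: SQLite (via Python's standard library) stands in for the production database, the column types are guesses, and the column names are taken from the query and the text above rather than from the final schema. The `set_value` helper is hypothetical; it shows the "no `NULL`s, delete the record instead" rule.

```python
import sqlite3

# Assumed, simplified schema: column names come from the notes above,
# column types and the use of SQLite are assumptions for illustration.
SCHEMA = """
CREATE TABLE CaseAttribute (
    InbredSetId INTEGER NOT NULL,
    Id          INTEGER NOT NULL,
    Name        TEXT    NOT NULL,
    Description TEXT    NOT NULL,
    PRIMARY KEY (InbredSetId, Id)
);
CREATE TABLE CaseAttributeXRefNew (
    InbredSetId     INTEGER NOT NULL,
    StrainId        INTEGER NOT NULL,
    CaseAttributeId INTEGER NOT NULL,
    Value           TEXT    NOT NULL,
    PRIMARY KEY (InbredSetId, StrainId, CaseAttributeId),
    FOREIGN KEY (InbredSetId, CaseAttributeId)
        REFERENCES CaseAttribute (InbredSetId, Id)
);
"""


def set_value(conn, inbredset_id, strain_id, attribute_id, value):
    """Hypothetical helper: store a value, or delete the row if there is none."""
    key = (inbredset_id, strain_id, attribute_id)
    if value is None or value.strip() == "":
        # No value for this strain: remove the record rather than store NULL.
        conn.execute(
            "DELETE FROM CaseAttributeXRefNew "
            "WHERE InbredSetId=? AND StrainId=? AND CaseAttributeId=?",
            key)
    else:
        # The composite primary key makes this an insert-or-overwrite.
        conn.execute(
            "INSERT OR REPLACE INTO CaseAttributeXRefNew "
            "(InbredSetId, StrainId, CaseAttributeId, Value) "
            "VALUES (?, ?, ?, ?)",
            key + (value,))


def demo():
    """Exercise the sketch with made-up ids and values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO CaseAttribute VALUES (1, 1, 'Sex', 'Sex of the individual')")
    set_value(conn, 1, 10, 1, "F")
    set_value(conn, 1, 10, 1, "M")   # overwrites the earlier value
    updated = conn.execute(
        "SELECT Value FROM CaseAttributeXRefNew").fetchall()
    set_value(conn, 1, 10, 1, None)  # no value: the row is deleted
    remaining = conn.execute(
        "SELECT COUNT(*) FROM CaseAttributeXRefNew").fetchone()[0]
    return updated, remaining
```

Because the primary key covers `InbredSetId`, `StrainId` and `CaseAttributeId`, a strain can hold at most one value per attribute within a group, which is what the overwrite in `demo` relies on.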
... (and exist separately from "normal" traits mainly because they're non-numeric)
The values for Case-Attributes are non-numeric data. This will probably be mostly textual data.
As an example:
we see Case-Attributes as:
though that might be a misunderstanding of the quote
In the following link for example, every column after Value is a case attribute - https://genenetwork.org/show_trait?trait_id=10010&dataset=BXDPublish
**TODO**: Verify whether `N` and `SE` are Case-Attributes
it's probably not okay to let anyone who can edit sample data for a trait also edit case attributes, since they're group level
and from Matrix:
The weird bug aside, Bonface had (mostly) successfully implemented editing these through the CSV files in the same way as any other sample data, but for authorization reasons this probably doesn't make sense (since a user having access to editing sample data for specific traits doesn't imply that they'd have access for editing case attributes across the entire group)
This implies we might need a new set of privileges for dealing with case-attributes, e.g.
Considering, however, that groups (InbredSets) are not directly linked to any auth resource, this might require some indirection, or perhaps adding a new resource type that handles groups.
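To make the authorisation concern concrete, here is a minimal sketch of what a group-level check could look like if a resource type for groups were added. Everything in it is hypothetical: the `Resource` type, the resource kinds, the privilege strings and the `grants` mapping are placeholders for illustration, not the names used by the actual auth system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    kind: str   # hypothetical kinds: "phenotype", "inbredset-group"
    ident: int


# Placeholder privilege names, NOT the real ones.
EDIT_SAMPLE_DATA = "hypothetical:edit-sample-data"
EDIT_CASE_ATTRIBUTES = "hypothetical:edit-case-attributes"


def can_edit_case_attributes(grants, user, inbredset_id):
    """True only if `user` holds the group-level privilege on the group."""
    group = Resource("inbredset-group", inbredset_id)
    return EDIT_CASE_ATTRIBUTES in grants.get(user, {}).get(group, set())


# Made-up grants: alice may edit sample data for one trait only,
# bob holds the group-level privilege on group 1.
GRANTS = {
    "alice": {Resource("phenotype", 10010): {EDIT_SAMPLE_DATA}},
    "bob": {Resource("inbredset-group", 1): {EDIT_CASE_ATTRIBUTES}},
}
```

With these grants, `can_edit_case_attributes(GRANTS, "alice", 1)` is false even though alice can edit sample data for a trait in the group, while the same call for bob is true.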
Strains/samples are shared across traits. The values for the case attributes are the same for a particular strain/sample for all traits within a particular InbredSet (group).
Zachary Sloan: I'm pretty sure multiple phenotypes and mRNA datasets can belong to the same experiment (and definitely for the purposes of case attributes since the mRNA datasets are split by tissue genotype traits should all be considered part of the same "experiment" (at least as long as we're still only databasing a single genotype file for each group)
pjotrp : Case attribute editing will still need to be group level, at least until the whole feature is completely changed. Since they're basically just phenotypes we choose to show in the trait page table, and phenotypes are at the group level
Zachary Sloan (21:14): Groups are defined by their list of samples/strains, and the "case attributes" are just "the characteristics of those samples/strains we choose to show on the trait page" (if we move away from the "group" concept entirely that could change, but if we did that we probably would also replace "case attributes" with something else because the way that's implemented is kind of weird to begin with)
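The point that a strain's case-attribute values are the same for every trait within a group can be illustrated with a small sketch. The data and the `rows_for_trait` helper are made up for illustration; only the idea (values are looked up by group and strain, never by trait) comes from the notes above.

```python
# Made-up group-level case-attribute values, keyed by (InbredSetId, strain).
CASE_ATTRIBUTES = {
    (1, "BXD1"): {"Sex": "F"},
    (1, "BXD2"): {"Sex": "M"},
}


def rows_for_trait(inbredset_id, trait_values):
    """Attach the group's case attributes to one trait's sample values."""
    rows = []
    for strain, value in trait_values.items():
        # The lookup key has no trait in it: attributes are group-level.
        attrs = CASE_ATTRIBUTES.get((inbredset_id, strain), {})
        rows.append({"strain": strain, "value": value, **attrs})
    return rows


# Two hypothetical traits in the same group, with different sample values.
trait_a = rows_for_trait(1, {"BXD1": 61.4, "BXD2": 49.0})
trait_b = rows_for_trait(1, {"BXD1": 3.2, "BXD2": 2.7})
```

Both `trait_a` and `trait_b` carry the same `Sex` value for `BXD1`, so editing that value changes what is shown on every trait page in the group.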