Editing Case-Attributes

Introduction

Case-Attributes are essentially the metadata for the samples. In the GN2 system, they are the extra columns in the table in the "Review and Edit Data" accordion tab, besides the value and its error margin.

To quote @zachs:

"Case Attributes" are basically just sample metadata. So stuff like the sex, age, etc of the various individuals (and exist separately from "normal" traits mainly because they're non-numeric)

They are the metadata for the various samples in a trait. The case attributes are determined at the group level:

Since they're metadata (or "attributes" in this case) for samples, they're group-level so for BXD, case attributes would apply at the level of each sample, across all BXD data

Also from email:

Every strain has a unique attribute and it's fixed, not variable.

Direction

We need to differentiate two things:

Case-Attribute labels/names/categories (e.g. Sex, Height, Cage-handler, etc)
Case-Attribute values (e.g. Male/Female, 20cm, Frederick, etc.)
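
A minimal sketch of that distinction, using hypothetical Python dataclasses. The class names, field names and sample data are illustrative assumptions and are not taken from the GN2/GN3 code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseAttributeLabel:
    """A label/name/category, e.g. "Sex" (hypothetical structure)."""
    attribute_id: int
    name: str
    description: str = ""


@dataclass(frozen=True)
class CaseAttributeValue:
    """The value one sample/strain has for one label, e.g. "Male"."""
    attribute_id: int
    strain_name: str
    value: str


# Labels are declared once ...
sex = CaseAttributeLabel(1, "Sex", "Sex of the individual")
handler = CaseAttributeLabel(2, "Cage-handler")

# ... while each sample/strain carries its own value for a label.
values = [
    CaseAttributeValue(sex.attribute_id, "BXD1", "Male"),
    CaseAttributeValue(sex.attribute_id, "BXD2", "Female"),
    CaseAttributeValue(handler.attribute_id, "BXD1", "Frederick"),
]


def values_for(label, all_values):
    """Return {strain_name: value} for a single label."""
    return {v.strain_name: v.value for v in all_values
            if v.attribute_id == label.attribute_id}
```

Here `values_for(sex, values)` collects the per-strain values that belong to the "Sex" label, which is the labels-versus-values split described above.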

As currently implemented (before 2023-08-31), both the labels and values are set at the group level.

A look at "Case-Attributes on GeneNetwork1" is a good starting point for understanding how case-attributes were implemented and how they worked.

Status

Code for case-attributes editing already existed, but it had a critical bug where the data for existing attributes would be deleted/replaced randomly when one made a change. This led to a pause in this effort.

The chosen course of action will, however, not make use of this existing code. Instead, we will reimplement the feature with code in GN3, exposing the data and its editing via API endpoints.

Database

The existing database tables of concern to us are:

InbredSet
CaseAttribute
StrainXRef
Strain
CaseAttributeXRefNew

We can fetch case-attribute data from the database with:

SELECT
    caxrn.*, ca.Name AS CaseAttributeName,
    ca.Description AS CaseAttributeDescription,
    iset.InbredSetId AS OrigInbredSetId
FROM
    CaseAttribute AS ca INNER JOIN CaseAttributeXRefNew AS caxrn
    ON ca.Id=caxrn.CaseAttributeId
INNER JOIN
    StrainXRef AS sxr
    ON caxrn.StrainId=sxr.StrainId
INNER JOIN
    InbredSet AS iset
    ON sxr.InbredSetId=iset.InbredSetId
WHERE
    caxrn.value != 'x'
    AND caxrn.value IS NOT NULL;

which gives us all the information we need to rework the database schema.
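
As a rough illustration of what that query returns, the sketch below runs it against a throwaway in-memory SQLite database. SQLite is used only so the example runs standalone; the table definitions are simplified assumptions (only the columns the query touches), and the sample rows and the `fetch_case_attributes` helper are hypothetical:

```python
import sqlite3

QUERY = """
SELECT
    caxrn.*, ca.Name AS CaseAttributeName,
    ca.Description AS CaseAttributeDescription,
    iset.InbredSetId AS OrigInbredSetId
FROM
    CaseAttribute AS ca INNER JOIN CaseAttributeXRefNew AS caxrn
    ON ca.Id=caxrn.CaseAttributeId
INNER JOIN StrainXRef AS sxr ON caxrn.StrainId=sxr.StrainId
INNER JOIN InbredSet AS iset ON sxr.InbredSetId=iset.InbredSetId
WHERE
    caxrn.Value != 'x'
    AND caxrn.Value IS NOT NULL
"""


def fetch_case_attributes(conn):
    """Run the query and return each row as a plain dict."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(QUERY)]


# Simplified, assumed table layouts with a few made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE InbredSet(InbredSetId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Strain(Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE StrainXRef(InbredSetId INTEGER, StrainId INTEGER);
CREATE TABLE CaseAttribute(Id INTEGER PRIMARY KEY, Name TEXT, Description TEXT);
CREATE TABLE CaseAttributeXRefNew(
    InbredSetId INTEGER, StrainId INTEGER, CaseAttributeId INTEGER, Value TEXT);

INSERT INTO InbredSet VALUES (1, 'BXD');
INSERT INTO Strain VALUES (1, 'BXD1'), (2, 'BXD2'), (3, 'BXD5');
INSERT INTO StrainXRef VALUES (1, 1), (1, 2), (1, 3);
INSERT INTO CaseAttribute VALUES (1, 'Sex', 'Sex of the individual');
-- One real value, one 'x' placeholder and one NULL.
INSERT INTO CaseAttributeXRefNew VALUES
    (1, 1, 1, 'Male'), (1, 2, 1, 'x'), (1, 3, 1, NULL);
""")

rows = fetch_case_attributes(conn)
```

With this sample data only the 'Male' row survives the `WHERE` clause; the 'x' placeholder and the `NULL` value are filtered out.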

Since the Case-Attributes are group-level, we need to move the `InbredSetId` to the `CaseAttribute` table from the `CaseAttributeXRefNew` table.

For a more concrete declaration of the relationship, the primary key of the `CaseAttributeXRefNew` table can be composed of the `InbredSetId`, `StrainId` and `CaseAttributeId`. That has the added advantage that we can index the table on `InbredSetId` and `StrainId`.

That leaves the `CaseAttribute` table with the following columns:

InbredSetId: Foreign Key from `InbredSet` table
Id: The CaseAttribute identifier
Name: Textual name for the Case-Attribute
Description: Textual description for the Case-Attribute

while the `CaseAttributeXRefNew` table ends up with the following columns:

InbredSetId: Foreign Key from `InbredSet` table
StrainId: The strain
CaseAttributeId: The case-attribute identifier
Value: The value for the case-attribute for this specific strain

No `NULL` values will be allowed in any column of either table. If a strain has no value, we simply delete the corresponding record from the `CaseAttributeXRefNew` table.
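
The target layout can be sketched as follows, again with an in-memory SQLite database purely so the example is runnable. This is not the migration script itself: the column types, the choice of `Id` as the primary key of `CaseAttribute`, the SQLite-specific `INSERT OR REPLACE`, and the `set_value` helper are all assumptions made for illustration:

```python
import sqlite3

SCHEMA = """
CREATE TABLE InbredSet(InbredSetId INTEGER PRIMARY KEY);
CREATE TABLE Strain(Id INTEGER PRIMARY KEY);
CREATE TABLE CaseAttribute(
    InbredSetId INTEGER NOT NULL REFERENCES InbredSet(InbredSetId),
    Id INTEGER PRIMARY KEY,          -- key choice is an assumption
    Name TEXT NOT NULL,
    Description TEXT NOT NULL
);
CREATE TABLE CaseAttributeXRefNew(
    InbredSetId INTEGER NOT NULL REFERENCES InbredSet(InbredSetId),
    StrainId INTEGER NOT NULL REFERENCES Strain(Id),
    CaseAttributeId INTEGER NOT NULL REFERENCES CaseAttribute(Id),
    Value TEXT NOT NULL,
    PRIMARY KEY (InbredSetId, StrainId, CaseAttributeId)
);
"""


def set_value(conn, inbredset_id, strain_id, attribute_id, value):
    """Store a value, or delete the record when there is no value."""
    key = (inbredset_id, strain_id, attribute_id)
    if value is None:
        # No value: remove the row instead of storing NULL.
        conn.execute(
            "DELETE FROM CaseAttributeXRefNew WHERE InbredSetId=? "
            "AND StrainId=? AND CaseAttributeId=?", key)
    else:
        # The composite primary key makes this an insert-or-update.
        conn.execute(
            "INSERT OR REPLACE INTO CaseAttributeXRefNew VALUES (?, ?, ?, ?)",
            key + (value,))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO InbredSet VALUES (1)")
conn.execute("INSERT INTO Strain VALUES (1)")
conn.execute("INSERT INTO CaseAttribute VALUES (1, 1, 'Sex', 'Sex of the individual')")
```

Calling `set_value(conn, 1, 1, 1, "Male")` stores a value, calling it again with a different value overwrites the same row, and passing `None` deletes the row.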

To that end, the following script has been added to ease the migration of the table schemas:

https://github.com/genenetwork/genenetwork3/blob/dd0b29c07017ec398c447ca683dd4b4be18d73b7/scripts/update-case-attribute-tables-20230818

The script is meant to be run only once, and makes the changes mentioned above for both tables.

Data Types

... (and exist separately from "normal" traits mainly because they're non-numeric)

The values for Case-Attributes are non-numeric data. This will probably be mostly textual data.

As an example, in "Trait Data and Analysis for BXD_10010" we see Case-Attributes as:

Free-form text (no constraints) - see the `Status` column
Enumerations - textual data, but where the user can only pick from specific values
Links - The value displayed also acts as a link - e.g. the 'JAX:*' values in the `RRID` column

For this trait, we also see:

Numeric data - see the `N` and `SE` columns

though that might be a misunderstanding of the following quote:

In the following link for example, every column after Value is a case attribute - https://genenetwork.org/show_trait?trait_id=10010&dataset=BXDPublish

**TODO**: Verify whether `N` and `SE` are Case-Attributes
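
If the three non-numeric kinds listed above (free-form text, enumerations and links) were ever handled explicitly, a check could look roughly like the sketch below. Nothing in the source says the database records a kind per attribute; `ValueKind`, `validate_value`, `link_target` and the URL template are all hypothetical:

```python
from enum import Enum


class ValueKind(Enum):
    """Hypothetical classification of case-attribute values."""
    FREE_TEXT = "free-text"
    ENUMERATION = "enumeration"
    LINK = "link"


def validate_value(kind, value, allowed=()):
    """Return True if `value` is acceptable for the given kind."""
    if not isinstance(value, str) or not value:
        # Values are textual and never empty in this sketch.
        return False
    if kind is ValueKind.ENUMERATION:
        # Enumerations only accept one of the listed choices.
        return value in allowed
    # Free-form text and link values have no further constraints here.
    return True


def link_target(value, url_template):
    """Build the URL a link-type value points at (template is assumed)."""
    return url_template.format(value=value)
```

For instance, `validate_value(ValueKind.ENUMERATION, "Male", ("Male", "Female"))` accepts the value, while a value outside the allowed choices is rejected.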

Authorisation

From email:

it's probably not okay to let anyone who can edit sample data for a trait also edit case attributes, since they're group level

and from matrix:

The weird bug aside, Bonface had (mostly) successfully implemented editing these through the CSV files in the same way as any other sample data, but for authorization reasons this probably doesn't make sense (since a user having access to editing sample data for specific traits doesn't imply that they'd have access for editing case attributes across the entire group)

This implies we might need a new set of privileges for dealing with case-attributes, e.g.

group:resource:add-case-attributes - Allows user to add a completely new case attribute
group:resource:edit-case-attributes - Allows user to edit an existing case attribute
group:resource:delete-case-attributes - Allows user to delete an existing case attribute
group:resource:view-case-attributes - Allows user to view case attributes and their value

Considering, however, that groups (InbredSets) are not directly linked to any auth resource, this might mean some indirection of sorts, or maybe add a new resource type that handles groups.
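
A sketch of how such a check might look if privileges were tracked per group. The privilege names are the ones proposed above; the per-group mapping, the `is_allowed` helper and the way a user's privileges are represented are assumptions, not the existing auth system's API:

```python
# Proposed privilege names, keyed by the action they would guard.
CASE_ATTRIBUTE_PRIVILEGES = {
    "add": "group:resource:add-case-attributes",
    "edit": "group:resource:edit-case-attributes",
    "delete": "group:resource:delete-case-attributes",
    "view": "group:resource:view-case-attributes",
}


def is_allowed(user_privileges, inbredset_id, action):
    """Check whether a user may perform `action` on a group's case attributes.

    `user_privileges` is a hypothetical mapping of
    InbredSetId -> set of privilege names the user holds for that group.
    """
    required = CASE_ATTRIBUTE_PRIVILEGES[action]
    return required in user_privileges.get(inbredset_id, frozenset())


# Example user: may view and edit case attributes for group 1 only.
example_user = {
    1: {"group:resource:view-case-attributes",
        "group:resource:edit-case-attributes"},
}
```

Keying the lookup on the group rather than on a trait reflects the point made in the quotes: being able to edit sample data for one trait would not, by itself, grant any of these privileges.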

Features

Editing existing case-attributes: YES
Adding new case attributes: ???
Deleting existing case attributes: ???

Strains/samples are shared across traits. The values for the case attributes are the same for a particular strain/sample for all traits within a particular InbredSet (group).
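
To illustrate the sharing, the sketch below attaches one group's case-attribute values to the sample rows of two different traits. The data and the `trait_page_rows` helper are made up for the example:

```python
# Case-attribute values for one group, keyed by strain (illustrative data).
bxd_case_attributes = {
    "BXD1": {"Sex": "Male"},
    "BXD2": {"Sex": "Female"},
}


def trait_page_rows(trait_values, case_attributes):
    """Combine a trait's per-strain values with the group's case attributes."""
    rows = []
    for strain, value in trait_values.items():
        row = {"Strain": strain, "Value": value}
        # The same group-level attributes are attached whatever the trait.
        row.update(case_attributes.get(strain, {}))
        rows.append(row)
    return rows


# Two different traits measured on the same strains (made-up numbers).
trait_a = {"BXD1": 10.5, "BXD2": 11.2}
trait_b = {"BXD1": 0.3, "BXD2": 0.7}

rows_a = trait_page_rows(trait_a, bxd_case_attributes)
rows_b = trait_page_rows(trait_b, bxd_case_attributes)
```

The "Sex" column is identical in `rows_a` and `rows_b`, because it comes from the group-level mapping and not from either trait.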

Related and Unsynthesised Chats

https://matrix.to/#/!EhYjlMbZHbGPjmAFbh:matrix.org/$myIoafLp_dIONnyNvEI0k2xf3Y8-LyiI_mkP2vBN08o?via=matrix.org

Zachary Sloan
I'm pretty sure multiple phenotypes and mRNA datasets can belong to the same experiment (and definitely for the purposes of case attributes
since the mRNA datasets are split by tissue
genotype traits should all be considered part of the same "experiment" (at least as long as we're still only databasing a single genotype file for each group)

pjotrp: Case attribute editing will still need to be group level, at least until the whole feature is completely changed. Since they're basically just phenotypes we choose to show in the trait page table, and phenotypes are at the group level

https://matrix.to/#/!EhYjlMbZHbGPjmAFbh:matrix.org/$P6SNnpY-nAZsDr3VZlRi05m6MT32lXBsCl-BYLh-YLM?via=matrix.org

Zachary Sloan (21:14)
Groups are defined by their list of samples/strains, and the "case attributes" are just "the characteristics of those samples/strains we choose to show on the trait page" (if we move away from the "group" concept entirely that could change, but if we did that we probably would also replace "case attributes" with something else because the way that's implemented is kind of weird to begin with)

Related issues

References

/topics/data-uploads/datasets