
Tasks for Felix

Tags

  • kanban: felixl
  • assigned: felixl
  • status: in progress

Tasks

Goals

1. Write papers for PhD
2. Load data into GN - serve the communities
3. Get comfortable with programming

#### Previous week(s)

  • [x] Restless Legs Syndrome (RLS) - 'traditional PheWAS' - AI aspect - Johannes
  • [+] Finalize the slide deck - so it can be read on its own
  • [.] Review paper: one-liners for @pjotrp - why is this important for GN and/or thesis
  • - [ ] list of relevant papers with one-liners - the WHY
  • [+] Analyse and discuss BXD case attributes with Rob, at both group level and dataset level
  • [+] Sane representation of case attributes in RDF with @bonfacem
  • [X] Present C. elegans protocol and example mappings with GEMMA/R/qtl
  • [X] Uploader - setting up code with @fredm
  • - [ ] Concrete improvement to work on
  • - [X] run small database mysql locally
  • - [X] aider with Sonnet + code fixes
  • - [ ] document - add to code base - merge with Fred's tree - share changes with Pjotr & team
  • [X] Sort out @alexm's application with Pwani (this week)

This week (07-04-2025 onwards)

  • GN2 tasks
  • [x] Progress on Kilifish
  • [X] - meet with Dennis (send him an email with all the queries needed)
  • [-] - make progress on formatting and uploading the data to gn2 (ready by Friday at the latest!)
  • [x] Make a milestone with genotype smoothing
  • PhD tasks

* [X] Complete and share concept note and timeline with supervisors, have a meeting for progress
* [X] Make a milestone on chapter one manuscript (deep dive into the selected papers) {THE BIG PICTURE; a complete draft by early May}

  • Programming

* [x] Make a milestone with the uploader (really push and learn!): documentation (use AI); add to the code base of the uploader; utilise the hurdles to learn programming principles in action

  • AOBs
  • [X] Weekly meetings
  • [X] follow up with Paul on his progress
  • [X] follow up on the MSc bioinformatics project
  • [X] follow up on Alex's application with Pwani

This week (14-04-2025 onwards)

  • gn-uploader programming
  • [X] - Resolve the config file issue with your local uploader
  • [X] - Run the uploader locally, then break the system, see how components connect to each other
  • [X] - document your findings
  • genotype smoothing
  • [X] - resolve errors with plotting, document your findings

This week (21-04-Onwards)

  • genotype smoothing
  • [X] - haplotyping tools for smoothing (PLINK, etc.) :IN PROGRESS:
  • - see what it can offer with smoothing. See what others say about this.
  • - [X] check the original genotypes and compare with the ones in gn2 {done}
  • - [x] inspect the chromosome (Xsome) column order against the SNP positions {done}
  • - [x] adapt the plink algorithm to fit your dataset format {done}
  • - consider inspecting the phenotype file too. {in progress}
  • gn-uploader programming
  • [X] - Run the uploader locally, then break the system, see how components connect to each other (ask help from Bonz)
  • [X] - document your findings
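Both the chromosome-order check above and the later chromosome mix-up in the plots are commonly caused by sorting chromosome labels as text, so that "chr10" sorts before "chr2". The following is a minimal sketch of a natural sort key. It assumes markers are (chromosome, position, name) tuples and that non-numeric chromosomes are labelled X, Y or MT. The labels, the ordering and the example markers are hypothetical, not taken from the GN2 files.

```python
import re

# Rank for non-numeric chromosome labels (assumed naming; adjust per species).
SPECIAL_CHROMS = {"X": 100, "Y": 101, "M": 102, "MT": 102}


def chrom_key(chrom):
    """Sort key so that chr2 < chr10 < chrX < unplaced scaffolds."""
    label = re.sub(r"^chr", "", str(chrom), flags=re.IGNORECASE).upper()
    if label.isdigit():
        return (0, int(label), "")
    if label in SPECIAL_CHROMS:
        return (1, SPECIAL_CHROMS[label], "")
    # Anything else (e.g. unplaced scaffolds) goes last, alphabetically.
    return (2, 0, label)


def sort_markers(markers):
    """Sort (chrom, position, name) tuples by chromosome, then position."""
    return sorted(markers, key=lambda m: (chrom_key(m[0]), int(m[1])))


def is_sorted(markers):
    """True if the markers are already in chromosome/position order."""
    return list(markers) == sort_markers(markers)
```

Running `is_sorted` on the marker list before plotting is a cheap way to catch an ordering problem early.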

Previous week (28-04-Onwards)

  • gn-uploader programming
  • [X] - Run the uploader locally, then break the system, see how components connect to each other (ask help from Bonz)

- start simple: read the script files one at a time, taking your time to understand how the code flows
- when running the uploader, consider pair programming to save time: your teammates can solve in hours issues that take you days

  • [X] - document your findings

{Get help from your teammates/AI to jump-start this; swallow your pride! :(}

  • genotype smoothing
  • [X] Keep refining the following:
  • [X] filtering power adapted from plink
  • - the lower the r² threshold, the stricter the filtering
  • [X] the chromosomes (Xsomes) mixed up in the plot (probably the phenotype data?)
  • - the individual IDs in the phenotype data were not in sync with the genotype data
  • [X] Update findings and push to github
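The r² filtering adapted from PLINK can be illustrated with a greedy pairwise prune inside a sliding window, in the spirit of PLINK's `--indep-pairwise` (window, step, r² threshold). This is a simplified sketch, not PLINK's implementation. It assumes 0/1/2 dosages with no missing calls and uses the squared Pearson correlation of dosages as r². It also ignores the step size and always drops the later SNP of a correlated pair. The toy data show why a lower threshold prunes more SNPs.

```python
from statistics import mean


def r_squared(a, b):
    """Squared Pearson correlation between two dosage vectors (0/1/2)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # monomorphic SNP: treated as uncorrelated
    return cov * cov / (va * vb)


def prune_by_r2(genotypes, threshold, window=50):
    """Greedy LD pruning: keep a SNP only if its r² with every kept SNP
    among the previous `window` SNPs is <= threshold.

    genotypes: list of per-SNP dosage lists, in positional order.
    Returns the indices of the kept SNPs."""
    kept = []
    for i, snp in enumerate(genotypes):
        recent = [j for j in kept if i - j < window]
        if all(r_squared(genotypes[j], snp) <= threshold for j in recent):
            kept.append(i)
    return kept
```

For example, with a SNP whose r² to its neighbour is 0.0625, a threshold of 0.5 keeps it but a threshold of 0.05 removes it.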

This week (05-05-Onwards)

  • programming (gn-uploader)
  • [X] - pick one file each day, review it, understand it
  • [X] - pair programming with Alex on test runs
  • HS rats scripts
  • [X] - prepare/refine scripts to quickly process HS rats file (in progress)
  • [X] - memory hurdles; goal: simplify running the script
  • [X] - assist Alex with the HS rats cross info

(12-05-onwards)

* [X] - HS genotypes scripting

(19-05-onwards)

* [X] - HS genotypes debugging (memory issue)
* [X] - pair programming with Bonz to improve the script

this week (26-05-onwards)

* [X] - process the genotype file for HS rats
* [X] - approach by tissue categories
* [X] - adipose and liver: test per chromosome (Xsome) for memory capture, then run the working commands
* [X] - the remaining 10 tissues (in progress)
* [X] - *.bed file vs the updated VCF files from the website?
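For the memory issues, one common pattern is to stream the genotype file and hold only one chromosome's records in memory at a time. The following is a minimal stdlib sketch. It assumes a plain or gzip-compressed VCF sorted by chromosome, so that each CHROM forms one contiguous block. It also assumes GT is the first FORMAT sub-field, which the VCF specification requires when GT is present. The file name in the usage line is hypothetical.

```python
import gzip
from itertools import groupby


def open_text(path):
    """Open plain or gzip-compressed text files transparently."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def vcf_records(handle):
    """Yield (chrom, pos, id, gt_calls) for each VCF data line."""
    for line in handle:
        if line.startswith("#"):
            continue  # skip meta-information and the #CHROM header line
        fields = line.rstrip("\n").split("\t")
        # Columns 0-8: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT;
        # sample columns start at index 9. Keep only the GT sub-field.
        gts = [s.split(":", 1)[0] for s in fields[9:]]
        yield fields[0], int(fields[1]), fields[2], gts


def by_chromosome(handle):
    """Yield (chrom, records) one chromosome at a time.

    Assumes records of each chromosome are contiguous (sorted VCF)."""
    for chrom, group in groupby(vcf_records(handle), key=lambda r: r[0]):
        yield chrom, list(group)
```

Usage, with a hypothetical file name: `with open_text("hs_rats.vcf.gz") as fh:` followed by `for chrom, records in by_chromosome(fh): ...`. Peak memory then scales with one chromosome's records rather than with the whole file.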

  • ## this week (09-06-onwards)
  • [+] - identify start and end points for haplotypes in HS genotype files
  • - checked the recombination densities first; still need a better understanding (arrange a meeting with Rob)
  • [X] - upload the final updates to gn2, test and see the results
  • [+] - explore the gn-uploader/uploader folder
  • ## this week (16-06-onwards)
  • [X] - hs rats proximal and distal haplotype edges
  • [+] - uploading kilifish using the backend route
  • ## this week (23-06-onwards)
  • [X] - hs rats recombination counts
  • [+] - mapping offspring to founders
  • - examining founders
  • [-] - kilifish to gn2 via backend
  • ## this week (30-06-onwards)
  • [X] - mapping offspring to founders (hs rats)
  • [+] - upload kilifish to genenetwork
  • [-] - revise celegans smoothing (genotypes)
  • ## this week (23-06-onwards)
  • [X] - hs rats recombination counts
  • [+] - kilifish to gn2 via backend
  • ## this week (30-06-onwards)
  • [X] - mapping offspring to founders (hs rats)
  • [X] - upload kilifish to genenetwork
  • [X] - revise celegans smoothing (genotypes)
  • ## this week (07-07-onwards)
  • [X] - generate haplotypes for offspring and founders combined; interpretation next
  • [+] - keep improving the uploader via data uploading and error solving
  • [-] - close the smoothing revision for celegans, where it was left before
  • [X] - why should people read my paper on improving genotyping methods?
  • - on smoothing (low-density genotypes for mapping, high-density genotypes for fine mapping)
  • - liftovers between reference genome versions (currently a challenge to be looked into)
  • - founders and their offspring in genotyping
  • - pangenomics and machine learning for improved genotyping

** Keys: + in progress, X done, - not yet

  • ## this week (14-07-onwards)
  • [+] - map founders to offspring, working only with pure recombinations

[+] - tools available? (PLINK, R/qtl2, Beagle, etc.)
[+] - a custom pipeline to address gaps in the existing tools? (dealing with multiparent populations)
[+] - documentation for the paper write-up

  • ## this week (21-07-onwards)
  • [X] - HS rats smoothing continues
  • [+] - documenting the milestones
  • [+] - explore the possibility of writing a tool from it
  • [-] - push kilifish to genenetwork2; learn how the source code is built up
  • [-] - re-smooth celegans genotypes with the new knowledge
  • ## this week (28-07-onwards)
  • [-] - predict genotype probabilities with R/qtl2 functions

- problems with the control file setup needed to load the input files for the functions

  • [+] - comparison models for each individual rat vs the 8 founders (similarities and percentage composition)

[+] - ongoing discussion with Alex; there's progress
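One simple way to frame the comparison of an individual rat with the 8 founders uses two measures. The first is per-founder identity, the fraction of SNPs where the rat matches each founder. The second is a rough composition computed from SNPs where exactly one founder matches. The following is a minimal sketch. It assumes 0/1/2 dosages at shared SNPs, homozygous (inbred) founders and no missing calls. The founder names in the example are placeholders. Because it ignores linkage between neighbouring SNPs, it is only a summary and does not replace HMM-based genotype probabilities such as those from R/qtl2's `calc_genoprob`.

```python
from collections import Counter


def founder_similarity(rat, founders):
    """Fraction of SNPs where the rat's genotype equals each founder's.

    rat: list of 0/1/2 dosages; founders: dict name -> list of dosages."""
    n = len(rat)
    return {name: sum(r == f for r, f in zip(rat, g)) / n
            for name, g in founders.items()}


def founder_composition(rat, founders):
    """Percentage of informative SNPs attributed to each founder.

    A SNP is informative if exactly one founder carries the rat's
    genotype. With homozygous founders, heterozygous rat calls never
    match and are skipped, as are SNPs shared by several founders."""
    counts = Counter()
    for i, r in enumerate(rat):
        matches = [name for name, g in founders.items() if g[i] == r]
        if len(matches) == 1:
            counts[matches[0]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * c / total for name, c in counts.items()}
```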

  • ## this week (04-08-onwards)
  • [+] - Testing the logic to infer HS outbred genotypes from the founders

- Managed to identify the parent(s) of origin among the 8 founders for each SNP in each rat, per position
- Still need to filter for the distinctive SNPs, then generate haplotype blocks
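The parent-of-origin step and the planned haplotype blocks can be sketched in three steps. First, list the founders whose allele matches the rat's allele at each SNP. Second, keep only the distinctive SNPs, meaning those with exactly one matching founder. Third, merge consecutive distinctive SNPs from the same founder into blocks with start and end positions. This is a minimal sketch that assumes phased 0/1 alleles for one chromosome copy of one rat and homozygous founders. The positions, founder names and alleles in the test are made up. Genotyping errors and unphased heterozygous calls are not handled.

```python
def origin_per_snp(alleles, founders):
    """For each SNP, the founders whose allele matches the rat's allele.

    alleles: list of 0/1 alleles for one chromosome copy of one rat.
    founders: dict name -> list of 0/1 alleles (homozygous founders)."""
    return [[name for name, f in founders.items() if f[i] == a]
            for i, a in enumerate(alleles)]


def haplotype_blocks(positions, origins):
    """Merge consecutive distinctive SNPs (exactly one matching founder)
    from the same founder into (founder, start, end, n_snps) blocks."""
    blocks = []
    for pos, match in zip(positions, origins):
        if len(match) != 1:
            continue  # ambiguous or unmatched SNP: not distinctive
        founder = match[0]
        if blocks and blocks[-1][0] == founder:
            name, start, _, n = blocks[-1]
            blocks[-1] = (name, start, pos, n + 1)  # extend current block
        else:
            blocks.append((founder, pos, pos, 1))  # start a new block
    return blocks
```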

  • ## this week (11-08-onwards)
  • [X] - generate the final haplotype file and document it
  • [+] - testing on local gemma and in gn2
  • ## this week (18-08-onwards)
  • [+] - push for the file to be in gn2 and get feedback from the team
  • [X] - complete the local gemma run, interpret the results
  • [+] - process the remaining chromosomes (Xsomes) to get a file ready for gn2
  • - issues: over-filtering SNPs, neglecting the one parent of origin, and long run times.
  • [+] - prepare an abstract for the CTC conference in Barcelona
  • ## this week (01-09-onwards)
  • [+] - fine-tune the abstract
  • - include more of what I achieved; main focus: genotype smoothing in models with complex traits
  • - thought map: generate plots, compare before and after smoothing, check for overlaps, and check whether the trait peaks are the same before and after smoothing
  • [X] - troubleshoot the inference scripts for all chromosomes (Xsomes)
  • - request Bonz/Alex's help on this (to save time)
  • ## this week (08-09-onwards)
  • [X] - generate the final HS haplotype file
  • - push it to gn2 (with Zach and Arthur's help)
  • [X] - Mapping experiments with the original vs smoothed genotype file on all phenotypes selected
  • [X] - Abstract write up on the results
  • ## this week (15-09-2025)
  • [+] - Arabidopsis is next after HS rats, then Kilifish: bigger picture, publications
  • [X] - Getting HS genotypes and phenotypes to gn2
  • [+] - Mapping experimentation with gn2 tools, share results
  • [+] - Prepare metadata corresponding to the hs data in gn2
  • [X] - refine scripts used for analysis for reproducibility purposes
  • ## this week (23-09-2025)
  • [X] - hs metadata
  • [+] - Arabidopsis dataset
  • ## this week (30-09-2025)
  • [+] - Arabidopsis phenotypes; gemma plots
  • - testing the prepared data with gemma, see what the plots communicate
  • [+] - hs to genenetwork2 testing

- compute against randomized data: 1000 permutations (Gary Churchill); LOD score of 4.0 (p-value 10^-4)
- precompute, by Pj; check on this
- QTL results are more suggestive than absolute
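The permutation idea (the approach of Churchill and Doerge) works as follows. Shuffle the phenotype across individuals, rescan every marker, and keep the maximum LOD of each permutation. An upper quantile of those maxima then serves as a genome-wide threshold. The following is a minimal stdlib sketch. It assumes a single-marker linear regression on 0/1/2 dosages with LOD = (n/2)·log10(RSS0/RSS1). GEMMA and R/qtl2 fit richer models (kinship, genotype probabilities), so this only illustrates the procedure. `n_perm=1000` mirrors the note above; `alpha=0.05` is an assumed default.

```python
import math
import random


def lod_single_marker(geno, pheno):
    """LOD for regressing phenotype on one marker's dosage (0/1/2):
    (n/2) * log10(RSS_null / RSS_marker), intercept-only vs linear fit."""
    n = len(pheno)
    my = sum(pheno) / n
    mg = sum(geno) / n
    sxx = sum((g - mg) ** 2 for g in geno)
    sxy = sum((g - mg) * (y - my) for g, y in zip(geno, pheno))
    rss0 = sum((y - my) ** 2 for y in pheno)
    if sxx == 0 or rss0 == 0:
        return 0.0  # monomorphic marker or constant phenotype
    rss1 = max(rss0 - sxy * sxy / sxx, 1e-12 * rss0)  # avoid log of zero
    return (n / 2) * math.log10(rss0 / rss1)


def max_lod(markers, pheno):
    """Genome-wide maximum LOD over all markers."""
    return max(lod_single_marker(g, pheno) for g in markers)


def permutation_threshold(markers, pheno, n_perm=1000, alpha=0.05, seed=1):
    """Upper (1 - alpha) quantile of the max LOD under shuffled phenotypes."""
    rng = random.Random(seed)
    y = list(pheno)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(y)  # breaks the genotype-phenotype link
        maxima.append(max_lod(markers, y))
    maxima.sort()
    idx = min(len(maxima) - 1, math.ceil((1 - alpha) * len(maxima)) - 1)
    return maxima[idx]
```

Comparing the observed `max_lod` with this data-specific threshold is an alternative to a fixed cut-off such as LOD 4.0.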

To New Beginnings

  • ## this week (02-03-onwards)
  • ### Date: 02-03-2026
  • [+] - Get the qtl2_hs_pipeline working | test runs
  • [+] - clearly define the root problems in the script
  • [+] - test runs to get quick feedback
  • [+] - involve the ai-lab team for more help
  • - pending for now (waiting to sync with Alex and/or Bonz for team troubleshooting)
  • ### Date: 03-03-2026
  • [+] - Kilifish data to be ready for upload
  • [X] - classical phenotypes and/or expression traits
  • - already done with the expression data, next is the classical phenotypes. Processed files in my tux02 file path:
  • => "/home/felixl/felixl/Kilifish_2026/data/expression/"
  • [X] - genotypes; smoothing? Kilifish Genetics?
  • - Kilifish genotypes were from the F2 generation. Most of the pre-processing was already done. What remained was to check for recombination transitions and to filter out markers less than 10 kb apart.
  • - File path to the genotypes:
  • => "/home/felixl/felixl/Kilifish_2026/data/genotypes/"
  • - File path to the scripts used:
  • => "/home/felixl/felixl/Kilifish_2026/codes/kilifish_recombination_pipeline.py"
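The two remaining genotype checks are recombination transitions and dropping markers less than 10 kb apart. This is not kilifish_recombination_pipeline.py but a minimal stdlib illustration of both. It assumes marker positions in bp, sorted within one chromosome. It also assumes per-individual F2 calls coded as strings such as "AA", "AB", "BB", with "-" for missing (hypothetical codes). Thinning keeps a marker only if it lies at least 10 kb past the last kept marker. A transition is any change of call between consecutive non-missing markers.

```python
MIN_GAP_BP = 10_000  # 10 kb minimum spacing between kept markers


def thin_markers(positions, min_gap=MIN_GAP_BP):
    """Indices of markers at least `min_gap` bp from the last kept marker.

    positions: marker positions (bp) on one chromosome, sorted ascending."""
    kept = []
    for i, pos in enumerate(positions):
        if not kept or pos - positions[kept[-1]] >= min_gap:
            kept.append(i)
    return kept


def count_transitions(calls, missing="-"):
    """Count genotype changes along one individual's calls on a chromosome.

    Missing calls are skipped, so AA, -, BB counts as one transition."""
    transitions = 0
    previous = None
    for call in calls:
        if call == missing:
            continue
        if previous is not None and call != previous:
            transitions += 1
        previous = call
    return transitions
```

Summing `count_transitions` over individuals per chromosome gives a quick recombination count to compare against the expected number for an F2 cross.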
  • ### Date: 04-03-2026
  • [X] - classical phenotypes
  • [+] - metadata; annotation
  • [X] - description information for the classical phenotypes, adapted to the new format
  • [+] - relevant annotation information for the expression and proteomics data
  • - One observation from the annotation file: most gene names are duplicated, although the marker positions are unique. This is worth looking into carefully.
  • ### 05-03-2026
  • [+] - Still working on the annotation file, continuing the task started yesterday. It requires careful processing, as the information will be used for search in GN2.
  • [] - Strain, Xref, case attributes, and group menu information (needed because this is a newly added species)
  • [] - local gemma testing
  • [] - Final checks with @Arthur et al, proceed to upload
  • Reviewing/writing papers
  • [+] - refine the overview manuscript
  • [+] - the innovation touch to be clearly defined
  • - Re-reading the state-of-the-art overview manuscript I wrote, I am gaining a different perspective on linking the technical information to the missing gaps, highlighting the future of genotyping given the rapid advancement of AI.
  • ### 09-03-2026
  • [+] - paper writing:
  • - proceeded with the literature search. These are the criteria I use:
  • => {provide link to your documentation}
  • - Ideally, I focus on methods-based literature first, then model-organism-based literature, from 2020 to the present.
  • - Exclude anything that involves humans.
  • - Key literature sources include PubMed, Google Scholar, Web of Science, BMC Bioinformatics, etc.
  • - I will need a week to complete the literature search and curation, then move on to reviewing it and extracting relevant information for my write-up.
  • - The writing focuses on methods first, then the results of the ongoing HS and Kilifish projects as case studies, followed by the other sections. This is something I missed in my previous write-up.
  • ### 11-03-2026
  • [+] - Kilifish annotation file.
  • - Retrieved Ensembl IDs for the Kilifish gene names: ~6k Ensembl gene ID matches out of 20k unique gene names (the actual data has redundant names), ~17k transcript IDs for the 20k unique gene names, and 250k exon IDs for the 20k gene names. The idea is to filter and combine these IDs across all [unique gene names + exons/CDS + (start, end position)], so as to end up with a complete column of record IDs (a tentative approach).
  • - Ensembl IDs are crucial for generating record IDs, which are used in trait search in GeneNetwork.
  • - Added associated information, including gene aliases (alternative names for the genes), Entrez Gene IDs, and descriptions. All these fields are important because they provide detailed descriptions of each trait of interest in GeneNetwork.
  • - Next course of action:
  • - Sent an email to the Kilifish team to sync on what they have in store for the fields highlighted above; status: DONE
  • - Proceed with custom modifications for the gene names missing Ensembl gene IDs (probably using Ensembl transcript and/or exon IDs); status: IN PROGRESS (the IDs are already in place)
  • - Path to the intermediate files for Ensembl {gene, transcript, and exon IDs}:
  • => /home/felixl/felixl/Kilifish_2026/data/expression/annotation/tmp
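The record-ID plan can be sketched in two steps. First, take the best available identifier for each row: the Ensembl gene ID, else the transcript ID, else the exon ID, else the gene name. Second, disambiguate identifiers that still repeat by appending the row's start and end position. This matches the earlier observation that gene names repeat while positions are unique. This is only a minimal sketch: the column names, the fallback order, the suffix format and the IDs in the test are all hypothetical, and none of them is a GeneNetwork requirement.

```python
from collections import Counter

# Assumed preference order for the identifier of each annotation row.
ID_FALLBACK = ("gene_id", "transcript_id", "exon_id", "gene_name")


def build_record_ids(rows):
    """Assign a unique record ID to each annotation row.

    rows: list of dicts with 'start', 'end' and any of the ID_FALLBACK
    keys (hypothetical column names). Raises ValueError if IDs still
    collide after appending positions."""
    base = []
    for row in rows:
        # First non-empty identifier in the fallback order.
        ident = next((row[k] for k in ID_FALLBACK if row.get(k)), "unnamed")
        base.append(ident)
    counts = Counter(base)
    ids = []
    for ident, row in zip(base, rows):
        if counts[ident] > 1:
            # Repeated identifier: disambiguate with the row's position.
            ids.append(f"{ident}_{row['start']}_{row['end']}")
        else:
            ids.append(ident)
    if len(set(ids)) != len(ids):
        raise ValueError("record IDs are not unique; positions repeat")
    return ids
```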

Later weeks (2026 plan)

  • [+] Reverse Genotyping => (ML + Pan-Genome) genotyping
  • [+] - a paper from this experimentation
  • [+] Kilifish into GN
  • [+] Review paper on genotyping
  • [X] HS Rat
  • [-] Prepare others for C. elegans
  • [] Upload Arabidopsis dataset
  • [] Upload Medaka dataset
  • [-] Work on improved DO and Ce genotyping

Done

Ongoing tasks

Rank-ordered list of ongoing tasks:

Stalled (To Be Done/Completed)

Unclear Issues

Ad-hoc issues that were picked up somewhere, somehow:

Closed Issues

Should something in one of these closed issues be amiss, we can and should always re-open the offending issue.

Currently closed issues are:

(made with skribilo)