Edit this page | Blame

Some correlations running very slowly

Tags

  • assigned: zsloan, pjotrp
  • type: bug
  • priority: high
  • status: in-progress
  • keywords: correlations, database

Description

Some correlations (it specifically seems to be ones done against ProbeSet databases) are running extremely slowly.

After looking into this, the cause seemed to be a specific query*, which leads me to think it might be related to the recent DB changes. All the actual calculations only take a couple seconds, but then this query takes like ~80 seconds.

  • https://github.com/genenetwork/genenetwork2/blob/98602d24c64ffafe2c4af150236b72f77709f8de/wqflask/wqflask/correlation/rust_correlation.py#L23-L55

The suspect is the mixing of innodb and myisam tables after a recent migration of tables. ProbeSet is one I haven't converted yet because it has a textual search field.

Info

In the /var/log/mysql/mysql-slow.log slow queries started to appear after converting more tables to innodb. Such as:

SELECT ProbeSet.Name,ProbeSet.Chr,ProbeSet.Mb,
                    ProbeSet.Symbol,ProbeSetXRef.mean,
                    CONCAT_WS('; ', ProbeSet.description, ProbeSet.Probe_Target_Description) AS description,
                    ProbeSetXRef.additive,ProbeSetXRef.LRS,Geno.Chr, Geno.Mb
                    FROM ProbeSet INNER JOIN ProbeSetXRef
                    ON ProbeSet.Id=ProbeSetXRef.ProbeSetId
                    INNER JOIN Geno
                    ON ProbeSetXRef.Locus = Geno.Name
                    INNER JOIN Species
                    ON Geno.SpeciesId = Species.Id
                    WHERE ProbeSet.Name in ('10606948', '10434845', '10548978', '10586110', '10429520', '10477061', '10434719', '10467400', '10414360', '10466947', '10569319', '10363000', '10583056', '10478525', '10419038', '10583044', '10540897', '10456392', '10590909', '10347741', '10483381', '10560131', '10607467', '10539472', '10493990', '10454015', '10403352', '10457640', '10468869', '10449940', '10364675', '10461277', '10578427', '10585010', '10497817', '10514763', '10528207', '10430883', '10567355', '10547758', '10557459', '10512145', '10593174', '10483410', '10554240', '10349904', '10419223', '10548375', '10498076', '10463027', '10430892', '10578649', '10604528', '10600210', '10506454', '10570280', '10402353', '10500545', '10363231', '10446596', '10574498', '10458016', '10414262', '10452639', '10404077', '10372421', '10584787', '10578448', '10389231', '10419156', '10545895', '10366707', '10338580', '10598175', '10569890', '10473367', '10459530', '10559509', '10582837', '10453939', '10546929', '10425808', '10477012', '10560045', '10520371', '10577645', '10352439', '10443021', '10570291', '10513529', '10469923', '10584334', '10419154', '10508986', '10469322', '10388718', '10541301', '10500547', '10428018', '10490923', '10585331', '10474129', '10424245', '10567335', '10513818', '10397145', '10388869', '10562500', '10594812', '10540273', '10395277', '10379044', '10558134', '10407072', '10506301', '10441787', '10452030') AND
                    Species.Name = 'mouse' AND
                    ProbeSetXRef.ProbeSetFreezeId IN (
                      SELECT ProbeSetFreeze.Id
                      FROM ProbeSetFreeze WHERE ProbeSetFreeze.Name = 'EPFLMouseLiverHFDRMA0818');

also, the standard test-gsearch query has started to time out - and that shows up in sheepdog as

2023-03-07 07:01:42 +0000	FAIL	tux01	mariadb-gsearch

I created a new test script that can be run as

tux01:/export/backup/scripts/tux01$ time mysql -u webqtlout -p db_webqtl < ../shared/sql/test-regression1.sql
real    0m29.855s

The good news is that after converting ProbeSet to innodb time changed to

real    0m0.012s
(made with skribilo)