There are a number of errors in the database, which we try to track in this issue. These are best fixed directly in the database rather than by working around them in code.
Some LRS values in the database are wrongly listed as 0.000. They should be NULLs.
MariaDB> SELECT COUNT(*) FROM ProbeSetXRef WHERE LRS=0; +----------+ | COUNT(*) | +----------+ | 237327 | +----------+ MariaDB> SELECT COUNT(*) FROM PublishXRef WHERE LRS=0; +----------+ | COUNT(*) | +----------+ | 1687 | +----------+
Some years are blank strings or strings that contain text other than the year. These should be fixed and the year field must be made an integer.
MariaDB> SELECT COUNT(*) FROM Publication WHERE CONVERT(Year, UNSIGNED)=0; +----------+ | COUNT(*) | +----------+ | 854 | +----------+
At least one e-mail IDs has spaces in it.
MariaDB> SELECT * FROM Investigators WHERE Email LIKE '% %';
At least one e-mail ID is blank.
MariaDB> SELECT * FROM Investigators WHERE Email="";
At least one last name is encoded incorrectly and appear as unprintable characters.
MariaDB> SELECT LastName FROM Investigators WHERE FirstName='Yohan';
The list of authors is not always a strictly comma-separated string. Sometimes, there is an "and" at the end instead of a comma.
MariaDB> SELECT Authors FROM Publication WHERE Authors LIKE '% and %';
Sometimes, not all authors are listed by name. They are only mentioned as "and colleagues".
MariaDB> SELECT Authors FROM Publication WHERE Authors LIKE '%colleagues%';
At other times, there is a trailing comma at the end of the string.
MariaDB> SELECT Authors FROM Publication WHERE Authors LIKE '%,';