Edit this page | Blame

Fix Broken UTF-8 characters in our Database

Tags

  • assigned: bonfacem, arthur
  • type: database
  • priority: high

Description

We have jumbled up text in our database and this has been the case for years. It's impractical for a user to do the fixes using the metadata editing form because there are too many cases. A script that fixes this should be created to fix this issue.

This thread has some really nice ideas

An example of a broken unicode character is: "??????". The character "??????" appears broken because it is not a valid Unicode character. This can happen for a number of reasons, such as a mistake when typing or pasting the character, corruption during transmission (most likely the case) or storage, or a lack of support for the character in the font or software being used to display the text.

To find the correct replacement for the character "??????", or any other character for the matter, you can look up its Unicode code point. In this case, the code point for "??????" is "U+2273", which corresponds to the character "???". You can then use this code point to search for and replace the broken character with the correct character in the text.


(made with skribilo)