allow for the use of the special formatting markup characters (%i, %k, etc.) in the source text/CSV data. For the latter, see the section Using Text Formatting Within Attributes, in the chapter on Editing Attributes. Also note that when importing several columns, the data can be placed directly into the correct fields of the DTD; for more on this, see the chapter on Customising the Dictionary Grammar using the DTD. Note also that some software, such as older versions of Excel, is not able to save CSV files in Unicode/UTF-8 format.
Character Encoding Issues
Pay particular attention to characters with diacritic markers or accents (e.g. “ê”, “é”) when importing data, and double-check that they import correctly.
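When an import goes wrong in this way, accented characters typically show up as two-character sequences (for example “Ã©” where “é” was expected). The snippet below is only an illustration, not something TLex itself does; it reproduces the effect by decoding UTF-8 bytes with a legacy encoding (Windows-1252):

```python
# Illustration only: how "é" becomes "Ã©" when UTF-8 bytes are
# read using a legacy single-byte encoding such as Windows-1252.
original = "café"
utf8_bytes = original.encode("utf-8")        # b'caf\xc3\xa9'
misread = utf8_bytes.decode("windows-1252")  # 'cafÃ©'
print(original, "->", misread)
```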
If you find that some (or all) characters don’t import correctly, please try opening the CSV file in a normal text editor (e.g. Windows Notepad) and re-saving it with the encoding set to “UTF-8 with signature” (also called “UTF-8 with marker”, “UTF-8 with beginning of file marker” or “UTF-8 with BOM”), then try importing it again. The reason for this is that some versions of software like Excel may export CSV files in a format lacking this special “marker”, which TLex/tlTerm/tlDatabase uses to recognize the text as ‘standard’ UTF-8-encoded Unicode. Without this marker, TLex/tlTerm/tlDatabase may try to read the file using your system’s default encoding (typically some legacy encoding, e.g. Windows-1252), which will produce the wrong results if it’s a UTF-8 text file. You should also be able to use ‘UTF-16’ Unicode text formats.
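If you have many files to convert, or no text editor offering a “UTF-8 with signature” option, a short script along the following lines can add the marker. This is only a sketch with placeholder file names, and it assumes the source file is already plain UTF-8 (without a signature); adjust the source encoding if it is not.

```python
# Sketch: re-save a CSV file as "UTF-8 with signature" (UTF-8 with BOM).
# File names are placeholders; change "utf-8" to e.g. "windows-1252"
# if the source file uses a legacy encoding instead.
with open("input.csv", "r", encoding="utf-8") as src:
    data = src.read()

# The "utf-8-sig" codec writes the byte-order mark (EF BB BF) at the
# start of the file, which importing software can use to recognise
# the content as UTF-8-encoded Unicode.
with open("input_utf8_bom.csv", "w", encoding="utf-8-sig", newline="") as dst:
    dst.write(data)
```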