allow for the use of the special formatting markup characters (%i, %k, etc.) in the source text/CSV data. For the latter, see the section Using Text Formatting Within Attributes, in the chapter on Editing Attributes. Also note that when importing several columns, the data can be placed directly into the correct fields of the DTD; for more on this, see the chapter on Customising the Dictionary Grammar using the DTD. Note also that some software, such as older versions of Excel, is not able to save CSV files in Unicode/UTF-8 format.
Character Encoding Issues
Pay particular attention to characters with diacritic markers or accents (e.g. “ê”, “é”) when importing data, and double-check that they import correctly.
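When an import goes wrong in this way, accented characters typically show up as two-character sequences (for example “Ã©” where “é” was expected). The snippet below is only an illustration, not something TLex itself does; it reproduces the effect by decoding UTF-8 bytes with a legacy encoding (Windows-1252):

```python
# Illustration only: how "é" becomes "Ã©" when UTF-8 bytes are
# read using a legacy single-byte encoding such as Windows-1252.
original = "café"
utf8_bytes = original.encode("utf-8")        # b'caf\xc3\xa9'
misread = utf8_bytes.decode("windows-1252")  # 'cafÃ©'
print(original, "->", misread)
```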
If you find that some (or all) characters don’t import correctly, please try opening the CSV file in a normal text editor (e.g. Windows Notepad) and re-saving it with the encoding set to “UTF-8 with signature” (also called “UTF-8 with marker”, “UTF-8 with beginning of file marker” or “UTF-8 with BOM”), then try importing it again. The reason for this is that some versions of software like Excel may export CSV files in a format lacking this special “marker”, which TLex/tlTerm/tlDatabase uses to recognize the text as ‘standard’ UTF-8-encoded Unicode. Without this marker, TLex/tlTerm/tlDatabase may try to read the file using your system’s default encoding (typically some legacy encoding, e.g. Windows-1252), which will produce the wrong results if it’s a UTF-8 text file. You should also be able to use ‘UTF-16’ Unicode text formats.
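If you have many files to convert, or no text editor offering a “UTF-8 with signature” option, a short script along the following lines can add the marker. This is only a sketch with placeholder file names, and it assumes the source file is already plain UTF-8 (without a signature); adjust the source encoding if it is not.

```python
# Sketch: re-save a CSV file as "UTF-8 with signature" (UTF-8 with BOM).
# File names are placeholders; change "utf-8" to e.g. "windows-1252"
# if the source file uses a legacy encoding instead.
with open("input.csv", "r", encoding="utf-8") as src:
    data = src.read()

# The "utf-8-sig" codec writes the byte-order mark (EF BB BF) at the
# start of the file, which importing software can use to recognise
# the content as UTF-8-encoded Unicode.
with open("input_utf8_bom.csv", "w", encoding="utf-8-sig", newline="") as dst:
    dst.write(data)
```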