Transliteration - Dealing with Unicode data

The mAPI is fully able to handle Unicode data.  It does this by transliterating Unicode characters into characters from the Latin1 code page (Windows-1252) so that they can be processed by the mAPI core.  The Latin1 code page is generally used by Western European languages, including English, Spanish, French, and German.
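As a rough illustration of why transliteration is needed, the sketch below checks whether a string can be represented in Windows-1252 at all. It uses only Python's standard "cp1252" codec; the function name fits_windows_1252 is made up for this example and is not part of the mAPI.

```python
def fits_windows_1252(text: str) -> bool:
    """Return True if every character in text can be encoded in Windows-1252."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


# Western European text is already representable in the code page.
print(fits_windows_1252("José Müller"))
# Cyrillic and CJK text is not, so it would need transliterating first.
print(fits_windows_1252("Иван"))
print(fits_windows_1252("昌李"))
```

Strings for which this check fails are the ones that would need converting to Latin1 characters before processing.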

Transliteration is not the same as translation, in which words are converted from one language to another; when transliterating, it’s the characters themselves that are converted from one alphabet to another.  For example, the Chinese character 昌 means “prosperous” and is pronounced “chang”, and the Chinese character 李 means “plum” and is pronounced “li”. Transliteration converts the Chinese name 昌李 into “chang li” (translation would convert this to “prosperous plum”).
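The name example can be sketched in a few lines of Python. This is a minimal illustration only: the ROMANIZATION table and the transliterate function are hypothetical, cover just the two characters discussed here, and do not reflect how the mAPI performs transliteration internally.

```python
# Hypothetical lookup table for illustration; a real transliterator
# covers many thousands of characters across many scripts.
ROMANIZATION = {"昌": "chang", "李": "li"}


def transliterate(text: str) -> str:
    """Replace each mapped character with its Latin spelling.

    Unmapped characters pass through unchanged. Runs of whitespace are
    collapsed to single spaces as a side effect of the final join.
    """
    pieces = []
    for ch in text:
        if ch in ROMANIZATION:
            # Pad with spaces so adjacent syllables stay separate words.
            pieces.append(" " + ROMANIZATION[ch] + " ")
        else:
            pieces.append(ch)
    return " ".join("".join(pieces).split())


print(transliterate("昌李"))
```

Note that the characters are converted by sound, not by meaning: nothing in the table knows that 昌 means "prosperous" or that 李 means "plum".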

Characters from alphabets such as Cyrillic (used by Russian, among other languages) and scripts such as CJK (Chinese, Japanese, Korean) can be input into mSQL or mHUB. Transliteration is performed in two places:

  1. When match keys are generated and output via your deployment method, i.e. in mSQL or mHUB via the BulkGenerateKeys stored procedure or the GenerateKeys SSIS task. The output is produced using Latin1 characters.
  2. When records are compared, i.e. in mSQL or mHUB via FindMatches and FindOverlap.

* Transliteration is currently not available through our Windows workstation product.