Introduction

The mHUB knowledge base provides information for developers and integrators using mHUB, whether they are implementing a new application or modifying an existing one to make use of mHUB.

 

What is mHUB?

mHUB is an 'engine' accessible via a single API with a small number of methods. Interfaces are provided for these programming languages: C, C++, Java, and .NET (including C# and Visual Basic .NET). Whilst there are some minor differences between them, the interfaces are fundamentally identical. There is a separate reference guide for each interface. (An application written in another language, such as Python or PHP, can still use mHUB via the C-compatible interface. Whilst beyond the scope of this documentation, SWIG (http://www.swig.org) can be used to generate an interface for many other programming languages.)

An mHUB engine is configured via settings in an XML-formatted string. The string can be read from an XML file, embedded within an application, or created dynamically at runtime. Please refer to the separate Configuration Guide for details of all available configuration settings, and of how to create and customize them.
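
As a rough illustration of the third option, the following Python sketch builds a configuration string at runtime. The element and attribute names (mhub, setting, name, value) and the two setting names are placeholders invented for this example, not mHUB's actual schema; the Configuration Guide defines the real settings.

```python
import xml.etree.ElementTree as ET


def build_config(settings):
    """Return an XML-formatted configuration string.

    NOTE: the <mhub>/<setting> layout is hypothetical; the real schema
    is defined in the Configuration Guide.
    """
    root = ET.Element("mhub")
    for name, value in settings.items():
        # Each setting becomes one child element with name/value attributes.
        ET.SubElement(root, "setting", name=name, value=str(value))
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


# Placeholder settings, chosen only to show the mechanism.
config_xml = build_config({"mode": "matching", "delimiter": "|"})
```

The same string could equally be loaded from a file or held as a constant in the application; only the way it is produced differs.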

Data can be passed in from any data source, and similarly output to any data source. mHUB itself doesn't provide any data access functionality; for example, it doesn't read data from a database or a text file, and neither does it write results back to these. Instead, input data is fed into the mHUB engine one record at a time in the form of a delimited string of data, and results are similarly returned individually as a delimited string.
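
The record-at-a-time flow can be sketched as follows. Here process_record is a stand-in for whatever mHUB call the application would make; its name and behaviour are assumptions, and the reference guide for each interface documents the real methods. The pipe delimiter is likewise chosen only for illustration.

```python
import csv
import io


def to_delimited(fields, delimiter="|"):
    """Join one record's fields into a single delimited string."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter).writerow(fields)
    # Drop the line terminator that csv.writer appends.
    return buf.getvalue().rstrip("\r\n")


def from_delimited(line, delimiter="|"):
    """Split a delimited result string back into its fields."""
    return next(csv.reader([line], delimiter=delimiter))


def process_record(record):
    """Hypothetical placeholder for the engine call; it only echoes its input."""
    return record


# The application, not mHUB, reads from the data source (here an in-memory list).
source = [["1", "John", "Smith"], ["2", "Jon", "Smyth"]]
results = [from_delimited(process_record(to_delimited(row))) for row in source]
```

Using the csv module rather than a plain join means a field that itself contains the delimiter is quoted instead of being split into two fields.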

 

What can mHUB be used for?

mHUB can currently be used in scenarios such as these:

  1. Matching: Finding duplicate records in a single source of data (for example, for removing duplicate contacts from a master table);
  2. Overlap: Finding records that are duplicated across two sources of data (for example, for identifying contacts to be suppressed from a mailing, for updating contact details, or for merging contacts into a master table);
  3. Lookup: Searching reference data for records that match a specified record (for example, in an online duplicate prevention application);
  4. Normalization: Outputting a 'normalized' copy of the input data. Normalization processes include casing, standardization, name parsing, generation of salutations, extraction of address elements, and parsing of email addresses. Phonetic match keys can also be output, for advanced matching and diagnostic purposes.

Additional functionality will be added to mHUB in future versions.

 

Unicode data

mHUB is fully able to handle Unicode data. It does this by transliterating Unicode characters into characters from the Latin1 code page (Windows-1252) so that they can be processed by the underlying matching engine. The Latin1 code page is generally used by Western European languages, including English, Spanish, French, and German.

Transliteration is not the same as translation, in which words are converted from one language to another; when transliterating, it is the characters themselves that are converted from one alphabet to another. For example, the Chinese character 昌 means "prosperous" and is pronounced "chang", and the Chinese character 李 means "plum" and is pronounced "li". Transliteration converts the Chinese name 昌李 into "chang li" (translation would convert this to "prosperous plum").
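
To make the distinction concrete, here is a toy Python sketch of character-level transliteration using the two characters from the example above. The lookup table is hand-written for illustration and covers only these characters; it is not mHUB's transliteration method, which this document does not describe.

```python
# Toy romanization table covering only the two example characters.
ROMANIZATION = {"昌": "chang", "李": "li"}


def transliterate(text):
    """Replace each character with its Latin spelling, if one is known.

    Results are joined with spaces, which suits this example only;
    characters missing from the table are passed through unchanged.
    """
    return " ".join(ROMANIZATION.get(ch, ch) for ch in text)


latin = transliterate("昌李")
# Unlike the original characters, the result can be encoded in Windows-1252.
encoded = latin.encode("cp1252")
```

Note that the function never consults the meaning of a character ("prosperous", "plum"); it only maps one written form to another, which is what separates transliteration from translation.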

Characters from alphabets such as Cyrillic (used by languages including Russian) and from scripts such as CJK (Chinese, Japanese, Korean) can all be input into mHUB.