mSQL - Configuring an XML - Match Keys

The first section that appears in the XML configuration file is the <matchKeys> node, within which there are definitions for three types of keys: <fuzzyKeys>, <exactKeys> and <duplicatePreventionKeys>.

In short, a match key is used to group potential matches within data.  The keys you will want to use will depend on the kind of data you are working with.  An example of a match key that is commonly used is the ‘name key’, which is the phonetic surname combined with the initial of the first name.  The name key for ‘John Smith’ would be ‘smyTJ’, which would group names such as Jonathan Smith and Jon Smythe.  These potential matches would then be compared with each other and scored to rate the validity of the match.

By default, three ‘Composite keys’ are defined for the <fuzzyKeys> node - A composite key is a collection of single keys that are used together in a query to search for data.  You can add or remove composite keys (note that one search is done per composite key, so more composite keys means more processing time) by adding or removing <key> nodes within the <fuzzyKeys> node, as well as modifying the single keys of the <key> nodes themselves.  The <fuzzyKeys> are used in the fuzzy matching processes such as FindMatches and FindOverlap.

Also by default, a ‘Composite key’ containing 8 keys is defined for the <exactKeys> node – They can be modified in exactly the same way that the <fuzzyKeys> are, and are used in the exact matching processes, such as findExactMatches and findExactOverlap.  Records with identical exactKeys are deemed to be exact matches without any further comparisons or processing, hence there is not scoring. It is therefore important to use valid keys for this node.  For example, just using ‘name key’ like in the example above may cause 2 John Smiths at completely different addresses, who are therefore most likely completely different people, to be classed as exact matches.

The <duplicatePreventionKeys> node can similarly contain any number of composite keys (one is defined by default), and these can be modified in exactly the same way as the others, by adding or removing ‘keyX’ attributes (where X is an integer).  The duplicate prevention keys are used in the msp_SingleRecordMatch(2) procedure and mfn_SingleRecordMatch(2) table-valued functions (TVF), which will typically be used in a passive manner, described further on.

It is also worth noting here that it is possible to use mSQL record fields as search keys as well as key fields themselves (i.e. fields with the ‘mk’ prefix). Note however that if there is no field mapping defined for a particular mSQL record field (discussed in the <datasource> section here), the key will not be able to be used as a match key, and an exception will occur if it is.

Note – It is possible to declare SQL string functions with the key definitions, that is, LEFT, RIGHT and SUBSTRING.  If you wanted to use the fist 2 letters of post out for example as a match key, you would set the ‘keyN’ attribute to be LEFT(mkPostOut, 2).  You would do the same with the RIGHT and SUBSTRING functions (with SUBSTRING taking a third counter argument).  The following other functions can be used: LTRIM, RTRIM, TRIM, PUNTRIM, UPPER, LOWER.