mHUB - Compare Settings

 

The following settings are used when records are compared. The default value for each setting is shown.

Location: <settings><advanced><compare>

<compare>
<phonetic>...</phonetic>
<fuzzy>...</fuzzy>
<name>...</name>
<address>...</address>
</compare>

Phonetic

Location: <settings><advanced><compare><phonetic>

<phonetic>
<algorithm>soundIT</algorithm>
<algorithmForFirstNames>none</algorithmForFirstNames>
<looseThresholdForDynamicSoundIT>2</looseThresholdForDynamicSoundIT>
</phonetic>

algorithm: The matching process that the matchIT API uses has two stages: the key stage and the scoring stage. The key stage creates standardized and phonetic keys from the input data, which allows potential matches to be identified. The scoring stage scores each pair of potential matches using phonetic and fuzzy matching. This property governs the phonetic algorithm that the API uses when scoring.

There are five choices available:

  • soundIT
  • Loose_SoundIT
  • Dynamic_SoundIT
  • Soundex
  • None

Refer to Appendix H (Phonetic Algorithms) for further details.

algorithmForFirstNames: By default, this property is set to PhoneticAlgorithm.None, which simply means that EngineSettings.Compare.Phonetic.Algorithm will be used for first names as well.

If any other value is set, all first names will be phoneticized using that algorithm instead.

This can be useful for imposing a 'tighter' level of matching on first names than on last names, because first names are often abbreviated to short forms.

looseThresholdForDynamicSoundIT: When Dynamic soundIT is in use, this property controls the threshold at which soundIT is switched to Loose soundIT. The default is 2, which means that words containing fewer than two syllables are phoneticized using soundIT instead of Loose soundIT.
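
The switching rule described above can be sketched as follows. This is an illustration only, not the API's implementation: the function names are hypothetical, and the syllable counter is a deliberately rough assumption (it counts runs of vowels), since this page does not say how syllables are counted.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumption): count runs of vowels, minimum 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def choose_phonetic_algorithm(word: str, loose_threshold: int = 2) -> str:
    """Pick the algorithm Dynamic soundIT would use under the rule above.

    Words with fewer syllables than the threshold use soundIT;
    all other words use Loose soundIT.
    """
    if count_syllables(word) < loose_threshold:
        return "soundIT"
    return "Loose_SoundIT"
```

With the default threshold of 2, choose_phonetic_algorithm("Smith") returns "soundIT" and choose_phonetic_algorithm("Johnson") returns "Loose_SoundIT".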

Fuzzy

Location: <settings><advanced><compare><fuzzy>

<fuzzy>
<algorithm>matchIT_Fuzzy</algorithm>
<maximumEditDistance>1</maximumEditDistance>
<minimumScore>0.5</minimumScore>
</fuzzy>

algorithm (From version 2.0.3): This property governs the fuzzy algorithm that the API uses when scoring.

There are two choices available:

  • matchIT_Fuzzy
  • Damerau_Levenshtein

Refer to Appendix I (Fuzzy Algorithms) for further details.

maximumEditDistance: The maximum number of differences (edits) permitted between the two strings being compared. (Applicable to Damerau_Levenshtein only.)

minimumScore: The minimum fuzzy score that the two strings must achieve. (Applicable to Damerau_Levenshtein only.)
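
To illustrate how an edit-distance limit and a minimum score can work together, here is a minimal sketch. It uses the restricted (optimal string alignment) form of Damerau-Levenshtein distance, which may differ from the variant the API uses. The score formula (1 minus distance divided by the longer length) and the function names are assumptions made for this example; Appendix I remains the reference for the real algorithm.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions and adjacent transpositions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def fuzzy_match(a: str, b: str,
                maximum_edit_distance: int = 1,
                minimum_score: float = 0.5) -> bool:
    """Return True when both limits are satisfied (assumed scoring formula)."""
    distance = osa_distance(a, b)
    longest = max(len(a), len(b))
    score = 1.0 if longest == 0 else 1.0 - distance / longest
    return distance <= maximum_edit_distance and score >= minimum_score
```

Under these assumptions, fuzzy_match("lisa", "lsia") is True (one transposition, score 0.75), while fuzzy_match("a", "b") is False because the score of 0.0 falls below the minimum even though the edit distance is within the limit.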

Name

Location: <settings><advanced><compare><name>

<name>
<preventMrsMatchingMiss>true</preventMrsMatchingMiss>
<fuzzyMatchNonNormalizedNames>true</fuzzyMatchNonNormalizedNames>
<organizationMatchingOnBlankNames>0</organizationMatchingOnBlankNames>
<matchInitialToEquivalentName>equal</matchInitialToEquivalentName>
<crossMatchInitialToName>true</crossMatchInitialToName>
<fuzzyMatchInitials>full</fuzzyMatchInitials>
<matchInitialsToFirstNames>equal</matchInitialsToFirstNames>
<fuzzyMatchFirstNames>fullFuzzyMatching</fuzzyMatchFirstNames>
</name>

preventMrsMatchingMiss: If this setting is enabled, then two compared names will not match if one has a title of Mrs and the other a title of Miss. For example, "Mrs J Smith" will not match "Miss J Smith" with the setting enabled (the default).

fuzzyMatchNonNormalizedNames: When enabled (the default), additional matching checks are performed on names using the non-normalized name matching fields. This can be useful when the generate setting 'useEquivalentNames' is enabled: that setting allows Elizabeth and Lisa to match, but on its own it does not allow for misspellings and typos such as Lsia.

organizationMatchingOnBlankNames: When the records being compared contain no addressee names, this setting allows the names to achieve a score based on the contents of the job title and company name fields. For example, if the two records contain job titles of Managing Director and a company name of 360Science, then a positive name score will be given even though the records don't contain an addressee. The possible values are:

0 - Off

1 - On if either name is blank

2 - On when both names are blank
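
The three values can be read as a simple gate on whether job title and company name are consulted. The sketch below is illustrative only; the function name is hypothetical, and treating whitespace-only names as blank is an assumption.

```python
def organization_matching_applies(name_a: str, name_b: str, mode: int = 0) -> bool:
    """Decide whether organization matching is used for a pair of addressee names.

    mode 0: off; mode 1: on if either name is blank; mode 2: on when both are blank.
    """
    blank_a = not name_a.strip()
    blank_b = not name_b.strip()
    if mode == 1:
        return blank_a or blank_b
    if mode == 2:
        return blank_a and blank_b
    return False
```

For example, organization_matching_applies("", "J Smith", mode=1) is True, but the same pair with mode=2 gives False because only one name is blank.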

matchInitialToEquivalentName: This setting controls how an initial matches a name that's equivalent to the given firstname. For example, when comparing Rebecca Smith and B Smith, the B could be considered a match for Becky, which is a common abbreviation (or equivalent) of Rebecca. By default, the two are considered 'equal'. The score for this type of match can be reduced by using a value of 'approx', or the match can be prevented altogether by using a value of 'unequal'.

crossMatchInitialToName: When enabled (the default), two names are considered a possible match if the first letter of one record's firstname matches the other record's middle initial (for example, "Richard Smith" and "John R Smith").

fuzzyMatchInitials: This setting controls how similar-sounding initials (M/N, S/F, and G/J) can be matched. When set to 'full' (the default), then one name's initial is permitted to match the first letter of the other name's firstname (for example, "M Smith" versus "Neil Smith"). When set to 'initialsOnly', then only initials are permitted ("M Smith" versus "N Smith"). A setting of 'noMatch' disables such matches.
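
The three values of fuzzyMatchInitials can be sketched as below. This is a hypothetical illustration: the function names are invented, a one-letter firstname is assumed to be an initial, and two full firstnames are assumed not to be covered by this setting.

```python
# Similar-sounding initial pairs listed in the description above.
SIMILAR_INITIALS = [{"M", "N"}, {"S", "F"}, {"G", "J"}]


def similar_sounding(x: str, y: str) -> bool:
    """True when the two letters form one of the similar-sounding pairs."""
    return {x.upper(), y.upper()} in SIMILAR_INITIALS


def initials_fuzzy_match(first_a: str, first_b: str, mode: str = "full") -> bool:
    """Apply the 'full', 'initialsOnly' or 'noMatch' rule to two firstname fields."""
    if mode not in ("full", "initialsOnly", "noMatch"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "noMatch" or not first_a or not first_b:
        return False
    if not similar_sounding(first_a[0], first_b[0]):
        return False
    a_is_initial = len(first_a) == 1
    b_is_initial = len(first_b) == 1
    if mode == "initialsOnly":
        return a_is_initial and b_is_initial
    # 'full': at least one side is an initial; the other may be a full firstname.
    return a_is_initial or b_is_initial
```

Here initials_fuzzy_match("M", "Neil") is True under 'full' but False under 'initialsOnly', whereas initials_fuzzy_match("M", "N") is True under both.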

matchInitialsToFirstNames: This setting controls the result achieved when an initial matches the first letter of a firstname. This defaults to 'equal', so that B Smith versus Bob Smith will achieve the same result as Bob Smith versus Bob Smith (i.e. 'equal' for the firstnames). Reducing this setting to 'approx' or 'contains' will reduce the resultant name score in order to distinguish such matches.

fuzzyMatchFirstNames: This setting can be used to prevent different recognized firstnames from matching. For example, ordinarily Ron and Roy will fuzzy match, but because they're both recognized firstnames they can be prevented from matching by changing this setting.

The default setting ('fullFuzzyMatching') will fuzzy match firstnames regardless of whether one or both is recognized. Changing to 'eitherUnrecognized' will fuzzy match the firstnames if either isn't recognized (e.g. Ron and Rov). 'bothUnrecognized' will only fuzzy match the firstnames where both are unrecognized (e.g. Rov and Row). Lastly, no fuzzy matching will take place if a value of 'noFuzzyMatching' is used.
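
The four values amount to a gate on whether fuzzy matching is attempted for a pair of firstnames. The sketch below illustrates that gate only; the function name is hypothetical and the small RECOGNIZED set is a stand-in for whatever table of recognized firstnames the product uses.

```python
# Hypothetical stand-in for the product's table of recognized firstnames.
RECOGNIZED = {"ron", "roy"}


def fuzzy_matching_allowed(first_a: str, first_b: str,
                           mode: str = "fullFuzzyMatching",
                           recognized=RECOGNIZED) -> bool:
    """Return True when the two firstnames may be fuzzy matched under the mode."""
    known_a = first_a.lower() in recognized
    known_b = first_b.lower() in recognized
    if mode == "fullFuzzyMatching":
        return True
    if mode == "eitherUnrecognized":
        return not (known_a and known_b)
    if mode == "bothUnrecognized":
        return not known_a and not known_b
    if mode == "noFuzzyMatching":
        return False
    raise ValueError(f"unknown mode: {mode}")
```

With this stand-in table, Ron and Roy are allowed only under 'fullFuzzyMatching', Ron and Rov are also allowed under 'eitherUnrecognized', and Rov and Row are allowed under every value except 'noFuzzyMatching'.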

Address

Location: <settings><advanced><compare><address>

<address>
<matchBoxNumberAndPostcode>false</matchBoxNumberAndPostcode>
<usePremiseRange>true</usePremiseRange>
<looseFuzzyPremiseMatch>false</looseFuzzyPremiseMatch>
<matchDeliveryPoints>false</matchDeliveryPoints>
<matchDeliveryPointsThreshold>1.0</matchDeliveryPointsThreshold>
<defaultDeliveryPoints>9U|9V|9W|9X|9Y|9Z</defaultDeliveryPoints>
<ignorePremiseSuffix>false</ignorePremiseSuffix>
</address>

matchBoxNumberAndPostcode: If this setting is enabled, then two compared addresses score Sure if they contain matching postal box numbers and postcodes (i.e. the remainder of the addresses are ignored).

usePremiseRange: When enabled, premise ranges in addresses are taken into account. For example, if one record contains an address line of "11-15 Main Street" and the other "13 Main Street", then the premises are considered a match with this setting enabled; otherwise, the premises will not be matched and, depending on constraints and weights, the addresses might not score highly enough to be considered matching records.
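
A minimal sketch of range-aware premise comparison follows. The function names are hypothetical, and the sketch assumes a simple inclusive overlap test on whole numbers; whether the product also considers odd and even numbering within a range is not stated here.

```python
import re


def parse_premise(premise: str):
    """Return (low, high) for '11-15' or '13'; None if the premise is not numeric."""
    m = re.fullmatch(r"\s*(\d+)\s*(?:-\s*(\d+))?\s*", premise)
    if m is None:
        return None
    low = int(m.group(1))
    high = int(m.group(2)) if m.group(2) else low
    return (min(low, high), max(low, high))


def premises_match(a: str, b: str, use_premise_range: bool = True) -> bool:
    """Compare two premises, optionally treating 'N-M' as an inclusive range."""
    range_a, range_b = parse_premise(a), parse_premise(b)
    if range_a is None or range_b is None:
        return a.strip() == b.strip()
    if not use_premise_range:
        return range_a == range_b
    # Two inclusive ranges overlap when each starts before the other ends.
    return range_a[0] <= range_b[1] and range_b[0] <= range_a[1]
```

With the range check on, premises_match("11-15", "13") is True; with use_premise_range=False it is False.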

looseFuzzyPremiseMatch: When enabled, additional fuzzy premise matching is performed. Firstly, two premises can match if they differ numerically by up to 2 (for example, 1719 will now match both 1720 and 1721). And secondly, two premises can match if one premise starts with the other but contains extra trailing characters (for example, 88 and 88/2 will match, but 88 and 887 will not match).
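
The two extra rules can be sketched as follows. The function name is hypothetical, and one detail is an assumption inferred from the examples above: the extra trailing characters must not begin with a digit, which is what separates 88/2 (match) from 887 (no match).

```python
def loose_premise_match(a: str, b: str) -> bool:
    """Apply the two loose fuzzy premise rules to a pair of premise strings."""
    a, b = a.strip(), b.strip()
    if a == b:
        return True
    # Rule 1: purely numeric premises may differ by up to 2.
    if a.isdecimal() and b.isdecimal():
        return abs(int(a) - int(b)) <= 2
    # Rule 2: one premise starts with the other, followed by a non-digit.
    shorter, longer = sorted((a, b), key=len)
    return (
        bool(shorter)
        and longer.startswith(shorter)
        and not longer[len(shorter)].isdigit()
    )
```

Under these assumptions, 1719 matches 1720 and 1721 but not 1722, and 88 matches 88/2 but not 887.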

matchDeliveryPoints: When enabled, this will prevent two addresses from matching when both contain two postal codes but different delivery point codes (for example, DPS codes in the UK, DPV codes in the US) and the addresses score below the minimum threshold.

If either record is missing a delivery point, or either is a default, then the addresses will match regardless.
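
The interaction between matchDeliveryPoints, matchDeliveryPointsThreshold and defaultDeliveryPoints can be sketched as below. This is an interpretation, not the product's code: the function name is hypothetical, the default codes are assumed to be a pipe-separated list, and address scores are assumed to be on the same scale as the threshold.

```python
DEFAULT_DELIVERY_POINTS = "9U|9V|9W|9X|9Y|9Z"


def delivery_points_block_match(dp_a: str, dp_b: str, address_score: float,
                                threshold: float = 1.0,
                                default_delivery_points: str = DEFAULT_DELIVERY_POINTS) -> bool:
    """Return True when differing delivery points should prevent an address match."""
    defaults = set(default_delivery_points.split("|"))
    # A missing or default delivery point never blocks a match.
    if not dp_a or not dp_b:
        return False
    if dp_a in defaults or dp_b in defaults:
        return False
    # Block only when the codes differ and the score is below the threshold.
    return dp_a != dp_b and address_score < threshold
```

For example, delivery points 1A and 1B with an address score of 0.9 block the match at the default threshold of 1.0, whereas 1A and 9Z do not, because 9Z is a default.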

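matchDeliveryPointsThreshold: The minimum address score threshold used by matchDeliveryPoints. See matchDeliveryPoints, above.

defaultDeliveryPoints: The delivery point codes that are treated as defaults by matchDeliveryPoints. See matchDeliveryPoints, above.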

ignorePremiseSuffix: When enabled, this will allow two premises to match regardless of whether one or both has an apartment- or flat-type suffix (for example, 12 and 12a). Normally, such premises will cause the address score to be reduced because the addresses are considered different, which could prevent the two records being flagged as a match.
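
A sketch of suffix-insensitive premise comparison is shown below. The function names are hypothetical, and the sketch assumes that a suffix is a single trailing letter and that only the numeric part is compared when the setting is on (so premises with two different suffixes would also match).

```python
import re


def split_premise(premise: str):
    """Split '12a' into (12, 'a'); return None for other premise formats."""
    m = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]?)\s*", premise)
    if m is None:
        return None
    return int(m.group(1)), m.group(2).lower()


def premises_equal(a: str, b: str, ignore_premise_suffix: bool = False) -> bool:
    """Compare two premises, optionally ignoring a single-letter suffix."""
    parts_a, parts_b = split_premise(a), split_premise(b)
    if parts_a is None or parts_b is None:
        return a.strip().lower() == b.strip().lower()
    if ignore_premise_suffix:
        return parts_a[0] == parts_b[0]
    return parts_a == parts_b
```

With the setting on, premises_equal("12", "12a", ignore_premise_suffix=True) is True; with the default of False it is False.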