Common tweaks to improve matching performance/results

Matching Problems

Overview of Matching Problems

When reporting matches or overlap between files, matchIT looks up the full details for each record involved by using its unique reference number as stored in the Matches or Merges database.

When investigating matching problems, first browse the Main File and check that the imported data looks correct, including derived fields such as NAME, NAME1 etc.  The Key fields are fairly obscure, but in general they should not be blank.  If they are blank, or if a field such as NAME2 is blank, you may not have allocated the right field names to the data (or the Intelligent Setup Wizard may not have identified the fields correctly).  Check in particular that the right field names have been used for people's names: usually ADDRESSEE for a name keyed all in one field, and PREFIX, FORENAMES (or INITIALS) and SURNAME when the name is split up.

Too Few Matches or Scores Too Low

Possible causes of these problems are:

  • the Minimum Score to Report is too high
  • the Weights are poor, or are placed on un-normalized fields such as Addressee
  • the primary Match Keys you have used have stopped matchIT from looking at some potential duplicates, e.g. you have used NAME_KEY + ZIP in one step, instead of ZIP in one step and NAME_KEY or NAME_KEY + LEFT(ZIP,5) in another step; or you have used an un-normalized field, such as Addressee, as a primary match key
  • Match on Location is switched on in "Matching Setup" when you do not want to use it.  If you want to consider matches based on e.g. contact and company names irrespective of location (address and zip code), uncheck this box.
  • you made more than one Find pass through the data, with different keys, but answered "Yes" to "Is this a new Analysis" on the second pass, when you should have answered "No"
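
The effect of the primary Match Key choice described in the list above can be illustrated with a short Python sketch.  This is not matchIT code: the records, the name_key and zip fields and the candidate_pairs helper are all hypothetical, and the sketch only imitates the general idea that records are compared when they share a key value.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records; "name_key" stands in for a normalized name key.
records = [
    {"ref": 1, "name_key": "SMITHJ", "zip": "10001-1234"},
    {"ref": 2, "name_key": "SMITHJ", "zip": "10001"},       # same name, short zip
    {"ref": 3, "name_key": "SMYTHJ", "zip": "10001-1234"},  # same zip, name keyed differently
    {"ref": 4, "name_key": "JONESA", "zip": "94105"},
]

def candidate_pairs(records, key_func):
    """Return pairs of refs whose records share the same non-blank key value."""
    blocks = defaultdict(list)
    for rec in records:
        key = key_func(rec)
        if key:  # records with a blank key are never compared
            blocks[key].append(rec["ref"])
    pairs = set()
    for refs in blocks.values():
        pairs.update(combinations(sorted(refs), 2))
    return pairs

# One restrictive pass, imitating NAME_KEY + ZIP.
single_pass = candidate_pairs(records, lambda r: r["name_key"] + r["zip"])

# Two looser passes, imitating ZIP alone and then NAME_KEY + LEFT(ZIP,5).
two_pass = (
    candidate_pairs(records, lambda r: r["zip"])
    | candidate_pairs(records, lambda r: r["name_key"] + r["zip"][:5])
)

print(sorted(single_pass))  # no candidate pairs at all
print(sorted(two_pass))     # records 1 and 2, and records 1 and 3, are now compared
```

In this made-up data the single restrictive key never brings any two records together, while the two looser passes between them offer both plausible duplicates for scoring.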

If you still can't resolve the problem, find two records which should be reported as duplicates and contact your support team or your support provider.

Too Many Matches or Scores Too High

Some of the checks above are worth repeating in case the problem is the reverse of the one described there.  You may also benefit from using primary Match Keys that prevent some of the false matches from being reported.  For example, where records are being reported because they match on name and have a blank zip code, but the addresses are different, use LEFT(ZIP,5) + NAME1 starting at zip codes beginning with 'A', instead of NAME1 + LEFT(ZIP,5).

If you are dealing with foreign data (not from an English-speaking country), you may need to add common words to the Names and Words table so that matchIT pays little or no attention to them when comparing records, e.g. add Weg as an Address word for Holland and the Scandinavian countries.
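
As a rough illustration of why such common words matter, the Python sketch below compares two addresses with and without a noise-word list.  It is not how matchIT scores addresses: the ADDRESS_NOISE_WORDS set is a hypothetical stand-in for entries in the Names and Words table, and the word-overlap comparison is a deliberately simple assumption.

```python
# Hypothetical stand-in for Address words in the Names and Words table.
ADDRESS_NOISE_WORDS = {"weg"}

def significant_words(address, noise_words=ADDRESS_NOISE_WORDS):
    """Lower-case the address and drop noise words and bare house numbers."""
    words = address.lower().replace(",", " ").split()
    return {w for w in words if w not in noise_words and not w.isdigit()}

def shared_words(addr_a, addr_b, noise_words=ADDRESS_NOISE_WORDS):
    """Return the significant words that both addresses have in common."""
    return significant_words(addr_a, noise_words) & significant_words(addr_b, noise_words)

a = "Oude Weg 12"
b = "Nieuwe Weg 7"

# Without a noise-word list, "weg" looks like common ground between the addresses.
print(shared_words(a, b, noise_words=set()))

# With "weg" treated as a noise word, the two addresses share nothing.
print(shared_words(a, b))
```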

If this does not solve the problem, see the section below.

Totally False Matches Being Reported

Possible causes of this are:

 (a) Unique References are not Unique

If you merge two databases and then View Matches or View Overlap, and records are displayed with no similarities whatsoever, the records may not have unique entries in the UNIQUE_REF field.  You can check the unique references by using Database Utilities, Check Unique Refs.  If this field was not allocated by matchIT on Import, rename your input field to some other name (URN will do) and define a new field for UNIQUE_REF; matchIT will then allocate references that are unique.

When you import two databases for merging, be sure to set the next reference number for the second database higher than the last reference number in the first database, e.g. if the unique reference number of the last record in the first database is 5000, the next reference number for the second database needs to be greater than 5000.
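
If the reference numbers can be exported as plain lists, the same check can be sketched outside matchIT.  The Python below is an illustration only: the duplicate_refs helper and the sample reference ranges are hypothetical, and it does not reproduce what Check Unique Refs does internally.

```python
from collections import Counter

def duplicate_refs(refs):
    """Return the reference numbers that occur more than once, in ascending order."""
    counts = Counter(refs)
    return sorted(ref for ref, n in counts.items() if n > 1)

first_db = list(range(1, 5001))      # references 1 to 5000
second_db_bad = list(range(1, 4))    # numbering restarted at 1, so it clashes

# First safe reference for the second database: one more than the last one used.
next_ref = max(first_db) + 1
second_db_ok = list(range(next_ref, next_ref + 3))

print(duplicate_refs(first_db + second_db_bad))  # clashing references
print(duplicate_refs(first_db + second_db_ok))   # no clashes
```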

If you have merged databases together and forgotten to give them unique references when originally importing them, you can use the Database Utilities option Generate Unique Refs to regenerate them.  You must then Find Matches or Find Overlap again to find the matches for the new reference numbers.

If this happened in a Job Script, make sure that you set the Next Reference Number in the Options (Input Options) for each step, so that each database has a unique range of references, or use the Multiple File Wizard which ensures uniqueness.  

 (b) Inappropriate Report

Alternatively, the problem could be due to having selected the wrong report.  When choosing a report, make sure you select one appropriate to the options you have defined.  If you are using one of matchIT's predefined reports, a sets report must contain the word 'SETS' in its name; similarly, a pairs report must contain 'PAIRS' and an overlap report 'MERGE'.  Also, if you have found matches at business level, check that you have chosen Business in the Report Format drop-down list (View Matches dialog).

If you are using a report that you have modified, the report may be corrupt.  It is worth checking whether you have the same problem using the standard Business or Residential report, as delivered.

 (c) Indexing Problems

Otherwise, the problem could be caused by a corrupt index.  Check this by selecting Browse Imported Records from the Import menu and ordering the records on UNIQUE_REF.  If the unique reference numbers do not appear in ascending order, the index is corrupt.  Solve this by using the Database Utilities option Reindex to reindex the Main File(s).

 (d) Find matches before you view them

matchIT may display completely false matches, with no similarities whatsoever, if you select View Matches before you Find Matches.  This is because matchIT retains information from the last matching runs that took place for a Main File in that directory.