mSQL - msp_FindExactMatches

 

Input Parameters:

  • Configuration file – the file path of the configuration file to be used when this procedure is run.
  • Datasource ID – identifies which data source in the configuration file to use; the data source contains the table and column mapping specifications.

 

Finds all exactly matching record pairs in the database specified by the supplied data source. The following settings are relevant to FindExactMatches:

  • matchKeys->exactKeys – The match keys to use are specified in the XML, within the exactKeys tags under the matchKeys section. Fields can be concatenated to create an exact match key, e.g.

        <key key1="mkName1" key2="mkName2" />

    which means that all records with the same phonetic forename and surname will be recorded as matches.
  • dataSources – Defines the database connection, tables and columns of the dataset to be matched.
  • outputSettings->exactMatchesTable – Specifies the name of the exact_matches table that will be produced (it contains the results of the FindExactMatches processing).
  • outputSettings->reports – Specifies whether reporting is enabled, the folder the reports will be written to, and the report format to use.
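The exact-key behaviour can be illustrated with a short Python sketch. It is not matchIT SQL's implementation: it assumes each record is a dictionary that already holds its match-key values (the field names mkName1 and mkName2 come from the example above, while the record IDs and phonetic codes such as "JN" and "SM0" are hypothetical). It reads the field order from the key1, key2 and so on attributes, combines those fields into one key, and reports every pair of records that shares it. The helper names key_fields and find_exact_pairs are also hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import combinations


def key_fields(key_xml: str) -> list[str]:
    """Return the field names of a <key> element in key1, key2, ... order."""
    attrs = ET.fromstring(key_xml).attrib
    numbered = [
        (int(name[3:]), field)
        for name, field in attrs.items()
        if name.startswith("key") and name[3:].isdigit()
    ]
    return [field for _, field in sorted(numbered)]


def find_exact_pairs(records: dict[int, dict[str, str]],
                     fields: list[str]) -> list[tuple[int, int]]:
    """Group records by their combined key and return every pair within a group."""
    groups: dict[tuple[str, ...], list[int]] = defaultdict(list)
    for rec_id, rec in records.items():
        # A tuple acts like concatenation with an unambiguous separator,
        # so "AB" + "C" and "A" + "BC" stay distinct.
        groups[tuple(rec[f] for f in fields)].append(rec_id)
    pairs: list[tuple[int, int]] = []
    for ids in groups.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


# Hypothetical records holding phonetic forename/surname codes.
records = {
    1: {"mkName1": "JN", "mkName2": "SM0"},
    2: {"mkName1": "JN", "mkName2": "SM0"},
    3: {"mkName1": "MR", "mkName2": "SM0"},
    4: {"mkName1": "JN", "mkName2": "SM0"},
}
fields = key_fields('<key key1="mkName1" key2="mkName2" />')
print(fields)                             # ['mkName1', 'mkName2']
print(find_exact_pairs(records, fields))  # [(1, 2), (1, 4), (2, 4)]
```

Grouping by key means each record is placed in a bucket once rather than compared against every other record, so only records that share a key are ever paired.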

 

If the ‘excludeExactMatches’ configuration option is enabled (the default), exact matches are automatically excluded when msp_FindMatches is run, which helps improve fuzzy deduplication performance. If the level of duplication is low, however, this option should be disabled.
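One way to picture why excluding exact matches helps is to collapse each group of exactly matching records to a single representative before fuzzy comparison. The Python sketch below assumes that this is roughly what happens (the mechanism is not described here): it merges the exact pairs with a union-find structure, keeps the lowest record ID of each group, and counts the comparisons an all-pairs fuzzy pass would need before and after. The function names are hypothetical, and real fuzzy matching normally narrows comparisons in other ways too, so the counts only show the trend: the fewer exact duplicates there are, the smaller the saving.

```python
from collections.abc import Iterable


def representatives(record_ids: list[int],
                    exact_pairs: Iterable[tuple[int, int]]) -> list[int]:
    """Collapse exact-match groups, keeping the lowest ID of each group."""
    parent = {r: r for r in record_ids}

    def find(r: int) -> int:
        # Follow parent links to the group root, compressing the path as we go.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in exact_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the larger root under the smaller, so each root is its group's lowest ID.
            parent[max(ra, rb)] = min(ra, rb)
    return sorted({find(r) for r in record_ids})


def all_pairs_comparisons(n: int) -> int:
    """Number of comparisons an all-pairs fuzzy pass over n records would need."""
    return n * (n - 1) // 2


records = [1, 2, 3, 4, 5, 6]
exact = [(1, 2), (2, 4), (5, 6)]
reps = representatives(records, exact)
print(reps)  # [1, 3, 5]
print(all_pairs_comparisons(len(records)), "->", all_pairs_comparisons(len(reps)))  # 15 -> 3
```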

The results of the matching process are written to a results table in your SQL Server database. (In practice, the results are written to a temporary file as the process runs and bulk loaded into the table when it completes.) FindExactMatches produces one output table, whose name can be configured through the Web UI or in the XML (outputSettings->exactMatchesTable):

Exact_matches table structure

  • ID – Record ID for each matching pair.
  • Record1 – Reference ID of the first record in the matching pair.
  • Record2 – Reference ID of the second record in the matching pair.
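The write-then-bulk-load approach can be sketched in Python. This is an illustration only: the standard-library sqlite3 module stands in for SQL Server (where a bulk load would typically use BULK INSERT or the bcp utility), the function name save_exact_matches is hypothetical, the reference IDs are assumed to be integers, IDs are assumed to be assigned sequentially as pairs are written, and each run is assumed to replace the previous results table.

```python
import csv
import os
import sqlite3
import tempfile
from collections.abc import Iterable


def save_exact_matches(conn: sqlite3.Connection,
                       pairs: Iterable[tuple[int, int]],
                       table: str = "exact_matches") -> int:
    """Write pairs to a temporary CSV file, then load the file into `table` in one batch."""
    # Phase 1: as pairs arrive, append each one (with a sequential ID) to a temp file.
    with tempfile.NamedTemporaryFile("w", newline="", suffix=".csv", delete=False) as tmp:
        writer = csv.writer(tmp)
        for row_id, (record1, record2) in enumerate(pairs, start=1):
            writer.writerow([row_id, record1, record2])
        path = tmp.name
    try:
        # Phase 2: on completion, recreate the results table and load the file.
        # The table name is interpolated because identifiers cannot be bound as
        # ? parameters; it must come from trusted configuration.
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(f"CREATE TABLE {table} "
                     "(ID INTEGER PRIMARY KEY, Record1 INTEGER, Record2 INTEGER)")
        with open(path, newline="") as f:
            rows = [(int(i), int(r1), int(r2)) for i, r1, r2 in csv.reader(f)]
        with conn:  # a single transaction for the whole load
            conn.executemany(
                f"INSERT INTO {table} (ID, Record1, Record2) VALUES (?, ?, ?)", rows)
        return len(rows)
    finally:
        os.remove(path)


conn = sqlite3.connect(":memory:")
print(save_exact_matches(conn, [(1, 2), (1, 4), (2, 4)]))  # 3
print(conn.execute("SELECT ID, Record1, Record2 FROM exact_matches ORDER BY ID").fetchall())
# [(1, 1, 2), (2, 1, 4), (3, 2, 4)]
```

In this sketch, writing to a file keeps the database out of the matching loop, and loading the file in one transaction at the end avoids committing one row at a time.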