mSQL - msp_FindOverlap - overlap matches and large_clusters tables

Overlap matches table

This table contains the matching pairs that have been identified as a result of the fuzzy matching process.

 

Column

Description

ID

Record ID for each matching pair.

Record1

Reference ID of the first record in the matching pair.

Record2

Reference ID of the record that is the second record in the matching pair. This record belongs to the second datasource specified in the parameters.

Level

The Level column indicates the matching level(s) at which a match was found.  If it contains a 1 then the two records match at the Individual level; if 2, then Family level; if 4, then Household level; and if 8, then Business level.  Multiple levels are indicated by summing values – for example, 9 would indicate a match at both Individual and Business levels (1+8), and 15 a match at all four levels (1+2+4+8).

By default, the Level column is followed by the total score for the four matching levels.  These columns are fully configurable within a configuration file.  Component scores (for name, organization, address, etc.) can also be output for any level(s).

IndividualScore

Individual level total match score.

FamilyScore

Family level total match score.

HouseholdScore

Household level total match score.

BusinessScore

Business level total match score.

These following columns relate to either master record identification or bridging prevention and shouldn’t be used for any other purpose; they are subject to change in future versions of mSQL.

MatchFlags

The MatchFlags column is only used when Bridging Prevention is enabled, which is only applicable to an internal dedupe, so disregard this column when running findoverlap.

MasterPriority1

Used for Master Record Identification.

MasterPriority2

Used for Master Record Identification.

AddressLength1

Used for Master Record Identification.

AddressLength2

Used for Master Record Identification.

Key

Indicates through which match key (as specified in your configuration file) this matching pair was found.

 

large_clusters table

This table lists the clusters that contain too many records (i.e. the Maximum Cluster Size has been exceeded).  Processing the cluster will therefore be skipped to avoid the stored procedure potentially requiring a significant amount of processing time.

 

Column

Description

ID

Record ID

KeyIndex

The composite key being processed.

Search

The current composite key value that identifies the current cluster (for example, the value of mkPostOut+mkName1 – note that a pipe character separates each key value).

Records

The total number of records in the cluster.

MaxRecords

The maximum cluster size constant.