mSQL - SSIS Tutorial - setting up a basic matching process

The first thing is to understand the matching levels, to read more on the matching levels - please see this article first

  1. Open up a the package from the previous article that already has key generation set up
  2. Drag and drop a findmatches and groupmatches tasks, connect them to generatekeys in the specified order 

  3. Generatekeys --> findmatches --> groupmatches

  4. it is important to connect the tasks in the specified order, as one task is dependent on the tables created by the previous task.

  5. Oopen up findmatches, go through iterations of choosing a level, depending on what fields you mapped back in generatekeys, if you can't choose the level you want, then cancel out and go back to your generatekeys task and double check your mappings

  6. you can select individual alone, business alone, all four main levels, or can deselect by clicking again, company only/name only should just be selected by themselves

  7. If you want to do a custom level such as name/address/email, then we would still suggest choosing individual level and adding keys/weights for email, and unchecking constraints like 'must match location' to let in email and name matches as opposed to choosing custom and building a level and choosing keys/weights from scratch

  8. Go back to choosing individual only

  9. In order to proceed, you need to also choose a volume level, choose low volume

  10. Click next, stick with the default keys unless you're sure you want to change it, then save

  11. If you go back to the task, switch to business level (assuming your sample data had an organization as well) and then go to the keys tab and refresh the keys, you'll notice that they changed slightly, you can modify existing tasks, but we suggest building from scratch instead of copy/pasting, in this case we don't want to switch matching levels, so click cancel

  12. Open up group matches

    the matching level and source are already read in from the previous task

  13. Select the matching groups, duplicates, and deduped tables for output, if you're proficient with SQL, you may just select the matching groups table and can build the other tables based off what is in that table.

  14. Execute the package

  15. go to SSMS, refresh tables – you should now see the deduped, duplicates and matching groups

  16. open up the matching groups table

  17. Take note of the relationship of the ID to the matchref, when they’re equal, that record went into the deduped table along with any other unique record which wouldn't be in this table, it's what we picked as the master. When they’re unequal – it’s a duplicate and went into the duplicates table

  18. Open up the duplicates and deduped and the counts of those two tables added up should be the same as the count of your original table

  19. When you rerun the matching, it will just automatically overwrite the results tables each time.  (we do suggest your database be set to a simple recovery model as mSQL does do a lot of reading and writing, and that can easily eat up lots of hard drive space over time)