mSQL - SSIS Tutorial - What are all these key fields?

 

So if you’ve set up a find matches or find overlap, you’ve seen that the keys were chosen for you depending on your matching level.

For starters, you can select a key up top, then peruse the drop down below



You’ll notice it lists both key fields and input fields

The key fields all start with an mk*, whereas the input fields refer back to whatever you mapped back in generate keys.  (town/region/postcode are the same as city/state/zip)

90% of our clients are fine using the default keys, maybe with a minor tweak such as adding a telephone or email to the keys

So why don’t we take a couple records and see what kind of keys they generate.

First we’re going to start with the name related fields

They consist of

mkNormalizedName – Normalized Name
mkName1 – phonetic last name
mkName2 – phonetic first name
mkName3 – phonetic middle name
mkNameKey – phonetic last name plus initial
mksuffix - Suffix
mkGender – Gender

So lets say we have two records that we’re matching

Record A

Fullname : Mr. W. R Dayton Jr
Company: The Palmer Air Charters service company      
Add1: 7350 Airport Rd #106
Add2: Wilmington
Add3: NC
Zip:

Record B:

Firstname: Bill
Lastname: Deighton
Company: PAC
Add1: Suite 106
Add2: 7350 Airport Rd
Add3: Wilmington
Add4: NC
Zip: 28401-4273

Besides the fact that they’re inconsistent structures, these are obviously the same person, but its not so easy to line up considering the missing data, that’s what we have the keys for.

Mknormalizedname is the normalized version of the name, so even though its first/last and fullname in another, it will be more consistent here.  When we’re matching on name we put the emphasis on the last name, then the first name, then the middle name. we also will convert it to its nickname.

This field is important because when we do the scoring on name, this gives you insight into how the software sees the name.

So Mr W R Dayton  becomes

DAYTON,W,R

And  

Firstname: Bill
Lastname: Deighton

Becomes 

DEIGHTON,WILL,

mkName1 -

This is the phonetic of the last name – we follow the same order as the normalzed name, that’s why name1 is the last name, because its what we are putting the most emphasis on.

Both have the same phonetic of

dytym

we use our own proprietary algorithm to generate these phonetic keys, the main thing to point out here is the y represents a vowel sound

mkName2 –

 For Bill we end up with a phonetic first anem of wyl, because we converted Bill to will as part of matching

For the W R, we’re just carrying through the first initial

 

mkName3 –

This is the phonetic of the middle name, or in this case the middle initial of R in W R Dayton

mkNameKey – phonetic last name plus initial

in this case, the name key is the phonetic of the last name, and the first initial

its dytymW for both records

mksuffix

if we detect a suffix of some sort, that’s going to go here, commonly Jr, Sr,  2nd, 3rd,   etc

mkGender – Gender

This is the inferred gender, we use our names and words lexicon that has a list of common first names and their inferred gender, so we know for sure that Bill is male, and the Mr W R Dayton, the Mr is a common male indicator, so we will return one of four codes
M = Male

F = Female

E = Either  (Sam smith – could be short for either Samantha or Samuel)

X = conflicting ( ie:  Mr Christina smith )

So that covers the name fields

Now for company related fields,

Lets go through the list

mkNormalizedOrganization
mkOrgname1
mkorgname2
mkorgname3
mkorganizationkey

mkNormalizedOrganization

This field is important because when we do the scoring on name, this gives you insight into how the software sees the name.

Let's say we have two companies


The Palmer Air Charters service company

PAC inc

With the first example, by default we see ‘The’ as a Noise type word, and strip it out, we see  ‘service’ and ‘company’ as business type words and strip those out as well.  You can manipulate what we ignore or don’t ignore by reviewing the changes suggested in the company only matching video.

With companies we look at them left to right, as opposed to names where its last, first, middle.

So the first example becomes PALMER,AIR,CHARTERS

In the second example, we just recognize inc. as a business word, and strip that out, seeing the PAC in all caps, by default we assume it’s an acronym, so we simply end up with

PAC,,     with blanks for the 2nd and third words

mkOrgname1

in this case for the first example, it’s the phonetic of palmer

pymy

for the second example, since we recognized it as an acroym, we just pass through the PAC

mkorgname2

y   

in this case the phonetic of Air – which is basically one giant vowel sound

blank for the second example, because we’re ignoring ‘inc.’

mkorgname3

in this case, the phonetic of Charters

Fyty

And just blank for the third example

mkorganizationkey

a concatenation of the first 5 characters of mkorgname1 + first 5 of mkorgname2

This is commonly confused with the normalized organization, if you want your matches to be much tighter, you can use mknormalizedorganization instead  or just mkorgname1 + mkorgname2 separately, 

Address related fields

mkPremise
mkphoneticstreet
mkphonetictown
mkAddresskey
mkflatno
mkpostout
mkpostin

so lets take these addresses

A)

Add1: Suite 106
Add2: 7350 Airport Road
Add3: Wilmington
Add4: NC
Zip: 28401-4273

B)

Add1: 7350 Airport Rd #106
Add2: Wilmington
Add3: NC
Zip: 28401-4273

mkpremise

The premise in this case is the house number, the house number isn’t always in the same spot, so in this case we identified the 7350 airport road, it doesn’t matter that one address started with the suite, we look at the address lines as a whole

mkPhoneticstreet

similar to the premise number, the street doesn’t need to be in the first address line, in this case we’ve identified Airport Rd as the street, and since words like road/street/circle are generally noise words, we just make a phonetic of Airport

mkphonetictown

We’re generally going to assume the town comes after the street, in this case we’ve identified Wilmington as the town in both situations, we know NC is likely the state.

Mkaddresskey

By default, the phonetic street and town can both be up to 8 characters long, if you want to pare that town and be a bit looser, but still want to use a combination of both, then the address key is a concatenation of both, starting with the town, then the street.

Mkflatno

A flat, or apartment number

Generally, if there’s some kind of numeric, or alpha numeric  number that’s following the street

This one key is not very commonly used except in exact keys, more clients prefer changing settings like ‘must match premise’ and ‘no one empty premise’ to keep different apartments in the same building separate.

Mkpostout

In the case of US addresses, the mkpostout is the first five digits of the zip code

So with a zip of 28401-4273 , the 28401 would be in the mkpostout

For a UK Postcode “KT22 8DY”, the mAPI populates mkpostout with “KT22”. Which may also be referred to as the  ‘outward part’

Mkpostin

In the case of US addresses, the mkpostout is the last digits of the 9 digit zip code, if just a 5 digit zip the mkpostin would be blank.

So with a zip of 28401-4273 , the 4273 would be in the mkpostin

For a UK Postcode “KT22 8DY”, the mAPI populates mkpostin with “8DY”, or the inward part of the full valid postcode

Optional fields

mkTelAreaCode
mkTelLocalNumber
mkFaxAreaCode
mkFaxLocalNumber

These won’t be available unless you mapped a telephone or fax when you first set up your generatekeys task.  You can ensure these fields are populated by checking the advanced settings under generate keys.

In the case of US phone and fax numbers

For 914 – 600-7243

The 914  would be the area code part

The 6007243 would be the local number part

In the case of UK phone numbers

For “01372 225 904” or “01372225904” the mAPI populates this with “01372”.

For “01372 225 904” or “01372225904” the mAPI populates this with “225904”.

 

New Company Fields added with 2.3.1:

For consistency with the Real Time Processing for SQL Server deployment new keys were added, in the output tab of generatekeys if you click advanced, then you can pick additional key columns to output.

Lets start with company fields - for this exercise we'll still use

The Palmer Air Charters service company

PAC inc

 

mkNormalizedOrg1

Unlike the phonetic of the company(mkOrgName1) - this is instead the normalized version - think of this as a parsed out version of mkNormalizedOrganization

So for 'The Palmer Air Charters service company' , it will be 'Palmer' - as we're ignoring 'The' as a noise word so 'Palmer is the first significant word

For 'PAC inc'  it will be 'PAC' as we recognized that as an acronym

mkNormalizedOrg2

This is the Second significant word from mkNormalizedOrganization

So for 'The Palmer Air Charters service company' , it will be 'Air' - as we're ignoring 'The' as a noise word so 'Air' is the Second significant word

For 'PAC inc'  it will be blank as we recognize 'inc' as a noise word

 

mkNormalizedOrg3

This is the Third significant word from mkNormalizedOrganization

So for 'The Palmer Air Charters service company' , it will be 'Charters' - as we're ignoring 'The' as a noise word so 'Charters' is the Third significant word

For 'PAC inc'  it will be blank as there was just the initial acronym

mkOrgAcronym

This is the acronym we've derived off the first 2 or 3 significant words in the company name

For both examples the acronym will be 'PAC'   as we carried through the first letters from 'Palmer Air Charters' and recognized PAC as an acronym already in the other record, this may make it easier to line up some outliers when the address isn't as reliable because of vanity names for cities for example throwing off the addresskey.

If the company is a single word, an acronym will not be generated for it as a single letter, if a company has more than 3 significant words, an acronym is only generated off the first three.

 

mkAlphabeticOrganizationWordA

This is the first alphabetic Organization word- this may allow you to identify transpositions in company names when the order of the words isn't as consistent for some examples.

for our 'The Palmer Air Charters service company' example it actually is Charters, as air wasn't considered significant as it was less than 4 characters, for 'PAC inc' it was blank as the acronym isn't considered when populating this field.

mkAlphabeticOrganizationWordZ

This is the Last alphabetic Organization word- this may allow you to identify transpositions in company names when the order of the words isn't as consistent for some examples.

For our 'The Palmer Air Charters service company' example it actually is Palmer, as Service wasn't considered significant as it was a Noise type word lookup in the names and words, for 'PAC inc' it was blank as the acronym isn't considered when populating this field as it also falls under the 4 character minimum.

 

 

New Name Fields added with 2.3.1:

 

mkFirstNameIni

The initials of the normalized first name

For  'Mr. W. R Dayton Jr' , as well as for 

Firstname: Bill
Lastname: Deighton

it would be 'W' , as we normalize Bill to Will then take the initial of the normalized version.

mkNormalizedFirstName

This is the normalized first name

For  'Mr. W. R Dayton Jr' , it would be 'W'

For 

Firstname: Bill
Lastname: Deighton

It would be  'Will' , as Bill normalized to Will

 

 

mkNormalizedMiddleName

normalized second name, or mid

For  'Mr. W. R Dayton Jr' , it would be 'R'

For 

Firstname: Bill
Lastname: Deighton

it would be blank

 

mkNormalizedLastName

normalized last name

or  'Mr. W. R Dayton Jr' , it would be 'Dayton'

For 

Firstname: Bill
Lastname: Deighton

it should be 'Deighton'

In this case those examples don't line up, unlike the mkName1, as its using the last name as is instead of a phonetic version.

 

New Other Fields added with 2.3.1:

 

mkEmail

email address with casing standardized

mkEmailUser

user name portion of the email

mkEmailDomain

domain portion of the email