Fact 6 - There is no such thing as "Address Standardization"!
Things you absolutely must know about Address Correction & Standardization
It will fundamentally change how you think about customer Data Matching
There is a commonly held belief regarding address standardization, and it goes like this:
“If I run data through address standardization software such as USPS CASS Certified address correction, then the addresses are verified, standardized and correct. And by doing so, the data can absolutely be relied on for data matching.” This is WRONG!
Standardize (standardized, standardizing, standardization): to make things the same; to make standard or uniform; to cause to be without variations or irregularities.
This is probably one of the most pervasive and damaging beliefs in data quality and data matching! Nearly every textbook, tech article, YouTube video and conventional data quality vendor will tell you the same thing: to create a properly formed and consistent matchkey for the address, you must run your addresses through address correction software to correct, append, transform and standardize your data before creating a matchkey.
This notion is so prevalent that even many data quality, data integration and analytics software vendors advise that matchkeys should be built ‘only’ from USPS-standardized and corrected addresses, fully parsed into their individual components (Street Number, Street Predirectional, Street Name, Street Suffix, Street Postdirectional, PO Box, Street Secondary, City, State, ZIP), rather than from the input address. Unfortunately, developers, analysts and DBAs build data quality routines ‘expecting’ address correction to fully prep their data for matching.
Why? Because as stated earlier, matchkeys are intolerant to any deviation from one input to another.
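To make that intolerance concrete, here is a minimal Python sketch. The `naive_matchkey` function is hypothetical: it only upper-cases the input and collapses whitespace, with no parsing or abbreviation handling, and the second spelling of the address is an invented variant of the same location.

```python
def naive_matchkey(address: str) -> str:
    """Hypothetical exact-match key: upper-case and collapse whitespace only."""
    return " ".join(address.upper().split())


# Two renderings of the same apartment (the second is an invented variant)
a = "3500 N Capital of Texas Hwy, Apt 121, Austin TX 78746"
b = "3500 North Capital of Texas Highway #121, Austin, TX 78746"

key_a = naive_matchkey(a)
key_b = naive_matchkey(b)

print(key_a)
print(key_b)
# An exact key compares every character, so N vs NORTH, HWY vs HIGHWAY,
# "APT 121" vs "#121" and an extra comma all break the match.
print("match" if key_a == key_b else "no match")  # prints "no match"
```

A human reads these as the same apartment; an exact key does not, which is why vendors push for a parsed, standardized form before the key is built.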
For example, the address 3500 N CAPITAL OF TEXAS HWY, APT 121, AUSTIN TX 78746-3378 must be parsed and standardized into those individual components before a matchkey is built.
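As an illustration only, the sketch below holds the ten components listed above in a hypothetical `ParsedAddress` dataclass and builds a component-based matchkey from them. The breakdown of the example address (N as predirectional, CAPITAL OF TEXAS as street name, HWY as suffix) is an assumed parse written by hand, not output from any USPS or CASS product.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class ParsedAddress:
    """Hypothetical container for the ten parsed address components."""
    street_number: str = ""
    predirectional: str = ""
    street_name: str = ""
    suffix: str = ""
    postdirectional: str = ""
    po_box: str = ""
    secondary: str = ""
    city: str = ""
    state: str = ""
    zip_code: str = ""

    def matchkey(self) -> str:
        # Join every field with "|" so empty components still hold their slot
        return "|".join(astuple(self))


# Assumed parse of the example address (illustrative, not USPS output)
example = ParsedAddress(
    street_number="3500",
    predirectional="N",
    street_name="CAPITAL OF TEXAS",
    suffix="HWY",
    secondary="APT 121",
    city="AUSTIN",
    state="TX",
    zip_code="78746-3378",
)

print(example.matchkey())
# 3500|N|CAPITAL OF TEXAS|HWY|||APT 121|AUSTIN|TX|78746-3378
```

The key is only as stable as every one of those ten fields: a different city name or a different ZIP+4 for the same location produces a different key.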
Contrary to the term “address standardization,” there is no such thing as achieving standardization with address data. This is a bold statement, and it will probably leave even the most entrenched data quality practitioners scratching their heads, or even disagreeing.
But if you doubt it, consider this...
My wife and I live in the same house, but according to the State of Texas, we live in different cities.
Yes, our driver licenses say we live in different cities. That's because the document I brought to prove residency for my driver license (an electric bill) listed my address as Austin, while the document my wife brought (a water bill) listed Lakeway.
Think about that: two municipal utility companies providing services to the same home disagree about which city the home is in. Now the State of Texas, and every record that references our driver licenses (voter registration, passports, financial and lending records, etc.), considers my wife and me to be in two different cities.
These are not rare, one-off cases. Issues like these are prevalent in every database, and they are what make Name Matching so complex in the 21st century.
Looking up 200 Park Ave, New York, NY 10166 in the postal database returns a staggering 99 possible address matches, which happens to be the maximum number the USPS site will return. Each of these 99 addresses has a different ZIP+4 and different mailing industry codes.
The 7800 Beverly Blvd example from earlier is the address of CBS Studios, and its city could correctly be given as LA, Los Angeles, Miracle Mile, or Wilshire La Brea.
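Here is a small sketch of what that does to a matchkey that includes the full ZIP+4. The street line and 5-digit ZIP come from the example above, but the +4 add-on values are invented for illustration and are not the real codes for 200 Park Ave; both key functions are hypothetical.

```python
# Hypothetical records for three tenants of one building. Only the street
# line and ZIP5 come from the text; the +4 add-ons are invented.
records = [
    ("200 PARK AVE", "NEW YORK", "NY", "10166-0001"),
    ("200 PARK AVE", "NEW YORK", "NY", "10166-0002"),
    ("200 PARK AVE", "NEW YORK", "NY", "10166-0003"),
]


def key_with_zip4(street: str, city: str, state: str, zip9: str) -> str:
    """Hypothetical matchkey that keeps the full ZIP+4."""
    return "|".join((street, city, state, zip9))


def key_with_zip5(street: str, city: str, state: str, zip9: str) -> str:
    """Hypothetical matchkey that keeps only the 5-digit ZIP."""
    return "|".join((street, city, state, zip9[:5]))


zip9_keys = {key_with_zip4(*r) for r in records}
zip5_keys = {key_with_zip5(*r) for r in records}

print(len(zip9_keys))  # 3: one street address, three distinct keys
print(len(zip5_keys))  # 1
```

Truncating to the 5-digit ZIP collapses the three records, but whether that is correct depends on whether they describe the same entity. Either way, there is no single "standardized" string for this building.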
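The same effect shows up with city names. In this sketch the street line and the four city names come from the text; the key format is a hypothetical illustration, and no ZIP is included because none is given here.

```python
# Street line and city names from the text; the key format is hypothetical.
street = "7800 BEVERLY BLVD"
state = "CA"
accepted_city_names = ["LA", "LOS ANGELES", "MIRACLE MILE", "WILSHIRE LA BREA"]

# Build one key per acceptable city name for the same building
keys = {f"{street}|{city}|{state}" for city in accepted_city_names}

for key in sorted(keys):
    print(key)

print(len(keys))  # 4: one building, four different keys
```

Each of these could come back from correction software as a valid city for the address, so two records for the same building can be individually "correct" and still fail to match on an exact key.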
What you have to realize is that address accuracy is so important to the United States Postal Service that the USPS developed a method, called the Coding Accuracy Support System (CASS), to evaluate the accuracy of commercially available address correction software.
Software that meets the USPS standard is deemed CASS Certified and can validate addresses down to the delivery point and verify that an address is deliverable.
Pay attention to that statement: CASS Certified software can “validate addresses down to the delivery point and verify that an address is deliverable.” It says nothing about standardizing an address or making certain it comes out the same every time.
The USPS's primary concern is ‘not’ standardization! It cares about one thing, and that one thing is deliverability.
The USPS designed ZIP Codes to increase mail delivery efficiency, not data quality. Here are a few tips.
With USPS CASS address validation, the city name is whatever USPS says it is, even if that city name isn't the city in which your property is actually located.
Nine-digit ZIP+4 codes do not uniquely identify an address. The ZIP+4 represents a postal delivery area. A ZIP+4 can cover a single room or floor, a group of 5-10 houses, a building, a company, a military base, a military unit or command, and sometimes even a ship.
USPS CASS address standardization will only validate addresses that receive mail. If the postal service doesn't service an area directly, it won't be in the database (this is common in rural areas).
If a physical address does not receive mail, it won't be registered in the USPS database (college dorms, for example). Cemeteries don't receive mail either. And PO Boxes work the other way around: people get their mail there, but they don't live in them.
CASS address standardization with DPV (Delivery Point Validation) won't add a missing street number, but it will tell you whether mail can be delivered to the address.
It won't add a suite number, or even tell you whether a suite number is the correct one (though it will verify that the suite exists).
A ZIP Code isn't a ‘boundary’ but rather a collection of lines representing delivery routes, which define where the delivery trucks go.
And if you're wondering: yes, ZIP Codes can and do cross city, town and county boundaries, and even state lines.
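To illustrate the first of those tips, the sketch below groups records by ZIP+4. Every address and code in it is invented for illustration; it simply shows several distinct street addresses sharing one ZIP+4, as can happen when a ZIP+4 covers a block of houses or a whole building.

```python
from collections import defaultdict

# Invented records: three distinct addresses share one ZIP+4.
records = [
    ("101 ELM ST", "12345-6789"),
    ("103 ELM ST", "12345-6789"),
    ("105 ELM ST", "12345-6789"),
    ("200 OAK AVE", "12345-1111"),
]

# Group street addresses by their ZIP+4 delivery area
by_zip4 = defaultdict(list)
for street, zip9 in records:
    by_zip4[zip9].append(street)

for zip9, streets in by_zip4.items():
    print(zip9, streets)
```

Treating the ZIP+4 as if it identified an address would merge the three Elm St records into one. In this sketch it behaves as a delivery-area attribute, which is what the USPS designed it to be.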
These are just top-level issues, and we haven't even touched on ZIP+4 accuracy, mail stops, suite numbers, commercial mail receiving agencies (CMRAs), hyphenated addresses, geocoding, college addresses, military APOs and FPOs, or government, foreign and diplomatic addresses.
To be clear, address validation is a necessary component of data quality, but you have to understand its limitations, especially as they relate to data matching.
An Important Warning about FRAUD
People who set out to defraud corporations, financial institutions and government agencies understand these nuances of address data, and they exploit them to make fraudulent claims and collect fraudulent payments. You CANNOT rely on “address standardization” as a form of data matching.