Fact #3 - #DataGymnastics and #RegExHell Suck

Fact #2 - Conventional customer data matching processes require labor-intensive #DataGymnastics and #RegExHell



In Fact #4 you'll learn that 'conventional' matching algorithms are too limited in their inherent ability to locate matches accurately, so they rely on a ‘matchkey’. And matchkeys rely on clean, extracted, parsed, transformed, normalized, standardized, and enriched data.

To create a matchkey, conventional matching processes require analytics-ready data. What does analytics-ready mean?

It means you must have a data structure in place with standardized key fields and normalized values. In other words, you MUST extract, transform, standardize, clean, and enrich data BEFORE you match the data. For instance, in a data type like an ‘Address’, the Premise Number, Street Name, Suite/Apt, City, State, and Zip need to be extracted into individual fields and hold correct, standardized values (e.g., Texas = TX and Avenue = Ave).
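To make the address example concrete, here is a minimal sketch of that extract-and-standardize step in Python. It assumes one specific input layout (number, street, suffix, optional unit, city, state, zip, separated by commas), and the lookup tables, the pattern, and the `parse_address` helper are all hypothetical illustrations rather than any product's actual rules.

```python
import re

# Hypothetical lookup tables; real ones would hold many more entries.
STATE_ABBREVIATIONS = {"texas": "TX", "california": "CA"}
STREET_SUFFIXES = {"avenue": "Ave", "street": "St", "boulevard": "Blvd"}

# Assumed layout: "<number> <street> <suffix>[, <unit>], <city>, <state> <zip>"
ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<premise>\d+)\s+"
    r"(?P<street>.+?)\s+"
    r"(?P<suffix>[A-Za-z]+)\.?\s*"
    r"(?:,\s*(?P<unit>(?:Suite|Ste|Apt|Unit)\.?\s*\w+)\s*)?"
    r",\s*(?P<city>[^,]+?)\s*"
    r",\s*(?P<state>[A-Za-z ]+?)\s+"
    r"(?P<zip>\d{5})(?:-\d{4})?\s*$",
    re.IGNORECASE,
)


def parse_address(raw):
    """Split a one-line address into standardized fields, or return None."""
    match = ADDRESS_PATTERN.match(raw)
    if match is None:
        # Any layout the pattern did not anticipate falls through here.
        return None
    fields = match.groupdict()
    suffix = fields["suffix"]
    state = fields["state"].strip()
    return {
        "premise": fields["premise"],
        "street": fields["street"].title(),
        # Map known long forms to their abbreviation; otherwise keep as typed.
        "suffix": STREET_SUFFIXES.get(suffix.lower(), suffix.title()),
        "unit": fields["unit"] or "",
        "city": fields["city"].title(),
        "state": STATE_ABBREVIATIONS.get(state.lower(), state.upper()),
        "zip": fields["zip"],
    }


record = parse_address("123 main avenue, Suite 4, Austin, Texas 78701")
```

Even this small pattern only handles the one layout it was written for. An address typed in a different order, or without commas, returns `None` and needs yet another rule, which is how the regular expressions pile up.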

For conventional processes and solutions, analytics-ready data is critical to creating consistent matchkeys and building MatchCodes that return good results. A DBA/Data Analyst is required to create as much consistency and uniformity in each column of data as humanly possible. It’s a time-consuming process, and it looks like this...





There is no question that a MatchCode built on properly extracted, transformed, and standardized data will find many correct matches. But the fact is, it’s not reasonable to rely on extracted, transformed, and standardized data. No amount of preparation can overcome all of the errors and variations in data (see Fact #1 - Customer Data is Never Perfect), and as a result, 'conventional matchkeys' will fail and will miss a lot of good matches.
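The failure described above can be sketched in a few lines of Python. The matchkey recipe here (first four letters of the last name, premise number, first five letters of the street, and zip) is a hypothetical one chosen for illustration, and `make_matchkey` and the sample records are assumptions, not any vendor's actual MatchCode definition.

```python
def make_matchkey(record):
    """Build a hypothetical matchkey from already standardized fields."""
    last = record["last_name"].upper().replace(" ", "")[:4]
    street = record["street"].upper().replace(" ", "")[:5]
    return f"{last}|{record['premise']}|{street}|{record['zip']}"


# Invented sample records: the second differs only in letter case,
# the third has two letters transposed in the last name.
clean = {"last_name": "Johnson", "premise": "123", "street": "Main", "zip": "78701"}
same_clean = {"last_name": "JOHNSON", "premise": "123", "street": "main", "zip": "78701"}
typo = {"last_name": "Jhonson", "premise": "123", "street": "Main", "zip": "78701"}

key_clean = make_matchkey(clean)
key_same = make_matchkey(same_clean)
key_typo = make_matchkey(typo)

# Case differences are normalized away, so these keys are equal.
found_match = key_clean == key_same
# The transposed letters change the key, so an exact comparison misses it.
missed_match = key_clean != key_typo
```

Because the keys are compared for exact equality, a single transposition in one field is enough to hide a record that a person would recognize as the same customer.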

The reality is, conventional matching processes require labor-intensive #DataGymnastics and #RegExHell. This fact was recently called out in an article published in Forbes titled “Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task”. Data scientists spend around 80% of their time preparing and managing data for analysis, and 76% of data scientists view data preparation as the least enjoyable part of their work.

The conventional approach doesn't represent the realities of data, it isn't efficient, and it doesn’t work at scale.




