Recently in Matching Category

Tips & Tricks for Global MatchUp Matching Strategies


by Tim Sidor, Data Quality Analyst

In the past we've discussed implementing different matching strategies based on how you would like your records grouped. For example, by 'Address' or by 'Name and Address'? The former would match 'John' and 'Mary Smith' at the same household, whereas the latter would identify them as unique entities.
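The difference between the two strategies can be sketched in a few lines of Python. This is only an illustration of exact-match grouping on hypothetical records and field names; it does not use MatchUp or its matchcode format.

```python
from collections import defaultdict

# Hypothetical records; the field names are illustrative, not MatchUp's schema.
records = [
    {"id": 1, "name": "John Smith", "address": "12 Oak St", "postal": "10001"},
    {"id": 2, "name": "Mary Smith", "address": "12 Oak St", "postal": "10001"},
    {"id": 3, "name": "John Smith", "address": "12 Oak St", "postal": "10001"},
]


def group_by(records, fields):
    """Group record ids by an exact, case-insensitive key built from `fields`."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[f].strip().lower() for f in fields)
        groups[key].append(rec["id"])
    return sorted(groups.values())


# 'Address' strategy: everyone at the household lands in one group.
by_address = group_by(records, ["address", "postal"])

# 'Name and Address' strategy: John and Mary become separate entities.
by_name_and_address = group_by(records, ["name", "address", "postal"])

print(by_address)
print(by_name_and_address)
```

With these sample records, the 'Address' key yields a single household group, while the 'Name and Address' key keeps the two John Smith records together and leaves Mary Smith on her own.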

For Global processing, even after you select a general strategy, 'Address' for example, you may still need to know the expected address formats of the source data being compared, and reevaluate the logic accordingly.


At first glance, a 'Global Address' matchcode might appear to be a safe, accurate matching strategy...


But knowing that some countries don't have a reliable Postal Code, which is usually the component MatchUp uses for efficient 'neighborhooding' (also known as 'grouping' or 'clustering'), how can we accurately match these records? Simply removing the Postal Code component would incorrectly match similar addresses that were in different parts of the country.


US & Canada users are so used to using the reliable Postal Code that we rarely use City (Locality). But for processing countries without Postal Codes, or databases with multiple countries, adding a Locality can bring back accuracy and efficient clustering.


Configuring this matchcode to allow 'blank matching' on the Postal Code will accurately match records for most worldwide addresses and is a default distributed matchcode.
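As a rough sketch of the idea, the comparison below requires Street and Locality to agree and lets the Postal Code match when it is blank. The function names, the sample data, and the exact blank rule (a blank on either side counts as a match) are assumptions made for illustration; MatchUp's own blank matching options may behave differently.

```python
def component_matches(a, b, allow_blank=False):
    """Exact, case-insensitive comparison of a single component.

    Assumption for this sketch: with allow_blank=True a blank value on
    either side counts as a match; otherwise a blank never matches.
    """
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return allow_blank
    return a == b


def same_address(r1, r2):
    """Street and Locality must agree; Postal Code is allowed to be blank."""
    return (
        component_matches(r1["street"], r2["street"])
        and component_matches(r1["locality"], r2["locality"])
        and component_matches(r1["postal"], r2["postal"], allow_blank=True)
    )


# Hypothetical records from a country without reliable Postal Codes.
a = {"street": "5 Main Street", "locality": "Northtown", "postal": ""}
b = {"street": "5 Main Street", "locality": "Northtown", "postal": ""}
c = {"street": "5 Main Street", "locality": "Southtown", "postal": ""}
d = {"street": "5 Main Street", "locality": "Northtown", "postal": "1000"}
e = {"street": "5 Main Street", "locality": "Northtown", "postal": "2000"}

print(same_address(a, b))  # same street and locality, no postal codes
print(same_address(a, c))  # same street, different locality
print(same_address(a, d))  # one postal code blank
print(same_address(d, e))  # both postal codes present but different
```

Here the Locality keeps the two 'Main Street' addresses in different towns apart, which is what dropping the Postal Code component alone would fail to do.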


However, many countries also distinguish addresses with a different hierarchy structure, which may include a combination of Dependent Locality, Administrative Area and/or Sub Administrative Area. Or they use a Dependent Thoroughfare to distinguish the delivery address. So knowing the primary data types used in a country's standard address can help you decide which components to include in your matchcode.


How do I know how to construct a good matchcode for specific region processing? Our 'Global Address, Locality' matchcode is a good basic strategy, but Melissa's resources, such as the Global Verification documentation and/or actual record processing and parsing, can help you determine the components needed to construct a matchcode that produces accurate results.

Matchcode Caveats - How to Solve Them


By Tim Sidor, Data Quality Analyst

"The more advanced I make my matchcode, the more duplicates I'll identify."

This is an assumption that many new MatchUp users make. True or false, it often leads to false dupes, no dupes, or a process that seems to run forever.


Adding more columns of conditions can be looked at as 'just adding more ways to return more duplicates.' These additional criteria may or may not result in accurate groups, as you may have actually loosened your intended criteria. On the flip side, adding matchcode components may result in fewer duplicates, as you may have tightened your rules too much. Applying fuzzy algorithms (without thorough testing) will lead to a slower process, but may not return a significant number of additional matches (diminishing returns of accuracy/speed vs complexity/inefficiency).
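One way to picture this is to model a matchcode as a list of columns, where every component in a column must agree (AND) and any single column is enough for a match (OR). The sketch below uses that simplified model with hypothetical records and exact comparisons only; it is not MatchUp's implementation.

```python
def matching_columns(r1, r2, columns):
    """Return the 1-based numbers of the columns whose components all agree."""
    return [
        number
        for number, components in enumerate(columns, start=1)
        if all(r1[c] == r2[c] for c in components)
    ]


# Hypothetical, already-normalized records.
john = {"name": "john smith", "address": "12 oak st", "phone": "555-0100"}
mary = {"name": "mary smith", "address": "12 oak st", "phone": "555-0100"}
john2 = {"name": "john smith", "address": "12 oak st", "phone": "555-0199"}

strict = [("name", "address")]                    # one column
extra_column = [("name", "address"), ("phone",)]  # adds a second way to match
extra_component = [("name", "address", "phone")]  # adds a condition to column 1

# An extra column loosens the rules: John and Mary now match on phone alone.
print(matching_columns(john, mary, strict))
print(matching_columns(john, mary, extra_column))

# An extra component tightens the rules: the two John records stop matching.
print(matching_columns(john, john2, strict))
print(matching_columns(john, john2, extra_component))
```

The same change in "complexity" moves the results in opposite directions depending on whether it is added as a new column or as a new component within an existing column.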

"What can I do?"

When learning to use MatchUp, we always suggest starting with the basics - a simple default matchcode that we distribute, and a small data set. This allows you to quickly run and analyze how the matchcode performed against the data. Then make small changes - tweaking the matchcode and repeating the process or running a slightly altered data set with a few variations in format or data values. Eventually, you will migrate towards your end goal of incorporating your business rules into the matching strategy (the matchcode) with your production data.


By following any of the above disciplined paths, you will more quickly arrive at your goal and with a better understanding of how to create the best matchcode for your environment. No diagonal shortcuts!

"OK, I already went straight to 'Production Data and a Custom Matchcode,' what do I do?"

First, evaluate the Result Codes and Dupe Group output properties. In addition to telling you the output disposition of a record (unique, group winner, duplicate, etc.), the Result Codes will tell you which matchcode combination (which column of checkmarks in the matchcode) caused the record to match in a particular Dupe Group. If you find that a particular column never finds a match, or never finds a match that another column hasn't already found, consider removing it. This may also prompt you to remove from the matchcode any duplicated component types that were used with alternate settings. After re-evaluating the remaining components and concluding that they still represent a valid strategy, you may find that your process returns more accurate results AND runs much faster.
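The column review described above can be approximated with a small tally. The sketch assumes you have already translated the Result Codes of each matched record into the set of matchcode column numbers that matched; that input format, the function name, and the sample data are hypothetical and are not part of MatchUp's output.

```python
from collections import Counter


def column_usage(hits, n_columns):
    """Tally, per column, (total matches, matches found by that column alone).

    `hits` holds one set of column numbers per matched record: the
    columns that evaluated true for that match (hypothetical input).
    """
    total = Counter()
    sole = Counter()
    for cols in hits:
        for col in cols:
            total[col] += 1
        if len(cols) == 1:
            sole[next(iter(cols))] += 1
    return {col: (total[col], sole[col]) for col in range(1, n_columns + 1)}


# Hypothetical results for a three-column matchcode.
hits = [{1}, {1, 2}, {1}, {1, 2}]
usage = column_usage(hits, n_columns=3)

# Columns that never matched, or never matched without another column.
removal_candidates = [col for col, (_, alone) in usage.items() if alone == 0]

print(usage)
print(removal_candidates)
```

In this sample, column 3 never matches and column 2 only matches records that column 1 has already caught, so both are candidates for review before removal.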

"Can my process run faster?"

Yes. MatchUp uses an advanced clustering method to find duplicates, and creating advanced matchcodes prevents efficient clustering, thus slowing the process down. For example, we had a customer move a matchcode component with a fuzzy setting from the second position to below another component that used an exact setting (in all columns). Their processing time dropped from 47 hours to under 4 with this simple change. Expanding on the diminishing returns concept: if an exact matchcode returns 20,000 duplicates from a 1,000,000 record set, is changing all components to a fuzzy algorithm to return 20,003 duplicates worth a process that takes 4x as long to run?
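To see why clustering on an exact component matters, the sketch below counts candidate comparisons with and without grouping records by an exact key first, then works out the relative gain in the 20,000 versus 20,003 example. It is a generic illustration of clustering (blocking) on synthetic data, not a description of MatchUp's internal method.

```python
from collections import Counter


def pair_count(n):
    """Number of distinct record pairs among n records."""
    return n * (n - 1) // 2


def comparisons_with_clustering(records, key_field):
    """Pairs compared when only records sharing an exact key are compared."""
    cluster_sizes = Counter(rec[key_field] for rec in records)
    return sum(pair_count(size) for size in cluster_sizes.values())


# Synthetic data: 1,000 records spread evenly over 100 postal codes.
records = [{"postal": f"{i % 100:05d}"} for i in range(1000)]

all_pairs = pair_count(len(records))
clustered_pairs = comparisons_with_clustering(records, "postal")

# Diminishing returns from the example in the text: 3 extra duplicates.
exact_dupes, fuzzy_dupes = 20_000, 20_003
relative_gain = (fuzzy_dupes - exact_dupes) / exact_dupes

print(all_pairs, clustered_pairs)
print(f"{relative_gain:.3%}")
```

On this synthetic set, clustering on the exact key cuts the candidate comparisons from 499,500 to 4,500, while the fuzzy run in the example adds only 0.015% more duplicates for four times the run time.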

"What about that Result Code that tells me a specific combination returned a false dupe?" or "Why did these records not match under my rules?"

For details on how a matchcode relates to your data, click here for easy guidance to understanding your matchcode rules, and remember, test thoroughly!

For more info, go to:


Matchbook is a SaaS solution that provides accurate and complete control over your data stewardship processes.

With a fully functional user dashboard, Matchbook enables data stewards to:

         View queue statistics to see how much data has been processed and the status of their matches.

         See at-a-glance how many of the processed company records were low confidence or had no matches at all and then compare that to low confidence match patterns.

         View process completions, API usage to date, and the number of matches by confidence code.

         Monitor and update data.


Integration and ETL with Matchbook Services

With Matchbook Services, you can import and export with a variety of sources with no technical development. However, you can also integrate Matchbook Services directly into your architecture for faster matching, enriching, and monitoring functionality. The Matchbook Services Suite manages ETL processes and integration work, so you can get back to reaping the benefits of perfectly mastered data.


When You Need Data Management Expertise...

We offer a range of services for data stewards and IT teams:


1. Business Analytics - Companies that master business intelligence understand that effective Information Management solutions can generate significant cost savings, but the real value can come in improved decision making. Matchbook Services brings unparalleled expertise to your team by demonstrating excellence in architecture, design, reporting, and collaboration.


2. Data Management - A comprehensive data management strategy that includes capabilities in profiling, cleansing, mastering, and monitoring your data. To improve data quality, Matchbook Services focuses on clear metadata and data governance strategies by providing ongoing attention and enforcement of procedural standards.


3. Master Data Deployment - Identify a single version of truth in your data. Matchbook Services provides the highest level of MDM technical expertise when developing and deploying your MDM strategies. We also support credible and consistent data through data quality, data enrichment, and ETL processes.


4. Data Warehousing - In addition to focusing on cost reduction, leading companies treat information as a strategic asset - and plan accordingly. Backed by a clear vision and strong executive sponsorship, Matchbook Services helps clients use their valuable data resources through effective use of the right technologies.

With comprehensive and easily deployable data steward functionality, Matchbook Services gives you the tools to unlock the true value of your data. Put your data to work with the Matchbook Services Suite.