
Tips & Tricks for Global MatchUp Matching Strategies


By Tim Sidor, Data Quality Analyst

In the past we've discussed implementing different matching strategies based on how you would like your records grouped: for example, by "Address" or by "Name and Address". The former would match 'John Smith' and 'Mary Smith' at the same household, whereas the latter would identify them as unique entities.
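
As a rough illustration of how the two strategies group records differently, here is a minimal sketch in plain Python. It does not use the MatchUp API; the field names, sample records, and the `normalize` and `group_by` helpers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records; field names are illustrative only.
records = [
    {"name": "John Smith", "address": "12 Main St", "postal": "92688"},
    {"name": "Mary Smith", "address": "12 Main St", "postal": "92688"},
    {"name": "John Smith", "address": "12 Main St", "postal": "92688"},
]

def normalize(value):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(value.lower().split())

def group_by(records, fields):
    """Group record indexes by the normalized values of the chosen fields."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        key = tuple(normalize(record[field]) for field in fields)
        groups[key].append(index)
    return list(groups.values())

# "Address" strategy: everyone at the same address lands in one group.
by_address = group_by(records, ["address", "postal"])

# "Name and Address" strategy: John and Mary become separate entities.
by_name_and_address = group_by(records, ["name", "address", "postal"])
```

With these sample records, the address-only key puts all three records in one household group, while adding the name splits Mary into her own group.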


For Global processing, even after you have selected a general strategy ('Address', for example), you may still need to know the expected address formats of the source data being compared, and reevaluate the matching logic accordingly.

 

At first glance, a 'Global Address' matchcode might appear to be a safe, accurate matching strategy...

[Image: matchupthumbnail.png]

But some countries don't have a reliable Postal Code, which is usually the component MatchUp uses for efficient 'neighborhooding' (also known as 'grouping' or 'clustering'). How, then, can we accurately match these records? Simply removing the Postal Code component would incorrectly match similar addresses in different parts of the country.

 

US & Canada users are so used to relying on the Postal Code that we rarely use City (Locality). But when processing countries without Postal Codes, or databases that span multiple countries, adding a Locality component can bring back accuracy and efficient clustering.
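
To make the clustering idea concrete, the following sketch compares only records that share a grouping key, a general technique often called blocking. It is an illustration in plain Python, not MatchUp's internal algorithm; the records, `candidate_pairs`, and `locality_key` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(records, block_key):
    """Return index pairs to compare, restricted to records sharing a block key."""
    blocks = defaultdict(list)
    for index, record in enumerate(records):
        blocks[block_key(record)].append(index)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs

def locality_key(record):
    """Use the normalized Locality as the grouping key (assumed to be populated)."""
    return record["locality"].strip().lower()

# Hypothetical records with no Postal Code available.
records = [
    {"address": "5 High Street", "locality": "Springfield"},
    {"address": "5 High St", "locality": "springfield"},
    {"address": "5 High Street", "locality": "Shelbyville"},
    {"address": "9 Low Road", "locality": "Shelbyville"},
]

# Without any grouping key, every record is compared with every other record.
all_pairs = candidate_pairs(records, lambda record: "")

# Grouping by Locality compares only records in the same city.
blocked_pairs = candidate_pairs(records, locality_key)
```

Here the ungrouped comparison produces six candidate pairs, while grouping by Locality leaves two, and the identical street address in a different city is never compared.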

[Image: matchupthumbnail2.png]

Configuring this matchcode to allow 'blank matching' on the Postal Code will accurately match records for most worldwide addresses. It is distributed as a default matchcode.
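
One way to picture 'blank matching' is as a comparison rule in which an empty Postal Code on either record does not prevent a match. The sketch below is plain Python under that assumption, not MatchUp's actual matchcode configuration; the function names and sample records are hypothetical, and real matchcodes offer more options than this single flag.

```python
def normalize(value):
    """Lowercase and collapse whitespace; treat None as an empty string."""
    return " ".join((value or "").lower().split())

def component_matches(a, b, allow_blank=False):
    """Compare one component; optionally let a blank value on either side match."""
    a, b = normalize(a), normalize(b)
    if allow_blank and (a == "" or b == ""):
        return True
    # Assumption: without blank matching, both values must be present and equal.
    return a != "" and a == b

def same_address(r1, r2):
    """Address and Locality must agree; Postal Code may be blank on either record."""
    return (
        component_matches(r1["address"], r2["address"])
        and component_matches(r1["locality"], r2["locality"])
        and component_matches(r1["postal"], r2["postal"], allow_blank=True)
    )

# Hypothetical records.
a = {"address": "5 High Street", "locality": "Springfield", "postal": ""}
b = {"address": "5 High Street", "locality": "Springfield", "postal": "12345"}
c = {"address": "5 High Street", "locality": "Shelbyville", "postal": ""}
d = {"address": "5 High Street", "locality": "Springfield", "postal": "99999"}

blank_postal_match = same_address(a, b)       # blank Postal Code still matches
different_city_match = same_address(a, c)     # Locality keeps these apart
conflicting_postal_match = same_address(b, d) # two different Postal Codes do not match
```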

 

However, many countries also distinguish addresses using a different hierarchy, which may include a combination of Dependent Locality, Administrative Area and/or Sub Administrative Area, or they use a Dependent Thoroughfare to distinguish the delivery address. Knowing the primary data types used in a country's standard address can therefore help you decide which components to include in your matchcode.

[Image: matchupthumbnail3.png]

How do I know how to construct a good matchcode for processing a specific region? Our 'Global Address, Locality' matchcode is a good basic strategy, but Melissa's resources, such as the Global Verification documentation and/or actual record processing and parsing, can help you determine the components needed to construct a matchcode that produces accurate results.

Discover Data Quality Issues Before they Arise


By Taky Djarou, Data Quality Analyst


Melissa has released its new data Profiler API. The Profiler Object offers a unique approach to profiling your data, combining years of contact data quality experience, the power of many Melissa Objects, and data source tables to help you dig deeper into your data and return hundreds of properties about the input table, columns and individual values.

For example, many existing Profilers will allow the user to set a RegEx to capture an email pattern. The Melissa Profiler offers that function and also checks the syntax and the domain, and whether the address is disposable, has a spammy reputation, or is invalid. It returns counts that reflect all of the above.
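
To show the difference between a bare RegEx check and a profile that returns counts per category, here is a small sketch in plain Python. It is not the Profiler API: the pattern is deliberately simplistic, the `DISPOSABLE_DOMAINS` set is a hypothetical stand-in for a real reference list, and reputation and deliverability checks are left out because they need external data.

```python
import re
from collections import Counter

# A deliberately simple syntax pattern; real email validation is more involved.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical stand-in for a maintained list of disposable email domains.
DISPOSABLE_DOMAINS = {"disposable.example"}

def profile_emails(values):
    """Count how many values fall into each category."""
    counts = Counter()
    for value in values:
        value = value.strip()
        if not EMAIL_PATTERN.match(value):
            counts["bad_syntax"] += 1
            continue
        domain = value.rsplit("@", 1)[1].lower()
        if domain in DISPOSABLE_DOMAINS:
            counts["disposable"] += 1
        else:
            counts["ok"] += 1
    return dict(counts)

sample = [
    "ann@example.com",
    "bob@disposable.example",
    "not-an-email",
    " carol@example.org ",
]
email_profile = profile_emails(sample)
```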

Data validation is also performed on city, state/province, ZIP and postal code fields to report any discrepancies in your data. Even if you accidentally put a phone number in a name field, Melissa's Profiler can detect and report it.

The Profiler Object returns counts of duplicate records using four different matching criteria (Exact, Address Only, Household, and Contact). Using the power of our flagship deduplication solution, MatchUp, it reports the number of unique records, the number of duplicates, and the size of the largest duplicate group for each of the four matching criteria.
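
The three figures reported per matching criterion can be pictured with a short sketch. This is plain Python operating on already-computed match keys, not the Profiler or MatchUp API; `duplicate_summary` is a hypothetical helper, and the exact definitions the Profiler uses for "unique" and "duplicate" are assumed here.

```python
from collections import Counter

def duplicate_summary(match_keys):
    """Summarize match keys: one key per record, equal keys mean duplicates.

    Assumed definitions: 'unique' is the number of distinct groups,
    'duplicates' is the number of extra records beyond the first in each group.
    """
    sizes = Counter(match_keys)
    return {
        "unique": len(sizes),
        "duplicates": sum(size - 1 for size in sizes.values()),
        "largest_group": max(sizes.values(), default=0),
    }

# Hypothetical match keys for six records under one matching criterion.
keys = ["a", "a", "b", "c", "c", "c"]
summary = duplicate_summary(keys)
```

Running the same summary once per criterion (with a different key for each) would give the four sets of counts described above.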

Melissa's Profiler also provides value-specific iterators (pattern, word, data, date, Soundex, etc.) that allow the user to loop through any column in ascending or descending order to retrieve those values and their respective counts.
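
As an example of what a pattern-style iteration might yield, the sketch below reduces each value to a character pattern and lists the patterns with their counts. It is plain Python, not the Profiler's iterator interface; the `A`/`9` pattern notation and both function names are assumptions made for illustration.

```python
from collections import Counter

def value_pattern(value):
    """Replace letters with 'A' and digits with '9'; keep other characters as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)

def pattern_counts(values, descending=True):
    """Return (pattern, count) pairs ordered by count, then by pattern."""
    counts = Counter(value_pattern(v) for v in values)
    return sorted(
        counts.items(),
        key=lambda item: (item[1], item[0]),
        reverse=descending,
    )

# Hypothetical postal code column mixing several formats.
column = ["92688", "K1A 0B1", "10001", "SW1A 1AA"]
patterns = pattern_counts(column)
```

A column dominated by one pattern with a handful of outliers is often a quick signal of values that landed in the wrong field.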

The date iterator, for example, allows the user to see the busiest or slowest time of day, day of the month, or day of the week, using a timestamp field that records when each record was created.
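
The day-of-week view can be approximated with the standard library alone. The sketch below is plain Python rather than the Profiler's date iterator; the timestamp format, the sample values, and the `weekday_counts` helper are hypothetical.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

def weekday_counts(stamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count records per day of the week, busiest day first."""
    counts = Counter(
        DAY_NAMES[datetime.strptime(stamp, fmt).weekday()] for stamp in stamps
    )
    return counts.most_common()

# Hypothetical record-creation timestamps.
created = [
    "2024-01-01 09:15:00",
    "2024-01-01 17:40:00",
    "2024-01-03 11:05:00",
    "2024-01-08 08:30:00",
]
busiest_first = weekday_counts(created)
```

The same approach works for hour of day or day of month by swapping `weekday()` for the `hour` or `day` attribute of the parsed timestamp.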

To demo the Melissa Profiler, please visit us at http://www.melissa.com/data/profiling.html or call 1-800-MELISSA (635-4772) and one of our Sales Representatives will set you up with a free trial.
