Recently in Data Quality Assessment Category


By Kevin Ubay-Ubay, Sales Engineer

Personator World is Melissa's powerful new cloud-based web service that gives you a single, simple way to clean and verify your contact data globally. The service leverages our experience in name, phone, address, and email validation to provide comprehensive contact checking combined with high-quality identity-level verification.

Some examples of where our new web service can be applied:

  • Age verification
  • Name-address verification
  • Anti-fraud applications
  • Online shopping cart & eCommerce platforms
  • FinTech/Banking

Trusted Reference Data

Personator World uses a number of trusted reference data sources to verify the identity of an individual. These sources include:

  • Citizen/national databases
  • Credit agencies/bureaus
  • Utility and telecom sources
  • Driver's licenses
  • Electoral rolls

Personator World can then determine whether the identity has been found and matched against those data sources. An example JSON response from the web service may contain something like this:


    "DatasourceName": "CREDIT-2",

    "Results": "KV03,KV04",

    "Messages": [


            "ResultCode": "KV03",

            "Description": "First/given/forename matched"



            "ResultCode": "KV04",

            "Description": "Last/surname matched"





    "DatasourceName": "CONSUMER-1",

    "Results": "KV01,KV14,KV13,KV12,KV10",

    "Messages": [


            "ResultCode": "KV01",

            "Description": "Address matched"



            "ResultCode": "KV14",

            "Description": "Premise/house number matched"



            "ResultCode": "KV13",

            "Description": "Thoroughfare matched"



            "ResultCode": "KV12",

            "Description": "Locality matched"



            "ResultCode": "KV10",

            "Description": "Postal code matched"




As you can see, the data sources where a match was found are listed, along with which components of the input were matched.
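To act on such a response programmatically, a client can walk the returned datasource records and collect the result codes for each. Here is a minimal Python sketch; the payload is modeled on the example above as a plain list of records, which is an assumption for illustration (consult the service documentation for the exact response envelope):

```python
import json

# Response fragment modeled on the example above: a list of datasource
# records (the exact envelope may differ in the real service).
payload = """
[
  {
    "DatasourceName": "CREDIT-2",
    "Results": "KV03,KV04",
    "Messages": [
      {"ResultCode": "KV03", "Description": "First/given/forename matched"},
      {"ResultCode": "KV04", "Description": "Last/surname matched"}
    ]
  }
]
"""

# Collect the matched result codes, keyed by the datasource that matched.
matched = {rec["DatasourceName"]: rec["Results"].split(",")
           for rec in json.loads(payload)}

print(matched)  # {'CREDIT-2': ['KV03', 'KV04']}
```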

Using Personator World

To give a walkthrough of this service: when you send the contact data you want checked and verified, Personator World starts by standardizing and validating it. At this stage, the service parses and standardizes your data and checks for the following:

  • Name is in a valid format
  • Address is deliverable
  • Email address exists
  • Phone number is callable



Input:

    Full name: John doe
    Phone: 8458692102
    Address Line 1: 1 unicorn
    City: norwich
    Administrative Area:
    Postal Code: NR33AB
    Country: GB

Standardized output:

    Full name: John Doe
    Phone: +44 8458692102
    Address Line 1: 1 Unicorn Rd
    City: Norwich
    Administrative Area: Norfolk
    Postal Code: NR3 3AB
    Country: GB


In the example above, you can see the service standardizing the input (changing casing, expanding abbreviations, etc.) and making corrections: adding the missing street suffix, adding the phone country dialing code, filling in the missing state/province/administrative area, correcting typographical errors in email domains, and so on.

Next, Personator World takes your data and compares it against trusted reference data in order to verify that individual's information.

In this verification stage, the web service takes the standardized name, phone, email, and address from the previous checking stage and performs ID verification. Depending on the country, additional input such as the individual's national ID (as issued by the country's government) and date of birth can be verified as well.




Full name: John Doe

National ID: HJDO840230HVZRRL05

Date of birth: 2/30/1984

Phone: +56-222-226-8000


Address Line 1: Paseo De Los Conquistadores 2000

City: Guadalupe

Administrative Area: NL

Postal Code: 67170

Country: MX



  • KV01 - Address matched
  • KV02 - National ID matched
  • KV03 - First name matched
  • KV04 - Last name matched
  • KV05 - Phone number matched
  • KV06 - Email matched
  • KV07 - Date of birth matched




The example above shows how the service returns result codes indicating which pieces of information were matched. By observing which result codes come back, you can determine how reliable your data is.
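One practical way to use these codes is a simple acceptance policy: only treat a record as verified when certain components matched. The sketch below maps the KV codes listed above to component names; the required-components policy itself is an invented example, not a Melissa recommendation:

```python
# Result codes and the components they confirm (from the list above).
KV_CODES = {
    "KV01": "address", "KV02": "national_id", "KV03": "first_name",
    "KV04": "last_name", "KV05": "phone", "KV06": "email", "KV07": "dob",
}

def components_matched(results_field):
    """Translate a comma-separated Results string into component names."""
    return {KV_CODES[c] for c in results_field.split(",") if c in KV_CODES}

def passes_policy(results_field, required=("first_name", "last_name", "address")):
    """Example acceptance policy (an assumption): name and address must match."""
    return set(required) <= components_matched(results_field)

print(passes_policy("KV01,KV03,KV04"))  # True
print(passes_policy("KV03,KV04"))       # False (no address match)
```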


Personator World is currently available as a web service hosted on our servers, meaning you won't have to worry about updates and maintenance. Information about the supported countries and the level of coverage can be found on our online wiki. Sample code and documentation are also provided to help you jump-start building applications and services that incorporate our ID verification technology.



By Edward Dombrowski, Data Quality Analyst

Maybe you're already familiar with the way Express Entry and Express Entry Desktop Edition can autocomplete addresses - just select the country and type in the beginning of an address, and the service can fill in the missing information, using address data from over 240 countries.


But did you know about the many benefits of autocompleting addresses? Increased efficiency, a single view of the customer at the point-of-entry, and increased security, to name a few.

With Express Entry Desktop Edition, you can easily enter complete addresses with 50 percent fewer keystrokes, with the bonus (since it uses the Express Entry service) that the addresses are already verified.

When you receive a complete address from the service, you can also geocode it easily with the click of the 'verify' button. Also, Express Entry Desktop Edition's system of templates gives you the flexibility you need to efficiently fill out any form in Windows or on the Web without having to code a solution.

[Screenshots: global address autocompletion in Express Entry Desktop, steps 1-3]

Another benefit to using Express Entry is the single view of the customer at the point-of-entry. This way, duplication in your database is controlled from the start. Express Entry's standardized addresses are consistent as they have pre-directionals, ordinals, suffixes, and post-directionals already abbreviated throughout.

Standardized addresses mean that you won't have to deduplicate customer tables by address. There are no addresses in Express Entry data that do not conform to the standard.

There may be one benefit you haven't thought of: fraud prevention. When a user selects a verified and standardized address on your form, or your Customer Service Representative enters an address through Express Entry Desktop's GUI, your confidence that the customer is who they say they are is increased. It becomes easier to separate real addresses from the fake ones.

Express Entry data is built from multiple sources so you can rest assured that address data is correct. If your Customer Service Center takes in addresses, Express Entry Desktop Edition can help your Customer Service Representatives enter them in more quickly.

For more info on Express Entry, go to:

Validating Global Phone Numbers 101: A Quick, Easy Tutorial


By Allison Moon, Data Quality Analyst

With the advent of global communications and business, chances are you are not only capturing addresses of your contacts, but contact phone numbers as well. This is where Melissa's Global Phone Object comes into play.

Global Phone Object validates registered phone numbers in more than 240 countries and territories, and enriches your data by appending information such as latitude, longitude, city, predominant language spoken, and more.

Setting up the object is quick and easy, especially if you're a current customer of any other Melissa API.

Here's a quick tutorial:

First, add the Phone Object DLL or .SL file to your project and instantiate an instance of the Global Phone Object:

SET globalPhonePtr as NEW Instance of GlobalPhone

Second, set your license string: 
CALL SetLicenseString WITH LicenseString

Next, initialize the data files:
CALL Initialize WITH DataPath RETURNING Result
IF Result <> 0 THEN
    CALL GetInitializeErrorString RETURNING ErrorString
    PRINT "Error: " & ErrorString
END IF

Now you're ready to start validating phone numbers! Using the 'Lookup' method, you can loop through your contacts and feed each phone number through the object. For best results, pass both the phone number and its associated country to the Lookup method. If you require all the digits necessary for dialing internationally from your country, you will also need to pass in the Country of Origin (i.e., the country from which the caller is dialing).
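Continuing in the same pseudocode style as the snippets above, a validation loop might look like the following (the result-retrieval call and property names here are illustrative, not the object's exact API):

```
FOR EACH contact IN contactList
    CALL Lookup WITH contact.PhoneNumber, contact.Country, countryOfOrigin
    CALL GetResults RETURNING ResultCodes
    PRINT contact.PhoneNumber & " -> " & ResultCodes
END FOR
```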

The outputs returned from the Global Phone Object are useful for call centers or businesses that benefit from knowing the general area where a contact is located. For instance, the UTC (Coordinated Universal Time) offset output can help the caller schedule an appropriate time to dial the contact, and the language output can tell the caller which language the contact is most comfortable speaking.
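As a sketch of how a time-zone output might be applied, the snippet below uses a UTC offset in hours to decide whether a contact is currently within calling hours. The offset format and the 9-to-5 calling policy are assumptions for illustration:

```python
from datetime import datetime, timedelta, timezone

def local_hour(utc_offset_hours, when_utc):
    """Return the contact's local hour given the phone's UTC offset."""
    return (when_utc + timedelta(hours=utc_offset_hours)).hour

def ok_to_call(utc_offset_hours, when_utc, start=9, end=17):
    """Simple business-hours check (9:00-17:00 local, an assumed policy)."""
    return start <= local_hour(utc_offset_hours, when_utc) < end

noon_utc = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
print(ok_to_call(+1, noon_utc))   # True  (13:00 in UTC+1)
print(ok_to_call(-10, noon_utc))  # False (02:00 in UTC-10)
```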

The appended regional information, such as latitude, longitude, locality (city), and administrative area (state), can be used to get a general sense of where the majority of your contacts are located.

If you're capturing international contact information, the Global Phone Object will be a useful tool in determining registered phone numbers, standardizing phone number formats, and understanding more about where the contact is located.


To test Global Phone for free, go to:



By Oscar Li, Data Quality Sales Engineer/Channel Manager for Global Email

Melissa recently introduced several improvements and new features to its Global Email Verification Solution - an all-in-one real-time email mailbox validation and correction service. Here's a quick list of our latest improvements:

  • Improved fuzzy matching of domain corrections
  • Updated our TLD database with the newest ICANN info
  • Increased control over the freshness of data returned
  • Better unknown status detection capability

In terms of the service's new features, Global Email now offers two validation modes: Express and Premium.


Express can be used in time-sensitive situations and returns a response in one to two seconds. Premium performs a real-time validation of the email address and can take up to 12 seconds to return a response.

If you want to reduce the time taken in Premium mode, we offer an advanced email option that trades off the freshness of the returned data for increased speed.

Using the Global Email Verification Solution to Protect & Increase Your Email Reputation Score

What does email reputation mean? Email is a big part of most marketing campaigns, so it is important to watch your email reputation with a tool such as Sender Score. Once you are blacklisted, your email deliverability will suffer; as a result, you will have trouble sending emails in the future, as most mail servers subscribe to spammer lists.


Integrators should consult with our data quality experts in order to understand what they need to look out for to avoid a bad campaign, and why certain emails should be flagged/inspected. Even if you avoid spam traps, sending out too many invalid emails will cause the mail server to flag you as a potential spammer. Email marketing campaign servers with a low email reputation score will typically experience aggressive filtering.

On the other hand, senders that maintain a high reputation score see less intrusive filtering, applied only to individual emails and campaigns rather than to blanket IP addresses. It is definitely prudent not to let other users influence your email reputation.

For example, if you are on a shared server, other companies or users could be sending out their own campaigns without filtering emails through our service. It would be a waste if you spent all that time and investment controlling your email campaigns while another user email blasts without mailbox validation, causing the entire IP to be affected.

How do I improve my email reputation score?

If reputation scores are already dismal for your existing email campaign server IPs, it may be beneficial to run campaigns from a new or more reputable IP to see a better return on investment. This would start your email reputation on a clean slate. As a disclaimer, we are not sure how feasible this would be for everybody; your team will need to discuss it internally.

However, using our service on existing IP should raise the reputation score for that specific IP. On IPs with pre-existing high volume campaign history, the scores will be slow to change. You can now see why it is extremely prudent to invest in an email validation system early on and why Global Email is a valuable tool to utilize.

For more info, go to:

Discover Data Quality Issues Before They Arise


By Taky Djarou, Data Quality Analyst

Melissa has released its new Profiler API. The Profiler Object offers a unique approach to profiling your data, combining years of contact data quality experience, the power of many Melissa objects, and data source tables to help you dig deeper into your data, returning hundreds of properties about the input table, its columns, and individual values.

For example, many existing profilers allow the user to set a regex to capture an email pattern. The Melissa Profiler offers that function and also checks the syntax and the domain, and whether the address is disposable, has a spammy reputation, or is invalid, returning counts that reflect all of the above.

Data validation is also performed on city, state/province, ZIP and postal code fields to report any discrepancies in your data. Even if you accidentally put a phone number in a name field, Melissa's Profiler can detect and report it.

The Profiler Object returns counts of duplicate records using four different matching criteria (Exact, Address Only, Household, and Contact). Using the power of our flagship deduplication solution MatchUp, the number of unique records, the number of duplicates, and the size of the largest duplicate group are reported for all four matching criteria.
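The idea of counting duplicates under multiple matching criteria can be sketched as below. This is a simplified illustration: MatchUp's actual matching normalizes and fuzzy-matches fields rather than comparing them verbatim, and the criteria keys and sample records are invented for illustration:

```python
from collections import Counter

records = [
    {"name": "John Doe", "address": "1 Unicorn Rd", "city": "Norwich"},
    {"name": "J. Doe",   "address": "1 Unicorn Rd", "city": "Norwich"},
    {"name": "Jane Roe", "address": "1 Unicorn Rd", "city": "Norwich"},
]

# Each criterion is a key function; real matching would normalize and
# fuzzy-match these fields instead of comparing them verbatim.
CRITERIA = {
    "Exact":        lambda r: (r["name"], r["address"], r["city"]),
    "Address Only": lambda r: (r["address"], r["city"]),
}

for name, key in CRITERIA.items():
    groups = Counter(key(r) for r in records)
    dupes = sum(n - 1 for n in groups.values() if n > 1)
    largest = max(groups.values())
    print(f"{name}: {len(groups)} unique, {dupes} duplicates, largest group {largest}")
```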

Melissa's Profiler also provides value specific iterators (pattern, word, data, date, Soundex, etc.) that allow the user to loop through any column in an ascending or descending order to retrieve those values and their respective counts.

The date iterator, for example, lets the user see the busiest or slowest time of day, day of the month, or day of the week using a timestamp field recording when each record was created.
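The idea behind the date iterator can be illustrated with a few lines of Python that tally record-creation dates by weekday (the sample data is invented for illustration):

```python
from collections import Counter
from datetime import date

# Record-creation dates (illustrative sample data).
created = [date(2024, 5, 6), date(2024, 5, 13), date(2024, 5, 20),
           date(2024, 5, 7), date(2024, 5, 8)]

# Tally records per weekday, then walk the counts in descending order,
# much like a date iterator traversing values with their counts.
by_weekday = Counter(d.strftime("%A") for d in created)
for day, count in by_weekday.most_common():
    print(day, count)
```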

To demo the Melissa Profiler, please visit us at: or call 1-800-MELISSA (635-4772) and one of our Sales Representatives will set you up with a free trial.

Better Marketing Starts with Better Data


Improve Data Quality for More Accurate Analysis with Alteryx and Melissa


Organizations are under more pressure than ever to capture accurate contact data for their customers. When your consumer base ranges from Los Angeles to Tokyo, this can be challenging. Poor data quality has a critical impact on both the financial stability and the operations of a business. Verifying and maintaining vast quantities of accurate contact data is often inefficient and falls short of the mark. According to IBM, the yearly cost of poor data quality is estimated at $3.1 trillion in the U.S. alone.


Melissa's Global Address Verification and Predictive Analysis for Alteryx are the tools your business needs to grow. Download this whitepaper to find out how to achieve marketing success, while reducing the cost of doing business overall.


Learn how to:

  • Better understand and utilize your big data for marketing success
  • Build better relationships with customers with clean data
  • Target the customers most likely to buy
  • Cut down on undeliverable mail and save on costs


Download free whitepaper now:


Data Quality Dimensions Can Raise Practical Concerns

By Elliot King

As everybody knows, data quality is usually measured along seven dimensions--the four Cs of completeness, coverage, consistency, and conformity plus timeliness, accuracy and duplication. And the general method to judge data quality is to establish a standard for each of these dimensions and measure how much of the data meets these standards.

For example, how many records are complete; that is, how many of your records contain all of the essential information that the standard you established requires them to hold? Or how much of your data is accurate; that is, do the values in the records actually reflect something in the real world?
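Measuring completeness against such a standard is straightforward to sketch; the required-field standard and sample records below are invented examples:

```python
records = [
    {"name": "Ada", "email": "ada@example.com", "phone": "555-0100"},
    {"name": "Ben", "email": "",                "phone": "555-0101"},
    {"name": "Cy",  "email": "cy@example.com",  "phone": None},
]

# The completeness standard: the fields every record must hold (an example).
REQUIRED = ("name", "email", "phone")

def is_complete(rec):
    """A record is complete when every required field holds a value."""
    return all(rec.get(f) for f in REQUIRED)

complete = sum(is_complete(r) for r in records)
print(f"{complete}/{len(records)} records complete")  # 1/3 records complete
```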

As Malcolm Chisholm pointed out in a series of posts not long ago, conceptualizing data quality as a set of dimensions may be misleading or at least not that useful. The argument is both philosophical and practical, and while philosophers can debate the relationship of an abstraction to the real world, the practical concerns about the dimensions of data quality raise interesting questions.

The real issue is this--as they are currently conceptualized, are data quality dimensions too abstract; do they actually reveal something real, meaningful and useful about the data itself? And does measuring data according to those standards--i.e. establishing their quality-- lead to useful directions to improve business processes?

For example, the International Association for Information and Data Quality defines timeliness as "a characteristic of information quality measuring the degree to which data is available when knowledge workers or processes require it."

Obviously, the sense of timeliness in that definition reflects more on the ability to get at data when it is needed than on any quality of the data itself. However, timeliness of the data also could reflect on how up to date the data is.

Do records contain the most current information? But timeliness in that sense could also be subsumed under the idea of accuracy. If the information is not up to date, perhaps it is just inaccurate. Looked at through another lens, however, even if the data is not timely (that is, not up to date), maybe the record is not inaccurate per se, but just incomplete.

Clearly, assessing quality according to individual dimensions is a tricky business. The dimensions can overlap, and when used without caution they can lead to more confusion than clarity.

More About Data Quality Assessment

By David Loshin

In our last series of blog entries, I shared some thoughts about data quality assessment and the use of data profiling techniques for analyzing how column value distribution and population corresponded to expectations for data quality. Reviewing the frequency distribution allowed an analyst to draw conclusions about column value completeness, the validity of data values, and compliance with defined constraints on a column-by-column basis.

However, data quality measurement and assessment goes beyond validation of column values, and some of the more interesting data quality standards and policies apply across a set of data attributes within the same record, across sets of values mapped between columns, or relationships of values that cross data set or table boundaries.

Data profiling tools can be used to assess these types of data quality standards in two ways. One approach is more of an undirected discovery of potential dependencies that are inherent in the data, while the other seeks to apply defined validity rules and identify violations. The first approach relies on some algorithmic complexity that I would like to address in a future blog series, and instead in the upcoming set of posts we will focus on the second approach.

To frame the discussion, let's agree on a simple concept regarding a data quality rule and its use for validation, and we will focus specifically on those rules applied to a data instance, such as a record in a database or a row in a table. A data instance quality rule defines an assertion about each data instance that must be true if the rule is observed. If the assertion evaluates to be not true, the record or table row is in violation of the rule.

For example, a data quality rule might specify that the END_DATE field must be later in time than the BEGIN_DATE field, and that means that for each record, verifying observance of the rule means comparing the two date fields and making sure that the END_DATE field is later in time than the BEGIN_DATE field.
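That rule is easy to express in code. Here is a minimal Python sketch of applying the data instance quality rule to a set of rows (the sample rows are invented for illustration):

```python
from datetime import date

def end_after_begin(row):
    """Data instance quality rule: END_DATE must be later than BEGIN_DATE."""
    return row["END_DATE"] > row["BEGIN_DATE"]

rows = [
    {"BEGIN_DATE": date(2024, 1, 1), "END_DATE": date(2024, 6, 30)},
    {"BEGIN_DATE": date(2024, 7, 1), "END_DATE": date(2024, 3, 1)},  # violation
]

# Each row where the assertion evaluates to not true violates the rule.
violations = [i for i, row in enumerate(rows) if not end_after_begin(row)]
print("Rows violating the rule:", violations)  # Rows violating the rule: [1]
```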

This all seems pretty obvious, of course, and we can use data profiling tools to both capture and apply the validation of the rules to provide an assessment of observance. In the next set of posts we will focus on the definition and application of cross-column and cross-table data quality rules.


Data Quality Assessment: Value Domain Compliance

By David Loshin

To continue the review of techniques for using column value analysis for assessing data quality, we can build on a concept I brought up in my last post about format and pattern analysis and the reasonableness of data values, namely whether the set of values that appear in the column complies with the set of allowable data values altogether.

Many business applications rely on well-defined reference data sets, especially for contact information and product data. These reference data sets are often managed as master data, with the values enumerated in a shared space. For example, a conceptual data domain for the states of the United States can be represented using an enumerated list of 2-character codes as provided by the United States Postal Service (USPS).

That list establishes a set of valid values, which can be used for verification for any dataset column that is supposed to use that format to represent states of the United States.

A good data profiling tool can be configured to perform yet another column analysis that verifies that each value that appears in the column coincides with one of those in the enumerated master reference set. After the values have been scanned and their number of occurrences tallied, the set of unique values can be traversed and each value compared against the reference set.

Any values that appear in the column that do not appear in the reference set can be culled out as potential issues to be reviewed with the business subject matter expert.
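This reference-set check can be sketched in a few lines of Python. The reference set below is a small subset of the USPS 2-character codes, and the column data is invented for illustration:

```python
from collections import Counter

# Subset of the USPS 2-character state codes (the full reference set is larger).
REFERENCE = {"CA", "NY", "TX", "FL"}

column = ["CA", "CA", "NY", "Cal", "TX", "XX", "NY"]

# Tally occurrences, then traverse the unique values and compare each
# against the enumerated reference set, as described above.
tallies = Counter(column)
issues = {value: count for value, count in tallies.items()
          if value not in REFERENCE}
print(issues)  # {'Cal': 1, 'XX': 1}
```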

In this blog series, we have looked at a number of methods that column value scanning and frequency analysis can be used as part of an objective review of potential data issues.

In a future series, we will look more closely at why these types of issues occur as well as methods for logging the issues with enough context and explanation to share with the business users and solicit their input for determination of severity and prioritization for remediation.


Data Quality Assessment: Value and Pattern Frequency

By David Loshin

Once we have started our data quality assessment process by performing column value analysis, we can reach out beyond the scope of the types of null value analysis we discussed in the previous blog post. Since our column analysis effectively tallies the number of each value that appears in the column, we can use this frequency distribution of values to identify additional potential data flaws by considering a number of different aspects of value frequency (as well as lexicographic ordering), including:

  • Range Analysis, which considers whether the values can be ordered, to determine whether they are constrained within a well-defined range.

  • Cardinality Analysis, which analyzes the number of distinct values that appear within the column to help determine if the values that actually appear are reasonable for what the users expect to see.

  • Uniqueness, which indicates if each of the values assigned to the attribute is used once and only once within the column, helping the analyst to determine if the field is (or can be used as) a key.

  • Value Distribution, which presents an ordering of the relative frequency (count and percentage) of the assignment of distinct values. Reviewing this enumeration and distribution alerts the analyst to any outlier values, either ones that appear more than expected, or invalid values that appear few times and are the result of finger flubs or other data entry errors.
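The frequency-distribution measures above can be sketched with Python's Counter (the sample column is invented for illustration):

```python
from collections import Counter

column = ["M", "F", "F", "M", "M", "M", "Z"]  # sample values; "Z" a likely typo

freq = Counter(column)

cardinality = len(freq)                          # number of distinct values
is_unique = all(n == 1 for n in freq.values())   # usable as a key?
distribution = freq.most_common()                # values ordered by frequency

print(cardinality, is_unique)
print(distribution)  # [('M', 4), ('F', 2), ('Z', 1)] -- 'Z' is an outlier
```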
In addition, good data profiling tools can abstract the value strings by mapping the different character types such as alphabetic, digits, or special characters to a reduced representation of different patterns. For example, a telephone number like "(301) 754-6350" can be represented as "(DDD) DDD-DDDD." Once the abstract patterns are created, they can also be subjected to frequency analysis, allowing such assessment like:

  • Format and/or pattern analysis, which involves inspecting representational alphanumeric patterns and formats of values and reviewing the frequency of each to determine if the value patterns are reasonable and correct.

  • Expected frequency analysis, which reviews those columns whose values are expected to reflect certain frequency distributions and validates compliance with the expected patterns.
Recall again that the identification of potential issues can only be verified as a problem by review with a business process subject matter expert.
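The pattern abstraction described above is simple to sketch; the character classes below (D for digits, A for letters) follow the "(DDD) DDD-DDDD" example in the text, and the sample phone numbers are invented:

```python
from collections import Counter

def abstract_pattern(value):
    """Map digits to 'D' and letters to 'A', keeping other characters as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("D")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)

phones = ["(301) 754-6350", "(714) 555-0100", "301-754-6350"]
patterns = Counter(abstract_pattern(p) for p in phones)
print(patterns.most_common())
# [('(DDD) DDD-DDDD', 2), ('DDD-DDD-DDDD', 1)]
```

Once the abstract patterns are tallied like this, unusual patterns with low counts stand out for review, just as the text describes.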