Recently in Address Quality Category

Where Do You Fit In?

By Elliot King

Too often, those of us with our noses to the grindstone have no time to look up. We are so busy putting out fires, monitoring and maintaining what we have, or trying to launch new initiatives that we never look around to see how other organizations are dealing with similar issues.

This may be particularly true in the data quality world. Data quality is often seen as an internal problem and it is often addressed differently in different settings, both organizationally and technically. Indeed, even the terminology is not consistent across industries.

So a recent study conducted by the International Association for Information and Data Quality (IAIDQ), working in conjunction with the Information Quality Program at the University of Arkansas, Little Rock (UALR-IQ), reveals some very interesting trends. The survey of 270 data quality professionals identified the top challenges they face.

Heading the list is a lack of accountability and responsibility for data quality, followed by too many data and information silos to manage, a lack of awareness and discussion of the size and impact of data quality problems, and a lack of understanding of what data quality means. These challenges are fundamental, and each was cited by more than 50 percent of the respondents.

Considering the basic nature of the challenges, perhaps it should be no surprise that 66 percent of the respondents believed that the effectiveness of the data quality efforts in their organization was only OK (some goals were met) or poor (few goals were met). Ironically, 70 percent claimed that their organizations recognized that data and information were important strategic assets and managed them with that in mind.

So what is driving companies to improve their data quality efforts? According to the survey, the top driver is just a general desire to improve the quality of data, which was cited by 68 percent of the respondents. Other important motivations to improve data quality were the desire to improve business intelligence, and compliance and legal considerations.


How to Get More Email Delivered into Inboxes

By Abby Garcia Telleria

In this recently released article from Internet Retailer, Paul Demery, IR's chief tech editor, talks about why eliminating invalid email addresses from your database is critical to the success of your campaign. It's a great story, and offers tips on how you can improve email deliverability to ensure it gets into the right inboxes.

Also, in case you're interested - Melissa Data offers an email verification solution that can fix common typographical errors to turn invalid email addresses into deliverable ones.

Get a free trial here.

Using Data Quality Tools for Classification

By David Loshin

Hierarchical classification schemes are great for scanning through unstructured text to identify critical pieces of information that can be mapped to an organized analytical profile. To enable this scanning capability, you will need two pieces of technology.

The first involves a text analysis methodology for scanning text and determining which character strings and phrases are meaningful and which ones are largely noise.

The second capability maps identified terms and phrases within existing known hierarchies and performs the classification.

Both of these techniques would work perfectly as long as the input data is always correct and complete - quite an assumption. That is why we need to augment these approaches with data quality techniques, largely in the area of data validation and data standardization/correction. For example, I am particularly guilty of character transposition when I type, and am as likely to tweet about my "Frod F-150" as I would about my "Ford F-150." In this example, the inexact spelling would lead to a failure to classify my automobile preference.

However, using data quality tools, we can create a knowledge base of standard transformations that map common error schemes to their most appropriate matches. Creating a transformation rule mapping "Frod F-150" to "Ford F-150" would suggest the likely intent, supplementing the classification process.
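As a rough illustration, such a knowledge base could be as simple as a lookup table of known error patterns. The sketch below assumes a hand-maintained Python dictionary; the names `TRANSFORMATIONS` and `standardize`, and the second table entry, are hypothetical and not taken from any particular data quality tool.

```python
import re

# Hypothetical knowledge base: known error patterns mapped to standard forms.
TRANSFORMATIONS = {
    "frod f-150": "Ford F-150",
    "chevy silverdo": "Chevy Silverado",  # illustrative entry, not from the post
}

def standardize(text):
    """Replace each known error pattern in the text with its standard form."""
    for wrong, right in TRANSFORMATIONS.items():
        # Case-insensitive match so "Frod", "FROD" and "frod" are all corrected.
        text = re.sub(re.escape(wrong), right, text, flags=re.IGNORECASE)
    return text

print(standardize("Loving my new Frod F-150!"))  # Loving my new Ford F-150!
```

Running the correction step before classification means the classifier only ever sees the standard spelling. A production tool would likely add fuzzy matching for errors that are not yet in the table.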

In other words, integrating our text analytics tools with more traditional data quality methodology will not only (yet again) reduce inconsistency and confusion, it will also enhance the precision for analytical results and enable more robust customer profiling - a necessity for customer centricity.

By Allison Moon
Data Quality Analyst

In today's e-commerce environment, Web forms or online shopping carts serve to capture valuable contact data, but many times this data can contain inconsistencies, missing or incorrect information.


Fortunately, Melissa Data offers a solution. With a partially entered address line and ZIP Code™, our new auto-complete feature in Address Object can retrieve all possible address entries from which a user can select. It's simple to use, and can be easily integrated into your existing solution with just a few steps. Address Object is Melissa Data's address verification solution (available as a multiplatform API or Web service).

Ensuring accuracy before bad data enters your CRM systems will prevent your company from dealing with lost revenue, time, inefficiencies and waste.

Uses and Benefits of Implementing Auto-Completion

So how can you take advantage of this new feature? For those who need to save as much time and as many keystrokes as possible, the auto-complete feature is a pretty awesome tool. Having the ability to retrieve a list of suggestions based upon the street number, and even just the first couple of letters of the street name, saves time typing out an entire address.

The auto-complete functionality can also help find the correct suites and valid ranges for a building. In a call center setting, auto-complete can allow you to see whether the customer on the phone has forgotten to mention their apartment information, before the call has ended.

Or perhaps you're on the phone with a customer and quickly scribble down their mailing address. But now, when you look back at your notes, it's hard to read except for the first few numbers and characters (and who hasn't done this before?). With auto-complete, you can plug in the information (that you can decipher from your notes) and determine which address you meant to write down by looking at the suggestions returned.

Auto-completion is flexible enough to accommodate varying needs and design requirements, reduce the time spent finding addresses, and prevent issues by returning valid addresses given an incomplete one.
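To make the idea concrete, here is a minimal sketch of prefix-based address suggestion. It is not the Address Object API: the `KNOWN_ADDRESSES` list, the `suggest` function, and the made-up addresses and ZIP Codes are all assumptions for illustration, standing in for a real reference database of valid addresses.

```python
# Hypothetical reference data: (street number, street name, ZIP Code).
KNOWN_ADDRESSES = [
    ("100", "Main Street", "99999"),
    ("100", "Maple Avenue", "99999"),
    ("102", "Main Street", "99999"),
    ("100", "Main Street", "88888"),
]

def suggest(partial_line, zip_code):
    """Return address lines in the given ZIP Code that start with the partial input."""
    # Normalize case and collapse repeated spaces before comparing.
    prefix = " ".join(partial_line.lower().split())
    matches = []
    for number, street, zip_ in KNOWN_ADDRESSES:
        full = f"{number} {street}"
        if zip_ == zip_code and full.lower().startswith(prefix):
            matches.append(full)
    return matches

print(suggest("100 Ma", "99999"))  # ['100 Main Street', '100 Maple Avenue']
```

The street number plus the first two letters of the street name is enough to narrow the list to two candidates, which is the keystroke saving described above.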

-- Allison Moon is Melissa Data's data quality analyst and software engineer.

To download a free trial of Address Object, please go to:
http://www.melissadata.com/free-trials/address-object-address-verifier.htm

Melissa Data will demonstrate new "Golden Record" functionality in its MatchUp Component for SQL Server Integration Services (SSIS) at Enterprise Data World (EDW). MatchUp SSIS is a powerful tool for advanced matching and deduplication management. By integrating the Golden Record selection tool, MatchUp SSIS represents an industry breakthrough based on its ability to discern contact data information, and select the surviving record based on the level of quality of the information provided.

Melissa Data will also showcase its popular collection of contact data quality and integration solutions. The event will be held April 28 through May 2 in San Diego, Calif. at the Sheraton Hotel and Marina.

Come stop by Booth #412 and say hi!


Classifying Data Quality Problems

By Elliot King

Data quality is generally most fruitfully defined in the context of its use. Is the data good enough to allow the process with which it is associated to run efficiently and effectively? For example, is the mailing list you are using for a direct solicitation accurate enough that you can achieve your goals and not generate any unwanted and unanticipated negative consequences?

And while that definition may be good enough in a practical sense for specific issues, it really isn't good enough to diagnose the sources of data quality problems generally. Constructing a general framework for data quality problems can be a useful guide in better identifying and resolving specific issues.

One of the earliest efforts to better understand the nature of data quality problems calls for classifying problems into three general categories--operational, conceptual and organizational. Operational data quality issues are those that are generated through problems with data capture and transmission. Inaccurate data is collected. Data may be missing. Or data may be corrupted through some process, for example.

Conceptual data quality problems occur when data is not well defined or it is inappropriate for its intended use. One of the most famous examples of a conceptual data quality problem (though it is not often thought of in this way) was brought to light in the movie Moneyball.

The basic thrust of the movie was not that the information old-time baseball scouts used to evaluate players was wrong per se; it was that they were collecting the wrong data to identify productive players. Batting average, for example, is less useful in determining a player's value than on-base percentage. A pressing new conceptual data problem is the attempt to use electronic patient records to judge medical treatment outcomes.

When operational and conceptual data problems persist over time despite repeated attempts to fix them, organizational data quality problems are usually the culprit. In these cases, wrong, missing and invalid data is not really the problem, but the symptom. Something has to be fixed in the organizational structure or culture.

The point is this--data can be wrong for many reasons and it can't fundamentally be fixed without a general understanding of the error's cause.

How Good is Your Address Data? Are You Really Sure?

What do you do when you're faced with verifying and correcting international addresses? Melissa Data and Loqate tackle the issues that arise from different language and character sets, the more than 130 address formats worldwide, and the many diverse sources of data. Find out how your customer retention, satisfaction and revenue can be impacted with better data quality; how global address verification and geocoding work; and much more! Melissa Data will showcase Contact Zone, its one-click data quality software solution to easily clean your data in leading CRM platforms such as Salesforce. The webinar starts April 16 at 11:00 AM EST.


Understanding Hierarchies

By David Loshin

Defining standards for group classification helps in reducing confusion due to inconsistencies across generated reports and analyses. In the automobile classification example we have been using for the past few posts, we might pick the NHTSA values (mini passenger cars, light passenger cars, compact passenger cars, medium passenger cars, heavy passenger cars, sport utility vehicles, pickup trucks, and vans) as the standard.

Yet, as more organizations look to merge data sets and feeds from different sources, some challenges remain, particularly with the use of unstructured text (such as that presented via Twitter or Facebook). People cannot be expected to always conform to your organization's data standards, and often use colloquial terms or their own words to describe ideas that would map to your own dimensional values.

For example, if you wanted to filter out the individuals who prefer to drive "pickup trucks" (one of our standard values), it is not enough to scan for that phrase. Many individuals will refer to their pickup truck using different terms, such as a make and model ("Ford F-150," "Chevy Silverado") or a different name ("light truck") or a nickname ("baby monster"), but these terms have to be linked to the overall classification term.

This is an example of a simple hierarchy, in which one concept ("automobiles") is divided into a collection of smaller classes (the NHTSA classifications). Each of those classes in turn contains other phrases and terms. Within each of those included collections, there may be other inclusive categorization, such as by make and then model.

With a well-defined hierarchy for classification, unstructured text can be scanned for matches with values that live within the hierarchy, and that enables the standardized classification. To round out the example, a tweet exclaiming the author's love of "driving his Ford F-150" can be scanned, the model name extracted, and that name located within the make and model hierarchy for pickup trucks, thereby allowing us to register his/her automobile driving preference!
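A small sketch can show the lookup at work. It assumes the hierarchy is held as nested Python dictionaries (class, then make, then models); the `HIERARCHY` fragment, the placeholder "vans" entry, and the `classify` function are hypothetical names chosen for illustration.

```python
# Hypothetical fragment of the hierarchy: class -> make -> models.
HIERARCHY = {
    "pickup trucks": {
        "Ford": ["F-150"],
        "Chevy": ["Silverado"],
    },
    "vans": {
        "ExampleMake": ["ExampleVan"],  # placeholder entry, not real data
    },
}

def classify(text):
    """Return (class, make, model) for the first model name found in the text, else None."""
    lowered = text.lower()
    for vehicle_class, makes in HIERARCHY.items():
        for make, models in makes.items():
            for model in models:
                # Simple case-insensitive substring match on the model name.
                if model.lower() in lowered:
                    return vehicle_class, make, model
    return None

print(classify("I love driving my Ford F-150"))  # ('pickup trucks', 'Ford', 'F-150')
```

Nicknames and alternate names such as "light truck" could be handled the same way by adding them as additional terms under the "pickup trucks" class.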

Standardizing Classifications

By David Loshin

In the most recent post, we posed a straightforward problem: if we have a reporting or analytical objective that depends on using a dimension for classification, what happens when two different value domains are presumed to map to the same conceptual domain?

More concretely, the example we used was mapping individuals to their car purchase preferences, but different applications used different car classifications that did not share the same number of values and the value sets did not directly map in a one-to-one manner. The potential result is confusion in interpreting the results, especially if this classification is just one variable used for creating a customer profile.

One way to address this is to put a standards policy for classification dimensions in effect by selecting a single set of concepts, mapping those conceptual values to a standard single set of values, and then insisting that any application that uses that conceptual domain always use the standard.
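In code, such a policy might look like the sketch below: one agreed set of standard values, plus a mapping from each application's local values onto it. The application names, the local values, and the `to_standard` function are all hypothetical, chosen only to illustrate the idea.

```python
# The single agreed set of standard values for the conceptual domain.
STANDARD = {"compact passenger cars", "pickup trucks", "vans"}

# Hypothetical per-application mappings from local values onto the standard.
APP_MAPPINGS = {
    "sales_app": {"compact": "compact passenger cars", "truck": "pickup trucks"},
    "survey_app": {
        "small car": "compact passenger cars",
        "pickup": "pickup trucks",
        "minivan": "vans",
    },
}

def to_standard(app, value):
    """Translate an application's local classification value to the standard one."""
    # Raises KeyError for an unknown application or an unmapped local value.
    standard_value = APP_MAPPINGS[app][value.lower()]
    # Guard against mappings that drift away from the agreed standard.
    if standard_value not in STANDARD:
        raise ValueError(f"{standard_value!r} is not a standard value")
    return standard_value

print(to_standard("sales_app", "Truck"))     # pickup trucks
print(to_standard("survey_app", "pickup"))   # pickup trucks
```

Two applications with different value sets now report the same standard value, so results can be compared without reinterpretation.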

This sounds simple, but it actually may entail some effort, since no one person may be aware of all the places that any specific classification domain is used.

This task goes beyond a "data management" activity and essentially becomes a "data governance" one involving a broad solicitation across the community of data consumers to determine the classification dimensions used and the enumerations of values employed within each dimension.

At the same time, the analyst spearheading this effort must have a plan for capturing the classification data, harmonizing values across variant lists, selecting a standard, communicating the standard, and then ensuring that the standard is put into practice.

Establishing good practices and processes for domain harmonization and standardization is an important topic to be considered in upcoming posts, but next time we will look at a growing challenge for classification domains: aligning data from unstructured text with the standard classification dimensions.

The People You Should Care About Most



By Elliot King

This goal should be a no-brainer. When a customer interacts with your organization, your point-of-contact personnel should have accurate information about your products and services and about the person, especially in the case of a repeat customer. When front-line personnel provide incorrect or incomplete information, or don't have access to information they should have, the customer experience suffers.

Ironically, the impact of poor data quality on customer satisfaction is often overlooked. According to a survey of members of the Association of Business Process Management Professionals (ABPMP), of the 45 percent who reported that they are working on improving CRM processes, only 38 percent have evaluated the impact that poor-quality data has on the effectiveness of these processes. That statistic is a little frightening.

Poor data quality coupled with the inability to deliver the right data to the customer at the right time damages overall customer satisfaction in a variety of ways. Consider this scenario. Many retailers run multiple promotions at the same time. Each promotion has different rules and restrictions and runs for a different time period.

When customers go to pay, if the final price does not reflect the discounts they anticipated, they will not be very happy. The flip side of the coin is true as well. If a sales process does not include a discount where one is due, an opportunity to build goodwill or perhaps close a deal will be lost.

But misunderstandings and the failure to provide the appropriate product are the tip of the iceberg. If point-of-contact personnel do not have confidence in the information they receive, the process for which the data is needed will be slowed and their productivity diminished.

And there is the annoyance factor. Think about your interactions with your telephone or cable provider. You call with a problem. The computer confirms the number you are calling about. If you are lucky and you finally find a human to talk to, how many times do they have to reconfirm all your information--telephone number, address, and so on--especially when you are passed from one person to another? Okay, part of the problem is most likely the lack of overall system integration, but part of the problem is a fear of faulty information.

Poor data quality is not a theoretical issue. It can hurt you in the place that may hurt most--your relationship with your customers.