Recently in data quality Category

Where Do You Fit In?

By Elliot King

Too often, those of us with our noses to the grindstone have no time to look up. We are so busy putting out fires, monitoring and maintaining what we have, or trying to launch new initiatives that we never look around to see how other organizations are dealing with similar issues.

This may be particularly true in the data quality world. Data quality is often seen as an internal problem and it is often addressed differently in different settings, both organizationally and technically. Indeed, even the terminology is not consistent across industries.

So a recent study conducted by the International Association for Information and Data Quality (IAIDQ) working in conjunction with the Information Quality Program at the University of Arkansas, Little Rock (UALR-IQ) reveals some very interesting trends. The survey of 270 data quality professionals identified the top challenges faced by data quality professionals.

Heading the list is a lack of accountability and responsibility for data quality, followed by too many data and information silos to manage, a lack of awareness and discussion of the size and impact of data quality problems, and a lack of understanding of what data quality means. These challenges are fundamental, and each was cited by more than 50 percent of the respondents.

Considering the basic nature of the challenges, perhaps it should be no surprise that 66 percent of the respondents believed that the effectiveness of the data quality efforts in their organization was only OK (some goals were met) or poor (few goals were met). Ironically, 70 percent claimed that their organizations recognized that data and information were important strategic assets and managed them with that in mind.

So what is driving companies to improve their data quality efforts? According to the survey, the top driver is just a general desire to improve the quality of data, which was cited by 68 percent of the respondents. Other important motivations to improve data quality were the desire to improve business intelligence, and compliance and legal considerations.


How to Get More Email Delivered into Inboxes

By Abby Garcia Telleria

In this recently released article from Internet Retailer, Paul Demery, IR's chief tech editor, talks about why eliminating invalid email addresses from your database is critical to the success of your campaign. It's a great story, and offers tips on how you can improve email deliverability to ensure it gets into the right inboxes.

Also, in case you're interested - Melissa Data offers an email verification solution that can fix common typographical errors to turn invalid email addresses into deliverable ones.

Get a free trial here.

Using Data Quality Tools for Classification

By David Loshin

Hierarchical classification schemes are great for scanning through unstructured text to identify critical pieces of information that can be mapped to an organized analytical profile. To enable this scanning capability, you will need two pieces of technology.

The first involves a text analysis methodology for scanning text and determining which character strings and phrases are meaningful and which ones are largely noise.

The second capability maps identified terms and phrases to existing known hierarchies and performs the classification.

Both of these techniques would work perfectly as long as the input data is always correct and complete - quite an assumption. That is why we need to augment these approaches with data quality techniques, largely in the area of data validation and data standardization/correction. For example, I am particularly guilty of character transposition when I type, and am as likely to tweet about my "Frod F-150" as I would about my "Ford F-150." In this example, the inexact spelling would lead to a failure to classify my automobile preference.

However, using data quality tools, we can create a knowledge base of standard transformations that map common error schemes to their most appropriate matches. Creating a transformation rule mapping "Frod F-150" to "Ford F-150" would suggest the likely intent, supplementing the classification process.
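A transformation knowledge base of this kind can be sketched in a few lines of Python. This is only an illustration, not any particular product's implementation: the `TRANSFORMATIONS` table, its entries other than the "Frod F-150" rule, and the `standardize` function are assumptions made up for the example.

```python
import re

# Hypothetical knowledge base: known error patterns (lowercase) mapped to
# the standard form they most likely intend.
TRANSFORMATIONS = {
    "frod f-150": "Ford F-150",
    "ford f150": "Ford F-150",
    "chevy silverdo": "Chevy Silverado",
}


def standardize(text: str) -> str:
    """Replace every known error pattern in the text with its standard form."""
    result = text
    for wrong, right in TRANSFORMATIONS.items():
        # Match the phrase case-insensitively, and only when it is not
        # embedded inside a longer word.
        pattern = re.compile(
            r"(?<!\w)" + re.escape(wrong) + r"(?!\w)", re.IGNORECASE
        )
        result = pattern.sub(right, result)
    return result


print(standardize("Just washed my Frod F-150"))
```

Running the standardization step before classification means the classifier only ever sees the corrected term. A hand-built table like this only covers errors someone has already catalogued, so it supplements the classification process rather than replacing it.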

In other words, integrating our text analytics tools with more traditional data quality methodology will not only (yet again) reduce inconsistency and confusion, it will also enhance the precision for analytical results and enable more robust customer profiling - a necessity for customer centricity.

By Allison Moon
Data Quality Analyst

In today's e-commerce environment, Web forms and online shopping carts capture valuable contact data, but this data often contains inconsistencies or missing or incorrect information.


Fortunately, Melissa Data offers a solution. With a partially entered address line and ZIP Code™, our new auto-complete feature in Address Object can retrieve all possible address entries for a user to select from. It's simple to use, and can be easily integrated into your existing solution with just a few steps. Address Object is Melissa Data's address verification solution (available as a multiplatform API or Web service).

Ensuring accuracy before bad data enters your CRM systems will prevent your company from dealing with lost revenue, lost time, inefficiencies and waste.

Uses and Benefits of Implementing Auto-Completion

So how can you take advantage of this new feature? For those who need to save as much time and as many keystrokes as possible, the auto-complete feature is a pretty awesome tool. Having the ability to retrieve a list of suggestions based upon the street number, and even just the first couple of letters of the street name, saves the time it takes to type out an entire address.

The auto-complete functionality can also help find the correct suites and valid ranges for a building. In a call center setting, auto-complete can allow you to see whether the customer on the phone has forgotten to mention their apartment information, before the call has ended.

Or perhaps you're on the phone with a customer and quickly scribble down their mailing address. But now, when you look back at your notes, it's hard to read except for the first few numbers and characters (and who hasn't done this before?). With auto-complete, you can plug in the information (that you can decipher from your notes) and determine which address you meant to write down by looking at the suggestions returned.

Auto-completion is flexible enough to accommodate varying needs and design requirements, reduce the time spent finding addresses, and prevent issues by returning valid addresses given an incomplete one.
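The general idea behind address auto-completion can be sketched as a prefix search within a ZIP Code. The sketch below does not use or represent the Address Object API; the `SAMPLE_ADDRESSES` table, its made-up addresses, and the `suggest` function are all assumptions for illustration, and a real service would search a full postal reference database.

```python
# Hypothetical reference data: valid address lines keyed by ZIP Code.
# These addresses are invented for the example.
SAMPLE_ADDRESSES = {
    "12345": [
        "100 Main St",
        "100 Main St Apt 2",
        "100 Maple Ave",
        "205 Oak Dr",
    ],
}


def suggest(partial: str, zip_code: str, limit: int = 5) -> list[str]:
    """Return valid addresses in the ZIP Code that start with the partial input."""
    # Normalize case and collapse repeated whitespace in the typed fragment.
    prefix = " ".join(partial.lower().split())
    candidates = SAMPLE_ADDRESSES.get(zip_code, [])
    matches = [addr for addr in candidates if addr.lower().startswith(prefix)]
    return sorted(matches)[:limit]


for address in suggest("100 Ma", "12345"):
    print(address)
```

Note that the suggestion list includes the apartment-level entry, which is how a call center agent could spot that the caller left out a unit number.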

-- Allison Moon is Melissa Data's data quality analyst and software engineer.

To download a free trial of Address Object, please go to:
http://www.melissadata.com/free-trials/address-object-address-verifier.htm

Melissa Data will demonstrate new "Golden Record" functionality in its MatchUp Component for SQL Server Integration Services (SSIS) at Enterprise Data World (EDW). MatchUp SSIS is a powerful tool for advanced matching and deduplication management. By integrating the Golden Record selection tool, MatchUp SSIS represents an industry breakthrough based on its ability to discern contact data information, and select the surviving record based on the level of quality of the information provided.

Melissa Data will also showcase its popular collection of contact data quality and integration solutions. The event will be held April 28 through May 2 in San Diego, Calif. at the Sheraton Hotel and Marina.

Come stop by Booth #412 and say hi!


Classifying Data Quality Problems

By Elliot King

Data quality is generally most fruitfully defined in the context of its use. Is the data good enough to allow the process with which it is associated to run efficiently and effectively? For example, is the mailing list you are using for a direct solicitation accurate enough that you can achieve your goals and not generate any unwanted and unanticipated negative consequences?

And while that definition may be good enough in a practical sense for specific issues, it really isn't good enough to diagnose the sources of data quality problems generally. Constructing a general framework for data quality problems can be a useful guide in better identifying and resolving specific issues.

One of the earliest efforts to better understand the nature of data quality problems calls for classifying problems into three general categories--operational, conceptual and organizational. Operational data quality issues are those that are generated through problems with data capture and transmission. Inaccurate data is collected. Data may be missing. Or data may be corrupted through some process, for example.

Conceptual data quality problems occur when data is not well defined or it is inappropriate for its intended use. One of the most famous examples of a conceptual data quality problem (though it is not often thought of in this way) was brought to light in the movie Moneyball.

The basic thrust of the movie was not that the information old-time baseball scouts used to evaluate players was wrong per se; it was that they were collecting the wrong data to identify productive players. Batting average, for example, is less useful in determining a player's value than on-base percentage. A pressing new conceptual data problem is the attempt to use electronic patient records to judge medical treatment outcomes.

When operational and conceptual data problems persist over time despite repeated attempts to fix them, organizational data quality problems are usually the culprit. In these cases, wrong, missing and invalid data is not really the problem, but the symptom. Something has to be fixed in the organizational structure or culture.

The point is this--data can be wrong for many reasons and it can't fundamentally be fixed without a general understanding of the error's cause.

By Allison Moon
Data Quality Analyst

In the past year, we have ventured into international waters with our Global Address Verification Web service, but our international services haven't stopped there. Our new Global Phone Object will now verify if an international phone number is accurate, and determine whether the number is valid for over 200 countries and territories.

This new interface in the current Phone Object takes in a phone number and country to determine whether the number is valid for the region. It can also take in a phone number alone and determine what country the number is from - by simply looking at the international access code (these are the digits used to dial into a country).

For example, with +49 3079788829, the Global Phone Object will be able to determine that this number is from Germany, just by looking at the international access code, +49.

In addition, Global Phone Object will also return international geographical information. Just like the current Phone Object, the Global Phone Object will return the latitude and longitude coordinates of the detected region.

It can also return the predominant language spoken in the phone number's region. This field is helpful in preventing phone calls from being lost in translation. By determining the predominant language of the region, phone calls in a call center environment can be distributed to employees who are fluent in the detected language.

The Global Phone Object is a powerful tool that can be used in a multitude of ways to bridge international communication.

-- Allison Moon is a data quality analyst and software engineer for Melissa Data.

-- For more info on Global Phone Object, go to:
http://www.melissadata.com/phone-verification/index.htm


How Good is Your Address Data? Are You Really Sure?

What do you do when you're faced with verifying and correcting international addresses? Melissa Data and Loqate tackle the issues that arise from different languages and character sets, the more than 130 address formats worldwide, and the many diverse sources of data. Find out how better data quality can affect your customer retention, satisfaction and revenue; how global address verification and geocoding work; and much more! Melissa Data will showcase Contact Zone, its one-click data quality software solution for easily cleaning your data in leading CRM platforms such as Salesforce. Webinar starts April 16 at 11:00 AM EST.


Understanding Hierarchies

By David Loshin

Defining standards for group classification helps in reducing confusion due to inconsistencies across generated reports and analyses. In the automobile classification example we have been using for the past few posts, we might pick the NHTSA values (mini passenger cars, light passenger cars, compact passenger cars, medium passenger cars, heavy passenger cars, sport utility vehicles, pickup trucks, and vans) as the standard.

Yet, as more organizations look to merge data sets and feeds from different sources, some challenges remain, particularly with the use of unstructured text (such as that presented via Twitter or Facebook). People cannot be expected to always conform to your organization's data standards, and they often use colloquial terms or their own words to describe ideas that would map to your own dimensional values.

For example, if you wanted to filter out the individuals who prefer to drive "pickup trucks" (one of our standard values), it is not enough to scan for that phrase. Many individuals will refer to their pickup truck using different terms, such as a make and model ("Ford F-150," "Chevy Silverado") or a different name ("light truck") or a nickname ("baby monster"), but these terms have to be linked to the overall classification term.

This is an example of a simple hierarchy, in which one concept ("automobiles") is divided into a collection of smaller classes (the NHTSA classifications). Each of those classes in turn contains other phrases and terms. Within each of those included collections, there may be other inclusive categorization, such as by make and then model.

With a well-defined hierarchy for classification, unstructured text can be scanned for matches with values that live within the hierarchy, and that enables the standardized classification. To round out the example, a tweet exclaiming the author's love of "driving his Ford F-150" can be scanned and the model name extracted and located within the make and model hierarchy for pickup trucks, thereby allowing us to register his/her automobile driving preference!
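The scan-and-locate step can be sketched as follows. This is a simplified illustration under stated assumptions: the `HIERARCHY` contents are a small hand-picked sample (class, then make, then model), and `build_index` and `classify` are hypothetical helpers that use plain substring matching instead of a real text analysis engine.

```python
# Hypothetical slice of the hierarchy: standard class -> make -> models.
HIERARCHY = {
    "pickup trucks": {
        "Ford": ["F-150"],
        "Chevy": ["Silverado"],
    },
    "vans": {
        "Ford": ["Transit"],
    },
}


def build_index(hierarchy: dict) -> dict:
    """Flatten the hierarchy into a lookup from 'make model' to its full path."""
    index = {}
    for vehicle_class, makes in hierarchy.items():
        for make, models in makes.items():
            for model in models:
                index[f"{make} {model}".lower()] = (vehicle_class, make, model)
    return index


def classify(text: str, index: dict) -> list[tuple[str, str, str]]:
    """Return the hierarchy path of every known make and model found in the text."""
    lowered = text.lower()
    return [path for phrase, path in index.items() if phrase in lowered]


index = build_index(HIERARCHY)
print(classify("I love driving my Ford F-150", index))
```

Each match carries its full path, so the tweet is registered under the standard value "pickup trucks" even though that phrase never appears in the text. Colloquial terms such as "light truck" could be handled by adding alias entries to the same index.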

Your Data Quality Scorecard: The 7 Cs of Data Quality

Do you know how accurate your company's data is? How much bad data is costing your organization? How you can get a single, complete view of your customers? Data quality today is a bottom-line issue that businesses must address to stay competitive. To help business managers and executives, marketing professionals, and other non-tech personnel understand what data quality is and why it's important, we outlined 7 data quality principles in a convenient and easy-to-follow format.