
Postal Standards and Address Quality - Take 1

By David Loshin

The USPS Postal Standard (Publication 28) provides at least some of the specifications we need for address quality. For example,

 "The Postal Service defines a complete address as one that has all the address elements necessary to allow an exact match with the current Postal Service ZIP+4 and City State files to obtain the finest level of ZIP+4 and delivery point codes for the delivery address."
The next paragraph provides some additional details:

 "A standardized address is one that is fully spelled out, abbreviated by using the Postal Service standard abbreviations (shown in this publication) or as shown in the current Postal Service ZIP+4 file."
Much of the remainder of the document specifies what is valid and what is not, and lists the postal standard abbreviations (as mentioned in the definition of standardized). So an address must be complete, which by definition implies that it can be matched with current Postal Service ZIP+4 and City State files.

This match is to obtain the ZIP+4, so the implication is that verification means that a complete address matches the USPS files and has the correct ZIP+4. The address components must be consistent with the postal standard in terms of valid and invalid values. For example, a street address cannot have a number that is outside the range of recognized numbers (that is, if the USPS file says that Main Street goes from 1-104, an address with 109 Main St is invalid). So validation means that the street address is consistent with what is documented by the USPS files. Standardization is also defined by the above reference: it is spelled out, and uses the USPS standard abbreviations.

In turn, the process for address quality would be to:

1) Ensure the address is complete.
2) Ensure that the address values are valid by checking them against the USPS files.
3) Verify the address's ZIP+4 by matching against the USPS files.
4) Standardize the address according to the USPS standardized abbreviations.
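
The four steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `REFERENCE` table is a made-up stand-in for the USPS ZIP+4 and City State files, the `SUFFIXES` map is a tiny sample of abbreviations in the style of Publication 28, and the names `is_complete`, `standardize_street`, and `check_address` are hypothetical. A production process would rely on a CASS-certified product rather than anything like this.

```python
# Toy stand-in for the USPS ZIP+4 and City State files. Every value here is
# made up for illustration; it is not real postal data.
REFERENCE = {
    ("MAIN ST", "ANYTOWN", "NY"): {"low": 1, "high": 104, "zip4": "12345-6789"},
}

# Tiny sample of street-suffix abbreviations in the style of Publication 28.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}

REQUIRED = ("number", "street", "city", "state")


def is_complete(addr):
    """Step 1: every required element is present and non-blank."""
    return all(str(addr.get(field) or "").strip() for field in REQUIRED)


def standardize_street(street):
    """Step 4 helper: upper-case, drop periods, abbreviate a trailing suffix."""
    words = street.upper().replace(".", "").split()
    if words and words[-1] in SUFFIXES:
        words[-1] = SUFFIXES[words[-1]]
    return " ".join(words)


def check_address(addr):
    """Run the four steps against the toy reference data."""
    if not is_complete(addr):
        return {"status": "incomplete"}

    street = standardize_street(addr["street"])
    city = addr["city"].strip().upper()
    state = addr["state"].strip().upper()
    number = str(addr["number"]).strip()

    # Step 2: the street must be known and the number within its range.
    entry = REFERENCE.get((street, city, state))
    if entry is None or not number.isdigit():
        return {"status": "invalid"}
    if not entry["low"] <= int(number) <= entry["high"]:
        return {"status": "invalid"}

    # Steps 3 and 4: take the ZIP+4 from the matched reference entry and
    # emit the address in standardized form.
    return {
        "status": "ok",
        "zip4": entry["zip4"],
        "standardized": f"{number} {street}, {city} {state} {entry['zip4']}",
    }
```

With this toy data, 109 Main Street is rejected as invalid because the reference range stops at 104, while 12 Main Street is matched, assigned the reference ZIP+4, and returned with the abbreviated suffix.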

By David Loshin

There are all sorts of tools associated with address standardization, cleansing, and validation. As an example, the USPS has a certification program for software vendors, referred to as CASS (Coding Accuracy Support System)™ certification. According to their website,

CASS enables the Postal Service™ to evaluate the accuracy of address matching software programs in the following areas:

(1) five-digit coding
(2) ZIP + 4/ delivery point (DP) coding
(3) carrier route coding
(4) DPV®
(5) DSF2®
(6) LACSLink®
(7) eLOT®
(8) RDI™ products

CASS allows vendors/mailers the opportunity to test their address-matching software packages and, after achieving a certain percentage of compliance, to be certified by the Postal Service. CASS does not measure the accuracy of ZIP + 4 delivery point, five-digit ZIP, or carrier route codes in a mailer's existing files. CASS enables mailers to measure and diagnose internally written, commercially-available, address-matching software packages. The effectiveness of service bureaus' matching software can also be measured.

There are many vendors selling CASS-certified tools and services. Organizations use CASS-certified tools for address standardization, correction, and validation. End of story, right?

Wrong. Some organizations use many CASS-certified tools for address standardization, correction, and validation, at different places along the processing stream. The addresses are standardized, cleansed, and validated (or not) multiple times. The addresses are changed from their original format, manipulated, and then shoved back into the databases, without considering the actual process dependencies or expectations.

And then you end up with a scenario like this: an organization has a process for accepting customer applications, including self-provided addresses, and sends hard copies of acknowledgements to those addresses, yet the process also includes an elaborate mechanism for managing returned mail. That did not make sense to me: if the organization sends acknowledgements to the address the customer provided, wouldn't it trust that the customer provided an accurate address?

In fact, the issue was self-created: the provided address passed through at least three different iterations (with different products!) of standardization, correction, and validation, and was transformed from a deliverable ("accurate") address to an invalid one.

So even though the intent was appropriate, the execution of the process got in the way of the results. So I'll throw out two questions: First, is address standardization and validation a tool or a process? And second, at what point and how frequently in the business process flow should address standardization and validation take place?


Context is Key to Measuring Data Quality

By Elliot King

Beauty is in the eyes of the beholder, but that is not the case when it comes to data quality or, at least, it is not the whole story. Data quality can be measured along several different dimensions. But in the final analysis data quality depends on the context within which the data is used.

Perhaps the most obvious criterion by which to measure data quality is accuracy. Does the customer in your customer database actually reside at the associated address? Is the email address actually correct? It's not hard to imagine a customer record filled with inaccurate data.

The issue of completeness is related to the issue of accuracy. All the information you have may be accurate but you may not have all the information you need, particularly the information you need to be able to link records efficiently. If all you have is a customer's name, obviously that will not be good enough to serve as the foundation for a direct marketing campaign. (The flip side of the completeness equation is important as well. Capturing a lot of superfluous information can be just as problematic as missing information.)

Data can also be measured according to its consistency. For example, are customer accounts activated and deactivated appropriately? It doesn't make much business sense to send a subscription solicitation to somebody who already subscribes to a magazine. But it happens.

Other significant criteria by which the quality of data can be assessed are timeliness and the ability to audit it. Does the data enable people to generate reports according to their deadlines? Do your customer service representatives have the most up-to-date pictures of your customers' latest interactions with your organization? Finally, can data be tracked back to the transactions that generated them?

There are other dimensions along which data quality can be assessed. Are records duplicated? Are records captured according to the specified rules?
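
Two of these dimensions, completeness and duplication, are easy to express as simple measurements. The following is a minimal sketch, assuming records are plain dictionaries; the function names `completeness` and `duplicate_count` and the field names in the usage note are hypothetical, and the duplicate check is a naive exact match on normalized fields rather than real record linkage.

```python
def completeness(records, required):
    """Share of records in which every required field is non-blank."""
    if not records:
        return 0.0
    complete = sum(
        1
        for rec in records
        if all(str(rec.get(field) or "").strip() for field in required)
    )
    return complete / len(records)


def duplicate_count(records, key_fields):
    """Count records whose normalized key repeats an earlier record's key."""
    seen = set()
    duplicates = 0
    for rec in records:
        # Normalize by trimming whitespace and ignoring case (an assumption).
        key = tuple(str(rec.get(f) or "").strip().lower() for f in key_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates
```

For example, `completeness(customers, ("name", "address"))` gives the fraction of customer records usable for a mailing, and `duplicate_count(customers, ("name", "address"))` counts how many records repeat an earlier name and address. Which fields count as required is exactly the context-dependent decision the post describes.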

But the components of data quality are just that--components. Data quality itself is holistic. It allows the processes in which it is used to function efficiently and cost effectively or it doesn't.


By Ira Whiteside

Recently Microsoft released a new Beta Release of SQL Server codenamed "Denali," which included the Data Quality Services (DQS) feature. Here at Melissa Data, we are partners with Microsoft and also participate in the Azure Data Services Market for DQS providing Address Correction references.

Over the next several weeks, we intend to explore the new data quality services capabilities of the upcoming release of SQL Server codenamed Denali. Melissa Data provides additional components that accomplish data quality in the SQL Server and SQL Server Integration Services (SSIS) environment. I have outlined some of the differences below.

Data Quality Services is a service and server-based application relying on a data quality knowledge base, therefore providing a shared knowledge base or data-driven application, most probably available only in the Enterprise Edition.


This will provide a powerful new capability, allowing for the code-free development of data quality capabilities that are accessible in an SSIS environment.

While this is a new set of capabilities provided by Microsoft, many of these capabilities are currently available in Melissa Data's Data Quality Components for SSIS.

The Melissa Data DQC for SSIS are also data driven, share rules and domain knowledge, and allow you to store knowledge in a similar fashion. However, they differ in that they are implemented as SSIS Custom Components, thereby fully leveraging the SSIS pipeline and accessing local data stores.

Similarly, the Melissa Data reference libraries can be stored and accessed locally providing performance gains. They are available for all Editions of SQL Server.
http://www.melissadata.com/data-quality-ssis/index.htm.

In the next blog, we will take a deep dive into the SSIS implementations side-by-side and then move on to the impact on data governance and master data management efforts.

