<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Melissa Data&apos;s Data Quality Authority</title>
    <link rel="alternate" type="text/html" href="http://blog.melissadata.com/data-quality-authority/" />
    <link rel="self" type="application/atom+xml" href="http://blog.melissadata.com/data-quality-authority/atom.xml" />
    <id>tag:blog.melissadata.com,2008-09-24:/data-quality-authority/2</id>
    <updated>2012-02-08T21:28:17Z</updated>
    <subtitle>The Melissa Data Authority Blog is a forum for data quality professionals to gain and share knowledge about data quality issues that impact data warehousing, data integration, data enrichment, and data management strategies.</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.21-en</generator>

<entry>
    <title>Reducing Data Quality Risks</title>
    <link rel="alternate" type="text/html" href="http://blog.melissadata.com/data-quality-authority/2012/02/reducing-data-quality-risks.html" />
    <id>tag:blog.melissadata.com,2012:/data-quality-authority//2.166</id>

    <published>2012-02-08T21:23:00Z</published>
    <updated>2012-02-08T21:28:17Z</updated>

    <summary>By Elliot King As Donald Rumsfeld, the former secretary of defense, once famously said, &quot;there are known unknowns and there are unknown unknowns.&quot; In other words, some things we know we don&apos;t know and consequently we can do the research...</summary>
    <author>
        <name>Blog Administrator</name>
        
    </author>
    
        <category term="Analyzing Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Management" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="data quality" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="dataconsolidation" label="data consolidation" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="datacorruption" label="data corruption" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="dataqualityrisks" label="data quality risks" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="elliotking" label="Elliot King" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://blog.melissadata.com/data-quality-authority/">
        <![CDATA[By <a href="http://blog.melissadata.com/data-quality-authority/authors.html" title="Elliot King">Elliot King</a><br><br>

 

<img align="left" src="http://blog.melissadata.com/data-quality-authority/mt-static/support/themes/kingjpg.gif" alt="Elliot King" border="0">

<div style="margin-left:50px;">

As Donald Rumsfeld, the former secretary of defense, once famously said, "there are known unknowns and there are unknown unknowns." In other words, some things we know we don't know, and consequently we can do the research to learn what we need to know. But other times, we don't even know what we don't know. Unknown unknowns present real risks, as Rumsfeld sadly learned.

</div><br>

Data quality can fall into both camps. Sometimes companies understand that the 

quality of their data varies and they have to assess its quality regularly. But 

in many cases, an organization is completely unaware of data problems. So how 

can you mitigate the risks of developing unknown data quality issues? The best 

solution is prevention.<br>

<br>

The most visible cause of data corruption is poor data entry. If there are no 

rules defining how data is entered into your information systems, 

inconsistencies will inevitably fester. <br>

<br>

For example, should name records be required to include Mr., Ms., Mrs., Miss, 

Dr. and so on? Who is a Ms., who is a Mrs., and who is a Miss? Should the 

records include those titles at all? Can the United States be entered in an 

address record as U.S., USA, U.S.A. or the United States? Those choices need to 

be defined and those definitions need to be enforced. The choices should also be 

rational. If you capture more information than you really need, you become more 

vulnerable to data quality issues.<br>

<br>
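
As a minimal sketch of how such a definition might be enforced at entry time, the Python snippet below maps the country spellings mentioned above to one standard form and rejects anything that has not been defined. The standard form ("US"), the function name normalize_country, and the lookup table are illustrative assumptions, not part of any particular product.<br>

```python
# Hypothetical entry rule: every accepted spelling maps to one standard value.
COUNTRY_VARIANTS = {
    "u.s.": "US",
    "usa": "US",
    "u.s.a.": "US",
    "united states": "US",
    "the united states": "US",
}


def normalize_country(raw):
    """Return the standard country code, or raise if the value is not defined."""
    # Lowercase and collapse whitespace before looking the value up.
    key = " ".join(raw.strip().lower().split())
    if key not in COUNTRY_VARIANTS:
        raise ValueError(f"undefined country value: {raw!r}")
    return COUNTRY_VARIANTS[key]


for entered in ["U.S.", "USA", "U.S.A.", "the United States"]:
    print(entered, "->", normalize_country(entered))
```

Rejecting undefined values, rather than storing them as entered, is what keeps new variants from creeping into the system.<br>

<br>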

Data decay is also a serious driver of data corruption. People move and they 

change their names. According to some estimates, name and address data can decay 

at the rate of two percent a month, or roughly 25 percent a year. That kind of decay 

occurs whether you track it or not.<br>

<br>
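
To make the arithmetic concrete, the short sketch below works out what a two percent monthly decay rate means over a year. It assumes the monthly rate is constant, which is a simplification, and the function name is hypothetical. Adding up twelve monthly rates gives 24 percent, while compounding (each month's decay applies only to records that are still accurate) gives somewhat less, so the yearly figure is best read as a round number.<br>

```python
def annual_decay(monthly_rate, compounded=True):
    """Fraction of records expected to go stale over 12 months."""
    if compounded:
        # Each month, only the still-accurate records can decay.
        return 1 - (1 - monthly_rate) ** 12
    # Simple sum of twelve monthly rates.
    return monthly_rate * 12


print(f"simple:     {annual_decay(0.02, compounded=False):.1%}")
print(f"compounded: {annual_decay(0.02):.1%}")
```

<br>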

Other common sources of data corruption are data migration--when information from 

one system or application is moved to another; data mergers--when data is 

combined from different sources into a master file; and data consolidation--when 

companies attempt to eliminate redundant data.<br>

<br>

Companies that are alert to the threats those kinds of processes pose to data 

quality can put safeguards into place to mitigate the problems that might arise. 

But those who are blind to the risks risk being blindsided.<br>

<br>

<br>
]]>
        
    </content>
</entry>

<entry>
    <title>Standardizing Your Approach to Monitoring the Quality of Data</title>
    <link rel="alternate" type="text/html" href="http://blog.melissadata.com/data-quality-authority/2012/02/standardizing-your-approach-to-monitoring-the-quality-of-data.html" />
    <id>tag:blog.melissadata.com,2012:/data-quality-authority//2.165</id>

    <published>2012-02-07T17:12:01Z</published>
    <updated>2012-02-07T17:41:51Z</updated>

    <summary>By David Loshin In my last post, I suggested three techniques for maturing your organizational approach to data quality management. The first recommendation was defining processes for evaluating errors when they are identified. These types of processes actually involve a...</summary>
    <author>
        <name>Blog Administrator</name>
        
    </author>
    
        <category term="Address Standardization" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Analyzing Data" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Cleansing" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Integration" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Management" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Profiling" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="data profiling " scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="data quality" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="dataqualitymanagement" label="data quality management" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="datarules" label="data rules" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="davidloshin" label="David Loshin" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://blog.melissadata.com/data-quality-authority/">
        <![CDATA[By <a href="http://blog.melissadata.com/data-quality-authority/authors.html" title="David Loshin">David Loshin</a><br><br>

<img align="left" src="http://blog.melissadata.com/data-quality-authority/mt-static/support/themes/loshin-big.gif">

 

<div style="margin-left:50px;">In my last post, I suggested three techniques for maturing your organizational approach to data quality management. The first recommendation was defining processes for evaluating errors when they are identified. These types of processes actually involve a few key techniques:<br><br>

</div>

 

<blockquote>

<b>1)</b> An approach to specifying data validity rules that can be 

used to determine whether a data instance or record has an error. This is more 

of a discipline that can be guided by formal representations of business or data 

rules. Often metadata management tools and data profiling tools have 

repositories for capturing defined rules, leading to our next technique...<br>

<br>

<b>2) </b>A method for applying those rules to data. This often will take 

advantage of the operational aspects of a data profiling or monitoring tool to 

validate a data instance against a set of rules. It may also incorporate parsing 

and standardization rules to identify known error patterns.<br>

<br>

<b>3)</b> A means for reporting errors to a data analyst or steward. Some data 

analysis and profiling tools can be configured to automatically notify a data steward 

when a data validity rule is violated. In other situations, the results of 

applying the validation rule can be accumulated in a repository, and a front-end 

reporting tool can be used to provide visualization and notification of errors.<br>

<br>

<b>4) </b>An inventory of actions to take when specific errors occur. As your 

team becomes more knowledgeable about the types of errors that can occur, you 

will also become accustomed to the methods employed for analysis and 

remediation. <br>

</blockquote>
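
The four techniques above can be sketched in a few lines of Python. This is an illustration only: the rule names, the sample records, and the suggested actions are hypothetical, and a real data profiling or monitoring tool would supply its own rule repository and notification mechanism.<br>

```python
import re

# 1) Validity rules: a name plus a predicate that returns True for valid records.
RULES = {
    "zip_is_5_digits": lambda rec: re.fullmatch(r"\d{5}", rec.get("zip", "")) is not None,
    "name_not_blank": lambda rec: bool(rec.get("name", "").strip()),
}

# 4) Inventory of actions to take when a specific rule is violated.
ACTIONS = {
    "zip_is_5_digits": "verify the address against a postal reference file",
    "name_not_blank": "return the record to the data entry team",
}


def validate(records):
    """2) Apply every rule to every record and collect the violations."""
    violations = []
    for index, rec in enumerate(records):
        for rule_name, is_valid in RULES.items():
            if not is_valid(rec):
                violations.append((index, rule_name))
    return violations


def report(violations):
    """3) Format the violations for a data analyst or steward."""
    return [f"record {i}: {rule} failed; action: {ACTIONS[rule]}" for i, rule in violations]


sample = [
    {"name": "Pat Smith", "zip": "92688"},
    {"name": "", "zip": "9268"},
]
for line in report(validate(sample)):
    print(line)
```

Keeping the rules and the actions in plain lookup tables means both can grow as the team learns which errors actually occur.<br>

<br>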

 

In time, the repeated use of tools and the corresponding actions for remediation 

can evolve into standardized methods, which can be documented, published, 

and used as the basis for training data quality analysts. <br>

<br>

<br>

]]>
        
    </content>
</entry>

<entry>
    <title>Reactivity vs. Proactivity</title>
    <link rel="alternate" type="text/html" href="http://blog.melissadata.com/data-quality-authority/2012/02/reactivity-vs-proactivity.html" />
    <id>tag:blog.melissadata.com,2012:/data-quality-authority//2.164</id>

    <published>2012-02-02T22:20:24Z</published>
    <updated>2012-02-02T22:24:27Z</updated>

    <summary>By David Loshin In the past few months, we have looked at technical approaches to data quality and the use of data quality tools to parse, standardize, and cleanse data. In this next series of posts, it is time to...</summary>
    <author>
        <name>Blog Administrator</name>
        
    </author>
    
        <category term="Analyzing Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Management" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="cleansedata" label="cleanse data" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="davidloshin" label="David Loshin" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://blog.melissadata.com/data-quality-authority/">
        <![CDATA[By <a href="http://blog.melissadata.com/data-quality-authority/authors.html" title="David Loshin">David Loshin</a><br><br>

<img align="left" src="http://blog.melissadata.com/data-quality-authority/mt-static/support/themes/loshin-big.gif">

 

<div style="margin-left:50px;">In the past few months, we have looked at technical approaches to data quality and the use of data quality tools to parse, standardize, and cleanse data. In this next series of posts, it is time to look at harnessing the power of these tools and techniques to support a data quality management program. Most organizations are relatively immature when it comes to addressing data quality issues. Some typical behaviors in an immature organization include:

</div>

<br>

 

&nbsp;&nbsp;&nbsp;· Few or no well-defined processes for evaluating the severity or root causes of data issues<br>

&nbsp;&nbsp;&nbsp;· Little or no coordination among those investigating data errors<br>

&nbsp;&nbsp;&nbsp;· Evaluating the same issues multiple times<br>

&nbsp;&nbsp;&nbsp;· Correcting the same errors multiple times<br>

<br>

These are all manifestations of a more insidious problem: knee-jerk reactivity, 

which presumes that addressing the symptoms solves the problem. But in reality, 

applying these bandages to open wounds is merely a temporary fix. This suggests 

that incremental maturation of data quality processes involves transitioning 

from a reactive environment to one that operates within the context of a series 

of policies and controls.<br>

<br>

The manifestations of immaturity listed above point to some fertile areas for improvement, namely:<br>

<br>

&nbsp;&nbsp;&nbsp;· Defining processes for evaluating data errors when they are identified<br>

&nbsp;&nbsp;&nbsp;· Instituting methods for coordinating those evaluations <br>

&nbsp;&nbsp;&nbsp;· Applying corrections once, and only once.<br>

<br>

As a byproduct of coordinating evaluation, your team will be less inclined to 

evaluate the same issues multiple times! In my next set of posts, we will look 

at ideas for each of these suggestions.<br>

<br>

]]>
        
    </content>
</entry>

<entry>
    <title>Four Pillars of Data Quality Improvement</title>
    <link rel="alternate" type="text/html" href="http://blog.melissadata.com/data-quality-authority/2012/01/four-pillars-of-data-quality-improvement.html" />
    <id>tag:blog.melissadata.com,2012:/data-quality-authority//2.163</id>

    <published>2012-01-31T00:19:36Z</published>
    <updated>2012-01-31T00:36:36Z</updated>

    <summary>By Elliot King Almost all data quality management programs have four key elements that serve as the foundations for success--data profiling, data improvement, integration and data augmentation. Put in other words, data quality programs must determine what is broken; fix...</summary>
    <author>
        <name>Blog Administrator</name>
        
    </author>
    
        <category term="Analyzing Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Integration" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Management" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="data profiling " scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="dataqualityprograms" label="data quality programs" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="elliotking" label="Elliot King" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://blog.melissadata.com/data-quality-authority/">
        <![CDATA[By <a href="http://blog.melissadata.com/data-quality-authority/authors.html" title="Elliot King">Elliot King</a><br><br>

 

<img align="left" src="http://blog.melissadata.com/data-quality-authority/mt-static/support/themes/kingjpg.gif" alt="Elliot King" border="0">

<div style="margin-left:50px;">

Almost all data quality management programs have four key elements that serve as the foundations for success--data profiling, data improvement, data integration and data augmentation. Put another way, data quality programs must determine what is broken; fix what can be fixed; consolidate what can be consolidated; and enhance what needs to be enhanced. Sounds easy, right? If only.

</div>

<br>

            Data profiling is the process of identifying which records are "broken." It 

            consists of comparing your actual data to what you think you should have. 

            Since data flows into organizations via so many routes, errors are 

            inevitable. But if you never look for them, you won't know the data is 

            flawed until something unexpected--usually unexpectedly bad--happens.<br>

<br>
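
As a rough illustration of comparing actual data to what you expect, the Python sketch below profiles one column and counts the values that are missing or that break the expected pattern. The column name, the pattern, and the sample values are assumptions made for the example.<br>

```python
import re
from collections import Counter


def profile_column(records, column, expected_pattern):
    """Summarize how a column's actual values compare with an expected pattern."""
    pattern = re.compile(expected_pattern)
    summary = Counter()
    for rec in records:
        value = rec.get(column)
        if value is None or value == "":
            summary["missing"] += 1
        elif pattern.fullmatch(value):
            summary["valid"] += 1
        else:
            summary["invalid"] += 1
    return dict(summary)


customers = [
    {"phone": "949-555-0100"},
    {"phone": "9495550101"},
    {"phone": ""},
    {},
]
print(profile_column(customers, "phone", r"\d{3}-\d{3}-\d{4}"))
```

<br>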

Once you know what's wrong, you can set about fixing it. But like a house that 

is in disrepair, you don't have to do everything at once. You may not want to 

correct some errors at all if they do not have a significant impact. Some 

mistakes in the data may be so fundamental that you simply cannot risk using it 

at all. Sometimes, a record may be incomplete but adding a placeholder--a 

standard substitute value--may be enough. And all the other errors you find, 

well, you will probably want to fix them.<br>

<br>

The next two elements of data quality improvement programs go beyond finding and 

fixing what can be fixed. Many organizations have a boatload of redundant data--a 

single customer's name and address may be stored in numerous different 

databases. Those records should be consolidated. The more places data is stored, 

the greater the odds that inconsistencies will be introduced, and inconsistencies 

inevitably lead to errors.<br>

<br>
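
One simple way to picture consolidation is to group records from several sources under a normalized key. The sketch below is a simplification that assumes an exact match on lowercased, whitespace-collapsed name and address; the source names and records are invented for the example.<br>

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def consolidate(records):
    """Group the sources that hold the same normalized name and address."""
    groups = defaultdict(list)
    for rec in records:
        key = (normalize(rec["name"]), normalize(rec["address"]))
        groups[key].append(rec["source"])
    return dict(groups)


records = [
    {"source": "billing", "name": "Pat Smith", "address": "1 Main St"},
    {"source": "crm", "name": "pat  smith", "address": "1 MAIN ST"},
    {"source": "support", "name": "Lee Wong", "address": "9 Oak Ave"},
]
for key, sources in consolidate(records).items():
    print(key, "stored in", sources)
```

<br>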

Finally, the data you have may not be sufficient to address your business needs. 

Good data quality improvement programs take steps to augment the existing 

corporate information. The more good information you have, the more value you 

can develop from it.<br>

<br>

<br>
]]>
        
    </content>
</entry>

<entry>
    <title>Modeling Issues and Entity Inheritance</title>
    <link rel="alternate" type="text/html" href="http://blog.melissadata.com/data-quality-authority/2012/01/modeling-issues-and-entity-inheritance.html" />
    <id>tag:blog.melissadata.com,2012:/data-quality-authority//2.162</id>

    <published>2012-01-27T17:24:14Z</published>
    <updated>2012-01-27T17:36:52Z</updated>

    <summary>By David Loshin In our last set of posts, we looked at matching and record linkage and how approximate matching could be used to improve the organization&apos;s view of &quot;customer centricity.&quot; Data quality tools such as parsing, standardization, and business-rule...</summary>
    <author>
        <name>Blog Administrator</name>
        
    </author>
    
        <category term="Data Management" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Fuzzy Matching" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Record Linkage" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="approximatematching" label="approximate matching" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="davidloshin" label="David Loshin" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="recordlinkage" label="record linkage" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://blog.melissadata.com/data-quality-authority/">
        <![CDATA[By <a href="http://blog.melissadata.com/data-quality-authority/authors.html" title="David Loshin">David Loshin</a><br><br>

<img align="left" src="http://blog.melissadata.com/data-quality-authority/mt-static/support/themes/loshin-big.gif">

 

<div style="margin-left:50px;">In our last set of posts, we looked at matching and record linkage and how approximate matching could be used to improve the organization's view of "customer centricity." Data quality tools such as parsing, standardization, and business-rule-based record linkage and similarity scoring can help in assessing the similarity between two records. The result of the similarity analysis is a score that can be used to advise about the likelihood of two records referring to the same real-life individual or organization.

</div><br>

One last thought: this approach is largely a "data-centric" activity. What I 

mean is that it looks at and compares two records regardless of where those 

records came from. They might have come from the same data set (as part of a 

duplicate analysis) or from different data sets (for consolidation or general 

linkage). <br>

<br>

But it does not take into consideration whether one data set models "customer" 

data and another models "employee" data. While you may link a customer record 

with an employee record based on a similarity analysis of a set of corresponding 

data attributes, the contexts are slightly different. <br>

<br>

A match across the two data sets is a bit of a hybrid: we have matched the 

individual, but that individual is playing different roles. That introduces a different kind of 

question: are the identifying attributes associated with the "customer" or the 

individual acting in the role of "customer"? The same question applies for 

individual vs. employee. <br>

<br>

And finally, are there attributes of the roles that each individual plays that 

can be used for unique identification within the role context? The answers to 

these questions become important when matching and linkage are integrated as 

part and parcel of a business application (such as the consolidation of data 

being imported into a business intelligence framework). <br>

<br>
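
One possible way to frame these questions in a data model is to keep the attributes that identify the individual separate from the attributes that belong to each role. The Python sketch below is only an illustration; the class names, fields, and sample values are hypothetical.<br>

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """Attributes that identify someone only within one role context."""
    kind: str       # for example "customer" or "employee"
    role_id: str    # for example a customer number or an employee number


@dataclass
class Individual:
    """Attributes that identify the person regardless of role."""
    name: str
    birth_date: str
    roles: list = field(default_factory=list)

    def ids_for(self, kind):
        """Return the role-specific identifiers this person holds for one role."""
        return [role.role_id for role in self.roles if role.kind == kind]


person = Individual("Pat Smith", "1970-01-31")
person.roles.append(Role("customer", "C-1001"))
person.roles.append(Role("employee", "E-77"))
print(person.ids_for("customer"))
```

With this split, a match between a customer record and an employee record links two roles to one individual instead of merging attributes that belong to different contexts.<br>

<br>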

<br>
]]>
        
    </content>
</entry>

<entry>
    <title>It Takes a Team</title>
    <link rel="alternate" type="text/html" href="http://blog.melissadata.com/data-quality-authority/2012/01/it-takes-a-team.html" />
    <id>tag:blog.melissadata.com,2012:/data-quality-authority//2.161</id>

    <published>2012-01-25T22:07:04Z</published>
    <updated>2012-01-25T22:14:18Z</updated>

    <summary>By Elliot King As the cliché has it, data is an organization&apos;s most valuable asset. But the question is--who guards those corporate jewels? Is it the IT staff that is charged with making sure the information infrastructure supports the business...</summary>
    <author>
        <name>Blog Administrator</name>
        
    </author>
    
        <category term="Analyzing Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Steward" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="accuratedata" label="accurate data" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="datasteward" label="data steward" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="elliotking" label="Elliot King" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://blog.melissadata.com/data-quality-authority/">
        <![CDATA[By <a href="http://blog.melissadata.com/data-quality-authority/authors.html" title="Elliot King">Elliot King</a><br><br>

 

<img align="left" src="http://blog.melissadata.com/data-quality-authority/mt-static/support/themes/kingjpg.gif" alt="Elliot King" border="0">

<div style="margin-left:50px;">As the cliché has it, data is an organization's most valuable asset. But the question is--who guards those corporate jewels? Is it the IT staff that is charged with making sure the information infrastructure supports the business correctly? Is it the database developers and administrators who are the front-line data professionals? Is it the business users who need accurate data to make sure tasks are executed as anticipated? Or is it the executive staff, which is in the best position to have a bird's-eye view of the entire operation?</div>

<br><br>

In practice, safeguarding data quality requires an interdisciplinary team 

approach, with different players coming from different parts of the 

organization. As with most teams, you need a team leader or program manager. 

This person is charged with supervising the entire data quality improvement 

program, recommending what resources are needed and where those resources should 

be invested. <br>

<br>

In addition to the program manager, most data quality initiatives require a 

project leader, a person responsible for addressing specific data quality issues 

at hand. Each project team has at least three specific roles that need to be 

filled with representatives from the IT and business staffs. <br>

<br>

The IT professionals must have the technical ability to fix what might be broken 

and the business personnel must serve as the subject matter experts, 

understanding the characteristics the data must have to get the job done. 

The third role is that of the data steward, who sets policies, procedures and 

standards to improve data quality.<br>

<br>

Finally, one last critical role must be filled--executive sponsorship. Those of 

you who are sports fans may have noticed that some teams are good year after 

year while others aren't. The difference is in the ownership (think of the Los 

Angeles Dodgers for a case study in good and bad ownership). A data quality 

improvement team cannot succeed without a strong commitment from the top. <br>

 
<br>
]]>
        
    </content>
</entry>

<entry>
    <title>Approximate Matching</title>
    <link rel="alternate" type="text/html" href="http://blog.melissadata.com/data-quality-authority/2012/01/approximate-matching.html" />
    <id>tag:blog.melissadata.com,2012:/data-quality-authority//2.160</id>

    <published>2012-01-20T21:06:53Z</published>
    <updated>2012-01-20T21:12:31Z</updated>

    <summary>By David Loshin Actually, my first name is not David - that is really my middle name, but it is the given name my parents used when talking to me. This has actually led to a lot of confusion over...</summary>
    <author>
        <name>Blog Administrator</name>
        
    </author>
    
        <category term="Analyzing Data" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Management" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Duplicate Elimination" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Record Linkage" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="algorithms" label="algorithms" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="approximatematching" label="approximate matching" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="davidloshin" label="David Loshin" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="matching" label="matching" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="matchingtechnique" label="matching technique" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="recordduplication" label="record duplication" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://blog.melissadata.com/data-quality-authority/">
        <![CDATA[By <a href="http://blog.melissadata.com/data-quality-authority/authors.html" title="David Loshin">David Loshin</a><br><br>

<img align="left" src="http://blog.melissadata.com/data-quality-authority/mt-static/support/themes/loshin-big.gif">

 

<div style="margin-left:50px;">Actually, my first name is not David - that is really my middle name, but it is the given name my parents used when talking to me. This has actually led to a lot of confusion over the years, especially when confronted with a form asking for my "first name" and my "last name." For official forms (like my driver's license) I use my real first name as my "first name," but for non-official forms I often just use David. The result is that there is inconsistency in my own representation in records across different data systems.

 

</div>

<br>

If we were to rely solely on an exact data element-to-data element match of 

values to determine record duplication, the variation in use of my first or 

middle name would prevent two records from linking. In turn, you can extrapolate 

and see that any variations across systems of what should be the same values 

will prevent an exact match, leading to inadvertent duplication.<br>

<br>

Fortunately, we can again rely on data quality techniques. We have our stand-bys 

of parsing and standardization, which can be enhanced through the use of 

transformation rules to map abbreviations, acronyms, and common misspellings to 

their standard representations - an example might be mapping "INC" and "INC." 

and "Inc" and "inc" and "inc." and "incorp" and "incorp." and "incorporated" all 

to a standard form of "Inc."<br>

<br>
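
A transformation rule of this kind can be as simple as a lookup table. The sketch below assumes matching is case-insensitive and applies only to whole tokens; the function name and the sample company name are hypothetical.<br>

```python
# Variants from the text, stored in lowercase so matching is case-insensitive.
INC_VARIANTS = {"inc", "inc.", "incorp", "incorp.", "incorporated"}


def standardize_company(name):
    """Replace any known variant token with the standard form "Inc."."""
    tokens = name.split()
    return " ".join("Inc." if t.lower() in INC_VARIANTS else t for t in tokens)


for raw in ["Acme INC", "Acme incorp.", "Acme Incorporated"]:
    print(raw, "->", standardize_company(raw))
```

<br>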

We can add to this another tool: approximate matching. This matching technique 

allows for two values to be compared with a numeric score that indicates the 

degree to which the values are similar. An example might compare my last name 

"Loshin" with the word "lotion" and suggest that while the two values are not 

strict alphabetic matches, they do match phonetically. <br>

<br>

There are a number of techniques used for approximate matching of values, such 

as comparing the set of characters, the number of transposed, inserted, or 

omitted letters, different kinds of forward and backward phonetic scoring, as 

well as other more complex algorithms. <br>
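<br>

Two of these techniques can be sketched as follows, assuming plain Levenshtein edit distance for the letter-level comparison and a Jaccard overlap for the character sets. Note that plain Levenshtein counts a transposition as two edits; variants that count it as one exist but are not shown here.<br>

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: inserted, omitted, or substituted letters."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # omit a letter
                            curr[j - 1] + 1,      # insert a letter
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]

def char_set_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the two sets of characters (0.0 to 1.0)."""
    sa, sb = set(a.lower()), set(b.lower())
    if not sa and not sb:
        return 1.0
    return len(sa.intersection(sb)) / len(sa.union(sb))
```

A smaller edit distance and a larger overlap both suggest that two values are more likely to be variants of each other.<br>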

<br>

In turn, we can apply this approximate matching to the entire set of 

corresponding identifying attributes and weight each score based on the 

differentiation factor associated with each attribute. For example, a 

combination of first name and last name might provide greater differentiation 

than a birth date, since there is a relatively limited number of dates on which 

an individual can be born (maximum 366 per year).<br>
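<br>

A minimal sketch of such a weighted score is shown here. The field names and the weights are hypothetical, chosen only to illustrate that the name attributes count for more than the birth date; real weights would come from analyzing your own data.<br>

```python
from difflib import SequenceMatcher

# Hypothetical weights: a higher weight means the attribute
# differentiates records more strongly.
WEIGHTS = {"first_name": 0.3, "last_name": 0.5, "birth_date": 0.2}

def similarity(a: str, b: str) -> float:
    """Approximate match score; a missing value scores 0.0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def weighted_match(rec_a: dict, rec_b: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of the per-attribute scores (0.0 to 1.0)."""
    total = sum(weights.values())
    score = sum(w * similarity(rec_a.get(f, ""), rec_b.get(f, ""))
                for f, w in weights.items())
    return score / total
```

A pair of records would then be treated as a candidate match when the result clears a threshold that you choose and tune.<br>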

<br>

By applying a weighted approximate match to pairs of records, we can finesse the 

occurrence of variations in the data element values that might prevent direct 

matching from working. More on this topic in future posts.<br>

<br>

<br>
]]>
        
    </content>
</entry>

<entry>
    <title>Assessment is the Critical First Step</title>
    <link rel="alternate" type="text/html" href="http://blog.melissadata.com/data-quality-authority/2012/01/assessment-is-the-critical-first-step.html" />
    <id>tag:blog.melissadata.com,2012:/data-quality-authority//2.159</id>

    <published>2012-01-13T19:36:18Z</published>
    <updated>2012-01-13T19:40:41Z</updated>

    <summary>By Elliot King W. Edwards Deming taught us long ago about the virtuous cycle of continual quality improvement--plan for change; execute the change; study the results and then take action to improve the process. But Deming&apos;s PDSA (plan, do, study, act)...</summary>
    <author>
        <name>Blog Administrator</name>
        
    </author>
    
        <category term="Analyzing Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Quality Assessment" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="dataqualityassessment" label="data quality assessment" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="elliotking" label="Elliot King" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://blog.melissadata.com/data-quality-authority/">
        <![CDATA[By <a href="http://blog.melissadata.com/data-quality-authority/authors.html" title="Elliot King">Elliot King</a><br><br>

 

<img align="left" src="http://blog.melissadata.com/data-quality-authority/mt-static/support/themes/kingjpg.gif" alt="Elliot King" border="0">

<div style="margin-left:50px;">

W. Edwards Deming taught us long ago about the virtuous cycle of continual quality improvement--plan for change; execute the change; study the results and then take action to improve the process. But Deming's PDSA (plan, do, study, act) cycle is a generic approach. The cycle has to be modified and customized to address targeted areas for quality improvement.<br>

<br>

</div>

 

The key steps in the virtuous cycle for data quality improvement are 

assessment, measurement, integration, improvement and management. Each process 

is important but assessment is the critical first step.<br>

<br>

Data quality assessment is a multi-pronged exercise and the key is to start at 

the end. What business tasks and processes can be hurt by inaccurate, invalid 

and incomplete data? And in what ways will poor quality data increase costs, 

reduce revenues, hurt efficiencies or otherwise inflict pain on the 

organization? This exercise helps to identify the data sources that should be 

examined.<br>

<br>

After you have determined where to look, you can profile your data to uncover 

anomalies and data flaws and then bring those flaws to the attention of the data 

users. In some cases, data anomalies may be harmless and have little impact on 

actual business activities. In that case, no remedial action is warranted. But 

when poor data quality does interfere with business operations then further 

action is needed.<br>
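<br>

Profiling can start very simply. The sketch below, with the hypothetical function name profile_column, summarizes one column by counting blanks, distinct values, and value "shapes" so that unusual patterns stand out. It is an assumption about what a basic profile might contain, not a description of any particular tool.<br>

```python
from collections import Counter

def profile_column(values):
    """Summarize one column: blanks, distinct values, and shape patterns."""
    def shape(v):
        # Reduce a value to its pattern: digits become 9, letters become A.
        return "".join("9" if c.isdigit() else "A" if c.isalpha() else c
                       for c in v)
    present = [v for v in values if v is not None and v.strip()]
    return {
        "count": len(values),
        "blank": len(values) - len(present),
        "distinct": len(set(present)),
        "patterns": Counter(shape(v) for v in present),
    }
```

Run against a ZIP code column, for example, any pattern other than the expected five digits is an anomaly worth showing to the data's users.<br>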

<br>

The last piece of the assessment puzzle is to correlate the identified data 

issues with performance through a defined set of data quality business rules 

such as completeness, accuracy, and consistency. The rules provide a framework 

within which data quality can be measured.<br>
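<br>

Two of these rules can be turned into simple measurements, sketched below under the assumption that records are plain dictionaries. The function names are hypothetical. Accuracy is left out because measuring it requires a trusted reference source to compare against.<br>

```python
def completeness(records, field):
    """Share of records with a non-empty value for the field."""
    if not records:
        return 1.0
    filled = sum(1 for r in records if str(r.get(field) or "").strip())
    return filled / len(records)

def consistency(records, field, allowed):
    """Share of non-empty values that belong to the allowed set."""
    values = [r[field] for r in records if r.get(field)]
    if not values:
        return 1.0
    return sum(1 for v in values if v in allowed) / len(values)
```

Tracking these numbers over time gives the measurement step of the cycle something concrete to report.<br>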

<br>

The rule of thumb with assessment is relatively easy. First determine where poor 

quality data will have the most impact within your organization. Then figure out 

if it has to be fixed.<br>

<br>
]]>
        
    </content>
</entry>

<entry>
    <title>The Challenge of Identifying Information</title>
    <link rel="alternate" type="text/html" href="http://blog.melissadata.com/data-quality-authority/2012/01/the-challenge-of-identifying-information.html" />
    <id>tag:blog.melissadata.com,2012:/data-quality-authority//2.158</id>

    <published>2012-01-11T18:46:13Z</published>
    <updated>2012-01-11T18:57:52Z</updated>

    <summary>By David Loshin In my last post, I introduced the question of determining which characteristics are used to uniquely differentiate between any pair of records within a data set. The same question is relevant when attempting to match a pair...</summary>
    <author>
        <name>Blog Administrator</name>
        
    </author>
    
        <category term="Analyzing Data" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Integration" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Management" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Record Linkage" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="dataintegration" label="data integration" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="davidloshin" label="David Loshin" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://blog.melissadata.com/data-quality-authority/">
        <![CDATA[By <a href="http://blog.melissadata.com/data-quality-authority/authors.html" title="David Loshin">David Loshin</a><br><br>

<img align="left" src="http://blog.melissadata.com/data-quality-authority/mt-static/support/themes/loshin-big.gif">



<div style="margin-left:50px;">In my last post, I introduced the question of determining which characteristics are used to uniquely differentiate between any pair of records within a data set. The same question is relevant when attempting to match a pair of records as well, once they are determined to represent the same entity. I like to call these "identifying attributes," and the values contained therein I call "identifying information."</div>

<br>Let's look at an example for customer data integration: what data element values 

do I compare when trying to link two records together? Let's start with the 

obvious ones, namely (ha ha) first and last names. Of course, we all know that 

there are certain names that are relatively common - just ask my friend John 

Smith, with whom I worked at one of my earlier jobs. <br>

<br>

But even if you have an uncommon name, you might be surprised. For example, if 

you type in my name ("David Loshin") at Google, you will find entries for me, 

but you will also find entries for a dentist in Seattle and a professor. <br>

<br>

Apparently, first and last names are not enough identifying information for 

distinction. Perhaps there is another attribute we can use? You probably know 

that I have written some books (see http://dataqualitybook.com), so maybe that 

is an additional attribute to be used. But if you go to Amazon and do a search 

for "David Loshin," you will find me, but it turns out the professor has also 

written a book.<br>

<br>

Even an uncommon name such as mine still finds multiple hits, and while 

attempting to add more identifying information can reduce that number of hits, a 

poorly selected set of attributes may still not provide the right amount of 

distinction. It may take a number of iterations to review a proposed set of 

identifying attributes and determine their completeness, density, and accuracy 

before settling on a core set of identifying characteristics to be used for 

comparison.<br>
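<br>

One iteration of that review might look like the sketch below, which reports how complete a proposed attribute set is and how many records it distinguishes uniquely. The function name distinction_report and the record layout are hypothetical.<br>

```python
from collections import Counter

def distinction_report(records, attributes):
    """Measure how well a proposed attribute set tells records apart."""
    keys = [tuple(r.get(a) for a in attributes) for r in records]
    # Only keys with every attribute filled in can be used for comparison.
    complete = [k for k in keys if all(v not in (None, "") for v in k)]
    counts = Counter(complete)
    unique = sum(1 for k in complete if counts[k] == 1)
    return {
        "completeness": len(complete) / len(keys) if keys else 0.0,
        "uniqueness": unique / len(complete) if complete else 0.0,
    }
```

Adding an attribute usually raises uniqueness but can lower completeness, which is why settling on the core set tends to take several passes.<br>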

<br>

One more thing to think about, though. Once you get to the point where you are 

pretty confident that those attributes are enough for differentiation, there is 

one last monkey wrench in the works: even if you had the absolute set of 

identifying attributes, there is no guarantee that the values themselves are 

exact matches! 

<br>

 
<br>

]]>
        
    </content>
</entry>

<entry>
    <title>Entities and their Characteristics</title>
    <link rel="alternate" type="text/html" href="http://blog.melissadata.com/data-quality-authority/2012/01/entities-and-their-characteristics.html" />
    <id>tag:blog.melissadata.com,2012:/data-quality-authority//2.157</id>

    <published>2012-01-09T21:48:49Z</published>
    <updated>2012-01-09T21:55:40Z</updated>

    <summary>By David Loshin How can you tell if two records refer to the same person (or company, or other type of organization)? In our recent posts, we have looked at how data quality techniques such as parsing and standardization help...</summary>
    <author>
        <name>Blog Administrator</name>
        
    </author>
    
        <category term="Analyzing Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Fuzzy Matching" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Record Linkage" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="davidloshin" label="David Loshin" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://blog.melissadata.com/data-quality-authority/">
        <![CDATA[By <a href="http://blog.melissadata.com/data-quality-authority/authors.html" title="David Loshin">David Loshin</a><br><br>

<img align="left" src="http://blog.melissadata.com/data-quality-authority/mt-static/support/themes/loshin-big.gif">

</p>

<div style="margin-left:50px;">How can you tell if two records refer to the same person (or company, or other type of organization)? In our recent posts, we have looked at how data quality techniques such as parsing and standardization help in normalizing the data values within different records so that the records can be compared. But what is being compared? That is the topic of this next set of entries.



</div>
<br>

A simplistic view might suggest that when looking at two records, comparing 

the corresponding values is the best way to start. For example, we might compare 

the corresponding names, telephone numbers, street addresses - stuff that 

usually appears in records representing customers, residences, patients, etc.

<br>

<br>

But the simple concept belies a much more complex question about the attributes 

used to describe the individual as well as differentiate pairs of individuals. 

Much of this issue revolves around the approaches taken for determining what 

characteristics are being managed within a representative record, the motivation 

for including those characteristics, and, importantly, whether those data elements are 

used solely as "attribution" (an additional description of the entity involved) 

or for "distinction" (to help in unique identification).<br>

<br>

More to the point: what are the core data elements necessary for determining the 

uniqueness of a record? We often take for granted the fact that our relational 

models presume one and only one record per entity, and that there might be 

business impacts should more than one entry exist for each individual. <br>

<br>

Yet individual "entities" may exist in multiple data sets, even in different 

contexts. Some characteristics are part and parcel of each entity, while others 

describe the entity playing a particular role. Our upcoming posts are intended 

to consider some of these issues when assessing similarity for record linkage 

and matching.<br>

<br>

]]>
        
    </content>
</entry>

<entry>
    <title>What is a Data Steward and Do You Need One?</title>
    <link rel="alternate" type="text/html" href="http://blog.melissadata.com/data-quality-authority/2011/12/what-is-a-data-steward-and-do-you-need-one.html" />
    <id>tag:blog.melissadata.com,2011:/data-quality-authority//2.155</id>

    <published>2011-12-29T16:24:08Z</published>
    <updated>2012-01-05T22:30:49Z</updated>

    <summary>By Elliot King The metaphor of &quot;ownership&quot; has become popular in organizations and their IT shops. Companies have &quot;application owners&quot; and projects that are &quot;owned&quot; by this or that group. So that raises the question, who &quot;owns&quot; your data? The...</summary>
    <author>
        <name>Blog Administrator</name>
        
    </author>
    
        <category term="Data Governance" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Management" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Steward" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="datasteward" label="data steward" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="elliotking" label="Elliot King" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://blog.melissadata.com/data-quality-authority/">
        <![CDATA[By <a href="http://blog.melissadata.com/data-quality-authority/authors.html" title="Elliot King">Elliot King</a><br><br>

 

<img align="left" src="http://blog.melissadata.com/data-quality-authority/mt-static/support/themes/kingjpg.gif" alt="Elliot King" border="0">

<div style="margin-left:50px;">

The metaphor of "ownership" has become popular in organizations and their IT shops. Companies have "application owners" and projects that are "owned" by this or that group. So that raises the question, who "owns" your data?

 

</div><br>

The right answer is that nobody "owns" the data. Data is a resource that must be

shared across an organization. Data flows from the point of creation--perhaps

capturing contact information on a website or importing a third-party mailing

list--through staging, consumption, storage and archiving. At each step of the

way, a different functional group within an organization has to be able to use

the data in different ways.<br>

<br>

To ensure that data meets the standards needed by each stakeholder in the data

lifecycle, companies have to implement enterprise-wide data management policies

and procedures. A typical policy might say that all contact information must

conform to a specific format. Don't assume that to be the case in your

organization. Unmonitored, your sales department, service organization and

billing department could easily capture names differently. Indeed, in larger

corporations, different sales organizations might have different formats for

names and addresses.<br>
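<br>

A format policy like this can be checked mechanically. The sketch below assumes a hypothetical policy that US phone numbers are stored as NNN-NNN-NNNN; the policy, the pattern, and the function names are illustrations only.<br>

```python
import re

# Hypothetical policy: US phone numbers are stored as NNN-NNN-NNNN.
PHONE_POLICY = re.compile(r"\d{3}-\d{3}-\d{4}")

def conforms(phone: str) -> bool:
    """True if the value already matches the policy format."""
    return PHONE_POLICY.fullmatch(phone) is not None

def to_policy_format(phone: str):
    """Reformat a 10-digit number to the policy format, else return None."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
```

Values that cannot be reformatted automatically are the ones that need a person to review them.<br>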

<br>

Data stewards both develop those policies and create mechanisms to ensure that

the policies are enforced. On the flip side, the data steward should be

accountable for enterprise data quality and the advocate for data quality

initiatives.<br>

<br>

Data stewardship is neither an easy job nor an easy job to fill. The

foundational technical skill is a deep understanding of specific business

functions, the data associated with those functions and the processes that rely

on the data. <br>

<br>

Those technical skills have to be coupled with a strong set of interpersonal

skills as, by definition, data stewardship requires interacting with a wide

range of stakeholders (often including other data stewards). Finally, regardless

of the formal position they hold, data stewards need to be able to establish

their authority as the role sometimes calls for stepping on other people's toes.<br>

<br>

Stewardship is quite different from ownership. But if your organization has

data, it probably needs a data steward.<br>

<br>
]]>
        
    </content>
</entry>

<entry>
    <title>Inferred Knowledge and Customer Intelligence through Matching and Linkage</title>
    <link rel="alternate" type="text/html" href="http://blog.melissadata.com/data-quality-authority/2011/12/inferred-knowledge-and-customer-intelligence-through-matching-and-linkage.html" />
    <id>tag:blog.melissadata.com,2011:/data-quality-authority//2.154</id>

    <published>2011-12-27T17:18:25Z</published>
    <updated>2011-12-27T17:25:04Z</updated>

    <summary>By David Loshin What I have found to be the most interesting byproduct of record linkage is the ability to infer explicit facts about individuals that are obfuscated as a result of distribution of data. As an example, consider these...</summary>
    <author>
        <name>Blog Administrator</name>
        
    </author>
    
        <category term="Data Enhancement" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Management" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Fuzzy Matching" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Record Linkage" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="davidloshin" label="David Loshin" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="matching" label="matching" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="mergepurge" label="merge/purge" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="recordlinkage" label="record linkage" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://blog.melissadata.com/data-quality-authority/">
        <![CDATA[By <a href="http://blog.melissadata.com/data-quality-authority/authors.html" title="David Loshin">David Loshin</a><br><br>

 

<img align="left" src="http://blog.melissadata.com/data-quality-authority/mt-static/support/themes/loshin-big.gif">

 

<div style="margin-left:50px;">What I have found to be the most interesting byproduct of record linkage is the ability to infer explicit facts about individuals that are obfuscated as a result of distribution of data. As an example, consider these records, taken from different data sets:</div>

 

<br>

<blockquote>
<b>A:</b><br>
David<br>
Loshin<br>
301-754-6350<br>
1163 Kersey Rd<br>
Silver Spring<br>
MD<br>
20902<br>
<br>
<b>B:</b><br>
Knowledge Integrity, Inc<br>
1163 Kersey Rd<br>
Silver Spring<br>
MD<br>
20902<br>
<br>
<b>C:</b><br>
H David<br>
Lotion<br>
1163 Kersey Rd<br>
Silver Spring<br>
MD<br>
20902<br>
<br>
<b>D:</b><br>
Knowledge Integrity, Inc.<br>
301<br>
7546350<br>
7546351<br>
MD<br>
20902<br>
<br></blockquote>

 

We could establish a relationship between record A and records B and C because

 

they share the same street address. We could establish a relationship between

 

record B and record D because the company names are the same. <br>

 

<br>

 

Therefore, by transitivity, we can infer a relationship between "David Loshin"

 

and the company "Knowledge Integrity, Inc" (A links to B, B links to D,

 

therefore A links to D). However, none of these records alone explicitly shows

 

the relationship between "David Loshin" and "Knowledge Integrity, Inc" - that is

 

inferred knowledge.<br>
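<br>

The transitive step can be sketched with a small union-find over the four records. The field names, the norm helper, and the linking rule (same street or same company after normalization) are assumptions made for illustration, and the phone fields are left out to keep the example short.<br>

```python
from itertools import combinations

records = {
    "A": {"name": "David Loshin", "street": "1163 Kersey Rd"},
    "B": {"company": "Knowledge Integrity, Inc", "street": "1163 Kersey Rd"},
    "C": {"name": "H David Lotion", "street": "1163 Kersey Rd"},
    "D": {"company": "Knowledge Integrity, Inc."},
}

def norm(value):
    """Lowercase and drop punctuation so "Inc" and "Inc." compare equal."""
    return "".join(c for c in value.lower() if c.isalnum() or c == " ").strip()

def linked(r1, r2):
    """Two records link directly if they share a street or a company."""
    return any(f in r1 and f in r2 and norm(r1[f]) == norm(r2[f])
               for f in ("street", "company"))

def components(recs):
    """Group record ids connected by any chain of direct links."""
    parent = {k: k for k in recs}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for a, b in combinations(recs, 2):
        if linked(recs[a], recs[b]):
            parent[find(a)] = find(b)
    groups = {}
    for k in recs:
        groups.setdefault(find(k), set()).add(k)
    return list(groups.values())
```

Under these rules A and D have no direct link, yet components(records) places them in the same group because each connects to B.<br>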

 

<br>

 

You can probably see the opportunity here - basically, by merging a number of

 

data sets together, you can enrich all the records as a byproduct of exposed

 

transitive relationships. <br>

 

<br>

 

This provides us with one more valuable type of enhancement that record linkage

 

provides. And this is particularly valuable, since the exposure of embedded

 

knowledge can in turn contribute to our other enhancement techniques for

 

cleansing, enrichment, and merge/purge.<br>

 

<br>
]]>
        
    </content>
</entry>

<entry>
    <title>The Ethics of Data Quality</title>
    <link rel="alternate" type="text/html" href="http://blog.melissadata.com/data-quality-authority/2011/12/the-ethics-of-data-quality.html" />
    <id>tag:blog.melissadata.com,2011:/data-quality-authority//2.153</id>

    <published>2011-12-21T21:21:04Z</published>
    <updated>2011-12-21T21:34:16Z</updated>

    <summary>By Elliot King Technical people often don&apos;t seem too interested in ethical issues related to their work. Discussions of right and wrong are often &quot;squishy.&quot; Too frequently, they have no clear answers and the answer can change from one context...</summary>
    <author>
        <name>Blog Administrator</name>
        
    </author>
    
        <category term="Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="dataqualityethics" label="data quality ethics" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="elliotking" label="Elliot King" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://blog.melissadata.com/data-quality-authority/">
        <![CDATA[By <a href="http://blog.melissadata.com/data-quality-authority/authors.html" title="Elliot King">Elliot King</a><br><br>

 

<img align="left" src="http://blog.melissadata.com/data-quality-authority/mt-static/support/themes/kingjpg.gif" alt="Elliot King" border="0">

</p>

<div style="margin-left:50px;">

Technical people often don't seem too interested in ethical issues related to their work. Discussions of right and wrong are often "squishy." Too frequently, they have no clear answers and the answer can change from one context to another. In contrast, technical people like to deal with facts. They like clear outcomes--it worked or it didn't work--without any value judgments attached.</div>

<br>

<br>

Perhaps the most profound ethical discussion associated with a significant 

technical advancement was the one scientists engaged in when they developed the 

atomic bomb. Was it right to contribute to the building of the most destructive 

weapon in human history--a weapon that could destroy the earth? The argument that 

the atomic bomb was the inevitable result of technical advances is just not 

compelling or satisfactory.<br>

<br>

While certainly not as momentous as the debates about the atomic bomb or those 

debated in bioethics, like it or not, data quality professionals face ethical 

questions every day. These questions revolve around privacy, data integrity, 

security, retention, access and so on.<br>

<br>

Take the issue of privacy, for example. As we know, in industries ranging from 

health care to finance, there is a slew of legal standards that companies 

have to meet. But beyond that, companies must decide exactly what data they 

collect about their customers or clients, and why. Should your company routinely 

collect and store social security numbers, for example? If so, why? The question 

is not just one of legal liability but the ethics of putting your customers at 

unnecessary risk. <br>

<br>

Similar sorts of ethical questions can be raised around data retention policies. 

Once again there are legal restrictions--in certain fields, records must be 

retained for legally determined periods of time--but there are also questions of 

right and wrong. Beyond your legal obligations, how much of your data should be 

retained for what period of time and why?<br>

<br>

Incorporating the idea of ethics into your data management decision-making 

processes will help make your decisions more deliberative. Facing ethical 

concerns forces people to confront not only what they are required to do but 

what they should do as well.<br>

<br>

]]>
        
    </content>
</entry>

<entry>
    <title>Record Linkage and Data Enhancement</title>
    <link rel="alternate" type="text/html" href="http://blog.melissadata.com/data-quality-authority/2011/12/record-linkage-and-data-enhancement.html" />
    <id>tag:blog.melissadata.com,2011:/data-quality-authority//2.152</id>

    <published>2011-12-16T17:38:36Z</published>
    <updated>2011-12-16T17:48:20Z</updated>

    <summary>By David Loshin In my last two posts we looked at the distribution of information about entities and the use of record linkage to find corresponding data records in different data sets that can be linked together. Record linkage can...</summary>
    <author>
        <name>Blog Administrator</name>
        
    </author>
    
        <category term="Data Enhancement" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Enrichment" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Management" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Duplicate Elimination" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Record Linkage" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="dataenhancement" label="data enhancement" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="dataenrichment" label="data enrichment" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="davidloshin" label="David Loshin" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="recordlinkage" label="record linkage" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://blog.melissadata.com/data-quality-authority/">
        <![CDATA[By <a href="http://blog.melissadata.com/data-quality-authority/authors.html" title="David Loshin">David Loshin</a><br><br>

<img align="left" src="http://blog.melissadata.com/data-quality-authority/mt-static/support/themes/loshin-big.gif">

<div style="margin-left:50px;">In my last two posts we looked at the distribution of information about entities and the use of record linkage to find corresponding data records in different data sets that can be linked together. Record linkage can be used for a number of processes that we bundle under the concept of "data enhancement," which we'll use to describe any methods for improving the value and usefulness of information. In this post, we'll look at three different types of enhancement:</div><br>

 

<br>

<blockquote>

<strong>· Data cleansing -</strong> The first type of enhancement is relatively straightforward: 

our idea is to link records together for the purposes of cleansing the data, or 

making it more suitable for use. Often, one data set may have a more trustworthy 

representation of an entity, or we may have more than one data set, each 

potentially containing overlapping data elements such as birth date, address, 

telephone number. By linking two different records, you can compare the 

corresponding values, find those that are of better quality (e.g. more complete 

or more current values) and update the "delinquent" record with the higher 

quality values.<br>

<br>

<strong>· Enrichment -</strong> Existing records for entities (such as people or products) can be 

matched against other data sets with additional reference information. For 

example, you might want to match your customer data with a credit bureau's data 

and enrich your own data set with each individual's credit ratings.<br>

<br>

<strong>· Merge/Purge -</strong> Duplicate records entered into one data set often plague the 

business in attempting to actively manage customer accounts. Applying the record 

linkage methodology to the records in a single data set helps find multiple 

records that refer to the same individual. These records can be presented to a 

data analyst, who reviews them, determines the surviving record, and updates that 

record with the highest-quality values.<br>

</blockquote>
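
As a minimal sketch of the cleansing case, assume two records have already been linked and that each field is stored as a (value, as-of date) pair. The field names, the sample records, and the functions pick_better and cleanse are hypothetical, made up for illustration; "better quality" is reduced here to "non-empty, then more current":<br>

```python
from datetime import date


def pick_better(a, b):
    """Return the higher-quality of two (value, as_of_date) pairs.

    Assumption: a non-empty value beats an empty one, and when both
    are non-empty the more recent one wins (ties keep the first).
    """
    value_a, as_of_a = a
    value_b, as_of_b = b
    if not value_a:
        return b
    if not value_b:
        return a
    return a if as_of_a >= as_of_b else b


def cleanse(record, linked):
    """Build a cleansed record from two linked records, field by field."""
    empty = ("", date.min)  # stand-in for a field missing from one record
    fields = sorted(set(record) | set(linked))
    return {f: pick_better(record.get(f, empty), linked.get(f, empty)) for f in fields}


# Hypothetical linked records for the same customer in two data sets.
crm = {
    "phone": ("", date(2010, 1, 5)),
    "address": ("12 Oak St", date(2009, 3, 1)),
}
billing = {
    "phone": ("555-0100", date(2011, 6, 2)),
    "address": ("98 Elm Ave", date(2011, 11, 20)),
}

cleansed = cleanse(crm, billing)
print(cleansed["phone"][0])
print(cleansed["address"][0])
```

In practice the survivorship rule would be richer than this (source trust, format validity, and so on), but the shape stays the same: link first, then compare corresponding values and keep the better one.<br>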

There are many variations on these themes. For example, merge/purge can be used 

for combining customer data sets after a corporate acquisition; enrichment can 

be used to institute a taxonomic hierarchy for customer classification and 

segmentation. Loosening the matching rules for merge/purge can help with a 

process called "householding," which attempts to identify individuals with some 

shared characteristics (such as "living in the same house"). <br>

<br>
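
Householding can be sketched the same way, assuming the loosened rule is simply "same normalized address, any name." The normalization below is deliberately crude, and the function names and sample data are hypothetical:<br>

```python
from collections import defaultdict


def normalize_address(address):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in address.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def household(records):
    """Group names that share a normalized address, ignoring the name itself."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize_address(rec["address"])].append(rec["name"])
    return dict(groups)


# Hypothetical customer records; the first two differ only in formatting.
customers = [
    {"name": "Ann Lee", "address": "12 Oak St."},
    {"name": "Bob Lee", "address": "12  oak st"},
    {"name": "Cy Diaz", "address": "98 Elm Ave"},
]

households = household(customers)
for address, names in households.items():
    print(address, names)
```

A stricter merge/purge rule would also require the names to match; dropping that requirement is what turns duplicate detection into household detection.<br>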

]]>
        
    </content>
</entry>

<entry>
    <title>Who Should Lead Your Data Quality Initiatives</title>
    <link rel="alternate" type="text/html" href="http://blog.melissadata.com/data-quality-authority/2011/12/who-should-lead-your-data-quality-initiatives.html" />
    <id>tag:blog.melissadata.com,2011:/data-quality-authority//2.151</id>

    <published>2011-12-15T17:43:02Z</published>
    <updated>2011-12-15T17:51:38Z</updated>

    <summary>By Elliot King Data and data quality issues touch virtually every part of an organization. Poor data hurts organizational efficiency. It can have a measurable impact on the bottom line. And it can diminish employee morale when employees cannot access...</summary>
    <author>
        <name>Blog Administrator</name>
        
    </author>
    
        <category term="Data Management" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Data Quality" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="data quality" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="dataqualityinitiatives" label="data quality initiatives" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="elliotking" label="Elliot King" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="enterprisedatamanagement" label="enterprise data management" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en-us" xml:base="http://blog.melissadata.com/data-quality-authority/">
        <![CDATA[By <a href="http://blog.melissadata.com/data-quality-authority/authors.html" title="Elliot King">Elliot King</a><br><br>

 

<img align="left" src="http://blog.melissadata.com/data-quality-authority/mt-static/support/themes/kingjpg.gif" alt="Elliot King" border="0">

<div style="margin-left:50px;">

Data and data quality issues touch virtually every part of an organization. Poor data hurts organizational efficiency. It can have a measurable impact on the bottom line. And it can diminish employee morale when employees cannot access the information they need to succeed at their tasks, and the data they do retrieve turns out to be wrong.

</div>

<br>So the easy answer to who should lead data quality initiatives is that the 

CIO should be in charge. In most cases, the CIO is in charge of all enterprise 

IT issues. So to suggest that data quality programs should be supervised by the 

CIO is not saying much, other than that data quality is an important enterprise IT 

issue (which it is but, sadly enough, has to be said).<br>

<br>

For practical purposes, data quality initiatives should be the domain of an 

enterprise data quality team consisting of business leaders, staff, and IT 

personnel. Data quality issues do not exist in a vacuum. They can have a 

concrete impact on real operations and only the people involved in those 

operations can truly understand their severity. The fact is that IT staffs are 

not at the point where data is actually used and often are not present when data 

is created either.<br>

<br>

On the other hand, the IT organization should have the expertise to locate the 

problem data within the overall information infrastructure and the tools to 

correct what is wrong. Generally managed by IT, the enterprise data management 

team should represent all the stakeholders in data quality including the 

marketing and sales organizations, finance, operations and product development.<br>

<br>

While cross-functional teams like this are difficult to manage and sustain, done 

right, the payoff can be significant. Projects can be launched with wider 

corporate support and institutional knowledge about data quality can be 

developed. In short, poor data quality can be seen as everybody's problem and, 

as they say, admitting to having a problem is the first step in fixing it.<br>

<br>

]]>
        
    </content>
</entry>

</feed>

