We talked about householding in a previous blog posting, but it is worth
reviewing the basic approaches used for determining that a group of individuals
share a household. The general approach is to analyze a collection of data
records and examine sets of identifying attributes for degrees of similarity in
naming and residence locations. Many situations are relatively straightforward,
such as this example:
Emily S. Hansen, 1824 Polk Ave., Memphis, TN 38177
In this example, two individuals share both a last name and a street address. Although the data does not guarantee that the inference is true, it is reasonable to suggest that, because the family name and the residence location are linked, these two individuals are members of the same household. The algorithm, then, is to link records into a collection based on similarity of the surname and residence characteristics.
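As a rough illustration of that idea, here is a minimal Python sketch that groups records sharing a normalized surname and address. Everything in it is an assumption made for illustration: the function names, the tiny abbreviation table, the rule that the last token of a name is the surname, and every sample record except Emily's. A production matcher would use proper name and address parsing and approximate comparison rather than exact keys.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; a real one would be far larger.
STREET_ABBREVIATIONS = {"avenue": "ave", "street": "st", "road": "rd"}


def normalize_address(address: str) -> str:
    """Lowercase, drop punctuation, and abbreviate common street-type words."""
    tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(STREET_ABBREVIATIONS.get(token, token) for token in tokens)


def surname(full_name: str) -> str:
    """Naive assumption: the last whitespace-separated token is the surname."""
    return full_name.strip().split()[-1].lower()


def household_groups(records):
    """Return lists of names that share a normalized surname and address."""
    groups = defaultdict(list)
    for record in records:
        key = (surname(record["name"]), normalize_address(record["address"]))
        groups[key].append(record["name"])
    # Only keys shared by more than one record suggest a household.
    return [names for names in groups.values() if len(names) > 1]


# Illustrative records; only the first comes from the example above,
# the other two are invented for this sketch.
records = [
    {"name": "Emily S. Hansen", "address": "1824 Polk Ave., Memphis, TN 38177"},
    {"name": "Robert Hansen", "address": "1824 Polk Avenue, Memphis, TN 38177"},
    {"name": "Dana Whitfield", "address": "77 Larch St., Memphis, TN 38177"},
]

households = household_groups(records)
print(households)
```

Note that the two Hansen records only land in the same group because "Avenue" and "Ave." are normalized to the same token first; without that step, an exact comparison of the raw address strings would miss the link.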
However, the concept of grouping is not limited to conventional groups such as households, since many artificial groups form as a result of shared interests or similarities in profile criteria. For example, people interested in certain sports car models often organize "fan clubs," new mothers often organize toddler play groups, and sports team fans are often rabid about their franchise allegiances.
In turn, your company might want to create marketing campaigns that target sets of individuals grouped together by demographic or psychographic attributes. In these cases, you would adjust your algorithms to link records based on similarity of the values in other sets of data attributes.
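One possible way to picture this adjustment is to compare sets of profile attributes instead of names and addresses. The sketch below uses Jaccard similarity (shared attributes divided by total distinct attributes) with an arbitrary threshold of 0.5. The customer IDs, attribute values, threshold, and function names are all hypothetical, and Jaccard is just one of many similarity measures you might choose.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of attributes two profiles have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def linked_pairs(profiles, threshold=0.5):
    """Return pairs of profile IDs whose attribute sets are similar enough."""
    pairs = []
    for id_a, id_b in combinations(sorted(profiles), 2):
        if jaccard(profiles[id_a], profiles[id_b]) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical customer profiles built from demographic and
# psychographic attributes.
profiles = {
    "c1": {"sports cars", "track days", "age 35-44"},
    "c2": {"sports cars", "track days", "age 45-54"},
    "c3": {"toddler play groups", "age 25-34"},
}

pairs = linked_pairs(profiles)
print(pairs)
```

Here c1 and c2 share two of four distinct attributes, so they meet the threshold and are linked, while c3 has nothing in common with either.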
Establishing the link often goes beyond looking at the data that already exists in your data set; you may need to append additional data acquired from alternate sources.
And, interestingly enough, you will need to connect the acquired data to your existing data, which requires yet another record linkage effort. Apparently, understanding customer collectives depends heavily on record linkage. And while linking records is straightforward when all the data values line up nicely, as you might suspect, there are some curious intricacies of linkage when the data is of questionable quality.