By David Loshin
We can state the challenge of conflicting data quality rules more formally. We have two rules, R1 and R2, that take the same input X:
• R1: Transform string X into string Y1
• R2: Transform string X into string Y2
When Y1 and Y2 differ, the rules conflict, because X cannot be transformed into both.
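To make the formal statement concrete, here is a minimal Python sketch that finds such conflicts. It assumes, hypothetically, that each rule is a literal (name, input, output) triple matched on exact strings; the function name `find_conflicts` and this rule representation are illustrative, not part of any particular data quality tool.

```python
from collections import defaultdict

# Hypothetical rule representation: (rule name, input string X, output string Y).
Rule = tuple[str, str, str]


def find_conflicts(rules: list[Rule]) -> dict[str, list[tuple[str, str]]]:
    """Return each input X that two or more rules transform into different outputs."""
    by_input: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for name, x, y in rules:
        by_input[x].append((name, y))

    conflicts = {}
    for x, targets in by_input.items():
        distinct_outputs = {y for _, y in targets}
        # Same input X, more than one distinct output Y: the rules conflict.
        if len(distinct_outputs) > 1:
            conflicts[x] = targets
    return conflicts


rules = [
    ("R1", "St.", "SAINT"),
    ("R2", "St.", "STREET"),
    ("R3", "Ave", "AVENUE"),
]
print(find_conflicts(rules))  # {'St.': [('R1', 'SAINT'), ('R2', 'STREET')]}
```

Exact-string matching is the easy case. Once rules use patterns or context, two rules can overlap on some inputs without sharing an identical input string, which is one reason conflicts become harder to spot as the rule set grows.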
It is easy to see this conflict using the simple examples in my previous posts, but in fact, as your data cleansing rule set grows, not only does the potential for introducing conflicting rules increase, but your ability to find them diminishes.
There are a couple of approaches to addressing this challenge. The first is to differentiate the cleansing rules more finely through the use of contextual cues. In our example, we might look at these conflicts:
1. St is transformed into SAINT
2. St. is transformed into SAINT
3. St is transformed into STREET
4. St. is transformed into STREET
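One way to see why these unconstrained rules cause trouble: in a simple cleanser that applies the first matching rule to each token (an assumed design, chosen only for illustration), the conflict is silently resolved by rule order. The names `cleanse`, `RULES_SAINT_FIRST`, and `RULES_STREET_FIRST` are hypothetical.

```python
# Hypothetical context-free rules in priority order: (input token, output token).
RULES_SAINT_FIRST = [("St", "SAINT"), ("St.", "SAINT"), ("St", "STREET"), ("St.", "STREET")]
RULES_STREET_FIRST = [("St", "STREET"), ("St.", "STREET"), ("St", "SAINT"), ("St.", "SAINT")]


def cleanse(address: str, rules: list[tuple[str, str]]) -> str:
    """Replace each whitespace-separated token using the first rule that matches it."""
    out = []
    for token in address.split():
        for x, y in rules:
            if token == x:
                out.append(y)
                break
        else:
            out.append(token)  # no rule matched; keep the token as-is
    return " ".join(out)


print(cleanse("St. Lawrence St.", RULES_SAINT_FIRST))   # SAINT Lawrence SAINT
print(cleanse("St. Lawrence St.", RULES_STREET_FIRST))  # STREET Lawrence STREET
```

Whichever pair comes first wins every time and the other two rules never fire, so neither ordering produces the intended SAINT Lawrence STREET.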
and introduce contextual constraints:
1. St is transformed into SAINT at the beginning of a street name
2. St. is transformed into SAINT at the beginning of a street name
3. St is transformed into STREET at the end of a street name
4. St. is transformed into STREET at the end of a street name
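This approach addresses the problem in some cases, but the issue resurfaces as soon as a new context appears. In a string like "Trevor St. Lawrence St.", the first "St." is neither at the beginning nor at the end of the street name, so none of the contextual rules applies to it, and yet another contextual rule would be necessary.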
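The contextual constraints can be sketched the same way. This version assumes, hypothetically, that a street name is split on whitespace and that "beginning" and "end" mean its first and last tokens; `CONTEXT_RULES`, `position_of`, and `cleanse_in_context` are illustrative names. Running it on "Trevor St. Lawrence St." reproduces the gap: the first "St." sits in the middle of the name, so no rule touches it.

```python
# Hypothetical contextual rules: (input token, position in street name, output token).
CONTEXT_RULES = [
    ("St", "first", "SAINT"),
    ("St.", "first", "SAINT"),
    ("St", "last", "STREET"),
    ("St.", "last", "STREET"),
]


def position_of(index: int, length: int) -> str:
    """Classify a token index as the first, last, or a middle token of the name."""
    if index == 0:
        return "first"
    if index == length - 1:
        return "last"
    return "middle"


def cleanse_in_context(street_name: str) -> str:
    tokens = street_name.split()
    out = []
    for i, token in enumerate(tokens):
        pos = position_of(i, len(tokens))
        replacement = next(
            (y for x, where, y in CONTEXT_RULES if x == token and where == pos),
            token,  # no contextual rule covers this token/position; leave it unchanged
        )
        out.append(replacement)
    return " ".join(out)


print(cleanse_in_context("St. Lawrence St."))         # SAINT Lawrence STREET
print(cleanse_in_context("Trevor St. Lawrence St."))  # Trevor St. Lawrence STREET
```

Handling the middle "St." would need a new rule keyed on yet another context, which is exactly how the rule set keeps growing.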