Recently in Duplicate Elimination Category

Record Matching Made Easy with MatchUp Web Service


MatchUp®, Melissa's solution to identify and eliminate duplicate records, is now available as a web service for batch processes, fulfilling one of the most frequent requests from our customers - accurate database matching without maintaining and linking to libraries or hosting the necessary data files locally.


Now you can integrate MatchUp into any part of your network that can communicate with our secure servers, using common protocols and formats such as REST, SOAP, XML, and JSON.

 

Select a predefined matching strategy, map the table input columns necessary to identify matches to the respective request elements, and submit the records for processing. Duplicate rows can be identified by a combination of NAME, ADDRESS, COMPANY, PHONE and/or EMAIL.
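As an illustration, here is a minimal sketch of what a batch submission over REST with a JSON payload might look like in Python. The endpoint URL, element names, and strategy name are placeholders invented for this example, not the documented MatchUp schema; the licensed endpoint you select defines the real request structure.

```python
# A minimal sketch of submitting a batch of records over REST/JSON.
# The URL, element names, and strategy below are illustrative
# assumptions, not the documented MatchUp request schema.
import requests

MATCHUP_URL = "https://matchup.example.melissadata.net/doMatchUp"  # hypothetical

payload = {
    "CustomerID": "YOUR_LICENSE_KEY",          # issued by the sales team
    "MatchingStrategy": "NameAndAddress",      # a predefined strategy
    "Records": [
        # Map your table's input columns to the request elements.
        {"RecordID": "1", "Name": "John Smith", "Address": "123 Main St",
         "City": "Rancho Santa Margarita", "State": "CA", "Zip": "92688"},
        {"RecordID": "2", "Name": "Jon Smith", "Address": "123 Main Street",
         "City": "Rancho Santa Margarita", "State": "CA", "Zip": "92688"},
    ],
}

response = requests.post(MATCHUP_URL, json=payload, timeout=60)
response.raise_for_status()
results = response.json()
```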

 

Our curated list of matching strategies removes the complexity of configuring rules while still applying our fast, versatile fuzzy matching algorithms and extensive datatype-specific knowledge base, ensuring that tough-to-identify duplicates will be flagged by MatchUp.


The output response returned by the service can be used to update a database or create a unique marketing list: evaluate each record's result codes, group identifier, and group count, and use the record's unique identifier to link back to the original database record.
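A correspondingly hedged sketch of consuming such a response follows: it partitions the returned records into uniques and duplicate groups using the group identifier and group count. The element names and result-code values are invented for illustration, not the documented response schema.

```python
# A sketch of evaluating the service response to split a submitted
# record set into unique rows and duplicate groups. Field names
# (RecordID, GroupID, GroupCount, Results) and the result-code
# values are assumptions, not the documented schema.
sample_response = {
    "Records": [
        {"RecordID": "1", "GroupID": "G1", "GroupCount": 2, "Results": "MS03"},
        {"RecordID": "2", "GroupID": "G1", "GroupCount": 2, "Results": "MS03"},
        {"RecordID": "3", "GroupID": "G2", "GroupCount": 1, "Results": "MS01"},
    ]
}

def split_unique_and_duplicates(records):
    """Partition result records into unique IDs and duplicate groups."""
    uniques, groups = [], {}
    for rec in records:
        if rec["GroupCount"] <= 1:
            uniques.append(rec["RecordID"])
        else:
            # The group identifier ties together all records that matched.
            groups.setdefault(rec["GroupID"], []).append(rec["RecordID"])
    return uniques, groups

uniques, groups = split_unique_and_duplicates(sample_response["Records"])
for group_id, ids in groups.items():
    keep, remove = ids[0], ids[1:]   # link back to source rows by unique ID
    print(f"group {group_id}: keep {keep}, flag {remove} as duplicates")
print(f"unique records: {uniques}")
```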

 

Since Melissa's servers do the processing, there are no key files - the temporary sorting files - to manage, freeing up valuable hardware resources on your local server.

 

To access the MatchUp Web Service, obtain a valid license from our sales team and select the endpoint compatible with your development platform and required request structure here.

MatchUp Now Available in the Cloud


Did you know that most databases contain 8-10% duplicates? These duplicates get in the way of business intelligence and accurate analytics, and can even result in wasted spend and undeliverable mail costs.

 

The solution? MatchUp®! The new addition of a Cloud web service to the current lineup lets you dedupe, household, and fuzzy match from any part of your network that can communicate with our secure servers, using common protocols and formats such as REST, SOAP, XML, and JSON. Select a predefined matching strategy, map the table input columns necessary to identify matches to the respective request elements, and submit the records for processing.


MatchUp's matching strategies remove the complexity of configuring rules, while still applying Melissa's fast and versatile fuzzy matching algorithms and extensive datatype-specific knowledge base, ensuring the tough-to-identify duplicates will always be flagged.

 

Use MatchUp Web Service to:

• Indicate whether a record is unique or matches any other records in a submitted record set

• Create a unique group identifier used to group and query all records that matched each other

• Count the number of records in each matched group

• Aggregate these output properties to indicate the status of a single record, or to clean and evaluate the status of an entire database (see the sketch below)
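A minimal sketch of that last aggregation step, again using invented field names, might look like this:

```python
# A sketch of aggregating the output properties described above to
# gauge the health of an entire table. Field names are illustrative
# assumptions, not the documented MatchUp response schema.
from collections import Counter

def dedupe_summary(records):
    """Return (total, unique, duplicate) counts from match results."""
    groups = Counter(r["GroupID"] for r in records)
    total = len(records)
    # Every group contributes one surviving record; the rest are dupes.
    duplicates = sum(count - 1 for count in groups.values())
    return total, total - duplicates, duplicates

rows = [
    {"RecordID": "1", "GroupID": "G1"},
    {"RecordID": "2", "GroupID": "G1"},
    {"RecordID": "3", "GroupID": "G2"},
]
total, unique, dupes = dedupe_summary(rows)
print(f"{dupes}/{total} rows are duplicates ({100 * dupes / total:.0f}%)")
```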


The MatchUp Web Service is a good fit for customers who don't want to install, host, or maintain local libraries or data files. Unlike other Melissa Web services, MatchUp is batch-oriented; there are no incremental deduping or survivorship capabilities available at this time.

MatchUp Web Service joins our current lineup of MatchUp solutions:

• On-Premise API: Multiplatform support, including Java®, Visual C#®, Windows®, PHP, Oracle®, Linux®, Solaris®, and HP-UX®.

• Integrations/Plugins: Dedupe, consolidate, and create golden records using MatchUp in popular programs including Excel®, Pentaho®, SSIS, and Salesforce®.

• Service Bureau: Send us your file for quick batch deduping with fast turnaround.

 

Try it Free:

http://www.melissa.com/contact-data-verification/global-matching.html

Saved from Nonprofit Nightmares



It was a dark and stormy night. Robin lay in bed tossing and turning, unable to sleep. Their budget was tight, and half the invitations to their latest fundraiser had come back return-to-sender. Too many bad addresses to count. They would have to change venues, maybe even negotiate with new vendors. Robin kept thinking of all the donations they were counting on; they would likely be much less than planned. This was their big event of the year, the culmination of the holiday season when people usually feel generous. But the stress of it all had Robin breaking out in a cold sweat.

How were they to clean their database of old donor contacts? It went back 15 years. Their staff was small and there was no time to handle the day-to-day business and manage these files. Should they delete the whole list and start over? 

Do you feel like Robin? Working in the nonprofit sector can seem like a waking nightmare of returned mail, duplicate contact data, and missing information that leads to loss of donors and funding for important programs. Budgets are tight. Government grants are constantly shrinking. Donations may not have picked back up after the economic downturn. 

Wake up!

Melissa Data offers solutions to reach prospective donors, like our targeted donor, occupant/resident, and saturation mailing lists. Access a vast consumer database to find people with active community involvement matched to the niche you serve. There are people out there hungry to take up your cause.

How much of your mail is returned after a fundraising invitation? And what about the community? Do you find yourself calling upon local volunteers using numbers that are out of service and emails that are undeliverable?

You may not have the time or the staff to go through contact data one by one. You need our point-of-entry global address verification solution, which cleans your donor or alumni database of inaccurate, incomplete, or undeliverable addresses in real time and catches data errors as they occur. Plus, with our change-of-address solutions, when donors move you can update and verify their addresses to maintain contact.

End the struggle of endless duplication. Have you ever received three new donors from one family after a single event? There's no need to send them triplicate documents. With our deduplication solution, every repeated record is eliminated. This cuts down on the cost of printed materials as well as postage.

Push #GivingTuesday harder, broadcast your participation in your community's local GiveBig event, and find new, loyal philanthropists in your community waiting to meet you. Get our Social Media Append service, Melissa Data's answer to improved grassroots campaigning via Facebook, Twitter, and eighteen other social networks. Instead of just a contact's name, phone, and email address, you can better personalize your outreach efforts by enriching your records with social contact information from all 20 networks.

Improve your fundraising efforts now with Melissa Data. Find out how by visiting our website.



New Company Magazine Features Data Quality Insights on Merging Duplicate Patient Records into a Golden Record, as Well as Tips on Improving Healthcare Data Warehousing



Rancho Santa Margarita, CALIF. - September 9, 2014 - Melissa Data, a leading provider of global contact data quality and data enrichment solutions, today announced matching and de-duping functionality that solves the problem of duplicate records for healthcare database administrators (DBAs). Using tools based on proprietary logic from Melissa Data, healthcare DBAs can consolidate duplicate customer records objectively, unlike any other data quality solution. This and other healthcare data quality challenges are featured in Melissa Data Magazine, the company's new quarterly resource for DBAs and data quality developers.

Healthcare data is characterized by a steady stream of patient records and evolving contact points, warranting a smart, consistent method to determine the best contact information. Melissa Data Magazine highlights a new way to merge duplicate records, based on a unique data quality score that retains the best pieces of data from all of the various records.

"It's essential that healthcare data managers acknowledge data quality challenges up front, implementing processes to cleanse and maintain the trustworthiness of the information that goes into their master data systems," said Bud Walker, director of data quality solutions, Melissa Data. "Our new publication outlines how to ensure this high level of data precision, creating an accurate, single view of the patient. This is known as the Golden Record and is of critical value in healthcare settings - reducing costs, streamlining business operations and improving patient care."

Highlighting industry-specific data quality tools and solutions, Melissa Data Magazine will help DBAs and health information managers adapt to evolving challenges, particularly as data becomes more global in nature. Future issues will feature technologies such as SQL Server development tools, and markets such as retail, ecommerce, government, and real estate.

Melissa Data Magazine will be available at the American Health Information Management Association (AHIMA) conference, Booth #723, starting September 27 in San Diego, Calif. Click here to download the healthcare issue of Melissa Data Magazine, or call 1-800-MELISSA (635-4772) for more information.



A 6-Minute MatchUp for SQL Server Tutorial


In this short demo, learn how to eliminate duplicates and merge multiple records into a single, accurate view of your customer - also known as the Golden Record - through a process known as survivorship using Melissa Data's advanced matching tool, MatchUp for SQL Server.

Watch our video to learn more!


Powerful Data Quality Tool Consolidates Duplicates into Single Golden Record; Objective Data Quality Score Uniquely Determines Most Accurate Customer Information



Rancho Santa Margarita, CALIF - May 8, 2014 - Melissa Data, a leading provider of contact data quality and integration solutions, today announced its TechEd 2014 exhibit will feature new matching and de-duplication functionality in the company's MatchUp Component for SQL Server Integration Services (SSIS). Based on proprietary logic from Melissa Data, MatchUp consolidates duplicate customer records objectively, unlike any other data quality solution. Uniquely assessing the quality of individual data fields, MatchUp determines the best pieces of data to retain versus what to discard - enabling a smart, consistent method for data integrators to determine the best customer contact information in every field.

"A single, accurate view of the customer, known as the golden record, is the ideal for any business relying on customer data - reducing waste, optimizing marketing outreach and improving customer service. Yet common methods for matching and eliminating duplicate customer records involve subjective rules that don't consider the accuracy of the data itself," said Bud Walker, director of data quality solutions at Melissa Data. "MatchUp's intelligent rules offer a smarter, more consistent method for determining what information survives in the database and why. It's a critical data quality function that dramatically improves business operations."

MatchUp assesses the content within the customer record, in contrast to matching and de-duplication methods that rely solely on subjective principles, such as whether the record is the most recent, most complete, or most frequent. Instead, the selection criteria for determining a golden record are based on a relevant data quality score, derived from the validity of customer information such as addresses, phone numbers, emails, and names.

Once the golden record is identified intelligently, MatchUp further references the data quality score during survivorship processes to support creation of an even better golden record; duplicate entries are then collapsed into a single customer record while retaining any additional information that may also be accurate and applicable. MatchUp relies on deep domain knowledge of names and addresses for survivorship operations, used to granularly identify matches between names and nicknames, street/alias addresses, companies, cities, states, postal codes, phones, emails, and other contact data components.
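To make the idea of score-based survivorship concrete, here is a minimal sketch in Python. The per-field validity checks and weights are invented for illustration only; Melissa Data's actual scoring logic is proprietary and far more sophisticated.

```python
# A minimal sketch of score-based survivorship: pick the best value for
# each field across a group of duplicates using a field-level quality
# score. The checks and weights here are invented for illustration.
import re

def field_score(field, value):
    """Crude per-field validity score; real systems verify against reference data."""
    if not value:
        return 0.0
    if field == "email":
        return 1.0 if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) else 0.2
    if field == "phone":
        return 1.0 if len(re.sub(r"\D", "", value)) == 10 else 0.2
    # Longer values as a weak proxy for completeness of other fields.
    return 0.5 + min(len(value), 10) / 20

def build_golden_record(duplicates):
    """Merge a group of duplicate records into one, field by field."""
    fields = {f for rec in duplicates for f in rec}
    golden = {}
    for f in fields:
        # Keep the highest-scoring value for each field across all duplicates.
        golden[f] = max((rec.get(f, "") for rec in duplicates),
                        key=lambda v: field_score(f, v))
    return golden

group = [
    {"name": "J. Smith", "email": "jsmith@example", "phone": "949-555-0100"},
    {"name": "John Smith", "email": "jsmith@example.com", "phone": ""},
]
print(build_golden_record(group))  # best value survives in every field
```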

MatchUp is part of Melissa Data's Data Quality Components for SQL Server Integration Services (SSIS), a suite of custom data cleansing transformation components for Microsoft SSIS, used to standardize, verify, correct, consolidate and update contact data for the most effective business communications. The suite further includes selected Community Editions for simple data quality procedures, free to developers and downloadable with no license required. Community Editions include Contact Verify CE for address, phone and name parsing, and email syntax correction; MatchUp CE provides simple matching and de-duplication without advanced survivorship operations, for up to 50,000 records using nine basic matchcodes.

Melissa Data will be demonstrating its MatchUp Component for SSIS at booth #1934 during Microsoft TechEd, May 12-15, 2014 at the George R. Brown Convention Center in Houston, TX. To download a free trial of Melissa Data's MatchUp Component for SSIS, click here; to request access to Melissa Data's free Community Editions, click here or call 1-800-MELISSA (635-4772).



Data Quality Tool Consolidates Duplicates into Single Golden Record of Customer Data; Uniquely Determines Most Accurate Information Based on Objective Data Quality Score


Rancho Santa Margarita, CALIF. - April 23, 2014 - Melissa Data, a leading provider of contact data quality and integration solutions, today announced new matching and de-duplication functionality in its MatchUp Component for SQL Server Integration Services (SSIS), uniquely solving the business challenge of duplicate customer data. Based on proprietary logic from Melissa Data, MatchUp determines the best pieces of data to retain versus what to discard - consolidating duplicate records objectively, unlike any other data quality solution. By assessing the quality of individual data fields, MatchUp enables a smart, consistent method for database administrators (DBAs) to determine the best customer contact information in every field.

"The average database contains 8 to 10 percent duplicate records, creating a significant and costly business problem in serving, understanding and communicating with customers effectively. The ideal is a single, accurate view of the customer - known as a golden record - yet this remains one of the biggest challenges in data quality based on methodologies that don't adequately evaluate the content of each data field. As a result, DBAs either overlook duplicates or consistently struggle with determining what information survives in the database and why," said Bud Walker, director of data quality solutions, at Melissa Data. "By using intelligent rules based on the actual quality of the data, DBAs are much better positioned to retain all the best pieces of information from two or more duplicate records into a single, golden record that provides valuable insight into user behavior and helps boost overall sales and marketing performance."

MatchUp works in sharp contrast to matching and de-duplication methods that rely solely on subjective principles, such as whether the record is the most recent, most complete, or most frequent. Instead, the selection criteria for determining a golden record are based on a relevant data quality score, derived from the validity of customer data such as addresses, phone numbers, emails, and names. Once the golden record is identified intelligently, MatchUp further references the data quality score during survivorship processes to support creation of an even better golden record; duplicate entries are then collapsed into a single customer record while retaining any additional information that may also be accurate and applicable.

Utilizing deep domain knowledge of names and addresses, survivorship operations with MatchUp can granularly identify matches between names and nicknames, street/alias addresses, companies, cities, states, postal codes, phones, emails, and other contact data components.

Melissa Data will be demonstrating its MatchUp Component for SSIS at booth #46 during Enterprise Data World, April 27-May 1, 2014 at The Renaissance Hotel in Austin, TX. To download a free trial of Melissa Data's MatchUp Component for SSIS, click here or call 1-800-MELISSA (635-4772).



Performance Scalability

By David Loshin

In my last post I noted that there is a growing need for continuous entity identification and identity resolution as part of the information architecture for most businesses, and that the need for these tools is only growing in proportion to the types and volumes of data that are absorbed from different sources and analyzed.

While I have discussed the methods used for parsing, standardization, and matching in past blog series, one thing I alluded to a few notes back was the need for increased performance of these methods as data volumes grow.

Let's think about this for a second. Assume we have 1,000 records, each with a set of data attributes that are selected to be compared for similarity and matching. In the worst case, if we were looking to determine duplicates in that data set, we would need to compare each record against all the remaining records. That means doing 999 comparisons 1,000 times, for a total of 999,000 comparisons.

Now assume that we have 1,000,000 records. Again, in the worst case we compare each record against all the others, and that means 999,999 comparisons performed 1,000,000 times, for a total of 999,999,000,000 potential comparisons. So if we scale up the number of records by a factor of 1,000, the number of total comparisons increases by a factor of 1,000,000!

Of course, our algorithms are going to be smart enough to figure out ways to reduce the computational complexity, but you get the idea - the number of comparisons grows quadratically. And even with algorithmic optimizations, the need for computational performance remains, especially when you realize that 1,000,000 records is no longer considered a large number of records - more often we look at data sets with tens or hundreds of millions of records, if not billions.
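To make one such optimization concrete, here is a toy sketch of "blocking", a standard technique in the record-matching literature (and only a generic illustration, not the specific optimization any one product uses): records are compared only within a block that shares a cheap key, so most of the quadratic cross-pairs are never generated.

```python
# A sketch of blocking: only compare records that share a cheap key
# (here, ZIP code plus the first letter of the last name), avoiding
# most of the n*(n-1)/2 exhaustive pairs. Generic illustration only.
from collections import defaultdict
from itertools import combinations

def blocking_key(record):
    return (record["zip"], record["last_name"][:1].upper())

def candidate_pairs(records):
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)
    # Only pair up records inside the same block.
    for block in blocks.values():
        yield from combinations(block, 2)

records = [
    {"last_name": "Smith", "zip": "92688"},
    {"last_name": "Smyth", "zip": "92688"},
    {"last_name": "Jones", "zip": "10001"},
]
pairs = list(candidate_pairs(records))
print(len(pairs))  # 1 candidate pair instead of the 3 exhaustive pairs
```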

In the best scenario, performance scales with the size of the input. New technologies enable the use of high performance platforms, through hardware appliances, software that exploits massive parallelism and data distribution, and innovative methods for data layouts and exchanges.

In my early projects on large-scale entity recognition and master data management, we designed algorithms that would operate in parallel on a network of workstations. Today, these methods have been absorbed into the operational fabric, in which software layers adapt in an elastic manner to existing computing resources.

Either way, the demand is real, and the need for performance will only grow more acute as more data with greater variety and diversity is subjected to analysis. You can't always just throw more hardware at a problem - you need to understand its complexity and adapt the solutions accordingly. In future blog series, we will look at some of these issues and ways that new tools can be adopted to address the growing performance need.


Structural Differences and Data Matching

By David Loshin

Data matching is easy when the values are exact, but there are different types of variation that complicate matters. Let's start at the foundation: structural differences in the ways that two data sets represent the same concepts. For example, early application systems used data files that were relatively "wide," capturing a lot of information in each record, but with a lot of duplication.

More modern systems use a relational structure that segregates unique attributes associated with each data concept - attributes about an individual are stored in one data table, and those records are linked to other tables containing telephone numbers, street addresses, and other contact data.

Transaction records refer back to the individual records, which reduces the duplication in the transaction log tables.

The differences are largely in the representation. The older system might have a field for a name, a field for an address, and perhaps a field for a telephone number, while the newer system might break the name field into first, middle, and last name; the address into fields for street, city, state, and ZIP code; and the telephone number into fields for area code and exchange/line number.

These structural differences become a barrier when performing record searches and matching. The record structures are incompatible: a different number of fields, different field names, and different precision in what is stored.

This is the first opportunity to consider standardization: if structural differences affect the ability to compare a record in one data set to records in another data set, then applying some standards to normalize the data across the data sets will remove that barrier. More on structural standardization in my next post.
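As a small preview, here is a sketch of the idea: project both layouts onto one shared schema so records become directly comparable. The parsing is deliberately naive and the field names are my own; production tools use full name and address parsers.

```python
# A minimal sketch of structural standardization: map records from a
# "wide" legacy layout and a normalized layout onto one shared schema.
# The one-line parsing below is deliberately naive; real tools use
# full name/address parsers backed by reference data.
def from_legacy(rec):
    """Legacy layout: one name field, one address field."""
    first, _, last = rec["name"].partition(" ")
    street, city, state_zip = [p.strip() for p in rec["address"].split(",")]
    state, zip_code = state_zip.split()
    return {"first": first, "last": last, "street": street,
            "city": city, "state": state, "zip": zip_code}

def from_modern(rec):
    """Modern layout: already split into discrete fields."""
    keys = ("first", "last", "street", "city", "state", "zip")
    return {k: rec[k] for k in keys}

legacy = {"name": "David Loshin", "address": "1 Main St, Anytown, MD 21000"}
modern = {"first": "David", "last": "Loshin", "street": "1 Main St",
          "city": "Anytown", "state": "MD", "zip": "21000"}

assert from_legacy(legacy) == from_modern(modern)  # now directly comparable
```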

By David Loshin

One of the most frequently performed activities associated with customer data is searching - given a customer's name (and perhaps some other information), looking up that customer's records in databases. And this leads to an enduring challenge for data quality management, which supports finding the right data through record matching, especially when you don't have all the data values, or when the values are incorrect.

When applications allow free-formed text to be inserted into data elements with ill-defined semantics, there is the risk that the values stored may not completely observe the expected data quality rules.

As an example, many customer service representatives may expect that if a customer calls the company, there will be a record in the customer database for that customer. If for some reason, though, the customer's name is not entered exactly the same way as presented during a lookup, there is a chance that the record won't be found. This happens a lot with me, since I go by my middle name, "David," and often people will shorten that to "Dave" when entering data, so when I give my name as "David" the search fails because there is no exact match.
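Here's a toy sketch of how a lookup can tolerate exactly this kind of variation by canonicalizing nicknames before comparing. The tiny nickname table is illustrative only; real matching engines ship extensive name knowledge bases.

```python
# A sketch of a lookup that tolerates the "David" vs. "Dave" problem
# by canonicalizing nicknames before comparing. The nickname table is
# a toy stand-in for a real name knowledge base.
NICKNAMES = {"dave": "david", "davey": "david",
             "bob": "robert", "bill": "william"}

def canonical(name):
    n = name.strip().lower()
    return NICKNAMES.get(n, n)

def find_customer(first_name, customers):
    target = canonical(first_name)
    return [c for c in customers if canonical(c["first"]) == target]

customers = [{"first": "Dave", "last": "Loshin"}]
print(find_customer("David", customers))  # exact match fails; canonical match hits
```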

The same scenario takes place when the customer herself does not recall the data used to create the electronic persona - in fact, how many times have you created a new online account when you couldn't remember your user id? Also, it is important to recognize that although we think in terms of interactive lookups of individual data, a huge amount of record matching is performed as bulk operations, such as mail merges, merging data during corporate acquisitions, eligibility validation, claims processing, and many other examples.

It is relatively easy to find a record when you have all the right data. As long as the values used for search criteria are available and exactly match the ones used in the database, the application will find the record. The big differentiator, though, is the ability to find those records even when some of the values are missing, or vary somewhat from the system of record. In the next few postings we'll dive a bit deeper into the types of variations and then some approaches used to address those variations.
