Recently in Data Integration Category

PentahoWorld Presentation Details Data Blending Practices to Ensure Meaningful Intelligence


Rancho Santa Margarita, CALIF - October 1, 2014 - Melissa Data, a leading provider of contact data quality and data enrichment solutions, today advocated data quality practices as critical to maintaining the validity and power of Big Data analytics. The firm will provide details in a PentahoWorld training presentation slated for October 10 at 8:00 a.m. as part of the conference's Big Data Today breakout session. Melissa Data's training session will examine the essential correlation between reliable data and authoritative analytics, including the Big Data imperative of standardizing and validating distinct customer records from aggregated, unstructured data.

Analytics is the leading use case for Big Data - it capitalizes on the ability to process huge datasets quickly and economically, yet Big Data simultaneously introduces data quality challenges. If garbage in the form of bad data is fed through an enterprise's Big Data machine, the resulting analytics and insight are flawed, outcomes are compromised, and business value is negated. Up to 60 percent of IT leaders report a lack of accountability for data quality, with more than 50 percent doubting the overall validity of the data itself.

"Enterprise operations rely on understanding data relationships uncovered by Big Data. Data quality processes must be applied to ensure reliability of the distinct customer records that drive these analytics - fueling improved business intelligence or fraud prevention, understanding customer sentiment or even seeking out medical cures," said Charles Gaddy, director of global sales and alliances, Melissa Data.

Because the unstructured data used for Big Data analytics comes from a variety of sources, its supporting data quality processes are even more critical than those required to handle small relational data sets. By matching duplicates from multiple data sources, data managers can create a golden record - a single version of the truth. This information can then be blended with multi-sourced reference data, for instance by adding precise lat/long coordinates to a customer address, or demographic data that enriches and deepens insight. "The result is real business intelligence based on validated data, optimized for Big Data analytics and based on a 360° view of the customer," added Gaddy.
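As a rough illustration of the matching and blending steps Gaddy describes (not Melissa Data's actual implementation), the Python sketch below consolidates duplicates on a simple normalized key and then appends lat/long coordinates from an invented reference table:

```python
# Minimal sketch: consolidate duplicate records into a golden record, then
# blend in reference data (lat/long). Field names, the matching key, and the
# reference table are invented for illustration only.

def match_key(record):
    """Build a crude match key from normalized name and address."""
    return (record["name"].strip().lower(), record["address"].strip().lower())

def build_golden_records(records):
    """Group records by match key and keep the most complete record per group."""
    groups = {}
    for rec in records:
        groups.setdefault(match_key(rec), []).append(rec)
    return [max(group, key=lambda r: sum(bool(v) for v in r.values()))
            for group in groups.values()]

def blend_with_reference(golden, geo_reference):
    """Append lat/long coordinates from a reference table keyed by address."""
    for rec in golden:
        coords = geo_reference.get(rec["address"].strip().lower())
        if coords:
            rec["lat"], rec["lon"] = coords
    return golden

customers = [
    {"name": "Jane Smith", "address": "100 Main St", "phone": ""},
    {"name": "jane smith", "address": "100 Main St", "phone": "555-0100"},
]
geo_reference = {"100 main st": (33.64, -117.60)}   # illustrative coordinates

print(blend_with_reference(build_golden_records(customers), geo_reference))
```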

Melissa Data's training session is titled "Using Reference Data Sources and Data Quality Practices with Big Data." It will briefly explore data quality processes such as entity extraction used to identify customer and other structured data points from unstructured data, and Big Data blending with authoritative customer information.

PentahoWorld 2014 takes place October 8-10, 2014 at The Hilton Bonnet Creek in Orlando, Florida. Visit www.MelissaData.com, or call 1-800-MELISSA (635-4772) for more information.



By David Loshin

In the past few entries in this series, we have been looking at an approach to understanding customer behavior at particular contextual interactions, informed by information pulled from customer profiles.


But if the focal point is the knowledge from the profile that influences behavior, you must be able to recognize the individual, rapidly access that individual's profile, and then feed the data from the profile into the right analytical models that can help increase value.

The biggest issue is the natural variance in customer data collected at different touch points, in different processes, for different business functions. A search for the exact representation provided may not always result in a match, and at worst may lead to the creation of a new record for the same individual, even when one or more records for that individual already exist.

In the best scenario, the ability to rapidly access the customer's profile is enabled through the combination of smart matching routines that are tolerant to some variance along with the creation of a master index.

That master index contains the right amount of identifying information about customers to link two similar records together when they can be determined to represent the same individual, while differentiating records that do not.

Once the right record is found in the index, a pointer can be followed to the data warehouse that contains the customer profile information.

This approach is often called master data management (MDM), and the technology behind it is called identity resolution. Despite the relative newness of MDM, much of the capability has been available for many years in data quality and data cleansing tools, particularly those suited to customer data integration for direct marketing, mergers, acquisitions, data warehousing, and other cross-enterprise consolidation.

In other words, customer profiles and integrated analytics build on a level of master data competency that is likely already established within the organization.
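A minimal sketch of the master-index lookup described above might look like the following (Python; the similarity measure, weights, and threshold are simplistic placeholders, and real identity resolution tooling is far more sophisticated):

```python
# Illustrative identity-resolution lookup against a small master index.
# The similarity function, weights, and threshold are invented placeholders.
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def resolve(master_index, name, address, threshold=0.8):
    """Return the profile pointer of the best match above the threshold, else None."""
    best_id, best_score = None, 0.0
    for entry in master_index:
        score = (0.6 * similarity(entry["name"], name)
                 + 0.4 * similarity(entry["address"], address))
        if score > best_score:
            best_id, best_score = entry["profile_id"], score
    return best_id if best_score >= threshold else None

master_index = [
    {"profile_id": "C-1001", "name": "Robert Jones", "address": "12 Main Street"},
    {"profile_id": "C-1002", "name": "Roberta Johns", "address": "98 Elm Avenue"},
]

# A slightly varied representation still resolves to the existing profile,
# instead of creating a new record for the same individual.
print(resolve(master_index, "Rob Jones", "12 Main St"))   # -> "C-1001"
```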


Named to DBTA 100: Companies that Matter Most in Data, as well as SD Times 100 for Innovation and Market Presence

Rancho Santa Margarita, CALIF, June 18, 2014 - Melissa Data, a leading provider of contact data quality and integration solutions, today announced it was named for the second consecutive time to the DBTA 100, Database Trends and Applications magazine's list of the companies that matter most in data. SD Times has also recognized Melissa Data with its third consecutive appearance in the SD Times 100, acknowledging firms demonstrating innovation and leadership.

"Melissa Data's comprehensive data quality solutions, both onsite and cloud-based, recognize data quality as a global challenge," said Gary Van Roekel, COO, Melissa Data. "Whether you're a start-up or major international brand, trusted data is a powerful business tool that can accelerate company performance and provide the ideal foundation for global growth. These industry acknowledgements validate the reasoning that for business intelligence and governance initiatives to succeed, there needs to be the proper foundation of data quality, enrichment and accuracy."

Judged by DBTA's editorial staff as the top companies in data and enterprise information management, organizations are selected for the "DBTA 100" based on their presence, execution, vision, and innovation in delivering products and services to the marketplace. All 100 companies are highlighted in the special June edition of Database Trends and Applications magazine and on the DBTA.com website.

The "SD Times 100" spotlights industry leadership and influence in a range of essential enterprise business categories, and is based on reputation, product leadership and overall innovation. By examining newsmakers of the previous year, SD Times editorial staff reviews company achievements to determine the stand-outs. Each organization's offerings and reputation are considered, as well as industry "buzz" which demonstrates market leadership. Melissa Data has again been recognized in the Database & Database Management category, acknowledging top firms supporting developers; Melissa Data's development tools help create seamless data quality integrations, enabling custom applications to capitalize on clean, enhanced customer data.

For more on Melissa Data and its global contact data quality solutions, visit www.MelissaData.com or call 1-800-MELISSA (800-635-4772).



Business Users Can Seamlessly Cleanse and Enrich Customer Data During Transformation; Reduces Resources Needed for Enterprise Data Quality

Rancho Santa Margarita, CALIF - June 5, 2014 - Melissa Data, a leading provider of contact data quality and data integration solutions, today announced its strategic relationship with EXTOL International Inc., a provider of end-to-end business integration software and services, to enable seamless data quality operations in tandem with electronic data interchange (EDI) and other integrated business processes. This partnership gives EXTOL users the option to access Melissa Data's sophisticated, native data quality functions, adding value by eliminating the need for separate data quality resources and assuring data quality the moment customer information enters the enterprise system. With Melissa Data's suite of data quality tools, EXTOL users can cleanse, validate and enhance customer information as data is being transformed. This allows more intelligent use of customer contact data for significantly better segmentation, targeting, fraud detection and identity verification. The relationship between EXTOL and Melissa Data expands Melissa Data's market through direct access to EDI and other data syntax communities, while also giving business users streamlined access to powerful data quality services and operations.

"Anytime you're migrating data from one source to another, cleansing and enriching the information allows better results as the data moves forward into the enterprise," says Gary Van Roekel, COO, Melissa Data. "EXTOL and Melissa Data are solving this data quality challenge for the business user, by offering integrated processes that simplify data quality operations from step one."

Connection to Melissa Data's cloud-based data quality tools and services is available through referral from EXTOL. During data transfers, users can simultaneously validate, cleanse and augment multi-national customer contact information across countries, languages and character sets - including verification of emails and international phone numbers, and geocoding street addresses by adding precise latitude and longitude coordinates.
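As a purely hypothetical sketch of what such an in-flight enrichment call could look like (the endpoint URL, parameters, and response fields below are invented for illustration and are not EXTOL's or Melissa Data's actual interfaces):

```python
# Hypothetical only: the endpoint, parameters, and response fields are invented
# to illustrate validating and enriching a contact record during a data transfer.
import requests

def enrich_contact(record, api_key):
    """Validate and geocode a contact record via a (hypothetical) cloud service."""
    response = requests.get(
        "https://api.example.com/contact/verify",   # placeholder URL
        params={
            "key": api_key,
            "address": record["address"],
            "country": record["country"],
            "email": record["email"],
            "phone": record["phone"],
        },
        timeout=10,
    )
    response.raise_for_status()
    result = response.json()
    # Blend the returned coordinates and validation flags back into the record
    # before it moves on to the next stage of the transformation.
    record.update({
        "lat": result.get("latitude"),
        "lon": result.get("longitude"),
        "email_valid": result.get("email_valid"),
        "phone_valid": result.get("phone_valid"),
    })
    return record
```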

"By offering EXTOL customers a data quality solution through our strategic partnership with Melissa Data, we are providing a tangible advantage for database administrators to simplify data quality operations for the range of CRM, marketing automation and ERP applications," says Mark Denchy, Director of Worldwide Partner Program at EXTOL International. "Achieving intelligent customer data is faster, painless and more cost-effective, ultimately raising data quality as a true business priority."

Melissa Data and EXTOL have joined together to launch an Educast on how to improve the quality of customer data. The webinar will be held on June 24 at 1 p.m. ET. To register for free, go to: http://www.extol.com/educast.

Powerful Data Quality Tool Consolidates Duplicates into Single Golden Record; Objective Data Quality Score Uniquely Determines Most Accurate Customer Information


Rancho Santa Margarita, CALIF - May 8, 2014 - Melissa Data, a leading provider of contact data quality and integration solutions, today announced its TechEd 2014 exhibit will feature new matching and de-duplication functionality in the company's MatchUp Component for SQL Server Integration Services (SSIS). Based on proprietary logic from Melissa Data, MatchUp consolidates duplicate customer records objectively, unlike any other data quality solution. Uniquely assessing the quality of individual data fields, MatchUp determines the best pieces of data to retain versus what to discard - enabling a smart, consistent method for data integrators to determine the best customer contact information in every field.

"A single, accurate view of the customer, known as the golden record, is the ideal for any business relying on customer data - reducing waste, optimizing marketing outreach and improving customer service. Yet common methods for matching and eliminating duplicate customer records involve subjective rules that don't consider the accuracy of the data itself," said Bud Walker, director of data quality solutions at Melissa Data. "MatchUp's intelligent rules offer a smarter, more consistent method for determining what information survives in the database and why. It's a critical data quality function that dramatically improves business operations."

MatchUp assesses the content within the customer record, in contrast to matching and de-duplication methods that rely solely on subjective principles, such as whether the record is the most recent, most complete or most frequent. Instead, the selection criteria for determining a golden record are based on a relevant data quality score, derived from the validity of customer information such as addresses, phone numbers, emails and names.

Once the golden record is identified intelligently, MatchUp further references the data quality score during survivorship processes to support creation of an even better golden record; duplicate entries are then collapsed into a single customer record while retaining any additional information that may also be accurate and applicable. MatchUp relies on deep domain knowledge of names and addresses for survivorship operations, used to granularly identify matches between names and nicknames, street/alias addresses, companies, cities, states, postal codes, phones, emails, and other contact data components.
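MatchUp's scoring logic is proprietary, but the general idea of field-level quality scores driving survivorship can be sketched roughly as follows (Python; the validity checks, weights, and field names are invented for this example):

```python
# Rough illustration of score-based survivorship. The validity checks and
# scoring are invented placeholders, not MatchUp's actual proprietary logic.
import re

def field_scores(record):
    """Assign a crude validity score to each contact field."""
    email_ok = bool(re.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", record.get("email", "")))
    phone_ok = len(re.sub(r"\D", "", record.get("phone", ""))) == 10
    return {
        "email": 1.0 if email_ok else 0.0,
        "phone": 1.0 if phone_ok else 0.0,
        "address": 1.0 if record.get("address") else 0.0,
        "name": 1.0 if record.get("name") else 0.0,
    }

def golden_record(duplicates):
    """Keep the highest-scoring record, then backfill empty fields from the rest."""
    ranked = sorted(duplicates, key=lambda r: sum(field_scores(r).values()), reverse=True)
    golden = dict(ranked[0])
    for other in ranked[1:]:
        for field, value in other.items():
            if not golden.get(field) and value:   # retain additional usable data
                golden[field] = value
    return golden

dupes = [
    {"name": "A. Lee", "address": "", "phone": "949-555-0187", "email": "a.lee@example.com"},
    {"name": "Anna Lee", "address": "7 Oak Rd", "phone": "not available", "email": ""},
]
print(golden_record(dupes))
```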

MatchUp is part of Melissa Data's Data Quality Components for SQL Server Integration Services (SSIS), a suite of custom data cleansing transformation components for Microsoft SSIS, used to standardize, verify, correct, consolidate and update contact data for the most effective business communications. The suite further includes selected Community Editions for simple data quality procedures, free to developers and downloadable with no license required. Community Editions include Contact Verify CE for address, phone and name parsing, and email syntax correction; MatchUp CE provides simple matching and de-duplication without advanced survivorship operations, for up to 50,000 records using nine basic matchcodes.

Melissa Data will be demonstrating its MatchUp Component for SSIS at booth #1934 during Microsoft TechEd, May 12-15, 2014 at the George R. Brown Convention Center in Houston, TX. To download a free trial of Melissa Data's MatchUp Component for SSIS, click here; to request access to Melissa Data's free Community Editions, click here or call 1-800-MELISSA (635-4772).



Rancho Santa Margarita, CALIF - January 14, 2014 - Melissa Data, a leading provider of global contact data quality and integration solutions, today announced its strategic alliance with Blu Sky to solve growing data management challenges in healthcare markets. Melissa Data offers a comprehensive platform for data integration and data quality, and Blu Sky provides data capture technologies optimized for EpicCare software deployments used to administer mid-size and large medical groups, hospitals, and integrated healthcare organizations. By partnering with Melissa Data and its extensive suite of established data quality solutions, healthcare providers have a comprehensive single source for superior data management and compliance.

"Integrated data quality is essential to advancing universal healthcare options, yet the complexities of healthcare data management are evident in today's headlines," said Gary Van Roekel, COO, Melissa Data. "As government initiatives catalyze change in the market, our alliance with Blu Sky provides a significant technical and competitive advantage for healthcare CTOs - offering a comprehensive, proven resource for data quality, integration, and capture. Improved and integrated patient data quality will not only help providers reduce the cost of care, but also facilitate better diagnosis and treatment options for patients."

With this alliance, Melissa Data provides global data quality solutions that verify, standardize, consolidate, and enhance U.S. and international contact data, in combination with a comprehensive Data Integration Suite in Contact Zone, enabling cleansed and enhanced patient data to be transformed and shared securely within a healthcare network. Blu Sky adds subject matter experts to the equation - with deep expertise in the EpicCare software used extensively in healthcare networks to facilitate a "one patient, one record" approach; patient data capture, storage, and management are assured of compliance with a growing range of healthcare regulations, including CASS certification of address results and HIPAA privacy and security policies.

"Mobile healthcare, connected pharmacy applications, and electronic medical records represent tangible advancements in healthcare accessibility," said Rick O'Connor, President, Blu Sky. "The same advances increase complexity of data management in the context of HIPAA confidentiality and other industry standards. With a single source to address compliance network-wide, providers are poised for healthcare innovations based on secure, high quality patient information."

For more information about the healthcare data management alliance between Melissa Data and Blu Sky, contact Annie Shannahan at 360-527-9111, or call 1-800-MELISSA (635-4772).


Performance Scalability

By David Loshin

In my last post I noted that there is a growing need for continuous entity identification and identity resolution as part of the information architecture for most businesses, and that the need for these tools only increases in proportion to the types and volumes of data absorbed from different sources and analyzed.

While I have discussed the methods used for parsing, standardization, and matching in past blog series, one thing I alluded to a few posts back was the need for increased performance of these methods as data volumes grow.

Let's think about this for a second. Assume we have 1,000 records, each with a set of data attributes that are selected to be compared for similarity and matching. In the worst case, if we were looking to determine duplicates in that data set, we would need to compare each record against the remaining 999 records. That means doing 999 comparisons 1,000 times, for a total of 999,000 comparisons.

Now assume that we have 1,000,000 records. Again, in the worst case we compare each record against all the others, and that means 999,999 comparisons performed 1,000,000 times, for a total of 999,999,000,000 potential comparisons. So if we scale up the number of records by a factor of 1,000, the number of total comparisons increases by a factor of 1,000,000!

Of course, our algorithms are going to be smart enough to figure out ways to reduce the computational complexity, but you get the idea - the number of comparisons grows quadratically. And even with algorithmic optimizations, the need for computational performance remains, especially when you realize that 1,000,000 records is no longer considered a large number of records - more often we look at data sets with tens or hundreds of millions of records, if not billions.
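A quick sketch makes both points concrete: the worst-case pairwise count is n × (n - 1), and a blocking key (here an invented one built from the last-name initial and postal code) is the kind of optimization that shrinks the comparison space:

```python
# Worst-case pairwise comparisons grow quadratically with the record count.
from collections import defaultdict
from itertools import combinations

def worst_case_comparisons(n):
    return n * (n - 1)

print(worst_case_comparisons(1_000))       # 999,000
print(worst_case_comparisons(1_000_000))   # 999,999,000,000

# A simple blocking scheme compares only records that share a block key,
# sharply reducing the number of candidate pairs. The key is illustrative.
def blocked_pairs(records):
    blocks = defaultdict(list)
    for rec in records:
        key = (rec["last_name"][:1].upper(), rec["postal_code"])
        blocks[key].append(rec)
    for group in blocks.values():
        yield from combinations(group, 2)   # pairs within each block only
```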

In the best scenario, performance scales with the size of the input. New technologies enable the use of high performance platforms, through hardware appliances, software that exploits massive parallelism and data distribution, and innovative methods for data layouts and exchanges.

In my early projects on large-scale entity recognition and master data management, we designed algorithms that would operate in parallel on a network of workstations. Today, these methods have been absorbed into the operational fabric, in which software layers adapt in an elastic manner to existing computing resources.

Either way, the demand is real, and the need for performance will only grow more acute as more data with greater variety and diversity is subjected to analysis. You can't always just throw more hardware at a problem - you need to understand its complexity and adapt the solutions accordingly. In future blog series, we will look at some of these issues and ways that new tools can be adopted to address the growing performance need.


Understanding Data Quality Services

Knowledge Base, Knowledge Discovery, Domain Management,
and Third Party Reference Data Sets



PASS Virtual Chapter Meeting: Thursday, Jan. 31, 2013 at 9 am PST, 12 pm EST.

REGISTER NOW!


With the release of Data Quality Services (DQS), Microsoft innovates its data quality and data cleansing solutions by approaching them from a Knowledge Driven standpoint. In this presentation, Joseph Vertido from Melissa Data will discuss the key concepts behind Knowledge Driven Data Quality and implementing a Data Quality Project, and will demonstrate how to build and improve your Knowledge Base through Domain Management and Knowledge Discovery.

What sets DQS apart is its ability to provide access to Third Party Reference Data Sets through the Azure Marketplace. This access to shared knowledge empowers the business user to efficiently cleanse complicated and domain-specific information such as addresses. During this session, examples will be presented on how to access third party reference data set providers and integrate them from the DQS Client.

REGISTER NOW!

A Guide to Better Survivorship - A Melissa Data Approach

By Joseph Vertido

The importance of survivorship - or, as others may refer to it, the Golden Record - is quite often overlooked. It is the final step in the record matching and consolidation process which ultimately allows us to create a single accurate and complete version of a record. In this article, we will take a look at how Melissa Data uniquely differentiates itself in approaching the concept of survivorship compared to some of the more conventional practices.

The process of selecting surviving records means selecting the best possible candidate as the group's representation. However, "best" from the perspective of survivorship can mean many things. It can be affected by the structure of the data, where the data is gathered from, how the data comes in, what kind of data is stored, and sometimes by the nature of business rules. Different techniques can therefore be applied to accommodate these variations when performing survivorship. We find that there are three very commonly used techniques for determining the surviving record (a minimal code sketch follows the list):

I. Most Recent

Date-stamped records can be ordered from most recent to least recent, and the most recent record can be considered the survivor.

II. Most Frequent

Matching records containing the same information are also an indication of correctness; repeated values suggest the information is persistent and therefore reliable.

III. Most Complete

Field completeness is also a factor to consider. Records with more of their available fields populated are viable candidates for survivorship.
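A minimal sketch of these three conventional rules (Python; the "updated" field and record layout are illustrative, not a prescribed schema):

```python
# Illustrative implementations of the three conventional survivorship rules.
# The "updated" field and record layout are invented for this example.
from collections import Counter

def most_recent(group):
    """Survivor is the record with the latest date stamp."""
    return max(group, key=lambda r: r["updated"])

def most_frequent(group):
    """Survivor is the record whose exact contents repeat most often."""
    counts = Counter(tuple(sorted(r.items())) for r in group)
    return dict(counts.most_common(1)[0][0])

def most_complete(group):
    """Survivor is the record with the most populated fields."""
    return max(group, key=lambda r: sum(bool(v) for v in r.values()))
```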


Although these techniques are commonly applied in survivorship schemas, their results may not be reliable in many circumstances. Because these techniques apply to almost any type of data, the basis on which a surviving record is created conforms only to "generic" rules. This is where Melissa Data is able to set itself apart from "generic" survivorship: by leveraging reference data, we can steer our way to better and more effective survivorship schemas.

The incorporation of reference data in survivorship changes how rules come into play. Using Most Recent, Most Frequent, or Most Complete logic gives selection a largely superficial basis. Ideally, the selection of the surviving record should be based on an actual understanding of our data.

And this is where reference data comes into play. What it boils down to in the end is being able to consolidate the best quality data. By incorporating reference data, we gain an understanding of the actual contents of our data and can make better decisions for survivorship. Let's take a look at some instances of how reference data and data quality affect survivorship decisions.

I. Address Quality

Separating good data from bad data should take precedence in making decisions for survivorship.

Address Quality Sample

In the case of addresses, giving priority to good addresses makes for a better decision in the survivorship schema.

II. Record Quality

It could also be argued that more than one good record may exist within a single group of matching records. In cases like these, we can assess the overall quality of each record by taking into consideration other pieces of information that affect the weight of overall data quality. Take, for example, the following data:

Record Quality Sample

In this case, the ideal approach is to evaluate multiple elements for each record in the group. Since the second record contains a valid phone number, it can be given more weight or more importance than the third record, despite the latter being more complete.
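A rough sketch of this idea (Python; the phone-validity check and weighting are invented and are not Melissa Data's actual scoring) shows how a record with a verified phone number can outrank a more complete one:

```python
# Illustrative quality-weighted survivorship: a valid phone number carries more
# weight than raw field completeness. The check and weight are invented.
import re

def record_quality(record):
    completeness = sum(bool(v) for v in record.values())
    phone_digits = re.sub(r"\D", "", record.get("phone", ""))
    phone_bonus = 3 if len(phone_digits) == 10 else 0   # weighted more heavily
    return completeness + phone_bonus

group = [
    {"name": "J. Doe",   "address": "",          "phone": "",               "email": ""},
    {"name": "Jane Doe", "address": "",          "phone": "(949) 555-0123", "email": ""},
    {"name": "Jane Doe", "address": "5 Pine St", "phone": "555-1234",       "email": "jd@example.com"},
]
print(max(group, key=record_quality))   # the record with the valid phone wins
```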

In summary, whether we're working with contact data, product data, or any other form of data, the methodologies and logic used for record survivorship depend primarily on data quality. However we choose to define data quality, it is imperative that we keep only the best pieces of data if we are to have the most accurate and correct information. In the case of contact data, however, Melissa Data changes the perspective on how data quality is defined, breaking the norm of typical survivorship schemas.


Making Sense Out of Missing Data

By David Loshin

I have spent the past few blog posts considering different aspects of null values and missing data. As I mentioned last time, it is easy to test for incompleteness, especially when system nulls are allowed. And even in older systems, the variable ways that missing or null data can be represented are finite, making it easy to describe rules for flagging incomplete records.

The challenge is determining how to address the missing values, and unfortunately there are no magic bullets to infer a value when there is no information provided. On the other hand, one might consider some different ideas for determining whether a data element's value may be null, and if not, how to find a reasonable or valid value for it.

For example, linking data between different data sets can enable some degree of inference. If I can link a record in one data set that is missing a value with a similar record in a different data set whose data elements are complete, as long as certain rules are observed (such as timeliness and consistency rules), we could make the presumption that the missing value can be completed by copying from the linked record.
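As a small sketch of that linking idea (Python; the keys, field names, and the simple recency check standing in for timeliness and consistency rules are all invented):

```python
# Illustrative gap-filling by linkage: copy a missing value from the matching
# record in another data set, subject to a simple recency rule. The keys,
# fields, and recency threshold are invented for this example.

def fill_missing(records, reference, key="customer_id", max_age_days=365):
    ref_by_key = {r[key]: r for r in reference}
    for rec in records:
        linked = ref_by_key.get(rec[key])
        if linked is None or linked.get("age_days", 0) > max_age_days:
            continue   # no linked record, or reference data too stale to trust
        for field, value in rec.items():
            if value in (None, "") and linked.get(field):
                rec[field] = linked[field]   # presume the linked value applies
    return records

crm = [{"customer_id": 7, "email": "", "phone": "555-0100"}]
billing = [{"customer_id": 7, "email": "c7@example.com", "phone": "555-0100", "age_days": 30}]
print(fill_missing(crm, billing))
```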

Alternatively, we could adjust the business processes themselves: either identify situations in which a value is marked mandatory when it really doesn't need to be, or engineer aspects of the workflow to ensure that missing data is collected before transactions are gated to their subsequent stages.

These are just a few ideas, but the sheer fact that data incompleteness remains a problem these days is a testament to the fact that the issues is not given enough attention. But with the growth in the reliance on greater volumes of data being streamed at higher velocities than ever before, the problems of missing and incomplete data sets are only going to become more acute, so perhaps now is a good time to start considering the negative impacts of missing data within your own environments!