Saturday, February 7, 2015

Data Blending

Data analysts traditionally use spreadsheets to answer straightforward questions, working with simple algorithms and formulas. They rely on readily available data from a few sources.


When business questions become more complex, data analysts require more complex algorithms or large amounts of data from different sources.

Data exists in many formats: structured (relational databases, spreadsheets), semi-structured (social media posts, blog posts, or comments), and unstructured (machine logs, Twitter tweets).




In addition to characterizing data by its format, it is useful to view data by its nature. Data naturally fits into three categories:

Traditional - Data that comes from relational databases, spreadsheets, or mainframe systems.

Enrichment - Data that is industry-specific or special-purpose and is used to supplement (or enrich) existing data. For example, spatial grid coordinates identifying where customers like to shop would enrich sales information, and demographic information about customers' backgrounds could help a retailer looking at traditional sales data.

Emerging - Data that is often related to big data, as well as data from other newer sources; social media and marketing automation data are common examples. This is newer, more valuable, and often the most difficult data to identify and leverage.



                             

Data categories such as Enrichment and Emerging are the most likely to include sources of big data; in some cases, structured data can also be included.

Once the right data sources are identified and access to them is established, the next step is merging, sorting, joining, and otherwise combining all useful data into a functional data set while discarding the vast, loud noise of unnecessary data. This process is called data blending.
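As a rough illustration of the merging, joining, sorting, and discarding steps, here is a minimal Python sketch. The sample records, field names (customer_id, amount, internal_note, region), and the blend function are all hypothetical and chosen only for this example; a real blend would read from actual sources and would likely use a dedicated tool or library.

```python
# Hypothetical sample data; every name and value here is illustrative only.
sales = [  # "traditional" source, e.g. rows exported from a spreadsheet
    {"customer_id": 1, "amount": 120.0, "internal_note": "x"},
    {"customer_id": 2, "amount": 75.5, "internal_note": "y"},
    {"customer_id": 1, "amount": 30.0, "internal_note": "z"},
    {"customer_id": 3, "amount": 10.0, "internal_note": "w"},
]
demographics = {  # "enrichment" source, keyed by customer_id
    1: {"age_group": "25-34", "region": "North"},
    2: {"age_group": "35-44", "region": "South"},
}


def blend(sales_rows, demo_by_customer):
    """Join sales rows to demographics, keep relevant fields, sort by amount."""
    blended = []
    for row in sales_rows:
        demo = demo_by_customer.get(row["customer_id"])
        if demo is None:
            continue  # inner join: drop sales rows with no enrichment record
        blended.append({
            "customer_id": row["customer_id"],
            "amount": row["amount"],
            "region": demo["region"],  # only the field the analysis needs
        })
    # Sort the functional data set, largest sale first.
    return sorted(blended, key=lambda r: r["amount"], reverse=True)


blended = blend(sales, demographics)
```

Here the internal_note and age_group fields stand in for the "noise" that is discarded, and the unmatched customer is dropped by the join. Whether unmatched rows should be dropped or kept is a design choice that depends on the business question.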

Data blending is a process, and that process can be repeated as necessary to add or remove data sources.

Data Integration vs Data Blending

Data integration is not data blending. In data integration, multiple data sources are combined to create a single unified version of the data in a database, data warehouse, or data mart.

Data blending is a process conducted by a business or data analyst to build a data set for use in analytic processing to answer a specific business question.

The data set is created from one or more data sources; blending occurs as the data set is built from multiple sources, capturing only the relevant data. Analytic processing then runs on that purpose-built data set to derive an answer to the question being posed.
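To make the "purpose-built data set, then analytic processing" idea concrete, the sketch below answers one hypothetical business question: which region has the highest total sales? The sources (orders, store_region), the helper names, and all values are assumptions made up for this example.

```python
from collections import defaultdict

# Two hypothetical sources; all values are made up for illustration.
orders = [("A", 100.0), ("B", 40.0), ("A", 25.0), ("C", 60.0)]  # (store_id, amount)
store_region = {"A": "East", "B": "West", "C": "West"}           # store_id -> region


def build_data_set(order_rows, region_by_store):
    """Blend both sources, capturing only the fields the question needs."""
    return [
        {"region": region_by_store[store], "amount": amount}
        for store, amount in order_rows
        if store in region_by_store  # skip orders from unknown stores
    ]


def total_by_region(data_set):
    """Analytic step: sum sales per region on the blended data set."""
    totals = defaultdict(float)
    for row in data_set:
        totals[row["region"]] += row["amount"]
    return dict(totals)


data_set = build_data_set(orders, store_region)
totals = total_by_region(data_set)
top_region = max(totals, key=totals.get)  # region with the highest total
```

The data set exists only to answer this one question. A different question would typically lead to a different blend rather than a change to a shared, permanent store.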

The key difference is:

Integration - results in a permanent database intended to store a single copy of the data, and is managed by DBAs and BI experts.

Blending - results in a data set whose purpose is to support analysis for a specific business question, and is created by business and data analysts.