Thursday, October 16, 2014

Understanding Big Data

What is Big Data


Every company today focuses on two things to run its business:

1) Selling its product or service to potential customers and providing customer service.

2) Collecting and analyzing the data generated in this business process.

Storing

Data is typically stored in various forms and on various media, depending on its size, accessibility, availability and security requirements.

Analysis

Beyond storage, this data needs to be processed and analyzed, and often used for prediction, to improve further business activities. It is convenient for a company when it can combine storage and analysis in one process; this is the point where it should look at Big Data.


Big Data Eco System

Let's take a look at a company's IT infrastructure at two different points in time.

[Images: LinkedIn @ 2003 and LinkedIn @ 2014]

Traditional IT applications once only stored events such as account registrations, deposits, sales and purchases.

But today's IT applications are smart enough to recommend which products to buy, what music or movies we might like, and which stocks to invest in. Examples: Facebook friend recommendations, Netflix movie recommendations, Spotify music recommendations, Amazon product recommendations, etc.
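To make the idea of a recommendation concrete, here is a minimal sketch in Python of a "bought together" recommender based on simple co-occurrence counts. The basket data and the function names (build_cooccurrence, recommend) are made up for illustration; real recommendation systems at the companies mentioned above are far more sophisticated and run on much larger data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: each set is one customer's basket.
baskets = [
    {"laptop", "mouse", "keyboard"},
    {"laptop", "mouse"},
    {"mouse", "keyboard"},
    {"laptop", "monitor"},
]

def build_cooccurrence(baskets):
    """Count how often each ordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(item, counts, top_n=2):
    """Return up to top_n items most often bought together with `item`."""
    scores = {b: n for (a, b), n in counts.items() if a == item}
    # Sort by count (highest first), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

counts = build_cooccurrence(baskets)
print(recommend("laptop", counts))  # ['mouse', 'keyboard']
```

With only four baskets this fits in memory on one machine. The same counting over millions of customers is where the storage and processing tools discussed in this post come in.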

To achieve all of this, a company should invest in Big Data.

Thus, Big Data is a collection of technologies that integrate with each other to deliver real-time solutions as business takes place.

Big Data is not one-size-fits-all; we need to add or remove tools and technologies to suit our use cases.

Big Data Technologies Landscape 

The following technologies integrate with each other to provide a Big Data solution for a company.






In simple terms, these technologies together form a Big Data solution for a company:

Cloud Storage (AWS / Windows Azure)
                     +
HDFS (Hadoop clusters / MapReduce)
                     +
NoSQL databases (Cassandra / HBase)
                     +
DW Infrastructure (Hive + Spark = Shark)
                     +
ETL (Talend / Informatica)
                     +
Analytics / Visualization (R / Python / QlikView / Birst)
                     +
Business Intelligence (Tableau / MicroStrategy / Cognos)
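
To give a feel for the MapReduce model named in the stack above, here is a small single-machine sketch in plain Python that counts words in three steps: map, shuffle and reduce. It does not use Hadoop; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample lines are assumptions for illustration only. On a real cluster the framework runs these steps in parallel across many nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

lines = ["big data is big", "data needs storage"]
print(word_count(lines))
# {'big': 2, 'data': 2, 'is': 1, 'needs': 1, 'storage': 1}
```

Each map and reduce call depends only on its own input, which is what lets a framework spread the work over a cluster.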