Thursday, October 16, 2014

Understanding Big Data

What is Big Data


Every company now focuses on two things in running its business:

1) Selling its product or service to potential customers and supporting them with customer service.

2) Collecting and analyzing the data generated by that business process.

Storing

Data is typically stored in various forms and on various media, depending on its size, accessibility, availability and security requirements.

Analysis

Beyond storage, this data needs to be processed and analyzed, and used for prediction, to improve further business activities. It is convenient for a company to combine storage and analysis in one place, and this is the point at which it should look at a Big Data solution.


Big Data Ecosystem

Let's take a look at a company's IT infrastructure at two different points in time.

[Figure: LinkedIn @ 2003 (left) and LinkedIn @ 2014 (right)]

Traditional IT applications once simply stored events such as account registrations, deposits, sales and purchases.

Today, IT applications are smart enough to recommend which products to buy, which music or movies we might like, and which stocks to invest in. Examples include Facebook friend recommendations, Netflix movie recommendations, Spotify music recommendations and Amazon product recommendations.

To achieve all of this, a company needs to invest in Big Data.

Big Data is thus a collection of technologies that integrate with each other to deliver real-time solutions as business takes place.

Big Data is not one size fits all; we need to add or remove tools and technologies to suit our use cases.

Big Data Technologies Landscape 

The following technologies integrate with each other to provide a big data solution for a company.






In simple terms, these technologies together form a big data solution for a company:

Cloud Storage (AWS / Windows Azure)
                     +
HDFS file system (Hadoop clusters / MapReduce)
                     +
NoSQL databases (Cassandra / HBase)
                     +
DW Infrastructure (Hive + Spark = Shark)
                     +
ETL (Talend / Informatica)
                     +
Analytics / Visualization (R / Python / QlikView / Birst)
                     +
Business Intelligence (Tableau / MicroStrategy / Cognos)



Monday, March 3, 2014

Big Data is a Technology & Data Warehouse is an Architecture

Big Data (aka Hadoop) has been gaining popularity in recent years. I often hear people say that we don't need a data warehouse if we have Big Data.

I do agree that there are some similarities between a data warehouse and a big data solution.

Both can be used for Reporting. 


Both are managed by electronic storage devices.


Both can hold a lot of data.


So if a company starts to build a Big Data solution, doesn't that obviate the need for a data warehouse?

What Big Data offers to an organization


- Technology capable of holding very large amounts of data.


- Technology that can hold the data in inexpensive storage devices.


- Technology where processing is done by the "Roman Census" method.


- Technology where the data is stored in an unstructured format.
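
The "Roman Census" method in the list above is usually described as sending the processing to the data instead of moving all the data to one central processor. Below is a minimal Python sketch of that pattern. It assumes three in-memory lists stand in for the data blocks held on three cluster nodes; the sample events and the function names are hypothetical and are not part of any Hadoop API.

```python
from collections import Counter

# Hypothetical stand-ins for data blocks stored on three separate nodes.
NODE_BLOCKS = [
    ["sale", "refund", "sale"],
    ["sale", "login"],
    ["login", "refund", "sale", "login"],
]

def count_locally(block):
    """Runs 'on the node': summarize one block without moving its rows."""
    return Counter(block)

def merge_counts(partials):
    """Runs on the coordinator: combine the small per-node summaries."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Each node produces a small summary; only the summaries are combined.
partial_counts = [count_locally(block) for block in NODE_BLOCKS]
event_totals = merge_counts(partial_counts)
print(dict(event_totals))
```

Only the small per-node summaries travel to the coordinator, which is why this style of processing scales to data volumes that would be impractical to move.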

What Data Warehouse offers to an organization

In principle, there are two approaches to a data warehouse: the Kimball approach and the Inmon approach.

The Inmon approach defines a data warehouse as a subject-oriented, non-volatile, integrated, time-variant collection of data created for the purpose of management decision making.

In simple terms, a data warehouse provides a single version of the truth for decision making in the corporation.

Companies need a data warehouse in order to make informed decisions from data
that is reliable, believable, readily available and accessible to everyone.

So what does Big Data offer in addition to a data warehouse?
 


In large corporations there is a lot of data that is never transported into the data warehouse.

There are numerous reasons for not exporting this data to the data warehouse.

[This data cannot be denormalized, or it would require additional data to be imported into the data warehouse.]
 
For example, tweets and Facebook posts in which consumers discuss a product or service help companies understand consumers' opinions about that product or service.

By understanding this feedback, companies can make changes accordingly.

If a company can turn this valuable unstructured data from various sources into meaningful information, and then combine it with the reports from its data warehouse, it can better predict what its customers want and how that is reflected in sales and revenue.
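
As a rough illustration of the combination described above, the Python sketch below scores a few social posts per product and sets the result beside sales figures. Everything in it is assumed for illustration: the sample posts, the keyword lists and the `warehouse_sales` figures are made up, and a real solution would use a proper sentiment model and query the actual warehouse.

```python
# Hypothetical sample data: social posts (unstructured) and
# warehouse sales figures (structured). None of this is real data.
posts = [
    {"product": "phone", "text": "Love the new phone, great camera"},
    {"product": "phone", "text": "Battery is bad and the screen is slow"},
    {"product": "tablet", "text": "Great tablet, love it"},
]
warehouse_sales = {"phone": 1200, "tablet": 300}  # units sold, made up

POSITIVE = {"love", "great"}
NEGATIVE = {"bad", "slow"}

def score(text):
    """Toy sentiment: positive keyword hits minus negative keyword hits."""
    words = [w.strip(",.!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def sentiment_by_product(posts):
    """Add up the toy sentiment scores of all posts for each product."""
    totals = {}
    for post in posts:
        product = post["product"]
        totals[product] = totals.get(product, 0) + score(post["text"])
    return totals

sentiment = sentiment_by_product(posts)

# Join the unstructured-data result with the warehouse figures by product.
report = {
    product: {"units_sold": units, "sentiment": sentiment.get(product, 0)}
    for product, units in warehouse_sales.items()
}
print(report)
```

The join key here is simply the product name; in practice, matching free-text posts to warehouse product keys is often the hardest part of the exercise.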

The difference between Big Data and a data warehouse is the difference between a hammer and a nail.

Big Data is a technology and a data warehouse is an architecture. A technology is just a means to store and manage large amounts of data.

A data warehouse is a way of organizing data so that it has credibility and integrity. For compliance reporting such as Sarbanes-Oxley, Basel II or other kinds of compliance reporting, we can depend on the data warehouse.

For all practical purposes, a data warehouse and Big Data have little or no relationship. To conclude: a data warehouse is an architecture and Big Data is a technology.




 

Monday, February 17, 2014

High Paying Analytic Skills

In the 2014 Dice Tech Salary Survey of over 17,000 technology professionals, the highest-paid IT skill was R programming.

While big data skills in general featured strongly in the top tier, having R at the top of the list reflects the strong demand for skills to make sense of and extract value from big data.


Similarly, the recent O'Reilly Data Scientist survey also found R skills among those that pay in the $111,000 - $125,000 range.



Sunday, February 16, 2014

Digital Intelligence with Splunk

Splunk is a flexible and powerful platform for machine data. It provides an effective way to analyze customer behavior and product usage from websites, mobile apps and social media streams.

Splunk helps us to:

1. Reliably collect data from various user interactions - web, mobile, social and offline

2. Get meaningful insights and powerful visualizations, with unlimited segmentation and full drill-down on real-time and historical data

3. Correlate data across various digital channels

4. Create reports, dashboards and alerts for meaningful actions based on trends




5. Use Splunk DB Connect and Hadoop Connect to enrich streaming unstructured data with structured data from relational databases, or to move data to Hadoop for complex batch analysis

6. Understand web behavior in real time

7. Improve the mobile app user experience

In general, Splunk's features can be summarized as follows.

Index all types of data formats - Splunk indexes virtually any data and
data format across your infrastructure in real time.

Ad hoc search - Search terabytes of historical data and live streaming data using
the powerful Splunk search language.

Monitor and alert - Monitor your data for patterns, breakout trends or specific
events and turn these into proactive alerts.

Report and analysis - Build powerful reports in minutes, visualize your data,
perform statistical analysis, spot trends and share your reports.

Custom dashboards - Create custom dashboards in a few clicks, and integrate multiple
charts and views of your data for the needs of different users.

Advanced visualization - Integrate maps and more complex visualizations within
Splunk dashboards.

Role-based access - Provide secure, role-based access control to anyone in your
organization.

DB Connect - Enrich unstructured data with structured data from relational databases.

Massive linear scalability - Scale Splunk linearly across commodity servers
to support the largest data volumes.

Tuesday, February 11, 2014

Business Intelligence with Big Data


The Business Intelligence environment is the top layer of any data warehouse platform; querying and analysis, alerts and report publishing all take place inside this environment.

A handful of BI tools have been available in the market for decades to address these reporting requirements.

With scalable storage and data exploration through Big Data technologies like Hadoop, companies can extend their BI capabilities beyond querying relational data sets to unstructured and schema-less data sets.



















Data science is now taking Business Intelligence to the next level and opens up the opportunity to visualize large-scale structured and unstructured data.


Thursday, January 30, 2014

PRIME - Parallel Relational In-Memory Engine

MicroStrategy announced the new PRIME option for its cloud customers at its 2014 annual conference in Las Vegas.

MicroStrategy PRIME is a massively scalable, cloud-based, in-memory analytics service designed to deliver extremely high performance for complex analytical applications that have the largest data sets and the highest user concurrency.

No doubt, MicroStrategy PRIME can address the reporting problems of many Big Data customers while enforcing security and role-based access.

Key features

1. Massively parallel, distributed, in-memory architecture for extreme scale, designed to run on cost-effective commodity hardware.

2. Complex analytic problems can be partitioned across hundreds of CPU cores and nodes to achieve unprecedented performance.

3. MicroStrategy has worked closely with leading hardware vendors to take full advantage of today's multicore, high-memory servers.

4. Tightly integrated dashboard engine for beautiful, easy-to-use applications built on the MicroStrategy analytics platform.

5. The visualization engine includes hundreds of optimizations designed specifically for the in-memory data store. This engine enables customers to build complete, immersive applications that deliver high-speed response.

6. PRIME is available as a service on the MicroStrategy Cloud analytics platform.

7. PRIME is an MPP engine optimized for and tightly coupled with the visualization and dashboard front end.

8. PRIME was co-developed with Facebook, with MicroStrategy engineers co-located at Facebook.

9. Finally, the PRIME solution is not open to third-party front ends (like SAP HANA) at this moment, but may be in the future.

Mobile Analytics with MicroStrategy


In this series we will look at the introduction to and setup of MicroStrategy Mobile Analytics.

Develop a free-form SQL report on the MicroStrategy Mobile platform, so that active mobile users can update their OLTP systems to place orders and update budgets and financials, which in turn can integrate with your data warehouse data.

Set up MicroStrategy app access for iOS and Android mobile devices.