
Thursday, October 16, 2014

Understanding Big Data

What is Big Data


Every company now focuses on two things in running its business:

1) Selling its product or service to potential customers and providing customer service.

2) Collecting and analyzing the data generated in this business process.

Storing

Typically, data gets stored in various forms and mediums depending on factors such as its size, accessibility, availability and security.

Analysis

Beyond storage, this data needs to be processed and analyzed to enhance further business activities. It becomes convenient for a company when it can combine storage and analysis; this is the point where one should look at Big Data.


Big Data Ecosystem

Let's take a look at a company's IT infrastructure at different time periods:

[Image: Linkedin IT infrastructure, 2003 vs. 2014]
Traditional IT applications once simply stored events such as account registrations, deposits, sales and purchases.

But today's IT applications are smart enough to recommend products to buy, music or movies we might like, and stocks to invest in. Examples include Facebook friend recommendations, Netflix movie recommendations, Spotify music recommendations and Amazon product recommendations.

To achieve all this, a company should invest in Big Data.

Thus Big Data is a collection of technologies that integrate with each other to facilitate real-time solutions as business takes place.

Big Data is not one-size-fits-all; we need to add or remove the tools and technologies that suit our use cases.

Big Data Technologies Landscape 

The following technologies integrate with each other to provide a big data solution for a company.






In simple terms, these technologies together form a big data solution for a company:

Cloud Storage (AWS / Windows Azure )
                     +
 a HDFS file system (hadoop clusters / Map Reduce)
                     +
 NoSQL databases (Cassandra / Hbase)
                     +
DW Infrastructure (Hive + Spark = Shark)
                     +
ETL (Talend / Informatica)
                     +
Analytics / Visualization (R / Python / Qlikview / BIRST)
                     +
Business Intelligence (Tableau / Microstrategy/ Cognos)
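
The HDFS/MapReduce layer in the middle of this stack is the batch-processing core. As an illustrative sketch (plain Python simulating the MapReduce model, not an actual Hadoop job), the classic word-count flow of map, shuffle/sort and reduce looks like this:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit (word, 1) for every word in a line.
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    # Reduce phase: sum all counts for one word.
    return (word, sum(counts))

def map_reduce(lines):
    # Shuffle/sort: group mapper output by key, then reduce each group.
    mapped = sorted(kv for line in lines for kv in mapper(line))
    return dict(reducer(k, (c for _, c in g))
                for k, g in groupby(mapped, key=itemgetter(0)))

print(map_reduce(["big data big ideas", "big clusters"]))
# {'big': 3, 'clusters': 1, 'data': 1, 'ideas': 1}
```

In a real Hadoop cluster the map and reduce phases run in parallel across many nodes, with HDFS providing the distributed storage underneath.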



Monday, March 3, 2014

Big Data is a Technology & Data Warehouse is an Architecture

Big Data (aka Hadoop) has been gaining popularity in recent years. Often I hear people say that we don't need a data warehouse if we have Big Data.

I do agree that there are some similarities between a data warehouse and a big data solution.

Both can be used for Reporting. 


Both are managed by electronic storage devices.


Both can hold a lot of data.


So if a company starts to build a Big Data solution, doesn't that obviate the need for a data warehouse?

What Big Data offers to an organization


- Technology capable of holding very large amounts of data.


- Technology that can hold the data in inexpensive storage devices.


- Technology where processing is done by the "Roman Census" method.


- Technology where the data is stored in an unstructured format.

What Data Warehouse offers to an organization

In principle there are two approaches to a data warehouse: the Kimball approach and the Inmon approach.

The Inmon approach defines a data warehouse as a subject-oriented, nonvolatile, integrated, time-variant collection of data created for the purpose of management decision making.

In simple terms a data warehouse provides a single version of the truth for decision making in the corporation.

Companies need a data warehouse in order to make informed decisions from data that is reliable, believable, readily available and accessible to everyone.

So what does Big Data offer in addition to a data warehouse?

In large corporations there is a lot of data that never gets transported into the data warehouse.

There are numerous reasons for not exporting this data to the data warehouse; for example, the data cannot be de-normalized, or it would require additional data to be imported into the warehouse.
 
For example, tweets and Facebook posts in which consumers discuss a product or service really help companies understand consumer opinion about that product or service.

By understanding this feedback, companies can make changes accordingly.

If a company can unlock this valuable unstructured data into meaningful information from various sources and then combine it with reports from its data warehouse, it can accurately predict what its customers want and how that reflects on sales and revenue.
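
As a toy sketch of that idea (the product names, figures and sentiment scores below are invented for illustration), combining warehouse-style sales totals with sentiment mined from social posts might look like this:

```python
# Hypothetical inputs: sales totals from the data warehouse, and average
# sentiment (-1 to 1) mined from tweets/posts about each product.
sales = {"widget": 120_000, "gadget": 45_000}
sentiment = {"widget": 0.62, "gadget": -0.35}

def flag_risks(sales, sentiment, threshold=0.0):
    # Products that sell well today but draw negative sentiment are
    # candidates for a future revenue drop.
    return [p for p in sales if sentiment.get(p, 0.0) < threshold]

print(flag_risks(sales, sentiment))  # ['gadget']
```

The point is not the trivial comparison but the join: neither data set alone tells this story, and only combining the two reveals which revenue is at risk.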

The difference between Big Data and a data warehouse is the difference between a hammer and a nail.

Big Data is a technology and a data warehouse is an architecture. A technology is just a means to store and manage large amounts of data.

A data warehouse is a way of organizing data so that there is credibility and integrity. For compliance reporting such as Sarbanes-Oxley, Basel II and other styles of compliance reporting, we can depend on the data warehouse.

For all practical purposes, a data warehouse and Big Data have little or no relationship to each other. To conclude: the data warehouse is an architecture and Big Data is a technology.




 

Monday, February 17, 2014

High Paying Analytic Skills

In the 2014 Dice Tech Salary Survey of over 17,000 technology professionals, the highest paid IT skill was R programming.

While big data skills in general featured strongly in the top tier, having R at the top of the list reflects the strong demand for skills to make sense of and extract value from big data.


Similarly, the recent O'Reilly Data Scientist survey also found R skills among those that pay in the $111,000 - $125,000 range.



Sunday, February 16, 2014

Digital Intelligence with Splunk

Splunk is a flexible and powerful platform for machine data. It provides an impactful way to analyze customer behavior and product usage from websites, mobile apps and social media streams.

Splunk helps us to achieve 

1. Reliably collect data from various user interactions - web, mobile, social and offline

2. Get meaningful insights and powerful visualization with unlimited segmentation and full data drill down on real-time and historical data

3. Correlate data across various digital channels

4. Create reports, dashboards and alerts for meaningful actions based on trends




5. Use Splunk DB Connect and Hadoop Connect to enrich streaming unstructured data with structured data from a relational database, or move data to Hadoop for complex batch analysis

6. Understand web behavior in real time

7. Improve mobile app user experience

In general, Splunk features can be summarized as below.

Index all types of Data Formats - Splunk indexes virtually any data format across your infrastructure in real time.

Ad hoc Search - Search terabytes of historical data and live streaming data using the powerful Splunk search language.

Monitor and Alert - Monitor your data for patterns, breakout trends or specific events and turn these into proactive alerts.

Report and Analysis - Build powerful reports in minutes, visualize your data, perform statistical analysis, spot trends and share your reports.

Custom Dashboards - Create custom dashboards in a few clicks, integrating multiple charts and views of your data for the needs of different users.

Advanced Visualization - Integrate maps and more complex visualizations within Splunk dashboards.

Role-based Access - Provide secure, role-based access control to anyone in your organization.

DB Connect - Enrich unstructured data with structured data from a relational database.

Massive Linear Scalability - Scale Splunk linearly across commodity servers to support the largest data volumes.
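
The monitor-and-alert pattern is easy to picture outside Splunk as well. This toy sketch (plain Python with an invented log format, not Splunk's search language) watches a stream of log lines and fires an alert whenever the number of errors in a sliding window crosses a threshold:

```python
from collections import deque

def alerts(lines, window=5, threshold=3):
    # Keep a sliding window of recent log levels; record an alert at any
    # line where ERROR entries inside the window reach the threshold.
    recent = deque(maxlen=window)
    fired = []
    for i, line in enumerate(lines):
        recent.append("ERROR" in line)
        if sum(recent) >= threshold:
            fired.append(i)
    return fired

log = ["INFO ok", "ERROR db", "ERROR db", "INFO ok", "ERROR db", "INFO ok"]
print(alerts(log))  # [4, 5] - the window still holds three ERRORs at line 5
```

Splunk applies the same idea at scale, continuously, over indexed machine data, and can turn such a condition into an email, script or dashboard alert.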

Tuesday, February 11, 2014

Business Intelligence with Big Data


The Business Intelligence environment is the top layer of any data warehouse platform; querying and analysis, alerts and report publishing all take place inside this environment.

A handful of BI tools have been available in the market addressing these reporting requirements for decades.

With infinite storage and data exploration through Big Data technologies like Hadoop, companies can extend their BI capabilities beyond querying relational data sets to unstructured and schema-less data sets.

Data Science is now taking Business Intelligence to the next level, opening up an opportunity to visualize large-scale structured and unstructured data.


Thursday, January 30, 2014

PRIME - Parallel Relational In-Memory Engine

Microstrategy announced their new PRIME option for their cloud customers at their 2014 annual conference in Las Vegas.

Microstrategy PRIME is a massively scalable, cloud-based, in-memory analytics service designed to deliver extremely high performance for complex analytical applications that have the largest data sets and highest user concurrency.

No doubt, Microstrategy PRIME can address the reporting problems of many Big Data customers while enforcing security and role-based access.

Key features

1. Massively parallel, distributed, in-memory architecture for extreme scale, designed to run on cost-effective commodity hardware.

2. Complex analytic problems can be partitioned across hundreds of CPU cores and nodes to achieve unprecedented performance.

3. Microstrategy has worked closely with leading hardware vendors to take full advantage of today's multicore, high-memory servers.

4. Tightly integrated dashboard engine for beautiful, easy-to-use applications built on the Microstrategy analytics platform.

5. The visualization engine includes hundreds of optimizations designed specifically for the in-memory data store. This engine enables customers to build complete, immersive applications that deliver high-speed response.

6. PRIME is available as a service on Microstrategy cloud Analytics platform.


7. PRIME is an MPP Engine optimized and tightly coupled with the visualization and dashboard front-end.

8. PRIME was co-developed with Facebook by co-locating Microstrategy engineers to Facebook.

9. Finally the PRIME solution is not open to third party front ends (like SAP HANA) at this moment, but may be available in the future.

Mobile Analytics with Microstrategy


In this series we will look at the introduction and setup of Microstrategy Mobile Analytics:

Develop a Freeform SQL report on the Microstrategy Mobile platform, so active mobile users can update their OLTP systems for placing orders and updating budgets and financials, which in turn can integrate with your data warehouse data.

Set up Microstrategy app access for iOS and Android mobile devices.


Friday, December 13, 2013

Data Modelling - How to?

Data modelling is the most important activity in any BI project. A well-planned data model helps to analyze business performance from many perspectives.

The data modelling procedure is the same for building transactional (OLTP) and analysis (OLAP) database systems.

Data modelling for transactional database systems is termed relational data modelling, and for analysis database systems, dimensional modelling.

It is important to understand that we need to add additional information in a dimensional model to accommodate aggregated relational data (also called facts) and group description data (also called attributes).

The relational data model is well normalized for effective read and write operations and is designed in third normal form (3NF).

The dimensional data model is de-normalized for quick read-only operations.

Both normalized and de-normalized database systems require an effective data model for future maintenance and enhancements.
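
To make the dimensional side concrete, here is a minimal star-schema sketch (table names, columns and values invented for illustration) using SQLite from Python: a de-normalized dimension lookup table joined to a fact table and aggregated by an attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension table: descriptive attributes (a de-normalized lookup).
cur.execute("CREATE TABLE dim_product "
            "(product_id INTEGER PRIMARY KEY, name TEXT, category TEXT)")
# Fact table: numeric measures keyed by dimension IDs.
cur.execute("CREATE TABLE fact_sales (product_id INTEGER, amount REAL)")

cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Road Bike", "Bikes"), (2, "Helmet", "Accessories")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                [(1, 1200.0), (1, 1100.0), (2, 60.0)])

# A typical dimensional query: aggregate facts, grouped by an attribute.
cur.execute("""SELECT d.category, SUM(f.amount)
               FROM fact_sales f JOIN dim_product d USING (product_id)
               GROUP BY d.category ORDER BY d.category""")
print(cur.fetchall())  # [('Accessories', 60.0), ('Bikes', 2300.0)]
```

In a fully relational (3NF) model the category would likely live in its own table; the dimensional model trades that normalization away so read-only analytical queries need fewer joins.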

The following are the steps involved in an effective data modelling process, carried out sequentially:


1. Data Modelling Overview

2. Elements used in Logical Data Models

3. Physical Elements of Data Models

4. Normalizing a Data Model

5. Requirements Gathering

6. Interpreting Requirements

7. Creating a Logical Model

8. Common Data Modelling Problems

9. Creating the Physical Model in a Database System.

10. Indexing Considerations

11. Creating an Abstraction Layer in Database System.


Tuesday, December 3, 2013

Setting up Adventureworks[AW] Datawarehouse[DW] Database as a Project Source for Microstrategy Projects.


The next step is to import the AW-DW Database into Microstrategy.

Pre-requisite 


1. Microsoft SQL Server 2008 R2 is installed

2. Microstrategy is installed and configured
3. Have Admin privileges on SQL Server and Microstrategy Boxes.

Installing Microsoft Adventureworks datawarehouse database.


Step 1: Download the Microsoft AdventureWorksDW2008R2_Data.mdf file from the link below:

http://msftdbprodsamples.codeplex.com/downloads/get/363848


Step 2: Log in to MS SQL Server with admin privileges and attach the mdf file without the log file. Place the downloaded mdf file in the path
C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA
and then run the script below in SQL Server:

EXEC sp_attach_single_file_db @dbname='AdventureWorksDW2008R2_Data',

@physname=N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\AdventureWorksDW2008R2_Data.mdf'
GO

This should attach the database without the log file.



Now the Microsoft AdventureWorks database is ready as a source database for Microstrategy projects.


Adventure Works Datawarehouse Data Model 


Download the data model for the AdventureWorks data warehouse database by right-clicking and saving the picture below.

[Image: AdventureWorks data warehouse data model]
This can be used as a reference to create a Logical Data Model for our Microstrategy Project. 

Monday, November 4, 2013

Microstrategy Analytics Desktop

Points to note before using Microstrategy Free desktop.

1. Microstrategy Analytics Desktop has a file size limitation of 100 MB

2. No secure User Authentication.

3. Runs on the default address 127.0.0.1:8082, powered by the Jetty web server. The user cannot change the IP or port number, which means one cannot run Microstrategy Desktop using the computer's Ethernet IP address, nor change the port from 8082 to some other port number.

4. Can be accessed only through a web browser (no iPad or Android apps)

5. Cannot connect to OLAP cube sources.



Wednesday, October 9, 2013

Microstrategy Usher Pro


Usher is a mobile identity platform that enables enterprises to render digital credentials or badges on users' mobile devices.
Usher increases enterprise security and safety by allowing organizations to:
    • Issue digital identification badges to employees on their phones
    • Validate and ensure the identity or credentials of any individual
    • Replace passwords with time-limited access codes
    • Replace physical keys and access cards with software keys
    • Securely sign, store and share digital documents and media
    • Capture abnormal activity with behavior monitoring analytics
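
The "time-limited access codes" bullet is essentially the one-time-password idea. As an illustrative sketch (the standard HOTP/TOTP construction from RFC 4226/6238, not Usher's actual mechanism, with an invented shared secret), a short numeric code that expires every 30 seconds can be derived like this:

```python
import hashlib, hmac, struct, time

def time_code(secret: bytes, interval: int = 30, digits: int = 6) -> str:
    # HMAC the current 30-second time step with the shared secret, then
    # truncate the MAC to a short numeric code (HOTP-style truncation).
    step = int(time.time()) // interval
    mac = hmac.new(secret, struct.pack(">Q", step), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = (struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF) % 10 ** digits
    return f"{code:0{digits}d}"

print(time_code(b"shared-secret"))  # e.g. '492039' - a new code every 30 s
```

Because the server holds the same secret and clock, it can recompute and verify the code without any password ever crossing the network.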
Usher System Overview

The Microstrategy Usher Pro mobile identity system is comprised of the Usher mobile app, which runs on members' iPhone and Android smartphones; the Usher cloud-based services that maintain and validate members' credentials; and the diverse authentication databases and systems that are managed by the Usher customer.
Usher secures data stored and transmitted within the solution with strong end-to-end encryption.


Although Usher is synchronized with customer authentication databases and systems, it does not access them directly, ensuring the integrity of those critical systems is maintained. With this integrated system, Usher members can easily and securely confirm one another's identity face to face or remotely, even when they are from different organizations using distinct identity management technologies.
“Usher is simply a better way to manage your Facebook Events,” said Michael J. Saylor, Chairman and CEO of Microstrategy Inc.

Thursday, October 3, 2013

Terms & Definitions


Aggregate Function – A numeric function that acts on a column of data and produces a single result. Examples include SUM, COUNT, MAX, MIN, AVG.
Aggregate Table – A fact table that stores data that has been aggregated along one or more dimensions
Application Object – An object used to provide analysis of and insight into relevant data. The definition of application objects such as reports, documents, filters, templates, custom groups, metrics and prompts are derived from schema objects. All of these objects can be built and manipulated in Microstrategy Web.
Attribute – A data level defined by the system architect and associated with one or more columns in a data warehouse  lookup table. Attributes include data classifications like Region, Order, Customer, Age, Item, City and Year. They provide a means for aggregating and filtering at a given level.
Attribute Element – A unique set of information for an attribute, defined by the attribute forms. For example – New York and Dallas are elements of the attribute city, January and February are elements of the attribute Month.
Attribute Forms – One of several columns associated with an attribute that are different aspects of the same thing. ID, Name, Last Name, Long Description and Abbreviation could be forms of the attribute Customer. Every attribute supports its own collection of forms.
Attribute Form Expression – A mapping to the columns in the warehouse that are used to represent a specific attribute form in SQL.
Attribute Role – A database column that is used to define more than one attribute. For example, Billing City and Shopping City are two attributes that have the same table and columns defined as a look up table.
Axis – A vector along which data is displayed. There are three axes: Row, Column and Page. When a user defines a template for a report, he places template units – attributes, dimensions, metrics, consolidations and custom groups – along each axis.
Base Fact Column – A fact column represented by a single column in a fact table.
Browse Attribute – An attribute a user can directly browse to from a given attribute in a user hierarchy.
Business Intelligence (BI) System – A system that facilitates the analysis of volumes of complex data by providing the ability to view data from multiple perspectives.
Cache – A special data store holding recently accessed information for quick future access. This is normally done for frequently requested reports, whose execution is faster because they need not run against the database.
Results from the data warehouse are stored temporarily and can be used by new job requests that require the same data.
In Microstrategy environment when a user runs a report for the first time, the job is submitted to the database for processing. However if the results of that report are cached the results can be returned immediately without having to wait for the database to process the job the next time the report is run.
Cardinality – The number of unique elements for an attribute
Child Attribute – The lower-level attribute in an attribute relationship.
Column – (1) A one-dimensional vertical array of values in a table. (2) The set of fields of a given name and data type in all the rows of a given table. (3) A Microstrategy object in the schema layer that can represent one or more physical table columns, or no columns.
Column Alias – In a fact definition, the specific name of the column to be used in temporary tables and SQL statements. Column aliases also include the data type to be used for the fact and allow you to modify the names of existing metrics for use in data mart reports without affecting the original metric.
Compound Attribute – An attribute that has more than one key (ID) form.
Compound Key – In a relational database, a Primary Key consisting of more than one database column.
Conditionality – Conditionality of a metric enables you to associate an existing filter object with the metric so that only data that meets the filter conditions is included in the calculation.
Configuration Object – A Microstrategy object appearing in the system layer and usable across multiple projects. Configuration objects include the following object types – Users, Database Instances, Database Logins, ID and Schedules.
Custom Group – An object that can be placed on a template and is made up of an ordered collection of elements called custom group elements. Each element contains its own set of filtering qualifications.
Data Explorer – A portion of interface used to browse through data contained in the warehouse. Users can navigate through hierarchies of attributes that are defined by the administrator to find the data they need.
Data Source – A data source is any file, system or storage location which stores data that is to be used in Microstrategy for query, reporting and analysis.
A data warehouse can be thought of as one type of data source, which refers more specifically to using a database as your data source.
Other data sources includes text files, excel files, and MDX cube sources such as SAP BW, Microsoft Analysis Services 2000 and 2005 and Hyperion Essbase.
Data Warehouse – A database, typically very large, containing the historical data of an enterprise. Used for decision support or business intelligence, it organizes data and allows coordinated updates and loads.
A copy of transaction data specifically structured for query, reporting and analysis.
Database Instance – A Microstrategy object created in Microstrategy Desktop that represents a connection to the warehouse.  A database instance specifies warehouse connection information such as the data warehouse DSN, Login ID and password, and other data warehouse specific information.
Database server software running on a particular machine. Although it is technically possible to have more than one instance running on a machine, there is usually only one instance per machine.
Degradation – A type of fact extension in which values at one level of aggregation are reported at a second, lower attribute level.
Description column – Optional columns that contain text descriptions of attribute elements.
Derived Attribute – An attribute calculated from a mathematical operation on columns in a warehouse table. For example – Age might be calculated from this expression – [Current Date – Birthdate]
Derived Fact Column – A fact column created through a mathematical combination of other existing fact columns.
Derived Metric – A metric based on data already available in a report. It is calculated by Intelligence Server, not in the database.  Use a derived metric to perform column math, that is calculations on other metrics, on report data after it has been returned from the database.
Drill – A method of obtaining supplementary information after a report has been executed. The new data is retrieved by re-querying the Intelligent Cube or database at a different attribute or fact level.
Dynamic Relationship – When the relationship between elements of parent and child attributes changes. These changes often occur because of organizational restructuring; geographic realignment; or the addition, reclassification or discontinuation of items or services.
For example – a store may decide to reclassify the department to which items belong.
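
A couple of the terms above can be made concrete with a small sketch (SQLite from Python; the table, columns and values are invented). The SQL aggregates play the role of metrics computed in the database, while the final division is analogous to a derived metric: column math performed on metrics after the results come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("East", 100.0, 4), ("East", 300.0, 6), ("West", 50.0, 2)])

# Aggregate functions: collapse a column of data into one value per group.
rows = conn.execute(
    "SELECT region, SUM(revenue), SUM(units) FROM sales "
    "GROUP BY region ORDER BY region").fetchall()
print(rows)  # [('East', 400.0, 10), ('West', 50.0, 2)]

# "Derived metric": calculated from metrics already in the result set,
# after the database has returned them (as Intelligence Server would).
price_per_unit = {region: rev / units for region, rev, units in rows}
print(price_per_unit)  # {'East': 40.0, 'West': 25.0}
```

Computing revenue-per-unit this way avoids another round trip to the warehouse, which is exactly the appeal of derived metrics in the definition above.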

Wednesday, October 2, 2013

Project Design Essentials – v 9.2.1 – 9.4.1


Contents

1. Introduction to Microstrategy Architect

2. Designing the Logical Data Model

3. Designing the Data Warehouse Schema

4. Creating a Project in Microstrategy Architect

5. Introduction to Architect Graphical Interface

6. Working with Tables

7. Working with Facts

8. Working with Attributes

9. Working with User Hierarchies

10. Automatic Schema Recognition

11. Other Methods for Creating Schema Objects

Report Developer Essentials v 9.2.1 – 9.4.1


Contents

1. Introduction to Microstrategy Business Intelligence

2. Introduction to Microstrategy Desktop

3. Reports

4. Report Data Manipulations

5. Report Style Manipulations

6. Filters

7. Metrics

8. OLAP Services

9. Prompts and Searches

Tuesday, September 24, 2013

BI Vendors in Market

Right now, most of the user activity in the BI and analytics platform market is from organizations that are trying to mature from descriptive to diagnostic analytics. The vendors in the market have overwhelmingly concentrated on meeting this user demand. If there were a single market theme in 2012, it would be that data discovery became a mainstream architecture. 

For years, data discovery vendors — such as QlikTech, Salient Management Company, Tableau Software and Tibco Spotfire — received more positive feedback than vendors offering OLAP cube and semantic-layer-based architectures. In 2012, the market responded:

  • MicroStrategy significantly improved Visual Insight.
  • SAP launched Visual Intelligence.
  • SAS launched Visual Analytics.
  • Microsoft bolstered PowerPivot with Power View.
  • IBM launched Cognos Insight.
  • Oracle acquired Endeca.
  • Actuate acquired Quiterian.
This emphasis on data discovery from most of the leaders in the market — which are now promoting tools with business-user-friendly data integration, coupled with embedded storage and computing layers (typically in-memory/columnar) and unfettered drilling — accelerates the trend toward decentralization and user empowerment of BI and analytics, and greatly enables organizations' ability to perform diagnostic analytics.