Friday, April 27, 2012

Big Data

One topic that comes up often in discussions of Business Intelligence is Big Data.  Big Data refers to a data set so large that working with it using traditional relational database technology is very difficult and perhaps impractical.  So, let's say that a star schema is designed in hopes of increasing business intelligence in some area, and an ETL job is written to populate that star.  If that ETL job takes days to run because of the size or complexity of the data (and not because of inefficient code within the ETL job, which can be corrected by rewriting it), then that data set may be referred to as Big Data, and it can be quite frustrating.

There are new technologies being developed to help deal with this "thorn in the side" of data warehousing professionals.  Remember from this blog's first post that the point of a data warehouse is to create a new environment for data (apart from the source system), in which it is restructured so that it is optimal for analysis.  Some of these new technologies allow an analyst to view data in a source system in the same way that she would in a data warehouse, without needing a data warehouse.  One of these is an SAP product called HANA.  A product like HANA, which keeps its data in memory, is designed to process records much faster than traditional disk-based relational databases.
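To make the contrast concrete, here is a minimal sketch of the two routes to the same answer: querying the source data where it sits, versus first loading it into a separate table built for analysis.  It uses Python's built-in sqlite3 module purely as a stand-in (it is not HANA and says nothing about HANA's performance), and the table names, columns, and rows are all made up for illustration.

```python
import sqlite3

# Hypothetical source-system table; names and rows are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_orders (order_id INTEGER, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO source_orders VALUES (?, ?, ?)",
    [(1, "2012-04-25", 10.0), (2, "2012-04-25", 5.0), (3, "2012-04-26", 7.5)],
)


def daily_totals_from_source(conn):
    """Answer the question by aggregating the source table directly."""
    return conn.execute(
        "SELECT order_date, SUM(amount) FROM source_orders "
        "GROUP BY order_date ORDER BY order_date"
    ).fetchall()


def daily_totals_via_warehouse(conn):
    """Answer the same question after an ETL-style step fills a summary table."""
    conn.execute("DROP TABLE IF EXISTS fact_daily_sales")
    conn.execute(
        "CREATE TABLE fact_daily_sales (order_date TEXT, total_amount REAL)"
    )
    # The "ETL job": restructure the source rows into the analysis table.
    conn.execute(
        "INSERT INTO fact_daily_sales "
        "SELECT order_date, SUM(amount) FROM source_orders GROUP BY order_date"
    )
    return conn.execute(
        "SELECT order_date, total_amount FROM fact_daily_sales ORDER BY order_date"
    ).fetchall()


# Both routes give the same totals; they differ in where the work happens.
print(daily_totals_from_source(conn))
print(daily_totals_via_warehouse(conn))
```

With three rows either route is instant.  The trade-off only starts to matter when the source table is large enough that the load step, or the direct query, becomes the bottleneck.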

So, revisiting the piggy bank example from this blog's first post, suppose that Raymond Babbitt (Dustin Hoffman's character in the movie Rain Man) peered inside the piggy bank.  Someone with his talent would be able to quickly determine the amount of money inside the piggy bank without needing to place the coins into money rolls.  In this case, the same business intelligence can be gained even though the coins remain in the source system (the piggy bank).  So, what does this mean for skills in dimensional modeling and ETL development?  I would imagine that there is some debate on that.  I'll leave that for a future post.

I'll say parenthetically that I'm neither endorsing nor criticizing HANA.  I'm simply mentioning it as an example of a piece of technology that is designed to deal with Big Data.

