Friday, June 29, 2012

Big Data Defined

A typical question asked of business intelligence professionals is: what is big data?  In a previous post I provided some explanation of the big data challenge.  As I've learned more about this very interesting piece of business intelligence, I've come to believe that a more complete explanation is possible.  Consider this...

Suppose you own a day care, meaning that you are in the business of caring for children each day.  You have determined that in order for you to have a truly challenging day, three things have to occur.  First, you must have a large number of kids to care for on that given day.  How large is large?  There are no hard and fast rules...large enough that your current resources are strained at best and inadequate at worst.  Second, let's assume that you have no help, meaning that you need to know what each child is doing right now.  Not knowing what one of them was doing for an hour and then finding out later that he was painting the refrigerator is not acceptable.  Third, you have a very diverse group of kids.  This does not refer to diversity with regard to race or ethnicity but diversity with regard to personality.  Some of the kids love to play outside and some inside.  Some are into puzzles and others are into riding bicycles.  As a result, keeping the kids engaged in activities that they enjoy and in which they are gifted can be challenging.

If only two of the three challenges exist, the day is still challenging but not to the same extent.  You can know what each child is doing in a diverse group if you have a small number of them.  You can deal with a large number of kids if they are ALL sitting in the same room, doing the same thing.  It's the combination of the three problems that presents the real challenge.  Now, consider this in terms of big data.  Big data is characterized as having three V's.

1.) Volume - In order for data to be considered big data it must be large.  How large is large?  I'm not so sure that there is a hard and fast boundary between "normal" data and big data in terms of size.  However, I would assume that if managing the data for the purposes of business intelligence presents problems because of its size, then this V applies.

2.) Velocity - Part of the big data problem involves dealing with the speed at which the data comes in.  Decision makers want to know what is happening in their business and in the marketplace now, as opposed to experiencing a lag.  For example, if an announcement was just made with regard to a new line of business, what are people saying about that on Twitter now?  If there is a need to stream data in some way so that it can be analyzed in real time, then this V applies.

3.) Variety - Part of the big data problem involves dealing with various types of data.  Relational databases are good at storing structured data.  Structured data is data in which each element is placed into a fixed area (such as columns and rows in a database or an XML schema) that was created for that specific element's characteristics.  Dates belong here, integers belong there, etc.  Unstructured data, such as a tweet or the body of an email message, is more free-form.  In other words, there is nothing governing the type of data that is stored in those environments.  If decision makers want to analyze various types of data (such as both structured and unstructured data) then this V applies.

While challenges may exist when experiencing only one or two of the above-mentioned V's, the industry generally agrees that if all three characteristics apply to your data, then you have big data.
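For readers who think in code, the three-V test can be sketched as a simple checklist.  This is only an illustration of the rule of thumb described in this post, not an industry-standard definition: the class name, the field names, and the two example data sets are all made up, and deciding whether each V applies is still a judgment call for your own environment.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    """Hypothetical description of a data set, with one yes/no flag per V."""
    strains_current_resources: bool          # Volume: size causes BI problems
    needs_real_time_analysis: bool           # Velocity: data must be streamed
    mixes_structured_and_unstructured: bool  # Variety: more than one data type


def count_vs(profile: DataProfile) -> int:
    """Count how many of the three V's apply to the data set."""
    return sum([
        profile.strains_current_resources,
        profile.needs_real_time_analysis,
        profile.mixes_structured_and_unstructured,
    ])


def is_big_data(profile: DataProfile) -> bool:
    """Rule of thumb from this post: big data means all three V's apply."""
    return count_vs(profile) == 3


# Two invented examples: a large but otherwise ordinary nightly load,
# and a fast-moving feed that combines tweets with relational data.
nightly_sales = DataProfile(True, False, False)
social_feed = DataProfile(True, True, True)

print(is_big_data(nightly_sales))  # False
print(is_big_data(social_feed))    # True
```

As in the day care example, a data set with only one or two flags set can still be challenging; it just does not meet the three-V description used here.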


