Friday, May 25, 2012

Hadoop MapReduce

As mentioned before, one of the challenges of turning data into something truly valuable is analyzing Big Data.  One of the tools the business intelligence community uses to meet this challenge is a software framework called Hadoop.  Hadoop takes advantage of the MapReduce concept to process large amounts of data quickly.  To explain how this works, consider the following example.

If mom goes to the grocery store alone, getting the groceries on her list will take a certain amount of time.  If mom, dad, and the two kids all go and each takes responsibility for 25% of the items, the trip will take far less time.  However, the items in the four separate grocery carts will then need to be consolidated into one cart before being presented at the cash register.

In the same way, Hadoop distributes data across different servers in a cluster.  When Hadoop is asked to execute a job, each server works on its piece of that job in parallel.  In other words, the job is divided and mapped to the various servers.  When finished, each server holds one piece of the solution.  Those pieces are then combined (the reduce step) so that the many partial results are reduced to one complete solution.  This video does a great job of explaining this concept.
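For readers curious about what this looks like in code, here is a rough sketch based on the well-known word-count example for Hadoop's Java MapReduce API.  The mapper plays the role of each shopper working on their slice of the list, and the reducer plays the role of consolidating the carts.  Treat it as an illustrative sketch rather than a production job; input and output paths are whatever you pass on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map step: each server scans its slice of the input and emits (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce step: the partial counts for each word are combined into one total.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Hadoop runs many copies of the mapper, one per chunk of input data, on whichever servers hold that data, and then routes all the values for a given word to a single reducer, which adds them up into the final answer.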

