Friday, May 25, 2012
If mom goes to the grocery store alone, getting the groceries on her list will take a certain amount of time. If mom, dad, and the two kids all go and each takes responsibility for getting 25% of the items, the total duration will be lessened. However, this will require that the items in the four separate grocery carts are consolidated into one before being presented at the cash register.
In the same way, Hadoop places data onto different servers in a cluster. When Hadoop is asked to execute a job, the servers each work on a piece of that job in parallel. In other words, the job is divided and is mapped to the various servers. When complete, each server has one piece of the solution. That solution is then compiled (the reduce step) so that the many parts of the solution are reduced to the one complete solution. This video does a great job of explaining this concept.