Map-reduce for dummies

Posted on Wed 31 January 2007
While browsing around, I came across an article by Joel Spolsky that introduces, simply and step by step, the principles of the map-reduce pattern underlying a big part of Google's infrastructure. A must-read!

We use map-reduce at Joost to process usage data (roughly equivalent to the log files on a web server) and extract lots of useful information about how the platform is used. This is built on Apache Hadoop, an open-source implementation of map-reduce.
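To make the pattern concrete, here is a minimal single-process sketch in Python of a map-reduce job over usage records. It is not our actual Hadoop job: the record format ("user channel seconds"), the sample data, and all function names are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical usage records: "user_id channel seconds_watched".
LOG_LINES = [
    "u1 news 120",
    "u2 music 300",
    "u1 music 60",
    "u3 news 45",
]

def map_record(line):
    """Map step: turn one record into (key, value) pairs."""
    _user, channel, seconds = line.split()
    yield channel, int(seconds)

def shuffle(pairs):
    """Group all values that share a key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_channel(channel, values):
    """Reduce step: collapse the values of one key into a single result."""
    return channel, sum(values)

def run_job(lines):
    pairs = (pair for line in lines for pair in map_record(line))
    return dict(reduce_channel(k, v) for k, v in shuffle(pairs).items())

totals = run_job(LOG_LINES)
print(totals)
```

The map step looks at each record in isolation and the reduce step only sees the values of one key, which is what lets a framework such as Hadoop spread both steps over many machines.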

Given the still limited number of users we have, map-reduce isn't strictly necessary, and a SQL database could have done the trick. But with the huge user base we expect once Joost becomes generally available, a solution that scales mostly by throwing in more machines is a must.
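Why does adding machines help? A rough sketch, again in Python and with made-up data: each chunk of the log can be counted independently, and the partial counts can then be merged in any order. The chunks stand in for the share of data each machine would hold; the names and record format are hypothetical.

```python
from collections import Counter
from functools import reduce

def count_chunk(lines):
    """Work one machine could do alone: count events per channel in its chunk."""
    return Counter(line.split()[1] for line in lines)

def merge(left, right):
    """Combine two partial results; Counter addition sums counts per key."""
    return left + right

# Hypothetical "user_id channel" records, split into two chunks.
chunks = [
    ["u1 news", "u2 music"],
    ["u1 music", "u3 news", "u4 news"],
]

partials = [count_chunk(chunk) for chunk in chunks]  # could run in parallel
combined = reduce(merge, partials, Counter())
print(combined)
```

Because merging is associative, splitting the same data into more chunks changes how much work each machine does but not the final result.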

