A pig in the incubator

Posted on Sun 23 September 2007

The people at Yahoo have proposed their Pig project to enter the Apache Incubator. Pig is a high-level data processing language built on top of the low-level mapreduce primitives provided by Hadoop.

We've seen at Joost that mapreduce makes sophisticated data analysis possible, but having to write a Java program for every single analysis not only takes longer than querying a traditional database, it is also a poor fit for the ad-hoc queries we regularly need. SQL is still king for that kind of work, but the volume of data we have to process makes it impractical.
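To make the contrast concrete, here is a minimal in-memory sketch of the map, shuffle and reduce steps needed to answer one simple question: how many videos did each user watch? In SQL that is a single line, something like `SELECT user_id, COUNT(*) FROM views GROUP BY user_id`. The sketch is in Python rather than Hadoop's Java API, and the record layout, table name and function names are all hypothetical. A real Hadoop job would wrap the same logic in Java classes and add job configuration and deployment.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input: one (user_id, video_id) record per viewing event.
Record = Tuple[str, str]


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, int]]:
    # Emit one (key, 1) pair per record, keyed by user.
    for user_id, _video_id in records:
        yield user_id, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Sum the values collected for each key.
    return {key: sum(values) for key, values in groups.items()}


def views_per_user(records: Iterable[Record]) -> Dict[str, int]:
    # One analysis = one full map/shuffle/reduce pipeline.
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    log = [("alice", "v1"), ("bob", "v2"), ("alice", "v3")]
    print(views_per_user(log))  # {'alice': 2, 'bob': 1}
```

Every new question means writing, testing and deploying another program like this one, which is exactly the overhead a higher-level language on top of mapreduce is meant to remove.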

So Pig, by bringing a SQL-like language to Hadoop, will help bridge the gap. Next, I hope to see integration with reporting tools such as Eclipse BIRT, so that managers can build mapreduce jobs by themselves!
