I've been following this blog:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
I'm using Mac Minis to build a Hadoop cluster. Big Data, and all that.
After successfully figuring out how to image the Ubuntu Server install from one Mac Mini to many, I've got a 9-node cluster running tip-top.
The WordCount example takes ~22 minutes to process a 20 GB file on the 9-node cluster. That's a nearly linear improvement over the 2-node and 4-node clusters I ran before it.
The recent breakthrough with the multi-node (more than 2 nodes) cluster is that the /etc/hosts file on every node needs to have ALL the nodes in it, not just the master and the node itself.
Once I did that, the instructions work and life is grand :)
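For anyone who wants to see the /etc/hosts fix spelled out, here is a minimal sketch in Python that prints the block of entries every node should carry. The hostnames (master, slave1 to slave8) and the 192.168.0.x addresses are assumptions for illustration only, not my actual network layout; substitute your own.

```python
def hosts_entries(nodes):
    """Return /etc/hosts lines, one per node, in 'IP<TAB>hostname' form."""
    return [f"{ip}\t{name}" for name, ip in nodes]

# Hypothetical 9-node layout: one master plus eight slaves.
# Names and addresses are placeholders, not taken from a real cluster.
nodes = [("master", "192.168.0.1")] + [
    (f"slave{i}", f"192.168.0.{i + 1}") for i in range(1, 9)
]

# The same full list goes into /etc/hosts on EVERY machine,
# not just the master's entry plus the local node's own entry.
for line in hosts_entries(nodes):
    print(line)
```

The point is that the output is identical for every machine: paste the same nine lines into /etc/hosts on each node so that any node can resolve any other by name.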
Thanks to Michael Noll for putting this together AND maintaining it for those of us making our first inroads into Hadoop and Big Data.
I'm off to build an 81-node cluster, and start pushing some real data through it.