Tuesday, November 8, 2016

Building Hadoop on Solaris

This is the first in a series of posts about Hadoop on Solaris. I'll start out with building Hadoop (the latest stable version is 2.7.3) on Solaris, on both x86 and SPARC.

I am using the latest official Solaris 11.3 on both x86 and SPARC. My x86 machine is a VirtualBox image (with 5 GB of RAM). My SPARC machine is a Solaris 11.3 zone on a T5-2. The tools needed and the build procedure are identical for both platforms.
 
Installing tools needed for building

The following tools are from the official Oracle repository and can be simply installed using standard Solaris package commands, as follows:
 
# pkg install developer/gcc
# pkg install --accept developer/java/jdk
# pkg install developer/build/ant
# pkg install developer/build/cmake
# pkg install git

The following tools are not in the official Oracle repository, so you need to download and install them yourself, as follows:
Protobuf 2.5 from Google
Hadoop requires version 2.5 of protobuf since later versions are not backward compatible. You can either try to reconstruct it from Google's GitHub tree, or just get the sources from here.

This is a standard autoconf-style project, so do:
$ ./configure 
$ gmake 
$ sudo gmake install
Note! By default this installs into /usr/local/bin, which is not in the path.
Edit .profile to add /usr/local/bin to the path, then log out and back in for the change to take effect.
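
As a minimal sketch of that .profile change, assuming a POSIX-style login shell (bash or ksh93), the lines below append /usr/local/bin only if it is missing, then ask protoc for its version when it can be found. Hadoop 2.7.3 needs the reported version to be 2.5.0.

```shell
# Append /usr/local/bin to PATH only if it is not already there.
case ":$PATH:" in
  *:/usr/local/bin:*) ;;                  # already present, nothing to do
  *) PATH="$PATH:/usr/local/bin" ;;
esac
export PATH

# If protoc is now reachable, print its version as a sanity check.
if command -v protoc >/dev/null 2>&1; then
  protoc --version
fi
```

The case guard keeps the path from growing a duplicate entry every time .profile is read.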

Maven
Maven is the primary build tool used by Apache for many of their latest projects.

It is a Java binary and does not need to be compiled, so just download it from http://maven.apache.org/download.cgi
  1. Unzip under /usr/local
  2. Add /usr/local/apache-maven-3.3.9/bin to path (in .profile)
  3. Finally test with mvn --version 
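
The three steps can be sketched as follows. The archive name apache-maven-3.3.9-bin.zip is an assumption about what you downloaded, so adjust it to match your file. The two PATH lines are what would go into .profile.

```shell
# Assumed name of the downloaded archive; change it to match your download.
MAVEN_ZIP=apache-maven-3.3.9-bin.zip
MAVEN_HOME=/usr/local/apache-maven-3.3.9

# Step 1: unpack under /usr/local (needs root privileges).
if [ -f "$MAVEN_ZIP" ]; then
  sudo unzip -q "$MAVEN_ZIP" -d /usr/local
fi

# Step 2: add Maven's bin directory to the path (put these lines in .profile).
PATH="$PATH:$MAVEN_HOME/bin"
export PATH

# Step 3: check that the shell finds mvn and that it starts.
if command -v mvn >/dev/null 2>&1; then
  mvn --version
fi
```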
Maven expects to be connected to the internet during use, since it downloads the plugins and dependencies needed for the build process.
 
If you are running behind a firewall or inside a VPN, you will need to inform Maven of the proxy. Create the file ~/.m2/settings.xml with the following content:
 
<settings>
  <proxies>
    <proxy>
      <active>true</active>
      <protocol>http</protocol>
      <host>###.###.###.###</host>
      <port>80</port>
    </proxy>
  </proxies>
</settings>

Replace ###.###.###.### with the IP address of your proxy. 
 
Building Hadoop

Now that we have all the tools in place, download the Hadoop source from the official repository and unpack it. Then cd to the source directory:

$ cd hadoop-2.7.3-src 

Install all the Maven plugins that are needed for building Hadoop:

$ cd hadoop-maven-plugins
$ mvn install

Now you can compile all the modules from the top of the source directory:

$ cd ..
$ mvn compile -fae -l output.txt

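I like to add -fae (fail at end), so the build carries on past a failing module and reports the failures at the end, as well as the -l option, which writes all build output, errors included, to a file which I can inspect at my leisure.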
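
For a quick look at that log, here is a small sketch of a helper that counts the lines Maven marks with [ERROR] and prints the first few with their line numbers. The function name summarize_build_log is made up for this example, and it assumes the log was written with -l as above.

```shell
# Hypothetical helper: summarize the [ERROR] lines in a Maven log file.
# Usage: summarize_build_log [logfile]   (defaults to output.txt)
summarize_build_log() {
  log="${1:-output.txt}"

  # Count lines that Maven prefixed with [ERROR].
  errors=$(grep -c '^\[ERROR\]' "$log")
  echo "errors: $errors"

  # Show the first 20 error lines together with their line numbers.
  grep -n '^\[ERROR\]' "$log" | head -n 20
}
```

Running summarize_build_log output.txt from the top of the source directory gives a starting point before opening the full file.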

For all Maven build options, run:

$ mvn --help

You can run the test suite after compiling with:

$ mvn test
 
Beware that this can take up to 10 hours if running from the root directory. Also, don't expect all tests to pass. Even on Linux there are many failures.
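
One way to keep a test run short is to run it for a single module, or a single test class, instead of from the root. The sketch below uses the hadoop-common module and its TestConfiguration class purely as examples. -Dtest is the Surefire option for picking a test class, and the log file name test-output.txt is arbitrary.

```shell
# Example module, relative to the top of the source directory (an assumption;
# pick whichever module you are interested in).
MODULE=hadoop-common-project/hadoop-common

# Run one test class in that module only, logging the output to a file.
if [ -d "$MODULE" ] && command -v mvn >/dev/null 2>&1; then
  (cd "$MODULE" && mvn test -Dtest=TestConfiguration -l test-output.txt)
fi
```

Dropping the -Dtest option runs every test in that one module.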

My next blog will be on installing and running Hadoop.

Cheers!