Hadoop Maven Archetype

Today I found the easiest way to generate a Maven-based Hadoop project: use a Maven archetype. It generates a sample Hadoop project that uses Hadoop version 0.20.2 and includes the famous WordCount example. To generate the project, type the following on the command line:

mvn archetype:generate -DarchetypeCatalog=http://dev.mafr.de/repos/maven2/ -DgroupId=com.hadoop.example -DartifactId=hadoop-example

You can also get more information about the archetype at http://blog.mafr.de/2010/08/01/maven-archetype-hadoop/

Maven classpath ordering lesson learnt

Today I was working on a user story that spans two modules (A and B) of our project (C). Since we follow test-driven development, I first wrote a test for the functionality I needed to add in module A and then wrote the code (TDD is not the topic of this post, so I won't go into detail). The test passed with a green bar and I moved on to module B. I wrote a test and then the code, but this time the test failed with the error NoClassDefFoundError: org/objectweb/asm/CodeVisitor. I was surprised that the test passed in module A but failed in module B, because both modules had similar dependencies.

After some googling, I found out that this error occurs because Hibernate depends on cglib-2.1_3.jar, which uses an older version of the asm jar that still contained the CodeVisitor class. CodeVisitor has since been retired and does not exist in newer versions of the asm jar. That left the question of why the JUnit test was passing in module A. To find out, I inspected the Maven dependencies of both modules using


mvn dependency:tree

In module A, I found that it loaded cglib-nodep-2.1_3.jar, not cglib-2.1_3.jar. cglib-nodep-2.1_3.jar was loaded because easymockclassextension depends on the cglib-nodep jar.

In module B, I found that it loaded cglib-2.1_3.jar, not cglib-nodep-2.1_3.jar. This jar was loaded because Hibernate depends on the cglib jar.

Now the question was why the cglib-nodep jar was loaded in module A but the cglib jar in module B. I looked at the pom.xml files and found that in module A the easymockclassextension dependency was declared before the Hibernate dependencies:

<dependencies>
  <dependency>
    <groupId>org.easymock</groupId>
    <artifactId>easymock</artifactId>
    <version>2.5.2</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.easymock</groupId>
    <artifactId>easymockclassextension</artifactId>
    <version>2.5.2</version>
    <scope>test</scope>
  </dependency>

  <dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.15</version>
  </dependency>
  <dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-core</artifactId>
    <version>3.3.2.GA</version>
  </dependency>
  <dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-annotations</artifactId>
    <version>3.4.0.GA</version>
  </dependency>
  <dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-commons-annotations</artifactId>
    <version>3.3.0.ga</version>
  </dependency>
  <dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-entitymanager</artifactId>
    <version>3.4.0.GA</version>
  </dependency>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>
</dependencies>

In module B's pom.xml, the easymockclassextension dependency was declared after the Hibernate dependencies:

<dependencies>
  <dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.15</version>
  </dependency>
  <dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-core</artifactId>
    <version>3.3.2.GA</version>
  </dependency>
  <dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-annotations</artifactId>
    <version>3.4.0.GA</version>
  </dependency>
  <dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-commons-annotations</artifactId>
    <version>3.3.0.ga</version>
  </dependency>
  <dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-entitymanager</artifactId>
    <version>3.4.0.GA</version>
  </dependency>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.easymock</groupId>
    <artifactId>easymock</artifactId>
    <version>2.5.2</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.easymock</groupId>
    <artifactId>easymockclassextension</artifactId>
    <version>2.5.2</version>
    <scope>test</scope>
  </dependency>
</dependencies>

Because easymockclassextension was declared before the Hibernate dependencies in module A, cglib-nodep was loaded there, and hence the test passed.

This is how I learned that Maven puts dependencies on the classpath in the order they are declared in pom.xml. As of version 2.0.9, Maven introduced deterministic ordering of dependencies on the classpath. The ordering is now preserved from your pom, with dependencies added by inheritance added last.
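
Relying on declaration order works, but it is easy to break the next time someone rearranges the pom. One possible alternative is to exclude the plain cglib jar and declare cglib-nodep explicitly, so that only one of the two ends up on the classpath regardless of order. The fragment below is only a sketch: it assumes that hibernate-core is the artifact that pulls in cglib (in your build it may be a different Hibernate artifact, so check the output of mvn dependency:tree first) and that Hibernate works with cglib-nodep in place of cglib.

```xml
<dependencies>
  <dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-core</artifactId>
    <version>3.3.2.GA</version>
    <exclusions>
      <!-- Assumption: hibernate-core is the parent of cglib in the
           dependency tree. Move this exclusion to whichever artifact
           mvn dependency:tree reports as the parent. -->
      <exclusion>
        <groupId>cglib</groupId>
        <artifactId>cglib</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <!-- Declared without test scope so that Hibernate still finds a
       cglib implementation outside of the tests. -->
  <dependency>
    <groupId>cglib</groupId>
    <artifactId>cglib-nodep</artifactId>
    <version>2.1_3</version>
  </dependency>
</dependencies>
```

With something like this in both modules, the position of the easymockclassextension dependency should no longer decide which cglib jar is picked up.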