Category Archives: nosql

Developing Single Page Web Applications using Java 8, Spark, MongoDB, and AngularJS

In this post you will learn how to use a micro framework called Spark to build a RESTful backend. The RESTful backend is consumed by a single page web application built with AngularJS, with MongoDB used for data storage. I'll also show you how to run Java 8 on OpenShift. Read the full blog here: https://www.openshift.com/blogs/developing-single-page-web-applications-using-java-8-spark-mongodb-and-angularjs

MongoDB Query Tip: Find All The Documents Where Array Length is Greater Than N

Suppose we have blog documents in the blogs collection, as shown below.

> db.blogs.insert({author : "Shekhar Gulati","title":"Hello World","text":"Hello World!!","tags":["mongodb","openshift"]})
> db.blogs.insert({author : "Shekhar Gulati","title":"Hello World","text":"Hello World!!","tags":["mongodb","openshift","nosql"]})

Now, if you want to find all the blogs which have more than 2 tags, the query is shown below. (A Java driver version, and a variant that avoids server-side JavaScript, follow the output.)

> db.blogs.find({$where : "this.tags.length > 2"}).pretty()
{
	"_id" : ObjectId("51011037bf779459a978f96f"),
	"author" : "Shekhar Gulati",
	"title" : "Hello World",
	"text" : "Hello World!!",
	"tags" : [
		"mongodb",
		"openshift",
		"nosql"
	]
}
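If you are querying from Java, as the later posts in this archive do, the same check can be written with the MongoDB Java driver 2.x. The sketch below is illustrative only: the host, port, and database name are assumptions. It shows the $where form used above and a variant on "tags.2", which matches only when an element exists at index 2 (i.e. the array has more than 2 elements) and avoids running server-side JavaScript.

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

public class BlogsWithMoreThanTwoTags {

	public static void main(String[] args) throws Exception {
		// Connection details are assumptions; adjust host, port, and database as needed.
		Mongo mongo = new Mongo("localhost", 27017);
		DBCollection blogs = mongo.getDB("test").getCollection("blogs");

		// Same result via $where (runs JavaScript on the server, so it is slower).
		DBObject whereQuery = new BasicDBObject("$where", "this.tags.length > 2");
		System.out.println("Matches via $where: " + blogs.count(whereQuery));

		// Alternative without JavaScript: an element at index 2 exists only if the
		// array has more than 2 elements.
		DBObject existsQuery = new BasicDBObject("tags.2", new BasicDBObject("$exists", true));
		DBCursor cursor = blogs.find(existsQuery);
		while (cursor.hasNext()) {
			System.out.println(cursor.next());
		}
		mongo.close();
	}
}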

How to rename a field in all the MongoDB documents?

Today I faced a situation where I needed to rename a field in all the documents of a MongoDB collection. The best way to do this is the $rename operator, as shown below.

db.post.update( {}, { $rename : { "creationDate" : "creationdate" } }, false, true )

Here the last argument, true, corresponds to updating all the documents, i.e. multi is true. (A Java driver version of the same update is sketched after the argument list below.)

From the MongoDB documentation, here's the MongoDB shell syntax for update():

db.collection.update( criteria, objNew, upsert, multi )

Arguments:

  • criteria – query which selects the record to update;
  • objNew – updated object or $ operators (e.g., $inc) which manipulate the object
  • upsert – if this should be an “upsert” operation; that is, if the record(s) do not exist, insert one. Upsert only inserts a single document.
  • multi – indicates if all documents matching criteria should be updated rather than just one. Can be useful with the $ operators mentioned above.
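If you need to do the same rename from Java, here is a rough equivalent using the MongoDB Java driver 2.x update(criteria, objNew, upsert, multi) overload. The connection details are assumptions; the collection name mirrors the shell example above.

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.Mongo;
import com.mongodb.WriteResult;

public class RenameFieldInAllDocuments {

	public static void main(String[] args) throws Exception {
		// Connection details are assumptions; adjust as needed.
		Mongo mongo = new Mongo("localhost", 27017);
		DBCollection posts = mongo.getDB("test").getCollection("post");

		// Empty criteria selects every document; $rename changes the field name.
		DBObject criteria = new BasicDBObject();
		DBObject objNew = new BasicDBObject("$rename",
				new BasicDBObject("creationDate", "creationdate"));

		// upsert = false, multi = true, matching the shell call above.
		WriteResult result = posts.update(criteria, objNew, false, true);
		System.out.println("Documents updated: " + result.getN());
		mongo.close();
	}
}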

How Does the Working Set Affect MongoDB Performance?

This is the third post in my series of posts on MongoDB. This post will talk about how the working set affects the performance of MongoDB. The idea for this experiment came to me after I read a very good blog from Colin Howe on the MongoDB Working Set. The tests performed in this post are along similar lines to the ones Colin Howe talked about, but performed with Java and MongoDB version 2.0.1.

If you have worked with MongoDB or read about it, you might have heard the term Working Set. The working set is the amount of data (including indexes) that will be in use by your application. If this data fits in RAM, application performance will be great; otherwise it will degrade drastically, because when the data can't fit in RAM MongoDB has to hit disk, which hurts performance. I recommend reading Adrian Hills' blog on the importance of the working set. To help you understand the working set better, I am citing the example from Adrian's blog:

Suppose you have 1 year's worth of data. For simplicity, each month relates to 1GB of data, giving 12GB in total, and to cover each month's worth of data you have 1GB worth of indexes, again totalling 12GB for the year.

If you are always accessing the last 12 months' worth of data, then your working set is: 12GB (data) + 12GB (indexes) = 24GB.

However, if you actually only access the last 3 months' worth of data, then your working set is: 3GB (data) + 3GB (indexes) = 6GB.

From the example above, if your machine has more than 6GB of RAM then your application will perform great; otherwise it will be slow. The important thing to know about the working set is that MongoDB uses an LRU strategy to decide which documents are in RAM, and you can't tell MongoDB to keep a particular document or collection in RAM. Now that you know what the working set is and how important it is, let's start the experiment.
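A quick way to get a feel for whether your data plus indexes can fit in RAM is the collStats command, which the Java driver exposes via getStats(). Below is a minimal sketch with the MongoDB Java driver 2.x; the database and collection names are assumptions. Keep in mind this only gives an upper bound: the working set is the part of that data and those indexes your application actually touches.

import com.mongodb.CommandResult;
import com.mongodb.DBCollection;
import com.mongodb.Mongo;

public class WorkingSetEstimate {

	public static void main(String[] args) throws Exception {
		// Database and collection names are assumptions; adjust as needed.
		Mongo mongo = new Mongo("localhost", 27017);
		DBCollection users = mongo.getDB("test").getCollection("user");

		// getStats() runs the collStats command for the collection.
		CommandResult stats = users.getStats();
		long dataSize = stats.getLong("size");            // uncompressed data size in bytes
		long indexSize = stats.getLong("totalIndexSize"); // size of all indexes in bytes

		System.out.printf("data = %.2f MB, indexes = %.2f MB, data + indexes = %.2f MB%n",
				dataSize / (1024.0 * 1024), indexSize / (1024.0 * 1024),
				(dataSize + indexSize) / (1024.0 * 1024));
		mongo.close();
	}
}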

Setup 

Dell Vostro Ubuntu 11.04 box with 4 GB RAM and a 300 GB hard disk, Java 6, MongoDB 2.0.1, and Spring MongoDB 1.0.0.M5, which internally uses version 2.6.5 of the MongoDB Java driver.

Document

The document I am storing in MongoDB looks as shown below. The average document size is 2400 bytes. Please note the _id field also has an index. The index that I will be creating will be on the index field.

{
"_id" : ObjectId("4ed89c140cf2e821d503a523"),
"name" : "Shekhar Gulati",
"someId1" : NumberLong(1000006),
"str1" : "U",
"date1" : ISODate("1997-04-10T18:30:00Z"),
"index" : 1,
"bio" : "I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a
Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java
Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I
am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java
Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I
am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java
Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I
am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java
Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I
am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java
Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I
am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java
Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I
am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java
Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I
am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java
Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I
am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java
Developer. I am a Java Developer. "
}

Test Case

The test case will run 6 times, with 10k, 100k, 1 million, 2 million, 3 million, and 10 million records. The document used is shown under the Document heading and is the same as the one used in the first post, just with one extra field, index (a simple int field which auto increments by one). Before inserting the records, an index is created on the index field, and then the records are inserted in batches of 100. Finally, 30,000 queries are performed on a selected part of the collection. The selection varies from 1% of the data in the collection to 100% of the collection. The queries are performed 3 times on each selected dataset to give MongoDB a chance to put the selected dataset in RAM. The JUnit test case is shown below.

	@Test
	public void workingSetTests() throws Exception {
		benchmark(10000);
		cleanMongoDB();
		benchmark(100000);
		cleanMongoDB();
		benchmark(1000000);
		cleanMongoDB();
		benchmark(2000000);
		cleanMongoDB();
		benchmark(3000000);
		cleanMongoDB();
		benchmark(10000000);
		cleanMongoDB();
	}

	private void benchmark(int totalNumberOfKeys) throws Exception {
		// Create an ascending index on the "index" field before inserting any documents.
		IndexDefinition indexDefinition = new Index("index", Order.ASCENDING)
				.named("index_1");
		mongoTemplate.ensureIndex(indexDefinition, User.class);
		int batchSize = 100;
		int i = 0;

		long startTime = System.currentTimeMillis();
		LineIterator iterator = FileUtils.lineIterator(new File(FILE_NAME));
		while (i < totalNumberOfKeys && iterator.hasNext()) {
			// Build a batch of 100 users and insert them with a single call.
			List<User> users = new ArrayList<User>();
			for (int j = 0; j < batchSize && iterator.hasNext(); j++) {
				String line = iterator.next();
				User user = convertLineToObject(line);
				user.setIndex(i);
				users.add(user);
				i += 1;
			}
			mongoTemplate.insert(users, User.class);
		}
		long endTime = System.currentTimeMillis();
		logger.info(String.format("%d documents inserted took %d milliseconds",	totalNumberOfKeys, (endTime - startTime)));

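		// For each selection (1%, 10%, 100% of the keys) the 30k queries are run
		// three times, giving MongoDB a chance to pull that slice of data into RAM.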
		performQueries(totalNumberOfKeys, 1);
		performQueries(totalNumberOfKeys, 1);
		performQueries(totalNumberOfKeys, 1);
		performQueries(totalNumberOfKeys, 10);
		performQueries(totalNumberOfKeys, 10);
		performQueries(totalNumberOfKeys, 10);
		performQueries(totalNumberOfKeys, 100);
		performQueries(totalNumberOfKeys, 100);
		performQueries(totalNumberOfKeys, 100);

		String collectionName = mongoTemplate
				.getCollectionName(User.class);
		CommandResult stats = mongoTemplate.getCollection(collectionName)
				.getStats();
		logger.info("Stats : " + stats);
		double size = stats.getDouble("storageSize");
		logger.info(String
				.format("Storage Size : %.2f M", size / (1024 * 1024)));

	}

	private void performQueries(int totalNumberOfKeys, int focus) {
		int gets = 30000;
		Random random = new Random();
		long startTime = System.currentTimeMillis();
		for (int index = 0; index < gets; index++) {
			// 99% of the queries target the bottom focus% of the key range;
			// the remaining 1% are spread over the whole key range.
			boolean focussedGet = random.nextInt(100) != 0;
			int key = 0;
			if (focussedGet) {
				key = random.nextInt((totalNumberOfKeys * focus) / 100);
			} else {
				key = random.nextInt(totalNumberOfKeys);
			}
			mongoTemplate.findOne(Query.query(Criteria.where("index").is(key)),
					User.class);
		}
		long endTime = System.currentTimeMillis();
		logger.info(String.format("%d gets (focussed on bottom %d%%) took %d milliseconds", gets, focus, (endTime - startTime)));
	}

Results

In the table above, the number of records is in millions and the time to run the 30k queries is in seconds.

One thing this data clearly shows is that if your working set fits in RAM, performance remains almost the same regardless of the total number of documents in MongoDB. This can easily be seen by comparing the three runs on the 1% dataset across all dataset sizes. The performance of 30k queries on 1% of the dataset is very close for both 3 million and 10 million records.

Using MongoDB Replica Set With Spring MongoDB 1.0.0.RC1

The primary purpose of replication is to ensure that data survives single or multiple machine failures. The more replicas you have, the more likely your data is to survive one or more hardware crashes. With three replicas, you can afford to lose two nodes and still serve the data. MongoDB supports two forms of replication, Replica Sets and Master Slave. Replica Sets are the recommended way to do replication in MongoDB, and I will cover only Replica Sets in this post.

A couple of weeks back I was working on a POC where we needed to set up MongoDB replication. As I am a Spring aficionado, I decided to use Spring MongoDB to interact with the Replica Set. We used Spring Roo to quickly bootstrap the project. All the project setup, Spring MongoDB setup, JUnit test cases, even the Spring MVC UI was created in minutes thanks to Spring Roo. I am a big Spring Roo fan; I just love it. Thanks SpringSource for such an amazing project. Spring Roo uses Spring MongoDB version 1.0.0.M5, which has a bug: it does not support the WriteConcern value REPLICAS_SAFE. But with the current release, 1.0.0.RC1, that issue has been fixed and you can now use REPLICAS_SAFE. REPLICAS_SAFE is the recommended WriteConcern value when using replication. This is a step by step guide from creating the Spring project to a working MongoDB replica set.

  1. Create the project using Spring Roo. If you are not aware of Spring Roo you can read my Spring Roo series. I am using Spring Roo to quickly configure a Spring MongoDB project.
    project --topLevelPackage com.xebia.mongodb.replication --projectName mongodb-replication-demo --java 6
    mongo setup --databaseName bookshop --host localhost --port 27017
    entity mongo --class ~.domain.Book --testAutomatically --identifierType org.bson.types.ObjectId
    field string --fieldName title --notNull
    field string --fieldName author --notNull
    field number --type double --fieldName price --notNull
    repository mongo --interface ~.repository.BookRepository --entity ~.domain.Book
    

    This will create a Spring maven project, configure MongoDB to work with Spring, create one collection, Book, and add three fields (title, author, and price) to it. All the CRUD operations will be carried out using BookRepository.

  2. Start the MongoDB server using ./mongod, run BookIntegrationTest, and make sure all tests pass.
  3. Setup replica set following the MongoDB documentation http://www.mongodb.org/display/DOCS/Replica+Set+Tutorial.
  4. Update applicationContext-mongo.xml as shown below, but first add the property mongo.replicaset, listing all the replica set nodes, to your Spring properties file (for example, mongo.replicaset=localhost:27017,localhost:27018,localhost:27019; adjust the hosts and ports to match your replica set).
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <beans xmlns="http://www.springframework.org/schema/beans" xmlns:cloud="http://schema.cloudfoundry.org/spring" xmlns:context="http://www.springframework.org/schema/context" xmlns:mongo="http://www.springframework.org/schema/data/mongo" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd        http://www.springframework.org/schema/data/mongo        http://www.springframework.org/schema/data/mongo/spring-mongo-1.0.xsd        http://www.springframework.org/schema/beans        http://www.springframework.org/schema/beans/spring-beans-3.0.xsd        http://schema.cloudfoundry.org/spring http://schema.cloudfoundry.org/spring/cloudfoundry-spring-0.8.xsd">
    
        <mongo:db-factory dbname="${mongo.database}" id="mongoDbFactory" mongo-ref="mongo"/>
    
        <mongo:repositories base-package="com.xebia.mongodb.replication"/>
    
        <!-- To translate any MongoExceptions thrown in @Repository annotated classes -->
        <context:annotation-config/>
    
        <bean class="org.springframework.data.mongodb.core.MongoTemplate" id="mongoTemplate">
            <constructor-arg ref="mongoDbFactory"/>
        </bean>
    
    	<mongo:mongo id="mongo" replica-set="${mongo.replicaset}" write-concern="REPLICA_SAFE">
    		<mongo:options auto-connect-retry="true"/>
    	</mongo:mongo>
    </beans>
    

    If you run the tests again, all the tests will fail and you will see the following exception.

    Caused by: org.springframework.beans.factory.BeanDefinitionStoreException: Unexpected exception parsing XML document from file [/home/shekhar/dev/workspaces/writing/mongodb-replication-demo/target/classes/META-INF/spring/applicationContext-mongo.xml]; nested exception is java.lang.ArrayIndexOutOfBoundsException: 1
    at org.springframework.beans.factory.xml.XmlBeanDefinitionReader.
    doLoadBeanDefinitions(XmlBeanDefinitionReader.java:412)
    

    The reason for this exception is a bug in Spring MongoDB 1.0.0.M5, which is not able to parse the WriteConcern REPLICA_SAFE value.

  5. To make it work we have to use the latest Spring MongoDB version, 1.0.0.RC1, which was released just 3 days back, on 7th December 2011. Update the pom.xml to use 1.0.0.RC1.
    <dependency>
    	<groupId>org.springframework.data</groupId>
    	<artifactId>spring-data-mongodb</artifactId>
    	<version>1.0.0.RC1</version>
    </dependency>
    

    Run BookIntegrationTest; the tests will fail again and you will see the following exception stacktrace.

    java.lang.NoSuchMethodError: org.springframework.core.annotation.AnnotationUtils
    .getAnnotation(Ljava/lang/reflect/AnnotatedElement;Ljava/lang/Class;)
    Ljava/lang/annotation/Annotation;
    at org.springframework.transaction.annotation.SpringTransactionAnnotationParser
    .parseTransactionAnnotation(SpringTransactionAnnotationParser.java:38)
    
  6. To make it run you have to use the latest Spring version, 3.1.0.RC2, in pom.xml:
    <spring.version>3.1.0.RC2</spring.version>
    
  7. The final change you need to make is in applicationContext-mongo.xml: change the value of write-concern to REPLICAS_SAFE. (A programmatic way to set the write concern is sketched after this list.)
    <mongo:mongo id="mongo" replica-set="${mongo.replicaset}" write-concern="REPLICAS_SAFE">
    	<mongo:options auto-connect-retry="true"/>
    </mongo:mongo>
    
  8. Run the tests and all the tests will pass.
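If you prefer to set the write concern in Java rather than in the XML, MongoTemplate also exposes it programmatically. Below is a minimal sketch, assuming the mongoTemplate bean configured above is injected into your class; the ReplicaSafeWriter class and its save method are made up for illustration.

import com.mongodb.WriteConcern;
import com.xebia.mongodb.replication.domain.Book;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.stereotype.Component;

@Component
public class ReplicaSafeWriter {

	@Autowired
	private MongoTemplate mongoTemplate;

	public void save(Book book) {
		// REPLICAS_SAFE waits until the write has been acknowledged by at least
		// two members of the replica set before returning.
		mongoTemplate.setWriteConcern(WriteConcern.REPLICAS_SAFE);
		mongoTemplate.save(book);
	}
}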

Are We Really Talking About Commodity Hardware When Working With MongoDB?

Over the last couple of months I have been reading about, learning, and playing with MongoDB, and one thing I have read and also found myself is that its performance depends largely on the amount of RAM in your system. As a general rule, the larger the RAM the better the performance, which is easy to understand: you are not hitting disk, so you get great performance. When we talk about commodity hardware, I think we mean boxes with 4 GB or at most 8 GB of RAM, which means that if your application's working set fits in 4 GB or 8 GB of RAM you are good; otherwise your performance will suffer. Then you have two choices: either add more RAM or scale your system horizontally, i.e. sharding. To me, adding more RAM means moving away from commodity hardware and toward big costly boxes. So we should scale our systems horizontally by adding more 4 GB or 8 GB RAM boxes. Correct?

I thought companies and people who are using MongoDB would be following this approach, i.e. using commodity boxes and scaling out. But I was wrong. Most of the presentations (from companies like Craigslist and Foursquare) that I saw use big boxes with 64 GB or more of RAM and faster disks. So where are we talking about commodity hardware?


Installing HBase over HDFS on a Single Ubuntu Box

I faced some issues making HBase run over HDFS on my Ubuntu box. This is an informal step-by-step guide, from setting up HDFS to running HBase on a single Ubuntu machine.

    1. Download hadoop (hadoop-0.20.203.0rc1.tar.gz) and install it following this great tutorial http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/. I installed it under my own system user rather than creating an hduser. Make sure the 4 files (core-site.xml, hadoop-env.sh, hdfs-site.xml, mapred-site.xml) under the hadoop/conf folder have the values shown below. Check that hadoop is working fine by running the wordcount example as mentioned in the tutorial. Also update .bashrc with the required variables.

      core-site.xml
      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      
      <!-- Put site-specific property overrides in this file. -->
      
      <configuration>
      
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/shekhar/hadoop-data</value>
        <description>A base for other temporary directories.</description>
      </property>
      
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:54310</value>
        <description>The name of the default file system.  A URI whose
        scheme and authority determine the FileSystem implementation.  The
        uri's scheme determines the config property (fs.SCHEME.impl) naming
        the FileSystem implementation class.  The uri's authority is used to
        determine the host, port, etc. for a filesystem.</description>
      </property>
      
      </configuration>
      
      
      hdfs-site.xml
      
      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      
      <!-- Put site-specific property overrides in this file. -->
      
      <configuration>
      
      <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Default block replication.
        The actual number of replications can be specified when the file is created.
        The default is used if replication is not specified in create time.
        </description>
      </property>
      
      </configuration>
      

      mapred-site.xml

      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      
      <!-- Put site-specific property overrides in this file. -->
      
      <configuration>
      
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:54311</value>
        <description>The host and port that the MapReduce job tracker runs
        at.  If "local", then jobs are run in-process as a single map
        and reduce task.
        </description>
      </property>
      
      </configuration>
      

      hadoop-env.sh

      # Set Hadoop-specific environment variables here.
      
      # The only required environment variable is JAVA_HOME.  All others are
      # optional.  When running a distributed configuration it is best to
      # set JAVA_HOME in this file, so that it is correctly defined on
      # remote nodes.
      
      # The java implementation to use.  Required.
      export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.26
      
      # Extra Java CLASSPATH elements.  Optional.
      # export HADOOP_CLASSPATH=
      
      # The maximum amount of heap to use, in MB. Default is 1000.
      # export HADOOP_HEAPSIZE=2000
      
      # Extra Java runtime options.  Empty by default.
      # export HADOOP_OPTS=-server
      
      # Command specific options appended to HADOOP_OPTS when specified
      export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
      export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
      export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
      export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
      export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
      # export HADOOP_TASKTRACKER_OPTS=
      # The following applies to multiple commands (fs, dfs, fsck, distcp etc)
      # export HADOOP_CLIENT_OPTS
      
      # Extra ssh options.  Empty by default.
      # export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"
      
      # Where log files are stored.  $HADOOP_HOME/logs by default.
      # export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
      
      # File naming remote slave hosts.  $HADOOP_HOME/conf/slaves by default.
      # export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves
      
      # host:path where hadoop code should be rsync'd from.  Unset by default.
      # export HADOOP_MASTER=master:/home/$USER/src/hadoop
      
      # Seconds to sleep between slave commands.  Unset by default.  This
      # can be useful in large clusters, where, e.g., slave rsyncs can
      # otherwise arrive faster than the master can service them.
      # export HADOOP_SLAVE_SLEEP=0.1
      
      # The directory where pid files are stored. /tmp by default.
      # export HADOOP_PID_DIR=/var/hadoop/pids
      
      # A string representing this instance of hadoop. $USER by default.
      # export HADOOP_IDENT_STRING=$USER
      
      # The scheduling priority for daemon processes.  See 'man nice'.
      # export HADOOP_NICENESS=10
      
    2. Download HBase (version hbase-0.90.4.tar.gz). Update hbase-site.xml in the hbase/conf folder with the required properties.
      hbase-site.xml

      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <configuration>
      
      	<property>
      		<name>hbase.rootdir</name>
          		<value>hdfs://localhost:54310/hbase</value>
      	</property>
      
      	<property>
      		<name>dfs.replication</name>
      		<value>1</value>
      	</property>
      
      	<property>
      	      <name>hbase.zookeeper.property.clientPort</name>
      	      <value>2222</value>
      	      <description>Property from ZooKeeper's config zoo.cfg.
      	      The port at which the clients will connect.
      	      </description>
          	</property>
      	<property>
      	      <name>hbase.zookeeper.quorum</name>
      	      <value>localhost</value>
      	      <description>Comma separated list of servers in the ZooKeeper Quorum.
      	      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
      	      By default this is set to localhost for local and pseudo-distributed modes
      	      of operation. For a fully-distributed setup, this should be set to a full
      	      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
      	      this is the list of servers which we will start/stop ZooKeeper on.
      	      </description>
      	</property>
          <property>
            <name>hbase.zookeeper.property.dataDir</name>
            <value>/home/shekhar/zookeeper</value>
            <description>Property from ZooKeeper's config zoo.cfg.
            The directory where the snapshot is stored.
            </description>
          </property>
      
      </configuration>
      

      Update hbase-env.sh so that HBase manages its own ZooKeeper instance.

      # Tell HBase whether it should manage it's own instance of Zookeeper or not.
      export HBASE_MANAGES_ZK=true
      
    3. Run HBase using ./start-hbase.sh in the bin folder. You will see the following exception in the log file.
      2011-12-06 13:59:29,979 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
      java.io.IOException: Call to localhost/127.0.0.1:54310 failed on local exception: java.io.EOFException
      
      2011-12-06 13:59:30,577 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
      2011-12-06 13:59:30,577 WARN org.apache.zookeeper.ClientCnxn: Session 0x134127deaaf0002 for server null, unexpected error, closing socket connection and attempting reconnect
      java.net.ConnectException: Connection refused
      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
      at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
      

      Kill HBase using kill -9 <processid>

    4. The exception in step 3 occurs because the hadoop jar in the hbase lib directory is different from the one hadoop uses. Copy hadoop-core-0.20.203.0.jar from the hadoop folder to the hbase/lib folder.
    5. Start HBase again using ./start-hbase.sh and you will get another exception:
      2011-12-06 14:51:05,778 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
      java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
      

      Kill HBase using kill -9 <processid>

    6. To fix this, copy commons-configuration-1.6.jar from the hadoop lib folder to the hbase lib folder.
    7. Start HBase again using ./start-hbase.sh. It should start fine now and you should be able to see HBase running at http://localhost:60010/master.jsp. If you see a valid page, HBase has started fine. As a final check, you can run the small Java client sketched after this list.
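Once the master UI is up, you can verify the whole stack end to end with a small Java client. This is a minimal sketch against the HBase 0.90 client API; the smoketest table and cf column family are made-up names, and it assumes the HBase client jars and your hbase-site.xml are on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseSmokeTest {

	public static void main(String[] args) throws Exception {
		// Picks up hbase-site.xml from the classpath.
		Configuration conf = HBaseConfiguration.create();

		// Create a throwaway table with one column family.
		HBaseAdmin admin = new HBaseAdmin(conf);
		if (!admin.tableExists("smoketest")) {
			HTableDescriptor descriptor = new HTableDescriptor("smoketest");
			descriptor.addFamily(new HColumnDescriptor("cf"));
			admin.createTable(descriptor);
		}

		// Write one cell and read it back.
		HTable table = new HTable(conf, "smoketest");
		Put put = new Put(Bytes.toBytes("row1"));
		put.add(Bytes.toBytes("cf"), Bytes.toBytes("greeting"), Bytes.toBytes("hello hbase"));
		table.put(put);

		Result result = table.get(new Get(Bytes.toBytes("row1")));
		System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("greeting"))));
		table.close();
	}
}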