Installing HBase over HDFS on a Single Ubuntu Box


I faced some issues making HBase run over HDFS on my Ubuntu box. This is an informal step-by-step guide, from setting up HDFS to running HBase, on a single Ubuntu machine.

    1. Download Hadoop (hadoop-0.20.203.0rc1.tar.gz) and install it following this great tutorial: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/. I installed it under my own system user rather than creating a dedicated hduser. Make sure the four files (core-site.xml, hadoop-env.sh, hdfs-site.xml, mapred-site.xml) under the hadoop/conf folder have the values shown below. Check that Hadoop is working by running the WordCount example as described in the tutorial. Also update .bashrc with the required variables; a sketch follows the hadoop-env.sh listing below.

      core-site.xml
      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      
      <!-- Put site-specific property overrides in this file. -->
      
      <configuration>
      
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/shekhar/hadoop-data</value>
        <description>A base for other temporary directories.</description>
      </property>
      
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:54310</value>
        <description>The name of the default file system.  A URI whose
        scheme and authority determine the FileSystem implementation.  The
        uri's scheme determines the config property (fs.SCHEME.impl) naming
        the FileSystem implementation class.  The uri's authority is used to
        determine the host, port, etc. for a filesystem.</description>
      </property>
      
      </configuration>
      
      
      hdfs-site.xml
      
      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      
      <!-- Put site-specific property overrides in this file. -->
      
      <configuration>
      
      <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Default block replication.
        The actual number of replications can be specified when the file is created.
        The default is used if replication is not specified in create time.
        </description>
      </property>
      
      </configuration>
      

      mapred-site.xml

      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      
      <!-- Put site-specific property overrides in this file. -->
      
      <configuration>
      
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:54311</value>
        <description>The host and port that the MapReduce job tracker runs
        at.  If "local", then jobs are run in-process as a single map
        and reduce task.
        </description>
      </property>
      
      </configuration>
      

      hadoop-env.sh

      # Set Hadoop-specific environment variables here.
      
      # The only required environment variable is JAVA_HOME.  All others are
      # optional.  When running a distributed configuration it is best to
      # set JAVA_HOME in this file, so that it is correctly defined on
      # remote nodes.
      
      # The java implementation to use.  Required.
      export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.26
      
      # Extra Java CLASSPATH elements.  Optional.
      # export HADOOP_CLASSPATH=
      
      # The maximum amount of heap to use, in MB. Default is 1000.
      # export HADOOP_HEAPSIZE=2000
      
      # Extra Java runtime options.  Empty by default.
      # export HADOOP_OPTS=-server
      
      # Command specific options appended to HADOOP_OPTS when specified
      export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
      export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
      export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
      export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
      export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
      # export HADOOP_TASKTRACKER_OPTS=
      # The following applies to multiple commands (fs, dfs, fsck, distcp etc)
      # export HADOOP_CLIENT_OPTS
      
      # Extra ssh options.  Empty by default.
      # export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"
      
      # Where log files are stored.  $HADOOP_HOME/logs by default.
      # export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
      
      # File naming remote slave hosts.  $HADOOP_HOME/conf/slaves by default.
      # export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves
      
      # host:path where hadoop code should be rsync'd from.  Unset by default.
      # export HADOOP_MASTER=master:/home/$USER/src/hadoop
      
      # Seconds to sleep between slave commands.  Unset by default.  This
      # can be useful in large clusters, where, e.g., slave rsyncs can
      # otherwise arrive faster than the master can service them.
      # export HADOOP_SLAVE_SLEEP=0.1
      
      # The directory where pid files are stored. /tmp by default.
      # export HADOOP_PID_DIR=/var/hadoop/pids
      
      # A string representing this instance of hadoop. $USER by default.
      # export HADOOP_IDENT_STRING=$USER
      
      # The scheduling priority for daemon processes.  See 'man nice'.
      # export HADOOP_NICENESS=10
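
      The variables I added to .bashrc look roughly like this (a sketch; the Hadoop install path is an assumption, so point it at wherever you unpacked the tarball). The verification commands at the end follow the tutorial's WordCount walk-through:

      # Java and Hadoop locations -- adjust both paths to your machine
      export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.26
      export HADOOP_HOME=/home/shekhar/hadoop-0.20.203.0   # assumed install path
      # put the hadoop scripts (hadoop, start-dfs.sh, ...) on the PATH
      export PATH=$PATH:$HADOOP_HOME/bin

      # quick sanity check that HDFS and MapReduce come up
      hadoop namenode -format        # only on the very first run
      start-all.sh
      jps                            # should list NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker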
      
    2. Download HBase (hbase-0.90.4.tar.gz). Update hbase-site.xml in the hbase/conf folder with the required properties.
      hbase-site.xml

      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <configuration>
      
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://localhost:54310/hbase</value>
      </property>
      
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      
      <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2222</value>
        <description>Property from ZooKeeper's config zoo.cfg.
        The port at which the clients will connect.
        </description>
      </property>
      
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>localhost</value>
        <description>Comma separated list of servers in the ZooKeeper Quorum.
        For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
        By default this is set to localhost for local and pseudo-distributed modes
        of operation. For a fully-distributed setup, this should be set to a full
        list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
        this is the list of servers which we will start/stop ZooKeeper on.
        </description>
      </property>
      
      <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/home/shekhar/zookeeper</value>
        <description>Property from ZooKeeper's config zoo.cfg.
        The directory where the snapshot is stored.
        </description>
      </property>
      
      </configuration>
      

      Update hbase-env.sh so that HBase manages its own ZooKeeper instance.

      # Tell HBase whether it should manage its own instance of ZooKeeper or not.
      export HBASE_MANAGES_ZK=true
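
      Before starting HBase, it is worth making sure the ZooKeeper dataDir exists and HDFS is up, since hbase.rootdir points into it. A minimal sketch, assuming the paths configured above and the HADOOP_HOME variable from step 1:

      # create the ZooKeeper data directory referenced in hbase-site.xml
      mkdir -p /home/shekhar/zookeeper
      # HDFS must be running before HBase can use hdfs://localhost:54310/hbase
      $HADOOP_HOME/bin/start-dfs.sh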
      
    3. Run HBase using ./start-hbase.sh in the bin folder. You will see the following exception in the log file.
      2011-12-06 13:59:29,979 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
      java.io.IOException: Call to localhost/127.0.0.1:54310 failed on local exception: java.io.EOFException
      
      2011-12-06 13:59:30,577 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
      2011-12-06 13:59:30,577 WARN org.apache.zookeeper.ClientCnxn: Session 0x134127deaaf0002 for server null, unexpected error, closing socket connection and attempting reconnect
      java.net.ConnectException: Connection refused
      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
      at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
      

      Kill HBase using kill -9 <processid>
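
      To find the process id, the JDK's jps tool is handy (a sketch; <processid> is whatever number jps prints):

      jps | grep HMaster      # prints "<processid> HMaster" for the stuck master
      kill -9 <processid>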

    4. The exception in step 3 occurs because the Hadoop jar bundled in the hbase/lib directory is different from the one your Hadoop installation uses. Copy hadoop-core-0.20.203.0.jar from the hadoop folder to the hbase/lib folder, as sketched below.
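      A sketch of the jar swap ($HBASE_HOME is assumed to point at the unpacked hbase folder; the bundled jar name is the one a commenter below reports for 0.90.x, so verify it in your own hbase/lib):

      # move the HBase-bundled hadoop jar out of the way
      mv $HBASE_HOME/lib/hadoop-core-0.20-append-r1056497.jar \
         $HBASE_HOME/lib/hadoop-core-0.20-append-r1056497.jar.ignore
      # copy in the jar your Hadoop installation actually runs
      cp $HADOOP_HOME/hadoop-core-0.20.203.0.jar $HBASE_HOME/lib/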
    5. Start HBase again using ./start-hbase.sh and you will get another exception:
      2011-12-06 14:51:05,778 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
      java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
      

      Kill HBase using kill -9 <processid>

    6. To fix this, copy commons-configuration-1.6.jar from the hadoop/lib folder to the hbase/lib folder.
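      As a one-liner (same $HADOOP_HOME/$HBASE_HOME assumptions as in step 4):

      cp $HADOOP_HOME/lib/commons-configuration-1.6.jar $HBASE_HOME/lib/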
    7. Start HBase again using ./start-hbase.sh. It should start fine now, and you should be able to see HBase running at http://localhost:60010/master.jsp. If a valid page comes up, HBase has started fine.
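      Besides the web UI, you can sanity-check from the command line (a sketch; the exact status output will vary):

      jps                           # should now list HMaster and HQuorumPeer
      $HBASE_HOME/bin/hbase shell   # assumes $HBASE_HOME from step 4
      hbase(main):001:0> status     # e.g. "1 servers, 0 dead, ..."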

18 thoughts on “Installing HBase over HDFS on a Single Ubuntu Box”

  1. Great help, Shekhar. This is the best article a beginner could find. I had been struggling for many days, and finally today I was able to successfully run HBase, at least in pseudo-distributed mode. Many, many thanks to you. Keep up the good work!

  2. Hi Shekhar,
    I am having a problem with HBase. I have configured Hadoop and it is working fine, but after configuring HBase, the HMaster UI does not launch and I also cannot create a table in the shell. Here are the details:
    I am using hadoop 0.20.2 and HBase 0.90.4 on 2 nodes
    1. umaster : namenode, sec namenode, jobtracker, HMaster
    /etc/hosts file :
    172.25.20.74 umaster
    172.25.20.93 slavee

    2. slavee : datanode, tasktracker, HRegionserver
    /etc/hosts file :
    172.25.20.74 umaster
    172.25.20.93 slavee

    The UI for HMaster, i.e. umaster:60010, gives an error like "problem accessing master.jsp", caused by the exception: HRegionInfo was null or empty

    And while creating a table it gives a RetriesExhaustedException, with java.io.IOException: HRegionInfo was null or empty in -ROOT-

    Please give me the solution for this.

  3. Great pointers. I am using hadoop-0.20.205.0 with hbase-0.90.5. I finally managed to get HBase running by using the clustering directive in hbase-site.xml and replacing jar files as you indicated. The hbase directive is as below:

    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>

    The jar file I removed (renamed so it would be ignored) was:

    hadoop-core-0.20-append-r1056497.jar

    The jar files I copied from the hadoop distribution were:

    hadoop-core-0.20.205.0.jar
    commons-configuration-1.6.jar

    thanks,

    Kazan.

  4. Shekhar … great work :) I think you could also include a small example for beginners to start with!

  5. Very good setup instructions. Helped me a lot. I was actually struggling to set up HBase over HDFS in standalone mode, but solved it easily by following the above steps. Thank you very much.

  6. While running HBase, every other component of HBase runs except HMaster, and when I stop HBase it gives the following message:

    no master to stop because kill of pid 8640 failed with status 1
    localhost: stopping zookeeper.

    please help me.

  7. Thanks for everything; your instructions worked for me, but I have a problem with the hbase shell. I ran:
    1. $ bin/start-hbase.sh
    2. $ bin/hbase shell
    3. hbase(main):001:0> create 'table','column1'

    ERROR: org.apache.hadoop.hbase.PleaseHoldException: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing

    Please help me; why does this occur?

  8. This is the perfect webpage for anyone who
    would like to find out about this topic. You realize a whole lot its almost hard to argue with you (not that I personally will need to…HaHa).
    You definitely put a brand new spin on a topic that has been written about for
    ages. Excellent stuff, just wonderful!

  9. Hello sir,
    I have a problem connecting to ZooKeeper:
    INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 10.0.1.56/10.0.1.56:2222. Will not attempt to authenticate using SASL (unknown error)
    2013-07-22 02:18:01,986 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect

    Please help me. I have been working on it for many days but have not been able to solve this.

  10. I am new to HBASE, and while trying to install the same on Ubuntu system, I am facing some problem.

    Below is the error log from Zookeeper log file

    2014-01-18 06:10:51,392 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
    EndOfStreamException: Unable to read additional data from client sessionid 0x143a5b052980000, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:744)
    2014-01-18 06:10:51,394 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:56671 which had sessionid 0x143a5b052980000

    Below is error log from master log:

    2014-01-18 06:10:51,381 INFO org.apache.zookeeper.ZooKeeper: Session: 0x143a5b052980000 closed
    2014-01-18 06:10:51,381 INFO org.apache.hadoop.hbase.master.HMaster: HMaster main thread exiting
    2014-01-18 06:10:51,381 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
    java.lang.RuntimeException: HMaster Aborted
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:160)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2120)

    Please note, I am able to start HBase successfully; after starting it, I can see HMaster running using the jps command. But as soon as I try to go to the HBase shell, this issue arises, and then jps no longer lists HMaster.

    Please help me with this issue; I have tried to solve it by myself for the last four days, but no luck. Please help.

    1. Hi Swati,

      I have never seen this exception, and I am not much into HBase these days. I can work with you via Google Hangout; I guess that would help.
