Install and work with Apache HBase

  1. Prerequisites
  2. Download, Configure, and Start HBase in Standalone Mode
  3. Working with HBase via command line
  4. Working with HBase from Python via HappyBase


Prerequisites

As you may know, there are three options to install HBase (see Quick Start - Standalone HBase):

  • A standalone instance, which has all HBase daemons (the Master, RegionServers, and ZooKeeper) running in a single JVM and persisting to the local filesystem. It is the simplest and most basic deployment profile.

    A sometimes useful variation on standalone HBase has all daemons running inside one JVM but persisting to an HDFS instance rather than to the local filesystem (see 5.1.1. Standalone HBase over HDFS).

  • Pseudo-distributed, which is distributed but with all daemons running on a single node. Pseudo-distributed mode can run against the local filesystem or against an instance of the Hadoop Distributed File System (HDFS).
  • Fully distributed, where the daemons are spread across all nodes in the cluster. Fully distributed mode can ONLY run on HDFS.

Whatever your mode, you will need to configure HBase by editing files in the HBase conf directory. At a minimum, you must edit conf/hbase-env.sh to tell HBase which Java to use. In this file you set HBase environment variables such as the heap size and other JVM options, the preferred location for log files, etc. Set JAVA_HOME to point at the root of your Java install.
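
As a quick sanity check before editing conf/hbase-env.sh, the JAVA_HOME requirement can be sketched in Python. This is a minimal sketch and not part of HBase: the helper name resolve_java_home is hypothetical, and it only assumes that a Java install keeps its launcher at bin/java under its root.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_java_home(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return JAVA_HOME as a Path if it points at a Java install root, else None."""
    value = env.get("JAVA_HOME", "").strip()
    if not value:
        return None  # variable missing or empty
    root = Path(value)
    # Assumption: a Java install root contains bin/java (bin/java.exe on Windows).
    for launcher in ("java", "java.exe"):
        if (root / "bin" / launcher).is_file():
            return root
    return None
```

Calling resolve_java_home() with no arguments inspects the current environment; a None result suggests that JAVA_HOME still needs to be set.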


Download, Configure, and Start HBase in Standalone Mode

  1. Step 1 - download
    Choose a download site from this list of Apache Download Mirrors. Click on the suggested top link. This will take you to a mirror of HBase Releases. Click on the folder named stable and then download the binary file that ends in .tar.gz to your local filesystem. Do not download the file ending in src.tar.gz for now.

    Download HBase from the HBase download page (in my case I downloaded HBase 2.5, which is quite recent as I write this; version 3.0 is available but not yet stable):

    as well as the file(s) used to check the correctness of the binary file:

    and saved them in (of course you can select a different location):

  2. Step 2 - verify downloaded file integrity.

  3. Step 3 - unpack downloaded file.

  4. Step 4 - move uncompressed files (hbase-2.5.0 directory) to a new working directory.

    Above you can see other components which are present in my system but not required for this tutorial.

  5. Step 5 - remove compressed file.

  6. Step 6 - check if the JAVA_HOME environment variable is set.

    If not, you can do this manually:

    or edit conf/hbase-env.sh and uncomment the line starting with #export JAVA_HOME= or add it, and then set it to your Java installation path.

    You can also set it in your environment config file (for example ~/.bashrc) by adding the following line to it:

  7. Step 7 - configure
    Because HBase depends on Hadoop, it bundles Hadoop jars under its lib directory. The bundled jars are ONLY for use in stand-alone mode. In distributed mode, it is critical that the version of Hadoop running on your cluster matches the one bundled with HBase. To avoid version mismatch issues, replace the Hadoop jars found in the HBase lib directory with the equivalent Hadoop jars from the version you are running on your cluster. Make sure you replace the jars under HBase across your whole cluster.


    If you are going to start HBase in stand-alone mode, be sure that no other Hadoop component is running and no Hadoop environment is set. If you forget about this, you will get some strange messages like:

    This is why, in my file, I have to comment out the following lines:

    To do this, you can use any editor you like:

    In this case you should log out and log in again to "clear" all environment variables that may be set. The following command:

    will not do this. After running source, $HADOOP_HOME is still defined:

    After re-login:

  8. Step 8 - start HBase
    Use start-hbase.sh, hbase shell, stop-hbase.sh to start, interact with and stop HBase:

    The bin/start-hbase.sh script is provided as a convenient way to start HBase. Issue the command, and if all goes well, a message is logged to standard output showing that HBase started successfully. You can use the jps command to verify that you have one running process called HMaster. In standalone mode HBase runs all daemons within this single JVM, i.e. the HMaster, a single HRegionServer, and the ZooKeeper daemon. Go to http://localhost:16010 to view the HBase Web UI:

  9. Step 9 - start HBase shell to allow interaction

  10. Step 10 - stop HBase
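
Step 2 above (verifying the downloaded file) can also be sketched in Python as an alternative to command-line tools. This assumes the checksum file published next to the binary contains a SHA-512 digest; the function names are illustrative, and the expected digest must be copied from the checksum file you actually downloaded.

```python
import hashlib


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-512 digest of the file at `path`."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read in chunks so a large archive does not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with an expected value, ignoring whitespace and letter case."""
    normalized = "".join(expected_hex.split()).lower()
    return sha512_of_file(path) == normalized
```

A call such as digest_matches(path_to_archive, digest_from_checksum_file) returning False means the archive should be downloaded again before it is unpacked.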


Working with HBase via command line

You may find the following documents useful:

Start HBase:
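
Once HBase has been started, the check described earlier (one running process called HMaster in the jps listing) can be sketched in Python. This is a minimal sketch: the function names are hypothetical, and it assumes jps is on the PATH and prints one "<pid> <name>" pair per line.

```python
import subprocess


def has_hmaster(jps_output: str) -> bool:
    """Return True if a jps listing contains a process named HMaster."""
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line is expected to look like "<pid> <main class name>".
        if len(parts) >= 2 and parts[1] == "HMaster":
            return True
    return False


def hbase_is_running() -> bool:
    """Run jps (shipped with the JDK) and look for HMaster in its output."""
    result = subprocess.run(["jps"], capture_output=True, text=True, check=True)
    return has_hmaster(result.stdout)
```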

Start HBase Shell:

Check existing tables:

If you want, you may create a namespace for your table(s):

Create a table table1 in namespace test:

Verify that the table exists:

Add some data to your table. Pay attention to the order of keys:
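
The remark about key order deserves a concrete illustration. HBase stores rows sorted by row key, comparing keys byte by byte rather than numerically. The row keys below are made up for this sketch and are not the ones used in the tutorial.

```python
# Hypothetical row keys, written in the order a person might insert them.
inserted = [b"row1", b"row2", b"row10", b"row3"]

# Sorting bytes in Python compares byte by byte, the same way HBase orders row keys.
stored_order = sorted(inserted)
print(stored_order)  # [b'row1', b'row10', b'row2', b'row3']

# Zero-padding the numeric part makes byte order agree with numeric order.
padded = sorted(b"row%03d" % n for n in (1, 2, 10, 3))
print(padded)  # [b'row001', b'row002', b'row003', b'row010']
```

Because row10 sorts before row2, a scan returns rows in an order that may differ from the order in which they were added.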

Count the number of items (records) in table:

Scan the table to get all elements:

Scan a range of the table by providing STARTROW and/or STOPROW. Note that STARTROW is inclusive while STOPROW is exclusive.
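
The inclusive/exclusive rule can be modelled without a running cluster. The sketch below imitates a range scan over an in-memory sorted list of row keys; scan_range is a hypothetical helper for illustration, not an HBase or HappyBase function.

```python
from bisect import bisect_left
from typing import List, Optional


def scan_range(sorted_keys: List[bytes],
               start_row: Optional[bytes] = None,
               stop_row: Optional[bytes] = None) -> List[bytes]:
    """Return keys k with start_row <= k < stop_row (either bound may be omitted)."""
    lo = 0 if start_row is None else bisect_left(sorted_keys, start_row)
    hi = len(sorted_keys) if stop_row is None else bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:hi]


keys = sorted([b"row1", b"row2", b"row3", b"row4"])
print(scan_range(keys, b"row2", b"row4"))   # [b'row2', b'row3']
print(scan_range(keys, start_row=b"row3"))  # [b'row3', b'row4']
```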


Scan only selected columns, limiting the result to a given number of elements:

Scan the table by providing a prefix for the row key:

You can also redirect the result of a scan to a file with the following command (execute it from the terminal, not from the HBase shell):
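
A prefix scan is equivalent to a range scan whose start row is the prefix and whose stop row is the smallest key that no longer begins with it. The helper below is a hypothetical illustration of how that stop row can be derived.

```python
from typing import Optional


def prefix_stop_row(prefix: bytes) -> Optional[bytes]:
    """Return the exclusive stop row for a prefix scan, or None if the scan is unbounded."""
    # Drop trailing 0xFF bytes: they cannot be incremented.
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return None  # empty or all-0xFF prefix: scan to the end of the table
    # Increment the last remaining byte to get the first key past the prefix range.
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


print(prefix_stop_row(b"row1"))  # b'row2'
```

With the prefix row1, the scan therefore covers row1, row10, row11 and so on, and stops before row2.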

Now you can check what scan returned:
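
The same redirection can be driven from Python. This is a sketch under assumptions: it relies on the hbase launcher being on the PATH and on its shell reading commands from standard input in non-interactive mode (hbase shell -n). The function name and the output file name in the usage note are placeholders.

```python
import subprocess
from typing import Sequence


def run_shell_script(script: str, out_path: str,
                     command: Sequence[str] = ("hbase", "shell", "-n")) -> None:
    """Feed `script` to the command's stdin and write its stdout to `out_path`."""
    with open(out_path, "w") as out:
        subprocess.run(list(command), input=script, stdout=out, text=True, check=True)
```

For example, run_shell_script("scan 'test:table1'\n", "scan_result.txt") would write the scan of the table created earlier to scan_result.txt, which can then be opened in any editor.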


If you need to load a lot of data, you may find this material interesting: Load data from hdfs to hbase


Working with HBase from Python via HappyBase

You may find the following documents useful:

For developing Python code I use PyCharm. As a first step, open PyCharm, create a project named hbase_test, and save it in your desktop directory (the folder is named Pulpit in my case). Next, install the HappyBase module. Unfortunately, in my case PyCharm failed to install HappyBase, so I had to run the following commands (if the installation works for you, skip them):

Now HappyBase should be ready, so you can run HBase and the Thrift service you will use to communicate with HBase:

Finally use the following code to test simple interaction with your database:
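
A minimal sketch of such a test follows. It is not the original listing: the table name table2, the column family cf, and the row keys are made up, and it assumes the Thrift service listens on localhost at port 9090.

```python
def write_and_read(table):
    """Store two rows in `table` and return everything a full scan yields."""
    table.put(b"row1", {b"cf:greeting": b"hello"})
    table.put(b"row2", {b"cf:greeting": b"world"})
    return list(table.scan())


def main():
    import happybase  # third-party module: pip install happybase

    # Assumption: Thrift runs locally on port 9090.
    connection = happybase.Connection("localhost", port=9090)
    try:
        # Table names may come back as bytes or str depending on the setup.
        existing = {n.decode() if isinstance(n, bytes) else n
                    for n in connection.tables()}
        # Create the hypothetical demo table with one column family if needed.
        if "table2" not in existing:
            connection.create_table("table2", {"cf": dict()})
        for key, data in write_and_read(connection.table("table2")):
            print(key, data)
    finally:
        connection.close()
```

Call main() once both HBase and the Thrift service are running. Keeping the put/scan logic in write_and_read means it can be tried against any object that offers put and scan, without a live connection.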

After running it from PyCharm, you will see the following result in the terminal window: