HBase installation in 2024
Initial version: 2024-10-08
Last update: 2024-10-08
In this tutorial you will learn how to install HBase.
Most of the steps follow the description given in my previous tutorial
Install and work with Apache HBase
Communication from host to guest
My goal in this installation is to treat the virtual machine like a remote server: the virtual machine hosts HBase, serves any web application that uses it, and so on, while the client code runs on the host.
VirtualBox offers plenty of
networking modes, two of which are of interest here:
Bridged networking and
NAT Network. The first one allows you to run servers in a guest, while the second is mostly intended for communication initiated from the guest.
In my case option number one, the easiest, was not possible, so I had to use a "trick" to work with NAT. The trick is well known as
port forwarding. It turned out not to be hard; for details please refer to
VirtualBox Network Settings: Complete Guide
Figure: Port forwarding setup
The guest IP
10.0.2.15
is the default address VirtualBox assigns to a guest's first network adapter in NAT mode.
The first rule allows you to use
ssh
to connect from host to guest. The second will be used for communication with HBase.
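The two rules can also be written down as arguments for VirtualBox's command-line tool. The sketch below only builds the argument lists and does not run anything. It rests on several assumptions that are not stated in this tutorial: the adapter is in plain NAT mode (where `VBoxManage modifyvm ... --natpf1` applies), the VM is called `nosql` (a hypothetical name), ssh listens on guest port 22, and the HBase rule forwards port 9090, the port the Python client uses later on.

```python
def natpf_rule(name, host_port, guest_port, protocol="tcp", host_ip="", guest_ip=""):
    """Build a rule string: name,protocol,host_ip,host_port,guest_ip,guest_port."""
    return f"{name},{protocol},{host_ip},{host_port},{guest_ip},{guest_port}"


def modifyvm_command(vm_name, rule, adapter=1):
    """Return the VBoxManage argument list that adds one port-forwarding rule."""
    return ["VBoxManage", "modifyvm", vm_name, f"--natpf{adapter}", rule]


# Assumed values: VM name "nosql", ssh on guest port 22, HBase Thrift on 9090.
GUEST_IP = "10.0.2.15"
RULES = [
    natpf_rule("ssh", 2222, 22, guest_ip=GUEST_IP),
    natpf_rule("hbase", 9090, 9090, guest_ip=GUEST_IP),
]

for rule in RULES:
    print(" ".join(modifyvm_command("nosql", rule)))
```

An empty host IP field means the rule is not tied to one particular host interface; the graphical dialog shown in the figure sets the same fields.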
It may sound silly, but remember to install an
ssh
server on the guest before you start testing. First test the connection from guest to guest:
nosql@nosql-virtualbox:~$ ssh 127.0.0.1
The authenticity of host '127.0.0.1 (127.0.0.1)' can't be established.
ED25519 key fingerprint is SHA256:JutNl+moUSIwNnE2Tw6YiynbWh9Vl3MrOqmQEYsK1+s.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '127.0.0.1' (ED25519) to the list of known hosts.
nosql@127.0.0.1's password:
Welcome to Ubuntu 24.04.1 LTS (GNU/Linux 6.8.0-45-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/pro
Expanded Security Maintenance for Applications is not enabled.
94 updates can be applied immediately.
45 of these updates are standard security updates.
To see these additional updates run: apt list --upgradable
8 additional security updates can be applied with ESM Apps.
Learn more about enabling ESM Apps service at https://ubuntu.com/esm
Last login: Fri Sep 27 00:53:09 2024 from 10.0.2.2
nosql@nosql-virtualbox:~$
Of course, in this case you log into the same system. That does not matter: the only thing you should care about is whether the connection is established successfully.
nosql@nosql-virtualbox:~$ exit
logout
Connection to 127.0.0.1 closed.
nosql@nosql-virtualbox:~$
If there are no errors or problems, you can try to connect from host to guest (
fulmanp-ThinkPad-T540p
is the host,
nosql-virtualbox
is the guest); remember to use the
-p
flag with the value
2222
:
fulmanp@fulmanp-ThinkPad-T540p:~$ ssh -p 2222 nosql@127.0.0.1
The authenticity of host '[127.0.0.1]:2222 ([127.0.0.1]:2222)' can't be established.
ED25519 key fingerprint is SHA256:JutNl+moUSIwNnE2Tw6YiynbWh9Vl3MrOqmQEYsK1+s.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[127.0.0.1]:2222' (ED25519) to the list of known hosts.
nosql@127.0.0.1's password:
Welcome to Ubuntu 24.04.1 LTS (GNU/Linux 6.8.0-45-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/pro
Expanded Security Maintenance for Applications is not enabled.
0 updates can be applied immediately.
8 additional security updates can be applied with ESM Apps.
Learn more about enabling ESM Apps service at https://ubuntu.com/esm
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
nosql@nosql-virtualbox:~$ cd ~/Desktop/nosql/
nosql@nosql-virtualbox:~/Desktop/nosql$ ls -l
total 8
drwxrwxr-x 8 nosql nosql 4096 Sep 26 23:39 hbase-2.5.10
drwxrwxr-x 4 nosql nosql 4096 Sep 26 23:39 tmp
nosql@nosql-virtualbox:~/Desktop/nosql$ exit
logout
Connection to 127.0.0.1 closed.
You need to have both the JRE and the JDK installed (mostly you need the JRE, but the JDK is needed, for example, to use the
jps
command).
Figure: Synaptic and packages
After installation, remember to set the
$JAVA_HOME
environment variable:
nosql@nosql-virtualbox:~/Desktop/nosql$ java -version
openjdk version "21.0.4" 2024-07-16
OpenJDK Runtime Environment (build 21.0.4+7-Ubuntu-1ubuntu224.04)
OpenJDK 64-Bit Server VM (build 21.0.4+7-Ubuntu-1ubuntu224.04, mixed mode, sharing)
nosql@nosql-virtualbox:~/Desktop/nosql$ echo $JAVA_HOME
[empty]
nosql@nosql-virtualbox:~/Desktop/nosql$ update-java-alternatives --list
java-1.21.0-openjdk-amd64 2111 /usr/lib/jvm/java-1.21.0-openjdk-amd64
Because in this case Java is installed in the
/usr/lib/jvm/java-1.21.0-openjdk-amd64
directory, add the line below to
conf/hbase-env.sh
:
export JAVA_HOME=/usr/lib/jvm/java-1.21.0-openjdk-amd64
Alternatively, you can add it to your
.bashrc
file so that it is set permanently. If you want it to take effect in the current shell right away, remember to use the
source
command:
nosql@nosql-virtualbox:~$ echo $JAVA_HOME
nosql@nosql-virtualbox:~$ source ~/.bashrc
nosql@nosql-virtualbox:~$ echo $JAVA_HOME
/usr/lib/jvm/java-1.21.0-openjdk-amd64
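If you are not sure which directory to use, a candidate value can be derived from the `java` binary itself. This is only a sketch and assumes the usual layout in which the binary lives in `<JAVA_HOME>/bin/java`, possibly behind a chain of symbolic links; the function names are mine. The printed path may be spelled differently from the one listed by `update-java-alternatives`.

```python
import os
import shutil


def java_home_from_binary(java_path):
    """Resolve symlinks and strip the trailing bin/java to get the Java directory."""
    real_path = os.path.realpath(java_path)
    bin_dir = os.path.dirname(real_path)
    return os.path.dirname(bin_dir)


def detect_java_home():
    """Return a candidate JAVA_HOME, or None if no java is found on PATH."""
    java_path = shutil.which("java")
    if java_path is None:
        return None
    return java_home_from_binary(java_path)


home = detect_java_home()
print(f"export JAVA_HOME={home}" if home else "java not found on PATH")
```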
- Create directory to store downloaded files
nosql@nosql-virtualbox:~/Desktop$ mkdir ~/Desktop/install
nosql@nosql-virtualbox:~/Desktop$ mkdir ~/Desktop/install/hbase
- Download files. I downloaded the stable release
2.5.10
dated 2024.07.24.
It was a real surprise for me, because it means that HBase is still maintained and under active development:
Figure: HBase download page (September, 2024)
- Verify file integrity
nosql@nosql-virtualbox:~/Desktop/install/hbase$ sha512sum hbase-2.5.10-bin.tar.gz
[... s1: SHA 512 SUM of downloaded binary file ...]
nosql@nosql-virtualbox:~/Desktop/install/hbase$ cat hbase-2.5.10-bin.tar.gz.sha512
[... s2: Correct SHA 512 SUM of a binary file ...]
Of course, s1 must agree with s2.
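Comparing two 128-character strings by eye is error-prone, so the comparison of s1 and s2 can be automated. Below is a minimal sketch using Python's `hashlib`. The function names are mine, and the lenient matching (ignoring case and whitespace) is an assumption made so that checksum files printing the digest in space-separated groups are accepted too.

```python
import hashlib
import re


def sha512_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(computed_hex, checksum_text):
    """Check whether the computed digest appears in the checksum file text.

    Whitespace is removed and case is ignored before the comparison.
    """
    normalized = re.sub(r"\s+", "", checksum_text).lower()
    return computed_hex.lower() in normalized


def verify_archive(archive_path, checksum_path):
    """Return True if the archive's digest is found in the checksum file."""
    with open(checksum_path, encoding="utf-8") as handle:
        expected = handle.read()
    return digest_matches(sha512_of_file(archive_path), expected)
```

Called in the download directory as `verify_archive('hbase-2.5.10-bin.tar.gz', 'hbase-2.5.10-bin.tar.gz.sha512')`, it should return `True` exactly when s1 agrees with s2.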
- Extract archive and move to final destination (
~/Desktop/nosql/
in my case):
nosql@nosql-virtualbox:~/Desktop/install/hbase$ tar zxvf hbase-2.5.10-bin.tar.gz
nosql@nosql-virtualbox:~/Desktop/install/hbase$ mkdir ~/Desktop/nosql
nosql@nosql-virtualbox:~/Desktop/install/hbase$ mv hbase-2.5.10 ~/Desktop/nosql/
Test the basic functionality of HBase
- Start HBase:
nosql@nosql-virtualbox:~/Desktop/nosql$ /home/nosql/Desktop/nosql/hbase-2.5.10/bin/start-hbase.sh
running master, logging to /home/nosql/Desktop/nosql/hbase-2.5.10/bin/../logs/hbase-nosql-master-nosql-virtualbox.out
- Verify that all HBase processes are running:
nosql@nosql-virtualbox:~/Desktop/nosql$ jps
6134 Jps
5436 HMaster
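In standalone mode the master, the region server and ZooKeeper run inside one JVM, which is why only HMaster shows up in the list. If you want to run this check from a script, the output of `jps` is easy to parse. The sketch below works on the text shown above; the function names are mine, and a name that occurs more than once keeps only the last PID.

```python
def parse_jps(output):
    """Map process names to PIDs from the text printed by `jps`."""
    processes = {}
    for line in output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and parts[0].isdigit():
            processes[parts[1]] = int(parts[0])
    return processes


def hmaster_running(output):
    """Return True if an HMaster process is listed."""
    return "HMaster" in parse_jps(output)


sample = """6134 Jps
5436 HMaster"""
print(parse_jps(sample))        # {'Jps': 6134, 'HMaster': 5436}
print(hmaster_running(sample))  # True
```

On a live system the text could come from `subprocess.run(['jps'], capture_output=True, text=True).stdout`.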
- Start HBase Shell:
nosql@nosql-virtualbox:~/Desktop/nosql$ hbase-2.5.10/bin/hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.5.10, ra3af60980c61fb4be31e0dcd89880f304d01098a, Thu Jul 18 22:45:17 PDT 2024
Took 0.0169 seconds
hbase:001:0> list
TABLE
0 row(s)
Took 0.6565 seconds
=> []
hbase:002:0> create_namespace 'test'
Took 0.2385 seconds
hbase:003:0> create 'test:table1', 'f1'
Created table test:table1
Took 0.7019 seconds
=> Hbase::Table - test:table1
hbase:004:0> list
TABLE
test:table1
1 row(s)
Took 0.0193 seconds
=> ["test:table1"]
hbase:005:0> put 'test:table1', 'key005', 'f1:column_a', 'value_005_f1_a'
Took 0.2712 seconds
hbase:006:0> scan 'test:table1'
ROW COLUMN+CELL
key005 column=f1:column_a, timestamp=2024-09-26T23:56:09.939, value=value_005_f1_a
1 row(s)
Took 0.0981 seconds
hbase:007:0> exit
- Connect again to check if your changes in database are still there:
nosql@nosql-virtualbox:~/Desktop/nosql$ hbase-2.5.10/bin/hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.5.10, ra3af60980c61fb4be31e0dcd89880f304d01098a, Thu Jul 18 22:45:17 PDT 2024
Took 0.0029 seconds
hbase:001:0> scan 'test:table1'
ROW COLUMN+CELL
key005 column=f1:column_a, timestamp=2024-09-26T23:56:09.939, value=value_005_f1_a
1 row(s)
Took 0.7532 seconds
hbase:002:0> exit
I have had some problems with disappearing data: sometimes, after a restart, my database is empty. Maybe flushing the table just before stopping the database (not just before exiting the shell) will help; see
hbase memstore manual flush
hbase:003:0> flush 'test:table1'
Took 0.5539 seconds
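If the flush helps, you may want to issue it from a script right before stopping HBase. The sketch below pipes the same command into the shell's non-interactive mode (`hbase shell -n`). It is an untested assumption that this fits your setup; the function names and the way the path to the `hbase` script is passed in are mine.

```python
import subprocess


def flush_script(table_name):
    """Return the HBase shell text that flushes one table."""
    return f"flush '{table_name}'\n"


def flush_table(hbase_bin, table_name):
    """Pipe the flush command into `hbase shell -n` and return its exit code."""
    result = subprocess.run(
        [hbase_bin, "shell", "-n"],
        input=flush_script(table_name),
        text=True,
        capture_output=True,
    )
    return result.returncode
```

For the installation above the call would look like `flush_table('/home/nosql/Desktop/nosql/hbase-2.5.10/bin/hbase', 'test:table1')`; a return value of 0 would indicate that the shell reported no error.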
- Run Thrift to allow communication from different programming languages:
nosql@nosql-virtualbox:~/Desktop/nosql$ cd hbase-2.5.10/bin/
nosql@nosql-virtualbox:~/Desktop/nosql/hbase-2.5.10/bin$ ./stop-hbase.sh
stopping hbase.............
nosql@nosql-virtualbox:~/Desktop/nosql/hbase-2.5.10/bin$ ./hbase-daemon.sh start thrift
running thrift, logging to /home/nosql/Desktop/nosql/hbase-2.5.10/bin/../logs/hbase-nosql-thrift-nosql-virtualbox.out
nosql@nosql-virtualbox:~/Desktop/nosql/hbase-2.5.10/bin$ ./start-hbase.sh
running master, logging to /home/nosql/Desktop/nosql/hbase-2.5.10/bin/../logs/hbase-nosql-master-nosql-virtualbox.out
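Before moving on to the client, it is worth checking that the Thrift server really accepts connections. The sketch below polls a TCP port; it assumes Thrift listens on its default port 9090, and the function names are mine. Run it on the guest: from the host, the forwarded port may accept a connection even when nothing is listening behind it.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host, port, attempts=10, delay=1.0):
    """Poll the port until it accepts connections or the attempts run out."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False
```

A call such as `wait_for_port('127.0.0.1', 9090)` gives the server up to roughly ten seconds to come up.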
Write a Python script and execute it on the host
- Create working directory
fulmanp@fulmanp-ThinkPad-T540p:~/Desktop$ mkdir nosql
- Create directory for virtual environments configuration:
fulmanp@fulmanp-ThinkPad-T540p:~/Desktop$ mkdir python_virtual_environments
fulmanp@fulmanp-ThinkPad-T540p:~/Desktop$ cd python_virtual_environments/
- Create virtual environment:
fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/python_virtual_environments$ python3 -m venv nosql
I got the following message:
The virtual environment was not created successfully because ensurepip is not
available. On Debian/Ubuntu systems, you need to install the python3-venv
package using the following command.
apt install python3.12-venv
You may need to use sudo with that command. After installing the python3-venv
package, recreate your virtual environment.
Failing command: /home/fulmanp/Desktop/python_virtual_environments/nosql/bin/python3
so I ran sudo apt install python3.12-venv:
fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/python_virtual_environments$ sudo apt install python3.12-venv
[sudo] password for fulmanp:
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
[...]
After this operation, 2 771 kB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://pl.archive.ubuntu.com/ubuntu noble/universe amd64 python3-pip-whl all 24.0+dfsg-1ubuntu1 [1 702 kB]
[...]
Setting up python3.12-venv (3.12.3-1ubuntu0.2) ...
and then again:
fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/python_virtual_environments$ python3 -m venv nosql
- Activate virtual environment:
fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/python_virtual_environments$ source /home/fulmanp/Desktop/python_virtual_environments/nosql/bin/activate
(nosql) fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/python_virtual_environments$
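The `(nosql)` prefix in the prompt shows that the environment is active. A script can make the same check on its own, which helps when packages seem to land in the wrong place. This is a small sketch based on comparing `sys.prefix` with `sys.base_prefix`; the function name is mine.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    return sys.prefix != sys.base_prefix


print(f"virtual environment active: {in_virtualenv()}")
print(f"interpreter prefix: {sys.prefix}")
```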
- Add system components needed by
happybase
(which is used to simplify communication from Python to HBase):
(nosql) fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/python_virtual_environments$ sudo apt-get update
(nosql) fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/python_virtual_environments$ sudo apt-get install gcc
(nosql) fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/python_virtual_environments$ sudo apt-get install python3-dev
- Install
happybase
:
(nosql) fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/python_virtual_environments$ python3 -m pip install happybase
- Create simple test program:
(nosql) fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/python_virtual_environments$ cd ../nosql/
(nosql) fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/nosql$ ls -l
total 0
(nosql) fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/nosql$ touch hbase_simple_test.py
(nosql) fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/nosql$ ls -l
total 4
-rw-rw-r-- 1 fulmanp fulmanp 575 wrz 27 01:21 hbase_simple_test.py
Paste the following code into hbase_simple_test.py:
import happybase


def test():
    # Connect to the HBase Thrift server through the forwarded port.
    connection = happybase.Connection(host='127.0.0.1',
                                      port=9090,
                                      autoconnect=True)
    print(f'Tables: {connection.tables()}')
    table = connection.table('test:table1')
    # Fetch the row stored earlier from the HBase shell.
    rows = table.rows([b'key005'])
    for key, data in rows:
        print(f'row key={key}, data={data}')
        for k in data:
            print(f' data key={k}')
            print(f' data[{k}]={data[k]}')
            print(f' decode data key: {k.decode("utf-8")}')
            # Get value and decode it:
            val = data[k].decode("utf-8")
            print(f' Decoded value={val}')
            # Double quotes inside the f-string also work before Python 3.12.
            print(f' Encoded decoded value: {val.encode("utf-8")}')
            print(' ---')
    connection.close()


if __name__ == '__main__':
    test()
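The script decodes keys and values one by one. If the data is known to be text, the whole row dictionary can be converted in one step. The helper below is a sketch and not part of happybase; it assumes that every column name has the `family:qualifier` form and that all values are UTF-8 encoded.

```python
def decode_row(data, encoding="utf-8"):
    """Convert a {bytes: bytes} row into nested {family: {qualifier: value}} strings."""
    decoded = {}
    for column, value in data.items():
        family, _, qualifier = column.decode(encoding).partition(":")
        decoded.setdefault(family, {})[qualifier] = value.decode(encoding)
    return decoded


row = {b"f1:column_a": b"value_005_f1_a"}
print(decode_row(row))  # {'f1': {'column_a': 'value_005_f1_a'}}
```

HBase itself stores plain bytes, so this conversion is only safe for cells you wrote as text.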
- Execute simple test program:
When I first tried to execute it:
(nosql) fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/nosql$ python3 hbase_simple_test.py
I got the following error:
(nosql) fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/nosql$ python3 hbase_simple_test.py
Traceback (most recent call last):
File "/home/fulmanp/Desktop/nosql/hbase_simple_test.py", line 1, in <module>
import happybase
File "/home/fulmanp/Desktop/python_virtual_environments/nosql/lib/python3.12/site-packages/happybase/__init__.py", line 6, in <module>
import pkg_resources as _pkg_resources
ModuleNotFoundError: No module named 'pkg_resources'
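Virtual environments created with Python 3.12 no longer include setuptools, which provides pkg_resources, so I had to install setuptools: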
(nosql) fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/nosql$ python3 -m pip install setuptools
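To see in advance which modules are missing from the active environment, you can ask the import machinery directly instead of waiting for a traceback. This is a minimal sketch using `importlib.util.find_spec`; the function name is mine.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# happybase imports pkg_resources, which is provided by setuptools.
print(missing_modules(["happybase", "pkg_resources"]))
```

An empty list means both modules can be found by the interpreter that runs the check.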
Now execution was successful:
(nosql) fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/nosql$ python3 hbase_simple_test.py
Tables: [b'test:table1']
row key=b'key005', data={b'f1:column_a': b'value_005_f1_a'}
data key=b'f1:column_a'
data[b'f1:column_a']=b'value_005_f1_a'
decode data key: f1:column_a
Decoded value=value_005_f1_a
Encoded decoded value: b'value_005_f1_a'
---
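The test program only reads data. Writing goes through the table's `put` method, which expects bytes for the row key, column names and values. The helpers below are a sketch that does the encoding for you. They are not part of happybase, and the row key `key006` used in the note after the code is a made-up example.

```python
def encode_cells(cells, encoding="utf-8"):
    """Turn a {str: str} column mapping into the {bytes: bytes} form used by happybase."""
    return {col.encode(encoding): val.encode(encoding) for col, val in cells.items()}


def store_row(table, key, cells):
    """Write one row through any object with a happybase-style put(row, data) method."""
    table.put(key.encode("utf-8"), encode_cells(cells))
```

With `table = connection.table('test:table1')` as in the test program, a call such as `store_row(table, 'key006', {'f1:column_a': 'value_006_f1_a'})` should add a second row, which you can then check with `scan 'test:table1'` in the HBase shell.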